
Metaprogramming Model (Preview)

Build with frameworks you love.
Ship one native binary.

Daisyflow sits on top of PyTorch, NumPy, and C/C++. It captures your existing code as a unified dataflow graph, optimizes across framework boundaries, and exports the entire application as a single native library.

Quick start

$ pip install docc-compiler
Browse full docs

Your application components: ResNet-50 · Preprocess · Solver · Tracker

Daisyflow Metaprogramming Model

Optimizing Compiler Collection: PyTorch Compiler · NumPy Compiler · C/C++ Compiler

Native artifact: pipeline.so

No runtime · Zero dependencies · Any hardware

Build · Export · Run

app.py
import daisyflow as df
import numpy as np
from torchvision.models import resnet50

def preprocess(image: np.ndarray) -> np.ndarray:
    # Normalize the image to zero mean and unit standard deviation
    return (image - image.mean()) / image.std()

# Compose the application graph
flow = df.Flow("my-pipeline")
flow.numpy(preprocess)
flow.torch(resnet50)
flow.native("jacobi", library="solver.so", name="solve")
flow.connect("preprocess", "vision", "solve")

Mix frameworks freely

Compose PyTorch models, NumPy functions, and native C/C++ code in a single application graph. Each component is compiled independently, then linked together.
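To make the idea of an application graph concrete, here is a minimal pure-Python sketch. MiniFlow, add, connect, and run are hypothetical names invented for this illustration; they are not the Daisyflow API, and nothing here is compiled. The sketch only shows the shape of the idea: named stages are registered independently, then linked into one pipeline.

```python
from typing import Callable, Dict, List

import numpy as np

Stage = Callable[[np.ndarray], np.ndarray]


class MiniFlow:
    """Toy stand-in for an application graph (hypothetical, not Daisyflow)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.stages: Dict[str, Stage] = {}
        self.order: List[str] = []

    def add(self, name: str, fn: Stage) -> None:
        # Register a stage under a name; stages know nothing about each other.
        self.stages[name] = fn

    def connect(self, *names: str) -> None:
        # Link registered stages into a linear chain, rejecting unknown names.
        missing = [n for n in names if n not in self.stages]
        if missing:
            raise KeyError(f"unknown stages: {missing}")
        self.order = list(names)

    def run(self, x: np.ndarray) -> np.ndarray:
        # Feed the output of each stage into the next one.
        for name in self.order:
            x = self.stages[name](x)
        return x


def preprocess(image: np.ndarray) -> np.ndarray:
    return (image - image.mean()) / image.std()


def scale(x: np.ndarray) -> np.ndarray:
    # Placeholder for a model or solver stage.
    return 2.0 * x


flow = MiniFlow("my-pipeline")
flow.add("preprocess", preprocess)
flow.add("scale", scale)
flow.connect("preprocess", "scale")
result = flow.run(np.arange(6, dtype=np.float64))
```

Connecting a name that was never registered fails early with a KeyError, which is the kind of check a graph-level view makes possible before anything runs.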

One native artifact

Export the entire graph as a standalone .so or .dll. No Python runtime, no framework dependencies, no container overhead. Just one binary that runs at native speed.
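On the consuming side, a standalone shared library can be loaded from any host language with a C foreign-function interface. The sketch below uses Python's standard ctypes module as one example. The symbol name "run" and its C signature are assumptions made for illustration; the symbols a real export provides may differ.

```python
import ctypes


def load_entry_point(library_path: str, symbol: str):
    """Load a shared library and return one exported function.

    Raises OSError if the library cannot be loaded and AttributeError
    if the library does not export the requested symbol.
    """
    lib = ctypes.CDLL(library_path)
    return getattr(lib, symbol)


def declare_signature(fn) -> None:
    # Hypothetical C signature, assumed for this sketch only:
    #   int run(const float *input, float *output, size_t n);
    fn.argtypes = [
        ctypes.POINTER(ctypes.c_float),
        ctypes.POINTER(ctypes.c_float),
        ctypes.c_size_t,
    ]
    fn.restype = ctypes.c_int


# Intended use, once an exported library exists:
#   run = load_entry_point("./pipeline.so", "run")
#   declare_signature(run)
```

Declaring argtypes and restype up front lets ctypes check argument types at call time instead of silently passing mismatched data to native code.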

Retarget in one flag

Change the target parameter and Daisyflow recompiles every kernel for the new hardware. From x86 to ARM to CUDA, one codebase covers all your deployment targets.

Why Daisyflow?

Framework silos

PyTorch, NumPy, and C++ each have their own compilation and deployment model. No tool sees the full picture. Your application is split into islands that can never be jointly optimized.

The rewrite tax

Prototype in Python. Rewrite in C++ for edge and embedded. Two codebases, two teams, twice the bugs. Every iteration cycle means reconciling diverging implementations.

Dependency hell

Ship Python, CUDA runtime, libtorch, NumPy, and a dozen shared libraries. One version mismatch and production breaks. Containers hide the problem; they do not solve it.

Hardware lock-in

Write CUDA for NVIDIA. Rewrite for ROCm. Again for ARM. Your deployment code is tailored to specific hardware. Every new chip means another porting project.

Performance left on the table

When frameworks run in separate processes, the compiler never sees the full dataflow. Data copies between stages dominate runtime. Optimization stops at every framework boundary.

No single artifact

Containers, orchestration, glue scripts. What should be one pipeline becomes a distributed system. Debugging, profiling, and reproducibility all suffer.

Available today

Use the compilers directly

While Daisyflow is in development, each compiler in the Daisytuner Optimizing Compiler Collection can be used standalone for developing and optimizing individual components.

torch.compile(backend="docc")

Drop-in backend for PyTorch 2.0. Your existing models compile into native kernels with zero code changes.

@native annotation

Annotate any NumPy function to compile it into a native kernel. Array operations are vectorized and parallelized automatically.
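As a rough picture of what an annotated function looks like, the sketch below decorates a plain NumPy normalization routine. The native decorator defined here is a local stand-in that returns the function unchanged, so the snippet runs without docc installed; the real decorator's import path and arguments are not shown in this page and should be taken from the docs.

```python
import numpy as np


def native(fn):
    """Stand-in for the @native annotation (assumption: applied without arguments)."""
    return fn


@native
def preprocess(image: np.ndarray) -> np.ndarray:
    # Whole-array operations with no Python-level loops
    return (image - image.mean()) / image.std()


image = np.linspace(0.0, 1.0, 12).reshape(3, 4)
normalized = preprocess(image)
```

The function body stays ordinary NumPy, so it can be tested in plain Python before any compilation step is involved.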

Clang drop-in for C/C++

Use docc as a drop-in replacement for clang. OpenMP pragmas are compiled with Transfer Tuning for each target.

model.py
import torch
from torchvision.models import resnet50
 
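# One line to compile with docc
model = resnet50()
# torch.compile has no `target` keyword; backend-specific settings go through `options`.
# The "target" key is an assumption about what the docc backend reads.
model = torch.compile(model, backend="docc", options={"target": "rocm"})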
 
# Use as usual - now runs as optimized native kernels
output = model(torch.randn(1, 3, 224, 224))

Core technology

Transfer Tuning

Every compiler in docc is cloud-connected. Instead of fixed heuristics or manual kernel writing, the compiler queries an optimization database for the best-known configuration on your target hardware, then refines it. New results feed back into the database, so performance improves over time.
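The query, refine, and feed-back loop described above can be sketched in a few lines. This is a conceptual toy, not docc's implementation: the database is an in-memory dict, the configuration is a single tile size, the cost function is made up, and a simple halve-or-double local search stands in for the AI-guided search.

```python
from typing import Callable, Dict, Tuple

Config = int  # a single tile size; a real search space would be far richer


def transfer_tune(
    database: Dict[Tuple[str, str], Config],
    kernel: str,
    target: str,
    cost: Callable[[Config], float],
    default: Config = 8,
    steps: int = 1,
) -> Config:
    key = (kernel, target)
    # 1. Query: start from the best-known configuration, if there is one.
    best = database.get(key, default)
    best_cost = cost(best)
    # 2. Refine: try neighbouring configurations within a fixed budget.
    for _ in range(steps):
        improved = False
        for candidate in (best // 2, best * 2):
            if candidate < 1:
                continue
            candidate_cost = cost(candidate)
            if candidate_cost < best_cost:
                best, best_cost = candidate, candidate_cost
                improved = True
        if not improved:
            break
    # 3. Feed back: store the result for later compilations.
    database[key] = best
    return best


def toy_cost(tile: Config) -> float:
    # Made-up cost model whose optimum is a tile size of 64.
    return abs(tile - 64)


db = {("matmul", "x86"): 32}
tuned = transfer_tune(db, "matmul", "x86", toy_cost)  # warm start from the database
cold = transfer_tune({}, "matmul", "x86", toy_cost)   # same budget, no prior knowledge
```

With the same one-step budget, the warm start reaches the optimum of this toy cost model while the cold start is still several doublings away, which is the intuition behind reusing results across machines.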

Optimization space

AI-guided search

Unexplored · From database · Best found

Cloud-connected compilers

Each compilation queries a shared optimization database. Kernels tuned on one machine accelerate compilations everywhere.

No manual kernel writing

Your PyTorch, NumPy, or C/C++ code is automatically lowered into optimized compute kernels. No CUDA, no intrinsics, no rewrites.

Next-gen architectures

Transfer Tuning adapts to any hardware with a compiler backend. From x86 and ARM to GPUs, RISC-V, and photonic processors.

Intel · AMD · ARM · NVIDIA · Tenstorrent · Q.ANT

Track performance on every commit

Daisyflow pairs with our CB/CT platform. Every compilation is benchmarked on real hardware, regressions are caught on your pull request, and autotuning suggestions land before you merge.

Learn more about CB/CT

Write frameworks. Ship native.
