Metaprogramming Model (Preview)
Build with frameworks you love.
Ship one native binary.
Daisyflow sits on top of PyTorch, NumPy, and C/C++. It captures your existing code as a unified dataflow graph, optimizes across framework boundaries, and exports the entire application as a single native library.
[Diagram: your application components → Daisyflow (Metaprogramming Model, Optimizing Compiler Collection) → C/C++ compiler → native artifact (pipeline.so)]
No runtime · Zero dependencies · Any hardware
Build · Export · Run
Mix frameworks freely
Compose PyTorch models, NumPy functions, and native C/C++ code in a single application graph. Each component is compiled independently, then linked together.
One native artifact
Export the entire graph as a standalone .so or .dll. No Python runtime, no framework dependencies, no container overhead. Just one binary that runs at native speed.
Retarget in one flag
Change the target parameter and Daisyflow recompiles every kernel for the new hardware. From x86 to ARM to CUDA, one codebase covers all your deployment targets.
Why Daisyflow?
Framework silos
PyTorch, NumPy, and C++ each have their own compilation and deployment model. No tool sees the full picture. Your application is split into islands that can never be jointly optimized.
The rewrite tax
Prototype in Python. Rewrite in C++ for edge and embedded. Two codebases, two teams, twice the bugs. Every iteration cycle means reconciling diverging implementations.
Dependency hell
Ship Python, the CUDA runtime, libtorch, NumPy, and a dozen shared libraries. One version mismatch and production breaks. Containers hide the problem; they do not solve it.
Hardware lock-in
Write CUDA for NVIDIA. Rewrite for ROCm. Again for ARM. Your deployment code is tailored to specific hardware. Every new chip means another porting project.
Performance left on the table
When frameworks run in separate processes, the compiler never sees the full dataflow. Data copies between stages dominate runtime. Optimization stops at every framework boundary.
No single artifact
Containers, orchestration, glue scripts. What should be one pipeline becomes a distributed system. Debugging, profiling, and reproducibility all suffer.
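To make the data-copy cost described above concrete, here is a toy, pure-Python illustration of why a compiler benefits from seeing the whole dataflow. It is not Daisyflow's implementation: the three stage functions and the `run_staged`, `fuse`, and `run_fused` helpers are hypothetical names invented for this sketch. The staged version materializes a full intermediate list at every stage boundary, while the fused version composes the stages and makes a single pass.

```python
from functools import reduce

# Three element-wise stages standing in for components that would normally
# live in different frameworks (hypothetical examples).
def scale(x):
    return x * 2.0

def shift(x):
    return x + 1.0

def clamp(x):
    return max(0.0, min(x, 10.0))

STAGES = [scale, shift, clamp]

def run_staged(data, stages):
    """Run stage by stage, materializing one intermediate list per boundary."""
    materialized = 0
    for stage in stages:
        data = [stage(x) for x in data]  # full copy at every boundary
        materialized += 1
    return data, materialized

def fuse(stages):
    """Compose all stages into a single per-element function."""
    return lambda x: reduce(lambda acc, stage: stage(acc), stages, x)

def run_fused(data, stages):
    """Apply the fused function once: a single pass and a single output list."""
    fused = fuse(stages)
    return [fused(x) for x in data], 1

inputs = [-3.0, 0.5, 2.0, 7.0]
staged_out, staged_lists = run_staged(inputs, STAGES)
fused_out, fused_lists = run_fused(inputs, STAGES)
```

Both versions produce the same values, but the staged one builds three lists where the fused one builds one. A real compiler works on arrays and kernels rather than Python lists, so this only shows the shape of the saving.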
Available today
Use the compilers directly
While Daisyflow is in development, each compiler in the Daisytuner Optimizing Compiler Collection (docc) can be used standalone for developing and optimizing individual components.
torch.compile(backend="docc")
A drop-in backend for PyTorch 2.0. Pass backend="docc" to torch.compile and your existing models compile into native kernels with no changes to the model code itself.
@native annotation
Annotate any NumPy function to compile it into a native kernel. Array operations are vectorized and parallelized automatically.

Clang drop-in for C/C++
Use docc as a drop-in replacement for clang. Code annotated with OpenMP pragmas is compiled and tuned for each target with Transfer Tuning.
Core technology
Transfer Tuning
Every compiler in docc is cloud-connected. Instead of fixed heuristics or manual kernel writing, the compiler queries an optimization database for the best-known configuration on your target hardware, then refines it. New results feed back into the database, so performance improves over time.
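The query, refine, and feed-back loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not docc's actual implementation: the in-memory `database` dict, the toy `cost` function, the single tile-size parameter, and every function name are hypothetical stand-ins for the cloud database, real benchmark runs, and the real search space.

```python
# Hypothetical stand-in for the shared optimization database:
# (kernel name, target) -> best-known tile size.
database = {("matmul", "x86"): 32}

def cost(tile, target):
    """Toy cost model standing in for a real benchmark run (lower is better)."""
    sweet_spot = {"x86": 64, "arm": 16}[target]
    return abs(tile - sweet_spot)

def query(kernel, target, default=8):
    """Return the best-known configuration, falling back to a default."""
    return database.get((kernel, target), default)

def refine(tile, target):
    """Local search: keep halving or doubling while the cost improves."""
    best = tile
    improved = True
    while improved:
        improved = False
        for candidate in (best // 2, best * 2):
            if candidate >= 1 and cost(candidate, target) < cost(best, target):
                best, improved = candidate, True
    return best

def compile_kernel(kernel, target):
    """Query the database, refine locally, then feed the result back."""
    start = query(kernel, target)
    tuned = refine(start, target)
    database[(kernel, target)] = tuned  # later compilations start from here
    return start, tuned

first = compile_kernel("matmul", "x86")   # starts from the stored value 32
second = compile_kernel("matmul", "x86")  # starts from the refined value
cold = compile_kernel("matmul", "arm")    # no entry yet, uses the default
```

The second compilation starts from the configuration the first one wrote back, which is the sense in which results tuned on one machine can speed up later compilations.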
[Figure: AI-guided search over the optimization space]
Cloud-connected compilers
Each compilation queries a shared optimization database. Kernels tuned on one machine accelerate compilations everywhere.
No manual kernel writing
Your PyTorch, NumPy, or C/C++ code is automatically lowered into optimized compute kernels. No CUDA, no intrinsics, no rewrites.
Next-gen architectures
Transfer Tuning adapts to any hardware with a compiler backend, from x86 and ARM to GPUs, RISC-V, and photonic processors.
Intel
AMD
ARM
NVIDIA
Tenstorrent
Q.ANT
Track performance on every commit
Daisyflow pairs with our CB/CT platform. Every compilation is benchmarked on real hardware, regressions are caught on your pull request, and autotuning suggestions land before you merge.
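As a rough idea of what catching a regression on a pull request involves, the sketch below compares benchmark timings from a baseline commit against those from a candidate commit. The kernel names, the timings, the 5% threshold, and the `find_regressions` helper are all hypothetical; the platform's real interface is not described on this page.

```python
from statistics import median

def find_regressions(baseline, candidate, threshold=0.05):
    """Return kernels whose median runtime grew by more than `threshold`.

    Both arguments map a kernel name to a list of measured runtimes in ms.
    """
    regressions = {}
    for kernel, base_runs in baseline.items():
        if kernel not in candidate:
            continue  # kernel removed or renamed; nothing to compare
        base = median(base_runs)
        new = median(candidate[kernel])
        slowdown = (new - base) / base
        if slowdown > threshold:
            regressions[kernel] = round(slowdown, 3)
    return regressions

# Illustrative numbers only, not real measurements.
baseline = {"conv2d": [10.0, 10.2, 9.9], "softmax": [2.0, 2.1, 2.0]}
candidate = {"conv2d": [10.1, 10.0, 10.3], "softmax": [2.6, 2.5, 2.7]}

report = find_regressions(baseline, candidate)
```

Using the median rather than the mean keeps a single noisy run from triggering a false alarm. A production setup would likely also account for measurement variance on the benchmarking hardware.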