Build fast software, independent of hardware
Daisytuner provides the optimization platform and developer tools to run software across CPUs, GPUs, and accelerators without rewriting a single line.
Build with Frameworks You Love


Optimization Platform
Compile and optimize software
across hardware
Run on CPUs, GPUs, and accelerators
Your application, fast and portable
Compatible with major frameworks. Fast optimizations across processors.
Zero runtime dependencies.
Image Classification on AMD ROCm
10% faster than the vendor-provided PyTorch build. ResNet-18 from torchvision compiled and optimized for ROCm.
Object Detection on Q.ANT Gen 2 NPU
Computing with light. Faster R-CNN with ResNet-50 backbone compiled directly from PyTorch to photonic hardware.
OpenFOAM on Tenstorrent Blackhole
Computational fluid dynamics on RISC-V. C++ source compiled directly with FP32 and BF16 support.
LLM Inference Engine
Serve large language models with optimized attention and KV-cache on any accelerator.
Sensor fusion, video analytics, and more
Daisytuner supports applications built on PyTorch, NumPy, or C/C++. Browse the docs to see all supported workloads and hardware targets.
Why Daisytuner
The Optimization Platform
Write your code once. Daisytuner compiles it into optimized native code, retargets it across processors, and packages it for deployment.
One native artifact, no runtime stack
The compiler packages your entire application into a single native library with generated bindings for Python, C++, and other languages. No interpreter, no framework, no system dependencies.
Runtime stack
Native artifact
Faster than hand-tuned frameworks
The compiler analyzes your full application end-to-end and generates optimized native code, not just at the kernel level but across the entire data flow, including pre- and post-processing.
Switch processors, keep your code
Retarget the same application to GPUs, RISC-V accelerators, or photonic processors. One source, compiled for each target. No vendor-specific rewrites.
Cheaper GPUs
Same code, different vendor. Cut hardware costs without a rewrite.
Modern RISC-V accelerators
Run on next-generation silicon designed for AI and HPC workloads.
Compute at the speed of light
Target photonic processors for fundamentally new performance frontiers.
How It Works
From your code to optimized deployment
You write the code. We handle compilation, optimization, benchmarking, and deployment.
Start with your code
You write your application using the frameworks you already know: PyTorch, NumPy, C/C++. Nothing changes about how you build software.
Application components
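For illustration, a minimal NumPy pipeline of the kind that stays unchanged: pre-processing followed by a stand-in classifier. The shapes, weights, and function names here are made up for this sketch; a real application would use its own model.

```python
import numpy as np

def preprocess(image: np.ndarray) -> np.ndarray:
    # Normalize pixel values to zero mean, unit variance per channel.
    mean = image.mean(axis=(0, 1), keepdims=True)
    std = image.std(axis=(0, 1), keepdims=True) + 1e-8
    return (image - mean) / std

def classify(features: np.ndarray, weights: np.ndarray) -> int:
    # A stand-in "model": one linear layer followed by argmax.
    logits = features.reshape(-1) @ weights
    return int(np.argmax(logits))

# Toy input: a 4x4 RGB "image" and random weights for 3 classes.
rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))
weights = rng.standard_normal((4 * 4 * 3, 3))
label = classify(preprocess(image), weights)
```

Nothing in this code refers to any compiler or deployment target; that separation is the point of the step.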
Daisyflow captures the graph
Your multi-framework application is captured as a single dataflow graph. Daisyflow sees across framework boundaries, so the full pipeline can be optimized as one unit.

Daisyflow
Dataflow graph
connected across frameworks
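Conceptually, the captured pipeline is a dataflow graph whose nodes may come from different frameworks. A purely illustrative toy representation (node names and "framework" tags are invented for this sketch, not Daisyflow's actual data model):

```python
# Purely illustrative: a captured pipeline as a tiny dataflow graph.
graph = {
    "decode":    {"framework": "numpy",   "inputs": []},
    "normalize": {"framework": "numpy",   "inputs": ["decode"]},
    "model":     {"framework": "pytorch", "inputs": ["normalize"]},
    "nms":       {"framework": "cpp",     "inputs": ["model"]},
}

def topo_order(graph: dict) -> list[str]:
    # Simple topological sort so every node runs after its inputs,
    # regardless of which framework each node came from.
    order, seen = [], set()
    def visit(node: str) -> None:
        if node in seen:
            return
        seen.add(node)
        for dep in graph[node]["inputs"]:
            visit(dep)
        order.append(node)
    for node in graph:
        visit(node)
    return order

schedule = topo_order(graph)
```

Because the graph spans framework boundaries, a scheduler or optimizer can treat the whole pipeline as one unit rather than stopping at each library's edge.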
docc compiles and optimizes
Each node in the graph is compiled into an optimized native kernel using Transfer Tuning. The compilers query a cloud-connected optimization database to find the best configuration for your target hardware.

docc compilers
AI-optimized compilation
Optimization space
AI-guided search
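docc's database interface is not public, so as a toy sketch of the lookup idea only: pick the best-known configuration for a (kernel, hardware) pair from previously measured results. Every name and number below is hypothetical.

```python
# Hypothetical sketch of a tuning-database lookup; real docc internals differ.
from typing import NamedTuple, Optional

class Config(NamedTuple):
    tile_size: int
    unroll: int
    measured_gflops: float

# Toy "optimization database": best configurations found on earlier runs.
DATABASE = {
    ("matmul", "amd-mi300"): [Config(128, 4, 910.0), Config(64, 8, 870.5)],
    ("matmul", "tt-blackhole"): [Config(32, 2, 410.0)],
}

def best_config(kernel: str, hardware: str) -> Optional[Config]:
    # Return the highest-throughput known configuration, if any.
    candidates = DATABASE.get((kernel, hardware))
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.measured_gflops)

cfg = best_config("matmul", "amd-mi300")
```

The real system layers an AI-guided search on top of such lookups; this sketch only shows why a shared database lets one user's tuning results speed up another's compile.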
Track performance on every change
Each pull request runs on our benchmarking infrastructure. You get performance numbers, regression alerts, and bottleneck analysis right in GitHub.
Performance dashboard
Deploy with zero overhead
The output is a single native artifact with no runtime dependencies. Ship it to your own machines or run jobs on our cloud with AMD, Tenstorrent, and more.
Native artifact
app.so
No runtime. Small deployment.
Self-hosted
Your hardware
Runs
Our cloud
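For illustration, a dependency-free artifact like this could be loaded even with raw `ctypes` from the Python standard library; `app.so` is a placeholder path, and in practice you would call the generated language bindings instead.

```python
import ctypes
from pathlib import Path

def load_artifact(path: str):
    # Load a compiled native artifact, returning None if it is absent.
    # This only illustrates that no interpreter, framework, or runtime
    # stack is needed on the deployment machine.
    if not Path(path).exists():
        return None
    return ctypes.CDLL(path)

lib = load_artifact("app.so")  # placeholder artifact name
```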
Beyond GPU
Production workloads on next-generation processors
Object detection on Q.ANT NPU Gen 2
Daisytuner compiled and deployed an object detection model directly from PyTorch onto Q.ANT's photonic processor, the first time a standard ML framework has targeted photonic hardware.
PyTorch to Photonic: Standard model format compiled directly to Q.ANT hardware.
Full Pipeline: Pre-processing, inference, and post-processing run end-to-end.
No Custom Code: Our platform handled all hardware-specific translation automatically.
OpenFOAM on Tenstorrent Blackhole
In collaboration with Tenstorrent, we cross-compiled the industry-standard CFD toolkit OpenFOAM to Tenstorrent's RISC-V based Blackhole accelerator.
Zero Code Changes: Original C++ source code compiled directly with docc.
Automatic Optimization: docc identifies offloadable sections and moves them to the accelerator.
Full Portability: Seamlessly switch between Wormhole, Blackhole, or accelerators from other vendors.
Start building on any processor
Install the tools, point at your code, and get optimized native artifacts in minutes.