Build fast software, independent of hardware

Daisytuner provides the optimization platform and developer tools to run software across CPUs, GPUs, and accelerators without rewriting a single line.

Quick start

$ pip install docc-compiler
Browse full docs

Build with Frameworks You Love

PyTorch
NumPy
OpenMP

Optimization Platform

Compile and optimize software
across hardware

NVIDIA
AMD
ARM
Q.ANT

Run on CPUs, GPUs, and accelerators

Your application, fast and portable

Compatible with major frameworks. Fast optimizations across processors.
Zero runtime dependencies.

Sensor fusion, video analytics, and more

Daisytuner supports applications built on PyTorch, NumPy, or C/C++. Browse the docs to see all supported workloads and hardware targets.

Why Daisytuner

The Optimization Platform

Write your code once. Daisytuner compiles it into optimized native code, retargets it across processors, and packages it for deployment.

01 Deployment

One native artifact, no runtime stack

The compiler packages your entire application into a single native library with generated bindings for Python, C++, and other languages. No interpreter, no framework, no system dependencies.

Runtime stack

Python
Runtime
Libraries
System

Native artifact

Bindings
Native Library
02 Performance

Faster than hand-tuned frameworks

The compiler analyzes your full application end-to-end and generates optimized native code, not just at the kernel level but across the entire data flow, including pre- and post-processing.

End-to-end runtime

Daisytuner
PyTorch
NumPy
OpenMP

More in docs
03 Portability

Switch processors, keep your code

Retarget the same application to GPUs, RISC-V accelerators, or photonic processors. One source, compiled for each target. No vendor-specific rewrites.

GPU

Cheaper GPUs

Same code, different vendor. Cut hardware costs without rewriting code.

NVIDIA
AMD
RISC-V Accelerator

Modern RISC-V accelerators

Run on next-generation silicon designed for AI and HPC workloads.

Tenstorrent
Photonic

Compute at the speed of light

Target photonic processors for fundamentally new performance frontiers.

Q.ANT

How It Works

From your code to optimized deployment

You write the code. We handle compilation, optimization, benchmarking, and deployment.

01

Start with your code

You write your application using the frameworks you already know: PyTorch, NumPy, C/C++. Nothing changes about how you build software.

Application components

Camera
YOLOv8
Resize
Object Tracker
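The pipeline above can be written as ordinary framework code. A minimal, hypothetical NumPy version of the Camera → Resize → YOLOv8 → Tracker flow (all function names here are illustrative stand-ins, not part of Daisytuner's API):

```python
import numpy as np

def capture_frame(h=480, w=640):
    # Stand-in for a camera source: one random RGB frame.
    return np.random.rand(h, w, 3).astype(np.float32)

def resize(frame, factor=2):
    # Naive strided downsample as a placeholder for real resizing.
    return frame[::factor, ::factor]

def detect(frame):
    # Stand-in for a YOLOv8 forward pass: one dummy box (x, y, w, h).
    return np.array([[10.0, 20.0, 50.0, 50.0]], dtype=np.float32)

def track(boxes, history):
    # Toy tracker: record the mean box center for each frame.
    centers = boxes[:, :2] + boxes[:, 2:] / 2.0
    history.append(centers.mean(axis=0))
    return history

history = []
for _ in range(3):
    frame = capture_frame()
    boxes = detect(resize(frame))
    history = track(boxes, history)

print(len(history))  # 3 tracked observations
```

Nothing about this code is Daisytuner-specific; that is the point of step 01.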
02

Daisyflow captures the graph

Your multi-framework application is captured as a single dataflow graph. Daisyflow sees across framework boundaries, so the full pipeline can be optimized as one unit.

Daisyflow

Dataflow graph

Camera
YOLOv8
Resize
Object Tracker

connected across frameworks
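Conceptually, the captured pipeline is a directed graph of operators that can be scheduled and optimized as one unit. A hypothetical sketch in plain Python, assuming an adjacency-list view of the graph (the real Daisyflow representation is far richer):

```python
# Hypothetical dataflow graph for the pipeline above; node names are
# illustrative, and Daisyflow's actual IR is not a simple adjacency list.
graph = {
    "Camera": ["Resize"],
    "Resize": ["YOLOv8"],
    "YOLOv8": ["Object Tracker"],
    "Object Tracker": [],
}

def topological_order(graph):
    # Kahn's algorithm: emit a node once all of its producers are done.
    indegree = {node: 0 for node in graph}
    for consumers in graph.values():
        for c in consumers:
            indegree[c] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    order = []
    while ready:
        node = ready.pop()
        order.append(node)
        for c in graph[node]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return order

print(topological_order(graph))
# ['Camera', 'Resize', 'YOLOv8', 'Object Tracker']
```

Because the whole pipeline is visible as one graph, optimizations can cross the boundaries between frameworks instead of stopping at each library call.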

03

docc compiles and optimizes

Each node in the graph is compiled into an optimized native kernel using Transfer Tuning. The compilers query a cloud-connected optimization database to find the best configuration for your target hardware.

docc compilers

AI-optimized compilation

Optimization space

AI-guided search

04

Track performance on every change

Each pull request runs on our benchmarking infrastructure. You get performance numbers, regression alerts, and bottleneck analysis right in GitHub.

Performance dashboard

PR #142: +12% throughput
PR #143: -8% latency regression
Bottleneck: Memory-bound in Resize kernel
05

Deploy with zero overhead

The output is a single native artifact with no runtime dependencies. Ship it to your own machines or run jobs on our cloud with AMD, Tenstorrent, and more.

Native artifact

app.so

No runtime. Small deployment.

Self-hosted

Your hardware

Runs

Our cloud

Beyond GPU

Production workloads on next-generation processors

First from PyTorch
Q.ANT

Object detection on Q.ANT NPU Gen 2

Daisytuner compiled and deployed an object detection model directly from PyTorch onto Q.ANT's photonic processor, the first time a standard ML framework has targeted photonic hardware.

  • PyTorch to Photonic: Standard model format compiled directly to Q.ANT hardware.

  • Full Pipeline: Pre-processing, inference, and post-processing run end-to-end.

  • No Custom Code: Our platform handled all hardware-specific translation automatically.

World first
SC '25
Tenstorrent

OpenFOAM on Tenstorrent Blackhole

In collaboration with Tenstorrent, we cross-compiled the industry-standard CFD toolkit OpenFOAM to Tenstorrent's RISC-V based Blackhole accelerator.

  • Zero Code Changes: Original C++ source code compiled directly with docc.

  • Automatic Optimization: docc identifies offloadable sections and moves them to the accelerator.

  • Full Portability: Seamlessly switch between Wormhole, Blackhole, or hardware from other vendors.

Start building on any processor

Install the tools, point at your code, and get optimized native artifacts in minutes.

$ pip install docc-compiler