Zero-Overhead Checkpointing for
Any GPU at Any Scale.
The async I/O engine that makes checkpointing free. 16+ GB/s effective throughput with 10x deduplication on NVIDIA H100/B200 and AMD MI300X. Compiled native binaries for Linux & Windows.
Effective write speed for fine-tuning workloads. Near-zero I/O overhead for incremental checkpointing across all supported GPUs.
Automatic block-level deduplication for LoRA, MoE, and optimizer states. Reduces storage TCO by 90%.
NVIDIA: Native support for Turing, Ampere, Hopper & Blackwell. Tested on RTX 2070S, A60, RTX 3080, RTX 5080, H100, B200. AMD: MI300X (gfx942, ROCm 6.x). Precompiled fat binaries (SM75→SM100+PTX).