hanzoai/engine

Hanzo Engine

Rust-native LLM and embedding inference engine

High-throughput, low-latency inference for transformer LLMs and embedding models. Built in Rust for memory safety, predictable performance, and deployment to any hardware target.

Get started View on GitHub

Inference without the overhead

One engine for LLM generation and embedding generation. No Python runtime, no GIL, no cold-start tax.

Rust-native

Pure Rust runtime. No Python interpreter, no FFI overhead. Predictable latency under load.

High throughput

Continuous batching, paged attention, and quantized kernels. Saturate your GPU or CPU.

LLM + embeddings

One binary serves both generation and embedding workloads. Share memory, share weights.

Quantization built-in

GGUF, AWQ, and GPTQ formats supported out of the box. Run 70B models on consumer GPUs.

Any hardware

CUDA, Metal, ROCm, and CPU backends. Same engine, same API, every target.

Memory safe

Rust's ownership model eliminates entire classes of CVEs. Production-grade reliability.

Get started with Hanzo Engine

Read the docs View on GitHub

Open source

License: Apache-2.0hanzoai/engine

Get Engine

AI inference engine

Deploy to Cloud Self-host