GA · Open Source · Free Tier

Hanzo Serving

Production model inference

Deploy models to production on KServe, with auto-scaling, canary deployments, and A/B testing built in.
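A canary rollout on KServe is driven by an InferenceService manifest. The sketch below is illustrative, not a Hanzo-specific recipe: the service name and `storageUri` are placeholders, while `canaryTrafficPercent`, `minReplicas`, and `maxReplicas` are standard KServe v1beta1 fields for traffic splitting and auto-scaling bounds.

```yaml
# Illustrative KServe InferenceService; name and storageUri are placeholders.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: demo-model
spec:
  predictor:
    canaryTrafficPercent: 10   # route 10% of traffic to the new revision
    minReplicas: 1             # auto-scaling lower bound
    maxReplicas: 5             # auto-scaling upper bound
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://example-bucket/models/demo
```

Raising `canaryTrafficPercent` step by step (and to 100 once the new revision looks healthy) promotes the canary; removing the field pins all traffic to the latest ready revision.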


Features

Everything you need to get started

Auto-scaling
Canary deployments
A/B testing
GPU inference
Batching
SDKs & Libraries

Official Serving SDKs

Use our official SDKs to integrate Serving into your application

Hanzo Python SDK

PyPI
pip install hanzoai

Hanzo TypeScript SDK

npm
npm install @hanzo/ai

Hanzo Go SDK

Go Modules
go get github.com/hanzoai/go-sdk

Hanzo Rust SDK

crates.io
cargo add hanzoai

Community

Join the Serving Community

Get help, share ideas, and contribute to the project

Want to Contribute?

We welcome contributions of all kinds: bug reports, feature requests, documentation improvements, and code contributions.

Read our Contributing Guide

Built on Open Source

Powered by vLLM

32k+ GitHub stars

Hanzo Serving is built on top of vLLM, an open-source, high-throughput, and memory-efficient inference engine for LLMs.

Licensed under Apache-2.0

We're grateful to the vLLM maintainers and community for their incredible work.

Ready to get started with Serving?

Deploy in minutes with Hanzo Cloud or self-host with our open-source release.