GA
Open Source
Free Tier
Hanzo Serving
Production model inference
Deploy models to production with KServe: auto-scaling, canary deployments, and A/B testing.
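KServe-based deployments typically expose the Open Inference Protocol (V2) over HTTP. As a hedged sketch, here is how a V2 request body could be built in Python; the model name `my-model`, the input name, and the tensor shape are illustrative assumptions, not part of Hanzo Serving's documented API:

```python
import json

# Build a V2 (Open Inference Protocol) request body for a model that
# takes one FP32 input tensor of shape [1, N].
# The input name and shape here are illustrative assumptions.
def build_infer_request(input_values):
    return {
        "inputs": [
            {
                "name": "input-0",
                "shape": [1, len(input_values)],
                "datatype": "FP32",
                "data": input_values,
            }
        ]
    }

body = build_infer_request([0.1, 0.2, 0.3, 0.4])
# This JSON would be POSTed to an endpoint of the form
# http://<host>/v2/models/my-model/infer
print(json.dumps(body))
```

The response follows the same protocol, returning an `outputs` list of named tensors.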
Serving
[Dashboard mockup: training in progress, Epoch 15/50, loss vs. epochs (train/val). Metrics shown: Loss 0.0234, Accuracy 94.2%, LR 1e-4, GPU A100]
Features
Everything you need to get started
Auto-scaling
Canary deployments
A/B testing
GPU inference
Batching
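Canary deployments work by splitting traffic between a stable revision and a candidate by weight, so a small slice of real requests exercises the new model before full rollout. A minimal illustrative sketch of weighted routing follows; the revision names and the 90/10 split are assumptions, not Hanzo Serving's actual router:

```python
import random

# Assumed 90/10 traffic split between two model revisions.
WEIGHTS = {"stable": 90, "canary": 10}

def pick_revision(rng=random):
    # Sample a point in [0, total weight) and walk the cumulative
    # weights to choose which revision serves this request.
    total = sum(WEIGHTS.values())
    point = rng.uniform(0, total)
    cumulative = 0
    for revision, weight in WEIGHTS.items():
        cumulative += weight
        if point < cumulative:
            return revision
    return "stable"  # fallback for floating-point edge cases

# Simulate 10,000 requests with a seeded RNG for reproducibility.
rng = random.Random(0)
counts = {"stable": 0, "canary": 0}
for _ in range(10_000):
    counts[pick_revision(rng)] += 1
print(counts)
```

In a real rollout the weights would be adjusted gradually (e.g. 10% to 50% to 100%) as the canary's error rate and latency are monitored.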
SDKs & Libraries
Official Serving SDKs
Use our official SDKs to integrate Serving into your application
Community
Join the Serving Community
Get help, share ideas, and contribute to the project
Want to Contribute?
We welcome contributions of all kinds: bug reports, feature requests, documentation improvements, and code contributions.
Read our Contributing Guide
Related Products
More from Hanzo ML
Built on Open Source
Powered by vLLM
Hanzo Serving is built on top of vLLM, an open-source, high-throughput, memory-efficient inference engine for LLMs.
Licensed under Apache-2.0
We're grateful to the vLLM maintainers and community for their incredible work.
Ready to get started with Serving?
Deploy in minutes with Hanzo Cloud or self-host with our open-source release.