Hanzo Serving
Production model inference
Deploy models to production with KServe, with auto-scaling, canary deployments, and A/B testing built in.
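A canary deployment routes a small, fixed fraction of traffic to a new model revision while the rest continues to hit the stable one. Below is a minimal, illustrative sketch of that weighted routing in Python; `pick_variant` and the weight value are hypothetical, not the Hanzo Serving or KServe API.

```python
import hashlib

def pick_variant(request_id: str, canary_weight: float = 0.1) -> str:
    """Deterministically route a request to 'canary' or 'stable'.

    Hashes the request id into [0, 1); ids that land below the
    canary weight go to the canary revision. Deterministic hashing
    keeps a given request pinned to the same variant on retries.
    """
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "canary" if bucket < canary_weight else "stable"

# Over a large population of ids, roughly canary_weight of them
# land on the canary revision.
hits = sum(pick_variant(f"req-{i}") == "canary" for i in range(10_000))
print(hits)
```

In a real KServe setup the traffic split is handled declaratively on the InferenceService rather than in application code; the sketch only shows the routing idea.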
Features
Everything you need to get started
Official Serving SDKs
Use our official SDKs to integrate Serving into your application
Join the Serving Community
Get help, share ideas, and contribute to the project
Want to Contribute?
We welcome contributions of all kinds: bug reports, feature requests, documentation improvements, and code contributions.
Read our Contributing Guide
Related Products
More from Hanzo ML
Powered by vLLM
Hanzo Serving is built on top of vLLM, an open-source, high-throughput, and memory-efficient inference engine for LLMs.
Licensed under Apache-2.0
We're grateful to the vLLM maintainers and community for their incredible work.
Contributors to vLLM earn a share of Hanzo compute revenue through our SBOM-verified revenue sharing program.
Ready to get started with Serving?
Deploy in minutes with Hanzo Cloud or self-host with our open-source release.