Inception: Mercury

Mercury is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed optimized models like GPT-4.1 Nano and Claude 3.5 Haiku while matching their performance. Mercury's speed enables developers to provide responsive user experiences, including with voice agents, search interfaces, and chatbots. Read more in the [blog post] (https://www.inceptionlabs.ai/blog/introducing-mercury) here.

text

Get API Key View Docs Try in Chat

Specifications

Context Window	128K
Modalities	text
Status	available
Category	third-party
Model ID	inception/mercury

Quick Start

TypeScript

import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: process.env.HANZO_API_KEY,
  baseURL: 'https://api.hanzo.ai/v1'
})

const response = await client.chat.completions.create({
  model: 'inception/mercury',
  messages: [{ role: 'user', content: 'Hello!' }]
})

console.log(response.choices[0].message.content)

Python

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["HANZO_API_KEY"],
    base_url="https://api.hanzo.ai/v1"
)

response = client.chat.completions.create(
    model="inception/mercury",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

cURL

curl https://api.hanzo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -d '{
    "model": "inception/mercury",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

package main

import (
    "context"
    "fmt"
    "os"

    "github.com/sashabaranov/go-openai"
)

func main() {
    cfg := openai.DefaultConfig(os.Getenv("HANZO_API_KEY"))
    cfg.BaseURL = "https://api.hanzo.ai/v1"
    client := openai.NewClientWithConfig(cfg)

    resp, _ := client.CreateChatCompletion(context.Background(),
        openai.ChatCompletionRequest{
            Model: "inception/mercury",
            Messages: []openai.ChatCompletionMessage{
                {Role: openai.ChatMessageRoleUser, Content: "Hello!"},
            },
        },
    )
    fmt.Println(resp.Choices[0].Message.Content)
}

More from Inception

Inception: Mercury 2

128K

Mercury 2 is an extremely fast reasoning LLM, and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving >1,000 tokens/sec on standard GPUs. Mercury 2 is 5x+ faster than leading speed-optimized LLMs like Claude 4.5 Haiku and GPT 5 Mini, at a fraction of the cost. Mercury 2 supports tunable reasoning levels, 128K context, native tool use, and schema-aligned JSON output. Built for coding workflows where latency compounds, real-time voice/search, and agent loops. OpenAI API compatible. Read more in the [blog post](https://www.inceptionlabs.ai/blog/introducing-mercury-2).

Inception: Mercury Coder

128K

Mercury Coder is the first diffusion large language model (dLLM). Applying a breakthrough discrete diffusion approach, the model runs 5-10x faster than even speed optimized models like Claude 3.5 Haiku and GPT-4o Mini while matching their performance. Mercury Coder's speed means that developers can stay in the flow while coding, enjoying rapid chat-based iteration and responsive code completion suggestions. On Copilot Arena, Mercury Coder ranks 1st in speed and ties for 2nd in quality. Read more in the [blog post here](https://www.inceptionlabs.ai/blog/introducing-mercury).

View all Inception models →

Use Inception: Mercury via Hanzo AI

One API key. 390+ models. OpenAI-compatible. Start free.

Get Free API Key Browse All Models