Qwen: Qwen3 Next 80B A3B Thinking
Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured “thinking” traces by default. It is designed for hard multi-step problems such as math proofs, code synthesis and debugging, logic, and agentic planning, and it reports strong results across knowledge, reasoning, coding, alignment, and multilingual evaluations. Compared with prior Qwen3 variants, it emphasizes stability under long chains of thought and efficient scaling during inference, and it is tuned to follow complex instructions while reducing repetitive or off-task behavior. The model is suited to agent frameworks and tool use (function calling), retrieval-heavy workflows, and standardized benchmarking where step-by-step solutions are required. It supports long, detailed completions and uses throughput-oriented techniques (e.g., multi-token prediction) for faster generation. Note that it operates in thinking-only mode.
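Because the model always emits a thinking trace, client code may need to separate the trace from the final answer. This page does not say whether the trace arrives inline in the message content or in a separate response field, so the following is only a minimal sketch that assumes an inline trace terminated by a `</think>` delimiter. The helper name `split_thinking` is hypothetical.

```python
def split_thinking(text: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Return (reasoning, answer) from a raw completion string.

    Assumption: the reasoning trace ends with `close_tag`. An opening
    `<think>` tag is stripped if present, but is not required.
    """
    reasoning, sep, answer = text.partition(close_tag)
    if not sep:
        # No delimiter found: treat the whole string as the answer.
        return "", text.strip()
    reasoning = reasoning.strip()
    if reasoning.startswith("<think>"):
        reasoning = reasoning[len("<think>"):].strip()
    return reasoning, answer.strip()


raw = "<think>2 + 2 is 4.</think>\nThe answer is 4."
reasoning, answer = split_thinking(raw)
print(answer)
```

If the trace is delivered in a separate field instead, the helper falls through to its no-delimiter branch and returns the content unchanged.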
Specifications
| Field | Value |
| --- | --- |
| Context Window | 131K tokens |
| Modalities | text |
| Status | available |
| Category | third-party |
| Model ID | qwen/qwen3-next-80b-a3b-thinking |
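The description mentions tool use (function calling). As a rough sketch, a request body in the OpenAI-compatible `tools` format could be built with the Model ID from the table above. The `get_weather` tool, its schema, and the `max_tokens` default are hypothetical, and this page does not confirm that the endpoint accepts `tools` for this model.

```python
import json

MODEL_ID = "qwen/qwen3-next-80b-a3b-thinking"


def build_tool_request(question: str, max_tokens: int = 4096) -> dict:
    """Build a chat-completions payload that offers one tool to the model."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": question}],
        # Illustrative cap; thinking traces also consume completion tokens.
        "max_tokens": max_tokens,
        "tools": [
            {
                "type": "function",
                "function": {
                    # Hypothetical tool, defined only for this example.
                    "name": "get_weather",
                    "description": "Look up the current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }


payload = build_tool_request("What is the weather in Paris?")
print(json.dumps(payload, indent=2))
```

With the OpenAI Python client, the same dictionary could be passed as `client.chat.completions.create(**payload)`.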
Quick Start
import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: process.env.HANZO_API_KEY,
  baseURL: 'https://api.hanzo.ai/v1'
})

const response = await client.chat.completions.create({
  model: 'qwen/qwen3-next-80b-a3b-thinking',
  messages: [{ role: 'user', content: 'Hello!' }]
})

console.log(response.choices[0].message.content)

import os
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["HANZO_API_KEY"],
    base_url="https://api.hanzo.ai/v1"
)

response = client.chat.completions.create(
    model="qwen/qwen3-next-80b-a3b-thinking",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

curl https://api.hanzo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -d '{
    "model": "qwen/qwen3-next-80b-a3b-thinking",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

package main

import (
	"context"
	"fmt"
	"os"

	"github.com/sashabaranov/go-openai"
)

func main() {
	cfg := openai.DefaultConfig(os.Getenv("HANZO_API_KEY"))
	cfg.BaseURL = "https://api.hanzo.ai/v1"
	client := openai.NewClientWithConfig(cfg)

	resp, err := client.CreateChatCompletion(context.Background(),
		openai.ChatCompletionRequest{
			Model: "qwen/qwen3-next-80b-a3b-thinking",
			Messages: []openai.ChatCompletionMessage{
				{Role: openai.ChatMessageRoleUser, Content: "Hello!"},
			},
		},
	)
	if err != nil {
		// Exit instead of indexing into an empty Choices slice.
		fmt.Fprintln(os.Stderr, "chat completion failed:", err)
		os.Exit(1)
	}
	fmt.Println(resp.Choices[0].Message.Content)
}

More from Qwen
Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design with early fusion of multimodal tokens, allowing the model to process and reason across text and images within the same context.
The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall performance is comparable to that of the Qwen3.5-27B.
The Qwen3.5 27B native vision-language dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B.
The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of overall performance, this model is second only to Qwen3.5-397B-A17B. Its text capabilities significantly outperform those of Qwen3-235B-2507, and its visual capabilities surpass those of Qwen3-VL-235B.
The Qwen3.5 Flash native vision-language models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared with the Qwen3 series, these models deliver a leap forward in performance for both pure-text and multimodal tasks, offering fast response times while balancing inference speed and overall performance.
The Qwen3.5 Plus native vision-language models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. Across a variety of task evaluations, the Qwen3.5 series consistently performs on par with leading state-of-the-art models. Compared with the Qwen3 series, these models show a leap forward in both pure-text and multimodal capabilities.
Use Qwen: Qwen3 Next 80B A3B Thinking via Hanzo AI
One API key. 390+ models. OpenAI-compatible. Start free.