Qwen: Qwen3 30B A3B Thinking 2507
Qwen3-30B-A3B-Thinking-2507 is a Mixture-of-Experts reasoning model with 30B total parameters, of which roughly 3B are active per token (the "A3B" in its name). It is optimized for complex tasks that require extended multi-step thinking. The model is designed specifically for “thinking mode,” where internal reasoning traces are separated from final answers. Compared to earlier Qwen3-30B releases, this version improves performance across logical reasoning, mathematics, science, coding, and multilingual benchmarks. It also demonstrates stronger instruction following, tool use, and alignment with human preferences. With higher reasoning efficiency and extended output budgets, it is best suited for advanced research, competitive problem solving, and agentic applications that require structured long-context reasoning.
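The separation of reasoning traces from final answers can be illustrated with a small helper. This is a minimal sketch, not the provider's documented behavior: it assumes the raw completion text carries the reasoning before a closing `</think>` tag (possibly without an opening `<think>` tag), and the function name `split_reasoning` is hypothetical. Some OpenAI-compatible gateways may instead return the reasoning in a separate response field, in which case no parsing is needed.

```python
def split_reasoning(text: str, close_tag: str = "</think>") -> tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Assumption: the reasoning trace ends at the last `close_tag`;
    everything after it is the final answer.
    """
    head, sep, tail = text.rpartition(close_tag)
    if not sep:
        # No closing tag found: treat the whole text as the answer.
        return "", text.strip()
    # The opening tag may be absent, so strip it only if present.
    reasoning = head.strip().removeprefix("<think>").strip()
    return reasoning, tail.strip()


if __name__ == "__main__":
    raw = "<think>2 + 2 is 4.</think>\n\nThe answer is 4."
    reasoning, answer = split_reasoning(raw)
    print(reasoning)  # 2 + 2 is 4.
    print(answer)     # The answer is 4.
```

Splitting on the last closing tag keeps the helper tolerant of outputs that omit the opening tag. It requires Python 3.9 or later for `str.removeprefix`.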
Specifications
| Context Window | 33K |
| Modalities | text |
| Status | available |
| Category | third-party |
| Model ID | qwen/qwen3-30b-a3b-thinking-2507 |
Quick Start
import OpenAI from 'openai'

// Point the OpenAI SDK at the Hanzo endpoint.
const client = new OpenAI({
  apiKey: process.env.HANZO_API_KEY,
  baseURL: 'https://api.hanzo.ai/v1'
})

const response = await client.chat.completions.create({
  model: 'qwen/qwen3-30b-a3b-thinking-2507',
  messages: [{ role: 'user', content: 'Hello!' }]
})

console.log(response.choices[0].message.content)

import os  # needed for os.environ below
from openai import OpenAI
client = OpenAI(
    api_key=os.environ["HANZO_API_KEY"],
    base_url="https://api.hanzo.ai/v1"
)

response = client.chat.completions.create(
    model="qwen/qwen3-30b-a3b-thinking-2507",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

curl https://api.hanzo.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -d '{
    "model": "qwen/qwen3-30b-a3b-thinking-2507",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

package main

import (
	"context"
	"fmt"
	"os"

	"github.com/sashabaranov/go-openai"
)

func main() {
	// Point the client at the Hanzo endpoint.
	cfg := openai.DefaultConfig(os.Getenv("HANZO_API_KEY"))
	cfg.BaseURL = "https://api.hanzo.ai/v1"
	client := openai.NewClientWithConfig(cfg)

	resp, err := client.CreateChatCompletion(context.Background(),
		openai.ChatCompletionRequest{
			Model: "qwen/qwen3-30b-a3b-thinking-2507",
			Messages: []openai.ChatCompletionMessage{
				{Role: openai.ChatMessageRoleUser, Content: "Hello!"},
			},
		},
	)
	// Check the error before reading the response to avoid indexing an empty result.
	if err != nil {
		fmt.Fprintln(os.Stderr, "chat completion failed:", err)
		os.Exit(1)
	}
	fmt.Println(resp.Choices[0].Message.Content)
}

More from Qwen
Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design with early fusion of multimodal tokens, allowing the model to process and reason across text and images within the same context.
The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall performance is comparable to that of the Qwen3.5-27B.
The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B.
The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of overall performance, this model is second only to Qwen3.5-397B-A17B. Its text capabilities significantly outperform those of Qwen3-235B-2507, and its visual capabilities surpass those of Qwen3-VL-235B.
The Qwen3.5 Flash native vision-language models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the Qwen3 series, these models deliver a leap forward in performance for both pure-text and multimodal tasks, offering fast response times while balancing inference speed and overall performance.
The Qwen3.5 Plus native vision-language models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Across a variety of task evaluations, the Qwen3.5 series consistently performs on par with leading state-of-the-art models. Compared to the Qwen3 series, these models show a leap forward in both pure-text and multimodal capabilities.
Use Qwen: Qwen3 30B A3B Thinking 2507 via Hanzo AI
One API key. 390+ models. OpenAI-compatible. Start free.