Qwen: Qwen3 VL 8B Thinking
Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and long-context processing (native 256K, expandable to 1M tokens) for tasks such as scientific visual analysis, causal inference, and mathematical reasoning over image or video inputs. Compared to the Instruct edition, the Thinking version introduces deeper visual-language fusion and deliberate reasoning pathways that improve performance on long-chain logic tasks, STEM problem-solving, and multi-step video understanding. It achieves stronger temporal grounding via Interleaved-MRoPE and timestamp-aware embeddings, while maintaining robust OCR, multilingual comprehension, and text generation on par with large text-only LLMs.
Specifications
| Context Window | 131K |
| Modalities | text, vision |
| Status | available |
| Category | third-party |
| Model ID | qwen/qwen3-vl-8b-thinking |
Quick Start
import OpenAI from 'openai'
const client = new OpenAI({
apiKey: process.env.HANZO_API_KEY,
baseURL: 'https://api.hanzo.ai/v1'
})
const response = await client.chat.completions.create({
model: 'qwen/qwen3-vl-8b-thinking',
messages: [{ role: 'user', content: 'Hello!' }]
})
console.log(response.choices[0].message.content)from openai import OpenAI
client = OpenAI(
api_key=os.environ["HANZO_API_KEY"],
base_url="https://api.hanzo.ai/v1"
)
response = client.chat.completions.create(
model="qwen/qwen3-vl-8b-thinking",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)curl https://api.hanzo.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $HANZO_API_KEY" \
-d '{
"model": "qwen/qwen3-vl-8b-thinking",
"messages": [{"role": "user", "content": "Hello!"}]
}'package main
import (
"context"
"fmt"
"os"
"github.com/sashabaranov/go-openai"
)
func main() {
cfg := openai.DefaultConfig(os.Getenv("HANZO_API_KEY"))
cfg.BaseURL = "https://api.hanzo.ai/v1"
client := openai.NewClientWithConfig(cfg)
resp, _ := client.CreateChatCompletion(context.Background(),
openai.ChatCompletionRequest{
Model: "qwen/qwen3-vl-8b-thinking",
Messages: []openai.ChatCompletionMessage{
{Role: openai.ChatMessageRoleUser, Content: "Hello!"},
},
},
)
fmt.Println(resp.Choices[0].Message.Content)
}More from Qwen
Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design with early fusion of multimodal tokens, allowing the model to process and reason across text and images within the same context.
The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall performance is comparable to that of the Qwen3.5-27B.
The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B.
The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of overall performance, this model is second only to Qwen3.5-397B-A17B. Its text capabilities significantly outperform those of Qwen3-235B-2507, and its visual capabilities surpass those of Qwen3-VL-235B.
The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the 3 series, these models deliver a leap forward in performance for both pure text and multimodal tasks, offering fast response times while balancing inference speed and overall performance.
The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of task evaluations, the 3.5 series consistently demonstrates performance on par with state-of-the-art leading models. Compared to the 3 series, these models show a leap forward in both pure-text and multimodal capabilities.
Use Qwen: Qwen3 VL 8B Thinking via Hanzo AI
One API key. 390+ models. OpenAI-compatible. Start free.