Qwen: Qwen3 VL 235B A22B Instruct
Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table extraction, multilingual OCR). The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning. Beyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows—turning sketches or mockups into code and assisting with UI debugging—while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.
Specifications
| Context Window | 262K |
| Modalities | text, vision |
| Status | available |
| Category | third-party |
| Model ID | qwen/qwen3-vl-235b-a22b-instruct |
Quick Start
import OpenAI from 'openai'
const client = new OpenAI({
apiKey: process.env.HANZO_API_KEY,
baseURL: 'https://api.hanzo.ai/v1'
})
const response = await client.chat.completions.create({
model: 'qwen/qwen3-vl-235b-a22b-instruct',
messages: [{ role: 'user', content: 'Hello!' }]
})
console.log(response.choices[0].message.content)from openai import OpenAI
client = OpenAI(
api_key=os.environ["HANZO_API_KEY"],
base_url="https://api.hanzo.ai/v1"
)
response = client.chat.completions.create(
model="qwen/qwen3-vl-235b-a22b-instruct",
messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)curl https://api.hanzo.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $HANZO_API_KEY" \
-d '{
"model": "qwen/qwen3-vl-235b-a22b-instruct",
"messages": [{"role": "user", "content": "Hello!"}]
}'package main
import (
"context"
"fmt"
"os"
"github.com/sashabaranov/go-openai"
)
func main() {
cfg := openai.DefaultConfig(os.Getenv("HANZO_API_KEY"))
cfg.BaseURL = "https://api.hanzo.ai/v1"
client := openai.NewClientWithConfig(cfg)
resp, _ := client.CreateChatCompletion(context.Background(),
openai.ChatCompletionRequest{
Model: "qwen/qwen3-vl-235b-a22b-instruct",
Messages: []openai.ChatCompletionMessage{
{Role: openai.ChatMessageRoleUser, Content: "Hello!"},
},
},
)
fmt.Println(resp.Choices[0].Message.Content)
}More from Qwen
Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design with early fusion of multimodal tokens, allowing the model to process and reason across text and images within the same context.
The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall performance is comparable to that of the Qwen3.5-27B.
The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B.
The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of overall performance, this model is second only to Qwen3.5-397B-A17B. Its text capabilities significantly outperform those of Qwen3-235B-2507, and its visual capabilities surpass those of Qwen3-VL-235B.
The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the 3 series, these models deliver a leap forward in performance for both pure text and multimodal tasks, offering fast response times while balancing inference speed and overall performance.
The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of task evaluations, the 3.5 series consistently demonstrates performance on par with state-of-the-art leading models. Compared to the 3 series, these models show a leap forward in both pure-text and multimodal capabilities.
Use Qwen: Qwen3 VL 235B A22B Instruct via Hanzo AI
One API key. 390+ models. OpenAI-compatible. Start free.