Name: GLM-4.6V
Brand: Zhipu AI

GLM-4.6V is a multimodal mixture-of-experts (MoE) vision-language model in the GLM-V family from Zhipu AI (Z AI). It supports vision, file input, and reasoning trained with scalable reinforcement learning. It has a 131K-token context window, produces up to 33K output tokens, and is available from 4 providers, starting at $0.3 per 1M input tokens and $0.9 per 1M output tokens.

Specifications
Canonical ID	`zhipu-glm-4-6v`
Type	Language
Status	Active
Creator	Zhipu AI
Providers	Hugging Face, Novita, OpenRouter, Vercel AI Gateway
Context Window	131K tokens
Max Output	33K tokens
Input Modalities	Image, PDF, Text, Video
Output Modalities	Text
Reasoning Efforts	default
Parameters	108B
HuggingFace Likes	390
HuggingFace Downloads (30d)	6,998
HuggingFace Downloads (all-time)	405,792
Release Date	2025-12-08 · 7 months ago

Benchmarks
Intelligence Index	11.0 #282
Math Index	26.3 #186
MMLU-Pro	0.8 #162
GPQA	0.6 #300
HLE	0.0 #443
LiveCodeBench	0.4 #162
IFBench	0.3 #355
Time to First Token	1.13s #384
SciCode	0.3 #289
AIME 2025	0.3 #186
LCR	0.1 #303
TerminalBench Hard	0.0 #296
TAU2	0.3 #236
Output TPS	43.5 #233
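
The two speed figures above (Time to First Token and Output TPS) can be combined into a rough end-to-end latency estimate. The sketch below assumes throughput stays constant for the whole response and that the listed values are representative of the provider you call; both are simplifying assumptions, and the function name is illustrative.

```python
# Speed figures copied from the Benchmarks table.
TTFT_SECONDS = 1.13   # Time to First Token
OUTPUT_TPS = 43.5     # output tokens per second


def estimate_response_seconds(output_tokens, ttft=TTFT_SECONDS, tps=OUTPUT_TPS):
    """Rough wall-clock time for one response.

    Assumption: tokens stream at a constant rate after the first token.
    """
    return ttft + output_tokens / tps


for n in (100, 1_000, 10_000):
    print(f"{n:>6} tokens: ~{estimate_response_seconds(n):.1f} s")
```

Real latency varies by provider, load, and prompt size, so treat the result as an order-of-magnitude guide.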

Capabilities

Input (4/5)
Text	✓
Image	✓
Audio	·
Video	✓
PDF	✓

Output (1/5)
Text	✓
Image	·
Audio	·
Video	·
Embedding	·

Capabilities (6/13)
Reasoning	✓
Adaptive Reasoning	·
Function Calling	✓
Parallel Function Calling	✓
Structured Outputs	✓
Native JSON Schema	✓
Web Search	·
URL Context	·
Computer Use	·
Code Execution	·
File Search	·
Prompt Caching	✓
Assistant Prefill	·

Pricing by Provider

Provider	Input $ / 1M	Output $ / 1M	Cache Read $ / 1M
Hugging Face `novita:zai-org/glm-4.6v`	$0.3	$0.9	N/A
Novita `novita/zai-org/glm-4.6v`	$0.3	$0.9	$0.055
OpenRouter `z-ai/glm-4.6v`	$0.3	$0.9	$0.055
Vercel AI Gateway `zai/glm-4.6v`	$0.3	$0.9	$0.05
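
The per-token prices in the table translate into a request cost with simple arithmetic. The following is a minimal sketch using the Novita / OpenRouter row; the function name is illustrative, and it assumes cache-read tokens are counted separately from regular input tokens and billed at the cache-read rate, which may not match every provider's billing rules.

```python
# Prices in USD per 1M tokens, copied from the Novita / OpenRouter row.
INPUT_PRICE = 0.3
OUTPUT_PRICE = 0.9
CACHE_READ_PRICE = 0.055


def estimate_cost(input_tokens, output_tokens, cache_read_tokens=0, calls=1):
    """Estimated USD cost for `calls` identical requests.

    Assumption: cache_read_tokens are NOT included in input_tokens.
    """
    per_call = (
        input_tokens / 1_000_000 * INPUT_PRICE
        + output_tokens / 1_000_000 * OUTPUT_PRICE
        + cache_read_tokens / 1_000_000 * CACHE_READ_PRICE
    )
    return per_call * calls


# 1M input tokens plus 500K output tokens in a single call.
print(f"${estimate_cost(1_000_000, 500_000):.2f}")  # prints $0.75
```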


Cheapest Instances to Run It

Cloud GPU instances that can host GLM-4.6V, ranked by cheapest on-demand price. The model needs about 259 GB of GPU memory at FP16 precision (estimated from its parameter count), so treat the fit as guidance rather than a guarantee.
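
The 259 GB estimate is consistent with 108B parameters at 2 bytes each (216 GB) plus roughly 20% headroom. The page does not state its formula, so the 1.2 overhead factor below is an inferred assumption, and the function name is illustrative. The same sketch shows how the estimate would scale at narrower precisions.

```python
def estimate_vram_gb(params_billions, bytes_per_param, overhead=1.2):
    """Rough GPU memory needed to hold the weights, in decimal GB.

    Assumption: `overhead` (1.2) is inferred from 259 / 216 and stands in
    for activations, KV cache, and runtime buffers; it is not a published
    figure.
    """
    # 1e9 parameters * N bytes each = N GB, so billions map directly to GB.
    return params_billions * bytes_per_param * overhead


for precision, width in [("FP16", 2), ("FP8", 1), ("INT4", 0.5)]:
    print(f"{precision}: ~{estimate_vram_gb(108, width):.1f} GB")
```

Actual requirements depend on batch size, context length, and the serving stack, so this is a starting point for sizing rather than a fit guarantee.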


Instance	Cloud	GPU	VRAM	Price	Cheapest region
Standard_NC96ads_A100_v4	Azure	4× NVIDIA A100	320 GB	$14.69/hr	westus2
g7e.24xlarge	AWS	4× RTX PRO Server 6000	384 GB	$16.57/hr	us-east-1
p4d.24xlarge	AWS	8× A100	320 GB	$21.96/hr	us-west-2
7 more instances can run GLM-4.6V.
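
The ranking in the table amounts to filtering instances by available VRAM and sorting by hourly price. Here is a small sketch of that logic using only the three rows shown; the `Instance` class and `cheapest_fitting` function are illustrative names, not part of any published API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    name: str
    cloud: str
    vram_gb: int
    usd_per_hour: float


# Rows copied from the table above, deliberately listed out of price order.
INSTANCES = [
    Instance("p4d.24xlarge", "AWS", 320, 21.96),
    Instance("Standard_NC96ads_A100_v4", "Azure", 320, 14.69),
    Instance("g7e.24xlarge", "AWS", 384, 16.57),
]


def cheapest_fitting(instances, required_gb):
    """Return instances with enough VRAM, cheapest first."""
    fitting = [i for i in instances if i.vram_gb >= required_gb]
    return sorted(fitting, key=lambda i: i.usd_per_hour)


for inst in cheapest_fitting(INSTANCES, required_gb=259):
    print(f"{inst.name} ({inst.cloud}): ${inst.usd_per_hour}/hr")
```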

Versions

Version	Released	Context	Input / 1M	Output / 1M	Status
GLM-4.6V	2025-12-08	131K	$0.300	$0.900	Current
GLM-4.5V	2025-08-11	131K	$0.600	$1.20	Deprecating

Model IDs

glm-4-6v

glm-4-6v-reasoning

novita/zai-org/glm-4.6v

z-ai/glm-4.6v

zai-org/glm-4.6v

zai-org/GLM-4.6V

zai/glm-4.6v

zhipu-glm-4-6v

GLM-4.6V

Capabilities API	GET /api/v1/models/zhipu-glm-4-6v
Pricing by Provider API	GET /api/v1/models/zhipu-glm-4-6v/pricing
Cost Calculator API	GET /api/v1/models/zhipu-glm-4-6v/pricing/calculate?input_tokens=1000000&output_tokens=500000
Cheapest Instances to Run It API	GET /api/v1/models/zhipu-glm-4-6v/instances
Versions API	GET /api/v1/models?family=glm4v_moe
Model IDs API	GET /api/v1/models/zhipu-glm-4-6v
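
The endpoints above are listed as paths only, so a client has to supply the host. The sketch below builds the cost-calculator URL with the standard library; `BASE_URL` is a placeholder (the real host is not given here), and the function name is illustrative. It constructs the URL without sending a request.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical host: the page lists only paths, not the API's base URL.
BASE_URL = "https://api.example.invalid"


def calculate_url(model_id, input_tokens, output_tokens, base_url=BASE_URL):
    """Build the GET URL for the cost-calculator endpoint."""
    path = f"/api/v1/models/{model_id}/pricing/calculate"
    query = urlencode({"input_tokens": input_tokens, "output_tokens": output_tokens})
    return f"{urljoin(base_url, path)}?{query}"


print(calculate_url("zhipu-glm-4-6v", 1_000_000, 500_000))
```

The response format is not documented in this listing, so parsing is left out.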
