Name: Gemma 3 4B
Brand: Google

Gemma 3 4B is a 4-billion-parameter open-weight language model in Google's Gemma 3 family that balances multimodal capability with compact size. It has a 128K-token context window, produces up to 8K output tokens, and is priced from $0.03 per 1M input tokens and $0.08 per 1M output tokens.

Specifications
Canonical ID	`google-gemma-3-4b`
Type	Language
Status	Active
Creator	Google
Providers	LlamaGate
Context Window	128K tokens
Max Output	8K tokens
Input Modalities	Image
Output Modalities	Text
Parameters	4B

Benchmarks
Intelligence Index	1.1 #486
Coding Index	2.7 #143
Math Index	12.7 #220
MMLU-Pro	0.4 #294
GPQA	0.3 #442
HLE	0.1 #281
LiveCodeBench	0.1 #291
AIME	0.1 #134
IFBench	0.3 #352
Time to First Token	0.00s #114
SciCode	0.1 #430
MATH-500	0.8 #107
AIME 2025	0.1 #220
LCR	0.1 #335
TerminalBench Hard	0.0 #335
TAU2	0.0 #372
Output TPS	0.0 #373

Capabilities

Input 1/5
Text	·
Image	✓
Audio	·
Video	·
PDF	·

Output 1/5
Text	✓
Image	·
Audio	·
Video	·
Embedding	·

Capabilities 2/13
Reasoning	·
Adaptive Reasoning	·
Function Calling	✓
Parallel Function Calling	·
Structured Outputs	✓
Native JSON Schema	·
Web Search	·
URL Context	·
Computer Use	·
Code Execution	·
File Search	·
Prompt Caching	·
Assistant Prefill	·

Pricing by Provider

Prices are in US dollars ($) per 1M tokens (Standard tier).

Provider	Input $ / 1M	Output $ / 1M
LlamaGate `llamagate/gemma3-4b`	$0.03	$0.08

Cost Calculator

Inputs: input tokens, output tokens, number of calls. Currency: US Dollar ($).
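The calculator's arithmetic can be sketched in a few lines. This is a minimal illustration, not the site's implementation: it assumes cost scales linearly with token counts at the LlamaGate rates listed above, and the function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Rates taken from the pricing table (LlamaGate, USD per 1M tokens).
INPUT_PER_MILLION = Decimal("0.03")
OUTPUT_PER_MILLION = Decimal("0.08")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int, calls: int = 1) -> Decimal:
    """Return the estimated USD cost of `calls` identical requests.

    Assumes simple linear pricing with no minimum charge or rounding.
    """
    if min(input_tokens, output_tokens, calls) < 0:
        raise ValueError("token counts and call count must be non-negative")
    per_call = (
        Decimal(input_tokens) * INPUT_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_PER_MILLION
    ) / TOKENS_PER_MILLION
    return per_call * calls


if __name__ == "__main__":
    # 1M input tokens ($0.03) plus 500K output tokens ($0.04) in one call.
    print(estimate_cost(1_000_000, 500_000))
```

Decimal arithmetic is used so that very small per-call costs do not pick up binary floating-point noise when multiplied by a large number of calls.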

Cheapest Instances to Run It

Cloud GPU instances that can host Gemma 3 4B, ranked by cheapest on-demand price. The model needs about 10 GB of GPU memory at FP16 precision (estimated from its parameter count), so treat the fit as guidance rather than a guarantee.


Instance	Cloud	GPU	VRAM	Price	Cheapest region
Standard_NV4as_v4	Azure	AMD Radeon Instinct MI25	16 GB	$0.233/hr	westus2
g5g.xlarge	AWS	T4g	16 GB	$0.420/hr	us-east-1
Standard_NV8as_v4	Azure	AMD Radeon Instinct MI25	16 GB	$0.466/hr	westus2
7 more instances can run Gemma 3 4B. The full ranked list and FP8 / INT4 quantization options require a CloudPrice subscription.
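The "about 10 GB at FP16" figure can be approximated from the parameter count. The sketch below is one plausible way to arrive at it, not the page's actual formula: the 1.25 overhead factor (for activations, KV cache, and runtime buffers) and the function names are assumptions chosen for illustration.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}


def estimate_vram_gb(
    params_billions: float,
    precision: str = "fp16",
    overhead: float = 1.25,  # assumed headroom, not a published figure
) -> float:
    """Rough GPU memory estimate in GB (1 GB = 1e9 bytes)."""
    # 1e9 parameters * bytes per parameter / 1e9 bytes per GB
    weights_gb = params_billions * BYTES_PER_PARAM[precision]
    return weights_gb * overhead


def fits(vram_gb: float, params_billions: float, precision: str = "fp16") -> bool:
    """True if the estimate fits within the instance's VRAM."""
    return estimate_vram_gb(params_billions, precision) <= vram_gb


if __name__ == "__main__":
    print(estimate_vram_gb(4.0))          # 4B parameters at FP16
    print(estimate_vram_gb(4.0, "int4"))  # same model quantized to INT4
    print(fits(16.0, 4.0))                # against a 16 GB instance
```

Under these assumptions a 16 GB card leaves some headroom at FP16, which is consistent with the instances in the table. Real usage depends on context length and batch size, so the estimate is guidance rather than a guarantee.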

Versions

Version	Released	Context	Input / 1M	Output / 1M	Status
Gemma 4 31B	—	256K	$0.140	$0.400	Available
Gemma 4 26B A4B	—	256K	$0.130	$0.400	Available
Gemma 4 12B	—	—	—	—	Available
Gemma 4 E4B	—	—	—	—	Available
Gemma 4 E2B	—	128K	$0.040	$0.080	Available
Gemma 4	—	—	—	—	Available
Gemma 3 4B	—	128K	$0.030	$0.080	Current

Model IDs

gemma-3-4b

google-gemma-3-4b

llamagate/gemma3-4b

Gemma 3 4B

API Endpoints

Section	Method	Endpoint
Capabilities	GET	/api/v1/models/google-gemma-3-4b
Pricing by Provider	GET	/api/v1/models/google-gemma-3-4b/pricing
Cost Calculator	GET	/api/v1/models/google-gemma-3-4b/pricing/calculate?input_tokens=1000000&output_tokens=500000
Cheapest Instances to Run It	GET	/api/v1/models/google-gemma-3-4b/instances
Versions	GET	/api/v1/models?family=gemma
Model IDs	GET	/api/v1/models/google-gemma-3-4b
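The endpoint paths above can be assembled programmatically. The sketch below only builds URLs. The host in `BASE_URL` is a placeholder because the page does not state one, and the helper name `model_endpoint` is hypothetical. Response formats and authentication are not documented here, so no request is sent.

```python
from urllib.parse import urlencode

# Placeholder host: the source lists only paths, not the API's base URL.
BASE_URL = "https://example.invalid"


def model_endpoint(model_id: str, resource: str = "", **query: int) -> str:
    """Build a GET URL for a model resource, e.g. 'pricing' or 'instances'."""
    path = f"/api/v1/models/{model_id}"
    if resource:
        path += f"/{resource}"
    url = BASE_URL.rstrip("/") + path
    if query:
        # Keyword arguments become the query string in the order given.
        url += "?" + urlencode(query)
    return url


if __name__ == "__main__":
    model = "google-gemma-3-4b"
    print(model_endpoint(model))
    print(model_endpoint(model, "pricing"))
    print(model_endpoint(model, "pricing/calculate",
                         input_tokens=1_000_000, output_tokens=500_000))
```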
