Multimodal Embedding


Multimodal Embedding is Google's embedding model with a 2K-token context window, available from 2 providers, starting at $0.800 per 1M input tokens; output pricing is not applicable (N/A). It is a multimodal embedding model that encodes both text and images into a shared vector space for cross-modal retrieval and similarity tasks.
Spec
Canonical ID: google-multimodal-embedding
Type: Embedding
Status: Active
Creator: Google
Providers: Google Gemini, Google Vertex AI
Context Window: 2K tokens
Input Modalities: Text
Output Modalities: Embedding
Embedding Dimensions: 768
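
Similarity in a shared vector space is commonly measured with cosine similarity. The following is a minimal sketch of that idea, not the provider's API: the function names `cosine_similarity` and `rank_by_similarity` are hypothetical, and the vectors are assumed to have already been obtained from the model as plain lists of floats (768 values each, per the spec above).

```python
import math

# Dimensionality listed in the spec above; used only as a sanity check.
EMBEDDING_DIM = 768


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return (index, score) pairs for candidates, best match first."""
    scored = [(i, cosine_similarity(query, c)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Placeholder vectors standing in for real embeddings (assumption).
    query = [1.0] + [0.0] * (EMBEDDING_DIM - 1)
    close = [0.9, 0.1] + [0.0] * (EMBEDDING_DIM - 2)
    far = [0.0, 1.0] + [0.0] * (EMBEDDING_DIM - 2)
    for index, score in rank_by_similarity(query, [far, close]):
        print(index, round(score, 4))
```

For cross-modal retrieval, the query vector would come from one modality (for example, a text query) and the candidates from another (for example, images); the ranking step is the same either way.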

Capabilities

Input (1 of 5 supported): Text
Input not supported: Image, Audio, Video, PDF

Output (1 of 5 supported): Embedding
Output not supported: Text, Image, Audio, Video

Capabilities (0 of 13 supported): none
Capabilities not supported: Reasoning, Adaptive Reasoning, Function Calling, Parallel Function Calling, Structured Outputs, Native JSON Schema, Web Search, URL Context, Computer Use, Code Execution, File Search, Prompt Caching, Assistant Prefill

Pricing by Provider

Provider            Model ID               Standard Input ($ / 1M tokens)
Google Gemini       multimodalembedding    $0.800
Google Vertex AI    multimodalembedding    $0.800
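
As a rough illustration of the listed rate, the sketch below estimates input cost from a token count. It assumes billing is linear in input tokens at $0.800 per 1M, as listed above; the function name `estimate_input_cost` is hypothetical, and actual provider billing may use different units or tiers.

```python
from decimal import Decimal

# Standard input price listed above for both providers (USD per 1M tokens).
INPUT_PRICE_PER_MILLION_USD = Decimal("0.800")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_input_cost(input_tokens, price_per_million=INPUT_PRICE_PER_MILLION_USD):
    """Estimated USD cost for a number of input tokens (assumes linear pricing)."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return Decimal(input_tokens) * price_per_million / TOKENS_PER_MILLION


if __name__ == "__main__":
    for tokens in (250_000, 1_000_000, 5_000_000):
        print(tokens, "tokens ->", estimate_input_cost(tokens), "USD")
```

Under these assumptions, 250,000 input tokens come to $0.20 and 5,000,000 input tokens come to $4.00. `Decimal` is used instead of `float` so that currency amounts do not pick up binary rounding error.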

Versions

Version                         Released   Context   Input / 1M   Output / 1M   Status
Text Embedding 005              -          2K        $0.025       $0.000        Available
Embed 4                         -          128K      $0.120       $0.470        Available
Embed 4                         -          128K      $0.120       $0.000        Available
NV Embed QA 4                   -          -         $0.100       $0.100        Available
Text Embedding 4                -          2K        $0.100       $0.000        Deprecated
Voyage 4                        -          32K       $0.060       $0.000        Available
Voyage 4 Large                  -          32K       $0.120       $0.000        Available
Voyage 4 Lite                   -          32K       $0.020       $0.000        Available
Text Embedding 3.7 Large Exp    -          8K        $0.100       $0.000        Available
Voyage 3.5                      -          32K       $0.060       $0.000        Available
Multimodal Embedding            -          2K        $0.800       $0.000        Current

Model IDs