Name: Swin Large
Brand: Microsoft

Swin Large is Microsoft's large-scale Swin Transformer image classification model, listed in this catalog under the image to text type. It uses shifted window attention for high-accuracy visual recognition.
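The shifted window idea mentioned above can be illustrated with a small, self-contained sketch. It is not the model's implementation: the helper name `window_ids`, the 8x8 grid, and the window size of 4 are assumptions chosen for readability. The sketch labels each position of a feature map with the window it falls into, first with a regular partition and then after a cyclic shift of half a window.

```python
# Illustrative sketch only; names and sizes are assumptions, not the model's code.

def window_ids(height, width, window, shift=0):
    """Return a grid where each cell holds the id of the window it belongs to.

    With shift > 0 the coordinates are cyclically shifted before partitioning,
    so positions that sat in different windows under the regular partition
    can end up sharing a window.
    """
    per_row = width // window  # number of windows along the width
    grid = []
    for r in range(height):
        row = []
        for c in range(width):
            # Cyclic shift: wrap coordinates around the feature-map edges.
            sr = (r - shift) % height
            sc = (c - shift) % width
            row.append((sr // window) * per_row + (sc // window))
        grid.append(row)
    return grid

regular = window_ids(8, 8, window=4)           # plain partition into 4x4 windows
shifted = window_ids(8, 8, window=4, shift=2)  # partition after shifting by window // 2

for name, grid in (("regular", regular), ("shifted", shifted)):
    print(name)
    for row in grid:
        print(" ".join(str(v) for v in row))
```

In the regular partition, the neighbouring cells at row 3, columns 3 and 4 fall into different windows, so attention restricted to windows cannot connect them. In the shifted partition they share a window, which is how alternating the two layouts lets information cross window borders.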

Specifications
Canonical ID	`microsoft-swin-large`
Type	Image to Text
Status	Active
Creator	Microsoft
Input Modalities	Image
Output Modalities	Text

Capabilities

Input (1/5)
Text	No
Image	Yes
Audio	No
Video	No
PDF	No

Output (1/5)
Text	Yes
Image	No
Audio	No
Video	No
Embedding	No

Capabilities (0/13)
Reasoning	No
Adaptive Reasoning	No
Function Calling	No
Parallel Function Calling	No
Structured Outputs	No
Native JSON Schema	No
Web Search	No
URL Context	No
Computer Use	No
Code Execution	No
File Search	No
Prompt Caching	No
Assistant Prefill	No

Versions

Version	Released	Context	Input / 1M	Output / 1M	Status
Swin Large	—	—	—	—	Current
Swin Base 4 12 384	—	—	—	—	Available
Swin S3 Base	—	—	—	—	Available
Swin S3 Small	—	—	—	—	Available
Swin S3 Tiny	—	—	—	—	Available
Swin Small	—	—	—	—	Available
Swin Tiny	—	—	—	—	Available

Model IDs

amazon_sagemaker/tensorflow-ic-swin-large-patch4-window12-384
amazon_sagemaker/tensorflow-ic-swin-large-patch4-window7-224
microsoft-swin-large
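
Two of the IDs above end in a `patch<P>-window<W>-<R>` suffix. As a minimal sketch, assuming that suffix follows the usual Swin naming (patch size in pixels, window size in patches, input resolution in pixels), the following hypothetical helper `parse_swin_id` extracts the three numbers and derives how many patches and first-stage windows fit along one side of the image. The helper and its field names are illustrative and not part of any catalog API.

```python
import re

# Assumed naming pattern, read off the IDs listed above:
# "...-patch<P>-window<W>-<R>" at the end of the ID.
_SUFFIX = re.compile(r"patch(\d+)-window(\d+)-(\d+)$")

def parse_swin_id(model_id):
    """Return the sizes encoded in a model ID, or None if the suffix is absent."""
    match = _SUFFIX.search(model_id)
    if match is None:
        return None
    patch, window, resolution = (int(g) for g in match.groups())
    tokens_per_side = resolution // patch         # patches along one image side
    windows_per_side = tokens_per_side // window  # first-stage windows per side
    return {
        "patch": patch,
        "window": window,
        "resolution": resolution,
        "tokens_per_side": tokens_per_side,
        "windows_per_side": windows_per_side,
    }

MODEL_IDS = (
    "amazon_sagemaker/tensorflow-ic-swin-large-patch4-window12-384",
    "amazon_sagemaker/tensorflow-ic-swin-large-patch4-window7-224",
    "microsoft-swin-large",
)

for model_id in MODEL_IDS:
    print(model_id, parse_swin_id(model_id))
```

Under these assumptions the 384-pixel variant yields 96 patches per side and the 224-pixel variant yields 56, and both divide into 8 windows per side. The canonical ID carries no suffix, so the helper returns None for it.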