Step 3.7 Flash is StepFun's language model with a 256K context window, available from 2 providers, starting at $0.200 / 1M input and $1.15 / 1M output. A high-efficiency multimodal Mixture-of-Experts LLM combining a large language backbone with a vision encoder for native image and video understanding.
Specifications
Canonical IDstep-3-7-flash
TypeLanguage
StatusActive
CreatorStepFunStepFun
Providers
Context Window256K tokens
Input ModalitiesImageTextVideo
Output ModalitiesText
Reasoning Effortsdefault
Release Date · 6 days ago
Benchmarks
Intelligence Index
42.6
#52
Coding Index
37.1
#55
GPQA
0.8
#89
HLE
0.2
#62
IFBench
0.7
#70
Time to First Token
0.79s
#305
SciCode
0.4
#87
LCR
0.6
#66
TerminalBench Hard
0.4
#50
TAU2
1.0
#4
Output TPS

Capabilities

Input3/5
Text
Image
Audio·
Video
PDF·
Output1/5
Text
Image·
Audio·
Video·
Embedding·
Capabilities4/13
Reasoning
Adaptive Reasoning·
Function Calling
Parallel Function Calling·
Structured Outputs
Native JSON Schema
Web Search·
URL Context·
Computer Use·
Code Execution·
File Search·
Prompt Caching·
Assistant Prefill·

Pricing by Provider

ProviderStandard
Input
$ / 1M
Output
$ / 1M
Cache Read
$ / 1M
OpenRouter logo
OpenRouter
stepfun/step-3.7-flash
$0.200$1.15$0.040
Vercel AI Gateway logo
Vercel AI Gateway
stepfun/step-3.7-flash
$0.200$1.15$0.040

Cost Calculator

Preset:

Other models

ModelTierReleasedContextInput / 1MOutput / 1M
Step 3 VL 10B
Step 1X Edit 1.2
Step 1X Edit

Model IDs