Leaderboards

Rankings computed from verified data. Updated when underlying data changes.

Biggest Price Collapse

Largest percentage reduction in input price from the GPT-3 baseline of $60 / 1M tokens

1 Gemini 1.5 Flash ↓99.9% from $60
2 Llama 3.3 70B ↓99.8% from $60
3 GPT-4o mini ↓99.8% from $60
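The reduction figures above are straightforward arithmetic against the $60 baseline. A minimal sketch (prices per 1M input tokens taken from the tables on this page):

```python
# Percentage price collapse relative to the GPT-3 baseline of $60 / 1M tokens.
BASELINE = 60.0  # GPT-3 input price, $ per 1M tokens

def price_collapse(current_price: float) -> float:
    """Raw percentage reduction from the $60 baseline (unrounded)."""
    return (BASELINE - current_price) / BASELINE * 100

# Gemini 1.5 Flash at $0.075 / 1M → about 99.9% when rounded to one decimal
print(price_collapse(0.075))
# Llama 3.3 70B at $0.10 / 1M → about 99.8%
print(price_collapse(0.10))
```

The leaderboard rounds the raw percentage to one decimal place for display.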

Largest Context Window

Models ranked by maximum context window size in tokens

1 Gemini 1.5 Pro 2,000,000 tokens
2 Gemini 1.5 Flash 1,000,000 tokens
3 Gemini 2.0 Flash 1,000,000 tokens

Lowest Input Price

Active models ranked by input cost per million tokens

1 Gemini 1.5 Flash $0.075 / 1M tokens
2 Llama 3.3 70B $0.10 / 1M tokens
3 GPT-4o mini $0.15 / 1M tokens

Highest Benchmark

Models ranked by MMLU score — multitask knowledge and reasoning across 57 subjects

1 o1 92.3% MMLU
2 DeepSeek R1 90.8% MMLU
3 GPT-4o 88.7% MMLU

Longest Tracked

Models we have been tracking the longest on sourc.dev

1 GPT-3 (davinci-002) Since 2026-03-24 — 0 days
2 GPT-3.5 Turbo Since 2026-03-24 — 0 days
3 GPT-4 Since 2026-03-24 — 0 days

Most Price Reductions

Models with the most recorded price decreases over time

Pipeline data pending

EU Data Residency Leaders

Top-performing models available with EU data residency

1 Llama 3.1 405B EU ✓ · 88.6% MMLU
2 Qwen 2.5 72B EU ✓ · 86.1% MMLU
3 Llama 3.3 70B EU ✓ · 86.0% MMLU

Open Weights Leaders

Top-performing open-source / open-weights models by MMLU

1 Qwen 2.5 72B ◆ OPEN · 86.1% MMLU
2 Mixtral 8x7B ◆ OPEN · 70.6% MMLU
3 Mistral 7B ◆ OPEN · 62.5% MMLU

Best Value

Highest benchmark score per dollar of input cost

1 Llama 3.3 70B 860.0 score/$1
2 GPT-4o mini 546.7 score/$1
3 DeepSeek V3 327.8 score/$1
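The value metric is simply the MMLU score divided by the input price, yielding benchmark points per dollar. A sketch using figures from the tables above:

```python
def value_score(mmlu_pct: float, input_price: float) -> float:
    """Benchmark points per dollar: MMLU score / input price ($ per 1M tokens)."""
    return mmlu_pct / input_price

# Llama 3.3 70B: 86.0% MMLU at $0.10 / 1M input tokens → ~860 score/$1
print(value_score(86.0, 0.10))
# GPT-4o mini: 82.0% MMLU at $0.15 / 1M input tokens → ~546.7 score/$1
print(value_score(82.0, 0.15))
```

A model can top this list either by scoring well or by being cheap; the ratio rewards both at once.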

Most Context Per Dollar

Largest context window relative to input token cost

1 Gemini 1.5 Flash 13,333,333 tokens/$1
2 Gemini 2.0 Flash 6,666,667 tokens/$1
3 Gemini 1.5 Pro 1,600,000 tokens/$1
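This ratio divides the context window size by the input price, giving the number of window tokens obtainable per dollar. A sketch with figures from the tables above:

```python
def context_per_dollar(context_tokens: int, input_price: float) -> float:
    """Context window tokens per dollar of input cost ($ per 1M tokens)."""
    return context_tokens / input_price

# Gemini 1.5 Pro: 2,000,000-token window at $1.25 / 1M input tokens
print(f"{context_per_dollar(2_000_000, 1.25):,.0f} tokens/$1")  # 1,600,000 tokens/$1
# Gemini 1.5 Flash: 1,000,000-token window at $0.075 / 1M input tokens → ~13.3M tokens/$1
print(f"{context_per_dollar(1_000_000, 0.075):,.0f} tokens/$1")
```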