Best LLMs — 2026 Rankings
LLM Leaderboard
A ranking of the major open- and closed-source LLMs, compared across reasoning, coding, math, agentic, software engineering, and chat benchmarks.
Last updated: 2026-02-17
Tiers (parameter count in parentheses):
S: Claude Opus 4.6 (N/A), GPT-5.2 (N/A), GLM-5 (744B), Kimi K2.5 (1T)
A: Claude Sonnet 4.6 (N/A), Gemini 3 Pro (N/A), Qwen 3.5 (397B), DeepSeek R1 (671B), Mistral Large (675B)
B: GPT-oss 120B (117B), Nemotron Ultra 253B (253B)
C: Grok 3 (N/A), DeepSeek V3 (671B), Llama 4 Maverick (400B)
D: Gemma 3 27B (27B)
Best LLMs by Task — Benchmark Rankings
Which LLM is best for coding, reasoning, or agentic tasks? See how every model stacks up across key benchmarks.
Best Overall: general knowledge across 57 subjects (MMLU)
Best Multilingual: multilingual Q&A across languages (MMMLU)
Best Visual Reasoning: visual reasoning across disciplines (MMMU-Pro)
Hardest Exam: expert-level multidisciplinary reasoning (Humanity's Last Exam)
LLM Benchmark Scores & Pricing
Complete benchmark results and pricing for every major LLM.
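Model (provider) | Params | Context | Input price | Output price | MMLU | MMMLU | MMMU-Pro | HLE | SWE-bench Verified | Terminal-Bench 2.0 | (unlabeled) | (unlabeled) | AIME 2025 | (unlabeled) | (unlabeled) | (unlabeled) | GPQA Diamond | ARC-AGI-2 | (unlabeled) | τ2-bench | OSWorld | BrowseComp |
Claude Opus 4.6 Anthropic | N/A | 200K | $15.00 | $75.00 | 91.0 | 91.1 | 77.3 | 53.0 | 80.8 | 65.4 | 95.0 | 76.0 | 100.0 | 97.6 | N/A | 94.0 | 91.3 | 68.8 | 82.0 | 91.9 | 72.7 | 84.0 |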
Claude Sonnet 4.6 Anthropic | N/A | 200K | $3.00 | $15.00 | N/A | 89.3 | 75.6 | 49.0 | 79.6 | 59.1 | N/A | N/A | 52.8 | 97.8 | N/A | N/A | 89.9 | 58.3 | 79.1 | 91.7 | 72.5 | 74.7 |
DeepSeek R1 DeepSeek | 671B | 128K | $0.28 | $0.42 | 90.8 | N/A | N/A | N/A | N/A | N/A | 90.2 | 65.9 | 87.5 | 97.3 | 1398 | 83.3 | 71.5 | N/A | 84.0 | N/A | N/A | N/A |
DeepSeek V3 DeepSeek | 671B | 128K | $0.28 | $1.10 | 88.5 | N/A | N/A | N/A | 38.8 | N/A | N/A | 49.2 | N/A | 94.0 | 1359 | N/A | 68.4 | N/A | 81.2 | N/A | N/A | N/A |
Gemini 3 Pro | N/A | 1M | $1.25 | $10.00 | 91.8 | 91.8 | 81.0 | 45.8 | 78.0 | 56.2 | 93.0 | 81.3 | 100.0 | 94.0 | 1492 | 85.0 | 91.9 | 31.1 | 85.0 | 85.3 | N/A | 59.2 |
Gemma 3 27B | 27B | 128K | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 29.7 | N/A | 89.0 | 1365 | N/A | 42.4 | N/A | 67.5 | N/A | N/A | N/A |
GLM-5 Zhipu AI | 744B | 200K | N/A | N/A | 85.0 | N/A | N/A | 50.4 | 77.8 | 56.2 | 90.0 | 52.0 | 84.0 | 88.0 | 1451 | 88.0 | 86.0 | N/A | 70.4 | 89.7 | N/A | 75.9 |
GPT-5.2 OpenAI | N/A | 128K | $2.00 | $8.00 | N/A | 89.6 | 80.4 | 50.0 | 80.0 | 64.7 | N/A | N/A | 100.0 | N/A | N/A | N/A | 93.2 | 54.2 | N/A | 82.0 | 38.2 | 77.9 |
GPT-oss 120B OpenAI | 117B | 128K | N/A | N/A | 90.0 | N/A | N/A | N/A | 62.4 | N/A | N/A | 60.0 | N/A | N/A | 1354 | N/A | 80.9 | N/A | 90.0 | N/A | N/A | N/A |
Grok 3 xAI | N/A | 131K | $3.00 | $15.00 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 93.3 | N/A | 1400 | N/A | 84.6 | N/A | N/A | N/A | N/A | N/A |
Kimi K2.5 Moonshot | 1T | 262K | N/A | N/A | 92.0 | N/A | 78.5 | N/A | 76.8 | N/A | 99.0 | 85.0 | 96.1 | 98.0 | 1447 | 94.0 | 87.6 | N/A | 87.1 | N/A | N/A | N/A |
Llama 4 Maverick Meta | 400B | 1M | N/A | N/A | 85.5 | 84.6 | N/A | N/A | N/A | N/A | 62.0 | 43.4 | N/A | N/A | 1328 | N/A | 69.8 | N/A | 80.5 | N/A | N/A | N/A |
Mistral Large Mistral | 675B | 256K | N/A | N/A | 85.5 | N/A | N/A | N/A | N/A | N/A | 92.0 | 82.8 | 88.0 | 93.6 | 1416 | N/A | 43.9 | N/A | N/A | N/A | N/A | N/A |
Nemotron Ultra 253B Nvidia | 253B | 128K | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 66.3 | 72.5 | 97.0 | 1347 | 89.5 | 76.0 | N/A | N/A | N/A | N/A | N/A |
Qwen 3.5 Qwen | 397B | 262K | N/A | N/A | 88.5 | 88.5 | 79.0 | 28.7 | 76.4 | 52.5 | N/A | 83.6 | N/A | N/A | N/A | 92.6 | 88.4 | N/A | 87.8 | 86.7 | 62.2 | 78.6 |
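Sorting and ranking by a column can also be done offline on the pipe-delimited rows above. The following is a minimal Python sketch, not the page's own implementation. The helper names (`parse_row`, `to_number`, `rank_by_column`) are hypothetical, only four rows are copied in to keep it short, and columns are addressed by position. Index 9 (counting the model name as 0) is assumed to be SWE-bench Verified because its values match the head-to-head section.

```python
# Four rows copied verbatim from the table above (a subset, for brevity).
ROWS = """\
Claude Opus 4.6 Anthropic | N/A | 200K | $15.00 | $75.00 | 91.0 | 91.1 | 77.3 | 53.0 | 80.8 | 65.4 | 95.0 | 76.0 | 100.0 | 97.6 | N/A | 94.0 | 91.3 | 68.8 | 82.0 | 91.9 | 72.7 | 84.0 |
DeepSeek R1 DeepSeek | 671B | 128K | $0.28 | $0.42 | 90.8 | N/A | N/A | N/A | N/A | N/A | 90.2 | 65.9 | 87.5 | 97.3 | 1398 | 83.3 | 71.5 | N/A | 84.0 | N/A | N/A | N/A |
DeepSeek V3 DeepSeek | 671B | 128K | $0.28 | $1.10 | 88.5 | N/A | N/A | N/A | 38.8 | N/A | N/A | 49.2 | N/A | 94.0 | 1359 | N/A | 68.4 | N/A | 81.2 | N/A | N/A | N/A |
GPT-5.2 OpenAI | N/A | 128K | $2.00 | $8.00 | N/A | 89.6 | 80.4 | 50.0 | 80.0 | 64.7 | N/A | N/A | 100.0 | N/A | N/A | N/A | 93.2 | 54.2 | N/A | 82.0 | 38.2 | 77.9 |
"""


def parse_row(line):
    """Split one pipe-delimited row into stripped cells, dropping the trailing pipe."""
    return [cell.strip() for cell in line.strip().rstrip("|").split("|")]


def to_number(cell):
    """Convert '91.0' or '$15.00' to float; return None for 'N/A' or non-numeric cells."""
    try:
        return float(cell.lstrip("$"))
    except ValueError:
        return None


def rank_by_column(rows, col):
    """Return (model, score) pairs sorted by score descending, with missing scores last."""
    scored = [(row[0], to_number(row[col])) for row in rows]
    return sorted(scored, key=lambda pair: (pair[1] is None, -(pair[1] or 0.0)))


TABLE = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

if __name__ == "__main__":
    # Column 9 is assumed to hold SWE-bench Verified scores (see lead-in).
    for model, score in rank_by_column(TABLE, 9):
        print(model, score)
```

Sorting missing values last keeps models without a reported score from being ranked as if they had scored zero.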
Compare LLMs Head-to-Head
Select two models to see how they stack up across all benchmarks.
Model A: GPT-5.2
Model B: Claude Opus 4.6
MMMLU: 89.6 vs 91.1
MMMU-Pro: 80.4 vs 77.3
HLE: 50.0 vs 53.0
SWE-bench Verified: 80.0 vs 80.8
Terminal-Bench 2.0: 64.7 vs 65.4
AIME 2025: 100.0 vs 100.0
GPQA Diamond: 93.2 vs 91.3
ARC-AGI-2: 54.2 vs 68.8
τ2-bench: 82.0 vs 91.9
OSWorld: 38.2 vs 72.7
BrowseComp: 77.9 vs 84.0
Benchmarks won: 2 vs 8
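The "Benchmarks won" tally can be reproduced from the eleven score pairs listed above. This is a short Python sketch under one assumption: a benchmark counts as a win for the higher score, and an equal score (AIME 2025 here) counts for neither side. The page does not state its tie rule, and the function name `tally_wins` is hypothetical.

```python
# Scores copied from the comparison above, as (GPT-5.2, Claude Opus 4.6).
SCORES = {
    "MMMLU": (89.6, 91.1),
    "MMMU-Pro": (80.4, 77.3),
    "HLE": (50.0, 53.0),
    "SWE-bench Verified": (80.0, 80.8),
    "Terminal-Bench 2.0": (64.7, 65.4),
    "AIME 2025": (100.0, 100.0),
    "GPQA Diamond": (93.2, 91.3),
    "ARC-AGI-2": (54.2, 68.8),
    "τ2-bench": (82.0, 91.9),
    "OSWorld": (38.2, 72.7),
    "BrowseComp": (77.9, 84.0),
}


def tally_wins(scores):
    """Return (wins_a, wins_b, ties); the higher score wins, equal scores tie (assumed rule)."""
    wins_a = sum(1 for a, b in scores.values() if a > b)
    wins_b = sum(1 for a, b in scores.values() if b > a)
    ties = len(scores) - wins_a - wins_b
    return wins_a, wins_b, ties


if __name__ == "__main__":
    print(tally_wins(SCORES))  # prints (2, 8, 1)
```

Under that tie rule the result matches the 2 vs 8 shown above, with the one remaining benchmark tied.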
Try These Models in Onyx
Onyx is the open-source AI platform that lets you connect any of these LLMs to your team's docs, apps, and people.