Best Open Source LLMs — 2026 Rankings
Open Source LLM Leaderboard
The definitive ranking of every major open source model — compared across reasoning, coding, math, software engineering, and instruction following benchmarks.
Last updated: 2026-02-18
Tier S: Qwen 3.5 (397B), Kimi K2.5 (1T), GLM-5 (744B)
Tier A: MiniMax M2.5 (230B), DeepSeek R1 (671B), Qwen 3 235B (235B), DeepSeek V3.2 (685B)
Tier B: GPT-oss 120B (117B), DeepSeek V3 (671B), Mistral Large (675B), Nemotron Ultra 253B (253B), Nemotron Super 49B (49B)
Tier C: Gemma 3 27B (27B), Nemotron Nano 30B (30B)
Tier D: Llama 4 Maverick (400B)
Best Open Source Models by Task — Benchmark Rankings
Which open source LLM is best for coding, reasoning, or math? See how every model stacks up across key benchmarks.
[Chart: Best Overall (MMLU). General knowledge across 57 subjects.]
[Chart: Best Overall Knowledge (MMLU-Pro). Advanced knowledge across subjects.]
Open Source Model Benchmark Scores
Complete benchmark results for every open source LLM. Scores sourced from official tech reports.
Model (Developer) | Params | Context | MMLU | MMLU-Pro | HumanEval | SWE-bench Verified | LiveCodeBench | AIME 2025 | GPQA Diamond | MATH-500 | Chatbot Arena | IFEval
DeepSeek R1 (DeepSeek) | 671B | 128K | 90.8 | 84.0 | 90.2 | N/A | 65.9 | 87.5 | 71.5 | 97.3 | 1398 | 83.3
DeepSeek V3 (DeepSeek) | 671B | 128K | 88.5 | 81.2 | N/A | 38.8 | 49.2 | N/A | 68.4 | 94.0 | 1359 | N/A
DeepSeek V3.2 (DeepSeek) | 685B | 130K | 88.5 | 85.0 | N/A | 67.8 | 74.1 | 89.3 | 79.9 | N/A | 1421 | N/A
Gemma 3 27B | 27B | 128K | N/A | 67.5 | N/A | N/A | 29.7 | N/A | 42.4 | 89.0 | 1365 | N/A
GLM-5 (Zhipu AI) | 744B | 200K | 85.0 | 70.4 | 90.0 | 77.8 | 52.0 | 84.0 | 86.0 | 88.0 | 1451 | 88.0
GPT-oss 120B (OpenAI) | 117B | 128K | 90.0 | 90.0 | N/A | 62.4 | 60.0 | N/A | 80.9 | N/A | 1354 | N/A
Kimi K2.5 (Moonshot) | 1T | 262K | 92.0 | 87.1 | 99.0 | 76.8 | 85.0 | 96.1 | 87.6 | 98.0 | 1447 | 94.0
Llama 4 Maverick (Meta) | 400B | 1M | 85.5 | 80.5 | 62.0 | N/A | 43.4 | N/A | 69.8 | N/A | 1328 | N/A
MiniMax M2.5 (MiniMax) | 230B | 205K | 85.0 | 76.5 | 89.6 | 80.2 | 65.0 | 86.3 | 85.2 | N/A | N/A | 87.5
Mistral Large (Mistral) | 675B | 256K | 85.5 | N/A | 92.0 | N/A | 82.8 | 88.0 | 43.9 | 93.6 | 1416 | N/A
Nemotron Nano 30B (Nvidia) | 30B | 1M | N/A | 78.1 | N/A | N/A | N/A | N/A | 78.1 | N/A | N/A | N/A
Nemotron Super 49B (Nvidia) | 49B | 128K | N/A | 79.5 | N/A | N/A | 73.6 | 82.7 | 72.0 | 97.4 | N/A | 88.6
Nemotron Ultra 253B (Nvidia) | 253B | 128K | N/A | N/A | N/A | N/A | 66.3 | 72.5 | 76.0 | 97.0 | 1347 | 89.5
Qwen 3 235B (Qwen) | 235B | 262K | N/A | 84.4 | N/A | N/A | 74.1 | 92.3 | 81.1 | N/A | 1422 | 87.8
Qwen 3.5 (Qwen) | 397B | 262K | 88.5 | 87.8 | N/A | 76.4 | 83.6 | N/A | 88.4 | N/A | N/A | 92.6
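Ranking models on a single benchmark means deciding what to do with N/A entries. The sketch below is one way to do it, not the leaderboard's own sorting logic: it sorts a handful of models by score, highest first, and places models with no reported score at the end. It assumes the fourth score column is SWE-bench Verified, which matches the GLM-5 and Kimi K2.5 values in the head-to-head comparison. The function name `rank_by_score` is hypothetical.

```python
# Scores copied from the table, assuming the fourth score column
# is SWE-bench Verified. "N/A" means no score was reported.
SWE_BENCH_VERIFIED = {
    "GLM-5": "77.8",
    "Kimi K2.5": "76.8",
    "MiniMax M2.5": "80.2",
    "Qwen 3.5": "76.4",
    "Gemma 3 27B": "N/A",
}


def rank_by_score(scores):
    """Return model names sorted by score (highest first), N/A entries last."""
    reported = [(name, float(s)) for name, s in scores.items() if s != "N/A"]
    missing = [name for name, s in scores.items() if s == "N/A"]
    reported.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in reported] + missing


print(rank_by_score(SWE_BENCH_VERIFIED))
# ['MiniMax M2.5', 'GLM-5', 'Kimi K2.5', 'Qwen 3.5', 'Gemma 3 27B']
```

Putting N/A last avoids treating a missing score as zero, which would make an untested model look like the worst performer.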
Compare Open Source LLMs Head-to-Head
Select two open source models to see how they stack up across all benchmarks.
Benchmark | Model A: GLM-5 | Model B: Kimi K2.5
MMLU | 85.0 | 92.0
MMLU-Pro | 70.4 | 87.1
HumanEval | 90.0 | 99.0
SWE-bench Verified | 77.8 | 76.8
LiveCodeBench | 52.0 | 85.0
AIME 2025 | 84.0 | 96.1
GPQA Diamond | 86.0 | 87.6
MATH-500 | 88.0 | 98.0
Chatbot Arena | 1451 | 1447
IFEval | 88.0 | 94.0
Benchmarks won | 2 | 8
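The "Benchmarks won" tally can be reproduced from the ten scores in the GLM-5 vs Kimi K2.5 comparison. The sketch below assumes that a higher score wins on every benchmark, that ties count for neither model, and that a benchmark is skipped when either score is missing. The page does not state these rules, and `count_wins` is a hypothetical helper.

```python
# Scores copied from the comparison, as (GLM-5, Kimi K2.5) pairs.
HEAD_TO_HEAD = {
    "MMLU": (85.0, 92.0),
    "MMLU-Pro": (70.4, 87.1),
    "HumanEval": (90.0, 99.0),
    "SWE-bench Verified": (77.8, 76.8),
    "LiveCodeBench": (52.0, 85.0),
    "AIME 2025": (84.0, 96.1),
    "GPQA Diamond": (86.0, 87.6),
    "MATH-500": (88.0, 98.0),
    "Chatbot Arena": (1451, 1447),
    "IFEval": (88.0, 94.0),
}


def count_wins(scores):
    """Count benchmarks won by each model; None marks a missing score."""
    wins_a = wins_b = 0
    for a, b in scores.values():
        if a is None or b is None:
            continue  # assumed rule: skip benchmarks without both scores
        if a > b:
            wins_a += 1
        elif b > a:
            wins_b += 1
    return wins_a, wins_b


print(count_wins(HEAD_TO_HEAD))  # (2, 8)
```

GLM-5 leads only on SWE-bench Verified and Chatbot Arena, which gives the 2 vs 8 split.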
Try These Open Source Models in Onyx
Onyx is the open-source AI platform that lets you connect any of these open source LLMs to your team's docs, apps, and people.