Best Open Source LLMs — 2026 Rankings

Open Source LLM Leaderboard

A tiered ranking of 15 major open source models, compared across reasoning, coding, math, software engineering, and instruction-following benchmarks.

Last updated: 2026-02-18

S tier: Qwen 3.5 (397B), Kimi K2.5 (1T), GLM-5 (744B)

A tier: MiniMax M2.5 (230B), DeepSeek R1 (671B), Qwen 3 235B (235B), DeepSeek V3.2 (685B)

B tier: GPT-oss 120B (117B), DeepSeek V3 (671B), Mistral Large (675B), Nemotron Ultra 253B (253B), Nemotron Super 49B (49B)

C tier: Gemma 3 27B (27B), Nemotron Nano 30B (30B)

D tier: Llama 4 Maverick (400B)

Best Open Source Models by Task — Benchmark Rankings

Which open source LLM is best for coding, reasoning, or math? See how every model stacks up across key benchmarks.

Chart: Best Overall (MMLU), general knowledge across 57 subjects.

Chart: Best Overall Knowledge (MMLU-Pro), advanced knowledge across subjects.

Open Source Model Benchmark Scores

Complete benchmark results for every open source LLM listed here. Scores are sourced from official tech reports; N/A marks a score that was not reported.

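| Model | Developer | Params | Context | MMLU | MMLU-Pro | HumanEval | SWE-bench Verified | LiveCodeBench | AIME 2025 | GPQA Diamond | MATH-500 | Chatbot Arena | IFEval |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DeepSeek R1 | DeepSeek | 671B | 128K | 90.8 | 84.0 | 90.2 | N/A | 65.9 | 87.5 | 71.5 | 97.3 | 1398 | 83.3 |
| DeepSeek V3 | DeepSeek | 671B | 128K | 88.5 | 81.2 | N/A | 38.8 | 49.2 | N/A | 68.4 | 94.0 | 1359 | N/A |
| DeepSeek V3.2 | DeepSeek | 685B | 130K | 88.5 | 85.0 | N/A | 67.8 | 74.1 | 89.3 | 79.9 | N/A | 1421 | N/A |
| Gemma 3 27B | Google | 27B | 128K | N/A | 67.5 | N/A | N/A | 29.7 | N/A | 42.4 | 89.0 | 1365 | N/A |
| GLM-5 | Zhipu AI | 744B | 200K | 85.0 | 70.4 | 90.0 | 77.8 | 52.0 | 84.0 | 86.0 | 88.0 | 1451 | 88.0 |
| GPT-oss 120B | OpenAI | 117B | 128K | 90.0 | 90.0 | N/A | 62.4 | 60.0 | N/A | 80.9 | N/A | 1354 | N/A |
| Kimi K2.5 | Moonshot | 1T | 262K | 92.0 | 87.1 | 99.0 | 76.8 | 85.0 | 96.1 | 87.6 | 98.0 | 1447 | 94.0 |
| Llama 4 Maverick | Meta | 400B | 1M | 85.5 | 80.5 | 62.0 | N/A | 43.4 | N/A | 69.8 | N/A | 1328 | N/A |
| MiniMax M2.5 | MiniMax | 230B | 205K | 85.0 | 76.5 | 89.6 | 80.2 | 65.0 | 86.3 | 85.2 | N/A | N/A | 87.5 |
| Mistral Large | Mistral | 675B | 256K | 85.5 | N/A | 92.0 | N/A | 82.8 | 88.0 | 43.9 | 93.6 | 1416 | N/A |
| Nemotron Nano 30B | Nvidia | 30B | 1M | N/A | 78.1 | N/A | N/A | N/A | N/A | 78.1 | N/A | N/A | N/A |
| Nemotron Super 49B | Nvidia | 49B | 128K | N/A | 79.5 | N/A | N/A | 73.6 | 82.7 | 72.0 | 97.4 | N/A | 88.6 |
| Nemotron Ultra 253B | Nvidia | 253B | 128K | N/A | N/A | N/A | N/A | 66.3 | 72.5 | 76.0 | 97.0 | 1347 | 89.5 |
| Qwen 3 235B | Qwen | 235B | 262K | N/A | 84.4 | N/A | N/A | 74.1 | 92.3 | 81.1 | N/A | 1422 | 87.8 |
| Qwen 3.5 | Qwen | 397B | 262K | 88.5 | 87.8 | N/A | 76.4 | 83.6 | N/A | 88.4 | N/A | N/A | 92.6 |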
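
Ranking the models on a single benchmark means deciding what to do with the N/A entries. The sketch below is one way to do it, not the site's own implementation: it assumes the second score listed for each model is MMLU-Pro (following the benchmark order used in the head-to-head comparison), represents N/A as `None`, and leaves unreported models out of the ranking instead of treating them as zero. The names `MMLU_PRO` and `rank_by_score` are illustrative.

```python
from typing import Optional

# MMLU-Pro scores copied from the scores listed above.
# None stands in for "N/A" (score not reported). Assumption: the second
# score listed per model is MMLU-Pro.
MMLU_PRO: dict[str, Optional[float]] = {
    "DeepSeek R1": 84.0,
    "DeepSeek V3": 81.2,
    "DeepSeek V3.2": 85.0,
    "Gemma 3 27B": 67.5,
    "GLM-5": 70.4,
    "GPT-oss 120B": 90.0,
    "Kimi K2.5": 87.1,
    "Llama 4 Maverick": 80.5,
    "MiniMax M2.5": 76.5,
    "Mistral Large": None,
    "Nemotron Nano 30B": 78.1,
    "Nemotron Super 49B": 79.5,
    "Nemotron Ultra 253B": None,
    "Qwen 3 235B": 84.4,
    "Qwen 3.5": 87.8,
}


def rank_by_score(scores: dict[str, Optional[float]]) -> list[tuple[str, float]]:
    """Return (model, score) pairs sorted from highest to lowest score.

    Models with no reported score are left out rather than ranked last,
    so a missing value is never mistaken for a low one.
    """
    reported = [(model, s) for model, s in scores.items() if s is not None]
    return sorted(reported, key=lambda pair: pair[1], reverse=True)


ranking = rank_by_score(MMLU_PRO)
for position, (model, score) in enumerate(ranking, start=1):
    print(f"{position:>2}. {model:<20} {score:.1f}")
```

Dropping unreported scores keeps the ranking honest, but it also means a model's position is only comparable to the other models that reported that benchmark.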

Compare Open Source LLMs Head-to-Head

A head-to-head comparison shows how two open source models stack up across all benchmarks.

| Benchmark | GLM-5 | Kimi K2.5 |
|---|---|---|
| MMLU | 85.0 | 92.0 |
| MMLU-Pro | 70.4 | 87.1 |
| HumanEval | 90.0 | 99.0 |
| SWE-bench Verified | 77.8 | 76.8 |
| LiveCodeBench | 52.0 | 85.0 |
| AIME 2025 | 84.0 | 96.1 |
| GPQA Diamond | 86.0 | 87.6 |
| MATH-500 | 88.0 | 98.0 |
| Chatbot Arena | 1451 | 1447 |
| IFEval | 88.0 | 94.0 |
| Benchmarks won | 2 | 8 |
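
The "Benchmarks won" tally can be reproduced from the two models' scores. The following is a minimal sketch under stated assumptions: a higher score wins on every benchmark (including the Chatbot Arena rating), ties and missing scores count for neither side, and the function name `count_wins` is hypothetical, not taken from the page.

```python
from typing import Optional

Scores = dict[str, Optional[float]]

# Scores copied from the GLM-5 vs Kimi K2.5 comparison above.
GLM_5: Scores = {
    "MMLU": 85.0, "MMLU-Pro": 70.4, "HumanEval": 90.0,
    "SWE-bench Verified": 77.8, "LiveCodeBench": 52.0, "AIME 2025": 84.0,
    "GPQA Diamond": 86.0, "MATH-500": 88.0, "Chatbot Arena": 1451,
    "IFEval": 88.0,
}
KIMI_K2_5: Scores = {
    "MMLU": 92.0, "MMLU-Pro": 87.1, "HumanEval": 99.0,
    "SWE-bench Verified": 76.8, "LiveCodeBench": 85.0, "AIME 2025": 96.1,
    "GPQA Diamond": 87.6, "MATH-500": 98.0, "Chatbot Arena": 1447,
    "IFEval": 94.0,
}


def count_wins(a: Scores, b: Scores) -> tuple[int, int]:
    """Count the benchmarks each model wins, assuming higher is better.

    A benchmark is skipped when either model has no score for it,
    and a tie adds to neither count.
    """
    wins_a = wins_b = 0
    for benchmark, score_a in a.items():
        score_b = b.get(benchmark)
        if score_a is None or score_b is None:
            continue  # cannot compare without both scores
        if score_a > score_b:
            wins_a += 1
        elif score_b > score_a:
            wins_b += 1
    return wins_a, wins_b


glm_wins, kimi_wins = count_wins(GLM_5, KIMI_K2_5)
print(f"GLM-5 {glm_wins} vs Kimi K2.5 {kimi_wins}")
```

With these inputs GLM-5 leads only on SWE-bench Verified and Chatbot Arena, which matches the 2 vs 8 tally shown above. A raw win count ignores margins, so a 1-point lead counts the same as a 33-point lead.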

Try These Open Source Models in Onyx

Onyx is the open-source AI platform that lets you connect any of these open source LLMs to your team's docs, apps, and people.