Self-Hosted LLMs — 2026 Rankings

Self-Hosted LLM Leaderboard

The definitive ranking of self-hostable LLMs for enterprise — compared across quality, speed, hardware requirements, and cost. Find the best open-weight model for your infrastructure.

Last updated: 2026-02-23

S Tier: Kimi K2.5 (1T), GLM-5 (745B), Qwen 3.5 (397B)

A Tier: DeepSeek R1 (671B), Mistral Large 3 (675B), GPT-oss 120B (120B), DeepSeek V3 (671B), Step-3.5 Flash (196B), MiMo-V2-Flash (309B)

B Tier: Llama 4 Maverick (400B), Nemotron Ultra 253B (253B), Qwen3-235B-A22B (235B), Hunyuan 2.0 (406B), GPT-oss 20B (20B), Llama 4 Scout (109B)

C Tier: Llama 3.3 70B (70B), DS-R1-Distill-Llama-70B (70B), Qwen 2.5-72B (72B), Gemma 3 27B (27B), DS-R1-Distill-Qwen-32B (32B), Command R+ (104B), Qwen2.5-Coder-32B (32B)

D Tier: Mistral Small 3.1 (24B), Phi-4 (14B), Llama 3.1-8B (8B), Qwen3-30B-A3B (30B), Gemma 3 12B (12B), DS-R1-Distill-Qwen-14B (14B), DS-R1-Distill-Qwen-7B (7B), Phi-4-mini (3.8B)

Best Self-Hosted LLMs by Task — Benchmark Rankings

Which self-hosted model is best for coding, reasoning, or agentic tasks? See how every open-weight model stacks up across four benchmark categories:

Best Advanced Knowledge: advanced knowledge with the harder 10-option question format (MMLU-Pro)

Best in Graduate Reasoning: PhD-level science reasoning (GPQA Diamond)

Best at Instruction Following: instruction-following accuracy (IFEval)

Chatbot Arena Rankings: crowdsourced Elo from human preference votes (LMArena)

Self-Hosted LLM Benchmark Scores & Hardware Requirements

Complete benchmark results, VRAM requirements (4-bit quantized and FP16 weights), and licensing for every major self-hostable LLM.

| Model | Vendor | Params | Context | License | VRAM (4-bit) | VRAM (FP16) | MMLU-Pro | GPQA Diamond | IFEval | Arena Elo | SWE-bench Verified | HumanEval | LiveCodeBench | AIME | MATH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Command R+ | Cohere | 104B | 131K | CC-BY-NC | 55 GB | 208 GB | N/A | N/A | N/A | 1195 | N/A | N/A | N/A | N/A | N/A |
| DeepSeek R1 | DeepSeek | 671B | 128K | MIT | 350 GB | 1340 GB | 84.0 | 71.5 | 83.3 | 1398 | 49.2 | 90.2 | 65.9 | 93.3 | 97.3 |
| DeepSeek V3 | DeepSeek | 671B | 128K | MIT | 350 GB | 1340 GB | 81.2 | 68.4 | N/A | 1359 | 38.8 | N/A | 49.2 | N/A | 94.0 |
| DS-R1-Distill-Llama-70B | DeepSeek | 70B | 128K | MIT | 38 GB | 140 GB | N/A | 65.2 | N/A | N/A | N/A | 86.0 | 57.5 | 70.0 | 94.5 |
| DS-R1-Distill-Qwen-14B | DeepSeek | 14B | 128K | MIT | 9 GB | 28 GB | N/A | 59.1 | N/A | N/A | N/A | N/A | 53.1 | N/A | 93.9 |
| DS-R1-Distill-Qwen-32B | DeepSeek | 32B | 128K | MIT | 19 GB | 64 GB | N/A | 62.1 | N/A | N/A | N/A | 85.4 | 53.1 | 72.0 | 94.3 |
| DS-R1-Distill-Qwen-7B | DeepSeek | 7B | 128K | MIT | 5 GB | 14 GB | N/A | 49.1 | N/A | N/A | N/A | N/A | N/A | N/A | 92.8 |
| Gemma 3 12B | Google | 12B | 128K | Gemma License | 8 GB | 24 GB | 60.0 | 40.9 | N/A | N/A | N/A | 85.4 | N/A | N/A | N/A |
| Gemma 3 27B | Google | 27B | 128K | Gemma License | 16 GB | 54 GB | 67.5 | 42.4 | N/A | 1365 | N/A | N/A | 29.7 | N/A | 89.0 |
| GLM-5 | Zhipu AI | 745B | 200K | MIT | 390 GB | 1490 GB | 70.4 | 86.0 | 88.0 | 1451 | 77.8 | 90.0 | 52.0 | 84.0 | 88.0 |
| GPT-oss 120B | OpenAI | 120B | 128K | Apache 2.0 | 62 GB | 80 GB | 90.0 | 80.9 | N/A | 1354 | 62.4 | N/A | 60.0 | 97.9 | N/A |
| GPT-oss 20B | OpenAI | 20B | 128K | Apache 2.0 | 14 GB | 16 GB | 85.3 | 71.5 | N/A | N/A | N/A | N/A | N/A | 98.7 | N/A |
| Hunyuan 2.0 | Tencent | 406B | 256K | Tencent License | 215 GB | 812 GB | N/A | N/A | N/A | N/A | 53.0 | N/A | N/A | N/A | N/A |
| Kimi K2.5 | Moonshot | 1T | 262K | MIT | 600 GB | 2000 GB | 87.1 | 87.6 | 94.0 | 1447 | 76.8 | 99.0 | 85.0 | 96.1 | 98.0 |
| Llama 3.1-8B | Meta | 8B | 131K | Llama License | 5 GB | 16 GB | 48.3 | 32.8 | 80.4 | 1186 | N/A | 72.6 | N/A | N/A | 51.9 |
| Llama 3.3 70B | Meta | 70B | 131K | Llama License | 38 GB | 140 GB | 68.9 | 50.7 | 92.1 | 1310 | N/A | 88.4 | N/A | N/A | 77.0 |
| Llama 4 Maverick | Meta | 400B | 1M | Llama License | 210 GB | 800 GB | 80.5 | 69.8 | N/A | 1328 | N/A | 62.0 | 43.4 | N/A | N/A |
| Llama 4 Scout | Meta | 109B | 10M | Llama License | 58 GB | 218 GB | 74.3 | 58.2 | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| MiMo-V2-Flash | Xiaomi | 309B | 128K | MIT | 165 GB | 618 GB | 84.9 | 83.7 | N/A | N/A | 73.4 | N/A | 80.6 | 94.1 | N/A |
| Mistral Large 3 | Mistral | 675B | 256K | Apache 2.0 | 355 GB | 1350 GB | N/A | 43.9 | N/A | 1416 | N/A | 92.0 | 82.8 | 88.0 | 93.6 |
| Mistral Small 3.1 | Mistral | 24B | 131K | Apache 2.0 | 14 GB | 48 GB | 66.8 | 40.7 | 79.8 | N/A | N/A | 87.2 | N/A | N/A | N/A |
| Nemotron Ultra 253B | Nvidia | 253B | 128K | Open Weight | 135 GB | 506 GB | N/A | 76.0 | 89.5 | 1347 | N/A | N/A | 66.3 | 72.5 | 97.0 |
| Phi-4 | Microsoft | 14B | 16K | MIT | 9 GB | 28 GB | 70.4 | 56.1 | 64.6 | N/A | N/A | 82.6 | N/A | N/A | 80.4 |
| Phi-4-mini | Microsoft | 3.8B | 131K | MIT | 3 GB | 8 GB | 52.8 | 30.4 | N/A | N/A | N/A | 72.0 | N/A | N/A | N/A |
| Qwen 2.5-72B | Qwen | 72B | 131K | Apache 2.0 | 39 GB | 145 GB | 71.1 | 49.0 | 86.5 | 1295 | N/A | 86.6 | N/A | N/A | 83.1 |
| Qwen 3.5 | Qwen | 397B | 262K | Apache 2.0 | 210 GB | 794 GB | 87.8 | 88.4 | 92.6 | N/A | 76.4 | N/A | 83.6 | N/A | N/A |
| Qwen2.5-Coder-32B | Qwen | 32B | 131K | Apache 2.0 | 19 GB | 64 GB | N/A | N/A | N/A | N/A | N/A | 92.7 | 43.2 | N/A | N/A |
| Qwen3-235B-A22B | Qwen | 235B | 131K | Apache 2.0 | 125 GB | 470 GB | N/A | 71.1 | N/A | N/A | N/A | N/A | 70.7 | 81.5 | N/A |
| Qwen3-30B-A3B | Qwen | 30B | 131K | Apache 2.0 | 18 GB | 60 GB | 68.7 | 60.0 | N/A | N/A | N/A | N/A | N/A | 76.7 | 95.2 |
| Step-3.5 Flash | StepFun | 196B | 262K | Apache 2.0 | 120 GB | 392 GB | 85.8 | N/A | N/A | N/A | 74.4 | 81.1 | 86.4 | 99.8 | N/A |
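The VRAM columns follow directly from parameter count: FP16 needs two bytes per parameter, 4-bit roughly half a byte, plus some headroom for the KV cache, activations, and runtime buffers. A back-of-the-envelope sketch of that arithmetic (the ~5% overhead factor is an assumption, not a vendor figure):

```python
def estimate_vram_gb(params_billions: float, bits_per_param: int,
                     overhead: float = 1.05) -> float:
    """Rough weights-only VRAM estimate: parameters x bytes-per-parameter,
    times an assumed ~5% fudge factor for KV cache and runtime buffers."""
    bytes_per_param = bits_per_param / 8
    return params_billions * bytes_per_param * overhead

# Command R+ (104B): ~208 GB at FP16, ~55 GB at 4-bit -- matching the table.
print(f"FP16 : {estimate_vram_gb(104, 16, overhead=1.0):.0f} GB")
print(f"4-bit: {estimate_vram_gb(104, 4):.0f} GB")
```

Long contexts push real usage above these figures, since KV cache grows with sequence length; treat the table values as a floor, not a guarantee.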

Compare Self-Hosted LLMs Head-to-Head

Pick any two models and see how they stack up across all benchmarks. As an example, DeepSeek R1 vs Qwen 3.5:

| Benchmark | DeepSeek R1 | Qwen 3.5 |
|---|---|---|
| MMLU-Pro | 84.0 | 87.8 |
| GPQA Diamond | 71.5 | 88.4 |
| IFEval | 83.3 | 92.6 |
| SWE-bench Verified | 49.2 | 76.4 |
| LiveCodeBench | 65.9 | 83.6 |
| Benchmarks won | 0 | 5 |
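The "Benchmarks won" row is a simple per-benchmark tally over the scores both models report. A minimal sketch of that comparison, with scores copied from the example above (the helper itself is illustrative, not the site's code):

```python
# Head-to-head tally: count benchmarks each model wins outright; ties and
# benchmarks reported by only one model count for neither side.
r1 = {"MMLU-Pro": 84.0, "GPQA Diamond": 71.5, "IFEval": 83.3,
      "SWE-bench Verified": 49.2, "LiveCodeBench": 65.9}
qwen35 = {"MMLU-Pro": 87.8, "GPQA Diamond": 88.4, "IFEval": 92.6,
          "SWE-bench Verified": 76.4, "LiveCodeBench": 83.6}

def benchmarks_won(a: dict, b: dict) -> tuple[int, int]:
    """Compare two score dicts benchmark by benchmark."""
    wins_a = sum(1 for k in a if k in b and a[k] > b[k])
    wins_b = sum(1 for k in b if k in a and b[k] > a[k])
    return wins_a, wins_b

print(benchmarks_won(r1, qwen35))  # -> (0, 5), matching the example above
```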

Deploy These Models with Onyx

Onyx is the open-source AI platform that lets you self-host any of these LLMs and connect them to your team's docs, apps, and people.
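Most self-hosted serving stacks (vLLM and Ollama among them) expose an OpenAI-compatible HTTP API, so once a model from the table is running you can query it with the standard OpenAI client. A minimal sketch; the base URL, port, and model ID below are placeholders for your own deployment:

```python
# Query a self-hosted model through an OpenAI-compatible endpoint.
# The URL and model name are assumptions -- substitute whatever your
# own server actually exposes.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical local vLLM server
    api_key="not-needed-for-local",       # most local servers ignore the key
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1",  # any model ID your server has loaded
    messages=[{"role": "user", "content": "Summarize our deployment options."}],
    temperature=0.2,
)
print(response.choices[0].message.content)
```

Because the wire format is the same across these servers, swapping models from the leaderboard is usually a one-line change to the `model` field.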