Best LLMs — 2026 Rankings

LLM Leaderboard

The definitive ranking of every major LLM — open and closed source — compared across reasoning, coding, math, agentic, software engineering, and chat benchmarks.

Last updated: 2026-02-17

| Tier | Model | Params |
|------|-------|--------|
| S | Claude Opus 4.6 | N/A |
| S | GPT-5.2 | N/A |
| S | GLM-5 | 744B |
| S | Kimi K2.5 | 1T |
| A | Claude Sonnet 4.6 | N/A |
| A | Gemini 3 Pro | N/A |
| A | Qwen 3.5 | 397B |
| A | DeepSeek R1 | 671B |
| A | Mistral Large | 675B |
| B | GPT-oss 120B | 117B |
| B | Nemotron Ultra 253B | 253B |
| C | Grok 3 | N/A |
| C | DeepSeek V3 | 671B |
| C | Llama 4 Maverick | 400B |
| D | Gemma 3 27B | 27B |

Best LLMs by Task — Benchmark Rankings

Which LLM is best for coding, reasoning, or agentic tasks? Each category below ranks models on a single benchmark.

- Best Overall: MMLU (general knowledge across 57 subjects)
- Best Multilingual: MMMLU (multilingual Q&A across languages)
- Best Visual Reasoning: MMMU-Pro (visual reasoning across disciplines)
- Hardest Exam: Humanity's Last Exam (expert-level multidisciplinary reasoning)

LLM Benchmark Scores & Pricing

Complete benchmark results and pricing for every major LLM, sorted alphabetically by model name.
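Ranking models by any single benchmark has one subtle step: N/A entries should sink to the bottom regardless of sort direction. A minimal sketch (a hypothetical subset of the SWE-bench Verified column, with `None` standing in for N/A):

```python
# (model, SWE-bench Verified score); None marks a missing (N/A) result
models = [
    ("Claude Opus 4.6", 80.8),
    ("Gemini 3 Pro", 78.0),
    ("Grok 3", None),
    ("GPT-5.2", 80.0),
]

# Sort descending by score; the (is None) flag pushes N/A rows to the end
ranked = sorted(models, key=lambda m: (m[1] is None, -(m[1] or 0)))
print([name for name, _ in ranked])
# ['Claude Opus 4.6', 'GPT-5.2', 'Gemini 3 Pro', 'Grok 3']
```

The two-part sort key keeps scored models in descending order while grouping all unscored models after them, so a missing result is never mistaken for a low one.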

Prices are USD per million tokens. One benchmark column (marked "?") could not be identified from the source.

| Model | Provider | Params | Context | Input $/M | Output $/M | MMLU | MMMLU | MMMU-Pro | HLE | SWE-bench Verified | Terminal-Bench 2.0 | ? | LiveCodeBench | AIME 2025 | MATH-500 | Arena Elo | IFEval | GPQA Diamond | ARC-AGI-2 | MMLU-Pro | τ2-bench | OSWorld | BrowseComp |
|-------|----------|--------|---------|-----------|------------|------|-------|----------|-----|--------------------|--------------------|---|---------------|-----------|----------|-----------|--------|--------------|-----------|----------|----------|---------|------------|
| Claude Opus 4.6 | Anthropic | N/A | 200K | $15.00 | $75.00 | 91.0 | 91.1 | 77.3 | 53.0 | 80.8 | 65.4 | 95.0 | 76.0 | 100.0 | 97.6 | N/A | 94.0 | 91.3 | 68.8 | 82.0 | 91.9 | 72.7 | 84.0 |
| Claude Sonnet 4.6 | Anthropic | N/A | 200K | $3.00 | $15.00 | N/A | 89.3 | 75.6 | 49.0 | 79.6 | 59.1 | N/A | N/A | 52.8 | 97.8 | N/A | N/A | 89.9 | 58.3 | 79.1 | 91.7 | 72.5 | 74.7 |
| DeepSeek R1 | DeepSeek | 671B | 128K | $0.28 | $0.42 | 90.8 | N/A | N/A | N/A | N/A | N/A | 90.2 | 65.9 | 87.5 | 97.3 | 1398 | 83.3 | 71.5 | N/A | 84.0 | N/A | N/A | N/A |
| DeepSeek V3 | DeepSeek | 671B | 128K | $0.28 | $1.10 | 88.5 | N/A | N/A | N/A | 38.8 | N/A | N/A | 49.2 | N/A | 94.0 | 1359 | N/A | 68.4 | N/A | 81.2 | N/A | N/A | N/A |
| Gemini 3 Pro | Google | N/A | 1M | $1.25 | $10.00 | 91.8 | 91.8 | 81.0 | 45.8 | 78.0 | 56.2 | 93.0 | 81.3 | 100.0 | 94.0 | 1492 | 85.0 | 91.9 | 31.1 | 85.0 | 85.3 | N/A | 59.2 |
| Gemma 3 27B | Google | 27B | 128K | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 29.7 | N/A | 89.0 | 1365 | N/A | 42.4 | N/A | 67.5 | N/A | N/A | N/A |
| GLM-5 | Zhipu AI | 744B | 200K | N/A | N/A | 85.0 | N/A | N/A | 50.4 | 77.8 | 56.2 | 90.0 | 52.0 | 84.0 | 88.0 | 1451 | 88.0 | 86.0 | N/A | 70.4 | 89.7 | N/A | 75.9 |
| GPT-5.2 | OpenAI | N/A | 128K | $2.00 | $8.00 | N/A | 89.6 | 80.4 | 50.0 | 80.0 | 64.7 | N/A | N/A | 100.0 | N/A | N/A | N/A | 93.2 | 54.2 | N/A | 82.0 | 38.2 | 77.9 |
| GPT-oss 120B | OpenAI | 117B | 128K | N/A | N/A | 90.0 | N/A | N/A | N/A | 62.4 | N/A | N/A | 60.0 | N/A | N/A | 1354 | N/A | 80.9 | N/A | 90.0 | N/A | N/A | N/A |
| Grok 3 | xAI | N/A | 131K | $3.00 | $15.00 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 93.3 | N/A | 1400 | N/A | 84.6 | N/A | N/A | N/A | N/A | N/A |
| Kimi K2.5 | Moonshot | 1T | 262K | N/A | N/A | 92.0 | N/A | 78.5 | N/A | 76.8 | N/A | 99.0 | 85.0 | 96.1 | 98.0 | 1447 | 94.0 | 87.6 | N/A | 87.1 | N/A | N/A | N/A |
| Llama 4 Maverick | Meta | 400B | 1M | N/A | N/A | 85.5 | 84.6 | N/A | N/A | N/A | N/A | 62.0 | 43.4 | N/A | N/A | 1328 | N/A | 69.8 | N/A | 80.5 | N/A | N/A | N/A |
| Mistral Large | Mistral | 675B | 256K | N/A | N/A | 85.5 | N/A | N/A | N/A | N/A | N/A | 92.0 | 82.8 | 88.0 | 93.6 | 1416 | N/A | 43.9 | N/A | N/A | N/A | N/A | N/A |
| Nemotron Ultra 253B | Nvidia | 253B | 128K | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 66.3 | 72.5 | 97.0 | 1347 | 89.5 | 76.0 | N/A | N/A | N/A | N/A | N/A |
| Qwen 3.5 | Qwen | 397B | 262K | N/A | N/A | 88.5 | 88.5 | 79.0 | 28.7 | 76.4 | 52.5 | N/A | 83.6 | N/A | N/A | N/A | 92.6 | 88.4 | N/A | 87.8 | 86.7 | 62.2 | 78.6 |
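Since input and output prices are quoted per million tokens, estimating the cost of a single request is simple arithmetic. A minimal sketch (hypothetical helper; prices are Claude Opus 4.6's $15.00 input / $75.00 output per million tokens from the table above):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Estimate one request's cost from per-million-token prices."""
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000

# A request with 2,000 input tokens and 1,000 output tokens on Claude Opus 4.6
cost = request_cost(2_000, 1_000, 15.00, 75.00)
print(f"${cost:.3f}")  # $0.105
```

Output tokens usually dominate the bill: at a 5x output-to-input price ratio, even a short completion can cost more than a long prompt.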

Compare LLMs Head-to-Head

Any two models can be compared across all shared benchmarks. As an example, here is GPT-5.2 against Claude Opus 4.6.

| Benchmark | GPT-5.2 | Claude Opus 4.6 |
|-----------|---------|-----------------|
| MMMLU | 89.6 | 91.1 |
| MMMU-Pro | 80.4 | 77.3 |
| HLE | 50.0 | 53.0 |
| SWE-bench Verified | 80.0 | 80.8 |
| Terminal-Bench 2.0 | 64.7 | 65.4 |
| AIME 2025 | 100.0 | 100.0 |
| GPQA Diamond | 93.2 | 91.3 |
| ARC-AGI-2 | 54.2 | 68.8 |
| τ2-bench | 82.0 | 91.9 |
| OSWorld | 38.2 | 72.7 |
| BrowseComp | 77.9 | 84.0 |
| **Benchmarks won** | **2** | **8** |
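The "benchmarks won" tally counts, per benchmark, which model scores strictly higher; ties (like AIME 2025 here) count for neither side. A minimal sketch using the head-to-head scores above ("tau2-bench" stands in for τ2-bench in the code):

```python
# Head-to-head scores from the comparison above
gpt_5_2 = {"MMMLU": 89.6, "MMMU-Pro": 80.4, "HLE": 50.0,
           "SWE-bench Verified": 80.0, "Terminal-Bench 2.0": 64.7,
           "AIME 2025": 100.0, "GPQA Diamond": 93.2, "ARC-AGI-2": 54.2,
           "tau2-bench": 82.0, "OSWorld": 38.2, "BrowseComp": 77.9}
claude_opus_4_6 = {"MMMLU": 91.1, "MMMU-Pro": 77.3, "HLE": 53.0,
                   "SWE-bench Verified": 80.8, "Terminal-Bench 2.0": 65.4,
                   "AIME 2025": 100.0, "GPQA Diamond": 91.3, "ARC-AGI-2": 68.8,
                   "tau2-bench": 91.9, "OSWorld": 72.7, "BrowseComp": 84.0}

def benchmarks_won(a: dict, b: dict) -> tuple:
    """Count benchmarks where each model strictly outscores the other."""
    shared = [k for k in a if k in b]
    wins_a = sum(a[k] > b[k] for k in shared)
    wins_b = sum(b[k] > a[k] for k in shared)
    return wins_a, wins_b

print(benchmarks_won(gpt_5_2, claude_opus_4_6))  # (2, 8)
```

Only benchmarks reported for both models are compared, so a model is never penalized for a missing score.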

Try These Models in Onyx

Onyx is the open-source AI platform that lets you connect any of these LLMs to your team's docs, apps, and people.