MMLU benchmark


Massive Multitask Language Understanding — 57-subject knowledge test, 0-100%

What is the MMLU benchmark?

MMLU (Massive Multitask Language Understanding) is a multiple-choice test of knowledge and reasoning across 57 academic subjects, including maths, history, law, medicine, and computer science. The average human expert scores approximately 89.8%.
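
For readers who want the mechanics, the sketch below shows how an MMLU-style item is commonly represented and how accuracy is computed. The field names and the example question are illustrative assumptions, not the official dataset schema.

```python
from dataclasses import dataclass

@dataclass
class MMLUItem:
    question: str
    choices: list[str]  # four answer options
    answer: int         # index (0-3) of the correct choice

def accuracy(items: list[MMLUItem], predictions: list[int]) -> float:
    """Fraction of items where the predicted choice matches the answer key."""
    correct = sum(1 for item, pred in zip(items, predictions) if pred == item.answer)
    return correct / len(items)

# Illustrative item; the wording is hypothetical.
item = MMLUItem(
    question="What is the degree of the polynomial x^3 + 2x + 1?",
    choices=["1", "2", "3", "4"],
    answer=2,  # "3"
)
print(accuracy([item], [2]))  # 1.0
```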

Why it matters

MMLU provides a standardised way to compare reasoning ability across models. While no single benchmark tells the full story, MMLU covers enough subjects to give a useful signal about general knowledge and reasoning capability. It is the most widely cited benchmark in model announcements.

Where models stand

  1. o1: 92.3%
  2. 90.8%
  3. 88.7%
  4. 88.7%
  5. 88.6%

Data available for 24 of 30 tracked models.

How sourc.dev tracks this

sourc.dev verifies MMLU scores manually against official provider documentation, API responses, and published specifications. Every data point includes a source URL and verification date. When a value changes, the old value is preserved in the history table and the new value is recorded alongside it. Nothing is overwritten; the full timeline is always available.
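
A minimal sketch of that append-only record keeping, assuming a simple in-memory list; the type and field names are hypothetical, chosen only to show that each new value is appended with its source URL and verification date while earlier rows stay untouched.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class BenchmarkRecord:
    model: str
    benchmark: str     # e.g. "mmlu"
    value: float       # score in percent
    source_url: str
    verified_on: date

history: list[BenchmarkRecord] = []

def record_value(rec: BenchmarkRecord) -> None:
    """Append the new value; nothing is ever overwritten."""
    history.append(rec)

# An updated score is appended alongside the old one, preserving the timeline.
record_value(BenchmarkRecord("example-model", "mmlu", 88.7,
                             "https://example.com/model-card", date(2024, 5, 1)))
record_value(BenchmarkRecord("example-model", "mmlu", 90.1,
                             "https://example.com/updated-card", date(2024, 9, 1)))
```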


Frequently asked questions

What subjects does MMLU cover?

57 subjects spanning STEM, humanities, social sciences, and professional domains — including abstract algebra, anatomy, astronomy, business ethics, clinical knowledge, computer security, econometrics, jurisprudence, and virology, among others.

What is a good MMLU score?

Human expert performance averages approximately 89.8%. Leading models now score above 85%, with some exceeding 90%. A score above 70% indicates strong general knowledge. Random guessing on the four-choice questions yields roughly 25%, so scores near that level carry no real signal.
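
The arithmetic behind those baselines, as a quick check; the 89.8% expert figure is the one quoted above.

```python
# With four answer choices, uniform random guessing scores about 25% in expectation.
n_choices = 4
random_baseline = 1 / n_choices   # 0.25
expert_baseline = 0.898           # average human expert accuracy cited above
print(f"random guessing: {random_baseline:.0%}, human expert: {expert_baseline:.1%}")
```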

Is MMLU still a useful benchmark?

MMLU remains the most widely cited benchmark for general reasoning. However, as top models approach and exceed human expert scores, its discriminative power is decreasing. Newer benchmarks like GPQA and MATH target harder problems.
