The scaling laws are already broken: smaller models win out on reasoning in the long run
The SoTA LLMs that score highest on standard benchmarks all have parameter counts above 100B, but those benchmarks consist mainly of “flat” tasks: single-prompt problems with short, self-contained answers. The familiar scaling curves, which plot test loss against parameter count, show smooth power-law gains and suggest that more weights yield monotonic progress. These curves are misleading, however: they measure token-level accuracy, not whole-task reliability across long, chained sequences of actions.
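For reference, the parameter-count scaling law in question is usually written as a power law of the form below (the constants are empirical fits from the literature, e.g. Kaplan et al., 2020, not something I am deriving here):

$$ L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076, $$

where L is test loss and N is the non-embedding parameter count. Note what this does and does not promise: a smooth, slow reduction in per-token loss, with no statement at all about success rates on multi-step tasks.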
Once models need to maintain that per-step correctness through hundreds or thousands of dependent steps (writing, compiling, running, reading, revising, and so on), they break down, because errors compound across the chain. Below is my argument for why parameter growth and extra test-time compute alone cannot close that gap, and why smaller, modular, hierarchy-aware systems are likely to dominate in the end.
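To make the compounding point concrete, here is a minimal sketch. It assumes, purely for illustration, that each dependent step succeeds independently with probability p and that there is no self-correction; real agent loops violate both assumptions, but the qualitative collapse is the same:

```python
# Back-of-the-envelope: if each of n dependent steps succeeds with probability p
# (independence assumed for illustration), the whole task succeeds with p**n.
# Small per-token gains barely move this number until p is extremely close to 1.

def whole_task_success(p: float, n: int) -> float:
    """Probability of completing all n dependent steps without an unrecovered error."""
    return p ** n

for p in (0.99, 0.999, 0.9999):
    for n in (100, 500, 1000):
        print(f"per-step accuracy {p:.4f}, {n:4d} steps -> "
              f"whole-task success {whole_task_success(p, n):.3f}")
```

A model that is 99% reliable per step finishes a 500-step task less than 1% of the time; even 99.9% per-step reliability only gets you to roughly 60%. That is the shape of the gap the rest of this argument is about.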