
The pace of AI, in perspective
March 20, 2026
Most takes on AI progress are either breathless ("everything changes tomorrow") or dismissive ("it's all hype"). Neither survives contact with the numbers. The useful view is the slope — how fast capability is climbing, how fast cost is falling, and what the gap between the two is doing to anyone building on top.
The slope, roughly
Frontier training compute has been roughly doubling every 6 months since 2019. That is not a marketing line; it is what you get when you plot disclosed or credibly estimated training FLOPs for GPT-3, PaLM, GPT-4, Gemini, and the 2025–26 Claude and GPT families.
| Model Era | Representative Cost | Capability Anchor | Release Window |
|---|---|---|---|
| GPT-3 class | ~$4.6M train | ~57% MMLU | 2020 |
| GPT-4 class | ~$80M train | ~86% MMLU | 2023 |
| Frontier 2025 | ~$500M train | ~92% MMLU | 2025 |
| Frontier 2026 | ~$1B+ train | Approaching saturation | 2026 |
Two things jump out. Training cost has gone up ~200× in six years. And the capability curve has flattened at the top — MMLU is saturated, so it stopped being a useful yardstick some time ago.
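The arithmetic behind that multiple is worth a glance. A quick sketch using the representative costs from the table above (the dollar figures are the table's, not audited numbers):

```python
# Training-cost growth implied by the table's representative figures.
costs = {2020: 4.6e6, 2023: 80e6, 2026: 1.0e9}

total_multiple = costs[2026] / costs[2020]             # roughly 217x over six years
annual_growth = total_multiple ** (1 / (2026 - 2020))  # roughly 2.45x per year

print(f"{total_multiple:.0f}x total, {annual_growth:.2f}x annualized")
# prints: 217x total, 2.45x annualized
```

So "~200×" is the honest rounding, and it compounds at about 2.5× a year.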
Inference is the story you are actually paying for
Training is what OpenAI and Anthropic pay. Inference is what *you* pay. And inference cost per unit of capability is falling much faster than training cost is rising.
| Year | Representative Model | $ per 1M input tokens | $ per 1M output tokens |
|---|---|---|---|
| 2023 | GPT-4 (8k) | $30.00 | $60.00 |
| 2024 | GPT-4o / Claude 3.5 | $2.50–$3.00 | $10.00–$15.00 |
| 2025 | GPT-5 / Claude 4.5 | $1.25–$3.00 | $5.00–$15.00 |
| 2026 | Haiku-class frontier | $0.25–$1.00 | $1.25–$5.00 |
Depending on which tier you compare, that is a 10–100× drop in three years at the same or better quality. This is the part of the curve builders actually feel. The product you could not afford to ship in 2024 is profitable in 2026. The feature you prototyped and shelved because the per-request cost was $0.40 now costs $0.02.
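To see what that means per request, here is a back-of-envelope sketch. The 3,000-in / 500-out token shape is an assumption; the prices come from the table above:

```python
def request_cost(in_tokens, out_tokens, in_price_per_m, out_price_per_m):
    """Cost of one request, with prices quoted in $ per 1M tokens."""
    return (in_tokens * in_price_per_m + out_tokens * out_price_per_m) / 1e6

# The same hypothetical request priced at 2023 GPT-4 rates
# and at the 2026 Haiku-class ceiling from the table.
cost_2023 = request_cost(3000, 500, 30.00, 60.00)  # $0.12
cost_2026 = request_cost(3000, 500, 1.00, 5.00)    # $0.0055
```

Twelve cents versus half a cent, for the same request, three years apart.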
Benchmark saturation is a signal, not a victory lap
| Benchmark | 2022 SOTA | 2024 SOTA | 2026 SOTA | Human expert |
|---|---|---|---|---|
| MMLU | 67% | 88% | ~93% | ~90% |
| HumanEval | 48% | 92% | ~98% | ~95% |
| GPQA (hard science) | — | 50% | ~80% | ~70% |
| SWE-bench Verified | — | 18% | 70%+ | — |
When a benchmark saturates against expert humans, it stops being a measurement and becomes a floor. The interesting question shifts from "can the model do this" to "how reliably, how cheaply, and at what latency."
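One way to keep measuring past saturation is to fold cost into the metric. A sketch, with illustrative accuracy and price figures that are not drawn from any published benchmark:

```python
def cost_per_correct(accuracy, cost_per_call):
    # Expected spend to obtain one correct answer,
    # assuming failures are independent and you retry.
    return cost_per_call / accuracy

frontier = cost_per_correct(0.93, 0.050)  # hypothetical big model
small = cost_per_correct(0.90, 0.004)     # hypothetical small model
```

The small model "loses" the benchmark by three points and wins on cost per correct answer by more than 10×. That is the shape of the post-saturation question.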
What this means if you are building
The compounding lesson
The playbook: prototype on a frontier model, ship, wait 6–12 months, migrate to a smaller, cheaper model at the same quality, and watch the unit economics flip. In effort terms, that is about a week to build, two days to migrate, and three to four quarters end to end. The key advantage: the tailwind is doing half your work. If your product is GPT-4-class capable today, it will be Haiku-class cost within 12 months without you touching the model.
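That migration step does not have to be a judgment call. A minimal gate, assuming you already run an eval suite over both models; every name and threshold here is hypothetical:

```python
def should_migrate(frontier_score, small_score, frontier_cost, small_cost,
                   max_quality_drop=0.01, min_cost_ratio=3.0):
    """Migrate only if quality holds within tolerance AND the savings are real."""
    quality_holds = (frontier_score - small_score) <= max_quality_drop
    savings_worth_it = (frontier_cost / small_cost) >= min_cost_ratio
    return quality_holds and savings_worth_it

should_migrate(0.92, 0.915, 0.050, 0.005)  # True: half-point drop, 10x cheaper
should_migrate(0.92, 0.880, 0.050, 0.005)  # False: four-point quality drop
```

The point is not the thresholds; it is that "wait and migrate" becomes a scheduled check against your evals rather than a debate.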
Where builders get this wrong
Three failure modes I keep seeing:
1. **Over-optimizing for today's prices.** Spending engineering months on caching, distillation, and routing gymnastics to shave costs that the curve will cut 5–10× on its own within a year.
2. **Under-engineering for today's capability.** Still routing every request through a small model when the frontier model is 4× better at the thing that actually matters in your product, and now costs within an order of magnitude.
3. **Building the wrong moat.** "We use GPT-4" is not a moat. "We built the interface and the evals and the domain data that make an agent trustworthy in our specific workflow" is.
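The fix for the under-engineering failure mode is to route on value, not on habit. A sketch with illustrative numbers; the quality and cost figures are assumptions, not measurements:

```python
def pick_model(task_value_usd,
               frontier_quality=0.95, small_quality=0.80,
               frontier_cost=0.050, small_cost=0.004):
    # Use the frontier model when its quality edge on this request
    # is worth more than the extra spend; otherwise the small model wins.
    edge_value = (frontier_quality - small_quality) * task_value_usd
    extra_cost = frontier_cost - small_cost
    return "frontier" if edge_value > extra_cost else "small"

pick_model(10.00)  # "frontier": a 15-point edge on a $10 task beats $0.046
pick_model(0.10)   # "small": the edge is worth ~$0.015, less than the extra cost
```

On this toy model, anything where a correct answer is worth more than about thirty cents justifies the frontier call; habit-based routing never asks the question.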
The actual rate of change, felt in a product
| Year | What took an afternoon | What took a week | What was unbuildable |
|---|---|---|---|
| 2022 | Classification prompts | Semantic search | Multi-step agents |
| 2024 | Semantic search | Multi-step agents | Reliable tool use on messy data |
| 2026 | Multi-step agents | Voice-first agents over live APIs | Long-horizon autonomy |
The unbuildable row is the one worth watching. The Osmo-class product — voice-first AI that touches your calendar, your email, your actual day — was not a 2024 product. It is a 2026 product. Not because the idea was new, but because the reliability floor finally crossed the threshold where someone would actually let it act on their behalf.
The honest summary
AI progress is not "everything changes tomorrow" and it is not "all hype." It is a compounding curve where training costs go up and inference costs fall much faster, and where the gap between those two is the window consumer products get built in.
If you are building, the job is to pick an idea that was unbuildable 18 months ago, is just barely shippable today, and will be trivially cheap 12 months from now. Everything else is either chasing last year's product or building for a frontier that does not exist yet.