The pace of AI, in perspective

March 20, 2026

Gurjeet Singh

Most takes on AI progress are either breathless ("everything changes tomorrow") or dismissive ("it's all hype"). Neither survives contact with the numbers. The useful view is the slope — how fast capability is climbing, how fast cost is falling, and what the gap between the two is doing to anyone building on top.


The slope, roughly


Frontier training compute has been roughly doubling every 6 months since 2019. That is not a marketing line — it is what you get when you plot publicly disclosed training FLOPs for GPT-3, PaLM, GPT-4, Gemini, and the 2025–26 Claude and GPT families.


| Model Era | Representative Cost | Capability Anchor | Release Window |
|---|---|---|---|
| GPT-3 class | ~$4.6M train | ~57% MMLU | 2020 |
| GPT-4 class | ~$80M train | ~86% MMLU | 2023 |
| Frontier 2025 | ~$500M train | ~92% MMLU | 2025 |
| Frontier 2026 | ~$1B+ train | Approaching saturation | 2026 |

Two things jump out. Training cost has gone up ~200× in six years. And the capability curve has flattened at the top — MMLU is saturated, so it stopped being a useful yardstick some time ago.
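Both headline numbers fall straight out of the table and the 6-month doubling claim; a quick back-of-the-envelope, using only the rounded estimates above:

```python
# Back-of-the-envelope check on the two headline numbers (inputs are the
# table's rounded estimates, so treat the outputs as rough).
gpt3_cost = 4.6e6         # ~$4.6M training cost, 2020
frontier_2026_cost = 1e9  # ~$1B+ training cost, 2026

cost_growth = frontier_2026_cost / gpt3_cost  # ~217x, i.e. "~200x in six years"

# Compute growth implied by a 6-month doubling over the same six-year window.
doublings = 6 * 2                # two doublings per year
compute_growth = 2 ** doublings  # 4096x

print(f"cost: ~{cost_growth:.0f}x, compute: {compute_growth}x")
```

Note the gap between the two: compute grew far faster than dollars spent, which is the hardware-and-efficiency improvement hiding inside the cost curve.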


Inference is the story you are actually paying for


Training is what OpenAI and Anthropic pay. Inference is what *you* pay. And inference cost per unit of capability is falling much faster than training cost is rising.


| Year | Representative Model | $ per 1M input tokens | $ per 1M output tokens |
|---|---|---|---|
| 2023 | GPT-4 (8k) | $30.00 | $60.00 |
| 2024 | GPT-4o / Claude 3.5 | $2.50–$3.00 | $10.00–$15.00 |
| 2025 | GPT-5 / Claude 4.5 | $1.25–$3.00 | $5.00–$15.00 |
| 2026 | Haiku-class frontier | $0.25–$1.00 | $1.25–$5.00 |

A 10–30× drop in three years, at the same or better quality. This is the part of the curve builders actually feel. The product you could not afford to ship in 2024 is profitable in 2026. The feature you prototyped and shelved because the per-request cost was $0.40 is now $0.02.
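To feel the drop per request rather than per million tokens, here is a minimal sketch. The request shape (8k tokens in, 2k out) is a hypothetical, and the prices are the table's list rates:

```python
def cost_per_request(input_tokens: int, output_tokens: int,
                     in_price: float, out_price: float) -> float:
    """USD per request, given per-1M-token prices."""
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Hypothetical request shape: 8k tokens in, 2k tokens out.
c_2023 = cost_per_request(8_000, 2_000, 30.00, 60.00)  # GPT-4 (8k) list prices
c_2026 = cost_per_request(8_000, 2_000, 1.00, 5.00)    # Haiku-class 2026 prices

print(f"2023: ${c_2023:.2f}/request, 2026: ${c_2026:.3f}/request")
```

That works out to roughly $0.36 per request in 2023 versus under $0.02 in 2026, a ~20× drop for the same request, squarely inside the 10–30× range.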


Benchmark saturation is a signal, not a victory lap


| Benchmark | 2022 SOTA | 2024 SOTA | 2026 SOTA | Human expert |
|---|---|---|---|---|
| MMLU | 67% | 88% | ~93% | ~90% |
| HumanEval | 48% | 92% | ~98% | ~95% |
| GPQA (hard science) | — | 50% | ~80% | ~70% |
| SWE-bench Verified | — | 18% | 70%+ | — |

When a benchmark saturates against expert humans, it stops being a measurement and becomes a floor. The interesting question shifts from "can the model do this" to "how reliably, how cheaply, and at what latency."


What this means if you are building


The compounding lesson


**Process Steps:** Prototype on a frontier model → ship → wait 6–12 months → migrate to a smaller, cheaper model at the same quality → unit economics flip.

**Time Investment:** 1 week to prototype → ship → wait → 2 days to migrate → profitable.

**Total Duration:** 3–4 quarters.

**Key Advantage:** The tailwind is doing half your work. If your product is GPT-4-class capable today, it will be Haiku-class cost within 12 months without you touching the model.
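The "unit economics flip" is just a sign change in gross margin as the model bill falls. A sketch, where the revenue figure and both costs are hypothetical illustrative numbers:

```python
def gross_margin(revenue_per_request: float, cost_per_request: float) -> float:
    """Gross margin per request in USD."""
    return revenue_per_request - cost_per_request

REVENUE = 0.05  # hypothetical: what one request is worth to the product

# Same feature, same quality bar; only the model bill changes.
at_launch = gross_margin(REVENUE, 0.36)         # frontier-prototype pricing
after_migration = gross_margin(REVENUE, 0.018)  # smaller model, a year later

print(f"launch: ${at_launch:+.3f}, after migration: ${after_migration:+.3f}")
```

Nothing about the product changed between the two lines; the sign of the margin did.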


Where builders get this wrong


Three failure modes I keep seeing:


1. **Over-engineering for today's cost.** Writing brittle caching layers and distillation pipelines to make a 2024-era bill survivable, when the 2025 model priced it in for you six months later.
2. **Under-engineering for today's capability.** Still routing every request through a small model when the frontier model is 4× better at the thing that actually matters in your product, and now costs within an order of magnitude of it.
3. **Building the wrong moat.** "We use GPT-4" is not a moat. "We built the interface and the evals and the domain data that make an agent trustworthy in our specific workflow" is.


The actual rate of change, felt in a product


| Year | What took an afternoon | What took a week | What was unbuildable |
|---|---|---|---|
| 2022 | Classification prompts | Semantic search | Multi-step agents |
| 2024 | Semantic search | Multi-step agents | Reliable tool use on messy data |
| 2026 | Multi-step agents | Voice-first agents over live APIs | Long-horizon autonomy |

The unbuildable row is the one worth watching. The Osmo-class product — voice-first AI that touches your calendar, your email, your actual day — was not a 2024 product. It is a 2026 product. Not because the idea was new, but because the reliability floor finally crossed the threshold where someone would actually let it act on their behalf.


The honest summary


AI progress is not "everything changes tomorrow" and it is not "all hype." It is a compounding curve where training costs go up and inference costs fall much faster, and where the gap between those two is the window consumer products get built in.


If you are building, the job is to pick an idea that was unbuildable 18 months ago, is just barely shippable today, and will be trivially cheap 12 months from now. Everything else is either chasing last year's product or building for a frontier that does not exist yet.