devdot

AI Has Outpaced How We Measure Developer Productivity — DORA Wasn't Built for This

Engineering leaders say AI tools made teams faster, but their metrics can't explain by how much. DORA was built for a world where humans wrote every line.

Faster By What, Exactly?

A recent Harness report landed with two cheerful findings: the vast majority of engineering leaders say productivity is up since adopting AI coding tools, and nearly as many say developer satisfaction is up too. Good news. But there's a question those findings can't answer: up by what?

That's not a rhetorical jab. It's a measurement gap. Teams feel faster and report being happier, yet their dashboards show roughly the same picture they always did. When perception and instrumentation disagree this much, the instrumentation is usually the thing that's out of date.

Why DORA Misses What AI Changes

DORA metrics (deployment frequency, lead time for changes, change failure rate, and time to restore service) were designed for a world where humans wrote every line of code. They're good metrics. They just measure a different axis from the one AI actually shifts.

AI doesn't primarily change how often you deploy. It changes the shape of the work:

  • How fast a junior ramps when a model is answering the questions they used to queue up for a senior.
  • How much review time replaces typing time.
  • How much context-loading a senior does before each task now that drafting is cheap.

None of that shows up cleanly in lead time or deploy frequency. So you get the telltale symptoms: teams feel faster but velocity charts look flat; PR review load can quietly double while lines shipped stay the same; the bottleneck migrates from writing code to verifying it, and no standard dashboard tracks verification cost at all.
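To make the "flat chart" symptom concrete, here is a minimal sketch with made-up numbers. It treats lead time as a simple sum of three phases, which is a simplification, and the `ChangeTiming` class and its fields are hypothetical names invented for this illustration. The point is only that two changes can share the same lead time while the human effort inside them is split very differently.

```python
from dataclasses import dataclass


@dataclass
class ChangeTiming:
    """Hours spent in each phase of one change (illustrative values only)."""
    authoring_hours: float
    review_hours: float
    deploy_hours: float

    @property
    def lead_time_hours(self) -> float:
        # Lead time only sees the total, not how it is split across phases.
        return self.authoring_hours + self.review_hours + self.deploy_hours

    @property
    def review_share(self) -> float:
        # Fraction of the lead time spent on human verification.
        return self.review_hours / self.lead_time_hours


# Hypothetical numbers, chosen only to make the point.
before_ai = ChangeTiming(authoring_hours=6.0, review_hours=2.0, deploy_hours=1.0)
with_ai = ChangeTiming(authoring_hours=2.0, review_hours=6.0, deploy_hours=1.0)

print(before_ai.lead_time_hours, with_ai.lead_time_hours)  # 9.0 9.0
print(round(before_ai.review_share, 2), round(with_ai.review_share, 2))  # 0.22 0.67
```

In this toy case the lead-time chart does not move at all, while the share of time spent on review roughly triples.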

The Bottleneck Moved. Your Metrics Didn't.

This is the crux. AI didn't make the work disappear; it moved the work. The expensive part used to be producing code. Increasingly the expensive part is reviewing, verifying, and integrating code that was produced fast and cheap. If your metrics still center on output volume, you're measuring the part that got easy and ignoring the part that got hard.

That's how you end up with a team that's genuinely working differently and a metrics suite that insists nothing changed. The numbers aren't lying — they're just pointed at the old bottleneck.

A Fix to Try

You don't have to throw out DORA. Keep it, and add a few measures that capture the work AI actually created:

  • Review-to-author ratio. As AI drafts more, review becomes the scarce human input. Watch how that ratio moves.
  • Time-to-merge after the first AI draft. This captures the real cost — the gap between "AI produced something" and "a human trusted it enough to ship."
  • AI output rejection rate. How often does AI-generated work get thrown away or heavily reworked? That's a direct read on quality and on wasted review cycles.

Those three numbers tell you the story DORA can't: where the verification cost is concentrating and whether your AI tooling is actually paying off or just shifting effort around.
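One way the three measures might be computed, as a rough sketch rather than a reference implementation. The `PullRequest` record and all of its field names are assumptions made up for this example; real data would have to come from your own review tooling and time tracking. Rejection is simplified here to "closed without merging", so it does not capture the heavily reworked case, and the sample values are invented.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class PullRequest:
    """Hypothetical record; field names are assumptions, not a real API."""
    author_hours: float            # human time spent prompting and editing
    review_hours: float            # human time spent reviewing and verifying
    ai_drafted: bool               # True if the first draft came from an AI tool
    first_draft_at: datetime       # when the first draft was pushed
    merged_at: Optional[datetime]  # None if the PR was closed without merging


def review_to_author_ratio(prs: list[PullRequest]) -> float:
    """Total review hours divided by total authoring hours."""
    author = sum(pr.author_hours for pr in prs)
    review = sum(pr.review_hours for pr in prs)
    return review / author if author else float("nan")


def median_hours_to_merge_after_ai_draft(prs: list[PullRequest]) -> Optional[float]:
    """Median hours from first AI draft to merge, over merged AI-drafted PRs."""
    gaps = [
        (pr.merged_at - pr.first_draft_at).total_seconds() / 3600
        for pr in prs
        if pr.ai_drafted and pr.merged_at is not None
    ]
    return median(gaps) if gaps else None


def ai_rejection_rate(prs: list[PullRequest]) -> Optional[float]:
    """Share of AI-drafted PRs that were closed without merging."""
    ai_prs = [pr for pr in prs if pr.ai_drafted]
    if not ai_prs:
        return None
    rejected = sum(1 for pr in ai_prs if pr.merged_at is None)
    return rejected / len(ai_prs)


# Invented sample data, for illustration only.
prs = [
    PullRequest(1.0, 3.0, True, datetime(2025, 1, 6, 9), datetime(2025, 1, 6, 15)),
    PullRequest(2.0, 2.0, True, datetime(2025, 1, 7, 9), None),
    PullRequest(4.0, 1.0, False, datetime(2025, 1, 8, 9), datetime(2025, 1, 8, 12)),
    PullRequest(1.0, 4.0, True, datetime(2025, 1, 9, 9), datetime(2025, 1, 9, 19)),
]

print(review_to_author_ratio(prs))                # 1.25
print(median_hours_to_merge_after_ai_draft(prs))  # 8.0
print(round(ai_rejection_rate(prs), 2))           # 0.33
```

The median is used for time-to-merge so that a single stalled PR does not dominate the number; a mean or a percentile would be an equally defensible choice depending on what you want to watch.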


The takeaway: if your team is using AI tools and your metrics haven't moved, you're not measuring the work that's actually happening. Find the new bottleneck — and point a metric at it.
