Technical Debt Metrics That Actually Matter
For two years, I tracked technical debt using SonarQube's "technical debt ratio." Every month I'd present the number to leadership. Every month they'd nod politely and move on. The metric meant nothing to them because it meant almost nothing to me either.
Then I switched to metrics that connected directly to outcomes people cared about: shipping speed, incident frequency, and engineering retention. Everything changed.
Here's the thing most articles about tech debt metrics get wrong: they treat measurement as an end in itself. A number on a dashboard doesn't reduce debt. A metric only matters if it changes behavior. Every metric below has a clear action trigger attached to it.
The Metrics That Don't Work
Let me save you some time. These metrics sound good in blog posts but fail in practice:
Lines of code. More code isn't necessarily more debt. A 500-line file with clear abstractions beats a 100-line file with three responsibilities crammed together.
SonarQube debt ratio in isolation. It measures code smells and rule violations. Useful as one input, useless as your primary metric. I've seen codebases with an "A" rating that were miserable to work in because the architecture was broken, not the syntax.
Code coverage percentage. 90% coverage with bad tests is worse than 60% coverage with good tests. Coverage measures quantity of testing, not quality.
Cyclomatic complexity averages. Averaging complexity across a codebase hides the hotspots. A codebase with an average complexity of 5 might have one module at 45 that causes all the pain.
The 7 Metrics That Actually Drive Action
1. Developer Velocity Delta
This is my single most important metric. It measures how much slower your team is compared to what they could be.
Velocity Delta = (Time on Feature Work) / (Time on Feature Work + Time Fighting Debt)
Track it weekly by asking developers a simple question in standup or async: "How many hours this week did you spend on work that would have been unnecessary if the codebase were in better shape?"
This includes time spent:
- Working around known issues
- Debugging failures caused by brittle code
- Manually testing because automated tests are unreliable
- Waiting for slow builds or deploys
- Onboarding struggles due to unclear code
Action trigger: When velocity delta drops below 70% (meaning 30%+ of engineering time is debt-related), you have a business case for dedicated remediation.
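If you want to automate the roll-up, here's a minimal sketch of how those weekly answers might feed the formula. The interface and field names are mine, invented for illustration; the only real inputs are hours lost to debt and total hours worked.

```typescript
// Hypothetical weekly self-report: hours lost to debt, per engineer.
interface WeeklyReport {
  engineer: string;
  hoursFightingDebt: number; // workarounds, flaky tests, slow builds, etc.
  hoursWorked: number;       // total engineering hours that week
}

// Velocity Delta = feature time / (feature time + debt time)
function velocityDelta(reports: WeeklyReport[]): number {
  const totalWorked = reports.reduce((sum, r) => sum + r.hoursWorked, 0);
  const totalDebt = reports.reduce((sum, r) => sum + r.hoursFightingDebt, 0);
  return (totalWorked - totalDebt) / totalWorked;
}

const week: WeeklyReport[] = [
  { engineer: "A", hoursFightingDebt: 14, hoursWorked: 40 },
  { engineer: "B", hoursFightingDebt: 6, hoursWorked: 40 },
];
console.log(velocityDelta(week).toFixed(2)); // 0.75 -- above the 70% trigger, but worth watching
```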
2. Hotspot Churn Rate
Not all complex code is problematic. Only complex code that changes frequently matters.
```bash
# Find files with highest churn in last 6 months
git log --since="6 months ago" --name-only --pretty=format: | \
  sort | uniq -c | sort -rn | head -20
```

Cross-reference this with complexity (measured via your static analysis tool). Files that are both complex AND frequently changed are your top priorities.
| File | Changes (6mo) | Complexity | Priority |
|---|---|---|---|
| src/payments/processor.ts | 47 | High | CRITICAL |
| src/auth/session.ts | 38 | High | CRITICAL |
| src/api/routes.ts | 52 | Medium | HIGH |
| src/utils/helpers.ts | 31 | Low | MEDIUM |
| src/db/migrations.ts | 28 | Low | LOW |
Action trigger: Any file with >30 changes in 6 months AND high complexity gets added to the debt register immediately.
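Here's a rough sketch of the cross-referencing step, assuming you've exported churn counts from the git pipeline above and a per-file complexity rating from your static analysis tool. The thresholds and labels mirror the table; adjust them to your codebase.

```typescript
type Complexity = "Low" | "Medium" | "High";

interface FileHotspot {
  path: string;
  changesLast6Months: number; // from the git log pipeline above
  complexity: Complexity;     // from your static analysis tool
}

// Priority rule from the table: high churn + high complexity = CRITICAL.
function priority(f: FileHotspot): "CRITICAL" | "HIGH" | "MEDIUM" | "LOW" {
  const highChurn = f.changesLast6Months > 30;
  if (highChurn && f.complexity === "High") return "CRITICAL";
  if (highChurn && f.complexity === "Medium") return "HIGH";
  if (highChurn) return "MEDIUM";
  return "LOW";
}

// Example: src/payments/processor.ts with 47 changes and high complexity -> CRITICAL.
console.log(priority({ path: "src/payments/processor.ts", changesLast6Months: 47, complexity: "High" }));
```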
3. Change Lead Time by Module
How long does it take from "start working on a change" to "change is in production," broken down by area of the codebase?
```typescript
interface ModuleLeadTime {
  module: string;
  avgLeadTimeHours: number;
  p95LeadTimeHours: number;
  baselineHours: number; // what it should take
  overhead: number;      // avgLeadTime - baseline
}

// Real numbers from a project:
const modules: ModuleLeadTime[] = [
  { module: "user-profile", avgLeadTimeHours: 8, p95LeadTimeHours: 16, baselineHours: 6, overhead: 2 },
  { module: "payments", avgLeadTimeHours: 34, p95LeadTimeHours: 72, baselineHours: 8, overhead: 26 },
  { module: "notifications", avgLeadTimeHours: 12, p95LeadTimeHours: 24, baselineHours: 6, overhead: 6 },
];

// payments module: 26 hours of overhead per change. That's the cost of debt.
```

Action trigger: Any module where average lead time is >3x baseline gets a root cause analysis within two weeks.
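To make that trigger mechanical, a small check over the same array works; the 3x multiplier is the one from the trigger above.

```typescript
// Flag modules whose average lead time exceeds 3x their baseline.
function modulesNeedingRootCause(data: ModuleLeadTime[]): ModuleLeadTime[] {
  return data.filter((m) => m.avgLeadTimeHours > 3 * m.baselineHours);
}

// With the numbers above, only "payments" (34h against an 8h baseline) gets flagged.
console.log(modulesNeedingRootCause(modules).map((m) => m.module));
```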
4. Incident Density by Component
Which parts of your system cause the most production incidents per change?
Incident Density = Incidents / Deployments (per component)
This tells you where your riskiest code lives. An incident density of 0.25 means one in four deployments to that component causes an issue.
Action trigger: Incident density >0.15 triggers mandatory remediation planning. Above 0.3, freeze non-critical changes to that component until the root cause is fixed.
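A sketch of the same calculation, with the 0.15 and 0.3 thresholds from the trigger baked in (the component data here is made up):

```typescript
interface ComponentStats {
  component: string;
  deployments: number;
  incidents: number; // production incidents attributed to this component
}

type RiskLevel = "OK" | "REMEDIATION_PLAN" | "CHANGE_FREEZE";

// Incident Density = incidents / deployments, classified against the action triggers.
function classify(stats: ComponentStats): { density: number; level: RiskLevel } {
  const density = stats.incidents / stats.deployments;
  if (density > 0.3) return { density, level: "CHANGE_FREEZE" };
  if (density > 0.15) return { density, level: "REMEDIATION_PLAN" };
  return { density, level: "OK" };
}

// Example: 4 incidents across 16 deployments -> density 0.25 -> remediation planning.
console.log(classify({ component: "payments", deployments: 16, incidents: 4 }));
```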
5. Build and CI Pipeline Duration
Track this as a trend line, not a point-in-time number. Build time creep is one of the most reliable early indicators of accumulating debt.
```
Week 1:  CI pipeline: 12 min
Week 8:  CI pipeline: 18 min
Week 16: CI pipeline: 27 min
Week 24: CI pipeline: 41 min   <-- you waited too long
```
Every minute added to your CI pipeline is multiplied by the number of developers and the number of daily pushes. A 10-minute increase with 15 developers pushing 3 times a day costs 7.5 hours of wait time daily.
Action trigger: >15% increase in CI time over any 8-week period triggers investigation.
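Both the wait-time math and the 15% trigger are easy to check against your CI provider's duration history. A rough sketch, using the numbers from the example above:

```typescript
// Daily wait-time cost of a slower pipeline.
function dailyWaitCostHours(addedMinutes: number, developers: number, pushesPerDev: number): number {
  return (addedMinutes * developers * pushesPerDev) / 60;
}
console.log(dailyWaitCostHours(10, 15, 3)); // 10 extra minutes, 15 devs, 3 pushes/day each -> 7.5 hours

// Trigger: >15% increase in CI duration over any 8-week window.
function ciTimeTrigger(durationsByWeek: number[]): boolean {
  for (let i = 0; i + 8 < durationsByWeek.length; i++) {
    if (durationsByWeek[i + 8] > durationsByWeek[i] * 1.15) return true;
  }
  return false;
}
```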
6. Onboarding Time
How long does it take a new engineer to ship their first meaningful PR? Track this for every new hire.
If your onboarding time is increasing, your codebase is getting harder to understand. That's debt.
| Hire | Start Date | First Meaningful PR | Days |
|---|---|---|---|
| Engineer A | Jan 2025 | Jan 12, 2025 | 8 |
| Engineer B | Apr 2025 | Apr 22, 2025 | 15 |
| Engineer C | Aug 2025 | Sep 5, 2025 | 21 |
| Engineer D | Jan 2026 | Jan 29, 2026 | 20 |
That trend (8, 15, 21, 20 days) tells a story. The codebase is getting harder to work with.
Action trigger: When onboarding time exceeds 2 weeks consistently, invest in documentation, code simplification, or both.
7. Dependency Staleness Index
How far behind are your dependencies, and how much risk does each gap carry?

```typescript
interface DependencyHealth {
  name: string;
  currentVersion: string;
  latestVersion: string;
  majorVersionsBehind: number;
  lastUpdated: string;
  hasKnownVulnerabilities: boolean;
  riskScore: number; // 0-10
}

// Risk scoring:
// 1 major version behind = 2 points
// Each additional major = 3 points
// Known vulnerability = 5 points
// Last updated >2 years ago = 3 points
```

I was wrong about this metric for a long time. I used to think minor version updates didn't matter. Then a transitive dependency three levels deep had a critical CVE, and the fix required updating through two major versions of a direct dependency. We spent a week on what should have been a 1-hour security patch.
Action trigger: Any dependency with a risk score >7 gets scheduled for update within 30 days.
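The scoring rubric in the comments translates directly into code. A sketch, assuming the score caps at 10 to match the 0-10 range (the function name is mine):

```typescript
// Turn the rubric into a 0-10 score, per the comments above.
function computeRiskScore(dep: Omit<DependencyHealth, "riskScore">, now = new Date()): number {
  let score = 0;
  if (dep.majorVersionsBehind >= 1) score += 2;                                 // first major behind
  if (dep.majorVersionsBehind > 1) score += 3 * (dep.majorVersionsBehind - 1);  // each additional major
  if (dep.hasKnownVulnerabilities) score += 5;                                  // known vulnerability
  const twoYearsMs = 2 * 365 * 24 * 60 * 60 * 1000;
  if (now.getTime() - new Date(dep.lastUpdated).getTime() > twoYearsMs) score += 3; // stale
  return Math.min(score, 10); // assumption: cap at 10 to stay in the 0-10 range
}

// Example: 3 majors behind plus a known CVE = 2 + 6 + 5 = 13, capped to 10 -- well past the >7 trigger.
```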
Building Your Debt Metrics Dashboard
Don't measure everything at once. Start with three metrics:
- Developer Velocity Delta (most important, easiest to start)
- Hotspot Churn Rate (most actionable)
- Change Lead Time by Module (most persuasive to leadership)
Add the others as you build the habit of reviewing and acting on debt data.
The Weekly Debt Standup
Every Friday, spend 15 minutes reviewing:
- Did any metric cross an action trigger this week?
- What's the trend over the last 4 weeks?
- Are our remediation efforts showing results?
This cadence is more valuable than any tool. Consistency of attention beats sophistication of measurement.
The Contrarian Take
Most teams measure too much and act too little. I'd rather have one metric tracked religiously with clear action triggers than a dashboard of 20 metrics that nobody looks at.
If you only track one thing, track Developer Velocity Delta. When engineers are spending 30% of their time fighting the codebase instead of building features, the business case writes itself. Everything else is supporting evidence.
Stop collecting data. Start making decisions.