
How to Run a Codebase Health Check

Vaibhav Verma
10 min read
codebase-health · metrics · engineering-leadership · code-quality · developer-experience · framework


Last quarter, a VP asked me a question I couldn't answer: "Is our codebase healthy?" I had opinions. I had gut feelings. I didn't have data. That moment forced me to build a repeatable process for measuring codebase health, and the results surprised everyone on the leadership team.

Most engineering orgs treat codebase health like a vibe check. Someone senior says "the code is fine" or "the code is a mess," and everyone nods. That's not engineering. That's folklore. I'm going to walk you through the exact process I now run every quarter, complete with the metrics, tools, and thresholds that actually matter.

Why Codebase Health Checks Matter

Here's the contrarian take: your codebase doesn't need to be "clean." It needs to be changeable. I've seen beautifully architected codebases where every feature takes 3 weeks to ship. I've seen messy codebases where the team delivers daily. The difference isn't code aesthetics. It's whether the codebase actively resists the changes your business needs.

A codebase health check measures changeability, not beauty. If you're optimizing for anything else, you're solving the wrong problem.

The VITAL Signs Framework

After running health checks across 9 codebases over two years, I've settled on 5 dimensions that predict whether a codebase will slow your team down. I call them VITAL signs.

V - Velocity of Change

How fast can you safely make changes? Measure this with:

  • Lead time for changes: Time from first commit to production deploy. Healthy: under 24 hours. Concerning: over 1 week.
  • Deploy frequency: How often you ship to production. Healthy: daily or more. Concerning: monthly or less.
  • PR cycle time: Time from PR open to merge. Healthy: under 12 hours. Concerning: over 48 hours.

```bash
# Average PR cycle time in hours across the last 100 merged PRs
gh pr list --state merged --limit 100 --json createdAt,mergedAt \
  --jq '[.[] | ((.mergedAt | fromdateiso8601) - (.createdAt | fromdateiso8601)) / 3600] | add / length'
```

If your lead time is over a week, the codebase is fighting you. It doesn't matter how elegant the architecture looks on a whiteboard.
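
Deploy frequency is easiest to pull from your CI/CD platform, but if production releases are tagged in git you can get a rough count with no API access at all. A minimal sketch, assuming each production deploy gets a v-prefixed tag (swap in your own convention, or your deploy platform's API, if that's not how you mark deploys):

```typescript
// Rough deploy-frequency trend, assuming each production deploy is tagged "v*".
// If tags aren't how you mark deploys, pull timestamps from your CI/CD platform instead.
import { execSync } from "node:child_process";

const tagDates = execSync(
  "git tag --list 'v*' --format='%(creatordate:short)' --sort=creatordate",
  { encoding: "utf8" }
)
  .trim()
  .split("\n")
  .filter(Boolean);

// Bucket by month: "daily or more" vs "monthly or less" is obvious at a glance.
const perMonth = new Map<string, number>();
for (const date of tagDates) {
  const month = date.slice(0, 7); // YYYY-MM
  perMonth.set(month, (perMonth.get(month) ?? 0) + 1);
}
console.table([...perMonth].map(([month, deploys]) => ({ month, deploys })));
```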

I - Incident Correlation

Which parts of the codebase cause production incidents? This is the metric that gets executive attention.

```typescript
// Track change failure rate per module
interface ModuleIncidentData {
  module: string;
  deploysInPeriod: number;
  incidentsCaused: number;
  changeFailureRate: number;
}

// Real numbers from a health check I ran:
const results: ModuleIncidentData[] = [
  { module: "auth/", deploysInPeriod: 34, incidentsCaused: 8, changeFailureRate: 0.235 },
  { module: "billing/", deploysInPeriod: 21, incidentsCaused: 6, changeFailureRate: 0.286 },
  { module: "notifications/", deploysInPeriod: 45, incidentsCaused: 2, changeFailureRate: 0.044 },
];
```

A module with a change failure rate above 15% is a health hazard. Above 25%, it's an emergency.
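
To make those thresholds operational, compute and classify the rate rather than eyeballing it. A minimal sketch that reuses the numbers above (the classify helper and its labels are mine, not a standard):

```typescript
// Classify a module against the 15% / 25% change-failure-rate thresholds.
// The "hazard" / "emergency" labels are illustrative, not standard terms.
function classify(deploysInPeriod: number, incidentsCaused: number) {
  const changeFailureRate =
    deploysInPeriod === 0 ? 0 : incidentsCaused / deploysInPeriod;
  const status =
    changeFailureRate > 0.25 ? "emergency" :
    changeFailureRate > 0.15 ? "hazard" :
    "healthy";
  return { changeFailureRate, status };
}

console.log(classify(21, 6)); // billing/: ~0.286 -> "emergency"
```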

T - Test Effectiveness

Notice I said effectiveness, not coverage. Coverage is a vanity metric. I've seen codebases with 90% coverage that break constantly because the tests were testing implementation details instead of behavior.

Measure test effectiveness with:

  • Defect escape rate: What percentage of bugs make it past your test suite into production?
  • Test-to-change ratio: When you change production code, how many test files need updating? If it's more than 1:1, your tests are coupled to implementation (a rough git-based check is sketched below).
  • Mutation testing score: Tools like Stryker can tell you what percentage of intentionally introduced bugs your tests actually catch. Aim for above 60%.

```bash
# Run mutation testing to measure true test effectiveness
npx stryker run --reporters clear-text
# Look for the mutation score, not line coverage
```
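
For the test-to-change ratio, a rough approximation is to scan recent history and compare how often test files change against how often production files change. A sketch, assuming production code lives under src/ and tests follow *.test.ts / *.spec.ts naming (adjust both to your repo):

```typescript
// Rough test-to-change ratio from the last 90 days of history.
// Assumes src/ holds production code and *.test.* / *.spec.* naming for tests.
import { execSync } from "node:child_process";

const changedFiles = execSync(
  "git log --since='90 days ago' --diff-filter=M --name-only --pretty=format:",
  { encoding: "utf8" }
)
  .split("\n")
  .filter(Boolean);

const isTest = (f: string) => /\.(test|spec)\.[jt]sx?$/.test(f);
const testChanges = changedFiles.filter(isTest).length;
const prodChanges = changedFiles.filter((f) => f.startsWith("src/") && !isTest(f)).length;

console.log(`test-to-change ratio ≈ ${(testChanges / Math.max(prodChanges, 1)).toFixed(2)} : 1`);
```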

A - Architectural Clarity

Can a new engineer understand where to put things? I measure this by tracking how often PRs get "wrong location" feedback in code review. If more than 10% of PRs get this feedback, your architecture isn't communicating intent.

Other signals:

  • Circular dependencies: Use madge --circular src/ to detect them. Any circular dependency is a health issue.
  • God modules: Any module with more than 50 direct dependents is a bottleneck. It'll show up in merge conflicts and slow reviews.
  • Layering violations: Are your UI components importing from database modules? Map your intended layers and check for violations (a naive check is sketched below).
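
You don't need a dedicated tool to get a first read on layering. Here's a naive sketch that flags UI files importing from a database layer, assuming a src/ui and src/db split (the paths are purely illustrative; dependency-cruiser or madge will do this more thoroughly):

```typescript
// Naive layering check: flag files under src/ui that import from a /db/ path.
// The src/ui and src/db conventions are illustrative -- map them to your real layers.
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

function walk(dir: string): string[] {
  return readdirSync(dir, { withFileTypes: true }).flatMap((entry) =>
    entry.isDirectory() ? walk(join(dir, entry.name)) : [join(dir, entry.name)]
  );
}

const violations = walk("src/ui")
  .filter((file) => /\.[jt]sx?$/.test(file))
  .filter((file) => /from\s+['"][^'"]*\/db\//.test(readFileSync(file, "utf8")));

console.log(violations.length ? violations : "No UI -> db imports found");
```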

L - Legacy Burden

How much of your codebase is effectively frozen because nobody understands it or dares to change it?

```bash
# Find files that haven't been modified in over a year
git log --all --diff-filter=M --since="1 year ago" --name-only --pretty=format: | \
  sort -u > recently_modified.txt
git ls-files src/ | sort > all_files.txt
comm -23 all_files.txt recently_modified.txt > stale_files.txt
# Cross-reference stale_files.txt against files that are still imported/used
# (e.g. with madge or a grep over import statements)
```

In one health check, I found that 34% of the codebase hadn't been touched in 18 months but was still actively referenced. That's 34% of code that's effectively unmaintainable because nobody on the current team wrote it or understands it.

Running the Health Check: Step by Step

Step 1: Gather Automated Metrics (2-4 hours)

Pull data from your existing tools. You don't need anything new.

| Metric | Source | Command/Query |
|---|---|---|
| PR cycle time | GitHub/GitLab API | See script above |
| Deploy frequency | CI/CD platform | Count deploys per week |
| Change failure rate | Incident tracker + git | Correlate incidents to commits |
| Test mutation score | Stryker/PIT | Run mutation testing |
| Circular deps | madge | madge --circular src/ |
| Code ownership gaps | git-fame or git-of-theseus | See below |

```bash
# Identify ownership gaps: list the most recent author of each file in src/,
# then compare the emails against your current team roster
git ls-files src/ | while read -r file; do
  printf '%s\t%s\n' "$(git log -1 --format='%ae' -- "$file")" "$file"
done | sort
```

Step 2: Run Developer Surveys (1 day)

Automated metrics miss the human side. Ask your team 5 questions, rated 1-5:

  1. "I can confidently make changes to any part of the codebase" (measures knowledge distribution)
  2. "Our test suite catches bugs before production" (measures test confidence)
  3. "I know where to put new code" (measures architectural clarity)
  4. "I can understand code written by other team members" (measures readability)
  5. "Our development environment rarely blocks me" (measures tooling health)

Average score below 3 on any question is a red flag. Below 2 is a five-alarm fire.
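
Scoring is deliberately simple: average each question across respondents and apply those two thresholds. A minimal sketch (the question keys and responses below are made up):

```typescript
// Average each 1-5 survey question and flag the thresholds from above.
// Question keys and responses are illustrative.
const responses: Record<string, number>[] = [
  { confidence: 2, testTrust: 4, clarity: 3, readability: 4, tooling: 5 },
  { confidence: 1, testTrust: 3, clarity: 2, readability: 4, tooling: 4 },
  { confidence: 2, testTrust: 4, clarity: 3, readability: 3, tooling: 5 },
];

for (const question of Object.keys(responses[0])) {
  const avg = responses.reduce((sum, r) => sum + r[question], 0) / responses.length;
  const flag = avg < 2 ? "five-alarm fire" : avg < 3 ? "red flag" : "ok";
  console.log(`${question}: ${avg.toFixed(1)} (${flag})`);
}
```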

Step 3: Correlate and Prioritize (2 hours)

The magic happens when you cross-reference automated data with survey results. In my last health check:

  • Automated metrics showed billing/ had a 28.6% change failure rate
  • Developer surveys showed billing/ scored 1.8 on "confident making changes"
  • Git analysis showed the original billing author left 14 months ago

That's not three separate problems. That's one problem with three symptoms: knowledge loss. The fix wasn't rewriting billing code. It was pairing sessions and documentation sprints.
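
The cross-referencing itself can be mechanical: flag any module where the automated signal and the survey signal agree. A sketch, assuming you have a per-module change failure rate and a per-module score on the "confident making changes" question (the field names, and every survey number except billing/'s 1.8, are illustrative):

```typescript
// Flag modules where automated metrics AND survey data agree there's a problem.
// Thresholds mirror the article: CFR above 15%, confidence below 3.
interface ModuleSignals {
  module: string;
  changeFailureRate: number; // from incident correlation
  confidenceScore: number;   // from the developer survey, 1-5
}

const signals: ModuleSignals[] = [
  { module: "auth/", changeFailureRate: 0.235, confidenceScore: 2.9 },
  { module: "billing/", changeFailureRate: 0.286, confidenceScore: 1.8 },
  { module: "notifications/", changeFailureRate: 0.044, confidenceScore: 3.6 },
];

const highConviction = signals.filter(
  (s) => s.changeFailureRate > 0.15 && s.confidenceScore < 3
);
console.log(highConviction.map((s) => s.module)); // [ "auth/", "billing/" ]
```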

Step 4: Build the Health Report

Present findings as a one-page scorecard:

```
CODEBASE HEALTH REPORT - Q2 2026
=================================
Overall Health Score: 62/100 (Needs Attention)

VITAL Signs:
  V - Velocity of Change:    7/10  (Lead time: 18 hours)
  I - Incident Correlation:  4/10  (2 modules above 20% CFR)
  T - Test Effectiveness:    6/10  (Mutation score: 54%)
  A - Architectural Clarity: 8/10  (Low circular deps)
  L - Legacy Burden:         5/10  (28% frozen code)

Top 3 Action Items:
  1. Pair programming rotation for billing/ module
  2. Add mutation testing to CI for auth/ module
  3. Schedule architecture decision records (ADR) sprint
```

The Stealable Framework: Quarterly VITAL Check

Here's the process you can copy directly:

  1. Week 1, Day 1: Run automated metric collection scripts (save them; they're reusable)
  2. Week 1, Days 2-3: Send developer survey (5 questions, anonymous, takes 3 minutes)
  3. Week 1, Day 4: Cross-reference data, build scorecard
  4. Week 1, Day 5: Present to engineering leadership with 3 prioritized action items
  5. Weeks 2-12: Execute on action items, track improvement

The key insight that took me too long to learn: don't try to fix everything. Pick the 3 items where automated metrics AND developer surveys agree there's a problem. Those are your highest-conviction bets.

What Good Looks Like

After four quarters of running VITAL checks, the team I worked with went from a health score of 48 to 79. Lead time dropped from 6 days to 14 hours. Change failure rate went from 22% to 7%. But the number I'm most proud of is the developer survey: "I can confidently make changes to any part of the codebase" went from 2.1 to 3.8.

That's what a healthy codebase feels like. Not perfect code. Confident engineers.

Your codebase is talking to you through metrics, incident reports, and the frustration on your team's faces. A health check is just learning to listen systematically. Start this quarter. You'll wonder why you waited.
