
Behavioral Code Analysis: Mining Patterns From How Teams Work

Vaibhav Verma
8 min read
behavioral analysis, code intelligence, team health, engineering management, knowledge management


Code doesn't write itself. Humans write it. And the patterns those humans follow (when they commit, how they review, where they concentrate their effort) reveal more about a codebase's future than any static metric.

I got hooked on behavioral code analysis after reading Adam Tornhill's "Your Code as a Crime Scene" in 2017. His central thesis was elegant: apply forensic psychology techniques to codebases. Instead of analyzing what the code looks like, analyze what the people who write it do. Since then, I've applied these techniques to every team I've led, and the results have been consistently more actionable than any SonarQube dashboard.

What Behavioral Code Analysis Is

Behavioral code analysis examines the patterns of human activity around a codebase. It uses data from:

  • Version control: Who committed what, when, and how often
  • Code review systems: Who reviews what, how long reviews take, how many iterations
  • Issue trackers: Which commits link to bugs vs features
  • Communication tools: Which code discussions happen on Slack vs. in PRs (in more advanced implementations)

The goal is to answer questions that structural analysis can't:

  • Which parts of the system are understood by only one person?
  • Where do knowledge silos exist?
  • Which modules have the highest social complexity (require coordinating many people to change)?
  • What patterns predict production incidents?

Pattern 1: Knowledge Silos

A knowledge silo exists when one developer is the sole expert on a critical piece of code. This is the single most dangerous pattern I find in behavioral analysis, and it exists in every team I've worked with.

Here's how I detect it:

bash
# For each file changed in the last year, count unique authors
git log --since="12 months ago" --format='AUTHOR:%an' --name-only -- '*.ts' | \
  awk '/^AUTHOR:/ {author = substr($0, 8); next}   # remember the commit author
       $0 != ""   {print $0 "\t" author}' | \
  sort -u | cut -f1 | sort | uniq -c | sort -n
  # (simplified - actual implementation uses a Python script)

In practice, I use a Python script that produces a matrix: files as rows, developers as columns, commit counts as values. Files where a single developer accounts for more than 70% of commits in the last 12 months are knowledge silos.
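
Here's a minimal sketch of that kind of script. The `find_knowledge_silos` name, the five-commit floor, and the default 70% threshold are illustrative choices, not a drop-in copy of my production tooling:

python
import subprocess
from collections import defaultdict

def find_knowledge_silos(repo_path=".", threshold=0.7, months=12):
    """Flag files where one author made more than `threshold` of recent commits."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={months} months ago",
         "--format=AUTHOR:%an", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout

    # counts[file][author] -> commits by that author touching that file
    counts = defaultdict(lambda: defaultdict(int))
    author = None
    for line in log.splitlines():
        if line.startswith("AUTHOR:"):
            author = line[len("AUTHOR:"):]
        elif line.strip():
            counts[line][author] += 1

    silos = []
    for path, by_author in counts.items():
        total = sum(by_author.values())
        top_author, top_commits = max(by_author.items(), key=lambda kv: kv[1])
        if total >= 5 and top_commits / total > threshold:  # ignore rarely-touched files
            silos.append((path, top_author, top_commits / total, total))
    return sorted(silos, key=lambda s: -s[2])

for path, author, share, total in find_knowledge_silos()[:20]:
    print(f"{share:5.0%}  {total:3d} commits  {path}  ({author})")

Anything near the top of that list that sits on a revenue-critical path belongs on your knowledge-spreading plan.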

At one company, we found that 40% of the modules responsible for revenue-critical features had a bus factor of 1. A single senior developer's departure would have left those modules without anyone who deeply understood them. We immediately started a knowledge-spreading program: pair programming rotations, recorded architecture walkthroughs, and deliberate review assignments.

Within 6 months, we'd reduced single-author-dominant modules from 40% to 15%.

Pattern 2: Temporal Coupling

I covered the mechanics of temporal coupling in the git history article, but the behavioral insight goes deeper.

When two files consistently change together, there are two possible explanations:

  1. Legitimate architectural coupling: The files genuinely need to change together (e.g., a component and its styles, a handler and its types).
  2. Accidental coupling: The files change together because of a hidden dependency that shouldn't exist.

The behavioral signal that distinguishes them: who makes the changes?

If the same developer always changes both files in the same commit, it might be legitimate coupling. They know the relationship and handle it intentionally.

If different developers change the files in separate commits (and the second change is often a bug fix for the first), that's accidental coupling. The relationship isn't visible in the code, and developers who don't know about it break things.

I track this by analyzing commit sequences:

bash
# First pass: list recent bug-fix commits by message keyword
# (linking them back to prior changes requires commits that reference issue numbers)
git log --oneline --all | grep -iE "fix|bug|hotfix" | head -20

A more sophisticated approach links issue tracker data to commits and identifies which files are most often changed in bug-fix commits that follow changes to other files. This reveals causal coupling: "When file A changes, file B tends to break."
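
As a rough sketch of that idea, the script below uses commit-message keywords as a stand-in for real issue-tracker links and a fixed commit window to approximate "follows". Both are simplifications, and the `causal_coupling` name and window size are illustrative:

python
import subprocess
from collections import defaultdict, deque

def causal_coupling(repo_path=".", window=10, since="12 months ago"):
    """Count how often a fix touching file B follows a recent change to file A."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--reverse", f"--since={since}",
         "--format=COMMIT:%s", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout

    # Parse into (subject, [files]) tuples, oldest commit first.
    commits, subject, files = [], None, []
    for line in log.splitlines():
        if line.startswith("COMMIT:"):
            if subject is not None:
                commits.append((subject, files))
            subject, files = line[len("COMMIT:"):], []
        elif line.strip():
            files.append(line)
    if subject is not None:
        commits.append((subject, files))

    recent = deque(maxlen=window)   # files touched by the last `window` commits
    pairs = defaultdict(int)        # (changed_file, broken_file) -> count
    for subject, files in commits:
        is_fix = any(word in subject.lower() for word in ("fix", "bug", "hotfix"))
        if is_fix:
            for earlier in set(f for fs in recent for f in fs):
                for broken in files:
                    if broken != earlier:
                        pairs[(earlier, broken)] += 1
        recent.append(files)
    return sorted(pairs.items(), key=lambda kv: -kv[1])

for (changed, broken), n in causal_coupling()[:15]:
    print(f"{n:3d}  change to {changed}  ->  fix in {broken}")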

Pattern 3: Review Bottlenecks

Code review patterns reveal organizational bottlenecks that don't show up in any code metric.

Questions to investigate:

  • Who reviews the most PRs? If one person reviews 40% of all PRs, they're a bottleneck and a burnout risk.
  • Which PRs take the longest to merge? If PRs touching module X take 3x longer than average, that module might have unclear ownership or excessive complexity.
  • What's the review-to-rework ratio? If a developer's PRs consistently require 3+ review cycles, they might need mentoring on that part of the codebase.

Most code review platforms (GitHub, GitLab, Azure DevOps) expose this data through their APIs. A simple script that queries the API weekly and tracks these metrics over time gives you visibility into your review process health.

bash
# GitHub CLI: list PR review times for the last 30 days
gh pr list --state merged --limit 50 --json number,createdAt,mergedAt,reviewDecision | \
  jq '.[] | {number, days_to_merge: (((.mergedAt | fromdateiso8601) - (.createdAt | fromdateiso8601)) / 86400 | floor)}'
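
For the reviewer-concentration question, here's a sketch that tallies reviews per person through the REST API (via `gh api`, so it reuses the CLI's auth). The `your-org/your-repo` slug and the single 100-PR page are placeholders you'd replace with real values and pagination:

python
import json
import subprocess
from collections import Counter

REPO = "your-org/your-repo"   # placeholder; set to your repository

def gh_api(path):
    """Call the GitHub REST API through the gh CLI, reusing its auth."""
    out = subprocess.run(["gh", "api", path], capture_output=True, text=True, check=True)
    return json.loads(out.stdout)

# The 100 most recently created closed PRs (first page of the REST endpoint).
prs = gh_api(f"repos/{REPO}/pulls?state=closed&per_page=100")

reviews_per_person = Counter()
for pr in prs:
    for review in gh_api(f"repos/{REPO}/pulls/{pr['number']}/reviews"):
        user = review.get("user") or {}
        reviews_per_person[user.get("login", "(deleted user)")] += 1

total = sum(reviews_per_person.values()) or 1
for login, n in reviews_per_person.most_common(10):
    print(f"{login:20s} {n:4d} reviews  ({n / total:.0%} of all reviews)")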

Pattern 4: Coordination Costs

Conway's Law says that systems mirror the communication structures of the organizations that build them. Behavioral code analysis lets you see Conway's Law in action.

Social complexity measures how many different teams or individuals need to coordinate to change a module. High social complexity means changes require cross-team communication, scheduling, and sign-off, all of which slow delivery and increase error rates.

I measure it by counting the number of distinct authors per module per quarter. Modules where 5+ developers from 3+ teams make regular changes have high social complexity. These modules tend to have:

  • Higher defect rates (3-4x average, in my experience)
  • Longer PR cycle times (2-3x average)
  • More merge conflicts
  • More "this broke when I changed that" incidents

The fix is usually organizational, not technical. Either refactor the module along team boundaries or change the team structure so one team owns the entire module.
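
Here's a rough sketch of that measurement. It treats the first two path components as the "module" and flags anything with five or more recent authors; mapping authors to teams needs a separate roster lookup, which this skips:

python
import subprocess
from collections import defaultdict

def authors_per_module(repo_path=".", since="3 months ago", depth=2):
    """Count distinct commit authors per module over roughly one quarter."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}",
         "--format=AUTHOR:%an", "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout

    authors = defaultdict(set)
    current = None
    for line in log.splitlines():
        if line.startswith("AUTHOR:"):
            current = line[len("AUTHOR:"):]
        elif line.strip():
            # "Module" here is just the first two path components, e.g. src/billing
            module = "/".join(line.split("/")[:depth])
            authors[module].add(current)
    return sorted(authors.items(), key=lambda kv: -len(kv[1]))

for module, people in authors_per_module()[:15]:
    flag = "  <- high social complexity" if len(people) >= 5 else ""
    print(f"{len(people):2d} authors  {module}{flag}")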

Pattern 5: Development Rhythm

The cadence of commits reveals how your team actually works:

  • Spike-and-silence pattern: Intense activity followed by quiet periods. Common when teams don't break work into small increments. Leads to large, risky PRs.
  • Steady rhythm: Consistent daily commits. Indicates well-scoped work and good development habits.
  • Friday afternoon surge: A disproportionate number of commits on Friday afternoon. Often means developers are rushing to complete sprint work. These commits have a higher-than-average defect rate in every dataset I've analyzed.
  • After-hours commits: Regular commits at 10 PM or on weekends. Signals burnout or unrealistic deadlines.

bash
# Commit distribution by hour and day
git log --since="6 months ago" --format='%ad' --date=format:'%u %H' | \
  awk '{day=$1; hour=$2; count[day" "hour]++} END {for (k in count) print count[k], k}' | \
  sort -k2,2n -k3,3n
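
To turn that raw distribution into the signals above, a short follow-up script can report the share of Friday-afternoon and evening/weekend commits. The cutoffs here (after 2 PM Friday, before 7 AM or after 8 PM, plus weekends) are illustrative; pick ones that match your team's working hours:

python
import subprocess

# Commit timestamps as "ISO weekday (1=Mon..7=Sun) hour", matching the git command above.
stamps = subprocess.run(
    ["git", "log", "--since=6 months ago", "--format=%ad", "--date=format:%u %H"],
    capture_output=True, text=True, check=True,
).stdout.split()

pairs = list(zip(stamps[::2], (int(h) for h in stamps[1::2])))   # (weekday, hour)
total = len(pairs) or 1

friday_pm = sum(1 for d, h in pairs if d == "5" and h >= 14)
after_hours = sum(1 for d, h in pairs if d in ("6", "7") or h >= 20 or h < 7)

print(f"Friday-afternoon commits: {friday_pm / total:.0%}")
print(f"Evening/weekend commits:  {after_hours / total:.0%}")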

The Framework: PEOPLE Analysis

P - Patterns: Mine your git log and PR data for the five patterns above: knowledge silos, temporal coupling, review bottlenecks, coordination costs, and development rhythm.

E - Evaluate risk: For each pattern, assess the risk. A knowledge silo in a logging module is less dangerous than one in the billing engine.

O - Overlay with structural data: Cross-reference behavioral patterns with structural metrics. Knowledge silos in high-complexity modules are your top priority. Review bottlenecks on high-churn files are your second priority.

P - Plan interventions: Design specific actions for each high-risk finding. Pair programming assignments, review rotations, module refactoring, team restructuring.

L - Loop: Re-run the analysis quarterly. Track whether your interventions are working. Behavioral patterns are slow to change, so don't expect overnight results.

E - Expand: Once you've addressed the highest-risk patterns, expand your analysis to cover more of the codebase. Start with your revenue-critical paths and work outward.

The Contrarian Take: Code Quality Is a People Problem

The uncomfortable truth that behavioral code analysis reveals is that most "code quality" problems are actually people problems. The code is messy because the team is under pressure. The architecture is tangled because the team structure doesn't match the system structure. The bugs keep recurring because knowledge is concentrated in one person.

You can run every linter and static analyzer on the market, and you still won't catch these issues. They live in the space between the code and the humans who write it.

Behavioral code analysis bridges that gap. It takes the human dimension of software development, the part we usually address with retrospectives and gut feelings, and turns it into data.

Data doesn't replace good management. But it does replace guessing with knowing. And that changes everything.
