codeintelligently

Monorepo vs Polyrepo: Impact on Understanding

Vaibhav Verma
11 min read
monorepo, polyrepo, codebase-understanding, architecture, engineering-leadership, developer-experience

The monorepo vs. polyrepo debate has been going on for years, and most of it misses the point. Teams argue about build tooling, CI performance, and dependency management. Those are real concerns. But the factor that actually determines whether your team ships or stalls is something nobody benchmarks: how well engineers understand the code they're working with.

I've managed teams on both sides. I ran a 140-service polyrepo setup at one company and migrated a 50-engineer team to a monorepo at another. The technical tradeoffs are well-documented. What isn't documented is how each approach changes the way engineers think about, discover, and reason about code.

The Understanding Gap Nobody Measures

Here's my contrarian take: the best repo strategy is the one that minimizes the time between "I need to change something" and "I understand what to change." Everything else is secondary.

I tracked this metric informally across both setups. In the polyrepo world, engineers spent an average of 47 minutes just finding the right code before they could start making changes. In the monorepo, that dropped to 11 minutes. That's 36 minutes of cognitive overhead per task, multiplied by every engineer, every day.
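
To see how that per-task gap compounds, here is a back-of-the-envelope sketch. The 47 and 11 minute figures are the ones above; the team size and tasks-per-day inputs are hypothetical placeholders, so substitute your own numbers.

```typescript
// Back-of-the-envelope cost of discovery overhead.
// The per-task minutes come from the figures above; `engineers` and
// `tasksPerEngineerPerDay` are hypothetical inputs, not measured data.
function dailyDiscoveryOverheadHours(
  minutesPerTaskBefore: number,
  minutesPerTaskAfter: number,
  engineers: number,
  tasksPerEngineerPerDay: number
): number {
  const savedPerTask = minutesPerTaskBefore - minutesPerTaskAfter;
  return (savedPerTask * engineers * tasksPerEngineerPerDay) / 60;
}

// 36 minutes per task, 50 engineers, one task each per day (assumed).
const overheadHours = dailyDiscoveryOverheadHours(47, 11, 50, 1);
console.log(overheadHours); // 30
```

Under those assumptions the gap adds up to 30 engineer-hours per day, which is why the metric deserves attention even when it is only tracked informally.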

But it's not that simple. The monorepo had its own understanding tax that showed up in different ways. Let me break down what I observed across both models.

How Polyrepos Fragment Knowledge

In a polyrepo setup, each service or library lives in its own repository. The pitch is clean separation of concerns. The reality is fragmented understanding.

Problem 1: The Discovery Tax

When an engineer needs to understand how user authentication works end-to-end, they might need to look at:

  • auth-service repo (handles login)
  • user-service repo (manages profiles)
  • api-gateway repo (routes and validates tokens)
  • shared-types repo (TypeScript interfaces)
  • auth-sdk repo (client library)

That's 5 repos to clone, 5 README files to read, 5 different project structures to learn. Each repo has its own conventions, its own folder structure, its own way of doing things.

bash
# The polyrepo discovery workflow
git clone git@github.com:company/auth-service.git
git clone git@github.com:company/user-service.git
git clone git@github.com:company/api-gateway.git
git clone git@github.com:company/shared-types.git
git clone git@github.com:company/auth-sdk.git
# Now figure out which versions are compatible...
# Now figure out which branch each team is working from...
# Now figure out where the actual logic lives in each one...

Problem 2: Version Confusion

In one polyrepo setup, I found that the shared-types package had 14 different versions in use across 23 services. Engineers were building against different contracts without knowing it. Bugs that should've been caught by TypeScript sailed through because each service compiled against its own snapshot of the types.
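
One way to surface this kind of drift is to group services by the version of a shared package they declare. The sketch below is a minimal illustration, assuming you have already collected each service's dependency map (for example from its package.json). The `ServiceManifest` shape, the function name, and the version numbers are all hypothetical.

```typescript
// Hypothetical shape: one entry per service, holding the dependency map
// you would read from that service's package.json.
interface ServiceManifest {
  service: string;
  dependencies: { [pkg: string]: string };
}

// Group services by the version of `pkg` they depend on.
function versionsInUse(
  manifests: ServiceManifest[],
  pkg: string
): Map<string, string[]> {
  const byVersion = new Map<string, string[]>();
  for (const manifest of manifests) {
    const version = manifest.dependencies[pkg];
    if (version === undefined) continue; // service does not use the package
    const services = byVersion.get(version) || [];
    services.push(manifest.service);
    byVersion.set(version, services);
  }
  return byVersion;
}

// Illustrative data only; the version numbers are made up.
const manifests: ServiceManifest[] = [
  { service: "auth-service", dependencies: { "shared-types": "2.3.0" } },
  { service: "user-service", dependencies: { "shared-types": "2.1.4" } },
  { service: "api-gateway", dependencies: { "shared-types": "2.3.0" } },
];

const drift = versionsInUse(manifests, "shared-types");
console.log(drift.size); // 2 distinct versions across 3 services
```

Any result with more than one key means at least two services are compiling against different contracts.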

Problem 3: The Tribal Knowledge Problem

Each repo develops its own culture. The auth team uses a result pattern for errors. The billing team throws exceptions. The notifications team uses event-driven error propagation. A new engineer joining the company has to learn 3 different paradigms to understand 3 services.
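
To make the difference concrete, here is a small sketch of the same validation written in each of the three styles. The function names and the parsing task are invented for illustration; they are not taken from any of the services described above.

```typescript
// All names here are hypothetical; each function parses the same input
// but reports failure in a different style.
function isValidAmount(input: string): boolean {
  return input.trim() !== "" && !isNaN(Number(input));
}

// Style 1: result pattern. Failure is a value the caller has to inspect.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

function parseAmountResult(input: string): Result<number> {
  if (!isValidAmount(input)) {
    return { ok: false, error: "invalid amount: " + input };
  }
  return { ok: true, value: Number(input) };
}

// Style 2: exceptions. Failure unwinds the stack until something catches it.
function parseAmountOrThrow(input: string): number {
  if (!isValidAmount(input)) {
    throw new Error("invalid amount: " + input);
  }
  return Number(input);
}

// Style 3: event-driven. Failure is published to whoever subscribed.
const errorListeners: Array<(message: string) => void> = [];

function parseAmountWithEvents(input: string): number | undefined {
  if (!isValidAmount(input)) {
    errorListeners.forEach((listener) => listener("invalid amount: " + input));
    return undefined;
  }
  return Number(input);
}
```

A caller has to know which style it is facing: checking `ok`, wrapping the call in `try`/`catch`, or subscribing a listener. None of that is visible from the function name alone.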

I measured this: it took new engineers an average of 4.2 months to feel "productive" in our 140-repo polyrepo setup. In a comparable monorepo org, the number was 2.1 months.

How Monorepos Create Different Understanding Problems

Monorepos solve the discovery problem but create their own challenges. The common narrative that monorepos are strictly better for understanding is wrong.

Problem 1: The Overwhelm Effect

A monorepo with 2 million lines of code is intimidating. Engineers don't know where to start. In a polyrepo, the boundary is obvious: this repo does auth, that repo does billing. In a monorepo, you need explicit organizational structure or engineers drown.

# Without clear structure, monorepos become a maze
src/
  components/     # 400+ files
  services/       # 200+ files
  utils/          # 150+ files - the junk drawer
  lib/            # what's the difference between this and utils?
  helpers/        # seriously?
  core/           # everything is "core" apparently

Problem 2: False Familiarity

Because everything is in one repo, engineers assume they understand modules they've never worked in. I call this the "I can see it, so I understand it" fallacy. In polyrepos, the act of cloning a new repo creates a natural pause where the engineer acknowledges "I'm entering unfamiliar territory." Monorepos remove that signal.

The result: engineers make changes to unfamiliar modules with unearned confidence. In one quarter, 60% of our production incidents came from engineers modifying code outside their team's ownership area, without consulting the owning team.
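
A lightweight guard is to flag changes that land outside the author's ownership area before they merge. The sketch below assumes a simple prefix-to-team map, which in practice might be derived from a CODEOWNERS file. The paths, team names, and function names are hypothetical.

```typescript
// Hypothetical ownership map: path prefix -> owning team.
const owners: { prefix: string; team: string }[] = [
  { prefix: "services/auth/", team: "auth" },
  { prefix: "services/billing/", team: "billing" },
  { prefix: "libs/shared-types/", team: "platform" },
];

// Return the team that owns a path, using the longest matching prefix.
function ownerOf(path: string): string | undefined {
  let best: { prefix: string; team: string } | undefined;
  for (const entry of owners) {
    const matches = path.indexOf(entry.prefix) === 0;
    if (matches && (best === undefined || entry.prefix.length > best.prefix.length)) {
      best = entry;
    }
  }
  return best === undefined ? undefined : best.team;
}

// List changed files owned by a team other than the author's.
// Files with no known owner are ignored rather than flagged.
function filesOutsideOwnership(authorTeam: string, changedFiles: string[]): string[] {
  return changedFiles.filter((file) => {
    const team = ownerOf(file);
    return team !== undefined && team !== authorTeam;
  });
}

console.log(
  filesOutsideOwnership("auth", [
    "services/auth/login.ts",
    "services/billing/invoice.ts",
  ])
); // [ 'services/billing/invoice.ts' ]
```

A non-empty result is a prompt to consult the owning team. It restores the "unfamiliar territory" signal that cloning a separate repo used to provide.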

Problem 3: The Blast Radius of Refactoring

In a monorepo, a well-intentioned refactor can touch 200 files across 15 teams. The person making the change understands their module. They don't understand the 14 other modules they're affecting.

typescript
// "I'll just rename this shared interface, what could go wrong"
// git diff --stat:
// 247 files changed, 1893 insertions(+), 1891 deletions(-)
// Now 14 teams need to review changes they didn't ask for
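
Before starting a rename like that, it helps to estimate the blast radius. The sketch below walks a module import graph in reverse to list everything that depends on the changed module, directly or transitively. The graph shape and module names are hypothetical; a real graph would come from your build tool or a dependency analyzer.

```typescript
// Hypothetical import graph: module -> modules it imports directly.
type ImportGraph = { [mod: string]: string[] };

// Collect every module that depends on `changed`, directly or transitively.
function affectedModules(graph: ImportGraph, changed: string): string[] {
  // Invert the graph: module -> modules that import it.
  const dependents: { [mod: string]: string[] } = {};
  for (const mod of Object.keys(graph)) {
    for (const dep of graph[mod] || []) {
      (dependents[dep] = dependents[dep] || []).push(mod);
    }
  }
  // Breadth-first walk over the inverted edges.
  const seen = new Set<string>();
  const queue: string[] = [changed];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const mod of dependents[current] || []) {
      if (mod !== changed && !seen.has(mod)) {
        seen.add(mod);
        queue.push(mod);
      }
    }
  }
  return Array.from(seen).sort();
}

// Illustrative graph only.
const importGraph: ImportGraph = {
  "shared-types": [],
  "auth-service": ["shared-types"],
  "auth-sdk": ["shared-types"],
  "api-gateway": ["auth-sdk"],
  "billing": [],
};

console.log(affectedModules(importGraph, "shared-types"));
// [ 'api-gateway', 'auth-sdk', 'auth-service' ]
```

Mapping the returned modules to their owning teams tells you how many reviews the refactor will need before you open the PR.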

The Understanding-First Decision Framework

Stop choosing based on build tooling. Choose based on these understanding-focused criteria:

Factor 1: Cross-Team Change Frequency

Track how often engineers need to modify code owned by another team. If more than 30% of your PRs cross team boundaries, a monorepo reduces friction. If less than 10%, polyrepos enforce healthy boundaries.
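
A minimal sketch of that measurement, assuming you can export each PR's author team and the teams that own the files it touched. The `PullRequest` shape is hypothetical and the sample data is made up.

```typescript
// Hypothetical PR record: the author's team and the teams that own
// the files the PR touched.
interface PullRequest {
  authorTeam: string;
  touchedTeams: string[];
}

// Fraction of PRs that modify code owned by a team other than the author's.
function crossTeamRatio(prs: PullRequest[]): number {
  if (prs.length === 0) return 0;
  const crossing = prs.filter((pr) =>
    pr.touchedTeams.some((team) => team !== pr.authorTeam)
  );
  return crossing.length / prs.length;
}

// Illustrative data only.
const samplePrs: PullRequest[] = [
  { authorTeam: "auth", touchedTeams: ["auth"] },
  { authorTeam: "auth", touchedTeams: ["auth", "billing"] },
  { authorTeam: "billing", touchedTeams: ["platform"] },
  { authorTeam: "platform", touchedTeams: ["platform"] },
];

console.log(crossTeamRatio(samplePrs)); // 0.5
```

A ratio of 0.5 is well above the 30% threshold, so this sample would lean toward a monorepo.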

Factor 2: Shared Abstraction Count

Count the number of shared libraries, types, and utilities. If you have more than 20 shared packages, a monorepo makes versioning and discovery dramatically easier. Fewer than 5? Polyrepos are fine.

Factor 3: Onboarding Coherence

Ask recent hires: "How long until you could make changes confidently?" If the answer is over 3 months, you have an understanding problem that repo structure might help solve.

| Factor | Favors Monorepo | Favors Polyrepo |
| --- | --- | --- |
| Cross-team changes | > 30% of PRs | < 10% of PRs |
| Shared packages | > 20 | < 5 |
| Onboarding time | > 3 months (fragment problem) | > 3 months (overwhelm problem) |
| Team size | < 100 engineers | > 500 engineers |
| Deploy independence | Low need | High need |
| Code search importance | Critical | Nice-to-have |
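
The quantitative rows of this table can be turned into a rough tally. The sketch below is only one way to do it: the field names are hypothetical, each row gets one equal vote (an assumption, since the table does not weight the factors), and the qualitative rows are left out.

```typescript
type Leaning = "monorepo" | "polyrepo" | "neutral";

// Inputs mirror three quantitative rows of the table. Field names are hypothetical.
interface RepoSignals {
  crossTeamPrRatio: number; // fraction of PRs crossing team boundaries, 0..1
  sharedPackages: number;   // count of shared libraries, types, and utilities
  engineers: number;        // team size
}

// Count how many rows point each way; values between thresholds stay neutral.
function tallySignals(s: RepoSignals): { monorepo: number; polyrepo: number } {
  const votes: Leaning[] = [
    s.crossTeamPrRatio > 0.3 ? "monorepo" : s.crossTeamPrRatio < 0.1 ? "polyrepo" : "neutral",
    s.sharedPackages > 20 ? "monorepo" : s.sharedPackages < 5 ? "polyrepo" : "neutral",
    s.engineers < 100 ? "monorepo" : s.engineers > 500 ? "polyrepo" : "neutral",
  ];
  return {
    monorepo: votes.filter((v) => v === "monorepo").length,
    polyrepo: votes.filter((v) => v === "polyrepo").length,
  };
}

console.log(tallySignals({ crossTeamPrRatio: 0.4, sharedPackages: 25, engineers: 50 }));
// { monorepo: 3, polyrepo: 0 }
```

Treat the tally as a conversation starter rather than a verdict. Onboarding time in particular points both ways, depending on whether the cause is fragmentation or overwhelm.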

Factor 4: Code Search Infrastructure

This is the sleeper factor. If you go polyrepo, you must invest in cross-repo code search (Sourcegraph, GitHub code search, or similar). Without it, discovery is broken. If you go monorepo, standard IDE search works. The cost of cross-repo search infrastructure is real and often overlooked.

The Hybrid Approach That Actually Works

After years on both sides, I've landed on a pattern I call "monorepo core, polyrepo periphery." Here's the structure:

In the monorepo:

  • All shared libraries and types
  • All core business logic
  • All API definitions and contracts
  • Internal tools and scripts

In separate repos:

  • Infrastructure and IaC (different lifecycle)
  • ML/data pipelines (different toolchain)
  • Mobile apps (different build systems)
  • Third-party integrations with dedicated teams

This gives you unified understanding for the code that matters most (business logic and shared contracts) while allowing specialized toolchains where they're needed.

The Stealable Framework: The FIND Test

Before choosing a repo strategy, run the FIND test with 5 engineers on your team:

  1. Find: "Find the code that handles payment refunds." Time how long it takes.
  2. Inspect: "What other services does the refund flow touch?" See if they can trace the full path.
  3. Navigate: "Find the test for this logic." Time it, and count the hops between production code and its tests.
  4. Discover: "What changed in the refund flow in the last month?" See if they can reconstruct recent history.

Run FIND in your current setup. If the average time for any step exceeds its threshold (Find: 15 min, Inspect: 30 min, Navigate: 5 min, Discover: 20 min), your repo structure is hurting understanding.
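
Scoring the test takes only a few lines. The sketch below averages each step across participants and reports which steps exceed the thresholds above. The `FindTimings` shape and function name are hypothetical, and the sample timings are invented.

```typescript
// Minutes one engineer took on each FIND step.
interface FindTimings {
  find: number;
  inspect: number;
  navigate: number;
  discover: number;
}

// Thresholds from the text, in minutes.
const findThresholds: FindTimings = { find: 15, inspect: 30, navigate: 5, discover: 20 };

// Return the steps whose average time exceeds its threshold.
function failingSteps(results: FindTimings[]): string[] {
  if (results.length === 0) return [];
  const steps: (keyof FindTimings)[] = ["find", "inspect", "navigate", "discover"];
  return steps.filter((step) => {
    const avg = results.reduce((sum, r) => sum + r[step], 0) / results.length;
    return avg > findThresholds[step];
  });
}

// Illustrative timings for two engineers; a real run would use five.
const sampleRun: FindTimings[] = [
  { find: 10, inspect: 40, navigate: 3, discover: 25 },
  { find: 30, inspect: 10, navigate: 4, discover: 10 },
];

console.log(failingSteps(sampleRun)); // [ 'find' ]
```

In this sample only the Find step averages above its threshold (20 minutes against 15), so discovery is the first thing to fix.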

The right repo strategy isn't about what Google or Facebook uses. It's about whether your specific team, with your specific codebase, can find, understand, and safely change code. Measure that, and the repo decision makes itself.
