Monorepo vs Polyrepo: Impact on Understanding
The monorepo vs. polyrepo debate has been going on for years, and most of it misses the point. Teams argue about build tooling, CI performance, and dependency management. Those are real concerns. But the factor that actually determines whether your team ships or stalls is something nobody benchmarks: how well engineers understand the code they're working with.
I've managed teams on both sides. I ran a 140-service polyrepo setup at one company and migrated a 50-engineer team to a monorepo at another. The technical tradeoffs are well-documented. What isn't documented is how each approach changes the way engineers think about, discover, and reason about code.
The Understanding Gap Nobody Measures
Here's my contrarian take: the best repo strategy is the one that minimizes the time between "I need to change something" and "I understand what to change." Everything else is secondary.
I tracked this metric informally across both setups. In the polyrepo world, engineers spent an average of 47 minutes just finding the right code before they could start making changes. In the monorepo, that dropped to 11 minutes. That's 36 minutes of cognitive overhead per task, multiplied by every engineer, every day.
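To make that multiplication concrete, here is a rough sketch of the arithmetic. The 47 and 11 minute figures are the ones above; the task rate and headcount are placeholder assumptions you should swap for your own numbers.

```python
def discovery_overhead_hours(minutes_per_task_a: float,
                             minutes_per_task_b: float,
                             tasks_per_day: float,
                             engineers: int) -> float:
    """Extra engineer-hours per day spent finding code in setup A versus setup B."""
    extra_minutes = (minutes_per_task_a - minutes_per_task_b) * tasks_per_day * engineers
    return extra_minutes / 60


# 47 and 11 minutes are the observed averages quoted above.
# One task per engineer per day and 50 engineers are assumptions for illustration.
daily_hours = discovery_overhead_hours(47, 11, tasks_per_day=1, engineers=50)
print(f"{daily_hours:.1f} extra engineer-hours per day")
```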
But it's not that simple. The monorepo had its own understanding tax that showed up in different ways. Let me break down what I observed across both models.
How Polyrepos Fragment Knowledge
In a polyrepo setup, each service or library lives in its own repository. The pitch is clean separation of concerns. The reality is fragmented understanding.
Problem 1: The Discovery Tax
When an engineer needs to understand how user authentication works end-to-end, they might need to look at:
- auth-service repo (handles login)
- user-service repo (manages profiles)
- api-gateway repo (routes and validates tokens)
- shared-types repo (TypeScript interfaces)
- auth-sdk repo (client library)
That's 5 repos to clone, 5 README files to read, 5 different project structures to learn. Each repo has its own conventions, its own folder structure, its own way of doing things.
# The polyrepo discovery workflow
git clone git@github.com:company/auth-service.git
git clone git@github.com:company/user-service.git
git clone git@github.com:company/api-gateway.git
# Now figure out which versions are compatible...
# Now figure out which branch each team is working from...
# Now figure out where the actual logic lives in each one...
Problem 2: Version Confusion
In one polyrepo setup, I found that the shared-types package had 14 different versions in use across 23 services. Engineers were building against different contracts without knowing it. Bugs that should've been caught by TypeScript sailed through because each service compiled against its own snapshot of the types.
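A drift audit like this is easy to script. The sketch below is one possible approach, not the tooling I actually used: it assumes every service is checked out under one directory and declares the shared package in its package.json. The function names and the sample version strings are hypothetical.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_manifests(root: Path) -> dict[str, dict]:
    """Read <root>/<service>/package.json for every checked-out service."""
    return {
        path.parent.name: json.loads(path.read_text())
        for path in root.glob("*/package.json")
    }


def versions_in_use(manifests: dict[str, dict], package: str) -> dict[str, list[str]]:
    """Map each declared version of `package` to the services that depend on it."""
    by_version: dict[str, list[str]] = defaultdict(list)
    for service, manifest in manifests.items():
        for section in ("dependencies", "devDependencies"):
            version = manifest.get(section, {}).get(package)
            if version is not None:
                by_version[version].append(service)
                break  # count each service once
    return dict(by_version)


# Inline sample data standing in for load_manifests(Path("checkouts")).
sample = {
    "auth-service": {"dependencies": {"shared-types": "2.1.0"}},
    "user-service": {"dependencies": {"shared-types": "1.4.0"}},
    "api-gateway": {"devDependencies": {"shared-types": "2.1.0"}},
}
for version, services in versions_in_use(sample, "shared-types").items():
    print(version, "->", ", ".join(services))
```

More than one key in the result means services are compiling against different contracts.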
Problem 3: The Tribal Knowledge Problem
Each repo develops its own culture. The auth team uses a result pattern for errors. The billing team throws exceptions. The notifications team uses event-driven error propagation. A new engineer joining the company has to learn 3 different paradigms to understand 3 services.
I measured this: it took new engineers an average of 4.2 months to feel "productive" in our 140-repo polyrepo setup. In a comparable monorepo org, the number was 2.1 months.
How Monorepos Create Different Understanding Problems
Monorepos solve the discovery problem but create their own challenges. The common narrative that monorepos are strictly better for understanding is wrong.
Problem 1: The Overwhelm Effect
A monorepo with 2 million lines of code is intimidating. Engineers don't know where to start. In a polyrepo, the boundary is obvious: this repo does auth, that repo does billing. In a monorepo, you need explicit organizational structure or engineers drown.
# Without clear structure, monorepos become a maze
src/
  components/   # 400+ files
  services/     # 200+ files
  utils/        # 150+ files - the junk drawer
  lib/          # what's the difference between this and utils?
  helpers/      # seriously?
  core/         # everything is "core" apparently
Problem 2: False Familiarity
Because everything is in one repo, engineers assume they understand modules they've never worked in. I call this the "I can see it, so I understand it" fallacy. In polyrepos, the act of cloning a new repo creates a natural pause where the engineer acknowledges "I'm entering unfamiliar territory." Monorepos remove that signal.
The result: engineers make changes to unfamiliar modules with unearned confidence. In one quarter, 60% of our production incidents came from engineers modifying code outside their team's ownership area, without consulting the owning team.
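One way to bring that missing signal back is to flag changes that leave the author's ownership area before they merge. The sketch below is a minimal illustration under assumptions: the path-prefix ownership map, the team names, and the function names are all hypothetical, and a real setup would more likely read an ownership file than a hard-coded dict.

```python
from typing import Optional

# Hypothetical ownership map: path prefix -> owning team.
OWNERS = {
    "services/auth/": "auth",
    "services/billing/": "billing",
    "libs/shared-types/": "platform",
}


def owning_team(path: str, owners: dict[str, str]) -> Optional[str]:
    """Return the team for the longest matching prefix, or None if unowned."""
    matches = [prefix for prefix in owners if path.startswith(prefix)]
    return owners[max(matches, key=len)] if matches else None


def foreign_changes(changed_files: list[str],
                    author_team: str,
                    owners: dict[str, str]) -> dict[str, list[str]]:
    """Group changed files by owning team, leaving out the author's own team."""
    result: dict[str, list[str]] = {}
    for path in changed_files:
        team = owning_team(path, owners)
        if team is not None and team != author_team:
            result.setdefault(team, []).append(path)
    return result


changed = [
    "services/auth/login.py",
    "services/billing/refund.py",
    "libs/shared-types/user.ts",
]
for team, files in foreign_changes(changed, "auth", OWNERS).items():
    print(f"consult {team}: {files}")
```

The number of keys in the result is also a quick read on how many teams a change pulls into review.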
Problem 3: The Blast Radius of Refactoring
In a monorepo, a well-intentioned refactor can touch 200 files across 15 teams. The person making the change understands their module. They don't understand the 14 other modules they're affecting.
// "I'll just rename this shared interface, what could go wrong"
// git diff --stat:
// 247 files changed, 1,893 insertions(+), 1,891 deletions(-)
// Now 14 teams need to review changes they didn't ask for
The Understanding-First Decision Framework
Stop choosing based on build tooling. Choose based on these understanding-focused criteria:
Factor 1: Cross-Team Change Frequency
Track how often engineers need to modify code owned by another team. If more than 30% of your PRs cross team boundaries, a monorepo reduces friction. If less than 10%, polyrepos enforce healthy boundaries.
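If you already know which teams own the files each PR touched, the ratio is a few lines of code. This is a sketch under assumptions: the PullRequest record and its fields are hypothetical, and how you derive touched_teams depends on your own ownership data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PullRequest:
    author_team: str
    touched_teams: frozenset[str]  # teams owning the files the PR changed


def cross_team_ratio(prs: list[PullRequest]) -> float:
    """Fraction of PRs that touch code owned by a team other than the author's."""
    if not prs:
        return 0.0
    crossing = sum(1 for pr in prs if pr.touched_teams - {pr.author_team})
    return crossing / len(prs)


# Made-up sample data for illustration.
prs = [
    PullRequest("auth", frozenset({"auth"})),
    PullRequest("auth", frozenset({"auth", "billing"})),
    PullRequest("billing", frozenset({"platform"})),
    PullRequest("billing", frozenset({"billing"})),
]
print(f"{cross_team_ratio(prs):.0%} of PRs cross team boundaries")
```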
Factor 2: Shared Abstraction Count
Count the number of shared libraries, types, and utilities. If you have more than 20 shared packages, a monorepo makes versioning and discovery dramatically easier. Fewer than 5? Polyrepos are fine.
Factor 3: Onboarding Coherence
Ask recent hires: "How long until you could make changes confidently?" If the answer is over 3 months, you have an understanding problem that repo structure might help solve.
| Factor | Favors Monorepo | Favors Polyrepo |
|---|---|---|
| Cross-team changes | > 30% of PRs | < 10% of PRs |
| Shared packages | > 20 | < 5 |
| Onboarding time | > 3 months, caused by fragmentation | > 3 months, caused by overwhelm |
| Team size | < 100 engineers | > 500 engineers |
| Deploy independence | Low need | High need |
| Code search importance | Critical | Nice-to-have |
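The table can be read as a simple tally. The sketch below encodes the thresholds from the rows above; weighting every row equally is my simplifying assumption, not a rule, and the TeamProfile fields are hypothetical names. The onboarding row is left out because it needs a judgment about the cause, not just a number.

```python
from dataclasses import dataclass


@dataclass
class TeamProfile:
    cross_team_pr_ratio: float      # 0.0 to 1.0
    shared_packages: int
    engineers: int
    high_deploy_independence: bool  # True if teams need to deploy independently
    code_search_critical: bool


def tally(profile: TeamProfile) -> dict[str, int]:
    """Count how many table rows lean toward each strategy."""
    votes = {"monorepo": 0, "polyrepo": 0}

    if profile.cross_team_pr_ratio > 0.30:
        votes["monorepo"] += 1
    elif profile.cross_team_pr_ratio < 0.10:
        votes["polyrepo"] += 1

    if profile.shared_packages > 20:
        votes["monorepo"] += 1
    elif profile.shared_packages < 5:
        votes["polyrepo"] += 1

    if profile.engineers < 100:
        votes["monorepo"] += 1
    elif profile.engineers > 500:
        votes["polyrepo"] += 1

    votes["polyrepo" if profile.high_deploy_independence else "monorepo"] += 1
    votes["monorepo" if profile.code_search_critical else "polyrepo"] += 1
    return votes


print(tally(TeamProfile(0.35, 25, 50, False, True)))
```

Values that fall between the thresholds cast no vote, so a close or low tally is a signal to look harder at the individual factors.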
Factor 4: Code Search Infrastructure
This is the sleeper factor. If you go polyrepo, you must invest in cross-repo code search (Sourcegraph, GitHub code search, or similar). Without it, discovery is broken. If you go monorepo, standard IDE search works. The cost of cross-repo search infrastructure is real and often overlooked.
The Hybrid Approach That Actually Works
After years on both sides, I've landed on a pattern I call "monorepo core, polyrepo periphery." Here's the structure:
In the monorepo:
- All shared libraries and types
- All core business logic
- All API definitions and contracts
- Internal tools and scripts
In separate repos:
- Infrastructure and IaC (different lifecycle)
- ML/data pipelines (different toolchain)
- Mobile apps (different build systems)
- Third-party integrations with dedicated teams
This gives you unified understanding for the code that matters most (business logic and shared contracts) while allowing specialized toolchains where they're needed.
The Stealable Framework: The FIND Test
Before choosing a repo strategy, run the FIND test with 5 engineers on your team:
- Find: "Find the code that handles payment refunds." Time how long it takes.
- Inspect: "What other services does the refund flow touch?" See if they can trace the full path.
- Navigate: "Find the test for this logic." Time it, and note the hop count between production code and its tests.
- Discover: "What changed in the refund flow in the last month?" See if they can reconstruct recent history.
Run FIND in your current setup. If average times exceed: F > 15 min, I > 30 min, N > 5 min, D > 20 min, your repo structure is hurting understanding.
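Scoring the results is straightforward. The sketch below averages each step across engineers and flags the ones over the limits above; the timings are made-up sample data, and it treats Navigate as a time in minutes because that is how the threshold is stated.

```python
from statistics import mean

# Limits in minutes, taken from the thresholds above.
THRESHOLDS_MIN = {"find": 15, "inspect": 30, "navigate": 5, "discover": 20}


def find_report(timings: dict[str, list[float]]) -> dict[str, tuple[float, bool]]:
    """Return (average minutes, exceeds threshold) for each FIND step."""
    report = {}
    for step, limit in THRESHOLDS_MIN.items():
        average = mean(timings[step])
        report[step] = (average, average > limit)
    return report


# One entry per engineer, in minutes. Sample data for illustration only.
timings = {
    "find": [10, 20, 30, 12, 8],
    "inspect": [25, 25, 25, 25, 25],
    "navigate": [2, 3, 4, 5, 6],
    "discover": [30, 10, 20, 25, 15],
}
for step, (average, over) in find_report(timings).items():
    print(f"{step}: {average:.1f} min {'OVER' if over else 'ok'}")
```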
The right repo strategy isn't about what Google or Facebook uses. It's about whether your specific team, with your specific codebase, can find, understand, and safely change code. Measure that, and the repo decision makes itself.