Refactor vs. Rewrite: How to Make the Call
I've been on both sides of this decision. In 2021, I advocated for a rewrite that should have been a refactor. We spent 9 months building a replacement system that launched with 60% of the features of the original and introduced new bugs the old system never had. In 2023, I pushed for a refactor that should have been a rewrite. We spent 6 months incrementally improving a system that was architecturally broken in ways incremental changes couldn't fix.
Both mistakes cost hundreds of thousands of dollars in engineering time and months of delayed product work. They also taught me that the refactor-vs-rewrite decision is the highest-stakes call an engineering leader makes, and most of us make it based on emotion rather than evidence.
Let me give you the framework I wish I'd had.
Why This Decision Is So Hard
The refactor-vs-rewrite question triggers every cognitive bias engineers have:
Bias toward rewriting: We overestimate our ability to build a better system from scratch. We underestimate the hidden complexity in the existing system. We assume the problems are in the code rather than in the requirements. Joel Spolsky wrote about this in 2000, and engineers keep making the same mistake.
Bias toward refactoring: We overestimate our ability to incrementally fix a fundamentally broken architecture. We fall prey to the sunk cost fallacy: we keep patching because throwing away code feels like admitting the original was wrong.
Both biases are strong, and they often cancel out in team discussions, leading to paralysis or decision by whoever argues loudest. That's not engineering. That's politics.
The Decision Framework
I score the system on 8 factors. Factors 1 through 7 each point toward refactor or rewrite (a few can come out neutral). Factor 8 checks whether a third option is on the table. Tally the results.
Factor 1: Architecture Fitness
Is the fundamental architecture sound, or is it the wrong architecture for the current requirements?
Signs architecture is fundamentally wrong:
- The system was designed for 100 users and serves 100,000
- It's a monolith that needs to be distributed (or vice versa)
- The data model doesn't represent the current business domain
- You need real-time and the system is batch-oriented
Signs architecture is sound but implementation is messy:
- The right abstractions exist but they're poorly implemented
- The data model is correct but queries are inefficient
- The service boundaries make sense but the interfaces are clunky
Architecture wrong: Points to REWRITE. Architecture sound: Points to REFACTOR.
Factor 2: Test Coverage
Can you refactor safely?
| Coverage Level | Implication |
|---|---|
| >70% with quality tests | Refactor safely |
| 30-70% | Refactor cautiously, add tests first |
| <30% | Refactoring is risky; consider rewrite |
| 0% and no specification | Rewrite might be necessary |
Without tests, every refactoring step is a gamble. You might introduce regressions you won't catch until production. If writing tests for the existing system is harder than building a new one, that's a strong signal.
Good coverage: Points to REFACTOR. No coverage and untestable design: Points to REWRITE.
Factor 3: Team Knowledge
Does anyone on your team understand the existing system deeply?
If the original authors are gone and there's no documentation, a rewrite might be more practical than reverse-engineering a system nobody understands. If the original authors are still around, refactoring is faster because they know where the bodies are buried.
Team understands the system: Points to REFACTOR. Nobody understands the system: Points to REWRITE (but budget for rediscovering hidden requirements).
Factor 4: Business Criticality and Risk Tolerance
How much risk can the business absorb?
Rewrites are riskier. The new system might not reach feature parity. There's a long period where you're maintaining two systems. If anything goes wrong, you might need to fall back to the old system.
Refactoring is lower risk per change, but the cumulative risk over many small changes adds up.
Low risk tolerance (payments, healthcare, financial): Points to REFACTOR. Higher risk tolerance (internal tools, early-stage products): Neutral.
Factor 5: Technology Stack
Is the current technology viable going forward?
If you're on a framework that's been abandoned, a language you can't hire for, or a platform that's being deprecated, refactoring preserves a dependency on a dead technology. That's throwing good effort after bad.
Stack is viable: Points to REFACTOR. Stack is dead/dying: Points to REWRITE.
Factor 6: Incremental Value
Can you deliver value incrementally during the remediation?
A refactoring effort delivers small improvements continuously. A rewrite delivers zero value until the new system is ready. If the business needs continuous improvement, refactoring is more compatible.
Need continuous delivery: Points to REFACTOR. Can tolerate a pause for a better outcome: Neutral.
Factor 7: Scope of Change
What percentage of the codebase needs to change?
Rule of thumb:
- <25% of code needs changing -> Refactor
- 25-50% of code needs changing -> Could go either way
- >50% of code needs changing -> Rewrite
If you're going to touch most of the code anyway, you're doing a rewrite with extra steps. Just do the rewrite.
Small scope: Points to REFACTOR. Most of the system: Points to REWRITE.
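To make the rule of thumb concrete, here is a minimal sketch in Python. It assumes you estimate scope by line count per module, which is only one possible proxy. The function names, the module names, and the line counts are all hypothetical.

```python
from typing import Dict, Set


def changed_fraction(module_lines: Dict[str, int], needs_change: Set[str]) -> float:
    """Fraction of total lines living in modules flagged as needing change."""
    total = sum(module_lines.values())
    if total == 0:
        raise ValueError("no code to measure")
    changed = sum(n for name, n in module_lines.items() if name in needs_change)
    return changed / total


def scope_signal(fraction: float) -> str:
    """Apply the rule of thumb: <25% refactor, 25-50% either, >50% rewrite."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if fraction < 0.25:
        return "refactor"
    if fraction <= 0.50:
        return "either"
    return "rewrite"


# Hypothetical line counts for illustration only.
modules = {"auth": 4000, "payments": 9000, "orders": 12000, "reporting": 5000}
fraction = changed_fraction(modules, {"payments", "reporting"})
print(f"{fraction:.0%}", scope_signal(fraction))  # 47% either
```

Line count is a crude measure. A flagged module might need only a small fix, so treat the result as a starting point for discussion, not a verdict.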
Factor 8: The Strangler Fig Option
Can you replace the system piece by piece?
This is the golden path. Instead of choosing between "refactor everything in place" and "rewrite everything from scratch," you build new components alongside old ones and gradually migrate.
Old System: [Auth] [Payments] [Orders] [Shipping] [Reporting]
Phase 1: [Auth] [Payments] [Orders] [Shipping] [NEW Reporting]
Phase 2: [Auth] [NEW Payments] [Orders] [Shipping] [NEW Reporting]
Phase 3: [NEW Auth] [NEW Payments] [Orders] [Shipping] [NEW Reporting]
...
If the system has clear boundaries between components, the strangler fig approach gives you the best of both worlds: you can rewrite the worst parts while keeping the stable parts.
Clear boundaries exist: STRANGLER FIG (the right answer 60% of the time). No clear boundaries: Choose between REFACTOR and REWRITE based on other factors.
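In code, the strangler fig usually comes down to a routing layer in front of both systems. The sketch below is one way to express the phases in the diagram above, assuming each component can be modeled as a function from a request to a response. `StranglerRouter` and the handler names are hypothetical, not taken from any particular framework.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


class StranglerRouter:
    """Routes each component to its replacement if one exists, else to legacy."""

    def __init__(self, legacy: Dict[str, Handler]) -> None:
        self._legacy = dict(legacy)
        self._replacements: Dict[str, Handler] = {}

    def migrate(self, component: str, handler: Handler) -> None:
        # Only components that exist in the old system can be replaced.
        if component not in self._legacy:
            raise KeyError(f"unknown component: {component}")
        self._replacements[component] = handler

    def roll_back(self, component: str) -> None:
        # The legacy handler is never removed, so rollback is always possible.
        self._replacements.pop(component, None)

    def handle(self, component: str, request: dict) -> dict:
        handler = self._replacements.get(component, self._legacy[component])
        return handler(request)


def legacy_reporting(request: dict) -> dict:
    return {"served_by": "legacy", "report": request["name"]}


def new_reporting(request: dict) -> dict:
    return {"served_by": "new", "report": request["name"]}


router = StranglerRouter({"reporting": legacy_reporting})
router.migrate("reporting", new_reporting)  # Phase 1 in the diagram
print(router.handle("reporting", {"name": "monthly"})["served_by"])  # new
```

Keeping the legacy handlers registered for the whole migration is deliberate: it lets any single component fall back without touching the others.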
The Scoring Sheet
REFACTOR vs. REWRITE SCORECARD
Factor | Refactor | Rewrite | Score
------------------------------|----------|---------|------
1. Architecture fitness | [ ] | [ ] |
2. Test coverage | [ ] | [ ] |
3. Team knowledge | [ ] | [ ] |
4. Risk tolerance | [ ] | [ ] |
5. Technology stack viability | [ ] | [ ] |
6. Incremental value needed | [ ] | [ ] |
7. Scope of change | [ ] | [ ] |
8. Strangler fig possible | N/A if yes (use strangler fig)
TALLY: Refactor: ___ Rewrite: ___
If strangler fig is possible, prefer that approach regardless of tally.
If tally is 5+ for one option, the decision is clear.
If tally is close (4-3 or 3-4), default to refactor. Rewrites carry
more risk and should only win with a clear margin.
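The scorecard rules can also be written down as a small function, which helps when a team wants to argue about the inputs instead of the arithmetic. This is a sketch under one assumption the scorecard leaves open: because some factors can come out neutral, tallies other than 4-3 are possible, and here any result short of 5 votes for rewrite falls back to refactor. The function name `decide` and the factor keys are hypothetical.

```python
from typing import Dict, Tuple

VALID_SIGNALS = {"refactor", "rewrite", "neutral"}


def decide(signals: Dict[str, str], strangler_fig_possible: bool) -> Tuple[str, str]:
    """Return (decision, reason) from the signals of factors 1 through 7."""
    unknown = set(signals.values()) - VALID_SIGNALS
    if unknown:
        raise ValueError(f"unknown signal(s): {sorted(unknown)}")

    # Factor 8 overrides the tally.
    if strangler_fig_possible:
        return "strangler fig", "clear component boundaries exist"

    refactor = sum(1 for s in signals.values() if s == "refactor")
    rewrite = sum(1 for s in signals.values() if s == "rewrite")
    tally = f"refactor {refactor}, rewrite {rewrite}"

    if rewrite >= 5:
        return "rewrite", f"clear margin ({tally})"
    if refactor >= 5:
        return "refactor", f"clear margin ({tally})"
    # Assumption: anything short of a clear margin defaults to refactor.
    return "refactor", f"no clear margin ({tally}); defaulting to refactor"


example = {
    "architecture": "rewrite",
    "test_coverage": "rewrite",
    "team_knowledge": "refactor",
    "risk_tolerance": "refactor",
    "stack_viability": "refactor",
    "incremental_value": "refactor",
    "scope_of_change": "rewrite",
}
print(decide(example, strangler_fig_possible=False)[0])  # refactor
```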
The Rewrite Trap: Three Questions Before You Commit
If the scorecard points to rewrite, ask these three questions before committing:
1. Have you documented all the hidden requirements?
Every system has behaviors that nobody asked for but everyone relies on. That weird edge case in the billing logic? A customer depends on it. That inconsistency in the date format? Three downstream systems parse it. If you can't list at least 20 hidden requirements, you haven't looked hard enough.
2. Can you run both systems in parallel?
The safest rewrite runs the new system alongside the old one, comparing outputs. If you can't do this, your risk goes up dramatically.
3. What's your rollback plan?
If the rewrite fails at month 6, what happens? If the answer is "we're committed," you need to reconsider. Always have a path back to the old system until the new one is proven.
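For question 2, a parallel run can start as a plain comparison harness: replay the same requests through both systems and record every disagreement. The sketch below assumes both systems can be called as functions with comparable outputs. `shadow_compare` and the two date formatters are hypothetical stand-ins, chosen to mirror the date-format example from question 1.

```python
from datetime import date
from typing import Any, Callable, Iterable, List, Tuple


def shadow_compare(
    old_system: Callable[[Any], Any],
    new_system: Callable[[Any], Any],
    requests: Iterable[Any],
) -> List[Tuple[Any, Any, Any]]:
    """Return (request, old_output, new_output) for every disagreement."""
    mismatches = []
    for request in requests:
        expected = old_system(request)  # the old system is the source of truth
        try:
            actual = new_system(request)
        except Exception as exc:  # a crash in the new system counts as a mismatch
            actual = exc
        if actual != expected:
            mismatches.append((request, expected, actual))
    return mismatches


def old_format(d: date) -> str:
    return d.strftime("%m/%d/%Y")  # the legacy quirk downstream systems parse


def new_format(d: date) -> str:
    return d.isoformat()  # the "cleaner" format the rewrite introduced


mismatches = shadow_compare(old_format, new_format, [date(2024, 3, 1)])
print(mismatches[0][1], "vs", mismatches[0][2])  # 03/01/2024 vs 2024-03-01
```

Each mismatch is either a bug in the new system or a hidden requirement nobody wrote down. Both are cheaper to find here than in production.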
The Contrarian Take
Here's the uncomfortable truth: most rewrites fail. Studies from companies like Basecamp, Netscape, and others show that full rewrites are completed on time and on scope less than 30% of the time. The industry's track record is terrible.
But here's what nobody says: some rewrites are the right call despite the odds. When I look at the rewrites that succeeded, they share three traits: (1) the old architecture was genuinely wrong for the current scale, (2) the team had a clear specification for the new system, and (3) they used a strangler fig approach rather than a big bang.
The biggest mistake isn't choosing to rewrite. It's choosing to rewrite as a big bang. Build the new system one component at a time. Migrate traffic gradually. Keep the old system running until the new one proves itself. That's not really a rewrite. It's a series of targeted replacements. And it works.
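One common way to migrate traffic gradually is to place each user in a stable bucket and raise a rollout percentage over time. The sketch below assumes users have a string ID and that per-user stickiness is what you want. The function name `routes_to_new` is hypothetical.

```python
import hashlib


def routes_to_new(user_id: str, rollout_percent: int) -> bool:
    """Decide deterministically whether this user is served by the new system."""
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    # Hash the ID so buckets are stable across processes and restarts.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return bucket < rollout_percent


# Raising the percentage only ever adds users; nobody flips back and forth.
users = [f"user-{i}" for i in range(1000)]
at_10 = {u for u in users if routes_to_new(u, 10)}
at_50 = {u for u in users if routes_to_new(u, 50)}
print(at_10 <= at_50)  # True
```

Setting the percentage back to 0 sends everyone to the old system again, so the rollback path stays open for as long as the old system keeps running.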
When in doubt, default to refactor. It's the safer bet. But don't let fear of rewrites keep you trapped in a system that fundamentally can't support where your business needs to go.