Code Search: Why It Matters More Than You Think
I didn't think about code search until I watched a senior engineer spend 35 minutes looking for where a webhook payload was processed. She checked 4 services, grepped through 3 repos, and asked 2 teammates before finding it in a file that didn't match any naming convention she expected. Thirty-five minutes to answer the question "where does this code live?"
That moment changed how I think about developer tooling. Code search isn't a nice-to-have. It's the foundation of everything else. You can't review code you can't find. You can't refactor code you can't trace. You can't understand a codebase you can't search.
The Contrarian Take: Code Search Is More Important Than Code Quality
I know that sounds extreme. Let me defend it. A team with poor code quality but excellent code search can find problems, understand them, and fix them. A team with beautiful code but no way to search across their 30 repos, 5 services, and 200K lines? They're working blind.
I've seen this play out. Teams that invested in search infrastructure (Sourcegraph, cross-repo grep, semantic search) shipped faster than teams that invested in code cleanups. Not because clean code doesn't matter. Because you can't clean what you can't find.
What Developers Actually Search For
I analyzed search query logs from a team of 40 engineers over 3 months. The patterns were revealing:
| Search Category | % of Queries | Typical Query |
|---|---|---|
| "Where is this used?" | 34% | Function/class name references |
| "How was this done before?" | 22% | Pattern examples in existing code |
| "What calls this?" | 18% | Reverse dependency tracing |
| "When did this change?" | 12% | Git blame/log for specific code |
| "Where is this defined?" | 9% | Type/interface definitions |
| "Who owns this?" | 5% | Finding the right person to ask |
The top three categories (74% of all searches) are about understanding relationships between code. Not finding a specific file, but understanding how pieces connect.
Standard IDE "find in files" handles simple text matching. It completely fails at relationship queries. "Show me everything that calls processPayment" requires understanding the call graph, not just text matching.
The Search Infrastructure Stack
After evaluating multiple approaches, here's the stack I recommend based on team size:
Tier 1: Small Teams (< 20 engineers, < 5 repos)
# ripgrep is the minimum viable code search
# Install: cargo install ripgrep (or brew install ripgrep)
# Basic text search across all code
rg "processPayment" --type ts
# Search with context
rg "processPayment" --type ts -C 5
# Find all files that import a specific module
rg "from.*payments/processor" --type ts -l
# Search with regex for pattern matching
rg "async function \w+Payment" --type ts
ripgrep is fast enough for codebases under 500K lines. Combined with IDE search, it covers most needs. Cost: free. Setup time: 5 minutes.
Tier 2: Medium Teams (20-100 engineers, 5-30 repos)
At this scale, you need cross-repo search and basic code intelligence. Options:
- GitHub Code Search: Free with GitHub. Handles cross-repo text search. No semantic understanding.
- Sourcegraph (self-hosted or cloud): Cross-repo search with basic code intelligence. Understands language constructs.
// Sourcegraph query examples:
// Find all implementations of an interface
lang:typescript implements PaymentProcessor
// Find all error handling patterns in a specific repo
repo:^github\.com/company/api$ catch.*Error lang:typescript
// Find all TODO comments from a specific author
repo:^github\.com/company/ type:diff author:jane TODO
Tier 3: Large Teams (100+ engineers, 30+ repos)
At this scale, text search isn't enough. You need semantic code search: search that understands code structure, not just text patterns.
Features you need:
- Cross-reference search: "Find all callers of this function across all repos"
- Type-aware search: "Find all functions that return a Promise<User>"
- Dependency graph search: "What services depend on the auth library?"
- Historical search: "Show me all changes to payment processing in the last 30 days"
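The dependency-graph feature above can be approximated even without a dedicated tool. Here is a sketch that builds a reverse import map (module → files that import it) from source text. The regex only covers plain ES `import ... from "x"` statements; a real implementation would parse the AST, and the module name `@company/auth` is a hypothetical example.

```typescript
type FileContents = Record<string, string>; // path -> source text

// Build a reverse dependency map: for each imported module specifier,
// record which files import it. Answers "who depends on X?" queries.
function reverseImportMap(files: FileContents): Map<string, string[]> {
  const importRe = /import\s+[^;]*?from\s+['"]([^'"]+)['"]/g;
  const map = new Map<string, string[]>();
  for (const [path, source] of Object.entries(files)) {
    for (const match of source.matchAll(importRe)) {
      const mod = match[1];
      const dependents = map.get(mod) ?? [];
      dependents.push(path);
      map.set(mod, dependents);
    }
  }
  return map;
}

// Usage: which files depend on the auth library?
// reverseImportMap(files).get("@company/auth") ?? []
```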
Building a Search-First Development Culture
The tooling matters less than the culture. Here's what I mean.
Practice 1: Search Before You Ask
We implemented a rule: before asking a teammate "where is X?" or "how does Y work?", spend 5 minutes searching. This sounds obvious, but most teams default to Slack questions. The problem with Slack answers is they're ephemeral. The answer helps one person once. A good search query helps everyone forever.
To make this work, your codebase has to be searchable. That means:
// BAD: Generic names that are impossible to search for
const data = await process(input);
const result = handle(data);
// GOOD: Specific names that are searchable
const paymentAuthorization = await authorizePayment(chargeRequest);
const settlementResult = settleAuthorizedPayment(paymentAuthorization);
Naming is a search optimization problem. Every function name, every variable name, every file name is a search term that future engineers will (or won't) find.
Practice 2: Document Through Searchable Landmarks
I use what I call "landmark comments." These are comments specifically designed to be found by search:
// ARCHITECTURE: Payment flow entry point
// OWNER: payments-team
// DEPENDENCIES: auth-service, billing-db, stripe-sdk
// LAST-REVIEWED: 2026-03-15
export async function initiatePayment(request: PaymentRequest): Promise<PaymentResult> {
// ...
}
These comments serve as search anchors. An engineer searching for "payment flow" or "OWNER: payments-team" finds exactly what they need. The structured format makes it greppable.
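Because the format is structured, landmark comments are also machine-readable. A minimal sketch of an extractor, using the tag names from the example above (the parsing approach is an illustration, not a standard):

```typescript
type Landmark = Record<string, string>;

// Pull landmark metadata (ARCHITECTURE, OWNER, DEPENDENCIES,
// LAST-REVIEWED) out of a source file so it can be indexed or
// turned into an ownership report.
function extractLandmarks(source: string): Landmark {
  const tagRe = /^\/\/\s*(ARCHITECTURE|OWNER|DEPENDENCIES|LAST-REVIEWED):\s*(.+)$/;
  const landmark: Landmark = {};
  for (const line of source.split("\n")) {
    const m = line.trim().match(tagRe);
    if (m) landmark[m[1]] = m[2].trim();
  }
  return landmark;
}
```

Run this across a repo and you get a queryable ownership index for free, as a side effect of comments engineers would want anyway.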
Practice 3: Search-Driven Code Review
During code review, I search for related code that the author might have missed:
# Before approving a PR that changes error handling in payments:
rg "catch.*Payment" --type ts -l
# This finds all other payment error handling that might need the same change
This catches inconsistencies that no amount of staring at the diff would reveal. I've found related bugs in code review this way more times than I can count.
Measuring Search Effectiveness
Track these metrics to know if your search infrastructure is working:
interface SearchEffectivenessMetrics {
avgTimeToFindCode: number; // Target: < 5 minutes
searchQueriesPerDevPerDay: number; // Higher is better (means they're using it)
slackQuestionsAboutCodeLocation: number; // Lower is better
codeDiscoveryInOnboarding: number; // Days until new hire finds code independently
}
When we invested in search tooling, our metrics shifted:
| Metric | Before | After (3 months) |
|---|---|---|
| Avg time to find code | 14 minutes | 3 minutes |
| Slack "where is X" questions/week | 23 | 4 |
| New hire code independence | 3 weeks | 1 week |
| Cross-repo PRs (engineers found reusable code) | 2/month | 11/month |
That last metric surprised me. When engineers could actually find existing code, they reused it instead of rewriting it. Code duplication dropped 28% in the first quarter.
The Stealable Framework: The FIND Audit
Run this monthly to assess your search capability:
F - Findability test: Give 3 engineers a list of 5 "find this code" challenges. Time them. If average time exceeds 10 minutes per challenge, your search is broken.
I - Infrastructure check: Can you search across all repos simultaneously? Can you search by symbol type (not just text)? Can you search git history? Score: 0-3 based on yes/no.
N - Naming audit: Sample 50 recent functions. Are the names specific enough to be found by someone who doesn't know the file path? If fewer than 70% are, you have a naming problem that's really a search problem.
D - Discovery rate: Track how often engineers discover existing code through search vs. reimplementing it. If your code duplication is increasing, search is failing.
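To make the audit repeatable, the four checks can be reduced to a simple score. This is a sketch; the thresholds mirror the ones in the text (10 minutes, 70% naming pass rate, infrastructure score 0-3), but the equal weighting of the four letters is my assumption, not part of the framework.

```typescript
interface FindAuditInput {
  avgFindTimeMinutes: number;      // F: average time in the findability test
  infraScore: 0 | 1 | 2 | 3;       // I: cross-repo / symbol / history checks passed
  namingPassRate: number;          // N: fraction of sampled names that are findable
  duplicationTrendRising: boolean; // D: is code duplication increasing?
}

// Score the FIND audit from 0 to 4; each letter contributes up to 1.
function findAuditScore(input: FindAuditInput): number {
  let score = 0;
  if (input.avgFindTimeMinutes <= 10) score += 1; // F: under the 10-minute bar
  score += input.infraScore / 3;                  // I: partial credit, 0..1
  if (input.namingPassRate >= 0.7) score += 1;    // N: 70% naming threshold
  if (!input.duplicationTrendRising) score += 1;  // D: duplication not growing
  return score;
}
```

A score below 2 is a strong signal that search, not code quality, is the bottleneck worth fixing first.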
Code search isn't glamorous. It's not going to show up in your engineering blog or conference talks. But it's the invisible infrastructure that determines whether your team can understand, modify, and improve the code they're responsible for. Invest in it before your 35-minute scavenger hunts become the norm.