
How to Review AI-Generated Code Without Losing Your Mind

Vaibhav Verma
8 min read
Tags: ai, code-review, best-practices, engineering-process, copilot, code-quality

Code review was already hard. Then AI entered the picture and made it three times harder. Not because AI code is worse. Because it's different in ways that break your existing review habits.

I used to spend 20-30 minutes reviewing a typical PR. Now, for AI-heavy PRs, I spend 15 minutes on the code and 30 minutes on what the code doesn't show. That shift took me a while to figure out.

Why Traditional Code Review Fails for AI Code

When a human writes code, the PR tells a story. You can see their thought process in the commit history. You can infer their intent from the changes they made. You know they had context about the existing system because they work in it daily.

AI code has none of that. It looks polished on the surface, and that's exactly the problem. Polished code creates a false sense of security during review. Your brain says "this looks clean" and moves on. But "looks clean" and "is correct" are different things.

The three review traps I fell into:

  1. The formatting trap. AI code is beautifully formatted. Your eyes glide over it. You approve faster because it looks professional.
  2. The naming trap. AI picks reasonable variable names. You don't question the logic because the names make sense.
  3. The completeness trap. AI generates comprehensive-looking code with error handling, comments, and types. You assume it covers all the cases because it covers a lot of them.

The REVIEW Method for AI Code

I developed a 6-step method specifically for reviewing AI-generated PRs. It takes about 25 minutes per PR once you get the hang of it.

R - Read the Requirements First

Before looking at the code, read the ticket or requirements doc. Write down the 3 most important things the code must do. Then check if the code does those 3 things. AI code often solves a slightly different problem than what was asked.

E - Examine Edge Cases

AI is consistently weak on edge cases. For every function, ask yourself:

  • What happens with null or undefined input?
  • What happens with an empty array or empty string?
  • What happens with concurrent requests?
  • What happens when the network fails?

typescript
// AI generated this user lookup
async function getUserByEmail(email: string): Promise<User> {
  const user = await prisma.user.findUnique({
    where: { email },
  });
  return user!; // <-- Non-null assertion. What if the user doesn't exist?
}

// What it should be
async function getUserByEmail(email: string): Promise<User | null> {
  if (!email || !email.includes("@")) {
    return null;
  }
  return prisma.user.findUnique({
    where: { email: email.toLowerCase().trim() },
  });
}

V - Verify Against Existing Patterns

Open 2-3 similar files in your codebase. Does the AI code follow the same patterns? Check specifically:

  • Error handling approach
  • Import style and dependency choices
  • Naming conventions
  • File/folder structure
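
Error handling is where the drift shows up first. A hypothetical contrast (NotFoundError stands in for whatever domain error type your codebase actually uses; the prisma calls assume a Subscription model):

typescript
// AI default: generic Error, generic message, caller has to guess how to handle it
async function cancelSubscription(id: string): Promise<void> {
  const sub = await prisma.subscription.findUnique({ where: { id } });
  if (!sub) {
    throw new Error("Subscription not found");
  }
  // ...
}

// House pattern (hypothetical): a typed domain error the API error middleware already maps to a 404
async function cancelSubscription(id: string): Promise<void> {
  const sub = await prisma.subscription.findUnique({ where: { id } });
  if (!sub) {
    throw new NotFoundError("subscription", id);
  }
  // ...
}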

I - Inspect Security Boundaries

AI consistently generates code with security gaps. My checklist:

markdown
- [ ] Authentication check present where needed?
- [ ] Authorization check (not just authn)?
- [ ] Input validated and sanitized?
- [ ] No SQL/NoSQL injection vectors?
- [ ] No secrets hardcoded anywhere?
- [ ] Rate limiting considered?
- [ ] CORS configured properly?
- [ ] No sensitive data in logs?
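
The line between authentication and authorization is the one I see crossed most often. A minimal sketch (Express-style route; requireAuth, req.user, and the prisma model are placeholders for whatever your stack actually provides):

typescript
// AI version: authenticated, but any logged-in user can read any invoice
app.get("/invoices/:id", requireAuth, async (req, res) => {
  const invoice = await prisma.invoice.findUnique({ where: { id: req.params.id } });
  res.json(invoice);
});

// Reviewed version: ownership check, and a 404 instead of a 403 so we don't leak existence
app.get("/invoices/:id", requireAuth, async (req, res) => {
  const invoice = await prisma.invoice.findUnique({ where: { id: req.params.id } });
  if (!invoice || invoice.userId !== req.user.id) {
    return res.status(404).json({ error: "Not found" });
  }
  res.json(invoice);
});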

E - Evaluate Test Quality

Don't just check that tests exist. Check what they actually test.

typescript
// AI generated test (looks complete, tests nothing useful)
test("processPayment should process payment", async () => {
  const result = await processPayment({
    amount: 100,
    currency: "USD",
    userId: "user-1",
  });
  expect(result).toBeDefined();
  expect(result.status).toBe("success");
});

// What a useful test looks like
test("processPayment should fail for amounts below $0.50", async () => {
  await expect(
    processPayment({ amount: 0.49, currency: "USD", userId: "user-1" })
  ).rejects.toThrow("Amount below minimum");
});

test("processPayment should be idempotent for same idempotencyKey", async () => {
  const key = "idem-123";
  const first = await processPayment({
    amount: 100,
    currency: "USD",
    userId: "user-1",
    idempotencyKey: key,
  });
  const second = await processPayment({
    amount: 100,
    currency: "USD",
    userId: "user-1",
    idempotencyKey: key,
  });
  expect(first.chargeId).toBe(second.chargeId);
});

W - Watch for Unnecessary Complexity

AI loves to over-engineer. It'll add factory patterns, dependency injection, and abstract base classes where a simple function would do. If you find yourself thinking "why is this so complicated?", the answer is probably "because the AI's training data included a lot of enterprise Java patterns."

The complexity sniff test: Can you explain what the code does to a teammate in one sentence? If not, it's probably overcomplicated.
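
A made-up but representative example of what this looks like in a diff:

typescript
// AI version: an interface, a class, and a factory for exactly one message
interface NotificationFormatter {
  format(userName: string): string;
}

class WelcomeNotificationFormatter implements NotificationFormatter {
  format(userName: string): string {
    return `Welcome, ${userName}!`;
  }
}

class NotificationFormatterFactory {
  static create(_type: "welcome"): NotificationFormatter {
    return new WelcomeNotificationFormatter();
  }
}

// What it needed to be
function formatWelcomeMessage(userName: string): string {
  return `Welcome, ${userName}!`;
}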

Time-Boxing Your Review

Here's my time allocation for a typical AI-generated PR (500-800 lines):

  • Requirements check (3 min): Does it solve the right problem?
  • Architecture scan (5 min): Does it fit the existing codebase?
  • Edge case review (5 min): What's missing?
  • Security check (3 min): Run through the security checklist.
  • Test review (5 min): Do the tests verify behavior or implementation?
  • Overall assessment (4 min): Would I be comfortable debugging this at 2am?

Total: 25 minutes. Compare to 45+ minutes of unstructured reviewing.

The Contrarian Take

Most teams try to make code reviews faster. I think we should make them more selective. Not every AI-generated PR needs a full REVIEW pass.

My triage system:

  • Green (5 min review): UI tweaks, copy changes, config updates. AI does these well.
  • Yellow (15 min review): New features with no external integrations. Standard REVIEW method.
  • Red (25+ min review): Anything touching auth, payments, data migrations, or external APIs. Full REVIEW method plus a pair review with another senior engineer.

The key insight: spend less time on the code AI gets right (formatting, basic CRUD, UI components) and more time on what it gets wrong (security, edge cases, architecture).

What I Got Wrong

I used to leave long review comments explaining why the AI code was wrong. Complete waste of time. The person who prompted the AI often can't fix the underlying issue because they don't understand it either. They just re-prompt and get slightly different wrong code.

Now, when I find a significant issue in AI-generated code, I write a 3-line fix and push it directly to the branch. Then I leave a comment explaining the issue so the team learns from it. This is 5x faster and actually prevents the issue from recurring.

Making It Sustainable

AI code review fatigue is real. If every PR requires deep investigation, your team will burn out or start rubber-stamping. The antidote is automated guardrails that catch the common issues before code review even starts. Invest in custom ESLint rules, pre-commit hooks, and CI checks. Let machines catch machine mistakes. Save human review time for the things only humans can evaluate: "Does this make sense in context?"
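
For example, a lint rule would have flagged the non-null assertion in the getUserByEmail example before a human ever looked at it. A minimal sketch using ESLint's flat config with the typescript-eslint package (adapt it to whatever config format your repo already uses):

typescript
// eslint.config.mjs (a sketch, not a complete config)
import tseslint from "typescript-eslint";

export default tseslint.config(...tseslint.configs.recommended, {
  rules: {
    // Catches the `user!` pattern from the earlier lookup example
    "@typescript-eslint/no-non-null-assertion": "error",
    // Catches `any` types that AI output tends to smuggle in
    "@typescript-eslint/no-explicit-any": "error",
  },
});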
