AI Code Generation: When to Trust It and When to Rewrite
Six months ago, I had a simple rule: always review AI code carefully but accept it if it looks correct. I was wrong. "Looks correct" isn't the right bar. Some AI-generated code that looks perfect should still be rewritten. And some AI code that looks rough is perfectly fine to ship.
The difference isn't about code quality. It's about what kind of code the AI generated and where it lives in your system.
The Trust Spectrum
After analyzing outcomes from hundreds of AI-generated code blocks, I've mapped out where AI excels and where it consistently fails. This isn't opinion. It's based on tracking bug reports and maintenance costs back to their source code.
High Trust: Ship It (With Standard Review)
UI components and layouts. AI is remarkably good at generating React/Vue/Svelte components. It's seen millions of them in training data, and UI components are relatively self-contained. The patterns are well-established, and errors are immediately visible.
// AI-generated component: usually trustworthy
export function UserCard({ user }: { user: User }) {
  return (
    <div className="rounded-lg border p-4 shadow-sm">
      <div className="flex items-center gap-3">
        <img
          src={user.avatarUrl}
          alt={user.name}
          className="h-10 w-10 rounded-full"
        />
        <div>
          <h3 className="font-medium">{user.name}</h3>
          <p className="text-sm text-gray-500">{user.email}</p>
        </div>
      </div>
    </div>
  );
}
// Why trust it: self-contained, visual errors caught immediately,
// follows established patterns

Type definitions and interfaces. AI generates accurate TypeScript types from descriptions. Types are declarative, easy to verify, and if wrong, the compiler catches it.
Config files and boilerplate. Dockerfile, docker-compose, tsconfig, ESLint configs. AI generates these well because they're well-documented and standardized.
Data transformation functions. Simple map/filter/reduce operations, format conversions, data reshaping. These are easy to test and verify.
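A typical example of a transformation that's safe to accept, because two or three test cases verify it completely (the `OrderRow` shape is invented for illustration):

```typescript
// Hypothetical shape: reshape flat order rows into totals keyed by customer.
interface OrderRow {
  customerId: string;
  total: number;
}

function totalsByCustomer(rows: OrderRow[]): Record<string, number> {
  return rows.reduce<Record<string, number>>((acc, row) => {
    acc[row.customerId] = (acc[row.customerId] ?? 0) + row.total;
    return acc;
  }, {});
}
```

Pure input-to-output functions like this are exactly where a quick test beats a careful read.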
Medium Trust: Review Carefully, Modify as Needed
CRUD API endpoints. AI generates working CRUD endpoints but frequently misses authorization checks, input validation edge cases, or proper error responses. The structure is right; the details need work.
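As a sketch of the detail to check for, here is a framework-agnostic update handler (the names `Ctx`, `Post`, and `canEdit` are invented for illustration) where the authorization and validation lines are exactly what AI-generated endpoints tend to omit:

```typescript
// Framework-agnostic sketch; Ctx, Post, and canEdit are hypothetical names.
interface Ctx {
  userId: string;
  role: "admin" | "member";
}
interface Post {
  id: string;
  authorId: string;
  title: string;
}

function canEdit(ctx: Ctx, post: Post): boolean {
  // The check AI-generated CRUD code most often omits:
  // ownership OR elevated role, not merely "is logged in".
  return ctx.role === "admin" || ctx.userId === post.authorId;
}

function updateTitle(
  ctx: Ctx,
  post: Post,
  title: string
): { status: number; post?: Post } {
  if (!canEdit(ctx, post)) return { status: 403 };
  // Input validation edge case AI frequently misses: empty-after-trim.
  if (title.trim().length === 0) return { status: 422 };
  return { status: 200, post: { ...post, title: title.trim() } };
}
```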
Database queries. Simple queries are fine. Complex queries with joins, subqueries, or aggregations need careful verification. AI often generates queries that are correct but inefficient.
Form validation logic. AI generates reasonable validation but misses domain-specific rules. It knows email format validation but not that your system requires company email addresses only.
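For example, the domain rule layered on top of the generic format check (the allowed-domain list here is hypothetical):

```typescript
// Generic format check plus a domain rule the AI cannot know about.
const ALLOWED_DOMAINS = ["example.com"]; // hypothetical company domains

function isCompanyEmail(email: string): boolean {
  // Basic format validation: the part AI gets right.
  if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email)) return false;
  // The domain-specific rule AI misses: company addresses only.
  const domain = email.split("@")[1].toLowerCase();
  return ALLOWED_DOMAINS.includes(domain);
}
```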
Low Trust: Usually Rewrite
Authentication and authorization logic. This is the most dangerous category. AI generates auth code that looks secure but has subtle flaws. Missing token expiry checks, incorrect role comparisons, or IDOR vulnerabilities.
// AI generated this auth middleware. Looks fine. Has a critical bug.
async function requireAdmin(req: Request, res: Response, next: NextFunction) {
  const token = req.headers.authorization?.split(" ")[1];
  if (!token) return res.status(401).json({ error: "Unauthorized" });
  const decoded = jwt.verify(token, process.env.JWT_SECRET!);
  if (decoded.role === "admin") {
    req.user = decoded;
    return next();
  }
  return res.status(403).json({ error: "Forbidden" });
}
// Problems:
// 1. jwt.verify throws on invalid tokens; the uncaught exception
//    takes down the request handler (or the process)
// 2. Expired tokens aren't handled: jwt.verify throws TokenExpiredError,
//    which this code also never catches
// 3. decoded.role comes from the token payload, which the client controls
//    if you're not using asymmetric signing
// 4. No type narrowing on decoded (jwt.verify can return a string,
//    not just JwtPayload)

Business logic with domain rules. AI doesn't know that your shipping rates changed last quarter, or that California orders have special tax rules, or that enterprise customers get different pricing tiers. It generates plausible business logic that's subtly wrong.
Error handling and recovery flows. AI generates generic error handling. Your system needs specific error handling based on which downstream services fail, what the fallback behavior should be, and what needs to be rolled back.
Database migrations. AI-generated migrations frequently miss data preservation, create incorrect indexes, or handle null values incorrectly. Always write these by hand.
Concurrency and state management. AI doesn't handle race conditions, deadlocks, or distributed system concerns well. If your code deals with concurrent access, locks, or distributed transactions, write it yourself.
The Decision Matrix
Use this matrix for any AI-generated code block:
| Question | Yes → | No → |
|---|---|---|
| Is this a well-known pattern (CRUD, UI, config)? | Lean toward trust | Lean toward rewrite |
| Can I verify correctness by looking at it for 2 minutes? | Lean toward trust | Lean toward rewrite |
| Is the blast radius small if there's a bug? | Lean toward trust | Lean toward rewrite |
| Does the codebase already have a similar pattern I can compare against? | Lean toward trust | Lean toward rewrite |
| Does this touch user data, money, or access control? | Lean toward rewrite | Lean toward trust |
| Does this require knowledge of business rules? | Lean toward rewrite | Lean toward trust |
| Will a failure here be silent (no immediate error)? | Lean toward rewrite | Lean toward trust |
Scoring: Count your "lean toward trust" answers.
- 5-7: Ship it with standard review.
- 3-4: Review thoroughly and modify where needed.
- 0-2: Rewrite from scratch. Use the AI output as a reference, not a starting point.
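The matrix and scoring are mechanical enough to encode directly, if you want to keep the tally honest. A sketch, with question wording shortened into field names (all names invented here):

```typescript
// The seven matrix questions. For the last three, "yes" leans toward
// rewrite, so they're inverted when scoring.
interface TrustAnswers {
  wellKnownPattern: boolean;        // CRUD, UI, config?
  verifiableInTwoMinutes: boolean;
  smallBlastRadius: boolean;
  similarPatternInCodebase: boolean;
  touchesDataMoneyOrAccess: boolean; // yes → lean rewrite
  needsBusinessRules: boolean;       // yes → lean rewrite
  failsSilently: boolean;            // yes → lean rewrite
}

type Verdict = "ship" | "review-and-modify" | "rewrite";

function trustVerdict(a: TrustAnswers): Verdict {
  // Count "lean toward trust" answers, per the scoring rule above.
  const score =
    Number(a.wellKnownPattern) +
    Number(a.verifiableInTwoMinutes) +
    Number(a.smallBlastRadius) +
    Number(a.similarPatternInCodebase) +
    Number(!a.touchesDataMoneyOrAccess) +
    Number(!a.needsBusinessRules) +
    Number(!a.failsSilently);
  if (score >= 5) return "ship";
  if (score >= 3) return "review-and-modify";
  return "rewrite";
}
```

Run the auth middleware from earlier through it and it lands squarely on "rewrite"; the UserCard component lands on "ship".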
The Rewrite Doesn't Mean Starting Over
When I say "rewrite," I don't mean throw away the AI output entirely. I mean use it as a specification of what the code should do, then write the implementation yourself.
The AI output tells you: "Here are the functions needed, here are the inputs and outputs, here's the general structure." That's valuable even if the implementation itself isn't trustworthy.
// AI generated a payment processing function
// I used its structure but rewrote the implementation

// FROM AI: I kept the function signature and error types
interface PaymentResult {
  success: boolean;
  chargeId?: string;
  error?: PaymentError;
}

// FROM AI: I kept the general flow outline
// MY CODE: I wrote the actual implementation with our patterns
async function processPayment(
  order: Order,
  idempotencyKey: string
): Promise<Result<PaymentResult, PaymentError>> {
  // Check idempotency (AI missed this entirely)
  const existing = await db.payment.findUnique({
    where: { idempotencyKey },
  });
  if (existing) return ok(existing.result);

  // Validate amount (AI used wrong minimum)
  if (order.total < 0.50) {
    return err({ code: "AMOUNT_TOO_LOW", message: "Minimum charge is $0.50" });
  }

  // Process with Stripe (AI used try/catch, we use Result;
  // the idempotency key belongs in request options, not charge params)
  const charge = await stripe.charges
    .create(
      {
        amount: Math.round(order.total * 100),
        currency: "usd",
      },
      { idempotencyKey }
    )
    .then((c) => ok(c))
    .catch((e) => err(mapStripeError(e)));
  if (!charge.ok) return charge;

  // Record payment (AI forgot to record before returning)
  await db.payment.create({
    data: {
      orderId: order.id,
      chargeId: charge.value.id,
      amount: order.total,
      idempotencyKey,
      result: { success: true, chargeId: charge.value.id },
    },
  });
  return ok({ success: true, chargeId: charge.value.id });
}

The Contrarian Take
Most AI code quality advice says "review everything carefully." I think that's wrong and unsustainable. Careful review of everything leads to review fatigue, which leads to rubber-stamping, which is worse than selective trust.
Instead, trust AI code in the categories where it consistently performs well. Focus your limited review energy on the categories where it consistently fails. This isn't being careless. It's being strategic about where human attention adds the most value.
The teams I work with that perform best aren't the ones that review AI code most thoroughly. They're the ones that review the right AI code thoroughly and ship the rest quickly.
Your Action Items
- Print the decision matrix and use it for your next 10 AI-generated code blocks
- Track which category each block falls into
- After two weeks, you'll have data on where your team's AI usage falls on the trust spectrum
- Adjust your process: automate checks for high-trust code, allocate more review time for low-trust code
- Re-evaluate monthly as AI tools improve and your team's patterns evolve
The goal isn't to distrust AI. It's to trust it precisely where it deserves trust, and apply human judgment precisely where it's needed.