How to Prevent AI Hallucination in Production

The Hard Part of Production AI Isn't the AI

If you want to prevent AI hallucination in production, the first thing to accept is that the model is the easy part. The capability is cheap now. Any developer can wire up an LLM that answers questions, writes copy, or generates images, and the demo will look incredible.

The demo always works. That's the problem.

The hard, valuable engineering is everything around the model. It's the constraint, not the capability. It's the part that stops the model from confidently inventing something that costs your client money, triggers a lawsuit, or damages their reputation.

Here's what I mean in plain terms. An AI that recommends a product you don't sell is a liability. An AI that tells a business owner he can take a job he isn't licensed to do is a liability. An AI that draws a fake version of your product label, with the wrong ingredients, is a liability. None of those are features. They look like features in a sandbox, then they blow up in front of a customer.

I've built 15+ AI systems in production across my own DTC fashion brand and for clients. The thread running through all of them isn't a clever prompt or a bigger model. It's the same engineering muscle: constrain the model so it physically cannot do the thing that hurts you.

In this article I'll walk through three real builds, in three different industries, that all came down to that same move. These are shipped systems, not slides. And in every case the work that mattered wasn't teaching the AI to be smart. It was stopping it from being confidently wrong.

Why a Confident AI Is More Dangerous Than a Wrong One

Hallucination isn't random noise. That's the part most people get wrong about it.

[Figure: Model judges, code computes. The AI model handles judgment on the left and code handles truth retrieval on the right, separated by a boundary line so the model never generates factual values.]

A hallucination is plausible-sounding fiction delivered with total confidence. The model doesn't stammer or flag uncertainty. It states the invented price, the fake SKU, the out-of-scope promise in exactly the same tone it uses for facts. Your customer cannot tell the difference. Neither can your staff, half the time.

That's the specific failure mode that hurts businesses. A sales assistant quotes a SKU that doesn't exist. An intake agent promises a service the company doesn't offer. An image generator renders a product label with the wrong claims on it. Each of those is delivered with the same polished confidence as a correct answer, which is exactly why it's dangerous.

You don't fix this with a better prompt. You don't fix it with a bigger model. Prompting "don't make things up" reduces the frequency, but it doesn't remove the capability. As long as the model is allowed to generate the answer freely, it can invent freely.

The real fix is structural. You remove the model's ability to make things up in the first place. You take the dangerous output out of the model's hands entirely and hand it to code that can only return real values.
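Here's a minimal sketch of what "structural" can look like, in Python. It is an illustration, not code from any of the builds in this article. The assumption is that the model writes only a sentence template with placeholders, and code fills in the values from a trusted record. The names `TRUSTED_FACTS` and `render_reply` and the sample product are hypothetical.

```python
import re
from string import Template

# Hypothetical trusted record, standing in for a row from your own database.
TRUSTED_FACTS = {"name": "Trail Shell Jacket", "price": "$74.00"}


def render_reply(model_template: str, facts: dict) -> str:
    """Fill a model-written template with values the model never generated."""
    # Refuse any template where the model typed a number itself,
    # since that number could be an invented price or spec.
    if re.search(r"\d", model_template):
        raise ValueError("model output contains a literal number")
    # substitute() raises KeyError if the model references a field
    # that does not exist in the trusted record.
    return Template(model_template).substitute(facts)


reply = render_reply("The $name is $price today.", TRUSTED_FACTS)
print(reply)
```

The model still chooses the wording and the tone. It just never gets to type the price. A template that names a field you don't have fails loudly instead of being quietly filled in with a guess.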

This is the core reason most AI projects fail. Teams treat the model as the whole system. They ship the raw output straight to the customer and hope the prompt holds.

The roughly 12% of AI projects that actually work in production treat the model as one component inside a larger system that constrains it. The model gets to be smart about judgment. The system stays in charge of truth. That separation is the entire game.

Constraint One: Lock the Recommender to a Real Catalog

The first build was an AI sales assistant for a brand selling a large catalog of physical products.

The naive version that hallucinates

The naive build is simple. You hand the model a customer question and let it answer freely. "What's a good waterproof option under $80?" The model writes a confident, friendly answer, and somewhere in that answer it names a product, a price, and a spec.

The trouble is the model invents all three. It generates a product name that sounds right because similar names appeared in its training data. It guesses a price. It assigns specs that may or may not match anything in your inventory. The customer reads a polished recommendation for something you've never sold.

In a catalog of hundreds of items, this happens constantly. And it's invisible until a customer tries to buy the thing that doesn't exist.

How catalog-locking actually works

Here's the fix. The model never generates a SKU. Not once.

[Figure: Catalog-locked recommender vs. naive recommender. A naive AI recommender invents fake SKUs and prices, while a catalog-locked system retrieves real products from a verified index.]

The model's only job is to read the customer's intent and match it. The code's job is to retrieve real, in-stock items at their real, current prices from a verified product index. The model can say "you want something waterproof and affordable," and then the system returns the actual products that fit, pulled from the source of truth.

This is the principle I write about constantly: let the model judge and let the code compute. The model is good at judgment, matching intent, reading tone. Code is good at retrieving exact values. You separate those two responsibilities and never let them blur.
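A rough sketch of that split, in Python, might look like the following. It assumes the model's output has already been parsed into a small structured intent (tags plus an optional price ceiling) and that the catalog is an in-memory list. The `Product` and `Intent` types, the `recommend` function, and the sample rows are all hypothetical stand-ins, not the production system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Product:
    sku: str
    name: str
    price: float
    tags: frozenset
    in_stock: bool


@dataclass(frozen=True)
class Intent:
    """The only thing the model produces: what the customer wants."""
    required_tags: frozenset
    max_price: Optional[float] = None


# Hypothetical rows; in practice these come from the live product index.
CATALOG = [
    Product("SKU-001", "Rain Shell", 74.0, frozenset({"waterproof", "jacket"}), True),
    Product("SKU-002", "Storm Parka", 129.0, frozenset({"waterproof", "jacket"}), True),
    Product("SKU-003", "Canvas Tote", 35.0, frozenset({"bag"}), True),
    Product("SKU-004", "Dry Bag", 42.0, frozenset({"waterproof", "bag"}), False),
]


def recommend(intent: Intent, catalog: list) -> list:
    """Return real, in-stock products matching the intent, cheapest first."""
    matches = [
        p for p in catalog
        if p.in_stock
        and intent.required_tags <= p.tags  # every requested tag is present
        and (intent.max_price is None or p.price <= intent.max_price)
    ]
    return sorted(matches, key=lambda p: p.price)


# "What's a good waterproof option under $80?" parsed into an intent.
intent = Intent(required_tags=frozenset({"waterproof"}), max_price=80.0)
for product in recommend(intent, CATALOG):
    print(product.sku, product.name, product.price)
```

Every SKU, name, and price in the result is an object that already existed in the catalog. If nothing matches, the function returns an empty list, and the assistant has to say so rather than invent an alternative.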

In my own DTC brand I run 564+ products that are dynamically priced by an AI pricing engine on a four-tier classification. Those prices change. If a recommendation assistant ever guessed a price instead of pulling the live one, it would quote numbers that are wrong the moment they're spoken. The catalog has to stay authoritative, always.

The payoff is structural. The assistant is genuinely helpful and it is incapable of selling something that doesn't exist. Not unlikely to. Incapable. That's the difference a constraint makes.

Constraint Two: Gate the Agent Against a Real Credential

The second build was a research agent for a licensed contractor.

Telling the owner when to say no

The owner wanted an agent that could research an incoming job, figure out what was involved, and tell him whether to take it. The obvious version is an enthusiastic AI that scopes the work and says "yes, you can do this, here's how."

That version is exposure, not help.

A lot of contracting work requires a specific license class or a jurisdiction-specific permit. If the agent looks at a job, gets excited, and greenlights work the owner isn't licensed to perform, it has just walked him into a code violation, a failed inspection, or worse. The AI being helpful and the AI being safe are pointing in opposite directions here.

Why the no is the valuable output

So we gated every recommendation against two real things: the owner's actual credentials and the actual rules for the jurisdiction.

[Figure: Credential gate kill switch decision flow. An AI contractor agent gates every job recommendation against the owner's credentials and jurisdiction permits, with a kill switch that forces a stop when checks fail.]

Before the agent ever says "take this job," it checks whether the work falls inside what he's licensed to do, and whether the jurisdiction requires a permit he'd need to pull. If the job needs a license class he doesn't hold, the agent tells him to walk away or partner with someone who does.

The most valuable output of that agent is the no. A constrained AI that says "you can't legally do this" is worth far more than an unconstrained one that says yes to everything.

This is why every system I ship stops for a human. The credential check isn't a polite suggestion buried in the response. It's a kill switch built into the logic. The agent cannot route a job into the "go ahead" path if the credential check fails. The model's enthusiasm doesn't get a vote on a legal question.

That's the whole point. You decide ahead of time what the AI must never get wrong, then you build the system so it can't override that boundary, no matter how confident it sounds.
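To make the kill-switch idea concrete, here is a simplified Python sketch. It assumes the required license class and the jurisdiction have already been determined for the job, and that permit rules are a simple lookup table. The license labels, the jurisdiction name, and the `gate` function are made up for illustration; real licensing rules are far more detailed than a dictionary.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    GO = "go"
    STOP = "stop"


@dataclass(frozen=True)
class Job:
    description: str
    required_license: str  # hypothetical label, e.g. "class-B"
    jurisdiction: str


def gate(job: Job, held_licenses: frozenset, permit_rules: dict, model_says_go: bool):
    """Return (decision, reasons). The model's opinion only counts after checks pass."""
    reasons = []
    if job.required_license not in held_licenses:
        reasons.append(f"license {job.required_license} not held")
    if job.jurisdiction not in permit_rules:
        # Unknown rules are treated as a failed check, not as permission.
        reasons.append(f"no rules on file for {job.jurisdiction}")
    if reasons:
        return Decision.STOP, reasons  # kill switch: model_says_go is ignored
    if permit_rules[job.jurisdiction]:
        reasons.append("permit required before work starts")
    return (Decision.GO if model_says_go else Decision.STOP), reasons


job = Job("rewire a service panel", "class-B", "county-x")
decision, reasons = gate(
    job,
    held_licenses=frozenset({"class-A"}),
    permit_rules={"county-x": True},
    model_says_go=True,
)
print(decision, reasons)
```

The design choice that matters is the early return. Once a check fails, the model's recommendation is never read, so no amount of confident phrasing can move the job into the go-ahead path.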

Constraint Three: Composite the Real Product, Don't Let the Model Draw It

The third build was product imagery for a brand selling a labeled physical product, the kind that comes in a bottle.

Where generated product images go wrong

Ask an image model to generate your product and it will absolutely produce something gorgeous. It will also invent the label.

The text gets garbled. The ingredient list changes. The model adds a claim you'd never put on packaging, or removes one you're legally required to show. For a generic prop in a lifestyle shot, fine. For a regulated or branded product, that's a legal and brand problem wearing a pretty filter.

Nobody catches it at first, because the image looks professional. The garbled label only becomes a problem when it's live on a product page and someone reads it closely.

Real label, real photo, AI does the staging

The fix is to never let the model draw the part that has to be exact.

[Figure: Composite real label vs. AI-generated label. A fully AI-generated product image has a garbled label, while the composite approach drops in the real product label and lets AI handle only the staging.]

We composite the real product asset, the actual label and the actual photograph, and let AI handle only the staging. Lighting, background, scene, mood. The creative parts. The label fidelity is untouchable because the label is a real asset dropped into the frame, not something the model imagined.
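The compositing step itself is simple enough to show as a toy. The Python sketch below uses nested lists of RGB tuples in place of real images so it runs with no dependencies. A real pipeline would use an imaging library and handle masks, shadows, and scaling, none of which is shown here. The function names `composite` and `label_intact` are hypothetical.

```python
Pixel = tuple[int, int, int]
Image = list[list[Pixel]]


def composite(background: Image, product: Image, top: int, left: int) -> Image:
    """Paste the real product pixels onto a copy of the AI-generated background."""
    out = [row[:] for row in background]  # leave the original background untouched
    height, width = len(product), len(product[0])
    if top < 0 or left < 0 or top + height > len(out) or left + width > len(out[0]):
        raise ValueError("product does not fit inside the background")
    for r in range(height):
        out[top + r][left:left + width] = product[r]
    return out


def label_intact(final: Image, product: Image, top: int, left: int) -> bool:
    """Check that the product region in the final image matches the source exactly."""
    return all(
        final[top + r][left:left + len(row)] == row
        for r, row in enumerate(product)
    )


# A 4x4 black "generated scene" and a 2x2 "real product photo".
scene = [[(0, 0, 0)] * 4 for _ in range(4)]
product_photo = [[(255, 255, 255), (10, 20, 30)], [(40, 50, 60), (255, 255, 255)]]

final = composite(scene, product_photo, top=1, left=1)
print(label_intact(final, product_photo, top=1, left=1))
```

The check at the end is the useful part. Because the label pixels are copied rather than generated, you can verify after the fact that the product region is identical to the source asset.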

This is a pattern I cover in detail elsewhere: composite the real thing instead of generating it. The model does the part where being creative is safe. The source of truth, the actual product, stays exactly as it is.

In my own product photography pipeline I take this further. The system scores its own output and rejects shots that don't hold up, so the creative work is constrained by a quality bar too. But the foundational rule is the same as the other two builds. The model is creative about the background. It is never creative about the label.

Same move as the catalog. Same move as the credential. The thing that must be exact is held outside the model's reach.

Same Engineering Muscle, Three Industries

Look at all three and the pattern is identical.

[Figure: One engineering muscle across three industries. One constraint engineering move applied across three industries, each protecting a different source of truth: the catalog, credentials, and the real product asset.]

In the recommender, the source of truth was the catalog. In the contractor agent, it was the credential and the jurisdiction rules. In the imagery pipeline, it was the real product asset. Three industries, three completely different problems, one engineering move.

Identify the source of truth. Make the model structurally incapable of overriding it. Then let the model do only the part where being creative or judgmental is actually safe.

That reduces to a rule you can use on any project: figure out what the AI must never get wrong, then engineer it so it can't. Not "remind it not to." Engineer it so the wrong answer isn't a path the system can take.
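One generic way to make the wrong answer "not a path" is to let the model choose only from a closed set of actions, and send anything outside that set to a human. The Python sketch below assumes the model returns a single action string. The `Action` values and `parse_action` are hypothetical examples, not a list from any real system.

```python
from enum import Enum


class Action(Enum):
    SHOW_PRODUCTS = "show_products"
    ASK_CLARIFYING_QUESTION = "ask_clarifying_question"
    ESCALATE_TO_HUMAN = "escalate_to_human"


def parse_action(raw: str) -> Action:
    """Map the model's raw output onto the closed set, defaulting to a human."""
    try:
        return Action(raw.strip().lower())
    except ValueError:
        # The model asked for something that isn't on the list.
        return Action.ESCALATE_TO_HUMAN


print(parse_action("show_products"))
print(parse_action("issue_refund"))
```

There is no code path for "issue_refund" here, so the model asking for one changes nothing. The worst case is that a person gets pulled in.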

This is the line between a demo and a production system. A demo shows you the capability. A production system shows you the capability with all the dangerous outputs fenced off.

And notice this is the same muscle whether you're in fashion, trades, or regulated consumer products. That's why range matters more than narrow specialization. The constraint thinking transfers. The industry specifics are just a different source of truth to protect.

I learned this the expensive way, building 15+ systems for my own DTC brand before I ever did it for a client. The constraints in my work came from real failures, watching a model confidently do something I never would have approved, not from a whitepaper. By the time I built these for clients, I already knew where the model lies.

Build the Constraints Before You Build the Capability

If you've been burned by an AI vendor, here's what probably happened. They sold you capability and skipped the constraints. The demo dazzled. The production rollout embarrassed someone.

That's the whole story of bad AI projects. The capability was never the hard part. The vendor built the impressive thing and left out the unglamorous work that keeps it from being confidently wrong in front of your customers.

If you're putting AI anywhere near customers, money, or compliance, the question is not "can the model do this." It can. The question is "what stops it from doing the wrong thing confidently." That's the work. It's where I spend most of my time, and it's the part that almost nobody shows you in a sales call.

So if you want AI in your business that's built to be wrong-proof on the things that actually matter, that's exactly what I build. Real catalogs the model can't break. Credential gates the model can't override. Source-of-truth assets the model can't redraw. The way to do this right is to build the constraints before the capability.

I'd rather tell you what won't work yet than sell you a demo that falls over in month two.
