Tags: llm-guardrails, catalog, quoting, claude, structured-output

Prevent AI Hallucination in Product Recommendations

How I built a catalog-locked AI sales assistant that can't invent a SKU. The architecture that prevents AI hallucination in product recommendations.

By Mike Hodgen


The fear that kills most customer-facing AI projects

A distributor came to me with a simple ask. He sold packaging supplies, about 20 SKUs total, fixed catalog, nothing fancy. He wanted an AI assistant on his site that could take a plain-English question like "what do I use to ship fragile candles" and point the customer to the right box.

[Figure: The three-move architecture pipeline (inject, structure, validate). Catalog injection, structured SKU output, and deterministic validation all happen before anything reaches the customer.]

His first question wasn't "can it be smart." It was "can it lie to my customers."

That's the right question. That's the only question that matters before you put AI in front of paying customers. Because the moment your assistant confidently recommends a SKU you don't carry, you've built something worse than useless. You've built a machine that quotes products you can't fulfill, in your own brand voice, to people about to buy.

I hear this fear in every CEO conversation. It comes out different ways. "What if it makes something up." "What if it promises a discount we don't offer." "What if it tells someone we have a product we discontinued." Same underlying worry. AI will confidently invent things and the customer won't know the difference.

The fear is correct. Left alone, a language model absolutely will do this. But here's what most people get wrong about the fix.

This is not a prompt problem. You cannot write a clever enough instruction to make hallucination impossible. The reason customer-facing AI projects stall at the demo stage is that everyone treats this as something you can prompt your way out of. You can't. To prevent AI hallucination in product recommendations, you need architecture, not better wording. Let me show you the three moves that actually make it safe.

Why a better prompt doesn't fix hallucination

Here's the fix almost everyone tries first. They add a line to the system prompt: "Only recommend products we actually sell." Then they ship it.

[Figure: Why a prompt cannot guarantee truth vs. a deterministic gate. A prompt-only approach reduces hallucination but still leaks fake SKUs; a deterministic gate rejects fabricated products before they reach the customer.]

That reduces hallucination. It does not eliminate it. And the gap between "reduced" and "eliminated" is exactly where your brand reputation lives.

An LLM is a probability machine, not a database lookup. When you ask it for a product recommendation, it isn't checking a list. It's predicting the most likely next words based on patterns it learned from billions of pages of text. Most of the time those predictions land on real products you carry. But under the right phrasing, the right edge case, the right unusual customer question, it will generate a plausible-sounding SKU code that does not exist. It'll invent "HD-1200 Heavy Duty Mailer" because that's a believable thing a packaging company would sell. It just isn't a thing you sell.

The prompt says "only recommend real products." The model agrees, sincerely, and then invents one anyway, because it has no actual mechanism to know what's real. It's pattern-matching what a real answer looks like.

Here's the line I make every client sit with: if a wrong answer can ever reach a customer, you don't have a safe system. Not "rarely wrong." Not "usually correct." If there exists any path where a fabricated product renders on the page, your AI can lie to your customers, and you're one weird question away from finding out.

The only way to get an actual guarantee is to stop relying on the model to police itself. You put a deterministic gate between the model and the customer. Code that cannot be persuaded, cannot be confused, and cannot improvise.

That gate is built from three moves. Feed the model the exact catalog. Force it to return verbatim SKU codes in structured output. Then validate every single one against reality before it renders. Each move sets up the next. Together they turn "probably honest" into "physically incapable of lying." Let's build them.

Move 1: Feed the model the exact catalog, not a description of it

The first constraint changes what the model's job even is.

By default, when you ask an AI about packaging, it draws on its training data and vague category knowledge. It "knows about" mailers and boxes the way it knows about everything: fuzzily, statistically, with no connection to your actual inventory. That's the source of the invention.

So you take that away. You inject the literal catalog into context. Every real SKU code, the product name, dimensions, the use case, current stock status, every row of the real table, dropped directly into the model's working memory at the moment of the question.

For 20 SKUs this is trivial and cheap. It's a few hundred tokens. You're not training anything, you're not building a vector database, you're just handing the model the menu before you ask it to order.

Now the model's job is completely reframed. It's no longer "know about packaging." It's "pick from this exact list." The customer says "I ship fragile candles," and the model maps that plain-English intent to the real rows that fit: the cushioned mailer, the small double-wall box, whatever you actually stock for fragile goods. It's doing the part it's genuinely good at, understanding messy human language, while pointed at a finite real-world set.
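To make the injection step concrete, here is a minimal Python sketch. Treat it as an illustration under assumptions, not the code from the client project: the catalog rows, the field names, the ML-0609-CU code, and the build_system_prompt helper are all placeholders I made up, and BX-0408-DW is the only SKU taken from this article.

```python
import json

# Hypothetical catalog rows. In a real system these come from your product table.
CATALOG = [
    {"sku": "BX-0408-DW", "name": "Small double-wall box",
     "use_case": "fragile small goods", "in_stock": True},
    {"sku": "ML-0609-CU", "name": "Cushioned mailer",
     "use_case": "light fragile items", "in_stock": True},
]


def build_system_prompt(catalog):
    """Render every catalog row into the prompt so the model picks from a fixed list."""
    rows = "\n".join(json.dumps(row, sort_keys=True) for row in catalog)
    return (
        "Recommend products ONLY from the catalog below.\n"
        "Copy SKU codes exactly as written.\n"
        "CATALOG (one JSON object per line):\n"
        f"{rows}"
    )


prompt = build_system_prompt(CATALOG)
print(prompt)
```

The prompt is rebuilt from the live table on every request, so a discontinued row disappears from the model's menu the moment it disappears from the catalog.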

This is the foundation of what I call a catalog-locked AI assistant. The same principle covers credentials, pricing, and product composites too; I cover the full pattern in a separate piece.

One thing to flag because it always comes up. For a small fixed catalog, you inject the whole thing, done. For a 5,000-SKU store you don't dump the entire catalog into every request. You retrieve the relevant slice first, the 20 or 50 candidate products that match the query, then inject those. The mechanism scales. The principle is identical: the model only ever sees and chooses from real rows.
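For the larger-catalog case, the retrieval step can be sketched like this. I'm using naive keyword overlap purely as a stand-in; a production system would more likely use proper search or embeddings. The catalog rows, every SKU except BX-0408-DW, and the retrieve_candidates helper are hypothetical.

```python
import re

# Hypothetical rows standing in for a much larger catalog.
CATALOG = [
    {"sku": "BX-0408-DW", "name": "Small double-wall box", "use_case": "fragile small goods"},
    {"sku": "ML-0609-CU", "name": "Cushioned mailer", "use_case": "light fragile items"},
    {"sku": "TP-0002-CL", "name": "Clear packing tape", "use_case": "sealing cartons"},
]


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_candidates(query, catalog, limit=50):
    """Return up to `limit` real catalog rows, ranked by keyword overlap with the query."""
    query_words = tokenize(query)
    scored = []
    for row in catalog:
        overlap = len(query_words & tokenize(row["name"] + " " + row["use_case"]))
        if overlap > 0:
            scored.append((overlap, row))
    # Stable sort: rows with equal overlap keep their catalog order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:limit]]


candidates = retrieve_candidates("I ship fragile candles", CATALOG)
print([row["sku"] for row in candidates])
```

Whatever ranking you use, the property that matters is that the function can only return rows that already exist in the table. Retrieval narrows the list; it never adds to it.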

But injecting the catalog isn't the guarantee. The model could still, in theory, garble a SKU code. That's why the next move matters.

Move 2: Force verbatim SKU codes and JSON-only output

Two rules here, and both exist to make the third move possible.

[Figure: JSON structured output with a verifiable SKU field, contrasted with unverifiable prose output where fake products can hide.]

Rule one: the model returns the SKU code verbatim. Copied exactly from the table I gave it. Not paraphrased, not reconstructed, not "close enough." If the catalog says BX-0408-DW, the model returns BX-0408-DW, character for character. The instant the model tries to be helpful and reformat it, validation gets harder, so I forbid it.

Rule two: output is JSON only, never prose. This is the one people resist because prose feels friendlier. It's also exactly what makes hallucination undetectable.

Think about a sentence like "I'd suggest our heavy-duty candle mailer for that." Where's the machine-checkable claim in there? There isn't one. "Heavy-duty candle mailer" isn't a SKU. It's a phrase. Your code can't look it up, can't verify it, can't reject it. A fabricated product hides perfectly inside a friendly paragraph because the paragraph has no field your code can check.

JSON with a sku field has a claim you can verify. Every recommendation becomes a parseable line item, not a story.

Here's the shape I ask for, generic field names:

{
  "recommendations": [
    {
      "sku": "BX-0408-DW",
      "reason": "Double-wall box sized for fragile small goods",
      "quantity": 1
    }
  ]
}

That sku field is the whole point. It's a discrete, isolated, exact value sitting in a structured slot. The reason field can be as conversational as you like, because nothing in the reason gets trusted as a product claim. Only the SKU does, and the SKU is verifiable. The product details the customer sees should come from the catalog row that SKU points to, not from the model's wording.

Structured output also kills a second class of problem. The model can't sneak in a recommendation buried in a closing pleasantry, can't append "and you might also like our new line of..." There is no free text channel where an off-list product can ride along. There are line items, each with a SKU, or there's nothing.
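Here's a rough sketch of what "JSON only" can look like on the receiving side, using nothing but Python's standard json module. The parse_recommendations helper and its exact rules are my assumptions for illustration; the field names match the shape shown above. Anything that isn't that shape is refused rather than repaired.

```python
import json

ALLOWED_ITEM_KEYS = {"sku", "reason", "quantity"}


def parse_recommendations(raw):
    """Parse the model's reply; raise ValueError on anything outside the agreed shape."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("reply is not JSON") from exc
    if not isinstance(data, dict) or set(data) != {"recommendations"}:
        raise ValueError("unexpected top-level shape")
    items = data["recommendations"]
    if not isinstance(items, list):
        raise ValueError("recommendations must be a list")
    for item in items:
        if not isinstance(item, dict) or set(item) != ALLOWED_ITEM_KEYS:
            raise ValueError("unexpected item fields")
        if not isinstance(item["sku"], str):
            raise ValueError("sku must be a string")
        quantity = item["quantity"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity < 1:
            raise ValueError("quantity must be a positive integer")
    return items


reply = '{"recommendations": [{"sku": "BX-0408-DW", "reason": "Fits candles", "quantity": 1}]}'
print(parse_recommendations(reply))
```

Note that this only checks shape. A well-formed reply can still carry a SKU that doesn't exist, which is a separate check against the catalog.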

This is the setup. The model has now made specific, checkable claims. Now we check them.

Move 3: The deterministic layer that rejects anything off-list

This is the actual guardrail. Everything before it was preparation.

[Figure: The deterministic validation gate decision logic. Real SKUs pass to the customer; fabricated SKUs are rejected with honest fallback options.]

After the model returns its JSON, code, not AI, takes every SKU in the response and checks it against the real catalog table. Plain lookup. Is BX-0408-DW in the list of SKUs we actually sell? Yes, it passes. Is HD-1200 in the list? No, it gets rejected before it ever renders to the customer.

That's it. That's the move that turns "usually correct" into "cannot be wrong."

The model proposes. The code disposes. The AI is allowed to suggest anything it wants, and it doesn't matter, because nothing reaches the customer without clearing a deterministic check that the AI has no ability to influence. The validation layer isn't smart. It doesn't reason. It doesn't get talked into anything. It compares strings against a list, and a fabricated SKU fails that comparison every single time.
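The gate itself is small enough to fit on a napkin. A minimal sketch, assuming the valid SKUs are loaded into a set (ML-0609-CU and the validate_skus name are hypothetical; BX-0408-DW and HD-1200 are the examples from this article):

```python
# In a real system this set is loaded from the catalog table, not hard-coded.
VALID_SKUS = frozenset({"BX-0408-DW", "ML-0609-CU"})


def validate_skus(items, valid_skus):
    """Split proposed line items into (passed, rejected) by exact string membership."""
    passed, rejected = [], []
    for item in items:
        # Exact comparison only: no trimming, no case folding, no fuzzy matching.
        if item.get("sku") in valid_skus:
            passed.append(item)
        else:
            rejected.append(item)
    return passed, rejected


proposed = [{"sku": "BX-0408-DW"}, {"sku": "HD-1200"}, {"sku": "bx-0408-dw"}]
passed, rejected = validate_skus(proposed, VALID_SKUS)
print("passed:", [item["sku"] for item in passed])
print("rejected:", [item["sku"] for item in rejected])
```

I deliberately leave out normalization in this sketch. A lowercased bx-0408-dw fails just like the invented HD-1200, which keeps the verbatim rule honest and the check trivially easy to reason about.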

When a line fails validation, the system has a few honest options. It can drop that line and serve the ones that passed. It can re-ask the model with a note that the previous answer included an invalid product. Or it can fall back to a safe human-style message, "let me connect you with our team for that one." What it never, ever does is show the customer an invented product. There is no code path where that happens, because the invented SKU was filtered out by the lookup.
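One way to combine those three options into a single policy is sketched below. The ordering (serve what passed, otherwise re-ask, otherwise hand off) is my assumption, as are the answer_customer name and the ask_model callable, which stands in for whatever function calls your model and returns parsed line items.

```python
FALLBACK_MESSAGE = "Let me connect you with our team for that one."


def answer_customer(ask_model, valid_skus, max_attempts=2):
    """Serve validated lines; re-ask if none passed; fall back after max_attempts."""
    note = None
    for _ in range(max_attempts):
        items = ask_model(note)  # hypothetical callable: note -> list of line-item dicts
        passed = [item for item in items if item.get("sku") in valid_skus]
        if passed:
            # Failed lines are dropped silently; only validated ones are served.
            return {"kind": "recommendations", "items": passed}
        note = "No line in your previous answer matched the catalog. Use only listed SKUs."
    return {"kind": "fallback", "message": FALLBACK_MESSAGE}


# Stub model for demonstration: invents a SKU first, then mixes one fake with one real.
VALID_SKUS = frozenset({"BX-0408-DW"})
replies = iter([[{"sku": "HD-1200"}], [{"sku": "HD-1200"}, {"sku": "BX-0408-DW"}]])
result = answer_customer(lambda note: next(replies), VALID_SKUS)
print(result)
```

Every return path hands back either validated items or the fixed fallback string. There is no branch that returns the model's raw output.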

This is the principle I build every customer-facing system on: let the model judge intent and let the code enforce truth. The AI is excellent at understanding "I ship fragile candles." The code is excellent at guaranteeing the answer is real. Use each for what it's actually good at and stop asking the model to be a database.

This is the single question I tell every CEO to ask about any customer-facing AI. Not "is it accurate." Ask: "is there a deterministic check between the model and my customer?" If the answer is no, you have a demo, not a product. With that check in place, the worst the model can do is fail to recommend something. It cannot recommend something fake. That asymmetry is the whole game.

From recommendation to quote: keeping the same guardrail through checkout

The distributor didn't just want recommendations. He wanted the assistant to build a real quote the customer could act on. So we carried the exact same guardrail straight through to the quote.

[Figure: Full loop from plain-English question to approved quote. The customer's request is mapped to validated SKUs, priced from the real price book, approved by a human, and sent as a branded quote.]

The recommender hands off validated SKUs, codes that have already cleared the deterministic check. The quote builder takes only those SKUs, pulls the real price and real stock for each from the actual price book, and assembles a branded quote. Same pattern, one layer deeper.

The critical property: the AI never invents a price or a quantity you can't fulfill. It doesn't estimate, doesn't round, doesn't guess at a bulk discount. Every line on the finished quote traces back to a validated catalog row with a real price attached. The model chose which products. The code supplied every number. There is no field on that quote the AI made up.
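A sketch of that quote step, under assumptions: the price and stock figures below are invented for illustration, and build_quote is a hypothetical helper. The thing to notice is that the function reads only the SKU and quantity from the model's line and takes every number from the price book.

```python
from decimal import Decimal

# Hypothetical price book entry; real figures come from your pricing system.
PRICE_BOOK = {
    "BX-0408-DW": {"unit_price": Decimal("1.85"), "in_stock": 500},
}


def build_quote(validated_lines, price_book):
    """Price already-validated lines from the price book; ignore any model-supplied numbers."""
    lines = []
    for line in validated_lines:
        entry = price_book[line["sku"]]  # KeyError if an unvalidated SKU ever got this far
        if line["quantity"] > entry["in_stock"]:
            raise ValueError(f"not enough stock for {line['sku']}")
        lines.append({
            "sku": line["sku"],
            "quantity": line["quantity"],
            "unit_price": entry["unit_price"],
            "line_total": entry["unit_price"] * line["quantity"],
        })
    total = sum((item["line_total"] for item in lines), Decimal("0"))
    return {"lines": lines, "total": total, "status": "pending_approval"}


# Even if the model tries to supply a price, it is never read.
quote = build_quote(
    [{"sku": "BX-0408-DW", "quantity": 200, "unit_price": "0.01"}], PRICE_BOOK
)
print(quote["total"])
```

I use Decimal rather than floats here so totals don't pick up binary rounding error on money.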

And before the quote sends, it stops for a human. A quote is a commercial commitment, a number your business is promising to honor, so every system I ship stops for a person at the moment that matters. The owner or a rep glances at the assembled quote, confirms, sends. Seconds of human time, total protection on the commercial side.
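The human stop can be enforced in code too, so it isn't just a process someone might skip. A minimal sketch, assuming a simple status field on the quote; the Quote class, the state names, and the approve and send helpers are all hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Quote:
    quote_id: str
    status: str = "pending_approval"
    approved_by: Optional[str] = None


def approve(quote: Quote, reviewer: str) -> None:
    """Record that a named person reviewed and approved the quote."""
    if quote.status != "pending_approval":
        raise ValueError(f"cannot approve a quote in state {quote.status!r}")
    quote.status = "approved"
    quote.approved_by = reviewer


def send(quote: Quote, deliver: Callable[[Quote], None]) -> None:
    """Deliver the quote only if a person has approved it."""
    if quote.status != "approved":
        raise PermissionError("quote has not been approved by a person")
    deliver(quote)
    quote.status = "sent"


outbox = []
quote = Quote("Q-0001")
approve(quote, "owner")
send(quote, outbox.append)
print(quote.status, len(outbox))
```

Calling send on a quote that is still pending raises instead of delivering, so the approval step can't be bypassed by accident.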

So the full loop for that distributor: a customer types "I ship fragile candles, need about 200 a month" in plain English. The assistant maps that to real, in-stock SKUs, builds a branded quote with real pricing, a human approves it, and it goes out. Plain-English ask in, real quote out, nothing on it the business can't actually ship.

The pattern is the lesson, not the catalog

That catalog had 20 SKUs. The architecture I just described is identical for a 500-product store, a 12-item services menu, a multi-page price book, a SaaS plan grid, anything with a real, finite list of things you can actually deliver.

Three reusable moves, and they don't change with scale:

  • Inject the real list so the model chooses from reality, not from training-data guesses.
  • Force verbatim structured references so every claim the model makes is exact and machine-checkable.
  • Validate deterministically so nothing reaches a customer without clearing a check the AI cannot bend.

That sequence is the entire difference between an AI demo and an AI you can put in front of paying customers. A demo is impressive in a controlled room. A product survives the weird question at 11pm from a customer about to spend money. The validation layer is what survives.

Here's the reassurance I gave that distributor, and it was literally true: a customer-facing AI built this way physically cannot recommend something you don't sell. Not "is trained not to." Cannot. The fabricated SKU fails the lookup and never renders. Full stop.

If you've been holding back from deploying AI because you're afraid it'll lie to your customers, your instinct is right. Most AI on the market deserves that skepticism. But the fear points at a solvable problem. The fix isn't a better prompt. It's architecture, and it's the kind of thing I build. If you've got a catalog, a price book, or a services menu and you want AI in front of customers without the risk, let's talk about your catalog.
