Constraining LLMs in Production: 3 Guardrails That Work

The Real Fear Isn't That AI Is Dumb. It's That It's Confident.

When a CEO finally tells me their actual fear about AI, it's almost never "the model won't be smart enough." It's the opposite. They're afraid it will say something wrong to a customer and say it with total confidence.

[Figure: The Capability vs Constraint split (80/20). AI capability is the easy 20 percent; the constraint layer is the critical 80 percent that makes a system shippable.]

That fear is legitimate. It's not hype, and it's not a reason to wait. It's the right thing to be worried about.

Here's the thing most vendors won't tell you. A capable model is dangerous precisely because it's capable. It fills gaps. Ask it a question it doesn't have the answer to, and it will manufacture one that sounds completely reasonable. It will recommend a product you don't stock. It will encourage work you aren't licensed to do. It will draw a product label that doesn't exist.

This is why so many AI projects fail. Teams chase the capability and skip the part that actually makes it shippable. If you've watched a vendor demo something slick that fell apart in production, you've seen the gap I'm talking about. I wrote more about that in why most AI projects fail.

So here's my thesis, stated early: putting an LLM into production for a small business is mostly about building the constraint, not the capability. The constraint layer is the deliverable. The smart-sounding demo is the easy part.

Let me show you what that looks like in practice across three very different businesses: a distributor, an electrician, and a winery. Different industries, same engineering problem.

Pattern One: Lock the Recommender to a Real Catalog

The failure mode: confident hallucination

Picture an AI sales assistant for a distributor. Its job is to recommend products to customers. Sounds simple, and the demo looks great.

Left alone, here's what a capable model actually does. It invents a SKU that sounds plausible. It recommends a competitor's product because the competitor's name showed up in its training data. It suggests an item that was discontinued two years ago. Every one of those answers is delivered with the same confident tone as the correct ones.

The customer can't tell the difference. Neither can the model. That's the problem.

Post-validation against the source of truth

The fix is structural, not a better prompt. The model proposes, but every recommendation gets validated against the actual live catalog before it ever reaches the customer. If the SKU doesn't exist in the real product list, it gets dropped. Not flagged, not softened. Dropped.

[Figure: Catalog-locking: model proposes, catalog validates. Recommendations are checked against the live catalog, and non-existent SKUs are dropped before they reach the customer.]

In plain terms: the model is a ranking and phrasing engine. It's good at understanding what a customer wants and saying it well. The catalog is the source of truth. The model never gets to be the database.
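Here is a minimal sketch of that post-validation step in Python. It is an illustration, not the production code: the `Product` fields, the `validate_recommendations` function, and the sample SKUs are all made up for this example, and the in-memory `catalog` dict stands in for whatever live product data you actually query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    # Hypothetical catalog record; real fields will differ.
    sku: str
    name: str
    in_stock: bool
    discontinued: bool = False


def validate_recommendations(proposed_skus, catalog):
    """Keep only SKUs that exist in the catalog and can be sold today.

    `proposed_skus` is whatever the model suggested, in ranked order.
    `catalog` maps SKU -> Product and stands in for the live product data.
    Anything unknown, discontinued, or out of stock is dropped, not flagged.
    """
    approved = []
    seen = set()
    for sku in proposed_skus:
        product = catalog.get(sku)
        if product is None:  # invented SKU or a competitor's product
            continue
        if product.discontinued or not product.in_stock:
            continue
        if sku in seen:  # the model repeated itself
            continue
        seen.add(sku)
        approved.append(product)
    return approved


catalog = {
    "HD-1001": Product("HD-1001", "Hex driver set", in_stock=True),
    "HD-2040": Product("HD-2040", "Torque wrench", in_stock=True),
    "HD-0007": Product("HD-0007", "Legacy ratchet", in_stock=False, discontinued=True),
}

# Pretend model output: two real SKUs, one invented, one discontinued.
model_output = ["HD-2040", "HD-9999", "HD-0007", "HD-1001"]
safe = validate_recommendations(model_output, catalog)
print([p.sku for p in safe])  # ['HD-2040', 'HD-1001']
```

The model's ranking survives, because the approved list keeps the order the model proposed. Its inventions don't, because the only way into the output is through a catalog lookup.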

I run this discipline in my own DTC fashion brand. We have 564+ products in production, and the recommender never free-forms a product into existence. It can only surface what's actually in stock and actually real. When it wants to recommend something, that something gets checked against the live data first.

I broke down the full architecture in lock the AI to a real catalog, but the reusable lesson fits in one line.

Never let the model be the database.

Pattern Two: Gate the Agent Against the Owner's Actual Credentials

When helpful becomes dangerous

Now picture an electrician running an AI permit-research and intake agent. It handles inbound questions and helps scope jobs. Useful, real, and the kind of thing that saves an owner hours a week.

Here's where helpful turns dangerous. A capable model wants to be useful, so it will happily explain how to do work the business isn't licensed for. It will imply the owner can take a job in a jurisdiction where they hold no credential. It will scope a project that crosses a line the business legally cannot cross.

That's not a typo or a quirky bug. That's liability. A confident, well-phrased answer that points a customer toward work the business can't legally perform.

Credential-scope gating in the agent

The fix is to gate the agent against the owner's actual license scope and jurisdiction. The agent is forbidden from recommending or scoping work outside that credential. When a request falls outside the boundary, it doesn't improvise a workaround. It flags the request and routes it to a human.

[Figure: Credential-scope gating for the agent. The intake agent is gated against the owner's license scope, and out-of-scope requests are routed to a human instead of improvised.]

I think of this as scope gating. The model can research and inform all day long. It cannot authorize. Those are two different powers, and the second one stays out of the model's hands entirely.
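A scope gate can be sketched in a few lines of Python. Everything named here (`LicenseScope`, `IntakeRequest`, `gate`, and the jurisdiction and work-type labels) is hypothetical, and the sketch assumes each request has already been tagged with a jurisdiction and a work type upstream, for example by an intake form. The point it illustrates is that the check is a plain lookup against the owner's credentials, and that anything unrecognized fails closed.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ANSWER = "answer"
    ESCALATE = "escalate_to_human"


@dataclass(frozen=True)
class LicenseScope:
    # What the owner is actually credentialed for (illustrative labels).
    jurisdictions: frozenset
    work_types: frozenset


@dataclass(frozen=True)
class IntakeRequest:
    jurisdiction: str
    work_type: str


def gate(request, scope):
    """Return ANSWER only when both jurisdiction and work type are covered.

    Unknown or misspelled values are simply not in the sets, so they
    escalate to a human rather than slipping through.
    """
    in_jurisdiction = request.jurisdiction in scope.jurisdictions
    in_work_type = request.work_type in scope.work_types
    if in_jurisdiction and in_work_type:
        return Decision.ANSWER
    return Decision.ESCALATE


scope = LicenseScope(
    jurisdictions=frozenset({"county_a"}),
    work_types=frozenset({"residential_wiring", "panel_upgrade"}),
)

print(gate(IntakeRequest("county_a", "panel_upgrade"), scope))  # Decision.ANSWER
print(gate(IntakeRequest("county_b", "panel_upgrade"), scope))  # Decision.ESCALATE
print(gate(IntakeRequest("county_a", "gas_line"), scope))       # Decision.ESCALATE
```

The gate sits outside the model on purpose. A prompt that says "stay within the license" is a request. A lookup the response has to pass through is a boundary.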

This is the same principle that applies to anything that moves money or touches a legal obligation. Those decisions need hard boundaries, not a friendly suggestion engine. The model gathers information and presents it. A human with the actual license makes the call.

The capability here, a fast and knowledgeable intake agent, is genuinely valuable. But it's only safe to ship because of the constraint sitting underneath it. Strip out the credential gate and you've built a liability with a chat interface.

The constraint is what makes the capability safe to put in front of a customer.

Pattern Three: Composite the Real Product, Don't Let AI Draw It

Why generated product imagery embarrasses brands

Last one. A winery, where the label on the bottle matters more than almost anything else about the image.

Ask an image model to render a bottle of wine and watch what happens. It invents a label. The text is garbled. The vintage is wrong. The logo is something that has never existed. For most products that's annoying. For a regulated, brand-sensitive product like wine, it's a disaster. You can't put a fabricated label in front of customers or regulators.

The model isn't broken. Drawing a plausible-looking label is exactly what it's built to do. The problem is you asked it to be accurate about something it has no way of being accurate about.

Composite over generate

The fix is to composite the real label and real product onto an AI-generated scene, rather than letting the model draw the product itself.

[Figure: Composite vs Generate for product imagery. Generating the bottle yields a garbled fake label; compositing places the real product asset onto an AI-generated scene.]

The model handles the environment. Lighting, background, mood, the table the bottle sits on, the way the light catches a glass next to it. That's what image models are genuinely great at. Meanwhile the actual product asset, the real bottle with the real label, stays untouched and pixel-accurate.
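To make "untouched and pixel-accurate" concrete, here is a toy sketch in plain Python. It is not how a real imaging pipeline is built: images are modeled as nested lists of `(r, g, b, a)` tuples, transparency is treated as all-or-nothing, and the `composite` and `product_is_untouched` helpers are invented for this example. A real pipeline would use an image library and handle soft edges. What carries over is the invariant: product pixels are copied, never regenerated, and you can verify that after the fact.

```python
def composite(scene, product, top, left):
    """Paste `product` onto a copy of `scene` at (top, left).

    Images are lists of rows of (r, g, b, a) tuples. Product pixels with
    alpha 0 are treated as transparent; every other product pixel is copied
    unchanged, so the label is never redrawn or blended.
    """
    height, width = len(product), len(product[0])
    if top < 0 or left < 0 or top + height > len(scene) or left + width > len(scene[0]):
        raise ValueError("product does not fit inside the scene")
    out = [list(row) for row in scene]  # leave the original scene alone
    for y, row in enumerate(product):
        for x, pixel in enumerate(row):
            if pixel[3] == 0:
                continue
            out[top + y][left + x] = pixel
    return out


def product_is_untouched(result, product, top, left):
    """Check that every opaque product pixel survived exactly."""
    return all(
        result[top + y][left + x] == pixel
        for y, row in enumerate(product)
        for x, pixel in enumerate(row)
        if pixel[3] != 0
    )


# Stand-in for an AI-generated scene: a 3x4 dark background.
scene = [[(10, 10, 10, 255) for _ in range(4)] for _ in range(3)]

# Stand-in for the real product asset: 2x2 with one transparent pixel.
product = [
    [(200, 0, 0, 255), (0, 0, 0, 0)],
    [(200, 0, 0, 255), (255, 255, 255, 255)],
]

result = composite(scene, product, top=1, left=1)
print(product_is_untouched(result, product, top=1, left=1))  # True
```

The same check works as a release gate: if the product region in the final image differs from the source asset, the image doesn't ship.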

I run a composite pipeline like this for my own product imagery, and the rule is non-negotiable: AI can't draw your product. It can build a beautiful world around your product. It cannot be trusted to recreate the product itself.

I went deep on the mechanics in composite the real product instead of generating it. The reusable lesson is the same shape as the first two.

Separate what the model is good at (scenes, context, atmosphere) from what it must never touch: the truth of the product.

The Common Thread: The Constraint Is the Engineering

Three industries. A distributor, an electrician, a winery. Three completely different problems on the surface.

Underneath, it's one move repeated three times.

Catalog-locking, credential gating, and composite-don't-generate are all the same discipline. You identify the thing the model must never be the source of truth for, and you build a hard boundary around it. The catalog is the truth, not the model. The license is the truth, not the model. The product asset is the truth, not the model.

The capability (a good recommendation, a helpful answer, a beautiful image) is the easy 20%. Any decent model gets you that in an afternoon. The constraint is the 80% that makes the whole thing shippable to a real customer without keeping you up at night.

This is exactly where businesses get burned. A vendor demos the capability, because the capability is what looks impressive in a meeting. They skip the constraint, because the constraint is invisible work that doesn't photograph well in a slide deck. Then the thing goes live and starts confidently telling customers things that aren't true.

When you buy "an AI feature" without the constraint layer, you are not buying a feature. You are buying a liability with a nice interface. The demo and the deliverable are not the same object.

The guardrail is the deliverable. Everything else is easy enough that it isn't worth paying for.

How to Spot Whether Your AI Has Constraints or Just Capability

You don't need to be technical to pressure-test this. Here's the short checklist I'd hand any skeptical CEO before they approve an AI build, internal or vendor-supplied.

[Figure: The four-question constraint pressure-test checklist. Four questions a CEO should ask to test whether an AI build has real constraints or just capability.]

Where is the source of truth, and can the model overwrite it? Find the data that has to be correct, the catalog, the pricing, the license, the product spec. Then ask whether the model can change it or invent around it. If it can, you have a problem.

What happens when the model hits something outside its scope? Ask the builder to walk you through it. Does the system improvise a confident answer, or does it stop and escalate? "It just answers" is the wrong answer. You want to hear about a boundary and a handoff.

Does anything customer-facing get validated before it ships? Raw model output going straight to a customer is the failure mode that burns brands. There should be a checkpoint, automated or human, between what the model generates and what the customer sees.

Is there a human on anything that moves money or touches a legal or license-bound decision? These never go fully autonomous. I covered why in every AI system I ship stops for a human. If money or liability is involved and there's no human checkpoint, walk away.
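If you want to see what the last two questions look like in a system, here is a rough Python sketch of a release checkpoint. The names (`Draft`, `release`, `mentions_only_known_skus`) and the SKU pattern are assumptions made up for this illustration. It shows one possible shape: every draft runs through automated validators, and anything flagged as touching money or a license waits for a human before it goes out.

```python
import re
from dataclasses import dataclass


@dataclass
class Draft:
    text: str
    touches_money_or_license: bool = False
    human_approved: bool = False


KNOWN_SKUS = {"HD-1001", "HD-2040"}  # stand-in for the live catalog


def mentions_only_known_skus(text):
    """Validator: every SKU-shaped token in the text must be a real SKU."""
    found = re.findall(r"\b[A-Z]{2}-\d{4}\b", text)
    return all(sku in KNOWN_SKUS for sku in found)


def release(draft, validators):
    """Return (status, text). Text is None unless the draft may be sent."""
    for check in validators:
        if not check(draft.text):
            return ("blocked", None)
    if draft.touches_money_or_license and not draft.human_approved:
        return ("needs_human", None)
    return ("sent", draft.text)


checks = [mentions_only_known_skus]

print(release(Draft("Try the HD-2040 torque wrench."), checks)[0])  # sent
print(release(Draft("Try the HD-9999 instead."), checks)[0])        # blocked
print(release(Draft("Your refund is on its way.", touches_money_or_license=True), checks)[0])  # needs_human
```

When a builder walks you through their system, this is the piece to ask them to point at: the place where raw model output can be stopped.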

One honest note. No constraint layer is perfect. The goal isn't to pretend AI is infallible, because it isn't. The goal is to shrink the blast radius so that when something does slip, it's caught before it reaches a customer instead of after. That's the difference between a system you can trust and a system you're just hoping behaves.

Where This Leaves You

The fear that AI will embarrass you in front of a customer is valid. I'm not going to talk you out of it, because it's the right instinct.

But the answer was never to avoid AI. The answer is to build the constraint layer that most vendors skip. The recommender that can't invent a product. The agent that can't authorize work outside your license. The image pipeline that can't fabricate your own label.

Notice that across three completely different businesses, the engineering was the same move. That's the part that matters for you. The constraint discipline transfers. An operator who's built catalog-locking, credential gating, and composite pipelines before doesn't have to relearn the lesson on your dollar. They can move fast because they already know where the boundaries go.

If you've been burned by a vendor who demoed the capability and shipped you a liability, or you just want customer-facing AI you can actually trust, that's the work I do. Come talk to me about your build and I'll tell you straight where the constraints need to be.
