Human-in-the-Loop AI: Why the Expert Approves First

The Question Every Owner Asks: Should AI Talk to My Customers?

Here's the question that comes up in every first conversation I have with a business owner thinking about AI: do I let this thing talk to my customers directly?

It's the right question. And the answer, for anything with real stakes, is almost always no, at least not without a human in the loop.

Let me give you a real example. I built a coaching app for a fitness professional. The whole point was to generate workout programs, nutrition plans, and grocery lists at scale, so one coach could serve far more clients without staying up until midnight typing out meal plans by hand.

The AI part worked beautifully. But the moment you let a model give advice with real consequences (somebody's body, their health, their money), you have to answer a much harder question than "does the AI work." You have to answer "who is accountable when it's wrong."

Because the model will be wrong sometimes. Confidently wrong. It'll recommend a training load that's too aggressive or a calorie target that doesn't fit the client's medical history. And when that happens, it's not the AI's name on the line. It's the coach's name, the coach's reputation, and in some cases the coach's license.

So the default I build into these systems is simple: the AI drafts, the human expert approves. The client never sees raw model output. Ever. The model does the typing. The professional makes the call.

That sounds obvious written down. But in this exact app, I found a hole where it wasn't true. There were live routes that let a paying client trigger AI generation directly, with no coach in between. The expert had been bypassed and didn't even know it.

Here's how human-in-the-loop AI actually works, where it's non-negotiable, and the specific gap I found and closed.

What Human-in-the-Loop AI Actually Means

The draft-and-approve pattern

Human-in-the-loop AI means the model produces a draft, and a qualified human reviews it and either approves, edits, or rejects it before it reaches the end user. That's the whole pattern. The AI never ships directly to a customer.
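Here is a minimal sketch of that pattern as a status field. It assumes a hypothetical `Draft` record and helper names I made up for illustration, not code from the app. The model can only create drafts. Only the reviewer can approve, edit, or reject them. Only approved content can be released to the client.

```typescript
// Hypothetical types and helpers illustrating draft-and-approve.
type DraftStatus = "pending_review" | "approved" | "rejected";

interface Draft {
  id: string;
  clientId: string;
  content: string;
  status: DraftStatus;
}

// Model output always enters the system as a draft awaiting review.
function createDraft(id: string, clientId: string, modelOutput: string): Draft {
  return { id, clientId, content: modelOutput, status: "pending_review" };
}

// The expert approves (optionally with edits) or rejects the draft.
function review(draft: Draft, decision: "approve" | "reject", editedContent?: string): Draft {
  if (draft.status !== "pending_review") {
    throw new Error(`Draft ${draft.id} was already reviewed`);
  }
  if (decision === "reject") {
    return { ...draft, status: "rejected" };
  }
  return { ...draft, content: editedContent ?? draft.content, status: "approved" };
}

// The only path to the client: anything not approved is refused.
function releaseToClient(draft: Draft): string {
  if (draft.status !== "approved") {
    throw new Error(`Draft ${draft.id} has not been approved`);
  }
  return draft.content;
}
```

An edit is simply an approval with changed content. What matters is that `releaseToClient` has no path around the status check.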

Figure: Draft-and-approve vs fully autonomous AI

Contrast that with fully autonomous AI, where the model generates output and it lands in front of the customer with no human checkpoint. Autonomous AI is fast and cheap. It's also a loaded gun when the output carries consequences.

The draft-and-approve pattern keeps the speed (the human isn't typing from scratch) while keeping the judgment (the human still decides what's good enough to ship). You get most of the efficiency and almost none of the risk.

Where it's non-negotiable

Full autonomy is fine for low-stakes work. Internal drafts. Brainstorming. First-pass marketing copy that a person reads before it goes anywhere. If a mistake costs you ten minutes of editing, automate it completely and don't think twice.

Figure: The stakes-based decision rule for when to gate AI

But anywhere the output is advice with consequences, human-in-the-loop is non-negotiable. Health. Legal. Financial. Anything that can physically hurt someone or create real liability for your business.

A wrong nutrition plan isn't a typo you fix later. A wrong financial recommendation isn't a brainstorm. These outputs change what a real person does in the real world, and that means a qualified human has to stand behind them.

I've written before about where I pull the plug on AI, which goes deeper on the full framework for deciding what gets gated and what gets automated. The short version: the higher the stakes, the more the human stays in the loop.

The Hole I Found: Users Could Trigger AI Directly

Back to the coaching app. When I came in to audit it, the intended flow was clean. The coach reviewed every program before it reached a client. The AI drafted, the expert approved, the client saw vetted output. Exactly the pattern I just described.

Figure: The bypass hole: intended flow vs hidden legacy routes

Except that wasn't the only way to trigger generation.

The app had legacy public routes left over from an earlier version. Three of them. One generated a workout, one generated a meal plan, one generated a grocery list. And they were public, meaning a client could trigger AI generation directly, with no coach in between.

Let that sink in. A paying client could hit one of those endpoints and get raw model output (advice about how to train their body and what to eat) without the coach ever seeing it. The expert had been designed into the loop on the main flow and quietly designed out of it on the legacy flow.

This is a problem on two levels.

First, quality. The model can be confidently wrong about training loads or macronutrient targets. It doesn't know the client has a bad shoulder or a history of disordered eating unless someone with judgment checks the output against the full picture. Unconstrained model output in front of a customer is exactly the failure mode I wrote about in constraining what AI can say. When the model can say anything, eventually it says something it shouldn't.

Second, liability. The expert's name and professional license sit behind every program. If the AI hands a client a bad plan and the client gets hurt, the coach is on the hook, not the model. The bypass route meant the coach was legally exposed for advice they never saw.

The sneaky part: these routes had no frontend references anymore. Nothing in the app linked to them. The team assumed they were dead. But dead-looking and dead are different things. The routes were still live, still callable by anyone who knew the URL or poked at the API. They weren't being used. They were just waiting to be.
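You can check the gap between dead-looking and dead mechanically. This is a minimal sketch. It assumes you can export two lists: the paths the server actually registers, and the paths the frontend code references. How you collect them depends on your framework, which the post doesn't name. The route paths below are invented placeholders, not the app's real URLs.

```typescript
// Hypothetical inventory: paths the server will actually answer.
const serverRoutes: string[] = [
  "/api/admin/generate/workout",
  "/api/admin/generate/meal-plan",
  "/api/public/generate-workout",
  "/api/public/generate-meal-plan",
  "/api/public/generate-grocery-list",
];

// Hypothetical inventory: paths referenced anywhere in the frontend code.
const frontendReferences: string[] = [
  "/api/admin/generate/workout",
  "/api/admin/generate/meal-plan",
];

// Routes nothing links to, but which are still callable by anyone who knows the URL.
function findOrphanedRoutes(server: string[], referenced: string[]): string[] {
  const refs = new Set(referenced);
  return server.filter((path) => !refs.has(path));
}

const orphans = findOrphanedRoutes(serverRoutes, frontendReferences);
console.log(orphans);
```

A route on the orphan list is a candidate for deletion or gating. A zero reference count means nothing links to it. It does not mean nothing can call it.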

The Fix: Deleting the Routes That Bypassed the Expert

Delete the legacy public endpoints

The fix started with deletion, not addition. I removed three legacy public generation routes, roughly 800 lines of code, that had no frontend references and did nothing but create risk.

This is worth sitting with for a second. The security win came from deleting code. Not adding a fancy auth layer, not bolting on a monitoring tool, just removing the routes that should never have existed in the customer-facing surface area. Less code, less attack surface, fewer ways for raw model output to reach a client.

Every line of code is a liability until proven otherwise. Legacy routes that "aren't used anymore" are the ones that bite you, because nobody's watching them.

Route everything through admin-gated generation

With the public routes gone, I rerouted all AI generation through admin-gated endpoints. Now the only way to trigger generation is through the coach's authenticated admin context. The expert, and only the expert, can kick off a draft and review it.
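Here is a minimal sketch of an admin gate. It assumes a hypothetical `Session` shape with a `role` field. In a real app the session would come from whatever auth layer the app uses, which the post doesn't specify. The model call is passed in so the sketch stays self-contained.

```typescript
// Hypothetical session shape; a real app would get this from its auth layer.
interface Session {
  userId: string;
  role: "admin" | "client";
}

type Generator = (clientId: string) => Promise<string>;

// Only an authenticated admin (the coach) may start generation.
// A client session, or no session at all, is refused before the model is called.
async function generateDraftAsCoach(
  session: Session | null,
  clientId: string,
  generate: Generator,
): Promise<string> {
  if (session === null || session.role !== "admin") {
    throw new Error("Forbidden: only the coach can trigger generation");
  }
  return generate(clientId);
}
```

The design choice is that the check runs before the model is invoked. A refused request never produces output that could leak anywhere.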

There's no longer a path where the client triggers the model. The client requests a program. The coach generates and reviews it. The client receives the approved version. The loop is closed by design, not by convention.

Gate generation behind onboarding completion

Then I added a precondition: no AI generation can happen until a client finishes onboarding.

Figure: The three-part fix that closed the loop

This does two things. It guarantees the model has the inputs it needs (goals, history, restrictions, the full profile) before it drafts anything. Garbage in, garbage out, and incomplete in is just as bad. And it guarantees the expert is always reviewing the output against a complete profile, not a half-filled form.
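A sketch of that precondition follows. It assumes a hypothetical `ClientProfile` built from the inputs the post lists (goals, history, restrictions) plus a completion flag. The real schema isn't shown in the post.

```typescript
// Hypothetical profile shape; field names are illustrative.
interface ClientProfile {
  clientId: string;
  onboardingComplete: boolean;
  goals?: string;
  trainingHistory?: string;
  medicalHistory?: string; // "none reported" is a valid answer; undefined means never asked
  dietaryRestrictions?: string[]; // [] means "none"; undefined means never asked
}

// Returns the reasons generation must wait; an empty list means the profile is ready.
function onboardingGaps(profile: ClientProfile): string[] {
  const gaps: string[] = [];
  if (!profile.onboardingComplete) gaps.push("onboarding not marked complete");
  if (!profile.goals) gaps.push("goals missing");
  if (!profile.trainingHistory) gaps.push("training history missing");
  if (profile.medicalHistory === undefined) gaps.push("medical history not recorded");
  if (profile.dietaryRestrictions === undefined) gaps.push("dietary restrictions not recorded");
  return gaps;
}

// Call this before any generation request reaches the model.
function assertReadyForGeneration(profile: ClientProfile): void {
  const gaps = onboardingGaps(profile);
  if (gaps.length > 0) {
    throw new Error(`Cannot generate for ${profile.clientId}: ${gaps.join("; ")}`);
  }
}
```

The sketch separates "no restrictions" (an empty list) from "never asked" (missing). A half-filled form fails the check even when the missing answer would probably have been "none."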

The result is the pattern working end to end. The AI drafts. The expert approves against a full picture. The client only ever sees vetted output. No bypass, no half-data generation, no raw model output reaching a paying customer.

The whole fix was less code, tighter gates, and one precondition. That's usually what good looks like.

The Boring Detail That Made It Work: maxDuration

Here's an unglamorous detail that almost broke the whole thing.

The app runs on a serverless host (Vercel), and serverless functions have a default timeout that's short, often around ten seconds. AI generation for a full multi-week program isn't a ten-second job. Those calls were running long and timing out before they finished.

The fix was small: add maxDuration exports to the generation functions so the long calls could run to completion instead of getting killed mid-flight.
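In a Next.js App Router route handler deployed to Vercel, `maxDuration` is a route segment config export, measured in seconds. This sketch assumes that setup; the post doesn't name the framework. The file path, the 300-second value, and the `generateProgramDraft` helper are placeholders. The maximum you're allowed to set depends on your Vercel plan.

```typescript
// app/api/admin/generate/route.ts (hypothetical path)

// Route segment config: let this function run up to 300 seconds
// instead of being killed at the platform default.
export const maxDuration = 300;

// Placeholder for the real model call, which can take well over ten seconds.
async function generateProgramDraft(clientId: string): Promise<string> {
  return `multi-week program draft for ${clientId}`;
}

// Auth check omitted here to keep the focus on maxDuration.
export async function POST(request: Request): Promise<Response> {
  const { clientId } = (await request.json()) as { clientId: string };
  const draft = await generateProgramDraft(clientId);
  return new Response(JSON.stringify({ clientId, draft, status: "pending_review" }), {
    headers: { "content-type": "application/json" },
  });
}
```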

But the point for a CEO isn't the config flag. It's what happens when the call fails silently.

Human-in-the-loop only works if the generation actually finishes and lands in the expert's review queue. A timed-out call that silently fails is worse than no call at all, because the coach thinks the system is working, the client is waiting, and there's nothing in the queue to review. The loop breaks not because of bad judgment but because the plumbing dropped the request on the floor.
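One way to keep failures visible is to write the queue entry before the model is called. A timeout or error then marks the entry `failed` and records the reason, so the request doesn't simply vanish. This is a sketch against a hypothetical in-memory queue. The `withTimeout` helper is illustrative, and a real app would persist the queue.

```typescript
type QueueStatus = "generating" | "pending_review" | "failed";

interface QueueEntry {
  clientId: string;
  status: QueueStatus;
  draft?: string;
  error?: string;
}

// Hypothetical in-memory stand-in for the coach's review queue.
const reviewQueue = new Map<string, QueueEntry>();

// Rejects if the wrapped promise doesn't settle within `ms` milliseconds.
// Note: this stops waiting; it does not cancel the underlying call.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// The entry exists before the model is called, so a failure always leaves a trace.
async function generateIntoQueue(
  clientId: string,
  generate: (clientId: string) => Promise<string>,
  timeoutMs: number,
): Promise<QueueEntry> {
  const entry: QueueEntry = { clientId, status: "generating" };
  reviewQueue.set(clientId, entry);
  try {
    entry.draft = await withTimeout(generate(clientId), timeoutMs);
    entry.status = "pending_review";
  } catch (err) {
    entry.status = "failed";
    entry.error = err instanceof Error ? err.message : String(err);
  }
  return entry;
}
```

This only catches timeouts the code enforces itself. If the platform kills the function first, the catch block never runs, and the entry stays at `generating`. Entries stuck in that state should also be shown to the coach.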

This is the part that gets ignored in AI demos. The demo always finishes. Production calls time out, hit rate limits, and fail in ways nobody planned for. The unglamorous plumbing (timeouts, retries, queues) is what keeps the human reliably in the loop. Get it wrong and your beautiful approval workflow has a hole in it that nobody notices until a client complains.
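For the retry piece, here is a sketch of retry with exponential backoff. It assumes the caller decides which errors are transient. The attempt count and base delay are placeholder defaults. Retrying after a rate limit usually helps. Retrying a call that already used up the function's time budget may not.

```typescript
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries `task` up to `maxAttempts` times, doubling the delay after each failure.
// Non-transient errors are rethrown immediately; if every attempt fails, the last error is rethrown.
async function withRetries<T>(
  task: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      if (!isTransient(err) || attempt === maxAttempts) break;
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}
```

Whatever the retry policy, the final failure still needs to land somewhere the coach can see it. Retries reduce dropped requests. They don't replace visibility.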

Why This Is the Right Default for Any AI That Gives Advice

The expert's judgment is the product

Some owners worry that adding a human review step kills the efficiency they bought the AI for. I'd push back hard on that.

For advice-giving AI, the expert's review isn't friction. It's the product.

Clients aren't paying for model output. They can get model output for free by opening ChatGPT. They're paying for a trusted professional's judgment, the coach who knows what a safe progression looks like, who catches the plan that doesn't fit this particular body.

What the AI removes is the typing, not the thinking. The coach used to spend hours writing programs from scratch. Now the AI drafts and the coach reviews, which means they can review ten times more programs in the same hours. The judgment stays human. The grunt work goes to the machine. That's the whole idea behind "AI replaced the typing, not the thinking," and it's the reason these systems make a professional more valuable instead of replacing them.

Liability follows the human, not the model

The liability argument settles it.

When something goes wrong with advice, the lawsuit names the human and the business. It never names the model. There is no scenario where a court holds the LLM accountable and lets the licensed professional off the hook. The accountability already lives with the human, so the human has to be in the loop by design.

Building autonomous advice generation is building a liability with no one standing behind it. That's not a tech decision. It's a decision to ship risk straight to your customers and hope it never lands on you.

This is why every AI system I ship stops for a human when the output carries consequences. It's not a one-off I did for this client. It's the default across everything I build.

Where I'd Start If You're Putting AI in Front of Customers

Here's a decision rule you can use today. If the AI output reaches a customer and a mistake has real consequences, keep a human in the loop. If it's internal or low-stakes, automate it fully and move on. Don't overthink it past that.
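Written as code, the rule is just two questions. This is a sketch with a hypothetical `AiUseCase` record. The example use cases come from the ones mentioned earlier in this post.

```typescript
interface AiUseCase {
  name: string;
  reachesCustomer: boolean;
  mistakeHasRealConsequences: boolean;
}

type Handling = "keep a human in the loop" | "automate fully";

// Customer-facing AND consequential: a human approves first. Everything else: automate.
function decideHandling(useCase: AiUseCase): Handling {
  if (useCase.reachesCustomer && useCase.mistakeHasRealConsequences) {
    return "keep a human in the loop";
  }
  return "automate fully";
}

const useCases: AiUseCase[] = [
  { name: "client nutrition plan", reachesCustomer: true, mistakeHasRealConsequences: true },
  { name: "internal brainstorm", reachesCustomer: false, mistakeHasRealConsequences: false },
  { name: "first-pass marketing copy read by staff", reachesCustomer: false, mistakeHasRealConsequences: false },
];

for (const u of useCases) {
  console.log(`${u.name}: ${decideHandling(u)}`);
}
```

The branching is trivial. The real work is answering the two questions honestly for each use case.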

Figure: The three-point AI customer-facing audit checklist

When I audit a client's AI, I check three things:

Who can trigger generation. Is it locked to the expert, or can a customer set it off directly?
Who reviews before it ships. Is there a real approval step, or does output reach the customer the moment the model finishes?
Whether any legacy or unguarded routes can bypass the review. This is the one that catches people.
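You can run the three checks over a route inventory. In this sketch, each generation route is described by a hypothetical `GenerationRoute` record that says who can call it and where its output goes. The third check isn't a field. It's a requirement on the input: the list has to include every route the server answers, legacy ones included, not just the routes the main flow uses.

```typescript
// Hypothetical description of a route that can trigger generation.
interface GenerationRoute {
  path: string;
  // Who can call it: "admin" means only the expert's authenticated session.
  access: "public" | "client" | "admin";
  // Does output go to a review queue, or straight back to the caller?
  output: "review_queue" | "direct_to_caller";
}

// Checks 1 and 2 applied to every route; check 3 is feeding in the complete route list.
function auditRoutes(routes: GenerationRoute[]): string[] {
  const findings: string[] = [];
  for (const r of routes) {
    if (r.access !== "admin") {
      findings.push(`${r.path}: can be triggered by ${r.access} callers, not just the expert`);
    }
    if (r.output !== "review_queue") {
      findings.push(`${r.path}: output skips review`);
    }
  }
  return findings;
}

const findings = auditRoutes([
  { path: "/api/admin/generate/workout", access: "admin", output: "review_queue" },
  { path: "/api/public/generate-workout", access: "public", output: "direct_to_caller" },
]);
console.log(findings);
```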

That third one is where the real risk hides. Most teams have a clean main flow and assume that's the whole story. They don't realize there's an old endpoint, a forgotten route, a public API that quietly skips the human, until someone goes looking. Nobody finds these by accident. You find them by auditing for them.

The coaching app had a textbook approval flow on the surface and three live bypass routes underneath it. The team had no idea. That's normal, not negligent. It's just what happens when software grows and old code never gets cleaned up.

This is exactly the kind of thing I find and fix when I come in as your Chief AI Officer. I look at where AI touches your customers, find the gaps between what you think the system does and what it actually does, and close them before they cost you.

Want to explore what AI could do for your business?

Book a free 30-minute strategy call. No pitch deck, no sales team, just a real conversation about your operations and where AI fits.
