AI Agents That Take Actions: Talk vs. Do

The Difference Between an AI That Talks and an AI That Does

A customer emailed my DTC fashion brand asking to change their shipping address before an order went out. The email AI replied in seconds: "I'll get your address updated right away." Polite. On brand. Fast.

[Figure: Talk vs Do: Promise Gap vs Truthful Confirmation. A drafting-only email AI that creates a promise gap, compared with an action-capable chat AI that delivers truthful confirmation.]

And completely hollow.

Because nothing happened. The AI wrote the sentence, then dumped the actual work onto a human. Either I or someone on the team had to log into the store, find the order, and change the address by hand. The customer thought it was done. It wasn't.

Here's the strange part. On the exact same brand, my chat agent was already changing addresses itself. No human touched it. A customer would ask, the agent would verify the order state, update the address against the live store, and confirm it was finished. Same company. Two completely different levels of capability.

This is the gap most people miss when they think about AI agents that take actions. They assume support AI just generates polite text and the value is in how nice it sounds.

But the sound of the reply isn't the point. The real question is whether the AI can actually do the thing it just said it would do.

My email agent could talk. My chat agent could do. And the difference between them had nothing to do with prompting, model choice, or how clever the writing was.

The fix was plumbing. Not smarter words, but the infrastructure underneath the words. The rest of this piece is about exactly what that plumbing is, why drafting-only AI is a liability, and how I got one channel to stop promising and start doing.

Why a Drafting-Only Agent Is a Liability, Not a Feature

The promise gap

My email agent was scoped to produce text. That's it. It could read a customer's message, correctly understand the intent (address change, cancellation, a request for a goodwill discount), and write a fluent, on-brand reply.

What it could not do was act.

So every reply it sent was a promise with a hidden human dependency behind it. "Done, I've updated that" sounded like a resolution. It was actually a to-do item disguised as a resolution, sitting in a queue waiting for a person.

That's the promise gap. The AI was confidently writing checks a human had to cash.

Where the work piles up

Walk through the failure modes and it gets ugly fast.

A customer gets "I'll update that right away," and then nothing happens for three hours because the human queue is backed up. Or the human picks up the thread, misreads which order the customer meant, and changes the wrong one. Or the action falls through entirely because it got buried under forty other tickets.

Now the customer is angry, and they're angry with cause, because they were told it was handled.

This is why a chatbot that only talks doesn't reduce work. It relocates it. The drafting feels like automation, but the actual labor (the part that touches the system of record) is still 100% human. You've just added a trust risk on top, because now a machine is making promises on your behalf that you may or may not keep.

I wrote more about this maturity gap in "Two support agents, one had hands and one didn't." One channel could write. One channel could act. The writing channel looked impressive in a demo and created more problems in production.

A drafting-only agent is not a half-finished feature you can ship and improve later. In a support context, it's actively worse than no automation, because at least with no automation nobody is making false promises in your brand's voice.

What an Action Engine Actually Is

The 18 live tools

So what did the chat agent have that the email agent didn't?

An action engine. In plain language, that's the code that sits between the AI's intent and the real system of record. The model decides what should happen. The action engine is what actually makes it happen against the live store.

The chat agent had roughly 18 live tools wired in. Change a shipping address. Cancel an order. Look up order status. Issue a discount as store credit instead of a cash refund. Each tool maps to a real, specific, reversible (or carefully gated) operation against the live store.

These aren't suggestions the AI hands to a person. They're functions the AI can call directly. I covered the full scope of this in an AI support system that handles returns and exchanges, but the short version is that the agent had hands, and those hands were connected to real controls.
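To make "functions the AI can call directly" concrete, here is a minimal Python sketch of a bounded tool list. The tool names follow the examples above. Everything else (the in-memory `ORDERS` dictionary standing in for the live store, the `Tool` record, and the `call_tool` helper) is a hypothetical illustration, not the production code.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical in-memory stand-in for the live store.
ORDERS = {"1001": {"status": "unfulfilled", "address": "1 Old Street"}}


def change_shipping_address(order_id: str, new_address: str) -> dict:
    """Overwrite the shipping address on an order."""
    ORDERS[order_id]["address"] = new_address
    return {"order_id": order_id, "address": new_address}


def look_up_order_status(order_id: str) -> dict:
    """Read-only lookup of an order's current status."""
    return {"order_id": order_id, "status": ORDERS[order_id]["status"]}


@dataclass(frozen=True)
class Tool:
    name: str
    run: Callable[..., dict]
    reversible: bool  # irreversible tools would be gated elsewhere


# The bounded list: the model can only name tools that appear here.
TOOLS: Dict[str, Tool] = {
    t.name: t
    for t in (
        Tool("change_shipping_address", change_shipping_address, reversible=True),
        Tool("look_up_order_status", look_up_order_status, reversible=True),
    )
}


def call_tool(name: str, **kwargs) -> dict:
    """Run a registered tool; anything off the list is refused."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name].run(**kwargs)
```

The design choice worth noticing is the dictionary lookup: the model supplies a name and arguments, and a name that isn't registered simply cannot run.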

The policy verdict layer

Here's the part that makes it safe, and the part nobody shows in a demo.

[Figure: Action Engine Architecture: Model proposes, Policy layer disposes. The AI model proposes a tool, a deterministic policy verdict layer approves or blocks it, approved actions execute against the live store, and gated actions route to a human.]

Before any action runs, the engine asks one question: is this allowed, in this state, for this customer, right now?

I call it the policy verdict layer. It's deterministic code, not a model guess. You can't cancel an order that already shipped. You can't issue store credit above a set threshold without routing to a different path. You can't change an address on an order that's already in transit.

The model proposes. The policy layer disposes.

This is also where the confirmation gating lives, the logic that stops high-stakes or irreversible actions and hands them to a human. I build this into every AI system I ship, because it's the difference between an agent that's useful and an agent that's a lawsuit.

The point for anyone evaluating this stuff: "AI takes actions" does not mean "AI does whatever it decides." It means the model picks a tool, and a hard-coded policy layer either approves it or refuses it. That constraint, that refusal to let the AI act outside a defined boundary, is the whole game. I get into the constraint side of this in constraining what AI is allowed to do.

That's how you take actions safely. The intelligence proposes. The plumbing decides.
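A policy verdict layer of this kind can be sketched in a few lines of Python. The three rules come from the examples above. The `Verdict` and `Proposal` types, the status strings, the 25.0 credit limit, and the choice to treat "in transit" like "shipped" for cancellations are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    APPROVE = "approve"
    BLOCK = "block"
    NEEDS_HUMAN = "needs_human"


@dataclass(frozen=True)
class Proposal:
    tool: str            # the tool the model picked
    order_status: str    # e.g. "unfulfilled", "in_transit", "shipped"
    amount: float = 0.0  # store credit amount, when relevant


# Hypothetical values; a real threshold and status list would come from the business.
STORE_CREDIT_LIMIT = 25.0
LEFT_WAREHOUSE = {"in_transit", "shipped"}
KNOWN_TOOLS = {"change_shipping_address", "cancel_order", "issue_store_credit"}


def policy_verdict(p: Proposal) -> Verdict:
    """Deterministic check: same input, same verdict, no model involved."""
    if p.tool not in KNOWN_TOOLS:
        # Anything the policy layer doesn't recognize goes to a person.
        return Verdict.NEEDS_HUMAN
    if p.tool == "cancel_order" and p.order_status in LEFT_WAREHOUSE:
        return Verdict.BLOCK
    if p.tool == "change_shipping_address" and p.order_status in LEFT_WAREHOUSE:
        return Verdict.BLOCK
    if p.tool == "issue_store_credit" and p.amount > STORE_CREDIT_LIMIT:
        return Verdict.NEEDS_HUMAN
    return Verdict.APPROVE
```

Because the function is plain branching code, it can be unit-tested like any other business rule, which is exactly what you cannot do with a model's judgment.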

The Fix Was Reuse, Not Rebuild

The instinct most teams have when they hit my email problem is to build new tooling for the email channel. Write an address-change function for email. Write a cancel function for email. Wire up all 18 capabilities again, this time on the email side.

[Figure: Two Front Doors, One Action Engine (Reuse vs Rebuild). Chat and email channels both feed into a single shared action engine, policy layer, and live store.]

I didn't do that. And refusing to do it was the highest-leverage decision in the whole project.

The chat agent already had a battle-tested evaluate() engine. It had been hardened by real customer traffic. The policy verdict layer was proven. The confirmation logic had already caught edge cases in production. All of that work was done, paid for, and trusted.

So instead of duplicating it, I pointed the email pipeline at the exact same action engine the chat agent was already using.

Here's the architectural insight that made it obvious. The channel (email versus chat) is just the surface. It's the front door. The decision of whether and how to act on an address change is identical underneath, regardless of how the customer's request arrived.

The action engine is a shared service, not channel-specific code. Two front doors, one set of hands.

That's the moment the email agent stopped promising and started doing. It could now truthfully write "Done, I've updated your address," because the engine had actually updated it before the draft was even composed.

No new tools. No second policy layer to maintain and keep in sync. No risk of the email channel and the chat channel disagreeing about whether a shipped order can be cancelled. One engine, one set of rules, one source of truth.
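In code, "two front doors, one set of hands" can look roughly like the Python sketch below. The `evaluate` function here is a simplified stand-in that borrows the name only; its signature, the `Request` type, and the intent and status strings are assumptions, not the real engine.

```python
from dataclasses import dataclass

ALLOWED_INTENTS = {"change_shipping_address", "cancel_order"}


@dataclass(frozen=True)
class Request:
    channel: str       # "chat" or "email": recorded, never used for the decision
    intent: str
    order_status: str


def evaluate(req: Request) -> str:
    """Single shared decision. Note that it never reads req.channel."""
    if req.intent not in ALLOWED_INTENTS:
        return "needs_human"
    if req.intent == "cancel_order" and req.order_status == "shipped":
        return "blocked"
    return "approved"


# Each channel is a thin adapter that builds a Request and delegates.
def handle_chat(intent: str, order_status: str) -> str:
    return evaluate(Request("chat", intent, order_status))


def handle_email(intent: str, order_status: str) -> str:
    return evaluate(Request("email", intent, order_status))
```

Since both adapters call the same function, the two channels cannot disagree about whether a shipped order can be cancelled; there is only one place where that rule lives.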

The engineering judgment wasn't in writing clever new code. It was in recognizing the capability already existed and refusing to build it twice. Most of the leverage in AI systems is exactly this kind of thing: not inventing, but connecting what's already proven to a new surface.

How an Email AI Goes From Promising to Doing

Act first, write second

The old flow looked like this:

[Figure: Act First, Write Second, Inverted Flow. The old write-first flow contrasted with the new flow, where the AI performs the action before composing a truthful reply.]

Read the message. Write a draft that promises something. A human acts later (maybe).

The new flow inverts the most important step:

Read the message. Determine intent. Run it through the shared action engine. The policy layer approves or blocks. If approved, perform the action against the live store. Only then compose the reply describing what actually happened.

The key inversion is act first, write second.

The draft is generated from the result of the action, not the intention to act. This is the whole difference between an AI that promises and an AI that does. The AI doesn't say what it's going to do. It does the thing, then describes what it did.

Truthful confirmation

So when a customer reads "I've moved your shipment to the new address," that sentence is a description of a completed fact. Not a forecast. Not a promise sitting in a queue. The address was already changed in the live store before that sentence existed.

That's truthful confirmation, and it's only possible because the action runs before the language does.

The gating still applies, exactly as it does on chat. High-stakes or out-of-policy actions stop for a human. And critically, the email reflects that honestly too. If something needs human review, the reply says "I've flagged this for our team," because that's what actually happened.
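The act-first, write-second ordering is easy to show in a short Python sketch. The reply wording is taken from the examples above. The `ORDERS` dictionary, the `Outcome` type, the status strings, and the decision to flag a blocked address change for the team are hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Hypothetical in-memory stand-in for the live store.
ORDERS = {
    "1001": {"status": "unfulfilled", "address": "1 Old Street"},
    "1002": {"status": "in_transit", "address": "9 Far Lane"},
}


@dataclass(frozen=True)
class Outcome:
    status: str  # "done" or "flagged"
    detail: str


def act(order_id: str, new_address: str) -> Outcome:
    """Try the address change and report what actually happened."""
    order = ORDERS[order_id]
    if order["status"] != "unfulfilled":
        # Out of policy: nothing is changed, the case goes to a person.
        return Outcome("flagged", "order has already left the warehouse")
    order["address"] = new_address
    return Outcome("done", new_address)


def compose_reply(outcome: Outcome) -> str:
    """The reply is written from the outcome, never from the intent."""
    if outcome.status == "done":
        return f"I've moved your shipment to the new address: {outcome.detail}."
    return "I've flagged this for our team."


def handle_email(order_id: str, new_address: str) -> str:
    outcome = act(order_id, new_address)  # act first
    return compose_reply(outcome)         # write second
```

`compose_reply` has no way to claim success unless `act` returned it, so a reply that says the address was changed can only exist after the store was actually updated.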

Let me be honest about the limits. It doesn't auto-execute everything. Some categories are human-only by design: refunds above a threshold, anything that looks like fraud, and edge cases the policy layer doesn't recognize. Those still route to a person.

That restraint isn't a weakness. It's the trust mechanism. An agent that knows what it shouldn't touch is more valuable than one that touches everything.

Can AI Take Real Actions Safely? Here's the Honest Answer

Yes. AI can take real actions safely. But the safety does not come from the model.

[Figure: Three Layers of Safe AI Action. A defined set of tools, a deterministic policy layer, and confirmation gating, with the responsibilities of the model and the engine kept separate.]

It comes from three things wrapped around the model.

First, a defined set of tools. Not open-ended access to your systems. A specific, bounded list of operations the AI is allowed to perform, and nothing outside that list.

Second, a deterministic policy layer that decides what's permitted in the current state. Not a model judgment call. Hard code that knows you can't cancel a shipped order and enforces it every single time.

Third, confirmation gating on anything irreversible or high value. The risky stuff stops for a human, always.

The model's job is narrow: understand the customer and choose the right tool. The engine's job is to refuse the wrong actions and execute the right ones. Keep those jobs separate and you get an agent that's both useful and safe.

Now, to the skeptic who says "AI chatbots just talk." That's true of bad implementations. It's false of good ones. And the entire difference lives in the plumbing nobody puts in a demo: the action engine, the policy layer, and the gating.

The real outcome on my brand: the email channel went from a backlog of human follow-ups to a channel that resolves common requests end to end. Address changes, status lookups, store-credit offers, all handled without a person, all confirmed truthfully.

What still stays human? Anything ambiguous, anything high-value, anything the policy layer flags. I'd rather over-gate and keep trust than over-automate and break it. That line is a judgment call, and I keep it conservative on purpose.

Most Support AI Stops at the Draft. That's the Cheap Half.

Here's the strategic point if you're a CEO or COO evaluating this.

Writing a good reply is now the easy, commoditized part. Any tool can draft a polite, on-brand message. That capability is essentially free. It impresses people in demos and means almost nothing in production.

The expensive, valuable half is everything underneath: the action engine, the policy layer, and the channel-agnostic architecture that lets one set of proven hands serve every front door you have.

So when you're evaluating support AI and the vendor demo only shows it writing nice messages, you're looking at the cheap half. Ask them what happens after the AI says "I've updated that." Ask who actually does the updating. If the answer is "a human in your team," you're buying a more expensive way to write promises you still have to keep yourself.

The leverage is in reusable infrastructure, not channel-specific bots bolted on one at a time. Build the action engine once, harden it with real traffic, and point every channel at it. That's how the email agent on my brand went from a liability to a closer in a single architectural decision.

This is the kind of system I build and audit, both across the businesses I run and the clients I work with. If your support stack writes great messages and still drowns your team in follow-up work, that's a fixable problem, and usually a smaller one than people expect. Talk to me about your support stack.
