I Built an AI Executive Assistant That Triages 200 Emails a Day

Running a DTC fashion brand out of San Diego means my inbox is a warzone by 6am. Supplier updates from three countries, Shopify order notifications, customer service escalations, wholesale inquiries, ad platform alerts, shipping carrier updates, return requests, and the endless drip of newsletters I subscribed to in 2019 and never unsubscribed from. On an average day, that's 200+ emails. And for the longest time, I handled every single one of them the same way: manually, first thing in the morning, one at a time.

That process — AI email triage automation — is what I eventually built to replace myself. But before I explain how, let me explain why 90 minutes of morning email was slowly killing my ability to run the business.

200 Emails a Day Was Eating 90 Minutes of My Morning

I timed it. Across two weeks, my average morning email session was 87 minutes. Not responding to emails — just triaging them. Deciding what mattered, what could wait, what needed to be forwarded, and what could be ignored.

That's 7.5 hours a week. Roughly 390 hours a year. Spent not writing a single reply, but simply deciding which emails deserved my attention.

The real cost wasn't the time, though. It was the cognitive load. By the time I'd triaged 200 emails, I'd context-switched 200 times. Supplier quality issue, Shopify theme update, wholesale lead, UPS delay notification, customer compliment, ad spend alert, newsletter, newsletter, newsletter. Each one demanded a micro-decision. And each micro-decision burned a small piece of the focus I needed for actual strategic work.

I tried all the standard advice. Batch processing. Inbox Zero. Scheduled email windows. None of it addressed the fundamental problem: every email demands equal attention at first glance, whether it's a $15,000 wholesale inquiry or a notification that my Shopify theme auto-updated overnight.

The disease isn't email volume. It's that you're the classification engine, and you're running 200 classification tasks before your first real decision of the day.

The Decision Framework: Urgent, Important, FYI, Noise

Why Most Email AI Fails

I tried three off-the-shelf AI email tools before building my own. They all had the same flaw: they summarized everything equally. A supplier flagging a production delay got the same two-sentence summary as a marketing newsletter. The AI was technically accurate and practically useless.

The problem isn't summarization. It's prioritization. And prioritization requires a decision framework the AI can follow — not just comprehension of what the email says, but judgment about what it means for your business.

Most AI email tools skip this entirely. They're built to be general. But general means they can't tell the difference between urgent and noise in your specific context.

The Four-Bucket Classification System

I built the classification around a modified Eisenhower matrix — four buckets, each with clear rules:

URGENT: Requires action within 4 hours. Supplier delays affecting active orders, customer escalations with refund deadlines, payment processing failures, time-sensitive partnership responses. If I don't see this by mid-morning, something breaks.
IMPORTANT: Requires action within 24-48 hours. New wholesale inquiries, inventory threshold alerts, pricing decisions, vendor contract questions. These matter, but they won't detonate if I handle them after lunch.
FYI: Useful context, zero action required. Order confirmations, shipping status updates, weekly analytics digests, team status reports. Good to know. Not worth interrupting deep work.
NOISE: Newsletters I don't read, promotional emails, platform notifications with no actionable content, duplicate alerts. These shouldn't exist in my field of vision at all.

Every email gets classified into one of these four buckets along with a confidence score. If the model's confidence drops below 80%, it escalates the email to me directly rather than guessing. This is a pattern I use across every AI system I build — the AI rejects its own uncertain work rather than confidently getting it wrong.

The classification prompt is about 400 tokens, and it runs through Claude Haiku as part of a multi-model architecture I use across the business. Heavier models for complex reasoning, lightweight models for high-volume classification tasks. The cost is fractions of a cent per email. For 200 emails a day, we're talking about pennies.

How the System Actually Works: Gmail to Telegram in 30 Seconds

Gmail OAuth and Encrypted Ingestion

The pipeline starts with a Gmail OAuth connection that polls for new emails every five minutes. This is the part that sounds simple but requires careful handling — you're piping potentially sensitive business communications through an AI system, so every email is encrypted at rest before any processing happens.

The OAuth setup is honestly the most annoying part of the whole build. Google's API console is not designed for humans. Budget a few hours and some frustration. Once it's connected, it's rock solid.

Task Extraction and Memory

Each email passes through a three-stage pipeline:

Stage 1: Classification. The four-bucket system with confidence scoring. Takes about 2 seconds per email.

Stage 2: Entity and task extraction. The AI pulls out deadlines, dollar amounts, names, and specific action items. "Please confirm the PO by Friday" becomes a task with a deadline. "$4,200 outstanding balance" gets flagged with the financial amount. A new contact name gets added to the entity index.

Stage 3: Summary generation. One sentence per email for FYI items. Up to three sentences for urgent items, including the specific action needed and any deadline.

Extracted tasks feed directly into the broader task management system. This is where the AI executive assistant goes from clever to actually useful — it doesn't just tell me about the email, it creates the follow-up action.

The memory layer uses Zep and Mem0 for persistent context. This matters more than you'd think. If a supplier emailed about a fabric delay last Tuesday, and follows up today saying "any update on the timeline?", the system doesn't just summarize the new email. It connects the thread and tells me: "This is the third message from [supplier] about the denim delay. Original ETA was March 15, then pushed to March 22. They're now asking for your timeline confirmation."

That context assembly used to happen in my head. Now it happens before I wake up.

The Voice Briefing Nobody Asked For (But I Can't Live Without)

Every morning at 7am, a Telegram message hits my phone. Urgent items at the top with one-tap action buttons (approve, reply, escalate, snooze). Then important items. Then a count: "47 emails classified as FYI, auto-filed. 38 emails classified as noise, auto-archived."

But the part I didn't expect to love: the voice briefing. Using Cartesia TTS, the system generates a 3-minute audio summary of everything that matters. I listen to it while making coffee.

The voice briefing covers urgent items with suggested responses, flags any financial thresholds I've set (orders over a certain dollar amount, refund requests above a threshold), and highlights any new contacts who might be worth a personal reply.

I built this as a "nice-to-have." It's now the first thing I engage with every morning. Three minutes of audio replaces what used to be 90 minutes of screen time. I walk into my office already knowing what needs my attention and what doesn't.

What 30 Days of AI Email Triage Actually Looked Like

The Numbers

Over the first 30 days, the system processed approximately 6,200 emails. Here's what the data showed:

Classification accuracy: 94.2% agreement with my manual spot-checks. I audited 50 random emails per week, comparing the AI's classification to what I would have chosen.
Time savings: From 90 minutes/day down to roughly 12 minutes reviewing the triage summary and handling the 3-5 genuinely urgent items. That's 78 minutes saved per day, or 39 hours per month.
Noise ratio: 61% of all emails were classified as pure noise. Auto-archived. I never see them unless I go looking.
Urgent items: Averaged 4.3 per day. These were the emails that actually needed me. The rest was overhead.

Where It Got Things Wrong

The 5.8% miss rate wasn't random. It clustered around specific patterns.

Ambiguous urgency was the biggest issue. A supplier writing "when you get a chance, can we discuss pricing?" — is that IMPORTANT or FYI? The language says low urgency, but the topic says it matters. I solved this with a context escalation rule: any email from my top-10 contacts automatically gets bumped to at least IMPORTANT, regardless of how casual their language is.

Sarcasm and indirect requests tripped it up early on. A customer writing "Oh great, another delayed shipment" doesn't read the same way to an LLM as it does to a human. I tuned the prompt to flag uncertain emotional tone and escalate those for human review.

I'll be direct about what it can't do: it doesn't replace relationship judgment. When a long-time customer writes something that technically classifies as FYI, but I know from ten years of history that this person deserves a personal reply — that's still on me. The AI handles triage. I handle relationships.

The Cross-Skill Architecture That Makes This Possible

Email triage doesn't exist in isolation. It's one of 14 skills in the AI platform I built for my ecommerce business, and the compounding effects of connected systems are where the real value lives.

Here's what happens when email triage talks to other systems:

Task management: Extracted action items get scheduled with deadlines and assigned priority. I don't copy-paste from email to my task list — it's automatic.
Customer intelligence: Email patterns feed into customer health scoring. A customer who's emailed three times in two weeks with issues gets flagged before it becomes a churn event.
Inventory management: When a supplier emails about a delay, it triggers inventory rebalancing logic that adjusts reorder points and flags any products at risk of stockout.

The memory layer (Zep + Mem0) spans all of these skills. The system remembers that this supplier has been late three times this quarter. That this wholesale lead first reached out six weeks ago and hasn't gotten a response. That this customer complained last month and just placed another order — maybe worth a thank-you note.

A standalone email AI tool gives you summaries. An integrated AI email classification system gives you an executive assistant that actually acts on what it reads. The difference is architectural: connected systems compound, isolated tools don't.

Building Your Own Email Triage: What You'd Need

The Minimum Viable Version

You don't need 14 skills and 22,000 lines of Python to get value from this pattern. Here's the minimum:

Gmail API access via OAuth. This is the hardest part — Google's developer console is a slog. Budget 2-3 hours.
A classification prompt built around YOUR decision framework. Not mine. What's urgent for a law firm is completely different from what's urgent for a DTC brand. Spend 30 minutes writing down what actually needs your attention within 4 hours. That's your URGENT bucket. Work backward from there.
A delivery mechanism. Telegram bot, Slack webhook, or even a daily email digest with the structured triage. Whatever you'll actually look at.

What to Skip

Don't build these on day one: voice briefings (nice-to-have), memory systems (start stateless, add context when you see specific patterns you wish it remembered), and auto-responses (do not let AI reply on your behalf until your classification accuracy is consistently above 95%).

Cost: Running Gmail AI triage through Claude Haiku costs roughly $3-5/month at 200 emails/day. The ROI math is almost silly — $4/month to save 39 hours/month.

But the real value isn't the hours. It's the cognitive clarity of starting every day knowing exactly what needs you and what doesn't.

The Real Question Isn't Whether AI Can Handle Your Email

Email triage is the gateway. Once you've experienced an AI making 200 correct decisions before you've had your coffee, you start seeing every repetitive decision bottleneck in your business differently.

The pattern — clear decision framework, confidence-based escalation, human-in-the-loop for edge cases — applies everywhere. Customer support triage. Lead scoring. Inventory management. Content prioritization. It's the same architecture with different classification rules.

This is what I do as a Chief AI Officer: I find the 200-decision-a-day bottlenecks in a business and build systems that handle them. Not with generic tools. With systems tuned to your decision frameworks, your business context, your escalation thresholds.

If your mornings look like mine used to — drowning in micro-decisions before you've done anything strategic — that's a solvable problem. And it's probably one of a dozen solvable problems hiding in your operations right now.

Thinking About AI for Your Business?

If any of this hit close to home, I'd like to hear about it. I do free 30-minute discovery calls where we look at your operations and figure out where AI could actually save you time and money — not in theory, but in the specific, measurable way I've described here.

No pitch deck. No slides. Just a conversation about what's eating your time and whether it's fixable.

Book a Discovery Call