AI Document Data Extraction Tool: Mess to Math
How I built an AI document data extraction tool that parses claims from any format, then lets audited code do the math a professional can defend.
By Mike Hodgen
The Excel Problem Nobody Builds For
Picture a bankruptcy attorney at 9pm. A case is moving toward a filing deadline, and they need to model a creditor payment waterfall. Absolute priority. Who gets paid, in what order, and how much is left when the money runs out.
There is no purpose-built tool for this. So they do what they have done a hundred times before. They open a blank Excel file and rebuild the entire model from scratch.
Every claim gets typed in by hand. Secured, unsecured, priority, administrative. Then the formulas, the amortization schedules, the priority logic, all reconstructed from memory and a prior file that may or may not still be correct. Hours of manual entry. No validation. Every single number is a fresh chance to fat-finger a figure that decides who gets paid and who walks away with nothing.
This is the kind of work an AI document data extraction tool should handle, and almost nobody builds for it. The problem isn't that AI would be a nice addition. The problem is that this is high-stakes manual work with zero guardrails.
These answers get filed in court. They get scrutinized by opposing counsel, by a trustee, by a judge. A transposed digit isn't an embarrassing typo; it's a number someone relies on to make a legal argument. The model has to survive people who are paid to find the mistake.
I have seen this exact pattern across professional services. A specialist who is brilliant at the judgment part spends most of their time on the data-entry part. The expensive brain is doing clerical work, and the clerical work is where the risk lives.
That gap is the whole reason I built the tool I'm about to describe. Not because AI is exciting. Because rebuilding a high-stakes calculation by hand every time is a terrible use of a specialist, and the manual entry is exactly where the indefensible errors creep in.
Where AI Belongs and Where It Doesn't
This is the part most people get backwards, so I want to be precise about it.
[Figure: AI extracts, code computes (the core division of labor)]
The boring part AI is good at
The input to this whole process is a mess. Claims pasted in from a forwarded email. A schedule copied out of a PDF. Numbers in inconsistent formats, half of them with dollar signs, half without, creditor names spelled three different ways. A specialist normally re-types all of that into structured rows by hand.
AI is genuinely good at this. Reading unstructured, inconsistent text and turning it into clean structured data is exactly what a large language model does well. Give a Claude agent a wall of pasted text and it will reliably pull out creditor, amount, claim type, and date. That's the boring part, and it's the part that eats hours.
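To make "clean structured data" concrete, here is a minimal sketch of what the rows might look like once the agent hands them back. It assumes the agent returns one dictionary per claim with the four fields named above; the `ExtractedClaim` type, the `parse_amount` and `to_claim` helpers, and the key names are all hypothetical, not the tool's actual schema.

```python
import re
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class ExtractedClaim:
    creditor: str
    amount: Decimal
    claim_type: str  # the agent's suggestion only, never the binding class
    filed_on: date


def parse_amount(raw: str) -> Decimal:
    """Turn '$12,500.00', '12500' or ' 12,500 ' into an exact Decimal."""
    cleaned = re.sub(r"[,$\s]", "", raw)
    if not re.fullmatch(r"\d+(\.\d{1,2})?", cleaned):
        raise ValueError(f"unrecognized amount: {raw!r}")
    return Decimal(cleaned)


def to_claim(row: dict) -> ExtractedClaim:
    """Convert one row of the agent's (assumed) output into a typed claim."""
    return ExtractedClaim(
        creditor=" ".join(row["creditor"].split()),  # collapse stray whitespace
        amount=parse_amount(row["amount"]),
        claim_type=row["claim_type"].strip().lower(),
        filed_on=date.fromisoformat(row["date"]),
    )
```

Using `Decimal` rather than floats keeps every dollar figure exact, and rejecting anything that does not look like a number means a garbled amount fails loudly instead of flowing into the math.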
The part that has to be right
The math is a different animal. The classification into priority classes, the amortization, the waterfall logic that determines who gets paid before whom. None of that can be "usually right."
In a courtroom, "usually right" is not defensible. A model that produces the correct answer 95% of the time is a liability, because you cannot tell which 5% you're looking at.
So here is the line I draw, every time, on every system like this. One AI agent extracts. Deterministic code computes and verifies. I wrote about this principle in more depth here: let the model judge and the code compute.
The AI reads the mess. The code does the math. That division is what makes the output defensible, not the AI itself. The model handles the part where being a little fuzzy is fine. The code handles the part where fuzzy gets you sanctioned.
When a buyer asks me whether an AI tool can produce answers their clients can stand behind, this is the honest answer. Yes, but only if the AI never touches the part that has to be exact.
What the Tool Actually Does
The tool I built is a claims waterfall calculator. Let me explain what that means for a CEO who has never modeled a bankruptcy estate.
[Figure: The payment waterfall (buckets metaphor and priority classes)]
A payment waterfall is money flowing down a strict order of priority. Imagine pouring water into a stack of buckets. The top bucket fills completely before a single drop reaches the one below it. Each class of creditor gets paid in full before the next class sees a dollar. When the money runs out, everyone below that point gets nothing.
The order is not negotiable. It's the law. Getting it right is the entire job.
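The bucket logic is small enough to sketch. This is a simplified illustration, not the calculator itself: it assumes each class has already been totaled, takes the classes in priority order, and ignores how money is shared among creditors within a class. The function name and the sample class names are hypothetical.

```python
from decimal import Decimal


def run_waterfall(available: Decimal, classes: list[tuple[str, Decimal]]):
    """Pay classes in order, senior first; each fills before the next gets a dollar.

    `classes` is a list of (class_name, total_owed) pairs in priority order.
    Returns the per-class payouts and whatever money is left over.
    """
    remaining = available
    payouts = []
    for name, owed in classes:
        paid = min(remaining, owed)  # pay in full, or pay what is left
        payouts.append((name, paid))
        remaining -= paid
    return payouts, remaining


payouts, leftover = run_waterfall(
    Decimal("100000"),
    [
        ("secured", Decimal("60000")),
        ("administrative", Decimal("25000")),
        ("unsecured", Decimal("50000")),
    ],
)
# secured gets 60000, administrative gets 25000, unsecured gets the last 15000
```

With 100,000 to distribute, the first two classes are paid in full and the third receives only the 15,000 that remains, which is the "money runs out" moment from the bucket picture.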
Seven priority classes and six payment structures
The calculator handles seven priority classes and six payment structures. It runs per-creditor amortization, so a claim that gets paid over time is modeled correctly across the schedule, not just as a lump sum.
That sounds like a lot of moving parts, and it is. That's exactly why rebuilding it in a blank spreadsheet every time is so error-prone. Seven classes and six structures is a combinatorial mess to maintain by hand.
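As one example of what "per-creditor amortization" involves, here is a sketch of a standard level-payment schedule. It assumes monthly payments and a fixed annual rate, which is only one of many possible payment structures; the `amortize` function and its row layout are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def amortize(principal: Decimal, annual_rate: Decimal, months: int):
    """Build a level-payment schedule.

    Returns rows of (month, payment, interest, principal_paid, balance).
    The final payment absorbs any rounding residue so the principal
    column sums to the original principal exactly.
    """
    r = annual_rate / 12
    if r == 0:
        payment = (principal / months).quantize(CENT, ROUND_HALF_UP)
    else:
        payment = (principal * r / (1 - (1 + r) ** -months)).quantize(
            CENT, ROUND_HALF_UP
        )
    balance = principal
    rows = []
    for month in range(1, months + 1):
        interest = (balance * r).quantize(CENT, ROUND_HALF_UP)
        if month == months:
            principal_paid = balance  # clear whatever is left
        else:
            principal_paid = payment - interest
        balance -= principal_paid
        rows.append((month, principal_paid + interest, interest, principal_paid, balance))
    return rows


schedule = amortize(Decimal("12000"), Decimal("0.06"), 12)
```

The detail worth noticing is the last row: rounding each month to the cent leaves a few pennies of drift, and a hand-built spreadsheet that ignores it ends with a balance that is not quite zero.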
The cramdown and best-interest check
It also runs a cramdown and a best-interest-of-creditors check. In plain terms, these are the tests that decide whether a proposed plan can legally be forced through over objections. The best-interest check, for example, asks: does each creditor do at least as well under the plan as they would in a straight liquidation? The calculator answers that deterministically.
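Reduced to its core, the liquidation comparison is a per-creditor inequality. The sketch below assumes the recoveries under the plan and under liquidation have already been computed and are directly comparable; it ignores payment timing and discounting, and the function name is hypothetical.

```python
from decimal import Decimal


def best_interest_failures(
    plan_recovery: dict[str, Decimal],
    liquidation_recovery: dict[str, Decimal],
) -> list[str]:
    """Return the creditors who would receive less under the plan than in liquidation."""
    return sorted(
        name
        for name, liquidation_amount in liquidation_recovery.items()
        if plan_recovery.get(name, Decimal("0")) < liquidation_amount
    )


failures = best_interest_failures(
    plan_recovery={"Acme Supply": Decimal("9000"), "Baker & Sons": Decimal("4000")},
    liquidation_recovery={"Acme Supply": Decimal("8500"), "Baker & Sons": Decimal("4500")},
)
# failures == ["Baker & Sons"]
```

An empty list means every creditor does at least as well under the plan; any name in the list is a creditor the plan shortchanges relative to liquidation.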
The whole thing is purpose-built for a workflow that previously lived in a spreadsheet that got reconstructed from scratch on every case. The logic is written once, audited, and reused. The specialist stops rebuilding the engine and starts feeding it the inputs.
The Parser: One Agent, Not Three
Here's a design decision I want to be honest about, because it goes against what a lot of AI vendors will sell you.
[Figure: One agent vs. multi-agent comparison]
The extraction layer is a single Claude agent. It reads the pasted input in whatever format it arrives in and returns structured claims. One agent. Not three.
I could have built a multi-agent setup. One agent to extract, one to classify, one to validate, all chained together with an orchestrator on top. That architecture demos beautifully. It also would have been slower and less accurate.
More agents is not more reliable. It's more surface area for things to drift. Every handoff between agents is a place where context gets lost, where one model's slightly-off output becomes the next model's confidently-wrong input. You don't get redundancy; you get a longer chain of things that can each fail independently.
One agent with strong code verification behind it beat the multi-agent version on both speed and accuracy in my testing. So that's what shipped.
The agent has exactly one job: extraction. It never does math. It never makes the final call on a creditor's priority class on its own authority. It reads the text and hands back structured data, and that's the end of its responsibility.
This is the broader principle I apply to every system I build. You constrain what the AI is allowed to decide, hard, and you let audited code own everything that has to be correct. I wrote about that here: constraining AI so it can't embarrass a client.
The temptation with AI is always to give it more rope. More autonomy, more decisions, more of the workflow. The discipline is doing the opposite. Give it the narrowest possible job, make sure it does that job well, and let deterministic code take over the moment correctness matters.
The Verification Layer That Makes It Defensible
This is the part that answers the skeptic, so I want to spend real time on it.
[Figure: The verification layer's battery of checks]
After the agent extracts the claims, deterministic code runs a full battery of checks before any human even looks at the result. This is AI plus deterministic verification in practice, not as a slogan.
Math validation and classification rules
First, math validation. The code confirms totals reconcile. It confirms amortization schedules sum to the correct principal. If the agent pulled a number that doesn't add up against the rest of the data, the code catches the contradiction.
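Both reconciliation checks come down to exact sums. The sketch below assumes the source material states a total to reconcile against and that amounts are held as `Decimal`; the function names are hypothetical.

```python
from decimal import Decimal


def reconcile_total(amounts: list[Decimal], stated_total: Decimal):
    """Compare the sum of extracted amounts with the total stated in the source.

    Returns (matches, difference) so a mismatch shows exactly how far off it is.
    """
    computed = sum(amounts, Decimal("0"))
    return computed == stated_total, computed - stated_total


def schedule_matches_principal(principal_payments: list[Decimal], principal: Decimal) -> bool:
    """Confirm the principal column of a schedule sums to the original principal."""
    return sum(principal_payments, Decimal("0")) == principal


ok, difference = reconcile_total(
    [Decimal("100.10"), Decimal("200.20")], Decimal("300.03")
)
# ok is False and difference is Decimal("0.27"): a transposed digit in the stated total
```

Returning the difference alongside the pass/fail result matters in practice, because a gap of 0.27 points toward a transposed digit while a gap equal to one whole claim points toward a missing or duplicated row.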
Then classification. The priority class for each claim is assigned by code, applying the actual rules, not by the model guessing. The agent might suggest what kind of claim something looks like, but the binding classification is done by deterministic logic that follows the statute. The model's opinion doesn't get to be the final word.
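The shape of that arrangement can be sketched as follows. The four classes and the three rules here are placeholders chosen for illustration; they are neither the tool's seven classes nor a statement of the law, and every name in the block is hypothetical. What the sketch shows is the control flow: code decides, and the agent's suggestion is only compared against the result.

```python
from enum import IntEnum


class PriorityClass(IntEnum):
    # Illustrative ordering only; lower number is paid first.
    SECURED = 1
    ADMINISTRATIVE = 2
    PRIORITY = 3
    GENERAL_UNSECURED = 4


def classify(facts: dict) -> PriorityClass:
    """Assign the binding class from explicit facts, using placeholder rules."""
    if facts.get("has_collateral"):
        return PriorityClass.SECURED
    if facts.get("incurred_post_petition"):
        return PriorityClass.ADMINISTRATIVE
    if facts.get("statutory_priority"):
        return PriorityClass.PRIORITY
    return PriorityClass.GENERAL_UNSECURED


def review_classification(facts: dict, agent_suggestion: str):
    """Return (binding_class, disagrees) where disagrees flags a mismatch
    between the rule-based class and what the agent suggested."""
    binding = classify(facts)
    disagrees = agent_suggestion.strip().lower() != binding.name.lower()
    return binding, disagrees
```

A disagreement does not change the outcome. It becomes one more flag for the reviewer, since it often means either the extracted facts are incomplete or the source text is ambiguous.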
Duplicate detection and reasonableness checks
The code also runs duplicate detection. The same creditor entered twice, under slightly different spellings, is a classic manual-entry error that doubles a claim and quietly breaks the whole waterfall. The code flags it.
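One simple way to catch near-identical creditor names is to normalize them and compare with the standard library's `difflib.SequenceMatcher`. This is a sketch of that general technique, not necessarily what the tool uses; the suffix list, the 0.85 threshold, and the function names are assumptions that would need tuning against real data.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

# Common corporate suffixes to ignore when comparing names (assumed list).
SUFFIXES = {"inc", "llc", "co", "corp", "ltd", "company", "incorporated"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop corporate suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(word for word in words if word not in SUFFIXES)


def likely_duplicates(names: list[str], threshold: float = 0.85):
    """Return pairs of names whose normalized forms are suspiciously similar."""
    flagged = []
    for a, b in combinations(names, 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            flagged.append((a, b))
    return flagged


pairs = likely_duplicates(["Acme Supply Co.", "ACME Supply Company", "Baker & Sons LLC"])
# pairs == [("Acme Supply Co.", "ACME Supply Company")]
```

The check only flags pairs for review. Two similarly named creditors can be genuinely different entities, so merging them automatically would trade one kind of error for another.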
Finally, reasonableness checks. A claim that's an order of magnitude off from everything around it gets flagged for review. If a number that should be $40,000 comes through as $400,000, that's the kind of fat-finger error that used to slip past a tired person at 9pm. The code catches it.
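A reasonableness check of this kind can be as simple as comparing each amount with the median of its neighbors. The sketch below assumes the claims being compared are of broadly similar size; the factor of 5 is an arbitrary cutoff, somewhat stricter than a full order of magnitude, and the function name is hypothetical.

```python
from decimal import Decimal
from statistics import median


def flag_outliers(amounts: list[Decimal], factor: Decimal = Decimal("5")) -> list[int]:
    """Return indexes of amounts at least `factor` times above or below the median."""
    if not amounts:
        return []
    mid = median(amounts)
    if mid == 0:
        return []
    return [
        index
        for index, amount in enumerate(amounts)
        if amount >= mid * factor or amount * factor <= mid
    ]


amounts = [Decimal(x) for x in ("38000", "41000", "400000", "45000", "39500")]
flagged = flag_outliers(amounts)
# flagged == [2], the 400,000 entry sitting among claims of roughly 40,000
```

The median is used instead of the mean because a single wild value drags the mean toward itself and can hide the very outlier the check is looking for.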
Here's the point for a buyer who has been burned by AI hype. Yes, the model can hallucinate a number. I am not going to pretend otherwise. The difference is that the hallucinated number hits a wall of deterministic checks before anyone relies on it. Reliability here is engineered, not promised. The trust doesn't come from the AI being good. It comes from the code assuming the AI might be wrong and verifying everything.
A Human Approves Before Anything Imports
There's one more gate, and it's the most important one.
Nothing the agent extracts gets used until a human reviews and approves the import. The agent extracts, the code verifies, and then the whole thing stops and waits.
The professional sees the structured claims laid out. They see every flag the verification layer raised. They see the reasonableness warnings, the duplicate alerts, the reconciliation results. And then they sign off, or they don't.
This is human-in-the-loop AI by design, not human-in-the-loop as a disclaimer buried in the terms of service. The stop is a real stop. The system cannot proceed without a person.
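In code, a "real stop" means the import path refuses to run without a recorded approval, rather than relying on the interface to hide a button. A minimal sketch, with hypothetical names throughout (`ImportBatch`, `ApprovalRequired`, `import_claims`):

```python
from dataclasses import dataclass, field
from typing import Optional


class ApprovalRequired(Exception):
    """Raised when an import is attempted before a reviewer has signed off."""


@dataclass
class ImportBatch:
    claims: list
    flags: list = field(default_factory=list)  # everything verification raised
    approved_by: Optional[str] = None

    def approve(self, reviewer: str) -> None:
        """Record who signed off; an empty name is not an approval."""
        if not reviewer.strip():
            raise ValueError("a named reviewer is required")
        self.approved_by = reviewer.strip()


def import_claims(batch: ImportBatch, ledger: list) -> int:
    """Copy claims into the ledger only if a reviewer has approved the batch."""
    if batch.approved_by is None:
        raise ApprovalRequired("a reviewer must approve this batch before import")
    ledger.extend(batch.claims)
    return len(batch.claims)
```

Because the gate lives in `import_claims` itself, no caller can skip it by accident, and the recorded reviewer name leaves a trail of who approved what.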
Why does this matter so much for professional services? Because the expert has to stay accountable. An attorney's name goes on the filing. When a judge asks why a number is what it is, "the software decided" is not an answer that holds up. The expert needs to have looked, understood, and approved.
The tool removes the typing. It does not remove the judgment. That distinction is the entire value proposition. The specialist stops spending three hours on manual entry and starts spending that time on the thing only they can do, which is deciding whether the numbers and the strategy are right.
So when someone asks me, will an AI tool give my clients answers they can defend, the answer is yes. Because the AI only extracts. Audited code does every calculation. And a human signs off before anything is relied on.
Why It Leads With Reliability, Not "AI-Powered"
Here's the positioning lesson, and it applies to anyone selling into professional services.
[Figure: The reusable architecture pattern across use cases]
Attorneys, accountants, financial professionals. These audiences do not want "AI-powered" splashed across the top of the page. The moment they see it, they get nervous, and they should. They have heard the promises. They have seen the demos that fall apart on a real case.
What they want is trust. So a good professional-services AI tool leads with what it guarantees, not with the technology underneath. Defensible math. Verified classification. A human approval step. The AI is the engine. It is not the marketing.
That ordering isn't a copywriting trick. It reflects how the thing is actually built. The reliability comes first in the messaging because reliability came first in the architecture.
And here is the pattern worth taking away, whatever business you run. Any complex calculation your specialists rebuild by hand every single time is a candidate for exactly this architecture. AI extracts the messy input. Audited deterministic code does the part that has to be right. A human approves before anyone relies on the output.
It works for a creditor waterfall. It works for a commission calculation, a pricing model, a compliance schedule, an actuarial estimate. Anywhere a smart person re-types data into a spreadsheet and re-derives the same logic over and over, there's a system to be built that removes the typing and the risk while keeping the human in charge of the judgment.
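Stripped of the bankruptcy specifics, the pattern is four pluggable steps in a fixed order. The sketch below is a generic outline under that assumption; `run_pipeline` and its parameter names are hypothetical, and the extractor in the example is a trivial stand-in for where a model call would go.

```python
from typing import Any, Callable, Iterable


def run_pipeline(
    raw_input: str,
    extract: Callable[[str], list],
    checks: Iterable[Callable[[list], list]],
    approve: Callable[[list, list], bool],
    compute: Callable[[list], Any],
):
    """Extract, verify, stop for approval, then compute.

    Returns (result, flags); result is None when the reviewer does not approve.
    """
    records = extract(raw_input)  # the only step an AI model would own
    flags = [flag for check in checks for flag in check(records)]
    if not approve(records, flags):  # the human gate
        return None, flags
    return compute(records), flags  # deterministic, audited code


def no_negatives(records: list) -> list:
    """Example check: flag any negative value."""
    return ["negative value"] if any(r < 0 for r in records) else []


def parse_lines(text: str) -> list:
    """Stand-in extractor: one integer per non-empty line."""
    return [int(line) for line in text.splitlines() if line.strip()]


def approve_if_clean(records: list, flags: list) -> bool:
    """Stand-in reviewer: approves only when nothing was flagged."""
    return not flags


result, flags = run_pipeline("10\n20\n", parse_lines, [no_negatives], approve_if_clean, sum)
# result == 30, flags == []
```

Swapping in a commission rule or a pricing model changes `checks` and `compute`; the order of the steps, and the fact that nothing is computed without approval, stays the same.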
If that sounds like something happening in your business, tell me what your specialists rebuild by hand. That's usually where the highest-value, lowest-risk AI work lives. Not the flashy stuff. The expensive, error-prone, manual calculation your best people do at 9pm.
Want to explore what AI could do for your business?
Book a free 30-minute strategy call. No pitch deck, no sales team, just a real conversation about your operations and where AI actually fits.