
AI Medical Record Parsing: CCDA XML to Health Data

Same medication, three different names across three hospitals. I built AI that normalizes messy medical records into one unified health picture.

By Mike Hodgen


My grandmother had seven specialists. Cardiologist, kidney doctor, diabetes doctor, lung doctor, primary care, and two more I'm probably forgetting. Each one had fragments of her medical history, but nobody — not a single provider — had the complete picture.

That's the problem I set out to solve when I built a health monitoring system for a family member with a similarly complex medical situation. AI medical record parsing wasn't something I went looking for as a project. It was the only way to get a unified view of someone's health when the healthcare system refused to provide one.

The raw materials were all there. Patient portals let you download your records. Specialists will send documents if you ask nicely. But "having access" and "being able to use it" are two very different things. What I had was a pile of digital medical files in formats that don't play well together, 30+ documents from various providers, and a growing sense that no human being should have to manually piece all of this together.

Medical Records Are a Mess — By Design

When you download your records from a patient portal, you get files in a format called CCDA. Think of it as the official language hospitals use to store your medical history digitally. It contains your medications, allergies, lab results, conditions, procedures — everything.

In theory, every hospital uses the same format. In practice, every hospital speaks a slightly different dialect.

Here's what I mean. The same diabetes medication showed up three different ways across three different hospitals:

  • "Metformin HCl 500mg" from one system
  • "METFORMIN HYDROCHLORIDE 500 MG ORAL TABLET" from another
  • "metformin 500mg tabs" from a third

Same drug. Same dose. Three different names. Now multiply that by every medication, every lab test, every diagnosis, across every provider. Dates are written differently. Lab results use different units. Some hospitals give you precise numbers. Others just write "normal" with no data behind it.

[Figure: comparison table showing how three hospital systems encode the same medication name, date format, and lab result differently in CCDA XML, with a unified normalized row at the bottom.]
The same patient data, encoded three different ways by three different hospitals.

You can't build simple "find and replace" rules for this kind of mess, because the mess changes depending on which hospital generated the file. You need something smarter.
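To make that concrete, here is a minimal sketch of the kind of rule that looks like it should work and doesn't. The `naive_key` function is a hypothetical helper written for this illustration, not code from the project. It lowercases each name and strips everything except letters and digits, which is about as far as a blanket cleanup rule can go.

```python
import re

def naive_key(name: str) -> str:
    """Lowercase the name and drop every character that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

names = [
    "Metformin HCl 500mg",
    "METFORMIN HYDROCHLORIDE 500 MG ORAL TABLET",
    "metformin 500mg tabs",
]

keys = {naive_key(n) for n in names}
for key in sorted(keys):
    print(key)
print(len(keys), "distinct keys for one drug")
```

The three names still produce three different keys, because "HCl", "HYDROCHLORIDE", "ORAL TABLET", and "tabs" are differences of vocabulary, not of spacing or capitalization. You could keep adding special cases, but each new hospital brings new ones.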

Building a Smart Assistant That Reads Medical Records

I built an assembly line for medical information. It works in two passes.

First pass: A program I wrote in Python (a common programming language) reads through each medical file and sorts the raw data into categories — medications in one pile, allergies in another, lab results in a third. Think of it like dumping a box of unsorted mail onto a table and making stacks by type.
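A rough sketch of what that sorting step might look like, using only Python's standard library. This is an illustration under assumptions, not the actual program: it assumes each section of the file is labeled with one of the LOINC section codes that CCDA documents commonly use, and the function name `sort_sections` and the tiny `SAMPLE` document are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# CCDA documents use the HL7 v3 XML namespace.
NS = {"hl7": "urn:hl7-org:v3"}

# LOINC codes commonly used to label CCDA sections.
SECTION_CODES = {
    "10160-0": "medications",
    "48765-2": "allergies",
    "30954-2": "lab_results",
    "11450-4": "conditions",
    "47519-4": "procedures",
}

def sort_sections(ccda_xml: str) -> dict:
    """Return {category: [entry elements]} for the sections we recognize."""
    root = ET.fromstring(ccda_xml)
    piles = defaultdict(list)
    for section in root.iterfind(".//hl7:section", NS):
        code_el = section.find("hl7:code", NS)
        if code_el is None:
            continue  # unlabeled section: skip it rather than guess
        category = SECTION_CODES.get(code_el.get("code"))
        if category is not None:
            piles[category].extend(section.findall("hl7:entry", NS))
    return dict(piles)

# A deliberately tiny, made-up document with two sections.
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component><structuredBody>
    <component><section>
      <code code="10160-0" codeSystem="2.16.840.1.113883.6.1"/>
      <entry><substanceAdministration/></entry>
      <entry><substanceAdministration/></entry>
    </section></component>
    <component><section>
      <code code="48765-2" codeSystem="2.16.840.1.113883.6.1"/>
      <entry><act/></entry>
    </section></component>
  </structuredBody></component>
</ClinicalDocument>"""

piles = sort_sections(SAMPLE)
print({category: len(entries) for category, entries in piles.items()})
```

Real files are far messier than this sample. Some hospitals omit section codes or nest sections inside each other, so a production version would need fallbacks that this sketch leaves out.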

Second pass: This is where AI medical record parsing actually happens. I use an AI that reads and writes like a human (specifically, Claude by Anthropic) to look at those sorted piles and recognize that "Metformin HCl 500mg" and "METFORMIN HYDROCHLORIDE 500 MG ORAL TABLET" are the same thing. It merges duplicates, standardizes names, and flags conflicts.
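Here is one way the merge step could be structured, as a sketch rather than the real implementation. It assumes each medication entry is a dictionary with a `name` and an optional `rxnorm` drug code (the flowchart for this step mentions RxNorm code matching). Entries with a code are grouped exactly, and the rest are handed to a function that decides which group they belong to. In the real system that decision would come from the AI model. Here `fake_model` is a crude stand-in so the example runs without any API, and the `EXAMPLE-1` code is a placeholder, not a real RxNorm identifier.

```python
from collections import defaultdict
from typing import Callable, Optional

def normalize_medications(
    entries: list,
    ask_model: Callable[[str, dict], Optional[str]],
) -> dict:
    """Group entries that refer to the same medication."""
    groups = defaultdict(list)
    leftovers = []

    # Entries that carry a drug code can be matched exactly.
    for entry in entries:
        if entry.get("rxnorm"):
            groups[entry["rxnorm"]].append(entry)
        else:
            leftovers.append(entry)

    # Everything else goes to the model, along with one example name per group.
    for entry in leftovers:
        candidates = {key: members[0]["name"] for key, members in groups.items()}
        match = ask_model(entry["name"], candidates)
        # No confident match: keep the entry on its own so a person can review it.
        key = match if match in groups else f"unmatched:{entry['name']}"
        groups[key].append(entry)
    return dict(groups)

def fake_model(name: str, candidates: dict) -> Optional[str]:
    """Stand-in for an LLM call: compares only the first word and ignores dose."""
    first_word = name.split()[0].lower()
    for key, label in candidates.items():
        if label.split()[0].lower() == first_word:
            return key
    return None

records = [
    {"name": "Metformin HCl 500mg", "rxnorm": "EXAMPLE-1", "source": "hospital_a.xml"},
    {"name": "METFORMIN HYDROCHLORIDE 500 MG ORAL TABLET", "rxnorm": "EXAMPLE-1", "source": "hospital_b.xml"},
    {"name": "metformin 500mg tabs", "rxnorm": None, "source": "hospital_c.xml"},
]

merged = normalize_medications(records, fake_model)
for key, members in merged.items():
    print(key, [m["source"] for m in members])
```

Passing the model in as a function keeps the exact-match logic testable on its own, and keeping the `source` on every entry means a merged medication can still be traced back to the file it came from.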

[Figure: flowchart of the two-pass medication normalization process. The first pass uses RxNorm code matching for 60% of entries; the second pass uses Claude for the remaining 40% of edge cases that require clinical context, producing a single canonical medication entry.]
Simple matching catches about 60% of duplicates. AI handles the tricky 40%, which is where the critical information usually lives.

On top of the structured files, I also had 30 PDF documents: specialist visit notes, imaging reports, and discharge summaries. These required a different approach. I broke them into searchable chunks, preserved the important context (you can't split a set of kidney labs across two chunks and expect useful answers), and stored them as embeddings so the AI can search by meaning rather than by exact wording.
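The "don't split the kidney labs" rule can be sketched in a few lines. This is a simplified illustration, assuming the text has already been extracted from the PDF and that related lines (like a lab panel) sit together in one block separated from the rest by blank lines. The function name `chunk_document`, the size limit, and the sample note are all made up for the example.

```python
def chunk_document(text: str, max_chars: int = 1200) -> list:
    """Pack blank-line-separated blocks into chunks without ever cutting a block."""
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    chunks = []
    current = ""
    for block in blocks:
        candidate = f"{current}\n\n{block}" if current else block
        if current and len(candidate) > max_chars:
            # Adding this block would overflow, so close the chunk and start a new one.
            chunks.append(current)
            current = block
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

# Made-up note text; the values are illustrative only.
note = (
    "Visit summary: patient stable.\n\n"
    "Renal panel\nCreatinine 1.4 mg/dL\neGFR 52\nBUN 28 mg/dL\n\n"
    "Plan: recheck labs in 3 months."
)

chunks = chunk_document(note, max_chars=60)
for i, chunk in enumerate(chunks, start=1):
    print(f"--- chunk {i} ---")
    print(chunk)
```

A block that is longer than the limit is kept whole as its own oversized chunk. That trades uniform chunk size for intact context, which seems like the right trade for lab panels.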

The result: two knowledge bases working together. One with clean, organized medical data. One with searchable clinical notes. Together, they cover the full picture.

Asking Questions and Getting Real Answers

With both knowledge bases connected, the system can actually answer useful questions:

"What's the kidney function trend over the last 18 months?" It pulls the lab values, puts them in order by date, spots the trend, and then finds what the kidney doctor actually said about those results during visits.

"Are any current medications a problem given the allergy list?" It cross-checks every medication against every documented allergy — not just exact matches, but related drug families that could cause reactions.

"What did the heart doctor recommend about blood pressure medication at the last visit?" It searches through the visit notes, filtered by doctor and date, and pulls the relevant recommendation.

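Once the data is normalized, the first two questions reduce to fairly small steps. The sketch below is illustrative, not the system's actual code. `lab_trend` sorts results by date and fits a straight line through them, and `find_allergy_conflicts` checks drug families as well as exact names. The `DRUG_FAMILIES` table is a tiny hand-written sample (a real system would need a proper drug reference), the lab values are invented, and both functions assume names have already been reduced to plain generic names.

```python
from datetime import date, timedelta

def lab_trend(results: list) -> dict:
    """Order (date, value) pairs by date and fit a line. Expects at least one result."""
    ordered = sorted(results)
    start = ordered[0][0]
    xs = [(d - start).days for d, _ in ordered]
    ys = [v for _, v in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    per_year = (sxy / sxx) * 365.25 if sxx else 0.0
    return {"first": ys[0], "latest": ys[-1], "change_per_year": per_year}

# Small illustrative sample, not a complete drug reference.
DRUG_FAMILIES = {
    "penicillin": "penicillins",
    "amoxicillin": "penicillins",
    "ampicillin": "penicillins",
    "cephalexin": "cephalosporins",
    "metformin": "biguanides",
}

def find_allergy_conflicts(medications: list, allergies: list) -> list:
    """Return (medication, allergy, reason) for exact and same-family matches."""
    conflicts = []
    for med in medications:
        med_family = DRUG_FAMILIES.get(med.lower())
        for allergy in allergies:
            if med.lower() == allergy.lower():
                conflicts.append((med, allergy, "exact match"))
            elif med_family and med_family == DRUG_FAMILIES.get(allergy.lower()):
                conflicts.append((med, allergy, f"same family: {med_family}"))
    return conflicts

# Made-up eGFR readings, deliberately listed out of date order.
day0 = date(2024, 1, 1)
labs = [
    (day0 + timedelta(days=200), 50.0),
    (day0, 60.0),
    (day0 + timedelta(days=100), 55.0),
]
trend = lab_trend(labs)
conflicts = find_allergy_conflicts(["Amoxicillin", "Metformin"], ["Penicillin"])
print(trend)
print(conflicts)
```

The arithmetic is the easy part. The hard part is everything upstream that gets the values and names into a shape where this kind of check is possible.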
[Figure: architecture diagram of the complete pipeline, from CCDA XML files and clinical PDFs through Python extraction, Claude normalization, Voyage AI embeddings, and a dual-retrieval RAG pipeline to queryable clinical answers with source attribution.]
The complete assembly line: messy files go in, searchable medical history comes out.

I have to be honest about limitations. The system provides information, not medical advice. It's wrong sometimes, particularly with vague abbreviations — "CP" could mean chest pain, cerebral palsy, or a dozen other things. Scanned handwritten notes are unreliable.

That's why every answer includes a source reference — which document, which section, which date. A family member or caregiver can verify anything the system surfaces. When you're dealing with medical data, trust requires traceability.
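One simple way to make traceability hard to forget is to attach the source to every piece of text at the moment it is stored, so an answer can never be separated from where it came from. The sketch below is an assumption about how that could look, with made-up names (`NoteChunk`, `latest_visit_chunks`) and made-up notes. It uses plain keyword matching as a stand-in for the meaning-based search described earlier, and it filters by specialty and most recent visit date.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class NoteChunk:
    text: str
    specialty: str
    visit_date: date
    document: str  # which file the text came from
    section: str   # which part of that file

def latest_visit_chunks(chunks: list, specialty: str, keywords: list) -> list:
    """Return keyword-matching chunks from the most recent visit with one specialty."""
    relevant = [c for c in chunks if c.specialty == specialty]
    if not relevant:
        return []
    last_visit = max(c.visit_date for c in relevant)
    return [
        c for c in relevant
        if c.visit_date == last_visit
        and any(k.lower() in c.text.lower() for k in keywords)
    ]

# Made-up notes for illustration.
notes = [
    NoteChunk("Continue lisinopril 10 mg daily for blood pressure.",
              "cardiology", date(2024, 3, 5), "cardiology_visit_2024-03-05.pdf", "Plan"),
    NoteChunk("Echo scheduled for next month.",
              "cardiology", date(2024, 3, 5), "cardiology_visit_2024-03-05.pdf", "Plan"),
    NoteChunk("Blood pressure well controlled; no medication changes.",
              "cardiology", date(2023, 11, 2), "cardiology_visit_2023-11-02.pdf", "Assessment"),
    NoteChunk("Blood pressure reviewed; adjust diuretic.",
              "nephrology", date(2024, 4, 1), "nephrology_visit_2024-04-01.pdf", "Plan"),
]

hits = latest_visit_chunks(notes, "cardiology", ["blood pressure"])
for h in hits:
    print(f"{h.text} [source: {h.document}, {h.section}, {h.visit_date}]")
```

Because the document, section, and date travel with the text, whoever reads the answer can open the original file and check it.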

Why This Matters Beyond Healthcare

Healthcare has a data problem that makes every other industry I've worked in look organized. Your heart doctor doesn't know what your kidney doctor prescribed last week. Your primary care doctor gets a one-paragraph summary instead of the full specialist workup. The promise of systems that "talk to each other" has been "five years away" for twenty years.

[Figure: infographic contrasting fragmented patient health data, spread across seven specialists in formats like CCDA XML, PDFs, and faxed notes, with a unified queryable medical history built through AI medical record parsing and a RAG pipeline.]
Seven specialists, seven formats, zero coordination, until AI stitches it together.

AI doesn't fix the healthcare system. But it makes it possible to build what the system won't: a unified, searchable view of one patient's complete history.

Here's the thing — this same pattern shows up everywhere. Messy data in, organized knowledge out. I've built 15+ AI systems across product creation, SEO, pricing, and customer service. My DTC fashion brand runs on 22,000+ lines of custom code handling exactly this kind of data cleanup — just for product catalogs and pricing instead of medical records. The shape of the data changes. The core challenge is identical.

Legal teams drowning in thousands of contracts. Financial firms reconciling decades of regulatory filings. Manufacturers getting specs from 50 suppliers in 50 formats. If your business has valuable information trapped in formats that don't cooperate, this approach works.

Thinking About AI for Your Business?

If this resonated — especially the part about valuable data stuck in formats that resist being useful — I'd like to hear about it. I do free 30-minute discovery calls where we look at your operations and identify where AI could actually move the needle for your specific situation. No slides. No pitch deck. Just an honest conversation about what's possible.

Book a Discovery Call
