Sensitive Data in Git: Getting PHI Out of the Repo (Simply Explained)
A plain-language guide to sensitive data in Git. No jargon, no tech speak, just what it means for your business.
By Mike Hodgen
I Built Something Useful and Almost Created a Disaster
I built a private health AI for a family member. The idea was simple. Ask plain-English questions about their medical history and get real answers based on their actual records, not random guesses from the internet.
It worked great. Fast, simple, ran on any computer. For a weekend project, it was exactly the kind of thing I love building.
Then I actually stopped and thought about it.
To make the AI work, I had stored their real medical records inside the system I used to manage my code. Think of it like a filing cabinet where programmers keep their work. The problem is that this particular filing cabinet copies itself everywhere and never throws anything away.
So I had taken someone's private medical records and quietly scattered them across every backup, every copy, every machine that ever touched the project. All because it made the build faster.
This is the trap. The fast way to build something is almost always the messy way to store data. And the messiest place is usually the easiest one.
Here is the uncomfortable question I had to ask myself, and you should too. If I did this on a careful personal project, where is your customer data sitting right now? In a spreadsheet someone saved last spring? In a file someone uploaded once and forgot about?
You probably do not know. Most businesses do not.
Why "We Deleted It" Doesn't Mean It's Gone
Here is the part almost everyone gets wrong.
The system programmers use to manage their code keeps a permanent record of everything. Imagine a notebook where every page you ever wrote stays forever, even the pages you crossed out. Deleting a file today only removes it from the newest page. Last week's version is still sitting there.
Now think about how many copies exist. Every team member who downloaded the project has the full notebook. Every backup. Every contractor who pulled it once. Every automated copy made along the way.
If that project ever becomes public by accident, the data is already out. If a contractor leaves but keeps their copy, they keep the data. You cannot un-share it.
For my health project, that is not a hypothetical. That is the exact kind of leak that medical privacy laws exist to prevent.
The same logic applies to anything sensitive. Passwords, customer lists, financial records. Once it gets saved into that system, it carries forward forever, even after you "remove" it.
So when someone tells you "we deleted it," with this kind of system, that is not the same as "it's gone." Those are two completely different statements.
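If you or someone on your team wants to see this firsthand, here is a small sketch in Python. It assumes Git is installed on the machine. It builds a throwaway project in a temporary folder, saves a fake record, deletes it, and then pulls the "deleted" record straight back out of the history. The file name and the fake record text are made up for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run one git command inside `repo` and return what it printed."""
    result = subprocess.run(
        [
            "git",
            "-c", "user.name=Demo",
            "-c", "user.email=demo@example.com",
            "-c", "commit.gpgsign=false",
            *args,
        ],
        cwd=repo,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def deleted_file_still_in_history() -> str:
    """Save a file, delete it, then read it back out of the old version."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init", "--quiet")

        # Hypothetical stand-in for a sensitive file. Never test with real data.
        (repo / "records.txt").write_text("FAKE RECORD: not real data\n")
        git(repo, "add", "records.txt")
        git(repo, "commit", "--quiet", "-m", "add records")

        # "We deleted it."
        git(repo, "rm", "--quiet", "records.txt")
        git(repo, "commit", "--quiet", "-m", "delete records")
        assert not (repo / "records.txt").exists()  # gone from the folder

        # The previous version still holds the full contents.
        return git(repo, "show", "HEAD~1:records.txt")
```

Calling `deleted_file_still_in_history()` hands back the text of the file that was supposedly deleted. Anyone with a copy of the project can do the same thing.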
How I Fixed It Without Rebuilding Everything
The good news is the fix was small. A few hours, not a rebuild.
The approach I used to make the AI fast was actually correct. I had pre-organized the medical information so the system could search it instantly, no expensive software required. The search part was fine. The problem was purely where I stored the information.
So I moved it. I took the sensitive data out of the messy filing cabinet and put it into a proper, locked database. Same information, same fast search, completely different storage.
Then I locked the door. The data can only be read by the system itself on a secure server, never by a web browser, never by the public. Think of it like moving cash from a desk drawer into a vault that only one trusted manager can open.
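For the technically curious, here is a rough sketch of the shape of that move. It uses Python's built-in sqlite3 as a stand-in for whatever database you actually use, and it is an illustration, not the project's real code. The table name, the column name, and the `HEALTH_DB_PATH` setting are all assumptions invented for this example.

```python
import os
import sqlite3


def open_records_db() -> sqlite3.Connection:
    """Open the database from a location configured on the server, not in the repo."""
    # HEALTH_DB_PATH is a hypothetical setting that exists only on the server.
    # If it is missing, this fails loudly instead of guessing a location.
    path = os.environ["HEALTH_DB_PATH"]
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records "
        "(id INTEGER PRIMARY KEY, note TEXT NOT NULL)"
    )
    return conn


def search_records(conn: sqlite3.Connection, term: str) -> list[str]:
    """Search on the server. The caller gets matching notes, never the whole table."""
    rows = conn.execute(
        "SELECT note FROM records WHERE note LIKE ? ORDER BY id",
        (f"%{term}%",),  # passed as a parameter, never pasted into the SQL text
    )
    return [note for (note,) in rows]
```

The point of the design is that the records live only where the server can reach them. The code in the repo knows how to ask for data, but it no longer contains any.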
Here is the part business owners should hear. I did not throw out the fast, cheap approach and buy some expensive enterprise system. I just moved the data behind a real lock and changed where the AI reads it from. The convenient setup stayed convenient. The data stopped being dangerous.
If your team built something fast, the fix is usually this small too. The hard part is not the work. It is noticing the problem exists.
Cleaning Up the Past Is Its Own Job
Stopping new copies is easy. Cleaning up the old ones is the part everyone wants to skip.
Remember, every past version of the medical records is still sitting in that permanent notebook. Telling the system to ignore them going forward does nothing about what is already there.
To actually erase them, you have to rewrite the entire history. Strip the records out of every old version, then make everyone with a copy throw theirs away and download a fresh, clean one. On a one-person project like mine, this was quick. On a team with contractors and backups, it is a real coordination effort.
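For whoever on your team ends up doing the rewrite, here is a minimal sketch of one common way to do it. It assumes a separate, free tool called git-filter-repo is installed, and that is my choice for the example, not necessarily what was used on the health project. The path in the usage note is hypothetical.

```python
import subprocess
from pathlib import Path


def build_purge_command(paths: list[str]) -> list[str]:
    """Build the git-filter-repo command that drops `paths` from every past version."""
    if not paths:
        raise ValueError("name at least one path to remove")
    # --invert-paths means "keep everything EXCEPT the paths listed".
    cmd = ["git", "filter-repo", "--invert-paths"]
    for path in paths:
        cmd += ["--path", path]
    return cmd


def purge_from_history(repo: Path, paths: list[str]) -> None:
    """Rewrite the history of `repo`. Run it on a fresh copy, after taking a backup."""
    subprocess.run(build_purge_command(paths), cwd=repo, check=True)
```

For example, `purge_from_history(Path("my-project"), ["data/records.json"])` would strip that one file out of every old version in that copy. It does nothing to copies sitting on other machines, which is why the coordination step still matters.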
And here is the honest limit. If a copy of the project ever lived somewhere you do not fully control, a contractor's laptop, an outside backup service, you have to assume the data is already exposed. You cannot clean up a copy you cannot reach.
That is exactly why you never want sensitive data in there to begin with. The cheapest fix is the one you do before the first mistake.
Getting It Out Is Step One, Not the Finish Line
Moving the data into a locked database only helps if the lock actually works.
I have seen a real case where patient data was sitting in a proper database but the public could still read it because the access rules were set wrong. Out of the messy filing cabinet, still exposed. The location changed but the lock didn't.
So who can open the door matters just as much as where you put the data. A database with weak access is no safer than a file anyone can grab. Sometimes worse, because it feels safe.
The right way to think about it is layers. Get the data out of the messy system. Lock who can read it. Scramble it so even a stolen backup is useless. Watch the logs. Every place data lands is a place it can leak from. Securing one and ignoring the rest just moves the weak spot.
Where Your Customer Data Is Probably Hiding Right Now
Here is why this matters to you.
Most businesses moving fast with AI have done some version of what I did without noticing. Customer records in a spreadsheet someone saved for testing. Passwords uploaded once and "removed." A full database backup sitting in a folder it shouldn't be in. Customer details typed into an AI tool that quietly logs everything.
The same convenience that makes AI fast is what scatters sensitive data into the wrong places. Almost nobody slows down to fix it.
Here is my honest take. I found this in my own careful project, built by someone who knows better. If it can happen to me, it is absolutely sitting in business systems built under deadline pressure right now. The question is not whether you have data in the wrong place. It is where, and how much.
You cannot protect data you cannot find. So the first thing I build when I come in is a real map of where your customer data actually lives. Every spreadsheet, every backup, every tool it flows through. We lock the leaks before we build anything new on top of them.
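A first pass at that map can be very simple. The sketch below walks a folder and flags files whose name or type hints at sensitive data. The lists of file types and keywords are a made-up starting point, not a standard, and the script only looks at file names. It does not open the files, so treat its output as leads to check by hand, not a full audit.

```python
from pathlib import Path

# Hypothetical starting lists. Tune them to the files your business actually uses.
RISKY_SUFFIXES = {".csv", ".xlsx", ".sql", ".bak"}
RISKY_WORDS = ("customer", "patient", "password", "backup", "export")


def find_risky_files(root: Path) -> list[Path]:
    """Return files under `root` whose name or type suggests sensitive data."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        name = path.name.lower()
        risky_type = path.suffix.lower() in RISKY_SUFFIXES
        risky_name = any(word in name for word in RISKY_WORDS)
        if risky_type or risky_name:
            hits.append(path)
    return hits
```

Pointing `find_risky_files` at a shared drive or a project folder gives you a list to start asking questions about: who saved this, who can open it, and does it still need to exist.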
If you are not sure where your data lives, that uncertainty is the work.
Want to explore what AI could do for your business?
Book a free 30-minute strategy call. No pitch deck, no sales team, just a real conversation about your operations and where AI fits.
Ready to automate your growth?
Book a free 30-minute strategy call with Hodgen.AI.