Tags: git, phi, security, data-migration, compliance

Remove Sensitive Data From Git History (The Real Way) (Simply Explained)

A plain-language guide to removing sensitive data from Git history. No jargon, just what it means for your business.

By Mike Hodgen


The "Delete" That Doesn't Actually Delete Anything

Here is a mistake I see all the time. Someone realizes a file with sensitive information got saved into their project's code. They delete the file, save the change, and breathe a sigh of relief. Problem solved.

It is not solved. Not even close.

Most software teams use a tool called Git to track every change they make to their code. Think of it like a filing cabinet that keeps a copy of every version of every document, forever. That is the whole point. It never forgets.

So when you delete a file and save that change, Git does not throw the old file away. It just adds a note that says "this file is gone now." The original file still sits in the cabinet, fully intact, ready for anyone to pull back out.

Picture crossing a name off a guest list that has already been photocopied a hundred times. You crossed it off your copy. The other ninety-nine copies still have the name. Anyone with one of those copies can read it.
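To make the filing-cabinet idea concrete, here is a deliberately simplified model in Python. It is not how Git is actually implemented, and the file name `patient_record.json` is made up for illustration. It only shows the one behavior that matters here: each save is a snapshot, and old snapshots are never edited.

```python
# A toy model of version history, NOT Git's real implementation.
# Each commit stores a full snapshot, and old snapshots are never changed.

class ToyRepo:
    def __init__(self):
        self.commits = []   # snapshots, oldest first
        self.files = {}     # current working files: name -> content

    def commit(self):
        # Save a copy of the current files as a new snapshot.
        self.commits.append(dict(self.files))

    def read_from_history(self, name):
        # Return every saved version of a file, oldest first.
        return [snap[name] for snap in self.commits if name in snap]


repo = ToyRepo()
repo.files["patient_record.json"] = "SENSITIVE"   # hypothetical file name
repo.commit()                                      # the mistake

del repo.files["patient_record.json"]
repo.commit()                                      # the "fix": delete and save

print("patient_record.json" in repo.commits[-1])       # False
print(repo.read_from_history("patient_record.json"))   # ['SENSITIVE']
```

The latest snapshot no longer has the file, but the earlier one still does, and anyone holding the history can read it.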

A Real Example: A Medical Record Buried in the Code

Let me walk you through a real situation. I built a health command-center app for a family member. At some point, a full patient medical record had gotten saved into the project's code history.

The project was private, but private does not mean safe. The medical record was sitting right there in the history, readable by anyone with access to the project, including anyone who had ever downloaded a copy.

A medical record is about as sensitive as data gets. This was not a "clean it up when I get around to it" problem. This was a fix-it-now problem.

And here is the catch. I could not just delete the file. The app actually used that file to work. The code read the medical record at runtime to do its job.

So if I had done the obvious thing and yanked the file out, two bad things would have happened. The old copies would still be sitting in the history (the exact thing I was trying to fix). And the app would break, because the code still needed that file to run.

This is the trap most quick fixes fall into. You cannot just rip the file out. You have to do it in the right order, or you make things worse.

The Right Order: Four Steps, No Shortcuts

The fix is boring and sequential. That is exactly why it works.

Step one: build a locked-down home for the data. Before moving anything, I created a secure storage spot set to "deny everyone by default." In plain terms, no web browser, no outside request, nothing could reach this data. Only the app's own behind-the-scenes server could open it, and the record was scrambled (encrypted) before it ever landed there. Even if someone broke into the database, they would get gibberish.

A wide-open storage setting is one of the most common ways sensitive data leaks without anyone noticing. So I started from "everything is locked" and opened exactly one carefully controlled door.
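For readers who want to see what "deny everyone by default" looks like in code, here is a minimal sketch. The class name `LockedStore` and the caller names are hypothetical, and real storage services express this through their own access rules rather than a Python class. Encryption is assumed to happen beforehand with a vetted library and is not shown.

```python
class LockedStore:
    """A store where every read is denied unless the caller was explicitly allowed."""

    def __init__(self):
        self._allowed = set()   # empty on purpose: nobody can read by default
        self._records = {}

    def allow(self, caller):
        # Open one door, deliberately, for one named caller.
        self._allowed.add(caller)

    def put(self, key, encrypted_bytes):
        # Assumes the value was already encrypted before it got here.
        self._records[key] = encrypted_bytes

    def get(self, caller, key):
        if caller not in self._allowed:
            raise PermissionError(f"{caller!r} may not read {key!r}")
        return self._records[key]


store = LockedStore()
store.put("record", b"<already-encrypted bytes>")
store.allow("app-server")                  # the one carefully controlled door
print(store.get("app-server", "record"))
# store.get("browser", "record") would raise PermissionError
```

The design choice is the starting point: the allow list begins empty, so forgetting to configure something results in a refusal, not a leak.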

Step two: move the data, then update the code. I wrote a one-time script that copied the full medical record into the new secure storage. Then I rewired the app so it fetched the record from that secure spot instead of from the old file.
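A one-time migration script can be very small. The sketch below is an assumption about its general shape, not the actual script: `write` and `read` stand in for whatever the secure storage provides, and the function name is invented. It copies the bytes, reads them back, and compares fingerprints so a bad copy is caught before anything gets deleted.

```python
import hashlib
from pathlib import Path


def migrate_record(source, write, read, key):
    """Copy a file into secure storage and confirm the stored copy matches.

    `write(key, data)` and `read(key)` are hypothetical stand-ins for the
    storage client. In a real setup the data would be encrypted first.
    """
    data = Path(source).read_bytes()
    write(key, data)

    # Read the copy back and compare SHA-256 fingerprints.
    original = hashlib.sha256(data).hexdigest()
    stored = hashlib.sha256(read(key)).hexdigest()
    if original != stored:
        raise RuntimeError("stored copy does not match the original; stop here")
    return len(data)
```

For a quick local trial, a plain dictionary can play the role of the store: `migrate_record(path, store.__setitem__, store.__getitem__, "record")`.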

Then I ran a check across the entire project to confirm nothing else was still reaching for the old file. It came back clean. That clean check is the proof. If even one hidden part of the code still needed that file, deleting it would break the app. Verification beats hope every time.
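The check itself can be as simple as searching every source file for the old file's name. This is a sketch under assumptions: the function name, the list of file extensions, and the example file name are all illustrative, and a real project may need to search more file types.

```python
from pathlib import Path


def find_references(project_root, needle, suffixes=(".py", ".js", ".ts", ".json")):
    """Return (file, line number, line) for every line that mentions `needle`."""
    root = Path(project_root)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        if ".git" in path.relative_to(root).parts:
            continue  # skip Git's own internal folder
        text = path.read_text(encoding="utf-8", errors="ignore")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Calling `find_references(".", "patient_record.json")` and getting back an empty list is the "clean check" described above. Anything else is a list of places that would break.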

At this point the app worked perfectly. It no longer needed the original file at all.

But here is the uncomfortable truth: the file was still buried in the history. The app did not need it anymore, but every old saved version still held the full medical record. The dangerous part was not done. I had only earned the right to do it.

That trips a lot of people up. Getting the app working without the file feels like the finish line. It is not.

Step three: scrub the file out of every version in the history. There is a tool built exactly for this. It goes back through every saved version of the project and removes the file from all of them, not just the latest one.
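The tool is not named here. One widely used option is git filter-repo, and the sketch below assumes that tool is installed. The function name and the `run` parameter are illustrative. `run` defaults to Python's `subprocess.run` and exists so the command can be inspected or tested without touching a real repository.

```python
import subprocess


def scrub_file_from_history(repo_dir, file_path, run=subprocess.run):
    """Remove `file_path` from every commit in the repository at `repo_dir`.

    Assumes git filter-repo is installed. This rewrites history, so try it
    on a backup copy of the repository first.
    """
    command = [
        "git", "filter-repo",
        "--invert-paths",       # keep everything EXCEPT the path below
        "--path", file_path,
    ]
    run(command, cwd=repo_dir, check=True)
    return command
```

git filter-repo is cautious by default and may refuse to run on a repository that does not look like a fresh copy. Its documentation explains when overriding that safeguard is appropriate.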

A quick tip that saves real pain: before doing this, set aside any unrelated work you have in progress. A history rewrite can tangle with other changes, so I put my unrelated work in a safe holding spot first, ran the scrub, then brought it back. Nothing got eaten because nothing else was exposed to the operation.
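The "safe holding spot" is not specified. A common choice is `git stash`, and the sketch below assumes that. The helper names are invented for illustration. It only stashes when there is something to stash, so it never pops an unrelated older stash by accident.

```python
import subprocess
from contextlib import contextmanager


def has_local_changes(repo_dir, run=subprocess.run):
    # `git status --porcelain` prints nothing when the working copy is clean.
    result = run(["git", "status", "--porcelain"], cwd=repo_dir,
                 check=True, capture_output=True, text=True)
    return bool(result.stdout.strip())


@contextmanager
def work_set_aside(repo_dir, run=subprocess.run):
    """Stash unrelated work, run the wrapped block, then bring the work back."""
    stashed = has_local_changes(repo_dir, run)
    if stashed:
        run(["git", "stash", "push", "--include-untracked",
             "-m", "before history scrub"], cwd=repo_dir, check=True)
    try:
        yield stashed
    finally:
        if stashed:
            run(["git", "stash", "pop"], cwd=repo_dir, check=True)
```

The history rewrite would go inside a `with work_set_aside(repo_dir):` block. Some history-rewriting tools treat an existing stash as a warning sign, so check your tool's documentation before combining the two.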

After the scrub, I verified the actual file content was gone from the history. Not hidden, not unlinked. Gone. That is the difference between a real fix and the fake "I deleted it" version we started with.
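One way to check, sketched under assumptions: `git rev-list --all --objects` lists every object reachable from any branch or tag, along with its path. If the old path still appears in that list, the scrub did not finish the job. The function names are illustrative.

```python
import subprocess


def objects_matching_path(rev_list_output, file_path):
    """Return object ids from `git rev-list --all --objects` output that match `file_path`."""
    matches = []
    for line in rev_list_output.splitlines():
        # Each line is "<object id>" or "<object id> <path>".
        object_id, _, path = line.partition(" ")
        if path == file_path:
            matches.append(object_id)
    return matches


def verify_scrubbed(repo_dir, file_path, run=subprocess.run):
    """Return True when no reachable object in the history uses `file_path`."""
    result = run(["git", "rev-list", "--all", "--objects"], cwd=repo_dir,
                 check=True, capture_output=True, text=True)
    return objects_matching_path(result.stdout, file_path) == []
```

This covers only what is reachable from branches and tags. Leftover unreachable objects are a separate matter handled by Git's own cleanup, so treat a `True` result as necessary but not as the whole proof.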

Step four: push the cleaned-up version everywhere, and treat the leak as a leak. The scrub only fixed my local copy. I still had to overwrite the shared online copy that the rest of the team used.
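Overwriting the shared copy means a forced push. The sketch below assumes the shared copy is a Git remote named `origin`, which is a common default but still an assumption. The function name is illustrative. Branches and tags are pushed in two separate commands.

```python
import subprocess


def push_rewritten_history(repo_dir, remote="origin", run=subprocess.run):
    """Force-push all branches, then all tags, to the shared remote."""
    commands = [
        ["git", "push", remote, "--force", "--all"],
        ["git", "push", remote, "--force", "--tags"],
    ]
    for command in commands:
        run(command, cwd=repo_dir, check=True)
    return commands
```

A forced push replaces what is on the server, so coordinate with everyone who uses the project first. Anyone with an older copy should start from a fresh download afterward, or their next push may bring the old history back.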

Now the honest part, because it matters more than any of the steps above.

Cleaning the history fixes your copy and the shared copy. It cannot un-ring a bell that already rang. If anyone downloaded the project before the cleanup, their copy still has the original data. If the hosting platform kept old copies cached somewhere, those might still exist too.

So the rule I live by: anything that was genuinely exposed should be treated as compromised. You clean the history, and then you assume the data leaked anyway, because you cannot prove it did not. With real patient data, that means re-securing it and disclosing it if the law requires.

This is exactly why preventing the problem beats fixing it. The cleanest history rewrite is the one you never have to do.
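Prevention can start with a small check that runs before each save and flags file names that look risky. This is a sketch, not a complete safeguard: the patterns below are examples chosen for illustration, and a name-based check will miss sensitive content hiding in an innocent-looking file.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Example patterns only; a real list should match your own project.
RISKY_PATTERNS = ("*.env", "*.pem", "*.key", "*patient*", "*medical*")


def risky_paths(staged_paths, patterns=RISKY_PATTERNS):
    """Return the paths whose file name matches any risky pattern."""
    flagged = []
    for path in staged_paths:
        name = PurePosixPath(path).name.lower()
        if any(fnmatch(name, pattern) for pattern in patterns):
            flagged.append(path)
    return flagged


print(risky_paths(["src/app.py", "data/Patient_Record.json", "config/.env"]))
# ['data/Patient_Record.json', 'config/.env']
```

The list of files about to be saved can come from `git diff --cached --name-only`, and the check can be wired into a Git pre-commit hook so it runs automatically.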

Why This Matters for Your Business

If you or your team "deleted" a sensitive file just by saving a new version, it is almost certainly still sitting in your project's history, fully retrievable by anyone with access.
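A quick way to look for this is to ask Git for every file that was ever deleted. The sketch below assumes the command `git log --all --diff-filter=D --name-only --format=`, which prints the names of deleted files across all branches. The function names are illustrative.

```python
import subprocess


def deleted_files_in_history(log_output):
    """Turn the command's raw output into a sorted list of unique file names."""
    return sorted({line.strip() for line in log_output.splitlines() if line.strip()})


def list_deleted_files(repo_dir, run=subprocess.run):
    """List every file that was deleted at some point in the project's history."""
    result = run(
        ["git", "log", "--all", "--diff-filter=D", "--name-only", "--format="],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return deleted_files_in_history(result.stdout)
```

Every name in that list is a file that looks gone but is still retrievable from the history. Most will be harmless. The ones that held data or passwords are the ones to act on.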

This is the kind of quiet landmine I find when I audit projects that were built fast. It is especially common in projects built with AI coding assistants, which happily save a sample data file or a password without anyone thinking about what that means. The app runs fine. The exposure just sits there for months.

If you have shipped fast and you are not sure what is buried in your code history, that is exactly the kind of thing I check. It is a specific, finite problem. And finding it before someone else does is the entire point.
