Master Data Record Deduplication Without Breaking Anything (Simply Explained)
A plain-language guide to master data record deduplication. No jargon, no tech speak, just what it means for your business.
By Mike Hodgen
One Person, Three Names, Three Different Answers
A window-treatment company called me with a simple question. Their COO wanted to know how much one sales rep sold last quarter, and what they paid her in commission.
They ran three reports. They got three different numbers.
The math wasn't broken. The problem was that one real person showed up as three different people across their systems. The HR system had her as "Jennifer Smith." The commission system called her "Jen Smith." The order report named her "J. Smith."
Nothing connected those three records. No shared employee number, no common ID, nothing. Each system had its own copy of her, and none of them knew the others existed.
So every time someone tried to answer a basic question, they did it by hand. Squinting at names. Guessing which "Smith" was which. A report that should take five minutes ate up most of a day, and the answer still wasn't trustworthy.
This is what messy company records actually look like. Not some abstract score on a dashboard. Just one person scattered across systems, and no single place to find the truth about her.
The COO's real question was the one I hear most: our records are a mess, can we fix this without blowing up payroll?
Yes. And the fix is more about discipline than cleverness.
Why This Happens (And Why It Quietly Costs You)
This wasn't a typo problem. It was built into how the company grew.
Each system showed up at a different time. HR came first. Sales built the commission tracker later. The order report got bolted on when they needed numbers. Nobody ever agreed on a shared ID for each person.
The real culprit was typed-in names. When the only thing identifying someone is a name a human typed, you get "Jen" in one place and "Jennifer" in another. All correct. All the same woman. None of them connect.
The hidden cost is that nothing can be matched up. Commissions get attributed to the wrong person. The wrong installer gets sent to a job. Someone burns a whole day reconciling spreadsheets, and the result is still soft.
The deeper cost is trust. Once a report burns you with wrong numbers twice, you stop believing any of them.
Most small companies solve this in someone's head. They just know "J. Smith" is Jennifer. That works until that person quits, or the data grows, or an auditor asks you to prove it.
Why I Didn't Rip Everything Out and Rebuild
When people see a mess like this, the instinct is to tear it down and start clean. Rebuild it all over a weekend and flip the switch.
That is the single riskiest thing you can do.
Every report, every nightly export, every payroll calculation depends on how the data is shaped right now. Some of those connections are written down. Most aren't. They live in a script or spreadsheet someone wrote three years ago that quietly runs payroll every two weeks.
Miss one, and you don't find out at the time. You find out when paychecks are wrong or the install crew shows up at the wrong house.
So I set one rule and never broke it: every change had to add something, never remove. Nothing that was running could break.
Instead of demolishing the old setup, I built a clean layer next to it. The old systems kept running exactly as before. Then I moved things onto the clean layer one at a time, checking each one before moving on.
The whole job became a series of small, reversible steps instead of one giant gamble.
How the Fix Actually Worked
First, I picked one system to hold the real version of each person. I chose the HR table, because it had the most complete data and a stable ID that didn't change when someone's name got typed differently.
Then I connected the commission records to that ID. Not by name anymore. By a permanent number that never changes.
Here's the safe part. I didn't delete anything. The commission system kept its own record for Jennifer. I just added a note pointing it at the real person. The old reports kept working exactly as before. But now I could finally ask "show me everything tied to this one person" and get a complete answer.
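For readers who want to see the shape of that step, here is a minimal sketch in Python using an in-memory SQLite database. It is an illustration, not the code from this project: the table names (hr_people, commissions), the column names, the ID 101, and the amounts are all made up for the example. The point it shows is that the link is a new column added alongside the old data, so a name-based report returns the same rows before and after.

```python
import sqlite3

# Hypothetical tables and values, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hr_people (person_id INTEGER PRIMARY KEY, full_name TEXT);
CREATE TABLE commissions (rep_name TEXT, amount REAL);
INSERT INTO hr_people VALUES (101, 'Jennifer Smith');
INSERT INTO commissions VALUES ('Jen Smith', 500.0), ('Jen Smith', 250.0);
""")

# An existing report that still groups by the typed-in name.
old_report_sql = "SELECT rep_name, SUM(amount) FROM commissions GROUP BY rep_name"
before = conn.execute(old_report_sql).fetchall()

# Additive change: a new nullable column. Nothing is renamed, altered, or dropped.
conn.execute("ALTER TABLE commissions ADD COLUMN person_id INTEGER")
conn.execute("UPDATE commissions SET person_id = 101 WHERE rep_name = 'Jen Smith'")

# The old report is unaffected by the new column.
after = conn.execute(old_report_sql).fetchall()

# The new question: everything tied to one person, matched by ID instead of name.
total_by_person = conn.execute(
    "SELECT h.full_name, SUM(c.amount) "
    "FROM commissions c JOIN hr_people h ON h.person_id = c.person_id "
    "GROUP BY h.person_id"
).fetchall()
```

Because the old query never mentions the new column, it cannot be affected by it. That is what makes the step reversible: removing the column puts everything back exactly where it was.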
Next, the names. The obvious move is to "fix" all three spellings to one and move on. That breaks history. Old records pointed at "Jen Smith." Past reports were built around "J. Smith." Erase those names and you orphan every record attached to them.
So instead of deleting, I built a translation list. Think of it like a list of nicknames for the same person. "Jennifer Smith," "Jen Smith," and "J. Smith" all point to one real human. Look up any spelling and you land on the same person.
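A translation list like this can be very small. The sketch below is one possible shape for it in Python, under assumptions: the ID 101, the normalize and resolve helper names, and the normalization rules (lowercase, ignore periods and extra spaces) are hypothetical choices for the example, not details from the project.

```python
from typing import Optional


def normalize(name: str) -> str:
    """Lowercase, drop periods, and collapse whitespace so spellings compare cleanly."""
    return " ".join(name.lower().replace(".", " ").split())


# Every known spelling points at one permanent ID (101 is a made-up value).
ALIASES = {
    normalize(spelling): 101
    for spelling in ("Jennifer Smith", "Jen Smith", "J. Smith")
}


def resolve(name: str) -> Optional[int]:
    """Return the permanent ID for a spelling, or None if the name is unknown."""
    return ALIASES.get(normalize(name))
```

Note what resolve does with a name it has never seen: it returns None instead of guessing. An unknown spelling gets reviewed by a person and added to the list once, which keeps a wrong match from silently landing on someone's paycheck.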
The payoff was instant. The report that combines sales with commissions, which used to miss Jennifer's commission because the names didn't line up, now found it through the nickname list and matched cleanly. The number was finally right, and I could explain exactly why.
There were two more pieces. One person could be both a sales rep and an installer, but the old setup assumed one job per person. So I let a single record hold more than one role, without touching the part that schedules installs.
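As a rough sketch of the multi-role idea, assuming a Python record with hypothetical field names: the original single job field stays exactly as it was, and a new roles field sits next to it. The legacy_is_installer function stands in for the untouched scheduling logic and is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    person_id: int
    name: str
    job: str  # the original single-role field, left exactly as it was
    roles: set = field(default_factory=set)  # new, additive: any extra roles

    def all_roles(self) -> set:
        """Everything this person does: the original job plus any added roles."""
        return {self.job} | self.roles


def legacy_is_installer(person: Person) -> bool:
    """Stand-in for the old scheduling check, which only ever reads `job`."""
    return person.job == "installer"


# Made-up example: someone scheduled as an installer who also sells.
jennifer = Person(person_id=101, name="Jennifer Smith", job="installer")
jennifer.roles.add("sales rep")
```

The old check gives the same answer it always did, because the field it reads was never changed. Only new code that asks for all_roles sees the second role.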
And commission rates changed over time. The old system only knew today's rate, so recalculating last year's pay used this year's rate and quietly corrupted the number. I fixed it so every past payout uses the rate that was actually in effect back then.
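Here is a small sketch of looking up the rate that was in effect on the day of a sale. The dates and rates are invented for the example, and the function names rate_on and commission are hypothetical. It assumes each rate applies from its start date until the next one begins.

```python
from bisect import bisect_right
from datetime import date
from decimal import Decimal

# Made-up rate history, sorted by start date: (date the rate took effect, rate).
RATE_HISTORY = [
    (date(2023, 1, 1), Decimal("0.05")),
    (date(2024, 1, 1), Decimal("0.06")),
]


def rate_on(day: date) -> Decimal:
    """Return the rate in effect on `day`, not today's rate."""
    starts = [start for start, _ in RATE_HISTORY]
    i = bisect_right(starts, day)  # number of rates that had started by `day`
    if i == 0:
        raise ValueError(f"no commission rate on record for {day}")
    return RATE_HISTORY[i - 1][1]


def commission(sale_amount: Decimal, sale_date: date) -> Decimal:
    """Commission for one sale, using the rate from the date of the sale."""
    return sale_amount * rate_on(sale_date)
```

With these example numbers, a 1,000 sale in mid-2023 pays 50 and the same sale in mid-2024 pays 60, no matter when the calculation is rerun. A sale dated before any recorded rate raises an error rather than quietly borrowing the wrong rate.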
What This Bought Them
Because every step only added and never removed, the proof was almost boring. Which is exactly what you want.
Every report and export ran the same before and after. Identical output. Nothing broke. No weekend downtime. No wrong paychecks. No crew at the wrong address.
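One simple way to check "identical output" is to take a fingerprint of each report before a change and compare it with a fingerprint taken after. The sketch below is an assumption about how that could be done, not a record of the exact check used here. The fingerprint function name and the sample rows are hypothetical.

```python
import csv
import hashlib
import io


def fingerprint(rows) -> str:
    """Hash a report's rows, in order, so two runs can be compared at a glance."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return hashlib.sha256(buffer.getvalue().encode("utf-8")).hexdigest()


# Made-up report output captured before and after a change.
report_before = [("Jen Smith", "750.00"), ("A. Jones", "120.00")]
report_after = [("Jen Smith", "750.00"), ("A. Jones", "120.00")]

unchanged = fingerprint(report_before) == fingerprint(report_after)
```

If the two fingerprints match, the report is the same down to the last character. Row order counts, so a report that comes back in a different order will show up as a difference worth looking at.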
What changed was what became newly possible. One real record per person. Reliable matching between sales and pay. Full history preserved. And the ability to answer "how much did this person sell and what were they paid" with one number you can defend.
Most companies live with this mess for years. Not because they can't see it, but because cleaning it up feels too risky to start. They assume fixing it means a rewrite, and a rewrite means betting payroll on it.
It doesn't have to be a rewrite. It can be clean layers added on top of what you already run, checked one at a time, with nothing at risk.
If your records are a tangle of duplicates and mismatched names, the first move is to look at the actual shape of the data before anyone touches it. That's where I start. Have me look at your data mess and I'll tell you what's really going on before we change a single thing.