AI Lead Scoring Bias: Why I Split One Score Into Two (Simply Explained)

The One Number That Was Burying Good Cases

I built a system for a personal-injury law firm to help them sort incoming calls. The idea was simple. When someone calls in with a potential case, the system gives that call a score. The higher the score, the sooner a lawyer calls them back.

The first version did exactly what people ask for. One number per caller. The list sorted itself, best on top. Clean and easy to explain in a meeting.

It was also quietly hiding the firm's best cases. And I did it by accident.

Here's what went wrong. A caller comes in: a worker hurt on a job site, no insurance, no savings. But there's a clear at-fault party (a general contractor) with real money behind them. Strong case. The kind a firm wants.

The system scored him low and dropped him to position 40 on the list. Nobody calls position 40 back the same day.

Meanwhile, a smooth talker with a weak case told a tidy, confident story and floated to the top. The injured worker sank. The polished caller won.

Nobody programmed that. The math did it on its own.

Why Mixing Two Things Into One Score Backfires

The score was secretly answering two completely different questions at once.

Question one: how strong is this legal case? That's a judgment call. Is there a clear at-fault party? Is the injury documented?

Question two: how much money can the firm actually recover? That's just arithmetic. Insurance limits, coverage, damages. You add it up.

These two things have nothing to do with each other. A case can be airtight and worth almost nothing. A case can be worth a fortune and impossible to prove.

When you blend both into one number, money starts overriding merit (or the other way around) in some ratio nobody actually chose. The computer picked it for you, silently.

Then you sort the list by that one number. Now the cases a human sees first are decided by an invisible formula nobody signed off on. Callers with little money of their own get pushed toward the bottom again and again.

That's the problem with one number. You can't point to it and explain why it sorted the way it did. And if you can't explain it, you can't defend it.
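Here's a minimal sketch of that effect in Python. Everything in it is made up for illustration: the weights, the `money_signal` field, and the two sample callers are assumptions, not values from the real system.

```python
from dataclasses import dataclass


@dataclass
class Call:
    name: str
    case_strength: int  # 0-100, how strong the legal case looks
    money_signal: int   # 0-100, hypothetical stand-in for visible money


# Hypothetical weights. In a blended score, some ratio like this always
# exists, whether or not anyone chose it on purpose.
W_STRENGTH = 0.4
W_MONEY = 0.6


def blended_score(call: Call) -> float:
    """Fold merit and money into one number (the pattern to avoid)."""
    return W_STRENGTH * call.case_strength + W_MONEY * call.money_signal


calls = [
    Call("strong case, little money", case_strength=88, money_signal=20),
    Call("weak case, lots of money", case_strength=35, money_signal=80),
]

# Sorting by the blended number puts the weak case first.
ranked = sorted(calls, key=blended_score, reverse=True)
for call in ranked:
    print(call.name, round(blended_score(call), 1))
```

Change the two weights and the order flips. That is the point: the ranking depends on a ratio that appears nowhere on the reviewer's screen.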

The Fix: Two Numbers, Never One

The solution wasn't a smarter system. It was refusing to combine two things that should never have been combined in the first place.

So I split the score in two.

First number: legal strength. A score from 0 to 100 that answers one question only. How strong is this case? It looks at the facts. Is there a clear at-fault party? Is the injury documented? That's it.

It knows nothing about the caller's wallet. A strong case is strong whether the person is a CEO or a day laborer.

This is where I let the AI actually think. Judging how strong a case is takes reasoning, the kind of work you'd hand a junior attorney. AI that reads and reasons like a person is genuinely good at this.

Second number: money tier. A simple letter grade from A (highest) to D (lowest), estimating how much the firm could realistically recover.

Here's the key decision. This number is not generated by the AI. It's calculated by plain old math in the software.

Why? Because money recovery is just arithmetic. Say the insurance limit is 250,000 and documented damages are 80,000. Working out what's recoverable from those figures isn't a judgment. It's a calculation. And math belongs in code, not in an AI's opinion.

I do this for three reasons. The math gives the same answer every time, forever. I can show you the exact formula in an audit. And a smooth-talking caller can't inflate their own money tier, because the calculation doesn't care how confident they sound. It only cares about real coverage and real damages.
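A deterministic tier calculation could look something like the sketch below. The tier cutoffs are hypothetical, and so is the rule that the recoverable estimate is documented damages capped at the policy limit. The real formula and thresholds aren't given here, so treat both as placeholders.

```python
# Hypothetical tier floors, highest first. Not the firm's real cutoffs.
TIER_FLOORS = [("A", 500_000), ("B", 100_000), ("C", 25_000)]


def recoverable_estimate(policy_limit: int, documented_damages: int) -> int:
    """Assumed rule: recovery is bounded by the smaller of the two figures."""
    return min(policy_limit, documented_damages)


def money_tier(policy_limit: int, documented_damages: int) -> str:
    """Map the estimate to a letter grade using fixed thresholds."""
    amount = recoverable_estimate(policy_limit, documented_damages)
    for tier, floor in TIER_FLOORS:
        if amount >= floor:
            return tier
    return "D"


# The figures from the example above: limit 250,000, damages 80,000.
print(money_tier(250_000, 80_000))
```

Nothing in this function reads the call transcript, so how the caller sounds has no path into the result. Same inputs, same tier, every run.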

The two numbers sit side by side. Legal strength and money tier. I never multiply them. I never average them. The moment you blend them, you've rebuilt the bias machine.

A reviewer sees "case strength 88, money tier C" and instantly understands: strong case, modest payout. They decide what to do with it. The money tier informs the human. It does not bury the case.
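One way to make "never blend them" hard to violate is to store the two values as separate fields and offer no method that combines them. This is a sketch with hypothetical names (`CaseAssessment`, `label`), not the production data model.

```python
from dataclasses import dataclass

VALID_TIERS = {"A", "B", "C", "D"}


@dataclass(frozen=True)
class CaseAssessment:
    legal_strength: int  # 0-100, judged from the facts of the case
    money_tier: str      # "A" to "D", computed by plain arithmetic

    def __post_init__(self) -> None:
        # Reject out-of-range values early so bad data never reaches a reviewer.
        if not 0 <= self.legal_strength <= 100:
            raise ValueError("legal_strength must be between 0 and 100")
        if self.money_tier not in VALID_TIERS:
            raise ValueError("money_tier must be one of A, B, C, D")

    def label(self) -> str:
        """Show both numbers side by side. Deliberately no combined score."""
        return f"case strength {self.legal_strength}, money tier {self.money_tier}"


print(CaseAssessment(legal_strength=88, money_tier="C").label())
```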

Stopping the Smooth Talker From Gaming It

Remember the polished caller with the weak case? Here's how I shut that down.

The case-strength score gets knocked down when facts are missing. A beautifully told story with no medical records, no proof, no documentation cannot score 90. The missing facts cap it.

Confidence is the easiest thing in the world to fake. A nervous caller with a broken arm and a clear at-fault driver has a better case than a smooth talker with no records. The score now reflects what's actually known, not how well it was told.

Confidence can't buy a high score. Only facts can.
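The capping idea can be sketched in a few lines. The fact checklist and the cap values below are assumptions chosen for illustration. The only part taken from the description above is the shape of the rule: missing facts put a ceiling on the score, no matter how good the story sounds.

```python
# Hypothetical checklist of facts the score depends on.
REQUIRED_FACTS = (
    "at_fault_party_identified",
    "injury_documented",
    "medical_records",
)

# Hypothetical ceilings, keyed by how many required facts are missing.
CAP_BY_MISSING = {0: 100, 1: 75, 2: 50, 3: 30}


def capped_strength(raw_strength: int, facts: dict[str, bool]) -> int:
    """Limit the raw case-strength score by the number of missing facts."""
    missing = sum(1 for name in REQUIRED_FACTS if not facts.get(name, False))
    return min(raw_strength, CAP_BY_MISSING[missing])


# A polished story with nothing behind it.
polished = capped_strength(95, {})

# A nervous caller with every fact on the checklist.
nervous = capped_strength(
    70,
    {
        "at_fault_party_identified": True,
        "injury_documented": True,
        "medical_records": True,
    },
)

print(polished, nervous)
```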

I also did something I care about a lot. Before the AI ever reads a call, I strip out anything about the caller's accent, immigration status, or where they're from. The AI scores the case on facts only, never on who the person sounds like.

I built this because the original one-number version was doing something ugly without being told to. It was rewarding fluent English. It was quietly punishing the exact people most likely to get hurt on a job site: immigrant workers and non-native speakers, the people this kind of firm often represents.

Sit with that. A tool built to help a firm that fights discrimination was repeating that discrimination at the front door. Not because anyone wanted it to. Because the raw call notes carried those signals and the AI picked them up.

I won't pretend the fix is perfect. Subtle hints can still slip through, and I keep tightening it. But scoring on cleaned-up notes beats scoring on raw notes every time.
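As a rough sketch of the cleaning step, assume the intake notes arrive as a structured record with named fields. That structure and the field names are hypothetical. The simplest version just drops the fields the scorer should never see.

```python
# Hypothetical field names for attributes the scorer must not see.
EXCLUDED_FIELDS = {"accent", "immigration_status", "place_of_origin"}


def clean_notes(intake: dict) -> dict:
    """Return a copy of the intake record without the excluded fields."""
    return {key: value for key, value in intake.items() if key not in EXCLUDED_FIELDS}


raw = {
    "incident": "fall from scaffolding on a job site",
    "at_fault_party": "general contractor",
    "injury_documented": True,
    "accent": "noted by intake staff",
    "immigration_status": "noted by intake staff",
}

cleaned = clean_notes(raw)
print(sorted(cleaned))
```

This only removes labeled fields. Hints buried in free-text notes pass straight through a filter like this, which is exactly the kind of leak that still needs tightening.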

The System Never Rejects Anyone

This is the guardrail that matters more than any score.

The system sorts. It never throws anyone out. A low score or a D tier does not drop a caller from the list. It just sends that caller to a human to review.

The AI decides what to look at first. It never decides what to throw away. No case quietly disappears because a machine didn't like the numbers.
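In code, "sort but never reject" comes down to one property: the queue that comes out has the same callers as the list that went in. The sketch below assumes the queue is ordered by legal strength alone, with the tier carried along for display. That ordering rule and the names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class QueuedCall:
    caller_id: str
    legal_strength: int  # 0-100
    money_tier: str      # "A" to "D", shown to the reviewer, not used to drop anyone


def build_review_queue(calls: list[QueuedCall]) -> list[QueuedCall]:
    """Order callers for human review. No filtering step exists on purpose."""
    return sorted(calls, key=lambda c: c.legal_strength, reverse=True)


incoming = [
    QueuedCall("caller-1", legal_strength=22, money_tier="D"),
    QueuedCall("caller-2", legal_strength=88, money_tier="C"),
    QueuedCall("caller-3", legal_strength=54, money_tier="A"),
]

queue = build_review_queue(incoming)

# Every caller is still in the queue, including the low-score, D-tier one.
assert len(queue) == len(incoming)
for call in queue:
    print(call.caller_id, call.legal_strength, call.money_tier)
```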

Every caller carries two clear numbers a person can question and override. A reviewer can ask "why is this an 88?" and get a real answer. They can ask "why is this a C tier?" and trace the math.

That worker who used to land at position 40 now shows up as strength 88, tier C. A human sees that and knows exactly what they're looking at: a real case worth chasing for reasons the dollar figure alone would have hidden.

The lesson here is bigger than law firms. Any time you let AI fold two unrelated things into one score and sort by it, you've built a quiet bias machine. Loans, job candidates, support tickets: the domain doesn't matter. The math will discriminate for you on things nobody chose.

This is the work I actually do. Not "add a score." Pull the score apart, find where it's quietly hurting someone, and rebuild it so a human can stand behind it. That's the difference between AI that looks good in a demo and AI that survives a real audit.
