AI Security Audit: 19 Lenses, 58 Codebases, One Day

The Problem With Building Fast: Security Debt Hides Between Projects

I run dozens of revenue systems across 58 codebases. A DTC fashion brand. A pricing engine touching 564 products. A content pipeline managing 313 blog articles. Internal tools, client builds, experiments that turned into production systems because they worked.

Every one of them shipped fast. That was the point. When you build at speed, you ship the thing that makes money and you move on.

But here's what nobody tells you about building this way: none of those 58 codebases had ever had a single unified security read. Not one. Each repo got attention when it was being built, sure. Then it went live and I stopped looking at it the way an attacker would.

The danger was never any one repo. The danger was that nobody, including me, knew the true blast radius across the whole estate. A leaked key in one project. An open database in another. A missing access control somewhere I'd half-forgotten existed. Findings were scattered across different tools, different mental notes, different moments of "I should fix that later."

That's the trap. When you build fast, security debt doesn't collect inside any single project. It collects in the gaps between them. Each repo looks fine in isolation. The risk lives in the aggregate, in the parts no single audit ever scoped.

So I had a problem most growing companies have and never name: I had no idea what my real security exposure was across everything I'd built.

The question was how to run an AI security audit across 58 repos without spending months and a fortune doing it. A traditional approach would have cost me a year and six figures. That math doesn't work for one person, and it doesn't work for most small businesses either.

So I built something else.

Why a Traditional Security Audit Doesn't Fit a Multi-System Operation

Let me be fair to the traditional security world before I tear into it. Deep manual penetration testing has real value. If you have one flagship product handling sensitive transactions, you want a skilled human trying to break it. That work is irreplaceable.

[Figure: Traditional pen-test cost math vs. AI audit economics]

But it doesn't fit a multi-system operation. Here's the math.

The pen-test firm math

A traditional security firm quotes you in weeks, and they scope one application or one repo per engagement. Call it two to four weeks per app, somewhere between $15,000 and $40,000 depending on depth.

Now multiply that by 58 codebases.

That's not a project. That's a year of work and well over six figures. No solo operator pays that. No $5M business pays that. And by the time you finished auditing repo 58, repos 1 through 30 would have shipped new code and gone stale.
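To make the multiplication concrete, here is a small sketch that uses only the per-engagement ranges quoted above. It assumes engagements run strictly back to back, which is the worst case for calendar time; running several in parallel shortens the calendar but not the bill. The function and constant names are illustrative.

```python
# Figures from the text: 58 codebases, 2-4 weeks and $15,000-$40,000 per engagement.
REPOS = 58
WEEKS_PER_APP = (2, 4)
COST_PER_APP = (15_000, 40_000)


def estate_totals(repos, per_app_range):
    """Multiply a (low, high) per-app range across every repo."""
    low, high = per_app_range
    return repos * low, repos * high


# Assumption: engagements run one after another, never in parallel.
weeks_low, weeks_high = estate_totals(REPOS, WEEKS_PER_APP)
cost_low, cost_high = estate_totals(REPOS, COST_PER_APP)

print(f"Back-to-back engagement time: {weeks_low}-{weeks_high} weeks")
print(f"Total cost: ${cost_low:,}-${cost_high:,}")
```

Even the low end of both ranges is far outside what a solo operator or a small business can absorb.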

A traditional audit produces a snapshot. You pay a fortune for a photo of one moment, and the moment passes the second you deploy again.

The single-repo blind spot

Here's the deeper issue. Pen-test firms scope one repo because that's how the work is sold. But the risk in a multi-system operation isn't inside one repo. It's the pattern across all of them.

If the same authentication mistake exists in twelve projects, a single-repo audit catches it once and misses eleven. You fix the flagship and leave the rest exposed.

Most owners assume security audits are slow, expensive, and only worth it for big companies. That assumption is exactly what leaves small operations wide open. You skip the audit because the only version you know about costs too much, and you tell yourself you're too small to be a target.

You're not too small. You're just unaudited. There's a difference.

The 19-Lens Audit Spec: What I Actually Check

The core of this is a documented, reusable spec. Nineteen lenses, applied consistently to every repo. This isn't a one-off. It's a methodology I wrote down so I could run it again next month for the cost of compute.

The lenses

The 19 lenses group into categories that cover the gaps where real damage happens:

[Figure: The 19-lens audit spec organized by category (critical exposure, attack surface, resilience and visibility)]

Secrets management: API keys, tokens, credentials committed to code or sitting in plain config
Authentication and identity: how users prove who they are, where that breaks
Access control and row-level security: whether one user can read another user's data
Dependency vulnerabilities: known holes in the packages every project pulls in
Data encryption at rest: whether sensitive data is encrypted in the database or sitting in plaintext
Observability and logging: whether you'd even know an attack happened
Backup and recovery: whether a revenue-critical system can be restored if it goes down
Compliance exposure: PHI and PII handling, the stuff that turns a breach into a legal event
Input validation: whether user input can be weaponized
Rate limiting: whether your endpoints can be hammered or scraped

And more across configuration, error handling, and data exposure. Every lens is a specific, checkable question, not a vibe.
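One way to keep each lens "a specific, checkable question" is to store the spec as data rather than prose. The sketch below is an assumption about how that could look, not the actual spec file: the `Lens` class and the short keys are hypothetical, and only the ten lenses named above are included because the remaining nine are not spelled out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lens:
    key: str       # short identifier (hypothetical naming)
    question: str  # the specific, checkable question an agent answers


# The ten lenses named in the text, phrased as questions.
LENSES = [
    Lens("secrets", "Are API keys, tokens, or credentials committed to code or plain config?"),
    Lens("authentication", "How do users prove who they are, and where does that break?"),
    Lens("access_control", "Can one user read another user's data?"),
    Lens("dependencies", "Do the packages this project pulls in have known holes?"),
    Lens("encryption_at_rest", "Is sensitive data encrypted in the database or stored in plaintext?"),
    Lens("observability", "Would anyone know an attack happened?"),
    Lens("backup_recovery", "Can a revenue-critical system be restored if it goes down?"),
    Lens("compliance", "Is PHI or PII handled in a way that creates legal exposure?"),
    Lens("input_validation", "Can user input be weaponized?"),
    Lens("rate_limiting", "Can endpoints be hammered or scraped?"),
]

# Duplicate keys would make findings ambiguous, so check for them up front.
keys = [lens.key for lens in LENSES]
assert len(keys) == len(set(keys)), "lens keys must be unique"

for lens in LENSES:
    print(f"{lens.key}: {lens.question}")
```

Because the spec is plain data, the same list can drive agent prompts, tiering rules, and report filters without drifting apart.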

The whole thing is read-only by design. No destructive testing. The agents read code; they don't run exploits against live systems. For an estate of revenue systems, you don't want an audit that might knock something over.

Per-repo tiering

Not every repo earns the full 19-lens treatment. That would be wasteful.

[Figure: Per-repo tiering: deep audit vs. light pass]

So the spec includes tiering. Live, regulated, or revenue-critical systems get the complete 19-lens audit. Dormant repos, experiments, and dead-end prototypes get a lighter pass focused on the highest-severity lenses, secrets and exposed data.

This is what makes a full-estate sweep affordable. You spend the deep effort where the blast radius is real and skim where it isn't.
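The tiering rule described above is simple enough to sketch in a few lines. Everything here is illustrative: the `Repo` fields, the lens keys, and the choice of `secrets` and `data_exposure` as the light-pass lenses are assumptions drawn from the description, and `ALL_LENSES` is a partial stand-in for the full list of 19.

```python
from dataclasses import dataclass

# Partial, hypothetical lens keys; the real spec has 19.
ALL_LENSES = [
    "secrets", "authentication", "access_control", "dependencies",
    "encryption_at_rest", "observability", "backup_recovery",
    "compliance", "input_validation", "rate_limiting", "data_exposure",
]

# Light pass covers only the highest-severity lenses: secrets and exposed data.
LIGHT_PASS_LENSES = {"secrets", "data_exposure"}


@dataclass
class Repo:
    name: str
    live: bool = False
    regulated: bool = False
    revenue_critical: bool = False


def tier_for(repo):
    """Live, regulated, or revenue-critical repos get the full audit."""
    if repo.live or repo.regulated or repo.revenue_critical:
        return "full"
    return "light"


def lenses_for(repo, all_lenses=ALL_LENSES):
    """Return the lenses to run for a repo, preserving spec order."""
    if tier_for(repo) == "full":
        return list(all_lenses)
    return [lens for lens in all_lenses if lens in LIGHT_PASS_LENSES]


pricing = Repo("pricing-engine", live=True, revenue_critical=True)
prototype = Repo("old-prototype")

print(pricing.name, tier_for(pricing), len(lenses_for(pricing)))
print(prototype.name, tier_for(prototype), lenses_for(prototype))
```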

What surprised me least was how often the same five security holes showed up. The same patterns recur across projects built months apart. The spec exists because those patterns are predictable, and predictable means checkable at scale.

How 400+ AI Agents Read Every Repo in Parallel

A human auditor reads code sequentially. One file, then the next. Across 58 codebases that's where the months go.

I don't read sequentially. I read in parallel.

Parallel reads, not sequential

More than 400 AI agents ran across the estate, each assigned to specific repos and specific lenses. In a single day they read 13.3 million tokens of code. That's the entire estate, every lens each repo's tier called for, in one day instead of a year.
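The fan-out itself can be sketched with Python's standard thread pool. This is a minimal illustration of assigning one task per repo-and-lens pair, not the orchestration that actually ran: `run_agent` is a hypothetical stub standing in for a read-only agent call, and the repo and lens names are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def run_agent(repo, lens):
    """Hypothetical stub for one read-only agent; returns raw findings."""
    # A real agent would read the repo's code through the given lens.
    return [{"repo": repo, "lens": lens, "description": "stub finding"}]


def audit_estate(repos, lenses, max_workers=8):
    """Run one agent task per (repo, lens) pair and collect all findings."""
    tasks = list(product(repos, lenses))
    findings = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in task order, which keeps output stable.
        for result in pool.map(lambda task: run_agent(*task), tasks):
            findings.extend(result)
    return findings


findings = audit_estate(
    repos=["repo-a", "repo-b", "repo-c"],
    lenses=["secrets", "access_control"],
)
print(len(findings), "raw findings from", 3 * 2, "agent tasks")
```

Threads suit this shape because each task mostly waits on I/O (a model call or file reads) rather than burning CPU.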

Each agent writes to a structured findings schema. Severity, lens, repo, file, line, description, recommended fix. Consistent fields, every time. That matters more than it sounds. The output isn't a wall of prose you have to wade through. It's a sortable dataset. I can filter to every critical finding across all 58 repos in one query.

That structure is the difference between a report you read once and a tool you actually use.
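As a rough sketch, the schema can be expressed as a small dataclass with exactly the fields listed above. The severity levels beyond critical and high, the sample findings, and the helper name are assumptions for illustration.

```python
from dataclasses import dataclass

# Only "critical" and "high" appear in the text; the other levels are assumed.
SEVERITIES = ("critical", "high", "medium", "low")


@dataclass(frozen=True)
class Finding:
    severity: str
    lens: str
    repo: str
    file: str
    line: int
    description: str
    recommended_fix: str

    def __post_init__(self):
        # Reject free-form severities so the dataset stays sortable.
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity!r}")


def by_severity(findings, severity):
    """Filter findings across every repo to one severity level."""
    return [f for f in findings if f.severity == severity]


# Hypothetical sample data, not real audit output.
sample = [
    Finding("critical", "secrets", "repo-a", "config/settings.py", 12,
            "API key committed in plain config", "Move to a secret store and rotate"),
    Finding("high", "access_control", "repo-b", "api/orders.py", 88,
            "Endpoint trusts caller-supplied user id", "Derive user id from the session"),
    Finding("high", "rate_limiting", "repo-a", "api/search.py", 40,
            "No rate limit on search endpoint", "Add per-client throttling"),
]

critical = by_severity(sample, "critical")
print([(f.repo, f.file, f.line) for f in critical])
```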

The adversarial verify phase

Here's the part I won't skip, because skipping it is how AI security audits earn their bad reputation.

[Figure: Parallel agent read pipeline with adversarial verify phase]

AI generates false positives. A lot of them. An agent will flag a "hardcoded secret" that turns out to be a test fixture, or a "missing access control" on an endpoint that's already protected upstream. If you ship the raw first pass, you've handed someone a panic attack, not an audit.

So there's a second pass. An adversarial verify phase. A separate set of agents takes each finding and tries to confirm or disprove it. They look at surrounding context, trace the actual data flow, and check whether the flagged issue is real or a misread.

A finding only counts after it survives the verify phase. This is non-negotiable. Good AI-agent code review isn't the first read. It's the verification. The first pass casts a wide net. The verify pass is what separates a useful automated code security audit from noise that nobody trusts.
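The gate itself is easy to sketch: a finding is kept only if every verification check fails to disprove it. The two checks below are deliberately crude stand-ins for the two false-positive examples mentioned above (a secret that is really a test fixture, an endpoint already protected upstream). In practice a verifier agent would trace context and data flow; here the `protected_upstream` flag and the path heuristic are assumptions that simulate its conclusions.

```python
from pathlib import PurePosixPath

# Assumed directory names that mark test-only code.
TEST_DIRS = {"test", "tests", "fixtures"}


def not_test_fixture(finding):
    """Disprove findings that sit inside test or fixture directories."""
    parts = PurePosixPath(finding["file"]).parts
    return not any(part in TEST_DIRS for part in parts)


def not_protected_upstream(finding):
    """Disprove findings where the verifier found an upstream guard."""
    return not finding.get("protected_upstream", False)


VERIFY_CHECKS = [not_test_fixture, not_protected_upstream]


def verify_phase(raw_findings):
    """Split raw findings into confirmed and rejected lists."""
    confirmed, rejected = [], []
    for finding in raw_findings:
        survived = all(check(finding) for check in VERIFY_CHECKS)
        (confirmed if survived else rejected).append(finding)
    return confirmed, rejected


# Hypothetical first-pass output.
raw = [
    {"lens": "secrets", "file": "tests/fixtures/keys.py"},
    {"lens": "access_control", "file": "api/admin.py", "protected_upstream": True},
    {"lens": "secrets", "file": "config/prod.env"},
]

confirmed, rejected = verify_phase(raw)
print(len(confirmed), "confirmed,", len(rejected), "rejected")
```

Keeping the rejected list, rather than discarding it, makes it possible to audit the verifier itself later.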

What the Audit Found: 32 Critical and 131 High Findings

After the verify phase, across the whole estate: 32 critical findings and 131 high findings.

Let me be clear about what those were, with everything anonymized.

The categories that recurred

None of these were exotic. That's the part worth sitting with.

[Figure: Audit findings breakdown by severity and recurring category]

Publicly readable databases. This was the category that stung. Databases readable by anyone with the URL, where the access key meant to gate the data did nothing of the kind. Anyone who found the endpoint could read everything.
Secrets committed or under-protected. Keys in config files, tokens that should have rotated, credentials with more access than they needed.
Missing access controls. Endpoints that trusted the caller when they shouldn't have. Row-level security that wasn't enforced.
Unencrypted sensitive data. Information sitting in plaintext that should have been encrypted at rest.
No backup and recovery for revenue-critical systems. Systems that, if they went down, had no clean path to restoration.

The same boring gaps, repeated across many fast-built projects. No zero-days. No clever attacker. Just the predictable cost of shipping fast across an estate nobody had ever read as a whole.
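With findings in a consistent structure, spotting that repetition is a short aggregation. The sketch below groups verified findings by lens and lists which repos each one touches; the sample data and field names are hypothetical.

```python
from collections import defaultdict


def repos_by_lens(findings):
    """Map each lens to the sorted list of repos where it produced a finding."""
    affected = defaultdict(set)
    for finding in findings:
        affected[finding["lens"]].add(finding["repo"])
    return {lens: sorted(repos) for lens, repos in affected.items()}


def most_widespread(findings):
    """Rank lenses by how many distinct repos they affect, widest first."""
    grouped = repos_by_lens(findings)
    return sorted(grouped.items(), key=lambda item: (-len(item[1]), item[0]))


# Hypothetical verified findings.
verified = [
    {"lens": "secrets", "repo": "repo-a"},
    {"lens": "secrets", "repo": "repo-b"},
    {"lens": "secrets", "repo": "repo-b"},  # second hit in the same repo
    {"lens": "access_control", "repo": "repo-c"},
    {"lens": "secrets", "repo": "repo-d"},
]

for lens, repos in most_widespread(verified):
    print(f"{lens}: {len(repos)} repos -> {repos}")
```

A lens that shows up in many repos usually points to a shared template or habit, so fixing the root cause once can close several findings.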

The blast radius nobody had measured

The findings list wasn't the real product. The blast radius was.

For the first time I could answer the question that had been sitting unanswered for years: if someone wanted to do damage, where exactly could they get in, and how far could they reach across everything I'd built?

That map is worth more than any single fix. You can't prioritize what you can't see, and you can't sleep well not knowing.

I'll be honest about the hard part. Finding problems is the easy part. Fixing 32 critical and 131 high findings, then re-verifying each fix actually closed the hole, is the real work. The audit doesn't do that for you. It tells you exactly where to spend the work.

The Real Product Is the Spec, Not the Findings

Any single audit goes stale the moment you ship new code. The 163 findings I got were a snapshot. Useful, but perishable.

What doesn't go stale is the spec.

The 19 lenses. The severity rubric that decides what counts as critical versus high. The tiering rules that say which repos get the deep read. The structured findings schema. All of it is documented and reusable. I own the process, not just one report.

That's the shift that makes a comprehensive multi-repo security review viable for a small business. The old model is a consultant who flies in once a year, charges you a fortune, and hands you a PDF that's outdated before you've read it.

The new model is a documented process you run on demand. I can re-run the entire 58-repo sweep next month for the cost of compute, which is a rounding error next to weeks of billable hours. After a big release, I run it again. The output is consistent every time because the spec is fixed.

This answers the cost-and-scale doubt directly. You're not buying a snapshot. You're buying a repeatable capability. The marginal cost of the second audit, and the third, and the twelfth, is close to zero.

A security audit for small business was never supposed to be a luxury. It was just priced like one because the only delivery method was expensive human hours. Change the delivery method and the economics change with it.

Running This On Your Own Systems

Most companies running 5 to 50 internal tools, integrations, and apps have never had a unified security read either. You've got the CRM integration someone built two years ago. The internal dashboard. The customer portal. The three Zapier-and-code mashups holding operations together. Each one shipped fast. Nobody has read them as a whole.

The approach is the same whether you have 5 repos or 58. Same 19 lenses, same parallel reads, same verify phase, same blast-radius map at the end.

Let me be honest about the limits, because that's how you know I'm not selling magic.

This doesn't replace deep manual penetration testing on a flagship product. If you have one critical application handling money or health data, you still want a skilled human trying to break it. The two things are complementary, not interchangeable.

And the verified findings still need human judgment on what to fix first. The audit tells you what's wrong and how severe. It doesn't decide your priorities or do the remediation. That's your call and your work.

But for measuring blast radius across an entire estate, fast and repeatable beats slow and partial every time. A complete read in a day, repeatable forever, beats a perfect read of one repo while the other 57 sit unexamined.

If you don't actually know your true security exposure across all your systems, I can run this audit on them and hand you the spec so you can re-run it yourself and stop guessing about your blast radius.
