I Built an AI Photo Book Maker That Curates Itself

Why Most AI Photo Tools Are a Lie

Every "ai photo book maker" on the market right now is lying to you. Not maliciously. But the promise and the product don't match.

Here's what they actually are: a thin assistant bolted onto a manual editor. The tool tags faces. Maybe it auto-fills a template with whatever photos you drag in. Then it hands you a 1,400-photo drag-and-drop nightmare and calls it "AI-powered."

The marketing says "automatic." The reality is you still do all the work.

And the work it leaves you with is the actual hard part. The layout engine was never the bottleneck. Templates have existed for twenty years. The hard part of a photo book is the culling, the sequencing, and the taste. That's the job. That's why most people never finish theirs.

So I built the opposite.

I've been developing an autonomous memory-book builder where you say a trip name and a date range, and a finished premium book comes back. No template-filling. No drag-and-drop. Zero manual curation. The system does the brutal, decision-heavy work that the existing tools quietly hand back to you.

I want to be straight about where this stands: it's a project in development, not a shipped product you can buy today. I built it the way I build most of my systems: to solve a problem I personally had. I take a lot of photos. I finish almost none of the books.

But the interesting part isn't the photo book. It's the architecture underneath it. Because once you've watched a crew of AI agents do an entire taste-driven creative job end to end, you stop asking "can AI help with this task" and start asking a much bigger question about your own business.

I'll get to that. First, let me show you why this is so much harder than it looks.

The Job a Photo Book Actually Requires

A single trip produces somewhere between 1,200 and 3,000 photos. A good 40-page book uses maybe 80 of them.

That ratio is the whole problem. You're throwing away roughly 95% of what you shot, and the value of the book lives entirely in which 5% or so you keep and how you arrange them.
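The arithmetic behind that ratio is easy to check. A minimal sketch in Python, using only the figures quoted above (the name `keep_rate` is just for illustration):

```python
BOOK_PHOTOS = 80  # photos used in a good 40-page book, per the figure above


def keep_rate(photos_shot: int, photos_used: int = BOOK_PHOTOS) -> float:
    """Fraction of the shoot that makes it into the book."""
    return photos_used / photos_shot


for shot in (1_200, 3_000):
    kept = keep_rate(shot)
    print(f"{shot:,} shot: keep {kept:.1%}, discard {1 - kept:.1%}")
```

Across that range the discard rate runs from about 93% to about 97%, so 95% is a fair midpoint.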

A professional doing this by hand does three distinct jobs. None of them are mechanical.

Culling is the real work

You shot the same sunset eight times. Six are near-identical. One is slightly sharper, one has better color, the rest are noise.

Culling means deduping every cluster of near-identical shots down to the single best version of that moment. Across 2,000 photos, that's hundreds of micro-decisions. It's tedious. It's exhausting. And it's the first thing that makes people abandon a book at hour two.

Sequencing is storytelling

Once you have your keepers, you can't just dump them chronologically. A chronological dump is a camera roll, not a book.

A good book has an arc. A beginning that sets the scene, a middle that builds, an ending that lands. You're deciding where moments breathe and where they compress. You're building beats. This is editorial work, and it's invisible when it's done well.

Taste is the part nobody automates

This is the one that breaks every existing tool.

Taste is selecting for wow-density instead of coverage. The amateur instinct is to include one photo of everything: every meal, every landmark, every group shot. The result is padded and flat. The professional instinct is the opposite: every spread has to earn its place, even if that means leaving out an entire afternoon.

That's hours of brutal, decision-heavy work. It's exactly why people pay a professional to do it, or why their photos sit on a drive forever.

The Agent Crew: One AI Per Human Role

Here's where most AI tools go wrong: they use one model to do everything. One giant prompt, one autocomplete-style pass, and you get autocomplete-style results. Mediocre and generic.

[Figure: The Agent Crew Pipeline (one AI per human role). Five specialized AI agents (Archivist, Curator, Story Editor, Art Director, Designer) process a trip from photo ingestion to a finished draft book.]

I did the opposite. I assigned each human role in the process to a dedicated agent. A crew, not a single brain. This is the same pattern I use across multi-specialist AI teams for consumer apps, and it's the core idea that makes the whole thing work.

Here's the crew.

Archivist and curator

The Archivist ingests and analyzes every single photo. It scores composition, sharpness, who's in the frame, and what moment it belongs to. Every photo gets a profile.

The Curator takes those profiles and does the culling. It dedupes each cluster to the best-of-moment and ranks the keepers. This is the 2,000-down-to-200 pass, done with consistent criteria instead of decision fatigue.
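To make the Archivist-to-Curator handoff concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the actual implementation: the `PhotoProfile` fields, the `moment_id` label, the 0-to-1 scales, and the equal weighting are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhotoProfile:
    path: str
    moment_id: int      # hypothetical label for "which moment this belongs to"
    sharpness: float    # 0.0 to 1.0, assumed scale
    composition: float  # 0.0 to 1.0, assumed scale

    @property
    def score(self) -> float:
        # Assumed equal weighting; a real scorer would be tuned.
        return 0.5 * self.sharpness + 0.5 * self.composition


def cull(profiles: list[PhotoProfile], keep: int) -> list[PhotoProfile]:
    """Dedupe each moment to its best shot, then rank and keep the top `keep`."""
    best_of_moment: dict[int, PhotoProfile] = {}
    for p in profiles:
        current = best_of_moment.get(p.moment_id)
        if current is None or p.score > current.score:
            best_of_moment[p.moment_id] = p
    ranked = sorted(best_of_moment.values(), key=lambda p: p.score, reverse=True)
    return ranked[:keep]


shots = [
    PhotoProfile("sunset_1.jpg", moment_id=1, sharpness=0.6, composition=0.8),
    PhotoProfile("sunset_2.jpg", moment_id=1, sharpness=0.9, composition=0.8),
    PhotoProfile("sunset_3.jpg", moment_id=1, sharpness=0.7, composition=0.7),
    PhotoProfile("market.jpg", moment_id=2, sharpness=0.8, composition=0.5),
]
keepers = cull(shots, keep=2)
print([p.path for p in keepers])
```

The three sunset frames collapse to the single sharpest one, and the same criteria apply to photo 2,000 as to photo 1. That consistency is the whole advantage over a tired human.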

Story editor and art director

The Story Editor takes the keepers and builds the narrative arc. It breaks the trip into beats (beginning, middle, end) and decides what story the book is telling. Not chronology. Structure.

The Art Director makes the impact calls: which moments deserve a full double-page spread, and which get a thumbnail in a grid. This is where the wow-density judgment lives.
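One way to picture the Art Director's call is as a threshold on an impact score. The sketch below is a simplification under assumptions: the `Spread` type, the `hero_threshold` of 0.9, and the four-photo grid are hypothetical, and a real agent would weigh far more than a single number.

```python
from dataclasses import dataclass


@dataclass
class Spread:
    layout: str        # "double_page" or "grid"
    photos: list[str]


def plan_spreads(sequence, hero_threshold=0.9, grid_size=4):
    """Walk the story order; heroes get a spread alone, the rest share grids."""
    spreads: list[Spread] = []
    pending: list[str] = []

    def flush():
        if pending:
            spreads.append(Spread("grid", pending.copy()))
            pending.clear()

    for name, impact in sequence:
        if impact >= hero_threshold:
            flush()  # close the open grid so story order is preserved
            spreads.append(Spread("double_page", [name]))
        else:
            pending.append(name)
            if len(pending) == grid_size:
                flush()
    flush()
    return spreads


story = [("arrival", 0.95), ("lunch", 0.5), ("street", 0.6), ("summit", 0.92), ("bus", 0.4)]
for spread in plan_spreads(story):
    print(spread.layout, spread.photos)
```

Here `arrival` and `summit` each get a double-page spread to themselves, while the quieter moments are grouped into grids around them.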

Designer

The Designer lays out the actual spreads. One key technical detail here: its layouts are gutter-safe, meaning nothing important falls into the fold where the pages meet. No faces split down the middle. It also writes captions in a real voice, with actual context, not "Photo 1" and "Photo 2."
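A gutter-safety check can be as simple as testing whether any face overlaps the fold. This is a minimal sketch, assuming face positions are already known as horizontal extents across the open spread; the `FaceBox` type, the normalized 0-to-1 coordinates, and the 3% margin are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceBox:
    left: float   # horizontal extent across the open spread, 0.0 to 1.0
    right: float


def crosses_gutter(face: FaceBox, gutter_x: float = 0.5, margin: float = 0.03) -> bool:
    """True if the face overlaps the fold plus a safety margin on each side."""
    return face.left < gutter_x + margin and face.right > gutter_x - margin


def is_gutter_safe(faces: list[FaceBox]) -> bool:
    """A spread is safe only if no face touches the fold zone."""
    return not any(crosses_gutter(f) for f in faces)


print(is_gutter_safe([FaceBox(0.10, 0.30), FaceBox(0.60, 0.80)]))
print(is_gutter_safe([FaceBox(0.45, 0.55)]))
```

A layout that fails the check would be shifted or re-cropped rather than printed with a face in the fold.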

Why does this beat one giant prompt? Because each agent has a narrow job, a clear success criterion, and can be tuned independently. When captions feel off, I fix the Designer without touching the culling logic. Specialization means I can debug taste one role at a time. That's something a monolithic model can't give you.
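The "fix one role without touching the others" idea can be sketched as a pipeline of independent stages. Everything here is a toy under assumptions: the stage functions, the `State` dictionary shape, and the 0.5 cutoff are hypothetical stand-ins for real agents.

```python
from typing import Callable

State = dict
Stage = Callable[[State], State]


def curator(state: State) -> State:
    """Toy culling stage: keep anything scoring at least 0.5."""
    keepers = [name for name, score in state["scores"].items() if score >= 0.5]
    return {**state, "keepers": keepers}


def designer_v1(state: State) -> State:
    """Original caption logic: the placeholder captions nobody wants."""
    captions = [f"Photo {i}" for i, _ in enumerate(state["keepers"], start=1)]
    return {**state, "captions": captions}


def designer_v2(state: State) -> State:
    """Replacement caption logic, swapped in without touching the curator."""
    captions = [name.replace("_", " ").title() for name in state["keepers"]]
    return {**state, "captions": captions}


def run_crew(state: State, stages: list[Stage]) -> State:
    for stage in stages:
        state = stage(state)
    return state


trip = {"scores": {"harbor_dawn": 0.9, "parking_lot": 0.2, "night_market": 0.7}}
before = run_crew(trip, [curator, designer_v1])
after = run_crew(trip, [curator, designer_v2])
print(before["captions"], after["captions"])
```

Swapping `designer_v1` for `designer_v2` changes the captions and nothing else: the keepers are identical in both runs.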

The Critic Agent: Why It Re-Edits Before You See Anything

This is the part I'm proudest of, and it's the part that builds trust.

[Figure: The Critic Agent self-correction loop. The critic agent reviews the draft book and sends weak spreads back for re-editing before any human sees the result.]

After the crew produces a draft book, a critic agent reviews it. Not the photos individually. The whole book, the way a tough editor would read a finished draft.

It looks for flat pages that don't land. Redundant moments that repeat the same beat. Weak captions that say nothing. And the killer: a sagging middle, where the book loses energy halfway through.

When it finds weak spreads, it sends them back for re-editing. Before any human ever sees the result.

Think about what that means. The system catches its own bad work. This is the exact opposite of babysitting AI, where you sit there correcting a tool that keeps making the same mistakes. I build this self-correction into most of my systems. I've written before about an AI that rejects its own bad work, and the principle is the same here: quality control belongs inside the system, not bolted on afterward by a tired human.
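The shape of that loop fits in a few lines. This is a sketch under assumptions, not the real critic: the spread dictionaries, the 0-to-100 `impact` score, the floor of 60, and the round limit are all hypothetical, and `toy_re_edit` stands in for an agent actually redoing the spread.

```python
Spread = dict  # hypothetical shape: {"title": str, "impact": int}, impact on 0-100


def weak_spreads(book: list[Spread], floor: int) -> list[int]:
    """The critic pass: indexes of spreads that do not land."""
    return [i for i, spread in enumerate(book) if spread["impact"] < floor]


def self_correct(book, re_edit, floor=60, max_rounds=3):
    """Send weak spreads back for re-editing until none remain or rounds run out."""
    book = list(book)  # leave the caller's draft untouched
    rounds = 0
    while rounds < max_rounds:
        weak = weak_spreads(book, floor)
        if not weak:
            break
        for i in weak:
            book[i] = re_edit(book[i])
        rounds += 1
    return book, rounds


def toy_re_edit(spread: Spread) -> Spread:
    # Stand-in for a real re-edit: pretend each pass improves the spread.
    return {**spread, "impact": spread["impact"] + 20}


draft = [{"title": "Arrival", "impact": 85}, {"title": "Day 3 lunch", "impact": 30}]
final, rounds = self_correct(draft, toy_re_edit)
print(rounds, [s["impact"] for s in final])
```

The round limit matters. Without it, a spread the crew cannot fix would loop forever instead of reaching a human.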

Now, honesty. The critic isn't perfect. It has taste preferences baked in, which means it has blind spots. The biggest one: it occasionally over-culls a moment that mattered emotionally but scored low technically. A slightly blurry photo of someone you love beats a tack-sharp photo of a building, every time. The critic doesn't always know that.

That limitation is the point. It's exactly why a human still belongs in the loop, which I'll get to. But the critic does something no consumer photo tool does today: it edits its own work before you ever see a draft, and it raises the floor on every book it produces.

What "Wow-Density Over Coverage" Means in Practice

Let me make this concrete, because it's the single design decision that separates a good book from a boring one.

[Figure: Coverage vs Wow-Density tradeoff. A coverage-based layout (uniform even grid, boring) contrasted with a wow-density layout (uneven, high-impact spreads with weak sections left out entirely).]

Most auto-generated books optimize for coverage. A little of everything, evenly spaced. Every day gets roughly equal pages. Every location gets a photo. It feels fair and complete.

It's also boring. Coverage produces a padded book where nothing stands out because everything is treated the same.

I optimized for wow-density instead. The rule is simple: every spread should make you stop. If a page doesn't earn attention, it doesn't exist.

In practice this means the system will leave out an entire afternoon if nothing in it scored high enough. A whole block of the trip, gone, because it didn't produce anything worth a spread. And it means one extraordinary moment might get a full double-page spread to itself, where a coverage-driven tool would have squeezed it into a four-photo grid.

That's the tradeoff the agents make on purpose: fewer photos, higher impact, ruthless selection.
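The difference between the two strategies shows up clearly in code. A minimal sketch, assuming each block of the trip has a list of per-photo scores; the block names, the scores, and the 0.8 threshold are made up for illustration.

```python
def coverage_pick(blocks: dict[str, list[float]]) -> dict[str, float]:
    """Coverage: every block gets its best photo, however weak."""
    return {name: max(scores) for name, scores in blocks.items()}


def wow_density_pick(blocks: dict[str, list[float]], threshold: float = 0.8):
    """Wow-density: keep only photos that clear the bar; drop empty blocks."""
    picked = {}
    for name, scores in blocks.items():
        strong = [s for s in scores if s >= threshold]
        if strong:
            picked[name] = strong
    return picked


trip = {
    "morning_hike": [0.95, 0.6, 0.85],
    "afternoon_errands": [0.4, 0.55],
    "harbor_sunset": [0.9],
}
print(coverage_pick(trip))
print(wow_density_pick(trip))
```

Coverage puts a 0.55 photo in the book because the afternoon "deserves" a page. Wow-density drops the afternoon entirely and gives the morning hike two strong photos instead.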

Here's the reframe for anyone running a business. Taste isn't an accident or a vibe. It's a design decision you can encode. "Leave out the afternoon if it's weak" is editorial judgment expressed as a rule an autonomous system can follow. You can put real editorial judgment into a system, not just mechanical filters like "remove blurry photos." That distinction is most of what makes agentic systems feel different from the tools that came before.

Where the Human Still Enters the Loop

If you're a skeptical buyer, here's the question forming in your head: where does this break, and how much hand-holding does it actually need?

[Figure: Autocomplete vs Delegated Job (one human gate). Old autocomplete tools require a human gate at every step; an agentic system runs autonomously with a single human approval gate at the end.]

Fair question. Here's the honest answer.

The system runs fully autonomously through curation, sequencing, layout, and self-critique. Nobody touches it during those stages. The agents ingest, cull, sequence, design, and re-edit without me watching.

But I stop it before final output for a human approval pass. On purpose.

The reason is the limit I mentioned earlier. Emotional context is something the agents can't fully judge. The blurry photo that's the only shot of someone who matters. The unremarkable moment that's actually the most important one of the trip because of something the camera couldn't see. A score can't catch that. A person can.

This is human-in-the-loop by design, not a gap I'm apologizing for. Every AI system I ship stops for a human at the point where judgment genuinely requires context the machine doesn't have.
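A single approval gate might look something like the sketch below. It is an assumption about shape, not the actual system: the `Review` type, the `pinned` list, and `finalize` are hypothetical names for the idea that a human approves the draft once and can rescue a photo the scores threw away.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Review:
    approved: bool
    pinned: list[str] = field(default_factory=list)  # photos the human insists on


def finalize(draft: list[str], culled: set[str], review: Review) -> Optional[list[str]]:
    """Return the final photo list, or None if the human withheld approval."""
    if not review.approved:
        return None
    # Rescue only photos that were actually culled and are not already in the book.
    rescued = [p for p in review.pinned if p in culled and p not in draft]
    return draft + rescued


draft = ["summit.jpg", "harbor.jpg"]
culled = {"grandma_blurry.jpg", "parking_lot.jpg"}
final = finalize(draft, culled, Review(approved=True, pinned=["grandma_blurry.jpg"]))
print(final)
```

Nothing ships without `approved=True`, and the pinned photo goes back in regardless of its technical score.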

And here's the key distinction for the skeptical CEO. This isn't autocomplete that needs constant babysitting through every step. It's a full creative job delegated end to end, with a single human gate at the finish line. You're not correcting the work. You're approving it.

That difference, between supervising every keystroke and approving a finished draft, is what's actually new about agentic systems in 2026. The old tools needed you in the loop constantly. This one needs you once, at the end, where it matters most.

What This Says About Delegating Creative Work to AI

Zoom out from photo books for a second, because the book itself was never the point.

The point is the demonstration. If a crew of agents can do an entire taste-driven creative job autonomously (the culling, the narrative, the design, and the self-editing), then the question for your business changes.

It's no longer "can AI help with this task." That's the autocomplete question, and most leaders are still stuck on it. They think of AI as a smarter autocomplete that speeds up steps a human still owns.

The better question is: which whole jobs can I now hand to a crew of specialists? Not tasks. Jobs. End-to-end work with judgment in it, the kind you assumed only a person could own.

That's the agentic shift, and it's bigger than the autocomplete framing lets most people see. The photo book proves it works on a job that's almost entirely taste. If it works there, it works on plenty of jobs sitting in your operations right now that you've never thought to question.

This is the thinking I bring as a Chief AI Officer. I don't just tell you what's possible in a slide deck. I build the systems, the way I built this one and the fifteen-plus others running in production across my own businesses and my clients'. If you want the longer version of what a Chief AI Officer actually does, I wrote it down.

The leaders who win the next few years won't be the ones using AI to type faster. They'll be the ones delegating whole jobs to crews of agents while everyone else is still autocompleting.

Thinking about AI for your business?

If this resonated, let's have a conversation. I do free 30-minute discovery calls where we look at your operations and find where AI could actually move the needle, not where it sounds good in a board meeting.
