Avoxy Technologies

Generative Image Pipeline

Teaching myself AI image generation by shipping a book-writing agent


Role

Self-directed experiment

Tech Stack

Python, LangChain, Google Gemini, Meta-prompting, FastAPI, Cloudflare R2, Print-on-demand

I wanted to learn AI image generation properly. Not toy prompts in a playground, but something with a hard, unforgiving output: a physical book someone would actually pay for. So I set that constraint and built an agent to meet it. Give it a title, get back a print-ready picture book, with no human in the loop during generation.

The constraint

A real book is a brutal test for image models. Thirty pages that have to feel like one book, not thirty unrelated images. Text and art that line up. Resolution and bleed that survive a printer. If any stage is sloppy, you can see it — in your hands.
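"Resolution and bleed that survive a printer" comes down to simple arithmetic. The Python sketch below shows it under assumed numbers, not the project's actual spec: an 8.5 × 8.5 inch trim, 0.125 inch of bleed on every edge, a 300 DPI target, and a 1024 × 1024 pixel image from the model. Print services differ (some add bleed only on the outer edges of interior pages), so real values should come from the printer's own guidelines.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PageSpec:
    trim_w_in: float  # finished page width, inches (assumed value below)
    trim_h_in: float  # finished page height, inches
    bleed_in: float   # artwork extending past the trim line on each edge, inches
    dpi: int          # target print resolution, pixels per inch

    def canvas_px(self) -> tuple[int, int]:
        """Pixels the artwork must cover, with bleed added on all four edges."""
        w_in = self.trim_w_in + 2 * self.bleed_in
        h_in = self.trim_h_in + 2 * self.bleed_in
        # round() first so tiny floating-point error can't push ceil() up a pixel
        return (math.ceil(round(w_in * self.dpi, 6)),
                math.ceil(round(h_in * self.dpi, 6)))

    def upscale_factor(self, src_w: int, src_h: int) -> float:
        """Enlargement a generated image needs to fill the canvas (above 1 means upscaling)."""
        need_w, need_h = self.canvas_px()
        return max(need_w / src_w, need_h / src_h)


spec = PageSpec(trim_w_in=8.5, trim_h_in=8.5, bleed_in=0.125, dpi=300)
print(spec.canvas_px())                 # (2625, 2625)
print(spec.upscale_factor(1024, 1024))  # 2.5634765625
```

At these assumed settings a 1024-pixel image has to be enlarged about 2.6 times to fill the page, which is exactly the kind of gap that looks fine on screen and soft on paper.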

So the bar was simple: type a title, get a finished, print-ready PDF — cover and all — good enough to send straight to a print-on-demand service.

How it worked

The agent ran the whole pipeline:

  • Plan the book. From just a title, generate the structure — what goes on every page.
  • Meta-prompt the art. For each page, the agent writes the image prompt — AI prompting AI. Getting this layer right mattered more than any single image model.
  • Generate in parallel. Pages render concurrently through the image model, so a full book takes minutes, not hours.
  • QA and retry. Each page is checked automatically and regenerated when it misses the bar.
  • Assemble for print. Pages and cover are composed into a print-ready PDF — correct trim, bleed, and resolution.
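The "generate in parallel" and "QA and retry" steps fit naturally into Python's asyncio. This is a minimal sketch, not the project's code: `generate_image` and `passes_qa` are hypothetical stand-ins (the QA stub simply fails about 30% of the time at random), and the concurrency cap of 8 and three attempts per page are assumed values. A real QA check might, for example, verify pixel dimensions or have a vision model grade the page.

```python
from __future__ import annotations

import asyncio
import random
from dataclasses import dataclass


@dataclass
class PageResult:
    page: int
    image: bytes | None  # None when every attempt failed QA
    attempts: int
    ok: bool


async def generate_image(prompt: str) -> bytes:
    """Hypothetical stand-in for an image-model call."""
    await asyncio.sleep(0.01)  # simulate network latency
    return prompt.encode()


async def passes_qa(image: bytes) -> bool:
    """Hypothetical QA check; here it just fails about 30% of the time."""
    return random.random() > 0.3


async def render_page(page: int, prompt: str, sem: asyncio.Semaphore,
                      max_attempts: int = 3) -> PageResult:
    """Generate one page, regenerating until it passes QA or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        async with sem:  # hold a slot only while calling the model
            image = await generate_image(prompt)
        if await passes_qa(image):
            return PageResult(page, image, attempt, True)
    return PageResult(page, None, max_attempts, False)


async def render_book(prompts: list[str], concurrency: int = 8) -> list[PageResult]:
    """Render all pages concurrently, with at most `concurrency` model calls in flight."""
    sem = asyncio.Semaphore(concurrency)
    tasks = [render_page(i, p, sem) for i, p in enumerate(prompts, start=1)]
    return list(await asyncio.gather(*tasks))  # gather keeps results in page order


if __name__ == "__main__":
    results = asyncio.run(render_book([f"prompt for page {i}" for i in range(1, 31)]))
    flagged = [r.page for r in results if not r.ok]
    print(f"{len(results) - len(flagged)} of {len(results)} pages passed QA; flagged: {flagged}")
```

The semaphore bounds load on the model API while every page still progresses independently, so one stubborn page retries without holding up the rest. Pages that exhaust their attempts come back flagged instead of slipping silently into the PDF.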

A full thirty-page book, start to finish, in about ten minutes.

A superyacht ABC — every page generated by the pipeline, then printed on demand.
A designer-shoe ABC. Same pipeline, different title in.

What I learned

Most of the difficulty wasn't the image model — it was everything around it. Consistency across pages is the real problem; one great image is easy, thirty that belong together is not. Meta-prompting — having the model write its own image prompts — was the highest-leverage layer; small changes there moved quality more than swapping models did. And print is unforgiving: DPI, bleed, and trim turn "looks fine on screen" into "wrong in your hands." The automated QA-and-retry loop is what made the output usable instead of a pile of near-misses.
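One way meta-prompting can attack the consistency problem is to give the prompt-writing model the same fixed art direction on every page, then check that it survived into the output. The sketch below assumes that approach. `StyleBible`, the template wording, and the `llm` callable (any function from a prompt string to a response string, such as a thin wrapper around a chat model) are hypothetical, not the project's actual prompts.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StyleBible:
    """Hypothetical book-wide art direction, reused verbatim on every page."""
    art_style: str
    palette: str
    recurring_elements: str


META_TEMPLATE = (
    "You write prompts for an image model illustrating a picture book.\n"
    "Book title: {title}\n"
    "Art style (repeat it word for word in your prompt): {art_style}\n"
    "Palette: {palette}\n"
    "Recurring elements: {recurring}\n"
    "This page: {page_text}\n"
    "Return only the image prompt for this page."
)


def build_meta_prompt(title: str, bible: StyleBible, page_text: str) -> str:
    """Fill the template: the style fields are identical for every page, only page_text varies."""
    return META_TEMPLATE.format(
        title=title,
        art_style=bible.art_style,
        palette=bible.palette,
        recurring=bible.recurring_elements,
        page_text=page_text,
    )


def write_image_prompt(llm: Callable[[str], str], title: str,
                       bible: StyleBible, page_text: str) -> str:
    """Ask the model to write the page's image prompt, then enforce the style anchor."""
    prompt = llm(build_meta_prompt(title, bible, page_text)).strip()
    # If the model dropped the shared style, prepend it so every page still carries it.
    if bible.art_style.lower() not in prompt.lower():
        prompt = f"{bible.art_style}. {prompt}"
    return prompt


def fake_llm(meta_prompt: str) -> str:
    """Stand-in model that ignores the style instruction, to exercise the guard."""
    return "A sleek superyacht anchored in a calm bay"


bible = StyleBible(
    art_style="soft watercolor illustration",
    palette="navy, white and sand",
    recurring_elements="a small seagull somewhere on every page",
)
print(write_image_prompt(fake_llm, "Superyacht ABC", bible, "A is for Anchor"))
# → soft watercolor illustration. A sleek superyacht anchored in a calm bay
```

Keeping the shared direction in one frozen object means a single edit changes every page's prompt at once, which matches the point above that changes at this layer moved quality more than swapping models did.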

What transfers

The goal was never to become a publisher. I pushed a handful of titles through print-on-demand purely to close the loop: proof that the pipeline produced real, sellable artifacts rather than nice-looking screen demos. Each one was reviewed by hand before it shipped.

What transfers is the part that isn't the model at all: wrapping a generative-AI capability in the planning, meta-prompting, QA, and assembly that turn a clever demo into something that comes out right every time.

Want to ship something like this?

Book a 30-minute consult. No pitch, just a fit conversation.