A 42-Million Token Design Marathon

The largest single AI coding session on record: 42 million tokens of iterative motion design refinement for a product demo video. 28+ rounds of granular feedback on cursor animations, camera panning, modal sizing, icon spacing, and scene transitions.

Codex CLI / gpt-5.4 · 41.9M tokens · 600 messages · ~12 hours · 24 files

Forty-one point nine million tokens. Six hundred messages. Twelve hours. One continuous conversation that started with “hii” and never stopped.

The goal was a Remotion-powered product demo video for GAIA’s workflow feature — something that could live on the landing page and explain in thirty seconds what the product does. I had a rough first version but it felt like a developer built it: functional, not cinematic. What followed was the most granular creative iteration I’ve ever done with an AI.

The feedback loop

Every round went the same way: I’d watch the rendered video, identify exactly what was wrong, and describe it in precise terms. The agent would implement the fix and re-render. Then I’d watch again. Twenty-eight rounds of this, over twelve hours, with feedback that got progressively more surgical as the obvious problems were solved and the subtle ones emerged.

Early rounds were about fundamental correctness. The cursor was moving in straight lines between click targets. Real cursors don’t do that — they arc, they overshoot slightly, they decelerate before landing. I asked for natural cursor movement. The agent implemented cubic bezier interpolation with slight randomized jitter on each segment, sampling a different bezier curve per path so no two movements looked identical. The cursor now felt like a hand was driving it.
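The cursor behavior described above can be sketched along these lines — a minimal TypeScript illustration with assumed names (`cursorPath`, the arc and jitter constants), not the session's actual code:

```typescript
type Pt = { x: number; y: number };

// Evaluate a cubic bezier at t in [0, 1].
function cubicBezier(p0: Pt, p1: Pt, p2: Pt, p3: Pt, t: number): Pt {
  const u = 1 - t;
  const w0 = u * u * u, w1 = 3 * u * u * t, w2 = 3 * u * t * t, w3 = t * t * t;
  return {
    x: w0 * p0.x + w1 * p1.x + w2 * p2.x + w3 * p3.x,
    y: w0 * p0.y + w1 * p1.y + w2 * p2.y + w3 * p3.y,
  };
}

// Build a curved path from `from` to `to`: control points are offset
// perpendicular to the straight line, with a randomized magnitude so no
// two movements share the same arc, plus sub-pixel jitter per sample.
function cursorPath(from: Pt, to: Pt, samples: number, rng: () => number): Pt[] {
  const dx = to.x - from.x, dy = to.y - from.y;
  const len = Math.hypot(dx, dy) || 1;
  const nx = -dy / len, ny = dx / len;              // unit normal to the segment
  const arc = (rng() - 0.5) * 0.3 * len;            // arc height, up to 15% of distance
  const c1: Pt = { x: from.x + dx / 3 + nx * arc, y: from.y + dy / 3 + ny * arc };
  const c2: Pt = { x: from.x + (2 * dx) / 3 + nx * arc, y: from.y + (2 * dy) / 3 + ny * arc };
  const pts: Pt[] = [];
  for (let i = 0; i <= samples; i++) {
    const p = cubicBezier(from, c1, c2, to, i / samples);
    const j = i === 0 || i === samples ? 0 : 1;     // no jitter at the endpoints
    pts.push({ x: p.x + (rng() - 0.5) * j, y: p.y + (rng() - 0.5) * j });
  }
  return pts;
}
```

Sampling a fresh control-point offset from `rng` for each movement is what keeps any two paths from sharing the same arc; clamping jitter to zero at the endpoints keeps clicks landing exactly on target.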

The camera was cutting to scenes like a PowerPoint slide transition. I wanted it to feel like a documentary: subtle push-ins when something important was happening, pull-outs when revealing context. The agent built a focal-point camera system — for each scene, I declared a focal point (the primary element the viewer should notice), and the camera would scale and translate simultaneously to center on it over four hundred milliseconds with expo-out easing. When the modal opened, the camera zoomed gently into the modal’s center point. When the workflow steps cascaded in, the camera held wide to show all of them.
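A focal-point camera of this shape can be reduced to a small amount of math — this is an illustrative TypeScript sketch with assumed names, where in Remotion `elapsedMs` would be derived from the frame number:

```typescript
type Camera = { scale: number; tx: number; ty: number };

// Expo-out easing: fast start, long decelerating tail.
const expoOut = (t: number): number => (t >= 1 ? 1 : 1 - Math.pow(2, -10 * t));

// Transform that centers the focal point (fx, fy) in a w-by-h frame at the given zoom.
function targetCamera(fx: number, fy: number, zoom: number, w: number, h: number): Camera {
  return { scale: zoom, tx: w / 2 - fx * zoom, ty: h / 2 - fy * zoom };
}

// Interpolate scale and translation simultaneously over the 400 ms move.
function cameraAt(from: Camera, to: Camera, elapsedMs: number, durationMs = 400): Camera {
  const t = expoOut(Math.min(1, elapsedMs / durationMs));
  return {
    scale: from.scale + (to.scale - from.scale) * t,
    tx: from.tx + (to.tx - from.tx) * t,
    ty: from.ty + (to.ty - from.ty) * t,
  };
}
```

The result is applied to the scene root as `transform: translate(tx, ty) scale(s)`; interpolating translation and scale in lockstep is what makes the move read as a push-in rather than a zoom followed by a pan.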

Mid-session, I flagged a deeper issue: the triggers were all appearing simultaneously instead of cascading. It looked like a loading glitch, not an intentional reveal. The fix was staggered entrance with elastic easing — each trigger card arriving sixty milliseconds after the previous, with a cubic-bezier(0.68, -0.55, 0.27, 1.55) curve that gave each card a slight bounce on landing. The cascade now read as “these things are flowing in” rather than “these things broke.”
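The stagger math reduces to a per-card delay plus a CSS-style timing function. This sketch (assumed names, and an assumed 300 ms entrance duration) shows how a `cubic-bezier(0.68, -0.55, 0.27, 1.55)` curve is evaluated and why it bounces — its output overshoots 1 near the end:

```typescript
// One axis of a cubic bezier whose endpoints are fixed at 0 and 1.
const bez = (a: number, b: number, t: number): number =>
  3 * a * t * (1 - t) * (1 - t) + 3 * b * t * t * (1 - t) + t * t * t;

// CSS-style timing function: find t where x(t) = progress via bisection
// (x(t) is monotonic when x1 and x2 are in [0, 1]), then return y(t).
function cssCubicBezier(x1: number, y1: number, x2: number, y2: number, x: number): number {
  if (x <= 0) return 0;
  if (x >= 1) return 1;
  let lo = 0, hi = 1;
  for (let i = 0; i < 50; i++) {
    const mid = (lo + hi) / 2;
    if (bez(x1, x2, mid) < x) lo = mid; else hi = mid;
  }
  return bez(y1, y2, (lo + hi) / 2);
}

// Entrance progress of card `i` at global time `t` ms: 0 until its turn
// (each card starts 60 ms after the previous), then an eased run that
// overshoots past 1 before settling, giving the landing bounce.
function cardProgress(i: number, t: number, staggerMs = 60, durMs = 300): number {
  const local = (t - i * staggerMs) / durMs;
  return cssCubicBezier(0.68, -0.55, 0.27, 1.55, Math.max(0, Math.min(1, local)));
}
```

Mapping `cardProgress` onto translateY and opacity makes the overshoot past 1 read as a slight dip below the card's resting position before it settles.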

Typography and architecture

Halfway through, I noticed all the text looked tiny and hard to read. The problem was a fundamental video-vs-web difference I’d overlooked: fourteen-pixel body text looks fine on a web page, but a frame authored at full resolution rarely plays back at full size — embedded at half width, that same text reads as seven pixels. Every text element essentially needed to be doubled. The agent rebuilt the entire typography scale from scratch — headings at sixty-four pixels, body copy at twenty-eight, UI labels at twenty-two — and also had to recheck spacing and padding throughout, because doubling text size without adjusting layout produces a cramped result.
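As a sketch, the rebuilt scale amounts to a doubling map. The web baselines of 32/14/11 px below are my assumption — the article only gives the final video sizes:

```typescript
// Assumed web baselines (px); only the doubled outputs are from the article.
const webScale: Record<string, number> = { heading: 32, body: 14, label: 11 };

// Video frames are authored at full resolution but usually displayed smaller,
// so every size is doubled for the 1920x1080 composition.
const videoTypeScale: Record<string, number> = Object.fromEntries(
  Object.entries(webScale).map(([name, px]) => [name, px * 2])
); // → { heading: 64, body: 28, label: 22 }
```

Spacing and padding need the same treatment: a doubled font inside unchanged padding is exactly the cramped result described above.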

That surfaced a second structural problem: the entire composition was one monolithic React file. A single twelve-hundred-line component with inline styles everywhere. With a dozen rounds’ worth of incremental edits piled on top of each other, the file had become brittle — changing one element’s timing risked breaking the layout of everything downstream. The agent refactored the composition into six modular components: CursorLayer, CameraRig, TriggerRow, WorkflowModal, SceneTransition, and TypographyScale. Each became independently editable. I hadn’t asked for this — the agent proposed it when it recognized the maintenance risk, and I approved. The remaining fifteen rounds of feedback were dramatically faster because of it.

What 41.9M tokens actually produced

The final count is staggering in abstract terms — it’s roughly equivalent to the full text of thirty-one thousand pages processed in a single conversation. But the number that matters is twenty-eight: twenty-eight complete render-and-review cycles in twelve hours. A professional motion designer working in After Effects might complete three or four review cycles in a day. The agent completed twenty-eight, each producing a runnable video within seconds of the change request. The quality bar at the end — bezier cursor paths, focal-point camera, scene continuity anchors, elastic staggered reveals, modular Tailwind components — is what you’d expect from two weeks of professional motion design work.
