Self-Analyzing Conversations to Create a Reusable Skill
Claude Code was instructed to read all of its own prior conversations about video animation, extract design preferences and patterns, and codify them into a reusable agent skill: an AI learning from its own history.
After six sessions of iterative Remotion animation work, I had given detailed feedback on timing, easing, typography, and layout. None of it had ever been written down; it was scattered across hundreds of messages in conversation JSONL files. So I asked Claude to read all of it, extract the patterns, and turn them into a skill that any future session could load.
This is a form of meta-cognition: the model analyzing its own correction history to build a playbook for doing better work.
Reading its own history
Claude located all six conversation JSONL files from the remotion-animation-main branch — approximately forty megabytes of structured session data. It spawned two subagents to parallelize the work: one focused on workflow patterns (how I prompted, what information I provided, what order I gave feedback in) and one focused on style extraction (what specific design decisions I approved, rejected, or corrected).
The style extraction subagent parsed only the user messages, filtering out agent responses entirely and focusing on what I had said. It built a list of every explicit correction (“the cursor should arc, not go straight”), every confirmed preference (“yes, expo-out is right”), and every rejection (“no bouncing, ever — elastic easing looks wrong in UI recordings”). Across six sessions and hundreds of user messages, it identified seventeen distinct, actionable patterns.
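The filtering step can be sketched in a few lines of TypeScript. The `SessionLine` interface and the `type`/`message.content` field names are assumptions about the JSONL schema for illustration, not a documented format:

```typescript
// Hypothetical shape of one line in a session JSONL file.
// The field names here are assumptions, not a documented schema.
interface SessionLine {
  type?: string;
  message?: { role?: string; content?: string };
}

// Keep only what the user actually said, dropping all agent output.
function extractUserMessages(jsonlText: string): string[] {
  return jsonlText
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as SessionLine)
    .filter(
      (entry) =>
        entry.type === "user" && typeof entry.message?.content === "string"
    )
    .map((entry) => entry.message!.content as string);
}
```

From here, the extracted messages can be scanned for corrections, confirmations, and rejections.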
What the patterns were
The timing rules were the clearest category. Entrance animations: three hundred to four hundred milliseconds, expo-out easing, no exceptions. Interaction feedback: one hundred fifty milliseconds maximum — anything longer makes the UI feel sluggish on screen. Stagger between sequential items: forty to sixty milliseconds per item. Never use bounce, elastic, or spring overshoot — these easing curves look appealing in toy demos but read as glitchy in screen recordings of real software.
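Collected as constants, the timing rules might look like the sketch below. The names are illustrative, and `expoOut` is a plain implementation of the standard exponential ease-out curve, not a call into Remotion's easing API:

```typescript
// Timing rules distilled from the sessions, as described above.
const TIMING = {
  entranceMs: { min: 300, max: 400 },   // expo-out, no exceptions
  interactionFeedbackMaxMs: 150,        // anything longer feels sluggish
  staggerPerItemMs: { min: 40, max: 60 },
} as const;

// Standard expo-out curve: fast start, long decelerating tail.
// Equivalent in shape to CSS/animation-library exponential ease-out.
function expoOut(t: number): number {
  return t >= 1 ? 1 : 1 - Math.pow(2, -10 * t);
}
```

Note how quickly the curve saturates: at the halfway point it has already covered roughly ninety-seven percent of the distance, which is why expo-out reads as snappy rather than floaty.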
Typography for video has different requirements than typography for the web. The minimum body size is twenty-four pixels; anything smaller is unreadable after video compression. Headings need letter-spacing of at least negative zero point zero three em. Use Inter with weight variations for hierarchy rather than switching font families. Labels should be eleven pixels, uppercase, with wider tracking than you’d use on the web — video compression blurs fine detail, so labels need extra clarity.
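As a constants object, the scale might look like this sketch. The body size, heading letter-spacing, and label size come from the rules above; the heading size, specific weights, and exact label tracking are assumed values for illustration:

```typescript
// Typography scale for 1920x1080 video, per the extracted rules.
// Heading size, weights, and label tracking are assumptions.
const TYPE_SCALE = {
  fontFamily: "Inter", // one family; hierarchy comes from weight, not font switching
  body: { size: 24, weight: 400 },    // minimum readable size after compression
  heading: { size: 64, weight: 700, letterSpacing: "-0.03em" }, // 64px is assumed
  label: {
    size: 11,
    weight: 600,
    transform: "uppercase",
    letterSpacing: "0.08em", // wider-than-web tracking; exact value assumed
  },
} as const;
```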
Layout rules: sixteen-by-nine safe margins with forty-eight pixel gutters. Content centers on the focal point — wherever the camera is looking, that’s where the primary element should sit. No more than three distinct elements should be entering or animating simultaneously; more than that creates visual noise that the viewer can’t track. Everything aligns to an eight-pixel grid.
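A minimal sketch of these layout constants, plus an eight-pixel snapping helper; the `safeArea` name and structure are my own, not taken from the skill file:

```typescript
// Layout constants for the 1920x1080 frame, per the rules above.
const FRAME = { width: 1920, height: 1080 };
const GUTTER = 48; // safe-margin gutter on each edge
const GRID = 8;    // everything aligns to an 8px grid

// Snap any coordinate to the nearest grid line.
const snap = (px: number): number => Math.round(px / GRID) * GRID;

// The usable content area once safe margins are subtracted.
const safeArea = {
  x: GUTTER,
  y: GUTTER,
  width: FRAME.width - 2 * GUTTER,
  height: FRAME.height - 2 * GUTTER,
};
```

Running every computed position through `snap` is a cheap way to enforce the grid rule mechanically instead of by eye.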
The motion principles were the most nuanced to extract because they were expressed as corrections rather than rules. Camera movement establishes context — when you pan to something, you’re saying “look here.” Element animation creates focus — when something scales or fades, it’s asking for attention. These two should never compete. Stagger implies that items in a sequence are related; simultaneous appearance implies they’re a group. Every element entering should have space to breathe before the next one starts.
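Putting stagger and entrance timing together, a per-item progress function at an assumed thirty frames per second might look like this. It is a standalone sketch, not Remotion's `interpolate`; the fifty-millisecond stagger and three-hundred-fifty-millisecond entrance are midpoints of the ranges given earlier:

```typescript
// Staggered entrances at an assumed 30 fps: each item starts 50 ms
// after the previous one and animates over 350 ms with expo-out.
const FPS = 30;
const msToFrames = (ms: number): number => Math.round((ms / 1000) * FPS);

const STAGGER_FRAMES = msToFrames(50);   // gap between items
const ENTRANCE_FRAMES = msToFrames(350); // duration of one entrance

// Progress (0..1) for item `index` at global frame `frame`.
function entranceProgress(frame: number, index: number): number {
  const local = frame - index * STAGGER_FRAMES; // shift by this item's delay
  const t = Math.min(Math.max(local / ENTRANCE_FRAMES, 0), 1);
  return t >= 1 ? 1 : 1 - Math.pow(2, -10 * t); // expo-out
}
```

Because each item's clock is just an offset of the global frame, later items automatically get their breathing room without any per-item state.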
The output
The skill file documented all of this in a structured format: animation timing constants with numeric values and their rationale, a typography scale for 1920×1080 video with all sizes and weights, layout composition rules, scene transition patterns with duration recommendations, sound design guidelines, and a list of anti-patterns to avoid with explanations of why each one looks wrong. Future video sessions load this skill at the start and produce output that already incorporates six sessions of accumulated taste — without needing to re-discover any of it through correction cycles.