Building a development diary for Claude Code

I'd seen a reference to developer diaries — the kind game studios like Paradox publish, the kind individual developers keep in plain text files, the kind that occasionally show up in long-form blog posts where someone explains how they actually built something. I asked Perplexity for a quick summary of the format's history, and what came back was a tidy taxonomy: game-dev diaries, personal coding logs, modern cloud-based variants that auto-generate from version control.

The cloud-based variant is interesting on the surface and disappointing on inspection. Auto-generating documentation from commit messages produces something that nobody reads — because the model of what happened is wrong. Commits are the artifacts of work, not the process of it. They tell you bucket.py was modified at 14:32, not that the first ring-buffer implementation dropped context I wanted to keep, that I spent forty minutes trying to weight events by token count before realising token count isn't a relevance signal, and that I pivoted to a content-aware filter at ingestion. The texture lives in those forty minutes, and git log has none of it.

I believe, for AI-assisted development this matters more than it does for solo human coding. When I sit down with Claude Code, the session contains a reasoning trace — the things tried, the dead ends, the moments I correct course, the sub-agent that came back with timings that didn't match what I'd assumed. None of that survives the session. The next time I open the project, or the next time another agent picks it up, the conversation is gone. The commits remain. The texture is lost.

So I asked Claude what it would take to build a Claude Code skill that captured this — a real diary, written in the voice you'd actually use in a notebook at the end of a working day, not an auto-generated changelog.

What the first design got wrong

The first instinct was structured. Each entry would have sections: Context, Attempts, Outcome, Pivot, Lessons. Frontmatter with tags. A timestamp range. The kind of thing that looks tidy in a directory listing and is easy to grep.

It was wrong. Reading the early sketches felt like reading reports, not journal entries. The structure was doing the work the prose should have done. Real coding journals — the human kind people actually keep — read more like:

Started on the shishi bucket — content-aware event accumulator for tmuxllm. First instinct was a ring buffer with fixed-size window. Wrote about eighty lines and hit a wall: when events arrive in bursts, the window drops context I want to keep, because relevance isn't correlated with recency. Tried weighting by token count next — keep the heavy stuff, drop the rest. Broke on a different axis: short events can be high-relevance (a stack-trace summary line), long events can be padding (a config dump). Token count isn't a relevance signal. Pivoted to a content-aware filter at ingestion. Worked in about two hours after the pivot.

One paragraph. No headers. The structure is carried by the verbs — tried, broke, pivoted — not by section labels. The lesson, if there is one, lives in the final sentence: Token count is a tempting proxy for relevance and it's wrong. That's it. No "Lessons Learned" heading, no bullets.

I switched the design to flowing first-person prose. Entries should read like a developer wrote them, because that's what makes them re-readable months later.

The capture problem

The second wrong turn was assuming Claude could just write entries when it noticed something worth recording. Mid-session, after a pivot, at the end of a session — the model could decide.

That doesn't work. There are two failure modes that pull in opposite directions. The first is the model decides nothing is worth logging and writes nothing — the modal outcome when you ask an agent to "remember to do X" in the middle of doing Y. The second is the model logs everything, mechanically, every successful test run and every file edit. Both produce a diary nobody reads.

What you actually want is deterministic capture of raw events and judgment-driven synthesis into prose. Those are different jobs that should be done by different things.

Claude Code has hooks. Hooks fire on lifecycle events — PostToolUse after every tool call, UserPromptSubmit when the user types something, SubagentStop when a sub-agent task returns, and a handful of others. They run shell commands, not model calls. They execute every single time their conditions are met, regardless of what the model decides. They are the deterministic layer.

So, hooks capture events to a structured log — append-only, one JSON object per line, in .dev-diary/.events.jsonl. The model never decides whether to log; it just happens. Synthesis is the model's job, invoked manually via a slash command at the end of a session, when the full reasoning trace is still in the active context window.

The events.jsonl tells you what happened with timestamps. The session context tells you why and how. Together they let the model write something readable.

Inline versus deferred

The last design tension was when synthesis happens. Two options.

Inline: the user types /diary at the end of a session, and Claude — with the full reasoning trace still in its active context — writes the entry. The texture of why things happened is right there in the conversation. Deferred: the SessionEnd hook fires a small prompt-type hook that synthesizes from events.jsonl alone, or alternatively the next session's first turn reads the previous session's events and writes the entry retrospectively.

Deferred is appealing because it doesn't compete with the user's attention at end of session, and because fresh context produces cleaner prose. But deferred only has the events log, not the reasoning trace. The events log tells you what files were edited and what commands ran, with exit codes; it does not tell you that I pushed back on a design decision and steered to a different approach, or that Claude considered approach A, sketched it out, and rejected it before approach B.

That texture only lives in the active context. So inline won — manual /diary invocation while the session is still alive, with the events log as one input among several, the conversation itself as the other. The hooks still capture events deterministically the whole time, so even if /diary never gets typed, the audit trail is there to reconstruct later.

This also rules out an automatic SessionEnd auto-synthesizer, which would have been the obvious nice-to-have. By the time SessionEnd fires, the main session is gone. A prompt-type hook at that point synthesizes from a stripped context, which produces exactly the auto-generated-changelog feel I wanted to avoid.

Shipping it

The skill is at ikangai/claude-skills/diary. The whole thing is one SKILL.md (voice rules, synthesis workflow, when-to-invoke decision rule), one bash script (log-event.sh, ninety-odd lines, defensive, exits zero on every failure mode so a misconfigured hook never breaks a Claude Code session), and three reference docs (voice examples, hooks setup, events schema).

Five hook events get captured: Write/Edit/MultiEdit (state changes to files), Bash (commands and exit codes — failures are the moments worth retelling later), UserPromptSubmit (user feedback and course-corrections), SubagentStop (sub-agent task returns), and Stop (turn-final messages, optional). Read, Grep, Glob, LS are deliberately not captured — those are reconnaissance, not narrative. Capturing them would bury the events log in noise.

Long fields truncate at five hundred characters. The script wraps every hook payload with a normalized envelope and writes one line to .dev-diary/.events.jsonl. If jq is missing it logs a sentinel event with "error": "jq_not_installed" so the missing dependency is visible without breaking anything. Whole script fits on one screen.

The diary itself lives at .dev-diary/ at the project root, gitignored by default. Entries are YYYY-MM-DD-<slug>.md. The audit trail is .events.jsonl. The synthesis side reads both — the events to anchor the timeline, the active session to fill in the texture — and writes one paragraph of first-person past-tense prose, with the lesson sentence at the end if one was earned.

The diary will document its own continued development. That's the kind of self-referential thing that's only useful if it actually works, so I will find out. v0 is shipped; if entries stop getting written, or get written badly, the skill itself needs work, and that work will be documented in the diary, and that documentation will be the first real test of whether any of this was a good idea.