You wrote it clearly in SKILL.md. "Always run tests before committing." Three commits later your CI is red and the model is explaining why this particular commit was a special case.
You wrote it in SOUL.md. "Never run destructive commands without confirmation." This morning your agent ran rm -rf node_modules/ because it decided that was cleanup, not deletion.
The model isn't disobeying you. It's making a judgment call. Judgment calls fail at scale.
Issue #6 covered SKILL.md and how to engineer descriptions that actually get triggered. Here's what nobody tells you after you write your first skill: skills are suggestions. They're context the model reads and then weighs against everything else competing for attention that turn. Recent measurements put skill-only invocation rates around 20% across complex tasks. Adding hooks pushes that to 84%.
That fourfold reliability gain does not come from a better prompt. It comes from a different layer entirely.
The probabilistic ceiling
Every layer of your bootstrap is probabilistic until proven otherwise:
SOUL.md - personality the model might honor
AGENTS.md - rules the model might follow
SKILL.md - workflows the model might invoke
Subagents - delegations the model might spawn
The model evaluates all of it against the task in front of it. When the task is hard, "complete the task" outweighs "follow the process." This isn't a bug in the model. It's the nature of language models trading off objectives every token.
You cannot fix this with more instructions. Adding another SKILL.md does not make the existing ones more reliable. Writing "ALWAYS" in caps does not make the rule more binding. The ceiling is structural.
Hooks live below that ceiling.
What hooks actually are
A hook is a shell script bound to a lifecycle event in your agent harness. When the event fires, the script runs. Unconditionally. Exit code 0 lets the action proceed. Exit code 2 blocks it. There is no judgment call. There is no "this is a special case." There is a process and an exit code.
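That contract is small enough to exercise by hand. The sketch below is a hypothetical helper, run_hook, that pipes a JSON payload into any hook script on stdin and reports what the exit code means. The helper name and the payloads you feed it are illustrative assumptions, not part of Claude Code.

```shell
# run_hook SCRIPT JSON
# Pipes JSON to an executable hook script on stdin and prints
# "allowed" (exit 0), "blocked" (exit 2), or "error:N" for anything else.
# Hypothetical test helper, not part of Claude Code.
run_hook() {
  local script="$1" payload="$2" rc
  printf '%s' "$payload" | "$script"
  rc=$?
  case "$rc" in
    0) echo "allowed" ;;
    2) echo "blocked" ;;
    *) echo "error:$rc" ;;
  esac
}
```

Any code other than 0 or 2 is reported separately, so a script that crashes is not mistaken for a deliberate block. Pointing it at a real hook, for example run_hook ~/.claude/hooks/blast-door.sh '{"tool_input":{"command":"ls"}}', lets you check a rule before trusting it in a live session.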
Claude Code ships 25 hook events. Most of them you'll never use. These six matter:
UserPromptSubmit - fires when you submit a prompt. Modify or block before the model sees it.
PreToolUse - fires before any tool executes. The primary security checkpoint.
PermissionRequest - fires when the agent asks for permission. Auto-approve or deny.
SessionStart - fires when a session opens. Inject project context, load secrets, set state.
PostToolUse - fires after a tool completes. Run linters, format code, log calls.
PreCompact - fires before context compaction. Back up transcripts you'd otherwise lose.
The pattern is simple: skills tell the model what should happen, hooks make sure it does happen. Both layers, every time.
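As one illustration from the six events above, here is a minimal sketch of the PreCompact idea: copy the session transcript somewhere safe before compaction. It assumes the hook's JSON input carries a transcript_path field; the backup_transcript function and the backup directory are hypothetical names.

```shell
# backup_transcript TRANSCRIPT BACKUP_DIR
# Copies TRANSCRIPT into BACKUP_DIR under a UTC-timestamped name.
# Always returns 0 so a failed backup never blocks compaction.
backup_transcript() {
  local transcript="$1" backup_dir="$2" stamp
  # Nothing to do if the transcript does not exist.
  [ -f "$transcript" ] || return 0
  mkdir -p "$backup_dir" || return 0
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  cp "$transcript" "$backup_dir/$stamp-$(basename "$transcript")" || return 0
  return 0
}
```

Inside a hook script you would likely call it as backup_transcript "$(jq -r '.transcript_path // ""')" "$HOME/.claude/transcript-backups", with jq reading the hook input from stdin.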
The three patterns every builder needs
After running hooks across multiple agents and projects, three patterns do most of the work. Everything else is decoration.
The blast door. A PreToolUse hook that pattern-matches dangerous shell commands and blocks them with exit code 2. rm -rf /, force pushes to main, prod database writes, fork bombs. The model doesn't get to argue. The command never runs.
The blast door catches three categories of failure: the model misjudging a destructive operation as safe, a prompt injection from a web page or document instructing the agent to do something hostile, and your own typos when you're tired. One hook, three threat models neutralized.
The auto-loader. A SessionStart hook that injects fresh project context into every session. Current branch, recent commits, uncommitted changes, current task from your tracker. The model starts each session oriented instead of asking you "what are we working on?"
The auto-loader is the antidote to context amnesia. Skills could load this context, but only if invoked. The hook fires every time, no exceptions. After a week of using one, going back to a vanilla session feels like waking up with selective amnesia.
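The "current task from your tracker" part depends on your tooling. A minimal sketch, assuming you keep the active task in a plain-text file (the print_current_task name and any path you pass it, such as .claude/current-task.md, are hypothetical), might look like this:

```shell
# print_current_task FILE
# Prints a "Current task" section for the injected session context.
# Falls back to a placeholder when FILE is missing or empty.
print_current_task() {
  local task_file="$1"
  echo "Current task:"
  if [ -s "$task_file" ]; then
    # Cap at 10 lines to keep the injected context short.
    head -n 10 "$task_file"
  else
    echo "(none recorded)"
  fi
}
```

If your tracker has a CLI or an API, the same function could wrap that call instead of reading a file.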
The auditor. A PostToolUse hook that appends every tool call to a JSONL log. Tool name, inputs, timestamps, exit codes. When something goes wrong, and it will, you have a trace. When something goes right and you want to replay it as a workflow, you have the receipts.
The auditor is the cheapest insurance you can buy. It costs you nothing on the happy path and saves you hours on the unhappy one.
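To show what the trace buys you, here is a rough sketch that counts calls per tool in one log file. It assumes each log line is compact JSON containing a "tool_name" field; tool_call_counts is a hypothetical name, and a JSON-aware tool such as jq would be sturdier than this grep-based version.

```shell
# tool_call_counts LOG_FILE
# Prints "<tool> <count>" per tool, sorted by tool name.
# Assumes compact JSON lines containing "tool_name":"<name>".
tool_call_counts() {
  local log_file="$1"
  grep -o '"tool_name":"[^"]*"' "$log_file" \
    | cut -d '"' -f 4 \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}
```

A sudden spike in one tool's count is often the first place to look when a session went sideways.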
The compound effect, again
Issue #1 made the case that configuration is the product. Issues 2 through 6 stacked the layers: SOUL, USER, MEMORY, AGENTS, subagents, SKILL.md. Each one moved your agent measurably closer to useful.
Hooks are the last layer because they are the only one that's deterministic. Everything else is the model interpreting your intent. Hooks are your intent enforced by a shell script that doesn't know how to interpret anything.
A configured agent without hooks is a contract without enforcement. A configured agent with hooks is an actual operating discipline.
Ship hooks. Then go back and read your SOUL.md again. You'll find rules you can promote from "the model should" to "the system will." That promotion is the difference between prompting and engineering.
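As one example of that promotion, take the rule from the top of this issue: "Always run tests before committing." The sketch below is a PreToolUse-style check that refuses git commit unless a test command passes. The function names, and the choice to pass the test command in as an argument, are assumptions for illustration.

```shell
# is_git_commit COMMAND
# Succeeds if the shell command appears to invoke `git commit`.
# A sketch: forms such as `git -C dir commit` are not detected.
is_git_commit() {
  printf '%s\n' "$1" \
    | grep -qE '(^|[;&|[:space:]])git[[:space:]]+commit([[:space:]]|$)'
}

# gate_commit COMMAND TEST_COMMAND
# Returns 2 (block) when COMMAND is a git commit and TEST_COMMAND fails.
# Returns 0 (allow) in every other case.
gate_commit() {
  local cmd="$1" test_cmd="$2"
  is_git_commit "$cmd" || return 0
  if ! sh -c "$test_cmd" >/dev/null 2>&1; then
    echo "BLOCKED: tests must pass before committing" >&2
    return 2
  fi
  return 0
}
```

A hook script built on this would read the command from the hook input, call gate_commit with your own test command, and exit with its return code. The model can still decide a commit is a special case; the commit just won't happen until the tests pass.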
The three-hook starter pack
Drop this into .claude/settings.json in your project (or ~/.claude/settings.json for personal use). Then create the three shell scripts referenced below. Restart your agent. You now have blast doors, auto-loading, and full audit logging.
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "~/.claude/hooks/blast-door.sh" }
        ]
      }
    ],
    "SessionStart": [
      {
        "hooks": [
          { "type": "command", "command": "~/.claude/hooks/auto-loader.sh" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "*",
        "hooks": [
          { "type": "command", "command": "~/.claude/hooks/auditor.sh" }
        ]
      }
    ]
  }
}

~/.claude/hooks/blast-door.sh - block dangerous bash before it runs:
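#!/bin/bash
# PreToolUse hook: block dangerous shell commands. Requires jq.
input=$(cat)
command=$(printf '%s' "$input" | jq -r '.tool_input.command // ""')

# Extended regexes, matched case-insensitively (so "drop table" is caught too).
# Single quotes stop the shell from expanding $HOME before grep sees it.
# Note: 'rm -rf /' also blocks rm -rf on any absolute path.
patterns=(
  'rm -rf /'
  'rm -rf ~'
  'rm -rf \$HOME'
  'git push.*--force'
  'git push.* -f( |$)'
  'DROP TABLE'
  'TRUNCATE TABLE'
  ':\(\)\{ :\|:& \};:'
)

for pattern in "${patterns[@]}"; do
  if printf '%s\n' "$command" | grep -qiE -- "$pattern"; then
    echo "BLOCKED: matches dangerous pattern: $pattern" >&2
    exit 2
  fi
done

exit 0

~/.claude/hooks/auto-loader.sh - inject project context on every session: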
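#!/bin/bash
# SessionStart hook: whatever this prints is injected into the session.
echo "## Project context"
echo ""
echo "Branch: $(git branch --show-current 2>/dev/null || echo none)"
echo ""
echo "Recent commits:"
git log --oneline -5 2>/dev/null || echo "(no commits)"
echo ""
echo "Uncommitted changes:"
# git status exits 0 even on a clean tree, so check its output, not its exit code.
changes=$(git status --short 2>/dev/null)
echo "${changes:-(clean)}"
exit 0

~/.claude/hooks/auditor.sh - log every tool call to a daily JSONL file: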
#!/bin/bash
# PostToolUse hook: append each tool call to a daily JSONL log. Requires jq.
input=$(cat)
ts=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
log_dir="$HOME/.claude/audit"
mkdir -p "$log_dir"
# jq -c keeps each record on a single line; the file name uses UTC to match ts.
printf '%s' "$input" | jq -c --arg ts "$ts" '{ts: $ts, event: .}' >> "$log_dir/$(date -u +%Y-%m-%d).jsonl"
exit 0

Make them executable: chmod +x ~/.claude/hooks/*.sh. That's it. Three hooks, ten minutes of setup, a fourfold reliability gain on rule enforcement. Customize the blast door patterns to your stack: your prod database, your secret directories, your never-touch paths. Every team's danger list is different.
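Customizing for never-touch paths can follow the same shape as the blast door. The sketch below checks a file path against a short deny list. The patterns shown (.env files, secrets/, .git/) and the function names are hypothetical examples; a real hook would likely be bound to file-editing tools and read the path from the hook input, assumed here to arrive as .tool_input.file_path.

```shell
# is_protected_path PATH
# Succeeds if PATH matches the never-touch list (example patterns only).
is_protected_path() {
  case "$1" in
    .env|*/.env|.env.*|*/.env.*) return 0 ;;
    secrets/*|*/secrets/*)       return 0 ;;
    .git/*|*/.git/*)             return 0 ;;
    *)                           return 1 ;;
  esac
}

# guard_path PATH
# Returns 2 (block) for protected paths, 0 (allow) otherwise.
guard_path() {
  if is_protected_path "$1"; then
    echo "BLOCKED: $1 is on the never-touch list" >&2
    return 2
  fi
  return 0
}
```

Keeping the list in one small function makes it easy to review in a pull request, which is where a team's danger list should get argued over.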
Skill review: mattpocock/skills
What it is: Matt Pocock's open-source skill and hook pack for Claude Code. Bundles tdd, to-prd, to-issues, grill-me, caveman, and git-guardrails-claude-code. A working demonstration of the skill-plus-hook dual layer this issue is built around.
Setup difficulty: Trivial. One command: npx skills@latest add mattpocock/skills. Restart your agent. Done in under five minutes.
Verdict: This is the package that made the skills-versus-hooks distinction concrete for thousands of builders. tdd enforces phase gates (the Red test must fail before the Green implementation can start) using hooks, not just prompts. git-guardrails-claude-code intercepts force pushes and git reset --hard at the shell level, not by asking the model nicely. It crossed 50,000 GitHub stars within three weeks of going public for exactly this reason.
The standout is grill-me, an interview-style skill that pulls product requirements out of vague feature requests before any code gets written. Pair it with tdd and you have a real dev pipeline: requirements grilled, PRD written, issues created, tests first, implementation last. All version-controlled in your repo.
Watch out for: caveman strips verbose output to save tokens, which is fantastic for mechanical tasks but reportedly degrades reasoning quality on complex logic. Use it on routine work, not on hard architecture problems.
Rating: Essential. The clearest demonstration in the ecosystem of what hooks unlock beyond skills alone.
The model is 20%. Configuration is 60%. Enforcement is the last 20%.
Issue #1 made the case that configuration matters more than model choice. Six issues later, that's still true, but it's incomplete.
Configuration tells your agent what should happen. The model decides whether to follow it. Most of the time it does. Sometimes, on hard tasks, under pressure, it doesn't.
The 20% you've been missing is enforcement. The hook that exits with code 2. The script that runs unconditionally. The audit log that doesn't care about the model's interpretation.
Skills are advisory. Hooks are statutory. You need both. The model needs guidance on what to do, and the system needs guarantees that the non-negotiable parts will happen regardless. A configured agent without hooks is a contract without an enforcement mechanism. It works until it doesn't.
If you've made it through Issues 1 through 6 and built a real bootstrap, you're closer than 95% of people building agents right now. Hooks are the move that takes you from configured to engineered.
See you next week.
Michael
