The Forge
An approval gate is a checkpoint that stops your agent right before it takes an action you can't take back, and holds it there until a human says yes or no. That's the whole idea. It sounds like a small thing. It's the difference between an agent that saves you an hour and an agent that empties a customer's account at 2am while you're asleep.
The last two issues made the agent more autonomous on purpose. Issue #12 moved it off your laptop so it runs on a schedule while your machine is off. Issue #13 gave it a grader so it checks its own work before calling it done. Both are wins. Both also quietly raised the stakes, because an agent that runs unattended and trusts its own verification is an agent that acts on the world without you in the room.
That's fine when every tool it holds is a read. Search a page, fetch a file, look up an order: nothing there you'd lose sleep over. The moment you hand it a tool that does something, the math changes. send_email mails a real customer. issue_refund moves real money. delete_record isn't coming back. The agent doesn't know which of its tools are reversible and which aren't. To the model they're all just tools.
The common take is that the risk with agents is bad output, and that a grader fixes it. That's half right. A grader catches a bad brief before you ship it. It does nothing about a good decision to take an irreversible action that you simply didn't want taken without a look first. Those are different problems, and verification only solves the first one.
Here's the part people miss, with a real number on it. Claude Code's original safety model asked you to approve tools one prompt at a time. Anthropic's own telemetry found users approved roughly 93% of those prompts. A gate that fires on everything doesn't get read. It gets clicked through. So the goal isn't more approvals. It's fewer, on the actions that actually matter, with enough detail that the human reading them actually stops.
The short version: a grader decides whether the work is good. An approval gate decides whether an action is yours to take without asking. You need both, and they live in different places.
The Blueprint
The gate lives at exactly one place: the tool boundary, the moment right after the model asks to call a tool and right before your code runs it. Whether you hand-roll a loop or run this on Managed Agents, that seam is where a human goes in. Managed Agents ships this as a built-in permission callback, but hand-rolling it once shows you exactly what the gate does, so let's build it.
Three pieces: a list of which tools are sensitive, an interceptor in the loop, and an approve function that a human answers.
Step 1: mark the tools that need a human. Reads never make the list. Only actions you can't cleanly undo belong on it.

import anthropic

client = anthropic.Anthropic()

# Reversible tools run freely. Only irreversible or costly ones gate.
SENSITIVE = {"send_email", "issue_refund", "delete_record"}

Step 2: intercept tool_use blocks before executing them. This is the standard tool-use loop with one added check. On a sensitive tool, it calls approve() first. On a no, it hands the model an error result that tells it to stop instead of retrying.

def run_with_gate(messages, tools, dispatch, approve):
    while True:
        resp = client.messages.create(
            model="claude-sonnet-5",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            return resp  # model is done, no tool to run

        results = []
        for block in resp.content:
            if block.type != "tool_use":
                continue
            # The gate: a sensitive tool runs only after a human says yes.
            if block.name in SENSITIVE and not approve(block.name, block.input):
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": (
                        "Denied by a human reviewer. Do not retry this "
                        "action. Explain what you would have done, then stop."
                    ),
                    "is_error": True,
                })
                continue
            # Approved or non-sensitive: execute the tool and return its output.
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": dispatch(block.name, block.input),
            })
        # Each tool_use block gets its tool_result in the next user turn.
        messages.append({"role": "user", "content": results})

Step 3: write the approve function. The one rule that matters: show the arguments, not just the tool name. The danger in issue_refund isn't the verb, it's amount_cents=5000000. A human approving "issue_refund" blind is worse than no gate, because it feels safe.

def approve(name, args):
    print(f"\nThe agent wants to call: {name}")
    for key, value in args.items():
        print(f"  {key} = {value!r}")
    return input("Approve this action? [y/N] ").strip().lower() == "y"

That's the whole gate. A read like read_ticket runs untouched. A send_email stops, prints the recipient and the body, and waits. Say no and the model gets told to stand down and explain itself rather than trying the same thing with slightly different arguments.
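If you want to watch the gate behave before pointing it at a live model, here is a minimal offline sketch. It assumes you pull the per-block logic out of the loop into its own function (gate_blocks is a made-up name, not part of the SDK) and feed it fake blocks built with SimpleNamespace in place of real API responses. The reviewer here is a stub that says no to everything.

```python
from types import SimpleNamespace

SENSITIVE = {"send_email", "issue_refund", "delete_record"}

DENIAL = (
    "Denied by a human reviewer. Do not retry this "
    "action. Explain what you would have done, then stop."
)


def gate_blocks(blocks, dispatch, approve):
    """Apply the gate to one assistant turn and return its tool_result list."""
    results = []
    for block in blocks:
        if block.type != "tool_use":
            continue
        if block.name in SENSITIVE and not approve(block.name, block.input):
            results.append({"type": "tool_result", "tool_use_id": block.id,
                            "content": DENIAL, "is_error": True})
            continue
        results.append({"type": "tool_result", "tool_use_id": block.id,
                        "content": dispatch(block.name, block.input)})
    return results


# Fake blocks standing in for what the model would send back (hypothetical data).
fake_turn = [
    SimpleNamespace(type="text", text="Looking into it."),
    SimpleNamespace(type="tool_use", id="t1", name="read_ticket",
                    input={"ticket_id": 42}),
    SimpleNamespace(type="tool_use", id="t2", name="send_email",
                    input={"to": "customer@example.com", "body": "Refund issued."}),
]

executed = []  # tools that dispatch actually ran
asked = []     # tools the reviewer was asked about


def fake_dispatch(name, args):
    executed.append(name)
    return f"{name} ok"


def deny_all(name, args):
    asked.append(name)
    return False


results = gate_blocks(fake_turn, fake_dispatch, deny_all)
```

With this stub reviewer, the read goes straight to dispatch and never reaches a human, while the email is asked about, denied, and never executed.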
For the autonomous case from Issue #12, where the agent runs while you're asleep, you don't want approve() blocking on input() for eight hours. Swap it for a queue: park the pending action, send yourself a notification, and resume the session when you answer. Managed Agents supports running approvals asynchronously for exactly this so a long review doesn't hold the agent open. Same gate, same seam, the human just answers later.
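Here is one way the park, notify, resume shape could look, as a rough sketch. Everything in it is hypothetical: ApprovalQueue, PendingAction, and the notify callback are names invented for illustration, the dict is an in-memory stand-in for a durable store, and none of this is the Managed Agents API.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class PendingAction:
    tool_name: str
    tool_input: dict
    tool_use_id: str
    status: str = "pending"  # "pending", "approved", or "denied"
    id: str = field(default_factory=lambda: uuid.uuid4().hex)


class ApprovalQueue:
    """In-memory stand-in for a durable store such as a database table."""

    def __init__(self, notify):
        self._items = {}
        self._notify = notify  # assumed callback: email, chat message, pager

    def park(self, tool_name, tool_input, tool_use_id):
        """Record the action, tell the human, and return without waiting."""
        action = PendingAction(tool_name, tool_input, tool_use_id)
        self._items[action.id] = action
        self._notify(action)
        return action.id

    def decide(self, action_id, approved):
        """Called later, when the human answers."""
        action = self._items[action_id]
        if action.status != "pending":
            raise ValueError(f"action {action_id} is already {action.status}")
        action.status = "approved" if approved else "denied"
        return action

    def pending(self):
        return [a for a in self._items.values() if a.status == "pending"]


# Demo: the notification channel is just a list here.
sent = []
queue = ApprovalQueue(notify=sent.append)
action_id = queue.park(
    "issue_refund", {"order_id": "A-1001", "amount_cents": 5000}, "toolu_demo"
)
decided = queue.decide(action_id, approved=False)
```

On resume, you would turn the decided action into the same tool_result the synchronous gate produces, append it to the saved messages, and call the model again. How you persist the messages list between those two moments depends on your setup and is left out here.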
The Anvil
Now the part the demos skip: where approval gates rot, and how to keep them honest.
Gating everything trains you to rubber-stamp. This is the 93% problem, and it's the failure that matters most. If every tool call needs a yes, you stop reading and start clicking, and the gate becomes theater that makes you feel safe while protecting nothing. The fix is discipline about the SENSITIVE set. If you'd be fine finding out an action happened by reading a log the next morning, it doesn't belong on the list. Reserve the gate for the handful of actions you'd genuinely want to have said yes to first.
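One way to keep that discipline honest is to write the morning-log question down next to each tool and derive the set from it, instead of maintaining the set by hand. This is only a sketch: ToolPolicy and its fields are invented for illustration, and lookup_order and add_internal_note are hypothetical tools added to show that not every write needs a gate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    name: str
    log_is_enough: bool  # fine to learn about it from tomorrow's log?
    why: str             # one-line justification, kept for the next reviewer


POLICIES = [
    ToolPolicy("read_ticket", True, "read only"),
    ToolPolicy("lookup_order", True, "read only"),
    ToolPolicy("add_internal_note", True, "a write, but easy to delete"),
    ToolPolicy("send_email", False, "reaches a real customer"),
    ToolPolicy("issue_refund", False, "moves real money"),
    ToolPolicy("delete_record", False, "cannot be restored"),
]

# The gate list is derived, so adding a tool forces an explicit answer.
SENSITIVE = {p.name for p in POLICIES if not p.log_is_enough}
```

The design choice is that a new tool can't join the agent without someone answering the question in writing, which makes a growing gate list something you notice in review.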
Approving the tool name instead of the arguments. A prompt that says "the agent wants to send an email, approve?" is almost useless. Send it to whom, saying what? The argument values are the entire decision. If your approval surface hides to, body, or amount_cents, you've built a gate that a human physically cannot evaluate, so they'll default to yes. Always render the full input.
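A small sketch of what "render the full input" can mean when the arguments are nested. render_request is a made-up helper name, and the order_id and meta fields are hypothetical. It leans on json.dumps so nothing gets summarized away, and falls back to repr for values JSON can't encode.

```python
import json


def render_request(name, args):
    """Return the approval text with every argument shown, nested ones included."""
    lines = [f"The agent wants to call: {name}"]
    body = json.dumps(args, indent=2, sort_keys=True,
                      ensure_ascii=False, default=repr)
    lines.extend("  " + line for line in body.splitlines())
    return "\n".join(lines)


text = render_request(
    "issue_refund",
    {
        "order_id": "A-1001",
        "amount_cents": 5000000,
        "meta": {"reason": "duplicate charge"},
    },
)
print(text)
```

Returning a string instead of printing inside the helper means the same text can go to a terminal, a chat message, or a review queue.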
The model retries a denied action instead of adapting. If your denial result is a bland "not allowed," the model often just calls the same tool again with a slightly reworded argument, and now you're approving the same thing five times. The tool_result you hand back on a denial has to do two things: set is_error true, and tell the model in plain language to stop and explain rather than retry. Vague denials create loops.
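A clear denial message is the main fix. As a backstop, you could also cap how many times a human gets asked about the same tool in one run. The wrapper below is an assumption layered on top of the gate, not something the original loop does, and DenialGuard and guarded_approve are invented names. Note the trade-off: after the cap, even a legitimately different call to that tool is refused for the rest of the run.

```python
class DenialGuard:
    """Counts human denials per tool within a single run."""

    def __init__(self, max_denials=2):
        self.max_denials = max_denials
        self.denials = {}

    def record(self, tool_name):
        self.denials[tool_name] = self.denials.get(tool_name, 0) + 1

    def should_ask(self, tool_name):
        return self.denials.get(tool_name, 0) < self.max_denials


def guarded_approve(approve, guard):
    """Wrap an approve function so repeated denials stop reaching the human."""
    def wrapper(name, args):
        if not guard.should_ask(name):
            return False  # auto-deny without asking again
        approved = approve(name, args)
        if not approved:
            guard.record(name)
        return approved
    return wrapper


# Demo with a stub reviewer who always says no.
asked = []


def always_no(name, args):
    asked.append(args)
    return False


gate = guarded_approve(always_no, DenialGuard(max_denials=2))
answers = [gate("send_email", {"to": "a@example.com", "body": f"try {i}"})
           for i in range(4)]
```

Here the model's third and fourth attempts are refused without the reviewer ever seeing them, so a retry loop costs the human two decisions instead of five.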
Blocking the whole run on a synchronous prompt. A gate built on input() is fine at your desk and a disaster in a scheduled agent. The session sits open holding a sandbox and a context window while it waits on a human who's asleep. For anything unattended, the gate has to be async: park the action, notify, resume. A gate that blocks forever isn't safer, it's just stuck.
The rule of thumb: gate the actions you'd want a paper trail for, and let everything else run. If you'd be comfortable learning an action happened after the fact, it doesn't need a gate. If you'd want to have approved it first, it does. Everything in between is you being indecisive at the model's expense.
Sparks
A few more things worth your attention this week:
Claude Code shipped connector observability in public beta. Admins and owners can now see adoption, errors, latency, and usage across MCP connectors from one place. If you're running agents against internal MCP servers, this is how you find the tool that's silently failing before it becomes a support ticket.
Claude Code's latest release added /rewind to jump a conversation back to before a /clear, hardened MCP OAuth retries, and cut CPU use during streaming by about 37%. Small quality-of-life changes, but the OAuth retry fix matters if your agents authenticate against flaky servers.
Claude in Chrome went generally available, with background notifications and draft PR handoff. A browsing agent that hands you a draft to review instead of merging on its own is the same approval-gate idea in a different shell: it does the work, you keep the last click.
Environment variables in vaults now support CLIs, so a command-line tool can make authenticated requests by registering its API keys against specific domains. Useful the moment your gated action tool is itself a CLI that needs a credential.
The Smith's Take
For two issues we made the agent more independent: off your laptop, checking its own work. This issue is the deliberate counterweight. Autonomy is a setting you turn up on the safe actions and hold back on the ones you can't undo, and an approval gate is how you draw that line in code instead of hoping the model draws it for you.
The builders who get burned aren't the ones whose agents write a bad sentence. A grader catches that. They're the ones who gave an agent a send or a pay or a delete and assumed "it's usually right" was the same as "it's safe to run unattended." It isn't. Right and reversible are different properties, and only one of them is the model's job to get correct.
Take one agent you're running that holds a tool that does something in the real world. Put that tool, and only that tool, behind the gate above, and make the approval prompt print the actual arguments. Then watch what it asks to do. The first time it surfaces an action you would not have approved, the gate has already paid for itself.
Build agents that actually work.
