ClawForge Team · 7 min read
Why AI assistants need a kill switch
A real kill switch is more than a button. It is a heartbeat loop, a fail-secure default, and a policy path that does not depend on every machine being healthy.
The conversation about emergency controls for AI assistants tends to skip the part that actually matters. A red button on a dashboard is easy to draw. A red button that reliably reaches every laptop, every session, and every tool surface across a working fleet — under conditions where something has already started going wrong — is a different problem.
This post is about that second problem: what a kill switch needs to be once an assistant is operating across more than a handful of machines, and what the design has to look like to be useful in an actual incident.
The shape of an incident
Incidents almost never arrive with full context. Someone notices the assistant doing something it should not — a tool call to an environment it should not touch, a credential pattern that should not be in scope, an action that resembles the early steps of something worse. The operator does not yet know how widespread it is, whether the behavior is local or fleet-wide, or whether the model itself is being manipulated.
In that moment, the right move is containment, not investigation. Stop the tool surface, hold the audit trail, and decide what to do once the picture clears. The problem is that containment is only useful if it actually applies — to every running session, to every machine on the fleet, including ones that are currently offline or partially connected.
A kill switch that only works for healthy, well-connected, currently-attentive runtimes is a kill switch that fails exactly when it is needed.
Why per-machine controls do not generalise
The instinct is often to start with per-machine controls. The agent has a config file, the runtime has a flag, the user can disable tools locally. None of that scales, and none of it survives an incident.
A few specific failure modes:
- Coverage drift. Per-machine controls require every machine to be updated. The one machine that did not get the update is the one the incident comes through.
- Race conditions. During an active incident, the agent is acting faster than a human can SSH into a laptop. The window between "operator notices" and "operator can intervene" matters.
- Offline state. A laptop that is not currently connected cannot be reached. When it comes back online, the question is what posture it assumes — and per-machine controls do not have a good answer.
- No audit alignment. Each machine's local mute is its own record. Reconstructing what was suspended where, when, and by whom is painful enough that it usually does not happen.
The operator wants one action that reaches the whole fleet, that survives partial connectivity, and that leaves a single audit record. That is a control-plane problem, not a runtime problem.
What a real kill switch looks like
A kill switch worth deploying has three properties:
- Heartbeat-driven. The runtime checks in with the control plane on a known cadence. Emergency state rides the same loop as policy updates. The runtime does not have to be told it is in an incident — it picks up the new posture on the next heartbeat. That is what gets response time below the threshold where containment is still useful.
- Fail-secure on silence. If the runtime stops hearing from the control plane for too long, the assumption is that something is wrong, not that everything is fine. The runtime defaults to a restricted posture rather than continuing under stale policy. The exact threshold should be configurable, but the default has to fail toward safe.
- Policy-graded, not binary. Real emergencies often need a narrower response than "all assistants off everywhere." A kill switch that can scope to a runtime, an environment, a user, or a tool surface lets the operator contain the suspicious behavior without taking the whole fleet down. The binary version is useful as a last resort, not as the only resort.
All three matter together. A heartbeat without fail-secure means the kill switch silently stops working when the control plane goes quiet. Fail-secure without graded scope means every minor incident is a full outage. Graded scope without a heartbeat means the operator is back to per-machine work.
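The first two properties above can be sketched in a few lines. This is an illustrative sketch, not ClawForge's implementation; the interval, threshold, and posture names are all assumptions.

```python
import time

HEARTBEAT_INTERVAL = 30      # seconds between check-ins (illustrative value)
FAIL_SECURE_THRESHOLD = 180  # silence longer than this forces restricted posture

class Runtime:
    def __init__(self):
        self.posture = "normal"
        self.last_contact = time.monotonic()

    def on_heartbeat(self, control_plane_state):
        # Emergency state rides the same loop as policy updates: each
        # successful check-in simply adopts whatever posture the control
        # plane currently publishes.
        self.last_contact = time.monotonic()
        self.posture = control_plane_state.get("posture", "normal")

    def effective_posture(self):
        # Fail secure on silence: stale policy is treated as unsafe,
        # not as permission to keep operating.
        silence = time.monotonic() - self.last_contact
        if silence > FAIL_SECURE_THRESHOLD:
            return "restricted"
        return self.posture
```

Note that the restriction is computed at the point of use rather than pushed by a command: an offline runtime needs no message to land in the restricted posture, which is what makes the default fail toward safe.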
How ClawForge handles this
The ClawForge model treats the kill switch as a state in the control plane, not a command sent to runtimes.
- Operators change the state through the same console where policy lives. The action is auditable and role-gated; it does not bypass the rest of the operator model.
- The state is published through the heartbeat loop that runtimes already use to pull policy. A runtime that is healthy and connected picks it up on its next check-in. A runtime that is online but slow picks it up on the one after that. A runtime that is offline does not get to keep operating under stale assumptions — once it has been disconnected past the fail-secure threshold, it lands in a restricted posture until it has reconnected and resynced.
- The state is graded. It can suspend specific tool surfaces, specific runtimes, specific sessions, or specific scopes of user identity. The binary "everything off" version is one expression of the same state machine, not a separate code path.
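One way to picture a graded state is as a set of suspension rules that each tool call is checked against. The schema and field names below are hypothetical, chosen only to show how the binary "everything off" can be one expression of the same machinery.

```python
from fnmatch import fnmatch

# Hypothetical emergency state: each rule suspends whatever it matches.
# These keys and patterns are illustrative, not ClawForge's actual schema.
EMERGENCY_STATE = {
    "suspend": [
        {"tool": "shell"},                # one tool surface, fleet-wide
        {"runtime": "laptop-7"},          # one specific runtime
        {"user": "contractor-*"},         # every session for a user scope
        # {} would match everything: the binary "all off" case.
    ],
}

def is_suspended(state, *, runtime, user, tool):
    # A call is suspended if any rule matches every field it names.
    # Rules with fewer fields are broader in scope.
    call = {"runtime": runtime, "user": user, "tool": tool}
    for rule in state.get("suspend", []):
        if all(fnmatch(call[field], pattern) for field, pattern in rule.items()):
            return True
    return False
```

The useful property is that narrowing or widening the response is a data change, not a code change: the operator edits the rule set, and every runtime evaluates the same state on its next heartbeat.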
The result is that an operator response is one action in one place, propagating to every machine the control plane knows about, with a single audit record explaining what was changed, by whom, and when.
What a kill switch is not
It is worth being explicit about what this does not solve.
A kill switch is not a substitute for runtime sandboxing. The assistant should not be able to do dangerous things in the first place; the kill switch is for the cases where something the runtime considered safe turns out not to be.
A kill switch is not a detection system. It is a response surface for a detection that has already happened, whether through prompt-level scanning, anomaly detection, audit review, or a human noticing.
A kill switch is not an excuse to ship with weak default policy. The point is to have an emergency control because emergencies happen, not to push the entire risk model onto the operator's ability to react in real time.
Practical guidance
If you are setting up emergency controls for an AI assistant deployment, a short checklist:
- Pick a heartbeat interval that matches your incident-response target. If you need to contain an incident within minutes, the heartbeat cannot be measured in tens of minutes.
- Decide your fail-secure threshold ahead of time, and write down what "restricted posture" actually means in your environment. The middle of an incident is not the time to be inventing definitions.
- Make sure the audit record for an emergency-state change is at least as detailed as the record for a policy change. The follow-up review will want to know exactly what happened.
- Test the kill switch on a recurring schedule, the way teams test backups. A kill switch that has never been exercised is a kill switch that has never been verified.
- Rehearse the operator side. The control plane action should be small and obvious; the harder part is the human decision to take it, and that benefits from practice.
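The first two checklist items can be written down as configuration and checked mechanically. The values and keys below are illustrative assumptions, not recommended numbers; the point is that the arithmetic relating heartbeat, containment target, and fail-secure threshold is simple enough to assert in CI.

```python
# Illustrative emergency-control config; all values are assumptions.
CONTAINMENT_TARGET_S = 120     # "contain within two minutes"
HEARTBEAT_INTERVAL_S = 30      # worst-case pickup is roughly one interval
FAIL_SECURE_THRESHOLD_S = 180  # silence budget before restricted posture

# Writing down what "restricted posture" means, ahead of the incident.
RESTRICTED_POSTURE = {
    "tools": [],                  # no tool calls while restricted
    "sessions": "read_only",      # existing sessions can be reviewed, not driven
    "resync_required": True,      # must reconnect and pull fresh policy to exit
}

# Sanity checks: the heartbeat must fit inside the containment target
# even if one check-in is missed, and the fail-secure threshold should
# tolerate ordinary jitter without tripping on every slow check-in.
assert 2 * HEARTBEAT_INTERVAL_S <= CONTAINMENT_TARGET_S
assert FAIL_SECURE_THRESHOLD_S >= 2 * HEARTBEAT_INTERVAL_S
```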
The kill switch is not the most exciting part of an AI governance model, and it should not be the marketing centrepiece. But it is the part that determines whether the rest of the model holds up the day something goes wrong.