Why three modes instead of a boolean
A binary AUTH_ENFORCED=true/false kills the
migration path. In a real deployment you turn auth on
after agents are already running. If the flip is
binary, the day you enforce is the day half your agents
break because somebody forgot to wire their signing client.
| Mode | Behaviour | Use case |
|---|---|---|
| off | No-op middleware. Unsigned requests flow straight through. Key-server isn't consulted. | Solo user, single host, home network, CI smoke. |
| observe | Unsigned / invalid requests still pass, but their rejection reason is logged. | Migration from unsigned to signed. Watch the log, fix offending callers, then flip. |
| enforce | Signed requests pass. Unsigned ones return 401; 503 if key-server is down. | Multi-agent deployment, shared host, production. |
AUTH_MODE can differ between blocks during
migration — you might run consciousness-server
in enforce while test-runner stays
in observe for one stubborn caller.
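To make the table concrete, here is a minimal sketch of how a middleware might turn a mode plus a verification result into a decision. The names `AuthMode`, `decide`, and the `key_server_unavailable` reason are hypothetical, not taken from the project. How observe behaves while key-server is down is also an assumption: this sketch lets the request pass and logs the outage.

```python
from enum import Enum
from typing import Optional, Tuple


class AuthMode(Enum):
    OFF = "off"
    OBSERVE = "observe"
    ENFORCE = "enforce"


# (reject_status, reason_to_log); reject_status None means "pass the request on".
Decision = Tuple[Optional[int], Optional[str]]


def decide(mode: AuthMode, failure: Optional[str], key_server_up: bool = True) -> Decision:
    """Map a verification outcome to a decision.

    `failure` is None when the signature verified, otherwise a reason string
    such as "missing_headers" or "bad_signature".
    """
    if mode is AuthMode.OFF:
        # No-op: nothing is checked and nothing is logged.
        return None, None
    if mode is AuthMode.OBSERVE:
        # Never reject; only record what enforce would have rejected.
        # Assumption: an unreachable key-server is logged, not fatal.
        if not key_server_up:
            return None, "key_server_unavailable"
        return None, failure
    # ENFORCE: 503 when keys cannot be checked, 401 on any failure.
    if not key_server_up:
        return 503, "key_server_unavailable"
    if failure is not None:
        return 401, failure
    return None, None
```

The useful property is that observe and enforce share the same verification path and differ only in the final decision, so a clean observe log is a fair prediction of what enforce will do.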
1. Generate one key pair per agent
Every agent gets its own ed25519 key pair. Run this on the host where the agent will live, so the private key never travels:
# One key pair per agent. Run this on the host where the agent lives.
ssh-keygen -t ed25519 -C "ecosystem-scribe" \
-f ~/.ssh/ecosystem-scribe -N ""
# Result: two files:
# ~/.ssh/ecosystem-scribe (private: keep on agent host)
# ~/.ssh/ecosystem-scribe.pub (public: publish to key-server)
-N "" means no passphrase. If your agents need
to start unattended (a worker container, a systemd unit),
this is the realistic choice: the security boundary becomes
the host filesystem rather than a passphrase prompt.
2. Bootstrap public keys on key-server
Authentication works by checking that
X-Agent: <name> in the request header
maps to a public key the key-server already knows. The
mapping is just files on disk:
# On the host running key-server, drop every agent's pub key into
# the agents/ directory. The key-server picks them up on next request;
# no restart needed.
scp ~/.ssh/ecosystem-scribe.pub \
operator@key-server-host:/opt/ecosystem/key-server/keys/agents/scribe.pub
# Repeat for every agent that should authenticate.
Every .pub file in
key-server/keys/agents/ defines an agent that
can authenticate. No database, no admin UI; the file is
authoritative. Removing the file revokes the agent on the
next request.
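The "file is authoritative" rule can be sketched as a lookup that reads the directory on every call. This illustrates the idea and is not the key-server's actual code: `lookup_public_key` and the agent-name pattern are assumptions. The name check is there so that an `X-Agent` value such as `../something` cannot escape the agents directory.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming rule: letters, digits, underscore, hyphen only.
AGENT_NAME = re.compile(r"[A-Za-z0-9_-]+")


def lookup_public_key(agents_dir: Path, agent: str) -> Optional[str]:
    """Return the contents of <agents_dir>/<agent>.pub, or None if unknown."""
    if not AGENT_NAME.fullmatch(agent):
        return None  # rejects path separators, dots, empty names
    try:
        # Read from disk each time: adding or deleting the file takes
        # effect on the next request, with no restart and no cache to flush.
        return (agents_dir / f"{agent}.pub").read_text().strip()
    except FileNotFoundError:
        return None
```

Under this sketch, `lookup_public_key(Path("/opt/ecosystem/key-server/keys/agents"), "scribe")` returns the key line while `scribe.pub` exists and `None` once it is removed.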
3. Flip to observe and watch the log
Now turn auth on without breaking anything:
# Flip every block from off to observe. Restart so env takes effect.
AUTH_MODE=observe docker compose up -d
# Watch the observation log — each line is a request that would
# have been rejected under enforce.
tail -f deploy/volumes/*-logs/auth-observe.log
# Reasons you'll see, with what to fix:
# missing_headers caller isn't signing yet
# unknown_agent signing but with a key not bootstrapped
# bad_signature protocol mismatch in caller's signing code
# timestamp_out_of_window caller's clock is drifting (NTP it)
# nonce_replayed caller is reusing nonces (must rotate)
Iterate until the log stays clean for a few days of normal traffic. Clean means: every entry is a known deliberately-unsigned caller (a health-check probe, a local debugging script), not a real production agent.
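While iterating, a per-reason tally is often easier to act on than a raw tail. The log format is not specified here, so the sketch below assumes only that each line contains one of the reason tokens listed above as plain text; `count_reasons` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable

# The rejection reasons listed above.
REASONS = (
    "missing_headers",
    "unknown_agent",
    "bad_signature",
    "timestamp_out_of_window",
    "nonce_replayed",
)


def count_reasons(lines: Iterable[str]) -> Counter:
    """Count log lines per rejection reason (first matching token wins)."""
    counts: Counter = Counter()
    for line in lines:
        for reason in REASONS:
            if reason in line:
                counts[reason] += 1
                break
    return counts
```

Usage would look like `with open(path) as f: print(count_reasons(f).most_common())`, run once per observe log file.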
4. Flip to enforce
# Once auth-observe.log stays clean for a couple of days, flip:
AUTH_MODE=enforce docker compose up -d
# Roll back instantly if anything goes wrong:
AUTH_MODE=off docker compose up -d
# No state migration needed. The keys you generated stay valid;
# the system simply stops checking them.
From this point unsigned callers get a hard 401.
The off escape hatch is one env var away — it
requires no state migration, no key revocation, no restart
of any external system.
What "signing a request" looks like in code
Cortex and Claude Code already sign their CS calls when configured. For your own clients, the protocol is in SIGNING-PROTOCOL.md. The Python implementation is short:
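import base64, json, secrets, time
import requests
from cryptography.hazmat.primitives.serialization import load_ssh_private_key
# ssh-keygen writes an OpenSSH-format key file, not a raw 32-byte seed,
# so it has to be parsed rather than passed straight to a signing key.
with open("/home/scribe/.ssh/ecosystem-scribe", "rb") as f:
    priv = load_ssh_private_key(f.read(), password=None)
def sign_request(method, path, body_bytes=b""):
    ts = str(int(time.time()))
    nonce = base64.urlsafe_b64encode(secrets.token_bytes(16)).decode()
    canonical = f"{method}\n{path}\n{ts}\n{nonce}\n".encode() + body_bytes
    sig = priv.sign(canonical)  # 64-byte ed25519 signature
    return {
        "X-Agent": "scribe",
        "X-Timestamp": ts,
        "X-Nonce": nonce,
        "X-Signature": base64.urlsafe_b64encode(sig).decode(),
    }
# Then on every request (payload and CS, the server base URL, are yours).
# Send the exact bytes that were signed so the server sees the same body.
body = json.dumps(payload).encode()
headers = sign_request("POST", "/api/notes", body)
headers["Content-Type"] = "application/json"
requests.post(f"{CS}/api/notes", data=body, headers=headers)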
The canonical string is
METHOD\nPATH\nTIMESTAMP\nNONCE\nBODY. The
server reconstructs it from the request line, headers, and body,
then verifies the ed25519 signature against the public key
mapped to X-Agent. Replays are blocked by a
short-lived nonce cache; clock drift over ~60s is rejected.
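The server-side checks described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's middleware: `NonceCache`, `check_request`, and the nonce TTL are hypothetical, and `verify_sig` is a caller-supplied function standing in for the key lookup plus ed25519 verification. The 60-second window mirrors the figure in the text; the real value is configurable.

```python
import time
from typing import Callable, Dict, Mapping, Optional, Tuple

WINDOW_SECONDS = 60               # mirrors the ~60s drift limit
NONCE_TTL = 2 * WINDOW_SECONDS    # assumption: outlive the acceptance window


class NonceCache:
    """Remember (agent, nonce) pairs until they expire."""

    def __init__(self, ttl: float = NONCE_TTL) -> None:
        self.ttl = ttl
        self._seen: Dict[Tuple[str, str], float] = {}

    def check_and_store(self, agent: str, nonce: str, now: float) -> bool:
        # Drop expired entries, then refuse anything already seen.
        self._seen = {k: exp for k, exp in self._seen.items() if exp > now}
        if (agent, nonce) in self._seen:
            return False
        self._seen[(agent, nonce)] = now + self.ttl
        return True


def canonical_bytes(method: str, path: str, ts: str, nonce: str, body: bytes) -> bytes:
    return f"{method}\n{path}\n{ts}\n{nonce}\n".encode() + body


def check_request(method: str, path: str, headers: Mapping[str, str], body: bytes,
                  cache: NonceCache,
                  verify_sig: Callable[[str, bytes, str], bool],
                  now: Optional[float] = None) -> Optional[str]:
    """Return None if the request is acceptable, else a rejection reason."""
    now = time.time() if now is None else now
    if any(h not in headers for h in ("X-Agent", "X-Timestamp", "X-Nonce", "X-Signature")):
        return "missing_headers"
    try:
        ts = int(headers["X-Timestamp"])
    except ValueError:
        return "timestamp_out_of_window"
    if abs(now - ts) > WINDOW_SECONDS:
        return "timestamp_out_of_window"
    msg = canonical_bytes(method, path, headers["X-Timestamp"], headers["X-Nonce"], body)
    if not verify_sig(headers["X-Agent"], msg, headers["X-Signature"]):
        return "bad_signature"
    # Store the nonce only after the signature verifies, so unauthenticated
    # traffic cannot fill the cache or burn a legitimate caller's nonce.
    if not cache.check_and_store(headers["X-Agent"], headers["X-Nonce"], now):
        return "nonce_replayed"
    return None
```

Keeping the nonce TTL longer than the timestamp window matters: a replayed request is then rejected either by the cache or, once the cache entry has expired, by the timestamp check.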
Hardening checklist
- Keep port 3040 (key-server) off the public internet. Loopback, VPN, or localhost-bind only — it dispenses secrets.
- Set the IP allow-list on key-server even on a trusted LAN. CIDR-style; one line per peer.
- Audit the audit log: deploy/volumes/key-server-logs/audit.jsonl is structured JSONL. Tail-and-alert on it.
- NTP every host. Signed requests are rejected if the clock drifts beyond the configured window.
- Rotate keys when a host is decommissioned: delete the .pub on key-server, regenerate on the agent host.
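For the allow-list item, a CIDR check with one entry per line can be sketched with Python's standard `ipaddress` module. The file format details are assumptions (blank lines and `#` comments are skipped here), and `parse_allow_list` and `is_allowed` are hypothetical names, not key-server functions.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_allow_list(lines: Iterable[str]) -> List[Network]:
    """Parse one CIDR block or bare address per line."""
    networks: List[Network] = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()  # assumed comment syntax
        if entry:
            # strict=False accepts host bits, e.g. 10.0.0.5/24 -> 10.0.0.0/24;
            # a bare address becomes a single-host network.
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_allowed(peer_ip: str, allowed: Iterable[Network]) -> bool:
    """True if peer_ip falls inside any allowed network."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in allowed)
```

With the entries `10.0.0.0/24` and `192.168.1.7`, `is_allowed("10.0.0.42", nets)` is true while `is_allowed("192.168.1.8", nets)` is false.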
Full threat model: see consciousness-server/SECURITY.md.