Secure deployment — BuildOnAI Docs

Why three modes instead of a boolean

A binary AUTH_ENFORCED=true/false kills the migration path. In a real deployment you turn auth on after agents are already running. If the flip is binary, the day you enforce is the day half your agents break because somebody forgot to wire their signing client.

Mode	Behaviour	Use case
`off`	No-op middleware. Unsigned requests flow straight through. Key-server isn't consulted.	Solo user, single host, home network, CI smoke.
`observe`	Unsigned or invalid requests still pass, but the reason each would have been rejected is logged.	Migration from unsigned to signed. Watch the log, fix offending callers, then flip.
`enforce`	Validly signed requests pass. Unsigned or invalid ones are rejected with `401`, or `503` if key-server is down.	Multi-agent deployment, shared host, production.

AUTH_MODE can differ between blocks during migration — you might run consciousness-server in enforce while test-runner stays in observe for one stubborn caller.
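
To make the three modes concrete, here is a minimal sketch of the decision a middleware could make per request. It is not the actual middleware: the function name `decide`, the `failure` argument, the `key_server_unavailable` reason, and the choice to check key-server availability before anything else are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional, Tuple


class AuthMode(Enum):
    OFF = "off"
    OBSERVE = "observe"
    ENFORCE = "enforce"


def decide(mode: AuthMode, failure: Optional[str],
           key_server_up: bool = True) -> Tuple[Optional[int], Optional[str]]:
    """Return (status, reason_to_log) for one request.

    status is None when the request is passed on to the handler.
    failure is None for a validly signed request, otherwise a reason
    code such as "missing_headers" (hypothetical verifier output).
    """
    if mode is AuthMode.OFF:
        # No-op: nothing is verified and key-server is not consulted.
        return None, None
    if mode is AuthMode.OBSERVE:
        # Let the request through, but record why enforce would refuse it.
        return None, failure
    # enforce (assumption: availability is checked first)
    if not key_server_up:
        return 503, "key_server_unavailable"  # hypothetical reason name
    if failure is not None:
        return 401, failure
    return None, None
```

The point of the middle branch is that `observe` runs the same verification as `enforce` and only differs in what it does with the result, which is what makes the log a faithful preview of enforcement.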

1. Generate one key pair per agent

Every agent gets its own ed25519 key pair. Run this on the host where the agent will live, so the private key never travels:

# One key pair per agent. Run this on the host where the agent lives.
ssh-keygen -t ed25519 -C "ecosystem-scribe" \
  -f ~/.ssh/ecosystem-scribe -N ""

# Result: two files —
#   ~/.ssh/ecosystem-scribe         (private — keep on agent host)
#   ~/.ssh/ecosystem-scribe.pub     (public  — publish to key-server)

-N "" means no passphrase. If your agents need to start unattended (a worker container, a systemd unit), this is the realistic choice — the security boundary becomes the host filesystem rather than a passphrase prompt.

2. Bootstrap public keys on key-server

Authentication works by checking that X-Agent: <name> in the request header maps to a public key the key-server already knows. The mapping is just files on disk:

# On the host running key-server, drop every agent's pub key into
# the agents/ directory. The key-server picks them up on next request;
# no restart needed.
scp ~/.ssh/ecosystem-scribe.pub \
    operator@key-server-host:/opt/ecosystem/key-server/keys/agents/scribe.pub

# Repeat for every agent that should authenticate.

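Every .pub file in key-server/keys/agents/ defines an agent that can authenticate. The file name, minus .pub, is the agent name callers send in X-Agent, so scribe.pub above corresponds to X-Agent: scribe. No database, no admin UI; the file is authoritative. Removing the file revokes the agent on the next request.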
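
The file-is-authoritative lookup can be sketched in a few lines of standard-library Python. This is an illustration only, not key-server's code: the function names are hypothetical, and it assumes each file holds a single OpenSSH-format `ssh-ed25519` line as produced by ssh-keygen.

```python
import base64
import struct
from pathlib import Path
from typing import Optional


def parse_ssh_ed25519_pub(line: str) -> bytes:
    """Extract the raw 32-byte ed25519 public key from an OpenSSH .pub line."""
    key_type, blob_b64 = line.split()[:2]
    if key_type != "ssh-ed25519":
        raise ValueError(f"unsupported key type: {key_type}")
    blob = base64.b64decode(blob_b64)
    # The blob is two length-prefixed strings: the type name, then the key.
    fields = []
    offset = 0
    while offset < len(blob):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        fields.append(blob[offset:offset + length])
        offset += length
    if len(fields) != 2 or fields[0] != b"ssh-ed25519" or len(fields[1]) != 32:
        raise ValueError("malformed ssh-ed25519 public key")
    return fields[1]


def lookup_agent_key(agents_dir: Path, agent: str) -> Optional[bytes]:
    """Return the agent's public key, or None if no <agent>.pub file exists."""
    # Reject names that could escape the directory (e.g. "../x").
    if not agent or Path(agent).name != agent:
        return None
    pub_file = agents_dir / f"{agent}.pub"
    if not pub_file.is_file():
        return None  # unknown or revoked agent
    return parse_ssh_ed25519_pub(pub_file.read_text())
```

Because the directory is read on every lookup, adding or deleting a file takes effect on the next request, which matches the no-restart behaviour described above.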

3. Flip to observe and watch the log

Now turn auth on without breaking anything:

# Flip every block from off to observe. Restart so env takes effect.
AUTH_MODE=observe docker compose up -d

# Watch the observation log — each line is a request that would
# have been rejected under enforce.
tail -f deploy/volumes/*-logs/auth-observe.log

# Reasons you'll see, with what to fix:
#   missing_headers           caller isn't signing yet
#   unknown_agent             signing but with a key not bootstrapped
#   bad_signature             protocol mismatch in caller's signing code
#   timestamp_out_of_window   caller's clock is drifting (NTP it)
#   nonce_replayed            caller is reusing nonces (each request needs a fresh one)

Iterate until the log stays clean for a few days of normal traffic. Clean means: every entry is a known deliberately-unsigned caller (a health-check probe, a local debugging script), not a real production agent.
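
When the log is busy, a tally per reason is easier to act on than a raw tail. The sketch below counts the reason codes listed above. The log line format is not documented here, so the code assumes only that each line contains one of those codes somewhere as plain text; `tally_reasons` is a hypothetical helper name.

```python
import glob
from collections import Counter
from typing import Iterable

# Reason codes listed in this guide.
REASONS = (
    "missing_headers",
    "unknown_agent",
    "bad_signature",
    "timestamp_out_of_window",
    "nonce_replayed",
)


def tally_reasons(lines: Iterable[str]) -> Counter:
    """Count how often each reason code appears in the given log lines."""
    counts = Counter()
    for line in lines:
        for reason in REASONS:
            if reason in line:  # assumption: the code appears verbatim
                counts[reason] += 1
                break  # count at most one reason per log line
    return counts


if __name__ == "__main__":
    # Same glob as the tail command above; prints one summary per log file.
    for path in glob.glob("deploy/volumes/*-logs/auth-observe.log"):
        with open(path) as log:
            print(path, dict(tally_reasons(log)))
```

A count that is dominated by `missing_headers` points at callers that have not been migrated yet, while `bad_signature` or `timestamp_out_of_window` points at callers that are signing but need a fix.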

4. Flip to enforce

# Once auth-observe.log has stayed clean for a few days, flip:
AUTH_MODE=enforce docker compose up -d

# Roll back instantly if anything goes wrong:
AUTH_MODE=off docker compose up -d
# No state migration needed. The keys you generated stay valid;
# the system simply stops checking them.

From this point unsigned callers get a hard 401. The off escape hatch is one env var away — it requires no state migration, no key revocation, no restart of any external system.

What "signing a request" looks like in code

Cortex and Claude Code already sign their consciousness-server (CS) calls when configured. For your own clients, the protocol is in SIGNING-PROTOCOL.md. A Python implementation is short:

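import base64, json, secrets, time

import requests
from cryptography.hazmat.primitives import serialization
from nacl.signing import SigningKey

# ssh-keygen writes an OpenSSH-format file, not a raw seed, so parse it
# (here with the cryptography package) and give PyNaCl the 32-byte seed.
with open("/home/scribe/.ssh/ecosystem-scribe", "rb") as f:
    ssh_key = serialization.load_ssh_private_key(f.read(), password=None)
priv = SigningKey(ssh_key.private_bytes(
    serialization.Encoding.Raw,
    serialization.PrivateFormat.Raw,
    serialization.NoEncryption(),
))

def sign_request(method, path, body_bytes=b""):
    ts = str(int(time.time()))
    nonce = base64.urlsafe_b64encode(secrets.token_bytes(16)).decode()
    # Canonical string: METHOD\nPATH\nTIMESTAMP\nNONCE\n followed by the raw body.
    canonical = f"{method}\n{path}\n{ts}\n{nonce}\n".encode() + body_bytes
    sig = priv.sign(canonical).signature
    return {
        "X-Agent": "scribe",
        "X-Timestamp": ts,
        "X-Nonce": nonce,
        "X-Signature": base64.urlsafe_b64encode(sig).decode(),
    }

# Then on every request (CS is the consciousness-server base URL, payload a dict).
# Serialize once and send those exact bytes, so the signed body and the
# transmitted body cannot differ.
body = json.dumps(payload).encode()
headers = sign_request("POST", "/api/notes", body)
headers["Content-Type"] = "application/json"
requests.post(f"{CS}/api/notes", data=body, headers=headers)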

The canonical string is METHOD\nPATH\nTIMESTAMP\nNONCE\nBODY, with no spaces around the newlines. The server reconstructs it from the headers and request line, then verifies the ed25519 signature against the public key mapped to X-Agent. Replays are blocked by a short-lived nonce cache; clock drift over ~60s is rejected.
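
The replay and clock-drift checks can be sketched with the standard library alone. This is one possible shape, not the server's implementation: the `ReplayGuard` class, the per-agent keying of nonces, and the exact 60-second default are assumptions for illustration, and signature verification itself is left out.

```python
import time
from typing import Dict, Optional, Tuple


class ReplayGuard:
    """Reject stale timestamps and nonces already seen inside the window."""

    def __init__(self, window_seconds: int = 60):
        self.window = window_seconds
        # (agent, nonce) -> time after which the entry can be forgotten
        self._seen: Dict[Tuple[str, str], float] = {}

    def check(self, agent: str, timestamp: str, nonce: str,
              now: Optional[float] = None) -> Optional[str]:
        """Return None if the request is fresh, else a reason code."""
        now = time.time() if now is None else now
        try:
            ts = int(timestamp)
        except ValueError:
            return "timestamp_out_of_window"
        if abs(now - ts) > self.window:
            return "timestamp_out_of_window"
        # Drop expired entries so the cache stays short-lived.
        self._seen = {k: exp for k, exp in self._seen.items() if exp > now}
        key = (agent, nonce)
        if key in self._seen:
            return "nonce_replayed"
        # Keep the nonce until its timestamp can no longer pass the window check.
        self._seen[key] = ts + self.window + 1
        return None
```

A nonce only has to be remembered for as long as its timestamp would still be accepted, which is why the cache can stay small. In a real verifier this check would run after the signature has been validated, so that unauthenticated traffic cannot fill the cache.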

Hardening checklist

Keep port 3040 (key-server) off the public internet. Loopback, VPN, or localhost-bind only — it dispenses secrets.
Set the IP allow-list on key-server even on a trusted LAN. CIDR-style; one line per peer.
Audit the audit log — deploy/volumes/key-server-logs/audit.jsonl is structured JSONL. Tail-and-alert on it.
NTP every host. Signed requests are rejected if the clock drifts beyond the configured window.
Rotate keys when a host is decommissioned: delete the .pub on key-server, regenerate on the agent host.
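
For the allow-list item in the checklist above, a CIDR check is short to express with Python's `ipaddress` module. The sketch assumes a plain-text list with one CIDR entry per line; the `#` comment syntax and both function names are hypothetical, not key-server's actual configuration format.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_allow_list(lines: Iterable[str]) -> List[Network]:
    """Parse one CIDR entry per line, skipping blanks and # comments."""
    networks = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            # strict=False accepts host-style entries such as 10.0.0.5/24.
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_allowed(peer_ip: str, allow_list: Iterable[Network]) -> bool:
    """True if peer_ip falls inside any allowed network."""
    addr = ipaddress.ip_address(peer_ip)
    return any(addr in net for net in allow_list)
```

An entry without a prefix length, such as `127.0.0.1`, is treated as a single host, so a list can mix individual peers and whole subnets.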

Full threat model: see consciousness-server/SECURITY.md.

From AUTH_MODE=off to enforce, without breaking live agents.