Prerequisites
Run through the Quickstart
first — you need consciousness-server up on
:3032. Verify with:
curl -s http://localhost:3032/health | jq
If that returns "status": "ok", you're ready.
AUTH_MODE=off (the default) is fine for this
guide.
Step 1 — Clone Cortex
Cortex lives in its own repository. Clone it anywhere on
your machine — it does not need to sit next to
consciousness-server.
git clone https://github.com/build-on-ai/cortex.git
cd cortex
Step 2 — Make sure Ollama is ready
Cortex routes prompts through Ollama on the host. Pull at least one model that supports tool calling.
# Cortex needs Ollama on the host. If you don't already have it:
curl -fsSL https://ollama.com/install.sh | sh
ollama pull gemma4:e4b # 3 GB, runs on CPU
# or
ollama pull gemma4:26b # 17 GB, needs a GPU
Step 3 — Run Cortex
./run.sh agent
On first run, run.sh creates a Python virtualenv
and installs dependencies (about a minute). Then you should
see the banner:
+==========================================+
| CORTEX | gemma4:e4b |
| Local AI Agent |
+==========================================+
Type /help for commands
+ Policy Engine loaded (10 tools)
+ Recovery Engine (fallback: none)
+ Context Compactor (limit: 16000 tokens)
+ Discovered Consciousness Server at http://localhost:3032
+ Briefing from Consciousness Server loaded
>
The line + Discovered Consciousness Server at
http://localhost:3032 means the auto-discovery probe
succeeded. Cortex is now registered with CS, the briefing
loaded, and tools that hit CS endpoints (notes, tasks,
memory) work without further configuration.
Type a message at the > prompt — it goes
through the local model. Try
/status to see the live picture.
CS on a different host?
Auto-discovery only checks localhost:3032. If
your CS lives on another machine on the LAN, set
CS_URL explicitly. AGENT_NAME is
the identifier other agents see; give each Cortex
instance a unique one if you run several:
CS_URL=http://10.0.0.5:3032 AGENT_NAME=cortex-laptop ./run.sh agent
Three agents coordinating through CS
The promise of the ecosystem isn't one Cortex talking to a memory store — it's many agents sharing state and dropping tasks for each other. The simplest demo: two workers polling for work, plus an interactive operator.
tmux new-session -d -s cortex-demo
# Pane 1 — autonomous worker
tmux send-keys -t cortex-demo \
  "AGENT_NAME=worker-A ./run.sh worker" Enter
# Pane 2 — second worker
tmux split-window -t cortex-demo -h
tmux send-keys -t cortex-demo \
  "AGENT_NAME=worker-B ./run.sh worker" Enter
# Pane 3 — operator (interactive CLI, drops tasks)
tmux split-window -t cortex-demo -v
tmux send-keys -t cortex-demo \
  "AGENT_NAME=operator ./run.sh agent" Enter
tmux attach -t cortex-demo
You'll see three panes. Each worker registers with CS on
start, sends a heartbeat every few seconds, and polls
/api/tasks/pending/<AGENT_NAME>. The
operator is the same Cortex CLI you used in Step 3.
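To see what a worker would find on its next poll, you can query the same path by hand. This is a rough sketch, not something Cortex ships: the helper names pending_url and peek_pending are made up for illustration. It also assumes that a GET on /api/tasks/pending/<AGENT_NAME> only lists tasks and returns JSON. If that request claims tasks on your CS version, do not run it against a live queue.

```shell
# Build the polling URL for one agent. The path comes from the text above;
# the base URL is $2 if given, else CS_URL, else the default local address.
pending_url() {
  printf '%s/api/tasks/pending/%s\n' \
    "${2:-${CS_URL:-http://localhost:3032}}" "$1"
}

# Print the pending queue for every agent name passed as an argument.
# Assumption: the endpoint is read-only and answers with JSON.
peek_pending() {
  for agent in "$@"; do
    url=$(pending_url "$agent")
    printf '== %s ==\n' "$agent"
    if body=$(curl -fsS --max-time 2 "$url" 2>/dev/null); then
      printf '%s\n' "$body" | jq .
    else
      printf '(no response from %s)\n' "$url"
    fi
  done
}
```

Calling `peek_pending worker-A worker-B` from a fourth terminal prints one section per worker.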
Drop a task and watch it get picked up
# From any terminal — drop a task into the queue.
curl -s -X POST http://localhost:3032/api/tasks \
  -H 'Content-Type: application/json' \
  -d '{
    "title": "summarise the README",
    "description": "Read README.md, write 3-bullet summary as a CS note.",
    "assigned_to": "worker-A"
  }'
# Watch worker-A pick it up — within ~5 s the task status flips
# to "in_progress", and on completion a note appears.
curl -s "http://localhost:3032/api/tasks?assigned_to=worker-A" | jq
curl -s "http://localhost:3032/api/notes?agent=worker-A" | jq
Within a polling cycle (5 s by default) worker-A
claims the task, executes it through its local model, and
publishes a note with the result. Repeat with
assigned_to: "worker-B" to balance work, or
assign to operator for a human-in-the-loop
step.
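Alternating assigned_to by hand gets tedious after a few tasks. The sketch below shows one possible way to spread tasks round-robin across named workers. pick_worker and drop_task are hypothetical helpers, not Cortex commands. The JSON fields are the three used in the request above, and jq builds the payload so that quotes in a title or description are escaped correctly.

```shell
# pick_worker N worker...: print the worker for task number N (0-based),
# cycling through the listed names in order.
pick_worker() {
  n="$1"
  shift
  [ "$#" -gt 0 ] || return 1
  shift $(( $n % $# ))
  printf '%s\n' "$1"
}

# drop_task TITLE DESCRIPTION ASSIGNEE: POST one task to CS.
# Assumption: CS_URL, if set, is the base URL of your CS instance.
drop_task() {
  payload=$(jq -n --arg t "$1" --arg d "$2" --arg a "$3" \
    '{title: $t, description: $d, assigned_to: $a}') || return 1
  curl -fsS -X POST "${CS_URL:-http://localhost:3032}/api/tasks" \
    -H 'Content-Type: application/json' \
    -d "$payload"
}
```

For instance, `drop_task "summarise the README" "Read README.md, write 3-bullet summary as a CS note." "$(pick_worker 0 worker-A worker-B)"` sends the first task to worker-A, and passing 1 instead of 0 selects worker-B.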
Verify the fleet
curl -s http://localhost:3032/api/agents | jq
# You'll see worker-A, worker-B, and operator each with a recent
# heartbeat timestamp and "online" status.
This is exactly the pattern for a production fleet: just add more panes (or more machines via Multi-machine fleet) and assign tasks to whichever agent has the right model loaded.
Auto-discovery details
On startup, Cortex checks
http://localhost:3032/health with a 1-second
timeout. If CS responds, that becomes CS_URL.
If the probe fails for any reason (CS not running, wrong port,
network error), CS_URL stays empty and Cortex runs in
standalone mode: nothing else degrades, but CS-backed tools are unavailable.
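The resolution order in this section can be written down as a few lines of shell. This is a sketch of the documented behaviour, not Cortex's actual code: resolve_cs_url and probe_health are invented names. It also makes two guesses: that CORTEX_AUTO_DISCOVER_URL is a base URL with /health appended, and that an explicit CS_URL takes precedence even when CORTEX_AUTO_DISCOVER_CS=0.

```shell
# Hypothetical probe: succeeds if the URL answers within 1 second.
probe_health() {
  curl -fsS --max-time 1 "$1" >/dev/null 2>&1
}

# Print the CS base URL Cortex would end up with, or an empty line
# for standalone mode.
resolve_cs_url() {
  # 1. An explicit CS_URL always wins.
  if [ -n "${CS_URL:-}" ]; then
    printf '%s\n' "$CS_URL"
    return 0
  fi
  # 2. CORTEX_AUTO_DISCOVER_CS=0 skips the probe entirely.
  if [ "${CORTEX_AUTO_DISCOVER_CS:-1}" = "0" ]; then
    printf '\n'
    return 0
  fi
  # 3. Otherwise probe the default (or overridden) address.
  #    Assumption: the override is a base URL and /health is appended.
  base="${CORTEX_AUTO_DISCOVER_URL:-http://localhost:3032}"
  if probe_health "$base/health"; then
    printf '%s\n' "$base"
  else
    printf '\n'
  fi
}
```

Running `resolve_cs_url` in the shell you launch Cortex from is a quick way to reason about which CS, if any, a given set of environment variables should lead to.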
An explicit CS_URL env var always wins. To
disable the probe, or aim it elsewhere:
# Force CS off entirely
CORTEX_AUTO_DISCOVER_CS=0 ./run.sh agent
# Or override the probe URL — useful if CS is on a non-default port
CORTEX_AUTO_DISCOVER_URL=http://localhost:13032 ./run.sh agent
Going further
- Want to write your own agent instead? See Your first agent → for the same five HTTP calls Cortex makes, in 40 lines of Python.
- Many machines, not just panes? See Multi-machine fleet →
- Ready to leave AUTH_MODE=off? See Secure deployment →
- Cortex docs and source? github.com/build-on-ai/cortex