machines-server port 3038 part of consciousness-server v1.1.0

Machines aren't just servers.

Most agent platforms model agents but ignore the hardware they run on. BuildOnAI does the opposite: every machine in your fleet is a first-class entity, with hardware profile, available models, role, and live telemetry — so an agent can ask "which box has free VRAM and a tool-calling model loaded?" and get a real answer.

What it does today

machines-server is a Flask service on port 3038 that ships with Consciousness Server. It is not just a YAML reader — it actively probes the host, queries connected services, and exposes one aggregate view of the entire ecosystem for any AI agent that asks.

  • Real-time GPU telemetry — calls nvidia-smi on the host and parses utilization percent, VRAM used / total in MB, and GPU temperature. Returns null cleanly on hosts without an NVIDIA card, so the same code path works on any machine.
  • Real-time system stats via psutil — CPU percent (across all cores), core count, load averages, RAM total / used / available in GB, disk usage. Cross-platform (Linux / macOS / Windows).
  • Dynamic service health monitoring — reads services.yaml at request time, opens HTTP connections to every declared port, classifies each as active or inactive based on response code (anything < 500 counts as alive — handles WebSocket upgrades and redirects correctly).
  • Static machine registry — every machine declared in machines/<name>.yaml with hardware spec, network address, role, list of agents that live there, and which Ollama models it has loaded.
  • Aggregate /api/infrastructure endpoint — a single call returns local system stats plus machine configs plus runtime machines from CS plus services with live status plus registered agents. One response to answer "what is the state of this entire ecosystem right now?".
  • Native MCP protocol support — exposes /mcp/tools and /mcp/call. Claude Code or any other MCP-aware client can use machines-server directly as a tool provider, calling get_system_resources, get_infrastructure, or list_machines without writing custom HTTP code.
  • ed25519 auth via AUTH_MODE — the same three-state signed-request middleware as Consciousness Server (off / observe / enforce). Telemetry calls in enforce mode require a signed request; agents talking to it have key-server-issued keys.

API surface

Six endpoints. The whole machines-server contract:

Method Path Purpose
GET /health Health check + uptime
GET /api/system Live CPU / RAM / disk / GPU stats for this host
GET /api/machines Static machine definitions from machines/*.yaml
GET /api/services All services from services.yaml with current HTTP status
GET /api/infrastructure Aggregate: local system + machines (config + runtime) + services + agents + summary counts
GET /mcp/tools MCP tool definitions for Claude Code or other MCP clients
POST /mcp/call Execute an MCP tool by name with optional args

That is the full API. Adding a new capability means a new tool in /mcp/tools + a handler in /mcp/call; there is no hidden internal API.

Why this matters

Every multi-agent platform talks about agents. Almost none talk about machines. Yet a local agent on a GPU workstation has fundamentally different capabilities than a worker on a Raspberry Pi: different memory, different latency, different model weights loaded, different power budget. Modelling that explicitly is the difference between "I have agents" and "I have a fleet".

Once a machine is a first-class entity with a known hardware profile, a lot of things become easy:

  • Heavy reasoning routes to the GPU box; classification routes to the cheap CPU host.
  • Agents on different machines see each other through Consciousness Server and coordinate by name.
  • A machine going offline triggers automatic re-routing of pending tasks.
  • You can rent compute by adding a node — the rest of the fleet adopts it via its YAML descriptor.

A real example

The author's production setup, modelled in YAML:

  • Workstation with GPU — Cortex with gemma4:26b, takes heavy reasoning and code-review tasks.
  • Application host (CPU only) — Cortex with gemma4:e4b, handles classification, summarisation, fast small-context work.
  • Laptop — interactive agents, orchestration, the developer's primary workspace.

All three see the same shared memory in Consciousness Server. machines-server tells any agent which of them has headroom right now. Tasks flow to whichever node makes sense.

Where this goes — exploratory directions

The paradigm generalises: anything with a controller, a state, and telemetry is a machine. None of the scenarios below ship today; they describe what the architecture enables once the right adapters exist. We mark them Planned or Exploratory honestly — flagged here so contributors and customers can shape the roadmap together.

Prototype shop / small factory

3D printers, plotters, CNC mills, laser cutters. An agent picks up a print queue, dispatches files to printers with the right material loaded, monitors job state, reports failures. Mixed deterministic execution (g-code goes to the printer verbatim) with non-deterministic decisions (which file first, which printer, what to do on a jam). Planned.

Research lab (biology)

Incubators, cytometers, sequencers. An agent reads experiment metadata from the LIMS, cross-references new results against shared memory in Consciousness Server ("how did the same condition behave in last week's run?"). Lab notebooks become a queryable corpus. Planned. Targets research environments first; clinical deployments need their own regulatory validation.

Energy infrastructure

PV panels, battery storage, EV chargers, server-room load. An agent balances when to charge the car, when to power servers, when to sell surplus to the grid. All locally, with no third-party cloud knowing your home-energy schedule. Exploratory.

Building / smart-home enterprise

HVAC, climate sensors, blinds, lighting. Agent learns occupancy patterns, suggests optimisations, never phones home to a vendor cloud. Exploratory.

Drones

Light-show choreography, infrastructure inspection (your own pipelines, towers, roofs), precision-agriculture monitoring. Each drone is a machine with telemetry; agents plan routes, react to wind, optimise battery rotation. Exploratory.

Private fleet / heavy equipment

Trucks, combines, excavators and other machines from different manufacturers under one owner. Today every brand pushes its own telematics cloud — a mixed-fleet owner ends up logging into a different portal for every brand, with telemetry, work history and predictive-maintenance data scattered across as many vendor clouds. One owner-controlled maintenance system (CMMS) instead, aware of every machine regardless of brand: collects telemetry locally, runs predictive maintenance on hardware the owner owns, keeps the full service history in one place — not across half a dozen manufacturer portals. Exploratory.

Shipping today: an agent can read which machine has GPU capacity and which is offline. The use cases above describe the design space opened by treating machines as first-class — they are not claims about today.

Next steps