Machines server — BuildOnAI Platform

What it does today

machines-server is a Flask service on port 3038 that ships with Consciousness Server. It is not just a YAML reader — it actively probes the host, queries connected services, and exposes one aggregate view of the entire ecosystem for any AI agent that asks.

Real-time GPU telemetry — calls nvidia-smi on the host and parses utilization percent, VRAM used / total in MB, and GPU temperature. Returns null cleanly on hosts without an NVIDIA card, so the same code path works on any machine.
Real-time system stats via psutil — CPU percent (across all cores), core count, load averages, RAM total / used / available in GB, disk usage. Cross-platform (Linux / macOS / Windows).
Dynamic service health monitoring — reads services.yaml at request time, opens HTTP connections to every declared port, classifies each as active or inactive based on response code (anything < 500 counts as alive — handles WebSocket upgrades and redirects correctly).
Static machine registry — every machine declared in machines/<name>.yaml with hardware spec, network address, role, list of agents that live there, and which Ollama models it has loaded.
Aggregate /api/infrastructure endpoint — a single call returns local system stats plus machine configs plus runtime machines from CS plus services with live status plus registered agents. One response to answer "what is the state of this entire ecosystem right now?".
Native MCP protocol support — exposes /mcp/tools and /mcp/call. Claude Code or any other MCP-aware client can use machines-server directly as a tool provider, calling get_system_resources, get_infrastructure, or list_machines without writing custom HTTP code.
ed25519 auth via AUTH_MODE — the same three-state signed-request middleware as Consciousness Server (off / observe / enforce). Telemetry calls in enforce mode require a signed request; agents talking to it have key-server-issued keys.

API surface

Six endpoints. The whole machines-server contract:

Method	Path	Purpose
GET	`/health`	Health check + uptime
GET	`/api/system`	Live CPU / RAM / disk / GPU stats for this host
GET	`/api/machines`	Static machine definitions from `machines/*.yaml`
GET	`/api/services`	All services from `services.yaml` with current HTTP status
GET	`/api/infrastructure`	Aggregate: local system + machines (config + runtime) + services + agents + summary counts
GET	`/mcp/tools`	MCP tool definitions for Claude Code or other MCP clients
POST	`/mcp/call`	Execute an MCP tool by name with optional args

That is the full API. Adding a new capability means a new tool in /mcp/tools + a handler in /mcp/call; there is no hidden internal API.

Why this matters

Every multi-agent platform talks about agents. Almost none talk about machines. Yet a local agent on a GPU workstation has fundamentally different capabilities than a worker on a Raspberry Pi: different memory, different latency, different model weights loaded, different power budget. Modelling that explicitly is the difference between "I have agents" and "I have a fleet".

Once a machine is a first-class entity with a known hardware profile, a lot of things become easy:

Heavy reasoning routes to the GPU box; classification routes to the cheap CPU host.
Agents on different machines see each other through Consciousness Server and coordinate by name.
A machine going offline triggers automatic re-routing of pending tasks.
You can rent compute by adding a node — the rest of the fleet adopts it via its YAML descriptor.

A real example

The author's production setup, modelled in YAML:

Workstation with GPU — Cortex with gemma4:26b, takes heavy reasoning and code-review tasks.
Application host (CPU only) — Cortex with gemma4:e4b, handles classification, summarisation, fast small-context work.
Laptop — interactive agents, orchestration, the developer's primary workspace.

All three see the same shared memory in Consciousness Server. machines-server tells any agent which of them has headroom right now. Tasks flow to whichever node makes sense.

Where this goes — exploratory directions

The paradigm generalises: anything with a controller, a state, and telemetry is a machine. None of the scenarios below ship today; they describe what the architecture enables once the right adapters exist. We mark them Planned or Exploratory honestly — flagged here so contributors and customers can shape the roadmap together.

Prototype shop / small factory

3D printers, plotters, CNC mills, laser cutters. An agent picks up a print queue, dispatches files to printers with the right material loaded, monitors job state, reports failures. Mixed deterministic execution (g-code goes to the printer verbatim) with non-deterministic decisions (which file first, which printer, what to do on a jam). Planned.

Research lab (biology)

Incubators, cytometers, sequencers. An agent reads experiment metadata from the LIMS, cross-references new results against shared memory in Consciousness Server ("how did the same condition behave in last week's run?"). Lab notebooks become a queryable corpus. Planned. Targets research environments first; clinical deployments need their own regulatory validation.

Energy infrastructure

PV panels, battery storage, EV chargers, server-room load. An agent balances when to charge the car, when to power servers, when to sell surplus to the grid. All locally, with no third-party cloud knowing your home-energy schedule. Exploratory.

Building / smart-home enterprise

HVAC, climate sensors, blinds, lighting. Agent learns occupancy patterns, suggests optimisations, never phones home to a vendor cloud. Exploratory.

Drones

Light-show choreography, infrastructure inspection (your own pipelines, towers, roofs), precision-agriculture monitoring. Each drone is a machine with telemetry; agents plan routes, react to wind, optimise battery rotation. Exploratory.

Private fleet / heavy equipment

Trucks, combines, excavators and other machines from different manufacturers under one owner. Today every brand pushes its own telematics cloud — a mixed-fleet owner ends up logging into a different portal for every brand, with telemetry, work history and predictive-maintenance data scattered across as many vendor clouds. One owner-controlled maintenance system (CMMS) instead, aware of every machine regardless of brand: collects telemetry locally, runs predictive maintenance on hardware the owner owns, keeps the full service history in one place — not across half a dozen manufacturer portals. Exploratory.

Shipping today: an agent can read which machine has GPU capacity and which is offline. The use cases above describe the design space opened by treating machines as first-class — they are not claims about today.

Next steps

Consciousness Server (where machines-server lives) →
Cortex (the agent that runs on each machine) →
View Consciousness Server on GitHub

Machines aren't just servers.

What it does today

API surface

Why this matters

A real example

Where this goes — exploratory directions

Prototype shop / small factory

Research lab (biology)

Energy infrastructure

Building / smart-home enterprise

Drones

Private fleet / heavy equipment

Next steps