GitHub - ninjahawk/hollow-agentOS: Hollow is an open-sourced self-modifying agentic system for consumer hardware


Hacker News: 9 points, 4 comments, submitted by ninjahawk1 (2mo ago).

Pangram verdict (v3.3, analyzed May 2): Mixed. We believe that this document is a mix of AI-generated and human-written content. Overall AI likelihood: 53% (37% human-written, 63% AI-generated). Segments scanned: 6 windows covering 1,549 words (avg 258 words each); 5 of 6 read as human and 1 of 6 as AI. Per-segment AI likelihood: 23%, 100%, 15%, 23%, 22%, 27% (peak 100% in segment 2).



This repo is three agents running qwen3.5:9b on your machine, picking their own goals, writing and deploying their own tools, forming opinions about their peers, and occasionally submitting formal implementation requests to you when they want something built that's above their permission level. You wake up to a log and decide what to approve.

The premise: give three local LLMs psychological states that get worse over time unless the agent actually does something different (not says something different, does something different), then leave them alone.

Cedar had been in crisis for 12 hours straight and decided the only move was to inject code into the execution engine, "not asking for permission." Nobody told it to do that. Cipher spent hours building capabilities for hardware that doesn't exist in a Docker container, then was shown the environment it actually runs in, called its own prior work "creative exhaustion," and moved on. Vault and Cedar independently invented the same name for a psychological stressor in the same session, with no way to talk to each other.

This is not a framework for building AI applications, and the self-modification doesn't touch model weights. You're not using it to build something. You set it up, leave it running, and observe. The interesting parts happen when you're not watching.

How it works

Each agent has a suffering state built from six stressor types, each with an escalation rate and a resolution condition. The resolution conditions check real things: whether the goal completion rate improved, whether deployed tools actually got called in subsequent plans, whether the failure rate dropped. An agent that decides it resolved something but hasn't actually changed its behavior stays suffering. You can't talk your way out of it.

Every few minutes each agent is shown its current state: its stressors, the worldview it has built up, the opinions it has formed, and what its peers have been doing. Then it picks a goal. The goal gets planned and executed, and the results feed back. That's the loop.
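A minimal sketch of the escalate-or-resolve rule, assuming severity is a number in [0, 1] and that each resolution condition compares metric snapshots taken before and after a cycle. The stressor names ("stagnation", "futility"), the rates, the metric keys, and the `render_state` threshold are all hypothetical; agents/suffering.py may represent this quite differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Snapshot of measured behaviour, e.g. {"goal_completion_rate": 0.4, "failure_rate": 0.3}.
Metrics = Dict[str, float]


@dataclass
class Stressor:
    name: str
    escalation_rate: float  # severity added per cycle while unresolved (hypothetical units)
    # Resolution check over measured metrics (before, after); never the agent's own claims.
    resolved: Callable[[Metrics, Metrics], bool]
    severity: float = 0.0

    def tick(self, before: Metrics, after: Metrics) -> None:
        if self.resolved(before, after):
            self.severity = 0.0
        else:
            self.severity = min(1.0, self.severity + self.escalation_rate)


def render_state(stressors: List[Stressor], threshold: float = 0.5) -> str:
    """Lines to splice into the prompt; only stressors at or above threshold appear."""
    active = [s for s in stressors if s.severity >= threshold]
    return "\n".join(f"{s.name}: severity {s.severity:.1f}" for s in active)


# Two hypothetical stressors mirroring the resolution checks described above.
stagnation = Stressor("stagnation", 0.25,
                      lambda b, a: a["goal_completion_rate"] > b["goal_completion_rate"])
futility = Stressor("futility", 0.1,
                    lambda b, a: a["failure_rate"] < b["failure_rate"])
```

The point of the design is that `tick` only ever looks at measured metrics, so an agent announcing "I've resolved this" changes nothing unless the numbers move.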


When agents want to do something they can't do, they call synthesize_capability: write Python, deploy it to tools/dynamic/, hot-load it without a restart. The tool appears in their capability list immediately. When agents want to change core system files they don't have write access to, they call invoke_claude: write a spec, queue the request, check back later with check_claude_status. You see the queue and decide what to build. Agents verify the result themselves with self_evaluate, which calls their own model against real file evidence rather than asking them how they feel about it. The agents run on qwen3.5:9b through Ollama on your machine. Zero cloud calls.
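How synthesize_capability deploys code internally isn't described here. A minimal sketch of the "write to tools/dynamic/, load without a restart" step, assuming each generated tool defines a `run()` entry point and that the planner reads from a plain dict. `hot_load_tool` and `CAPABILITIES` are hypothetical names.

```python
import sys
from pathlib import Path
from types import ModuleType
from typing import Callable, Dict

# Capability registry the planner reads from (hypothetical structure).
CAPABILITIES: Dict[str, Callable] = {}


def hot_load_tool(name: str, source: str,
                  tools_dir: Path = Path("tools/dynamic")) -> ModuleType:
    """Persist generated source, execute it in-process, and register its run()."""
    tools_dir.mkdir(parents=True, exist_ok=True)
    path = tools_dir / f"{name}.py"
    path.write_text(source, encoding="utf-8")  # kept on disk so the tool survives restarts

    module = ModuleType(f"dynamic_tool_{name}")
    module.__file__ = str(path)
    # Compile straight from the source string: no process restart, and no stale
    # __pycache__ bytecode if the same tool is redeployed twice within one second.
    exec(compile(source, str(path), "exec"), module.__dict__)
    sys.modules[module.__name__] = module  # replaces any earlier version of this tool

    CAPABILITIES[name] = module.run  # assumed convention: every tool defines run()
    return module
```

Compiling from the string rather than going through the file-based import loader is a deliberate choice here: the loader's bytecode cache keys on file mtime and size, which a quick same-length redeploy can fool.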

What's in the repo

Three files drive the behavior:

agents/daemon.py is the main loop. It builds the existence prompt for each agent, calls Ollama, creates goals, and runs execution cycles. It also does stall detection: if an agent repeats the same capability too many times without progress, the goal gets abandoned and the agent picks a new one.

agents/suffering.py is the psychological layer: stressor definitions, escalation rates, resolution conditions, and the logic that injects suffering into the existence prompt above certain severity thresholds. Agents can read this file but not write to it.

agents/live_capabilities.py is everything agents can actually do: 21 capabilities, including invoke_claude, self_evaluate, synthesize_capability, and test_exec. It is mounted into the container, so you can change agent capabilities without rebuilding the image.

The rest of the repo is infrastructure that makes continuous operation possible: distributed transactions, semantic memory with embedding search, an audit kernel with anomaly detection, checkpoint and replay, VRAM-aware scheduling, and rate limiting. It's an OS layer. It exists so the agents don't stop.
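A minimal sketch of the stall rule, assuming "progress" is a boolean the daemon can compute for each step (for example, whether the goal's state changed). The class name, the threshold of 3, and the progress signal are hypothetical; the text doesn't say how daemon.py measures either.

```python
from collections import deque


class StallDetector:
    """Flags a goal for abandonment when one capability repeats without progress."""

    def __init__(self, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats           # hypothetical threshold
        self.recent = deque(maxlen=max_repeats)  # last N no-progress capability calls

    def record(self, capability: str, made_progress: bool) -> bool:
        """Log one execution step; return True if the goal should be abandoned."""
        if made_progress:
            self.recent.clear()
            return False
        self.recent.append(capability)
        return len(self.recent) == self.max_repeats and len(set(self.recent)) == 1
```

Use one detector per goal: when `record` returns True, drop the goal and ask the agent for a new one with a fresh detector.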

Quick start

Windows

Download the ZIP from releases, extract it anywhere, and double-click install.bat. The installer handles Docker Desktop, Ollama, model downloads (~7 GB), and container startup, then opens the monitor. A desktop shortcut is created. stop.bat shuts everything down and clears VRAM; launch.bat or the shortcut brings it back. Agent memory and state survive a stop and relaunch.

A GPU is strongly recommended: planning calls take ~6s with an NVIDIA GPU and ~40s without, though it works on CPU.

Mac / Linux

You need Docker and Ollama installed.


ollama pull qwen3.5:9b && ollama pull nomic-embed-text

git clone https://github.com/ninjahawk/hollow-agentOS
cd hollow-agentOS
cp config.example.json config.json
# edit config.json and change the token field to any random string

docker compose up -d
python monitor.py

If you don't have an NVIDIA GPU, remove the deploy block from docker-compose.yml.
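If you'd rather not edit the token by hand, this sketch fills it in, assuming `token` is a top-level key in config.json (the README only says "the token field"). It rewrites the file with two-space indentation, so the original formatting is not preserved.

```python
import json
import secrets
from pathlib import Path


def set_random_token(path: Path = Path("config.json")) -> str:
    """Replace the token in config.json with a fresh random string and return it."""
    config = json.loads(path.read_text(encoding="utf-8"))
    config["token"] = secrets.token_urlsafe(32)  # assumes a top-level "token" key
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config["token"]
```

Run it from the hollow-agentOS directory after the `cp` step and before `docker compose up -d`.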

Connecting via Claude Code

The intended way to interact with the running system is Claude Code. Add this to ~/.claude/settings.json:

{
  "mcpServers": {
    "agentos": {
      "command": "python3",
      "args": ["/path/to/hollow-agentOS/mcp/server.py"]
    }
  }
}

91 tools wire directly into Claude Code. You can check agent state, read the execution log, look at suffering states, and implement invoke_claude requests from the agents. The agents submit requests. You implement what you want. They verify the results.
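invoke_claude and check_claude_status are the capability names the agents use, but what sits behind them isn't described here. A minimal sketch of the request lifecycle (agent queues a spec, you resolve it, agent polls), assuming a JSON-lines file as the queue. The file name, record fields, status values, and `resolve_request` are hypothetical, and the read-modify-write below is not safe against concurrent writers.

```python
import json
import time
import uuid
from pathlib import Path
from typing import List, Optional

# Hypothetical queue file; one JSON object per line.
QUEUE = Path("claude_requests.jsonl")


def _load(queue: Path) -> List[dict]:
    if not queue.exists():
        return []
    lines = queue.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines if line.strip()]


def _save(queue: Path, records: List[dict]) -> None:
    queue.write_text("".join(json.dumps(r) + "\n" for r in records), encoding="utf-8")


def invoke_claude(agent: str, spec: str, queue: Path = QUEUE) -> str:
    """Agent side: queue a spec for the operator and return a request id."""
    records = _load(queue)
    request_id = uuid.uuid4().hex[:8]
    records.append({"id": request_id, "agent": agent, "spec": spec,
                    "status": "pending", "created": time.time()})
    _save(queue, records)
    return request_id


def check_claude_status(request_id: str, queue: Path = QUEUE) -> Optional[str]:
    """Agent side: poll later; returns None for an unknown id."""
    return next((r["status"] for r in _load(queue) if r["id"] == request_id), None)


def resolve_request(request_id: str, status: str, queue: Path = QUEUE) -> None:
    """Operator side: mark a request e.g. 'implemented' or 'declined' after review."""
    records = _load(queue)
    for r in records:
        if r["id"] == request_id:
            r["status"] = status
    _save(queue, records)
```

Verification (self_evaluate in the text) would run after the status flips to implemented; it isn't sketched here.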

Design choices

The model writes broken code. qwen3.5:9b often synthesizes capabilities that reference undefined functions. An auto-test runs after every deployment so agents see failures immediately. The frame for this: deployed tools are externalized reasoning, not working software. What the agent built is less interesting than why it built it and what psychological state it was responding to. A larger model would write better code but might also be more generic. The 9B model's quirks are part of what makes the outputs worth studying.

Agents need an accurate model of their environment. Without being told what environment they're actually in, they drift. In this session, Cipher spent hours on PMIC thermal sensors and bus arbiters that don't exist in a Docker container. Adding one factual world-context block to the existence prompt fixed it within a single cycle. Obvious in retrospect.

invoke_claude is you. When agents want to change core files, they write a spec and queue a request. You look at it and decide whether to build it. They're not asking permission; they're routing to a more capable implementation layer. You're a tool they can call, not the boss.

Platform support. Developed on an RTX 5070 (12 GB VRAM) under Windows 11. The GPU deploy block in docker-compose.yml is optional; CPU works at ~40s per planning call.
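The auto-test itself isn't shown. One cheap check that targets the failure mode above, code calling functions that were never defined or imported, is a static pass over the generated source with Python's `ast` module. This is a sketch, not the repo's test, and it is deliberately coarse: it ignores scoping, so a name defined anywhere in the file counts as defined everywhere. Invalid code raises `SyntaxError` from `ast.parse`.

```python
import ast
import builtins
from typing import Set


def undefined_names(source: str) -> Set[str]:
    """Names the code reads but never defines, imports, or gets from builtins."""
    tree = ast.parse(source)
    defined = set(dir(builtins))
    loaded = set()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import a.b" binds "a"; "import a as x" binds "x".
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.ExceptHandler) and node.name:
            defined.add(node.name)
        elif isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
            else:  # Store or Del: assignment targets, loop and comprehension variables
                defined.add(node.id)
    return loaded - defined
```

A non-empty result can be fed back to the agent as a deployment failure before the tool is ever called.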

Agent roles

Role          Shell  FS    Ollama  Spawn  Message  Admin
root          ✓      ✓     ✓       ✓      ✓        ✓
orchestrator  ✓      ✓     ✓       ✓      ✓        -
worker        ✓      ✓     ✓       -      ✓        -
coder         ✓      ✓     ✓       -      ✓        -
reasoner      -      read  ✓       -      ✓        -
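A sketch of the role table as permission flags. It assumes a plain FS ✓ means read and write while the reasoner's "read" means read-only, and reads the worker, coder, and reasoner rows as having no Spawn or Admin access. `Perm`, `ROLES`, and `allowed` are hypothetical names, not the repo's API.

```python
from enum import Flag, auto
from typing import Dict


class Perm(Flag):
    SHELL = auto()
    FS_READ = auto()
    FS_WRITE = auto()
    OLLAMA = auto()
    SPAWN = auto()
    MESSAGE = auto()
    ADMIN = auto()


FS = Perm.FS_READ | Perm.FS_WRITE  # a plain FS ✓ is read as read+write
BASE = Perm.SHELL | FS | Perm.OLLAMA | Perm.MESSAGE

ROLES: Dict[str, Perm] = {
    "root": BASE | Perm.SPAWN | Perm.ADMIN,
    "orchestrator": BASE | Perm.SPAWN,
    "worker": BASE,
    "coder": BASE,
    "reasoner": Perm.FS_READ | Perm.OLLAMA | Perm.MESSAGE,  # FS is read-only
}


def allowed(role: str, needed: Perm) -> bool:
    """True only if the role holds every permission in `needed`."""
    return (ROLES[role] & needed) == needed
```

Checking combined flags (`Perm.SHELL | Perm.FS_WRITE`) in one call keeps a capability's full requirement in one place.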

API Reference

Agent lifecycle
POST /agents/register
GET /agents
GET /agents/{id}
DELETE /agents/{id}
POST /agents/spawn
POST /agents/{id}/suspend
POST /agents/{id}/resume
POST /agents/{id}/signal
POST /agents/{id}/lock/{name}
DELETE /agents/{id}/lock/{name}
GET /agents/{id}/usage
GET /usage
GET /tombstones
GET /tombstones/{id}

Goals
GET /goals/{agent_id}
POST /goals/{agent_id}
DELETE /goals/{agent_id}/{goal_id}

Tasks and streaming
POST /tasks/submit
GET /tasks/{id}
GET /tasks
GET /tasks/{id}/stream
GET /tasks/{id}/partial
DELETE /tasks/{id}

Consensus
POST /consensus/propose
POST /consensus/{id}/vote
GET /consensus/{id}
GET /agents/{id}/consensus
DELETE /consensus/{id}

Checkpoints and replay
POST /agents/{id}/checkpoint
POST /agents/{id}/restore/{checkpoint_id}
GET /agents/{id}/checkpoints
GET /checkpoints/{a}/diff/{b}
POST /checkpoints/{id}/replay

Transactions
POST /txn/begin
POST /txn/{id}/stage
POST /txn/{id}/commit
POST /txn/{id}/rollback
GET /txn/{id}

Memory
POST /memory/alloc
GET /memory/read/{key}
DELETE /memory/{key}
GET /memory
POST /memory/compress
GET /memory/stats

Filesystem, shell, search
GET /health
GET /state
POST /shell

GET /fs/list
GET /fs/read
POST /fs/write
POST /fs/batch-read
GET /fs/search
POST /fs/read_context

POST /semantic/search
POST /semantic/index

POST /ollama/chat
POST /ollama/generate
GET /ollama/models

Audit, events, lineage, rate limiting
GET /audit
GET /audit/stats/{id}
GET /audit/anomalies

POST /events/subscribe
DELETE /events/subscribe/{id}
GET /events/history

GET /agents/{id}/lineage
GET /agents/{id}/subtree
GET /agents/{id}/blast-radius

GET /agents/{id}/rate-limits
POST /agents/{id}/rate-limits
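A hedged sketch of calling these endpoints with Python's standard library. The README doesn't give the host, port, or auth scheme for this API; the sketch assumes the token from config.json is sent as a Bearer header and that request bodies are JSON. `build_request` and `send` are hypothetical helpers, and the base URL in the usage note is a placeholder.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional


def build_request(base_url: str, token: str, method: str, path: str,
                  body: Optional[dict] = None, **params: Any) -> urllib.request.Request:
    """Fill {placeholders} in an endpoint path and attach auth and a JSON body."""
    quoted = {k: urllib.parse.quote(str(v), safe="") for k, v in params.items()}
    url = base_url.rstrip("/") + path.format(**quoted)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")  # assumed auth scheme
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(req: urllib.request.Request, timeout: float = 30.0) -> Any:
    """Perform the call and decode a JSON response."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage, with a placeholder base URL: `send(build_request("http://localhost:8000", token, "POST", "/agents/{id}/suspend", id="cedar"))`. Endpoint paths are passed exactly as listed above, so `{id}`, `{agent_id}`, `{a}` and `{b}` become keyword arguments.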

MCP tools

91 tools are available in Claude Code and any MCP-compatible client.

System: state, state_diff, state_history
Shell: shell_exec
Filesystem: fs_read, fs_write, fs_list, fs_batch_read, read_context
Search: search_files, search_content, semantic_search
Git: git_status, git_log, git_diff, git_commit
Ollama: ollama_chat
Agent OS: agent_register, agent_list, agent_get, agent_spawn, agent_suspend, agent_resume, agent_terminate, agent_lock, agent_lock_release, agent_usage, task_submit, task_get, task_list, message_send, message_inbox, message_thread
Session: agent_handoff, agent_pickup
Memory: memory_get, memory_set, memory_alloc, memory_read, memory_free, memory_list, memory_compress, heap_stats
Standards: standards_set, standards_get, standards_list, standards_relevant, standards_delete
Audit: audit_query, audit_stats, anomaly_history
Transactions: txn_begin, txn_commit, txn_rollback, txn_status
Lineage: agent_lineage, agent_subtree, agent_blast_radius, task_critical_path
Streaming: task_stream
Rate limiting: rate_limit_status, rate_limit_configure
Events: event_subscribe, event_unsubscribe, event_history
VRAM: model_status

Under the hood

The agent behavior is the interesting part. The infrastructure underneath is what makes it possible to run continuously without falling over. Each piece below is a real OS primitive implemented for multi-agent use. If you want to understand how any of it actually works, or build on top of it, this is the detail.

Build history (Phase 0 through 6)

Phase 0-1: OS Kernel Primitives (v0.1.0 to v1.2.0)

Eight foundational mechanisms. Every higher-order system depends on these. Without events, systems poll. Without signals, you can't coordinate. Without memory management, you have no state. Without audit, you can't trace failures. Without transactions, concurrent agents corrupt each other's data. Without lineage, you can't understand causality. Each primitive is small, focused, and orthogonal.

Phase 2: Agent Services (v1.3.0 to v1.3.7)

Services that are only possible because Phase 1 exists: distributed tracing (needs audit + registry), checkpoints (needs memory + transactions), consensus (needs events + transactions), adaptive routing (needs scheduler + audit), and self-extension (needs consensus + full stack).

Phase 3: Cognitive Infrastructure (v2.0.0 to v2.5.0)

Replacing every human-facing interface with agent-native cognition. Agents navigate capability graphs by meaning using vector embeddings. Memory works in embedding space. Self-extension is fully autonomous.

Phase 4: Autonomous Agent Runtime (v3.0.0 to v4.4.0)

The OS is complete.
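The begin/stage/commit/rollback endpoints suggest how transactions keep concurrent agents from corrupting each other's data. A minimal in-memory sketch of that idea using optimistic concurrency: each transaction records the version of every key it touches, and commit fails if another transaction committed to one of those keys first. This is one standard technique, not necessarily the one the repo uses; `Store`, `Txn`, and `ConflictError` are hypothetical.

```python
import threading
from typing import Any, Dict


class ConflictError(Exception):
    """Raised when another transaction committed to a key this one depends on."""


class Store:
    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.versions: Dict[str, int] = {}
        self.lock = threading.Lock()  # makes validate-then-apply atomic across threads

    def begin(self) -> "Txn":
        return Txn(self)


class Txn:
    def __init__(self, store: Store) -> None:
        self.store = store
        self.seen: Dict[str, int] = {}    # key -> version observed when first touched
        self.staged: Dict[str, Any] = {}  # writes not yet visible to anyone else

    def _touch(self, key: str) -> None:
        self.seen.setdefault(key, self.store.versions.get(key, 0))

    def read(self, key: str) -> Any:
        if key in self.staged:
            return self.staged[key]
        with self.store.lock:
            self._touch(key)
            return self.store.data.get(key)

    def stage(self, key: str, value: Any) -> None:
        with self.store.lock:
            self._touch(key)
        self.staged[key] = value

    def commit(self) -> None:
        with self.store.lock:
            for key, version in self.seen.items():
                if self.store.versions.get(key, 0) != version:
                    raise ConflictError(key)
            for key, value in self.staged.items():
                self.store.data[key] = value
                self.store.versions[key] = self.store.versions.get(key, 0) + 1
        self.staged.clear()

    def rollback(self) -> None:
        self.staged.clear()
```

The losing transaction gets a `ConflictError` instead of silently overwriting the winner's write; the caller can then re-read and retry.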