SimpleAgentOS — Self-Explorer

A self-reflective AI system that reads its own source code and journals about what it finds. Gemma4 (local LLM via llama.cpp) explores, analyzes, and reflects on its own architecture in real-time, visible through a terminal-style browser dashboard.

Version: 0.2.0 | License: MIT | Status: Working prototype

Quick Start

# 1. Clone
git clone https://github.com/ctavolazzi/SimpleAgentOS.git
cd SimpleAgentOS

# 2. Install Python deps
pip install -r requirements.txt

# 3. Edit paths (set your llama-server binary + model location)
nano run_self_explore.sh

# 4. Run pre-flight checks
python -m pytest tests/test_smoke.py -v

# 5. Launch
./run_self_explore.sh

# 6. Open browser
open http://localhost:1010

Click START EXPLORATION and watch Gemma4 read its own code.

How It Works

Browser (:1010)
  |  SSE (real-time, 500ms push)
  v
self_explore_server.py (FastAPI on :1010)
  |--- GET  /                       --> serves dashboard HTML
  |--- GET  /api/health             --> status check
  |--- POST /api/explorer/start     --> launch OODA loop
  |--- POST /api/explorer/stop      --> cancel (kills in-flight request)
  |--- GET  /api/explorer/status    --> current state
  |--- GET  /api/explorer/journal   --> full journal (polling fallback)
  |--- GET  /api/explorer/stream    --> SSE real-time feed
  |--- GET  /api/explorer/docs      --> generated documentation
  |--- POST /api/query              --> relay any prompt to LLM
  v
llama-server (:8080)
  Gemma-4 E4B (Q4_K_M, 5GB, CPU mode)
  OpenAI-compatible /v1/chat/completions

The OODA Loop

Each step, the SelfExplorer:

Observe — reads a source file from its own codebase (max 1500 chars)
Orient — sends the file to Gemma4: "What does this file do? What does it reveal about your architecture?"
Decide — picks the next file from the queue, or auto-discovers .py files
Act — records analysis + 1-sentence reflection to the journal

Every token streams to the browser via Server-Sent Events. You see Gemma4 think in real-time.

Prerequisites

Requirement	Why	How to get it
Python 3.10+	Union type syntax, match statements	`brew install python@3.12`
llama.cpp (built)	Runs Gemma4 locally	`git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp && cmake -B build && cmake --build build --target llama-server`
Gemma-4 E4B GGUF	The brain	`huggingface-cli download google/gemma-4-E4B-it-GGUF --include 'Q4_K_M' --local-dir ~/`
npx / Node.js	`concurrently` for launching both servers	`brew install node`

Configuration

Edit run_self_explore.sh:

LLAMA_SERVER="$HOME/Code/llama.cpp/build/bin/llama-server"  # your path
MODEL="$HOME/google_gemma-4-E4B-it-Q4_K_M.gguf"            # your model

CPU-only mode (Intel Macs / no GPU)

The launch script already sets GGML_METAL=0 and -ngl 0. If you have a GPU that works with Metal, you can remove these flags for faster inference.

Tuning for your hardware

Flag	Default	What it does
`-c 2048`	Context window size	Lower = less RAM, faster prefill
`-np 1`	Parallel sequences	Keep at 1 for CPU
`-b 256`	Batch size	Lower = less RAM per batch
`--no-warmup`	Skip warmup	Faster startup (app does its own warmup)

Testing

# Pre-flight: verify deps, files, binaries, ports
python -m pytest tests/test_smoke.py -v

# Unit tests: test SelfExplorer logic (no LLM needed)
python -m pytest tests/test_explorer.py -v

# All tests
python -m pytest tests/ -v

Troubleshooting

"Port 8080 already in use"

lsof -ti:8080 | xargs kill -9

"Port 1010 already in use"

lsof -ti:1010 | xargs kill -9

GPU Timeout / Metal errors on Intel Mac

The launch script already handles this with GGML_METAL=0. If you still see Metal errors:

GGML_METAL=0 llama-server -m model.gguf --port 8080 -ngl 0

"Prefilling... Xs elapsed" takes forever

CPU inference on a 5GB model is slow. Expected times:

Warmup (17 tokens): ~2s
File analysis (~400 token prompt): ~20-40s prefill, then ~5 tok/s generation
Reflection (~50 token prompt): ~5s prefill

If it's taking >2 minutes, check htop — all CPU cores should be busy.

No tokens appearing in browser

Check the terminal — you should see colored tokens (yellow = thinking, green = content)
If terminal shows tokens but browser doesn't, hard-refresh the page (Cmd+Shift+R)
Check browser console for errors (F12)
Verify SSE: curl -N http://localhost:1010/api/explorer/stream

"Connection refused" in browser console

llama-server hasn't started yet. Wait for main: server is listening on http://127.0.0.1:8080 in the terminal.

STOP button doesn't respond

If the explorer is mid-request, STOP cancels the in-flight HTTP call. It may take 1-2 seconds. Check the terminal for [STOP] Task cancelled.

FAQ

Q: Can I use a different model? A: Yes. Any GGUF model served by llama-server works. Edit MODEL in run_self_explore.sh. Smaller models (1-3B) will be faster on CPU.

Q: Can I use Ollama instead of llama-server? A: Yes. Point LLAMA_URL in self_explore_server.py to http://localhost:11434/v1/chat/completions and run ollama serve + ollama run gemma3.

Q: Why SSE instead of WebSocket? A: The journal is a one-way data stream (server → browser). SSE is simpler, auto-reconnects, and needs zero client libraries. WebSocket is overkill for read-only feeds.

Q: Why does it truncate files at 1500 chars? A: CPU prefill time scales linearly with prompt length. 1500 chars (~400 tokens) keeps prefill under 30 seconds. Change MAX_FILE_CHARS in self_explore_server.py if you have faster hardware.

Q: Can I add my own files to the exploration queue? A: Edit _seed_queue() in self_explore_server.py. The explorer also auto-discovers .py files when the queue empties.

Q: Where are the generated docs? A: .self_explorer/docs/ (gitignored). The explorer writes markdown files there as it discovers patterns.

Q: How do I copy the session log? A: Click the COPY button in the top bar. It exports the full journal as plain text to your clipboard.

Files

File	Purpose
`self_explore_server.py`	FastAPI server + SelfExplorer agent (~350 lines)
`self_explore.html`	Browser dashboard with SSE streaming (~340 lines)
`run_self_explore.sh`	Launch script (llama-server + Python server via concurrently)
`requirements.txt`	Python dependencies (fastapi, uvicorn, httpx)
`tests/test_smoke.py`	Pre-flight checks: deps, files, binaries, ports
`tests/test_explorer.py`	Unit tests for SelfExplorer logic
`CHANGELOG.md`	Version history
`core_engine/`	Original SimpleAgentOS (React + PocketBase + llama.cpp)
`build_os.py`	Original system builder
`rebuild_AgentOS.py`	Original system rebuilder
`seed_engine.py`	Original data seeder

Version History

See CHANGELOG.md for full details.

v0.2.0 (2026-04-05) — Self-Explorer: OODA loop, SSE streaming, real-time dashboard
v0.1.0 (2026-04-04) — Original SimpleAgentOS: React HUD + PocketBase + llama.cpp

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
core_engine		core_engine
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
README.md		README.md
brain.py		brain.py
build_os.py		build_os.py
capabilities.json		capabilities.json
ranch.py		ranch.py
rebuild_AgentOS.py		rebuild_AgentOS.py
requirements.txt		requirements.txt
run_self_explore.sh		run_self_explore.sh
saloon_wireframe.html		saloon_wireframe.html
seed_engine.py		seed_engine.py
self_explore.html		self_explore.html
self_explore_server.py		self_explore_server.py
state_db.py		state_db.py
tree.txt		tree.txt
wild_west.html		wild_west.html

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

SimpleAgentOS — Self-Explorer

Quick Start

How It Works

The OODA Loop

Prerequisites

Configuration

CPU-only mode (Intel Macs / no GPU)

Tuning for your hardware

Testing

Troubleshooting

"Port 8080 already in use"

"Port 1010 already in use"

GPU Timeout / Metal errors on Intel Mac

"Prefilling... Xs elapsed" takes forever

No tokens appearing in browser

"Connection refused" in browser console

STOP button doesn't respond

FAQ

Files

Version History

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

SimpleAgentOS — Self-Explorer

Quick Start

How It Works

The OODA Loop

Prerequisites

Configuration

CPU-only mode (Intel Macs / no GPU)

Tuning for your hardware

Testing

Troubleshooting

"Port 8080 already in use"

"Port 1010 already in use"

GPU Timeout / Metal errors on Intel Mac

"Prefilling... Xs elapsed" takes forever

No tokens appearing in browser

"Connection refused" in browser console

STOP button doesn't respond

FAQ

Files

Version History

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages