Skip to content

Commit 6961cd9

Browse files
authored
Merge pull request #178 from Integration-Automation/dev
Add headless MCP server for AutoControl
2 parents 10c67b4 + 6c7d79a commit 6961cd9

35 files changed

Lines changed: 7987 additions & 4 deletions

README.md

Lines changed: 167 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@
2828
- [Screenshot](#screenshot)
2929
- [Action Recording & Playback](#action-recording--playback)
3030
- [JSON Action Scripting](#json-action-scripting)
31+
- [MCP Server (Use AutoControl from Claude)](#mcp-server-use-autocontrol-from-claude)
3132
- [Scheduler (Interval & Cron)](#scheduler-interval--cron)
3233
- [Global Hotkey Daemon](#global-hotkey-daemon)
3334
- [Event Triggers](#event-triggers)
@@ -66,6 +67,7 @@
6667
- **Event Triggers** — fire scripts when an image appears, a window opens, a pixel changes, or a file is modified
6768
- **Run History** — SQLite-backed run log across scheduler / triggers / hotkeys / REST with auto error-screenshot artifacts
6869
- **Report Generation** — export test records as HTML, JSON, or XML reports with success/failure status
70+
- **MCP Server** — JSON-RPC 2.0 Model Context Protocol server (stdio + HTTP/SSE) so Claude Desktop / Claude Code / custom tool-use loops can drive AutoControl. ~90 tools, full protocol coverage (resources, prompts, sampling, roots, logging, progress, cancellation, elicitation), bearer-token auth + TLS, audit log, rate limit, plugin hot-reload, CI fake backend
6971
- **Remote Automation** — TCP socket server **and** REST API server to receive automation commands
7072
- **Plugin Loader** — drop `.py` files exposing `AC_*` callables into a directory and register them as executor commands at runtime
7173
- **Shell Integration** — execute shell commands within automation workflows with async output capture
@@ -81,6 +83,97 @@
8183

8284
## Architecture
8385

86+
The runtime is layered: **client surfaces** (CLI, GUI, MCP/REST/socket
87+
servers) sit on top of the **headless API** (`wrapper/` + `utils/`),
88+
which resolves to a **per-OS backend** chosen at import time by
89+
`wrapper/platform_wrapper.py`. The package façade
90+
(`je_auto_control/__init__.py`) re-exports every public name so users
91+
need only `import je_auto_control` regardless of which surface or
92+
backend they hit.
93+
94+
```mermaid
95+
flowchart LR
96+
subgraph Clients["Client Surfaces"]
97+
direction TB
98+
Claude[["Claude Desktop /<br/>Claude Code"]]
99+
APIUser[["Custom Anthropic /<br/>OpenAI tool loops"]]
100+
HTTPClient[["HTTP / SSE clients"]]
101+
TCPClient[["Socket / REST clients"]]
102+
GUIUser[["PySide6 GUI"]]
103+
CLIUser[["python -m<br/>je_auto_control[.cli]"]]
104+
Library[["Library users<br/>(import je_auto_control)"]]
105+
end
106+
107+
subgraph Transports["Transports & Servers"]
108+
direction TB
109+
Stdio["MCP stdio<br/>JSON-RPC 2.0"]
110+
HTTPMCP["MCP HTTP /<br/>SSE + auth + TLS"]
111+
REST["REST server<br/>:9939"]
112+
Socket["Socket server<br/>:9938"]
113+
end
114+
115+
subgraph MCP["mcp_server/"]
116+
direction TB
117+
Dispatcher["MCPServer<br/>(JSON-RPC dispatcher)"]
118+
Tools["tools/<br/>~90 ac_* + aliases"]
119+
Resources["resources/<br/>files · history ·<br/>commands · screen-live"]
120+
Prompts["prompts/<br/>built-in templates"]
121+
Context["context · audit ·<br/>rate-limit · log-bridge"]
122+
FakeBE["fake_backend<br/>(CI smoke)"]
123+
end
124+
125+
subgraph Core["Headless Core (wrapper/ + utils/)"]
126+
direction TB
127+
Wrapper["wrapper/<br/>mouse · keyboard · screen ·<br/>image · record · window"]
128+
Executor["executor/<br/>AC_* JSON action engine"]
129+
Vision["vision/ · ocr/ ·<br/>accessibility/"]
130+
Recorder["scheduler/ · triggers/ ·<br/>hotkey/ · plugin_loader/<br/>run_history/"]
131+
IOUtils["clipboard/ · cv2_utils/ ·<br/>shell_process/ · json/"]
132+
end
133+
134+
subgraph Backends["Per-OS Backends"]
135+
direction TB
136+
Win["windows/<br/>Win32 ctypes"]
137+
Mac["osx/<br/>pyobjc · Quartz"]
138+
X11["linux_with_x11/<br/>python-Xlib"]
139+
end
140+
141+
Claude --> Stdio
142+
APIUser --> Stdio
143+
HTTPClient --> HTTPMCP
144+
TCPClient --> Socket
145+
TCPClient --> REST
146+
147+
Stdio --> Dispatcher
148+
HTTPMCP --> Dispatcher
149+
Dispatcher --> Tools
150+
Dispatcher --> Resources
151+
Dispatcher --> Prompts
152+
Dispatcher -.- Context
153+
Tools -.optional.-> FakeBE
154+
155+
Tools --> Wrapper
156+
Tools --> Executor
157+
Tools --> Vision
158+
Tools --> Recorder
159+
Tools --> IOUtils
160+
Resources --> Recorder
161+
Resources --> Wrapper
162+
163+
REST --> Executor
164+
Socket --> Executor
165+
166+
GUIUser --> Wrapper
167+
GUIUser --> Recorder
168+
CLIUser --> Executor
169+
Library --> Wrapper
170+
Library --> Executor
171+
172+
Wrapper --> Backends
173+
Vision -.- Wrapper
174+
Recorder -.- Executor
175+
```
176+
84177
```
85178
je_auto_control/
86179
├── wrapper/ # Platform-agnostic API layer
@@ -89,19 +182,21 @@ je_auto_control/
89182
│ ├── auto_control_keyboard.py# Keyboard operations
90183
│ ├── auto_control_image.py # Image recognition (OpenCV template matching)
91184
│ ├── auto_control_screen.py # Screenshot, screen size, pixel color
185+
│ ├── auto_control_window.py # Cross-platform window manager facade
92186
│ └── auto_control_record.py # Action recording/playback
93187
├── windows/ # Windows-specific backend (Win32 API / ctypes)
94188
├── osx/ # macOS-specific backend (pyobjc / Quartz)
95189
├── linux_with_x11/ # Linux-specific backend (python-Xlib)
96190
├── gui/ # PySide6 GUI application
97191
└── utils/
192+
├── mcp_server/ # MCP server (stdio + HTTP/SSE) — server, tools/, resources, prompts, audit, rate_limit, fake_backend, plugin_watcher
98193
├── executor/ # JSON action executor engine
99194
├── callback/ # Callback function executor
100195
├── cv2_utils/ # OpenCV screenshot, template matching, video recording
101196
├── accessibility/ # UIA (Windows) / AX (macOS) element finder
102197
├── vision/ # VLM-based locator (Anthropic / OpenAI backends)
103198
├── ocr/ # Tesseract-backed text locator
104-
├── clipboard/ # Cross-platform clipboard
199+
├── clipboard/ # Cross-platform clipboard (text + image)
105200
├── scheduler/ # Interval + cron scheduler
106201
├── hotkey/ # Global hotkey daemon
107202
├── triggers/ # Image/window/pixel/file triggers
@@ -408,6 +503,77 @@ je_auto_control.execute_action([
408503
| Process | `AC_execute_process` |
409504
| Executor | `AC_execute_action`, `AC_execute_files` |
410505

506+
### MCP Server (Use AutoControl from Claude)
507+
508+
Expose AutoControl as a Model Context Protocol server so any
509+
MCP-compatible client (Claude Desktop, Claude Code, custom Anthropic
510+
/ OpenAI tool-use loops) can drive the host machine. Stdlib-only —
511+
JSON-RPC 2.0 over stdio or HTTP+SSE.
512+
513+
**Register with Claude Code:**
514+
515+
```bash
516+
claude mcp add autocontrol -- python -m je_auto_control.utils.mcp_server
517+
```
518+
519+
**Register with Claude Desktop** (`claude_desktop_config.json`):
520+
521+
```json
522+
{
523+
"mcpServers": {
524+
"autocontrol": {
525+
"command": "python",
526+
"args": ["-m", "je_auto_control.utils.mcp_server"]
527+
}
528+
}
529+
}
530+
```
531+
532+
**Start programmatically:**
533+
534+
```python
535+
import je_auto_control as ac
536+
537+
# Stdio (blocks until stdin closes)
538+
ac.start_mcp_stdio_server()
539+
540+
# Or HTTP / SSE with bearer-token auth + optional TLS
541+
ac.start_mcp_http_server(host="127.0.0.1", port=9940,
542+
auth_token="hunter2")
543+
```
544+
545+
**Inspect the catalogue without starting the server:**
546+
547+
```bash
548+
je_auto_control_mcp --list-tools
549+
je_auto_control_mcp --list-tools --read-only
550+
je_auto_control_mcp --list-resources
551+
je_auto_control_mcp --list-prompts
552+
```
553+
554+
**What ships:**
555+
556+
| Surface | Coverage |
557+
|---|---|
558+
| Tools (~90) | mouse · keyboard · drag · screen / multi-monitor · screenshot-as-image · diff · OCR · image · windows (move/min/max/restore/...) · clipboard text+image · process / shell · recording · screen recording · scheduler / triggers / hotkeys · accessibility tree · VLM locator · executor · history |
559+
| Aliases | `click`, `type`, `screenshot`, `find_image`, `drag`, `shell`, `wait_image`, ... — toggle with `JE_AUTOCONTROL_MCP_ALIASES=0` |
560+
| Resources | `autocontrol://files/<name>`, `autocontrol://history`, `autocontrol://commands`, `autocontrol://screen/live` (with `resources/subscribe`) |
561+
| Prompts | `automate_ui_task`, `record_and_generalize`, `compare_screenshots`, `find_widget`, `explain_action_file` |
562+
| Protocol | tools / resources / prompts / sampling / roots / logging / progress / cancellation / list_changed / elicitation |
563+
| Transports | stdio, HTTP `POST /mcp`, SSE streaming when `Accept: text/event-stream` |
564+
| Safety | tool annotations · `JE_AUTOCONTROL_MCP_READONLY` · `JE_AUTOCONTROL_MCP_CONFIRM_DESTRUCTIVE` · audit log · token-bucket rate limiter · auto-screenshot on error |
565+
| Ops | bearer-token auth · TLS via `ssl_context` · `PluginWatcher` hot-reload · `JE_AUTOCONTROL_FAKE_BACKEND=1` for CI |
566+
567+
See [docs/source/Eng/doc/mcp_server/mcp_server_doc.rst](docs/source/Eng/doc/mcp_server/mcp_server_doc.rst)
568+
for the full reference (or the
569+
[繁體中文](docs/source/Zh/doc/mcp_server/mcp_server_doc.rst) version).
570+
571+
> ⚠️ The MCP server can move the mouse, send keystrokes, capture the
572+
> screen, and execute arbitrary `AC_*` actions. Only register it with
573+
> MCP clients you trust. HTTP defaults to `127.0.0.1`; binding to
574+
> `0.0.0.0` requires explicit reason and **must** be paired with
575+
> `auth_token` plus `ssl_context`.
576+
411577
### Scheduler (Interval & Cron)
412578

413579
```python

0 commit comments

Comments
 (0)