Commit 51fc521

Expose remote desktop over MCP and document recent UX changes
MCP tool surface
- New ``remote_desktop_tools()`` factory in
  ``je_auto_control/utils/mcp_server/tools/_factories.py`` exposes the
  same singleton remote-desktop registry the GUI's Remote Desktop tab
  uses:
    ac_remote_host_start         (token, bind, port, fps, quality, …)
    ac_remote_host_stop          (timeout)
    ac_remote_host_status        (read-only)
    ac_remote_viewer_connect     (host, port, token, expected_host_id)
    ac_remote_viewer_disconnect  (timeout)
    ac_remote_viewer_status      (read-only)
    ac_remote_viewer_send_input  (action: dict)
- Adapter handlers in ``_handlers.py`` import the registry lazily so
  the existing tool group stays cheap to load.
- Status / observer tools (``*_status``) carry ``READ_ONLY`` so they
  survive ``--readonly`` mode; ``send_input`` is correctly tagged
  ``destructiveHint`` so MCP clients can prompt for confirmation.

Tests
- ``test_mcp_server.test_remote_desktop_tools_are_registered`` —
  schema + annotation sanity check.
- ``test_mcp_server.test_remote_desktop_status_tools_survive_read_only_mode`` —
  confirms the read-only filter keeps status tools and drops the
  destructive ones.
- 591 / 591 headless pytest pass; ruff clean.

Docs
- ``docs/source/Eng|Zh/doc/new_features/new_features_doc.rst`` — three
  new sections: AnyDesk-style popout viewer window, responsive
  ``QScrollArea`` sub-tab sizing, and the new ``ac_remote_*`` MCP tool
  surface (with a worked example).
- ``docs/source/Eng|Zh/doc/mcp_server/mcp_server_doc.rst`` — the Remote
  Desktop tool group is listed in the tool catalogue.
- ``README.md`` and the two CN/TW READMEs — Remote Desktop entry now
  mentions the popout, ``QScrollArea`` resizing, and the headless / MCP
  driveability; MCP entry highlights the new ``ac_remote_*`` tools and
  the bumped tool count (~100).
1 parent 03c9dfe commit 51fc521

10 files changed

Lines changed: 445 additions & 7 deletions

README.md

Lines changed: 2 additions & 2 deletions
@@ -63,7 +63,7 @@
 - **OCR** — extract text from screen regions using Tesseract; wait for, click, or locate rendered text; regex search and full-region dump
 - **LLM Action Planner** — translate a plain-language description into a validated `AC_*` action list using Claude
 - **Runtime Variables & Control Flow** — `${var}` substitution at execution time, plus `AC_set_var` / `AC_inc_var` / `AC_if_var` / `AC_for_each` / `AC_loop` / `AC_retry` for data-driven scripts
-- **Remote Desktop** — stream this machine's screen and accept remote input over a token-authenticated TCP protocol, *or* connect to another machine and view + control it (host + viewer GUIs included). Optional TLS (HTTPS-grade encryption), WebSocket transport (ws:// + wss:// for browser / firewall-friendly clients), persistent 9-digit Host ID, host→viewer audio streaming, bidirectional clipboard sync (text + image), and chunked file transfer (drag-drop + progress bar; arbitrary destination path; no size cap). Plus folder sync (additive mirror — local deletions never propagate) and a self-hosted coturn TURN config bundle generator (turnserver.conf + systemd unit + docker-compose + README)
+- **Remote Desktop** — stream this machine's screen and accept remote input over a token-authenticated TCP protocol, *or* connect to another machine and view + control it (host + viewer GUIs included). Optional TLS (HTTPS-grade encryption), WebSocket transport (ws:// + wss:// for browser / firewall-friendly clients), persistent 9-digit Host ID, host→viewer audio streaming, bidirectional clipboard sync (text + image), and chunked file transfer (drag-drop + progress bar; arbitrary destination path; no size cap). Plus folder sync (additive mirror — local deletions never propagate) and a self-hosted coturn TURN config bundle generator (turnserver.conf + systemd unit + docker-compose + README). **AnyDesk-style popout**: when the viewer authenticates, the live remote desktop opens in its own resizable top-level window so the control panel stays uncluttered. The Remote Desktop tabs are wrapped in `QScrollArea` so the panel stays usable on small windows and stretches edge-to-edge on 4K displays. Driveable headlessly via `je_auto_control` and over MCP through the new `ac_remote_*` tools
 - **Clipboard** — read/write system clipboard text on Windows, macOS, and Linux
 - **Screenshot & Screen Recording** — capture full screen or regions as images, record screen to video (AVI/MP4)
 - **Action Recording & Playback** — record mouse/keyboard events and replay them
@@ -73,7 +73,7 @@
 - **Event Triggers** — fire scripts when an image appears, a window opens, a pixel changes, or a file is modified
 - **Run History** — SQLite-backed run log across scheduler / triggers / hotkeys / REST with auto error-screenshot artifacts
 - **Report Generation** — export test records as HTML, JSON, or XML reports with success/failure status
-- **MCP Server** — JSON-RPC 2.0 Model Context Protocol server (stdio + HTTP/SSE) so Claude Desktop / Claude Code / custom tool-use loops can drive AutoControl. ~90 tools, full protocol coverage (resources, prompts, sampling, roots, logging, progress, cancellation, elicitation), bearer-token auth + TLS, audit log, rate limit, plugin hot-reload, CI fake backend
+- **MCP Server** — JSON-RPC 2.0 Model Context Protocol server (stdio + HTTP/SSE) so Claude Desktop / Claude Code / custom tool-use loops can drive AutoControl. ~100 tools, full protocol coverage (resources, prompts, sampling, roots, logging, progress, cancellation, elicitation), bearer-token auth + TLS, audit log, rate limit, plugin hot-reload, CI fake backend. New in this release: `ac_remote_host_start` / `ac_remote_host_stop` / `ac_remote_host_status` / `ac_remote_viewer_connect` / `ac_remote_viewer_disconnect` / `ac_remote_viewer_status` / `ac_remote_viewer_send_input` wrap the same singleton remote-desktop registry the GUI uses, so a model can spin up a host, open a viewer to another machine, and forward mouse / keyboard / type / hotkey actions through the active session
 - **Remote Automation** — TCP socket server **and** hardened REST API: bearer-token auth, per-IP rate limit + lockout, SQLite audit hook, Prometheus `/metrics`, OpenAPI-style endpoint table (`/health`, `/screen_size`, `/sessions`, `/screenshot`, `/execute`, `/audit/list`, `/audit/verify`, `/inspector/recent`, `/usb/devices`, `/diagnose`, ...), and a vanilla-JS browser dashboard at `/dashboard` (any phone with HTTP reach can monitor the host)
 - **Plugin Loader** — drop `.py` files exposing `AC_*` callables into a directory and register them as executor commands at runtime
 - **Shell Integration** — execute shell commands within automation workflows with async output capture

README/README_zh-CN.md

Lines changed: 2 additions & 2 deletions
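@@ -62,7 +62,7 @@
 - **OCR** — extract text from the screen with Tesseract; search for, click, or wait for text to appear; supports regex search and full-region dump
 - **LLM Action Planner** — use Claude to translate a natural-language description into a validated `AC_*` action list
 - **Runtime Variables & Control Flow** — `${var}` substitution at execution time, plus `AC_set_var` / `AC_inc_var` / `AC_if_var` / `AC_for_each` / `AC_loop` / `AC_retry` to make scripts data-driven
-- **Remote Desktop** — stream this machine's screen and accept input over a token-authenticated TCP protocol, **or** connect to another machine to view and control it (host + viewer GUIs built in). Optional TLS (HTTPS-grade encryption), WebSocket transport (``ws://`` + ``wss://``, firewall- and browser-friendly), persistent 9-digit Host ID, host→viewer audio streaming, bidirectional clipboard sync (text + image), chunked file transfer (drag-and-drop + progress bar; arbitrary destination path; no size cap). Also includes folder sync (additive mirror — local deletions are never propagated) and a self-hosted coturn TURN config bundle generator (turnserver.conf + systemd unit + docker-compose + README)
+- **Remote Desktop** — stream this machine's screen and accept input over a token-authenticated TCP protocol, **or** connect to another machine to view and control it (host + viewer GUIs built in). Optional TLS (HTTPS-grade encryption), WebSocket transport (``ws://`` + ``wss://``, firewall- and browser-friendly), persistent 9-digit Host ID, host→viewer audio streaming, bidirectional clipboard sync (text + image), chunked file transfer (drag-and-drop + progress bar; arbitrary destination path; no size cap). Also includes folder sync (additive mirror — local deletions are never propagated) and a self-hosted coturn TURN config bundle generator (turnserver.conf + systemd unit + docker-compose + README). **AnyDesk-style popout window**: after the viewer authenticates, the remote desktop opens in a separate, resizable top-level window, keeping the control panel uncluttered; the Remote Desktop sub-tabs are wrapped in `QScrollArea`, so they scroll in small windows and fill the available space on 4K screens. Also driveable directly through the headless API and the MCP tools (`ac_remote_*`)
 - **Clipboard** — read/write system clipboard text on Windows / macOS / Linux
 - **Screenshot & Screen Recording** — capture the full screen or a specified region as an image; record the screen to video (AVI/MP4)
 - **Action Recording & Playback** — record mouse/keyboard events and replay them
@@ -72,7 +72,7 @@
 - **Event Triggers** — automatically run scripts when an image appears, a window appears, a pixel changes, or a file changes
 - **Run History** — records scheduler / triggers / hotkeys / REST run results in SQLite; automatically attaches a screenshot on error
 - **Report Generation** — export test records as HTML, JSON, or XML reports, including success/failure status
-- **MCP Server** — JSON-RPC 2.0 Model Context Protocol service (stdio + HTTP/SSE) that lets Claude Desktop / Claude Code / custom tool-use loops drive AutoControl directly. About 90 tools, full protocol support (resources, prompts, sampling, roots, logging, progress, cancellation, elicitation), bearer-token auth + TLS, audit log, rate limit, plugin hot-reload, CI fake backend
+- **MCP Server** — JSON-RPC 2.0 Model Context Protocol service (stdio + HTTP/SSE) that lets Claude Desktop / Claude Code / custom tool-use loops drive AutoControl directly. About 100 tools, full protocol support (resources, prompts, sampling, roots, logging, progress, cancellation, elicitation), bearer-token auth + TLS, audit log, rate limit, plugin hot-reload, CI fake backend. **New in this release:** `ac_remote_host_start` / `ac_remote_host_stop` / `ac_remote_host_status` / `ac_remote_viewer_connect` / `ac_remote_viewer_disconnect` / `ac_remote_viewer_status` / `ac_remote_viewer_send_input` wrap the process-global registry used by the GUI's Remote Desktop tab, so a model can directly start a host, connect a viewer, and forward mouse/keyboard/type/hotkey actions
 - **Remote Automation** — TCP socket server **plus** a hardened REST API: bearer-token auth, per-IP rate limiting + failure lockout, SQLite audit hook, Prometheus `/metrics`, full endpoint list (`/health`, `/screen_size`, `/sessions`, `/screenshot`, `/execute`, `/audit/list`, `/audit/verify`, `/inspector/recent`, `/usb/devices`, `/diagnose`, …), and a vanilla-JS browser dashboard at `/dashboard` (any phone that can reach the host over HTTP can monitor it)
 - **Plugin Loader** — drop `.py` files that define `AC_*` callables into a directory and they are registered as executor commands at runtime
 - **Shell Integration** — run shell commands within automation workflows, with asynchronous output capture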

README/README_zh-TW.md

Lines changed: 2 additions & 2 deletions
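@@ -62,7 +62,7 @@
 - **OCR** — extract text from the screen with Tesseract; search for, click, or wait for text to appear; supports regex search and full-region dump
 - **LLM Action Planner** — use Claude to translate a natural-language description into a validated `AC_*` action list
 - **Runtime Variables & Control Flow** — `${var}` substitution at execution time, plus `AC_set_var` / `AC_inc_var` / `AC_if_var` / `AC_for_each` / `AC_loop` / `AC_retry` to make scripts data-driven
-- **Remote Desktop** — stream this machine's screen and accept input over a token-authenticated TCP protocol, **or** connect to another machine to view and control it (host + viewer GUIs both built in). Optional TLS (HTTPS-grade encryption), WebSocket transport (``ws://`` + ``wss://``, firewall- and browser-friendly), persistent 9-digit Host ID, host→viewer audio streaming, bidirectional clipboard sync (text + image), chunked file transfer (drag-and-drop + progress bar; arbitrary destination path; no size cap). Also includes folder sync (additive mirror — local deletions are never propagated) and a self-hosted coturn TURN config bundle generator (turnserver.conf + systemd unit + docker-compose + README)
+- **Remote Desktop** — stream this machine's screen and accept input over a token-authenticated TCP protocol, **or** connect to another machine to view and control it (host + viewer GUIs both built in). Optional TLS (HTTPS-grade encryption), WebSocket transport (``ws://`` + ``wss://``, firewall- and browser-friendly), persistent 9-digit Host ID, host→viewer audio streaming, bidirectional clipboard sync (text + image), chunked file transfer (drag-and-drop + progress bar; arbitrary destination path; no size cap). Also includes folder sync (additive mirror — local deletions are never propagated) and a self-hosted coturn TURN config bundle generator (turnserver.conf + systemd unit + docker-compose + README). **AnyDesk-style popout window**: after the viewer authenticates, the remote desktop opens in a separate, resizable top-level window, keeping the control panel uncluttered; the Remote Desktop sub-tabs are wrapped in `QScrollArea`, so they scroll in small windows and stretch to full width on 4K screens. Also driveable directly through the headless API and the MCP tools (`ac_remote_*`)
 - **Clipboard** — read/write system clipboard text on Windows / macOS / Linux
 - **Screenshot & Screen Recording** — capture the full screen or a specified region as an image; record the screen to video (AVI/MP4)
 - **Action Recording & Playback** — record mouse/keyboard events and replay them
@@ -72,7 +72,7 @@
 - **Event Triggers** — automatically run scripts when an image appears, a window appears, a pixel changes, or a file changes
 - **Run History** — records scheduler / triggers / hotkeys / REST run results in SQLite; automatically attaches a screenshot on error
 - **Report Generation** — export test records as HTML, JSON, or XML reports, including success/failure status
-- **MCP Server** — JSON-RPC 2.0 Model Context Protocol service (stdio + HTTP/SSE) that lets Claude Desktop / Claude Code / custom tool-use loops drive AutoControl directly. About 90 tools, full protocol support (resources, prompts, sampling, roots, logging, progress, cancellation, elicitation), bearer-token auth + TLS, audit log, rate limit, plugin hot-reload, CI fake backend
+- **MCP Server** — JSON-RPC 2.0 Model Context Protocol service (stdio + HTTP/SSE) that lets Claude Desktop / Claude Code / custom tool-use loops drive AutoControl directly. About 100 tools, full protocol support (resources, prompts, sampling, roots, logging, progress, cancellation, elicitation), bearer-token auth + TLS, audit log, rate limit, plugin hot-reload, CI fake backend. **New in this release:** `ac_remote_host_start` / `ac_remote_host_stop` / `ac_remote_host_status` / `ac_remote_viewer_connect` / `ac_remote_viewer_disconnect` / `ac_remote_viewer_status` / `ac_remote_viewer_send_input` wrap the process-global registry used by the GUI's Remote Desktop tab, so a model can directly start a host, connect a viewer, and forward mouse/keyboard/type/hotkey actions
 - **Remote Automation** — TCP socket server **plus** a hardened REST API: bearer-token auth, per-IP rate limiting + failure lockout, SQLite audit hook, Prometheus `/metrics`, full endpoint list (`/health`, `/screen_size`, `/sessions`, `/screenshot`, `/execute`, `/audit/list`, `/audit/verify`, `/inspector/recent`, `/usb/devices`, `/diagnose`, …), and a vanilla-JS browser dashboard at `/dashboard` (any phone that can reach the host over HTTP can watch it)
 - **Plugin Loader** — drop `.py` files that define `AC_*` callables into a directory and they are registered as executor commands at runtime
 - **Shell Integration** — run shell commands within automation workflows, with asynchronous output capture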

docs/source/Eng/doc/mcp_server/mcp_server_doc.rst

Lines changed: 12 additions & 0 deletions
@@ -70,6 +70,18 @@ Scheduler / triggers / hotkeys
     ``ac_hotkey_bind``, ``ac_hotkey_unbind``, ``ac_hotkey_list``,
     ``ac_hotkey_daemon_start``, ``ac_hotkey_daemon_stop``.
 
+Remote desktop (TCP host + viewer registry)
+    ``ac_remote_host_start``, ``ac_remote_host_stop``,
+    ``ac_remote_host_status``, ``ac_remote_viewer_connect``,
+    ``ac_remote_viewer_disconnect``, ``ac_remote_viewer_status``,
+    ``ac_remote_viewer_send_input``. These wrap the same singleton
+    registry the GUI's Remote Desktop tab uses, so a model can spin
+    up a host (``token``, ``bind``, ``port``, ``fps``, ``quality``,
+    ``host_id``), open a viewer to another machine, query status, and
+    forward mouse / keyboard / type / hotkey actions through the
+    active viewer. Status tools are read-only and survive
+    ``--readonly`` mode; ``send_input`` is destructive by design.
+
 Every tool carries the MCP 2025-06-18 ``annotations`` block
 (``readOnlyHint``, ``destructiveHint``, ``idempotentHint``,
 ``openWorldHint``) so well-behaved clients can auto-approve
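
The ``--readonly`` behaviour described in this hunk can be pictured as a filter over each tool's ``annotations`` block. The sketch below is a minimal illustration, not the project's implementation: ``ToolSpec`` and ``filter_read_only`` are hypothetical names, and the annotation values given to the state-changing tools are assumed from the commit's statement that everything that mutates state is tagged ``destructiveHint``.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    """Hypothetical stand-in for a registered MCP tool (name + annotations)."""
    name: str
    annotations: dict = field(default_factory=dict)


def filter_read_only(tools):
    """Keep only the tools whose annotations set readOnlyHint to True."""
    return [tool for tool in tools if tool.annotations.get("readOnlyHint") is True]


# Annotation values are assumptions based on the commit message, not read
# from the real registry.
TOOLS = [
    ToolSpec("ac_remote_host_start", {"destructiveHint": True}),
    ToolSpec("ac_remote_host_stop", {"destructiveHint": True}),
    ToolSpec("ac_remote_host_status", {"readOnlyHint": True}),
    ToolSpec("ac_remote_viewer_connect", {"destructiveHint": True}),
    ToolSpec("ac_remote_viewer_disconnect", {"destructiveHint": True}),
    ToolSpec("ac_remote_viewer_status", {"readOnlyHint": True}),
    ToolSpec("ac_remote_viewer_send_input", {"destructiveHint": True}),
]

surviving = [tool.name for tool in filter_read_only(TOOLS)]
print(surviving)
```

Under these assumptions only the two ``*_status`` tools remain, which is the outcome the new ``test_remote_desktop_status_tools_survive_read_only_mode`` test is described as checking.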

docs/source/Eng/doc/new_features/new_features_doc.rst

Lines changed: 117 additions & 0 deletions
@@ -673,3 +673,120 @@ upload flow.
 Keep ``trusted token holders == trusted users`` in mind, or wrap
 the headless API in your own restricted ``FileReceiver`` subclass
 that vets the destination path.
+
+
+Remote desktop — AnyDesk-style popout window
+============================================
+
+The viewer panel no longer renders the live remote screen inline —
+when the viewer authenticates, a dedicated top-level
+:class:`RemoteScreenWindow` opens with the remote desktop, and the
+panel shrinks back to the connection card + controls. Closing the
+popup ✕ disconnects the session, matching AnyDesk's session-window
+ergonomics.
+
+* New module: ``je_auto_control/gui/remote_desktop/remote_screen_window.py``
+* Wraps a ``_FrameDisplay`` and re-emits its mouse / keyboard /
+  drag-and-drop / annotation signals so the panel keeps a single
+  signal source after the popout.
+* Bottom footer carries the optional file-transfer progress label /
+  bar; hidden when no transfer is active.
+* Both the TCP ``_ViewerPanel`` and the WebRTC
+  ``_WebRTCViewerPanel`` open the popup on connect / on auth_ok and
+  close it on disconnect / on stop.
+
+Why
+    The previous layout fought for vertical space: a frame display +
+    connection card + collapsibles + action row + stats + sparklines
+    + transfer progress + status bar all stacked on one tab. Pulling
+    the live screen out into its own window leaves the operator with
+    a real workspace and keeps the control surface uncluttered.
+
+
+Remote desktop — responsive sub-tab sizing
+==========================================
+
+Every Remote Desktop sub-tab is now wrapped in a ``QScrollArea``
+with ``setWidgetResizable(True)``. The wrapper lives in
+``gui/remote_desktop/tab.py`` (helper ``_wrap_in_scroll_area``).
+
+* Small / shrunk window: a vertical scrollbar appears instead of
+  clipping the dense WebRTC panels.
+* Enlarged / 4K window: the inner panel widget grows horizontally
+  with the viewport, so the connection card and session table
+  stretch edge-to-edge instead of clustering at the top-left.
+* The bottom ``addStretch(1)`` in each panel still pushes content
+  up when there is leftover height, so the layout doesn't sag.
+
+Heavy / rarely used groups (Manual SDP, Remote Files, Sync) on the
+WebRTC viewer tab are also wrapped in collapsed-by-default
+``_CollapsibleSection`` shells via the new ``_wrap_collapsed``
+helper, halving the panel's first-paint height.
+
+Removed the previous hard ``setMaximumHeight(140)`` on the WebRTC
+host's session table: ``setMinimumHeight(140)`` keeps 140 px as a
+starting hint without capping the table on large displays.
+
+
+Remote desktop — MCP tool surface
+=================================
+
+The MCP server now wraps the same singleton remote-desktop
+registry the GUI uses. The tools live under a new
+``remote_desktop_tools()`` factory in
+``je_auto_control/utils/mcp_server/tools/_factories.py``:
+
+``ac_remote_host_start``
+    Start (or restart) the singleton TCP host with ``token``,
+    ``bind``, ``port``, ``fps``, ``quality``, ``max_clients``,
+    ``host_id``. Returns
+    ``{running, port, host_id, connected_clients}``.
+
+``ac_remote_host_stop``
+    Stop the host (no-op when nothing is running).
+
+``ac_remote_host_status``
+    Read-only snapshot of the host registry. Survives
+    ``--readonly`` mode.
+
+``ac_remote_viewer_connect``
+    Open the singleton viewer to a remote host, supporting
+    ``expected_host_id`` to verify the 9-digit ID before accepting
+    the session.
+
+``ac_remote_viewer_disconnect`` / ``ac_remote_viewer_status``
+    Close / observe the active viewer (status is read-only).
+
+``ac_remote_viewer_send_input``
+    Forward an input action dict (``mouse_move``, ``mouse_press``,
+    ``mouse_release``, ``mouse_scroll``, ``key_press``,
+    ``key_release``, ``type``, ``hotkey``) through the connected
+    viewer to the remote host. Destructive — stripped under
+    ``--readonly``.
+
+A model can now drive a complete remote-control flow without
+clicking through the GUI:
+
+.. code-block:: text
+
+    ac_remote_host_start(token="tok", bind="127.0.0.1", port=0)
+    → {"running": true, "port": 51234, "host_id": "123456789",
+       "connected_clients": 0}
+
+    # … on a different machine …
+    ac_remote_viewer_connect(host="10.0.0.5", port=51234, token="tok",
+                             expected_host_id="123456789")
+    → {"connected": true, "host_id": "123456789"}
+
+    ac_remote_viewer_send_input(action={
+        "action": "mouse_move", "x": 100, "y": 200,
+    })
+    ac_remote_viewer_send_input(action={
+        "action": "type", "text": "hello",
+    })
+
+The status / observer tools (``ac_remote_host_status``,
+``ac_remote_viewer_status``) are read-only and survive the MCP
+server's ``--readonly`` filter; everything that mutates state is
+correctly tagged ``destructiveHint: true`` so MCP clients can
+prompt for user confirmation.
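
The worked example in this hunk uses a call-style shorthand. Over the wire, an MCP client sends each tool invocation as a JSON-RPC 2.0 ``tools/call`` request. The sketch below shows roughly what those requests could look like. ``tool_call`` and ``send_input_call`` are hypothetical helper names, and the client-side check against the documented action names is an illustrative assumption; the commit does not say what validation the server performs.

```python
import json
from itertools import count

# Action names listed in the documentation hunk above.
KNOWN_ACTIONS = {
    "mouse_move", "mouse_press", "mouse_release", "mouse_scroll",
    "key_press", "key_release", "type", "hotkey",
}

_request_ids = count(1)


def tool_call(name, arguments):
    """Build an MCP ``tools/call`` JSON-RPC 2.0 request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def send_input_call(action):
    """Wrap an input action dict, rejecting names the docs do not list."""
    if action.get("action") not in KNOWN_ACTIONS:
        raise ValueError(f"unknown remote input action: {action.get('action')!r}")
    return tool_call("ac_remote_viewer_send_input", {"action": action})


requests = [
    tool_call("ac_remote_host_start",
              {"token": "tok", "bind": "127.0.0.1", "port": 0}),
    send_input_call({"action": "mouse_move", "x": 100, "y": 200}),
    send_input_call({"action": "type", "text": "hello"}),
]

# One JSON document per request, ready to write to a stdio or HTTP transport.
for request in requests:
    print(json.dumps(request))
```

Checking the action name before sending keeps an obviously malformed request from reaching the remote host, which seems worthwhile given that ``ac_remote_viewer_send_input`` is the tool marked destructive.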
