|
24 | 24 | - [Accessibility Element Finder](#accessibility-element-finder) |
25 | 25 | - [AI Element Locator (VLM)](#ai-element-locator-vlm) |
26 | 26 | - [OCR (Text on Screen)](#ocr-text-on-screen) |
| 27 | + - [LLM Action Planner](#llm-action-planner) |
| 28 | + - [Runtime Variables & Control Flow](#runtime-variables--control-flow) |
| 29 | + - [Remote Desktop](#remote-desktop) |
27 | 30 | - [Clipboard](#clipboard) |
28 | 31 | - [Screenshot](#screenshot) |
29 | 32 | - [Action Recording & Playback](#action-recording--playback) |
|
57 | 60 | - **Image Recognition** — locate UI elements on screen using OpenCV template matching with configurable threshold |
58 | 61 | - **Accessibility Element Finder** — query the OS accessibility tree (Windows UIA / macOS AX) to locate buttons, menus, and controls by name/role |
59 | 62 | - **AI Element Locator (VLM)** — describe a UI element in plain language and let a vision-language model (Anthropic / OpenAI) find its screen coordinates |
60 | | -- **OCR** — extract text from screen regions using Tesseract; wait for, click, or locate rendered text |
| 63 | +- **OCR** — extract text from screen regions using Tesseract; wait for, click, or locate rendered text; regex search and full-region dump |
| 64 | +- **LLM Action Planner** — translate a plain-language description into a validated `AC_*` action list using Claude |
| 65 | +- **Runtime Variables & Control Flow** — `${var}` substitution at execution time, plus `AC_set_var` / `AC_inc_var` / `AC_if_var` / `AC_for_each` / `AC_loop` / `AC_retry` for data-driven scripts |
| 66 | +- **Remote Desktop** — stream this machine's screen and accept remote input over a token-authenticated TCP protocol, *or* connect to another machine and view + control it (host + viewer GUIs included). Optional TLS (HTTPS-grade encryption), WebSocket transport (ws:// + wss:// for browser / firewall-friendly clients), persistent 9-digit Host ID, host→viewer audio streaming, bidirectional clipboard sync (text + image), and chunked file transfer (drag-drop + progress bar; arbitrary destination path; no size cap) |
61 | 67 | - **Clipboard** — read/write system clipboard text on Windows, macOS, and Linux |
62 | 68 | - **Screenshot & Screen Recording** — capture full screen or regions as images, record screen to video (AVI/MP4) |
63 | 69 | - **Action Recording & Playback** — record mouse/keyboard events and replay them |
@@ -408,6 +414,201 @@ If Tesseract is not on `PATH`, point at it explicitly: |
408 | 414 | ac.set_tesseract_cmd(r"C:\Program Files\Tesseract-OCR\tesseract.exe") |
409 | 415 | ``` |
410 | 416 |
|
| 417 | +Dump every recognised text record in a region (or full screen), or |
| 418 | +search by regex when the text varies: |
| 419 | + |
| 420 | +```python |
| 421 | +import je_auto_control as ac |
| 422 | + |
| 423 | +# Every hit in a region as TextMatch records (text, bounding box, confidence) |
| 424 | +for match in ac.read_text_in_region(region=[0, 0, 800, 600]): |
| 425 | + print(match.text, match.center, match.confidence) |
| 426 | + |
| 427 | +# Regex — accepts a pattern string or a compiled re.Pattern |
| 428 | +for match in ac.find_text_regex(r"Order#\d+"): |
| 429 | + print(match.text, match.center) |
| 430 | +``` |
| 431 | + |
| 432 | +GUI: **OCR Reader** tab. |
| 433 | + |
| 434 | +### LLM Action Planner |
| 435 | + |
| 436 | +Translate plain-language descriptions into validated `AC_*` action lists |
| 437 | +using an LLM (Anthropic Claude by default). Output is leniently parsed |
| 438 | +(strips code fences, extracts the first JSON array from prose) and then |
| 439 | +validated by the same schema the executor uses, so the result can be |
| 440 | +piped straight into `execute_action`: |
| 441 | + |
| 442 | +```python |
| 443 | +import je_auto_control as ac |
| 444 | +from je_auto_control.utils.executor.action_executor import executor |
| 445 | + |
| 446 | +actions = ac.plan_actions( |
| 447 | + "click the Submit button, then type 'done' and save", |
| 448 | + known_commands=executor.known_commands(), |
| 449 | +) |
| 450 | +executor.execute_action(actions) |
| 451 | + |
| 452 | +# Or in a single call: |
| 453 | +ac.run_from_description("open Notepad and type hello", executor=executor) |
| 454 | +``` |
| 455 | + |
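The lenient parsing step described above can be pictured roughly as follows. This is only an illustrative sketch, not the library's parser: `extract_first_json_array` is a hypothetical helper name, and the real implementation may cover more edge cases before schema validation runs.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built indirectly to keep this block readable
_FENCED = re.compile(FENCE + r"[\w-]*\n(.*?)" + FENCE, re.DOTALL)


def extract_first_json_array(reply: str) -> list:
    """Return the first JSON array found in an LLM reply (hypothetical helper)."""
    # Prefer the body of a fenced block if the model wrapped its answer in one.
    fenced = _FENCED.search(reply)
    text = fenced.group(1) if fenced else reply
    decoder = json.JSONDecoder()
    # Try each '[' as a possible array start; raw_decode tolerates trailing prose.
    for start, char in enumerate(text):
        if char != "[":
            continue
        try:
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            continue  # e.g. a bracket inside ordinary prose
        if isinstance(value, list):
            return value
    raise ValueError("no JSON array found in reply")
```

The returned list would still need to pass the executor's schema validation before being executed.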
| 456 | +| Environment variable | Effect |
| 457 | +|---|---|
| 458 | +| `ANTHROPIC_API_KEY` | Enables the Anthropic backend |
| 459 | +| `AUTOCONTROL_LLM_BACKEND` | Set to `anthropic` to force that backend |
| 460 | +| `AUTOCONTROL_LLM_MODEL` | Override the default model (e.g. `claude-opus-4-7`) | |
| 461 | + |
| 462 | +GUI: **LLM Planner** tab — description box, `QThread`-backed *Plan* |
| 463 | +button, action-list preview, and a *Run plan* button. |
| 464 | + |
| 465 | +### Runtime Variables & Control Flow |
| 466 | + |
| 467 | +The executor resolves `${var}` placeholders **per command call** rather |
| 468 | +than pre-flattening, so nested `body` / `then` / `else` lists keep their |
| 469 | +placeholders and re-bind on every iteration. Combined with new mutation |
| 470 | +commands, scripts can drive themselves from data without Python glue: |
| 471 | + |
| 472 | +```json |
| 473 | +[ |
| 474 | + ["AC_set_var", {"name": "items", "value": ["alpha", "beta"]}], |
| 475 | + ["AC_set_var", {"name": "i", "value": 0}], |
| 476 | + ["AC_for_each", { |
| 477 | + "items": "${items}", "as": "name", |
| 478 | + "body": [ |
| 479 | + ["AC_inc_var", {"name": "i"}], |
| 480 | + ["AC_if_var", { |
| 481 | + "name": "i", "op": "ge", "value": 2, |
| 482 | + "then": [["AC_break"]], "else": [] |
| 483 | + }] |
| 484 | + ] |
| 485 | + }] |
| 486 | +] |
| 487 | +``` |
| 488 | + |
| 489 | +`AC_if_var` operators: `eq`, `ne`, `lt`, `le`, `gt`, `ge`, `contains`, |
| 490 | +`startswith`, `endswith`. GUI: **Variables** tab — live view of |
| 491 | +`executor.variables` with single-set, JSON seed, and clear-all controls. |
| 492 | + |
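To make the "per command call" resolution concrete, here is a toy interpreter written under stated assumptions: it is not the project's executor, `demo_log` is a hypothetical command invented for the demo, and only `AC_set_var` and `AC_for_each` are modelled. The point it illustrates is that nested `body` lists are left untouched until they actually run, so `${name}` re-binds on every iteration.

```python
import re

_WHOLE = re.compile(r"^\$\{(\w+)\}$")
_EMBEDDED = re.compile(r"\$\{(\w+)\}")
_NESTED_KEYS = {"body", "then", "else"}  # left unresolved until they execute


def resolve(value, variables):
    """Substitute ${var} in a single argument value at call time."""
    if isinstance(value, str):
        whole = _WHOLE.match(value)
        if whole:  # "${items}" keeps the variable's real type, e.g. a list
            return variables[whole.group(1)]
        return _EMBEDDED.sub(lambda m: str(variables[m.group(1)]), value)
    if isinstance(value, list):
        return [resolve(v, variables) for v in value]
    if isinstance(value, dict):
        return {k: resolve(v, variables) for k, v in value.items()}
    return value


def run(actions, variables, log):
    for name, *rest in actions:
        raw = rest[0] if rest else {}
        # Resolve only this command's own arguments, not its nested lists.
        args = {k: (v if k in _NESTED_KEYS else resolve(v, variables))
                for k, v in raw.items()}
        if name == "AC_set_var":
            variables[args["name"]] = args["value"]
        elif name == "AC_for_each":
            for item in args["items"]:
                variables[args["as"]] = item
                run(args["body"], variables, log)
        elif name == "demo_log":  # hypothetical command, for this sketch only
            log.append(args["message"])


variables, log = {}, []
run([
    ["AC_set_var", {"name": "items", "value": ["alpha", "beta"]}],
    ["AC_for_each", {"items": "${items}", "as": "name",
                     "body": [["demo_log", {"message": "hello ${name}"}]]}],
], variables, log)
print(log)  # ['hello alpha', 'hello beta']
```

Had the body been flattened up front, `${name}` would have been resolved once (or failed as undefined) instead of once per item.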
| 493 | +### Remote Desktop |
| 494 | + |
| 495 | +Stream this machine's screen and accept remote input, **or** view and |
| 496 | +control another machine. The wire format is a length-prefixed framing |
| 497 | +on raw TCP (no extra deps), starting with an HMAC-SHA256 |
| 498 | +challenge / response handshake; viewers that fail auth are dropped |
| 499 | +before they can see a frame. JPEG frames are produced at the configured |
| 500 | +FPS / quality and broadcast to authenticated viewers via a shared |
| 501 | +latest-frame slot, so a slow viewer drops frames instead of blocking |
| 502 | +the rest. Viewer input is JSON, validated against an allowlist, and |
| 503 | +applied through the existing wrappers. |
| 504 | + |
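The "shared latest-frame slot" idea can be sketched as below. This is an assumption-laden illustration rather than the project's code: the class name `LatestFrameSlot` and its methods are hypothetical. The producer overwrites a single slot, and each viewer thread asks for anything newer than the last sequence number it sent, so a slow viewer simply skips intermediate frames.

```python
import threading


class LatestFrameSlot:
    """Single-slot broadcast buffer: readers only ever see the newest frame."""

    def __init__(self):
        self._cond = threading.Condition()
        self._frame = None
        self._seq = 0

    def publish(self, frame: bytes) -> None:
        # Overwrite the previous frame; nothing is queued per viewer.
        with self._cond:
            self._frame = frame
            self._seq += 1
            self._cond.notify_all()

    def wait_newer(self, last_seq: int, timeout=None):
        """Block until a frame newer than last_seq exists (or timeout expires)."""
        with self._cond:
            self._cond.wait_for(lambda: self._seq > last_seq, timeout)
            return self._seq, self._frame
```

Because no per-viewer queue exists, memory use stays constant regardless of how far behind a viewer falls.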
| 505 | +```python |
| 506 | +# Be remoted — start a host and hand the token + port to whoever views you |
| 507 | +from je_auto_control import RemoteDesktopHost |
| 508 | +host = RemoteDesktopHost(token="hunter2", bind="127.0.0.1", |
| 509 | + port=0, fps=10, quality=70) |
| 510 | +host.start() |
| 511 | +print("listening on", host.port, "viewers:", host.connected_clients) |
| 512 | +``` |
| 513 | + |
| 514 | +```python |
| 515 | +# Control another machine — connect a viewer and send input |
| 516 | +from je_auto_control import RemoteDesktopViewer |
| 517 | +viewer = RemoteDesktopViewer(host="10.0.0.5", port=51234, token="hunter2", |
| 518 | + on_frame=lambda jpeg: ...) |
| 519 | +viewer.connect() |
| 520 | +viewer.send_input({"action": "mouse_move", "x": 100, "y": 200}) |
| 521 | +viewer.send_input({"action": "type", "text": "hello"}) |
| 522 | +viewer.disconnect() |
| 523 | +``` |
| 524 | + |
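For readers curious about the wire format, the length-prefixed framing and HMAC-SHA256 challenge / response described at the top of this section could look something like the sketch below. The exact layout is not specified here, so the 4-byte big-endian length prefix, the 32-byte nonce, and all function names are assumptions made for illustration.

```python
import hashlib
import hmac
import os
import socket
import struct


def send_frame(sock: socket.socket, payload: bytes) -> None:
    # Assumed layout: 4-byte big-endian length, then the payload.
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-frame")
        buf.extend(chunk)
    return bytes(buf)


def recv_frame(sock: socket.socket) -> bytes:
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def host_authenticate(sock: socket.socket, token: str) -> bool:
    """Host side: send a random challenge, verify the viewer's HMAC."""
    nonce = os.urandom(32)
    send_frame(sock, nonce)
    expected = hmac.new(token.encode(), nonce, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(expected, recv_frame(sock))


def viewer_respond(sock: socket.socket, token: str) -> None:
    """Viewer side: answer the challenge without ever sending the token."""
    nonce = recv_frame(sock)
    send_frame(sock, hmac.new(token.encode(), nonce, hashlib.sha256).digest())
```

A fresh random nonce per connection means a captured response cannot be replayed against a later handshake.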
| 525 | +GUI: **Remote Desktop** tab with two sub-tabs. |
| 526 | + |
| 527 | +- **Host** — token field with a *Generate* button, security warning |
| 528 | + about the bind address, start / stop controls, refreshing port + |
| 529 | + viewer-count status, and a 4 fps preview pane below the controls so |
| 530 | + the user being remoted sees what viewers see. |
| 531 | +- **Viewer** — address / port / token form, *Connect* / *Disconnect*, |
| 532 | + and a custom frame-display widget that paints incoming JPEG frames |
| 533 | + scaled with `KeepAspectRatio`. Mouse / wheel / key events on the |
| 534 | + display are remapped from widget coordinates back to the remote |
| 535 | + screen's pixel space using the latest frame's dimensions, then |
| 536 | + forwarded as `INPUT` messages. |
| 537 | + |
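The coordinate remapping in the **Viewer** sub-tab can be approximated with the arithmetic below. It is a sketch under assumptions: `widget_to_remote` is a hypothetical function name, and the frame is assumed to be centred in the widget with letterbox bars on the unused sides.

```python
def widget_to_remote(x, y, widget_w, widget_h, frame_w, frame_h):
    """Map a widget-space point to remote-screen pixels.

    Returns None when the point lies in the letterbox bars around the
    aspect-preserved image.
    """
    # Aspect-preserving scaling uses the smaller of the two scale factors.
    scale = min(widget_w / frame_w, widget_h / frame_h)
    shown_w, shown_h = frame_w * scale, frame_h * scale
    # Assumption: the scaled image is centred inside the widget.
    off_x, off_y = (widget_w - shown_w) / 2, (widget_h - shown_h) / 2
    rx, ry = (x - off_x) / scale, (y - off_y) / scale
    if not (0 <= rx < frame_w and 0 <= ry < frame_h):
        return None
    return int(rx), int(ry)
```

For example, with a 960x600 widget showing a 1920x1080 frame, the widget centre (480, 300) maps to (960, 540), while a click at (10, 10) lands in the top bar and is ignored.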
| 538 | +> ⚠️ Anyone with the host:port and token gets full mouse / keyboard |
| 539 | +> control of the host machine. Default bind is `127.0.0.1`; expose |
| 540 | +> externally only via SSH tunnel or TLS front-end. The token is the |
| 541 | +> only line of defence — treat it like a password. |
| 542 | +
|
| 543 | +**Encrypted transports + alternate protocols.** Pass an `ssl_context` |
| 544 | +to either `RemoteDesktopHost` or `RemoteDesktopViewer` to wrap every |
| 545 | +connection in TLS. For firewall-friendly access, use the in-tree |
| 546 | +WebSocket variants (no extra deps) — same protocol, RFC 6455 framing, |
| 547 | +and `wss://` if you also pass `ssl_context`: |
| 548 | + |
| 549 | +```python |
| 550 | +from je_auto_control import ( |
| 551 | + WebSocketDesktopHost, WebSocketDesktopViewer, |
| 552 | +) |
| 553 | +host = WebSocketDesktopHost(token="hunter2", ssl_context=server_ctx)  # server_ctx / client_ctx: TLS contexts you create beforehand
| 554 | +viewer = WebSocketDesktopViewer( |
| 555 | + host="example.com", port=443, token="hunter2", |
| 556 | + ssl_context=client_ctx, expected_host_id="123456789", |
| 557 | +) |
| 558 | +``` |
| 559 | + |
| 560 | +**Persistent Host ID.** Every host owns a stable 9-digit numeric ID |
| 561 | +(persisted at `~/.je_auto_control/remote_host_id`), announced in |
| 562 | +`AUTH_OK` and verifiable via the viewer's `expected_host_id`: |
| 563 | + |
| 564 | +```python |
| 565 | +print(host.host_id) # e.g. "123456789" |
| 566 | +viewer = RemoteDesktopViewer( |
| 567 | + host=..., port=..., token=..., |
| 568 | + expected_host_id="123456789", # AuthenticationError on mismatch |
| 569 | +) |
| 570 | +``` |
| 571 | + |
| 572 | +**Audio streaming (host → viewer).** Requires the optional
| 573 | +`sounddevice` dependency. Opt in with `enable_audio=True` on the host
| 574 | +and attach an `AudioPlayer` (or your own callback) on the viewer:
| 575 | + |
| 576 | +```python |
| 577 | +host = RemoteDesktopHost(token="tok", enable_audio=True) |
| 578 | + |
| 579 | +from je_auto_control.utils.remote_desktop import AudioPlayer |
| 580 | +player = AudioPlayer(); player.start() |
| 581 | +viewer = RemoteDesktopViewer(host=..., on_audio=player.play) |
| 582 | +``` |
| 583 | + |
| 584 | +**Clipboard sync (text + image, bidirectional).** Explicit per-call — |
| 585 | +no auto-poll loops. Image clipboard works on Windows (CF_DIB via |
| 586 | +ctypes) and Linux (`xclip -t image/png`); on macOS, reading is supported
| 587 | +via Pillow `ImageGrab`, while writing requires PyObjC.
| 588 | + |
| 589 | +```python |
| 590 | +viewer.send_clipboard_text("hello") |
| 591 | +viewer.send_clipboard_image(open("logo.png", "rb").read()) |
| 592 | +host.broadcast_clipboard_text("greetings") |
| 593 | +``` |
| 594 | + |
| 595 | +**File transfer with progress.** Bidirectional, chunked, arbitrary |
| 596 | +destination path, no size cap; the GUI viewer also accepts drag-drop: |
| 597 | + |
| 598 | +```python |
| 599 | +viewer.send_file( |
| 600 | + "local.bin", "/tmp/uploaded.bin", |
| 601 | + on_progress=lambda tid, done, total: print(done, total), |
| 602 | +) |
| 603 | +host.send_file_to_viewers("local.bin", "/tmp/from_host.bin") |
| 604 | +``` |
| 605 | + |
| 606 | +> ⚠️ The destination path is unrestricted and there is no aggregate size
| 607 | +> limit. Anyone with the token can write any file to any location and
| 608 | +> can fill the disk, so treat every token holder as a fully trusted
| 609 | +> user, or wrap the receiver in your own `FileReceiver` subclass that
| 610 | +> vets destination paths.
| 611 | +
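The kind of destination check such a `FileReceiver` subclass might perform is sketched below. The hook method a subclass would override is not shown in this README, so only the standalone check is given; `vet_destination` and `allowed_root` are hypothetical names, and `Path.is_relative_to` requires Python 3.9 or newer.

```python
from pathlib import Path


def vet_destination(dest: str, allowed_root: str) -> Path:
    """Resolve dest under allowed_root, rejecting anything that escapes it."""
    root = Path(allowed_root).resolve()
    # Joining with an absolute dest discards root, so the check below
    # catches both "/etc/passwd"-style and "../"-style escapes.
    candidate = (root / dest).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"destination escapes {root}: {dest}")
    return candidate
```

Resolving before comparing matters: a purely textual prefix check would accept `uploads/../../etc/passwd`.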
|
411 | 612 | ### Clipboard |
412 | 613 |
|
413 | 614 | ```python |
@@ -494,10 +695,13 @@ je_auto_control.execute_action([ |
494 | 695 | | Screen | `AC_screen_size`, `AC_screenshot` | |
495 | 696 | | Accessibility | `AC_a11y_list`, `AC_a11y_find`, `AC_a11y_click` | |
496 | 697 | | VLM (AI Locator) | `AC_vlm_locate`, `AC_vlm_click` | |
497 | | -| OCR | `AC_locate_text`, `AC_click_text`, `AC_wait_text` | |
| 698 | +| OCR | `AC_locate_text`, `AC_click_text`, `AC_wait_text`, `AC_read_text_in_region`, `AC_find_text_regex` | |
| 699 | +| LLM planner | `AC_llm_plan`, `AC_llm_run` | |
498 | 700 | | Clipboard | `AC_clipboard_get`, `AC_clipboard_set` | |
499 | 701 | | Window | `AC_list_windows`, `AC_focus_window`, `AC_wait_window`, `AC_close_window` | |
500 | | -| Flow control | `AC_loop`, `AC_break`, `AC_continue`, `AC_if_image_found`, `AC_if_pixel`, `AC_while_image`, `AC_wait_image`, `AC_wait_pixel`, `AC_sleep`, `AC_retry` | |
| 702 | +| Flow control | `AC_loop`, `AC_break`, `AC_continue`, `AC_if_image_found`, `AC_if_pixel`, `AC_if_var`, `AC_while_image`, `AC_for_each`, `AC_wait_image`, `AC_wait_pixel`, `AC_sleep`, `AC_retry` | |
| 703 | +| Variables | `AC_set_var`, `AC_get_var`, `AC_inc_var` | |
| 704 | +| Remote desktop | `AC_start_remote_host`, `AC_stop_remote_host`, `AC_remote_host_status`, `AC_remote_connect`, `AC_remote_disconnect`, `AC_remote_viewer_status`, `AC_remote_send_input` | |
501 | 705 | | Record | `AC_record`, `AC_stop_record`, `AC_set_record_enable` | |
502 | 706 | | Report | `AC_generate_html`, `AC_generate_json`, `AC_generate_xml`, `AC_generate_html_report`, `AC_generate_json_report`, `AC_generate_xml_report` | |
503 | 707 | | Run history | `AC_history_list`, `AC_history_clear` | |
|