Skip to content

Commit 241b812

Browse files
authored
Merge pull request #175 from Integration-Automation/dev
Merge dev: VLM, accessibility, run history, analyzer fixes
2 parents 08b7b18 + 6c046e8 commit 241b812

101 files changed

Lines changed: 6112 additions & 1112 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.codacy.yaml

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,25 @@
1+
---
2+
# Codacy repository configuration.
3+
# Docs: https://docs.codacy.com/repositories-configure/codacy-configuration-file/
4+
5+
engines:
6+
bandit:
7+
enabled: true
8+
exclude_paths:
9+
# Test code legitimately uses `assert` (B101); pytest depends on it.
10+
# Library/non-test code is constrained by CLAUDE.md "no assert outside tests".
11+
- "test/**"
12+
prospector:
13+
enabled: true
14+
exclude_paths:
15+
- "test/**"
16+
17+
# Drop generated docs / build outputs from analysis entirely.
18+
exclude_paths:
19+
- "docs/build/**"
20+
- ".venv/**"
21+
- "build/**"
22+
- "dist/**"
23+
# One-off GUI smoke scripts — not library code, not pytest targets.
24+
- "test/gui_test/**"
25+
- "test/manual_test/**"

README.md

Lines changed: 259 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -18,15 +18,23 @@
1818
- [Installation](#installation)
1919
- [Requirements](#requirements)
2020
- [Quick Start](#quick-start)
21-
- [API Reference](#api-reference)
2221
- [Mouse Control](#mouse-control)
2322
- [Keyboard Control](#keyboard-control)
2423
- [Image Recognition](#image-recognition)
25-
- [Screen Operations](#screen-operations)
24+
- [Accessibility Element Finder](#accessibility-element-finder)
25+
- [AI Element Locator (VLM)](#ai-element-locator-vlm)
26+
- [OCR (Text on Screen)](#ocr-text-on-screen)
27+
- [Clipboard](#clipboard)
28+
- [Screenshot](#screenshot)
2629
- [Action Recording & Playback](#action-recording--playback)
27-
- [Action Scripting (JSON Executor)](#action-scripting-json-executor)
30+
- [JSON Action Scripting](#json-action-scripting)
31+
- [Scheduler (Interval & Cron)](#scheduler-interval--cron)
32+
- [Global Hotkey Daemon](#global-hotkey-daemon)
33+
- [Event Triggers](#event-triggers)
34+
- [Run History](#run-history)
2835
- [Report Generation](#report-generation)
29-
- [Remote Automation (Socket Server)](#remote-automation-socket-server)
36+
- [Remote Automation (Socket / REST)](#remote-automation-socket--rest)
37+
- [Plugin Loader](#plugin-loader)
3038
- [Shell Command Execution](#shell-command-execution)
3139
- [Screen Recording](#screen-recording)
3240
- [Callback Executor](#callback-executor)
@@ -46,17 +54,27 @@
4654
- **Mouse Automation** — move, click, press, release, drag, and scroll with precise coordinate control
4755
- **Keyboard Automation** — press/release individual keys, type strings, hotkey combinations, key state detection
4856
- **Image Recognition** — locate UI elements on screen using OpenCV template matching with configurable threshold
57+
- **Accessibility Element Finder** — query the OS accessibility tree (Windows UIA / macOS AX) to locate buttons, menus, and controls by name/role
58+
- **AI Element Locator (VLM)** — describe a UI element in plain language and let a vision-language model (Anthropic / OpenAI) find its screen coordinates
59+
- **OCR** — extract text from screen regions using Tesseract; wait for, click, or locate rendered text
60+
- **Clipboard** — read/write system clipboard text on Windows, macOS, and Linux
4961
- **Screenshot & Screen Recording** — capture full screen or regions as images, record screen to video (AVI/MP4)
5062
- **Action Recording & Playback** — record mouse/keyboard events and replay them
51-
- **JSON-Based Action Scripting** — define and execute automation flows using JSON action files
63+
- **JSON-Based Action Scripting** — define and execute automation flows using JSON action files (dry-run + step debug)
64+
- **Scheduler** — run scripts on an interval or cron expression; jobs persist across restarts
65+
- **Global Hotkey Daemon** — bind OS-level hotkeys to action scripts (Windows today; macOS/Linux stubs in place)
66+
- **Event Triggers** — fire scripts when an image appears, a window opens, a pixel changes, or a file is modified
67+
- **Run History** — SQLite-backed run log across scheduler / triggers / hotkeys / REST with auto error-screenshot artifacts
5268
- **Report Generation** — export test records as HTML, JSON, or XML reports with success/failure status
53-
- **Remote Automation** — start a TCP socket server to receive and execute automation commands from remote clients
69+
- **Remote Automation** — TCP socket server **and** REST API server to receive automation commands
70+
- **Plugin Loader** — drop `.py` files exposing `AC_*` callables into a directory and register them as executor commands at runtime
5471
- **Shell Integration** — execute shell commands within automation workflows with async output capture
5572
- **Callback Executor** — trigger automation functions with callback hooks for chaining operations
5673
- **Dynamic Package Loading** — extend the executor at runtime by importing external Python packages
5774
- **Project & Template Management** — scaffold automation projects with keyword/executor directory structure
5875
- **Window Management** — send keyboard/mouse events directly to specific windows (Windows/Linux)
59-
- **GUI Application** — built-in PySide6 graphical interface for interactive automation
76+
- **GUI Application** — built-in PySide6 graphical interface with live language switching (English / 繁體中文 / 简体中文 / 日本語)
77+
- **CLI Runner**`python -m je_auto_control.cli run|list-jobs|start-server|start-rest`
6078
- **Cross-Platform** — unified API across Windows, macOS, and Linux (X11)
6179

6280
---
@@ -80,10 +98,23 @@ je_auto_control/
8098
├── executor/ # JSON action executor engine
8199
├── callback/ # Callback function executor
82100
├── cv2_utils/ # OpenCV screenshot, template matching, video recording
101+
├── accessibility/ # UIA (Windows) / AX (macOS) element finder
102+
├── vision/ # VLM-based locator (Anthropic / OpenAI backends)
103+
├── ocr/ # Tesseract-backed text locator
104+
├── clipboard/ # Cross-platform clipboard
105+
├── scheduler/ # Interval + cron scheduler
106+
├── hotkey/ # Global hotkey daemon
107+
├── triggers/ # Image/window/pixel/file triggers
108+
├── run_history/ # SQLite run log + error-screenshot artifacts
109+
├── rest_api/ # Stdlib HTTP/REST server
110+
├── plugin_loader/ # Dynamic AC_* plugin discovery
83111
├── socket_server/ # TCP socket server for remote automation
84112
├── shell_process/ # Shell command manager
85113
├── generate_report/ # HTML / JSON / XML report generators
86114
├── test_record/ # Test action recording
115+
├── script_vars/ # Script variable interpolation
116+
├── watcher/ # Mouse / pixel / log watchers (Live HUD)
117+
├── recording_edit/ # Trim, filter, re-scale recorded actions
87118
├── json/ # JSON action file read/write
88119
├── project/ # Project scaffolding & templates
89120
├── package_manager/ # Dynamic package loading
@@ -135,6 +166,13 @@ sudo apt-get install cmake libssl-dev
135166
| `python-Xlib` | Linux X11 backend (auto-installed on Linux) |
136167
| `PySide6` | GUI application (optional, install with `[gui]`) |
137168
| `qt-material` | GUI theme (optional, install with `[gui]`) |
169+
| `uiautomation` | Windows accessibility backend (optional, loaded on demand) |
170+
| `pytesseract` + Tesseract | OCR engine (optional, loaded on demand) |
171+
| `anthropic` | VLM locator — Anthropic backend (optional, loaded on demand) |
172+
| `openai` | VLM locator — OpenAI backend (optional, loaded on demand) |
173+
174+
See [Third_Party_License.md](Third_Party_License.md) for a full list of
175+
third-party components and their licenses.
138176

139177
---
140178

@@ -197,6 +235,95 @@ print(f"Found at: ({cx}, {cy})")
197235
je_auto_control.locate_and_click("submit_button.png", mouse_keycode="mouse_left")
198236
```
199237

238+
### Accessibility Element Finder
239+
240+
Query the OS accessibility tree to locate controls by name, role, or app.
241+
Works on Windows (UIA, via `uiautomation`) and macOS (AX).
242+
243+
```python
244+
import je_auto_control
245+
246+
# List all visible buttons in the Calculator app
247+
elements = je_auto_control.list_accessibility_elements(app_name="Calculator")
248+
249+
# Find a specific element
250+
ok = je_auto_control.find_accessibility_element(name="OK", role="Button")
251+
if ok is not None:
252+
print(ok.bounds, ok.center)
253+
254+
# Click it directly
255+
je_auto_control.click_accessibility_element(name="OK", app_name="Calculator")
256+
```
257+
258+
Raises `AccessibilityNotAvailableError` if no accessibility backend is
259+
installed for the current platform.
260+
261+
### AI Element Locator (VLM)
262+
263+
When template matching and accessibility both fail, describe the element
264+
in plain language and let a vision-language model find its coordinates.
265+
266+
```python
267+
import je_auto_control
268+
269+
# Uses Anthropic by default if ANTHROPIC_API_KEY is set, else OpenAI.
270+
x, y = je_auto_control.locate_by_description("the green Submit button")
271+
272+
# Or click it in one shot
273+
je_auto_control.click_by_description(
274+
"the cookie-banner 'Accept all' button",
275+
screen_region=[0, 800, 1920, 1080], # optional crop
276+
)
277+
```
278+
279+
Configuration (environment variables only — keys are never persisted or
280+
logged):
281+
282+
| Variable | Effect |
283+
|---|---|
284+
| `ANTHROPIC_API_KEY` | Enables the Anthropic backend |
285+
| `OPENAI_API_KEY` | Enables the OpenAI backend |
286+
| `AUTOCONTROL_VLM_BACKEND` | `anthropic` or `openai` to force a backend |
287+
| `AUTOCONTROL_VLM_MODEL` | Override the default model (e.g. `claude-opus-4-7`, `gpt-4o-mini`) |
288+
289+
Raises `VLMNotAvailableError` if neither SDK is installed or no API key
290+
is set.
291+
292+
### OCR (Text on Screen)
293+
294+
```python
295+
import je_auto_control as ac
296+
297+
# Locate all matches of a piece of text
298+
matches = ac.find_text_matches("Submit")
299+
300+
# Center of the first match, or None
301+
cx, cy = ac.locate_text_center("Submit")
302+
303+
# Click text in one call
304+
ac.click_text("Submit")
305+
306+
# Block until text appears (or timeout)
307+
ac.wait_for_text("Loading complete", timeout=15.0)
308+
```
309+
310+
If Tesseract is not on `PATH`, point at it explicitly:
311+
312+
```python
313+
ac.set_tesseract_cmd(r"C:\Program Files\Tesseract-OCR\tesseract.exe")
314+
```
315+
316+
### Clipboard
317+
318+
```python
319+
import je_auto_control as ac
320+
ac.set_clipboard("hello")
321+
text = ac.get_clipboard()
322+
```
323+
324+
Backends: Windows (Win32 via `ctypes`), macOS (`pbcopy`/`pbpaste`),
325+
Linux (`xclip` or `xsel`).
326+
200327
### Screenshot
201328

202329
```python
@@ -270,13 +397,84 @@ je_auto_control.execute_action([
270397
| Keyboard | `AC_type_keyboard`, `AC_press_keyboard_key`, `AC_release_keyboard_key`, `AC_write`, `AC_hotkey`, `AC_check_key_is_press` |
271398
| Image | `AC_locate_all_image`, `AC_locate_image_center`, `AC_locate_and_click` |
272399
| Screen | `AC_screen_size`, `AC_screenshot` |
400+
| Accessibility | `AC_a11y_list`, `AC_a11y_find`, `AC_a11y_click` |
401+
| VLM (AI Locator) | `AC_vlm_locate`, `AC_vlm_click` |
402+
| OCR | `AC_locate_text`, `AC_click_text`, `AC_wait_text` |
403+
| Clipboard | `AC_clipboard_get`, `AC_clipboard_set` |
273404
| Record | `AC_record`, `AC_stop_record` |
274405
| Report | `AC_generate_html`, `AC_generate_json`, `AC_generate_xml`, `AC_generate_html_report`, `AC_generate_json_report`, `AC_generate_xml_report` |
275406
| Project | `AC_create_project` |
276407
| Shell | `AC_shell_command` |
277408
| Process | `AC_execute_process` |
278409
| Executor | `AC_execute_action`, `AC_execute_files` |
279410

411+
### Scheduler (Interval & Cron)
412+
413+
```python
414+
import je_auto_control as ac
415+
416+
# Interval job — run every 30 seconds
417+
job = ac.default_scheduler.add_job(
418+
script_path="scripts/poll.json", interval_seconds=30, repeat=True,
419+
)
420+
421+
# Cron job — 09:00 on weekdays (minute hour dom month dow)
422+
cron_job = ac.default_scheduler.add_cron_job(
423+
script_path="scripts/daily.json", cron_expression="0 9 * * 1-5",
424+
)
425+
426+
ac.default_scheduler.start()
427+
```
428+
429+
Both flavours coexist; `job.is_cron` tells them apart.
430+
431+
### Global Hotkey Daemon
432+
433+
Bind OS-level hotkeys to action JSON scripts (Windows backend today;
434+
macOS / Linux raise `NotImplementedError` on `start()` with Strategy-
435+
pattern seams in place).
436+
437+
```python
438+
from je_auto_control import default_hotkey_daemon
439+
440+
default_hotkey_daemon.bind("ctrl+alt+1", "scripts/greet.json")
441+
default_hotkey_daemon.start()
442+
```
443+
444+
### Event Triggers
445+
446+
Poll-based triggers that fire a script when a condition becomes true:
447+
448+
```python
449+
from je_auto_control import (
450+
default_trigger_engine, ImageAppearsTrigger,
451+
WindowAppearsTrigger, PixelColorTrigger, FilePathTrigger,
452+
)
453+
454+
default_trigger_engine.add(ImageAppearsTrigger(
455+
trigger_id="", script_path="scripts/click_ok.json",
456+
image_path="templates/ok_button.png", threshold=0.85, repeat=True,
457+
))
458+
default_trigger_engine.start()
459+
```
460+
461+
### Run History
462+
463+
Every run from the scheduler, trigger engine, hotkey daemon, REST API,
464+
and manual GUI replay is recorded to `~/.je_auto_control/history.db`.
465+
Errors automatically attach a screenshot under
466+
`~/.je_auto_control/artifacts/run_{id}_{ms}.png` for post-mortem.
467+
468+
```python
469+
from je_auto_control import default_history_store
470+
471+
for run in default_history_store.list_runs(limit=20):
472+
print(run.id, run.source, run.status, run.artifact_path)
473+
```
474+
475+
The GUI **Run History** tab exposes filter/refresh/clear and
476+
double-click-to-open on the artifact column.
477+
280478
### Report Generation
281479

282480
```python
@@ -302,19 +500,24 @@ xml_string = je_auto_control.generate_xml()
302500

303501
Reports include: function name, parameters, timestamp, and exception info (if any) for each recorded action. HTML reports display successful actions in cyan and failed actions in red.
304502

305-
### Remote Automation (Socket Server)
503+
### Remote Automation (Socket / REST)
306504

307-
Start a TCP server to receive JSON automation commands from remote clients:
505+
Two servers are available — a raw TCP socket and a stdlib HTTP/REST
506+
server. Both default to `127.0.0.1`; binding to `0.0.0.0` is an explicit,
507+
documented opt-in.
308508

309509
```python
310-
import je_auto_control
510+
import je_auto_control as ac
311511

312-
# Start the server (default: localhost:9938)
313-
server = je_auto_control.start_autocontrol_socket_server(host="localhost", port=9938)
512+
# TCP socket server (default: 127.0.0.1:9938)
513+
ac.start_autocontrol_socket_server(host="127.0.0.1", port=9938)
314514

315-
# The server runs in a background thread
316-
# Send JSON action commands via TCP to execute remotely
317-
# Send "quit_server" to shut down
515+
# REST API server (default: 127.0.0.1:9939)
516+
ac.start_rest_api_server(host="127.0.0.1", port=9939)
517+
# Endpoints:
518+
# GET /health liveness probe
519+
# GET /jobs scheduler job list
520+
# POST /execute body: {"actions": [...]}
318521
```
319522

320523
Client example:
@@ -339,6 +542,26 @@ print(response)
339542
sock.close()
340543
```
341544

545+
### Plugin Loader
546+
547+
Drop `.py` files defining top-level `AC_*` callables into a directory,
548+
then register them as executor commands at runtime:
549+
550+
```python
551+
from je_auto_control import (
552+
load_plugin_directory, register_plugin_commands,
553+
)
554+
555+
commands = load_plugin_directory("./my_plugins")
556+
register_plugin_commands(commands)
557+
558+
# Now usable from any JSON action script:
559+
# [["AC_greet", {"name": "world"}]]
560+
```
561+
562+
> **Warning:** Plugin files execute arbitrary Python on load. Only load
563+
> from directories you control.
564+
342565
### Shell Command Execution
343566

344567
```python
@@ -497,6 +720,24 @@ python -m je_auto_control --execute_str '[["AC_screenshot", {"file_path": "test.
497720
python -m je_auto_control -c ./my_project
498721
```
499722

723+
A richer subcommand CLI built on the headless APIs:
724+
725+
```bash
726+
# Run a script, optionally with variables, and/or a dry-run
727+
python -m je_auto_control.cli run script.json
728+
python -m je_auto_control.cli run script.json --var name=alice --dry-run
729+
730+
# List scheduler jobs
731+
python -m je_auto_control.cli list-jobs
732+
733+
# Start the socket or REST server
734+
python -m je_auto_control.cli start-server --port 9938
735+
python -m je_auto_control.cli start-rest --port 9939
736+
```
737+
738+
`--var name=value` is parsed as JSON when possible (so `count=10` becomes
739+
an int), otherwise treated as a string.
740+
500741
---
501742

502743
## Platform Support
@@ -541,4 +782,6 @@ python -m pytest test/integrated_test/
541782

542783
## License
543784

544-
[MIT License](LICENSE) © JE-Chen
785+
[MIT License](LICENSE) © JE-Chen.
786+
See [Third_Party_License.md](Third_Party_License.md) for the licenses of
787+
bundled and optional third-party dependencies.

0 commit comments

Comments
 (0)