1818- [ Installation] ( #installation )
1919- [ Requirements] ( #requirements )
2020- [ Quick Start] ( #quick-start )
21- - [ API Reference] ( #api-reference )
2221 - [ Mouse Control] ( #mouse-control )
2322 - [ Keyboard Control] ( #keyboard-control )
2423 - [ Image Recognition] ( #image-recognition )
25- - [ Screen Operations] ( #screen-operations )
24+ - [ Accessibility Element Finder] ( #accessibility-element-finder )
25+ - [ AI Element Locator (VLM)] ( #ai-element-locator-vlm )
26+ - [ OCR (Text on Screen)] ( #ocr-text-on-screen )
27+ - [ Clipboard] ( #clipboard )
28+ - [ Screenshot] ( #screenshot )
2629 - [ Action Recording & Playback] ( #action-recording--playback )
27- - [ Action Scripting (JSON Executor)] ( #action-scripting-json-executor )
30+ - [ JSON Action Scripting] ( #json-action-scripting )
31+ - [ Scheduler (Interval & Cron)] ( #scheduler-interval--cron )
32+ - [ Global Hotkey Daemon] ( #global-hotkey-daemon )
33+ - [ Event Triggers] ( #event-triggers )
34+ - [ Run History] ( #run-history )
2835 - [ Report Generation] ( #report-generation )
29- - [ Remote Automation (Socket Server)] ( #remote-automation-socket-server )
36+ - [ Remote Automation (Socket / REST)] ( #remote-automation-socket--rest )
37+ - [ Plugin Loader] ( #plugin-loader )
3038 - [ Shell Command Execution] ( #shell-command-execution )
3139 - [ Screen Recording] ( #screen-recording )
3240 - [ Callback Executor] ( #callback-executor )
4654- ** Mouse Automation** — move, click, press, release, drag, and scroll with precise coordinate control
4755- ** Keyboard Automation** — press/release individual keys, type strings, hotkey combinations, key state detection
4856- ** Image Recognition** — locate UI elements on screen using OpenCV template matching with configurable threshold
57+ - ** Accessibility Element Finder** — query the OS accessibility tree (Windows UIA / macOS AX) to locate buttons, menus, and controls by name/role
58+ - ** AI Element Locator (VLM)** — describe a UI element in plain language and let a vision-language model (Anthropic / OpenAI) find its screen coordinates
59+ - ** OCR** — extract text from screen regions using Tesseract; wait for, click, or locate rendered text
60+ - ** Clipboard** — read/write system clipboard text on Windows, macOS, and Linux
4961- ** Screenshot & Screen Recording** — capture full screen or regions as images, record screen to video (AVI/MP4)
5062- ** Action Recording & Playback** — record mouse/keyboard events and replay them
51- - ** JSON-Based Action Scripting** — define and execute automation flows using JSON action files
63+ - ** JSON-Based Action Scripting** — define and execute automation flows using JSON action files (dry-run + step debug)
64+ - ** Scheduler** — run scripts on an interval or cron expression; jobs persist across restarts
65+ - ** Global Hotkey Daemon** — bind OS-level hotkeys to action scripts (Windows today; macOS/Linux stubs in place)
66+ - ** Event Triggers** — fire scripts when an image appears, a window opens, a pixel changes, or a file is modified
67+ - ** Run History** — SQLite-backed run log across scheduler / triggers / hotkeys / REST with auto error-screenshot artifacts
5268- ** Report Generation** — export test records as HTML, JSON, or XML reports with success/failure status
53- - ** Remote Automation** — start a TCP socket server to receive and execute automation commands from remote clients
69+ - ** Remote Automation** — TCP socket server ** and** REST API server to receive automation commands
70+ - ** Plugin Loader** — drop ` .py ` files exposing ` AC_* ` callables into a directory and register them as executor commands at runtime
5471- ** Shell Integration** — execute shell commands within automation workflows with async output capture
5572- ** Callback Executor** — trigger automation functions with callback hooks for chaining operations
5673- ** Dynamic Package Loading** — extend the executor at runtime by importing external Python packages
5774- ** Project & Template Management** — scaffold automation projects with keyword/executor directory structure
5875- ** Window Management** — send keyboard/mouse events directly to specific windows (Windows/Linux)
59- - ** GUI Application** — built-in PySide6 graphical interface for interactive automation
76+ - ** GUI Application** — built-in PySide6 graphical interface with live language switching (English / 繁體中文 / 简体中文 / 日本語)
77+ - ** CLI Runner** — ` python -m je_auto_control.cli run|list-jobs|start-server|start-rest `
6078- ** Cross-Platform** — unified API across Windows, macOS, and Linux (X11)
6179
6280---
@@ -80,10 +98,23 @@ je_auto_control/
8098 ├── executor/ # JSON action executor engine
8199 ├── callback/ # Callback function executor
82100 ├── cv2_utils/ # OpenCV screenshot, template matching, video recording
101+ ├── accessibility/ # UIA (Windows) / AX (macOS) element finder
102+ ├── vision/ # VLM-based locator (Anthropic / OpenAI backends)
103+ ├── ocr/ # Tesseract-backed text locator
104+ ├── clipboard/ # Cross-platform clipboard
105+ ├── scheduler/ # Interval + cron scheduler
106+ ├── hotkey/ # Global hotkey daemon
107+ ├── triggers/ # Image/window/pixel/file triggers
108+ ├── run_history/ # SQLite run log + error-screenshot artifacts
109+ ├── rest_api/ # Stdlib HTTP/REST server
110+ ├── plugin_loader/ # Dynamic AC_* plugin discovery
83111 ├── socket_server/ # TCP socket server for remote automation
84112 ├── shell_process/ # Shell command manager
85113 ├── generate_report/ # HTML / JSON / XML report generators
86114 ├── test_record/ # Test action recording
115+ ├── script_vars/ # Script variable interpolation
116+ ├── watcher/ # Mouse / pixel / log watchers (Live HUD)
117+ ├── recording_edit/ # Trim, filter, re-scale recorded actions
87118 ├── json/ # JSON action file read/write
88119 ├── project/ # Project scaffolding & templates
89120 ├── package_manager/ # Dynamic package loading
@@ -135,6 +166,13 @@ sudo apt-get install cmake libssl-dev
135166| ` python-Xlib ` | Linux X11 backend (auto-installed on Linux) |
136167| ` PySide6 ` | GUI application (optional, install with ` [gui] ` ) |
137168| ` qt-material ` | GUI theme (optional, install with ` [gui] ` ) |
169+ | ` uiautomation ` | Windows accessibility backend (optional, loaded on demand) |
170+ | ` pytesseract ` + Tesseract | OCR engine (optional, loaded on demand) |
171+ | ` anthropic ` | VLM locator — Anthropic backend (optional, loaded on demand) |
172+ | ` openai ` | VLM locator — OpenAI backend (optional, loaded on demand) |
173+
174+ See [ Third_Party_License.md] ( Third_Party_License.md ) for a full list of
175+ third-party components and their licenses.
138176
139177---
140178
@@ -197,6 +235,95 @@ print(f"Found at: ({cx}, {cy})")
197235je_auto_control.locate_and_click(" submit_button.png" , mouse_keycode = " mouse_left" )
198236```
199237
238+ ### Accessibility Element Finder
239+
240+ Query the OS accessibility tree to locate controls by name, role, or app.
241+ Works on Windows (UIA, via ` uiautomation ` ) and macOS (AX).
242+
243+ ``` python
244+ import je_auto_control
245+
246+ # List all visible buttons in the Calculator app
247+ elements = je_auto_control.list_accessibility_elements(app_name = " Calculator" )
248+
249+ # Find a specific element
250+ ok = je_auto_control.find_accessibility_element(name = " OK" , role = " Button" )
251+ if ok is not None :
252+ print (ok.bounds, ok.center)
253+
254+ # Click it directly
255+ je_auto_control.click_accessibility_element(name = " OK" , app_name = " Calculator" )
256+ ```
257+
258+ Raises ` AccessibilityNotAvailableError ` if no accessibility backend is
259+ installed for the current platform.
260+
261+ ### AI Element Locator (VLM)
262+
263+ When template matching and accessibility both fail, describe the element
264+ in plain language and let a vision-language model find its coordinates.
265+
266+ ``` python
267+ import je_auto_control
268+
269+ # Uses Anthropic by default if ANTHROPIC_API_KEY is set, else OpenAI.
270+ x, y = je_auto_control.locate_by_description(" the green Submit button" )
271+
272+ # Or click it in one shot
273+ je_auto_control.click_by_description(
274+ " the cookie-banner 'Accept all' button" ,
275+ screen_region = [0 , 800 , 1920 , 1080 ], # optional crop
276+ )
277+ ```
278+
279+ Configuration (environment variables only — keys are never persisted or
280+ logged):
281+
282+ | Variable | Effect |
283+ | ---| ---|
284+ | ` ANTHROPIC_API_KEY ` | Enables the Anthropic backend |
285+ | ` OPENAI_API_KEY ` | Enables the OpenAI backend |
286+ | ` AUTOCONTROL_VLM_BACKEND ` | ` anthropic ` or ` openai ` to force a backend |
287+ | ` AUTOCONTROL_VLM_MODEL ` | Override the default model (e.g. ` claude-opus-4-7 ` , ` gpt-4o-mini ` ) |
288+
289+ Raises ` VLMNotAvailableError ` if neither SDK is installed or no API key
290+ is set.
291+
292+ ### OCR (Text on Screen)
293+
294+ ``` python
295+ import je_auto_control as ac
296+
297+ # Locate all matches of a piece of text
298+ matches = ac.find_text_matches(" Submit" )
299+
300+ # Center of the first match, or None
301+ cx, cy = ac.locate_text_center(" Submit" )
302+
303+ # Click text in one call
304+ ac.click_text(" Submit" )
305+
306+ # Block until text appears (or timeout)
307+ ac.wait_for_text(" Loading complete" , timeout = 15.0 )
308+ ```
309+
310+ If Tesseract is not on ` PATH ` , point at it explicitly:
311+
312+ ``` python
313+ ac.set_tesseract_cmd(r " C:\P rogram Files\T esseract-OCR\t esseract. exe" )
314+ ```
315+
316+ ### Clipboard
317+
318+ ``` python
319+ import je_auto_control as ac
320+ ac.set_clipboard(" hello" )
321+ text = ac.get_clipboard()
322+ ```
323+
324+ Backends: Windows (Win32 via ` ctypes ` ), macOS (` pbcopy ` /` pbpaste ` ),
325+ Linux (` xclip ` or ` xsel ` ).
326+
200327### Screenshot
201328
202329``` python
@@ -270,13 +397,84 @@ je_auto_control.execute_action([
270397| Keyboard | ` AC_type_keyboard ` , ` AC_press_keyboard_key ` , ` AC_release_keyboard_key ` , ` AC_write ` , ` AC_hotkey ` , ` AC_check_key_is_press ` |
271398| Image | ` AC_locate_all_image ` , ` AC_locate_image_center ` , ` AC_locate_and_click ` |
272399| Screen | ` AC_screen_size ` , ` AC_screenshot ` |
400+ | Accessibility | ` AC_a11y_list ` , ` AC_a11y_find ` , ` AC_a11y_click ` |
401+ | VLM (AI Locator) | ` AC_vlm_locate ` , ` AC_vlm_click ` |
402+ | OCR | ` AC_locate_text ` , ` AC_click_text ` , ` AC_wait_text ` |
403+ | Clipboard | ` AC_clipboard_get ` , ` AC_clipboard_set ` |
273404| Record | ` AC_record ` , ` AC_stop_record ` |
274405| Report | ` AC_generate_html ` , ` AC_generate_json ` , ` AC_generate_xml ` , ` AC_generate_html_report ` , ` AC_generate_json_report ` , ` AC_generate_xml_report ` |
275406| Project | ` AC_create_project ` |
276407| Shell | ` AC_shell_command ` |
277408| Process | ` AC_execute_process ` |
278409| Executor | ` AC_execute_action ` , ` AC_execute_files ` |
279410
411+ ### Scheduler (Interval & Cron)
412+
413+ ``` python
414+ import je_auto_control as ac
415+
416+ # Interval job — run every 30 seconds
417+ job = ac.default_scheduler.add_job(
418+ script_path = " scripts/poll.json" , interval_seconds = 30 , repeat = True ,
419+ )
420+
421+ # Cron job — 09:00 on weekdays (minute hour dom month dow)
422+ cron_job = ac.default_scheduler.add_cron_job(
423+ script_path = " scripts/daily.json" , cron_expression = " 0 9 * * 1-5" ,
424+ )
425+
426+ ac.default_scheduler.start()
427+ ```
428+
429+ Both flavours coexist; ` job.is_cron ` tells them apart.
430+
431+ ### Global Hotkey Daemon
432+
433+ Bind OS-level hotkeys to action JSON scripts (Windows backend today;
434+ macOS / Linux raise ` NotImplementedError ` on ` start() ` with Strategy-
435+ pattern seams in place).
436+
437+ ``` python
438+ from je_auto_control import default_hotkey_daemon
439+
440+ default_hotkey_daemon.bind(" ctrl+alt+1" , " scripts/greet.json" )
441+ default_hotkey_daemon.start()
442+ ```
443+
444+ ### Event Triggers
445+
446+ Poll-based triggers that fire a script when a condition becomes true:
447+
448+ ``` python
449+ from je_auto_control import (
450+ default_trigger_engine, ImageAppearsTrigger,
451+ WindowAppearsTrigger, PixelColorTrigger, FilePathTrigger,
452+ )
453+
454+ default_trigger_engine.add(ImageAppearsTrigger(
455+ trigger_id = " " , script_path = " scripts/click_ok.json" ,
456+ image_path = " templates/ok_button.png" , threshold = 0.85 , repeat = True ,
457+ ))
458+ default_trigger_engine.start()
459+ ```
460+
461+ ### Run History
462+
463+ Every run from the scheduler, trigger engine, hotkey daemon, REST API,
464+ and manual GUI replay is recorded to ` ~/.je_auto_control/history.db ` .
465+ Errors automatically attach a screenshot under
466+ ` ~/.je_auto_control/artifacts/run_{id}_{ms}.png ` for post-mortem.
467+
468+ ``` python
469+ from je_auto_control import default_history_store
470+
471+ for run in default_history_store.list_runs(limit = 20 ):
472+ print (run.id, run.source, run.status, run.artifact_path)
473+ ```
474+
475+ The GUI ** Run History** tab exposes filter/refresh/clear and
476+ double-click-to-open on the artifact column.
477+
280478### Report Generation
281479
282480``` python
@@ -302,19 +500,24 @@ xml_string = je_auto_control.generate_xml()
302500
303501Reports include: function name, parameters, timestamp, and exception info (if any) for each recorded action. HTML reports display successful actions in cyan and failed actions in red.
304502
305- ### Remote Automation (Socket Server )
503+ ### Remote Automation (Socket / REST )
306504
307- Start a TCP server to receive JSON automation commands from remote clients:
505+ Two servers are available — a raw TCP socket and a stdlib HTTP/REST
506+ server. Both default to ` 127.0.0.1 ` ; binding to ` 0.0.0.0 ` is an explicit,
507+ documented opt-in.
308508
309509``` python
310- import je_auto_control
510+ import je_auto_control as ac
311511
312- # Start the server (default: localhost :9938)
313- server = je_auto_control .start_autocontrol_socket_server(host = " localhost " , port = 9938 )
512+ # TCP socket server (default: 127.0.0.1 :9938)
513+ ac .start_autocontrol_socket_server(host = " 127.0.0.1 " , port = 9938 )
314514
315- # The server runs in a background thread
316- # Send JSON action commands via TCP to execute remotely
317- # Send "quit_server" to shut down
515+ # REST API server (default: 127.0.0.1:9939)
516+ ac.start_rest_api_server(host = " 127.0.0.1" , port = 9939 )
517+ # Endpoints:
518+ # GET /health liveness probe
519+ # GET /jobs scheduler job list
520+ # POST /execute body: {"actions": [...]}
318521```
319522
320523Client example:
@@ -339,6 +542,26 @@ print(response)
339542sock.close()
340543```
341544
545+ ### Plugin Loader
546+
547+ Drop ` .py ` files defining top-level ` AC_* ` callables into a directory,
548+ then register them as executor commands at runtime:
549+
550+ ``` python
551+ from je_auto_control import (
552+ load_plugin_directory, register_plugin_commands,
553+ )
554+
555+ commands = load_plugin_directory(" ./my_plugins" )
556+ register_plugin_commands(commands)
557+
558+ # Now usable from any JSON action script:
559+ # [["AC_greet", {"name": "world"}]]
560+ ```
561+
562+ > ** Warning:** Plugin files execute arbitrary Python on load. Only load
563+ > from directories you control.
564+
342565### Shell Command Execution
343566
344567``` python
@@ -497,6 +720,24 @@ python -m je_auto_control --execute_str '[["AC_screenshot", {"file_path": "test.
497720python -m je_auto_control -c ./my_project
498721```
499722
723+ A richer subcommand CLI built on the headless APIs:
724+
725+ ``` bash
726+ # Run a script, optionally with variables, and/or a dry-run
727+ python -m je_auto_control.cli run script.json
728+ python -m je_auto_control.cli run script.json --var name=alice --dry-run
729+
730+ # List scheduler jobs
731+ python -m je_auto_control.cli list-jobs
732+
733+ # Start the socket or REST server
734+ python -m je_auto_control.cli start-server --port 9938
735+ python -m je_auto_control.cli start-rest --port 9939
736+ ```
737+
738+ ` --var name=value ` is parsed as JSON when possible (so ` count=10 ` becomes
739+ an int), otherwise treated as a string.
740+
500741---
501742
502743## Platform Support
@@ -541,4 +782,6 @@ python -m pytest test/integrated_test/
541782
542783## License
543784
544- [ MIT License] ( LICENSE ) © JE-Chen
785+ [ MIT License] ( LICENSE ) © JE-Chen.
786+ See [ Third_Party_License.md] ( Third_Party_License.md ) for the licenses of
787+ bundled and optional third-party dependencies.
0 commit comments