2222 - [ Mouse Control] ( #mouse-control )
2323 - [ Keyboard Control] ( #keyboard-control )
2424 - [ Image Recognition] ( #image-recognition )
25+ - [ Accessibility Element Finder] ( #accessibility-element-finder )
26+ - [ AI Element Locator (VLM)] ( #ai-element-locator-vlm )
27+ - [ OCR (Text on Screen)] ( #ocr-text-on-screen )
28+ - [ Clipboard] ( #clipboard )
2529 - [ Screen Operations] ( #screen-operations )
2630 - [ Action Recording & Playback] ( #action-recording--playback )
2731 - [ Action Scripting (JSON Executor)] ( #action-scripting-json-executor )
32+ - [ Scheduler (Interval & Cron)] ( #scheduler-interval--cron )
33+ - [ Global Hotkey Daemon] ( #global-hotkey-daemon )
34+ - [ Event Triggers] ( #event-triggers )
35+ - [ Run History] ( #run-history )
2836 - [ Report Generation] ( #report-generation )
29- - [ Remote Automation (Socket Server)] ( #remote-automation-socket-server )
37+ - [ Remote Automation (Socket / REST)] ( #remote-automation-socket--rest )
38+ - [ Plugin Loader] ( #plugin-loader )
3039 - [ Shell Command Execution] ( #shell-command-execution )
3140 - [ Screen Recording] ( #screen-recording )
3241 - [ Callback Executor] ( #callback-executor )
4655- ** Mouse Automation** — move, click, press, release, drag, and scroll with precise coordinate control
4756- ** Keyboard Automation** — press/release individual keys, type strings, hotkey combinations, key state detection
4857- ** Image Recognition** — locate UI elements on screen using OpenCV template matching with configurable threshold
58+ - ** Accessibility Element Finder** — query the OS accessibility tree (Windows UIA / macOS AX) to locate buttons, menus, and controls by name/role
59+ - ** AI Element Locator (VLM)** — describe a UI element in plain language and let a vision-language model (Anthropic / OpenAI) find its screen coordinates
60+ - ** OCR** — extract text from screen regions using Tesseract; wait for, click, or locate rendered text
61+ - ** Clipboard** — read/write system clipboard text on Windows, macOS, and Linux
4962- ** Screenshot & Screen Recording** — capture full screen or regions as images, record screen to video (AVI/MP4)
5063- ** Action Recording & Playback** — record mouse/keyboard events and replay them
51- - ** JSON-Based Action Scripting** — define and execute automation flows using JSON action files
64+ - ** JSON-Based Action Scripting** — define and execute automation flows using JSON action files (dry-run + step debug)
65+ - ** Scheduler** — run scripts on an interval or cron expression; jobs persist across restarts
66+ - ** Global Hotkey Daemon** — bind OS-level hotkeys to action scripts (Windows today; macOS/Linux stubs in place)
67+ - ** Event Triggers** — fire scripts when an image appears, a window opens, a pixel changes, or a file is modified
68+ - ** Run History** — SQLite-backed run log across scheduler / triggers / hotkeys / REST with auto error-screenshot artifacts
5269- ** Report Generation** — export test records as HTML, JSON, or XML reports with success/failure status
53- - ** Remote Automation** — start a TCP socket server to receive and execute automation commands from remote clients
70+ - ** Remote Automation** — TCP socket server ** and** REST API server to receive automation commands
71+ - ** Plugin Loader** — drop ` .py ` files exposing ` AC_* ` callables into a directory and register them as executor commands at runtime
5472- ** Shell Integration** — execute shell commands within automation workflows with async output capture
5573- ** Callback Executor** — trigger automation functions with callback hooks for chaining operations
5674- ** Dynamic Package Loading** — extend the executor at runtime by importing external Python packages
5775- ** Project & Template Management** — scaffold automation projects with keyword/executor directory structure
5876- ** Window Management** — send keyboard/mouse events directly to specific windows (Windows/Linux)
59- - ** GUI Application** — built-in PySide6 graphical interface for interactive automation
77+ - ** GUI Application** — built-in PySide6 graphical interface with live language switching (English / 繁體中文 / 简体中文 / 日本語)
78+ - ** CLI Runner** — ` python -m je_auto_control.cli run|list-jobs|start-server|start-rest `
6079- ** Cross-Platform** — unified API across Windows, macOS, and Linux (X11)
6180
6281---
@@ -80,10 +99,23 @@ je_auto_control/
8099 ├── executor/ # JSON action executor engine
81100 ├── callback/ # Callback function executor
82101 ├── cv2_utils/ # OpenCV screenshot, template matching, video recording
102+ ├── accessibility/ # UIA (Windows) / AX (macOS) element finder
103+ ├── vision/ # VLM-based locator (Anthropic / OpenAI backends)
104+ ├── ocr/ # Tesseract-backed text locator
105+ ├── clipboard/ # Cross-platform clipboard
106+ ├── scheduler/ # Interval + cron scheduler
107+ ├── hotkey/ # Global hotkey daemon
108+ ├── triggers/ # Image/window/pixel/file triggers
109+ ├── run_history/ # SQLite run log + error-screenshot artifacts
110+ ├── rest_api/ # Stdlib HTTP/REST server
111+ ├── plugin_loader/ # Dynamic AC_* plugin discovery
83112 ├── socket_server/ # TCP socket server for remote automation
84113 ├── shell_process/ # Shell command manager
85114 ├── generate_report/ # HTML / JSON / XML report generators
86115 ├── test_record/ # Test action recording
116+ ├── script_vars/ # Script variable interpolation
117+ ├── watcher/ # Mouse / pixel / log watchers (Live HUD)
118+ ├── recording_edit/ # Trim, filter, re-scale recorded actions
87119 ├── json/ # JSON action file read/write
88120 ├── project/ # Project scaffolding & templates
89121 ├── package_manager/ # Dynamic package loading
@@ -135,6 +167,13 @@ sudo apt-get install cmake libssl-dev
135167| ` python-Xlib ` | Linux X11 backend (auto-installed on Linux) |
136168| ` PySide6 ` | GUI application (optional, install with ` [gui] ` ) |
137169| ` qt-material ` | GUI theme (optional, install with ` [gui] ` ) |
170+ | ` uiautomation ` | Windows accessibility backend (optional, loaded on demand) |
171+ | ` pytesseract ` + Tesseract | OCR engine (optional, loaded on demand) |
172+ | ` anthropic ` | VLM locator — Anthropic backend (optional, loaded on demand) |
173+ | ` openai ` | VLM locator — OpenAI backend (optional, loaded on demand) |
174+
175+ See [ Third_Party_License.md] ( Third_Party_License.md ) for a full list of
176+ third-party components and their licenses.
138177
139178---
140179
@@ -197,6 +236,95 @@ print(f"Found at: ({cx}, {cy})")
197236je_auto_control.locate_and_click(" submit_button.png" , mouse_keycode = " mouse_left" )
198237```
199238
239+ ### Accessibility Element Finder
240+
241+ Query the OS accessibility tree to locate controls by name, role, or app.
242+ Works on Windows (UIA, via ` uiautomation ` ) and macOS (AX).
243+
244+ ``` python
245+ import je_auto_control
246+
247+ # List all visible buttons in the Calculator app
248+ elements = je_auto_control.list_accessibility_elements(app_name = " Calculator" )
249+
250+ # Find a specific element
251+ ok = je_auto_control.find_accessibility_element(name = " OK" , role = " Button" )
252+ if ok is not None :
253+ print (ok.bounds, ok.center)
254+
255+ # Click it directly
256+ je_auto_control.click_accessibility_element(name = " OK" , app_name = " Calculator" )
257+ ```
258+
259+ Raises ` AccessibilityNotAvailableError ` if no accessibility backend is
260+ installed for the current platform.
261+
262+ ### AI Element Locator (VLM)
263+
264+ When template matching and accessibility both fail, describe the element
265+ in plain language and let a vision-language model find its coordinates.
266+
267+ ``` python
268+ import je_auto_control
269+
270+ # Uses Anthropic by default if ANTHROPIC_API_KEY is set, else OpenAI.
271+ x, y = je_auto_control.locate_by_description(" the green Submit button" )
272+
273+ # Or click it in one shot
274+ je_auto_control.click_by_description(
275+ " the cookie-banner 'Accept all' button" ,
276+ screen_region = [0 , 800 , 1920 , 1080 ], # optional crop
277+ )
278+ ```
279+
280+ Configuration (environment variables only — keys are never persisted or
281+ logged):
282+
283+ | Variable | Effect |
284+ | ---| ---|
285+ | ` ANTHROPIC_API_KEY ` | Enables the Anthropic backend |
286+ | ` OPENAI_API_KEY ` | Enables the OpenAI backend |
287+ | ` AUTOCONTROL_VLM_BACKEND ` | ` anthropic ` or ` openai ` to force a backend |
288+ | ` AUTOCONTROL_VLM_MODEL ` | Override the default model (e.g. ` claude-opus-4-7 ` , ` gpt-4o-mini ` ) |
289+
290+ Raises ` VLMNotAvailableError ` if neither SDK is installed or no API key
291+ is set.
292+
293+ ### OCR (Text on Screen)
294+
295+ ``` python
296+ import je_auto_control as ac
297+
298+ # Locate all matches of a piece of text
299+ matches = ac.find_text_matches(" Submit" )
300+
301+ # Center of the first match, or None
302+ cx, cy = ac.locate_text_center(" Submit" )
303+
304+ # Click text in one call
305+ ac.click_text(" Submit" )
306+
307+ # Block until text appears (or timeout)
308+ ac.wait_for_text(" Loading complete" , timeout = 15.0 )
309+ ```
310+
311+ If Tesseract is not on ` PATH ` , point at it explicitly:
312+
313+ ``` python
314+ ac.set_tesseract_cmd(r " C:\P rogram Files\T esseract-OCR\t esseract. exe" )
315+ ```
316+
317+ ### Clipboard
318+
319+ ``` python
320+ import je_auto_control as ac
321+ ac.set_clipboard(" hello" )
322+ text = ac.get_clipboard()
323+ ```
324+
325+ Backends: Windows (Win32 via ` ctypes ` ), macOS (` pbcopy ` /` pbpaste ` ),
326+ Linux (` xclip ` or ` xsel ` ).
327+
200328### Screenshot
201329
202330``` python
@@ -270,13 +398,84 @@ je_auto_control.execute_action([
270398| Keyboard | ` AC_type_keyboard ` , ` AC_press_keyboard_key ` , ` AC_release_keyboard_key ` , ` AC_write ` , ` AC_hotkey ` , ` AC_check_key_is_press ` |
271399| Image | ` AC_locate_all_image ` , ` AC_locate_image_center ` , ` AC_locate_and_click ` |
272400| Screen | ` AC_screen_size ` , ` AC_screenshot ` |
401+ | Accessibility | ` AC_a11y_list ` , ` AC_a11y_find ` , ` AC_a11y_click ` |
402+ | VLM (AI Locator) | ` AC_vlm_locate ` , ` AC_vlm_click ` |
403+ | OCR | ` AC_locate_text ` , ` AC_click_text ` , ` AC_wait_text ` |
404+ | Clipboard | ` AC_clipboard_get ` , ` AC_clipboard_set ` |
273405| Record | ` AC_record ` , ` AC_stop_record ` |
274406| Report | ` AC_generate_html ` , ` AC_generate_json ` , ` AC_generate_xml ` , ` AC_generate_html_report ` , ` AC_generate_json_report ` , ` AC_generate_xml_report ` |
275407| Project | ` AC_create_project ` |
276408| Shell | ` AC_shell_command ` |
277409| Process | ` AC_execute_process ` |
278410| Executor | ` AC_execute_action ` , ` AC_execute_files ` |
279411
412+ ### Scheduler (Interval & Cron)
413+
414+ ``` python
415+ import je_auto_control as ac
416+
417+ # Interval job — run every 30 seconds
418+ job = ac.default_scheduler.add_job(
419+ script_path = " scripts/poll.json" , interval_seconds = 30 , repeat = True ,
420+ )
421+
422+ # Cron job — 09:00 on weekdays (minute hour dom month dow)
423+ cron_job = ac.default_scheduler.add_cron_job(
424+ script_path = " scripts/daily.json" , cron_expression = " 0 9 * * 1-5" ,
425+ )
426+
427+ ac.default_scheduler.start()
428+ ```
429+
430+ Both flavours coexist; ` job.is_cron ` tells them apart.
431+
432+ ### Global Hotkey Daemon
433+
434+ Bind OS-level hotkeys to action JSON scripts (Windows backend today;
435+ macOS / Linux raise ` NotImplementedError ` on ` start() ` with Strategy-
436+ pattern seams in place).
437+
438+ ``` python
439+ from je_auto_control import default_hotkey_daemon
440+
441+ default_hotkey_daemon.bind(" ctrl+alt+1" , " scripts/greet.json" )
442+ default_hotkey_daemon.start()
443+ ```
444+
445+ ### Event Triggers
446+
447+ Poll-based triggers that fire a script when a condition becomes true:
448+
449+ ``` python
450+ from je_auto_control import (
451+ default_trigger_engine, ImageAppearsTrigger,
452+ WindowAppearsTrigger, PixelColorTrigger, FilePathTrigger,
453+ )
454+
455+ default_trigger_engine.add(ImageAppearsTrigger(
456+ trigger_id = " " , script_path = " scripts/click_ok.json" ,
457+ image_path = " templates/ok_button.png" , threshold = 0.85 , repeat = True ,
458+ ))
459+ default_trigger_engine.start()
460+ ```
461+
462+ ### Run History
463+
464+ Every run from the scheduler, trigger engine, hotkey daemon, REST API,
465+ and manual GUI replay is recorded to ` ~/.je_auto_control/history.db ` .
466+ Errors automatically attach a screenshot under
467+ ` ~/.je_auto_control/artifacts/run_{id}_{ms}.png ` for post-mortem.
468+
469+ ``` python
470+ from je_auto_control import default_history_store
471+
472+ for run in default_history_store.list_runs(limit = 20 ):
473+ print (run.id, run.source, run.status, run.artifact_path)
474+ ```
475+
476+ The GUI ** Run History** tab exposes filter/refresh/clear and
477+ double-click-to-open on the artifact column.
478+
280479### Report Generation
281480
282481``` python
@@ -302,19 +501,24 @@ xml_string = je_auto_control.generate_xml()
302501
303502Reports include: function name, parameters, timestamp, and exception info (if any) for each recorded action. HTML reports display successful actions in cyan and failed actions in red.
304503
305- ### Remote Automation (Socket Server )
504+ ### Remote Automation (Socket / REST )
306505
307- Start a TCP server to receive JSON automation commands from remote clients:
506+ Two servers are available — a raw TCP socket and a stdlib HTTP/REST
507+ server. Both default to ` 127.0.0.1 ` ; binding to ` 0.0.0.0 ` is an explicit,
508+ documented opt-in.
308509
309510``` python
310- import je_auto_control
511+ import je_auto_control as ac
311512
312- # Start the server (default: localhost :9938)
313- server = je_auto_control .start_autocontrol_socket_server(host = " localhost " , port = 9938 )
513+ # TCP socket server (default: 127.0.0.1 :9938)
514+ ac .start_autocontrol_socket_server(host = " 127.0.0.1 " , port = 9938 )
314515
315- # The server runs in a background thread
316- # Send JSON action commands via TCP to execute remotely
317- # Send "quit_server" to shut down
516+ # REST API server (default: 127.0.0.1:9939)
517+ ac.start_rest_api_server(host = " 127.0.0.1" , port = 9939 )
518+ # Endpoints:
519+ # GET /health liveness probe
520+ # GET /jobs scheduler job list
521+ # POST /execute body: {"actions": [...]}
318522```
319523
320524Client example:
@@ -339,6 +543,26 @@ print(response)
339543sock.close()
340544```
341545
546+ ### Plugin Loader
547+
548+ Drop ` .py ` files defining top-level ` AC_* ` callables into a directory,
549+ then register them as executor commands at runtime:
550+
551+ ``` python
552+ from je_auto_control import (
553+ load_plugin_directory, register_plugin_commands,
554+ )
555+
556+ commands = load_plugin_directory(" ./my_plugins" )
557+ register_plugin_commands(commands)
558+
559+ # Now usable from any JSON action script:
560+ # [["AC_greet", {"name": "world"}]]
561+ ```
562+
563+ > ** Warning:** Plugin files execute arbitrary Python on load. Only load
564+ > from directories you control.
565+
342566### Shell Command Execution
343567
344568``` python
@@ -497,6 +721,24 @@ python -m je_auto_control --execute_str '[["AC_screenshot", {"file_path": "test.
497721python -m je_auto_control -c ./my_project
498722```
499723
724+ A richer subcommand CLI built on the headless APIs:
725+
726+ ``` bash
727+ # Run a script, optionally with variables, and/or a dry-run
728+ python -m je_auto_control.cli run script.json
729+ python -m je_auto_control.cli run script.json --var name=alice --dry-run
730+
731+ # List scheduler jobs
732+ python -m je_auto_control.cli list-jobs
733+
734+ # Start the socket or REST server
735+ python -m je_auto_control.cli start-server --port 9938
736+ python -m je_auto_control.cli start-rest --port 9939
737+ ```
738+
739+ ` --var name=value ` is parsed as JSON when possible (so ` count=10 ` becomes
740+ an int), otherwise treated as a string.
741+
500742---
501743
502744## Platform Support
@@ -541,4 +783,6 @@ python -m pytest test/integrated_test/
541783
542784## License
543785
544- [ MIT License] ( LICENSE ) © JE-Chen
786+ [ MIT License] ( LICENSE ) © JE-Chen.
787+ See [ Third_Party_License.md] ( Third_Party_License.md ) for the licenses of
788+ bundled and optional third-party dependencies.
0 commit comments