-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathconfig.json.example
More file actions
97 lines (81 loc) · 6.13 KB
/
Copy pathconfig.json.example
File metadata and controls
97 lines (81 loc) · 6.13 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
{
"_about": "Copy this file to config.json and edit for your machine. config.json is in .gitignore. All sections are optional; OSScreenObserver falls back to reasonable defaults when a key is missing. Keys starting with an underscore are documentation-only and ignored at load time.",
"_web_ui": "HTTP inspector server. Bind to 127.0.0.1 unless you are deliberately exposing it to your LAN. Disable debug in shared environments.",
"web_ui": {
"host": "127.0.0.1",
"port": 5001,
"debug": false
},
"_mcp": "Identifiers reported over the MCP framing channel. Bump 'version' when you publish a breaking change to a custom fork.",
"mcp": {
"server_name": "os-screen-observer",
"version": "0.1.0"
},
"_ocr": "Tesseract pipeline used by /api/sketch?ocr=1 and the description endpoints. Set 'enabled' to false to skip OCR entirely. 'tesseract_cmd' is the absolute path to the tesseract binary; required on Windows (where the installer does not add it to PATH) and ignored on systems where 'tesseract' is on PATH. 'min_confidence' is the floor (0-100) for word-level confidence. The fidelity knobs ('preprocess', 'upscale', 'psm_window', 'psm_roi', 'max_roi_crops') are consumed by ascii_renderer.py. 'backend' is reserved for future engines (paddle, easyocr); 'tesseract' is the only currently-supported value.",
"ocr": {
"enabled": true,
"tesseract_cmd": "c:\\Program Files\\Tesseract-OCR\\tesseract.exe",
"min_confidence": 30,
"preprocess": true,
"upscale": 2,
"psm_window": 11,
"psm_roi": 7,
"max_roi_crops": 40,
"backend": "tesseract"
},
"_vlm": "Vision-LLM modality reached through any OpenAI-compatible chat-completions endpoint. Common values for 'base_url': 'http://localhost:3000' (OpenWebUI), 'http://localhost:11434' (Ollama direct — use a vision model such as 'qwen2.5vl:7b' for best UI/screen quality, 'llama3.2-vision:11b' for a different family, or 'minicpm-v:8b' for OCR-heavy screens). OSScreenObserver probes /api/v1/... first (OpenWebUI convention) then falls back to /v1/... (Ollama/OpenAI convention) automatically. 'base_url' should NOT include a trailing path component. 'api_key' may be left null when the endpoint accepts $OWUI_API_KEY from the environment. Pick a 'model' interactively via `python main.py --mode inspect`, or set it here directly. Two operating modes — 'single' sends one screenshot + grounded prompt and returns the raw response; 'multipass' runs a three-pass scene→controls→actions pipeline (plus an optional verify pass) and returns a structured JSON envelope. The ground_with_* flags include the accessibility tree, OCR text, ASCII sketch, and focused-element hint as in-context ground truth alongside the screenshot, gated independently and silently omitted on failure.",
"vlm": {
"enabled": true,
"base_url": "http://localhost:11434",
"api_key": null,
"model": "qwen2.5vl:7b",
"model_fast": "qwen2.5vl:3b",
"model_actions": null,
"model_verify": null,
"max_tokens": 2000,
"temperature": 0.1,
"timeout_s": 60,
"mode": "multipass",
"output_format": "json",
"ground_with_tree": true,
"ground_with_ocr": true,
"ground_with_sketch": true,
"ground_with_focus": true,
"tree_max_lines": 80,
"ocr_max_chars": 4000,
"sketch_max_chars": 6000,
"image_max_dim": 1600,
"focused_zoom_pad": 96,
"_ollama_runner": "How to invoke the Ollama CLI. Leave null and set vlm.enabled=true, then run `python main.py --mode inspect` once — OSScreenObserver will ask you interactively and save the choice here. Common values: [] to skip auto-pull; ['ollama'] for a native install; ['docker', 'exec', 'my_container', 'ollama'] for a Docker-based Ollama. On startup, OSScreenObserver checks which configured models are present in the local Ollama library and pulls any that are missing, printing pull progress to stderr.",
"ollama_runner": null,
"_prompt_legacy": "The legacy 'prompt' key remains honoured as a synonym for 'prompt_single' when mode=='single'. When unset, the built-in defaults in description.py are used; tune them per app/agent.",
"prompt_single": null,
"prompt_scene": null,
"prompt_controls": null,
"prompt_actions": null,
"prompt_verify": null
},
"_ascii_sketch": "Text-sketch renderer (ascii_renderer.py). 'grid_width'/'grid_height' control the output cell dimensions; the renderer projects the window's screen-pixel bounds into that grid. 'unicode_box' uses ┌─┐│└┘ glyphs (set false for plain ASCII +-|). The fidelity toggles are all independently switchable: 'role_glyphs' enables compact [x]/(•)/▼ control representations; 'occlusion_prune' hides siblings fully covered by later-drawn siblings (modals); 'tab_index_badges' writes ①②③ into focusable elements in DFS focus order; 'landmark_headers' bakes role+name into the top border of toolbars / dialogs / status bars; 'vlm_fallback' (off by default) lets the renderer call the VLM endpoint above to label unidentified custom widgets — set true only when you have vlm.enabled=true and accept the network cost.",
"ascii_sketch": {
"grid_width": 110,
"grid_height": 38,
"unicode_box": true,
"role_glyphs": true,
"occlusion_prune": true,
"tab_index_badges": true,
"landmark_headers": true,
"vlm_fallback": false
},
"_tree": "Accessibility-tree walk limits. 'max_depth' caps recursion so pathological apps (web views with deep DOMs) cannot blow up tree retrieval. Raise it if your target app legitimately nests deeper.",
"tree": {
"max_depth": 8
},
"_logging": "Python logging level for the OSScreenObserver loggers. Use DEBUG to trace adapter behavior, WARNING to silence routine info.",
"logging": {
"level": "INFO"
},
"_mock": "When true, all adapters are replaced with MockAdapter — useful for development on machines without a real accessibility API (e.g. headless CI). Leave false for real captures.",
"mock": false,
"_platform": "'auto' detects the host platform (Windows / macOS / Linux / WSL). Override with 'mock' to force MockAdapter, or with a specific platform name to force a particular adapter.",
"platform": "auto"
}