Skip to content

[codex] Add onboarding system scan#4

Merged
James-Chapman merged 8 commits into
masterfrom
codex-onboarding-system-scan-index
May 20, 2026
Merged

[codex] Add onboarding system scan#4
James-Chapman merged 8 commits into
masterfrom
codex-onboarding-system-scan-index

Conversation

@Jeremy8776
Copy link
Copy Markdown
Owner

@Jeremy8776 Jeremy8776 commented May 19, 2026

Summary

  • Reworks first-run onboarding into a production system-scan flow for AI tools, IDEs, skill sources, configs, rules, MCP servers, and follow-up opportunities.
  • Adds system-scan APIs plus modular scanner definitions/IDE probes, onboarding memory and handoff population, and UI icon/render/style modules.
  • Fixes new-user Electron profile/window reveal behavior and restores the draggable hidden-titlebar region.
  • Keeps benchmark fixtures and generated benchmark outputs out of the production PR; bench/fixtures/skills/, bench/artifacts/, and bench/data/ are ignored.

Production cleanup

  • Removed benchmark/eval harness files and fixture SKILL.md files from the PR payload.
  • Split new scanner/onboarding modules so the production files introduced by this PR stay under the 500-line soft target.
  • No benchmark fixture path remains in the PR diff.

Validation

  • npm run check
  • npm run smoke
  • npm run smoke:desktop
  • npm run smoke:onboarding (run outside sandbox for Electron profile/cache access)
  • npm run test:vectorstore

@Jeremy8776 Jeremy8776 changed the title [codex] Add onboarding system scan and retrieval benchmarks [codex] Add onboarding system scan May 19, 2026
Copy link
Copy Markdown
Collaborator

@James-Chapman James-Chapman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #4 Review — Actionable Fix List

BLOCKER 1: Synchronous I/O in scanSystem() freezes the event loop

File: system-scan.js

scanSystem() and all its helpers (probeHostDir, scanStandaloneFiles, probeIDEs, probeAIExtensions, getDriveRoots) use fs.statSync, fs.readdirSync, fs.readFileSync, fs.realpathSync exclusively. The POST /api/system/scan route in router.js calls scanSystem() directly on the main thread. On a machine with multiple drives or large directories, this blocks all other requests for seconds.

Required fix: Convert all filesystem operations in system-scan.js and system-scan-ides.js to async (fs.promises.stat, fs.promises.readdir, fs.promises.readFile, fs.promises.realpath). Use Promise.all for parallel probing. Update the router handler to await scanSystem(...).

BLOCKER 2: Drive scanning defaults to enabled — too aggressive

File: server/lib/system-scan.js, line ~scanSystem(customPaths = [], opts = {})

The destructured defaults are { skipDrives = false, skipHomedir = false, skipWorkspaces = false }, meaning drives: true in the UI's scanConfig. getDriveRoots() iterates all 26 drive letters with statSync. This will traverse network drives, slow external HDDs, and large data volumes by default.

Required fix: Change the default so drives are NOT scanned unless the user explicitly opts in. In onboarding.js, change:

let scanConfig = { drives: true, homedir: true, workspaces: true };

to:

let scanConfig = { drives: false, homedir: true, workspaces: true };

BLOCKER 3: @ts-nocheck on new files without required annotation

Files: onboarding-render.js (line 1), onboarding.js (line 1)

Both files have // @ts-nocheck without the required "— Path-A backlog: " annotation. Per AGENTS.md lint rules, bare @ts-nocheck is an eslint error.

Required fix: Either:
(A) Remove @ts-nocheck and add proper JSDoc types, or
(B) Change to: // @ts-nocheck — Path-A backlog: new onboarding modules, typing deferred to post-merge

ISSUE 4: Hybrid search weights and stemming are hardcoded and fragile

File: server/lib/vectorstore.js, functions hybridSearch, computeLexicalScore, extractQueryTerms

Magic numbers 0.6, 0.4, 0.4, 0.3, 0.3 are scattered inline with no named constants.
The naive stemming in extractQueryTerms (replace(/s
/
,


)
.
r
e
p
l
a
c
e
(
/
i
n
g
/,
′′
).replace(/ing/, '').replace(/ed$/, '')) will produce false matches: "string" becomes "str", "processed" becomes "process" (collides with the noun), "based" becomes "bas".
Fix: Extract weights to named constants at the top of the file:

const VECTOR_WEIGHT = 0.6;
const LEXICAL_WEIGHT = 0.4;
const LEXICAL_SKILL_WEIGHT = 0.4;
const LEXICAL_SECTION_WEIGHT = 0.3;
const LEXICAL_TEXT_WEIGHT = 0.3;

Replace the naive stemming with a simple suffix-stripping function that checks minimum remaining length (e.g., do not strip if result is less than 3 chars).

ISSUE 5: CDN-dependent icons with dark-only filter

File: icons.js

All icon URLs point to external CDNs (cdn.jsdelivr.net, raw.githubusercontent.com). The CSS in onboarding-hosts.css applies:

.ob-row-icon img {
filter: grayscale(1) invert(1);
}

This only works on dark backgrounds. If a light theme is ever added, icons become invisible.

Fix: Download the SVG icons into ui/assets/icons/ and reference them locally. Remove the invert(1) filter and use the icons' native colors, or make the filter conditional on a [data-theme="dark"] attribute.

ISSUE 6: Typo in function name probeIdegGroup

File: server/lib/system-scan.js, line ~function probeIdegGroup(ideList)

Should be probeIdeGroup (IDE, not IDEG). This is an exported symbol that appears in the module's internal call site.

Fix: Rename probeIdegGroup to probeIdeGroup at definition and call site.

ISSUE 7: system-scan.js at 497 lines — approaching soft limit

File: system-scan.js

The project convention is a 500-line soft limit per module. This file is at 497 lines and will grow as more hosts are added.

Fix: Extract scanStandaloneFiles() and readWorkspaces() into a separate file (e.g., server/lib/system-scan-utils.js) to keep the main module under budget.

ISSUE 8: No test coverage for system scan

Missing file: scripts/system-scan-smoke.js

The new /api/system/scan endpoint and scanSystem() function have no tests. Per AGENTS.md, tests should follow BDD GIVEN/WHEN/THEN format.

Fix: Create scripts/system-scan-smoke.js that:

GIVEN a server running on the smoke port
WHEN GET /api/system/scan is called
THEN the response contains ok: true, hosts (array), ides (array), ideExtensions (object)
AND each host has expected shape (id, label, icon, path, skills, configs, instructions, rules, mcpServers, opportunities)
ISSUE 9: Global window functions in onboarding-populate.js

File: onboarding-populate.js

populateOnboardingMemory and populateOnboardingHandoff are bare globals on window, depending on MS, apiFetch, and Toast also being globals. The rest of the UI uses IIFE modules (e.g., const Onboarding = (() => { ... })()).

Fix: Wrap in a namespace:

window.OnboardingPopulate = (() => {
// ...existing code...
return { populateMemory, populateHandoff };
})();

Then call OnboardingPopulate.populateMemory(scanResults) from onboarding.js.

MINOR ISSUES

Raw hex color in main.cjs
Line with color: '#050309' in titleBarOverlay is a raw color literal. While the CSS guard only checks .css files, this should use a token or at least a named constant.

advanceProgress() DOM manipulation during async scan
In ui/onboarding.js, advanceProgress() calls document.getElementById() during an async scan. If the user navigates away mid-scan, these silently no-op. Add a guard:

if (!mounted) return;

readWorkspaces() reads data/workspaces.json with no comment
File: server/lib/system-scan.js, readWorkspaces() function

The file data/workspaces.json does not exist in the repo. Add a comment explaining this is runtime-only:

// workspaces.json is created at runtime by the projects API; absence is expected on first run

OPPORTUNITY_FILES has null entries without explanation
File: server/lib/system-scan-definitions.js, OPPORTUNITY_FILES object

Hosts like .opencode, .continue, .roo, .antigravity, .void have null values. Add a comment:

// null = host has no standard global config file to create

Dashboard "quality gate pending" text should have a TODO
File: ui/dashboard.js, line with "— quality gate pending"

Add a TODO comment:

// TODO: Remove "quality gate pending" once Recall@8 = 1.00 benchmark passes

PRIORITY ORDER FOR FIXES

Blockers 1-3 — must fix before merge
Issue 8 — no test for new endpoint
Issues 4, 6, 7 — code quality in backend
Issues 5, 9 — UI architecture
Minor 10-14 — cleanup

James-Chapman and others added 5 commits May 19, 2026 23:21
Keep branch's expanded priority model (hard/soft/style) over master's
simplified version (hard/soft) in validation.js and config.js.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ludes()

The destructured loop variable `allowed` is already the per-section
array, so indexing it with `key` was undefined. This fixes the TS7015
type error and the runtime TypeError in validation tests.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…pace, minor fixes

BLOCKER 1: Convert all sync fs operations in system-scan.js and
system-scan-ides.js to async (fs.promises.*) with Promise.all for
parallel probing. Router handler now awaits scanSystem().

BLOCKER 2: Default drives scanning to disabled (drives: false) in
onboarding.js; skipDrives defaults to true in scanSystem().

BLOCKER 3: Add required @ts-nocheck annotation to onboarding.js and
onboarding-render.js per project lint rules.

Issue 4: Extract hybrid search magic numbers (0.6, 0.4, 0.3) to named
constants. Replace naive suffix stripping with min-length guard to
avoid false matches (e.g. "string" → "str").

Issue 9: Wrap onboarding-populate.js in OnboardingPopulate namespace
with convenience shims for backward compatibility.

Minor: Use windowBackground constant in main.cjs titleBarOverlay, add
mounted guard to advanceProgress(), document null values in
OPPORTUNITY_FILES, add TODO for quality gate pending in dashboard.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…check

- Raise stripSuffix minimum length from 3 to 4 to prevent "rules"→"rul"
- Cache getDriveRoots() result to avoid double I/O when skipDrives=false
- Deduplicate isDir: export from system-scan-ides, import in system-scan
- Fix typo: probeIdegGroup → probeIdeGroup
- Use explicit null check for OPPORTUNITY_FILES entries
- Add TODO documenting sync countSkillFiles/listSkillNames dependency

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@James-Chapman James-Chapman merged commit 1fea068 into master May 20, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants