Skip to content

Add Amazon learning QA product#3082

Open
f0rsan wants to merge 30 commits into
tinyhumansai:mainfrom
f0rsan:codex/amazon-learning-qa-final
Open

Add Amazon learning QA product#3082
f0rsan wants to merge 30 commits into
tinyhumansai:mainfrom
f0rsan:codex/amazon-learning-qa-final

Conversation

@f0rsan
Copy link
Copy Markdown

@f0rsan f0rsan commented May 31, 2026

Summary

Adds a repository-owned Amazon learning Q&A product package under tools/amazon-learning-qa.

The package wraps the local OpenHuman amazon-learning knowledge base as a focused Q&A entry with semantic retrieval status, source citations, per-answer communication graphs, session restore, learning notes, source-scoped follow-up controls, and local product smoke/doctor commands.

Verification

  • node --test tools/amazon-learning-qa/amazon-qa-product.test.mjs tools/amazon-learning-qa/amazon-qa-ui.test.mjs tools/amazon-learning-qa/amazon-qa-e2e-smoke.test.mjs tools/amazon-learning-qa/amazon-qa-lib.test.mjs tools/amazon-learning-qa/amazon-dossier-lib.test.mjs tools/amazon-learning-qa/amazon-source-tree-lib.test.mjs
  • node tools/amazon-learning-qa/amazon-qa-product.mjs doctor --json
  • node tools/amazon-learning-qa/amazon-qa-product.mjs smoke
  • Local browser checks at desktop and mobile widths, with no console errors or horizontal overflow.

Notes

The current product is intentionally local-only because it depends on local SQLite data, Ollama, openhuman-core, and local filesystem paths. Vercel deployment would require a separate cloud architecture for storage, model serving, and the OpenHuman core service.

Summary by CodeRabbit

  • New Features

    • Local Amazon Learning Q&A tools: CLI commands for health/start/status/smoke/handoff, adversarial and e2e runners, source-tree import/ingest, drain runner, reindexing and knowledge import workflows
  • Documentation

    • Added local setup and configuration guide for the Amazon Learning Q&A workflow
  • Tests

    • Extensive test suites covering QA flows, UI contract checks, product doctor, source-tree tooling, e2e smoke and adversarial checks
  • Chores

    • Ignore local QA data/artifacts; added npm scripts for QA commands

@f0rsan f0rsan requested a review from a team May 31, 2026 15:45
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 31, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a local Amazon Learning Q&A product: repo setup and paths, CLI product commands (doctor/start/status/smoke/handoff), QA smoke/adversarial runners, source-tree library/import/drain runner, KB HTML→Markdown import, workflow reindexing, and broad unit/e2e/UI tests.

Changes

Amazon Learning Q&A Product Workflow

Layer / File(s) Summary
Project setup and path resolution
\.gitignore, package.json, tools/amazon-learning-qa/README.md, tools/amazon-learning-qa/amazon-qa-paths.mjs
Adds ignore patterns, npm scripts (amazon:doctor, amazon:start, amazon:smoke, amazon:test, amazon:handoff), README, and path-resolving utilities for workspace/repo/delivery roots.
QA smoke & adversarial evaluation
tools/amazon-learning-qa/amazon-qa-e2e-smoke.mjs, tools/amazon-learning-qa/amazon-qa-e2e-smoke.test.mjs, tools/amazon-learning-qa/amazon-qa-adversarial-eval.mjs
Implements smoke runner/validator, fetch helpers, CLI parsing, e2e smoke tests, and an adversarial QA evaluation script.
Product health check and management
tools/amazon-learning-qa/amazon-qa-product.mjs, tools/amazon-learning-qa/amazon-qa-product.test.mjs
buildProductDoctorReport, start/status/smoke/handoff CLI commands, Ollama/model preflight checks, git context detection, readiness polling, and handoff Markdown generation with tests.
Source tree library and calibration
tools/amazon-learning-qa/amazon-source-tree-lib.mjs, tools/amazon-learning-qa/amazon-source-tree-lib.test.mjs
Library: stable source IDs, ingest request builder, status/drain summarizers, import-plan, search-term expansion, calibration/ranking, date normalization, and tests.
Source tree manifest import CLI
tools/amazon-learning-qa/amazon-source-tree.mjs
CLI commands to report source-tree status and import manifested Markdown into OpenHuman via JSON-RPC, with dry-run and ingestion-state handling.
Source tree drain background job runner
tools/amazon-learning-qa/amazon-source-tree-drain-runner.mjs
Resumable, locked, batched runner invoking drain binary, maintaining run state/logs, querying SQLite job counts, and supporting stop/restart.
Learning dossier reindexing to workflow memory
tools/amazon-learning-qa/amazon-workflow-reindex.mjs
CLI to convert stored dossier JSON into workflow memory documents via JSON-RPC, update local index/sync metadata, and rewrite dossiers with results.
Amazon KB HTML-to-Markdown preparation and import
tools/amazon-learning-qa/openhuman-amazon-kb.mjs
prepare transforms HTML articles to per-author Markdown and manifest; import posts Markdown to OpenHuman memory_doc_put; verify runs context queries.
Dossier, UI, and test suites
tools/amazon-learning-qa/amazon-dossier-lib.test.mjs, tools/amazon-learning-qa/amazon-qa-ui.test.mjs, other tests/scripts
Extensive dossier behavioral tests, comprehensive UI regex/syntax tests, and adversarial QA evaluation script coverage.

Sequence Diagram(s)

sequenceDiagram
  participant CLI as Product CLI
  participant Server as Local Server (/api)
  participant Ollama as Ollama (local AI)
  participant OpenHuman as OpenHuman RPC
  participant DB as Local SQLite

  CLI->>Server: GET /api/status, POST /api/source-selection, POST /api/ask
  Server->>Ollama: request /api/tags or model calls
  Server->>OpenHuman: JSON-RPC (memory_doc_put, memory_tree_ingest)
  Server->>DB: query job counts / vector chunk state
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

  • Suggested reviewers:
    • graycyrus

"🐰
I hopped through files both far and near,
Built tests and tools to make QA clear,
From HTML to memory and drain-run queues,
I packed the repo with scripts and views,
A carrot-coded patch for reviewer cheer!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add Amazon learning QA product' accurately describes the main change: a new Amazon learning Q&A product package at tools/amazon-learning-qa with comprehensive QA, testing, documentation, and tooling.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added feature Net-new user-facing capability or product behavior. working A PR that is being worked on by the team. labels May 31, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (5)
tools/amazon-learning-qa/openhuman-amazon-kb.mjs (1)

121-128: 💤 Low value

Consider static import for readdir.

readdir is dynamically imported inside listHtmlFiles but could be statically imported alongside the other node:fs/promises functions on line 3 for consistency.

♻️ Move readdir to static imports
-import { mkdir, readFile, writeFile } from "node:fs/promises";
+import { mkdir, readdir, readFile, writeFile } from "node:fs/promises";
 async function listHtmlFiles(dir) {
-  const { readdir } = await import("node:fs/promises");
   const names = await readdir(dir);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/amazon-learning-qa/openhuman-amazon-kb.mjs` around lines 121 - 128, The
dynamic import of readdir inside listHtmlFiles should be replaced by a static
import alongside the other node:fs/promises imports; remove the await
import("node:fs/promises") from inside listHtmlFiles and use the statically
imported readdir in that function (keep the existing filter/sort/map logic
intact) so listHtmlFiles simply calls readdir(dir) instead of dynamically
importing it at runtime.
tools/amazon-learning-qa/amazon-workflow-reindex.mjs (1)

53-59: 💤 Low value

JSON parse errors will abort the entire run.

If any dossier file contains malformed JSON, JSON.parse on line 55 throws before entering the try-catch, stopping all remaining files from being processed. Consider whether you want fail-fast behavior or graceful per-file error handling.

🛡️ Optional: wrap early parsing in try-catch for resilience
   for (const file of files) {
     const filePath = path.join(archiveDir, file);
-    const dossier = normalizeStoredDossier(JSON.parse(await readFile(filePath, "utf8")));
-    const memoryDoc = buildOpenHumanMemoryDocument(dossier, { sourceNamespace: namespace });
-    if (memoryDoc.namespace === namespace) {
-      throw new Error(`Refusing to write workflow dossier into source namespace: ${namespace}`);
-    }
-
     try {
+      const dossier = normalizeStoredDossier(JSON.parse(await readFile(filePath, "utf8")));
+      const memoryDoc = buildOpenHumanMemoryDocument(dossier, { sourceNamespace: namespace });
+      if (memoryDoc.namespace === namespace) {
+        throw new Error(`Refusing to write workflow dossier into source namespace: ${namespace}`);
+      }
       const result = await rpcCall(rpcUrl, token, "openhuman.memory_doc_put", memoryDoc);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/amazon-learning-qa/amazon-workflow-reindex.mjs` around lines 53 - 59,
The loop that reads each dossier file uses JSON.parse directly (inside the
for-of over files) so a malformed file will throw and abort the whole run; wrap
the per-file processing — starting from reading/parsing the file through
normalizeStoredDossier and buildOpenHumanMemoryDocument — in a try/catch (around
JSON.parse and subsequent calls) so you can log the parse error (including the
filename and error) and continue to the next file instead of failing the entire
workflow; keep the existing namespace check/throw logic inside the try block so
valid logic still applies.
tools/amazon-learning-qa/amazon-qa-e2e-smoke.mjs (1)

340-349: 💤 Low value

Use pathToFileURL for the direct-execution check.

file://${process.argv[1]} only matches for plain POSIX paths; it breaks on paths containing spaces/special characters (needs percent-encoding) and on Windows. amazon-qa-product.mjs (Line 449) already uses the robust form — align this entry point with it.

♻️ Suggested change
+import { pathToFileURL } from "node:url";
+
-if (import.meta.url === `file://${process.argv[1]}`) {
+if (import.meta.url === pathToFileURL(process.argv[1] || "").href) {
   runAmazonQaSmoke(parseArgs(process.argv.slice(2)))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/amazon-learning-qa/amazon-qa-e2e-smoke.mjs` around lines 340 - 349, The
direct-execution check using string literal file://${process.argv[1]} is
fragile; replace it with a robust comparison that converts process.argv[1] to a
file URL using pathToFileURL from the 'url' module and compare that to
import.meta.url. Import pathToFileURL, call pathToFileURL(process.argv[1]).href
(or toString()) and use that in the if condition that currently guards
runAmazonQaSmoke(parseArgs(...)); this will match paths with spaces/special
characters and work on Windows.
tools/amazon-learning-qa/README.md (1)

9-14: 💤 Low value

Genericize the documented default paths.

These hardcoded /Users/yangyingjia/... paths are one developer's machine layout, not the actual default. The resolver falls back to <toolDir>/../openhuman-kb (see amazon-qa-paths.mjs Line 33), so this both leaks a username and misstates behavior. Consider using a placeholder.

📝 Suggested wording
-```text
-/Users/yangyingjia/OpenHuman/openhuman
-/Users/yangyingjia/OpenHuman/openhuman-kb
-```
+```text
+<parent>/openhuman
+<parent>/openhuman-kb
+```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/amazon-learning-qa/README.md` around lines 9 - 14, Update the README
example paths to remove the hardcoded username and reflect the resolver's actual
fallback (as implemented in amazon-qa-paths.mjs) by replacing the specific
/Users/yangyingjia/... entries with generic placeholders like <parent>/openhuman
and <parent>/openhuman-kb; ensure the README text matches the resolver behavior
(the fallback to <toolDir>/../openhuman-kb) and avoid leaking any
developer-specific paths.
tools/amazon-learning-qa/amazon-source-tree-drain-runner.mjs (1)

209-214: ⚡ Quick win

Avoid hardcoding a developer-specific home path in the child PATH.

/Users/yangyingjia/.cargo/bin embeds one contributor's machine layout. On any other host this prefix silently does nothing, so if the drain binary actually depends on cargo/rust tooling on PATH, it will fail outside that machine. Prefer deriving these from the environment or amazon-qa-paths.mjs config.

♻️ Suggested approach
       env: {
         ...process.env,
-        PATH: `/Users/yangyingjia/.cargo/bin:/opt/homebrew/bin:${process.env.PATH || ""}`,
+        PATH: [
+          process.env.CARGO_BIN || path.join(process.env.HOME || "", ".cargo/bin"),
+          "/opt/homebrew/bin",
+          process.env.PATH || "",
+        ].filter(Boolean).join(":"),
         OPENHUMAN_WORKSPACE: WORKSPACE,
         RUST_LOG: process.env.RUST_LOG || "warn",
       },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/amazon-learning-qa/amazon-source-tree-drain-runner.mjs` around lines
209 - 214, The PATH in the env object for amazon-source-tree-drain-runner.mjs
currently hardcodes /Users/yangyingjia/.cargo/bin; change this to derive the
cargo/bin location from the environment or the shared amazon-qa-paths.mjs config
instead of embedding a developer home. Update the env -> PATH assignment used
where OPENHUMAN_WORKSPACE and WORKSPACE are set so it prepends a derived
CARGO_BIN (e.g. process.env.CARGO_HOME or path.join(process.env.HOME, '.cargo',
'bin') or a value exported from amazon-qa-paths.mjs) and falls back to
process.env.PATH when absent, ensuring portability across machines.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tools/amazon-learning-qa/amazon-dossier-lib.test.mjs`:
- Line 294: The test passes the entire dossier as the message argument to
validateLearningDossierForSave (const validation =
validateLearningDossierForSave(noEvidence, { message: noEvidence })), but the
function reads input.message.evidenceAudit.feedback; replace the second argument
with the original message object that was used to construct noEvidence so that
message.evidenceAudit is present (and update the similar incorrect call at the
other occurrence around line 813). Ensure the shape matches the server API
(message.evidenceAudit.feedback) so the needs_accepted_evidence path is
exercised correctly.

In `@tools/amazon-learning-qa/openhuman-amazon-kb.mjs`:
- Around line 257-267: The verify function calls argValue for "--query" but
doesn't validate it, so undefined may be sent to rpcCall; update verify to check
the result of argValue("--query") (in function verify) and if it's missing or
empty, print a clear usage/error message and exit (or throw) before calling
rpcCall("openhuman.memory_context_query"); ensure the validation uses the same
symbols (verify, argValue, rpcCall) and mirrors existing error/usage behavior
for other required args.

---

Nitpick comments:
In `@tools/amazon-learning-qa/amazon-qa-e2e-smoke.mjs`:
- Around line 340-349: The direct-execution check using string literal
file://${process.argv[1]} is fragile; replace it with a robust comparison that
converts process.argv[1] to a file URL using pathToFileURL from the 'url' module
and compare that to import.meta.url. Import pathToFileURL, call
pathToFileURL(process.argv[1]).href (or toString()) and use that in the if
condition that currently guards runAmazonQaSmoke(parseArgs(...)); this will
match paths with spaces/special characters and work on Windows.

In `@tools/amazon-learning-qa/amazon-source-tree-drain-runner.mjs`:
- Around line 209-214: The PATH in the env object for
amazon-source-tree-drain-runner.mjs currently hardcodes
/Users/yangyingjia/.cargo/bin; change this to derive the cargo/bin location from
the environment or the shared amazon-qa-paths.mjs config instead of embedding a
developer home. Update the env -> PATH assignment used where OPENHUMAN_WORKSPACE
and WORKSPACE are set so it prepends a derived CARGO_BIN (e.g.
process.env.CARGO_HOME or path.join(process.env.HOME, '.cargo', 'bin') or a
value exported from amazon-qa-paths.mjs) and falls back to process.env.PATH when
absent, ensuring portability across machines.

In `@tools/amazon-learning-qa/amazon-workflow-reindex.mjs`:
- Around line 53-59: The loop that reads each dossier file uses JSON.parse
directly (inside the for-of over files) so a malformed file will throw and abort
the whole run; wrap the per-file processing — starting from reading/parsing the
file through normalizeStoredDossier and buildOpenHumanMemoryDocument — in a
try/catch (around JSON.parse and subsequent calls) so you can log the parse
error (including the filename and error) and continue to the next file instead
of failing the entire workflow; keep the existing namespace check/throw logic
inside the try block so valid logic still applies.

In `@tools/amazon-learning-qa/openhuman-amazon-kb.mjs`:
- Around line 121-128: The dynamic import of readdir inside listHtmlFiles should
be replaced by a static import alongside the other node:fs/promises imports;
remove the await import("node:fs/promises") from inside listHtmlFiles and use
the statically imported readdir in that function (keep the existing
filter/sort/map logic intact) so listHtmlFiles simply calls readdir(dir) instead
of dynamically importing it at runtime.

In `@tools/amazon-learning-qa/README.md`:
- Around line 9-14: Update the README example paths to remove the hardcoded
username and reflect the resolver's actual fallback (as implemented in
amazon-qa-paths.mjs) by replacing the specific /Users/yangyingjia/... entries
with generic placeholders like <parent>/openhuman and <parent>/openhuman-kb;
ensure the README text matches the resolver behavior (the fallback to
<toolDir>/../openhuman-kb) and avoid leaking any developer-specific paths.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c256c04e-ca15-45c4-a330-1a2f120a9482

📥 Commits

Reviewing files that changed from the base of the PR and between 225b1da and d8a4345.

📒 Files selected for processing (22)
  • .gitignore
  • package.json
  • tools/amazon-learning-qa/README.md
  • tools/amazon-learning-qa/amazon-dossier-lib.mjs
  • tools/amazon-learning-qa/amazon-dossier-lib.test.mjs
  • tools/amazon-learning-qa/amazon-qa-adversarial-eval.mjs
  • tools/amazon-learning-qa/amazon-qa-e2e-smoke.mjs
  • tools/amazon-learning-qa/amazon-qa-e2e-smoke.test.mjs
  • tools/amazon-learning-qa/amazon-qa-lib.mjs
  • tools/amazon-learning-qa/amazon-qa-lib.test.mjs
  • tools/amazon-learning-qa/amazon-qa-paths.mjs
  • tools/amazon-learning-qa/amazon-qa-product.mjs
  • tools/amazon-learning-qa/amazon-qa-product.test.mjs
  • tools/amazon-learning-qa/amazon-qa-server.mjs
  • tools/amazon-learning-qa/amazon-qa-ui.html
  • tools/amazon-learning-qa/amazon-qa-ui.test.mjs
  • tools/amazon-learning-qa/amazon-source-tree-drain-runner.mjs
  • tools/amazon-learning-qa/amazon-source-tree-lib.mjs
  • tools/amazon-learning-qa/amazon-source-tree-lib.test.mjs
  • tools/amazon-learning-qa/amazon-source-tree.mjs
  • tools/amazon-learning-qa/amazon-workflow-reindex.mjs
  • tools/amazon-learning-qa/openhuman-amazon-kb.mjs

});

assert.equal(validateLearningDossierForSave(accepted, { message: SAMPLE_MESSAGE }).ok, true);
const validation = validateLearningDossierForSave(noEvidence, { message: noEvidence });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the validateLearningDossierForSave signature and all call sites
ast-grep --pattern $'validateLearningDossierForSave($_, { message: $_ })'

Repository: tinyhumansai/openhuman

Length of output: 1059


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Find the implementation/signature of validateLearningDossierForSave
rg -n "validateLearningDossierForSave" -S . \
  --glob '!**/node_modules/**' --glob '!**/dist/**' --glob '!**/build/**' > /tmp/validate_refs.txt

echo "=== References ==="
cat /tmp/validate_refs.txt

# 2) Try to extract a nearby definition (function/const/exports)
# Narrow to likely source locations
rg -n "(function|const|export|module\\.exports).*validateLearningDossierForSave" -S . \
  --glob '!**/node_modules/**' --glob '!**/dist/**' --glob '!**/build/**' > /tmp/validate_defs.txt

echo "=== Potential definitions ==="
cat /tmp/validate_defs.txt

# 3) If we have a plausible file from defs, dump the relevant region
# (Heuristic: pick the first matching definition file)
first_def_file="$(head -n1 /tmp/validate_defs.txt | cut -d: -f1 || true)"
if [ -n "${first_def_file}" ] && [ -f "${first_def_file}" ]; then
  echo "=== Context around first definition in ${first_def_file} ==="
  # Print ~80 lines around the first match line
  def_line="$(head -n1 /tmp/validate_defs.txt | cut -d: -f2 || true)"
  start=$((def_line-40))
  end=$((def_line+80))
  sed -n "${start},${end}p" "${first_def_file}" | nl -ba | sed -e 's/^/    /'
fi

Repository: tinyhumansai/openhuman

Length of output: 2801


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Show validateLearningDossierForSave implementation
echo "=== tools/amazon-learning-qa/amazon-dossier-lib.mjs (around line 69) ==="
sed -n '50,140p' tools/amazon-learning-qa/amazon-dossier-lib.mjs | nl -ba | head -n 200

# Show the call site in amazon-qa-server.mjs (around line 2944)
echo "=== tools/amazon-learning-qa/amazon-qa-server.mjs (around line 2944) ==="
sed -n '2920,2995p' tools/amazon-learning-qa/amazon-qa-server.mjs | nl -ba | head -n 200

# Confirm other patterns of { message: ... } in this repo (limit output)
echo "=== Other validateLearningDossierForSave call patterns ==="
rg -n "validateLearningDossierForSave\\(.*\\{\\s*message:" tools/amazon-learning-qa -S

Repository: tinyhumansai/openhuman

Length of output: 200


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== tools/amazon-learning-qa/amazon-dossier-lib.mjs (around export function) ==="
sed -n '1,120p' tools/amazon-learning-qa/amazon-dossier-lib.mjs | cat -n

echo "=== tools/amazon-learning-qa/amazon-dossier-lib.mjs (search for input.message usage) ==="
rg -n "input\\.message|message\\b" tools/amazon-learning-qa/amazon-dossier-lib.mjs

echo "=== tools/amazon-learning-qa/amazon-qa-server.mjs (around validateLearningDossierForSave call) ==="
sed -n '2920,2975p' tools/amazon-learning-qa/amazon-qa-server.mjs | cat -n

echo "=== Other validateLearningDossierForSave call patterns in tools/amazon-learning-qa ==="
rg -n "validateLearningDossierForSave\\(.*\\{\\s*message:" tools/amazon-learning-qa -S

Repository: tinyhumansai/openhuman

Length of output: 12575


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "evidenceAudit" tools/amazon-learning-qa/amazon-dossier-lib.mjs tools/amazon-learning-qa/amazon-qa-server.mjs tools/amazon-learning-qa/amazon-dossier-lib.test.mjs -S

Repository: tinyhumansai/openhuman

Length of output: 1186


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== tools/amazon-learning-qa/amazon-dossier-lib.test.mjs (276-310) ==="
sed -n '260,320p' tools/amazon-learning-qa/amazon-dossier-lib.test.mjs | cat -n

echo "=== tools/amazon-learning-qa/amazon-dossier-lib.test.mjs (790-830) ==="
sed -n '780,840p' tools/amazon-learning-qa/amazon-dossier-lib.test.mjs | cat -n

Repository: tinyhumansai/openhuman

Length of output: 5154


Fix test inputs: { message: noEvidence } passes the wrong shape

validateLearningDossierForSave only uses input.message.evidenceAudit.feedback; line 294 passes the dossier itself as message, so this test isn’t exercising the message/evidenceAudit path (though it still hits needs_accepted_evidence because that check depends on dossier.acceptedEvidence). Align the test with the API (and server usage) by passing the original message object used to build noEvidence (and do the same for the similar pattern at line 813).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/amazon-learning-qa/amazon-dossier-lib.test.mjs` at line 294, The test
passes the entire dossier as the message argument to
validateLearningDossierForSave (const validation =
validateLearningDossierForSave(noEvidence, { message: noEvidence })), but the
function reads input.message.evidenceAudit.feedback; replace the second argument
with the original message object that was used to construct noEvidence so that
message.evidenceAudit is present (and update the similar incorrect call at the
other occurrence around line 813). Ensure the shape matches the server API
(message.evidenceAudit.feedback) so the needs_accepted_evidence path is
exercised correctly.

Comment on lines +257 to +267
async function verify(args) {
const rpcUrl = argValue(args, "--rpc", DEFAULT_RPC);
const namespace = argValue(args, "--namespace", DEFAULT_NAMESPACE);
const query = argValue(args, "--query");
const result = await rpcCall(rpcUrl, "openhuman.memory_context_query", {
namespace,
query,
limit: 5,
});
console.log(typeof result === "string" ? result : JSON.stringify(result, null, 2));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Missing validation for required --query argument.

If --query is not provided, argValue returns undefined (no fallback given), and this undefined value is passed to the RPC call. The usage message indicates --query <text> is required, but no enforcement exists.

🐛 Add validation for required --query
 async function verify(args) {
   const rpcUrl = argValue(args, "--rpc", DEFAULT_RPC);
   const namespace = argValue(args, "--namespace", DEFAULT_NAMESPACE);
   const query = argValue(args, "--query");
+  if (!query) {
+    throw new Error("--query is required for verify command");
+  }
   const result = await rpcCall(rpcUrl, "openhuman.memory_context_query", {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async function verify(args) {
const rpcUrl = argValue(args, "--rpc", DEFAULT_RPC);
const namespace = argValue(args, "--namespace", DEFAULT_NAMESPACE);
const query = argValue(args, "--query");
const result = await rpcCall(rpcUrl, "openhuman.memory_context_query", {
namespace,
query,
limit: 5,
});
console.log(typeof result === "string" ? result : JSON.stringify(result, null, 2));
}
async function verify(args) {
const rpcUrl = argValue(args, "--rpc", DEFAULT_RPC);
const namespace = argValue(args, "--namespace", DEFAULT_NAMESPACE);
const query = argValue(args, "--query");
if (!query) {
throw new Error("--query is required for verify command");
}
const result = await rpcCall(rpcUrl, "openhuman.memory_context_query", {
namespace,
query,
limit: 5,
});
console.log(typeof result === "string" ? result : JSON.stringify(result, null, 2));
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/amazon-learning-qa/openhuman-amazon-kb.mjs` around lines 257 - 267, The
verify function calls argValue for "--query" but doesn't validate it, so
undefined may be sent to rpcCall; update verify to check the result of
argValue("--query") (in function verify) and if it's missing or empty, print a
clear usage/error message and exit (or throw) before calling
rpcCall("openhuman.memory_context_query"); ensure the validation uses the same
symbols (verify, argValue, rpcCall) and mirrors existing error/usage behavior
for other required args.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tools/amazon-learning-qa/amazon-qa-ui.test.mjs (1)

385-397: ⚡ Quick win

Add missing negative assertion for evidence auto-promotion prevention.

The test checks that the source decision table doesn't include saveLearningNote, but it's missing a similar check for data-dossier-source-decision. Almost all similar tests throughout this file (lines 201-202, 255-256, 291-292, 320-321, 335-336, 352-353, 364-365, 381-382, etc.) verify that features preventing auto-promotion of evidence check for BOTH patterns.

✅ Add the missing assertion for consistency
   assert.match(html, /不是新的作者原文证据/);
   assert.doesNotMatch(html, /source-decision-table[\s\S]{0,1400}saveLearningNote/);
+  assert.doesNotMatch(html, /source-decision-table[\s\S]{0,1400}data-dossier-source-decision/);
 });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tools/amazon-learning-qa/amazon-qa-ui.test.mjs` around lines 385 - 397, The
test "assistant answers expose a clickable source decision table" is missing the
negative assertion that prevents evidence auto-promotion via the
data-dossier-source-decision pattern; add an assertion alongside the existing
assert.doesNotMatch(html, /source-decision-table[\s\S]{0,1400}saveLearningNote/)
to also assert.doesNotMatch(html,
/data-dossier-source-decision[\s\S]{0,1400}saveLearningNote/), so both selectors
are checked for absence of saveLearningNote.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tools/amazon-learning-qa/amazon-qa-ui.test.mjs`:
- Around line 385-397: The test "assistant answers expose a clickable source
decision table" is missing the negative assertion that prevents evidence
auto-promotion via the data-dossier-source-decision pattern; add an assertion
alongside the existing assert.doesNotMatch(html,
/source-decision-table[\s\S]{0,1400}saveLearningNote/) to also
assert.doesNotMatch(html,
/data-dossier-source-decision[\s\S]{0,1400}saveLearningNote/), so both selectors
are checked for absence of saveLearningNote.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dfe0ccba-507a-461c-9b71-790fd7c1dbc4

📥 Commits

Reviewing files that changed from the base of the PR and between 50e224a and 5903f13.

📒 Files selected for processing (8)
  • tools/amazon-learning-qa/amazon-qa-e2e-smoke.mjs
  • tools/amazon-learning-qa/amazon-qa-e2e-smoke.test.mjs
  • tools/amazon-learning-qa/amazon-qa-lib.mjs
  • tools/amazon-learning-qa/amazon-qa-lib.test.mjs
  • tools/amazon-learning-qa/amazon-qa-product.mjs
  • tools/amazon-learning-qa/amazon-qa-server.mjs
  • tools/amazon-learning-qa/amazon-qa-ui.html
  • tools/amazon-learning-qa/amazon-qa-ui.test.mjs
🚧 Files skipped from review as they are similar to previous changes (3)
  • tools/amazon-learning-qa/amazon-qa-e2e-smoke.test.mjs
  • tools/amazon-learning-qa/amazon-qa-e2e-smoke.mjs
  • tools/amazon-learning-qa/amazon-qa-product.mjs

Copy link
Copy Markdown
Contributor

@sanil-23 sanil-23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@f0rsan thanks for the substantial work here. I went through the core changes carefully and they look solid:

  • The summariser fail-soft path in src/openhuman/memory/tree/tree_source/summariser/llm.rs (falling back to the inert summariser when the provider returns nothing usable, plus the trailing-comma / first-object JSON repair) is well-reasoned and the new unit tests cover the interesting cases.
  • Excluding global trees from list_stale_buffers and making the generic seal handler a no-op for global trees is consistent with the count-based cascade comment, and the regression tests lock it in.

One thing blocks a full sign-off right now: the branch has merge conflicts with the base (it's showing as not mergeable). Could you rebase on the latest base and resolve those? Once the branch merges cleanly I'll come back and do the final pass to approve.

A couple of smaller notes while you're in there:

  • The new tools/amazon-learning-qa/ package is large, and amazon-qa-ui.html is a ~14k-line single file. Since it's intentionally local-only, that's defensible, but it would help reviewers (and future maintenance) if the README spelled out that the UI is a generated/single-file artifact and how it's regenerated, so nobody hand-edits 14k lines by mistake.
  • CodeRabbit already pointed out the missing --query validation in openhuman-amazon-kb.mjs and the test-input shape in amazon-dossier-lib.test.mjs — worth folding those in on the same pass.

Resolve the conflicts and I'll re-review. Happy to help if anything in the rebase gets hairy.

Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@f0rsan the Rust core changes are solid work — the global-tree seal guard, the SQL filter fix in list_stale_buffers, and the summariser hardening (prose-mandatory prompt, JSON repair, empty-summary fallback) are all well-reasoned and well-tested. Those I'd merge on their own without hesitation.

The blocker is the tool addition. 41K lines of new product code landing in a single PR with no linked issue makes it impossible to review scope alignment. I can't verify that tools/amazon-learning-qa belongs here (versus a separate repo or a dedicated feature branch with a spec) because there's no spec to check against. The PR description says the tool is intentionally local-only with no cloud path — that's fine, but it's also an argument for why this might live somewhere else, and that call should be documented.

Two requests before this can land:

  1. Link a GitHub issue that specifies what problem this tool solves, who it's for, and why it belongs in the openhuman repo rather than a standalone repo. The Rust changes can note the same issue as motivation (since they appear to be prerequisite fixes for the tool's use case).

  2. The server uses sqlite3 CLI via execFileAsync for all DB access. The dashboard already has better-sqlite3 as a dependency and uses it correctly. Shelling out to sqlite3 works, but it adds a runtime dependency on the sqlite3 binary being in PATH, buffers entire result sets in memory, and is inconsistent with the rest of the codebase. Either use better-sqlite3 or add a short comment explaining why the CLI is preferred here (e.g. no native build step in the tools directory).

The vercel-entry directory also ships a static landing page and vercel.json, yet the PR notes Vercel deployment would require a separate cloud architecture. If there's no intent to deploy it, that directory should stay out of main.

const NOTEBOOK_ROOT = path.join(WORKSPACE, "learning-notebooks");
const USER_SOURCE_ROOT = path.join(WORKSPACE, "user-sources");
const LEARNING_NOTE_ROOT = path.join(WORKSPACE, "learning-notes");
const RUN_DIR = path.join(WORKSPACE, "run");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor] Server shells out to the sqlite3 CLI binary for every DB query. This adds an implicit runtime dependency (sqlite3 must be in PATH), buffers full result sets as JSON strings, and diverges from the dashboard which uses better-sqlite3. If there's a reason to avoid a native module here (e.g. keeping tools/ free of native deps), add a comment explaining it.

Copy link
Copy Markdown
Contributor

@sanil-23 sanil-23 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see @graycyrus has requested changes — deferring to their feedback. Will review once those are addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature Net-new user-facing capability or product behavior. working A PR that is being worked on by the team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants