Skip to content

Add proxy support for fetch indexing#345

Open
mxyhi wants to merge 14 commits intomksglu:nextfrom
mxyhi:fetch-proxy-support
Open

Add proxy support for fetch indexing#345
mxyhi wants to merge 14 commits intomksglu:nextfrom
mxyhi:fetch-proxy-support

Conversation

@mxyhi
Copy link
Copy Markdown

@mxyhi mxyhi commented Apr 26, 2026

Summary

ctx_fetch_and_index currently generates a subprocess script that calls bare fetch(url). In environments where direct DNS/network access is blocked or polluted, this makes fetch indexing fail even when users have standard proxy variables configured.

This PR adds proxy-aware fetch support for both JavaScript runtimes used by context-mode:

  • Bun: passes fetch(targetUrl, { proxy }).
  • Node: passes fetch(targetUrl, { dispatcher: new ProxyAgent(proxy) }) through undici.

Details

  • Added src/fetch-proxy.ts to select proxy URLs from HTTPS_PROXY, https_proxy, HTTP_PROXY, http_proxy, ALL_PROXY, and all_proxy.
  • Added NO_PROXY / no_proxy bypass support for exact hosts, suffix hosts, wildcard, host:port, and IPv4 CIDR entries.
  • Updated ctx_fetch_and_index generated fetch code to call fetchWithProxy(url) instead of bare fetch(url).
  • Added undici@^6.25.0 as a runtime dependency so Node 18+ can use a stable ProxyAgent path.
  • Updated bundled artifacts.

Validation

  • bunx vitest run tests/core/fetch-proxy.test.ts
  • npm run typecheck
  • npm test -- tests/core/fetch-proxy.test.ts tests/core/server.test.ts tests/core/search.test.ts
  • npm run build

Also ran npm test; unrelated existing ABI-cache assertions in tests/core/cli.test.ts fail locally under Node 25 because ensureNativeCompat() returns early when hasModernSqlite() is true:

  • cache hit: copies cached ABI binary to active path
  • cache hit does not trigger rebuild

@mksglu
Copy link
Copy Markdown
Owner

mksglu commented Apr 26, 2026

Review

The proxy logic in fetch-proxy.ts is well-structured — NO_PROXY parsing with CIDR/suffix/wildcard support, correct Bun vs Node runtime branching, and clean subprocess code serialization. Good work.

Before we can merge

1. Remove bundle files from the diff.
cli.bundle.mjs and server.bundle.mjs are 356 lines of unreviable minified code. These are built by CI — they should not be in the PR. Remove them and let the build pipeline regenerate.

2. Target the next branch, not main.
All feature PRs go through next first. Please retarget.

3. SOCKS5 validation.
The PR reads ALL_PROXY which commonly points to socks5:// URLs, but undici's ProxyAgent only supports HTTP/HTTPS proxies (nodejs/undici#2224). A socks5:// URL will silently fail. Please either:

  • Validate the protocol and throw a clear error: "SOCKS proxies are not supported — only HTTP/HTTPS"
  • Or skip ALL_PROXY entirely and only read HTTP_PROXY / HTTPS_PROXY

Question: real-world demand

I want to understand the actual need before committing to maintaining proxy infrastructure:

  • Are you personally behind a corporate proxy? Did ctx_fetch_and_index fail for you?
  • How many users have reported this? I haven't seen proxy-related issues filed.
  • Would documenting HTTP_PROXY support in the README be sufficient, or do you need the full NO_PROXY/CIDR logic?

This adds ~200 lines of source + undici as a new dependency. I want to make sure we're solving a real problem for real users, not a theoretical one.

Thanks for the thorough implementation — the code quality is solid. Just need clarity on demand before we merge.

- Bug report: 12 → 14 platforms, PR target main → next
- Feature request: add JetBrains Copilot, Qwen Code, Pi to dropdown
- PR template: add JetBrains Copilot, Qwen Code, Pi to checklist
- CONTRIBUTING.md: feature branch from main → next

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mxyhi mxyhi changed the base branch from main to next April 26, 2026 08:00
@mxyhi mxyhi force-pushed the fetch-proxy-support branch from d9e0744 to 76b86dc Compare April 26, 2026 08:02
@mxyhi
Copy link
Copy Markdown
Author

mxyhi commented Apr 26, 2026

Thanks for the review. I pushed an update addressing the merge blockers:

  • Removed cli.bundle.mjs and server.bundle.mjs from the PR diff. The PR now only changes bun.lock, package.json, src/fetch-proxy.ts, src/server.ts, and tests/core/fetch-proxy.test.ts.
  • Retargeted the PR to next and rebased the branch onto upstream/next.
  • Removed ALL_PROXY / all_proxy support to avoid SOCKS ambiguity. The code now only reads HTTP_PROXY / HTTPS_PROXY variants.
  • Added explicit validation for selected proxy URLs. If someone sets HTTP_PROXY or HTTPS_PROXY to socks*://..., it throws SOCKS proxies are not supported — only HTTP/HTTPS instead of failing later inside undici.

On demand: this came from a real failure I hit while using ctx_fetch_and_index against https://swr.vercel.app/docs/api and /docs/mutation. example.com fetched successfully, but the SWR docs path hung because direct DNS/network resolution returned non-Vercel/suspicious IPs and the generated fetch subprocess used bare fetch(url). Manually using Bun fetch(url, { proxy: "http://127.0.0.1:7890" }) succeeded.

I do not know of additional user reports, so I would frame this as one concrete reproducible environment rather than broad demand. README documentation alone would not have fixed this case because the generated fetch script needed an explicit cross-runtime proxy path: Bun needs { proxy }, and Node needs an undici dispatcher.

I kept NO_PROXY support because once HTTP_PROXY/HTTPS_PROXY is honored, users need a way to avoid proxying local/private routes. The CIDR support matches common existing NO_PROXY configs such as 100.64.0.0/10; happy to simplify that further if you prefer a smaller first version.

Validation after the update:

  • npm run typecheck
  • npm test -- tests/core/fetch-proxy.test.ts tests/core/server.test.ts tests/core/search.test.ts
  • npm run build

All passed locally.

@mksglu
Copy link
Copy Markdown
Owner

mksglu commented Apr 26, 2026

Thanks for the quick turnaround — all three items addressed cleanly.

I haven't seen this need come up before, so I'd like to wait for a bit of community signal before merging. If others need proxy support too, they can 👍 this PR. Once we see demand, we'll merge it in.

Appreciate the contribution and the detailed repro.

@mxyhi
Copy link
Copy Markdown
Author

mxyhi commented Apr 26, 2026

I think everyone in mainland China needs it; you should understand.

mksglu and others added 11 commits April 26, 2026 13:03
…stats

Shows all events across all sessions with visual bars — visible in both
fresh and active session states. Replaces old single-line continuity summary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…laude-code#46915)

Claude Code auto-update can leave installed_plugins.json pointing to a
non-existent directory, breaking all hooks. This adds 4 defense layers:

1. start.mjs startup: reverse heal — symlink from broken registry path to our dir
2. server.ts first tool call: mid-session heal — catches auto-update during session
3. postinstall.mjs: backward symlink on new install for stale registry
4. start.mjs auto-deploy: global SessionStart hook at ~/.claude/hooks/ that
   survives total plugin cache breakage

9 TDD tests verify heal behavior with real filesystem fixtures.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…th pure Node.js

- postinstall.mjs: removed duplicate self-heal block (was copy-pasted twice)
- Global heal hook: replaced #!/usr/bin/env bash with #!/usr/bin/env node
  Bash dependency caused issues on Windows (no bash) and macOS with SIP/MDM
  security policies that restrict shell execution. Pure Node.js works everywhere.
- Auto-cleans old .sh hook if present, deploys .mjs version
- Windows: uses junction type for symlinks (no admin required)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agent review found critical issues in the initial cache self-heal:

1. CRITICAL: Layer 4 global hook was dead code — now registers in
   ~/.claude/settings.json SessionStart (Claude Code doesn't auto-discover hooks)
2. HIGH: Path traversal — installPath now validated inside plugin cache root
   via resolve(rp).startsWith(cacheRoot + sep) in all 4 code paths
3. HIGH: server.ts bundle mode — symlink target used wrong dir, now uses
   pluginRoot pattern (package.json existence check)
4. HIGH: Dangling symlink — lstatSync + unlinkSync before symlinkSync
   prevents EEXIST when broken symlink already exists
5. MEDIUM: Tests asserted .sh but code deploys .mjs — fixed
6. LOW: Dead lstatSync import removed, unlinkSync in static imports
7. LOW: Exact key match (=== "context-mode@context-mode") instead of includes()

13 TDD tests now cover: dangling symlink, path traversal, exact match,
settings.json registration, and all previous behaviors.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ss (mksglu#347)

Claude Code spawns hooks via `bash -c "node ..."` on WSL2/Linux. The
intermediate shell makes process.ppid point to a transient bash PID,
not Claude Code. The PPID-keyed sentinel is never found, causing all
MCP redirects to be bypassed — context window floods.

Fix: hooks now scan /tmp for `context-mode-mcp-ready-*` files and probe
each PID with kill(pid,0). Server writes sentinel with process.pid (not
ppid). Hardcoded /tmp on Unix avoids TMPDIR mismatch.

Changes:
- hooks/core/mcp-ready.mjs: glob scan + PID liveness + stale cleanup
- src/server.ts: sentinel uses process.pid, /tmp on Unix
- 11 test files: updated sentinel path pattern

Closes mksglu#347

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The existing minSpan proximity formula `1/(1 + minSpan/contentLen)` rewards
longer documents — a long doc with one tight occurrence outranks a short doc
with multiple tight occurrences at the same span. Add a saturating phrase-
frequency reward layered on top: count ordered adjacent-pair occurrences of
consecutive query terms within 30 chars, contribute `0.5 * min(1, pairs/4)`
to the boost so frequency breaks the tie without unbounded keyword-stuffing
wins.

Pure additive signal — minSpan formula and length-norm intent are unchanged.
Cap (0.5) sits below max proximity (≈1.0) and in title-boost range (0.3-0.6).
Saturation at 4 hits keeps stuffed docs from unbeatably dominating.

Bench (1000-doc corpus, 500 iters, p95 us/call):
                       baseline   with-boost
  cache invalidation     403        325
  database connection    398        390
  auth token             285        264
  rate limiter           320        299
  circuit breaker retry  517        453
  → latency neutral (within noise)

Ranking quality (3 controlled cases of long-1-hit vs short-N-hits):
  baseline correct:    0/3
  with-boost correct:  3/3

Co-authored-by: Mert Köseoğlu <bm.ksglu@gmail.com>
…#350) (mksglu#351)

ContentStore.index() used `content ?? readFileSync(path)`, so an empty
string `""` won out over `path` and 0 sections were indexed. Some MCP
clients materialize optional string fields as `""`, which masked the
file path. Treat empty content as missing so a valid `path` falls back
to reading the file. Adds regression test covering the empty-content +
path code path.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ousamabenyounes
Copy link
Copy Markdown
Contributor

ousamabenyounes commented Apr 26, 2026

After reading the diff carefully I want to flag three concrete points before merge — not blockers per se, but the kind of things that bite later.

1. Duplication of all NO_PROXY logic between TS and runtime template

src/fetch-proxy.ts defines the bypass logic twice:

  • TS version at module scope (lines ~172–242 of the diff): shouldBypassProxy, splitNoProxyToken, ipv4ToNumber, matchesIpv4Cidr, getProxyUrlForRequest, validateProxyUrl, normalizeHost, defaultPort.
  • Runtime version inside buildFetchProxyRuntimeCode(...) template literal (lines ~81–158): the same 8 functions, same bodies, just with TS annotations stripped.

Two sources of truth for identical logic, with no compile-time link between them. The unit tests exercise the TS version (CIDR / suffix / wildcard / port edge cases), but the string-injected runtime version is only exercised by a smoke test that checks "Bun gets { proxy }, Node gets { dispatcher }". A NO_PROXY regression in the runtime copy would not be caught.

Cleaner alternative: ship a vendored plain-JS file (e.g. src/fetch-proxy.runtime.js) that the subprocess requires at runtime, and have the TS module re-export the same functions. One file, one set of tests, no string templating, type-checked and lintable.

This is the change I'd push back on hardest before merging — once it lands as-is, the two copies will drift the first time someone fixes a NO_PROXY edge case.

2. NO_PROXY="*:port" ignores the port (logic-order bug)

In shouldBypassProxy:

if (token.host === "*") return true;
if (token.port && token.port !== port) continue;

splitNoProxyToken("*:8080") returns { host: "*", port: "8080" }, the wildcard branch fires first, and the port check is never reached. So NO_PROXY="*:8080" bypasses every request, including :443. Likely not the intent given the rest of the parser carefully tracks port. Swap the two checks (or treat * with a port as a separate case).

3. validateProxyUrl doesn't wrap new URL(...)

const protocol = new URL(proxyUrl).protocol;

If a user sets HTTPS_PROXY=corp-proxy.local:8080 (missing scheme — common mistake), this throws a generic TypeError: Invalid URL from the URL constructor instead of the descriptive errors the function is built to surface. Wrapping in try/catch and rethrowing with the proxy var name in the message would make misconfigurations easier to debug.

Test coverage gap

The "selects HTTPS proxy before HTTP proxy" test passes both vars and checks HTTPS wins, but there's no test for the documented fallback: HTTPS request URL with only HTTP_PROXY set (no HTTPS_PROXY). Given HTTPS_PROXY_KEYS = ["HTTPS_PROXY", "https_proxy", "HTTP_PROXY", "http_proxy"] lists HTTP_PROXY as a fallback, that path should have an explicit test.


On the broader merge question: I'd lean toward shipping this. Honoring HTTP_PROXY / HTTPS_PROXY is table-stakes UNIX behavior (curl, git, npm, pip all do it), and a silent fetch failure is the kind of bug that doesn't generate GitHub 👍 because affected users just stop using the tool. But the duplication above is the right reason to ask for one more revision rather than wait for a community signal that won't arrive.

@mxyhi if you want a hand on this, I would be happy to contribute directly on your PR — I can push a commit that consolidates the runtime into a vendored JS file, fixes the *:port ordering, wraps the URL parsing in try/catch, and adds the missing HTTPS-URL → HTTP_PROXY fallback test. Just say the word and I will open the changes against your branch.

Generated by Ora Studio (With Claude)
Reviewed by Ousama Ben Younes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants