Skip to content

feat(backend): support a custom OpenAI-compatible provider in BYOK (#6878)#8466

Open
ZachL111 wants to merge 11 commits into
BasedHardware:mainfrom
ZachL111:zach/byok-custom-provider
Open

feat(backend): support a custom OpenAI-compatible provider in BYOK (#6878)#8466
ZachL111 wants to merge 11 commits into
BasedHardware:mainfrom
ZachL111:zach/byok-custom-provider

Conversation

@ZachL111

@ZachL111 ZachL111 commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Summary

BYOK currently only accepts keys from the major labs (OpenAI, Anthropic, Gemini, Deepgram). This adds a fifth option: a user-hosted, OpenAI-compatible provider, so a BYOK user can run Omi's LLM work on OpenRouter, Together, Groq, a local server, or any endpoint that speaks the OpenAI chat-completions API. Refs #6878.

The provider is configured per-request, alongside the existing BYOK keys:

  • X-BYOK-Custom: the API key (fingerprinted on enrollment like the other providers)
  • X-BYOK-Custom-Base-Url: the endpoint, e.g. https://openrouter.ai/api/v1
  • X-BYOK-Custom-Model: the model to call, e.g. qwen/qwen-2.5-72b-instruct

When all three are present for an enrolled, BYOK-active user, that user's get_llm calls (conversation structuring, memory extraction, summaries, and the other OpenAI-compatible features) run on their endpoint, with their model and their key.

How it works

  • Intake (utils/byok.py): custom joins the existing per-request header set; the base URL and model travel as companion config headers and are stashed in the same request contextvar under reserved names, so they never collide with a provider key and are never fingerprint-validated.
  • Enrollment gate: the custom key is fingerprinted exactly like the others. _check_byok_validity drops the custom key and config unless custom is actually enrolled, so an arbitrary X-BYOK-Custom header can never route a user's traffic to an unvalidated endpoint. Non-BYOK users' headers are ignored as before.
  • SSRF guard (validate_custom_base_url): the backend makes outbound calls to this URL, so it must be https and must not point at a loopback, link-local, private, or reserved address (including the cloud metadata endpoint 169.254.169.254). A bad URL degrades to the default provider instead of failing the request.
  • Routing (utils/llm/clients.py): get_llm short-circuits to a ChatOpenAI(base_url, api_key, model) for the custom provider when one is configured. Clients are cached by hashed key plus base URL, so a raw key is never part of a cache key and different endpoints never collide.
  • Enrollment (routers/users.py): custom is allowed as an optional additional provider. The required set (openai/anthropic/gemini/deepgram) and the quota model are unchanged.

Scope, and one decision for a maintainer

This is intentionally additive and fully opt-in: with no custom headers, behavior is byte-for-byte unchanged.

Two things I scoped deliberately and would like a maintainer to weigh in on, since this is your BYOK feature @kodjima33:

  1. The main agentic chat runs on the Anthropic client path, which does not go through get_llm, so the custom provider currently powers the get_llm LLM features (extraction, structuring, summaries, and so on) and not the streaming chat agent. Routing the agent to an arbitrary OpenAI-compatible endpoint is a larger change (tool-format differences) and is a sensible follow-up.

  2. Today a user must still be BYOK-active (enroll the standard four) to also enroll custom. That keeps the free-plan and quota model untouched, but it does not yet fully serve the original ask in Add third party LLM providers/gateways other than OpenAI, Anthropic, Google and Deepgram #6878 (people who want to use only a non-lab provider). Decoupling custom from the four-provider requirement is a product call on the free-plan model, so I left it to you rather than changing who qualifies for the BYOK plan.

Testing

  • Extended tests/unit/test_byok_security.py with a TestBYOKCustomProvider suite: base-URL validation (https only; rejects localhost, .local, private, and metadata IPs), the per-request getter (full config, missing field, invalid-URL degradation, config-alone is not a usable request), header intake, enrollment accepting an optional custom provider, the enrollment gate dropping un-enrolled custom config while honoring enrolled, client caching by base URL, and get_llm routing to the custom client.
  • black --line-length 120 --skip-string-normalization is clean and scripts/lint_async_blockers.py reports no violations. The activation-endpoint tests that import routers.users need GCP credentials to import and so run green only in CI, the same as the existing BYOK activation tests.

Review in cubic

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 4 files

Confidence score: 2/5

  • In backend/utils/byok.py, the SSRF check only inspects the input hostname/IP string and does not validate the DNS-resolved destination, so a benign-looking host could resolve to private/internal or metadata addresses and still be requested, creating a real network-exposure path if merged as-is — resolve and validate final IPs (including DNS rebinding protections) before allowing custom provider calls.
  • In backend/utils/llm/clients.py (get_llm), the custom BYOK early return skips prompt-cache binding and the downstream Anthropic/Perplexity provider guard, which can cause inconsistent behavior and bypass intended safety checks for those providers — route BYOK through the same post-resolution guard/cache path (or apply equivalent checks before return) before merging.

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread backend/utils/byok.py
Comment thread backend/utils/llm/clients.py Outdated
@Git-on-my-level Git-on-my-level added security-review Needs security/privacy review needs-maintainer-review Needs a human maintainer to review/approve (e.g. stacked, product, or architecture judgment) feature-fit-review PR touches feature direction; qualitative fit assessed labels Jun 27, 2026
@Git-on-my-level

Copy link
Copy Markdown
Collaborator

Thanks @ZachL111 — nice, well-structured addition, and the test coverage for the new SSRF guard is genuinely good. I'm not approving this one yet because it needs human maintainer review before it can merge, and there are a couple of concrete things worth addressing first.

Why this needs human review

This introduces a new provider type to BYOK and a new outbound-call surface (the backend calling a user-supplied endpoint with a user key). That touches model/provider routing and security-sensitive surfaces, so I'd want a maintainer to weigh in on the product direction and the security posture rather than rubber-stamp it.

Security — SSRF guard has a DNS-resolution gap

validate_custom_base_url checks the raw hostname/IP literal, which blocks 127.0.0.1, 169.254.169.254, .local, etc. directly. But it never resolves DNS, so a benign-looking hostname that resolves to a private/internal/metadata address can still be requested. For a brand-new network-exposure path, I'd feel much better if the guard resolved the host and validated the resulting IP(s) before allowing the call, ideally with DNS-rebinding protection (re-resolve at connect time, or pin to the resolved IP). Worth tightening before merge.

get_llm early-return bypasses downstream guards

The custom-provider early return skips prompt-cache binding and the post-resolution Anthropic/Perplexity provider guard that the normal path applies. That's probably intentional (the custom path is OpenAI-compatible only), but it should either route through the same post-resolution path or apply equivalent checks before returning, so the two paths can't drift in safety-relevant ways.

Smaller notes

  • Good that an un-enrolled x-byok-custom header can't route traffic — that enrollment gate reads correctly.
  • The has_byok_keys() change to exclude __-prefixed config keys is the right call so config-only headers can't flip quota bypass.

Adding security-review, needs-maintainer-review, and feature-fit-review so this gets the right eyes. Appreciate the contribution and the thorough tests.

@ZachL111

Copy link
Copy Markdown
Contributor Author

Thanks @Git-on-my-level and cubic, both findings were valid. Pushed fixes:

  1. SSRF DNS gap. validate_custom_base_url now resolves the hostname (socket.getaddrinfo) and rejects the request if any resolved address is private, loopback, link-local, reserved, multicast, or unspecified, so a public-looking name that points at an internal or metadata IP no longer slips past the literal-IP checks. Added tests for a hostname resolving to a private IP, to 169.254.169.254, a non-resolving host, and a public host.

    One limitation I called out honestly in the docstring: this validates at call time and does not pin the resolved IP through to the LLM client's connection, so an active DNS-rebinding attacker (public at check, internal at connect) is not fully covered. Connect-time pinning means routing the custom client through a dedicated transport that re-resolves and pins the IP, which is a focused follow-up, so I left a note rather than half-building it. Given the path is opt-in, enrollment-gated, and headed for security review, the resolve-and-validate closes the realistic exposure.

  2. get_llm early return. The custom path now applies the same prompt-cache binding the default path uses (a no-op unless the user's model is one that supports prompt caching), so the two paths cannot drift on that. On the Anthropic/Perplexity guard: those run at the top of get_llm before the custom check, so anthropic/perplexity-only features still raise before reaching custom. The only thing the early return skips is the redundant post-resolution provider == anthropic/perplexity check, which is intentional since the custom path overrides the resolved provider entirely.

Appreciate the careful security read.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/utils/byok.py">

<violation number="1" location="backend/utils/byok.py:163">
P2: Resolver exceptions other than `socket.gaierror` are not normalized, so malformed/IDNA hostnames can raise `UnicodeError` and bypass the graceful degradation in `get_byok_custom_provider()`, surfacing as a 500 error instead of falling back to the default provider.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread backend/utils/byok.py
Comment thread backend/utils/byok.py
# public-looking name pointing at a private/metadata IP cannot slip through.
try:
infos = socket.getaddrinfo(host, parsed.port or 443, type=socket.SOCK_STREAM)
except socket.gaierror as e:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: Resolver exceptions other than socket.gaierror are not normalized, so malformed/IDNA hostnames can raise UnicodeError and bypass the graceful degradation in get_byok_custom_provider(), surfacing as a 500 error instead of falling back to the default provider.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/utils/byok.py, line 163:

<comment>Resolver exceptions other than `socket.gaierror` are not normalized, so malformed/IDNA hostnames can raise `UnicodeError` and bypass the graceful degradation in `get_byok_custom_provider()`, surfacing as a 500 error instead of falling back to the default provider.</comment>

<file context>
@@ -133,14 +146,30 @@ def validate_custom_base_url(base_url: str) -> str:
+    # public-looking name pointing at a private/metadata IP cannot slip through.
+    try:
+        infos = socket.getaddrinfo(host, parsed.port or 443, type=socket.SOCK_STREAM)
+    except socket.gaierror as e:
+        raise ValueError(f'custom base URL host did not resolve: {e}')
+    for info in infos:
</file context>
Suggested change
except socket.gaierror as e:
+ except (socket.gaierror, UnicodeError) as e:

@ZachL111

Copy link
Copy Markdown
Contributor Author

Thanks cubic. On the synchronous getaddrinfo: get_llm (and therefore this validation) runs in a worker thread, not on the event loop. Its callers offload the blocking LLM .invoke() to the executor, so the DNS lookup is already off the loop. The async loop.getaddrinfo you pointed at lives in the web-search tool, which genuinely runs on the loop; this BYOK path is the threadpool LLM path.

That said, I added a per-host TTL cache (5 minutes) around the resolution, so a base URL is resolved at most once per host per window instead of on every get_llm call. That keeps the lookup off the hot path and bounds the work a client could trigger: repeated calls with the same host hit the cache, and resolution failures are not cached. Added a test that a second validation of the same host is served from the cache.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/utils/byok.py">

<violation number="1" location="backend/utils/byok.py:163">
P2: Resolver exceptions other than `socket.gaierror` are not normalized, so malformed/IDNA hostnames can raise `UnicodeError` and bypass the graceful degradation in `get_byok_custom_provider()`, surfacing as a 500 error instead of falling back to the default provider.</violation>
</file>

Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic

Comment thread backend/utils/byok.py Outdated
@Git-on-my-level

Copy link
Copy Markdown
Collaborator

Follow-up review after the latest commits — thanks for addressing the earlier feedback, @ZachL111.

What was addressed

  • SSRF DNS-resolution gap (resolved): validate_custom_base_url now resolves the hostname and rejects any resolved address that is private/loopback/link-local/reserved/multicast/unspecified, including 169.254.169.254. Tests cover hostname-resolves-to-private, hostname-resolves-to-metadata, non-resolving host, and public-resolution happy path. This closes the original gap.
  • get_llm early-return drift (resolved): the custom path now applies the same cache_key / supports_prompt_cache binding as the default path, so the two cannot drift on prompt-cache behavior.
  • Blocking getaddrinfo off the hot path: the per-host TTL cache (5 min, 512 entries) bounds repeated resolution. The clarification that this runs in the worker-thread LLM path (not the event loop) checks out.
  • Enrollment gate + has_byok_keys(): still read correctly — an un-enrolled x-byok-custom header cannot route traffic, and __-prefixed config keys alone cannot flip quota bypass.

Residual item worth noting before merge

The SSRF validation happens at call time but does not pin the resolved IP through to the LLM client connection. Combined with the 5-minute positive DNS cache, there is a DNS-rebinding window: a host that validates as public can re-resolve to an internal/metadata address between this check and the actual outbound request. The code documents this honestly (connect-time pinning via a dedicated transport is called out as a follow-up). For a brand-new outbound-call surface carrying a user key, I would want a maintainer to decide whether connect-time IP pinning (or a shorter/no positive cache) should land in this PR or as an immediate follow-up before broad rollout.

Recommendation

Still needs human security + maintainer review before merge — this introduces a new provider type, a new network-exposure surface, and touches model/provider routing, all of which warrant a product and security judgment call. Not approving. The implementation quality and test coverage are genuinely good.

@Git-on-my-level

Copy link
Copy Markdown
Collaborator

Thanks @ZachL111 — nice work on this. Adding a custom OpenAI-compatible provider to BYOK is a coherent extension of the existing surface, and the hardening around the new SSRF exposure is thoughtful: HTTPS-only, full internal/loopback/link-local/private/reserved/metadata blocking, hostname resolution that rejects any internal resolved address, and the deliberate choice to cache only rejections (re-resolving successes on every call to limit DNS-rebinding windows). The SSRF test coverage is solid, and gating the custom provider behind the existing fingerprint enrollment check is the right call.

A few things that need human maintainer eyes before merge:

  1. Residual SSRF / DNS-rebinding TOCTOU. validate_custom_base_url resolves and validates the IP at call time, but the actual outbound connection (ChatOpenAI/httpx) re-resolves the hostname independently. Between validation and connect, an attacker who controls their domain's DNS can rebind from a public IP to an internal/metadata address (169.254.169.254) and ride the validation window. The docstring flags this honestly and suggests connect-time IP pinning as a follow-up — for a backend that will make authenticated outbound calls to arbitrary user-supplied endpoints, I'd treat that pinning (a custom httpx transport bound to the validated resolved IP, or equivalent) as a prerequisite rather than a follow-up.

  2. Product/feature fit. Adding user-controlled provider routing (OpenRouter/Together/Groq/local) to BYOK expands the trust, performance, and abuse surface beyond the four major-lab providers BYOK supports today — rate limits, model-capability assumptions baked into get_llm feature routing, and prompt-cache binding behavior across unknown models. Worth an explicit maintainer decision on direction.

  3. Model-name control. The model comes from a request header, so a user can point an arbitrary model string at their endpoint. That's expected for BYOK semantics, but it means feature paths in get_llm (structured output, prompt-cache binding, Anthropic-only gating) may behave unpredictably on unknown models — worth confirming degradation is graceful.

Keeping security-review / needs-maintainer-review / feature-fit-review. Not approving — this needs product + security sign-off, and connect-time SSRF pinning should be resolved first. Appreciate the contribution and the clear docstrings.

@ZachL111

Copy link
Copy Markdown
Contributor Author

Thanks for the careful security pass, and for separating the engineering hardening from the product-direction call.

Your characterization of the residual TOCTOU is exactly right. validate_custom_base_url resolves and checks the IP, but ChatOpenAI/httpx re-resolves the hostname independently at connect time, so a domain owner who controls their DNS can rebind a validated public IP to an internal or metadata address in the window between the check and the outbound request. Removing the positive cache (cache rejections only, re-resolve on every success) closes the cache-amplified window you and cubic flagged, but it does not close that underlying validate-then-connect gap.

The proper fix is connect-time IP pinning: resolve and validate once, then bind the outbound connection to that exact validated IP while preserving the original hostname for TLS SNI and certificate verification, via a custom httpx transport passed to the OpenAI client on both the sync and async paths. That removes the second resolution entirely, so the connection cannot be redirected after validation. I am glad to land that in this PR, or split it out as the immediate prerequisite follow-up, whichever you and the maintainer prefer.

On points 2 and 3, agreed those are maintainer calls. Custom-provider BYOK expands the trust and abuse surface beyond the four major-lab providers and changes the get_llm feature-routing assumptions for unknown models, so it warrants an explicit product sign-off. I scoped the change additive and enrollment-gated so existing behavior is byte-for-byte unchanged, and on model-name control the unknown-model paths degrade rather than fail: prompt-cache binding is only applied when supports_prompt_cache is true, and the Anthropic-only gating is unaffected because the custom path routes through the OpenAI-compatible client.

@kodjima33 kodjima33 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Backend feature: custom OpenAI-compatible BYOK provider — approve only.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature-fit-review PR touches feature direction; qualitative fit assessed needs-maintainer-review Needs a human maintainer to review/approve (e.g. stacked, product, or architecture judgment) security-review Needs security/privacy review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants