feat(backend): support a custom OpenAI-compatible provider in BYOK (#6878) by ZachL111 · Pull Request #8466 · BasedHardware/omi

ZachL111 · 2026-06-27T21:37:24Z

Summary

BYOK currently only accepts keys from the major labs (OpenAI, Anthropic, Gemini, Deepgram). This adds a fifth option: a user-hosted, OpenAI-compatible provider, so a BYOK user can run Omi's LLM work on OpenRouter, Together, Groq, a local server, or any endpoint that speaks the OpenAI chat-completions API. Refs #6878.

The provider is configured per-request, alongside the existing BYOK keys:

X-BYOK-Custom: the API key (fingerprinted on enrollment like the other providers)
X-BYOK-Custom-Base-Url: the endpoint, e.g. https://openrouter.ai/api/v1
X-BYOK-Custom-Model: the model to call, e.g. qwen/qwen-2.5-72b-instruct

When all three are present for an enrolled, BYOK-active user, that user's get_llm calls (conversation structuring, memory extraction, summaries, and the other OpenAI-compatible features) run on their endpoint, with their model and their key.

How it works

Intake (utils/byok.py): custom joins the existing per-request header set; the base URL and model travel as companion config headers and are stashed in the same request contextvar under reserved names, so they never collide with a provider key and are never fingerprint-validated.
Enrollment gate: the custom key is fingerprinted exactly like the others. _check_byok_validity drops the custom key and config unless custom is actually enrolled, so an arbitrary X-BYOK-Custom header can never route a user's traffic to an unvalidated endpoint. Non-BYOK users' headers are ignored as before.
SSRF guard (validate_custom_base_url): the backend makes outbound calls to this URL, so it must be https and must not point at a loopback, link-local, private, or reserved address (including the cloud metadata endpoint 169.254.169.254). A bad URL degrades to the default provider instead of failing the request.
Routing (utils/llm/clients.py): get_llm short-circuits to a ChatOpenAI(base_url, api_key, model) for the custom provider when one is configured. Clients are cached by hashed key plus base URL, so a raw key is never part of a cache key and different endpoints never collide.
Enrollment (routers/users.py): custom is allowed as an optional additional provider. The required set (openai/anthropic/gemini/deepgram) and the quota model are unchanged.

Scope, and one decision for a maintainer

This is intentionally additive and fully opt-in: with no custom headers, behavior is byte-for-byte unchanged.

Two things I scoped deliberately and would like a maintainer to weigh in on, since this is your BYOK feature @kodjima33:

The main agentic chat runs on the Anthropic client path, which does not go through get_llm, so the custom provider currently powers the get_llm LLM features (extraction, structuring, summaries, and so on) and not the streaming chat agent. Routing the agent to an arbitrary OpenAI-compatible endpoint is a larger change (tool-format differences) and is a sensible follow-up.
Today a user must still be BYOK-active (enroll the standard four) to also enroll custom. That keeps the free-plan and quota model untouched, but it does not yet fully serve the original ask in Add third party LLM providers/gateways other than OpenAI, Anthropic, Google and Deepgram #6878 (people who want to use only a non-lab provider). Decoupling custom from the four-provider requirement is a product call on the free-plan model, so I left it to you rather than changing who qualifies for the BYOK plan.

Testing

Extended tests/unit/test_byok_security.py with a TestBYOKCustomProvider suite: base-URL validation (https only; rejects localhost, .local, private, and metadata IPs), the per-request getter (full config, missing field, invalid-URL degradation, config-alone is not a usable request), header intake, enrollment accepting an optional custom provider, the enrollment gate dropping un-enrolled custom config while honoring enrolled, client caching by base URL, and get_llm routing to the custom client.
black --line-length 120 --skip-string-normalization is clean and scripts/lint_async_blockers.py reports no violations. The activation-endpoint tests that import routers.users need GCP credentials to import and so run green only in CI, the same as the existing BYOK activation tests.

… URL to BYOK

…nfigured

…t gating, and routing

cubic-dev-ai

2 issues found across 4 files

Confidence score: 2/5

In backend/utils/byok.py, the SSRF check only inspects the input hostname/IP string and does not validate the DNS-resolved destination, so a benign-looking host could resolve to private/internal or metadata addresses and still be requested, creating a real network-exposure path if merged as-is — resolve and validate final IPs (including DNS rebinding protections) before allowing custom provider calls.
In backend/utils/llm/clients.py (get_llm), the custom BYOK early return skips prompt-cache binding and the downstream Anthropic/Perplexity provider guard, which can cause inconsistent behavior and bypass intended safety checks for those providers — route BYOK through the same post-resolution guard/cache path (or apply equivalent checks before return) before merging.

_{Reply with feedback, questions, or to request a fix.

Re-trigger cubic}

Git-on-my-level · 2026-06-27T22:49:09Z

Thanks @ZachL111 — nice, well-structured addition, and the test coverage for the new SSRF guard is genuinely good. I'm not approving this one yet because it needs human maintainer review before it can merge, and there are a couple of concrete things worth addressing first.

Why this needs human review

This introduces a new provider type to BYOK and a new outbound-call surface (the backend calling a user-supplied endpoint with a user key). That touches model/provider routing and security-sensitive surfaces, so I'd want a maintainer to weigh in on the product direction and the security posture rather than rubber-stamp it.

Security — SSRF guard has a DNS-resolution gap

validate_custom_base_url checks the raw hostname/IP literal, which blocks 127.0.0.1, 169.254.169.254, .local, etc. directly. But it never resolves DNS, so a benign-looking hostname that resolves to a private/internal/metadata address can still be requested. For a brand-new network-exposure path, I'd feel much better if the guard resolved the host and validated the resulting IP(s) before allowing the call, ideally with DNS-rebinding protection (re-resolve at connect time, or pin to the resolved IP). Worth tightening before merge.

`get_llm` early-return bypasses downstream guards

The custom-provider early return skips prompt-cache binding and the post-resolution Anthropic/Perplexity provider guard that the normal path applies. That's probably intentional (the custom path is OpenAI-compatible only), but it should either route through the same post-resolution path or apply equivalent checks before returning, so the two paths can't drift in safety-relevant ways.

Smaller notes

Good that an un-enrolled x-byok-custom header can't route traffic — that enrollment gate reads correctly.
The has_byok_keys() change to exclude __-prefixed config keys is the right call so config-only headers can't flip quota bypass.

Adding security-review, needs-maintainer-review, and feature-fit-review so this gets the right eyes. Appreciate the contribution and the thorough tests.

…ect internal IPs

… path too

…esses

ZachL111 · 2026-06-28T03:19:48Z

Thanks @Git-on-my-level and cubic, both findings were valid. Pushed fixes:

SSRF DNS gap. validate_custom_base_url now resolves the hostname (socket.getaddrinfo) and rejects the request if any resolved address is private, loopback, link-local, reserved, multicast, or unspecified, so a public-looking name that points at an internal or metadata IP no longer slips past the literal-IP checks. Added tests for a hostname resolving to a private IP, to 169.254.169.254, a non-resolving host, and a public host.

One limitation I called out honestly in the docstring: this validates at call time and does not pin the resolved IP through to the LLM client's connection, so an active DNS-rebinding attacker (public at check, internal at connect) is not fully covered. Connect-time pinning means routing the custom client through a dedicated transport that re-resolves and pins the IP, which is a focused follow-up, so I left a note rather than half-building it. Given the path is opt-in, enrollment-gated, and headed for security review, the resolve-and-validate closes the realistic exposure.
get_llm early return. The custom path now applies the same prompt-cache binding the default path uses (a no-op unless the user's model is one that supports prompt caching), so the two paths cannot drift on that. On the Anthropic/Perplexity guard: those run at the top of get_llm before the custom check, so anthropic/perplexity-only features still raise before reaching custom. The only thing the early return skips is the redundant post-resolution provider == anthropic/perplexity check, which is intentional since the custom path overrides the resolved provider entirely.

Appreciate the careful security read.

cubic-dev-ai

2 issues found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/utils/byok.py">

<violation number="1" location="backend/utils/byok.py:163">
P2: Resolver exceptions other than `socket.gaierror` are not normalized, so malformed/IDNA hostnames can raise `UnicodeError` and bypass the graceful degradation in `get_byok_custom_provider()`, surfacing as a 500 error instead of falling back to the default provider.</violation>
</file>

_{Reply with feedback, questions, or to request a fix.

Re-trigger cubic}

cubic-dev-ai · 2026-06-28T03:23:24Z

+    # public-looking name pointing at a private/metadata IP cannot slip through.
+    try:
+        infos = socket.getaddrinfo(host, parsed.port or 443, type=socket.SOCK_STREAM)
+    except socket.gaierror as e:


P2: Resolver exceptions other than socket.gaierror are not normalized, so malformed/IDNA hostnames can raise UnicodeError and bypass the graceful degradation in get_byok_custom_provider(), surfacing as a 500 error instead of falling back to the default provider.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At backend/utils/byok.py, line 163: <comment>Resolver exceptions other than `socket.gaierror` are not normalized, so malformed/IDNA hostnames can raise `UnicodeError` and bypass the graceful degradation in `get_byok_custom_provider()`, surfacing as a 500 error instead of falling back to the default provider.</comment> <file context> @@ -133,14 +146,30 @@ def validate_custom_base_url(base_url: str) -> str: + # public-looking name pointing at a private/metadata IP cannot slip through. + try: + infos = socket.getaddrinfo(host, parsed.port or 443, type=socket.SOCK_STREAM) + except socket.gaierror as e: + raise ValueError(f'custom base URL host did not resolve: {e}') + for info in infos: </file context>

Suggested change

except socket.gaierror as e:

+ except (socket.gaierror, UnicodeError) as e:

…ff the hot path

ZachL111 · 2026-06-28T03:55:53Z

Thanks cubic. On the synchronous getaddrinfo: get_llm (and therefore this validation) runs in a worker thread, not on the event loop. Its callers offload the blocking LLM .invoke() to the executor, so the DNS lookup is already off the loop. The async loop.getaddrinfo you pointed at lives in the web-search tool, which genuinely runs on the loop; this BYOK path is the threadpool LLM path.

That said, I added a per-host TTL cache (5 minutes) around the resolution, so a base URL is resolved at most once per host per window instead of on every get_llm call. That keeps the lookup off the hot path and bounds the work a client could trigger: repeated calls with the same host hit the cache, and resolution failures are not cached. Added a test that a second validation of the same host is served from the cache.

cubic-dev-ai

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/utils/byok.py">

<violation number="1" location="backend/utils/byok.py:163">
P2: Resolver exceptions other than `socket.gaierror` are not normalized, so malformed/IDNA hostnames can raise `UnicodeError` and bypass the graceful degradation in `get_byok_custom_provider()`, surfacing as a 500 error instead of falling back to the default provider.</violation>
</file>

_{Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic}

… window

Git-on-my-level · 2026-06-28T05:07:39Z

Follow-up review after the latest commits — thanks for addressing the earlier feedback, @ZachL111.

What was addressed

SSRF DNS-resolution gap (resolved): validate_custom_base_url now resolves the hostname and rejects any resolved address that is private/loopback/link-local/reserved/multicast/unspecified, including 169.254.169.254. Tests cover hostname-resolves-to-private, hostname-resolves-to-metadata, non-resolving host, and public-resolution happy path. This closes the original gap.
get_llm early-return drift (resolved): the custom path now applies the same cache_key / supports_prompt_cache binding as the default path, so the two cannot drift on prompt-cache behavior.
Blocking getaddrinfo off the hot path: the per-host TTL cache (5 min, 512 entries) bounds repeated resolution. The clarification that this runs in the worker-thread LLM path (not the event loop) checks out.
Enrollment gate + has_byok_keys(): still read correctly — an un-enrolled x-byok-custom header cannot route traffic, and __-prefixed config keys alone cannot flip quota bypass.

Residual item worth noting before merge

The SSRF validation happens at call time but does not pin the resolved IP through to the LLM client connection. Combined with the 5-minute positive DNS cache, there is a DNS-rebinding window: a host that validates as public can re-resolve to an internal/metadata address between this check and the actual outbound request. The code documents this honestly (connect-time pinning via a dedicated transport is called out as a follow-up). For a brand-new outbound-call surface carrying a user key, I would want a maintainer to decide whether connect-time IP pinning (or a shorter/no positive cache) should land in this PR or as an immediate follow-up before broad rollout.

Recommendation

Still needs human security + maintainer review before merge — this introduces a new provider type, a new network-exposure surface, and touches model/provider routing, all of which warrant a product and security judgment call. Not approving. The implementation quality and test coverage are genuinely good.

Git-on-my-level · 2026-06-28T06:13:24Z

Thanks @ZachL111 — nice work on this. Adding a custom OpenAI-compatible provider to BYOK is a coherent extension of the existing surface, and the hardening around the new SSRF exposure is thoughtful: HTTPS-only, full internal/loopback/link-local/private/reserved/metadata blocking, hostname resolution that rejects any internal resolved address, and the deliberate choice to cache only rejections (re-resolving successes on every call to limit DNS-rebinding windows). The SSRF test coverage is solid, and gating the custom provider behind the existing fingerprint enrollment check is the right call.

A few things that need human maintainer eyes before merge:

Residual SSRF / DNS-rebinding TOCTOU. validate_custom_base_url resolves and validates the IP at call time, but the actual outbound connection (ChatOpenAI/httpx) re-resolves the hostname independently. Between validation and connect, an attacker who controls their domain's DNS can rebind from a public IP to an internal/metadata address (169.254.169.254) and ride the validation window. The docstring flags this honestly and suggests connect-time IP pinning as a follow-up — for a backend that will make authenticated outbound calls to arbitrary user-supplied endpoints, I'd treat that pinning (a custom httpx transport bound to the validated resolved IP, or equivalent) as a prerequisite rather than a follow-up.
Product/feature fit. Adding user-controlled provider routing (OpenRouter/Together/Groq/local) to BYOK expands the trust, performance, and abuse surface beyond the four major-lab providers BYOK supports today — rate limits, model-capability assumptions baked into get_llm feature routing, and prompt-cache binding behavior across unknown models. Worth an explicit maintainer decision on direction.
Model-name control. The model comes from a request header, so a user can point an arbitrary model string at their endpoint. That's expected for BYOK semantics, but it means feature paths in get_llm (structured output, prompt-cache binding, Anthropic-only gating) may behave unpredictably on unknown models — worth confirming degradation is graceful.

Keeping security-review / needs-maintainer-review / feature-fit-review. Not approving — this needs product + security sign-off, and connect-time SSRF pinning should be resolved first. Appreciate the contribution and the clear docstrings.

ZachL111 · 2026-06-28T06:26:29Z

Thanks for the careful security pass, and for separating the engineering hardening from the product-direction call.

Your characterization of the residual TOCTOU is exactly right. validate_custom_base_url resolves and checks the IP, but ChatOpenAI/httpx re-resolves the hostname independently at connect time, so a domain owner who controls their DNS can rebind a validated public IP to an internal or metadata address in the window between the check and the outbound request. Removing the positive cache (cache rejections only, re-resolve on every success) closes the cache-amplified window you and cubic flagged, but it does not close that underlying validate-then-connect gap.

The proper fix is connect-time IP pinning: resolve and validate once, then bind the outbound connection to that exact validated IP while preserving the original hostname for TLS SNI and certificate verification, via a custom httpx transport passed to the OpenAI client on both the sync and async paths. That removes the second resolution entirely, so the connection cannot be redirected after validation. I am glad to land that in this PR, or split it out as the immediate prerequisite follow-up, whichever you and the maintainer prefer.

On points 2 and 3, agreed those are maintainer calls. Custom-provider BYOK expands the trust and abuse surface beyond the four major-lab providers and changes the get_llm feature-routing assumptions for unknown models, so it warrants an explicit product sign-off. I scoped the change additive and enrollment-gated so existing behavior is byte-for-byte unchanged, and on model-name control the unknown-model paths degrade rather than fail: prompt-cache binding is only applied when supports_prompt_cache is true, and the Anthropic-only gating is unaffected because the custom path routes through the OpenAI-compatible client.

kodjima33

Backend feature: custom OpenAI-compatible BYOK provider — approve only.

ZachL111 added 4 commits June 27, 2026 14:36

Add custom OpenAI-compatible provider intake and an SSRF-guarded base…

a8d4b44

… URL to BYOK

Route LLM calls to a user's custom OpenAI-compatible endpoint when co…

4d70fa5

…nfigured

Allow enrolling a custom provider alongside the standard BYOK providers

f29a2bc

Test the custom BYOK provider: base URL validation, intake, enrollmen…

eb6c57b

…t gating, and routing

cubic-dev-ai Bot reviewed Jun 27, 2026

View reviewed changes

Comment thread backend/utils/byok.py

Comment thread backend/utils/llm/clients.py Outdated

Git-on-my-level added security-review Needs security/privacy review needs-maintainer-review Needs a human maintainer to review/approve (e.g. stacked, product, or architecture judgment) feature-fit-review PR touches feature direction; qualitative fit assessed labels Jun 27, 2026

ZachL111 added 3 commits June 27, 2026 20:19

Address review: resolve DNS in the custom base URL SSRF guard and rej…

52c68cc

…ect internal IPs

Address review: apply prompt-cache binding on the custom BYOK get_llm…

fc6e1cf

… path too

Test custom base URL DNS resolution against private and metadata addr…

bdf760a

…esses

cubic-dev-ai Bot reviewed Jun 28, 2026

View reviewed changes

ZachL111 added 2 commits June 27, 2026 20:55

Address review: cache the custom base URL DNS resolution to keep it o…

5e43622

…ff the hot path

Test the custom base URL resolution cache

602135c

cubic-dev-ai Bot reviewed Jun 28, 2026

View reviewed changes

Comment thread backend/utils/byok.py Outdated

ZachL111 added 2 commits June 27, 2026 22:04

Cache only SSRF rejections, never successes, to close a DNS-rebinding…

497e9c5

… window

Test that successful custom base URL validations are not cached

3669fd0

kodjima33 approved these changes Jun 28, 2026

View reviewed changes

	except socket.gaierror as e:
	+ except (socket.gaierror, UnicodeError) as e:

Uh oh!

Conversation

ZachL111 commented Jun 27, 2026 • edited by cubic-dev-ai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

How it works

Scope, and one decision for a maintainer

Testing

Uh oh!

cubic-dev-ai Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Git-on-my-level commented Jun 27, 2026

Why this needs human review

Security — SSRF guard has a DNS-resolution gap

get_llm early-return bypasses downstream guards

Smaller notes

Uh oh!

ZachL111 commented Jun 28, 2026

Uh oh!

cubic-dev-ai Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

cubic-dev-ai Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

ZachL111 commented Jun 28, 2026

Uh oh!

cubic-dev-ai Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Git-on-my-level commented Jun 28, 2026

What was addressed

Residual item worth noting before merge

Recommendation

Uh oh!

Git-on-my-level commented Jun 28, 2026

Uh oh!

ZachL111 commented Jun 28, 2026

Uh oh!

kodjima33 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ZachL111 commented Jun 27, 2026 •

edited by cubic-dev-ai Bot

Loading

cubic-dev-ai Bot left a comment •

edited

Loading

`get_llm` early-return bypasses downstream guards

cubic-dev-ai Bot left a comment •

edited

Loading

cubic-dev-ai Bot left a comment •

edited

Loading