Skip to content

feat: dynamic vault OIDC callback port and in-process token capture#910

Merged
openshift-merge-bot[bot] merged 3 commits into
openshift:masterfrom
clcollins:vault-dynamic-callback-port
Jun 5, 2026
Merged

feat: dynamic vault OIDC callback port and in-process token capture#910
openshift-merge-bot[bot] merged 3 commits into
openshift:masterfrom
clcollins:vault-dynamic-callback-port

Conversation

@clcollins
Copy link
Copy Markdown
Member

@clcollins clcollins commented Jun 3, 2026

Summary

  • Read dynamically-assigned host port from /tmp/vault_callback_port (written by ocm-container's ports feature) and pass it as callbackport to vault login -method=oidc
  • When re-authenticating inside a container, first try writing ~/.vault-token normally; if that fails (read-only bind mount or atomic rename issues), fall back to capturing the token in-process via VAULT_TOKEN env var
  • Enables multiple ocm-container instances to run vault OIDC authentication simultaneously without port conflicts
  • Updates error messages to reference the ports feature instead of the deprecated launch-opts approach

Companion PR

Jira

ROSAENG-39831

Test plan

  • Unit tests pass: go test ./pkg/utils/ -v
  • Inside ocm-container with existing valid ~/.vault-token: osdctl rhobs logs uses it directly
  • Inside ocm-container with expired token: osdctl rhobs logs triggers OIDC re-auth
  • With /tmp/vault_callback_port present: uses dynamic callback port
  • Without /tmp/vault_callback_port: falls back to default port 8250
  • Token file write failure: falls back to in-process VAULT_TOKEN capture

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Refactor

    • Improved Vault token setup: dynamic callback-port detection, more reliable container-mode login flow, and a fallback that captures tokens when container file writes fail.
  • Tests

    • Reworked tests to target helper behavior: added checks for port reading and login-argument composition; removed higher-level integration tests for full token setup.

Read the dynamically-assigned host port from /tmp/vault_callback_port
(written by ocm-container's ports feature) and pass it as callbackport
to vault login, enabling multiple containers to run simultaneously.

When re-authenticating inside a container, first try writing
~/.vault-token normally. If that fails (read-only bind mount or atomic
rename issues), fall back to capturing the token in-process via the
VAULT_TOKEN environment variable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 3, 2026

Walkthrough

Refactors Vault token refresh into smaller helpers: adds readCallbackPort(), builds container OIDC args with optional callback port, separates container/local setup, and implements a container fallback that captures the token from stdout when the token file cannot be written.

Changes

Vault Token Setup Refactoring

Layer / File(s) Summary
Imports, constants, and callback port reader
pkg/utils/vault.go (imports/constants), pkg/utils/vault_test.go (TestReadCallbackPort)
Adds readCallbackPort() with a test hook (readFileFunc), new callback-port file constant and default OIDC port, and updated imports.
Token setup, OIDC args, and container fallback
pkg/utils/vault.go (setupVaultToken, setupVaultTokenContainer, setupVaultTokenLocal, buildOIDCArgs)
setupVaultToken delegates invalid-token handling to container or local helpers. buildOIDCArgs() composes OIDC login args (injecting callback port when available). setupVaultTokenContainer() attempts normal login then falls back to re-running with -no-store -field=token to capture a token from stdout and set VAULT_TOKEN if the token file cannot be written.
Unit tests for helpers
pkg/utils/vault_test.go (TestReadCallbackPort, TestBuildOIDCArgs)
Replaces prior integration-style command/tests with unit tests: TestReadCallbackPort stubs file reads to validate trimming and empty behavior; TestBuildOIDCArgs asserts required flags and conditional inclusion of -no-store, -field=token, and port/callback arguments.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (14 passed)
Check name Status Explanation
Title check ✅ Passed The title directly reflects the main changes: dynamic Vault OIDC callback port and in-process token capture fallback are the core features introduced.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed Tests use standard Go testing (not Ginkgo). All test names and subtests use stable, deterministic strings with no dynamic values, UUIDs, timestamps, or generated identifiers.
Test Structure And Quality ✅ Passed The custom check requires Ginkgo test syntax (It blocks, BeforeEach/AfterEach), but the modified test file uses standard Go testing with testing.T. Check is not applicable.
Microshift Test Compatibility ✅ Passed PR adds only standard Go unit tests, not Ginkgo e2e tests. No It(), Describe(), Context(), When() patterns found. Check not applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed No Ginkgo e2e tests added; only standard Go unit tests for Vault utility functions. Check is not applicable.
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies utility functions for Vault authentication (pkg/utils/vault.go) and their tests; no Kubernetes deployment manifests, operators, controllers, or scheduling constraints are introduced.
Ote Binary Stdout Contract ✅ Passed osdctl is a CLI tool, not an OTE binary. No Ginkgo suites, BeforeSuite/AfterSuite functions, or openshift-tests integration found. Check not applicable.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed PR adds only standard Go unit tests in pkg/utils/, not Ginkgo e2e tests. Custom check for Ginkgo e2e test IPv6/disconnected network compatibility does not apply.
No-Weak-Crypto ✅ Passed No weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time secret comparisons found. Code delegates all crypto to vault CLI.
Container-Privileges ✅ Passed PR contains only Go source code changes to vault.go/vault_test.go with no container or K8s manifests being added or modified; check not applicable.
No-Sensitive-Data-In-Logs ✅ Passed No sensitive data exposed in logs. Tokens are captured but never logged. Only port numbers and static messages are logged, with no passwords, API keys, PII, or hostnames exposed.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 3, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/utils/vault_test.go`:
- Around line 132-145: The tests TestSetupVaultTokenLocal_ArgsShape and
TestSetupVaultTokenContainer_ArgsShape are no-ops because they only reference
setupVaultTokenLocal and setupVaultTokenContainer without executing or asserting
behavior; replace them by invoking the functions (or the helper that builds the
vault login command) with a test-injected command factory or argument-builder
stub and assert the produced command, arguments, and environment variables match
expected values for non-container and container modes respectively.
Specifically, update the tests to call the builders used by
setupVaultTokenLocal/setupVaultTokenContainer (or refactor those functions to
accept an injectable commandFactory/argBuilder), inject a fake commandFactory
that captures the command and args, then assert on captured command string, args
list and any env vars to verify the login command construction.
- Around line 42-62: The test never calls readCallbackPort() and instead
duplicates its file-read/trim logic; refactor so the parsing is testable and
call production code directly: either change readCallbackPort to accept a path
(e.g., readCallbackPort(path string) string/error) or extract the trimming logic
into a new function parseCallbackPort(content string) (or parseCallbackPort(r
io.Reader)), update readCallbackPort to use that helper, and then update
vault_test.go to create a temp file (or a string/reader) and call the production
parseCallbackPort/readCallbackPort with that input and assert results for both
present and missing-file cases; reference readCallbackPort and
vaultCallbackPortFile to locate the code to change.

In `@pkg/utils/vault.go`:
- Around line 103-115: setupVaultTokenLocal (and setupVaultTokenContainer)
currently call exec.Command without a context which can hang indefinitely;
change both functions to accept a context.Context parameter, create a child
context with a reasonable timeout using context.WithTimeout, and use
exec.CommandContext instead of exec.Command so the spawned 'vault login' process
is cancelled when the context times out or is cancelled; propagate the ctx from
callers into setupVaultTokenLocal/setupVaultTokenContainer, check for ctx.Err()
and return a wrapped error (including context timeout/cancel reason) when the
command fails, and ensure any stdout/stderr handling remains correct so the
process is terminated on timeout.
- Around line 155-178: The fallback to in-process token capture is executed
whenever loginCmd.Run() returns any error, but the code logs "Token file write
failed" and switches to containerOIDCArgs(true) even when the error was
unrelated to writing ~/.vault-token; update setupVaultTokenContainer to inspect
the original error from the first vault login (loginCmd.Run()) and only perform
the -no-store fallback (rebuilding loginArgs with containerOIDCArgs(true) and
re-running loginCmd with tokenBuf capture) when the error indicates a token-file
write/rename permission failure (e.g., syscall errors, permission denied, or
specific vault CLI stderr text), otherwise return the original error (including
its context) instead of silently switching to container-based auth; keep
existing logging but change the message to reflect that the fallback is
conditional and include the original error when returning.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 01ffb963-6cc3-4b78-a4e5-d32dce055fe5

📥 Commits

Reviewing files that changed from the base of the PR and between 7117673 and 3a59e50.

📒 Files selected for processing (2)
  • pkg/utils/vault.go
  • pkg/utils/vault_test.go

Comment thread pkg/utils/vault_test.go
Comment thread pkg/utils/vault_test.go Outdated
Comment thread pkg/utils/vault.go
Comment thread pkg/utils/vault.go
clcollins and others added 2 commits June 3, 2026 13:04
Replace file-writing tests with mocked readFileFunc. Extract
buildOIDCArgs to accept callbackPort as a parameter so tests
exercise arg-building logic without any I/O or API calls.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add context.WithTimeout (5 minutes) to all vault login exec.Command
calls so stalled OIDC flows don't block indefinitely.

Gate the -no-store fallback in setupVaultTokenContainer to only trigger
on token file write errors (rename, device busy, read-only, permission
denied). Auth/network/timeout failures now return immediately instead
of silently retrying.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clcollins
Copy link
Copy Markdown
Member Author

/label tide/merge-method-squash

🤖 Claude claude@anthropic.com commenting on behalf of @clcollins

@openshift-ci openshift-ci Bot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Jun 3, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 3, 2026

@clcollins: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Comment thread pkg/utils/vault.go
return fmt.Errorf("vault login timed out after %s", vaultLoginTimeout)
} else if !isTokenFileError(err) {
return fmt.Errorf("vault login failed: %v\n\n"+
"If authentication timed out or the callback failed:\n"+
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this less ocm-container focused? Seems like it would be a general UX issue?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dustman9000 Same as the other one: this is an ocm-container only problem & this code isn't reachable by normal osdctl users outside of ocm-container.

Comment thread pkg/utils/vault.go
}
return fmt.Errorf("vault login failed: %v\n\n"+
"If authentication timed out or the callback failed:\n"+
" 1. Ensure ocm-container has the vault port enabled in the ports feature\n"+
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same UX issue here, please make more generic.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dustman9000 this is an ocm-container only problem, and this function is not reachable unless you are in ocm-container. Normal osdctl users outside of ocm-container won't ever see this message.

@dustman9000
Copy link
Copy Markdown
Member

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 5, 2026
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Jun 5, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: clcollins, dustman9000

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [clcollins,dustman9000]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot Bot merged commit 486780d into openshift:master Jun 5, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants