Skip to content

Fixes #21259: Don't store null secrets in Secrets Manager#27839

Open
aji-aju wants to merge 4 commits intomainfrom
fix/21259-null-secrets-test
Open

Fixes #21259: Don't store null secrets in Secrets Manager#27839
aji-aju wants to merge 4 commits intomainfrom
fix/21259-null-secrets-test

Conversation

@aji-aju
Copy link
Copy Markdown
Collaborator

@aji-aju aji-aju commented Apr 30, 2026

Summary

Closes #21259. Reopened from #27838 (which was based on a fork — switched to upstream branch to match the repo's contribution pattern).

When a user clears a password field on a service backed by an external Secrets Manager (managed-aws, GCP, Azure KV, Kubernetes), the previous code stored the literal string \"null\" in the secrets backing store instead of removing the secret. Downstream consumers (e.g. the Snowflake driver) then read back the four-character string \"null\" as a real password, breaking authentication — most visibly for users who authenticate via private key and never wanted a password configured at all.

This PR fixes the root cause and updates the tests that codified the buggy contract.

The bug, demonstrated

Before the fix, with SECRET_MANAGER=managed-aws:

# Create a service with a real password, then clear it via the UI (type something, delete it, save).
$ aws secretsmanager get-secret-value \
    --secret-id /<prefix>/<cluster>/database/<service>/authtype/password \
    --query SecretString --output text
null   # ← the literal four ASCII characters, not the JSON null

After the fix, the same call returns ResourceNotFoundException (the secret has been deleted) and the entity's password field is null, not a stale secret:/ reference.

Approach

The fix is three small, layered changes — each layer hands a clean null to the next:

  1. ExternalSecretsManager.upsertSecret() — when value is null/empty, the user's intent is "remove this credential", so delete the backing secret instead of writing \"null\". This also satisfies the GCP/Kubernetes empty-string rejection constraint from Fixes #25167: Secrets Manager Empty String Sanitization #25224 by simply not calling upsert with an empty value.
  2. ExternalSecretsManager.storeValue() — return null for null/empty values so the entity does not carry a stale secret:/ reference to a deleted secret. The null/empty guard sits at the top of the method (above the isSecret(value) call) so the function is self-defending against null inputs from any caller.
  3. SecretsManager.encryptPasswordFields() — propagate the null returned by storeValue() through to the entity field instead of attempting to fernet.encrypt(null).

Constraint check vs. #25224

#25224 introduced the null→\"null\" conversion specifically because GCP and Kubernetes Secrets Manager reject empty-string upserts. This PR satisfies that constraint by exclusion: we never call upsert with an empty value because we short-circuit to delete instead. The flipped Kubernetes test (testEmptySecretValueShouldNotBeStored, renamed from testEmptySecretValueShouldBeStoredAsNullString) verifies createNamespacedSecret is never called.

Tests

  • New: testIssue21259ClearingPasswordDoesNotStoreNullString exercises the exact user scenario (create service with password → clear password → save) and asserts the entity field is null, not a Fernet-wrapped secret:/ reference. Runs against every provider that subclasses ExternalSecretsManagerTest (AWS, AWS-SSM, GCP, Kubernetes).
  • New: testStoreValueWithNullDoesNotThrowNpe regression-protects the null-input defensive path.
  • Flipped: testEncryptDatabaseConnectionWithEmptyPassword was asserting the buggy contract; now asserts the corrected one (entity field is null).
  • Flipped & renamed: KubernetesSecretsManagerTest.testEmptySecretValueShouldBeStoredAsNullStringtestEmptySecretValueShouldNotBeStored. The previous test name itself encoded the bug.

All 83 tests in *SecretsManager*Test pass.

Out of scope (follow-ups)

  • Read-side cleanup of existing dirty data: customers who already have services with \"null\" strings stored in their SM will continue to read those literal \"null\" strings until those services are re-saved. A passive read-side wrapper that maps fetched \"null\"null (and optionally the existing migrate-secrets CLI tool to actively sweep) can land in a separate PR.
  • cleanNullOrEmpty() becomes effectively dead code in the upsert path. Left in place to keep this PR small and reviewable; can be removed in a follow-up.

Test plan

  • mvn test -pl openmetadata-service -Dtest='*SecretsManager*Test' (83 tests pass)
  • mvn spotless:apply (no formatting changes needed)
  • Manually reproduced the bug locally with managed-aws against AWS SM (output: literal null); verified the fix changes the output to ResourceNotFoundException
  • Manual regression sanity (set initial password, update password, clear password, recreate service, bot JWT lifecycle, test-connection workflow) — in progress

🤖 Generated with Claude Code

aji-aju and others added 3 commits April 30, 2026 11:52
When a user clears a password field on a service that uses an external
Secrets Manager (managed-aws, GCP, Azure KV, Kubernetes), the previous
behavior stored the literal string "null" in the secrets backing store
instead of removing the secret. Downstream consumers (e.g. the Snowflake
driver) then read back the four-character string "null" as a real
password, breaking authentication for users who actually authenticate
via private key and don't want a password configured at all.

The fix is three small, layered changes:

1. ExternalSecretsManager.upsertSecret() — when the value is null/empty,
   the user's intent is "remove this credential", so delete the backing
   secret instead of writing "null". This also satisfies the GCP/K8s
   empty-string rejection constraint from PR #25224 by simply not
   calling upsert with an empty value.

2. ExternalSecretsManager.storeValue() — return null for null/empty
   values so the entity does not carry a stale secret:/ reference to a
   deleted secret. Without this, the next read would attempt to follow
   the reference and either get a ResourceNotFoundException or (pre-fix)
   the literal "null" string.

3. SecretsManager.encryptPasswordFields() — propagate the null returned
   by storeValue() through to the entity field instead of attempting to
   fernet-encrypt null.

Tests:
- New: testIssue21259ClearingPasswordDoesNotStoreNullString — exercises
  the full clear-password scenario and asserts the entity field is null,
  not a Fernet-wrapped secret:/ reference.
- Flipped: testEncryptDatabaseConnectionWithEmptyPassword — was
  asserting the buggy contract; now asserts the corrected one.
- Flipped: KubernetesSecretsManagerTest.testEmptySecretValueShouldNotBeStored
  (renamed from testEmptySecretValueShouldBeStoredAsNullString) — the
  previous name codified the bug; the new test verifies storeValue with
  empty value does NOT call createNamespacedSecret and returns null.

All 80 tests in *SecretsManager*Test pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… NPE

The previous version of storeValue() placed the null/empty guard AFTER
the isSecret() call. Today the path is unreachable because
encryptPasswordFields() upstream-guards on `obj != null`, so null never
actually reaches storeValue() — but the new null-handling branch was
technically dead code for the null case (isSecret() does
`string.startsWith(SECRET_FIELD_PREFIX)` which NPEs on null input).

Move the guard to the top so storeValue() is self-defending against
null inputs from any caller, present or future.

Add testStoreValueWithNullDoesNotThrowNpe regression test (runs across
all ExternalSecretsManager subclasses via the abstract test class).

All 83 tests in *SecretsManager*Test pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 30, 2026 07:08
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@@ -33,6 +33,16 @@ protected ExternalSecretsManager(
@Override
protected String storeValue(String fieldName, String value, String secretId, boolean store) {
String fieldSecretId = buildSecretId(false, secretId, fieldName.toLowerCase(Locale.ROOT));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Edge Case: storeValue with store=false and null value still computes fieldSecretId

When storeValue is called with a null value, the method still calls buildSecretId(false, secretId, fieldName.toLowerCase(Locale.ROOT)) on line 35 before hitting the null guard on line 40. This is harmless (just a wasted string concatenation), but moving the null guard above the buildSecretId call would be marginally cleaner and make the early-return intent clearer.

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

Comment on lines +36 to +39
// Issue #21259: a null/empty value means "no credential". Handle this case BEFORE
// calling isSecret() — isSecret() invokes startsWith() on the value and would NPE
// on null. Returning null here ensures the entity does not carry a stale secret:/
// reference to a deleted secret.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Quality: Verbose comments explaining the fix — prefer self-documenting code

Per the project's coding guidelines, code should be self-documenting with comments reserved for complex business logic or workarounds. The four-line block comments at lines 36-39, 58-61, and 414-416 essentially re-state what the code already makes clear. Consider trimming these to a single-line issue reference (e.g., // Fix #21259: null/empty means "remove credential") and letting the code speak for itself.

Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes issue #21259 in the OpenMetadata service secrets subsystem by ensuring cleared (null/empty) secret values are treated as “remove credential” for external secrets managers, preventing the literal "null" string from being persisted and later misinterpreted as a real password.

Changes:

  • Update external secret storage to delete secrets (and return null to callers) when the incoming value is null/empty.
  • Propagate null through encryption flow to avoid attempting Fernet encryption on cleared values.
  • Update and add unit tests to assert the corrected contract (no "null" string storage; no stale secret: references).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
openmetadata-service/src/main/java/org/openmetadata/service/secrets/ExternalSecretsManager.java Treat null/empty as delete intent; storeValue() returns null and upsertSecret() deletes instead of storing "null".
openmetadata-service/src/main/java/org/openmetadata/service/secrets/SecretsManager.java Ensure encryption flow propagates null instead of encrypting it.
openmetadata-service/src/test/java/org/openmetadata/service/secrets/ExternalSecretsManagerTest.java Flip prior expectations and add regressions for null/empty handling and issue #21259 scenario.
openmetadata-service/src/test/java/org/openmetadata/service/secrets/KubernetesSecretsManagerTest.java Update Kubernetes behavior test to assert empty values are not upserted (no create call) and return null.

Comment on lines +60 to +66
// "null". This also satisfies the GCP/Kubernetes empty-string rejection constraint
// (PR #25224) by simply not calling upsert with an empty value.
if (Objects.isNull(secretValue) || secretValue.isEmpty()) {
if (existSecret(secretName)) {
deleteSecretInternal(secretName);
sleep();
}
Copy link

Copilot AI Apr 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In upsertSecret, the null/empty branch checks existSecret(secretName) before deleting. For most providers, ExternalSecretsManager.existSecret() is implemented by calling getSecret() and swallowing any exception (returning false). That means a transient network/permission error can be treated as “secret doesn’t exist”, so the delete is silently skipped and the old secret becomes orphaned even though the entity field is cleared. Consider changing the existence check to only suppress true not-found cases and propagate other exceptions (or make deleteSecretInternal idempotent across providers and delete without a pre-check) so credential clearing doesn’t silently leave stale secrets behind.

Suggested change
// "null". This also satisfies the GCP/Kubernetes empty-string rejection constraint
// (PR #25224) by simply not calling upsert with an empty value.
if (Objects.isNull(secretValue) || secretValue.isEmpty()) {
if (existSecret(secretName)) {
deleteSecretInternal(secretName);
sleep();
}
// "null". Avoid a pre-delete existence check here because existSecret() relies on
// getSecret() and suppresses all exceptions, which can incorrectly turn transient
// provider errors into a silent "does not exist" result and leave stale secrets
// orphaned even though the entity field has been cleared.
if (Objects.isNull(secretValue) || secretValue.isEmpty()) {
deleteSecretInternal(secretName);
sleep();

Copilot uses AI. Check for mistakes.
@aji-aju aji-aju self-assigned this Apr 30, 2026
@aji-aju aji-aju added the safe to test Add this label to run secure Github workflows on PRs label Apr 30, 2026
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 30, 2026

Code Review 👍 Approved with suggestions 0 resolved / 2 findings

Prevents null secrets from being stored in Secrets Manager as requested. Consider removing verbose comments and refactoring the redundant secret ID computation in storeValue.

💡 Edge Case: storeValue with store=false and null value still computes fieldSecretId

📄 openmetadata-service/src/main/java/org/openmetadata/service/secrets/ExternalSecretsManager.java:35 📄 openmetadata-service/src/main/java/org/openmetadata/service/secrets/ExternalSecretsManager.java:40-44

When storeValue is called with a null value, the method still calls buildSecretId(false, secretId, fieldName.toLowerCase(Locale.ROOT)) on line 35 before hitting the null guard on line 40. This is harmless (just a wasted string concatenation), but moving the null guard above the buildSecretId call would be marginally cleaner and make the early-return intent clearer.

💡 Quality: Verbose comments explaining the fix — prefer self-documenting code

📄 openmetadata-service/src/main/java/org/openmetadata/service/secrets/ExternalSecretsManager.java:36-39 📄 openmetadata-service/src/main/java/org/openmetadata/service/secrets/ExternalSecretsManager.java:58-61 📄 openmetadata-service/src/main/java/org/openmetadata/service/secrets/SecretsManager.java:414-416

Per the project's coding guidelines, code should be self-documenting with comments reserved for complex business logic or workarounds. The four-line block comments at lines 36-39, 58-61, and 414-416 essentially re-state what the code already makes clear. Consider trimming these to a single-line issue reference (e.g., // Fix #21259: null/empty means "remove credential") and letting the code speak for itself.

🤖 Prompt for agents
Code Review: Prevents null secrets from being stored in Secrets Manager as requested. Consider removing verbose comments and refactoring the redundant secret ID computation in storeValue.

1. 💡 Edge Case: storeValue with store=false and null value still computes fieldSecretId
   Files: openmetadata-service/src/main/java/org/openmetadata/service/secrets/ExternalSecretsManager.java:35, openmetadata-service/src/main/java/org/openmetadata/service/secrets/ExternalSecretsManager.java:40-44

   When `storeValue` is called with a null `value`, the method still calls `buildSecretId(false, secretId, fieldName.toLowerCase(Locale.ROOT))` on line 35 before hitting the null guard on line 40. This is harmless (just a wasted string concatenation), but moving the null guard above the `buildSecretId` call would be marginally cleaner and make the early-return intent clearer.

2. 💡 Quality: Verbose comments explaining the fix — prefer self-documenting code
   Files: openmetadata-service/src/main/java/org/openmetadata/service/secrets/ExternalSecretsManager.java:36-39, openmetadata-service/src/main/java/org/openmetadata/service/secrets/ExternalSecretsManager.java:58-61, openmetadata-service/src/main/java/org/openmetadata/service/secrets/SecretsManager.java:414-416

   Per the project's coding guidelines, code should be self-documenting with comments reserved for complex business logic or workarounds. The four-line block comments at lines 36-39, 58-61, and 414-416 essentially re-state what the code already makes clear. Consider trimming these to a single-line issue reference (e.g., `// Fix #21259: null/empty means "remove credential"`) and letting the code speak for itself.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

// reference to a deleted secret.
if (Objects.isNull(value) || value.isEmpty()) {
if (store) {
upsertSecret(fieldSecretId, value);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mhmh I think this is hard to follow. Why would we say upsertSecret(..., null), instead of explicitly handling this as deleteSecret. IMO upsert is add or update, but should not internally delete. Let's make the process bit easier to follow

@sonarqubecloud
Copy link
Copy Markdown

@github-actions
Copy link
Copy Markdown
Contributor

🟡 Playwright Results — all passed (12 flaky)

✅ 3986 passed · ❌ 0 failed · 🟡 12 flaky · ⏭️ 86 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 298 0 1 4
🟡 Shard 2 750 0 4 8
🟡 Shard 3 745 0 1 7
🟡 Shard 4 774 0 1 18
✅ Shard 5 687 0 0 41
🟡 Shard 6 732 0 5 8
🟡 12 flaky test(s) (passed on retry)
  • Pages/AuditLogs.spec.ts › should apply both User and EntityType filters simultaneously (shard 1, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event shows the actor who made the change (shard 2, 1 retry)
  • Features/Glossary/GlossaryHierarchy.spec.ts › should cancel move operation (shard 2, 1 retry)
  • Features/Glossary/GlossaryWorkflow.spec.ts › should start term as Draft when glossary has reviewers (shard 2, 1 retry)
  • Features/Permissions/DataProductPermissions.spec.ts › Data Product deny operations (shard 3, 1 retry)
  • Pages/CustomProperties.spec.ts › Email (shard 4, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/Tag.spec.ts › Verify Owner Add Delete (shard 6, 1 retry)
  • Pages/UserDetails.spec.ts › Create team with domain and verify visibility of inherited domain in user profile after team removal (shard 6, 1 retry)
  • Pages/Users.spec.ts › Check permissions for Data Steward (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Don't store null secrets in Secrets Manager

3 participants