Skip to content

importinto: strip S3 external ID from resource params#68986

Closed
fgksgf wants to merge 1 commit into
pingcap:release-nextgen-202603from
fgksgf:codex/importinto-strip-external-id-resource-params-20260605
Closed

importinto: strip S3 external ID from resource params#68986
fgksgf wants to merge 1 commit into
pingcap:release-nextgen-202603from
fgksgf:codex/importinto-strip-external-id-resource-params-20260605

Conversation

@fgksgf

@fgksgf fgksgf commented Jun 5, 2026

Copy link
Copy Markdown
Member

Summary

  • Strip external-id / external_id from S3 resource parameters before generating IMPORT INTO SQL in the Lightning import-into backend.
  • Keep the original Mydumper.SourceDir available to the import SDK so Lightning can still assume the keyspace role with the external ID while reading source metadata.
  • Preserve non-S3 resource parameters unchanged.

Why

DM import-into uses a keyspace role on S3. The local DM/Lightning side needs external-id to assume that role, but NextGen TiDB injects the keyspace name as the external ID for IMPORT INTO itself and rejects or mishandles an explicit external ID in the SQL storage URI. Stripping the parameter only at SQL generation time keeps both consumers working.

This complements pingcap/tiflow#12670, which stripped the parameter too early from Mydumper.SourceDir and caused Lightning to fail before submitting IMPORT INTO.

How

  • Parse cfg.Mydumper.SourceDir as before in DefaultJobSubmitter.buildImportOptions.
  • For S3 source dirs only, remove external-id and external_id from ImportOptions.ResourceParameters.
  • Leave non-S3 query parameters unchanged.

Testing

  • Ran gofmt on changed files.
  • Not run: full go test; this PR was prepared from GitHub contents without a full TiDB checkout in the current workspace.

Risks and Reviewer Focus

  • Please review whether S3-only stripping is the right scope for NextGen import-role semantics.
  • url.Values.Encode() may reorder query parameters, but it preserves the key/value semantics used by IMPORT INTO resource parameters.

Summary by CodeRabbit

  • Bug Fixes

    • Improved S3 import job submission by filtering out external-id parameters from query strings.
  • Tests

    • Added tests validating external-id parameter handling for S3 and non-S3 data sources.

@ti-chi-bot

ti-chi-bot Bot commented Jun 5, 2026

Copy link
Copy Markdown

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot ti-chi-bot Bot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 5, 2026
@ti-chi-bot

ti-chi-bot Bot commented Jun 5, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign d3hunter for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jun 5, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR modifies the Lightning import job submitter to strip S3 external-id query parameters from resource parameters before job submission. A new helper function implements conditional removal for S3 URLs while preserving parameters for other storage backends. Tests validate correct behavior for both S3 and non-S3 sources.

Changes

S3 External ID Stripping

Layer / File(s) Summary
S3 external-id sanitization in job submission
lightning/pkg/importinto/job_submitter.go, lightning/pkg/importinto/job_submitter_test.go
New stripS3ExternalIDFromResourceParameters helper removes external-id/external_id keys from S3 URL queries case-insensitively. Helper is integrated into SubmitTable to sanitize resource parameters before job submission. Tests verify external-id removal for S3 sources and preservation for non-S3 sources (GCS).

Estimated Code Review Effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Suggested Labels

size/L, release-note

Suggested Reviewers

  • joechenrh
  • hawkingrei
  • D3Hunter

Poem

🐰 S3 secrets stripped away so clean,
No external IDs left to be seen,
GCS keeps its queries intact,
Lightning imports stay on track,
Parameter sanitization—a rabbity fact! 🌟

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive The PR description covers most required sections: a clear problem statement explaining the DM import-into keyspace role issue, detailed explanation of changes and rationale, testing confirmation, and risk notes. However, it lacks the required Issue Number reference format and release notes are marked as 'None' without proper justification. Add 'Issue Number: close #68986' or appropriate issue reference at the beginning of the description as required by the template.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'importinto: strip S3 external ID from resource params' directly and specifically summarizes the main change: removing S3 external ID from resource parameters in the importinto package.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.12.2)

Command failed


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
lightning/pkg/importinto/job_submitter.go (1)

139-158: ⚡ Quick win

Add a doc comment explaining the S3-specific stripping rationale.

The function lacks a comment explaining why external-ID stripping is S3-specific and the NextGen TiDB compatibility constraint it addresses. This context is non-obvious to future maintainers.

📝 Suggested doc comment
+// stripS3ExternalIDFromResourceParameters removes external-id and external_id
+// query parameters from S3 URLs to ensure compatibility with NextGen TiDB,
+// which injects its own external-id and rejects user-supplied values.
+// Non-S3 schemes are returned unchanged. Returns the original rawQuery if
+// parsing fails or no external-ID parameters are present.
 func stripS3ExternalIDFromResourceParameters(scheme, rawQuery string) string {

As per coding guidelines, comments should explain non-obvious intent and compatibility contracts.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lightning/pkg/importinto/job_submitter.go` around lines 139 - 158, Add a doc
comment above stripS3ExternalIDFromResourceParameters that explains this
function only strips the "external-id"/"external_id" query parameters for s3
scheme values because NextGen TiDB rejects resources containing S3 external IDs
(or treats them as incompatible), so we must remove those keys when scheme ==
"s3"; mention that non-S3 schemes and malformed queries are left untouched and
that the function preserves encoded order by returning values.Encode() only when
a change was made. Reference the function name
stripS3ExternalIDFromResourceParameters and the checked keys
"external-id"/"external_id" in the comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@lightning/pkg/importinto/job_submitter.go`:
- Around line 139-158: Add a doc comment above
stripS3ExternalIDFromResourceParameters that explains this function only strips
the "external-id"/"external_id" query parameters for s3 scheme values because
NextGen TiDB rejects resources containing S3 external IDs (or treats them as
incompatible), so we must remove those keys when scheme == "s3"; mention that
non-S3 schemes and malformed queries are left untouched and that the function
preserves encoded order by returning values.Encode() only when a change was
made. Reference the function name stripS3ExternalIDFromResourceParameters and
the checked keys "external-id"/"external_id" in the comment.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c6eb351b-6319-4cf9-a7c7-bde3f3ee1671

📥 Commits

Reviewing files that changed from the base of the PR and between cc6c38e and b60f0a8.

📒 Files selected for processing (2)
  • lightning/pkg/importinto/job_submitter.go
  • lightning/pkg/importinto/job_submitter_test.go

@fgksgf

fgksgf commented Jun 5, 2026

Copy link
Copy Markdown
Member Author

Closing this PR because TiDB IMPORT INTO now allows an explicit S3 external-id. The remaining fix is in pingcap/tiflow#12681: keep the original source dir so dumpling and Lightning local source reads can assume the keyspace role.

@fgksgf fgksgf closed this Jun 5, 2026
@fgksgf fgksgf deleted the codex/importinto-strip-external-id-resource-params-20260605 branch June 9, 2026 03:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant