importinto: strip S3 external ID from resource params#68986
Conversation
|
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
📝 WalkthroughWalkthroughThis PR modifies the Lightning import job submitter to strip S3 ChangesS3 External ID Stripping
Estimated Code Review Effort🎯 2 (Simple) | ⏱️ ~15 minutes Suggested Labels
Suggested Reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.12.2)Command failed Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
lightning/pkg/importinto/job_submitter.go (1)
139-158: ⚡ Quick winAdd a doc comment explaining the S3-specific stripping rationale.
The function lacks a comment explaining why external-ID stripping is S3-specific and the NextGen TiDB compatibility constraint it addresses. This context is non-obvious to future maintainers.
📝 Suggested doc comment
+// stripS3ExternalIDFromResourceParameters removes external-id and external_id +// query parameters from S3 URLs to ensure compatibility with NextGen TiDB, +// which injects its own external-id and rejects user-supplied values. +// Non-S3 schemes are returned unchanged. Returns the original rawQuery if +// parsing fails or no external-ID parameters are present. func stripS3ExternalIDFromResourceParameters(scheme, rawQuery string) string {As per coding guidelines, comments should explain non-obvious intent and compatibility contracts.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@lightning/pkg/importinto/job_submitter.go` around lines 139 - 158, Add a doc comment above stripS3ExternalIDFromResourceParameters that explains this function only strips the "external-id"/"external_id" query parameters for s3 scheme values because NextGen TiDB rejects resources containing S3 external IDs (or treats them as incompatible), so we must remove those keys when scheme == "s3"; mention that non-S3 schemes and malformed queries are left untouched and that the function preserves encoded order by returning values.Encode() only when a change was made. Reference the function name stripS3ExternalIDFromResourceParameters and the checked keys "external-id"/"external_id" in the comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@lightning/pkg/importinto/job_submitter.go`:
- Around line 139-158: Add a doc comment above
stripS3ExternalIDFromResourceParameters that explains this function only strips
the "external-id"/"external_id" query parameters for s3 scheme values because
NextGen TiDB rejects resources containing S3 external IDs (or treats them as
incompatible), so we must remove those keys when scheme == "s3"; mention that
non-S3 schemes and malformed queries are left untouched and that the function
preserves encoded order by returning values.Encode() only when a change was
made. Reference the function name stripS3ExternalIDFromResourceParameters and
the checked keys "external-id"/"external_id" in the comment.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: c6eb351b-6319-4cf9-a7c7-bde3f3ee1671
📒 Files selected for processing (2)
lightning/pkg/importinto/job_submitter.golightning/pkg/importinto/job_submitter_test.go
|
Closing this PR because TiDB IMPORT INTO now allows an explicit S3 external-id. The remaining fix is in pingcap/tiflow#12681: keep the original source dir so dumpling and Lightning local source reads can assume the keyspace role. |
Summary
external-id/external_idfrom S3 resource parameters before generatingIMPORT INTOSQL in the Lightning import-into backend.Mydumper.SourceDiravailable to the import SDK so Lightning can still assume the keyspace role with the external ID while reading source metadata.Why
DM import-into uses a keyspace role on S3. The local DM/Lightning side needs
external-idto assume that role, but NextGen TiDB injects the keyspace name as the external ID forIMPORT INTOitself and rejects or mishandles an explicit external ID in the SQL storage URI. Stripping the parameter only at SQL generation time keeps both consumers working.This complements pingcap/tiflow#12670, which stripped the parameter too early from
Mydumper.SourceDirand caused Lightning to fail before submittingIMPORT INTO.How
cfg.Mydumper.SourceDiras before inDefaultJobSubmitter.buildImportOptions.external-idandexternal_idfromImportOptions.ResourceParameters.Testing
gofmton changed files.go test; this PR was prepared from GitHub contents without a full TiDB checkout in the current workspace.Risks and Reviewer Focus
url.Values.Encode()may reorder query parameters, but it preserves the key/value semantics used byIMPORT INTOresource parameters.Summary by CodeRabbit
Bug Fixes
Tests