Skip to content

feat: implement UploadPartCopy#64

Open
destifo wants to merge 7 commits into
mainfrom
feat/upload-part-copy
Open

feat: implement UploadPartCopy#64
destifo wants to merge 7 commits into
mainfrom
feat/upload-part-copy

Conversation

@destifo

@destifo destifo commented Jul 29, 2025

Copy link
Copy Markdown
Contributor

Migration notes


  • The change comes with new or modified tests
  • Hard-to-understand functions have explanatory comments
  • End-user documentation is updated to reflect the change

Summary by CodeRabbit

  • New Features
    • Added support for multipart upload part copy operations in S3-compatible and Swift-compatible backends.
    • Introduced new configuration option for setting a default bucket.
  • Improvements
    • Enhanced robustness and error handling for multipart upload workflows, including better validation and logging.
  • Bug Fixes
    • Fixed issues related to content-type validation and session file parsing during multipart uploads.
  • Tests
    • Added comprehensive tests for multipart upload part copy functionality.
  • Chores
    • Updated application and image versions to 0.7.0.

@destifo destifo requested a review from zifeo July 29, 2025 23:16
@destifo destifo self-assigned this Jul 29, 2025
@coderabbitai

coderabbitai Bot commented Jul 29, 2025

Copy link
Copy Markdown
📝 Walkthrough

"""

Walkthrough

This update implements multipart upload part copy functionality for both S3 and Swift backends, including robust validation and error handling. It introduces a new test suite for the part copy flow, updates the Helm chart and image version to 0.7.0, and adds a new configuration key for the default bucket.

Changes

Cohort / File(s) Change Summary
Helm Chart Version and Config Update
chart/Chart.yaml, chart/values.yaml
Bumped chart version and image tag from 0.5.0 to 0.7.0; added default_bucket config key under heraldConfig.
S3 Backend: Multipart Upload Part Copy
src/backends/s3/objects.ts
Added uploadPartCopy async function to handle S3 multipart upload part copy requests with validation, request proxying, retries, error handling, and logging.
Swift Backend: Multipart Upload Improvements & Part Copy
src/backends/swift/objects.ts
Implemented uploadPartCopy for Swift backend; improved multipart session validation, error handling, and logging in multipart upload functions; ensured content-type correctness and robust manifest validation.
Test: Multipart Upload Part Copy Flow
tests/swift/basic/uploadpartcopy_test.ts
Added a comprehensive test suite for the multipart upload part copy flow, validating end-to-end behavior including part copy, listing, completion, and object verification.
Test: Multipart Upload Part Copy with Checksum Verification
tests/swift/basic/uploadpartcopy_checksum_test.ts
Added a detailed test simulating a Docker pull scenario using multipart upload part copy, verifying data integrity via SHA-256 checksums and multipart upload correctness.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant S3/Swift Backend
    participant Storage Service

    Client->>S3/Swift Backend: PUT /bucket/object?partNumber=N&uploadId=... (with x-amz-copy-source)
    S3/Swift Backend->>Storage Service: GET source object (or byte range)
    Storage Service-->>S3/Swift Backend: Source data
    S3/Swift Backend->>Storage Service: PUT part data as part N
    Storage Service-->>S3/Swift Backend: Success/ETag
    S3/Swift Backend->>S3/Swift Backend: Update multipart session file (JSON)
    S3/Swift Backend-->>Client: XML CopyPartResult with ETag and LastModified
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

  • chore: bump chart to 0.7.0 #62: Updates Helm chart version and adds the default_bucket key, directly overlapping with this PR's chart/config changes.
  • refactor: mpu redesign #54: Refactors multipart upload session storage in the Swift backend; both PRs modify multipart upload logic in src/backends/swift/objects.ts.
  • feat: multipart upload #17: Introduces multipart upload support in the S3 backend; this PR extends those features by implementing uploadPartCopy.

Suggested labels

enhancement

Suggested reviewers

  • zifeo
  • hailatGH
  • Sabian-A
    """

Note

⚡️ Unit Test Generation is now available in beta!

Learn more here, or try it out under "Finishing Touches" below.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/upload-part-copy

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai coderabbitai Bot added the enhancement New feature or request label Jul 29, 2025

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (2)
tests/swift/basic/uploadpartcopy_test.ts (1)

54-54: Consider using consistent file sizes across tests.

The test creates a 10MB source file but only copies the first 5MB. Consider documenting why this specific size was chosen or using a constant for better maintainability.

+  const SOURCE_FILE_SIZE_MB = 10;
+  const COPY_SIZE_BYTES = 5 * 1024 * 1024; // 5MB
+
   // Create a source object to copy from
   await t.step("Create Source Object", async () => {
-    const tempFile = await createTempFile(10); // 10MB file
+    const tempFile = await createTempFile(SOURCE_FILE_SIZE_MB);
src/backends/swift/objects.ts (1)

1595-1601: Verify the size calculation logic.

The size calculation has multiple fallbacks which could lead to incorrect values. Consider simplifying:

-            const size = parseInt(
-              sourceSuccessResponse.headers.get("Content-Length") || "0",
-              10,
-            ) ||
-              (endByte !== undefined
-                ? (endByte - startByte + 1)
-                : sourceContent.byteLength);
+            // Prefer Content-Length header, fallback to calculated or actual size
+            const contentLengthHeader = sourceSuccessResponse.headers.get("Content-Length");
+            const size = contentLengthHeader 
+              ? parseInt(contentLengthHeader, 10)
+              : (endByte !== undefined 
+                  ? (endByte - startByte + 1)
+                  : sourceContent.byteLength);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between be6c0ca and 7e17889.

📒 Files selected for processing (5)
  • chart/Chart.yaml (1 hunks)
  • chart/values.yaml (2 hunks)
  • src/backends/s3/objects.ts (2 hunks)
  • src/backends/swift/objects.ts (8 hunks)
  • tests/swift/basic/uploadpartcopy_test.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: destifo
PR: expnt/herald#54
File: src/backends/swift/objects.ts:0-0
Timestamp: 2025-07-04T07:25:29.056Z
Learning: In src/backends/swift/objects.ts, the multipart upload system was redesigned to use individual per-upload session files (stored as <uploadId>.json) instead of a centralized index.json file. This design choice was made intentionally to avoid conflicts and race conditions that occur when multiple operations try to update the same index file simultaneously. The N+1 HTTP requests approach is preferred over the single index file approach for data consistency, even though it has performance implications.
Learnt from: destifo
PR: expnt/herald#10
File: src/backends/s3/buckets.ts:0-0
Timestamp: 2025-01-28T17:36:22.149Z
Learning: The `routeQueryParamedRequest` and `headBucket` functions in `src/backends/s3/buckets.ts` already implement retry logic for replica failover when the primary request fails, supporting both S3 and Swift replicas.
tests/swift/basic/uploadpartcopy_test.ts (3)

Learnt from: destifo
PR: #54
File: src/backends/swift/objects.ts:0-0
Timestamp: 2025-07-04T07:25:29.056Z
Learning: In src/backends/swift/objects.ts, the multipart upload system was redesigned to use individual per-upload session files (stored as .json) instead of a centralized index.json file. This design choice was made intentionally to avoid conflicts and race conditions that occur when multiple operations try to update the same index file simultaneously. The N+1 HTTP requests approach is preferred over the single index file approach for data consistency, even though it has performance implications.

Learnt from: destifo
PR: #49
File: tests/_others/_task_queue_test.ts:7-12
Timestamp: 2025-06-26T12:47:53.021Z
Learning: The test in tests/_others/_task_queue_test.ts is specifically designed to test the task store functionality with an S3 backend. The direct type assertion globalConfig.task_store_backend as S3Config is safe because the test is intentionally S3-specific and uses S3 APIs throughout.

Learnt from: destifo
PR: #10
File: src/backends/s3/buckets.ts:0-0
Timestamp: 2025-01-28T17:36:22.149Z
Learning: The routeQueryParamedRequest and headBucket functions in src/backends/s3/buckets.ts already implement retry logic for replica failover when the primary request fails, supporting both S3 and Swift replicas.

src/backends/s3/objects.ts (6)

Learnt from: destifo
PR: #10
File: src/backends/s3/buckets.ts:0-0
Timestamp: 2025-01-28T17:36:22.149Z
Learning: The routeQueryParamedRequest and headBucket functions in src/backends/s3/buckets.ts already implement retry logic for replica failover when the primary request fails, supporting both S3 and Swift replicas.

Learnt from: destifo
PR: #54
File: src/backends/swift/objects.ts:0-0
Timestamp: 2025-07-04T07:25:29.056Z
Learning: In src/backends/swift/objects.ts, the multipart upload system was redesigned to use individual per-upload session files (stored as .json) instead of a centralized index.json file. This design choice was made intentionally to avoid conflicts and race conditions that occur when multiple operations try to update the same index file simultaneously. The N+1 HTTP requests approach is preferred over the single index file approach for data consistency, even though it has performance implications.

Learnt from: destifo
PR: #49
File: src/backends/task_store.ts:170-228
Timestamp: 2025-06-26T12:48:25.255Z
Learning: In src/backends/task_store.ts, the TaskStore class creates Request objects with localhost URLs that are not actually used for HTTP requests. Instead, these Request objects serve as structured data containers that get passed to backend storage implementations (s3PutObject/swiftPutObject) which extract the necessary information (bucket, key, body, headers) and make the actual requests to the configured storage backends.

Learnt from: destifo
PR: #41
File: src/backends/mirror.ts:166-174
Timestamp: 2025-06-16T14:42:52.174Z
Learning: In src/backends/mirror.ts, when creating replica requests that reuse originalRequest.headers, the authentication headers (Authorization, x-amz-date, etc.) are automatically flushed/replaced down the request pipeline by functions like s3.putObject, so there's no signature mismatch issue with forwarding the original headers.

Learnt from: destifo
PR: #49
File: tests/_others/_task_queue_test.ts:7-12
Timestamp: 2025-06-26T12:47:53.021Z
Learning: The test in tests/_others/_task_queue_test.ts is specifically designed to test the task store functionality with an S3 backend. The direct type assertion globalConfig.task_store_backend as S3Config is safe because the test is intentionally S3-specific and uses S3 APIs throughout.

Learnt from: destifo
PR: #41
File: src/backends/swift/mod.ts:318-323
Timestamp: 2025-06-16T14:40:12.564Z
Learning: In conversion functions that return Result<Response, Error>, the Result represents whether the conversion process succeeded, not the HTTP semantics. A function like convertSwiftUploadPartToS3Response should return createOk(errorResponse) when it successfully converts to a proper S3 error response, and createErr(conversionError) only when the conversion process itself fails.

src/backends/swift/objects.ts (5)

Learnt from: destifo
PR: #54
File: src/backends/swift/objects.ts:0-0
Timestamp: 2025-07-04T07:25:29.056Z
Learning: In src/backends/swift/objects.ts, the multipart upload system was redesigned to use individual per-upload session files (stored as .json) instead of a centralized index.json file. This design choice was made intentionally to avoid conflicts and race conditions that occur when multiple operations try to update the same index file simultaneously. The N+1 HTTP requests approach is preferred over the single index file approach for data consistency, even though it has performance implications.

Learnt from: destifo
PR: #10
File: src/backends/s3/buckets.ts:0-0
Timestamp: 2025-01-28T17:36:22.149Z
Learning: The routeQueryParamedRequest and headBucket functions in src/backends/s3/buckets.ts already implement retry logic for replica failover when the primary request fails, supporting both S3 and Swift replicas.

Learnt from: destifo
PR: #41
File: src/backends/swift/mod.ts:318-323
Timestamp: 2025-06-16T14:40:12.564Z
Learning: In conversion functions that return Result<Response, Error>, the Result represents whether the conversion process succeeded, not the HTTP semantics. A function like convertSwiftUploadPartToS3Response should return createOk(errorResponse) when it successfully converts to a proper S3 error response, and createErr(conversionError) only when the conversion process itself fails.

Learnt from: destifo
PR: #35
File: src/backends/swift/mod.ts:0-0
Timestamp: 2025-06-11T20:21:26.693Z
Learning: For the Swift backend's PUT Object path (convertSwiftPutObjectToS3Response), Swift can return only 404 (container missing), 411 (length required), and 422 (checksum mismatch). It will not surface a 408 timeout here, so mapping all handled error cases to HTTP 400 in the S3 response is correct.

Learnt from: destifo
PR: #49
File: src/backends/task_store.ts:170-228
Timestamp: 2025-06-26T12:48:25.255Z
Learning: In src/backends/task_store.ts, the TaskStore class creates Request objects with localhost URLs that are not actually used for HTTP requests. Instead, these Request objects serve as structured data containers that get passed to backend storage implementations (s3PutObject/swiftPutObject) which extract the necessary information (bucket, key, body, headers) and make the actual requests to the configured storage backends.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test-full
🔇 Additional comments (9)
chart/Chart.yaml (1)

4-4: LGTM!

The Helm chart version update to 0.7.0 aligns with the new upload part copy feature implementation.

chart/values.yaml (2)

24-24: Clarify the purpose of default_bucket configuration.

The new default_bucket configuration key is introduced but its purpose and usage are not documented. Please clarify how this configuration is used and consider adding a comment explaining its purpose.


40-40: LGTM!

The image tag update to v0.7.0 is consistent with the Helm chart version bump.

src/backends/s3/objects.ts (1)

11-20: LGTM!

The imports are properly organized and necessary for the new uploadPartCopy function.

tests/swift/basic/uploadpartcopy_test.ts (1)

106-106: LGTM!

The byte range calculation is correct. Copying "bytes=0-5242880" results in 5242881 bytes (inclusive range), which matches the expected content length assertion.

Also applies to: 166-166

src/backends/swift/objects.ts (4)

827-827: LGTM!

Good practice to explicitly set Content-Type headers for JSON operations. This ensures proper content negotiation with the Swift backend.

Also applies to: 1000-1000, 1321-1321, 1371-1371, 1624-1624


1039-1057: Excellent error handling improvements!

The enhanced validation of session file content type and JSON parsing with detailed error logging improves robustness and debuggability.


1456-1466: Good byte range parsing implementation.

The copy source range parsing correctly handles the byte range format and extracts start/end bytes for the copy operation.


1081-1089: Good addition of debug logging.

The debug logging for manifest operations will help with troubleshooting multipart upload issues in production.

Also applies to: 1123-1127, 1234-1234

Comment on lines +544 to +607
export async function uploadPartCopy(
reqCtx: RequestContext,
req: Request,
bucketConfig: Bucket,
): Promise<Result<Response, Error>> {
const logger = reqCtx.logger;
logger.info("[S3 backend] Proxying Upload Part Copy Request...");

const { bucket, objectKey, queryParams } = s3Utils.extractRequestInfo(req);
if (!bucket) {
return createOk(
new Response("Bucket information missing from the request", {
status: 400,
}),
);
}

const config = bucketConfig.config as S3Config;
const headers = new Headers(req.headers);
headers.set("Host", config.endpoint.replace(/^https?:\/\//, ""));

// Convert queryParams to URLSearchParams
const params = new URLSearchParams();
for (const [key, values] of Object.entries(queryParams)) {
for (const value of values) {
params.append(key, value);
}
}

const reqUrl =
`${config.endpoint}/${bucket}/${objectKey}?${params.toString()}`;

const fetchFunc = async () => {
return await fetch(reqUrl, {
method: "PUT",
headers: headers,
// Use the request body directly
body: req.body,
});
};

const response = await retryWithExponentialBackoff(
fetchFunc,
);

if (!isOk(response)) {
const errRes = unwrapErr(response);
logger.warn(
`Upload Part Copy Failed. Failed to connect with Object Storage: ${errRes.message}`,
);
return response;
}

const successResponse = unwrapOk(response) as Response;
if (successResponse.status !== 200) {
const errMessage = `Upload Part Copy Failed: ${successResponse.statusText}`;
logger.warn(errMessage);
reportToSentry(errMessage);
} else {
logger.info(`Upload Part Copy Successful: ${successResponse.statusText}`);
}

return createOk(successResponse);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Verify the request body handling for upload part copy.

The uploadPartCopy function forwards the request body directly, but for S3 UploadPartCopy operations, the body is typically empty or contains XML with copy instructions. The actual copy source is specified via the x-amz-copy-source header, which should be validated and forwarded.

Consider verifying:

  1. Whether the body should be forwarded or set to null/empty
  2. The presence and validation of required headers like x-amz-copy-source
 export async function uploadPartCopy(
   reqCtx: RequestContext,
   req: Request,
   bucketConfig: Bucket,
 ): Promise<Result<Response, Error>> {
   const logger = reqCtx.logger;
   logger.info("[S3 backend] Proxying Upload Part Copy Request...");
 
   const { bucket, objectKey, queryParams } = s3Utils.extractRequestInfo(req);
   if (!bucket) {
     return createOk(
       new Response("Bucket information missing from the request", {
         status: 400,
       }),
     );
   }
 
+  // Validate required headers for copy operation
+  const copySource = req.headers.get("x-amz-copy-source");
+  if (!copySource) {
+    return createOk(
+      new Response("x-amz-copy-source header is required", {
+        status: 400,
+      }),
+    );
+  }
+
   const config = bucketConfig.config as S3Config;
   const headers = new Headers(req.headers);
   headers.set("Host", config.endpoint.replace(/^https?:\/\//, ""));

Also, consider whether other 2xx status codes (like 201) should be accepted as success:

   const successResponse = unwrapOk(response) as Response;
-  if (successResponse.status !== 200) {
+  if (successResponse.status < 200 || successResponse.status >= 300) {
     const errMessage = `Upload Part Copy Failed: ${successResponse.statusText}`;
     logger.warn(errMessage);
     reportToSentry(errMessage);
   } else {
     logger.info(`Upload Part Copy Successful: ${successResponse.statusText}`);
   }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export async function uploadPartCopy(
reqCtx: RequestContext,
req: Request,
bucketConfig: Bucket,
): Promise<Result<Response, Error>> {
const logger = reqCtx.logger;
logger.info("[S3 backend] Proxying Upload Part Copy Request...");
const { bucket, objectKey, queryParams } = s3Utils.extractRequestInfo(req);
if (!bucket) {
return createOk(
new Response("Bucket information missing from the request", {
status: 400,
}),
);
}
const config = bucketConfig.config as S3Config;
const headers = new Headers(req.headers);
headers.set("Host", config.endpoint.replace(/^https?:\/\//, ""));
// Convert queryParams to URLSearchParams
const params = new URLSearchParams();
for (const [key, values] of Object.entries(queryParams)) {
for (const value of values) {
params.append(key, value);
}
}
const reqUrl =
`${config.endpoint}/${bucket}/${objectKey}?${params.toString()}`;
const fetchFunc = async () => {
return await fetch(reqUrl, {
method: "PUT",
headers: headers,
// Use the request body directly
body: req.body,
});
};
const response = await retryWithExponentialBackoff(
fetchFunc,
);
if (!isOk(response)) {
const errRes = unwrapErr(response);
logger.warn(
`Upload Part Copy Failed. Failed to connect with Object Storage: ${errRes.message}`,
);
return response;
}
const successResponse = unwrapOk(response) as Response;
if (successResponse.status !== 200) {
const errMessage = `Upload Part Copy Failed: ${successResponse.statusText}`;
logger.warn(errMessage);
reportToSentry(errMessage);
} else {
logger.info(`Upload Part Copy Successful: ${successResponse.statusText}`);
}
return createOk(successResponse);
}
export async function uploadPartCopy(
reqCtx: RequestContext,
req: Request,
bucketConfig: Bucket,
): Promise<Result<Response, Error>> {
const logger = reqCtx.logger;
logger.info("[S3 backend] Proxying Upload Part Copy Request...");
const { bucket, objectKey, queryParams } = s3Utils.extractRequestInfo(req);
if (!bucket) {
return createOk(
new Response("Bucket information missing from the request", {
status: 400,
}),
);
}
// Validate required headers for copy operation
const copySource = req.headers.get("x-amz-copy-source");
if (!copySource) {
return createOk(
new Response("x-amz-copy-source header is required", {
status: 400,
}),
);
}
const config = bucketConfig.config as S3Config;
const headers = new Headers(req.headers);
headers.set("Host", config.endpoint.replace(/^https?:\/\//, ""));
// Convert queryParams to URLSearchParams
const params = new URLSearchParams();
for (const [key, values] of Object.entries(queryParams)) {
for (const value of values) {
params.append(key, value);
}
}
const reqUrl =
`${config.endpoint}/${bucket}/${objectKey}?${params.toString()}`;
const fetchFunc = async () => {
return await fetch(reqUrl, {
method: "PUT",
headers: headers,
// Use the request body directly
body: req.body,
});
};
const response = await retryWithExponentialBackoff(
fetchFunc,
);
if (!isOk(response)) {
const errRes = unwrapErr(response);
logger.warn(
`Upload Part Copy Failed. Failed to connect with Object Storage: ${errRes.message}`,
);
return response;
}
const successResponse = unwrapOk(response) as Response;
if (successResponse.status < 200 || successResponse.status >= 300) {
const errMessage = `Upload Part Copy Failed: ${successResponse.statusText}`;
logger.warn(errMessage);
reportToSentry(errMessage);
} else {
logger.info(`Upload Part Copy Successful: ${successResponse.statusText}`);
}
return createOk(successResponse);
}
🤖 Prompt for AI Agents
In src/backends/s3/objects.ts around lines 544 to 607, the uploadPartCopy
function incorrectly forwards the request body directly, but for S3
UploadPartCopy, the body should typically be empty or contain specific XML, and
the copy source must be specified via the 'x-amz-copy-source' header. Update the
code to validate the presence of the 'x-amz-copy-source' header and forward it
properly. Also, set the request body to null or empty as appropriate instead of
forwarding the original body. Additionally, modify the success status check to
accept other 2xx status codes like 201 as successful responses.

Comment on lines 1413 to +1418
export function uploadPartCopy(
_reqCtx: RequestContext,
_req: Request,
_bucketConfig: Bucket,
): Result<Response, Error> {
return createOk(NotImplementedException());
reqCtx: RequestContext,
req: Request,
bucketConfig: Bucket,
): Promise<Result<Response, Error>> {
const logger = reqCtx.logger;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Make the uploadPartCopy function properly async.

The function signature is not marked as async but it returns a Promise and uses async operations internally via an IIFE.

-export function uploadPartCopy(
+export async function uploadPartCopy(
   reqCtx: RequestContext,
   req: Request,
   bucketConfig: Bucket,
 ): Promise<Result<Response, Error>> {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export function uploadPartCopy(
_reqCtx: RequestContext,
_req: Request,
_bucketConfig: Bucket,
): Result<Response, Error> {
return createOk(NotImplementedException());
reqCtx: RequestContext,
req: Request,
bucketConfig: Bucket,
): Promise<Result<Response, Error>> {
const logger = reqCtx.logger;
export async function uploadPartCopy(
reqCtx: RequestContext,
req: Request,
bucketConfig: Bucket,
): Promise<Result<Response, Error>> {
const logger = reqCtx.logger;
🤖 Prompt for AI Agents
In src/backends/swift/objects.ts around lines 1413 to 1418, the uploadPartCopy
function is not declared as async but returns a Promise and uses async
operations inside an immediately invoked function expression (IIFE). To fix
this, mark the uploadPartCopy function itself as async and remove the IIFE,
directly using await inside the function body to handle asynchronous operations
properly.

Comment on lines +1468 to 1685
return (async () => {
const config: SwiftConfig = bucketConfig.config as SwiftConfig;
const res = reqCtx.heraldContext.keystoneStore.getConfigAuthMeta(config);
const { storageUrl: swiftUrl, token: authToken } = res;
const headers = getSwiftRequestHeaders(authToken);

// Destination path for the part
const destUrl = `${swiftUrl}/${bucket}/${object}/${partNumber}`;

// Source object URL
const sourceUrl = `${swiftUrl}/${sourceBucket}/${sourceKey}`;

// First, fetch the source object (with range if specified)
if (copySourceRange) {
headers.set("Range", copySourceRange.replace("bytes=", "bytes="));
}

const fetchSourceFunc = async () => {
return await fetch(sourceUrl, {
method: "GET",
headers: headers,
});
};

const sourceResponse = await retryWithExponentialBackoff(fetchSourceFunc);

if (!isOk(sourceResponse)) {
const errRes = unwrapErr(sourceResponse);
logger.warn(
`Upload Part Copy Failed. Failed to fetch source object: ${errRes.message}`,
);
return sourceResponse;
}

const sourceSuccessResponse = unwrapOk(sourceResponse);
if (
sourceSuccessResponse.status !== 200 &&
sourceSuccessResponse.status !== 206
) {
const errMessage =
`Upload Part Copy Failed: Source object not found or inaccessible: ${sourceSuccessResponse.statusText}`;
logger.warn(errMessage);
reportToSentry(errMessage);
return createOk(InvalidRequestException(errMessage));
}

// Get the source content
const sourceContent = await sourceSuccessResponse.arrayBuffer();

// Put the content to the destination part
const putHeaders = getSwiftRequestHeaders(authToken);
const fetchFunc = async () => {
return await fetch(destUrl, {
method: "PUT",
headers: putHeaders,
body: sourceContent,
});
};

const response = await retryWithExponentialBackoff(fetchFunc);

if (!isOk(response)) {
const errRes = unwrapErr(response);
logger.warn(
`Upload Part Copy Failed. Failed to connect with Object Storage: ${errRes.message}`,
);
return response;
}

const successResponse = unwrapOk(response);
const lastModified = new Date().toISOString();
if (successResponse.status !== 201) {
const errMessage =
`Upload Part Copy Failed: ${successResponse.statusText}`;
logger.warn(errMessage);
reportToSentry(errMessage);
return createOk(InvalidRequestException(errMessage));
} else {
logger.info(`Upload Part Copy Successful: ${successResponse.statusText}`);

// Update the multipart upload session file
const multipartSessionPath = `${MULTIPART_UPLOADS_PATH}/${uploadId}.json`;
const multipartSessionUrl =
`${swiftUrl}/${bucket}/${multipartSessionPath}`;

try {
const headers = getSwiftRequestHeaders(authToken);
// Fetch the current session JSON
const getSessionFunc = async () => {
return await fetch(multipartSessionUrl, {
method: "GET",
headers: headers,
});
};

const getSessionResponse = await retryWithExponentialBackoff(
getSessionFunc,
);

if (
!isOk(getSessionResponse) ||
unwrapOk(getSessionResponse).status === 404
) {
logger.error(
`Multipart upload session file not found for uploadId ${uploadId} at ${multipartSessionPath}`,
);
return createOk(NoSuchUploadException());
}

const sessionResponse = unwrapOk(getSessionResponse);

// Check content type to ensure we're getting JSON
const contentType = sessionResponse.headers.get("Content-Type");
if (!contentType || !contentType.includes("application/json")) {
logger.error(
`Invalid content type for multipart session file: ${contentType}. Expected application/json for uploadId ${uploadId}`,
);
// Continue with response generation without updating session
} else {
try {
const sessionJson = await sessionResponse.json();

// Prepare part metadata
const eTag = successResponse.headers.get("etag") ||
successResponse.headers.get("ETag") || "";

// Get content length from the source response
const size = parseInt(
sourceSuccessResponse.headers.get("Content-Length") || "0",
10,
) ||
(endByte !== undefined
? (endByte - startByte + 1)
: sourceContent.byteLength);

const partMeta = {
partNumber: Array.isArray(partNumber)
? partNumber[0]
: partNumber,
eTag,
size,
lastModified,
};

// Update or create the parts array
if (!Array.isArray(sessionJson.parts)) sessionJson.parts = [];

// Remove any existing entry for this partNumber
sessionJson.parts = sessionJson.parts.filter(
(p: { partNumber: string }) =>
p.partNumber !== partMeta.partNumber,
);

sessionJson.parts.push(partMeta);

// Save the updated session JSON
headers.set("Content-Type", "application/json");
const putSessionFunc = async () => {
return await fetch(multipartSessionUrl, {
method: "PUT",
headers: headers,
body: JSON.stringify(sessionJson),
});
};

const putSessionResponse = await retryWithExponentialBackoff(
putSessionFunc,
);

if (!isOk(putSessionResponse) || !unwrapOk(putSessionResponse).ok) {
logger.error(
`Failed to update multipart upload session file for uploadId ${uploadId} at ${multipartSessionPath}`,
);
} else {
logger.info(
`Updated multipart upload session file for uploadId ${uploadId} with part ${partMeta.partNumber}`,
);
}
} catch (error) {
logger.error(
`Error parsing multipart upload session file for uploadId ${uploadId}: ${
error instanceof Error ? error.message : String(error)
}`,
);
// Continue with response generation without updating session
}
}
} catch (error) {
logger.error(
`Error updating multipart upload session file for uploadId ${uploadId}: ${
error instanceof Error ? error.message : String(error)
}`,
);
}
}

// Generate S3 CopyPartResult XML response
const eTag = successResponse.headers.get("etag") ||
successResponse.headers.get("ETag") || "";

const xmlResponse = `<?xml version="1.0" encoding="UTF-8"?>
<CopyPartResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<LastModified>${lastModified}</LastModified>
<ETag>${eTag}</ETag>
</CopyPartResult>`;

return createOk(
new Response(xmlResponse, {
status: 200,
headers: {
"Content-Type": "application/xml",
"x-amz-request-id": getRandomUUID(),
"x-amz-id-2": getRandomUUID(),
},
}),
);
})();
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider refactoring the IIFE pattern in uploadPartCopy.

The function uses an immediately invoked async function expression (IIFE) pattern which makes the code harder to read and maintain. Since the function should be async, the IIFE is unnecessary.

Remove the IIFE wrapper and make the function body directly async:

-  return (async () => {
     const config: SwiftConfig = bucketConfig.config as SwiftConfig;
     // ... rest of the implementation
     return createOk(
       new Response(xmlResponse, {
         status: 200,
         headers: {
           "Content-Type": "application/xml",
           "x-amz-request-id": getRandomUUID(),
           "x-amz-id-2": getRandomUUID(),
         },
       }),
     );
-  })();

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/backends/swift/objects.ts between lines 1468 and 1685, the uploadPartCopy
function uses an unnecessary immediately invoked async function expression
(IIFE) which complicates readability and maintenance. Refactor by removing the
IIFE wrapper and instead declare the function itself as async, moving the IIFE's
inner code directly into the function body to simplify the async flow.

Comment on lines +103 to +104
"Authorization":
`AWS ${credentials.accessKeyId}:${credentials.secretAccessKey}`,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix the authorization header format.

The authorization header is incorrectly formatted. AWS S3 doesn't accept simple AWS accessKeyId:secretAccessKey format. This should use proper AWS Signature Version 4 or at least AWS Signature Version 2 format.

Consider using a proper AWS signature library or the SDK's request signing capabilities:

-        "Authorization":
-          `AWS ${credentials.accessKeyId}:${credentials.secretAccessKey}`,
+        // TODO: Use proper AWS signature
+        // For now, this might only work with non-authenticating test backends

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In tests/swift/basic/uploadpartcopy_test.ts around lines 103 to 104, the
Authorization header is incorrectly formatted as a simple concatenation of
accessKeyId and secretAccessKey, which AWS S3 does not accept. Replace this
manual header with a properly signed Authorization header using AWS Signature
Version 4 or Version 2 by integrating an AWS SDK signing method or a dedicated
AWS signature library to generate the correct header dynamically.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (6)
tests/swift/basic/uploadpartcopy_checksum_test.ts (6)

13-13: Remove commented out import.

Remove the commented out logging middleware import as it's not being used.

-// import { loggingMiddleware } from "../../utils/mod.ts";

42-44: Remove commented out middleware code.

Remove the commented out middleware stack addition as it's not being used.

-// s3.middlewareStack.add(loggingMiddleware, {
-//   step: "finalizeRequest",
-// });

24-24: Consider reducing test file size for faster execution.

The 300MB test file might significantly slow down the test suite. Consider whether a smaller file (e.g., 50-100MB) would still adequately test the multipart copy functionality while improving test performance.

-const SOURCE_SIZE = 300 * MB;
+const SOURCE_SIZE = 50 * MB; // Reduced for faster test execution

56-66: Extract credential handling to avoid duplication.

The credential handling logic is duplicated between the S3 client initialization (lines 31-36) and this helper function. Consider extracting it to a single location.

+// Move this before S3 client initialization
+function getCredentials() {
+  return "accessKeyId" in containerConfig.credentials
+    ? {
+      accessKeyId: containerConfig.credentials.accessKeyId,
+      secretAccessKey: containerConfig.credentials.secretAccessKey,
+    }
+    : {
+      accessKeyId: containerConfig.credentials.username,
+      secretAccessKey: containerConfig.credentials.password,
+    };
+}
+
 const s3 = new S3Client({
-  credentials: "accessKeyId" in containerConfig.credentials
-    ? containerConfig.credentials
-    : {
-      accessKeyId: containerConfig.credentials.username,
-      secretAccessKey: containerConfig.credentials.password,
-    },
+  credentials: getCredentials(),
   region: containerConfig.region,
   forcePathStyle: true,
   endpoint: proxyUrl,
 });

-// Helper function to get credentials in a type-safe way
-function getCredentials() {
-  return "accessKeyId" in containerConfig.credentials
-    ? {
-      accessKeyId: containerConfig.credentials.accessKeyId,
-      secretAccessKey: containerConfig.credentials.secretAccessKey,
-    }
-    : {
-      accessKeyId: containerConfig.credentials.username,
-      secretAccessKey: containerConfig.credentials.password,
-    };
-}

197-197: Remove commented out debug code.

Remove the commented out setTimeout debug line.

-        // await new Promise((resolve) => setTimeout(resolve, 50000));

303-305: Consider removing or reducing the arbitrary delay.

The 3-second delay might not be necessary or could be reduced. If object assembly is eventually consistent, consider using a retry mechanism with exponential backoff instead of a fixed delay.

-  await t.step("Wait for object assembly", async () => {
-    console.log("Waiting for object assembly...");
-    await new Promise((resolve) => setTimeout(resolve, 3000));
-  });
+  // Remove this step if object assembly is synchronous, or implement retry logic in the verification step
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7e17889 and d7dc46a.

📒 Files selected for processing (1)
  • tests/swift/basic/uploadpartcopy_checksum_test.ts (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: destifo
PR: expnt/herald#54
File: src/backends/swift/objects.ts:0-0
Timestamp: 2025-07-04T07:25:29.056Z
Learning: In src/backends/swift/objects.ts, the multipart upload system was redesigned to use individual per-upload session files (stored as <uploadId>.json) instead of a centralized index.json file. This design choice was made intentionally to avoid conflicts and race conditions that occur when multiple operations try to update the same index file simultaneously. The N+1 HTTP requests approach is preferred over the single index file approach for data consistency, even though it has performance implications.
Learnt from: destifo
PR: expnt/herald#10
File: src/backends/s3/buckets.ts:0-0
Timestamp: 2025-01-28T17:36:22.149Z
Learning: The `routeQueryParamedRequest` and `headBucket` functions in `src/backends/s3/buckets.ts` already implement retry logic for replica failover when the primary request fails, supporting both S3 and Swift replicas.
tests/swift/basic/uploadpartcopy_checksum_test.ts (4)

Learnt from: destifo
PR: #54
File: src/backends/swift/objects.ts:0-0
Timestamp: 2025-07-04T07:25:29.056Z
Learning: In src/backends/swift/objects.ts, the multipart upload system was redesigned to use individual per-upload session files (stored as .json) instead of a centralized index.json file. This design choice was made intentionally to avoid conflicts and race conditions that occur when multiple operations try to update the same index file simultaneously. The N+1 HTTP requests approach is preferred over the single index file approach for data consistency, even though it has performance implications.

Learnt from: destifo
PR: #49
File: tests/_others/_task_queue_test.ts:7-12
Timestamp: 2025-06-26T12:47:53.021Z
Learning: The test in tests/_others/_task_queue_test.ts is specifically designed to test the task store functionality with an S3 backend. The direct type assertion globalConfig.task_store_backend as S3Config is safe because the test is intentionally S3-specific and uses S3 APIs throughout.

Learnt from: destifo
PR: #10
File: src/backends/mirror.ts:170-172
Timestamp: 2025-01-28T17:41:43.238Z
Learning: In the mirror operations (src/backends/mirror.ts), bucket existence is guaranteed by design as mirror tasks are only created for buckets that exist in the bucket store, which is initialized during startup. Therefore, non-null assertions when finding buckets in the store are safe.

Learnt from: destifo
PR: #10
File: src/backends/s3/buckets.ts:0-0
Timestamp: 2025-01-28T17:36:22.149Z
Learning: The routeQueryParamedRequest and headBucket functions in src/backends/s3/buckets.ts already implement retry logic for replica failover when the primary request fails, supporting both S3 and Swift replicas.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test-full
🔇 Additional comments (1)
tests/swift/basic/uploadpartcopy_checksum_test.ts (1)

183-189: Incorrect AWS signature implementation.

The authorization header is using an incorrect and insecure format. AWS signatures require proper HMAC-based calculation, not simple concatenation of credentials.

For testing purposes, if the proxy accepts this simplified format, add a comment explaining this is test-only. For production-like testing, implement proper AWS Signature V4:

// Add comment if this is intentional for testing
const headers = {
  // WARNING: This is a test-only simplified auth format, not AWS-compliant
  "Authorization":
    `AWS ${credentials.accessKeyId}:${credentials.secretAccessKey}`,
  "x-amz-copy-source": `${containerName}/${sourceObjectKey}`,
  "x-amz-copy-source-range": `bytes=${startByte}-${endByte}`,
  "x-amz-date": date,
};

Consider using the AWS SDK's built-in request signing or implementing proper AWS Signature V4 if testing against real S3-compatible services.

⛔ Skipped due to learnings
Learnt from: destifo
PR: expnt/herald#41
File: src/backends/mirror.ts:166-174
Timestamp: 2025-06-16T14:42:52.174Z
Learning: In src/backends/mirror.ts, when creating replica requests that reuse originalRequest.headers, the authentication headers (Authorization, x-amz-date, etc.) are automatically flushed/replaced down the request pipeline by functions like s3.putObject, so there's no signature mismatch issue with forwarding the original headers.
Learnt from: destifo
PR: expnt/herald#49
File: tests/_others/_task_queue_test.ts:7-12
Timestamp: 2025-06-26T12:47:53.021Z
Learning: The test in `tests/_others/_task_queue_test.ts` is specifically designed to test the task store functionality with an S3 backend. The direct type assertion `globalConfig.task_store_backend as S3Config` is safe because the test is intentionally S3-specific and uses S3 APIs throughout.
Learnt from: Yohe-Am
PR: expnt/herald#37
File: src/utils/signer.ts:140-142
Timestamp: 2025-05-05T19:02:11.966Z
Learning: In pre-signed S3 URL requests, parameters like X-Amz-Date are parsed from query params (not headers) and are validated using Zod schema.
Learnt from: destifo
PR: expnt/herald#10
File: src/backends/s3/buckets.ts:0-0
Timestamp: 2025-01-28T17:36:22.149Z
Learning: The `routeQueryParamedRequest` and `headBucket` functions in `src/backends/s3/buckets.ts` already implement retry logic for replica failover when the primary request fails, supporting both S3 and Swift replicas.
Learnt from: destifo
PR: expnt/herald#54
File: src/backends/swift/objects.ts:0-0
Timestamp: 2025-07-04T07:25:29.056Z
Learning: In src/backends/swift/objects.ts, the multipart upload system was redesigned to use individual per-upload session files (stored as <uploadId>.json) instead of a centralized index.json file. This design choice was made intentionally to avoid conflicts and race conditions that occur when multiple operations try to update the same index file simultaneously. The N+1 HTTP requests approach is preferred over the single index file approach for data consistency, even though it has performance implications.
Learnt from: destifo
PR: expnt/herald#20
File: src/backends/swift/objects.ts:189-189
Timestamp: 2025-04-23T09:09:42.386Z
Learning: In the herald project, write operations (putObject, deleteObject, copyObject, etc.) and read operations (getObject, listObjects, etc.) follow different patterns:
1. Write operations use `const response` without replica retry logic, often using mirrorOperation for eventual consistency
2. Read operations use `let response` with replica failover logic when primary fails


// Create a source object to copy from (simulating the original Docker layer)
await t.step("Create Source Object (Docker Layer)", async () => {
const tempFile = await createTempFile(SOURCE_SIZE / MB); // 30MB file

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix incorrect comment about file size.

The comment says "30MB file" but the actual size is 300MB based on SOURCE_SIZE.

-    const tempFile = await createTempFile(SOURCE_SIZE / MB); // 30MB file
+    const tempFile = await createTempFile(SOURCE_SIZE / MB); // 300MB file
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const tempFile = await createTempFile(SOURCE_SIZE / MB); // 30MB file
const tempFile = await createTempFile(SOURCE_SIZE / MB); // 300MB file
🤖 Prompt for AI Agents
In tests/swift/basic/uploadpartcopy_checksum_test.ts at line 96, the comment
incorrectly states the file size as "30MB file" while the actual size is 300MB
based on SOURCE_SIZE. Update the comment to accurately reflect the file size as
"300MB file" to avoid confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant