FEAT: Add CATERPILLAR_FILE_PATH_WRITE context key (DATA-8693) by snehalahire-pattern · Pull Request #77 · patterninc/caterpillar

snehalahire-pattern · 2026-06-23T16:48:05Z

Summary

Adds a new record context key CATERPILLAR_FILE_PATH_WRITE populated by the file task's read mode, alongside the existing CATERPILLAR_FILE_NAME_WRITE (which is unchanged for backward compatibility).
New textutil.SlugifyFilePath helper slugifies each path segment individually, so the / hierarchy is preserved. URL scheme prefixes such as s3://bucket/ are stripped, and the final segment keeps its extension, e.g. s3://my-bucket/ReportType=A/Folder 1/data.CSV → reporttype_a/folder_1/data.csv.
Lets destination paths encode the full source hierarchy (.../ds={{ ds }}/{{ context "CATERPILLAR_FILE_PATH_WRITE" }}), avoiding same-name collisions when reading nested directories with a recursive glob (e.g. reportType=X/**/**.tsv).
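A minimal Go sketch of the per-segment slugification described above. It is reconstructed from the examples in this PR, not copied from the actual textutil source. The lowercase function names are hypothetical stand-ins for textutil.SlugifyFileName, textutil.SlugifyFilePath and the stripURLScheme helper. It also makes three assumptions. Each non-alphanumeric rune becomes one underscore, with runs not collapsed. For URLs, the scheme and the first element (bucket or host) are dropped. Empty segments from leading or doubled slashes are skipped.

```go
package main

import (
	"fmt"
	"path"
	"strings"
	"unicode"
)

// slugifySegment lowercases s and replaces every rune that is not a letter
// or digit with '_'. Assumption: consecutive underscores are not collapsed.
func slugifySegment(s string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		if unicode.IsLetter(r) || unicode.IsDigit(r) {
			b.WriteRune(r)
		} else {
			b.WriteRune('_')
		}
	}
	return b.String()
}

// slugifyFileName slugifies the stem and keeps the extension, lowercased.
func slugifyFileName(name string) string {
	ext := path.Ext(name)
	return slugifySegment(strings.TrimSuffix(name, ext)) + strings.ToLower(ext)
}

// stripURLScheme drops a "scheme://host/" prefix such as "s3://bucket/".
// Paths without "://" are returned unchanged.
func stripURLScheme(p string) string {
	i := strings.Index(p, "://")
	if i < 0 {
		return p
	}
	rest := p[i+len("://"):]
	if j := strings.Index(rest, "/"); j >= 0 {
		return rest[j+1:]
	}
	return "" // only a scheme and host, no object path
}

// slugifyFilePath slugifies each segment separately and rejoins them with
// "/", so the directory hierarchy survives. Only the last segment keeps an
// extension.
func slugifyFilePath(p string) string {
	parts := strings.Split(stripURLScheme(p), "/")
	out := make([]string, 0, len(parts))
	for i, seg := range parts {
		if seg == "" {
			continue // skip empty segments from leading or doubled slashes
		}
		if i == len(parts)-1 {
			out = append(out, slugifyFileName(seg))
		} else {
			out = append(out, slugifySegment(seg))
		}
	}
	return strings.Join(out, "/")
}

func main() {
	fmt.Println(slugifyFileName("Report 1.CSV"))
	fmt.Println(slugifyFilePath("s3://my-bucket/ReportType=A/Folder 1/data.CSV"))
}
```

Running it prints report_1.csv and reporttype_a/folder_1/data.csv, matching the examples in the summary. In this sketch only the last segment goes through the extension-preserving branch, so a directory named v1.2 becomes v1_2.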

ClickUp

Note on overlapping work

A separate branch feat/file-path-write-context (commit ac18420) takes a different approach to the same ticket: it slugifies the full path into a single flat segment (collapsing / to _) and also wires the new key into the archive/sftp tasks. This PR preserves the directory hierarchy literally, per the ticket's "Path information preserves folder hierarchy" criterion, and is scoped to the file task. Reviewers should pick one approach before merging.

Test plan

go build ./... clean (verified locally)
go test ./... green (verified locally)
Run a pipeline reading reportType=X/**/**.tsv and writing to .../{{ context "CATERPILLAR_FILE_PATH_WRITE" }}; confirm files from different subdirectories no longer overwrite each other in the destination.
Confirm existing pipelines using CATERPILLAR_FILE_NAME_WRITE continue to behave identically.
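The collision check in this test plan can be illustrated offline with a small Go sketch. It renders a destination template for two source files that share a base name. It assumes destination paths are Go text/template strings, with a context function that looks up record context values. The out/ prefix, the ds value and the context values (slugified by hand) are all hypothetical. The real prefix is elided above.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Hypothetical context values for reportType=X/a/data.tsv and
// reportType=X/b/data.tsv, slugified by hand.
var records = []map[string]string{
	{"CATERPILLAR_FILE_NAME_WRITE": "data.tsv", "CATERPILLAR_FILE_PATH_WRITE": "reporttype_x/a/data.tsv"},
	{"CATERPILLAR_FILE_NAME_WRITE": "data.tsv", "CATERPILLAR_FILE_PATH_WRITE": "reporttype_x/b/data.tsv"},
}

// render executes tmpl with a "context" function backed by ctx and a fixed
// "ds" value. Both functions are stand-ins for the pipeline's template helpers.
func render(tmpl string, ctx map[string]string) (string, error) {
	t, err := template.New("dest").Funcs(template.FuncMap{
		"context": func(key string) string { return ctx[key] },
		"ds":      func() string { return "2026-06-23" },
	}).Parse(tmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	if err := t.Execute(&b, nil); err != nil {
		return "", err
	}
	return b.String(), nil
}

// distinctDestinations renders a destination for every record using the
// given context key and returns the set of resulting paths.
func distinctDestinations(key string) (map[string]bool, error) {
	tmpl := `out/ds={{ ds }}/{{ context "` + key + `" }}`
	seen := map[string]bool{}
	for _, rc := range records {
		dest, err := render(tmpl, rc)
		if err != nil {
			return nil, err
		}
		seen[dest] = true
	}
	return seen, nil
}

func main() {
	for _, key := range []string{"CATERPILLAR_FILE_NAME_WRITE", "CATERPILLAR_FILE_PATH_WRITE"} {
		seen, err := distinctDestinations(key)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %d distinct destination(s) for %d files\n", key, len(seen), len(records))
	}
}
```

With the name key both files render to out/ds=2026-06-23/data.tsv, so the second write would overwrite the first. With the path key they land in separate subdirectories.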

🤖 Generated with Claude Code

Expose the sanitized full source path as a record context value on the file task's read mode, alongside the existing CATERPILLAR_FILE_NAME_WRITE (base name only). Path segments are slugified individually with "/" preserved between them, so the directory hierarchy survives and can be used in destination paths to avoid same-name collisions when reading nested folders with a recursive glob (e.g. reportType=X/**/**.tsv).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Mahesh Kamble (ma-gk) · 2026-06-24T12:04:20Z

@@ -146,6 +146,7 @@ func (f *file) readFile(output chan<- *record.Record) error {
 		// Create a default record with context
 		rc := &record.Record{Context: ctx}
 		rc.SetContextValue(string(task.CtxKeyFileNameWrite), textutil.SlugifyFileName(filepath.Base(path)))


Can we create a single variable for the value generated by textutil.SlugifyFileName and reuse it wherever needed? This would help avoid code duplication and improve maintainability.

Mahesh Kamble (@ma-gk) SlugifyFileName is only called once; the other call is SlugifyFilePath.
Did you mean that, or something else?


prasadlohakpure · 2026-06-24T15:21:08Z

+In read mode, two values are stored in each record's context:
+
+- `CATERPILLAR_FILE_NAME_WRITE` — the sanitized base filename. The stem is lowercased with non-alphanumeric characters replaced by underscores, while the extension is preserved and lowercased (e.g. `"Report 1.CSV"` → `"report_1.csv"`).
+- `CATERPILLAR_FILE_PATH_WRITE` — the sanitized full source path with directory hierarchy preserved. Each segment is slugified the same way; the final segment keeps its extension; URL schemes such as `s3://bucket/` are stripped (e.g. `s3://my-bucket/ReportType=A/Folder 1/data.CSV` → `reporttype_a/folder_1/data.csv`). Reference it in the destination of a downstream write task to avoid collisions when reading nested directories with a recursive glob.


Do we have a use case or example where we would leverage this slugified file path?

snehalahire-pattern requested a review from a team as a code owner June 23, 2026 16:48

refactored code

de7105b

snehalahire-pattern requested review from Mayuresh Pawar (Mayureshpawar29) and Mahesh Kamble (ma-gk) June 24, 2026 12:01

Mahesh Kamble (ma-gk) reviewed Jun 24, 2026

Merge branch 'main' into snehal/DATA-8693-file-path-context

6ca62b0

Mayuresh Pawar (Mayureshpawar29) reviewed Jun 24, 2026

Comment thread on internal/pkg/textutil/slugify.go (outdated)

REFACTOR: Address PR review feedback

cc751c2

- Extract URL scheme stripping in SlugifyFilePath into a stripURLScheme helper.
- Hoist the slugified file name into a local variable in the file task.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

prasadlohakpure reviewed Jun 24, 2026

Mayuresh Pawar (Mayureshpawar29) approved these changes Jun 24, 2026

snehalahire-pattern wants to merge 4 commits into main from snehal/DATA-8693-file-path-context

4 participants