feat(catalog): hadoop table and namespace CRUD operations#969

Open
tanmayrauth wants to merge 7 commits into apache:main from tanmayrauth:feat/hadoop-table-crud

Conversation

Contributor

@tanmayrauth tanmayrauth commented May 1, 2026

4: CreateTable + LoadTable + CheckTableExists
Implement the three core table operations:

- CreateTable validates that the namespace exists, rejects custom locations, builds metadata via table.NewMetadata, writes v1.metadata.json through a temp-file-plus-rename pattern, and does a best-effort version-hint write.
- LoadTable calls findVersion to get the current version, builds the metadata path, and delegates to table.NewFromLocation.
- CheckTableExists delegates to isTableDir.

Tests cover: create-and-load round-trip, create with partition spec / sort order / properties, reject custom location, create in a non-existent namespace, create duplicate, load non-existent, load with stale hint, and check-exists true/false.

Depends on #968 #963
Relates to #798

Implement CreateNamespace, DropNamespace, CheckNamespaceExists,
ListNamespaces, LoadNamespaceProperties, and UpdateNamespaceProperties
(unsupported, matching Java).

Relates to apache#798

Depends-on: apache#953 (scaffold)
Depended-on-by: PR 4 (table CRUD), PR 5 (list/drop/rename)
Set up Docker and Spark infrastructure for Hadoop catalog
cross-compatibility testing with Java's HadoopCatalog.

- Add hadoop_validation.py: SparkSession configured with
  spark.sql.catalog.hadoop_test (type=hadoop, warehouse=/home/iceberg/hadoop-warehouse)
- Add shared volume mount in docker-compose.yml:
  /tmp/iceberg-hadoop-warehouse (host) <-> /home/iceberg/hadoop-warehouse (Spark)
- Copy hadoop_validation.py into Spark container via Dockerfile
- Add make integration-hadoop target

No Go code — purely infrastructure so subsequent PRs can add
integration test cases that validate Go ↔ Spark interop.

Depends-on: nothing (parallel with PR 1)
Depended-on-by: PRs 4, 5, 6 (integration test cases)
…p catalog

Implement the three core table operations:

- CreateTable: validates namespace exists, rejects custom locations,
  writes v1.metadata.json via temp-file+rename, updates version hint
- LoadTable: uses findVersion with three-tier fallback, delegates to
  table.NewFromLocation for metadata parsing
- CheckTableExists: delegates to isTableDir

Relates to apache#798

Depends-on: PR 2 (version-hint), PR 3 (namespace-ops)
Depended-on-by: PR 6 (CommitTable)
@tanmayrauth tanmayrauth requested a review from zeroshade as a code owner May 1, 2026 22:08
Add cross-compatibility integration tests verifying CreateTable,
LoadTable, and CheckTableExists work between Go and Spark Hadoop
catalogs. Pre-create the hadoop-warehouse directory before Docker
compose to ensure runner ownership in CI.
@tanmayrauth tanmayrauth force-pushed the feat/hadoop-table-crud branch from 3305012 to fb3dd76 on May 2, 2026 01:52
@tanmayrauth
Contributor Author

@laskoviymishka @zeroshade can you please review this PR?

Comment thread catalog/hadoop/hadoop.go
Comment on lines +253 to +256
info, err := os.Stat(nsPath)
if os.IsNotExist(err) || (err == nil && !info.IsDir()) {
	return nil, fmt.Errorf("%w: %s", catalog.ErrNoSuchNamespace, strings.Join(ns, "."))
}
Member

shouldn't this support customizable file systems beyond just local? i.e. shouldn't this use the io package?

Contributor Author

This is intentionally local-only for now to match the scoped plan (local parity with Spark's Java HadoopCatalog first). The io.IO interface doesn't currently have Stat or MkdirAll equivalents needed for directory-based namespace operations, so switching to it would require extending the interface. I'll open a follow-up issue to add something like StatableIO and refactor to use icebergio.IO throughout for HDFS/cloud support.

Member

Fair enough. Let's continue to use pkg.go.dev/io/fs as inspiration for any changes we make to the IO package.

Member

@zeroshade zeroshade left a comment

LGTM just update the docstrings for NewCatalog/Catalog to specify that this only supports local filesystem for now

Update Catalog and NewCatalog docstrings to note that only local
filesystem paths are currently supported.
@tanmayrauth
Contributor Author
Contributor Author

Updated the docstring.

@zeroshade
Member
Member

looks good, just need to resolve the conflicts!
