feat(catalog): hadoop table and namespace CRUD operations #969
tanmayrauth wants to merge 7 commits into apache:main
Conversation
Implement CreateNamespace, DropNamespace, CheckNamespaceExists, ListNamespaces, LoadNamespaceProperties, and UpdateNamespaceProperties (unsupported, matching Java).

Relates to apache#798
Depends-on: apache#953 (scaffold)
Depended-on-by: PR 4 (table CRUD), PR 5 (list/drop/rename)
Set up Docker and Spark infrastructure for Hadoop catalog cross-compatibility testing with Java's HadoopCatalog.

- Add hadoop_validation.py: SparkSession configured with spark.sql.catalog.hadoop_test (type=hadoop, warehouse=/home/iceberg/hadoop-warehouse)
- Add shared volume mount in docker-compose.yml: /tmp/iceberg-hadoop-warehouse (host) <-> /home/iceberg/hadoop-warehouse (Spark)
- Copy hadoop_validation.py into the Spark container via the Dockerfile
- Add a make integration-hadoop target

No Go code; purely infrastructure so subsequent PRs can add integration test cases that validate Go ↔ Spark interop.

Depends-on: nothing (parallel with PR 1)
Depended-on-by: PRs 4, 5, 6 (integration test cases)
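For reference, the shared volume mount described above would take roughly this shape in docker-compose.yml (the `spark-iceberg` service name is an assumption; only the host and container paths come from the PR description):

```yaml
services:
  spark-iceberg:   # service name assumed, not stated in the PR
    volumes:
      # host path <-> path the Spark-side Hadoop catalog uses as warehouse
      - /tmp/iceberg-hadoop-warehouse:/home/iceberg/hadoop-warehouse
```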
…p catalog

Implement the three core table operations:

- CreateTable: validates namespace exists, rejects custom locations, writes v1.metadata.json via temp-file+rename, updates version hint
- LoadTable: uses findVersion with three-tier fallback, delegates to table.NewFromLocation for metadata parsing
- CheckTableExists: delegates to isTableDir

Relates to apache#798
Depends-on: PR 2 (version-hint), PR 3 (namespace-ops)
Depended-on-by: PR 6 (CommitTable)
Add cross-compatibility integration tests verifying CreateTable, LoadTable, and CheckTableExists work between Go and Spark Hadoop catalogs. Pre-create the hadoop-warehouse directory before Docker compose to ensure runner ownership in CI.
@laskoviymishka @zeroshade can you please review this PR?
info, err := os.Stat(nsPath)
if os.IsNotExist(err) || (err == nil && !info.IsDir()) {
	return nil, fmt.Errorf("%w: %s", catalog.ErrNoSuchNamespace, strings.Join(ns, "."))
}
shouldn't this support customizable file systems beyond just local? i.e. shouldn't this use the io package?
This is intentionally local-only for now to match the scoped plan (local parity with Spark's Java HadoopCatalog first). The io.IO interface doesn't currently have Stat or MkdirAll equivalents needed for directory-based namespace operations, so switching to it would require extending the interface. I'll open a follow-up issue to add something like StatableIO and refactor to use icebergio.IO throughout for HDFS/cloud support.
Fair enough. Let's continue to use pkg.go.dev/io/fs as inspiration for any changes we make to the IO package.
zeroshade left a comment
LGTM, just update the docstrings for NewCatalog/Catalog to specify that this only supports the local filesystem for now
Update Catalog and NewCatalog docstrings to note that only local filesystem paths are currently supported.
Updated the docstring.
looks good, just need to resolve the conflicts!
4: CreateTable + LoadTable + CheckTableExists
Implement the three core table operations:

- CreateTable: validates the namespace exists, rejects custom locations, builds metadata via table.NewMetadata, writes v1.metadata.json through a temp-file-plus-rename pattern, and does a best-effort version-hint write.
- LoadTable: calls findVersion to get the current version, builds the metadata path, and delegates to table.NewFromLocation.
- CheckTableExists: delegates to isTableDir.

Tests cover: create-and-load round-trip, create with partition spec / sort order / properties, reject custom location, create in a non-existent namespace, create duplicate, load non-existent, load with stale hint, and check-exists true/false.
Depends on #968, #963
Relates to #798