Add broker FileSystemPackagesStorage support for Functions on Oxia #697

Merged: lhotari merged 10 commits into apache:master from lhotari:lh-filesystem-package-storage-support on Jun 25, 2026


lhotari (Member) commented on Jun 25, 2026

Motivation

The Pulsar Packages Management Service — which stores uploaded function packages
(pulsar-admin functions create --jar ...) — runs on the broker. Its default storage provider,
BookKeeperPackagesStorage, relies on DistributedLog metadata in ZooKeeper, so it does not work
when Oxia is used as the metadata store (components.oxia: true).
As a result, Pulsar Functions cannot store uploaded packages on an Oxia-backed cluster.

Modifications

Add broker.packageManagement to host the Packages Management Service on the broker with
FileSystemPackagesStorage, which works without ZooKeeper. It is configured at two levels:

  • broker.packageManagement.enabled enables the service on the broker (enablePackagesManagement).
  • broker.packageManagement.fileSystemStorage.enabled selects FileSystemPackagesStorageProvider and sets
    STORAGE_PATH.
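As a minimal sketch, a Helm values override that enables both levels might look like the following. The nesting is assumed from the dotted key names above, and the remaining Oxia settings (see examples/values-oxia.yaml) are omitted.

```yaml
# Sketch of a values override; nesting assumed from the dotted key names.
components:
  oxia: true        # Oxia as the metadata store (other Oxia settings omitted)
  functions: true   # Pulsar Functions, which need package storage

broker:
  packageManagement:
    # Level 1: run the Packages Management Service on the broker
    # (sets enablePackagesManagement).
    enabled: true
    fileSystemStorage:
      # Level 2: select FileSystemPackagesStorageProvider and set STORAGE_PATH.
      enabled: true
```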

The package storage directory is mounted from a shared PersistentVolumeClaim added under the broker
pod spec, so every broker replica sees the same packages:

  • Single broker / single-node dev clusters (e.g. minikube): the default ReadWriteOnce PVC on the
    cluster's default StorageClass is enough — no extra configuration.
  • Multiple broker replicas: the volume must be a ReadWriteMany shared filesystem.
    broker.packageManagement.fileSystemStorage.{storageClass,persistentVolume,persistentVolumeClaim} each render
    raw YAML (only apiVersion/kind are fixed by the chart; {} creates nothing), or you can reference a
    pre-created PVC via claimName. The README documents the GKE Filestore, EKS EFS and AKS Azure Files
    CSI options.
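For the multi-replica case, here is a sketch of the claimName route: create a ReadWriteMany PVC yourself and point the chart at it. The claim name `pulsar-packages-rwx`, the namespace `pulsar`, the StorageClass `my-rwx-storage-class` and the 10Gi size are hypothetical placeholders, and the position of claimName under fileSystemStorage is assumed from the key names in this PR.

```yaml
# File 1 (kubectl apply): hypothetical pre-created shared claim.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pulsar-packages-rwx              # hypothetical name
  namespace: pulsar                      # must match the broker's namespace
spec:
  accessModes:
    - ReadWriteMany                      # mounted by every broker replica
  storageClassName: my-rwx-storage-class # hypothetical RWX class (Filestore / EFS / Azure Files)
  resources:
    requests:
      storage: 10Gi
---
# File 2 (helm values): point the brokers at that claim (nesting assumed).
broker:
  packageManagement:
    enabled: true
    fileSystemStorage:
      enabled: true
      claimName: pulsar-packages-rwx
```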

If components.functions is enabled with Oxia but FileSystemPackagesStorage is not, the chart fails the
Helm install with an explanatory error (the default BookKeeper provider would not work without ZooKeeper).
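The guard could be expressed in a Helm template roughly as shown below. This is a hypothetical sketch, not the contents of the chart's broker-package-storage-validation.yaml. In particular, treating `.Values.components.oxia` as the "no ZooKeeper" signal and requiring both flags are assumptions.

```yaml
{{- /* Hypothetical sketch of a Functions-on-Oxia package storage guard. */ -}}
{{- if and .Values.components.functions .Values.components.oxia }}
{{- if not (and .Values.broker.packageManagement.enabled .Values.broker.packageManagement.fileSystemStorage.enabled) }}
{{- fail "Functions on Oxia require broker.packageManagement.enabled and broker.packageManagement.fileSystemStorage.enabled: the default BookKeeperPackagesStorage needs ZooKeeper" }}
{{- end }}
{{- end }}
```

Helm's `fail` aborts rendering, so `helm install` stops with the message instead of deploying a cluster whose package uploads cannot work.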

Verifying this change

  • helm lint and helm template pass locally for: defaults, packageManagement.enabled only (BookKeeper),
    both flags (FileSystem provider + shared PVC + StorageClass/PV/PVC), Functions on Oxia, and the render-all
    templates-all-values + patch1 overlay.
  • The Oxia install test (.ci/clusters/values-oxia.yaml) now enables functions + FileSystemPackagesStorage,
    so ci::test_pulsar_function creates a function from a --jar and thereby validates a real package upload
    on Oxia end to end.
  • Make sure that the change passes the CI checks.

The Pulsar Packages Management Service runs on the broker. Its default
BookKeeperPackagesStorage requires ZooKeeper, so Pulsar Functions cannot store
uploaded packages when Oxia is the metadata store.

Add broker.packageManagement to host the Packages Management Service with
FileSystemPackagesStorage on a shared PersistentVolumeClaim mounted on every
broker pod, so Functions work with Oxia (and without ZooKeeper). It is
configured at two levels, like auth.authentication / auth.authentication.jwt:

- broker.packageManagement.enabled enables the service on the broker
  (sets enablePackagesManagement).
- broker.packageManagement.fileSystemStorage.enabled selects the
  FileSystemPackagesStorageProvider and sets STORAGE_PATH. Its storageClass /
  persistentVolume / persistentVolumeClaim render raw YAML (only apiVersion/kind
  fixed; {} creates nothing), and storagePath/claimName configure the shared
  volume. The default PVC is a single-node ReadWriteOnce claim (minikube);
  multi-broker needs a ReadWriteMany shared filesystem (GKE Filestore / EKS EFS /
  AKS Azure Files CSI).
- broker-statefulset mounts the shared volume on every broker pod; the embedded
  worker sets functionsWorkerEnablePackageManagement.
- Fail the Helm install when functions run on Oxia without FileSystemPackagesStorage
  (broker-package-storage-validation.yaml).
- CI: the Oxia install test enables functions + FileSystemPackagesStorage so
  ci::test_pulsar_function validates package upload end to end; render-all patch1
  exercises the StorageClass / PV / PVC branches.
- Docs: README, examples/README, examples/values-oxia.
lhotari and others added 2 commits on June 25, 2026 (13:58)
…broker.conf

The broker-embedded function worker stores function packages in BookKeeper/DLog by
default, which requires ZooKeeper and fails with Oxia (NPE "dlogNamespace is null"
on `pulsar-admin functions create`). Routing package storage through the broker's
Packages Management Service requires functionsWorkerEnablePackageManagement=true.

This is a broker ServiceConfiguration (broker.conf) key, not a function-worker
(functions_worker.yml / PF_) key: for the broker-embedded worker, PulsarService
overrides the worker value with the broker config value
(workerConfig.setFunctionsWorkerEnablePackageManagement(brokerConfig.isFunctionsWorkerEnablePackageManagement())).
The previous PF_functionsWorkerEnablePackageManagement therefore had no effect and
the worker kept using DLog. Set it as a plain broker.conf key gated on
components.functions + broker.packageManagement.enabled.
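A hypothetical sketch of that gating in the broker ConfigMap template follows. The surrounding template structure is assumed; only the key name and the two gating values come from this commit. ConfigMap data values must be strings, hence the quoted "true".

```yaml
# Hypothetical excerpt of the broker ConfigMap template (data section).
# functionsWorkerEnablePackageManagement is a broker.conf key, so it is set
# directly rather than as a PF_-prefixed function-worker setting.
data:
  {{- if and .Values.components.functions .Values.broker.packageManagement.enabled }}
  functionsWorkerEnablePackageManagement: "true"
  {{- end }}
```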

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01RZUbdHbb856wdmKxBU4V48
…sertion

The broker now mounts a PersistentVolumeClaim as the FileSystemPackagesStorage
directory, but the broker pod set no securityContext/fsGroup. The pulsar image runs
as uid 10000 with primary group 0 (root), so a freshly provisioned volume
(root:root, 0755) is not writable by the broker and the function package upload
would fail. CI did not catch this because the kind/microk8s hostpath provisioners
create the backing dir world-writable (0777), which masks the problem; real
StorageClasses (CSI/block, root:root 0755) and OpenShift (gid=0 prohibited) would
fail. Add broker.securityContext (fsGroup: 0, fsGroupChangePolicy: OnRootMismatch),
mirroring the bookkeeper/zookeeper securityContext, so the mounted volume is
group-0-writable.
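In values form, the new broker security context amounts to roughly the following sketch. The placement under `broker` is assumed from the name broker.securityContext; fsGroup and fsGroupChangePolicy are standard Kubernetes PodSecurityContext fields.

```yaml
broker:
  securityContext:
    # Pulsar images run as uid 10000 with primary group 0; giving the mounted
    # volume group 0 ownership lets the broker write package files to it.
    fsGroup: 0
    # Only change ownership/permissions when the volume root does not already
    # match, avoiding a recursive walk of the volume on every pod start.
    fsGroupChangePolicy: OnRootMismatch
```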

Also harden the CI test:
- Use a non-default storagePath (/pulsar/test-packages-storage) in the Oxia values
  so the test actually exercises PULSAR_PREFIX_STORAGE_PATH. The
  FileSystemPackagesStorage default ("packages-storage") resolves to
  /pulsar/packages-storage, which coincides with the chart's default mount path, so
  a broken STORAGE_PATH wiring would still pass.
- Add ci::verify_package_storage_files: after `functions create`, assert the
  uploaded package files exist under the configured storagePath on the broker,
  catching broken STORAGE_PATH wiring or an unwritable volume.
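A sketch of the corresponding CI overrides follows, assuming storagePath sits under fileSystemStorage next to enabled. The actual .ci/clusters/values-oxia.yaml may differ in layout and also carries the Oxia settings.

```yaml
components:
  functions: true
broker:
  packageManagement:
    enabled: true
    fileSystemStorage:
      enabled: true
      # Deliberately not /pulsar/packages-storage (the provider default and the
      # chart's default mount path), so the test fails if STORAGE_PATH is not
      # actually propagated to the broker.
      storagePath: /pulsar/test-packages-storage
```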

Verified end to end on a local cluster: broker runs uid=10000 gid=0; the package
(data + meta) is written under /pulsar/test-packages-storage/function/..., while the
default /pulsar/packages-storage stays empty (confirming STORAGE_PATH is honored).

lhotari marked this pull request as draft on June 25, 2026 (15:05)
lhotari and others added 7 commits on June 25, 2026 (18:07)
examples/values-oxia.yaml is now a plain Oxia metadata-store example again:
Functions are disabled by default, so it no longer explicitly disables them, and
it no longer carries the broker.packageManagement block. It instead points to a
new example for running Functions on Oxia.

Add examples/values-functions-fs-storage.yaml: enables Functions and broker-hosted
FileSystemPackagesStorage (no ZooKeeper dependency), documented as suitable for
Oxia (merge it with values-oxia.yaml). Reference it from examples/README.md (table
+ Functions section) and from the top-level README.md.

Docs: drop the "(like auth.authentication / auth.authentication.jwt)" analogy from
README.md, examples/README.md and values.yaml, and reword the validation note to
"enabled without ZooKeeper (using Oxia)".

Drop the duplicated volume-sizing (single broker vs. ReadWriteMany shared
filesystem with GKE/EKS/AKS CSI drivers) and StorageClass/PersistentVolume/
PersistentVolumeClaim raw-YAML details from the examples README. Keep a short
summary and link to the canonical "Pulsar Functions package storage" section in
the top-level README instead. The "Package storage (FileSystemPackagesStorage)"
heading is retained so existing in-page anchor links still resolve.

For multi-broker FileSystemPackagesStorage, name the managed file service to choose
per cloud (Filestore / Amazon EFS / Azure Files) and note that block storage
(Persistent Disk / EBS / Azure Disk) is ReadWriteOnce and cannot be shared across
replicas. Fix the AKS reference to the correct Azure Files how-to:
https://learn.microsoft.com/azure/aks/create-volume-azure-files

Pulsar images run as uid 10000, gid 0, so function package files are written by
that user/group. Explain that broker.securityContext.fsGroup: 0 (OnRootMismatch)
makes the volume writable by group 0 and works on most volume types, but that
NFS/SMB-backed ReadWriteMany volumes (EFS, Filestore, Azure Files) typically ignore
fsGroup — in which case grant uid 10000 / gid 0 rwx on the share directly (group-0
owned + group-writable, e.g. chmod 2770, or via CSI mount options).
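For Azure Files, one way to apply the "via CSI mount options" route is a StorageClass whose SMB mount options set the owner, group and modes. This is a hypothetical sketch: the class name, SKU and reclaim policy are placeholders, and it would be used through a pre-created PVC (claimName) or adapted into the chart's storageClass value. EFS and Filestore expose different settings.

```yaml
# Hypothetical StorageClass for an SMB-backed Azure Files share. SMB mounts
# typically ignore fsGroup, so ownership and modes come from mount options.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-pulsar-packages   # hypothetical name
provisioner: file.csi.azure.com
parameters:
  skuName: Standard_LRS
mountOptions:
  - uid=10000        # Pulsar image user
  - gid=0            # Pulsar image primary group
  - dir_mode=0770    # rwx for owner and group 0 on directories
  - file_mode=0660   # rw for owner and group 0 on package files
reclaimPolicy: Retain
allowVolumeExpansion: true
```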

…E link

Replace the bare long URLs in the shared-filesystem table with short titled links,
and point GKE at the Filestore-via-CSI stateful-workload tutorial.

Point the GKE reference at the Filestore CSI driver documentation as the primary
link, and keep the stateful-workload tutorial as a short "(example)".

lhotari marked this pull request as ready for review on June 25, 2026 (19:54)
lhotari merged commit f76de2e into apache:master on Jun 25, 2026 (39 checks passed)