Add blobfuse support for AKS #6654

Merged
copybara-service[bot] merged 6 commits into GoogleCloudPlatform:master from kiryl-filatau:azure-blobfuse2 on May 28, 2026

Collaborator

@vofish vofish commented May 13, 2026

Add Azure Blob Storage (blobfuse2) support to the Kubernetes AI inference benchmark

Enables the kubernetes_ai_inference benchmark to load model weights from Azure Blob Storage via the Azure Blob CSI driver (blobfuse2), as the Azure equivalent of GCS Fuse.

  • AksAutomaticCluster._Create: unconditionally pass --enable-blob-driver at cluster creation, matching GKE Autopilot (where the GCS Fuse CSI driver is on by default).
  • _ApplyBlobFusePVC: renders and applies blobfuse_pv_pvc.yaml.j2 with the Secret, PV, and PVC for the blob CSI driver.
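
As a rough illustration of what a Secret, PV, and PVC for the blob CSI driver might contain, the sketch below builds the three manifests as Python dicts. It is not the contents of blobfuse_pv_pvc.yaml.j2: the function name, resource naming scheme, capacity, access mode, and namespace are hypothetical, and the field layout is assumed from the static-provisioning pattern described in the Azure Blob CSI driver documentation.

```python
import base64


def build_blobfuse_manifests(account_name, account_key, container,
                             pvc_name, namespace="default"):
    """Return Secret, PV, and PVC manifests for a statically provisioned volume.

    All names and sizes are illustrative assumptions, not PKB's template.
    """
    secret_name = f"{pvc_name}-secret"  # hypothetical naming scheme
    pv_name = f"{pvc_name}-pv"  # hypothetical naming scheme

    def b64(value):
        # Values under a Secret's `data` key must be base64-encoded.
        return base64.b64encode(value.encode()).decode()

    secret = {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": secret_name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            "azurestorageaccountname": b64(account_name),
            "azurestorageaccountkey": b64(account_key),
        },
    }
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": pv_name},
        "spec": {
            "capacity": {"storage": "100Gi"},  # placeholder size
            "accessModes": ["ReadOnlyMany"],  # assumed: weights are read-only
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": "",
            "csi": {
                "driver": "blob.csi.azure.com",
                # The handle must be unique per volume in the cluster.
                "volumeHandle": f"{account_name}_{container}",
                "volumeAttributes": {
                    "containerName": container,
                    "protocol": "fuse2",  # assumed value selecting blobfuse2
                },
                "nodeStageSecretRef": {
                    "name": secret_name,
                    "namespace": namespace,
                },
            },
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": pvc_name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadOnlyMany"],
            "storageClassName": "",
            "volumeName": pv_name,  # bind to the PV defined above
            "resources": {"requests": {"storage": "100Gi"}},
        },
    }
    return [secret, pv, pvc]
```

The PVC binds to the PV by volumeName, and the PV points at the Secret through nodeStageSecretRef, so the three objects have to be rendered with consistent names.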

wg_serving_inference_server.py:

  • Add three new flags: --k8s_inference_server_blobstorage_bucket, --k8s_inference_server_blobstorage_account, --k8s_inference_server_blobstorage_resource_group.
  • _GetStorageType: return 'blobfuse' when catalog_components contains blobfuse.
  • _Create: branch to _ApplyBlobFusePVC() when blobfuse is in catalog_components.
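  • _ResolveBlobStorageAccount: resolves the storage account credentials, returning a tuple of (storage_account_name, resource_group_name, account_key).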
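
The branching on catalog_components described above can be sketched roughly as follows. This is a simplified stand-alone version, not the code in wg_serving_inference_server.py: the free functions, the `default` parameter, and the callback are hypothetical, and the only behavior taken from the description is that 'blobfuse' is returned, and the PVC step runs, when blobfuse appears in catalog_components.

```python
def parse_catalog_components(value):
    """Split a comma-separated catalog_components string into names."""
    return [part.strip() for part in value.split(",") if part.strip()]


def get_storage_type(catalog_components, default=None):
    """Return 'blobfuse' when the blobfuse component is requested.

    `default` stands in for whatever the real method returns otherwise,
    which the description does not state.
    """
    if "blobfuse" in parse_catalog_components(catalog_components):
        return "blobfuse"
    return default


def create(catalog_components, apply_blobfuse_pvc):
    """Invoke the PVC setup callback only when blobfuse is requested."""
    if get_storage_type(catalog_components) == "blobfuse":
        apply_blobfuse_pvc()
```

Matching on the parsed component list rather than on a substring keeps a component with a name that merely contains "blobfuse" from triggering the Azure path.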

Command used to run:

./pkb.py --benchmarks=kubernetes_ai_inference --cloud=Azure --config_override=kubernetes_ai_inference.container_cluster.type='Auto' \
--config_override=kubernetes_ai_inference.container_cluster.inference_server.model_server='vllm' \
--config_override=kubernetes_ai_inference.container_cluster.inference_server.model_name='llama3-8b' \
--config_override=kubernetes_ai_inference.container_cluster.inference_server.extra_deployment_args.pvc-name='gcs-fuse-csi-static-pvc' \
--config_override=kubernetes_ai_inference.container_cluster.inference_server.catalog_components='1-H100,blobfuse,xla-cache' \
--config_override=kubernetes_ai_inference.container_cluster.inference_server.deployment_timeout=2500 \
--config_override=kubernetes_ai_inference.container_cluster.inference_server.extra_deployment_args.model-path='vllm_models/llama3-8b-hf' \
--k8s_ai_inference_hf_token="${HF_TOKEN}" --k8s_ai_inference_request_rate=5 \
--k8s_inference_server_image_repo=us-docker.pkg.dev/p3rf-gke/public \
--metadata=cloud:Azure --zone=centralus --k8s_inference_server_blobstorage_bucket='gke-vllm-test' \
--wg_serving_repo_url='https://github.com/vofish/wg-serving' --wg_serving_repo_branch='azure-test'

This requires the gke-vllm-test storage container to exist and the llama3-8b model to be uploaded to the vllm_models/llama3-8b-hf path before the run.

A tuple of (storage_account_name, resource_group_name, account_key).
"""
# Only required when Azure blobfuse is requested.
from perfkitbenchmarker.providers import azure
Collaborator

Oof, this looks quite ugly. We might finally be hitting some of the limits of the current if-cloud implementation in wg_serving_inference_server.py. It might be somewhat necessary currently (because idk what happens if you put this in root, whether that causes circular dependencies or errors with LoadProvider when running on e.g. GCP), but it is not correct. We should instead have separate files in each provider with cloud-specific code. I'll also forward this feedback to Xia internally.

Collaborator Author

Refactored and moved functionality into a new file perfkitbenchmarker/providers/azure/azure_blob_csi_mount.py

@vofish vofish requested a review from hubatish May 19, 2026 14:40
Collaborator

@hubatish hubatish left a comment

Approving, but as mentioned I think this may fail pytype checks.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gcs-fuse-csi-static-pvc
Collaborator

Seems like odd gcs-fuse-specific metadata?

Collaborator Author

This was updated as follows: name: {{pvc_name}}

spot=FLAGS.azure_low_priority_vms,
)

def ApplyBlobFusePVC(self):
Collaborator

Yes this is basically what I was asking for. It is kubernetes_ai_inference specific atm but it is reasonable to think we'd use this in other benchmarks (none of the setup actually seems that AI specific).

Some additional changes to line up with this:

  • Consider also moving the yaml to data/container/azure rather than data/container/kubernetes/ai_inference.
  • Add an empty version of this function to container_cluster.py (surprised the code as is isn't failing pytype checks). Call it ApplyFusePVC rather than ApplyBlobFusePVC so it's more cloud-agnostic.
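
The second suggestion, a no-op default on the base class with a cloud-specific override, might look roughly like the sketch below. Only the method name ApplyFusePVC comes from this thread; the class names, the `applied` attribute, and the helper function are hypothetical stand-ins, not the actual classes in container_cluster.py.

```python
class BaseContainerCluster:
    """Stand-in for the base cluster class in container_cluster.py."""

    def ApplyFusePVC(self) -> None:
        """Set up a FUSE-backed PV and PVC; the default does nothing."""


class AksCluster(BaseContainerCluster):
    """Stand-in for an AKS cluster class that overrides the hook."""

    def __init__(self) -> None:
        self.applied = []

    def ApplyFusePVC(self) -> None:
        # A real implementation would render and apply the blobfuse
        # manifests; this sketch only records that the hook ran.
        self.applied.append("blobfuse-pvc")


def prepare_storage(cluster: BaseContainerCluster) -> None:
    """Call the hook without knowing which cloud the cluster runs on."""
    cluster.ApplyFusePVC()
```

Because the method exists on the base class, a caller typed against the base class passes static type checks, and clusters on clouds without an override simply skip the step.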

Collaborator Author

Done:

  • Jinja template was moved to data/container/azure
  • The function was renamed and moved to container_cluster.py

@copybara-service copybara-service Bot merged commit d0118d7 into GoogleCloudPlatform:master May 28, 2026
8 checks passed