Skip to content

[Test] Fix intermittent failure in test_dynamic_file_system_update.#7418

Merged
gmarciani merged 2 commits into
aws:developfrom
gmarciani:wip/mgiacomo/3160/test-dfms-fix-0602-1
Jun 3, 2026
Merged

[Test] Fix intermittent failure in test_dynamic_file_system_update.#7418
gmarciani merged 2 commits into
aws:developfrom
gmarciani:wip/mgiacomo/3160/test-dfms-fix-0602-1

Conversation

@gmarciani
Copy link
Copy Markdown
Contributor

@gmarciani gmarciani commented Jun 2, 2026

Description of changes

Fix intermittent failure in test_dynamic_file_system_update.
The test could fail because the external EFS has chances to deploy mount targets in AZs that are not used by the cluster.
With this fix we deploy the storage in the private subnets used by the compute nodes (one per AZ), so that the external EFS has a mount target in every AZ used by the cluster.

Tests

SUCCEEDED

test-suites:
  update:
    test_update.py::test_dynamic_file_systems_update:
      dimensions:
        - instances:
            - c5.xlarge
          oss:
            - alinux2023
          regions:
            - us-east-1
          schedulers:
            - slurm

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@gmarciani gmarciani added skip-changelog-update Disables the check that enforces changelog updates in PRs 3.x Test labels Jun 2, 2026
@gmarciani gmarciani marked this pull request as ready for review June 2, 2026 15:43
@gmarciani gmarciani requested review from a team as code owners June 2, 2026 15:43
gmarciani added 2 commits June 3, 2026 12:07
…parameter SubnetTwo optional to support the case where we need to deploy EFs mount targets in a single-AZ.
… deploying EFS mount targets in the single AZ used by compute nodes.

Before this fix, the test could fail when EFS mount targets were deployed in AZs different from those where the compute nodes were deployed. This fix forces the EFS mount targets to be created in the same AZ where the compute nodes are deployed.
@gmarciani gmarciani force-pushed the wip/mgiacomo/3160/test-dfms-fix-0602-1 branch from a3e13bb to e98a66f Compare June 3, 2026 10:10
@gmarciani gmarciani enabled auto-merge (rebase) June 3, 2026 12:36
Copy link
Copy Markdown
Contributor

@hehe7318 hehe7318 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks not good to me to change none test file cloudformation/storage/storage-stack.yaml, just for fixing intermittent test failure.
What do you think?

@gmarciani
Copy link
Copy Markdown
Contributor Author

Looks not good to me to change none test file cloudformation/storage/storage-stack.yaml, just for fixing intermittent test failure.
What do you think?

@hehe7318 that's an interesting point. I would agree with you if the change were not backward compatible.
In this case, the change is backward compatible, we are just making the template able to deploy in single-AZ fashion.
We are making the template more flexible, we are not reducing its capabilities.

Also, let's think about the alternative approach where we do not make any change to the template. In this case we to deploy the storage stack with multi-AZ fashion forcefully, but using just one AZ, which basically means that we are paying to an EFS mount target that is never used.

What do you think?

@hehe7318
Copy link
Copy Markdown
Contributor

hehe7318 commented Jun 3, 2026

Looks not good to me to change none test file cloudformation/storage/storage-stack.yaml, just for fixing intermittent test failure.
What do you think?

@hehe7318 that's an interesting point. I would agree with you if the change were not backward compatible. In this case, the change is backward compatible, we are just making the template able to deploy in single-AZ fashion. We are making the template more flexible, we are not reducing its capabilities.

Also, let's think about the alternative approach where we do not make any change to the template. In this case we to deploy the storage stack with multi-AZ fashion forcefully, but using just one AZ, which basically means that we are paying to an EFS mount target that is never used.

What do you think?

Convinced me! Approved!

@gmarciani gmarciani merged commit 634aa37 into aws:develop Jun 3, 2026
19 checks passed
@gmarciani gmarciani deleted the wip/mgiacomo/3160/test-dfms-fix-0602-1 branch June 3, 2026 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3.x skip-changelog-update Disables the check that enforces changelog updates in PRs Test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants