From cfb18a30f7c188852ba9dc3a1844a1d2cab33773 Mon Sep 17 00:00:00 2001 From: pearce8 Date: Fri, 5 Jun 2026 14:18:28 -0500 Subject: [PATCH 1/7] Removing GPCNet --- docs/index.rst | 2 -- 1 file changed, 2 deletions(-) diff --git a/docs/index.rst b/docs/index.rst index 12d8163..a30734f 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -40,7 +40,6 @@ FCR Benchmarks Project. ATTENTION: This page is a work in progress and nothing i 32_lammpsACE/lammpsACE 40_remhos/remhos 50_miniem/miniem - 60_mlperf/mlperf .. toctree:: :maxdepth: 3 @@ -50,7 +49,6 @@ FCR Benchmarks Project. ATTENTION: This page is a work in progress and nothing i 70_phloem/phloem 71_omb/omb 72_smb/smb - 73_gpcnet/gpcnet 80_ior/ior 81_mdtest/mdtest 82_dlio/dlio From d0ad7525fa63b028aa26f5f182d423ecface5125 Mon Sep 17 00:00:00 2001 From: pearce8 Date: Fri, 5 Jun 2026 14:20:12 -0500 Subject: [PATCH 2/7] Renaming priorities --- docs/index.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/index.rst b/docs/index.rst index a30734f..460ce91 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -9,8 +9,8 @@ 70-89 :: Microbenchmarks 90-99 :: Appendices -FCR Benchmarks Project. ATTENTION: This page is a work in progress and nothing is considered to be final -======================================================================================================== +ATS-6 Benchmarks Project. ATTENTION: This page is a work in progress and nothing is considered to be final +========================================================================================================== .. toctree:: :maxdepth: 3 @@ -22,7 +22,7 @@ FCR Benchmarks Project. ATTENTION: This page is a work in progress and nothing i .. toctree:: :maxdepth: 3 :numbered: - :caption: Priority 1 Mini-Applications + :caption: Technical Requirements 1 11_kripke/kripke 12_laghos/laghos @@ -34,7 +34,7 @@ FCR Benchmarks Project. ATTENTION: This page is a work in progress and nothing i .. 
toctree:: :maxdepth: 3 :numbered: - :caption: Priority 2 Mini-Applications + :caption: Technical Requirements 2 10_amg/amg 32_lammpsACE/lammpsACE From 1498071544da30a8c92eddac9b07d02859eb7dac Mon Sep 17 00:00:00 2001 From: pearce8 Date: Fri, 5 Jun 2026 14:21:54 -0500 Subject: [PATCH 3/7] Update index.rst --- docs/index.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/index.rst b/docs/index.rst index 460ce91..22e76c3 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -9,7 +9,7 @@ 70-89 :: Microbenchmarks 90-99 :: Appendices -ATS-6 Benchmarks Project. ATTENTION: This page is a work in progress and nothing is considered to be final +ATS-6 Benchmarks. ATTENTION: This page is a work in progress and nothing is considered to be final ========================================================================================================== .. toctree:: From 134b8dcf84fecc475c03bd2eac8c3c708f607264 Mon Sep 17 00:00:00 2001 From: pearce8 Date: Fri, 5 Jun 2026 14:22:41 -0500 Subject: [PATCH 4/7] Update conf.py --- docs/conf.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/conf.py b/docs/conf.py index 09992d5..2db3fe0 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -6,7 +6,7 @@ # -- Project information ----------------------------------------------------- # https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information -project = "FCR Benchmarks" +project = "ATS-6 Benchmarks" copyright = "Advanced Simulation and Computing" author = "Tri-labs" From 38718f5e8ff74e61a95bf59c15158d8dd0b09080 Mon Sep 17 00:00:00 2001 From: pearce8 Date: Fri, 5 Jun 2026 14:26:07 -0500 Subject: [PATCH 5/7] Update introduction.rst --- docs/00_intro/introduction.rst | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/00_intro/introduction.rst b/docs/00_intro/introduction.rst index 5f97abb..f28c739 100644 --- a/docs/00_intro/introduction.rst +++ b/docs/00_intro/introduction.rst @@ -4,16 +4,16 @@ 
Introduction This is benchmark documentation for a Department of Energy (DOE) National Nuclear Security Administration (NNSA) Advanced Simulation -and Computing (ASC) **Future Computing Resource (FCR)**. +and Computing (ASC) **Advanced Technology System 6 (ATS-6)**. Benchmark Overview ================== -Mini Applications and Microbenchmarks are features, components, performance characteristics, or other properties that are important to the Laboratories. Mini Applications are prioritized as Priority 1, or Priority 2. +Mini Applications and Microbenchmarks target features, components, performance characteristics, or other properties that are important to the Laboratories. Mini Applications are classified as either Technical Requirement 1 or Technical Requirement 2. -Priority 1 Mini Applications -^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Technical Requirement 1 Mini Applications +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. list-table:: @@ -62,8 +62,8 @@ Priority 1 Mini Applications - Kokkos -Priority 2 Mini Applications -^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Technical Requirement 2 Mini Applications +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. list-table:: @@ -101,7 +101,7 @@ Priority 2 Mini Applications - NCCL+CUDA - NVIDIA NeMo -Please note that half of the RAJA kernels are Priority 1, and the other half are Priority 2. Similarly, 2 of the Laghos problems are Priority 1, and the third is Priority 2. +Please note that half of the RAJA kernels fall under Technical Requirement 1 and the other half under Technical Requirement 2. Similarly, two of the Laghos problems fall under Technical Requirement 1, and the third falls under Technical Requirement 2. ..
_GlobalRunRules: From f5b653c2c7d4ffd6dadb6c886e55f0214d0c1acd Mon Sep 17 00:00:00 2001 From: pearce8 Date: Fri, 5 Jun 2026 14:34:51 -0500 Subject: [PATCH 6/7] Update rajaperf.rst --- docs/13_rajaperf/rajaperf.rst | 114 +++++++++++++++++----------------- 1 file changed, 56 insertions(+), 58 deletions(-) diff --git a/docs/13_rajaperf/rajaperf.rst b/docs/13_rajaperf/rajaperf.rst index 045f6e7..dd62e99 100644 --- a/docs/13_rajaperf/rajaperf.rst +++ b/docs/13_rajaperf/rajaperf.rst @@ -79,9 +79,9 @@ Problems The RAJA Performance Suite Benchmark consists of a subset of kernels in the full Suite that focus on some key computational patterns found in LLNL -applications. The benchmark kernels are partitioned into two priority levels as -described below, along with notable features and RAJA constructs used in each -kernel (in parentheses). +applications. The benchmark kernels are partitioned into two sets of Technical +Requirements as described below, along with notable features and RAJA constructs +used in each kernel (in parentheses). .. note:: In the RAJA Performance Suite repository, each kernel contains a detailed reference description near the top of the header file for @@ -89,14 +89,14 @@ kernel (in parentheses). The reference description is a C-style sequential implementation of the kernel in a comment section near the top of the file. -The RAJA Performance Suite Benchmark kernels are partitioned into two -priority levels described below. +The RAJA Performance Suite Benchmark kernels are partitioned into two sets of +Technical Requirements described below. -Priority 1 kernels -^^^^^^^^^^^^^^^^^^^ +Technical Requirement 1 kernels +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -*Priority 1* kernels are most important to us. They are located in the +*Technical Requirement 1* kernels are most important to us. They are located in the ``RAJAPerf/src/apps`` sub-directory: #. 
**DIFFUSION3DPA** element-wise action of a 3D finite element volume diffusion operator via partial assembly and sum factorization *(nested loops, GPU shared memory, RAJA::launch API)* @@ -111,12 +111,10 @@ Priority 1 kernels #. **VOL3D** on a 3D structured hexahedral mesh (faces are not necessarily planes), compute volume of each zone (hex) *(single loop, data access via indirection array, RAJA::forall API)* -Priority 2 kernels -^^^^^^^^^^^^^^^^^^^ +Technical Requirement 2 kernels +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -*Priority 2* kernels are also important, but less so than the *Priority 1* -kernels listed above. *Priority 2* kernels are listed below and are located in -the ``RAJAPerf/src`` sub-directories noted: +*Technical Requirement 2* kernels are listed below and are located in the ``RAJAPerf/src`` sub-directories noted: #. **apps/CONVECTION3DPA** element-wise action of a 3D finite element volume convection operator via partial assembly and sum factorization *(nested loops, GPU shared memory, RAJA::launch API)* #. **apps/DEL_DOT_VEC_2D** divergence of a vector field at a set of points on a mesh *(single loop, data access via indirection array, RAJA::forall API)* @@ -383,14 +381,14 @@ The scripts and results discussed here are located in the ``scripts/2026-FCR`` directory there. .. important:: In the following sections, we present detailed results, - including FOM tables and throughput plots for the Priority 1 - kernels described above. For completeness, we also include a - brief summary of results for Priority 2 kernels in less detail. - Data files containing results for all kernels run are included - in this repository. + including FOM tables and throughput plots for the Technical + Requirement 1 kernels described above. For completeness, we also + include a brief summary of results for Technical Requirement 2 + kernels in less detail. Data files containing results for all + kernels run are included in this repository. 
-AMD MI300A throughput results (Priority 1 kernels) ----------------------------------------------------- +AMD MI300A throughput results (Technical Requirement 1 kernels) +--------------------------------------------------------------- For the MI300A architecture, we present two sets of throughput results. One is run in ``SPX mode`` where we use 4 MPI ranks on a node, one for each MI300A APU, @@ -400,8 +398,8 @@ APU, and treat each APU as 6 GPUs (one GPU = 1 XCD). In each case, we run each kernel over a sequence of problem sizes such that the saturation point is evident on its associated throughput curve. -SPX mode (Priority 1) -^^^^^^^^^^^^^^^^^^^^^^ +SPX mode (Technical Requirement 1) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ For SPX mode (run with 1 MPI rank per APU on a node), we choose the smallest problem to use ~100,000 bytes of allocated memory and the largest problem @@ -416,7 +414,7 @@ memory and the largest problem to use ~600MB memory, which is over twice as large as the MALL. After building the code as described in :ref:`rajaperf_build_mi300a-label`, we -run the ``Priority 1`` kernels in **SPX mode** as follows:: +run the ``Technical Requirement 1`` kernels in **SPX mode** as follows:: $ pwd path/to/RAJAPerf @@ -440,34 +438,34 @@ directory specified via the ``--output-dir`` option above. We include the files generated by the ``process_data.py`` script in this repo in the directory ``./docs/13_rajaperf/baseline_data/RPBenchmark_MI300A_tier1-SPX``. -.. csv-table:: FOM results for Priority 1 kernels run on MI300A in SPX mode +.. csv-table:: FOM results for Technical Requirement 1 kernels on MI300A in SPX mode :file: ./baseline_data/RPBenchmark_MI300A_tier1-SPX/FOM/combined_fom.csv :align: center :widths: auto :header-rows: 1 -SPX mode (Priority 2) +SPX mode (Technical Requirement 2) ^^^^^^^^^^^^^^^^^^^^^^ -The process for generating results for the Priority 2 kernels is essentially -the same as for the Priority 1 kernels just described. 
Note that two of the -kernels ``INDEXLIST_3LOOP`` and ``HALO_PACKING_FUSED`` do not perform any +The process for generating results for the Technical Requirement 2 kernels is +the same as for the Technical Requirement 1 kernels just described. Note that two +of the kernels ``INDEXLIST_3LOOP`` and ``HALO_PACKING_FUSED`` do not perform any floating point operations. They represent recurring computational patterns in our application that are important rather than key numerical kernels. Thus, the two kernels have zero GFLOP/sec rates. So, we consider the bandwidth as the appropriate metric to consider. -.. csv-table:: FOM results for Priority 2 kernels run on MI300A in SPX mode +.. csv-table:: FOM results for Technical Requirement 2 kernels on MI300A in SPX mode :file: ./baseline_data/RPBenchmark_MI300A_tier2-SPX/FOM/combined_fom.csv :align: center :widths: auto :header-rows: 1 -The baseline data files for Priority 2 kernels run on the MI300A architecture in +The baseline data files for Technical Requirement 2 kernels on the MI300A architecture in SPX mode are in this repo in the directory ``./docs/13_rajaperf/baseline_data/RPBenchmark_MI300A_tier1-SPX``. -CPX mode (Priority 1) -^^^^^^^^^^^^^^^^^^^^^^ +CPX mode (Technical Requirement 1) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ For CPX mode (run with 6 MPI ranks per APU on a node), we choose the smallest problem to use ~50,000 bytes of allocated memory and the largest @@ -480,8 +478,8 @@ For them, we chose the smallest problem to use ~1.6MB of allocated memory and the largest problem to use ~200MB memory, which is a little less than the MALL size. -Similar to the SPX mode description above, we run the ``Priority 1`` kernels in -**CPX mode** as follows:: +Similar to the SPX mode description above, we run the ``Technical Requirement 1`` +kernels in **CPX mode** as follows:: $ pwd path/to/RAJAPerf @@ -506,34 +504,34 @@ directory specified by via the ``--output-dir`` option above. 
We include the files generated by the ``process_data.py`` script in this repo in the directory ``./docs/13_rajaperf/baseline_data/RPBenchmark_MI300A_tier1-CPX``. -.. csv-table:: FOM results for Priority 1 kernels run on MI300A in CPX mode +.. csv-table:: FOM results for Technical Requirement 1 kernels on MI300A in CPX mode :file: ./baseline_data/RPBenchmark_MI300A_tier1-CPX/FOM/combined_fom.csv :align: center :widths: auto :header-rows: 1 -CPX mode (Priority 2) -^^^^^^^^^^^^^^^^^^^^^^ +CPX mode (Technical Requirement 2) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The process for generating results for the Priority 2 kernels is essentially -the same as for the Priority 1 kernels just described. Note that two of the +The process for generating results for the Technical Requirement 2 kernels is essentially +the same as for the Technical Requirement 1 kernels just described. Note that two of the kernels ``INDEXLIST_3LOOP`` and ``HALO_PACKING_FUSED`` do not perform any floating point operations. They represent recurring computational patterns in our application that are important rather than key numerical kernels. Thus, the two kernels have zero GFLOP/sec rates. So, we consider the bandwidth as the appropriate metric to consider. -.. csv-table:: FOM results for Priority 2 kernels run on MI300A in CPX mode +.. csv-table:: FOM results for Technical Requirement 2 kernels run on MI300A in CPX mode :file: ./baseline_data/RPBenchmark_MI300A_tier2-CPX/FOM/combined_fom.csv :align: center :widths: auto :header-rows: 1 -The baseline data files for Priority 2 kernels run on this MI300A architecture in +The baseline data files for Technical Requirement 2 kernels run on this MI300A architecture in CPX mode are in this repo in the directory ``./docs/13_rajaperf/baseline_data/RPBenchmark_MI300A_tier1-CPX``. 
-AMD MI300A throughput plots (Priority 1) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +AMD MI300A throughput plots (Technical Requirement 1) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The following table contains throughput plots for each kernel run as described above on the MI300A architecture in SPX mode and CPX mode. Each plot has multiple @@ -560,7 +558,7 @@ RAJA execution policies specifically, can have a significant impact on performance. +-----------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------+ -| Priority 1 Kernels: MI300A Node Throughput (SPX Mode) | Priority 1 Kernels: MI300A Node Throughput (CPX Mode) | +| Technical Requirement 1 Kernels: MI300A Node Throughput (SPX Mode) | Technical Requirement 1 Kernels: MI300A Node Throughput (CPX Mode) | +-----------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------+ | | | | .. figure:: baseline_data/RPBenchmark_MI300A_tier1-SPX/figures/Apps_DIFFUSION3DPA_flops.png | .. figure:: baseline_data/RPBenchmark_MI300A_tier1-CPX/figures/Apps_DIFFUSION3DPA_flops.png | @@ -614,11 +612,11 @@ performance. +-----------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------+ -NVIDIA H100 throughput results (Priority 1 kernels) ----------------------------------------------------- +NVIDIA H100 throughput results (Technical Requirement 1 kernels) +---------------------------------------------------------------- For the H100 architecture, we present throughput results, where we run with -4 MPI ranks on a node -- one for each H100 GPU. 
We run each ``Priority 1`` +4 MPI ranks on a node -- one for each H100 GPU. We run each ``Technical Requirement 1`` kernel over a sequence of problem sizes such that the saturation point is evident on its associated throughput curve. @@ -634,7 +632,7 @@ and the largest problem to use ~300MB memory, which is about 6 times the L2 cache size. After building the code as described in :ref:`rajaperf_build_h100-label`, we -run the ``Priority 1`` kernels as follows:: +run the ``Technical Requirement 1`` kernels as follows:: $ pwd path/to/RAJAPerf @@ -658,34 +656,34 @@ directory specified by via the ``--output-dir`` option above. We include the files generated by the ``process_data.py`` script in this repo in the directory ``./docs/13_rajaperf/baseline_data/RPBenchmark_H100_tier1``. -.. csv-table:: FOM results for Priority 1 kernels run on H100 +.. csv-table:: FOM results for Technical Requirement 1 kernels run on H100 :file: ./baseline_data/RPBenchmark_H100_tier1/FOM/combined_fom.csv :align: center :widths: auto :header-rows: 1 -H100 (Priority 2) -^^^^^^^^^^^^^^^^^^^^^^ +H100 (Technical Requirement 2) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The process for generating results for the Priority 2 kernels is essentially -the same as for the Priority 1 kernels just described. Note that two of the +The process for generating results for the Technical Requirement 2 kernels is essentially +the same as for the Technical Requirement 1 kernels just described. Note that two of the kernels ``INDEXLIST_3LOOP`` and ``HALO_PACKING_FUSED`` do not perform any floating point operations. They represent recurring computational patterns in our application that are important rather than key numerical kernels. Thus, the two kernels have zero GFLOP/sec rates. So, we consider the bandwidth as the appropriate metric to consider. -.. csv-table:: FOM results for Priority 2 kernels run on H100 +.. 
csv-table:: FOM results for Technical Requirement 2 kernels run on H100 :file: ./baseline_data/RPBenchmark_H100_tier2/FOM/combined_fom.csv :align: center :widths: auto :header-rows: 1 -The baseline data files for Priority 2 kernels run on the H100 architecture +The baseline data files for Technical Requirement 2 kernels run on the H100 architecture are in this repo in the directory ``./docs/13_rajaperf/baseline_data/RPBenchmark_H100_tier2-SPX``. -NVIDIA H100 throughput plots (Priority 1) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +NVIDIA H100 throughput plots (Technical Requirement 1) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The following table contains throughput plots for each kernel run as described above for the H100 architecture. Each plot has multiple curves where GFLOP/sec @@ -710,7 +708,7 @@ These additional curves were included to show how kernel execution choices, RAJA execution policies specifically, can have a noticeable impact on performance. +-----------------------------------------------------------------------------------------------------+ -| Priority 1 Kernels H100 Node Throughput | +| Technical Requirement 1 Kernels H100 Node Throughput | +-----------------------------------------------------------------------------------------------------+ | | | .. figure:: baseline_data/RPBenchmark_H100_tier1/figures/Apps_DIFFUSION3DPA_flops.png | From 5ae379dfd42081e6bdaff33f404ef815cd9c8f1c Mon Sep 17 00:00:00 2001 From: pearce8 Date: Fri, 5 Jun 2026 14:36:25 -0500 Subject: [PATCH 7/7] Update laghos.rst --- docs/12_laghos/laghos.rst | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/docs/12_laghos/laghos.rst b/docs/12_laghos/laghos.rst index 750bf65..9a329b9 100644 --- a/docs/12_laghos/laghos.rst +++ b/docs/12_laghos/laghos.rst @@ -19,15 +19,16 @@ Problems The test problems are the Sedov shock (problem 1 in Laghos) in 3D. The test problems should be run with a conforming mesh. 
-Linear, quadratic, and cubic orders are of interest with the following priorities: +Linear, quadratic, and cubic orders are of interest and fall under +the Technical Requirements as follows: -Priority 1 problems -^^^^^^^^^^^^^^^^^^^ +Technical Requirement 1 problems +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ #. **3D Linear** This problem uses a kinematic order of 1, and a thermodynamic order of 0 (Q1Q0). #. **3D Quadratic** This problem uses a kinematic order of 2, and a thermodynamic order of 1 (Q2Q1). -Priority 2 problems -^^^^^^^^^^^^^^^^^^^ +Technical Requirement 2 problems +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 3. **3D Cubic** This problem uses a kinematic order of 3, and a thermodynamics order of 2 (Q3Q2).