diff --git a/docs/72_phloem/phloem.rst b/docs/70_phloem/phloem.rst
similarity index 100%
rename from docs/72_phloem/phloem.rst
rename to docs/70_phloem/phloem.rst
diff --git a/docs/70_omb/omb.rst b/docs/71_omb/omb.rst
similarity index 100%
rename from docs/70_omb/omb.rst
rename to docs/71_omb/omb.rst
diff --git a/docs/72_smb/smb.rst b/docs/72_smb/smb.rst
new file mode 100644
index 0000000..56a89fb
--- /dev/null
+++ b/docs/72_smb/smb.rst
@@ -0,0 +1,149 @@
+*************************************
+Sandia Microbenchmarks - Message rate
+*************************************
+
+SMB Message Rate - A multi-node MPI point-to-point benchmark. https://github.com/sandialabs/SMB.
+
+Purpose
+=======
+Sandia Microbenchmarks - Message Rate is a realistic messaging benchmark designed to emulate real application behavior. In particular, three things set this benchmark apart: 1. It uses a peer count to emulate different communication patterns. 2. It clears the cache between iterations to get realistic performance numbers. 3. It tests different program behaviors, such as pre-posting receives, to evaluate performance under different scenarios.
+
+Characteristics
+===============
+
+Problem
+-------
+
+The SMB implements four different communication patterns: single direction, pair based, pre-posted, and all-start. Each of these is a variation of behavior in a given communication pattern. For simplicity, we limit this evaluation to pre-posted.
+
+   -p Number of peers used in communication
+   -i Number of iterations per test
+   -m Number of messages per peer per iteration
+   -s Number of bytes per message
+   -c Cache size in bytes
+   -n Number of procs per node
+   -o Format output to be machine readable
+
+Figure of Merit
+---------------
+As this benchmark is meant to represent a variety of application behaviors, there isn't a single figure of merit we can identify. However, we can identify a subset of input parameters to test.
+For the purposes of this test, the figure of merit is the pre-posted message rate across a range of message sizes and peer counts.
+
+
+Source code modifications
+=========================
+
+Please see :ref:`GlobalRunRules` for general guidance on allowed modifications.
+
+Building
+========
+Accessing the sources
+
+* Clone the submodule from the benchmarks repository checkout
+
+.. code-block:: bash
+
+   cd
+   git submodule update --init --recursive
+   cd SMB/src/msgrate/
+
+..
+
+Build requirements:
+
+* C/C++ compiler(s) with support for C11 and C++14.
+
+* MPI 3.0+
+
+  * `OpenMPI 1.10+ `_
+  * `mpich `_
+
+
+.. code-block:: bash
+
+   cd
+   make -j
+
+..
+
+Testing the build:
+
+.. code-block:: bash
+
+   mpirun -n 8 msgrate -n 1
+
+..
+
+You should see output similar to the following. Note: because you are presumably testing on a single node at this point, the -n parameter needs to be set to 1. While this setting is incorrect from a performance standpoint, the SMB tries to ensure that all communication crosses the network, and thus cannot otherwise be run on a single node.
+
+.. code-block:: bash
+
+   job size: 8
+   npeers: 6
+   niters: 4096
+   nmsgs: 128
+   nbytes: 8
+   cache size: 8388608
+   ppn: 1
+   single direction: 2578047.02
+   pair-based: 4343577.14
+   pre-post: 1889840.49
+   all-start: 2398236.06
+
+..
+
+
+Running
+=======
+We describe two tests using SMB message rate here. The first is based on a 2D 9-point stencil code and the second on a 3D 27-point stencil.
+Each of these needs to be run for various message sizes and scales to test the performance of the entire system.
+
+
+We define some system-specific variables for these tests.
+
+* PPN - the number of processes per node.
+* CACHE - 2x the size of the largest cache (note: we use 2x here to be thorough)
+
+Note: these tests can be memory intensive, with memory usage growing as $O(message\_size*number\_of\_messages*number\_of\_peers*processes\_per\_node)$.
+If memory issues occur, use the -m flag to reduce the number of messages per peer per iteration at larger message sizes.
+
+* 9-point stencil
+
+.. code-block:: bash
+
+   for i in {0..24}; do mpirun msgrate -n $PPN -p 8 -c $CACHE -s $((2**i)) -o; done
+
+..
+
+
+* 27-point stencil
+
+.. code-block:: bash
+
+   for i in {0..24}; do mpirun msgrate -n $PPN -p 26 -c $CACHE -s $((2**i)) -o; done
+
+..
+
+Results from SMB are provided on the following systems:
+
+Validation
+==========
+
+
+Example Scalability Results
+===========================
+
+
+Memory Usage
+============
+
+
+Strong Scaling on El Capitan
+============================
+
+
+Weak Scaling on El Capitan
+==========================
+
+
+References
+==========
diff --git a/docs/71_gpcnet/gpcnet.rst b/docs/73_gpcnet/gpcnet.rst
similarity index 100%
rename from docs/71_gpcnet/gpcnet.rst
rename to docs/73_gpcnet/gpcnet.rst
diff --git a/docs/index.rst b/docs/index.rst
index 11438f4..12d8163 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -47,9 +47,10 @@ FCR Benchmarks Project. ATTENTION: This page is a work in progress and nothing i
    :numbered:
    :caption: Microbenchmarks
 
-   70_omb/omb
-   71_gpcnet/gpcnet
-   72_phloem/phloem
+   70_phloem/phloem
+   71_omb/omb
+   72_smb/smb
+   73_gpcnet/gpcnet
    80_ior/ior
    81_mdtest/mdtest
    82_dlio/dlio