Refactor bdy imp3 #561

Draft
Benjamin Went (MetBenjaminWent) wants to merge 5 commits into MetOffice:main from MetBenjaminWent:refactor-bdy_imp3

@MetBenjaminWent Benjamin Went (MetBenjaminWent) commented Jun 18, 2026


PR Summary

Sci/Tech Reviewer:
Code Reviewer:

Linked Issues:
Umbrella: #560
Umbrella Work: #106

Code Quality Checklist

  • I have performed a self-review of my own code
  • My code follows the project's style guidelines
  • Comments have been included that aid understanding and enhance the readability of the code
  • My changes generate no new warnings
  • All automated checks in the CI pipeline have completed successfully

Testing

  • I have tested this change locally, using the LFRic Apps rose-stem suite
  • If any tests fail (rose-stem or CI) the reason is understood and acceptable (e.g. kgo changes)
  • I have added tests to cover new functionality as appropriate (e.g. system tests, unit tests, etc.)
  • Any new tests have been assigned an appropriate amount of compute resource and have been allocated to an appropriate testing group (i.e. the developer tests are for jobs which use a small amount of compute resource and complete in a matter of minutes)

trac.log

Security Considerations

  • I have reviewed my changes for potential security issues
  • Sensitive data is properly handled (if applicable)
  • Authentication and authorisation are properly implemented (if applicable)

Performance Impact

  • Performance of the code has been considered and, if applicable, suitable performance measurements have been conducted

AI Assistance and Attribution

  • Some of the content of this change has been produced with the assistance of Generative AI tool name (e.g., Met Office Github Copilot Enterprise, Github Copilot Personal, ChatGPT GPT-4, etc) and I have followed the Simulation Systems AI policy (including attribution labels)

Documentation

  • Where appropriate I have updated documentation related to this change and confirmed that it builds correctly

PSyclone Approval

  • If you have edited any PSyclone-related code (e.g. PSyKAl-lite, Kernel interface, optimisation scripts, LFRic data structure code) then please contact the TCD Team

Sci/Tech Review

  • I understand this area of code and the changes being added
  • The proposed changes correspond to the pull request description
  • Documentation is sufficient (do documentation papers need updating)
  • Sufficient testing has been completed

(Please alert the code reviewer via a tag when you have approved the SR)

Code Review

  • All dependencies have been resolved
  • Related Issues have been properly linked and addressed
  • CLA compliance has been confirmed
  • Code quality standards have been met
  • Tests are adequate and have passed
  • Documentation is complete and accurate
  • Security considerations have been addressed
  • Performance impact is acceptable

@MetBenjaminWent


vern_comb_after_3n_16T_run2.txt:| bdy_imp3 | 23.37296 | 22.793 | 24.303 | 23.37296 | 22.793 | 24.303 | 421.66104 | 480 | 3.54854 | 0.04869 |
vern_comb_after_3n_1T_run2.txt:| bdy_imp3 | 0.4702 | 0.375 | 0.745 | 0.4702 | 0.375 | 0.745 | 306.8558 | 480 | 0.15091 | 0.00098 |
vern_comb_after_3n_2T_run2.txt:| bdy_imp3 | 1.87144 | 1.63 | 2.349 | 1.87144 | 1.63 | 2.349 | 231.32831 | 480 | 0.73924 | 0.0039 |
vern_comb_after_3n_4T_run2.txt:| bdy_imp3 | 4.64524 | 4.215 | 5.152 | 4.64524 | 4.215 | 5.152 | 227.4877 | 480 | 1.596 | 0.00968 |
vern_comb_after_3n_8T_run2.txt:| bdy_imp3 | 9.17898 | 8.857 | 10.841 | 9.17898 | 8.857 | 10.841 | 276.30565 | 480 | 2.24013 | 0.01912 |
vern_comb_before_3n_16T_run1.txt:| bdy_imp3 | 0.60892 | 0.595 | 0.629 | 0.60892 | 0.595 | 0.629 | 634.10033 | 480 | 0.09508 | 0.00127 |
vern_comb_before_3n_1T_run1.txt:| bdy_imp3 | 0.2106 | 0.058 | 0.522 | 0.2106 | 0.058 | 0.522 | 310.62336 | 480 | 0.06743 | 0.00044 |
vern_comb_before_3n_2T_run1.txt:| bdy_imp3 | 0.19568 | 0.131 | 0.265 | 0.19568 | 0.131 | 0.265 | 250.37705 | 480 | 0.0777 | 0.00041 |
vern_comb_before_3n_4T_run1.txt:| bdy_imp3 | 0.20734 | 0.166 | 0.259 | 0.20734 | 0.166 | 0.259 | 284.07115 | 480 | 0.07264 | 0.00043 |
vern_comb_before_3n_8T_run1.txt:| bdy_imp3 | 0.31942 | 0.27 | 0.42 | 0.31942 | 0.27 | 0.42 | 397.84844 | 480 | 0.07965 | 0.00067 |

It's way, way worse doing it this way. Hmm. Not as easy to work out.

@MetBenjaminWent


So it's not the removal of the vector loop. It's the size of the loop (and/or possibly the dynamic schedule).

Splitting it back into 3 loops, each with its own OMP coverage (one_over_v is effectively a loop over an array of size i_len / threads), has brought the performance back to something like normal.
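For anyone following along, here is a rough sketch of what "an array which is i_len / threads" means when a loop is divided evenly between threads. It is not the bdy_imp3 code: `static_chunks` and the example sizes are hypothetical, and it assumes a plain static split into contiguous, near-equal chunks (an OpenMP runtime may distribute the remainder differently).

```python
def static_chunks(i_len, n_threads):
    """Split range(i_len) into n_threads contiguous (start, end) chunks.

    Chunk sizes differ by at most one; the first `extra` threads take
    one additional iteration. This is one possible static mapping,
    used here only for illustration.
    """
    base, extra = divmod(i_len, n_threads)
    chunks = []
    start = 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


if __name__ == "__main__":
    # Hypothetical i_len; the real value depends on the model configuration.
    for n_threads in (1, 2, 4, 8, 16):
        sizes = [end - start for start, end in static_chunks(100, n_threads)]
        print(f"{n_threads:2d} threads: chunk sizes {sizes}")
```

With a split like this each thread touches a short contiguous slice per loop, which is the picture the three-loop version relies on.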

vern_comb_after_3n_16T_run1.txt:| bdy_imp3 | 0.69154 | 0.676 | 0.735 | 0.69154 | 0.676 | 0.735 | 633.28846 | 480 | 0.10804 | 0.00144 |
vern_comb_after_3n_1T_run1.txt:| bdy_imp3 | 0.17841 | 0.057 | 0.545 | 0.17841 | 0.057 | 0.545 | 312.58077 | 480 | 0.05683 | 0.00037 |
vern_comb_after_3n_2T_run1.txt:| bdy_imp3 | 0.21947 | 0.15 | 0.281 | 0.21947 | 0.15 | 0.281 | 250.61117 | 480 | 0.08703 | 0.00046 |
vern_comb_after_3n_4T_run1.txt:| bdy_imp3 | 0.2397 | 0.199 | 0.308 | 0.2397 | 0.199 | 0.308 | 287.77153 | 480 | 0.08272 | 0.0005 |
vern_comb_after_3n_8T_run1.txt:| bdy_imp3 | 0.36944 | 0.34 | 0.495 | 0.36944 | 0.34 | 0.495 | 396.13619 | 480 | 0.09233 | 0.00077 |
vern_comb_before_3n_16T_run1.txt:| bdy_imp3 | 0.60892 | 0.595 | 0.629 | 0.60892 | 0.595 | 0.629 | 634.10033 | 480 | 0.09508 | 0.00127 |
vern_comb_before_3n_1T_run1.txt:| bdy_imp3 | 0.2106 | 0.058 | 0.522 | 0.2106 | 0.058 | 0.522 | 310.62336 | 480 | 0.06743 | 0.00044 |
vern_comb_before_3n_2T_run1.txt:| bdy_imp3 | 0.19568 | 0.131 | 0.265 | 0.19568 | 0.131 | 0.265 | 250.37705 | 480 | 0.0777 | 0.00041 |
vern_comb_before_3n_4T_run1.txt:| bdy_imp3 | 0.20734 | 0.166 | 0.259 | 0.20734 | 0.166 | 0.259 | 284.07115 | 480 | 0.07264 | 0.00043 |
vern_comb_before_3n_8T_run1.txt:| bdy_imp3 | 0.31942 | 0.27 | 0.42 | 0.31942 | 0.27 | 0.42 | 397.84844 | 480 | 0.07965 | 0.00067 |

It also highlights that blocking was still beneficial. We should cross-compare against another data point.
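As a quick way to put the tables side by side, the sketch below takes the first numeric column for bdy_imp3 from each file and prints after/before ratios per thread count. The values are copied from the output in this thread; the assumption that this column is a time-like measure where lower is better, and the labels `single_loop` and `three_loops`, are mine rather than anything the profiler reports.

```python
# First numeric column for bdy_imp3, keyed by thread count (the "T" in
# each file name). Values copied from the tables posted in this thread.
before = {1: 0.2106, 2: 0.19568, 4: 0.20734, 8: 0.31942, 16: 0.60892}
# "after" numbers from the first comment (the slow version).
single_loop = {1: 0.4702, 2: 1.87144, 4: 4.64524, 8: 9.17898, 16: 23.37296}
# "after" numbers from the second comment (split back into 3 loops).
three_loops = {1: 0.17841, 2: 0.21947, 4: 0.2397, 8: 0.36944, 16: 0.69154}


def ratios(after, baseline):
    """Return {threads: after / baseline} for every thread count in baseline."""
    return {t: after[t] / baseline[t] for t in sorted(baseline)}


if __name__ == "__main__":
    for label, after in (("single_loop", single_loop), ("three_loops", three_loops)):
        for threads, ratio in ratios(after, before).items():
            print(f"{label:12s} {threads:2d}T  after/before = {ratio:.2f}")
```

Read that way, the slow version is roughly 38 times the baseline at 16 threads, while the three-loop version sits about 12% to 16% above the baseline from 2 to 16 threads and below it at 1 thread. These are single runs, so another data point would help separate real differences from noise.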
