Refactor bdy imp3 #561
vern_comb_after_3n_16T_run2.txt:| bdy_imp3 | 23.37296 | 22.793 | 24.303 | 23.37296 | 22.793 | 24.303 | 421.66104 | 480 | 3.54854 | 0.04869 |
It's way, way worse doing it this way. Hmmmm. Not as easy to work out.

So it's not the removal of the vector loop. It's the size of the loop (and/or possibly the dynamic schedule). Splitting it back into 3 loops, each with its own OMP coverage (one_over_v is effectively a loop over an array of length i_len / threads), has brought the performance back to something like normal.
vern_comb_after_3n_16T_run1.txt:| bdy_imp3 | 0.69154 | 0.676 | 0.735 | 0.69154 | 0.676 | 0.735 | 633.28846 | 480 | 0.10804 | 0.00144 |
It also highlights that blocking was still beneficial. We should cross-compare against another data point.
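For anyone following along, here is a rough sketch of the two loop shapes being compared. It is illustrative C with OpenMP pragmas, not the actual bdy_imp3 source: apart from one_over_v, every name (fused_update, split_update, v, a, b) and the loop bodies themselves are made up, and the pairing of a dynamic schedule with the fused loop and a static schedule with the split loops is an assumption based on the comment above.

```c
#include <assert.h>

/* Hypothetical fused form: one large loop body under a dynamic schedule.
 * Each iteration does all three steps for element i. */
void fused_update(int n, const double *v, double *one_over_v,
                  double *a, double *b)
{
    assert(n >= 0);
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n; ++i) {
        one_over_v[i] = 1.0 / v[i];
        a[i] = a[i] * one_over_v[i];
        b[i] = b[i] - a[i];
    }
}

/* Hypothetical split form: three small loops, each with its own
 * OpenMP region. The implicit barrier at the end of each region
 * guarantees one_over_v and a are complete before they are read. */
void split_update(int n, const double *v, double *one_over_v,
                  double *a, double *b)
{
    assert(n >= 0);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
        one_over_v[i] = 1.0 / v[i];

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
        a[i] = a[i] * one_over_v[i];

    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
        b[i] = b[i] - a[i];
}
```

Both functions compute the same values, so any timing difference between shapes like these would come from loop size and scheduling rather than from the arithmetic. Whether that is what is happening in bdy_imp3 still needs the extra data point.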
PR Summary
Sci/Tech Reviewer:
Code Reviewer:
Linked Issues:
Umbrella: #560
Umbrella Work: #106
Code Quality Checklist
Testing
trac.log
Security Considerations
Performance Impact
AI Assistance and Attribution
Documentation
PSyclone Approval
Sci/Tech Review
(Please alert the code reviewer via a tag when you have approved the SR)
Code Review