Use LoopVectorization in julia stencil / transpose #543

@haampie

What type of issue is this?

  • Bug in the code or other problem
  • Inadequate/incorrect documentation
  • Feature request

LoopVectorization.jl usually does a better job of unrolling and vectorizing loops than the Julia compiler + LLVM. You might want to use it for some of the benchmarks.

For instance, on Zen 2 the stencil kernel runs about 7.6x faster with `@avx` (minimum time 24.744 ms vs. 3.234 ms):

$ ~/julia-1.6.0-rc1/bin/julia -O3

(@v1.6) pkg> activate --temp
  Activating new environment at `/tmp/jl_GYGsu9/Project.toml`

(jl_GYGsu9) pkg> add BenchmarkTools, LoopVectorization

julia> using LoopVectorization, BenchmarkTools

julia> r = 3;

julia> n = 1000;

julia> A = zeros(Float64, n, n);

julia> B = zeros(Float64, n, n);

julia> W = zeros(Float64, 2*r+1, 2*r+1);

julia> function do_stencil(A, W, B, r, n)
           for j=r:n-r-1
               for i=r:n-r-1
                   for jj=-r:r
                       for ii=-r:r
                           @inbounds B[i+1,j+1] += W[r+ii+1,r+jj+1] * A[i+ii+1,j+jj+1]
                       end
                   end
               end
           end
       end
do_stencil (generic function with 1 method)

julia> @benchmark do_stencil($A, $W, $B, $r, $n)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     24.744 ms (0.00% GC)
  median time:      24.799 ms (0.00% GC)
  mean time:        24.803 ms (0.00% GC)
  maximum time:     24.948 ms (0.00% GC)
  --------------
  samples:          202
  evals/sample:     1

julia> function do_stencil_avx(A, W, B, r, n)
           @avx for j=r:n-r-1, i=r:n-r-1, jj=-r:r, ii=-r:r
               B[i+1,j+1] += W[r+ii+1,r+jj+1] * A[i+ii+1,j+jj+1]
           end
       end
do_stencil_avx (generic function with 1 method)

julia> @benchmark do_stencil_avx($A, $W, $B, $r, $n)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     3.234 ms (0.00% GC)
  median time:      3.267 ms (0.00% GC)
  mean time:        3.275 ms (0.00% GC)
  maximum time:     3.452 ms (0.00% GC)
  --------------
  samples:          1527
  evals/sample:     1
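
The benchmark above runs on all-zero `A`, `B` and `W`, so the timings alone do not show that both kernels compute the same result. Below is a minimal sketch of an agreement check on random inputs. The names `stencil_ref`, `stencil_acc` and `kernels_agree` are hypothetical and not part of the benchmark code, and `stencil_acc` is only a Base-Julia stand-in for a second implementation so that the snippet runs without extra packages.

```julia
# Reference kernel: same loop nest and indexing as do_stencil above.
function stencil_ref(A, W, B, r, n)
    for j in r:n-r-1, i in r:n-r-1, jj in -r:r, ii in -r:r
        B[i+1, j+1] += W[r+ii+1, r+jj+1] * A[i+ii+1, j+jj+1]
    end
    return B
end

# Stand-in second implementation (hypothetical): accumulates each output
# point in a local variable before writing it back to B.
function stencil_acc(A, W, B, r, n)
    for j in r:n-r-1, i in r:n-r-1
        acc = zero(eltype(B))
        for jj in -r:r, ii in -r:r
            acc += W[r+ii+1, r+jj+1] * A[i+ii+1, j+jj+1]
        end
        B[i+1, j+1] += acc
    end
    return B
end

# Run two kernels with the signature (A, W, B, r, n) on the same random
# inputs, each with its own zeroed output, and compare the outputs.
function kernels_agree(f, g; r=3, n=64, rtol=1e-10)
    A = rand(n, n)
    W = rand(2r + 1, 2r + 1)
    B1 = zeros(n, n)
    B2 = zeros(n, n)
    f(A, W, B1, r, n)
    g(A, W, B2, r, n)
    return isapprox(B1, B2; rtol=rtol)
end

kernels_agree(stencil_ref, stencil_acc)
```

In the session above, the same check would presumably be called as `kernels_agree(do_stencil, do_stencil_avx)`. A relative tolerance is used rather than exact equality because a vectorized kernel may sum the terms in a different order.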
