Cache reflection dispatch on hot Read paths #161

Open: marklam wants to merge 1 commit into Apollo3zehn:master from marklam:reflection-caching

marklam (Contributor) commented Jun 4, 2026

Full commit title: Cache reflection dispatch on hot Read paths and memoise the decode pipeline

`NativeDataset.Read<T>`, `NativeAttribute.Read<T>`, and `DatatypeMessage`'s decode-info construction each previously ran `MakeGenericMethod` followed by `MethodInfo.Invoke` on every call. Repeated reads against the same dataset or attribute paid the full reflection cost every time, including allocating a boxed `object[]` argument array per call.
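For context, the uncached pattern looks roughly like this. It is a minimal sketch, not PureHDF's code: `UncachedDispatch`, `ReadCore`, and its `(int count)` signature are hypothetical stand-ins. Every call closes the generic method over the runtime types and goes through `MethodInfo.Invoke`, which needs a fresh `object[]` and boxes the `int` argument.

```csharp
using System;
using System.Reflection;

// Sketch of the uncached dispatch pattern (hypothetical names, not PureHDF's code).
public static class UncachedDispatch
{
    // Hypothetical generic reader standing in for the real Read implementation.
    public static string ReadCore<TResult, TElement>(int count) =>
        $"{typeof(TResult).Name}/{typeof(TElement).Name}:{count}";

    private static readonly MethodInfo s_openReadCore =
        typeof(UncachedDispatch).GetMethod(nameof(ReadCore))!;

    public static string Read(Type resultType, Type elementType, int count)
    {
        // Paid on every call: close the generic method over the runtime types...
        MethodInfo closed = s_openReadCore.MakeGenericMethod(resultType, elementType);

        // ...then invoke through reflection, allocating an object[] and boxing `count`.
        return (string)closed.Invoke(null, new object[] { count })!;
    }
}
```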

This commit adds three caches around those sites:

- `NativeDataset` / `NativeAttribute`: a static `ConcurrentDictionary<(Type, Type), Delegate>` keyed by `(TResult, TElement)` holding a typed `ReaderDelegate` built via `MethodInfo.CreateDelegate`. After the first call for a given type pair, dispatch is a dictionary lookup plus a direct delegate invocation.

- `DatatypeMessage.GetDecodeInfo<TElement>`: a per-instance `ConcurrentDictionary<(Type, bool), Delegate>` caching the entire closure tree per `(TElement, isRawMode)`. The datatype message is immutable after the file is decoded, so the closure can safely be reused across reads.

- `DatatypeMessage.GetDecodeInfoForUnmanagedElement(Type)`: a static `ConcurrentDictionary<Type, ElementDecodeDelegate>` caching the per-element decoder. Previously the closure called `MethodInfo.Invoke` on every element of every read, allocating a boxed argument array per call.
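The first cache can be sketched as follows. This is a minimal illustration under assumptions, not the PR's code: `CachedDispatch`, `ReadCore`, and this `ReaderDelegate` signature are hypothetical, and the result type is fixed to `string` for simplicity. The generic method is closed and bound to a typed delegate once per `(TResult, TElement)` pair; later calls are a dictionary lookup plus a direct delegate call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;

// Sketch of a (TResult, TElement) -> typed delegate cache (hypothetical names).
public static class CachedDispatch
{
    // Hypothetical typed delegate matching ReadCore's shape.
    public delegate string ReaderDelegate(int count);

    public static string ReadCore<TResult, TElement>(int count) =>
        $"{typeof(TResult).Name}/{typeof(TElement).Name}:{count}";

    private static readonly MethodInfo s_openReadCore =
        typeof(CachedDispatch).GetMethod(nameof(ReadCore))!;

    // One entry per (TResult, TElement) pair, shared by all callers.
    private static readonly ConcurrentDictionary<(Type, Type), Delegate> s_readers = new();

    public static int CachedPairCount => s_readers.Count;

    public static string Read(Type resultType, Type elementType, int count)
    {
        // First call for a pair: reflect once and bind a typed delegate.
        // Later calls: a dictionary lookup only.
        var reader = (ReaderDelegate)s_readers.GetOrAdd(
            (resultType, elementType),
            static key => s_openReadCore
                .MakeGenericMethod(key.Item1, key.Item2)
                .CreateDelegate(typeof(ReaderDelegate)));

        // Direct delegate call: no object[] and no boxing of `count`.
        return reader(count);
    }
}
```

`GetOrAdd` may run the factory more than once under contention, but only one delegate is stored per key, and a discarded duplicate is equivalent, so the race is harmless.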

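Also includes a workaround in `GetDecodeInfoForUnmanagedMemory<T>` for an interaction with .NET 8 tiered compilation. With the delegate cached and reused across many calls, the `checked(...)` multiplication inside `MemoryMarshal.AsBytes` intermittently threw `OverflowException` on inputs that cannot overflow (`Length=1`, `sizeof(T)=12`). Constructing the byte span directly via `MemoryMarshal.CreateSpan` sidesteps the throw site, and disabling tiered compilation (`DOTNET_TieredCompilation=0`) also eliminates the symptom. The root cause may be a JIT problem, or may be an Intel Raptor Lake problem (dotnet/runtime#128976).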
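The two ways of viewing a `Span<T>` as bytes look roughly like this. It is a minimal sketch assuming a hypothetical `ByteViews` helper and a hypothetical 12-byte `Compound12` element, not the PR's actual change, and whether this exact shape avoids the tiering issue is not established here. `ViaCreateSpan` keeps the reference-type check that `AsBytes` performs and replaces the `checked` multiplication with an explicit 64-bit length check.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical 12-byte compound element (three 4-byte ints, sequential layout).
public struct Compound12
{
    public int A, B, C;
}

public static class ByteViews
{
    // Original shape: AsBytes computes the byte length with checked(Length * sizeof(T)).
    public static Span<byte> ViaAsBytes<T>(Span<T> values) where T : struct =>
        MemoryMarshal.AsBytes(values);

    // Workaround shape: build the byte span directly from a ref to the first element.
    public static Span<byte> ViaCreateSpan<T>(Span<T> values) where T : struct
    {
        // Mirror the safety check AsBytes performs.
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
            throw new ArgumentException("T must not contain references.", nameof(values));

        // Compute the length in 64 bits instead of with checked int arithmetic.
        long byteLength = (long)values.Length * Unsafe.SizeOf<T>();
        if (byteLength > int.MaxValue)
            throw new ArgumentOutOfRangeException(nameof(values));

        return MemoryMarshal.CreateSpan(
            ref Unsafe.As<T, byte>(ref MemoryMarshal.GetReference(values)),
            (int)byteLength);
    }
}
```

Both methods return views over the same memory, so writing through one is visible through the other.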

Two test assertions were adjusted: `CanRead_Dataspace_Null` on dataset and attribute previously expected `Assert.Throws<TargetInvocationException>` because `MethodInfo.Invoke` wrapped the inner exception. With direct delegate dispatch, exceptions surface unwrapped, so the tests now expect the inner exception type directly.
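The behaviour change behind the adjusted tests can be shown in isolation. This is a minimal sketch: the hypothetical `Fail` method stands in for a read of a null dataspace, and `InvalidOperationException` is an assumption, since the exception type PureHDF actually throws is not stated here.

```csharp
using System;
using System.Reflection;

public static class ExceptionSurfacing
{
    // Hypothetical stand-in for a read that fails (e.g. on a null dataspace).
    public static int Fail() => throw new InvalidOperationException("dataspace is null");

    private static readonly MethodInfo s_fail =
        typeof(ExceptionSurfacing).GetMethod(nameof(Fail))!;

    // Reflection call: the runtime wraps the exception in TargetInvocationException.
    public static Exception? CatchViaInvoke()
    {
        try { s_fail.Invoke(null, null); }
        catch (Exception ex) { return ex; }
        return null;
    }

    // Delegate call: the original exception propagates unchanged.
    public static Exception? CatchViaDelegate()
    {
        var fail = (Func<int>)s_fail.CreateDelegate(typeof(Func<int>));
        try { fail(); }
        catch (Exception ex) { return ex; }
        return null;
    }
}
```

In test terms, an assertion of the form `Assert.Throws<TargetInvocationException>(...)` becomes `Assert.Throws<TInner>(...)`, where `TInner` is the type the read itself throws.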

# Benchmark

`benchmarks/PureHDF.Benchmarks/ReflectionDispatch.cs` (new) isolates the per-call dispatch cost. Each method issues 10,000 small reads against scalar `int` / scalar 12-byte compound datasets and attributes, with all caches warmed before the measured loop.

Hardware: 13th Gen Intel Core i9-13900KS, Windows 11, .NET SDK 10.0.300,
runtime 8.0.27, BenchmarkDotNet 0.15.8, default tiered compilation.

Times normalised to microseconds.
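The measurement shape described above (warm once, then time a fixed number of small reads) can be approximated without BenchmarkDotNet. This is a rough Stopwatch-based sketch with a hypothetical `DispatchTiming` helper, not the PR's `ReflectionDispatch.cs`; it lacks BenchmarkDotNet's pilot runs, repeated iterations, and statistics, so its numbers are not comparable to the tables below.

```csharp
using System;
using System.Diagnostics;

// Rough sketch of "warm the caches, then time N small reads" (hypothetical helper).
public static class DispatchTiming
{
    public readonly record struct Result(int Reads, TimeSpan Elapsed, long AllocatedBytes);

    public static Result Measure(Func<int> read, int reads = 10_000)
    {
        // Warm-up call: populates lazily built caches. A single call does not
        // bring tiered JIT to steady state; BenchmarkDotNet handles that properly.
        read();

        long allocatedBefore = GC.GetAllocatedBytesForCurrentThread();
        var stopwatch = Stopwatch.StartNew();
        int checksum = 0;
        for (int i = 0; i < reads; i++)
            checksum += read(); // consume results so the loop does real work
        stopwatch.Stop();
        long allocated = GC.GetAllocatedBytesForCurrentThread() - allocatedBefore;

        GC.KeepAlive(checksum);
        return new Result(reads, stopwatch.Elapsed, allocated);
    }
}
```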

## Before (master)

| Method                       |        Mean |     Error |    StdDev |      Gen0 | Allocated |
|------------------------------|------------:|----------:|----------:|----------:|----------:|
| Dataset_ReadScalarInt        |  9,803.0 μs |  190.5 μs |  219.4 μs | 1390.6250 |  25.02 MB |
| Dataset_ReadScalarCompound   | 11,473.0 μs |  203.8 μs |  190.7 μs | 1531.2500 |  27.85 MB |
| Attribute_ReadScalarInt      |  5,752.0 μs |   95.7 μs |   89.5 μs |  507.8125 |   9.23 MB |
| Attribute_ReadScalarCompound |  7,555.0 μs |  115.2 μs |  102.2 μs |  656.2500 |  12.05 MB |

## After (this commit)

| Method                       |        Mean |     Error |    StdDev |      Gen0 | Allocated |
|------------------------------|------------:|----------:|----------:|----------:|----------:|
| Dataset_ReadScalarInt        |  4,048.5 μs |   78.4 μs |  139.4 μs |  945.3125 |  17.09 MB |
| Dataset_ReadScalarCompound   |  5,298.6 μs |  101.5 μs |  108.6 μs | 1046.8750 |  18.92 MB |
| Attribute_ReadScalarInt      |    669.5 μs |    9.1 μs |    7.6 μs |  118.1641 |   2.14 MB |
| Attribute_ReadScalarCompound |  1,790.0 μs |   17.9 μs |   14.9 μs |  220.7031 |   3.97 MB |

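## Summary

| Method                       | Speedup | Alloc ratio |
|------------------------------|--------:|------------:|
| Dataset_ReadScalarInt        |    2.4× |       0.68× |
| Dataset_ReadScalarCompound   |    2.2× |       0.68× |
| Attribute_ReadScalarInt      |    8.6× |       0.23× |
| Attribute_ReadScalarCompound |    4.2× |       0.33× |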
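The Summary columns are consistent with the following definitions (inferred from the tables rather than stated in the PR), checked here against the `Dataset_ReadScalarInt` and `Attribute_ReadScalarInt` rows:

```math
\begin{aligned}
\text{Speedup} &= \frac{\text{Mean}_{\text{before}}}{\text{Mean}_{\text{after}}}
  & \text{Alloc ratio} &= \frac{\text{Allocated}_{\text{after}}}{\text{Allocated}_{\text{before}}} \\
\text{Dataset\_ReadScalarInt:}\ \ \frac{9803.0}{4048.5} &\approx 2.42
  & \frac{17.09}{25.02} &\approx 0.683 \\
\text{Attribute\_ReadScalarInt:}\ \ \frac{5752.0}{669.5} &\approx 8.59
  & \frac{2.14}{9.23} &\approx 0.232
\end{aligned}
```

These round to the 2.4× / 0.68× and 8.6× / 0.23× entries above; the compound rows check out the same way.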
