[experiment] JIT: Introduce KnownBits #129082
Draft
EgorBo wants to merge 6 commits into
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Pull request overview
This PR adds a new JIT-internal KnownBits analysis (LLVM-style “known zero/known one” bit lattice) and wires it into assertion propagation so the JIT can fold more comparisons (including TYP_LONG) and prove more casts redundant/overflow-safe using bit-level facts.
Changes:
- Introduces KnownBits/KnownBitsOps (bit lattice + transfer functions) and a KnownBits::Compute VN/assertion-driven analysis.
- Uses KnownBits in global assertion propagation to fold relops and to remove/relax casts when provably safe.
- Updates JIT build wiring to compile the new implementation.
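
To make the "known zero/known one" lattice concrete, here is a minimal sketch of such a type with the AND/OR transfer functions. It is an illustration only: `KnownBitsSketch`, `AndSketch` and `OrSketch` are hypothetical names, the width is fixed at 64 bits, and the PR's actual `KnownBits`/`KnownBitsOps` may be laid out differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the PR's KnownBits type (64-bit only).
struct KnownBitsSketch
{
    uint64_t knownZero; // bit i set => bit i of the value is definitely 0
    uint64_t knownOne;  // bit i set => bit i of the value is definitely 1

    // Nothing is known about any bit.
    static KnownBitsSketch Unknown()
    {
        return {0, 0};
    }

    // Every bit of a constant is known.
    static KnownBitsSketch FromConstant(uint64_t value)
    {
        return {~value, value};
    }

    // A bit cannot be known-zero and known-one at the same time.
    bool IsConsistent() const
    {
        return (knownZero & knownOne) == 0;
    }

    // True when every bit is known, i.e. the value is a single constant.
    bool IsConstant() const
    {
        return (knownZero | knownOne) == ~0ull;
    }
};

// x & y: a result bit is 0 if it is 0 in either operand, and 1 only if it is 1 in both.
inline KnownBitsSketch AndSketch(KnownBitsSketch a, KnownBitsSketch b)
{
    assert(a.IsConsistent() && b.IsConsistent());
    return {a.knownZero | b.knownZero, a.knownOne & b.knownOne};
}

// x | y: a result bit is 1 if it is 1 in either operand, and 0 only if it is 0 in both.
inline KnownBitsSketch OrSketch(KnownBitsSketch a, KnownBitsSketch b)
{
    assert(a.IsConsistent() && b.IsConsistent());
    return {a.knownZero & b.knownZero, a.knownOne | b.knownOne};
}
```

For example, masking a completely unknown value with the constant 0xFF yields a value whose upper 56 bits are known zero, which is the kind of fact an interval range also captures, while `x | 1` yields "bit 0 is one", which an interval cannot express.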
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/coreclr/jit/knownbits.h | New KnownBits lattice + transfer helpers (And/Or/UDiv/Cast/EvalRelop) and analysis entrypoint. |
| src/coreclr/jit/knownbits.cpp | Implements VN/assertion-based KnownBits computation, including PHI merging and assertion refinement. |
| src/coreclr/jit/CMakeLists.txt | Adds KnownBits sources/headers to the JIT build. |
| src/coreclr/jit/assertionprop.cpp | Hooks KnownBits into global relop folding and cast simplification paths; widens some assertion creation gates to include TYP_LONG. |
Build on the initial KnownBits analysis by porting more LLVM-style transfer functions and wiring KnownBits into more assertion-prop consumers. Each addition was measured per-feature via SPMI asmdiffs on libraries.pmi and benchmarks.run, and additions that did not clear an 80-byte bar (or regressed) were dropped.

Transfer functions (knownbits.h): add Mul (leading-zeros + low-bits), constant LSH/RSZ/RSH shifts, and URem, ported from llvm/lib/Support/KnownBits.cpp. (XOR/NOT/ADD/SUB/NEG were prototyped but trimmed: each <80 bytes or net harmful.)

ComputeWorker (knownbits.cpp): handle VNF_MUL/UMOD/LSH/RSH/RSZ.

MergeKnownBitsAssertions: read "num u< otherVN" / "num u<= otherVN" (num inherits otherVN's leading-zero bits) and pin the last unknown bit from "num != const".

Consumers (assertionprop.cpp):
* optAssertionProp_RangeProperties now derives non-negative/non-zero from known bits, covering TYP_LONG and bit patterns the interval range cannot express.
* optAssertionProp_AddMulSub clears the overflow flag for TYP_LONG ADD/SUB/MUL when known bits prove the operation cannot overflow.
* 64-bit (TYP_LONG) relop assertions are generated and consumed end-to-end.

(Bounds-check elimination and generic constant-folding were prototyped but trimmed: <80 bytes and/or net regressions.)

Add JitEnableKnownBits (default 1) to disable the analysis and all its consumers.

Diffs (win-x64 libraries.pmi): -16,220 bytes, 572 contexts (488 improvements, 3 regressions); linux-arm64: -16,184 bytes. SuperPMI replay is clean on win-x64, linux-x64 and linux-arm64.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
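
The "num inherits otherVN's leading-zero bits" refinement can be sketched as follows. This is an illustrative reconstruction, not the code in knownbits.cpp: `KnownBitsSketch`, `CountLeadingKnownZeros` and `RefineByUnsignedUpperBound` are hypothetical names, and the width is assumed to be 64 bits. The reasoning is that if `num u<= bound` and the top k bits of `bound` are known zero, then `bound < 2^(64-k)`, so the top k bits of `num` must be zero too.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the PR's KnownBits type (64-bit only).
struct KnownBitsSketch
{
    uint64_t knownZero; // bit i set => bit i of the value is definitely 0
    uint64_t knownOne;  // bit i set => bit i of the value is definitely 1
};

// Number of consecutive most-significant bits that are known to be zero.
inline unsigned CountLeadingKnownZeros(const KnownBitsSketch& kb)
{
    unsigned count = 0;
    for (int bit = 63; bit >= 0 && ((kb.knownZero >> bit) & 1) != 0; bit--)
    {
        count++;
    }
    return count;
}

// Refine "num" given an assertion "num u< bound" or "num u<= bound":
// num can be no larger than bound, so it shares bound's known leading zeros.
inline KnownBitsSketch RefineByUnsignedUpperBound(KnownBitsSketch num, const KnownBitsSketch& bound)
{
    const unsigned leadingZeros = CountLeadingKnownZeros(bound);
    if (leadingZeros > 0)
    {
        // Mask with the top "leadingZeros" bits set; shifting by 64 is undefined, so special-case it.
        const uint64_t highMask = (leadingZeros == 64) ? ~0ull : ~(~0ull >> leadingZeros);
        num.knownZero |= highMask;
    }
    return num;
}
```

A real implementation would also need to decide what to do when the refinement contradicts bits already known to be one (an unreachable path); the sketch ignores that case.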
Add per-function citations to the corresponding routines in llvm/lib/Support/KnownBits.cpp (udiv, mul, urem/remGetLowBits, shl/lshr/ashr, eq/ne/ult/...), note the intentional simplifications (constant-shift only; udiv/mul refinements dropped), and document that Intersect/Union are LLVM's unionWith/intersectWith with the names inverted (they describe the value-set operation). Also note the lattice is a fixed-width adaptation of LLVM's APInt-based KnownBits.

Comments only; no behavior change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
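
The naming inversion is easy to get wrong, so here is a small sketch of the two merges. The names `KnownBitsSketch`, `UnionSketch` and `IntersectSketch` are hypothetical, and the bodies are an assumption based on the description above: "Union" widens the set of possible values (so only facts both inputs agree on survive, as in a PHI merge), while "Intersect" narrows it (so facts from both inputs are combined).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the PR's KnownBits type (64-bit only).
struct KnownBitsSketch
{
    uint64_t knownZero; // bit i set => bit i of the value is definitely 0
    uint64_t knownOne;  // bit i set => bit i of the value is definitely 1
};

// Value-set union: the value may come from either input (e.g. a PHI),
// so a bit stays known only if both inputs agree on it.
// This matches what LLVM calls intersectWith (intersection of the known facts).
inline KnownBitsSketch UnionSketch(KnownBitsSketch a, KnownBitsSketch b)
{
    return {a.knownZero & b.knownZero, a.knownOne & b.knownOne};
}

// Value-set intersection: both descriptions hold for the same value,
// so the known facts accumulate.
// This matches what LLVM calls unionWith (union of the known facts).
inline KnownBitsSketch IntersectSketch(KnownBitsSketch a, KnownBitsSketch b)
{
    KnownBitsSketch result = {a.knownZero | b.knownZero, a.knownOne | b.knownOne};
    // Contradictory inputs would mean the value set is empty (unreachable code).
    assert((result.knownZero & result.knownOne) == 0);
    return result;
}
```

Merging the constants 4 and 6 with `UnionSketch` leaves bit 2 known one and bit 1 unknown, since the two constants disagree only on bit 1.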
Two new assertion-prop consumers that use KnownBits, plus the XOR transfer function that feeds them:
* optAssertionProp_KnownBitsSimplify: remove a redundant constant mask on AND/OR -- "x & C" => "x" when every possibly-set bit of x is set in C, "x | C" => "x" when every set bit of C is already known-one in x, and "x & C" => 0 when x has no possibly-set bit in C.
* optAssertionProp_BndsChk: drop a bounds check when known bits prove (uint)index < (uint)length (e.g. masked indices).
* ComputeWorker now handles VNF_XOR.

Mask simplification is the main win and scales with code volume (it fires ~1,550x on libraries_tests.run).

Diffs vs the prior commit: libraries_tests.run -10,583 bytes, libraries.pmi -777 bytes; net improvement on every collection measured. SuperPMI replay clean on win-x64.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
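
The three mask rules and the bounds-check test reduce to a few bitwise checks. The sketch below restates them under assumptions: `KnownBitsSketch`, `MaskSimplification`, `SimplifyAnd`, `SimplifyOr` and `IndexProvablyInBounds` are hypothetical names, and the bounds check assumes 32-bit index and length. It is not the code in assertionprop.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the PR's KnownBits type (64-bit only).
struct KnownBitsSketch
{
    uint64_t knownZero; // bit i set => bit i of the value is definitely 0
    uint64_t knownOne;  // bit i set => bit i of the value is definitely 1
};

enum class MaskSimplification
{
    None,               // keep the operation
    ReplaceWithOperand, // "x op C" => "x"
    ReplaceWithZero     // "x & C"  => 0
};

// "x & C": decide whether the mask is redundant or clears everything.
inline MaskSimplification SimplifyAnd(const KnownBitsSketch& x, uint64_t c)
{
    const uint64_t possiblySet = ~x.knownZero;
    if ((possiblySet & c) == 0)
    {
        return MaskSimplification::ReplaceWithZero; // no bit of x can survive the mask
    }
    if ((possiblySet & ~c) == 0)
    {
        return MaskSimplification::ReplaceWithOperand; // the mask keeps every bit x can have
    }
    return MaskSimplification::None;
}

// "x | C": redundant when every set bit of C is already known-one in x.
inline MaskSimplification SimplifyOr(const KnownBitsSketch& x, uint64_t c)
{
    return ((c & ~x.knownOne) == 0) ? MaskSimplification::ReplaceWithOperand : MaskSimplification::None;
}

// (uint)index < (uint)length holds for all values when umax(index) < umin(length).
inline bool IndexProvablyInBounds(const KnownBitsSketch& index, const KnownBitsSketch& length)
{
    const uint32_t indexMax  = static_cast<uint32_t>(~index.knownZero); // all unknown bits set
    const uint32_t lengthMin = static_cast<uint32_t>(length.knownOne);  // all unknown bits clear
    return indexMax < lengthMin;
}
```

The masked-index case from the commit message falls out directly: for `index = i & 15` only the low four bits can be set, so `umax(index)` is 15, and the check is provable against any length whose known-one bits give a minimum of at least 16.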
Comment on lines +79 to +83

    // Is bit "pos" known to be 0?
    bool IsBitZero(unsigned pos) const
    {
        return (knownZero & (1ull << pos)) != 0;
    }

MergeKnownBitsAssertions handled unsigned OAK_LT_UN/OAK_LE_UN against a constant
but ignored the signed OAK_LT/OAK_LE forms, so a pattern like
if (a > 10 && a < 1000) // a is long
... checked((int)a) ...
did not learn that 'a' fits in an int: the 'a > 10' assertion proved a >= 0 (sign
bit 0) but the 'a < 1000' assertion was dropped, leaving the upper bits unknown and
the overflow check in place.
Now a signed 'num < C' / 'num <= C' assertion with a non-negative bound records a
candidate upper bound, which is applied after the assertion loop once num is also
known non-negative: num is then in [0, C-1] (or [0, C] for '<='), so its upper bits
are known zero. This lets optAssertionProp_Cast drop the checked cast's overflow
check in the example.
Diffs on libraries.pmi win-x64: -80 bytes, 7 improvements, 0 regressions (plus
PerfScore wins from removed overflow branches). SuperPMI replay clean on libraries.pmi.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
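
The step from "non-negative and below a small bound" to "upper bits are known zero" can be sketched as below. The names `KnownBitsSketch`, `ApplySignedUpperBound` and `IsKnownNonNegativeInt32` are hypothetical, the bound is passed as an inclusive maximum (so `num < 1000` becomes 999), and the sketch is an assumption about the technique rather than the code in MergeKnownBitsAssertions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the PR's KnownBits type (64-bit only).
struct KnownBitsSketch
{
    uint64_t knownZero; // bit i set => bit i of the value is definitely 0
    uint64_t knownOne;  // bit i set => bit i of the value is definitely 1
};

// Apply a signed upper bound "num <= maxInclusive" once num is known non-negative.
// The bound is only usable when both facts hold; otherwise num is returned unchanged.
inline KnownBitsSketch ApplySignedUpperBound(KnownBitsSketch num, int64_t maxInclusive)
{
    const bool signBitKnownZero = (num.knownZero >> 63) != 0;
    if (!signBitKnownZero || (maxInclusive < 0))
    {
        return num;
    }

    // Smear the highest set bit of the bound downwards to get the smallest
    // mask of the form 2^k - 1 that covers every value in [0, maxInclusive].
    uint64_t mask = static_cast<uint64_t>(maxInclusive);
    mask |= mask >> 1;
    mask |= mask >> 2;
    mask |= mask >> 4;
    mask |= mask >> 8;
    mask |= mask >> 16;
    mask |= mask >> 32;

    num.knownZero |= ~mask; // every bit above the mask must be zero
    return num;
}

// True when bits 31..63 are all known zero, i.e. the value is in [0, INT32_MAX]
// and a checked (int) cast of it cannot overflow.
inline bool IsKnownNonNegativeInt32(const KnownBitsSketch& kb)
{
    return (kb.knownZero >> 31) == 0x1FFFFFFFFull;
}
```

In the example above, `a > 10` supplies the known-zero sign bit and `a < 1000` supplies the bound 999, whose covering mask is 0x3FF; bits 10..63 then become known zero.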
In ComputeWorker the known bits of a comparison VN were always reported as the [0, 1] bound. When the operands' known bits already settle the comparison, use KnownBitsOps::EvalRelop to report the exact constant instead (0 = false, 1 = true), falling back to [0, 1] when undetermined.

This lets a comparison whose result is statically known propagate as a constant into the operations and comparisons that consume it (e.g. a nested relop operand), rather than only being known to be 0 or 1.

Diffs on libraries.pmi win-x64: -252 bytes, 9 improvements, 0 regressions. SuperPMI replay clean on libraries.pmi.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
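
As an illustration of how a relop can be settled from known bits, the sketch below evaluates an unsigned less-than and turns the outcome into the known bits of the comparison result. `KnownBitsSketch`, `TriState`, `EvalUnsignedLess` and `RelopResultBits` are hypothetical names; the real KnownBitsOps::EvalRelop covers more operators and may be structured differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the PR's KnownBits type (64-bit only).
struct KnownBitsSketch
{
    uint64_t knownZero; // bit i set => bit i of the value is definitely 0
    uint64_t knownOne;  // bit i set => bit i of the value is definitely 1
};

enum class TriState
{
    False,
    True,
    Unknown
};

// Evaluate "a u< b" from known bits alone.
// The smallest possible value has every unknown bit clear (knownOne),
// the largest has every unknown bit set (~knownZero).
inline TriState EvalUnsignedLess(const KnownBitsSketch& a, const KnownBitsSketch& b)
{
    const uint64_t aMin = a.knownOne;
    const uint64_t aMax = ~a.knownZero;
    const uint64_t bMin = b.knownOne;
    const uint64_t bMax = ~b.knownZero;

    if (aMax < bMin)
    {
        return TriState::True; // even the largest a is below the smallest b
    }
    if (aMin >= bMax)
    {
        return TriState::False; // even the smallest a is not below the largest b
    }
    return TriState::Unknown;
}

// Known bits of the comparison result: an exact constant when settled,
// otherwise just "the value is 0 or 1" (bits 1..63 known zero).
inline KnownBitsSketch RelopResultBits(TriState result)
{
    if (result == TriState::True)
    {
        return {~1ull, 1};
    }
    if (result == TriState::False)
    {
        return {~0ull, 0};
    }
    return {~1ull, 0};
}
```

The difference between the last two cases is what the commit exploits: a settled comparison reports a fully known constant, so a consumer such as a nested relop can fold in turn.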
Comment on lines +4124 to +4140

    if (isUnsigned)
    {
        const T aMax = (T)a.GetUMax(width);
        const T bMax = (T)b.GetUMax(width);
        switch (oper)
        {
            case GT_ADD:
                return !CheckedOps::AddOverflows<T>(aMax, bMax, CheckedOps::Unsigned);
            case GT_SUB:
                // Unsigned a - b underflows iff a < b; safe iff umin(a) >= umax(b).
                return a.GetUMin(width) >= b.GetUMax(width);
            case GT_MUL:
                return !CheckedOps::MulOverflows<T>(aMax, bMax, CheckedOps::Unsigned);
            default:
                return false;
        }
    }

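
For readers without the JIT sources at hand, the unsigned branch above can be approximated in standalone form. This is a sketch under assumptions: `KnownBitsSketch`, `OperSketch`, `WidthMask`, `UMax`, `UMin` and `UnsignedOperCannotOverflow` are hypothetical names, and plain arithmetic comparisons stand in for the JIT's CheckedOps helpers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the PR's KnownBits type.
struct KnownBitsSketch
{
    uint64_t knownZero; // bit i set => bit i of the value is definitely 0
    uint64_t knownOne;  // bit i set => bit i of the value is definitely 1
};

enum class OperSketch
{
    Add,
    Sub,
    Mul
};

// All-ones mask for a 32-bit or 64-bit operation.
inline uint64_t WidthMask(unsigned width)
{
    assert((width == 32) || (width == 64));
    return (width == 64) ? ~0ull : 0xFFFFFFFFull;
}

// Largest possible unsigned value: every unknown bit set.
inline uint64_t UMax(const KnownBitsSketch& kb, unsigned width)
{
    return ~kb.knownZero & WidthMask(width);
}

// Smallest possible unsigned value: every unknown bit clear.
inline uint64_t UMin(const KnownBitsSketch& kb, unsigned width)
{
    return kb.knownOne & WidthMask(width);
}

// True when the unsigned operation provably cannot overflow at the given width.
inline bool UnsignedOperCannotOverflow(OperSketch oper, const KnownBitsSketch& a, const KnownBitsSketch& b, unsigned width)
{
    const uint64_t limit = WidthMask(width);
    const uint64_t aMax  = UMax(a, width);
    const uint64_t bMax  = UMax(b, width);
    switch (oper)
    {
        case OperSketch::Add:
            // aMax + bMax <= limit, written so the test itself cannot wrap.
            return aMax <= (limit - bMax);
        case OperSketch::Sub:
            // a - b underflows iff a < b; safe iff umin(a) >= umax(b).
            return UMin(a, width) >= bMax;
        case OperSketch::Mul:
            // aMax * bMax <= limit, using division to avoid wrapping.
            return (aMax == 0) || (bMax <= (limit / aMax));
        default:
            return false;
    }
}
```

Checking only the maxima is sufficient for ADD and MUL because unsigned addition and multiplication are monotonic in both operands; SUB needs the minimum of the left operand instead.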
Comment on lines +4218 to +4228

    // Known-bits based no-overflow proof. This also covers TYP_LONG operations (which the range-based
    // path above does not handle) and bit patterns an interval range cannot express.
    if (!optLocalAssertionProp && tree->gtOverflow() && varTypeIsIntegral(tree))
    {
        const unsigned width = (genActualType(tree) == TYP_LONG) ? 64 : 32;
        const KnownBits kb1 = KnownBits::Compute(this, optConservativeNormalVN(tree->gtGetOp1()), assertions);
        const KnownBits kb2 = KnownBits::Compute(this, optConservativeNormalVN(tree->gtGetOp2()), assertions);
        if (knownBitsOperCannotOverflow(tree->OperGet(), tree->IsUnsigned(), kb1, kb2, width))
        {
            tree->ClearOverflow();
            return optAssertionProp_Update(tree, tree, stmt);

Closes #105333
Closes #111567
Most optimizing compilers (e.g. LLVM, MSVC) track value ranges with KnownBits rather than by tracking Min-Max values.
We have Range, which is 32-bit only. Making it 64-bit would likely require a lot of effort: fixing casts between 32-bit and 64-bit and fighting all kinds of overflow and correctness issues. Even then, it is only good for representing contiguous ranges.
KnownBits looks like this:
My estimate is that it can bring 500k-600k diffs for win-x64 if extended (I've seen 450k with +400 LOC); the current version should be -320kb on win-x64, with somewhat nice improvements in non-test collections.
I'm not sure it can fully replace Range, just as Range can't replace KnownBits, but some usages can definitely be re-routed to KnownBits.
This PR is basically an attempt to mimic LLVM's impl
Not done:
Diffs: -662244 bytes on linux-arm64