Commit 8044090

Set P to zero for compact sketches.
Added storage layout documentation for the compact compressed sketch.
1 parent dd45187 commit 8044090

2 files changed

Lines changed: 31 additions & 1 deletion

src/main/java/org/apache/datasketches/theta/CompactOperations.java

Lines changed: 1 addition & 1 deletion
@@ -253,7 +253,7 @@ static MemorySegment loadCompactMemorySegment(
     }
     if (preLongs > 1) {
       insertCurCount(dstWSeg, curCount);
-      insertP(dstWSeg, (float) 1.0);
+      insertP(dstWSeg, (float) 0.0); //0.0 to be consistent with C++
     }
     if (preLongs > 2) {
       insertThetaLong(dstWSeg, thetaLong);
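
To make the effect of this one-line change concrete, here is a minimal sketch of how the four bytes of the P field differ between the old and new values. The class name `PFieldSketch`, the byte offset 12, and the little-endian byte order are assumptions made for illustration; none of them is shown in this diff, and this is not the library's own code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Illustration only: not code from CompactOperations or PreambleUtil.
final class PFieldSketch {
  // ASSUMPTION: P is a 4-byte float at byte offset 12 of the preamble.
  static final int ASSUMED_P_OFFSET = 12;

  // Returns the four bytes that would hold P, written little-endian (also an assumption).
  static byte[] encodeP(float p) {
    byte[] preamble = new byte[16]; // two preamble longs, every other field left as zero
    ByteBuffer.wrap(preamble).order(ByteOrder.LITTLE_ENDIAN).putFloat(ASSUMED_P_OFFSET, p);
    return Arrays.copyOfRange(preamble, ASSUMED_P_OFFSET, ASSUMED_P_OFFSET + 4);
  }
}
```

Under these assumptions, `encodeP(1.0f)` yields the IEEE 754 pattern `0x3F800000` (bytes `00 00 80 3F`), while `encodeP(0.0f)` yields four zero bytes. One plausible reading of the inline comment, which the diff itself does not state, is that writing zeros keeps the Java image byte-for-byte identical to the one produced by the C++ implementation.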

src/main/java/org/apache/datasketches/theta/PreambleUtil.java

Lines changed: 30 additions & 0 deletions
@@ -109,6 +109,36 @@
  *  3   ||----------------------Start of Compact Long Array----------------------------------|
  * </pre>
  *
+ * <p>The compressed CompactSketch has 8 bytes of preamble in exact mode because Theta can
+ * be assumed to be 1.0. In estimating mode, the 2nd 8 bytes is Theta as a Long. The following
+ * table assumes estimating mode. In any case the number of retained entries starts immediately
+ * after, followed immediately by the delta encoded compressed byte array.</p>
+ * Unique to this table:
+ * <ul><li>Byte 3: entryBits (entBits): max number of bits for any one 64 bit hash not
+ * including leading zeros. A value in the range [1,63].</li>
+ * <li>Byte 4: numEntriesBytes (numEB): number of bytes required to hold the integer of number
+ * of retained entries not including leading zero bytes. A value in the range [1,4].</li>
+ * <li>The number of retained entries is stored starting at byte 16 (assuming estimating mode)
+ * and may extend through bytes 17, 18 and 19. In any case, the delta encoded compressed array
+ * starts immediately after and could start at byte 17, 18, 19 or 20.</li>
+ * </ul>
+ *
+ * <pre>
+ * Long || Start Byte Adr:
+ * Adr:
+ *      ||    7   |    6   |    5   |    4   |    3   |    2   |    1   |     0              |
+ *  0   ||    Seed Hash    | Flags  | numEB  | entBits| FamID  | SerVer |   PreLongs = 3     |
+ *
+ *      ||   15   |   14   |   13   |   12   |   11   |   10   |    9   |     8              |
+ *  1   ||------------------------------THETA_LONG-------------------------------------------|
+ *
+ *      ||        |        |        |  (20)  |  (19)  |  (18)  |  (17)  |    16              |
+ *  2   ||----------------Retained Entries stored as 1 to 4 bytes----------------------------|
+ *
+ *      ||        |        |        |        |        |        |        |                    |
+ *  3   ||------------------Delta encoded compressed byte array------------------------------|
+ * </pre>
+ *
  * <p>The UpdateSketch and AlphaSketch require 24 bytes of preamble followed by a non-compact
  * array of longs representing a hash table.</p>
  *
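
The layout documented in this hunk can be read back with a few lines of Java. The following is a rough sketch under stated assumptions rather than the library's parser. The class `CompressedPreambleSketch` and its `parse` method are hypothetical. Little-endian byte order is assumed, as is the use of `Long.MAX_VALUE` to represent a Theta of 1.0 in exact mode. The caller supplies the `estimating` flag because the Javadoc does not say how the mode is detected.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical reader for the documented layout; not part of the DataSketches API.
final class CompressedPreambleSketch {
  final int entryBits;        // byte 3, range [1,63]
  final int numEntriesBytes;  // byte 4, range [1,4]
  final long thetaLong;       // bytes 8..15 in estimating mode
  final long numEntries;      // 1 to 4 bytes right after the preamble
  final int dataStart;        // first byte of the delta encoded compressed array

  private CompressedPreambleSketch(int entryBits, int numEntriesBytes, long thetaLong,
      long numEntries, int dataStart) {
    this.entryBits = entryBits;
    this.numEntriesBytes = numEntriesBytes;
    this.thetaLong = thetaLong;
    this.numEntries = numEntries;
    this.dataStart = dataStart;
  }

  static CompressedPreambleSketch parse(byte[] image, boolean estimating) {
    // ASSUMPTION: multi-byte fields are little-endian.
    ByteBuffer buf = ByteBuffer.wrap(image).order(ByteOrder.LITTLE_ENDIAN);
    int entryBits = buf.get(3) & 0xFF;
    int numEB = buf.get(4) & 0xFF;
    if (entryBits < 1 || entryBits > 63) {
      throw new IllegalArgumentException("entryBits out of range: " + entryBits);
    }
    if (numEB < 1 || numEB > 4) {
      throw new IllegalArgumentException("numEntriesBytes out of range: " + numEB);
    }
    // ASSUMPTION: exact mode implies Theta = 1.0, modeled here as Long.MAX_VALUE.
    long thetaLong = estimating ? buf.getLong(8) : Long.MAX_VALUE;
    // The count follows the preamble: byte 16 in estimating mode, byte 8 in exact mode.
    int countStart = estimating ? 16 : 8;
    long numEntries = 0;
    for (int i = 0; i < numEB; i++) {
      numEntries |= (long) (buf.get(countStart + i) & 0xFF) << (8 * i);
    }
    return new CompressedPreambleSketch(entryBits, numEB, thetaLong, numEntries,
        countStart + numEB);
  }
}
```

For an estimating-mode image with `numEB = 2`, the count occupies bytes 16 and 17 and `dataStart` is 18, which falls inside the "17, 18, 19 or 20" range given in the Javadoc. In exact mode the same logic places the count at byte 8, directly after the single preamble long.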
