Commit 0f592d6

Fixed Hll Union to HllUnion.
Updated README, POM.
1 parent 0833e37 commit 0f592d6

20 files changed

Lines changed: 212 additions & 210 deletions

README.md

Lines changed: 29 additions & 10 deletions
@@ -27,34 +27,53 @@ This is the core Java component of the DataSketches library. It contains all of

 This component is also a dependency of other components of the library that create adaptors for target systems, such as the [Apache Pig adaptor](https://github.com/apache/datasketches-pig), the [Apache Hive adaptor](https://github.com/apache/datasketches-hive), and others.

-Note that we have a parallel core component for C++, Python and GO implementations of many of the same sketch algorithms,
-[datasketches-cpp](https://github.com/apache/datasketches-cpp), [datasketches-python](https://github.com/apache/datasketches-python), and
-[datasketches-go](https://github.com/apache/datasketches-go).
+Note that we have parallel core components for C++, Python and Go implementations of many of the same sketch algorithms:
+
+- [datasketches-cpp](https://github.com/apache/datasketches-cpp)
+- [datasketches-python](https://github.com/apache/datasketches-python)
+- [datasketches-go](https://github.com/apache/datasketches-go)

 Please visit the main [DataSketches website](https://datasketches.apache.org) for more information.

 If you are interested in making contributions to this site please see our [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us.

 ---
+## Major Changes with this Release
+This is a major release in which we took the opportunity to do some significant refactoring that introduces changes incompatible with previous releases. Any incompatibility with prior releases is an inconvenience to users who simply want to upgrade to the latest release and run. However, some of the code in this library was written in 2013, and the Java language has evolved enormously since then. We chose to use this major release as the opportunity to modernize some of the code to achieve the following goals:
+
+### Eliminate the dependency on the DataSketches-Memory component.
+The DataSketches-Memory component was originally developed in 2014 to provide fast access to off-heap memory data structures. It used Unsafe and other JVM internals because, at the time, there were no satisfactory Java language features for this.
+
+The Java Foreign Function & Memory (FFM) API, which became a standard part of the language in Java 22, is now part of the Java 25 LTS release, which we support. Since the capabilities of FFM are a superset of those of the original DataSketches-Memory component, it made sense to rewrite the code to use FFM and eliminate the dependency on DataSketches-Memory. This change affected code across the entire library.
+
+This provided several advantages to the code base. With the dependency on DataSketches-Memory removed, there are now no runtime dependencies! This should make integrating this library into other Java systems much simpler. And because FFM is tightly integrated into the Java platform, performance has improved, especially for bulk operations.
+
+- As an added note: there are numerous other improvements to the Java language that we could perhaps take advantage of in a rewrite, e.g., records, text blocks, switch expressions, sealed classes, var, modules, pattern matching, etc. However, given the risk of accidentally introducing bugs by making too many changes at one time, we focused on FFM, which actually improves performance rather than just adding syntactic sugar.
+
+### Align public sketch class names so that the sketch family name is part of the class name.
+For example, the Theta sketch was the first sketch written for the library, and its base class was called *Sketch*, simply because it was the only sketch! The Tuple sketch evolved soon after, and its base class was also called *Sketch*. Oops, bad idea. If a user wanted to use both the Theta and Tuple sketches in the same class, one of them had to be fully qualified every time it was referenced. Ugh!
+
+Unfortunately, this habit propagated to some of the other early sketches; for example, we ended up with two different sketch families that each had a class named *ItemsSketch*. For the more recent additions to the library, we started including the sketch family name in all the relevant sketch-like public classes of a sketch family.
+
+In this release, we have renamed the classes of these older sketches so that the names now include the sketch family name. Yes, this is an incompatible change for user code moving from earlier releases, but it can usually be fixed with search-and-replace tools. This release is not perfect, but it is hopefully more consistent across all the different sketch families.
+

 ## Build & Runtime Dependencies

 ### Installation Directory Path
 **NOTE:** This component accesses resource files for testing. As a result, the directory elements of the full absolute path of the target installation directory must qualify as Java identifiers. In other words, the directory elements must not have any space characters (or non-Java identifier characters) in any of the path elements. This is required by the Oracle Java Specification in order to ensure location-independent access to resources: [See Oracle Location-Independent Access to Resources](https://docs.oracle.com/javase/8/docs/technotes/guides/lang/resources.html)

-### OpenJDK Version 24
-An OpenJDK-compatible build of Java 24, provided by one of the Open-Source JVM providers, such as Azul Systems, Red Hat, SAP, Eclipse Temurin, etc, is required.
-All of the testing of this release has been performed with an Eclipse Temurin build.
-
-This release uses the new Java Foreign Function & Memory (FFM) features that were made part of the Java Language in in Java 22.
+### OpenJDK Version 25
+At minimum, an OpenJDK-compatible build of Java 25, provided by one of the open-source JVM providers, such as *Azul Systems*, *Red Hat*, *SAP*, *Eclipse Temurin*, etc., is required.
+All of the testing of this release has been performed with the *Eclipse Temurin* build.

 ## Compilation and Test using Maven
 This DataSketches component is structured as a Maven project and Maven is the recommended tool for compile and test.

 #### A Toolchain is required

-* You must have a JDK type toolchain defined in location *~/.m2/toolchains.xml* that specifies where to find a locally installed OpenJDK-compatible version 24.
-* Your default \$JAVA\_HOME compiler must be OpenJDK compatible, specified in the toolchain, and may be a version greater than 24. Note that if your \$JAVA\_HOME is set to a Java version greater than 24, Maven will automatically use the Java 24 version specified in the toolchain instead. The included pom.xml specifies the necessary JVM flags, so no further action should be required.
+* You must have a JDK type toolchain defined in location *~/.m2/toolchains.xml* that specifies where to find a locally installed OpenJDK-compatible version 25.
+* Your default \$JAVA\_HOME compiler must be OpenJDK-compatible, specified in the toolchain, and may be a version greater than 25. Note that if your \$JAVA\_HOME is set to a Java version greater than 25, Maven will automatically use the Java 25 version specified in the toolchain instead. The included pom.xml specifies the necessary JVM flags, if required, so no further action is needed.
 * Note that the paths specified in the toolchain must be fully qualified direct paths to the OpenJDK version locations. Using environment variables will not work.

 #### To run normal unit tests:
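The installation-path NOTE in the README above can be checked before building. The following is a minimal sketch of such a pre-flight check; it is not part of the DataSketches build, and `InstallPathCheck` and its methods are hypothetical names. It applies a literal reading of the rule using the JDK's `Character.isJavaIdentifierStart` and `Character.isJavaIdentifierPart`, so besides spaces it also flags characters such as `-`, and it does not check for reserved keywords.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-flight check: are all installation path elements legal Java identifiers? */
public final class InstallPathCheck {

  private InstallPathCheck() { }

  /** True if name is non-empty and made of Java identifier characters (keywords are not checked). */
  static boolean isJavaIdentifier(final String name) {
    if (name.isEmpty() || !Character.isJavaIdentifierStart(name.codePointAt(0))) {
      return false;
    }
    // Check the remaining code points; codePoints() handles supplementary characters.
    return name.codePoints().skip(1).allMatch(Character::isJavaIdentifierPart);
  }

  /** Returns the elements of the absolute, normalized path that are not Java identifiers. */
  static List<String> invalidElements(final Path dir) {
    final List<String> bad = new ArrayList<>();
    // Iterating a Path yields its name elements; the root ("/" or "C:\") is not included.
    for (final Path element : dir.toAbsolutePath().normalize()) {
      final String name = element.toString();
      if (!isJavaIdentifier(name)) {
        bad.add(name);
      }
    }
    return bad;
  }

  public static void main(final String[] args) {
    final Path dir = Path.of(args.length > 0 ? args[0] : ".");
    final List<String> bad = invalidElements(dir);
    System.out.println(bad.isEmpty()
        ? "OK: " + dir.toAbsolutePath().normalize()
        : "Not Java identifiers: " + bad);
  }
}
```

It can be run as a single source file with the intended installation directory as its argument, e.g. `java InstallPathCheck.java /path/to/install/dir`.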

src/main/java/org/apache/datasketches/hll/BaseHllSketch.java

Lines changed: 12 additions & 12 deletions
@@ -35,7 +35,7 @@

 /**
 * Although this class is package-private, it provides a single place to define and document
-* the common public API for both HllSketch and Union.
+* the common public API for both HllSketch and HllUnion.
 * @author Lee Rhodes
 * @author Kevin Lang
 */
@@ -115,7 +115,7 @@ public static final int getSerializationVersion(final MemorySegment seg) {
 * Gets the current (approximate) Relative Error (RE) asymptotic values given several
 * parameters. This is used primarily for testing.
 * @param upperBound return the RE for the Upper Bound, otherwise for the Lower Bound.
-* @param oooFlag set true if the sketch is the result of a non qualifying union operation.
+* @param oooFlag set true if the sketch is the result of a non-qualifying HllUnion operation.
 * @param lgConfigK the configured value for the sketch.
 * @param numStdDev the given number of Standard Deviations. This must be an integer between
 * 1 and 3, inclusive.
@@ -206,8 +206,8 @@ public boolean isEstimationMode() {
 * inquire of the sketch if it has, in fact, moved itself.
 *
 * @param seg the given MemorySegment
-* @return true if the given MemorySegment refers to the same underlying resource as this sketch or
-* union.
+* @return true if the given MemorySegment refers to the same underlying resource as this HllSketch or
+* HllUnion.
 */
 @Override
 public abstract boolean isSameResource(MemorySegment seg);
@@ -219,17 +219,17 @@ public boolean isEstimationMode() {

 /**
 * Serializes this sketch as a byte array in compact form. The compact form is smaller in size
-* than the updatable form and read-only. It can be used in union operations as follows:
+* than the updatable form and read-only. It can be used in HllUnion operations as follows:
 * <pre>{@code
-* Union union; HllSketch sk, sk2;
+* HllUnion union; HllSketch sk, sk2;
 * int lgK = 12;
 * sk = new HllSketch(lgK, TgtHllType.HLL_4); //can be 4, 6, or 8
 * for (int i = 0; i < (2 << lgK); i++) { sk.update(i); }
 * byte[] arr = sk.toCompactByteArray();
 * //...
-* union = Union.heapify(arr); //initializes the union using data from the array.
+* union = HllUnion.heapify(arr); //initializes the HllUnion using data from the array.
 * //OR, if used in an off-heap environment:
-* union = Union.heapify(MemorySegment.ofArray(arr)); //same as above, except from MemorySegment object.
+* union = HllUnion.heapify(MemorySegment.ofArray(arr)); //same as above, except from MemorySegment object.
 *
 * //To recover an updatable heap sketch:
 * sk2 = HllSketch.heapify(arr);
@@ -250,17 +250,17 @@ public boolean isEstimationMode() {
 /**
 * Serializes this sketch as a byte array in an updatable form. The updatable form is larger than
 * the compact form. The use of this form is primarily in environments that support updating
-* sketches in off-heap MemorySegment. If the sketch is constructed using HLL_8, sketch updating and
-* union updating operations can actually occur in MemorySegment, which can be off-heap:
+* sketches in off-heap MemorySegment. If the sketch is constructed using HLL_8, HllSketch updating and
+* HllUnion updating operations can actually occur in MemorySegment, which can be off-heap:
 * <pre>{@code
-* Union union; HllSketch sk;
+* HllUnion union; HllSketch sk;
 * int lgK = 12;
 * sk = new HllSketch(lgK, TgtHllType.HLL_8); //must be 8
 * for (int i = 0; i < (2 << lgK); i++) { sk.update(i); }
 * byte[] arr = sk.toUpdatableByteArray();
 * MemorySegment wseg = MemorySegment.ofArray(arr);
 * //...
-* union = Union.writableWrap(wseg); //no deserialization!
+* union = HllUnion.writableWrap(wseg); //no deserialization!
 * }</pre>
 * @return this sketch as an updatable byte array.
 */
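The README changes above move the library from DataSketches-Memory to the FFM API, and the javadoc above relies on two FFM properties: wrapping a byte[] shares its storage (hence "no deserialization"), and sketch data can be updated in off-heap MemorySegments. The following is a minimal sketch of both using only the standard java.lang.foreign API (final since Java 22); it uses no DataSketches classes, and `SegmentDemo` and its methods are hypothetical names.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

/** Plain JDK FFM illustration (Java 22+); no DataSketches classes are used. */
public final class SegmentDemo {

  private SegmentDemo() { }

  /** Writes value through a zero-copy view of arr and returns what arr now holds at index. */
  static byte writeThroughView(final byte[] arr, final int index, final byte value) {
    final MemorySegment view = MemorySegment.ofArray(arr); // wraps arr; no bytes are copied
    view.set(ValueLayout.JAVA_BYTE, index, value);
    return arr[index];
  }

  /** Bulk-copies src off-heap, updates the copy in place, and returns the copy's contents. */
  static byte[] copyOffHeapAndModify(final byte[] src, final int index, final byte value) {
    try (Arena arena = Arena.ofConfined()) { // native memory is freed when the arena closes
      final MemorySegment offHeap = arena.allocate(src.length);
      MemorySegment.copy(MemorySegment.ofArray(src), 0, offHeap, 0, src.length); // bulk copy
      offHeap.set(ValueLayout.JAVA_BYTE, index, value); // in-place update of the off-heap copy
      return offHeap.toArray(ValueLayout.JAVA_BYTE); // copy back to the heap before closing
    }
  }

  public static void main(final String[] args) {
    final byte[] arr = new byte[8];
    System.out.println(writeThroughView(arr, 3, (byte) 42)); // 42
    final byte[] copy = copyOffHeapAndModify(arr, 3, (byte) 7);
    System.out.println(copy[3] + " " + arr[3]); // 7 42
  }
}
```

The confined arena ties the lifetime of the off-heap memory to the try block, so the data must be copied out (here with `toArray`) before the arena closes.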

src/main/java/org/apache/datasketches/hll/DirectHll4Array.java

Lines changed: 1 addition & 1 deletion
@@ -136,7 +136,7 @@ void putNibble(final int slotNo, final int nibValue) {
 }

 @Override
-//Would be used by Union, but not used because the gadget is always HLL8 type
+//Would be used by HllUnion, but not used because the gadget is always HLL8 type
 void updateSlotNoKxQ(final int slotNo, final int newValue) {
 throw new SketchesStateException("Improper access.");
 }

src/main/java/org/apache/datasketches/hll/DirectHll6Array.java

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ void putNibble(final int slotNo, final int nibValue) {
 }

 @Override
-//Would be used by Union, but not used because the gadget is always HLL8 type
+//Would be used by HllUnion, but not used because the gadget is always HLL8 type
 void updateSlotNoKxQ(final int slotNo, final int newValue) {
 throw new SketchesStateException("Improper access.");
 }

src/main/java/org/apache/datasketches/hll/DirectHll8Array.java

Lines changed: 1 addition & 1 deletion
@@ -86,7 +86,7 @@ void putNibble(final int slotNo, final int nibValue) {
 }

 @Override
-//Used by Union when source is not HLL8
+//Used by HllUnion when source is not HLL8
 void updateSlotNoKxQ(final int slotNo, final int newValue) {
 final int oldValue = getSlotValue(slotNo);
 if (newValue > oldValue) {

src/main/java/org/apache/datasketches/hll/Hll4Array.java

Lines changed: 1 addition & 1 deletion
@@ -136,7 +136,7 @@ void putNibble(final int slotNo, final int nibValue) {
 }

 @Override
-//Would be used by Union, but not used because the gadget is always HLL8 type
+//Would be used by HllUnion, but not used because the gadget is always HLL8 type
 void updateSlotNoKxQ(final int slotNo, final int newValue) {
 throw new SketchesStateException("Improper access.");
 }

src/main/java/org/apache/datasketches/hll/Hll6Array.java

Lines changed: 1 addition & 1 deletion
@@ -93,7 +93,7 @@ void putNibble(final int slotNo, final int nibValue) {
 }

 @Override
-//Would be used by Union, but not used because the gadget is always HLL8 type
+//Would be used by HllUnion, but not used because the gadget is always HLL8 type
 void updateSlotNoKxQ(final int slotNo, final int newValue) {
 throw new SketchesStateException("Improper access.");
 }

src/main/java/org/apache/datasketches/hll/Hll8Array.java

Lines changed: 1 addition & 1 deletion
@@ -92,7 +92,7 @@ void putNibble(final int slotNo, final int nibValue) {
 }

 @Override
-//Used by Union when source is not HLL8
+//Used by HllUnion when source is not HLL8
 void updateSlotNoKxQ(final int slotNo, final int newValue) {
 final int oldValue = getSlotValue(slotNo);
 hllByteArr[slotNo] = (byte) Math.max(newValue, oldValue);
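`updateSlotNoKxQ` above keeps the larger of the old and new values in a slot, which is how HLL registers combine in a union. The following is a minimal sketch of that rule applied to whole register arrays. It assumes byte-per-slot arrays (as in HLL_8) and inputs with the same lgK, and it omits any additional state the library maintains; `HllRegisterMerge` and `mergeInto` are hypothetical names, not DataSketches code.

```java
import java.util.Arrays;

/** Hypothetical illustration of an HLL register union: each slot keeps the larger value. */
public final class HllRegisterMerge {

  private HllRegisterMerge() { }

  /**
   * Merges src into gadget in place, one byte per slot (like an HLL_8 register array).
   * Assumes both arrays come from sketches with the same lgK, i.e. the same length.
   */
  static void mergeInto(final byte[] gadget, final byte[] src) {
    if (gadget.length != src.length) {
      throw new IllegalArgumentException("register arrays must have the same number of slots");
    }
    for (int slotNo = 0; slotNo < src.length; slotNo++) {
      gadget[slotNo] = (byte) Math.max(gadget[slotNo], src[slotNo]);
    }
  }

  public static void main(final String[] args) {
    final byte[] gadget = {0, 3, 1, 5};
    mergeInto(gadget, new byte[] {2, 1, 4, 5});
    System.out.println(Arrays.toString(gadget)); // [2, 3, 4, 5]
  }
}
```

Because max is commutative, associative and idempotent, merging sketches in any order, or merging the same sketch twice, produces the same registers.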

src/main/java/org/apache/datasketches/hll/HllSketch.java

Lines changed: 7 additions & 7 deletions
@@ -203,7 +203,7 @@ public static final HllSketch heapify(final MemorySegment srcSeg) {
 return heapify(srcSeg, true);
 }

-//used by union and above
+//used by HllUnion and above
 static final HllSketch heapify(final MemorySegment srcSeg, final boolean checkRebuild) {
 Objects.requireNonNull(srcSeg, "Source MemorySegment must not be null");
 checkBounds(0, 8, srcSeg.byteSize()); //need min 8 bytes
@@ -218,7 +218,7 @@ static final HllSketch heapify(final MemorySegment srcSeg, final boolean checkRe
 } else { //Hll_8
 heapSketch = new HllSketch(Hll8Array.heapify(srcSeg));
 if (checkRebuild) {
-Union.checkRebuildCurMinNumKxQ(heapSketch);
+HllUnion.checkRebuildCurMinNumKxQ(heapSketch);
 }
 }
 } else if (curMode == CurMode.LIST) {
@@ -245,7 +245,7 @@ public static final HllSketch writableWrap(final MemorySegment srcWseg) {
 return writableWrap(srcWseg, true);
 }

-//used by union and above
+//used by HllUnion and above
 static final HllSketch writableWrap( final MemorySegment srcWseg, final boolean checkRebuild) {
 Objects.requireNonNull(srcWseg, "Source MemorySegment must not be null");
 checkBounds(0, 8, srcWseg.byteSize()); //need min 8 bytes
@@ -268,8 +268,8 @@ static final HllSketch writableWrap( final MemorySegment srcWseg, final boolean
 directSketch = new HllSketch(new DirectHll6Array(lgConfigK, srcWseg));
 } else { //Hll_8
 directSketch = new HllSketch(new DirectHll8Array(lgConfigK, srcWseg));
-if (checkRebuild) { //union only uses HLL_8, we allow non-finalized from a union call.
-Union.checkRebuildCurMinNumKxQ(directSketch);
+if (checkRebuild) { //HllUnion only uses HLL_8, we allow non-finalized from a HllUnion call.
+HllUnion.checkRebuildCurMinNumKxQ(directSketch);
 }
 }
 } else if (curMode == CurMode.LIST) {
@@ -305,8 +305,8 @@ public static final HllSketch wrap(final MemorySegment srcSeg) { //read only
 directSketch = new HllSketch(new DirectHll6Array(lgConfigK, srcSeg, true));
 } else { //Hll_8
 directSketch = new HllSketch(new DirectHll8Array(lgConfigK, srcSeg, true));
-//rebuild if srcSeg came from a union and was not finalized, rather than throw exception.
-Union.checkRebuildCurMinNumKxQ(directSketch);
+//rebuild if srcSeg came from a HllUnion and was not finalized, rather than throw exception.
+HllUnion.checkRebuildCurMinNumKxQ(directSketch);
 }
 } else if (curMode == CurMode.LIST) {
 directSketch =
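Both `heapify` and `writableWrap` above reject a null segment and require at least 8 bytes before reading anything, and `wrap` is marked read-only. The following is a minimal sketch of that guard pattern with the plain FFM API, obtaining the read-only view with `MemorySegment.asReadOnly()`; how the library itself enforces read-only access is not shown here. `checkBounds` below is a hypothetical helper whose parameter order is only inferred from the call `checkBounds(0, 8, srcSeg.byteSize())`; it is not the library's implementation, and `PreambleGuard` is likewise a hypothetical name.

```java
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.Objects;

/** Hypothetical guard pattern: validate a segment's size, then expose a read-only view. */
public final class PreambleGuard {

  /** Mirrors the "need min 8 bytes" comment above; not a real DataSketches constant. */
  static final long MIN_PREAMBLE_BYTES = 8;

  private PreambleGuard() { }

  /** Throws unless reqLength bytes starting at reqOffset fit within a region of capacity bytes. */
  static void checkBounds(final long reqOffset, final long reqLength, final long capacity) {
    // Written without computing reqOffset + reqLength, so it cannot overflow.
    if (reqOffset < 0 || reqLength < 0 || reqOffset > capacity || reqLength > capacity - reqOffset) {
      throw new IllegalArgumentException(
          "reqOffset: " + reqOffset + ", reqLength: " + reqLength + ", capacity: " + capacity);
    }
  }

  /** Rejects null or too-short input, then returns a read-only view (the bytes are not copied). */
  static MemorySegment readOnlyView(final MemorySegment srcSeg) {
    Objects.requireNonNull(srcSeg, "Source MemorySegment must not be null");
    checkBounds(0, MIN_PREAMBLE_BYTES, srcSeg.byteSize());
    return srcSeg.asReadOnly();
  }

  public static void main(final String[] args) {
    final byte[] bytes = new byte[16];
    bytes[0] = 2;
    final MemorySegment view = readOnlyView(MemorySegment.ofArray(bytes));
    System.out.println(view.isReadOnly() + " " + view.get(ValueLayout.JAVA_BYTE, 0)); // true 2
    try {
      readOnlyView(MemorySegment.ofArray(new byte[4]));
    } catch (final IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Checking the size once up front lets the later fixed-offset reads of the preamble proceed without repeating the test at every access.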
