README.md: 29 additions & 10 deletions
@@ -27,34 +27,53 @@ This is the core Java component of the DataSketches library. It contains all of
This component is also a dependency of other components of the library that create adaptors for target systems, such as the [Apache Pig adaptor](https://github.com/apache/datasketches-pig), the [Apache Hive adaptor](https://github.com/apache/datasketches-hive), and others.

-Note that we have a parallel core component for C++, Python and GO implementations of many of the same sketch algorithms,
-[datasketches-cpp](https://github.com/apache/datasketches-cpp), [datasketches-python](https://github.com/apache/datasketches-python), and
Please visit the main [DataSketches website](https://datasketches.apache.org) for more information.

If you are interested in making contributions to this site please see our [Community](https://datasketches.apache.org/docs/Community/) page for how to contact us.

---
+## Major Changes with this Release
+This is a major release in which we took the opportunity to do some significant refactoring that introduces incompatible changes from previous releases. Any incompatibility with prior releases is always an inconvenience to users who wish to just upgrade to the latest release and run. However, some of the code in this library was written in 2013, and the Java language has evolved enormously since then. We chose to use this major release as the opportunity to modernize some of the code to achieve the following goals:
+
+### Eliminate the dependency on the DataSketches-Memory component.
+The DataSketches-Memory component was originally developed in 2014 to address the need for fast access to off-heap memory data structures and used Unsafe and other JVM internals as there were no satisfactory Java language features to do this at the time.
+
+The FFM capabilities, introduced into the language in Java 22, are now part of the Java 25 LTS release, which we support. Since the capabilities of FFM are a superset of the original DataSketches-Memory component, it made sense to rewrite the code to eliminate the dependency on DataSketches-Memory and use FFM instead. This impacted code across the entire library.
+
+This provided several advantages to the code base. By removing this dependency on DataSketches-Memory, there are now no runtime dependencies! This should make integrating this library into other Java systems much simpler. Since FFM is tightly integrated into the Java language, it has improved performance, especially with bulk operations.
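
For readers unfamiliar with FFM, here is a minimal sketch of off-heap allocation and a bulk copy using only the standard `java.lang.foreign` API (Java 22 or later). The class and method names (`FfmRoundTripDemo`, `roundTrip`) are hypothetical and chosen for illustration; this is not code from the DataSketches library, only an example of the kind of operation that previously required `Unsafe`.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.util.Arrays;

// Hypothetical demo class; not part of the DataSketches API.
public class FfmRoundTripDemo {

    /** Copies a heap array into off-heap memory in one bulk operation and reads it back. */
    static long[] roundTrip(long[] values) {
        long bytes = (long) values.length * Long.BYTES;
        // A confined arena frees its off-heap memory deterministically when closed.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment offHeap = arena.allocate(bytes, Long.BYTES);
            // Bulk copy: heap array -> off-heap segment.
            MemorySegment.copy(MemorySegment.ofArray(values), 0L, offHeap, 0L, bytes);
            // Copy back to a new heap array before the arena is closed.
            return offHeap.toArray(ValueLayout.JAVA_LONG);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(roundTrip(new long[] {1L, 2L, 3L})));
    }
}
```

No third-party dependency is involved: everything used here ships with the JDK.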
+
+- As an added note: There are numerous other improvements to the Java language that we could perhaps take advantage of in a rewrite, e.g., Records, text blocks, switch expressions, sealed, var, modules, patterns, etc. However, faced with the risk of accidentally creating bugs due to too many changes at one time, we focused on FFM, which actually improves performance, as opposed to changes that are just syntactic sugar.
+
+### Align public sketch class names so that the sketch family name is part of the class name.
+For example, the Theta sketch was the first sketch written for the library and its base class was called *Sketch*. Obviously, because it was the only sketch! The Tuple sketch evolved soon after and its base class was also called *Sketch*. Oops, bad idea. If a user wanted to use both the Theta and Tuple sketches in the same class one of them had to be fully qualified every time it was referenced. Ugh!
+
+Unfortunately, this habit propagated to some of the other early sketches, and we ended up with, for example, two different sketches that each had an *ItemsSketch* class. For the more recent additions to the library we started including the sketch family name in all the relevant sketch-like public classes of a sketch family.
+
+In this release we have refactored these older sketches with new names that now include the sketch family name. Yes, this is an incompatible change for user code moving from earlier releases, but it can usually be fixed with search-and-replace tools. This release is not perfect, but it is hopefully more consistent across all the different sketch families.
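
The naming problem described above is a general Java issue: two classes with the same simple name cannot both be imported into one source file. The sketch below shows it with two JDK classes that happen to share a name, `java.util.Date` and `java.sql.Date`, as a stand-in for the two old *Sketch* base classes. The demo class `NameCollisionDemo` is hypothetical, and no DataSketches class names are assumed here.

```java
import java.util.Date; // only one class named Date can be imported by simple name

// Hypothetical demo class illustrating a simple-name collision.
public class NameCollisionDemo {

    /** Uses both Date classes; the second must be fully qualified at every reference. */
    static String describe() {
        Date utilDate = new Date(0L);
        java.sql.Date sqlDate = new java.sql.Date(0L);
        return utilDate.getClass().getName() + " vs " + sqlDate.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints "java.util.Date vs java.sql.Date"
    }
}
```

Giving each class a distinct simple name, as this release does by including the sketch family name, removes the need for the fully qualified form.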
+

## Build & Runtime Dependencies

### Installation Directory Path
**NOTE:** This component accesses resource files for testing. As a result, the directory elements of the full absolute path of the target installation directory must qualify as Java identifiers. In other words, the directory elements must not have any space characters (or non-Java identifier characters) in any of the path elements. This is required by the Oracle Java Specification in order to ensure location-independent access to resources: [See Oracle Location-Independent Access to Resources](https://docs.oracle.com/javase/8/docs/technotes/guides/lang/resources.html)

-### OpenJDK Version 24
-An OpenJDK-compatible build of Java 24, provided by one of the Open-Source JVM providers, such as Azul Systems, Red Hat, SAP, Eclipse Temurin, etc, is required.
-All of the testing of this release has been performed with an Eclipse Temurin build.
-
-This release uses the new Java Foreign Function & Memory (FFM) features that were made part of the Java Language in Java 22.
+### OpenJDK Version 25
+At minimum, an OpenJDK-compatible build of Java 25, provided by one of the Open-Source JVM providers, such as *Azul Systems*, *Red Hat*, *SAP*, *Eclipse Temurin*, etc., is required.
+All of the testing of this release has been performed with the *Eclipse Temurin* build.
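
As a quick sanity check of the requirement above, the following minimal sketch reports whether the JVM running it meets a minimum feature version. `Runtime.version().feature()` is a standard JDK call (Java 10 and later); the class name `JdkVersionCheck` and the helper `meetsMinimum` are hypothetical and not part of this library.

```java
// Hypothetical helper; not part of the DataSketches library.
public class JdkVersionCheck {

    /** Minimum Java feature version stated in this README. */
    static final int REQUIRED_FEATURE_VERSION = 25;

    /** Returns true if the running feature version is at least the required one. */
    static boolean meetsMinimum(int runningFeature, int requiredFeature) {
        return runningFeature >= requiredFeature;
    }

    public static void main(String[] args) {
        int running = Runtime.version().feature();
        String vendor = System.getProperty("java.vendor");
        System.out.println("Running Java " + running + " (" + vendor + "), required: "
            + REQUIRED_FEATURE_VERSION + ", ok: " + meetsMinimum(running, REQUIRED_FEATURE_VERSION));
    }
}
```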

## Compilation and Test using Maven
This DataSketches component is structured as a Maven project and Maven is the recommended tool for compiling and testing.

#### A Toolchain is required

-* You must have a JDK type toolchain defined in location *~/.m2/toolchains.xml* that specifies where to find a locally installed OpenJDK-compatible version 24.
-* Your default \$JAVA\_HOME compiler must be OpenJDK compatible, specified in the toolchain, and may be a version greater than 24. Note that if your \$JAVA\_HOME is set to a Java version greater than 24, Maven will automatically use the Java 24 version specified in the toolchain instead. The included pom.xml specifies the necessary JVM flags, so no further action should be required.
+* You must have a JDK type toolchain defined in location *~/.m2/toolchains.xml* that specifies where to find a locally installed OpenJDK-compatible version 25.
+* Your default \$JAVA\_HOME compiler must be OpenJDK compatible, specified in the toolchain, and may be a version greater than 25. Note that if your \$JAVA\_HOME is set to a Java version greater than 25, Maven will automatically use the Java 25 version specified in the toolchain instead. The included pom.xml specifies the necessary JVM flags, if required, so no further action is needed.
* Note that the paths specified in the toolchain must be fully qualified direct paths to the OpenJDK version locations. Using environment variables will not work.
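
For reference, a minimal *~/.m2/toolchains.xml* in the standard Maven toolchains format might look like the sketch below. The `jdkHome` value is a placeholder that you must replace with the fully qualified path of your own Java 25 installation. It also assumes the project's pom.xml matches toolchains on `version` alone; if the pom also requires other keys, such as `vendor`, add them under `<provides>`.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<toolchains>
  <toolchain>
    <type>jdk</type>
    <provides>
      <!-- Must match the version requested by the project's pom.xml -->
      <version>25</version>
    </provides>
    <configuration>
      <!-- Placeholder: use a fully qualified path, not an environment variable -->
      <jdkHome>/path/to/your/jdk-25</jdkHome>
    </configuration>
  </toolchain>
</toolchains>
```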