Commit 316d691

Merge pull request #681 from apache/Fix_endian_issues
Made detection of BigEndian consistent throughout.
2 parents 2f31a88 + 9ad6c07 commit 316d691

33 files changed

Lines changed: 359 additions & 263 deletions

NOTICE

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ Copyright 2025 The Apache Software Foundation

  Copyright 2015-2018 Yahoo Inc.
  Copyright 2019-2020 Verizon Media
- Copyright 2021- Yahoo Inc.
+ Copyright 2021-2025 Yahoo Inc.

  This product includes software developed at
  The Apache Software Foundation (http://www.apache.org/).

README.md

Lines changed: 8 additions & 16 deletions
@@ -27,8 +27,9 @@ This is the core Java component of the DataSketches library. It contains all of

  This component is also a dependency of other components of the library that create adaptors for target systems, such as the [Apache Pig adaptor](https://github.com/apache/datasketches-pig), the [Apache Hive adaptor](https://github.com/apache/datasketches-hive), and others.

- Note that we have a parallel core component for C++ and Python implementations of many of the same sketch algorithms,
- [datasketches-cpp](https://github.com/apache/datasketches-cpp) and [datasketches-python](https://github.com/apache/datasketches-python)
+ Note that we have a parallel core component for C++, Python and GO implementations of many of the same sketch algorithms,
+ [datasketches-cpp](https://github.com/apache/datasketches-cpp), [datasketches-python](https://github.com/apache/datasketches-python), and
+ [datasketches-go](https://github.com/apache/datasketches-go).

  Please visit the main [DataSketches website](https://datasketches.apache.org) for more information.

@@ -41,28 +42,19 @@ If you are interested in making contributions to this site please see our [Commu
  ### Installation Directory Path
  **NOTE:** This component accesses resource files for testing. As a result, the directory elements of the full absolute path of the target installation directory must qualify as Java identifiers. In other words, the directory elements must not have any space characters (or non-Java identifier characters) in any of the path elements. This is required by the Oracle Java Specification in order to ensure location-independent access to resources: [See Oracle Location-Independent Access to Resources](https://docs.oracle.com/javase/8/docs/technotes/guides/lang/resources.html)

- ### OpenJDK Version 21
- An OpenJDK-compatible build of Java 21, provided by one of the Open-Source providers, such as Azul Systems, Red Hat, SAP, Eclipse Temurin, etc, is required.
+ ### OpenJDK Version 24
+ An OpenJDK-compatible build of Java 24, provided by one of the Open-Source JVM providers, such as Azul Systems, Red Hat, SAP, Eclipse Temurin, etc, is required.
  All of the testing of this release has been performed with an Eclipse Temurin build.

- This release uses the new Java Foreign Function & Memory (FFM) features that are in "preview" in Java 21.
- As a result, the JVM flag <nobr>**--enable-preview**</nobr> must be set at compile and at runtime.
-
- **NOTE:** OpenJDK versions greater than 21 do not support running Java 21 class files (Class ID 65) with preview code. The runtime JVM version must be 21.
-
- **NOTE:** The Eclipse Compiler for Java (ECJ) version 21, used by default in both the Eclipse JDT IDE and the VScode IDE, will not allow compilation of preview code for Java version 21. Both Eclipse and VScode can be configured to use an OpenJDK compiler instead, but you will loose the incremental build capability of the ECJ.
-
- ### DataSketches Memory 6.0.0
-
- This component depends on the [datasketches-memory-6.0.0](https://github.com/apache/datasketches-memory/tree/6.0.0) component,
+ This release uses the new Java Foreign Function & Memory (FFM) features that were made part of the Java Language in in Java 22.

  ## Compilation and Test using Maven
  This DataSketches component is structured as a Maven project and Maven is the recommended tool for compile and test.

  #### A Toolchain is required

- * You must have a JDK type toolchain defined in location *~/.m2/toolchains.xml* that specifies where to find a locally installed OpenJDK-compatible version 21.
- * Your default \$JAVA\_HOME compiler must be OpenJDK compatible, specified in the toolchain, and may be a version greater than 21. Note that if your \$JAVA\_HOME is set to a Java version greater than 21, Maven will automatically use the Java 21 version specified in the toolchain instead. The included pom.xml specifies the necessary JVM flags, so no further action should be required.
+ * You must have a JDK type toolchain defined in location *~/.m2/toolchains.xml* that specifies where to find a locally installed OpenJDK-compatible version 24.
+ * Your default \$JAVA\_HOME compiler must be OpenJDK compatible, specified in the toolchain, and may be a version greater than 24. Note that if your \$JAVA\_HOME is set to a Java version greater than 24, Maven will automatically use the Java 24 version specified in the toolchain instead. The included pom.xml specifies the necessary JVM flags, so no further action should be required.
  * Note that the paths specified in the toolchain must be fully qualified direct paths to the OpenJDK version locations. Using environment variables will not work.

  #### To run normal unit tests:

pom.xml

Lines changed: 2 additions & 3 deletions
@@ -96,10 +96,9 @@ under the License.
  <!-- System-wide properties -->
  <maven.version>3.9.10</maven.version>
  <java.version>24</java.version>
- <jvm-arguments>-Xmx10g -Duser.language=en -Duser.country=US -Dfile.encoding=UTF-8</jvm-arguments>
+ <jvm-arguments>-Xmx4g -Duser.language=en -Duser.country=US -Dfile.encoding=UTF-8</jvm-arguments>
  <maven.compiler.source>${java.version}</maven.compiler.source>
  <maven.compiler.target>${java.version}</maven.compiler.target>
- <argLine>${jvm-arguments}</argLine>
  <charset.encoding>UTF-8</charset.encoding>
  <project.build.sourceEncoding>${charset.encoding}</project.build.sourceEncoding>
  <project.build.resourceEncoding>${charset.encoding}</project.build.resourceEncoding>
@@ -159,7 +158,7 @@ under the License.
  <version>${maven-compiler-plugin.version}</version>
  <configuration>
  <compilerArgs>
- <arg>${jvm-arguments}</arg>
+ <arg></arg>
  </compilerArgs>
  </configuration>
  </plugin>

src/main/java/org/apache/datasketches/common/BigEndianNativeOrderNotSupportedException.java

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+ /*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+ package org.apache.datasketches.common;
+
+ /**
+ * The DataSketches Library is not supported on Big Endian machines.
+ */
+ public class BigEndianNativeOrderNotSupportedException extends SketchesException {
+ private static final long serialVersionUID = 1L;
+
+ /**
+ * Constructs a new runtime exception with the message:
+ * "The DataSketches Library is not supported on Big Endian machines."
+ */
+ public BigEndianNativeOrderNotSupportedException() {
+ super("The DataSketches Library is not supported on Big Endian machines.");
+ }
+ }
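
The guard that this commit adds to the static initializers of `Family` and `Util` can be sketched in isolation. What follows is a minimal, hypothetical sketch rather than the library's code: `EndianGuardDemo`, `BigEndianNotSupported`, `requireLittleEndian`, and `LoadedOnBigEndian` are invented names, the exception is assumed to be unchecked, and the byte order is passed in as a parameter so that the check can be exercised on a little-endian host. It also illustrates a general JVM behavior: an unchecked exception thrown from a static initializer reaches the first caller wrapped in an `ExceptionInInitializerError`.

```java
import java.nio.ByteOrder;

// Hypothetical demo class; none of these names come from the DataSketches library.
final class EndianGuardDemo {

  // Stand-in for BigEndianNativeOrderNotSupportedException (assumed here to be unchecked).
  static final class BigEndianNotSupported extends RuntimeException {
    private static final long serialVersionUID = 1L;

    BigEndianNotSupported() {
      super("Not supported on Big Endian machines.");
    }
  }

  // Same comparison as in the diff, with the byte order supplied by the caller.
  static void requireLittleEndian(final ByteOrder order) {
    if (order != ByteOrder.LITTLE_ENDIAN) {
      throw new BigEndianNotSupported();
    }
  }

  // Simulates a class whose static initializer runs on a big-endian host.
  static final class LoadedOnBigEndian {
    static {
      requireLittleEndian(ByteOrder.BIG_ENDIAN);
    }

    static void touch() {}
  }

  private static Throwable cachedCause;

  // Triggers the failing initializer once and returns the exception the JVM wrapped.
  static synchronized Throwable causeOfFailedInit() {
    if (cachedCause == null) {
      try {
        LoadedOnBigEndian.touch();
      } catch (final ExceptionInInitializerError e) {
        cachedCause = e.getCause();
      }
    }
    return cachedCause;
  }

  public static void main(final String[] args) {
    requireLittleEndian(ByteOrder.LITTLE_ENDIAN); // passes silently
    System.out.println(causeOfFailedInit());
  }
}
```

The sketch caches the cause because later uses of a class whose initializer has failed raise `NoClassDefFoundError` instead of repeating the original exception.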

src/main/java/org/apache/datasketches/common/Family.java

Lines changed: 9 additions & 5 deletions
@@ -19,6 +19,7 @@

  package org.apache.datasketches.common;

+ import java.nio.ByteOrder;
  import java.util.HashMap;
  import java.util.Locale;
  import java.util.Map;
@@ -165,19 +166,22 @@ public enum Family {

  private static final Map<Integer, Family> lookupID = new HashMap<>();
  private static final Map<String, Family> lookupFamName = new HashMap<>();
- private int id_;
- private String famName_;
- private int minPreLongs_;
- private int maxPreLongs_;
+ private final int id_;
+ private final String famName_;
+ private final int minPreLongs_;
+ private final int maxPreLongs_;

  static {
  for (final Family f : values()) {
  lookupID.put(f.getID(), f);
  lookupFamName.put(f.getFamilyName().toUpperCase(Locale.US), f);
  }
+ if (ByteOrder.nativeOrder() != ByteOrder.LITTLE_ENDIAN) {
+ throw new BigEndianNativeOrderNotSupportedException();
+ }
  }

- private Family(final int id, final String famName, final int minPreLongs, final int maxPreLongs) {
+ Family(final int id, final String famName, final int minPreLongs, final int maxPreLongs) {
  id_ = id;
  famName_ = famName.toUpperCase(Locale.US);
  minPreLongs_ = minPreLongs;
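
The enum pattern touched by this hunk (final fields, a constructor without the redundant `private` modifier, and lookup maps filled in a static block) can be sketched on its own. This is a hedged illustration under stated assumptions: `MiniFamily`, its constants, its IDs, and the `idToFamily`/`nameToFamily` helpers are hypothetical and only mirror the shape of `Family`. Two language facts it relies on: enum constructors are implicitly private, and an enum's static initializer runs after all constants have been constructed, so `values()` is safe to call there.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical enum; the constants and IDs are invented for illustration.
enum MiniFamily {
  ALPHA(1, "Alpha"),
  BETA(2, "Beta");

  private static final Map<Integer, MiniFamily> lookupID = new HashMap<>();
  private static final Map<String, MiniFamily> lookupFamName = new HashMap<>();
  private final int id;
  private final String famName;

  static {
    // Runs once, after ALPHA and BETA have been constructed.
    for (final MiniFamily f : values()) {
      lookupID.put(f.id, f);
      lookupFamName.put(f.famName, f);
    }
  }

  // No modifier needed: enum constructors are implicitly private.
  MiniFamily(final int id, final String famName) {
    this.id = id;
    this.famName = famName.toUpperCase(Locale.US);
  }

  static MiniFamily idToFamily(final int id) {
    final MiniFamily f = lookupID.get(id);
    if (f == null) {
      throw new IllegalArgumentException("Unknown ID: " + id);
    }
    return f;
  }

  static MiniFamily nameToFamily(final String famName) {
    final MiniFamily f = lookupFamName.get(famName.toUpperCase(Locale.US));
    if (f == null) {
      throw new IllegalArgumentException("Unknown name: " + famName);
    }
    return f;
  }

  public static void main(final String[] args) {
    System.out.println(idToFamily(2));
    System.out.println(nameToFamily("alpha"));
  }
}
```

Marking the fields `final` lets the compiler verify that each one is assigned exactly once in the constructor.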

src/main/java/org/apache/datasketches/common/Util.java

Lines changed: 9 additions & 2 deletions
@@ -28,6 +28,7 @@
  import static org.apache.datasketches.hash.MurmurHash3.hash;

  import java.lang.foreign.MemorySegment;
+ import java.nio.ByteOrder;
  import java.util.Comparator;

  /**
@@ -38,6 +39,12 @@
  @SuppressWarnings("unchecked")
  public final class Util {

+ static {
+ if (ByteOrder.nativeOrder() != ByteOrder.LITTLE_ENDIAN) {
+ throw new BigEndianNativeOrderNotSupportedException();
+ }
+ }
+
  /**
  * The java line separator character as a String.
  */
@@ -812,7 +819,7 @@ public static <T> Object maxT(final Object item1, final Object item2, final Comp
  }

  /**
- * Is item1 Less-Than item2
+ * Is item1 Less-Than item2?
  * @param <T> the type
  * @param item1 item one
  * @param item2 item two
@@ -824,7 +831,7 @@ public static <T> boolean lt(final Object item1, final Object item2, final Compa
  }

  /**
- * Is item1 Less-Than-Or-Equal-To item2
+ * Is item1 Less-Than-Or-Equal-To item2?
  * @param <T> the type
  * @param item1 item one
  * @param item2 item two

src/main/java/org/apache/datasketches/cpc/PreambleUtil.java

Lines changed: 1 addition & 13 deletions
@@ -31,7 +31,6 @@
  import static org.apache.datasketches.cpc.RuntimeAsserts.rtAssertEquals;

  import java.lang.foreign.MemorySegment;
- import java.nio.ByteOrder;
  import java.util.Objects;

  import org.apache.datasketches.common.Family;
@@ -142,12 +141,6 @@ final class PreambleUtil {

  private PreambleUtil() {}

- static {
- if (ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN) {
- throw new SketchesStateException("This sketch will not work on Big Endian CPUs.");
- }
- }
-
  private static final String fmt = "%10d%10x";

  /**
@@ -156,7 +149,7 @@ private PreambleUtil() {}
  static final byte SER_VER = 1;

  //Flag bit masks, Byte 5
- static final int BIG_ENDIAN_FLAG_MASK = 1; //Reserved.
+ static final int RESERVED_FLAG_MASK = 1; //Reserved.
  static final int COMPRESSED_FLAG_MASK = 2;
  static final int HIP_FLAG_MASK = 4;
  static final int SUP_VAL_FLAG_MASK = 8; //num Suprising Values > 0
@@ -584,7 +577,6 @@ static String toString(final MemorySegment seg, final boolean detail) {

  //Flags of the Flags byte
  final String flagsStr = zeroPad(Integer.toBinaryString(flags), 8) + ", " + (flags);
- final boolean bigEndian = (flags & BIG_ENDIAN_FLAG_MASK) > 0;
  final boolean compressed = (flags & COMPRESSED_FLAG_MASK) > 0;
  final boolean hasHip = (flags & HIP_FLAG_MASK) > 0;
  final boolean hasSV = (flags & SUP_VAL_FLAG_MASK) > 0;
@@ -593,8 +585,6 @@ static String toString(final MemorySegment seg, final boolean detail) {
  final int formatOrdinal = (flags >>> 2) & 0x7;
  final Format format = Format.ordinalToFormat(formatOrdinal);

- final String nativeOrderStr = ByteOrder.nativeOrder().toString();
-
  long numCoupons = 0;
  long numSv = 0;
  long winOffset = 0;
@@ -616,8 +606,6 @@ static String toString(final MemorySegment seg, final boolean detail) {
  sb.append("Byte 3: lgK : ").append(lgK).append(LS);
  sb.append("Byte 4: First Interesting Col : ").append(fiCol).append(LS);
  sb.append("Byte 5: Flags : ").append(flagsStr).append(LS);
- sb.append(" BIG_ENDIAN_STORAGE : ").append(bigEndian).append(LS);
- sb.append(" (Native Byte Order) : ").append(nativeOrderStr).append(LS);
  sb.append(" Compressed : ").append(compressed).append(LS);
  sb.append(" Has HIP : ").append(hasHip).append(LS);
  sb.append(" Has Surprising Values : ").append(hasSV).append(LS);
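
The flag decoding that remains in `toString` after this change can be illustrated with a small standalone sketch. It is hypothetical: `FlagsByteDemo`, `isSet`, `formatOrdinal`, and `toBinary8` are invented names, and only the mask values, the `(flags & mask) > 0` test, and the `(flags >>> 2) & 0x7` expression are taken from the diff. Bit 0 is treated as reserved and is no longer reported.

```java
// Hypothetical helper class; only the mask values and the shift come from the diff.
final class FlagsByteDemo {
  static final int RESERVED_FLAG_MASK = 1; // bit 0, formerly BIG_ENDIAN_FLAG_MASK
  static final int COMPRESSED_FLAG_MASK = 2;
  static final int HIP_FLAG_MASK = 4;
  static final int SUP_VAL_FLAG_MASK = 8;

  // True if the bit selected by mask is set in flags.
  static boolean isSet(final int flags, final int mask) {
    return (flags & mask) > 0;
  }

  // Extracts the three-bit field held in bits 2 through 4.
  static int formatOrdinal(final int flags) {
    return (flags >>> 2) & 0x7;
  }

  // Eight-character binary rendering of the low byte (a stand-in for the zeroPad call in the diff).
  static String toBinary8(final int flags) {
    final String bits = Integer.toBinaryString(flags & 0xFF);
    return "0".repeat(8 - bits.length()) + bits;
  }

  public static void main(final String[] args) {
    final int flags = 0b0000_1110; // compressed, HIP, and surprising-values bits set
    System.out.println("Flags      : " + toBinary8(flags) + ", " + flags);
    System.out.println("Compressed : " + isSet(flags, COMPRESSED_FLAG_MASK));
    System.out.println("Has HIP    : " + isSet(flags, HIP_FLAG_MASK));
    System.out.println("Has SV     : " + isSet(flags, SUP_VAL_FLAG_MASK));
    System.out.println("Format ord : " + formatOrdinal(flags));
  }
}
```

Note that the three-bit field read by `formatOrdinal` spans the HIP and surprising-values bits, which follows directly from the shift-and-mask expression shown in the diff.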

src/main/java/org/apache/datasketches/hll/PreambleUtil.java

Lines changed: 2 additions & 10 deletions
@@ -32,7 +32,6 @@
  import static org.apache.datasketches.hll.HllUtil.RESIZE_NUMER;

  import java.lang.foreign.MemorySegment;
- import java.nio.ByteOrder;

  import org.apache.datasketches.common.Family;

@@ -133,7 +132,7 @@ private PreambleUtil() {}
  static int HLL_BYTE_ARR_START = 40;

  //Flag bit masks
- static final int BIG_ENDIAN_FLAG_MASK = 1; //Set but not read. Reserved.
+ static final int RESERVED_FLAG_MASK = 1; //Set to 0 but not read.
  static final int READ_ONLY_FLAG_MASK = 2; //Set but not read. Reserved.
  static final int EMPTY_FLAG_MASK = 4;
  static final int COMPACT_FLAG_MASK = 8;
@@ -150,8 +149,6 @@ private PreambleUtil() {}
  static final int LIST_PREINTS = 2;
  static final int HASH_SET_PREINTS = 3;
  static final int HLL_PREINTS = 10;
- static final boolean NATIVE_ORDER_IS_BIG_ENDIAN =
- (ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN);

  static String toString(final byte[] byteArr) {
  final MemorySegment seg = MemorySegment.ofArray(byteArr);
@@ -168,8 +165,6 @@ static String toString(final MemorySegment seg) {
  final int flags = seg.get(JAVA_BYTE, FLAGS_BYTE);
  //Flags
  final String flagsStr = zeroPad(Integer.toBinaryString(flags), 8) + ", " + (flags);
- final boolean bigEndian = (flags & BIG_ENDIAN_FLAG_MASK) > 0;
- final String nativeOrder = ByteOrder.nativeOrder().toString();
  final boolean compact = (flags & COMPACT_FLAG_MASK) > 0;
  final boolean oooFlag = (flags & OUT_OF_ORDER_FLAG_MASK) > 0;
  final boolean readOnly = (flags & READ_ONLY_FLAG_MASK) > 0;
@@ -219,8 +214,6 @@ else if (curMode == CurMode.HLL) {
  }
  //expand byte 5: Flags
  sb.append("Byte 5: Flags: : ").append(flagsStr).append(LS);
- sb.append(" BIG_ENDIAN_STORAGE : ").append(bigEndian).append(LS);
- sb.append(" (Native Byte Order) : ").append(nativeOrder).append(LS);
  sb.append(" READ_ONLY : ").append(readOnly).append(LS);
  sb.append(" EMPTY : ").append(empty).append(LS);
  sb.append(" COMPACT : ").append(compact).append(LS);
@@ -286,8 +279,7 @@ static void insertLgK(final MemorySegment wseg, final int lgK) {
  }

  static int extractLgArr(final MemorySegment seg) {
- final int lgArr = seg.get(JAVA_BYTE, LG_ARR_BYTE) & 0XFF;
- return lgArr;
+ return seg.get(JAVA_BYTE, LG_ARR_BYTE) & 0XFF;
  }

  static void insertLgArr(final MemorySegment wseg, final int lgArr) {

src/main/java/org/apache/datasketches/quantiles/PreambleUtil.java

Lines changed: 2 additions & 9 deletions
@@ -28,7 +28,6 @@
  import static org.apache.datasketches.quantiles.ClassicUtil.computeRetainedItems;

  import java.lang.foreign.MemorySegment;
- import java.nio.ByteOrder;

  //@formatter:off

@@ -92,15 +91,12 @@ private PreambleUtil() {}
  static final int COMBINED_BUFFER = 32; //to 39 (Only for DoublesSketch)

  // flag bit masks
- static final int BIG_ENDIAN_FLAG_MASK = 1;
+ static final int RESERVED_FLAG_MASK = 1;
  static final int READ_ONLY_FLAG_MASK = 2;
  static final int EMPTY_FLAG_MASK = 4;
  static final int COMPACT_FLAG_MASK = 8;
  static final int ORDERED_FLAG_MASK = 16;

- static final boolean NATIVE_ORDER_IS_BIG_ENDIAN =
- (ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN);
-
  /**
  * Default K for about 1.7% normalized rank accuracy
  */
@@ -143,8 +139,6 @@ private static String memorySegmentToString(final MemorySegment srcSeg, final bo
  final int familyID = extractFamilyID(srcSeg);
  final String famName = idToFamily(familyID).toString();
  final int flags = extractFlags(srcSeg);
- final boolean bigEndian = (flags & BIG_ENDIAN_FLAG_MASK) > 0;
- final String nativeOrder = ByteOrder.nativeOrder().toString();
  final boolean readOnly = (flags & READ_ONLY_FLAG_MASK) > 0;
  final boolean empty = (flags & EMPTY_FLAG_MASK) > 0;
  final boolean compact = (flags & COMPACT_FLAG_MASK) > 0;
@@ -166,8 +160,7 @@ private static String memorySegmentToString(final MemorySegment srcSeg, final bo
  sb.append("Byte 1: Serialization Version: ").append(serVer).append(LS);
  sb.append("Byte 2: Family : ").append(famName).append(LS);
  sb.append("Byte 3: Flags Field : ").append(String.format("%02o", flags)).append(LS);
- sb.append(" BIG ENDIAN : ").append(bigEndian).append(LS);
- sb.append(" (Native Byte Order) : ").append(nativeOrder).append(LS);
+ sb.append(" RESERVED : ").append(LS);
  sb.append(" READ ONLY : ").append(readOnly).append(LS);
  sb.append(" EMPTY : ").append(empty).append(LS);
  sb.append(" COMPACT : ").append(compact).append(LS);
