
reject non-xsd lexical forms in lexFloat and lexDouble #44

Open
aizu-m wants to merge 1 commit into apache:trunk from aizu-m:lexfloat-lexdouble-xsd-lexical

aizu-m commented Jun 13, 2026


Found while feeding malformed numerics through the lexical converters.

XsTypeConverter.lexFloat("0x1p4")     -> 16.0     (accepted)
XsTypeConverter.lexFloat("Infinity")  -> Infinity (accepted)
XsTypeConverter.lexFloat("1.0d")      -> 1.0      (accepted)
XsTypeConverter.lexDouble("1.0f")     -> 1.0      (accepted)

Float.parseFloat and Double.parseDouble take hex floats, the Java "Infinity" token and a trailing f/F/d/D suffix. None of those are in the xsd:float/xsd:double lexical space. lexFloat only guarded a trailing f/F and lexDouble only a trailing d/D, so each let the other type's suffix through, and both let hex floats and "Infinity" through.
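The JDK behaviour described above can be checked in isolation. This is a small sketch that only uses `Float.parseFloat` and `Double.parseDouble`; the class and method names are made up for illustration and are not part of XMLBeans.

```java
final class JdkFloatGrammarDemo {
    private JdkFloatGrammarDemo() {}

    /** True if the JDK float parser accepts the string without throwing. */
    static boolean jdkAcceptsFloat(String s) {
        try {
            Float.parseFloat(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** True if the JDK double parser accepts the string without throwing. */
    static boolean jdkAcceptsDouble(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hex float, Java infinity token, both type suffixes, and the xsd token INF.
        for (String s : new String[] {"0x1p4", "Infinity", "1.0d", "1.0f", "INF"}) {
            System.out.println(s + " float=" + jdkAcceptsFloat(s)
                    + " double=" + jdkAcceptsDouble(s));
        }
    }
}
```

Both parsers accept the first four strings and reject `INF`, so the xsd infinity token has to be mapped by hand whichever way the strictness question is settled.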

This is reachable from untrusted XML through JavaFloatHolder/JavaDoubleHolder.validateLexical, so an out-of-space value validates clean instead of being flagged invalid.

Moved the guard into a shared helper that rejects the hex marker, the Infinity letters and the trailing suffix before parsing. INF, -INF, NaN and the ordinary decimal/exponent forms are untouched.
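For readers without the diff open, here is a rough sketch of what a pre-parse guard of this kind can look like. It is not the code in this PR: the names `XsdFloatGuard`, `isXsdFloatLexical` and `lexFloatStrict` are hypothetical, and it uses a character whitelist instead of the three specific checks described above. It leaves structural validation (for example a dangling exponent) to `Float.parseFloat`.

```java
final class XsdFloatGuard {
    private XsdFloatGuard() {}

    /**
     * Hypothetical guard: accepts the three xsd special tokens, otherwise
     * requires at least one digit and only the characters 0-9 + - . e E.
     * That excludes hex floats, "Infinity" and the f/F/d/D suffixes.
     */
    static boolean isXsdFloatLexical(String s) {
        if (s.equals("INF") || s.equals("-INF") || s.equals("NaN")) {
            return true;
        }
        boolean sawDigit = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                sawDigit = true;
            } else if (c != '+' && c != '-' && c != '.' && c != 'e' && c != 'E') {
                return false;
            }
        }
        return sawDigit;
    }

    /** Hypothetical strict lexer: guard first, then delegate to the JDK. */
    static float lexFloatStrict(String s) {
        if (!isXsdFloatLexical(s)) {
            throw new NumberFormatException("not an xsd:float lexical form: " + s);
        }
        // The JDK parser does not know the xsd tokens, so map them by hand.
        if (s.equals("INF")) return Float.POSITIVE_INFINITY;
        if (s.equals("-INF")) return Float.NEGATIVE_INFINITY;
        if (s.equals("NaN")) return Float.NaN;
        return Float.parseFloat(s);
    }
}
```

A whitelist is somewhat stricter than rejecting known bad markers, since it also refuses any other stray letter; whether that is desirable is part of the design question raised in this thread.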

@pjfanning

pjfanning commented Jun 13, 2026

Member

xmlbeans is over 20 years old, and for it to suddenly become strict on numbers like this - I don't think that is a good user experience

  • I thought 0x1p4 looked wrong - but apparently it is parseable (power of) - so not XSD compliant but parseable
  • failing because of a trailing d or f: I would prefer this to be optional, controlled through the XmlOptions class
  • likewise, failing on Infinity could be controlled by the same XmlOptions option

@pjfanning

Member

Users can validate that their XML matches their XML schema using built-in Java classes:

import java.io.IOException;
import java.io.InputStream;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public final class XsdValidator {

    /**
     * Returns true if the XML validates against the schema.
     * Schema-loading and I/O failures propagate to the caller;
     * validation failures are logged and reported as false.
     */
    public static boolean validate(InputStream xmlInputStream,
                                   InputStream xsdInputStream)
            throws SAXException, IOException {

        // Create SchemaFactory for W3C XML Schema
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

        // Load the schema (throws SAXException if the schema itself is broken)
        Schema schema = factory.newSchema(new StreamSource(xsdInputStream));

        // Create Validator
        Validator validator = schema.newValidator();

        // Optional: set error handler for detailed messages
        // validator.setErrorHandler(new MyErrorHandler());

        try {
            validator.validate(new StreamSource(xmlInputStream));
            System.out.println("XML is valid according to the schema.");
            return true;
        } catch (SAXException e) {
            System.err.println("XML validation failed: " + e.getMessage());
            return false;
        }
    }
}

@aizu-m

aizu-m commented Jun 13, 2026

Author

Fair point, defaulting to strict was the wrong call for a library this old.

When I traced the path in, the snag is that lexFloat/lexDouble are static and the validation route (JavaFloatHolder/JavaDoubleHolder.validateLexical) only carries a ValidationContext, not XmlOptions. So an XmlOptions opt-in means threading the flag down through validateLexical and the ValidationContext interface, since the options object doesn't reach the lexer today. Worth knowing before I touch that interface.
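One way to limit the blast radius of touching that interface, assuming the codebase targets Java 8 or later, is a default method that reports "lenient" unless an implementer overrides it. The sketch below uses a hypothetical stand-in interface, `LexContext`, and a hypothetical `strictNumerics()` flag; it is not the real ValidationContext and nothing here exists in XMLBeans today.

```java
final class ContextFlagSketch {
    private ContextFlagSketch() {}

    /** Hypothetical stand-in for a validation callback interface. */
    interface LexContext {
        void invalid(String message);

        /** Hypothetical flag; defaults to lenient so existing implementers compile unchanged. */
        default boolean strictNumerics() {
            return false;
        }
    }

    /** True for forms the JDK parser takes but that this thread calls out as non-xsd. */
    static boolean hasJavaOnlySyntax(String v) {
        if (v.isEmpty() || v.equals("INF") || v.equals("-INF") || v.equals("NaN")) {
            return false;
        }
        char last = v.charAt(v.length() - 1);
        return v.indexOf('x') >= 0 || v.indexOf('X') >= 0
                || v.contains("Infinity")
                || last == 'f' || last == 'F' || last == 'd' || last == 'D';
    }

    /** Reports Java-only forms as invalid, but only when the context asks for strictness. */
    static void validateFloatLexical(String v, LexContext ctx) {
        if (ctx.strictNumerics() && hasJavaOnlySyntax(v)) {
            ctx.invalid("not an xsd:float lexical form: " + v);
        }
    }
}
```

The strict flag still has to be set from somewhere, so this only addresses source compatibility of the interface change, not how the options object gets to the context.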

For context, the trailing f/F (float) and d/D (double) check isn't new, it predates this PR. The parts I added on top were the cross-type suffix, the hex form and the "Infinity" spelling.

So either I wire up the XmlOptions opt-in and thread it through (default stays lenient), or I drop the added strictness and leave the lexers as they were. Which would you rather? Given your point about users running strict validation through JAXP, I'm happy with the latter if you'd sooner keep the lexer lenient.

@pjfanning

Member

I've got a lot on today, so I can't spend much time on this.
This is not urgent, so I'd suggest you don't spend too much time on it until a design is agreed.
Maybe the ValidationContext or an existing param could be modified to include the XmlOptions.

The aim would be that calls like SstDocument.Factory.parse(is, DEFAULT_XML_OPTIONS); have their options passed down through the call stack. We can add new overloads for the static methods that need options and use these overloads instead of the existing ones.
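The overload idea might look roughly like the sketch below. `Options` is a hypothetical stand-in for XmlOptions and `strictNumerics` is an invented flag name. For brevity the lenient path here delegates straight to the JDK and leaves out the same-type suffix check that the existing lexers already perform.

```java
final class LexOverloadSketch {
    private LexOverloadSketch() {}

    /** Hypothetical stand-in for XmlOptions carrying one invented flag. */
    static final class Options {
        final boolean strictNumerics;

        Options(boolean strictNumerics) {
            this.strictNumerics = strictNumerics;
        }
    }

    static final Options LENIENT = new Options(false);

    /** Existing-style entry point: keeps lenient behaviour for current callers. */
    static double lexDouble(String s) {
        return lexDouble(s, LENIENT);
    }

    /** New overload: strictness is opt-in through the options object. */
    static double lexDouble(String s, Options opts) {
        // xsd special tokens are not understood by the JDK parser.
        if (s.equals("INF")) return Double.POSITIVE_INFINITY;
        if (s.equals("-INF")) return Double.NEGATIVE_INFINITY;
        if (s.equals("NaN")) return Double.NaN;
        if (opts.strictNumerics && !onlyXsdChars(s)) {
            throw new NumberFormatException("not an xsd:double lexical form: " + s);
        }
        return Double.parseDouble(s);
    }

    /** Allows only digits, sign, decimal point and exponent marker. */
    private static boolean onlyXsdChars(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean ok = (c >= '0' && c <= '9')
                    || c == '+' || c == '-' || c == '.' || c == 'e' || c == 'E';
            if (!ok) {
                return false;
            }
        }
        return true;
    }
}
```

Callers that already hold an options object would switch to the two-argument form, while everything else keeps its current behaviour through the one-argument form.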
