reject non-xsd lexical forms in lexFloat and lexDouble #44
xmlbeans is over 20 years old, and I don't think it's a good user experience for it to suddenly be strict on numbers like this.
Users can validate that their XML matches their XML schema using the built-in Java classes.
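For reference, here is a minimal sketch of that route. It uses the JAXP validation API that ships with the JDK (javax.xml.validation). The inline schema, the price element and the class name are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Illustrative only: validates <price>...</price> against a tiny inline schema
// using the JDK's built-in JAXP validation API. Schema and names are made up.
class JaxpFloatValidationSketch {

    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='price' type='xs:float'/>"
      + "</xs:schema>";

    // Compiles the schema; a failure here is a bug in the sketch, not invalid input.
    private static Schema compileSchema() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Schema did not compile", e);
        }
    }

    // True if <price>value</price> is valid against the schema above.
    static boolean isValidPrice(String value) {
        Validator validator = compileSchema().newValidator();
        try {
            validator.validate(new StreamSource(new StringReader("<price>" + value + "</price>")));
            return true;
        } catch (SAXException e) {
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the JDK's bundled validator, values such as 1.5f or Infinity should be reported as invalid for xs:float, while 1.5 and INF pass.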
Fair point; defaulting to strict was the wrong call for a library this old. When I traced the call path, I found a snag: lexFloat/lexDouble are static, and the validation route (JavaFloatHolder/JavaDoubleHolder.validateLexical) only carries a ValidationContext, not XmlOptions. The options object doesn't reach the lexer today, so an XmlOptions opt-in means threading the flag down through validateLexical and the ValidationContext interface. That's worth knowing before I touch that interface. For context, the trailing f/F (float) and d/D (double) check isn't new; it predates this PR. What I added on top was rejection of the cross-type suffix, the hex form and the "Infinity" spelling. So either I wire up an XmlOptions opt-in and thread it through (with the default staying lenient), or I drop the added strictness and leave the lexers as they were. Which would you prefer? Given your point that users can run strict validation through JAXP, I'm happy with the latter if you'd rather keep the lexers lenient.
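The sketch below shows the rough shape of the opt-in, assuming the caller hands the strictness flag down explicitly. NumericLexPolicy, OptInDoubleLexSketch and strictXsdNumerics are hypothetical names. In practice the flag would have to come from XmlOptions via validateLexical and ValidationContext, which is the threading described above.

```java
// Hypothetical sketch: none of these names exist in xmlbeans.
// Strictness is passed down explicitly; the default stays lenient.
interface NumericLexPolicy {
    // Assumption: a caller would populate this from an XmlOptions setting.
    boolean strictXsdNumerics();
}

final class OptInDoubleLexSketch {

    // Default policy: keep today's lenient behaviour.
    static final NumericLexPolicy LENIENT = () -> false;

    private OptInDoubleLexSketch() {}

    // Existing-style entry point; callers that pass no policy are unaffected.
    static double lexDouble(String v) {
        return lexDouble(v, LENIENT);
    }

    static double lexDouble(String v, NumericLexPolicy policy) {
        String s = v.trim();
        char last = s.isEmpty() ? '\0' : s.charAt(s.length() - 1);
        // Mirrors the pre-existing check: a trailing d/D is always rejected.
        if (last == 'd' || last == 'D') {
            throw new NumberFormatException("Invalid char '" + last + "' in double.");
        }
        if (policy.strictXsdNumerics()) {
            // Opt-in checks: cross-type suffix (INF/-INF legitimately end in F),
            // hex form, and the Java "Infinity" spelling.
            boolean crossSuffix = (last == 'f' || last == 'F') && !s.endsWith("INF");
            boolean hex = s.indexOf('x') >= 0 || s.indexOf('X') >= 0;
            if (crossSuffix || hex || s.contains("Infinity")) {
                throw new NumberFormatException("Not an xsd:double lexical form: " + v);
            }
        }
        switch (s) {
            case "INF":  return Double.POSITIVE_INFINITY;
            case "-INF": return Double.NEGATIVE_INFINITY;
            default:     return Double.parseDouble(s); // also handles "NaN"
        }
    }
}
```

Callers that never pass a policy keep today's behaviour. That is the point of defaulting to lenient.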
I've got stuff on today, so I can't spend much time on this. The aim would be that calls like
Found while feeding malformed numeric strings through the lexical converters.
Float.parseFloat and Double.parseDouble accept hex floating-point literals (e.g. 0x1.8p1), the Java "Infinity" token and a trailing f/F/d/D type suffix. None of these are in the xsd:float/xsd:double lexical space. lexFloat only guarded against a trailing f/F, and lexDouble only against a trailing d/D. As a result, each let the other type's suffix through, and both let hex floats and "Infinity" through.
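To see the JDK side of this in isolation, the sketch below feeds a few of these spellings straight to the JDK parsers. The class and method names are made up for illustration. Every string in the list is outside the xsd:float/xsd:double lexical space, yet each should parse. The XSD spelling "INF", by contrast, is rejected by the JDK.

```java
// Illustrative only: checks which non-XSD spellings the JDK parsers accept.
class JdkLenientParseDemo {

    // True if Float.parseFloat accepts s without throwing.
    static boolean parsesAsFloat(String s) {
        try {
            Float.parseFloat(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // True if Double.parseDouble accepts s without throwing.
    static boolean parsesAsDouble(String s) {
        try {
            Double.parseDouble(s);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // None of these are valid xsd:float or xsd:double lexical forms.
        String[] nonXsd = {"0x1.8p1", "Infinity", "-Infinity", "1.5f", "1.5F", "1.5d", "1.5D"};
        for (String s : nonXsd) {
            System.out.println(s + ": float=" + parsesAsFloat(s) + ", double=" + parsesAsDouble(s));
        }
        // "INF" is the XSD spelling, but the JDK parsers reject it.
        System.out.println("INF: float=" + parsesAsFloat("INF") + ", double=" + parsesAsDouble("INF"));
    }
}
```

The last line shows why lexFloat/lexDouble have to map INF and -INF themselves rather than relying on the JDK parser.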
This is reachable from untrusted XML through JavaFloatHolder/JavaDoubleHolder.validateLexical, so a value outside the lexical space passes validation instead of being flagged as invalid.
Moved the guard into a shared helper that rejects the hex marker, the "Infinity" spelling and the trailing type suffix before parsing. INF, -INF, NaN and the ordinary decimal/exponent forms are unaffected.
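Here is a minimal sketch of what such a shared guard could look like, assuming whitespace is trimmed before the checks. rejectNonXsdForms, XsdNumericLexSketch and the lexFloat/lexDouble bodies here are hypothetical stand-ins, not the code in this PR or the current xmlbeans API. The one subtle case is the suffix check: INF and -INF legitimately end in F, so they are excluded before the trailing character is inspected.

```java
// Hypothetical sketch of a shared pre-parse guard; not the actual xmlbeans code.
final class XsdNumericLexSketch {

    private XsdNumericLexSketch() {}

    // Throws NumberFormatException for Java-only spellings that the JDK
    // parsers accept but that fall outside the xsd:float/xsd:double lexical space.
    static void rejectNonXsdForms(String v) {
        // Assumption: surrounding whitespace is stripped first (whiteSpace="collapse").
        String s = v.trim();
        // Hex floating-point literals always carry an 'x' or 'X' marker.
        if (s.indexOf('x') >= 0 || s.indexOf('X') >= 0) {
            throw new NumberFormatException("Hex form not allowed: " + v);
        }
        // Java's "Infinity" token; XSD spells it INF / -INF.
        if (s.contains("Infinity")) {
            throw new NumberFormatException("Infinity spelling not allowed: " + v);
        }
        // Trailing f/F/d/D type suffix, except INF and -INF, which end in 'F'.
        if (!s.isEmpty() && !s.endsWith("INF")) {
            char last = s.charAt(s.length() - 1);
            if (last == 'f' || last == 'F' || last == 'd' || last == 'D') {
                throw new NumberFormatException("Type suffix not allowed: " + v);
            }
        }
    }

    static float lexFloat(String v) {
        rejectNonXsdForms(v);
        String s = v.trim();
        switch (s) {
            case "INF":  return Float.POSITIVE_INFINITY;
            case "-INF": return Float.NEGATIVE_INFINITY;
            default:     return Float.parseFloat(s); // also handles "NaN"
        }
    }

    static double lexDouble(String v) {
        rejectNonXsdForms(v);
        String s = v.trim();
        switch (s) {
            case "INF":  return Double.POSITIVE_INFINITY;
            case "-INF": return Double.NEGATIVE_INFINITY;
            default:     return Double.parseDouble(s); // also handles "NaN"
        }
    }
}
```

Because the guard runs before Float.parseFloat/Double.parseDouble, the JDK's leniency never comes into play for these three forms. Everything else still goes through the JDK parser unchanged.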