Add extract_regex and validate_pattern primitives (#100, #101) #107
Merged
Two regex-based primitives, paired in one change because they share infrastructure (flag whitelist, null-handling decorator, factory wiring):

- `extract_regex` pulls a capture group out of a string. Supports positional or named groups, optional regex flags from a whitelist (IGNORECASE/MULTILINE/DOTALL), and the standard strict/default fallback pattern.
- `validate_pattern` asserts that a value matches a regex and returns it unchanged on success. Configurable mode (match/fullmatch/search) picks the anchoring semantics, with the same strict/default surface.

Both compile their patterns eagerly so a malformed regex fails at construction time rather than per-row, and both go through @handle_null so missing values pass through untouched.

Resolves #100, #101.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
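The eager-compilation and flag-whitelist behaviour described in the commit message can be illustrated with a minimal sketch. The names `ALLOWED_FLAGS`, `resolve_flags`, and `ExtractRegexSketch` are hypothetical stand-ins rather than the code in this change, and the use of `search` for extraction and the plain `None` check in place of `@handle_null` are assumptions made for illustration.

```python
import re

# Hypothetical whitelist mirroring the three flags named in the commit message.
ALLOWED_FLAGS = {
    "IGNORECASE": re.IGNORECASE,
    "MULTILINE": re.MULTILINE,
    "DOTALL": re.DOTALL,
}


def resolve_flags(names):
    """Combine whitelisted flag names into a single value for re.compile."""
    combined = 0
    for name in names or ():
        if name not in ALLOWED_FLAGS:
            raise ValueError(f"unsupported regex flag: {name!r}")
        combined |= ALLOWED_FLAGS[name]
    return combined


class ExtractRegexSketch:
    """Illustrative stand-in, not the primitive added by this PR."""

    def __init__(self, expression, group=1, flags=None, strict=True, default=None):
        # Compile eagerly: a malformed pattern raises re.error here,
        # at construction time, instead of once per row.
        self._pattern = re.compile(expression, resolve_flags(flags))
        self.group = group
        self.strict = strict
        self.default = default

    def __call__(self, value):
        if value is None:  # assumed stand-in for @handle_null
            return None
        found = self._pattern.search(value)  # assumption: search semantics
        if found is None:
            if self.strict:
                raise ValueError(f"no match for {self._pattern.pattern!r}")
            return self.default
        # group may be an integer index or a group name
        return found.group(self.group)
```

Under these assumptions, `ExtractRegexSketch("(")` fails immediately with `re.error`, and a flag name outside the whitelist is rejected before any row is processed.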
Summary
Adds two regex-based primitives, `extract_regex` and `validate_pattern`, paired in one PR because they share infrastructure (flag whitelist, null-handling decorator, factory wiring) and the same test patterns. Closes #100 and #101.

extract_regex (#100)

Pulls a capture group out of a string. Supports positional or named groups, optional regex flags from a whitelisted set (`IGNORECASE`, `MULTILINE`, `DOTALL`), and the standard `strict`/`default` fallback pattern.

{ "operation": "extract_regex", "expression": "MRN[:\\s]+([A-Z0-9-]+)", "group": 1, "strict": true }

validate_pattern (#101)

Asserts that a value matches a regex; returns the original value on success, or raises (strict) / returns `default` (non-strict) on failure. Configurable `mode` picks the anchoring semantics:

- `match`: anchored at start (default)
- `fullmatch`: anchored at both ends
- `search`: match anywhere

{ "operation": "validate_pattern", "expression": "^[A-Z]{2}\\d{6}$", "mode": "fullmatch", "strict": true }

Design notes

- […] VERBOSE) so the serialized form stays portable and reviewable.
- `@handle_null` + `@support_iterable`, consistent with the other string primitives: nulls pass through unchanged, lists fan out per element.
- `validate_pattern` deliberately reuses `extract_regex`'s flag resolver to keep the two primitives in lockstep on the supported flag vocabulary.

Tests
31 new tests in `tests/test_regex_primitives.py` covering every bullet in the issues' test plans: basic / named-group / explicit-index extraction, full-match group 0, strict + non-strict failure paths, invalid group, iterable input, flags, invalid pattern, unknown flag, all three validate modes, serialization roundtrip with and without flags, null pass-through, and a HarmonizationRule chain that goes through serialization on both primitives.

Total: 162 tests pass (was 131).
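For readers less familiar with the three validate modes covered by these tests, the sketch below shows how they differ when mapped onto the `match`, `fullmatch`, and `search` methods of Python's compiled patterns. The function `validate_pattern_sketch` is a hypothetical illustration, not the primitive from this PR; in particular it compiles on every call for brevity, whereas the PR compiles once at construction.

```python
import re

VALID_MODES = ("match", "fullmatch", "search")


def validate_pattern_sketch(value, expression, mode="match", strict=True, default=None):
    """Return value unchanged if it matches; otherwise raise or return default."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    pattern = re.compile(expression)
    if value is None:  # assumed stand-in for null pass-through
        return None
    # Pattern.match anchors at the start, Pattern.fullmatch at both ends,
    # and Pattern.search accepts a match anywhere in the string.
    matcher = getattr(pattern, mode)
    if matcher(value) is not None:
        return value
    if strict:
        raise ValueError(f"{value!r} does not satisfy {expression!r} ({mode})")
    return default


code = r"[A-Z]{2}\d{6}"

# Anchored at start only: trailing text is tolerated.
print(validate_pattern_sketch("AB123456-x", code, mode="match"))
# Anchored at both ends: the same input is rejected, so the default comes back.
print(validate_pattern_sketch("AB123456-x", code, mode="fullmatch",
                              strict=False, default="INVALID"))
# Match anywhere: leading text is tolerated too.
print(validate_pattern_sketch("id=AB123456", code, mode="search"))
```

The choice of mode matters most for identifiers embedded in longer strings: `fullmatch` is the strictest option and does not need explicit `^`/`$` anchors in the pattern, although including them, as in the JSON example, is harmless.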
Test plan
`pytest`: 162/162 pass

🤖 Generated with Claude Code