Porting seq_split to master by CEisenhofer · Pull Request #9840 · Z3Prover/z3

CEisenhofer · 2026-06-12T13:19:49Z

No description provided.

CEisenhofer · 2026-06-12T13:20:30Z

@NikolajBjorner Currently just the code. Still needs to be used within the legacy solver

CEisenhofer · 2026-06-12T15:06:31Z

Update: Added it to the seq solver (probably quite buggy)

NikolajBjorner · 2026-06-13T22:49:33Z

+};
+
+// A split-set is a union of individual splits.
+typedef vector<split_pair> split_set;


What about making this API more abstract?
The service it provides is to expose an iterator over split_sets.

You can have the iterator return a std::optional<split_pair> to communicate the failure case. Alternatively, the iterator can expose a method that reports whether it provided only a partial split enumeration.

namespace seq {
    class split {
        struct imp;
        imp* m_imp;
    public:
        split(...);
        class iterator { // see term_enumerator branch for term_enumerator.h

        };
        class set {
        public:
            iterator begin();
            iterator end();
        };
        set operator()(expr* r);
    };
}
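To make the suggestion concrete, here is a minimal, self-contained sketch of an enumerator that returns std::optional and reports a partial enumeration. Everything in it is an assumption for illustration: `split_enumerator`, `next`, `is_partial`, and the budget parameter are hypothetical names, and `split_pair` is modeled as a pair of plain strings rather than Z3 expressions.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for split_pair: a <prefix, suffix> pair of regexes,
// represented here as plain strings instead of Z3 expressions.
using split_pair = std::pair<std::string, std::string>;

// Hypothetical enumerator: hands out splits one at a time and stops early
// once `budget` splits have been produced.
class split_enumerator {
    std::vector<split_pair> m_pending;
    std::size_t m_next = 0;
    std::size_t m_budget;
public:
    split_enumerator(std::vector<split_pair> splits, std::size_t budget)
        : m_pending(std::move(splits)), m_budget(budget) {}

    // Returns the next split, or std::nullopt when the enumeration ends.
    std::optional<split_pair> next() {
        assert(m_next <= m_pending.size());
        if (m_next >= m_pending.size() || m_next >= m_budget)
            return std::nullopt;
        return m_pending[m_next++];
    }

    // True if the enumeration stopped because of the budget while splits
    // were still pending, i.e. the caller saw only a partial enumeration.
    bool is_partial() const {
        return m_next < m_pending.size() && m_next >= m_budget;
    }
};
```

A caller would loop with `while (auto p = e.next())` and check `e.is_partial()` afterwards, which keeps the threshold decision out of the individual set operations.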

NikolajBjorner · 2026-06-13T22:56:43Z

+
+    // S1 cap S2 = { <D1 cap D2, N1 cap N2> } dropping any pair with a bottom
+    // component (and any rejected by `oracle`).  Returns false on threshold overrun.
+    bool intersect(split_set const& s1, split_set const& s2, split_set& result,


How about lazy expansion of split sets?
The notes in z3paper/resplit/prompts outline an algebraic datatype corresponding to suspended computations.
You can use the expr class with declarations that are local to this class to maintain these suspended computations instead of coming up with a separate type.

Then consider "threshold": could it be controlled outside of this class, given that expansion would be lazy? Even lazy evaluation could end up bloating space, so you may eventually want to keep the threshold parameter.

I have it locally as a WIP. However, it is currently quite buggy, so I committed this "eager" approach for now to collect some feedback before leaving. (I will apply the changes once I am back on the 22nd.)

NikolajBjorner · 2026-06-14T01:14:02Z

NB @veanes

NikolajBjorner · 2026-06-14T01:15:17Z

There are changes to seq_rewriter.
Can these go into a separate PR?
Some changes are cosmetic, such as replacing "else if" by "if".
What are the substantial ones?
Note that the changes are in code that the derive branch is replacing.

NikolajBjorner · 2026-06-14T01:19:00Z

            return;
        }

+        if (th.get_fparams().m_seq_regex_factorization_enabled) {


This is for the legacy solver.
Shouldn't we focus on the c3 branch?
With updates like this we also have to test and fine-tune parameters. Preferably, do not guard functionality under parameter settings.

But it might affect performance quite a bit, so a feature flag seems the way to go, especially since this feature is actually for the nseq solver.
So just have it always enabled?

NikolajBjorner · 2026-06-14T01:20:32Z

@CEisenhofer - can you also add unit tests for this pr?

NikolajBjorner · 2026-06-14T03:29:22Z

+
+Author:
+
+    Nikolaj Bjorner (nbjorner) 2026-6-10


Why not 🙃?

NikolajBjorner · 2026-06-14T03:34:02Z

+    pairs.swap(result);
+}
+
+std::pair<expr*, expr*> seq_split::split_membership(expr* str, expr* regex, unsigned threshold, split_set& result) const {


The reference counts on the returned expressions do not appear to be accounted for in a self-contained way. Can you return a pair of expr_ref?
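As a rough illustration of the ownership point, the sketch below returns a pair of owning handles, so the results stay alive independently of anything local to the function. It is only an analogy: `std::shared_ptr` stands in for Z3's reference-counting handle, and `expr`, `expr_handle`, and `split_membership_sketch` are hypothetical names, not the PR's code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an AST node.
struct expr { std::string name; };

// Stand-in for an owning, reference-counted handle (an assumption for this
// sketch; Z3's own handle type works against the ast_manager instead).
using expr_handle = std::shared_ptr<expr>;

// Returns owning handles, so the caller never holds a pointer whose
// reference count was dropped when the function's locals went away.
std::pair<expr_handle, expr_handle> split_membership_sketch(std::string const& str) {
    auto rest  = std::make_shared<expr>(expr{str + "'"});
    auto regex = std::make_shared<expr>(expr{"D(" + str + ")"});
    assert(rest.use_count() == 1 && regex.use_count() == 1);
    return {std::move(rest), std::move(regex)};
}
```

With raw pointers in the return type, the same function would need the caller to know which container or handle is keeping the nodes alive.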

NikolajBjorner · 2026-06-14T03:35:03Z

+    seq_rewriter& m_rw;       // for mk_re_append + manager / seq_util access
+    seq_subset    m_subset;   // language-subset checks for subsumption
+
+    ast_manager&   m() const;


just use the attribute "m" for ast_manager and don't introduce this one.
It was an old convention that turned out to be pure overhead.

NikolajBjorner · 2026-06-14T03:36:50Z

+        unsigned cv;
+        VERIFY(seq().str.is_unit(tokens.get(run_start + i), ch));
+        VERIFY(seq().is_const_char(ch, cv));
+        c = c + zstring(cv);


First build a vector, then create a string from that vector. Concatenating non-mutable strings is a perf trap.
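A small sketch of the difference, using standard-library types as stand-ins (`std::u32string` for the string class and `char32_t` for a code point; the function names are hypothetical). The first function mirrors the pattern in the hunk, the second collects the characters and constructs the string once.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Pattern from the hunk: every `+` allocates a new string and copies the
// prefix built so far, so a run of n characters costs O(n^2) copies.
std::u32string concat_one_by_one(std::vector<char32_t> const& run) {
    std::u32string c;
    for (char32_t cv : run)
        c = c + std::u32string(1, cv);
    return c;
}

// Suggested shape: collect the code points in a buffer first, then
// construct the string from the buffer in a single step.
std::u32string build_once(std::vector<char32_t> const& run) {
    std::vector<char32_t> buffer;
    buffer.reserve(run.size());
    for (char32_t cv : run)
        buffer.push_back(cv);
    assert(buffer.size() == run.size());
    return std::u32string(buffer.begin(), buffer.end());
}
```

Both functions produce the same string; only the number of intermediate allocations and copies differs.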

NikolajBjorner · 2026-06-14T03:38:09Z

+        oracle = [this, &c](expr*, expr* n) { return split_lookahead_viable(n, c); };
+
+    // Decompose the regex into a split-set via the shared seq_split engine
+    if (!m_rw.split(regex, result, threshold, split_mode::strong, oracle)) {


is this roundabout?
seq_rewriter calls compute or split again, right?

NikolajBjorner · 2026-06-14T03:38:41Z

+    // of each postfix
+    if (!c.empty()) {
+        unsigned w = 0;
+        for (i = 0; i < result.size(); ++i) {


can you use for comprehension whenever it applies?

NikolajBjorner · 2026-06-14T03:39:49Z

+    vector<expr*> stack;
+    stack.push_back(str);
+
+    while (!stack.empty()) {


this is a utility function somewhere

NikolajBjorner · 2026-06-14T03:42:08Z

+        regex = m_rw.mk_derivative(ch, regex);
+    }
+
+    if (i > 0) {


there may be a self-contained function for that somewhere. If not, why not add it? You are shifting a suffix into a prefix.

Sure; will do

NikolajBjorner · 2026-06-14T03:43:04Z

+        return { nullptr, nullptr };
+    }
+
+    m_rw.simplify_split(result);


isn't this just calling into seq_split?

Yep; it is. Residue of the porting

NikolajBjorner · 2026-06-14T03:44:15Z

+    seq_util::rex& r = re();
+
+    // 1. drop pairs with a bottom (empty-language) component.
+    unsigned w = 0;


this is a "filter" function.
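For reference, the manual read/write-index compaction (`unsigned w = 0; ...`) can be expressed as a generic filter. The sketch below is self-contained and uses hypothetical names (`filter`, `drop_bottom_pairs`, `is_bottom`); regexes are modeled as strings, with `"re.none"` standing for the empty language.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a split pair; regexes are plain strings here.
struct split_pair { std::string prefix, suffix; };

// "re.none" models the bottom (empty-language) regex in this sketch.
bool is_bottom(std::string const& r) { return r == "re.none"; }

// Generic in-place filter: keeps the elements for which `keep` holds and
// preserves their relative order.
template <typename T, typename Pred>
void filter(std::vector<T>& v, Pred keep) {
    auto new_end = std::remove_if(v.begin(), v.end(),
                                  [&](T const& x) { return !keep(x); });
    v.erase(new_end, v.end());
}

// Drop every pair that has a bottom component.
void drop_bottom_pairs(std::vector<split_pair>& s) {
    filter(s, [](split_pair const& p) {
        return !is_bottom(p.prefix) && !is_bottom(p.suffix);
    });
    assert(std::none_of(s.begin(), s.end(), [](split_pair const& p) {
        return is_bottom(p.prefix) || is_bottom(p.suffix);
    }));
}
```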

NikolajBjorner · 2026-06-14T03:48:08Z

+        vector<expr*> stack;
+        stack.push_back(s);
+
+        while (!stack.empty()) {


The code optimizes for failing fast, but it could be constructed from basic pieces, probably without a blip on the perf radar:
first get the concats,
then create the string,
and return false if there is a concat that isn't a string.
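The three steps could look roughly like the sketch below. It runs on a toy expression tree rather than Z3 terms, so `node`, `get_concats`, and `as_constant_string` are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy expression tree: a concat has children, a literal carries text,
// and a variable is anything that is not a constant string.
struct node {
    enum kind_t { concat, literal, variable } kind;
    std::string text;
    std::vector<node> args;
};

node lit(std::string s) { return node{node::literal, std::move(s), {}}; }
node var(std::string s) { return node{node::variable, std::move(s), {}}; }
node cat(std::vector<node> a) { return node{node::concat, "", std::move(a)}; }

// Step 1: flatten nested concats into their leaves, left to right.
void get_concats(node const& n, std::vector<node const*>& leaves) {
    assert(n.kind == node::concat || n.args.empty());
    if (n.kind == node::concat)
        for (node const& arg : n.args)
            get_concats(arg, leaves);
    else
        leaves.push_back(&n);
}

// Steps 2 and 3: build the string from the leaves and return false as
// soon as a leaf is not a string literal.
bool as_constant_string(node const& n, std::string& out) {
    std::vector<node const*> leaves;
    get_concats(n, leaves);
    out.clear();
    for (node const* leaf : leaves) {
        if (leaf->kind != node::literal)
            return false;
        out += leaf->text;
    }
    return true;
}
```

The flattening step does a full traversal even when an early leaf is a variable, which is the trade-off the comment accepts in exchange for simpler code.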

NikolajBjorner · 2026-06-14T03:48:53Z

+    // union: sigma(r0 | ... | r_{n-1}) = U sigma(ri)   (re.union may be n-ary)
+    if (rex.is_union(r)) {
+        app* ap = to_app(r);
+        for (unsigned i = 0; i < ap->get_num_args(); ++i) {


You can iterate directly:
for (expr* arg : *ap) {

}
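A range-based for loop only needs `begin()` and `end()` on the node type. The sketch below uses a hypothetical stand-in class named `app` that offers both access styles, to show that the indexed loop and the range-based loop visit the same arguments; it is a mock written for this example, not Z3's class.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct expr { std::string name; };

// Mock application node exposing indexed access and an iterator range.
class app {
    std::vector<expr*> m_args;
public:
    explicit app(std::vector<expr*> args) : m_args(std::move(args)) {}
    unsigned get_num_args() const { return static_cast<unsigned>(m_args.size()); }
    expr* get_arg(unsigned i) const { assert(i < m_args.size()); return m_args[i]; }
    expr* const* begin() const { return m_args.data(); }
    expr* const* end() const { return m_args.data() + m_args.size(); }
};

// Index-based traversal, as in the hunk.
std::string names_indexed(app const* ap) {
    std::string out;
    for (unsigned i = 0; i < ap->get_num_args(); ++i)
        out += ap->get_arg(i)->name;
    return out;
}

// Range-based traversal, as suggested in the review.
std::string names_ranged(app const* ap) {
    std::string out;
    for (expr* arg : *ap)
        out += arg->name;
    return out;
}
```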

NikolajBjorner · 2026-06-14T03:49:27Z

+    if (rex.is_concat(r)) {
+        app* ap = to_app(r);
+        const unsigned n = ap->get_num_args();
+        for (unsigned i = 0; i < n; ++i) {


you can iterate directly:
for (auto arg : *ap) {

}

NikolajBjorner · 2026-06-14T03:50:09Z

+                right = rex.mk_epsilon(seq_sort);
+            else {
+                right = ap->get_arg(i + 1);
+                for (unsigned j = i + 2; j < n; ++j) {


use an iterator:
for (auto arg : *ap) {

}

Porting seq_split to master

871a2b3

CEisenhofer requested a review from NikolajBjorner June 12, 2026 13:19

Probably quite instable, but included the splitting into the legacy s…

1feb610

…olver

NikolajBjorner reviewed Jun 13, 2026

NikolajBjorner reviewed Jun 14, 2026
