Skip to content

Porting seq_split to master#9840

Draft
CEisenhofer wants to merge 2 commits into
masterfrom
port_seq_split_to_master
Draft

Porting seq_split to master#9840
CEisenhofer wants to merge 2 commits into
masterfrom
port_seq_split_to_master

Conversation

@CEisenhofer

Copy link
Copy Markdown
Collaborator

No description provided.

@CEisenhofer

Copy link
Copy Markdown
Collaborator Author

@NikolajBjorner Currently just the code. Still needs to be used within the legacy solver

@CEisenhofer

Copy link
Copy Markdown
Collaborator Author

Update: Added it to the seq solver (probably quite buggy)

};

// A split-set is a union of individual splits.
typedef vector<split_pair> split_set;

@NikolajBjorner NikolajBjorner Jun 13, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about making this API more abstract.
The service it provides is to expose an iterator of split_sets.

You can have the iterator return a std::option<split_pair> to communicate failure case. Or you can have the iterator expose a method if it provided only a partial split enumeration.

namespace seq {
class split {
struct imp;
imp * m_imp;
public;
split(...);
class iterator { // see term_enumerator branch for term_enumerator.h

    };

    class set {
    public:
         iterator begin();
         iterator end();
    };

    set operator(expr* r);

};

}


// S1 cap S2 = { <D1 cap D2, N1 cap N2> } dropping any pair with a bottom
// component (and any rejected by `oracle`). Returns false on threshold overrun.
bool intersect(split_set const& s1, split_set const& s2, split_set& result,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about lazy expansion of split sets?
Considering the notes in z3paper/resplit/prompts it outlines an algebraic datatype corresponding to suspended computations.
You can use the expr class with declarations that are local to this class to maintain these suspended computations instead of coming up with a separate type.

Then considering "threshold". Could it be controlled outside of this class given that expansion would be lazy. Even lazy evaluation could end up bloating space so you may eventually keep the threshold parameter

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have it locally. WIP. However, currently quite buggy, so comitted this "eager" approach for now to collect some feedback before leaving. (Will apply the changed once I am back on the 22nd.)

@NikolajBjorner

Copy link
Copy Markdown
Contributor

NB @veanes

@NikolajBjorner

NikolajBjorner commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

there are changes to seq_rewriter.
Can these go into a separate PR?
Some changes are cosmetic such as replacing "else if" by "if".
What are the substantial ones?
Note that the changes are in code that the derive branch is replacing

Comment thread src/smt/seq_regex.cpp
return;
}

if (th.get_fparams().m_seq_regex_factorization_enabled) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for legacy solver.
Shouldn't we focus on c3 branch?
With updates like this we also have to test and fine tune parameters. Preferrably not guard functionality under parameter settings.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it might affect performance quite a bit, so a feature flag seems the way to go. Esp., since this feature is actually for the nseq solver.
So just having it enabled always?

@NikolajBjorner

Copy link
Copy Markdown
Contributor

@CEisenhofer - can you also add unit tests for this pr?


Author:

Nikolaj Bjorner (nbjorner) 2026-6-10

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not 🙃?

pairs.swap(result);
}

std::pair<expr*, expr*> seq_split::split_membership(expr* str, expr* regex, unsigned threshold, split_set& result) const {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reference counts on returned expressions appear not accounted for in a self-contained way. Can you return a pair of expr_ref?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

seq_rewriter& m_rw; // for mk_re_append + manager / seq_util access
seq_subset m_subset; // language-subset checks for subsumption

ast_manager& m() const;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just use the attribute "m" for ast_manager and don't introduce this one.
It was an old convention that turned out to be pure overhead.

unsigned cv;
VERIFY(seq().str.is_unit(tokens.get(run_start + i), ch));
VERIFY(seq().is_const_char(ch, cv));
c = c + zstring(cv);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first build a vector, then create a string from that vector. Concatenating non-mutual strings is a perf trap.

oracle = [this, &c](expr*, expr* n) { return split_lookahead_viable(n, c); };

// Decompose the regex into a split-set via the shared seq_split engine
if (!m_rw.split(regex, result, threshold, split_mode::strong, oracle)) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this roundabout?
seq_rewriter calls compute or split again, right?

// of each postfix
if (!c.empty()) {
unsigned w = 0;
for (i = 0; i < result.size(); ++i) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use for comprehension whenever it applies?

vector<expr*> stack;
stack.push_back(str);

while (!stack.empty()) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a utility function somewhere

regex = m_rw.mk_derivative(ch, regex);
}

if (i > 0) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there may be a self-contained function for that somewhere. If not, why not add it? You are shifting a suffix into a prefix.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure; will do

return { nullptr, nullptr };
}

m_rw.simplify_split(result);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this just calling into seq_split?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep; it is. Residue of the porting

seq_util::rex& r = re();

// 1. drop pairs with a bottom (empty-language) component.
unsigned w = 0;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a "filter" function.

vector<expr*> stack;
stack.push_back(s);

while (!stack.empty()) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code optimizes but for fail fast but could be constructed from basic pieces and probably without a blimp on the perf radar:
first get concats.
Then create string,
return false if there is a concat that isn't a string.

// union: sigma(r0 | ... | r_{n-1}) = U sigma(ri) (re.union may be n-ary)
if (rex.is_union(r)) {
app* ap = to_app(r);
for (unsigned i = 0; i < ap->get_num_args(); ++i) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can iterate direcly:
for (expr* arg : *ap) {

}

if (rex.is_concat(r)) {
app* ap = to_app(r);
const unsigned n = ap->get_num_args();
for (unsigned i = 0; i < n; ++i) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can iterate directly:
for (auto arg : *ap) {

}

right = rex.mk_epsilon(seq_sort);
else {
right = ap->get_arg(i + 1);
for (unsigned j = i + 2; j < n; ++j) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use iterator for (auto arg : *ap) {

}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants