Porting seq_split to master#9840
Conversation
|
@NikolajBjorner Currently just the code. Still needs to be used within the legacy solver |
|
Update: Added it to the seq solver (probably quite buggy) |
| }; | ||
|
|
||
| // A split-set is a union of individual splits. | ||
| typedef vector<split_pair> split_set; |
There was a problem hiding this comment.
what about making this API more abstract.
The service it provides is to expose an iterator of split_sets.
You can have the iterator return a std::option<split_pair> to communicate failure case. Or you can have the iterator expose a method if it provided only a partial split enumeration.
namespace seq {
class split {
struct imp;
imp * m_imp;
public;
split(...);
class iterator { // see term_enumerator branch for term_enumerator.h
};
class set {
public:
iterator begin();
iterator end();
};
set operator(expr* r);
};
}
|
|
||
| // S1 cap S2 = { <D1 cap D2, N1 cap N2> } dropping any pair with a bottom | ||
| // component (and any rejected by `oracle`). Returns false on threshold overrun. | ||
| bool intersect(split_set const& s1, split_set const& s2, split_set& result, |
There was a problem hiding this comment.
how about lazy expansion of split sets?
Considering the notes in z3paper/resplit/prompts it outlines an algebraic datatype corresponding to suspended computations.
You can use the expr class with declarations that are local to this class to maintain these suspended computations instead of coming up with a separate type.
Then considering "threshold". Could it be controlled outside of this class given that expansion would be lazy. Even lazy evaluation could end up bloating space so you may eventually keep the threshold parameter
There was a problem hiding this comment.
Have it locally. WIP. However, currently quite buggy, so comitted this "eager" approach for now to collect some feedback before leaving. (Will apply the changed once I am back on the 22nd.)
|
NB @veanes |
|
there are changes to seq_rewriter. |
| return; | ||
| } | ||
|
|
||
| if (th.get_fparams().m_seq_regex_factorization_enabled) { |
There was a problem hiding this comment.
this is for legacy solver.
Shouldn't we focus on c3 branch?
With updates like this we also have to test and fine tune parameters. Preferrably not guard functionality under parameter settings.
There was a problem hiding this comment.
But it might affect performance quite a bit, so a feature flag seems the way to go. Esp., since this feature is actually for the nseq solver.
So just having it enabled always?
|
@CEisenhofer - can you also add unit tests for this pr? |
|
|
||
| Author: | ||
|
|
||
| Nikolaj Bjorner (nbjorner) 2026-6-10 |
| pairs.swap(result); | ||
| } | ||
|
|
||
| std::pair<expr*, expr*> seq_split::split_membership(expr* str, expr* regex, unsigned threshold, split_set& result) const { |
There was a problem hiding this comment.
reference counts on returned expressions appear not accounted for in a self-contained way. Can you return a pair of expr_ref?
| seq_rewriter& m_rw; // for mk_re_append + manager / seq_util access | ||
| seq_subset m_subset; // language-subset checks for subsumption | ||
|
|
||
| ast_manager& m() const; |
There was a problem hiding this comment.
just use the attribute "m" for ast_manager and don't introduce this one.
It was an old convention that turned out to be pure overhead.
| unsigned cv; | ||
| VERIFY(seq().str.is_unit(tokens.get(run_start + i), ch)); | ||
| VERIFY(seq().is_const_char(ch, cv)); | ||
| c = c + zstring(cv); |
There was a problem hiding this comment.
first build a vector, then create a string from that vector. Concatenating non-mutual strings is a perf trap.
| oracle = [this, &c](expr*, expr* n) { return split_lookahead_viable(n, c); }; | ||
|
|
||
| // Decompose the regex into a split-set via the shared seq_split engine | ||
| if (!m_rw.split(regex, result, threshold, split_mode::strong, oracle)) { |
There was a problem hiding this comment.
is this roundabout?
seq_rewriter calls compute or split again, right?
| // of each postfix | ||
| if (!c.empty()) { | ||
| unsigned w = 0; | ||
| for (i = 0; i < result.size(); ++i) { |
There was a problem hiding this comment.
can you use for comprehension whenever it applies?
| vector<expr*> stack; | ||
| stack.push_back(str); | ||
|
|
||
| while (!stack.empty()) { |
There was a problem hiding this comment.
this is a utility function somewhere
| regex = m_rw.mk_derivative(ch, regex); | ||
| } | ||
|
|
||
| if (i > 0) { |
There was a problem hiding this comment.
there may be a self-contained function for that somewhere. If not, why not add it? You are shifting a suffix into a prefix.
| return { nullptr, nullptr }; | ||
| } | ||
|
|
||
| m_rw.simplify_split(result); |
There was a problem hiding this comment.
isn't this just calling into seq_split?
There was a problem hiding this comment.
Yep; it is. Residue of the porting
| seq_util::rex& r = re(); | ||
|
|
||
| // 1. drop pairs with a bottom (empty-language) component. | ||
| unsigned w = 0; |
There was a problem hiding this comment.
this is a "filter" function.
| vector<expr*> stack; | ||
| stack.push_back(s); | ||
|
|
||
| while (!stack.empty()) { |
There was a problem hiding this comment.
The code optimizes but for fail fast but could be constructed from basic pieces and probably without a blimp on the perf radar:
first get concats.
Then create string,
return false if there is a concat that isn't a string.
| // union: sigma(r0 | ... | r_{n-1}) = U sigma(ri) (re.union may be n-ary) | ||
| if (rex.is_union(r)) { | ||
| app* ap = to_app(r); | ||
| for (unsigned i = 0; i < ap->get_num_args(); ++i) { |
There was a problem hiding this comment.
you can iterate direcly:
for (expr* arg : *ap) {
}
| if (rex.is_concat(r)) { | ||
| app* ap = to_app(r); | ||
| const unsigned n = ap->get_num_args(); | ||
| for (unsigned i = 0; i < n; ++i) { |
There was a problem hiding this comment.
you can iterate directly:
for (auto arg : *ap) {
}
| right = rex.mk_epsilon(seq_sort); | ||
| else { | ||
| right = ap->get_arg(i + 1); | ||
| for (unsigned j = i + 2; j < n; ++j) { |
There was a problem hiding this comment.
use iterator for (auto arg : *ap) {
}
No description provided.