2020-08-03 7:26:31 (GMT)
@Simon Cruanes @Petar Vukmirovic
I'd like to understand once and for all what Lightweight AVATAR, as implemented in Zip, is. I thought the answer would be in Simon's thesis but there I read (in Sect. 2.5) essentially a high-level description of AVATAR. The only helpful comment I found is
with a simplified implementation of AVATAR that does not prune inactive clauses (clauses whose trail is false in the current boolean interpretation)
(I'm going to pretend I just read "disabled" rather than "inactive", in keeping with standard terminology.)
A few questions: (1) What happens to the clauses in the active set that become disabled as a result of a Boolean model switch? (2) What happens to the clauses in the passive set under the same circumstances? (3) Must the given clause always be enabled (w.r.t. the Boolean model)? (4) Are disabled active clauses considered as inference partners of the given clause? (5) What is the Boolean model used for? (6) How is Lightweight AVATAR different from Splitting without Backtracking?
My guesses: (1) They move from active to passive. (2) They remain in passive (but could also be thrown away). (3) Yes, otherwise the model would be almost useless. (4) There is no such thing as a disabled active clause per (1). (5) It restricts clause selection and notices when all cases have been done. (6) The essential difference is the use of a SAT solver to focus on (try to saturate) one split branch at a time.
Thank you in advance for any help!
2020-08-03 8:21:57 (GMT)
I thjink all your questions are answered once you realize that Boolean model is not used unless the propositional abstraction of the problem is UNSAT.
2020-08-03 8:22:36 (GMT)
Boolean model is not exported anywhere and is not used to guide any heuristic (including clause selection)
2020-08-03 8:25:55 (GMT)
But let me try to answer all questions (@Simon Cruanes can correct me):
1) 2) There is no model switch. The fact that the current propositional abstraction is SAT is not used. What is used though, is that
so called zero-implied literals are used to simplify assertions. In other words, once SAT solver says that propositional literal A
is true in all models that is propagated through the assertions in the obvious way.
3) No (as Boolean model is not used).
4) There are no disabled clauses
5) Nothing other than the simplification I described
6) It enables simplifications on the same branch (or globally once you simplify assertions enough).
2020-08-03 8:26:28 (GMT)
OK, thanks. But then what you get should be very similar to splitting without backtracking, right?
2020-08-03 8:27:56 (GMT)
Which means it should be very easy to implement a Lightweight AVATAR in E.
2020-08-03 8:28:37 (GMT)
Except that your prover is aware of the "special" status of the assertions
And that you do branch-based simplifications
Branch-based simplifications are the reason why AVATAR is hard to implement in E
2020-08-03 8:28:48 (GMT)
Any version of AVATAR for that matter
2020-08-03 8:29:44 (GMT)
Aren't branch-based simplifications just special cases of contextual simplification?
2020-08-03 8:30:14 (GMT)
Cf. "Subsumption Demodulation in First-Order Theorem Proving"?
2020-08-03 8:30:55 (GMT)
I think we already discussed this. Specific implementation of rewriting in E needs to be changed
It caches the rewritten term, and as term will not be globally rewritten this needs to be taken care of
2020-08-03 8:31:0 (GMT)
Of course, that's inefficient, but if your conditions are all propositional (assuming AVATAR with SAT not SMT), that should be efficient.
2020-08-03 8:31:34 (GMT)
We discussed what you wrote, but I don't see the connection with "splitting without backtracking".
2020-08-03 8:32:57 (GMT)
Suppose you have the clauses and .
2020-08-03 8:33:13 (GMT)
Now suppose .
2020-08-03 8:33:38 (GMT)
Then we could rewrite to .
2020-08-03 8:33:42 (GMT)
This means that we will not use E's rewriting mechanism
2020-08-03 8:33:56 (GMT)
Well this means we need a contextual rewriting mechanism.
2020-08-03 8:33:57 (GMT)
But implement something that does not do the simplifications in-place
2020-08-03 8:34:15 (GMT)
Exactly. So it is not very easy :smile:
2020-08-03 8:34:48 (GMT)
I don't know the details, but I'm guessing it cannot be that hard to implement contextual rewriting in E. I'm surprised there's no rule (maybe disabled by default) that doesn't do this already.
2020-08-03 8:36:38 (GMT)
AFAIK, there is no rule like that (@Stephan Schulz knows this better of course).
To implement contextual rewriting, you need improved data structures (as described in
Vampire talk at IJCAR) + specifically for E, you need to break the invariant (at least in some places)
that rewritten terms are cached
2020-08-03 8:37:40 (GMT)
To be honest, I still don't get it. Simplification is just adding a clause and removing a clause.
2020-08-03 8:37:58 (GMT)
You can already add plenty of clauses today, and you also remove lots of redundant clauses.
2020-08-03 8:38:12 (GMT)
So what is the problem?
2020-08-03 8:38:25 (GMT)
That this would be slower than in-place rewriting?
2020-08-03 8:40:17 (GMT)
And also, these fancy Vampire data structures, you need them only if you do full "subsumption demodulation". Here the subsumption part would be propositional, i.e. no unification, and hence very efficient in principle (not NP-complete).
2020-08-03 8:42:54 (GMT)
Take rewriting as an example. you have a unit equation s = t and you rewrite in a clause C[s].
The way this is currently done in E is that there is a link in the term structure (together with
proof object information) that links s to t and replaces the clause in place. If we decide to
reuse this mechanism for rewriting in branches, these links need to carry that information
and different rewritings are possible.
If we decide to implement different mechanism, well, we need to design different
mechanism -- which is not very easy. It is not very hard though, but still time consuming.
The main reason is that the code that currently does rewriting will need to be updated
to break the rewrite caching invariants. I do not know how hard this will be a priori.
2020-08-03 8:45:5 (GMT)
I don't understand. You talk about "branches" but there are no branches with "splitting without backtracking" / "Lightweight AVATAR". You just explained this to me.
2020-08-03 8:45:30 (GMT)
The above clauses with and are really clauses:
2020-08-03 8:45:48 (GMT)
etc.
2020-08-03 8:46:29 (GMT)
The only difference to today is that the extended Demod rule might take a clause with more than one literal as its left premise.
2020-08-03 8:46:30 (GMT)
OK in "LW Avatar", they are special literals stored in the "assertion" part of the clause, so they are not real literals
2020-08-03 8:47:5 (GMT)
Yes, but these give you little that you don't already have with "splitting with backtracking".
2020-08-03 8:47:18 (GMT)
If you implement the kind of simplification I'm talking about.
2020-08-03 8:47:24 (GMT)
Jasmin Blanchette said:
The only difference to today is that the extended
Demodrule might take a clause with more than one literal as its left premise.
No. Because, the side effect would be that will rewrite __every__ in every clause the way E currently works
2020-08-03 8:48:8 (GMT)
Ah, so you're trying to tell me (again) that it's vital that the first premise of Demod is unit.
2020-08-03 8:48:21 (GMT)
The way E currently works, yes
2020-08-03 8:48:34 (GMT)
OK, that's the level of abstraction at which I'm following you.
2020-08-03 8:49:7 (GMT)
But as I pointed out, it wouldn't be the end of the world to bypass the Demod machinery to implement what I described, right?
2020-08-03 8:50:2 (GMT)
I do not know that before investigating a bit further.
2020-08-03 8:50:17 (GMT)
It is relatively big, extremely optimized part of code
2020-08-03 8:50:51 (GMT)
What I'm suggesting is bypassing that big, optimized part of the code.
2020-08-03 8:51:36 (GMT)
I'm just saying: the simplification I described above is what separates E's splitting without backtracking from Lightweight AVATAR.
2020-08-03 8:51:37 (GMT)
This means that I would need a separate index for new conditional rewrites,
which is somewhat simpler than Vampire work presented at IJCAR but not
significantly so.
2020-08-03 8:51:56 (GMT)
Significantly by any means. We'll only have ground terms.
2020-08-03 8:52:23 (GMT)
Jasmin Blanchette said:
I'm just saying: the simplification I described above is what separates E's splitting without backtracking from Lightweight AVATAR.
As far as I can see now, yes. But @Simon Cruanes needs to confirm that.
2020-08-03 8:53:7 (GMT)
Jasmin Blanchette said:
Significantly by any means. We'll only have ground terms.
I am not convinced. Because, sure, the assertion part is propositional, but a lot of code
for rewriting part needs to be duplicated.
2020-08-03 8:53:48 (GMT)
It depends on how efficient this has to be. And honestly, I don't know.
2020-08-03 8:54:35 (GMT)
Me neither. So I can agree it is not the end of the world, but I surely do not think it
is a very easy side project.
2020-08-03 8:55:14 (GMT)
I haven't said that. But you agree with me that Lightweight AVATAR in E doesn't have to be this gigantic monster either.
2020-08-03 8:56:1 (GMT)
The rule I described is sandwiched somewhere between a full-blown Sup and a Demod. It would need its own index.
2020-08-03 8:59:47 (GMT)
Sure. And if we decide to implement something like that in E, I would go for Zipperposition-like representation of assertions
and implement a special module that does rewriting (possibly other simplifications) on asserted clauses (for the reasons described above).
Regardless, I think @Stephan Schulz agrees that E's indexing data structures can (should?) be refactored to store any kind of data in
the leaves other than unit clauses (I think I will need that for a lot of indices we use here and there in HO rules)
2020-08-03 9:1:10 (GMT)
Well, _that_ sounds like a lot of work that won't happen during your PhD.
2020-08-03 9:2:37 (GMT)
I think implementation of those things (if done properly so that they can be merged back to master)
needs proper refactoring of E's data structures so everything does not become a big mess with double
versions of everything
2020-08-03 9:6:17 (GMT)
Sure. But I see we're between a rock and a hard place. And there's also the observation that backtracking w/o splitting apparently doesn't work well for E (because of the inapplicability of Demod perhaps?). Ah well, I agree with you: E is not AVATAR-friendly.
2020-08-03 9:9:34 (GMT)
That is something Stephan told me once. IIRC, in some of the splitting w/o backtracking papers, they have implemented branch-based simplifications (where branch is interpreted using the values of the names for clause components).
That helped the performance a bit (I do not know how much though)
2020-08-03 9:11:22 (GMT)
You don't know which paper by any chance? I thought there was only one.
2020-08-03 9:13:12 (GMT)
Hm... let me find that :smile:
2020-08-03 9:15:51 (GMT)
Thanks!
2020-08-03 9:19:15 (GMT)
2020-08-03 9:19:23 (GMT)
2020-08-03 9:19:50 (GMT)
This is exactly what Zipperposition checks
2020-08-03 9:29:0 (GMT)
Yes, this is exactly the simplification rule I was describing above.
2020-08-03 9:29:24 (GMT)
"we note that branch rewriting was implemented in less than 3 hours"
2020-08-03 9:32:45 (GMT)
I guess AVATAR was also implemented in weeks in Vampire, even though it would take much more in E :smile:
2020-08-03 9:33:6 (GMT)
;)
2020-08-03 9:34:36 (GMT)
Couldn't one simply prioritize these almost unit clauses in clause selection and rely on the Sup rule + subsumption to achieve the same result?
2020-08-03 9:35:5 (GMT)
That would be a three-hour adventure :smile:
2020-08-03 9:35:8 (GMT)
I.e. you just make sure that these clauses where all but one literal is an abstracted split component become active real fast.
2020-08-03 9:35:12 (GMT)
He he. ;)
2020-08-03 9:35:36 (GMT)
I mean, all the machinery is there... apart from the index, but this would make the index unnecessary.
2020-08-03 9:36:7 (GMT)
Yes, indeed. That can be tried.
2020-08-03 12:32:36 (GMT)
Jasmin Blanchette said:
OK, thanks. But then what you get should be very similar to splitting without backtracking, right?
Yes, except you get, for almost free:
2020-08-03 12:33:51 (GMT)
Petar Vukmirovic said:
But implement something that does not do the simplifications in-place
you can rewrite in place for non-conditional rewrites, i.e. all the current rewrites, and disable caching for other, conditional rewrites, correct?
2020-08-03 12:37:16 (GMT)
I am surprised that E rewrites in place, I remember slides where Stephan said he tried, gave up, and used caching instead (cache normal form of terms directly in them as a pointer).
2020-08-03 12:39:57 (GMT)
We used two notions of in place here
2020-08-03 12:41:4 (GMT)
In-place in terms of the clause -- new clause does not created when it is rewritten
What Stephan referred to what rewriting perfectly shared term DAG directly
2020-08-03 13:40:15 (GMT)
non chronological backtracking and the implementation is simpler as most of the backtracking state is in the sat-solver.
I don't understand that part. Petar convinced me earlier this morning that the SAT solver is used only to determine unsat, not to guide anything. Could you, Simon, go back to my earlier messages and answer my questions 1 to 5?
2020-08-03 13:40:58 (GMT)
What he mean there I think is only that some branches can be killed if they are implied on level-0 as determined by the sat solver
2020-08-03 13:43:27 (GMT)
Ah, thanks for clarifying.
2020-08-03 13:44:0 (GMT)
Of course, he might have thought something else :smile:
2020-08-03 13:44:14 (GMT)
It makes sense, though.
2020-08-03 14:29:43 (GMT)
Well, when you have a conflict, it affects the whole tree. I'm not sure how to expand on my intuition of CDCL's exploration of the search space.
Of course, the hiccup is that AVATAR-lite explores all branches "at the same time" since it doesn't use the SAT model. I'm not sure what a middle ground would look like (i.e. use the SAT model but with something simpler than full clause locking/unlocking).
2020-08-03 14:31:35 (GMT)
@Giles suggested selecting clauses that whose assertions are satisfied
2020-08-03 14:31:52 (GMT)
In principle, it can be sued for some other heuristics as well
2020-08-03 14:32:8 (GMT)
yes, it makes sense indeed. Just explore the current branch prioritarily.
2020-08-03 14:49:7 (GMT)
Sounds like a good idea indeed.
2020-08-03 15:1:2 (GMT)
In essence the locking/unlocking is only done in the clause queues…
2020-08-04 8:2:11 (GMT)
(1) This stuff about level-0 implied literals: Does that help already with a SAT solver, or do you need SMT for it to make sense? I.e. what would be a case where a lightweight AVATAR with a SAT solver would skip a branch in the split tree and splitting without backtracking would not?
2020-08-04 8:6:41 (GMT)
(2) At the theoretical level, the big gain of AVATAR is that it doesn't fully saturate at the limit; it only saturates all the infinite branches. (An even stronger splitting architecture would saturate only one infinite branch and ignore the others, but I believe that would require insights into how the SAT solver works.) Similarly, labeled splitting saturates only one infinite branch, but that's because is SAT strategy is so restricted that there's at most one such branch.
In the prover, this is reflected by the fact that AVATAR strictly adheres to the current model and avoids inferences outside. An architecture that doesn't do this strictly might in practice work well, but it would theoretically saturate too much, and that's a bad sign.
2020-08-04 9:12:3 (GMT)
1) It makes sense already with SAT. If you happened to derive clauses (on assertion level) , , A_2 will be zero-implied, which would result in removing all clauses that are asserted under and removing from the assertions
2020-08-04 9:12:49 (GMT)
To the best of my knowledge, splitting wo backtracking does not do that (of course it can be imitated with lots of resolutions, but yeah this is much cleaner...)
2020-08-04 9:13:27 (GMT)
Right, it doesn't do that because it gives such a low weight to the propositional atoms. I see.
2020-08-04 9:14:4 (GMT)
2) I think you are right there. I think that LW AVATAR has no way of knowing it saturated a single branch :frown:
2020-08-04 13:19:4 (GMT)
Petar Vukmirovic said:
1) It makes sense already with SAT. If you happened to derive clauses (on assertion level) , , A_2 will be zero-implied, which would result in removing all clauses that are asserted under and removing from the assertions
you don't even necessarily need to derive a clause here, simple boolean propagation before any decision will do it :smile:
2020-08-04 15:33:29 (GMT)
I agree, I just gave a simple example to illustrate what happens even on SAT level :smile:
2020-08-04 16:5:51 (GMT)
I'm grateful for the simple example. ;)
2020-08-05 10:3:52 (GMT)
On (1) we have (possibly had) lots of code in Vampire to do 'zero-implied optimisations' where we try and detect zero-implied variables and remove them from split sets. It sounds good as things derived in one branch can be used in more other branches. But in our early experiments it made close to zero difference. Of course there's always the chance we did it wrong...
2020-08-05 10:25:6 (GMT)
On (2) it should be possible to tell whether the passive set contains anything in the current SAT model but doing that efficiently doesn't sound easy.
2020-08-05 13:36:41 (GMT)
Indeed, the 0-level things were also useful to Zipperposition because avatar-lite was used in induction and cuts (the ones produced by lemmas). When you prove a lemma (closing ) it becomes a (set of) clause(s) without assertions by simplifying in .