zulip log viewer

Lightweight AVATAR

Jasmin Blanchette

2020-08-03 7:26:31 (GMT)

@Simon Cruanes @Petar Vukmirovic

I'd like to understand once and for all what Lightweight AVATAR, as implemented in Zip, is. I thought the answer would be in Simon's thesis but there I read (in Sect. 2.5) essentially a high-level description of AVATAR. The only helpful comment I found is

with a simplified implementation of AVATAR that does not prune inactive clauses (clauses whose trail is false in the current boolean interpretation)

(I'm going to pretend I just read "disabled" rather than "inactive", in keeping with standard terminology.)

A few questions: (1) What happens to the clauses in the active set that become disabled as a result of a Boolean model switch? (2) What happens to the clauses in the passive set under the same circumstances? (3) Must the given clause always be enabled (w.r.t. the Boolean model)? (4) Are disabled active clauses considered as inference partners of the given clause? (5) What is the Boolean model used for? (6) How is Lightweight AVATAR different from Splitting without Backtracking?

My guesses: (1) They move from active to passive. (2) They remain in passive (but could also be thrown away). (3) Yes, otherwise the model would be almost useless. (4) There is no such thing as a disabled active clause per (1). (5) It restricts clause selection and notices when all cases have been done. (6) The essential difference is the use of a SAT solver to focus on (try to saturate) one split branch at a time.

Thank you in advance for any help!

Petar Vukmirovic

2020-08-03 8:21:57 (GMT)

I thjink all your questions are answered once you realize that Boolean model is not used unless the propositional abstraction of the problem is UNSAT.

Petar Vukmirovic

2020-08-03 8:22:36 (GMT)

Boolean model is not exported anywhere and is not used to guide any heuristic (including clause selection)

Petar Vukmirovic

2020-08-03 8:25:55 (GMT)

But let me try to answer all questions (@Simon Cruanes can correct me):

1) 2) There is no model switch. The fact that the current propositional abstraction is SAT is not used. What is used though, is that
so called zero-implied literals are used to simplify assertions. In other words, once SAT solver says that propositional literal A
is true in all models that is propagated through the assertions in the obvious way.
3) No (as Boolean model is not used).
4) There are no disabled clauses
5) Nothing other than the simplification I described
6) It enables simplifications on the same branch (or globally once you simplify assertions enough).

Jasmin Blanchette

2020-08-03 8:26:28 (GMT)

OK, thanks. But then what you get should be very similar to splitting without backtracking, right?

Jasmin Blanchette

2020-08-03 8:27:56 (GMT)

Which means it should be very easy to implement a Lightweight AVATAR in E.

Petar Vukmirovic

2020-08-03 8:28:37 (GMT)

Except that your prover is aware of the "special" status of the assertions
And that you do branch-based simplifications
Branch-based simplifications are the reason why AVATAR is hard to implement in E

Petar Vukmirovic

2020-08-03 8:28:48 (GMT)

Any version of AVATAR for that matter

Jasmin Blanchette

2020-08-03 8:29:44 (GMT)

Aren't branch-based simplifications just special cases of contextual simplification?

Jasmin Blanchette

2020-08-03 8:30:14 (GMT)

Cf. "Subsumption Demodulation in First-Order Theorem Proving"?

Petar Vukmirovic

2020-08-03 8:30:55 (GMT)

I think we already discussed this. Specific implementation of rewriting in E needs to be changed
It caches the rewritten term, and as term will not be globally rewritten this needs to be taken care of

Jasmin Blanchette

2020-08-03 8:31:0 (GMT)

Of course, that's inefficient, but if your conditions are all propositional (assuming AVATAR with SAT not SMT), that should be efficient.

Jasmin Blanchette

2020-08-03 8:31:34 (GMT)

We discussed what you wrote, but I don't see the connection with "splitting without backtracking".

Jasmin Blanchette

2020-08-03 8:32:57 (GMT)

Suppose you have the clauses $b \approx a \leftarrow A_1$ and $p(b) \leftarrow A_2$ .

Jasmin Blanchette

2020-08-03 8:33:13 (GMT)

Now suppose $A_1 \subseteq A_2$ .

Jasmin Blanchette

2020-08-03 8:33:38 (GMT)

Then we could rewrite $p(b) \leftarrow A_2$ to $p(a) \leftarrow A_2$ .

Petar Vukmirovic

2020-08-03 8:33:42 (GMT)

This means that we will not use E's rewriting mechanism

Jasmin Blanchette

2020-08-03 8:33:56 (GMT)

Well this means we need a contextual rewriting mechanism.

Petar Vukmirovic

2020-08-03 8:33:57 (GMT)

But implement something that does not do the simplifications in-place

Petar Vukmirovic

2020-08-03 8:34:15 (GMT)

Exactly. So it is not very easy :smile:

Jasmin Blanchette

2020-08-03 8:34:48 (GMT)

I don't know the details, but I'm guessing it cannot be that hard to implement contextual rewriting in E. I'm surprised there's no rule (maybe disabled by default) that doesn't do this already.

Petar Vukmirovic

2020-08-03 8:36:38 (GMT)

AFAIK, there is no rule like that (@Stephan Schulz knows this better of course).
To implement contextual rewriting, you need improved data structures (as described in
Vampire talk at IJCAR) + specifically for E, you need to break the invariant (at least in some places)
that rewritten terms are cached

Jasmin Blanchette

2020-08-03 8:37:40 (GMT)

To be honest, I still don't get it. Simplification is just adding a clause and removing a clause.

Jasmin Blanchette

2020-08-03 8:37:58 (GMT)

You can already add plenty of clauses today, and you also remove lots of redundant clauses.

Jasmin Blanchette

2020-08-03 8:38:12 (GMT)

So what is the problem?

Jasmin Blanchette

2020-08-03 8:38:25 (GMT)

That this would be slower than in-place rewriting?

Jasmin Blanchette

2020-08-03 8:40:17 (GMT)

And also, these fancy Vampire data structures, you need them only if you do full "subsumption demodulation". Here the subsumption part would be propositional, i.e. no unification, and hence very efficient in principle (not NP-complete).

Petar Vukmirovic

2020-08-03 8:42:54 (GMT)

Take rewriting as an example. you have a unit equation s = t and you rewrite in a clause C[s].
The way this is currently done in E is that there is a link in the term structure (together with
proof object information) that links s to t and replaces the clause in place. If we decide to
reuse this mechanism for rewriting in branches, these links need to carry that information
and different rewritings are possible.

If we decide to implement different mechanism, well, we need to design different
mechanism -- which is not very easy. It is not very hard though, but still time consuming.
The main reason is that the code that currently does rewriting will need to be updated
to break the rewrite caching invariants. I do not know how hard this will be a priori.

Jasmin Blanchette

2020-08-03 8:45:5 (GMT)

I don't understand. You talk about "branches" but there are no branches with "splitting without backtracking" / "Lightweight AVATAR". You just explained this to me.

Jasmin Blanchette

2020-08-03 8:45:30 (GMT)

The above clauses with $A_1$ and $A_2$ are really clauses:

Jasmin Blanchette

2020-08-03 8:45:48 (GMT)

$b \approx A \mathrel{\lor} A_2$ etc.

Jasmin Blanchette

2020-08-03 8:46:29 (GMT)

The only difference to today is that the extended Demod rule might take a clause with more than one literal as its left premise.

Petar Vukmirovic

2020-08-03 8:46:30 (GMT)

OK in "LW Avatar", they are special literals stored in the "assertion" part of the clause, so they are not real literals

Jasmin Blanchette

2020-08-03 8:47:5 (GMT)

Yes, but these give you little that you don't already have with "splitting with backtracking".

Jasmin Blanchette

2020-08-03 8:47:18 (GMT)

If you implement the kind of simplification I'm talking about.

Petar Vukmirovic

2020-08-03 8:47:24 (GMT)

Jasmin Blanchette said:

The only difference to today is that the extended Demod rule might take a clause with more than one literal as its left premise.

No. Because, the side effect would be that $b = a$ will rewrite __every__ $b$ in every clause the way E currently works

Jasmin Blanchette

2020-08-03 8:48:8 (GMT)

Ah, so you're trying to tell me (again) that it's vital that the first premise of Demod is unit.

Petar Vukmirovic

2020-08-03 8:48:21 (GMT)

The way E currently works, yes

Jasmin Blanchette

2020-08-03 8:48:34 (GMT)

OK, that's the level of abstraction at which I'm following you.

Jasmin Blanchette

2020-08-03 8:49:7 (GMT)

But as I pointed out, it wouldn't be the end of the world to bypass the Demod machinery to implement what I described, right?

Petar Vukmirovic

2020-08-03 8:50:2 (GMT)

I do not know that before investigating a bit further.

Petar Vukmirovic

2020-08-03 8:50:17 (GMT)

It is relatively big, extremely optimized part of code

Jasmin Blanchette

2020-08-03 8:50:51 (GMT)

What I'm suggesting is bypassing that big, optimized part of the code.

Jasmin Blanchette

2020-08-03 8:51:36 (GMT)

I'm just saying: the simplification I described above is what separates E's splitting without backtracking from Lightweight AVATAR.

Petar Vukmirovic

2020-08-03 8:51:37 (GMT)

This means that I would need a separate index for new conditional rewrites,
which is somewhat simpler than Vampire work presented at IJCAR but not
significantly so.

Jasmin Blanchette

2020-08-03 8:51:56 (GMT)

Significantly by any means. We'll only have ground terms.

Petar Vukmirovic

2020-08-03 8:52:23 (GMT)

Jasmin Blanchette said:

I'm just saying: the simplification I described above is what separates E's splitting without backtracking from Lightweight AVATAR.

As far as I can see now, yes. But @Simon Cruanes needs to confirm that.

Petar Vukmirovic

2020-08-03 8:53:7 (GMT)

Jasmin Blanchette said:

Significantly by any means. We'll only have ground terms.

I am not convinced. Because, sure, the assertion part is propositional, but a lot of code
for rewriting part needs to be duplicated.

Jasmin Blanchette

2020-08-03 8:53:48 (GMT)

It depends on how efficient this has to be. And honestly, I don't know.

Petar Vukmirovic

2020-08-03 8:54:35 (GMT)

Me neither. So I can agree it is not the end of the world, but I surely do not think it
is a very easy side project.

Jasmin Blanchette

2020-08-03 8:55:14 (GMT)

I haven't said that. But you agree with me that Lightweight AVATAR in E doesn't have to be this gigantic monster either.

Jasmin Blanchette

2020-08-03 8:56:1 (GMT)

The rule I described is sandwiched somewhere between a full-blown Sup and a Demod. It would need its own index.

Petar Vukmirovic

2020-08-03 8:59:47 (GMT)

Sure. And if we decide to implement something like that in E, I would go for Zipperposition-like representation of assertions
and implement a special module that does rewriting (possibly other simplifications) on asserted clauses (for the reasons described above).

Regardless, I think @Stephan Schulz agrees that E's indexing data structures can (should?) be refactored to store any kind of data in
the leaves other than unit clauses (I think I will need that for a lot of indices we use here and there in HO rules)

Jasmin Blanchette

2020-08-03 9:1:10 (GMT)

Well, _that_ sounds like a lot of work that won't happen during your PhD.

Petar Vukmirovic

2020-08-03 9:2:37 (GMT)

I think implementation of those things (if done properly so that they can be merged back to master)
needs proper refactoring of E's data structures so everything does not become a big mess with double
versions of everything

Jasmin Blanchette

2020-08-03 9:6:17 (GMT)

Sure. But I see we're between a rock and a hard place. And there's also the observation that backtracking w/o splitting apparently doesn't work well for E (because of the inapplicability of Demod perhaps?). Ah well, I agree with you: E is not AVATAR-friendly.

Petar Vukmirovic

2020-08-03 9:9:34 (GMT)

That is something Stephan told me once. IIRC, in some of the splitting w/o backtracking papers, they have implemented branch-based simplifications (where branch is interpreted using the values of the names for clause components).
That helped the performance a bit (I do not know how much though)

Jasmin Blanchette

2020-08-03 9:11:22 (GMT)

You don't know which paper by any chance? I thought there was only one.

Petar Vukmirovic

2020-08-03 9:13:12 (GMT)

Hm... let me find that :smile:

Jasmin Blanchette

2020-08-03 9:15:51 (GMT)

Thanks!

Petar Vukmirovic

2020-08-03 9:19:15 (GMT)

image.png

Petar Vukmirovic

2020-08-03 9:19:23 (GMT)

From this paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.23.4605&rep=rep1&type=pdf

Petar Vukmirovic

2020-08-03 9:19:50 (GMT)

This is exactly what Zipperposition checks

Jasmin Blanchette

2020-08-03 9:29:0 (GMT)

Yes, this is exactly the simplification rule I was describing above.

Jasmin Blanchette

2020-08-03 9:29:24 (GMT)

"we note that branch rewriting was implemented in less than 3 hours"

Petar Vukmirovic

2020-08-03 9:32:45 (GMT)

I guess AVATAR was also implemented in weeks in Vampire, even though it would take much more in E :smile:

Jasmin Blanchette

2020-08-03 9:33:6 (GMT)

;)

Jasmin Blanchette

2020-08-03 9:34:36 (GMT)

Couldn't one simply prioritize these almost unit clauses in clause selection and rely on the Sup rule + subsumption to achieve the same result?

Petar Vukmirovic

2020-08-03 9:35:5 (GMT)

That would be a three-hour adventure :smile:

Jasmin Blanchette

2020-08-03 9:35:8 (GMT)

I.e. you just make sure that these clauses where all but one literal is an abstracted split component become active real fast.

Jasmin Blanchette

2020-08-03 9:35:12 (GMT)

He he. ;)

Jasmin Blanchette

2020-08-03 9:35:36 (GMT)

I mean, all the machinery is there... apart from the index, but this would make the index unnecessary.

Petar Vukmirovic

2020-08-03 9:36:7 (GMT)

Yes, indeed. That can be tried.

Simon Cruanes

2020-08-03 12:32:36 (GMT)

Jasmin Blanchette said:

OK, thanks. But then what you get should be very similar to splitting without backtracking, right?

Yes, except you get, for almost free:

branch rewriting
non chronological backtracking
and the implementation is simpler as most of the backtracking state is in the sat-solver.

Simon Cruanes

2020-08-03 12:33:51 (GMT)

Petar Vukmirovic said:

But implement something that does not do the simplifications in-place

you can rewrite in place for non-conditional rewrites, i.e. all the current rewrites, and disable caching for other, conditional rewrites, correct?

Simon Cruanes

2020-08-03 12:37:16 (GMT)

I am surprised that E rewrites in place, I remember slides where Stephan said he tried, gave up, and used caching instead (cache normal form of terms directly in them as a pointer).

Petar Vukmirovic

2020-08-03 12:39:57 (GMT)

We used two notions of in place here

Petar Vukmirovic

2020-08-03 12:41:4 (GMT)

In-place in terms of the clause -- new clause does not created when it is rewritten

What Stephan referred to what rewriting perfectly shared term DAG directly

Jasmin Blanchette

2020-08-03 13:40:15 (GMT)

non chronological backtracking and the implementation is simpler as most of the backtracking state is in the sat-solver.

I don't understand that part. Petar convinced me earlier this morning that the SAT solver is used only to determine unsat, not to guide anything. Could you, Simon, go back to my earlier messages and answer my questions 1 to 5?

Petar Vukmirovic

2020-08-03 13:40:58 (GMT)

What he mean there I think is only that some branches can be killed if they are implied on level-0 as determined by the sat solver

Jasmin Blanchette

2020-08-03 13:43:27 (GMT)

Ah, thanks for clarifying.

Petar Vukmirovic

2020-08-03 13:44:0 (GMT)

Of course, he might have thought something else :smile:

Jasmin Blanchette

2020-08-03 13:44:14 (GMT)

It makes sense, though.

Simon Cruanes

2020-08-03 14:29:43 (GMT)

Well, when you have a conflict, it affects the whole tree. I'm not sure how to expand on my intuition of CDCL's exploration of the search space.

Of course, the hiccup is that AVATAR-lite explores all branches "at the same time" since it doesn't use the SAT model. I'm not sure what a middle ground would look like (i.e. use the SAT model but with something simpler than full clause locking/unlocking).

Petar Vukmirovic

2020-08-03 14:31:35 (GMT)

@Giles suggested selecting clauses that whose assertions are satisfied

Petar Vukmirovic

2020-08-03 14:31:52 (GMT)

In principle, it can be sued for some other heuristics as well

Simon Cruanes

2020-08-03 14:32:8 (GMT)

yes, it makes sense indeed. Just explore the current branch prioritarily.

Jasmin Blanchette

2020-08-03 14:49:7 (GMT)

Sounds like a good idea indeed.

Simon Cruanes

2020-08-03 15:1:2 (GMT)

In essence the locking/unlocking is only done in the clause queues…

Jasmin Blanchette

2020-08-04 8:2:11 (GMT)

(1) This stuff about level-0 implied literals: Does that help already with a SAT solver, or do you need SMT for it to make sense? I.e. what would be a case where a lightweight AVATAR with a SAT solver would skip a branch in the split tree and splitting without backtracking would not?

Jasmin Blanchette

2020-08-04 8:6:41 (GMT)

(2) At the theoretical level, the big gain of AVATAR is that it doesn't fully saturate at the limit; it only saturates all the infinite branches. (An even stronger splitting architecture would saturate only one infinite branch and ignore the others, but I believe that would require insights into how the SAT solver works.) Similarly, labeled splitting saturates only one infinite branch, but that's because is SAT strategy is so restricted that there's at most one such branch.

In the prover, this is reflected by the fact that AVATAR strictly adheres to the current model and avoids inferences outside. An architecture that doesn't do this strictly might in practice work well, but it would theoretically saturate too much, and that's a bad sign.

Petar Vukmirovic

2020-08-04 9:12:3 (GMT)

1) It makes sense already with SAT. If you happened to derive clauses (on assertion level) $A_1$ , $\neg A _1 \lor A_2$ , A_2 will be zero-implied, which would result in removing all clauses that are asserted under $\neg A_2$ and removing $A_2$ from the assertions

Petar Vukmirovic

2020-08-04 9:12:49 (GMT)

To the best of my knowledge, splitting wo backtracking does not do that (of course it can be imitated with lots of resolutions, but yeah this is much cleaner...)

Jasmin Blanchette

2020-08-04 9:13:27 (GMT)

Right, it doesn't do that because it gives such a low weight to the propositional atoms. I see.

Petar Vukmirovic

2020-08-04 9:14:4 (GMT)

2) I think you are right there. I think that LW AVATAR has no way of knowing it saturated a single branch :frown:

Simon Cruanes

2020-08-04 13:19:4 (GMT)

Petar Vukmirovic said:

1) It makes sense already with SAT. If you happened to derive clauses (on assertion level) $A_1$ , $\neg A _1 \lor A_2$ , A_2 will be zero-implied, which would result in removing all clauses that are asserted under $\neg A_2$ and removing $A_2$ from the assertions

you don't even necessarily need to derive a clause here, simple boolean propagation before any decision will do it :smile:

Petar Vukmirovic

2020-08-04 15:33:29 (GMT)

I agree, I just gave a simple example to illustrate what happens even on SAT level :smile:

Jasmin Blanchette

2020-08-04 16:5:51 (GMT)

I'm grateful for the simple example. ;)

Giles

2020-08-05 10:3:52 (GMT)

On (1) we have (possibly had) lots of code in Vampire to do 'zero-implied optimisations' where we try and detect zero-implied variables and remove them from split sets. It sounds good as things derived in one branch can be used in more other branches. But in our early experiments it made close to zero difference. Of course there's always the chance we did it wrong...

Giles

2020-08-05 10:25:6 (GMT)

On (2) it should be possible to tell whether the passive set contains anything in the current SAT model but doing that efficiently doesn't sound easy.

Simon Cruanes

2020-08-05 13:36:41 (GMT)

Indeed, the 0-level things were also useful to Zipperposition because avatar-lite was used in induction and cuts (the ones produced by lemmas). When you prove a lemma (closing $cnf(\lnot F) \leftarrow \lnot \lfloor F \rfloor$ ) it becomes a (set of) clause(s) without assertions by simplifying in $cnf(F) \leftarrow \lfloor F \rfloor$ .