zulip log viewer

Petar Vukmirovic

2020-01-29 16:54:48 (GMT)

Dear all,

I have found some time to start working on Avatar-lite or however we are going to call it in E.

I guess mostly Stephan, Simon and me will be discussing it, but if anyone wants they are more than free to join (especially people like @Gabriel Ebner who already implemented Avatar).

Simon Cruanes

2020-01-29 17:5:0 (GMT)

ALright, that's very exciting.
Is Stephan around here?

Evatar (Abatar, like @Jasmin Blanchette likes to say) is a simplified version of Avatar where the SAT model isn't used to activate/freeze clauses, but simply to help make clauses shorter (and make branch simplifications easy, etc.). The SAT proofs of individual literals (literals assigned at level 0) can still be used to prune the superposition prover's set of boolean assumptions, so that's quite useful.

Petar Vukmirovic

2020-01-29 17:5:51 (GMT)

@Simon Cruanes , I have quickly went through your implementation and it is quite impressive. So, if I understand correctly, your avatar works as in the following description.

Clause $C$ is augmented to be $C \leftarrow A$ , where $A$ is a conjunction of labels (called assertion), each label standing for a component of (another) clause.

Whenever a generating inference is performed, resultant clause is asserted under the union of the assertions of parents.

Whenever a simplifying inference is performed : (1) if it is a unary simplifying inference then no changes are done to the assertion of the resulting clause; (2) if it is a binary simplifying inference, the clause $C \leftarrow A$ that is simplifying the active (given) clause $D \leftarrow B$ must have assertion $A$ for which $A \subseteq B$

In the background, there is a SAT solver that contains all assertions.

Whenever we determine that clausal (non-assertion) part of a clause $C \leftarrow A1 \land ... \land A_n$ can be split into $C_1, \ldots, C_m$ , where assertions assigned to each part are $B_1, \ldots, B_m$ , we add the following boolean clauses to SAT state:

$\neg A1 \lor \ldots \lor \neg A_n \lor B_1$ ,
...
$\neg A1 \lor \ldots \lor \neg A_n \lor B_1$ .

Petar Vukmirovic

2020-01-29 17:7:26 (GMT)

Finally, when clause $\bot \leftarrow A_1 \ldots A_n$ is created, we add the following boolean clause to the SAT state:
$\neg A_1 \lor \ldots \lor \neg A_n$

Simon Cruanes

2020-01-29 17:14:39 (GMT)

That seems like an accurate description.
The whole thing is worth doing because the boolean assumptions $A$ are much cheaper to handle than anything else, even in the subsumption test for simplifying $D \leftarrow B$ using $C \leftarrow A$ (it's OCaml's set's subset check which is $O(n \log(n))$ afaik).

This is also useful for custom case splitting (e.g. in the implementation of induction) because you can directly create a branch for each case by allocating a boolean literal for it, and if branches are incompatible you simply have to assert $\lnot A \lor \lnot B$ to the SAT solver.

Petar Vukmirovic

2020-01-29 17:19:16 (GMT)

Now, I have two questions:

What are lemmas in Zipperposition (as a prover object)?
Do you do the following:

When a new clause $C \leftarrow A_1 \land \ldots \land A_n$ which is unsplittable, do you check if $C$ is a component?
If $C$ is an component, then we can create boolean clause $\neg A_1 \lor \ldots \lor \neg A_n \lor [| C |]$
where $[| C |]$ is a boolean encoding of C.

Simon Cruanes

2020-01-29 17:23:22 (GMT)

lemmas are described in frocos 17, section 5. They're basically a cut for a brand new formula you somehow intuit, but they're only useful if you can prove the formula first. It's a generic mechanism even though it was introduced for (guessed) inductive lemmas.

Simon Cruanes

2020-01-29 17:24:46 (GMT)

if the clause is unsplittable, $C$ is a component. Do you mean, checking if $C$ is a ground atomic literal? I don't think I do that but it'd make sense.
However, in this case $C \leftarrow A$ will easily interact by ground superposition with $\lnot C \leftarrow A'$ to produce $\bot \leftarrow A \cup A'$ which becomes a SAT clause.

Petar Vukmirovic

2020-01-29 17:29:11 (GMT)

So, they are orthogonal to Avatar? In other words, one can exist without the other?
C does not have to be atomic clause but just a registered component.
In priciciple, everything SAT solvers do can be done using (ground) superposition but my guess is it will be more efficient using SAT.

Simon Cruanes

2020-01-29 17:31:17 (GMT)

They're building on top of Avatar to make it cheap to have a case distinction between the branch where you try to prove the lemma ( $… \leftarrow ¬[F]$ ), and the branch where you can use the lemma ( $… \leftarrow [F]$ ).
Ah, by "registered component" you mean something already in the mapping component<->boolean atoms. We don't do that but it couldn't hurt indeed.

Petar Vukmirovic

2020-01-29 17:33:20 (GMT)

OK, thanks. I will have a look.
Yes, I will play with this and let you know what is the outcome.

Stephan Schulz

2020-01-30 17:39:23 (GMT)

Some things we want to consider:

Is Avatar one strategy to pick, or should each branch use its own strategy?
Do we run this of one proof state, or does each branch get one proof state and we have a new master proof state/proofobject?

E already has code for detecting and reasonably efficiently maintain splittable clauses with boolean definitions for the "splitting without backtracking" that never really took off for me.

Simon Cruanes

2020-01-30 17:46:23 (GMT)

To me that's still all in one proof state. Evatar/Evader(?) really is just a better alternative to "splitting without backtracking" (with, imho, no downside, only upsides, as you get small clauses and branch simplification for a reasonably low implementation cost and 0 calculus-level cost).

Petar Vukmirovic

2020-01-30 18:26:7 (GMT)

@Simon Cruanes I have an addendum to that:)

Right now I am running some evaluations and there are cases where configuration with Evatar performs worse than configuration without Evatar
(I hate the name Evader)

Few remarks about Avatar:

Rewriting on incompatible assertions is not implemented. Question: Can we achieve something by staining the demodulated clause with the assertions from the demodulator?

If $\bot \leftarrow A_1, \ldots A_n$ is derived, shall we add $\neg A_1 \land \ldots \land \neg A_n$ where $\neg A_i$ is the negated (and skolemized) original component?

Petar Vukmirovic

2020-01-30 18:27:19 (GMT)

Some things we want to consider:

Is Avatar one strategy to pick, or should each branch use its own strategy?

Do we run this of one proof state, or does each branch get one proof state and we have a new master proof state/proofobject?

@Stephan Schulz There are no real branches here since we have no model. This avatar is really more like native treatment of names as in splitting with backtracking (or better yet support for interpretation of the names)

Simon Cruanes

2020-01-30 18:31:40 (GMT)

Rewriting on incompatible assertions is not implemented. Question: Can we achieve something by staining the demodulated clause with the assertions from the demodulator?

I don't think so, unless you make it an inference step and not a simplification step. The simplification is incorrect because you removed a clause that is not redundant in some models. Or am I missing something you said?

Are there cases where Evatar is clearly worse than E with splitting without backtracking ?

Petar Vukmirovic

2020-01-31 18:3:50 (GMT)

What I meant was more like if you have f(X) = g(X) <- 1 and a clause ...f(a)... <- 2, you can rewrite it into ....g(a)... <- 1,2

Petar Vukmirovic

2020-01-31 18:4:39 (GMT)

I am waiting for the latest results when I will get back with the list of problems on which Zip is clearly slower, but on smoketest of 1000 problems
there are ~10 on which no avatar is beter than avatar

Petar Vukmirovic

2020-01-31 18:31:4 (GMT)

OK, the results are:

avatar solves 37 problems non-avatar cannot
non-avatar solves 43 avatar cannot

Simon Cruanes

2020-01-31 19:8:33 (GMT)

What I meant was more like if you have f(X) = g(X) <- 1 and a clause ...f(a)... <- 2, you can rewrite it into ....g(a)... <- 1,2

But then you lose completeness.

Simon Cruanes

2020-01-31 19:10:54 (GMT)

Is there a way to compare with forced splitting-without-backtracking? Evatar could be enabled or disabled depending on heuristics, the same way as old school splitting.

Simon Cruanes

2020-01-31 19:11:4 (GMT)

(Also, that was a quick implementation!! Did you reuse picosat?)

Petar Vukmirovic

2020-01-31 19:16:57 (GMT)

Is there a way to compare with forced splitting-without-backtracking? Evatar could be enabled or disabled depending on heuristics, the same way as old school splitting.

Do you have Splitting without backtracking in Zip?

Petar Vukmirovic

2020-01-31 19:17:21 (GMT)

(Also, that was a quick implementation!! Did you reuse picosat?)

I am talking about implementation in Zip. I am still experimenting with it.

Petar Vukmirovic

2020-01-31 19:17:43 (GMT)

What I meant was more like if you have f(X) = g(X) <- 1 and a clause ...f(a)... <- 2, you can rewrite it into ....g(a)... <- 1,2

But then you lose completeness.

Are you sure? Can you explain me how?

Simon Cruanes

2020-01-31 22:44:35 (GMT)

Do you have Splitting without backtracking in Zip?

No, because it'd also require more work to make sure the ordering picks splitting atoms last (otherwise you get back to non-split clauses). Never got much use for that.

Re: completeness: by simplifying with a clause f(x)=g(x) <- 1 the clause C <- 2, you basically remove the clause C <- 2, ¬1 from the search space without good reason. In the boolean model ¬1, 2, this clause may have been needed to prove $\bot$ and now it's gone.

Petar Vukmirovic

2020-02-01 12:0:21 (GMT)

Do you have Splitting without backtracking in Zip?

Re: completeness: by simplifying with a clause f(x)=g(x) <- 1 the clause C <- 2, you basically remove the clause C <- 2, ¬1 from the search space without good reason. In the boolean model ¬1, 2, this clause may have been needed to prove $\bot$ and now it's gone.

Yeah, I realized that a little bit after I wrote the message :(

Here are my findings from a test of running no-Evatar Zip on all TPTP for 10s vs Evatar Zip on TPTP for 10s:

The results are that Evatar solves 10 more than no-Evatar Zip (counting only refutations).
Many times no-Evatar saturates, whereas Evatar does not.
It seems like in some cases, when you split a clause your heuristics get drawn by smaller and smaller clauses created by that branch. Now, if it is hard to refute that branch, you might get stuck there for a long time. Usually that branch is not important for a refutation, and you lose :( When me and Jasmin were working on AVATAR we realized that there Vampire sometihs switches models to prevent this behavior. In AVATAR there is no remedy for this.
A solution that I found is to create a clause priority function called PreferShortTrails which will prefer clauses with short trails and get Zipperposition out of rabbit hole maybe.

Some ideas:

What if we spilt clauses when they are infered and forward simplified rather than when they are chosen? This has the advantage that if many propositional clauses are derived SAT solver will know about them and possibly solve the problem without waiting for the clauses to get picked?
@Simon Cruanes : How about we implement spliting-without-backtracking like this: Names will be Builtins constructed by SplitName *i* where i is some fresh integer. Now Builtins are already smaller than everything else and when two Builtins are compared we compare them by their i. This would avoid all complications I think.

Simon Cruanes

2020-02-01 18:31:8 (GMT)

Yeah, I realized that a little bit after I wrote the messaquote

That happens :-).

Evatar solves 10 more than no-Evatar Zip (counting only refutations).

Many times no-Evatar saturates, whereas Evatar does not.

It seems like in some cases, when you split a clause your heuristics get drawn by smaller and smaller clauses created by that branch. Now, if it is hard to refute that branch, you might get stuck there for a long time. Usually that branch is not important for a refutation, and you lose :( When me and Jasmin were working on AVATAR we realized that there Vampire sometihs switches models to prevent this behavior. In AVATAR there is no remedy for this.

A solution that I found is to create a clause priority function called PreferShortTrails which will prefer clauses with short trails and get Zipperposition out of rabbit hole maybe.

That's very interesting. I've never been too interested in saturations (they're too unpredictable and just adding something like X=a|X=b will make E or Zipper easily diverge). However it'd still be nice to not be worse than Zip-no-evatar.

The switching model behavior is interesting, I didn't know about that. Intuitively, Evatar splits one problem into lots of smaller problems that should be easier to refute (smaller clauses), but it does indeed mean that you have to saturate all of them for a SAT input! So we could think of ways to detect if one branch is saturated (with a satisfiable trail as well) and stop there. The heuristic to favor short trails sounds good to me, indeed, there should be some fairness in branches that are processed. Or maybe penalize branches that are obtained after a lot of case splits (which is not exactly the same, since trails can shrink too).

What if we spilt clauses when they are infered and forward simplified rather than when they are chosen? This has the advantage that if many propositional clauses are derived SAT solver will know about them and possibly solve the problem without waiting for the clauses to get picked?

it's doable, it's just a matter of doing the multi-clause simplifications more eagerly (without switching full Otter loop on the subsumption side). Worth trying imho.

Simon Cruanes : How about we implement spliting-without-backtracking like this: Names will be Builtins constructed by SplitName *i* where i is some fresh integer. Now Builtins are already smaller than everything else and when two Builtins are compared we compare them by their i. This would avoid all complications I think.

Sure, you can try that as well. I'm not a big fan of splitting-without-backtracking as it seems no-one is enthusiastic about them, and Vampire just removed them entirely afaik. But for experimenting it's a good idea.

Simon Cruanes

2020-02-07 23:18:4 (GMT)

Another interesting case where Evatar isn't that optimal: ./zipperposition.exe examples/don_t_use_omega.p with and without --no-avatar. Without evatar, 21 iterations with a short proof; with it, 5075 iterations with a big proof (but note that this is not a complete calculus anyway).

Evatar (avatar-lite)

Petar Vukmirovic

Simon Cruanes

Petar Vukmirovic

Petar Vukmirovic

Simon Cruanes

Petar Vukmirovic

Simon Cruanes

Simon Cruanes

Petar Vukmirovic

Simon Cruanes

Petar Vukmirovic

Stephan Schulz

Simon Cruanes

Petar Vukmirovic

Petar Vukmirovic

Simon Cruanes

Petar Vukmirovic

Petar Vukmirovic

Petar Vukmirovic

Simon Cruanes

Simon Cruanes

Simon Cruanes

Petar Vukmirovic

Petar Vukmirovic

Petar Vukmirovic

Simon Cruanes

Petar Vukmirovic

Simon Cruanes

Simon Cruanes