2020-04-14 14:46:16 (GMT)
Let's get this started :)
2020-04-14 14:46:45 (GMT)
As @Simon Cruanes pointed out, we can have two things:
1/ only print the derived steps (assuming the checker knows the problem and assumptions already), or 2/ reprint type info, definitions, assumptions, and steps ("self-contained")
2020-04-14 14:46:58 (GMT)
I think it'd be nice to have 2
2020-04-14 14:47:18 (GMT)
that would make it easier to have a stand-alone checker
2020-04-14 14:48:1 (GMT)
But as @Hans-Jörg points out:
"too much needs to be reproduced and if you want to check if a proof is a proof for a problem you need to still do the mapping to the proof"
2020-04-14 14:48:11 (GMT)
which is true. So is it worth it?
2020-04-14 14:49:56 (GMT)
Why would it be "too much"? Basically the input problem would be part of the proof (or parts of the input problem), which is fine if it's optional imho. Option 2/ is good when you want an independent proof checker, not proof reconstruction.
2020-04-14 14:50:44 (GMT)
something to note: we already have additional define-funs for the skolems (as an option)
2020-04-14 14:51:38 (GMT)
as a user, wouldn't I also want to ask an independent checker: "here is an smt-lib problem, here is a claimed proof: is this really a proof for the problem". for proof reconstruction one might even skip the smt-lib problem.
2020-04-14 14:52:42 (GMT)
But parsing the whole SMTlib file might be a lot more complicated, right? If you have push/pop, set-info, etc.
What if you have several check-sats, which one is proved unsat?
2020-04-14 14:54:22 (GMT)
right, incremental problems, my blind spot (given that veriT does not support it).
2020-04-14 14:54:36 (GMT)
do you think the proof should follow the usage of named terms in the input, or just expand them?
2020-04-14 14:57:5 (GMT)
hmm I don't know, the simplest would be the best I think?
But anyway, I imagine it's ok to parse the .smt2 file first, I'll just go and use dolmen (cc @Guillaume Bury :wink: ) or my smtlib parser…
2020-04-14 14:57:43 (GMT)
it is not very unreasonable to assume the checker always takes an original SMT problem and then its proof
2020-04-14 14:57:52 (GMT)
but I like the idea of having a proof as self-contained object
2020-04-14 14:58:20 (GMT)
if the proof format is defined as an extension of smtlib2, I guess it could literally just be the concatenation of both files :)
2020-04-14 14:58:38 (GMT)
minus clutter (options, check-sat etc)
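For a trivially unsat problem, the concatenation could look something like this (a made-up toy sketch, not actual solver output):
(declare-fun p () Bool)
(assert (! p :named a0))
(assert (! (not p) :named a1))
(assume a0 p)
(assume a1 (not p))
(step t1 (cl) :rule resolution :premises (a0 a1))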
2020-04-14 15:2:16 (GMT)
well for incremental problems one has to anchor the proof to some state. Is there currently some "format" that is the smt-lib input interleaved with the solver output (since the output is also somewhat standardized)?
2020-04-14 15:6:9 (GMT)
I don't think it's worth it to discuss incremental problems
2020-04-14 15:6:25 (GMT)
if one wants proofs for incremental problems, one should write the problems separately
2020-04-14 15:17:56 (GMT)
Yeah let's not talk about incremental for now, I was just pointing out that smtlib2.6 is a rich language.
2020-04-14 15:21:49 (GMT)
@Simon Cruanes so concretely: could you for now take the original SMT problem and the not-self-contained proof?
2020-04-14 15:22:16 (GMT)
Yeah I think so. Anyway I'm a long way away from a working state, but I have smtlib parsers :)
2020-04-14 15:23:54 (GMT)
it's a good start!
2020-05-20 13:36:19 (GMT)
SMT: Assertions:
¬ (¬ 0 ≤ int (size l') ∨
   ¬ 10 * int (size lr) < 4 + 14 * int (size rr) ∨
   10 * int (size lr) ≤ 15 + 25 * int (size l') ∨
   ¬ 10 * int (size lr) + 10 * int (size rr) ≤ 30 + 25 * int (size l'))
SMT: Names:
sorts:
A$ = 'a
B$ = 'b
C$ = 'c
Nat$ = nat
functions:
l$ = l'
lr$ = lr
rr$ = rr
size$ = size
size$a = size
size$b = size
of_nat$ = int
SMT: Problem:
; --proof-with-sharing --index-fresh-sorts --proof-define-skolems --proof-prune --proof-merge --disable-print-success --disable-banner --max-time=20
(set-option :produce-proofs true)
(set-logic AUFLIA)
(declare-sort A$ 0)
(declare-sort B$ 0)
(declare-sort C$ 0)
(declare-sort Nat$ 0)
(declare-fun l$ () A$)
(declare-fun lr$ () B$)
(declare-fun rr$ () C$)
(declare-fun size$ (A$) Nat$)
(declare-fun size$a (B$) Nat$)
(declare-fun size$b (C$) Nat$)
(declare-fun of_nat$ (Nat$) Int)
(assert (! (not (or (not (<= 0 (of_nat$ (size$ l$)))) (or (not (< (* 10 (of_nat$ (size$a lr$))) (+ 4 (* 14 (of_nat$ (size$b rr$)))))) (or (<= (* 10 (of_nat$ (size$a lr$))) (+ 15 (* 25 (of_nat$ (size$ l$))))) (not (<= (+ (* 10 (of_nat$ (size$a lr$))) (* 10 (of_nat$ (size$b rr$)))) (+ 30 (* 25 (of_nat$ (size$ l$)))))))))) :named a0))
(check-sat)
(get-proof)
SMT: Invoking SMT solver "verit" ...
SMT: Solver:
SMT: Result:
unsat
(assume a0 (! (not (! (or (! (not (! (<= 0 (! (of_nat$ (size$ l$)) :named @p_2)) :named @p_14)) :named @p_6) (or (! (not (! (< (! (* 10 (of_nat$ (size$a lr$))) :named @p_1) (+ 4 (* 14 (! (of_nat$ (size$b rr$)) :named @p_3)))) :named @p_15)) :named @p_7) (or (! (<= @p_1 (+ 15 (! (* 25 @p_2) :named @p_4))) :named @p_8) (! (not (! (<= (+ @p_1 (* 10 @p_3)) (+ 30 @p_4)) :named @p_16)) :named @p_9)))) :named @p_5)) :named @p_10))
(step t2 (cl (= @p_5 (! (or @p_6 @p_7 @p_8 @p_9) :named @p_11))) :rule tmp_AC_simp)
(step t3 (cl (! (= @p_10 (! (not @p_11) :named @p_13)) :named @p_12)) :rule cong :premises (t2))
(step t4 (cl (not @p_12) @p_5 @p_13) :rule equiv_pos2)
(step t5 (cl @p_13) :rule th_resolution :premises (a0 t3 t4))
(step t6 (cl @p_14) :rule not_or :premises (t5))
(step t7 (cl @p_15) :rule not_or :premises (t5))
(step t8 (cl (not @p_8)) :rule not_or :premises (t5))
(step t9 (cl @p_16) :rule not_or :premises (t5))
(step t10 (cl @p_6 @p_7 @p_8 @p_9) :rule la_generic :args (25 1 (div 12 5) (div 7 5)))
(step t11 (cl) :rule resolution :premises (t10 t6 t7 t8 t9))
2020-05-20 16:9:4 (GMT)
I think the problem we discussed is quite fundamental with integer programming problems that have a rational solution: of course the coefficients will be wrong, because this is (the?) rational solution. What is missing is the bounds that make the problem unsat. I will discuss with Pascal on Monday what he thinks of this.
You can also expose the problem by just asserting:
(assert (> term0 0))
(assert (> 1 term0))
2020-05-20 17:46:32 (GMT)
My understanding was always that the method used in veriT works both on LRA and LIA, but it is only complete for LRA. Your example is in that fragment: LRA-satisfiable, but not LIA satisfiable.
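(Concretely, for the two assertions above: term0 = 1/2 satisfies both (> term0 0) and (> 1 term0) over the reals, but no integer lies strictly between 0 and 1.)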
2020-05-20 17:46:58 (GMT)
But that does not explain the problems with the coefficients…
2020-07-08 11:24:2 (GMT)
@Hans-Jörg I tried to abstract over terms when reconstructing LIA and I immediately hit a wall:
Example
If I look at the proof,
(step t48 (cl
(< 0 (ite (forall ((veriT_vr0 Int)) (or (< veriT_vr0 0) (< 0 veriT_vr0))) (- 1) 3))
(<= (ite (forall ((veriT_vr0 Int)) (or (< veriT_vr0 0) (< 0 veriT_vr0))) (- 1) 3)
3))
:rule la_generic :args (1 1))
I initially abstracted completely over if-then-else, yielding 0 < XX \/ XX <= 3 (which is not provable). Do you have any idea whether I have to look into the terms? (Do I have to look into all ifs?)
2020-07-09 10:16:25 (GMT)
I think 0 < x ∨ x ≤ 3 is valid. If x ≤ 0 then certainly x ≤ 3, if otherwise 3 < x then 0 < x.
2020-07-16 5:28:4 (GMT)
I might be confused again @Hans-Jörg
The la_generic step proves that ¬ 3 < 2 * x ∨ a < 0 ∨ ¬ 3 * x + 7 * a < 4 with the coefficients 3, 7, and 1. However, I am confused: how do the x terms cancel out? (The first inequality gives 6 * x, the second one 0 * x, and the last one 3 * x.)
Isabelle linarith proves it with the coefficients 3, 14, and 2…
2020-07-16 5:28:44 (GMT)
(PS: This is the last not-working example of SMT_Examples; after that, time for mirabelle testing :ogre: )
2020-07-23 8:57:20 (GMT)
hm I wonder if something like the following is happening here: 3 < 2 * x gets strengthened to 4 ≤ 2*x and then this is divided by 2 without the division being recorded. I will have a look.
2020-07-23 13:6:30 (GMT)
It was simpler. It's fixed now: veriT now prints "(div 3 2)" as the coefficient.
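(Sanity check, assuming the coefficients apply to the negated literals 3 < 2 * x, ¬ a < 0 and 3 * x + 7 * a < 4 in that order: 3/2 * (2 * x - 3) + 7 * a + 1 * (4 - 3 * x - 7 * a) = -1/2, while each summand is non-negative (the first and last strictly positive) under those literals, so the x and a terms cancel and the constant yields the contradiction. The 3, 14, 2 that linarith finds is just this combination scaled by 2.)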
2020-07-26 13:48:23 (GMT)
That fixed the problem, thanks.
2020-07-27 13:1:18 (GMT)
Damn, I have a proof where the forall_inst is so complicated that metis does not manage to do it (after instantiating the quantifier correctly) :surprise:
2020-07-27 13:1:43 (GMT)
(meson is able to, so the step is correct)
2020-07-27 13:3:2 (GMT)
Is it a step where the quantified term is clausified implicitly?
2020-07-27 13:3:14 (GMT)
yes…
2020-07-27 13:4:15 (GMT)
:(
2020-07-27 13:4:19 (GMT)
well not exactly:
∀veriT_vr72::'c.
∃veriT_vr73::'c.
({uu::'c. F.F (λuua::'b. g uua uu) {uu::'b. ∃b::'c. g uu b ≠ ❙1} ≠ ❙1} ⊆ {uu::'c. ∃a::'b. g a uu ≠ ❙1} ⟶
F.F (λuua::'b. g uua veriT_vr72) {uu::'b. ∃b::'c. g uu b ≠ ❙1} ≠ ❙1 ⟶ (∃a::'b. g a veriT_vr72 ≠ ❙1)) ∧
((F.F (λuua::'b. g uua veriT_vr73) {uu::'b. ∃b::'c. g uu b ≠ ❙1} ≠ ❙1 ⟶ (∃a::'b. g a veriT_vr73 ≠ ❙1)) ⟶
{uu::'c. F.F (λuua::'b. g uua uu) {uu::'b. ∃b::'c. g uu b ≠ ❙1} ≠ ❙1} ⊆ {uu::'c. ∃a::'b. g a uu ≠ ❙1}) ⟹
∀veriT_vr72::'c.
∃veriT_vr73::'c.
∀a::'b.
∃aa::'b.
({uu::'c. F.F (λuua::'b. g uua uu) {uu::'b. ∃b::'c. g uu b ≠ ❙1} ≠ ❙1} ⊆ {uu::'c. ∃a::'b. g a uu ≠ ❙1} ⟶
F.F (λuua::'b. g uua veriT_vr72) {uu::'b. ∃b::'c. g uu b ≠ ❙1} ≠ ❙1 ⟶ g aa veriT_vr72 ≠ ❙1) ∧
((F.F (λuua::'b. g uua veriT_vr73) {uu::'b. ∃b::'c. g uu b ≠ ❙1} ≠ ❙1 ⟶ g a veriT_vr73 ≠ ❙1) ⟶
{uu::'c. F.F (λuua::'b. g uua uu) {uu::'b. ∃b::'c. g uu b ≠ ❙1} ≠ ❙1} ⊆ {uu::'c. ∃a::'b. g a uu ≠ ❙1})
(left of ==>: instantiated assumption; right of ==>: conclusion)
2020-07-27 13:5:25 (GMT)
it is more some kind of maxi-scoping
2020-07-27 13:8:8 (GMT)
Okay, the simplifier is committing hara-kiri on that. It is weird.
2020-07-27 13:17:54 (GMT)
... but easy to fix actually
2020-07-27 18:5:13 (GMT)
@Hans-Jörg I have a lot of horrible work for you again (veriT master branch, version 5fe72ca)…
The problematic step is (step t95 (cl @p_324) :rule th_resolution :premises (a13 t93 t94)).
The assumptions are:
a13: @p_323
t93: (forall ((?v0 A_unit_rbt$) (?v1 A$)) @p_321) = @p_324
t94: @p_323 != @p_324 \/ ~@p_323 \/ @p_324
But (forall ((?v0 A_unit_rbt$) (?v1 A$)) @p_321) should actually be @p_323 (and they are equal up to some one-point rule application). t93 results from a qnt_simplify.
(I might be wrong with the diagnostic…)
2020-07-27 18:14:25 (GMT)
(and if there is any way to give you more useful information, please tell me)
2020-07-27 18:30:46 (GMT)
I will look into it. I made good headway today with removing implicit double neg elimination and implicit removal of repeated literals (+ implicit simplification to True, but I'm not sure this ever appears in a proof printed). Btw. 5fe72ca has a bug where la_generic does an out of bounds write which is fixed on HEAD.
2020-07-27 18:36:39 (GMT)
I don't see a newer commit, but there is no LA/LIA involved in this proof.
2020-07-27 18:39:0 (GMT)
huh my master is at "5a8b16531c27529d7dcc548135fe85578e1d689a". But yeah that bug is unrelated, but you might get spurious crashes.
2020-07-27 18:45:51 (GMT)
I already have that commit but the version number is not updated… After git clean -xdf and recompiling, it is fine now
2020-07-27 19:1:57 (GMT)
ah ok need to fix that at some point
2020-07-28 5:47:18 (GMT)
I have now finished checking the failed smt calls generated by mirabelle on HOL-Library with cvc4 and veriT :octopus:
2020-07-28 8:9:48 (GMT)
Interesting, failed proofs generated by E (that even metis is not able to reconstruct) seem to trigger a lot of issues...
Me being confused by this proof step:
(step t102.t15 (cl true) :rule equiv_pos2)
This does not look like a valid application of equiv_pos2 (except if it generates a tautology that is immediately simplified… but why do you generate that proof step and not a true step?)
2020-07-28 9:14:43 (GMT)
Assumption most likely correct, here is a simpler example:
The relevant proof step:
(step t119.t14 (cl (= (not (not (= (sfilter$ p$ s$) (sCons$ veriT_sk52 veriT_sk53)))) (= (sfilter$ p$ s$) (sCons$ veriT_sk52 veriT_sk53)))) :rule not_simplify)
(step t119.t15 (cl true) :rule equiv_pos2)
equiv_pos2 is
(cl (= (not (not X)) X) (not (not (not X))) X)
aka, after double-negation simplification
(cl (= (not (not X)) X) (not X) X)
which is a tautology. Now I can fix the proof reconstruction to support this, but I am not sure that this is really the intended use of equiv_pos2...
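Spelling out the instance from t119.t14 (my own substitution into the shape above, not solver output), writing φ for (= (sfilter$ p$ s$) (sCons$ veriT_sk52 veriT_sk53)), the unsimplified step would presumably be
(step t119.t15 (cl (= (not (not φ)) φ) (not (not (not φ))) φ) :rule equiv_pos2)
which collapses to (cl true) once the double negations are eliminated and (not φ) together with φ is simplified to true.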
2020-07-28 13:5:26 (GMT)
this is one of the implicit simplifications that happens for proof steps and that I'm working on removing. Actually this is a good test example because it crashes my branch ^^
2020-07-28 13:9:16 (GMT)
I am unsure, however, why this tautology is produced. Somehow it feels like this is unnecessary in a proof and a simplification step could catch that…
2020-07-28 13:10:40 (GMT)
yeah I'm wondering the same, to cite myself: "(+ implicit simplification to True, but I'm not sure this ever appears in a proof printed)". This seems kinda useless, but maybe it's not caught by the pruning
2020-07-28 13:25:29 (GMT)
what actually breaks if tautologies are not simplified to true?
2020-07-28 13:42:38 (GMT)
Nothing, surely. You can formulate arbitrary complicated tautologies, and checking whether they are tautologies is undecidable after all.
2020-07-28 14:21:36 (GMT)
Abstractly I agree with you. In veriT, I don't know… maybe some function is assuming that syntactic tautologies have been simplified and are not pruned from the output…
2020-07-28 15:10:49 (GMT)
Of course.
2020-07-28 18:14:15 (GMT)
Wow, veriT sometimes uses la_generic to prove that 0 != 1, that is unexpected
2020-07-28 18:21:39 (GMT)
yeah I think there is no preprocessing step for that
2020-07-28 19:10:41 (GMT)
The only reason I saw that step is actually that my reconstruction was missing some rules for reals (a = b ==> a * p / n = b * p / n)
2020-07-29 10:30:0 (GMT)
I am working on a proof where the choices generated are so big that just the Assumption.add_assumes (used to define them) takes more than 1s...
2020-07-29 10:31:10 (GMT)
and iterating over the terms is too slow
2020-07-29 11:18:49 (GMT)
You are referring to Hilbert choice?
2020-07-29 11:18:53 (GMT)
Yes, these can get big.
2020-07-29 11:24:30 (GMT)
Yeah Hilbert choice. Let's try to decrease the term size.
2020-07-29 11:42:2 (GMT)
To give an idea of the size, even pretty printing via @{print} throws an exception…
2020-07-29 12:34:54 (GMT)
Is that with the version of the proof with sharing?
2020-07-29 12:53:35 (GMT)
There is no sharing on the Isabelle side.
beginning of the term
2020-07-29 12:55:5 (GMT)
I am currently trying to fold the other choices to reduce the size (every SOME is actually a constant), but even iterating over the term is very slow, so I cannot use the default Isabelle method.
2020-07-29 12:56:33 (GMT)
And I have to kill Isabelle once an hour, because java runs out of memory when outputting the trace I need to understand what is happening…
2020-07-29 13:4:12 (GMT)
after folding
2020-07-29 14:3:16 (GMT)
@Hans-Jörg @Haniel Barbosa is generating the same Hilbert choice twice (except for the name) intended or not?
2020-07-29 14:7:1 (GMT)
You mean it's generating alpha-equivalent choice functions?
2020-07-29 14:7:35 (GMT)
exactly
2020-07-29 14:8:35 (GMT)
Let me take a look, sec
2020-07-29 14:9:18 (GMT)
this is on Hans' fork, I imagine?
2020-07-29 14:9:31 (GMT)
Yes.
2020-07-29 14:9:37 (GMT)
ok
2020-07-29 14:13:16 (GMT)
no difference in the devel branch of veriT
2020-07-29 14:18:4 (GMT)
The original formula has something like (forall x . F) ^ (forall x . F), which occur negatively and are being skolemized, I believe. veriT is kinda limited in how it does alpha-equivalence elimination
2020-07-29 14:18:40 (GMT)
to avoid what you are seeing it'd be necessary to improve the alpha-equivalence elimination before skolemization happens
2020-07-29 14:19:25 (GMT)
Hopefully that is not a critical issue?
2020-07-29 14:24:8 (GMT)
something in my reconstruction does not like it. I just have to find out what...
2020-07-29 14:51:23 (GMT)
In the reconstruction of sko_forall, I was relying on the form x = choice ..., which no longer appeared, because I was folding skolem terms too aggressively (in this case instead of sk15 = choice ..., I had sk15 = sk4)
2020-07-29 14:54:12 (GMT)
If you used the same skolem for both choices it should be ok, though?
2020-07-29 14:55:41 (GMT)
currently, I try to stay as close as possible to veriT (easier to debug)
2020-07-29 14:56:2 (GMT)
but yes, this is one of the simplifications I should probably implement
2020-07-29 19:31:55 (GMT)
@Mathias Fleury I tried to fix the qnt_tidy problem you had (after discussion with @Haniel Barbosa ). A prior proof was lost and now a transitivity step is added which tries to catch it. But I'm not sure if it's correct; maybe you can just try it and let me know if it doesn't work ^^
2020-07-30 5:25:46 (GMT)
I first merged the no-implicit-simplification branch and it works, thanks @Hans-Jörg and @Haniel Barbosa !
2020-07-31 14:10:39 (GMT)
I improved proof pruning to prune those true steps (and in general to prune inside subproofs). I hope it breaks nothing ^^
2020-07-31 18:34:31 (GMT)
none of my tests broke
2020-08-04 9:44:5 (GMT)
@Hans-Jörg how hard would it be to not generate bindings for choice and directly use the constant instead:
For example, if the proof contains
(define-fun veriT_sk0 () A$ (! (choice ((veriT_vr201 A$)) ... :named @p_2104))
use veriT_sk0 instead of @p_2104
?
2020-08-04 9:51:22 (GMT)
forget it, it is two lines in Isabelle
2020-08-04 9:51:51 (GMT)
and the gain is 1.6s -> 95ms (really) during parsing (more precisely when creating the Isabelle terms)
2020-08-04 11:35:50 (GMT)
cool! Changing that should not be too difficult, but I have to look at the code again.
2020-08-04 12:5:46 (GMT)
I don't really expect an answer, but do you have any intuition why the last inequality is included (real numbers):
¬ 0 / 2 < d / 2 ∨ ¬ d / 2 ≤ - (d / 2) ∨ 0 / 2 ≠ - (0 / 2)
I don't see why it is necessary...
2020-08-04 12:38:5 (GMT)
Mathias Fleury said:
and the gain is 1.6s -> 95ms (really) during parsing (more precisely when creating the isabelle terms)
wow... why the big difference? It's just because you don't have to go through the trouble of equating e.g. @p_2104 and veriT_sk0?
2020-08-04 12:43:8 (GMT)
Not unfolding @p_2104 makes the term (much much) smaller and easier to convert to a typed Isabelle term…
2020-08-04 12:56:37 (GMT)
This is basically also the issue I had last week at a different level (it was while importing the terms into the current context).
2020-08-04 13:23:54 (GMT)
@Haniel Barbosa I looked closer at it (and opening the unshared proof is too much for emacs, so here are some grep estimates):
$ grep "define-fun veriT_sk27" <proof> | wc -c
3 698 794
$ grep "define-fun veriT_sk27" <proof> | grep -o choice | wc -l
5 184
So we replace 5184 choice terms (mostly veriT_sk20, it seems). The definition of veriT_sk20 has size:
$ grep "define-fun veriT_sk20" test14.smt_out | grep choice | wc -c
780
We do some overcounting here due to the extra parentheses and the define-fun, and not all choices are veriT_sk20, but roughly the unfolding accounts for almost all of the definition of veriT_sk27…
2020-08-04 13:32:7 (GMT)
Interesting... what I think was puzzling me was that I'd expected that you'd need to unfold the choice to use it in the proof checking
2020-08-04 13:36:45 (GMT)
I try to avoid that as much as possible by instead introducing them. But it sometimes does not work (when veriT reorders equalities) and then I unfold (and regularly fail to reconstruct the proof)
2020-08-04 13:45:25 (GMT)
I see. That's nice
2020-08-04 13:48:10 (GMT)
I managed to fold the term:
$ grep "define-fun veriT_sk27" <proof>| wc -c
553
(so the definition is actually about 6600 times smaller with the skolem definitions kept folded than with them unfolded…)
2020-08-04 13:59:9 (GMT)
Here is an example how it happens:
We have a goal, let's say
(∀veriT_vr249 veriT_vr250 veriT_vr251 veriT_vr252. P veriT_vr249 veriT_vr250 veriT_vr251 veriT_vr252) =
P veriT_vr249 veriT_vr250 veriT_vr251 veriT_vr252
At that point, we have all the definitions. They are combined with the theorem ‹x = (SOME x. ¬P x) ⟹ (∀x. P x) ⟷ P x› to get theorems of the form (∀x. P x) ⟷ P x.
We attempt to unfold those theorems with the goal. If we are lucky, the corresponding Ps always unify, so each definition peels off one quantifier at a time. So in this case:
(∀veriT_vr249 veriT_vr250 veriT_vr251 veriT_vr252. P veriT_vr249 veriT_vr250 veriT_vr251 veriT_vr252) =
P veriT_sk249 veriT_sk250 veriT_sk251 veriT_sk252
would become by introducing veriT_sk249
(∀veriT_vr250 veriT_vr251 veriT_vr252. P veriT_sk249 veriT_vr250 veriT_vr251 veriT_vr252) =
P veriT_sk249 veriT_sk250 veriT_sk251 veriT_sk252
and then by introducing veriT_sk250
(∀veriT_vr251 veriT_vr252. P veriT_sk249 veriT_sk250 veriT_vr251 veriT_vr252) =
P veriT_sk249 veriT_sk250 veriT_sk251 veriT_sk252
and so on until
P veriT_sk249 veriT_sk250 veriT_sk251 veriT_sk252 =
P veriT_sk249 veriT_sk250 veriT_sk251 veriT_sk252
which we can unify and close the proof.
The advantage of this approach is that our goals never become too large.
But sometimes it does not work. In that case, we use an alternative strategy: we first simplify the definitions and the goal and then try the same peeling.
And sometimes this still does not work (because the simplifier does not reorder the terms exactly in the same way). In that case, we ~give up~ unfold all terms, but that is basically a Hail Mary.
2020-08-04 14:0:15 (GMT)
I would have said that using the name instead of the defined fun is not what we intended when we added this define-skolems option, so it's a bit of a bug, no?
2020-08-04 14:3:19 (GMT)
I always assumed that there was a technical reason that I missed…
2020-08-04 14:4:7 (GMT)
I though that I had asked you about that
2020-08-04 14:4:9 (GMT)
I don't remember ^^
2020-08-04 14:9:32 (GMT)
It is mentioned in an email from last year («veriT true as solemn» (sic, the k got lost somehow) that I sent before going on vacation), but it seems that I never mentioned it again…
2020-08-04 14:14:30 (GMT)
Using the choice term instead of the symbol avoids thinking about ordering of the definitions. But I don't know if this is necessary in veriT
2020-08-04 16:18:34 (GMT)
Thanks for the explanation, @Mathias Fleury ! I see how you do it now, cool. So if the choice definitions syntactically correspond to the skolemized formula unification alone would take care of it?
2020-08-04 17:21:9 (GMT)
Exactly. And actually, if you can, you want the terms to be syntactically equivalent. In veriT, the problem is the order of the equalities. Are equalities ordered in CVC4?
2020-08-04 17:35:1 (GMT)
After much suffering, yes, we track all changes in the order of equalities in CVC4 now.
2020-08-04 19:2:57 (GMT)
@Haniel Barbosa is your Lean checker working differently?
2020-08-04 19:6:27 (GMT)
I didn't get to quantifiers yet :)
2020-08-04 19:6:35 (GMT)
it's the next step. Currently it's only QF_UF
2020-08-05 5:41:29 (GMT)
Go, quantifiers, go!
2020-08-05 12:36:28 (GMT)
CVC4 finally produces proofs for them internally (finally!!!), now it's a matter of printing them :P
2020-08-05 12:49:55 (GMT)
:)
2020-08-05 12:50:6 (GMT)
I'll believe it when I see it.
2020-08-05 13:5:44 (GMT)
Fair enough :)
Here is a proof with quantifiers for problem https://github.com/CVC4/CVC4/blob/master/test/regress/regress0/quantifiers/bug290.smt2. The proof has a few issues related to the SAT solver (losing track of justification of some facts derived from input assumptions) but the quantifiers part is fine. :P
The calculus is this one: https://github.com/CVC4/CVC4/blob/proof-new/src/expr/proof_rule.h
2020-08-05 13:19:4 (GMT)
I tried to compile CVC4 this morning and it used so much memory, I had to (hard) restart my computer :surprise:
2020-08-05 13:21:15 (GMT)
hahaha that issue I did not have. I used to suffer a lot from compile time, but I recently bought a desktop for gaming that has the nice side effect of compiling CVC4 in reasonable time. :upside_down:
2020-08-05 13:22:35 (GMT)
Well 32GB of memory… but make -j might have been too much still
2020-08-05 13:24:59 (GMT)
How do you keep track of the resolution steps in presence of conflict minimisation in the SAT solver?
2020-08-05 13:25:30 (GMT)
adding resolution steps for each literal eliminated from the conflict clause
2020-08-05 13:25:48 (GMT)
you can eliminate them because you know that their negation is justified
2020-08-05 13:26:0 (GMT)
so adding the resolution step is guaranteed to work
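Schematically, in the proof format (a made-up sketch, not actual CVC4 output; the step names and literals are hypothetical):
(step c0 (cl l1 l2 l3) :rule resolution :premises (...)) ; learnt clause before minimisation
(step r3 (cl (not l3) l1) :rule ...) ; reason clause that propagated (not l3); its other literal already occurs in c0
(step c1 (cl l1 l2) :rule resolution :premises (c0 r3)) ; the redundant literal l3 is resolved away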
2020-08-05 13:26:27 (GMT)
it is more complicated than that if you do recursive minimisation with caching
2020-08-05 13:27:27 (GMT)
because you are not sure yet whether you need the resolution step or not
2020-08-05 13:28:56 (GMT)
I can't say for sure if CVC4 is doing minimisation as you are describing, but do you mean that one could add spurious resolution steps?
2020-08-05 13:30:34 (GMT)
Well no SAT solver since picosat (or precosat, don't remember) is able to generate resolution proofs for this reason
2020-08-05 13:32:6 (GMT)
When you look at https://github.com/CVC4/CVC4/blob/proof-new/src/prop/minisat/core/Solver.cc#L1110, marking a variable as seen means that you would be able to remove that literal by resolution.
However, if you hit the cache, you do not know why it is possible without exploring the tree again (which you do not want for performance reasons).
2020-08-05 13:33:59 (GMT)
ah, I think I see what you mean
2020-08-05 13:34:17 (GMT)
I add the resolution step but the literal I'm resolving against can be left open
2020-08-05 13:34:20 (GMT)
and its justification can be filled in later
2020-08-05 13:34:29 (GMT)
so I can only traverse the tree to get the reason afterwards
2020-08-05 13:35:8 (GMT)
I'm actually doing it eagerly though
2020-08-05 13:35:39 (GMT)
(optimizing has been not the priority for now)
2020-08-05 13:36:22 (GMT)
I don't even know which of the gazillion SAT solvers supported by CVC4 you use to generate proofs :smile:
2020-08-05 13:36:45 (GMT)
it's the one you linked :)
2020-08-05 13:40:13 (GMT)
I would have to find d_proxy->getPropEngine()->addResolutionStep(out_learnt[i]); to understand how you do it, but in the SAT solver itself, I don't see where you keep track of the resolution steps in the minimisation
2020-08-05 13:41:16 (GMT)
It's done in the method you wrote above
2020-08-05 13:41:35 (GMT)
that method calls reason in the SAT solver
2020-08-05 13:41:51 (GMT)
when trying to justify ~out_learnt[i]
2020-08-05 13:42:15 (GMT)
Is it too complicated to produce RAT (DRUP?) instead, and then later extract a resolution proof?
2020-08-05 13:42:31 (GMT)
I have absolutely no idea
2020-08-05 13:42:37 (GMT)
never looked at this
2020-08-05 13:42:43 (GMT)
I see no point in doing it if we can do resolution though
2020-08-05 13:43:40 (GMT)
@Mathias Fleury This is the relevant method in the proof producing code, if you are curious: https://github.com/CVC4/CVC4/blob/ed5b4681ca98d0a194685ee52582c2a86f1b0686/src/prop/prop_engine.cpp#L766
2020-08-05 13:44:49 (GMT)
@Simon Cruanes there is a paper on this for cryptominisat
2020-08-05 13:45:19 (GMT)
I meant in the context of SMT
2020-08-05 13:45:29 (GMT)
Do not store a resolution proof, just DRUP or whatever
2020-08-05 13:45:59 (GMT)
ah, well, that is not hard to do
2020-08-05 13:46:9 (GMT)
right
2020-08-05 13:46:22 (GMT)
that is how we are gonna handle bitblasting proofs
2020-08-05 13:46:44 (GMT)
the old CVC4 proof code had a hacked minisat that would give you resolution proofs for the bitblasting part
2020-08-05 13:46:52 (GMT)
what I mean is that it might be simpler than storing detailed proofs in the presence of inprocessing, conflict minimisation, etc.
2020-08-05 13:47:24 (GMT)
which is fine. But the best way of doing bitvector solving is using off-the-shelf state-of-the-art SAT solvers. And you don't wanna hack those to get resolution proofs. So you get their off-the-shelf DRAT proofs and convert them
2020-08-05 13:47:41 (GMT)
@Simon Cruanes for the core SAT solver it does not matter
2020-08-05 13:47:49 (GMT)
minisat is good enough to our purposes
2020-08-05 13:48:22 (GMT)
but if we were to use, say, Kissat or whatever as the core solver in the CDCL(T) framework we would go the route of getting a DRAT proof out of it and converting to resolution
2020-08-05 13:49:30 (GMT)
Of course, sorry, I thought you were saying that even getting perfect resolution proofs for the core CDCL solver had issues :)
2020-08-05 13:50:19 (GMT)
it has when you have to do it as you learn how the SAT solver works :upside_down:
2020-08-05 13:50:40 (GMT)
but when one knows perfectly well what one is doing it shouldn't be difficult at all
2020-08-05 13:50:49 (GMT)
I imagine it was easy for @Pascal Fontaine to do this in veriT
2020-08-05 13:53:37 (GMT)
I don't see in tryJustifyingLit where you check if the literal is already in the clause you are trying to derive, but otherwise I get the idea
2020-08-05 13:55:42 (GMT)
the resolution is l1 v ... v li v ... v ln with ~li, where li is the redundant literal and ~li is the one that method is trying to justify. In the current implementation, for other reasons that will soon disappear, the method is always overwriting a previously computed justification for ~li, if any.
2020-08-05 14:4:6 (GMT)
Assume you have l1 v l2 v l3 and are trying to justify ~l3. You get the reason currentReason = ~l3 v l1 v l4. In the for loop, you call tryJustifyingLit(~l1) and I don't see where you stop the exploration instead of going even deeper and start analysing the reason of the ~l1
2020-08-05 14:7:38 (GMT)
I am also confused about why currentReason_size = currentReason.size(); is required in the for-loop, and by the fact that the analysis can cause a reallocation of clauses, but I must be missing something here
2020-08-05 14:45:5 (GMT)
Second question: how do you generate the proofs? I have 1300 smt files with (get-proof) and I would like cvc4 to either ignore that or return a proof (just to know if CVC4 is able to solve the problems…)
2020-08-05 16:31:40 (GMT)
Mathias Fleury said:
Assume you have l1 v l2 v l3 and are trying to justify ~l3. You get the reason currentReason = ~l3 v l1 v l4. In the for loop, you call tryJustifyingLit(~l1) and I don't see where you stop the exploration instead of going even deeper and start analysing the reason of the ~l1
You don't stop. You indeed go as high as you can.
2020-08-05 16:33:49 (GMT)
Mathias Fleury said:
I am also confused about why currentReason_size = currentReason.size(); is required in the for-loop, and by the fact that the analysis can cause a reallocation of clauses, but I must be missing something here
This was a sanity check. That part of the code was based on something on the old proof module in which the size of the reason was updated while in the loop. I had the impression this was unnecessary, so I was conservative but added the assertion to confirm my hunch.
2020-08-05 16:36:49 (GMT)
Mathias Fleury said:
Second question: @Haniel Barbosa how do you generate the proofs? I have 1300 smt files with (get-proof) and I would like cvc4 to either ignore that or return a proof (just to know if CVC4 is able to solve the problems…)
CVC4 currently has two proof modules. The standard way of getting proofs (with get-proof) will use the old one. The new one is only partially in master; in fact most of it is still in our WIP branch https://github.com/CVC4/CVC4/tree/proof-new. We are merging to master as things get stable.
To use the new proof module the options are --proof-new and --dump-proofs, which are only gonna work in the branch
2020-08-05 16:37:52 (GMT)
If I understand correctly, what you want to know is for which problems CVC4 can produce a proof with the new infrastructure?
2020-08-05 16:40:57 (GMT)
No, I just want to know if CVC4 is able to prove it when proof output is active
2020-08-05 16:41:8 (GMT)
ah
2020-08-05 16:41:35 (GMT)
do the following: remove (get-proof) from all of them, compile the proof-new branch and run with the --proof-new option
2020-08-05 16:42:17 (GMT)
that will make CVC4 use the new proof infrastructure but it'll not try to generate the final proof (connecting the proofs generated locally for preprocessing, CNF/SAT solving, theory reasoning)
2020-08-05 16:45:57 (GMT)
what theories are these problems using?
2020-08-05 17:7:31 (GMT)
AUFLIA
2020-08-05 17:8:26 (GMT)
and maybe a few AUFLIRA
2020-08-05 17:12:23 (GMT)
cool
2020-08-05 17:15:35 (GMT)
I have 362 that either veriT or Z3 can solve within a 1s timeout. I am rerunning my filtering with a 12s timeout now, with CVC4 too (without proof generation; that should bring a few more problems in).
2020-08-05 17:16:43 (GMT)
I should run some more mirabelle to generate more problems
2020-08-05 17:26:29 (GMT)
One thing that would help the printing of veriT proofs from CVC4 would be a set of benchmarks for which reconstruction works, so I can run @Hans-Jörg's veriT on them and have a ground truth of proofs to compare with
2020-08-05 17:32:35 (GMT)
I am not sure I understand what you mean. I can give you SMT files (corresponding to Isabelle theorems, but that link is lost) on which veriT can produce a proof. How does that help you? I mean: CVC4 will produce a different proof, so what do you gain?
2020-08-05 17:33:43 (GMT)
I would like to try to produce, from CVC4, proofs that can be reconstructed in Isabelle. My hope is that the benchmarks for which veriT can produce proofs that are successfully reconstructed are a good start
2020-08-05 17:34:5 (GMT)
if I can produce, from CVC4, proofs at least as detailed as the proofs that veriT produces
2020-08-05 17:36:6 (GMT)
I have 360 such problems...
2020-08-05 17:36:32 (GMT)
If you could send them to me I'd appreciate it :) In a few weeks I may be able to give veriT proofs for them
2020-08-05 17:40:38 (GMT)
I mean, as far as I understand there is no easy way to check these proofs, but if I can generate proofs that I can check internally for these benchmarks, and that I can print in the same format as the one veriT is printing, that is already something I'd be happy with
2020-08-05 17:41:1 (GMT)
and that might motivate doing the work of trying to integrate CVC4 in the proof reconstruction machinery eventually
2020-08-05 17:42:1 (GMT)
but please correct me if I'm working with wrong assumptions :upside_down:
2020-08-05 17:44:29 (GMT)
Haniel Barbosa said:
I mean, as far as I understand there is no easy way to check these proofs, but if I can generate proofs that I can check internally for these benchmarks, and that I can print in the same format as the one veriT is printing, that is already something I'd be happy with
That is actually exactly the plan and the reason why I am currently filtering my SMT files :smile:
2020-08-05 17:56:43 (GMT)
You want to find the blindspots for Z3/veriT?
2020-08-05 18:0:36 (GMT)
No, the other motivation is to adapt the scheduling used by veriT to the problems generated by Isabelle. In particular, veriT is able to produce proofs for CVC4 (while Z3 often is not).
2020-08-05 18:2:25 (GMT)
Answer to the question: Isabelle dumps all calls and not only the ones that succeed. Therefore, I have to filter out the unsuccessful ones
2020-08-05 18:5:37 (GMT)
Ah, I see now