2020-04-14 14:46:16 (GMT)
Let's get this started :)
2020-04-14 14:46:45 (GMT)
As @Simon Cruanes pointed out, we can have two things:
1/ only print the derived steps (assuming the checker knows the problem and assumptions already), or 2/ reprint type info, definitions, assumptions, and steps ("self-contained")
2020-04-14 14:46:58 (GMT)
I think it'd be nice to have 2
2020-04-14 14:47:18 (GMT)
that would make it easier to have a stand-alone checker
2020-04-14 14:48:1 (GMT)
But as @Hans-Jörg points out:
"too much needs to be reproduced and if you want to check if a proof is a proof for a problem you need to still do the mapping to the proof"
2020-04-14 14:48:11 (GMT)
which is true. So is it worth it?
2020-04-14 14:49:56 (GMT)
Why would it be "too much"? Basically the input problem would be part of the proof (or parts of the input problem), which is fine if it's optional imho. Option 2/ is good when you want an independent proof checker, not proof reconstruction.
2020-04-14 14:50:44 (GMT)
something to note: we already have additional define-funs for the skolems (as an option)
2020-04-14 14:51:38 (GMT)
as a user, wouldn't I also want to ask an independent checker: "here is an smt-lib problem, here is a claimed proof: is this really a proof for the problem". for proof reconstruction one might even skip the smt-lib problem.
2020-04-14 14:52:42 (GMT)
But parsing the whole SMTlib file might be a lot more complicated, right? If you have push/pop, set-info, etc.
What if you have several check-sats, which one is proved unsat?
2020-04-14 14:54:22 (GMT)
right, incremental problems, my blind spot (given that veriT does not support it).
2020-04-14 14:54:36 (GMT)
do you think the proof should follow the usage of named terms in the input, or just expand them?
2020-04-14 14:57:5 (GMT)
hmm I don't know, the simplest would be the best I think?
But anyway, I imagine it's ok to parse the .smt2 file first, I'll just go and use dolmen (cc @Guillaume Bury :wink: ) or my smtlib parser…
2020-04-14 14:57:43 (GMT)
it is not very unreasonable to assume the checker always takes an original SMT problem and then its proof
2020-04-14 14:57:52 (GMT)
but I like the idea of having a proof as self-contained object
2020-04-14 14:58:20 (GMT)
if the proof format is defined as an extension of smtlib2, I guess it could literally just be the concatenation of both files :)
2020-04-14 14:58:38 (GMT)
minus clutter (options, check-sat etc)
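For a trivially unsat problem, the concatenation could look something like this (a made-up toy sketch, not actual solver output):
(declare-fun p () Bool)
(assert (! p :named a0))
(assert (! (not p) :named a1))
(assume a0 p)
(assume a1 (not p))
(step t1 (cl) :rule resolution :premises (a0 a1))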
2020-04-14 15:2:16 (GMT)
well for incremental problems one has to anchor the proof to some state. Is there currently some "format" that is the smt-lib input interleaved with the solver output (since the output is also somewhat standardized)?
2020-04-14 15:6:9 (GMT)
I don't think it's worth it to discuss incremental problems
2020-04-14 15:6:25 (GMT)
if one wants proofs for incremental problems, one should write the problems separately
2020-04-14 15:17:56 (GMT)
Yeah let's not talk about incremental for now, I was just pointing out that smtlib2.6 is a rich language.
2020-04-14 15:21:49 (GMT)
@Simon Cruanes so concretely: could you for now take the original SMT problem and the not-self-contained proof?
2020-04-14 15:22:16 (GMT)
Yeah I think so. Anyway I'm a long way away from a working state, but I have smtlib parsers :)
2020-04-14 15:23:54 (GMT)
it's a good start!
2020-05-20 13:36:19 (GMT)
SMT: Assertions:
¬ (¬ 0 ≤ int (size l') ∨
   ¬ 10 * int (size lr) < 4 + 14 * int (size rr) ∨
   10 * int (size lr) ≤ 15 + 25 * int (size l') ∨
   ¬ 10 * int (size lr) + 10 * int (size rr) ≤ 30 + 25 * int (size l'))
SMT: Names:
sorts:
A$ = 'a
B$ = 'b
C$ = 'c
Nat$ = nat
functions:
l$ = l'
lr$ = lr
rr$ = rr
size$ = size
size$a = size
size$b = size
of_nat$ = int
SMT: Problem:
; --proof-with-sharing --index-fresh-sorts --proof-define-skolems --proof-prune --proof-merge --disable-print-success --disable-banner --max-time=20
(set-option :produce-proofs true)
(set-logic AUFLIA)
(declare-sort A$ 0)
(declare-sort B$ 0)
(declare-sort C$ 0)
(declare-sort Nat$ 0)
(declare-fun l$ () A$)
(declare-fun lr$ () B$)
(declare-fun rr$ () C$)
(declare-fun size$ (A$) Nat$)
(declare-fun size$a (B$) Nat$)
(declare-fun size$b (C$) Nat$)
(declare-fun of_nat$ (Nat$) Int)
(assert (! (not (or (not (<= 0 (of_nat$ (size$ l$)))) (or (not (< (* 10 (of_nat$ (size$a lr$))) (+ 4 (* 14 (of_nat$ (size$b rr$)))))) (or (<= (* 10 (of_nat$ (size$a lr$))) (+ 15 (* 25 (of_nat$ (size$ l$))))) (not (<= (+ (* 10 (of_nat$ (size$a lr$))) (* 10 (of_nat$ (size$b rr$)))) (+ 30 (* 25 (of_nat$ (size$ l$)))))))))) :named a0))
(check-sat)
(get-proof)
SMT: Invoking SMT solver "verit" ...
SMT: Solver:
SMT: Result:
unsat
(assume a0 (! (not (! (or (! (not (! (<= 0 (! (of_nat$ (size$ l$)) :named @p_2)) :named @p_14)) :named @p_6) (or (! (not (! (< (! (* 10 (of_nat$ (size$a lr$))) :named @p_1) (+ 4 (* 14 (! (of_nat$ (size$b rr$)) :named @p_3)))) :named @p_15)) :named @p_7) (or (! (<= @p_1 (+ 15 (! (* 25 @p_2) :named @p_4))) :named @p_8) (! (not (! (<= (+ @p_1 (* 10 @p_3)) (+ 30 @p_4)) :named @p_16)) :named @p_9)))) :named @p_5)) :named @p_10))
(step t2 (cl (= @p_5 (! (or @p_6 @p_7 @p_8 @p_9) :named @p_11))) :rule tmp_AC_simp)
(step t3 (cl (! (= @p_10 (! (not @p_11) :named @p_13)) :named @p_12)) :rule cong :premises (t2))
(step t4 (cl (not @p_12) @p_5 @p_13) :rule equiv_pos2)
(step t5 (cl @p_13) :rule th_resolution :premises (a0 t3 t4))
(step t6 (cl @p_14) :rule not_or :premises (t5))
(step t7 (cl @p_15) :rule not_or :premises (t5))
(step t8 (cl (not @p_8)) :rule not_or :premises (t5))
(step t9 (cl @p_16) :rule not_or :premises (t5))
(step t10 (cl @p_6 @p_7 @p_8 @p_9) :rule la_generic :args (25 1 (div 12 5) (div 7 5)))
(step t11 (cl) :rule resolution :premises (t10 t6 t7 t8 t9))
2020-05-20 16:9:4 (GMT)
I think the problem we discussed is quite fundamental with integer programming problems that have a rational solution: of course the coefficients will be wrong, because this is (the?) rational solution. What is missing is the bounds that make the problem unsat. I will discuss with Pascal on Monday what he thinks of this.
You can also expose the problem by just asserting:
(assert (> term0 0))
(assert (> 1 term0))
2020-05-20 17:46:32 (GMT)
My understanding was always that the method used in veriT works both on LRA and LIA, but it is only complete for LRA. Your example is in that fragment: LRA-satisfiable, but not LIA satisfiable.
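(Concretely, for the two assertions above: term0 = 1/2 satisfies both (> term0 0) and (> 1 term0) over the reals, but no integer lies strictly between 0 and 1.)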
2020-05-20 17:46:58 (GMT)
But that does not explain the problems with the coefficients…
2020-07-08 11:24:2 (GMT)
@Hans-Jörg I tried to abstract over terms when reconstructing LIA and I immediately hit a wall:
Example
If I look at the proof,
(step t48 (cl
(< 0 (ite (forall ((veriT_vr0 Int)) (or (< veriT_vr0 0) (< 0 veriT_vr0))) (- 1) 3))
(<= (ite (forall ((veriT_vr0 Int)) (or (< veriT_vr0 0) (< 0 veriT_vr0))) (- 1) 3)
3))
:rule la_generic :args (1 1))
I initially abstracted completely over if-then-else, yielding 0 < XX \/ XX <= 3 (which is not provable). Do you have any idea whether I have to look into the terms? (Do I have to look into all ifs?)
2020-07-09 10:16:25 (GMT)
I think 0 < x ∨ x ≤ 3 is valid. If x ≤ 0 then certainly x ≤ 3, if otherwise 3 < x then 0 < x.
2020-07-16 5:28:4 (GMT)
I might be confused again @Hans-Jörg
The la_generic step proves that ¬ 3 < 2 * x ∨ a < 0 ∨ ¬ 3 * x + 7 * a < 4 with the coefficients 3, 7, and 1. However, I am confused: how do the x terms cancel out? (The first inequality gives 6 * x, the second one 0 * x, and the last one 3 * x.)
Isabelle linarith proves it with the coefficients 3, 14, and 2…
2020-07-16 5:28:44 (GMT)
(PS: This is the last not-working example of SMT_Examples; after that, time for mirabelle testing :ogre: )
2020-07-23 8:57:20 (GMT)
hm I wonder if something like the following is happening here: 3 < 2 * x gets strengthened to 4 ≤ 2*x and then this is divided by 2 without the division being recorded. I will have a look.
2020-07-23 13:6:30 (GMT)
It was simpler. It's fixed now: veriT now prints "(div 3 2)" as the coefficient.
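(Sanity check, assuming the coefficients apply to the negated literals 3 < 2 * x, ¬ a < 0 and 3 * x + 7 * a < 4 in that order: 3/2 * (2 * x - 3) + 7 * a + 1 * (4 - 3 * x - 7 * a) = -1/2, while each summand is non-negative (the first and last strictly positive) under those literals, so the x and a terms cancel and the constant yields the contradiction. The 3, 14, 2 that linarith finds is just this combination scaled by 2.)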
2020-07-26 13:48:23 (GMT)
That fixed the problem, thanks.
2020-07-27 13:1:18 (GMT)
Damn, I have a proof where the forall_inst is so complicated that metis does not manage to do it (after instantiating the quantifier correctly) :surprise:
2020-07-27 13:1:43 (GMT)
(meson is able to, so the step is correct)
2020-07-27 13:3:2 (GMT)
Is it a step where the quantified term is clausified implicitly?
2020-07-27 13:3:14 (GMT)
yes…
2020-07-27 13:4:15 (GMT)
:(
2020-07-27 13:4:19 (GMT)
well not exactly:
∀veriT_vr72::'c.
∃veriT_vr73::'c.
({uu::'c. F.F (λuua::'b. g uua uu) {uu::'b. ∃b::'c. g uu b ≠ ❙1} ≠ ❙1} ⊆ {uu::'c. ∃a::'b. g a uu ≠ ❙1} ⟶
F.F (λuua::'b. g uua veriT_vr72) {uu::'b. ∃b::'c. g uu b ≠ ❙1} ≠ ❙1 ⟶ (∃a::'b. g a veriT_vr72 ≠ ❙1)) ∧
((F.F (λuua::'b. g uua veriT_vr73) {uu::'b. ∃b::'c. g uu b ≠ ❙1} ≠ ❙1 ⟶ (∃a::'b. g a veriT_vr73 ≠ ❙1)) ⟶
{uu::'c. F.F (λuua::'b. g uua uu) {uu::'b. ∃b::'c. g uu b ≠ ❙1} ≠ ❙1} ⊆ {uu::'c. ∃a::'b. g a uu ≠ ❙1}) ⟹
∀veriT_vr72::'c.
∃veriT_vr73::'c.
∀a::'b.
∃aa::'b.
({uu::'c. F.F (λuua::'b. g uua uu) {uu::'b. ∃b::'c. g uu b ≠ ❙1} ≠ ❙1} ⊆ {uu::'c. ∃a::'b. g a uu ≠ ❙1} ⟶
F.F (λuua::'b. g uua veriT_vr72) {uu::'b. ∃b::'c. g uu b ≠ ❙1} ≠ ❙1 ⟶ g aa veriT_vr72 ≠ ❙1) ∧
((F.F (λuua::'b. g uua veriT_vr73) {uu::'b. ∃b::'c. g uu b ≠ ❙1} ≠ ❙1 ⟶ g a veriT_vr73 ≠ ❙1) ⟶
{uu::'c. F.F (λuua::'b. g uua uu) {uu::'b. ∃b::'c. g uu b ≠ ❙1} ≠ ❙1} ⊆ {uu::'c. ∃a::'b. g a uu ≠ ❙1})
(left of ==>: instantiated assumption; right of ==>: conclusion)
2020-07-27 13:5:25 (GMT)
it is more some kind of maxi-scoping
2020-07-27 13:8:8 (GMT)
Okay, the simplifier is committing hara-kiri on that. It is weird.
2020-07-27 13:17:54 (GMT)
... but easy to fix actually
2020-07-27 18:5:13 (GMT)
@Hans-Jörg I have a lot of horrible work for you again (veriT master branch, version 5fe72ca)…
The problematic step is (step t95 (cl @p_324) :rule th_resolution :premises (a13 t93 t94)).
The assumptions are:
a13: @p_323
t93: (forall ((?v0 A_unit_rbt$) (?v1 A$)) @p_321) = @p_324
t94: @p_323 != @p_324 \/ ~@p_323 \/ @p_324
But (forall ((?v0 A_unit_rbt$) (?v1 A$)) @p_321) should actually be @p_323 (and they are equal up to some one-point rule application). t93 results from a qnt_simplify.
(I might be wrong with the diagnostic…)
2020-07-27 18:14:25 (GMT)
(and if there is any way to give you more useful information, please tell me)
2020-07-27 18:30:46 (GMT)
I will look into it. I made good headway today with removing implicit double neg elimination and implicit removal of repeated literals (+ implicit simplification to True, but I'm not sure this ever appears in a proof printed). Btw. 5fe72ca has a bug where la_generic does an out of bounds write which is fixed on HEAD.
2020-07-27 18:36:39 (GMT)
I don't see a newer commit, but there is no LA/LIA involved in this proof.
2020-07-27 18:39:0 (GMT)
huh my master is at "5a8b16531c27529d7dcc548135fe85578e1d689a". But yeah that bug is unrelated, but you might get spurious crashes.
2020-07-27 18:45:51 (GMT)
I already have that commit but the version number is not updated… After git clean -xdf and recompiling, it is fine now
2020-07-27 19:1:57 (GMT)
ah ok need to fix that at some point
2020-07-28 5:47:18 (GMT)
I have now finished checking the failed smt calls generated by mirabelle on HOL-Library with cvc4 and veriT :octopus:
2020-07-28 8:9:48 (GMT)
Interesting, failed proofs generated by E (that even metis is not able to reconstruct) seem to trigger a lot of issues...
Me being confused by this proof step:
(step t102.t15 (cl true) :rule equiv_pos2)
This does not look like a valid application of equiv_pos2 (except if it generates a tautology that is immediately simplified… but why do you generate that proof step and not a true step?)
2020-07-28 9:14:43 (GMT)
Assumption most likely correct, here is a simpler example:
The relevant proof step:
(step t119.t14 (cl (= (not (not (= (sfilter$ p$ s$) (sCons$ veriT_sk52 veriT_sk53)))) (= (sfilter$ p$ s$) (sCons$ veriT_sk52 veriT_sk53)))) :rule not_simplify)
(step t119.t15 (cl true) :rule equiv_pos2)
equiv_pos2 is
(cl (= (not (not X)) X) (not (not (not X))) X)
aka, after double-negation simplification
(cl (= (not (not X)) X) (not X) X)
which is a tautology. Now I can fix the proof reconstruction to support this, but I am not sure that this is really the intended use of equiv_pos2...
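Spelling out the instance from t119.t14 (my own substitution into the shape above, not solver output), writing φ for (= (sfilter$ p$ s$) (sCons$ veriT_sk52 veriT_sk53)), the unsimplified step would presumably be
(step t119.t15 (cl (= (not (not φ)) φ) (not (not (not φ))) φ) :rule equiv_pos2)
which collapses to (cl true) once the double negations are eliminated and (not φ) together with φ is simplified to true.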
2020-07-28 13:5:26 (GMT)
this is one of the implicit simplifications that happens for proof steps and that I'm working on removing. Actually this is a good test example because it crashes my branch ^^
2020-07-28 13:9:16 (GMT)
I am unsure, however, why this tautology is produced. Somehow it feels like this is unnecessary in a proof and a simplification step could catch that…
2020-07-28 13:10:40 (GMT)
yeah I'm wondering the same, to cite myself: "(+ implicit simplification to True, but I'm not sure this ever appears in a proof printed)". This seems kinda useless, but maybe it's not caught by the pruning
2020-07-28 13:25:29 (GMT)
what actually breaks if tautologies are not simplified to true?
2020-07-28 13:42:38 (GMT)
Nothing, surely. You can formulate arbitrary complicated tautologies, and checking whether they are tautologies is undecidable after all.
2020-07-28 14:21:36 (GMT)
Abstractly I agree with you. In veriT, I don't know… maybe some function is assuming that syntactic tautologies have been simplified and are not pruned from the output…
2020-07-28 15:10:49 (GMT)
Of course.
2020-07-28 18:14:15 (GMT)
Wow, veriT sometimes uses la_generic to prove that 0 != 1, that is unexpected
2020-07-28 18:21:39 (GMT)
yeah I think there is no preprocessing step for that
2020-07-28 19:10:41 (GMT)
The only reason I saw that step is actually that my reconstruction was missing some rules for reals (a = b ==> a * p / n = b * p / n)
2020-07-29 10:30:0 (GMT)
I am working on a proof where the choices generated are so big that just the Assumption.add_assumes (used to define them) takes more than 1s...
2020-07-29 10:31:10 (GMT)
and iterating over the terms is too slow
2020-07-29 11:18:49 (GMT)
You are referring to Hilbert choice?
2020-07-29 11:18:53 (GMT)
Yes, these can get big.
2020-07-29 11:24:30 (GMT)
Yeah Hilbert choice. Let's try to decrease the term size.
2020-07-29 11:42:2 (GMT)
To give an idea of the size, even pretty printing via @{print} throws an exception…
2020-07-29 12:34:54 (GMT)
Is that with the version of the proof with sharing?
2020-07-29 12:53:35 (GMT)
There is no sharing on the Isabelle side.
beginning of the term
2020-07-29 12:55:5 (GMT)
I am currently trying to fold the other choices to reduce the size (every SOME is actually a constant), but even iterating over the term is very slow, so I cannot use the default Isabelle method.
2020-07-29 12:56:33 (GMT)
And I have to kill Isabelle once an hour, because java runs out of memory when outputting the trace I need to understand what is happening…
2020-07-29 13:4:12 (GMT)
after folding
2020-07-29 14:3:16 (GMT)
@Hans-Jörg @Haniel Barbosa is generating the same Hilbert choice twice (except for the name) intended or not?
2020-07-29 14:7:1 (GMT)
You mean it's generating alpha-equivalent choice functions?
2020-07-29 14:7:35 (GMT)
exactly
2020-07-29 14:8:35 (GMT)
Let me take a look, sec
2020-07-29 14:9:18 (GMT)
this is on Hans' fork, I imagine?
2020-07-29 14:9:31 (GMT)
Yes.
2020-07-29 14:9:37 (GMT)
ok
2020-07-29 14:13:16 (GMT)
no difference in the devel branch of veriT
2020-07-29 14:18:4 (GMT)
The original formula has something like (forall x . F) ^ (forall x . F), which occur negatively and are being skolemized, I believe. veriT is kinda limited in how it does alpha-equivalence elimination
2020-07-29 14:18:40 (GMT)
to avoid what you are seeing it'd be necessary to improve the alpha-equivalence elimination before skolemization happens
2020-07-29 14:19:25 (GMT)
Hopefully that is not a critical issue?
2020-07-29 14:24:8 (GMT)
something in my reconstruction does not like it. I just have to find out what...
2020-07-29 14:51:23 (GMT)
In the reconstruction of sko_forall, I was relying on the form x = choice ..., which no longer appeared, because I was folding skolem terms too aggressively (in this case instead of sk15 = choice ..., I had sk15 = sk4)
2020-07-29 14:54:12 (GMT)
If you used the same skolem for both choices it should be ok, though?
2020-07-29 14:55:41 (GMT)
currently, I try to stay as close as possible to veriT (easier to debug)
2020-07-29 14:56:2 (GMT)
but yes, this is one of the simplifications I should probably implement
2020-07-29 19:31:55 (GMT)
@Mathias Fleury I tried to fix the qnt_tidy problem you had (after discussion with @Haniel Barbosa ). A prior proof was lost and now a transitivity step is added which tries to catch it. But I'm not sure if it's correct; maybe you can just try it and let me know if it doesn't work ^^
2020-07-30 5:25:46 (GMT)
I first merged the no-implicit-simplification branch and it works, thanks @Hans-Jörg and @Haniel Barbosa !
2020-07-31 14:10:39 (GMT)
I improved proof pruning to prune those true steps (and in general to prune inside subproofs). I hope it breaks nothing ^^
2020-07-31 18:34:31 (GMT)
none of my tests broke
2020-08-04 9:44:5 (GMT)
@Hans-Jörg how hard would it be to not generate bindings for choice and directly use the constant instead:
For example, if the proof contains
(define-fun veriT_sk0 () A$ (! (choice ((veriT_vr201 A$)) ... :named @p_2104))
use veriT_sk0 instead of @p_2104
?
2020-08-04 9:51:22 (GMT)
forget it, it is two lines in Isabelle
2020-08-04 9:51:51 (GMT)
and the gain is 1.6s -> 95ms (really) during parsing (more precisely when creating the Isabelle terms)
2020-08-04 11:35:50 (GMT)
cool! Changing that should not be too difficult, but I have to look at the code again.
2020-08-04 12:5:46 (GMT)
I don't really expect an answer, but do you have any intuition why the last inequality is included (real numbers):
¬ 0 / 2 < d / 2 ∨ ¬ d / 2 ≤ - (d / 2) ∨ 0 / 2 ≠ - (0 / 2)
I don't see why it is necessary...
2020-08-04 12:38:5 (GMT)
Mathias Fleury said:
and the gain is 1.6s -> 95ms (really) during parsing (more precisely when creating the isabelle terms)
wow... why the big difference? It's just because you don't have to go through the trouble of equating e.g. @p_2104 and veriT_sk0?
2020-08-04 12:43:8 (GMT)
Not unfolding @p_2104 makes the term (much much) smaller and easier to convert to a typed Isabelle term…
2020-08-04 12:56:37 (GMT)
This is basically also the issue I had last week at a different level (it was while importing the terms into the current context).
2020-08-04 13:23:54 (GMT)
@Haniel Barbosa I looked closer at it (and opening the unshared proof is too much for emacs, so here are some grep estimates):
$ grep "define-fun veriT_sk27" <proof> | wc -c
3 698 794
$ grep "define-fun veriT_sk27" <proof> | grep -o choice | wc -l
5 184
So we replace 5184 choice terms (mostly veriT_sk20, it seems). The definition of veriT_sk20 has size:
$ grep "define-fun veriT_sk20" test14.smt_out | grep choice | wc -c
780
We do some overcounting here due to the extra parentheses and the define-fun, and not all choices are veriT_sk20, but roughly the unfolding accounts for almost all of the definition of veriT_sk27…
2020-08-04 13:32:7 (GMT)
Interesting... what I think was puzzling me was that I'd expected that you'd need to unfold the choice to use it in the proof checking
2020-08-04 13:36:45 (GMT)
I try to avoid that as much as possible by instead introducing them. But it sometimes does not work (when veriT reorders equalities) and then I unfold (and regularly fail to reconstruct the proof)
2020-08-04 13:45:25 (GMT)
I see. That's nice
2020-08-04 13:48:10 (GMT)
I managed to fold the term:
$ grep "define-fun veriT_sk27" <proof>| wc -c
553
(so the definition is actually about 6600 times smaller with the skolem definitions kept folded than with them unfolded…)
2020-08-04 13:59:9 (GMT)
Here is an example how it happens:
We have a goal, let's say
(∀veriT_vr249 veriT_vr250 veriT_vr251 veriT_vr252. P veriT_vr249 veriT_vr250 veriT_vr251 veriT_vr252) =
P veriT_vr249 veriT_vr250 veriT_vr251 veriT_vr252
At that point, we have all the definitions. They are combined with the theorem ‹x = (SOME x. ¬P x) ⟹ (∀x. P x) ⟷ P x› to get theorems of the form (∀x. P x) ⟷ P x.
We attempt to unfold those theorems with the goal. If we are lucky, the corresponding Ps always unify, so each definition peels off one quantifier at a time. So in this case:
(∀veriT_vr249 veriT_vr250 veriT_vr251 veriT_vr252. P veriT_vr249 veriT_vr250 veriT_vr251 veriT_vr252) =
P veriT_sk249 veriT_sk250 veriT_sk251 veriT_sk252
would become by introducing veriT_sk249
(∀veriT_vr250 veriT_vr251 veriT_vr252. P veriT_sk249 veriT_vr250 veriT_vr251 veriT_vr252) =
P veriT_sk249 veriT_sk250 veriT_sk251 veriT_sk252
and then by introducing veriT_sk250
(∀veriT_vr251 veriT_vr252. P veriT_sk249 veriT_sk250 veriT_vr251 veriT_vr252) =
P veriT_sk249 veriT_sk250 veriT_sk251 veriT_sk252
and so on until
P veriT_sk249 veriT_sk250 veriT_sk251 veriT_sk252 =
P veriT_sk249 veriT_sk250 veriT_sk251 veriT_sk252
which we can unify and close the proof.
The advantage of this approach is that our goals never become too large.
But sometimes it does not work. In that case, we use an alternative strategy: we first simplify the definitions and the goal and then try the same peeling.
And sometimes this still does not work (because the simplifier does not reorder the terms exactly in the same way). In that case, we ~give up~ unfold all terms, but that is basically a Hail Mary.
2020-08-04 14:0:15 (GMT)
I would have said that using the name instead of the defined fun is not what we intended when we added this define-skolems option, so it's a bit of a bug, no?
2020-08-04 14:3:19 (GMT)
I always assumed that there was a technical reason that I missed…
2020-08-04 14:4:7 (GMT)
I though that I had asked you about that
2020-08-04 14:4:9 (GMT)
I don't remember ^^
2020-08-04 14:9:32 (GMT)
It is mentioned in an email from last year («veriT true as solemn» (sic, the k got lost somehow) that I sent before going on vacation), but it seems that I never mentioned it again…
2020-08-04 14:14:30 (GMT)
Using the choice term instead of the symbol avoids thinking about ordering of the definitions. But I don't know if this is necessary in veriT
2020-08-04 16:18:34 (GMT)
Thanks for the explanation, @Mathias Fleury ! I see how you do it now, cool. So if the choice definitions syntactically correspond to the skolemized formula unification alone would take care of it?
2020-08-04 17:21:9 (GMT)
Exactly. And actually, if you can, you want the terms to be syntactically equivalent. In veriT, the problem is the order of the equalities. Are equalities ordered in CVC4?
2020-08-04 17:35:1 (GMT)
After much suffering, yes, we track all changes in the order of equalities in CVC4 now.
2020-08-04 19:2:57 (GMT)
@Haniel Barbosa is your Lean checker working differently?
2020-08-04 19:6:27 (GMT)
I didn't get to quantifiers yet :)
2020-08-04 19:6:35 (GMT)
it's the next step. Currently it's only QF_UF
2020-08-05 5:41:29 (GMT)
Go, quantifiers, go!
2020-08-05 12:36:28 (GMT)
CVC4 finally produces proofs for them internally (finally!!!), now it's a matter of printing them :P
2020-08-05 12:49:55 (GMT)
:)
2020-08-05 12:50:6 (GMT)
I'll believe it when I see it.
2020-08-05 13:5:44 (GMT)
Fair enough :)
Here is a proof with quantifiers for problem https://github.com/CVC4/CVC4/blob/master/test/regress/regress0/quantifiers/bug290.smt2. The proof has a few issues related to the SAT solver (losing track of justification of some facts derived from input assumptions) but the quantifiers part is fine. :P
The calculus is this one: https://github.com/CVC4/CVC4/blob/proof-new/src/expr/proof_rule.h
2020-08-05 13:19:4 (GMT)
I tried to compile CVC4 this morning and it used so much memory, I had to (hard) restart my computer :surprise:
2020-08-05 13:21:15 (GMT)
hahaha that issue I did not have. I used to suffer a lot from compile time, but I recently bought a desktop for gaming that has the nice side effect of compiling CVC4 in reasonable time. :upside_down:
2020-08-05 13:22:35 (GMT)
Well 32GB of memory… but make -j might have been too much still
2020-08-05 13:24:59 (GMT)
How do you keep track of the resolution steps in presence of conflict minimisation in the SAT solver?
2020-08-05 13:25:30 (GMT)
adding resolution steps for each literal eliminated from the conflict clause
2020-08-05 13:25:48 (GMT)
you can eliminate them because you know that their negation is justified
2020-08-05 13:26:0 (GMT)
so adding the resolution step is guaranteed to work
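Schematically, in the proof format (a made-up sketch, not actual CVC4 output; the step names and literals are hypothetical):
(step c0 (cl l1 l2 l3) :rule resolution :premises (...)) ; learnt clause before minimisation
(step r3 (cl (not l3) l1) :rule ...) ; reason clause that propagated (not l3); its other literal already occurs in c0
(step c1 (cl l1 l2) :rule resolution :premises (c0 r3)) ; the redundant literal l3 is resolved away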
2020-08-05 13:26:27 (GMT)
it is more complicated than that if you do recursive minimisation with caching
2020-08-05 13:27:27 (GMT)
because you are not sure yet whether you need the resolution step or not
2020-08-05 13:28:56 (GMT)
I can't say for sure if CVC4 is doing minimisation as you are describing, but do you mean that one could add spurious resolution steps?
2020-08-05 13:30:34 (GMT)
Well no SAT solver since picosat (or precosat, don't remember) is able to generate resolution proofs for this reason
2020-08-05 13:32:6 (GMT)
When you look at https://github.com/CVC4/CVC4/blob/proof-new/src/prop/minisat/core/Solver.cc#L1110, marking a variable as seen means that you would be able to remove that literal by resolution.
However, if you hit the cache, you do not know why it is possible without exploring the tree again (which you do not want for performance reasons).
2020-08-05 13:33:59 (GMT)
ah, I think I see what you mean
2020-08-05 13:34:17 (GMT)
I add the resolution step but the literal I'm resolving against can be left open
2020-08-05 13:34:20 (GMT)
and its justification can be filled in later
2020-08-05 13:34:29 (GMT)
so I can only traverse the tree to get the reason afterwards
2020-08-05 13:35:8 (GMT)
I'm actually doing it eagerly though
2020-08-05 13:35:39 (GMT)
(optimizing has been not the priority for now)
2020-08-05 13:36:22 (GMT)
I don't even know which of the gazillion SAT solvers supported by CVC4 you use to generate proofs :smile:
2020-08-05 13:36:45 (GMT)
it's the one you linked :)
2020-08-05 13:40:13 (GMT)
I would have to find d_proxy->getPropEngine()->addResolutionStep(out_learnt[i]); to understand how you do it, but in the SAT solver itself, I don't see where you keep track of the resolution steps in the minimisation
2020-08-05 13:41:16 (GMT)
It's done in the method you wrote above
2020-08-05 13:41:35 (GMT)
that method calls reason in the SAT solver
2020-08-05 13:41:51 (GMT)
when trying to justify ~out_learnt[i]
2020-08-05 13:42:15 (GMT)
Is it too complicated to produce RAT (DRUP?) instead, and then later extract a resolution proof?
2020-08-05 13:42:31 (GMT)
I have absolutely no idea
2020-08-05 13:42:37 (GMT)
never looked at this
2020-08-05 13:42:43 (GMT)
I see no point in doing it if we can do resolution though
2020-08-05 13:43:40 (GMT)
@Mathias Fleury This is the relevant method in the proof producing code, if you are curious: https://github.com/CVC4/CVC4/blob/ed5b4681ca98d0a194685ee52582c2a86f1b0686/src/prop/prop_engine.cpp#L766
2020-08-05 13:44:49 (GMT)
@Simon Cruanes there is a paper on this for cryptominisat
2020-08-05 13:45:19 (GMT)
I meant in the context of SMT
2020-08-05 13:45:29 (GMT)
Do not store a resolution proof, just DRUP or whatever
2020-08-05 13:45:59 (GMT)
ah, well, that is not hard to do
2020-08-05 13:46:9 (GMT)
right
2020-08-05 13:46:22 (GMT)
that is how we are gonna handle bitblasting proofs
2020-08-05 13:46:44 (GMT)
the old CVC4 proof code had a hacked minisat that would give you resolution proofs for the bitblasting part
2020-08-05 13:46:52 (GMT)
what I mean is that it might be simpler than storing detailed proofs in the presence of inprocessing, conflict minimisation, etc.
2020-08-05 13:47:24 (GMT)
which is fine. But the best way of doing bitvector solving is using off-the-shelf state-of-the-art SAT solvers. And you don't wanna hack those to get resolution proofs. So you get their off-the-shelf DRAT proofs and convert them
2020-08-05 13:47:41 (GMT)
@Simon Cruanes for the core SAT solver it does not matter
2020-08-05 13:47:49 (GMT)
minisat is good enough to our purposes
2020-08-05 13:48:22 (GMT)
but if we were to use, say, Kissat or whatever as the core solver in the CDCL(T) framework we would go the route of getting a DRAT proof out of it and converting to resolution
2020-08-05 13:49:30 (GMT)
Of course, sorry, I thought you were saying that even getting perfect resolution proofs for the core CDCL solver had issues :)
2020-08-05 13:50:19 (GMT)
it has when you have to do it as you learn how the SAT solver works :upside_down:
2020-08-05 13:50:40 (GMT)
but when one knows perfectly well what one is doing it shouldn't be difficult at all
2020-08-05 13:50:49 (GMT)
I imagine it was easy for @Pascal Fontaine to do this in veriT
2020-08-05 13:53:37 (GMT)
I don't see in tryJustifyingLit where you check if the literal is already in the clause you are trying to derive, but otherwise I get the idea
2020-08-05 13:55:42 (GMT)
the resolution is l1 v ... v li v ... v ln with ~li, where li is the redundant literal and ~li is the one that method is trying to justify. In the current implementation, for other reasons that will soon disappear, the method is always overwriting a previously computed justification for ~li, if any.
2020-08-05 14:4:6 (GMT)
Assume you have l1 v l2 v l3 and are trying to justify ~l3. You get the reason currentReason = ~l3 v l1 v l4. In the for loop, you call tryJustifyingLit(~l1) and I don't see where you stop the exploration instead of going even deeper and start analysing the reason of the ~l1
2020-08-05 14:7:38 (GMT)
I am also confused about why currentReason_size = currentReason.size(); is required in the for-loop, and by the fact that the analysis can cause a reallocation of clauses, but I must be missing something here
2020-08-05 14:45:5 (GMT)
Second question: how do you generate the proofs? I have 1300 smt files with (get-proof) and I would like cvc4 to either ignore that or return a proof (just to know if CVC4 is able to solve the problems…)
2020-08-05 16:31:40 (GMT)
Mathias Fleury said:
Assume you have l1 v l2 v l3 and are trying to justify ~l3. You get the reason currentReason = ~l3 v l1 v l4. In the for loop, you call tryJustifyingLit(~l1) and I don't see where you stop the exploration instead of going even deeper and start analysing the reason of the ~l1
You don't stop. You indeed go as high as you can.
2020-08-05 16:33:49 (GMT)
Mathias Fleury said:
I am also confused about why currentReason_size = currentReason.size(); is required in the for-loop, and by the fact that the analysis can cause a reallocation of clauses, but I must be missing something here
This was a sanity check. That part of the code was based on something on the old proof module in which the size of the reason was updated while in the loop. I had the impression this was unnecessary, so I was conservative but added the assertion to confirm my hunch.
2020-08-05 16:36:49 (GMT)
Mathias Fleury said:
Second question: @Haniel Barbosa how do you generate the proofs? I have 1300 smt files with (get-proof) and I would like cvc4 to either ignore that or return a proof (just to know if CVC4 is able to solve the problems…)
CVC4 currently has two proof modules. The standard way of getting proofs (with get-proof) will use the old one. The new one is only partially in master; in fact most of it is still in our WIP branch https://github.com/CVC4/CVC4/tree/proof-new. We are merging to master as things get stable.
To use the new proof module the options are --proof-new and --dump-proofs, which are only gonna work in the branch
2020-08-05 16:37:52 (GMT)
If I understand correctly, what you want to know is for which problems CVC4 can produce a proof with the new infrastructure?
2020-08-05 16:40:57 (GMT)
No, I just want to know if CVC4 is able to prove it when proof output is active
2020-08-05 16:41:8 (GMT)
ah
2020-08-05 16:41:35 (GMT)
do the following: remove (get-proof) from all of them, compile the proof-new branch and run with the --proof-new option
2020-08-05 16:42:17 (GMT)
that will make CVC4 use the new proof infrastructure but it'll not try to generate the final proof (connecting the proofs generated locally for preprocessing, CNF/SAT solving, theory reasoning)
2020-08-05 16:45:57 (GMT)
what theories are these problems using?
2020-08-05 17:7:31 (GMT)
AUFLIA
2020-08-05 17:8:26 (GMT)
and maybe a few AUFLIRA
2020-08-05 17:12:23 (GMT)
cool
2020-08-05 17:15:35 (GMT)
I have 362 that either veriT or Z3 can solve within a 1s timeout. I am rerunning my filtering with a 12s timeout now, with CVC4 too (without proof generation; that should bring a few more problems in).
2020-08-05 17:16:43 (GMT)
I should run some more mirabelle to generate more problems
2020-08-05 17:26:29 (GMT)
One thing that would help the printing of veriT proofs from CVC4 would be a set of benchmarks for which reconstruction works, so I can run @Hans-Jörg's veriT on them and have a ground truth of proofs to compare with
2020-08-05 17:32:35 (GMT)
I am not sure I understand what you mean. I can give you SMT files (corresponding to Isabelle theorems, but that link is lost) on which veriT can produce a proof. How does that help you? I mean: CVC4 will produce a different proof, so what do you gain?
2020-08-05 17:33:43 (GMT)
I would like to try to produce, from CVC4, proofs that can be reconstructed in Isabelle. My hope is that the benchmarks for which veriT can produce proofs that are successfully reconstructed are a good start
2020-08-05 17:34:5 (GMT)
if I can produce, from CVC4, proofs at least as detailed as the proofs that veriT produces
2020-08-05 17:36:6 (GMT)
I have 360 such problems...
2020-08-05 17:36:32 (GMT)
If you could send them to me I'd appreciate it :) In a few weeks I may be able to give veriT proofs for them
2020-08-05 17:40:38 (GMT)
I mean, as far as I understand there is no easy way to check these proofs, but if I can generate proofs that I can check internally for these benchmarks, and that I can print in the same format as the one veriT is printing, that is already something I'd be happy with
2020-08-05 17:41:1 (GMT)
and that might motivate doing the work of trying to integrate CVC4 in the proof reconstruction machinery eventually
2020-08-05 17:42:1 (GMT)
but please correct me if I'm working with wrong assumptions :upside_down:
2020-08-05 17:44:29 (GMT)
Haniel Barbosa said:
I mean, as far as I understand there is no easy way to check these proofs, but if I can generate proofs that I can check internally for these benchmarks, and that I can print in the same format as the one veriT is printing, that is already something I'd be happy with
That is actually exactly the plan and the reason why I am currently filtering my SMT files :smile:
2020-08-05 17:56:43 (GMT)
You want to find the blindspots for Z3/veriT?
2020-08-05 18:0:36 (GMT)
No, the other motivation is to adapt the scheduling used by veriT to the problems generated by Isabelle. In particular, veriT is able to produce proofs for CVC4 (while Z3 often is not).
2020-08-05 18:2:25 (GMT)
Answer to the question: Isabelle dumps all calls and not only the ones that succeed. Therefore, I have to filter out the unsuccessful ones
2020-08-05 18:5:37 (GMT)
Ah, I see now