zulip log viewer

isabelle-smt

Mathias Fleury

2020-05-15 14:59:2 (GMT)

creating topic

Hans-Jörg

2020-05-15 16:44:25 (GMT)

I think I fixed the problem now without the fallback to the old option. It's in the master branch of verit-rmx. I also merged the la-coefficients branch. I will fix mainline veriT in the next few days.

Mathias Fleury

2020-05-15 17:14:57 (GMT)

Danke!

Hans-Jörg

2020-06-09 16:7:15 (GMT)

I implemented the removal of the implicit equality in the branch no-implicit-equality-order. First tests show now problems, but I want to do some performance regression tests before merging.

Hans-Jörg

2020-06-12 11:6:34 (GMT)

I fixed a bug with this that lead to a performance regression, but I still have some weird bugs that disappear when compiling in debugging mode. Hence I will have to do a bit of debugging before merging. But if you want to play around with this you can use that branch.

Haniel Barbosa

2020-07-07 22:23:24 (GMT)

@Mathias Fleury @Hans-Jörg : is the experimental data you used in your PxTP paper available somewhere? I'd like to have a set of benchmarks produced by Isabelle and for which veriT produces proofs that are successfully reconstructed as a reference for the output of veriT proofs in CVC4

Mathias Fleury

2020-07-08 4:17:51 (GMT)

No. I used mirabelle, which does not generate them (at least by default). However, I have to rerun mirabelle soon, so I can also try to generate SMT files if you are interested.

Jasmin Blanchette

2020-07-08 6:7:53 (GMT)

Indeed, there's a 'keep' option or something in Mirabelle you can use to generate the SMT-LIB files.

Mathias Fleury

2020-07-08 6:11:40 (GMT)

Thanks!

Haniel Barbosa

2020-07-08 12:46:7 (GMT)

That'd be great, thanks @Mathias Fleury !

Mathias Fleury

2020-08-31 8:57:38 (GMT)

$ grep "veriT_smt" ../mirabelle_{cvc4,z3}_HOLLibrary/*.log | grep -v "timed out" | grep -v "failed"  | wc -l
243
$ grep "[^_]smt" ../mirabelle_{cvc4,z3}_HOLLibrary/*.log | grep -v "timed out" | grep -v "failed"  | grep -v sledgehammer | wc -l
939

Hans-Jörg

2020-08-31 8:58:11 (GMT)

25.9% veriT :P

Mathias Fleury

2020-08-31 8:59:29 (GMT)

$ grep "veriT_smt (best)" ../mirabelle_{cvc4,z3}_HOLLibrary/*.log | grep -v "timed out" | grep -v "failed"  | grep -v sledgehammer | wc -l
35
$ grep "veriT_smt (default)" ../mirabelle_{cvc4,z3}_HOLLibrary/*.log | grep -v "timed out" | grep -v "failed"  | grep -v sledgehammer | wc -l
73
$ grep "veriT_smt (ccfv_SIG)" ../mirabelle_{cvc4,z3}_HOLLibrary/*.log | grep -v "timed out" | grep -v "failed"  | grep -v sledgehammer | wc -l
25
$ grep "veriT_smt (ccfv_threshold)" ../mirabelle_{cvc4,z3}_HOLLibrary/*.log | grep -v "timed out" | grep -v "failed"  | grep -v sledgehammer | wc -l
58
$ grep "veriT_smt (del_insts)" ../mirabelle_{cvc4,z3}_HOLLibrary/*.log | grep -v "timed out" | grep -v "failed"  | grep -v sledgehammer | wc -l
52

Mathias Fleury

2020-09-15 19:4:46 (GMT)

@Jasmin Blanchette Do you know why Sascha never used the coefficients given by Z3 to improve reconstruction?

Jasmin Blanchette

2020-09-15 19:15:44 (GMT)

Laziness? ;)

Mathias Fleury

2020-09-15 19:21:49 (GMT)

Or he got scared when he saw the code of linarith.

Mathias Fleury

2020-09-15 19:24:17 (GMT)

Trying to replace my code by that is a very interesting experience.

Mathias Fleury

2020-09-15 19:24:59 (GMT)

I already learnt more than I ever wanted to know about Drule.implies_intr_hyps (and it is still not working)

Jasmin Blanchette

2020-09-15 19:29:54 (GMT)

Mathias Fleury said:

Or he got scared when he saw the code of linarith.

Very plausible.

Mathias Fleury

2020-10-06 16:53:47 (GMT)

@Jasmin Blanchette @Pascal Fontaine @Hans-Jörg @Martin Desharnais

Jasmin had the good point that veriT_smt is a bad name for an Isabelle tactic because it is too long to type.

To continue in the Isabelle tradition of naming things after rivers, I like the namelethe (the mythological river [Léthé in French] https://en.wikipedia.org/wiki/Lethe whose name appears in the greek word meaning verity), but the veriT branding gets lost in the name.

Does anyone have a better idea?

For context, shorter name of other tactics (the one that come to my mind): try, smt , auto, simp, best, fast, try0 ; equal: force, metis, arith; longer names: fastforce, sledgehammer, presburger, linarith.

Jasmin Blanchette

2020-10-06 17:15:28 (GMT)

There's also moura. How about fontaine?

Jasmin Blanchette

2020-10-06 17:16:0 (GMT)

Like Jean de La Fontaine, of course.

Jasmin Blanchette

2020-10-06 17:18:52 (GMT)

So it's really going to be a separate tactic, not smt (verit) or something? Because I thought a priori smt can handle multiple SMT solvers. It's just that the syntax for switching is currently too cumbersome.

Jasmin Blanchette

2020-10-06 17:19:19 (GMT)

My main objection to veriT_smt was in the paper. I didn't know if it was a code name or the real thing.

Mathias Fleury

2020-10-06 17:19:47 (GMT)

The tactic is only a shortcut for the switching syntax

Jasmin Blanchette

2020-10-06 17:19:49 (GMT)

For a code name, it's not so much length as the redundancy (veriT is an SMT solver, why add _smt).

Jasmin Blanchette

2020-10-06 17:20:32 (GMT)

Ah, I see. Then, to be honest, I'd go with smt_veriT or, even better, just have a lightweight switching syntax smt (verit) or smt (veriT).

Mathias Fleury

2020-10-06 17:20:37 (GMT)

I don't know if I like the smt (verit) (best) syntax

Jasmin Blanchette

2020-10-06 17:20:57 (GMT)

I mean by (smt (verit) foo bar). Like metis etc.

Jasmin Blanchette

2020-10-06 17:21:17 (GMT)

The problem with lethe etc. is that it adds more aliases that people have to learn. That's almost always bad.

Mathias Fleury

2020-10-06 17:21:21 (GMT)

Yeah, I meant for switching the strategy used by veriT

Jasmin Blanchette

2020-10-06 17:21:58 (GMT)

Oh, you can write by (smt (verit, best) one_plus_one_eq_zero). Again like metis.

Jasmin Blanchette

2020-10-06 17:22:13 (GMT)

smt would just take a list of options, given in any order.

Jasmin Blanchette

2020-10-06 17:22:44 (GMT)

I'm not saying the metis syntax is perfect, but it works and it's been working for years and people have never complained

Jasmin Blanchette

2020-10-06 17:23:17 (GMT)

except Tobias once, but he complained more about the fact that sometimes the options given by Sledgehammer are unnecessarily complicated than about the syntax per se.

Jasmin Blanchette

2020-10-06 17:24:2 (GMT)

However, in your paper, you can use a code name. For an example, corec etc. is called AmiCo in our paper "Friends with Benefits".

Mathias Fleury

2020-10-06 17:24:5 (GMT)

Right I will change the smt tactic then

Jasmin Blanchette

2020-10-06 17:24:21 (GMT)

Cool. And you have some weeks to think about the paper.

Jasmin Blanchette

2020-10-06 17:24:45 (GMT)

Lethe is cool. Or any other name related to water. ;)

Hans-Jörg

2020-10-06 17:24:50 (GMT)

just out of curiosity: how often do people type "smt" by hand vs. it being suggested by Sledgehammer?

Jasmin Blanchette

2020-10-06 17:25:3 (GMT)

1 vs 999.

Jasmin Blanchette

2020-10-06 17:25:13 (GMT)

Possibly with a fourth 9. ;)

Jasmin Blanchette

2020-10-06 17:26:56 (GMT)

The paper "Extending Sledgehammer with SMT" is one of my most cited. It took about three weeks to do the research -- mostly the evaluation. Suddenly, people started using smt. Nobody (or hardly anybody) was doing that before.

Hans-Jörg

2020-10-06 17:31:58 (GMT)

I like the smt + options solution. At least at one place smt will no longer be synonymous with z3 :P

Hans-Jörg

2020-10-06 17:34:34 (GMT)

but Lethe is a really cool name @Mathias Fleury we should find something to name Lethe

Pascal Fontaine

2020-10-07 7:59:2 (GMT)

Hi, I do not feel strongly about the name, so feel free to do what you want. I do not dislike credits, but it is not mandatory that veriT appears clearly in the tactic name, or cryptically ;-) . I know this is a joke but to be sure: do not call the tactic fontaine. Among many arguments against this, veriT is the work of several people.
Thanks!

Jasmin Blanchette

2020-10-07 8:33:37 (GMT)

You never know if it's a joke or not. There is a tactic called moura in Isabelle, no joke. :)

Mathias Fleury

2020-10-17 13:55:36 (GMT)

FYI, I reactivated verit as default component of Isabelle and this time it worked (Isabelle integration test: apple and on linux -- not windows integration test).

Hans-Jörg

2020-10-17 14:7:27 (GMT)

I can't wait for everything breaking during the RC phase :P

Jasmin Blanchette

2020-10-17 15:4:35 (GMT)

The next step, I guess, is for @Martin Desharnais to enable veriT by default in Sledgehammer for people with enough cores (to find proofs) and for reconstruction?

Jasmin Blanchette

2020-10-17 15:6:8 (GMT)

@Hans-Jörg From what I understand, your work lives in a branch of veriT. What is your and Pascal's plan for integrating it into the main branch? Because I'm hoping the plan is not to let your code "rot" in some nonstandard branch forever. :smiley_cat:

Jasmin Blanchette

2020-10-17 15:6:25 (GMT)

Anything else that separates us from greatness?

Hans-Jörg

2020-10-17 15:58:2 (GMT)

@Jasmin Blanchette In my understanding Pascal want's to perform some deeper changes to the veriT code base. Essentially fixing some of the rough patches in the code that makes it hard to develop it further. During this my (and Daniels) work would naturally converge into one codebase.

Jasmin Blanchette

2020-10-17 15:58:20 (GMT)

What's the time frame?

Jasmin Blanchette

2020-10-17 15:59:25 (GMT)

If this all works, this would be great, but sometimes the better is the enemy of the good.

Jasmin Blanchette

2020-10-17 16:2:49 (GMT)

See e.g. Joel on Software: https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/

Hans-Jörg

2020-10-17 16:14:28 (GMT)

oh wait just to ensure the context is correct: are you asking about my code in relation to the Isabelle 2021 release, or the overall development arch of veriT?

Jasmin Blanchette

2020-10-17 16:14:48 (GMT)

Latter.

Jasmin Blanchette

2020-10-17 16:15:14 (GMT)

In general, I hate code that rots in a branch, like I've seen so many times with SPASS. Ask Pascal.

Mathias Fleury

2020-10-17 16:50:34 (GMT)

Jasmin Blanchette said:

The next step, I guess, is for Martin Desharnais to enable veriT by default in Sledgehammer for people with enough cores (to find proofs) and for reconstruction?

I have some design questions on that:

- SH tests the smt method if smt_proofs is true. Do we add an extra verit_smt_proofs options or do we keep only one?
- Testing only one strategy for veriT or all?
- Maybe scheduling of the strategies on veriT in Sledgehammer?

Jasmin Blanchette

2020-10-17 17:26:36 (GMT)

Keep only one option. Even the one we have is little used anyway (now that SMT proofs are considered OK in the AFP).

Jasmin Blanchette

2020-10-17 17:27:54 (GMT)

Testing takes user time. A good target would be to keep the same upper bound as today for Z3, but now for Z3 and veriT. Shorter slices and less Z3 should do it.

Jasmin Blanchette

2020-10-17 17:28:49 (GMT)

About scheduling of strategies, by all means! Today this is done for (not-SMT) ATPs but not for SMT solvers. The same could, and ideally should, be done for SMT. It would increase the success rate.

Jasmin Blanchette

2020-10-17 17:30:39 (GMT)

See also tasks #7 and #8 in the SH bug tracker. ;)

Jasmin Blanchette

2020-10-17 17:34:25 (GMT)

@Hans-Jörg I should say I'm not in a position to throw the first stone. The code of my LICS 2017 paper (https://ieeexplore.ieee.org/document/8005071) is unfinished and rots in a branch. If I could go back in time, I'd refused to submit until we've finished the job. Now it's virtually too late. Isabelle will likely never have nonuniform datatypes. The topic can't even be given to a PhD student, because the publication is already out.

Mathias Fleury

2020-10-19 17:6:3 (GMT)

Here is a patch that fulfils only the first points raised by Jasmin, as base for discussion: http://isabelle.in.tum.de/repos/testboard/rev/1cca528a8b46

Jasmin Blanchette said:

Testing takes user time. A good target would be to keep the same upper bound as today for Z3, but now for Z3 and veriT. Shorter slices and less Z3 should do it.

@Jasmin Blanchette : you mean when checking each method, so your idea is to divide the timeout by 2 in the function raw_preplay_step_for_method for SMT-based methods, right?

Jasmin Blanchette

2020-10-19 17:44:27 (GMT)

Minor:

Why not imitate Metis_Method and have the verit option stored in a string list? As discussed last time, there's value in imitating existing idioms instead of doing things differently.
SMT_VERIT_... -> SMT_Verit... or SMT_VeriT_... or SMT_veriT_ (I'd go with the first one since the tool is called verit in Isabelle.

As for division by 2, I don't know; that's for you to find out. All I know is that it should take the same time as before overall in the worst case (failure of reconstruction), not twice or thrice as much time.

Mathias Fleury

2020-10-19 18:0:32 (GMT)

is replaying methods threaded or not?

Mathias Fleury

2020-10-21 17:34:26 (GMT)

@Jasmin Blanchette the algorithm in Sledgehammer used to redirect proofs does not take into account variable appearing in the steps, right? (this is relevant for translating skolems from veriT to something SH can deal with. If not, I have to make sure that steps involving them appear only after their introduction… )

Jasmin Blanchette

2020-10-21 17:52:58 (GMT)

I think the idea is that we always made sure so far you'd have two steps: before and after skolemization, with an arrow in between. Then it worked.

Jasmin Blanchette

2020-10-21 17:54:0 (GMT)

I suspect whatever veriT does cannot be so different from all the other provers, for which this worked. There should be a way to massage the veriT proof if necessary -- I'd guess.

Mathias Fleury

2020-10-21 17:59:7 (GMT)

Okay so far I replaced:
step1: $(\forall x.\ P\; x) = P\; sk$
by
step1: $(\forall x.\ P\; x) \implies P\; sk$

but I guess, I need to do multi-step replacements to have this arrow, namely by replacing

step1: $(\forall x.\ P\; x) = P\; sk$
step2: $P\; sk$
by
step2: $P\; sk$ .

Thanks!

Jasmin Blanchette

2020-10-21 18:0:31 (GMT)

Replacing = by arrows is something I tried with Z3. It ended with tears.

Jasmin Blanchette

2020-10-21 18:0:51 (GMT)

I'm unfortunately completely swamped this week but I'd be happy to discuss next week. ;)

Mathias Fleury

2020-10-21 18:5:48 (GMT)

Damn. That will get super ugly when the skolemization happen in subproofs, because step 2 can be implicit.