2020-08-14 13:08:19 (GMT)
I'm trying to make sense of the enumerative QI paper by Reynolds et al., and one aspect that's not clear to me is skolemization and clausification. Is it the case that CVC4 and veriT first put a formula in Skolem normal form (as hinted by the comment on p. 2) and then apply miniscoping (not mentioned in the paper) to move the quantifiers as far in as possible and get a split? I think my problem stems from the fact that the authors, on p. 2, assume familiarity with the notion of "quantified formula", but their notion seems to be different from mine, and their substitution operators seem to magically remove the leading foralls. Anyway, any reference or explanation that would help me understand what actually happens in theory and in practice (so I can make sure to model it faithfully) would be greatly appreciated.
2020-08-14 13:9:42 (GMT)
@Pascal Fontaine maybe you can help me too. ;)
2020-08-14 13:22:53 (GMT)
so that's the next bookclub, heh? :)
2020-08-14 13:25:09 (GMT)
I don't quite understand what you are not understanding. :)
Skolemization is done a priori, generally only on top-level skolemizable quantifiers
2020-08-14 13:25:30 (GMT)
as in, skolemizable quantifiers that are not in the scope of non-skolemizable quantifiers
2020-08-14 13:26:02 (GMT)
thus the quantified formulas that will later on be instantiated may contain quantifiers in their scope
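A minimal Python sketch of that shallow skolemization, on a toy NNF formula AST (all names here are hypothetical, nothing from the CVC4 or veriT code bases): only existentials not below any universal are skolemized, so each Skolem symbol is a constant, and quantifiers may remain inside the result.

```python
import itertools

# Toy NNF formula AST (hypothetical): formulas are nested tuples,
#   ("and", f, g), ("or", f, g), ("forall", x, f), ("exists", x, f),
#   ("lit", pred, arg, ...) for literals; variables are strings.
_fresh = itertools.count()

def subst(f, x, t):
    """Replace free occurrences of variable x by term t (capture naively ignored)."""
    if f[0] == "lit":
        return ("lit",) + tuple(t if a == x else a for a in f[1:])
    if f[0] in ("and", "or"):
        return (f[0], subst(f[1], x, t), subst(f[2], x, t))
    if f[0] in ("forall", "exists"):
        return f if f[1] == x else (f[0], f[1], subst(f[2], x, t))

def shallow_skolemize(f):
    """Skolemize only existentials not below any universal, so each
    introduced Skolem symbol is a constant, not a function."""
    if f[0] == "exists":
        c = "sk%d" % next(_fresh)          # fresh Skolem constant
        return shallow_skolemize(subst(f[2], f[1], c))
    if f[0] in ("and", "or"):
        return (f[0], shallow_skolemize(f[1]), shallow_skolemize(f[2]))
    return f                                # stop at foralls and literals

# exists y. (p(y) and forall x. exists z. q(x, z))
f = ("exists", "y", ("and", ("lit", "p", "y"),
                     ("forall", "x", ("exists", "z", ("lit", "q", "x", "z")))))
g = shallow_skolemize(f)
print(g)
# the outer exists becomes a constant sk0; the inner exists, being under
# the forall, is left untouched inside the quantified formula
```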
2020-08-14 13:26:55 (GMT)
To put it simply, I don't understand anything.
2020-08-14 13:27:17 (GMT)
I don't understand fully what's the initial format, what transformations are applied in what order, and so on.
2020-08-14 13:27:22 (GMT)
Your answer helps a bit.
2020-08-14 13:28:08 (GMT)
But seeing how you apply to , which is "a quantified formula" (could it be ??), that doesn't reassure me. :S I have to infer what you mean.
2020-08-14 13:28:14 (GMT)
What I said is how both CVC4 and veriT handle quantifiers. IIRC in the paper we assume that the formula is preprocessed so that prenexing/miniscoping has been done
2020-08-14 13:28:39 (GMT)
The paper never even mentions the phrase miniscoping. You see why I'm confused?
2020-08-14 13:28:49 (GMT)
miniscoping is an optimization
2020-08-14 13:28:53 (GMT)
prenexing suffices
2020-08-14 13:29:7 (GMT)
Sure, but then everything is universal and you don't need G, just Q.
2020-08-14 13:29:27 (GMT)
Or E, whatever. ;)
2020-08-14 13:29:29 (GMT)
true :)
2020-08-14 13:29:47 (GMT)
let me look what we wrote :)
2020-08-14 13:29:49 (GMT)
I'm trying to understand and model a realistic SMT solver, not something that merely works in principle.
2020-08-14 13:30:34 (GMT)
Seriously, in the real CVC4 or veriT:
2020-08-14 13:30:50 (GMT)
If I have ,
2020-08-14 13:31:23 (GMT)
Does that immediately get massaged in such a way that Skolems are introduced and the forall is on top?
2020-08-14 13:32:30 (GMT)
I'm not sure what CVC4 will do. What veriT will do: it will eliminate the iff operator so that the negative occurrence of the forall can be skolemized
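To make the polarity point concrete, here is a hedged worked example (the formula is mine, not from the paper). Take a formula where a forall occurs under an iff:

```
(forall x. p(x)) <-> q
==> ((forall x. p(x)) -> q) /\ (q -> forall x. p(x))     [iff elimination]
==> (p(c) -> q)            /\ (q -> forall x. p(x))      [skolemization]
```

In the first conjunct the forall occurs negatively, i.e. it behaves like exists x. (p(x) -> q), so it can be skolemized with a fresh constant c; the positive occurrence in the second conjunct is left for instantiation.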
2020-08-14 13:32:33 (GMT)
And in Theorem 3, when you write "finite sets of instances of Q", you mean ground instances, right?
2020-08-14 13:32:39 (GMT)
yes
2020-08-14 13:32:53 (GMT)
In the paper we say:
"We assume that all quantified formulas in ψ are of the form ∀x̄. φ with φ quantifier-free. This can be achieved by prenex form transformation and Skolemization."
2020-08-14 13:33:01 (GMT)
OK, so there's still a huge disconnect between the paper and the implementation.
2020-08-14 13:33:35 (GMT)
Where is that written?
2020-08-14 13:33:43 (GMT)
the difference between that and what happens in practice is that \phi is not quantifier-free, because skolemization is done only when it would generate constant (nullary) Skolem functions, and because prenexing is not generally done
2020-08-14 13:33:56 (GMT)
I'm starting to wonder if the version I got from the Matryoshka web site is up to date :S
2020-08-14 13:34:05 (GMT)
that is not written because it does not matter to the scope of the paper :)
2020-08-14 13:34:38 (GMT)
I can't parse your last 3 sentences.
2020-08-14 13:34:41 (GMT)
(funny enough the one question about this paper when I presented it at TACAS was exactly about this...)
2020-08-14 13:34:51 (GMT)
I'm not surprised.
2020-08-14 13:34:54 (GMT)
hahaha
2020-08-14 13:34:57 (GMT)
I don't get it, really
2020-08-14 13:35:10 (GMT)
You quoted from your own paper. I cannot find the quote.
2020-08-14 13:35:14 (GMT)
On which page is it?
2020-08-14 13:35:19 (GMT)
But I'd like you to understand this, so please let me help
2020-08-14 13:35:23 (GMT)
I quoted from page 3
2020-08-14 13:35:40 (GMT)
in the "Instantiation-based SMT solvers"
2020-08-14 13:35:54 (GMT)
Ah, I'm reading "Revisiting Enumerative Instantiation".
2020-08-14 13:36:11 (GMT)
So am I. That is the name of the subsection
2020-08-14 13:36:13 (GMT)
I thought that much was clear from the title of this thread. :S
2020-08-14 13:36:38 (GMT)
As I said, I think I'm reading an obsolete (e.g. submitted) version.
2020-08-14 13:36:52 (GMT)
I repeat my claim that the version I got from the Matryoshka web site is probably not up to date.
2020-08-14 13:37:57 (GMT)
Indeed, that was it. Thanks for clearing that up.
2020-08-14 13:38:15 (GMT)
Note to self to update the PDF next week (after migration to the new web site).
2020-08-14 13:39:32 (GMT)
OK, so you do agree there's a quite nasty gap between the theory & implem.
2020-08-14 13:39:49 (GMT)
I.e. even if the theory is complete and we strongly suspect the implem is, there's no easy way to bridge the two.
2020-08-14 13:40:21 (GMT)
you mean "complete" as in the completeness of FOL?
2020-08-14 13:40:29 (GMT)
Yes, refutationally complete.
2020-08-14 13:40:39 (GMT)
OK. I do not think the gap is nasty
2020-08-14 13:40:42 (GMT)
An SMT+QI solver is basically a first-order prover.
2020-08-14 13:40:44 (GMT)
Yes, there is a gap
2020-08-14 13:40:45 (GMT)
Why not?
2020-08-14 13:40:50 (GMT)
How would you fill it?
2020-08-14 13:41:02 (GMT)
because one can prove (nobody has) that the gap can be bridged
2020-08-14 13:41:9 (GMT)
what the implementation is doing
2020-08-14 13:41:18 (GMT)
Yes but the implementation might be incomplete.
2020-08-14 13:41:24 (GMT)
This just begs the question.
2020-08-14 13:41:30 (GMT)
is equivalent to what the paper assumes. But it is done incrementally, as the formulas are instantiated
2020-08-14 13:41:47 (GMT)
Yes, but that changes everything.
2020-08-14 13:41:56 (GMT)
That is, of course, assuming the implementation is applying the technique as described in the paper. CVC4 and veriT do
2020-08-14 13:41:58 (GMT)
It changes the order in which the models could be generated, etc.
2020-08-14 13:42:10 (GMT)
(veriT kinda actually, there are some issues)
2020-08-14 13:42:18 (GMT)
Exactly.
2020-08-14 13:42:32 (GMT)
My point is, you haven't proved a thing until you've proved a thing.
2020-08-14 13:42:42 (GMT)
I agree with you. I'm conjecturing
2020-08-14 13:42:48 (GMT)
I'm confident enough to not care though
2020-08-14 13:42:55 (GMT)
As I explained to Pascal last time, it's not even clear a priori that enum. QI + SMT is refut. complete.
2020-08-14 13:43:18 (GMT)
Theorem 3 in this paper, the main one, is something that nobody was surprised to see, because everybody already believed it
2020-08-14 13:43:19 (GMT)
I am convinced that it is, and one of the reasons for this is related, obscurely, to the existence of limit points in topological spaces.
2020-08-14 13:43:30 (GMT)
but I think it's quite nice it exists
2020-08-14 13:43:47 (GMT)
But one could in principle imagine a scenario where the ground SMT solver and QI spin forever.
2020-08-14 13:43:59 (GMT)
And Herbrand couldn't help because new formulas and models would keep appearing.
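A toy sketch of the loop in question (everything here is hypothetical and vastly simplified; the "ground solver" is just a unit-literal conflict check, nothing like a real SMT core): ground checking alternates with instantiating the universals by enumerated Herbrand terms, and termination on unsat is exactly the property under discussion.

```python
from itertools import count

# Ground literals are (sign, atom) pairs; atoms and terms are nested tuples.

def terms(consts, funs, depth):
    """Enumerate Herbrand terms up to the given nesting depth."""
    layer = list(consts)
    for _ in range(depth):
        layer = layer + [(f, t) for f in funs for t in layer]
    return layer

def run(ground, quantified, consts, funs, max_depth=3):
    """Alternate a trivial ground check with enumerative instantiation.
    `quantified` is a list of functions mapping a term to a unit literal
    (each standing for one universally quantified formula)."""
    lits = set(ground)
    for d in count(0):
        if d > max_depth:
            return "unknown"          # in general the loop may spin forever
        # instantiate every universal with every term enumerated so far
        for inst in quantified:
            for t in terms(consts, funs, d):
                lits.add(inst(t))
        # trivial "ground solver": conflict iff some literal occurs both ways
        if any((not s, a) in lits for (s, a) in lits):
            return "unsat"

# ground fact p(f(a)); quantified axiom: forall x. not p(f(x))
result = run(ground={(True, ("p", ("f", "a")))},
             quantified=[lambda t: (False, ("p", ("f", t)))],
             consts=["a"], funs=["f"])
print(result)  # the instance with t = "a" conflicts with the ground fact
```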
2020-08-14 13:44:11 (GMT)
in pure equational FOL?
2020-08-14 13:44:13 (GMT)
E.g. if you instantiate a Bool with a formula.
2020-08-14 13:44:30 (GMT)
All of this is "a priori". I'm going to prove that it cannot happen.
2020-08-14 13:44:38 (GMT)
For some reasonable logics, strategies, etc.
2020-08-14 13:46:01 (GMT)
My point is just, again, until we've proved it, we haven't proved it, and there's a substantial gap between Herbrand++ (from the EQI paper) and actual refutational completeness of a solver based on that, let alone CVC4 and veriT.
2020-08-14 13:46:39 (GMT)
Thanks for your help. This was extremely useful. :)
2020-08-14 13:47:4 (GMT)
Yes, you're not wrong
2020-08-14 13:47:25 (GMT)
the nastiest thing might be how the quantifier instantiation techniques interact
2020-08-14 13:48:46 (GMT)
Hm, I'm not too worried about that, actually. From my point of view, it's always OK to throw more formulas into the mix.
2020-08-14 13:49:11 (GMT)
But what I'm seeing is that this whole topic is too big to be a section or a subsection in a paper on AVATAR. I'll have to come back to it later.
2020-08-14 13:50:30 (GMT)
Jasmin Blanchette said:
Hm, I'm not too worried about that, actually. From my point of view, it's always OK to throw more formulas into the mix.
Ah, right. I was thinking of "not doing one but doing the other", but what actually happens in the solver is that if one technique fails, another is applied in a completely independent manner.
2020-08-14 14:23:17 (GMT)
I'm confident enough to not care though
My conversations with Giles and some theoretical incompletenesses of AVATAR led to very similar comments. ;)
2020-08-14 14:23:29 (GMT)
I don't mind if people don't care. I do care, though. ;)
2020-08-14 14:24:12 (GMT)
These issues become increasingly important as the techniques get generalized (e.g. to HOL), adapted, or reimplemented.
2020-08-14 14:24:31 (GMT)
What I'm trying to say is, one can see it as a matter of self-respect to care. :)
2020-08-14 14:28:20 (GMT)
hahaha yeah, I get it. I completely understand other people caring and I'd be happy to see such results, as I am with the theoretical part of this EQI paper.
2020-08-14 14:28:55 (GMT)
Does that mean authors of SMT solvers (or similar systems) who are writing papers about their methods should be more precise about the gap between theoretical presentation and implementation and prove that the results for the theoretical method can be lifted to the practical implementation?
2020-08-14 14:29:10 (GMT)
I just don't believe much in proving the exact behavior of implementations correct. I like such results modulo nice assumptions. For the nastiest stuff I'm happy with checking certificates from the systems. :)
2020-08-14 14:29:34 (GMT)
There are two types of gaps.
2020-08-14 14:29:45 (GMT)
You have those that can be closed by refinement and those that can't.
2020-08-14 14:30:17 (GMT)
The first type is fine. You're bound to have some gap between a 15-page paper and a 30 000 line implementation.
2020-08-14 14:30:31 (GMT)
It's the second type of gap that I find worrying.
2020-08-14 14:31:5 (GMT)
Basically, when the theory doesn't quite capture accurately what happens in practice and couldn't be used to prove the implementation correct.
2020-08-14 14:31:51 (GMT)
Distinguishing the two types of gaps isn't trivial. Sometimes, a clever idea (taking the form of a sophisticated refinement relation) can be used to bridge the gap, unexpectedly.
2020-08-14 14:33:2 (GMT)
Looking at EQI, one would be tempted to generalize the theory (the Herbrand theorem), so that it also works with nonground terms.
2020-08-14 14:33:15 (GMT)
But maybe the general case can be reduced to the ground case.
2020-08-14 14:33:50 (GMT)
(By "refinement", above, I meant specifically "step-wise refinement", a la @Mathias Fleury 's PhD thesis.)
2020-08-14 14:35:35 (GMT)
Sometimes the gap is just that the theory considers a smaller set of connectives. In theory, there is a gap, and one must generalize the theory, but in practice, this step is accepted "by analogy" by most people (but, mind you, not by proof assistants). I don't see this as problematic; if I did, I'd have to reject the majority of math and TCS papers.
2020-08-14 14:37:20 (GMT)
@Haniel Barbosa you might find the following review extract (fresh off the press) funny/sad:
A first big point concerns the reliability of this work. As things get more
and more difficult and involved, how can you guarantee that no mistake
has been made? Or at least, can you try to provide evidence or partial
support in this direction?
This issue can be ignored to some extent for clearcut basic algorithms
-- this was in particular the foundation of the LCF architecture by
Milner in the 1970s. But it can be less and less ignored as things get
more complicated. This is why even the Fields Medalist Voevodsky, or at
a modest level, researchers submitting their work to conferences such
as POPL or LICS formalize their work using a well-known proof assistant
such as Agda, Coq or Isabelle. Sure, it involves hard work on top
of the already hard conceptual work performed.
My first suggestion would then be to provide a formalization of the work in a
reliable proof assistant. This formalization should contain clear statements,
instructions for replaying the proof and check that no hypothesis or axiom is
left under the carpet. This way many people can get convinced that
the theorems hold without needing to follow all technical details.
As a compensation, a high-level description of what is going on is enough.
If this is out of reach in a reasonable amount of time, then the authors could
at least explain what their TCB (trusted code base) is and prepare a roadmap
to reduce it.
2020-08-14 14:37:37 (GMT)
The context is E.
2020-08-14 14:38:15 (GMT)
I think I see what you're getting at
2020-08-14 14:42:14 (GMT)
@Jasmin Blanchette I find it naive...
2020-08-14 14:43:2 (GMT)
And apparently experimental evaluation was not taken into account at all in the confidence part for the results?
2020-08-14 14:58:01 (GMT)
The reviewer was a self-confessed ITP-Mensch. They read the paper with their ITP goggles on, which means a focus on 100.0000% certainty.
2020-08-14 14:58:27 (GMT)
The answer is easy: We'll sketch both approaches (verifying vs. certificates) and discuss the merits of both.
2020-08-14 14:58:43 (GMT)
Regardless, so that the formalizers have work to do, somebody's got to develop the theory in the first place. ;)
2020-08-14 15:43:29 (GMT)
isn't the answer "we'll have a proof checker" anyway? :upside_down:
2020-08-14 15:47:56 (GMT)
Well, that won't help you with completeness.
2020-08-14 15:55:44 (GMT)
Indeed not. It depends on what the goal is. But personally, when it comes to automatic theorem provers, in practice I'm mostly concerned about them being correct.
2020-08-14 16:00:21 (GMT)
I think they're worried about soundness. When I wrote "sketch both approaches", I meant we'll explain that a proof checker is the way to go in practice (cf. Heule).
2020-08-14 16:1:26 (GMT)
Completeness is still not to be disregarded in practice -- for various decidable fragments. But even there, you can have independent checkers of saturations.
2020-08-14 16:5:12 (GMT)
Culturally, completeness seems to matter. Most of the time one starts with a calculus that one argues is complete and then might restrict it to an incomplete one.
2020-08-14 16:43:48 (GMT)
Apart from soundness, completeness is one of the very few theoretical things we can prove about a prover. That's what distinguishes some-program-that-does-some-stuff from a good prover. I guess it has so much focus because there's not much else we can prove and because it's hard.
2020-08-14 16:46:13 (GMT)
But we still only prove it modulo a million simplifying assumptions
2020-08-14 16:46:30 (GMT)
some more serious than others, as we discussed :)
2020-08-15 13:48:54 (GMT)