2020-05-11 14:30:00 (GMT)
Dear all Zip contributors,
We should register the system for CASC soon.
We think that we should submit Zipperposition to the first-order, higher-order, and LTB divisions of the competition.
I have yet to understand how LTB works; I will let you know soon how we plan to go about it.
I am ready to take on all competition participation issues if that is OK with everyone.
2020-05-11 14:30:59 (GMT)
If LTB is too much work (say, more than one day), don't hesitate to drop it.
2020-05-11 15:10:27 (GMT)
It looks like this year's LTB is different from last year. If I recall correctly, last year they used the "bushy" versions of the problems, i.e. exactly the axioms that are needed to prove the theorem. This year, they use the "chainy" version, which includes all facts that precede the given theorem in HOL4. That decision will give Zipperposition an advantage and Leo-III a disadvantage I think.
2020-05-11 15:19:37 (GMT)
What bothers me with LTB is that it has a weird input format that nobody else has ever used or will ever use, and it even changes from year to year, I believe. I'm also not convinced by the premise behind the whole "batch" idea, but I must admit I haven't looked closely into it. Zip would stand a good chance, I'm sure.
2020-05-11 15:32:30 (GMT)
I feel like the "batch" aspect of LTB is targeted towards machine learning, but I might be wrong.
The interesting part about LTB is that the problems come from a proof assistant. In that sense, it is very relevant to Matryoshka.
2020-05-11 15:34:05 (GMT)
It would be nice to have a straightforward implementation of the batch aspect that can take any other prover as a backend. Then everybody could participate without having to write LTB-specific code.
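A minimal sketch of such a generic batch driver (hypothetical code, not an existing tool; the real CASC LTB batch format is more involved and changes between years). The only point is the shape: loop over the batch's problems and call an arbitrary TPTP-speaking backend with a per-problem timeout.

```python
import subprocess

def solve_batch(problem_files, backend_cmd, per_problem_timeout):
    """Run an arbitrary backend prover on each problem of a batch.
    backend_cmd is any command line, e.g. ["zipperposition"] or ["eprover"];
    the batch-spec parsing that would produce problem_files is omitted."""
    solved = []
    for path in problem_files:
        try:
            result = subprocess.run(
                backend_cmd + [path],
                capture_output=True, text=True,
                timeout=per_problem_timeout,
            )
            # SZS status lines are the standard success signal in TPTP land.
            if "SZS status Theorem" in result.stdout:
                solved.append(path)
        except subprocess.TimeoutExpired:
            pass  # give up on this problem and move on to the next
    return solved
```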
2020-05-11 15:34:26 (GMT)
I wonder if that doesn't already exist. It rings a bell.
2020-05-11 15:35:00 (GMT)
I think @Alex Steen wrote something like this for Leo-III, but I don't know how modular it is.
2020-05-11 15:46:48 (GMT)
Yes, the batch is clearly for machine learning, but it still feels artificial to me. I can't imagine a real-life scenario where you'd feed a theory the way you do. But yeah, I'm a fan of the LT part of LTB.
2020-05-11 15:48:43 (GMT)
I haven't looked at LTB in a long time, but isn't LTB a decent model of what would happen if sledgehammer kept provers running in between calls? So you'd have provers loaded with lots of lemmas, and you'd just send them SH goals as you go…
2020-05-11 15:51:43 (GMT)
I think the biggest problem with that premise is that superposition provers are not incremental in that way
2020-05-11 15:52:00 (GMT)
I started working on LTB now, I will see how far I get by tomorrow
2020-05-11 15:54:48 (GMT)
As an aside: I think with avatar(-lite), you can easily pursue several goals at once, by just asserting:
G1 <- [goal_1]
G2 <- [goal_2]
…
for each conjecture goal_i, with a corresponding fresh "goal" pseudo-atom G_i. Then whenever G_i is proven at level 0 in the SAT solver, you know the corresponding goal has been proved.
The idea would be that you share all the completion work (positive superposition, and demodulation, mostly) among goals. Of course the problem is that you can't tailor heuristics to each goal!
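As a toy illustration of that bookkeeping (a hypothetical sketch, not Zipperposition's actual implementation): each goal gets a fresh guard pseudo-atom, a refutation found under guard G_i yields the unit SAT clause [G_i], and a guard forced at decision level 0 marks its goal proved. The `refutes` function is a stand-in for the shared saturation loop.

```python
def prove_goals(goals, refutes):
    """goals: list of goal names; refutes(goal) stands in for the shared
    saturation finding a contradiction from that goal's negated
    conjecture plus the common axioms."""
    guards = {g: "G_%d" % i for i, g in enumerate(goals, 1)}  # fresh pseudo-atoms
    sat_units = set()
    for g in goals:
        # An empty clause derived under guard G_i becomes the SAT unit [G_i],
        # so G_i is true at level 0 and the corresponding goal is proved.
        if refutes(g):
            sat_units.add(guards[g])
    return {g: guards[g] in sat_units for g in goals}
```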
2020-05-11 15:57:54 (GMT)
Simon: What you describe would have a different mode of interaction, not a big batch file. That's my main objection. With the TPTP format, if you implement it, not only can you participate at CASC, you can also have users invoke your prover. It's a good thing. In contrast, the batch file is useless in a scenario where SH would keep provers running between calls.
2020-05-11 15:59:47 (GMT)
Also, it's kind of sad, but Stephan Schulz told me the best way to prove conjectures in a row, even related ones, is to throw away everything you've done each time. At most a kind of heuristic "good lemma" search (as implemented by one of the E tools) might help a bit. Maybe one day we'll be able to learn more from one run to the other, but right now the situation isn't very good.
2020-05-11 16:08:46 (GMT)
A more realistic model would be: Geoff gets, say, 200 problems generated by Sledgehammer in sequence by the same user (and he has those problems already via SystemOnTPTP). But then the problems should use the same names for the same symbols etc. -- which basically amounts to using THF1. Of course, one could substitute HOLyHammer for Sledgehammer etc., or have several hammers.
2020-05-11 16:08:54 (GMT)
I'm not surprised, it fits into the "cannot tailor heuristics to each goal" limitation…
2020-05-11 16:12:55 (GMT)
Right. Stephan basically said that the saturation process is very goal-oriented and that what you derive is not very useful for other goals.
2020-05-11 16:13:19 (GMT)
(Even lemmas that are not descendants of the negated conjecture.)
2020-05-11 18:57:19 (GMT)
I have almost finished a rudimentary library for running an arbitrary prover on LTB problems; I will test it tomorrow
2020-05-11 18:57:53 (GMT)
I have a lot of work to do on HO and FO benchmarks, and I will leave this whole LTB thing as an exercise (it also seems I have some polymorphism problems that need to be ironed out)
2020-05-11 22:25:21 (GMT)
Jasmin Blanchette said:
Also, it's kind of sad, but Stephan Schulz told me the best way to prove conjectures in a row, even related ones, is to throw away everything you've done each time. At most a kind of heuristic "good lemma" search (as implemented by one of the E tools) might help a bit. Maybe one day we'll be able to learn more from one run to the other, but right now the situation isn't very good.
I support this from a Vampire perspective - whenever I tried to do some kind of 'learning' in LTB I always did worse than just running 'good' schedules repeatedly.
The batch nature of LTB was diminished last year: now we just get a set of problems and have to solve as many as we can, whereas in the past they had to be proved in order. I believe datasets from the past were similar to what Jasmin suggests, e.g. '200 problems generated by Sledgehammer in sequence by the same user'.
There is a big stress on machine learning (we get this training set) but it's not really used that much.
Currently Vampire has not registered for LTB.
2020-05-22 09:07:44 (GMT)
Alex opened my eyes that I do not have to submit the binary while registering,
so I registered Zip for CASC.
I gave it the version number 2 because there are huge differences compared to the previous version.
I guess that when we (soon) write a system description we want to call that version 2 as well? (@Simon Cruanes )
2020-05-22 14:03:33 (GMT)
It seems like a short timespan to release 2.0, but maybe it's time after all. I'd still like to have a tag in the repo that is "casc-2020" or something like that, with the precise version for CASC. 2.0 can come slightly later when it's a bit more polished.
2020-05-22 14:03:59 (GMT)
(I'd call that a "2.0-rc0" probably, release candidates are a thing, but for a small project it might be overkill)
2020-05-28 12:09:41 (GMT)
For what it's worth, have a look at http://www.tptp.org and esp. "The TPTP Needs Money". I believe you just need to give USD 1 to get your name there. ;)
2020-06-02 19:32:54 (GMT)
http://www.tptp.org/CASC/J10/Entrants.html#Systems wow, THF is clearly the exciting category to follow this year (followed by LTB)…
2020-06-02 20:04:14 (GMT)
More interesting than TFA! For a bit there I thought it would just be us
2020-06-02 20:05:07 (GMT)
It's a bit sad there's no category for TF0 without arithmetic :-)
(Or maybe: FOF should die and be replaced with a typed version, cause who can't handle types in 2020, really? :upside_down: )
2020-06-02 20:14:35 (GMT)
TF0 could be added as a subcategory to FOF, similar to equality/no-equality, to have types/no-types. I think more than half of the systems entered into FOF don't currently support types, but I may be wrong. I think the main reason for the lack of interest in TF0 is a lack of problems in TPTP.
2020-06-02 20:44:00 (GMT)
I agree, there aren't that many typed problems in there. It's a bit of a vicious circle.
2020-06-03 5:48:44 (GMT)
We should perform type inference on the FOF problems, using a tool like Monotonox.
2020-06-03 5:48:53 (GMT)
And generate TF0 problems.
2020-06-03 5:49:14 (GMT)
And kill the old FOF problems. ;)
2020-06-03 12:38:05 (GMT)
Jasmin Blanchette said:
We should perform type inference on the FOF problems, using a tool like Monotonox.
did you mean dolmen? :stuck_out_tongue:
(I mean, inferring precise sorts is useful for satisfiable problems but seems quite different from just replacing fof with tff and declaring symbols at the beginning after inferring whether they're predicates or functions).
2020-06-03 13:3:22 (GMT)
Dolmen should soon be able to do this kind of task indeed (well, more generally, reading a file, typing it and then exporting it to another supported format; I guess I can add some options to "promote" fof statements to tff statements).
On the other hand, according to the inference rules for tff, simply replacing fof by tff textually should pretty much work in all situations if I remember the rules correctly.
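A hedged sketch of that textual promotion (assuming well-behaved single-file input; real TPTP problems with includes, comments, or quoted atoms would need a proper parser such as dolmen's):

```python
import re

def promote_fof_to_tff(problem_text):
    # Rewrite only the formula-role keyword; tff's default typing rules
    # then give undeclared symbols ($i,...,$i) > $o / $i types.
    return re.sub(r"\bfof\(", "tff(", problem_text)

print(promote_fof_to_tff("fof(ax1, axiom, ![X]: (p(X) => q(f(X))))."))
# prints: tff(ax1, axiom, ![X]: (p(X) => q(f(X)))).
```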
2020-06-03 13:22:36 (GMT)
There's also basic tooling for that in Zipperposition, but it's probably obsolete now that dolmen exists.
2020-06-03 13:23:02 (GMT)
I think tff demands that symbols be declared? Even if variables of type $i don't have to be annotated?
2020-06-03 13:27:38 (GMT)
Actually, tff specifies relatively complex inference rules. First, you can syntactically (or during a descent in type-checking) know whether to expect a term or a formula at any point in the AST. Then, when you encounter a constant symbol that has not been declared previously (in tptp you can syntactically differentiate variables, which start with an uppercase letter, from constants, which start with a lowercase one), you should infer its type as taking the number of arguments to which it is currently applied, all of type $i, and returning the appropriate type depending on what you expect (so either $o or $i).
2020-06-03 13:28:48 (GMT)
It's still pretty simple and corresponds to the inference you need to do even in fof (to differentiate predicates from functions). I agree you need to see the whole problem first, but it's still pretty easy.
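The default-typing rule described above can be sketched on a toy pre-parsed AST (a hypothetical representation, not real TPTP parsing): applications are `(symbol, args)` pairs, names starting with an uppercase letter are variables, and an undeclared constant applied to n arguments gets n arguments of type $i, with result $o in formula position and $i in term position.

```python
def infer_types(ast, expect="$o", symbols=None):
    """ast: nested ("symbol", [args]) pairs; names starting with an
    uppercase letter are variables and get no declaration."""
    if symbols is None:
        symbols = {}
    head, args = ast
    if not head[0].isupper():
        # Undeclared constant: n arguments of type $i, result as expected.
        symbols.setdefault(head, (["$i"] * len(args), expect))
    for arg in args:
        infer_types(arg, expect="$i", symbols=symbols)  # arguments are terms
    return symbols

# p(f(X), a) in formula position:
print(infer_types(("p", [("f", [("X", [])]), ("a", [])])))
# {'p': (['$i', '$i'], '$o'), 'f': (['$i'], '$i'), 'a': ([], '$i')}
```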
2020-06-03 13:33:23 (GMT)
Simon Cruanes said:
It's still pretty simple and corresponds to the inference you need to do even in fof (to differentiate predicates from functions). I agree you need to see the whole problem first, but it's still pretty easy.
Exactly: the tff inference rules actually seem to be exactly what you'd need to do to type fof, hence why I think changing fof to tff textually might actually be a viable solution. Evidently, it is risky and brittle, so having dolmen do that while checking the types seems more reasonable. :)
2020-06-03 13:59:44 (GMT)
Simon Cruanes said:
Jasmin Blanchette said:
We should perform type inference on the FOF problems, using a tool like Monotonox.
did you mean dolmen? :stuck_out_tongue:
(I mean, inferring precise sorts is useful for satisfiable problems but seems quite different from just replacing fof with tff and declaring symbols at the beginning after inferring whether they're predicates or functions).
I was only half serious, but continuing in that vein: the point of using Monotonox wasn't to get a syntactically correct TF0 problem -- I'm well aware that replacing fof by tff etc. does the trick -- but to create problems that are genuinely many-sorted, which in turn makes them more convenient for a variety of usages (reading by humans, proving (think superposition from variables), model-finding with Paradox or iProver or Vampire, ...).
2020-06-03 14:00:55 (GMT)
Yes, I agree properly inferring sorts is useful, but it's truly changing the problem in many cases (making it easier, hopefully). Doing both could probably be useful! Keeping the original version and the refined one, the same way a lot of problems currently exist in both CNF and FOF forms.
2020-06-03 14:14:28 (GMT)
I don't think Geoff would honestly kill the FOF problems -- they're useful if nothing else to the non-TFF provers left. As for "truly changing the problem", it depends on your definition of "truly", but I'd say that any Monotonox-induced change is a rather shallow change. ;)
2020-06-03 14:54:42 (GMT)
Martin Suda was pointing out to me that we should do this anyway, as there is a small overhead in superposition where we consider superpositions between incompatible 'subsorts' -- lifting FOF problems to TF0 would avoid this. We do it anyway for finite model building.
2020-06-03 14:58:39 (GMT)
Of course I responded without reading the full thread and seeing that Jasmin had already hinted at this.
There used to be a difference between fof and tff in that 1 used to be a valid constant of type $i in fof but is obviously of type $int in tff. But the rules were changed, and Geoff hunted down the numbers in TPTP.
2020-06-03 15:02:33 (GMT)
Looking at CASC this year, I don't see that many ATP developers left in this world. If we can get together and all agree that typing FOF problems automatically is a good thing (perhaps something we should check empirically first), we could talk to Geoff.
2020-06-03 15:03:01 (GMT)
By "we" I mean "you" -- I don't think I've written a single line of ATP code yet, regrettably. :S
2020-06-03 15:03:17 (GMT)
And by "you", I mean "you guys". ;)
2020-07-02 16:34:17 (GMT)
What's the status of the LTB algorithm used in the end? It seems to perform pretty well!
2020-07-02 16:34:35 (GMT)
It is a stupid script I implemented in one day
2020-07-02 16:34:41 (GMT)
With some basic heuristics
2020-07-02 16:36:07 (GMT)
Well, it seems pretty good for a script written in one day ;)