2020-05-11 14:30:00 (GMT)
Dear all Zip contributors,
We should register the system for CASC soon.
We think that we should submit Zipperposition to the first-order, higher-order, and LTB divisions of the competition.
I have yet to understand how LTB works; I will let you know soon how we plan to go about it.
I am ready to take on all competition participation issues if that is OK with everyone.
2020-05-11 14:30:59 (GMT)
If LTB is too much work (say, more than one day), don't hesitate to drop it.
2020-05-11 15:10:27 (GMT)
It looks like this year's LTB is different from last year. If I recall correctly, last year they used the "bushy" versions of the problems, i.e. exactly the axioms that are needed to prove the theorem. This year, they use the "chainy" version, which includes all facts that precede the given theorem in HOL4. That decision will give Zipperposition an advantage and Leo-III a disadvantage I think.
2020-05-11 15:19:37 (GMT)
What bothers me with LTB is that it has a weird input format that nobody else has ever used or will ever use, and it even changes from year to year, I believe. I'm also not convinced by the premise behind the whole "batch" idea, but I must admit I haven't looked closely into it. Zip would stand a good chance, I'm sure.
2020-05-11 15:32:30 (GMT)
I feel like the "batch" aspect of LTB is targeted towards machine learning, but I might be wrong.
The interesting part about LTB is that the problems come from a proof assistant. In that sense, it is very relevant to Matryoshka.
2020-05-11 15:34:05 (GMT)
It would be nice to have a straightforward implementation of the batch aspect that can take any other prover as a backend. Then everybody could participate without having to write LTB-specific code.
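A minimal sketch of such a generic batch driver (hypothetical code, not an existing tool; the real CASC LTB batch format is more involved and changes between years). The only point is the shape: loop over the batch's problems and call an arbitrary TPTP-speaking backend with a per-problem timeout.

```python
import subprocess

def solve_batch(problem_files, backend_cmd, per_problem_timeout):
    """Run an arbitrary backend prover on each problem of a batch.
    backend_cmd is any command line, e.g. ["zipperposition"] or ["eprover"];
    the batch-spec parsing that would produce problem_files is omitted."""
    solved = []
    for path in problem_files:
        try:
            result = subprocess.run(
                backend_cmd + [path],
                capture_output=True, text=True,
                timeout=per_problem_timeout,
            )
            # SZS status lines are the standard success signal in TPTP land.
            if "SZS status Theorem" in result.stdout:
                solved.append(path)
        except subprocess.TimeoutExpired:
            pass  # give up on this problem and move on to the next
    return solved
```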
2020-05-11 15:34:26 (GMT)
I wonder if that doesn't already exist. It rings a bell.
2020-05-11 15:35:00 (GMT)
I think @Alex Steen wrote something like this for Leo-III, but I don't know how modular it is.
2020-05-11 15:46:48 (GMT)
Yes, the batch is clearly for machine learning, but it still feels artificial to me. I can't imagine a real-life scenario where you'd feed a theory the way you do. But yeah, I'm a fan of the LT part of LTB.
2020-05-11 15:48:43 (GMT)
I haven't looked at LTB in a long time, but isn't LTB a decent model of what would happen if sledgehammer kept provers running in between calls? So you'd have provers loaded with lots of lemmas, and you'd just send them SH goals as you go…
2020-05-11 15:51:43 (GMT)
I think the biggest problem with that premise is that superposition provers are not incremental in that way
2020-05-11 15:52:00 (GMT)
I started working on LTB now, I will see how far I get by tomorrow
2020-05-11 15:54:48 (GMT)
As an aside: I think with avatar(-lite), you can easily pursue several goals at once, by just asserting:
G1 <- [goal_1]
G2 <- [goal_2]
…
for each conjecture goal_i, with a corresponding fresh "goal" pseudo-atom G_i. Then whenever G_i is proven at level 0 in the SAT solver, you know the corresponding goal has been proved.
The idea would be that you share all the completion work (positive superposition, and demodulation, mostly) among goals. Of course the problem is that you can't tailor heuristics to each goal!
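As a toy illustration of that bookkeeping (a hypothetical sketch, not Zipperposition's actual implementation): each goal gets a fresh guard pseudo-atom, a refutation found under guard G_i yields the unit SAT clause [G_i], and a guard forced at decision level 0 marks its goal proved. The `refutes` function is a stand-in for the shared saturation loop.

```python
def prove_goals(goals, refutes):
    """goals: list of goal names; refutes(goal) stands in for the shared
    saturation finding a contradiction from that goal's negated
    conjecture plus the common axioms."""
    guards = {g: "G_%d" % i for i, g in enumerate(goals, 1)}  # fresh pseudo-atoms
    sat_units = set()
    for g in goals:
        # An empty clause derived under guard G_i becomes the SAT unit [G_i],
        # so G_i is true at level 0 and the corresponding goal is proved.
        if refutes(g):
            sat_units.add(guards[g])
    return {g: guards[g] in sat_units for g in goals}
```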
2020-05-11 15:57:54 (GMT)
Simon: What you describe would have a different mode of interaction, not a big batch file. That's my main objection. With the TPTP format, if you implement it, not only can you participate at CASC, you can also have users invoke your prover. It's a good thing. In contrast, the batch file is useless in a scenario where SH would keep provers running between calls.
2020-05-11 15:59:47 (GMT)
Also, it's kind of sad, but Stephan Schulz told me the best way to prove conjectures in a row, even related ones, is to throw away everything you've done each time. At most a kind of heuristic "good lemma" search (as implemented by one of the E tools) might help a bit. Maybe one day we'll be able to learn more from one run to the other, but right now the situation isn't very good.
2020-05-11 16:08:46 (GMT)
A more realistic model would be: Geoff gets, say, 200 problems generated by Sledgehammer in sequence by the same user (and he has those problems already via SystemOnTPTP). But then the problems should use the same names for the same symbols etc. -- which basically amounts to using THF1. Of course, one could substitute HOLyHammer for Sledgehammer etc., or have several hammers.
2020-05-11 16:08:54 (GMT)
I'm not surprised, it fits into the "cannot tailor heuristics to each goal" limitation…
2020-05-11 16:12:55 (GMT)
Right. Stephan basically said that the saturation process is very goal-oriented and that what you derive is not very useful for other goals.
2020-05-11 16:13:19 (GMT)
(Even lemmas that are not descendants of the negated conjecture.)
2020-05-11 18:57:19 (GMT)
I have almost finished a rudimentary library for running an arbitrary prover on LTB problems; I will test it tomorrow
2020-05-11 18:57:53 (GMT)
I have a lot of work to do on HO and FO benchmarks, and I will leave this whole LTB thing as an exercise (it also seems I have some polymorphism problems that need to be ironed out)
2020-05-11 22:25:21 (GMT)
Jasmin Blanchette said:
Also, it's kind of sad, but Stephan Schulz told me the best way to prove conjectures in a row, even related ones, is to throw away everything you've done each time. At most a kind of heuristic "good lemma" search (as implemented by one of the E tools) might help a bit. Maybe one day we'll be able to learn more from one run to the other, but right now the situation isn't very good.
I support this from a Vampire perspective - whenever I tried to do some kind of 'learning' in LTB I always did worse than just running 'good' schedules repeatedly.
The batch nature of LTB was diminished last year: now we just get a set of problems and have to solve as many as we can, whereas in the past they had to be proved in order. I believe datasets from the past were similar to what Jasmin suggests, e.g. '200 problems generated by Sledgehammer in sequence by the same user'.
There is a big stress on machine learning (we get this training set) but it's not really used that much.
Currently Vampire has not registered for LTB.
2020-05-22 09:07:44 (GMT)
Alex opened my eyes that I do not have to submit the binary while registering,
so I registered Zip for CASC.
I gave it the version number 2 because there are huge differences compared to the previous version.
I guess that when we (soon) write a system description we want to call that version 2 as well? (@Simon Cruanes )
2020-05-22 14:03:33 (GMT)
It seems like a short timespan to release 2.0, but maybe it's time after all. I'd still like to have a tag in the repo that is "casc-2020" or something like that, with the precise version for CASC. 2.0 can come slightly later when it's a bit more polished.
2020-05-22 14:03:59 (GMT)
(I'd call that a "2.0-rc0" probably, release candidates are a thing, but for a small project it might be overkill)
2020-05-28 12:09:41 (GMT)
For what it's worth, have a look at http://www.tptp.org and esp. "The TPTP Needs Money". I believe you just need to give USD 1 to get your name there. ;)
2020-06-02 19:32:54 (GMT)
http://www.tptp.org/CASC/J10/Entrants.html#Systems wow, THF is clearly the exciting category to follow this year (followed by LTB)…
2020-06-02 20:04:14 (GMT)
More interesting than TFA! For a bit there I thought it would just be us
2020-06-02 20:05:07 (GMT)
It's a bit sad there's no category for TF0 without arithmetic :-)
(Or maybe: FOF should die and be replaced with a typed version, cause who can't handle types in 2020, really? :upside_down: )
2020-06-02 20:14:35 (GMT)
TF0 could be added as a subcategory to FOF, similar to equality/no-equality, to have types/no-types. I think more than half of the systems entered into FOF don't currently support types, but I may be wrong. I think the main reason for the lack of interest in TF0 is a lack of problems in TPTP.
2020-06-02 20:44:00 (GMT)
I agree, there aren't that many typed problems in there. It's a bit of a vicious circle.
2020-06-03 5:48:44 (GMT)
We should perform type inference on the FOF problems, using a tool like Monotonox.
2020-06-03 5:48:53 (GMT)
And generate TF0 problems.
2020-06-03 5:49:14 (GMT)
And kill the old FOF problems. ;)
2020-06-03 12:38:05 (GMT)
Jasmin Blanchette said:
We should perform type inference on the FOF problems, using a tool like Monotonox.
did you mean dolmen? :stuck_out_tongue:
(I mean, inferring precise sorts is useful for satisfiable problems but seems quite different from just replacing fof with tff and declaring symbols at the beginning after inferring whether they're predicates or functions).
2020-06-03 13:3:22 (GMT)
Dolmen should soon be able to do this kind of task indeed (well, more generally, reading a file, typing it and then exporting it to another supported format; I guess I can add some options to "promote" fof statements to tff statements).
On the other hand, according to the inference rules for tff, simply replacing fof by tff textually should pretty much work in all situations if I remember the rules correctly.
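A hedged sketch of that textual promotion (assuming well-behaved single-file input; real TPTP problems with includes, comments, or quoted atoms would need a proper parser such as dolmen's):

```python
import re

def promote_fof_to_tff(problem_text):
    # Rewrite only the formula-role keyword; tff's default typing rules
    # then give undeclared symbols ($i,...,$i) > $o / $i types.
    return re.sub(r"\bfof\(", "tff(", problem_text)

print(promote_fof_to_tff("fof(ax1, axiom, ![X]: (p(X) => q(f(X))))."))
# prints: tff(ax1, axiom, ![X]: (p(X) => q(f(X)))).
```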
2020-06-03 13:22:36 (GMT)
There's also basic tooling for that in Zipperposition, but it's probably obsolete now that dolmen exists.
2020-06-03 13:23:02 (GMT)
I think tff demands that symbols be declared? Even if variables of type $i don't have to be annotated?
2020-06-03 13:27:38 (GMT)
Actually, tff specifies relatively complex inference rules. First, you can syntactically (or during a descent in type-checking) know whether to expect a term or a formula at any point in the AST. Then, when you encounter a constant symbol that has not been declared previously (in tptp you can syntactically differentiate variables, which start with an uppercase letter, from constants, which start with a lowercase one), you should infer its type as taking the number of arguments to which it is currently applied, all of type $i, and returning the appropriate type depending on what you expect (so either $o or $i).
2020-06-03 13:28:48 (GMT)
It's still pretty simple and corresponds to the inference you need to do even in fof (to differentiate predicates from functions). I agree you need to see the whole problem first, but it's still pretty easy.
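The default-typing rule described above can be sketched on a toy pre-parsed AST (a hypothetical representation, not real TPTP parsing): applications are `(symbol, args)` pairs, names starting with an uppercase letter are variables, and an undeclared constant applied to n arguments gets n arguments of type $i, with result $o in formula position and $i in term position.

```python
def infer_types(ast, expect="$o", symbols=None):
    """ast: nested ("symbol", [args]) pairs; names starting with an
    uppercase letter are variables and get no declaration."""
    if symbols is None:
        symbols = {}
    head, args = ast
    if not head[0].isupper():
        # Undeclared constant: n arguments of type $i, result as expected.
        symbols.setdefault(head, (["$i"] * len(args), expect))
    for arg in args:
        infer_types(arg, expect="$i", symbols=symbols)  # arguments are terms
    return symbols

# p(f(X), a) in formula position:
print(infer_types(("p", [("f", [("X", [])]), ("a", [])])))
# {'p': (['$i', '$i'], '$o'), 'f': (['$i'], '$i'), 'a': ([], '$i')}
```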
2020-06-03 13:33:23 (GMT)
Simon Cruanes said:
It's still pretty simple and corresponds to the inference you need to do even in fof (to differentiate predicates from functions). I agree you need to see the whole problem first, but it's still pretty easy.
Exactly: the tff inference rules actually seem to be exactly what you'd need to do to type fof, hence why I think changing fof to tff textually might actually be a viable solution. Evidently, it is risky and brittle, so having dolmen do that while checking the types seems more reasonable. :)
2020-06-03 13:59:44 (GMT)
Simon Cruanes said:
Jasmin Blanchette said:
We should perform type inference on the FOF problems, using a tool like Monotonox.
did you mean dolmen? :stuck_out_tongue:
(I mean, inferring precise sorts is useful for satisfiable problems but seems quite different from just replacing fof with tff and declaring symbols at the beginning after inferring whether they're predicates or functions).
I was only half serious, but continuing in that vein: the point of using Monotonox wasn't to get a syntactically correct TF0 problem -- I'm well aware that replacing fof by tff etc. does the trick -- but to create problems that are genuinely many-sorted, which in turn makes them more convenient for a variety of usages (reading by humans, proving (think superposition from variables), model-finding with Paradox or iProver or Vampire, ...).
2020-06-03 14:00:55 (GMT)
Yes, I agree properly inferring sorts is useful, but it's truly changing the problem in many cases (making it easier, hopefully). Doing both could probably be useful! Keeping the original version and the refined one, the same way a lot of problems currently exist in both CNF and FOF forms.
2020-06-03 14:14:28 (GMT)
I don't think Geoff would honestly kill the FOF problems -- they're useful if nothing else to the non-TFF provers left. As for "truly changing the problem", it depends on your definition of "truly", but I'd say that any Monotonox-induced change is a rather shallow change. ;)
2020-06-03 14:54:42 (GMT)
Martin Suda was pointing out to me that we should do this anyway, as there is a small overhead in superposition where we consider superpositions between incompatible 'subsorts' -- lifting FOF problems to TF0 would avoid this. We do it anyway for finite model building.
2020-06-03 14:58:39 (GMT)
Of course I responded without reading the full thread and seeing that Jasmin had already hinted at this.
There used to be a difference between fof and tff in that 1 used to be a valid constant of type $i in fof but is obviously of type $int in tff. But the rules were changed, and Geoff hunted down the numbers in TPTP.
2020-06-03 15:02:33 (GMT)
Looking at CASC this year, I don't see that many ATP developers left in this world. If we can get together and all agree that typing FOF problems automatically is a good thing (perhaps something we should check empirically first), we could talk to Geoff.
2020-06-03 15:03:01 (GMT)
By "we" I mean "you" -- I don't think I've written a single line of ATP code yet, regrettably. :S
2020-06-03 15:03:17 (GMT)
And by "you", I mean "you guys". ;)
2020-07-02 16:34:17 (GMT)
What's the status of the LTB algorithm used in the end? It seems to perform pretty well!
2020-07-02 16:34:35 (GMT)
It is a stupid script I implemented in one day
2020-07-02 16:34:41 (GMT)
With some basic heuristics
2020-07-02 16:36:07 (GMT)
Well, it seems pretty good for a script written in one day ;)