2019-04-03 16:20:43 (GMT)
Some potentially interesting sources of benchmarks, beyond the TPTP library and Sledgehammer (/ Judgment Day), all produced by Chad Brown and sometimes some colleagues of his:
GRUNGE:
https://arxiv.org/pdf/1903.02539.pdf
Grundlagen:
http://grid01.ciirc.cvut.cz/~chad/grundlagen-thf-ltb.tgz
http://grid01.ciirc.cvut.cz/~chad/grundlagen-thf.tar.7z
Mizar:
http://grid01.ciirc.cvut.cz/~chad/mizar-thf-ltb.tgz
http://grid01.ciirc.cvut.cz/~chad/mptp_thf.tgz
Please make sure to acknowledge/thank Chad (and, for GRUNGE, his coauthors) if you use these.
P.S. Unfortunately Chad is hard to contact, but he can be reached among others through snail mail.
2019-04-03 16:24:57 (GMT)
Do you know if these are going to be included into TPTP? On that note, did you or do you plan to submit the JD benchmarks into TPTP? It'd be good to diversify the THF library
2019-04-03 16:26:23 (GMT)
We submitted some JD problems on several occasions but Geoff typically takes only a small subset of what I send him. He's trying to keep some balance or something.
2019-04-03 16:26:57 (GMT)
But I was thinking of releasing a new, much larger and more diverse set of benchmarks and call it Halloween maybe, to replace Judgment Day.
2019-04-03 16:27:33 (GMT)
These would include that you generated fro Benzmueller?
2019-04-03 16:27:37 (GMT)
HOLowWins? :thinking:
2019-04-03 16:29:5 (GMT)
They would not include de Benz ones. They would be selected purely randomly from all the AFP (Archive of Formal Proof), the library of user contributions. And they would exist in several versions (SMT-LIB 2, SMT-LIB 3, TH1, TH0, TF1, TF0).
2019-04-03 16:29:5 (GMT)
This new library could contain versions in THF and in HOSMT, that'd be good
2019-04-03 16:29:16 (GMT)
Yes exactly.
2019-04-03 16:30:16 (GMT)
Selecting from AFP is different than selecting from the Isabelle theories?
2019-04-03 16:30:27 (GMT)
I don't know much about all that :)
2019-04-03 16:30:38 (GMT)
Simon: I live in HOL-land. And I'm an alco-HOL-ic. Compared with that, any HOL pun is going to be lame. ;)
HOLy soit qui mal y pense.
2019-04-03 16:32:18 (GMT)
The Benz benchmark were also from the AFP, but I don't know what I did to select them (pure randomness, or followed some hunch).
I want a true representative sample of the AFP, and avoid having too many problems coming from the same theory file (for more variety). Anyway, it doesn't matter; the point is simply that I should at some point sit down and do this, or better find somebody who's willing to do it. ;)
2019-04-03 16:40:34 (GMT)
I see