2019-12-13 23:25:54 (GMT)
I just moved and renamed the test tool used in nunchaku to benchpress on Sneeuwballen.
It's a pure OCaml tool that can currently run benchmarks, analyze them a bit, and present them in a very basic web UI; I plan to make it able to do more advanced analysis, plotting, etc. I think it could also be useful for E at some point.
2020-01-16 4:25:17 (GMT)
An output of running Zipperposition on TPTP-6.1:
show.html (cc @Petar Vukmirovic). It's on the master branch and apparently it wrongly saturates on a bunch of HO problems (see "bad" results)?
2020-01-22 16:47:44 (GMT)
hi simon
What configuration did you run on HO problems? Some flags need to be setup to correctly support HO problems.
2020-01-22 16:48:40 (GMT)
It's in the configuration file, I think there's no flag for HO :grinning:
2020-01-22 16:49:16 (GMT)
Look at tests/benchpress.sexp.
I don't think there's a bug, I just need to filter out HO problems or use proper flags (or a detection feature like --auto, that'd be nice)
2020-01-22 16:49:46 (GMT)
wait I am on Windows atm
restarting :)
2020-01-22 16:50:2 (GMT)
Ah no worries, it's not
2020-01-22 17:33:41 (GMT)
yeah I saw it now
maybe we should create a new prover called zip-ho or something
and run it with those provers
2020-01-22 17:33:53 (GMT)
as for benchmarking and optimizing we should talk about this a bit more
2020-01-22 17:40:22 (GMT)
zip-ho sounds perfect, we just need the other version to SZS GaveUp on HO problems (or maybe non lfHO problems) instead of returning a wrong answer?
2020-01-22 17:47:27 (GMT)
in principle when you turn on ho mode
we set the lost_completeness flag so the status is properly reported
2020-01-22 17:50:15 (GMT)
Yes, but without the ho mode, it should also detect if the input is not in the complete fragment and set the flag too
2020-01-22 17:50:48 (GMT)
(it does it, for example, for arithmetic: if the problem contains arithmetic it'll set lost_completeness; I think HO problems without the appropriate flags should do the same)
2020-01-22 17:55:59 (GMT)
you're right
will do
2020-01-25 18:58:56 (GMT)
It returns Innapropriate status
2020-01-25 18:59:55 (GMT)
Because I filtered out polymorphic probs since they are not applicable to competition
2020-01-25 19:2:1 (GMT)
Oh, ok, didn't know about that :grinning_face_with_smiling_eyes:
I imagine we can re-enable that. But again with the safe behavior that if it's "too" polymorphic (ie not in the prenex polymorphic fragment) we can just give up rather than giving the wrong answer.
2020-01-25 22:29:51 (GMT)
It is not inside Zip but rather in StarExec scripts
2020-01-25 22:29:55 (GMT)
Zip still works as before
2020-02-29 1:28:7 (GMT)
I updated benchpress' code and it now looks a lot better (self-contained html page).
2020-05-28 3:5:55 (GMT)
And now a version of the server. Each entry is a sqlite database that is easy to rsync from your favorite workstation :slight_smile: .