zulip log viewer

benchpress: benchmarking tool

Simon Cruanes

2019-12-13 23:25:54 (GMT)

I just moved and renamed the test tool used in nunchaku to benchpress on Sneeuwballen.
It's a pure OCaml tool that can currently run benchmarks, analyze them a bit, and present them in a very basic web UI; I plan to make it able to do more advanced analysis, plotting, etc. I think it could also be useful for E at some point.

Simon Cruanes

2020-01-16 4:25:17 (GMT)

An output of running Zipperposition on TPTP-6.1:
show.html (cc @Petar Vukmirovic). It's on the master branch and apparently it wrongly saturates on a bunch of HO problems (see "bad" results)?

Petar Vukmirovic

2020-01-22 16:47:44 (GMT)

hi simon

What configuration did you run on HO problems? Some flags need to be setup to correctly support HO problems.

Simon Cruanes

2020-01-22 16:48:40 (GMT)

It's in the configuration file, I think there's no flag for HO :grinning:

Simon Cruanes

2020-01-22 16:49:16 (GMT)

Look at tests/benchpress.sexp.
I don't think there's a bug, I just need to filter out HO problems or use proper flags (or a detection feature like --auto, that'd be nice)

Petar Vukmirovic

2020-01-22 16:49:46 (GMT)

wait I am on Windows atm
restarting :)

Simon Cruanes

2020-01-22 16:50:2 (GMT)

Ah no worries, it's not

Petar Vukmirovic

2020-01-22 17:33:41 (GMT)

yeah I saw it now

maybe we should create a new prover called zip-ho or something
and run it with those provers

Petar Vukmirovic

2020-01-22 17:33:53 (GMT)

as for benchmarking and optimizing we should talk about this a bit more

Simon Cruanes

2020-01-22 17:40:22 (GMT)

zip-ho sounds perfect, we just need the other version to SZS GaveUp on HO problems (or maybe non lfHO problems) instead of returning a wrong answer?

Petar Vukmirovic

2020-01-22 17:47:27 (GMT)

in principle when you turn on ho mode
we set the lost_completeness flag so the status is properly reported

Simon Cruanes

2020-01-22 17:50:15 (GMT)

Yes, but without the ho mode, it should also detect if the input is not in the complete fragment and set the flag too

Simon Cruanes

2020-01-22 17:50:48 (GMT)

(it does it, for example, for arithmetic: if the problem contains arithmetic it'll set lost_completeness; I think HO problems without the appropriate flags should do the same)

Petar Vukmirovic

2020-01-22 17:55:59 (GMT)

you're right

will do

Petar Vukmirovic

2020-01-25 18:58:56 (GMT)

It returns Innapropriate status

Petar Vukmirovic

2020-01-25 18:59:55 (GMT)

Because I filtered out polymorphic probs since they are not applicable to competition

Simon Cruanes

2020-01-25 19:2:1 (GMT)

Oh, ok, didn't know about that :grinning_face_with_smiling_eyes:
I imagine we can re-enable that. But again with the safe behavior that if it's "too" polymorphic (ie not in the prenex polymorphic fragment) we can just give up rather than giving the wrong answer.

Petar Vukmirovic

2020-01-25 22:29:51 (GMT)

It is not inside Zip but rather in StarExec scripts

Petar Vukmirovic

2020-01-25 22:29:55 (GMT)

Zip still works as before

Simon Cruanes

2020-02-29 1:28:7 (GMT)

I updated benchpress' code and it now looks a lot better (self-contained html page).

Simon Cruanes

2020-05-28 3:5:55 (GMT)

And now a version of the server. Each entry is a sqlite database that is easy to rsync from your favorite workstation :slight_smile: .