2020-04-23 14:40:04 (GMT)
We should make a list of options and collect data.
Some things come to mind: what timeout should we choose for the experiments, and what timeout should we build the scheduler for (the time limit is afaik 20min, but the 2017 scheduler uses 33.3min for UF).
2020-04-23 14:40:20 (GMT)
@Daniel El Ouraoui what options does your branch add?
2020-04-23 14:56:17 (GMT)
The timeout is indeed 20min
2020-04-30 13:29:38 (GMT)
Hans-Jörg said:
Daniel El Ouraoui what options does your branch add?
Here are the options:
--mdli activates the selection;
--limit-sk-qform=<value> limits the maximum number of times each existential formula is skolemized; 9 is the best value;
--limit-qform=<value> limits the number of instances per quantified formula; 100 is the best.
It's also useful to play a bit with the limit of instances per sort, --inst-sorts-threshold=<value>; I found that 100 was a good value.
I think it might be useful to use several configurations in the portfolio script. Something like this:
./veriT --disable-print-success --mdli --limit-sk-qform=9 --limit-qform=100 --inst-sorts-threshold=1000 file.smt2
./veriT --disable-print-success --mdli --limit-sk-qform=9 --limit-qform=100 --inst-sorts-threshold=100 file.smt2
./veriT --disable-print-success --triggers-new --triggers-m-cond --mdli --limit-sk-qform=9 --limit-qform=1000 --inst-sorts-threshold=100 file.smt2
./veriT --disable-print-success --triggers-new --triggers-m-cond --mdli --limit-sk-qform=9 --limit-qform=100 --inst-sorts-threshold=100 file.smt2
That could help. It is also possible to tune the value of --limit-sk-qform within [7; 13] and --limit-qform within [40; 150], in steps of 10.
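If it helps, such a sweep could look roughly like this (just a sketch; file.smt2 and keeping the other options fixed at the values above are assumptions):
# hypothetical parameter sweep over the two limits
for sk in $(seq 7 13); do
  for qf in $(seq 40 10 150); do
    ./veriT --disable-print-success --mdli --limit-sk-qform=$sk --limit-qform=$qf --inst-sorts-threshold=100 file.smt2
  done
done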
2020-04-30 13:56:39 (GMT)
Maybe we should just collect data with those 4 configurations and see how/if they fit in?
2020-04-30 16:44:31 (GMT)
Yes, maybe. I'll try to tune a bit. How much time do I have to give you a good combination? Is tomorrow ok?
2020-04-30 17:00:10 (GMT)
yeah, the deadline for the first version is the 4th, but I will just use the old scheduling script. The final version is due on the 18th. I think the experiments for the old strategies will take 2-3 more days anyway.
2020-05-02 12:55:44 (GMT)
Ok, let me work a bit on that. In the meantime, you can take the latest version on the inst-selection branch if you want. I updated the code to remove some experimental stuff. And I realized that increasing --inst-sorts-threshold is not good for my optimization.
I think the best is to try this configuration:
./veriT --disable-print-success --mdli --limit-sk-qform=9 --limit-qform=100 --inst-sorts-threshold=100 file.smt2
With and without triggers-new, single and multi triggers.
I'll find out some other values for --limit-qform.
2020-05-04 08:19:26 (GMT)
I did some experiments this weekend; it seems that using the optimization as a base for all other configurations pays off. I got better results by using --mdli --limit-sk-qform=9 --limit-qform=100 --inst-sorts-threshold=100 by default with all the different configurations.
Adding a bash variable to the script like this:
restriction_sk_inst="--mdli --limit-sk-qform=9 --limit-qform=100 --inst-sorts-threshold=100"
seems to be the easiest solution.
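For example, the strategy lines in the script could then just reuse it (a sketch; the exact call sites are an assumption):
# $restriction_sk_inst is left unquoted on purpose so it expands into separate options
./veriT --disable-print-success $restriction_sk_inst file.smt2
./veriT --disable-print-success --triggers-new --triggers-m-cond $restriction_sk_inst file.smt2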
2020-05-04 09:49:09 (GMT)
Cool. One thing though: I now ran experiments with the existing (42) strategies and it took quite a while (~4 days). I'm not sure how to integrate this. I see three options: 1) repeat the experiment with your options added and select from all the options; 2) run experiments with your options added to some strategies and train on that; 3) select from the existing experimental data and then add your options.
2020-05-04 09:54:05 (GMT)
Anyway, I put the results of the experiments so far into sto-verit/Experiment-Results/SMT-COMP-2020-1. There are no soundness bugs, but there are some other bugs which we could fix if we want.
2020-05-04 19:38:13 (GMT)
Cool, thank you. I think 2) would be the best option, but maybe the toughest... 3) is the simplest and 1) a trade-off... If we could try 1) we would have a good overview of the gain from these options and, above all, whether it is worth it...
I couldn't see sto-verit/Experiment-Results/SMT-COMP-2020-1. Is it in the git repo? On devel?
2020-05-06 14:02:21 (GMT)
I just updated the strategies_under_test file on GitLab.
2020-05-08 14:46:34 (GMT)
Oh nice! Somehow this ended up in its own branch, I don't know why; I merged it now. One note: the first empty line was there on purpose: it represents calling veriT without any option. I also implemented a simple script to optimize the order of strategies to minimize the total CPU time. I'm now testing the schedule it spit out. Afterwards I'll collect data for your new options.
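Roughly, the idea is something like this (just a sketch, not the actual script; the strategies.txt format here is made up: one single-space-separated line per strategy, "<options> <time_slice_s> <num_solved>"):
# greedy heuristic: run cheap, high-yield strategies first (smallest time/solved ratio)
awk '$NF > 0 { printf "%f %s\n", $(NF-1) / $NF, $0 }' strategies.txt | sort -n | sed 's/^[^ ]* //'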
2020-05-10 16:02:17 (GMT)
I uploaded the experiment results for the new options.
2020-05-10 22:20:37 (GMT)
Yes, I updated it using the web interface, maybe that is why it got messed up; sorry, I didn't know about the empty line. I checked the results, thank you. They are in SMT-COMP-2020-3? It seems that this configuration:
--mdli --limit-sk-qform=9 --limit-qform=1000 --inst-sorts-threshold=100 performs well, I'm surprised.
2020-05-10 22:27:49 (GMT)
Might be interesting to add this one, it could be even better:
--mdli --limit-sk-qform=15 --limit-qform=1000 --inst-sorts-threshold=100
2020-05-11 08:11:11 (GMT)
Ok, I'll try this too, and any other strategy you suggest until the afternoon, but then I think we should start preparing the full scheduler and the system description.
2020-05-11 09:38:20 (GMT)
The new scheduler has this option (--limit-sk-qform=9) as the first strategy, and the prediction says it solves 200 more than the last strategy (on UF).
2020-05-11 15:02:14 (GMT)
Hm, I ran a test on gros, but sadly we solved less with the new script; I suspect it's because of a higher number of out-of-memory errors. Gros has much less memory than grvingt.
2020-05-11 15:16:58 (GMT)
look at this: https://schurr.io/smtcomp/
2020-05-11 15:27:42 (GMT)
Arf, yes, that's strange, because on grvingt, just with UF and a timeout of 180s, it solves more problems...
I added the results here: https://members.loria.fr/DElOuraoui/veriT_port_UF_results.html
2020-05-11 15:31:18 (GMT)
Is gros more realistic than grvingt, SMT-COMP-wise?
2020-05-11 15:32:17 (GMT)
No, it's less realistic: at SMT-COMP we have 60GB per benchmark.
2020-05-11 15:33:25 (GMT)
Could you share the new scheduler?
2020-05-11 15:34:13 (GMT)
I'll try to check, and maybe adjust it, with grvingt and a smaller timeout.
2020-05-11 15:40:22 (GMT)
I see, we cannot do it with a shorter timeout because this is what the optimizer computed?
2020-05-11 15:43:32 (GMT)
Yeah, but the optimizer then tries to minimize the total time to solve all benchmarks, so testing with smaller timeouts might still be valuable. I was just interested in doing one test under competition conditions. The prediction is 3433 solved benchmarks, the reality is 3046. The prediction is calculated by summing how many benchmarks have been solved in total by the picked option/time-slice pairs in the tests.
2020-05-11 15:46:50 (GMT)
Arf, maybe this should be run again with grvingt, to be sure... I also think it's a memory issue; gros is much less powerful.
2020-05-11 15:48:12 (GMT)
Yeah probably, the problem is that tests with a timeout of 20min take a long time (~6h)
2020-05-11 15:48:14 (GMT)
I'll try to divide all timeouts by 10 and run it again under 120 seconds.
2020-05-11 17:52:40 (GMT)
Maybe the scheduler doesn't hold up in practice.
Here are the results on grvingt, with a 120s timeout: https://members.loria.fr/DElOuraoui/veriT_NewSched_UF.html
2020-05-11 19:28:47 (GMT)
Yeah, I'm generally confused now. Tomorrow I want to check what happens to the examples where the optimizer predicts they will be solved, but then they are not solved. I'm not sure, however, how much we can learn by dividing the timeouts by 10, since the behavior is rather nonlinear.
2020-05-11 19:29:14 (GMT)
Worst case, we can still use the old scheduler, or some combination.
2020-05-11 19:33:11 (GMT)
Yes, I think I have a clue about what's going on... I'm just waiting for the results of the newly run experiments, maybe I'll get the answer... :fingers_crossed:
2020-05-12 08:30:02 (GMT)
veriT was still printing statistics info, but the scheduler was not designed to read that kind of output... That was the issue.
I just got the results; they are a bit better, but still less than what the optimizer predicts... On UF with a 20min timeout it solves 3359 problems.
2020-05-12 08:37:19 (GMT)
I disabled STAT in veriT and, to be sure, replaced the case statement with a grep in the trywith function:
res1="$(grep "unsat" <<< "$result")"
if [[ $res1 == "unsat" ]]; then echo "$result"; exit 0; fi
2020-05-12 09:28:00 (GMT)
Here is the raw results file: https://members.loria.fr/DElOuraoui/SMTCOMP-1200_new_scheduler_raw.txt
2020-05-13 15:28:35 (GMT)
I found a bug in the optimization script which probably resulted in wrong strategies :(
2020-05-13 19:44:08 (GMT)
Oooh, does that mean there is hope for a better strategy?
2020-05-14 09:08:25 (GMT)
Not much, but I have trained a complete scheduler now; it's in the verit-smtcomp repo. I will also make one for 24s. I noticed that for some of the smaller theories there are single options that solve all examples. I still need to implement logic to detect this.
2020-05-15 14:29:06 (GMT)
I fixed a couple of bugs and did a lot of refactoring. Now there is a retrained scheduler and a 24s scheduler in verit-smtcomp.
2020-05-15 15:29:28 (GMT)
Oh, another thing: the prediction was wrong, I used the wrong counting function ^^ It should be the number of benchmarks that are solved in total (i.e., the union of the benchmarks solved by the different options).
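To illustrate the difference (assuming hypothetical per-strategy files solved_by_*.txt that list the solved benchmark names, one per line):
cat solved_by_*.txt | wc -l            # plain sum: counts a benchmark once per option that solves it
cat solved_by_*.txt | sort -u | wc -l  # union: each benchmark counted once, which is what the prediction should report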
2020-05-16 09:54:21 (GMT)
Do you think it is necessary to launch new experiments? Where should I add the system description?
2020-05-17 16:26:50 (GMT)
I played around with a scheduler for 24s (it's not really worth it), but one thing: we lose examples with the new grep-based function (in the 24s scheduler), so I will go back to the old one. I hope veriT never prints out crap.
2020-05-17 17:00:53 (GMT)
Just be careful: StarExec collapses stdout and stderr, and this year it's the solver's job to capture stderr (in case it would throw off the postprocessors).
2020-05-17 19:21:00 (GMT)
So stderr should be eliminated?
2020-05-17 19:24:51 (GMT)
Not necessarily. It's just that the postprocessors expect well-behaved output, as described in the rules.
2020-05-17 19:25:23 (GMT)
So if, for example, stderr prints "unknown", that might be captured by the postprocessor.
2020-05-17 19:26:05 (GMT)
That was an issue in the competition last year, so the organizers wrapped all solvers to kill stderr. This year we simplified this by leaving it to the solvers themselves.
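A wrapper could simply discard stderr, something like this (just a sketch; the actual competition script may handle it differently):
# hypothetical wrapper: forward the arguments and drop whatever veriT writes to stderr,
# so StarExec only sees the well-behaved stdout
./veriT --disable-print-success "$@" 2>/dev/null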
2020-05-18 08:26:29 (GMT)
The cvc4/veriT script actually merges them too.
2020-05-18 09:44:40 (GMT)
@Daniel El Ouraoui system descriptions are in verit-extra
2020-05-19 10:56:16 (GMT)
@Daniel El Ouraoui both schedulers now perform better than predicted: the 24s scheduler solves 3279 (the prediction was 3266), and the 1200s scheduler solves 3421 with a prediction of 3410.