2019-03-22 20:32:7 (GMT)
I'd like to make a release soon-ish, withdune and iter and msat 0.8 in it. But I heard Petar has improved the heuristics for first-order so maybe we should wait a bit to integrate that?
2019-03-22 20:38:18 (GMT)
Petar's heuristic changes and my bug fixes are all based on the version that contains the new HO mode. So if we integrate them, we should probably integrate everything.
2019-03-22 20:40:8 (GMT)
I am not sure how much work that is. Depends on how much you changed since I pulled from dev (around November i guess).
2019-03-22 20:54:50 (GMT)
hmm most changes are on master, but a lot of it is s/Sequence/Iter, and some changes in the SAT solver's interface. I haven't touched to the rest (or barely).
2019-03-22 20:55:16 (GMT)
Still, if it's based on the HO branch then it can wait until your work is merged, which is non trivial.
2019-03-22 20:56:30 (GMT)
We should ask Petar if he really based his stuff on my branch, but I would guess so.
2019-03-22 20:57:37 (GMT)
The question if/when we merge is what to enable by default. Should the first-order calculus be preserved, or should the prover try at least lfhol unconditionnally?
2019-03-22 21:0:56 (GMT)
lfhol?
2019-03-22 21:1:19 (GMT)
I removed lambda-free from our branch :D
2019-03-22 21:2:52 (GMT)
Do you mean our new calculus?
2019-03-22 21:3:27 (GMT)
Let's call it lamsup, because it can handle lambdas now...
2019-03-22 21:12:43 (GMT)
Alright. Anyway I suppose that if it's conditional on detecting that the input is HO (or on --ho being passed), it wouldn't affect first-order at all, so it could simply live in the master branch.
2019-03-22 21:14:19 (GMT)
Yes, it's activated by passing an option only
2019-03-22 21:14:56 (GMT)
The default mode should be more or less the same (modulo bug fixes and heuristic improvements by Petar)
2019-03-22 21:16:15 (GMT)
I've implemented an option --mode that we could use to make the different modes easier to access
2019-04-16 22:30:20 (GMT)
Is this on master? I'd like to release soon ^^
2019-04-16 23:59:33 (GMT)
I don't think so. I think Petar is working on the MPI repo. @Petar Vukmirovic What branch are you working on currently?
2019-04-17 0:1:27 (GMT)
I copied Petar's implementation a while ago from the MPI to GitHub. It's on lzip: https://github.com/c-cube/zipperposition/tree/lzip.
2019-04-17 0:24:51 (GMT)
Most merge conflicts seem to come from Sequence vs Iter. I'll have a closer look.
2019-04-17 4:14:23 (GMT)
you may want to also do sed -i 's/Sequence/Iter/g' on the source files on the lzip branch, it could help with the merge.
2019-04-17 15:50:48 (GMT)
I think Petar is on holiday. I am not sure which branch we should merge into master. Maybe we should wait for him to come back.
2019-04-17 18:45:17 (GMT)
Sure, it can wait.
2019-04-19 18:12:12 (GMT)
I tried to replace Sequence/Iter and added iter in the .opam files. iter is installed on opam, too. But still I get Error: Unbound module Iter. Any ideas?
2019-04-19 18:19:35 (GMT)
You also need to update the dune files (see the similar commit on master)
2019-04-19 18:44:32 (GMT)
Ok, thanks. It still doesn't compile, but I think I'll merge first and repair afterwards.
2019-04-19 18:47:11 (GMT)
opam install iter?Iter after updating the dune files?2019-04-19 19:3:32 (GMT)
No, it was complaining about another module. But it doesn't matter because after merging it now compiles.
2019-04-19 19:6:32 (GMT)
It'd be optimal if you could now run the prover on as many tests as possible :)
2019-04-19 19:7:26 (GMT)
2019-04-19 19:8:24 (GMT)
It compiles, but some tests fail. But some of them also failed before merging. Let's wait for Petar...
2019-04-19 19:13:38 (GMT)
I meant a set of benchmarks, but local tests are important too, indeed.
2019-04-22 19:24:20 (GMT)
Hi guys!
2019-04-22 19:24:28 (GMT)
Can you get me up to speed on tests?
2019-04-22 19:28:5 (GMT)
I'm no expert but it seems like it may be related to unification…?
2019-04-22 19:28:30 (GMT)
Sorry, what I posted there won't help at all :D. Let's start from the beginning.
I merged master and supext_fix. I think already on supext_fix, there are a few tests that fail. Do you know why, Petar?
2019-04-22 19:31:55 (GMT)
On supext_fix, it currently looks like this:
Testing all. This run has ID `027DD200-AFB0-474C-9964-E36FFB0DE049`. [OK] units 0 subst.rename. [OK] units 1 subst.unify. [OK] units 2 multiset.max. [OK] units 3 multiset.compare. [OK] units 4 multiset.size. [OK] units 5 ordering.rpo. [OK] units 6 ordering.kbo. [OK] units 7 db shift. [OK] units 8 db unshift. [OK] units 9 is_appvar. [OK] units 10 whnf1. [OK] units 11 patterns. [OK] units 12 poly app. [OK] units 13 eta reduce. [OK] units 14 eta qreduce. [OK] units 15 eta expand. [OK] units 16 check unifiable. [OK] units 17 check unifiable. [OK] units 18 check unifiable. [OK] units 19 check unifiable. [OK] units 20 check unifiable. [OK] units 21 check unifiable. [OK] units 22 check unifiable. [OK] units 23 check unifiable. [OK] units 24 check unifiable. [OK] units 25 check unifiable. [OK] units 26 check unifiable. [OK] units 27 check unifiable. [ERROR] units 28 check matchable. [OK] units 29 regression matching. [OK] units 30 JP unification. ... units 31 JP-unif check count.^C
And then it doesn't terminate because I didn't put an upper bound on the steps that JP unif is allowed to make...
2019-04-22 19:33:7 (GMT)
After my merge, I get even more errors:
[ERROR] units 0 subst.rename. [ERROR] units 1 subst.unify. [OK] units 2 multiset.max. [OK] units 3 multiset.compare. [OK] units 4 multiset.size. [OK] units 5 ordering.rpo. [OK] units 6 ordering.kbo. [OK] units 7 db shift. [OK] units 8 db unshift. [OK] units 9 is_appvar. [OK] units 10 whnf1. [OK] units 11 patterns. [OK] units 12 poly app. [OK] units 13 eta reduce. [OK] units 14 eta qreduce. [OK] units 15 eta expand. [ERROR] units 16 check unifiable. [ERROR] units 17 check unifiable. [ERROR] units 18 check unifiable. [ERROR] units 19 check unifiable. [ERROR] units 20 check unifiable. [ERROR] units 21 check unifiable. [ERROR] units 22 check unifiable. [ERROR] units 23 check unifiable. [OK] units 24 check unifiable. [OK] units 25 check unifiable. [OK] units 26 check unifiable. [ERROR] units 27 check unifiable. [ERROR] units 28 check matchable. [OK] units 29 regression matching. [ERROR] units 30 JP unification. ... units 31 JP-unif check count.^C
2019-04-22 19:35:14 (GMT)
Should I look into the errors that I introduced while merging and you (Petar) look into the errors that were already present on the supext_fix branch?
2019-04-22 19:38:52 (GMT)
I forgot how, but there's a way to see the actual error message for the test, of course. That could be useful.
2019-04-22 19:40:1 (GMT)
I think if you wait long enough there is a timeout and you'll get the message
2019-04-22 19:40:22 (GMT)
But I already found a big problem with my merge. I'll try to fix that first :D
2019-04-22 19:52:38 (GMT)
It's weird that I get a lot of errors displayed in my editor that do not occur if I compile on the terminal...
2019-04-22 19:56:13 (GMT)
it depends on what profile you use, I wonder (dune build is much more strict than dune build --profile-release)
2019-04-22 19:56:47 (GMT)
I don't know what I did, but I just got rid of them :D
2019-04-22 19:57:31 (GMT)
Possibly some things were not compiled yet and merlin was complaining?
2019-04-22 19:57:54 (GMT)
Maybe it's just VSCode that's behaving weirdly
2019-04-22 20:18:46 (GMT)
Ok, found the error that was introduced by the merge. Now the tests look like on Petar's branch. It would be cool if you could look into this, @Petar Vukmirovic , as you probably know better where they are coming from.
2019-04-22 20:19:44 (GMT)
Sorry guys
2019-04-22 20:19:56 (GMT)
Was not getting notifications on my phone for Zulip
2019-04-22 20:20:18 (GMT)
Thanks for merging!
2019-04-22 20:20:23 (GMT)
I will do the check tomorrow
2019-04-22 20:20:32 (GMT)
Let you know as soon as I fix 8t
2019-04-22 20:22:2 (GMT)
I think it'd be nice if @Alexander Bentkamp could run the branch's Zipperposition on TPTP, as a form of integration test, to ensure there's no new unsoundness?
2019-04-22 20:29:20 (GMT)
Note: I'm also fixing a tiny issue that pops up in older OCaml versions on https://travis-ci.org/c-cube/zipperposition/
2019-04-23 8:9:22 (GMT)
I can also do that :) I have a feeling Alex has a lot in his hands
2019-04-23 16:22:12 (GMT)
I have successfuly merged, and made all tests pass
2019-04-23 16:23:10 (GMT)
Now we need to run this on TPTP or something similar, then I'll merge into master and make a release :)
2019-04-23 16:23:15 (GMT)
Few notes: The matching test failed because pattern/LFHO higher-order unifciation algorithm implementened previously
did some smart stuff with lifting constants of variables, but dealt with DeBruijn indices incorrectly
2019-04-23 16:23:45 (GMT)
3 tests in JP unification hanged because it just became too slow when implemented with full iteration :(
2019-04-23 16:24:2 (GMT)
Now we need to run this on TPTP or something similar, then I'll merge into master and make a release :)
The task is running :)
2019-04-23 16:24:14 (GMT)
Ah, you mean going from f a b to (λx y. f x y) a b, hmm?
2019-04-23 16:27:1 (GMT)
Yup
2019-04-23 16:27:23 (GMT)
The problem is that it would do the same with terms that have no vars
but do have loosely bound variables
2019-04-23 16:27:42 (GMT)
and then because no shifting occurs when you apply substitution --
all kinds of funky things happen.
2019-04-23 16:28:17 (GMT)
Right now I disabled the lifting if the term is not ground and _closed_ + removed the test case since now it is out of the fragment.
2019-04-23 16:28:31 (GMT)
Well that's worrying indeed. It was an experiment, I'm glad we now have a more robust calculus thanks to y'all's hard work!
2019-04-23 16:28:33 (GMT)
3 tests in JP unification hanged because it just became too slow when implemented with full iteration :(
Didn't I implement full iteration? I remember those tests passed before. What has changed exactly?
2019-04-23 16:29:16 (GMT)
You remember how we found out that you also need to iterate on non-functional arguments.
2019-04-23 16:30:10 (GMT)
I implemented literally only that change (few lines) and now it hangs (in the first case because the unifier is very particular
and in the others because now there are more than 1 or 2 unifiers.
2019-04-23 16:30:22 (GMT)
most of the test cases still pass though
2019-04-23 16:31:1 (GMT)
Ah, you mean going from
f a bto(λx y. f x y) a b, hmm?
It is still smart and useful idea that we in the group all independently came up with :)
2019-04-23 16:32:18 (GMT)
Ah, right! I remember now. Hopefully we can improve JP a bit when I am back.
2019-04-23 16:33:5 (GMT)
What are your thoughts about --mode? What modes are there now and what modes should there be?
2019-04-23 16:33:18 (GMT)
Actually, because of architectural reasons I think improving pure JP is hopeless.
If we want something pragmatic we should work on the architecture I have in PragHOUnif.
WEe can discuss this on one ocassion.
2019-04-23 16:34:16 (GMT)
Even if we restrict ourselves to second order unification?
2019-04-23 16:35:27 (GMT)
What are your thoughts about
--mode? What modes are there now and what modes should there be?
I wanted to talk exactly about that. In my opinion we need 3:
As fast as possible first-order
As complete as possible higher-order
Best performing higher-order (some compromise between performance on FO and completeness)
2019-04-23 16:40:11 (GMT)
Even if we restrict ourselves to second order unification?
I think so. The main problem is when beta-reduce and apply substitution too much. That is common to any order of higher-order logic.
2019-04-23 16:40:53 (GMT)
While unifying you are mostly interested in what is the head of the term, and for that you only need to dereference the head of the term and get terms in whnf. From my experience, this brings huge speed improvement.
2019-04-23 16:41:3 (GMT)
But of course, we can discuss this further.
2019-04-23 16:43:58 (GMT)
About the modes: So far, I have always switched off features like avatar, induction & arithmetic for the evaluation because I thought that is not what I want to evaluate. That's why I gave the current modes the suffix "-basic". But maybe we don't need extra modes for this. And maybe we actually want to enable these features in the eval. What do you think?
2019-04-23 16:48:28 (GMT)
I don't know on top of my mind. I think we need to evaluate all that to have the clear picture.
We need to do some fairly extensive evaluation of combination of parameters and then see
what happens.
2019-04-23 16:49:32 (GMT)
Ok, but we don't really need a mode for every configuration that we want to evaluate, right?
2019-04-23 16:50:43 (GMT)
So I think the three that you proposed sound good.
2019-04-23 16:53:46 (GMT)
Do we want a HO mode that uses the old architecture without streams of unifiers? And what should be the default mode? Should the default mode also get a name? If yes, what effect is --mode default supposed to have?
2019-04-23 17:25:43 (GMT)
I'd like the default mode to be automatic, if possible (iirc there are some basic feature detections for arithmetic or induction, to be confirmed). It should try to use whatever is needed for the problem at hand.
If you want to evaluate a particular (set of) features then --mode ho or similar is the way to go?
2019-04-23 17:27:14 (GMT)
I don't know how Avatar (or rather its simplified version) fits into your calculi but I think it generally helps in the benchmarks, it should be on in default modes, I think. Arithmetic is less robust, induction as well (but they should at least keep working if required).
2019-04-23 17:55:56 (GMT)
Ok, but we don't really need a mode for every configuration that we want to evaluate, right?
No but we have to know which features should go in there (in each of the non-complete modes)
2019-04-23 17:58:22 (GMT)
Ok, but we don't really need a mode for every configuration that we want to evaluate, right?
Actually that sounds like a pretty good idea to me, for more "specialized" modes. Could also be useful as ± snapshots of parameters (like --mode casc-tff-2019).
2019-04-24 8:32:3 (GMT)
OK. Good news. I tested Zip on 1000 problems, and it performs quite nice, with no discovered bugs so far.
Full TPTP is still running (and will be until tomorrow I think).
2019-05-01 15:2:42 (GMT)
Hi @Simon Cruanes , why do you want to make the release now? Petar and I think that it would be better to wait 1-2 months until we (probably mostly Petar) have tested properly, tuned the heuristics and cleaned up.
2019-05-01 15:25:15 (GMT)
Sure, I thought heuristics were up to date, but it's not that urgent. I just like to have significant changes released but maybe I thought things were more stable than they are!