[POPLmark] Experience report with Twelf

Dan Licata drl+ at cs.cmu.edu
Wed Aug 17 22:58:32 EDT 2005


Hi Stephanie and everyone,

I'm a second-year grad student at CMU; I've been using Twelf in a class
and in my research for almost a year now.  I'm writing this response to
explain some of my experience with Twelf, in the hopes that you will
better understand why I find it useful.  In particular, I've never used
another proof assistant, so I don't have anything to compare Twelf to,
and I'm not trying to argue that Twelf is better than, say, Coq.

> More importantly, each time I'd like to use the fact that false
> implies anything, I have to prove it as a separate lemma. (In my
> preservation proof, this was 6 times). Each time I used Leibniz
> equality, I also had to prove that (13 times). What is a simple tactic
> in Coq ("subst") is 5 tedious lines of Twelf.  

I've had to do this too.  However, I find the (tacticless) style of
Twelf proofs useful.  Because object-language derivations are isomorphic
to canonical LF terms and meta-proofs proceed by induction over
canonical forms, the meta-proofs have a close correspondence to the rule
inductions over derivations that we do on paper.  When writing a Twelf
proof, I more or less just write out the rule induction proof that I'd
do on paper and Twelf checks it.  Also, it makes it easy to translate a
Twelf proof, case by case, into a traditional paper proof.  Kevin
Watkins has pointed this out on this list before.

> The proof of the progress theorem includes a case analysis of whether
> A is a subtype of B or not. In constructive logic, we have to show
> that subtyping is decidable. In Coq, I used "apply Classic." In Twelf,
> Geoff wrote 300 lines of code. Granted, these extra lines actually
> proved more: we got a proof that subtyping is decidable in FJ. But, as
> I was just interested in the type soundness of FJ, being forced to
> prove more properties was annoying.

Intuitionistically, what you're proving when you don't actually do the
decidability proof is "the law of the excluded middle (for subtyping)
implies progress".

You could do this in Twelf by leaving an unproven meta-theorem stating
decidability:

lem-for-subtyping : {a : tp} {b : tp} sub-or-not a b -> type.

(where sub-or-not is essentially a sum of subtyping and not, however you
define them).  
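
For concreteness, here is one minimal way sub-or-not might be defined
(the names sub, not-sub, son/yes, and son/no are made up for this
sketch, not Geoff's actual development), together with the mode
declaration for the lemma:

sub-or-not : tp -> tp -> type.
son/yes : sub A B     -> sub-or-not A B.
son/no  : not-sub A B -> sub-or-not A B.

%mode lem-for-subtyping +A +B -D.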

If you're happy with "apply Classic", then you should be fine with
leaving this meta-theorem unproven, since it is "obviously" true. =)

Or, you could state progress with an extra premise of type ({a : tp} {b
: tp} sub-or-not a b), though I don't immediately see that it would be
easy to satisfy this premise once you've proven decidability as a
meta-theorem.
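
In signature form, that alternative statement would look something like
this (of and step-or-value here are just hypothetical names for the
typing and progress judgements):

progress : ({a : tp} {b : tp} sub-or-not a b)
            -> of E T -> step-or-value E -> type.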

> 3) HOAS, non-modularity and strengthening

The issues with variable cases are in some sense the price for
higher-order syntax and judgements.  In my research, I've found the
complications with variables a small price to pay for the ease that
higher-order syntax and judgements provide.  I never think about
alpha-conversion or capture-avoiding substitution for syntax, even
though it's present in everything I do.  When I write an auxiliary
hypothetical judgement, I don't have to prove substitution for it---I
just apply the hypothetical to derivations satisfying its premises.
Then again, I can always just ask Karl what to do when things go
wrong. =)
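
As a small illustration (with made-up names, not my actual
development): with higher-order syntax a binder is just an LF function,
so beta reduction "substitutes" simply by applying it.

tm : type.
lam : (tm -> tm) -> tm.   %% the body binds its variable as an LF function
app : tm -> tm -> tm.

step : tm -> tm -> type.
%% no separate substitution function: (E V) is just LF application
step/beta : step (app (lam E) V) (E V).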

Carsten Schürmann's forthcoming Delphin system is supposed to address
these problems.  As I understand it, it will allow you to write
meta-proofs as total functional programs over the canonical LF terms;
you'll be able to supply the cases for variables directly and locally,
rather than in the context.

Sometimes strengthening does come up.  More than once, though, I've
thought I needed strengthening when really I had corrupted my
subordination relation (i.e., Twelf thought types were subordinate when
they shouldn't have been).  The subordination relation tells you when
objects of one type can appear in another.  You can use the Print.subord
command (directly in the Twelf server) to print the subordination
relation and inspect it.  I'm not sure that it shows the transitive
closure, so you may have to compute that yourself.

> 4) Bottom-up vs. top-down
> 
> In Twelf, my proof development was very bottom up. This was
> particularly true, because I couldn't make assertions in Twelf. I
> couldn't tell Twelf that something was a total logic program (assert
> that it is a theorem) without proving it (I was using the CVS version
> of Twelf, so the "feature" described on the Twelf wiki didn't work. I
> imagine that it would be trivial to add this to Twelf, so perhaps it
> is already there and I just don't know how to do it.).

I just checked out the latest version from CVS, and it is still Twelf
1.5R2, Mar 13, 2005.  This version definitely still has the %total bug
that lets you leave a hole for an unproven metatheorem; I'm using it
fairly heavily in a proof I'm doing right now.  Note that to exploit the
bug you have to actually get to the totality check: it doesn't work if
Twelf trips over one of the cases of the theorem or the %worlds
declaration.

I believe that future versions will have a %trustme declaration that, in
unsafe mode, will let you assert unproven metatheorems.
 
> Separate from this is the difference between using an induction tactic
> in Coq and the coverage checker in Twelf. In Coq, after I define an
> inductive datatype, it tells me what the induction principle is, and
> in fact interactively guides me through the steps of using it.  In
> Twelf, I have to guess that principle myself. And according to Twelf's
> coverage checker, I'm not very good at guessing it. I imagine with
> more experience with Twelf, I will get better at it. This may be more
> of an issue for the novice, but it does make Twelf's learning curve
> all the more steep.
 
In my experience, the totality checker gets much easier to understand
after a while.  I've never had problems understanding the termination
checker. Most of the time, the argument to the inductive call has to be
a syntactic subterm of the input, so when you get an error, it's pretty
easy to look and see that it isn't.  (This gets a little more
complicated if you use a %reduces declaration, since then the inductive
argument might be the output of another call, but even then I haven't
found it hard to keep track of).  
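
For example (a throwaway lemma, just to illustrate the usual shape):
the inductive call below is on N, a strict subterm of the input (s N),
which is exactly what the termination checker wants to see.

nat : type.
z : nat.
s : nat -> nat.

double : nat -> nat -> type.
double/z : double z z.
double/s : double (s N) (s (s M)) <- double N M.

double-total : {n : nat} double n M -> type.
%mode double-total +N -D.
- : double-total z double/z.
- : double-total (s N) (double/s D)
     <- double-total N D.   %% inductive call on the subterm N
%worlds () (double-total _ _).
%total N (double-total N _).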

It took me a little while to get used to the input coverage checker.  In
part, this is because the coverage checker is using the magic of pattern
matching with dependent types to rule out a bunch of cases for you (for
example, when proving progress for simple languages, the coverage
checker can essentially do canonical forms for you).  Sometimes I have
to pause for a moment to understand what contradiction ruled out a
particular case, but I don't really mind when Twelf makes less work for
me!
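
Here is a tiny made-up example of that effect (a toy language with
booleans and functions; none of these names come from the FJ
development).  In this canonical-forms lemma, the lam case never has to
be written: there is no constructor for (of (lam T E) bool), and the
coverage checker works that out on its own.

tp : type.
bool : tp.
arrow : tp -> tp -> tp.

tm : type.
true : tm.
false : tm.
lam : tp -> (tm -> tm) -> tm.
app : tm -> tm -> tm.

of : tm -> tp -> type.
of/true  : of true bool.
of/false : of false bool.
of/lam   : of (lam T1 E) (arrow T1 T2)
            <- ({x} of x T1 -> of (E x) T2).

value : tm -> type.
value/true  : value true.
value/false : value false.
value/lam   : value (lam T E).

bool-val : tm -> type.
bool-val/true  : bool-val true.
bool-val/false : bool-val false.

can-bool : of E bool -> value E -> bool-val E -> type.
%mode can-bool +D +V -C.
- : can-bool of/true  value/true  bool-val/true.
- : can-bool of/false value/false bool-val/false.
%worlds () (can-bool _ _ _).
%total D (can-bool D _ _).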

There are a couple of insidious cases, though.  The first is when you
think you've covered a particular case, but Twelf isn't convinced.  In
my experience, this has always occurred when I accidentally unified two
unification variables that, in the general case, should be distinct.
Thus, the case I wrote down really didn't cover what I thought it did,
since I assumed in that case that two distinct things were equal.  I
have two ways of figuring out what I did wrong:

(1) inspect the type that Twelf infers for the case.  For example, when I
see an equality judgement relating a thing to itself, it's usually the
problem.  That is, 

%% definitional equality
deq : tm -> tm -> tp -> type.

when I see something like deq M M A in a type, it's often a bug.

Comparing the inferred type to the required type is sometimes helpful,
too, but I usually get discouraged by the alpha-renaming and go right on
to the second technique:

(2) add type annotations that give names to the implicit arguments.  If
you write down two different variables, Twelf won't unify them, and
instead you'll get a type error *when you type check the case*.  

For instance, say we are proving that definitional equality is
symmetric

deq-sym : deq M1 M2 A -> deq M2 M1 A -> type.
%mode deq-sym +X1 -X2.

and we thought we covered the case for application with a case like

 - : deq-sym
      (deq-app ...)
      _
      ...
      
If we accidentally unified the indices in the premise, we'd get a
coverage error, since we'd only have covered the case for 
(deq (app M N) (app M N) A).  If we add a type annotation

 - : deq-sym
      ((deq-app ...) : deq (app M N) (app M' N') A)
      _
      ...

forcing the two input terms to be distinct, we'll get a type error, and
we can then trace it through.

The second insidious case is when the coverage checker splits the wrong
variable, and that prevents it from splitting what it needs to split to
verify your proof.  You can see what Twelf is splitting by using the
(Twelf server buffer) command 'set chatter 6' (or higher to see more;
the default is, I think, 3).  Twelf seems to split from the inside
out.  One time I got Twelf to accept a proof by permuting the arguments
in the theorem declaration so that the type family I wanted it to split
was at the end of the inputs.  I.e.

thm : A -> B -> C -> D -> type.
%mode thm +X1 +X2 +X3 -X4.  

to 

thm : B -> C -> A -> D -> type.
%mode thm +X1 +X2 +X3 -X4.  

Hope that helps,
-Dan

