[POPLmark] Experience report with Twelf

Stephanie Weirich sweirich at cis.upenn.edu
Wed Aug 17 17:14:13 EDT 2005

This list has been a bit quiet lately, so I thought I'd get some
discussion going again.

As part of our experimentation with proof assistants and logical
frameworks, I (with the assistance of Geoff Washburn) have coded up
the type soundness of Featherweight Java (FJ) in Twelf. As I coded
this same proof up in Coq seven months ago, it provides a comparison
between Twelf and Coq.  However, note that this isn't really a fair
fight: alpha-conversion is not an issue in the semantics of FJ, so
there isn't much chance for Twelf to shine. I used HOAS because I
wanted experience with it, but I don't think it was particularly
called for. (In fact, if I did it again, I would probably drop the
HOAS---more about this below.)

In this message, I'd like to report about my recent Twelf experiences,
and ask a few questions of the Twelf experts. I'm going to concentrate
on Twelf because that is what I most recently had experience with. In
particular, I have some critical things to say about Twelf below. I
also had some problems with Coq, but I won't go into detail about
those in this message.

For reference, you can download the Twelf and Coq proofs from the
poplmark web site: http://www.cis.upenn.edu/proj/plclub/mmm/.
(To check the Twelf proof you need to use the CVS version of Twelf.)

Overall, I was more pleased with Coq than I was with Twelf for this
exercise. The Twelf version was about twice as long as the Coq
version. Bear in mind that both versions were created by
novices---I've never used either tool for this sort of proof before
now. However, I think that with experience (and much better use of
tactics), I could make the Coq version shorter. I don't think there is
much I could do to simplify the Twelf version. (Please tell me if I am
wrong.)

I think my troubles with Twelf stemmed from the four reasons that I
discuss below. For each of these reasons, I would like to know:
(a) am I just missing something? Perhaps there is a better way to do
     this that I don't know about. Please correct me if I am confused.
(b) if not, is this a known problem with Twelf that can be fixed
      or is this something fundamental?

Again, this wasn't a fair fight---there is nothing in FJ to really
take advantage of HOAS. But it is typical of some of the work I would
like to do using Twelf.

1) No polymorphism for logical operators

The lack of polymorphism in Twelf is a common riff, and I wouldn't
mention it if my troubles were limited to just data structures (such
as lists). In fact, the lack of polymorphism in Twelf was much more
annoying than I thought it would be.

Twelf is a logical framework, not a logic. That means that many
logical operators, like disjunction, conjunction, false, and equality,
must be explicitly added. However, as Twelf lacks polymorphism, they
cannot be added generically (as they can in Coq). Each type needs its
own definition of equality. Each disjunction of two types must be
defined separately.

More importantly, each time I'd like to use the fact that false
implies anything, I have to prove it as a separate lemma. (In my
preservation proof, this was 6 times). Each time I used Leibniz
equality, I also had to prove that (13 times). What is a simple tactic
in Coq ("subst") is 5 tedious lines of Twelf.  For example,

    refineMTEQ : methodTable_eq MT1 MT2 -> lookupMethod MT1 M MD
                                        -> lookupMethod MT2 M MD -> type.
    %mode refineMTEQ +MTEQ +LM1 -LM2.
    -: refineMTEQ methodTable_refl LM LM.
    %worlds (b_bind | var) (refineMTEQ M N O).
    %total M (refineMTEQ M N O).

2) Constructive logic and logic programming

In Twelf, a theorem is a total logic program. So, the only way to
prove something is to write a logic program that the metatheory of
Twelf can show is total. That means that we can only use constructive
logic.

The proof of the progress theorem includes a case analysis of whether
A is a subtype of B or not. In constructive logic, we have to show
that subtyping is decidable. In Coq, I used "apply Classic." In Twelf,
Geoff wrote 300 lines of code. Granted, these extra lines actually
proved more: we got a proof that subtyping is decidable in FJ. But as
I was just interested in the type soundness of FJ, being forced to
prove more properties was annoying.
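
Constructively, "A is a subtype of B or not" has to come with a
decision procedure. As a rough illustration of why this is provable
for FJ, here is a sketch in Python (the names and the representation
are mine, not from either development): FJ class tables are finite and
acyclic, so walking the superclass chain always terminates.

```python
# Hypothetical sketch: deciding FJ subtyping by walking the (finite,
# acyclic) superclass chain. class_table maps each class name to its
# direct superclass; Object has no entry.

def is_subtype(class_table, c, d):
    """Return True iff c <: d under class_table."""
    while True:
        if c == d:
            return True
        if c == "Object":          # top of the hierarchy: no superclass left
            return False
        c = class_table[c]         # step to the direct superclass

ct = {"A": "Object", "B": "A", "C": "Object"}
print(is_subtype(ct, "B", "Object"))  # True
print(is_subtype(ct, "C", "A"))       # False
```

The 300 lines in the Twelf version correspond to proving that this
kind of traversal is total and that its yes/no answers are sound.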

Furthermore, even sticking to constructive logic, I was still
slightly hampered by the logic programming aspect of it. For example,
in the specification of when class tables are well formed, I wanted to
say something like:

(for all C, if you look up C in the class table CT, then C is well formed)
    implies the class table CT is well formed.

(This is how my Coq specification reads, in Coq syntax of course.) I
couldn't figure out how to say that in Twelf. Instead, I had to write
the logic program that folds over the class table, checking the
well-formedness of each class.

class_table_typing_help : classTable -> classTable -> type.
ctt_nil : class_table_typing_help CT0  crnil.
ctt_cons : class_table_typing_help CT0 CT
         -> class_typing CT0 C CD
         -> className_neq C object
    % ----------------------------------------------------------------
       -> class_table_typing_help CT0 (crcons (CD : classDef C) CT CTI).

class_table_typing : classTable -> type.
- : class_table_typing_help CT CT -> class_table_typing CT.

Perhaps I'm being too critical, as the code I had to write is only a
bit longer, but it isn't exactly what I wanted to write. In Coq, I had
the choice between the two styles, but in Twelf, I was forced to do it
this way.
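
For what it's worth, over a finite class table the two formulations
agree. A toy Python sketch (illustrative only; class_ok stands in for
the class_typing judgment, and all the names are mine):

```python
# Two ways to state class-table well-formedness (illustrative analogy).

def wf_by_fold(ct, class_ok):
    """Twelf-style: a fold over the table, checking each entry in turn."""
    ok = True
    for name in ct:
        ok = ok and class_ok(ct, name, ct[name])
    return ok

def wf_by_lookup(ct, class_ok):
    """Coq-style: every class that lookup can return is well formed.
    Over a finite table this coincides with the fold."""
    return all(class_ok(ct, name, ct[name]) for name in ct)

# A class is "ok" here if its declared superclass exists (toy condition).
ct = {"A": ("Object", []), "B": ("A", [])}
ok = lambda ct, name, defn: defn[0] == "Object" or defn[0] in ct
print(wf_by_fold(ct, ok) == wf_by_lookup(ct, ok))  # True
```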

3) HOAS, non-modularity and strengthening

I used HOAS for method arguments in FJ, and I think I stumbled on two
disadvantages for HOAS. (Again, alpha-equality is not much of an issue
in FJ, so the advantages of HOAS wouldn't show up so much.) Both of
these issues showed up in my proof of narrowing which reads:

If G, X<:C0, G' |- e : D0 and C1 <: C0, then G, X<:C1, G' |- e : D1
and D1 <: D0.

(Luckily, the proof for narrowing in FJ was not nearly as tricky as
for F-sub in the POPLmark challenge. It is a fairly straightforward
paper proof.)

However, I had a problem with adding assumptions to the LF
context. During this proof, as I traversed under a binder, I needed to
add the "variable case" to the context. This was fun: I created a
block that assumed the variable, its bound, and the proof of the
narrowing theorem when the expression is exactly that variable:

(For the curious, it looks like:
%block b_bind :
    some {CT: classTable}
         {C0 : className}
         {x: exp}
         {q: typing CT x C0}
         {N: ({WCT : class_table_typing CT}
              {C1 : className}
              {C2 : className}
              {SB0: subtyping CT C1 C2}
              narrowing_exp WCT ([y][qy] q) SB0 ([y][qy] q) s_refl)}. )

However, this block means that all of the lemmas that narrowing
uses must be shown in a world that contains blocks that look like
the above. The full dependence tree contains 27 lemmas that I had to
modify the worlds declaration for, so that I could use them in the
proof of narrowing.  Furthermore, I had to put the statement of the
narrowing lemma before these 27 lemmas because I need it to describe
the block.  But I needed to put the proof of the narrowing lemma after
these 27 lemmas, because it depends on them. This seems bad to me: a
theorem with a different LF context that needed to use narrowing in
its proof would have to sandwich all of this code: the statement of
the theorem above (so that the world definitions could refer to it),
all of the modified world definitions, and then the proof below.
I don't see how this can scale.

My second issue had to do with assumptions in the LF context. Twelf
would assume that parts of my proof could depend on variables in the
context, even though they couldn't really. So I had to prove to Twelf
that there was no dependence.

For example, when I looked up a class definition from the class table
(the lookup itself can't depend on x), the method table in that
definition could in principle have depended on x. In fact it did not,
and I had to show that the x could be replaced by something else:

lookupClass_strengthen :
   ({x:exp} lookupClass CT C (class C D FS (MT x)))
     -> lookupClass CT C (class C D FS (MT unit)) -> type.
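
In ordinary functional-programming terms: under HOAS, a "method table
that may mention x" is a function of x, and strengthening says that
this function is constant. A loose Python analogy (the names are mine,
and instantiating at a dummy argument only illustrates the idea; it is
not a proof):

```python
# HOAS binders as host-language functions (illustrative analogy only).
# A method table "under the binder x" is a function from expressions
# to method tables.

def strengthen(mt_of_x, dummy="unit"):
    """Instantiate the binder at a fixed dummy argument. This is
    legitimate exactly when mt_of_x never inspects x, which is what
    a strengthening lemma like lookupClass_strengthen asserts."""
    return mt_of_x(dummy)

mt = lambda x: {"getX": "return this.x;"}     # ignores its argument x
print(strengthen(mt) == mt("any other exp"))  # True
```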

For some parts of my proof, it didn't matter whether there was a
dependence or not. So I added a new block (called "var") to 24 lemmas
so Twelf could show that they were ok even with a variable "x" in the
context.

4) Bottom-up vs. top-down

In Twelf, my proof development was very bottom up. This was
particularly true because I couldn't make assertions in Twelf. I
couldn't tell Twelf that something was a total logic program (assert
that it is a theorem) without proving it (I was using the CVS version
of Twelf, so the "feature" described on the Twelf wiki didn't work. I
imagine that it would be trivial to add this to Twelf, so perhaps it
is already there and I just don't know how to do it.).

Separate from this is the difference between using an induction tactic
in Coq and the coverage checker in Twelf. In Coq, after I define an
inductive datatype, it tells me what the induction principle is, and
in fact interactively guides me through the steps of using it.  In
Twelf, I have to guess that principle myself. And according to Twelf's
coverage checker, I'm not very good at guessing it. I imagine with
more experience with Twelf, I will get better at it. This may be more
of an issue for the novice, but it does make Twelf's learning curve
all the more steep.


That's it. In many ways that have been previously described by others,
Twelf was a fine tool. I won't go into detail here, but in particular,
I greatly appreciated Twelf's type inference and wished that Coq could
do more of that for me. It also goes without saying that in both
systems, dependent types are essential. I would really like to hear
from Twelf users and experts, to help me refine my understanding of
these issues.

