[POPLmark] Meta-comment
James Cheney
jcheney at inf.ed.ac.uk
Thu May 12 12:52:06 EDT 2005
On Thu, 2005-05-12 at 01:31 -0400, Daniel C. Wang wrote:
> >>Personally, if I wanted to use first-order encodings I might as well
> >>be using Coq, Isabelle, or HOL. They are more mature than Twelf and
> >>I'd say more appropriate for first-order encodings.
> >>
> >>
> >
> >My inner snot can't resist pointing out that "mature" is as mature
> >does. Let's see a few solutions in the other systems Dan Wang
> >mentioned, and compare. There seems to be a lot of talk about other
> >systems, and few complete solutions.
> >
> >
> >
> Look, POPLmark is not the only theorem proving problem in the world!
> Let's not forget that Coq has recently been used to reprove the
> four-color theorem. (Google and I'm sure you'll find it.) Isabelle/HOL
> now includes a full executable specification of a TCP/IP stack. HOL
> Light is used to verify the correctness of the algorithms used in the
> x86 FPU. Let's see a Twelf signature that encodes the theory of the
> real numbers from first principles, and then we can talk. :)
In addition to supporting tactics and interactive proof, the three
systems Dan mentioned (and others) also have very small "trusted theory
bases" and small "trusted computing bases". By "trust", I mean anything
that cannot be checked. That is, the mathematical foundations of these
logics are small, standard, and have been exhaustively studied; the
type/proof checking algorithms are well understood (and in many cases
have been formally proved correct in some other system); and as long as
you stick to the system's methodology (e.g., defining structures in
terms of existing structures rather than introducing possibly-
inconsistent axiomatizations in Isabelle/HOL, etc.), you know that what
you are doing is consistent with a mathematical foundation that many
people have studied and tried, unsuccessfully, to poke holes in.
These properties do not hold of LF/Twelf when used according to the
proposed methodology. I am *not* saying that I think LF is inconsistent
or untrustworthy. The underlying LF type theory is well understood and
has a solid grounding in type theory, and the LF type checking algorithm
is well studied and has a formalized proof of correctness (I believe
that Randy Pollack and James McKinna's formalization of PTS type
checking is one example). But the kinds of proofs we are interested in
formalizing (that is, proofs of meta-properties of programming
languages) are not expressed in LF itself in the Twelf methodology.
Instead, these proofs rely on a much deeper theory of how to do
induction/recursion on HOAS or how to check worlds/blocks/totality of
relations defined in LF. (I will call this the meta-logic and
meta-proof theory of Twelf; maybe this is not the standard term.) This
(it seems to me) brings both a large amount of "paper proof" (of a very
deep and interesting, but by the same token hard-to-trust, nature) back
into the trusted theory base and a large amount of code (e.g., Twelf's
coverage/totality/world checking) back into the trusted computing base,
relative to plain LF or other theories. I am not saying that I think
there is a flaw in Twelf's meta-logic, just that I don't have the
same confidence in it that I have in, say, pure LF, Coq's type theory,
ZFC or higher-order logic.
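To make concrete what this extra machinery does, here is the standard
toy example (natural-number addition; not tied to any particular
development). The %mode, %worlds, and %total declarations below are
claims checked by Twelf's meta-level machinery, not LF proof terms:

    nat : type.
    z   : nat.
    s   : nat -> nat.

    plus : nat -> nat -> nat -> type.
    plus/z : plus z N N.
    plus/s : plus (s M) N (s P)
              <- plus M N P.

    %mode plus +M +N -P.        % inputs M, N; output P
    %worlds () (plus _ _ _).    % used only in closed contexts
    %total M (plus M _ _).      % totality, by induction on M

The meta-theorem "plus is a total function" is established by the
mode/coverage/totality checker, so trusting it means trusting that code
(and the theory behind it), not just the LF type checker.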
In addition, in Twelf one typically describes one's system by
introducing many Twelf constants describing the syntax, axioms, and
rules of the system. That is, each new Twelf development is really
about a new logic encoded in LF. It is easy to write things that look
right but are inconsistent or inadequate for subtle reasons, or simply
because of typos. From Twelf's point of view this is no problem; it
just means that lots
of types are inhabited. Checking adequacy seems to require a trusted
paper proof that lies outside the system. (I remember hearing that this
was a problem with adding obvious-seeming derived proof rules to the
original form of PCC in LF.)
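For example (a contrived slip of my own invention, but typical of the
kind of thing I mean):

    tm : type.
    eq : tm -> tm -> type.
    eq/refl : eq E E.

    % Intended: symmetry of eq. The typo in the conclusion (E' E'
    % instead of E' E) is perfectly well-formed LF, so Twelf accepts it
    % without complaint -- the signature simply encodes a different
    % (and inadequate) relation than the one we meant:
    eq/sym : eq E E' -> eq E' E'.

Only a paper adequacy argument relating the signature to the intended
system can catch this sort of mistake.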
FPCC, I believe, remained foundational while using Twelf by sticking to
a very small set of axioms, namely those for higher-order logic.
Everything else was done by introducing definitions and providing
explicit derivations. This foundational approach means that things stop
working almost right away if the definitions are wrong, that is, the
canary dies before the miners.
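As a rough sketch of that style (my reconstruction, not FPCC's actual
signature), one writes down the connectives and rules of higher-order
logic once and for all:

    tm : type.               % individuals
    o  : type.               % formulas
    pf : o -> type.          % derivations

    imp    : o -> o -> o.
    forall : (tm -> o) -> o.

    imp/i    : (pf A -> pf B) -> pf (imp A B).
    imp/e    : pf (imp A B) -> pf A -> pf B.
    forall/i : ({x:tm} pf (F x)) -> pf (forall F).
    forall/e : pf (forall F) -> pf (F T).

Everything else (arithmetic, machine states, the safety policy) is a
definition in this logic, and every theorem is an explicit pf term that
the LF type checker verifies, so only this small signature and the
checker itself need to be trusted.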
Also, one reason I am interested in nominal logic and related techniques
is that they provide much of the benefit of higher-order techniques
(that is, alpha-equivalence comes for free and capture-avoiding
substitution can be done automatically) while remaining much closer to a
foundational system (as Christian mentioned, the techniques can be
implemented definitionally within higher-order logic/Isabelle/HOL).
I think that it would be appropriate to differentiate among
solutions/tools both on the basis of convenience, maturity of
implementation, and rapid prototyping (where Twelf clearly wins), and on
the basis of mathematical foundations (where Twelf is, to me at least,
not a convincing win). That is, I don't view the correctness of the
Twelf solution (or Coq or other putative solutions) to POPLmark as an
absolute judgment, but only as something that can be evaluated relative
to the trusted theory base and trusted computing base.
This difficulty would be ameliorated if the crucial components of
Twelf's meta-logic and proof checker were formalized in a foundational
system (or even by translation to Twelf proofs relative to a small,
trusted theory). I can understand why providing a formalization of
Twelf's meta-logic/proof checker in some other system may not seem like
a worthwhile endeavor to Twelf's developers, but in my view it (or
something equivalent) is a necessary sanity check. In the absence of
such a proof, it does not seem fair to me to compare the systems above
apples-to-apples with Twelf since they may be solving a harder problem.
The flip side of this is that *if Twelf meta-proofs are correct*, then
they evidently save you a whole lot of work in proving the same things. I
am not willing to take the "if" part of the previous sentence for
granted.
I also think that it would be helpful to draw the following distinctions
in discussions about the expressiveness of Twelf (or any other candidate
system for POPLmarking). It seems to me that there are, roughly
speaking, three kinds of formalizations possible in Twelf:
1. A "foundational" formalization in which one writes down the axioms
of some trusted logic (using HOAS, LF contexts, etc.) and then does
everything specific to a problem by introducing definitions and writing
down concrete derivations for theorems.
Here Twelf provides some assistance in the form of unification and
typechecking, but so do most of the other type-theoretic systems; also,
the latter provide tactics and fine-grained control over interactive
proof, which Twelf (by itself) does not at present. I think the FPCC
development is in this category.
2. A "elegant" formalization in which the intended methodology of the
system is used without compromise (e.g. in Twelf, full HOAS used for
syntax and LF context used for all hypothetical judgments). In such an
approach, one reasons explicitly (and only) about the things really in
the theory, such as terms, derivations, and judgments, instead of (for
example) reasoning by induction on the height of derivations encoded in
some unpleasant way as trees.
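The textbook example is the simply-typed lambda calculus, where
binding, substitution, and the typing context all come directly from
LF:

    tp  : type.
    arr : tp -> tp -> tp.

    exp : type.
    lam : tp -> (exp -> exp) -> exp.   % object binding via LF binding
    app : exp -> exp -> exp.

    of : exp -> tp -> type.
    of/lam : of (lam T1 E) (arr T1 T2)
              <- ({x:exp} of x T1 -> of (E x) T2).
    of/app : of (app E1 E2) T2
              <- of E1 (arr T1 T2)
              <- of E2 T1.

Substitution is just LF application (E x), and the typing hypothesis
of x T1 lives in the LF context, so there is no explicit environment
data structure to reason about.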
For any language for which such a formalization is adequate, Twelf is
clearly superior (modulo the foundational concerns I expressed above).
The problem is, how does one know *a priori* whether a problem has an
elegant formalization?
3. Something in between, where some features of the methodology are
used but others are not, or where additional well-formedness judgments
are needed: e.g., a logic such as linear logic (or bunched implications)
where one can make full use of the alpha-beta-eta-equivalence provided
by HOAS, but where the LF context isn't quite what we want, so we need
to represent it explicitly as a list/tree or use additional
well-formedness judgments to ensure correctness/adequacy. Another
example would be Andreas Abel's [Merlin 2001] second-order formulation
of the lambda-mu calculus, with mu : (cont A -> exp B) -> exp B rather
than the more elegant, but inadequate, mu : ((exp A -> exp B) -> exp B)
-> exp A (both signatures are spelled out below). I think POPLmark's
Fsub problems are also in this category.
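Spelling out Abel's two alternatives as a Twelf-style signature (the
tp-indexed families are my guess at the surrounding declarations; see
the paper for the real ones):

    tp   : type.
    exp  : tp -> type.
    cont : tp -> type.

    % the second-order formulation (adequate):
    mu  : (cont A -> exp B) -> exp B.

    % the "more elegant" but inadequate third-order alternative:
    % mu' : ((exp A -> exp B) -> exp B) -> exp A.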
For "in-between" formalizations, Twelf may or may not be a win,
depending on how work you save by using built-in alpha-beta-eta and the
LF context versus how much extra work it is to reason about low-level
things by hand instead of using tactics.
Twelf is clearly beneficial in situation 2, but suppose you have a
language that combines well-behaved features, e.g., quantification and
lambda, with some features that aren't handled elegantly, so that you
need to perform explicit reasoning about representations (e.g., encoding
heap cell labels as integers and propagating a counter through all
judgments for generating new labels, or reasoning about distinct names
in pattern matches). In that case, the parts of your language for which
Twelf provides assistance work great, but you have to do all the other
reasoning using more primitive tools than in a tactic-based prover.
(This sounds like the situation Dan described with FPCC.)
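To illustrate the label-counter point (a toy sketch of my own, with
the heap itself elided), the freshness information has to be threaded
explicitly through every judgment:

    loc : type.              % heap labels as numerals, not binders
    lz  : loc.
    ls  : loc -> loc.

    exp : type.
    ref : exp -> exp.
    lab : loc -> exp.

    % eval N E N' V: evaluating E with next-fresh-label N yields V and
    % next-fresh-label N'. Every rule must mention the counter.
    eval : loc -> exp -> loc -> exp -> type.
    eval/lab : eval N (lab L) N (lab L).
    eval/ref : eval N (ref E) (ls N') (lab N')
                <- eval N E N' V.

None of this bookkeeping is helped by HOAS or the LF context, and the
lemmas about it (e.g., that old labels stay distinct from fresh ones)
have to be proved by hand.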
I think that when some people have asked "can Twelf do X", they really
mean "can Twelf do X 'elegantly' (i.e., using approach 2)?" Since
"elegant" is a subjective term, I don't see the point of arguing about
it. I think it would be interesting to determine whether there is a
technical characterization of those languages for which the Twelf
methodology works as intended (i.e., without compromises or reverting
to a foundational approach), although I'm not sure this is even a sensible
technical question.
--James