[POPLmark] Some notes

Lauri Alanko la at iki.fi
Mon Oct 3 21:16:38 EDT 2005


Hello.

I was heartened by Harper's talk at ICFP, which suggested that
machine-checked language definitions might actually be feasible with
today's technology. Hence I decided to give the Challenge a shot
(with Coq). I need the practice anyway.

However, it seems that all the static documents at the Poplmark wiki
have disappeared, and in particular the challenge itself cannot be
found. Whoever maintains it, please fix it, and add a webmaster address
somewhere so no one need bother the entire list with technical issues.

Thankfully, the MMM paper and the submissions could be found elsewhere
on the web. These raised some thoughts. Since I intend to formalize
everything as faithfully to the original as I can, I need some
clarification on what is actually desired.

(I am also prone to ranting. Excuse me.)


For one thing, although the text says that all details of binding issues
are left to the implementer, there is nevertheless an explicit ban on
shadowing an existing variable in a type environment, regardless of
whether such a prohibition is useful (or even meaningful) for the
implementer's chosen binder representation. This seems a bit
inconsistent.
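
To illustrate: with named variables the side condition could be
rendered as a judgement saying that an environment never binds the same
type variable twice, while with a de Bruijn representation there would
be nothing for it to say at all. The identifiers below are my own
invention, not something the challenge prescribes.

Require Import List.

Parameter tyvar : Set.   (* abstract type variable names *)
Parameter bound : Set.   (* whatever a binding carries, e.g. an upper bound *)

Definition env := list (tyvar * bound).

Definition binds (X : tyvar) (G : env) : Prop :=
  exists b, In (X, b) G.

(* An environment is "ok" iff it never rebinds a variable. *)
Inductive ok : env -> Prop :=
  | ok_nil  : ok nil
  | ok_cons : forall G X b, ok G -> ~ binds X G -> ok ((X, b) :: G).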

(As an aside, it's interesting how paper definitions usually first
define a syntactic category with ordinary abstract BNF, and then give a
separate well-formedness judgement, even when well-formedness is not
context-dependent. In Coq it is natural to give just a single inductive
definition that includes both syntax and well-formedness constraints in
one go. Is there some deep theoretical reason why this direct definition
style (i.e. the one in TAPL 3.2.2) is not commonly practiced? Is there
some benefit (besides familiarity) in first giving a non-dependent
inductive definition for a set that also includes ill-formed terms?)
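
To make the contrast concrete, here is a minimal sketch of both styles
for a fragment of the type language, with de Bruijn indices and names
of my own choosing (this is not the representation the challenge asks
for, just an illustration):

(* Style 1: raw syntax plus a separate well-formedness judgement,
   as on paper. *)
Inductive ty : Set :=
  | ty_var   : nat -> ty                 (* de Bruijn index *)
  | ty_top   : ty
  | ty_arrow : ty -> ty -> ty
  | ty_all   : ty -> ty -> ty.

(* "wf_ty n T": T is well formed with n type variables in scope. *)
Inductive wf_ty : nat -> ty -> Prop :=
  | wf_var   : forall n i, i < n -> wf_ty n (ty_var i)
  | wf_top   : forall n, wf_ty n ty_top
  | wf_arrow : forall n T1 T2,
      wf_ty n T1 -> wf_ty n T2 -> wf_ty n (ty_arrow T1 T2)
  | wf_all   : forall n T1 T2,
      wf_ty n T1 -> wf_ty (1 + n) T2 -> wf_ty n (ty_all T1 T2).

(* Style 2: a single indexed family in which ill-formed types cannot
   even be written down: "Ty n" contains the types with at most n free
   type variables. *)
Inductive Ty : nat -> Set :=
  | Ty_var   : forall n i, i < n -> Ty n
  | Ty_top   : forall n, Ty n
  | Ty_arrow : forall n, Ty n -> Ty n -> Ty n
  | Ty_all   : forall n, Ty n -> Ty (1 + n) -> Ty n.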


I also feel somewhat uneasy about the set operations in e.g.
SA-TRANS-TVAR and SA-RCD, since no definitions for them were provided. A
formal theory of finite sets is not a big thing, but without such
definitions the subtyping rules are underdefined. I know, this is done
everywhere, except when explicit structural rules are provided. Do I
gather correctly that this is exactly the point: real-world paper
definitions just don't go all the way, and the formalizer just has to
make do with what's available?
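
For instance, the label sets in SA-RCD could be formalized along these
lines (taking labels to be strings and sets to be lists is just one
arbitrary choice on my part):

Require Import List String.

Definition label := string.
Definition labelset := list label.

Definition member (l : label) (L : labelset) : Prop := In l L.

Definition included (L1 L2 : labelset) : Prop :=
  forall l, member l L1 -> member l L2.

The point is only that the expected properties of these operations
(e.g. that inclusion is reflexive and transitive) then have to be
stated and proved explicitly, whereas the paper rules take them for
granted.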


Thirdly, a question regarding algorithmic, syntax-directed judgements. I
think these can be seen in two lights: either as inductively defined
relations that just happen to be decidable, or as algorithms that have
been defined as formal systems simply because more powerful machinery
(i.e. a programming language for defining algorithms) would not be worth
the complication on paper.

The distinction is important when formalizing: in the first case, we
have to define the relation and provide a decision procedure:

Inductive Sub : TypeEnv -> Type -> Type -> Prop := ... .
Lemma sub_decidable : forall G S T, Sub G S T \/ ~(Sub G S T).

On the other hand, if we view subtyping as inherently an algorithm, we
can define it directly as a program, since Coq provides the machinery
for doing so:

Definition is_sub : TypeEnv -> Type -> Type -> bool.

Of course we can do both, and show that they coincide:

Lemma is_sub_decides_sub : forall G S T, (is_sub G S T)=true <-> Sub G S T.

but this is more work and would lengthen the formalization.

Similarly, for e.g. challenge 3.3, should we think of one-step
evaluation as a relation with some decidability properties:

Inductive Eval : Term -> Term -> Prop := ... .
Lemma eval_decid : 
  forall t, (exists t', Eval t t') \/ ~(exists t', Eval t t').

or, when it is deterministic (as it is here), just as an algorithm for
determining the next step, if there is one:

Definition eval : Term -> option Term.
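
To make the dichotomy concrete, here is the same pair of definitions
spelled out for a toy language of booleans (everything below is my own
illustration, not part of the challenge):

Inductive term : Set :=
  | tm_true  : term
  | tm_false : term
  | tm_if    : term -> term -> term -> term.

(* One-step evaluation as a relation. *)
Inductive Step : term -> term -> Prop :=
  | st_if_true  : forall t2 t3, Step (tm_if tm_true t2 t3) t2
  | st_if_false : forall t2 t3, Step (tm_if tm_false t2 t3) t3
  | st_if       : forall t1 t1' t2 t3,
      Step t1 t1' -> Step (tm_if t1 t2 t3) (tm_if t1' t2 t3).

(* One-step evaluation as an algorithm: compute the next step, if any. *)
Fixpoint step (t : term) : option term :=
  match t with
  | tm_if tm_true  t2 t3 => Some t2
  | tm_if tm_false t2 t3 => Some t3
  | tm_if t1       t2 t3 =>
      match step t1 with
      | Some t1' => Some (tm_if t1' t2 t3)
      | None     => None
      end
  | _ => None
  end.

(* The bridge between the two readings would be lemmas like these: *)
Lemma step_sound : forall t t', step t = Some t' -> Step t t'.
Lemma step_complete : forall t t', Step t t' -> step t = Some t'.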

My concern is not so much with the work required by the various
approaches, but rather with the "spirit" of the paper definitions: do we
define algorithms as relations because we _want_ to define them as
relations (and should also define them as relations in a formalization),
or because we have no other way to define them on paper (which could and
should be corrected in a formalization)?


Finally, let me mention an aspect of this mechanization business that
has perhaps been overshadowed by the more immediate concerns about
reliability and article space. (This is the rant part.)

A programming language is not simply a tool for computation: it is also
a medium of expression. When we have a vague idea about how to solve
some problem, writing it down as a program not only gives us a tireless
servant who will solve the problem for us henceforth, but also forces us
to refine our original idea and scrub off all ambiguity and handwaving
until we are left with a pearl that shines with clarity.

I think a prime example of this is SPJ et al.'s "Composing contracts":
they start with a general idea of how financial derivatives work, and in
the process of writing it down in Haskell they end up with a far more
precise understanding of the subject. This is the real value of formal
methods: they help _us_, as humans, _think_.

Mechanized logical frameworks should of course be the optimal tools for
this purpose, and indeed this is what I mainly use Coq for. When I'm
studying math, the small (or not so small) formalist in me is always
annoyed by the definitions and theorems: quantifiers missing everywhere,
scopes of variables unclear, implicit assumptions unstated, and the
general unholy mishmash of natural language and pseudo-formal notation
always makes me doubt whether I have really understood what's being
said.

But when I see it written down _really_ formally, revealed from under
the fuzzy notation (and preferably lifted from the amorphous sea of ZFC
sets into a more structured universe), I can finally concentrate on the
whole definition, and nothing but the definition. Like this:

Structure Category : Type := 
  {Ob :> Type;
   Hom : Ob -> Ob -> Setoid;
   Op_comp : forall a b c : Ob, Map2 (Hom a b) (Hom b c) (Hom a c);
   Id : forall a : Ob, Hom a a;
   Prf_ass : Assoc_law Op_comp;
   Prf_idl : Idl_law Op_comp Id;
   Prf_idr : Idr_law Op_comp Id}.

At _this_ point (after looking at the auxiliary definitions), I can
finally convince myself that I sort of understand what a category is
supposed to be.

So far, so good. But a theory contains not only definitions and
propositions, but also proofs. This, I think, is what the Poplmark
Challenge is (or should be) really about. IMNSHO, a good logical
framework should allow proofs to be expressed just as precisely and
legibly as a good programming language allows algorithms to be
expressed. I already prefer reading a real program over pseudocode, and
a formal definition over an informal one. One day I hope to also prefer
reading a formal proof of a nontrivial theorem over an informal one.

But it doesn't look like that day is quite here yet. Of the submitted
solutions, only the ATS one looks like it expresses a proof reasonably
directly in the way that a human would prove things naturally (though
I'm not familiar with either ATS or Twelf). Coq is particularly bad
here: tactic scripts don't really reveal much of anything about the real
structure of the proof, and even when you look at the resulting proof
term, it usually turns out to be a humongous monster. Manual refactoring
and simplifying helps, but it is rather tedious. I don't think
there is anything inherently difficult in producing more legible proofs,
but this doesn't seem to be a very high priority in the Coq culture.

And this is something that worries me. In the future we may get correct
proofs at the press of a button, but I don't want the cost of that to be
our understanding of _why_ they are correct.


Lauri Alanko
la at iki.fi

