[POPLmark] Adequacy for LF

Tue May 3 01:23:21 EDT 2005

Hi Steve,

I think this is the root of our disagreement.  If the point was to 
compare proofs of "the same thing", then I agree that we did not satisfy 
the spirit of the challenge.  I thought the point was otherwise: to 
explore how ready automated systems are for expressing metatheoretical 
arguments.  Certainly that is what the point *should* be.  And if that 
is the point, then the process of coming up with the best presentation 
is an essential part of the exercise, just as picking the right data 
structures are an essential part of implementing an algorithm.  
(Remember, proofs are programs!)

This is why Bob and I have been so stubborn on this matter.  It would 
not have been very difficult for us to express the system with an 
evaluation context, but there is no reason whatsoever to do so and good 
reason not to.  One of the important dividends of a logic framework is 
that in helps to to come up with the cleanest possible presentation of 
your system.  Very often one actually makes minor changes to the 
language in response to what one learns by encoding it, although we did 
not do so here.

On point, when one formalizes operational semantics using evaluation 
contexts, one quickly learns how bad of an idea they are.  Evaluation 
contexts appear attractive in direct proportion to how much of the 
detail associated with them one elides.  When formalized, elision is not 
an option, and all the shortcomings come front-and-center.  (Jerome 
pointed out a few of these in his email.)

    -- Karl

Stephan Zdancewic wrote:

> Karl Crary wrote:
>
>> There's been some confusion about the adequacy of our encoding of the 
>> operational semantics; in particular, why our encoding is adequate, 
>> given that the paper presents the semantics using evaluation contexts 
>> and our encoding does not.  We wanted to clarify why it is adequate, 
>> and moreover, why we thought its adequacy was obvious.
>
>
> Hi Karl,
>  I would also like to elaborate a bit more on why we originally gave 
> the Twelf implementation a grade of "-" with respect to the 
> operational semantics.  Part of the confusion (and this is probably 
> our fault entirely) is that the point of PoplMark is *not* to prove 
> the soundness of FSub.  Instead, the point is to allow us to compare 
> proofs of "the same thing" using different theorem proving and term 
> representation strategies.  Because different term representations 
> will likely need to use different proof techniques, there has to be 
> some wiggle room for what exactly "the same thing" is, but by trying 
> to keep as many features of the formalism the same across different 
> implementations, we had hoped to make comparison easier.  In addition, 
> the other point of the PoplMark Challenge is to see how the 
> technologies scale to handle different language constructs, including 
> things like exceptions, etc. that are more cleanly handled by 
> evaluation contexts.  (See Peter's mail.)
>
>  As you rightly point out, the operational semantics for FSub doesn't 
> require evaluation contexts and omitting them may make the proofs 
> cleaner.  In fact, never have we said that your solution was 
> inadequate (in the technical sense).  Our objection was solely to the 
> fact that it didn't follow the spirit of the challenge.  We should 
> have clarified this from the beginning.
>
>  Putting the adequacy of your original encoding aside for the moment, 
> let me ask you this: How hard would it be to formalize the operational 
> semantics, using evaluation contexts, following the paper version as 
> closely as possible?  My impression is that it wouldn't be that much 
> harder than the existing solution.  If it *would* be significantly 
> harder, then that in itself is interesting, and I would like to 
> understand why.
> Thanks,
>  Steve
> _______________________________________________
> Poplmark mailing list
> Poplmark at lists.seas.upenn.edu
> http://lists.seas.upenn.edu/mailman/listinfo/poplmark
>
>