[POPLmark] Comparison of certified code transformations for functional languages?

Sat May 12 21:21:09 EDT 2007

Adam Chlipala wrote:
> Karl Crary wrote:
>> I know nothing at all about the development you're looking at, so I 
>> can't give you an answer in any case.  However, your question about 
>> "fairness" suggests that you're looking to draw some sort of 
>> conclusion from these numbers, so I'm curious:  What sort of 
>> conclusion are you looking to draw?  You're comparing lines of code 
>> in proofs of different results in different systems.  It's hard for 
>> me to see that the comparison could tell you much of anything.
>
> I don't think that the fact that it's different systems should matter 
> much, modulo standard concerns about measuring code size between 
> languages.  The choice of system is as much open to debate as other 
> elements of the formalization strategy.  If one system's 
> implementations are systematically bulkier, then that suggests a 
> useful conclusion about which system to use.

I would agree with you, provided disk space were a primary 
consideration.  On the contrary, I would argue that the number of lines 
is much less important than how easily those lines were written.  For 
example, the Nuprl proof development system admitted *extremely* compact 
proofs, but was less easy to use than more verbose systems like Twelf.  
(I believe you just made this point yourself in your other email.)

Adam Chlipala wrote:
> I don't deny that the theorems being proved are different, but they 
> aren't enormously different.  Some quantitative comparison seems 
> better than pure hand-waving.  It would naturally be better to have 
> apples-to-apples comparisons, but the last time I tried to facilitate 
> that by suggesting a compilation benchmark problem on this list, no 
> one responded with alternate solutions. ;)

We'll have to disagree on that score.  I think a misleading quantitative 
comparison is no better than hand-waving.  I'm sure we can agree that 
minor changes to a theorem can lead to radical differences its 
difficulty of proof.

By the way, why can't the POPLmark challenge serve for an 
apples-to-apples comparison?  It was certainly intended to.

    -- Karl