[TYPES] Typing extensible records without row variables?

Mon Aug 25 21:55:05 EDT 2008

Hi,
Many of the papers I've read that deal with the typing of extensible 
records use the concept of "row variables" ([1] [2] [3] [4]). Therefore 
if we use the syntax(*):

     'val r = {[title = "Duma"] [genre = Drama]}

to bind a record with 2 fields to (r) and

     {[certificate = U] :: r}

to denote the extension of (r) by a new field, with row variables the 
extension operator has type:

     {[_ : _] :: _} :
         'forall (l : LABEL) (t : *) (r : ROW) .
             Lab[l] -> t -> Rec[r] -> Rec{| [l : t] :: r |}

where {| [l : t] :: r |} denotes the row r extended by the label/type 
association, and (Rec) is a built-in type constructor of kind (ROW -> 
*). (Curried application is denoted by glued square brackets which can 
be omitted if the argument is itself a bracketed form, and keywords 
always start with a prime.)
For brevity the above can be written:

     {[_ : _] :: _} :
         Lab[l] t Rec[r] -> Rec{| [l : t] :: r |}
             | [l : LABEL] [t : *] [r : ROW]

where the postfix bar/square brackets are just an alternative way of 
introducing universally quantified variables so that the interesting 
"shape" part of the type can be written (and read) first, and the 
function has an arity (as in Clean) so there is only one arrow.

Some record systems only allow one occurrence of any given label per 
record, which is represented by a "lacks" predicate, which ensures that 
the row does not already contain the specified label:

     {[_ : _] :: _} :
         Lab[l] t Rec[r] -> Rec{| [l : t] :: r |}
             | [l : LABEL] [t : *] [r : ROW] Lacks[l r]

Now my question is: suppose instead of using a row variable we simply 
used a normal type variable (of kind *) but used a *predicate* to 
restrict this type variable to the subset of (*) which are records:

     {[_ : _] :: _} :
         Lab[l] t r -> { [l : t] :: r }
             | [l : LABEL] [t r : *] Record[r] Lacks[l r]

The point of doing this is that it gives (to my mind) a much more 
intuitive type for the extension operation: we no longer need to 
explicitly lift a row into a record using (Rec) or deal with the extra 
concept at all. Also, just as with (Lacks) predicates, the presence of 
the (Record) predicate could often be deduced from the shape of the type 
and therefore would often not need to be written explicitly by the 
programmer.

Of course one justification for breaking down the record concept into 
row + lift-to-record is that a variant can be represented by row + 
lift-to-variant and hence there is the possiblity to turn records into 
variants and vice versa at the term level ([1] 6.1). However to my mind 
there are several fundamental differences between records and variants 
in terms of how they are used in practical programming languages:

1) For variants it is important that the compiler can issue 
exhaustiveness warnings whereas for a record it doesn't matter if there 
are extra fields that are not needed by an operation.

2) To use variants in practice (e.g. in patterns) a syntax like <lab = 
val> would be far too verbose and therefore the mainstream functional 
languages use curried constructors (Haskell) or 0/1 constructors (SML) 
(which the implementation may or may not try to optimize by turning a 
constructor that takes one tuple of arity n into a constructor of arity 
n by erasing the tuple behind the scenes). However if we choose to use 
curried constructors as in Haskell the type associated with each variant 
label would have to be a "lifted tuple" (i.e. a tuple which is regarded 
as a container of zero or more elements so that a 1-tuple is not the 
same as the value itself) since we need to be able to distinguish 
between a function of arity one whose argument is a pair, and a function 
of arity two. Therefore the row corresponding to a variant would always 
have to be of a specific form and so there could not be free interchange 
between records and variants.

3) While it is obviously useful to have extensible records it is not at 
all clear to me whether or not extensible variants are similarly useful 
since there are other ways of achieving extensiblility in functional 
programming e.g. by adding new combinators to a combinator library and 
keeping the underlying representation variant completely hidden.

In any case the question of interchange between records and variants 
seems to be orthogonal to the question of whether rows need to be 
represented explicitly in the type system, since we could simply use 
built-in predicates and operations like:

     plus-elim : 'record-to[a r] v -> a
                    | a r v Record[r] Variant[v] EqRV[r v]

where EqRV[r v] ensures that each label in record type (r) is associated 
with the same type as the corresponding label in variant type (v), and 
that the set of labels in (r) and (v) is the same, i.e. it replaces the 
need to explicitly construct (r) and (v) from the same row variable.
(This is my translation of (plusElim) from ([1] 6.1), and the prime 
before the "record-to" helps to emphasize that this is a built-in type 
function.)

Anyway since the separate concept of row variable is so ubiquitous in 
the literature, and since I have not so far seen (Record) or (Variant) 
predicates used instead, I would be very interested to know if there's 
any other reason for using row variables that is not covered by the 
above, or if I'm missing something important, or if anyone else has had 
the same idea.

Thanks, Brian.

[1] " A Polymorphic Type System for Extensible Records and Variants"
	Benedict R. Gaster and Mark P. Jones
[2] "Lightweight Extensible Records for Haskell"
	Mark P. Jones and Simon Peyton Jones
[3] "First-class labels for extensible rows"
	Daan Leijen
[4] "Extensible records with scoped labels"
	Daan Leijen

(*) This is the comma-free(! :-)) syntax I've developed for the language 
I'm working on.