[TYPES] Typing extensible records without row variables?
Brian Hulley
brianh at metamilk.com
Mon Aug 25 21:55:05 EDT 2008
Hi,
Many of the papers I've read that deal with the typing of extensible
records use the concept of "row variables" ([1] [2] [3] [4]). Therefore
if we use the syntax(*):
'val r = {[title = "Duma"] [genre = Drama]}
to bind a record with 2 fields to (r) and
{[certificate = U] :: r}
to denote the extension of (r) by a new field, with row variables the
extension operator has type:
{[_ : _] :: _} :
'forall (l : LABEL) (t : *) (r : ROW) .
Lab[l] -> t -> Rec[r] -> Rec{| [l : t] :: r |}
where {| [l : t] :: r |} denotes the row r extended by the label/type
association, and (Rec) is a built-in type constructor of kind (ROW ->
*). (Curried application is denoted by glued square brackets which can
be omitted if the argument is itself a bracketed form, and keywords
always start with a prime.)
For brevity the above can be written:
{[_ : _] :: _} :
Lab[l] t Rec[r] -> Rec{| [l : t] :: r |}
| [l : LABEL] [t : *] [r : ROW]
where the postfix bar/square brackets are just an alternative way of
introducing universally quantified variables so that the interesting
"shape" part of the type can be written (and read) first, and the
function has an arity (as in Clean) so there is only one arrow.
Some record systems only allow one occurrence of any given label per
record, which is represented by a "lacks" predicate, which ensures that
the row does not already contain the specified label:
{[_ : _] :: _} :
Lab[l] t Rec[r] -> Rec{| [l : t] :: r |}
| [l : LABEL] [t : *] [r : ROW] Lacks[l r]
Now my question is: suppose instead of using a row variable we simply
used a normal type variable (of kind *) but used a *predicate* to
restrict this type variable to the subset of (*) which are records:
{[_ : _] :: _} :
Lab[l] t r -> { [l : t] :: r }
| [l : LABEL] [t r : *] Record[r] Lacks[l r]
The point of doing this is that it gives (to my mind) a much more
intuitive type for the extension operation: we no longer need to
explicitly lift a row into a record using (Rec) or deal with the extra
concept at all. Also, just as with (Lacks) predicates, the presence of
the (Record) predicate could often be deduced from the shape of the type
and therefore would often not need to be written explicitly by the
programmer.
Of course one justification for breaking down the record concept into
row + lift-to-record is that a variant can be represented by row +
lift-to-variant and hence there is the possiblity to turn records into
variants and vice versa at the term level ([1] 6.1). However to my mind
there are several fundamental differences between records and variants
in terms of how they are used in practical programming languages:
1) For variants it is important that the compiler can issue
exhaustiveness warnings whereas for a record it doesn't matter if there
are extra fields that are not needed by an operation.
2) To use variants in practice (e.g. in patterns) a syntax like <lab =
val> would be far too verbose and therefore the mainstream functional
languages use curried constructors (Haskell) or 0/1 constructors (SML)
(which the implementation may or may not try to optimize by turning a
constructor that takes one tuple of arity n into a constructor of arity
n by erasing the tuple behind the scenes). However if we choose to use
curried constructors as in Haskell the type associated with each variant
label would have to be a "lifted tuple" (i.e. a tuple which is regarded
as a container of zero or more elements so that a 1-tuple is not the
same as the value itself) since we need to be able to distinguish
between a function of arity one whose argument is a pair, and a function
of arity two. Therefore the row corresponding to a variant would always
have to be of a specific form and so there could not be free interchange
between records and variants.
3) While it is obviously useful to have extensible records it is not at
all clear to me whether or not extensible variants are similarly useful
since there are other ways of achieving extensiblility in functional
programming e.g. by adding new combinators to a combinator library and
keeping the underlying representation variant completely hidden.
In any case the question of interchange between records and variants
seems to be orthogonal to the question of whether rows need to be
represented explicitly in the type system, since we could simply use
built-in predicates and operations like:
plus-elim : 'record-to[a r] v -> a
| a r v Record[r] Variant[v] EqRV[r v]
where EqRV[r v] ensures that each label in record type (r) is associated
with the same type as the corresponding label in variant type (v), and
that the set of labels in (r) and (v) is the same, i.e. it replaces the
need to explicitly construct (r) and (v) from the same row variable.
(This is my translation of (plusElim) from ([1] 6.1), and the prime
before the "record-to" helps to emphasize that this is a built-in type
function.)
Anyway since the separate concept of row variable is so ubiquitous in
the literature, and since I have not so far seen (Record) or (Variant)
predicates used instead, I would be very interested to know if there's
any other reason for using row variables that is not covered by the
above, or if I'm missing something important, or if anyone else has had
the same idea.
Thanks, Brian.
[1] " A Polymorphic Type System for Extensible Records and Variants"
Benedict R. Gaster and Mark P. Jones
[2] "Lightweight Extensible Records for Haskell"
Mark P. Jones and Simon Peyton Jones
[3] "First-class labels for extensible rows"
Daan Leijen
[4] "Extensible records with scoped labels"
Daan Leijen
(*) This is the comma-free(! :-)) syntax I've developed for the language
I'm working on.
More information about the Types-list
mailing list