[TYPES] Complexity of row unification

Henning Makholm henning at makholm.net
Thu Feb 17 16:33:49 EST 2005


Scripsit Francois Pottier <Francois.Pottier at inria.fr>
> On Sun, Feb 13, 2005 at 03:34:25AM +0000, Henning Makholm wrote:

>> Unless I'm much mistaken, just a tiny bit of ingenuity will get it down to
>> O(nm*alpha(n)),

> Sounds plausible, although my own attempts, which involved sorting
> labels, brought up an additional log(m) factor, I think.

The ingenuity I had in mind was to intern all labels into a trie which
maps them to small integers < m, as they are first read from the input.
Keep these indices around in your internal representations of rows,
and you can bucketsort any set of labels by "order of first seeing
them" in time O(m).

The initial trie-building happens only once and takes time
proportional to the total _length_ of all labels in the input
(assuming that they are given as strings in a finite alphabet), and
can therefore be swallowed by the O(n) factor.

For most large practical programs - which mention many different
labels but seldom combine a significant fraction of them into the
same row - I'm pretty sure that a comparison-based label sort will
perform better than this.  So this trick would be aimed more at
bounding the worst-case complexity.  One might imagine an
implementation that switched seamlessly to bucketsorting when it
needed to unify two rows and saw that they had more than c*(m/log m)
explicit labels in total, for some fixed c, to get the best of both
worlds.  But I doubt that even that is worth the trouble in practice.

> I have published a paper about the complexity of solving nonstructural
> subtyping constraints that involve rows (in LICS'03).

Yes, I've read that.

-- 
Henning Makholm      "I could not help noticing that he walked on his hands."

