[TYPES] Type theory of file formats

Jimmy Koppel jkoppel at mit.edu
Sun Aug 21 21:47:04 EDT 2022


On Sun, Aug 21, 2022 at 9:03 AM Christopher League <
christopher.league at liu.edu> wrote:

>
> Hi Jimmy,
>
> I recognize something much like this as "row types" or "row
> polymorphism".  I encountered the idea in Rémy and Vouillon POPL 97,
> Objective ML.
>
> Interestingly, with row variables, the capability isn't quite modeled as
> subtyping but rather *parametric* polymorphism of the extension of the
> record type.
>
> Combining the two can be interesting, and there are probably some
> object-modeling papers that do it: subtyping for "depth" extensibility, and
> rows for "flat" extensions.  To represent objects with subtyping but retain
> efficient, flat (non-boxed) in-memory representation was the goal.  That
> seems similar to your forward-compatible binary file format use case,
> though of course some other challenges could arise too!
>
> Hope that helps, and good luck.
>
> CL
>


 Hi Chris,

(You seem to have replied to just me, but I'll assume that's a mistake.)

This is  a good connection that I hadn't made before; thank you. Row types
look to have more in common with extensibility of JSON formats, which is
much easier to do, and much easier to model. I'm still hoping to find some
type theory that actually cares about indexes into a byte array.

>
> Jimmy Koppel <jkoppel at mit.edu> writes:
>
> >
> > [ The Types Forum,
> https://urldefense.com/v3/__http://lists.seas.upenn.edu/mailman/listinfo/types-list__;!!DeIc-uvKXH9G!-FSN5V0yiQnRrDfBehDbLji9VJVqftqzIlLIBOjjFe8MTrsS8N_FjfXHL2ytBS4UpGbubf-tZBHE9w1inF6SMi0$
>  ]
> >
> > There’s a beautiful fact, not often discussed, that the theory of
> subtyping
> > explains forwards compatibility between file formats.
> >
> >
> > For example here’s a simple file format for images:
> >
> >
> > type header_length_v1 = {x : uint32 | x >= 2}
> >
> > type width = uint32
> >
> > type height = uint32
> >
> > type header_v1(n) =  width * height * uint32[n-2]
> >
> > type image_data = uint32[]
> >
> >
> > type image_file_v1 = {n: header_length_v1} * header_v1(n) * image_data
> >
> >
> >
> > This type allows for arbitrary extensions to the header, so that any
> > extension to this format which adds more header fields is a subtype. A
> > reader for this file format is thus forwards-compatible with any format
> > extension which adds more header fields.
> >
> >
> > However, even though I wrote it them above as products, the components
> of a
> > binary file format are not products. In particular, they do not allow
> > nesting, e.g.: (a*b*c)*d is not compatible with (a*b)*d.  Compatibility
> > between two binary formats is determined  by the same data being in “the
> > same place.”  Even though, classically, uint32 * uint8 is a subtype of
> > uint32, one could not replace  width in the type above with (uint32 *
> uint8),
> > as that would move the location of the height from bytes 8-11 to bytes
> > 9-12, breaking existing code.
> >
> >
> > For my class, I made up a “flat sequence type” to describe file formats,
> > which effectively functions as a product type which must be
> > right-associated. I would instead write the header(n) type as width ++
> > height ++ uint32[n-2]. I have not worked out all the theory, but I’ve
> > noticed fewer students making mistakes about binary compatibly since
> > introducing this idea.
> >
> >
> > Can anyone point me to related work?
> >
> >
> > Yours,
> >
> > Jimmy Koppel
> >
> > MIT CSAIL
>


-- 
Jimmy Koppel
twitter.com/jimmykoppel
https://urldefense.com/v3/__http://www.jameskoppelcoaching.com__;!!IBzWLUs!S4wpNwoe6w2Q7s9z-HapvFj1KoePa4YHN2QBxnFvwP6rg1O-gEHM8dt2LBxLC7LRGAkE5d947CNiDqY20kikLkJypQWF$  


More information about the Types-list mailing list