[Unison-hackers] Exploring alternatives to OCaml's marshalling

Stéphane Glondu steph at glondu.net
Wed Feb 19 08:53:56 EST 2020


Hello,

Currently, Unison depends on OCaml's marshalling, which turns out not to
be very stable across OCaml versions. As a consequence, two builds of the
same Unison version may be incompatible with each other when they are
compiled with different OCaml versions.
This makes offering a Debian package for Unison a nightmare (see for
example [1] or [2]).

[1] https://github.com/bcpierce00/unison/issues/94
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=946041

Moreover, stability across versions is not a goal of OCaml's
marshalling, which is why I think Unison should switch to something else.
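
To illustrate the type-independent nature of OCaml's marshalling, here
is a minimal sketch using only the standard Marshal module. The value
and its type are made up for the example and are not taken from Unison.
The byte string carries no type information: the reader asserts the
type with an annotation, and the byte layout is whatever the OCaml
runtime that wrote it happens to produce.

```ocaml
(* Illustrative value; not a Unison data structure. *)
let v : (string * int) list = [ ("a", 1); ("b", 2) ]

(* Serialize using the runtime's own representation, with no flags. *)
let s : string = Marshal.to_string v []

(* The annotation is the only "schema": nothing checks that it matches
   what was actually written. *)
let v' : (string * int) list = Marshal.from_string s 0

let () =
  Printf.printf "marshalled %d entries into %d bytes\n"
    (List.length v') (String.length s)
```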

I can imagine several alternatives, including:
1) reimplement (or embed) one fixed version of OCaml's marshalling
   inside Unison, keeping a type-independent strategy;
2) implement a different type-independent marshalling function (using
   the Obj module);
3) implement type-oriented and "explicit" marshalling.

All of them require some kind of "monitoring" of the evolution of the
OCaml language or implementation. I believe 1) and 2) require monitoring
of low-level details of the OCaml implementation, something an avid
OCaml follower may do, but not something we can impose on Unison
contributors. With 3), on the contrary, the language (and its tooling)
can help with adapting to a new version of OCaml. So I decided to
explore 3).

The idea is to identify all types that need marshalling, and implement
type-safe (un)marshalling functions for these types. This can be done by
annotating each such type with an attribute that a PPX preprocessor
would use to generate the functions. Several such PPXs already exist;
as suggested in [1], I looked into ppx_protobuf and ppx_bin_prot
(available in opam).
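
As a rough idea of what "explicit", type-oriented marshalling means,
here is a hand-written sketch in plain OCaml, without any PPX. The
record type `entry`, the helper names and the wire format (8-byte
little-endian integers, length-prefixed strings) are all assumptions
made for this example; they are neither Unison's types nor the formats
produced by ppx_protobuf or ppx_bin_prot. A PPX would generate functions
of this general shape from the type definition.

```ocaml
(* Hypothetical type, for illustration only. *)
type entry = { path : string; size : int; children : entry list }

(* Write an int as 8 little-endian bytes, independent of the word size. *)
let write_int buf n =
  let n = Int64.of_int n in
  for i = 0 to 7 do
    let shifted = Int64.shift_right_logical n (8 * i) in
    let byte = Int64.to_int (Int64.logand shifted 0xffL) in
    Buffer.add_char buf (Char.chr byte)
  done

(* Read back an int written by [write_int]; [pos] is advanced by 8. *)
let read_int s pos =
  let n = ref 0L in
  for i = 0 to 7 do
    let byte = Int64.of_int (Char.code s.[!pos + i]) in
    n := Int64.logor !n (Int64.shift_left byte (8 * i))
  done;
  pos := !pos + 8;
  Int64.to_int !n

(* Strings are written as their length followed by their bytes. *)
let write_string buf s =
  write_int buf (String.length s);
  Buffer.add_string buf s

let read_string s pos =
  let len = read_int s pos in
  let r = String.sub s !pos len in
  pos := !pos + len;
  r

(* One writer and one reader per type, following the type's structure. *)
let rec write_entry buf e =
  write_string buf e.path;
  write_int buf e.size;
  write_int buf (List.length e.children);
  List.iter (write_entry buf) e.children

let rec read_entry s pos =
  let path = read_string s pos in
  let size = read_int s pos in
  let n = read_int s pos in
  (* Read the children one by one, in the order they were written. *)
  let rec loop k acc =
    if k = 0 then List.rev acc else loop (k - 1) (read_entry s pos :: acc)
  in
  let children = loop n [] in
  { path; size; children }
```

Because each function follows the type definition, changing the type
makes the compiler point at the functions that need updating, which is
the kind of help from the language mentioned above. Note that this
sketch accumulates its output in a Buffer, so it is subject to the same
string size limit as any bytes-based approach.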

Unison is easily adapted to both.

However, ppx_protobuf uses "bytes" as buffers and does not support
bigarrays or reading/writing directly to a channel, so it is impossible
to handle structures that are bigger than Sys.max_string_length, which
is ridiculously low on 32-bit architectures (16 MB IIRC). And I easily
get archives bigger than that.
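
The limit can be checked on any given system with the small program
below. The helper `fits_in_bytes` is a hypothetical name introduced for
this example. On a 32-bit OCaml system the reported value should be
16777211 bytes, just under 16 MiB; on 64-bit systems it is far beyond
any realistic archive size.

```ocaml
(* Hypothetical helper: can a payload of [len] bytes be held in a single
   string or bytes value on the running system? *)
let fits_in_bytes len = len >= 0 && len <= Sys.max_string_length

let () =
  Printf.printf "word size: %d bits\n" Sys.word_size;
  Printf.printf "Sys.max_string_length: %d bytes\n" Sys.max_string_length;
  (* 32 MiB is an arbitrary example size for a large archive. *)
  Printf.printf "32 MiB fits in one buffer: %b\n"
    (fits_in_bytes (32 * 1024 * 1024))
```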

Ppx_bin_prot uses bigarrays as buffers and does not suffer from the
Sys.max_string_length limitation, but it uses a less standard binary
format (albeit a specified one) and has many more dependencies (another
Debian packaging nightmare). Since it is part of the Jane Street stack,
I am somewhat confident that it will be properly maintained in the
future. But who knows.

In both cases, the compatibility problem is shifted to the marshalling
libraries. Contrary to OCaml's marshalling, these libraries seem to
aim for stability. But, again, who knows.

I've pushed both versions to my fork on GitHub:

  https://github.com/glondu/unison/tree/protobuf
  https://github.com/glondu/unison/tree/bin_prot

I've been using the protobuf branch [between two (64-bit) Linux hosts]
for 15 days now with no issues, so the principle of the transformation
(which is the same in the bin_prot branch) looks sound. However, because
of the 32-bit limitation, I've been focusing my efforts on the bin_prot
branch lately. An alternative would be to change ppx_protobuf to use
bigarrays.

I have not tested on OSX or Windows.

I am now seeking feedback about my methodology and implementation.


Cheers,

-- 
Stéphane

