[Unison-hackers] path forward on independence from ocaml version

Greg Troxel gdt at lexort.com
Thu Jan 27 10:06:23 EST 2022


Tõivo has added RPC version negotiation that is backwards compatible, so
with that feature, we can add a new RPC version without imposing a flag
day on users.  This not only makes adding a new wire protocol far
easier, but also makes adding the next wire protocol later far less of a
big deal.
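To make the idea concrete, here is a minimal sketch of one common way such negotiation can work: each side lists the RPC versions it supports, and both use the highest version they share. This is an assumption for illustration only, not the code Tõivo wrote; `negotiate` and the version numbers are hypothetical.

```ocaml
(* Hypothetical sketch of backwards-compatible version negotiation.
   Not Unison's actual implementation. *)

(* Return the highest element of [ours] that also appears in [theirs],
   or [None] if the two sides share no version. *)
let negotiate (ours : int list) (theirs : int list) : int option =
  List.fold_left
    (fun best v ->
       if List.mem v theirs then
         (match best with
          | Some b when b >= v -> best
          | _ -> Some v)
       else best)
    None ours

let () =
  (* A newer client that still speaks the old protocol (version 0) can
     talk to an older server that only knows version 0. *)
  match negotiate [0; 1] [0] with
  | Some v -> Printf.printf "using RPC version %d\n" v
  | None -> print_endline "no common RPC version"
```

Because an old peer never advertises the new version, the old protocol keeps being used until both ends are upgraded, which is what avoids the flag day.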

The next step is to decide on an approach for a new wire protocol and
then to get an implementation staged as a pull request.

Or perhaps a new archive format first, which may be easier and might be
a good choice in terms of sequencing, even though it has less of an
impact.

The obvious requirement is that it be an implementation of a protocol
spec, so that it is independent of ocaml version, as well as independent
of CPU type, ILP32/LP64/ILP64/etc. and endianness.  I won't insist on
it working on a PDP-11, but it ought to work on Vax, sparc, sparc64, BE
mips, etc., as long as ocaml does.
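Endianness and word-size independence mostly come down to fixing the byte layout of every value on the wire instead of relying on the host's in-memory representation. A minimal sketch in plain OCaml, assuming a hypothetical big-endian 64-bit integer field; `put_int64_be` and `get_int64_be` are illustrative names. It uses only `Buffer`, `Int64`, and `Char`, so it should build on old compilers (newer OCaml also offers `Bytes.set_int64_be`, added in 4.08).

```ocaml
(* Hypothetical helpers: write/read a 64-bit integer as exactly 8 bytes,
   most significant byte first, regardless of host byte order. *)

let put_int64_be (buf : Buffer.t) (n : int64) : unit =
  for i = 7 downto 0 do
    (* Extract byte i (0 = least significant) as a small native int. *)
    let byte =
      Int64.to_int (Int64.logand (Int64.shift_right_logical n (8 * i)) 0xFFL)
    in
    Buffer.add_char buf (Char.chr byte)
  done

let get_int64_be (s : string) (pos : int) : int64 =
  let r = ref 0L in
  for i = 0 to 7 do
    (* Shift previous bytes up and append the next one. *)
    r := Int64.logor (Int64.shift_left !r 8)
           (Int64.of_int (Char.code s.[pos + i]))
  done;
  !r

let () =
  let buf = Buffer.create 8 in
  put_int64_be buf 0x0102030405060708L;
  let s = Buffer.contents buf in
  (* The layout is the same on every host: 01 02 03 04 05 06 07 08. *)
  assert (s = "\001\002\003\004\005\006\007\008");
  assert (get_int64_be s 0 = 0x0102030405060708L)
```

Only values 0..255 ever pass through a native `int`, so the same bytes come out on ILP32 and LP64 hosts and on big- and little-endian CPUs.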

Less obvious requirements and desirable properties:

 - a format that is already standardized is preferred.  JSON or BSON
   would be a win.

 - efficient encoding, both in space and in CPU time to convert to/from
   the format

 - avoid homegrown code, relying instead on already-debugged external
   libraries and standards

 - minimal external dependencies, because building unison will require
   those.  One library that is in widespread use and already packaged in
   most packaging systems, and easy in the rest, would be totally ok.

 - avoid vendoring in external code

 - new code should go back to some oldish ocaml version, to offer a
   non-flag-day upgrade path to people.  But right now people have to
   have flag days so this is not critical.  It's an interesting question
   how old matters; I'll ask on -users.

These requirements conflict with each other, more or less.  It would be
ideal if there were a library that marshals to BSON, if it had a spec, if
it were mature, if the rest of the ocaml world used it, if it supported
back to say 4.02, and if it and all dependencies were widely packaged.

I am inclined to give up on efficiency first, as size of metadata does
not seem like unison's big problem.

Back to reality, this seems to be the closest external library (3 links,
one thing):

  https://gitlab.com/nomadic-labs/data-encoding/
  https://nomadic-labs.gitlab.io/data-encoding/data-encoding/Data_encoding/index.html
  https://discuss.ocaml.org/t/ann-data-encoding-0-3-performances-and-streaming/7205

and issues are:

  no apparent spec (minor)
  ocaml 4.10 required
  fairly large dependency footprint

The same people have https://gitlab.com/nomadic-labs/json-data-encoding
which seems to be 1) JSON only, 2) lighter weight, but 3) still 4.10 and
4) still a fairly hefty dependency load.


There is also umarshal, a work in progress by Stephane, which is similar
to data_encoding's Binary option and to ocaml's Marshal, but with a fixed
protocol.  While it's homegrown, it should be possible to have ocaml
compat going way back, and I don't expect it will need a lot of
maintenance.
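For comparison with Marshal, whose output follows the runtime's value representation, a fixed protocol writes each type's layout out by hand. The sketch below is a hypothetical, stdlib-only illustration of that combinator style (4-byte big-endian counts and lengths); the names `t`, `int32`, `string`, `pair`, `list`, `to_string`, and `of_string` are made up here and are not umarshal's actual API.

```ocaml
(* Hypothetical fixed-format binary encoding built from combinators.
   Illustrative only; not umarshal's API. *)

(* An encoding pairs a writer and a reader for one type.  [read] starts
   at [!pos] and advances [pos] past what it consumed. *)
type 'a t = {
  write : Buffer.t -> 'a -> unit;
  read : string -> int ref -> 'a;
}

(* 32-bit integers: 4 bytes, most significant first. *)
let int32 : int32 t = {
  write = (fun buf n ->
    for i = 3 downto 0 do
      let b = Int32.to_int (Int32.logand (Int32.shift_right_logical n (8 * i)) 0xFFl) in
      Buffer.add_char buf (Char.chr b)
    done);
  read = (fun s pos ->
    let r = ref 0l in
    for i = 0 to 3 do
      r := Int32.logor (Int32.shift_left !r 8) (Int32.of_int (Char.code s.[!pos + i]))
    done;
    pos := !pos + 4;
    !r);
}

(* Strings: a 4-byte length prefix followed by the raw bytes. *)
let string : string t = {
  write = (fun buf s ->
    int32.write buf (Int32.of_int (String.length s));
    Buffer.add_string buf s);
  read = (fun s pos ->
    let len = Int32.to_int (int32.read s pos) in
    let v = String.sub s !pos len in
    pos := !pos + len;
    v);
}

(* Pairs: first component, then second (order enforced with [let]). *)
let pair (a : 'a t) (b : 'b t) : ('a * 'b) t = {
  write = (fun buf (x, y) -> a.write buf x; b.write buf y);
  read = (fun s pos -> let x = a.read s pos in let y = b.read s pos in (x, y));
}

(* Lists: a 4-byte element count followed by the elements. *)
let list (e : 'a t) : 'a list t = {
  write = (fun buf l ->
    int32.write buf (Int32.of_int (List.length l));
    List.iter (e.write buf) l);
  read = (fun s pos ->
    let n = Int32.to_int (int32.read s pos) in
    let rec go k acc =
      if k = 0 then List.rev acc
      else let x = e.read s pos in go (k - 1) (x :: acc)
    in
    go n []);
}

let to_string (e : 'a t) (v : 'a) : string =
  let buf = Buffer.create 64 in
  e.write buf v;
  Buffer.contents buf

let of_string (e : 'a t) (s : string) : 'a = e.read s (ref 0)

let () =
  (* e.g. a list of (path, size) pairs *)
  let enc = list (pair string int32) in
  let data = [ ("a/b", 12l); ("c", 0l) ] in
  let bytes = to_string enc data in
  assert (of_string enc bytes = data);
  Printf.printf "%d bytes\n" (String.length bytes)
```

The point of the design is that the byte layout is defined by this code (and could be written down as a spec), not by whichever ocaml version produced it.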


Tõivo and I have discussed this, and we lean toward umarshal because we
think it will solve our problem with the least total work, and the "avoid
boutique protocol" concern shouldn't get that much weight compared to
the pain of requiring 4.10 and large dependencies.


Therefore, questions for hackers@:

  Does anybody know of a library that meets requirements, or close
  enough to discuss?

  Any thoughts that this path is unwise?

  The list has been quiet.  It looks like this will be Tõivo taking the
  existing code forward, with some code review by me, a few people
  testing, and that's it.  So don't hold your breath, and it would be
  great if more people dug in.

Greg