[Unison-hackers] Re: a common patch set - first attempt [repost]

Mon Jun 13 15:16:55 EDT 2005

> The fundamental issue is this.  Unison's RPC mechanism is built on  
> OCaml's marshalling and unmarshalling primitives, which are fast but  
> type-unsafe: when you unmarshall a value, you just tell OCaml what type  
> you expect it to be, and that's the type you get.  If the actual  
> bytestring you are unmarshalling is not a marshalled value of that  
> type, chaos can ensue.  Since we don't want chaos happening during file  
> synchronization, we wrap the raw marshalling routines in a type safety  
> layer that adds a tag (unique for each remotely callable procedure) to  
> each packet containing marshalled data; this ensures that both ends of  
> the RPC agree about what procedure is being called, and from this the  
> type system guarantees safety... *provided* that both client and server  
> are running binaries compiled from the same sources.  This is why  
> Unison checks at the beginning of each run that the server's version  
> number is the same as the client's.

Okay, thank you.  This is the first full explanation I've seen about why 
different versions of Unison won't talk to each other.
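
For readers who have not used OCaml's Marshal module, the tagging idea
described in the quoted paragraph can be sketched roughly as follows.  This
is only an illustration, not Unison's actual RPC code: the names pack,
unpack and Tag_mismatch are hypothetical, and the newline-separated header
is an assumption made to keep the example short.

```ocaml
(* Hypothetical sketch of a tagged marshalling layer; not Unison's code. *)

exception Tag_mismatch of string * string  (* expected tag, received tag *)

(* Prefix the marshalled value with the tag of the remote procedure.
   Assumption: tags never contain a newline. *)
let pack (tag : string) (v : 'a) : string =
  tag ^ "\n" ^ Marshal.to_string v []

(* Check the tag first; only then unmarshal the payload.  Marshal itself
   performs no type check, so the result type is whatever the caller asks
   for.  That is safe only if both ends were built from the same type
   declarations. *)
let unpack (expected : string) (packet : string) : 'a =
  let i = String.index packet '\n' in
  let tag = String.sub packet 0 i in
  if tag <> expected then raise (Tag_mismatch (expected, tag));
  Marshal.from_string packet (i + 1)
```

The tag catches a disagreement about which procedure is being called, but
not a disagreement about the shape of its argument type, which is why the
version check is still needed.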

> Of course, this condition is actually stronger than necessary.  If the  
> code has changed but the declarations of data types have not, then  
> there is no problem.  But there is no automatic way to tell whether  
> this is the situation.

OK.

> I see a few possibilities for improving matters...
> 
> 1) Adopt a convention that, whenever anyone makes a change to a  
> marshalled data structure, they bump the major version number, and make  
> Unison's startup test just the major version number and ignore the  
> minor version number.  This seems viable (especially since we have  
> multiple people looking at commit messages and if one person forgets,  
> someone else is likely to notice), and it has the great advantage of  
> being simple.

Yes, this sounds easy and fairly good.
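
A startup test along the lines of option 1 could look something like the
sketch below.  It is an assumption-laden illustration: the function names
are hypothetical, and since the thread does not pin down how many leading
components count as the "major" version, that number is left as the
parameter n.

```ocaml
(* Hypothetical sketch of a relaxed version check; not Unison's code. *)

(* Keep at most the first [n] elements of a list. *)
let rec take n = function
  | x :: rest when n > 0 -> x :: take (n - 1) rest
  | _ -> []

(* Two version strings are treated as compatible when their first [n]
   dot-separated components are present and equal. *)
let compatible ~(n : int) (client : string) (server : string) : bool =
  let c = take n (String.split_on_char '.' client) in
  let s = take n (String.split_on_char '.' server) in
  List.length c = n && c = s
```

With n = 2, compatible accepts 2.10.2 against 2.10.5 and rejects 2.10.2
against 2.12.0; with n = 1 it accepts both pairs.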

> 2) Add a preference allowing users to specify explicitly that Unison  
> should accept specific other versions as "close enough" to talk to  
> safely.  This would allow people to upgrade to "slightly newer"  
> versions easily, but requires some work from users.  (This work could  
> perhaps be performed by packagers on behalf of users, by editing the  
> code before compiling it so that some values for this preference are  
> built in.)

I appreciate the thought behind this, because I've often thought that 
there must be larger equivalence classes of compatible versions.  But we 
can't expect users or packagers to know which versions are compatible.  
I'm both a user and a packager, and I have no knowledge of Unison's data 
structures and when they've changed enough to make them incompatible 
with other versions.  I would have to discover them by trial and error, 
which is unsatisfactory.  So really the information would have to come 
from you, the developers, which puts us back in option #1.

> 3) Change the RPC mechanism so that each RPC is tagged with a hash of  
> the actual type declaration for the data being communicated.  This is  
> quite a bit of work, but it's certainly the best solution technically:  
> completely safe, completely automatic.  Eijiro Sumii and I have been  
> developing some low-level infrastructure that could perhaps be used for  
> this (but don't hold your breath :-).

I agree that this sounds good, if you had time to implement it.  But we 
would still need a way of signaling to packagers that a new, 
incompatible version had been released, so that a new package could be 
prepared.  And, we'd still need a rule for telling users what version to 
run, i.e. if you need to synchronize with a server running version 2.a.b 
through 2.c.d, then run version 2.e.f.  This would be much easier if you 
can just tell everyone that the first two parts of the version number 
have to match, as in #1.
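
To make option 3 concrete, here is a rough sketch of a tag derived from a
hash of the type declaration.  It is not the infrastructure mentioned in the
quoted text: the declaration is passed in by hand as a string, because OCaml
does not expose type declarations at run time, and type_tag and the example
declarations are hypothetical.

```ocaml
(* Hypothetical sketch: the declaration text is supplied by hand, because
   OCaml does not expose type declarations at run time. *)

(* Tag derived from the procedure name and the text of its data type. *)
let type_tag (proc : string) (decl : string) : string =
  Digest.to_hex (Digest.string (proc ^ "\n" ^ decl))

let old_decl = "type info = { size : int; mtime : float }"
let new_decl = "type info = { size : int; mtime : float; perms : int }"

(* The same declaration on both ends gives the same tag; a changed
   declaration gives a different one, so the mismatch is detected per call
   rather than per release. *)
let same = type_tag "getInfo" old_decl = type_tag "getInfo" old_decl
let differs = type_tag "getInfo" old_decl <> type_tag "getInfo" new_decl
```

Under this scheme two releases could interoperate whenever the types they
actually exchange are unchanged, whatever their version numbers say.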

> 4) Reorganize the Unison repository and development process into  
> different branches -- "trunk", "stable", etc.  I'm reluctant to take  
> this path because it increases the overhead of making improvements to  
> Unison: we are still in a situation where very few people feel able to  
> make significant changes to the code, and I don't want to decrease  
> their motivation for doing so!

Do you mean that there would be, for example, a current development 
version as there is now, plus patched versions of e.g. 2.10.2 and 
2.12.0?  This sounds like a good idea to me, and in fact it's pretty 
close to what we have now.  When major fixes are merged into the 
development tree, someone (usually Jerome) then also releases them as 
patches to versions 2.10.2 and 2.12.0.  So the work of backporting 
changes (and ensuring that the changes don't break version 
compatibility) is already being done.  What's different is that instead 
of the patches going into a CVS repository, they go to the mailing 
lists, where I and the other packagers scoop them up and add them to our 
own patch repositories.  This seems like a less desirable method, but it 
does basically work so far.  My Unison patches page is an attempt to 
standardize the patch set.

I also don't want to make your and Jerome's work harder, especially for a 
project that's not supposed to be in active development any more...  But 
I wonder where most of the work is:  in backporting fixes to old 
versions, or in maintaining the source branches for old versions?  If 
maintaining the old versions is enough extra burden that the developers 
don't want to take it on, then maybe we should find a different 
maintainer for those branches.  S/he might still have to rely on the 
developers to provide the patches, so it wouldn't reduce that load on 
you.  But it would solve the patch problem for packagers: we could just 
grab the latest CVS version of "2.10.2".

A.