[Unison-hackers] An idea for faster initial syncs

Benjamin C. Pierce bcpierce at cis.upenn.edu
Thu Jun 17 10:31:59 EDT 2010


The process of setting up a new laptop this week has brought home to me, again, the fact that Unison's initial scan of the replicas is horribly slow when the replicas are big (and whose replicas aren't big these days?).  I've been thinking about a small hack that might make this better.

My idea is to add a switch that says to Unison "I know that the replicas are in sync already and I want you to rebuild your archives as fast as possible."  When this switch is set, Unison would skip fingerprinting the contents of new files -- it would simply store a dummy fingerprint (a hash of the file's size and permissions) in the archive for each file.  As long as files are never changed after this, this dummy fingerprint would never be looked at, so Unison's behavior would remain the same.  If a file is changed at some point in the future, Unison would fingerprint the new contents, detect a change, and copy the new version to the other side, again behaving as it should.  The one slight difference in behavior would be that if a file is really changed on one side but only touched on the other, Unison would detect a conflict rather than propagating the change.
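
To make the proposal concrete, here is a minimal sketch in Python (not Unison's OCaml, and not its actual archive format). It assumes that update detection first does a cheap check on size and modification time and only fingerprints the contents when that check fails. The names dummy_fingerprint, real_fingerprint, build_archive_entry, has_changed and the "dummy:" marker are all hypothetical, and SHA-256 is an arbitrary choice of hash.

```python
import hashlib
import os
import stat

DUMMY_PREFIX = "dummy:"  # hypothetical marker, not part of Unison's archive format


def dummy_fingerprint(path):
    """Cheap stand-in: hash only the size and permission bits, never the contents."""
    st = os.stat(path)
    meta = f"{st.st_size}:{stat.S_IMODE(st.st_mode)}".encode()
    return DUMMY_PREFIX + hashlib.sha256(meta).hexdigest()


def real_fingerprint(path):
    """Full content hash; this is the expensive step the switch would skip."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def build_archive_entry(path, assume_in_sync):
    """Record what a later scan needs; the fingerprint is a dummy when the switch is set."""
    st = os.stat(path)
    fp = dummy_fingerprint(path) if assume_in_sync else real_fingerprint(path)
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "fingerprint": fp}


def has_changed(path, entry):
    """Assumed fast check: the stored fingerprint is consulted only if size or mtime moved."""
    st = os.stat(path)
    if st.st_size == entry["size"] and st.st_mtime_ns == entry["mtime_ns"]:
        return False  # dummy or real, the stored fingerprint is never looked at
    # A real content hash can never equal a dummy, so a file that was merely
    # touched is reported as changed when the entry holds a dummy.
    return real_fingerprint(path) != entry["fingerprint"]
```

In this sketch an untouched file never reaches the fingerprint comparison, while a file that was only touched is reported as changed because its content hash cannot match the stored dummy. That is the source of the extra conflict described above.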

I can see a couple of potential issues with this scheme.  First, if, on the initial sync, the replicas are *not* identical -- in particular, if the files at some path differ but have the same length -- Unison will miss this change.  I think we could live with this, because it will be easy for users to understand this danger: they are telling Unison not to look for changes, so they won't be surprised that it doesn't.  

Second, more seriously, if there is some file with different sizes (or that exists on one replica and not the other), Unison will calculate a dummy fingerprint during update detection and then later think that the file hasn't been transferred correctly because the fingerprints don't match.  We may need a special case in the fingerprint check at the end that recalculates the fingerprint if it is a dummy.
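
The special case might look roughly like the following Python sketch (again not Unison's OCaml). It assumes dummy fingerprints can be told apart from real ones by a marker; is_dummy, verify_transfer and the "dummy:" prefix are hypothetical names. Both files are read locally here for simplicity, whereas in practice the source fingerprint would be recomputed on the source host.

```python
import hashlib

DUMMY_PREFIX = "dummy:"  # hypothetical marker distinguishing dummy fingerprints


def is_dummy(fingerprint):
    return fingerprint.startswith(DUMMY_PREFIX)


def real_fingerprint(path):
    """Full content hash of the file at path (SHA-256 is an arbitrary choice)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_transfer(source_path, expected_fingerprint, received_path):
    """Post-transfer check that tolerates a dummy recorded during update detection."""
    if is_dummy(expected_fingerprint):
        # A dummy can never match a content hash, so replace it with the
        # real fingerprint of the source before comparing.
        expected_fingerprint = real_fingerprint(source_path)
    return real_fingerprint(received_path) == expected_fingerprint
```

Without the is_dummy branch, every transfer of a file whose fingerprint was recorded as a dummy would be reported as failed, even when the copy is byte-for-byte correct.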

Comments?

    - Benjamin



