[Unison-hackers] An idea for faster initial syncs

Jerome Vouillon Jerome.Vouillon at pps.jussieu.fr
Mon Jun 28 05:41:20 EDT 2010


On Thu, Jun 17, 2010 at 10:31:59AM -0400, Benjamin C. Pierce wrote:
> My idea is to add a switch that says to Unison "I know that the
> replicas are in sync already and I want you to rebuild your archives
> as fast as possible."  When this switch is set, Unison would skip
> fingerprinting the contents of new files -- it would simply store a
> dummy fingerprint (a hash of the file's size and permissions) in the
> archive for each file.  As long as files are never changed after
> this, this dummy fingerprint would never be looked at, so Unison's
> behavior would remain the same.  If a file is changed at some point
> in the future, Unison will fingerprint the new contents, detect a
> change, and copy the new version to the other side, again behaving
> as it should.  The one slight difference in behavior will be that if
> a file is really changed on one side but only touched on the other,
> Unison will detect a conflict rather than propagating the change.
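Here is a minimal OCaml sketch of the proposed dummy fingerprint. It assumes that an archive entry can carry either a real content digest or a placeholder. The type `fingerprint`, the helpers `dummy_fingerprint`, `real_fingerprint` and `reported_as_changed`, and the use of the stdlib `Digest` (MD5) module are illustrative assumptions, not Unison's code. The last function shows why a file that is only touched on one side ends up in a conflict, assuming the touch causes its contents to be rescanned: the real digest never equals the dummy.

```ocaml
(* Hypothetical archive fingerprint type: these names are illustrative
   and are not Unison's actual definitions. *)
type fingerprint =
  | Real of Digest.t  (* MD5 of the file contents *)
  | Dummy of Digest.t (* hash of size and permissions only *)

(* Build a placeholder fingerprint without reading the file contents. *)
let dummy_fingerprint ~size ~perm =
  Dummy (Digest.string (Printf.sprintf "size=%d;perm=%o" size perm))

(* Fingerprint computed from the actual file contents. *)
let real_fingerprint contents = Real (Digest.string contents)

let is_dummy = function Dummy _ -> true | Real _ -> false

(* Assumed behaviour: when a file's properties change (e.g. after a
   "touch"), its contents are rescanned and compared with the archive.
   A Dummy entry never equals a Real one, so the file is reported as
   changed even if its contents are identical on both replicas. *)
let reported_as_changed ~archived ~current_contents =
  archived <> real_fingerprint current_contents
```

For example, `dummy_fingerprint ~size:1024 ~perm:0o644` produces an entry without opening the file at all.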

That's an interesting idea, indeed.  It can also improve the user
experience when one of the replicas is initially empty: Unison will
start propagating files right away instead of spending a lot of time
scanning files.

There should be a way to make Unison replace the dummy fingerprints in
the archives with the actual fingerprints.  This requires a bit of work,
as we have to make sure that both archives are updated simultaneously.
But that could be implemented later.
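One hypothetical way to do the replacement is sketched below in OCaml. Both archives are modelled as maps from paths to fingerprints, and `read_local` and `read_remote` stand in for reading a file on each replica. The function returns the two upgraded archives as a pair so that the caller can commit both or neither. `SMap`, `upgrade_dummies` and the map representation are assumptions made for illustration only.

```ocaml
module SMap = Map.Make (String)

(* Hypothetical fingerprint type, as in the earlier sketch. *)
type fingerprint = Real of Digest.t | Dummy of Digest.t

(* Upgrade every path that holds a Dummy entry in both archives.  A path
   is upgraded only if both replicas yield the same digest; otherwise the
   dummies are left for normal update detection to deal with.  The two
   results are returned together so they can be written as a unit. *)
let upgrade_dummies ~read_local ~read_remote (local, remote) =
  SMap.fold
    (fun path fp (l, r) ->
      match fp, SMap.find_opt path remote with
      | Dummy _, Some (Dummy _) ->
          let dl = Digest.string (read_local path)
          and dr = Digest.string (read_remote path) in
          if dl = dr then
            (SMap.add path (Real dl) l, SMap.add path (Real dr) r)
          else (l, r)
      | _ -> (l, r))
    local (local, remote)
```

Returning a pair instead of mutating either archive keeps the "update both or neither" decision in one place.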

> Second, more seriously, if there is some file with different sizes
> (or that exists on one replica and not the other), Unison will
> calculate a dummy fingerprint during update detection and then later
> think that the file hasn't been transferred correctly because the
> fingerprints don't match.  We may need a special case in the
> fingerprint check at the end that recalculates the fingerprint if it
> is a dummy.
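Here is a sketch of the special case suggested above. It assumes that the final check compares the digest of the transferred data with the fingerprint from update detection. `transfer_ok` and `recompute_source` are hypothetical names. A real implementation would read the source file rather than call a closure.

```ocaml
(* Hypothetical fingerprint type, as in the earlier sketch. *)
type fingerprint = Real of Digest.t | Dummy of Digest.t

(* End-of-transfer check.  [expected] comes from update detection and may
   be a Dummy, which cannot be compared with the transferred data; in that
   case the real fingerprint of the source is recomputed instead. *)
let transfer_ok ~expected ~recompute_source ~transferred =
  let actual = Digest.string transferred in
  match expected with
  | Real d -> d = actual
  | Dummy _ -> Digest.string (recompute_source ()) = actual
```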

That's the most delicate part, indeed.
- when the archive contains a dummy fingerprint, we should not scan
  the file contents to decide whether the file has been changed,
  whether fastcheck is set to true or false, so that
  Update.checkNoUpdates works properly;
- Copy.paranoidCheck should just return the computed checksum in case
  of mismatch.  Then, the appropriate action can be taken in
  Copy.checkContentsChangeLocal, where we will have a possibly dummy
  fingerprint from update detection and accurate fingerprints of the
  source file and temporary destination file.
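The two points above could look roughly like this in OCaml. Everything here is hypothetical. `has_changed`, `paranoid_check` and `check_contents_change` only mimic the roles of update detection, Copy.paranoidCheck and Copy.checkContentsChangeLocal. The `props` record stands in for whatever metadata the archive stores. The decision table in the last function is one plausible choice rather than the intended one.

```ocaml
(* Hypothetical types; Unison's real archive entries differ. *)
type fingerprint = Real of Digest.t | Dummy of Digest.t
type props = { size : int; mtime : float }

(* Update detection sketch.  With a Dummy archive entry the contents are
   never hashed, whatever the value of [fastcheck]: the decision is made
   from the recorded properties only, so re-running update detection just
   before a transfer gives a stable answer. *)
let has_changed ~fastcheck ~archived_fp ~archived_props ~current_props
    ~read_contents =
  match archived_fp with
  | Dummy _ -> archived_props <> current_props
  | Real d ->
      if fastcheck && archived_props = current_props then false
      else Digest.string (read_contents ()) <> d

(* Paranoid check sketch: rather than failing on a mismatch, return the
   digest actually computed for the temporary destination file. *)
type check_result = Match | Mismatch of Digest.t

let paranoid_check ~expected ~tmp_contents =
  let got = Digest.string tmp_contents in
  match expected with
  | Real d when d = got -> Match
  | Real _ | Dummy _ -> Mismatch got (* Dummy entries always end up here *)

(* Decision sketch in the spirit of checkContentsChangeLocal: it sees the
   possibly Dummy fingerprint from update detection, the result of the
   paranoid check, and an accurate digest of the source file. *)
type decision = Accept | Source_changed | Transfer_corrupted

let check_contents_change ~expected ~check ~source_digest =
  match check, expected with
  | Match, _ -> Accept
  | Mismatch tmp, _ when tmp <> source_digest -> Transfer_corrupted
  | Mismatch _, Dummy _ -> Accept (* source and copy agree *)
  | Mismatch _, Real _ -> Source_changed (* edited after update detection *)
```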

As we have computed the actual fingerprint during file transfer, we
should put it in the archive.  But that could be implemented in a
second step.
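Here is a rough sketch of that second step. It assumes that the digest computed during the transfer is still available when the archive is updated. `record_transfer` and `remaining_dummies` are hypothetical helpers over a map-based archive model, not Unison functions.

```ocaml
module SMap = Map.Make (String)

(* Hypothetical fingerprint type, as in the earlier sketches. *)
type fingerprint = Real of Digest.t | Dummy of Digest.t

(* Post-transfer step: the digest computed while copying (for example by
   the paranoid check) becomes the archive entry, so a Dummy entry is
   upgraded without hashing the file a second time. *)
let record_transfer archive ~path ~transferred_digest =
  SMap.add path (Real transferred_digest) archive

(* Number of entries still carrying a dummy fingerprint. *)
let remaining_dummies archive =
  SMap.fold
    (fun _ fp n -> match fp with Dummy _ -> n + 1 | Real _ -> n)
    archive 0
```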

-- Jerome

