[Unison-hackers] An idea for faster initial syncs

Benjamin Pierce bcpierce at cis.upenn.edu
Wed Jul 21 21:48:09 EDT 2010


Hi Jerome,

When I started working on implementing this idea, I ran into a couple of questions:

* If we have this, do we need the fingerprint cache any more?  Can we justify the complexity of keeping both schemes?

* It's tempting to make this the default behavior -- i.e., to *always* use a dummy fingerprint whenever we encounter a new file.  The cost is that, if the file later gets touched, it will be re-fingerprinted and show up as changed even though its contents have not changed.  With a little work, maybe this could be made invisible to the user most of the time: we'd have to notice when a recon item involves a "recently dummy" file (i.e., one side is a dummy and the other is not) and re-fingerprint the non-dummy side.  But I wonder whether the benefit is worth the cost (in terms of making the user's model of the system's default behavior more complex).
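
To make the "recently dummy" check concrete, here is a rough sketch in Python (not Unison's OCaml, and not its real data structures).  The names Side, content_digest and is_spurious_change are hypothetical, MD5 is only a stand-in hash, and for simplicity the sketch fingerprints the contents of both sides rather than just one, which is an assumption on top of what is described above.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Side:
    """One replica's view of a file in a recon item (hypothetical type)."""
    fingerprint: str                    # digest recorded for this side
    is_dummy: bool                      # True if it covers only size/permissions
    read_contents: Callable[[], bytes]  # reads the file on that replica


def content_digest(data: bytes) -> str:
    """Real fingerprint: a hash of the file contents."""
    return hashlib.md5(data).hexdigest()


def is_spurious_change(a: Side, b: Side) -> bool:
    """True if the item exists only because a dummy met a real fingerprint."""
    if a.is_dummy == b.is_dummy:
        # Both real or both dummy: not the "recently dummy" case.
        return False
    # Exactly one side is a dummy, so compare the actual contents.
    return content_digest(a.read_contents()) == content_digest(b.read_contents())
```

An item for which is_spurious_change returns True could then be dropped before it is shown to the user; whether that is cheap enough to do by default is exactly the open question.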

Thoughts?  (Especially about the first...)

    - B



On Jun 28, 2010, at 5:41 AM, Jerome Vouillon wrote:

> On Thu, Jun 17, 2010 at 10:31:59AM -0400, Benjamin C. Pierce wrote:
>> My idea is to add a switch that says to Unison "I know that the
>> replicas are in sync already and I want you to rebuild your archives
>> as fast as possible."  When this switch is set, Unison would skip
>> fingerprinting the contents of new files -- it would simply store a
>> dummy fingerprint (a hash of the file's size and permissions) in the
>> archive for each file.  As long as files are never changed after
>> this, this dummy fingerprint would never be looked at, so Unison's
>> behavior would remain the same.  If a file is changed at some point
>> in the future, Unison will fingerprint the new contents, detect a
>> change, and copy the new version to the other side, again behaving
>> as it should.  The one slight difference in behavior will be that if
>> a file is really changed on one side but only touched on the other,
>> Unison will detect a conflict rather than propagating the change.
> 
> That's an interesting idea, indeed.  It can also improve the user
> experience when one of the replicas is initially empty: Unison will
> start propagating files right away instead of spending a lot of time
> scanning files.
> 
> There should be a way to make Unison replace the dummy fingerprints
> in the archives with the actual fingerprints.  This requires a bit of work,
> as we have to make sure that the archives are updated simultaneously.
> But that could be implemented later.
> 
>> Second, more seriously, if there is some file with different sizes
>> (or that exists on one replica and not the other), Unison will
>> calculate a dummy fingerprint during update detection and then later
>> think that the file hasn't been transferred correctly because the
>> fingerprints don't match.  We may need a special case in the
>> fingerprint check at the end that recalculates the fingerprint if it
>> is a dummy.
> 
> That's the most delicate part, indeed.
> - when the archive contains a dummy fingerprint, we should not scan
>  the file contents to decide whether the file has been changed,
>  regardless of whether fastcheck is set to true or false, so that
>  Update.checkNoUpdates works properly;
> - Copy.paranoidCheck should just return the computed checksum in case
>  of mismatch.  Then, the appropriate action can be taken in
>  Copy.checkContentsChangeLocal, where we will have a possibly dummy
>  fingerprint from update detection and accurate fingerprints of the
>  source file and temporary destination file.
> 
> As we have computed the actual fingerprint during file transfer, we
> should put it in the archive.  But that could be implemented in a
> second step.
> 
> -- Jerome
> _______________________________________________
> Unison-hackers mailing list
> Unison-hackers at lists.seas.upenn.edu
> http://lists.seas.upenn.edu/mailman/listinfo/unison-hackers
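
The dummy fingerprint and the special case in the post-transfer check discussed above can be sketched as follows.  This is an illustration in Python, not Unison's OCaml code: Fingerprint, dummy_fingerprint, real_fingerprint and check_transfer are hypothetical names, MD5 is a stand-in hash, and the way size and permissions are encoded is an assumption.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    digest: str
    is_dummy: bool


def dummy_fingerprint(size: int, mode: int) -> Fingerprint:
    """Hash of size and permissions only; the contents are never read."""
    data = f"{size}:{mode:o}".encode()
    return Fingerprint(hashlib.md5(data).hexdigest(), True)


def real_fingerprint(contents: bytes) -> Fingerprint:
    """Hash of the actual file contents."""
    return Fingerprint(hashlib.md5(contents).hexdigest(), False)


def check_transfer(detected: Fingerprint, source: bytes,
                   dest_tmp: bytes) -> Fingerprint:
    """Post-transfer check; returns the fingerprint to store in the archive.

    `detected` is the fingerprint from update detection and may be a dummy.
    """
    src_fp = real_fingerprint(source)
    if real_fingerprint(dest_tmp) != src_fp:
        raise ValueError("temporary destination does not match source")
    # A dummy can never equal a content hash, so it is only compared
    # against the source when update detection produced a real one.
    if not detected.is_dummy and detected != src_fp:
        raise ValueError("source changed since update detection")
    # The accurate fingerprint is already in hand, so return it.
    return src_fp
```

Returning the accurate fingerprint from the check corresponds to the "second step" mentioned above: since it has been computed during the transfer anyway, it can replace the dummy in the archive.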


