[Unison-hackers] An idea for faster initial syncs
Benjamin Pierce
bcpierce at cis.upenn.edu
Wed Jul 21 21:48:09 EDT 2010
Hi Jerome,
When I started working on implementing this idea, I ran into a couple of questions:
* If we have this, do we need the fingerprint cache any more? Can we justify the complexity of keeping both schemes?
* It's tempting to make this a default behavior -- i.e., to *always* use a dummy fingerprint whenever we encounter a new file. The cost of this is that, if the file happens to get touched, it will get re-fingerprinted and show up as changed even though its contents have not actually changed. With a little work, maybe this could be made invisible to the user most of the time; we'd have to notice when we had a recon item involving a "recently dummy" file (i.e., where one side is a dummy and the other is not) and re-fingerprint the non-dummy side. But I wonder if the cost (in terms of making the user's model of the system's default behavior more complex) is worth it.
Thoughts? (Especially about the first...)
- B
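
[Illustration] The cost described in the second bullet can be reproduced with a small sketch. It is written in Python rather than Unison's OCaml, and every name in it (ArchiveEntry, record_new_file, looks_changed, the "dummy:" prefix) is hypothetical, as is the use of SHA-256. It assumes a fastcheck-style test that trusts an unchanged modification time and only reads file contents when the time differs.

```python
import hashlib
import os
import tempfile
from dataclasses import dataclass

DUMMY_PREFIX = "dummy:"  # hypothetical marker, not Unison's archive format


@dataclass
class ArchiveEntry:
    fingerprint: str  # dummy or real fingerprint stored in the archive
    mtime_ns: int     # modification time recorded alongside it


def real_fingerprint(path):
    """Hash the file contents (the expensive operation)."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def dummy_fingerprint(path):
    """Hash only size and permissions; the contents are never read."""
    st = os.stat(path)
    data = f"{st.st_size}:{st.st_mode & 0o7777}".encode()
    return DUMMY_PREFIX + hashlib.sha256(data).hexdigest()


def record_new_file(path):
    """Archive a newly seen file without fingerprinting its contents."""
    return ArchiveEntry(dummy_fingerprint(path), os.stat(path).st_mtime_ns)


def looks_changed(path, entry):
    """Fastcheck-style test: an unchanged mtime means 'not changed'."""
    if os.stat(path).st_mtime_ns == entry.mtime_ns:
        return False
    # The mtime differs, so the contents are hashed. A real hash can
    # never equal a stored dummy, so the file is reported as changed.
    return real_fingerprint(path) != entry.fingerprint


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "example.txt")
        with open(path, "wb") as f:
            f.write(b"same contents throughout")
        entry = record_new_file(path)
        print(looks_changed(path, entry))  # untouched file
        st = os.stat(path)
        # Simulate 'touch': bump the mtime, leave the contents alone.
        os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 10_000_000_000))
        print(looks_changed(path, entry))  # touched file
```

In this sketch the touched file is reported as changed although its bytes are identical, which is the spurious change the bullet warns about.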
On Jun 28, 2010, at 5:41 AM, Jerome Vouillon wrote:
> On Thu, Jun 17, 2010 at 10:31:59AM -0400, Benjamin C. Pierce wrote:
>> My idea is to add a switch that says to Unison "I know that the
>> replicas are in sync already and I want you to rebuild your archives
>> as fast as possible." When this switch is set, Unison would skip
>> fingerprinting the contents of new files -- it would simply store a
>> dummy fingerprint (a hash of the file's size and permissions) in the
>> archive for each file. As long as files are never changed after
>> this, this dummy fingerprint would never be looked at, so Unison's
>> behavior would remain the same. If a file is changed at some point
>> in the future, Unison will fingerprint the new contents, detect a
>> change, and copy the new version to the other side, again behaving
>> as it should. The one slight difference in behavior will be that if
>> a file is really changed on one side but only touched on the other,
>> Unison will detect a conflict rather than propagating the change.
>
> That's an interesting idea, indeed. It can also improve the user
> experience when one of the replicas is initially empty: Unison will
> start propagating files right away instead of spending a lot of time
> scanning files.
>
> There should be a way to make Unison replace the dummy fingerprints in
> the archives with the actual fingerprints. This requires a bit of work,
> as we have to make sure that the archives are updated simultaneously.
> But that could be implemented later.
>
>> Second, more seriously, if there is some file with different sizes
>> (or that exists on one replica and not the other), Unison will
>> calculate a dummy fingerprint during update detection and then later
>> think that the file hasn't been transferred correctly because the
>> fingerprints don't match. We may need a special case in the
>> fingerprint check at the end that recalculates the fingerprint if it
>> is a dummy.
>
> That's the most delicate part, indeed.
> - when the archive contains a dummy fingerprint, we should not scan
> the file contents to decide whether the file has been changed,
> whether fastcheck is set to true or false, so that
> Update.checkNoUpdates works properly;
> - Copy.paranoidCheck should just return the computed checksum in case
> of mismatch. Then, the appropriate action can be taken in
> Copy.checkContentsChangeLocal, where we will have a possibly dummy
> fingerprint from update detection and accurate fingerprints of the
> source file and temporary destination file.
>
> As we have computed the actual fingerprint during file transfer, we
> should put it in the archive. But that could be implemented in a
> second step.
>
> -- Jerome
> _______________________________________________
> Unison-hackers mailing list
> Unison-hackers at lists.seas.upenn.edu
> http://lists.seas.upenn.edu/mailman/listinfo/unison-hackers
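
[Illustration] One possible reading of the transfer check proposed in the quoted message, sketched in Python rather than Unison's OCaml. The names Fingerprint and check_transfer are hypothetical and do not correspond to the actual Copy.paranoidCheck or Copy.checkContentsChangeLocal code. The sketch assumes that a dummy fingerprint from update detection is simply ignored, and that the real fingerprint computed during the transfer is returned so it could be stored in the archive. The thread leaves the exact action open, so this is an assumption.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fingerprint:
    digest: str
    is_dummy: bool = False


def content_fingerprint(data: bytes) -> Fingerprint:
    """Real fingerprint: a hash of the file contents."""
    return Fingerprint(hashlib.sha256(data).hexdigest())


def dummy_fingerprint(size: int, perms: int) -> Fingerprint:
    """Dummy fingerprint: a hash of size and permissions only."""
    digest = hashlib.sha256(f"{size}:{perms}".encode()).hexdigest()
    return Fingerprint(digest, is_dummy=True)


def check_transfer(detected: Fingerprint,
                   source: Fingerprint,
                   temp_dest: Fingerprint) -> Optional[Fingerprint]:
    """Return the fingerprint to archive, or None if the transfer is rejected.

    detected:  fingerprint from update detection (possibly a dummy)
    source:    real fingerprint of the source file, computed during transfer
    temp_dest: real fingerprint of the temporary destination file
    """
    if source != temp_dest:
        return None  # the copy does not match the source
    if detected.is_dummy:
        return source  # assumption: a dummy cannot contradict real contents
    if detected != source:
        return None  # the source changed after update detection
    return source
```

For example, check_transfer(dummy_fingerprint(7, 0o644), fp, fp) returns fp for any real fingerprint fp, whereas passing a stale real fingerprint as the first argument returns None.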