From wasabi at larvalstage.net Wed Nov 2 17:03:16 2005
From: wasabi at larvalstage.net (Jerry Haltom)
Date: Wed, 02 Nov 2005 16:03:16 -0600
Subject: [Unison-hackers] File copying/moving optimizations
Message-ID: <1130968996.24721.21.camel@localhost.localdomain>

So I notice that Unison completely re-downloads a file when it is the
result of a copy or a move. Makes sense from a simple implementation
point of view, but I was wondering about an alternative.

When a file is seen as not existing in the replica, it makes sense to me
for the data of the file to be searched for in the replica, and copied
to the new name if found. The original says a file with a certain MD5
sum exists at a new location; the replica searches for an existing
file with that MD5 sum and copies it.

Seems like this could speed up synchronization in some specific cases by
a good deal. Simply renaming a large top level directory would result in
a new directory being created in the replica and populated from the
previous data. Much faster!

Thoughts?

From drayside at MIT.EDU Wed Nov 2 17:12:16 2005
From: drayside at MIT.EDU (Derek Rayside)
Date: Wed, 2 Nov 2005 17:12:16 -0500 (EST)
Subject: [Unison-hackers] File copying/moving optimizations
In-Reply-To: <1130968996.24721.21.camel@localhost.localdomain>
References: <1130968996.24721.21.camel@localhost.localdomain>
Message-ID:

On Wed, 2 Nov 2005, Jerry Haltom wrote:
> So I notice that Unison completely re-downloads a file when it is the
> result of a copy or a move. Makes sense from a simple implementation
> point of view, but I was wondering about an alternative.
>
> When a file is seen as not existing in the replica, it makes sense to me
> for the data of the file to be searched for in the replica, and copied
> to the new name if found. The original says a file with a certain MD5
> sum exists at a new location; the replica searches for an existing
> file with that MD5 sum and copies it.
>
> Seems like this could speed up synchronization in some specific cases by
> a good deal. Simply renaming a large top level directory would result in
> a new directory being created in the replica and populated from the
> previous data. Much faster!
>
> Thoughts?

That would be a nice feature. I wonder if it could be implemented
without threatening correctness.

My current work-around is to rename the top level directory on both
replicas: it takes unison less time to figure out that the changes are
identical than it does to transfer the data.

From alan.schmitt at polytechnique.org Thu Nov 3 03:16:35 2005
From: alan.schmitt at polytechnique.org (Alan Schmitt)
Date: Thu, 3 Nov 2005 09:16:35 +0100
Subject: [Unison-hackers] File copying/moving optimizations
In-Reply-To: <1130968996.24721.21.camel@localhost.localdomain>
References: <1130968996.24721.21.camel@localhost.localdomain>
Message-ID: <48469524-28BE-4A49-83AE-8D98F7C97A48@polytechnique.org>

On 2 nov. 05, at 23:03, Jerry Haltom wrote:
> So I notice that Unison completely re-downloads a file when it is the
> result of a copy or a move. Makes sense from a simple implementation
> point of view, but I was wondering about an alternative.
>
> When a file is seen as not existing in the replica, it makes sense to me
> for the data of the file to be searched for in the replica, and copied
> to the new name if found. The original says a file with a certain MD5
> sum exists at a new location; the replica searches for an existing
> file with that MD5 sum and copies it.
>
> Seems like this could speed up synchronization in some specific cases by
> a good deal.
> Simply renaming a large top level directory would result in
> a new directory being created in the replica and populated from the
> previous data. Much faster!
>
> Thoughts?

Well, it should already be the case. According to the manual:

xferbycopying
    When this preference is set, Unison will try to avoid transferring
    file contents across the network by recognizing when a file with the
    required contents already exists in the target replica. This usually
    allows file moves to be propagated very quickly. The default value is
    true.

Alan Schmitt

--
The hacker: someone who figured things out and made something cool happen.
.O.
..O
OOO

From wasabi at larvalstage.net Thu Nov 3 10:06:58 2005
From: wasabi at larvalstage.net (Jerry Haltom)
Date: Thu, 03 Nov 2005 09:06:58 -0600
Subject: [Unison-hackers] File copying/moving optimizations
In-Reply-To: <48469524-28BE-4A49-83AE-8D98F7C97A48@polytechnique.org>
References: <1130968996.24721.21.camel@localhost.localdomain> <48469524-28BE-4A49-83AE-8D98F7C97A48@polytechnique.org>
Message-ID: <1131030418.11281.1.camel@localhost.localdomain>

Ahh. That isn't in the man page. Hadn't noticed it! Thanks!

On Thu, 2005-11-03 at 09:16 +0100, Alan Schmitt wrote:
> On 2 nov. 05, at 23:03, Jerry Haltom wrote:
>
> > So I notice that Unison completely re-downloads a file when it is the
> > result of a copy or a move. Makes sense from a simple implementation
> > point of view, but I was wondering about an alternative.
> >
> > When a file is seen as not existing in the replica, it makes sense to me
> > for the data of the file to be searched for in the replica, and copied
> > to the new name if found.
> > The original says a file with a certain MD5
> > sum exists at a new location; the replica searches for an existing
> > file with that MD5 sum and copies it.
> >
> > Seems like this could speed up synchronization in some specific cases by
> > a good deal. Simply renaming a large top level directory would result in
> > a new directory being created in the replica and populated from the
> > previous data. Much faster!
> >
> > Thoughts?
>
> Well, it should already be the case. According to the manual:
>
> xferbycopying
>     When this preference is set, Unison will try to avoid transferring
>     file contents across the network by recognizing when a file with the
>     required contents already exists in the target replica. This usually
>     allows file moves to be propagated very quickly. The default value is
>     true.
>
> Alan Schmitt

--
Jerry Haltom

From Damien.Pous at ens-lyon.fr Fri Nov 4 09:22:14 2005
From: Damien.Pous at ens-lyon.fr (Damien Pous)
Date: Fri, 04 Nov 2005 15:22:14 +0100
Subject: [Unison-hackers] patches: nodeletion, ignorefile
Message-ID: <1131114134.5716.32.camel@mostha>

Here are two patches:

* nodeletion.diff: adds a "nodeletion" switch that prevents unison
  from propagating deletions automatically:
  1) a conflict is issued when a path has been deleted in one replica
     and left unmodified in the other one
  2) in batch mode, deletions are skipped (with a warning)
  Point 2 is indeed a safety assertion, since all deletions should be
  handled as conflicts and thus be skipped in batch mode.

* ignorefile.diff: adds an "ignorefile" option that allows one to
  specify a filename to search for additional ignore directives, on a
  per-directory basis (like .cvsignore files): suppose you work with
  "ignorefile = .unisonignore"; you can ignore all .eps files of a
  directory by putting a
  .unisonignore file in this directory, containing the line '*.eps'

I did it by redefining the function Globals.shouldIgnore: whenever an
ignore file is read, its content (converted to a regex) is cached in a
hashtable. I had to add an argument "fspath" to this function, since it
had to manipulate `real' paths. I could add this parameter to every
call to this function, except for the calls in Ui{text,gtk...}, where an
ignore directive is added by the user and the list of reconItems has to
be filtered. This is not problematic, since in this case shouldIgnore
is called with paths that have already been checked against ignore
files.

I have not yet widely tested the latter patch...

Damien
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ignorefile.diff
Type: text/x-patch
Size: 7747 bytes
Desc: not available
Url : http://lists.seas.upenn.edu/pipermail/unison-hackers/attachments/20051104/89288522/ignorefile.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: nodeletion.diff
Type: text/x-patch
Size: 4436 bytes
Desc: not available
Url : http://lists.seas.upenn.edu/pipermail/unison-hackers/attachments/20051104/89288522/nodeletion.bin

From michael.parsons at ucdmc.ucdavis.edu Fri Nov 4 09:44:31 2005
From: michael.parsons at ucdmc.ucdavis.edu (Michael Parsons)
Date: Fri, 4 Nov 2005 06:44:31 -0800
Subject: [Unison-hackers] Michael Parsons/IS/HS/UCD is out of the office.
Message-ID:

I will be out of the office starting 11/04/2005 and will not return
until 11/08/2005.

From wasabi at larvalstage.net Tue Nov 8 14:38:49 2005
From: wasabi at larvalstage.net (Jerry Haltom)
Date: Tue, 08 Nov 2005 13:38:49 -0600
Subject: [Unison-hackers] identical conflict resolution
Message-ID: <1131478729.9269.11.camel@localhost.localdomain>

It looks like a situation can arise where a file is created on BOTH
ends of a synchronized pair between syncs which Unison could handle
better.
In the case of the files containing the exact same data (same
hash), they should simply be updated to the latest mod time.

?

From dworley at pingtel.com Tue Nov 8 19:30:45 2005
From: dworley at pingtel.com (Dale R. Worley)
Date: Tue, 08 Nov 2005 19:30:45 -0500
Subject: [Unison-hackers] identical conflict resolution
In-Reply-To: <1131478729.9269.11.camel@localhost.localdomain>
References: <1131478729.9269.11.camel@localhost.localdomain>
Message-ID: <1131496245.7438.4.camel@maine.pingtel.com>

On Tue, 2005-11-08 at 13:38 -0600, Jerry Haltom wrote:
> It looks like a situation can arise where a file is created on BOTH
> ends of a synchronized pair between syncs which Unison could handle
> better. In the case of the files containing the exact same data (same
> hash), they should simply be updated to the latest mod time.

(Assuming that "times = true" is set.) That is not true. If both files
were created, but their mod times are different, then the files are
different -- their contents are the same but their properties are
different, and Unison cannot itself determine what their properties
ought to be.

Dale

From wasabi at larvalstage.net Tue Nov 8 19:59:45 2005
From: wasabi at larvalstage.net (Jerry Haltom)
Date: Tue, 08 Nov 2005 18:59:45 -0600
Subject: [Unison-hackers] identical conflict resolution
In-Reply-To: <1131496245.7438.4.camel@maine.pingtel.com>
References: <1131478729.9269.11.camel@localhost.localdomain> <1131496245.7438.4.camel@maine.pingtel.com>
Message-ID: <1131497985.9269.17.camel@localhost.localdomain>

I would agree, from a purely technical point of view, this is right.
But from a practical point of view, I suspect that in the majority of
situations, it doesn't matter.

I can't think of a reason I would care whether the times were accurate,
as long as the contents were the same.

I will try it with times = false later and see if I get my desired
effect.

On Tue, 2005-11-08 at 19:30 -0500, Dale R. Worley wrote:
> On Tue, 2005-11-08 at 13:38 -0600, Jerry Haltom wrote:
> > It looks like a situation can arise where a file is created on BOTH
> > ends of a synchronized pair between syncs which Unison could handle
> > better. In the case of the files containing the exact same data (same
> > hash), they should simply be updated to the latest mod time.
>
> (Assuming that "times = true" is set.) That is not true. If both files
> were created, but their mod times are different, then the files are
> different -- their contents are the same but their properties are
> different, and Unison cannot itself determine what their properties
> ought to be.
>
> Dale

From dworley at pingtel.com Wed Nov 9 13:43:25 2005
From: dworley at pingtel.com (Dale R.
Worley)
Date: Wed, 09 Nov 2005 13:43:25 -0500
Subject: [Unison-hackers] identical conflict resolution
In-Reply-To: <1131497985.9269.17.camel@localhost.localdomain>
References: <1131478729.9269.11.camel@localhost.localdomain> <1131496245.7438.4.camel@maine.pingtel.com> <1131497985.9269.17.camel@localhost.localdomain>
Message-ID: <1131561805.4353.12.camel@maine.pingtel.com>

On Tue, 2005-11-08 at 18:59 -0600, Jerry Haltom wrote:
> I would agree, from a purely technical point of view, this is right. But
> from a practical point of view, I suspect that in the majority of
> situations, it doesn't matter.
>
> I can't think of a reason I would care whether the times were accurate,
> as long as the contents were the same.
>
> I will try it with times = false later and see if I get my desired
> effect.

I agree that it seems odd, but Unison seems to use the approach I
described, with great consistency. I believe that if you truly do not
consider the times to be important, only the contents, then you should
use "times = false". And I see from the user's manual that the default
is "times = false".

Dale

From jat at terra.com.br Wed Nov 9 23:07:00 2005
From: jat at terra.com.br (Jose Tavares)
Date: Thu, 10 Nov 2005 02:07:00 -0200
Subject: [Unison-hackers] 1 side of sync has double the size..
Message-ID: <1131595621.10821.33.camel@p800>

Hi all..

I don't know what is happening here, but I've already lost some files
trying to discover why.. I'm using unison (in this case) for one-way
backup.. I haven't finished syncing this dir yet and I've got ..

p800:/mnt/p800-backup/home2/samba/killer/lmule/Temp# du -sh /home2/samba/killer/lmule/Temp
4.1G    /home2/samba/killer/lmule/Temp
p800:/mnt/p800-backup/home2/samba/killer/lmule/Temp# du -sh
7.4G    .

.... and it's still growing ..

It's the first time I'm syncing this dir.. Why am I getting double its
size? I'm using debian unstable with ext2 ..

Thanks..
JA Tavares

From wasabi at larvalstage.net Fri Nov 18 10:05:51 2005
From: wasabi at larvalstage.net (Jerry Haltom)
Date: Fri, 18 Nov 2005 09:05:51 -0600
Subject: [Unison-hackers] Other synching configurations
Message-ID: <1132326351.14185.3.camel@localhost.localdomain>

So Unison seems to work best (or perhaps at all) when using a Hub
layout... one box in the middle synching to the others. I'm wondering
why this is. If I sync one replica with two other replicas which also
synch between themselves, should not each replica be aware of its
change state and try to replay that change state against each other,
perhaps resulting in an attempt to delete a file that was already
deleted by a replica, or modify a file that was already modified by
another replica? But what problems would that cause? In the case of
deleting, nothing should happen... it's already gone. In the case of
modifying or adding, you'd still have to check for conflicts. If the
files were being modified to the same state they are already in, it
doesn't seem like an error condition to me.

So why does Unison only work well in a hub layout?

From wasabi at larvalstage.net Fri Nov 18 10:21:29 2005
From: wasabi at larvalstage.net (Jerry Haltom)
Date: Fri, 18 Nov 2005 09:21:29 -0600
Subject: [Unison-hackers] Backups.
Message-ID: <1132327289.14185.7.camel@localhost.localdomain>

I notice backup and backupdir. One thing I am not clear on. Since I am
taking two directories, one remote and one local, both of which will
have files deleted or modified by Unison... how do I specify different
backupdirs for each side?
From alan.schmitt at polytechnique.org Sun Nov 20 04:29:43 2005
From: alan.schmitt at polytechnique.org (Alan Schmitt)
Date: Sun, 20 Nov 2005 10:29:43 +0100
Subject: [Unison-hackers] Other synching configurations
In-Reply-To: <1132326351.14185.3.camel@localhost.localdomain>
References: <1132326351.14185.3.camel@localhost.localdomain>
Message-ID: <751BEDF2-7E4A-47AE-A83F-7476A9BC0135@polytechnique.org>

On 18 nov. 05, at 16:05, Jerry Haltom wrote:
> So why does Unison only work well in a hub layout?

Consider the following scenario: you have three replicas, A, B, C
with one file containing "init", and each one synchronizes with the
others. A modifies the file to "foo" and syncs with B. Then B syncs
with C. So all replicas are in the same state and synchronized. Now
imagine C changes the content of the file to "bar" and synchronizes
with A. Intuitively, the change should propagate, but instead a
conflict is detected, because from the point of view of unison, A and
C previously had "init" and now one has "foo" and the other "bar".
What is missing in this case is a way to propagate the
synchronization state between synchronization pairs.

I hope this makes things clearer.

Alan

--
The hacker: someone who figured things out and made something cool happen.
.O.
..O
OOO

From dworley at pingtel.com Mon Nov 21 10:48:45 2005
From: dworley at pingtel.com (Dale R.
Worley)
Date: Mon, 21 Nov 2005 10:48:45 -0500
Subject: [Unison-hackers] Other synching configurations
In-Reply-To: <751BEDF2-7E4A-47AE-A83F-7476A9BC0135@polytechnique.org>
References: <1132326351.14185.3.camel@localhost.localdomain> <751BEDF2-7E4A-47AE-A83F-7476A9BC0135@polytechnique.org>
Message-ID: <1132588125.28915.5.camel@cdhcp139.pingtel.com>

On Sun, 2005-11-20 at 10:29 +0100, Alan Schmitt wrote:
> Consider the following scenario: you have three replicas, A, B, C
> with one file containing "init", and each one synchronizes with the
> others. A modifies the file to "foo" and syncs with B. Then B syncs
> with C. So all replicas are in the same state and synchronized. Now
> imagine C changes the content of the file to "bar" and synchronizes
> with A. Intuitively, the change should propagate, but instead a
> conflict is detected, because from the point of view of unison, A and
> C previously had "init" and now one has "foo" and the other "bar".
> What is missing in this case is a way to propagate the
> synchronization state between synchronization pairs.

This scenario could be cured by having C sync with A before the file is
changed to "bar". Perhaps a rule of thumb is that if one makes a cycle
of sync relationships, one has to keep all of the relationships updated,
not just the replicas they connect.

Dale

From alan.schmitt at polytechnique.org Tue Nov 22 05:47:35 2005
From: alan.schmitt at polytechnique.org (Alan Schmitt)
Date: Tue, 22 Nov 2005 11:47:35 +0100
Subject: [Unison-hackers] Other synching configurations
In-Reply-To: <1132588125.28915.5.camel@cdhcp139.pingtel.com>
References: <1132326351.14185.3.camel@localhost.localdomain> <751BEDF2-7E4A-47AE-A83F-7476A9BC0135@polytechnique.org> <1132588125.28915.5.camel@cdhcp139.pingtel.com>
Message-ID:

On 21 nov. 05, at 16:48, Dale R. Worley wrote:
> This scenario could be cured by having C sync with A before the file is
> changed to "bar".
> Perhaps a rule of thumb is that if one makes a cycle
> of sync relationships, one has to keep all of the relationships updated,
> not just the replicas they connect.

Yes, this is true. I do use unison with cycles for a few things, and
with some discipline everything goes well. It's typically to work on a
web site (which I may do from my laptop or from my desktop, which are
synchronized), and before changing something I first do a sync to bring
the local synchronization state to the correct value.

Alan

--
Alan Schmitt

The hacker: someone who figured things out and made something cool happen.
.O.
..O
OOO
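
Editorial sketch: the three-replica scenario discussed in this thread
(A, B, C holding "init", then "foo", then "bar") can be reproduced with
a toy model. This is not Unison's code. The Pair class, its archive
field, and the run_scenario helper are hypothetical names invented for
illustration, and the model assumes each sync pair remembers only the
last content on which its two replicas agreed.

```python
# Toy model of pairwise synchronization state. This is NOT Unison's code:
# Pair, archive, and run_scenario are hypothetical names for illustration.

class Pair:
    """A sync relationship that remembers the last content both sides agreed on."""

    def __init__(self, replicas, left, right):
        self.replicas = replicas          # shared dict: replica name -> file content
        self.left, self.right = left, right
        self.archive = replicas[left]     # assume both sides start out identical

    def sync(self):
        l, r = self.replicas[self.left], self.replicas[self.right]
        if l == r:                        # nothing to transfer; just record the state
            self.archive = l
            return "in sync"
        if l == self.archive:             # only the right side changed
            self.replicas[self.left] = self.archive = r
            return f"{self.right} -> {self.left}"
        if r == self.archive:             # only the left side changed
            self.replicas[self.right] = self.archive = l
            return f"{self.left} -> {self.right}"
        return "conflict"                 # both differ from the remembered state


def run_scenario(refresh_c_a_first):
    replicas = {"A": "init", "B": "init", "C": "init"}
    a_b = Pair(replicas, "A", "B")
    b_c = Pair(replicas, "B", "C")
    c_a = Pair(replicas, "C", "A")

    replicas["A"] = "foo"                 # A modifies the file
    a_b.sync()                            # "foo" reaches B
    b_c.sync()                            # "foo" reaches C; all replicas hold "foo"

    if refresh_c_a_first:
        c_a.sync()                        # brings the C/A remembered state to "foo"

    replicas["C"] = "bar"                 # C modifies the file
    return c_a.sync()


if __name__ == "__main__":
    print(run_scenario(refresh_c_a_first=False))
    print(run_scenario(refresh_c_a_first=True))
```

Without the extra C/A sync, the pair still remembers "init" and sees
both sides as changed, so it reports a conflict. With the extra sync
first, only C differs from the remembered state and the change
propagates to A, which matches the rule of thumb about keeping every
relationship in a cycle up to date.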