[Unison-hackers] reconsidering the backups concept

Greg Troxel gdt at lexort.com
Fri Sep 16 20:38:08 EDT 2022

Christoph Groth <christoph at grothesque.org> writes:

> Greg Troxel wrote:
>> unison has a concept of making backups, but this is awkward because:
>> 1) They are sometimes put in a different place than the replica (could
>> be different security properties, could have different space issues).
>> I think this shouldn't happen, at least by default.

Thanks for writing.  I paused in replying in hopes others would jump in.

> On the other hand, scattering backup files all over the replica could be
> seen as a bad default as well.

True; there are no great options.

> A further issue is that the default central backup location is the same
> for any pair of replicas, so that backups relating to different replicas
> are mixed up.

That seems like a bug that should be fixed.

> This could be especially problematic with the backupcurr option for
> three-way merges, if the same file is synced with multiple replicas.
> The consequence will be that three-way merge may not be proposed in
> cases where it would be possible if the backup directories were
> separate.

Good point.

> I think that a solution to all these problems could be to
> • keep backups centralized by default,

I guess, but central/local is a preference with no good answer.

> • but make the default backup location undefined, forcing the user to
>   explicitly choose one.
> The manual could mention that the backup directory should likely have
> a unique name for each pair of replicas.

Or perhaps unison could construct one, maybe using the same hash as the
archive file name, possibly combined with something else.
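For illustration, here is a minimal Python sketch of constructing such a name. It assumes the two roots are available as canonical strings. It is not Unison's actual archive-hash scheme: `backup_dir_for`, the SHA-256 digest, and the `<base>/<digest>/<kind>` layout are all hypothetical. Sorting the roots makes the name independent of which root is given first. The `kind` argument keeps backup and backupcurr copies in separate subtrees.

```python
import hashlib
import os


def backup_dir_for(base: str, root1: str, root2: str, kind: str) -> str:
    """Return a per-replica-pair backup directory (hypothetical scheme).

    kind is "backup" or "backupcurr"; each gets its own subtree so that
    saved copies and merge pre-images are never stored together.
    """
    if kind not in ("backup", "backupcurr"):
        raise ValueError(f"unknown backup kind: {kind!r}")
    # Sort the roots so that syncing (a, b) and (b, a) uses the same directory.
    key = "\n".join(sorted([root1, root2])).encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:16]
    return os.path.join(base, digest, kind)
```

Syncing the same file against two different peers would then use two different directories, so backupcurr pre-images from one pair could not shadow those of another.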

> In addition, to solve the security problem, unison could mirror
> permissions and ownership when backing up files (and when creating
> directories) and warn/fail if setting them is impossible.
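Roughly, that proposal would look something like the following POSIX-only Python sketch. `backup_with_metadata` and its `strict` flag are hypothetical; they are not existing Unison preferences. The function copies the content into a newly created owner-only file. It then tries to copy ownership and mode bits from the source. If ownership cannot be set, it warns, or it fails when `strict` is true. It controls who can read the copy, but not which disk the bytes land on. Directories created along the way are not handled here.

```python
import os
import shutil
import warnings


def backup_with_metadata(src: str, dst: str, strict: bool = False) -> None:
    """Copy src to dst and try to mirror its ownership and permissions.

    Hypothetical helper, POSIX only.
    """
    parent = os.path.dirname(dst)
    if parent:
        os.makedirs(parent, exist_ok=True)
    st = os.stat(src)
    # A newly created copy starts owner-only, so its content is not exposed
    # more widely before the source's mode is applied.
    fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
        shutil.copyfileobj(inp, out)
    try:
        os.chown(dst, st.st_uid, st.st_gid)
    except PermissionError as exc:
        if strict:
            os.remove(dst)  # fail: do not leave a copy with the wrong owner
            raise
        warnings.warn(f"could not set ownership of {dst}: {exc}")
    # Apply the mode after chown, since chown may clear setuid/setgid bits.
    os.chmod(dst, st.st_mode & 0o7777)
```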

That would be good, and would help, but it doesn't fix the problem.  Just writing
content to a different filesystem is a problem, because it is possibly a
different physical medium and possibly has different encryption status.
For example, I have seen environments where certain files have to be on
removable media and not on the system disk (an actual policy, that
made sense for an environment and rules that I am not going to

>> 2) The scheme is simplistic and there is no eventual garbage
>> collection of no longer needed backups.  Of course, software can't
>> tell what the future holds and "no longer needed" is tricky
>> business.  I'm not really sure what to do here, even if there were
>> cycles to do it.  My own approach to backups is to keep them
>> indefinitely and buy more disks over time and at some point decide
>> they are so old they don't matter.  Because I generally back up the
>> contents of replicas, and don't run -auto, I have never turned on
>> unison's mechanism.
>> 3) The interaction of backups for merge and backups on delete/modify
>> is not easy to understand, and may not be the best choice.
> In my opinion “backupcurr” is more important than “backup” for Unison’s
> core functionality.  Backupcurr allows three-way merges which are very
> useful when synchronizing text files, while plain “backup” seems hardly
> useful if one has some other backup solution already (as one should).

It strikes me that backupcurr and backup should maybe have independent
storage as they are different mechanisms.

> Moreover, since “backupcurr” explicitly only keeps a single copy of the
> *current* version of the file, it does not seem problematic at all to
> automatically remove such backups upon deletion of a file.  More than
> that, in contrast to plain “backup”, it even corresponds to what Unison
> might be expected to do, since the “current version” of a deleted file
> is “no file”.

I agree, but the distinction I see is between a merge pre-image, which
is not useful once the file is gone, and a saved copy of a deleted file.

> It seems to me that an easy and conservative but useful enhancement of
> Unison would be to add an option “deletebackupcurr” (or similar) that
> would be “off” by default.

Yes, as long as backupcurr copies and backup copies are stored separately.

> (If breaking absolute backward compatibility is possible, eventually it
> could be required to set this option explicitly, and finally the default
> could change to “on”, and the requirement to specify it disappear.)

I don't think we necessarily need to keep backwards compatibility for
behavior that is debatable, as long as upcoming changes are announced in NEWS.

>> It might be that it would be good to have "backupmaxage", defaulting
>> to off, but settable to e.g. 1 year, that causes unison to
>> periodically scan replicas (when syncing, not every time) and remove
>> backups that were created too long ago.  It might be that this has
>> consequences we don't like -- I don't really know.
> I think that this is dangerous if not limited to files that are no
> longer in the replica: even an old backup can be the most recent one.
> Moreover, simply deleting files of a certain age can be implemented
> easily outside of unison.

Good point that even an old backup can be the most recent one, but on the
other hand the point of backups is to allow recovery from mistakes, and a
trashbin that is emptied after N days is a normal idiom.  And the implied
"I wouldn't want this" is valid and shows how tricky it is to specify
something like this.
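One way to combine the trashbin idiom with Christoph's restriction is to prune backups older than N days, but only those whose source file no longer exists in the replica. This is a sketch only. It assumes a hypothetical layout where each backup sits at the same relative path as its source. Unison's real backup names are shaped by its backupprefix/backupsuffix preferences and would need mapping back. The sketch uses the backup file's mtime as its age, which is wrong if a backup keeps the original's timestamp. It also only makes sense if the backup directory belongs to a single pair of replicas. `prune_orphaned_backups` is hypothetical and defaults to a dry run.

```python
import os
import time
from typing import List, Optional


def prune_orphaned_backups(backup_root: str, replica_root: str,
                           max_age_days: float,
                           now: Optional[float] = None,
                           dry_run: bool = True) -> List[str]:
    """Find (and optionally delete) old backups whose source file is gone.

    Assumes each backup sits at the same relative path as its source
    (hypothetical layout) and uses the backup's mtime as its age.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    doomed: List[str] = []
    for dirpath, _dirs, files in os.walk(backup_root):
        for name in files:
            bpath = os.path.join(dirpath, name)
            rel = os.path.relpath(bpath, backup_root)
            # Never touch a backup whose source still exists: however old,
            # it may be the most recent saved copy.
            if os.path.lexists(os.path.join(replica_root, rel)):
                continue
            if os.lstat(bpath).st_mtime < cutoff:
                doomed.append(bpath)
    doomed.sort()
    if not dry_run:
        for path in doomed:
            os.remove(path)
    return doomed
```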

> However, limiting “backupmaxage” to backups of files that are no longer
> present in the replicas poses the difficulty that the same backup
> directory may be (and by default indeed is) used for different pairs of
> replicas, so that it seems difficult to define the concept of “a backup
> file of a file that no longer exists in the replica”.

We need to fix that first.  I see the commingling of backup files from
multiple replicas as the largest issue with the scheme.

> All in all, I do not see a satisfactory and easy solution to this
> problem.

I think we can have knobs, and people can use them or not, depending on
whether they work for them.  Getting rid of backups is a balancing
act between deciding that the chance of wanting them is small compared
to the downsides of keeping them, both resource use and retention

> This is in contrast to deleting “backups” that are only kept to allow
> three way merges.  After all, after deleting a file no three-way merge
> is possible anyway.

True, that's a really good point.

>> I don't intend to work on this, because it solves a problem I don't
>> have, but if someone does, please feel free to discuss here,
>> especially if you're willing to write code, write manual text, and
>> write tests.
> I do not speak OCaml and have only limited time for such things, but
> I hope to have made the point that adding a mechanism for deleting
> “backupcurr” files would be easy, consistent, safe and useful.

Yes, you have, and you have IMHO more importantly pointed out that the
central storage scheme has collisions.

I would really like to hear what other people think about this.

If I don't hear anything, I'm inclined to open two tickets:

  - backup locations are mixed across replicas and between
    backupcurr/backup (defect ticket)

  - Add deletebackupcurr preference (feature ticket)
    If true, remove files created by backupcurr when the source file is no
    longer in the replica.  Perhaps do this on delete, perhaps have a
    command to search for and remove them explicitly, perhaps run this
    every Nth sync, perhaps something else?
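Here is a Python sketch of the "on delete" variant. When a deletion of a path is propagated and the proposed deletebackupcurr preference is true, it removes that path's backupcurr copy. It also removes any parent directories left empty. The function name and the same-relative-path layout are assumptions, not Unison's actual code. With the preference false, which is the proposed default, nothing changes.

```python
import os


def delete_backupcurr_on_delete(rel_path: str, backupcurr_root: str,
                                deletebackupcurr: bool) -> bool:
    """Remove the backupcurr copy of a path whose deletion was just synced.

    Hypothetical hook; returns True if a copy was removed. Assumes the copy
    lives at the same relative path under backupcurr_root.
    """
    if not deletebackupcurr:
        return False  # default: keep the existing behavior
    root = os.path.abspath(backupcurr_root)
    target = os.path.normpath(os.path.join(root, rel_path))
    # Refuse paths that would escape (or equal) the backupcurr directory.
    if os.path.commonpath([root, target]) != root or target == root:
        raise ValueError(f"path escapes backupcurr root: {rel_path!r}")
    try:
        os.remove(target)
    except FileNotFoundError:
        return False
    # Tidy up parent directories left empty, stopping at the root.
    parent = os.path.dirname(target)
    while parent != root:
        try:
            os.rmdir(parent)
        except OSError:
            break  # not empty (or not removable): stop here
        parent = os.path.dirname(parent)
    return True
```

An explicit search/rm command would instead walk the backupcurr directory and apply the same "is the source still in the replica?" check to every file.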
