[Unison-hackers] Memory exhaustion issue (#1068)

Michael von Glasow michael at vonglasow.com
Sun Jan 5 11:25:38 EST 2025


On 05/01/2025 15:39, Greg Troxel wrote:
> What are you doing about swap space?  Does your system page?  If you are
> running a RPI3 in aarch64 mode, you could configure 32GB of swap space
> on a USB-attached SSD, and then unison should run.  Have you tried that?
I originally had 3 GB of swap space on a USB-attached magnetic disk. The
result is that, as soon as physical memory is exhausted, the system
becomes unusable because it spends most of its time waiting for swapping
operations to complete. An SSD might be faster, but the connection is
still over USB 2.0, so it might make no difference at all.
>> I’d be curious to know what the “average” use case for Unison looks
>> like. Does anyone happen to have an idea of how people out there are
>> using Unison?
> There are about 1000 people on the users list, so I'd guess there are at
> least 10K users.  But I'm guessing.   Unison, being honorable Free
> Software, does not contain tracking code to report.
Thank God there’s no tracking code in Unison :-) That’s what I expected,
but I thought you guys might have an idea based on discussions on this
list, bug reports etc. You’ve probably seen a lot more usage scenarios
than I have.
>> For my use case, around a terabyte of data in a million files, accessed
>> by a handful of users and fairly static, would be well within the means
>> of a Pi 3 with 1 GB of RAM – as long as we’re just talking about file
>> sharing via CIFS or SFTP. It’s only when Unison gets involved that the
>> system reaches its limit.
> CIFS/SFTP is not a fair comparison, because that is access not sync.
This was not meant to imply that others do the job better, but to
illustrate the perspective of a user who adds sync capabilities to their
existing file server: without sync, 1 GB (probably even less) will work
just fine. But as soon as sync enters the picture, a lot more memory is
suddenly needed. Other tools might have similar issues – except that the
“competitors” (Syncthing, git etc.) work quite differently from Unison
and are definitely not a drop-in replacement.
> You could invent virtual memory and use it :-)
See above – disk access times make this impractical.
> Or you could implement
> application-specific virtual memory.   This would look like on-disk
> storage for archives, with a cache of objects in memory, and reading
> them on demand.
The general issue with any kind of virtual memory and swapping is
predicting what data will be needed, and what is OK to swap out. Keeping
the archive on disk and caching just parts of it in memory is a kind of
memory virtualization, just like swapping, except that it happens in the
application, where the code can be tuned to minimize swap operations. For
example, if archive entries are identified by their hash code, the order
in which files are compared could be made dependent on those hash codes,
so that each chunk of the archive needs to be loaded into memory only
once during the comparison step.
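To make the idea concrete, here is a minimal sketch of that chunked,
hash-ordered comparison. This is not Unison's actual archive format; the
chunking scheme, function names, and the 256-chunk split are all
assumptions for illustration:

```python
import hashlib

def chunk_id(path):
    """Map a path to one of 256 hypothetical archive chunks
    via the first byte of its hash."""
    return hashlib.sha1(path.encode()).digest()[0]

def compare_in_hash_order(paths, load_chunk, compare):
    """Walk the file list sorted by chunk id, so each on-disk
    archive chunk is loaded into memory at most once."""
    current_id, chunk = None, None
    for path in sorted(paths, key=chunk_id):
        cid = chunk_id(path)
        if cid != current_id:
            chunk = load_chunk(cid)  # one disk read per chunk
            current_id = cid
        compare(path, chunk)
```

Because the work list is sorted by chunk id, all files belonging to one
chunk are processed consecutively, which is what bounds the number of
disk reads to one per chunk.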
> You could also choose to generate profiles for subparts of the file tree
> and run them sequentially.  I have organized files by directory and sync
> them separately anyway, because I want to control which directories get
> synced to which subset of places, for various reasons not about unison.

That’s what I did before. However, I sometimes move files between these
subparts (and incidentally, these tend to be on the larger side, several
GB per file). These files would then get copied over the wire again,
instead of the move operation just being mirrored on the other end. And
with a copy-on-write filesystem with snapshots, there are further
implications regarding disk space.
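For what it’s worth, the per-subtree setup quoted above is simple to
script, which is part of its appeal despite the move-detection drawback.
A minimal sketch, assuming one Unison profile per subtree (the profile
names in the usage comment are placeholders):

```shell
#!/bin/sh
# Run Unison profiles one at a time, so only one archive is
# held in memory per run. Stops at the first failing profile.
sync_all() {
    for profile in "$@"; do
        unison "$profile" -batch || return 1
    done
}

# Example usage: sync_all photos music documents
```

Each `unison <profile> -batch` invocation loads only that profile's
archive, trading peak memory for the cross-subtree move detection
described above.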