[Unison-hackers] Memory exhaustion issue (#1068)
Michael von Glasow
michael at vonglasow.com
Sun Nov 24 10:09:00 EST 2024
On 24/11/2024 13:29, Tõivo Leedjärv wrote:
> On Sat, 23 Nov 2024 at 23:25, Michael von Glasow <michael at vonglasow.com> wrote:
>
>> The big profile has some 700,000 items, around 350 G in total, archive
>> file size around 58M.
> This can be a source of major memory consumption. The archive is
> loaded entirely into memory and the in-memory representation is such
> that I wouldn't be surprised if this archive alone used up 1 GB of
> memory.
My tests suggest it’s close to 300–400 MB (according to top output,
300.3M RES, 68.2M SWAP right after scanning).
> There may not be any low-hanging fruit left that would significantly
> reduce this specific memory usage. Syncing 700 000 with just 1 GB of
> RAM is not too common.
There’s probably always going to be someone who pushes the envelope yet
another bit further. When the ratio of content to sync versus available
memory exceeds a certain point, it would make sense to look into other
ways of optimizing memory usage: either by moving part of the in-memory
cache to a temporary file on disk (which makes disk I/O more controllable
than paging, since the process decides when to request disk access rather
than being at the mercy of the OS), or by cutting some corners on checks
in order to lower memory requirements. Either approach trades disk usage,
execution time and/or bandwidth for lower memory consumption, but that
hurts less than running out of memory, with Unison crashing or even the
whole system becoming unstable.
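To illustrate the first option, here is a very rough sketch of what
spilling archive entries to a temporary file could look like in OCaml.
This is not Unison's actual archive code; all names are made up for
illustration:

    (* Sketch only: keep just a byte offset per entry in memory and
       marshal the full entry to a spill file on disk, reading it back
       on demand. *)
    type entry_ref = int  (* offset of the marshalled entry in the spill file *)

    let spill_path = Filename.temp_file "unison-archive" ".spill"

    let write_entry (oc : out_channel) entry : entry_ref =
      let pos = pos_out oc in
      Marshal.to_channel oc entry [];
      pos

    let read_entry (ic : in_channel) (pos : entry_ref) =
      seek_in ic pos;
      Marshal.from_channel ic

The in-memory footprint would then grow with the number of entries times
the size of an offset rather than with the full entry representation, at
the price of a seek and an unmarshal whenever an entry is needed.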
> the major memory consumer is the
> archive itself and my archive was the minimal possible, only one file.
What are the drivers for archive size? Number of files? Combined size of
files?
What happens to archive size if files are deleted from the root? Let’s
say I have a root with one small file and sync it; I’ll end up with a
minimal archive file. Now I add 700,000 files, 350 GB in total, and sync
again; the archive file grows to some 50–60 MB. Next I delete all these
files and sync again: will the archive file be back to its minimal size?
Or will it stay at 50–60 MB? Or somewhere in between?
> I tested with rsync=false and memory consumption was slightly lower
> (by a few MB).
I’ll try to reproduce with a VM (as I don’t have spare hardware and
don’t want to mess with my production system). My idea for a test would be:
* create a VM running Ubuntu 22.04, 1 GB RAM, some 24 GB HD, no swap
space (so I’d run out of memory rather than getting excessive swapping)
* test both the repo version and the latest CI binaries of Unison (makes
my life a lot easier if the architecture is amd64)
* test one pair of roots with 700,000 files of random content, each 1 KB
in size, plus one 16 GB file (see the sketch below for generating the
small files)
* test a minimal pair of roots (one file, 1 KB in size)
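For generating the 700,000 small files I would use something along these
lines (a sketch; the root path is just a placeholder, and the files are
spread over subdirectories so no single directory gets huge):

    (* Sketch: create n files of 1 KB of pseudo-random content under dir,
       spread over 1000 subdirectories. Needs the unix library for mkdir. *)
    let generate_test_files dir n =
      Random.self_init ();
      let buf = Bytes.create 1024 in
      for i = 1 to n do
        for j = 0 to Bytes.length buf - 1 do
          Bytes.set buf j (Char.chr (Random.int 256))
        done;
        let subdir = Filename.concat dir (string_of_int (i mod 1000)) in
        (try Unix.mkdir subdir 0o755
         with Unix.Unix_error (Unix.EEXIST, _, _) -> ());
        let oc = open_out_bin
            (Filename.concat subdir (Printf.sprintf "file%07d" i)) in
        output_bytes oc buf;
        close_out oc
      done

    (* e.g. generate_test_files "/path/to/root-a" 700_000 *)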
> You mentioned that the memory usage reduces on its own after a few
> hours (but was this actual reduction or the process being swapped
> out?). This indicates that it may not be a true memory leak but rather a
> delayed GC.
I’ll repeat that test with the VM (no swap), then we’ll see.
> I tested this by patching 2.53.7 to force a GC run after the rsync
> copy path and this reduced the memory consumption on server from 28 MB
> (after syncing 6400 MB file) down to 9 MB.
Would it make sense to trigger a GC whenever a potentially large chunk
of memory has been released? By potentially large I mean anything whose
size grows with the amount of data to be synced (total number of files,
total size of files, number of changed files, size of changed files,
etc.). While I’d expect OCaml to trigger a GC when memory runs low, I
don’t know whether that also happens when physical memory is low but
there is still plenty of swap space.
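What I have in mind is roughly the following (just a sketch;
transfer_large_file is a made-up placeholder for whatever code path
holds the big buffers):

    (* Sketch: run the expensive step, then force a compaction so the
       freed heap space has a chance of being returned to the OS instead
       of lingering until the next natural major collection. Gc.compact
       implies a full major collection. *)
    let with_compaction f =
      let result = f () in
      Gc.compact ();
      result

    (* hypothetical usage:
       with_compaction (fun () -> transfer_large_file src dst) *)

As far as I understand, the OCaml GC is driven by allocation and the
space_overhead setting rather than by how much physical memory the OS
has left, so an explicit compaction at well-chosen points may be the
only reliable way to shrink the heap.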
> - The number of items in the archive (this is the primary metric for
> "static" memory consumption) is 700 000, which is a huge number for
> a system with only 1 GB of memory (not an excuse, just reality).
> Actually copying files (with or without the "smarts") is probably
> a measuring noise compared to this (or, has it been shown to be
> otherwise?).
In the real scenario, transferring a new 16 GB file to the server froze
the server system at 1%. With `copythreshold = 163840` the transfer was
uneventful.
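For reference, the profile line in question (if I read the manual
correctly, copythreshold is given in kilobytes, so 163840 means files
above 160 MB go through the external copy program specified by copyprog):

    # hand files larger than 160 MB (163840 KB) to the external copy
    # program instead of Unison's built-in transfer
    copythreshold = 163840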
> - You mentioned memory being freed (or rather swapped out?) after two
> hours, but only between scan and sync. Was memory not freed a few
> hours after the sync?
Not sure as I didn’t test that, but I can do so.
> For next steps:
>
> - The tests you ran should be repeated with both rsync=true (the
> default) and rsync=false to compare the results.
Currently in progress with rsync=false (live environment with the
700,000 file archive). 4% of the 16 GB has gone over the wire, the system
is still responding, and Unison memory usage is around 570M RES, 140M
swap. However, data transfer rates are oscillating heavily and seldom
exceed 1 MByte/s. Network I/O reflects that, staying well below 30 Mbps
most of the time. Disk I/O is similarly low (reads and writes well below
10 MiB/s most of the time), so swapping doesn’t seem to be what’s causing
the delays. CPU load varies between 25% and 75%. It might take a while
until I know for sure whether it worked.
> - If you can, you should run the tests with completely empty roots.
> This way we can measure the archives consuming memory vs the copying
> consuming memory.
See test scenarios above. I’ll test one archive with only one file,
another with many files. What I might not be able to test in the VM
scenario is a large total size of files.
> - If we can figure out how you can build yourself or if someone can
> help with this, I can provide some patches for you to try and see
> if they make any difference.
Building for amd64 should not be an issue; at worst, I can have CI do
that for me (I still have my own fork on GitHub). Only other processor
architectures are difficult, as I don’t have a build machine and would
need to set up a VM.