[Unison-hackers] Memory exhaustion issue (#1068)

Tõivo Leedjärv toivol at gmail.com
Sun Jan 5 07:06:47 EST 2025


On Wed, 27 Nov 2024 at 20:41, Michael von Glasow <michael at vonglasow.com> wrote:
>
> At 35000 files: 33.5M
> At 70000 files: 51.6M
> At 140000 files: 89.7M
> At 210000 files: 126.0M
> At 280000 files: 170.1M
> At 350000 files: 213.0M
> At 420000 files: 250.1M
> At 490000 files: 290.3M
> At 560000 files: 339.1M
> At 630000 files: 377.3M
> At 686000 files: 478M
> At 700000 files: 523.5M
>
> That is, memory consumption initially increases fairly linearly with
> the number of files, about 40 MB per 70,000 files, or some 570 bytes
> per file. After reaching 490,000 files (70%), the rate goes up
> slightly (the next 70,000 files occupy about 50 MB), then returns to
> the old rate until, above 630,000 files, memory usage increases
> dramatically: the next 56,000 files occupy around 100 MB of memory,
> which would correspond to some 1.8 kB per file, almost twice the size
> of a file (unless one of these figures lags behind the other). On
> average, one file occupies around 730 bytes of memory.
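
For reference, those rates can be recomputed with a quick
back-of-the-envelope calculation. A minimal OCaml sketch (taking "MB"
as 10^6 bytes; the rounding in the quoted figures means the results
come out slightly different):

    (* Recompute the per-file rates from the measurements above. *)
    let () =
      let rate files_lo mb_lo files_hi mb_hi =
        (mb_hi -. mb_lo) *. 1e6 /. float_of_int (files_hi - files_lo)
      in
      Printf.printf "0 to 490k files:    %.0f B/file\n"
        (rate 0 0.0 490_000 290.3);
      Printf.printf "630k to 686k files: %.0f B/file\n"
        (rate 630_000 377.3 686_000 478.0);
      Printf.printf "average at 700k:    %.0f B/file\n"
        (523.5 *. 1e6 /. 700_000.)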

I have run tests similar to yours and the results are roughly the
same. This is normal: it is Unison building up its in-memory database
(the "archive"), which naturally grows with each synced file. I also
ran tests with memtrace, as suggested by Jacques-Henri, and found no
obvious leaks or bugs (which doesn't mean there couldn't be any, just
that they're not easy to find).
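
For anyone who wants to repeat the memtrace experiment, the needed
instrumentation is minimal. A generic sketch (not Unison's actual
code; it needs the memtrace opam package):

    (* Tracing is a no-op unless the MEMTRACE environment variable
       names an output file, e.g. MEMTRACE=sync.ctf ./a.out *)
    let () =
      Memtrace.trace_if_requested ();
      (* ... run the program as usual ... *)
      ()

The resulting .ctf trace can then be inspected with memtrace-viewer.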

This increase in memory usage is due to the metadata and is not
related to the size of synced files; it is a direct function of the
number of synced items.

What explains the issue suddenly appearing after you upgraded from
the older Unison version is that you also moved from a 32-bit to a
64-bit build. OCaml values are stored in word-sized blocks in memory,
so going from 32 to 64 bits roughly doubles the memory needed for
pretty much all the metadata, with the notable exception of string
values (file/dir names and more complex props, such as xattrs and
ACLs, if you're syncing those).
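
To make that concrete, here is a small self-contained sketch (a
hypothetical record, not Unison's actual archive type) that can be
compiled on both a 32-bit and a 64-bit switch to see the effect:

    (* Each record field takes one word, so the record block (header
       plus 4 fields) is 20 bytes on 32-bit and 40 bytes on 64-bit.
       The string payload is byte-sized and stays roughly the same.
       Obj.reachable_words counts the whole reachable graph. *)
    type entry = {
      name  : string;  (* the pointer is one word; payload is bytes *)
      inode : int;
      size  : int;
      perms : int;
    }

    let () =
      let e = { name = "example.txt"; inode = 42;
                size = 1024; perms = 0o644 } in
      let words = Obj.reachable_words (Obj.repr e) in
      Printf.printf "%d-bit build: entry uses %d words = %d bytes\n"
        Sys.word_size words (words * (Sys.word_size / 8))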

You might be wondering whether ~800 bytes per file is reasonable, and
whether the entire archive really has to be in memory. For the
latter, I can only say that this has not been a limitation so far and
your case is rather extreme. For the former, it is difficult to say.
Unison does keep quite a bit of metadata, but not for nothing: all of
it is required for functionality, performance and safety. It may very
well be possible to tweak some data structures and squeeze a few
bytes out of the memory representation. It may also be possible to
find opportunities to discard some values sooner and let the GC
reclaim the memory. But none of this is immediately obvious; as I
wrote before, I believe all the low-hanging fruit is gone already.
Getting further meaningful savings requires considerable effort, and
small wins (for example, one word, or 8 bytes, per file) are not
going to cut it.
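
To illustrate the kind of tweak involved (and the size of its
payoff), here is a hypothetical example, not an actual Unison change.
Packing three small integer fields into one immediate int eliminates
a 4-word heap block per entry, about 32 bytes on 64-bit, or roughly
22 MB across 700,000 files; real, but small next to 500+ MB:

    (* Hypothetical: three loose int fields cost a heap block of
       header + 3 words; a bit-packed int is immediate and costs no
       heap block at all. The packing below assumes a 64-bit build. *)
    type loose  = { perm : int; uid : int; gid : int }
    type packed = int  (* perm: bits 0-11, uid: 12-31, gid: 32-51 *)

    let pack ~perm ~uid ~gid : packed =
      perm lor (uid lsl 12) lor (gid lsl 32)

    let perm_of (p : packed) = p land 0xfff
    let uid_of  (p : packed) = (p lsr 12) land 0xfffff
    let gid_of  (p : packed) = (p lsr 32) land 0xfffff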

