[Unison-hackers] Memory exhaustion issue (#1068)
Tõivo Leedjärv
toivol at gmail.com
Sun Nov 24 06:29:27 EST 2024
On Sat, 23 Nov 2024 at 23:25, Michael von Glasow <michael at vonglasow.com> wrote:
>
> On 23/11/2024 23:13, Tõivo Leedjärv wrote:
> > What does unison -version report (it will show the compiler that was
> > used, as it may be important in this case)?
> unison version 2.53.3 (ocaml 4.14.1)
So that is good. I just wanted to make sure it wasn't compiled with
OCaml 5.x (not that 5.x would automatically mean worse results, but
there have been GC regressions early in the 5.x series).
> The small profile has some 8,400 items, slightly above 40 G in total.
> Archive file size is around 750 k.
Ok, this should not make a major difference.
> The big profile has some 700,000 items, around 350 G in total, archive
> file size around 58M.
This can be a source of major memory consumption. The archive is
loaded entirely into memory and the in-memory representation is such
that I wouldn't be surprised if this archive alone used up 1 GB of
memory.
Something to consider here is that moving from 32 bits to 64 bits
doubles the memory requirement for pointers and integers. That may
have a considerable impact; not saying that it does here but it's
something to keep in mind.
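To make the 32/64-bit point concrete, here is a rough OCaml sketch
(not Unison's actual archive type; the "archive" below is just a
stand-in list of 700 000 path/metadata pairs) that estimates the heap
footprint of a value with Obj.reachable_words:

    (* Rough sketch: estimate the heap footprint of a loaded value.
       "archive" is a stand-in for whatever structure is built after
       unmarshalling; it is not Unison's actual archive type. *)
    let report_size name value =
      let words = Obj.reachable_words (Obj.repr value) in
      let bytes = words * (Sys.word_size / 8) in
      Printf.printf "%s: %d words, ~%d MB (%d-bit runtime)\n"
        name words (bytes / (1024 * 1024)) Sys.word_size

    let () =
      (* 700 000 items, each a (path, metadata-tuple) pair. *)
      let archive =
        List.init 700_000
          (fun i -> (Printf.sprintf "dir/file%07d" i, (i, i, i)))
      in
      report_size "archive" archive

Even this toy structure takes tens of MB, and the real archive keeps
far richer metadata per item; every pointer and every word-sized
integer field costs 8 bytes on a 64-bit runtime instead of 4, which
is why a 58 MB on-disk archive can end up several times that size in
memory.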
> >> times the size of the file – quite a lot IMO. And for a tool which can
> > The size of a synced file does not impact memory usage, so you are
> > most likely looking at a bug.
> 10 times the size of the file refers to the archive file (if that were
> 10 times the size of the synced file, it would have caused way bigger
> problems, much earlier... it's not that bad :-)
Right, I missed that. But that's how it is right now. Like I said, I
wouldn't be surprised if this archive actually used up 1 GB of memory.
There may not be any low-hanging fruit left that would significantly
reduce this specific memory usage. Syncing 700 000 items with just
1 GB of RAM is not a common scenario.
> Scanning files for parts already present at the other end certainly
> makes sense in a scenario where bandwidth is more limited than memory –
> some of my Unison use cases involve connections with limited bandwidth.
> Only in this particular use case, bandwidth is decent (100 M) but memory
> is limited, and the bandwidth-saving approach, although well-meant, ends
> up backfiring.
I've also been thinking about this in general (and some of it may
have come up in public discussions), but it's not obvious where the
balance lies for defaults that best support a wide range of users.
And remember, some of the features/defaults are from times when many
people still used dial-up. In general, the default assumption is that
disk speed >> network speed, but maybe this assumption no longer holds
for many, or even the majority, of current Unison users? But we
digress.
> In more technical terms, a possible approach would be to detect when
> we're running out of memory (physical memory – excessive swapping can
> quickly cripple a system). If such a condition is detected, skip
> duplicate detection on that file (freeing up the memory used for that)
> and just copying it, (potentially) sacrificing bandwidth efficiency for
> stability. This is roughly what I’m currently doing with -copythreshold,
> except that the threshold is determined automatically and adapts itself
> to the current situation.
Well, you shouldn't be forced to use workarounds.
I ran a few tests similar to yours but slightly simplified, with
2.53.7. The roots I synced contained only the single test file and
nothing else. While I did see the monotonic increase in memory usage,
the numbers are so small that they hardly matter: all the numbers I
saw were < 35 MB. As per above, the major memory consumer is the
archive itself, and my archive was as minimal as possible, containing
only one file. I also tested with rsync=false and memory consumption
was slightly lower (by a few MB).
I also did a quick comparison against 2.51.3 and 2.51.5 (I did not
have 2.52 at hand). While I saw slightly reduced memory usage compared
to 2.53.7, the difference is too small to draw conclusions from and
could come down to the different compilers (with potential differences
in GC behavior) that were used.
You mentioned that the memory usage reduces on its own after a few
hours (but was this an actual reduction or the process being swapped
out?). This indicates that it may not be a true memory leak but rather
delayed GC.
I tested this by patching 2.53.7 to force a GC run after the rsync
copy path, and this reduced the memory consumption on the server from
28 MB (after syncing a 6400 MB file) down to 9 MB.
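For reference, the idea behind that test patch, as a sketch rather
than the actual diff against 2.53.7, is simply to force the collector
once the copy path is done:

    (* Sketch of the experiment: once the copy/rsync path has
       finished, force the GC to run instead of waiting for it to get
       around to the now-dead buffers on its own. *)
    let after_transfer () =
      (* Gc.compact performs a full major collection and compacts the
         major heap; freed chunks may be returned to the OS. *)
      Gc.compact ();
      let st = Gc.stat () in
      Printf.printf "live words after compaction: %d\n" st.Gc.live_words

A real fix, if one is needed, would have to be placed more carefully,
but the experiment does show the extra memory is reclaimable rather
than leaked.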
A summary of my understanding so far:
- Memory usage increased when migrating from 2.52.1 to 2.53.3. However,
the OS was migrated from 32-bit to 64-bit at the same time, so it is
difficult to attribute the memory usage increase to the version
upgrade as opposed to 64-bit software simply requiring more memory.
- The number of items in the archive (this is the primary metric for
"static" memory consumption) is 700 000, which is a huge number for
a system with only 1 GB of memory (not an excuse, just reality).
Actually copying files (with or without the "smarts") is probably
measurement noise compared to this (or has it been shown to be
otherwise?).
- You mentioned memory being freed (or rather swapped out?) after two
hours, but only between scan and sync. Was memory not freed a few
hours after the sync?
For next steps:
- The tests you ran should be repeated with both rsync=true (the
default) and rsync=false to compare the results (see the profile
sketch after this list).
- If you can, you should run the tests with completely empty roots.
This way we can separate the memory consumed by the archives from
the memory consumed by the copying.
- If we can figure out how you can build Unison yourself, or if
someone can help with this, I can provide some patches for you to
try, to see if they make any difference.
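For the rsync on/off comparison, a minimal profile along these lines
should be enough (the paths and profile name are placeholders; only
the rsync preference changes between runs):

    # test-memory.prf -- placeholder roots, adjust to your setup
    root = /path/to/local/root
    root = ssh://server//path/to/remote/root

    # first run with the default (rsync = true), then flip to:
    rsync = false

Then run e.g. "unison test-memory -batch" with each setting and note
the peak memory (RSS) of the client and server processes for each run.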