[Unison-hackers] Memory exhaustion issue (#1068)
Greg Troxel
gdt at lexort.com
Sat Nov 23 16:01:57 EST 2024
Michael von Glasow <michael at vonglasow.com> writes:
> Switching profiles should be sufficient – this created a new unison
> process on the server, while the old one gradually freed up its memory
> (but kept running).
Sorry, I guess I was unclear. I am not looking for something that is
merely sufficient to make your case, but for the simplest possible way to
reproduce the problem, expressed programmatically so that others can run
it (after reading the code to feel it is safe). So that's no GUI, no
persistent server, and everything in the profile to be synced created by
the script.
If it turns out that you need a persistent server, or you need a GUI,
that's how it is -- but it's also a huge clue.
For comparing with other people, I think it will be more useful to talk
about KB or MB of memory usage vs %. Lots of people will have different
amounts of RAM and other loads, although 1 GB such as RPI3 is pretty
common.
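To make that concrete, a percentage can be converted to MB once total
RAM is known. A minimal sketch, assuming the 1 GB (1024 MB) figure
mentioned elsewhere in this thread; pct_to_mb is a made-up helper name,
not part of any tool:

```sh
# pct_to_mb PCT TOTAL_MB: convert a %MEM figure to MB of resident memory.
# TOTAL_MB is an assumption here; substitute the machine's real RAM size.
pct_to_mb() {
    awk -v pct="$1" -v total="$2" 'BEGIN { printf "%.1f\n", pct * total / 100 }'
}

pct_to_mb 14.6 1024   # the figure reported after the 16000M file
```

On a 1 GB machine that works out to about 150 MB; the same 14.6% on an
8 GB machine would be well over 1 GB, which is why absolute numbers
compare better.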
I personally am not interested in debugging anything older than 2.53.7
(or 2.53.6, but there is no reason to use 2.53.6 if you are compiling).
(You are of course welcome to debug older versions, and it's fine for
others to help you with this on the list.)
> For now, here’s the results of my test series. I performed sync using
> the GUI, keeping it running between sync runs, over an SSH connection.
>
> I created test files with:
>
> dd if=/dev/urandom of=/path/to/testfile bs=1M count=SIZE_IN_MBYTE status=progress
>
> Each run comprised the following steps:
>
> * Create test file
>
> * Scan (only change being the test file)
>
> * Sync, while monitoring server resources with top (in an SSH session)
> and Webmin stats
>
> * Delete file
>
> * Sync again
>
> * Repeat with next test file
>
> Test file sizes were 160M, 1600M, 3200M, 6400M, 12800M and 16000M (in
> that order).
>
> Unison on the server stayed running and got reused for each sync. Memory
> usage increased near-monotonically.
>
> After the 3200M file, memory usage was at 4.1%. After syncing 6400M, it
> climbed to 5.7% (+1.6%). After syncing 12800M, it went to 9.1% (+3.4%).
> After syncing 16000M, it went to 14.6% (+5.5%). When reusing the
> connection, increases happened only when transferring a new file, never
> during rescan, deletion or post-sync scans on the server.
So it sort of sounds like memory is allocated proportional to the size
of the transferred file, and not freed.
And that other memory used for scan/etc. is reused.
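As a rough check of that reading, the quoted increases can be
normalized by file size. This is only a sketch: it assumes the server
has 1 GiB of RAM (so 1% is about 10 MB), and growth_kb_per_mb is a
made-up helper name.

```sh
# growth_kb_per_mb DELTA_PCT FILE_MB [TOTAL_KB]
# KB of memory retained per MB of file transferred.
# TOTAL_KB defaults to 1048576 (1 GiB), an assumption about the server.
growth_kb_per_mb() {
    awk -v d="$1" -v f="$2" -v t="${3:-1048576}" \
        'BEGIN { printf "%.2f\n", d * t / 100 / f }'
}

growth_kb_per_mb 1.6 6400     # 6400M file,  +1.6%
growth_kb_per_mb 3.4 12800    # 12800M file, +3.4%
growth_kb_per_mb 5.5 16000    # 16000M file, +5.5%
```

Under that assumption the three figures come out at roughly 2.6, 2.8
and 3.6 KB per MB transferred: the same order of magnitude each time,
which fits growth tied to transfer size, though three points are not
much to go on.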
>> It may be that your issue is profile size and the big file was just the
>> last straw, not the bug.
>
> Looks like profile size is a factor indeed. After scanning the big set
> of files, Unison uses 50% of available memory (during scan it oscillated
> between just below 45% and just above 60%). The rest of the system took
> up around 30%.
>
> However, if I wait for 2h between scan and sync, Unison on the server
> frees up most of its memory. If I start the sync, it jumps back to 60%
> by the time sync reaches 1%.
This seems like buggy behavior. It will be interesting to see if it can
be reproduced by a script, with 2.53.7, with ocaml 4.14, and then by
others.
> The archive file is 50M in size, the whole set of files is somewhere
> around 350G. That is somewhat large, but 50% of 1 GB would be 500M, ten
> times the size of the file – quite a lot IMO. And for a tool which can
> get that memory-hungry, it might be worthwhile to look into ways to
> reduce memory usage.
Sure, it would be worthwhile and I hope somebody does!
> The ticket instructs users to read the wiki for advice on memory usage,
> but none of the articles there immediately spring to mind as
> memory-related. What article are you referring to? Or what settings are
> recommended?
I am referring to
https://github.com/bcpierce00/unison/wiki/Reporting-Bugs-and-Feature-Requests
where it says
- debug with the latest release
- debug with ocaml 4.14
- Reduce the complexity of what you are doing, even if what you ultimately want is complicated. Simplification steps include:
    - Sync locally.
    - Use the Text User Interface.
    - Turn off the watcher.
    - Turn off -repeat.
    - Turn off -auto.
It does not say "produce a shell script that provokes the problem" and
probably it should.
> Looking at the docs, what comes to mind is:
>
> - copyprog, copyprogthreshold (use external program <copyprog> for
> copying files larger than <copyprogthreshold> kB)
That is about to be deleted.
> Or is there a way to tell Unison to stop being smart and just copy the
> damn thing (which is presumably less memory-hungry) if a file is larger
> than a certain size?
I don't think so, but really that should not be necessary. If there is
code that uses memory when it shouldn't, we should find that and fix it.
When I run 2.53.7 with 4.14, with a profile with 1 file, I see
reasonable usage. As far as I can tell you are still testing with old
unison, and it is unclear which ocaml version it was built with.
So all I can say is that I'm not seeing what you're seeing.
In order to find the bug, my suggested path is for you to
- first, test with 2.53.7 with ocaml 4.14
- then, reduce complexity to the smallest repro recipe you can come
up with, expressed as a shell script, and then for others to try that.
If you end up with two repro scripts, one that doesn't show the bug
and one slightly different that does, all the better.
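A starting point for such a script might look like the following. It is
only a sketch: it syncs two local roots with the text UI, so it may well
not show the problem (which would itself be a clue). The file sizes, the
one-second polling interval and the helper names are arbitrary choices,
and it has not been checked against any particular unison version.
Nothing executes unless RUN_REPRO=1 is set, so it can be read first.

```sh
#!/bin/sh
# Sketch of a self-contained repro: local roots, text UI, no server, no GUI.

# make_testfile PATH SIZE_MB: write SIZE_MB MiB of random data to PATH.
# bs is given in bytes because the "1M" suffix is not accepted by every dd.
make_testfile() {
    dd if=/dev/urandom of="$1" bs=1048576 count="$2" 2>/dev/null
}

# rss_kb PID: print the resident set size of PID in KB (empty if it is gone).
rss_kb() {
    ps -o rss= -p "$1" | tr -d ' '
}

# sync_and_report ROOT1 ROOT2: run one batch sync and print the highest
# RSS (KB) seen while polling. Short spikes between polls are missed.
sync_and_report() {
    unison "$1" "$2" -ui text -batch >/dev/null 2>&1 &
    pid=$!
    peak=0
    while kill -0 "$pid" 2>/dev/null; do
        cur=$(rss_kb "$pid")
        [ -n "$cur" ] && [ "$cur" -gt "$peak" ] && peak=$cur
        sleep 1
    done
    wait "$pid" || true
    echo "$peak"
}

main() {
    work=$(mktemp -d) || exit 1
    UNISON="$work/dot-unison"; export UNISON   # keep archives out of ~/.unison
    mkdir "$work/a" "$work/b" "$UNISON"
    for size_mb in 160 1600 3200; do
        make_testfile "$work/a/testfile" "$size_mb"
        echo "size_mb=$size_mb peak_rss_kb=$(sync_and_report "$work/a" "$work/b")"
        rm "$work/a/testfile"
        sync_and_report "$work/a" "$work/b" >/dev/null   # propagate the deletion
    done
    rm -rf "$work"
}

if [ "${RUN_REPRO:-0}" = 1 ]; then main; fi
```

If this stays flat and the problem only appears once one root is moved
behind ssh, that difference is the second script of the pair.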
I tried syncing a directory with 59K files, 5GB. RSS (sixth column
below, in KB) stayed mostly under 45K, i.e. about 45 MB, and then jumped
up to about 59K right at the end.
gdt 22726 49.5 0.5 65456 44480 ? Os 3:45PM 2:54.78 unison -server __new-rpc-mode
gdt 22726 49.1 0.5 65168 43300 ? Rs 3:45PM 2:55.19 unison -server __new-rpc-mode
gdt 22726 48.7 0.5 65588 42880 ? Os 3:45PM 2:55.73 unison -server __new-rpc-mode
gdt 22726 48.8 0.5 65428 43252 ? Rs 3:45PM 2:56.27 unison -server __new-rpc-mode
gdt 22726 49.0 0.5 65112 43668 ? Rs 3:45PM 2:56.86 unison -server __new-rpc-mode
gdt 22726 49.0 0.5 62296 43028 ? Rs 3:45PM 2:57.42 unison -server __new-rpc-mode
gdt 22726 49.4 0.5 62296 43340 ? Ss 3:45PM 2:57.79 unison -server __new-rpc-mode
gdt 22726 46.9 0.6 65756 45760 ? Rs 3:45PM 2:58.07 unison -server __new-rpc-mode
gdt 22726 47.3 0.6 65756 45760 ? Os 3:45PM 2:58.41 unison -server __new-rpc-mode
gdt 22726 46.4 0.6 65756 46124 ? Rs 3:45PM 2:59.15 unison -server __new-rpc-mode
gdt 22726 47.6 0.6 69856 50972 ? Rs 3:45PM 2:59.67 unison -server __new-rpc-mode
gdt 22726 48.4 0.7 80104 56828 ? Rs 3:45PM 3:00.37 unison -server __new-rpc-mode
gdt 22726 48.4 0.7 80104 58864 ? Os 3:45PM 3:00.87 unison -server __new-rpc-mode
gdt 22726 48.5 0.7 80104 58864 ? Os 3:45PM 3:01.36 unison -server __new-rpc-mode
gdt 22726 48.2 0.7 80104 58864 ? Os 3:45PM 3:01.83 unison -server __new-rpc-mode
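For anyone comparing numbers, samples like the ones above can be boiled
down to a range. A small sketch, assuming the lines are in the usual
`ps u` column layout so that the sixth field is RSS in KB; rss_range is
a made-up helper name:

```sh
# rss_range: read `ps u`-style lines on stdin and print the smallest and
# largest value of field 6 (assumed to be RSS in KB), plus the peak in MiB.
rss_range() {
    awk 'NR == 1 || $6 < min { min = $6 }
         NR == 1 || $6 > max { max = $6 }
         END { printf "min_kb=%d max_kb=%d max_mib=%.1f\n", min, max, max / 1024 }'
}

# Possible use on saved samples with no header line (file name is hypothetical):
#   rss_range < unison-ps-samples.txt
```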
It is more or less necessary for unison to look at state per file, and
you may just be seeing paging when it goes over that data quickly. You
said 350 GB, but not how many files.
As for archive file size, for the directory above:
  58463 files
  4509845 bytes archive file
  77.1401570223 bytes/file
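The ratio is plain division, and it can be turned around to guess a
file count from an archive size. A sketch only: the helper names are
made up, "50M" is taken to mean 50 MiB, and bytes per file surely varies
with path lengths and other per-file state, so the estimate is very
rough.

```sh
# bytes_per_file ARCHIVE_BYTES NFILES: average archive bytes per file.
bytes_per_file() {
    awk -v b="$1" -v n="$2" 'BEGIN { printf "%.2f\n", b / n }'
}

# est_files ARCHIVE_BYTES BYTES_PER_FILE: file count implied by that ratio.
est_files() {
    awk -v a="$1" -v r="$2" 'BEGIN { printf "%d\n", a / r }'
}

bytes_per_file 4509845 58463              # the figures above
est_files $((50 * 1024 * 1024)) 77.14     # a 50 MiB archive at the same ratio
```

At the same ratio a 50 MiB archive would correspond to somewhere around
680K files, which is why the file count matters more than the 350 GB.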
Hope this helps....