reducing memory footprint in share reception #97
One source of large memory footprint is the known issue with twisted.web's POST handling, addressed in #29. This ticket is not about that.
The memory footprint as reported by 'make check-memory' for uploading large files directly from disk is currently about 29MB. This seems acceptable for now. This test does not allow the uploading node to hold its own shares.
The "upload-self" series of tests does allow the node to hold its own shares, and this increases the memory footprint considerably (it seems to stabilize at 59MB). I have two theories on this, both of which center on Foolscap's behavior: first, that garbage collection is not reclaiming received shares promptly, since gc only kicks in after enough objects have been allocated and freed; and second, that the eventual-send queue used by the loopback transport holds references to the shares until the entire queue has been drained.
The first theory should be exercised by writing a variant of the check-memory test which, instead of asking the isolated process-under-test to send files, has it receive shares, and watches its memory usage as it receives large numbers of shares of various sizes. If the first theory is correct, then I'd expect to see its memory usage be high when given small numbers (10s) of large (1MB) shares, since gc isn't going to kick in until we've seen hundreds of objects get allocated and freed. Large numbers (1000s) of small (10-100B) shares ought to trigger gc before the overhead gets too high.
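As a rough illustration of the kind of measurement such a test variant would make (a standalone sketch, not the actual check_memory.py harness; the share counts, sizes, and the receive_shares stand-in are all assumptions, and the RSS reading is Linux-specific):

import gc
import resource

def current_rss_kib():
    # Current resident set size in KiB, read from /proc/self/statm (Linux-specific).
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return resident_pages * resource.getpagesize() // 1024

def receive_shares(num_shares, share_size):
    # Stand-in for the node's share-reception path: hold each share briefly,
    # the way a remote write call would, then drop the reference.
    for _ in range(num_shares):
        share = b"\x00" * share_size
        # ... hand the share off to storage here ...
        del share

for num, size in [(10, 1000000), (10000, 100)]:
    before = current_rss_kib()
    receive_shares(num, size)
    gc.collect()
    after = current_rss_kib()
    print("shares=%d size=%d: RSS %d KiB -> %d KiB" % (num, size, before, after))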
If that theory doesn't hold up, then work on the second theory should start by replacing the code in broker.LoopbackTransport.write with a queue.append. The theory is that the naive
eventually(self.peer.dataReceived, data)
puts dozens of calls (plus the shares themselves) on the eventual-send queue, and then foolscap.eventual._SimpleCallQueue._turn isn't freeing the queue until all events have been processed. The second change to try out is to modify that _turn method to pop events off the queue as they are processed, to minimize the lifetime of those shares.
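For concreteness, a minimal sketch of that pop-as-you-go pattern (illustrative only, not the actual foolscap.eventual._SimpleCallQueue code):

from collections import deque

class CallQueue:
    # Illustrative stand-in for an eventual-send queue.
    def __init__(self):
        self._events = deque()

    def append(self, cb, *args, **kwargs):
        self._events.append((cb, args, kwargs))

    def _turn(self):
        # Pop each event before dispatching it, so the queue no longer holds a
        # reference to the callback's arguments (e.g. a share) once that event
        # has been run.
        while self._events:
            cb, args, kwargs = self._events.popleft()
            cb(*args, **kwargs)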
I also have some suspicions about the way that nested functions work: do they capture the state of all local variables, or just the ones that are referenced in their body? Is there some introspection trick (like locals() or dir() or something) that would obligate Python to keep all local variables alive? For this, we need to be aggressive about using del to get rid of shares/blocks the moment they are no longer necessary.
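As an illustration of that style, a hypothetical block-handling callback (not code from the tree; writer.put_block is a stand-in for the real remote write call):

def got_block(blockdata, blocknum, writer):
    # Forward one block to a remote writer, then drop our reference right away.
    d = writer.put_block(blocknum, blockdata)
    del blockdata   # only the in-flight call retains the block now
    def _done(result):
        # _done deliberately avoids referencing blockdata, so the nested
        # function cannot keep the block alive through its enclosing scope.
        return result
    d.addCallback(_done)
    return d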
Also, Deferreds do not do tail recursion, so any chains that involve things like defer.succeed will cause d.addCallback to run synchronously, and can result in very long call stacks (hundreds of frames long). This means that all the variables that were expected to go out of scope might stick around for a surprisingly long time, and when their lifetimes overlap, that increases the memory footprint. Having methods always return a Deferred (and using defer.succeed when the method is in fact synchronous) is an important pattern, because it improves flexibility for later: "Premature synchronization is the root of all evil (in an event-driven world)". To balance these, I think that foolscap.eventual.fireEventually can be used as a "reactor turn barrier", to ensure that the call chain is cut short (see the sketch below). Alternatively, maybe we should investigate Deferred and see how it might be made to implement tail recursion properly.

Another direction that might be promising would be to implement custom foolscap Slicers and Unslicers for blocks. I think we'd have to write the shares to disk during encoding, but if we did that then a Slicer could read just a small amount of data from disk each time the transport buffer opened up (using the producer/consumer model). This could effectively push the memory footprint off to disk.
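A minimal sketch of the turn-barrier idea mentioned above (handle_segment and the segment list are hypothetical, and this assumes a running reactor; foolscap.eventual.fireEventually returns a Deferred that fires on a later reactor turn):

from foolscap.eventual import fireEventually

def process_segments(segments, handle_segment):
    # Without a barrier, a chain of already-fired Deferreds runs each callback
    # synchronously, building a deep call stack that keeps every segment's
    # local variables alive at the same time.
    d = fireEventually(None)
    for seg in segments:
        d.addCallback(lambda ignored, seg=seg: handle_segment(seg))
        # Turn barrier: resume in a later reactor turn, letting the current
        # stack unwind and the previous segment's frames be released.
        d.addCallback(fireEventually)
    return d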
I think that in the long run, we'll be able to closely model the amount of memory footprint that each concurrent upload will represent (probably as a linear function of min(filesize, max_segment_size)), and then we can build an upload scheduler that sequences the upload work to keep our memory footprint below some configured ceiling. I.e. you can do many small uploads at once, but only a certain number of bigger ones; and once a file gets larger than 1MB it doesn't matter how big it is, because we're only processing a single segment at a time.

There are some useful notes about simultaneous-peer share pushing and the way that foolscap uses memory in #29, even though that ticket as a whole is focused on the web-POST problem that was just resolved.
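A rough sketch of what such admission control could look like (the cost-model constants, the ceiling, and the UploadScheduler class are all assumptions layered on the idea above, not existing code):

MAX_SEGMENT_SIZE = 1 * 1024 * 1024      # assumed segment-size cap
PER_UPLOAD_OVERHEAD = 2 * 1024 * 1024   # hypothetical fixed cost per upload
EXPANSION = 4                           # hypothetical multiplier for encoding buffers

def estimated_footprint(filesize):
    # Linear model: cost grows with the file until it reaches one segment.
    return PER_UPLOAD_OVERHEAD + EXPANSION * min(filesize, MAX_SEGMENT_SIZE)

class UploadScheduler:
    def __init__(self, ceiling):
        self.ceiling = ceiling   # configured memory budget, in bytes
        self.in_use = 0
        self.pending = []        # list of (cost, start_callable)

    def add(self, filesize, start_upload):
        self.pending.append((estimated_footprint(filesize), start_upload))
        self._maybe_start()

    def _maybe_start(self):
        # Start queued uploads as long as their estimated cost fits the budget.
        while self.pending and self.in_use + self.pending[0][0] <= self.ceiling:
            cost, start_upload = self.pending.pop(0)
            self.in_use += cost
            start_upload()

    def upload_finished(self, filesize):
        self.in_use -= estimated_footprint(filesize)
        self._maybe_start()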
Changed title from "reducing memory footprint" to "reducing memory footprint in share reception".

One data point: I observed the test framework process for the 100MB check_memory.py 'upload' test to hit 600MB. This framework process holds 4 or 5 nodes and is receiving all of the shares emitted by the program-under-test. It is the program-under-test whose memory footprint is stable at 29MB.
This was bad enough that I had to disable the 100MB test. The buildslave that hosts this test is on a somewhat memory-constrained VM, and that 600MB RSS caused the rest of the machine to thrash and choke.
If you love this ticket (#97), then you might like tickets #54 (port memory usage tests to windows), #419 (pycryptopp uses up too much RAM), #478 (add memory-usage to stats-provider numbers), and #277 (our automated memory measurements might be measuring the wrong thing).
warner wrote: "I also have some suspicions about the way that nested functions work: do they capture the state of all local variables, or just the ones that are referenced in their body?"
You are right to be suspicious: in CPython closures keep a reference to the entire enclosing frame. See Owen S.'s answer at http://stackoverflow.com/questions/3145893/how-are-closures-implemented-in-python .
BTW, this approach not only leads to memory leaks, but also gives the wrong -- or at least, unexpected by most programmers -- semantics for captured mutable variables. And since changing that would be an incompatible change, it's not likely that either the memory leaks or the semantics will be fixed.

Trunk now depends on a version of Twisted that solves the Deferred tail recursion problem.
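A minimal illustration of the surprising captured-variable semantics being referred to, as I understand it (the classic late-binding behaviour):

callbacks = []
for i in range(3):
    # Each lambda closes over the variable i itself, not its value at append time.
    callbacks.append(lambda: i)

print([f() for f in callbacks])   # prints [2, 2, 2], not [0, 1, 2]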