high memory usage during GET for large files and slow links #129
Load testing revealed that doing a GET of a large file through a slow link
causes the memory footprint of the decoding node to balloon to the size of
the file being downloaded. The cause is simple: decoding outpaces the
client's download, and we do a naive twisted.web transport.write for each
segment. This forces the transport to buffer all of the data that we've
written but that the client (in this case a browser on the other end of a
DSL line) has not yet received.
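For illustration, a minimal sketch of the naive pattern (hypothetical method
name, not the actual download code): twisted.web's request.write() never
blocks or applies back-pressure, so each decoded segment just joins the
transport's in-memory buffer.

```python
# Hypothetical sketch of the problematic pattern. request.write() only
# queues the bytes on the transport's buffer; if the client drains that
# buffer slowly, it grows toward the size of the whole file.
def _segment_decoded(self, segment, request):
    request.write(segment)  # buffers unboundedly for a slow client
```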
I can think of two possible solutions:

1. Make sure that decoding is a producer/consumer process. This means we
   hold off on downloading the shares for a given segment until the
   consumer (in this case the HTTP connection) says it wants more
   (because its buffer size has dropped below some value). This changes
   the control flow in download, not coincidentally mirroring a similar
   change in upload (to support offloaded-uploading, #116). See the
   sketch after this list.

2. Have the decode process write the data to a temporary file on disk,
   and then pass that off to the web transport to read at its leisure
   (and delete it when finished, using an anonymous filehandle).
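A minimal sketch of what option 1 could look like, using Twisted's real
IPushProducer interface but hypothetical names for the download hooks:

```python
from zope.interface import implementer
from twisted.internet.interfaces import IPushProducer

@implementer(IPushProducer)
class SegmentProducer:
    """Hypothetical helper: fetch/decode one segment at a time, and only
    while the transport's buffer has room."""

    def __init__(self, request, fetch_next_segment):
        self._request = request
        # assumed hook: callable returning a Deferred that fires with the
        # next segment's plaintext, or None when the file is done
        self._fetch = fetch_next_segment
        self._paused = False
        self._stopped = False
        request.registerProducer(self, streaming=True)
        self.resumeProducing()

    def pauseProducing(self):
        # transport buffer passed its high-water mark: stop fetching
        # shares until the client drains it
        self._paused = True

    def resumeProducing(self):
        self._paused = False
        self._fetch().addCallback(self._deliver)

    def _deliver(self, segment):
        if self._stopped:
            return
        if segment is None:                      # no more segments
            self._request.unregisterProducer()
            self._request.finish()
        else:
            self._request.write(segment)
            if not self._paused:                 # keep going if there's room
                self.resumeProducing()

    def stopProducing(self):
        # client went away: abandon the rest of the download
        self._stopped = True
```

With streaming=True, the transport calls pauseProducing() when its buffer
fills and resumeProducing() when the client drains it, so at most one
segment (plus whatever we choose to pipeline) is held in memory.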
Doing producer/consumer probably raises the memory footprint by 1MB for each
active download (holding one segment of plaintext in memory while we wait for
the client to download it, maybe 2MB if we pipeline the next segment's
shares).
The tempfile approach means downloads run full-throttle and then finish,
avoiding the memory overhead, but of course then we have a disk overhead of
the full file size for the duration of the download. In practice, the kernel
will cache these disk files until they get too large, then push them to an
actual disk, with a cache size varying according to whatever else is using
memory.
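For comparison, a sketch of the tempfile option, with the decode step done
synchronously here only for brevity; twisted.protocols.basic.FileSender is
itself a producer, so it drains the file at the client's pace:

```python
import tempfile
from twisted.protocols.basic import FileSender

def serve_via_tempfile(request, decode_into_file):
    """Hypothetical helper: decode at full speed into an anonymous
    tempfile, then let the web transport read it at its leisure."""
    f = tempfile.TemporaryFile()   # anonymous: vanishes when closed
    decode_into_file(f)            # assumed hook: writes the full plaintext
    f.seek(0)
    d = FileSender().beginFileTransfer(f, request)
    def _done(ignored):
        f.close()                  # deletes the anonymous file
        request.finish()
    d.addBoth(_done)
    return d
```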
I'm inclined to implement the producer/consumer thing, but when I think about
it, the kernel is in the best position to make the tradeoff between disk and
memory, so it might be a better approach to simply let it do its job. Client
behavior has an effect too: if people download half of a large file and then
quit and never come back, the tempfile approach means a lot of wasted
fetch/decode effort. On the other hand, the tempfile approach makes it a
*lot* easier to keep the tempfile around for a couple of hours in case
the client comes back to finish the job. (We'd have to implement
Content-Range: support on GET, but that might not be all that difficult.)
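If we went that route, resuming would hinge on honoring the client's Range
header. A rough, hypothetical sketch of the parsing side (single ranges
only), using twisted.web's real request methods:

```python
def parse_range(request, total_size):
    """Hypothetical single-range parser for resumed GETs.
    Returns the (start, end) byte span to send, end exclusive."""
    header = request.getHeader("range")        # e.g. "bytes=1000-"
    if not header or not header.startswith("bytes="):
        return 0, total_size                   # no usable Range: send it all
    first, _, last = header[len("bytes="):].partition("-")
    if first:
        start = int(first)
        end = int(last) + 1 if last else total_size
    else:
        start = total_size - int(last)         # suffix form: "bytes=-500"
        end = total_size
    request.setResponseCode(206)               # Partial Content
    request.setHeader("content-range",
                      "bytes %d-%d/%d" % (start, end - 1, total_size))
    return start, end
```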
I've added an automated memory test for this: check out the buildbot "memcheck" builder for the current numbers. As of right now, downloading a 50MB file and pushing it over a slow HTTP 'GET' link causes the node to peak at 89MB.
Fixed in changeset:1340c484c6c60c52. The producer/consumer stuff works great, and the memory footprint is now down to 29MB for a stalled download of a 50MB file (this is within 7% of the footprint of our other 50MB tests).
The new code also handles interrupted downloads extremely gracefully. The segment that is currently downloading completes, then the rest are skipped and the download finishes with a DownloadStopped exception.
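In outline, the interruption behavior amounts to something like this
(hypothetical hooks standing in for the real download machinery):

```python
class DownloadStopped(Exception):
    """Raised when the consumer calls stopProducing() mid-download."""

def download_segments(fetch_segment, num_segments, is_stopped, deliver):
    # fetch_segment(i) -> bytes, is_stopped() -> bool, and deliver(bytes)
    # are assumed hooks, not the actual download API.
    for i in range(num_segments):
        segment = fetch_segment(i)   # the in-flight segment always completes
        deliver(segment)
        if is_stopped():             # then the remaining segments are skipped
            raise DownloadStopped()
```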