memory usage in MDMF publish #1513
Reference: tahoe-lafs/trac-2024-07-25#1513
I did a 'tahoe put --mutable --mutable-type=mdmf foo' of a 210MB file. The client process swelled to 1.15GB RSS, making my entire system pretty unresponsive. The publish eventually succeeded, and the memory usage went back to normal.
I'm guessing that either there's a design problem in which it's trying to upload all segments in parallel, or there's a failure in the Pipeline code such that it's holding all shares in memory at the same time.
Since MDMF is supposed to make it possible to work with large files, I think the memory usage should be similar to CHK files: capped at a small constant times the segsize.
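To illustrate what "capped at a small constant times the segsize" means, here is a minimal sketch of segment-at-a-time buffering. It is not Tahoe-LAFS code: iter_segments, publish_streaming, and the send_segment callback are hypothetical names, and the real CHK uploader also encodes and encrypts each segment. The sketch only shows the pattern of holding one segment in memory at a time.

```python
import io


def iter_segments(stream, segsize):
    """Yield successive chunks of at most `segsize` bytes from `stream`."""
    while True:
        segment = stream.read(segsize)
        if not segment:
            return
        yield segment


def publish_streaming(stream, segsize, send_segment):
    """Hypothetical uploader: hand each segment to `send_segment` as it is read.

    Returns (number of segments, size of the largest buffer held), so the
    caller can see that the buffer never exceeds `segsize`.
    """
    count = 0
    peak = 0
    for segment in iter_segments(stream, segsize):
        peak = max(peak, len(segment))
        send_segment(count, segment)  # the segment can be dropped after this call
        count += 1
    return count, peak


if __name__ == "__main__":
    sent = []
    data = io.BytesIO(b"x" * 1000)
    print(publish_streaming(data, 128, lambda i, seg: sent.append(len(seg))))
```

With this shape the client's buffer depends on segsize rather than on the file size, which is the behaviour the ticket is asking for.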
It would be nice to fix this for 1.9, but since MDMF is still experimental, I'm willing to ship without it.
Hm, there's a tension between reliability and memory-footprint-performance here. When making changes, we want each share to atomically jump from version1 to version2, without it being left in any intermediate state. But that means all of the changes need to be held in memory and applied at the same time.
When we're jumping from "no such share" to version1, those changes are the entire file. The data needs to be buffered somewhere. If we were allowed to write one segment at a time to the server's disk, then a server failure or lost connection would leave us in an intermediate state, where the share only had a portion of version1, which would effectively be a corrupt share.
I can think of a couple of ways to improve this:
If we're willing to tolerate the disk-footprint, we could increase reliability against server crashes by making start_editing() create a full copy of the old share in a sibling directory (like incoming/, not visible to anyone but the edithandle). Then apply_delta() would do normal write()s to the copy, and finish() would atomically move the copy back into place. Everything in the incoming/ directory would be deleted at startup, and the temp copies would also be deleted when the connection to the client was lost. This would slow down the updates for large files (since a lot of data would need to be shuffled around before the edit could begin), and would consume more disk (twice the size of the share), but would allow edits to be spread across separate messages, which reduces the client's memory requirements. It would also reduce share corruption caused by the server being bounced during a mutable write.
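The copy-then-rename idea described above could look roughly like the following on the server side. This is a sketch under assumptions, not the storage server's actual code: the EditHandle class, the abort() and clean_incoming() helpers, and the file layout are hypothetical, and the rename is only atomic when incoming/ is on the same filesystem as the share.

```python
import os
import shutil
import tempfile


class EditHandle:
    """Hypothetical edit handle: all writes go to a private copy of the share."""

    def __init__(self, share_path, incoming_dir):
        self._share_path = share_path
        os.makedirs(incoming_dir, exist_ok=True)
        fd, self._temp_path = tempfile.mkstemp(dir=incoming_dir)
        os.close(fd)
        if os.path.exists(share_path):
            # Costs a full copy (time and disk) before the edit can begin.
            shutil.copyfile(share_path, self._temp_path)

    def apply_delta(self, offset, data):
        """Write one delta into the copy; the visible share is untouched."""
        with open(self._temp_path, "r+b") as f:
            f.seek(offset)
            f.write(data)

    def finish(self):
        """Atomically move the edited copy into place."""
        os.replace(self._temp_path, self._share_path)

    def abort(self):
        """Discard the copy, e.g. when the client connection is lost."""
        if os.path.exists(self._temp_path):
            os.remove(self._temp_path)


def start_editing(share_path, incoming_dir):
    return EditHandle(share_path, incoming_dir)


def clean_incoming(incoming_dir):
    """Delete leftover temp copies; intended to run at server startup."""
    if os.path.isdir(incoming_dir):
        for name in os.listdir(incoming_dir):
            os.remove(os.path.join(incoming_dir, name))
```

Because each apply_delta() call is an ordinary write to the copy, the client can send deltas in separate messages and never needs to hold the whole file in memory.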
Replying to warner:
I prefer this option: it allows the client to apply the deltas to all servers and confirm that those operations succeed, and only then send finish to all servers. But note that there needs to be an edithandle.truncate(new_size) operation, or alternatively .finish(new_size).
There are some memory usage measurements on the duplicate #1523. Particularly concerning is that there seems to be a rather large memory leak; it's not just high transient memory usage.
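The ordering preferred in the reply to warner (apply deltas everywhere, confirm, and only then send finish) can be sketched from the client's side as follows. FakeServer is an in-memory stand-in invented for this example, and publish() and abort() are hypothetical names; only start_editing, apply_delta, and finish(new_size) come from the discussion.

```python
class FakeServer:
    """In-memory stand-in for a storage server holding one share."""

    def __init__(self, share=b""):
        self.share = share
        self.staged = None

    def start_editing(self):
        self.staged = bytearray(self.share)

    def apply_delta(self, offset, data):
        end = offset + len(data)
        if len(self.staged) < end:
            self.staged.extend(b"\x00" * (end - len(self.staged)))
        self.staged[offset:end] = data

    def finish(self, new_size):
        # Truncate to new_size and make the staged copy the visible share.
        self.share = bytes(self.staged[:new_size])
        self.staged = None

    def abort(self):
        self.staged = None


def publish(servers, deltas, new_size):
    """Apply every delta to every server, then finish only if all succeeded.

    `deltas` may be a generator of (offset, data) pairs, so the client only
    needs to hold one delta in memory at a time.
    """
    for server in servers:
        server.start_editing()
    try:
        for offset, data in deltas:
            for server in servers:
                server.apply_delta(offset, data)
    except Exception:
        for server in servers:
            server.abort()
        raise
    for server in servers:
        server.finish(new_size)
```

Note that a failure partway through the final finish loop could still leave servers on different versions; this sketch does not address that case.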
Let's call it a memory "leak" if doing some operation repeatedly results in progressively greater memory usage, such that if you do that operation enough times it will use up all the memory in your system. Let's not call it a memory "leak" if it uses up way too much RAM. Note that last time I heard, CPython never releases memory back to the operating system: http://www.evanjones.ca/memoryallocator/
It sounds to me like there is a major problem here, which is that Tahoe-LAFS uses up way too much memory. I don't see evidence that there is a "leak" per se, and I don't consider it to be a major problem that CPython never releases memory back to the operating system.
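The distinction drawn above (a leak grows with every repetition, while high transient usage does not) can be checked with a small harness like this one. It is only a sketch: memory_after_each_run and looks_like_leak are hypothetical helpers, the 64 KiB tolerance is an arbitrary assumption, and tracemalloc counts Python-level allocations rather than the RSS figures quoted in this ticket.

```python
import tracemalloc


def memory_after_each_run(operation, runs=5):
    """Run `operation` repeatedly; record traced memory still held after each run."""
    tracemalloc.start()
    try:
        samples = []
        for _ in range(runs):
            operation()
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
        return samples
    finally:
        tracemalloc.stop()


def looks_like_leak(samples, tolerance=64 * 1024):
    """True if every run ends noticeably higher than the previous one."""
    return all(later - earlier > tolerance
               for earlier, later in zip(samples, samples[1:]))


if __name__ == "__main__":
    hoard = []
    leaky = lambda: hoard.append(bytearray(1_000_000))  # keeps every buffer alive
    transient = lambda: len(bytearray(5_000_000))       # big, but freed on return
    print(looks_like_leak(memory_after_each_run(leaky)))
    print(looks_like_leak(memory_after_each_run(transient)))
```

Repeating the same upload several times and comparing the memory retained after each one is the kind of test that would separate the two cases.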
We need to document this in the 1.9 release's docs/performance.rst if it isn't fixed.
Subject to fragmentation issues, CPython does return memory to the OS: http://bugs.python.org/issue1123430. I tried to test whether uploading a second file resulted in the same additional memory usage (suggesting a leak) or less (suggesting that not returning memory is part of the problem), but couldn't complete the test because my machine became unresponsive. I'll try again when I have more free memory.
Note that it's RSS that we're measuring, not virtual memory. Memory pages that aren't being used shouldn't be counted in RSS (eventually).
not making it into 1.9
This isn't going to make it into [1.11.0]. I think it requires a deep change. Ultimately I think it actually requires end-to-end two-phase-commit (#1755)!
Let's see, does docs/performance.rst already document this issue? source:trunk/docs/performance.rst?rev=514fb096be50464ce78933f4db48db4de40e7265#publishing-an-a-byte-mutable-file. Yes! Good.