large file download has bad alacrity #670
Reference: tahoe-lafs/trac-2024-07-25#670
A 10 GB file has about 80,000 segments, and the download process first downloads the entire Merkle Tree, which means 80,000 32-byte hashes. For some reason, building the Merkle Hash Tree object with http://allmydata.org/trac/tahoe/browser/src/allmydata/hashtree.py takes more than an hour. It could be slowness in our SHA256d implementation, or it could be something in the hash tree data structure. Brian is measuring exactly how long it takes to build hash trees of various sizes even as I write this.
This means it takes a long time before the download of the file contents even begins. Apache's reverse proxy sometimes times out while waiting for the file download to start, and wget can time out on this as well.
The best fix is probably to download and use only the subset of the Merkle Tree hashes that we actually need for the current block. Another improvement would be to optimize the construction of the hash tree object from those hashes.
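A quick back-of-the-envelope check of the figures above. The 128 KiB segment size is an assumption based on Tahoe's default; the ticket only says "about 80,000 segments":

```python
# Rough arithmetic behind the ticket's figures.
SEGMENT_SIZE = 128 * 1024      # bytes per segment (assumed Tahoe default)
FILE_SIZE = 10 * 2**30         # 10 GiB
HASH_SIZE = 32                 # one SHA-256d digest

segments = FILE_SIZE // SEGMENT_SIZE   # leaves of the block hash tree
tree_nodes = 2 * segments - 1          # full binary tree, power-of-two leaves
full_tree_bytes = tree_nodes * HASH_SIZE

print(segments)          # 81920 -- "about 80,000 segments"
print(full_tree_bytes)   # 5242848 -- ~5 MB of hashes fetched before byte one
```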
Zooko and I identified some super-linearities in IncompleteHashTree.add_hashes. I haven't yet characterized just how bad it is, but I think it's at least O(N**2). Our current download code grabs the entire merkle tree (both the block tree and the ciphertext hash tree) and adds the whole thing at once, triggering this superlinear behavior.
First: change IncompleteHashTree to fix the superlinearity. Add a test case which adds a couple thousand hashes, something which takes about 1.0s on my workstation (so it might take less than 60s on our slowest buildslave), but which would take a really really long time in the broken version. Some notes on the fix: instead of maintaining a sorted list of all pending hash nodes, break the list up into one set per level of the tree, and process everything on the lowest level (in arbitrary order) before proceeding to the level above.
Second: change download to retrieve the minimal Merkle chain. If we fix the first problem, we can probably put this off for a while.
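The level-at-a-time strategy can be sketched as follows. This is an illustrative stand-alone function, not the IncompleteHashTree API, and it assumes a power-of-two leaf count; each pass consumes an entire level before moving up, so the total work is linear in the number of nodes:

```python
import hashlib

def sha256d(data):
    """Double SHA-256 (used here for illustration)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(leaf_hashes):
    """Combine leaves bottom-up, one whole level at a time -- the shape
    of the fix: group pending nodes by level instead of keeping a single
    sorted list. Assumes len(leaf_hashes) is a power of two.
    Illustrative sketch only, not the hashtree.py API."""
    level = list(leaf_hashes)
    while len(level) > 1:
        # process everything on the current (lowest) level before
        # proceeding to the level above
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```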
For reference: Foolscap (serialization+deserialization) takes 84us per hash, so 80k hashes would take about 6.7s.
changeset:466014f66fbccff7 ought to fix the O(N**2) problem. On my laptop, an 80k-leaf tree now takes 10s to run add_hashes (compared to at least hours before). Preparing the dictionary takes about 13s. So I think this might reduce the alacrity to 29.7s plus however long it takes to transfer the hashes over the wire. When I get back to the office tomorrow I'll see if I can run a real test and measure the 10GB-file alacrity properly.
It's still a good idea to switch to using the minimal Merkle chain. I think that would reduce the alacrity to just a few RTTs.
Maybe we should close this ticket to commemorate the valiant slaying of the O(N**2) tree-building bug and open a new ticket to call for a hero to come slay the issue in which it stupidly downloads all the hashes instead of just the ones it needs.
Yeah, and let's reset the milestone to be whatever release it made it into. Someone else who runs into this problem (in the older release) should be able to find this ticket and learn that it's been solved by (newer release).
This fix to the superlinear tree-building algorithm was released in Tahoe-LAFS v1.4.1. The remaining issue -- that it dumbly downloads the entire Merkle Tree up front (thus worsening alacrity) -- is now in a new ticket: #800 (improve alacrity by downloading only the part of the Merkle Tree that you need).