large file download has bad alacrity #670
Reference: tahoe-lafs/trac-2024-07-25#670
A 10 GB file has about 80,000 segments, and the download process first downloads the entire Merkle Tree, which means 80,000 32-byte hashes. For some reason, building the Merkle Hash Tree object with http://allmydata.org/trac/tahoe/browser/src/allmydata/hashtree.py takes more than an hour. It could be slowness in our SHA256d implementation, or it could be something in the hash tree data structure. Brian is measuring exactly how long it takes to build hash trees of various sizes even as I write this.
This means it takes a long time before the download of the file contents even begins. Apache's reverse proxy sometimes times out while waiting for the file download to start, and wget can time out on this as well.
The best fix is probably to download and use only the subset of the Merkle Tree hashes that we actually need for the current block. Another improvement would be to optimize the construction of the hash tree object from those hashes.
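A quick back-of-the-envelope check of the figures above. The 128 KiB segment size is an assumption based on Tahoe's default; the ticket only says "about 80,000 segments":

```python
# Rough arithmetic behind the ticket's figures.
SEGMENT_SIZE = 128 * 1024      # bytes per segment (assumed Tahoe default)
FILE_SIZE = 10 * 2**30         # 10 GiB
HASH_SIZE = 32                 # one SHA-256d digest

segments = FILE_SIZE // SEGMENT_SIZE   # leaves of the block hash tree
tree_nodes = 2 * segments - 1          # full binary tree, power-of-two leaves
full_tree_bytes = tree_nodes * HASH_SIZE

print(segments)          # 81920 -- "about 80,000 segments"
print(full_tree_bytes)   # 5242848 -- ~5 MB of hashes fetched before byte one
```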
Zooko and I identified some super-linearities in IncompleteHashTree.add_hashes. I haven't yet characterized just how bad it is, but I think it's at least O(N**2). Our current download code grabs the entire merkle tree (both the block tree and the ciphertext hash tree) and adds the whole thing at once, triggering this superlinear behavior.
First: change IncompleteHashTree to fix the superlinearity. Add a test case which adds a couple thousand hashes, something which takes about 1.0s on my workstation (so it might take less than 60s on our slowest buildslave), but which would take a really really long time in the broken version. Some notes on the fix: instead of maintaining a sorted list of all pending hash nodes, break the list up into one set per level of the tree, and process everything on the lowest level (in arbitrary order) before proceeding to the level above.
Second: change download to retrieve the minimal Merkle chain. If we fix the first problem, we can probably put this off for a while.
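The level-at-a-time strategy can be sketched as follows. This is an illustrative stand-alone function, not the IncompleteHashTree API, and it assumes a power-of-two leaf count; each pass consumes an entire level before moving up, so the total work is linear in the number of nodes:

```python
import hashlib

def sha256d(data):
    """Double SHA-256 (used here for illustration)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(leaf_hashes):
    """Combine leaves bottom-up, one whole level at a time -- the shape
    of the fix: group pending nodes by level instead of keeping a single
    sorted list. Assumes len(leaf_hashes) is a power of two.
    Illustrative sketch only, not the hashtree.py API."""
    level = list(leaf_hashes)
    while len(level) > 1:
        # process everything on the current (lowest) level before
        # proceeding to the level above
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]
```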
For reference: Foolscap (serialization+deserialization) takes 84us per hash, so 80k hashes would take about 6.7s.
changeset:466014f66fbccff7 ought to fix the O(N**2) problem. On my laptop, an 80k-leaf tree now takes 10s to run add_hashes (compared to at least hours before). Preparing the dictionary takes about 13s. So I think this might reduce the alacrity to 29.7s plus however long it takes to transfer the hashes over the wire. When I get back to the office tomorrow I'll see if I can run a real test and measure the 10GB-file alacrity properly.
It's still a good idea to switch to using the minimal Merkle chain. I think that would reduce the alacrity to just a few RTTs.
Maybe we should close this ticket to commemorate the valiant slaying of the O(N**2) tree-building bug and open a new ticket to call for a hero to come slay the issue in which it stupidly downloads all the hashes instead of just the ones it needs.
Yeah, and let's reset the milestone to be whatever release it made it into. Someone else who runs into this problem (in the older release) should be able to find this ticket and learn that it's been solved by (newer release).
This fix to the superlinear tree-building algorithm was released in Tahoe-LAFS v1.4.1. The remaining issue -- that it dumbly downloads the entire Merkle Tree up front (thus worsening alacrity) -- is now in a new ticket: #800 (improve alacrity by downloading only the part of the Merkle Tree that you need).