add readv() API to immutable-share storage-server protocol, use in downloader #1545

Open
opened 2011-09-26 00:25:53 +00:00 by warner · 3 comments

One of the most obvious fixes for the immutable-download performance problems tracked in #1264 (and on the Performance/Sep2011 results) is to implement a scatter/gather `readv()` method for immutable shares. The graphs show MDMF downloads running just as fast with k=60 as with k=3, whereas for immutable files there is a drastic slowdown (10x) between k=3 and k=60. We're still investigating, but I suspect that Foolscap's message-serialization performance is to blame, and an easy way to mitigate that is to send fewer messages.

The interface should probably be just like the mutable-share `remote_readv()` API: a vector of `(offset,length)` tuples in, a vector of data strings out. (A future HTTP-based interface will probably pack these vectors into a single string, but we might experiment with doing that here too: basically, do the marshalling before handing anything to Foolscap, trading off generality for performance.)
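A minimal sketch of the server half, assuming an immutable share can be read as a seekable binary file. The names `readv`, `pack_results`, and `unpack_results` are hypothetical; this is neither Tahoe-LAFS storage-server code nor the existing mutable-share API. The length-prefixed framing is just one possible way to try the "marshal before handing anything to Foolscap" experiment.

```python
import io
import struct
from typing import BinaryIO, List, Sequence, Tuple

ReadVector = Sequence[Tuple[int, int]]  # (offset, length) pairs


def readv(share: BinaryIO, read_vector: ReadVector) -> List[bytes]:
    """Return one data string per (offset, length) pair, in request order.

    As with a plain read(), a range running past the end of the share
    yields a short (possibly empty) string rather than an error.
    """
    results = []
    for offset, length in read_vector:
        if offset < 0 or length < 0:
            raise ValueError("offset and length must be non-negative")
        share.seek(offset)
        results.append(share.read(length))
    return results


def pack_results(datav: Sequence[bytes]) -> bytes:
    """Hypothetical marshalling: a 4-byte big-endian item count, then each
    item as a 4-byte big-endian length prefix followed by its bytes."""
    parts = [struct.pack(">I", len(datav))]
    for data in datav:
        parts.append(struct.pack(">I", len(data)))
        parts.append(data)
    return b"".join(parts)


def unpack_results(blob: bytes) -> List[bytes]:
    """Inverse of pack_results(); rejects truncated or oversized input."""
    (count,) = struct.unpack_from(">I", blob, 0)
    pos = 4
    datav = []
    for _ in range(count):
        (n,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        if pos + n > len(blob):
            raise ValueError("packed vector is truncated")
        datav.append(blob[pos:pos + n])
        pos += n
    if pos != len(blob):
        raise ValueError("trailing bytes after packed vector")
    return datav


if __name__ == "__main__":
    share = io.BytesIO(b"0123456789abcdef")
    datav = readv(share, [(0, 4), (10, 3), (14, 10)])
    print(datav)  # [b'0123', b'abc', b'ef']
    assert unpack_results(pack_results(datav)) == datav
```

Packing the results into one string means the RPC layer serializes a single bytes object instead of a list of them, which is exactly the generality-for-performance trade described above.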

David-Sarah mentioned that some of their new storage-backend code (for LAE) provides this interface, so we're likely to have the back half of this feature fairly soon. The rest of the work is to change `immutable/downloader/share.py` to turn a Request span into a read vector, instead of looping over all pieces of the span and sending a separate `read()` request for each.
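A minimal sketch of the client-side conversion, assuming a Request span can be listed as `(offset, length)` pieces. `span_to_read_vector` and `extract_pieces` are hypothetical helpers, not functions in `immutable/downloader/share.py`, and the optional `max_gap` coalescing is an extra idea rather than something proposed in this ticket.

```python
from typing import Iterable, List, Sequence, Tuple

Range = Tuple[int, int]  # (offset, length)


def span_to_read_vector(pieces: Iterable[Range], max_gap: int = 0) -> List[Range]:
    """Collapse the pieces of a span into one sorted read vector.

    Overlapping or touching pieces are merged into a single entry. Pieces
    separated by at most max_gap bytes are merged too, on the theory that
    reading a few unneeded bytes may cost less than an extra entry.
    Zero-length pieces are dropped.
    """
    vector: List[Range] = []
    for off, ln in sorted(p for p in pieces if p[1] > 0):
        if vector:
            prev_off, prev_len = vector[-1]
            prev_end = prev_off + prev_len
            if off <= prev_end + max_gap:
                # Extend the previous entry so it also covers this piece.
                new_end = max(prev_end, off + ln)
                vector[-1] = (prev_off, new_end - prev_off)
                continue
        vector.append((off, ln))
    return vector


def extract_pieces(pieces: Sequence[Range], vector: Sequence[Range],
                   datav: Sequence[bytes]) -> List[bytes]:
    """Slice each original piece back out of the readv() results."""
    out = []
    for off, ln in pieces:
        if ln == 0:
            out.append(b"")
            continue
        for (voff, vlen), data in zip(vector, datav):
            if voff <= off and off + ln <= voff + vlen:
                start = off - voff
                out.append(data[start:start + ln])
                break
        else:
            raise ValueError("piece %r not covered by read vector" % ((off, ln),))
    return out
```

For example, pieces `[(100, 32), (0, 64), (64, 16), (132, 8)]` collapse to the two-entry vector `[(0, 80), (100, 40)]`, so the downloader would send one `readv()` message instead of four `read()` messages.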

warner added the code-storage, major, enhancement, 1.9.0a2 labels 2011-09-26 00:25:53 +00:00
warner added this to the undecided milestone 2011-09-26 00:25:53 +00:00
Author

Early results suggest that doing this would speed up high-k immutable downloads by about 24%. For example, with k=54, trunk takes roughly 414s to download a 100MB file (6 servers, 3 hosts, LAN connections). When basic `readv()` is used, this drops to 317s. For small k (like the default k=3), the effect is less clear; however, there still seems to be a significant improvement (k=3, 100MB: trunk takes maybe 38s, with readv 32s).

The with-readv result lands roughly halfway between unmodified trunk CHK and trunk MDMF (which prefetches the whole `block_hash_tree` and doesn't even have a `crypttext_hash_tree`).
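For reference, the timings quoted above work out as follows. The first fraction in each row is the share of download time saved; the second is the equivalent throughput ratio (readv versus trunk).

```latex
\begin{aligned}
k=54:&\quad \frac{414 - 317}{414} \approx 0.234, \qquad \frac{414}{317} \approx 1.31 \\
k=3:&\quad \frac{38 - 32}{38} \approx 0.158, \qquad \frac{38}{32} \approx 1.19
\end{aligned}
```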

Author

Attachment readv.diff (8229 bytes) added

add readv() support to server, use it from the client if available
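The client in this patch uses `readv()` only when the server offers it. Below is a minimal, synchronous sketch of that fallback pattern. The real downloader is asynchronous and talks through Foolscap, and the patch's actual capability check is not shown here; `fetch`, `ShareReader`, `FakeBucket`, and the `server_has_readv` flag are all hypothetical.

```python
from typing import List, Protocol, Sequence, Tuple

ReadVector = Sequence[Tuple[int, int]]  # (offset, length) pairs


class ShareReader(Protocol):
    # Hypothetical synchronous stand-in for a remote bucket reference.
    def read(self, offset: int, length: int) -> bytes: ...
    def readv(self, read_vector: ReadVector) -> List[bytes]: ...


def fetch(bucket: ShareReader, read_vector: ReadVector,
          server_has_readv: bool) -> List[bytes]:
    """Fetch every range with a single readv() call if the server supports
    it; otherwise fall back to one read() call per range."""
    if server_has_readv:
        return bucket.readv(read_vector)
    return [bucket.read(offset, length) for offset, length in read_vector]


class FakeBucket:
    """In-memory bucket that counts calls, standing in for a remote one."""

    def __init__(self, data: bytes):
        self.data = data
        self.calls = 0  # each call would be one message on the wire

    def read(self, offset: int, length: int) -> bytes:
        self.calls += 1
        return self.data[offset:offset + length]

    def readv(self, read_vector: ReadVector) -> List[bytes]:
        self.calls += 1
        return [self.data[o:o + n] for o, n in read_vector]
```

With a three-entry vector, an old server costs three calls while a readv-capable one costs one, and both return the same data.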

Author

That patch is just a proof-of-concept, not actually ready or recommended for landing. I'm attaching it so others can reproduce my results.

tahoe-lafs modified the milestone from undecided to 1.11.0 2012-04-01 23:21:04 +00:00
Reference: tahoe-lafs/trac-2024-07-25#1545