pipeline download blocks for better performance #1110

Open
opened 2010-07-06 21:28:53 +00:00 by zooko · 5 comments

As Brian and I have discussed in person, downloads would probably be a bit faster for some users if we pipelined requests for successive blocks. Brian and I casually agreed that a pipeline depth of 2 would probably be pretty good for lots of users.
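
A minimal sketch of what "pipeline depth of 2" could mean, using `asyncio` rather than the project's actual downloader code. The names `fetch_block` and `pipelined_download` are hypothetical and not part of Tahoe-LAFS; the simulated latency stands in for one network round trip.

```python
import asyncio
from collections import deque


async def fetch_block(index, latency=0.05):
    # Hypothetical stand-in for a remote block request:
    # the sleep simulates one network round trip.
    await asyncio.sleep(latency)
    return index


async def pipelined_download(num_blocks, depth=2, fetch=fetch_block):
    """Return blocks in order, keeping at most `depth` requests in flight."""
    in_flight = deque()
    next_index = 0
    blocks = []
    while next_index < num_blocks or in_flight:
        # Top up the pipeline before waiting on the oldest request.
        while next_index < num_blocks and len(in_flight) < depth:
            in_flight.append(asyncio.create_task(fetch(next_index)))
            next_index += 1
        # Blocks are consumed strictly in request order.
        blocks.append(await in_flight.popleft())
    return blocks


if __name__ == "__main__":
    print(asyncio.run(pipelined_download(6, depth=2)))
```

With `depth=1` this degenerates to the current one-request-at-a-time behaviour; with `depth=2` the request for block N+1 is already outstanding while block N is being received, which is where the hoped-for speedup would come from.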

zooko added the code-network, major, enhancement, 1.7.0 labels 2010-07-06 21:28:53 +00:00
zooko added this to the 1.8.0 milestone 2010-07-06 21:28:53 +00:00
warner was assigned by zooko 2010-07-06 21:28:53 +00:00
zooko modified the milestone from 1.8.0 to eventually 2010-08-05 21:29:25 +00:00
Author

(http://tahoe-lafs.org/pipermail/tahoe-dev/2010-August/004909.html)

says:

This is a good example of how pipelining download of blocks (#1110) could help. Previously I thought of it as a performance improvement when downloading successive blocks of the same file. Therefore I figured that if you were doing streaming processing of the file, such as if it was a movie and you were playing it out at normal speed, then a sufficiently large segment size would make the download faster than your playout speed, so pipelining would not matter for that. But this example shows that for some cases the segment size is irrelevant—in this case (if Brian's guess is correct) a read-block-pipeline depth of >= 2 would take one round-trip off of the startup time.

Author

(http://tahoe-lafs.org/pipermail/tahoe-dev/2010-September/005151.html)

Kyle's benchmarks and discussion and links to code:

Intriguing! It looks like upload typically took about 150 seconds and download took at least 850! Upload has pipelining (source:trunk/src/allmydata/immutable/layout.py@4655#L118) and download doesn't (source:trunk/src/allmydata/immutable/downloader/share.py@4707#L181). I wonder if that could account for *all* of that large difference!

This is issue #1110. It would probably make an excellent first hack for a new Tahoe-LAFS coder in the v1.9 timeframe. :-)

davidsarah commented 2010-09-07 00:44:37 +00:00
Owner

#1187 is a more ambitious generalization of this ticket. If you pipeline successive shares, but still download a fixed set of shares per segment per server, then the potential gain is limited by the fact that for each segment, you still have to wait for the server that finishes last. What #1187 proposes would tend to keep the pipe from each server as full as possible, by downloading as many shares from each server as bandwidth allows.

It may be worth doing this ticket first, but with an eye to how to extend it.
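
A toy illustration of the limit described above, assuming each server must deliver its fixed share before the segment can be decoded. The function name and the timing numbers are hypothetical, chosen only to show the effect.

```python
def segment_wait(server_seconds):
    """Time until a segment is complete when every listed server
    must deliver its fixed share: the slowest server decides."""
    return max(server_seconds)


# Hypothetical per-server delivery times (seconds) for one segment.
baseline = [0.2, 0.3, 1.0]
# Making the already-fast servers faster does not help the segment.
faster_fast_servers = [0.1, 0.1, 1.0]

print(segment_wait(baseline), segment_wait(faster_fast_servers))
```

Both cases wait on the slowest server, which is why letting faster servers supply more shares, as #1187 proposes, could gain more than pipelining alone.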

Author

Marking this as 1.9.0 and "unfinished-business" as mentioned in http://tahoe-lafs.org/pipermail/tahoe-dev/2010-September/005163.html .

zooko modified the milestone from eventually to 1.9.0 2010-09-07 16:10:09 +00:00
zooko modified the milestone from 1.9.0 to soon 2011-07-27 18:22:37 +00:00
daira commented 2014-03-02 13:50:21 +00:00
Owner

Eek, I'm shocked this isn't fixed yet.

Reference: tahoe-lafs/trac-2024-07-25#1110