pipeline download blocks for better performance #1110

zooko · 2010-07-06T21:28:53Z

zooko commented

2010-07-06 21:28:53 +00:00

As Brian and I have discussed in person, downloads would probably be a bit faster for some users if we pipelined requests for successive blocks. Brian and I casually agreed that a pipeline depth of 2 would probably be pretty good for lots of users.
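To illustrate the idea, here is a minimal sketch of a bounded read pipeline written with `asyncio`. It is not Tahoe-LAFS code: `fetch_block`, `download_pipelined`, and the simulated round-trip delay are hypothetical names and values chosen for illustration, and the real downloader is built on Twisted and Foolscap rather than `asyncio`. With `depth=1` each request waits for the previous reply; with `depth=2` the next request is already in flight while the current block is being received.

```python
import asyncio
import collections


async def fetch_block(index, rtt=0.01):
    # Hypothetical stand-in for a remote block read: one simulated round trip.
    await asyncio.sleep(rtt)
    return f"block-{index}".encode()


async def download_pipelined(num_blocks, depth=2, fetch=fetch_block):
    """Return blocks in order while keeping at most `depth` requests in flight."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    blocks = []
    pending = collections.deque()  # outstanding requests, oldest first
    next_index = 0
    while next_index < num_blocks or pending:
        # Top up the pipeline before waiting on the oldest request.
        while next_index < num_blocks and len(pending) < depth:
            pending.append(asyncio.create_task(fetch(next_index)))
            next_index += 1
        # Blocks are consumed in request order, so output stays sequential.
        blocks.append(await pending.popleft())
    return blocks


if __name__ == "__main__":
    print(asyncio.run(download_pipelined(5, depth=2)))
```

In this toy model the pipeline depth only bounds the number of outstanding requests; it does not reorder blocks, so a consumer still sees them in sequence.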

zooko added the

labels 2010-07-06 21:28:53 +00:00

zooko added this to the 1.8.0 milestone 2010-07-06 21:28:53 +00:00

warner was assigned by zooko

2010-07-06 21:28:53 +00:00

zooko modified the milestone from 1.8.0 to eventually

2010-08-05 21:29:25 +00:00

zooko commented

2010-08-05 21:38:07 +00:00

(http://tahoe-lafs.org/pipermail/tahoe-dev/2010-August/004909.html)

says:

This is a good example of how pipelining download of blocks (#1110) could help. Previously I thought of it as a performance improvement when downloading successive blocks of the same file. Therefore I figured that if you were doing streaming processing of the file, such as if it was a movie and you were playing it out at normal speed, then a sufficiently large segment size would make the download faster than your playout speed, so pipelining would not matter for that. But this example shows that for some cases the segment size is irrelevant—in this case (if Brian's guess is correct) a read-block-pipeline depth of >= 2 would take one round-trip off of the startup time.

zooko commented

2010-09-06 06:44:40 +00:00

(http://tahoe-lafs.org/pipermail/tahoe-dev/2010-September/005151.html)

Kyle's benchmarks and discussion and links to code:

Intriguing! It looks like upload typically took about 150 seconds and download took at least 850! Upload has pipelining (source:trunk/src/allmydata/immutable/layout.py@4655#L118) and download doesn't (source:trunk/src/allmydata/immutable/downloader/share.py@4707#L181). I wonder if that could account for all of that large difference!

This is issue #1110. It would probably make an excellent first hack for a new Tahoe-LAFS coder in the v1.9 timeframe. :-)

davidsarah commented

2010-09-07 00:44:37 +00:00

#1187 is a more ambitious generalization of this ticket. If you pipeline successive shares, but still download a fixed set of shares per segment per server, then the potential gain is limited by the fact that for each segment, you still have to wait for the server that finishes last. What #1187 proposes would tend to keep the pipe from each server as full as possible, by downloading as many shares from each server as bandwidth allows.

It may be worth doing this ticket first, but with an eye to how to extend it.
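The limit described above can be shown with a toy timing model. This is only a sketch under simplifying assumptions: each server is assigned one share per segment, `times[s][k]` is a made-up duration for server `k` to deliver its share of segment `s`, and the function names are hypothetical. It does not model #1187 itself, only the fixed-assignment case.

```python
def lockstep_time(times):
    # No pipelining: each segment starts only after the previous one is
    # complete, so every segment costs as much as its slowest server.
    return sum(max(row) for row in times)


def pipelined_fixed_time(times):
    # Idealized deep pipeline with a fixed share assignment: every server
    # streams its shares back to back, and the download ends when the
    # server with the most total work has finished.
    per_server_totals = [sum(column) for column in zip(*times)]
    return max(per_server_totals)


if __name__ == "__main__":
    alternating = [[1, 3], [3, 1]]   # the slow server changes per segment
    one_slow = [[1, 3], [1, 3]]      # the second server is always slow
    print(lockstep_time(alternating), pipelined_fixed_time(alternating))
    print(lockstep_time(one_slow), pipelined_fixed_time(one_slow))
```

When the slow server varies from segment to segment, pipelining hides part of the delay (6 versus 4 time units in the first case). When one server is consistently slow, the fixed assignment gains nothing (6 versus 6), which is the situation that fetching more shares from the faster servers is meant to address.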

zooko commented

2010-09-07 16:10:09 +00:00

Marking this as 1.9.0 and "unfinished-business" as mentioned in http://tahoe-lafs.org/pipermail/tahoe-dev/2010-September/005163.html .

zooko modified the milestone from eventually to 1.9.0

2010-09-07 16:10:09 +00:00

zooko modified the milestone from 1.9.0 to soon

2011-07-27 18:22:37 +00:00

daira commented

2014-03-02 13:50:21 +00:00

Eek, I'm shocked this isn't fixed yet.
