pipeline download blocks for better performance #1110
As Brian and I discussed in person, downloads would probably be a bit faster for some users if we pipelined requests for successive blocks. We casually agreed that a pipeline depth of 2 would probably be pretty good for lots of users.
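For concreteness, here is a minimal sketch (assumed names, not actual downloader code) of what a depth-2 block-request pipeline could look like on top of Twisted Deferreds; `request_block(blocknum)` is a hypothetical callable that returns a Deferred firing with one block's bytes:

```python
# A minimal sketch, not Tahoe-LAFS code: keep up to `depth` block requests
# outstanding at once. `request_block(blocknum)` is a hypothetical callable
# that returns a Deferred firing with one block.
from collections import deque
from twisted.internet import defer

class BlockPipeline(object):
    def __init__(self, request_block, num_blocks, depth=2):
        self._request_block = request_block
        self._num_blocks = num_blocks
        self._depth = depth

    @defer.inlineCallbacks
    def fetch_all(self):
        blocks = {}
        outstanding = deque()
        next_blocknum = 0
        while len(blocks) < self._num_blocks:
            # Fill the window: issue requests until `depth` are in flight.
            while (next_blocknum < self._num_blocks
                   and len(outstanding) < self._depth):
                d = self._request_block(next_blocknum)
                outstanding.append((next_blocknum, d))
                next_blocknum += 1
            # Wait for the oldest request; later requests are already on the
            # wire, so their round-trips overlap with this one's transfer.
            blocknum, d = outstanding.popleft()
            blocks[blocknum] = yield d
        defer.returnValue(blocks)
```

The whole win is in the inner loop: with depth=2 the request for block N+1 goes out while block N is still in transit.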
(http://tahoe-lafs.org/pipermail/tahoe-dev/2010-August/004909.html) says:
This is a good example of how pipelining the download of blocks (#1110) could help. Previously I thought of it as a performance improvement when downloading successive blocks of the same file, so I figured that for streaming use, such as playing a movie at normal speed, a sufficiently large segment size would make the download faster than your playout speed and pipelining would not matter. But this example shows that in some cases the segment size is irrelevant: here (if Brian's guess is correct) a read-block-pipeline depth of >= 2 would take one round-trip off of the startup time.
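To make the round-trip arithmetic concrete (assumed numbers, not measurements from the thread): if startup requires two successive block fetches from the same server before any data can be returned, pipelining the second request behind the first saves exactly one round trip:

```python
# Back-of-the-envelope arithmetic with assumed numbers (not measurements from
# the thread): startup needs two successive block fetches from one server
# before any plaintext can be returned; transfer time is ignored for clarity.
rtt = 0.1                      # assumed round-trip time to the server, seconds

serial_startup    = 2 * rtt    # request block 0, wait for it, request block 1
pipelined_startup = 1 * rtt    # depth >= 2: both requests go out back to back,
                               # and both responses come back ~one RTT later
print(serial_startup - pipelined_startup)   # the one round-trip saved: 0.1 s
```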
Kyle's benchmarks, discussion, and links to code (http://tahoe-lafs.org/pipermail/tahoe-dev/2010-September/005151.html):
Intriguing! It looks like upload typically took about 150 seconds and download took at least 850! Upload has pipelining (source:trunk/src/allmydata/immutable/layout.py@4655#L118) and download doesn't (source:trunk/src/allmydata/immutable/downloader/share.py@4707#L181). I wonder if that could account for all of that large difference!
This is issue #1110. It would probably make an excellent first hack for a new Tahoe-LAFS coder in the v1.9 timeframe. :-)
#1187 is a more ambitious generalization of this ticket. If you pipeline successive shares, but still download a fixed set of shares per segment per server, then the potential gain is limited by the fact that for each segment, you still have to wait for the server that finishes last. What #1187 proposes would tend to keep the pipe from each server as full as possible, by downloading as many shares from each server as bandwidth allows.
It may be worth doing this ticket first, but with an eye to how to extend it.
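To show the contrast with a fixed depth of 2, here is a rough sketch (hypothetical names, assumed window size, not #1187's actual design) of a per-server scheduler that keeps each pipe full up to a byte budget:

```python
# A rough sketch with hypothetical names (not #1187's actual design): instead
# of a fixed pipeline depth, give each server a byte budget approximating its
# bandwidth*delay product and issue new requests whenever in-flight data drops
# below it, so the pipe to every server stays as full as bandwidth allows.
class ServerPipe(object):
    def __init__(self, request_fn, window_bytes=1000000):
        self._request_fn = request_fn      # hypothetical: (sharenum, blocknum) -> Deferred
        self._window_bytes = window_bytes  # assumed per-server bandwidth*delay budget
        self._in_flight = 0

    def maybe_send(self, pending, block_size):
        """Issue as many queued (sharenum, blocknum) requests as fit in the window."""
        issued = []
        while pending and self._in_flight + block_size <= self._window_bytes:
            sharenum, blocknum = pending.pop(0)
            d = self._request_fn(sharenum, blocknum)
            self._in_flight += block_size
            d.addBoth(self._request_done, block_size)
            issued.append(d)
        return issued

    def _request_done(self, result, block_size):
        # Shrink the in-flight count whether the request succeeded or failed.
        self._in_flight -= block_size
        return result
```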
Marking this as 1.9.0 and "unfinished-business" as mentioned in http://tahoe-lafs.org/pipermail/tahoe-dev/2010-September/005163.html .
Eek, I'm shocked this isn't fixed yet.