mitigate the performance bottleneck of slow servers in download #1187

Open
opened 2010-08-27 00:18:25 +00:00 by davidsarah · 2 comments
davidsarah commented 2010-08-27 00:18:25 +00:00
Owner

Ticket #1170 showed that server selection can have a big impact on download performance (presumably the same is true of upload performance) when the chosen servers have different effective bandwidths. For example, /tahoe-lafs/trac-2024-07-25/issues/6232#comment:94 showed an overall download bandwidth of ~90 Kbps vs ~190 Kbps depending on how many shares were taken from each of two servers.

My hypothesis is that this is mainly due to the time to download each segment being bottlenecked by the slowest server. Once we've downloaded the share(s) for a given segment from the faster servers and are waiting for the slower ones, the faster servers are left idle until the next segment.

In principle, we can use the erasure coding to mitigate this bottleneck. If k were larger than it typically is now, we could download as many shares per segment from each server as its current effective bandwidth can support. Since each share would then be a smaller fraction of the segment size, the wasted time per segment would shrink correspondingly.
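As an illustration of the idea (this is not Tahoe-LAFS code; the server names and bandwidth figures are made up, loosely echoing the ~90/~190 Kbps numbers above), a bandwidth-proportional share allocation might look like:

```python
def allocate_shares(k, bandwidths):
    """Split k shares among servers in proportion to measured bandwidth,
    so that all servers finish a segment at roughly the same time.

    bandwidths: dict server -> effective bandwidth (any consistent unit).
    Returns a dict server -> share count, summing to k.
    """
    total = sum(bandwidths.values())
    # Proportional allocation, rounded down.
    alloc = {s: (k * bw) // total for s, bw in bandwidths.items()}
    # Hand the rounding remainder to the servers with the largest
    # fractional parts.
    remainder = k - sum(alloc.values())
    by_frac = sorted(bandwidths,
                     key=lambda s: (k * bandwidths[s]) % total,
                     reverse=True)
    for s in by_frac[:remainder]:
        alloc[s] += 1
    return alloc

def segment_time(k, bandwidths, segment_size):
    """Time to fetch one segment = the slowest server's fetch time,
    where each of the k shares is segment_size / k bytes."""
    share_size = segment_size / k
    alloc = allocate_shares(k, bandwidths)
    return max(alloc[s] * share_size / bw
               for s, bw in bandwidths.items() if alloc[s] > 0)
```

For two servers with bandwidths 190 and 90, k=2 forces one share each, so the segment time is set by the slow server; with k=28 the allocation becomes 19:9 and both servers finish together, giving nearly the full aggregate bandwidth.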

A further optimization would be to pipeline requests across consecutive segments. That is, when we're getting near to finishing the download of shares for a segment, we can also be downloading shares for the next segment (but only the next segment) from servers that would otherwise be idle. Note that this depends on downloading a variable number of shares from each server for each segment (otherwise, the servers that finish first on one segment would also typically finish first on the next segment, so we would either have unbounded memory usage or still have to leave the servers that finish first idle some of the time).
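A minimal sketch of the one-segment lookahead policy described above (the class and method names are hypothetical, not from the Tahoe-LAFS downloader):

```python
class LookaheadScheduler:
    """When a server goes idle, let it start on shares for the next
    segment, but never further ahead than that, bounding buffered
    segment data to two segments' worth."""

    MAX_LOOKAHEAD = 1  # only the next segment, per the ticket

    def __init__(self, num_segments):
        self.num_segments = num_segments
        self.current = 0  # lowest segment not yet fully fetched

    def next_request(self, pending):
        """Pick the segment a newly idle server should fetch from.

        pending: dict segment index -> shares still needed.
        Returns a segment index, or None if nothing is eligible
        within the lookahead window.
        """
        limit = min(self.current + self.MAX_LOOKAHEAD + 1,
                    self.num_segments)
        for seg in range(self.current, limit):
            if pending.get(seg, 0) > 0:
                return seg
        return None

    def segment_done(self, seg):
        # Advance the window once the current segment completes.
        if seg == self.current:
            self.current += 1
```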

It isn't entirely clear how large k would need to be for this approach to work. If it would need to be larger than the number of servers, then we would have to rethink the servers-of-happiness criterion. The current definition basically only credits each server for having one share, and it would have to be changed to credit each server for having multiple shares. I think this can still be given a fairly simple definition in the bipartite matching model.
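One way to sketch that generalization: model happiness as a max-flow problem in which each server node gets a capacity equal to the number of shares it may be credited for. With capacity 1 this reduces to the existing bipartite-matching definition. The following is illustrative only, not the actual servers-of-happiness implementation:

```python
from collections import defaultdict

def happiness(placements, cap):
    """Generalized happiness as max-flow:
    source -> server (capacity = cap) -> share (capacity 1) -> sink.

    placements: dict server -> set of share numbers it holds.
    cap=1 recovers plain bipartite matching; cap>1 credits a server
    for holding multiple shares.
    """
    capacity = defaultdict(int)
    for srv, shares in placements.items():
        capacity[('src', ('srv', srv))] = cap
        for n in shares:
            capacity[(('srv', srv), ('shr', n))] = 1
            capacity[(('shr', n), 'snk')] = 1

    def find_augmenting_path():
        # BFS over edges with remaining residual capacity.
        parent = {'src': None}
        queue = ['src']
        while queue:
            u = queue.pop(0)
            if u == 'snk':
                break
            for (a, b), c in list(capacity.items()):
                if a == u and c > 0 and b not in parent:
                    parent[b] = u
                    queue.append(b)
        if 'snk' not in parent:
            return None
        path, v = [], 'snk'
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        return path

    flow = 0
    while True:
        path = find_augmenting_path()
        if path is None:
            return flow
        for a, b in path:  # push one unit along the path
            capacity[(a, b)] -= 1
            capacity[(b, a)] += 1
        flow += 1
```

For example, with server A holding shares {0, 1, 2} and server B holding {2, 3}, cap=1 gives a happiness of 2 (one share credited per server), while cap=2 gives 4.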

tahoe-lafs added the code-network, major, defect, 1.8β labels 2010-08-27 00:18:25 +00:00
tahoe-lafs added this to the undecided milestone 2010-08-27 00:18:25 +00:00
davidsarah commented 2010-08-27 00:22:16 +00:00
Author
Owner

In the ticket description, when I say "idle", I mean not working on this download. Of course servers that are idle in that sense might be doing other useful work, but probably most Tahoe grids have very bursty download traffic, and so they often (usually?) won't be.
davidsarah commented 2010-09-07 00:48:40 +00:00
Author
Owner

See also #1110, which proposes pipelining without downloading a variable number of shares from each server per segment.
Reference: tahoe-lafs/trac-2024-07-25#1187