Optimize FEC parameters to increase download performance #791

Open
opened 2009-08-19 19:20:51 +00:00 by swillden · 3 comments
swillden commented 2009-08-19 19:20:51 +00:00
Owner

When Tahoe is configured with a large value for `k`, the number of shares required to reconstruct a file, retrievals can be done by pulling shares from many servers in parallel. This 'swarming download' effect can in many cases significantly improve performance.

In ticket #778 (<http://allmydata.org/trac/tahoe/ticket/778>) it was decided to change the Tahoe configuration to specify the share generation and distribution parameters in terms of numbers of servers, and to leave the selection of the actual FEC parameters as an implementation decision in Tahoe. This decision makes it possible for Tahoe to increase both the number of shares required, `k_e`, and the number distributed, `m_e`. The #778 discussion included one algorithm for doing that.
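As a very rough illustration of what raising `k_e` and `m_e` could look like, here is a minimal sketch (this is *not* the algorithm from the fecparams.py attached to #778; the function name, the `shares_per_server` knob, and the simple scaling rule are illustrative assumptions only):

```python
def choose_fec_params(k_servers, m_servers, shares_per_server):
    """Scale server-count parameters up to actual FEC encoding parameters.

    k_servers, m_servers: 'shares needed' and 'shares distributed',
    expressed as numbers of servers (the #778 configuration style).
    shares_per_server: how many shares to place on each server; a larger
    value raises k_e and m_e, so a downloader can spread its k_e share
    fetches over more servers in parallel while the file stays
    recoverable from any k_servers servers.
    """
    k_e = k_servers * shares_per_server
    m_e = m_servers * shares_per_server
    return k_e, m_e

# Example: 3-of-10 in server terms, 4 shares per server -> 12-of-40 encoding.
print(choose_fec_params(3, 10, 4))   # (12, 40)
```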

tahoe-lafs added the unknown, minor, enhancement, 1.5.0 labels 2009-08-19 19:20:51 +00:00
tahoe-lafs added this to the undecided milestone 2009-08-19 19:20:51 +00:00
warner added the code-encoding label and removed the unknown label 2009-08-22 19:43:57 +00:00

Here is the code that Shawn Willden attached to ticket #778 to compute FEC parameters:

http://allmydata.org/trac/tahoe/attachment/ticket/778/fecparams.py

Here was his comment about it:

http://allmydata.org/trac/tahoe/ticket/778#comment:32


Incidentally, if we allowed the downloader to use a bit more RAM, we could achieve maximum parallelism without raising `k`, by having it pull shares for two segments at the same time (pull from servers 1..k for seg0, and from servers k+1..2k for seg1). If we were really clever, we could time the requests so that the data for the second segment didn't arrive at our downstream congestion point until most of the data for the first segment had shown up, which might let us keep the downstream pipe filled more consistently.
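A rough sketch of that two-segments-in-flight idea, written with asyncio for brevity (Tahoe itself is Twisted-based, and `fetch_share` here is a hypothetical stand-in for the real share-request machinery):

```python
import asyncio

async def fetch_share(server, segnum):
    # Hypothetical placeholder: request one share of segment `segnum`
    # from `server` and return the share data.
    await asyncio.sleep(0)
    return (server, segnum)

async def fetch_two_segments(servers, k, segnum):
    # Overlap two segments: shares for segment `segnum` come from
    # servers[0:k], shares for segment `segnum+1` from servers[k:2k],
    # so both groups of requests are in flight at once, at the cost of
    # buffering one extra segment's worth of shares in RAM.
    current = [fetch_share(s, segnum) for s in servers[:k]]
    upcoming = [fetch_share(s, segnum + 1) for s in servers[k:2 * k]]
    return await asyncio.gather(asyncio.gather(*current),
                                asyncio.gather(*upcoming))
```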

Increasing `k` has the negative property that it leads to an increase in `N`, and if `N` is capped by the total number of servers, then the entropy of share placement (as explored in #872) drops to zero, so all files get placed on an identical set of servers, causing them all to share the same fate. I don't yet know how the overall reliability of `k-of-N` compares to `2k-of-2N` when the total number of servers is not much larger than `2N`; I'll have to think about that. I believe that reliability roughly grows with `N-k-1`, so it's possible that `2k-of-2N` is so much better than `k-of-N` that the peer-selection-entropy is irrelevant.
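To put a number on the `N-k-1` intuition under the simplest possible assumption, here is a small calculation comparing 3-of-10 with 6-of-20 when each server is independently available with probability 0.9 (the numbers and the independence assumption are illustrative, not from the ticket):

```python
from math import comb

def p_recoverable(k, n, p_up):
    # P(at least k of n shares survive), assuming one share per server and
    # independent server failures -- exactly the assumption questioned in
    # the next comment.
    return sum(comb(n, i) * p_up**i * (1 - p_up)**(n - i)
               for i in range(k, n + 1))

print(p_recoverable(3, 10, 0.9))   # ~0.9999996
print(p_recoverable(6, 20, 0.9))   # ~1 - 1e-11, dramatically better
```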

davidsarah commented 2010-01-15 18:19:30 +00:00
Author
Owner

Replying to [warner](/tahoe-lafs/trac-2024-07-25/issues/791#issuecomment-72816):

> I believe that reliability roughly grows with `N-k-1`, so it's possible that `2k-of-2N` is so much better than `k-of-N` that the peer-selection-entropy is irrelevant.

Reliability can't be concluded to grow with `N-k-1` without making additional assumptions. Consider a grid with half of the servers in each of two clusters. If some disaster wipes out one cluster, then on average it wipes out half the shares of each file. In that case `k-of-N` and `2k-of-2N` have roughly equal reliability: the proportion of files lost depends on how `k` compares to `N/2` in both cases.
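The two-cluster scenario can be made concrete with a small calculation (an illustration only; the assumption that each share independently lands in either cluster with probability 1/2 is a simplification of the scenario above):

```python
from math import comb

def p_survives_cluster_loss(k, n):
    # P(a file stays recoverable after one of two equal-sized clusters is
    # wiped out), assuming each of its n shares independently landed in
    # either cluster with probability 1/2.
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

print(p_survives_cluster_loss(3, 10))   # ~0.945
print(p_survives_cluster_loss(6, 20))   # ~0.979 -- a far smaller gap than
                                        #   under independent failures
```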

Reference: tahoe-lafs/trac-2024-07-25#791