peer selection could be more uniform #132
Reference: tahoe-lafs/trac-2024-07-25#132
Our new Tahoe2 peer selection algorithm is a lot more uniform than the previous Tahoe3 one. But
the way I implemented it could still be improved a bit.
The issue is when N (the total number of shares being created) is not an exact multiple of S (the number of servers available). For example, on our current testnet, we have 3-of-10 encoding (so N=10) and 3 servers (so S=3). I'd really like the share allocation to be 4+3+3, because then you could lose any two servers and still recover the file. But the current code does 4+4+2 instead, which still leaves a lose-two-servers case that fails.
The reason for this is the way I built the peer-selection loop. We try to avoid requiring more than two passes around the circle of peers. During the first pass, we ask each server to hold a single share. When we get back to the start, if we see that we still have shares left, we do `shares_per_server = div_ceil(remaining_shares, num_servers)`, and then ask each peer to hold that many shares. For our N=10/S=3 example, this places one share per server on the first loop, then starts the second loop with remaining_shares=7. Since `div_ceil(7, 3)` is 3, it gives 3 additional shares (for a total of 4) to the first server, the same to the second server, and then sees that only one unplaced share is left and gives just that to the last server, resulting in an allocation of 4+4+2. We only recalculate shares_per_server at the top of each loop.

I think the way to address this is going to involve using
`div_floor` instead of `div_ceil`, and distributing the remainder by sending an extra share to every Nth server, somehow. It would probably be easier if we had a convenient way to know how many servers we've looped through so far. If we knew how many servers were left to ask on this particular loop, we could do `div_floor(remaining_shares, remaining_servers)`, and that would probably get us the desired properties. To do this, as we make the second pass, we should move servers from `self.contacted_servers` to a holding list, and then move the whole batch of them back into `self.contacted_servers` when we exhaust that list.

Fixed, with changeset:808f85158989ebc7. It was easier than I thought it'd be.
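For reference, the two allocation strategies discussed above can be sketched side by side in Python. This is a simplified model with made-up function names, not the actual Tahoe peer-selection code (which deals with server refusals, already-placed shares, and so on); it only reproduces the arithmetic described in the ticket.

```python
def div_ceil(n, d):
    """Smallest integer >= n / d."""
    return (n + d - 1) // d

def allocate_ceil(total_shares, num_servers):
    """Original behavior: one share per server on the first pass,
    then div_ceil(remaining, num_servers) shares per server,
    recalculated only at the top of each loop."""
    allocation = [0] * num_servers
    remaining = total_shares
    for i in range(num_servers):          # first pass
        if remaining:
            allocation[i] += 1
            remaining -= 1
    while remaining:                      # later passes
        per_server = div_ceil(remaining, num_servers)
        for i in range(num_servers):
            take = min(per_server, remaining)
            allocation[i] += take
            remaining -= take
    return allocation

def allocate_floor(total_shares, num_servers):
    """Proposed fix: recompute
    div_floor(remaining_shares, remaining_servers) for each server,
    so the remainder is spread one extra share at a time."""
    allocation = [0] * num_servers
    remaining = total_shares
    for i in range(num_servers):          # first pass, unchanged
        if remaining:
            allocation[i] += 1
            remaining -= 1
    while remaining:
        for i in range(num_servers):
            remaining_servers = num_servers - i
            take = min(remaining // remaining_servers, remaining)
            allocation[i] += take
            remaining -= take
    return allocation

print(allocate_ceil(10, 3))   # [4, 4, 2] -- the lumpy allocation
print(allocate_floor(10, 3))  # [3, 3, 4] -- never more than 1 apart
```

With the floor-based version, no server ever holds more than one share above any other, which is the property the 4+3+3 goal asks for.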