upload is unhappy even though the shares are already distributed #1124
Reference: tahoe-lafs/trac-2024-07-25#1124
Here's a test case that I added to source:src/allmydata/test/test_upload.py:
When I run it, it fails like this:
Why does the upload not succeed?
I added debug prints, and here are some of them:
(Nitpick: the name of the test method should be 0123_03_1_2.) Is it the maximum matching implementation that is incorrect? If it is, then it should be possible to write a narrower test case for that.
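A narrower test of just the matching computation might look something like the sketch below. This assumes the happiness calculation is exposed as allmydata.util.happinessutil.servers_of_happiness and accepts a sharemap of sharenum -> set of serverids (the representation described later in this ticket); the layout and expected value are illustrative, not taken from the failing test.

```python
# Hypothetical narrow test for the maximum-matching computation alone.
# Assumed API: allmydata.util.happinessutil.servers_of_happiness(sharemap),
# where sharemap maps share number -> set of server ids and the return value
# is the size of a maximum server<->share matching.
from twisted.trial import unittest

from allmydata.util.happinessutil import servers_of_happiness


class MatchingTests(unittest.TestCase):
    def test_one_server_holds_everything(self):
        # Server "A" holds every share; "B", "C", and "D" hold one each.
        # A maximum matching can still pair four distinct servers with four
        # distinct shares, so the result should be 4.
        sharemap = {
            0: set(["A", "B"]),
            1: set(["A", "C"]),
            2: set(["A", "D"]),
            3: set(["A"]),
        }
        self.assertEqual(servers_of_happiness(sharemap), 4)
```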
Replying to davidsarah:
That isn't the problem:
Swapping the order of the last two servers in the test causes the code under test to succeed at uploading instead of failing. This fails:
while logging the following:
This succeeds:
while logging the following:
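As an illustration of why the order in which servers are considered matters to a greedy pass, here is a toy sketch; it is not the Tahoe2PeerSelector code, just the general shape of the problem: a first-come assignment of one share per server can report fewer distinct server/share pairs than a maximum matching would find.

```python
# Toy sketch only -- not Tahoe2PeerSelector.  A greedy pass that gives each
# server the first of its shares that is still unclaimed is sensitive to the
# order in which the servers are visited.

def greedy_happiness(servers):
    """servers: ordered list of (server_id, set_of_share_numbers)."""
    claimed = set()   # share numbers already handed to some server
    happy = 0
    for _server_id, shares in servers:
        for sh in sorted(shares):
            if sh not in claimed:
                claimed.add(sh)
                happy += 1
                break
    return happy

# Server X holds shares {0, 1}; server Y holds only share 0.
print(greedy_happiness([("X", {0, 1}), ("Y", {0})]))  # 1: X claims 0, Y gets nothing
print(greedy_happiness([("Y", {0}), ("X", {0, 1})]))  # 2: Y claims 0, X claims 1
# A maximum matching gives 2 in both orders.
```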
Attachment test-1124.dpatch.txt (11983 bytes) added
test-1124.dpatch.txt is a test for this issue, marked TODO.
The problem seems to be the share redistribution algorithm implemented by Tahoe2PeerSelector (as modified by #778). If we instrument it to print out the sharemap (i.e. sharenum -> set of peerids) after the call to merge_peers in each iteration of _loop, we get (abbreviating each peerid to its first hex byte):

Presumably 'b9' is server 0. So we merge in all of the shares found for that server, but then in the same iteration of _loop, we move all but one of them into homeless_shares. I don't understand the intent of the algorithm fully, but it seems as though we're forgetting that server 0 had additional shares that, if taken into account, would have increased servers_of_happiness -- even though they didn't increase it in that particular iteration.

The intent of the algorithm is to identify servers with more than one share, and to make some of the shares on those servers homeless so that they can be redistributed to peers that might not have had any shares assigned to them yet. It is a greedy algorithm that doesn't quite do the trick in a lot of situations, and it seems this is one of them (a minimal sketch of this redistribution step appears below). test_problem_layout_comment_187 is another such layout; it is marked as todo, because we hope to change the uploader to do a better job of share redistribution in 1.8. This might be a feature of #1126, or might be another ticket that hasn't been made yet.

There was a bug in the code that david-sarah was testing in comment:78663. That bug was fixed by changeset:13b5e44fbc2effd0. We should re-evaluate this ticket.
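To make the redistribution step described above concrete, here is a minimal sketch of the idea (not the actual Tahoe2PeerSelector code): servers holding more than one share keep one, and the rest become homeless so that a later pass can offer them to servers that have none. The names and the example layout are illustrative.

```python
# Minimal sketch of the redistribution idea described above -- not the real
# Tahoe2PeerSelector.  'assignments' maps server id -> set of share numbers
# currently assigned to that server.

def redistribute(assignments):
    """Return (trimmed_assignments, homeless_shares)."""
    homeless_shares = []
    trimmed = {}
    for server, shares in assignments.items():
        shares = sorted(shares)
        if len(shares) > 1:
            # Keep one share here; mark the rest homeless so a later pass can
            # hand them to servers that have none.  Note that nothing records
            # that this server could also have hosted the shares it just gave
            # up -- the kind of information the greedy pass loses, per the
            # analysis above.
            trimmed[server] = set(shares[:1])
            homeless_shares.extend(shares[1:])
        else:
            trimmed[server] = set(shares)
    return trimmed, homeless_shares

# Example: one server holding everything, as in the layouts discussed here.
print(redistribute({"b9": {0, 1, 2, 3}, "c4": {0}}))
# -> ({'b9': {0}, 'c4': {0}}, [1, 2, 3])
```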
Test was committed in changeset:0f46766a51792eb5.
What needs to happen to resolve this? Do we have a plan to improve the share-distribution algorithm? It seems to me that there's no chance of this being a small, safe fix, so I'm kicking it out of 1.9.
Replying to warner:
That was being done in #1382. If I understand correctly, Kevan's patches there try to implement the algorithm of ticket:1130#comment:-1, or something very much like it.
I believe this will be fixed by #1382.
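For context, the direction those tickets point in is a placement computed from a maximum bipartite matching between servers and shares, rather than a single greedy pass. The sketch below is a generic augmenting-path matching over a sharenum -> set of serverids map; it is not Kevan's actual patch, and the layout at the end is only an illustration of the kind discussed earlier in this ticket.

```python
# Generic maximum bipartite matching between shares and servers via
# augmenting paths (Kuhn's algorithm).  Illustrative sketch of the
# matching-based approach, not the code from #1382.

def max_matching(sharemap):
    """sharemap: dict of share number -> set of server ids.
    Returns a dict of server id -> share number of maximum size; its length
    is the number of distinct servers that can each hold a distinct share."""
    match = {}  # server id -> share number

    def try_place(share, seen):
        for server in sharemap[share]:
            if server in seen:
                continue
            seen.add(server)
            # Use this server if it is free, or if the share currently
            # matched to it can be re-placed elsewhere.
            if server not in match or try_place(match[server], seen):
                match[server] = share
                return True
        return False

    for share in sharemap:
        try_place(share, set())
    return match

# A layout of the kind discussed in this ticket: one server holds every
# share, three others hold one or two shares each.
sharemap = {0: {"s0", "s1"}, 1: {"s0", "s2"}, 2: {"s0", "s3"}, 3: {"s0", "s1"}}
print(len(max_matching(sharemap)))  # 4: each server pairs with a distinct share
```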
Milestone renamed
Moving most tickets from 1.12 to 1.13 so we can release 1.12 with magic-folders.
Moving open issues out of closed milestones.
Ticket retargeted after milestone closed