mutable: tolerate mixed corrupt/good shares from any given peer #211
Reference: tahoe-lafs/trac-2024-07-25#211
The current mutable file Retrieve code has a control flow problem that causes
it to respond to a corrupt share by ignoring any remaining shares from the
same peer. This causes unnecessary problems for small grids, because it makes
fewer shares available for use. In the worst case, this could make files
unavailable.
This worst case is only likely to be exercised in a unit test, but it is
exactly what happens in our test_mutable, which uses 5 nodes, 10 shares
(7 of which are corrupt), and 3-of-10 encoding: only 3 good shares exist,
so discarding even one good share because it lives on the same peer as a
corrupt one leaves fewer than k=3 usable shares and makes the file
unretrievable.
To fix this, we need to modify the control flow in Retrieve._got_results so
that a CorruptShareError still allows the remaining shares from that peer to
be processed, with the exception re-raised at the end of the loop (to notify
_query_failed, which cares about the peerid but not the share number).
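A minimal sketch of that intended control flow, using hypothetical
validate() and accept() callables in place of the real Retrieve internals:
process every share the peer returned, remember the first corruption, and
only raise after the loop.

```python
class CorruptShareError(Exception):
    def __init__(self, peerid, shnum, reason):
        Exception.__init__(self, peerid, shnum, reason)
        self.peerid = peerid
        self.shnum = shnum
        self.reason = reason

def got_results(peerid, shares, validate, accept):
    """shares: dict mapping share number -> share data from one peer."""
    first_error = None
    for shnum, data in shares.items():
        try:
            validate(peerid, shnum, data)   # may raise CorruptShareError
        except CorruptShareError as e:
            if first_error is None:
                first_error = e             # remember it, keep processing
            continue
        accept(shnum, data)                 # good share: keep it usable
    if first_error is not None:
        # Re-raise after the loop so the error path (_query_failed in the
        # real code) still learns which peer returned corruption.
        raise first_error
```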
The current workaround is to use 10 nodes in that test instead of 5. Once we
fix this control flow, test_system.SystemTest.test_mutable should be restored
to using 5 nodes instead of 10, because the memory footprint of a 10-node
test is considerably larger than that of a 5-node test (233MB instead of
77MB).
The workaround was introduced in changeset:59d6c3c8229d8457 to fix #209 in time for the 0.7.0 release.
Fixed in changeset:e3037a7541d2a37c. I also reduced the test case back down to 5 nodes; to exercise the recent resource.setrlimit code in node.py, you'll want to briefly raise it back up to 10.
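For context, a sketch of the kind of rlimit bump that comment refers to
(hypothetical; the actual node.py code may differ): a many-node test opens
many sockets and share files, so the code raises the soft RLIMIT_NOFILE
toward the hard limit, which is all an unprivileged process may do.

```python
import resource

def increase_rlimit_nofile(target=1024):
    # Raise the soft file-descriptor limit toward the hard limit.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < target:
        if hard == resource.RLIM_INFINITY:
            new_soft = target
        else:
            new_soft = min(target, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
```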