errors during add-lease cause checker false-negatives #875
Reference: tahoe-lafs/trac-2024-07-25#875
Francois, on the 25c3 grid, observed failures (looking very much like the exceptions in #786) that occurred when "check --repair --add-lease" was done on a directory for which all of the shares were stored on an old (tahoe-1.2.0) server.
The implementation of the mutable checker's add-lease code was such that any unexpected errors in the add-lease call would cause the checker to ignore any shares reported by the simultaneous readv call. tahoe-1.2.0 had a bug in the latency-measure code such that add-lease always threw a `KeyError` (I'd mis-analyzed the relative levels of support in my notes in the NEWS file).
So when he did "check --add-lease --repair", the add-lease bug caused the checker to think there were no shares available, so it attempted repair, and the lack of any shares caused repair to fail with the weird `TypeError` that is the focus of #786.
The fix is to separate out the add-lease response/errback path from the readv path. I want to record some errors but ignore the ones that I think are harmless and noisy, like the known limitations of older tahoe versions.
The immutable checker code uses the same pipelined add-lease/DYHB queries, and needs to be fixed also.
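To illustrate the separation described above, here is a minimal sketch in plain Python. It is not the actual Tahoe-LAFS code (which is built on Twisted Deferreds and remote calls). The names `query_server` and `HARMLESS_ADD_LEASE_ERRORS`, and the choice to treat `KeyError` as the harmless case, are assumptions made for illustration only.

```python
import logging

log = logging.getLogger("checker-sketch")

# Assumption for this sketch: errors of these types are known limitations
# of older servers and are ignored quietly.
HARMLESS_ADD_LEASE_ERRORS = (KeyError,)


def query_server(readv, add_lease):
    """Run the share query and the add-lease call on separate error paths.

    `readv` and `add_lease` are hypothetical zero-argument callables that
    stand in for the remote calls. A failure in `add_lease` never discards
    the shares reported by `readv`; a failure in `readv` still propagates.
    """
    shares = readv()

    recorded_errors = []
    try:
        add_lease()
    except HARMLESS_ADD_LEASE_ERRORS:
        # Harmless and noisy: do not record, do not affect the share results.
        pass
    except Exception as e:
        # Unexpected: record it, but still keep the share results.
        log.warning("add-lease failed: %r", e)
        recorded_errors.append(e)

    return shares, recorded_errors


def old_server_add_lease():
    # Simulates an old server whose add-lease always raises KeyError.
    raise KeyError("latency")


shares, errors = query_server(lambda: {0: b"share data"}, old_server_add_lease)
```

With this structure, the simulated old server still yields its shares and an empty error list, so the checker would not conclude that no shares exist and would not trigger a needless repair.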
Ok, changeset:794e32738fc654ae should fix this one (both mutable and immutable).