if a stored share has a corrupt header, other shares held by that server for the file should still be accessible to clients #1566
When a storage server receives a remote_get_buckets or remote_slot_testv_and_readv_and_writev request, it will try to create share objects for each of the shares it stores under that SI that are wanted by the client. If any of those shares have a corrupt header (typically resulting in an UnknownMutableContainerVersionError, UnknownImmutableContainerVersionError, or struct.error from the share class constructor), the whole request will fail, even though the server might hold other shares that are not corrupted.

Unfortunately there is no way in the current storage protocol to report success for some shares and a failure for others. The options are:
I found this bug when working on the branch for #999, but I think it also applies to trunk.
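To make the failure mode concrete, here is a minimal sketch (not the actual Tahoe-LAFS code; the header layout, file layout, and function names are invented for illustration) of a get_buckets-style handler in which one corrupt share header aborts the whole response:

```python
import os
import struct

class UnknownImmutableContainerVersionError(Exception):
    pass

def parse_share_header(path):
    # Hypothetical header layout: a 4-byte big-endian container version.
    with open(path, "rb") as f:
        header = f.read(4)
    (version,) = struct.unpack(">L", header)  # struct.error if truncated
    if version != 1:
        raise UnknownImmutableContainerVersionError(version)
    return version

def get_buckets(storedir, storage_index):
    # One share file per share number, stored under <storedir>/<storage_index>/<shnum>.
    buckets = {}
    sharedir = os.path.join(storedir, storage_index)
    for name in os.listdir(sharedir):
        path = os.path.join(sharedir, name)
        # Without any per-share error handling here, a single corrupt header
        # raises out of the loop and the client gets an error instead of the
        # intact shares the server also holds.
        parse_share_header(path)
        buckets[int(name)] = path
    return buckets
```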
[description posted as a comment by mistake]
I will write a test.
I think the server should not report anything to the client about shares with corrupt headers. It should file a local corruption report (as if the remote client had called remote_advise_corrupt_share(), but with a flag that shows this was discovered locally, and probably in a part of the share that the client wouldn't be able to read anyway).

I'd be ok with reporting "nope, no shares here" if all shares are corrupt, rather than returning an error. I don't think the client is in a good position to do much with information about shares that are so corrupt that it can't even see part of their contents (which is what header corruption would mean). The only interaction I can think of is how the client might try to repair this file: they can't replace the corrupt-header share (the server admin will need to delete it locally), and they can't modify its contents, so they ought to just put a new copy somewhere else. They'll probably try to put a new copy on this server, which it needs to refuse (refusal doesn't always mean no-space-available, although the client might have bugs which make it interpret refusal that way).
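For reference, a hedged sketch of that suggested behaviour, reusing the hypothetical parse_share_header helper and exception class from the sketch above; the corruption-report path and format are likewise invented and are not the real remote_advise_corrupt_share machinery:

```python
import os
import struct
import time

def file_local_corruption_report(reportdir, storage_index, shnum, reason):
    # Record a locally-discovered corruption advisory (invented format).
    os.makedirs(reportdir, exist_ok=True)
    name = "%s-sh%d-%d" % (storage_index, shnum, int(time.time()))
    with open(os.path.join(reportdir, name), "w") as f:
        f.write("discovered-locally: %s\n" % reason)

def get_buckets_skipping_corrupt(storedir, reportdir, storage_index):
    buckets = {}
    sharedir = os.path.join(storedir, storage_index)
    for name in os.listdir(sharedir):
        shnum = int(name)
        path = os.path.join(sharedir, name)
        try:
            parse_share_header(path)  # helper from the sketch above
        except (UnknownImmutableContainerVersionError, struct.error) as e:
            # Say nothing to the client about this share; just note it locally.
            file_local_corruption_report(reportdir, storage_index, shnum, repr(e))
            continue
        buckets[shnum] = path
    # If every share is corrupt this returns {} ("no shares here") rather
    # than an error, matching the behaviour suggested above.
    return buckets
```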
Replying to warner:
Good, that's essentially what I'd implemented in [5477/ticket999-S3-backend]. Please review! (I think that changeset is understandable independently of the other pluggable-backends changes. Comments particularly wanted on the behaviour when attempting to write shares with corrupted headers.)
Replying to davidsarah (comment 4):
... except for filing a corruption report, which I agree is a good idea.
I'm not going to re-assign this, as I don't feel completely confident that my review is thorough enough to have properly understood this patch. Nonetheless, I feel I have a good understanding of the impact the patch is trying to achieve and of its overall approach (thanks to the readability of the code). I pretty much ignored syntax and looked only at the parts of the discussion I could follow.
It looks as though access to the server's other shares still works even when one of those shares has a corrupt header, and other files on that server remain retrievable despite the corrupt headers.
AFAIK, the code now identifies corrupt headers, skips them for the requested operation, and then reports which headers were corrupt.
Yay, my first review.
kraeftig: thanks.
I understand what the behaviour should be now, and will make sure that it works that way before merging the pluggable backends code. (Currently on the ticket999-S3-backends branch, the disk backend works as described by warner, while the S3 backend does not. But the S3 backend is going to be rewritten for RAIC before it gets merged, in any case.)
The tests on the cloud branch have been updated to reflect the desired behaviour. They pass for the disk backend, and are marked TODO for the cloud backend.
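For illustration only, a rough sketch of the kind of test being described, written against the hypothetical helpers from the earlier sketches rather than the real test suite: corrupt one share's header on disk and check that the surviving share is still returned.

```python
import os
import shutil
import tempfile
import unittest

class CorruptHeaderTest(unittest.TestCase):
    def test_other_shares_still_readable(self):
        storedir = tempfile.mkdtemp()
        self.addCleanup(shutil.rmtree, storedir)
        si = "aaaaaaaa"  # placeholder storage index
        sharedir = os.path.join(storedir, si)
        os.makedirs(sharedir)
        # Two shares with the hypothetical 4-byte version-1 header.
        for shnum in (0, 1):
            with open(os.path.join(sharedir, str(shnum)), "wb") as f:
                f.write(b"\x00\x00\x00\x01" + b"share data")
        # Corrupt share 0's header.
        with open(os.path.join(sharedir, "0"), "r+b") as f:
            f.write(b"\xff\xff\xff\xff")
        buckets = get_buckets_skipping_corrupt(storedir, storedir, si)
        # Only the intact share (number 1) should be reported to the client.
        self.assertEqual(sorted(buckets), [1])

if __name__ == "__main__":
    unittest.main()
```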
Fixed in fd819cea and ea335f74 on the cloud-rebased2 branch.
It looks good to me, except for one line I have a question about. I left a comment on fd819cea about it.
Does the server file a local corruption report like Brian suggested? I can't seem to find code anywhere that does this.
Replying to markberger:
Fixed, thanks.
It does not. Filed as #2026.