if a stored share has a corrupt header, other shares held by that server for the file should still be accessible to clients #1566

Closed
opened 2011-10-18 18:04:30 +00:00 by davidsarah · 11 comments
davidsarah commented 2011-10-18 18:04:30 +00:00
Owner

When a storage server receives a remote_get_buckets or remote_slot_testv_and_readv_and_writev request, it will try to create share objects for each of the shares it stores under that SI that are wanted by the client. If any of those shares have a corrupt header (typically resulting in a UnknownMutableContainerVersionError, UnknownImmutableContainerVersionError, or struct.error from the share class constructor), the whole request will fail, even though the server might hold other shares that are not corrupted.

Unfortunately there is no way in the current storage protocol to report success for some shares and a failure for others. The options are:

  • the status quo -- no shares in the shareset are accessible;
  • shares with corrupt headers are ignored on read requests;
  • if all shares are corrupted then report one of the errors, but if only some shares in a shareset have corrupted headers, ignore them and allow access to the rest.

I found this bug when working on the branch for #999, but I think it also applies to trunk.

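By way of illustration, here is a minimal sketch in Python of the third option: skip shares whose headers fail to parse, and only surface an error if every share in the shareset is corrupt. This is not the actual Tahoe-LAFS code; the exception classes are redefined locally as stand-ins, and `load_readable_shares` is a hypothetical helper.

```python
import struct

# Local stand-ins for the exception classes raised by the real share
# constructors; defined here only to keep the sketch self-contained.
class UnknownMutableContainerVersionError(Exception):
    pass

class UnknownImmutableContainerVersionError(Exception):
    pass

HEADER_ERRORS = (UnknownMutableContainerVersionError,
                 UnknownImmutableContainerVersionError,
                 struct.error)

def load_readable_shares(share_constructors):
    """Given a dict mapping shnum -> zero-argument share constructor, return
    the shares whose headers parse; ignore the rest unless *all* of them fail."""
    good = {}
    errors = {}
    for shnum, make_share in share_constructors.items():
        try:
            good[shnum] = make_share()
        except HEADER_ERRORS as e:
            # A corrupt header hides this share from read requests.
            errors[shnum] = e
    if errors and not good:
        # Only if every share in the shareset is corrupt do we report an error.
        raise next(iter(errors.values()))
    return good
```
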
tahoe-lafs added the code-storage, major, defect, 1.9.0b1 labels 2011-10-18 18:04:30 +00:00
tahoe-lafs added this to the undecided milestone 2011-10-18 18:04:30 +00:00
davidsarah commented 2011-10-18 18:14:46 +00:00
Author
Owner

[posted the description as a comment by mistake]
davidsarah commented 2011-10-18 18:16:23 +00:00
Author
Owner

I will write a test.

warner commented:

I think the server should not report anything to the client about shares with corrupt headers. It should file a local corruption report (as if the remote client had called remote_advise_corrupt_share(), but with a flag that shows this was discovered locally, and probably in a part of the share that the client wouldn't be able to read anyways).

I'd be ok with reporting "nope, no shares here" if all shares are corrupt, rather than returning an error. I don't think the client is in a good position to do much with information about shares that are so corrupt that it can't even see part of their contents (which is what header corruption would mean). The only interaction I can think of is how the client might try to repair this file: they can't replace the corrupt-header share (the server admin will need to delete it locally), and they can't modify its contents, so they ought to just put a new copy somewhere else. They'll probably try to put a new copy on this server, which it needs to refuse (refusal doesn't always mean no-space-available, although the client might have bugs which make it interpret refusal that way).

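A rough sketch of that policy, reusing the `HEADER_ERRORS` tuple from the earlier sketch; `_load_share` and `_file_local_corruption_report` are hypothetical names, not the real server internals:

```python
def get_buckets_skipping_corrupt(server, storage_index, wanted_shnums):
    """Return only the readable shares; omit shares whose headers are corrupt.

    If every wanted share is corrupt, the result is simply empty ("nope, no
    shares here") rather than an error.
    """
    buckets = {}
    for shnum in wanted_shnums:
        try:
            buckets[shnum] = server._load_share(storage_index, shnum)   # hypothetical
        except HEADER_ERRORS:
            # File a local corruption report, as if the client had called
            # remote_advise_corrupt_share(), but flagged as discovered locally.
            server._file_local_corruption_report(storage_index, shnum,
                                                 discovered_locally=True)  # hypothetical
    return buckets
```
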
davidsarah commented 2011-10-18 23:36:01 +00:00
Author
Owner

Replying to warner:

I think the server should not report anything to the client about shares with corrupt headers. It should file a local corruption report (as if the remote client had called remote_advise_corrupt_share(), but with a flag that shows this was discovered locally, and probably in a part of the share that the client wouldn't be able to read anyways).

I'd be ok with reporting "nope, no shares here" if all shares are corrupt, rather than returning an error. I don't think the client is in a good position to do much with information about shares that are so corrupt that it can't even see part of their contents (which is what header corruption would mean). The only interaction I can think of is how the client might try to repair this file: they can't replace the corrupt-header share (the server admin will need to delete it locally), and they can't modify its contents, so they ought to just put a new copy somewhere else. They'll probably try to put a new copy on this server, which it needs to refuse (refusal doesn't always mean no-space-available, although the client might have bugs which make it interpret refusal that way).

Good, that's essentially what I'd implemented in [5477/ticket999-S3-backend]. Please review! (I think that changeset is understandable independently of the other pluggable-backends changes. Comments particularly wanted on the behaviour when attempting to write shares with corrupted headers.)

tahoe-lafs modified the milestone from undecided to 1.10.0 2011-10-18 23:36:01 +00:00
davidsarah commented 2011-10-18 23:43:06 +00:00
Author
Owner

Replying to davidsarah (comment:4):

Replying to warner:

I think the server should not report anything to the client about shares with corrupt headers. It should file a local corruption report (as if the remote client had called remote_advise_corrupt_share(), but with a flag that shows this was discovered locally, and probably in a part of the share that the client wouldn't be able to read anyways).
[...]

Good, that's essentially what I'd implemented in [5477/ticket999-S3-backend].

... except for filing a corruption report, which I agree is a good idea.

kraeftig commented 2012-03-30 23:37:08 +00:00
Author
Owner

I'm not going to re-assign this, as I'm not completely confident that I understood this patch well enough for my review to be authoritative. Nonetheless, I feel I have a good understanding of the impact the patch is trying to achieve and of its overall approach (thanks to the readability of the code). I pretty much ignored the syntax and looked only at the parts of the discussion that I could follow.

It looks as though access to the server's other shares works even when one of those shares has a corrupt header, and other files on that server can still be retrieved even when corrupted headers are reported.

AFAIK, the application now identifies corrupt headers, skips them for the requested operation, and then reports which headers were corrupt.

Yay, my first review.

davidsarah commented 2012-03-31 00:40:29 +00:00
Author
Owner

kraeftig: thanks.

I understand what the behaviour should be now, and will make sure that it works that way before merging the pluggable backends code. (Currently on the ticket999-S3-backends branch, the disk backend works as described by warner, while the S3 backend does not. But the S3 backend is going to be rewritten for RAIC before it gets merged, in any case.)

davidsarah commented 2012-06-05 03:17:12 +00:00
Author
Owner

The tests on the cloud branch have been updated to reflect the desired behaviour. They pass for the disk backend, and are marked TODO for the cloud backend.

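A hypothetical sketch of what such a test might check; the `create_storage_server`, `upload_immutable_shares`, and `corrupt_share_header` helpers are illustrative, not the actual fixtures on the branch:

```python
from twisted.trial import unittest

class CorruptHeaderTest(unittest.TestCase):
    def test_reads_skip_share_with_corrupt_header(self):
        ss = self.create_storage_server()                        # hypothetical helper
        si = b"si1"
        self.upload_immutable_shares(ss, si, shnums=[0, 1, 2])   # hypothetical helper
        # Overwrite the container version field of share 1 on disk.
        self.corrupt_share_header(ss, si, shnum=1)                # hypothetical helper
        buckets = ss.remote_get_buckets(si)
        # Share 1 is silently omitted; shares 0 and 2 remain readable.
        self.assertEqual(set(buckets.keys()), {0, 2})
```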
daira commented 2013-05-01 02:46:30 +00:00
Author
Owner

Fixed in [fd819cea](https://github.com/LeastAuthority/tahoe-lafs/commit/fd819cea11599cc274b8e1d72bfce0fffea39296) and [ea335f74](https://github.com/LeastAuthority/tahoe-lafs/commit/ea335f7437474f4f22f265b896ac889676073999) on the cloud-rebased2 branch.

markberger commented:

It looks good to me, except for one line I have a question about. I left a comment on fd819cea about it.

Does the server file a local corruption report like Brian suggested? I can't seem to find code anywhere that does this.

daira commented 2013-07-17 13:23:48 +00:00
Author
Owner

Replying to markberger:

It looks good to me, except for one line I have a question about. I left a comment on fd819cea about it.

Fixed, thanks.

Does the server file a local corruption report like Brian suggested? I can't seem to find code anywhere that does this.

It does not. Filed as #2026.

tahoe-lafs added the fixed label 2013-07-17 13:23:59 +00:00
daira closed this issue 2013-07-17 13:23:59 +00:00
Reference: tahoe-lafs/trac-2024-07-25#1566