the mutable publisher should try harder to place all shares #1640

Open
opened 2011-12-17 22:32:42 +00:00 by kevan · 3 comments
kevan commented 2011-12-17 22:32:42 +00:00
Owner

If a connection error is encountered while pushing a share to a storage server, the mutable publisher forgets about the writer object associated with the (share, server) placement. This is consistent with the pre-1.9 publisher and, in high-level terms, means that the publisher views that share placement as probably invalid, attributing the error to a server failure or something like it. The pre-1.9 publisher then attempts to find another home for the share placed on the broken server. The current publisher doesn't.

When I first wrote the publisher, I wanted to support streaming upload of mutable files. That made it hard to find a new home for a share placed on a broken storage server, since we wouldn't necessarily have all of the parts of the share we generated and placed before the failure available to upload to a new server. We ended up ditching streaming uploads due to other concerns; instead, we write a share all at once, and we have everything we will write to a storage server available to us when we write. Given this, there's no compelling reason that the publisher couldn't attempt to find a new home for shares placed on broken servers. Ensuring that all shares are placed if at all possible makes it more likely that there will be a recoverable version of the mutable file available after an update.

In practical terms, the current behavior increases the chance of data loss somewhat, in proportion to the number of servers that fail during a publish operation. If too many storage servers fail during the upload process and too much of the initial share placement is lost due to these failures, the newly-placed mutable file might not be recoverable. A fix would involve a way to change the server associated with a writer after the writer is created, and probably some control flow changes to ensure that write failures result in shares being reassigned.
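A rough sketch of the kind of fix described above, assuming a synchronous model for brevity (the real publisher is asynchronous). The `Writer` class, its `set_server` method, the `place_shares` function, and the `push` callback are all hypothetical names invented for illustration, not the publisher's actual API. The sketch simply moves a failed share to the first server it has not yet tried; a real fix would probably also want to prefer servers that hold no other shares.

```python
class Writer:
    """Hypothetical stand-in for the writer tied to one (share, server) placement."""

    def __init__(self, shnum, server):
        self.shnum = shnum
        self.server = server

    def set_server(self, server):
        # The ticket asks for a way to change the server after creation;
        # this setter is an assumed shape for that, not existing code.
        self.server = server


def place_shares(writers, candidate_servers, push):
    """Push every share, reassigning a share to an untried server on failure.

    push(writer) is a hypothetical callback returning True on success and
    False on a connection error.
    Returns (placed, unplaced): a dict of shnum -> server, and a list of
    share numbers that could not be placed on any candidate server.
    """
    placed = {}
    unplaced = []
    for writer in writers:
        tried = set()
        while True:
            tried.add(writer.server)
            if push(writer):
                placed[writer.shnum] = writer.server
                break
            # Write failed: look for a server this share has not tried yet.
            spares = [s for s in candidate_servers if s not in tried]
            if not spares:
                unplaced.append(writer.shnum)
                break
            writer.set_server(spares[0])
    return placed, unplaced
```

With this shape, a write failure no longer drops the share outright; the share is only reported as unplaced once every candidate server has been tried.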

tahoe-lafs added the unknown, major, defect, 1.9.0 labels 2011-12-17 22:32:42 +00:00
tahoe-lafs added this to the undecided milestone 2011-12-17 22:32:42 +00:00

I'm not sure this is important enough to warrant trying to fix it in a 1.9.1. That's because a server failing during an upload isn't that common, and if it does happen it isn't that damaging. Or, wait, does mutable upload have a servers-of-happiness-style of check to return a failure message in case the file is not sufficiently robustly stored?

zooko modified the milestone from undecided to soon 2011-12-18 01:42:39 +00:00
kevan commented 2011-12-18 02:46:48 +00:00
Author
Owner

The old publisher won't finish until it has either placed all of its shares somewhere or has tried and failed a certain number of times to place all of its shares somewhere. In the second case, a failure message is returned. The new publisher will return a failure message if it can't place enough shares for the file to be recoverable. So the robustness criterion in the old publisher is whether all shares are placed somewhere, and the robustness criterion in the new publisher is whether enough shares are placed for the file to be recoverable.
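To make the difference between the two criteria concrete, here is a minimal sketch. It assumes that "enough shares for the file to be recoverable" means at least k distinct shares under k-of-N encoding; the function names and parameters are hypothetical and are not taken from the publisher code.

```python
def old_publish_ok(placed_shnums, total_shares):
    """Old criterion as described above: every share has found a home."""
    return len(set(placed_shnums)) == total_shares


def new_publish_ok(placed_shnums, k):
    """New criterion as described above (assumed to mean k distinct shares)."""
    return len(set(placed_shnums)) >= k
```

Under these assumptions, a 3-of-10 publish that manages to place only shares 0, 1, and 2 would pass the new check but fail the old one.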

davidsarah commented 2011-12-18 18:58:40 +00:00
Author
Owner

Replying to zooko:

... does mutable upload have a servers-of-happiness-style of check to return a failure message in case the file is not sufficiently robustly stored?

No, that is ticket #1057.

warner added code-peerselection and removed unknown labels 2014-09-11 22:22:43 +00:00

Reference: tahoe-lafs/trac-2024-07-25#1640