mutable modify() may need to publish even if retry was a NOP #551

Closed
opened 2008-12-05 07:37:02 +00:00 by warner · 1 comment

The mutable-file modify() function takes a "modifier callback" and
performs (retrieve-modify-publish) until either the modifier callback does
not actually modify anything, or the publish succeeds without raising
UncoordinatedWriteError. The intent is to keep applying some change until it
sticks. If UCWE is reliably raised in case of overlapping writes, and if all
parties keep trying (with random backoff) until they succeed, eventually all
parties' changes should be applied.

The don't-publish-if-modifier-didn't-change-anything clause is intended to
handle the case where two parties are each performing the same change.
However, I'm starting to think that it's a mistake. We planned (but did not
implement) a mutable-file "recovery" mechanism, which was to be triggered by
any UCWE, and was to reinforce some version (not necessarily the one that we
just published), to reduce the chance that later writes and crashing clients
could break the file completely.

An uncontested retrieve+publish cycle should have about the same result as a
dedicated recovery operation. So, for the modify function, if the first
publish caused UCWE, then I'm thinking we should always do a second publish,
even if the modifier callback doesn't wind up modifying anything. #546 is a
place where UCWE can occur when it doesn't really need to, and in #546 the
publish is actually quite successful (the UCWE is raised only because of a
few leftover old "surprise" shares; the new version has a full 10 shares
written to the top of the permuted peer list). As a result, the second
retrieve will get the new version of the file, the modifier will find nothing
to change, and under the current logic the second publish is skipped.

So I currently think that we should change the logic of modify() to
keep doing retrieve-modify-publish until the publish finishes without UCWE,
and remove the if-modifier-didn't-change-anything test.
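A minimal sketch of the proposed loop, to make the change concrete. It is synchronous and uses hypothetical names (`modify_until_clean`, `FlakyFile`, `add_child`, `max_attempts`); the real code is asynchronous and applies random backoff between attempts, both of which are omitted here. The fake file imitates the #546 situation: the first publish writes the new version but still raises UCWE.

```python
class UncoordinatedWriteError(Exception):
    """Stand-in for the real UCWE exception (assumption for this sketch)."""


def modify_until_clean(retrieve, modifier, publish, max_attempts=5):
    """Repeat retrieve-modify-publish until a publish finishes without UCWE.

    There is deliberately no "did the modifier change anything?" test:
    every pass publishes, so a pass after a UCWE doubles as a repair.
    """
    for _ in range(max_attempts):
        contents = modifier(retrieve())
        try:
            publish(contents)
        except UncoordinatedWriteError:
            continue  # real code would wait for a random backoff here
        return contents
    raise UncoordinatedWriteError("gave up after %d attempts" % max_attempts)


class FlakyFile:
    """Hypothetical in-memory file whose first few publishes raise UCWE
    even though the new contents were written (as described for #546)."""

    def __init__(self, contents, ucwe_count):
        self.contents = contents
        self.ucwe_count = ucwe_count
        self.publishes = 0

    def retrieve(self):
        return self.contents

    def publish(self, new_contents):
        self.publishes += 1
        self.contents = new_contents
        if self.ucwe_count > 0:
            self.ucwe_count -= 1
            raise UncoordinatedWriteError("surprise shares")


def add_child(contents):
    # Idempotent modifier: a NOP if the change has already been applied.
    return contents if "child" in contents else contents + ["child"]


f = FlakyFile([], ucwe_count=1)
result = modify_until_clean(f.retrieve, add_child, f.publish)
```

In this run the second pass finds the change already applied, yet still publishes, which is the reinforcing behaviour argued for above.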

warner added the code-dirnodes, major, defect, 1.2.0 labels 2008-12-05 07:37:02 +00:00
warner added this to the undecided milestone 2008-12-05 07:37:02 +00:00
Author

changeset:ffb598514656e7b2 implements this. I made it such that if the initial call of the modifier is a NOP, the file is not published. So the rule is that a publish only happens if something changed, but if we ever see a UCWE, we'll keep trying until we get a publish that doesn't see UCWE (subject to the limitations of the back-off-agent, which gives up after four retries).
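The rule described in this comment can be sketched roughly as follows. This is not the code from the changeset: it is a synchronous illustration with hypothetical names (`modify`, `FlakyFile`, `add_child`, `max_retries`), it omits the backoff delay, and it reads "gives up after four retries" as one initial attempt plus at most four more.

```python
class UncoordinatedWriteError(Exception):
    """Stand-in for the real UCWE exception (assumption for this sketch)."""


def modify(retrieve, modifier, publish, max_retries=4):
    """Publish only if something changed, unless a UCWE has been seen;
    after any UCWE, keep publishing until one pass completes cleanly."""
    seen_ucwe = False
    retries = 0
    while True:
        old = retrieve()
        new = modifier(old)
        if new == old and not seen_ucwe:
            return old  # modifier was a NOP and nothing needs repair
        try:
            publish(new)
            return new
        except UncoordinatedWriteError:
            seen_ucwe = True
            if retries >= max_retries:
                raise  # the back-off agent gives up
            retries += 1


class FlakyFile:
    """Hypothetical in-memory file whose first few publishes raise UCWE
    even though the new contents were written."""

    def __init__(self, contents, ucwe_count):
        self.contents = contents
        self.ucwe_count = ucwe_count
        self.publishes = 0

    def retrieve(self):
        return self.contents

    def publish(self, new_contents):
        self.publishes += 1
        self.contents = new_contents
        if self.ucwe_count > 0:
            self.ucwe_count -= 1
            raise UncoordinatedWriteError("surprise shares")


def add_child(contents):
    # Idempotent modifier: a NOP if the change has already been applied.
    return contents if "child" in contents else contents + ["child"]


# Initial modifier call is a NOP: no publish happens.
unchanged = FlakyFile(["child"], ucwe_count=0)
modify(unchanged.retrieve, add_child, unchanged.publish)

# First publish raises UCWE: the NOP retry is still published.
contested = FlakyFile([], ucwe_count=1)
modify(contested.retrieve, add_child, contested.publish)
```

With these fakes, the first file is never published, while the second is published twice: once with the change, and once more after the UCWE even though the modifier had nothing left to do.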

warner added the fixed label 2008-12-06 05:31:29 +00:00
warner modified the milestone from undecided to 1.3.0 2008-12-06 05:31:29 +00:00
Reference: tahoe-lafs/trac-2024-07-25#551