mapupdate(MODE_WRITE) triggers on a false boundary #547

Open
opened 2008-12-05 07:05:09 +00:00 by warner · 3 comments

"Problem 1" in #546 is that the mapupdate code has a bug that makes it
trigger too early. The MODE_WRITE logic waits for the following conditions:

  1. for the highest sequence number we've ever seen, we can recover all
    versions that have that sequence number
  2. we've received responses from at least k+epsilon servers
  3. we've seen a contiguous range of epsilon servers who do not have a share

The last criterion is intended to help us find the edge of the "active set":
the boundary between servers who have shares (at the beginning of the
permuted list) and those who do not (at the end of the list). The bug is in
the way this last criterion is tested.

If we've queried 10 servers and received responses from 9 of them, in the
pattern "00100010?1", the logic that looks for "1000" will fire:

00100010?1
  1000

Instead, the logic should be more like "1000$". At least it should not be
allowed to fire if there are any share-holding servers beyond the match.
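
A minimal sketch of the difference, assuming the responses are encoded as a string in permuted-list order ("1" = has a share, "0" = no share, "?" = no response yet) and epsilon is 3, as in the example above. The function names and the string encoding are hypothetical, chosen only to mirror the "1000" vs. "1000$" notation; the real mapupdate code does not work on strings. Treating "?" after the match as acceptable is also an assumption, taken from the "at least" wording.

```python
import re

def boundary_found_buggy(responses: str, epsilon: int) -> bool:
    # Fires on any share-holder followed by epsilon non-holders,
    # even if share-holding servers appear later in the list.
    return re.search("1" + "0" * epsilon, responses) is not None

def boundary_found_strict(responses: str, epsilon: int) -> bool:
    # Same match, but no "1" may appear anywhere after it.
    # Unanswered servers ("?") after the match are tolerated here (assumption).
    return re.search("1" + "0" * epsilon + "[^1]*$", responses) is not None

print(boundary_found_buggy("00100010?1", 3))
print(boundary_found_strict("00100010?1", 3))
```

On the pattern from this ticket the first check fires and the second does not, because a share-holding server follows the "1000" match.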

The consequence of this bug is to exacerbate the problems in #546: sending
shares to servers which already have other shares, triggering inappropriate
UncoordinatedWriteErrors.
warner added the
code-mutable
major
defect
1.2.0
labels 2008-12-05 07:05:09 +00:00
warner added this to the undecided milestone 2008-12-05 07:05:09 +00:00
Author

Attachment cryptic_notes.txt (1630 bytes) added

tahoe-lafs modified the milestone from undecided to 1.7.0 2010-03-24 23:12:10 +00:00

It's really bothering me that mutable file upload and download behavior is so finicky, buggy, inefficient, hard to understand, different from immutable file upload and download behavior, etc. So I'm putting a bunch of tickets into the "1.8" Milestone. I am not, however, at this time, volunteering to work on these tickets, so it might be a mistake to put them into the 1.8 Milestone, but I really hope that someone else will volunteer or that I will decide to do it myself. :-)

zooko modified the milestone from 1.7.0 to 1.8.0 2010-05-27 22:07:41 +00:00
tahoe-lafs modified the milestone from 1.8.0 to soon 2010-08-10 04:09:11 +00:00

If you like this ticket, you might like #540 (inappropriate "uncoordinated write error" after handling a server failure).

Reference: tahoe-lafs/trac-2024-07-25#547