K=1 for mutable files #332

zooko · 2008-03-08T01:40:22Z

zooko commented

2008-03-08 01:40:22 +00:00

Per [this discussion](http://allmydata.org/pipermail/tahoe-dev/2008-March/000416.html), we're changing mutable files to have K=1.

Some documentation needs to be updated. See also #254 (need better user output on UncoordinatedWriteError) and #207 (unit tests for failure modes of small mutable files).

zooko added the

labels 2008-03-08 01:40:22 +00:00

zooko added this to the 0.9.0 (Allmydata 3.0 final) milestone 2008-03-08 01:40:22 +00:00

warner commented

2008-03-08 01:45:59 +00:00

Also, we need to fix #312 before we change K, otherwise we risk data unavailability for existing files (which are encoded at 3-of-10).

zooko commented

2008-03-10 19:38:32 +00:00

We hesitate to make this change until #207 (unit tests for failure modes of small mutable files) is in place to assure us that this change doesn't destroy any data from the 0.8.0-based Allmydata.com 3.0 beta production grid.

warner commented

2008-03-10 20:30:20 +00:00

Oh, another concern with k=1 is that this makes it a lot easier to experience
an accidental rollback attack when a single server is offline during an
update. Specifically:

* I'm experimenting with 1-of-8, since that gets the availability that I
  want (relative to the old 3-of-10)
* the mutfilenode is created, and ver1 shares are pushed to 8 servers
* later, we update to ver2, but one of the servers is offline at that moment
* ver2 shares go to 7 of the original servers and one new one.
* now the offline server comes back. We now have 8 ver2 shares and 1 ver1
  share
* now a retrieve occurs. If it hits the once-offline server first, we'll
  finish with ver1, and the accidental rollback will have occurred.

If a server was offline, then the chances of experiencing a rollback are
1-out-of-8 (since it requires that the fastest server in the later retrieval
group be the one with the old version).
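
A rough sketch of that estimate, for illustration only. It assumes the retrieve queries a group of 8 servers that includes the once-offline one, that each server is equally likely to answer first, and that with k=1 the reader stops at the first share it gets. The function names are made up for this example and are not part of the Tahoe code.

```python
import random
from fractions import Fraction


def exact_rollback_probability(group_size=8, stale=1):
    """Chance that the first responder holds an old share.

    Assumes every server in the retrieval group is equally likely
    to respond first (an assumption of this sketch).
    """
    return Fraction(stale, group_size)


def estimate_rollback_rate(group_size=8, stale=1, trials=100_000, seed=0):
    """Monte Carlo check of the same number."""
    rng = random.Random(seed)
    # Version number held by each server in the retrieval group:
    # `stale` servers still have ver1, the rest have ver2.
    versions = [1] * stale + [2] * (group_size - stale)
    rollbacks = 0
    for _ in range(trials):
        # With k=1 a single share decodes the file, so the first
        # response decides which version the reader ends up with.
        if rng.choice(versions) == 1:
            rollbacks += 1
    return rollbacks / trials


if __name__ == "__main__":
    print("exact:", exact_rollback_probability())
    print("simulated:", estimate_rollback_rate())
```

If the retrieve instead considered all 9 share-holding servers, the same reasoning would give 1-out-of-9.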

When we refactor Retrieve to grab multiple versions (#205), we plan to introduce the
"epsilon" parameter as protection against both this and intentional rollback
attacks. But we're likely to switch to k=1 before we finish that work.
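
For the availability comparison mentioned at the start of the scenario (1-of-8 relative to the old 3-of-10), here is a minimal sketch. It assumes each server is up independently with the same probability `p_up`, which is a simplification and not a measured property of any grid. A file is readable when at least k of its n share-holding servers are up.

```python
from math import comb


def availability(k, n, p_up):
    """Probability that at least k of n servers are reachable.

    Assumes independent servers with identical uptime p_up
    (a hypothetical parameter for this sketch).
    """
    return sum(
        comb(n, i) * p_up**i * (1 - p_up) ** (n - i)
        for i in range(k, n + 1)
    )


if __name__ == "__main__":
    for p_up in (0.5, 0.9, 0.99):
        print(p_up, availability(3, 10, p_up), availability(1, 8, p_up))
```

Under these assumptions 1-of-8 is never less available than 3-of-10, at the cost of the rollback exposure described above.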

zooko commented

2008-03-12 15:50:10 +00:00

If I understand correctly, we're pushing this one out of v0.9.0.

zooko modified the milestone from 0.9.0 (Allmydata 3.0 final) to undecided

2008-03-12 15:50:10 +00:00

zooko commented

2008-03-12 15:56:15 +00:00

If we change the Prime Directive of Uncoordinated Writes: "Don't Do That", then we also need to change the user output that is visible from the wui on UncoordinatedWriteError, as was described in now-closed ticket #254 (need better user output on UncoordinatedWriteError).

zooko commented

2008-05-30 04:46:23 +00:00

I think we've given up on the idea of using K=1 at all. Let's close this as invalid or wontfix or fixed. :-)

zooko commented

2008-05-31 00:15:44 +00:00

Putting this into Milestone 1.1.0 so that Brian will notice it. Justification: this was a proposed robustness improvement to mutable files which was obviated by Brian's excellent "new mutable files" work which is going into 1.1.0.

zooko modified the milestone from undecided to 1.1.0

2008-05-31 00:15:44 +00:00

warner commented

2008-06-03 06:13:43 +00:00

This is definitely not a 1.1.0 thing.

I think there may still be value in switching to K=1. It needs more testing than we can give it this week, and it's lower priority than anything we have in the next month. Giving up on it requires some thinking time, and there are higher-priority demands on thinking time right now. So, having noticed it, I'm going to move it all the way back out to the Undecided category.

warner modified the milestone from 1.1.0 to undecided

2008-06-03 06:13:43 +00:00

warner added code-mutable and removed unknown labels 2008-06-03 06:13:56 +00:00

zooko added the invalid label 2008-09-24 13:20:31 +00:00

zooko closed this issue

2008-09-24 13:20:31 +00:00