need better user output on UncoordinatedWriteError #254

zooko · 2008-01-04T19:18:38Z

zooko commented

2008-01-04 19:18:38 +00:00

The current strategy for solving the eternal puzzle of availability vs. consistency (http://allmydata.org/trac/tahoe/browser/docs/mutable.txt?rev=955bd5383daed77c#L29) in the presence of multiple uncoordinated writes is to tell the user "Don't Do That." -- don't allow multiple people, or multiple processes belonging to the same person, to write to the same directory at the same time. (For example, make a different directory for each person who needs to write.)

This strategy has many benefits, including being easy to implement, easy to understand, offering perfect availability, and being flexible for many different use cases. However, it has the drawback that so far no actual user has read the doc (source:docs/mutable.txt) and planned in advance to avoid uncoordinated writes. In fact, even I, one of the architects of the "Don't Do That." strategy, have often forgotten, and Done It.

This shows that for some but not all of those aforementioned many use cases, people are going to want a more automated way to trade away some availability in order to get consistency. This automation might not need to live in Tahoe proper, but might be more of a feature of the user interface or application layer.

Anyway, the very next improvement, which we should do ASAP, is to make the error message clearly explain to the user (a) that this is the expected result of uncoordinated writes, not an internal error in the Tahoe implementation, (b) who wrote what when (inasmuch as we can easily provide clues about that), and (c) "Don't Do That!".
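
As a rough illustration of points (a) through (c), here is a minimal Python sketch of a message builder. Everything in it is an assumption made for illustration: the `UncoordinatedWriteError` class is a local stand-in rather than Tahoe's real class, and `explain_uncoordinated_write`, `directory` and `clues` are hypothetical names, not part of any Tahoe API.

```python
class UncoordinatedWriteError(Exception):
    """Local stand-in for Tahoe's exception, so the sketch is self-contained."""


def explain_uncoordinated_write(err, directory=None, clues=None):
    """Build a plain-text explanation covering points (a), (b) and (c)."""
    where = " to %s" % directory if directory else ""
    lines = [
        "Uncoordinated write detected%s." % where,
        # (a) this is the expected result, not a bug in Tahoe
        "This is the expected result of two writers changing the same "
        "mutable file or directory at the same time. "
        "It is not an internal error in Tahoe.",
    ]
    # (b) whatever clues the caller could gather about who wrote what when
    for clue in clues or []:
        lines.append("Clue: %s" % clue)
    if str(err):
        lines.append("Details: %s" % err)
    # (c) the advice itself
    lines.append(
        "Don't Do That! Let only one writer modify a given directory at a "
        "time, for example by giving each writer its own directory."
    )
    return "\n".join(lines)
```

The caller decides which clues, if any, it can supply; the sketch makes no claim about what information Tahoe can actually provide at the point where the error is raised.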

zooko added the

labels 2008-01-04 19:18:38 +00:00

zooko added this to the 0.7.0 milestone 2008-01-04 19:18:38 +00:00

secorp commented

2008-01-04 19:23:16 +00:00

I think this came about in a set of serial modifications to the directory, not parallel, specifically so that they would not overlap.

Claudio (at Digbang) should be able to give more information about how he did this.

Claudio commented

2008-01-04 20:03:29 +00:00

The modifications were indeed parallel. I was handling concurrent adds, deletes and uploads.
Haven't seen the exception pop up when making the requests serially.

(A note clarifying this in the webapi.txt might be useful for front-end developers)
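
For front-end developers in the same situation, one possible way to make the requests serial is to funnel every modification of a given directory through a per-directory lock. The sketch below is only an assumption about how a front end might do this: `DirectoryWriteSerializer`, `run` and `dir_key` are hypothetical names, and the lock only coordinates threads inside a single process, not separate processes or machines.

```python
import threading
from collections import defaultdict


class DirectoryWriteSerializer:
    """Run write operations one at a time per directory (single process only)."""

    def __init__(self):
        self._guard = threading.Lock()              # protects the lock table
        self._locks = defaultdict(threading.Lock)   # one lock per directory key

    def _lock_for(self, dir_key):
        with self._guard:
            return self._locks[dir_key]

    def run(self, dir_key, operation, *args, **kwargs):
        """Call operation(*args, **kwargs) while holding dir_key's lock."""
        with self._lock_for(dir_key):
            return operation(*args, **kwargs)
```

A front end would wrap each add, delete or upload as `serializer.run(directory_id, do_request, ...)`, where `do_request` stands for whatever function actually issues the request. Writes to different directories still proceed in parallel.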

zooko commented

2008-01-04 23:18:20 +00:00

Claudio: good point about a note in webapi.txt. I'll do that.

warner commented

2008-01-05 05:28:38 +00:00

Also note that the top item on the #207 megaticket is to implement the
recovery algorithm that we documented in docs/mutable.txt
(http://allmydata.org/trac/tahoe/browser/docs/mutable.txt#L436).
With recovery in place, UncoordinatedWriteError would
change from meaning "you shouldn't have done that, and the file might now be
lost or at least very unhealthy" to "you shouldn't have done that, but some
version of the file is probably very healthy right now", which is better (for
certain values of "better": it might make it safe for application code to
catch and log+ignore UncoordinatedWriteError).
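
A "catch and log+ignore" wrapper in application code might look roughly like the sketch below. It assumes recovery is already in place, which is not yet the case, and every name in it is hypothetical: the `UncoordinatedWriteError` class is a local stand-in for Tahoe's exception, and `write_and_tolerate_collision` and the `tahoe_app` logger name are made up for illustration.

```python
import logging

log = logging.getLogger("tahoe_app")  # hypothetical application logger


class UncoordinatedWriteError(Exception):
    """Local stand-in for Tahoe's exception, so the sketch is self-contained."""


def write_and_tolerate_collision(write_operation):
    """Run write_operation(); log and swallow an uncoordinated-write collision.

    Returns True if the write went through and False if a collision was
    logged and ignored. Any other exception propagates unchanged.
    """
    try:
        write_operation()
    except UncoordinatedWriteError as e:
        log.warning("uncoordinated write detected; continuing: %s", e)
        return False
    return True
```

Returning a flag rather than silently succeeding leaves the caller free to retry or to tell the user that their change may not be the version that survived.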

zooko commented

2008-01-06 00:36:22 +00:00

I'm writing a unit test for UncoordinatedWriteError.

zooko commented

2008-01-07 05:43:05 +00:00

I'm going to do the following on the plane tomorrow:

* update source:docs/webapi.txt to mention the necessity of write coordination
* change the wui to show a user-friendly web page about write coordination instead of a Python traceback
* add a unit test of this wui extension

zooko commented

2008-01-08 17:23:35 +00:00

partially fixed by changeset:9e2ed2df01cf427c

moving the rest to 0.7.1

zooko added this to the 0.8.0 (Allmydata 3.0 Beta) milestone 2008-01-23 02:44:32 +00:00

zooko commented

2008-03-08 01:35:36 +00:00

I'm going to update this to reflect the new K=1 approach for mutable files.

zooko modified the milestone from 0.8.0 (Allmydata 3.0 Beta) to 0.9.0 (Allmydata 3.0 final)

2008-03-08 02:11:31 +00:00

zooko commented

2008-03-12 15:54:52 +00:00

We're pushing off the K == 1 approach. When/if we do it, then let's remember to update the user interface in case of UncoordinatedWriteError.

Closing for now, and linking to this (now closed) ticket from #332 (K=1 for mutable files).

zooko added the wontfix label 2008-03-12 15:54:52 +00:00

zooko closed this issue

2008-03-12 15:54:52 +00:00
