update mutable file API: overwrite vs replace, expose verinfo? #328

warner · 2008-02-29T03:35:28Z

warner commented

2008-02-29 03:35:28 +00:00

In the #321 analysis, we discovered that the mutable file API
(`IMutableFileNode`) is not doing quite what we want it to do. The intention
was to make sure that the application was aware of any changes to the file:
no automatic merges. That means that a user of the mutable file (such as a
dirnode) that sees version 3, then changes the contents to add some child
entry and writes out the new version as 4, should receive an error if the
mutable file node sees evidence of some other version 4.

We expect code to use the mutable file node like this:

 n = client.create_node_from_uri(URI)
 d = n.download_to_data()
 d.addCallback(modify)
 d.addCallback(n.replace)

but what's not obvious from this code is that the mutable file node remembers
the version information (seqnum and roothash) internally. The preceding code
sample suggests that the `replace` call will use this internal cache as
the update precondition, thereby raising an exception if a newer version is
already present in the grid.

However, we learned that `replace` actually does a retrieve first and then a
publish. This has the effect of ignoring the previously retrieved verinfo,
transforming our `replace` call into an `overwrite`.
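
A minimal sketch of the failure, using a toy in-memory stand-in rather than the real mutable file code. `ToyGrid`, `ToyNode`, and the synchronous calls are hypothetical simplifications (the real API returns Deferreds); only the retrieve-then-publish ordering inside `replace` is taken from the description above.

```python
class ToyGrid:
    """Hypothetical stand-in for the grid: one mutable slot with a seqnum."""

    def __init__(self, contents):
        self.seqnum = 1
        self.contents = contents

    def retrieve(self):
        return self.seqnum, self.contents

    def publish(self, expected_seqnum, new_contents):
        # Test-and-set: succeeds only if nobody published since expected_seqnum.
        if self.seqnum != expected_seqnum:
            raise RuntimeError("uncoordinated write detected")
        self.seqnum += 1
        self.contents = new_contents


class ToyNode:
    """Hypothetical node that mimics the behaviour described above."""

    def __init__(self, grid):
        self.grid = grid
        self.cached_seqnum = None

    def download_to_data(self):
        self.cached_seqnum, contents = self.grid.retrieve()
        return contents

    def replace(self, new_contents):
        # The problem: a fresh retrieve here discards self.cached_seqnum,
        # so the precondition is always "whatever is current right now".
        current_seqnum, _ = self.grid.retrieve()
        self.grid.publish(current_seqnum, new_contents)


grid = ToyGrid("a")
alice, bob = ToyNode(grid), ToyNode(grid)

alice_view = alice.download_to_data()  # Alice sees seqnum 1, contents "a"
bob.download_to_data()
bob.replace("a,b")                     # Bob publishes seqnum 2
alice.replace(alice_view + ",c")       # no error: Bob's "b" is silently lost
```

In this toy, the precondition check in `publish` can never fail, because `replace` always asks for the current seqnum immediately before publishing.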

We thought of two API changes to fix this. The first is to make distinct
`replace` and `overwrite` calls. `replace` would be used as
above, with a requirement that `n.download_to_data` must be called first
(under penalty of raising an exception). The verinfo retrieved with
`download_to_data` would be used as the precondition for the
`replace`. The separate `overwrite` call would not require a
preceding `download_to_data`; instead, it would do an internal retrieval
(to discover what seqnum it should use), then turn around and do a publish
with the replacement contents. We would expect `overwrite` to be used
very rarely, since it makes data loss the norm rather than the exception.
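
The first approach could look roughly like the sketch below. It is synchronous and in-memory; the exception names, `ToyGrid`, `ToyNode`, and the SHA-256 stand-in for the roothash are all assumptions made for illustration, not the real implementation.

```python
import hashlib


class UncoordinatedWriteError(Exception):
    """Hypothetical error: the grid holds a version we have not seen."""


class MustDownloadFirstError(Exception):
    """Hypothetical error: replace() called without a prior download."""


def verinfo_for(seqnum, contents):
    # Stand-in for (seqnum, roothash); a real roothash is not a plain SHA-256.
    return (seqnum, hashlib.sha256(contents).hexdigest())


class ToyGrid:
    def __init__(self, contents):
        self.seqnum, self.contents = 1, contents

    def retrieve(self):
        return verinfo_for(self.seqnum, self.contents), self.contents

    def publish(self, expected_verinfo, new_contents):
        if expected_verinfo != verinfo_for(self.seqnum, self.contents):
            raise UncoordinatedWriteError("grid has changed since retrieval")
        self.seqnum += 1
        self.contents = new_contents
        return verinfo_for(self.seqnum, self.contents)


class ToyNode:
    def __init__(self, grid):
        self.grid = grid
        self._verinfo = None

    def download_to_data(self):
        self._verinfo, contents = self.grid.retrieve()
        return contents

    def replace(self, new_contents):
        if self._verinfo is None:
            raise MustDownloadFirstError("call download_to_data() first")
        # Precondition is the version the caller actually looked at.
        self._verinfo = self.grid.publish(self._verinfo, new_contents)

    def overwrite(self, new_contents):
        # Internal retrieval only to learn the current version; contents ignored.
        current, _ = self.grid.retrieve()
        self._verinfo = self.grid.publish(current, new_contents)


grid = ToyGrid(b"v1")
alice, bob = ToyNode(grid), ToyNode(grid)
data = alice.download_to_data()
bob.overwrite(b"bob was here")
try:
    alice.replace(data + b" plus alice")
    outcome = "replaced"
except UncoordinatedWriteError:
    outcome = "conflict detected"
```

Here Alice's `replace` fails because Bob published in between, while Bob's `overwrite` succeeds without him ever having looked at the old contents.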

The other approach would be to expose the verinfo to the application, and ask
it to pass that information back in at publish time. By making it explicit
instead of staying hidden inside the mutable file node, the verinfo data
could be passed to and from an external client (e.g. over HTTP, perhaps in
JSON or through some special HTTP header like ETag). In this case,
`download` or `download_to_data` would fire with a tuple of (verinfo,
contents), and `replace` would accept (verinfo, newcontents).
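
A rough sketch of the explicit-verinfo variant, including a JSON round trip of the kind an external client might need. The node and the grid are collapsed into one hypothetical `ToyNode` for brevity, the calls are synchronous, and the SHA-256 digest only stands in for the roothash.

```python
import hashlib
import json


class UncoordinatedWriteError(Exception):
    """Hypothetical error raised when the supplied verinfo is stale."""


class ToyNode:
    def __init__(self, contents):
        self._seqnum = 1
        self._contents = contents

    def _verinfo(self):
        # Stand-in for (seqnum, roothash).
        return (self._seqnum, hashlib.sha256(self._contents).hexdigest())

    def download_to_data(self):
        # Returns (verinfo, contents) instead of contents alone.
        return self._verinfo(), self._contents

    def replace(self, verinfo, newcontents):
        # tuple() lets a JSON-decoded list compare equal to our tuple.
        if tuple(verinfo) != self._verinfo():
            raise UncoordinatedWriteError("verinfo does not match current version")
        self._seqnum += 1
        self._contents = newcontents


node = ToyNode(b"hello")
verinfo, contents = node.download_to_data()
wire = json.dumps(verinfo)  # what might be handed to an HTTP client

node.replace(json.loads(wire), contents + b" world")  # verinfo matches
try:
    node.replace(json.loads(wire), b"stale write")    # same verinfo, now stale
    stale_accepted = True
except UncoordinatedWriteError:
    stale_accepted = False
```

Because the verinfo lives with the caller rather than inside the node, the second write is rejected even though both calls went through the same node object.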

If we were also to pass the sharemap from the retrieve side to the publish
side, we could avoid a roundtrip at publish time (by using the previous
sharemap as a precondition). This would shave about 200-250ms off the update
time, which represents about 40% of a small-dirnode publish, and perhaps 10%
of a large one. Another small speedup would be achieved by stashing the
encrypted privkey at retrieve time, perhaps 5-10%, but it should be kept
hidden inside the node rather than being passed through the application.

I don't know which approach is better. A lot of it depends upon what you
think of as the application, and where you are comfortable with "automatic
merges". It probably comes down to whether or not the HTTP "add child to
directory" command is supposed to overwrite an existing child or not.

add_child is pretty safe as it is: the only chance of surprise/dataloss is if
someone else added a child of the same name while you weren't looking, in
which case you'll overwrite it. This could be considered an automatic merge
of the add request.
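
To make the "automatic merge" reading concrete, here is a toy directory (a hypothetical `ToyDirnode`, not the real dirnode class) in which `add_child` is always applied to the latest contents rather than to whatever snapshot the caller last saw.

```python
class ToyDirnode:
    """Hypothetical directory: a name -> URI mapping behind add_child()."""

    def __init__(self):
        self._children = {}

    def list(self):
        # A snapshot that a caller might hold onto while others make changes.
        return dict(self._children)

    def add_child(self, name, uri):
        # Applied to the latest contents, not to any caller's older snapshot.
        self._children[name] = uri


d = ToyDirnode()
alice_snapshot = d.list()            # Alice sees an empty directory

d.add_child("notes", "URI:bob")      # Bob adds a child
d.add_child("photos", "URI:alice")   # Alice adds a different name: both survive
d.add_child("notes", "URI:alice2")   # Alice reuses Bob's name: his entry is replaced
```

Adds under different names merge cleanly; the only lost data is Bob's `notes` entry, which matches the single surprise case described above.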

We're planning (#205) to refactor Retrieve to make it easier for use in
checking (by returning multiple versions, and information about versions that
it couldn't decode). We should probably decide on this issue before doing
that refactoring.

warner added the

labels 2008-02-29 03:35:28 +00:00

warner added this to the undecided milestone 2008-02-29 03:35:28 +00:00

warner commented

2008-04-23 19:05:53 +00:00

The (internal) mutable file API has been overhauled, and should support at least some of this stuff. Take a look at the IMutableFileNode API in source:src/allmydata/interfaces.py#L568 .

An open question is how much to expose this to the outside world (through the webapi). The only way an external client can currently use mutable files is by blindly overwriting them. If those clients want to put structured data in a mutable file and perform controlled updates, then they'll need the same sort of test-modify-set semantics as they could get inside the tahoe process by passing a servermap around.

But I don't think that passing a servermap outside the tahoe node is a good idea (although maybe passing a handle to one could work).
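
One way the handle idea might work is sketched below. Everything here is hypothetical: `ToyGateway` is not part of the webapi, and a bare seqnum stands in for the servermap that would stay inside the tahoe node.

```python
import secrets


class StaleHandleError(Exception):
    """Hypothetical error: the handle is unknown or refers to an old version."""


class ToyGateway:
    def __init__(self, contents):
        self._seqnum = 1
        self._contents = contents
        # Opaque handle -> seqnum seen at read time (stand-in for a servermap).
        self._handles = {}

    def read(self):
        handle = secrets.token_hex(8)
        self._handles[handle] = self._seqnum
        return handle, self._contents

    def write(self, handle, new_contents):
        # Handles are single-use; unknown handles yield None and are rejected.
        seen = self._handles.pop(handle, None)
        if seen != self._seqnum:
            raise StaleHandleError("file changed since this handle was issued")
        self._seqnum += 1
        self._contents = new_contents


gw = ToyGateway(b"base")
h1, data = gw.read()
h2, _ = gw.read()

gw.write(h1, data + b"+first")       # accepted: nothing changed since h1
try:
    gw.write(h2, data + b"+second")  # rejected: h1's write came first
    second_accepted = True
except StaleHandleError:
    second_accepted = False
```

The external client only ever sees the opaque handle, so the internal state it refers to can change shape without affecting the client.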

There's much to discuss about this one. But I think about half of this ticket could be closed now.

warner added code-mutable and removed code labels 2008-04-24 23:27:36 +00:00

warner commented

2008-05-09 01:26:38 +00:00

I've created #413 to track the possibility of exposing the servermaps and version info in general to HTTP clients. Time to close this one.

warner added the fixed label 2008-05-09 01:26:38 +00:00

warner modified the milestone from undecided to 1.1.0