DIR2:IMM #607

Closed
opened 2009-02-06 18:22:23 +00:00 by zooko · 13 comments

Directories are currently stored in SSK files. They were designed so that directories could be easily stored in different types of files, so it shouldn't be hard for someone to implement DIR2:CHK files. These would have nice properties, especially for backup applications:

  • It would be nice to know that your old backed-up directory is immutable.
  • It would allow convergence of backed-up directories (note that front-end backup tools such as Brian's backupdb and Shawn's backup tool might figure this out and converge old directories for you, but doing it by converging CHKs on upload might work better for some cases, including the case that you aren't using backupdb or Shawn's tool).
  • It would be faster to create (currently SSKs require an RSA key-pair generation on creation, which is expensive).
zooko added the code-dirnodes, major, defect, 1.2.0 labels 2009-02-06 18:22:23 +00:00
zooko added this to the undecided milestone 2009-02-06 18:22:23 +00:00

(tweak formatting.. itemized lists in trac's markup language require a leading space)


Another good feature of an immutable-file based directory is that it could be
repaired, unlike our current RSA-based (write-enabler-based) mutable files,
when referenced through a readcap (#625), like the ones created by "tahoe
backup".

I'd like to implement this, and change "tahoe backup" to use it. The basic
steps I anticipate are:

  • implement create_dirnode(mutable=True, initial_children={})
  • replace the existing create_empty_dirnode() with that
  • refactor DirectoryNode to separate out the underlying filenode
    better. The idea would be to nail down the interface that dirnodes need
    from the filenode that they've wrapped. The read side just needs read().
    The write side needs the normal mutable-filenode operations, like
    modify(). We should have an immutable filenode which offers the same
    read-side interface as the mutable filenode does.
  • change the "NodeMaker" code to create dirnodes by first creating a
    filenode and then passing it to the Dirnode() constructor. It may be
    useful to first change the way that uploads are done, and create a special
    kind of immutable filenode for upload purposes. This "gestating" node
    would have an interface to add data, would perform the upload while data
    is added, and would then have a finalize() method, which would finish the
    upload process, compute the filecap, and return the real IFilesystemNode
    which can be used for reading. Making this special node have the same
    interface as a mutable filenode's initial-upload methods would let Dirnode
    be oblivious to the type of filenode it's been given.
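
A rough sketch of the "gestating" node idea, to make the shape of the interface concrete. Everything here is hypothetical: the class names, add_data(), and the in-memory buffering are stand-ins, and the "filecap" is just a SHA-256 hex digest under a made-up prefix, not a real CHK cap. A real node would encode and push shares as data arrives.

```python
import hashlib


class ImmutableFileNode:
    """Hypothetical read-only node handed back once the upload is finished."""

    def __init__(self, filecap, data):
        self._filecap = filecap
        self._data = data

    def get_uri(self):
        return self._filecap

    def is_mutable(self):
        return False

    def read(self):
        return self._data


class GestatingFileNode:
    """Hypothetical upload-side node: accepts data, then finalizes."""

    def __init__(self):
        self._chunks = []
        self._hasher = hashlib.sha256()
        self._finalized = False

    def add_data(self, chunk):
        if self._finalized:
            raise RuntimeError("upload already finalized")
        # A real implementation would encode and upload shares here;
        # this sketch only buffers the bytes in memory.
        self._chunks.append(chunk)
        self._hasher.update(chunk)

    def finalize(self):
        self._finalized = True
        # Placeholder cap: a hash of the plaintext, NOT a real CHK filecap.
        filecap = "URI:FAKE-CHK:" + self._hasher.hexdigest()
        return ImmutableFileNode(filecap, b"".join(self._chunks))
```

The point of the split is that the caller only ever sees add_data() and finalize() while writing, and only read() afterwards, which is roughly the symmetry with a mutable filenode's initial upload that the last bullet is after.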

I'm planning to require that the contents of an immutable directory are also
immutable (LIT, CHK, and DIR2:CHK, not regular mutable DIR2), so that these
objects are always deep-readonly. (there may be an argument to provide
shallow-readonly directories, but I think deep-readonly is more generally
useful).

I'm pondering if there's a way to support multi-level trees in the future
without drastic changes, so that this one-level immutable directory could
turn into a full "virtual CD" (#204), with better performance (by bundling a
whole tree of directories into a single distributed object). This would
suggest making the name table accept tuples of names instead of just a single
one.

I've also wondered if we should implement some faster lookup scheme for these
immutable dirnodes, especially because we don't need to update it later.
Maybe djb's "cdb" (constant-time database). I'm not sure that a database
which has been optimized for minimal disk seeks will necessarily help us
here, since the segment size is drastically larger than what a hard disk
offers, and the network roundtrip latency is frequently an order of magnitude
larger too. But certainly we can come up with something that's easier to pack
and unpack than the DIR2 format.

Also, we can discard several things from the DIR2 format: we don't need child
writecaps (just the readcaps), and we obviously don't need the obsolete salt.
We probably still want the metadata dictionary, although that would
potentially interfere with the grid-side convergence that Zooko mentioned.

Changing the table format would remove some of the benefits (and thus
motivation) to the other refactoring changes described above: if we've got a
separate class for immutable-dirnodes, then there's not much point in
contorting mutable and immutable filenodes to present the same interface.
But, it would probably be cleaner overall if there were just one dirnode
class, whose mutability is determined solely by asking the underlying
filenode about its own mutability. In this case, all the mutating methods
will still exist on the immutable dirnodes, but they'd throw an exception if
you actually try to call them in that situation, just as they do now.
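
To illustrate the "one dirnode class" option, here is a minimal sketch in which the dirnode asks its wrapped filenode about mutability and refuses to mutate otherwise. The names (StubFileNode, ReadOnlyDirectoryError, set_child) are invented for illustration, and a plain dict stands in for the packed DIR2 table.

```python
class ReadOnlyDirectoryError(Exception):
    """Hypothetical error raised when mutating an immutable dirnode."""


class StubFileNode:
    """Stand-in filenode; a dict replaces the packed DIR2 contents."""

    def __init__(self, mutable, contents=None):
        self._mutable = mutable
        self._contents = dict(contents or {})

    def is_mutable(self):
        return self._mutable

    def read(self):
        return dict(self._contents)

    def modify(self, modifier):
        # Only meaningful for mutable nodes; the dirnode guards the call.
        self._contents = modifier(dict(self._contents))


class DirectoryNode:
    """Single dirnode class; mutability comes from the wrapped filenode."""

    def __init__(self, filenode):
        self._node = filenode

    def is_mutable(self):
        return self._node.is_mutable()

    def list(self):
        return self._node.read()

    def set_child(self, name, cap):
        # The mutating method exists on every dirnode, but raises when
        # the underlying filenode is immutable.
        if not self.is_mutable():
            raise ReadOnlyDirectoryError("directory is immutable")

        def add(children):
            children[name] = cap
            return children

        self._node.modify(add)
```

With this shape there is no separate immutable-dirnode class to maintain; the cost is that callers find out about immutability at call time rather than from the type.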


Zooko and I had a chat, and agreed to leave the encoding format the same. So
"DIR2:" and "DIR2-CHK" (or -IMM or something) will have the same format, just
in different containers. We can put off a format change until DIR3.

We're not sure about the "prototype immutable filenode" refactoring (the one
that would make dirnodes call the same write() method for both mutable and
immutable filenodes). It might be better off deferred.

One way to make the download/read side more uniform would be to introduce
"FileVersion" objects. I might have described these in some other ticket,
but the idea would be to move the read/write methods out of MutableFileNode
and onto this FileVersion object which represents a single specific version
of the mutable slot. FileVersion.replace would encapsulate the
servermap argument, performing the replacement only if the mutable file
looked like it hadn't changed since the version was fetched.
MutableFileNode.get_best_version() would return one of these version
objects. ImmutableFileNode.get_best_version() would return self. Then
we'd make sure the read() interface was the same for both. (this would
dovetail nicely with the future LDMF files, which will offer multiple
versions: once you've grabbed the one that you care about, use read() on it).

This would take a moderate amount of work, but would allow us to use the same
dirnode code for both types: the dirnode read code would just do
self._filenode.get_best_version().read().
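
A sketch of how the FileVersion idea might look. This is only an illustration under simplifying assumptions: a plain sequence number stands in for the servermap, everything is synchronous and in memory, and StaleVersionError and _store() are invented names.

```python
class StaleVersionError(Exception):
    """Hypothetical error: the slot changed after this version was fetched."""


class FileVersion:
    """One specific version of a mutable slot."""

    def __init__(self, node, seqnum, data):
        self._node = node
        self._seqnum = seqnum
        self._data = data

    def read(self):
        return self._data

    def replace(self, new_data):
        # Stand-in for the servermap check: only write if nothing has
        # changed since this version was fetched.
        if self._node._seqnum != self._seqnum:
            raise StaleVersionError("slot was modified since this version")
        self._node._store(new_data)


class MutableFileNode:
    def __init__(self, data=b""):
        self._seqnum = 1
        self._data = data

    def get_best_version(self):
        return FileVersion(self, self._seqnum, self._data)

    def _store(self, new_data):
        self._data = new_data
        self._seqnum += 1


class ImmutableFileNode:
    def __init__(self, data):
        self._data = data

    def get_best_version(self):
        # An immutable file has exactly one version: itself.
        return self

    def read(self):
        return self._data


def read_directory_bytes(filenode):
    """The dirnode read path, identical for both filenode types."""
    return filenode.get_best_version().read()
```

Two versions fetched from the same mutable node show the intended behaviour: the first replace() succeeds and bumps the sequence number, and the second raises because its version is now stale.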

zooko changed title from DIR2:CHK to DIR2:IMM 2009-10-21 05:15:22 +00:00
Author

I posted a couple of notes about this to http://allmydata.org/pipermail/tahoe-dev/2009-October/003027.html and hereby copy them into this comment:

When you create a DIR2:IMM, giving it a set of (childname, childcap)
tuples, it should raise an exception if any childcap is not
immutable. The immutable childcaps are "CHK" (perhaps renamed to
"IMM"), LIT, and DIR2:CHK (or "DIR2:IMM").

When you unpack a DIR2:IMM, if you find any non-immutable children in
there (i.e. because someone else's Tahoe-LAFS gateway is altered or
buggy so that it did not raise the exception described above), then
you treat that child as non-existent and log a warning.
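
The two rules above (reject on create, ignore-and-warn on unpack) could be sketched like this. The prefix check is a deliberate simplification standing in for real cap parsing, the URI:CHK:, URI:LIT: and URI:DIR2-CHK: prefix strings are borrowed from a later comment on this ticket, and the function and exception names are hypothetical.

```python
import logging

log = logging.getLogger("dir2imm-sketch")

# Assumed string prefixes for the immutable cap types named above.
IMMUTABLE_PREFIXES = ("URI:CHK:", "URI:LIT:", "URI:DIR2-CHK:")


class MutableChildError(ValueError):
    """Hypothetical error for a non-immutable child in a DIR2:IMM."""


def is_immutable_cap(cap):
    return cap.startswith(IMMUTABLE_PREFIXES)


def check_children_for_create(children):
    """Creation side: refuse the whole directory if any child is mutable."""
    for name, cap in children:
        if not is_immutable_cap(cap):
            raise MutableChildError("child %r is not immutable" % (name,))
    return dict(children)


def filter_children_on_unpack(children):
    """Unpack side: treat non-immutable children as non-existent."""
    kept = {}
    for name, cap in children:
        if is_immutable_cap(cap):
            kept[name] = cap
        else:
            log.warning("ignoring non-immutable child %r in DIR2:IMM", name)
    return kept
```

The asymmetry is intentional: the creator gets a hard failure it can act on, while a reader facing someone else's bad directory still gets the children that are safe to use.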

There could optionally be a command to deep-walk a directory graph
and produce an immutable snapshot of everything. This could be an
expensive operation depending on how deep the graph is, but large
files are typically already immutable, so snapshotting them is free.
Anyway, if you want to put something into an immutable directory and
you get rejected because the thing isn't immutable, then this command
would be useful.

warner was assigned by zooko 2009-10-21 05:30:00 +00:00
davidsarah commented 2009-10-28 04:12:42 +00:00
Owner

Tagging issues relevant to new cap protocol design.


I'm about 80% done with immutable directories. The current work is to add URI:DIR2-CHK: and URI:DIR2-LIT: to the set recognized by uri.py. (I'm planning to use CHK because the rest of the arguments are exactly the same as URI:CHK:/URI:LIT:). An ideal cap format would make the wrapping more explicit, like tahoe:*grid-4/dir/imm/READCAP and tahoe:*grid-4/imm/READCAP.

The next few steps are:

  • modify nodemaker.py to recognize the new caps and create immutable Filenodes for them and then wrap them in Directorynodes (this handles the read side)
  • add nodemaker.create_immutable_directory(children) to pack the children, perform an immutable upload, then transform the filecap into a dircap. (this handles the write side)
  • tests for those
  • new webapi (probably POST /uri?t=mkdir-immutable) that takes a JSON dict in the children= form portion: docs, tests, then implementation
  • done!
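
Since the new dircaps carry exactly the same arguments as URI:CHK:/URI:LIT:, the "transform the filecap into a dircap" step can be pictured as a prefix swap. This is only a sketch on cap strings; the helper names are made up, and the real code would presumably go through cap objects in uri.py rather than raw strings.

```python
# File-cap prefix -> directory-cap prefix, using the cap names from this ticket.
FILE_TO_DIR_PREFIX = {
    "URI:CHK:": "URI:DIR2-CHK:",
    "URI:LIT:": "URI:DIR2-LIT:",
}


def filecap_to_dircap(filecap):
    """Write side: wrap the cap of the uploaded immutable file."""
    for file_prefix, dir_prefix in FILE_TO_DIR_PREFIX.items():
        if filecap.startswith(file_prefix):
            return dir_prefix + filecap[len(file_prefix):]
    raise ValueError("not an immutable filecap: %r" % (filecap,))


def dircap_to_filecap(dircap):
    """Read side: recover the filecap of the file holding the directory."""
    for file_prefix, dir_prefix in FILE_TO_DIR_PREFIX.items():
        if dircap.startswith(dir_prefix):
            return file_prefix + dircap[len(dir_prefix):]
    raise ValueError("not an immutable dircap: %r" % (dircap,))
```

The read side in the first bullet is then the reverse trip: unwrap the dircap, build an immutable filenode for the resulting filecap, and hand that to the directory node.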

Along the way, I plan to change "tahoe backup" to use t=mkdir-with-children (which will speed things up a lot, but still create readcaps-to-mutable-directories). Then, once this ticket is closed, I'll change it again to use t=mkdir-immutable.

Incidentally, yeah, I think that a form of "cp -r" that creates an immutable deep copy of some dirnode would be a great idea. Maybe "cp -r --immutable" ? Likewise, it might be useful to have "cp -r --mutable", which explicitly creates mutable copies of everything being copied (at least of the dirnodes). The default behavior of "cp -r" should be to re-use immutable objects.


I've got the write and read sides done (in changeset:5fe713fc52dc331b). I had to move create_immutable_directory to Client instead of NodeMaker (because it needs the client's convergence secret); that may change later as I figure out how best to clean this stuff up. Tests are written too, but I won't be satisfied with them until I've resurrected the figleaf code (which was surgically removed to make the Ubuntu entry easier) and can figure out what's being missed.

Next up: webapi, and changing "tahoe backup" to use the new dirnodes.


looks like this will be the major (er, only) new feature in 1.6

warner modified the milestone from undecided to 1.6.0 2009-11-12 00:32:54 +00:00

changing "tahoe backup" to use this has been split out to #828, so the only remaining work on this ticket is to expose immutable directories via the webapi.

Author

Hopefully also #778 will be a new feature in 1.6.

Author

see also #830 (review Brian's patches for #607). I guess it is really the same as this ticket, but currently this ticket is assigned to Brian and that one is assigned to me.


I had the patch all ready to go, docs and tests and implementation, and then I had an epiphany: the JSON dictionary of child names+caps should be delivered as the body of the POST webapi request, rather than as the "children=" field of a multipart/form-data MIME body. This is easier for client-side implementors, and using form encoding feels inappropriate because we aren't using an HTML form to create the request anyway. Even HTML-embedded JavaScript will be using XMLHttpRequest and a JSON encoder for the body rather than creating an HTML form and pressing the "submit" button programmatically.

So I'm going to spend an extra day rewriting the patch with this API.
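
For client-side implementors, the body-based form of the request might be built roughly like this. Treat it as a sketch: the flat name-to-cap JSON shape, the Content-Type header, and the gateway address in the usage note are assumptions made for illustration, since this ticket does not spell out the exact schema. The function only constructs the request and does not send it.

```python
import json
import urllib.request


def build_mkdir_immutable_request(gateway_url, children):
    """Build (but do not send) a POST /uri?t=mkdir-immutable request.

    `children` is assumed to be a plain {name: cap} dict; the real
    webapi may expect a richer per-child structure.
    """
    body = json.dumps(children).encode("utf-8")
    return urllib.request.Request(
        gateway_url.rstrip("/") + "/uri?t=mkdir-immutable",
        data=body,  # the JSON goes in the request body, not a form field
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
```

A caller would pass something like build_mkdir_immutable_request("http://127.0.0.1:3456", {"notes.txt": "URI:LIT:..."}) to urllib.request.urlopen() and, presumably, read the new directory's cap from the response.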


Done, in changeset:f85690697a21e669. Although I forgot to add the "if you find a mutable child in an immutable dirnode, complain and ignore it" part: I've just opened #833 for that.

warner added the fixed label 2009-11-18 07:35:24 +00:00
Reference: tahoe-lafs/trac-2024-07-25#607