"virtual CDs" #204

Open
opened 2007-11-08 18:52:23 +00:00 by warner · 4 comments

We've kicked around the idea of low-overhead read-only immutable directory
trees. The original "filetree" design included "realms" of dirnodes, in which
each realm was a serialized tree of dirnodes and filenodes. In that
(abandoned) design, there were both mutable and immutable realms. The biggest
problem (apart from the complexity) was in figuring out how to share
individual subtrees without giving access to parent dirnodes in the same
realm.

When we went to actually implement dirnodes, we opted for a much simpler
approach, in which each dirnode is stored in exactly one slot. This removed
most of the complexity and allowed us to produce a useable vdrive with the
fine-grained access control semantics that we wanted, but increases the
overhead of storing and accessing dirnodes, especially if those dirnodes are
being accessed through read-only capabilities.

One option I'd like to explore is to serialize a whole tree of directories
into a single storage unit. This would reduce the access and storage overhead
(fewer peers to contact to traverse the tree) at the expense of making
sharing more complicated (to share a subtree, you have to copy it out into a
new structure).

Zooko has been pointing two things out to me for months now that only
recently sunk in.

  • most directories are probably only going to be accessed by a single user,
    so it might be ok to put off some work until someone actually tells us
    that they want to carve off a subtree (i.e. create a new get_shared_uri()
    method, and allow it to return a Deferred and do more work). In addition
    many shared directories are going to be shared in a read-only fashion.
  • there is benefit to putting related pieces of information in the same
    place, specifically that a collection of dirnodes that mainly exist for
    the benefit of a single user could be placed on the same servers or
    in the same storage slots.

In addition, we've been thinking for a while that a read-only immutable tree
of dirnodes would be a useful data type. We've been calling this "burning a
Virtual CD", since that phrase expresses the immutability properties pretty
accurately.

So what I'm thinking now is that these "virtual CDs" should be serialized
into a single data structure, but that the fine-grained access control should
be implemented by putting a different encryption key on each internal
dirnode, and the child keys should be hash-derived from the parent keys. The
read-capability URI that references this structure should include the CHK
identification information for the structure as a whole, plus the
offset+length and encryption key for the individual dirnode being referenced.

This would make fine-grained sharing easy. If we improve the CHK download
code to allow random access, it would limit the amount of data that needs to
be transferred to a single segment plus hash overhead (and we'd probably want
to use a smaller segsize for these structures). By keeping the data structure
immutable, we don't need to worry about those URIs becoming stale or
invalidated by data changes after they've been minted. The URIs might be a
bit long (CHK length plus extra stuff), but it might be possible to use the
hash of the dirnode (which includes the hashes of all its children) instead
of the usual UEB hash (although that would probably make it hard to isolate
corrupted shares.. must think about this further).

We've kicked around the idea of low-overhead read-only immutable directory trees. The original "filetree" design included "realms" of dirnodes, in which each realm was a serialized tree of dirnodes and filenodes. In that (abandoned) design, there were both mutable and immutable realms. The biggest problem (apart from the complexity) was in figuring out how to share individual subtrees without giving access to parent dirnodes in the same realm. When we went to actually implement dirnodes, we opted for a much simpler approach, in which each dirnode is stored in exactly one slot. This removed most of the complexity and allowed us to produce a useable vdrive with the fine-grained access control semantics that we wanted, but increases the overhead of storing and accessing dirnodes, especially if those dirnodes are being accessed through read-only capabilities. One option I'd like to explore is to serialize a whole tree of directories into a single storage unit. This would reduce the access and storage overhead (fewer peers to contact to traverse the tree) at the expense of making sharing more complicated (to share a subtree, you have to copy it out into a new structure). Zooko has been pointing two things out to me for months now that only recently sunk in. * most directories are probably only going to be accessed by a single user, so it might be ok to put off some work until someone actually tells us that they want to carve off a subtree (i.e. create a new get_shared_uri() method, and allow it to return a Deferred and do more work). In addition many shared directories are going to be shared in a read-only fashion. * there is benefit to putting related pieces of information in the same place, specifically that a collection of dirnodes that mainly exist for the benefit of a single user could be placed on the same servers or in the same storage slots. In addition, we've been thinking for a while that a read-only immutable tree of dirnodes would be a useful data type. We've been calling this "burning a Virtual CD", since that phrase expresses the immutability properties pretty accurately. So what I'm thinking now is that these "virtual CDs" should be serialized into a single data structure, but that the fine-grained access control should be implemented by putting a different encryption key on each internal dirnode, and the child keys should be hash-derived from the parent keys. The read-capability URI that references this structure should include the CHK identification information for the structure as a whole, plus the offset+length and encryption key for the individual dirnode being referenced. This would make fine-grained sharing easy. If we improve the CHK download code to allow random access, it would limit the amount of data that needs to be transferred to a single segment plus hash overhead (and we'd probably want to use a smaller segsize for these structures). By keeping the data structure immutable, we don't need to worry about those URIs becoming stale or invalidated by data changes after they've been minted. The URIs might be a bit long (CHK length plus extra stuff), but it might be possible to use the hash of the dirnode (which includes the hashes of all its children) instead of the usual UEB hash (although that would probably make it hard to isolate corrupted shares.. must think about this further).
warner added the
code-encoding
major
enhancement
0.6.1
labels 2007-11-08 18:52:23 +00:00
warner added this to the eventually milestone 2007-11-08 18:52:23 +00:00
warner changed title from immutable dirnodes to immutable dirnodes, "virtual CDs" 2008-03-08 00:43:59 +00:00
warner added
code-dirnodes
and removed
code-encoding
labels 2008-04-24 23:50:45 +00:00
warner modified the milestone from eventually to undecided 2008-06-01 20:39:38 +00:00
zooko changed title from immutable dirnodes, "virtual CDs" to "virtual CDs" 2009-02-25 22:54:56 +00:00
davidsarah commented 2009-10-28 07:20:54 +00:00
Owner

Tagging issues relevant to new cap protocol design.

Tagging issues relevant to new cap protocol design.
davidsarah commented 2009-12-20 17:12:00 +00:00
Owner

The minimum readcap size for an internal node of a virtual CD (just counting crypto fields), would be the size of a collision-resistant hash plus a server selection index. This minimum can be achieved if a server indexes all the internal nodes in the CDs it is storing. For example, each internal node can have its own SI, which points to the CD's bucket and the offset/length of the node in the CD.

The minimum readcap size for an internal node of a virtual CD (just counting crypto fields), would be the size of a collision-resistant hash plus a server selection index. This minimum can be achieved if a server indexes all the internal nodes in the CDs it is storing. For example, each internal node can have its own SI, which points to the CD's bucket and the offset/length of the node in the CD.
davidsarah commented 2010-04-25 21:02:42 +00:00
Owner

Apparently many source control systems (svn, darcs, git) use lots of small files, so backing up a source repository is very inefficient at the moment. This ticket could probably help with that, if tahoe backup used virtual CDs.

Apparently many source control systems (svn, darcs, git) use lots of small files, so backing up a source repository is very inefficient at the moment. This ticket could probably help with that, if `tahoe backup` used virtual CDs.
tahoe-lafs modified the milestone from undecided to 2.0.0 2010-04-25 21:02:42 +00:00
Owner

Ticket retargeted after milestone closed (editing milestones)

Ticket retargeted after milestone closed (editing milestones)
meejah removed this from the 2.0.0 milestone 2021-03-30 18:40:46 +00:00
Sign in to join this conversation.
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#204
No description provided.