"virtual CDs" #204
Reference: tahoe-lafs/trac-2024-07-25#204
We've kicked around the idea of low-overhead read-only immutable directory
trees. The original "filetree" design included "realms" of dirnodes, in which
each realm was a serialized tree of dirnodes and filenodes. In that
(abandoned) design, there were both mutable and immutable realms. The biggest
problem (apart from the complexity) was in figuring out how to share
individual subtrees without giving access to parent dirnodes in the same
realm.
When we went to actually implement dirnodes, we opted for a much simpler
approach, in which each dirnode is stored in exactly one slot. This removed
most of the complexity and allowed us to produce a usable vdrive with the
fine-grained access control semantics that we wanted, but it increased the
overhead of storing and accessing dirnodes, especially when those dirnodes are
accessed through read-only capabilities.
One option I'd like to explore is to serialize a whole tree of directories
into a single storage unit. This would reduce the access and storage overhead
(fewer peers to contact to traverse the tree) at the expense of making
sharing more complicated (to share a subtree, you have to copy it out into a
new structure).
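To make the single-storage-unit idea concrete, here is a minimal sketch that packs a whole tree into one byte string and records each dirnode's (offset, length) inside it. The `Dir` class, the JSON record format, and the post-order layout are all hypothetical choices for illustration, not Tahoe-LAFS's actual dirnode encoding.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Dir:
    """Hypothetical in-memory dirnode: file children map a name to a file cap string."""
    files: dict = field(default_factory=dict)
    subdirs: dict = field(default_factory=dict)


def serialize_tree(root: Dir):
    """Pack a whole directory tree into one bytes blob.

    Nodes are written in post-order (children before parents), so each parent
    can record the (offset, length) of its subdirectories within the blob.
    Returns (blob, (offset, length)) where the pair locates the root node.
    """
    out = bytearray()

    def emit(d: Dir):
        # Serialize children first so their positions are known.
        child_refs = {name: emit(sub) for name, sub in sorted(d.subdirs.items())}
        record = json.dumps({"files": d.files, "dirs": child_refs},
                            sort_keys=True).encode("utf-8")
        offset = len(out)
        out.extend(record)
        return (offset, len(record))

    root_ref = emit(root)
    return bytes(out), root_ref


def read_node(blob: bytes, ref):
    """Decode a single dirnode given its (offset, length); no other node is parsed."""
    offset, length = ref
    return json.loads(blob[offset:offset + length].decode("utf-8"))
```

The offsets stored in each record are relative to the whole blob. That is why sharing a subtree in this kind of format means copying it out and re-serializing it into a new structure.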
Zooko has been pointing two things out to me for months now, and they have
only recently sunk in. The first is that most directories are not shared,
so it might be OK to put off some work until someone actually tells us
that they want to carve off a subtree (i.e. create a new get_shared_uri()
method, and allow it to return a Deferred and do more work). In addition,
many shared directories are going to be shared in a read-only fashion.
The second is that there are benefits to keeping related dirnodes in the same
place, specifically that a collection of dirnodes that mainly exist for
the benefit of a single user could be placed on the same servers or
in the same storage slots.
In addition, we've been thinking for a while that a read-only immutable tree
of dirnodes would be a useful data type. We've been calling this "burning a
Virtual CD", since that phrase expresses the immutability properties pretty
accurately.
So what I'm thinking now is that these "virtual CDs" should be serialized
into a single data structure, but that the fine-grained access control should
be implemented by putting a different encryption key on each internal
dirnode, and the child keys should be hash-derived from the parent keys. The
read-capability URI that references this structure should include the CHK
identification information for the structure as a whole, plus the
offset+length and encryption key for the individual dirnode being referenced.
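A rough sketch of the per-dirnode keying and the readcap contents described above follows. It assumes a one-way tagged SHA-256 derivation (child key = H(tag, parent key, child name)) and a 16-byte key. The tag string, the `URI:VCD-RO` prefix, and the `NodeReadCap` fields are all invented for illustration. Because the derivation is one-way, a holder of a child's key can derive keys for that subtree but not for its parents.

```python
import base64
import hashlib
from dataclasses import dataclass

KEY_LEN = 16  # assumed key size (e.g. for AES-128); illustration only


def derive_child_key(parent_key: bytes, child_name: str) -> bytes:
    """Derive a child dirnode's key from its parent's key and the child's name.

    The hash is one-way, so the child key does not reveal the parent key.
    The tag is a made-up domain separator, not a Tahoe-LAFS constant.
    """
    h = hashlib.sha256(b"hypothetical-vcd-child-key:")
    h.update(parent_key)  # fixed length, so the name that follows is unambiguous
    h.update(child_name.encode("utf-8"))
    return h.digest()[:KEY_LEN]


def key_for_path(root_key: bytes, path) -> bytes:
    """Walk down the tree, deriving one key per path component."""
    key = root_key
    for name in path:
        key = derive_child_key(key, name)
    return key


def _b32(data: bytes) -> str:
    return base64.b32encode(data).decode("ascii").rstrip("=").lower()


def _unb32(text: str) -> bytes:
    # Restore the "=" padding that _b32 stripped.
    return base64.b32decode(text.upper() + "=" * (-len(text) % 8))


@dataclass(frozen=True)
class NodeReadCap:
    """Hypothetical readcap for one dirnode inside a virtual CD."""
    cd_id: bytes   # CHK identification info for the whole structure (opaque here)
    offset: int    # where this dirnode starts inside the CD
    length: int    # how many bytes it occupies
    key: bytes     # this dirnode's own encryption key

    def to_string(self) -> str:
        # "URI:VCD-RO" is an invented prefix, not a real Tahoe-LAFS cap type.
        return (f"URI:VCD-RO:{_b32(self.cd_id)}:{_b32(self.key)}:"
                f"{self.offset}:{self.length}")

    @classmethod
    def from_string(cls, s: str) -> "NodeReadCap":
        prefix, kind, cd_id, key, offset, length = s.split(":")
        if (prefix, kind) != ("URI", "VCD-RO"):
            raise ValueError("not a virtual-CD readcap")
        return cls(_unb32(cd_id), int(offset), int(length), _unb32(key))
```

Sharing a subdirectory then amounts to minting a `NodeReadCap` with the same `cd_id` and that subdirectory's own offset, length and key.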
This would make fine-grained sharing easy. If we improve the CHK download
code to allow random access, it would limit the amount of data that needs to
be transferred to a single segment plus hash overhead (and we'd probably want
to use a smaller segsize for these structures). By keeping the data structure
immutable, we don't need to worry about those URIs becoming stale or
invalidated by data changes after they've been minted. The URIs might be a
bit long (CHK length plus extra stuff), but it might be possible to use the
hash of the dirnode (which includes the hashes of all its children) instead
of the usual UEB hash (although that would probably make it hard to isolate
corrupted shares; I must think about this further).
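The "single segment plus hash overhead" estimate can be checked with a little arithmetic. The sketch below assumes a binary hash tree over segments, padded to a power of two, and 32-byte hashes. It ignores erasure-coding expansion and share-level hashes, so treat the numbers as rough upper bounds rather than a description of the real CHK downloader.

```python
def segments_for_range(offset: int, length: int, segsize: int) -> range:
    """Indices of the segments that overlap bytes [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // segsize
    last = (offset + length - 1) // segsize
    return range(first, last + 1)


def merkle_proof_len(num_leaves: int) -> int:
    """Sibling hashes needed to verify one leaf in a binary hash tree
    padded to a power of two (assumed layout)."""
    return (num_leaves - 1).bit_length() if num_leaves > 1 else 0


def fetch_estimate(offset, length, segsize, filesize, hashsize=32):
    """Rough upper bound on bytes fetched to read one dirnode with random access.

    Counts a full proof per segment even though neighbouring proofs share
    siblings, and ignores erasure coding and share-level hashes.
    """
    num_segments = max(1, -(-filesize // segsize))  # ceiling division
    segs = segments_for_range(offset, length, segsize)
    return len(segs) * (segsize + merkle_proof_len(num_segments) * hashsize)
```

For a 500-byte dirnode at the start of a 1 MiB CD, `fetch_estimate(0, 500, 4096, 1048576)` gives 4352 bytes, while a 128 KiB segment size gives 131168. That gap is why a smaller segsize seems attractive for these structures.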
Tagging issues relevant to new cap protocol design.
The minimum readcap size for an internal node of a virtual CD (just counting crypto fields) would be the size of a collision-resistant hash plus a server selection index. This minimum can be achieved if a server indexes all the internal nodes in the CDs it is storing. For example, each internal node can have its own SI, which points to the CD's bucket and the offset/length of the node in the CD.
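The server-side index idea can be sketched as follows. The sketch assumes SHA-256 as the collision-resistant hash and an arbitrary 8-byte server selection index (both illustrative). It also assumes that each node's SI is simply the hash of its stored bytes, so the hash in the readcap both locates the node and verifies it. Encryption is deliberately left out; whether the hash covers plaintext or ciphertext is an open design question this toy does not settle. All class and function names here are hypothetical.

```python
import hashlib
from dataclasses import dataclass

HASH_LEN = 32  # assumed: SHA-256 as the collision-resistant hash
SSI_LEN = 8    # assumed width of the server selection index, for illustration


@dataclass(frozen=True)
class Location:
    bucket: str   # which stored CD (bucket) holds the node
    offset: int
    length: int


class ServerNodeIndex:
    """Hypothetical per-server index from node SI to a node's place in a stored CD."""

    def __init__(self):
        self._index = {}

    def add_cd(self, bucket: str, blob: bytes, node_refs):
        # node_refs: (offset, length) for every internal node of the CD.
        # Each node's SI is the hash of its stored bytes.
        for offset, length in node_refs:
            node_si = hashlib.sha256(blob[offset:offset + length]).digest()
            self._index[node_si] = Location(bucket, offset, length)

    def lookup(self, node_si: bytes):
        return self._index.get(node_si)


def make_readcap(ssi: bytes, node_bytes: bytes) -> bytes:
    """Minimal readcap: server selection index followed by the node hash."""
    if len(ssi) != SSI_LEN:
        raise ValueError("bad server selection index length")
    return ssi + hashlib.sha256(node_bytes).digest()


def fetch_node(readcap: bytes, index: ServerNodeIndex, buckets: dict) -> bytes:
    # A client would use the SSI to choose which servers to ask; only one
    # server is modelled here, so the SSI is not used below.
    node_si = readcap[SSI_LEN:]
    loc = index.lookup(node_si)
    if loc is None:
        raise KeyError("node not stored on this server")
    data = buckets[loc.bucket][loc.offset:loc.offset + loc.length]
    # The hash in the cap doubles as an integrity check on what came back.
    if hashlib.sha256(data).digest() != node_si:
        raise ValueError("node failed integrity check")
    return data
```

With these assumed sizes the readcap is 40 bytes of crypto material (8 + 32). A different hash or selection-index width changes that total directly.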
Apparently many source control systems (svn, darcs, git) use lots of small files, so backing up a source repository is very inefficient at the moment. This ticket could probably help with that if `tahoe backup` used virtual CDs.