add directory traversal / deep-verify capability? #308

Open
opened 2008-02-08 22:26:00 +00:00 by warner · 10 comments

We might split our current three-level directory capability structure (write,
read, verify) into four levels: write, read, traverse, verify. The 'traverse
cap' would be able to access the traverse cap of child directories, and the
verify cap of child files. It would not be able to read child names.

The issue is that, at present, the verifier manifest (a set of verifier caps
for all files and directories reachable from some root) can only be generated
by someone who holds a readcap for the root. This manifest generation cannot
be safely delegated to some other party (such as a central allmydata server).
So we're forced to decide between having customers expose their files to us
(by giving us their root readcap) or being required to create their manifest
(for file checking/repair) on their own.

If we had traversal caps, then customers could give us the traversal cap
instead of their read cap. We could still see the shape of their filesystem
(and probably the length of their filenames, and the size of their files),
but perhaps that would be little enough exposure that customers would be
comfortable with revealing it. In exchange, we could provide the service of
keeping all their files checked and repaired with less effort on their part,
even when they leave their node offline for months at a time.
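To make the delegation concrete, here is a minimal sketch of what the service would do with a traversal cap: walk the graph and collect a verifier manifest without ever seeing child names. It assumes a toy model in which `dirs` maps each directory's traverse cap to the entries a traverse-cap holder can decrypt. `derive_verifycap`, `ChildDir`, `ChildFile` and `build_manifest` are hypothetical names, and the hash-based derivation is an assumption, not the real dirnode format.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Set, Union


def derive_verifycap(traversecap: bytes) -> bytes:
    """Hypothetical one-way derivation: traverse cap -> verify cap."""
    return hashlib.sha256(b"verify:" + traversecap).digest()


@dataclass(frozen=True)
class ChildDir:
    traversecap: bytes  # what a traverse cap reveals about a child directory


@dataclass(frozen=True)
class ChildFile:
    verifycap: bytes  # what a traverse cap reveals about a child file
    size: int


Child = Union[ChildDir, ChildFile]


def build_manifest(root_traversecap: bytes, dirs: Dict[bytes, List[Child]]) -> Set[bytes]:
    """Collect verify caps for every directory and file reachable from the root.

    `dirs` stands in for fetching each dirnode and decrypting only its
    traverse-cap-visible fields; child names never appear here.
    """
    manifest: Set[bytes] = set()
    seen: Set[bytes] = set()
    stack = [root_traversecap]
    while stack:
        tcap = stack.pop()
        if tcap in seen:
            continue  # directory graphs can contain cycles and shared subtrees
        seen.add(tcap)
        manifest.add(derive_verifycap(tcap))
        for child in dirs.get(tcap, []):
            if isinstance(child, ChildDir):
                stack.append(child.traversecap)
            else:
                manifest.add(child.verifycap)
    return manifest
```

The walk still exposes the shape of the graph and the file sizes to whoever runs it, which matches the exposure described above.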

The implementation would require a couple of pieces:

  • dirnode capabilities would need to have a four-level structure. Writecaps
    beget readcaps. Readcaps beget traversecaps. Traversecaps beget
    verifycaps.
  • I think this means that mutable file caps need an extra intermediate layer
    as well: this is tricky, and will require some staring at the DSA mutable
    file diagram to find a place that could accommodate it.
  • Each edge entry contains five child items: (name, writecap, readcap,
    traversecap, metadata)
  • 'name', 'readcap', and 'metadata' are encrypted with a key derived from
    the dirnode's readcap.
  • 'writecap' is encrypted with the dirnode's writecap.
  • 'traversecap' is encrypted with the dirnode's traversecap.
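The cap chain and the per-field encryption rules above can be sketched as follows. This is a toy model under stated assumptions: each weaker cap is a SHA-256 hash of the stronger one, each field key is hashed from the guarding cap, and the cipher is a SHA-256 counter-mode keystream used only as a stand-in (not a vetted or authenticated scheme, and not what Tahoe-LAFS actually uses). `caps_from`, `seal_edge` and `open_edge` are hypothetical names.

```python
import hashlib
import os
from typing import Dict, Tuple

# Hypothetical cap levels, strongest first; each is derived one-way from the one above.
LEVELS = ["write", "read", "traverse", "verify"]

# Which parent-dirnode cap guards each field of an edge entry (per the bullets above).
FIELD_LEVEL = {
    "name": "read",
    "readcap": "read",
    "metadata": "read",
    "writecap": "write",
    "traversecap": "traverse",
}


def _h(tag: bytes, data: bytes) -> bytes:
    return hashlib.sha256(tag + b":" + data).digest()


def caps_from(level: str, cap: bytes) -> Dict[str, bytes]:
    """Derive every weaker cap from the one held (write -> read -> traverse -> verify)."""
    caps = {level: cap}
    for weaker in LEVELS[LEVELS.index(level) + 1:]:
        cap = _h(weaker.encode(), cap)
        caps[weaker] = cap
    return caps


def _xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy keystream cipher for illustration only.
    blocks = (len(data) + 31) // 32
    stream = b"".join(hashlib.sha256(key + nonce + i.to_bytes(8, "big")).digest()
                      for i in range(blocks))
    return bytes(a ^ b for a, b in zip(data, stream))


def seal_edge(parent_caps: Dict[str, bytes],
              child: Dict[str, bytes]) -> Tuple[bytes, Dict[str, bytes]]:
    """Encrypt each edge field under a key derived from the parent cap that guards it."""
    nonce = os.urandom(16)
    sealed = {}
    for field, level in FIELD_LEVEL.items():
        key = _h(b"key", parent_caps[level])
        sealed[field] = _xor(key, nonce + field.encode(), child[field])
    return nonce, sealed


def open_edge(level: str, cap: bytes, nonce: bytes,
              sealed: Dict[str, bytes]) -> Dict[str, bytes]:
    """Decrypt only the fields whose guarding cap the holder can derive."""
    caps = caps_from(level, cap)
    return {
        field: _xor(_h(b"key", caps[needed]), nonce + field.encode(), sealed[field])
        for field, needed in FIELD_LEVEL.items()
        if needed in caps
    }
```

With this layout, a traverse-cap holder calling `open_edge("traverse", ...)` recovers only the child's `traversecap`: enough to keep walking, but never the name, readcap or metadata.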

When we do DSA dirnodes, we should take advantage of the compatibility break
to implement something like this. I suspect it will require changes to the
DSA scheme as well.

(there is probably a better name for this concept: "walk cap"? "manifest
cap"? "deep verify cap"?)

warner added the code, major, enhancement, 0.7.0 labels 2008-02-08 22:26:00 +00:00
warner added this to the eventually milestone 2008-02-08 22:26:00 +00:00

I like "deep verify cap" as a name.

However, their manifest doesn't change while they are off-line, right? So it doesn't seem too onerous to require them to produce manifests for checkers whenever they change their tree.

Still, it is an interesting idea.

Author

true. The requirement would be that they produce and deliver (reliably) a
manifest some time after they stop changing things, and before they shut down
their machine or go offline for a month. One concern is that we can't predict
their behavior, so we might have to be fairly aggressive about pushing these
manifests (like, after a minute of inactivity), and they're relatively
expensive to build (since it requires a traversal of their whole directory
tree). My original hope was to produce a manifest once per day, but I'm not
sure how realistic that is w.r.t. a laptop which goes offline unexpectedly.
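One way to read the "push after a minute of inactivity" idea, combined with the original once-per-day hope, is as a small scheduling rule. The 60-second and one-day figures come from the comment above; treating the daily push as a fallback even when nothing changed is an assumption, and `ManifestState` and `should_push` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

IDLE_SECONDS = 60         # "after a minute of inactivity" (example figure from the text)
MAX_AGE_SECONDS = 86_400  # "once per day" (example figure from the text)


@dataclass
class ManifestState:
    last_change: float                 # time of the most recent directory modification
    last_push: Optional[float] = None  # time the last manifest was delivered, if ever


def should_push(state: ManifestState, now: float) -> bool:
    """Decide whether to build and deliver a fresh manifest right now."""
    if state.last_push is None:
        # Never pushed: wait for the tree to settle, then push.
        return now - state.last_change >= IDLE_SECONDS
    dirty = state.last_change > state.last_push
    if dirty and now - state.last_change >= IDLE_SECONDS:
        return True  # changes have settled since the last push
    # Assumed fallback: refresh at least daily even if nothing changed locally.
    return now - state.last_push >= MAX_AGE_SECONDS
```

The weakness the comment points out remains: if the laptop goes offline inside the idle window, the push simply never happens.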

Another factor to keep in mind is directory sharing. We haven't talked much
about who "owns" shared directories: one reasonable answer is that everybody
does: if you can read the file, you share responsibility for keeping it alive
(by maintaining a lease on it along with everyone else). Another reasonable
policy is that we only add leases to files in writeable directories,
declaring "ownership" to be equal to mutability. This approach would work
better for read-only directories which are shared among many people, but
would fail if the write-capable "owner" of that directory got tired of
maintaining it.
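The two lease policies differ only in which manifest entries earn a lease. A minimal sketch, assuming each manifest entry records whether the file was reached through a directory the user can write; `LeasePolicy` and `files_to_lease` are hypothetical names.

```python
from enum import Enum
from typing import Iterable, Set, Tuple


class LeasePolicy(Enum):
    READERS_SHARE = "readers-share"         # anyone who can read a file helps keep it alive
    OWNERSHIP_IS_MUTABILITY = "mutability"  # only files reached through writeable directories


def files_to_lease(entries: Iterable[Tuple[str, bool]], policy: LeasePolicy) -> Set[str]:
    """entries: (file_id, reached_via_writeable_directory) pairs.

    A file reachable by several paths may appear more than once; under the
    mutability policy it is leased if any of those paths is writeable.
    """
    if policy is LeasePolicy.READERS_SHARE:
        return {fid for fid, _ in entries}
    return {fid for fid, writeable in entries if writeable}
```

Under the second policy, a widely shared read-only directory stops being leased by anyone as soon as its single writer stops submitting manifests, which is the failure mode described above.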

In any case, if the set of files in your manifest can change without your
involvement (because somebody else made additions to a shared directory),
then we might want the manifest to be updated in a more offline fashion, and
to do this we'd need some sort of traversal cap. On the other hand, we might
make the argument that the manifest of the person who added that new child
may contain the new file, and their manifest would be good enough to keep the
file alive. Or, we could just state that you have to give us a new manifest
at least once a month if you want to take advantage of our file keepalive
services.
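The "somebody else's manifest is good enough" argument, together with a monthly freshness rule, amounts to keeping alive the union of all recently submitted manifests. A sketch, assuming manifests are stored per account with a submission time and that "once a month" means 30 days; `files_to_keep_alive` is a hypothetical name.

```python
from typing import Dict, Set, Tuple

MANIFEST_MAX_AGE = 30 * 24 * 3600  # "at least once a month", assumed here to be 30 days


def files_to_keep_alive(manifests: Dict[str, Tuple[float, Set[str]]], now: float) -> Set[str]:
    """manifests: account -> (submitted_at, set of verify caps).

    A file stays alive if it appears in at least one account's fresh manifest,
    so a child added to a shared directory is covered by whoever added it.
    """
    alive: Set[str] = set()
    for submitted_at, caps in manifests.values():
        if now - submitted_at <= MANIFEST_MAX_AGE:
            alive |= caps
    return alive
```

A file referenced only by a stale manifest drops out of the set, which is exactly the case the traversal cap would otherwise cover.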

On the other other hand, the file keepalive service might also be the
first-line quota enforcement service, and we might require that you submit
your traversal cap as a fairly cheap way to estimate the amount of space you're
consuming. In this world, the rule would be that we'll only do keepalives for
the 1G or 10G or whatever you've contracted with us to store, and the quota
is primarily enforced by adding up the sizes of all files in the manifest
(which we calculate ourselves, using the traversal cap). In this case, we'd
only check with the storage servers rarely, either randomly or if we suspect
that the client is storing large files outside the directory graph that
they've given us traversal authority over. If the storage servers tell us
that this user is storing more data than the manifest contains, we might get
suspicious.
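The quota scheme described above (sum the file sizes found via the traversal cap, compare against the contract, and only occasionally cross-check with the storage servers) can be sketched like this. The `expansion` and `slack` parameters are assumptions: servers store erasure-coded shares, so their byte counts must be scaled before comparing, and some tolerance avoids flagging ordinary overhead. `QuotaReport` and `check_quota` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class QuotaReport:
    manifest_bytes: int         # total file size computed from the manifest
    over_quota: bool            # manifest total exceeds the contracted amount
    suspicious: Optional[bool]  # None when the storage servers were not consulted


def check_quota(file_sizes: Dict[str, int], quota_bytes: int,
                server_reported_bytes: Optional[int] = None,
                expansion: float = 1.0, slack: float = 0.10) -> QuotaReport:
    """file_sizes maps each verify cap in the manifest to its file size.

    expansion converts file bytes into expected stored bytes (for example N/k
    for k-of-N erasure coding); slack tolerates a little unexplained overhead.
    """
    total = sum(file_sizes.values())
    suspicious = None
    if server_reported_bytes is not None:
        # Servers holding noticeably more than the manifest explains suggests data
        # stored outside the directory graph we were given traversal authority over.
        suspicious = server_reported_bytes > total * expansion * (1 + slack)
    return QuotaReport(total, total > quota_bytes, suspicious)
```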

warner changed title from "directory traversal capability" to "add directory traversal / deep-verify capability?" 2008-02-12 04:15:00 +00:00
warner added code-dirnodes and removed code labels 2008-04-24 23:51:00 +00:00
warner modified the milestone from eventually to undecided 2008-06-01 20:43:33 +00:00
davidsarah commented 2009-10-28 04:11:57 +00:00
Owner

Tagging issues relevant to new cap protocol design.

davidsarah commented 2009-11-24 17:54:02 +00:00
Owner

I'm pretty sure we want this, and I see how to do it for the Elk Point design.

The name deep-verify seems preferable because it would allow you to verify, not just traverse.

tahoe-lafs modified the milestone from undecided to eventually 2009-11-24 17:54:02 +00:00
zooko modified the milestone from eventually to 2.0.0 2010-02-23 03:08:39 +00:00
davidsarah commented 2011-02-20 04:34:55 +00:00
Owner

In http://tahoe-lafs.org/pipermail/tahoe-dev/2009-July/002302.html , zooko asks whether we should make all verify caps deep (in the same way that all directory read caps are deep). He also points out this counterargument:

Now the reason why it could be useful to have a Shallow Verify Cap -- to give someone the ability to verify the integrity of a directory without also giving them the ability to get the verify-caps of the children -- is for a kind of data-privacy. You might want to give lots of people the ability to verify the integrity of your directories without also giving them the ability to trace your directory structure -- the sizes and link structure of your directories and files. As we've recently been discussing, it might be nice for every storage server to have a verify cap to go with every share that it holds. We generally agree that "verify caps are not secret" -- everyone in the world can see everyone else's verify caps. You might not want everyone to be able to see the shape of your filesystem though!

For the next version of the Elk Point protocol I'm working on (v4), I plan to make shallow verify caps the same as storage indices. So, it would be automatic that every storage server has a shallow verify cap for each share that it holds.

In this context I agree with the counterargument. A server shouldn't automatically get deep verify authority for the shares it holds.

zooko argued for a different conclusion:

The only problem is: they can do that anyway. Anybody who can observe your Tahoe storage service connections (even though they are encrypted) or who operates a storage server can easily detect the exact structure of your filesystem -- which directories are linked to which other directories and files, as well as the precise size of all of the files. To defend against this sort of traffic analysis or pattern detection is somewhere between "hard" and "impossible". Our comrades over at the GNUnet project, the Freenet project, and others have been trying to develop such techniques for years (both Brian and I have contributed to such projects in the past, Brian more recently than I). Whether they're close to succeeding is not clear to me (perhaps some representative of such projects or someone whose expertise is more current than mine could speak up). But it is certain that TahoeLAFS will not offer such privacy in the next couple of releases.

I don't agree that the cap protocol should be designed in a way that precludes this privacy gain. It's certainly hard to achieve privacy of directory structure against storage servers (even when running Tahoe-LAFS over Tor, I2P, etc.). However, if we move to an unencrypted storage protocol (or make encryption optional for that protocol), then making all verify caps deep would reveal the whole directory structure even to passive observers.
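The distinction being argued over can be modelled with a small sketch: the shallow verify cap is just the storage index (so every server holding a share automatically has it), while only a deep verify cap carries the extra key needed to recover the children's verify caps. Everything here is hypothetical: the hash-based derivation, the toy keystream cipher, and the names `DeepVerifyCap`, `Share`, `shallow_verify` and `deep_children` are illustrative and are not the Elk Point 4 construction.

```python
import hashlib
from dataclasses import dataclass
from typing import List


def _kdf(tag: bytes, secret: bytes) -> bytes:
    return hashlib.sha256(tag + b":" + secret).digest()


def _xor_stream(key: bytes, data: bytes) -> bytes:
    # Toy SHA-256 counter-mode keystream; illustrative only, not a vetted cipher.
    stream = b"".join(hashlib.sha256(key + i.to_bytes(8, "big")).digest()
                      for i in range(len(data) // 32 + 1))
    return bytes(a ^ b for a, b in zip(data, stream))


@dataclass(frozen=True)
class DeepVerifyCap:
    secret: bytes  # hypothetical key material

    @property
    def storage_index(self) -> bytes:
        # One-way: the shallow cap (== storage index) reveals nothing about `secret`.
        return _kdf(b"storage-index", self.secret)[:16]

    @property
    def children_key(self) -> bytes:
        return _kdf(b"children", self.secret)


@dataclass(frozen=True)
class Share:
    storage_index: bytes
    ctext_children: bytes  # encrypted, newline-separated child verify caps


def make_share(cap: DeepVerifyCap, child_verifycaps: List[bytes]) -> Share:
    return Share(cap.storage_index, _xor_stream(cap.children_key, b"\n".join(child_verifycaps)))


def shallow_verify(storage_index: bytes, share: Share) -> bool:
    # Real verification would also check the share's hashes; only the lookup is modelled.
    return share.storage_index == storage_index


def deep_children(cap: DeepVerifyCap, share: Share) -> List[bytes]:
    # Only a deep verify cap can recover the children's verify caps and keep walking.
    if not shallow_verify(cap.storage_index, share):
        raise ValueError("share does not belong to this cap")
    plain = _xor_stream(cap.children_key, share.ctext_children)
    return plain.split(b"\n") if plain else []
```

In this model a server learns `storage_index` for free but cannot trace the directory graph, which is the privacy property the counterargument wants to preserve; if the child list were stored unencrypted, every shallow holder (and a passive observer of an unencrypted storage protocol) would see it.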

davidsarah commented 2011-02-20 05:36:10 +00:00
Owner

If you can't see the SVG attachment, try http://jacaranda.org/tahoe/immutable-elkpoint-4.png

davidsarah commented 2011-02-20 05:47:06 +00:00
Owner

Attachment immutable-elkpoint-4.svg (85692 bytes) added

Immutable file protocol "Elk Point 4" (Scalable Vector Graphics format). [errors in text corrected]

davidsarah commented 2011-02-20 05:59:16 +00:00
Owner

A known weakness in Elk Point 4 is that the holder of a read cap can't verify that the value of Ctext_X in the share is correct (and hence that the decryption Plain_X, which would hold the verify caps of a directory's children, is correct). This is OK if Plain_K holds read/verify caps for the directory's children, because a read cap holder can use those and ignore Plain_X.

davidsarah commented 2011-02-20 06:25:47 +00:00
Owner

Replying to davidsarah:

A known weakness in Elk Point 4 is that the holder of a read cap can't verify that the value of Ctext_X in the share is correct (and hence that the decryption Plain_X, which would hold the verify caps of a directory's children, is correct). This is OK if Plain_K holds read/verify caps for the directory's children, because a read cap holder can use those and ignore Plain_X.

Oh, there's a better solution. We can include hash(CS, Plain_K) in the share (incidentally fixing #453), and then compute K as a hash of that and Plain_X. Then the read cap holder can check the decrypted Plain_X against K, even though it doesn't in general know CS.
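The proposed fix can be sketched as two hash computations: the writer stores `hash(CS, Plain_K)` in the share and sets `K = hash(that, Plain_X)`; a read-cap holder, who knows K but not necessarily CS, recomputes K from the stored value and the decrypted Plain_X. SHA-256 with length-prefixed inputs is an assumption standing in for whatever hash Elk Point uses, the decryption of Ctext_X is omitted, and `make_k` and `reader_checks_plain_x` are hypothetical names.

```python
import hashlib
import hmac
from typing import Tuple


def _hash(*parts: bytes) -> bytes:
    # Length-prefix each input so the concatenation is unambiguous (an assumption).
    h = hashlib.sha256()
    for p in parts:
        h.update(len(p).to_bytes(8, "big"))
        h.update(p)
    return h.digest()


def make_k(cs: bytes, plain_k: bytes, plain_x: bytes) -> Tuple[bytes, bytes]:
    """Writer side: returns (hash(CS, Plain_K) to store in the share, K)."""
    inner = _hash(cs, plain_k)
    return inner, _hash(inner, plain_x)


def reader_checks_plain_x(k: bytes, inner_from_share: bytes, decrypted_plain_x: bytes) -> bool:
    """Read-cap holder side: needs K and the stored inner hash, but not CS."""
    return hmac.compare_digest(_hash(inner_from_share, decrypted_plain_x), k)
```

Because K commits to Plain_X, a server that substitutes a different Ctext_X produces a Plain_X that fails this check.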

Owner

Ticket retargeted after milestone closed (editing milestones)

meejah removed this from the 2.0.0 milestone 2021-03-30 18:40:46 +00:00
No Milestone
No Assignees
4 Participants

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#308