add directory traversal / deep-verify capability? #308
We might split our current three-level directory capability structure (write,
read, verify) into four levels: write, read, traverse, verify. The 'traverse
cap' would be able to access the traverse cap of child directories, and the
verify cap of child files. It would not be able to read child names.
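To make the proposed hierarchy concrete, here is a minimal sketch of a
four-level cap chain, assuming a simple hash-based one-way derivation (the
tag strings and names below are illustrative, not an actual Tahoe-LAFS cap
format):

```python
from hashlib import sha256

# A one-way derivation chain: each cap level can compute the level below
# it, but not the level above. The tags and 256-bit values here are
# illustrative only, not Tahoe-LAFS's actual cap encoding.
def derive(cap: bytes, tag: bytes) -> bytes:
    return sha256(tag + b":" + cap).digest()

writecap = sha256(b"per-file secret").digest()
readcap = derive(writecap, b"read")         # write -> read
traversecap = derive(readcap, b"traverse")  # read -> traverse
verifycap = derive(traversecap, b"verify")  # traverse -> verify
```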
The issue is that, at present, the verifier manifest (a set of verifier caps
for all files and directories reachable from some root) can only be generated
by someone who holds a readcap for the root. This manifest generation cannot
be safely delegated to some other party (such as a central allmydata server).
So we're forced to choose between having customers expose their files to us
(by giving us their root readcap) and requiring them to create their manifest
(for file checking/repair) on their own.
If we had traversal caps, then customers could give us the traversal cap
instead of their read cap. We could still see the shape of their filesystem
(and probably the length of their filenames, and the size of their files),
but perhaps that would be little enough exposure that customers would be
comfortable with revealing it. In exchange, we could provide the service of
keeping all their files checked and repaired with less effort on their part,
even when they leave their node offline for months at a time.
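A sketch of what that delegated walk could look like: the checker/repair
service uses only traverse caps, and never sees child names or plaintext.
The `fetch_children` and `derive_verifycap` interfaces below are
hypothetical stand-ins, not existing Tahoe-LAFS APIs.

```python
def build_manifest(root_traversecap, fetch_children, derive_verifycap):
    """Collect verify caps for everything reachable from the root.

    Hypothetical interfaces: fetch_children(tcap) yields (kind, cap)
    pairs, where kind is "dir" (cap is the child's traversecap) or
    "file" (cap is the child's verifycap); derive_verifycap(tcap)
    computes a directory's own verifycap, as in the chain sketched above.
    """
    manifest, stack, seen = set(), [root_traversecap], set()
    while stack:
        tcap = stack.pop()
        if tcap in seen:
            continue
        seen.add(tcap)
        manifest.add(derive_verifycap(tcap))  # the directory itself
        for kind, cap in fetch_children(tcap):
            if kind == "dir":
                stack.append(cap)   # recurse using the child traversecap
            else:
                manifest.add(cap)   # verify cap of a child file
    return manifest
```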
The implementation would require a couple of pieces:

* a four-level cap chain: writecaps beget readcaps, readcaps beget
  traversecaps, and traversecaps beget verifycaps
* an intermediate key layer for mutable files as well: this is tricky, and
  will require some staring at the DSA mutable file diagram to find a place
  that could accommodate it
* a dirnode entry with a column for the traversecap, i.e. (childname,
  writecap, readcap, traversecap, metadata), as sketched below
* child names encrypted such that they are visible only to holders of
  the dirnode's readcap
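For illustration, the widened entry might look like this (a sketch only;
the column names and per-column encryption scheme are assumptions, not a
concrete on-disk design):

```python
from collections import namedtuple

# Hypothetical widened dirnode row: each column would be encrypted so that
# only holders of the corresponding dirnode cap (or a stronger one) can
# read it. Traversecap holders see child traversecaps but not child names.
DirEntry = namedtuple("DirEntry", [
    "childname",    # encrypted under a key derived from the dirnode readcap
    "writecap",     # encrypted under a key derived from the dirnode writecap
    "readcap",      # encrypted under a key derived from the dirnode readcap
    "traversecap",  # encrypted under a key derived from the dirnode traversecap
    "metadata",     # encrypted under a key derived from the dirnode readcap
])
```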
When we do DSA dirnodes, we should take advantage of the compatibility break
to implement something like this. I suspect it will require changes to the
DSA scheme as well.
(there is probably a better name for this concept... "walk cap"? "manifest
cap"? "deep verify cap"?)
I like "deep verify cap" as a name.
However, their manifest doesn't change while they are offline, right? So it doesn't seem too onerous to require them to produce manifests for checkers whenever they change their tree.
Still, it is an interesting idea.
True. The requirement would be that they produce and deliver (reliably) a
manifest some time after they stop changing things, and before they shut down
their machine or go offline for a month. One concern is that we can't predict
their behavior, so we might have to be fairly aggressive about pushing these
manifests (like, after a minute of inactivity), and they're relatively
expensive to build (since each one requires a traversal of the whole
directory tree). My original hope was to produce a manifest once per day, but
I'm not sure how realistic that is w.r.t. a laptop which goes offline
unexpectedly.
Another factor to keep in mind is directory sharing. We haven't talked much
about who "owns" shared directories: one reasonable answer is that everybody
does: if you can read the file, you share responsibility for keeping it alive
(by maintaining a lease on it along with everyone else). Another reasonable
policy is that we only add leases to files in writeable directories,
declaring "ownership" to be equal to mutability. This approach would work
better for read-only directories which are shared among many people, but
would fail if the write-capable "owner" of that directory got tired of
maintaining it.
In any case, if the set of files in your manifest can change without your
involvement (because somebody else made additions to a shared directory),
then we might want the manifest to be updated in a more offline fashion, and
to do this we'd need some sort of traversal cap. On the other hand, we might
make the argument that the manifest of the person who added that new child
may contain the new file, and their manifest would be good enough to keep the
file alive. Or, we could just state that you have to give us a new manifest
at least once a month if you want to take advantage of our file keepalive
services.
On the other other hand, the file keepalive service might also be the
first-line quota enforcement service, and we might require that you submit
your traversal cap as a fairly cheap way to estimate the amount of space you're
consuming. In this world, the rule would be that we'll only do keepalives for
the 1G or 10G or whatever you've contracted with us to store, and the quota
is primarily enforced by adding up the sizes of all files in the manifest
(which we calculate ourselves, using the traversal cap). In this case, we'd
only check with the storage servers rarely, either randomly or if we suspect
that the client is storing large files outside the directory graph that
they've given us traversal authority over. If the storage servers tell us
that this user is storing more data than the manifest contains, we might get
suspicious.
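A minimal sketch of that estimate, assuming a hypothetical `get_size()`
query against the storage servers (all names here are illustrative):

```python
# Sum the sizes of every file reachable via the customer's traversal cap.
# get_size(verifycap) is a hypothetical query that asks storage servers for
# the length of the file behind a verify cap; the manifest comes from the
# traversal-cap walk sketched earlier.
def estimate_usage(manifest, get_size) -> int:
    return sum(get_size(vcap) for vcap in manifest)

def within_quota(manifest, get_size, quota_bytes: int = 10 * 2**30) -> bool:
    # e.g. a 10 GiB plan; keepalives would only be offered up to this limit
    return estimate_usage(manifest, get_size) <= quota_bytes
```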
Title changed from "directory traversal capability" to "add directory traversal / deep-verify capability?". Tagging issues relevant to new cap protocol design.
I'm pretty sure we want this, and I see how to do it for the Elk Point design.
The name deep-verify seems preferable because it would allow you to verify, not just traverse.
In http://tahoe-lafs.org/pipermail/tahoe-dev/2009-July/002302.html, zooko asks whether we should make all verify caps deep (in the same way that all directory read caps are deep). He also points out this counterargument:
For the next version of the Elk Point protocol I'm working on (v4), I plan to make shallow verify caps the same as storage indices. So, it would be automatic that every storage server has a shallow verify cap for each share that it holds.
In this context I agree with the counterargument. A server shouldn't automatically get deep verify authority for the shares it holds.
zooko argued for a different conclusion:
I don't agree that the cap protocol should be designed in a way that precludes this privacy gain. It's certainly hard to achieve privacy of directory structure against storage servers (even when running Tahoe-LAFS over Tor, I2P, etc.). However, if we move to an unencrypted storage protocol (or make encryption optional for that protocol), then making all verify caps deep would reveal the whole directory structure even to passive observers.
If you can't see the SVG attachment, try http://jacaranda.org/tahoe/immutable-elkpoint-4.png
Attachment immutable-elkpoint-4.svg (85692 bytes) added
Immutable file protocol "Elk Point 4" (Scalable Vector Graphics format). [errors in text corrected]
A known weakness in Elk Point 4 is that the holder of a read cap can't verify that the value of Ctext_X in the share is correct (and hence that the decryption Plain_X, which would hold the verify caps of a directory's children, is correct). This is OK if Plain_K holds read/verify caps for the directory's children, because a read cap holder can use those and ignore Plain_X.
Replying to davidsarah:
Oh, there's a better solution. We can include hash(CS, Plain_K) in the share (incidentally fixing #453), and then compute K as a hash of that and Plain_X. Then the read cap holder can check the decrypted Plain_X against K, even though it doesn't in general know CS.
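A sketch of that check, under the assumption that the hash is
collision-resistant (the framing below is illustrative, not Elk Point's
exact encoding):

```python
from hashlib import sha256

def H(*parts: bytes) -> bytes:
    # stand-in for the protocol's hash; the b"|" framing is illustrative
    return sha256(b"|".join(parts)).digest()

# The share stores T = H(CS, Plain_K), and K is defined as H(T, Plain_X).
# A read cap holder, who knows K but not CS, recomputes H(T, plain_x) from
# the stored T and the decryption of Ctext_X, and compares it against K.
def check_plain_x(K: bytes, T: bytes, plain_x: bytes) -> bool:
    return H(T, plain_x) == K
```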