Add file-with-metadata caps #947

Open
opened 2010-02-12 19:56:53 +00:00 by kpreid · 7 comments

Web architecture expects that a resource has a Content-Type. Modern filesystems have "extended attributes" per file as well as metadata such as modification time, and it is desirable to back these things up. Both of these things point to the idea that there ought to be addressable (has-a-URL) objects which designate the metadata as well as the file data (binary blob). My understanding of current Tahoe architecture is that all metadata is instead stored in the rows of the directory objects.

Additionally, metadata should be mutable iff the file is, so that it can be updated in-place without access to every directory which might contain it.

I imagine that these objects would contain a current-design file cap, rather than themselves containing the file data, so that we still get the convergent encryption space advantage even if file metadata differs among separately-created instances.
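
As a rough illustration of the shape such an object could take, here is a minimal Python sketch. The `MetadataNode` class, its field names, and the cap string are all hypothetical placeholders, not part of any existing Tahoe-LAFS format. It only shows that two metadata objects can differ while pointing at the same underlying file cap.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class MetadataNode:
    """Hypothetical metadata object: holds a file cap plus metadata, never the file data."""
    file_cap: str
    metadata: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Serialize deterministically so equal nodes produce equal text.
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder cap string, for illustration only.
shared_cap = "URI:CHK:placeholder"

# Two separately created instances with different metadata share one file cap,
# so the file data itself is stored (and converges) only once.
node_a = MetadataNode(shared_cap, {"content-type": "image/svg+xml", "mtime": 1265999813})
node_b = MetadataNode(shared_cap, {"content-type": "text/xml"})
```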

This idea was raised at friam on 2010-02-12.

kpreid added the
unknown
major
enhancement
unknown
labels 2010-02-12 19:56:53 +00:00
kpreid added this to the undecided milestone 2010-02-12 19:56:53 +00:00

Hm, the way I imagined implementing this at first was to have the client first fetch the associated metadata and then fetch the file. One way to envision the implementation would simply be to define a kind of directory which can only have one child link in it. Then take the cap to that directory and wrap it in a different cap type which means "fetch this directory then fetch the file it points to, applying all of the metadata that it contains".

But, we could also consider bundling some metadata along with the cap itself. For example, if the cap is being embedded into a URL, then include the metadata in the URL, along with the cap. Spelling out the content type in standard text format e.g. image/svg+xml would add significantly to the length of the URL, but perhaps we could define a custom compression scheme which could represent the most common types in only a character or two while falling back to uncompressed form for types that we haven't included in our compression definition.
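
The compression idea could be sketched roughly as below, in Python. The table of short codes, the `_` escape prefix, and the function names are all assumptions made up for this illustration; no such scheme is defined anywhere in this ticket. Common types map to a single character, and anything else falls back to the percent-encoded full type behind the escape prefix.

```python
from urllib.parse import quote, unquote

# Hypothetical table: both the chosen types and their codes are illustrative only.
SHORT_CODES = {
    "text/plain": "t",
    "text/html": "h",
    "image/png": "p",
    "image/jpeg": "j",
    "image/svg+xml": "s",
    "application/octet-stream": "o",
}
LONG_TYPES = {code: ctype for ctype, code in SHORT_CODES.items()}

# Assumed marker for "uncompressed type follows"; no short code starts with it.
ESCAPE = "_"


def encode_content_type(ctype: str) -> str:
    """Return a short code for a known type, else the escaped full type."""
    code = SHORT_CODES.get(ctype)
    if code is not None:
        return code
    return ESCAPE + quote(ctype, safe="")


def decode_content_type(token: str) -> str:
    """Invert encode_content_type; raises KeyError for an unknown short code."""
    if token.startswith(ESCAPE):
        return unquote(token[len(ESCAPE):])
    return LONG_TYPES[token]
```

With this table, `image/svg+xml` costs one character in the URL, while an unlisted type such as `application/x-tar` costs its full percent-encoded length plus the one-character prefix.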


One reason that I am thinking about this is the "security-related extra metadata" that I've been ticketing about tonight: highest-known-version-number (#955), petrification-marker (#954), LAFS 301 Moved Permanently marker (not yet ticketed), etc. It would be cool if, when I send you a URL containing a Tahoe-LAFS cap to a mutable file, I automatically included in that URL the highest version number of that file that I have ever seen, thus empowering you to reject rollback attacks which present an older version of the file to you when you try to read it.

That one, at least, can't really be implemented in the indirection-node way (because if someone is going to roll back the file, they might also roll back the indirection-node), but would have to be done in the bundled-with-the-original-URL way.
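
The reader-side check this implies could look roughly like the following Python sketch. The `CapWithVersionHint` type, its field names, and the `accept_version` function are hypothetical names invented for this illustration; how the version number would actually be carried in the URL is not specified here. The point is only that the hint travels with the cap, outside the storage an attacker could roll back.

```python
from dataclasses import dataclass


class RollbackError(Exception):
    """Raised when a retrieved version is older than one already known to exist."""


@dataclass(frozen=True)
class CapWithVersionHint:
    """Hypothetical bundle: a cap plus the highest version number the sender has seen."""
    cap: str
    highest_seen_version: int


def accept_version(hint: CapWithVersionHint, retrieved_version: int) -> CapWithVersionHint:
    """Reject versions older than the hint; otherwise return an updated hint."""
    if retrieved_version < hint.highest_seen_version:
        raise RollbackError(
            f"retrieved version {retrieved_version} is older than "
            f"known version {hint.highest_seen_version}"
        )
    # The retrieved version is at least as new, so it becomes the new floor.
    return CapWithVersionHint(hint.cap, retrieved_version)
```

A recipient who was handed a hint of version 7 would accept versions 7 and above, remember the newest one seen, and refuse anything older.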

zooko added
code
and removed
unknown
labels 2010-02-15 06:06:23 +00:00

If you like this ticket, you might also like #956 (embed security metadata in parent directory) and #957 (embed security metadata in URL).

zooko added
1.6.0
and removed
unknown
labels 2010-02-23 03:10:46 +00:00
zooko modified the milestone from undecided to 2.0.0 2010-02-23 03:10:46 +00:00

Is this the same as #307 (maybe add node metadata? (in addition to edge metadata))?

chrysn commented 2011-01-17 14:46:51 +00:00
Owner

as zooko correctly pointed out there, this is relevant for #1325 (make tahoe backup useable as a replacement for rsync).

personally i'd go for storing the file metadata in the directory. this does require the relevant data (mime type) to be included in the url in order to be used in connection with the file, but think about it this way: that's even true for the file name.

other reasons supporting metadata-in-directory are

  • faster access (fewer roundtrips, especially in the typical file-manager situation where a directory is listed and then all its files are stat-ed),
  • better compatibility (i guess there is a way to put additional metadata in the directory w/o breaking compatibility to older versions; doing this with intermediate nodes would be rather hard), and that
  • git does it that way too (ok, i admit, that's not really a reason).
davidsarah commented 2012-05-19 19:38:01 +00:00
Owner

[assigning to me as a reminder to explain why the Content-Type-in-direntry feature can't be implemented on its own without the Content-Type-in-URL feature]

Owner

Ticket retargeted after milestone closed (editing milestones)

meejah removed this from the 2.0.0 milestone 2021-03-30 18:40:46 +00:00
Reference: tahoe-lafs/trac-2024-07-25#947