handle arbitrary URIs in directories #683

Open
opened 2009-04-19 18:02:38 +00:00 by kpreid · 6 comments

Tahoe has things it calls URIs which identify files. For example:
URI:CHK:twpnflhnjeubo2tluuglxrbvdu:oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295

However, they are not URIs (which term is defined by RFC); in particular, URIs have the syntax `<scheme>:<scheme-specific-part>`, where the possible values for `<scheme>` are administered by the IETF:

http://www.iana.org/assignments/uri-schemes.html

Since Tahoe "URIs" do have the properties a URI should, I believe the appropriate fix for this is to register a tahoe: URI scheme. As far as I know, the "URI:" part of a Tahoe URI is always the same, so it conveys no information and can be replaced with this for only a two-character addition: tahoe:CHK:twpnflhnjeubo2tluuglxrbvdu:oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295
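A minimal sketch of that rewrite, assuming the leading "URI:" really is constant. The helper name `to_tahoe_scheme` is hypothetical and not part of Tahoe, and `tahoe:` is only the scheme proposed here, not a registered one:

```python
def to_tahoe_scheme(cap: str) -> str:
    """Rewrite a Tahoe 'URI:...' string into the proposed 'tahoe:...' form."""
    prefix = "URI:"
    if not cap.startswith(prefix):
        raise ValueError("not a Tahoe cap string: %r" % (cap,))
    # Pure prefix substitution: everything after 'URI:' is kept verbatim.
    return "tahoe:" + cap[len(prefix):]


cap = ("URI:CHK:twpnflhnjeubo2tluuglxrbvdu:"
       "oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295")
converted = to_tahoe_scheme(cap)
```

Since "tahoe:" is six characters and "URI:" is four, the converted string is exactly two characters longer.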

--- The remainder of this text is not a matter of correctness but additional functionality ---

Furthermore, so that these URIs are also URLs (readily usable for contacting the resource with no local context), I would recommend including in the syntax of the scheme-specific-path a provision for an OPTIONAL location hint for the grid, i.e. some host that can be contacted by some protocol that can put the client in communication with appropriate storage servers. This is essentially the same provision as in CapTP URIs; borrowing their syntax, it would be like:

tahoe://example.net:1234,192.168.33.91:1234/CHK:twpnflhnjeubo2tluuglxrbvdu:oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295

That is, tahoe:// comma-separated list of hosts / current Tahoe-URI components.
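To make the proposed shape concrete, here is a rough parsing sketch. It is only an illustration of the syntax described above: the function name `parse_tahoe_url` is hypothetical, every hint is assumed to be written as `host:port`, and IPv6 literals are not handled.

```python
def parse_tahoe_url(url):
    """Split a proposed tahoe: URL into (location hints, cap components)."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "tahoe":
        raise ValueError("not a tahoe: URL: %r" % (url,))
    hints = []
    if rest.startswith("//"):
        # Optional authority part: a comma-separated list of host:port hints.
        authority, slash, rest = rest[2:].partition("/")
        if not slash:
            raise ValueError("missing '/' after the host list")
        for hint in authority.split(","):
            host, _, port = hint.rpartition(":")
            if not host or not port.isdigit():
                raise ValueError("bad location hint: %r" % (hint,))
            hints.append((host, int(port)))
    # What remains is the existing colon-separated Tahoe cap, minus 'URI:'.
    return hints, rest.split(":")


with_hints = parse_tahoe_url(
    "tahoe://example.net:1234,192.168.33.91:1234/"
    "CHK:twpnflhnjeubo2tluuglxrbvdu:"
    "oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295")
without_hints = parse_tahoe_url(
    "tahoe:CHK:twpnflhnjeubo2tluuglxrbvdu:"
    "oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295")
```

Both calls yield the same cap components; only the first carries location hints, which is what makes the hint part optional.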


Besides correctness in terminology, another advantage of having registered Tahoe URI syntax is that Tahoe files can participate as first-class entities in URI-based systems, and vice-versa.

For example, if Tahoe directories could store arbitrary URIs, of which Tahoe URIs were a special case, then they could include references not just to other things in the same Tahoe grid, but any other URLs-as-capabilities system, including other Tahoe grids or Waterken servers or ...whatever. You could use the data: URI scheme to store sufficiently small files directly in directories. (I vaguely recall that Tahoe might already have that capability.)
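As a sketch of the `data:` idea, a small file can be packed into a base64 `data:` URI (the form defined by RFC 2397) and stored as a directory entry next to an ordinary cap. The helper names and the `directory` mapping below are hypothetical illustrations, not Tahoe's actual dirnode format:

```python
import base64


def to_data_uri(content: bytes, mediatype: str = "application/octet-stream") -> str:
    """Pack a small file into a base64-encoded data: URI."""
    payload = base64.b64encode(content).decode("ascii")
    return "data:%s;base64,%s" % (mediatype, payload)


def from_data_uri(uri: str) -> bytes:
    """Recover the bytes from a data: URI produced by to_data_uri."""
    header, sep, payload = uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("unsupported data: URI: %r" % (uri,))
    return base64.b64decode(payload)


# A hypothetical directory mapping child names to arbitrary URIs.
directory = {
    "big-file": "tahoe:CHK:twpnflhnjeubo2tluuglxrbvdu:"
                "oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295",
    "note.txt": to_data_uri(b"hello", "text/plain"),
}
note = from_data_uri(directory["note.txt"])
```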

If there is a registered Tahoe scheme, then systems which work exclusively with URLs, but are extensible to handle additional URL schemes, can be extended to support Tahoe, rather than necessarily going through a Tahoe web gateway, thus providing useful information (e.g. 'this is immutable'), perhaps more efficient downloading, etc.

kpreid added the unknown, major, defect, 1.3.0 labels 2009-04-19 18:02:38 +00:00
kpreid added this to the undecided milestone 2009-04-19 18:02:38 +00:00

Because of Kevin Reid's post on tahoe-dev: http://allmydata.org/pipermail/tahoe-dev/2009-May/001770.html

I have realized that this ticket contains two issues: 1. making tahoe URIs be real, official URIs so that they fit into the way other code such as web browsers use URIs, and 2. extending Tahoe directories to hold arbitrary URIs and not just tahoe caps.

They are both interesting prospects to me, and certainly related, but we should probably split off a separate ticket, so people can understand them as features that could be separately implemented.

Author

To be a little more explicit: in the message Zooko linked to, I allude to the idea that if Tahoe directories contained general URIs, then you could insert a directory entry which is a revocable membrane to a Tahoe directory; this directory entry is not itself a Tahoe-type-URI because Tahoe, being distributed, cannot support revocation (there is no relied-upon agent in the grid to remember to abort access).

(This wouldn't just require inserting URIs in tahoe directories, though; it would also require that clients are willing to switch between the crypto-and-DHT-based ('offline', in a sense) Tahoe protocols and a talk-to-one-server-which-proxies-for-you ('online') protocol. But storing URIs in directories at least lets clients have the option of being so fancy.)


Okay, the part about putting Tahoe caps into real URIs is already ticketed: #432 (writing down filecaps: revise URI scheme).

I'm changing this ticket to be about the second part: handling arbitrary URIs inside Tahoe directories (such as using some sort of plugin system?).

zooko changed title from "So-called URIs aren't" to "handle arbitrary URIs in directories" 2009-05-14 22:21:50 +00:00
warner added code-dirnodes and removed unknown labels 2009-07-03 01:00:04 +00:00

changeset:ef1b6ae8e312af21 changes the way dirnodes are processed to tolerate unrecognized URIs. This should make tahoe-1.5 able to survive new formats that come from the future (i.e. if a 1.5 client tries to read or modify a directory which has new-format entries which were placed there by some 1.6-or-beyond version). It's at least a start.
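The general idea of tolerating unrecognized entries can be sketched as follows. This is not the code from that changeset; the class names, the helper functions, and the two-prefix list are hypothetical and only illustrate keeping an unknown entry verbatim so that a read-modify-write cycle does not destroy it:

```python
from dataclasses import dataclass

# Illustrative subset only; a real client recognizes more cap types.
KNOWN_PREFIXES = ("URI:CHK:", "URI:DIR2:")


@dataclass(frozen=True)
class KnownEntry:
    cap: str


@dataclass(frozen=True)
class UnknownEntry:
    cap: str  # preserved verbatim; this client cannot interpret it


def classify(cap):
    """Wrap a cap string instead of raising on formats we do not know."""
    if cap.startswith(KNOWN_PREFIXES):
        return KnownEntry(cap)
    return UnknownEntry(cap)


def load_directory(raw):
    return {name: classify(cap) for name, cap in raw.items()}


def dump_directory(entries):
    # Unknown entries are written back exactly as they were read.
    return {name: entry.cap for name, entry in entries.items()}


raw = {
    "old-file": "URI:CHK:twpnflhnjeubo2tluuglxrbvdu:"
                "oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295",
    "from-the-future": "URI:SOMETHING-NEW:opaque",
}
entries = load_directory(raw)
entries["added"] = classify("URI:DIR2:placeholder")  # modify the directory
round_tripped = dump_directory(entries)
```

An older client using this approach can still add or remove children while leaving the entries it does not understand untouched.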

davidsarah commented 2009-10-28 03:33:50 +00:00
Owner

Tagging issues relevant to new cap protocol design.

Owner

Tahoe "URI"s are specific to a particular grid; without that piece of information you have no particular way of knowing how to access the referenced object. Including the host/IP information as a hint in a tahoe: URI is useful, but these are only hints; they can become invalid without the underlying object being invalid.

I think, therefore, a tahoe: URI must include some kind of unambiguous grid identifier so that it uniquely globally identifies a particular object. Some kind of connection hint may also be useful, but that seems like a layering violation (since IPv4, or IP in general, is not the only possible transport for Tahoe).

I guess this is related to issue #403.


Reference: tahoe-lafs/trac-2024-07-25#683