handle arbitrary URIs in directories #683
Reference: tahoe-lafs/trac-2024-07-25#683
Tahoe has things it calls URIs which identify files. For example:
URI:CHK:twpnflhnjeubo2tluuglxrbvdu:oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295
However, they are not URIs (a term defined by RFC); in particular, URIs have the syntax <scheme>:<scheme-specific-part>, where the possible values for <scheme> are administered by the IETF:
http://www.iana.org/assignments/uri-schemes.html
Since Tahoe "URIs" do have the properties a URI should, I believe the appropriate fix for this is to register a tahoe: URI scheme. As far as I know, the "URI:" part of a Tahoe URI is always the same, so it conveys no information and can be replaced with this for only a two-character addition:
tahoe:CHK:twpnflhnjeubo2tluuglxrbvdu:oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295
--- The remainder of this text is not a matter of correctness but additional functionality ---
Furthermore, so that these URIs are also URLs (readily usable for contacting the resource with no local context), I would recommend including in the syntax of the scheme-specific part a provision for an OPTIONAL location hint for the grid, i.e. some host that can be contacted by some protocol that can put the client in communication with appropriate storage servers. This is essentially the same provision as in CapTP URIs; borrowing their syntax, it would be like:
tahoe://example.net:1234,192.168.33.91:1234/CHK:twpnflhnjeubo2tluuglxrbvdu:oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295
That is, tahoe://<comma-separated list of hosts>/<current Tahoe-URI components>.
Besides correctness in terminology, another advantage of having registered Tahoe URI syntax is that Tahoe files can participate as first-class entities in URI-based systems, and vice-versa.
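To make the two proposed forms concrete, here is a minimal Python sketch. It is only an illustration of the syntax suggested above, not anything Tahoe implements: the function names `to_tahoe_scheme` and `parse_tahoe_scheme` are made up for this example, and it assumes the cap body contains no "/" (true of the CHK example in this ticket).

```python
# Hypothetical helpers illustrating the proposed tahoe: syntax.
# Nothing here is part of Tahoe itself.

CAP = ("URI:CHK:twpnflhnjeubo2tluuglxrbvdu:"
       "oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295")


def to_tahoe_scheme(cap, hints=()):
    """Rewrite a 'URI:...' cap string in the proposed tahoe: form.

    With location hints, the tahoe://host,host/... form is produced;
    without them, the plain tahoe:... form.
    """
    if not cap.startswith("URI:"):
        raise ValueError("not a Tahoe cap string: %r" % (cap,))
    body = cap[len("URI:"):]
    if hints:
        return "tahoe://" + ",".join(hints) + "/" + body
    return "tahoe:" + body


def parse_tahoe_scheme(uri):
    """Split a proposed tahoe: URI into (location hints, original cap)."""
    if uri.startswith("tahoe://"):
        authority, sep, body = uri[len("tahoe://"):].partition("/")
        if not sep:
            raise ValueError("missing '/' after location hints")
        hints = [h for h in authority.split(",") if h]
    elif uri.startswith("tahoe:"):
        hints, body = [], uri[len("tahoe:"):]
    else:
        raise ValueError("not a tahoe: URI: %r" % (uri,))
    return hints, "URI:" + body


plain = to_tahoe_scheme(CAP)
hinted = to_tahoe_scheme(CAP, ["example.net:1234", "192.168.33.91:1234"])
```

Here `plain` is exactly two characters longer than `CAP`, and `parse_tahoe_scheme(hinted)` gives back the two hints together with the original cap string, so the hints can be dropped or replaced without touching the capability itself.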
For example, if Tahoe directories could store arbitrary URIs, of which Tahoe URIs were a special case, then they could include references not just to other things in the same Tahoe grid, but any other URLs-as-capabilities system, including other Tahoe grids or Waterken servers or ...whatever. You could use the data: URI scheme to store sufficiently small files directly in directories. (I vaguely recall that Tahoe might already have that capability.)
If there is a registered Tahoe scheme, then systems which work exclusively with URLs, but are extensible to handle additional URL schemes, can be extended to support Tahoe, rather than necessarily going through a Tahoe web gateway, thus providing useful information (e.g. 'this is immutable'), perhaps more efficient downloading, etc.
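As a rough illustration of the data: idea, the sketch below packs a small file into a base64 data: URI and unpacks it again, using only the Python standard library. The helper names and the example directory dict are hypothetical; how a real Tahoe directory would serialize such an entry is not specified in this ticket.

```python
import base64


def to_data_uri(content, mediatype="application/octet-stream"):
    """Encode bytes as a base64 data: URI."""
    payload = base64.b64encode(content).decode("ascii")
    return "data:%s;base64,%s" % (mediatype, payload)


def from_data_uri(uri):
    """Decode a base64 data: URI into (mediatype, bytes).

    Only the ';base64' form is handled in this sketch.
    """
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, sep, payload = uri[len("data:"):].partition(",")
    if not sep:
        raise ValueError("data: URI has no ',' separator")
    if not header.endswith(";base64"):
        raise ValueError("only base64 data: URIs are handled here")
    return header[:-len(";base64")], base64.b64decode(payload)


# A hypothetical directory mapping child names to arbitrary URIs:
# one small file stored inline, one ordinary Tahoe cap.
directory = {
    "note.txt": to_data_uri(b"hello", "text/plain"),
    "big-file": "URI:CHK:twpnflhnjeubo2tluuglxrbvdu:"
                "oan4set42mwkwxonqmq4xlull6ggnl2f2zggjmp6fgji7uv7py2a:3:10:34295",
}
```

Reading `note.txt` then needs no network access at all, while `big-file` still goes through the grid.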
Because of Kevin Reid's post on tahoe-dev: http://allmydata.org/pipermail/tahoe-dev/2009-May/001770.html
I have realized that this ticket contains two issues: 1. making tahoe URIs be real, official URIs so that they fit into the way other code such as web browsers use URIs, and 2. extending Tahoe directories to hold arbitrary URIs and not just tahoe caps.
They are both interesting prospects to me, and certainly related, but we should probably split off a separate ticket, so people can understand them as features that could be separately implemented.
To be a little more explicit: in the message Zooko linked to, I allude to the point that if Tahoe directories contained general URIs, then you could insert a directory entry which is a revocable membrane to a Tahoe directory; this directory entry is not itself a Tahoe-type URI because Tahoe, being distributed, cannot support revocation (there is no relied-upon agent in the grid to remember to abort access).
(This wouldn't just require inserting URIs in tahoe directories, though; it would also require that clients are willing to switch between the crypto-and-DHT-based ('offline', in a sense) Tahoe protocols and a talk-to-one-server-which-proxies-for-you ('online') protocol. But storing URIs in directories at least lets clients have the option of being so fancy.)
Okay, the part about putting Tahoe caps into real URIs is already ticketed: #432 (writing down filecaps: revise URI scheme).
I'm changing this ticket to be about the second part: handling arbitrary URIs inside Tahoe directories (such as using some sort of plugin system?).
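One way the "plugin system" could look is a registry keyed by URI scheme, sketched below in Python. This is purely an assumption about a possible design: `HANDLERS`, `register`, `open_entry` and the handler functions are invented for this example and do not exist in Tahoe.

```python
# Hypothetical scheme-handler registry; not an existing Tahoe API.
HANDLERS = {}


def register(scheme):
    """Decorator that registers a handler for one URI scheme."""
    def decorator(fn):
        HANDLERS[scheme.lower()] = fn
        return fn
    return decorator


@register("uri")  # today's Tahoe caps start with "URI:"
def open_tahoe_cap(uri):
    # A real handler would fetch the object from the grid.
    return ("tahoe-cap", uri)


@register("data")
def open_inline(uri):
    # A real handler would decode the inline payload.
    return ("inline", uri)


def open_entry(uri):
    """Dispatch a directory entry to the handler for its scheme."""
    scheme = uri.split(":", 1)[0].lower()
    handler = HANDLERS.get(scheme)
    if handler is None:
        raise LookupError("no handler registered for scheme %r" % scheme)
    return handler(uri)
```

A client that only knows Tahoe caps would register just the first handler; fancier clients could add handlers for other capability systems without changing the directory format.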
Title changed from "So-called URIs aren't" to "handle arbitrary URIs in directories".
changeset:ef1b6ae8e312af21 changes the way dirnodes are processed to tolerate unrecognized URIs. This should make tahoe-1.5 able to survive new formats that come from the future (i.e. if a 1.5 client tries to read or modify a directory which has new-format entries which were placed there by some 1.6-or-beyond version). It's at least a start.
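The general idea of tolerating unrecognized entries can be sketched as follows. This is not the code from the changeset, just a minimal illustration under assumed names: `UnknownEntry`, `load_entries` and `dump_entries` are hypothetical, `KNOWN_PREFIXES` is an illustrative subset, and "URI:FUTURE:" is a made-up format standing in for something a later version might write.

```python
# Illustrative subset of cap prefixes an older client understands.
KNOWN_PREFIXES = ("URI:CHK:", "URI:DIR2:")


class UnknownEntry:
    """Opaque wrapper for an entry this client cannot interpret."""

    def __init__(self, raw):
        self.raw = raw


def load_entries(raw_entries):
    """Read directory entries, wrapping unrecognized ones instead of failing."""
    loaded = {}
    for name, raw in raw_entries.items():
        if raw.startswith(KNOWN_PREFIXES):
            loaded[name] = raw
        else:
            loaded[name] = UnknownEntry(raw)
    return loaded


def dump_entries(entries):
    """Write entries back, preserving unknown ones byte-for-byte."""
    return {
        name: (entry.raw if isinstance(entry, UnknownEntry) else entry)
        for name, entry in entries.items()
    }


raw = {"old": "URI:CHK:abc", "new": "URI:FUTURE:xyz"}
loaded = load_entries(raw)
loaded["added"] = "URI:CHK:def"  # an old client modifies the directory
written = dump_entries(loaded)
```

The point of the round trip is that the old client can add or remove children it understands while the future-format entry survives unchanged in `written`.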
Tagging issues relevant to new cap protocol design.
Tahoe "URI"s are specific to a particular grid; without that piece of information you have no particular way of knowing how to access the referenced object. Including the host/IP information as a hint in a tahoe: URI is useful, but these are only hints; they can become invalid without the underlying object being invalid.
I think, therefore, a tahoe: URI must include some kind of unambiguous grid identifier so that it uniquely globally identifies a particular object. Some kind of connection hint may also be useful, but that seems like a layering violation (since IPv4, or IP in general, is not the only possible transport for Tahoe).
I guess this is related to issue #403.