smaller and prettier directory URIs #102
Reference: tahoe-lafs/trac-2024-07-25#102
Our webapi.txt document currently contains the following admonition:
This is an unfortunate wart. Can we find a way to remove this requirement?
My current thought is to define the dirnode URI syntax as:
URI:DIR:(vdrive-server):(storage-index)
as it is currently, but then declare that the (vdrive-server) part (which contains a FURL) shall always be base32-encoded. This would turn the typical 116-character long URI into one that is 165 characters long, but it would keep the FURL as an opaque string.
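To make the size cost concrete, here is a minimal sketch of the base32 proposal. It assumes standard RFC 4648 base32 with the padding stripped (the real encoding might use a different base32 alphabet, which would not change the length), and the sample FURL and storage index are reconstructed from the example URI quoted later in this ticket. The helper names are hypothetical.

```python
import base64

def pack_furl(furl: str) -> str:
    """Base32-encode a FURL so it contains only letters and digits."""
    encoded = base64.b32encode(furl.encode("ascii")).decode("ascii")
    return encoded.rstrip("=").lower()  # drop padding, which is recoverable

def unpack_furl(packed: str) -> str:
    """Reverse pack_furl(), restoring the padding b32decode() requires."""
    padded = packed.upper() + "=" * (-len(packed) % 8)
    return base64.b32decode(padded).decode("ascii")

def make_dirnode_uri(furl: str, storage_index: str) -> str:
    return "URI:DIR:%s:%s" % (pack_furl(furl), storage_index)

# Illustrative values only (assumed, not taken from a live grid).
FURL = "pb://t7p44biq3u6i5r5zjpb6cdqxid7v7vpx@192.168.69.247:58845,127.0.0.1:58845/vdrive"
STORAGE_INDEX = "w57ncp9cmzyb6kwrjaebq7d8co"

plain = "URI:DIR:%s:%s" % (FURL, STORAGE_INDEX)
packed = make_dirnode_uri(FURL, STORAGE_INDEX)
print(len(plain), len(packed))
```

With these sample values the two lengths come out as 116 and 165, matching the figures above: the 81-character FURL grows to 130 base32 characters.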
An alternative which I'm not really fond of would be to extract the FURL components (relying upon their format and encoding) and re-pack them in the dirnode in a way that avoids the slash problem. For example:
URI:DIR:tubid:ipaddr+port,ipaddr+port:swissnum:storage-index
I really don't want to break the abstraction boundary of a FURL this way.
If we knew that FURLs never used some character "X" which was safe to use in a URL, then we could declare that our vdrive-server spec is a FURL with the slashes replaced by X. This would be a part of the dirnode specification, so dirnode URIs everywhere would look like this, not just in the web API, which would be a big improvement.
The problem is, what would be a suitable value of X? FURLs use characters from the set
a-z0-9:/@,
plus whatever characters the programmer decides to use inside a name passed to registerReference() (which is currently unbounded, but I think it's fair to impose some restrictions on them). I think that slashes were the only real problem: it seemed that the apache reverse proxy that was causing us problems was url-decoding the URL, splitting on slashes, re-encoding the remaining pieces, then sending the results to the backend server. So url-encoding the slashes didn't help, but perhaps other characters remain encoded safely. If that's the case, it would be safe (although kind of ugly) to replace the slashes with anything outside the FURL character set (say ~ or & or % or ^ or !), although if it's something that has special meaning then it will require that we always url-encode the dirnode URI before passing it to the web server, which is an easy thing to screw up.
The colons are another problem, since we use them to delimit components of the URI itself. Our current parser works because we only have one field that could contain colons, so we pull the prefix from the left and the storage-index from the right and then whatever's left must be the furl. But that's kind of a wart too.
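For reference, the "prefix from the left, storage-index from the right" parsing strategy can be sketched roughly as follows. This is an illustration of the idea, not the actual parser; the function name is hypothetical.

```python
def parse_dirnode_uri(uri: str):
    """Split URI:DIR:(furl):(storage-index) when only the furl may hold colons."""
    prefix = "URI:DIR:"
    if not uri.startswith(prefix):
        raise ValueError("not a dirnode URI")
    # Everything after the LAST colon is the storage index; the rest,
    # colons included, must be the FURL.
    furl, sep, storage_index = uri[len(prefix):].rpartition(":")
    if not sep or not furl or not storage_index:
        raise ValueError("missing FURL or storage index")
    return furl, storage_index
```

This only works while exactly one field can contain colons, which is why it is fragile: adding a second colon-bearing field would make the split ambiguous.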
So, I dunno. Establishing a rule that the dirnode-server portion is always packed according to the following seems like my current favorite approach, although I'm not yet that happy about it:
This approach would give us dirnode URIs that look like
URI:DIR:pb$^^t7p44biq3u6i5r5zjpb6cdqxid7v7vpx@192.168.69.247$58845,127.0.0.1$58845/vdrive:w57ncp9cmzyb6kwrjaebq7d8co
and are still 116 characters long.
Please see my suggestion in:
http://allmydata.org/pipermail/tahoe-dev/2007-August/000097.html
This is part of the "improved web API" task. I would like to see it done for v0.6.
I think the next step is for me to propose "compressed furls", possibly also with an implementation, for foolscap. See foolscap trac ticket 24:
http://foolscap.lothar.com/trac/ticket/24
See also ticket #120 and #105, where it is shown that dirnode URIs might need to be pasted into shells, multiplying the number of characters that will cause trouble (e.g. "$"), and emphasizing the usability cost of dirnodes being large.
Title changed from "web POST action requires munged dirnode URI" to "smaller and prettier directory URIs".
This will be mostly fixed by the distributed-dirnodes fix (#115), as dirnode URIs become just like mutable-file URIs. We just need to decide upon a reasonable length for the crypto pieces. The dirnodes will need to have two hash values: one will be used as an AES key, the other is a validation hash.
The first version of #115 is #197.
docs/mutable.txt says:
URI:SSK-RW:b2a(writekey):b2a(verification_key_hash)
URI:SSK-RO:b2a(readkey):b2a(verification_key_hash)
URI:SSK-Verify:b2a(storage_index):b2a(verification_key_hash)
If we make writekey and verification_key_hash each be 256-bit values, then a RW URI would look like this:
URI:SSK-RW:j13ax9dtuxzxim5yg9a7e8xupjqq4t56tdprwi9ryqupid59xa6y:bux13ehzebbokwng7w6wzswyfppog6nqt3ndu3jxoz8kbbkihz4o
If we made writekey and verification_key_hash each be 128-bit, then it would look like this:
URI:SSK-RW:j13ax9dtuxzxim5yg9a7e8xupe:bux13ehzebbokwng7w6wzswyfc
I would be comfortable with reducing the writekey size and the verification_key_hash size to something in the range of 100 bits each:
URI:SSK-RW:j13ax9dtuxzxim5yg9a7:bux13ehzebbokwng7w6w
Most users of these strings won't care about which part is the verification hash and which part is the key (and those users that do care can use slicing), so we could leave out the separator between those two:
URI:SSK-RW:j13ax9dtuxzxim5yg9a7bux13ehzebbokwng7w6w
The ":" stops my double-click from selecting the whole word (which suggests that users might cut-and-paste only the end part, thinking that the "URI:SSK-RW:" is not necessary), so how about the following?
MUTRWj13ax9dtuxzxim5yg9a7bux13ehzebbokwng7w6w
I looked for a special character to put between the "W" and the "j", but I guess special characters have the problem that they get treated specially by text editors -- also possibly by users.
What do you think?
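For the users who do care which part is which, the slicing mentioned above might look like the sketch below. It assumes the roughly-100-bit variant, where each field is 20 base32 characters; the constants and function name are hypothetical.

```python
KEY_CHARS = 20    # 100-bit writekey, 5 bits per base32 character (assumed)
HASH_CHARS = 20   # 100-bit verification_key_hash (assumed)

def split_cap_body(body: str):
    """Recover (writekey, verification_key_hash) from a separator-free body."""
    if len(body) != KEY_CHARS + HASH_CHARS:
        raise ValueError("unexpected capability length")
    return body[:KEY_CHARS], body[KEY_CHARS:]

writekey, vk_hash = split_cap_body("j13ax9dtuxzxim5yg9a7bux13ehzebbokwng7w6w")
```

The catch is that dropping the separator only works if the field widths are fixed by the format, so both sizes would have to be part of the specification.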
Alternately, we can treat the leading parts as meant for user clarification and not actually a necessary part of the URI, so it could be spelled something like
MUT-RW:j13ax9dtuxzxim5yg9a7bux13ehzebbokwng7w6w
and the app would accept input from the user of the form
j13ax9dtuxzxim5yg9a7bux13ehzebbokwng7w6w
and do the right thing with it.
I prefer this last form. I vote for mutable file URIs to look like:
MUT-RW:j13ax9dtuxzxim5yg9a7bux13ehzebbokwng7w6w
Now how does the code distinguish mutable files from mutable directories? We've previously discussed putting that type bit into the URI, but now I think this is a bad idea. Not only because it adds to the size of the URI, but also because if the user accidentally twiddles that bit then they get a file of binary garbage when they were supposed to get a directory. I guess URIs are a little too fragile to hold type bits.
What do you think?
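A rough sketch of the "prefix is only for humans" idea: the app accepts the string with or without the leading label and treats the label as advisory. The function name and the lowercase-alphanumeric body check are assumptions made for illustration.

```python
import re

_BODY_RE = re.compile(r"[a-z0-9]+")  # assumed alphabet of the crypto part

def parse_user_cap(text: str):
    """Return (advisory_prefix_or_None, body) for pasted input."""
    prefix, sep, body = text.strip().rpartition(":")
    if not _BODY_RE.fullmatch(body):
        raise ValueError("malformed capability body")
    # With no colon at all, rpartition() puts the whole input in `body`.
    return (prefix if sep else None), body
```

Both "MUT-RW:j13ax9dtuxzxim5yg9a7bux13ehzebbokwng7w6w" and the bare "j13ax9dtuxzxim5yg9a7bux13ehzebbokwng7w6w" yield the same body, so the real type information would have to come from somewhere other than the pasted string.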
Hm. Actually, I feel unease. This demonstrates that I'm not really perfectly comfortable with 100-bit crypto values. Not, of course, that I'm worried about attackers brute-forcing something on the order of 2^100 computations, but I'm worried about bugs and novel attacks which reduce the effective strength, or leak partial information.
So how about 128-bit write keys and 127-bit verification hashes?
MUT-RW:j13ax9dtuxzxim5yg9a7e8xupegp6mfd17yrgbkoe5su4164oyi
If we had 128-bit verification hashes, then it would look like this if the last bit was 1:
MUT-RW:j13ax9dtuxzxim5yg9a7e8xupegp6mfd17yrgbkoe5su4164oyio
and this if the last bit was 0:
MUT-RW:j13ax9dtuxzxim5yg9a7e8xupegp6mfd17yrgbkoe5su4164oyiy
It doesn't seem worth it to use a whole character (which is "o" or "y") to represent one bit.
Likewise, you could enlarge the key from 128 to 130 bits and the verification hash from 127 to 130 bits, at the cost of adding one character to the URI.
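The arithmetic behind these lengths is just "5 bits per base32 character, rounded up". A small sketch, assuming the key and hash bits are concatenated before encoding (the helper name is hypothetical):

```python
def base32_chars(bits: int) -> int:
    """Number of base32 characters needed to hold `bits` bits (ceiling of bits/5)."""
    return -(-bits // 5)

for label, bits in [
    ("128-bit key + 127-bit hash", 128 + 127),
    ("128-bit key + 128-bit hash", 128 + 128),
    ("130-bit key + 130-bit hash", 130 + 130),
]:
    chars = base32_chars(bits)
    unused = chars * 5 - bits  # bits of the last character that carry nothing
    print(label, chars, unused)
```

255 bits fill 51 characters exactly. 256 bits need 52 characters with 4 bits of the last one unused, and 260 bits fill those same 52 characters completely, which is why going to 130+130 costs only one character over 128+127.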
My current ideas for URI format (which we should probably rename "printable
representations of filenode/dirnode access capabilities" or something more
accurate):
Directories which use these files as a backing store then use a short prefix
to indicate how the file contents should be interpreted:
DIR_readkey_uebhash: immutable directory tree, i.e. "Virtual CD" #204
DIV_storageindex_uebhash: verifier for virtual CD
DMW_writekey_pubkeyhash: normal read-write dirnode
DMR_readkey_pubkeyhash: read-only dirnode (still mutable by others)
DMV_storageindex_pubkeyhash: dirnode verifier
DLW_stuff: large dirnodes
Each of these formats should have an internal binary representation, which is
the "non-printable serialized filenode/dirnode access capability", and that
is the form that should be stored in dirnodes. The printable forms should
just be used by external APIs and UI tools like the web interface. The binary
representation should probably start with a single non-printable byte so we
can have code that accepts both printable and non-printable forms.
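As a sketch of how code might accept both forms: look at the first byte, and fall back to parsing the underscore-separated printable form. The marker value 0x00 and the function name are purely hypothetical, and DLW is left out because its fields are not specified above.

```python
# Three-letter prefixes from the list above (DLW omitted: layout unspecified).
DIR_PREFIXES = {
    "DIR": "immutable directory tree",
    "DIV": "verifier for immutable directory tree",
    "DMW": "read-write dirnode",
    "DMR": "read-only dirnode",
    "DMV": "dirnode verifier",
}
BINARY_MARKER = 0x00  # hypothetical non-printable leading byte

def classify_cap(cap: bytes):
    """Return ("binary", payload) or (prefix, field1, field2)."""
    if not cap:
        raise ValueError("empty capability")
    if cap[0] == BINARY_MARKER:
        return ("binary", cap[1:])
    # Printable form: exactly three underscore-separated pieces.
    prefix, first, second = cap.decode("ascii").split("_")
    if prefix not in DIR_PREFIXES:
        raise ValueError("unknown capability prefix: %r" % prefix)
    return (prefix, first, second)
```

Because no printable form starts with a non-printable byte, the two representations cannot be confused with each other.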
Oh, and of course the use of underscores in those URIs is to allow double-click to select the whole URI (versus the current colons, which most systems treat as word breaks). That will make it easier to cut-and-paste URIs into and out of tahoe UIs like the web page. It might also make URIs less vulnerable to wrapping and corruption by things like MUAs and mailing list software.
We should check to see if that actually works on all our platforms of interest.
Oh, and it might be a good idea to declare that all places you can paste in a URI (like on a web page) will remove all whitespace (both inside and out), to allow the pieces of a wrapped URI to be reassembled. I'm not sure how reliable that would be, though.
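The whitespace rule is simple to state in code. A minimal sketch (the function name is hypothetical):

```python
def strip_all_whitespace(pasted: str) -> str:
    """Remove every whitespace character, including ones inside the string."""
    # str.split() with no arguments splits on any run of whitespace and
    # discards leading/trailing whitespace, so joining reassembles the pieces.
    return "".join(pasted.split())

wrapped = "  DMW_j13ax9dtuxzxim5yg9a7\n    _bux13ehzebbokwng7w6w \n"
print(strip_all_whitespace(wrapped))
```

This only helps when the wrapping inserted plain whitespace. It does nothing for quoting markers such as "> " or for characters that were dropped rather than added, which is part of why the reliability is uncertain.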
Firefox on Macintosh breaks word-selection on underscore.
I think that separating the different crypto pieces from each other is more useful for tahoe hackers than for tahoe users.
I still don't know if we intend for the "is this a file or a directory" typing information to be present only in the URI, or also elsewhere, i.e. what is called a "URI extension block" in the context of CHKs.
I think we ought to do the latter (store that information in a place where it is quite inconvenient for a user to change it) and make the typing information in the URI be optional/advisory.
If it would cause real problems for the user to mangle or omit the typing information in the URI, then I think it ought to be glommed onto the crypto information with no intervening special characters. (Although it is okay for the typing information to be capitalized and the crypto information to be lowercase.)
I've been reading about key lengths (http://keylength.com and Ferguson & Schneier's Practical Cryptography among other sources), and worrying about the long-term security of smaller crypto values.
After all, if tahoe is relied upon as a storage system, then it may well be used for long-term storage. Ferguson & Schneier write that any cryptosystem deployed today might be in use for 30 years, and that once it is decommissioned, it ought to continue to provide backwards confidentiality for at least 20 more years.
Symmetric encryption keys of size 128 or so bits seem likely to last for 50 years, but secure hash values of 128 or so bits might not last for 30 years, in part because secure hashes and SHA-256 have not been really studied and optimized by cryptographers the way that symmetric ciphers and AES have. (Ferguson & Schneier wrote in Practical Cryptography -- 2003 -- that they generally regard the public crypto community as knowing as much about secure hashes as they knew about symmetric ciphers in the 1980's.)
Then I had a bit of a brainstorm -- tahoe capabilities can be canonically defined as containing full 256-bit SHA-256 outputs, like this:
MUT-RW:upyf5nwrpccqw4f53hiidug96663eo5qq4hna4prbragh9e554eou7tqn1ife4tiiuw5eu73ihiia
but can be truncated for human convenience, e.g. to 128-bit hash values, like this:
MUT-RW:upyf5nwrpccqw4f53hiidug96663eo5qq4hna4prbragh9e554eo
The neat thing about this is that you can store the full hash in long term storage (for example, in tahoe directories pointing at other tahoe directories or files), but use the truncated form for short-term exchange through user-friendly tools like IM and e-mail.
Obviously there is a risk that someone stores the short form and wants to use it many years hence and therefore incurs more risk that the resulting file has been substituted by an attacker, but people who are conscious of the fact that they are storing a tahoe cap for the long-term can easily use the full form.
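Checking against a truncated hash could work roughly like this: compute the full hash, then compare only as many leading bytes as the capability carries. The sketch uses plain SHA-256 as a stand-in for whatever hash construction the real format would specify, and the 16-byte minimum and the function name are assumptions.

```python
import hashlib
import hmac

MIN_HASH_BYTES = 16   # assumed floor: 128 bits
FULL_HASH_BYTES = 32  # SHA-256 output size

def matches_truncated_hash(data: bytes, expected_prefix: bytes) -> bool:
    """True if SHA-256(data) begins with the (possibly truncated) expected hash."""
    if not MIN_HASH_BYTES <= len(expected_prefix) <= FULL_HASH_BYTES:
        raise ValueError("hash must be between 16 and 32 bytes")
    full = hashlib.sha256(data).digest()
    # Only the bits the capability actually carries get checked.
    return hmac.compare_digest(full[:len(expected_prefix)], expected_prefix)
```

The same stored object validates against both the long and the short form, so the choice of length is made by whoever hands out the capability, not by the storage layer.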
Neat idea. I've been pondering doing something like this with foolscap tubids
to allow people to get shorter FURLs.
The implementation details would include:
bits get checked. If you want to play fast and loose, leave the hash blank.
putting SI in the URI: this would allow storage servers to verify their
own shares up to the signature, and gives us more options to protect
against people uploading bogus data in the future. We'd need to declare
some minimum length for the SI in this case (enough to provide adequate
collision-resistance for billions of files), but that can still give
some flexibility of how many bits of the hash(pubkey) you need to
paste into an email
The main concern that I'd have would be the usual consequences of hash
collisions:
have the first N bits of their UEB hashes be equal
N bits of the UEB hash
contract as referenced by the URI
from the grid, then upload the bad contract
contract
The obvious answer is to tell people to not bind themselves to anything with
an insufficiently long hash.. there are "secure" URIs and "insecure" ones.
Not a major concern, but we'd want to make sure to document safe handling
procedures for URIs w.r.t. the strength of their identification properties.
Our current plan is to use the new crypto scheme described in #217 -- "better crypto for mutable files -- small URLs, fast file creation" so that we can have only one crypto value in a capability, make crypto values 256 bits long, and use base-62 encoding so that the resulting strings are still double-clickable and googlable.
A related change is to stop calling them URIs! They are "caps". caps! caps! caps! Yay, caps!
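To give a feel for the base-62 plan, here is a minimal fixed-width encoder for a 256-bit value. The alphabet order (digits, then uppercase, then lowercase) and the function names are assumptions; the ticket does not specify them. Since 62^43 is slightly larger than 2^256, 43 characters always suffice.

```python
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
WIDTH = 43  # smallest width with 62**WIDTH >= 2**256

def b62encode(value: bytes) -> str:
    """Encode exactly 32 bytes as 43 alphanumeric characters."""
    if len(value) != 32:
        raise ValueError("expected a 256-bit value")
    n = int.from_bytes(value, "big")
    chars = []
    for _ in range(WIDTH):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))  # most significant digit first

def b62decode(text: str) -> bytes:
    """Reverse b62encode(); raises on bad characters or out-of-range values."""
    if len(text) != WIDTH:
        raise ValueError("expected %d characters" % WIDTH)
    n = 0
    for ch in text:
        n = n * 62 + ALPHABET.index(ch)
    return n.to_bytes(32, "big")
```

The output contains only letters and digits, so there is nothing for a word-selection heuristic or a shell to trip over, at the cost of being case-sensitive.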
Tagging issues relevant to new cap protocol design.
Er, isn't the description of this ticket about something that was fixed long ago?
I don't think there's anything remaining here that isn't covered by #882 and #432.