reject mutable children when *reading* an immutable dirnode #833
Labels
No Label
0.2.0
0.3.0
0.4.0
0.5.0
0.5.1
0.6.0
0.6.1
0.7.0
0.8.0
0.9.0
1.0.0
1.1.0
1.10.0
1.10.1
1.10.2
1.10a2
1.11.0
1.12.0
1.12.1
1.13.0
1.14.0
1.15.0
1.15.1
1.2.0
1.3.0
1.4.1
1.5.0
1.6.0
1.6.1
1.7.0
1.7.1
1.7β
1.8.0
1.8.1
1.8.2
1.8.3
1.8β
1.9.0
1.9.0-s3branch
1.9.0a1
1.9.0a2
1.9.0b1
1.9.1
1.9.2
1.9.2a1
LeastAuthority.com automation
blocker
cannot reproduce
cloud-branch
code
code-dirnodes
code-encoding
code-frontend
code-frontend-cli
code-frontend-ftp-sftp
code-frontend-magic-folder
code-frontend-web
code-mutable
code-network
code-nodeadmin
code-peerselection
code-storage
contrib
critical
defect
dev-infrastructure
documentation
duplicate
enhancement
fixed
invalid
major
minor
n/a
normal
operational
packaging
somebody else's problem
supercritical
task
trivial
unknown
was already fixed
website
wontfix
worksforme
No Milestone
No Assignees
3 Participants
Notifications
Due Date
No due date set.
Reference: tahoe-lafs/trac-2024-07-25#833
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
In #607 (DIR2-CHK), Zooko suggested that in addition to refusing to allow mutable children when creating an immutable dirnode, the tahoe code should guard against buggy/noncompliant implementations and complain when it sees a mutable child in a pre-existing immutable dirnode. The code should log an error of some sort and ignore the child.
When I implemented #607, I didn't do this. It should be done. The trickiest part is testing it, and deciding where to send the error message.
This is a security (integrity) issue, because if I have an immutable directory, I should be able to assume collision-resistance for the whole of the subtree beneath that directory.
I agree and I think we should do this before releasing immutable directories in Tahoe-LAFS v1.6.
To clarify, I think this is a critical security issue because if you
tahoe cp -r $IMM_DIR_NODE ./local
and then give$IMM_DIR_NODE
to your friend, and she alsotahoe cp -r $IMM_DIR_NODE ./herlocal
, then you can be assured that she has all the same stuff that you do, even if the original creator of the directory that you are copying tries to trick you so that you and your friend get different results. This is the "deep" analogue of #491 (URIs do not refer to unique files in Allmydata Tahoe).A consequence of fixing this would be that you can be almost sure that an immutable directory represents a directed acyclic graph, because it should be infeasible to create a cycle of hashes. This seems like a potentially useful property.
On the phone, Zooko and I discussed the following plan:
type="unknown"
instead oftype="filenode"
ortype="dirnode"
. This will cause the CLI "tahoe ls" command to display the object with a question mark instead of as a file or directory. The dictionary of properties on the unknown object will still includerw_uri
andro_uri
.This will raise a barrier in front of casual access to these "illegal" files, and any user with a conforming client can be confident that their
tahoe cp -r $IMM_DIR_NODE ./herlocal
command will output the same files that your command (using your conforming client) did. If they really try hard (by manually retrieving the caps and linking them into a new parent directory), they can copy these illegal mutable files, but Tahoe's standard tools won't do it for them. (i.e. we don't have to completely hide these objects, we just have to make sure that "tahoe cp" and "tahoe get" and the WUI won't give you their contents)davidsarah: yeah, immutable directories should always (modulo broken hash functions) represent DAGs.. that should be part of the definition.
Replying to warner:
I'm a bit concerned that this approach would make it too easy to inadvertently write a webapi client that fails to enforce the restriction. Also, having to duplicate the check in all frontends seems like an indication that it is being checked at the wrong layer, i.e. that the webapi should be enforcing it instead.
Is there any reason why such entries need to be accessible at all? It doesn't seem different from any other incorrectly encoded directory entry, which might not be accessible.
I guess it comes down to how we deal with caps-from-the-future. We can't distinguish between mutable-caps-from-the-future and immutable-caps-from-the-future, since we can't figure out anything about caps-from-the-future other than the fact that we don't understand them. So if we want to promise users who are listing an immutable directory that they'll only see immutable caps, we'd have to filter out anything that we don't explicitly recognize as immutable, which means removing all caps-from-the-future.
This would mean that "tahoe ls" couldn't even acknowledge the existence of such caps. Older clients would be more likely to overwrite them if they can't see them at all. And it limits the ability of other tools (perhaps smarter or more up-to-date than the tahoe node they're using) to do something with those caps.
I dunno what's the best thing to do. I agree that it feels a bit leaky, but I'm also uncomfortable with censoring out the existence of caps-from-the-future.
Replying to warner:
Not necessarily. If the directory is immutable, we could just omit
rw_uri
.Hey that's a good idea! :-)
That is a good idea, and an explicit form of it should go into
source:src/allmydata/web/directory.py#L801 (in
DirectoryJSONMetadata._got
), to stomp out therw_uri
return valueif the directory being listed is readonly.
The directory unpacking function (source:src/allmydata/dirnode.py#L256, in
DirectoryNode._unpack_contents
) will only return arwcap
value ifthe dirnode has a writekey, which is only the case for mutable dirwritecaps
(neither mutable dirreadcaps nor immutable directories have a writekey). But
if the
rocap
value happens to really be a writecap (like if somenon-conformant implementation placed one there),
DirectoryNode.list
would return a writeable filenode, and then the webapi directory listing
would get a
rw_uri
value, and would return it to the client without acheck to explicitly stomp it out.
(I hadn't previously considered the problem of writecaps stored in the rocap
column.. for recognized caps, we can block them, but of course we can't
really say anything about caps-from-the-future).
But I don't think that's enough. If our claim is that "each time a conformant
Tahoe client copies out the contents of an immutable directory, they will be
exactly the same", then only serving
ro_uri
isn't strong enough(
ro_uri
is always read-only, but not always immutable). If somenon-conformant implementation puts a mutable readcap into an immutable
directory's
ro_uri
column, then a conformant client which only reportsro_uri
will still enable "tahoe cp" to copy from something mutable, andthen users could see different contents on different copies.
(If we'd decided to provide an
imm_uri
in addition toro_uri
andrw_uri
, then maybe we could have omitted everything butimm_uri
,but we didn't, and besides I think it would have made the dirnode columns
even more complex).
You know, we really should include an
imm_uri
, shouldn't we? I guess the way we are currently doing it means "If you are listing a mutable dir, then thero_uri
is a mutable child, and if you are listing an immutable dir, then thero_uri
is an immutable child.". Is that right? But what if there is an immutable child in a mutable dir?So in addition to whatever plan we settle on for reliably excluding known-mutable children, is there any use in predicting what format caps from the future might use to indicate read-only, read-write, and immutable? In my Tahoe-LAFS plugin for TiddlyWiki I made it recognize the current cap format and also:
http://allmydata.org/source/tiddly_on_tahoe/trunk/tahoe_tiddly/TahoePlugin.js
I'm not sure that does any good, in particular because we may end up using different indicators for cap-type, maybe in part because having a separator char there, while good for human eyeballs, is bad for selecting with double-click of a mouse...
Anyway, if you could think for a minute about forward-compatibility and make sure that such things are not of use in this case and state conclusively that Tahoe-LAFS v1.6 should treat all unrecognized caps (whether from the future, corrupted, or malicious) that it finds as the children of immutable directories as mutable and unreadable that would make me feel better. :-)
hm. I guess immutability is a property of the object, while readability/writeability is a property of the cap (think of the cap as a facet here). There's no such thing as an "immutable cap" to a mutable object.
So maybe instead of "imm_uri" we should put another property on the JSON stuff we return, an "immutable" boolean.
Of course, for caps-from-the-future, we'd leave this empty, because we don't know. Then we'd need to decide whether the webapi promises
all([for x in immdir.list()]x.immutable) == True
(in which case it would be obligated to strip out everything except known-immutable children), or whether it promises something smaller.WRT to anticipated future caps, I can currently imagine append-only caps, write-with-storage-authority caps, verify/repair caps, destroy caps, revocation caps, and others. Very few of these cleanly break down into mutable/immutable, or read/write. So I don't see an overwhelming amount of value to trying to anticipate them too much, and therefore having current code attempt to deduce much of anything about caps-from-the-future.
So yes, I believe that 1.6 should treat all unrecognized caps as unknown, and any operation that promises to only return caps that are known to have some particular property should be obligated to either filter out unrecognized caps or mark them in some way such that clients can easily tell which are which.
I'm still undecided as to whether marking nodetype==
unknown
or removing them entirely is better.If I understand correctly an
"mutable"
flag is a good idea, and should perhaps be quickly added into v1.6 in the interests of forward-compatibility, but is orthogonal to this issue of how to exclude mutable children from the listing of a mutable directory.What if we had a
"bad_children"
element in the"dirnode"
JSON, adjacent to the current"children"
element: source:docs/frontends/webapi.txt?rev=4112#L509. This would give the programmer all the information about the bad children while making it very unlikely that someone would accidentally include bad children in among the good children. :-)Oh, and although I really like the name
"bad_children"
, I guess it also includes good children from the future, so we ought to name it something like"unrecognized_children"
. How is a programmer to know (possibly just out of curiousity) why each child is in that set? Perhaps there is a flag on each child saying either"mutable=true"
or"known_format=false"
or something?Oh, what about having two separate keys next to
"children"
--"bad_children"
for mutable children of an immutable directory and"unrecognized_children"
for children from the future or corrupted.I guess it boils down to whether we want the implementation written by a lazy programmer to exclude the children entirely from the listing or to include them but not make them "live" (clickable, cd'able, cp'able, etc.) -- just make them "greyed-out".
If we want them excluded, then they shouldn't appear as keys under
"children"
at all. If we want them included but "greyed-out", then they should appear as keys under"children"
but they should not have a"ro_uri"
or"rw_uri"
key.So maybe this means that mutable children of an immutable directory should appear in a separate
"bad_children"
key or shouldn't appear at all, while unrecognized children (hopefully from the future) should appear in the"children"
key while having no"ro_uri"
or"rw_uri"
.Brian wrote:
David-Sarah wrote:
I wrote:
No wait, that's a bad idea. A read-only cap to a mutable directory is not a read-cap to an immutable directory. I keep getting confused about this, and in the history of Tahoe-LAFS Brian and I have been confused about it more than once. I'm going to get a post-it note and affix it to my monitor, saying:
"READ-ONLY DOESN'T MEAN IMMUTABLE."
Brian and I chatted about this on IRC yesterday. At the end I got sleepy and confused about what constraints we impose on the web gateway with regard to the JSON-formatted directory listings that it delivers to web clients. It is a question of forward-compatibility -- today's web clients may read JSON-formatted directory listings from web gateways from the future. This is a separate but related issue to today's web gateways (which are also Tahoe-LAFS storage clients) reading distributed directories that contain children inserted by other Tahoe-LAFS storage clients from the future. That latter issue is what most of our thinking on this ticket has been about so far.
I'm not sure what to think about that whole idea -- I haven't thought it through yet.
However, now in the light of morning one particular issue that seems clear to me is that the test that today's web gateway uses to determine whether a given child is a recognized or unrecognized child should be as specific as possible as long as it never marks a child from today as unrecognized. Brian tells me that
uri.from_string("URI:CHK:<a>look at me I'm evil<a>")
raises aBadURIError
instead of what we would probably prefer which is to return anUnknownURI
. I don't see why it does that from looking at the code. Here is a map to the relevant code:The web gateway (which is a Tahoe-LAFS storage client) unpacks the directory contents with [its internal _unpack_contents() method]source:src/allmydata/dirnode.py@4119#L236 and passes the result to its caller from [its list() method]source:src/allmydata/dirnode.py@4119#L302. Already looking at the code I want more documentation here. Could
list()
grow a docstring that tells the caller more about what sorts of things it might find in the caps of the children that it returns?So the caller in this case takes the caps and passes each one to [NodeMaker.create_from_cap()]source:src/allmydata/nodemaker.py@4106#L47 which returns
UnknownNode
in case both itswritecap
andreadcap
parameters are empty and then calls [uri.from_string()]source:src/allmydata/uri.py@4101#L503.uri.from_string()
''looks'' to me like it is intended to, and does, return anUnknownURI
instance when the cap is not any of the known formats, but Brian says otherwise.So my proposals right now are:
DirectoryNode.list()
.uri.from_string()
to returnUnknownURI
whenever the cap is clearly not a well-formed cap of a known type and add a unit test of this.For reference,
uri.from_string("URI:CHK:<a>look at me I'm evil</a>")
starts by dispatching on a string prefix: we see theURI:CHK:
prefix and stop considering anything else (includingUnknownURI
). Then the CHK-specific handler class applies a regexp to thelook at me
part and throwsBadURIError
.The requirement is that any new format we introduce must not share one of our previously-claimed prefixes. The prefixes we've defined so far all look like
/^URI:A-Z\-+:/
, but I imagine we'll addtahoe://
or something in the future, and probably some binary formats that will also fit into this parse tree because they start with something non-ascii.So there's plenty of room for new prefixes, but not room for new shapes on current prefixes.
Oh, so this suggests a further refinement: child links that have caps that begin with one of our defined prefixes such as
URI:CHK:
but which are not well-formed according touri.from_string()
will not be treated as child links potentially-from-the-future but instead as bad child links, the same as child links which try to provide mutable access to you when they are in a mutable directory.Zooko, Brian and I discussed this on IRC, and initially came up with the following set of options. All the options have in common that when a child link of an immutable directory is recognized as being mutable, it should just be omitted. They differ on what should happen with unknown child links (i.e. caps from the future):
rw_uri
andro_uri
from its JSON representationro_uri
unknown_ro_uri
field that is used instead ofro_uri
. (For mutable directories, unknown children would have bothunknown_rw_uri
andunknown_ro_uri
fields.)lafs://
), so that non-immutable caps can be excluded without having to understand their format.There are also variations of each option according to whether the restriction is implemented in the webapi code or in DirectoryNode. If it is implemented in the webapi, then read-modify-write operations will preserve unknown caps that aren't directly affected by the operation. I think we decided that this is preferable because it loses data in fewer cases -- although it may introduce some inconsistencies between frontends, unless we manage to fix that for 1.6.
There was concensus that RO_URI option is unsafe (i.e. fails to address the issue in this bug), because it would be too easy for a
ro_uri
to be passed on to another client that understands it, but having lost track of the constraint that it is supposed to be immutable.We had already excluded FORWARD_COMPAT_METADATA on the basis that it requires making decisions that we don't have time to consider carefully enough for 1.6. UNKNOWN_RO_URI has the advantage of not throwing away information, but we decided it was too complicated to implement and review in the time we have. OMIT for 1.6, and FORWARD_COMPAT_METADATA in 1.7, seemed like a good compromise.
Overnight I thought of the following additional option, which has similar forward-compatibility advantages to FORWARD_COMPAT_METADATA but has fewer constraints on the future URL design:
ro_uri
with "imm.
" (and omitrw_uri
).Note that this doesn't imply that the new format would have to use "
imm.
" -- that prefix would only occur on URLs from the past. When we add support for the new format, the future client would see the "imm.
" prefix and check that the cap is immutable; if it is then it would work correctly.(We might end up registering both "
lafs:
" and "imm.lafs:
" in order to make this work in more cases, but I don't think that's a problem, and it's not strictly necessary. Anyway, there are already URI schemes containing '.
'.)In retrospect this is similar to something Zooko suggested on IRC ('mangle the cap by prepending "I'M NOT REALLY A CAP"'), except that "
imm.
" just means that the future client must check that it is an immutable cap.The effect of prepending "
imm.
" with the current webapi is to display"GET unknown: can only do t=info, not t="
, which is safe but ugly. We should fix that (it's ticket #884).Yay! Something that I like about PREFIXED_RO_URI is that it could perhaps be used for writing out child links of unrecognized types into new tahoe dirs. This would be a safe, forward-compatible, good-user-experience solution to #839 (Copying directories containing caps from the future). It would also enable something that I would like: that you get the same behavior with respect to child links of unknown types and illegally mutant child links whether you are viewing a dir through the WUI or the the WAPI, copying a dir with "tahoe cp" or WUI/WAPI commands, or manipulating a dir (such as by deleting or renaming extant child links or adding a new child link).
Replying to zooko:
Oh, I hadn't thought of that. I suspect we won't end up doing it that way if we wait until 1.7 to fix #839, though, because at that point we'll probably know enough about what the new cap URLs are going to look like to distinguish mutable from immutable, and therefore we can refuse outright to copy future mutable caps. Copying them as
imm.
-prefixed URLs has the disadvantage that you're granting authority to read subsequent updates to the original mutable object (to someone who strips off theimm.
), which a deep-copy operation normally wouldn't do. So this would not be completely safe.Yes, that's a big advantage.
Note: the intent of the patch I'm writing for this ticket is not to alter the current behaviour that a client cannot change or add a dir entry containing a cap that it does not understand. (It would not be able to attenuate the cap in case it is a write cap.) grep for
CannotPackUnknownNodeError
to see where this is enforced.It would however be safe to allow a client that is rewriting the whole directory to preserve entries containing caps that it does not understand. I'm not sure yet whether I'll be able to do that for 1.6.
Replying to davidsarah:
I've changed my mind on this. In all but two of the webapi interfaces that involve creating or changing child dirnodes, the operation either creates the caps that it adds (in which case they are known), or is given them in the format {"rw_uri": ..., "ro_uri": ...}. So if the webapi client knows how to attenuate the cap (or already has it in that form) but the webapi server -- i.e. storage client -- doesn't, then that's OK.
The two (almost equivalent) interfaces that don't take a cap with separate "rw_uri" and "ro_uri" slots, are [PUT /uri/$DIRCAP/SUBDIRS../CHILDNAME?t=uri]source:docs/frontends/webapi.txt#L647 and POST /uri/$DIRCAP/SUBDIRS../?t=uri&name=CHILDNAME&uri=CHILDCAP, i.e. the "hard link existing cap" operations. In that case we don't know how to attenuate the cap, so we cannot safely accept an unknown cap to be added.
However, I don't think there's any reason to restrict the other operations. We obtain no security benefit from doing so, because a malicious client (with the relevant writecap) can always write whatever it likes into a directory. The important thing is to make sure that clients who obtain caps from the directory are not confused.
So, I propose to remove the following check in source:src/allmydata/dirnode.py#L402 (and similarly at L427):
(and remove
CannotPackUnknownNodeError
). Instead, the logic used by a webapi server when packing (encoding) and unpacking (decoding) a dirnode will be as follows:If the directory is immutable,
rw_uri
slots (i.e. silently ignore whatever was there) when unpacking.rw_uri
slot was present when packing.ro_uri
is unknown, strip any "ro." or "imm." prefix and then prepend "imm.".If the directory is mutable,
rw_uri
slots when packing and unpacking.ro_uri
is unknown and it does not already have a "ro." or "imm." prefix, then prepend "ro.".An URI is "unknown" if [
from_string
in uri.py]source:src/allmydata/uri.py#L504 returns an instance of UnknownURI.So the effect is that the URI returned in the
ro_uri
slot will be unusable by correct clients if it does not meet the constraint that it should have met to be in that slot -- i.e. being either read-only or deep-immutable.The main remaining issue I haven't quite decided on, is that if we're removing the restriction on adding unknown URIs to a directory, we need to consider what consistency properties should be guaranteed for reads of mutable directories. If you read a directory entry or entries in a single webapi operation, then the
rw_uri
andro_uri
slots should be consistent with each other -- i.e. if both are present then they point to the same object, andro_uri
is the read-only version ofrw_uri
(assuming you trust your gateway to ensure that). Actually I'm not sure whether the current code guarantees that, but I think it does.If unknown caps (i.e. unknown to the webapi server) are allowed to be returned in directory reads, then the webapi can't ensure this property for such caps. I'm leaning toward just documenting that.
(Note that if we read the directory twice, then the two reads are not guaranteed to be consistent. This issue doesn't arise for immutable directories because only the
ro_uri
slots will be present.)I'm not sure I understand everything about your current plan. However, I think I understand that the Tahoe-LAFS storage client has to enforce constraints on the semantics of a directory's child links when reading the directory, whether it is doing so in order to display the directory in the WUI, send a copy of the directory (in JSON form) back through the WAPI, or (???) to write a copy of the directory into a new directory in Tahoe-LAFS. (I guess it doesn't have to enforce security properties on read when the purpose of the read is to make a shallow copy of (a subset of) the child links into a different Tahoe-LAFS directory, but I feel like it "should" do so in order to be consistent and parallel with the other two targets that the information could be headed toward: the WUI and the WAPI.)
But the Tahoe-LAFS storage client doesn't need to enforce any security constraints when writing directories -- a malicious client could always write arbitrarily bad things into Tahoe-LAFS directories and no-one should rely on the assumption that the Tahoe-LAFS directory that they are using was written by a client that enforces some security constraints.
But the storage client should enforce constraints to help the programmer realize that they're trying to do something wrong, when it can tell that they are. That's why it will refuse to put a child link of a known type which is mutable into an immutable directory, but it will not refuse to put a child link of an unknown type into an immutable directory.
Okay, so the practice of prepending an
imm.
to a cap which it extracts from an immutable directory is best understood as a security constraint that the Tahoe-LAFS storage client enforces on child links of immutable directories when it reads them. It needs to inform any other (correct) Tahoe-LAFS clients that eventually receive that cap that the cap came from an immutable context. The prepending ofimm.
is a way to mark the context from which that cap came.The reason we can't prepend
imm.
to every child link from an immutable directory andro.
to every link from aro_uri
slot is backwards-compatibility -- existing Tahoe-LAFS storage clients need to read directories and find caps of known types there and they won't recognizero.
orimm.
.This suggests a forward-compatibility improvement: if you are a Tahoe-LAFS storage client ("gateway", "node") and I give you a cap and it has
imm.
on the front of it then you should check whether it is of a type that you recognize and you can tell that the type is immutable. If so, you can pop off theimm.
and use the cap. Likewise withro.
. If Tahoe-LAFS v1.6 storage clients always perform that check, then this may free up future versions of the Tahoe-LAFS storage client to prependimm.
even to old-type child links (from the Tahoe-LAFS v1.6-era), which would be simpler and more consistent than prependingimm.
only to child links whose type you don't recognize.Replying to davidsarah:
I'm not sure either. Let's see: source:src/allmydata/nodemaker.py@4106#L47. Yes! If the type is known then it takes the most powerful cap and constructs a Node with that, so it then performs its own diminish to get the ro-cap from the rw-cap.
(Note: this is an example of why off-line diminish can be important! Some designs require a round trip with a storage server before you can compute the ro-cap given the rw-cap. Oh but you could include a copy of the ro-cap with the rw-cap. Oh, but then the Tahoe-LAFS storage client couldn't be sure that the ro-cap pointed to the same file as the rw-cap. I hereby change my mind about off-line diminish -- I used to think it wasn't as important as having very small caps, but now I think it is very important.)
Should the webapi server tag such caps with something like
unk.
so that the fact that they came from a context where this property couldn't be verified is not lost?Replying to [zooko]comment:29:
The constraints are only on decoding future cap URIs. When a future version decodes an URI that starts with "
ro.
" and is followed by the encoding of a future cap, it must check that the cap is read-only. When it decodes an URI that starts with "imm.
" and is followed by the encoding of a future cap, it must check that the cap is immutable. If those constraints are not met then decoding must fail (but it is still safe to handle the URI in undecoded form).There are no constraints on current clients, or on handling URIs as opaque strings.
(If an entity were to strip off a "
ro.
" or "imm.
" prefix without checking the constraint, it could confuse itself, so it shouldn't do that.)Correct, because it's not decoding those child URIs.
That would mean that a directory operation could have side-effects on child links that it isn't defined to alter. No component needs to check URIs that it isn't decoding, so this would be consistent between components.
Right.
Yes. (Currently it does refuse to put child links of unknown types into any directory, but this patch would change that.)
As I said, the constraint is only necessary on decoding. A future version of the storage client could redundantly check the constraint when it passes on an URI to a webapi client, and it should probably do so even if only to reduce the length of the URI (since it can then strip off the 3 or 4 character prefix).
Exactly.
OK, I've made this improvement in my patch.
Replying to [zooko]comment:30:
Except that if the most powerful cap is of unknown type,
_create_from_cap
(note the underscore) will returnNone
, socreate_from_cap
will returnUnknownNode(writecap, readcap)
, which doesn't ensure that the cap pair is consistent.So, the proposed behaviour for 1.6 is identical to the current behaviour here: known cap pairs retrieved from the webapi are consistent, but unknown ones may not be.
What's important here is off-line diminish from a writecap to a readcap. All the proposed designs support that. What might not be supported is off-line diminish from a short readcap to a verifycap. (Elk Point does not support that in cases where the readcap is too short to provide sufficient integrity.)
What would the webapi client do with that fact?
imm.
andro.
are associated with specific conformance requirements, butunk.
wouldn't be -- it would just be a hint that the webapi client shouldn't assume something. It is easier to document that it shouldn't ever make that assumption. (Note that a webapi client could always decode the caps and check that they are consistent.)Replying to [davidsarah]comment:32:
... although it should probably avoid decoding caps since that is the webapi server's responsibility. Which reminds me: we don't seem to have either a webapi operation or a CLI command to diminish a cap.
Okay, so the meaning of
imm.
in English is "I found this cap in an immutable context and I couldn't verify whether it is an immutable cap." rather than "I found this cap in an immutable context.". Sounds okay to me.Replying to [davidsarah]comment:32:
Okay, I'm convinced.
Great! Are we out of open issues for this ticket?
I chatted with Scott Kittermann, a Master of the Universe, and he said we still had an excellent chance of getting Tahoe-LAFS v1.6 included in Lucid if we upload it to them on Monday 25th instead of (as per our previous plan) Wednesday the 20th. So let's do it!! :-)
Replying to [davidsarah]comment:31:
I misread your comment -- you said a different directory. However, there's no webapi operation that directly copies caps from one directory to another. Copying is implemented by getting the JSON representation of a directory, and using
mkdir-with-children
ormkdir-immutable
orset-children
to create or modify another directory (I assume; I haven't looked at the implementation oftahoe cp
). So this is covered by the validation of known cap pairs that is done on directory reads.Yes, I think we are out of open issues. Yay for Tahoe 1.6 in Lucid!
If we must use prefixes, let's use
unknown.
instead ofunk.
. We don't need these to be short, and abbreviated-to-the-point-of-incomprehensibility keywords drive me nuts.I don't know if the same argument applies to
imm.
orro.
or whatever.I haven't read the rest of the discussion thoroughly enough to have an intelligent opinion, but this prefix stuff makes me nervous. OTOH, the notion of attaching a "guard" to the cap seems to make sense. I just worry about last-minute changes to a two-year old API.
I've put the prefixes in READONLY_PREFIX and IMMUTABLE_PREFIX constants, so this would be trivial to change. I have no strong opinion on whether to use "
ro.
" and "imm.
" vs "readonly.
" and "immutable.
".I understand the reasons for nervousness, but we're getting quite a lot of forward-compatibility goodness from this set of changes, and we should have sufficient time to review it before the release.
Attachment docs-diff.txt (24647 bytes) added
Documentation patches
Attachment test-diff.txt (173395 bytes) added
Diff for tests
Attachment all-diff.txt (245764 bytes) added
Diff for everything (code + tests + docs)
The changes are larger and less elegant than I'd hoped, and I
failed to resist the temptation to do some refactoring of
test_web.py. The changes to test_web.py that are most relevant,
1.e. excluding the refactoring, are in these functions:
The last of these is a new test that directly checks the
main problem in this ticket; you may want to look at it first.
To review the code patches, I suggest looking at interfaces.py,
unknown.py, uri.py, dirnode.py, and nodemaker.py first.
There are some commented-out print statements left, which I'll
remove before preparing the final patch.
I couldn't figure out how to tell darcs to produce a diff that
excludes certain directories, so all-diff.txt contains the test
and doc diffs as well.
The calls to
rw_uri.strip()
andro_uri.strip()
in dirnode.py should berw_uri.strip(' ')
andro_uri.strip(' ')
,since the [http://docs.python.org/library/stdtypes.html#str.strip Python docs] don't clearly specify which characters are treated as whitespace by default. Actually that probably also applies to uses of
.strip()
elsewhere.Attachment all-with-json-fix-diff.txt (274578 bytes) added
Diff for everything including t=json fix (code + tests + docs)
The t=json option was not working for unknown nodes, so I've fixed that.
(Go to here and click on JSON to see the error message produced by the current code.)
I've read through this ticket, and I want to make sure that I understand what the proposed solution is before I start reviewing it.
The motivating problem (originally, at least) is that immutable directories were allowed to have mutable children. This goes against the expectation that the contents of an immutable directory are themselves immutable, as stated in comment:73504. The problem them expanded to include dealing with how clients from the present deal with caps from the future.
The solution that I think ended up being the one that everyone agreed with was stated in comment:73525. This was further elaborated on in comment:27. Paraphrasing:
rw_uri
slot should be silently ignored, and unknown caps in thero_uri
slot should have anro.
orimm.
prefix removed, and replaced with animm.
prefix. When writing an immutable directory, unknown caps in thero_uri
slot should be prefixed withimm.
if they are not already, and the existence of unknown caps in therw_uri
slot should be cause for an error and a failure. (extrapolating from comment:73524)rw_uri
slot should be passed through as normal. When writing a mutable directory, unknown caps in therw_uri
slot should be passed through as normal, and unknown caps in thero_uri
slot should have any existing prefix removed and replaced withro.
.imm.
orro.
, webapi servers should see if it is a cap that they understand without the prefix. If it is, they should attempt to verify that it matches the prefix -- in other words, that it is immutable if prefixed withimm.
(this is suggested in comment:29).Have I misunderstood anything? Have I missed anything? If not, I'll accept this ticket and start reviewing it.
Replying to kevan:
further elaborated on in comment:27. Paraphrasing:
omitted entirely. When writing an immutable directory, the existence of caps that are mutable should be cause for
an error and a failure.
Yes. This should fail the operation with a MustBeDeepImmutableError.
In an immutable directory, the
rwcapdata
field that would contain the encrypted and mac'drw_uri
should always be empty (zero-length), since the directory has no writekey. In the original patches I ignored this
field, but on reflection I think we should treat a non-empty
rwcapdata
in the same way as a netstringdecoding error, i.e. raise ValueError. That's what the latest patch new-all-diff.txt does.
We also test that an immutable directory's underlying filenode has no
.get_writekey()
method, so that thecode to decrypt this field would fail anyway if it were mistakenly executed.
with an
imm.
prefix.Yes.
if they are not already,
No, this isn't necessary: the
ro_uri
slots of an immutable directory are implicitly immutable. Theimm.
will be added when the slot is read out by a client that doesn't understand the cap format, but it isnot stored. In fact,
ro.
orimm.
will be stripped if present (bystrip_prefix_for_ro
inunknown.py).
(If the prefix were not stripped, then it would be more difficult to test what happens when
.imm
is absent-- this would be a case that the attacker could provoke that would not occur in normal operation.)
(extrapolating from comment:20)
Yes. Actually the existence of any cap in the
rw_uri
slot of an immutable directory is a cause forfailure. However it is valid to provide a
rw_uri
in the JSON representation or in an URL parameter if itcan be diminished to a read cap (handled by existing code in dirnode.py:create_from_cap), or in the latest
version of the patch, if it already had an
imm.
prefix. In either of those cases, it will be moved to thero_uri
slot.When writing a mutable directory, unknown caps in the
rw_uri
slot should be passed through as normal, andunknown caps in the
ro_uri
slot should have any existing prefix removed and replaced withro.
.Not quite. If there was an existing
imm.
prefix, it should stay. (This would happen for an unknown cap thatis copied from an immutable directory to a mutable one. It ensures that the constraint that the cap was supposed
to refer to an immutable subtree is not forgotten.)
Similarly to the immutable case, if an unknown cap is given in a
rw_uri
slot with noro_uri
and thecap already has a
ro.
prefix, then it will be moved to thero_uri
slot. (This makes it possible tolink a
ro.
orimm.
-prefixed unknown cap into an existing directory using the WUI, without needing aJSON request.)
that they understand without the prefix. If it is, they should attempt to verify that it matches the prefix -- in
other words, that it is immutable if prefixed with
imm.
(this is suggested in comment:29).... and read-only if prefixed with
ro.
. Yes.Please do, thanks.
[formatting]fixed
Replying to kevan:
Yes. This should fail the operation with a MustBeDeepImmutableError.
In an immutable directory, the
rwcapdata
field that would contain the encrypted and mac'drw_uri
should always be empty (zero-length), since the directory has no writekey. In the original patches I ignored this field, but on reflection I think we should treat a non-emptyrwcapdata
in the same way as a netstring decoding error, i.e. raise ValueError. That's what the latest patch new-all-diff.txt does.We also test that an immutable directory's underlying filenode has no
.get_writekey()
method, so that the code to decrypt this field would fail anyway if it were mistakenly executed.Yes.
No, this isn't necessary: the
ro_uri
slots of an immutable directory are implicitly immutable. Theimm.
will be added when the slot is read out by a client that doesn't understand the cap format, but it is not stored. In fact,ro.
orimm.
will be stripped if present (bystrip_prefix_for_ro
in unknown.py).(If the prefix were not stripped, then it would be more difficult to test what happens when
.imm
is absent -- this would be a case that the attacker could provoke that would not occur in normal operation.)(extrapolating from comment:73524)
Yes. Actually the existence of any cap in the
rw_uri
slot of an immutable directory is a cause for failure. However it is valid to provide arw_uri
in the JSON representation or in an URL parameter if it can be diminished to a read cap (handled by existing code in dirnode.py:create_from_cap), or in the latest version of the patch, if it already had animm.
prefix. In either of those cases, it will be moved to thero_uri
slot.When writing a mutable directory, unknown caps in the
rw_uri
slot should be passed through as normal, and unknown caps in thero_uri
slot should have any existing prefix removed and replaced withro.
.Not quite. If there was an existing
imm.
prefix, it should stay. (This would happen for an unknown cap that is copied from an immutable directory to a mutable one. It ensures that the constraint that the cap was supposed to refer to an immutable subtree is not forgotten.)Similarly to the immutable case, if an unknown cap is given in a
rw_uri
slot with noro_uri
and the cap already has aro.
prefix, then it will be moved to thero_uri
slot. (This makes it possible to link aro.
orimm.
-prefixed unknown cap into an existing directory using the WUI, without needing a JSON request.)... and read-only if prefixed with
ro.
. Yes.Please do, thanks.
Attachment new-all-diff.2.txt (202060 bytes) added
New diff for everything (code + tests + docs) -- review this version
I've looked through the changes to behavior (with the exception of those to the webapi) and documentation, and have the following comments so far. I'll probably have more after I start working on the remaining parts, but posting what I have now might make this patchset more likely to make it into trunk by Thursday.
interfaces.py looks okay, but I was a bit confused when I read "...and then used" at the end of
MustBeDeepImmutableError
andMustBeReadOnlyError
. This is possibly me being dense after work, but maybe you could clarify the context in which those are being used to trigger that error?The docstring for
strip_prefix_for_ro
should be inside the function, not before it.In the
*init*
method of UnknownNode, you sayif x is not None:
instead ofif x
. I'm assuming that that's to deal with empty string arguments/other arguments that evaluate to False in a comparison?How does this know the difference between an UnknownURI that is an UnknownURI because we don't know what it is, and an UnknownURI that is unknown because we know what it is but it is prefixed with something that makes it nonsensical? From what I can understand by reading the code in UnknownURI, it is possible to get that result in either case. Apologies if this is a case of me being dense.
What is this assertion meant to do? I don't think that anything so far in the code would have assigned anything other than none to self.rw_uri. I don't suppose that it hurts to have it there, aside from maybe meaning that the calling code needs to be more complicated if it wants to handle all of the error conditions that UnknownNode can enter. It also serves as self-documentation, I guess.
Any reason for implementing
is_unknown
in_BaseURI
? Is it anything that might need to be added to theIURI
interface? What aboutis_readonly
,is_mutable
,get_readonly
,get_verify_cap
, which were added to a number of URIs in uri.py. Should they be added to one of the URI interfaces?The spacing on your changes to SSKVerifierURI is inconsistent with your changes in the other URI classes.
I didn't see any problems in nodemaker.py or dirnode.py.
If you're correcting things in webapi.txt, you might try replacing Tahoe with Tahoe-LAFS, which (I gather) is more correct. But maybe that's out of scope for this ticket.
I think it would be easier to read if you changed the from_string_ methods in uri.py to use explicit named keyword arguments instead of the **kwargs form.
I have a block of free time tomorrow, so I'm hoping I will have the rest of the review done by then.
An UnknownNode may have one of these error objects in its .error property. They will only be thrown if node.raise_error() is called. This happens when we try to put the node into a directory (besides tests, the relevant calls to raise_error are in dirnode.py and nodemaker.py).
OK.
Yes. I'll recheck what happens to falsy URIs.
In either case, isinstance(read_cap, uri.UnknownURI) will be true. But only in the latter case will read_cap.get_error() be truthy.
It's intended as documentation. I normally use asserts only for documentation, not for validity checks.
No, I'll remove that. (I originally had an is_unknown method on URIs but changed to using isinstance(..., UnknownURI).)
The existing code was inconsistent; it had these methods for some URI classes and not others. They were already declared in IURI.
Fixed.
Out of scope (there are > 60 uses of "Tahoe" in webapi.txt!)
Will do.
web/common.py looks good.
web/dirnode.py looks good.
web/filenode.py looks good.
web/info.py looks good.
web/root.py looks good.
test/common.py looks good.
test/test_dirnode.py:
Could you add a comment here explaining why this should fail? If I understand correctly, it is because the cap you provide doesn't start with ro. or imm. and since it is unknown we don't know how to diminish it to a readonly cap?
(Few of the tests so far in test_dirnode.py have explanations, but I've been able to understand them without. Explanations aren't a bad idea, though)
I think that FakeMutableFile? needs to implement raise_error.
test_filenode.py:
around line 41,
could be re-written as
I think it reads better that way, anyway.
Similarly for:
just a little below that and for
around line 64 and
in the chunk around line 99 and
in the chunk around line 127.
(test_filenode.py uses failUnlessEqual fairly solidly where it could use those, though, so what you're doing is at least consistent with what is already there)
test/test_system.py seems okay.
test/test_uri.py:
It would be nice to have a comment explaining why the invalid uris are invalid, though it's easy enough to figure out after a few seconds of looking at it. Just a thought.
test_web.py:
I'm assuming that existing tests will already fail if we attempt to add an unknown node using PUT /uri/$DIRCAP/SUBDIRS../CHILDNAME?t=uri and POST /uri/$DIRCAP/SUBDIRS../?t=uri&name=CHILDNAME&uri=CHILDCAP, since that was the existing behavior before your changes.
You've changed the _create_initial_children and _create_immutable_children to include unknown URIs, which threads tests for those nicely through the existing tests.
Other comments:
Can you add a test for the change you made in the chunk starting around line 230 of dirnode.py that references #925?
My checklist of things that I thought there needed to be tests for is appeased, anyway, and I didn't see any other problems, so I guess this is reviewed.
Attachment mutable-in-immutable-darcspatch.txt (242608 bytes) added
Prevent mutable objects from being retrieved from an immutable directory, and associated forward-compatibility improvements
Attachment misc-tweaks-darcspatch.txt (60689 bytes) added
Miscellaneous documentation, test, and code formatting tweaks.
Attachment all-833-darcspatch.txt (282874 bytes) added
Prevent mutable objects from being retrieved from an immutable directory, and associated forward-compatibility improvements (supercedes previous patches)
Attachment changes-since-kevans-review-diff.txt (19654 bytes) added
Changes since Kevan's review
(A version of this reply was sent to the tahoe-dev list while the trac server was down, but note that there are some extra comments about #925 and from_string_* at the end.)
Replying to kevan:
Yes, exactly. Comment added.
Well, they didn't start with explanations :-) This is a good point, but I think I'll need to address it after the 1.6 release.
Actually the tests never call it, but it should have this method for consistency. Fixed.
Changed for all cases in test_filenode.py.
You mean the ones that result in instances of !UnknownURI? (Technically there are no invalid uris any more, because uri.from_string will catch the !BadURIError and construct an !UnknownURI that remembers it. The !BadURIError might be reraised later, though.)
I think that if it's easy enough to work out, that's as it should be. The trouble with putting too many comments in is that reviewers trust the comments, when they shouldn't.
Don't make that assumption! The patch removed some of the restrictions on UnknownNodes, so there's definitely a need to explicitly test this.
In fact the addition should succeed (for a mutable directory) when the URI is prefixed with "ro." or "imm." -- in those cases we don't need to be able to diminish it to a readcap, because it already is one. Tests for the six possible cases (test_{PUT_NEWFILEURL, POST_link}uri_unknown{bad, ro_good, imm_good) added.
Yes, that made it a lot easier to see what needed to be tested.
Now done (test_dirnode.py: test_spaces_are_stripped_on_the_way_out). As mentioned in #925, I changed it to only strip trailing spaces, which is sufficient to address the padding forward-compatibility issue.
Incidentally, I decided not to change from_string_* in uri.py to use explicit named keyword arguments, because I think the code is clearer and more stable under maintenance as it is. I did add a comment that they take the same keyword args as from_string.
The new changes look good -- no complaints from me.
From grepping around the darcs patch, it seems like you've also remembered to remove the commented-out print statements -- if I'm mistaken, don't forget to do that. :)
The patches apply fine against a fresh checkout of trunk (as of about 5 minutes ago), and it is able to build without error. I need to leave for dinner now, but I'll confirm that all of the tests pass once I get back.
...and they pass, so everything looks good here. I'm going to mark this as "reviewed" and change its owner back to davidsarah -- if someone disagrees with that, then feel free to change it back.
Okay, I'm ready to apply the patches, but which of the patches attached to this ticket are the one that should be applied?
Oh I get it, the second-to-last one, which is marked as "supercedes previous patches" and which also seems to include the succeeding patch.
committed in changeset:6057bc02cc33b5b7, changeset:56c00cb381d3cd8c, changeset:b9eda4de6a4e1d3e. Thank you!
Phew! After rereading the past discussion, here are a few clarifications about how 1.6 will behave:
as Brian mentioned in comment:73514, a known writecap should not be allowed in a
ro_uri
slot. That is now prevented and tested in [test_mutant_dirnodes_are_omitted]source:src/allmydata/test/test_web.py?rev=4195#L3337 (seemutant_write_in_ro_child
).as suggested by Zooko in comment:73514, URIs such as "
URI:CHK:<a>look at me I'm evil<a>
" are now treated as unknown (the BadURIError raised by, in this case,CHKFileURI.init_from_string
is caught by [uri.from_string]source:src/allmydata/uri.py?rev=4195#L658). However they are not allowed to be put into directories -- the error thrown byinit_from_string
will be remembered and re-raised if we try to do that.the patch does not add an "
immutable
" field in the JSON representation. There's sufficient information to infer it: the dirnode is immutable iff the "ro_uri
" field holds a known-immutable cap, or starts with "imm.
", or is omitted. OTOH, that requires a client to understand at least the current cap formats, which is undesirable. I'll add a ticket for that.there are no "
bad_children
" or "unrecognized_children
" keys in the JSON -- we omit bad children, and unrecognized children go under the main "children
" key.Also, note that directory listings containing unknown caps were readable with Tahoe v1.5, but not v1.4.1 or earlier versions.
Attachment fix-inaccurate-comment-darcspatch.txt (44097 bytes) added
Fix inaccurate comment in test_mutant_dirnodes_are_omitted
Replying to davidsarah:
I was slightly confused: there is already a "
mutable
" field (added in changeset:3ac4a734e507abbf), so we should not add "immutable
". The issue is whether the output of the "t=json
" operation on unknown nodes (which did not work prior to 1.6) should include "mutable
", with a value set according to whether thero_uri
starts with "imm.
". Currently there is no "mutable
" field.This isn't giving the webapi client any additional information that it couldn't work out for itself (and, contrary to the comment above, it does not need to understand the current cap formats, only the "
imm.
" prefix). However it might help to simplify client code.If a
false
value in the "mutable
" field means "alleged immutable", then this change should be made; if it means "definitely immutable", it should not. We should decide this before the 1.6 release, otherwise clients would have to take into account the field not being present in any case.Replying to [davidsarah]comment:59:
My writing skills need work. I meant, currently there is no "
mutable
" field in the output oft=json
for unknown nodes.Note that if (we only have the URI for the object itself, or the parent directory is mutable), and (the node doesn't have a
rw_uri
and itsro_uri
doesn't have the "imm.
" prefix), then we don't know whether or not it is immutable. In this case we'd omit the "mutable
" field.Attachment add-mutable-to-json-and-imm-ro-to-listings-darcspatch.txt (52805 bytes) added
Add mutable field to t=json output for unknown nodes, when mutability is known. Includes patch for #931 on which it is dependent.
Reopening for review of
t=json
changes.Note that there is also a minor bugfix for DeepCheckStreamer in directory.py: it was giving a type of
"file"
for unknown nodes. Now it gives a type of"unknown"
, which makes it consistent with ManifestStreamer.Attachment this-is-the-one-to-apply-darcspatch.txt (54144 bytes) added
Fix inaccurate comment in test_mutant_dirnodes_are_omitted. Show -IMM and -RO suffixes for types of immutable and read-only unknown nodes in directory listings. Add mutable field to t=json output for unknown nodes, when mutability is known. Fix example JSON in webapi.txt that cannot occur in practice.
Apparently Python 2.4 doesn't support the 'foo if test else bar' syntax. Patch for that coming up.
Attachment all-including-python24-fixes-darcspatch.txt (57750 bytes) added
Same patches as above but including Python 2.4 fixes.
Fixed in changeset:3e35959e9b9130a4, changeset:f1fd703dedca8c45. Thank you!
Attachment test-unknownnode-improvements-and-json-example-darcspatch.txt (48615 bytes) added
Improvements to tests for UnknownNode, and minor fix to a JSON example in webapi.txt
Patches to improve tests and webapi doc for this issue: changeset:e2ee725d7d325b8d, changeset:ea3954372a06a36c