reject mutable children when *reading* an immutable dirnode #833

Closed
opened 2009-11-18 07:34:46 +00:00 by warner · 72 comments

In #607 (DIR2-CHK), Zooko suggested that in addition to refusing to allow mutable children when creating an immutable dirnode, the tahoe code should guard against buggy/noncompliant implementations and complain when it sees a mutable child in a pre-existing immutable dirnode. The code should log an error of some sort and ignore the child.

When I implemented #607, I didn't do this. It should be done. The trickiest part is testing it, and deciding where to send the error message.

In #607 (DIR2-CHK), Zooko suggested that in addition to refusing to allow mutable children when creating an immutable dirnode, the tahoe code should guard against buggy/noncompliant implementations and complain when it sees a mutable child in a pre-existing immutable dirnode. The code should log an error of some sort and ignore the child. When I implemented #607, I didn't do this. It should be done. The trickiest part is testing it, and deciding where to send the error message.
warner added the
code-dirnodes
major
defect
1.5.0
labels 2009-11-18 07:34:46 +00:00
warner added this to the undecided milestone 2009-11-18 07:34:46 +00:00
davidsarah commented 2009-12-13 02:19:47 +00:00
Owner

This is a security (integrity) issue, because if I have an immutable directory, I should be able to assume collision-resistance for the whole of the subtree beneath that directory.

This is a security (integrity) issue, because if I have an immutable directory, I should be able to assume collision-resistance for the whole of the subtree beneath that directory.

I agree and I think we should do this before releasing immutable directories in Tahoe-LAFS v1.6.

I agree and I think we should do this before releasing immutable directories in Tahoe-LAFS v1.6.
zooko modified the milestone from undecided to 1.6.0 2009-12-13 04:13:59 +00:00

To clarify, I think this is a critical security issue because if you tahoe cp -r $IMM_DIR_NODE ./local and then give $IMM_DIR_NODE to your friend, and she also tahoe cp -r $IMM_DIR_NODE ./herlocal, then you can be assured that she has all the same stuff that you do, even if the original creator of the directory that you are copying tries to trick you so that you and your friend get different results. This is the "deep" analogue of #491 (URIs do not refer to unique files in Allmydata Tahoe).

To clarify, I think this is a critical security issue because if you `tahoe cp -r $IMM_DIR_NODE ./local` and then give `$IMM_DIR_NODE` to your friend, and she also `tahoe cp -r $IMM_DIR_NODE ./herlocal`, then you can be assured that she has all the same stuff that you do, even if the original creator of the directory that you are copying tries to trick you so that you and your friend get different results. This is the "deep" analogue of #491 (URIs do not refer to unique files in Allmydata Tahoe).
zooko added
critical
and removed
major
labels 2009-12-15 18:00:01 +00:00
davidsarah commented 2009-12-30 00:45:46 +00:00
Owner

A consequence of fixing this would be that you can be almost sure that an immutable directory represents a directed acyclic graph, because it should be infeasible to create a cycle of hashes. This seems like a potentially useful property.

A consequence of fixing this would be that you can be almost sure that an immutable directory represents a directed *acyclic* graph, because it should be infeasible to create a cycle of hashes. This seems like a potentially useful property.
Author

On the phone, Zooko and I discussed the following plan:

  • in the WUI (i.e. the HTML directory-listing pages), when rendering an immutable directory, any recognized-as-mutable objects therein should be presented just like unrecognized objects (i.e. caps from the future). In other words, the WUI for immdirs will only generate download/list hyperlinks for objects that are recognized as immutable. It will generate non-hyperlinked child names for the unknown/illegal objects. The "More Info" page will still list the mutable objects's read/writecaps.
  • do the same in the t=json webapi: objects that are not recognized as mutable will be returned with type="unknown" instead of type="filenode" or type="dirnode". This will cause the CLI "tahoe ls" command to display the object with a question mark instead of as a file or directory. The dictionary of properties on the unknown object will still include rw_uri and ro_uri.
  • the "tahoe cp" command, when faced with an unknown object, should print a stderr message and skip over the object. So doing a "cp -r" of a tahoe source directory that contains unknown objects will result in a smaller target directory which does not contain them.

This will raise a barrier in front of casual access to these "illegal" files, and any user with a conforming client can be confident that their tahoe cp -r $IMM_DIR_NODE ./herlocal command will output the same files that your command (using your conforming client) did. If they really try hard (by manually retrieving the caps and linking them into a new parent directory), they can copy these illegal mutable files, but Tahoe's standard tools won't do it for them. (i.e. we don't have to completely hide these objects, we just have to make sure that "tahoe cp" and "tahoe get" and the WUI won't give you their contents)

davidsarah: yeah, immutable directories should always (modulo broken hash functions) represent DAGs.. that should be part of the definition.

On the phone, Zooko and I discussed the following plan: * in the WUI (i.e. the HTML directory-listing pages), when rendering an immutable directory, any recognized-as-mutable objects therein should be presented just like unrecognized objects (i.e. caps from the future). In other words, the WUI for immdirs will only generate download/list hyperlinks for objects that are recognized as immutable. It will generate non-hyperlinked child names for the unknown/illegal objects. The "More Info" page will still list the mutable objects's read/writecaps. * do the same in the t=json webapi: objects that are not recognized as mutable will be returned with `type="unknown"` instead of `type="filenode"` or `type="dirnode"`. This will cause the CLI "tahoe ls" command to display the object with a question mark instead of as a file or directory. The dictionary of properties on the unknown object will still include `rw_uri` and `ro_uri`. * the "tahoe cp" command, when faced with an unknown object, should print a stderr message and skip over the object. So doing a "cp -r" of a tahoe source directory that contains unknown objects will result in a smaller target directory which does not contain them. This will raise a barrier in front of casual access to these "illegal" files, and any user with a conforming client can be confident that their `tahoe cp -r $IMM_DIR_NODE ./herlocal` command will output the same files that your command (using your conforming client) did. If they really try hard (by manually retrieving the caps and linking them into a new parent directory), they can copy these illegal mutable files, but Tahoe's standard tools won't do it for them. (i.e. we don't have to completely hide these objects, we just have to make sure that "tahoe cp" and "tahoe get" and the WUI won't give you their contents) davidsarah: yeah, immutable directories should always (modulo broken hash functions) represent DAGs.. that should be part of the definition.
warner self-assigned this 2010-01-09 02:47:09 +00:00
davidsarah commented 2010-01-09 06:35:19 +00:00
Owner

Replying to warner:

On the phone, Zooko and I discussed the following plan:

  • in the WUI (i.e. the HTML directory-listing pages), when rendering an immutable directory, any recognized-as-mutable objects therein should be presented just like unrecognized objects (i.e. caps from the future). In other words, the WUI for immdirs will only generate download/list hyperlinks for objects that are recognized as immutable. It will generate non-hyperlinked child names for the unknown/illegal objects. The "More Info" page will still list the mutable objects's read/writecaps.
  • do the same in the t=json webapi: objects that are not recognized as mutable will be returned with type="unknown" instead of type="filenode" or type="dirnode". This will cause the CLI "tahoe ls" command to display the object with a question mark instead of as a file or directory. The dictionary of properties on the unknown object will still include rw_uri and ro_uri.
  • the "tahoe cp" command, when faced with an unknown object, should print a stderr message and skip over the object. So doing a "cp -r" of a tahoe source directory that contains unknown objects will result in a smaller target directory which does not contain them.

I'm a bit concerned that this approach would make it too easy to inadvertently write a webapi client that fails to enforce the restriction. Also, having to duplicate the check in all frontends seems like an indication that it is being checked at the wrong layer, i.e. that the webapi should be enforcing it instead.

Is there any reason why such entries need to be accessible at all? It doesn't seem different from any other incorrectly encoded directory entry, which might not be accessible.

This will raise a barrier in front of casual access to these "illegal" files, and any user with a conforming client can be confident that their tahoe cp -r $IMM_DIR_NODE ./herlocal command will output the same files that your command (using your conforming client) did. If they really try hard (by manually retrieving the caps and linking them into a new parent directory), they can copy these illegal mutable files, but Tahoe's standard tools won't do it for them. (i.e. we don't have to completely hide these objects, we just have to make sure that "tahoe cp" and "tahoe get" and the WUI won't give you their contents)

davidsarah: yeah, immutable directories should always (modulo broken hash functions) represent DAGs.. that should be part of the definition.

Replying to [warner](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73508): > On the phone, Zooko and I discussed the following plan: > > * in the WUI (i.e. the HTML directory-listing pages), when rendering an immutable directory, any recognized-as-mutable objects therein should be presented just like unrecognized objects (i.e. caps from the future). In other words, the WUI for immdirs will only generate download/list hyperlinks for objects that are recognized as immutable. It will generate non-hyperlinked child names for the unknown/illegal objects. The "More Info" page will still list the mutable objects's read/writecaps. > * do the same in the t=json webapi: objects that are not recognized as mutable will be returned with `type="unknown"` instead of `type="filenode"` or `type="dirnode"`. This will cause the CLI "tahoe ls" command to display the object with a question mark instead of as a file or directory. The dictionary of properties on the unknown object will still include `rw_uri` and `ro_uri`. > * the "tahoe cp" command, when faced with an unknown object, should print a stderr message and skip over the object. So doing a "cp -r" of a tahoe source directory that contains unknown objects will result in a smaller target directory which does not contain them. I'm a bit concerned that this approach would make it too easy to inadvertently write a webapi client that fails to enforce the restriction. Also, having to duplicate the check in all frontends seems like an indication that it is being checked at the wrong layer, i.e. that the webapi should be enforcing it instead. Is there any reason why such entries need to be accessible at all? It doesn't seem different from any other incorrectly encoded directory entry, which might not be accessible. > > > This will raise a barrier in front of casual access to these "illegal" files, and any user with a conforming client can be confident that their `tahoe cp -r $IMM_DIR_NODE ./herlocal` command will output the same files that your command (using your conforming client) did. If they really try hard (by manually retrieving the caps and linking them into a new parent directory), they can copy these illegal mutable files, but Tahoe's standard tools won't do it for them. (i.e. we don't have to completely hide these objects, we just have to make sure that "tahoe cp" and "tahoe get" and the WUI won't give you their contents) > > davidsarah: yeah, immutable directories should always (modulo broken hash functions) represent DAGs.. that should be part of the definition.
Author

I guess it comes down to how we deal with caps-from-the-future. We can't distinguish between mutable-caps-from-the-future and immutable-caps-from-the-future, since we can't figure out anything about caps-from-the-future other than the fact that we don't understand them. So if we want to promise users who are listing an immutable directory that they'll only see immutable caps, we'd have to filter out anything that we don't explicitly recognize as immutable, which means removing all caps-from-the-future.

This would mean that "tahoe ls" couldn't even acknowledge the existence of such caps. Older clients would be more likely to overwrite them if they can't see them at all. And it limits the ability of other tools (perhaps smarter or more up-to-date than the tahoe node they're using) to do something with those caps.

I dunno what's the best thing to do. I agree that it feels a bit leaky, but I'm also uncomfortable with censoring out the existence of caps-from-the-future.

I guess it comes down to how we deal with caps-from-the-future. We can't distinguish between mutable-caps-from-the-future and immutable-caps-from-the-future, since we can't figure out anything about caps-from-the-future other than the fact that we don't understand them. So if we want to promise users who are listing an immutable directory that they'll only see immutable caps, we'd have to filter out anything that we don't explicitly recognize as immutable, which means removing all caps-from-the-future. This would mean that "tahoe ls" couldn't even acknowledge the existence of such caps. Older clients would be more likely to overwrite them if they can't see them at all. And it limits the ability of other tools (perhaps smarter or more up-to-date than the tahoe node they're using) to do something with those caps. I dunno what's the best thing to do. I agree that it feels a bit leaky, but I'm also uncomfortable with censoring out the existence of caps-from-the-future.
davidsarah commented 2010-01-10 20:10:19 +00:00
Owner

Replying to warner:

I guess it comes down to how we deal with caps-from-the-future. We can't distinguish between mutable-caps-from-the-future and immutable-caps-from-the-future, since we can't figure out anything about caps-from-the-future other than the fact that we don't understand them. So if we want to promise users who are listing an immutable directory that they'll only see immutable caps, we'd have to filter out anything that we don't explicitly recognize as immutable, which means removing all caps-from-the-future.

Not necessarily. If the directory is immutable, we could just omit rw_uri.

Replying to [warner](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73511): > I guess it comes down to how we deal with caps-from-the-future. We can't distinguish between mutable-caps-from-the-future and immutable-caps-from-the-future, since we can't figure out anything about caps-from-the-future other than the fact that we don't understand them. So if we want to promise users who are listing an immutable directory that they'll only see immutable caps, we'd have to filter out anything that we don't explicitly recognize as immutable, which means removing all caps-from-the-future. Not necessarily. If the directory is immutable, we could just omit `rw_uri`.

Hey that's a good idea! :-)

Hey that's a good idea! :-)
Author

That is a good idea, and an explicit form of it should go into
source:src/allmydata/web/directory.py#L801 (in
DirectoryJSONMetadata._got), to stomp out the rw_uri return value
if the directory being listed is readonly.

The directory unpacking function (source:src/allmydata/dirnode.py#L256, in
DirectoryNode._unpack_contents) will only return a rwcap value if
the dirnode has a writekey, which is only the case for mutable dirwritecaps
(neither mutable dirreadcaps nor immutable directories have a writekey). But
if the rocap value happens to really be a writecap (like if some
non-conformant implementation placed one there), DirectoryNode.list
would return a writeable filenode, and then the webapi directory listing
would get a rw_uri value, and would return it to the client without a
check to explicitly stomp it out.

(I hadn't previously considered the problem of writecaps stored in the rocap
column.. for recognized caps, we can block them, but of course we can't
really say anything about caps-from-the-future).

But I don't think that's enough. If our claim is that "each time a conformant
Tahoe client copies out the contents of an immutable directory, they will be
exactly the same", then only serving ro_uri isn't strong enough
(ro_uri is always read-only, but not always immutable). If some
non-conformant implementation puts a mutable readcap into an immutable
directory's ro_uri column, then a conformant client which only reports
ro_uri will still enable "tahoe cp" to copy from something mutable, and
then users could see different contents on different copies.

(If we'd decided to provide an imm_uri in addition to ro_uri and
rw_uri, then maybe we could have omitted everything but imm_uri,
but we didn't, and besides I think it would have made the dirnode columns
even more complex).

That *is* a good idea, and an explicit form of it should go into source:src/allmydata/web/directory.py#L801 (in `DirectoryJSONMetadata._got`), to stomp out the `rw_uri` return value if the directory being listed is readonly. The directory unpacking function (source:src/allmydata/dirnode.py#L256, in `DirectoryNode._unpack_contents`) will only return a `rwcap` value if the dirnode has a writekey, which is only the case for mutable dirwritecaps (neither mutable dirreadcaps nor immutable directories have a writekey). But if the `rocap` value happens to really be a writecap (like if some non-conformant implementation placed one there), `DirectoryNode.list` would return a writeable filenode, and then the webapi directory listing would get a `rw_uri` value, and would return it to the client without a check to explicitly stomp it out. (I hadn't previously considered the problem of writecaps stored in the rocap column.. for recognized caps, we can block them, but of course we can't really say anything about caps-from-the-future). But I don't think that's enough. If our claim is that "each time a conformant Tahoe client copies out the contents of an immutable directory, they will be exactly the same", then only serving `ro_uri` isn't strong enough (`ro_uri` is always read-only, but not always immutable). If some non-conformant implementation puts a mutable readcap into an immutable directory's `ro_uri` column, then a conformant client which only reports `ro_uri` will still enable "tahoe cp" to copy from something mutable, and then users could see different contents on different copies. (If we'd decided to provide an `imm_uri` in addition to `ro_uri` and `rw_uri`, then maybe we could have omitted everything but `imm_uri`, but we didn't, and besides I think it would have made the dirnode columns even more complex).

You know, we really should include an imm_uri, shouldn't we? I guess the way we are currently doing it means "If you are listing a mutable dir, then the ro_uri is a mutable child, and if you are listing an immutable dir, then the ro_uri is an immutable child.". Is that right? But what if there is an immutable child in a mutable dir?

You know, we really *should* include an `imm_uri`, shouldn't we? I guess the way we are currently doing it means "If you are listing a mutable dir, then the `ro_uri` is a mutable child, and if you are listing an immutable dir, then the `ro_uri` is an immutable child.". Is that right? But what if there is an immutable child in a mutable dir?

So in addition to whatever plan we settle on for reliably excluding known-mutable children, is there any use in predicting what format caps from the future might use to indicate read-only, read-write, and immutable? In my Tahoe-LAFS plugin for TiddlyWiki I made it recognize the current cap format and also:

// This is speculative: maybe in the future there will be a version of Tahoe where caps 
// start with these symbols, and if so then this JavaScript code will magically work with 
// that version of Tahoe.
TAHOE_FUTURE_IMMUTABLE_CAP_RE_STR = "i_" + ALPHANUMERIC_STRING;
TAHOE_FUTURE_READONLY_CAP_RE_STR = "r_" + ALPHANUMERIC_STRING;
TAHOE_FUTURE_WRITABLE_CAP_RE_STR = "W_" + ALPHANUMERIC_STRING;

http://allmydata.org/source/tiddly_on_tahoe/trunk/tahoe_tiddly/TahoePlugin.js

I'm not sure that does any good, in particular because we may end up using different indicators for cap-type, maybe in part because having a separator char there, while good for human eyeballs, is bad for selecting with double-click of a mouse...

Anyway, if you could think for a minute about forward-compatibility and make sure that such things are not of use in this case and state conclusively that Tahoe-LAFS v1.6 should treat all unrecognized caps (whether from the future, corrupted, or malicious) that it finds as the children of immutable directories as mutable and unreadable that would make me feel better. :-)

So in addition to whatever plan we settle on for reliably excluding known-mutable children, is there any use in predicting what format caps from the future might use to indicate read-only, read-write, and immutable? In my Tahoe-LAFS plugin for [TiddlyWiki](wiki/TiddlyWiki) I made it recognize the current cap format and also: ``` // This is speculative: maybe in the future there will be a version of Tahoe where caps // start with these symbols, and if so then this JavaScript code will magically work with // that version of Tahoe. TAHOE_FUTURE_IMMUTABLE_CAP_RE_STR = "i_" + ALPHANUMERIC_STRING; TAHOE_FUTURE_READONLY_CAP_RE_STR = "r_" + ALPHANUMERIC_STRING; TAHOE_FUTURE_WRITABLE_CAP_RE_STR = "W_" + ALPHANUMERIC_STRING; ``` <http://allmydata.org/source/tiddly_on_tahoe/trunk/tahoe_tiddly/TahoePlugin.js> I'm not sure that does any good, in particular because we may end up using different indicators for cap-type, maybe in part because having a separator char there, while good for human eyeballs, is bad for selecting with double-click of a mouse... Anyway, if you could think for a minute about forward-compatibility and make sure that such things are not of use in this case and state conclusively that Tahoe-LAFS v1.6 should treat *all* unrecognized caps (whether from the future, corrupted, or malicious) that it finds as the children of immutable directories as mutable and unreadable that would make me feel better. :-)
Author

hm. I guess immutability is a property of the object, while readability/writeability is a property of the cap (think of the cap as a facet here). There's no such thing as an "immutable cap" to a mutable object.

So maybe instead of "imm_uri" we should put another property on the JSON stuff we return, an "immutable" boolean.

Of course, for caps-from-the-future, we'd leave this empty, because we don't know. Then we'd need to decide whether the webapi promises all([for x in immdir.list()]x.immutable) == True (in which case it would be obligated to strip out everything except known-immutable children), or whether it promises something smaller.

WRT to anticipated future caps, I can currently imagine append-only caps, write-with-storage-authority caps, verify/repair caps, destroy caps, revocation caps, and others. Very few of these cleanly break down into mutable/immutable, or read/write. So I don't see an overwhelming amount of value to trying to anticipate them too much, and therefore having current code attempt to deduce much of anything about caps-from-the-future.

So yes, I believe that 1.6 should treat all unrecognized caps as unknown, and any operation that promises to only return caps that are known to have some particular property should be obligated to either filter out unrecognized caps or mark them in some way such that clients can easily tell which are which.

I'm still undecided as to whether marking nodetype==unknown or removing them entirely is better.

hm. I guess immutability is a property of the object, while readability/writeability is a property of the cap (think of the cap as a facet here). There's no such thing as an "immutable cap" to a mutable object. So maybe instead of "imm_uri" we should put another property on the JSON stuff we return, an "immutable" boolean. Of course, for caps-from-the-future, we'd leave this empty, because we don't know. Then we'd need to decide whether the webapi promises `all([for x in immdir.list()]x.immutable) == True` (in which case it would be obligated to strip out everything except known-immutable children), or whether it promises something smaller. WRT to anticipated future caps, I can currently imagine append-only caps, write-with-storage-authority caps, verify/repair caps, destroy caps, revocation caps, and others. Very few of these cleanly break down into mutable/immutable, or read/write. So I don't see an overwhelming amount of value to trying to anticipate them too much, and therefore having current code attempt to deduce much of anything about caps-from-the-future. So yes, I believe that 1.6 should treat all unrecognized caps as unknown, and any operation that promises to only return caps that are known to have some particular property should be obligated to either filter out unrecognized caps or mark them in some way such that clients can easily tell which are which. I'm still undecided as to whether marking nodetype==`unknown` or removing them entirely is better.

If I understand correctly an "mutable" flag is a good idea, and should perhaps be quickly added into v1.6 in the interests of forward-compatibility, but is orthogonal to this issue of how to exclude mutable children from the listing of a mutable directory.

What if we had a "bad_children" element in the "dirnode" JSON, adjacent to the current "children" element: source:docs/frontends/webapi.txt?rev=4112#L509. This would give the programmer all the information about the bad children while making it very unlikely that someone would accidentally include bad children in among the good children. :-)

Oh, and although I really like the name "bad_children", I guess it also includes good children from the future, so we ought to name it something like "unrecognized_children". How is a programmer to know (possibly just out of curiousity) why each child is in that set? Perhaps there is a flag on each child saying either "mutable=true" or "known_format=false" or something?

If I understand correctly an `"mutable"` flag is a good idea, and should perhaps be quickly added into v1.6 in the interests of forward-compatibility, but is orthogonal to this issue of how to exclude mutable children from the listing of a mutable directory. What if we had a `"bad_children"` element in the `"dirnode"` JSON, adjacent to the current `"children"` element: source:docs/frontends/webapi.txt?rev=4112#L509. This would give the programmer all the information about the bad children while making it very unlikely that someone would accidentally include bad children in among the good children. :-) Oh, and although I really like the name `"bad_children"`, I guess it also includes good children from the future, so we ought to name it something like `"unrecognized_children"`. How is a programmer to know (possibly just out of curiousity) why each child is in that set? Perhaps there is a flag on each child saying either `"mutable=true"` or `"known_format=false"` or something?

Oh, what about having two separate keys next to "children" -- "bad_children" for mutable children of an immutable directory and "unrecognized_children" for children from the future or corrupted.

I guess it boils down to whether we want the implementation written by a lazy programmer to exclude the children entirely from the listing or to include them but not make them "live" (clickable, cd'able, cp'able, etc.) -- just make them "greyed-out".

If we want them excluded, then they shouldn't appear as keys under "children" at all. If we want them included but "greyed-out", then they should appear as keys under "children" but they should not have a "ro_uri" or "rw_uri" key.

So maybe this means that mutable children of an immutable directory should appear in a separate "bad_children" key or shouldn't appear at all, while unrecognized children (hopefully from the future) should appear in the "children" key while having no "ro_uri" or "rw_uri".

Oh, what about having two separate keys next to `"children"` -- `"bad_children"` for mutable children of an immutable directory and `"unrecognized_children"` for children from the future or corrupted. I guess it boils down to whether we want the implementation written by a lazy programmer to exclude the children entirely from the listing or to include them but not make them "live" (clickable, cd'able, cp'able, etc.) -- just make them "greyed-out". If we want them excluded, then they shouldn't appear as keys under `"children"` at all. If we want them included but "greyed-out", then they should appear as keys under `"children"` but they should not have a `"ro_uri"` or `"rw_uri"` key. So maybe this means that mutable children of an immutable directory should appear in a separate `"bad_children"` key or shouldn't appear at all, while unrecognized children (hopefully from the future) should appear in the `"children"` key while having no `"ro_uri"` or `"rw_uri"`.

Brian wrote:

So if we want to promise users who are listing an immutable directory that they'll only see immutable caps, we'd have to filter out anything that we don't explicitly recognize as immutable, which means removing all caps-from-the-future.

David-Sarah wrote:

Not necessarily. If the directory is immutable, we could just omit rw_uri.

I wrote:

Good idea!

No wait, that's a bad idea. A read-only cap to a mutable directory is not a read-cap to an immutable directory. I keep getting confused about this, and in the history of Tahoe-LAFS Brian and I have been confused about it more than once. I'm going to get a post-it note and affix it to my monitor, saying:

"READ-ONLY DOESN'T MEAN IMMUTABLE."

Brian wrote: > So if we want to promise users who are listing an immutable directory that they'll only see immutable caps, we'd have to filter out anything that we don't explicitly recognize as immutable, which means removing all caps-from-the-future. David-Sarah wrote: > Not necessarily. If the directory is immutable, we could just omit rw_uri. I wrote: > Good idea! No wait, that's a *bad* idea. A read-only cap to a mutable directory is not a read-cap to an immutable directory. I keep getting confused about this, and in the history of Tahoe-LAFS Brian and I have been confused about it more than once. I'm going to get a post-it note and affix it to my monitor, saying: "READ-ONLY DOESN'T MEAN IMMUTABLE."

Brian and I chatted about this on IRC yesterday. At the end I got sleepy and confused about what constraints we impose on the web gateway with regard to the JSON-formatted directory listings that it delivers to web clients. It is a question of forward-compatibility -- today's web clients may read JSON-formatted directory listings from web gateways from the future. This is a separate but related issue to today's web gateways (which are also Tahoe-LAFS storage clients) reading distributed directories that contain children inserted by other Tahoe-LAFS storage clients from the future. That latter issue is what most of our thinking on this ticket has been about so far.

I'm not sure what to think about that whole idea -- I haven't thought it through yet.

However, now in the light of morning one particular issue that seems clear to me is that the test that today's web gateway uses to determine whether a given child is a recognized or unrecognized child should be as specific as possible as long as it never marks a child from today as unrecognized. Brian tells me that uri.from_string("URI:CHK:<a>look at me I'm evil<a>") raises a BadURIError instead of what we would probably prefer which is to return an UnknownURI. I don't see why it does that from looking at the code. Here is a map to the relevant code:

The web gateway (which is a Tahoe-LAFS storage client) unpacks the directory contents with [its internal _unpack_contents() method]source:src/allmydata/dirnode.py@4119#L236 and passes the result to its caller from [its list() method]source:src/allmydata/dirnode.py@4119#L302. Already looking at the code I want more documentation here. Could list() grow a docstring that tells the caller more about what sorts of things it might find in the caps of the children that it returns?

So the caller in this case takes the caps and passes each one to [NodeMaker.create_from_cap()]source:src/allmydata/nodemaker.py@4106#L47 which returns UnknownNode in case both its writecap and readcap parameters are empty and then calls [uri.from_string()]source:src/allmydata/uri.py@4101#L503.

uri.from_string() ''looks'' to me like it is intended to, and does, return an UnknownURI instance when the cap is not any of the known formats, but Brian says otherwise.

So my proposals right now are:

  1. We add a docstring explaining what the caller can safely assume about the caps returned from DirectoryNode.list().
  2. We change uri.from_string() to return UnknownURI whenever the cap is clearly not a well-formed cap of a known type and add a unit test of this.
  3. The JSON listing returned by the wapi excludes mutable children of immutable directories entirely. The presence of one can only mean an attack or a bug. It includes children of unrecognized types, but not in the ''children'' node of the JSON object, instead in a sibling node named ''unrecognized_children''. That way wapi-client code written by a lazy programmer who doesn't think about the concept of unrecognized children will be oblivious to their presence. This is perhaps suboptimal from a UI/UX standpoint, but it is safe. Code written by more conscientious programmer can read the ''unrecognized_children'' node and show those children as present but unusable ("greyed-out") for improved UX.
Brian and I chatted about this on IRC yesterday. At the end I got sleepy and confused about what constraints we impose on the web gateway with regard to the JSON-formatted directory listings that it delivers to web clients. It is a question of forward-compatibility -- today's web clients may read JSON-formatted directory listings from web gateways from the future. This is a separate but related issue to today's web gateways (which are also Tahoe-LAFS storage clients) reading distributed directories that contain children inserted by other Tahoe-LAFS storage clients from the future. That latter issue is what most of our thinking on this ticket has been about so far. I'm not sure what to think about that whole idea -- I haven't thought it through yet. However, now in the light of morning one particular issue that seems clear to me is that the test that today's web gateway uses to determine whether a given child is a recognized or unrecognized child should be as specific as possible as long as it never marks a child from today as unrecognized. Brian tells me that `uri.from_string("URI:CHK:<a>look at me I'm evil<a>")` raises a `BadURIError` instead of what we would probably prefer which is to return an `UnknownURI`. I don't see why it does that from looking at the code. Here is a map to the relevant code: The web gateway (which is a Tahoe-LAFS storage client) unpacks the directory contents with [its internal _unpack_contents() method]source:src/allmydata/dirnode.py@4119#L236 and passes the result to its caller from [its list() method]source:src/allmydata/dirnode.py@4119#L302. Already looking at the code I want more documentation here. Could `list()` grow a docstring that tells the caller more about what sorts of things it might find in the caps of the children that it returns? So the caller in this case takes the caps and passes each one to [NodeMaker.create_from_cap()]source:src/allmydata/nodemaker.py@4106#L47 which returns `UnknownNode` in case both its `writecap` and `readcap` parameters are empty and then calls [uri.from_string()]source:src/allmydata/uri.py@4101#L503. `uri.from_string()` ''looks'' to me like it is intended to, and does, return an `UnknownURI` instance when the cap is not any of the known formats, but Brian says otherwise. So my proposals right now are: 1. We add a docstring explaining what the caller can safely assume about the caps returned from `DirectoryNode.list()`. 2. We change `uri.from_string()` to return `UnknownURI` whenever the cap is clearly not a well-formed cap of a known type and add a unit test of this. 3. The JSON listing returned by the wapi excludes mutable children of immutable directories entirely. The presence of one can only mean an attack or a bug. It includes children of unrecognized types, but not in the ''children'' node of the JSON object, instead in a sibling node named ''unrecognized_children''. That way wapi-client code written by a lazy programmer who doesn't think about the concept of unrecognized children will be oblivious to their presence. This is perhaps suboptimal from a UI/UX standpoint, but it is safe. Code written by more conscientious programmer can read the ''unrecognized_children'' node and show those children as present but unusable ("greyed-out") for improved UX.
Author

For reference, uri.from_string("URI:CHK:<a>look at me I'm evil</a>") starts by dispatching on a string prefix: we see the URI:CHK: prefix and stop considering anything else (including UnknownURI). Then the CHK-specific handler class applies a regexp to the look at me part and throws BadURIError.

The requirement is that any new format we introduce must not share one of our previously-claimed prefixes. The prefixes we've defined so far all look like /^URI:A-Z\-+:/, but I imagine we'll add tahoe:// or something in the future, and probably some binary formats that will also fit into this parse tree because they start with something non-ascii.

So there's plenty of room for new prefixes, but not room for new shapes on current prefixes.

For reference, `uri.from_string("URI:CHK:<a>look at me I'm evil</a>")` starts by dispatching on a string prefix: we see the `URI:CHK:` prefix and stop considering anything else (including `UnknownURI`). Then the CHK-specific handler class applies a regexp to the `look at me` part and throws `BadURIError`. The requirement is that any new format we introduce must not share one of our previously-claimed prefixes. The prefixes we've defined so far all look like `/^URI:A-Z\-+:/`, but I imagine we'll add `tahoe://` or something in the future, and probably some binary formats that will also fit into this parse tree because they start with something non-ascii. So there's plenty of room for new prefixes, but not room for new shapes on current prefixes.

Oh, so this suggests a further refinement: child links that have caps that begin with one of our defined prefixes such as URI:CHK: but which are not well-formed according to uri.from_string() will not be treated as child links potentially-from-the-future but instead as bad child links, the same as child links which try to provide mutable access to you when they are in a mutable directory.

Oh, so this suggests a further refinement: child links that have caps that begin with one of our defined prefixes such as `URI:CHK:` but which are not well-formed according to `uri.from_string()` will *not* be treated as child links potentially-from-the-future but instead as bad child links, the same as child links which try to provide mutable access to you when they are in a mutable directory.
davidsarah commented 2010-01-16 07:43:05 +00:00
Owner

Zooko, Brian and I discussed this on IRC, and initially came up with the following set of options. All the options have in common that when a child link of an immutable directory is recognized as being mutable, it should just be omitted. They differ on what should happen with unknown child links (i.e. caps from the future):

  • OMIT: just omit unknown child links
  • NONE: include a directory entry for an unknown child link, but omit rw_uri and ro_uri from its JSON representation
  • RO_URI: include a directory entry with only ro_uri
  • UNKNOWN_RO_URI: include a directory entry with a new unknown_ro_uri field that is used instead of ro_uri. (For mutable directories, unknown children would have both unknown_rw_uri and unknown_ro_uri fields.)
  • FORWARD_COMPAT_METADATA: standardize a way of recognizing immutable caps for "all" future versions (say, by the first character after lafs://), so that non-immutable caps can be excluded without having to understand their format.

There are also variations of each option according to whether the restriction is implemented in the webapi code or in DirectoryNode. If it is implemented in the webapi, then read-modify-write operations will preserve unknown caps that aren't directly affected by the operation. I think we decided that this is preferable because it loses data in fewer cases -- although it may introduce some inconsistencies between frontends, unless we manage to fix that for 1.6.

There was concensus that RO_URI option is unsafe (i.e. fails to address the issue in this bug), because it would be too easy for a ro_uri to be passed on to another client that understands it, but having lost track of the constraint that it is supposed to be immutable.

We had already excluded FORWARD_COMPAT_METADATA on the basis that it requires making decisions that we don't have time to consider carefully enough for 1.6. UNKNOWN_RO_URI has the advantage of not throwing away information, but we decided it was too complicated to implement and review in the time we have. OMIT for 1.6, and FORWARD_COMPAT_METADATA in 1.7, seemed like a good compromise.

Zooko, Brian and I discussed this on IRC, and initially came up with the following set of options. All the options have in common that when a child link of an immutable directory is recognized as being mutable, it should just be omitted. They differ on what should happen with *unknown* child links (i.e. caps from the future): * OMIT: just omit unknown child links * NONE: include a directory entry for an unknown child link, but omit `rw_uri` and `ro_uri` from its JSON representation * RO_URI: include a directory entry with only `ro_uri` * UNKNOWN_RO_URI: include a directory entry with a new `unknown_ro_uri` field that is used instead of `ro_uri`. (For mutable directories, unknown children would have both `unknown_rw_uri` and `unknown_ro_uri` fields.) * FORWARD_COMPAT_METADATA: standardize a way of recognizing immutable caps for "all" future versions (say, by the first character after `lafs://`), so that non-immutable caps can be excluded without having to understand their format. There are also variations of each option according to whether the restriction is implemented in the webapi code or in DirectoryNode. If it is implemented in the webapi, then read-modify-write operations will preserve unknown caps that aren't directly affected by the operation. I think we decided that this is preferable because it loses data in fewer cases -- although it may introduce some inconsistencies between frontends, unless we manage to fix that for 1.6. There was concensus that RO_URI option is unsafe (i.e. fails to address the issue in this bug), because it would be too easy for a `ro_uri` to be passed on to another client that understands it, but having lost track of the constraint that it is supposed to be immutable. We had already excluded FORWARD_COMPAT_METADATA on the basis that it requires making decisions that we don't have time to consider carefully enough for 1.6. UNKNOWN_RO_URI has the advantage of not throwing away information, but we decided it was too complicated to implement and review in the time we have. OMIT for 1.6, and FORWARD_COMPAT_METADATA in 1.7, seemed like a good compromise.
davidsarah commented 2010-01-16 17:28:10 +00:00
Owner

Overnight I thought of the following additional option, which has similar forward-compatibility advantages to FORWARD_COMPAT_METADATA but has fewer constraints on the future URL design:

  • PREFIXED_RO_URI: include a directory entry for an unknown child link of an immutable directory, but prefix ro_uri with "imm." (and omit rw_uri).

Note that this doesn't imply that the new format would have to use "imm." -- that prefix would only occur on URLs from the past. When we add support for the new format, the future client would see the "imm." prefix and check that the cap is immutable; if it is then it would work correctly.

(We might end up registering both "lafs:" and "imm.lafs:" in order to make this work in more cases, but I don't think that's a problem, and it's not strictly necessary. Anyway, there are already URI schemes containing '.'.)

In retrospect this is similar to something Zooko suggested on IRC ('mangle the cap by prepending "I'M NOT REALLY A CAP"'), except that "imm." just means that the future client must check that it is an immutable cap.

The effect of prepending "imm." with the current webapi is to display "GET unknown: can only do t=info, not t=", which is safe but ugly. We should fix that (it's ticket #884).

Overnight I thought of the following additional option, which has similar forward-compatibility advantages to FORWARD_COMPAT_METADATA but has fewer constraints on the future URL design: * PREFIXED_RO_URI: include a directory entry for an unknown child link of an immutable directory, but prefix `ro_uri` with "`imm.`" (and omit `rw_uri`). Note that this doesn't imply that the new format would have to use "`imm.`" -- that prefix would *only* occur on URLs from the past. When we add support for the new format, the future client would see the "`imm.`" prefix and check that the cap is immutable; if it is then it would work correctly. (We might end up registering both "`lafs:`" and "`imm.lafs:`" in order to make this work in more cases, but I don't think that's a problem, and it's not strictly necessary. Anyway, there are already [URI schemes](http://www.iana.org/assignments/uri-schemes.html) containing '`.`'.) In retrospect this is similar to something Zooko suggested on IRC ('mangle the cap by prepending "I'M NOT REALLY A CAP"'), except that "`imm.`" just means that the future client must check that it is an immutable cap. The effect of prepending "`imm.`" with the current webapi is to display `"GET unknown: can only do t=info, not t="`, which is safe but ugly. We should fix that (it's ticket #884).

Yay! Something that I like about PREFIXED_RO_URI is that it could perhaps be used for writing out child links of unrecognized types into new tahoe dirs. This would be a safe, forward-compatible, good-user-experience solution to #839 (Copying directories containing caps from the future). It would also enable something that I would like: that you get the same behavior with respect to child links of unknown types and illegally mutant child links whether you are viewing a dir through the WUI or the the WAPI, copying a dir with "tahoe cp" or WUI/WAPI commands, or manipulating a dir (such as by deleting or renaming extant child links or adding a new child link).

Yay! Something that I like about PREFIXED_RO_URI is that it could perhaps be used for writing out child links of unrecognized types into new tahoe dirs. This would be a safe, forward-compatible, good-user-experience solution to #839 (Copying directories containing caps from the future). It would also enable something that I would like: that you get the same behavior with respect to child links of unknown types and illegally mutant child links whether you are viewing a dir through the WUI or the the WAPI, copying a dir with "tahoe cp" or WUI/WAPI commands, or manipulating a dir (such as by deleting or renaming extant child links or adding a new child link).
davidsarah commented 2010-01-17 01:10:46 +00:00
Owner

Replying to zooko:

Yay! Something that I like about PREFIXED_RO_URI is that it could perhaps be used for writing out child links of unrecognized types into new tahoe dirs. This would be a safe, forward-compatible, good-user-experience solution to #839 (Copying directories containing caps from the future).

Oh, I hadn't thought of that. I suspect we won't end up doing it that way if we wait until 1.7 to fix #839, though, because at that point we'll probably know enough about what the new cap URLs are going to look like to distinguish mutable from immutable, and therefore we can refuse outright to copy future mutable caps. Copying them as imm.-prefixed URLs has the disadvantage that you're granting authority to read subsequent updates to the original mutable object (to someone who strips off the imm.), which a deep-copy operation normally wouldn't do. So this would not be completely safe.

It would also enable something that I would like: that you get the same behavior with respect to child links of unknown types and illegally mutant child links whether you are viewing a dir through the WUI or the the WAPI, copying a dir with "tahoe cp" or WUI/WAPI commands, or manipulating a dir (such as by deleting or renaming extant child links or adding a new child link).

Yes, that's a big advantage.

Replying to [zooko](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73526): > Yay! Something that I like about PREFIXED_RO_URI is that it could perhaps be used for writing out child links of unrecognized types into new tahoe dirs. This would be a safe, forward-compatible, good-user-experience solution to #839 (Copying directories containing caps from the future). Oh, I hadn't thought of that. I suspect we won't end up doing it that way if we wait until 1.7 to fix #839, though, because at that point we'll probably know enough about what the new cap URLs are going to look like to distinguish mutable from immutable, and therefore we can refuse outright to copy future mutable caps. Copying them as `imm.`-prefixed URLs has the disadvantage that you're granting authority to read subsequent updates to the original mutable object (to someone who strips off the `imm.`), which a deep-copy operation normally wouldn't do. So this would not be completely safe. > It would also enable something that I would like: that you get the same behavior with respect to child links of unknown types and illegally mutant child links whether you are viewing a dir through the WUI or the the WAPI, copying a dir with "tahoe cp" or WUI/WAPI commands, or manipulating a dir (such as by deleting or renaming extant child links or adding a new child link). Yes, that's a big advantage.
davidsarah commented 2010-01-17 15:12:02 +00:00
Owner

Note: the intent of the patch I'm writing for this ticket is not to alter the current behaviour that a client cannot change or add a dir entry containing a cap that it does not understand. (It would not be able to attenuate the cap in case it is a write cap.) grep for CannotPackUnknownNodeError to see where this is enforced.

It would however be safe to allow a client that is rewriting the whole directory to preserve entries containing caps that it does not understand. I'm not sure yet whether I'll be able to do that for 1.6.

Note: the intent of the patch I'm writing for this ticket is not to alter the current behaviour that a client cannot change or add a dir entry containing a cap that it does not understand. (It would not be able to attenuate the cap in case it is a write cap.) grep for `CannotPackUnknownNodeError` to see where this is enforced. It would however be safe to allow a client that is rewriting the whole directory to preserve entries containing caps that it does not understand. I'm not sure yet whether I'll be able to do that for 1.6.
davidsarah commented 2010-01-17 18:24:01 +00:00
Owner

Replying to davidsarah:

Note: the intent of the patch I'm writing for this ticket is not to alter the current behaviour that a client cannot change or add a dir entry containing a cap that it does not understand. (It would not be able to attenuate the cap in case it is a write cap.)

I've changed my mind on this. In all but two of the webapi interfaces that involve creating or changing child dirnodes, the operation either creates the caps that it adds (in which case they are known), or is given them in the format {"rw_uri": ..., "ro_uri": ...}. So if the webapi client knows how to attenuate the cap (or already has it in that form) but the webapi server -- i.e. storage client -- doesn't, then that's OK.

The two (almost equivalent) interfaces that don't take a cap with separate "rw_uri" and "ro_uri" slots, are [PUT /uri/$DIRCAP/SUBDIRS../CHILDNAME?t=uri]source:docs/frontends/webapi.txt#L647 and POST /uri/$DIRCAP/SUBDIRS../?t=uri&name=CHILDNAME&uri=CHILDCAP, i.e. the "hard link existing cap" operations. In that case we don't know how to attenuate the cap, so we cannot safely accept an unknown cap to be added.

However, I don't think there's any reason to restrict the other operations. We obtain no security benefit from doing so, because a malicious client (with the relevant writecap) can always write whatever it likes into a directory. The important thing is to make sure that clients who obtain caps from the directory are not confused.

So, I propose to remove the following check in source:src/allmydata/dirnode.py#L402 (and similarly at L427):

  if isinstance(child_node, UnknownNode):
      # don't be willing to pack unknown nodes: we might accidentally
      # put some write-authority into the rocap slot because we don't
      # know how to diminish the URI they gave us. We don't even know
      # if they gave us a readcap or a writecap.
      msg = "cannot pack unknown node as child %s" % str(name)
      raise CannotPackUnknownNodeError(msg)

(and remove CannotPackUnknownNodeError). Instead, the logic used by a webapi server when packing (encoding) and unpacking (decoding) a dirnode will be as follows:

If the directory is immutable,

  • always omit rw_uri slots (i.e. silently ignore whatever was there) when unpacking.
  • raise an error (fail the webapi operation with no side effects on the directory) if an rw_uri slot was present when packing.
  • if the packed ro_uri is unknown, strip any "ro." or "imm." prefix and then prepend "imm.".

If the directory is mutable,

  • pass through any rw_uri slots when packing and unpacking.
  • if the packed ro_uri is unknown and it does not already have a "ro." or "imm." prefix, then prepend "ro.".

An URI is "unknown" if [from_string in uri.py]source:src/allmydata/uri.py#L504 returns an instance of UnknownURI.

So the effect is that the URI returned in the ro_uri slot will be unusable by correct clients if it does not meet the constraint that it should have met to be in that slot -- i.e. being either read-only or deep-immutable.

Replying to [davidsarah](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73528): > Note: the intent of the patch I'm writing for this ticket is not to alter the current behaviour that a client cannot change or add a dir entry containing a cap that it does not understand. (It would not be able to attenuate the cap in case it is a write cap.) I've changed my mind on this. In all but two of the webapi interfaces that involve creating or changing child dirnodes, the operation either creates the caps that it adds (in which case they are known), or is given them in the format {"rw_uri": ..., "ro_uri": ...}. So if the webapi client knows how to attenuate the cap (or already has it in that form) but the webapi server -- i.e. storage client -- doesn't, then that's OK. The two (almost equivalent) interfaces that don't take a cap with separate "rw_uri" and "ro_uri" slots, are [PUT /uri/$DIRCAP/SUBDIRS../CHILDNAME?t=uri]source:docs/frontends/webapi.txt#L647 and POST /uri/$DIRCAP/SUBDIRS../?t=uri&name=CHILDNAME&uri=CHILDCAP, i.e. the "hard link existing cap" operations. In that case we don't know how to attenuate the cap, so we cannot safely accept an unknown cap to be added. However, I don't think there's any reason to restrict the other operations. We obtain no security benefit from doing so, because a malicious client (with the relevant writecap) can always write whatever it likes into a directory. The important thing is to make sure that clients who obtain caps *from* the directory are not confused. So, I propose to remove the following check in source:src/allmydata/dirnode.py#L402 (and similarly at L427): ``` if isinstance(child_node, UnknownNode): # don't be willing to pack unknown nodes: we might accidentally # put some write-authority into the rocap slot because we don't # know how to diminish the URI they gave us. We don't even know # if they gave us a readcap or a writecap. msg = "cannot pack unknown node as child %s" % str(name) raise CannotPackUnknownNodeError(msg) ``` (and remove `CannotPackUnknownNodeError`). Instead, the logic used by a webapi server when packing (encoding) and unpacking (decoding) a dirnode will be as follows: If the directory is immutable, * always omit `rw_uri` slots (i.e. silently ignore whatever was there) when unpacking. * raise an error (fail the webapi operation with no side effects on the directory) if an `rw_uri` slot was present when packing. * if the packed `ro_uri` is unknown, strip any "ro." or "imm." prefix and then prepend "imm.". If the directory is mutable, * pass through any `rw_uri` slots when packing and unpacking. * if the packed `ro_uri` is unknown and it does not already have a "ro." or "imm." prefix, then prepend "ro.". An URI is "unknown" if [`from_string` in uri.py]source:src/allmydata/uri.py#L504 returns an instance of UnknownURI. So the effect is that the URI returned in the `ro_uri` slot will be unusable by *correct* clients if it does not meet the constraint that it should have met to be in that slot -- i.e. being either read-only or deep-immutable.
davidsarah commented 2010-01-18 04:31:17 +00:00
Owner

The main remaining issue I haven't quite decided on, is that if we're removing the restriction on adding unknown URIs to a directory, we need to consider what consistency properties should be guaranteed for reads of mutable directories. If you read a directory entry or entries in a single webapi operation, then the rw_uri and ro_uri slots should be consistent with each other -- i.e. if both are present then they point to the same object, and ro_uri is the read-only version of rw_uri (assuming you trust your gateway to ensure that). Actually I'm not sure whether the current code guarantees that, but I think it does.

If unknown caps (i.e. unknown to the webapi server) are allowed to be returned in directory reads, then the webapi can't ensure this property for such caps. I'm leaning toward just documenting that.

(Note that if we read the directory twice, then the two reads are not guaranteed to be consistent. This issue doesn't arise for immutable directories because only the ro_uri slots will be present.)

The main remaining issue I haven't quite decided on, is that if we're removing the restriction on adding unknown URIs to a directory, we need to consider what consistency properties should be guaranteed for reads of *mutable* directories. If you read a directory entry or entries in a single webapi operation, then the `rw_uri` and `ro_uri` slots should be consistent with each other -- i.e. if both are present then they point to the same object, and `ro_uri` is the read-only version of `rw_uri` (assuming you trust your gateway to ensure that). Actually I'm not sure whether the current code guarantees that, but I think it does. If unknown caps (i.e. unknown to the webapi server) are allowed to be returned in directory reads, then the webapi can't ensure this property for such caps. I'm leaning toward just documenting that. (Note that if we read the directory twice, then the two reads are not guaranteed to be consistent. This issue doesn't arise for immutable directories because only the `ro_uri` slots will be present.)

I'm not sure I understand everything about your current plan. However, I think I understand that the Tahoe-LAFS storage client has to enforce constraints on the semantics of a directory's child links when reading the directory, whether it is doing so in order to display the directory in the WUI, send a copy of the directory (in JSON form) back through the WAPI, or (???) to write a copy of the directory into a new directory in Tahoe-LAFS. (I guess it doesn't have to enforce security properties on read when the purpose of the read is to make a shallow copy of (a subset of) the child links into a different Tahoe-LAFS directory, but I feel like it "should" do so in order to be consistent and parallel with the other two targets that the information could be headed toward: the WUI and the WAPI.)

But the Tahoe-LAFS storage client doesn't need to enforce any security constraints when writing directories -- a malicious client could always write arbitrarily bad things into Tahoe-LAFS directories and no-one should rely on the assumption that the Tahoe-LAFS directory that they are using was written by a client that enforces some security constraints.

But the storage client should enforce constraints to help the programmer realize that they're trying to do something wrong, when it can tell that they are. That's why it will refuse to put a child link of a known type which is mutable into an immutable directory, but it will not refuse to put a child link of an unknown type into an immutable directory.

Okay, so the practice of prepending an imm. to a cap which it extracts from an immutable directory is best understood as a security constraint that the Tahoe-LAFS storage client enforces on child links of immutable directories when it reads them. It needs to inform any other (correct) Tahoe-LAFS clients that eventually receive that cap that the cap came from an immutable context. The prepending of imm. is a way to mark the context from which that cap came.

The reason we can't prepend imm. to every child link from an immutable directory and ro. to every link from a ro_uri slot is backwards-compatibility -- existing Tahoe-LAFS storage clients need to read directories and find caps of known types there and they won't recognize ro. or imm..

This suggests a forward-compatibility improvement: if you are a Tahoe-LAFS storage client ("gateway", "node") and I give you a cap and it has imm. on the front of it then you should check whether it is of a type that you recognize and you can tell that the type is immutable. If so, you can pop off the imm. and use the cap. Likewise with ro.. If Tahoe-LAFS v1.6 storage clients always perform that check, then this may free up future versions of the Tahoe-LAFS storage client to prepend imm. even to old-type child links (from the Tahoe-LAFS v1.6-era), which would be simpler and more consistent than prepending imm. only to child links whose type you don't recognize.

I'm not sure I understand everything about your current plan. However, I think I understand that the Tahoe-LAFS storage client has to enforce constraints on the semantics of a directory's child links when *reading* the directory, whether it is doing so in order to display the directory in the WUI, send a copy of the directory (in JSON form) back through the WAPI, or (???) to write a copy of the directory into a new directory in Tahoe-LAFS. (I guess it doesn't *have* to enforce security properties on read when the purpose of the read is to make a shallow copy of (a subset of) the child links into a different Tahoe-LAFS directory, but I feel like it "should" do so in order to be consistent and parallel with the other two targets that the information could be headed toward: the WUI and the WAPI.) But the Tahoe-LAFS storage client doesn't need to enforce any security constraints when writing directories -- a malicious client could always write arbitrarily bad things into Tahoe-LAFS directories and no-one should rely on the assumption that the Tahoe-LAFS directory that they are using was written by a client that enforces some security constraints. But the storage client *should* enforce constraints to help the programmer realize that they're trying to do something wrong, when it can tell that they are. That's why it will refuse to put a child link of a known type which is mutable into an immutable directory, but it will not refuse to put a child link of an unknown type into an immutable directory. Okay, so the practice of prepending an `imm.` to a cap which it extracts from an immutable directory is best understood as a security constraint that the Tahoe-LAFS storage client enforces on child links of immutable directories when it reads them. It needs to inform any other (correct) Tahoe-LAFS clients that eventually receive that cap that the cap came from an immutable context. The prepending of `imm.` is a way to mark the context from which that cap came. The reason we can't prepend `imm.` to every child link from an immutable directory and `ro.` to every link from a `ro_uri` slot is backwards-compatibility -- existing Tahoe-LAFS storage clients need to read directories and find caps of known types there and they won't recognize `ro.` or `imm.`. This suggests a forward-compatibility improvement: if you are a Tahoe-LAFS storage client ("gateway", "node") and I give you a cap and it has `imm.` on the front of it then you should *check* whether it is of a type that you recognize and you can tell that the type is immutable. If so, you can pop off the `imm.` and use the cap. Likewise with `ro.`. If Tahoe-LAFS v1.6 storage clients always perform that check, then this may free up future versions of the Tahoe-LAFS storage client to prepend `imm.` even to old-type child links (from the Tahoe-LAFS v1.6-era), which would be simpler and more consistent than prepending `imm.` only to child links whose type you don't recognize.

Replying to davidsarah:

The main remaining issue I haven't quite decided on, is that if we're removing the restriction on adding unknown URIs to a directory, we need to consider what consistency properties should be guaranteed for reads of mutable directories. If you read a directory entry or entries in a single webapi operation, then the rw_uri and ro_uri slots should be consistent with each other -- i.e. if both are present then they point to the same object, and ro_uri is the read-only version of rw_uri (assuming you trust your gateway to ensure that). Actually I'm not sure whether the current code guarantees that, but I think it does.

I'm not sure either. Let's see: source:src/allmydata/nodemaker.py@4106#L47. Yes! If the type is known then it takes the most powerful cap and constructs a Node with that, so it then performs its own diminish to get the ro-cap from the rw-cap.

(Note: this is an example of why off-line diminish can be important! Some designs require a round trip with a storage server before you can compute the ro-cap given the rw-cap. Oh but you could include a copy of the ro-cap with the rw-cap. Oh, but then the Tahoe-LAFS storage client couldn't be sure that the ro-cap pointed to the same file as the rw-cap. I hereby change my mind about off-line diminish -- I used to think it wasn't as important as having very small caps, but now I think it is very important.)

If unknown caps (i.e. unknown to the webapi server) are allowed to be returned in directory reads, then the webapi can't ensure this property for such caps. I'm leaning toward just documenting that.

Should the webapi server tag such caps with something like unk. so that the fact that they came from a context where this property couldn't be verified is not lost?

Replying to [davidsarah](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73530): > The main remaining issue I haven't quite decided on, is that if we're removing the restriction on adding unknown URIs to a directory, we need to consider what consistency properties should be guaranteed for reads of *mutable* directories. If you read a directory entry or entries in a single webapi operation, then the `rw_uri` and `ro_uri` slots should be consistent with each other -- i.e. if both are present then they point to the same object, and `ro_uri` is the read-only version of `rw_uri` (assuming you trust your gateway to ensure that). Actually I'm not sure whether the current code guarantees that, but I think it does. I'm not sure either. Let's see: source:src/allmydata/nodemaker.py@4106#L47. Yes! If the type is known then it takes the most powerful cap and constructs a Node with that, so it then performs its own diminish to get the ro-cap from the rw-cap. (Note: this is an example of why off-line diminish can be important! Some designs require a round trip with a storage server before you can compute the ro-cap given the rw-cap. Oh but you could include a copy of the ro-cap with the rw-cap. Oh, but then the Tahoe-LAFS storage client couldn't be sure that the ro-cap pointed to the same file as the rw-cap. I hereby change my mind about off-line diminish -- I used to think it wasn't as important as having very small caps, but now I think it is very important.) > If unknown caps (i.e. unknown to the webapi server) are allowed to be returned in directory reads, then the webapi can't ensure this property for such caps. I'm leaning toward just documenting that. Should the webapi server tag such caps with something like `unk.` so that the fact that they came from a context where this property couldn't be verified is not lost?
davidsarah commented 2010-01-18 19:31:48 +00:00
Owner

Replying to [zooko]comment:29:

I'm not sure I understand everything about your current plan. However, I think I understand that the Tahoe-LAFS storage client has to enforce constraints on the semantics of a directory's child links when reading the directory, whether it is doing so in order to display the directory in the WUI, send a copy of the directory (in JSON form) back through the WAPI, or (???) to write a copy of the directory into a new directory in Tahoe-LAFS.

The constraints are only on decoding future cap URIs. When a future version decodes an URI that starts with "ro." and is followed by the encoding of a future cap, it must check that the cap is read-only. When it decodes an URI that starts with "imm." and is followed by the encoding of a future cap, it must check that the cap is immutable. If those constraints are not met then decoding must fail (but it is still safe to handle the URI in undecoded form).

There are no constraints on current clients, or on handling URIs as opaque strings.
(If an entity were to strip off a "ro." or "imm." prefix without checking the constraint, it could confuse itself, so it shouldn't do that.)

(I guess it doesn't have to enforce security properties on read when the purpose of the read is to make a shallow copy of (a subset of) the child links into a different Tahoe-LAFS directory,

Correct, because it's not decoding those child URIs.

but I feel like it "should" do so in order to be consistent and parallel with the other two targets that the information could be headed toward: the WUI and the WAPI.)

That would mean that a directory operation could have side-effects on child links that it isn't defined to alter. No component needs to check URIs that it isn't decoding, so this would be consistent between components.

But the Tahoe-LAFS storage client doesn't need to enforce any security constraints when writing directories -- a malicious client could always write arbitrarily bad things into Tahoe-LAFS directories and no-one should rely on the assumption that the Tahoe-LAFS directory that they are using was written by a client that enforces some security constraints.

Right.

But the storage client should enforce constraints to help the programmer realize that they're trying to do something wrong, when it can tell that they are. That's why it will refuse to put a child link of a known type which is mutable into an immutable directory, but it will not refuse to put a child link of an unknown type into an immutable directory.

Yes. (Currently it does refuse to put child links of unknown types into any directory, but this patch would change that.)

Okay, so the practice of prepending an imm. to a cap which it extracts from an immutable directory is best understood as a security constraint that the Tahoe-LAFS storage client enforces on child links of immutable directories when it reads them.

As I said, the constraint is only necessary on decoding. A future version of the storage client could redundantly check the constraint when it passes on an URI to a webapi client, and it should probably do so even if only to reduce the length of the URI (since it can then strip off the 3 or 4 character prefix).

It needs to inform any other (correct) Tahoe-LAFS clients that eventually receive that cap that the cap came from an immutable context. The prepending of imm. is a way to mark the context from which that cap came. The reason we can't prepend imm. to every child link from an immutable directory and ro. to every link from a ro_uri slot is backwards-compatibility -- existing Tahoe-LAFS storage clients need to read directories and find caps of known types there and they won't recognize ro. or imm..

Exactly.

This suggests a forward-compatibility improvement: if you are a Tahoe-LAFS storage client ("gateway", "node") and I give you a cap and it has imm. on the front of it then you should check whether it is of a type that you recognize and you can tell that the type is immutable. If so, you can pop off the imm. and use the cap. Likewise with ro.. If Tahoe-LAFS v1.6 storage clients always perform that check, then this may free up future versions of the Tahoe-LAFS storage client to prepend imm. even to old-type child links (from the Tahoe-LAFS v1.6-era), which would be simpler and more consistent than prepending imm. only to child links whose type you don't recognize.

OK, I've made this improvement in my patch.

Replying to [zooko]comment:29: > I'm not sure I understand everything about your current plan. However, I think I understand that the Tahoe-LAFS storage client has to enforce constraints on the semantics of a directory's child links when *reading* the directory, whether it is doing so in order to display the directory in the WUI, send a copy of the directory (in JSON form) back through the WAPI, or (???) to write a copy of the directory into a new directory in Tahoe-LAFS. The constraints are only on *decoding* future cap URIs. When a future version decodes an URI that starts with "`ro.`" and is followed by the encoding of a future cap, it must check that the cap is read-only. When it decodes an URI that starts with "`imm.`" and is followed by the encoding of a future cap, it must check that the cap is immutable. If those constraints are not met then decoding must fail (but it is still safe to handle the URI in undecoded form). There are no constraints on current clients, or on handling URIs as opaque strings. (If an entity were to strip off a "`ro.`" or "`imm.`" prefix without checking the constraint, it could confuse itself, so it shouldn't do that.) > (I guess it doesn't *have* to enforce security properties on read when the purpose of the read is to make a shallow copy of (a subset of) the child links into a different Tahoe-LAFS directory, Correct, because it's not decoding those child URIs. > but I feel like it "should" do so in order to be consistent and parallel with the other two targets that the information could be headed toward: the WUI and the WAPI.) That would mean that a directory operation could have side-effects on child links that it isn't defined to alter. No component needs to check URIs that it isn't decoding, so this would be consistent between components. > But the Tahoe-LAFS storage client doesn't need to enforce any security constraints when writing directories -- a malicious client could always write arbitrarily bad things into Tahoe-LAFS directories and no-one should rely on the assumption that the Tahoe-LAFS directory that they are using was written by a client that enforces some security constraints. Right. > But the storage client *should* enforce constraints to help the programmer realize that they're trying to do something wrong, when it can tell that they are. That's why it will refuse to put a child link of a known type which is mutable into an immutable directory, but it will not refuse to put a child link of an unknown type into an immutable directory. Yes. (Currently it does refuse to put child links of unknown types into any directory, but this patch would change that.) > Okay, so the practice of prepending an `imm.` to a cap which it extracts from an immutable directory is best understood as a security constraint that the Tahoe-LAFS storage client enforces on child links of immutable directories when it reads them. As I said, the constraint is only *necessary* on decoding. A future version of the storage client *could* redundantly check the constraint when it passes on an URI to a webapi client, and it should probably do so even if only to reduce the length of the URI (since it can then strip off the 3 or 4 character prefix). > It needs to inform any other (correct) Tahoe-LAFS clients that eventually receive that cap that the cap came from an immutable context. The prepending of `imm.` is a way to mark the context from which that cap came. The reason we can't prepend `imm.` to every child link from an immutable directory and `ro.` to every link from a `ro_uri` slot is backwards-compatibility -- existing Tahoe-LAFS storage clients need to read directories and find caps of known types there and they won't recognize `ro.` or `imm.`. Exactly. > This suggests a forward-compatibility improvement: if you are a Tahoe-LAFS storage client ("gateway", "node") and I give you a cap and it has `imm.` on the front of it then you should *check* whether it is of a type that you recognize and you can tell that the type is immutable. If so, you can pop off the `imm.` and use the cap. Likewise with `ro.`. If Tahoe-LAFS v1.6 storage clients always perform that check, then this may free up future versions of the Tahoe-LAFS storage client to prepend `imm.` even to old-type child links (from the Tahoe-LAFS v1.6-era), which would be simpler and more consistent than prepending `imm.` only to child links whose type you don't recognize. OK, I've made this improvement in my patch.
davidsarah commented 2010-01-18 19:48:25 +00:00
Owner

Replying to [zooko]comment:30:

Replying to davidsarah:

The main remaining issue I haven't quite decided on, is that if we're removing the restriction on adding unknown URIs to a directory, we need to consider what consistency properties should be guaranteed for reads of mutable directories. If you read a directory entry or entries in a single webapi operation, then the rw_uri and ro_uri slots should be consistent with each other -- i.e. if both are present then they point to the same object, and ro_uri is the read-only version of rw_uri (assuming you trust your gateway to ensure that). Actually I'm not sure whether the current code guarantees that, but I think it does.

I'm not sure either. Let's see: source:src/allmydata/nodemaker.py@4106#L47. Yes! If the type is known then it takes the most powerful cap and constructs a Node with that, so it then performs its own diminish to get the ro-cap from the rw-cap.

Except that if the most powerful cap is of unknown type, _create_from_cap (note the underscore) will return None, so create_from_cap will return UnknownNode(writecap, readcap), which doesn't ensure that the cap pair is consistent.

So, the proposed behaviour for 1.6 is identical to the current behaviour here: known cap pairs retrieved from the webapi are consistent, but unknown ones may not be.

(Note: this is an example of why off-line diminish can be important! Some designs require a round trip with a storage server before you can compute the ro-cap given the rw-cap. Oh but you could include a copy of the ro-cap with the rw-cap. Oh, but then the Tahoe-LAFS storage client couldn't be sure that the ro-cap pointed to the same file as the rw-cap. I hereby change my mind about off-line diminish -- I used to think it wasn't as important as having very small caps, but now I think it is very important.)

What's important here is off-line diminish from a writecap to a readcap. All the proposed designs support that. What might not be supported is off-line diminish from a short readcap to a verifycap. (Elk Point does not support that in cases where the readcap is too short to provide sufficient integrity.)

If unknown caps (i.e. unknown to the webapi server) are allowed to be returned in directory reads, then the webapi can't ensure this property for such caps. I'm leaning toward just documenting that.

Should the webapi server tag such caps with something like unk. so that the fact that they came from a context where this property couldn't be verified is not lost?

What would the webapi client do with that fact? imm. and ro. are associated with specific conformance requirements, but unk. wouldn't be -- it would just be a hint that the webapi client shouldn't assume something. It is easier to document that it shouldn't ever make that assumption. (Note that a webapi client could always decode the caps and check that they are consistent.)

Replying to [zooko]comment:30: > Replying to [davidsarah](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73530): > > The main remaining issue I haven't quite decided on, is that if we're removing the restriction on adding unknown URIs to a directory, we need to consider what consistency properties should be guaranteed for reads of *mutable* directories. If you read a directory entry or entries in a single webapi operation, then the `rw_uri` and `ro_uri` slots should be consistent with each other -- i.e. if both are present then they point to the same object, and `ro_uri` is the read-only version of `rw_uri` (assuming you trust your gateway to ensure that). Actually I'm not sure whether the current code guarantees that, but I think it does. > > I'm not sure either. Let's see: source:src/allmydata/nodemaker.py@4106#L47. Yes! If the type is known then it takes the most powerful cap and constructs a Node with that, so it then performs its own diminish to get the ro-cap from the rw-cap. Except that if the most powerful cap is of unknown type, `_create_from_cap` (note the underscore) will return `None`, so `create_from_cap` will return `UnknownNode(writecap, readcap)`, which doesn't ensure that the cap pair is consistent. So, the proposed behaviour for 1.6 is identical to the current behaviour here: known cap pairs retrieved from the webapi are consistent, but unknown ones may not be. > (Note: this is an example of why off-line diminish can be important! Some designs require a round trip with a storage server before you can compute the ro-cap given the rw-cap. Oh but you could include a copy of the ro-cap with the rw-cap. Oh, but then the Tahoe-LAFS storage client couldn't be sure that the ro-cap pointed to the same file as the rw-cap. I hereby change my mind about off-line diminish -- I used to think it wasn't as important as having very small caps, but now I think it is very important.) What's important here is off-line diminish from a writecap to a readcap. All the proposed designs support that. What might not be supported is off-line diminish from a short readcap to a verifycap. (Elk Point does not support that in cases where the readcap is too short to provide sufficient integrity.) > > If unknown caps (i.e. unknown to the webapi server) are allowed to be returned in directory reads, then the webapi can't ensure this property for such caps. I'm leaning toward just documenting that. > > Should the webapi server tag such caps with something like `unk.` so that the fact that they came from a context where this property couldn't be verified is not lost? What would the webapi client do with that fact? `imm.` and `ro.` are associated with specific conformance requirements, but `unk.` wouldn't be -- it would just be a hint that the webapi client shouldn't assume something. It is easier to document that it shouldn't ever make that assumption. (Note that a webapi client could always decode the caps and check that they are consistent.)
davidsarah commented 2010-01-18 20:04:25 +00:00
Owner

Replying to [davidsarah]comment:32:

Replying to [zooko]comment:30:

Replying to davidsarah:

If unknown caps (i.e. unknown to the webapi server) are allowed to be returned in directory reads, then the webapi can't ensure this property for such caps. I'm leaning toward just documenting that.

Should the webapi server tag such caps with something like unk. so that the fact that they came from a context where this property couldn't be verified is not lost?

What would the webapi client do with that fact? imm. and ro. are associated with specific conformance requirements, but unk. wouldn't be -- it would just be a hint that the webapi client shouldn't assume something. It is easier to document that it shouldn't ever make that assumption. (Note that a webapi client could always decode the caps and check that they are consistent.)

... although it should probably avoid decoding caps since that is the webapi server's responsibility. Which reminds me: we don't seem to have either a webapi operation or a CLI command to diminish a cap.

Replying to [davidsarah]comment:32: > Replying to [zooko]comment:30: > > Replying to [davidsarah](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73530): > > > If unknown caps (i.e. unknown to the webapi server) are allowed to be returned in directory reads, then the webapi can't ensure this property for such caps. I'm leaning toward just documenting that. > > > > Should the webapi server tag such caps with something like `unk.` so that the fact that they came from a context where this property couldn't be verified is not lost? > > What would the webapi client do with that fact? `imm.` and `ro.` are associated with specific conformance requirements, but `unk.` wouldn't be -- it would just be a hint that the webapi client shouldn't assume something. It is easier to document that it shouldn't ever make that assumption. (Note that a webapi client could always decode the caps and check that they are consistent.) ... although it should probably avoid decoding caps since that is the webapi server's responsibility. Which reminds me: we don't seem to have either a webapi operation or a CLI command to diminish a cap.

Okay, so the meaning of imm. in English is "I found this cap in an immutable context and I couldn't verify whether it is an immutable cap." rather than "I found this cap in an immutable context.". Sounds okay to me.

Replying to [davidsarah]comment:32:

Should the webapi server tag such caps with something like unk. so that the fact that they came from a context where this property couldn't be verified is not lost?

What would the webapi client do with that fact? imm. and ro. are associated with specific conformance requirements, but unk. wouldn't be -- it would just be a hint that the webapi client shouldn't assume something. It is easier to document that it shouldn't ever make that assumption.

Okay, I'm convinced.

Great! Are we out of open issues for this ticket?

I chatted with Scott Kittermann, a Master of the Universe, and he said we still had an excellent chance of getting Tahoe-LAFS v1.6 included in Lucid if we upload it to them on Monday 25th instead of (as per our previous plan) Wednesday the 20th. So let's do it!! :-)

Okay, so the meaning of `imm.` in English is "I found this cap in an immutable context and I couldn't verify whether it is an immutable cap." rather than "I found this cap in an immutable context.". Sounds okay to me. Replying to [davidsarah]comment:32: > > Should the webapi server tag such caps with something like `unk.` so that the fact that they came from a context where this property couldn't be verified is not lost? > > What would the webapi client do with that fact? `imm.` and `ro.` are associated with specific conformance requirements, but `unk.` wouldn't be -- it would just be a hint that the webapi client shouldn't assume something. It is easier to document that it shouldn't ever make that assumption. Okay, I'm convinced. Great! Are we out of open issues for this ticket? I chatted with Scott Kittermann, a Master of the Universe, and he said we still had an excellent chance of getting Tahoe-LAFS v1.6 included in Lucid if we upload it to them on Monday 25th instead of (as per our previous plan) Wednesday the 20th. So let's do it!! :-)
davidsarah commented 2010-01-18 20:26:05 +00:00
Owner

Replying to [davidsarah]comment:31:

Replying to [zooko]comment:29:

(I guess it doesn't have to enforce security properties on read when the purpose of the read is to make a shallow copy of (a subset of) the child links into a different Tahoe-LAFS directory,

Correct, because it's not decoding those child URIs.

but I feel like it "should" do so in order to be consistent and parallel with the other two targets that the information could be headed toward: the WUI and the WAPI.)

That would mean that a directory operation could have side-effects on child links that it isn't defined to alter.

I misread your comment -- you said a different directory. However, there's no webapi operation that directly copies caps from one directory to another. Copying is implemented by getting the JSON representation of a directory, and using mkdir-with-children or mkdir-immutable or set-children to create or modify another directory (I assume; I haven't looked at the implementation of tahoe cp). So this is covered by the validation of known cap pairs that is done on directory reads.

Yes, I think we are out of open issues. Yay for Tahoe 1.6 in Lucid!

Replying to [davidsarah]comment:31: > Replying to [zooko]comment:29: > > (I guess it doesn't *have* to enforce security properties on read when the purpose of the read is to make a shallow copy of (a subset of) the child links into a different Tahoe-LAFS directory, > > Correct, because it's not decoding those child URIs. > > > but I feel like it "should" do so in order to be consistent and parallel with the other two targets that the information could be headed toward: the WUI and the WAPI.) > > That would mean that a directory operation could have side-effects on child links that it isn't defined to alter. I misread your comment -- you said a *different* directory. However, there's no webapi operation that directly copies caps from one directory to another. Copying is implemented by getting the JSON representation of a directory, and using `mkdir-with-children` or `mkdir-immutable` or `set-children` to create or modify another directory (I assume; I haven't looked at the implementation of `tahoe cp`). So this is covered by the validation of known cap pairs that is done on directory reads. Yes, I think we are out of open issues. Yay for Tahoe 1.6 in Lucid!
Author

If we must use prefixes, let's use unknown. instead of unk. . We don't need these to be short, and abbreviated-to-the-point-of-incomprehensibility keywords drive me nuts.

I don't know if the same argument applies to imm. or ro. or whatever.

I haven't read the rest of the discussion thoroughly enough to have an intelligent opinion, but this prefix stuff makes me nervous. OTOH, the notion of attaching a "guard" to the cap seems to make sense. I just worry about last-minute changes to a two-year old API.

If we must use prefixes, let's use `unknown.` instead of `unk.` . We don't need these to be short, and abbreviated-to-the-point-of-incomprehensibility keywords drive me nuts. I don't know if the same argument applies to `imm.` or `ro.` or whatever. I haven't read the rest of the discussion thoroughly enough to have an intelligent opinion, but this prefix stuff makes me nervous. OTOH, the notion of attaching a "guard" to the cap seems to make sense. I just worry about last-minute changes to a two-year old API.
davidsarah commented 2010-01-19 03:25:41 +00:00
Owner

I've put the prefixes in READONLY_PREFIX and IMMUTABLE_PREFIX constants, so this would be trivial to change. I have no strong opinion on whether to use "ro." and "imm." vs "readonly." and "immutable.".

I understand the reasons for nervousness, but we're getting quite a lot of forward-compatibility goodness from this set of changes, and we should have sufficient time to review it before the release.

I've put the prefixes in READONLY_PREFIX and IMMUTABLE_PREFIX constants, so this would be trivial to change. I have no strong opinion on whether to use "`ro.`" and "`imm.`" vs "`readonly.`" and "`immutable.`". I understand the reasons for nervousness, but we're getting quite a lot of forward-compatibility goodness from this set of changes, and we should have sufficient time to review it before the release.
davidsarah commented 2010-01-22 06:12:25 +00:00
Owner

Attachment docs-diff.txt (24647 bytes) added

Documentation patches

**Attachment** docs-diff.txt (24647 bytes) added Documentation patches
davidsarah commented 2010-01-23 13:05:21 +00:00
Owner

Attachment test-diff.txt (173395 bytes) added

Diff for tests

**Attachment** test-diff.txt (173395 bytes) added Diff for tests
davidsarah commented 2010-01-23 13:06:14 +00:00
Owner

Attachment all-diff.txt (245764 bytes) added

Diff for everything (code + tests + docs)

**Attachment** all-diff.txt (245764 bytes) added Diff for everything (code + tests + docs)
davidsarah commented 2010-01-23 13:08:25 +00:00
Owner

The changes are larger and less elegant than I'd hoped, and I
failed to resist the temptation to do some refactoring of
test_web.py. The changes to test_web.py that are most relevant,
1.e. excluding the refactoring, are in these functions:

  _create_initial_children
  _create_immutable_children
  test_POST_NEWDIRURL_initial_children
  test_POST_NEWDIRURL_immutable
  test_POST_mkdir_immutable
  test_POST_mkdir_no_parentdir_initial_children
  test_POST_mkdir_no_parentdir_immutable
  test_unknown
  test_immutable_unknown
  test_deep_check
  test_mutant_dirnodes_are_omitted

The last of these is a new test that directly checks the
main problem in this ticket; you may want to look at it first.

To review the code patches, I suggest looking at interfaces.py,
unknown.py, uri.py, dirnode.py, and nodemaker.py first.

There are some commented-out print statements left, which I'll
remove before preparing the final patch.

I couldn't figure out how to tell darcs to produce a diff that
excludes certain directories, so all-diff.txt contains the test
and doc diffs as well.

The changes are larger and less elegant than I'd hoped, and I failed to resist the temptation to do some refactoring of test_web.py. The changes to test_web.py that are most relevant, 1.e. excluding the refactoring, are in these functions: ``` _create_initial_children _create_immutable_children test_POST_NEWDIRURL_initial_children test_POST_NEWDIRURL_immutable test_POST_mkdir_immutable test_POST_mkdir_no_parentdir_initial_children test_POST_mkdir_no_parentdir_immutable test_unknown test_immutable_unknown test_deep_check test_mutant_dirnodes_are_omitted ``` The last of these is a new test that directly checks the main problem in this ticket; you may want to look at it first. To review the code patches, I suggest looking at interfaces.py, unknown.py, uri.py, dirnode.py, and nodemaker.py first. There are some commented-out print statements left, which I'll remove before preparing the final patch. I couldn't figure out how to tell darcs to produce a diff that excludes certain directories, so all-diff.txt contains the test and doc diffs as well.
davidsarah commented 2010-01-23 13:27:42 +00:00
Owner

The calls to rw_uri.strip() and ro_uri.strip() in dirnode.py should be

rw_uri.strip(' ') and ro_uri.strip(' '),

since the [http://docs.python.org/library/stdtypes.html#str.strip Python docs] don't clearly specify which characters are treated as whitespace by default. Actually that probably also applies to uses of .strip() elsewhere.

The calls to `rw_uri.strip()` and `ro_uri.strip()` in dirnode.py should be `rw_uri.strip(' ')` and `ro_uri.strip(' ')`, since the [http://docs.python.org/library/stdtypes.html#str.strip Python docs] don't clearly specify which characters are treated as whitespace by default. Actually that probably also applies to uses of `.strip()` elsewhere.
davidsarah commented 2010-01-24 06:10:54 +00:00
Owner

Attachment all-with-json-fix-diff.txt (274578 bytes) added

Diff for everything including t=json fix (code + tests + docs)

**Attachment** all-with-json-fix-diff.txt (274578 bytes) added Diff for everything including t=json fix (code + tests + docs)
davidsarah commented 2010-01-24 06:14:20 +00:00
Owner

The t=json option was not working for unknown nodes, so I've fixed that.

(Go to here and click on JSON to see the error message produced by the current code.)

The t=json option was not working for unknown nodes, so I've fixed that. (Go to [here](http://testgrid.allmydata.org:3567/uri/URI%3ADIR2%3Abntvvn2bf37vwmrd3ttbygt4s4%3Aqymwa2lcvhjvi3hnamxew6ywro5jmj7zfnspmrpqv3n7faibprlq/child-from-the-future?t=info) and click on JSON to see the error message produced by the current code.)
kevan commented 2010-01-25 00:35:13 +00:00
Owner

I've read through this ticket, and I want to make sure that I understand what the proposed solution is before I start reviewing it.

The motivating problem (originally, at least) is that immutable directories were allowed to have mutable children. This goes against the expectation that the contents of an immutable directory are themselves immutable, as stated in comment:73504. The problem them expanded to include dealing with how clients from the present deal with caps from the future.

The solution that I think ended up being the one that everyone agreed with was stated in comment:73525. This was further elaborated on in comment:27. Paraphrasing:

  • When reading an immutable directory, caps that we can interpret, and that are known to be mutable should be omitted entirely. When writing an immutable directory, the existence of caps that are mutable should be cause for an error and a failure.
  • When reading an immutable directory, unknown caps in the rw_uri slot should be silently ignored, and unknown caps in the ro_uri slot should have an ro. or imm. prefix removed, and replaced with an imm. prefix. When writing an immutable directory, unknown caps in the ro_uri slot should be prefixed with imm. if they are not already, and the existence of unknown caps in the rw_uri slot should be cause for an error and a failure. (extrapolating from comment:73524)
  • When reading a mutable directory, unknown caps in the rw_uri slot should be passed through as normal. When writing a mutable directory, unknown caps in the rw_uri slot should be passed through as normal, and unknown caps in the ro_uri slot should have any existing prefix removed and replaced with ro..
  • When presented with a cap prefixed with imm. or ro., webapi servers should see if it is a cap that they understand without the prefix. If it is, they should attempt to verify that it matches the prefix -- in other words, that it is immutable if prefixed with imm. (this is suggested in comment:29).

Have I misunderstood anything? Have I missed anything? If not, I'll accept this ticket and start reviewing it.

I've read through this ticket, and I want to make sure that I understand what the proposed solution is before I start reviewing it. The motivating problem (originally, at least) is that immutable directories were allowed to have mutable children. This goes against the expectation that the contents of an immutable directory are themselves immutable, as stated in [comment:73504](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73504). The problem them expanded to include dealing with how clients from the present deal with caps from the future. The solution that I think ended up being the one that everyone agreed with was stated in [comment:73525](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73525). This was further elaborated on in comment:27. Paraphrasing: * When reading an immutable directory, caps that we can interpret, and that are known to be mutable should be omitted entirely. When writing an immutable directory, the existence of caps that are mutable should be cause for an error and a failure. * When reading an immutable directory, unknown caps in the `rw_uri` slot should be silently ignored, and unknown caps in the `ro_uri` slot should have an `ro.` or `imm.` prefix removed, and replaced with an `imm.` prefix. When writing an immutable directory, unknown caps in the `ro_uri` slot should be prefixed with `imm.` if they are not already, and the existence of unknown caps in the `rw_uri` slot should be cause for an error and a failure. (extrapolating from [comment:73524](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73524)) * When reading a mutable directory, unknown caps in the `rw_uri` slot should be passed through as normal. When writing a mutable directory, unknown caps in the `rw_uri` slot should be passed through as normal, and unknown caps in the `ro_uri` slot should have any existing prefix removed and replaced with `ro.`. * When presented with a cap prefixed with `imm.` or `ro.`, webapi servers should see if it is a cap that they understand without the prefix. If it is, they should attempt to verify that it matches the prefix -- in other words, that it is immutable if prefixed with `imm.` (this is suggested in comment:29). Have I misunderstood anything? Have I missed anything? If not, I'll accept this ticket and start reviewing it.
davidsarah commented 2010-01-25 12:19:07 +00:00
Owner

Replying to kevan:

The solution that I think ended up being the one that everyone agreed with was stated in comment:73525. This was

further elaborated on in comment:27. Paraphrasing:

  • When reading an immutable directory, caps that we can interpret, and that are known to be mutable should be

omitted entirely. When writing an immutable directory, the existence of caps that are mutable should be cause for

an error and a failure.

Yes. This should fail the operation with a MustBeDeepImmutableError.

  • When reading an immutable directory, unknown caps in the rw_uri slot should be silently ignored,

In an immutable directory, the rwcapdata field that would contain the encrypted and mac'd rw_uri

should always be empty (zero-length), since the directory has no writekey. In the original patches I ignored this

field, but on reflection I think we should treat a non-empty rwcapdata in the same way as a netstring

decoding error, i.e. raise ValueError. That's what the latest patch new-all-diff.txt does.

We also test that an immutable directory's underlying filenode has no .get_writekey() method, so that the

code to decrypt this field would fail anyway if it were mistakenly executed.

and unknown caps in the ro_uri slot should have an ro. or imm. prefix removed, and replaced

with an imm. prefix.

Yes.

When writing an immutable directory, unknown caps in the ro_uri slot should be prefixed with imm.

if they are not already,

No, this isn't necessary: the ro_uri slots of an immutable directory are implicitly immutable. The

imm. will be added when the slot is read out by a client that doesn't understand the cap format, but it is

not stored. In fact, ro. or imm. will be stripped if present (by strip_prefix_for_ro in

unknown.py).

(If the prefix were not stripped, then it would be more difficult to test what happens when .imm is absent

-- this would be a case that the attacker could provoke that would not occur in normal operation.)

and the existence of unknown caps in the rw_uri slot should be cause for an error and a failure.

(extrapolating from comment:20)

Yes. Actually the existence of any cap in the rw_uri slot of an immutable directory is a cause for

failure. However it is valid to provide a rw_uri in the JSON representation or in an URL parameter if it

can be diminished to a read cap (handled by existing code in dirnode.py:create_from_cap), or in the latest

version of the patch, if it already had an imm. prefix. In either of those cases, it will be moved to the

ro_uri slot.

  • When reading a mutable directory, unknown caps in the rw_uri slot should be passed through as normal.

When writing a mutable directory, unknown caps in the rw_uri slot should be passed through as normal, and

unknown caps in the ro_uri slot should have any existing prefix removed and replaced with ro..

Not quite. If there was an existing imm. prefix, it should stay. (This would happen for an unknown cap that

is copied from an immutable directory to a mutable one. It ensures that the constraint that the cap was supposed

to refer to an immutable subtree is not forgotten.)

Similarly to the immutable case, if an unknown cap is given in a rw_uri slot with no ro_uri and the

cap already has a ro. prefix, then it will be moved to the ro_uri slot. (This makes it possible to

link a ro. or imm.-prefixed unknown cap into an existing directory using the WUI, without needing a

JSON request.)

  • When presented with a cap prefixed with imm. or ro., webapi servers should see if it is a cap

that they understand without the prefix. If it is, they should attempt to verify that it matches the prefix -- in

other words, that it is immutable if prefixed with imm. (this is suggested in comment:29).

... and read-only if prefixed with ro.. Yes.

Have I misunderstood anything? Have I missed anything? If not, I'll accept this ticket and start reviewing it.

Please do, thanks.

Replying to [kevan](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73543): > The solution that I think ended up being the one that everyone agreed with was stated in [comment:73525](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73525). This was further elaborated on in comment:27. Paraphrasing: > > * When reading an immutable directory, caps that we can interpret, and that are known to be mutable should be omitted entirely. When writing an immutable directory, the existence of caps that are mutable should be cause for an error and a failure. Yes. This should fail the operation with a [MustBeDeepImmutableError](wiki/MustBeDeepImmutableError). > * When reading an immutable directory, unknown caps in the `rw_uri` slot should be silently ignored, In an immutable directory, the `rwcapdata` field that would contain the encrypted and mac'd `rw_uri` should always be empty (zero-length), since the directory has no writekey. In the original patches I ignored this field, but on reflection I think we should treat a non-empty `rwcapdata` in the same way as a netstring decoding error, i.e. raise [ValueError](wiki/ValueError). That's what the latest patch new-all-diff.txt does. We also test that an immutable directory's underlying filenode has no `.get_writekey()` method, so that the code to decrypt this field would fail anyway if it were mistakenly executed. > and unknown caps in the `ro_uri` slot should have an `ro.` or `imm.` prefix removed, and replaced with an `imm.` prefix. Yes. > When writing an immutable directory, unknown caps in the `ro_uri` slot should be prefixed with `imm.` if they are not already, No, this isn't necessary: the `ro_uri` slots of an immutable directory are implicitly immutable. The `imm.` will be added when the slot is read out by a client that doesn't understand the cap format, but it is not stored. In fact, `ro.` or `imm.` will be stripped if present (by `strip_prefix_for_ro` in unknown.py). (If the prefix were not stripped, then it would be more difficult to test what happens when `.imm` is absent -- this would be a case that the attacker could provoke that would not occur in normal operation.) > and the existence of unknown caps in the `rw_uri` slot should be cause for an error and a failure. (extrapolating from comment:20) Yes. Actually the existence of *any* cap in the `rw_uri` slot of an immutable directory is a cause for failure. However it is valid to provide a `rw_uri` in the JSON representation or in an URL parameter if it can be diminished to a read cap (handled by existing code in dirnode.py:create_from_cap), or in the latest version of the patch, if it already had an `imm.` prefix. In either of those cases, it will be moved to the `ro_uri` slot. > * When reading a mutable directory, unknown caps in the `rw_uri` slot should be passed through as normal. When writing a mutable directory, unknown caps in the `rw_uri` slot should be passed through as normal, and unknown caps in the `ro_uri` slot should have any existing prefix removed and replaced with `ro.`. Not quite. If there was an existing `imm.` prefix, it should stay. (This would happen for an unknown cap that is copied from an immutable directory to a mutable one. It ensures that the constraint that the cap was supposed to refer to an immutable subtree is not forgotten.) Similarly to the immutable case, if an unknown cap is given in a `rw_uri` slot with no `ro_uri` and the cap already has a `ro.` prefix, then it will be moved to the `ro_uri` slot. (This makes it possible to link a `ro.` or `imm.`-prefixed unknown cap into an existing directory using the WUI, without needing a JSON request.) > * When presented with a cap prefixed with `imm.` or `ro.`, webapi servers should see if it is a cap that they understand without the prefix. If it is, they should attempt to verify that it matches the prefix -- in other words, that it is immutable if prefixed with `imm.` (this is suggested in comment:29). ... and read-only if prefixed with `ro.`. Yes. > Have I misunderstood anything? Have I missed anything? If not, I'll accept this ticket and start reviewing it. Please do, thanks.
davidsarah commented 2010-01-25 12:22:29 +00:00
Owner

[formatting]fixed

Replying to kevan:

The solution that I think ended up being the one that everyone agreed with was stated in comment:73525. This was further elaborated on in comment:27. Paraphrasing:

  • When reading an immutable directory, caps that we can interpret, and that are known to be mutable should be omitted entirely. When writing an immutable directory, the existence of caps that are mutable should be cause for an error and a failure.

Yes. This should fail the operation with a MustBeDeepImmutableError.

  • When reading an immutable directory, unknown caps in the rw_uri slot should be silently ignored,

In an immutable directory, the rwcapdata field that would contain the encrypted and mac'd rw_uri should always be empty (zero-length), since the directory has no writekey. In the original patches I ignored this field, but on reflection I think we should treat a non-empty rwcapdata in the same way as a netstring decoding error, i.e. raise ValueError. That's what the latest patch new-all-diff.txt does.

We also test that an immutable directory's underlying filenode has no .get_writekey() method, so that the code to decrypt this field would fail anyway if it were mistakenly executed.

and unknown caps in the ro_uri slot should have an ro. or imm. prefix removed, and replaced with an imm. prefix.

Yes.

When writing an immutable directory, unknown caps in the ro_uri slot should be prefixed with imm. if they are not already,

No, this isn't necessary: the ro_uri slots of an immutable directory are implicitly immutable. The imm. will be added when the slot is read out by a client that doesn't understand the cap format, but it is not stored. In fact, ro. or imm. will be stripped if present (by strip_prefix_for_ro in unknown.py).

(If the prefix were not stripped, then it would be more difficult to test what happens when .imm is absent -- this would be a case that the attacker could provoke that would not occur in normal operation.)

and the existence of unknown caps in the rw_uri slot should be cause for an error and a failure.

(extrapolating from comment:73524)

Yes. Actually the existence of any cap in the rw_uri slot of an immutable directory is a cause for failure. However it is valid to provide a rw_uri in the JSON representation or in an URL parameter if it can be diminished to a read cap (handled by existing code in dirnode.py:create_from_cap), or in the latest version of the patch, if it already had an imm. prefix. In either of those cases, it will be moved to the ro_uri slot.

  • When reading a mutable directory, unknown caps in the rw_uri slot should be passed through as normal.

When writing a mutable directory, unknown caps in the rw_uri slot should be passed through as normal, and unknown caps in the ro_uri slot should have any existing prefix removed and replaced with ro..

Not quite. If there was an existing imm. prefix, it should stay. (This would happen for an unknown cap that is copied from an immutable directory to a mutable one. It ensures that the constraint that the cap was supposed to refer to an immutable subtree is not forgotten.)

Similarly to the immutable case, if an unknown cap is given in a rw_uri slot with no ro_uri and the cap already has a ro. prefix, then it will be moved to the ro_uri slot. (This makes it possible to link a ro. or imm.-prefixed unknown cap into an existing directory using the WUI, without needing a JSON request.)

  • When presented with a cap prefixed with imm. or ro., webapi servers should see if it is a cap that they understand without the prefix. If it is, they should attempt to verify that it matches the prefix -- in other words, that it is immutable if prefixed with imm. (this is suggested in comment:29).

... and read-only if prefixed with ro.. Yes.

Have I misunderstood anything? Have I missed anything? If not, I'll accept this ticket and start reviewing it.

Please do, thanks.

[formatting]fixed Replying to [kevan](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73543): > The solution that I think ended up being the one that everyone agreed with was stated in [comment:73525](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73525). This was further elaborated on in comment:27. Paraphrasing: > > * When reading an immutable directory, caps that we can interpret, and that are known to be mutable should be omitted entirely. When writing an immutable directory, the existence of caps that are mutable should be cause for an error and a failure. Yes. This should fail the operation with a [MustBeDeepImmutableError](wiki/MustBeDeepImmutableError). > * When reading an immutable directory, unknown caps in the `rw_uri` slot should be silently ignored, In an immutable directory, the `rwcapdata` field that would contain the encrypted and mac'd `rw_uri` should always be empty (zero-length), since the directory has no writekey. In the original patches I ignored this field, but on reflection I think we should treat a non-empty `rwcapdata` in the same way as a netstring decoding error, i.e. raise [ValueError](wiki/ValueError). That's what the latest patch new-all-diff.txt does. We also test that an immutable directory's underlying filenode has no `.get_writekey()` method, so that the code to decrypt this field would fail anyway if it were mistakenly executed. > and unknown caps in the `ro_uri` slot should have an `ro.` or `imm.` prefix removed, and replaced with an `imm.` prefix. Yes. > When writing an immutable directory, unknown caps in the `ro_uri` slot should be prefixed with `imm.` if they are not already, No, this isn't necessary: the `ro_uri` slots of an immutable directory are implicitly immutable. The `imm.` will be added when the slot is read out by a client that doesn't understand the cap format, but it is not stored. In fact, `ro.` or `imm.` will be stripped if present (by `strip_prefix_for_ro` in unknown.py). (If the prefix were not stripped, then it would be more difficult to test what happens when `.imm` is absent -- this would be a case that the attacker could provoke that would not occur in normal operation.) > and the existence of unknown caps in the `rw_uri` slot should be cause for an error and a failure. (extrapolating from [comment:73524](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73524)) Yes. Actually the existence of *any* cap in the `rw_uri` slot of an immutable directory is a cause for failure. However it is valid to provide a `rw_uri` in the JSON representation or in an URL parameter if it can be diminished to a read cap (handled by existing code in dirnode.py:create_from_cap), or in the latest version of the patch, if it already had an `imm.` prefix. In either of those cases, it will be moved to the `ro_uri` slot. > * When reading a mutable directory, unknown caps in the `rw_uri` slot should be passed through as normal. When writing a mutable directory, unknown caps in the `rw_uri` slot should be passed through as normal, and unknown caps in the `ro_uri` slot should have any existing prefix removed and replaced with `ro.`. Not quite. If there was an existing `imm.` prefix, it should stay. (This would happen for an unknown cap that is copied from an immutable directory to a mutable one. It ensures that the constraint that the cap was supposed to refer to an immutable subtree is not forgotten.) Similarly to the immutable case, if an unknown cap is given in a `rw_uri` slot with no `ro_uri` and the cap already has a `ro.` prefix, then it will be moved to the `ro_uri` slot. (This makes it possible to link a `ro.` or `imm.`-prefixed unknown cap into an existing directory using the WUI, without needing a JSON request.) > * When presented with a cap prefixed with `imm.` or `ro.`, webapi servers should see if it is a cap that they understand without the prefix. If it is, they should attempt to verify that it matches the prefix -- in other words, that it is immutable if prefixed with `imm.` (this is suggested in comment:29). ... and read-only if prefixed with `ro.`. Yes. > Have I misunderstood anything? Have I missed anything? If not, I'll accept this ticket and start reviewing it. Please do, thanks.
davidsarah commented 2010-01-25 12:24:18 +00:00
Owner

Attachment new-all-diff.2.txt (202060 bytes) added

New diff for everything (code + tests + docs) -- review this version

**Attachment** new-all-diff.2.txt (202060 bytes) added New diff for everything (code + tests + docs) -- review this version
kevan commented 2010-01-26 03:47:06 +00:00
Owner

I've looked through the changes to behavior (with the exception of those to the webapi) and documentation, and have the following comments so far. I'll probably have more after I start working on the remaining parts, but posting what I have now might make this patchset more likely to make it into trunk by Thursday.

interfaces.py looks okay, but I was a bit confused when I read "...and then used" at the end of MustBeDeepImmutableError and MustBeReadOnlyError. This is possibly me being dense after work, but maybe you could clarify the context in which those are being used to trigger that error?

The docstring for strip_prefix_for_ro should be inside the function, not before it.

In the *init* method of UnknownNode, you say if x is not None: instead of if x. I'm assuming that that's to deal with empty string arguments/other arguments that evaluate to False in a comparison?

if given_ro_uri is not None:
            read_cap = uri.from_string(given_ro_uri, deep_immutable=deep_immutable, name=name)
            if isinstance(read_cap, uri.UnknownURI):
                self.error = read_cap.get_error()
                if self.error:
                    assert self.rw_uri is None and self.ro_uri is None
                    return

How does this know the difference between an UnknownURI that is an UnknownURI because we don't know what it is, and an UnknownURI that is unknown because we know what it is but it is prefixed with something that makes it nonsensical? From what I can understand by reading the code in UnknownURI, it is possible to get that result in either case. Apologies if this is a case of me being dense.

if deep_immutable:
            assert self.rw_uri is None
            # strengthen the constraint on ro_uri to ALLEGED_IMMUTABLE_PREFIX
            if given_ro_uri is not None:
                if given_ro_uri.startswith(ALLEGED_IMMUTABLE_PREFIX):
                    self.ro_uri = given_ro_uri
                elif given_ro_uri.startswith(ALLEGED_READONLY_PREFIX):
                    self.ro_uri = ALLEGED_IMMUTABLE_PREFIX + given_ro_uri[len(ALLEGED_READONLY_PREFIX):]
                else:
                    self.ro_uri = ALLEGED_IMMUTABLE_PREFIX + given_ro_uri

What is this assertion meant to do? I don't think that anything so far in the code would have assigned anything other than none to self.rw_uri. I don't suppose that it hurts to have it there, aside from maybe meaning that the calling code needs to be more complicated if it wants to handle all of the error conditions that UnknownNode can enter. It also serves as self-documentation, I guess.

Any reason for implementing is_unknown in _BaseURI? Is it anything that might need to be added to the IURI interface? What about is_readonly, is_mutable, get_readonly, get_verify_cap, which were added to a number of URIs in uri.py. Should they be added to one of the URI interfaces?

The spacing on your changes to SSKVerifierURI is inconsistent with your changes in the other URI classes.

I didn't see any problems in nodemaker.py or dirnode.py.

If you're correcting things in webapi.txt, you might try replacing Tahoe with Tahoe-LAFS, which (I gather) is more correct. But maybe that's out of scope for this ticket.

I think it would be easier to read if you changed the from_string_ methods in uri.py to use explicit named keyword arguments instead of the **kwargs form.

I have a block of free time tomorrow, so I'm hoping I will have the rest of the review done by then.

I've looked through the changes to behavior (with the exception of those to the webapi) and documentation, and have the following comments so far. I'll probably have more after I start working on the remaining parts, but posting what I have now might make this patchset more likely to make it into trunk by Thursday. interfaces.py looks okay, but I was a bit confused when I read "...and then used" at the end of `MustBeDeepImmutableError` and `MustBeReadOnlyError`. This is possibly me being dense after work, but maybe you could clarify the context in which those are being used to trigger that error? The docstring for `strip_prefix_for_ro` should be inside the function, not before it. In the `*init*` method of [UnknownNode](wiki/UnknownNode), you say `if x is not None:` instead of `if x`. I'm assuming that that's to deal with empty string arguments/other arguments that evaluate to False in a comparison? ``` if given_ro_uri is not None: read_cap = uri.from_string(given_ro_uri, deep_immutable=deep_immutable, name=name) if isinstance(read_cap, uri.UnknownURI): self.error = read_cap.get_error() if self.error: assert self.rw_uri is None and self.ro_uri is None return ``` How does this know the difference between an UnknownURI that is an UnknownURI because we don't know what it is, and an UnknownURI that is unknown because we know what it is but it is prefixed with something that makes it nonsensical? From what I can understand by reading the code in UnknownURI, it is possible to get that result in either case. Apologies if this is a case of me being dense. ``` if deep_immutable: assert self.rw_uri is None # strengthen the constraint on ro_uri to ALLEGED_IMMUTABLE_PREFIX if given_ro_uri is not None: if given_ro_uri.startswith(ALLEGED_IMMUTABLE_PREFIX): self.ro_uri = given_ro_uri elif given_ro_uri.startswith(ALLEGED_READONLY_PREFIX): self.ro_uri = ALLEGED_IMMUTABLE_PREFIX + given_ro_uri[len(ALLEGED_READONLY_PREFIX):] else: self.ro_uri = ALLEGED_IMMUTABLE_PREFIX + given_ro_uri ``` What is this assertion meant to do? I don't think that anything so far in the code would have assigned anything other than none to self.rw_uri. I don't suppose that it hurts to have it there, aside from maybe meaning that the calling code needs to be more complicated if it wants to handle all of the error conditions that [UnknownNode](wiki/UnknownNode) can enter. It also serves as self-documentation, I guess. Any reason for implementing `is_unknown` in `_BaseURI`? Is it anything that might need to be added to the `IURI` interface? What about `is_readonly`, `is_mutable`, `get_readonly`, `get_verify_cap`, which were added to a number of URIs in uri.py. Should they be added to one of the URI interfaces? The spacing on your changes to SSKVerifierURI is inconsistent with your changes in the other URI classes. I didn't see any problems in nodemaker.py or dirnode.py. If you're correcting things in webapi.txt, you might try replacing Tahoe with Tahoe-LAFS, which (I gather) is more correct. But maybe that's out of scope for this ticket. I think it would be easier to read if you changed the from_string_ methods in uri.py to use explicit named keyword arguments instead of the **kwargs form. I have a block of free time tomorrow, so I'm hoping I will have the rest of the review done by then.
davidsarah commented 2010-01-26 06:24:18 +00:00
Owner

interfaces.py looks okay, but I was a bit confused when I read "...and then used" at the end of MustBeDeepImmutableError and MustBeReadOnlyError. This is possibly me being dense after work, but maybe you could clarify the context in which those are being used to trigger that error?

An UnknownNode may have one of these error objects in its .error property. They will only be thrown if node.raise_error() is called. This happens when we try to put the node into a directory (besides tests, the relevant calls to raise_error are in dirnode.py and nodemaker.py).

The docstring for strip_prefix_for_ro should be inside the function, not before it.

OK.

In the *init* method of UnknownNode, you say if x is not None: instead of if x. I'm assuming that that's to deal with empty string arguments/other arguments that evaluate to False in a comparison?

Yes. I'll recheck what happens to falsy URIs.

>  if given_ro_uri is not None:
>              read_cap = uri.from_string(given_ro_uri,
>  deep_immutable=deep_immutable, name=name)
>              if isinstance(read_cap, uri.UnknownURI):
>                  self.error = read_cap.get_error()
>                  if self.error:
>                      assert self.rw_uri is None and self.ro_uri is None
>                      return

How does this know the difference between an UnknownURI that is an UnknownURI because we don't know what it is, and an UnknownURI that is unknown because we know what it is but it is prefixed with something that makes it nonsensical?

In either case, isinstance(read_cap, uri.UnknownURI) will be true. But only in the latter case will read_cap.get_error() be truthy.

>  if deep_immutable:
>              assert self.rw_uri is None
...

What is this assertion meant to do? I don't think that anything so far in the code would have assigned anything other than none to self.rw_uri. I don't suppose that it hurts to have it there, aside from maybe meaning that the calling code needs to be more complicated if it wants to handle all of the error conditions that UnknownNode can enter. It also serves as self-documentation, I guess.

It's intended as documentation. I normally use asserts only for documentation, not for validity checks.

Any reason for implementing is_unknown in _BaseURI? Is it anything that might need to be added to the IURI interface?

No, I'll remove that. (I originally had an is_unknown method on URIs but changed to using isinstance(..., UnknownURI).)

What about is_readonly, is_mutable, get_readonly, get_verify_cap, which were added to a number of URIs in uri.py. Should they be added to one of the URI interfaces?

The existing code was inconsistent; it had these methods for some URI classes and not others. They were already declared in IURI.

The spacing on your changes to SSKVerifierURI is inconsistent with your changes in the other URI classes.

Fixed.

I didn't see any problems in nodemaker.py or dirnode.py.

If you're correcting things in webapi.txt, you might try replacing Tahoe with Tahoe-LAFS, which (I gather) is more correct. But maybe that's out of scope for this ticket.

Out of scope (there are > 60 uses of "Tahoe" in webapi.txt!)

I think it would be easier to read if you changed the from_string_ methods in uri.py to use explicit named keyword arguments instead of the **kwargs form.

Will do.

> interfaces.py looks okay, but I was a bit confused when I read "...and then used" at the end of [MustBeDeepImmutableError](wiki/MustBeDeepImmutableError) and [MustBeReadOnlyError](wiki/MustBeReadOnlyError). This is possibly me being dense after work, but maybe you could clarify the context in which those are being used to trigger that error? An [UnknownNode](wiki/UnknownNode) may have one of these error objects in its .error property. They will only be thrown if node.raise_error() is called. This happens when we try to put the node into a directory (besides tests, the relevant calls to raise_error are in dirnode.py and nodemaker.py). > The docstring for `strip_prefix_for_ro` should be inside the function, not before it. OK. > In the `*init*` method of [UnknownNode](wiki/UnknownNode), you say `if x is not None:` instead of `if x`. I'm assuming that that's to deal with empty string arguments/other arguments that evaluate to False in a comparison? Yes. I'll recheck what happens to falsy URIs. ``` > if given_ro_uri is not None: > read_cap = uri.from_string(given_ro_uri, > deep_immutable=deep_immutable, name=name) > if isinstance(read_cap, uri.UnknownURI): > self.error = read_cap.get_error() > if self.error: > assert self.rw_uri is None and self.ro_uri is None > return ``` > How does this know the difference between an UnknownURI that is an UnknownURI because we don't know what it is, and an UnknownURI that is unknown because we know what it is but it is prefixed with something that makes it nonsensical? In either case, isinstance(read_cap, uri.UnknownURI) will be true. But only in the latter case will read_cap.get_error() be truthy. ``` > if deep_immutable: > assert self.rw_uri is None ... ``` > What is this assertion meant to do? I don't think that anything so far in the code would have assigned anything other than none to self.rw_uri. I don't suppose that it hurts to have it there, aside from maybe meaning that the calling code needs to be more complicated if it wants to handle all of the error conditions that [UnknownNode](wiki/UnknownNode) can enter. It also serves as self-documentation, I guess. It's intended as documentation. I normally use asserts only for documentation, not for validity checks. > Any reason for implementing `is_unknown` in `_BaseURI`? Is it anything that might need to be added to the `IURI` interface? No, I'll remove that. (I originally had an is_unknown method on URIs but changed to using isinstance(..., UnknownURI).) > What about `is_readonly`, `is_mutable`, `get_readonly`, `get_verify_cap`, which were added to a number of URIs in uri.py. Should they be added to one of the URI interfaces? The existing code was inconsistent; it had these methods for some URI classes and not others. They were already declared in IURI. > The spacing on your changes to SSKVerifierURI is inconsistent with your changes in the other URI classes. Fixed. > I didn't see any problems in nodemaker.py or dirnode.py. > If you're correcting things in webapi.txt, you might try replacing Tahoe with Tahoe-LAFS, which (I gather) is more correct. But maybe that's out of scope for this ticket. Out of scope (there are > 60 uses of "Tahoe" in webapi.txt!) > I think it would be easier to read if you changed the from_string_ methods in uri.py to use explicit named keyword arguments instead of the **kwargs form. Will do.
kevan commented 2010-01-27 05:58:35 +00:00
Owner

web/common.py looks good.

web/dirnode.py looks good.

web/filenode.py looks good.

web/info.py looks good.

web/root.py looks good.

test/common.py looks good.

test/test_dirnode.py:

+        bad_future_node = UnknownNode(future_write_uri, None)
+        bad_kids1 = {u"one": (bad_future_node, {})}
         d.addCallback(lambda ign:
+                      self.shouldFail(MustNotBeUnknownRWError, "bad_kids1",
+                                      "cannot attach unknown",
                                       nm.create_new_mutable_directory,
                                       bad_kids1))

Could you add a comment here explaining why this should fail? If I understand correctly, it is because the cap you provide doesn't start with ro. or imm. and since it is unknown we don't know how to diminish it to a readonly cap?

(Few of the tests so far in test_dirnode.py have explanations, but I've been able to understand them without. Explanations aren't a bad idea, though)

I think that FakeMutableFile? needs to implement raise_error.

test_filenode.py:

around line 41,

+        self.failUnlessEqual(fn1.is_unknown(), False)
+        self.failUnlessEqual(fn1.is_allowed_in_immutable_directory(), True)

could be re-written as

    self.failIf(fn1.is_unknown())
    self.failUnless(fn1.is_allowed_in_mutable_directory())

I think it reads better that way, anyway.

Similarly for:

+        self.failUnlessEqual(v.is_readonly(), True)
+        self.failUnlessEqual(v.is_mutable(), False)

just a little below that and for

+        self.failUnlessEqual(fn1.is_unknown(), False)
+        self.failUnlessEqual(fn1.is_allowed_in_immutable_directory(), True)

around line 64 and

+        self.failUnlessEqual(n.is_unknown(), False)
+        self.failUnlessEqual(n.is_allowed_in_immutable_directory(), False)

in the chunk around line 99 and

+        self.failUnlessEqual(nro.is_mutable(), True)
+        self.failUnlessEqual(nro.is_readonly(), True)
+        self.failUnlessEqual(nro.is_unknown(), False)
+        self.failUnlessEqual(nro.is_allowed_in_immutable_directory(), False)

in the chunk around line 127.

(test_filenode.py uses failUnlessEqual fairly solidly where it could use those, though, so what you're doing is at least consistent with what is already there)

test/test_system.py seems okay.

test/test_uri.py:

It would be nice to have a comment explaining why the invalid uris are invalid, though it's easy enough to figure out after a few seconds of looking at it. Just a thought.

test_web.py:

I'm assuming that existing tests will already fail if we attempt to add an unknown node using PUT /uri/$DIRCAP/SUBDIRS../CHILDNAME?t=uri and POST /uri/$DIRCAP/SUBDIRS../?t=uri&name=CHILDNAME&uri=CHILDCAP, since that was the existing behavior before your changes.

You've changed the _create_initial_children and _create_immutable_children to include unknown URIs, which threads tests for those nicely through the existing tests.

Other comments:

Can you add a test for the change you made in the chunk starting around line 230 of dirnode.py that references #925?

My checklist of things that I thought there needed to be tests for is appeased, anyway, and I didn't see any other problems, so I guess this is reviewed.

web/common.py looks good. web/dirnode.py looks good. web/filenode.py looks good. web/info.py looks good. web/root.py looks good. test/common.py looks good. test/test_dirnode.py: ``` + bad_future_node = UnknownNode(future_write_uri, None) + bad_kids1 = {u"one": (bad_future_node, {})} d.addCallback(lambda ign: + self.shouldFail(MustNotBeUnknownRWError, "bad_kids1", + "cannot attach unknown", nm.create_new_mutable_directory, bad_kids1)) ``` Could you add a comment here explaining why this should fail? If I understand correctly, it is because the cap you provide doesn't start with ro. or imm. and since it is unknown we don't know how to diminish it to a readonly cap? (Few of the tests so far in test_dirnode.py have explanations, but I've been able to understand them without. Explanations aren't a bad idea, though) I think that [FakeMutableFile](wiki/FakeMutableFile)? needs to implement raise_error. test_filenode.py: around line 41, ``` + self.failUnlessEqual(fn1.is_unknown(), False) + self.failUnlessEqual(fn1.is_allowed_in_immutable_directory(), True) ``` could be re-written as ``` self.failIf(fn1.is_unknown()) self.failUnless(fn1.is_allowed_in_mutable_directory()) ``` I think it reads better that way, anyway. Similarly for: ``` + self.failUnlessEqual(v.is_readonly(), True) + self.failUnlessEqual(v.is_mutable(), False) ``` just a little below that and for ``` + self.failUnlessEqual(fn1.is_unknown(), False) + self.failUnlessEqual(fn1.is_allowed_in_immutable_directory(), True) ``` around line 64 and ``` + self.failUnlessEqual(n.is_unknown(), False) + self.failUnlessEqual(n.is_allowed_in_immutable_directory(), False) ``` in the chunk around line 99 and ``` + self.failUnlessEqual(nro.is_mutable(), True) + self.failUnlessEqual(nro.is_readonly(), True) + self.failUnlessEqual(nro.is_unknown(), False) + self.failUnlessEqual(nro.is_allowed_in_immutable_directory(), False) ``` in the chunk around line 127. (test_filenode.py uses failUnlessEqual fairly solidly where it could use those, though, so what you're doing is at least consistent with what is already there) test/test_system.py seems okay. test/test_uri.py: It would be nice to have a comment explaining why the invalid uris are invalid, though it's easy enough to figure out after a few seconds of looking at it. Just a thought. test_web.py: I'm assuming that existing tests will already fail if we attempt to add an unknown node using PUT /uri/$DIRCAP/SUBDIRS../CHILDNAME?t=uri and POST /uri/$DIRCAP/SUBDIRS../?t=uri&name=CHILDNAME&uri=CHILDCAP, since that was the existing behavior before your changes. You've changed the _create_initial_children and _create_immutable_children to include unknown URIs, which threads tests for those nicely through the existing tests. Other comments: Can you add a test for the change you made in the chunk starting around line 230 of dirnode.py that references #925? My checklist of things that I thought there needed to be tests for is appeased, anyway, and I didn't see any other problems, so I guess this is reviewed.
davidsarah commented 2010-01-27 07:11:54 +00:00
Owner

Attachment mutable-in-immutable-darcspatch.txt (242608 bytes) added

Prevent mutable objects from being retrieved from an immutable directory, and associated forward-compatibility improvements

**Attachment** mutable-in-immutable-darcspatch.txt (242608 bytes) added Prevent mutable objects from being retrieved from an immutable directory, and associated forward-compatibility improvements
davidsarah commented 2010-01-27 07:12:21 +00:00
Owner

Attachment misc-tweaks-darcspatch.txt (60689 bytes) added

Miscellaneous documentation, test, and code formatting tweaks.

**Attachment** misc-tweaks-darcspatch.txt (60689 bytes) added Miscellaneous documentation, test, and code formatting tweaks.
davidsarah commented 2010-01-27 23:23:25 +00:00
Owner

Attachment all-833-darcspatch.txt (282874 bytes) added

Prevent mutable objects from being retrieved from an immutable directory, and associated forward-compatibility improvements (supercedes previous patches)

**Attachment** all-833-darcspatch.txt (282874 bytes) added Prevent mutable objects from being retrieved from an immutable directory, and associated forward-compatibility improvements (supercedes previous patches)
davidsarah commented 2010-01-27 23:26:08 +00:00
Owner

Attachment changes-since-kevans-review-diff.txt (19654 bytes) added

Changes since Kevan's review

**Attachment** changes-since-kevans-review-diff.txt (19654 bytes) added Changes since Kevan's review
davidsarah commented 2010-01-27 23:36:41 +00:00
Owner

(A version of this reply was sent to the tahoe-dev list while the trac server was down, but note that there are some extra comments about #925 and from_string_* at the end.)

Replying to kevan:

test/test_dirnode.py:

+        bad_future_node = [UnknownNode](wiki/UnknownNode)(future_write_uri, None)
+        bad_kids1 = {u"one": (bad_future_node, {})}
         d.addCallback(lambda ign:
+                      self.shouldFail(MustNotBeUnknownRWError, "bad_kids1",
+                                      "cannot attach unknown",
                                       nm.create_new_mutable_directory,
                                       bad_kids1))

Could you add a comment here explaining why this should fail? If I understand correctly, it is because the cap you provide doesn't start with ro. or imm. and since it is unknown we don't know how to diminish it to a readonly cap?

Yes, exactly. Comment added.

(Few of the tests so far in test_dirnode.py have explanations, but I've been able to understand them without. Explanations aren't a bad idea, though)

Well, they didn't start with explanations :-) This is a good point, but I think I'll need to address it after the 1.6 release.

I think that FakeMutableFile? needs to implement raise_error.

Actually the tests never call it, but it should have this method for consistency. Fixed.

test_filenode.py:

around line 41,

+        self.failUnlessEqual(fn1.is_unknown(), False)
+        self.failUnlessEqual(fn1.is_allowed_in_immutable_directory(), True)

could be re-written as

    self.failIf(fn1.is_unknown())
    self.failUnless(fn1.is_allowed_in_mutable_directory())

Changed for all cases in test_filenode.py.

test/test_uri.py:

It would be nice to have a comment explaining why the invalid uris are invalid, though it's easy enough to figure out after a few seconds of looking at it. Just a thought.

You mean the ones that result in instances of !UnknownURI? (Technically there are no invalid uris any more, because uri.from_string will catch the !BadURIError and construct an !UnknownURI that remembers it. The !BadURIError might be reraised later, though.)

I think that if it's easy enough to work out, that's as it should be. The trouble with putting too many comments in is that reviewers trust the comments, when they shouldn't.

test_web.py:

I'm assuming that existing tests will already fail if we attempt to add an unknown node using PUT /uri/$DIRCAP/SUBDIRS../CHILDNAME?t=uri and POST /uri/$DIRCAP/SUBDIRS../?t=uri&name=CHILDNAME&uri=CHILDCAP, since that was the existing behavior before your changes.

Don't make that assumption! The patch removed some of the restrictions on UnknownNodes, so there's definitely a need to explicitly test this.

In fact the addition should succeed (for a mutable directory) when the URI is prefixed with "ro." or "imm." -- in those cases we don't need to be able to diminish it to a readcap, because it already is one. Tests for the six possible cases (test_{PUT_NEWFILEURL, POST_link}uri_unknown{bad, ro_good, imm_good) added.

You've changed the _create_initial_children and _create_immutable_children to include unknown URIs, which threads tests for those nicely through the existing tests.

Yes, that made it a lot easier to see what needed to be tested.

Other comments:

Can you add a test for the change you made in the chunk starting around line 230 of dirnode.py that references #925?

Now done (test_dirnode.py: test_spaces_are_stripped_on_the_way_out). As mentioned in #925, I changed it to only strip trailing spaces, which is sufficient to address the padding forward-compatibility issue.

Incidentally, I decided not to change from_string_* in uri.py to use explicit named keyword arguments, because I think the code is clearer and more stable under maintenance as it is. I did add a comment that they take the same keyword args as from_string.

(A version of this reply was sent to the tahoe-dev list while the trac server was down, but note that there are some extra comments about #925 and from_string_* at the end.) Replying to [kevan](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73548): > test/test_dirnode.py: > ``` > + bad_future_node = [UnknownNode](wiki/UnknownNode)(future_write_uri, None) > + bad_kids1 = {u"one": (bad_future_node, {})} > d.addCallback(lambda ign: > + self.shouldFail(MustNotBeUnknownRWError, "bad_kids1", > + "cannot attach unknown", > nm.create_new_mutable_directory, > bad_kids1)) > ``` > Could you add a comment here explaining why this should fail? If I understand correctly, it is because the cap you provide doesn't start with ro. or imm. and since it is unknown we don't know how to diminish it to a readonly cap? Yes, exactly. Comment added. > (Few of the tests so far in test_dirnode.py have explanations, but I've been able to understand them without. Explanations aren't a bad idea, though) Well, they didn't start with explanations :-) This is a good point, but I think I'll need to address it after the 1.6 release. > I think that [FakeMutableFile](wiki/FakeMutableFile)? needs to implement raise_error. Actually the tests never call it, but it should have this method for consistency. Fixed. > test_filenode.py: > > around line 41, > ``` > + self.failUnlessEqual(fn1.is_unknown(), False) > + self.failUnlessEqual(fn1.is_allowed_in_immutable_directory(), True) > ``` > could be re-written as > > ``` > self.failIf(fn1.is_unknown()) > self.failUnless(fn1.is_allowed_in_mutable_directory()) > ``` Changed for all cases in test_filenode.py. > test/test_uri.py: > > It would be nice to have a comment explaining why the invalid uris are invalid, though it's easy enough to figure out after a few seconds of looking at it. Just a thought. You mean the ones that result in instances of !UnknownURI? (Technically there are no invalid uris any more, because uri.from_string will catch the !BadURIError and construct an !UnknownURI that remembers it. The !BadURIError might be reraised later, though.) I think that if it's easy enough to work out, that's as it should be. The trouble with putting too many comments in is that reviewers trust the comments, when they shouldn't. > test_web.py: > > I'm assuming that existing tests will already fail if we attempt to add an unknown node using PUT /uri/$DIRCAP/SUBDIRS../CHILDNAME?t=uri and POST /uri/$DIRCAP/SUBDIRS../?t=uri&name=CHILDNAME&uri=CHILDCAP, since that was the existing behavior before your changes. Don't make that assumption! The patch removed some of the restrictions on UnknownNodes, so there's definitely a need to explicitly test this. In fact the addition should succeed (for a mutable directory) when the URI is prefixed with "ro." or "imm." -- in those cases we don't need to be able to diminish it to a readcap, because it already is one. Tests for the six possible cases (test_{PUT_NEWFILEURL, POST_link}_uri_unknown_{bad, ro_good, imm_good) added. > You've changed the _create_initial_children and _create_immutable_children to include unknown URIs, which threads tests for those nicely through the existing tests. Yes, that made it a lot easier to see what needed to be tested. > Other comments: > > Can you add a test for the change you made in the chunk starting around line 230 of dirnode.py that references #925? Now done (test_dirnode.py: test_spaces_are_stripped_on_the_way_out). As mentioned in #925, I changed it to only strip trailing spaces, which is sufficient to address the padding forward-compatibility issue. Incidentally, I decided not to change from_string_* in uri.py to use explicit named keyword arguments, because I think the code is clearer and more stable under maintenance as it is. I did add a comment that they take the same keyword args as from_string.
kevan commented 2010-01-28 02:40:36 +00:00
Owner

The new changes look good -- no complaints from me.

From grepping around the darcs patch, it seems like you've also remembered to remove the commented-out print statements -- if I'm mistaken, don't forget to do that. :)

The patches apply fine against a fresh checkout of trunk (as of about 5 minutes ago), and it is able to build without error. I need to leave for dinner now, but I'll confirm that all of the tests pass once I get back.

The new changes look good -- no complaints from me. From grepping around the darcs patch, it seems like you've also remembered to remove the commented-out print statements -- if I'm mistaken, don't forget to do that. :) The patches apply fine against a fresh checkout of trunk (as of about 5 minutes ago), and it is able to build without error. I need to leave for dinner now, but I'll confirm that all of the tests pass once I get back.
kevan commented 2010-01-28 06:38:16 +00:00
Owner

...and they pass, so everything looks good here. I'm going to mark this as "reviewed" and change its owner back to davidsarah -- if someone disagrees with that, then feel free to change it back.

...and they pass, so everything looks good here. I'm going to mark this as "reviewed" and change its owner back to davidsarah -- if someone disagrees with that, then feel free to change it back.

Okay, I'm ready to apply the patches, but which of the patches attached to this ticket are the one that should be applied?

Okay, I'm ready to apply the patches, but which of the patches attached to this ticket are the one that should be applied?

Oh I get it, the second-to-last one, which is marked as "supercedes previous patches" and which also seems to include the succeeding patch.

Oh I get it, the second-to-last one, which is marked as "supercedes previous patches" and which also seems to include the succeeding patch.

committed in changeset:6057bc02cc33b5b7, changeset:56c00cb381d3cd8c, changeset:b9eda4de6a4e1d3e. Thank you!

committed in changeset:6057bc02cc33b5b7, changeset:56c00cb381d3cd8c, changeset:b9eda4de6a4e1d3e. Thank you!
zooko added the
fixed
label 2010-01-28 15:02:44 +00:00
zooko closed this issue 2010-01-28 15:02:44 +00:00
davidsarah commented 2010-01-28 20:25:54 +00:00
Owner

Phew! After rereading the past discussion, here are a few clarifications about how 1.6 will behave:

  • as Brian mentioned in comment:73514, a known writecap should not be allowed in a ro_uri slot. That is now prevented and tested in [test_mutant_dirnodes_are_omitted]source:src/allmydata/test/test_web.py?rev=4195#L3337 (see mutant_write_in_ro_child).

  • as suggested by Zooko in comment:73514, URIs such as "URI:CHK:<a>look at me I'm evil<a>" are now treated as unknown (the BadURIError raised by, in this case, CHKFileURI.init_from_string is caught by [uri.from_string]source:src/allmydata/uri.py?rev=4195#L658). However they are not allowed to be put into directories -- the error thrown by init_from_string will be remembered and re-raised if we try to do that.

  • the patch does not add an "immutable" field in the JSON representation. There's sufficient information to infer it: the dirnode is immutable iff the "ro_uri" field holds a known-immutable cap, or starts with "imm.", or is omitted. OTOH, that requires a client to understand at least the current cap formats, which is undesirable. I'll add a ticket for that.

  • there are no "bad_children" or "unrecognized_children" keys in the JSON -- we omit bad children, and unrecognized children go under the main "children" key.

Also, note that directory listings containing unknown caps were readable with Tahoe v1.5, but not v1.4.1 or earlier versions.

Phew! After rereading the past discussion, here are a few clarifications about how 1.6 will behave: * as Brian mentioned in [comment:73514](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73514), a known writecap should not be allowed in a `ro_uri` slot. That is now prevented and tested in [test_mutant_dirnodes_are_omitted]source:src/allmydata/test/test_web.py?rev=4195#L3337 (see `mutant_write_in_ro_child`). * as suggested by Zooko in [comment:73514](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73514), URIs such as "`URI:CHK:<a>look at me I'm evil<a>`" are now treated as unknown (the BadURIError raised by, in this case, `CHKFileURI.init_from_string` is caught by [uri.from_string]source:src/allmydata/uri.py?rev=4195#L658). However they are not allowed to be put into directories -- the error thrown by `init_from_string` will be remembered and re-raised if we try to do that. * the patch does not add an "`immutable`" field in the JSON representation. There's sufficient information to infer it: the dirnode is immutable iff the "`ro_uri`" field holds a known-immutable cap, or starts with "`imm.`", or is omitted. OTOH, that requires a client to understand at least the current cap formats, which is undesirable. I'll add a ticket for that. * there are no "`bad_children`" or "`unrecognized_children`" keys in the JSON -- we omit bad children, and unrecognized children go under the main "`children`" key. Also, note that directory listings containing unknown caps were readable with Tahoe **v1.5**, but not v1.4.1 or earlier versions.
davidsarah commented 2010-01-28 20:31:29 +00:00
Owner

Attachment fix-inaccurate-comment-darcspatch.txt (44097 bytes) added

Fix inaccurate comment in test_mutant_dirnodes_are_omitted

**Attachment** fix-inaccurate-comment-darcspatch.txt (44097 bytes) added Fix inaccurate comment in test_mutant_dirnodes_are_omitted
davidsarah commented 2010-01-29 00:08:25 +00:00
Owner

Replying to davidsarah:

  • the patch does not add an "immutable" field in the JSON representation. There's sufficient information to infer it: the dirnode is immutable iff the "ro_uri" field holds a known-immutable cap, or starts with "imm.", or is omitted. OTOH, that requires a client to understand at least the current cap formats, which is undesirable. I'll add a ticket for that.

I was slightly confused: there is already a "mutable" field (added in changeset:3ac4a734e507abbf), so we should not add "immutable". The issue is whether the output of the "t=json" operation on unknown nodes (which did not work prior to 1.6) should include "mutable", with a value set according to whether the ro_uri starts with "imm.". Currently there is no "mutable" field.

This isn't giving the webapi client any additional information that it couldn't work out for itself (and, contrary to the comment above, it does not need to understand the current cap formats, only the "imm." prefix). However it might help to simplify client code.

If a false value in the "mutable" field means "alleged immutable", then this change should be made; if it means "definitely immutable", it should not. We should decide this before the 1.6 release, otherwise clients would have to take into account the field not being present in any case.

Replying to [davidsarah](/tahoe-lafs/trac-2024-07-25/issues/833#issuecomment-73557): > * the patch does not add an "`immutable`" field in the JSON representation. There's sufficient information to infer it: the dirnode is immutable iff the "`ro_uri`" field holds a known-immutable cap, or starts with "`imm.`", or is omitted. OTOH, that requires a client to understand at least the current cap formats, which is undesirable. I'll add a ticket for that. I was slightly confused: there is already a "`mutable`" field (added in changeset:3ac4a734e507abbf), so we should not add "`immutable`". The issue is whether the output of the "`t=json`" operation on unknown nodes (which did not work prior to 1.6) should include "`mutable`", with a value set according to whether the `ro_uri` starts with "`imm.`". Currently there is no "`mutable`" field. This isn't giving the webapi client any additional information that it couldn't work out for itself (and, contrary to the comment above, it does not need to understand the current cap formats, only the "`imm.`" prefix). However it *might* help to simplify client code. If a `false` value in the "`mutable`" field means "alleged immutable", then this change should be made; if it means "definitely immutable", it should not. We should decide this before the 1.6 release, otherwise clients would have to take into account the field not being present in any case.
davidsarah commented 2010-01-29 00:10:24 +00:00
Owner

Replying to [davidsarah]comment:59:

I was slightly confused: there is already a "mutable" field (added in changeset:3ac4a734e507abbf), ... Currently there is no "mutable" field.

My writing skills need work. I meant, currently there is no "mutable" field in the output of t=json for unknown nodes.

Replying to [davidsarah]comment:59: > I was slightly confused: there is already a "`mutable`" field (added in changeset:3ac4a734e507abbf), ... Currently there is no "`mutable`" field. My writing skills need work. I meant, currently there is no "`mutable`" field in the output of `t=json` for unknown nodes.
davidsarah commented 2010-01-29 03:06:09 +00:00
Owner

Note that if (we only have the URI for the object itself, or the parent directory is mutable), and (the node doesn't have a rw_uri and its ro_uri doesn't have the "imm." prefix), then we don't know whether or not it is immutable. In this case we'd omit the "mutable" field.

Note that if (we only have the URI for the object itself, or the parent directory is mutable), and (the node doesn't have a `rw_uri` and its `ro_uri` doesn't have the "`imm.`" prefix), then we don't know whether or not it is immutable. In this case we'd omit the "`mutable`" field.
davidsarah commented 2010-01-29 03:25:19 +00:00
Owner

Attachment add-mutable-to-json-and-imm-ro-to-listings-darcspatch.txt (52805 bytes) added

Add mutable field to t=json output for unknown nodes, when mutability is known. Includes patch for #931 on which it is dependent.

**Attachment** add-mutable-to-json-and-imm-ro-to-listings-darcspatch.txt (52805 bytes) added Add mutable field to t=json output for unknown nodes, when mutability is known. Includes patch for #931 on which it is dependent.
davidsarah commented 2010-01-29 03:28:58 +00:00
Owner

Reopening for review of t=json changes.

Note that there is also a minor bugfix for DeepCheckStreamer in directory.py: it was giving a type of "file" for unknown nodes. Now it gives a type of "unknown", which makes it consistent with ManifestStreamer.

Reopening for review of `t=json` changes. Note that there is also a minor bugfix for [DeepCheckStreamer](wiki/DeepCheckStreamer) in directory.py: it was giving a type of `"file"` for unknown nodes. Now it gives a type of `"unknown"`, which makes it consistent with [ManifestStreamer](wiki/ManifestStreamer).
tahoe-lafs removed the
fixed
label 2010-01-29 03:28:58 +00:00
davidsarah reopened this issue 2010-01-29 03:28:58 +00:00
davidsarah commented 2010-01-29 03:38:53 +00:00
Owner

Attachment this-is-the-one-to-apply-darcspatch.txt (54144 bytes) added

Fix inaccurate comment in test_mutant_dirnodes_are_omitted. Show -IMM and -RO suffixes for types of immutable and read-only unknown nodes in directory listings. Add mutable field to t=json output for unknown nodes, when mutability is known. Fix example JSON in webapi.txt that cannot occur in practice.

**Attachment** this-is-the-one-to-apply-darcspatch.txt (54144 bytes) added Fix inaccurate comment in test_mutant_dirnodes_are_omitted. Show -IMM and -RO suffixes for types of immutable and read-only unknown nodes in directory listings. Add mutable field to t=json output for unknown nodes, when mutability is known. Fix example JSON in webapi.txt that cannot occur in practice.
davidsarah commented 2010-01-29 04:01:58 +00:00
Owner

Apparently Python 2.4 doesn't support the 'foo if test else bar' syntax. Patch for that coming up.

Apparently Python 2.4 doesn't support the 'foo if test else bar' syntax. Patch for that coming up.
davidsarah commented 2010-01-29 04:02:44 +00:00
Owner

Attachment all-including-python24-fixes-darcspatch.txt (57750 bytes) added

Same patches as above but including Python 2.4 fixes.

**Attachment** all-including-python24-fixes-darcspatch.txt (57750 bytes) added Same patches as above but including Python 2.4 fixes.

Fixed in changeset:3e35959e9b9130a4, changeset:f1fd703dedca8c45. Thank you!

Fixed in changeset:3e35959e9b9130a4, changeset:f1fd703dedca8c45. Thank you!
zooko added the
fixed
label 2010-01-29 13:18:13 +00:00
zooko closed this issue 2010-01-29 13:18:13 +00:00
davidsarah commented 2010-01-30 07:03:26 +00:00
Owner

Attachment test-unknownnode-improvements-and-json-example-darcspatch.txt (48615 bytes) added

Improvements to tests for UnknownNode, and minor fix to a JSON example in webapi.txt

**Attachment** test-unknownnode-improvements-and-json-example-darcspatch.txt (48615 bytes) added Improvements to tests for [UnknownNode](wiki/UnknownNode), and minor fix to a JSON example in webapi.txt

Patches to improve tests and webapi doc for this issue: changeset:e2ee725d7d325b8d, changeset:ea3954372a06a36c

Patches to improve tests and webapi doc for this issue: changeset:e2ee725d7d325b8d, changeset:ea3954372a06a36c
Sign in to join this conversation.
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#833
No description provided.