give proper filenames on download #221

zooko · 2007-12-04T21:34:29Z

zooko commented

2007-12-04 21:34:29 +00:00

As reported by Jonathan Tapicer:

The file download url should send a header with the correct file so the
browser shows it for saving, sending the 'filename' parameter to the Tahoe
node via GET seems to have no effect, the filename is always a long
incoherent string of characters. For example, try this link:
http://tahoebs1.allmydata.com:8011/uri/URI%3ACHK%3A7u9ffi6gzsoi7qzj55783qu9kw%3Adb9y3ep7n1s3nui3h1bk34riqcrk4xjtowjo57nikdfpzo8ojamy%3A3%3A10%3A68560?filename=foo.txt&save=true

According to Brian and/or RobK, the only way to really get the browser to give the right filename is to append "/foo.txt" to the end of the URL, and teach the tahoe backend (webish.py) to ignore that trailing filename-shaped thing.
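To see why appending "/foo.txt" helps, here is a minimal sketch of the fallback most browsers are assumed to apply when no Content-Disposition header is sent: take the last path segment of the URL and ignore the query string. The helper name `last_path_segment` and the shortened cap `URI%3ACHK%3Aexample` are made up for illustration; real browsers apply further heuristics.

```python
from urllib.parse import urlsplit, unquote


def last_path_segment(url: str) -> str:
    """Return the percent-decoded final path segment of a URL.

    This approximates the default "save as" name a browser is assumed
    to pick when the server gives no other hint. The query string
    (including any filename= parameter) plays no part in it.
    """
    path = urlsplit(url).path
    return unquote(path.rsplit("/", 1)[-1])


# Hypothetical, shortened cap; the query parameter does not affect the result.
print(last_path_segment(
    "http://localhost:8123/uri/URI%3ACHK%3Aexample?filename=foo.txt&save=true"
))  # URI:CHK:example

# With a trailing filename-shaped segment, the browser has something sensible to use.
print(last_path_segment(
    "http://localhost:8123/uri/URI%3ACHK%3Aexample/foo.txt"
))  # foo.txt
```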

zooko added the

labels 2007-12-04 21:34:29 +00:00

zooko added this to the 0.7.0 milestone 2007-12-04 21:34:29 +00:00

zooko added

code-frontend-web

and removed

unknown

labels 2007-12-04 21:40:19 +00:00

zooko commented

2007-12-04 21:52:36 +00:00

This feature is urgent for the project that Jonathan is working on, so I put it into the v0.7.0 Milestone. Brian: how long do you think it will take to implement this? Based on my newly earned knowledge of webish.py, I think it can be done in 1 day.

warner was assigned by zooko

2007-12-04 21:52:36 +00:00

zooko commented

2007-12-04 21:53:36 +00:00

Now using the cool new "e-mail from trac" feature to Cc: Peter.

warner commented

2007-12-05 06:28:07 +00:00

The proposals we have on the table are:

1. GET /download/$URI/$filename
   Which would only be for single-component pathnames, i.e.
   /download/abd123f/foo.txt and not /download/abd123f/subdir/foo.txt
2. GET /download/$URI[/$SUBPATH]/$filename
   Which would be for multi-component pathnames. The important feature is
   that the last component of the URL is always the filename that we
   want the browser to see, and is never used by the tahoe node to
   find a child.
3. GET /uri/$URI[/$SUBPATH]?filename=$filename
   This is what we currently have and it doesn't work. I believe the browser
   uses a default filename of $SUBPATH?filename=$filename, which is
   a mess.

The first option is only for single-component pathnames, but that's just the
sort of URL we create by default for file downloads right now, so it wouldn't
be that much of a limitation. The second option is less limiting but I find
it surprising to have most of the URL components be for tahoe and then the
last one be just for the benefit of the browser.

zooko commented

2007-12-05 20:20:58 +00:00

Let me say up front that there does not exist a nice option, since there are two namespaces (the tahoe namespace and the namespace which is defined by the producer of the URL) that both need to use "/final_component" in some cases. So we are trying to choose the least surprising among the not-so-nice options.

I agree with your feeling about why the second option is surprising.

The surprising thing about the first option is that if you are a programmer writing HTML or javascript, or writing code in some other language that produces HTML or javascript when it is run, then you normally use GET /uri/$URI[/$SUBPATH], even when you are passing save=true, or even if the $SUBPATH is empty. However, if you pass save=true and $SUBPATH is empty at the same time, then you'll get this bad behavior. So you need to add a clause so that you do it as GET /uri/$URI[/$SUBPATH] unless you are downloading the result and the $SUBPATH is empty, in which case you do GET /download/$URI/$LOCAL_FILE_NAME instead.

Oh, and in fact, this same problem can apply to files which are viewed instead of downloaded! If you view a file, such as this one (<http://tahoebs1.allmydata.com:8123/uri/URI%3ASSK-RO%3Abpbzgmcw748wnofczop6qc5y83sqzdmhqs95ugqbuss65d1ucs9y%3Aswr57tmbobnejypxnhnrtw8jbcidd4nbyw14g8yay9mzhqkq1t9o?filename=index.html>), in your browser and then try to save that file to disk, it will give you a big long ugly suggested file name. So here is a proposal which offers good save-as file names for clicking a view link followed by "File -> Save As" just as well as for clicking a download link:

1.b. There are two kinds of "GET":

GET /uri/$DIR_URI[/$SUBPATH]
GET /named/$FILE_URI/$LOCAL_FILE_NAME

This is just like option 1 except that it is not called "download" and it is orthogonal to the save=true option. For example, the HTML directories served up by webish.py could include links like these.

Hopefully a user might be able to see from the URL that the "/foo.txt" means something different in these two URLs: <http://localhost:8123/uri/DR_ysr4tryfm88rhk1od1zpo53r9wd5wb8e5xizwzg6ou5ifxuc/foo.txt>, <http://localhost:8123/named/MR_hu6fnak1cge5zkz9eiysfy66iwuwggtcpxc3bir6cwo6o3bf/foo.txt>.

1.c. You can allow the use of /named/ even when there is a subpath:

GET /uri/$URI[/$SUBPATH]
GET /named/$URI[/$SUBPATH]/$LOCAL_FILE_NAME

4. You can use // to separate the tahoe namespace from the local save-as name:

GET /uri/$URI[/$SUBPATH]
GET /uri/$URI[/$SUBPATH]//$LOCAL_FILE_NAME

(Note that there is precedent for using // to indicate a boundary between nested namespaces -- it's the separator between "scheme" and "authority" in URIs.)

4.b. You can combine /named/ and // for redundant signals:

GET /uri/$URI[/$SUBPATH]
GET /named/$URI[/$SUBPATH]//$LOCAL_FILE_NAME

Okay, at this point I vote for option 4.b., I await Brian's feedback, and I request that Jonathan tell us which of these (especially 4.b) would make sense for his use.
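For the programmer who produces these links, the difference between the options comes down to how the URL is assembled. The following is only a sketch under stated assumptions: the function name `tahoe_get_url`, its `style` parameter, and the example cap are hypothetical, and none of these URL forms is implemented at this point in the discussion.

```python
from urllib.parse import quote


def tahoe_get_url(base, cap, subpath=(), local_name=None, style="4b"):
    """Build a GET URL for options 1.b, 4, or 4.b (hypothetical helper).

    base       -- e.g. "http://localhost:8123"
    cap        -- the Tahoe URI (capability) string
    subpath    -- child names looked up by the tahoe node
    local_name -- save-as name meant only for the browser, or None
    """
    cap_part = quote(cap, safe="")
    sub = "".join("/" + quote(name, safe="") for name in subpath)
    if local_name is None:
        # Plain access: every path component is for the tahoe node.
        return f"{base}/uri/{cap_part}{sub}"
    name = quote(local_name, safe="")
    if style == "1b":
        if subpath:
            raise ValueError("option 1.b only covers a bare file cap")
        return f"{base}/named/{cap_part}/{name}"
    if style == "4":
        return f"{base}/uri/{cap_part}{sub}//{name}"
    if style == "4b":
        return f"{base}/named/{cap_part}{sub}//{name}"
    raise ValueError(f"unknown style: {style!r}")


print(tahoe_get_url("http://localhost:8123", "URI:CHK:example",
                    subpath=("dir",), local_name="foo.txt", style="4"))
# http://localhost:8123/uri/URI%3ACHK%3Aexample/dir//foo.txt
```

Note that options 4 and 4.b need no special case for an empty subpath, which is the extra clause that option 1 forces on the caller.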

zooko commented

2007-12-07 12:43:41 +00:00

Brian: what do you think of proposal 4.b? Jonathan said (in e-mail) that he liked it.

Since ticket #222 solves Jonathan's immediate problem and is easier to do than this ticket, I'm putting #222 into Milestone v0.7.0 and bumping this one out.

zooko added this to the undecided milestone 2008-01-23 02:31:07 +00:00

warner commented

2008-03-08 00:42:09 +00:00

I'm slightly uncomfortable with 4/4b (using a double-slash to separate what
you're accessing from what you want to call it). I like having a distinction,
but a double-slash means:

* you can no longer have empty subdirectory names. Granted, this would be
  confusing and dumb, but we haven't prohibited it yet.
* the twisted.web implementation would have an unusual code path. Basically
  you iterate over path components.. if you get a non-empty string, you
  perform a child lookup. But if you get an empty string, you stop with the
  child that you already have (hopefully a file) and consume the rest of the
  path (asserting that it is of length one) for use as the filename. Hmm,
  maybe that isn't so tough after all.
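The code path described in the second bullet can be sketched in plain Python, without twisted.web. This is an illustration only: `split_at_double_slash` is a hypothetical name, and the "child lookup" is simulated by collecting the names that would be looked up.

```python
def split_at_double_slash(segments):
    """Split URL path segments into (lookup_names, save_as_name).

    Non-empty segments are treated as child lookups. The first empty
    segment (produced by "//") ends the lookups, and the rest of the
    path must be exactly one segment: the save-as filename.
    """
    lookups = []
    for index, segment in enumerate(segments):
        if segment == "":
            rest = segments[index + 1:]
            if len(rest) != 1:
                raise ValueError("expected exactly one segment after '//'")
            return lookups, rest[0]
        lookups.append(segment)
    # No "//" present: every segment is a child lookup.
    return lookups, None


print(split_at_double_slash("dir/foo.txt//renamed.txt".split("/")))
# (['dir', 'foo.txt'], 'renamed.txt')
```

One consequence worth noting: under this rule a trailing slash (`dir/`) also yields an empty segment, so it would be rejected rather than treated as a directory request.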

I don't like 1c, because that would lead to something like:
http://localhost:8123/named/DR_usr4tryf/foo.txt/foo.txt

I don't think I like /named for some reason (it's only used for GET, never
for PUT, so something emphasizing the read- or download- ness seems better).
That's not a strong feeling, though.

Hm, the '4' /uri/$URI/[$SUBPATH]//$FILENAME approach is growing on me.
It seems like we might be stealing a big chunk of the namespace for a
relatively trivial purpose, however.. we might want to use that same syntax
later for indicating which version of a multiversioned LDMF file you want to
retrieve (like the '@@' syntax that Clearcase uses for this purpose).

Would it be an unreasonable restriction to say that you can only use this
local-name feature for file URIs and not for subpaths? I guess that means I'm
leaning towards '1'.

Ah, so many choices..

dreid commented

2008-03-08 03:39:13 +00:00

The Content-Disposition header should work correctly with most modern web browsers. At the very least it works on Safari 3 and Firefox 2. It is not part of the HTTP spec, but it is a widely implemented way of hinting to browsers what the default filename should be. It's mentioned in RFC 2616, Section 19.5.1:

19.5.1 Content-Disposition

   The Content-Disposition response-header field has been proposed as a
   means for the origin server to suggest a default filename if the user
   requests that the content is saved to a file. This usage is derived
   from the definition of Content-Disposition in RFC 1806 [35].

        content-disposition = "Content-Disposition" ":"
                              disposition-type *( ";" disposition-parm )
        disposition-type = "attachment" | disp-extension-token
        disposition-parm = filename-parm | disp-extension-parm
        filename-parm = "filename" "=" quoted-string
        disp-extension-token = token
        disp-extension-parm = token "=" ( token | quoted-string )

   An example is

        Content-Disposition: attachment; filename="fname.ext"

   The receiving user agent SHOULD NOT respect any directory path
   information present in the filename-parm parameter, which is the only
   parameter believed to apply to HTTP implementations at this time. The
   filename SHOULD be treated as a terminal component only.

   If this header is used in a response with the application/octet-
   stream content-type, the implied suggestion is that the user agent
   should not display the response, but directly enter a `save response
   as...' dialog.

   See section 15.5 for Content-Disposition security issues.

The filename at the end of the URL will ensure the most wide-ranging support and also provide a hint to humans as to the contents of the URL. But you might find the Content-Disposition header a less intrusive change in the meantime.
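As a rough illustration of the grammar quoted above, the header value could be assembled like this. The function name `content_disposition` is hypothetical, and the sketch assumes printable-ASCII filenames only; it is not what webish.py does.

```python
import posixpath


def content_disposition(filename, disposition="attachment"):
    """Build a Content-Disposition value with a quoted-string filename.

    Only the terminal path component is kept, matching the RFC's advice
    that directory information should not be honoured.
    """
    # Treat backslashes as separators too, then keep the last component.
    terminal = posixpath.basename(filename.replace("\\", "/"))
    if not terminal:
        raise ValueError("filename has no terminal component")
    # Assumption: printable ASCII only; reject anything else outright.
    if any(ord(ch) < 32 or ord(ch) > 126 for ch in terminal):
        raise ValueError("only printable ASCII filenames are handled here")
    # Inside a quoted-string, a double quote must be backslash-escaped.
    escaped = terminal.replace('"', '\\"')
    return f'{disposition}; filename="{escaped}"'


print("Content-Disposition: " + content_disposition("subdir/fname.ext"))
# Content-Disposition: attachment; filename="fname.ext"
```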

zooko commented

2008-03-08 15:45:49 +00:00

Hm, I thought that we already set the Content-Disposition header, but source:docs/webapi.txt@2219 says that we do so only if ?save=on.

zooko commented

2008-03-08 15:56:24 +00:00

Hm, I thought that we already set the Content-Disposition header, but source:docs/webapi.txt@2219#L168 says that we do so only if ?save=on. Trying it with wget --save-headers <http://tahoebs1.allmydata.com:8123/uri/URI%3ACHK%3Az76adrcsxcud72oqjw62dzbuyy%3A57plpxz3skec4qnbhe43pzdfe2hjg5lh44wsfqxzg7y7klub2syq%3A3%3A10%3A319262?filename=IraqMedia_Oct03_rpt.pdf> shows that the Content-Disposition header is not set, and trying it with wget --save-headers <http://tahoebs1.allmydata.com:8123/uri/URI%3ACHK%3Az76adrcsxcud72oqjw62dzbuyy%3A57plpxz3skec4qnbhe43pzdfe2hjg5lh44wsfqxzg7y7klub2syq%3A3%3A10%3A319262?filename=IraqMedia_Oct03_rpt.pdf&save=on> doesn't show it either. Oh wait, I obviously don't understand what wget's --save-headers option is supposed to do -- it never shows me any headers. On the other hand, curl's --dump-header filename.txt does what I expect, and shows that we do indeed set Content-Disposition if ?save=on.

I also tested it with Firefox 2 on Mac OS X and it worked. I guess the limitation of this approach, though, is that you can't give someone a URL which they can either view or save and still get a reasonable filename when saving.
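A check like the curl one can be scripted by parsing the dumped header file. The sketch below uses Python's standard `email` parser on a header block; the sample response text is made up for illustration and is not captured output from the tahoe node.

```python
from email.parser import Parser


def disposition_from_dump(dumped: str):
    """Return (disposition, filename) from text saved by curl --dump-header.

    The first line is the HTTP status line, which is skipped; the
    remainder is parsed as an RFC 822 style header block.
    """
    _status_line, _, header_block = dumped.partition("\n")
    message = Parser().parsestr(header_block, headersonly=True)
    return message.get_content_disposition(), message.get_filename()


# Made-up sample of what a ?save=on response might contain.
sample = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: application/octet-stream\r\n"
    'Content-Disposition: attachment; filename="IraqMedia_Oct03_rpt.pdf"\r\n'
    "\r\n"
)
print(disposition_from_dump(sample))
# ('attachment', 'IraqMedia_Oct03_rpt.pdf')
```

When the header is absent, both values come back as `None`.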

zooko commented

2008-03-08 22:18:04 +00:00

SamB had a couple of suggestions for how to set the default save filename without also triggering the browser to save the file immediately:

<SamB> well... you could TRY using an "inline" content-disposition...   [14:23] 
<SamB> but I don't think it's likely to help... 
<SamB> the other thing is that it might help not to have a content-type of 
       application/octet-stream                                         [14:24]
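SamB's two ideas could be combined roughly as follows. This is a sketch only: `view_or_save_headers` is a hypothetical name, the filename is assumed to contain no quote characters, and (as SamB himself suspects) browsers may ignore the filename on an "inline" disposition.

```python
import mimetypes


def view_or_save_headers(filename, save=False):
    """Pick response headers that suggest a filename without forcing a save.

    The content type is guessed from the filename so that the response
    is not application/octet-stream unless nothing better is known.
    """
    guessed_type, _encoding = mimetypes.guess_type(filename)
    content_type = guessed_type or "application/octet-stream"
    disposition = "attachment" if save else "inline"
    return {
        "Content-Type": content_type,
        # Assumption: filename contains no '"' and needs no escaping.
        "Content-Disposition": f'{disposition}; filename="{filename}"',
    }


print(view_or_save_headers("index.html"))
# {'Content-Type': 'text/html', 'Content-Disposition': 'inline; filename="index.html"'}
```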

warner commented

2008-05-09 01:15:15 +00:00

I'd like to make some progress on this one.

When we last left our intrepid heroes, they were pondering the following
alternatives:

1b : GET /named/$FILE_URI/$LOCAL_FILE_NAME
1c : GET /named/$URI[/$SUBPATH]/$LOCAL_FILE_NAME
4 : GET /uri/$URI[/$SUBPATH]//$LOCAL_FILE_NAME
4b : GET /named/$URI[/$SUBPATH]//$LOCAL_FILE_NAME

I rejected 1c because of the broken-looking duplication in e.g.
/named/dir/foo.txt/foo.txt, and I'm not favorable towards 4 or 4b for
similar reasons: /uri/dir/foo.txt//foo.txt or
/named/dir/foo.txt//foo.txt.

I'm ok with 1b, but I'd suggest "file" instead of "named", to emphasize the
single-file-ness, since it would only be used for individual files. An
example of this would look like:
http://localhost:8123/file/IR_hu6fnak1cge5zkz9eiysfy66iwu/foo.txt

Zooko, where are your thoughts these days?

If we can get sufficient consensus on this, I'll implement it tomorrow.

warner commented

2008-05-09 01:29:37 +00:00

Fixing this will probably close #385 too, as long as the log-sanitizer recognizes /named or /file as it does /uri.

warner modified the milestone from undecided to 1.1.0

2008-05-09 01:29:37 +00:00

zooko commented

2008-05-09 18:29:55 +00:00

I'm in favor of 4. Your objections to 4 seem to be:

"you can no longer have empty subdirectory names. Granted, this would be confusing and dumb, but we haven't prohibited it yet."

Since you wrote that, we have prohibited empty subdirectory names, haven't we? In any case I don't mind doing so, and in fact I think I prefer to.

"the twisted.web implementation would have an unusual code path. Basically you iterate over path components.. if you get a non-empty string, you perform a child lookup. But if you get a empty string, you stop with the child that you already have (hopefully a file) and consume the rest of the path (asserting that it is of length one) for use as the filename. Hmm, maybe that isn't so tough after all."

What do you think of this, now?

"It seems like we might be stealing a big chunk of the namespace for a relatively trivial purpose, however.. we might want to use that same syntax later for indicating which version of a multiversioned LDMF file you want to retrieve (like the '@@' syntax that Clearcase uses for this purpose)."

Well, maybe we can use '@@' for that if we later want to? :-)

"the broken-looking duplication" ... "e.g. /uri/dir/foo.txt//foo.txt"

I don't mind this. Tahoe programmers need to learn the difference between the first foo.txt, which specifies which child of dir, and the second, which specifies what name your web browser uses for that file. It is convenient that you can make the trailing //webbrowsername optional, in which case the web browser will use the former for the latter by default. In particular, this allows you to just append //webbrowsername to any Tahoe URL, e.g.:

uri/URI%3ACHK%3Awaqatup4yk7dyoosuyaux6vwzu%3Alzoa4c2phsp3x7ws47bofsbihjm5avxqo35qrqscgqvtwrjotyra%3A3%3A10%3A313227//wiki.html

means that the capability in question actually is the wiki page, and the name wiki.html is just what you want your web browser to call it.

Okay, overall I'm pretty happy with 4.b, and I hope that your objections, above, are not too strong.

The alternative, if I understand correctly, is 1.b, where the Tahoe filesystem's namespace is denoted by a top-level /uri/ (soon to be renamed /cap/), and the web browser's name for it is denoted by a top-level /named/ (or some such), which is allowed only when the cap points directly at the file. I like the way that this unifies the two namespaces in the case that the URL includes a subpath -- in that case there is no way to write that you want the web browser to use a different filename from the one that the Tahoe directory uses. But I don't like the way that it "special cases" the case that the URL includes only a capability -- in that case instead of doing something that seems "natural" like appending a name to the URL, you have to change the top-level name from /cap/ to /named/. Argh.

I would like to agonize over this for a little while longer, please. :-)

Also I would like to invite tahoe-dev to notice this ticket, as some of them may have a useful insight to cut the Gordian knot of my ambivalence.

warner commented

2008-05-14 21:05:56 +00:00

Hurray! Hurrah! We have consensus!

/file/FILECAP/@@named=/FILENAME

The actual implementation will simply ignore anything after FILECAP, so the
exact syntax is:

/file/FILECAP[/IGNORED..]

I'll implement this now. The code will simply have a locateChild method
that ignores the rest of the path segments.
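The shape of that idea, in plain Python rather than the real web framework, might look like this. It is not the code from the changeset: the class and function names are hypothetical, and only the `locateChild(ctx, segments)` calling convention (return a resource plus the segments still to be processed) is borrowed from the framework.

```python
class FileCapResource:
    """Stand-in for the resource that serves one file cap (hypothetical)."""

    def __init__(self, filecap):
        self.filecap = filecap

    def locateChild(self, ctx, segments):
        # Ignore everything after FILECAP: return ourselves and report
        # that no path segments remain to be processed.
        return self, ()


def resolve_file_url(path):
    """Resolve /file/FILECAP[/IGNORED..] to (resource, ignored_segments)."""
    segments = path.lstrip("/").split("/")
    if len(segments) < 2 or segments[0] != "file" or not segments[1]:
        raise ValueError("expected /file/FILECAP[/IGNORED..]")
    ignored = tuple(segments[2:])
    resource, remaining = FileCapResource(segments[1]).locateChild(None, ignored)
    assert remaining == ()  # nothing left for further child lookups
    return resource, ignored


resource, ignored = resolve_file_url("/file/FILECAP/@@named=/foo.txt")
print(resource.filecap, ignored)
# FILECAP ('@@named=', 'foo.txt')
```

Because the trailing segments are never interpreted, the `@@named=` marker is purely a convention for human readers and for the browser's choice of filename.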

warner commented

2008-05-14 23:19:54 +00:00

Closed, by changeset:304abfee32a06e05.

warner added the

fixed

label 2008-05-14 23:19:54 +00:00

warner closed this issue

2008-05-14 23:19:54 +00:00
