give proper filenames on download #221
Reference: tahoe-lafs/trac-2024-07-25#221
As reported by Jonathan Tapicer:
According to Brian and/or RobK, the only way to really get the browser to give the right filename is to append "/foo.txt" to the end of the URL, and teach the tahoe backend (webish.py) to ignore that trailing filename-shaped thing.
This feature is urgent for the project that Jonathan is working on, so I put it into the v0.7.0 Milestone. Brian: how long do you think it will take to implement this? Based on my newly earned knowledge of webish.py, I think it can be done in 1 day.
Now using the cool new "e-mail from trac" feature to Cc: Peter.
The proposals we have on the table are:
1. GET /download/$URI/$filename
   Which would only be for single-component pathnames, i.e.
   /download/abd123f/foo.txt and not /download/abd123f/subdir/foo.txt
2. GET /download/$URI[/$SUBPATH]/$filename
   Which would be for multi-component pathnames. The important feature is
   that the last component of the URL is always the filename that we
   want the browser to see, and is never used by the tahoe node to
   find a child.
3. GET /uri/$URI[/$SUBPATH]?filename=$filename
   This is what we currently have and it doesn't work. I believe the browser
   uses a default filename of $SUBPATH?filename=$filename, which is a mess.
The first option is only for single-component pathnames, but that's just the
sort of URL we create by default for file downloads right now, so it wouldn't
be that much of a limitation. The second option is less limiting but I find
it surprising to have most of the URL components be for tahoe and then the
last one be just for the benefit of the browser.
Let me say up front that there does not exist a nice option, since there are two namespaces (the tahoe namespace and the namespace which is defined by the producer of the URL) that both need to use "/final_component" in some cases. So we are trying to choose the least surprising among the not-so-nice options.
I agree with your feeling about why the second option is surprising.
The surprising thing about the first option is that if you are a programmer writing HTML or javascript, or writing code in some other language that produces HTML or javascript when it is run, then you normally use GET /uri/$URI[/$SUBPATH], even when you are passing save=true, or even if the $SUBPATH is empty. However, if you pass save=true and $SUBPATH is empty at the same time, then you'll get this bad behavior. So you need to add a clause so that you do it as GET /uri/$URI[/$SUBPATH] unless you are downloading the result and the $SUBPATH is empty, in which case you do GET /download/$URI/$LOCAL_FILE_NAME instead.
Oh, and in fact, this same problem can apply to files which are viewed instead of downloaded! If you view a file, such as this one in your browser and then try to save that file to disk, it will give you a big long ugly suggested file name. So here is a proposal which offers good save-as file names for clicking a view link followed by "File -> Save As" just as well as for clicking a download link:
1.b. There are two kinds of "GET":
GET /uri/$DIR_URI[/$SUBPATH]
GET /named/$FILE_URI/$LOCAL_FILE_NAME
This is just like option 1 except that it is not called "download" and it is orthogonal to the save=true option. For example, the HTML directories served up by webish.py could include links like these.
Hopefully a user might be able to see from the URL that the "/foo.txt" means something different in these two URLs:
<http://localhost:8123/uri/DR_ysr4tryfm88rhk1od1zpo53r9wd5wb8e5xizwzg6ou5ifxuc/foo.txt>,
<http://localhost:8123/named/MR_hu6fnak1cge5zkz9eiysfy66iwuwggtcpxc3bir6cwo6o3bf/foo.txt>.
1.c. You can allow the use of /named/ even when there is a subpath:
GET /uri/$URI[/$SUBPATH]
GET /named/$URI[/$SUBPATH]/$LOCAL_FILE_NAME
4. You can use // to separate the tahoe namespace from the local save-as name:
GET /uri/$URI[/$SUBPATH]
GET /uri/$URI[/$SUBPATH]//$LOCAL_FILE_NAME
(Note that there is precedent for using // to indicate a boundary between nested namespaces -- it's the separator between "scheme" and "authority" in URIs.)
4.b. You can combine /named/ and // for redundant signals:
GET /uri/$URI[/$SUBPATH]
GET /named/$URI[/$SUBPATH]//$LOCAL_FILE_NAME
Okay, at this point I vote for option 4.b., I await Brian's feedback, and I request that Jonathan tell us which of these (especially 4.b) would make sense for his use.
Brian: what do you think of proposal 4.b? Jonathan said (in e-mail) that he liked it.
Since ticket #222 solves Jonathan's immediate problem and is easier to do than this ticket, I'm putting #222 into Milestone v0.7.0 and bumping this one out.
I'm slightly uncomfortable with 4/4b (using a double-slash to separate what
you're accessing from what you want to call it). I like having a distinction,
but a double-slash means:
confusing and dumb, but we haven't prohibited it yet.
you iterate over path components.. if you get a non-empty string, you
perform a child lookup. But if you get an empty string, you stop with the
child that you already have (hopefully a file) and consume the rest of the
path (asserting that it is of length one) for use as the filename. Hmm,
maybe that isn't so tough after all.
I don't like 1c, because that would lead to something like:
http://localhost:8123/named/DR_usr4tryf/foo.txt/foo.txt
I don't think I like /named for some reason (it's only used for GET, never
for PUT, so something emphasizing the read- or download- ness seems better).
That's not a strong feeling, though.
Hm, the '4' /uri/$URI/[$SUBPATH]//$FILENAME approach is growing on me.
It seems like we might be stealing a big chunk of the namespace for a
relatively trivial purpose, however.. we might want to use that same syntax
later for indicating which version of a multiversioned LDMF file you want to
retrieve (like the '@@' syntax that Clearcase uses for this purpose).
Would it be an unreasonable restriction to say that you can only use this
local-name feature for file URIs and not for subpaths? I guess that means I'm
leaning towards '1'.
Ah, so many choices..
The Content-Disposition header should work correctly with most modern web browsers. At the very least it works on Safari 3 and Firefox 2. It is not part of the HTTP spec, but it is a widely implemented way of hinting to browsers what the default filename should be. It's mentioned in RFC 2616, Section 19.5.1.
The filename at the end of the URL will ensure the most wide-ranging support and also provide a hint to humans as to the contents of the URL. But you might find the Content-Disposition header a less intrusive change in the meantime.
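As a rough illustration of the header being discussed, here is a minimal sketch that builds a Content-Disposition value in the attachment form shown in RFC 2616 Section 19.5.1. The function name content_disposition is made up for this example, and this is not how the tahoe node actually constructs the header.

```python
def content_disposition(filename):
    """Build an 'attachment' Content-Disposition value for filename.

    Hypothetical helper for illustration only. Backslashes and double
    quotes are escaped so the filename stays a valid quoted-string.
    """
    escaped = filename.replace("\\", "\\\\").replace('"', '\\"')
    return 'attachment; filename="%s"' % escaped


print("Content-Disposition: " + content_disposition("foo.txt"))
```

A server would send the returned string as the value of the Content-Disposition response header; a browser that honors it then suggests foo.txt in its save dialog.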
Hm, I thought that we already set the Content-Disposition header, but source:docs/webapi.txt@2219#L168 says that we do so only if ?save=on. Trying it with
wget --save-headers <http://tahoebs1.allmydata.com:8123/uri/URI%3ACHK%3Az76adrcsxcud72oqjw62dzbuyy%3A57plpxz3skec4qnbhe43pzdfe2hjg5lh44wsfqxzg7y7klub2syq%3A3%3A10%3A319262?filename=IraqMedia_Oct03_rpt.pdf>
shows that the Content-Disposition header is not set, and with
wget --save-headers <http://tahoebs1.allmydata.com:8123/uri/URI%3ACHK%3Az76adrcsxcud72oqjw62dzbuyy%3A57plpxz3skec4qnbhe43pzdfe2hjg5lh44wsfqxzg7y7klub2syq%3A3%3A10%3A319262?filename=IraqMedia_Oct03_rpt.pdf&save=on>
it also isn't. Oh wait, I obviously don't understand what wget's --save-headers option is supposed to do -- it never shows me any headers. On the other hand curl's --dump-header filename.txt does what I expect, and shows that we do indeed set the Content-Disposition header if ?save=on.
I also tested it with Firefox 2 on Mac OS X and it worked. I guess the limitation of this approach, though, is that you can't give someone a URL which they can either view or save and have them get a reasonable filename when saving.
SamB had a couple of suggestions of how to set the default save filename without also triggering the browser to save the file immediately:
I'd like to make some progress on this one.
When we last left our intrepid heroes, they were pondering the following
alternatives:
1b: GET /named/$FILE_URI/$LOCAL_FILE_NAME
1c: GET /named/$URI[/$SUBPATH]/$LOCAL_FILE_NAME
4: GET /uri/$URI[/$SUBPATH]//$LOCAL_FILE_NAME
4b: GET /named/$URI[/$SUBPATH]//$LOCAL_FILE_NAME
I rejected 1c because of the broken-looking duplication in e.g.
/named/dir/foo.txt/foo.txt, and I'm not favorable towards 4 or 4b for
similar reasons: /uri/dir/foo.txt//foo.txt or /named/dir/foo.txt//foo.txt.
I'm ok with 1b, but I'd suggest "file" instead of "named", to emphasize the
single-file-ness, since it would only be used for individual files. An
example of this would look like:
http://localhost:8123/file/IR_hu6fnak1cge5zkz9eiysfy66iwu/foo.txt
Zooko, where are your thoughts these days?
If we can get sufficient consensus on this, I'll implement it tomorrow.
Fixing this will probably close #385 too, as long as the log-sanitizer recognizes /named or /file as it does /uri.
I'm in favor of 4. Your objections to 4 seem to be:
Since you wrote that, we have prohibited empty subdirectory names, haven't we? In any case I don't mind doing so, and in fact I think I prefer to.
What do you think of this, now?
Well, maybe we can use '@@' for that if we later want to? :-)
I don't mind this. Tahoe programmers need to learn the difference between the first foo.txt, which specifies which child of dir, and the second, which specifies what name your web browser uses for that file. It is convenient that you can make the last // optional and then the web browser will use the former for the latter by default. In particular, this allows you to just append //webbrowsername to any Tahoe URL, e.g.:
means that the capability in question actually is the wiki page, and the name wiki.html is just what you want your web browser to call it.
Okay, overall I'm pretty happy with 4.b, and I hope that your objections, above, are not too strong.
The alternative, if I understand correctly, is 1.b, where the Tahoe filesystem's namespace is denoted by a top-level /uri/ (soon to be renamed /cap/), and the web browser's name for it is denoted by a top-level /named/ (or some such) and is allowed only when the cap is pointing directly at the file. I like the way that this unifies the two namespaces in the case that the URL includes only a capability -- in that case there is no way to write that you want the web browser to use a different filename from the one that the Tahoe directory uses. But I don't like the way that it "special cases" the case that the URL includes only a capability -- in that case instead of doing something that seems "natural" like appending a name to the URL, you have to change the top-level name from /cap/ to /named/. Argh.
I would like to agonize over this for a little while longer, please. :-)
Also I would like to invite tahoe-dev to notice this ticket, as some of them may have a useful insight to cut the Gordian knot of my ambivalence.
Hurray! Hurrah! We have consensus!
/file/FILECAP/@@named=/FILENAME
The actual implementation will simply ignore anything after FILECAP, so the
exact syntax is:
/file/FILECAP[/IGNORED..]
I'll implement this now. The code will simply have a locateChild method
that ignores the rest of the path segments.
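To make the "ignore everything after FILECAP" behaviour concrete, here is a small standalone sketch. It is not the code from the changeset and does not use the real web framework: the class FileCapResource, its locate_child method, and the resolve helper are hypothetical stand-ins that only mimic the idea of a child-lookup hook that swallows the remaining path segments.

```python
class FileCapResource:
    """Hypothetical stand-in for the resource that serves one file cap."""

    def __init__(self, filecap):
        self.filecap = filecap

    def locate_child(self, segments):
        # Ignore whatever follows the cap: report this same resource and
        # an empty tuple, meaning no segments are left to resolve.
        return self, ()


def resolve(url_path):
    """Map /file/FILECAP[/IGNORED..] to the file cap it names."""
    parts = url_path.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "file" or parts[1] == "":
        raise ValueError("expected /file/FILECAP[/IGNORED..]")
    resource = FileCapResource(parts[1])
    resource, remaining = resource.locate_child(tuple(parts[2:]))
    assert remaining == ()  # every trailing segment was consumed
    return resource.filecap


print(resolve("/file/FILECAP/@@named=/foo.txt"))
print(resolve("/file/FILECAP"))
```

Both calls resolve to the same cap, so the trailing /@@named=/foo.txt exists purely for the benefit of the browser.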
Closed, by changeset:304abfee32a06e05.