set ETag on immutable directories, short-circuit on cache hit #443
Labels
No Label
0.2.0
0.3.0
0.4.0
0.5.0
0.5.1
0.6.0
0.6.1
0.7.0
0.8.0
0.9.0
1.0.0
1.1.0
1.10.0
1.10.1
1.10.2
1.10a2
1.11.0
1.12.0
1.12.1
1.13.0
1.14.0
1.15.0
1.15.1
1.2.0
1.3.0
1.4.1
1.5.0
1.6.0
1.6.1
1.7.0
1.7.1
1.7β
1.8.0
1.8.1
1.8.2
1.8.3
1.8β
1.9.0
1.9.0-s3branch
1.9.0a1
1.9.0a2
1.9.0b1
1.9.1
1.9.2
1.9.2a1
LeastAuthority.com automation
blocker
cannot reproduce
cloud-branch
code
code-dirnodes
code-encoding
code-frontend
code-frontend-cli
code-frontend-ftp-sftp
code-frontend-magic-folder
code-frontend-web
code-mutable
code-network
code-nodeadmin
code-peerselection
code-storage
contrib
critical
defect
dev-infrastructure
documentation
duplicate
enhancement
fixed
invalid
major
minor
n/a
normal
operational
packaging
somebody else's problem
supercritical
task
trivial
unknown
was already fixed
website
wontfix
worksforme
No Milestone
No Assignees
4 Participants
Notifications
Due Date
No due date set.
Reference: tahoe-lafs/trac-2024-07-25#443
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
When retrieving an immutable file, the ETag header should be set to the file's URI. When retrieving a mutable file, it should be set to the current roothash value. When retrieving a directory, it should also be set to the roothash value.
The server should also use the right sort of If-Modified-Since dance, using the ETag to let browser caches do the right thing.
I added code recently to set an ETag on immutable files (using the storage index.. I'm not sure whether to use this or the URI). It needs review, however, because the twisted.web
req.setETag
method needs to be called at a specific point in the response process (to make it interact with the if-etag-equals HTTP header), and I'm not sure we're doing it quite right.We still don't have etags on mutable files.
#844 references http://allmydata.org/pipermail/tahoe-dev/2009-November/003221.html which claims that the immutable-file ETag support wasn't doing the right thing for (at least) Squid to recognize immutable files as unchanged. Let's make it the goal of this ticket to make sure that we have this support works properly for a representative sample of web proxies and browsers, and open another ticket to add ETag support for mutable files.
webapi should provide an ETagto Check that webapi support for ETags on immutable files works properlyCan we add into this file to also check that it does the right thing for immutable directories? Or is that sufficiently different that it should be a separate ticket?
Replying to zooko:
We can indeed.
Check that webapi support for ETags on immutable files works properlyto Check that webapi support for ETags on immutable files/directories works properlyI'll have a look at this, since I'm mucking around in that area anyway (at least for immutable; I'll need someone to explain the situation with mutable files to me).
For immutable files:
Basic If-Match appears to be completely ignored; it always returns 200 rather than 412 (precondition failed), regardless of the provided etag.
If-None-Match will return "304 not modified" and no body, as it is supposed.
This is consistent with twisted web's server, which only checks for if-none-match. I'm not sure quite how the body is getting suppressed. We should be checking the result of setETag() to see if we have a etag hit or not; it may be its doing too much work.
Will check immutable dirs next.
Attachment imm-dir-etag.diff.txt (2939 bytes) added
Add ETag support for immutable directories
Attachment short-circuit-imm-etag.diff.txt (2807 bytes) added
Short circuit GET when client already has ETag
Did anyone figure out why it was thought that the ETag mechanism wasn't working for Squid per comment:66774? Was the explanation something to do with the thing about If-Match that jsgf mentioned in comment:66781? I don't understand what If-Match is for.
I think if-match is for PUT/POST operations to make sure the entity at the specified URI is still in the expected state.
I'm not sure why squid might have problems with Tahoe. The posting referred to in comment:66774 is pretty vague about what might be going wrong; it would be nice to see some actual analysis (or even just an HTTP protocol dump).
Check that webapi support for ETags on immutable files/directories works properlyto set ETag on immutable directories, short-circuit on cache hitWell, in handling a filenode these patches say:
but in handling a dirnode these patches say:
Shouldn't the filenode also have
t
appended to its ETag so that people who requestsomefile?t=json
don't get the same thing as the previous person who requestedsomefile
?jsgf: please reply to this and, if appropriate, re-add the "review-needed" flag on this ticket.
Yes, that looks pretty dubious. The ETag should be computed by a single function everywhere, rather than redone in a bespoke fashion at each point. But let me double-check that there isn't something else going on there...
No, I see. The first one is for plain immutable files which only have one representation - their actual binary contents. The second is for directories, which are interpreted by Tahoe itself and can have multiple representations. If you access the directory as a just the raw immutable file then you'll end up with an ETag of the first form.
But won't the current patch make the following thing go wrong, causing the second request to cache-hit when instead the second request needs to returning something entirely different.
(If I'm right, then this probably goes to show that our source:docs/frontends/webapi.txt needs to be reorganized so that people like jsgf can easily find this stuff.)
I asked jsgf about this in email. He replied:
Which I believe means that comment:66789 is right that the current patch is wrong. I guess this shows that we need a unit test which would be red with the current patch--a unit test that loads a file with one representation (
GET /uri/$FILEREADCAP
) and then loads it again with a different representation (GET /uri/$FILEREADCAP?t=json
) and flunks the code under test if it gave back the same thing on the second query.This probably just needs one more unit test (to demonstrate the problem mentioned in comment:66790) and one small tweak and it will be ready.
Oh and a NEWS snippet.
Out of time for 1.8.
If anyone wants to add a unit test and fix the issue in comment:66789, they should do so in the next week in order for this improvement to get into 1.9.
I added unit tests and a minor tweak.
In jsgf's original patches, the short-circuit behavior was added to the FileDownloader class and would therefore only apply to the "t=None" case. So the problem in comment:66789 does not occur.
However, we might as well short-circuit when an etag matches for "t=json" etc., in which case the tags do need to be distinguished.
https://github.com/amiller/tahoe-lafs/pull/2.patch
I've merged together jsgf's changes with amiller's fixes, and added a couple of new tests. I also fixed the implementation to not use an etag for t=info or t=rename-form, since both actually provide variable output. "", "json", "uri", "readonly-uri" are all eligible.
I'll push the changes in a moment.
In changeset:5d404db898e1e6dc:
This landed in a series of patches ending with changeset:5d404db89.