webapi doesn't handle Range header correctly #2459
Reference: tahoe-lafs/trac-2024-07-25#2459
The web-API hangs if a GET request with a Range header is sent and the end of the byte range is equal to the size, or to size-1 if the cap is an unterminated string. Given that filecap contains 'hi':
Apache httpd returns:
We can reproduce this problem with cURL and HTTPie. The command line for the latter is:
for example. (This suggests that it is not a cURL-specific problem.)
This seems to be specific to SDMF, which is strange. (Also tried LIT, MDMF, and 100-byte immutable files.)
See https://tools.ietf.org/html/rfc2616#section-14.16 for the HTTP/1.1 spec. The correct response for the Range: bytes=0-2 case is the whole file. (A '416 Requested Range not satisfiable' error should not be returned in that case, because the requested range does have an overlap with the file contents. The web-API seems to correctly return a 416 error if the starting byte offset is past the end of the file.)
There is currently no test for Range requests for SDMF in test_mutable.py; only for MDMF. I have added one for SDMF on the https://github.com/tahoe-lafs/tahoe-lafs/commits/2659.test-sdmf-version-partial-read.0 branch.
The behaviour of this test (Version.test_partial_read_sdmf_*) is confusing; it works when the data is 100 bytes or 2 bytes, but not if it is 90 bytes. Perhaps there is an off-by-one error that only triggers when the size of the data is a multiple of k bytes? (See ticket:2462#comment:-1 for why that might happen.) But the case that hangs in this ticket is 2 bytes, which suggests that the test is not finding the same bug.
Title changed from "webapi doesnt handle Range header correctly" to "webapi doesn't handle Range header correctly".
I wasn't able to reproduce this with 1.10.1 when my encoding parameters were 3/3/10. I was able to reproduce it with k=1/H=1/N=1.
This suggests something more than just an off-by-one error in the mutable retrieve code. Do we round the segsize up to be a multiple of 'k'?
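For reference, the Range semantics cited earlier in this ticket can be sketched in a few lines of Python. This is only an illustration of the HTTP/1.1 rule (an end position at or past the end of the entity is clamped to size-1, and the range is unsatisfiable only when the start is at or past the end); it is not the web-API's actual parser, and the function name resolve_byte_range is made up for this example.

```python
def resolve_byte_range(first, last, size):
    """Resolve one 'bytes=first-last' spec against an entity of `size` bytes.

    Hypothetical helper for illustration only (not Tahoe-LAFS code).
    `last` may be None for an open-ended spec such as 'bytes=1-'.
    Returns (start, end_inclusive), or None when the range is
    unsatisfiable and a 416 response would be appropriate.
    """
    if last is not None and last < first:
        # Syntactically invalid spec; a real server would ignore the header.
        raise ValueError("last-byte-pos is smaller than first-byte-pos")
    if first >= size:
        # No overlap with the entity at all.
        return None
    if last is None or last >= size:
        # Clamp the end to the final byte of the entity.
        last = size - 1
    return (first, last)


# The two-byte file 'hi' from this ticket:
size = len(b"hi")
whole = resolve_byte_range(0, 2, size)     # bytes=0-2: clamped to the whole file
exact = resolve_byte_range(0, 1, size)     # bytes=0-1: also the whole file
past_end = resolve_byte_range(2, 5, size)  # bytes=2-5: starts past the end
```

Under this rule, both bytes=0-1 and bytes=0-2 describe a complete read of the two-byte file, which are the two cases reported as hanging.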
In /tahoe-lafs/trac-2024-07-25/commit/a7e1dac27f0bc2b25b143f1be6f79d29c33ff41b:
In /tahoe-lafs/trac-2024-07-25/commit/89e9076c41420a4145ae9a1db236dc2a1eb41259:
So, it's useful to know that SDMF files, even though they only have a single segment, still round up their recorded segsize value to be a multiple of shares.needed. So if you upload a 2-byte file, and your tahoe.cfg holds the default k of 3, then you'll wind up with segsize=3. If you've changed k=N=1, you'll wind up with segsize=2.
segsize is used by mutable/retrieve.py to decide which segments we're going to download. This only really makes sense for MDMF (which can have multiple segments), but when MDMF landed, SDMF got the same logic. It is also used in _set_segment() to figure out how much of each segment should be delivered to the consumer. This last function had several bugs, and one failure case was to read with offset=0 and size=(some multiple of the segsize). In this case, if you're only reading one segment, the data would be truncated completely, and nothing would be written to the consumer.
web/filenode.py has already returned a Content-Length header by this point, so the HTTP client is expecting to see all the data it asked for. If the client is using a persistent connection, then they won't notice that the request has finished, and the client will hang.
It looks like _set_segment() would also have had problems if you set the offset= to something non-zero: I think it would have returned the wrong number of bytes. The problem didn't show up in the two-byte file when it was uploaded with k=3, because then the two-byte read wasn't a multiple of k, and the modulo bug wasn't triggered.
We rewrote _set_segment(), and I think it should now handle all inputs correctly.
It'd be nice to add a test_web.py case for this, but it needs to use a real SDMF file (uploaded with k=1). Most of the web tests are using fake file objects so they'll run faster.
Actually, I'm ok with not adding a test. The test_mutable.py tests exercise the IReadable.read() offset/range arguments pretty well, and I don't think we've observed any problems in the HTTP Range header parser. Would anyone object if I closed this?
Replying to warner:
I'm ok with not adding a specific test of the HTTP layer, given that we already smoke-tested that, and the bug wasn't in that layer.
Ok, great, closing this one.
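As a recap of the failure mode described above, here is a small Python sketch of how segsize rounding and a modulo-based length calculation can combine to deliver zero bytes. It is an assumption-laden illustration, not the real _set_segment() code (neither the old nor the rewritten version): the function names are hypothetical, and the "buggy" formula is only one plausible shape of the modulo bug mentioned in the analysis.

```python
def round_up_to_multiple(n, k):
    """Round n up to the nearest multiple of k (how segsize relates to k)."""
    return ((n + k - 1) // k) * k


def buggy_tail_length(offset, size, segsize):
    """Hypothetical buggy count of bytes to keep from the last segment.

    When the read ends exactly on a segment boundary the modulo is 0,
    so the whole segment is dropped instead of kept.
    """
    return (offset + size) % segsize


def tail_length(offset, size, segsize):
    """One way to compute it correctly for size > 0 (illustrative only)."""
    remainder = (offset + size) % segsize
    return remainder if remainder != 0 else segsize


file_size = 2  # the two-byte file 'hi'

# k=1: segsize stays 2, so a full read ends exactly on the segment boundary.
segsize_k1 = round_up_to_multiple(file_size, 1)
delivered_k1 = buggy_tail_length(0, file_size, segsize_k1)

# k=3: segsize is rounded up to 3, so the same read is not a multiple of it.
segsize_k3 = round_up_to_multiple(file_size, 3)
delivered_k3 = buggy_tail_length(0, file_size, segsize_k3)
```

With k=1 the buggy calculation yields 0 bytes even though Content-Length already promised 2, which matches the hang on a persistent connection; with k=3 the same read yields 2 bytes, which is consistent with the report that the problem could not be reproduced with 3/3/10 encoding.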