downloader hangs when server returns empty string #2024
While investigating the test_download.Corrupt.test_each_byte catalog_detection=True failure for #1382, after fixing the bitrot, we discovered that the downloader hangs when the server responds to a read request with zero bytes. In particular, when the test corrupts offset 8 (which is the MSB of the num_leases value), the storage server believes that the container includes a ridiculous number of leases, so the leases start (self._lease_offset) before the beginning of the file and all reads are truncated down to nothing.

The actual bug is that the downloader doesn't handle this well. When a read fails to satisfy any of the desired bytes, the downloader simply issues a new read request, identical to the first, and loops forever, trying to fetch the same range and always failing. This is an availability problem: a corrupt or malicious server could prevent downloads from proceeding by mangling its responses in this way.
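For context, the storage server clamps share-data reads so they never run past the start of the lease area. A minimal sketch of that clamping (the container layout, HEADER_SIZE, and the read_share_data/_data_offset names are illustrative assumptions; self._lease_offset is the attribute named in this ticket) shows how a corrupted num_leases collapses every read to an empty string:

```python
class ShareFile(object):
    """Illustrative sketch of an immutable share container (not the real
    storage code): header, then share data, then leases."""

    HEADER_SIZE = 12  # assumed header size, for illustration only

    def __init__(self, home, lease_offset):
        self.home = home                     # path to the share file
        self._data_offset = self.HEADER_SIZE
        # Derived from the header's num_leases field; a corrupted
        # num_leases can place this before the data even starts.
        self._lease_offset = lease_offset

    def read_share_data(self, offset, length):
        seekpos = self._data_offset + offset
        # Clamp the read so it never runs into the lease area. With a
        # bogus (too-small) self._lease_offset this clamps every read
        # down to zero bytes, so the server answers with an empty string.
        actuallength = max(0, min(length, self._lease_offset - seekpos))
        if actuallength == 0:
            return b""
        with open(self.home, "rb") as f:
            f.seek(seekpos)
            return f.read(actuallength)
```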
Instead, the downloader should never ask for a given range of bytes twice from the same storage server (at least without some intervening event like a reconnection). So the downloader will need to remember what it asked for, and if it doesn't get it, add those offsets to a list of "bytes that this server won't give us". Then, if we absolutely need any bytes that appear in that list, we declare the Share to be a loss and switch to a different one.
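A sketch of that bookkeeping, assuming the downloader can intercept each request/response pair (the class and attribute names here are hypothetical, not the actual downloader's):

```python
class ShareRejected(Exception):
    """Raised when a server has refused bytes we cannot do without."""

class TrackedShare(object):
    """Hypothetical sketch: never ask a server for the same bytes twice."""

    def __init__(self, rref):
        self._rref = rref          # remote reference to the storage server
        self._pending = {}         # request id -> (offset, length)
        self._refused = set()      # offsets this server would not give us
        self._next_id = 0

    def request(self, offset, length):
        wanted = set(range(offset, offset + length))
        if wanted & self._refused:
            # These bytes are on the "won't give us" list and we still
            # need them: declare this share a loss instead of looping.
            raise ShareRejected("server refuses bytes we require")
        req_id = self._next_id
        self._next_id += 1
        self._pending[req_id] = (offset, length)
        d = self._rref.callRemote("read", offset, length)
        d.addCallback(self._got_data, req_id)
        return d

    def _got_data(self, data, req_id):
        offset, length = self._pending.pop(req_id)
        if len(data) < length:
            # Remember which bytes were not delivered so we never
            # re-request them from this server.
            self._refused.update(range(offset + len(data), offset + length))
        return data
```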
A simpler rule would probably work too: any zero-length reads are grounds to reject the share. We do some speculative reads (based upon assumptions about the segment size), so we'd need to look carefully at that code and make sure that the speculation cannot correctly yield zero bytes. In particular I'm thinking about the UEB fetch from the end of the share: its offset depends upon the size of the hash trees, so if our guessed segment size is too small, the UEB fetch might return zero bytes, but the correct thing to do is to re-fetch it from the correct place once we've grabbed the offset table.
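A sketch of that simpler rule, assuming each outstanding read can be tagged as speculative or not (the function and the callable parameters are illustrative stand-ins for the downloader's real machinery):

```python
def handle_read_response(data, speculative, refetch, reject_share, process):
    """Hypothetical handler for the 'zero-length read rejects the share'
    rule, with a carve-out for speculative reads."""
    if len(data) > 0:
        process(data)
    elif speculative:
        # e.g. the UEB fetch issued from a guessed segment size: an
        # empty answer just means the guess was wrong, so re-fetch it
        # once the real offset table has been read.
        refetch()
    else:
        # We asked for bytes that must exist; an empty answer means the
        # share (or server) is bad, so stop using this share entirely.
        reject_share()
```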
The workaround I recommended for markb's work in #1382 is to just refrain from corrupting those four num_leases bytes. Once we fix this ticket, we should go back to test_download.py and remove that workaround.
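The workaround might look something like this in the test's corruption loop (a hypothetical sketch; the helper name and loop shape are not the actual test_download.py code):

```python
def offsets_to_corrupt(share_size):
    # Hypothetical helper: skip the four num_leases bytes (offsets 8-11,
    # with the MSB at offset 8) until #2024 is fixed, since corrupting
    # them makes the downloader hang instead of detecting the corruption.
    num_leases_bytes = set(range(8, 12))
    return [o for o in range(share_size) if o not in num_leases_bytes]
```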
Replying to warner:
That wouldn't work against a malicious server that, say, returns 1 byte.
+1.
Actually I'd like to have a test marked TODO for this bug first, to make sure it doesn't get lost.
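Such a placeholder might look like the following (a hedged sketch assuming Trial's todo attribute is used to mark the expected failure; the class name, test name, and its body are made up):

```python
from twisted.trial import unittest

class EmptyReads(unittest.TestCase):
    # Hypothetical TODO-marked test: Trial reports it as an expected
    # failure, so the bug stays visible without breaking the suite.
    todo = "#2024: downloader loops forever on zero-length reads"

    def test_corrupt_num_leases_msb(self):
        # Placeholder: corrupt offset 8 (the MSB of num_leases) so the
        # server answers every read with an empty string, then assert
        # the download fails cleanly (or falls back to another share)
        # instead of hanging. Fail explicitly until that setup exists.
        self.fail("not yet implemented; see ticket #2024")
```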