mplayer triggers two bugs in Tahoe's new downloader #1154
I tried to play a movie hosted on a Tahoe-LAFS grid with `mplayer http://127.0.0.1:3456/file/URI-blah/@@named=/test.avi`. The error seems to be related to mplayer seeking to the end of the file to download the AVI index before seeking back to the beginning for the actual rendering.
Preventing mplayer from reading the index with the `--noidx` parameter avoids this error.
This ticket is related to #798.
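At the HTTP level, the index read that triggers the failure is just a byte-range GET near the end of the file. Here is a minimal sketch (Python 2, not part of the ticket) of such a request against the URL from the report, where `URI-blah` is of course a placeholder:

```python
# Minimal Python 2 sketch of the kind of Range request a seek toward the
# end of the file (where the AVI index lives) turns into; the URL and its
# URI-blah placeholder are taken from the report above.
import urllib2

url = "http://127.0.0.1:3456/file/URI-blah/@@named=/test.avi"
req = urllib2.Request(url)
req.add_header("Range", "bytes=-65536")   # ask for the last 64 KiB
resp = urllib2.urlopen(req)
print resp.getcode(), resp.info().getheader("Content-Range")
```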
ooh, good bugs. I'll look into them.
I figured out the first one: it turns out that webapi GETs that use a Range header were completely broken. The root cause was that Python's `__len__` method is, apparently, not allowed to return a `long` (who knew?). I just pushed the patch (changeset:8844655705e4fb76).
I'm still trying to figure out the `stopProducing` exception: it feels like the HTTP client dropped the connection at an unexpected time (when no actual segments were being fetched). There's a simple fix, but I want to build a reproducible test case first.
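For context, a quick Python 2 probe of the restriction mentioned above (not Tahoe code; `FakeSpans` is a made-up stand-in, and the exact error text can vary a little across 2.x versions, so this just reports what the local interpreter does):

```python
# Python 2 probe of the "__len__/__nonzero__ may not return a long" rule.
# FakeSpans is an illustrative stand-in, not the real Spans/DataSpans class.
class FakeSpans(object):
    def __len__(self):
        return 5L   # a long, as the buggy code effectively returned

try:
    # bool() falls back to __len__ when __nonzero__ is not defined, and the
    # result must be a bool or int -- a long raises TypeError.
    print bool(FakeSpans())
except TypeError, e:
    print "TypeError:", e
```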
Hi Brian, thanks for looking into this!
I've attached the full debug log (`mplayer-with-idx.log`) as well as the mplayer console output (`mplayer-console.log`) to this ticket. I hope this helps.
BTW, during this test the download doesn't stop when mplayer exits; the file still gets downloaded completely. And when mplayer is run a second time, after Tahoe has finished downloading the whole file, it plays the movie just fine and even allows seeking through it.
Attachment mplayer-with-idx.log (570920 bytes) added
Attachment mplayer-console.log (1253 bytes) added
Hm, it looks like `mplayer-with-idx.log` is actually using the old downloader: the log messages don't match the new code.
I've found a few places where that second exception could possibly be raised, but I'm still trying to find a way to reproduce it for a test.
Oh, nevermind, I think I figured it out. There are actually three bugs overlapping here:

1. The `Spans`/`DataSpans` classes used `__len__` methods that returned `long`s instead of `int`s, causing an exception during download. (My changeset:8844655705e4fb76 fix was incorrect: it turns out that `__nonzero__` is not allowed to return a `long` either.)
2. A lost-progress bug in `DownloadNode`, where a failure in one segment-fetch will cause all other pending segment-fetches to hang forever.
3. A `stopProducing` that occurs during this hang-forever period causes an exception, because there is no active segment-fetch in place.

The bug-1 fix is easy: replace `self.__len__` with `self.len` and make `__nonzero__` always return a `bool`. The bug-3 fix is also easy: `DownloadNode._cancel_request` should tolerate `self._active_segment` being `None`.

The bug-2 fix is not trivial but not hard. The start-next-fetch code in `DownloadNode` should be factored out, and the `DownloadNode.fetch_failed` code should invoke it after sending errbacks to the requests which failed. This will add a nice property: if you get unrecoverable bit errors in one segment, you might still be able to get valid data from other segments (as opposed to giving up on the whole file because of a single error). I think there are some other changes that must be made to really get this property, though. When we get to the point where we sort shares by "goodness", we'll probably clean this up. The basic idea will be that shares with errors go to the bottom of the list but are not removed from it entirely: if we really can't find the data we need somewhere else, we'll give the known-corrupted share a try, in the hopes that there are some uncorrupted parts of the share.

I've got a series of test cases to exercise these three bugs; I just have to build them in the right order to make sure that I'm not fixing the wrong one first (and thus hiding one of the others from my test).
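A minimal sketch of the shape of the fixes described above. This is not the actual Tahoe-LAFS code: the class bodies, method signatures, and the `_start_new_segment` helper are illustrative stand-ins; only the identifiers named in the comment (`Spans`, `DownloadNode`, `fetch_failed`, `_cancel_request`, `_active_segment`) are taken from it.

```python
# Hedged sketch (Python 2) of the three fixes, using stand-in classes.

class Spans(object):
    """Stand-in for the real Spans/DataSpans classes."""
    def __init__(self):
        self._spans = []                  # list of (start, length) pairs

    def add(self, start, length):
        self._spans.append((start, length))

    def len(self):
        # Bug-1 fix: a plain .len() method instead of __len__, because the
        # total can be a long (offsets past 2**31-1), which Python 2's
        # len()/__len__ cannot return.
        return sum(length for (_start, length) in self._spans)

    def __nonzero__(self):
        # Bug-1 fix, second half: always return a bool, never a long.
        return bool(self._spans)


class DownloadNode(object):
    """Stand-in showing only the shape of the bug-2 and bug-3 fixes."""
    def __init__(self):
        self._active_segment = None       # the one fetch in flight, or None
        self._segment_requests = []       # pending segment requests

    def _start_new_segment(self):
        # Hypothetical name for the factored-out start-next-fetch code that
        # fetch_failed should now call.
        pass

    def fetch_failed(self, failed_requests, failure):
        # Bug-2 fix (sketch): errback the requests that depended on the
        # failed fetch, then start the next pending fetch instead of
        # leaving the remaining requests hung forever.
        self._active_segment = None
        for request in failed_requests:
            request.errback(failure)
        self._start_new_segment()

    def _cancel_request(self, request):
        # Bug-3 fix: tolerate self._active_segment being None, which is the
        # state a stopProducing sees while fetches are stalled.
        if request in self._segment_requests:
            self._segment_requests.remove(request)
        if self._active_segment is None:
            return
        # ...otherwise detach the request from the active fetch (omitted).
```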
Oh, I should mention what I'm guessing mplayer was doing. I think it issued two simultaneous webapi GET requests: one for the index near the end of the file, and another for the first block of the file. They both would have been delivered to the same `FileNode` instance, creating two `Segmentation` requests (one for each `read()` call), creating two calls to `DownloadNode.get_segment()`. The second one would wait for the first one to finish, since, to keep the memory footprint low, `DownloadNode` only works on one segment at a time. The first segment-fetch failed because of the `__len__` bug, leaving the second fetch hanging (because of the lost-progress bug). When mplayer got an error on the index GET, I believe it gave up on the other GET, dropping the HTTP connection, causing `connectionLost` and `stopProducing`, causing a `DownloadNode._cancel_request` when no segment-fetch was active, triggering the third bug.
Should be fixed now, in changeset:43c5032105288a58 and changeset:f6f9a97627d210a6.
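For anyone who wants to exercise the access pattern described above against an unpatched node, a hedged sketch of the two overlapping Range GETs (again using the placeholder URL from the report; byte ranges are illustrative, and this is not part of the Tahoe-LAFS test suite):

```python
# Python 2 sketch of mplayer's access pattern: two simultaneous Range GETs
# against the same file, one near the end (the AVI index) and one at the
# beginning of the file.
import threading
import urllib2

URL = "http://127.0.0.1:3456/file/URI-blah/@@named=/test.avi"

def ranged_get(byte_range):
    req = urllib2.Request(URL)
    req.add_header("Range", "bytes=%s" % byte_range)
    try:
        resp = urllib2.urlopen(req)
        print byte_range, "->", resp.getcode()
    except Exception, e:
        print byte_range, "-> failed:", e

threads = [threading.Thread(target=ranged_get, args=(r,))
           for r in ("-65536", "0-131071")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```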