gateway won't serve any page; variety of interesting error messages in twistd.log #1278

New Issue

zooko · 2010-12-10T20:48:02Z

zooko commented

2010-12-10 20:48:02 +00:00

Some servers on the public test grid are misbehaving -- I can tell because updating my blog takes about ten minutes sometimes. Having updated my blog a few times this morning, my gateway got into a broken state where it will not even load the recent uploads and downloads page...

Oh, wait, now it works again. Hm, I will attach twistd.log, and here are a few of the interesting messages:

2010-12-08 22:48:44-0800 [HTTPChannel,8,127.0.0.1] Unhandled Error
        Traceback (most recent call last):
        Failure: allmydata.immutable.downloader.common.DownloadStopped: our Consumer called stopProducing()
        
2010-12-10 10:21:11-0800 [-] Unhandled Error
        Traceback (most recent call last):
          File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 546, in _got_response
            return self._loop()
          File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 435, in _loop
            self.contacted_peers2)
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 320, in addBoth
            callbackKeywords=kw, errbackKeywords=kw)
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 286, in addCallbacks
            self._runCallbacks()
        --- <exception caught here> ---
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 542, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 546, in _got_response
            return self._loop()
          File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 458, in _loop
            return self._failed(msg)
          File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 561, in _failed
            raise UploadUnhappinessError(msg)
        allmydata.interfaces.UploadUnhappinessError: peer selection failed for <Tahoe2PeerSelector for upload kcaaq>: shares could be placed on only 4 server(s) such that any 3 of them have enough share
s to recover the file, but we were asked to place shares on at least 5 such servers. (placed 9 shares out of 10 total (1 homeless), want to place shares on at least 5 servers such that any 3 of them hav
e enough shares to recover the file, sent 13 queries to 11 peers, 2 queries placed some shares, 11 placed none (of which 4 placed none due to the server being full and 7 placed none due to an error)) (l
ast failure (from <PeerTracker for peer nszizgf5 and SI kcaaq>) was: [Failure instance: Traceback: <class 'foolscap.ipb.DeadReferenceError'>: Calling Stale Broker
        /Library/Python/2.6/site-packages/allmydata/immutable/upload.py:546:_got_response
        /Library/Python/2.6/site-packages/allmydata/immutable/upload.py:433:_loop
        /Library/Python/2.6/site-packages/allmydata/immutable/upload.py:119:query
        /Library/Python/2.6/site-packages/foolscap-0.5.1_-py2.6.egg/foolscap/referenceable.py:408:callRemote
        --- <exception caught here> ---
        /Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py:133:maybeDeferred
        /Library/Python/2.6/site-packages/foolscap-0.5.1_-py2.6.egg/foolscap/referenceable.py:447:_callRemote
        /Library/Python/2.6/site-packages/foolscap-0.5.1_-py2.6.egg/foolscap/broker.py:468:newRequestID
        ])
        
2010-12-10 10:21:12-0800 [-] Exception rendering error page:
2010-12-10 10:21:12-0800 [-] Unhandled Error
        Traceback (most recent call last):
          File "/Library/Python/2.6/site-packages/foolscap-0.5.1_-py2.6.egg/foolscap/call.py", line 95, in fail
            self.deferred.errback(why)
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 388, in errback
            self._startRunCallbacks(fail)
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 455, in _startRunCallbacks
            self._runCallbacks()
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 542, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
        --- <exception caught here> ---
          File "/Library/Python/2.6/site-packages/Nevow-0.10.0-py2.6.egg/nevow/appserver.py", line 82, in processingFailed
            handler.renderHTTP_exception(ctx, reason)
          File "/Library/Python/2.6/site-packages/allmydata/web/common.py", line 278, in renderHTTP_exception
            return super.renderHTTP_exception(self, ctx, f)
          File "/Library/Python/2.6/site-packages/Nevow-0.10.0-py2.6.egg/nevow/appserver.py", line 61, in renderHTTP_exception
            request.finishRequest( False )
          File "/Library/Python/2.6/site-packages/Nevow-0.10.0-py2.6.egg/nevow/appserver.py", line 176, in finishRequest
            server.Request.finish(self)
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/web/http.py", line 900, in finish
            "Request.finish called on a request after its connection was lost; "
        exceptions.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this.
        
2010-12-10 10:21:12-0800 [-] 'Original exception:'

2010-12-10 11:26:14-0800 [-] Unhandled Error
        Traceback (most recent call last):
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 286, in addCallbacks
            self._runCallbacks()
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 542, in _runCallbacks
            current.result = callback(current.result, *args, **kw)
          File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 694, in <lambda>
            self._read_encrypted(length, ciphertext, hash_only, d2))
          File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 709, in _read_encrypted
            d = defer.maybeDeferred(self.original.read, size)
        --- <exception caught here> ---
          File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 133, in maybeDeferred
            result = f(*args, **kw)
          File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 1362, in read
            return defer.succeed([self._filehandle.read(length)])
        exceptions.ValueError: I/O operation on closed file

There is one flog incident file. I'll attach it.

Some servers on the public test grid are misbehaving -- I can tell because updating my blog takes about ten minutes sometimes. Having updated my blog a few times this morning, my gateway got into a broken state where it will not even load the recent uploads and downloads page... Oh, wait, now it works again. Hm, I will attach `twistd.log`, and here are a few of the interesting messages: ``` 2010-12-08 22:48:44-0800 [HTTPChannel,8,127.0.0.1] Unhandled Error Traceback (most recent call last): Failure: allmydata.immutable.downloader.common.DownloadStopped: our Consumer called stopProducing() 2010-12-10 10:21:11-0800 [-] Unhandled Error Traceback (most recent call last): File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 546, in _got_response return self._loop() File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 435, in _loop self.contacted_peers2) File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 320, in addBoth callbackKeywords=kw, errbackKeywords=kw) File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 286, in addCallbacks self._runCallbacks() --- <exception caught here> --- File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 542, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 546, in _got_response return self._loop() File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 458, in _loop return self._failed(msg) File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 561, in _failed raise UploadUnhappinessError(msg) allmydata.interfaces.UploadUnhappinessError: peer selection failed for <Tahoe2PeerSelector for upload kcaaq>: shares could be placed on only 4 server(s) such that any 3 of them have enough share s to recover the file, but we were asked to place shares on at least 5 such servers. (placed 9 shares out of 10 total (1 homeless), want to place shares on at least 5 servers such that any 3 of them hav e enough shares to recover the file, sent 13 queries to 11 peers, 2 queries placed some shares, 11 placed none (of which 4 placed none due to the server being full and 7 placed none due to an error)) (l ast failure (from <PeerTracker for peer nszizgf5 and SI kcaaq>) was: [Failure instance: Traceback: <class 'foolscap.ipb.DeadReferenceError'>: Calling Stale Broker /Library/Python/2.6/site-packages/allmydata/immutable/upload.py:546:_got_response /Library/Python/2.6/site-packages/allmydata/immutable/upload.py:433:_loop /Library/Python/2.6/site-packages/allmydata/immutable/upload.py:119:query /Library/Python/2.6/site-packages/foolscap-0.5.1_-py2.6.egg/foolscap/referenceable.py:408:callRemote --- <exception caught here> --- /Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py:133:maybeDeferred /Library/Python/2.6/site-packages/foolscap-0.5.1_-py2.6.egg/foolscap/referenceable.py:447:_callRemote /Library/Python/2.6/site-packages/foolscap-0.5.1_-py2.6.egg/foolscap/broker.py:468:newRequestID ]) 2010-12-10 10:21:12-0800 [-] Exception rendering error page: 2010-12-10 10:21:12-0800 [-] Unhandled Error Traceback (most recent call last): File "/Library/Python/2.6/site-packages/foolscap-0.5.1_-py2.6.egg/foolscap/call.py", line 95, in fail self.deferred.errback(why) File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 388, in errback self._startRunCallbacks(fail) File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 455, in _startRunCallbacks self._runCallbacks() File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 542, in _runCallbacks current.result = callback(current.result, *args, **kw) --- <exception caught here> --- File "/Library/Python/2.6/site-packages/Nevow-0.10.0-py2.6.egg/nevow/appserver.py", line 82, in processingFailed handler.renderHTTP_exception(ctx, reason) File "/Library/Python/2.6/site-packages/allmydata/web/common.py", line 278, in renderHTTP_exception return super.renderHTTP_exception(self, ctx, f) File "/Library/Python/2.6/site-packages/Nevow-0.10.0-py2.6.egg/nevow/appserver.py", line 61, in renderHTTP_exception request.finishRequest( False ) File "/Library/Python/2.6/site-packages/Nevow-0.10.0-py2.6.egg/nevow/appserver.py", line 176, in finishRequest server.Request.finish(self) File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/web/http.py", line 900, in finish "Request.finish called on a request after its connection was lost; " exceptions.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this. 2010-12-10 10:21:12-0800 [-] 'Original exception:' ``` ``` 2010-12-10 11:26:14-0800 [-] Unhandled Error Traceback (most recent call last): File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 286, in addCallbacks self._runCallbacks() File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 542, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 694, in <lambda> self._read_encrypted(length, ciphertext, hash_only, d2)) File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 709, in _read_encrypted d = defer.maybeDeferred(self.original.read, size) --- <exception caught here> --- File "/Library/Python/2.6/site-packages/Twisted-10.2.0_r30364-py2.6-macosx-10.6-universal.egg/twisted/internet/defer.py", line 133, in maybeDeferred result = f(*args, **kw) File "/Library/Python/2.6/site-packages/allmydata/immutable/upload.py", line 1362, in read return defer.succeed([self._filehandle.read(length)]) exceptions.ValueError: I/O operation on closed file ``` There is one flog incident file. I'll attach it.

zooko added the

labels 2010-12-10 20:48:02 +00:00

zooko added this to the undecided milestone 2010-12-10 20:48:02 +00:00

zooko commented

2010-12-10 20:49:55 +00:00

Attachment twistd.log (519229 bytes) added

**Attachment** twistd.log (519229 bytes) added

twistd.log

507 KiB

zooko commented

2010-12-10 20:51:02 +00:00

Attachment 27Z-cgqrfqa.flog.bz2 (44340 bytes) added

**Attachment** 27Z-cgqrfqa.flog.bz2 (44340 bytes) added

27Z-cgqrfqa.flog.bz2

43 KiB

davidsarah commented

2011-02-21 06:00:09 +00:00

Possibly fixed by the patch on #1366. However, to close this ticket we should have a reproducible test.

tahoe-lafs modified the milestone from undecided to 1.9.0

2011-02-21 06:00:09 +00:00

warner commented

2011-02-21 06:14:31 +00:00

Hm, the #1366 patch would probably address the second weird message. But I don't think it could have been causing the web server to be unavailable forever. So yeah, let's leave this ticket open a bit longer until we understand what was going on.

warner commented

2011-10-13 17:05:50 +00:00

not making it into 1.9

warner modified the milestone from 1.9.0 to 1.10.0

2011-10-13 17:05:50 +00:00

nejucomo commented

2012-01-22 00:17:27 +00:00

I've seen similar behavior with respect to the second exception, but suspect my symptoms are distinct from #1366, so I filed #1664.

I have not seen the UploadUnhappinessError. I am using an LAE account where there N = K = 1.

I've seen similar behavior with respect to the second exception, but suspect my symptoms are distinct from #1366, so I filed #1664. I have *not* seen the `UploadUnhappinessError`. I am using an [LAE](https://leastauthority.com/) account where there `N = K = 1`.

zooko commented

2014-11-28 15:50:09 +00:00

I have a consulting client (codename "WAG") who is reporting an issue with very similar symptoms. They say that they have hundreds of thousands of people inconvenienced by this failure, so it is urgent for them to figure out what happened.

daira commented

2014-12-01 17:38:36 +00:00

Zooko, what's the evidence that this is the same issue? In my experience, two problem reports are only likely to be the same issue if they result in similar error reporting/tracebacks (in this case "exceptions.ValueError: I/O operation on closed file", for example).

Note that #1366 already fixed the "exceptions.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this." issue.

Zooko, what's the evidence that this is the same issue? In my experience, two problem reports are only likely to be the same issue if they result in similar error reporting/tracebacks (in this case "`exceptions.ValueError: I/O operation on closed file`", for example). Note that #1366 already fixed the "`exceptions.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this.`" issue.

zooko commented

2014-12-03 21:10:05 +00:00

Replying to daira:

Zooko, what's the evidence that this is the same issue?

First of all the observed behavior of the gateway is similar — goes slower and slower and then "locks ups" entirely, refusing to display even its own Welcome Page. But, like you say, actual stack traces are a more reliable indicator of a common cause, and we have them…

WAG got these messages, in an incident report file named incident-2014-11-18--08-59-32Z-nchatry.flog.bz2:

06:20:45.186 [19723]: UNUSUAL error during GET FAILURE:
[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
  File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 575, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/data/allmydata-tahoe-1.10.0/src/allmydata/mutable/retrieve.py", line 222, in <lambda>
    self._pause_deferred.addCallback(lambda ignored: d.callback(res))
  File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 380, in callback
    self._startRunCallbacks(result)
  File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 488, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 575, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/data/allmydata-tahoe-1.10.0/src/allmydata/mutable/retrieve.py", line 228, in _check_for_stopped
    raise DownloadStopped("our Consumer called stopProducing()")
allmydata.interfaces.DownloadStopped: our Consumer called stopProducing()
]

and:

08:59:33.278 [28364487]: Unhandled Error Traceback (most recent call last): Failure: exceptions.RuntimeError: Producer was not unregistered for /file/URI:MDMF:CENSORED:CENSORED FAILURE:
[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
Failure: exceptions.RuntimeError: Producer was not unregistered for /file/URI:MDMF:CENSORED:CENSORED
]

And these messages in an incident report file named incident-2014-10-16--14-09-34Z-twxj47a.flog.bz2:

2014-10-16_14:09:15.274918Z [11791490]: {'from-twisted':True, 'incarnation':('\xd4\xf0\xeb\x82\x9b\xa3\x05$', None), 'level':20, 'message':'Unhandled Error\nTraceback (most recent call last):\n File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-...0.0/src/allmydata/web/filenode.py", line 319, in replace_my_contents\n req.content.seek(0)\nexceptions.ValueError: I/O operation on closed file\n', 'system':'-', 'why':None, ...} FAILURE:
[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
  File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 380, in callback
    self._startRunCallbacks(result)
  File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 488, in _startRunCallbacks
    self._runCallbacks()
  File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 575, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Nevow-0.11.1-py2.6.egg/nevow/appserver.py", line 163, in gotPageContext
    pageContext.tag.renderHTTP, pageContext
--- <exception caught here> ---
  File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 137, in maybeDeferred
    result = f(*args, **kw)
  File "/data/allmydata-tahoe-1.10.0/src/allmydata/web/common.py", line 352, in renderHTTP
    return m(ctx)
  File "/data/allmydata-tahoe-1.10.0/src/allmydata/web/filenode.py", line 245, in render_PUT
    return self.replace_my_contents(req)
  File "/data/allmydata-tahoe-1.10.0/src/allmydata/web/filenode.py", line 319, in replace_my_contents
    req.content.seek(0)
exceptions.ValueError: I/O operation on closed file
]

and:

2014-10-16_14:09:15.281716Z [11791492]: {'from-twisted':True, 'incarnation':('\xd4\xf0\xeb\x82\x9b\xa3\x05$', None), 'level':20, 'message':'Unhandled Error\nTraceback (most recent call last):\n File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Nevow-0.11.1-py2.6.eg...ost; "\nexceptions.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this.\n', 'system':'-', 'why':None, ...} FAILURE:
[CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
  File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Nevow-0.11.1-py2.6.egg/nevow/appserver.py", line 167, in gotPageContext
    processingFailed, self, pageContext
  File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 315, in addErrback
    errbackKeywords=kw)
  File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 293, in addCallbacks
    self._runCallbacks()
  File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86

-- TRACEBACK ELIDED --

-packages/Nevow-0.11.1-py2.6.egg/nevow/appserver.py", line 82, in processingFailed
    handler.renderHTTP_exception(ctx, reason)
  File "/data/allmydata-tahoe-1.10.0/src/allmydata/web/common.py", line 329, in renderHTTP_exception
    return super.renderHTTP_exception(self, ctx, f)
  File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Nevow-0.11.1-py2.6.egg/nevow/appserver.py", line 61, in renderHTTP_exception
    request.finishRequest( False )
  File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Nevow-0.11.1-py2.6.egg/nevow/appserver.py", line 176, in finishRequest
    server.Request.finish(self)
  File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/web/server.py", line 231, in finish
    return http.Request.finish(self)
  File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/web/http.py", line 885, in finish
    "Request.finish called on a request after its connection was lost; "
exceptions.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this.
]

Replying to [daira](/tahoe-lafs/trac-2024-07-25/issues/1278#issuecomment-81506): > Zooko, what's the evidence that this is the same issue? First of all the observed behavior of the gateway is similar — goes slower and slower and then "locks ups" entirely, refusing to display even its own Welcome Page. But, like you say, actual stack traces are a more reliable indicator of a common cause, and we have them… WAG got these messages, in an incident report file named `incident-2014-11-18--08-59-32Z-nchatry.flog.bz2`: ``` 06:20:45.186 [19723]: UNUSUAL error during GET FAILURE: [CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last): File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 575, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/data/allmydata-tahoe-1.10.0/src/allmydata/mutable/retrieve.py", line 222, in <lambda> self._pause_deferred.addCallback(lambda ignored: d.callback(res)) File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 380, in callback self._startRunCallbacks(result) File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 488, in _startRunCallbacks self._runCallbacks() --- <exception caught here> --- File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 575, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/data/allmydata-tahoe-1.10.0/src/allmydata/mutable/retrieve.py", line 228, in _check_for_stopped raise DownloadStopped("our Consumer called stopProducing()") allmydata.interfaces.DownloadStopped: our Consumer called stopProducing() ] ``` and: ``` 08:59:33.278 [28364487]: Unhandled Error Traceback (most recent call last): Failure: exceptions.RuntimeError: Producer was not unregistered for /file/URI:MDMF:CENSORED:CENSORED FAILURE: [CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last): Failure: exceptions.RuntimeError: Producer was not unregistered for /file/URI:MDMF:CENSORED:CENSORED ] ``` And these messages in an incident report file named `incident-2014-10-16--14-09-34Z-twxj47a.flog.bz2`: ``` 2014-10-16_14:09:15.274918Z [11791490]: {'from-twisted':True, 'incarnation':('\xd4\xf0\xeb\x82\x9b\xa3\x05$', None), 'level':20, 'message':'Unhandled Error\nTraceback (most recent call last):\n File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-...0.0/src/allmydata/web/filenode.py", line 319, in replace_my_contents\n req.content.seek(0)\nexceptions.ValueError: I/O operation on closed file\n', 'system':'-', 'why':None, ...} FAILURE: [CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last): File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 380, in callback self._startRunCallbacks(result) File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 488, in _startRunCallbacks self._runCallbacks() File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 575, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Nevow-0.11.1-py2.6.egg/nevow/appserver.py", line 163, in gotPageContext pageContext.tag.renderHTTP, pageContext --- <exception caught here> --- File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 137, in maybeDeferred result = f(*args, **kw) File "/data/allmydata-tahoe-1.10.0/src/allmydata/web/common.py", line 352, in renderHTTP return m(ctx) File "/data/allmydata-tahoe-1.10.0/src/allmydata/web/filenode.py", line 245, in render_PUT return self.replace_my_contents(req) File "/data/allmydata-tahoe-1.10.0/src/allmydata/web/filenode.py", line 319, in replace_my_contents req.content.seek(0) exceptions.ValueError: I/O operation on closed file ] ``` and: ``` 2014-10-16_14:09:15.281716Z [11791492]: {'from-twisted':True, 'incarnation':('\xd4\xf0\xeb\x82\x9b\xa3\x05$', None), 'level':20, 'message':'Unhandled Error\nTraceback (most recent call last):\n File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Nevow-0.11.1-py2.6.eg...ost; "\nexceptions.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this.\n', 'system':'-', 'why':None, ...} FAILURE: [CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last): File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Nevow-0.11.1-py2.6.egg/nevow/appserver.py", line 167, in gotPageContext processingFailed, self, pageContext File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 315, in addErrback errbackKeywords=kw) File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/internet/defer.py", line 293, in addCallbacks self._runCallbacks() File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86 -- TRACEBACK ELIDED -- -packages/Nevow-0.11.1-py2.6.egg/nevow/appserver.py", line 82, in processingFailed handler.renderHTTP_exception(ctx, reason) File "/data/allmydata-tahoe-1.10.0/src/allmydata/web/common.py", line 329, in renderHTTP_exception return super.renderHTTP_exception(self, ctx, f) File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Nevow-0.11.1-py2.6.egg/nevow/appserver.py", line 61, in renderHTTP_exception request.finishRequest( False ) File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Nevow-0.11.1-py2.6.egg/nevow/appserver.py", line 176, in finishRequest server.Request.finish(self) File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/web/server.py", line 231, in finish return http.Request.finish(self) File "/data/allmydata-tahoe-1.10.0/support/lib/python2.6/site-packages/Twisted-13.0.0-py2.6-linux-x86_64.egg/twisted/web/http.py", line 885, in finish "Request.finish called on a request after its connection was lost; " exceptions.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this. ] ```

zooko commented

2014-12-03 21:10:40 +00:00

Replying to daira:

Note that #1366 already fixed the "exceptions.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this." issue.

Hm… could that issue have been causing a file descriptor leak?

Replying to [daira](/tahoe-lafs/trac-2024-07-25/issues/1278#issuecomment-81506): > > Note that #1366 already fixed the "`exceptions.RuntimeError: Request.finish called on a request after its connection was lost; use Request.notifyFinish to keep track of this.`" issue. Hm… could that issue have been causing a file descriptor leak?

zooko commented

2014-12-03 21:23:25 +00:00

#1664 was similar to #1366, but is still open, because for some reason nejucomo suspected that they were actually different issues, so that the fix to #1366 wouldn't fix #1664. Could #1664 be affecting WAG and could it cause a file descriptor leak? (I doubt it.)

daira commented

2014-12-04 15:04:33 +00:00

Replying to [zooko]comment:8:

Replying to daira:

Zooko, what's the evidence that this is the same issue?

First of all the observed behavior of the gateway is similar — goes slower and slower and then "locks ups" entirely, refusing to display even its own Welcome Page. But, like you say, actual stack traces are a more reliable indicator of a common cause, and we have them…

OK, I'm convinced.

Replying to [zooko]comment:8: > Replying to [daira](/tahoe-lafs/trac-2024-07-25/issues/1278#issuecomment-81506): > > Zooko, what's the evidence that this is the same issue? > > First of all the observed behavior of the gateway is similar — goes slower and slower and then "locks ups" entirely, refusing to display even its own Welcome Page. But, like you say, actual stack traces are a more reliable indicator of a common cause, and we have them… OK, I'm convinced.

Sign in to join this conversation.