intermittent test_system failure #1768

Open
opened 2012-06-14 22:46:12 +00:00 by warner · 9 comments

The windows buildslave reported a failure in test_system today, in
changeset:b2dcbbb6, which went away upon a rebuild.

https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/FreeStorm%20WinXP-x86%20py2.6/builds/103

Relevant console output:

allmydata.test.test_system.SystemTest.test_mutable ... [OK]
(4.703 secs)
allmydata.test.test_system.SystemTest.test_upload_and_download_convergent ... [<twisted.python.failure.Failure <class 'foolscap.ipb.DeadReferenceError'>>]
Traceback (most recent call last):
  File "c:\python26\lib\site-packages\twisted\internet\defer.py", line 134, in maybeDeferred
    result = f(*args, **kw)
  File "c:\buildbot_tahoe\freestorm_winxp-x86_py2_6\build\src\allmydata\util\pollmixin.py", line 49, in _poll
    self.fail("Errors snooped, terminating early")
twisted.trial.unittest.FailTest: Errors snooped, terminating early
[FAILURE]Traceback (most recent call last):
Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean.
DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to debug)
<DelayedCall 0x494acb0 [0.875s] called=0 cancelled=0 Reconnector._timer_expired()>
[ERROR]Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: [7xfh] replaced by a new connection
[ERROR]Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIUploadHelper.tahoe.allmydata.com:upload)
[ERROR]Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIHelper.tahoe.allmydata.com:get_version)
[ERROR]Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
[ERROR]Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
[ERROR]Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
[ERROR]
(6.500 secs)
allmydata.test.test_system.SystemTest.test_upload_and_download_random_key ... [OK]
(10.938 secs)

...

===============================================================================
[FAIL]
Traceback (most recent call last):
  File "c:\python26\lib\site-packages\twisted\internet\defer.py", line 134, in maybeDeferred
    result = f(*args, **kw)
  File "c:\buildbot_tahoe\freestorm_winxp-x86_py2_6\build\src\allmydata\util\pollmixin.py", line 49, in _poll
    self.fail("Errors snooped, terminating early")
twisted.trial.unittest.FailTest: Errors snooped, terminating early

allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean.
DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to debug)
<DelayedCall 0x494acb0 [0.875s] called=0 cancelled=0 Reconnector._timer_expired()>

allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: [7xfh] replaced by a new connection

allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIUploadHelper.tahoe.allmydata.com:upload)

allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIHelper.tahoe.allmydata.com:get_version)

allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version)

allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
-------------------------------------------------------------------------------

and corresponding test.log contents:

2012-06-14 19:38:27.801Z [-] (UDP Port 1279 Closed)
2012-06-14 19:38:27.801Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x035D34E0>
2012-06-14 19:38:28.410Z [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: [7xfh] replaced by a new connection
	
2012-06-14 19:38:28.441Z [-] Unhandled error in Deferred:
2012-06-14 19:38:28.441Z [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIUploadHelper.tahoe.allmydata.com:upload)
	
2012-06-14 19:38:28.504Z [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIHelper.tahoe.allmydata.com:get_version)
	
2012-06-14 19:38:28.504Z [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
	
2012-06-14 19:38:28.504Z [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
	
2012-06-14 19:38:28.519Z [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
	
2012-06-14 19:38:28.519Z [-] (TCP Port 1274 Closed)
2012-06-14 19:38:28.519Z [-] Stopping factory <nevow.appserver.NevowSite instance at 0x03DAB8F0>
2012-06-14 19:38:28.519Z [-] (TCP Port 1190 Closed)

The "Unhandled Error" (as opposed to the usual "Unhandled Error in
Deferred") means that something called twisted's log.err()
with a Failure or Exception, but nothing else (in particular not a
why= argument). This is a pretty common (perhaps unfortunate)
pattern in our code: tahoe has roughly 47 such calls, and Foolscap
has about 29. So we don't really know what caught the exception
(and therefore what ought to be handling it differently). The
exception itself was a DeadReferenceError, caused by a connection
being replaced while a remote call was outstanding. And most of the
remote calls were get_version, which we do immediately after
a getReference or connectTo completes (with one extra
RIUploadHelper upload() call). The Dirty Reactor Error
indicates that something failed to shut down a Tub in the error
path.

I'm not really sure what's going on here. It feels like the client
is being bounced while it's in the middle of making some
connections, as if we're not waiting for the client to finish
starting up before we shut it down again. It also feels like there
are some log.err() calls that should be replaced with the
less-serious log.msg(), since they don't really justify
flunking the tests. (Any log.err will flunk a test, unless the test
specifically checks for and discards them).

Throwing a log.err at the end of a Deferred chain when you don't
really care about it anymore is pretty convenient and safe (you
won't be losing information about unexpected errors). But it's
still a sort of "XXX this point is never reached" sort of marker.
Having them get invoked during tests might suggest that control is
actually reaching those points.

The windows buildslave reported a failure in test_system today, in changeset:b2dcbbb6, which went away upon a rebuild. <https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/FreeStorm%20WinXP-x86%20py2.6/builds/103> Relevant console output: ``` allmydata.test.test_system.SystemTest.test_mutable ... [OK] (4.703 secs) allmydata.test.test_system.SystemTest.test_upload_and_download_convergent ... [<twisted.python.failure.Failure <class 'foolscap.ipb.DeadReferenceError'>>] Traceback (most recent call last): File "c:\python26\lib\site-packages\twisted\internet\defer.py", line 134, in maybeDeferred result = f(*args, **kw) File "c:\buildbot_tahoe\freestorm_winxp-x86_py2_6\build\src\allmydata\util\pollmixin.py", line 49, in _poll self.fail("Errors snooped, terminating early") twisted.trial.unittest.FailTest: Errors snooped, terminating early [FAILURE]Traceback (most recent call last): Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean. DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to debug) <DelayedCall 0x494acb0 [0.875s] called=0 cancelled=0 Reconnector._timer_expired()> [ERROR]Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: [7xfh] replaced by a new connection [ERROR]Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIUploadHelper.tahoe.allmydata.com:upload) [ERROR]Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIHelper.tahoe.allmydata.com:get_version) [ERROR]Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version) [ERROR]Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version) [ERROR]Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version) [ERROR] (6.500 secs) allmydata.test.test_system.SystemTest.test_upload_and_download_random_key ... [OK] (10.938 secs) ... =============================================================================== [FAIL] Traceback (most recent call last): File "c:\python26\lib\site-packages\twisted\internet\defer.py", line 134, in maybeDeferred result = f(*args, **kw) File "c:\buildbot_tahoe\freestorm_winxp-x86_py2_6\build\src\allmydata\util\pollmixin.py", line 49, in _poll self.fail("Errors snooped, terminating early") twisted.trial.unittest.FailTest: Errors snooped, terminating early allmydata.test.test_system.SystemTest.test_upload_and_download_convergent =============================================================================== [ERROR] Traceback (most recent call last): Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean. DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to debug) <DelayedCall 0x494acb0 [0.875s] called=0 cancelled=0 Reconnector._timer_expired()> allmydata.test.test_system.SystemTest.test_upload_and_download_convergent =============================================================================== [ERROR] Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: [7xfh] replaced by a new connection allmydata.test.test_system.SystemTest.test_upload_and_download_convergent =============================================================================== [ERROR] Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIUploadHelper.tahoe.allmydata.com:upload) allmydata.test.test_system.SystemTest.test_upload_and_download_convergent =============================================================================== [ERROR] Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIHelper.tahoe.allmydata.com:get_version) allmydata.test.test_system.SystemTest.test_upload_and_download_convergent =============================================================================== [ERROR] Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version) allmydata.test.test_system.SystemTest.test_upload_and_download_convergent allmydata.test.test_system.SystemTest.test_upload_and_download_convergent allmydata.test.test_system.SystemTest.test_upload_and_download_convergent ------------------------------------------------------------------------------- ``` and corresponding test.log contents: ``` 2012-06-14 19:38:27.801Z [-] (UDP Port 1279 Closed) 2012-06-14 19:38:27.801Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x035D34E0> 2012-06-14 19:38:28.410Z [-] Unhandled Error Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: [7xfh] replaced by a new connection 2012-06-14 19:38:28.441Z [-] Unhandled error in Deferred: 2012-06-14 19:38:28.441Z [-] Unhandled Error Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIUploadHelper.tahoe.allmydata.com:upload) 2012-06-14 19:38:28.504Z [-] Unhandled Error Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIHelper.tahoe.allmydata.com:get_version) 2012-06-14 19:38:28.504Z [-] Unhandled Error Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version) 2012-06-14 19:38:28.504Z [-] Unhandled Error Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version) 2012-06-14 19:38:28.519Z [-] Unhandled Error Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version) 2012-06-14 19:38:28.519Z [-] (TCP Port 1274 Closed) 2012-06-14 19:38:28.519Z [-] Stopping factory <nevow.appserver.NevowSite instance at 0x03DAB8F0> 2012-06-14 19:38:28.519Z [-] (TCP Port 1190 Closed) ``` The "Unhandled Error" (as opposed to the usual "Unhandled Error in Deferred") means that something called twisted's `log.err()` with a Failure or Exception, but nothing else (in particular not a why= argument). This is a pretty common (perhaps unfortunate) pattern in our code: tahoe has roughly 47 such calls, and Foolscap has about 29. So we don't really know what caught the exception (and therefore what ought to be handling it differently). The exception itself was a DeadReferenceError, caused by a connection being replaced while a remote call was outstanding. And most of the remote calls were `get_version`, which we do immediately after a getReference or connectTo completes (with one extra `RIUploadHelper upload()` call). The Dirty Reactor Error indicates that something failed to shut down a Tub in the error path. I'm not really sure what's going on here. It feels like the client is being bounced while it's in the middle of making some connections, as if we're not waiting for the client to finish starting up before we shut it down again. It also feels like there are some `log.err()` calls that should be replaced with the less-serious `log.msg()`, since they don't really justify flunking the tests. (Any log.err will flunk a test, unless the test specifically checks for and discards them). Throwing a log.err at the end of a Deferred chain when you don't really care about it anymore is pretty convenient and safe (you won't be losing information about unexpected errors). But it's still a sort of "XXX this point is never reached" sort of marker. Having them get invoked during tests might suggest that control is actually reaching those points.
warner added the
code
normal
defect
1.9.1
labels 2012-06-14 22:46:12 +00:00
warner added this to the undecided milestone 2012-06-14 22:46:12 +00:00
davidsarah commented 2013-03-19 23:54:21 +00:00
Owner

Another probable instance of this error on the same buildslave (build 170):

[FAIL]
Traceback (most recent call last):
  File "c:\python26\lib\site-packages\twisted\internet\defer.py", line 134, in maybeDeferred
    result = f(*args, **kw)
  File "c:\buildbot_tahoe\freestorm_winxp-x86_py2_6\build\src\allmydata\util\pollmixin.py", line 49, in _poll
    self.fail("Errors snooped, terminating early")
twisted.trial.unittest.FailTest: Errors snooped, terminating early

allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "c:\python26\lib\site-packages\twisted\internet\defer.py", line 134, in maybeDeferred
    result = f(*args, **kw)
  File "C:\buildbot_tahoe\FreeStorm_WinXP-x86_py2_6\build\support\Lib\site-packages\foolscap-0.6.4-py2.6.egg\foolscap\referenceable.py", line 455, in _callRemote
    reqID = broker.newRequestID()
  File "C:\buildbot_tahoe\FreeStorm_WinXP-x86_py2_6\build\support\Lib\site-packages\foolscap-0.6.4-py2.6.egg\foolscap\broker.py", line 477, in newRequestID
    raise DeadReferenceError("Calling Stale Broker")
foolscap.ipb.DeadReferenceError: Calling Stale Broker

allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIUploadHelper.tahoe.allmydata.com:upload)

allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=iwlr) (during method=RIStorageServer.tahoe.allmydata.com:get_version)

allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=z3g3) (during method=RIStorageServer.tahoe.allmydata.com:get_version)

allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
===============================================================================
[ERROR]
Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version)

allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "c:\python26\lib\site-packages\twisted\python\log.py", line 84, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "c:\python26\lib\site-packages\twisted\python\log.py", line 69, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "c:\python26\lib\site-packages\twisted\python\context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "c:\python26\lib\site-packages\twisted\python\context.py", line 81, in callWithContext
    return func(*args,**kw)
  File "c:\python26\lib\site-packages\twisted\internet\selectreactor.py", line 155, in _doReadOrWrite
    self._disconnectSelectable(selectable, why, method=="doRead")
  File "c:\python26\lib\site-packages\twisted\internet\posixbase.py", line 263, in _disconnectSelectable
    selectable.connectionLost(f)
  File "c:\python26\lib\site-packages\twisted\internet\tcp.py", line 433, in connectionLost
    Connection.connectionLost(self, reason)
  File "c:\python26\lib\site-packages\twisted\internet\tcp.py", line 277, in connectionLost
    protocol.connectionLost(reason)
  File "c:\python26\lib\site-packages\twisted\protocols\tls.py", line 460, in connectionLost
    ProtocolWrapper.connectionLost(self, reason)
  File "c:\python26\lib\site-packages\twisted\protocols\policies.py", line 123, in connectionLost
    self.wrappedProtocol.connectionLost(reason)
  File "C:\buildbot_tahoe\FreeStorm_WinXP-x86_py2_6\build\support\Lib\site-packages\foolscap-0.6.4-py2.6.egg\foolscap\broker.py", line 253, in connectionLost
    self.tub.connectionDropped(self)
  File "C:\buildbot_tahoe\FreeStorm_WinXP-x86_py2_6\build\support\Lib\site-packages\foolscap-0.6.4-py2.6.egg\foolscap\pb.py", line 1070, in connectionDropped
    self._allConnectionsAreDisconnected.fire(self)
  File "C:\buildbot_tahoe\FreeStorm_WinXP-x86_py2_6\build\support\Lib\site-packages\foolscap-0.6.4-py2.6.egg\foolscap\observer.py", line 42, in fire
    assert not self._fired
exceptions.AssertionError: 

allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
-------------------------------------------------------------------------------
Another probable instance of this error on the same buildslave (build 170): ``` [FAIL] Traceback (most recent call last): File "c:\python26\lib\site-packages\twisted\internet\defer.py", line 134, in maybeDeferred result = f(*args, **kw) File "c:\buildbot_tahoe\freestorm_winxp-x86_py2_6\build\src\allmydata\util\pollmixin.py", line 49, in _poll self.fail("Errors snooped, terminating early") twisted.trial.unittest.FailTest: Errors snooped, terminating early allmydata.test.test_system.SystemTest.test_upload_and_download_random_key =============================================================================== [ERROR] Traceback (most recent call last): File "c:\python26\lib\site-packages\twisted\internet\defer.py", line 134, in maybeDeferred result = f(*args, **kw) File "C:\buildbot_tahoe\FreeStorm_WinXP-x86_py2_6\build\support\Lib\site-packages\foolscap-0.6.4-py2.6.egg\foolscap\referenceable.py", line 455, in _callRemote reqID = broker.newRequestID() File "C:\buildbot_tahoe\FreeStorm_WinXP-x86_py2_6\build\support\Lib\site-packages\foolscap-0.6.4-py2.6.egg\foolscap\broker.py", line 477, in newRequestID raise DeadReferenceError("Calling Stale Broker") foolscap.ipb.DeadReferenceError: Calling Stale Broker allmydata.test.test_system.SystemTest.test_upload_and_download_random_key =============================================================================== [ERROR] Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIUploadHelper.tahoe.allmydata.com:upload) allmydata.test.test_system.SystemTest.test_upload_and_download_random_key =============================================================================== [ERROR] Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=iwlr) (during method=RIStorageServer.tahoe.allmydata.com:get_version) allmydata.test.test_system.SystemTest.test_upload_and_download_random_key =============================================================================== [ERROR] Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=z3g3) (during method=RIStorageServer.tahoe.allmydata.com:get_version) allmydata.test.test_system.SystemTest.test_upload_and_download_random_key =============================================================================== [ERROR] Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version) allmydata.test.test_system.SystemTest.test_upload_and_download_random_key =============================================================================== [ERROR] Traceback (most recent call last): File "c:\python26\lib\site-packages\twisted\python\log.py", line 84, in callWithLogger return callWithContext({"system": lp}, func, *args, **kw) File "c:\python26\lib\site-packages\twisted\python\log.py", line 69, in callWithContext return context.call({ILogContext: newCtx}, func, *args, **kw) File "c:\python26\lib\site-packages\twisted\python\context.py", line 118, in callWithContext return self.currentContext().callWithContext(ctx, func, *args, **kw) File "c:\python26\lib\site-packages\twisted\python\context.py", line 81, in callWithContext return func(*args,**kw) File "c:\python26\lib\site-packages\twisted\internet\selectreactor.py", line 155, in _doReadOrWrite self._disconnectSelectable(selectable, why, method=="doRead") File "c:\python26\lib\site-packages\twisted\internet\posixbase.py", line 263, in _disconnectSelectable selectable.connectionLost(f) File "c:\python26\lib\site-packages\twisted\internet\tcp.py", line 433, in connectionLost Connection.connectionLost(self, reason) File "c:\python26\lib\site-packages\twisted\internet\tcp.py", line 277, in connectionLost protocol.connectionLost(reason) File "c:\python26\lib\site-packages\twisted\protocols\tls.py", line 460, in connectionLost ProtocolWrapper.connectionLost(self, reason) File "c:\python26\lib\site-packages\twisted\protocols\policies.py", line 123, in connectionLost self.wrappedProtocol.connectionLost(reason) File "C:\buildbot_tahoe\FreeStorm_WinXP-x86_py2_6\build\support\Lib\site-packages\foolscap-0.6.4-py2.6.egg\foolscap\broker.py", line 253, in connectionLost self.tub.connectionDropped(self) File "C:\buildbot_tahoe\FreeStorm_WinXP-x86_py2_6\build\support\Lib\site-packages\foolscap-0.6.4-py2.6.egg\foolscap\pb.py", line 1070, in connectionDropped self._allConnectionsAreDisconnected.fire(self) File "C:\buildbot_tahoe\FreeStorm_WinXP-x86_py2_6\build\support\Lib\site-packages\foolscap-0.6.4-py2.6.egg\foolscap\observer.py", line 42, in fire assert not self._fired exceptions.AssertionError: allmydata.test.test_system.SystemTest.test_upload_and_download_random_key allmydata.test.test_system.SystemTest.test_upload_and_download_random_key ------------------------------------------------------------------------------- ```
davidsarah commented 2013-03-20 00:01:40 +00:00
Owner

This may be related to or a duplicate of #1757. (Alternatively, if they are distinct bugs, then comment:89018 could be an instance of #1757 instead of this bug.)

This may be related to or a duplicate of #1757. (Alternatively, if they are distinct bugs, then [comment:89018](/tahoe-lafs/trac-2024-07-25/issues/1768#issuecomment-89018) could be an instance of #1757 instead of this bug.)

Hm that stack trace in the final ERROR in comment:89018 shows that connectionDropped is getting fired twice:

http://foolscap.lothar.com/trac/browser/foolscap/pb.py?annotate=blame&rev=900c2b4a7aaf9370a76d97a59ebe8943d2dac353#L1070

alternate hosting of same file:

https://github.com/warner/foolscap/blob/900c2b4a7aaf9370a76d97a59ebe8943d2dac353/foolscap/pb.py#l1070

This reminds me of #653, in which we determined that most likely there is a bug in foolscap which causes notifyOnDisconnect to not get called sometimes when it should get called. Putting these two together, it makes me think that foolscap can currently err on both sides — double-invoking a "respond to disconnect" event sometimes (but we've seen this only on Windows) and zero-invoking it other times.

As I've mentioned before, I really think "responding to disconnect events" is a losing game. To me it smells like "clean shutdown logic" in an app. It tends to be buggy, it is labor-intensive to implement and debug it, and it can never be 100% correct (because shutdowns of an app are sometimes hard and because disconnections of a network are sometimes undetectable).

So my preference for "crash-only programming" (in which you don't expend engineering effort trying to design and implement the "clean shutdown" case) is perhaps related to my preference for "crash-only networking", in which you assume that your application won't get a reliable notification of disconnect, and you don't expend engineering effort trying to deliver one.

So I would be somewhat more interested in removing "respond to disconnect events" features from foolscap and from Tahoe-LAFS (see #816, #1975) than in debugging this. However, I'm definitely not very happy with the current situation, where the unit tests sometimes spuriously (??) fail on Windows due to this probable bug in foolscap.

Hm that stack trace in the final `ERROR` in [comment:89018](/tahoe-lafs/trac-2024-07-25/issues/1768#issuecomment-89018) shows that `connectionDropped` is getting fired twice: <http://foolscap.lothar.com/trac/browser/foolscap/pb.py?annotate=blame&rev=900c2b4a7aaf9370a76d97a59ebe8943d2dac353#L1070> alternate hosting of same file: <https://github.com/warner/foolscap/blob/900c2b4a7aaf9370a76d97a59ebe8943d2dac353/foolscap/pb.py#l1070> This reminds me of #653, in which we determined that most likely there is a bug in foolscap which causes `notifyOnDisconnect` to *not* get called sometimes when it should get called. Putting these two together, it makes me think that foolscap can currently err on both sides — double-invoking a "respond to disconnect" event sometimes (but we've seen this only on Windows) and zero-invoking it other times. As I've mentioned before, I really think "responding to disconnect events" is a losing game. To me it smells like "clean shutdown logic" in an app. It tends to be buggy, it is labor-intensive to implement and debug it, and it can never be 100% correct (because shutdowns of an app are sometimes hard and because disconnections of a network are sometimes undetectable). So my preference for "crash-only programming" (in which you don't expend engineering effort trying to design and implement the "clean shutdown" case) is perhaps related to my preference for "crash-only networking", in which you assume that your application won't get a reliable notification of disconnect, and you don't expend engineering effort trying to deliver one. So I would be somewhat more interested in removing "respond to disconnect events" features from foolscap and from Tahoe-LAFS (see #816, #1975) than in debugging this. However, I'm definitely not very happy with the current situation, where the unit tests sometimes spuriously (??) fail on Windows due to this probable bug in foolscap.

I think #1757 was a duplicate of this.

I think #1757 was a duplicate of this.
daira commented 2013-07-15 14:21:02 +00:00
Owner

In tests there should be no unexpected disconnections anyway, right? So is the cause of the disconnection just Windows networking being flaky, or do similar errors happen on other platforms?

In tests there should be no unexpected disconnections anyway, right? So is the cause of the disconnection just Windows networking being flaky, or do similar errors happen on other platforms?

Our windows buildslave demonstrated this again:

https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/Marcus%20Cygwin%20WinXP/builds/140/

The relevant excerpt from stdout is:

allmydata.test.test_system.SystemTest.test_filesystem ... [<twisted.python.failure.Failure <class 'foolscap.ipb.DeadReferenceError'>>]
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/Twisted-11.1.0-py2.6-cygwin-1.7.9-i686.egg/twisted/internet/defer.py", line 134, in maybeDeferred
    result = f(*args, **kw)
  File "/home/marcus/bbtahoe/Marcus_Cygwin_WinXP/build/src/allmydata/util/pollmixin.py", line 49, in _poll
    self.fail("Errors snooped, terminating early")
twisted.trial.unittest.FailTest: Errors snooped, terminating early
[FAILURE]Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: replaced by a new connection
[ERROR]Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=njea) (during method=RIIntroducerPublisherAndSubscriberService_v2.tahoe.allmydata.com:subscribe_v2)
[ERROR]Traceback (most recent call last):
Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
[ERROR]
(17.628 secs)

And the relevant excerpt from trial log is:

2013-07-18 21:51:21.391Z [-] --> allmydata.test.test_system.SystemTest.test_filesystem <--
2013-07-18 21:51:21.470Z [-] Listener starting on 2549
2013-07-18 21:51:21.480Z [-] NevowSite starting on 2550
2013-07-18 21:51:21.480Z [-] Starting factory <nevow.appserver.NevowSite instance at 0x7cc48b8c>
2013-07-18 21:51:21.484Z [-] My pid: 136
2013-07-18 21:51:21.495Z [-] DatagramProtocol starting on 2551
2013-07-18 21:51:21.495Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cc4994c>
2013-07-18 21:51:21.503Z [-] DatagramProtocol starting on 2552
2013-07-18 21:51:21.504Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cc49d8c>
2013-07-18 21:51:21.509Z [-] DatagramProtocol starting on 2553
2013-07-18 21:51:21.510Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cc49ccc>
2013-07-18 21:51:21.519Z [-] (UDP Port 2551 Closed)
2013-07-18 21:51:21.519Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cc4994c>
2013-07-18 21:51:21.520Z [-] (UDP Port 2552 Closed)
2013-07-18 21:51:21.520Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cc49d8c>
2013-07-18 21:51:21.521Z [-] (UDP Port 2553 Closed)
2013-07-18 21:51:21.521Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cc49ccc>
2013-07-18 21:51:21.644Z [-] Listener starting on 2554
2013-07-18 21:51:21.658Z [-] DatagramProtocol starting on 2555
2013-07-18 21:51:21.658Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7d580a2c>
2013-07-18 21:51:21.918Z [-] Listener starting on 2556
2013-07-18 21:51:21.927Z [-] NevowSite starting on 2557
2013-07-18 21:51:21.927Z [-] Starting factory <nevow.appserver.NevowSite instance at 0x7e6e24ec>
2013-07-18 21:51:21.931Z [-] My pid: 136
2013-07-18 21:51:21.932Z [-] (UDP Port 2555 Closed)
2013-07-18 21:51:21.932Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7d580a2c>
2013-07-18 21:51:21.941Z [-] DatagramProtocol starting on 2558
2013-07-18 21:51:21.942Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cc4942c>
2013-07-18 21:51:21.948Z [-] DatagramProtocol starting on 2559
2013-07-18 21:51:21.949Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7e6e202c>
2013-07-18 21:51:21.953Z [-] DatagramProtocol starting on 2560
2013-07-18 21:51:21.954Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7e6e2cac>
2013-07-18 21:51:21.961Z [-] (UDP Port 2558 Closed)
2013-07-18 21:51:21.961Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cc4942c>
2013-07-18 21:51:21.962Z [-] (UDP Port 2559 Closed)
2013-07-18 21:51:21.962Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7e6e202c>
2013-07-18 21:51:21.963Z [-] (UDP Port 2560 Closed)
2013-07-18 21:51:21.963Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7e6e2cac>
2013-07-18 21:51:22.135Z [-] Listener starting on 2561
2013-07-18 21:51:22.144Z [-] My pid: 136
2013-07-18 21:51:22.309Z [-] Listener starting on 2562
2013-07-18 21:51:22.316Z [-] My pid: 136
2013-07-18 21:51:23.019Z [-] Listener starting on 2563
2013-07-18 21:51:23.029Z [-] NevowSite starting on 2564
2013-07-18 21:51:23.029Z [-] Starting factory <nevow.appserver.NevowSite instance at 0x7e6bb3cc>
2013-07-18 21:51:23.034Z [-] My pid: 136
2013-07-18 21:51:23.181Z [-] Listener starting on 2565
2013-07-18 21:51:23.190Z [-] My pid: 136
2013-07-18 21:51:23.201Z [-] DatagramProtocol starting on 2568
2013-07-18 21:51:23.201Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7e6e2c2c>
2013-07-18 21:51:23.210Z [-] DatagramProtocol starting on 2569
2013-07-18 21:51:23.210Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b4cc>
2013-07-18 21:51:23.215Z [-] DatagramProtocol starting on 2570
2013-07-18 21:51:23.216Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b50c>
2013-07-18 21:51:23.232Z [-] DatagramProtocol starting on 2571
2013-07-18 21:51:23.232Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b14c>
2013-07-18 21:51:23.241Z [-] DatagramProtocol starting on 2572
2013-07-18 21:51:23.242Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b66c>
2013-07-18 21:51:23.248Z [-] DatagramProtocol starting on 2573
2013-07-18 21:51:23.248Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b3ec>
2013-07-18 21:51:23.269Z [-] DatagramProtocol starting on 2574
2013-07-18 21:51:23.270Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd4608c>
2013-07-18 21:51:23.277Z [-] DatagramProtocol starting on 2575
2013-07-18 21:51:23.277Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd4634c>
2013-07-18 21:51:23.283Z [-] DatagramProtocol starting on 2576
2013-07-18 21:51:23.284Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd46eec>
2013-07-18 21:51:23.297Z [-] DatagramProtocol starting on 2577
2013-07-18 21:51:23.297Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd4688c>
2013-07-18 21:51:23.304Z [-] DatagramProtocol starting on 2578
2013-07-18 21:51:23.305Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd46a0c>
2013-07-18 21:51:23.310Z [-] DatagramProtocol starting on 2579
2013-07-18 21:51:23.311Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7ce2ebac>
2013-07-18 21:51:23.329Z [-] (UDP Port 2568 Closed)
2013-07-18 21:51:23.329Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7e6e2c2c>
2013-07-18 21:51:23.330Z [-] (UDP Port 2569 Closed)
2013-07-18 21:51:23.331Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b4cc>
2013-07-18 21:51:23.331Z [-] (UDP Port 2570 Closed)
2013-07-18 21:51:23.332Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b50c>
2013-07-18 21:51:23.492Z [-] (UDP Port 2571 Closed)
2013-07-18 21:51:23.492Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b14c>
2013-07-18 21:51:23.493Z [-] (UDP Port 2572 Closed)
2013-07-18 21:51:23.493Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b66c>
2013-07-18 21:51:23.494Z [-] (UDP Port 2573 Closed)
2013-07-18 21:51:23.494Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b3ec>
2013-07-18 21:51:23.497Z [-] (UDP Port 2574 Closed)
2013-07-18 21:51:23.497Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd4608c>
2013-07-18 21:51:23.497Z [-] (UDP Port 2575 Closed)
2013-07-18 21:51:23.497Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd4634c>
2013-07-18 21:51:23.498Z [-] (UDP Port 2576 Closed)
2013-07-18 21:51:23.498Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd46eec>
2013-07-18 21:51:23.499Z [-] (UDP Port 2577 Closed)
2013-07-18 21:51:23.499Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd4688c>
2013-07-18 21:51:23.500Z [-] (UDP Port 2578 Closed)
2013-07-18 21:51:23.500Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd46a0c>
2013-07-18 21:51:23.500Z [-] (UDP Port 2579 Closed)
2013-07-18 21:51:23.500Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7ce2ebac>
2013-07-18 21:51:27.782Z [-] Starting factory <HTTPClientFactory: http://127.0.0.1:2550/>
2013-07-18 21:51:28.068Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:2550/tahoe.css>
2013-07-18 21:51:28.071Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:2550/>
2013-07-18 21:51:28.092Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:2550/?t=json>
2013-07-18 21:51:28.093Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:2550/tahoe.css>
2013-07-18 21:51:28.299Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:2550/?t=json>
2013-07-18 21:51:36.534Z [-] (TCP Port 2557 Closed)
2013-07-18 21:51:36.536Z [-] Stopping factory <nevow.appserver.NevowSite instance at 0x7e6e24ec>
2013-07-18 21:51:36.537Z [-] (TCP Port 2556 Closed)
2013-07-18 21:51:37.802Z [-] Listener starting on 2556
2013-07-18 21:51:37.830Z [-] NevowSite starting on 2645
2013-07-18 21:51:37.830Z [-] Starting factory <nevow.appserver.NevowSite instance at 0x7e33286c>
2013-07-18 21:51:37.834Z [-] My pid: 136
2013-07-18 21:51:37.852Z [-] DatagramProtocol starting on 2648
2013-07-18 21:51:37.852Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dac88ec>
2013-07-18 21:51:37.860Z [-] DatagramProtocol starting on 2649
2013-07-18 21:51:37.860Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dac828c>
2013-07-18 21:51:37.866Z [-] DatagramProtocol starting on 2650
2013-07-18 21:51:37.866Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dac874c>
2013-07-18 21:51:37.875Z [-] (UDP Port 2648 Closed)
2013-07-18 21:51:37.875Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dac88ec>
2013-07-18 21:51:37.876Z [-] (UDP Port 2649 Closed)
2013-07-18 21:51:37.876Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dac828c>
2013-07-18 21:51:37.877Z [-] (UDP Port 2650 Closed)
2013-07-18 21:51:37.877Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dac874c>
2013-07-18 21:51:38.831Z [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: replaced by a new connection
	
2013-07-18 21:51:38.948Z [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=njea) (during method=RIIntroducerPublisherAndSubscriberService_v2.tahoe.allmydata.com:subscribe_v2)
	
2013-07-18 21:51:38.951Z [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
	
2013-07-18 21:51:38.953Z [-] (TCP Port 2645 Closed)
2013-07-18 21:51:38.954Z [-] Stopping factory <nevow.appserver.NevowSite instance at 0x7e33286c>
2013-07-18 21:51:38.955Z [-] (TCP Port 2556 Closed)
2013-07-18 21:51:38.956Z [-] (TCP Port 2565 Closed)
2013-07-18 21:51:38.957Z [-] (TCP Port 2564 Closed)
2013-07-18 21:51:38.958Z [-] Stopping factory <nevow.appserver.NevowSite instance at 0x7e6bb3cc>
2013-07-18 21:51:38.958Z [-] (TCP Port 2563 Closed)
2013-07-18 21:51:38.959Z [-] (TCP Port 2562 Closed)
2013-07-18 21:51:38.960Z [-] (TCP Port 2561 Closed)
2013-07-18 21:51:38.962Z [-] (TCP Port 2554 Closed)
2013-07-18 21:51:38.964Z [-] (TCP Port 2549 Closed)
2013-07-18 21:51:38.965Z [-] (TCP Port 2550 Closed)
2013-07-18 21:51:38.966Z [-] Stopping factory <nevow.appserver.NevowSite instance at 0x7cc48b8c>
2013-07-18 21:51:39.019Z [-] Main loop terminated.
Our windows buildslave demonstrated this again: <https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/Marcus%20Cygwin%20WinXP/builds/140/> The relevant excerpt from stdout is: ``` allmydata.test.test_system.SystemTest.test_filesystem ... [<twisted.python.failure.Failure <class 'foolscap.ipb.DeadReferenceError'>>] Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/Twisted-11.1.0-py2.6-cygwin-1.7.9-i686.egg/twisted/internet/defer.py", line 134, in maybeDeferred result = f(*args, **kw) File "/home/marcus/bbtahoe/Marcus_Cygwin_WinXP/build/src/allmydata/util/pollmixin.py", line 49, in _poll self.fail("Errors snooped, terminating early") twisted.trial.unittest.FailTest: Errors snooped, terminating early [FAILURE]Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: replaced by a new connection [ERROR]Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=njea) (during method=RIIntroducerPublisherAndSubscriberService_v2.tahoe.allmydata.com:subscribe_v2) [ERROR]Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version) [ERROR] (17.628 secs) ``` And the relevant excerpt from trial log is: ``` 2013-07-18 21:51:21.391Z [-] --> allmydata.test.test_system.SystemTest.test_filesystem <-- 2013-07-18 21:51:21.470Z [-] Listener starting on 2549 2013-07-18 21:51:21.480Z [-] NevowSite starting on 2550 2013-07-18 21:51:21.480Z [-] Starting factory <nevow.appserver.NevowSite instance at 0x7cc48b8c> 2013-07-18 21:51:21.484Z [-] My pid: 136 2013-07-18 21:51:21.495Z [-] DatagramProtocol starting on 2551 2013-07-18 21:51:21.495Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cc4994c> 2013-07-18 21:51:21.503Z [-] DatagramProtocol starting on 2552 2013-07-18 21:51:21.504Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cc49d8c> 2013-07-18 21:51:21.509Z [-] DatagramProtocol starting on 2553 2013-07-18 21:51:21.510Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cc49ccc> 2013-07-18 21:51:21.519Z [-] (UDP Port 2551 Closed) 2013-07-18 21:51:21.519Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cc4994c> 2013-07-18 21:51:21.520Z [-] (UDP Port 2552 Closed) 2013-07-18 21:51:21.520Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cc49d8c> 2013-07-18 21:51:21.521Z [-] (UDP Port 2553 Closed) 2013-07-18 21:51:21.521Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cc49ccc> 2013-07-18 21:51:21.644Z [-] Listener starting on 2554 2013-07-18 21:51:21.658Z [-] DatagramProtocol starting on 2555 2013-07-18 21:51:21.658Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7d580a2c> 2013-07-18 21:51:21.918Z [-] Listener starting on 2556 2013-07-18 21:51:21.927Z [-] NevowSite starting on 2557 2013-07-18 21:51:21.927Z [-] Starting factory <nevow.appserver.NevowSite instance at 0x7e6e24ec> 2013-07-18 21:51:21.931Z [-] My pid: 136 2013-07-18 21:51:21.932Z [-] (UDP Port 2555 Closed) 2013-07-18 21:51:21.932Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7d580a2c> 2013-07-18 21:51:21.941Z [-] DatagramProtocol starting on 2558 2013-07-18 21:51:21.942Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cc4942c> 2013-07-18 21:51:21.948Z [-] DatagramProtocol starting on 2559 2013-07-18 21:51:21.949Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7e6e202c> 2013-07-18 21:51:21.953Z [-] DatagramProtocol starting on 2560 2013-07-18 21:51:21.954Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7e6e2cac> 2013-07-18 21:51:21.961Z [-] (UDP Port 2558 Closed) 2013-07-18 21:51:21.961Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cc4942c> 2013-07-18 21:51:21.962Z [-] (UDP Port 2559 Closed) 2013-07-18 21:51:21.962Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7e6e202c> 2013-07-18 21:51:21.963Z [-] (UDP Port 2560 Closed) 2013-07-18 21:51:21.963Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7e6e2cac> 2013-07-18 21:51:22.135Z [-] Listener starting on 2561 2013-07-18 21:51:22.144Z [-] My pid: 136 2013-07-18 21:51:22.309Z [-] Listener starting on 2562 2013-07-18 21:51:22.316Z [-] My pid: 136 2013-07-18 21:51:23.019Z [-] Listener starting on 2563 2013-07-18 21:51:23.029Z [-] NevowSite starting on 2564 2013-07-18 21:51:23.029Z [-] Starting factory <nevow.appserver.NevowSite instance at 0x7e6bb3cc> 2013-07-18 21:51:23.034Z [-] My pid: 136 2013-07-18 21:51:23.181Z [-] Listener starting on 2565 2013-07-18 21:51:23.190Z [-] My pid: 136 2013-07-18 21:51:23.201Z [-] DatagramProtocol starting on 2568 2013-07-18 21:51:23.201Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7e6e2c2c> 2013-07-18 21:51:23.210Z [-] DatagramProtocol starting on 2569 2013-07-18 21:51:23.210Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b4cc> 2013-07-18 21:51:23.215Z [-] DatagramProtocol starting on 2570 2013-07-18 21:51:23.216Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b50c> 2013-07-18 21:51:23.232Z [-] DatagramProtocol starting on 2571 2013-07-18 21:51:23.232Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b14c> 2013-07-18 21:51:23.241Z [-] DatagramProtocol starting on 2572 2013-07-18 21:51:23.242Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b66c> 2013-07-18 21:51:23.248Z [-] DatagramProtocol starting on 2573 2013-07-18 21:51:23.248Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b3ec> 2013-07-18 21:51:23.269Z [-] DatagramProtocol starting on 2574 2013-07-18 21:51:23.270Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd4608c> 2013-07-18 21:51:23.277Z [-] DatagramProtocol starting on 2575 2013-07-18 21:51:23.277Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd4634c> 2013-07-18 21:51:23.283Z [-] DatagramProtocol starting on 2576 2013-07-18 21:51:23.284Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd46eec> 2013-07-18 21:51:23.297Z [-] DatagramProtocol starting on 2577 2013-07-18 21:51:23.297Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd4688c> 2013-07-18 21:51:23.304Z [-] DatagramProtocol starting on 2578 2013-07-18 21:51:23.305Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd46a0c> 2013-07-18 21:51:23.310Z [-] DatagramProtocol starting on 2579 2013-07-18 21:51:23.311Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7ce2ebac> 2013-07-18 21:51:23.329Z [-] (UDP Port 2568 Closed) 2013-07-18 21:51:23.329Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7e6e2c2c> 2013-07-18 21:51:23.330Z [-] (UDP Port 2569 Closed) 2013-07-18 21:51:23.331Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b4cc> 2013-07-18 21:51:23.331Z [-] (UDP Port 2570 Closed) 2013-07-18 21:51:23.332Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b50c> 2013-07-18 21:51:23.492Z [-] (UDP Port 2571 Closed) 2013-07-18 21:51:23.492Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b14c> 2013-07-18 21:51:23.493Z [-] (UDP Port 2572 Closed) 2013-07-18 21:51:23.493Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b66c> 2013-07-18 21:51:23.494Z [-] (UDP Port 2573 Closed) 2013-07-18 21:51:23.494Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7cb5b3ec> 2013-07-18 21:51:23.497Z [-] (UDP Port 2574 Closed) 2013-07-18 21:51:23.497Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd4608c> 2013-07-18 21:51:23.497Z [-] (UDP Port 2575 Closed) 2013-07-18 21:51:23.497Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd4634c> 2013-07-18 21:51:23.498Z [-] (UDP Port 2576 Closed) 2013-07-18 21:51:23.498Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd46eec> 2013-07-18 21:51:23.499Z [-] (UDP Port 2577 Closed) 2013-07-18 21:51:23.499Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd4688c> 2013-07-18 21:51:23.500Z [-] (UDP Port 2578 Closed) 2013-07-18 21:51:23.500Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dd46a0c> 2013-07-18 21:51:23.500Z [-] (UDP Port 2579 Closed) 2013-07-18 21:51:23.500Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7ce2ebac> 2013-07-18 21:51:27.782Z [-] Starting factory <HTTPClientFactory: http://127.0.0.1:2550/> 2013-07-18 21:51:28.068Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:2550/tahoe.css> 2013-07-18 21:51:28.071Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:2550/> 2013-07-18 21:51:28.092Z [HTTPPageGetter,client] Starting factory <HTTPClientFactory: http://127.0.0.1:2550/?t=json> 2013-07-18 21:51:28.093Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:2550/tahoe.css> 2013-07-18 21:51:28.299Z [HTTPPageGetter,client] Stopping factory <HTTPClientFactory: http://127.0.0.1:2550/?t=json> 2013-07-18 21:51:36.534Z [-] (TCP Port 2557 Closed) 2013-07-18 21:51:36.536Z [-] Stopping factory <nevow.appserver.NevowSite instance at 0x7e6e24ec> 2013-07-18 21:51:36.537Z [-] (TCP Port 2556 Closed) 2013-07-18 21:51:37.802Z [-] Listener starting on 2556 2013-07-18 21:51:37.830Z [-] NevowSite starting on 2645 2013-07-18 21:51:37.830Z [-] Starting factory <nevow.appserver.NevowSite instance at 0x7e33286c> 2013-07-18 21:51:37.834Z [-] My pid: 136 2013-07-18 21:51:37.852Z [-] DatagramProtocol starting on 2648 2013-07-18 21:51:37.852Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dac88ec> 2013-07-18 21:51:37.860Z [-] DatagramProtocol starting on 2649 2013-07-18 21:51:37.860Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dac828c> 2013-07-18 21:51:37.866Z [-] DatagramProtocol starting on 2650 2013-07-18 21:51:37.866Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dac874c> 2013-07-18 21:51:37.875Z [-] (UDP Port 2648 Closed) 2013-07-18 21:51:37.875Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dac88ec> 2013-07-18 21:51:37.876Z [-] (UDP Port 2649 Closed) 2013-07-18 21:51:37.876Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dac828c> 2013-07-18 21:51:37.877Z [-] (UDP Port 2650 Closed) 2013-07-18 21:51:37.877Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x7dac874c> 2013-07-18 21:51:38.831Z [-] Unhandled Error Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: replaced by a new connection 2013-07-18 21:51:38.948Z [-] Unhandled Error Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=njea) (during method=RIIntroducerPublisherAndSubscriberService_v2.tahoe.allmydata.com:subscribe_v2) 2013-07-18 21:51:38.951Z [-] Unhandled Error Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=ncie) (during method=RIStorageServer.tahoe.allmydata.com:get_version) 2013-07-18 21:51:38.953Z [-] (TCP Port 2645 Closed) 2013-07-18 21:51:38.954Z [-] Stopping factory <nevow.appserver.NevowSite instance at 0x7e33286c> 2013-07-18 21:51:38.955Z [-] (TCP Port 2556 Closed) 2013-07-18 21:51:38.956Z [-] (TCP Port 2565 Closed) 2013-07-18 21:51:38.957Z [-] (TCP Port 2564 Closed) 2013-07-18 21:51:38.958Z [-] Stopping factory <nevow.appserver.NevowSite instance at 0x7e6bb3cc> 2013-07-18 21:51:38.958Z [-] (TCP Port 2563 Closed) 2013-07-18 21:51:38.959Z [-] (TCP Port 2562 Closed) 2013-07-18 21:51:38.960Z [-] (TCP Port 2561 Closed) 2013-07-18 21:51:38.962Z [-] (TCP Port 2554 Closed) 2013-07-18 21:51:38.964Z [-] (TCP Port 2549 Closed) 2013-07-18 21:51:38.965Z [-] (TCP Port 2550 Closed) 2013-07-18 21:51:38.966Z [-] Stopping factory <nevow.appserver.NevowSite instance at 0x7cc48b8c> 2013-07-18 21:51:39.019Z [-] Main loop terminated. ```

Replying to daira:

In tests there should be no unexpected disconnections anyway, right? So is the cause of the disconnection just Windows networking being flaky, or do similar errors happen on other platforms?

I don't think it happens on any other platform.

The reason that I'm looking at this right now is that Nathan has submitted a patch #1659, but that patch needs to be tested on Windows, and the buildbot is currently not running tests on Windows, possibly due to this issue.

Replying to [daira](/tahoe-lafs/trac-2024-07-25/issues/1768#issuecomment-89022): > In tests there should be no unexpected disconnections anyway, right? So is the cause of the disconnection just Windows networking being flaky, or do similar errors happen on other platforms? I don't think it happens on any other platform. The reason that I'm looking at this right now is that Nathan has submitted a patch #1659, but that patch needs to be tested on Windows, and the buildbot is currently not running tests on Windows, possibly due to this issue.
daira commented 2015-11-02 15:32:08 +00:00
Owner
This may be the same error: <https://gist.github.com/merickson/27bbaf793c2eff7bae49>
daira commented 2016-07-12 14:45:20 +00:00
Owner

This is a similar failure, on Ubuntu 12.04 Precise, not Windows: https://travis-ci.org/tahoe-lafs/tahoe-lafs/jobs/142880571

This is a similar failure, on Ubuntu 12.04 Precise, **not** Windows: <https://travis-ci.org/tahoe-lafs/tahoe-lafs/jobs/142880571>
Sign in to join this conversation.
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#1768
No description provided.