exceptions.OverflowError: join() result is too long for a Python string #807

Closed
opened 2009-09-17 06:00:52 +00:00 by zooko · 5 comments

Here is the complete contents from one "Log opened." to the next of tahoebs5.allmydata.com:/home/amduser/public/bs5c1/logs/twistd.log:

2009/07/15 06:30 -0700 [-] Log opened.
2009/07/15 06:30 -0700 [-] twistd 2.5.0 (/usr/bin/python 2.5.2) starting up
2009/07/15 06:30 -0700 [-] reactor class: <class 'twisted.internet.selectreactor.SelectReactor'>
2009/07/15 06:30 -0700 [-] Loading tahoe-client.tac...
2009-07-15 13:30:44.135Z [-] Loaded.
2009-07-15 13:30:44.139Z [-] foolscap.pb.Listener starting on 45963
2009-07-15 13:30:44.141Z [-] nevow.appserver.NevowSite starting on 9051
2009-07-15 13:30:44.141Z [-] Starting factory <nevow.appserver.NevowSite instance at 0x8b7e74c>
2009-07-15 13:30:44.145Z [-] twisted.internet.protocol.DatagramProtocol starting on 50791
2009-07-15 13:30:44.145Z [-] Starting protocol <twisted.internet.protocol.DatagramProtocol instance at 0x8b7ec2c>
2009-07-15 13:30:44.161Z [-] (Port 50791 Closed)
2009-07-15 13:30:44.162Z [-] Stopping protocol <twisted.internet.protocol.DatagramProtocol instance at 0x8b7ec2c>
2009-07-27 05:00:32.314Z [Negotiation,158,98.202.225.214] got banana ERROR from remote side: internal server error, see logs
2009-08-07 00:49:15.226Z [-] Unhandled error in Deferred:
2009-08-07 00:49:15.259Z [-] Unhandled Error
        Traceback (most recent call last):
        Failure: foolscap.ipb.DeadReferenceError: Connection was lost

2009-08-31 01:56:53.611Z [-] Unhandled error in Deferred:
2009-08-31 01:56:53.635Z [-] Unhandled Error
        Traceback (most recent call last):
        Failure: foolscap.ipb.DeadReferenceError: Connection was lost

2009-08-31 01:58:11.104Z [-] Unhandled error in Deferred:
2009-08-31 01:58:11.104Z [-] Unhandled Error
        Traceback (most recent call last):
        Failure: foolscap.ipb.DeadReferenceError: Connection was lost

2009-08-31 12:32:18.472Z [-] Unhandled error in Deferred:
2009-08-31 12:32:18.496Z [-] Unhandled Error
        Traceback (most recent call last):
        Failure: foolscap.ipb.DeadReferenceError: Connection was lost

2009-09-11 03:10:34.056Z [Negotiation,1031,97.118.104.193] Unhandled Error
        Traceback (most recent call last):
          File "/usr/lib/python2.5/site-packages/twisted/python/log.py", line 48, in callWithLogger
            return callWithContext({"system": lp}, func, *args, **kw)
          File "/usr/lib/python2.5/site-packages/twisted/python/log.py", line 33, in callWithContext
            return context.call({ILogContext: newCtx}, func, *args, **kw)
          File "/usr/lib/python2.5/site-packages/twisted/python/context.py", line 59, in callWithContext
            return self.currentContext().callWithContext(ctx, func, *args, **kw)
          File "/usr/lib/python2.5/site-packages/twisted/python/context.py", line 37, in callWithContext
            return func(*args,**kw)
        --- <exception caught here> ---
          File "/usr/lib/python2.5/site-packages/twisted/internet/selectreactor.py", line 139, in _doReadOrWrite
            why = getattr(selectable, method)()
          File "/usr/lib/python2.5/site-packages/twisted/internet/tcp.py", line 154, in doWrite
            return Connection.doWrite(self)
          File "/usr/lib/python2.5/site-packages/twisted/internet/abstract.py", line 104, in doWrite
            self.dataBuffer = buffer(self.dataBuffer, self.offset) + "".join(self._tempDataBuffer)
        exceptions.OverflowError: join() result is too long for a Python string

Here is the startup version announcement from an incident report file which was recorded on 2009-06-02:

Application versions (embedded in logfile):
          Nevow: 0.9.26
        Twisted: 2.5.0
allmydata-tahoe: 1.3.0-r3747
       argparse: 0.8.0
       foolscap: 0.3.2
       platform: Linux-Ubuntu_8.04-i686-32bit
      pyOpenSSL: 0.6
     pycryptopp: 0.5.2-1
         python: 2.5.2
         pyutil: 1.3.16-12
     setuptools: 0.6c8
     simplejson: 1.7.3
        twisted: 2.5.0
           zfec: 1.4.0-4
 zope.interface: 3.3.1
PID: 4189
zooko added the code, major, defect, 1.5.0 labels 2009-09-17 06:00:52 +00:00
zooko added this to the undecided milestone 2009-09-17 06:00:52 +00:00
zooko added 1.3.0 and removed 1.5.0 labels 2009-10-27 03:39:47 +00:00
Author

(http://svn.python.org/view/python/tags/r252/Objects/stringobject.c?revision=60915&view=markup)

Search in text for "join() result is too long for a Python string". It is guarded by ```if (sz < old_sz |

I don't see how to investigate this, reproduce it, or determine whether it has been fixed in newer versions of Tahoe-LAFS. One reason is that the exception raised in the selectreactor's `_doReadOrWrite()` apparently didn't get propagated to foolscap, because no accompanying incident report file was generated.

Brian: am I interpreting that correctly? Is there a way to make sure that all unhandled exceptions get registered with the foolscap logging system so that they can be reported as incidents? Do you have any other ideas how to learn more about this issue, or should we just close it as "wontfix"?

Author

Brian: I'm still concerned about the meta-issue here. Can we somehow ensure that all exceptions get logged as foolscap logging incidents or at least as twistd.log lines? I really don't like the feeling that exceptions silently disappear sometimes.
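
To illustrate the kind of guarantee being asked for here, the following is a minimal, generic sketch using only the Python standard library. It is not Twisted or foolscap API: the helper name `call_and_log` and the logger name `"unhandled"` are hypothetical, chosen only for illustration. The idea is that every call made through the wrapper either succeeds or leaves a logged traceback behind before the exception continues to propagate.

```python
import logging


def call_and_log(func, *args, **kwargs):
    """Call func and return its result.

    If func raises, log the exception with its traceback and re-raise,
    so the error is recorded even if a caller later swallows it.
    (Hypothetical helper, not part of Twisted or foolscap.)
    """
    try:
        return func(*args, **kwargs)
    except Exception:
        # Logger.exception() logs at ERROR level and attaches the traceback.
        logging.getLogger("unhandled").exception(
            "unhandled exception in %r", func
        )
        raise
```

Whether something equivalent can be hooked into the reactor's read/write dispatch is exactly the open question in this comment; the sketch only shows the log-then-re-raise pattern.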

Author

Okay, that meta-issue about disappearing exceptions is now #1021.

Author

I wonder if we should just close this ticket as "irreproducible and possibly fixed by some other change in the interim". This may have been related to a bug that we had a long time ago in the combination of Tahoe-LAFS+Twisted which caused very large strings to be produced during exceptions (#379). This issue may have been caused by that, and there may be nothing more we need to do about it.

I'm going to add Cc: Brian as I close this so he has one last chance to look at it and think about whether we should do anything else about it. :-)

zooko added the cannot reproduce label 2011-12-20 18:47:26 +00:00
zooko closed this issue 2011-12-20 18:47:26 +00:00

No idea. Smells like an out-of-memory problem, or memory corruption, or something. It's unlikely that this server tried to buffer 2GiB of output data to a client, unless maybe someone had a multi-gigabyte file with k=1 and a client which overparallelized segment downloads (which I suppose we did have briefly, but which is fixed now).

So yeah, CANNOTREPRODUCE sounds fine.
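
The 2GiB figure can be checked with a little arithmetic. The sketch below assumes that on this 32-bit platform the largest possible string length is 2**31 - 1 bytes (the maximum of a signed 32-bit size); the function name and the 1 MiB chunk size are hypothetical and only for illustration. It is not the check in `stringobject.c`, just a way to see how much pending write data would have to accumulate before a join could no longer fit.

```python
# Assumption: a 32-bit build limits string length to a signed 32-bit size.
MAX_STR_LEN_32BIT = 2**31 - 1


def joined_length_fits(chunk_lengths, limit=MAX_STR_LEN_32BIT):
    """Return True if joining chunks of these lengths stays within limit."""
    return sum(chunk_lengths) <= limit


MIB = 2**20
# 2047 pending chunks of 1 MiB each still fit under the limit...
fits = joined_length_fits([MIB] * 2047)
# ...but one more chunk brings the total to exactly 2 GiB, which does not.
too_long = not joined_length_fits([MIB] * 2048)
```

So the write buffer would have needed roughly two thousand queued 1 MiB chunks (or the equivalent) for a legitimate overflow, which is why an out-of-memory or corruption explanation seems at least as plausible.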

Reference: tahoe-lafs/trac-2024-07-25#807