nondeterministic failure of allmydata.test.test_system.SystemTest.test_upload_and_download_{random_key,convergent} #1084

Open
opened 2010-06-16 01:16:35 +00:00 by davidsarah · 9 comments
davidsarah commented 2010-06-16 01:16:35 +00:00
Owner

(http://tahoe-lafs.org/buildbot/builders/FreeStorm%20CentOS5-i386/builds/17/steps/test/logs/stdio)

[ERROR]: allmydata.test.test_system.SystemTest.test_upload_and_download_random_key

Traceback (most recent call last):
  File "/home/buildbot/tahoe-lafs/FreeStorm CentOS5-i386/build/src/allmydata/test/test_system.py", line 340, in _uploaded
    "resumption saved us some work even though we were using random keys:"
exceptions.TypeError: int argument required

Test log here.

The code around that line is here.

This did not happen on a subsequent build with only this change.

(http://tahoe-lafs.org/buildbot/builders/FreeStorm%20CentOS5-i386/builds/17/steps/test/logs/stdio) ``` [ERROR]: allmydata.test.test_system.SystemTest.test_upload_and_download_random_key Traceback (most recent call last): File "/home/buildbot/tahoe-lafs/FreeStorm CentOS5-i386/build/src/allmydata/test/test_system.py", line 340, in _uploaded "resumption saved us some work even though we were using random keys:" exceptions.TypeError: int argument required ``` [Test log here.](http://tahoe-lafs.org/buildbot/builders/FreeStorm%20CentOS5-i386/builds/17/steps/test/logs/test.log) The code around that line is [here](http://allmydata.org/trac/tahoe-lafs/browser/trunk/src/allmydata/test/test_system.py?rev=80252debcd94fc28#L336). This did not happen on a [subsequent build](http://tahoe-lafs.org/buildbot/builders/FreeStorm%20CentOS5-i386/builds/18/steps/test/logs/stdio) with only [this change](http://allmydata.org/trac/tahoe-lafs/changeset/4479/trunk).
tahoe-lafs added the
code
major
defect
1.7β
labels 2010-06-16 01:16:35 +00:00
tahoe-lafs added this to the undecided milestone 2010-06-16 01:16:35 +00:00
davidsarah commented 2010-06-16 01:25:01 +00:00
Author
Owner
self.failIf(bytes_sent < len(DATA),
            "resumption saved us some work even though we were using random keys:"
            " read %d bytes out of %d total" %
            (bytes_sent, len(DATA)))

Because there was an exception in the %d formatting of the message argument, we do not know what the value of bytes_sent was. It doesn't seem to have been None because that would have produced a different error message:

Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02)
[MSC v.1500 32 bit (Intel)] on win32
...
>>> None < 42
True
>>> "%d" % (None,)
...
TypeError: %d format: a number is required, not NoneType

(but maybe there is a difference between Python 2.4.3 and 2.6.2 here.)

``` self.failIf(bytes_sent < len(DATA), "resumption saved us some work even though we were using random keys:" " read %d bytes out of %d total" % (bytes_sent, len(DATA))) ``` Because there was an exception in the %d formatting of the message argument, we do not know what the value of `bytes_sent` was. It doesn't seem to have been `None` because that would have produced a different error message: ``` Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit (Intel)] on win32 ... >>> None < 42 True >>> "%d" % (None,) ... TypeError: %d format: a number is required, not NoneType ``` (but maybe there is a difference between Python 2.4.3 and 2.6.2 here.)
davidsarah commented 2010-06-16 01:53:05 +00:00
Author
Owner

Replying to davidsarah:

Because there was an exception in the %d formatting of the message argument, we do not know what the value of bytes_sent was. It doesn't seem to have been None because that would have produced a different error message: ...
(but maybe there is a difference between Python 2.4.3 and 2.6.2 here.)

There was; discount this argument. bytes_sent might have been None.

Replying to [davidsarah](/tahoe-lafs/trac-2024-07-25/issues/1084#issuecomment-78014): > Because there was an exception in the %d formatting of the message argument, we do not know what the value of `bytes_sent` was. It doesn't seem to have been `None` because that would have produced a different error message: ... > (but maybe there is a difference between Python 2.4.3 and 2.6.2 here.) There was; discount this argument. `bytes_sent` might have been `None`.
davidsarah commented 2011-07-31 21:01:45 +00:00
Author
Owner

#1273 was probably a duplicate. That failure occurred on Windows Vista, so the problem is not specific to CentOS or the CentOS builder. The error message is not exactly the same (I think an assertion was added), but seems to be due to the same type error.

#1273 was probably a duplicate. That failure occurred on Windows Vista, so the problem is not specific to CentOS or the CentOS builder. The error message is not exactly the same (I think an assertion was added), but seems to be due to the same type error.
tahoe-lafs changed title from nondeterministic failure of allmydata.test.test_system.SystemTest.test_upload_and_download_random_key on CentOS builder to nondeterministic failure of allmydata.test.test_system.SystemTest.test_upload_and_download_random_key 2011-07-31 21:01:45 +00:00
davidsarah commented 2011-08-02 01:29:45 +00:00
Author
Owner

In http://tahoe-lafs.org/buildbot/builders/Arthur%20lenny%20c7%2032bit/builds/745/steps/test/logs/stdio , this problem happens for both test_upload_and_download_random_key and test_upload_and_download_convergent:

[FAIL]
Traceback (most recent call last):
  File "/home/arthur/buildbot/Arthur lenny c7 32bit/build/src/allmydata/test/test_system.py", line 329, in _uploaded
    self.failUnless(isinstance(bytes_sent, (int, long)), bytes_sent)
twisted.trial.unittest.FailTest: None

allmydata.test.test_system.SystemTest.test_upload_and_download_convergent
allmydata.test.test_system.SystemTest.test_upload_and_download_random_key
In <http://tahoe-lafs.org/buildbot/builders/Arthur%20lenny%20c7%2032bit/builds/745/steps/test/logs/stdio> , this problem happens for both `test_upload_and_download_random_key` and `test_upload_and_download_convergent`: ``` [FAIL] Traceback (most recent call last): File "/home/arthur/buildbot/Arthur lenny c7 32bit/build/src/allmydata/test/test_system.py", line 329, in _uploaded self.failUnless(isinstance(bytes_sent, (int, long)), bytes_sent) twisted.trial.unittest.FailTest: None allmydata.test.test_system.SystemTest.test_upload_and_download_convergent allmydata.test.test_system.SystemTest.test_upload_and_download_random_key ```
tahoe-lafs changed title from nondeterministic failure of allmydata.test.test_system.SystemTest.test_upload_and_download_random_key to nondeterministic failure of allmydata.test.test_system.SystemTest.test_upload_and_download_{random_key,convergent} 2011-08-02 01:29:45 +00:00
davidsarah commented 2011-08-02 01:31:10 +00:00
Author
Owner

Attachment Arthur-lenny-c7-32bit-build-745-test.txt (82202 bytes) added

Copy of http://tahoe-lafs.org/buildbot/builders/Arthur%20lenny%20c7%2032bit/builds/745/steps/test/logs/stdio/text

**Attachment** Arthur-lenny-c7-32bit-build-745-test.txt (82202 bytes) added Copy of <http://tahoe-lafs.org/buildbot/builders/Arthur%20lenny%20c7%2032bit/builds/745/steps/test/logs/stdio/text>
davidsarah commented 2011-08-02 01:33:03 +00:00
Author
Owner

Attachment Arthur-lenny-c7-32bit-build-745-test.log.txt (455870 bytes) added

Copy of http://tahoe-lafs.org/buildbot/builders/Arthur%20lenny%20c7%2032bit/builds/745/steps/test/logs/test.log/text

**Attachment** Arthur-lenny-c7-32bit-build-745-test.log.txt (455870 bytes) added Copy of <http://tahoe-lafs.org/buildbot/builders/Arthur%20lenny%20c7%2032bit/builds/745/steps/test/logs/test.log/text>

A problem with similar characteristics happened just now on Ruben's Fedora buildslave:

failed:
http://tahoe-lafs.org/buildbot/builders/Ruben%20Fedora/builds/864

Ended with:

allmydata.test.test_system.SystemTest.test_upload_and_download_random_key ... Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 133, in maybeDeferred
    result = f(*args, **kw)
  File "/home/buildbot/tahoe/Ruben Fedora/build/src/allmydata/util/pollmixin.py", line 34, in _poll
    raise TimeoutError("PollMixin never saw %s return True" % check_f)
allmydata.util.pollmixin.TimeoutError: PollMixin never saw <bound method SystemTest._check_connections of <allmydata.test.test_system.SystemTest testMethod=test_upload_and_download_random_key>> return True
[ERROR]Traceback (most recent call last):
Failure: twisted.internet.defer.TimeoutError: <allmydata.test.test_system.SystemTest testMethod=test_upload_and_download_random_key> (tearDown) still running at 3600.0 secs
[ERROR]Traceback (most recent call last):
Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean.
DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to debug)
<DelayedCall 0x534e5f0 [39.4281477928s] called=0 cancelled=0 LoopingCall<60>(CPUUsageMonitor.check, *(), **{})()>
<DelayedCall 0x48e40e0 [39759.6027431s] called=0 cancelled=0 LeaseCheckingCrawler.start_slice()>
<DelayedCall 0x4bcb710 [99.8050701618s] called=0 cancelled=0 BucketCountingCrawler.start_slice()>
[ERROR]Traceback (most recent call last):
Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean.
Selectables:
<<class 'twisted.internet.tcp.Port'> of foolscap.pb.Listener on 42824>
[ERROR]
command timed out: 7200 seconds without output, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=11884.623658

rebuilt exact same version and passed:
http://tahoe-lafs.org/buildbot/builders/Ruben%20Fedora/builds/865

A problem with similar characteristics happened just now on Ruben's Fedora buildslave: failed: <http://tahoe-lafs.org/buildbot/builders/Ruben%20Fedora/builds/864> Ended with: ``` allmydata.test.test_system.SystemTest.test_upload_and_download_random_key ... Traceback (most recent call last): File "/usr/lib64/python2.7/site-packages/twisted/internet/defer.py", line 133, in maybeDeferred result = f(*args, **kw) File "/home/buildbot/tahoe/Ruben Fedora/build/src/allmydata/util/pollmixin.py", line 34, in _poll raise TimeoutError("PollMixin never saw %s return True" % check_f) allmydata.util.pollmixin.TimeoutError: PollMixin never saw <bound method SystemTest._check_connections of <allmydata.test.test_system.SystemTest testMethod=test_upload_and_download_random_key>> return True [ERROR]Traceback (most recent call last): Failure: twisted.internet.defer.TimeoutError: <allmydata.test.test_system.SystemTest testMethod=test_upload_and_download_random_key> (tearDown) still running at 3600.0 secs [ERROR]Traceback (most recent call last): Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean. DelayedCalls: (set twisted.internet.base.DelayedCall.debug = True to debug) <DelayedCall 0x534e5f0 [39.4281477928s] called=0 cancelled=0 LoopingCall<60>(CPUUsageMonitor.check, *(), **{})()> <DelayedCall 0x48e40e0 [39759.6027431s] called=0 cancelled=0 LeaseCheckingCrawler.start_slice()> <DelayedCall 0x4bcb710 [99.8050701618s] called=0 cancelled=0 BucketCountingCrawler.start_slice()> [ERROR]Traceback (most recent call last): Failure: twisted.trial.util.DirtyReactorAggregateError: Reactor was unclean. Selectables: <<class 'twisted.internet.tcp.Port'> of foolscap.pb.Listener on 42824> [ERROR] command timed out: 7200 seconds without output, attempting to kill process killed by signal 9 program finished with exit code -1 elapsedTime=11884.623658 ``` rebuilt exact same version and passed: <http://tahoe-lafs.org/buildbot/builders/Ruben%20Fedora/builds/865>

I remember having a brainstorm that the bytes_sent would be None in the case that the helper had not been connected to the node before the node started its upload, so I hypothesized that there is a race condition in setting up the tests, between the node connecting to the helper and the node starting its upload.

Not sure if that applies to this new issue from comment:78020. Also, I thought I wrote some notes about that last time, but they are not on this ticket. Is there a different (redundant) ticket somewhere? Did I post my notes elsewhere than trac? I will investigate...

I remember having a brainstorm that the `bytes_sent` would be None in the case that the helper had not been connected to the node before the node started its upload, so I hypothesized that there is a race condition in setting up the tests, between the node connecting to the helper and the node starting its upload. Not sure if that applies to this new issue from [comment:78020](/tahoe-lafs/trac-2024-07-25/issues/1084#issuecomment-78020). Also, I thought I wrote some notes about that last time, but they are not on this ticket. Is there a different (redundant) ticket somewhere? Did I post my notes elsewhere than trac? I will investigate...
davidsarah commented 2011-09-09 18:21:15 +00:00
Author
Owner

comment:78020 looks like #846 to me. (It's in a different test method, but the error is almost identical to the one in http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/846/tahoe-846-test-fail.txt.)

I don't see any strong evidence that comment:78020 and #846 are the same bug as #1084 and #1273.

[comment:78020](/tahoe-lafs/trac-2024-07-25/issues/1084#issuecomment-78020) looks like #846 to me. (It's in a different test method, but the error is almost identical to the one in <http://tahoe-lafs.org/trac/tahoe-lafs/attachment/ticket/846/tahoe-846-test-fail.txt>.) I don't see any strong evidence that [comment:78020](/tahoe-lafs/trac-2024-07-25/issues/1084#issuecomment-78020) and #846 are the same bug as #1084 and #1273.
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#1084
No description provided.