bug in Twisted, triggered by pyOpenSSL-0.7 #402

Closed
opened 2008-05-01 18:17:26 +00:00 by warner · 20 comments

The symptom is that tahoe's test_system fails with "unclean reactor errors",
complaining about several foolscap negotiation timers that are still running
when the test finishes.

We tracked this down to a bug in twisted, inside some twisted code that is
only enabled in the presence of pyopenssl-0.7 (which just landed in sid a few
days ago). The previous pyopenssl-0.6 does not trigger the bug. This bug
causes foolscap's unit tests to fail in the same way.

The twisted folks (exarkun in particular) are now aware of the problem and
are able to reproduce it: http://twistedmatrix.com/trac/ticket/3218

The current workaround is to downgrade to pyopenssl-0.6 .

warner added the
operational
major
defect
1.0.0
labels 2008-05-01 18:17:26 +00:00
warner added this to the undecided milestone 2008-05-01 18:17:26 +00:00
warner self-assigned this 2008-05-01 18:17:26 +00:00

I guess the current workaround should be to specify in our source:_auto_deps.py:

```
# v0.7 of pyOpenSSL triggers a bug in Twisted <= 8.1.0, which is the latest version of Twisted at this time: http://allmydata.org/trac/tahoe/ticket/402
setup_requires.append("pyOpenSSL >= 0.6, != 0.7")
```

I'll try that out on the buildbot tomorrow morning when I'm more wakeful and willing to spend time wrangling buildslaves...
zooko changed title from tahoe unit tests fail with latest debian sid to bug in Twisted, triggered by pyOpenSSL-0.7 2008-05-30 04:10:38 +00:00
zooko modified the milestone from undecided to 1.1.0 2008-05-30 21:32:41 +00:00
warner was unassigned by zooko 2008-05-30 21:32:47 +00:00
zooko self-assigned this 2008-05-30 21:32:47 +00:00

Ah, well I tried making Tahoe require pyOpenSSL >= 0.6, != 0.7, but the pyOpenSSL-0.6 tarball is not easy_install'able, as described here:

https://bugs.launchpad.net/pyopenssl/+bug/236190

If the pyOpenSSL maintainers fix the 0.6 tarball's permission bits as I submitted in that ticket, then this will start working.

If this gets fixed so that pyOpenSSL can be easy_install'ed (provided that OpenSSL is installed), then this will reduce the need for #282 (more detailed and targeted docs about installing from source).

I've requested that JP give me admin privs for the pyOpenSSL sf.net project so that I can upload a pyOpenSSL-0.6.tar.gz which works around this problem.

Now, as far as we know the combination of Tahoe-1.1+Twisted-8.1+pyOpenSSL-0.7 doesn't lead to any bad behavior except for a vast number of unit tests failing due to failure to close connections. One workaround, if we can't get an easy_install'able pyOpenSSL-0.6.tar.gz, would be to code up an explicit "skip-test-like" behavior in our Makefile or perhaps in our test code to skip all these numerous failing tests if pyOpenSSL v0.7 is detected. I'm not sure exactly how that would be implemented. It also feels a little bit uncomfortable to deploy Tahoe-1.1+Twisted-8.1+pyOpenSSL-0.7 because I'm not entirely sure that the bug wouldn't lead to other problems for Tahoe users. (Doubtless pyOpenSSL-0.6 is more buggy than pyOpenSSL-0.7, but at least we have experience with it and there are no known anomalies which could be explained by bugs in pyOpenSSL-0.6.)
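
One way the "skip-test-like" behavior could be sketched, assuming the standard library's `unittest` skip decorator rather than anything Tahoe actually ships. The helper names `pyopenssl_version` and `triggers_ticket_402` and the `SystemTest` class are hypothetical, and the check only matches the exact version string "0.7":

```python
import unittest


def pyopenssl_version():
    """Return the installed pyOpenSSL version string, or None if unavailable."""
    try:
        import OpenSSL
    except Exception:
        # A missing or broken install is treated the same way: no version known.
        return None
    return getattr(OpenSSL, "__version__", None)


def triggers_ticket_402(version):
    """True only for the pyOpenSSL release this ticket is about (assumption: exactly "0.7")."""
    return version is not None and version.strip() == "0.7"


SKIP_REASON = "pyOpenSSL 0.7 leaves connections open under Twisted <= 8.1.0 (ticket #402)"


# Hypothetical stand-in for the real system test class.
@unittest.skipIf(triggers_ticket_402(pyopenssl_version()), SKIP_REASON)
class SystemTest(unittest.TestCase):
    def test_placeholder(self):
        self.assertTrue(True)
```

Under trial, setting a `skip` attribute on the test class is the traditional way to get the same effect. Either way the tests are reported as skipped rather than silently dropped, so the gap in coverage stays visible.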

Okay this whole issue is now foolscap's problem -- Tahoe doesn't actually require pyOpenSSL at all. Tahoe requires foolscap-with-secure-connections, and (currently) foolscap-with-secure-connections requires pyOpenSSL. So I'm closing this ticket as "fixed" and further work will be done in #438 (get foolscap to declare its dependency on pyOpenSSL) and http://foolscap.lothar.com/trac/ticket/66 (install requires pyOpenSSL (for secure mode)).

zooko added the
fixed
label 2008-06-04 16:16:43 +00:00
zooko closed this issue 2008-06-04 16:16:43 +00:00

Oh wait, resolving this as "fixed" is a bit premature. Until there is a foolscap release that does this, and Tahoe specifies that it requires such a foolscap release, then this is still an open issue for Tahoe.

Also, there is a judgment call to be made as to what version of foolscap Tahoe should require.

zooko removed the
fixed
label 2008-06-04 16:17:57 +00:00
zooko reopened this issue 2008-06-04 16:17:57 +00:00

Okay, Brian is planning to release foolscap v0.2.8 which declares an "extra" dependency. If you specify that your project depends on foolscap "with the extra feature of secure connections", then foolscap will require pyOpenSSL.

He is not specifying that it requires a version of pyOpenSSL other than v0.7, which means that if Tahoe requires foolscap, and foolscap causes pyOpenSSL to be installed, and the version of pyOpenSSL that gets installed is version 0.7, then the Tahoe unit tests will all get ERRORs due to connections not being shut down properly.

So now I want to figure out how to make those ERRORs not happen when people install Tahoe and run make test.
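
For readers unfamiliar with setuptools "extras", the arrangement described above might look roughly like this. It is a sketch, not the actual foolscap or Tahoe packaging code: the variable names are hypothetical, and the extra name `secure_connections` is taken from later discussion in this ticket.

```python
# Hypothetical fragment of what foolscap's setup.py could pass as
# extras_require: asking for the "secure_connections" extra pulls in
# pyOpenSSL. No version of pyOpenSSL is excluded here, which is why
# 0.7 can still end up installed.
FOOLSCAP_EXTRAS_REQUIRE = {
    "secure_connections": ["pyOpenSSL"],
}

# Hypothetical fragment of Tahoe's dependency list: the bracket syntax
# requests foolscap together with that extra.
TAHOE_INSTALL_REQUIRES = [
    "foolscap[secure_connections] >= 0.2.8",
]
```

A project that lists plain `foolscap` without the bracketed extra would not cause pyOpenSSL to be installed.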

Author

Apparently we don't fully understand the combination of versions that trigger this problem. On my debian/sid system, I see timeout/reactor-unclean failures of tahoe's test_system (and of several foolscap unit tests). This system has:

  • python-twisted-8.1.0-1
  • python-openssl-0.7-1
  • libssl0.9.8g-10.1

However, an Ubuntu/Hardy system we just set up does not fail tests, when using what we believe to be twisted-8.1.0, pyopenssl-0.7, and libssl0.9.8g-4ubuntusomething.

If this really only causes failures on sid (but hardy is ok), we're willing to push it out a release.
We still want to get it fixed, but it will probably require the pyopenssl maintainers to
fix twisted#3218.

sid users are still advised to hold python-openssl at 0.6-5 .

Are you sure it doesn't fail in that configuration? The problem includes a race condition dependent on timing of network operations and Python calls. It may just be that the race is biased towards going the wrong way frequently in one environment and the right way in the other.

I just ran the Tahoe unit tests on Mac OS 10.4 on a PowerPC G4 867 MHz laptop, and this failure did not occur.

This is important because currently there are two workarounds, each of which is unacceptable to one of the Tahoe developers:

workaround #1: leave "secure_connections" out of the requirements that Tahoe needs from foolscap, so that installations of Tahoe, which trigger installations of foolscap, do not trigger installations of pyOpenSSL. This works around the problem because if you happen to have pyOpenSSL already installed, but invisible to setuptools, and it is a version of pyOpenSSL that doesn't trigger this bug, then everything works including no bogus test failures. However, this is unacceptable to Zooko because if you do not already have the right version of pyOpenSSL installed then you will get a runtime exception and you'll have to manually install pyOpenSSL. It is unacceptable to Zooko to require users to manually install pyOpenSSL.

workaround #2: leave "secure_connections" in the requirements. Then you won't have to manually install anything, and if you happen to get a combination of Twisted and pyOpenSSL which does not trigger this bug, you won't get any bogus test failures. However, this is unacceptable to Brian, because if you get a combination of Twisted and pyOpenSSL and your development platform which triggers this bug then you'll get bogus test failures. People seeing bogus test failures is unacceptable to Brian (and his development platform -- sid -- is the one which incurs this failure).

Here is a work-around which is kind of ugly but at least it isn't unacceptable: write a tearDown() method to reach inside the reactor and clean off outstanding delayed calls and open sockets. Also we would have to change the Tahoe unit tests to not wait for connection cleanup before passing the tests.
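
A minimal sketch of what such a tearDown() could do, assuming the reactor exposes `getDelayedCalls()` and `removeAll()` as Twisted's reactor interfaces are understood to. The names `scrub_reactor` and `CleanupMixin` are hypothetical, and this is not code from the Tahoe tree:

```python
def scrub_reactor(reactor):
    """Cancel pending timers and unregister sockets on a reactor-like object.

    Returns (number of delayed calls cancelled, number of descriptors removed).
    """
    cancelled = 0
    for call in reactor.getDelayedCalls():
        # Only calls that have not yet fired or been cancelled can be cancelled.
        if call.active():
            call.cancel()
            cancelled += 1
    # removeAll() unregisters readers and writers; it does not close them.
    removed = reactor.removeAll()
    return cancelled, len(removed)


class CleanupMixin:
    """Hypothetical mixin for test cases that want a forcibly clean reactor."""

    def tearDown(self):
        # Imported here so the helper above stays usable without Twisted.
        from twisted.internet import reactor
        scrub_reactor(reactor)
```

This hides the symptom rather than curing it: a test that genuinely leaks a timer or a socket would also pass, which is part of why the approach is described as ugly.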

A good solution to this would, of course, be to fix this bug in Twisted and/or pyOpenSSL. Maybe we could contribute some time to that. I vaguely recall that there is now a unit test for the problem...

zooko added
critical
and removed
major
labels 2008-07-25 19:39:13 +00:00

Looks like the Twisted folks have been making progress on this issue:

http://twistedmatrix.com/trac/ticket/3218

Author

zooko says that the twisted folks say that this may only happen with the select reactor.. so another easy workaround is to use the pollreactor instead. I'll test this and report back.

Okay this is fixed by changeset:01e5ca68e2640274, changeset:3eb5f221d7ed217b, changeset:677f26f0f4f10d04, changeset:bd0fe3588b314711, changeset:5a0e98d693fd2f3e. (Changes to the build system sometimes take multiple patches, because I use the buildbot to try out my changes on all of our platforms at once. If the buildbot "try this out but don't commit it to trunk" feature were working and I knew how to use it then I would do that instead.)

The fix is to set --reactor=poll on linux. (So this is in a sense a work-around instead of a fix, but on the other hand there's no reason for us to prefer the select reactor on linux, so this is fine.)
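
The platform check behind that choice can be sketched as follows. This is an illustration of the idea, not the Makefile logic from the changesets above; the function name `trial_reactor_args` and the package name in the usage line are hypothetical.

```python
import sys


def trial_reactor_args(platform=None):
    """Return extra trial arguments selecting the poll reactor on Linux."""
    platform = sys.platform if platform is None else platform
    if platform.startswith("linux"):
        return ["--reactor=poll"]
    # Elsewhere, leave the choice to Twisted's default reactor.
    return []


# Hypothetical usage: build a trial command line for some test package.
command = ["trial"] + trial_reactor_args() + ["your_package.test"]
```

Matching on the `linux` prefix covers both the older `linux2` value of `sys.platform` and the plain `linux` value reported by newer Pythons.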

zooko added the
fixed
label 2008-07-30 16:53:02 +00:00
zooko closed this issue 2008-07-30 16:53:02 +00:00

Hooray -- the Twisted folks have fixed this issue:

http://twistedmatrix.com/trac/ticket/3218

Author

Excellent. Now we just need to wait for them to make a release, and add advice in the README to avoid the combination of Twisted in (8.0.1 .. 8.1.0) and pyOpenSSL-0.7 .

incidentally: I've tested twisted-8.0.1 and 8.1.0 (against pyopenssl-0.7) and saw test failures. I don't know about 8.0.0 . I see some failures against twisted-2.5.0, and different (non-ssl-related) ones against twisted-2.4.0 . So the versionspace to avoid might be Twisted in (8.0.0 .. 8.1.0) and pyopenssl-0.7 .. don't know yet.
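
If that advice were ever turned into a check, it might look like the sketch below. The bounds are the tentative ones from this comment (the lower bound may really be 8.0.0), the helper names are hypothetical, and the parser only handles purely numeric dotted versions.

```python
def parse_version(text):
    """Turn a dotted numeric version such as "8.1.0" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_risky_combination(twisted_version, pyopenssl_version):
    """True for Twisted in 8.0.1 .. 8.1.0 combined with pyOpenSSL 0.7.

    Assumption: the Twisted range is the tentative one from this ticket.
    """
    twisted = parse_version(twisted_version)
    in_range = parse_version("8.0.1") <= twisted <= parse_version("8.1.0")
    return in_range and parse_version(pyopenssl_version) == (0, 7)
```

For example, `is_risky_combination("8.1.0", "0.7")` is true, while the same Twisted with pyOpenSSL 0.6 is not flagged.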

Or, we could just be satisfied with always using the pollreactor. But if we ever want to simplify the Makefile and remove that platform-detection / reactor-choosing code, we could force users to go with a post-8.1.0 release of twisted instead.

I'm not aware of any reason to prefer a select reactor over a poll reactor if there is a poll reactor on your platform, so I'm satisfied with our current reactor chooser.

I think I'll suggest to the Twisted folks (via their issue tracker) that Twisted could make poll reactor the default reactor on platforms that support it.

Twisted ticket #2234, "Select default reactor based on platform and available libraries": http://twistedmatrix.com/trac/ticket/2234

warner added this to the 1.3.0 milestone 2008-09-03 01:16:35 +00:00
launchpad commented 2008-10-31 15:33:34 +00:00
Owner

Updating Launchpad bug reference
