Avoid Popen() of executables that don't exist #109

Merged
warner merged 1 commits from 2023-caution into master 2014-09-13 09:58:55 +00:00
warner commented 2014-09-12 04:26:30 +00:00 (Migrated from github.com)

The stdlib 'subprocess' module in python-2.7.4 through 2.7.7 suffers
from http://bugs.python.org/issue18851 which causes unrelated file
descriptors to be closed when subprocess.call() fails the exec(),
such as when the executable being invoked does not actually exist. There
appears to be some randomness involved. This was fixed in python-2.7.8.

Tahoe's iputil.py uses subprocess.call on many different "ifconfig"-type
executables, most of which don't exist on any given platform (added in
git commit 8e31d66cd0). This results in a lot of file-descriptor
closing, which (at least during unit tests) tends to clobber important
things like Tub TCP sockets. This seems to be the root cause behind
ticket:2121, in which normal code tries to close already-closed sockets,
crashing the unit tests. Since different platforms have different
ifconfigs, some platforms will experience more failed execs than others,
so this bug could easily behave differently on linux vs freebsd, as well
as working normally on python-2.7.8 or 2.7.4.

This patch inserts a guard to make sure that os.path.isfile() is true
before allowing Popen.call() to try executing the target. This ought to
be enough to avoid the bug. It changes both iputil.py and
allmydata.init (which uses Popen for calling "lsb_release"), which
are all the places where 'subprocess' is used outside of unit tests.

Other potential fixes: use the 'subprocess32' module from PyPI (which is
a bug-free backport of the Python3 stdlib subprocess module, but would
introduce a new dependency), or require python >= 2.7.8 (but this would
rule out development/deployment on the current OS-X 10.9 release, which
ships with 2.7.5, as well as other distributions like Ubuntu 14.04 LTS).

I believe this closes ticket:2121, and given the apparent relationship
between 2121 and 2023, I think it also closes ticket:2023 (although
since 2023 doesn't have copies of the failing log files, it's hard to
tell). I'm hoping that this will tide us over until 1.11 is released, at
which point we can execute on the plan to remove iputil.py entirely by
changing the way that nodes learn their externally-facing IP address.

The stdlib 'subprocess' module in python-2.7.4 through 2.7.7 suffers from http://bugs.python.org/issue18851 which causes unrelated file descriptors to be closed when `subprocess.call()` fails the `exec()`, such as when the executable being invoked does not actually exist. There appears to be some randomness involved. This was fixed in python-2.7.8. Tahoe's iputil.py uses subprocess.call on many different "ifconfig"-type executables, most of which don't exist on any given platform (added in git commit 8e31d66cd0b). This results in a lot of file-descriptor closing, which (at least during unit tests) tends to clobber important things like Tub TCP sockets. This seems to be the root cause behind ticket:2121, in which normal code tries to close already-closed sockets, crashing the unit tests. Since different platforms have different ifconfigs, some platforms will experience more failed execs than others, so this bug could easily behave differently on linux vs freebsd, as well as working normally on python-2.7.8 or 2.7.4. This patch inserts a guard to make sure that os.path.isfile() is true before allowing Popen.call() to try executing the target. This ought to be enough to avoid the bug. It changes both iputil.py and allmydata.**init** (which uses Popen for calling "lsb_release"), which are all the places where 'subprocess' is used outside of unit tests. Other potential fixes: use the 'subprocess32' module from PyPI (which is a bug-free backport of the Python3 stdlib subprocess module, but would introduce a new dependency), or require python >= 2.7.8 (but this would rule out development/deployment on the current OS-X 10.9 release, which ships with 2.7.5, as well as other distributions like Ubuntu 14.04 LTS). I believe this closes ticket:2121, and given the apparent relationship between 2121 and 2023, I think it also closes ticket:2023 (although since 2023 doesn't have copies of the failing log files, it's hard to tell). I'm hoping that this will tide us over until 1.11 is released, at which point we can execute on the plan to remove iputil.py entirely by changing the way that nodes learn their externally-facing IP address.
coveralls commented 2014-09-12 04:47:51 +00:00 (Migrated from github.com)

Coverage Status

Coverage decreased (-0.01%) when pulling 1a07dfdedef4747784ad1798542a2f4c0b9c4be3 on warner:2023-caution into 6a38a3c54e on tahoe-lafs:master.

[![Coverage Status](https://coveralls.io/builds/1200447/badge)](https://coveralls.io/builds/1200447) Coverage decreased (-0.01%) when pulling **1a07dfdedef4747784ad1798542a2f4c0b9c4be3 on warner:2023-caution** into **6a38a3c54e0846dea625f5af0a4d365f9368acf6 on tahoe-lafs:master**.
warner commented 2014-09-12 05:46:33 +00:00 (Migrated from github.com)

not sure why that failed (but I filed https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2298 about it. probably intermittent. I'll re-push a rebased form to trigger a new build.

not sure why that failed (but I filed https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2298 about it. probably intermittent. I'll re-push a rebased form to trigger a new build.
coveralls commented 2014-09-12 06:12:31 +00:00 (Migrated from github.com)

Coverage Status

Coverage decreased (-0.02%) when pulling 5e70aa818e5c06c5f50d1c1ba4d8f4d4f9ffdf34 on warner:2023-caution into 528364a421 on tahoe-lafs:master.

[![Coverage Status](https://coveralls.io/builds/1200565/badge)](https://coveralls.io/builds/1200565) Coverage decreased (-0.02%) when pulling **5e70aa818e5c06c5f50d1c1ba4d8f4d4f9ffdf34 on warner:2023-caution** into **528364a4219fbcf9f802ba54d1711ed97e693346 on tahoe-lafs:master**.
warner commented 2014-09-12 07:01:38 +00:00 (Migrated from github.com)

That travis build hit the same intermittent failure. The actual iputil.py stuff passed. Don't be put off by the red X :)

That travis build hit the same intermittent failure. The actual iputil.py stuff passed. Don't be put off by the red X :)
warner commented 2014-09-12 20:01:51 +00:00 (Migrated from github.com)

ok, I think we fixed the intermittent failure (https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2290), so hopefully this next rebase will look less scary.

ok, I think we fixed the intermittent failure (https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2290), so hopefully this next rebase will look less scary.
coveralls commented 2014-09-12 20:39:43 +00:00 (Migrated from github.com)

Coverage Status

Coverage decreased (-0.0%) when pulling d27a57cb49 on warner:2023-caution into 38668c9e35 on tahoe-lafs:master.

[![Coverage Status](https://coveralls.io/builds/1203722/badge)](https://coveralls.io/builds/1203722) Coverage decreased (-0.0%) when pulling **d27a57cb49893d6edb192492636d267fb74b9cd9 on warner:2023-caution** into **38668c9e35a9f27123f05d3726e12263248996e6 on tahoe-lafs:master**.
daira commented 2014-09-13 09:52:41 +00:00 (Migrated from github.com)

Reviewing. Coverage is not actually decreasing btw, that's due to https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2295 .

Reviewing. Coverage is not actually decreasing btw, that's due to https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2295 .
warner commented 2014-09-13 18:04:12 +00:00 (Migrated from github.com)

yeah, I figure "lsb_release" is probably specified to live in a pretty stable spot, so looking at either /bin or /usr/bin seemed sufficient.

thanks!

yeah, I figure "lsb_release" is probably specified to live in a pretty stable spot, so looking at either /bin or /usr/bin seemed sufficient. thanks!
Sign in to join this conversation.
No reviewers
No Milestone
No project
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: tahoe-lafs/tahoe-lafs#109
No description provided.