EINTR from communication with subprocess in allmydata/util/iputil.py _query #1381

Closed
opened 2011-03-22 20:04:17 +00:00 by davidsarah · 13 comments
davidsarah commented 2011-03-22 20:04:17 +00:00
Owner

Reported by 'sickness' on irc:

#   Run
#     test_loadable ...                                                      [OK]
#     test_reloadable ... Node._startService failed, aborting
# [Failure instance: Traceback: <type 'exceptions.OSError'>: [Errno 4] Interrupted system call
# /usr/lib/python2.6/threading.py:497:__bootstrap
# /usr/lib/python2.6/threading.py:525:__bootstrap_inner
# /usr/lib/python2.6/threading.py:477:run
# --- <exception caught here> ---
# /usr/lib/python2.6/vendor-packages/twisted/python/threadpool.py:210:_worker
# /usr/lib/python2.6/vendor-packages/twisted/python/context.py:59:callWithContext
# /usr/lib/python2.6/vendor-packages/twisted/python/context.py:37:callWithContext
# /home/righdieg/allmydata-tahoe-1.8.2/src/allmydata/util/iputil.py:222:_synchronously_find_addresses_via_config
# /home/righdieg/allmydata-tahoe-1.8.2/src/allmydata/util/iputil.py:237:_query
# /usr/lib/python2.6/subprocess.py:689:communicate
# /usr/lib/python2.6/subprocess.py:1233:_communicate
# /usr/lib/python2.6/subprocess.py:1157:wait
# ]
# calling os.abort()

Possibly related: http://bugs.python.org/issue1068268. It may be that the patch for that bug wasn't complete enough. EINTR failures are usually not very reproducible, but the fix is just to repeat the query until it works (or fails with a different error).

tahoe-lafs added the
code-network
major
defect
1.8.2
labels 2011-03-22 20:04:17 +00:00
tahoe-lafs added this to the 1.9.0 milestone 2011-03-22 20:04:17 +00:00
sickness commented 2011-03-22 21:28:16 +00:00
Author
Owner

The OS is opensolaris snv134 64bit

$ uname -a
SunOS MYWORKPC 5.11 snv_134 i86pc i386 i86pc Solaris

$ psrinfo -pv
The physical processor has 2 virtual processors (0 1)
x86 (GenuineIntel 1067A family 6 model 23 step 10 clock 2800 MHz)
Pentium(r) Dual-Core CPU E6300 @ 2.80GHz

$ isainfo -x
amd64: ssse3 cx16 mon sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu
i386: ssse3 ahf cx16 mon sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu

And this is the tahoe version:

$ allmydata-tahoe-1.8.2/bin/tahoe --version
allmydata-tahoe: 1.8.2,
foolscap: 0.6.1,
pycryptopp: 0.5.29,
zfec: 1.4.22,
Twisted: 8.2.0,
Nevow: 0.10.0,
zope.interface: unknown,
python: 2.6.4,
platform: SunOS-5.11-i86pc-i386-32bit-ELF,
pyOpenSSL: 0.11,
simplejson: 2.0.9,
pycrypto: 2.3,
pyasn1: unknown,
mock: 0.7.0,
sqlite3: 2.4.1 [3.6.17]sqlite,
setuptools: 0.6c16dev3

davidsarah commented 2011-03-23 01:26:42 +00:00
Author
Owner

Replying to sickness:

> python: 2.6.4,

Hmm, that should have had the backported fix for http://bugs.python.org/issue1068268. Oh well, we would need to work around it for earlier Pythons anyway.


Should we work-around this by catching OSError with errno==4 and retrying the subprocess?

davidsarah commented 2011-05-29 15:33:32 +00:00
Author
Owner

Replying to zooko:

> Should we work-around this by catching OSError with errno==4 and retrying the subprocess?

Yes, I believe so. We probably shouldn't retry forever, so let's retry 10 times. The try/except should cover lines 236 and 237 of iputil.py (source:src/allmydata/util/iputil.py@4971#L236).

BTW, rather than 4 we should use errno.EINTR (I think this is defined on all platforms, even though EINTR is only really relevant on Unix).

Should _query return [] (i.e. no addresses) if the subprocess fails? Oh, I see that issue is #854 ('what to do when you can't find any IP address for yourself').
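A minimal sketch of the retry described above, assuming a hypothetical helper named `query_with_retry` (the name, signature, and example command are illustrative, not the actual `_query` in iputil.py). It puts both the `Popen` call and `communicate()` inside one try/except, retries only on `errno.EINTR`, and gives up after 10 attempts. It is written for current Python rather than the 2.6 in the report; Python 3.5 and later already retry most interrupted system calls themselves (PEP 475), so the pattern mainly matters on older interpreters.

```python
import errno
import subprocess
import sys

MAX_ATTEMPTS = 10  # the "retry 10 times" bound suggested above


def query_with_retry(argv, attempts=MAX_ATTEMPTS):
    """Run argv and return its stdout, retrying when interrupted by EINTR.

    Hypothetical helper for illustration; not the real iputil._query.
    """
    for attempt in range(attempts):
        try:
            # Both Popen() and communicate() sit inside the try block,
            # because either call may be interrupted.
            p = subprocess.Popen(argv, stdout=subprocess.PIPE,
                                 stderr=subprocess.PIPE)
            output, _ = p.communicate()
            return output
        except OSError as e:
            # Retry only on EINTR. Any other error is re-raised at once,
            # and EINTR is re-raised when the attempts are used up.
            if e.errno != errno.EINTR or attempt == attempts - 1:
                raise


if __name__ == "__main__":
    # Stand-in command: a child Python process that prints one address.
    print(query_with_retry([sys.executable, "-c", "print('192.0.2.1')"]))
```

Using `errno.EINTR` instead of the literal 4 keeps the check readable, and the bounded loop means a persistently failing call still surfaces as an `OSError` rather than hanging forever.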

tahoe-lafs modified the milestone from 1.9.0 to 1.10.0 2011-08-14 00:09:40 +00:00

See #1988

daira commented 2013-05-27 20:50:38 +00:00
Author
Owner

This is a separate bug to #1988, though. The correct fix is to retry.


Replying to daira:

> This is a separate bug to #1988, though. The correct fix is to retry.

Agreed.

daira commented 2013-05-30 18:36:06 +00:00
Author
Owner
Review needed for <https://github.com/daira/tahoe-lafs/commits/refactor-address-finding>.

Replying to daira:

> Review needed for https://github.com/daira/tahoe-lafs/commits/refactor-address-finding.

Not ready yet, tests fail.

daira commented 2013-06-14 23:52:37 +00:00
Author
Owner

Oops, I accidentally committed the patch for this while committing the reviewed fix for #1717. Sorry :-(

I'll fix the tests next.

Daira Hopwood <david-sarah@jacaranda.org> commented 2013-06-25 18:15:57 +00:00
Author
Owner

In /tahoe-lafs/trac-2024-07-25/commit/a493ee0bb641175ecf918e28fce4d25df15994b6:

iputil.py: add tests for recent changes. refs #1381, #1988, #982, #1064, #1536, #1935, #898, #1707, #1918

Signed-off-by: Daira Hopwood <david-sarah@jacaranda.org>
daira commented 2013-06-27 02:09:40 +00:00
Author
Owner

The tests are fixed, but this still needs review. The relevant patches for the bugfix are 6a445d73 (changeset:6a445d73bc5253ec4ae0dec70af02e33bc869cf6/trunk) and 6104950e (changeset:6104950ed8a7a356eed2218f2df958d074022eea/trunk). It is tested by simulating an EINTR on the first call to subprocess.Popen, in each of the new test_list_async_mock_* tests.
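For readers unfamiliar with this testing approach, here is a rough sketch of how an EINTR on the first `subprocess.Popen` call can be simulated with a mock. It is not the code of the `test_list_async_mock_*` tests. The function names `query` and `check_retry_after_eintr`, the fake `ifconfig` output, and the use of `unittest.mock` from current Python (rather than the standalone `mock` package listed earlier in this thread) are all assumptions for illustration.

```python
import errno
import subprocess
from unittest import mock


def query(argv, attempts=10):
    """Stand-in for the code under test (hypothetical, not iputil._query)."""
    for attempt in range(attempts):
        try:
            p = subprocess.Popen(argv, stdout=subprocess.PIPE)
            return p.communicate()[0]
        except OSError as e:
            if e.errno != errno.EINTR or attempt == attempts - 1:
                raise


def check_retry_after_eintr():
    # A fake process object whose communicate() returns canned output.
    fake_process = mock.Mock()
    fake_process.communicate.return_value = (b"inet 192.0.2.1\n", b"")

    # With a list as side_effect, the first Popen call raises the EINTR
    # error and the second call returns the fake process.
    effects = [OSError(errno.EINTR, "Interrupted system call"), fake_process]
    with mock.patch("subprocess.Popen", side_effect=effects) as fake_popen:
        output = query(["ifconfig", "-a"])

    assert output == b"inet 192.0.2.1\n"
    assert fake_popen.call_count == 2  # one interrupted call, one retry
    return output


if __name__ == "__main__":
    check_retry_after_eintr()
```

No real subprocess is started here, so the check is deterministic even though real EINTR failures are hard to reproduce.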


+1

tahoe-lafs added the
fixed
label 2013-07-17 13:00:09 +00:00
daira closed this issue 2013-07-17 13:00:09 +00:00
Reference: tahoe-lafs/trac-2024-07-25#1381