EINTR from communication with subprocess in allmydata/util/iputil.py _query #1381
Reference: tahoe-lafs/trac-2024-07-25#1381
Reported by 'sickness' on irc:
Possibly related: http://bugs.python.org/issue1068268 . It may be that the patch for that bug wasn't complete enough. EINTR failures are usually not very reproducible, but the fix is just to repeat the query until it works (or fails with a different error).
The OS is opensolaris snv134 64bit
$ uname -a
SunOS MYWORKPC 5.11 snv_134 i86pc i386 i86pc Solaris
$ psrinfo -pv
The physical processor has 2 virtual processors (0 1)
x86 (GenuineIntel 1067A family 6 model 23 step 10 clock 2800 MHz)
Pentium(r) Dual-Core CPU E6300 @ 2.80GHz
$ isainfo -x
amd64: ssse3 cx16 mon sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu
i386: ssse3 ahf cx16 mon sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu
And this is the tahoe version:
$ allmydata-tahoe-1.8.2/bin/tahoe --version
allmydata-tahoe: 1.8.2,
foolscap: 0.6.1,
pycryptopp: 0.5.29,
zfec: 1.4.22,
Twisted: 8.2.0,
Nevow: 0.10.0,
zope.interface: unknown,
python: 2.6.4,
platform: SunOS-5.11-i86pc-i386-32bit-ELF,
pyOpenSSL: 0.11,
simplejson: 2.0.9,
pycrypto: 2.3,
pyasn1: unknown,
mock: 0.7.0,
sqlite3: 2.4.1 [3.6.17]sqlite,
setuptools: 0.6c16dev3
Replying to sickness:
Hmm, that should have had the backported fix for http://bugs.python.org/issue1068268 . Oh well, we would need to work around it for earlier Pythons anyway.
Should we work around this by catching OSError with errno==4 and retrying the subprocess?
Replying to zooko:
Yes, I believe so. We probably shouldn't retry forever, so let's retry 10 times. The try/except should cover lines 236 and 237 of iputil.py (source:src/allmydata/util/iputil.py@4971#L236).
BTW, rather than 4 we should use errno.EINTR (I think this is defined on all platforms, even though EINTR is only really relevant on Unix).
Should _query return [] (i.e. no addresses) if the subprocess fails? Oh, I see that issue is #854 ('what to do when you can't find any IP address for yourself').
See #1988
This is a separate bug to #1988, though. The correct fix is to retry.
Replying to daira:
Agreed.
Review needed for https://github.com/daira/tahoe-lafs/commits/refactor-address-finding.
Replying to daira:
Not ready yet, tests fail.
Oops, I accidentally committed the patch for this while committing the reviewed fix for #1717. Sorry :-(
I'll fix the tests next.
In /tahoe-lafs/trac-2024-07-25/commit/a493ee0bb641175ecf918e28fce4d25df15994b6:
The tests are fixed, but this still needs review. The relevant patches for the bugfix are [6a445d73] (changeset:6a445d73bc5253ec4ae0dec70af02e33bc869cf6/trunk) and [6104950e] (changeset:6104950ed8a7a356eed2218f2df958d074022eea/trunk). It is tested by simulating an EINTR on the first call to subprocess.Popen, in each of the new test_list_async_mock_* tests.
+1