don't stop the process if you can't execute "ifconfig" or "route.exe" #1988
Reference: tahoe-lafs/trac-2024-07-25#1988
Currently, the tahoe-lafs process tries to learn its IP address by creating a socket ([iputil.get_local_ip_for()]source:trunk/src/allmydata/util/iputil.py?annotate=blame&rev=1b84612fdf6623885ad4999fa245f9c87ccb53f6#L101), and also by executing either `ifconfig` (on unix) or `route.exe` (on windows) ([iputil code]source:trunk/src/allmydata/util/iputil.py?annotate=blame&rev=1b84612fdf6623885ad4999fa245f9c87ccb53f6#L137).
Even if it doesn't learn its IP address, or if it learns an incorrect IP address, it can still work, provided that the other processes that it needs to talk to have announced their IP addresses and it is able to open TCP connections to them.
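For readers unfamiliar with the socket technique, here is a minimal sketch of one common way to do it: connect a UDP socket toward some target and ask the OS which local address it picked. This is not the code in iputil.py; the function body, the default target (a reserved documentation address) and the port are assumptions made for illustration.

```python
import socket


def get_local_ip_for(target="198.51.100.1", port=9):
    """Return the local IPv4 address the OS would use to reach target.

    Hypothetical stand-in for the real iputil function. Connecting a UDP
    socket sends no packets; it only makes the OS choose a route and
    therefore a source address. Returns None if no route is available.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect((target, port))
        return s.getsockname()[0]
    except OSError:
        # No usable route (or no network at all): report "unknown"
        # instead of raising, so the caller can carry on.
        return None
    finally:
        s.close()
```

Note that this reports the address of the outgoing interface, which behind NAT is a private address rather than the one other nodes would see.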
Now the problem is that if it can't execute `ifconfig`/`route.exe` successfully, then it stops the process.
This is the number one problem that prevents people from running Tahoe-LAFS on new platforms (#898, #1536, #1918, #1707). In fact, it may be the only thing that prevents Tahoe-LAFS from being portable to all sorts of unix-likes! It also causes various other problems due to security constraints (#982), and subprocesses being finicky (#854, #1064, #1381, #1935).
What if instead, when the call to the subprocess (`ifconfig` or `route.exe`) fails, it logs this fact and then moves on? This would change the failure mode from the user's perspective from "It stopped at startup" to one of the following:
I don't like, in principle, changing a hard "stop loudly" failure into a soft "misbehave confusingly" failure, but in this case it might be worth it. The thing is that the confusing 3-way failure mode from (partial) inability to connect to other processes already happens due to other reasons. We wouldn't be adding that failure mode; we would be converting certain situations (previously unsupported platform, `PATH` is set weirdly, a security feature prevents access to `ifconfig`, etc.) from the hard-stop failure mode to the latter failure mode.
This ticket supersedes #854 and #1536. Oh! And I see that #1536 has a patch from mk.fg! Good...
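The "log it and move on" behaviour proposed in this ticket could look roughly like the sketch below. It uses only the Python 3 standard library and is not the patch that landed on trunk; the helper name `run_address_tool`, the logger name and the 10-second timeout are all hypothetical.

```python
import logging
import subprocess

log = logging.getLogger("iputil_sketch")  # hypothetical logger name


def run_address_tool(argv, timeout=10):
    """Run an external tool and return its stdout, or None on any failure.

    A missing executable, a permission error, a timeout or a non-zero
    exit status is logged and swallowed, so startup can continue.
    """
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.SubprocessError) as e:
        # OSError covers "not found" and "permission denied";
        # SubprocessError covers TimeoutExpired.
        log.warning("could not run %r: %s", argv, e)
        return None
    if result.returncode != 0:
        log.warning("%r exited with status %d", argv, result.returncode)
        return None
    return result.stdout
```

A caller would treat `None` as "no addresses learned from this tool" and fall back to whatever the socket technique reported, if anything.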
Okay, now I have a question: how often does `iputil.get_local_ip_for()` fail and the subprocess code succeed? When we first started using this technique it was literally 14 years ago, in 1999, at Mojo Nation, and at the time it seemed like the subprocess technique was necessary to get a high success rate of figuring out your own IP address. I have no idea if that is really true today.
Actually let's leave #1536 (log these kinds of errors usefully) separate from this ticket (don't stop the process when you have this kind of error).
Why not use ipconfig.exe on windows instead of route.exe? It's a closer w32 analog to ifconfig.
Replying to utf8notsupported:
Using `route.exe` is known to work. Let's not change it unless there's a compelling reason to do so.
Replying to [zooko]comment:4:
+1
In /tahoe-lafs/trac-2024-07-25/commit/a493ee0bb641175ecf918e28fce4d25df15994b6:
This should now be fixed on trunk, but the changes haven't been reviewed. I made a couple of mistakes on the way due partly to accidentally committing my branch before it was ready, and partly to platform differences that were revealed by buildbot testing, so I suggest reviewing the overall diff. (Note that this includes the fix to #1381.)
I'm looking at this patch, and one part is unclear: https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/src/allmydata/util/iputil.py?rev=b0883807361830c609dff1677c3cb34fd64d3ebb#L194
Why is the same try statement executed 5 times before failure?
Replying to markberger:
That's the fix for #1381. `subprocess.Popen()` or `Popen.communicate()` can randomly fail with EINTR.
I finally got back to reviewing this and it looks good to me. Since the patch has already been committed to trunk, I'm closing this ticket.
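For anyone reading the patch later, the repeated try statement asked about above is a bounded retry on EINTR. A generic sketch of that pattern follows; the helper name `retry_on_eintr` is hypothetical, and the default of 5 attempts simply mirrors the count mentioned in the review question. On Python 3.5 and later the standard library already retries most interrupted system calls itself (PEP 475), so a loop like this matters mainly on older interpreters.

```python
import errno


def retry_on_eintr(func, *args, attempts=5, **kwargs):
    """Call func, retrying up to `attempts` times if it fails with EINTR.

    Any other OSError, or EINTR on the final attempt, is re-raised.
    """
    for attempt in range(attempts):
        try:
            return func(*args, **kwargs)
        except OSError as e:
            if e.errno != errno.EINTR or attempt == attempts - 1:
                raise
```

Bounding the number of attempts keeps a persistently failing call from looping forever, while still absorbing the occasional stray signal.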