don't stop the process if you can't execute "ifconfig" or "route.exe" #1988

Closed
opened 2013-05-27 17:27:22 +00:00 by zooko · 10 comments

Currently, the tahoe-lafs process tries to learn its IP address by creating a socket ([iputil.get_local_ip_for()]source:trunk/src/allmydata/util/iputil.py?annotate=blame&rev=1b84612fdf6623885ad4999fa245f9c87ccb53f6#L101), and also by executing either ifconfig (on unix) or route.exe (on windows) ([iputil code]source:trunk/src/allmydata/util/iputil.py?annotate=blame&rev=1b84612fdf6623885ad4999fa245f9c87ccb53f6#L137).
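
For readers unfamiliar with the socket technique, here is a minimal standard-library sketch of the general idea. It is not the actual `iputil.get_local_ip_for()` code; the function name `guess_local_ip`, the default target address, and the port are assumptions chosen for illustration. Connecting a UDP socket sends no packets; it only asks the OS which local address it would route from.

```python
import socket


def guess_local_ip(target="198.41.0.4", port=7):
    """Return the local IPv4 address the OS would use to reach `target`, or None.

    `target` and `port` are arbitrary illustrative defaults (assumptions).
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket transmits nothing; it only selects a route.
        s.connect((target, port))
        return s.getsockname()[0]
    except OSError:
        # No usable route (e.g. no network): report "unknown" instead of failing.
        return None
    finally:
        s.close()
```

Returning `None` rather than raising keeps address discovery optional, which matches the point made in this ticket that a node can still work without knowing its own address.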

Even if it doesn't learn its IP address, or if it learns an incorrect IP address, it can still work, provided that the other processes that it needs to talk to have announced their IP addresses and it is able to open TCP connections to them.

Now the problem is that if it can't execute ifconfig/route.exe successfully, then it stops the process.

This is the number one problem that prevents people from running Tahoe-LAFS on new platforms (#898, #1536, #1918, #1707). In fact, it may be the only thing that prevents Tahoe-LAFS from being portable to all sorts of unix-likes! It also causes various other problems due to security constraints (#982) and subprocesses being finicky (#854, #1064, #1381, #1935).

What if, instead, when the call to the subprocess (ifconfig or route.exe) fails, it logs this fact and then moves on? From the user's perspective, this would change the failure mode from "It stopped at startup" to one of the following:

  1. "I can't connect to anyone."
  2. "I can't connect to some servers/clients but can connect to others."
  3. "I can connect to all servers/clients and everything works."
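
The "log it and move on" behaviour proposed above could be sketched roughly as follows. This is an illustrative sketch, not the patch that landed on trunk: the function name `addresses_from_command`, the logger name, the timeout, and the naive IPv4 regular expression (which also matches netmasks) are all assumptions.

```python
import logging
import re
import subprocess

log = logging.getLogger("iputil_sketch")  # hypothetical logger name

# Naive dotted-quad matcher; it does not distinguish addresses from netmasks.
_IPV4 = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")


def addresses_from_command(argv, timeout=10):
    """Run a tool such as ifconfig and return any IPv4-looking strings it prints.

    Returns [] instead of raising when the tool is missing, not executable,
    exits with an error, or times out.
    """
    try:
        proc = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout, check=True
        )
    except (OSError, subprocess.SubprocessError) as e:
        # Missing binary, odd PATH, security policy, non-zero exit, timeout:
        # log the problem and carry on without these addresses.
        log.warning("could not run %r to find local addresses: %s", argv, e)
        return []
    return _IPV4.findall(proc.stdout)
```

With this shape, a caller can merge the result with whatever the socket technique found, and an empty list simply means "no extra addresses learned".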

I don't like, in principle, changing a hard "stop loudly" failure into a soft "misbehave confusingly" failure, but in this case it might be worth it. The thing is that the confusing 3-way failure mode from (partial) inability to connect to other processes already happens for other reasons. We wouldn't be adding that failure mode; we would be converting certain situations (previously unsupported platform, PATH is set weirdly, security features prevent access to ifconfig, etc.) from the hard-stop failure mode to the latter failure mode.

This ticket supersedes #854 and #1536. Oh! And I see that #1536 has a patch from mk.fg! Good...

zooko added the code-network, normal, defect, 1.10.0 labels 2013-05-27 17:27:22 +00:00
zooko added this to the undecided milestone 2013-05-27 17:27:22 +00:00
Author

Okay, now I have a question: how often does iputil.get_local_ip_for() fail and the subprocess code succeed? When we first started using this technique it was literally 14 years ago, in 1999, at Mojo Nation, and at the time it seemed like the subprocess technique was necessary to get a high success rate of figuring out your own IP address. I have no idea if that is really true today.

Author

Actually let's leave #1536 (log these kinds of errors usefully) separate from this ticket (don't stop the process when you have this kind of error).

utf8notsupported commented 2013-06-15 07:36:56 +00:00
Owner

Why not use ipconfig.exe on windows instead of route.exe? It's a closer w32 analog to ifconfig.

Author

Replying to utf8notsupported:

Why not use ipconfig.exe on windows instead of route.exe? It's a closer w32 analog to ifconfig.

Using route.exe is known to work. Let's not change it unless there's a compelling reason to do so.

daira commented 2013-06-17 12:30:54 +00:00
Owner

Replying to [zooko]comment:4:

Replying to utf8notsupported:

Why not use ipconfig.exe on windows instead of route.exe? It's a closer w32 analog to ifconfig.

Using route.exe is known to work. Let's not change it unless there's a compelling reason to do so.

+1

Daira Hopwood <david-sarah@jacaranda.org> commented 2013-06-25 18:15:57 +00:00
Owner

In /tahoe-lafs/trac-2024-07-25/commit/a493ee0bb641175ecf918e28fce4d25df15994b6:

iputil.py: add tests for recent changes. refs #1381, #1988, #982, #1064, #1536, #1935, #898, #1707, #1918

Signed-off-by: Daira Hopwood <david-sarah@jacaranda.org>
daira commented 2013-06-27 02:01:24 +00:00
Owner

This should now be fixed on trunk, but the changes haven't been reviewed. I made a couple of mistakes on the way due partly to accidentally committing my branch before it was ready, and partly to platform differences that were revealed by buildbot testing, so I suggest reviewing the [overall diff](https://tahoe-lafs.org/trac/tahoe-lafs/changeset?reponame=trunk&new=b0883807361830c609dff1677c3cb34fd64d3ebb%40%2F&old=d85a75d7f689cb55ecddb319dc2057f38e4db87a%40%2F). (Note that this includes the fix to #1381.)
tahoe-lafs modified the milestone from undecided to 1.11.0 2013-06-27 02:01:24 +00:00
zooko was assigned by tahoe-lafs 2013-06-27 02:01:24 +00:00

I'm looking at this patch, and one part is unclear: https://tahoe-lafs.org/trac/tahoe-lafs/browser/trunk/src/allmydata/util/iputil.py?rev=b0883807361830c609dff1677c3cb34fd64d3ebb#L194

Why is the same try statement executed 5 times before failure?

daira commented 2013-08-18 18:20:44 +00:00
Owner

Replying to markberger:

Why is the same try statement executed 5 times before failure?

That's the fix for #1381. subprocess.Popen() or Popen.communicate() can randomly fail with EINTR.
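
A bounded retry of this kind can be sketched as below. This is only an illustration of the pattern, not the code at the linked revision: the helper name `retry_on_eintr` and its `attempts` parameter are assumptions. Note that since Python 3.5 (PEP 475) the standard library retries most interrupted system calls itself, so a wrapper like this mainly matters on older interpreters such as the Python 2 of the time.

```python
import errno


def retry_on_eintr(fn, attempts=5):
    """Call fn(), retrying up to `attempts` times if it fails with EINTR.

    Any other OSError, or EINTR on the final attempt, is re-raised.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except OSError as e:
            if e.errno != errno.EINTR or attempt == attempts - 1:
                raise
            # Interrupted by a signal: loop around and try again.
```

A caller would wrap the whole spawn-and-read step, for example `retry_on_eintr(lambda: run_ifconfig())`, where `run_ifconfig` is a hypothetical function that starts the subprocess and collects its output.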


I finally got back to reviewing this and it looks good to me. Since the patch has already been committed to trunk, I'm closing this ticket.

markberger added the fixed label 2013-09-16 23:34:43 +00:00
tahoe-lafs modified the milestone from soon to 1.11.0 2014-11-27 03:48:03 +00:00