create-node --listen=tor hangs with tor-0.2.8.8 #2837

Open
opened 2016-10-09 15:01:32 +00:00 by warner · 2 comments

After finishing off #2490, I noticed during testing that `tahoe create-node --listen=tor` consistently hangs on one of my test machines (which is running debian/sid, with tor-0.2.8.8 git-!8d8a099454d994bd). This happens when the system Tor process has been running for a while, at least a few days. If I restart the Tor process, `create-node` finishes correctly in the 30-40 seconds that I expect it to take.

This doesn't happen on an Ubuntu-16.04 box (running tor-0.2.7.6 git-!605ae665009853bd). Both machines use txtorcon-0.17.0. I'm guessing that something is broken with the Tor on my sid box, but maybe something about the Tor control-port command stream in the more recent Tor is confusing txtorcon.

Meejah suggested a patch like this to turn on command-stream debugging:

```python
from txtorcon.log import debug_logging

# Turn on txtorcon's debug logging (this is what shows the "cmd:" lines).
debug_logging()
```

With that enabled, I found differences between the two command streams. They're identical (modulo the random auth cookie) through the following commands and their responses:

```
cmd: AUTHCHALLENGE SAFECOOKIE [cookie]
cmd: AUTHENTICATE [cookie]
cmd: GETINFO signal/names
cmd: GETINFO version
Connected to a Tor with VERSION [version]
cmd: GETINFO events/names
cmd: USEFEATURE EXTENDED_EVENTS
cmd: GETINFO ns/all
6278 named routers found.
[list of duplicates]
2494 GUARDs
```

At that point, both do a `cmd: GETINFO circuit-status`. The working case (with 0.2.7.6) gets back a bunch of `circuit_(new|extend|built)` responses, then does a series of `GETINFO ip-to-country/ipaddr` commands, then a `GETINFO stream-status`. The hanging case sends the `GETINFO circuit-status` but never sees the `circuit_*` messages, and goes directly to the `GETINFO stream-status`.
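If the raw reply to `GETINFO circuit-status` can be captured, one way to tell whether Tor actually returned any circuits is to parse it directly. This is a minimal sketch, not part of txtorcon: `parse_circuit_status` is a hypothetical helper, and it assumes the reply framing described in Tor's control-spec, where a multi-line value arrives as `250+circuit-status=`, then one line per circuit beginning with the circuit ID and status, then a lone `.`, while a value without newlines (including an empty one) arrives on a single `250-circuit-status=` line.

```python
def parse_circuit_status(reply: str) -> list[tuple[str, str]]:
    """Return (circuit_id, status) pairs from a raw GETINFO circuit-status reply.

    Hypothetical helper; assumes the reply text was captured verbatim
    from the control port (CRLF or LF line endings both work).
    """
    circuits = []
    in_data = False
    for line in reply.splitlines():
        if line.startswith("250+circuit-status="):
            in_data = True  # multi-line data block starts on the next line
            continue
        if line.startswith("250-circuit-status="):
            # Single-line form: the (possibly empty) value follows the '='.
            fields = line[len("250-circuit-status="):].split()
            if len(fields) >= 2:
                circuits.append((fields[0], fields[1]))
            continue
        if in_data:
            if line == ".":
                in_data = False  # a lone '.' terminates the data block
                continue
            fields = line.split()
            if len(fields) >= 2:
                circuits.append((fields[0], fields[1]))  # ID, then status
    return circuits
```

An empty result from a reply captured on the hanging machine would be consistent with Tor simply having no circuits, rather than with txtorcon mis-parsing them.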

I don't know if this debugging includes the actual responses to each command, or if it's just logging async notifications.
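The manual comparison above (two command streams that match, modulo the auth cookie, up to a point) can be automated. This is a minimal sketch assuming each logged command appears on a line containing `cmd: `, as in the excerpt; `commands` and `first_divergence` are hypothetical helpers, and the cookie masking only covers the two authentication commands shown.

```python
from itertools import zip_longest

# Commands whose trailing argument is a per-session secret (assumption:
# only the SAFECOOKIE handshake commands from the excerpt need masking).
SECRET_COMMANDS = ("AUTHCHALLENGE SAFECOOKIE", "AUTHENTICATE")


def commands(log_text):
    """Extract the 'cmd: ...' entries from a debug log, masking cookies."""
    out = []
    for line in log_text.splitlines():
        idx = line.find("cmd: ")
        if idx == -1:
            continue  # not a command line (e.g. "2494 GUARDs")
        cmd = line[idx + len("cmd: "):].strip()
        for prefix in SECRET_COMMANDS:
            if cmd.startswith(prefix):
                cmd = prefix + " [cookie]"
                break
        out.append(cmd)
    return out


def first_divergence(log_a, log_b):
    """Return (index, cmd_a, cmd_b) for the first differing command, or None."""
    pairs = zip_longest(commands(log_a), commands(log_b))  # pads with None
    for i, (ca, cb) in enumerate(pairs):
        if ca != cb:
            return i, ca, cb
    return None
```

Fed the two logs described here, this would be expected to point at the step after `GETINFO circuit-status`, where the working case issues `GETINFO ip-to-country/...` and the hanging case jumps to `GETINFO stream-status`.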

warner added the labels code-network, normal, defect, and 1.11.0 on 2016-10-09 15:01:32 +00:00
warner added this to the undecided milestone 2016-10-09 15:01:32 +00:00
Owner

All the "600"-level responses are async notifications (i.e. all the `circuit_*` etc. stuff), so it sort of seems like Tor "isn't doing stuff" in the hanging case (or at least, not creating new circuits).

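You can also try something like:

```python
# Assumes control_proto is an already-connected txtorcon TorControlProtocol.
def log_msg(msg):
    # Print each Tor log message forwarded over the control port.
    print("Tor: {}".format(msg))

control_proto.add_event_listener("INFO", log_msg)   # Tor INFO-level log events
control_proto.add_event_listener("DEBUG", log_msg)  # Tor DEBUG-level log events
```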
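To make the "600-level" distinction concrete: in Tor's control protocol every reply line starts with a three-digit status code followed by a separator (`-`, `+`, or a space), and codes in the 6xx range mark asynchronous events (such as `650 CIRC ...`) rather than replies to a command. A minimal sketch of a classifier for raw control-port lines, assuming you have them from somewhere other than txtorcon's already-parsed callbacks; `classify_reply_line` is a hypothetical helper, not a txtorcon API.

```python
def classify_reply_line(line):
    """Classify one raw Tor control-port reply line by its status code."""
    # A well-formed line is: 3-digit status code, then '-', '+', or ' '.
    if len(line) < 4 or not line[:3].isdigit() or line[3] not in "-+ ":
        return "unknown"
    code = int(line[:3])
    if 600 <= code <= 699:
        return "async-event"  # e.g. "650 CIRC 1 LAUNCHED ..."
    if 200 <= code <= 299:
        return "ok-reply"  # e.g. "250 OK" or "250-version=..."
    return "error-or-other"  # e.g. 5xx errors such as "552 ..."
```

Counting `async-event` lines in a raw capture from each machine would show directly whether the hanging Tor emits any circuit events at all.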
Author

[debian#835119](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=835119) and [tor#19969](https://trac.torproject.org/projects/tor/ticket/19969) might be related.
Reference: tahoe-lafs/trac-2024-07-25#2837