Clients in onion grid busy-wait if a storage node is unreachable #2364

Open
opened 2015-01-18 19:57:13 +00:00 by mhazinsk · 3 comments
mhazinsk commented 2015-01-18 19:57:13 +00:00
Owner

When running "torify tahoe start", the process uses 100% CPU while attempting to connect to a down storage node. Meanwhile, connections to all other storage nodes and the introducer show as intermittent on the grid status page so the grid is unusable.

I'm using the latest trunk version of Tahoe on Arch Linux with Torsocks 2.0.0

Here's the errors that get printed:

[Jan 18 14:42:14] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
2015-01-18 14:42:14-0500 [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=awo7) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
	
2015-01-18 14:42:14-0500 [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=poce) (during method=RIIntroducerPublisherAndSubscriberService_v2.tahoe.allmydata.com:subscribe_v2)
	
[Jan 18 14:42:24] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:42:33] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:42:56] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
2015-01-18 14:42:56-0500 [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=rnfm) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
	
[Jan 18 14:43:20] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:43:33] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:43:43] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
2015-01-18 14:43:43-0500 [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=xivp) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
	
2015-01-18 14:43:43-0500 [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=rnfm) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
	
2015-01-18 14:43:43-0500 [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=poce) (during method=RIIntroducerPublisherAndSubscriberService_v2.tahoe.allmydata.com:subscribe_v2)
	
[Jan 18 14:43:53] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:44:20] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:44:35] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:44:47] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
2015-01-18 14:44:47-0500 [-] Unhandled Error
	Traceback (most recent call last):
	Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=rnfm) (during method=RIStorageServer.tahoe.allmydata.com:get_version)
	
[Jan 18 14:44:58] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:45:23] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:45:34] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:45:53] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
[Jan 18 14:46:21] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528)
2015-01-18 14:46:21-0500 [-] Reconnector._failed (furl=[redacted]): [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.NegotiationError'>: no connection established within client timeout

This problem persists across restarts of tahoe. However, when I remove the tahoe directory, recreate a tahoe directory with "tahoe create-client ~/.tahoe", and restart tahoe, the problem goes away. So far I have not been able to reproduce this on any other machines.

When running "torify tahoe start", the process uses 100% CPU while attempting to connect to a down storage node. Meanwhile, connections to all other storage nodes and the introducer show as intermittent on the grid status page so the grid is unusable. I'm using the latest trunk version of Tahoe on Arch Linux with Torsocks 2.0.0 Here's the errors that get printed: ``` [Jan 18 14:42:14] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528) 2015-01-18 14:42:14-0500 [-] Unhandled Error Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=awo7) (during method=RIStorageServer.tahoe.allmydata.com:get_version) 2015-01-18 14:42:14-0500 [-] Unhandled Error Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=poce) (during method=RIIntroducerPublisherAndSubscriberService_v2.tahoe.allmydata.com:subscribe_v2) [Jan 18 14:42:24] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528) [Jan 18 14:42:33] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528) [Jan 18 14:42:56] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528) 2015-01-18 14:42:56-0500 [-] Unhandled Error Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=rnfm) (during method=RIStorageServer.tahoe.allmydata.com:get_version) [Jan 18 14:43:20] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528) [Jan 18 14:43:33] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528) [Jan 18 14:43:43] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528) 2015-01-18 14:43:43-0500 [-] Unhandled Error Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=xivp) (during method=RIStorageServer.tahoe.allmydata.com:get_version) 2015-01-18 14:43:43-0500 [-] Unhandled Error Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=rnfm) (during method=RIStorageServer.tahoe.allmydata.com:get_version) 2015-01-18 14:43:43-0500 [-] Unhandled Error Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=poce) (during method=RIIntroducerPublisherAndSubscriberService_v2.tahoe.allmydata.com:subscribe_v2) [Jan 18 14:43:53] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528) [Jan 18 14:44:20] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528) [Jan 18 14:44:35] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528) [Jan 18 14:44:47] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528) 2015-01-18 14:44:47-0500 [-] Unhandled Error Traceback (most recent call last): Failure: foolscap.ipb.DeadReferenceError: Connection was lost (to tubid=rnfm) (during method=RIStorageServer.tahoe.allmydata.com:get_version) [Jan 18 14:44:58] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528) [Jan 18 14:45:23] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528) [Jan 18 14:45:34] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528) [Jan 18 14:45:53] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528) [Jan 18 14:46:21] ERROR torsocks[15086]: Host unreachable (in socks5_recv_connect_reply() at socks5.c:528) 2015-01-18 14:46:21-0500 [-] Reconnector._failed (furl=[redacted]): [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.NegotiationError'>: no connection established within client timeout ``` This problem persists across restarts of tahoe. However, when I remove the tahoe directory, recreate a tahoe directory with "tahoe create-client ~/.tahoe", and restart tahoe, the problem goes away. So far I have not been able to reproduce this on any other machines.
tahoe-lafs added the
unknown
major
defect
1.10.0
labels 2015-01-18 19:57:13 +00:00
tahoe-lafs added this to the undecided milestone 2015-01-18 19:57:13 +00:00
tahoe-lafs added
code-network
and removed
unknown
labels 2015-01-18 22:26:24 +00:00
daira commented 2015-01-19 02:23:36 +00:00
Author
Owner

mhazinsk: did you retain the original .tahoe directory, and can you diff it against the current one? (If you paste the result, don't include contents of files under private/, just which files are different.)

mhazinsk: did you retain the original `.tahoe` directory, and can you diff it against the current one? (If you paste the result, don't include contents of files under `private/`, just which files are different.)
mhazinsk commented 2015-01-25 06:43:30 +00:00
Author
Owner

I did, here's the result:

[matt@teselecta .tahoe]$ diff . ../.tahoe-badwolf/
Only in ../.tahoe-badwolf/: announcement-seqnum
Common subdirectories: ./logs and ../.tahoe-badwolf/logs
diff ./my_nodeid ../.tahoe-badwolf/my_nodeid
1c1
< nqgzgxcihanqdsxeusrmlohitkrvamvs
---
> okossyjuxucppahd6pns3bvniubwmip5
diff ./node.pubkey ../.tahoe-badwolf/node.pubkey
1c1
< pub-v0-d27copzopwk4yegc6reegfdjobjgzcn6ylt2bd2xnlddwy3gcejq
---
> pub-v0-lxibmoqhu4ielykc4fid3duj7duvjxylwlafm2wc464ytpjzy57q
Common subdirectories: ./private and ../.tahoe-badwolf/private
diff ./tahoe.cfg ../.tahoe-badwolf/tahoe.cfg
11c11
< nickname = teselecta
---
> nickname = badwolf
Common subdirectories: ./tmp and ../.tahoe-badwolf/tmp
Only in .: twistd.pid

The "~/.tahoe-badwolf" directory was migrated from another machine and caused the problem shown in the description.

I did, here's the result: ``` [matt@teselecta .tahoe]$ diff . ../.tahoe-badwolf/ Only in ../.tahoe-badwolf/: announcement-seqnum Common subdirectories: ./logs and ../.tahoe-badwolf/logs diff ./my_nodeid ../.tahoe-badwolf/my_nodeid 1c1 < nqgzgxcihanqdsxeusrmlohitkrvamvs --- > okossyjuxucppahd6pns3bvniubwmip5 diff ./node.pubkey ../.tahoe-badwolf/node.pubkey 1c1 < pub-v0-d27copzopwk4yegc6reegfdjobjgzcn6ylt2bd2xnlddwy3gcejq --- > pub-v0-lxibmoqhu4ielykc4fid3duj7duvjxylwlafm2wc464ytpjzy57q Common subdirectories: ./private and ../.tahoe-badwolf/private diff ./tahoe.cfg ../.tahoe-badwolf/tahoe.cfg 11c11 < nickname = teselecta --- > nickname = badwolf Common subdirectories: ./tmp and ../.tahoe-badwolf/tmp Only in .: twistd.pid ``` The "~/.tahoe-badwolf" directory was migrated from another machine and caused the problem shown in the description.

I've encountered this problem after upgrading to torsocks 2.x too. The problem did not exist with torsocks 1.x (version 2 is a complete rewrite, btw).

I'm not sure what is causing it, but rather than figuring it out dawuud and I are working on native proxy support in foolscap so that we can stop using torsocks altogether. It is working with dawuud's branches of tahoe and foolscap, but it is not ready to be merged yet. See #517.

I've encountered this problem after upgrading to torsocks 2.x too. The problem did not exist with torsocks 1.x (version 2 is a complete rewrite, btw). I'm not sure what is causing it, but rather than figuring it out dawuud and I are working on native proxy support in foolscap so that we can stop using torsocks altogether. It is working with dawuud's branches of tahoe and foolscap, but it is not ready to be merged yet. See #517.
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#2364
No description provided.