ugly (temporary) error message when connecting to new onion server #2850

Open
opened 2016-12-17 22:18:32 +00:00 by warner · 4 comments

While running some smoketests on 1.12, I spun up a new onion-based server (tahoe create-node --listen=tor) on an existing grid, while watching a client on that grid.

I think the client hears about the new service before it's really ready to listen, because for the first 10 or 15 seconds, the client's Welcome page displays an ugly traceback squeezed into the "Connection" column (screenshots attached). The full traceback (obtained from the flogtool log) was:

21:56:02.363 L20 []#132 TubConnector created from p4pmhkdfwp2hdrm33xgvyoiijzkkj6cm to likvz52mql57cxczpf4nnthfjsm4tngj
21:56:02.363 L20 []#133 considering hint: tor:7uq4kbwg5cbiwlhv.onion:3457
21:56:02.363 L20 []#134 connecting to hint: tor:7uq4kbwg5cbiwlhv.onion:3457
21:56:03.671 L28 []#135
 FAILURE:
 [CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last):
   File "/home/warner/venv/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 498, in errback
     self._startRunCallbacks(fail)
   File "/home/warner/venv/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 565, in _startRunCallbacks
     self._runCallbacks()
   File "/home/warner/venv/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 651, in _runCallbacks
     current.result = callback(current.result, *args, **kw)
   File "/home/warner/venv/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1355, in gotResult
     _inlineCallbacks(r, g, deferred)
 --- <exception caught here> ---
   File "/home/warner/v

 -- TRACEBACK ELIDED --

 receive(data)
   File "/home/warner/venv/local/lib/python2.7/site-packages/ometa/tube.py", line 41, in receive
     status = self._interp.receive(data)
   File "/home/warner/venv/local/lib/python2.7/site-packages/ometa/interp.py", line 48, in receive
     for x in self.next:
   File "/home/warner/venv/local/lib/python2.7/site-packages/ometa/interp.py", line 177, in apply
     for x in self._apply(f, ruleName, argvals):
   File "/home/warner/venv/local/lib/python2.7/site-packages/ometa/interp.py", line 110, in _apply
     for x in rule():
   File "/home/warner/venv/local/lib/python2.7/site-packages/ometa/interp.py", line 256, in parse_Or
     for x in self._eval(subexpr):
   File "/home/warner/venv/local/lib/python2.7/site-packages/ometa/interp.py", line 241, in parse_And
     for x in self._eval(subexpr):
   File "/home/warner/venv/local/lib/python2.7/site-packages/ometa/interp.py", line 440, in parse_Action
     val = eval(expr.data, self.globals, self._localsStack[-1])
   File "<string>", line 1, in <module>

   File "/home/warner/venv/local/lib/python2.7/site-packages/txsocksx/client.py", line 153, in serverResponse
     raise e.socks5ErrorMap.get(status)()
 txsocksx.errors.HostUnreachable:
 ]
21:56:03.671 L20 []#136 connectorFinished (<foolscap.connection.TubConnector object at 0x7fe0acf919d0 from p4pmhkdfwp2hdrm33xgvyoiijzkkj6cm to likvz52mql57cxczpf4nnthfjsm4tngj>)

After about 15 or maybe 30 seconds, the server became reachable, and the error message cleared up. I believe the client was using the SOCKS port of the local Tor daemon, so the delay probably wasn't because the client was waiting for a local Tor to launch, nor waiting for a control connection to the local Tor.

I think the fix for this will be to have the foolscap Tor handler map this txsocksx.errors.HostUnreachable to something more concise, like just "onion server unreachable".

It's possible that we could do something on the server side too. The server was using a control port on the system Tor daemon to register the onion service. It will have published the .onion address as soon as it started (just after establishing a Tor-mediated connection to the Introducer), but it tells Tor to set up the server at startup too, and that process probably didn't finish by the time the client heard about the address. The .onion address is generated during tahoe create-node, and written into tahoe.cfg, so it's available very early, before the Tor control connection is even started.

So maybe we want to change the server to wait for some sort of ACK from the tor_provider, telling us that the Tor daemon says the onion service descriptors and rendezvous points are ready to go, before allowing the IntroducerClient to publish the storage server. That might help with the first launch of the server. It won't help with later launches, when the Introducer (and clients) already have the announcement, and they're busy trying+retrying to connect, but there's nothing we can do about that.

While running some smoketests on 1.12, I spun up a new onion-based server (`tahoe create-node --listen=tor`) on an existing grid, while watching a client on that grid. I think the client hears about the new service before it's really ready to listen, because for the first 10 or 15 seconds, the client's Welcome page displays an ugly traceback squeezed into the "Connection" column (screenshots attached). The full traceback (obtained from the flogtool log) was: ``` 21:56:02.363 L20 []#132 TubConnector created from p4pmhkdfwp2hdrm33xgvyoiijzkkj6cm to likvz52mql57cxczpf4nnthfjsm4tngj 21:56:02.363 L20 []#133 considering hint: tor:7uq4kbwg5cbiwlhv.onion:3457 21:56:02.363 L20 []#134 connecting to hint: tor:7uq4kbwg5cbiwlhv.onion:3457 21:56:03.671 L28 []#135 FAILURE: [CopiedFailure instance: Traceback from remote host -- Traceback (most recent call last): File "/home/warner/venv/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 498, in errback self._startRunCallbacks(fail) File "/home/warner/venv/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 565, in _startRunCallbacks self._runCallbacks() File "/home/warner/venv/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 651, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/home/warner/venv/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 1355, in gotResult _inlineCallbacks(r, g, deferred) --- <exception caught here> --- File "/home/warner/v -- TRACEBACK ELIDED -- receive(data) File "/home/warner/venv/local/lib/python2.7/site-packages/ometa/tube.py", line 41, in receive status = self._interp.receive(data) File "/home/warner/venv/local/lib/python2.7/site-packages/ometa/interp.py", line 48, in receive for x in self.next: File "/home/warner/venv/local/lib/python2.7/site-packages/ometa/interp.py", line 177, in apply for x in self._apply(f, ruleName, argvals): File "/home/warner/venv/local/lib/python2.7/site-packages/ometa/interp.py", line 110, in _apply for x in rule(): File "/home/warner/venv/local/lib/python2.7/site-packages/ometa/interp.py", line 256, in parse_Or for x in self._eval(subexpr): File "/home/warner/venv/local/lib/python2.7/site-packages/ometa/interp.py", line 241, in parse_And for x in self._eval(subexpr): File "/home/warner/venv/local/lib/python2.7/site-packages/ometa/interp.py", line 440, in parse_Action val = eval(expr.data, self.globals, self._localsStack[-1]) File "<string>", line 1, in <module> File "/home/warner/venv/local/lib/python2.7/site-packages/txsocksx/client.py", line 153, in serverResponse raise e.socks5ErrorMap.get(status)() txsocksx.errors.HostUnreachable: ] 21:56:03.671 L20 []#136 connectorFinished (<foolscap.connection.TubConnector object at 0x7fe0acf919d0 from p4pmhkdfwp2hdrm33xgvyoiijzkkj6cm to likvz52mql57cxczpf4nnthfjsm4tngj>) ``` After about 15 or maybe 30 seconds, the server became reachable, and the error message cleared up. I believe the client was using the SOCKS port of the local Tor daemon, so the delay probably wasn't because the client was waiting for a local Tor to launch, nor waiting for a control connection to the local Tor. I think the fix for this will be to have the foolscap Tor handler map this `txsocksx.errors.HostUnreachable` to something more concise, like just "onion server unreachable". It's possible that we could do something on the server side too. The server was using a control port on the system Tor daemon to register the onion service. It will have published the .onion address as soon as it started (just after establishing a Tor-mediated connection to the Introducer), but it tells Tor to set up the server at startup too, and that process probably didn't finish by the time the client heard about the address. The .onion address is generated during `tahoe create-node`, and written into tahoe.cfg, so it's available very early, before the Tor control connection is even started. So maybe we want to change the server to wait for some sort of ACK from the tor_provider, telling us that the Tor daemon says the onion service descriptors and rendezvous points are ready to go, before allowing the IntroducerClient to publish the storage server. That might help with the first launch of the server. It won't help with later launches, when the Introducer (and clients) already have the announcement, and they're busy trying+retrying to connect, but there's nothing we can do about that.
warner added the
code-frontend-web
normal
defect
1.11.0
labels 2016-12-17 22:18:32 +00:00
warner added this to the 1.13.0 milestone 2016-12-17 22:18:32 +00:00
Author

Attachment onion-connect-error.png (335967 bytes) added

error during connect

**Attachment** onion-connect-error.png (335967 bytes) added error during connect
Author

Attachment onion-connect-success.png (189680 bytes) added

successful connect

**Attachment** onion-connect-success.png (189680 bytes) added successful connect

Moving open issues out of closed milestones.

Moving open issues out of closed milestones.
exarkun modified the milestone from 1.13.0 to 1.15.0 2020-06-30 14:45:13 +00:00
Owner

Ticket retargeted after milestone closed

Ticket retargeted after milestone closed
meejah modified the milestone from 1.15.0 to soon 2021-03-30 18:40:19 +00:00
Sign in to join this conversation.
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#2850
No description provided.