SSL handshake failure with 1.12 storage nodes over I2P #2861
Reference: tahoe-lafs/trac-2024-07-25#2861
Steps to recreate:
--hide-ip --listen=i2p
and connections tcp = disabled.
Expected behaviour:
The storage node connects to all existing storage nodes that are online and reachable.
Actual behaviour:
The storage node connects fine to older storage nodes (running the patched 1.9.2 or 1.10.0 Tahoe-LAFS+I2P builds), but fails to connect to 1.12.0 storage nodes (including itself via loopback). Specifically, the web UI shows that it gets past "connecting" to "negotiating", and then throws:
Huh, I don't know what that could be. My hunch is that the I2P connection is not really established, and the negotiation messages are getting dropped or corrupted somehow.
Let's try to extract more information from foolscap:
flogtool tail -s out.flog CLIENTNODEDIR/private/logport.furl
When the storage node's announcement arrives (via the Introducer), the client will attempt to connect, and will record some of the negotiation process into out.flog. We're looking for deviations from the usual negotiation process, maybe something about an expected message not being seen.
If that doesn't yield anything immediately useful, the next step will be to modify the foolscap code on both sides and have them log everything they get over the connection. It'd be interesting to know whether they make it far enough to switch to TLS (and it's really the TLS handshake that's getting broken), or if something goes awry before that point (when they're still speaking HTTP-ish).
For background, Foolscap starts by making a plain TCP connection, then exchanges a very HTTP-like request and response, then both sides are supposed to do .start_tls(). So the connecting host will send GET /id/$tubid HTTP/1.1 and some headers (including Upgrade: TLS/1.0) and a double-newline, and then the very next byte will be the TLS negotiation (maybe CLIENTHELLO?). The receiving host will send HTTP/1.1 101 Switching Protocols and some headers and a double-newline, and then start on the TLS bytes.
Foolscap is expecting the connection it gets to be 8bit-clean and transparent. Can you think of any reason why the I2P proxy might be interpreting HTTP-like data inside the connection and maybe modifying the data or its behavior in response to what it sees?
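To make that byte layout concrete, here is a minimal Python sketch of the client side of the exchange, under stated assumptions: only the request line, the Upgrade: TLS/1.0 header and the blank-line terminator come from the description above; the real Foolscap request carries more headers, CRLF line endings are assumed as in HTTP, and build_request(), split_after_headers(), starts_tls_handshake() and the tubid value are hypothetical. It shows why the connection must be 8bit-clean: everything after the blank line is raw TLS, and a TLS Handshake record starts with the byte 0x16.

```python
# Minimal, simplified model of the pre-TLS exchange (hypothetical helpers).
# Only "GET /id/<tubid> HTTP/1.1", "Upgrade: TLS/1.0" and the blank-line
# terminator come from the description; real Foolscap sends more headers.
# CRLF line endings are assumed, as in HTTP.

HEADER_END = b"\r\n\r\n"
TLS_HANDSHAKE = 0x16  # TLS record content type for Handshake messages


def build_request(tubid: str) -> bytes:
    """Client side: HTTP-like upgrade request ending in a blank line."""
    lines = [f"GET /id/{tubid} HTTP/1.1", "Upgrade: TLS/1.0"]
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def split_after_headers(stream: bytes):
    """Return (header_block, remainder), or (None, stream) if incomplete.

    Everything in remainder must be passed through unmodified: it is the
    start of the TLS byte stream.
    """
    idx = stream.find(HEADER_END)
    if idx < 0:
        return None, stream
    return stream[:idx], stream[idx + len(HEADER_END):]


def starts_tls_handshake(remainder: bytes) -> bool:
    """True if the first post-header byte opens a TLS Handshake record."""
    return len(remainder) > 0 and remainder[0] == TLS_HANDSHAKE


# Client bytes on the wire: request headers, then the first TLS bytes.
wire = build_request("exampletubid") + bytes.fromhex("16 03 01 00 e0")
headers, rest = split_after_headers(wire)
print(headers.decode("ascii").splitlines())
print(starts_tls_handshake(rest))
```

Applied to the receiving host's side of the stream, the same split is a quick way to spot trouble: before the client's first TLS bytes have arrived, the server should have nothing left over after its 101 Switching Protocols headers.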
Attachment introducer-log.txt (15435 bytes) added
Log of 1.12 introducer receiving connection from 1.12 storage node
This issue affects a 1.12 storage node connecting to a 1.12 introducer as well. See the attached logs.
Attachment storage-log.txt (28300 bytes) added
Log of 1.12 storage node connecting to 1.12 introducer over I2P
Attachment client-to-introducer-1.12.ssldump.txt (6482 bytes) added
ssldump of communication between Tahoe 1.12 client and 1.12 introducer
Attachment client-to-introducer-1.11.ssldump.txt (3622 bytes) added
ssldump of communication between Tahoe 1.12 client and 1.11 introducer
The two new files show the network behaviour for a 1.12 client connecting over I2P to a 1.11 introducer vs a 1.12 introducer. The 1.11 introducer is my custom-patched build, but the patches don't touch negotiation at all, so it is equivalent to stock 1.11 for these purposes.
The only difference between the traces pre-TLS is that the 1.12 introducer sends the client a bunch of extra cruft at the end of its HTTP/1.1 101 Switching Protocols packet. The client doesn't seem to care, though.
The client then sends a ClientHello packet to the introducer, which AFAICT is identical in both cases. However, the 1.11 introducer responds with a ServerHello, while the 1.12 introducer closes the connection.
Note that the introducer-to-client cruft visible in the ssldump trace looks exactly the same (at least, the first few bytes match) as that in the additional dataReceived() calls in the earlier storage node logs.
Attachment client-to-introducer-1.12-noi2p.ssldump.txt (3493 bytes) added
ssldump of communication between Tahoe 1.12 client and 1.12 introducer using localhost TCP instead of I2P
Okay, it is definitely related to whatever that cruft is. I manually tweaked the tahoe.cfg files of the 1.12 client and 1.12 introducer to use localhost TCP instead of I2P. The cruft disappeared and the connection started working.
We're on a tight timeline for the Debian freeze. I think we need to get 1.12.1 released in about 24 hours to get it into Stretch. Any progress on this one?
Ticket retargeted after milestone closed
We did a lot of digging in today's devchat, and learned the following:
- The startTLS() method knows whether its connection is a client-like or a server-like connection (transport._tlsClientDefault), and tells TLS to use a ClientHello or ServerHello to match.
- The txtorcon onion-service listener uses a server-like connection, so that works too.
- transport.startTLS(ctx, normal=False) can be used to tell it to flip the direction, which would probably help.
We don't yet know a good way to tell Foolscap that it needs to pass in this argument. Some options:
- handler.hint_to_endpoint() could somehow return (endpoint, tls_is_reversed)
- normal = getattr(self.transport, "_foolscap_tls_is_normal", True)
- startTLS() to upcall with the right normal argument
One complication is that it isn't always obvious (to e.g. txi2p) whether the connection it was given is a client-like or server-like transport (or whether it's capable of startTLS at all). It's unfortunate that startTLS() takes normal= rather than isClient=. Foolscap knows for sure whether it wants TLS to be client-like or server-like, but when the only knob we have is normal=, we must also know whether the underlying ITransport is client-like or server-like (so we know when to reverse TLS's handling). As far as we've been able to tell, the ITransport client-vs-server flag is private, even though the normal= argument is public.
Some additional things to check before diving too deep into finding a good approach:
- Client: Upgrade message, and confirm that it really is a ClientHello. (It is supposed to be a ServerHello, but if startTLS is confused by txi2p using client-like connections, it makes sense that we'd send a ClientHello here.) We don't know why ssldump didn't parse it as such (maybe it wasn't expecting a TLS packet to appear in the middle of a protocol stream, which would imply that ssldump doesn't handle STARTTLS-like protocols very well). Either compare these bytes against a normal wireshark trace, or look up the TLS docs and manually check the packet format. str4d astutely noticed that the cruft bytes include things like "c0 30" and "c0 2c", which were identified by ssldump (in the "-noi2p.ssldump.txt" trace) as unrecognized ciphersuite values, and that only the ClientHello contains multiple ciphersuites (since the ServerHello only contains the decision). He also noticed that the Foolscap server shouldn't be sending any TLS messages at all until the client has sent the ClientHello: the TLS server makes the decision, so it can't send anything without first hearing the client's hello.
- normal=False and see if that makes the connections work
Other possibilities that we came up with:
We identified at least two concerns about the way txi2p is working, that shouldn't affect correctness but probably affect performance:
- txi2p.sam.stream.StreamAcceptReceiver.dataReceived: any application data that is received in the same chunk as the initial peer-destination line will be delayed. It gets stashed as self.initialData properly, but will not be delivered until the next dataReceived is called. If the peer sends an initial chunk and then waits for a response, the local application will never receive that chunk. This is not a problem for client-goes-first protocols like HTTP, but would cause a loss of progress for server-goes-first protocols like SMTP.
- StreamAcceptReceiver (txi2p/grammar.py) uses an anything:data clause to match all bytes once the parser has moved into the post-SAM State_readData state, and that clause probably just matches a single wildcard byte. This is sound, but probably bad for performance (especially for foolscap), since a large chain of python methods will be executed for every byte of the input. It would be fastest if large bytestrings could be transferred in complete buffers in a single call. We should do some performance tests on this and compare the CPU usage of a tahoe server (during file upload) for a given fixed data rate, I2P vs plain TCP. Ideally the txi2p parser would be bypassed completely once a connection has been moved to State_readData, similar to twisted.protocols.basic.LineReceiver.setRawMode(), but doing that safely requires careful attention to the .dataReceived() ordering/duplication/reentrancy concerns described above.
I've confirmed that a quick hack to call startTLS(ctx, normal=False) on an I2P-based Introducer (but not the client) was enough to allow a connection to get through. Not a solution yet, but it suggests we're headed in the right direction.
Hm, it might be cleanest to get a new API added to Twisted: maybe a side= argument to ITLSTransport.startTLS(). It would override normal=, and would explicitly declare the TLS stack to be "client" or "server", independent of the underlying transport. Foolscap knows exactly which TLS side it's supposed to be, so it could just set it directly. I'll see if there are any relevant Twisted tickets on this, and if not I'll add one.
Twisted#3278 appears to be the same issue (two AMP clients trying to use TLS). The patch on that ticket is 9 years old (!), so will undoubtedly need some cleanup, but it's probably a good place to start.
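Both this ticket and Twisted#3278 (two client-like transports that both need TLS) come down to the same arithmetic: with only normal= available, the caller has to combine the TLS side it wants with the transport's private client/server flag. The sketch below assumes the behaviour summarised in the devchat notes (normal=True follows the transport's default, normal=False flips it); tls_side() and normal_for_side() are hypothetical names, not Twisted APIs, and transport_is_client merely stands in for transport._tlsClientDefault.

```python
# Hypothetical helpers modelling the startTLS(ctx, normal=...) behaviour
# summarised above. transport_is_client stands in for Twisted's private
# transport._tlsClientDefault flag; neither function is a Twisted API.


def tls_side(transport_is_client: bool, normal: bool) -> str:
    """TLS role chosen by startTLS: normal=True follows the transport,
    normal=False flips it."""
    is_client = transport_is_client if normal else not transport_is_client
    return "client" if is_client else "server"


def normal_for_side(desired_side: str, transport_is_client: bool) -> bool:
    """The normal= value a side= argument would have to compute internally."""
    if desired_side not in ("client", "server"):
        raise ValueError(f"unknown TLS side: {desired_side!r}")
    wants_client = desired_side == "client"
    # Keep the transport's default when it already matches; flip otherwise.
    return wants_client == transport_is_client


# Plain TCP: the introducer's accepted connection is server-like.
print(normal_for_side("server", transport_is_client=False))
# I2P via txi2p (assumed client-like on the introducer too): it must flip.
print(normal_for_side("server", transport_is_client=True))
```

The second call reproduces the quick hack described above (normal=False on the introducer only). A side= argument would move this computation inside Twisted, so callers like Foolscap would no longer need the transport's private flag.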
And I've confirmed that the 16 03 01 00 e0 01 00 00 dc 03 03 22 71 sequence (in the "cruft") is a ClientHello: the 16 means "Handshake", 03 01 is TLS 1.0, 00 e0 is a packet length, 01 is ClientHello, then 00 00 dc is the handshake length (all according to the Wikipedia TLS diagram).
In order to make storage nodes usable in the short term, I've pushed a workaround for this bug to txi2p, which I will release shortly as 0.3.2. It's not particularly robust, as it assumes that the underlying transport always behaves as a client; I agree with warner that a new API in Twisted would be a much better solution.
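The byte-by-byte identification of the cruft as a ClientHello can be checked mechanically. This is a minimal sketch that decodes only the fixed TLS record header and handshake header fields; parse_tls_prefix() is a hypothetical helper, and the two cipher-suite names are the standard IANA names for the 0xC02C and 0xC030 codes str4d spotted. Note the consistency check it exposes: the record length (0xe0 = 224) equals the 4-byte handshake header plus the handshake length (0xdc = 220).

```python
# Decode the TLS record header and handshake header at the start of the
# "cruft" quoted above (hypothetical helper; header fields only).

HANDSHAKE_TYPES = {1: "ClientHello", 2: "ServerHello"}
# IANA names for the two cipher-suite codes seen in the cruft.
CIPHER_SUITES = {
    0xC02C: "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384",
    0xC030: "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
}


def parse_tls_prefix(data: bytes) -> dict:
    """Parse the first 11 bytes of a TLS handshake record."""
    if len(data) < 11:
        raise ValueError("need at least 11 bytes")
    return {
        "content_type": data[0],                     # 0x16 = Handshake
        "record_version": (data[1], data[2]),        # (3, 1) = TLS 1.0
        "record_length": int.from_bytes(data[3:5], "big"),
        "handshake_type": HANDSHAKE_TYPES.get(data[5], data[5]),
        "handshake_length": int.from_bytes(data[6:9], "big"),
        "client_version": (data[9], data[10]),       # (3, 3) = TLS 1.2
    }


cruft = bytes.fromhex("16 03 01 00 e0 01 00 00 dc 03 03 22 71")
info = parse_tls_prefix(cruft)
print(info)
print(CIPHER_SUITES[0xC030])
```

A ServerHello would carry handshake type 2 in the sixth byte, so this prefix alone is enough to show that the introducer's TLS stack was acting as a client.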
Moving open issues out of closed milestones.
Ticket retargeted after milestone closed