disconnect unresponsive servers (using foolscap's disconnectTimeout) #521

warner · 2008-09-24T04:11:26Z

#287 describes an important application-level fix we need to make: when
uploading or downloading files, don't depend upon timely responses from our
servers. Another aspect of this is to try to identify servers that are stuck
(e.g. trapped in an infinite loop, or having memory problems: something that
allows the TCP connections to look alive but prevents responses to the
Foolscap messages). Yesterday, prodtahoe7 got into a situation like this,
because of disk problems that got so bad we couldn't even log into the host
through the local console.

To this end, Foolscap offers two timer options: "keepaliveTimeout" (which
defaults to four minutes) and "disconnectTimeout" (which defaults to off). We
are considering activating disconnectTimeout as a stopgap until we implement
the #287 fix, to reduce the time during which a stuck server causes clients
to hang.

The keepaliveTimeout means that every $TIMEOUT the Tub-to-Tub protocol object
(called "banana") will check to see how long it has been since any data was
received on that connection. If this "age" is greater than $TIMEOUT, the Tub
sends a PING message to the other side. As soon as the remote Banana protocol
instance receives the PING, it will send back a PONG message. The idea is
that the PONG will reset the last-heard-from-age.
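A minimal sketch of the PING/PONG keepalive check described above. It is not Foolscap's code: the class `KeepaliveSketch` and its methods are hypothetical, time is passed in as explicit seconds instead of coming from a reactor, and "sending" a message just appends it to a list.

```python
class KeepaliveSketch:
    """Hypothetical model of one end of a Banana connection's keepalive logic."""

    def __init__(self, keepalive_timeout):
        self.keepalive_timeout = keepalive_timeout  # seconds
        self.last_heard_from = 0                    # time of most recent inbound data
        self.outbox = []                            # messages this side would send

    def data_received(self, now, message=None):
        # The only per-message bookkeeping: remember when we last heard anything.
        self.last_heard_from = now
        if message == "PING":
            # The peer is checking on us; answer immediately.
            self.outbox.append("PONG")

    def keepalive_timer_fired(self, now):
        # Called once every keepalive_timeout seconds.
        age = now - self.last_heard_from
        if age > self.keepalive_timeout:
            self.outbox.append("PING")


KT = 4 * 60
client, server = KeepaliveSketch(KT), KeepaliveSketch(KT)
client.keepalive_timer_fired(now=KT)      # age == KT, not greater: no PING yet
client.keepalive_timer_fired(now=2 * KT)  # age == 2*KT > KT: send a PING
server.data_received(now=2 * KT + 1, message="PING")  # server answers with a PONG
client.data_received(now=2 * KT + 2, message="PONG")  # the PONG resets the age
print(client.outbox, server.outbox, client.last_heard_from)  # ['PING'] ['PONG'] 482
```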

This approach, while cheap (the only operation that occurs during
dataReceived is to record the current time), means that the PING might occur
after 4 minutes of silence, or after almost 8 minutes (imagine that the
four-minute timer is reset at T=0 and data is received at T=1s; the timer
fires at T=4min, sees that age=3m59s, which is less than the timeout, and
resets; it fires again at T=8min, sees age=7m59s, and only then sends the PING).
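The 4-to-8-minute spread can be checked with a small simulation, assuming the timer first fires at T=KT and then every KT seconds, and that a PING goes out at the first tick where the age strictly exceeds KT. `ping_time` is a hypothetical helper, not a Foolscap function.

```python
def ping_time(last_rx, kt):
    """Return the tick (a multiple of kt) at which the PING is sent, given that
    data last arrived at time last_rx (seconds) and nothing arrives afterwards."""
    tick = kt
    while tick - last_rx <= kt:  # age not yet greater than the timeout
        tick += kt
    return tick


KT = 4 * 60
print(ping_time(1, KT))  # 480: the T=1s example from the text

# Silence before the PING, for data arriving at each second of one period.
silences = [ping_time(x, KT) - x for x in range(KT)]
print(min(silences), max(silences))  # 241 480: just over 4min up to 8min
```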

keepaliveTimeout is mainly intended to keep entries in NAT tables alive. It
also has the effect of giving TCP something to work with: TCP will drop a
connection that does not ACK the outbound data within some (rather long) time
period; the keepaliveTimeout ensures that each side tries to send at least a
few bytes every 8ish minutes, allowing TCP to drop the connection within a
few hours of a silent disconnect.

The other Foolscap timeout is "disconnectTimeout". It works the same way as
keepaliveTimeout, but when the timer fires and the silence is found to be too
long, it drops the connection. This timer is not enabled in Foolscap by
default because I wasn't sure what would be a safe and appropriate value to use.
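To see how the timers' phases affect the outcome, here is a sketch modelling a peer that goes completely silent at `last_rx`. It assumes each timer fires at `phase + k*period` (k >= 1) and acts at the first tick where the silence strictly exceeds its period; `first_action_time` and `stuck_peer_timeline` are hypothetical names, not Foolscap APIs.

```python
def first_action_time(last_rx, period, phase=0):
    """First tick phase + k*period (k >= 1) at which now - last_rx > period."""
    tick = phase + period
    while tick - last_rx <= period:
        tick += period
    return tick


def stuck_peer_timeline(last_rx, kt, dt, kt_phase=0, dt_phase=0):
    """Return (ping_at, disconnect_at) for a peer that never sends again."""
    ping_at = first_action_time(last_rx, kt, kt_phase)        # keepalive timer
    disconnect_at = first_action_time(last_rx, dt, dt_phase)  # disconnect timer
    return ping_at, disconnect_at


KT, DT = 4 * 60, 15 * 60
print(stuck_peer_timeline(1, KT, DT))                # (480, 1800)
print(stuck_peer_timeline(1, KT, DT, dt_phase=100))  # (480, 1000)
```

The same silence leads to a disconnect after roughly 30 or roughly 17 minutes, depending only on where the disconnect timer happens to be in its cycle.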

The metric of interest here is the min/max period of unresponsiveness after
which the connection will be dropped. This is nontrivial because of the way
that these two timers interact (i.e. it depends upon their relative phases).
If we set disconnectTimeout to 15 minutes, then after we send a PING, we
might wait anywhere from 7 to 38 minutes before disconnecting (DT-2*KT to
2*DT+2*KT, where DT is disconnectTimeout and KT is keepaliveTimeout).
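The quoted bounds can be reproduced directly. This only re-evaluates the stated expressions (DT = disconnectTimeout, KT = keepaliveTimeout, both in minutes); it does not independently derive them. `disconnect_window` is a hypothetical helper.

```python
def disconnect_window(dt, kt):
    """Quoted range (in minutes) of waiting after a PING before disconnecting."""
    return dt - 2 * kt, 2 * dt + 2 * kt


print(disconnect_window(15, 4))  # (7, 38)
print(disconnect_window(30, 4))  # (22, 68), the 30-minute case
```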

The problem is that the lack of inbound traffic (which would reset the timers
and prevent the disconnect) is not a good indicator of a stuck server. Client
A might be uploading a large amount of data to Server B. The server sees lots
of data arriving, so its keepalive and disconnect timers are happy. If the
server doesn't need to respond to the client for anything, it won't be
sending any data. Eventually (max=2*keepaliveTimeout) the client will send a
PING, but this could get stuck behind the data that's being sent, so the PONG
won't be sent until that PING finally makes it across the wire.

For Tahoe, the worst case here is when a client is uploading a file, which
involves sending a block of data (128KiB/3, roughly 40KiB) to each of 10
servers at the same time (roughly 400KiB in total). If this takes more than 7
minutes to transfer (the lower bound of the window above, corresponding to an
upstream rate of 975Bps/7.8kbps), then we're in danger of
abandoning one or more of the connections. The problem is worse if we're
uploading several files at once, or if the user's upstream pipe is being
shared with other applications or other computers.
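The arithmetic behind the 975Bps figure, as a quick check. It uses the rounded 400KiB total from the text and the 7-minute lower bound for DT=15min, KT=4min. `ping_delay` is a hypothetical helper that assumes a single FIFO upstream pipe, so a PING queued behind the upload waits until every byte ahead of it has been sent.

```python
KIB = 1024

segment = 128 * KIB
k = 3
servers = 10

block = segment / k              # per-server block for one segment
exact_total = servers * block    # what 10 blocks add up to exactly
print(round(block / KIB, 1), round(exact_total / KIB, 1))  # 42.7 426.7

total = 400 * KIB                # the rounded figure used in the text
window_s = 7 * 60                # 7-minute lower bound, in seconds

# Slowest upstream rate that still gets every block out inside the window.
rate_Bps = total / window_s
print(round(rate_Bps), "B/s")                 # 975 B/s
print(round(rate_Bps * 8 / 1000, 1), "kbps")  # 7.8 kbps


def ping_delay(queued_bytes, upstream_Bps):
    """Seconds a PING waits behind bytes already queued ahead of it."""
    return queued_bytes / upstream_Bps


print(round(ping_delay(total, rate_Bps)))     # 420: the whole 7 minutes
```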

Increasing the proposed disconnectTimeout to 30 minutes results in a 22-68
minute window of silence-before-disconnect.

It may be that the best fix would be to modify Foolscap to use a different
timer mechanism: a timer which fires once every keepaliveTimeout/4 would
reduce the variability considerably, while not increasing the quiescent CPU
usage by more than a factor of four. The range would then be DT-1.25*KT to
1.25*DT+1.25*KT, so KT=4min and DT=15min would give us 10-23.75min, and
DT=30min would give us 25-42.5min.
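A sketch of the proposed finer-grained timer, assuming it fires every KT/N seconds but still compares the age against the full KT; `ping_time` is a hypothetical helper. With N=4 the longest silence before a PING drops from 2*KT to 1.25*KT, which is presumably where the 1.25 factors come from. The `disconnect_window` helper generalizes the two quoted formulas to a tick of f*KT (f=1 and f=0.25 reproduce them); that generalization is an assumption, not something stated in this ticket.

```python
def ping_time(last_rx, kt, ticks_per_timeout):
    """First tick (every kt // ticks_per_timeout seconds) at which age > kt."""
    step = kt // ticks_per_timeout
    tick = step
    while tick - last_rx <= kt:
        tick += step
    return tick


KT = 4 * 60
for n in (1, 4):
    silences = [ping_time(x, KT, n) - x for x in range(KT)]
    print(n, min(silences), max(silences))
# 1 241 480   (up to 2*KT)
# 4 241 300   (up to 1.25*KT)


def disconnect_window(dt, kt, f):
    """Range in minutes, assuming a timer tick of f*KT (hypothetical model)."""
    return dt - (1 + f) * kt, (1 + f) * dt + (1 + f) * kt


print(disconnect_window(15, 4, 0.25))  # (10.0, 23.75)
print(disconnect_window(30, 4, 0.25))  # (25.0, 42.5)
```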

The real answer, of course, is that connections are nothing more than a
convenient fiction, and that we must be prepared to suffer the reality that
lies behind that curtain. The timeout tradeoffs in #287 are the real
questions to address.

warner added labels 2008-09-24 04:11:26 +00:00

warner added this to the 1.3.0 milestone 2008-09-24 04:11:26 +00:00

warner self-assigned this 2008-09-24 04:11:26 +00:00

zooko commented

2008-09-24 13:26:06 +00:00

See also #193 and #253.

zooko commented

2009-02-07 19:49:05 +00:00

This doesn't seem to be necessary for 1.3.0.

zooko removed this from the 1.3.0 milestone 2009-02-07 19:49:05 +00:00

davidsarah commented

2009-11-22 16:12:41 +00:00

Can the server send unsolicited PONGs to a client that is uploading to it?

(I agree that fixing #287 is the real solution.)

warner commented

2009-11-24 06:07:42 +00:00

Hm, I suppose. I guess that would take the form of a third timer, which keeps track of how long it's been since we last *sent* anything, and sends a PONG (or similar no-op message) when the timer fires. Perhaps give it the same value (and timer) as the keepalive timer, so the code that might send a PING will also always send a PONG. [Foolscap#143](http://foolscap.lothar.com/trac/ticket/143) has been opened for this one.
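A minimal sketch of that suggestion, assuming a single timer callback that checks both how long since we last received and how long since we last sent. `IdleSenderSketch` and its methods are hypothetical and are not the eventual Foolscap#143 implementation; time is passed in as explicit seconds.

```python
class IdleSenderSketch:
    """Hypothetical: one timer handles both the PING and the unsolicited PONG."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_rx = 0   # last time we received anything
        self.last_tx = 0   # last time we sent anything
        self.outbox = []

    def send(self, now, message):
        self.outbox.append(message)
        self.last_tx = now

    def data_received(self, now, message=None):
        self.last_rx = now
        if message == "PING":
            self.send(now, "PONG")

    def timer_fired(self, now):
        # Existing keepalive behaviour: we haven't heard from the peer.
        if now - self.last_rx > self.timeout:
            self.send(now, "PING")
        # Proposed addition: the peer hasn't heard from *us*, so send an
        # unsolicited PONG to reset its last-heard-from age. Skipped if a
        # PING was just sent, since send() already updated last_tx.
        if now - self.last_tx > self.timeout:
            self.send(now, "PONG")


server = IdleSenderSketch(timeout=240)
for t in range(1, 481):   # client uploads continuously; server never replies
    server.data_received(t)
    if t % 240 == 0:
        server.timer_fired(t)
print(server.outbox)      # ['PONG']
```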

davidsarah commented

2010-12-16 00:53:52 +00:00

A case possibly related to this was reported by Shu Lin [on tahoe-dev](http://tahoe-lafs.org/pipermail/tahoe-dev/2010-December/005727.html).

tahoe-lafs added this to the undecided milestone 2011-08-16 04:33:15 +00:00
