Timeout error when uploading a file with some SFTP clients, e.g. WinSCP #1041

Open
opened 2010-05-14 22:16:29 +00:00 by freestorm · 9 comments
freestorm commented 2010-05-14 22:16:29 +00:00

When uploading a randomly generated file, the upload stops at 100% with the error:

Host is not communicating for more than 15 seconds. Still waiting...
Warning. Aborting this operation will close connection!

Client:

WinSCP 4.2.7 (build 758)

OS:

Windows XP SP3 French

tahoe-lafs added the unknown, critical, defect, 1.6.1 labels 2010-05-14 22:16:29 +00:00
tahoe-lafs added this to the undecided milestone 2010-05-14 22:16:29 +00:00
tahoe-lafs added code-frontend, major and removed unknown, critical labels 2010-05-15 01:21:22 +00:00
davidsarah commented 2010-05-15 01:32:31 +00:00

IRC discussion (slightly edited):

> <FreeStorm> davidsarah: I'm testing with WinSCP; it displays a strange error message: "The host has not responded for more than 15 seconds, still waiting" [Cancel] [Help] (translated from French)
> <FreeStorm> But: my SFTP node is on my machine, and the Introducer and Helper are at another location over a VPN, so I need to test on the same LAN
> <davidsarah> any bug that causes a hang would probably result in that message from WinSCP, but 15 secs is a fairly short timeout
> <davidsarah> for example we can't guarantee that the latency of a 'close' request will be less than 15 secs
> <FreeStorm> davidsarah: yes, I think so
> <FreeStorm> davidsarah: it happens near the end of the transfer, I think
> <davidsarah> an upload?
> <davidsarah> what size of file?
> <FreeStorm> davidsarah: yes, upload (all files are random)
> <FreeStorm> 1 MByte => OK
> <FreeStorm> 10 MByte and 50 MByte => restarting many times, and after that it's okay
> <davidsarah> ah
> <davidsarah> so that does sound like 'close' latency
> [...]
> <davidsarah> that's irritating, because the only way we can shorten the close latency (without more extensive changes) is to return success from the 'close' before we know that the file has actually been uploaded
> <davidsarah> does WinSCP have any way to configure that timeout? (I'm guessing not)
> <davidsarah> unfortunately the SFTP protocol has no way to say, "yes I'm still doing that, be patient"
> <FreeStorm> I'm looking into the WinSCP configuration
> <davidsarah> please open a ticket for this timeout problem
> [...]
> * davidsarah thinks about how to solve that problem
> <davidsarah> I think this will have to be a known limitation of SFTP with some clients, for v1.7
> <FreeStorm> OK, I'm going to do the same test on a LAN; maybe these are Internet connection errors
> <davidsarah> we *could* have a config option to allow returning early success from the close, but I'm very reluctant to compromise on correctness/reliability here
davidsarah commented 2010-05-16 03:15:08 +00:00

It's just possible that we may be able to fix this by [sending keepalive packets](http://twistedmatrix.com/trac/browser/trunk/twisted/conch/ssh/transport.py?rev=25457#L488) on the connection. Whether this will work depends on whether the timeout is between a 'close' request and its response (in which case it won't help), or between any two SFTP packets.
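
A minimal sketch of what that could look like, assuming Twisted Conch's transport API (`SSHTransportBase.sendIgnore`) and a periodic `LoopingCall`; this is illustrative, not the actual Tahoe-LAFS code, and as noted above it only helps if the client's timer is reset by any incoming packet rather than only by the response to its pending request:

```python
# Illustrative sketch, not Tahoe-LAFS code: periodically send SSH_MSG_IGNORE
# packets so that a client's inactivity timer keeps getting reset while a
# slow operation (such as a 'close' that triggers an upload) is in progress.
# Assumes `transport` is a twisted.conch.ssh.transport.SSHServerTransport.

from twisted.internet.task import LoopingCall

KEEPALIVE_INTERVAL = 10  # seconds; must be shorter than the client's timeout

def start_keepalives(transport):
    # sendIgnore() emits an SSH_MSG_IGNORE packet, which the peer must
    # silently discard, so it is safe to send at any time.
    lc = LoopingCall(transport.sendIgnore, b"keepalive")
    lc.start(KEEPALIVE_INTERVAL, now=False)
    return lc  # call lc.stop() once the slow operation has completed
```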

zooko commented:

Using a random encryption key instead of convergent encryption would solve this, right? Then the upload from the Tahoe-LAFS gateway to the storage servers (or helper) could proceed at the same time as the upload from the SFTP client to the Tahoe-LAFS gateway is proceeding. The cost would be that you lose convergence. We did a measurement on the allmydata.com customer base's files at one point. Unfortunately I don't recall precisely and a quick search hasn't turned up my published notes, but I think the estimated space savings from convergence for that set was less than 1%.

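A tiny illustrative sketch (not Tahoe-LAFS's actual key-derivation code) of why convergence and streaming are in tension: a convergent key is a function of the entire plaintext (plus a convergence secret), so encryption cannot begin until the last byte has been received, whereas a random key is available before the first byte arrives:

```python
# Illustrative only -- simplified stand-ins, not Tahoe-LAFS's real key derivation.
import hashlib
import os

def convergent_key(convergence_secret: bytes, file_contents: bytes) -> bytes:
    # Needs the *whole* file up front, so the gateway must buffer the file
    # before it can start encrypting and uploading.
    return hashlib.sha256(convergence_secret + file_contents).digest()

def random_key() -> bytes:
    # Known immediately, so ciphertext can be produced as bytes arrive.
    return os.urandom(32)
```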
davidsarah commented 2010-05-16 16:51:27 +00:00

Replying to [zooko](/tahoe-lafs/trac-2024-07-25/issues/1041#issuecomment-77268):

> Using a random encryption key instead of convergent encryption would solve this, right?

Nope. The problem is that SFTP allows random access writes, so the SFTP client could write a large file, then go back and change the first byte just before the close.

It is possible to take advantage of streaming upload in the case where the file is opened with flags FXF_WRITE | FXF_APPEND (meaning that all writes will be at the end of file). Most clients don't use FXF_APPEND, though, even when they are going to write the file linearly.
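
A hedged sketch of the kind of flag check this implies, using the flag constants from Twisted Conch's SFTP support; the strategy names returned here are hypothetical, not part of the Tahoe-LAFS SFTP frontend:

```python
# Sketch: decide whether an upload can be streamed, based on the SFTP open flags.
from twisted.conch.ssh.filetransfer import FXF_APPEND, FXF_WRITE

def choose_upload_strategy(flags):
    if (flags & FXF_WRITE) and (flags & FXF_APPEND):
        # The client has promised that every write lands at the current end
        # of file, so bytes can be streamed to the grid as they arrive.
        return "streaming"
    # Random-access writes are possible, so the file has to be buffered
    # locally and uploaded only when the handle is closed.
    return "buffered"
```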

#935 could be implemented in a way that fixes this problem: the 'close' would cause the file to be stored durably on the gateway, which would be responsible for uploading it to the grid asynchronously (even if the gateway crashes and restarts). That would be at the expense of a looser consistency model: a successful 'close' would only guarantee that the file is immediately visible via this gateway, not other gateways.


zooko commented:

But in the common case that a client opens the file, writes the file in order from beginning to end, and closes the file (even though it doesn't give the `FXF_APPEND` flag), then using a random encryption key would make things work very well, whereas using convergent encryption makes things fail (if the file is large enough) or become unreliable (if we do write-caching). Am I right?

I hope that in the long run we extend Tahoe-LAFS to support out-of-order writes of immutable files too, so that the case you described would also be cleanly supported.


zooko, replying to his own [previous comment](/tahoe-lafs/trac-2024-07-25/issues/1041#issuecomment-77270):

> But in the common case that a client opens the file, writes the file in order from beginning to end, and closes the file (even though it doesn't give the `FXF_APPEND` flag), then using a random encryption key would make things work very well

No, this doesn't make sense. What would the Tahoe-LAFS gateway do if, after it had streamingly uploaded the file, the SFTP client seeked back to the beginning and wrote something?

> I hope that in the long run we extend Tahoe-LAFS to support out-of-order writes of immutable files too, so that the case you described would also be cleanly supported.

This might still make sense, but it requires more changes to the Tahoe-LAFS upload logic.
davidsarah commented 2010-05-25 22:17:38 +00:00

In order to make the SFTP frontend work correctly with sshfs, we are planning to make the following changes (the first has already been done):

  • files can be renamed and deleted while there are handles to them.
  • if a file is closed and then reopened before the close has completed, then the open will be delayed until the close has completed (a sketch of this appears below, after the lists).

This was necessary because sshfs returns success from a close call immediately after sending the FXP_CLOSE message, without waiting for a response from the SFTP server.

So we *could* now send the response to FXP_CLOSE immediately, without compromising consistency as viewed through the SFTP frontend. That would fix this bug, but with the following negative side-effects:

  • the upload might fail, in which case there would be no way to notify the client that it had failed.
  • the upload would not immediately be visible via non-SFTP frontends or other gateways.
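
A minimal sketch, using Twisted Deferreds, of the close/reopen serialization described above; the helper names (`note_close_started`, `open_when_safe`) are hypothetical, and this is not the actual `allmydata.frontends.sftpd` code:

```python
# Sketch: delay an open of a path until any in-flight close of that path completes.
from twisted.internet import defer

_pending_closes = {}  # path -> Deferred that fires when that close completes

def note_close_started(path, close_d):
    # Remember the in-flight close, and forget it once it finishes.
    _pending_closes[path] = close_d
    def _forget(result):
        _pending_closes.pop(path, None)
        return result  # pass the close's result through unchanged
    close_d.addBoth(_forget)

def open_when_safe(path, do_open):
    # do_open() performs the real open and may itself return a Deferred.
    pending = _pending_closes.get(path)
    if pending is None:
        return defer.maybeDeferred(do_open)
    waiter = defer.Deferred()
    def _release(result):
        waiter.callback(None)
        return result  # don't disturb the close's own callback chain
    pending.addBoth(_release)
    return waiter.addCallback(lambda _ign: do_open())
```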
davidsarah commented 2010-06-12 21:13:35 +00:00

This problem is documented in [wiki/SftpFrontend](wiki/SftpFrontend).
slush commented 2010-06-13 19:06:26 +00:00

A simple workaround works for me: set WinSCP -> Connection -> Timeouts to 6000 seconds (the maximum allowed). I successfully uploaded a 185 MB file through a 1 Mbit line on the first attempt.

To be honest, a second test with a large upload was also successful, but WinSCP crashed immediately after the upload finished :).

tahoe-lafs changed title from Error when uploading a file with WinSCP in SFTP to Timeout error when uploading a file with some SFTP clients, e.g. WinSCP 2011-06-28 17:51:53 +00:00
warner added code-frontend-ftp-sftp and removed code-frontend labels 2014-12-02 19:42:20 +00:00
Reference: tahoe-lafs/trac-2024-07-25#1041