SFTP: handle download failures correctly; remove use of IFinishableConsumer #1525

Closed
opened 2011-09-04 01:57:06 +00:00 by davidsarah · 19 comments
davidsarah commented 2011-09-04 01:57:06 +00:00
Owner

Brian noticed that a Deferred was being dropped at source:src/allmydata/frontends/sftpd.py@5179#L676. (Although this code changed when MDMF landed changeset:a87fc8ffab55dfd2, the immutable path of the [previous code]source:src/allmydata/frontends/sftpd.py@5127#L675 also dropped a Deferred, at line 680.)

It is incorrect to drop the Deferred, because that will cause SFTP read requests to hang if downloading the previous file contents fails. However fixing it while still allowing streaming reads is a bit tricky. The intention is that we download the file in parallel with accepting SFTP requests, and block each read request until the relevant part of the file is available (and prior writes have completed). If the download fails, then any SFTP read requests past the prefix that was read correctly should also fail. Also, if the file was written and not truncated, the SFTP close request should fail, at least in the case when the new contents depend on the part of the old contents that we weren't able to download.

Brian noticed that a Deferred was being dropped at source:src/allmydata/frontends/sftpd.py@5179#L676. (Although this code changed when MDMF landed changeset:a87fc8ffab55dfd2, the immutable path of the [previous code]source:src/allmydata/frontends/sftpd.py@5127#L675 also dropped a Deferred, at line 680.) It is incorrect to drop the Deferred, because that will cause SFTP read requests to hang if downloading the previous file contents fails. However fixing it while still allowing streaming reads is a bit tricky. The intention is that we download the file in parallel with accepting SFTP requests, and block each read request until the relevant part of the file is available (and prior writes have completed). If the download fails, then any SFTP read requests past the prefix that was read correctly should also fail. Also, if the file was written and not truncated, the SFTP close request should fail, at least in the case when the new contents depend on the part of the old contents that we weren't able to download.
tahoe-lafs added the
code-frontend
major
defect
1.9.0a1
labels 2011-09-04 01:57:06 +00:00
tahoe-lafs added this to the soon milestone 2011-09-04 01:57:06 +00:00
davidsarah commented 2011-09-04 02:03:57 +00:00
Author
Owner

Attachment fix-1525.darcs.patch (64091 bytes) added

SFTP: make sure that download failures are not dropped. fixes #1525

**Attachment** fix-1525.darcs.patch (64091 bytes) added SFTP: make sure that download failures are not dropped. fixes #1525
davidsarah commented 2011-09-04 05:01:41 +00:00
Author
Owner

Attachment broken-tests-for-1525.darcs.patch (57845 bytes) added

SFTP: tests for ref #1525, but test_openFile_write is failing depending on the order of previous tests :-( Can anyone help debugging this?

**Attachment** broken-tests-for-1525.darcs.patch (57845 bytes) added SFTP: tests for ref #1525, but test_openFile_write is failing depending on the order of previous tests :-( Can anyone help debugging this?
davidsarah commented 2011-09-04 05:07:01 +00:00
Author
Owner

In broken-tests-for-1525.darcs.patch, there seems to be a dependency on the test order:

davidsarah@shinier:~/tahoe/1.9alpha$ bin/tahoe debug trial --rterror allmydata.test.test_sftp.Handler.test_openFile_write allmydata.test.test_sftp.Handler.test_openFile_read allmydata.test.test_sftp.Handler.test_openDirectory_and_attrs
allmydata.test.test_sftp
  Handler
    test_openFile_write ...                                                [OK]
    test_openFile_read ...                                                 [OK]
    test_openDirectory_and_attrs ...                                       [OK]

-------------------------------------------------------------------------------
Ran 3 tests in 8.720s

PASSED (successes=3)
davidsarah@shinier:~/tahoe/1.9alpha$ bin/tahoe debug trial --rterror allmydata.test.test_sftp
allmydata.test.test_sftp
  Handler
    test_basic ...                                                         [OK]
    test_convert_error ...                                                 [OK]
    test_execCommand_and_openShell ...                                     [OK]
    test_extendedRequest ...                                               [OK]
    test_makeDirectory ...                                                 [OK]
    test_not_implemented ...                                               [OK]
    test_openDirectory_and_attrs ...                                       [OK]
    test_openFile_read ...                                                 [OK]
    test_openFile_write ... Traceback (most recent call last):
Failure: allmydata.interfaces.NoSharesError: no shares (need 3). Last failure: [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[Failure instance: Traceback: <type 'exceptions.OSError'>: [Errno 2] No such file or directory: 'sftp/openFile_read/servers/7r4gd6xu/storage/shares/hc/hcatcbxpnrcch37ihub4vi2shy/3'
/usr/lib/python2.6/dist-packages/twisted/internet/base.py:796:runUntilCurrent
/home/davidsarah/tahoe/1.9alpha/support/lib/python2.6/site-packages/foolscap-0.6.1-py2.6.egg/foolscap/eventual.py:26:_turn
/usr/lib/python2.6/dist-packages/twisted/internet/defer.py:318:callback
/usr/lib/python2.6/dist-packages/twisted/internet/defer.py:424:_startRunCallbacks
--- <exception caught here> ---
/usr/lib/python2.6/dist-packages/twisted/internet/defer.py:441:_runCallbacks
/home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:83:<lambda>
/home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:80:_call
/home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:66:_really_call
/home/davidsarah/tahoe/1.9alpha/src/allmydata/storage/immutable.py:314:remote_read
/home/davidsarah/tahoe/1.9alpha/src/allmydata/storage/immutable.py:89:read_share_data
/usr/lib/python2.6/genericpath.py:49:getsize
]'>
]
[ERROR]
    test_removeDirectory ...                                               [OK]
    test_removeFile ...                                                    [OK]
    test_renameFile ...                                                    [OK]
    test_renameFile_posix ...                                              [OK]

===============================================================================
[ERROR]: allmydata.test.test_sftp.Handler.test_openFile_write

Traceback (most recent call last):
Failure: allmydata.interfaces.NoSharesError: no shares (need 3). Last failure: [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[Failure instance: Traceback: <type 'exceptions.OSError'>: [Errno 2] No such file or directory: 'sftp/openFile_read/servers/7r4gd6xu/storage/shares/hc/hcatcbxpnrcch37ihub4vi2shy/3'
/usr/lib/python2.6/dist-packages/twisted/internet/base.py:796:runUntilCurrent
/home/davidsarah/tahoe/1.9alpha/support/lib/python2.6/site-packages/foolscap-0.6.1-py2.6.egg/foolscap/eventual.py:26:_turn
/usr/lib/python2.6/dist-packages/twisted/internet/defer.py:318:callback
/usr/lib/python2.6/dist-packages/twisted/internet/defer.py:424:_startRunCallbacks
--- <exception caught here> ---
/usr/lib/python2.6/dist-packages/twisted/internet/defer.py:441:_runCallbacks
/home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:83:<lambda>
/home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:80:_call
/home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:66:_really_call
/home/davidsarah/tahoe/1.9alpha/src/allmydata/storage/immutable.py:314:remote_read
/home/davidsarah/tahoe/1.9alpha/src/allmydata/storage/immutable.py:89:read_share_data
/usr/lib/python2.6/genericpath.py:49:getsize
]'>
]
-------------------------------------------------------------------------------
Ran 13 tests in 20.290s

FAILED (errors=1, successes=12)

Notice that the error in test_openfile_write refers to the directory 'sftp/openFile_read/servers/7r4gd6xu/storage/shares/hc/hcatcbxpnrcch37ihub4vi2shy/3' being missing, but that is the wrong directory -- all the basedirs in each test should be unique, and the ones in test_openFile_write should start with sftp/openFile_write. Can anyone see what is going on?

In [broken-tests-for-1525.darcs.patch](/tahoe-lafs/trac-2024-07-25/attachments/000078ac-77bf-3f68-0d13-0fbf7cf0781e), there seems to be a dependency on the test order: ``` davidsarah@shinier:~/tahoe/1.9alpha$ bin/tahoe debug trial --rterror allmydata.test.test_sftp.Handler.test_openFile_write allmydata.test.test_sftp.Handler.test_openFile_read allmydata.test.test_sftp.Handler.test_openDirectory_and_attrs allmydata.test.test_sftp Handler test_openFile_write ... [OK] test_openFile_read ... [OK] test_openDirectory_and_attrs ... [OK] ------------------------------------------------------------------------------- Ran 3 tests in 8.720s PASSED (successes=3) davidsarah@shinier:~/tahoe/1.9alpha$ bin/tahoe debug trial --rterror allmydata.test.test_sftp allmydata.test.test_sftp Handler test_basic ... [OK] test_convert_error ... [OK] test_execCommand_and_openShell ... [OK] test_extendedRequest ... [OK] test_makeDirectory ... [OK] test_not_implemented ... [OK] test_openDirectory_and_attrs ... [OK] test_openFile_read ... [OK] test_openFile_write ... Traceback (most recent call last): Failure: allmydata.interfaces.NoSharesError: no shares (need 3). Last failure: [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[Failure instance: Traceback: <type 'exceptions.OSError'>: [Errno 2] No such file or directory: 'sftp/openFile_read/servers/7r4gd6xu/storage/shares/hc/hcatcbxpnrcch37ihub4vi2shy/3' /usr/lib/python2.6/dist-packages/twisted/internet/base.py:796:runUntilCurrent /home/davidsarah/tahoe/1.9alpha/support/lib/python2.6/site-packages/foolscap-0.6.1-py2.6.egg/foolscap/eventual.py:26:_turn /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:318:callback /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:424:_startRunCallbacks --- <exception caught here> --- /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:441:_runCallbacks /home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:83:<lambda> /home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:80:_call /home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:66:_really_call /home/davidsarah/tahoe/1.9alpha/src/allmydata/storage/immutable.py:314:remote_read /home/davidsarah/tahoe/1.9alpha/src/allmydata/storage/immutable.py:89:read_share_data /usr/lib/python2.6/genericpath.py:49:getsize ]'> ] [ERROR] test_removeDirectory ... [OK] test_removeFile ... [OK] test_renameFile ... [OK] test_renameFile_posix ... [OK] =============================================================================== [ERROR]: allmydata.test.test_sftp.Handler.test_openFile_write Traceback (most recent call last): Failure: allmydata.interfaces.NoSharesError: no shares (need 3). Last failure: [Failure instance: Traceback (failure with no frames): <class 'foolscap.tokens.RemoteException'>: <RemoteException around '[Failure instance: Traceback: <type 'exceptions.OSError'>: [Errno 2] No such file or directory: 'sftp/openFile_read/servers/7r4gd6xu/storage/shares/hc/hcatcbxpnrcch37ihub4vi2shy/3' /usr/lib/python2.6/dist-packages/twisted/internet/base.py:796:runUntilCurrent /home/davidsarah/tahoe/1.9alpha/support/lib/python2.6/site-packages/foolscap-0.6.1-py2.6.egg/foolscap/eventual.py:26:_turn /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:318:callback /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:424:_startRunCallbacks --- <exception caught here> --- /usr/lib/python2.6/dist-packages/twisted/internet/defer.py:441:_runCallbacks /home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:83:<lambda> /home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:80:_call /home/davidsarah/tahoe/1.9alpha/src/allmydata/test/no_network.py:66:_really_call /home/davidsarah/tahoe/1.9alpha/src/allmydata/storage/immutable.py:314:remote_read /home/davidsarah/tahoe/1.9alpha/src/allmydata/storage/immutable.py:89:read_share_data /usr/lib/python2.6/genericpath.py:49:getsize ]'> ] ------------------------------------------------------------------------------- Ran 13 tests in 20.290s FAILED (errors=1, successes=12) ``` Notice that the error in test_openfile_write refers to the directory `'sftp/openFile_read/servers/7r4gd6xu/storage/shares/hc/hcatcbxpnrcch37ihub4vi2shy/3'` being missing, but that is the wrong directory -- all the basedirs in each test should be unique, and the ones in test_openFile_write should start with `sftp/openFile_write`. Can anyone see what is going on?
davidsarah commented 2011-09-05 05:46:26 +00:00
Author
Owner

Attachment more-deterministically-wrong.darcs.patch (77500 bytes) added

Version of broken test that is more deterministic (allmydata.test.test_sftp2)

**Attachment** more-deterministically-wrong.darcs.patch (77500 bytes) added Version of broken test that is more deterministic (allmydata.test.test_sftp2)

note: this will also fix the deprecation warning seen in #1564

note: this will also fix the deprecation warning seen in #1564

This ticket is marked review-needed but the comments make it sound like it isn't ready for review because some of the tests are wrong. If that's true, please remove review-needed.

This ticket is marked `review-needed` but the comments make it sound like it isn't ready for review because some of the tests are wrong. If that's true, please remove `review-needed`.
davidsarah commented 2011-12-31 20:23:05 +00:00
Author
Owner

The tests are indeed wrong.

The tests are indeed wrong.
tahoe-lafs modified the milestone from soon to 1.10.0 2012-04-01 00:03:03 +00:00

On this morning's dev call, we agreed that if the tests can be made to pass soon (in the next week), we can land this, else we'll kick it out to 1.11. davidsarah is still on the hook to fix them.

On this morning's dev call, we agreed that if the tests can be made to pass soon (in the next week), we can land this, else we'll kick it out to 1.11. davidsarah is still on the hook to fix them.
davidsarah commented 2013-01-08 03:00:54 +00:00
Author
Owner

The current work in progress on this ticket, not including tests, is at https://github.com/davidsarah/tahoe-lafs/commits/1525-sftp-failures. Currently the SFTP tests hang at test_openFile_read, and it doesn't look as though I'll have time to fix that for 1.10, so I'm bumping this to 1.11.

The current work in progress on this ticket, not including tests, is at <https://github.com/davidsarah/tahoe-lafs/commits/1525-sftp-failures>. Currently the SFTP tests hang at `test_openFile_read`, and it doesn't look as though I'll have time to fix that for 1.10, so I'm bumping this to 1.11.
tahoe-lafs modified the milestone from 1.10.0 to 1.11.0 2013-01-08 03:00:54 +00:00
davidsarah commented 2013-02-20 21:59:08 +00:00
Author
Owner

I'm bringing this back into the 1.10 milestone, because (as mentioned in #1295), Twisted 12.3.0 has removed IFinishableConsumer. This patch removes use of that interface.

I'm bringing this back into the 1.10 milestone, because (as mentioned in #1295), Twisted 12.3.0 has removed `IFinishableConsumer`. This patch removes use of that interface.
tahoe-lafs modified the milestone from 1.11.0 to 1.10.0 2013-02-20 21:59:08 +00:00
tahoe-lafs changed title from SFTP: handle download failures correctly to SFTP: handle download failures correctly; remove use of IFinishableConsumer 2013-02-20 21:59:08 +00:00

So the next step is for davidsarah to fix the unit test?

So the next step is for davidsarah to fix the unit test?
davidsarah commented 2013-02-25 22:26:47 +00:00
Author
Owner

Replying to zooko:

So the next step is for davidsarah to fix the unit test?

Yes (and rebase the patch).

Replying to [zooko](/tahoe-lafs/trac-2024-07-25/issues/1525#issuecomment-85495): > So the next step is for davidsarah to fix the unit test? Yes (and rebase the patch).

The import error part of this (formerly ticket #1926) is a blocker for 1.10.

Now, maybe we should re-open #1926 and fix it in the easy way (by constraining the Twisted version to not allow the latest Twisted). But I don't think so. I'd rather fix this ticket, #1525.

The import error part of this (formerly ticket #1926) is a blocker for 1.10. Now, *maybe* we should re-open #1926 and fix it in the easy way (by constraining the Twisted version to not allow the latest Twisted). But I don't think so. I'd rather fix this ticket, #1525.
davidsarah commented 2013-03-10 23:56:03 +00:00
Author
Owner

I'm working on this now.

I'm working on this now.
davidsarah commented 2013-03-19 02:43:21 +00:00
Author
Owner
~~Review needed for <https://github.com/tahoe-lafs/tahoe-lafs/pull/33>.~~
davidsarah commented 2013-03-19 05:51:47 +00:00
Author
Owner

Committed in 48a2989e and 50c65629. Still needs review.

Committed in [48a2989e](https://github.com/tahoe-lafs/tahoe-lafs/commit/48a2989ee1a1d31d91408fa7e4bf5f9b25b01b76) and [50c65629](https://github.com/tahoe-lafs/tahoe-lafs/commit/50c6562901449427c1189890b9b8a1e4e5c81478). Still needs review.

marking as "critical" to indicate this is a 1.10 blocker. All it needs is someone (probably me) to review-after-the-fact.

marking as "critical" to indicate this is a 1.10 blocker. All it needs is someone (probably me) to review-after-the-fact.
warner added
critical
and removed
major
labels 2013-03-20 21:31:20 +00:00

I spent a couple of hours on the plane studying the SFTP code. Wow, it's complex. It may need to be that complex, but it sure was hard to get my head around the control flows. I've got a list of code-cleanup suggestions, but I suspect most of them were applicable to the pre-#1525 code as well, so don't really qualify as review for this ticket. I think I'll open a separate ticket to discuss them. If Daira's manual and automated testing shows these changes to have worked correctly, I'm +1 on them too.

I spent a couple of hours on the plane studying the SFTP code. Wow, it's complex. It may need to be that complex, but it sure was hard to get my head around the control flows. I've got a list of code-cleanup suggestions, but I suspect most of them were applicable to the pre-#1525 code as well, so don't really qualify as review for this ticket. I think I'll open a separate ticket to discuss them. If Daira's manual and automated testing shows these changes to have worked correctly, I'm +1 on them too.
warner added the
fixed
label 2013-04-15 05:38:12 +00:00
daira commented 2013-04-20 23:35:25 +00:00
Author
Owner

Replying to warner:

I've got a list of code-cleanup suggestions, but I suspect most of them were applicable to the pre-#1525 code as well, so don't really qualify as review for this ticket. I think I'll open a separate ticket to discuss them.

Please do, while they are fresh in your mind!

Replying to [warner](/tahoe-lafs/trac-2024-07-25/issues/1525#issuecomment-85504): > I've got a list of code-cleanup suggestions, but I suspect most of them were applicable to the pre-#1525 code as well, so don't really qualify as review for this ticket. I think I'll open a separate ticket to discuss them. Please do, while they are fresh in your mind!
Sign in to join this conversation.
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#1525
No description provided.