SFTP+SSHFS hangs for second concurrent operation #1976

Open
opened 2013-05-23 16:34:10 +00:00 by luckyredhot · 12 comments
luckyredhot commented 2013-05-23 16:34:10 +00:00
Owner

I am using the Tahoe-LAFS SFTP frontend with SSHFS on Ubuntu 12.04.
If I run a second operation (simply "ls" or "du") while a first write is in progress, the second one can sometimes hang completely. It does not even stop when sent SIGKILL, so I have to kill the parent bash session.

Tahoe-LAFS versions 1.9.2 and 1.10.0 are both affected.

SSHFS mount options:

sshfs -p 8022 -o uid=33 -o gid=33 -o nonempty -o allow_other -o idmap=user tahoe@127.0.0.1:/ /mnt/tahoe

If this is an SFTP issue, it should be fixed.
If this is an SSHFS issue, then we probably have to find another client or some workaround (perhaps two sshfs mounts, one for writing and one for reading).
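
The two-mount workaround mentioned above could be sketched roughly as follows. This is only an illustration: the mount points `/mnt/tahoe-rw` and `/mnt/tahoe-ro` and the helper name are hypothetical, and it has not been verified that separate SSHFS connections actually avoid the hang. The function only prints the commands so they can be reviewed before running them.

```shell
#!/usr/bin/env bash
# Print (do not run) two sshfs commands: a read-write mount for copying
# and a read-only mount for listings. Both mount points are hypothetical.
print_split_mounts() {
    local common="-p 8022 -o uid=33 -o gid=33 -o nonempty -o allow_other -o idmap=user"
    local remote="tahoe@127.0.0.1:/"
    echo "sshfs $common $remote /mnt/tahoe-rw"
    echo "sshfs $common -o ro $remote /mnt/tahoe-ro"
}
```

Calling `print_split_mounts` prints the two command lines; the options are the ones from the mount command above, plus the standard `ro` mount option on the second mount.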

Any help is appreciated :)
Please also suggest commands I can run when the issue occurs to gather debug information.

Thanks!
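
A process that does not go away after SIGKILL is typically stuck in uninterruptible sleep (state `D`), waiting inside the kernel, here presumably on the FUSE mount. As one possible way to gather information while the hang is happening, the sketch below filters `ps` output for such processes. The function name is made up for illustration, and the `ps` column options in the usage comment assume the procps version shipped with Ubuntu.

```shell
#!/usr/bin/env bash
# Keep the header line plus every line whose STAT column (2nd field)
# starts with "D" (uninterruptible sleep).
filter_uninterruptible() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Usage while the hang is happening (wchan shows what the process waits on):
#   ps -eo pid,stat,wchan:32,args | filter_uninterruptible
```

If the hung "ls" or "du" shows up in this list, that would point to the process waiting on the mount rather than ignoring the signal.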

tahoe-lafs added the
code-frontend
normal
defect
1.10.0
labels 2013-05-23 16:34:10 +00:00
tahoe-lafs added this to the undecided milestone 2013-05-23 16:34:10 +00:00
luckyredhot commented 2013-05-23 16:36:21 +00:00
Author
Owner

Attachment tahoe_version (376 bytes) added

tahoe --version

daira commented 2013-05-23 16:44:58 +00:00
Author
Owner

To get debugging output from sshfs, restart it in the foreground with options:

-o debug,sshfs_debug,loglevel=debug

To get debugging output from the gateway, see the Realtime Logging section of source:docs/logging.rst.
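
Combined with the mount options from the original report, the foreground debug invocation might look like the sketch below. The helper name and the default log file name `sshfs-debug.log` are assumptions; the function only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Print (do not run) a foreground sshfs command with the debug options above.
# $1: log file name (hypothetical default: sshfs-debug.log)
print_debug_mount() {
    local logfile="${1:-sshfs-debug.log}"
    local opts="-p 8022 -o uid=33 -o gid=33 -o nonempty -o allow_other -o idmap=user"
    local debug="-o debug,sshfs_debug,loglevel=debug"
    # Debug output goes to stderr, so merge it into stdout before saving it.
    echo "sshfs -f $opts $debug tahoe@127.0.0.1:/ /mnt/tahoe 2>&1 | tee $logfile"
}
```

The existing mount has to be unmounted first (for example with `fusermount -u /mnt/tahoe`) before the printed command is run.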

tahoe-lafs changed title from FTPS+SSHFS hangs for second operation to SFTP+SSHFS hangs for second concurrent operation 2013-05-23 16:44:58 +00:00
luckyredhot commented 2013-06-11 09:20:34 +00:00
Author
Owner

OK, I've caught the issue.
It happens when:

  1. One write operation is in progress (I am constantly copying files to a grid folder).
  2. A second operation tries to get a listing or attributes. It usually does not happen the first time, but running "ls" repeatedly causes all operations to freeze for a long time. In my case there were only 5 files in the folder, but the "ls" operation took 40 (!) seconds. With hundreds of files it would take forever.

See the attached logs. I issued ls before [80576] LSTAT
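
The reproduction described above could be scripted roughly as follows: start a copy into the mount in the background, then time repeated listings. This is a sketch only; the function names, the source directory argument, and the default of 10 rounds are assumptions. If a listing hangs in the way described, the script hangs with it.

```shell
#!/usr/bin/env bash
# Print how many whole seconds one "ls -l" of the given directory takes.
time_listing() {
    local dir="$1" start end
    start=$(date +%s)
    ls -l "$dir" > /dev/null
    end=$(date +%s)
    echo $((end - start))
}

# Copy files into the mount in the background and time listings meanwhile.
# $1: source directory, $2: mount point, $3: number of listings (default 10)
reproduce_hang() {
    local src="$1" mount="$2" rounds="${3:-10}" i
    cp -r "$src"/. "$mount"/ &
    local copy_pid=$!
    for ((i = 1; i <= rounds; i++)); do
        echo "ls #$i took $(time_listing "$mount")s"
        sleep 1
    done
    wait "$copy_pid"
}
```

For example, `reproduce_hang ~/testfiles /mnt/tahoe 20` (with `~/testfiles` standing in for any directory of files to upload) would print one timing line per listing.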

luckyredhot commented 2013-06-11 09:22:22 +00:00
Author
Owner

Attachment Tahoe-LAFS_SSHFS_Debug_001.log (2589 bytes) added


Thanks for the bug report, luckyredhot! Is there any incident report file generated by the LAFS gateway when this happens? If not, could you force it to generate one? See wiki/HowToReportABug for instructions.

luckyredhot commented 2013-06-14 15:06:42 +00:00
Author
Owner

Attachment incident-2013-06-11--12-22-26Z-oqcgkpa.flog.bz2 (31127 bytes) added

incident file

luckyredhot commented 2013-06-14 15:07:12 +00:00
Author
Owner

Incident file has been attached.
Hope it'll be helpful.

luckyredhot commented 2013-06-21 07:40:15 +00:00
Author
Owner

Daira,
thanks for yesterday's analysis. What next steps can we take?
Perhaps I could try to reproduce the issue once more to get additional logs?
Or do you think upgrading to 1.10 might also help? (AFAIK SFTP wasn't changed there since 1.9.2.)

daira commented 2013-06-21 09:32:28 +00:00
Author
Owner

SFTP was actually modified in 1.10 to improve error handling; I doubt it affects this bug, but it may help slightly in debugging. I'm going to try to reproduce the problem myself, but please feel free to attach another log, since the file incident-2013-06-11--12-22-26Z-oqcgkpa.flog.bz2 seems to be corrupted in some way.

It's unfortunate that the sshfs debug log doesn't include timestamps that could be correlated with the foolscap log.
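
One way to work around the missing timestamps would be to prefix each line of the sshfs debug output with the current UTC time as it is produced, so that it can be lined up with the foolscap log by hand. A minimal sketch with a made-up function name; the timestamp format is an arbitrary choice and is not taken from the foolscap tools.

```shell
#!/usr/bin/env bash
# Prefix every line read from stdin with a UTC timestamp.
stamp_lines() {
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s %s\n' "$(date -u '+%Y-%m-%d %H:%M:%SZ')" "$line"
    done
}

# Usage (sshfs writes its debug output to stderr):
#   sshfs -f -p 8022 -o debug,sshfs_debug,loglevel=debug \
#       tahoe@127.0.0.1:/ /mnt/tahoe 2>&1 | stamp_lines | tee sshfs-debug.log
```

The timestamps record when each line was read from the pipe, so they can lag slightly behind the actual event if the output is buffered.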

luckyredhot commented 2013-06-25 06:44:35 +00:00
Author
Owner

What do you think about performing a partial grid upgrade (for example, upgrading 2 of the existing 5 nodes to 1.10) and trying to catch the issue on both 1.9.2 and 1.10 nodes of the same grid? Does that sound reasonable?


Replying to luckyredhot:

What do you think about performing a partial grid upgrade (for example, upgrading 2 of the existing 5 nodes to 1.10) and trying to catch the issue on both 1.9.2 and 1.10 nodes of the same grid? Does that sound reasonable?

Dear Oleksandr:

I would assume that the storage servers have nothing to do with this bug. However, since I don't understand this bug, maybe my assumption is bad.

However, I suspect you'd get better debugging information for your effort if you tried different versions of Tahoe-LAFS for the gateway rather than the servers.

daira commented 2013-06-26 00:23:26 +00:00
Author
Owner

I agree with zooko that this is unlikely to be related to the storage server versions.

warner added
code-frontend-ftp-sftp
and removed
code-frontend
labels 2014-12-02 19:50:10 +00:00
Reference: tahoe-lafs/trac-2024-07-25#1976