review #1037 (SFTP) #1106

Open
opened 2010-06-29 02:27:29 +00:00 by zooko · 7 comments

Per comment:77205, the patches that fixed #1037 need to be reviewed. I've reviewed about half of them. I'll start posting my notes.

Per [comment:77205](/tahoe-lafs/trac-2024-07-25/issues/1037#issuecomment-77205), the patches that fixed #1037 need to be reviewed. I've reviewed about half of them. I'll start posting my notes.
zooko added the
code
major
defect
1.7.0
labels 2010-06-29 02:27:29 +00:00
zooko added this to the 1.7.1 milestone 2010-06-29 02:27:29 +00:00
zooko self-assigned this 2010-06-29 02:27:29 +00:00
Author

Review notes (the review is not complete)

David-Sarah: overall this is excellent work! It is readable and correct, with a few but not many parts that are un-idiomatic.

Things that need to be responded to:

  • Hey look sftpd.noisy is True. Is that intentional?
  • Maybe document this with "assert" (and check it, too, when asserts are compiled in): # invariant: self.download_size <= self.current_size
  • Is it our job to unregister any extant producer before self.producer=p in registerProducer()?
  • !! Is that a busy eventually loop in _iterate() ? Bad!
  • add doc for self.async. I guess it means download is finished.
  • add doc for _convert_error()
  • data[len(data))]offset:min(offset+length, is the same as dataoffset:offset+length
  • Why do we need this catch-all-exceptions? Because err.value is not always an exception? This is noisy to the reader of source code.
    try:
        if noisy: logmsg(traceback.format_exc(err.value), level=NOISY)
    except:  # pragma: no cover
        pass
  • "The message argument to SFTPError must not reveal information that might compromise anonymity." Huh? Who's anonymity might be revealed to whom?
  • The following TODO can be removed based on testing in the 1.7.0 development cycle (and feedback from 1.7.0 release) TODO: check that clients are okay with this being a "?". (They should be because the longname is intended for human consumption.)

Ideas that don't necessarily need to be responded to:

  • class OverwriteableFileConsumer could use unit tests
  • I think ".finish()" means the download is finished but local writes can proceed apace; docstring needed; Oh! I guess (based on the name) that it is defined by IFinishableConsumer. Maybe docstring that fact.
  • # TODO: use download interface described in #993 when implemented. I don't like "TODO" in comments in source code. They aren't usually noticed by the people who need to be reminded, and they often become stale (the thing no longer needs to be done, but nobody thinks to find the TODO and remove it.) I prefer issue tickets. This "TODO" should probably be deleted or moved to an issue ticket or maybe do the thing that it says to do. :-)
  • I find "eventually_callback(d)(expr)" harder to understand than "eventually(d.callback, expr)". (They do mean the same thing, right?) On the other hand I find self.async.addCallbacks(..., eventually_errback(d)) to be mor easily read than any alternative.
  • idea: encourage users to experiment with hacking their SIZE_THRESHOLD, or perhaps even better do some more systematic measurements of how long it takes to download an entire file vs. to read parts out of a larger file in order to choose a more optimized SIZE_THRESHOLD.
Review notes (the review is not complete) David-Sarah: overall this is excellent work! It is readable and correct, with a few but not many parts that are un-idiomatic. Things that need to be responded to: * Hey look `sftpd.noisy` is `True`. Is that intentional? * Maybe document this with "assert" (and check it, too, when asserts are compiled in): `# invariant: self.download_size <= self.current_size` * Is it our job to unregister any extant producer before `self.producer=p in registerProducer()`? * !! Is that a busy eventually loop in _iterate() ? Bad! * add doc for self.async. I guess it means download is finished. * add doc for _convert_error() * data[len(data))]offset:min(offset+length, is the same as dataoffset:offset+length * Why do we need this catch-all-exceptions? Because err.value is not always an exception? This is noisy to the reader of source code. ``` try: if noisy: logmsg(traceback.format_exc(err.value), level=NOISY) except: # pragma: no cover pass ``` * "The message argument to SFTPError must not reveal information that might compromise anonymity." Huh? Who's anonymity might be revealed to whom? * The following TODO can be removed based on testing in the 1.7.0 development cycle (and feedback from 1.7.0 release) `TODO: check that clients are okay with this being a "?". (They should be because the longname is intended for human consumption.)` Ideas that don't necessarily need to be responded to: * class OverwriteableFileConsumer could use unit tests * I think ".finish()" means the download is finished but local writes can proceed apace; docstring needed; Oh! I guess (based on the name) that it is defined by IFinishableConsumer. Maybe docstring that fact. * `# TODO: use download interface described in #993 when implemented.` I don't like "TODO" in comments in source code. They aren't usually noticed by the people who need to be reminded, and they often become stale (the thing no longer needs to be done, but nobody thinks to find the TODO and remove it.) I prefer issue tickets. This "TODO" should probably be deleted or moved to an issue ticket or maybe do the thing that it says to do. :-) * I find "eventually_callback(d)(expr)" harder to understand than "eventually(d.callback, expr)". (They do mean the same thing, right?) On the other hand I find self.async.addCallbacks(..., eventually_errback(d)) to be mor easily read than any alternative. * idea: encourage users to experiment with hacking their SIZE_THRESHOLD, or perhaps even better do some more systematic measurements of how long it takes to download an entire file vs. to read parts out of a larger file in order to choose a more optimized SIZE_THRESHOLD.
davidsarah commented 2010-07-09 20:49:59 +00:00
Owner

Replying to zooko:

  • Hey look sftpd.noisy is True. Is that intentional?

Yes, at this point the noisy logging is still very useful. It doesn't have much of a performance impact I think, so I might just s/if noisy: //g.

  • Maybe document this with "assert" (and check it, too, when asserts are compiled in): # invariant: self.download_size <= self.current_size

The invariant is documenting that this condition is always true. That it's true at that point (actually, made true by the next two lines) is fairly obvious -- so I'm not sure an assert would really help, but I don't object to adding one.

  • Is it our job to unregister any extant producer before self.producer=p in registerProducer()?

No. According to the twisted API docs, a RuntimeError should be raised if a producer is already registered. I'll fix that.

  • !! Is that a busy eventually loop in _iterate() ? Bad!

It is. Why is this bad? The synchronous alternative would be:

while not self.is_done:
    p.resumeProducing()

which is worse wrt potentially starving concurrent activities (but that is what Twisted does in some of its consumers, and we do it in source:src/allmydata/util/consumer.py).

  • add doc for self.async. I guess it means download is finished.

No, see the doc comment for GeneralSFTPFile:

    I wrap an instance of OverwriteableFileConsumer, which is responsible for
    storing the file contents. In order to allow write requests to be satisfied
    immediately, there is effectively a FIFO queue between requests made to this
    file handle, and requests to my OverwriteableFileConsumer. This queue is
    implemented by the callback chain of self.async.

(There should be a similar comment for ShortReadOnlySFTPFile.) Is this unclear?

  • add doc for _convert_error()

Will do.

  • data[len(data))]offset:min(offset+length, is the same as dataoffset:offset+length

Oh, so it is. I was not aware of that aspect of Python slicing semantics (maybe because we would have made it raise an error ;-)

  • Why do we need this catch-all-exceptions? Because err.value is not always an exception? This is noisy to the reader of source code.
    try:
        if noisy: logmsg(traceback.format_exc(err.value), level=NOISY)
    except:  # pragma: no cover
        pass

traceback.format_exc was in practice raising exceptions in obscure cases (I think related to implicit unicode<->str conversion, but I forget the details). Rather than trying to make sure that it never raises an exception, I decided to suppress them. The traceback is very useful for debugging when it does get logged, so I'm loath to remove this code.

  • "The message argument to SFTPError must not reveal information that might compromise anonymity." Huh? Who's anonymity might be revealed to whom?

In case we are running over Tor or I2P, we shouldn't reveal information that would identify the node, as discussed in #1008.

  • The following TODO can be removed based on testing in the 1.7.0 development cycle (and feedback from 1.7.0 release) TODO: check that clients are okay with this being a "?". (They should be because the longname is intended for human consumption.)

Agreed.

Ideas that don't necessarily need to be responded to:

  • class OverwriteableFileConsumer could use unit tests

Yes, absolutely. It's difficult to test because it depends on the ordering of reads and overwrites relative to writes coming from the producer, but that could be made deterministic by using a custom test producer.

  • I think ".finish()" means the download is finished but local writes can proceed apace; docstring needed; Oh! I guess (based on the name) that it is defined by IFinishableConsumer. Maybe docstring that fact.

It is indeed defined by IFinishableConsumer. I'll add a docstring.

  • # TODO: use download interface described in #993 when implemented. I don't like "TODO" in comments in source code. They aren't usually noticed by the people who need to be reminded, and they often become stale (the thing no longer needs to be done, but nobody thinks to find the TODO and remove it.) I prefer issue tickets. This "TODO" should probably be deleted or moved to an issue ticket or maybe do the thing that it says to do. :-)

It is documented in /tahoe-lafs/trac-2024-07-25/issues/6055#comment:5, so I'll remove the TODO.

  • I find "eventually_callback(d)(expr)" harder to understand than "eventually(d.callback, expr)". (They do mean the same thing, right?)

They do. (They didn't in a previous version, where I had extra logging in eventually_callback, but that's not needed any more.) I will simplify the Schönfinkel'd calls.

On the other hand I find self.async.addCallbacks(..., eventually_errback(d)) to be more easily read than any alternative.

  • idea: encourage users to experiment with hacking their SIZE_THRESHOLD, or perhaps even better do some more systematic measurements of how long it takes to download an entire file vs. to read parts out of a larger file in order to choose a more optimized SIZE_THRESHOLD.

Will do (but first the tests will need to be changed to not be dependent on SIZE_THRESHOLD being 1000).

Replying to [zooko](/tahoe-lafs/trac-2024-07-25/issues/1106#issuecomment-78359): > * Hey look `sftpd.noisy` is `True`. Is that intentional? Yes, at this point the noisy logging is still very useful. It doesn't have much of a performance impact I think, so I might just `s/if noisy: //g`. > * Maybe document this with "assert" (and check it, too, when asserts are compiled in): `# invariant: self.download_size <= self.current_size` The invariant is documenting that this condition is always true. That it's true at that point (actually, made true by the next two lines) is fairly obvious -- so I'm not sure an `assert` would really help, but I don't object to adding one. > * Is it our job to unregister any extant producer before `self.producer=p in registerProducer()`? No. According to the twisted [API docs](http://twistedmatrix.com/documents/10.0.0/api/twisted.internet.interfaces.IConsumer.registerProducer.html), a `RuntimeError` should be raised if a producer is already registered. I'll fix that. > * !! Is that a busy eventually loop in _iterate() ? Bad! It is. Why is this bad? The synchronous alternative would be: ``` while not self.is_done: p.resumeProducing() ``` which is worse wrt potentially starving concurrent activities (but that is what Twisted does in some of its consumers, and we do it in source:src/allmydata/util/consumer.py). > * add doc for self.async. I guess it means download is finished. No, see the doc comment for `GeneralSFTPFile`: ``` I wrap an instance of OverwriteableFileConsumer, which is responsible for storing the file contents. In order to allow write requests to be satisfied immediately, there is effectively a FIFO queue between requests made to this file handle, and requests to my OverwriteableFileConsumer. This queue is implemented by the callback chain of self.async. ``` (There should be a similar comment for `ShortReadOnlySFTPFile`.) Is this unclear? > * add doc for _convert_error() Will do. > * data[len(data))]offset:min(offset+length, is the same as dataoffset:offset+length Oh, so it is. I was not aware of that aspect of Python slicing semantics (maybe because we would have made it raise an error ;-) > * Why do we need this catch-all-exceptions? Because err.value is not always an exception? This is noisy to the reader of source code. > ``` > try: > if noisy: logmsg(traceback.format_exc(err.value), level=NOISY) > except: # pragma: no cover > pass > ``` `traceback.format_exc` was in practice raising exceptions in obscure cases (I think related to implicit unicode<->str conversion, but I forget the details). Rather than trying to make sure that it never raises an exception, I decided to suppress them. The traceback is very useful for debugging when it does get logged, so I'm loath to remove this code. > * "The message argument to SFTPError must not reveal information that might compromise anonymity." Huh? Who's anonymity might be revealed to whom? In case we are running over Tor or I2P, we shouldn't reveal information that would identify the node, as discussed in #1008. > * The following TODO can be removed based on testing in the 1.7.0 development cycle (and feedback from 1.7.0 release) `TODO: check that clients are okay with this being a "?". (They should be because the longname is intended for human consumption.)` Agreed. > Ideas that don't necessarily need to be responded to: > * class OverwriteableFileConsumer could use unit tests Yes, absolutely. It's difficult to test because it depends on the ordering of reads and overwrites relative to writes coming from the producer, but that could be made deterministic by using a custom test producer. > * I think ".finish()" means the download is finished but local writes can proceed apace; docstring needed; Oh! I guess (based on the name) that it is defined by IFinishableConsumer. Maybe docstring that fact. It is indeed defined by [IFinishableConsumer](http://twistedmatrix.com/documents/10.0.0/api/twisted.internet.interfaces.IFinishableConsumer.html). I'll add a docstring. > * `# TODO: use download interface described in #993 when implemented.` I don't like "TODO" in comments in source code. They aren't usually noticed by the people who need to be reminded, and they often become stale (the thing no longer needs to be done, but nobody thinks to find the TODO and remove it.) I prefer issue tickets. This "TODO" should probably be deleted or moved to an issue ticket or maybe do the thing that it says to do. :-) It is documented in [/tahoe-lafs/trac-2024-07-25/issues/6055](/tahoe-lafs/trac-2024-07-25/issues/6055)#comment:5, so I'll remove the TODO. > * I find "eventually_callback(d)(expr)" harder to understand than "eventually(d.callback, expr)". (They do mean the same thing, right?) They do. (They didn't in a previous version, where I had extra logging in `eventually_callback`, but that's not needed any more.) I will simplify the [Schönfinkel'd](http://en.wikipedia.org/wiki/Moses_Sch%C3%B6nfinkel) calls. > On the other hand I find self.async.addCallbacks(..., eventually_errback(d)) to be more easily read than any alternative. > * idea: encourage users to experiment with hacking their SIZE_THRESHOLD, or perhaps even better do some more systematic measurements of how long it takes to download an entire file vs. to read parts out of a larger file in order to choose a more optimized SIZE_THRESHOLD. Will do (but first the tests will need to be changed to not be dependent on SIZE_THRESHOLD being 1000).
davidsarah commented 2010-07-12 03:16:33 +00:00
Owner

Attachment sftp-comments.dpatch (10082 bytes) added

SFTP: address some of the comments in zooko's review (#1106).

**Attachment** sftp-comments.dpatch (10082 bytes) added SFTP: address some of the comments in zooko's review (#1106).
davidsarah commented 2010-07-12 03:17:09 +00:00
Owner

Attachment sftp-no-trunc-files-opened-with-append.dpatch (4012 bytes) added

SFTP: refuse to truncate files opened with FXF_APPEND (see /tahoe-lafs/trac-2024-07-25/issues/6099#comment:20).

**Attachment** sftp-no-trunc-files-opened-with-append.dpatch (4012 bytes) added SFTP: refuse to truncate files opened with FXF_APPEND (see [/tahoe-lafs/trac-2024-07-25/issues/6099](/tahoe-lafs/trac-2024-07-25/issues/6099)#comment:20).
Author

sftp-comments.dpatch looks good.

re: sftp-no-trunc-files-opened-with-append.dpatch : should we add a unit test?

sftp-comments.dpatch looks good. re: sftp-no-trunc-files-opened-with-append.dpatch : should we add a unit test?
davidsarah commented 2010-07-15 20:11:23 +00:00
Owner

Replying to zooko:

re: sftp-no-trunc-files-opened-with-append.dpatch : should we add a unit test?

This patch caused a regression in test_openFile_write. It needs more work.

Replying to [zooko](/tahoe-lafs/trac-2024-07-25/issues/1106#issuecomment-78361): > re: sftp-no-trunc-files-opened-with-append.dpatch : should we add a unit test? This patch caused a regression in test_openFile_write. It needs more work.
davidsarah commented 2010-07-18 02:22:38 +00:00
Owner

sftp-comments.dpatch was applied in changeset:15ddab08edebd7fe.

sftp-comments.dpatch was applied in changeset:15ddab08edebd7fe.
tahoe-lafs modified the milestone from 1.7.1 to soon 2010-07-18 02:22:38 +00:00
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#1106
No description provided.