drop-upload: also monitor subdirectories #1433

Closed
opened 2011-07-17 02:27:12 +00:00 by davidsarah · 13 comments
davidsarah commented 2011-07-17 02:27:12 +00:00
Owner

The prototype implementation of drop-upload in #1429 monitors only a single directory, and ignores creation of subdirectories or changes to files within them. It is possible to monitor subdirectories as well, although the efficiency and complexity of this varies depending on the design of the platform's filesystem monitoring API (if we support multiple platforms with this feature as suggested in #1431 and #1432).

The prototype implementation of drop-upload in #1429 monitors only a single directory, and ignores creation of subdirectories or changes to files within them. It is possible to monitor subdirectories as well, although the efficiency and complexity of this varies depending on the design of the platform's filesystem monitoring API (if we support multiple platforms with this feature as suggested in #1431 and #1432).
tahoe-lafs added the
code-frontend
major
defect
1.8.2
labels 2011-07-17 02:27:12 +00:00
tahoe-lafs added this to the undecided milestone 2011-07-17 02:27:12 +00:00
tahoe-lafs added
enhancement
and removed
defect
labels 2011-07-25 12:26:58 +00:00
warner removed the
code-frontend
label 2014-12-02 19:47:01 +00:00
daira commented 2015-04-10 12:28:06 +00:00
Author
Owner

Note that the IN_ONLYDIR constant in the current event mask doesn't do what you might think it does. It only ensures that the root being monitored is a directory.

Note that the IN_ONLYDIR constant in the current event mask doesn't do what you might think it does. It only ensures that the root being monitored is a directory.
daira commented 2015-04-10 20:01:04 +00:00
Author
Owner

On 07/04/15 21:05, David Stainton wrote:>

a couple of my initial and trife observations:
1.
Looks like the latest Twisted inotify class has got a recursive watch.
I'm not saying we need to use that version of Twisted; it's a simple loop we can implement.
https://twistedmatrix.com/trac/browser/tags/releases/twisted-15.0.0/twisted/internet/inotify.py#L350

This has been there since the inotify support was added in Twisted 10.1.0, in fact.

I looked at the source code of the twisted INotify class to see if it's watch method could be called more than once and behave properly.
It looks like it does support this usage.

Yes, it does, but the recursive/autoAdd support looks sufficient so we shouldn't need this. (We also don't need it for Windows because ReadDirectoryChangesW has a similar flag to watch a subdirectory tree.)

I am not clear on why drop_upload.py uses this inotify mask:

mask = inotify.IN_CLOSE_WRITE | inotify.IN_MOVED_TO | inotify.IN_ONLYDIR

Combining all these definitions:
IN_MOVED_TO - File moved into watched directory
IN_CLOSE_WRITE - File opened for writing was closed
IN_ONLYDIR - Only watch pathname if it is a directory.

This means we only watch subdirectories that have been created or moved into the directory. Did I understand things correctly?

See comment:84197 .

the inotify man page states:

   If monitoring an entire directory subtree, and a new subdirectory is created in that tree, be aware that by the time you create a watch for the new subdirectory, new files may already have been created in the subdirectory.  Therefore, you may want to scan the contents of the subdirectory immediately after adding the watch.

This obviously means we should construct a mask specifically to catch newly added directories like this:

mask = inotify.IN_MOVED_TO | inotify.IN_ONLYDIR

and then add a specific inotify watch for that subdirectory... AND add recursively add that directory and its contents to the upload queue.

The autoAdd flag already adds watches for children of newly added directories (and has done so since Twisted 10.1.0):

https://twistedmatrix.com/trac/browser/tags/releases/twisted-10.1.0/twisted/internet/inotify.py#L259

It also synthesizes IN_CREATE events for files and subdirectories found in those newly added directories (see the _addChildren method in inotify.py).

On 07/04/15 21:05, David Stainton wrote:> > a couple of my initial and trife observations: > 1. > Looks like the latest Twisted inotify class has got a recursive watch. > I'm not saying we need to use that version of Twisted; it's a simple loop we can implement. > <https://twistedmatrix.com/trac/browser/tags/releases/twisted-15.0.0/twisted/internet/inotify.py#L350> This has been there since the inotify support was added in Twisted 10.1.0, in fact. > 2. > I looked at the source code of the twisted INotify class to see if it's `watch` method could be called more than once and behave properly. > It looks like it does support this usage. Yes, it does, but the `recursive`/`autoAdd` support looks sufficient so we shouldn't need this. (We also don't need it for Windows because `ReadDirectoryChangesW` has a similar flag to watch a subdirectory tree.) > 3. > I am not clear on why drop_upload.py uses this inotify mask: > > `mask = inotify.IN_CLOSE_WRITE | inotify.IN_MOVED_TO | inotify.IN_ONLYDIR` > > Combining all these definitions: > IN_MOVED_TO - File moved into watched directory > IN_CLOSE_WRITE - File opened for writing was closed > IN_ONLYDIR - Only watch pathname if it is a directory. > > This means we only watch subdirectories that have been created or moved into the directory. Did I understand things correctly? See [comment:84197](/tahoe-lafs/trac-2024-07-25/issues/1433#issuecomment-84197) . > 4. > the inotify man page states: > > If monitoring an entire directory subtree, and a new subdirectory is created in that tree, be aware that by the time you create a watch for the new subdirectory, new files may already have been created in the subdirectory. Therefore, you may want to scan the contents of the subdirectory immediately after adding the watch. > > This obviously means we should construct a mask specifically to catch newly added directories like this: > > `mask = inotify.IN_MOVED_TO | inotify.IN_ONLYDIR` > > and then add a specific inotify watch for that subdirectory... AND add recursively add that directory and its contents to the upload queue. The `autoAdd` flag already adds watches for children of newly added directories (and has done so since Twisted 10.1.0): <https://twistedmatrix.com/trac/browser/tags/releases/twisted-10.1.0/twisted/internet/inotify.py#L259> It also synthesizes `IN_CREATE` events for files and subdirectories found in those newly added directories (see the `_addChildren` method in `inotify.py`).
dawuud commented 2015-04-11 00:03:50 +00:00
Author
Owner

sounds like we need a simple one line change like this:
https://github.com/david415/tahoe-lafs/tree/david-1433-1

Does this satisfy our spec?:
https://github.com/tahoe-lafs/tahoe-lafs/blob/master/docs/proposed/magic-folder/filesystem-integration.rst

which states that we need to do a full scan of new directories...

I am wondering what would happen if a user moved a directory tree into the magic folder?
Would the subdirectories be scanned? It looks like to me they would not be. If this is the case then maybe we'll have to not use the autoAdd feature.

sounds like we need a simple one line change like this: <https://github.com/david415/tahoe-lafs/tree/david-1433-1> Does this satisfy our spec?: <https://github.com/tahoe-lafs/tahoe-lafs/blob/master/docs/proposed/magic-folder/filesystem-integration.rst> which states that we need to do a full scan of new directories... I am wondering what would happen if a user moved a directory tree into the magic folder? Would the subdirectories be scanned? It looks like to me they would not be. If this is the case then maybe we'll have to not use the `autoAdd` feature.
daira commented 2015-04-11 01:38:15 +00:00
Author
Owner

The spec says that a scan of new directories needs to be done, it doesn't say what code is responsible for that. At the time, I hadn't looked at the relevant Twisted code.

We do need to check what happens in the move-one-directory-into-another case. I suspect it Just Works though, since _addChildren doesn't distinguish that case, and shouldn't need to.

The spec says that a scan of new directories needs to be done, it doesn't say what code is responsible for that. At the time, I hadn't looked at the relevant Twisted code. We do need to check what happens in the move-one-directory-into-another case. I suspect it Just Works though, since `_addChildren` doesn't distinguish that case, and shouldn't need to.
daira commented 2015-04-11 01:39:36 +00:00
Author
Owner

I suggest first writing tests for these cases, rather than worrying about the implementation.

I suggest first writing tests for these cases, rather than worrying about the implementation.
tahoe-lafs modified the milestone from undecided to 1.11.0 2015-04-12 22:32:28 +00:00
dawuud commented 2015-04-15 16:59:41 +00:00
Author
Owner

i added some code to david-1433-1 branch that handles the symlink and directory cases.

if symlink then ignore file and print error to the log.
if dir then create the subdirectory.

i added some code to david-1433-1 branch that handles the symlink and directory cases. if symlink then ignore file and print error to the log. if dir then create the subdirectory.
dawuud commented 2015-04-18 03:34:07 +00:00
Author
Owner
here's my working code AND unit tests! tests: <https://github.com/david415/tahoe-lafs/blob/d9153c482eb30eb15117d758fc7303f4a90076d3/src/allmydata/test/test_drop_upload.py#L71-L114> code: <https://github.com/david415/tahoe-lafs/tree/2406.otf-objective-2.3.1-fix-upload-deque>
daira commented 2015-04-18 16:37:40 +00:00
Author
Owner

Excellent! Will review now.

Excellent! Will review now.
dawuud commented 2015-04-24 07:08:07 +00:00
Author
Owner

Ok recently fixed some problems with these tests... and I think it exposes a bug... must look into it...

Ok recently fixed some problems with these tests... and I think it exposes a bug... must look into it...
dawuud commented 2015-04-28 17:49:06 +00:00
Author
Owner

fixed unit tests and code. awaiting code review from daira for my recent changes regarding the addition of a notifier watch upon directory placement in the uploader directory. (the watch is recursive and a recursive scan of the new directory is also applied)

fixed unit tests and code. awaiting code review from daira for my recent changes regarding the addition of a notifier watch upon directory placement in the uploader directory. (the watch is recursive and a recursive scan of the new directory is also applied)
daira commented 2015-04-28 23:47:09 +00:00
Author
Owner

I reviewed this (and we made a few other fixes in our Tuesday pairing session); it looks good. I still need to review the rest of the queue/scanning code.

I reviewed this (and we made a few other fixes in our Tuesday pairing session); it looks good. I still need to review the rest of the queue/scanning code.
tahoe-lafs added the
fixed
label 2015-05-02 16:42:02 +00:00
daira closed this issue 2015-05-02 16:42:02 +00:00
daira commented 2015-05-02 16:44:00 +00:00
Author
Owner

Closing this and using ticket #2406 for any further review comments.

Closing this and using ticket #2406 for any further review comments.

Milestone renamed

Milestone renamed
warner modified the milestone from 1.11.0 to 1.12.0 2016-03-22 05:02:52 +00:00
Sign in to join this conversation.
No Milestone
No Assignees
2 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#1433
No description provided.