drop-upload: don't perform redundant uploads when a file is quickly modified three or more times #1440

Closed
opened 2011-07-23 04:35:03 +00:00 by davidsarah · 9 comments
davidsarah commented 2011-07-23 04:35:03 +00:00
Owner

When a file is uploaded by the drop-upload frontend, events that would cause further uploads of the same file are queued. This queue only needs to record one event per file. For example, if a file is uploading and receives two more modification events before the first upload completes, the middle event can be dropped.
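
To illustrate the intended behaviour, here is a minimal sketch, not the drop-upload code itself. The class `CoalescingUploader` and its method names are hypothetical, and the model assumes the frontend is told when an upload finishes. It keeps at most one in-flight upload and one queued re-upload per path:

```python
class CoalescingUploader:
    """Hypothetical model: one in-flight upload plus at most one
    queued re-upload per path."""

    def __init__(self):
        self.uploading = set()  # paths with an upload in progress
        self.dirty = set()      # paths modified while uploading
        self.started = []       # log of uploads started, in order

    def on_modified(self, path):
        if path in self.uploading:
            # Any number of events during an upload collapse into one.
            self.dirty.add(path)
        else:
            self._start(path)

    def _start(self, path):
        self.uploading.add(path)
        self.started.append(path)

    def on_upload_finished(self, path):
        self.uploading.discard(path)
        if path in self.dirty:
            # The file changed during the upload: upload it once more.
            self.dirty.discard(path)
            self._start(path)
```

With three quick modifications of the same file, this model starts two uploads rather than three: the first one, and a single follow-up that covers both later events.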
tahoe-lafs added the
unknown
major
defect
1.8.2
labels 2011-07-23 04:35:03 +00:00
tahoe-lafs added this to the undecided milestone 2011-07-23 04:35:03 +00:00
tahoe-lafs added
enhancement
code-frontend
and removed
defect
unknown
labels 2011-07-25 12:27:20 +00:00
daira commented 2014-04-15 01:11:02 +00:00
Author
Owner

See also #2220.
warner removed the
code-frontend
label 2014-12-02 19:47:11 +00:00
dawuud commented 2015-04-07 22:25:25 +00:00
Author
Owner

I thought of a solution. Would something along these lines be acceptable?

We use an explicit deque implementation plus a helper hash map that tracks the files/dirs currently queued. Each time we add a file to the queue, we first check whether it is already in the hash map, and we only enqueue it if it is not. Likewise, we must remove it from the hash map once that queue item has been processed.
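
A rough sketch of that idea, under some assumptions: the class name `DedupQueue` and its methods are made up for illustration, and a `set` stands in for the hash map since only membership is needed. The path is dropped from the set when the item is taken off the queue, so that a modification arriving while the upload is still running gets queued again; whether that is the right moment is a design choice, not something settled here.

```python
from collections import deque


class DedupQueue:
    """Hypothetical FIFO queue that holds each path at most once."""

    def __init__(self):
        self._queue = deque()   # paths waiting to be uploaded
        self._pending = set()   # same paths, for O(1) membership checks

    def add(self, path):
        """Enqueue path unless it is already waiting. Returns True if added."""
        if path in self._pending:
            return False
        self._pending.add(path)
        self._queue.append(path)
        return True

    def pop(self):
        """Remove and return the oldest path; it may be re-added afterwards."""
        path = self._queue.popleft()
        self._pending.discard(path)
        return path

    def __len__(self):
        return len(self._queue)
```

Usage would look like `q.add(path)` from the event handler and `q.pop()` from whatever drives the uploads; `pop()` raises `IndexError` on an empty queue, so the caller is assumed to check `len(q)` first.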
daira commented 2015-04-10 17:07:55 +00:00
Author
Owner

Replying to dawuud:

> We use an explicit deque implementation and we utilize a helping hash map to track currently queued files/dirs. Each time we add a file to the queue we first check if that file is already in the hashmap. We only add a file to the queue when it is not already in the hashmap. Likewise we must remove it from the hashmap once that queue item has been processed.

Yes, this is basically the same as the pending_delay code in allmydata/windows/inotify.py (<https://github.com/tahoe-lafs/tahoe-lafs/blob/9cd24713e1e51dbc0834148332ce03f43628b4b5/src/allmydata/windows/inotify.py>). We should add something similar to drop_upload.py, and then we can probably remove the Windows-specific implementation.
dawuud commented 2015-04-10 23:29:21 +00:00
Author
Owner

ok here's some code that does that:
https://github.com/david415/tahoe-lafs/tree/david-1440-1

although this code doesn't work because i branched from my code for ticket #1449
tahoe-lafs modified the milestone from undecided to 1.11.0 2015-04-12 22:33:33 +00:00
dawuud commented 2015-04-14 07:22:56 +00:00
Author
Owner

here's my latest... i've used the explicit deque from the other branch and i've added
a pending set of files... this allows deduplication:
https://github.com/david415/tahoe-lafs/tree/dropupload-redundant-uploads-1

I had to comment out part of the drop uploader unit test to get it to pass.
We need a unit test for this deduplicating of upload events.
dawuud commented 2015-04-14 21:17:36 +00:00
Author
Owner

OK I pushed more code into the same branch. I worked on the unit test for deduplicating uploads but so far only the mock test works ;-( Maybe Daira can help me figure this out?


This is indeed incorrect, and furthermore, after some reflection, I think it falls under the category of a beginner's mistake: an async-interleaving concurrency design error. ;-)

I'm very interested in knowing about the advantages and disadvantages of a deferred-based upload queue versus an explicit queue like this one.
dawuud commented 2015-04-15 19:09:08 +00:00
Author
Owner
latest working code here: <https://github.com/david415/tahoe-lafs/tree/2406.otf-objective-2.1-bugfixes-1>
tahoe-lafs added the
fixed
label 2015-05-02 16:42:30 +00:00
daira closed this issue 2015-05-02 16:42:30 +00:00
daira commented 2015-05-02 16:44:06 +00:00
Author
Owner

Closing this and using ticket #2406 for any further review comments.

Milestone renamed
warner modified the milestone from 1.11.0 to 1.12.0 2016-03-22 05:02:52 +00:00
Reference: tahoe-lafs/trac-2024-07-25#1440