drop-upload: don't perform redundant uploads when a file is quickly modified three or more times #1440
When a file is uploaded by the drop-upload frontend, events that would cause further uploads of the same file are queued. The queue only needs to record one event per file: for example, if a file is being uploaded and receives two more modification events before the first upload completes, the middle event can be dropped.
See also #2220.
I thought of a solution. Would something along these lines be acceptable?
We use an explicit deque together with a helper hash map that tracks the currently queued files/dirs. Each time we add a file to the queue, we first check whether that file is already in the hash map, and only add it to the queue if it is not. Likewise, we must remove it from the hash map once its queue item has been processed.
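For illustration, a minimal sketch of that idea in Python; the names (UploadEventQueue, enqueue, process_next) are hypothetical, chosen for this example rather than taken from the Tahoe-LAFS code:
```python
# A minimal sketch of the proposed deduplicating queue (names are
# hypothetical, for illustration only).
from collections import deque

class UploadEventQueue:
    def __init__(self):
        self._queue = deque()   # paths awaiting upload, in FIFO order
        self._pending = set()   # paths currently in the queue

    def enqueue(self, path):
        # Drop the event if this path is already queued: the single
        # queued item will pick up all modifications made so far.
        if path in self._pending:
            return
        self._pending.add(path)
        self._queue.append(path)

    def process_next(self, upload):
        # Remove the path from the pending set only when its queue item
        # is processed, so that later modifications re-queue the file.
        if self._queue:
            path = self._queue.popleft()
            self._pending.discard(path)
            upload(path)
```
The set gives O(1) duplicate checks while the deque preserves upload order; a dict mapping path to queued event would work equally well if per-event data needed to be kept.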
Replying to dawuud:
Yes, this is basically the same as the pending_delay code in allmydata/windows/inotify.py. We should add something similar to drop_upload.py, and then we can probably remove the Windows-specific implementation.
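For reference, a minimal sketch of the pending-delay coalescing idea using Twisted's reactor.callLater; this is an illustration under assumed names (DelayedNotifier, PENDING_DELAY), not the actual allmydata/windows/inotify.py code:
```python
# A minimal sketch of pending-delay coalescing (illustration only; the
# class name DelayedNotifier and constant PENDING_DELAY are assumptions,
# not names from allmydata/windows/inotify.py).
from twisted.internet import reactor

PENDING_DELAY = 1.0  # seconds to wait for further events on the same path

class DelayedNotifier:
    def __init__(self, callback):
        self._callback = callback   # called once per coalesced burst
        self._pending = {}          # path -> IDelayedCall

    def event_received(self, path):
        # If a notification for this path is already scheduled, push its
        # timer back instead of scheduling a second one; rapid successive
        # events therefore collapse into a single callback.
        call = self._pending.get(path)
        if call is not None and call.active():
            call.reset(PENDING_DELAY)
        else:
            self._pending[path] = reactor.callLater(
                PENDING_DELAY, self._fire, path)

    def _fire(self, path):
        del self._pending[path]
        self._callback(path)
```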
OK, here's some code that does that:
https://github.com/david415/tahoe-lafs/tree/david-1440-1
Although this code doesn't work as-is, because I branched from my code for ticket #1449.
Here's my latest: I've used the explicit deque from the other branch and added a pending set of files, which allows deduplication:
https://github.com/david415/tahoe-lafs/tree/dropupload-redundant-uploads-1
I had to comment out part of the drop uploader unit test to get it to pass.
We need a unit test for this deduplication of upload events.
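A test along these lines might look like the following sketch, reusing the hypothetical UploadEventQueue from the earlier example (this is not an actual test from the Tahoe-LAFS suite):
```python
# A sketch of the kind of test we might want, reusing the hypothetical
# UploadEventQueue from the earlier sketch (not an actual test from the
# Tahoe-LAFS suite).
from twisted.trial import unittest

class DeduplicationTests(unittest.TestCase):
    def test_rapid_modifications_coalesce(self):
        uploads = []
        queue = UploadEventQueue()
        # Three rapid modification events for the same file...
        for _ in range(3):
            queue.enqueue("magic/a.txt")
        # ...should leave exactly one queued upload; the second call to
        # process_next() finds the queue empty and uploads nothing.
        queue.process_next(uploads.append)
        queue.process_next(uploads.append)
        self.assertEqual(uploads, ["magic/a.txt"])
```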
OK, I pushed more code into the same branch. I worked on the unit test for deduplicating uploads, but so far only the mock test works ;-( Maybe Daira can help me figure this out?
This is indeed incorrect... and furthermore, after some reflection, I think it falls under the category of a beginner's mistake: an async-interleaving concurrency design error. ;-)
I'm very interested in knowing the advantages and disadvantages of a deferred-based upload queue versus an explicit queue like this one.
Latest working code here:
https://github.com/david415/tahoe-lafs/tree/2406.otf-objective-2.1-bugfixes-1
Closing this and using ticket #2406 for any further review comments.