drop-upload: updates to files may be lost if few servers are connected (e.g. soon after starting the gateway) #1449

Closed
opened 2011-07-27 16:47:11 +00:00 by davidsarah · 17 comments
davidsarah commented 2011-07-27 16:47:11 +00:00
Owner

This is related to #719, but it may be a more significant problem for the drop-upload frontend because it starts monitoring the directory immediately.

In the latest #1429 patch, the 'Operational Statistics' page shows the number of drop-uploads that have failed, and there may be information about those failures in logs, but there is no other indication to the user that changed files have not been successfully uploaded.

tahoe-lafs added the code-frontend, major, defect, 1.8.2 labels 2011-07-27 16:47:11 +00:00
tahoe-lafs added this to the soon milestone 2011-07-27 16:47:11 +00:00
tahoe-lafs changed title from drop-upload: files may not be uploaded with sufficient diversity if few servers are connected (e.g. soon after starting the gateway) to drop-upload: updates to files may be lost if few servers are connected (e.g. soon after starting the gateway) 2011-07-28 03:43:47 +00:00
warner removed the code-frontend label 2014-12-02 19:47:23 +00:00
dawuud commented 2015-04-10 03:47:10 +00:00
Author
Owner

I've got a very rough draft untested solution to this ticket right here:
https://github.com/david415/tahoe-lafs/tree/david-1449

daira commented 2015-04-10 17:51:57 +00:00
Author
Owner
Results of David and my pairing session today: <https://github.com/daira/tahoe-lafs/commits/1449.wait-for-enough-servers.1>
dawuud commented 2015-04-10 20:49:31 +00:00
Author
Owner

my current state is now:
commit a6708d07e7a54b0c44ae51602f40b4919ca834fe of branch https://github.com/david415/tahoe-lafs/tree/david-1449

the client unit tests now pass.
i've removed the storage client test.

the drop upload test fails... hangs forever.

dawuud commented 2015-04-10 23:02:59 +00:00
Author
Owner

in more recent commits i got more unit tests to pass...

however, having trouble getting the drop upload test to pass, still.

tahoe-lafs modified the milestone from soon to 1.11.0 2015-04-12 22:34:18 +00:00
dawuud commented 2015-04-13 23:38:26 +00:00
Author
Owner

ok i pushed my latest changes to here:
https://github.com/david415/tahoe-lafs/tree/david-1449

i wasn't able to get the drop uploader unit tests passing so i just fixed naming convention usage like Daira mentioned earlier.

dawuud commented 2015-04-14 06:20:09 +00:00
Author
Owner

fix it with a deque! same branch. please review.

daira commented 2015-04-14 16:59:38 +00:00
Author
Owner

On #tahoe-lafs:

daira: dawuud: the current code in DropUploader._notify will (in the path not in self._pending branch) call _append_to_deque which adds the path to self._pending, then process the deque (synchronously), then add the path to self._pending again
daira: the second self._pending.add(path) is wrong and should be deleted
daira: processing the deque synchronously also may cause problems
daira: it may be the change to synchronous processing that made the tests work, but I think we probably have to change it back to asynchronous
daira: in particular, note that the deferred that is returned by _process is dropped by the call to func(*fields[1:]) in _process_deque
daira: so this code will try to upload things in parallel...
daira: which may work for immutable files, but is a bad idea for mutables, especially directories
daira: I'll rebase the code as it is, anyway, so that we can review it more easily
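
The bookkeeping problem described above can be sketched in isolation. This is a minimal illustration, not the real `DropUploader` code: `PendingQueue`, `notify`, and `process_one` are hypothetical names, and the "upload" is only a list append. The point is that a path is added to the pending set exactly once, at enqueue time, and removed when it has been processed. A second `add(path)` after processing would leave the path pending forever, so later changes to that file would be ignored.

```python
from collections import deque


class PendingQueue:
    """Hypothetical stand-in for the uploader's pending set plus deque."""

    def __init__(self):
        self._pending = set()   # paths queued but not yet uploaded
        self._deque = deque()   # upload order
        self.uploaded = []      # record of completed uploads, for illustration

    def notify(self, path):
        # Coalesce repeated notifications for a path that is already queued.
        if path in self._pending:
            return False
        self._pending.add(path)  # added exactly once, at enqueue time
        self._deque.append(path)
        return True

    def process_one(self):
        # "Upload" the oldest queued path, then forget it so that a later
        # change to the same file can be queued again.
        if not self._deque:
            return None
        path = self._deque.popleft()
        self.uploaded.append(path)
        self._pending.discard(path)
        return path


q = PendingQueue()
first = q.notify("a.txt")    # queued
second = q.notify("a.txt")   # ignored: already pending
q.process_one()              # processes a.txt and clears it from pending
third = q.notify("a.txt")    # a later change is queued again
```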

daira commented 2015-04-14 17:05:00 +00:00
Author
Owner
-            return self.uploader.startService()
+            self.uploader.setServiceParent(self.client)
+            self.uploader.startService()
+            self.uploader.upload_ready()
+            return None

self.uploader.startService() returns a deferred which is dropped here. (Maybe it should be synchronous, the Twisted API doc is not clear.)

daira commented 2015-04-14 17:05:51 +00:00
Author
Owner

I really want linear types for deferreds, so they can't be dropped implicitly!

daira commented 2015-04-14 17:27:25 +00:00
Author
Owner
Rebased at <https://github.com/daira/tahoe-lafs/commits/1449.dropupload-redundant-uploads.2>.
dawuud commented 2015-04-14 19:17:11 +00:00
Author
Owner

new working code here -> https://github.com/david415/tahoe-lafs/tree/1449.dropupload-redundant-uploads.2

  • i fixed the add-to-pending-bug

  • perform uploads serially (let's optimize later!)

  • push and pop the deque asynchronously (is that the correct term?) This design is highly influenced by Foolscap's eventually...
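
A rough sketch of the "serial uploads, asynchronous appends" idea follows. It is an analogy only: the branch uses Twisted Deferreds and Foolscap's `eventually`, whereas this uses the standard library's `asyncio` so that it runs on its own. `SerialUploader` and its methods are hypothetical names, and `asyncio.sleep(0)` stands in for the network upload.

```python
import asyncio
from collections import deque


class SerialUploader:
    """Hypothetical sketch: appends never block, uploads run one at a time."""

    def __init__(self):
        self._deque = deque()
        self._running = False
        self._task = None
        self.active = 0        # uploads currently in flight
        self.max_active = 0    # highest concurrency observed
        self.done = []

    def append(self, path):
        # Queue the path and return immediately. The drain loop starts on a
        # later event-loop turn if it is not already running.
        self._deque.append(path)
        if not self._running:
            self._running = True
            self._task = asyncio.get_running_loop().create_task(self._drain())

    async def _upload(self, path):
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        await asyncio.sleep(0)   # placeholder for the real network upload
        self.active -= 1
        self.done.append(path)

    async def _drain(self):
        # Await each upload before starting the next one.
        while self._deque:
            await self._upload(self._deque.popleft())
        self._running = False


async def main():
    up = SerialUploader()
    for name in ("a", "b", "c"):
        up.append(name)
    while len(up.done) < 3:      # let the drain loop finish
        await asyncio.sleep(0)
    return up


uploader = asyncio.run(main())
```

Because `_drain` awaits each upload before popping the next path, the result of every upload is observed rather than dropped, and mutable files or directories are never written concurrently by this queue.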

dawuud commented 2015-04-17 20:11:18 +00:00
Author
Owner

I just designed another uploader deque. Here:
https://github.com/david415/tahoe-lafs/tree/2406.otf-objective-2.3.1-fix-upload-deque

I believe this to be a correct design that enforces sequential uploads and asynchronous deque appends... without the weird concurrent interleave bugs of my sloppy previous attempts.

dawuud commented 2015-04-23 01:24:52 +00:00
Author
Owner

I think this:
https://github.com/david415/tahoe-lafs/blob/2406.otf-objective-2.3.1-fix-upload-deque/src/allmydata/client.py#L349-L353

is bad because it may cause unbalanced share allocation to storage servers. It seems likely that only connecting to K or H+1 servers would cause a single file's shares to be clustered on a smaller number of servers... meaning that some individual servers will get more than one share from that same file. This is bad... especially given that we do not yet have a "rebalancing" commandline tool of any kind.
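
The clustering concern is a pigeonhole argument, worked through below. The numbers are illustrative only (N = 10 shares), and `min_worst_case_shares` is a hypothetical helper, not part of Tahoe-LAFS.

```python
import math


def min_worst_case_shares(total_shares, connected_servers):
    """Smallest possible value of 'most shares held by any one server'."""
    return math.ceil(total_shares / connected_servers)


# Illustrative numbers only: N = 10 shares.
crowded = min_worst_case_shares(10, 3)    # only 3 servers reachable
spread = min_worst_case_shares(10, 10)    # one server per share
```

With only 3 servers reachable, some server must hold at least 4 of the 10 shares, however the placement is done.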

daira commented 2015-04-27 16:08:01 +00:00
Author
Owner

I'm happy with the min(N, H+1) heuristic for now; we can reconsider this with Zooko and Brian's input before we merge to trunk.
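
For concreteness, the heuristic can be written out as below. `connection_threshold` is a hypothetical name and the parameter values are illustrative (a 3-of-10 encoding with happiness 7), not read from the branch.

```python
def connection_threshold(total_shares, happy):
    """Servers to wait for before uploading: min(N, H+1)."""
    return min(total_shares, happy + 1)


# Illustrative encoding parameters (3-of-10 with happiness 7).
default_threshold = connection_threshold(total_shares=10, happy=7)
# When H equals N, the cap at N keeps the threshold reachable.
capped_threshold = connection_threshold(total_shares=10, happy=10)
```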

tahoe-lafs added the fixed label 2015-05-02 16:42:42 +00:00
daira closed this issue 2015-05-02 16:42:42 +00:00
daira commented 2015-05-02 16:44:11 +00:00
Author
Owner

Closing this and using ticket #2406 for any further review comments.


Milestone renamed

warner modified the milestone from 1.11.0 to 1.12.0 2016-03-22 05:02:52 +00:00
meejah <meejah@meejah.ca> commented 2016-04-26 19:51:59 +00:00
Author
Owner

In a56a3ad/trunk:

Teach StorageFarmBroker to fire a deferred when a connection threshold is reached. refs #1449

Signed-off-by: Daira Hopwood <daira@jacaranda.org>
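
The idea in that commit message, a one-shot notification once enough servers are connected, might look roughly like this. It is a sketch under assumptions: the real change fires a Twisted Deferred, while this uses `concurrent.futures.Future` from the standard library, and `Broker` and its method names are illustrative rather than copied from the commit.

```python
from concurrent.futures import Future


class Broker:
    """Hypothetical sketch of a broker that signals when enough servers connect."""

    def __init__(self):
        self._connected = set()
        self._waiters = []   # (threshold, Future) pairs not yet fired

    def when_connected_enough(self, threshold):
        fut = Future()
        self._waiters.append((threshold, fut))
        self._check()        # fire at once if the threshold is already met
        return fut

    def server_connected(self, server_id):
        self._connected.add(server_id)
        self._check()

    def _check(self):
        still_waiting = []
        for threshold, fut in self._waiters:
            if len(self._connected) >= threshold:
                fut.set_result(len(self._connected))  # fire exactly once
            else:
                still_waiting.append((threshold, fut))
        self._waiters = still_waiting


broker = Broker()
ready = broker.when_connected_enough(2)
broker.server_connected("s1")
fired_early = ready.done()     # still below the threshold
broker.server_connected("s2")  # threshold reached, future fires
```

An uploader that waits on `ready` before it starts monitoring the directory would avoid uploading while too few servers are connected.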
Reference: tahoe-lafs/trac-2024-07-25#1449