drop-upload: updates to files may be lost if few servers are connected (e.g. soon after starting the gateway) #1449

tahoe-lafs · 2011-07-27T16:47:11Z

davidsarah commented

2011-07-27 16:47:11 +00:00

This is related to #719, but it may be a more significant problem for the drop-upload frontend because it starts monitoring the directory immediately.

In the latest #1429 patch, the 'Operational Statistics' page shows the number of drop-uploads that have failed, and there may be information about those failures in logs, but there is no other indication to the user that changed files have not been successfully uploaded.

tahoe-lafs added the labels 2011-07-27 16:47:11 +00:00

tahoe-lafs added this to the soon milestone 2011-07-27 16:47:11 +00:00

tahoe-lafs changed title from ~~drop-upload: files may not be uploaded with sufficient diversity if few servers are connected (e.g. soon after starting the gateway)~~ to drop-upload: updates to files may be lost if few servers are connected (e.g. soon after starting the gateway) 2011-07-28 03:43:47 +00:00

warner removed the code-frontend label 2014-12-02 19:47:23 +00:00

dawuud commented

2015-04-10 03:47:10 +00:00

I've got a very rough draft untested solution to this ticket right here:
https://github.com/david415/tahoe-lafs/tree/david-1449

daira commented

2015-04-10 17:51:57 +00:00

Results of David and my pairing session today: https://github.com/daira/tahoe-lafs/commits/1449.wait-for-enough-servers.1

dawuud commented

2015-04-10 20:49:31 +00:00

My current state is now commit a6708d07e7a54b0c44ae51602f40b4919ca834fe of branch https://github.com/david415/tahoe-lafs/tree/david-1449

The client unit tests now pass.
I've removed the storage client test.

The drop-upload test fails... it hangs forever.

dawuud commented

2015-04-10 23:02:59 +00:00

In more recent commits I got more unit tests to pass...

However, I'm still having trouble getting the drop-upload test to pass.

tahoe-lafs modified the milestone from soon to 1.11.0 2015-04-12 22:34:18 +00:00

dawuud commented

2015-04-13 23:38:26 +00:00

OK, I pushed my latest changes here:
https://github.com/david415/tahoe-lafs/tree/david-1449

I wasn't able to get the drop-uploader unit tests passing, so I just fixed the naming-convention usage that Daira mentioned earlier.

dawuud commented

2015-04-14 06:20:09 +00:00

Fixed it with a deque! Same branch; please review.

daira commented

2015-04-14 16:59:38 +00:00

On #tahoe-lafs:

daira: dawuud: the current code in `DropUploader._notify` will (in the `path not in self._pending` branch) call `_append_to_deque`, which adds the path to `self._pending`, then process the deque (synchronously), then add the path to `self._pending` again
daira: the second `self._pending.add(path)` is wrong and should be deleted
daira: processing the deque synchronously also may cause problems
daira: it may be the change to synchronous processing that made the tests work, but I think we probably have to change it back to asynchronous
daira: in particular, note that the deferred that is returned by `_process` is dropped by the call to `func(*fields[1:])` in `_process_deque`
daira: so this code will try to upload things in parallel...
daira: which may work for immutable files, but is a bad idea for mutables, especially directories
daira: I'll rebase the code as it is, anyway, so that we can review it more easily

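The effect of dropping each upload's deferred can be illustrated with a small sketch. It uses Python's asyncio as a stand-in for Twisted Deferreds and is not the `DropUploader` code; `ConcurrencyProbe`, `process_dropping_results` and `process_serially` are hypothetical names. When the result of each upload call is not waited on before starting the next, every queued upload ends up in flight at once; waiting for each one keeps them strictly sequential.

```python
import asyncio
from collections import deque


class ConcurrencyProbe:
    """Records how many fake uploads are in flight at the same time."""

    def __init__(self):
        self.in_flight = 0
        self.max_in_flight = 0

    async def upload(self, path):
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        await asyncio.sleep(0.01)  # stand-in for network I/O
        self.in_flight -= 1


async def process_dropping_results(probe, pending):
    # Analogue of calling func(*fields[1:]) and ignoring the returned
    # Deferred: each upload is started without waiting for the previous one.
    tasks = []
    while pending:
        tasks.append(asyncio.create_task(probe.upload(pending.popleft())))
    # Gathered only so the demo exits cleanly; by now all uploads overlap.
    await asyncio.gather(*tasks)


async def process_serially(probe, pending):
    # Wait for each upload to finish before starting the next one.
    while pending:
        await probe.upload(pending.popleft())


async def demo():
    results = {}
    for name, strategy in [("dropped", process_dropping_results),
                           ("serial", process_serially)]:
        probe = ConcurrencyProbe()
        await strategy(probe, deque(["a", "b", "c"]))
        results[name] = probe.max_in_flight
    return results


print(asyncio.run(demo()))  # {'dropped': 3, 'serial': 1}
```

For immutable files the overlap may be harmless, but for mutable files and directories, concurrent updates to the same object are exactly what the serial version avoids.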
daira commented

2015-04-14 17:05:00 +00:00

```diff
-            return self.uploader.startService()
+            self.uploader.setServiceParent(self.client)
+            self.uploader.startService()
+            self.uploader.upload_ready()
+            return None
```

`self.uploader.startService()` returns a deferred, which is dropped here. (Maybe it should be synchronous; the Twisted API doc is not clear.)

daira commented

2015-04-14 17:05:51 +00:00

I really want linear types for deferreds, so they can't be dropped implicitly!

daira commented

2015-04-14 17:27:25 +00:00

Rebased at https://github.com/daira/tahoe-lafs/commits/1449.dropupload-redundant-uploads.2.

dawuud commented

2015-04-14 19:17:11 +00:00

New working code here: https://github.com/david415/tahoe-lafs/tree/1449.dropupload-redundant-uploads.2

- I fixed the add-to-pending bug.
- Perform uploads serially (let's optimize later!).
- Push and pop the deque asynchronously (is that the correct term?). This design is highly influenced by Foolscap's `eventually`...
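A minimal sketch of the design described in this comment (serial uploads, and deque appends that never trigger processing synchronously), again using asyncio in place of Twisted and `loop.call_soon()` in place of Foolscap's `eventually`. `SerialUploadQueue` and its methods are hypothetical and are not the code on the linked branch. One assumption worth flagging: a path leaves the pending set when its upload *starts*, so a change made while that file is uploading queues a fresh upload rather than being lost.

```python
import asyncio
from collections import deque


class SerialUploadQueue:
    """Hypothetical sketch: queue changed paths and upload them one at a time.

    append() never starts an upload itself; it schedules a processing turn
    for a later event-loop iteration via loop.call_soon(), loosely analogous
    to Foolscap's eventually().
    """

    def __init__(self, upload):
        self._upload = upload      # hypothetical async callable: path -> None
        self._deque = deque()
        self._pending = set()      # paths queued but not yet started
        self._busy = False         # True while one upload is in flight
        self.uploaded = []         # paths whose upload succeeded, in order

    def append(self, path):
        if path in self._pending:
            return                 # already queued; the queued upload covers it
        self._pending.add(path)    # added exactly once per queued entry
        self._deque.append(path)
        asyncio.get_running_loop().call_soon(self._turn)

    def _turn(self):
        if self._busy or not self._deque:
            return                 # an upload is running, or nothing to do
        self._busy = True
        path = self._deque.popleft()
        self._pending.discard(path)  # later changes to path will re-queue it
        task = asyncio.get_running_loop().create_task(self._upload(path))
        task.add_done_callback(lambda t, p=path: self._finished(t, p))

    def _finished(self, task, path):
        self._busy = False
        # Failed uploads are simply not recorded (no retry in this sketch).
        if not task.cancelled() and task.exception() is None:
            self.uploaded.append(path)
        # Schedule the next upload rather than starting it synchronously.
        asyncio.get_running_loop().call_soon(self._turn)


async def demo():
    in_flight = 0
    max_in_flight = 0

    async def fake_upload(path):
        nonlocal in_flight, max_in_flight
        in_flight += 1
        max_in_flight = max(max_in_flight, in_flight)
        await asyncio.sleep(0.01)  # stand-in for network I/O
        in_flight -= 1

    queue = SerialUploadQueue(fake_upload)
    for path in ["a", "b", "a", "c"]:
        queue.append(path)  # the second "a" is still queued, so it is skipped
    while len(queue.uploaded) < 3:
        await asyncio.sleep(0.01)
    return queue.uploaded, max_in_flight


print(asyncio.run(demo()))  # (['a', 'b', 'c'], 1)
```

The single `_busy` flag is what enforces sequential uploads; deferring every `_turn` through the event loop keeps `append()` cheap and avoids re-entrant processing from inside a completion callback.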

dawuud commented

2015-04-17 20:11:18 +00:00

I just designed another uploader deque. Here:
https://github.com/david415/tahoe-lafs/tree/2406.otf-objective-2.3.1-fix-upload-deque

I believe this to be a correct design that enforces sequential uploads and asynchronous deque appends... without the weird concurrent interleave bugs of my sloppy previous attempts.

dawuud commented

2015-04-23 01:24:52 +00:00

I think this:
https://github.com/david415/tahoe-lafs/blob/2406.otf-objective-2.3.1-fix-upload-deque/src/allmydata/client.py#L349-L353

is bad because it may cause unbalanced share allocation to storage servers. It seems likely that connecting to only K or H+1 servers would cause a single file's shares to be clustered on a smaller number of servers, meaning that some individual servers will get more than one share of the same file. This is bad, especially given that we do not yet have a "rebalancing" command-line tool of any kind.

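To make the clustering concern concrete: by the pigeonhole principle, if N shares are placed on S connected servers, at least one server must hold ceil(N/S) of them. The sketch assumes the Tahoe-LAFS default encoding parameters (K=3, H=7, N=10); the helper name `least_max_load` is hypothetical.

```python
import math


def least_max_load(total_shares, connected_servers):
    """Smallest possible number of shares on the busiest server when
    total_shares shares are spread over connected_servers servers
    (pigeonhole bound: ceil(N / S))."""
    if connected_servers < 1:
        raise ValueError("need at least one connected server")
    return math.ceil(total_shares / connected_servers)


K, H, N = 3, 7, 10  # assumed encoding parameters (Tahoe-LAFS defaults)
for servers in (K, H + 1, N):
    print(servers, least_max_load(N, servers))
# 3 4
# 8 2
# 10 1
```

So with only K=3 servers connected, some server necessarily holds at least 4 of the 10 shares, whereas with all N=10 connected no server needs more than one.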
daira commented

2015-04-27 16:08:01 +00:00

I'm happy with the min(N, H+1) heuristic for now; we can reconsider this with Zooko and Brian's input before we merge to trunk.
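For concreteness, the min(N, H+1) heuristic as a one-line helper (the name `connection_threshold` is hypothetical), evaluated for a few (N, H) pairs including the Tahoe-LAFS defaults N=10, H=7. The cap at N reflects that N is the total number of shares an upload produces.

```python
def connection_threshold(total_shares, happy):
    """Number of storage servers to wait for before uploading: min(N, H+1)."""
    return min(total_shares, happy + 1)


for n, h in [(10, 7), (5, 5), (1, 1)]:
    print((n, h), connection_threshold(n, h))
# (10, 7) 8
# (5, 5) 5
# (1, 1) 1
```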

tahoe-lafs added the fixed label 2015-05-02 16:42:42 +00:00

daira closed this issue 2015-05-02 16:42:42 +00:00

daira commented

2015-05-02 16:44:11 +00:00

Closing this and using ticket #2406 for any further review comments.

warner commented

2016-03-22 05:02:52 +00:00

Milestone renamed

warner modified the milestone from 1.11.0 to 1.12.0 2016-03-22 05:02:52 +00:00

meejah <meejah@meejah.ca> commented

2016-04-26 19:51:59 +00:00

In [a56a3ad/trunk](/tahoe-lafs/trac-2024-07-25/commit/a56a3adaae52dd8e128c9dc9631985af8d207a63):

```
Teach StorageFarmBroker to fire a deferred when a connection threshold is reached. refs #1449

Signed-off-by: Daira Hopwood <daira@jacaranda.org>
```
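The commit message describes StorageFarmBroker firing a deferred once a connection threshold is reached. A rough asyncio analogue of that idea follows; the class and method names are hypothetical and do not reproduce the actual StorageFarmBroker API. A caller such as the drop-upload frontend would wait on the returned future before starting uploads, using a threshold such as min(N, H+1).

```python
import asyncio


class ConnectionThresholdNotifier:
    """Hypothetical sketch: hand out futures that resolve once at least
    `threshold` distinct storage servers are connected."""

    def __init__(self):
        self._connected = set()
        self._waiters = []  # list of (threshold, future) pairs

    def when_connected(self, threshold):
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append((threshold, fut))
        self._check()  # may resolve immediately if enough are connected
        return fut

    def server_connected(self, server_id):
        self._connected.add(server_id)  # reconnects of a known server don't count twice
        self._check()

    def server_disconnected(self, server_id):
        # Lowers the count; futures that already fired stay fired.
        self._connected.discard(server_id)

    def _check(self):
        still_waiting = []
        for threshold, fut in self._waiters:
            if fut.done():
                continue  # e.g. cancelled by the caller
            if len(self._connected) >= threshold:
                fut.set_result(len(self._connected))
            else:
                still_waiting.append((threshold, fut))
        self._waiters = still_waiting


async def demo():
    notifier = ConnectionThresholdNotifier()
    ready = notifier.when_connected(threshold=3)
    for server in ["s1", "s2", "s2", "s3"]:
        assert not ready.done()
        notifier.server_connected(server)
    return await ready


print(asyncio.run(demo()))  # 3
```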
