drop-upload: updates to files may be lost if few servers are connected (e.g. soon after starting the gateway) #1449
Reference: tahoe-lafs/trac-2024-07-25#1449
This is related to #719, but it may be a more significant problem for the drop-upload frontend because it starts monitoring the directory immediately.
In the latest #1429 patch, the 'Operational Statistics' page shows the number of drop-uploads that have failed, and there may be information about those failures in logs, but there is no other indication to the user that changed files have not been successfully uploaded.
Title changed from "drop-upload: files may not be uploaded with sufficient diversity if few servers are connected (e.g. soon after starting the gateway)" to "drop-upload: updates to files may be lost if few servers are connected (e.g. soon after starting the gateway)".

I've got a very rough, untested draft solution to this ticket right here:
https://github.com/david415/tahoe-lafs/tree/david-1449
Results of David and my pairing session today: https://github.com/daira/tahoe-lafs/commits/1449.wait-for-enough-servers.1
My current state is now:
commit a6708d07e7a54b0c44ae51602f40b4919ca834fe of branch https://github.com/david415/tahoe-lafs/tree/david-1449
The client unit tests now pass. I've removed the storage client test. The drop-upload test still fails: it hangs forever.
In more recent commits I got more unit tests to pass; however, I'm still having trouble getting the drop-upload test to pass.
OK, I pushed my latest changes here:
https://github.com/david415/tahoe-lafs/tree/david-1449
I wasn't able to get the drop-uploader unit tests passing, so I just fixed the naming-convention usage that Daira mentioned earlier.
Fixed it with a deque! Same branch; please review.
On #tahoe-lafs:
self.uploader.startService()
returns a Deferred which is dropped here. (Maybe it should be synchronous; the Twisted API doc is not clear.)

I really want linear types for Deferreds, so they can't be dropped implicitly!
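As a minimal sketch (assuming startService() here really does return a Deferred, and using the names from the quoted snippet), the least we could do is attach an errback so failures are at least logged instead of being garbage-collected unobserved:

from twisted.python import log

d = self.uploader.startService()
if d is not None:
    # Make sure an errback observes failures instead of letting the
    # Deferred be dropped silently.
    d.addErrback(log.err)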
Rebased at https://github.com/daira/tahoe-lafs/commits/1449.dropupload-redundant-uploads.2.
New working code here: https://github.com/david415/tahoe-lafs/tree/1449.dropupload-redundant-uploads.2
I fixed the add-to-pending bug.
Perform uploads serially (let's optimize later!), and push and pop the deque asynchronously (is that the correct term?). This design is highly influenced by Foolscap's eventually.
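A minimal sketch of the append side of that idea, assuming Foolscap's eventually (from foolscap.eventual) is used to push each append onto a later reactor turn; the class and attribute names here are hypothetical, not the branch's actual code:

from collections import deque

from foolscap.eventual import eventually


class UploadDeque:
    """Hypothetical sketch: appends happen on a later reactor turn."""

    def __init__(self):
        self._deque = deque()

    def append(self, relpath):
        # Schedule the real append for a subsequent turn, Foolscap-style,
        # so the filesystem-event callback returns immediately.
        eventually(self._deque.append, relpath)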
...I just designed another uploader deque. Here:
https://github.com/david415/tahoe-lafs/tree/2406.otf-objective-2.3.1-fix-upload-deque
I believe this to be a correct design that enforces sequential uploads and asynchronous deque appends... without the weird concurrent interleave bugs of my sloppy previous attempts.
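For illustration, a hedged sketch of the "sequential uploads, asynchronous appends" property; all names (SerialUploader, upload_one, and so on) are hypothetical and this is not the branch's actual code:

from collections import deque

from twisted.internet import defer
from twisted.python import log


class SerialUploader:
    """Hypothetical sketch: drain a deque one upload at a time."""

    def __init__(self, upload_one):
        self._deque = deque()
        self._running = False
        self._upload_one = upload_one  # callable(path) -> Deferred

    def append(self, relpath):
        self._deque.append(relpath)
        if not self._running:
            self._running = True
            self._drain()

    def _drain(self):
        if not self._deque:
            self._running = False
            return
        relpath = self._deque.popleft()
        d = defer.maybeDeferred(self._upload_one, relpath)
        # Log failures but keep draining, so one bad file does not wedge
        # the whole queue.
        d.addErrback(log.err)
        d.addCallback(lambda _ign: self._drain())

The point is that _drain only re-invokes itself from the previous upload's callback chain, so two uploads are never in flight at once even though append may be called at any time.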
I think this:
https://github.com/david415/tahoe-lafs/blob/2406.otf-objective-2.3.1-fix-upload-deque/src/allmydata/client.py#L349-L353
is bad because it may cause unbalanced share allocation across storage servers. It seems likely that connecting to only K or H+1 servers would cause a single file's shares to be clustered on a smaller number of servers, meaning that some individual servers will get more than one share of that file. This is bad, especially given that we do not yet have a "rebalancing" command-line tool of any kind.
I'm happy with the min(N, H+1) heuristic for now; we can reconsider this with Zooko and Brian's input before we merge to trunk.
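As a reference point, a tiny sketch of the heuristic as stated, with hypothetical names for the connected-server set, the total share count N, and the happiness threshold H:

def enough_servers_connected(connected_servers, total_shares, happiness):
    """Wait until at least min(N, H+1) servers are connected before
    starting (or retrying) queued uploads."""
    return len(connected_servers) >= min(total_shares, happiness + 1)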
Closing this and using ticket #2406 for any further review comments.
Milestone renamed
In a56a3ad/trunk: