the mutable publisher should try harder to place all shares #1640
If a connection error is encountered while pushing a share to a storage server, the mutable publisher forgets about the writer object associated with that (share, server) placement. This is consistent with the pre-1.9 publisher and, in high-level terms, means that the publisher regards the placement as probably invalid, attributing the error to a server failure or something like it. The pre-1.9 publisher then attempts to find another home for the share that had been placed on the broken server; the current publisher doesn't.
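For concreteness, here is a minimal, self-contained sketch of that difference. The class and attribute names are purely illustrative; they are not the real `publish.py` API.

```python
from collections import namedtuple

# Hypothetical stand-in for the real writer objects: one per (share, server) pair.
Writer = namedtuple("Writer", ["shnum", "server"])

class ToyPublisher:
    def __init__(self, writers):
        # writers: dict mapping share number -> writer for the server that
        # share is currently assigned to
        self.writers = dict(writers)
        self.bad_servers = set()

    def on_connection_error(self, shnum):
        # Current (1.9-style) reaction: forget the (share, server) pairing
        # and treat that placement as invalid.  The share is simply no
        # longer tracked; no new home is sought for it.
        writer = self.writers.pop(shnum, None)
        if writer is not None:
            self.bad_servers.add(writer.server)
        # The pre-1.9 publisher would continue here: ask peer selection for
        # another server and queue shnum to be written again.
```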
When I first wrote the publisher, I wanted to support streaming upload of mutable files. That made it hard to find a new home for a share placed on a broken storage server: we wouldn't necessarily still have all of the parts of the share that had been generated and placed before the failure, so we couldn't upload them to a new server. We ended up ditching streaming uploads due to other concerns; instead, we write each share all at once, so everything we will write to a storage server is available to us at write time. Given this, there's no compelling reason that the publisher couldn't attempt to find a new home for shares placed on broken servers. Ensuring that all shares are placed whenever possible makes it more likely that there will be a recoverable version of the mutable file available after an update.
In practical terms, the current behavior increases the chance of data loss somewhat, in proportion to the number of servers that fail during a publish operation. If too many storage servers fail during the upload and too much of the initial share placement is lost to those failures, the newly published mutable file might not be recoverable. A fix would involve a way to change the server associated with a writer after the writer is created, and probably some control-flow changes to ensure that write failures result in shares being reassigned.
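Continuing the toy sketch above: since the whole encoded share is in memory at write time, the reassignment could look roughly like this. `reassign_share` and `make_writer` are invented names for illustration, not anything from the actual code.

```python
def reassign_share(publisher, shnum, candidate_servers, make_writer):
    # Pick a replacement server that has not already failed during this
    # publish and that does not already hold one of our writers.
    in_use = {w.server for w in publisher.writers.values()}
    for server in candidate_servers:
        if server in publisher.bad_servers or server in in_use:
            continue
        # Bind a fresh writer to the new server.  Because the publisher no
        # longer streams, it still has the complete encoded share, so the
        # new writer can be handed everything at once.
        publisher.writers[shnum] = make_writer(shnum, server)
        return True
    # No usable server left; the share stays unplaced, and the publish may
    # end up below the recoverability threshold.
    return False
```

A write failure would then call something like this instead of merely dropping the writer, and re-enter the normal push path for the reassigned share.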
I'm not sure this is important enough to warrant trying to fix it in 1.9.1, because a server failing during an upload isn't that common, and if it does happen it isn't that damaging. Or, wait, does mutable upload have a servers-of-happiness-style check that returns a failure message if the file is not sufficiently robustly stored?
The old publisher won't finish until it has either placed all of its shares somewhere or has tried and failed to do so a certain number of times; in the latter case, a failure message is returned. The new publisher returns a failure message if it can't place enough shares for the file to be recoverable. So the robustness criterion in the old publisher is whether all shares are placed somewhere, while the robustness criterion in the new publisher is whether enough shares are placed for the file to be recoverable.
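The two criteria can be summarized roughly as follows. The names (`placed_shnums`, `total_shares`, `k`) are illustrative; `k` is the usual erasure-coding recoverability threshold.

```python
def old_publish_succeeded(placed_shnums, total_shares):
    # Old robustness criterion: every share must have found a home
    # somewhere (after a bounded number of retries the publisher gives up
    # and reports failure).
    return len(placed_shnums) == total_shares

def new_publish_succeeded(placed_shnums, k):
    # New (1.9) robustness criterion: enough shares for recoverability,
    # i.e. at least k of them were placed.
    return len(placed_shnums) >= k
```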
Replying to zooko:
No, that is ticket #1057.