Build intermittently-connected replication-only storage grid #2123
Reference: tahoe-lafs/trac-2024-07-25#2123
I'm trying to achieve the following Tahoe-LAFS scenario:
Assumptions
Requirements
Proposed setup
With this setup, the process would be as follows:
Current problem
Share placement, as it works today, cannot guarantee that a share is not placed on a node that already holds one. With k=1/N=2, if a cron-triggered repair runs while the node is isolated, we would be wasting space, since the local node already holds enough shares to retrieve the whole file.
Worse still: a single node holding both of the N shares would prevent arriving nodes from getting their replicas (their own local share), since the repairer would be satisfied with both shares being present in the grid, even on the same node. This could lead to shares never being replicated outside the creator node, making that node a SPOF for the data.
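To make the failure mode above concrete, here is an illustrative sketch (not actual Tahoe-LAFS repairer code) of why a grid-wide share count hides the single-node case. The `placements` mapping and function names are assumptions for illustration only:

```python
# Illustrative sketch, not the real repairer. `placements` maps
# share number -> the server id currently holding that share.

def is_healthy(placements, needed_n):
    # A naive health check: are all N shares present somewhere in the grid?
    # This is satisfied even if every share lives on the same server.
    return len(placements) >= needed_n

def has_spof(placements):
    # The actual risk described above: all shares sit on one server.
    return len(set(placements.values())) == 1

# With k=1/N=2: both shares end up on the creator node.
placements = {0: "node-A", 1: "node-A"}
print(is_healthy(placements, needed_n=2))  # True: repairer is satisfied
print(has_spof(placements))                # True: yet node-A is a SPOF
```

A repairer that only counts shares grid-wide never triggers, so the second node never receives its replica.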
Proposed solution
Add a server-side configuration option to storage nodes that makes them gently reject holding shares in excess of k. This would address the space wasting. Also, since the local storage node would refuse to store an extra/unneeded share, a newly arriving storage node would receive the remaining share at repair time to fulfill the desired N, thus achieving/increasing replication.
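A minimal sketch of the policy being proposed, assuming a hypothetical `max_shares` setting and a simplified `ShareStore` class (neither is the real Tahoe-LAFS storage-server API):

```python
# Hypothetical sketch of the proposed server-side policy: a storage node
# that refuses to accept more than `max_shares` (= k) distinct shares of
# any one file. Class and parameter names are illustrative assumptions.

class ShareStore:
    def __init__(self, max_shares):
        # Maximum number of shares of one file this node will hold (e.g. k).
        self.max_shares = max_shares
        # storage_index -> set of share numbers already held locally.
        self.shares = {}

    def allocate(self, storage_index, sharenum):
        """Return True if the share is (or already was) stored, False if
        the node gently rejects it because it already holds k shares."""
        held = self.shares.setdefault(storage_index, set())
        if sharenum in held:
            return True  # already holding this exact share
        if len(held) >= self.max_shares:
            return False  # storing more than k here would be wasted space
        held.add(sharenum)
        return True

# With k=1/N=2 as in the scenario: the node accepts one share of a file
# and rejects the second, leaving it free to land on an arriving node.
store = ShareStore(max_shares=1)
print(store.allocate(b"si-1", 0))  # True: first share accepted
print(store.allocate(b"si-1", 1))  # False: node already holds k shares
print(store.allocate(b"si-1", 0))  # True: re-offering a held share is fine
```

The key design point is that rejection is per-file and server-local: the repairer, seeing the rejection, would place the surplus share elsewhere, which is exactly what increases replication.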
Current or future placement improvements can't be relied on to achieve this easily, and since this looks more like a storage-server-side policy, it's unlikely they will. At least, that's as far as I'm currently able to understand share placement, or how this could be achieved with even minimal guarantees (sometimes it feels like quantum physics to me). I think it's too much to rely on upload-client behavior/decisions; clients will have a very limited knowledge window of the whole grid, IMO.
Apart from the described use case, this setting would be useful in other scenarios where the storage node operator needs to exercise some control for other reasons.
I've discussed this scenario already with Daira and Warner to ensure that the described solution would work for this scenario. As per Zooko's suggestion, I've done this writeup to allow some discussion before jumping into coding in my own branch as the next step. That's in a separate ticket (#2124), just to keep feature specs and implementation separate from this single use case, since I think other scenarios might come up that could benefit from implementing the proposed solution.
I've also collected implementation details while discussing this with Daira and Warner ~~but I'll leave that for the followup ticket~~; those can also be found at #2124.
Anyone else interested in this scenario? Suggestions/improvements?
For more info, this ticket is a subset of #1657. See also related issues: #793 and #1107.
> No wasted space. Here, "wasted space" is defined as "shares in excess of necessary to read the file locally" (>k). We want only to hold shares enough to have a full local replica of the grid readable, not any more.
This requirement is #2107, which, I must confess, has confused the hell out of me. Please visit #2107 and see if you can explain your requirement precisely enough that someone could implement it.
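One possible reading of the quoted requirement (my interpretation only, not a spec from #2107): any locally held shares beyond the k needed to reconstruct the file locally count as waste. The helper below is hypothetical:

```python
# Hypothetical interpretation of "wasted space": local shares beyond the
# k needed to read the file locally. `local_shares` is a set of share
# numbers held by this node; `k` is the erasure-coding "needed" parameter.

def wasted_shares(local_shares, k):
    return max(0, len(local_shares) - k)

print(wasted_shares({0, 1}, k=1))  # 1: one surplus share's worth of space
print(wasted_shares({0}, k=1))     # 0: exactly enough for a local replica
```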
Linking ticket #2124 with feature implementation.
Linking to related issues: #793, #1107, #1657 in issue summary.