Dynamic share migration to maintain file health #661
Dynamic share repair to maintain file health. Based on the following features, which already exist in Allmydata-Tahoe 1.3, we can improve automatic repair:
1. Foolscap provides knowledge of which nodes are alive.
2. Verification of file availability can be delegated to another node through a read-cap or a verify-cap without security risk (see the sketch below).
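As a hedged illustration of that delegation, here is a minimal Python sketch that asks a Tahoe web gateway to check a file given only its verify-cap. The gateway URL is a placeholder, the function name is my own, and the exact JSON fields returned by the webapi's t=check operation may differ between releases:

```python
import json
import urllib.request

# Placeholder: web gateway of the node doing the check on our behalf.
GATEWAY = "http://127.0.0.1:3456"

def check_with_verify_cap(verify_cap):
    """Ask a Tahoe web gateway to check a file given only its verify-cap.

    The verify-cap lets the delegate confirm that enough shares exist
    without being able to read (or modify) the file's contents.
    """
    url = "%s/uri/%s?t=check&verify=true&output=JSON" % (GATEWAY, verify_cap)
    with urllib.request.urlopen(url, data=b"") as resp:  # t=check is a POST
        report = json.load(resp)
    results = report.get("results", {})
    return results.get("healthy"), results.get("count-shares-good")

# Usage (the cap value is illustrative):
#   healthy, good = check_with_verify_cap("URI:CHK-Verifier:...")
```

Only the verify-cap is handed to the delegate, so it can count good shares but cannot read the plaintext.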
The proposed auto-repair process (a sketch follows the list):
1. Use a memory-based algorithm: because the client knows where the file's shares exist, we can keep track of which shares are alive. For simplicity, we infer a share's availability from the availability of the node that holds it.
2. The repair process is triggered automatically by the repairer. Assigning repair responsibility can be done with several techniques, traded off against repair cost: network bandwidth and fault tolerance.
3. Timeout: we can use a lazy-repair technique to avoid reacting to temporary node failures, i.e. wait for a certain time before the repair process starts.
4. Reintegration: a memory-based technique that remembers failed storage servers which later come back to life will help reduce the Tahoe grid's resource usage, such as network bandwidth and storage space.
5. Repairer: selecting who is responsible for repair takes many issues into consideration: security, repairer location, and repairer resources.
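A minimal sketch of the memory-based bookkeeping described above, with a lazy-repair timeout and reintegration of returning servers. Every name here (ShareTracker, the grace period, and so on) is hypothetical; this is an illustration of the idea, not existing Tahoe code:

```python
import time

GRACE_PERIOD = 6 * 3600  # seconds to wait before repairing (lazy repair)

class ShareTracker:
    """Remember which servers hold shares of a file and when they failed.

    Share availability is inferred from node availability: a share is
    considered alive iff the server holding it is currently connected.
    """
    def __init__(self, placements):
        # placements: dict mapping server_id -> set of share numbers held
        self.placements = placements
        self.down_since = {}  # server_id -> timestamp of first observed failure

    def node_down(self, server_id, now=None):
        self.down_since.setdefault(server_id, now or time.time())

    def node_up(self, server_id):
        # Reintegration: a returning server brings its shares back,
        # so pending repair work for them can be cancelled.
        self.down_since.pop(server_id, None)

    def alive_shares(self):
        alive = set()
        for server_id, shnums in self.placements.items():
            if server_id not in self.down_since:
                alive |= shnums
        return alive

    def needs_repair(self, total_shares, now=None):
        # Lazy repair: only act once some server has been down longer
        # than the grace period AND the file has lost some of its shares.
        now = now or time.time()
        expired = any(now - t > GRACE_PERIOD for t in self.down_since.values())
        return expired and len(self.alive_shares()) < total_shares
```

A repairer loop could call needs_repair() periodically and, when it returns True, hand the file's repair-cap or verify-cap to whichever node is selected to perform the actual repair.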
I reformatted the original description so that trac will represent the numbered items as a list.
Re-reformatted it: I think trac requires the leading space to trigger the "display as list" formatter.
The following clump of tickets is closely related:
Actually there are probably too many overlapping tickets here.
Part of the redundancy is due to distinguishing repair from rebalancing. But when #614 and #778 are fixed, a healthy file will by definition be balanced across servers, so there's no need to make that distinction. Perhaps there will also be a "super-healthy" status that means shares are balanced across the maximum number of servers, i.e. N. (When we support geographic dispersal / rack-awareness, the definitions of "healthy" and "super-healthy" will presumably change again so that they also imply that shares have the desired distribution.)
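To make that distinction concrete, here is one hedged way the health classification could be expressed; the thresholds and names below are my own illustration, not the actual checker logic from #614/#778:

```python
def classify(placements, k, n, num_servers):
    """Classify file health from share placement.

    placements: dict mapping server_id -> set of distinct share numbers held
    k, n: the file's k-of-n erasure-coding parameters
    num_servers: number of connected storage servers in the grid
    Returns "unrecoverable", "unhealthy", "healthy", or "super-healthy".
    """
    distinct_shares = set()
    for shnums in placements.values():
        distinct_shares |= shnums
    if len(distinct_shares) < k:
        return "unrecoverable"      # not enough shares to rebuild the file
    if len(distinct_shares) < n:
        return "unhealthy"          # recoverable, but some shares are missing
    servers_with_shares = sum(1 for shnums in placements.values() if shnums)
    if servers_with_shares >= min(n, num_servers):
        return "super-healthy"      # spread over the maximum number of servers
    return "healthy"                # all n shares exist, but not maximally spread
```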
There are basically four options for how repair/rebalancing could be triggered:
The last option does not justify 4 tickets! (#450, #483, #543, #661) Unless anyone objects, I'm going to merge these all into #483 [edit: actually #543].
Duplicate of #543.