build some share-migration tools #481
Zandr and I were talking about what sorts of tools we'd like to have
available when it comes time to move shares from one disk to another.
The Repairer is of course the first priority, and should be able to handle
share loss, but there are some techniques we might use to make things more
efficient: using shares that already exist instead of generating new ones.
If we have a large disk full of shares that is developing problems (bad blocks,
etc.), we should be able to dd or scp the shares off to another system. This
wants a tool that will try to read each share (skipping it if we get IO
errors), verify as much of it as we can (e.g. checking whether the UEB hash
matches), and then send it over the network to somewhere else.
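A rough sketch of such an evacuation tool (not real Tahoe code; the `verify`
hook is just a placeholder for whatever UEB/hash check the real verifier would
perform):

```python
import os, shutil

def evacuate_shares(src_root, dst_root, verify=None):
    """Copy every readable share under src_root to dst_root.

    Shares that raise IO errors, or that fail the optional verify()
    check, are skipped rather than aborting the whole run.
    """
    copied, skipped = 0, 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_root, os.path.relpath(src, src_root))
            try:
                if verify is not None and not verify(src):
                    skipped += 1
                    continue
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)
                copied += 1
            except (IOError, OSError) as e:
                # bad block, unreadable sector, etc.: note it and keep going
                print("skipping %s: %s" % (src, e))
                skipped += 1
    return copied, skipped
```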
If a disk is starting to fail (we've seen worrying SMART statistics, or we're
starting to see hash failures in the shares we return, etc.), then we might
want to kick the disk into "abandon ship" mode: get all shares off the disk
(and onto better ones) as quickly as possible. The server could do the
peer-selection work itself, asking around to find the "right" server for each
share (i.e. the first one in the permuted order that doesn't already hold a
share), or it could just fling them to a "lifeboat" node and leave the
peer-selection work until later.
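For the "find the right server" step, something like the sketch below could
work. The server objects and their `serverid` / `has_share` / `abandoning`
attributes are hypothetical, and the hash used here only approximates the
actual permuted-ring calculation:

```python
from hashlib import sha256

def permuted_servers(storage_index, servers):
    # Approximate the permuted ring: order servers by a hash of
    # (storage index, server id).
    return sorted(servers,
                  key=lambda s: sha256(storage_index + s.serverid).digest())

def pick_target(storage_index, servers):
    for server in permuted_servers(storage_index, servers):
        if server.abandoning:
            continue              # never put shares back on a dying disk
        if not server.has_share(storage_index):
            return server         # first permuted server without a copy
    return None                   # everyone suitable already has one
```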
Repair nodes should have a directory where we can dump shares that came from
other servers: the repair node should treat that directory as a work queue,
finding a home for each share (or discarding it as a duplicate). The repair
node needs to be careful not to treat abandon-ship nodes as suitable targets,
so we avoid putting shares back on the server that was trying to get rid of
them.
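The work-queue loop itself could be as simple as the following;
`parse_storage_index`, `upload_share`, and `pick_target` (from the previous
sketch, which already skips abandon-ship nodes) are hypothetical placeholders
for the real plumbing:

```python
import os, time

def process_incoming(incoming_dir, servers):
    for name in os.listdir(incoming_dir):
        path = os.path.join(incoming_dir, name)
        storage_index = parse_storage_index(name)    # hypothetical helper
        target = pick_target(storage_index, servers)
        if target is None:
            os.remove(path)       # every suitable server already has a copy:
            continue              # discard as a duplicate
        with open(path, "rb") as f:
            target.upload_share(storage_index, f.read())
        os.remove(path)           # placed successfully, clear the queue entry

def run_forever(incoming_dir, servers, poll_interval=60):
    while True:
        process_incoming(incoming_dir, servers)
        time.sleep(poll_interval)
```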
It might also be useful to split up a storage server, or to take a functional
server and export half of its shares in a kind of rebalancing step.
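One cheap way to decide which half to export would be to reuse the permuted
ordering: for each share, keep it on whichever of the two servers comes first
in the permuted order for that storage index, and export the rest. A sketch,
again using the hypothetical `permuted_servers()` helper from above:

```python
def shares_to_export(old_server, new_server, storage_indexes):
    export = []
    for si in storage_indexes:
        first = permuted_servers(si, [old_server, new_server])[0]
        if first is new_server:
            export.append(si)     # the new server is the "right" home
    return export
```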
Duplicate of #864.