backup manager task (inside the node) #1018
So, after finally wrangling the hardware into the right places,
I've finished setting up my personal backup grid, and have
started to upload my large photo archives into it with the "tahoe
backup" CLI tool. It's a large archive, and my rough estimate is
that it will take about 45 hours of continuous uploading to
complete.
One of the nodes is in my parents' house, and their downstream
DSL is not very fast, so I expect that I'm maxing it out while
I'm doing the upload. I don't want to impact their email and web
browsing, so I'm trying to only run the upload at night. At this
rate (8 hours a day) my backup is likely to take about 6 days.
Each night I start "tahoe backup", and each morning I kill it.
The backupdb is working perfectly, and it only takes a few
seconds to skip over the 10k-ish files that have already been
uploaded.
But, what I'm starting to want is something to automate all this.
I'd like to have a "backup manager" task, inside the node, which
knows the source directory and target dirnode, and is configured
with some timing information. Maybe something in tahoe.cfg like
this:
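(The section and key names below are only a sketch of the shape I have in
mind; the `b1.` prefix, `frequency`, and `allowed_times` knobs are the ones
described next, everything else is illustrative.)

```
[backup-jobs]
# local directory to back up (illustrative key name)
b1.source = /home/warner/Photos
# writecap or alias of the target dirnode (illustrative key name)
b1.target = URI:DIR2:...
# don't start a new run until this long after the last one finished
b1.frequency = 1d
# only run inside this window (local time)
b1.allowed_times = 2300-0700
```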
The node would use a small DB to remember how long it's been
since the last backup completed, and wouldn't start a new one
until the ".frequency" duration had elapsed. It would look at
".allowed_times" to figure out whether it's allowed to start a
backup right now or not, and would wait until the window begins.
At that point, it would start a node-side "tahoe backup"
equivalent, and let it run until either it completes or the
window closes, at which point the process would be suspended
until the next window.
The "b1" prefix is just an .ini-format trick to let you specify
multiple jobs.
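A minimal sketch of the decision the manager would make each time it wakes
up (pure illustration, not real Tahoe code; the argument names just mirror
the hypothetical `b1.*` keys above):

```python
from datetime import datetime, timedelta

def in_window(now, window):
    """True if `now` falls inside an HHMM-HHMM allowed_times window,
    including windows that cross midnight (e.g. "2300-0700")."""
    start_s, end_s = window.split("-")
    start = int(start_s[:2]) * 60 + int(start_s[2:])
    end = int(end_s[:2]) * 60 + int(end_s[2:])
    minutes = now.hour * 60 + now.minute
    if start <= end:
        return start <= minutes < end
    return minutes >= start or minutes < end

def should_start_backup(last_finished, frequency, allowed_times, now=None):
    """Decide whether to kick off a backup run right now.
    `last_finished` comes from the manager's small state DB."""
    now = now or datetime.now()
    if now - last_finished < frequency:
        return False                        # .frequency hasn't elapsed yet
    return in_window(now, allowed_times)    # otherwise wait for the window

# e.g. a job whose last run finished two days ago, with frequency=1d:
print(should_start_backup(
    last_finished=datetime.now() - timedelta(days=2),
    frequency=timedelta(days=1),
    allowed_times="2300-0700",
))
```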
Once Foolscap learns how to perform bandwidth management
(Foolscap#41), it
would be nice to add a "b1.bandwidth" value, which would tell the
backup manager that this job is not allowed to use more than a
certain amount. I can imagine refinements to that specification,
to say something like "don't send more than X bps to Tub 1234",
to specifically protect my parents' downstream (while not
directly limiting anything else). Another option is to tell the
node what percentage of our resources (upstream/downstream
bandwidth, CPU time) we're willing to put into this task, and
have it throttle the backup job when the usage goes above that
threshold.
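Until something like that exists, the simplest per-job cap I can picture is
a token-bucket pacer wrapped around the uploader's writes; this is only a
sketch of the idea (the `b1.bandwidth` knob and the class below are
hypothetical, not anything in Tahoe or Foolscap today):

```python
import time

class TokenBucket:
    """Pace writes so a job never averages more than `rate` bytes/sec."""
    def __init__(self, rate, burst=None):
        self.rate = float(rate)
        self.capacity = float(burst or rate)
        self.tokens = self.capacity
        self.last = time.monotonic()

    def consume(self, nbytes):
        # Refill from elapsed time, then sleep until nbytes are available.
        while True:
            now = time.monotonic()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= nbytes:
                self.tokens -= nbytes
                return
            time.sleep((nbytes - self.tokens) / self.rate)

# e.g. cap a hypothetical "b1.bandwidth = 64KiB/s" job:
bucket = TokenBucket(rate=64 * 1024)
for chunk in (b"x" * 16384 for _ in range(4)):
    bucket.consume(len(chunk))
    # ...hand `chunk` to the uploader here...
```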
Later, when we get a similar "checker/repair/rebalancing manager"
in the node (#450, #543, #483, #661), we could configure it in a
similar way, to control how much time/disk/IO it spends on the
repair task. Because a tahoe-side deep-traversal is so much more
expensive than a local disk walk (where the OS caches a lot of
data), the repair manager probably wants to use a fairly large DB
to keep track of which dirnodes have been visited or not, and
which files haven't been checked in a while, etc. The backup
manager can afford to simply kill and restart the "tahoe backup"
job each time, because the backupdb does a good job of letting it
skip over earlier work.
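For that repair manager, the tracking DB could be little more than a couple
of SQLite tables keyed by storage index; a guess at the shape (nothing like
this exists yet, and the column names are mine):

```python
import sqlite3

db = sqlite3.connect("repair-manager.sqlite")
db.executescript("""
CREATE TABLE IF NOT EXISTS dirnodes (
  dircap_hash   TEXT PRIMARY KEY,  -- hash of the dircap, not the cap itself
  last_visited  INTEGER            -- unix time of the last deep-traversal visit
);
CREATE TABLE IF NOT EXISTS files (
  storage_index TEXT PRIMARY KEY,
  last_checked  INTEGER,           -- unix time of the last check/verify
  healthy       INTEGER            -- result of the last check (0/1)
);
""")
# The repair pass would then pick its work from the stalest entries:
stale = db.execute(
    "SELECT storage_index FROM files ORDER BY last_checked ASC LIMIT 100"
).fetchall()
```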
I'm not entirely sure how to best display the status of this
task. Probably a web page, that shows some estimates of total
files seen and how many have been uploaded or skipped so far. But
I don't know how this page needs to be protected. If we don't put
any controls on it, and don't display anything too secret (like
dircaps), then maybe we can afford to put it at a guessable URL
(like we currently do with the storage server status page). If we
decide that it contains sensitive data, or we want to add
controls (like "pause backup", or maybe let you twiddle config
settings right from the web page), then it needs to be
unguessable. #674 is about having private WUI pages like this.
Replying to warner:
I'd like that as well!
Additionally, some kind of throttle config option for the uploading node/client would be great, so that the uploader's connection remains usable and isn't maxed out constantly during large uploads, even if a helper service is used.