Encourage folks to use a third-party backup tool with Tahoe-LAFS integration instead of tahoe backup
#2919
There are many backup tools for all major platforms. Many of them are quite good (they support sophisticated backup scenarios and have good user experience, good failure recovery, good documentation, active and ongoing development, etc). Compared to many of these, `tahoe backup` is primitive, unreliable, and difficult to use.
I have no doubt that continued development on `tahoe backup` could turn it into a world-class backup tool. However, I have some doubts about whether there is any compelling reason to invest those resources in application development that is not core to the privacy and security goals of Tahoe-LAFS. Instead, it seems reasonable to focus efforts on integrating Tahoe-LAFS into one or more of these existing tools, providing a high-quality (private, secure, distributed, available) storage engine to complement the backup application functionality they already offer.
This allows Tahoe-LAFS development efforts to primarily focus on Tahoe-LAFS core values and backup application development efforts to focus on backup functionality - the best of both worlds.
Therefore, identify a major backup tool with an extensible storage engine (for each major platform) and update the `tahoe backup` documentation to refer users to those tools. If users meet with success using these tools, consider eventually deprecating `tahoe backup` entirely (with an eye toward removing it and the corresponding maintenance burden).
duplicity is one such third-party tool, and it has had Tahoe-LAFS integration for almost ten years. It talks to a local Tahoe-LAFS client node to perform an incremental, tarfile-based backup.
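For illustration, duplicity's Tahoe backend addresses its target through an alias configured on the local client node and shells out to the `tahoe` CLI under the hood. A minimal sketch of what that usage might look like (the alias name and paths are made up, and behavior may vary by duplicity version):

```sh
# One-time setup: create an alias on the local client node (name is illustrative).
tahoe create-alias backups

# Back up a local directory through the node; duplicity chooses between a full
# and an incremental backup based on the existing chain at the target.
duplicity /home/alice/documents tahoe://backups/documents

# Restore the most recent backup into an empty directory.
duplicity restore tahoe://backups/documents /tmp/restored-documents
```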
duplicity itself is a mature project with a non-trivial userbase. The Tahoe-LAFS integration appears to basically work, though it may not be as polished as the rest of the project (due to limited use, I expect). For example, it doesn't appear to report progress accurately.
duplicity seems to be primarily focused on GNU/Linux, but it also appears to work on macOS (it is packaged in Homebrew). It may work under Cygwin on Windows (an independent party seems to sell Cygwin-based Windows packages with support), but the CLI experience is probably not what most Windows users are looking for.
Also, duplicity is implemented in Python, so the potential for Tahoe-LAFS developers to contribute improved Tahoe-LAFS support upstream seems high.
It is licensed GPLv2.
duplicati is another third-party tool which also has Tahoe-LAFS integration. It presents a web-based interface (served by a local server) that can be used to configure, monitor, and interact with schedulable backup jobs. It has packages for all three major operating systems and is pretty easy to work with (GUI-based). It also has a CLI.
The Tahoe-LAFS integration works (it's a little rough, but no worse than duplicity's). duplicati is implemented in C# and web technologies, and it is licensed under the LGPL.
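For reference, Duplicati 2 can also be driven from its CLI against a Tahoe target. The sketch below assumes the `duplicati-cli` wrapper shipped in the Linux packages and a `tahoe://` storage URL pointing at the node's web gateway plus a directory cap; the exact URL format and option names should be checked against the Duplicati documentation for your version:

```sh
# Back up a local directory to a Tahoe-LAFS directory via the node's web gateway.
# 127.0.0.1:3456 is the default gateway address; $DIRCAP stands in for a real
# directory writecap.
duplicati-cli backup "tahoe://127.0.0.1:3456/uri/$DIRCAP" /home/alice/documents \
    --passphrase="example-passphrase"

# Restore everything from the most recent backup into a scratch directory.
duplicati-cli restore "tahoe://127.0.0.1:3456/uri/$DIRCAP" "*" \
    --passphrase="example-passphrase" --restore-path=/tmp/restored-documents
```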
There are some sizeable tradeoffs to using external backup software.
Duplicity-style full+incremental backups require periodically uploading your entire dataset, even if all the data is already present, to prevent the restore chains from becoming infeasibly long. Furthermore, you can't expire any files out of a backup chain until you do another full, even if none of their data is in use anymore. So Duplicity will often end up using significantly more bandwidth and storage.
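To make that tradeoff concrete, a typical duplicity rotation forces a periodic full backup and can only expire whole chains afterwards. A sketch, reusing the illustrative alias from the earlier example:

```sh
# Either start a new chain explicitly with a full backup (re-uploads everything)...
duplicity full /home/alice/documents tahoe://backups/documents

# ...or let the regular run promote itself to a full backup once a month.
duplicity --full-if-older-than 1M /home/alice/documents tahoe://backups/documents

# Old chains can only be dropped wholesale; individual files cannot be expired
# out of the middle of a chain.
duplicity remove-all-but-n-full 2 --force tahoe://backups/documents
```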
Systems like Borg that keep things in smaller chunks do better in terms of bandwidth and storage, but the multiple round-trips needed to update the various chunk stores and indices result in fairly significant latency unless all the Tahoe nodes are on your LAN.
Tahoe's built-in backup option does a good job of being bandwidth and latency efficient, and easily allows expiring old datasets without losing deduplication, but it loses permissions and xattrs and doesn't have any built-in retry functionality if grid connectivity is interrupted.
So it all depends on what you're backing up. Having some documentation about which backup programs are known to support Tahoe as a backing store would be good, but the built-in backup function is not so terrible that people should necessarily be encouraged to use something else. With a simple wrapper to detect failed backup attempts and retry (see the sketch below), it is more than sufficient for simple data sets, and the fact that it knows a little about Tahoe internals and will perform rudimentary checking on leases and integrity simplifies its use a little.
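As one way such a wrapper might look (a rough sketch; the alias, paths, and retry/delay values are all made up):

```sh
#!/bin/sh
# Retry `tahoe backup` a few times so a transient loss of grid connectivity
# doesn't abort the whole run. All names and numbers here are illustrative.
for attempt in 1 2 3 4 5; do
    if tahoe backup /home/alice/documents backups:documents; then
        break
    fi
    echo "backup attempt $attempt failed; retrying in 60s" >&2
    sleep 60
done

# Occasionally renew leases and verify the shares under the backup directory.
tahoe deep-check --add-lease backups:documents
```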