Encourage folks to use a third-party backup tool with Tahoe-LAFS integration instead of tahoe backup #2919

Open
opened 2018-04-10 16:58:03 +00:00 by exarkun · 3 comments

There are many backup tools for all major platforms. Many of them are quite good (support sophisticated backup scenarios, have good user experience, good failure recovery, good documentation, active and ongoing development, etc). Compared to many of these, tahoe backup is primitive, unreliable, and difficult to use.

I have no doubt that continued development on tahoe backup could turn it into a world-class backup tool. However, I have some doubts about whether there is any compelling reason to invest these resources in application development that's not core to the privacy and security goals of Tahoe-LAFS.

Instead, it seems more reasonable to focus efforts on integrating Tahoe-LAFS into one or more of these existing tools, providing a high-quality (private, secure, distributed, available) storage engine to complement the backup functionality they already provide.

This allows Tahoe-LAFS development efforts to primarily focus on Tahoe-LAFS core values and backup application development efforts to focus on backup functionality - the best of both worlds.

Therefore, identify a major backup tool with an extensible storage engine (for each major platform) and update the tahoe backup documentation to refer users to those tools. If users meet with success in their use of these tools, consider eventually deprecating tahoe backup entirely (with an eye toward removing it and the corresponding maintenance burden).

exarkun added the code-frontend-cli, normal, enhancement, 1.12.1 labels 2018-04-10 16:58:03 +00:00
exarkun added this to the undecided milestone 2018-04-10 16:58:03 +00:00
Author

[duplicity](http://duplicity.nongnu.org/) is one such third-party tool; it has had Tahoe-LAFS integration for almost ten years. It talks to a local Tahoe-LAFS client node to perform an incremental tarfile-based backup.

```
$ duplicity --no-encryption backup-dummy/ tahoe://<alias>/<path>
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
No signatures found, switching to full backup.
--------------[ Backup Statistics ]--------------
StartTime 1523380691.24 (Tue Apr 10 13:18:11 2018)
EndTime 1523380691.24 (Tue Apr 10 13:18:11 2018)
ElapsedTime 0.00 (0.00 seconds)
SourceFiles 5
SourceFileSize 8204 (8.01 KB)
NewFiles 5
NewFileSize 8204 (8.01 KB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 5
RawDeltaSize 12 (12 bytes)
TotalDestinationSizeChange 208 (208 bytes)
Errors 0
-------------------------------------------------
```

duplicity itself is a mature project with a non-trivial userbase. The Tahoe-LAFS integration appears to basically work though it may not be as polished as the rest of the project (due to limited use, I expect). For example, it doesn't appear to report progress accurately.

duplicity seems to be primarily focused on GNU/Linux but it appears to also work on macOS (it is packaged in Homebrew). It may work on Cygwin on Windows (an independent party seems to be selling Cygwin-based Windows packages with support), but the CLI experience is probably not what most Windows users are looking for.

Also, duplicity is implemented in Python, so the potential for Tahoe-LAFS developers to contribute improved Tahoe-LAFS support upstream seems high.

It is licensed GPLv2.
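
For anyone who wants to script the invocation shown above, here is a minimal Python sketch. It only assembles the same command line that appears in the session (`--no-encryption`, a source directory, and a `tahoe://<alias>/<path>` target) and hands it to `subprocess`. The function names and the `encrypt` parameter are made up for illustration and are not part of duplicity or Tahoe-LAFS; it also assumes `duplicity` is on `PATH` and that a local Tahoe-LAFS client node is already running.

```python
import shlex
import subprocess


def duplicity_tahoe_command(source_dir, alias, path, encrypt=False):
    """Build the duplicity argv for backing up source_dir to a Tahoe-LAFS alias.

    Hypothetical helper: it mirrors the command line from the session above.
    """
    argv = ["duplicity"]
    if not encrypt:
        # The session above passes --no-encryption; presumably duplicity's own
        # GnuPG layer is redundant when Tahoe-LAFS is the storage backend.
        argv.append("--no-encryption")
    argv += [source_dir, f"tahoe://{alias}/{path}"]
    return argv


def run_backup(source_dir, alias, path):
    """Run duplicity and return its exit status."""
    argv = duplicity_tahoe_command(source_dir, alias, path)
    print("running:", shlex.join(argv))
    return subprocess.run(argv).returncode
```

Calling `run_backup("backup-dummy/", "<alias>", "<path>")` with a real alias and path should reproduce the run above; scheduling and error handling are left to the caller.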

Author

[duplicati](https://www.duplicati.com/) is another third-party tool which also has Tahoe-LAFS integration. It presents a web-based interface (local server) which can be used to configure, monitor, and interact with schedulable backup jobs. It has packages for all three major operating systems and is pretty easy to work with (GUI-based). It also has a CLI.

The Tahoe-LAFS integration works (it's a little rough but no worse than that of duplicity). It is implemented in C# and web stuff. It is licensed LGPL.

tlhonmey commented 2018-09-10 03:37:18 +00:00
Owner

There are some sizeable tradeoffs to using external backup software.

Duplicity-style full+incremental backups require periodically uploading your entire dataset, even if all the data is already present, to prevent the restore chains from becoming infeasibly long. Furthermore, you can't expire any files out of a backup chain until you do another full, even if none of their data is in use anymore. So Duplicity will often end up using significantly more bandwidth and storage.
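
To put rough numbers on the bandwidth point, here is a toy model in Python. It is only a sketch under simplifying assumptions: a fixed dataset size, a fixed amount of changed data per day, and no compression or metadata overhead. The function names and the example figures are invented for illustration and are not measurements of duplicity or of any other tool.

```python
def uploaded_full_plus_incremental(dataset, daily_change, days, full_every):
    """Total uploaded when a full backup is repeated every `full_every` days."""
    total = 0
    for day in range(days):
        if day % full_every == 0:
            total += dataset       # a new full re-uploads the whole dataset
        else:
            total += daily_change  # an incremental uploads only the delta
    return total


def uploaded_deduplicated(dataset, daily_change, days):
    """Total uploaded when data already on the grid is never sent again."""
    return dataset + daily_change * (days - 1)


# Illustrative figures in GB: 100 GB dataset, 1 GB changed per day, 90 days,
# with a fresh full backup every 30 days.
print(uploaded_full_plus_incremental(100, 1, 90, 30))  # 387
print(uploaded_deduplicated(100, 1, 90))               # 189
```

In this model the gap grows with the dataset size and with how often a full backup is forced, which is the tradeoff described above.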

Systems like Borg that keep things in smaller chunks do better in terms of bandwidth and storage, but the multiple round-trips needed to update the various chunk stores and indices result in fairly significant latency unless all the Tahoe nodes are on your LAN.

Tahoe's built-in backup option does a good job of being bandwidth and latency efficient, and easily allows expiring old datasets without losing deduplication, but it loses permissions and xattrs and doesn't have any built-in retry functionality if grid connectivity is interrupted.

So it all depends on what it is you're backing up. Having some documentation about which backup programs are known to support Tahoe as a backing store would be good, but the built-in backup function is not so terrible that people should necessarily be encouraged to use something else. With a simple wrapper to detect failed backup attempts and retry, it is more than sufficient for simple data sets. The fact that it knows a little about Tahoe internals and will perform rudimentary checking on leases and integrity also simplifies its use a little.
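
The kind of wrapper mentioned above could look something like the following Python sketch. It assumes that a non-zero exit status from `tahoe backup` means the attempt failed and is worth retrying, which may be too coarse for some failure modes. The function name, the retry count, and the linear backoff are arbitrary illustrative choices, and the `run` and `sleep` parameters exist only so the logic can be exercised without a grid.

```python
import subprocess
import time


def backup_with_retry(source, target, attempts=3, delay=60.0,
                      run=subprocess.run, sleep=time.sleep):
    """Run `tahoe backup SOURCE TARGET` until it exits 0 or attempts run out.

    Returns the number of attempts used; raises RuntimeError if all fail.
    Assumption: any non-zero exit status is treated as a retryable failure.
    """
    argv = ["tahoe", "backup", source, target]
    for attempt in range(1, attempts + 1):
        result = run(argv)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Wait a little longer after each failure (linear backoff).
            sleep(delay * attempt)
    raise RuntimeError(f"tahoe backup failed after {attempts} attempts")
```

A cron job could call `backup_with_retry("/home/me/docs", "backups:docs")`, where both the path and the alias are placeholders, and alert on the `RuntimeError`.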

Reference: tahoe-lafs/trac-2024-07-25#2919