Cater to rsync as a target Tahoe client. #78

Open
opened 2007-07-05 21:46:52 +00:00 by nejucomo · 4 comments

Imagine a scenario where a sysadmin of a large enterprise network needs to perform routine backups, and does so by rsyncing from many clients to one large RAID storage device.

What if they could replace the single large RAID with a vdrive, run Tahoe storage nodes on each workstation, and have all of the client-side rsync automation work without change?

If this use case is as common and the Tahoe replacement as useful as I believe it to be, it would behoove Tahoe to cater to rsync for both publication and retrieval.

One sufficient support feature would be file-system emulation (FUSE, WebDAV, ...), which rsync can already use. However, it may also be worthwhile to implement an rsync-specialized interface to Tahoe if the tradeoff between efficiency gains and development time were right.

nejucomo added the
code
major
enhancement
0.4.0
labels 2007-07-05 21:46:52 +00:00
nejucomo added this to the eventually milestone 2007-07-05 21:46:52 +00:00

This means being able to efficiently modify files in place, right? And/or recording rsync's coarse hashes somewhere so the update code could decide which blocks need to be modified without actually having to download them all?

To support this, we'd probably need to use something other than CHK.
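As a rough illustration of the "decide which blocks need to be modified without downloading them all" idea, here is a minimal sketch. It assumes per-block hashes of the stored version were recorded somewhere at upload time, and it compares blocks at fixed offsets only, which is simpler than what rsync really does (rsync also finds blocks that have shifted position). The names `block_hashes` and `changed_blocks` are hypothetical, and SHA-256 is used as a stand-in hash; none of this is existing Tahoe code.

```python
import hashlib


def block_hashes(data, block_size):
    """Hash each fixed-size block of data (the last block may be short)."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(stored_hashes, new_data, block_size):
    """Return indices of blocks in new_data that differ from the stored version.

    stored_hashes is the list recorded for the old version, so the old
    file contents never have to be downloaded for the comparison.
    Blocks past the end of the old version count as changed.
    """
    new_hashes = block_hashes(new_data, block_size)
    return [
        i for i, h in enumerate(new_hashes)
        if i >= len(stored_hashes) or stored_hashes[i] != h
    ]


old = b"a" * 10 + b"b" * 10 + b"c" * 10
new = b"a" * 10 + b"X" * 10 + b"c" * 10 + b"d" * 5
recorded = block_hashes(old, 10)
to_upload = changed_blocks(recorded, new, 10)
```

Only the blocks listed in `to_upload` would need to be rewritten, which is exactly the kind of in-place modification that an immutable format cannot offer. A file that shrinks would also need its tail truncated, which this sketch does not cover.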

warner added
minor
and removed
major
labels 2007-07-25 03:06:50 +00:00

Nowadays, we have two things: Small Distributed Mutable Files (SDMFs) and Immutable Files (which Brian called "CHKs" in the previous message). The former might support rsync okay for sufficiently small files. There is currently a hard limit of 3.5 MB, which ought to be raised, but there will remain a couple of soft limits -- see #359 (eliminate hard limit on size of SDMFs).

warner modified the milestone from eventually to undecided 2008-06-01 20:53:01 +00:00
davidsarah commented 2009-11-23 03:34:17 +00:00
Owner

rsync coarse hashes are documented at http://klubkev.org/rsync/ . Note that they're not secure hashes (the algorithm uses MD4 and a variant of adler32), so they must be treated as confidential.
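For readers who have not looked at the algorithm, the weak "rolling" checksum (the adler32 variant) can be sketched as below. This follows the formulas in the rsync technical report as I understand them; the function names are my own, and the real rsync implementation differs in details. The strong MD4 hash that rsync pairs with it is left out.

```python
M = 1 << 16  # both running sums are kept modulo 2**16


def weak_checksum(block):
    """Compute the two 16-bit sums (a, b) for one block from scratch."""
    n = len(block)
    a = sum(block) % M
    # The first byte gets weight n, the last byte gets weight 1.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte to the right in constant time."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - block_len * out_byte + a_new) % M
    return a_new, b_new


def combine(a, b):
    """Pack the two sums into a single 32-bit value."""
    return a | (b << 16)


data = bytes(range(256)) * 3
n = 64
a, b = weak_checksum(data[:n])
rolled = []
for k in range(len(data) - n):
    a, b = roll(a, b, data[k], data[k + n], n)
    rolled.append((a, b))
```

Because the checksum is cheap to slide across every byte offset, it works as a fast filter, but it is trivially forgeable and is computed over the plaintext, which is part of why these hashes cannot simply be stored in the clear.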


Hm, perhaps a more open-ended file format could have room for features like this in the UEB hash. For example, our current immutable UEB hash is specified to be a dictionary, in which e.g. the `["crypttext_hash"]` key contains the flat SHA256d hash of the ciphertext. If the share format (or post-decode pre-decrypt ciphertext format) could also be expanded, we could add a section for `["encrypted_rsync_hashes"]`, covered by a UEB key named `["encrypted_rsync_hash_root"]`, ignored by older clients, but available for more advanced clients to use for... whatever it is an rsync hash would be useful for.

(btw, of course we've discussed elsewhere the security implications of an extensible format and the possible benefits of explicitly disallowing extensions like this)
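A toy sketch of what such an extension might look like, under several assumptions: the UEB is modelled as a plain Python dict, SHA256d is taken to mean SHA-256 applied twice (the real Tahoe hashes carry additional tagging that is omitted here), the new section is treated as opaque already-encrypted bytes, and the "root" is a flat hash rather than a Merkle tree root. The helper names are hypothetical.

```python
import hashlib


def sha256d(data):
    """SHA-256 applied twice (assumed meaning of SHA256d, untagged)."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


# Keys an older client knows about; anything else is skipped.
OLD_CLIENT_KEYS = {"crypttext_hash"}


def build_ueb(ciphertext, encrypted_rsync_hashes=None):
    """Build a UEB-like dict, optionally covering the extra section."""
    ueb = {"crypttext_hash": sha256d(ciphertext)}
    if encrypted_rsync_hashes is not None:
        ueb["encrypted_rsync_hash_root"] = sha256d(encrypted_rsync_hashes)
    return ueb


def old_client_view(ueb):
    """What a client that ignores unknown keys would act on."""
    return {k: v for k, v in ueb.items() if k in OLD_CLIENT_KEYS}


def verify_rsync_section(ueb, encrypted_rsync_hashes):
    """A newer client checks the extra section against the UEB."""
    root = ueb.get("encrypted_rsync_hash_root")
    return root is not None and root == sha256d(encrypted_rsync_hashes)


ciphertext = b"example ciphertext"
section = b"opaque encrypted rsync hashes"
ueb = build_ueb(ciphertext, section)
```

An older client sees only `crypttext_hash` and behaves as before, while a newer client can fetch the extra section and validate it with `verify_rsync_section`.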

Reference: tahoe-lafs/trac-2024-07-25#78