Cater to rsync as a target Tahoe client. #78
Imagine a scenario where the sysadmin of a large enterprise network needs to perform routine backups, and does so by rsyncing from many clients to one large RAID storage device.
What if they could replace that single large RAID with a vdrive, run Tahoe storage nodes on each workstation, and have all of the client-side rsync automation work without change?
If this use case is as common, and the Tahoe replacement as useful, as I believe it to be, it would behoove Tahoe to cater to rsync for both publication and retrieval.
One sufficient support feature would be file-system emulation (FUSE, WebDAV, ...), which rsync can already use. However, it may also be worthwhile to implement an rsync-specialized interface to Tahoe if the efficiency-gains-to-development-time tradeoff were right.
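To make "without change" concrete, here is a minimal sketch of the existing automation pointed at such a mount. Everything in it is illustrative: the workstation names, the mountpoint path, and the assumption that a FUSE or WebDAV frontend has already mounted the vdrive there.

```python
# Hypothetical sketch: the sysadmin's unchanged rsync loop, with only the
# destination swapped from the old RAID path to a Tahoe-backed FUSE/WebDAV
# mount. All names and paths below are assumptions for illustration.
import subprocess

CLIENTS = ["ws01", "ws02", "ws03"]      # hypothetical workstation hostnames
TAHOE_MOUNT = "/mnt/tahoe-backups"      # hypothetical vdrive mountpoint

for host in CLIENTS:
    # Exactly the rsync invocation that targeted the RAID before; rsync
    # neither knows nor cares that the destination is backed by Tahoe.
    subprocess.check_call([
        "rsync", "-a", "--delete",
        "%s:/home/" % host,
        "%s/%s/" % (TAHOE_MOUNT, host),
    ])
```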
This means being able to efficiently modify files in-place, right? And/or to record rsync's coarse hashes somewhere, so the update code could decide which blocks needed to be modified without actually having to download them all?
To support this, we'd probably need to use something other than CHK.
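A minimal sketch of that recorded-hashes idea, assuming a fixed block size and that the per-block hashes from the previous upload are available locally; the block size and helper names are hypothetical, not Tahoe APIs.

```python
# Sketch: decide which blocks of a new local version need re-uploading by
# comparing against per-block hashes recorded at the last upload, without
# downloading the old ciphertext. BLOCK_SIZE and these helpers are
# hypothetical, not part of Tahoe.
import hashlib

BLOCK_SIZE = 128 * 1024  # assumed segment size

def block_hashes(path):
    """Hash each fixed-size block of the file at `path`."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            hashes.append(hashlib.sha256(block).digest())
    return hashes

def changed_blocks(recorded_hashes, path):
    """Return indices of blocks whose hash differs from the recorded one."""
    current = block_hashes(path)
    return [i for i, h in enumerate(current)
            if i >= len(recorded_hashes) or h != recorded_hashes[i]]
```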
Nowadays we have two things: Small Distributed Mutable Files (SDMF) and Immutable Files (which Brian called "CHKs" in the previous message). The former might support rsync okay for sufficiently small files. There is currently a hard limit of 3.5 MB, which ought to be raised, but a couple of soft limits will remain -- see #359 (eliminate hard limit on size of SDMFs).
rsync's coarse hashes are documented at http://klubkev.org/rsync/. Note that they're not secure hashes (the algorithm uses MD4 and a variant of Adler-32), so they leak information about the plaintext and must be treated as confidential, i.e. stored only in encrypted form.
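For reference, a sketch of the two per-block checksums that page describes, assuming the standard rsync signature scheme: a weak adler32-style rolling sum (cheap to slide across a file) plus a stronger MD4 digest to confirm candidate matches. MD4 availability in Python's hashlib depends on the underlying OpenSSL build.

```python
# Sketch of rsync's per-block "coarse hashes": a weak rolling checksum in
# the style of adler32, plus an MD4 digest to confirm candidate matches.
# Neither is collision-resistant, which is why they'd have to be stored
# encrypted in a Tahoe integration.
import hashlib

def weak_checksum(block):
    # s1 = running sum of bytes, s2 = running sum of s1, both mod 2**16,
    # packed into one 32-bit value. The s1/s2 structure is what makes the
    # checksum cheap to "roll" one byte at a time across a file.
    s1 = s2 = 0
    for byte in block:
        s1 = (s1 + byte) & 0xFFFF
        s2 = (s2 + s1) & 0xFFFF
    return (s2 << 16) | s1

def strong_checksum(block):
    # MD4 may be unavailable if OpenSSL was built without legacy digests.
    return hashlib.new("md4", block).digest()

def block_signature(block):
    return (weak_checksum(block), strong_checksum(block))
```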
Hm, perhaps a more open-ended file format could have room for features like this in the UEB hash. For example, our current immutable UEB is specified to be a dictionary in which, e.g., the ["crypttext_hash"] key contains the flat SHA256d hash of the ciphertext. If the share format (or the post-decode, pre-decrypt ciphertext format) could also be expanded, we could add a section for ["encrypted_rsync_hashes"], covered by a UEB key named ["encrypted_rsync_hash_root"]: ignored by older clients, but available for more advanced clients to use for... whatever it is an rsync hash would be useful for.
(btw, of course we've discussed elsewhere the security implications of an extensible format and the possible benefits of explicitly disallowing extensions like this)
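To make the shape of that extension concrete, here is an illustrative sketch of a UEB dictionary carrying the proposed key. It is not the real share format: the sha256d helper and the flat hash over the encrypted section are assumptions (a Merkle root over per-block entries would be the more likely real design).

```python
# Illustrative sketch only -- not Tahoe's actual share format. An extended
# UEB dictionary carries one new key committing to an encrypted section of
# rsync hashes; older clients that never look the key up are unaffected.
import hashlib

def sha256d(data):
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

ciphertext = b"...the file's ciphertext..."                 # placeholder
encrypted_rsync_hashes = b"...encrypted per-block sigs..."  # placeholder

ueb = {
    # Existing key: flat SHA256d hash of the ciphertext.
    "crypttext_hash": sha256d(ciphertext),
    # Proposed extension: commits to the encrypted rsync-hash section,
    # which would be stored elsewhere in the share/ciphertext format.
    "encrypted_rsync_hash_root": sha256d(encrypted_rsync_hashes),
}
```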