"tahoe cp" should avoid full upload/download when the destination already exists (using backupdb and/or plaintext hashes) #658
Now that the backupdb seems to be working well for "tahoe backup", it's time to extend "tahoe cp" to use it too.
In the upload direction (tahoe cp LOCAL REMOTE), the backupdb should be used to let us skip a new upload of a file that's already been uploaded. The goal is to allow periodic "tahoe cp LOCAL REMOTE" (with fixed values of LOCAL and REMOTE) to do as little work as possible.
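A minimal sketch of what the upload-side check might look like, assuming a backupdb-style SQLite table keyed on (path, size, mtime); the table name, columns, and helper functions here are illustrative, not the real backupdb schema or API:

```python
import os
import sqlite3  # 'db' below is assumed to be an open sqlite3.Connection

def upload_with_backupdb(db, localpath, upload):
    # Skip the upload if the table already records a cap for a local file
    # with the same path, size, and mtime; otherwise upload and remember it.
    s = os.stat(localpath)
    row = db.execute(
        "SELECT filecap FROM uploaded_files WHERE path=? AND size=? AND mtime=?",
        (localpath, s.st_size, int(s.st_mtime))).fetchone()
    if row is not None:
        return row[0]                    # already uploaded: reuse the old cap
    filecap = upload(localpath)          # fall back to a full upload
    db.execute(
        "INSERT OR REPLACE INTO uploaded_files (path, size, mtime, filecap) "
        "VALUES (?, ?, ?, ?)",
        (localpath, s.st_size, int(s.st_mtime), filecap))
    db.commit()
    return filecap
```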
In the download direction (tahoe cp REMOTE LOCAL), the backupdb should also be used, to let us skip a download of a file that's already been downloaded. When a Tahoe file is downloaded and written to local disk, a path+timestamps-to-URI entry should be added to the db. Before downloading a file to local disk, the disk should be checked for an existing file with the same timestamps: if present, and if the URI matches the URI that was going to be downloaded, the download should be skipped.
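A corresponding sketch for the download side, recording a path+timestamps-to-URI entry after each download and consulting it before the next one (again, table and function names are invented for illustration):

```python
import os

def record_download(db, localpath, uri):
    # After writing the downloaded file to disk, remember its
    # (path, size, mtime) -> URI so later runs can skip it.
    s = os.stat(localpath)
    db.execute(
        "INSERT OR REPLACE INTO downloaded_files (path, size, mtime, uri) "
        "VALUES (?, ?, ?, ?)",
        (localpath, s.st_size, int(s.st_mtime), uri))
    db.commit()

def can_skip_download(db, localpath, remote_uri):
    # Skip only if a file with the recorded size/mtime is still on disk and
    # the URI recorded for it matches the one we were about to fetch.
    try:
        s = os.stat(localpath)
    except OSError:
        return False                     # nothing on disk: must download
    row = db.execute(
        "SELECT uri FROM downloaded_files WHERE path=? AND size=? AND mtime=?",
        (localpath, s.st_size, int(s.st_mtime))).fetchone()
    return row is not None and row[0] == remote_uri
```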
I think this should be gated by an option that is not the default (or else make it the default for a new command called something other than "cp"). Otherwise, if anything goes wrong then it won't be obvious that the backupdb could be at fault; users are likely to consider "tahoe cp" to be a lower-level operation that copies files unconditionally, like Unix "cp" does.

Plaintext hashes would be a more robust way of doing this than URI+timestamp (but dependent on #453).
IOW, for downloading a file, "cp" would need to go to the servers to find the consensus value for the plaintext hash of the current version. Then it would proceed as for an immutable file.

If the existing file is the correct one, it should still be touched to update its mtime.

For uploading a file, if there is an existing copy then you would have to verify it.
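A sketch of that hash-based download decision, assuming #453 gives us a trustworthy plaintext hash for the remote version and that some download(localpath) callable fetches the remote contents; the hash function here is only illustrative:

```python
import hashlib
import os
import time

def refresh_or_download(localpath, remote_plaintext_hash, download):
    # If the local copy's plaintext hash matches the consensus hash of the
    # remote version, just touch it to update its mtime; otherwise download.
    if os.path.exists(localpath):
        h = hashlib.sha256()
        with open(localpath, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                h.update(chunk)
        if h.hexdigest() == remote_plaintext_hash:
            now = time.time()
            os.utime(localpath, (now, now))
            return False                 # existing file kept, no download
    download(localpath)
    return True
```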
The storage server protocol and webapi would need to be able to return a hash of the file first. (See http://www.usenix.org/events/nsdi04/tech/full_papers/mogul/mogul.pdf for a similar protocol with some relevant discussion of design issues.)
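Purely as a hypothetical illustration of "return a hash first": the "?t=plaintext-hash" query below does not exist in the current webapi and the hash function is a stand-in; only the plain GET of /uri/CAP for the file body is an existing call.

```python
import hashlib
import urllib.request

NODE_URL = "http://127.0.0.1:3456"       # typical local gateway; an assumption

def download_if_changed(cap, localpath):
    # Hypothetical flow: ask for the plaintext hash first, then fetch the
    # body only on mismatch.  '?t=plaintext-hash' is invented for this sketch.
    remote_hash = urllib.request.urlopen(
        "%s/uri/%s?t=plaintext-hash" % (NODE_URL, cap)).read().decode().strip()
    try:
        with open(localpath, "rb") as f:
            local_hash = hashlib.sha256(f.read()).hexdigest()
    except OSError:
        local_hash = None
    if local_hash == remote_hash:
        return False                      # skip the full download
    with open(localpath, "wb") as f:
        f.write(urllib.request.urlopen("%s/uri/%s" % (NODE_URL, cap)).read())
    return True
```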
"tahoe cp" should use backupdb, in both directionsto "tahoe cp" should avoid full upload/download when the destination already exists (using backupdb and/or plaintext hashes)This may interact with the planned magic folder db (see source:docs/proposed/magic-folder/filesystem-integration.rst).