tahoe backup should be able to backup symlinks #641
Running tahoe backup on a directory containing a symbolic link currently doesn't work. It raises the following exception instead.
Well, it's perhaps easier to discard them for now and simply display a warning message.
Attachment bug-641.dpatch (19195 bytes) added
Here's a patch which makes tahoe backup ignore symlinks.
I've made a patch which, unlike yours, skips everything that isn't a regular file or a directory. This also works for files that are Unix sockets, devices, and so on.
Please note that non-dangling links (links whose targets exist) get backed up with or without your patch; only dangling links are dangerous.
I've attached to this ticket a patch file, 'small_symlink_test.patch'. It is really a hack that alters your code to do a much simpler test without using any other function call or temp dir. If run under Linux, it demonstrates that links with real targets work, and perhaps that your test code fails somewhere at being a genuinely useful test.
Now I'm too tired and I'll look at it in more detail tomorrow; maybe I'll end up with a franken-patch that glues together the best of the two. Have a look at my patches.
Attachment small_symlink_test.patch (2493 bytes) added
Attachment half-fix-for-bug-641.dpatch (23934 bytes) added
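The skip-everything-that-isn't-a-file-or-directory policy described above amounts to a single lstat check. A minimal illustrative sketch (the function name is made up; this is not code from either attached patch):

    import os
    import stat

    def is_backupable(path):
        # Back up only regular files and directories; symlinks, sockets,
        # devices, and fifos all fall through to False and are skipped.
        mode = os.lstat(path).st_mode   # lstat: do not follow symlinks
        return stat.S_ISREG(mode) or stat.S_ISDIR(mode)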
While you're at it, you might want to consider also skipping directories which are on other devices. I think it's generally a bad idea to recurse into a network share unless it's been specifically requested. To do that, just look at the st_dev field from lstat. If it doesn't match the st_dev of the parent directory, skip it.
This one is somewhat debatable. For me, I'd rather have it skip network shares because my file server has terabytes of stuff on it and if the backup process goes in there it will never get to the rest of the stuff I want it to back up. Perhaps others have a different perspective.
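A minimal sketch of the st_dev comparison suggested above (the function name and structure are illustrative only, not part of any attached patch):

    import os

    def on_same_filesystem(parent_path, child_path):
        # Compare st_dev from lstat: if the child lives on a different
        # device than its parent directory (e.g. a mounted network share),
        # the backup should skip it rather than recurse into it.
        return os.lstat(parent_path).st_dev == os.lstat(child_path).st_dev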
What's the status of this patch? I've been running it in one of my local sandboxes for weeks now, and I just now obliterated those patches in order to test something closer to current trunk. It looks like none of the patches in this ticket has good unit tests yet.
What about mimicking rsync behavior? It's probably much more intuitive for users to have a consistent default behavior while allowing special cases via additional CLI arguments.
By default, if no special argument is given: follow symlinks, cross filesystem boundaries, and don't save any special files (fifos, devices, and sockets). In case of a dangling symlink, display a warning and continue. (A sketch of this default policy follows below.)
Implement new CLI arguments to change this behavior:
Note that implementing the last three options requires a way to store file type and associated parameters in metadata.
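Roughly, the proposed default could look like the following classification (purely illustrative; names and return values are hypothetical and not taken from any patch on this ticket):

    import os
    import stat

    def classify(path):
        # Sketch of the proposed defaults: follow symlinks, warn about
        # dangling ones, and silently skip special files.
        if os.path.islink(path) and not os.path.exists(path):
            return "warn-dangling"       # symlink whose target is missing
        mode = os.stat(path).st_mode     # stat() follows symlinks
        if stat.S_ISDIR(mode):
            return "recurse"
        if stat.S_ISREG(mode):
            return "backup"
        return "skip-special"            # fifo, device, socket, ...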
I've started using 'tahoe backup' for serious personal use, so I'm starting
to run into these sorts of problems. My first workaround was to hack my
"tahoe backup" client to skip over symlinks.
I like the idea of matching rsync's options, except that we don't have a way
to record non-files yet, so we can't actually implement --devices,
--specials, or --links. Our current default behavior is to follow
directory symlinks, but abort when we encounter a file symlink.
If our cap-string scheme were general enough, I'd say we should create a cap
type that says "here is a filecap, treat its contents as the target of a
symlink" (just like our dircaps say "here is a filecap, treat its contents as
an encoded directory table"). But that's a deeper change.. still appropriate
for this ticket, which after all says "tahoe backup should be able to backup
symlinks", but represents more work than I want to do right now.
Right now, I just want to be able to use "tahoe backup" even though my home
directory has a couple of symlinks in it. I'd be happy with an option to skip
symlinks altogether (whether they point to files or directories), or to skip
file-symlinks. And I'd be happy if we always skipped the special things like
devices and sockets.. I don't have any of those in my home directory..
they're only in /tmp/ and /dev/ and places that I'm not yet trying to back
up.
#729 is an instance of the same problem.
for now (i.e. for 1.6.0), I'm going to have "tahoe backup" skip all symlinks, emitting the same
WARNING: cannot backup special file %s
message that you get with device files and named pipes.

From the duplicate #1380 filed by gdt:
Attachment 641-symlink-depth-limit-1.darcs.patch (67525 bytes) added
I would also like for "tahoe backup" to handle symlinks. Most specifically, I like to symlink directories I want backed-up into my main "Dropbox" folder (the target of "tahoe backup" in my crontab).
After a few experiments with Dropbox, it seems that Dropbox 'follows' symlinks to a limited depth, but it doesn't 'preserve' the symlinks (i.e. it does not behave like rsync --links). There seem to be a handful of hazards with following symlinks: you can have infinite recursion if circular symlinks aren't detected, and even without recursion, symlinks can cause redundant data to be stored.
I'm attaching a patch just to show my approach so far, to enforce a symlink depth limit of 3 (for directories only). I'll look into making tests that show how this approach behaves. For my immediate personal needs, this is already a solution.
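The attached darcs patch is the real change; the idea itself can be sketched in a few lines (helper names here are hypothetical, and only directory handling is shown):

    import os

    MAX_SYMLINK_DEPTH = 3   # the limit used in the approach described above

    def walk(dirpath, symlink_depth=0):
        # Recurse into subdirectories, bumping a counter each time a
        # directory is reached through a symlink; once the counter exceeds
        # the limit, stop following, so symlink cycles cannot recurse
        # forever (at the cost of duplicating data up to that depth).
        for name in sorted(os.listdir(dirpath)):
            path = os.path.join(dirpath, name)
            if os.path.isdir(path):
                depth = symlink_depth + (1 if os.path.islink(path) else 0)
                if depth > MAX_SYMLINK_DEPTH:
                    continue            # depth limit reached: skip
                walk(path, depth)
            # regular files would be uploaded here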
I rewrote my previous patch from 4 months ago (I forgot I ever posted it here) but nothing has changed in my approach.
I have now added a unit test that creates a directory with a symlink cycle and shows what happens. Cycles are only followed up to 3 levels deep. Other notable behavior is that multiple symlinks to the same file will be uploaded to Tahoe-LAFS multiple times as separate files.
https://github.com/amiller/tahoe-lafs/pull/1.patch
Per this mailing list discussion, a better way to detect cycles than counting how many symlinks you've traversed is to examine the dev and inode of each thing and raise an exception about recursive symlinks if you encounter the same one a second time. That way we can handle an arbitrarily deep nest of symlinks.
Here's some code I wrote for a different tool that uses dev and inode to identify files:
https://tahoe-lafs.org/trac/dupfilefind/browser/trunk/dupfilefind/dff.py?annotate=blame
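A minimal sketch of that dev/inode bookkeeping, assuming a simple recursive walk (this is not code from dff.py; names are illustrative):

    import os

    class RecursiveSymlinkError(Exception):
        pass

    def walk(dirpath, seen=None):
        # Record the (st_dev, st_ino) of every directory entered; seeing
        # the same pair a second time means a symlink led us back to a
        # directory we already visited, so raise instead of looping.
        if seen is None:
            seen = set()
        st = os.stat(dirpath)            # follows the symlink, if any
        key = (st.st_dev, st.st_ino)
        if key in seen:
            raise RecursiveSymlinkError("symlink cycle detected at %r" % dirpath)
        seen.add(key)
        for name in sorted(os.listdir(dirpath)):
            path = os.path.join(dirpath, name)
            if os.path.isdir(path):
                walk(path, seen)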
Replying to zooko:
imho, it would be a good idea to keep backing up symlinks optional (well, /make/ it optional)
I don't want the "limit it to K levels deep" approach, so I'm unsetting
review-needed
. Thank you for your contribution, amiller!I'm not sure the status of this ticket... but I wanted to past along my github commit, which includes tests and is currently rebased against matser.
https://github.com/amiller/tahoe-lafs/commit/3deafed1c790e076481032536260a29ba2007401