Tahoe backup should WARN and go on when finding errors like: links to deleted files or access/read permission denied in local files/directories #729

Closed
opened 2009-06-02 19:17:36 +00:00 by stockrt · 16 comments
stockrt commented 2009-06-02 19:17:36 +00:00
Owner

I am facing this problem when backing up my files:

stockrt@host ~/ $ tahoe backup -v stockrt tahoe:backups/stockrt

Traceback (most recent call last):
  File "/usr/bin/tahoe", line 8, in <module>
    load_entry_point('allmydata-tahoe==1.4.1', 'console_scripts', 'tahoe')()
  File "/usr/lib/python2.5/site-packages/allmydata_tahoe-1.4.1-py2.5.egg/allmydata/scripts/runner.py", line 91, in run
    rc = runner(sys.argv[1:])
  File "/usr/lib/python2.5/site-packages/allmydata_tahoe-1.4.1-py2.5.egg/allmydata/scripts/runner.py", line 78, in runner
    rc = cli.dispatch[command](so)
  File "/usr/lib/python2.5/site-packages/allmydata_tahoe-1.4.1-py2.5.egg/allmydata/scripts/cli.py", line 424, in backup
    rc = tahoe_backup.backup(options)
  File "/usr/lib/python2.5/site-packages/allmydata_tahoe-1.4.1-py2.5.egg/allmydata/scripts/tahoe_backup.py", line 374, in backup
    return bu.run()
  File "/usr/lib/python2.5/site-packages/allmydata_tahoe-1.4.1-py2.5.egg/allmydata/scripts/tahoe_backup.py", line 216, in run
    new_backup_dircap = self.process(options.from_dir, latest_backup_dircap)
  File "/usr/lib/python2.5/site-packages/allmydata_tahoe-1.4.1-py2.5.egg/allmydata/scripts/tahoe_backup.py", line 266, in process
    newchilddircap = self.process(childpath, oldchildcap)
  File "/usr/lib/python2.5/site-packages/allmydata_tahoe-1.4.1-py2.5.egg/allmydata/scripts/tahoe_backup.py", line 266, in process
    newchilddircap = self.process(childpath, oldchildcap)
  File "/usr/lib/python2.5/site-packages/allmydata_tahoe-1.4.1-py2.5.egg/allmydata/scripts/tahoe_backup.py", line 266, in process
    newchilddircap = self.process(childpath, oldchildcap)
  File "/usr/lib/python2.5/site-packages/allmydata_tahoe-1.4.1-py2.5.egg/allmydata/scripts/tahoe_backup.py", line 272, in process
    raise BackupProcessingError("Cannot backup this file %r" % childpath)
allmydata.scripts.tahoe_backup.BackupProcessingError: Cannot backup this file 'stockrt/Roger/etc/namedb'

stockrt@host ~/ $ ll stockrt/Roger/etc/namedb
lrwxrwxrwx 1 stockrt stockrt 21 2009-01-29 23:00 stockrt/Roger/etc/namedb -> /var/named/etc/namedb

This destination does not exists in my local filesystem: /var/named/etc/namedb but I have this (broken) link in my local "to backup" directory: stockrt/Roger/etc

I think Tahoe should warn it but do not break just because of it.

The same should occur for read/access permission denied for files and directories.

Regards,

Rogério Schneider

I am facing this problem when backing up my files: ``` stockrt@host ~/ $ tahoe backup -v stockrt tahoe:backups/stockrt Traceback (most recent call last): File "/usr/bin/tahoe", line 8, in <module> load_entry_point('allmydata-tahoe==1.4.1', 'console_scripts', 'tahoe')() File "/usr/lib/python2.5/site-packages/allmydata_tahoe-1.4.1-py2.5.egg/allmydata/scripts/runner.py", line 91, in run rc = runner(sys.argv[1:]) File "/usr/lib/python2.5/site-packages/allmydata_tahoe-1.4.1-py2.5.egg/allmydata/scripts/runner.py", line 78, in runner rc = cli.dispatch[command](so) File "/usr/lib/python2.5/site-packages/allmydata_tahoe-1.4.1-py2.5.egg/allmydata/scripts/cli.py", line 424, in backup rc = tahoe_backup.backup(options) File "/usr/lib/python2.5/site-packages/allmydata_tahoe-1.4.1-py2.5.egg/allmydata/scripts/tahoe_backup.py", line 374, in backup return bu.run() File "/usr/lib/python2.5/site-packages/allmydata_tahoe-1.4.1-py2.5.egg/allmydata/scripts/tahoe_backup.py", line 216, in run new_backup_dircap = self.process(options.from_dir, latest_backup_dircap) File "/usr/lib/python2.5/site-packages/allmydata_tahoe-1.4.1-py2.5.egg/allmydata/scripts/tahoe_backup.py", line 266, in process newchilddircap = self.process(childpath, oldchildcap) File "/usr/lib/python2.5/site-packages/allmydata_tahoe-1.4.1-py2.5.egg/allmydata/scripts/tahoe_backup.py", line 266, in process newchilddircap = self.process(childpath, oldchildcap) File "/usr/lib/python2.5/site-packages/allmydata_tahoe-1.4.1-py2.5.egg/allmydata/scripts/tahoe_backup.py", line 266, in process newchilddircap = self.process(childpath, oldchildcap) File "/usr/lib/python2.5/site-packages/allmydata_tahoe-1.4.1-py2.5.egg/allmydata/scripts/tahoe_backup.py", line 272, in process raise BackupProcessingError("Cannot backup this file %r" % childpath) allmydata.scripts.tahoe_backup.BackupProcessingError: Cannot backup this file 'stockrt/Roger/etc/namedb' stockrt@host ~/ $ ll stockrt/Roger/etc/namedb lrwxrwxrwx 1 stockrt stockrt 21 2009-01-29 23:00 stockrt/Roger/etc/namedb -> /var/named/etc/namedb ``` This destination does not exists in my local filesystem: /var/named/etc/namedb but I have this (broken) link in my local "to backup" directory: stockrt/Roger/etc I think Tahoe should warn it but do not break just because of it. The same should occur for read/access permission denied for files and directories. Regards, Rogério Schneider
tahoe-lafs added the
unknown
trivial
defect
1.4.1
labels 2009-06-02 19:17:36 +00:00
tahoe-lafs added this to the undecided milestone 2009-06-02 19:17:36 +00:00

reformatted traceback

reformatted traceback
warner added
code-frontend-cli
and removed
unknown
labels 2009-07-11 11:32:25 +00:00
tahoe-lafs added
major
and removed
trivial
labels 2009-12-07 02:40:02 +00:00
davidsarah commented 2009-12-21 00:28:45 +00:00
Author
Owner

The failure for dangling symlinks is #641. (This ticket is not quite a duplicate because it also asks to tolerate other failures.)

The failure for dangling symlinks is #641. (This ticket is not quite a duplicate because it also asks to tolerate other failures.)
francois commented 2010-01-10 01:57:39 +00:00
Author
Owner

Attachment tahoe-backup-tolerate-errors.dpatch (39327 bytes) added

**Attachment** tahoe-backup-tolerate-errors.dpatch (39327 bytes) added
francois commented 2010-01-10 02:03:14 +00:00
Author
Owner

This patch renders tahoe backup more tolerant to special or unreadable files. A warning is displayed on stderr for each file which was skipped.

I think it's a good first step which allows tahoe backup to work out of the box for most users before it can store all those Unix specialties (see ticket #641).

This patch renders *tahoe backup* more tolerant to special or unreadable files. A warning is displayed on stderr for each file which was skipped. I think it's a good first step which allows *tahoe backup* to work out of the box for most users before it can store all those Unix specialties (see ticket #641).
tahoe-lafs modified the milestone from undecided to 1.6.0 2010-01-10 02:09:25 +00:00
davidsarah commented 2010-01-10 06:51:57 +00:00
Author
Owner
+    def test_ignore_symlinks(self):
+        if not hasattr(os, 'symlink'):
+            raise unittest.SkipTest("There is no symlink on this platform.")

Vista has symbolic links (NTFS junction points), but Windows Python probably doesn't have os.symlink yet. There is a mklink command that could be used to test behaviour with junction points. I don't think that the lack of a test on Windows should hold up this fix, though.

``` + def test_ignore_symlinks(self): + if not hasattr(os, 'symlink'): + raise unittest.SkipTest("There is no symlink on this platform.") ``` Vista has symbolic links (NTFS junction points), but Windows Python probably doesn't have `os.symlink` yet. There is a `mklink` command that could be used to test behaviour with junction points. I don't think that the lack of a test on Windows should hold up this fix, though.
davidsarah commented 2010-01-10 06:58:04 +00:00
Author
Owner

Wouldn't it be better to catch the exception caused by an access error, rather than testing using os.access etc?

Wouldn't it be better to catch the exception caused by an access error, rather than testing using `os.access` etc?

I agree, although I'd also like to avoid even trying to open some things. It would be pretty unfortunate to discover a copy of /dev/zero or /dev/urandom lying in your backup source tree.

Our discussions on #833 (dealing with illegal mutable children inside an immutable directory) led us to want "tahoe cp" to warn-and-skip "bad" things found on the tahoe side, and we figured that the same policy would be appropriate for "bad" things found on the local disk side (specifically symlinks and device files). I think it'd be reasonable to have "tahoe backup" do the same.

If so, the way I'd do it would be to have both "tahoe backup" and "tahoe cp" test the potential local source filesystem object with os.path.isfile and os.path.isdir before opening or entering it. These tests should rule out special files and symlinks (whether dangling or not). Skipping symlinks to directories is more important than skipping symlinks to files, but I think it'd be ok to skip both unless/until we come up with some reasonable way to record these sorts of things in tahoe directly.

If these tests pass, then we try to open the thing, and if that raises EnvironmentError (as would be caused by permission problems), we should emit a warning and skip over it. It might be nice to count how many such warnings we emit and dump a summary at the end, so that folks using --verbose never get to see that something went wrong. Also, we should consider having the "tahoe backup" exit code be non-zero if anything was skipped due to errors or non-normal-fileness.

I agree, although I'd also like to avoid even trying to open some things. It would be pretty unfortunate to discover a copy of /dev/zero or /dev/urandom lying in your backup source tree. Our discussions on #833 (dealing with illegal mutable children inside an immutable directory) led us to want "tahoe cp" to warn-and-skip "bad" things found on the tahoe side, and we figured that the same policy would be appropriate for "bad" things found on the local disk side (specifically symlinks and device files). I think it'd be reasonable to have "tahoe backup" do the same. If so, the way I'd do it would be to have both "tahoe backup" and "tahoe cp" test the potential local source filesystem object with `os.path.isfile` and `os.path.isdir` before opening or entering it. These tests should rule out special files and symlinks (whether dangling or not). Skipping symlinks to directories is more important than skipping symlinks to files, but I think it'd be ok to skip both unless/until we come up with some reasonable way to record these sorts of things in tahoe directly. If these tests pass, then we try to open the thing, and if that raises `EnvironmentError` (as would be caused by permission problems), we should emit a warning and skip over it. It might be nice to count how many such warnings we emit and dump a summary at the end, so that folks using --verbose never get to see that something went wrong. Also, we should consider having the "tahoe backup" exit code be non-zero if anything was skipped due to errors or non-normal-fileness.
francois commented 2010-01-19 22:58:26 +00:00
Author
Owner

Thanks Brian and David-Sarah for the comments, here is a newer version of the patch which takes your comments into account. What do you think now?

Thanks Brian and David-Sarah for the comments, here is a newer version of the patch which takes your comments into account. What do you think now?
francois commented 2010-01-19 22:58:46 +00:00
Author
Owner

Attachment tahoe-backup-tolerate-errors-v2.dpatch (47449 bytes) added

**Attachment** tahoe-backup-tolerate-errors-v2.dpatch (47449 bytes) added
davidsarah commented 2010-01-20 00:50:54 +00:00
Author
Owner

Looks good, just some nitpicks:

  • why except OSError: when listing a directory, but except IOError: when processing a file?
  • typo in the comment for upload: "when call" should be "when called"
Looks good, just some nitpicks: * why `except OSError:` when listing a directory, but `except IOError:` when processing a file? * typo in the comment for `upload`: "when call" should be "when called"
francois commented 2010-01-20 08:43:41 +00:00
Author
Owner

Replying to davidsarah:

  • why except OSError: when listing a directory, but except IOError: when processing a file?

It seems to be the way the Python stdlib works. But as Brian mentioned, I should probably catch the parent exception EnvironmentError instead in both cases.

  • typo in the comment for upload: "when call" should be "when called"

Thanks.

I'll send an updated patch.

Replying to [davidsarah](/tahoe-lafs/trac-2024-07-25/issues/729#issuecomment-71508): > * why `except OSError:` when listing a directory, but `except IOError:` when processing a file? It seems to be the way the Python stdlib works. But as Brian mentioned, I should probably catch the parent exception `EnvironmentError` instead in both cases. > * typo in the comment for `upload`: "when call" should be "when called" Thanks. I'll send an updated patch.
francois commented 2010-01-20 09:49:17 +00:00
Author
Owner

Attachment tahoe-backup-tolerate-errors-v3.dpatch (51943 bytes) added

**Attachment** tahoe-backup-tolerate-errors-v3.dpatch (51943 bytes) added
francois commented 2010-01-24 21:23:18 +00:00
Author
Owner

Brian, Zooko, do you feel comfortable commiting this patch before 1.6?

Brian, Zooko, do you feel comfortable commiting this patch before 1.6?
davidsarah commented 2010-01-24 21:37:53 +00:00
Author
Owner

Reviewed, can't see any problems. Tests look sufficient.

Reviewed, can't see any problems. Tests look sufficient.

applied in changeset:b03406af9d216278. Thank you very much!

applied in changeset:b03406af9d216278. Thank you very much!
zooko added the
fixed
label 2010-01-26 15:32:50 +00:00
zooko closed this issue 2010-01-26 15:32:50 +00:00

I kinda think we should also be skipping symlinks in general (not just dangling ones). The os.path.isdir and os.path.isfile tests will return True for symlinks that point to directories and files. So if we want to skip symlinks, those tests need to turn into if os.path.isdir() and not os.path.islink(), etc.

I kinda think we should also be skipping symlinks in general (not just dangling ones). The `os.path.isdir` and `os.path.isfile` tests will return True for symlinks that point to directories and files. So if we want to skip symlinks, those tests need to turn into `if os.path.isdir() and not os.path.islink()`, etc.
Sign in to join this conversation.
No Milestone
No Assignees
3 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#729
No description provided.