make tahoe backup keep more filesystem metadata #1325

Open
opened 2011-01-15 22:31:30 +00:00 by chrysn · 2 comments
chrysn commented 2011-01-15 22:31:30 +00:00
Owner

there are a number of problems due to which `tahoe backup` cannot replace rsync-style backups yet. the core of the problem is that tahoe-lafs cannot keep all the information that is stored in a posix-style file system. the issues i see are:

  • ctime/mtime is not saved
  • symlinks can not be saved (compare ticket #641, which has been around for two years)
  • other special files can not be saved (devices etc)
  • user, group and permissions are not saved
  • acls are not saved

i am aware that tahoe has its own ways of dealing with permissions, that it has its own time stamps, and that directories work in a way that every directory entry is kind of a link anyway, but that's not the point -- it's about being able to restore a disk's contents from a backup.

from my point of view, symlinks, times and user/group/permissions are the most important of these; device files are nowadays created on a ramdisk on the fly anyway, and acl users know the problem well enough to have their workarounds (afair this is an issue with most backup systems).

implementation-wise, i guess that most if not all of this can be stored in the directory as additional information.
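a minimal sketch of what that "additional information" could look like on the backup side, using only the python standard library. the function name `collect_posix_metadata` and the dictionary keys are made up for illustration; they are not an existing tahoe-lafs format or api, and how the result would be attached to a directory entry is left open.

```python
import os
import stat


def file_type(mode):
    """Map an st_mode value to a coarse, human-readable type name."""
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular"
    return "special"  # devices, fifos, sockets


def collect_posix_metadata(path):
    """Return a JSON-serializable dict of POSIX metadata for `path`.

    Hypothetical helper: the key names are assumptions, not a tahoe-lafs schema.
    lstat() is used so that a symlink is described itself, not its target.
    """
    st = os.lstat(path)
    meta = {
        "type": file_type(st.st_mode),
        "mode": stat.S_IMODE(st.st_mode),  # permission bits only
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mtime": st.st_mtime,
        "ctime": st.st_ctime,  # recorded for reference; a restore cannot set ctime
    }
    if stat.S_ISLNK(st.st_mode):
        meta["symlink_target"] = os.readlink(path)
    return meta
```

on restore, mode, owner and mtime could be reapplied with `os.chmod`, `os.chown` and `os.utime`, and symlinks recreated with `os.symlink`; acls are not covered by this sketch.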

if it is possible in trac, i suggest all related bugs to be marked as "blocking" this bug.

is this something that is realistic to achieve for tahoe-lafs?

tahoe-lafs added the unknown, major, enhancement, 1.8.1 labels 2011-01-15 22:31:30 +00:00
tahoe-lafs added this to the undecided milestone 2011-01-15 22:31:30 +00:00

I think this would require #307 (maybe add node metadata? (in addition to edge metadata)) and/or #947 (Add file-with-metadata caps). (Hm, maybe those two tickets should be merged.)

chrysn commented 2011-01-18 14:39:02 +00:00
Author
Owner

two other issues came to my mind related to this, though both in the low-priority class:

  • sparse files (might actually be implemented, didn't test it)
  • hardlinks

hardlinks are not too much of an issue server-wise, because the node doing the backup uses the same convergence key for every copy, but when restoring, the file gets duplicated. (hardlinking all files that share a readcap is not a good idea either, as they might originally have been distinct files that merely had equal contents.)

it might be reasonable to implement this by saving each file's device and inode number (i figure there has to be something compatible for each file system that provides hard links). that would solve it for the backup case (where all files are created more-or-less atomically), but is probably the wrong approach for a more general case where one wants to create arbitrary hard-links in tahoe-lafs. (one could argue that identical mutable file hashes are equivalent to hard-links, but then again that wouldn't work out too well for the backup scenario.)
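a rough sketch of the device/inode idea for the backup case, again with the standard library only. `find_hardlink_groups` is a hypothetical helper, not part of tahoe-lafs; it assumes the tree does not change while it is being walked.

```python
import os
import stat
from collections import defaultdict


def find_hardlink_groups(root):
    """Group regular files under `root` that share a (device, inode) pair.

    Hypothetical helper for illustration. Files with equal contents but
    different inodes are deliberately NOT grouped.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # st_nlink > 1 means at least one other name points at this inode
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    # keep only groups where more than one name lies inside the backed-up tree
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

a restore could then write the first path of each group normally and recreate the others with `os.link`, which keeps files that merely have identical contents separate.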

zooko changed title from make `tahoe backup` useable as a replacement for rsync to make `tahoe backup` keep more filesystem metadata 2011-01-27 13:31:36 +00:00
Reference: tahoe-lafs/trac-2024-07-25#1325