backupdb and ext4 i_version/generation xattributes #1228

Open
opened 2010-10-20 00:20:49 +00:00 by warner · 3 comments

I recently learned that several Linux filesystems can track
version/generation numbers for local files. We could use this
information in the backupdb to improve the speed and reliability of
detecting files that have not been modified since the last time we
did a backup.

`lsattr -v FOO.txt` shows a "version/generation number", and
probably works even on old ext2 filesystems.
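
One way to read the number that `lsattr -v` prints from Python is the `FS_IOC_GETVERSION` ioctl, which is the call e2fsprogs makes for that option. The following is only a sketch and has not been run against the filesystems discussed in this ticket. The helper name `get_generation` is hypothetical, the request number is computed with the ioctl encoding used on x86 and ARM Linux (other architectures encode it differently), and the function returns `None` wherever the ioctl is refused.

```python
import fcntl
import os
import struct


def _ior(type_char, nr, size):
    # _IOR() encoding as used on x86 and ARM Linux (an assumption;
    # some other architectures use different direction/size bits).
    return (2 << 30) | (size << 16) | (ord(type_char) << 8) | nr


# Declared in <linux/fs.h> as _IOR('v', 1, long).
FS_IOC_GETVERSION = _ior("v", 1, struct.calcsize("l"))


def get_generation(path):
    """Return the number `lsattr -v` would show, or None if unavailable."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError:
        return None
    try:
        # Buffer sized for a C long; the kernel fills in an int.
        buf = bytearray(struct.calcsize("l"))
        fcntl.ioctl(fd, FS_IOC_GETVERSION, buf)
        return struct.unpack("I", bytes(buf[:4]))[0]
    except OSError:
        # The filesystem does not implement this ioctl.
        return None
    finally:
        os.close(fd)
```

Getting a number back says nothing about whether the filesystem updates it when the file is written, so the value would still need to be validated before the backupdb relied on it.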

[http://www.spinics.net/lists/linux-fsdevel/msg33753.html this] and
[http://linux-ima.sourceforge.net/ this] talk about "mounting a
filesystem with i_version support", and suggest that the
following ext4 extended attributes will become available:

  • file.crtime - actual file creation time
  • file.i_generation - inode generation number
  • file.i_version ("directories only") - inode data version number

It's not yet clear to me what information is really available, or
how one might get to it (especially from Python), but this ticket
is to remind me that "tahoe backup" would be a lot better if we
could quickly and reliably determine that a file had not changed.
The filesize+timestamp heuristic is useful, but it'd be nice to be
able to do better. A real generation number would be ideal, if the
kernel promises to update it reliably. A kernel-maintained hash of
each file's contents would be great too (it would let us detect
renames without reading the file data).
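
As a rough illustration of the idea, the filesize+timestamp heuristic could be extended with an optional generation number as below. This is a sketch only, not the actual backupdb code: the names `Fingerprint`, `take_fingerprint` and `probably_unchanged` are hypothetical, and the generation value is assumed to come from some separate, trusted source.

```python
import os
from collections import namedtuple

# Hypothetical record of what a backup database might store per file.
Fingerprint = namedtuple("Fingerprint", "size mtime_ns ctime_ns generation")


def take_fingerprint(path, generation=None):
    """Capture size and timestamps, plus a generation number if we have one."""
    st = os.stat(path)
    return Fingerprint(st.st_size, st.st_mtime_ns, st.st_ctime_ns, generation)


def probably_unchanged(old, new):
    """Compare two fingerprints of the same path."""
    # The existing heuristic: any difference in size or timestamps
    # means the file must be re-read.
    if (old.size, old.mtime_ns, old.ctime_ns) != (
        new.size, new.mtime_ns, new.ctime_ns
    ):
        return False
    # Only consult the generation number when both sides recorded one.
    if old.generation is not None and new.generation is not None:
        return old.generation == new.generation
    return True
```

When no generation number is available the check falls back to exactly the size and timestamp comparison, so it is never weaker than the current heuristic.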

warner added the code-encoding, major, enhancement, 1.8.0 labels 2010-10-20 00:20:49 +00:00
warner added this to the undecided milestone 2010-10-20 00:20:49 +00:00
warner self-assigned this 2010-10-20 00:20:49 +00:00
davidsarah commented 2010-10-20 04:01:42 +00:00
Owner

Replying to warner:

> It's not yet clear to me what information is really available, or
> how one might get to it (especially from python), [...]

The patch you linked to would have made this info accessible via "extended attributes" for ext4 filesystems, but it doesn't look like that patch was accepted.

In general, anything you can do from C, you can do from Python using ctypes (if the FFI overhead is not an issue, as it probably isn't in this case).
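
For example, the C function `listxattr(2)` can be called through `ctypes` to see which extended attributes a file exposes. This is a minimal sketch assuming Linux with glibc or a compatible libc. The helper name `list_xattr_names` is hypothetical, and on Python 3 `os.listxattr` already covers this particular call, so the snippet is only meant to show the ctypes pattern.

```python
import ctypes
import os

# dlopen(NULL): symbols already linked into the interpreter, including libc.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.listxattr.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_size_t]
_libc.listxattr.restype = ctypes.c_ssize_t


def list_xattr_names(path):
    """Return a list of extended attribute names, or None on any error."""
    encoded = os.fsencode(path)
    # A zero-length buffer asks the kernel how much space is needed.
    size = _libc.listxattr(encoded, None, 0)
    if size < 0:
        return None
    if size == 0:
        return []
    buf = ctypes.create_string_buffer(size)
    size = _libc.listxattr(encoded, buf, size)
    if size < 0:
        # For example, the attribute list grew between the two calls.
        return None
    # The kernel returns NUL-terminated names packed back to back.
    return [os.fsdecode(name) for name in buf.raw[:size].split(b"\0") if name]
```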

randombit commented 2010-10-25 18:23:37 +00:00
Owner

It may be hard to use this safely, though: on my local XFS filesystems, and remote ZFS mounts, lsattr -v returns a number, but doesn't update it on writes. It fails (bad ioctl for device type) on a tmpfs, which is far safer. It's not clear to me where the values reported on XFS and ZFS filesystems are coming from (it's not the inode number, at least). Trying to set the values (using chattr -v) also fails on these filesystems.

I like the idea of a kernel-maintained per-file hash cache. I may play with this in my copious free time.

davidsarah commented 2010-10-26 01:30:08 +00:00
Owner

Replying to randombit:

> It may be hard to use this safely, though: on my local XFS filesystems, and remote ZFS mounts, lsattr -v returns a number, but doesn't update it on writes. It fails (bad ioctl for device type) on a tmpfs, which is far safer.

We could potentially test whether this attribute is working correctly in a given directory subtree, by updating a test file and seeing whether it changes.
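
Such a probe might look like the sketch below. The function name `attribute_tracks_writes` is hypothetical, and `read_value` stands for whatever function reads the attribute (it is assumed to return `None` when the attribute is unavailable).

```python
import os
import tempfile


def attribute_tracks_writes(read_value, directory):
    """Write to a scratch file in `directory` and report whether the
    value returned by read_value(path) changed as a result."""
    fd, path = tempfile.mkstemp(dir=directory, prefix=".probe-")
    try:
        os.write(fd, b"first")
        os.fsync(fd)
        before = read_value(path)
        os.write(fd, b"second")
        os.fsync(fd)
        after = read_value(path)
    finally:
        os.close(fd)
        os.unlink(path)
    # Unavailable (None) or unchanged values both count as "not working".
    return before is not None and after is not None and before != after
```

A positive result only shows that the value changed for one write to one file. It does not prove that the filesystem updates it for every kind of modification, and mount points below `directory` would need to be probed separately.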

tahoe-lafs added normal and removed major labels 2012-04-01 05:05:17 +00:00
Reference: tahoe-lafs/trac-2024-07-25#1228