backupdb and ext4 i_version/generation xattributes #1228
Reference: tahoe-lafs/trac-2024-07-25#1228
I recently learned that several linux filesystems can track
version/generation numbers for local files. We could use this
information in the backupdb to improve the speed+reliability of
detecting files that have not been modified since the last time we
did a backup.
`lsattr -v FOO.txt` shows a "version/generation number", and probably works for even old ext2 filesystems.
[http://www.spinics.net/lists/linux-fsdevel/msg33753.html this] and
[http://linux-ima.sourceforge.net/ this] talk about "mounting a
filesystem with `i_version` support", and suggest that the
following ext4 extended-attributes will become available:

 * `file.crtime` - actual file creation time
 * `file.i_generation` - inode generation number
 * `file.i_version` ("directories only") - inode data version number

It's not yet clear to me what information is really available, or
how one might get to it (especially from python), but this ticket is to remind me that "tahoe
backup" would be a lot better if we could quickly and reliably
determine that a file had not changed. The filesize+timestamp
heuristic is useful, but it'd be nice to be able to do better. A
real generation number would be ideal, if the kernel promises to
update it reliably. A kernel-maintained hash of the filesystem contents would be great too (it would let us detect renames without reading the file contents).
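
To make the comparison concrete, here is a minimal sketch of a size+timestamp check that also consults a generation number when one is available. The `FileRecord` tuple and both function names are hypothetical and are not the real backupdb schema or API; where the generation value comes from is left open.

```python
import os
from collections import namedtuple

# Hypothetical record of what was seen at the last backup.
# This is NOT the real backupdb schema, just an illustration.
FileRecord = namedtuple("FileRecord", "size mtime_ns generation")


def snapshot(path, generation=None):
    """Capture the cheap metadata for a file.

    `generation` is whatever kernel-provided counter we managed to read,
    or None if the filesystem does not offer one.
    """
    st = os.stat(path)
    return FileRecord(st.st_size, st.st_mtime_ns, generation)


def probably_unchanged(old, new):
    """Return True if the file looks unmodified since `old` was taken."""
    # The existing heuristic: any size or mtime difference means "changed".
    if old.size != new.size or old.mtime_ns != new.mtime_ns:
        return False
    # Only consult the generation number when both snapshots have one.
    if old.generation is not None and new.generation is not None:
        return old.generation == new.generation
    return True
```

Without a trustworthy generation number this degrades to the plain filesize+timestamp heuristic, so it is never worse than what we have today.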
Replying to warner:
The patch you linked to would have made this info accessible via "extended attributes" for ext4 filesystems, but it doesn't look like that patch was accepted.
In general, anything you can do from C, you can do from Python using `ctypes` (if the FFI overhead is not an issue, as it probably isn't in this case).

It may be hard to use this safely, though: on my local XFS filesystems, and remote ZFS mounts, lsattr -v returns a number, but doesn't update it on writes. It fails (bad ioctl for device type) on a tmpfs, which is far safer. It's not clear to me where the values reported on XFS and ZFS filesystems are coming from (it's not the inode number, at least). Trying to set the values (using `chattr -v`) also fails on these filesystems.

I like the idea of a kernel-maintained per-file hash cache. I may play with this in my copious free time.
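
For reference, here is a rough sketch of reading the same number that `lsattr -v` prints, using the standard-library `fcntl` module instead of `ctypes`. It assumes Linux, and it assumes the `_IOR` bit layout used on x86 and ARM (some other architectures encode ioctl numbers differently). The helper name `get_generation` is made up for this example.

```python
import fcntl
import os
import struct


def _ior(type_char, nr, size):
    # ASSUMPTION: the _IOR() encoding used on x86/ARM Linux:
    # direction (2 = read) << 30 | size << 16 | type << 8 | nr
    return (2 << 30) | (size << 16) | (ord(type_char) << 8) | nr


# FS_IOC_GETVERSION is declared in the kernel headers as _IOR('v', 1, long).
FS_IOC_GETVERSION = _ior("v", 1, struct.calcsize("l"))


def get_generation(path):
    """Return the inode version/generation number, or None if unsupported."""
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        buf = bytearray(struct.calcsize("l"))
        try:
            fcntl.ioctl(fd, FS_IOC_GETVERSION, buf)
        except OSError:
            # e.g. "Inappropriate ioctl for device" on tmpfs
            return None
        # Although the ioctl is declared with a long, the kernel stores
        # an int, so only the first four bytes are meaningful.
        return struct.unpack("I", bytes(buf[:4]))[0]
    finally:
        os.close(fd)
```

Returning `None` on failure keeps the tmpfs-style "not supported" case distinct from a real number. It does nothing to catch filesystems that return a number but never update it.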
Replying to randombit:
We could potentially test whether this attribute is working correctly in a given directory subtree, by updating a test file and seeing whether it changes.
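
A sketch of what such a probe might look like. The function name and the `get_version` callable are hypothetical; the callable stands for whatever mechanism ends up reading the attribute, and should return None when it is unsupported.

```python
import os
import tempfile


def version_tracks_writes(directory, get_version):
    """Write to a scratch file in `directory` and report whether the value
    returned by get_version(path) changed as a result."""
    fd, path = tempfile.mkstemp(dir=directory, prefix=".version-probe-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"first")
            f.flush()
            os.fsync(f.fileno())
        before = get_version(path)
        if before is None:
            return False  # attribute not available here at all

        with open(path, "ab") as f:
            f.write(b"second")
            f.flush()
            os.fsync(f.fileno())
        after = get_version(path)
        return after is not None and after != before
    finally:
        os.remove(path)
```

For example, `version_tracks_writes(".", lambda p: os.stat(p).st_size)` exercises the probe with file size as a stand-in for the real attribute. A True result only shows that the value changed for this one write, so it is a necessary check rather than proof that the kernel updates it reliably.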