back up the content of a file even if the content changes without changing mtime #1937
From [//pipermail/tahoe-dev/2008-September/000809.html].
If an application writes to a file twice in quick succession, then the operating system may give that file the same `mtime` value both times. `mtime` granularity varies between OSes and filesystems, and is often coarser than you would wish; moreover, `mtime` isn't necessarily updated until the filehandle is closed [¹, ²]. Note that FAT is the standard filesystem for removable media (isn't it?), so this coarse behaviour is actually very common.

¹ http://www.infosec.jmu.edu/documents/jmu-infosec-tr-2009-002.pdf
² http://msdn.microsoft.com/en-us/library/windows/desktop/ms724290%28v=vs.85%29.aspx
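A minimal way to observe this (the file path is just an example; the outcome depends on the filesystem you run it on):

```python
import os
import time

# Small experiment: on a filesystem with coarse mtime granularity, such as
# FAT's 2-second resolution, two writes in quick succession can leave
# st_mtime unchanged even though the contents changed.
path = "testfile"

with open(path, "wb") as f:
    f.write(b"D1")
t1 = os.stat(path).st_mtime

time.sleep(0.01)  # much shorter than the filesystem's timestamp granularity

with open(path, "wb") as f:
    f.write(b"D2")
t2 = os.stat(path).st_mtime

# Can print True on a coarse-granularity filesystem, despite D1 != D2.
print(t1 == t2)
```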
Now the problem is, what happens if

1. an application writes some data, `D1`, into a file, and the timestamp gets updated to `T1`, and then
2. `tahoe backup` reads `D1`, and then
3. the app writes some new data, `D2`, and the timestamp doesn't get updated because steps 2 and 3 happened within the filesystem's granularity?

What happens is that `tahoe backup` has saved `D1`, but from then on it will never save `D2`, since it falsely believes it has already saved it: the file's timestamp is still `T1`. If this were to happen in practice, the effect for the user would be that when they go to read the file from Tahoe-LAFS, they find the previous version of its contents, `D1`, and not the most recent version, `D2`. This unfortunate user would probably not have any way to figure out what happened, and would justly blame Tahoe-LAFS for being unreliable.
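To make the failure mode concrete, here is a rough model of a timestamp-based skip check; the names (`last_uploaded_mtime`, `should_skip`) are hypothetical, and this is not tahoe's actual backupdb code, which records more than just the mtime:

```python
import os

# Hypothetical stand-in for the backupdb: path -> mtime recorded when the
# file was last uploaded.
last_uploaded_mtime = {}

def should_skip(path):
    """Skip re-uploading `path` if its mtime matches the recorded one."""
    return last_uploaded_mtime.get(path) == os.stat(path).st_mtime

def mark_uploaded(path):
    last_uploaded_mtime[path] = os.stat(path).st_mtime

# If the application writes D2 after mark_uploaded() ran, but within the
# filesystem's mtime granularity, the stored and current mtimes are both T1,
# so should_skip() keeps returning True and D2 is never uploaded.
```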
The same problem can happen if the timestamp of a file gets reset to an earlier value, such as with the `touch -t` unix command, or by the system clock getting moved. (The system clock getting moved happens surprisingly often in the wild.)

A user can avoid this problem by passing `--ignore-timestamps` to `tahoe backup`, which will cause that run of `tahoe backup` to reupload every file. That is very expensive in terms of time, disk, and CPU usage (even if the files get deduplicated by the servers).
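For reference, the workaround looks like this (the source directory and alias are placeholders):

```
tahoe backup --ignore-timestamps ~/Documents work:Backups
```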
Here's a proposed solution which avoids the failure of preservation due to the race condition. This solution does not address the problem due to timestamps getting reset, e.g. by `touch -t` or by the system clock getting moved.

Let `G` be the local filesystem's worst-case granularity in seconds times some fudge factor, such as 2. So if the filesystem is FAT, let `G=4`; if the filesystem is ext4, let `G=0.002`; if the filesystem is NTFS, let `G=0.004`; else let `G=2`.

When `tahoe backup` examines a file, if the file's current `mtime` is within `G` seconds of the current time, then don't read its contents at that time, but instead delay for `G` seconds and then try again.

If we use the approach of comment:91280, then I suggest using a fixed `G` = 4s instead of trying to guess what the timestamp granularity is. Also, after the file has been uploaded we should check the `mtime` again, in case it was modified while we were reading it.
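A sketch of the proposed check, folded together with the two refinements above (a fixed `G` of 4 seconds, and re-checking the mtime after the upload); the function name and the `upload` callback are hypothetical, not tahoe's actual code:

```python
import os
import time

G = 4.0  # fixed worst-case mtime granularity (seconds), as suggested above

def backup_file(path, upload):
    """Upload `path` via the caller-supplied `upload(path)` callable,
    avoiding the same-mtime race described in this ticket."""
    while True:
        # If the file was modified within the last G seconds, a later write
        # might not change its mtime; wait until the mtime is old enough.
        while time.time() - os.stat(path).st_mtime < G:
            time.sleep(G)

        mtime_before = os.stat(path).st_mtime
        upload(path)

        # Re-check the mtime after uploading, in case the file was modified
        # while we were reading it; if so, go around again.
        if os.stat(path).st_mtime == mtime_before:
            return mtime_before  # value to record in the backupdb
```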
Short of making a shadow copy on filesystems that support it, it's not possible to get a completely consistent snapshot of a filesystem that is being modified, using POSIX APIs.
Replying to daira:
+1
Hm, I think this is a separate issue. The problem that this ticket seeks to address is that different-contents-same-mtime can lead to data loss. The issue you raise in this comment is, I think, #427.