what happens when a file changes as you're copying it? #427
A long while ago, Zooko and I had a discussion about what might happen if
somebody changes a file while the Tahoe node is busy encoding it. I put that
discussion and some context on the old ChangingFilesWhileCopyingThem wiki
page. Since this is more of a discussion than a published document, I've moved
the contents of that page into this ticket.
Context
Zooko and I were talking about whether we should encode the whole file to
shares first, then upload them, or whether to encode just one chunk at a time
and try to get it to all servers before moving to the next chunk. This turned
into a discussion about what happens when somebody changes a file while we're
encoding it.
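
To make the two orderings concrete, here is a rough sketch; `encode_all`, `encode_chunk`, `read_chunks`, and `upload_share` are hypothetical helpers standing in for the real encoder and server connections, not Tahoe's actual API:

```python
def encode_then_upload(path, encode_all, upload_share):
    # Strategy A: encode the whole file to shares first, then push
    # each share out. Simple, but needs room to hold every share.
    shares = encode_all(path)            # assumed: {server: share_bytes}
    for server, share in shares.items():
        upload_share(server, share)

def chunk_at_a_time_upload(path, read_chunks, encode_chunk, upload_share):
    # Strategy B: encode one chunk at a time and push its shares to
    # all servers before moving to the next chunk. Small disk
    # footprint, but the source file stays in use for the whole
    # (possibly slow) network transfer.
    for chunk in read_chunks(path):
        for server, share in encode_chunk(chunk).items():
            upload_share(server, share)
```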
The act of uploading a file initiates a process that takes non-zero time to
complete. If the user attempts to modify the file during this time, the
resulting uploaded file would most likely be incoherent.
One way to approach this is to copy the whole file into a temporary directory
before doing any encoding work. This reduces the window of vulnerability to
the time to perform the disk copy, at the expense of extra disk footprint and
disk IO.
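
A minimal sketch of the temporary-copy approach, assuming a hypothetical `encode_and_upload` callable; the point is only that the window of vulnerability shrinks to the `copy2()` call:

```python
import os
import shutil
import tempfile

def upload_via_snapshot(path, encode_and_upload):
    # Snapshot the file before any encoding work, so that later
    # modifications to the original cannot corrupt the upload.
    tmpdir = tempfile.mkdtemp()
    try:
        snapshot = os.path.join(tmpdir, os.path.basename(path))
        shutil.copy2(path, snapshot)   # the only window of vulnerability
        encode_and_upload(snapshot)    # hypothetical: the real encode/push
    finally:
        shutil.rmtree(tmpdir)          # pay back the extra disk footprint
```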
Another approach is to use filesystem locking to prevent anybody from
modifying the file while the encode is in progress. This could keep the file
unmodifiable for a long time for a large file being pushed out over a slow
link when we insist upon getting all shares for a chunk pushed before moving
to the next chunk (or if just one of the upload targets is slow and we refuse
to buffer any shares, in the hopes of minimizing our disk footprint).
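
A sketch of the locking approach on unix, using advisory `flock()`; advisory locks only restrain cooperating writers, and Windows would need a different mechanism (`encode_and_upload` is again a hypothetical stand-in):

```python
import fcntl

def upload_with_lock(path, encode_and_upload):
    with open(path, "rb") as f:
        # Advisory exclusive lock: cooperating writers block until we
        # release it. For a large file on a slow link, this can mean
        # the file is locked for a very long time.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            encode_and_upload(f)       # hypothetical encode/push
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```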
A third approach would be to make a hash of the file at the beginning of the
process, and then compute the same hash while we encode/upload the file. Just
before we finish, we compare the hashes. If they match, we tell the
leaseholders to commit and we report success (i.e. we modify the filetree
with the new file). If they don't, then we tell the leaseholders to abandon
their shares and we start again. Holding the file open during the whole
encode process protects it from deletion (and behaves nicely under unix, as
the directory entry itself can be deleted but our encode process gets to hold
on to the only remaining reference; under windows this would behave more like
file-locking, which is annoying but at least correct). However, it might
require a UI to at least warn the user that they shouldn't modify files while
we're uploading them, because doing so causes us to waste time and bandwidth.
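
A sketch of the hash-compare-and-retry approach; `encode_and_upload_chunk`, `commit`, and `abandon` are hypothetical stand-ins for the leaseholder protocol, and keeping the file open for the whole pass is what protects it from deletion on unix:

```python
import hashlib

CHUNK = 1 << 16

def file_hash(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.digest()

def upload_with_hash_check(path, encode_and_upload_chunk, commit, abandon,
                           max_tries=3):
    for _ in range(max_tries):
        before = file_hash(path)             # hash taken at the start
        h = hashlib.sha256()
        with open(path, "rb") as f:          # held open: survives unlink on unix
            for block in iter(lambda: f.read(CHUNK), b""):
                h.update(block)              # same hash recomputed as we encode
                encode_and_upload_chunk(block)
        if h.digest() == before:
            commit()                         # file was stable: keep the shares
            return True
        abandon()                            # file changed mid-upload: retry
    return False
```

The cost of a mismatch is a wasted encode pass plus the bandwidth already spent, which is exactly the waste that the warning UI mentioned above would try to discourage.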
Here's a transcript of some of the discussion we had: