what happens when a file changes as you're copying it? #427
A long while ago, Zooko and I had a discussion about what might happen if
somebody changes a file while the Tahoe node is busy encoding it. I put that
discussion and some context on the old ChangingFilesWhileCopyingThem wiki
page. Since this is more of a discussion than a published document, I've moved
the contents of that page into this ticket.
Context
Zooko and I were talking about whether we should encode the whole file to
shares first, then upload them, or whether to encode just one chunk at a time
and try to get it to all servers before moving to the next chunk. This turned
into a discussion about what happens when somebody changes a file while we're
encoding it.
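
To make the two orderings concrete, here is a rough sketch; `encode_all`, `encode_chunk`, `read_chunks`, and `upload_share` are hypothetical helpers standing in for the real encoder and server connections, not Tahoe's actual API:

```python
def encode_then_upload(path, encode_all, upload_share):
    # Strategy A: encode the whole file to shares first, then push
    # each share out. Simple, but needs room to hold every share.
    shares = encode_all(path)            # assumed: {server: share_bytes}
    for server, share in shares.items():
        upload_share(server, share)

def chunk_at_a_time_upload(path, read_chunks, encode_chunk, upload_share):
    # Strategy B: encode one chunk at a time and push its shares to
    # all servers before moving to the next chunk. Small disk
    # footprint, but the source file stays in use for the whole
    # (possibly slow) network transfer.
    for chunk in read_chunks(path):
        for server, share in encode_chunk(chunk).items():
            upload_share(server, share)
```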
The act of uploading a file initiates a process that takes non-zero time to
complete. If the user attempts to modify the file during this time, the
resulting uploaded file would most likely be incoherent.
One way to approach this is to copy the whole file into a temporary directory
before doing any encoding work. This reduces the window of vulnerability to
the time to perform the disk copy, at the expense of extra disk footprint and
disk IO.
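
A minimal sketch of the temporary-copy approach, assuming a hypothetical `encode_and_upload` callable; the point is only that the window of vulnerability shrinks to the `copy2()` call:

```python
import os
import shutil
import tempfile

def upload_via_snapshot(path, encode_and_upload):
    # Snapshot the file before any encoding work, so that later
    # modifications to the original cannot corrupt the upload.
    tmpdir = tempfile.mkdtemp()
    try:
        snapshot = os.path.join(tmpdir, os.path.basename(path))
        shutil.copy2(path, snapshot)   # the only window of vulnerability
        encode_and_upload(snapshot)    # hypothetical: the real encode/push
    finally:
        shutil.rmtree(tmpdir)          # pay back the extra disk footprint
```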
Another approach is to use filesystem locking to prevent anybody from
modifying the file while the encode is in progress. This could keep the file
unmodifiable for a long time for a large file being pushed out over a slow
link when we insist upon getting all shares for a chunk pushed before moving
to the next chunk (or if just one of the upload targets is slow and we refuse
to buffer any shares, in the hopes of minimizing our disk footprint).
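
A sketch of the locking approach on unix, using advisory `flock()`; advisory locks only restrain cooperating writers, and Windows would need a different mechanism (`encode_and_upload` is again a hypothetical stand-in):

```python
import fcntl

def upload_with_lock(path, encode_and_upload):
    with open(path, "rb") as f:
        # Advisory exclusive lock: cooperating writers block until we
        # release it. For a large file on a slow link, this can mean
        # the file is locked for a very long time.
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            encode_and_upload(f)       # hypothetical encode/push
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```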
A third approach would be to make a hash of the file at the beginning of the
process, and then compute the same hash while we encode/upload the file. Just
before we finish, we compare the hashes. If they match, we tell the
leaseholders to commit and we report success (i.e. we modify the filetree
with the new file). If they don't, then we tell the leaseholders to abandon
their shares and we start again. Holding the file open during the whole
encode process protects it from deletion (and behaves nicely under unix, as
the directory entry itself can be deleted but our encode process gets to hold
on to the only remaining reference; under windows this would behave more like
file-locking, which is annoying but at least correct). However, it might
require a UI to at least warn the user that they shouldn't modify files while
we're uploading them, because doing so causes us to waste time and bandwidth.
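
A sketch of the hash-compare-and-retry approach; `encode_and_upload_chunk`, `commit`, and `abandon` are hypothetical stand-ins for the leaseholder protocol, and keeping the file open for the whole pass is what protects it from deletion on unix:

```python
import hashlib

CHUNK = 1 << 16

def file_hash(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.digest()

def upload_with_hash_check(path, encode_and_upload_chunk, commit, abandon,
                           max_tries=3):
    for _ in range(max_tries):
        before = file_hash(path)             # hash taken at the start
        h = hashlib.sha256()
        with open(path, "rb") as f:          # held open: survives unlink on unix
            for block in iter(lambda: f.read(CHUNK), b""):
                h.update(block)              # same hash recomputed as we encode
                encode_and_upload_chunk(block)
        if h.digest() == before:
            commit()                         # file was stable: keep the shares
            return True
        abandon()                            # file changed mid-upload: retry
    return False
```

The cost of a mismatch is a wasted encode pass plus the bandwidth already spent, which is exactly the waste that the warning UI mentioned above would try to discourage.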
Here's a transcript of some of the discussion we had: