repair of mutable files/directories should not increment the sequence number #1209
Reference: tahoe-lafs/trac-2024-07-25#1209
Particularly with my root directory, I often find that only 9 shares of seqN are available when 10 are desired. I do a repair, which results in 10 shares of seqN+1, after which the 9 old seqN shares are deleted. Then the next day there are 9 shares of seqN+1 and 1 of seqN, and the file is again not healthy. This repeats daily.
It seems that the missing seqN shares should simply be generated and placed; then, as servers churn, it is likely that 10 shares can still be found and there are no unrecoverable versions. Perhaps I'm missing something, but the current behavior is not stable with intermittently-connected servers.
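The daily cycle described above can be made concrete with a tiny simulation (a sketch only: the function name, the 10-share total, and the one-server-offline pattern are illustrative, not Tahoe's actual repair code):

```python
def repair_by_bumping(shares, total=10):
    """Current behavior being complained about: repair re-uploads all
    `total` shares at seqN+1, and the old seqN shares it can see are
    deleted.  `shares` maps seqnum -> number of visible shares."""
    seq = max(shares)
    return {seq + 1: total}

# Day 0: one of ten servers is offline, so only 9 shares of seq 1 visible.
visible = {1: 9}
for day in range(3):
    repaired = repair_by_bumping(visible)
    # Overnight, a different server drops out, and the previously
    # offline one returns still holding a share of the old version.
    new_seq = max(repaired)
    visible = {new_seq: 9, new_seq - 1: 1}
    print(day, visible)  # the file looks "unhealthy" again every day
```

Each iteration ends with 9 shares of the new version and 1 stale share, so the repair never converges; re-placing the missing seqN shares instead of bumping would break the cycle.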
I have observed this problem with directories, but it seems likely that it applies to all immutable files.

gdt: did you mean "mutable" instead of "immutable"? Immutable files don't have a sequence number!
Sorry, I really meant directories in particular. I have edited the summary and description.
Changed title from "repair of immutable files should not increment the sequence number" to "repair of directories (all immutable files) should not increment the sequence number".

Clarify that the ticket is really about directories, but that it likely applies to all immutable files.
Changed title from "repair of directories (all immutable files) should not increment the sequence number" to "repair of directories (all immutable files?) should not increment the sequence number".

Replying to gdt:
You mean it likely applies to all mutable files, right? :-) Directories are normally mutable, although it is also possible to have immutable directories. But immutable directories don't have sequence numbers. :-)
Changed title from "repair of directories (all immutable files?) should not increment the sequence number" to "repair of mutable files/directories should not increment the sequence number".

#1210 was a duplicate. Its description was:
This is a great idea, especially since one of the failure modes for mutable files (when a share is migrated to a new server, causing the write-enabler to become wrong, causing the share to get "stuck" at some old version) is that it's never going to be able to get rid of that old share, so the file will always appear "unhealthy". In that case, constantly clobbering the perfectly-valid most-recent-version shares is obviously wrong.
Brian wrote on tahoe-dev:
Greg Troxel wrote:
Brian replied:
Replying to Brian:
Counterexample: suppose there is a recoverable version with sequence number S, and an unrecoverable version with sequence number S+1. (There are no versions with sequence number >= S+2.) Then, the new sequence number needs to be S+2, in order for clients not to use the shares from S+1. Deleting the shares with sequence number S+1 is also possible but inferior, since sequence numbers should be monotonically increasing to minimize race conditions.
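The rule in this counterexample can be sketched as follows (hypothetical helper and data shapes, not Tahoe's real servermap API; k=3 is just an illustrative encoding parameter):

```python
def choose_repair_seqnum(shares, k=3):
    """Pick the sequence number a repairer should write.

    `shares` maps seqnum -> number of shares seen for that version.
    A version is treated as recoverable when at least `k` shares of
    it were found.
    """
    recoverable = [s for s, n in shares.items() if n >= k]
    if not recoverable:
        raise ValueError("no recoverable version: cannot repair")
    best = max(recoverable)
    highest_seen = max(shares)  # includes unrecoverable versions
    if highest_seen > best:
        # Shares of an unrecoverable newer version exist (the S+1 in
        # the counterexample): write with a strictly higher seqnum so
        # clients never prefer the stale S+1 shares.
        return highest_seen + 1
    # No newer stale shares exist: re-place the missing shares of
    # `best` without bumping the sequence number.
    return best

# Counterexample: recoverable S=5, one stray share of S+1=6 -> write S+2=7.
print(choose_repair_seqnum({5: 10, 6: 1}))
# Common case: only S=5 exists, merely under-placed -> keep seqnum 5.
print(choose_repair_seqnum({5: 9}))
```

Incrementing past the stray S+1 share rather than deleting it preserves the monotonically-increasing seqnum property mentioned above.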
+1.
Proposed algorithm:
This would also fix #1004, #614, and #1130. IIUC, an implementation of the /tahoe-lafs/trac-2024-07-25/issues/6192#comment:80210 algorithm is being worked on in #1382.
I think davidsarah's proposed algorithm is a good choice. A few comments:
Replying to gdt:
I agree. To make this more explicit:
Otherwise,
(The reason to only do the removal when the Recovered version is happy, is in case of a partition where different clients can see different subsets of servers. In that case, removing shares is a bad idea unless we know that the Recovered version has been stored reliably, not just recoverably. Also, we shouldn't remove shares from servers that weren't considered in steps 1 and 2, if they have connected in the meantime.)
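The "only delete when the recovered version is happy" condition might look like this (a sketch under assumed data shapes; the function name, threshold parameter, and server map are hypothetical, not Tahoe's actual structures):

```python
def shares_safe_to_delete(shares_by_server, repaired_seqnum, happy_threshold):
    """Decide which old shares a repairer may delete.

    `shares_by_server` maps server_id -> set of seqnums that server
    holds for this file.  Old shares are deleted only when the repaired
    version is "happy": held by at least `happy_threshold` distinct
    servers among those examined during repair.
    """
    holders = [s for s, seqs in shares_by_server.items()
               if repaired_seqnum in seqs]
    if len(holders) < happy_threshold:
        # Repaired version is merely recoverable, not reliably placed.
        # During a possible partition, deleting anything is unsafe.
        return []
    # Only delete shares strictly older than the happily-stored version,
    # and only from the servers we actually examined.
    return [(server, seq)
            for server, seqs in shares_by_server.items()
            for seq in seqs
            if seq < repaired_seqnum]
```

Because deletions are restricted to versions strictly below a happily-stored one, running this on an arbitrary partition of the grid never destroys the newest reliably-placed version.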
The algorithm says to upload to version S+1 in that case. I think this is the right thing.
I believe that davidsarah's algorithm should help for most cases. Does it make sense to use a vector clock (http://en.wikipedia.org/wiki/Vector_clock) for even more robustness? I don't believe this should happen often (ever), but if there is a partial split-brain, where several nodes can connect to more than S storage servers but fewer than half the total storage servers in the grid, there could still be significant churn. (Did that last sentence make any sense?) I'm probably over-thinking this solution, so feel free to ignore me.
Regarding vector clocks: I wouldn't say overthinking, but I think that Tahoe needs to pick a design subspace within the space of all network and user behaviors. To date, Tahoe has been designed (judging by what's been done) for the case where all servers are almost always connected. I'm talking about a case where, at any given time, most servers are connected, and most servers are connected almost all the time. In that world, not being connected is still unusual and to be avoided, but when you have 20 servers, the odds that one is missing are still pretty high.
So I think this ticket should stay focused on the mostly-connected case. If you want to work on a far more distributed less available use case, good luck, but I think it's about 50x the work of fixing /tahoe-lafs/trac-2024-07-25/issues/6271.
The vector clock algorithm is designed to ensure causal ordering (assuming cooperative peers) in a general peer-to-peer message passing system. It's inefficient and overcomplicated for this case, and the assumptions it relies on aren't met in any case.
Incidentally, the comment:19 algorithm never makes things worse even in the case of partition, because it only deletes shares of a lower version than a version that is stored happily, even when run on an arbitrary partition of a grid.