redefine "Healthy" to be "Happy" for checker/verifier/repairer #614
Part of dreid's performance problem (in addition to the major part: #610, and the other consideration: #613) is that his client is re-uploading every file he has ever uploaded whenever the checker reports that a file is not "Healthy", even though the file still has 9 of its N=10 shares (K=3). Maybe we should redefine "Healthy" to be 7 shares and let numbers of shares greater than 7 be "super extra Healthy".
I choose 7 because that is the current default value of "shares of happiness". "shares of happiness" is a related notion: when you are doing an upload, if some of the attempts to upload shares fail, and you are left with 7 or more shares at the end, then you report to the user that the upload succeeded. If enough uploads fail so that you can't get more than 6 shares uploaded, then you immediately abort and report to the user that the upload failed. Maybe repairer ought to use the same heuristic as uploader does with regard to how many shares is enough to "call it good".
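A minimal sketch of that accept/abort heuristic, assuming the default 3-of-10 encoding with a happiness threshold of 7 (illustrative names only, not the actual Tahoe-LAFS uploader code):

```python
# Illustrative sketch, not the real uploader: decide whether an upload
# "succeeded" under the shares-of-happiness rule described above.

def upload_outcome(shares_placed, happy=7):
    """Report success if at least `happy` shares landed on the grid,
    otherwise report failure (the real uploader aborts early once it
    knows it cannot reach the threshold)."""
    return "success" if shares_placed >= happy else "failure"

assert upload_outcome(9) == "success"   # 9 >= 7: call it good
assert upload_outcome(6) == "failure"   # 6 < 7: abort and report failure
```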
Per #778 ("shares of happiness" is the wrong measure; "servers of happiness" is better), the definition of a "well stored" file should be a file for which there are "servers of happiness" distinct servers such that any K of them are sufficient to recover your file. This would be a good definition for "healthy" in checker/verifier/repairer -- don't bother repairing a file if there are already "servers of happiness" for that file.

When this is fixed, remember to change the webapi.txt doc for POST $URL?t=check.

(The ticket was subsequently retitled from "redefine 'Healthy' to be 7 in 3-of-10 encoding" to "redefine 'Healthy' to be 'Happy' in 3-of-10 encoding", and then to "redefine 'Healthy' to be 'Happy' for checker/verifier/repairer".)

#778 ("shares of happiness" is the wrong measure; "servers of happiness" is better) is fixed! Now we should change checker/verifier/repairer to report health (for the first two) and to trigger a repair (for the last one) only if the file in question lacks sufficient servers-of-happiness.
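A rough sketch of what a happiness-aware health check could look like. This is illustrative only: the function names are made up, but the core idea follows the #778 measure, where servers-of-happiness is the size of a maximum bipartite matching between servers and distinct share numbers, so a single server hoarding many shares only counts once:

```python
# Illustrative sketch of a happiness-aware "is this file healthy?" check.

def servers_of_happiness(sharemap):
    """sharemap: dict mapping server_id -> set of share numbers it holds.
    Returns the size of a maximum matching of servers to distinct shares
    (computed here with a simple augmenting-path matcher)."""
    share_to_server = {}

    def try_assign(server, visited):
        for sh in sharemap[server]:
            if sh in visited:
                continue
            visited.add(sh)
            owner = share_to_server.get(sh)
            if owner is None or try_assign(owner, visited):
                share_to_server[sh] = server
                return True
        return False

    return sum(1 for server in sharemap if try_assign(server, set()))

def is_healthy(sharemap, happy=7):
    """Healthy == at least `happy` distinct servers can each be credited
    with a distinct share, so any K of them suffice to recover the file."""
    return servers_of_happiness(sharemap) >= happy

# One server holding every share is not "happy":
assert servers_of_happiness({"s1": {0, 1, 2, 3}}) == 1
assert not is_healthy({"s1": set(range(10))}, happy=7)
```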
Kevan: if you are interested, you could investigate whether this is such an easy change to make that we could squeeze it into the v1.7 release.
(I doubt it.)
The checker/verifier would be straightforward to modify. The repairer works by downloading and re-uploading the file, regardless of how healthy it is (relying on the immutable file upload code to not do more work than it needs to), and doesn't explicitly consider file health in doing that. However, from what I can tell, you can't repair an immutable file without first checking its health, and the immutable filenode code won't repair a file unless the Checker/Verifier says that it is unhealthy, so it is probably enough to just change the definition of health in the Checker/Verifier.
I might have some time later this week to work on this, but getting it implemented correctly and clearly, testing it thoroughly, and having adequate code review by the 23rd might be optimistic.
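To make the flow described above concrete, here is a hedged sketch of how a happiness-aware check-and-repair could hang together, reusing the servers_of_happiness helper sketched earlier. The class and method names are illustrative, not the actual filenode API:

```python
# Illustrative sketch: repair is only triggered when the checker deems the
# file unhealthy, so redefining "healthy" in the checker/verifier is enough
# to change when repairs happen.

class CheckAndRepairSketch:
    def __init__(self, checker, repairer, happy=7):
        self.checker = checker      # can locate the file's shares on the grid
        self.repairer = repairer    # repairs by downloading and re-uploading
        self.happy = happy          # servers-of-happiness threshold

    def check_and_repair(self):
        sharemap = self.checker.locate_shares()
        healthy = servers_of_happiness(sharemap) >= self.happy
        if not healthy:
            # The repairer itself does not reason about health; it just
            # re-uploads and lets the uploader place as many shares as it can.
            self.repairer.download_and_reupload()
        return healthy
```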
Yeah, let's plan to finish this up in v1.8.0.
I'm just wondering if not doing this for v1.7.0 means there is a regression in v1.7.0. I don't think so, but I thought I should write down my notes here to be sure. The notion of whether a file is "healthy" according to checker-verifier and the notion of whether an upload was "successful" according to uploader-repairer differ (this has also been true in earlier releases, ever since the repairer was originally implemented).
There are two ways the difference could manifest:
For a while I was worried about the second case as a potential regression in v1.7.0, because the new uploader (and therefore repairer) has more stringent requirements for what constitutes a successful upload than the v1.6 one did. I imagined that maybe every time someone's checker-repairer process ran, the checker would report "This is not healthy", and then the repairer would run and do a lot of work but leave the file in such a state that the next time the checker looked at it, the checker would still consider it to be unhealthy.

However, I don't believe this is a risk for v1.7, because the new uploader (repairer), while it could be satisfied with fewer shares than the checker would require, actually goes ahead and uploads all shares if possible, which is enough to satisfy the v1.7 checker. In fact, this is the same pattern as the old v1.6 uploader, which would be satisfied with only shares-of-happiness shares being available (which is not enough to satisfy a checker/verifier) but goes ahead and uploads all N shares normally.

So there's no problem, but I thought I should write all this down just in case someone else detects a flaw in my thinking. Also, if we ever implement #946 (upload should succeed as soon as the servers-of-happiness criterion is met) then we'll have to revisit this issue!
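As a concrete (hypothetical) instance of the reasoning above, with the default 3-of-10 encoding, a happiness threshold of 7, and assuming the v1.7 checker calls a file healthy only when all N shares are present:

```python
# Hypothetical numbers illustrating why there is no check-repair-check loop:
# the repairer's uploader is satisfied once the happiness threshold is met,
# but it still tries to place every share, which also satisfies the checker.
K, HAPPY, N = 3, 7, 10

shares_placed_by_repair = N                                 # all shares re-uploaded
upload_reported_success = shares_placed_by_repair >= HAPPY  # 10 >= 7
checker_satisfied = shares_placed_by_repair >= N            # 10 >= 10

assert upload_reported_success and checker_satisfied
```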
In changeset:31f66c5470641890:
See also: #1212, where we discuss and ultimately agree (I think?) on how a happiness-aware repairer/checker ought to work.
not making it into 1.9
There was discussion of this issue on tahoe-dev: [//pipermail/tahoe-dev/2013-March/008091.html]
I have started working on this ticket. You can view my progress here: https://github.com/markberger/tahoe-lafs/tree/614-happy-is-healthy
As of right now I've only written a couple of unit tests that should pass before this ticket is closed. If someone could glance over them to make sure they're implemented correctly, I would really appreciate it. The solution to this problem will probably be a little hacky, because h is not stored in the verify cap while k and n are. Whoever is working on the new cap design might want to consider storing h in the verify cap if servers-of-happiness is going to be used with the new caps.
markberger: great!
I wouldn't call it "hacky". Just say that the checker/verifier/repairer is judging whether a given file is happy according to its current user. Different users may have different happiness requirements for the same file when they run the checker/verifier/repairer. It is up to the person running the checker/verifier/repairer, not up to the person who originally uploaded the file, to decide what constitutes happiness for them.
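For context, the happiness threshold being applied here is a per-client setting, so (as I understand it) it would come from the local node configuration of whoever runs the checker/verifier/repairer rather than from the cap itself, along these lines:

```ini
# tahoe.cfg of the node running the checker/verifier/repairer
# (the values shown are the defaults discussed in this ticket)
[client]
shares.needed = 3    # K
shares.happy = 7     # servers-of-happiness this user requires
shares.total = 10    # N
```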
After thinking about this ticket, I believe that it should be blocked by #1057 (use servers-of-happiness for mutable files). Continuing to diverge the behavior of immutable and mutable files only creates more trouble for users. Also, it doesn't make sense for an upload to succeed and then have the checker deem the freshly uploaded file unhealthy.
My github branch has a working patch for immutable files (some tests need to be altered because repair does not occur when there are 9 out of 10 shares) but I am going to put this ticket on hold and focus on #1057.
Let's revisit this ticket after we land #1382.