don't place shares on servers that already have shares #2107
Labels
No Label
0.2.0
0.3.0
0.4.0
0.5.0
0.5.1
0.6.0
0.6.1
0.7.0
0.8.0
0.9.0
1.0.0
1.1.0
1.10.0
1.10.1
1.10.2
1.10a2
1.11.0
1.12.0
1.12.1
1.13.0
1.14.0
1.15.0
1.15.1
1.2.0
1.3.0
1.4.1
1.5.0
1.6.0
1.6.1
1.7.0
1.7.1
1.7β
1.8.0
1.8.1
1.8.2
1.8.3
1.8β
1.9.0
1.9.0-s3branch
1.9.0a1
1.9.0a2
1.9.0b1
1.9.1
1.9.2
1.9.2a1
LeastAuthority.com automation
blocker
cannot reproduce
cloud-branch
code
code-dirnodes
code-encoding
code-frontend
code-frontend-cli
code-frontend-ftp-sftp
code-frontend-magic-folder
code-frontend-web
code-mutable
code-network
code-nodeadmin
code-peerselection
code-storage
contrib
critical
defect
dev-infrastructure
documentation
duplicate
enhancement
fixed
invalid
major
minor
n/a
normal
operational
packaging
somebody else's problem
supercritical
task
trivial
unknown
was already fixed
website
wontfix
worksforme
No Milestone
No Assignees
3 Participants
Notifications
Due Date
No due date set.
Reference: tahoe-lafs/trac-2024-07-25#2107
Loading…
Reference in New Issue
No description provided.
Delete Branch "%!s(<nil>)"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
I think we should change it so that the Upload Strategy doesn't bother uploading a share to a server if that server already has a share of that file. I used to think it was worth uploading "extra" shares to a server that already had a share, because there are some (probably rare) cases where it can help with file recovery, and it never hurts file recovery, but I've changed my mind.
My main reason for changing my mind about this is that Diego "sickness" Righi is confused and dismayed by this behavior (e.g. #2106), and if a feature confuses and dismays users, then it is probably not worth it. Consider also Kevan Carstensen's master's thesis on Upload Strategy Of Happiness, which says:
That's a pretty good argument! Especially the part about "You're paying for that space, and if you upload 2 or 3 shares to one server, you're paying 2 or 3 times as much, but not getting much fault-tolerance in return for it.". See also this post on tahoe-dev.
I would add that if a user is diagnosing the state of their grid, or reasoning about possible future behavior of their grid, it will be more intuitive and easier for them (at least for many people) to think about if they can assume that shares will never be intentionally doubled-up.
I guess another reason to make this change is that Kevan's master's thesis assumes it (see the quote in the ticket description), so making this change would make our implementation match his analysis more precisely.
I think you're misinterpreting the thesis. It says not to put a given share on the same server if it already exists there; not that upload or repair can't create shares that are duplicates of existing shares but on different servers that already have (other) shares. The latter is clearly necessary to achieve happiness. (Proof: suppose that one server has all shnums, the remaining servers have shnum 0, and shares.happy > 2.)
To be clear, I think you're misinterpreting the word "store" in the first sentence of the thesis quote. It means "place on a given server", not "end up with on a given server".
Replying to daira:
Good point. Here are some possible rules:
• [rule]bad: Don't put a share on a server if that server already has a different share. Sometimes, as Daira points out, we would have to break that rule in order to achieve Servers-of-Happiness.
• [rule]bad: Don't put a share on a server if that share is already on another server. The same example Daira gave makes it so we would have to break this rule in order to achieve Servers-of-Happiness.
• [rule?]good: Don't put a share on a server unless doing so would increase the Servers-of-Happiness-Level.
• [rule?]good: Don't put a share on a server unless doing so is a necessary placement in an upload placement that would reach the required Servers-of-Happiness-Level
H
.Replying to zooko:
[numbering for ease of reference]added
I'm confused as to why we're trying to redesign this now. Doesn't the algorithm based on Kevan's thesis that Mark has already implemented (modulo some minor details) already satisfy 3?
Rule 4 seems ambiguous if "necessary" means "necessary to achieve happiness". The problem is that there can be multiple ways to achieve happiness using different subsets of a given share placement, and none of these may be "necessary" given that the others exist.
Okay, so can we all agree on Rule 3?
Rule 3: "Don't put a share on a server unless doing so would increase the Servers-of-Happiness-Level."
Well, no, that doesn't make sense either. Because suppose you have K=3, N=6 and there are 6 servers. Well, putting a share on the first server will not increase the happiness level! It will be 0 both before and after placing that one share.
So is there a rule which satisfies sickness's (and now my) intuition that it is confusing and usually-not-useful to add shares to a server that already has shares?
Replying to [zooko]comment:7:
No, it will be 1 after placing the share.
It was intentional that the definition of happiness distinguishes between happiness levels less than K, even though none of them are sufficient for the file to be recoverable (this was an advantage of the maximum matching formulation over the "number of servers, the survival and correct function of which will guarantee that your file is available" definition that preceded it; see /tahoe-lafs/trac-2024-07-25/issues/5840#comment:162).
Replying to [daira]comment:8:
Wait, what? I thought "Happiness" was defined as "The size of the largest set of servers such that any K-sized subset of it can recover your file". That number is 0 before and after uploading 1 share of a K=3 file. What is the definition of "Happiness"?
Servers of happiness is not a boolean value. It is a measurement of how effectively shares are distributed. For each distinctive server with any nonzero number of shares, the servers of happiness value increases by 1. In terms of how we model the problem, the servers of happiness value is len(E) where E is the set of edges used in the maximum matching bipartite graph of shares to servers.
In terms of your example zooko, when you place a share on one server, you are constructing an edge between that share and that server in the bipartite graph. Therefore, the servers of happiness value is 1.
However, we do use a boolean value to indicate whether a file is healthy. We should measure this value by h_actual >= h_min where h_actual is the happiness value of the share distribution and h_min is the minimum happiness level specified by the user. However, this hasn't been implemented yet (see ticket #614).
h_max is defined as min(len(shares), len(servers)) where shares and servers are sets. This is because we will always be limited by the smallest set in a maximum matching bipartite graph.
As per (@@https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2123#comment:93921@@) I've done a writeup at #2124 with a feature proposal that might be of interest here.
Thanks, markberger!
Replying to markberger:
Hm, isn't this true only if that share isn't already provided by another server? Or more precisely true only if some combination of the servers already holding shares wouldn't be able to provide the file in just as many cases without the addition of this new share?
Okay, so I think what this means is that sickness and amontero and me would be satisfied with Rule 3:
Rule 3: "Don't put a share on a server unless doing so would increase the Servers-of-Happiness-Level."
Does everyone agree that this is a good rule to enforce on an uploader or repairer? markberger? daira? sickness? amontero? brian?
rule 3 is ok to me too! :)
P.S.: that said I would like to stress my previous explanation ticket https://tahoe-lafs.org/trac/tahoe-lafs/ticket/2106 with another even more simple example:
let's assume we have a 2 servers "RAIC" setup which mimics the classic 2 local disks RAID1 setup
we want tahoe-lafs to write 1 share of any file on both servers so in case one of them dies, the other holds all the necessary shares to recover the file, something like 1:2:2
now what if a server dies, or is unreachable, and we write BOTH shares on the only one remaining server?
current uploader would be happy, and when the broken/gone server will be eventually back, current repairer wouldn't create a share on it because the other server already has 2 shares...
...ok now what if the server which holds both shares dies? :/
Replying to sickness:
Right. Good, simple example. Rule 3 would prevent the uploader or repairer from putting both shares on one server.
Replying to [zooko]comment:13:
Yes, due to the nature of a maximum matching bipartite graph, the servers of happiness value will only increase for shares that have not been placed somewhere else. And yes, when a share is placed on a server that already contains some nonzero number of shares, the servers of happiness value does not increase.
Replying to [zooko]comment:13:
Looks like to me it would fulfill the needs of my scenario of "not wasting space in any storage node (waste=shares >k)" (#2123).
Also, sickness' RAIC 1:2:2 example looks very appealing to me, with one question: would this modification allow you to add a 3rd. hot-spare node that would get its own share when repair is run? (ex. before decomissioning a soon-to-fail node)
In other words: would this place shares above N's limit on new nodes not having none?
Replying to [amontero]comment:17:
This would not. If you want that, then you should probably set N higher in the first place. E.g. you could have two servers, K=1, H=2, and set N=3. Then whenever the uploader or repairer runs, it puts one share on each of the two servers and considers itself to have succeeded. If you attach a third server, then it would attempt to upload a 3rd share to that 3rd server.
Replying to [zooko]comment:18:
Actually it will also put the 3rd share on one of the two servers.
In that case repair would put another copy of the 3rd share on the 3rd server, yes. When #1816 is fixed, only 3 shares would have their leases renewed (the redundant copy of the 3rd share on one of the first two servers would not).
Replying to [daira]comment:19:
Not if Rule 3 from this ticket (#2107) is implemented in the uploader!
Replying to [zooko]comment:20:
Huh. Well then I don't agree with Rule 3. I thought we'd already agreed that once all servers have at least one share, any excess shares up to
shares.total
should also be distributed fairly evenly between the available servers?Replying to [daira]comment:21:
No! We have certainly not all agreed to that. That is the opposite of what sickness and amontero, and to a lesser extend I myself want, which is the topic of this ticket. They want, if I understand correctly, that once all servers have at least one share, then any excess shares do not get uploaded to any servers.
(I've been too busy to pay attention, so apologies if me throwing out a random comment is not helpful.)
It seems clear that "servers of happiness" is an approximation to a richer property that cannot be described so simply. So a rule written in terms of SoH is going to be an approximation to the correct behavior.
The real question is maximizing the ability to retrieve the file, given some assumed probability distribution of a) a server losing a share, but staying up and b) a server going away, traded off against storage cost somehow. In particular, I'm not sure how often a) happens, and whether we want to support the notion of assigning different probabilities to different servers, or correlated probabilities. So two placements with the same SoH can still have different probabilities, and the higher one is still preferred.
A key issue is when the number of servers is less than the number of shares, vs when it's more. I suspect that behaviors are quite different then. We've been talking about that, but perhaps that great divide should be more front and center in the discussion. It's also not reasonable to expect people to tweak encoding as the number of servers changes. It should be sane to run 3/10 all the time, and just place more shares if there are 4 servers.
It seems obvious to me that better balance is better. But what about S=4, 3/10 encoding. Is 1,3,3,3 really worse than 2,3,2,3? Or is it better, because if 3 out of 4 are lost, there are 3 ways to win and one to lose, vs 2 and 2 with 2,3,2,3? So it seems that regardless of probabilities, 1,3,3,3 is more robust. I'm not sure how to generalize this, except that for k=3 one should not put more than k until all servers have k.
@zooko: So for 3/10, 4 servers, do you really want to stop with 1 share on each server? That seems gratuitously risky.
(I certainly agree that with 3/10 and S>10 servers, one wants 1 share on each of 10 servers.)
My vote (if I get one) is to leave this as it currently is. My first reaction when learning about erasure-coding was that Shares Needed and Total Shares were set in stone. I have always felt that !Share/Servers of Happiness was more of just a suggestion.
When I first started using the public test grid, I set several files to intentionally have more Shares Needed and Total Shares than there were total nodes in the grid. I figured that as the grid grew, I would be able to re-balance these shares to gain better redundancy.
Replying to PRabahy:
Wait, I'm sorry but I don't know what that means. The whole topic is dizzyingly confusing to me. It's amazing to me that after, what 14 years of working with wide-area decentralized erasure-coding, I can't even formulate a rigorous statement of what I want.
PRabahy: what do you think of "Rule 3"?
Where "Servers-of-Happiness-Level" is defined as:
(I think. Did I get that definition right??)
gdt: what do you think of Rule 3?
My guess is that none of the following people even understand what Servers of Happiness or Rule 3 mean: PRabahy, sickness, amontero, gdt, zooko, warner. That leaves daira, markberger, and Kevin Carstensen as the only people on the planet who understand what Servers of Happiness even means.
This is distressing to me, because of the notion that a thing isn't good enough just by providing good behavior, but in addition to that it also has to be understandable and predictable behavior to users. I'm currently thinking that no decentralized erasure-coding system can do that. Perhaps The only solution is to strip erasure-coding out of a future version of LAFS and make K always = 1 (i.e. straight replication). ;-)
So no, I did not really follow the s-o-h definition. I don't see why a k-sized subset is the right metric, if K is the shares-needed (e.g. 3 in 3-of-10 coding). If I have 4 servers and 3/10 encoding, then I certainly want any 3 servers to recover, but I want more than that, because losing 2 of 4 is not so unthinkable. What I want is to minimize the probability that the file will be unrecoverable, given some arguably reasonable assumptions, and aubject to not trying to store any particular share on more than one server, on the theory that such duplication is wasteful and should be done by changing the coding instead.
For S >> N, I see why s-o-h makes sense; it essentially measures the number of servers that have a share, but not quite. But I don't see how s-o-h strictly relates to probability of success (even assuming equal probability of node failure). And probability of success is what we should care about, not some intermediate metric that is hard to understand how it relates to what we care about.
Here's why I'm opposed to implementing this ticket now: The behaviour of spreading out shares between available servers in a round-robin fashion is very long-standing. That's the behaviour expected and possibly relied on by people who have chosen their encoding parameters with the current placement algorithm in mind. #1382 was supposed to make the minimal change to the placement algorithm necessary to achieve happiness in cases where #778 caused regressions. If we want to do something else, let's decide on that after getting feedback from release 1.11.
Replying to [zooko]comment:26:
(Most of the following echos what gdt said.)
For example, I have a file encoded 2of5 and a grid with 2 nodes. Once each node has a share, the file is recoverable and the "Servers-of-Happiness-Level" cannot increase any further. I believe that each node should receive 2 shares distinct shares and the 5th should be discarded. This will allow the file to be recovered from either node but not waste any space on the grid. If a 3rd node comes online and a repair is preformed, the 5th share should be regenerated and placed on the 3rd node (both original nodes keep their same shares). If a 4th node comes online and a repair is preformed, one of the original nodes should transfer a share to the new node.
Hopefully this is clear. I know how confusing matching intuition with formal definitions can be.
Replying to [zooko]comment:26:
Not exactly. Suppose that there is no K-sized subset of servers that can recover the file; then there is no such largest set, and so this wording would not give a well-defined result.
The correct definition is:
Servers-of-Happiness-Level: the size of a maximum matching in the bipartite graph relating servers to shares that they hold.
or equivalently:
Servers-of-Happiness-Level: the maximum number of links that can be drawn from servers to shares that they hold, such that no server and no share appears more than once.
The original definition could instead be repaired by setting the result to 0 when there is no K-sized subset that can recover the file, but that would be less useful. The difference doesn't matter (since the definitions coincide) for deciding when an upload or repair meets a happiness threshold
shares.happy
>= K, but it does matter for defining intermediate steps of a placement algorithm.Replying to [daira]comment:30:
I'd like to come up with a succinct specification of this which is understandable to people who don't know the phrases "matching" and "bipartite graph".
Closer! How about:
Servers-of-Happiness-Level: how many "server↔share" pairs are there, not counting any server more than once and not counting any share more than once.
Replying to [zooko]comment:31:
…following-up to my own post…
Oh, to be more precise, we have to specify that the Servers-of-Happiness-Level is the size of the largest such set of "server↔share" pairs. There is only one such set of "server↔share" pairs in the "ideal" case (e.g. number of servers >
N
, no server disconnects before an upload and then reconnects after an upload, etc.), but there could be different ways to choose "server↔share" pairs in non-ideal cases, such as:To determine what the Servers-of-Happiness-Level of this situation is, you have to pick a subset of the pairs so that the resulting subset doesn't have any server appearing more than once or any share appearing more than once. Therefore the Servers-of-Happiness-Level of this situation is
2
.So, this ticket (#2107) is about an uploader or repairer choosing not to upload a share to any server, if uploading that share would not increase the Servers-of-Happiness-Level. For example, in the case shown above, if the uploader is considering uploading share 0, it could do that an increase the Servers-of-Happiness-Level from
2
to3
:But once it has done that, if it is then considering uploading share 3, it cannot increase the Servers-of-Happiness-Level by uploading share 3, so if #2107 is accepted, it will not upload share 3 at all.
If I understand correctly, sickness, amontero, and recent-zooko are in favor of #2017 — do not upload a share if doing so does not increase the Servers-of-Happiness-Level — and gdt and previous-zooko are in favor of the opposite — go ahead and upload an extra share even if it doesn't increase Servers-of-Happiness-Level. I think PRabahy's comment:93944 is suggesting a different metric (which doesn't yet have a name), and in order to achieve that metric you have to upload shares even when uploading them doesn't increase the Servers-of-Happiness-Level.
Daira's comment:93943 is pretty persuasive to me: let's not do ticket #2107 until after Tahoe-LAFS v1.11 is released, on the basis of not changing multiple things at a time unnecessarily.
Servers-of-Happiness-Level: the size of the largest set of
server↔share
pairs, such that no server appears more than once in the set and no share appears more than once in the set.Whatever we ultimately decide about this, we're going to release Tahoe-LAFS v1.11 before we do anything!