shares.happy is the wrong name of the measure #1092
Reference: tahoe-lafs/trac-2024-07-25#1092
There is a configuration option named shares.happy which is how you control the servers-of-happiness value. It is mis-named! It should be named servers.happy. Of course, it belongs right next to shares.needed and shares.total, but hopefully placement and docs can make their intimate relationship clear. Also, shares.needed serves double-duty; it means two different things. Maybe that name should also be changed, or at least documented even more carefully.
Assigning to Brian. The next step on this ticket is for Brian to study the new servers-of-happiness feature (#778) and let us know what he thinks about it, both in general and in regard to this specific issue.
I'm attaching a patch that changes shares.happy to servers.happy. The client now ignores shares.happy, since it doesn't make a lot of sense to use shares.happy for servers.happy, given the differences between the two robustness metrics. Should we make the startup code print a warning if it doesn't find a servers.happy but does find a shares.happy?

I've defined servers.happy with a default value of 1; this means that servers-of-happiness checks will be disabled for nodes without a servers.happy directive in their tahoe.cfg (including the result of tahoe create-node).

I don't think there's a particularly convincing argument for leaving the default at 7; probably the only good it is doing is forcing people to reason about their grid when they have to go in and edit tahoe.cfg because their uploads fail because their "Hello, world!" grid isn't big enough to satisfy servers.happy=7. There are probably friendlier ways to do that :-). I'm open to being convinced of a value that isn't 1, but I think there's something to be said for giving users the information they need to set the value sensibly and staying out of their way until they do.

(I don't have a clear opinion yet on shares.needed, since I hadn't thought about it until I read the ticket this morning.)

Attachment 1092.dpatch (8527 bytes) added
-1 on servers.happy.
If we're going to change, I think it would be good to also pick a different word than happy. There's an important concept lurking under a seemingly flippant word.
What's really going on is that this single variable is a rough first cut at ensuring that there is adequate redundancy based on some policy and some knowledge of physical and administrative correlation among servers. I see the 3/7/10 values as very closely linked, and changing shares to servers makes that less clear.
I do agree that shares.happy gives the wrong impression. So I'll suggest "shares.independent", with the meaning being "the minimum number of shares that must be on independent servers". I think that's what is meant, and this keeps the parallelism of shares.* and clarifies this variable. One could have shares.independent and shares.independent-target, but I'm not sure independent-target needs to be different from total.
The current ordering gives the impression that shares.needed and shares.total are more independent than they are. So perhaps "shares.coding = (3, 10)" would be better than two variables. (I am under the impression that I can't just set shares.total to 12 and reconstruct those missing sh10, sh11 without having to recode the entire file; if I'm confused on that point this paragraph is invalid.)
3/7/10 seems reasonable, and I've been using 2/5/7. I don't think it makes sense to talk about the right value of shares.independent/shares.happy without considering the whole 3-tuple.
Thinking about kevan's comments on the default, I think there are two use cases: setting up a single node with storage to play with tahoe for the very first time, and actually wanting to store bits. 1 is definitely not a good value for actual use. So perhaps there should be "tahoe create-test-node" that has encoding parameters set up for demo use, where the node is client, server, and introducer. Then create-node can be tuned for real use.
Replying to kevan:
A value of 1 means that at least one share has been placed (it is vacuously true that it is on an independent server). This isn't sufficient for the file to be retrievable.
We should probably require that at least k shares are placed in order for an upload or repair to succeed, regardless of the happiness threshold. In that case happiness thresholds less than k would make more sense.

Independently of that, I don't think that 1 is a sensible default. Even for a toy grid that is only being created for someone to see that Tahoe works, it's not unreasonable to require at least two servers. If the happiness threshold is 1, then even if there are no other servers, uploads will succeed by putting shares on the gateway, provided it has sufficient space. I don't think they should succeed (by default) in that case.
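The two criteria being discussed can be sketched as a combined check. This is a simplification for illustration only: real servers-of-happiness is a bipartite-matching computation, whereas happiness here is approximated as the count of distinct servers holding at least one share; upload_ok is a hypothetical name.

```python
# Illustrative check: an upload succeeds only if at least k shares were
# placed (so the file is retrievable at all) AND the happiness threshold
# h is met (approximated here by counting distinct servers used).
def upload_ok(placements, k, h):
    """placements: dict mapping share number -> server id."""
    placed_shares = len(placements)
    happiness = len(set(placements.values()))  # simplified stand-in
    return placed_shares >= k and happiness >= h

# 3 shares on 2 servers: enough shares for k=3, but fails h=3.
print(upload_ok({0: "A", 1: "A", 2: "B"}, k=3, h=3))  # False
# 3 shares spread over 3 servers: satisfies both criteria.
print(upload_ok({0: "A", 1: "B", 2: "C"}, k=3, h=3))  # True
```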
You know, I actually kinda like servers.happy=1, probably because I still
haven't internalized the whole bijective-mapping-of-servers concept yet. (I
mean, I know what's going on, yet each time that error appears, I walk away
in confusion because the text of the error message is so hard to follow, so
it leaves a general taste in my mouth that the whole idea is bad, even though
I know it's not really that bad)
Kevan's arguments in the first comment are spot on. "forcing people to reason
about their grid" needs to happen in a friendlier place than the error
message.
gdt's comment about the flippant use of "happy" is accurate too. I originally
picked that for shares-of-happiness because it was a somewhat arbitrary
threshold applied in a very narrow and probably-rare error case (you've
connected to enough servers at the start of the upload, but then some were
lost by the time you finished... do you still declare success? are you still
happy?)
(you're correct: you can't go from 3-of-10 to 3-of-12 without reencoding the
whole file. raw zfec would treat them the same, but the share-hash-trees that
tahoe adds for integrity checking would be different, so we fold both k and N
into the CHK hash, so you'll get an entirely different encryption key and
share data anyways)
Yeah, combining two tahoe.cfg directives into one might be a good idea. In
fact, it should be phrased the same way we talk about it in English:
[client]
shares.encoding = 3-of-10
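A minimal sketch of how such a combined directive could be parsed, assuming the "k-of-N" syntax proposed above (shares.encoding is not an existing Tahoe option, and parse_encoding is a hypothetical helper):

```python
# Hypothetical parser for a combined "k-of-N" encoding directive,
# e.g. shares.encoding = 3-of-10.
import re

def parse_encoding(value):
    m = re.fullmatch(r"\s*(\d+)-of-(\d+)\s*", value)
    if m is None:
        raise ValueError("expected 'k-of-N', got %r" % value)
    k, n = int(m.group(1)), int(m.group(2))
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= N, got k=%d, N=%d" % (k, n))
    return k, n

print(parse_encoding("3-of-10"))  # (3, 10)
```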
I get the impression that this issue is more about "servers" than about
"shares", so I wonder if maybe it ought to be "servers.independent". I know
the math touches both, but I'd like to give users the ability to learn how
this works in chunks, where the first chunk is only about shares ("3-of-10, I
need 3 distinct shares, doesn't matter where they come from, ok, got it"),
and then a later chunk is about where those shares are placed ("oh, right,
what happens if there aren't enough servers?"). Maybe, if all the "shares."
configuration fit into the first chunk, then all the controls that involve
servers (even though they also involve shares) could be put into a different
namespace and support the user's concept of a second chunk of things to
learn. "servers." might support that.
I'm still undecided about what the default "use-case" ought to be. I think
it's vital that folks be able to bring up a small grid and test it out. I
also think it's important to protect "tahoe backup" users against the trivial
case where you're only putting shares on yourself. Maybe what I'm really
wishing for is better #467 explicit-server-selection code and UI. Maybe I'm
coming around to the idea that diversity trumps write-availability: if you
have some way of configuring (or at least acknowledging) who you're
supposed to connect to, then you could fail writes unless all those servers
were present. Maybe a set of checkboxes on the known-servers web page,
meaning "don't allow uploads to succeed unless this server is present". Maybe
I'm balking at simple integer success criteria because I don't see it as
being easy for a user (or me) to understand what it means, whereas a list of
required serverids is pretty straightforward.
But I'm hesitant on the explicit serverlist too, because of how it'd not work
so well in very dynamic grids, and how it kind of needs constant attention
and decision making by the user.
Hm. I'll think about the checkboxes idea more, I kinda like it.
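The checkbox idea above amounts to a very simple success criterion. A sketch, with illustrative names only (the required set would come from checkboxes on the known-servers page; nothing here is existing Tahoe code):

```python
# Sketch of the required-servers criterion: an upload only succeeds if
# every server the user marked as required received at least one share.
def upload_satisfies_required(placements, required):
    """placements: dict share number -> server id; required: set of ids."""
    used = set(placements.values())
    return required <= used  # every required server must appear

print(upload_satisfies_required({0: "A", 1: "B"}, {"A"}))       # True
print(upload_satisfies_required({0: "A", 1: "A"}, {"A", "B"}))  # False
```

Unlike an integer threshold, the failure mode here is easy to explain: name the required servers that got no shares.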
Replying to zooko:
This is wrong.
shares.needed only ever refers to a number of shares. Those shares can be served from any number of servers (which necessarily is between 1 and shares.needed inclusive, but that's a logical requirement rather than an additional criterion imposed by the upload/download/repair algorithms).

Milestone renamed
moving most tickets from 1.12 to 1.13 so we can release 1.12 with magic-folders
Moving open issues out of closed milestones.
Ticket retargeted after milestone closed