good handling of small numbers of servers, or strange choice of servers #213
Reference: tahoe-lafs/trac-2024-07-25#213
Suppose you try to upload something when you are on an airplane and you are
completely disconnected from all of your servers other than the one that you
yourself are running.
option 1. fail
option 2. silently upload all M shares to yourself
option 3. be transparent about this -- show the user what is
happening, and give them a knob to control how far they can rely
on themselves alone to store things
option 4. have a "rebalancing" operation in which data which is stored on a
"skewed" set of servers (such as too few servers, or on servers which are
less well placed on the unit circle) gets moved to a more appropriate set
option 5. be transparent about that, too
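The "knob" from option 3 eventually took roughly this shape: Tahoe-LAFS exposes its share-count parameters in tahoe.cfg, and the "servers of happiness" threshold (the #778 work referenced at the end of this ticket) makes an upload fail rather than silently pile every share onto too few servers. A sketch, assuming the usual 3-of-10 encoding:

```ini
[client]
# Erasure-coding parameters: any 3 of the 10 shares recover the file.
shares.needed = 3
shares.total = 10
# Minimum number of distinct servers an upload must place shares on to
# count as a success; with fewer connected servers the upload errors
# out instead of silently uploading all shares to yourself (option 2).
shares.happy = 7
```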
See also ticket #232 -- "peer selection doesn't rebalance shares on overwrite of mutable file".
I think that silent rebalancing is going to be an important user-friendly feature. Part of the repairer's job will be to make sure the shares are distributed across a healthy set of peers, since that falls under the title of "improving the health of the file".
Providing an interface that lets the user see where their file got put is good and useful, but I don't want users to be obligated to use or pay attention to it: the abstraction of "the grid is a big place where my files go" is a valuable one, and forcing abstraction-boundary breaks adds to the user's cognitive load.
Perhaps the upload button should have a flag next to it saying "Warning: we're only connected to N peers right now, so you won't get the reliability you might expect; please consider waiting until you have more peers available."
I think the general principle here is that we've (well, at least I've) been designing tahoe with a static set of peers in mind: the membership of the grid changes slowly over time. Uploading a file while you're on an airplane and then connecting to a larger grid violates this expectation.
As you may know, I question the value of the unbroken abstraction of "the grid is a big place where files go". I question it specifically because the cost of making it an unbroken abstraction seems high and potentially very high. On the other hand, it seems quite useful as a partial abstraction. "The grid is a big place where files go, except when it isn't for one of the following reasons..."
We don't have to agree right now on how valuable this abstraction is -- let's just agree to keep an open mind about these issues. Certainly for the two use cases that we have in mind -- the managed proprietary grid operated by sysadmins, and the friendnet -- the user (who is the sysadmin in the former case, I think), is expected to understand and monitor the state of the set of peers during normal usage.
If you mean that you aren't supposed to upload a file while you are on an airplane, and then later connect to a larger grid, because you understand that the set of servers you will be uploading to when you are on the airplane is too small, then I agree.
If you mean that people shouldn't use tahoe on machines that travel on airplanes, I'm not sure what I think about that. Certainly such portable machines should fit into the friendnet case, right? Also in the managed proprietary grid case, I should think that our semantics ought to specify some safe/useful/communicative behavior in the case that there are few servers.
So after some discussions today on my long-term use-case, it would seem that the same functionality set could solve this, #398, and #467. Let me explain:
I control a small number of nodes-- let's say four. I want to be able to tell my uploads that they should always leave four shares on the four nodes I own, and send the remaining six to the grid. That way, if I'm offline with only my four nodes for company, I can still use my files; similarly, when I go offline, people with access can also use my files.
In this case, I might also want to be able to configure the use of helpers etc. on a per-subnet basis; that is, "use the helper unless the node you're pushing to is on my LAN, in which case, it's silly."
Ideally I could also set up a modified rebalancer that says "make four shares and put them on my local grid subset," but that's secondary.
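The placement policy described above -- pin four shares to the nodes I own, send the remaining six to the grid -- could be sketched like this. This is purely an illustrative helper, not the actual Tahoe-LAFS peer-selection code:

```python
# Sketch of a share-placement policy that pins one share to each node
# the user owns, then round-robins the remaining shares over the grid.

def place_shares(total_shares, owned_nodes, grid_nodes):
    """Return a dict mapping node name -> list of share numbers."""
    placement = {}
    share = 0
    # First, one share per owned node, so the files stay usable offline.
    for node in owned_nodes:
        if share >= total_shares:
            break
        placement.setdefault(node, []).append(share)
        share += 1
    # Then spread the rest across the wider grid.
    for i, s in enumerate(range(share, total_shares)):
        node = grid_nodes[i % len(grid_nodes)]
        placement.setdefault(node, []).append(s)
    return placement

# Four owned nodes get shares 0-3; the other six shares go to the grid,
# so 3-of-10 decoding works from either side of the split.
p = place_shares(10, ["me1", "me2", "me3", "me4"], ["g1", "g2", "g3"])
```
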
There's some good discussion in this ticket, but I think all of the changes we might make are covered by #778, #398, and #467. I'm closing this one as a duplicate and putting a reference to this one into #398 and #467.