upload should take better advantage of existing shares #610
Our current upload process (which is nearly the oldest code in the entire
tahoe tree) could be smarter in the presence of existing shares. If a file is
uploaded in January, then a few dozen servers are added in February, then in
March it is (for whatever reason) uploaded again, here's what currently
happens:
* the new permuted server list will have the same partial ordering as the original list, but with the new servers inserted in various pseudo-random places
* the upload process walks through the permuted list, asking each server in turn to hold on to the next sequentially numbered share
* each server's response includes a list of shares that it might already have
* the upload process never asks a server to hold a share that it has already found a home for, but it also never unasks a server to hold a share that it later learns is housed somewhere else
So, if the client queries a server which already has a share, that server
will probably end up with two shares. In addition, many shares will probably
end up being sent to a new server even though some other server (later in the
permuted list) already has a copy.
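To make the failure mode concrete, here is a minimal sketch of the walk described above. It is not the actual Tahoe-LAFS uploader: the server object, its ask_to_hold() method, and the shape of its reply are hypothetical stand-ins for the real allocate_buckets()-style exchange.

```python
# Hypothetical sketch of the current behaviour described above; not the real
# Tahoe-LAFS uploader, just an illustration of how duplicate shares arise.

def naive_placement(permuted_servers, total_shares):
    placements = {}        # share number -> server asked to hold it
    known_elsewhere = {}   # share number -> server that already holds a copy
    next_share = 0
    for server in permuted_servers:
        if next_share >= total_shares:
            break
        # Ask the server to hold the next sequentially numbered share.  Its
        # reply also lists shares it already has (e.g. from the January
        # upload), much like an allocate_buckets()-style response.
        accepted, existing_shares = server.ask_to_hold(next_share)
        for shnum in existing_shares:
            known_elsewhere.setdefault(shnum, server)
        if accepted:
            placements[next_share] = server
            next_share += 1
        # The loop never revisits earlier decisions: a request already made
        # for a share is never cancelled when we later learn that the share
        # is housed somewhere else, so duplicates accumulate, and a server
        # that already holds one share may be handed another on top of it.
    return placements
```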
To fix this, the upload process needs to do more work (a sketch of such a loop follows this list):
* it needs to cancel a share-upload request if it later learns that some other server already has that particular share
* it needs to avoid sending a fresh copy to a new server when it has learned of an already-uploaded share
* if it sees evidence of pre-existing shares, it should put more energy into finding additional ones
* it should query more servers than strictly necessary, to increase the chance that it can detect this evidence
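A rough sketch of what that smarter loop could look like, using the same hypothetical stand-in API as the sketch above; this illustrates the bullets, not any shipped implementation.

```python
# Hypothetical sketch of a smarter placement loop; same stand-in server API
# as the previous example (ask_to_hold() and cancel() are illustrative only).

def smarter_placement(permuted_servers, total_shares, extra_queries):
    placements = {}        # share number -> server we asked to hold it
    found_elsewhere = {}   # share number -> server that already holds it
    pending = list(range(total_shares))

    # Query more servers than strictly necessary so that evidence of
    # pre-existing shares is more likely to be discovered.
    for server in permuted_servers[: total_shares + extra_queries]:
        shnum = pending.pop(0) if pending else None
        # A None share number means "allocate nothing, just report which
        # shares you already hold" (a pure discovery query).
        accepted, existing_shares = server.ask_to_hold(shnum)
        for existing in existing_shares:
            found_elsewhere.setdefault(existing, server)
            # Cancel a request we already made for a share that turns out
            # to be housed somewhere else.
            if existing in placements:
                placements.pop(existing).cancel(existing)
            # Don't queue a fresh copy of an already-uploaded share.
            if existing in pending:
                pending.remove(existing)
        if shnum is None:
            continue
        if shnum in found_elsewhere:
            # The share we just tried to place already exists; withdraw it.
            if accepted:
                server.cancel(shnum)
        elif accepted:
            placements[shnum] = server
        else:
            pending.insert(0, shnum)   # retry this share on a later server
    return placements, found_elsewhere
```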
We're planning an overhaul of immutable upload/download code, both to improve
parallelism and to replace the DeferredList with a state machine (to make it
easier to bypass stalled servers, for example). These goals should be
included in that work.
This process will work best when the shares are closer to the beginning of
the permuted list. A "share rebalancing" mechanism should be created to
gradually move shares in this direction over time. This is another facet of
repair: not only should there be enough shares in existence, but they should
be located in the best place for a downloader to find them quickly.
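As a rough illustration of "closer to the beginning of the permuted list": Tahoe-LAFS ranks servers per file by hashing each server id together with the file's storage index. The sketch below simplifies that construction (the real hash inputs may differ) and adds a hypothetical rebalance_candidates() helper to show the direction a rebalancer would move shares.

```python
# Illustrative only: the real permuted-list hash construction may differ.
from hashlib import sha256

def permuted_servers(storage_index, server_ids):
    """Rank servers for this file; lower hash value == earlier in the list."""
    return sorted(server_ids, key=lambda sid: sha256(storage_index + sid).digest())

def rebalance_candidates(storage_index, share_locations, server_ids):
    """Yield (share_number, current_server, better_server) moves that shift
    a share toward the front of the permuted list."""
    order = permuted_servers(storage_index, server_ids)
    rank = {sid: i for i, sid in enumerate(order)}
    for shnum, current in share_locations.items():
        for candidate in order:
            if rank[candidate] >= rank[current]:
                break  # no earlier, unused server exists for this share
            if candidate not in share_locations.values():
                yield (shnum, current, candidate)
                break
```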
I'm going to add Cc: tahoe-dev to this ticket, and then I'm going to post the original contents of this ticket to tahoe-dev along with a link to this ticket.
Zandr suggested a simple "good enough" solution for 1.3.1:
This wouldn't be perfect, because there could be other pre-existing shares beyond the end of the portion of the permuted list that we wouldn't see, but it would at least remove some of the duplicated shares.
The following clump of tickets might be of interest to people who are interested in this ticket: #711 (repair to different levels of M), #699 (optionally rebalance during repair or upload), #543 ('rebalancing manager'), #232 (Peer selection doesn't rebalance shares on overwrite of mutable file.), #678 (converge same file, same K, different M), #610 (upload should take better advantage of existing shares), #573 (Allow client to control which storage servers receive shares).
Also related: #778 ("shares of happiness" is the wrong measure; "servers of happiness" is better).
This ticket was fixed by the patches that fixed #778. I think. Assigning this ticket to Kevan to verify and document that this ticket is fixed by his patches.
I don't think #778 fixed this.
The bipartite matching wording of this issue is: "If it later learns that some other server already has that particular share, and it can use that share on that other server to create a maximum bipartite matching (or: a bipartite matching that is larger than the happiness threshold, though I'd take maximum reliability over the inefficiency of a duplicate share), it needs to cancel share-upload requests". #778 does not do this.
(I think this part of the issue would probably be better solved by the uploader rewrite alluded to at the end of #778; it isn't a regression compared to 1.6.1, and it would be a lot nicer to treat this as one of the requirements for a new uploader than to try to graft it onto the existing uploader.)
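For readers who haven't followed #778: "servers of happiness" is the size of a maximum matching in the bipartite graph connecting shares to the servers that hold them. The sketch below is a self-contained illustration of that measure, not the code from #778; the happiness() function and its inputs are hypothetical.

```python
# Self-contained illustration of the "servers of happiness" measure: the size
# of a maximum matching between shares and the servers holding them, computed
# here with a standard augmenting-path search.

def happiness(edges, shares):
    """edges: dict mapping share number -> set of servers that hold it."""
    match = {}  # server -> share currently matched to it

    def try_assign(share, seen):
        for server in edges.get(share, ()):
            if server in seen:
                continue
            seen.add(server)
            if server not in match or try_assign(match[server], seen):
                match[server] = share
                return True
        return False

    return sum(1 for share in shares if try_assign(share, set()))

# Shares 0, 1, 2 on distinct servers give happiness 3.  If server B held a
# duplicate of share 0 instead of share 1, happiness would drop to 2 -- the
# kind of evidence the check quoted above would act on.
print(happiness({0: {"A"}, 1: {"B"}, 2: {"C"}}, shares=[0, 1, 2]))  # -> 3
print(happiness({0: {"A", "B"}, 2: {"C"}}, shares=[0, 1, 2]))       # -> 2
```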
#778 searches for pre-existing shares regardless of whether it has seen them, but only on (some of) the servers that it can't place shares on. I think what this ticket suggests is to expand that search to peers that we can write to, and make that search triggered by, or otherwise conditional on, finding pre-existing shares when issuing allocate_buckets() requests. #778 doesn't do that. The read-only share discovery logic in #778 was implemented to solve the specific case of happiness being undercounted because the upload process never contacted read-only peers at all, and would therefore never know about the shares that they held. This ticket is more about doing that for all servers, to maximize the efficiency of share placement on the grid. An easy first stab at that would be to expand the share discovery logic to work across all of the peers that it knows about, though that would add even more queries to the upload process.
#778 does do this -- it asks servers that it would not have asked in 1.6.1 to try to detect evidence of pre-existing shares.
Kevan: does your patch from #1382 affect this issue?
This is likely to be fixed by implementing the algorithm in ticket:1130#comment:69401, provided that it applies to upload as well as repair.
During the 7/9/13 dev chat, Brian suggested that the uploader contact 4n servers when existing shares are found, instead of the 2n used in ticket #1382. No decision was made, but we agreed that this ticket should be revisited once #1382 is closed and has landed in trunk.
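A trivial sketch of the parameter choice being discussed; the function name is hypothetical and neither value has been adopted.

```python
# Hypothetical: widen the discovery window when pre-existing shares are seen.
def num_servers_to_query(n, saw_existing_shares):
    # 2n matches the #1382 approach; 4n is the figure suggested at the
    # 7/9/13 dev chat for the case where existing shares are found.
    return 4 * n if saw_existing_shares else 2 * n
```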