downloader: coordinate crypttext_hash_tree requests #1544
One performance-improvement idea from the analysis work on #1264 and Performance/Sep2011 is to cut the number of read() requests roughly in half by introducing cross-Share coordination of crypttext_hash_tree node fetches.
Every share should contain an identical copy of the crypttext_hash_tree (and, if we ever bring it back, the plaintext_hash_tree too). To produce a validated copy of segment 0, we need to fetch the crypttext_hash_tree nodes that form the Merkle-tree "uncle chain" for seg0: roughly log2(numsegs) hash nodes.
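As a quick illustration (a standalone sketch, not Tahoe's actual allmydata.hashtree module, and the array-index convention here is just for the example), the uncle chain contains one sibling hash per tree level, so its length grows as log2 of the padded segment count:

```python
# Illustrative only: why validating segment 0 costs about log2(numsegs)
# hash nodes. Root at index 0, children of node i at 2i+1 and 2i+2.

def uncle_chain(leafnum, numleaves):
    """Array indices of the hash nodes needed to validate leaf `leafnum`
    in a complete binary Merkle tree whose leaf count is padded up to a
    power of two."""
    padded = 1
    while padded < numleaves:
        padded *= 2
    i = (padded - 1) + leafnum      # array index of the leaf
    needed = []
    while i > 0:                    # climb to (but not including) the root
        sibling = i + 1 if (i % 2 == 1) else i - 1
        needed.append(sibling)
        i = (i - 1) // 2            # parent
    return needed

# A 64-segment file: seg0 needs 6 (= log2(64)) uncle-chain hash nodes.
print(uncle_chain(0, 64))   # [64, 32, 16, 8, 4, 2]
```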
At present, each Share treats this task as its own personal duty: when calculating the "desire" bitmap, the Share checks the common IncompleteHashTree to see which nodes are still needed, then sends enough requests to fetch all of the missing ones. Because each Share performs this calculation at about the same time (before any server responses have come back), every Share concludes that it needs the full uncle chain. The result is a batch of parallel requests that all return the same data, and all but the first response is discarded.

The improvement would be to have the Shares coordinate these overlapping reads. The first Share to check the hash tree would somehow "claim" each hash node: it sends the request, and the other Shares refrain from sending their own, instead using a Deferred or Observer or something similar to find out when the uncle chain becomes available. If the first Share's request fails, some other Share should be elected to send its own request, ideally from a different server than the first one (if two Shares live on the same server and the first failed to provide the hash node, the second one is not very likely to work either). A sketch of this claim/wait/re-elect shape follows.
Also, there needs to be a timeout/impatience mechanism: if the first Share hasn't yielded a result by the time the other data blocks have arrived, we should consider sending extra requests.
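The impatience part could look something like this (an assumed helper built on top of the coordinator sketch above, with an arbitrary delay; none of these names exist in the codebase):

```python
from twisted.internet import reactor

IMPATIENCE_DELAY = 2.0   # seconds; an arbitrary value for illustration

def wait_with_impatience(d, send_duplicate_request):
    """Wait on Deferred `d` (the coordinated hash-node fetch), but call
    `send_duplicate_request()` if it has not resolved in time. The
    original Deferred is still consumed whenever it eventually fires."""
    timer = reactor.callLater(IMPATIENCE_DELAY, send_duplicate_request)
    def _cancel_timer(result):
        if timer.active():
            timer.cancel()
        return result
    d.addBoth(_cancel_timer)
    return d
```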
This isn't trivial, because it requires new code that can coordinate between otherwise-independent Shares. The performance improvement is considerable as long as the downloader lacks readv() support. Once that's in place, the marginal improvement from coordinated requests may be too small to be worth the effort: less I/O and less data transmitted (the savings scale with N but remain a small fraction of the total data sent), but no fewer remote_readv() messages.