downloader/share: coalesce multiple Share.loop() calls #1268
Reference: tahoe-lafs/trac-2024-07-25#1268
As alluded to in comment:79607, the behavior of src/allmydata/immutable/downloader/share.py (in the Share class) should be improved. Each time a data-fetch response arrives from the server (either for a data block or for a couple of hash nodes), a call to Share.loop is queued. When run, loop recomputes the desire/satisfaction bitmap (a Spans instance), which takes a nontrivial amount of time (although much less than it used to before some of the issues in #1170 were fixed).
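For orientation, the desire computation can be pictured, very loosely, as set arithmetic over byte ranges. This is just a toy stand-in for illustration (Spans is Tahoe's own class, and the names here are made up):

```python
def spans_to_set(spans):
    """Expand [(start, length), ...] byte ranges into a set of offsets.
    A toy stand-in for Tahoe's Spans class, for illustration only."""
    return {off for start, length in spans
            for off in range(start, start + length)}

wanted = spans_to_set([(0, 100)])             # bytes needed for the next block
received = spans_to_set([(0, 40), (60, 20)])  # data that has already arrived
desire = wanted - received                    # what must still be requested
assert len(desire) == 40                      # offsets 40..59 and 80..99 remain
```

The point of the ticket is not that this computation is wrong, but that it is repeated when its inputs have not changed.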
The problem is that we're recomputing it far too many times, and most of the recomputations are redundant. For example, suppose we get responses to 4 requests in a big bunch (because they all arrived in the same network packet). Foolscap will queue an eventual-send for the response Deferreds all on the same reactor turn. When those fire, they'll update the what-data-we've-received DataSpans and then queue a call to loop(). All four messages will be processed before the first call to loop() occurs.
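The batching pattern described above can be mimicked with a plain queue standing in for Foolscap's eventual-send machinery (the class below is a hypothetical sketch, not the real Share code):

```python
from collections import deque

class Share:
    """Minimal sketch of the pattern in the ticket: every response queues
    a loop() call, and all responses are processed before the first queued
    loop() runs, so most loop() passes find nothing new to do."""
    def __init__(self, eventual_queue):
        self._q = eventual_queue
        self.loop_calls = 0

    def got_response(self, data):
        # record the received data (elided), then queue a loop() for later
        self._q.append(self.loop)

    def loop(self):
        self.loop_calls += 1  # each call recomputes the desire bitmap

q = deque()
share = Share(q)
for resp in range(4):          # four responses in one network packet
    share.got_response(resp)   # each queues its own loop()
while q:                       # drain the eventual-send queue
    q.popleft()()
assert share.loop_calls == 4   # four passes, three of them redundant
```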
The first loop() pass will process the received spans, deliver the completed block to the caller that asked for it, and then update the desire/satisfaction bitmaps in preparation for whatever the next block might be. The second loop() pass will recompute the bitmaps even though nothing has changed (no new blocks were requested, no new responses received). The third and fourth loop() passes will do the same.
viz-3.png shows a bit of it: the pale yellow
candlestick-like shapes (just below the "misc" label) are the calls to
measure satisfaction and recompute the desire bitmap. The first call
spends a lot of time in satisfy() (to deliver the completed block), and
then a moderate amount of time in desire() (to figure out what we need
next). The second and third calls both spend a tiny amount of time in
satisfy() (since there are no completed blocks ready to be delivered),
but then spend the same moderate amount of time in desire() recomputing
the same bitmap as before.
This gets worse when there are a lot of loop() calls queued, which means it gets worse when there are lots of (small) hash-node requests coming back. As a result, the amount of wasted time is largest just after the segments which trigger the most hash-node requests. This distribution is an interesting function (probably one of those exciting sequences in the OEIS), but basically it's zero for all odd-numbered segments, and gets larger as you cross into higher branches of the binary tree (so the max is for seg0, the next highest is for NUMSEGS/2, and the third highest is for NUMSEGS/4 and 3*NUMSEGS/4, etc).
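A rough model of that distribution (an assumption about the shape described above, not the exact Tahoe hash-tree accounting): when segments are read in order, segment 0 needs the whole root-to-leaf path of the Merkle tree, and segment i>0 needs one new node per subtree boundary it crosses, i.e. the number of trailing zero bits in i.

```python
def new_hash_nodes(seg, numsegs):
    """Count freshly-needed hash-tree nodes for segment `seg` under the
    rough model above; `numsegs` is assumed to be a power of two."""
    depth = numsegs.bit_length() - 1
    if seg == 0:
        return depth              # first segment needs the full path
    count = 0
    while seg % 2 == 0:           # trailing zero bits == boundaries crossed
        count += 1
        seg //= 2
    return count

# Reproduces the shape described in the ticket for NUMSEGS == 8:
# zero for odd segments, peak at seg0, next highest at NUMSEGS/2,
# then NUMSEGS/4 and 3*NUMSEGS/4.
print([new_hash_nodes(i, 8) for i in range(8)])  # [3, 0, 1, 0, 2, 0, 1, 0]
```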
180c2-viz-delays.png shows this variation: the
time between the end of the last pink block() request for any given
segment, and the end of the segment() request, is mostly filled with
this wasted recomputation. The time spent there is larger for some
segments than others. In the example chart, the largest set of block
requests (for the NUMSEGS/2 midpoint segment) has about 70ms of wasted
time, whereas the very next segment (NUMSEGS/2+1, which needs no extra
hash nodes) only spends about 10ms in that phase.
The total amount of time wasted scales (linearly, I think) with NUMSEGS.
It should also scale linearly with 'k', since there's a separate Share
instance (and thus a whole set of these queued loop() calls) for each
active share.
This problem is hypothesized to be involved with #1264 (slow downloads
for large k), although the rough measurements I did for #1170 suggest
that the time spent is too small (8% for a local one-CPU download of a
12MB file) to explain #1264.
The fix will be to have Share refrain from queueing redundant calls to loop(): turn the bare eventually(self.loop) into a schedule_loop() that checks a flag, so that eventually(self.loop) is only queued when no call is already pending, and the flag is cleared when loop() runs.
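The proposed coalescing can be sketched as follows. This is a simplified model, not the actual patch: the deque stands in for Foolscap's eventually(), and only the flag logic is taken from the ticket text:

```python
from collections import deque

class Share:
    """Sketch of the fix: a pending flag coalesces any number of
    schedule_loop() calls into a single queued loop()."""
    def __init__(self, eventual_queue):
        self._q = eventual_queue
        self._loop_scheduled = False
        self.loop_calls = 0

    def schedule_loop(self):
        if self._loop_scheduled:
            return                    # a loop() call is already queued
        self._loop_scheduled = True
        self._q.append(self.loop)     # stands in for eventually(self.loop)

    def loop(self):
        # clear the flag before doing any work, so a response that arrives
        # while loop() is running can queue another pass
        self._loop_scheduled = False
        self.loop_calls += 1          # recompute bitmaps, deliver blocks, ...

q = deque()
share = Share(q)
for _ in range(4):
    share.schedule_loop()  # four responses, but only one loop() queued
while q:
    q.popleft()()
assert share.loop_calls == 1
```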
Then we should re-run the #1264 tests to see if there's a significant
change.
I wrote up some notes about François's profiling results over on comment:9:/tahoe-lafs/trac-2024-07-25/issues/6326. It might shed light on this ticket.
Attachment mergeloops.diff (1463 bytes) added
patch to merge loops
The patch is pretty simple. Manual checking on a 12MB file shows a reduction in the number of calls to loop() from around 1200 to about 580, and a casual check shows the download time drops from 4.2s to 4.0s, so it looks like it's doing its job.

Looks good.
In changeset:c9def7697757bdae: