new-downloader requests too much data, builds up #1181
Reference: tahoe-lafs/trac-2024-07-25#1181
In #1170 comment:43 I describe a problem that I noticed in the way the new downloader code handles data for long reads (hundreds of segments). It looks like we ask for data which we don't end up using. The `.received` `DataSpans` structure on one of the later-to-arrive shares grew by about 350 ranges (and 35kB) over the course of 762 segments. The one increment I looked at was by 64 bytes.
I have two guesses, the second more likely than the first.

First guess: our use of `IncompleteHashTree.needed_hashes()` with the `include_leaf=True` argument is telling us that we should ask for e.g. the hash of block0, so we ask for that part of the hash tree. But then we retrieve block0 itself, and can hash it ourselves and add it to the hash tree, meaning we never use the H(block0) data from `self._received`. This would result in a 32-byte hash value being added to (and never removed from) `self._received` once every other segment.
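If that guess is right, the fetched leaf hash is redundant because the block itself lets us recompute it. A minimal sketch of that redundancy (the `"leaf:"` tag and function name are hypothetical stand-ins, not Tahoe's actual tagged-hash format):

```python
import hashlib

def block_leaf_hash(block: bytes) -> bytes:
    """Leaf hash of a block in the block hash tree. The "leaf:" tag is
    a hypothetical stand-in; Tahoe uses its own tagged-hash scheme."""
    return hashlib.sha256(b"leaf:" + block).digest()

# needed_hashes(segnum, include_leaf=True) told us to fetch H(block0)
# from the share, so 32 bytes landed in self._received. But once
# block0 itself arrives, we can derive the same leaf hash locally:
block0 = b"example block contents"
fetched_leaf = block_leaf_hash(block0)  # the 32 bytes we paid to fetch
local_leaf = block_leaf_hash(block0)    # what we compute from the block
assert fetched_leaf == local_leaf       # so the fetched copy is never consumed
```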
Second guess: the ciphertext hash tree nodes are requested from all shares, because we never know which one will arrive first (and which ones won't arrive at all). But we only consume the data for these hash nodes out of the first share that provides them: we leave the others alone. This will result in a variable amount of data being left over in each share: larger amounts when the ciphertext hash tree path down to the current segnum leaf flips over higher branches (half of the time you'll only get one extra node, a quarter of the time you'll get two extra nodes, etc., maximizing on the first and numsegs/2 segments, in which you have to fetch a full log2(numsegs) nodes).
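That branch-flipping pattern can be modeled in a few lines. This is just an illustration of the counting argument, not downloader code; it assumes a complete binary tree over `numsegs` leaves (a power of two) with the root at index 1:

```python
import math

def new_path_nodes(numsegs):
    """For each segment, count the ciphertext-hash-tree nodes on its
    root-to-leaf path that were NOT on the previous segment's path --
    a rough model of the extra nodes fetched as the tree walk flips
    over higher branches."""
    depth = int(math.log2(numsegs))

    def path(leaf):
        node = leaf + (1 << depth)  # node index of this leaf
        nodes = set()
        while node >= 1:            # walk up to the root
            nodes.add(node)
            node //= 2
        return nodes

    prev, counts = set(), []
    for seg in range(numsegs):
        p = path(seg)
        counts.append(len(p - prev))
        prev = p
    return counts

print(new_path_nodes(8))  # [4, 1, 2, 1, 3, 1, 2, 1]
```

For 8 segments, half the segments add only one new node, a quarter add two, and segments 0 and numsegs/2 need (nearly) a full path, matching the argument above.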
I'm not so sure it's the first, since I ran the unit tests with those `include_leaf=` arguments set to `False`, and they failed. But the ciphertext hash tree feels pretty likely.
The simple fix is to just discard the whole `self._received` structure after each segment is complete: in the ideal case (correctly guessed segsize), it should be empty at that point anyway, and even if the segsize guess was wrong, the bandwidth hit for flushing the data is probably not more than a single block (and the round-trip hit is probably zero). A patch implementing the simple fix is attached (1181-patch.diff).
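The attached patch isn't shown inline here, but the idea can be sketched in a few lines. All names below are hypothetical stand-ins (the toy `DataSpans` only stores ranges; the real one merges overlapping spans and supports partial removal):

```python
class DataSpans:
    """Tiny stand-in for Tahoe's DataSpans: a store of (offset, data)
    ranges. The real class merges/splits ranges; omitted here."""
    def __init__(self):
        self._spans = {}

    def add(self, offset, data):
        self._spans[offset] = data

    def len(self):
        return sum(len(d) for d in self._spans.values())


class Share:
    """Sketch of a per-share receive buffer; method names are made up."""
    def __init__(self):
        self._received = DataSpans()

    def segment_complete(self):
        # The simple fix: drop everything left over. Ideally this is
        # already empty; if the segsize guess was wrong we re-fetch at
        # most about one block, usually with no extra round trips.
        self._received = DataSpans()


share = Share()
share._received.add(0, b"\x00" * 32)  # an unconsumed hash node
share.segment_complete()
assert share._received.len() == 0     # nothing accumulates across segments
```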
The complex fix is to consult the ciphertext hash tree when these node hashes arrive, and either add them to the hash tree or discard them (instead of the current practice of always storing them and then only removing them if the hash tree still needs them). It might be easier to do this if `DataSpans` had a feature to label each byte (in some efficient way), so you could ask it for the ranges labelled "ciphertext hash tree nodes", and discard just those. (This might also help with pipelining segments, so you merge requests but still get the large blocks of data back in the right order, to minimize your buffering needs.)
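One shape the labeling feature might take (entirely hypothetical; the real `DataSpans` has no labels and a more efficient range representation):

```python
class LabeledDataSpans:
    """Hypothetical DataSpans variant: every added range carries a
    label, so a whole category of bytes can be discarded at once."""
    def __init__(self):
        self._ranges = []  # list of (offset, data, label)

    def add(self, offset, data, label=None):
        self._ranges.append((offset, data, label))

    def discard_label(self, label):
        # e.g. drop all leftover ciphertext-hash-tree nodes in one call
        self._ranges = [r for r in self._ranges if r[2] != label]

    def len(self):
        return sum(len(data) for _, data, _ in self._ranges)


spans = LabeledDataSpans()
spans.add(0, b"\x00" * 32, label="ct-hash-node")
spans.add(64, b"\x00" * 32, label="ct-hash-node")
spans.add(1024, b"block data", label="block")
spans.discard_label("ct-hash-node")
assert spans.len() == 10  # only the block bytes remain
```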
Incidentally, that patch causes a few unit tests to fail, but it's only the two in `test_immutable` that assert a specific number of read() calls. They can safely be modified to raise the call limit.

Attachment 1181-patch.diff (2172 bytes) added
patch which also fixes unit test failures
We've applied 1181-patch.diff as changeset:c89a464510394089, but I guess that's not the real solution, so I'm leaving this ticket open.