analyze mutable-file retrieval performance: count roundtrips #304
I'd like to spend some time analyzing the control flow in mutable.Retrieve.
My original hope was to pull down the whole contents of the file in a single
round-trip, but it turned out to be trickier than I'd thought. I think we're
currently doing more like 3 or 4 RTT.
Directory traversal would be considerably faster with fewer roundtrips.
Or would it? Finding out how much of the time is spent waiting for the
network would be a great starting point. The first step would be to implement
a "Retrieval Results" object (similar to the immutable-file "Upload Results"
object I just wrote this week), into which we could throw timing information
during retrieval. Then find a good way to display this information: maybe in
a separate box at the bottom of the webish directory display (if some debug
checkbox were selected).
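A minimal sketch of what such a results object might look like (the class name and fields here are illustrative assumptions, not the actual "Upload Results" API):

```python
import time

class RetrieveResults:
    """Hypothetical container for per-retrieval timing data."""
    def __init__(self):
        self.timings = {}          # serverid -> list of per-query elapsed times
        self.started = time.time()
        self.finished = None

    def add_query_time(self, serverid, elapsed):
        self.timings.setdefault(serverid, []).append(elapsed)

    def done(self):
        self.finished = time.time()

    def total_time(self):
        return (self.finished or time.time()) - self.started
```

The webish status page could then iterate over .timings to render one row per server query.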
Note that the only call we ever make to the storage servers is "readv", so
the issue is simpler than with immutable files (in which there is a call to
get_buckets, followed by some number of 'read' calls, and foolscap does not
yet do promise pipelining).
One potential improvement is to avoid doing more than one readv per server.
This means reading enough data on the first call that we don't need to ask
for more. To do this, we must make a guess as to a reasonable amount of data
to ask for: we need the pubkey from one peer, the signed version info from
k+epsilon peers, and full share data from 'k' peers.
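As a rough sketch of that size guess (the constants and helper name are assumptions for illustration, not actual Tahoe-LAFS code):

```python
# Hypothetical: pick a single initial readv size large enough to cover the
# pubkey + signed version info plus a full share for a typical small dirnode,
# so that no second readv is needed on the servers we read fully.

HEADER_GUESS = 1000        # assumed pubkey + signature + version-info overhead
CHILD_BYTES = 336          # assumed dirnode growth per child entry (pre-encoding)

def guess_initial_read_size(k_guess, expected_children=10):
    # each share holds roughly 1/k of the dirnode contents
    share_data_guess = (expected_children * CHILD_BYTES + k_guess - 1) // k_guess
    return HEADER_GUESS + share_data_guess

# e.g. k=3 and ~10 expected children gives 1000 + 1120 = 2120 bytes
```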
The other potential improvement is to exploit as much parallelism as
possible. There are plenty of tradeoffs here. If we knew 'k' ahead of time,
we would know how many signed-version-info queries we need.
Hm, maybe something like this:

* guess 'k' ahead of time for the mutable file. For DirectoryNodes, the container should provide a hint, so that if dirnodes use 1-of-5 instead of 3-of-7, our guess for k is '1'
* […] sure that it does, but I want to double-check
* […] servers for the signed-version-info+pubkey data.
* […] queries.
* […] NAK.
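Something along these lines, as a sketch (the dispatch helpers and sizes below are stand-ins, not the real retrieval code):

```python
# Hypothetical first-round plan: full-size reads go to the first 'k' servers,
# header-only reads go to a few extra servers, and a second round is issued
# only if 'k' was underestimated or some queried servers lack shares.

HEADER_READ = 1000  # assumed: enough for pubkey + signed version info

def plan_first_round(servers, k_guess, full_read_size, epsilon=2):
    plan = {}
    for i, server in enumerate(servers[:k_guess + epsilon]):
        plan[server] = full_read_size if i < k_guess else HEADER_READ
    return plan

def needs_second_round(responses, k_actual, k_guess):
    # responses: {server: data_or_None}, where None means "no share here"
    shares_found = sum(1 for r in responses.values() if r is not None)
    return k_actual > k_guess or shares_found < k_actual
```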
If we're correct (or high) about 'k', and all the servers have shares, then
this will do the job in a single RTT. If we underestimated 'k', we'll need
2 RTT. If some servers did not have shares, we'll need about 2 RTT (or
perhaps a fraction more, depending upon the spread of response times).
I just did some dirnode-size analysis, and put the results in
docs/dirnodes.txt. The summary is that each child adds about 336 bytes to
the dirnode, and therefore 112 bytes to a k=3-encoded share. The SDMF format
stores about 935 bytes of pubkey+sig+hashes, followed by share data, followed
by 1216 bytes of encprivkey. So if we fetch 2kB of data on the initial read,
we should be able to retrieve a 9-entry dirnode. If we fetch 3kB, we should
be able to get 18 entries, or 7 entries plus the encprivkey. 4kB would get us
27 entries, or 16 plus the encprivkey.
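Those entry counts can be checked directly from the byte figures above (taking kB as 1000 bytes):

```python
# Reproduce the estimates: 935 bytes of pubkey+sig+hashes, 112 bytes of
# share data per child at k=3, and 1216 bytes of encprivkey.
HEADER = 935
PER_CHILD = 112
ENCPRIVKEY = 1216

def entries(read_size, include_privkey=False):
    available = read_size - HEADER - (ENCPRIVKEY if include_privkey else 0)
    return max(available, 0) // PER_CHILD

for size in (2000, 3000, 4000):
    print(size, entries(size), entries(size, include_privkey=True))
# 2000 ->  9 entries (0 if the encprivkey must also fit)
# 3000 -> 18 entries, or 7 plus the encprivkey
# 4000 -> 27 entries, or 16 plus the encprivkey
```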
I did an experiment to examine the effects of changing the Publish initial
read_size. Using 2048-bit keys, and using the same speed-test setup we use on
the buildbot, I got:
I was expecting the 3kB case to upload files one RTT faster (about 16ms on
DSL, 3ms in colo). I can't see any evidence of such a speedup, although it is
clear that one RTT is somewhat lost in the noise.
We need to investigate mutable-file behavior more closely. As pointed out
above, the real first step will be to add "Retrieval Results" / "Publish
Results" with detailed timing data.
Incidentally (for completeness), an extra experiment I did with 522-bit keys
is bogus, because the whole share is smaller, and a 1kB read is probably
enough to get everything. The results of that experiment were:
colo: (522-bit keys)
DSL:
I added some retrieval-side timing data to the new webish "status" page,
which now includes how long each query took. Analyzing the results revealed
that the Retrieve code was doing a refetch to grab the encrypted private key,
which of course really isn't useful there (you only need the private key to
create a new version of the file, so Publish wants it more than Retrieve). As
a result, the default initial-read size of 2000 bytes was causing us to do
two round-trips, even for small directories.
I changed the code [#2245] to have separate unpack_share() and
unpack_share_data() methods. The former raises NeedMoreDataError if it
can't get the encrypted private key; the latter only raises
NeedMoreDataError if it can't get the share data (and is silent about the
encprivkey). Retrieve now uses unpack_share_data() exclusively.
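A simplified sketch of that split (the offset handling below is illustrative only; the real unpackers parse the full SDMF offset table):

```python
class NeedMoreDataError(Exception):
    """Raised when the bytes fetched so far don't cover the needed fields."""
    def __init__(self, needed_bytes):
        self.needed_bytes = needed_bytes

# 'offsets' stands in for the SDMF offset table: field name -> position in the
# share. Only the difference in when NeedMoreDataError is raised matters here.

def unpack_share(data, offsets):
    # insists on having everything, including the trailing encprivkey
    if len(data) < offsets["EOF"]:
        raise NeedMoreDataError(offsets["EOF"])
    return {"share_data": data[offsets["data"]:offsets["enc_privkey"]],
            "enc_privkey": data[offsets["enc_privkey"]:offsets["EOF"]]}

def unpack_share_data(data, offsets):
    # only insists on the share data; silent about a missing encprivkey,
    # which is why Retrieve can use it without a second round-trip
    if len(data) < offsets["enc_privkey"]:
        raise NeedMoreDataError(offsets["enc_privkey"])
    return {"share_data": data[offsets["data"]:offsets["enc_privkey"]]}
```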
The result is one round trip instead of two for directories with fewer than
about 9 or 10 entries, allowing them to complete in roughly half the time.