rearrange share format to make downloads faster #1543
(I thought I had a ticket for this already, but now I can't find it)
In a new share format, we could arrange the layout of the various share pieces to improve download speed. The biggest slowdowns are:
Our current designs treat whole-file downloads and random-access as equal peers: we design first for minimum alacrity for the single-segment case. We could instead recognize that most downloads are whole-file, and accept a slight alacrity hit in exchange for faster throughput and higher transport efficiency:
and we could rearrange the share to help:
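As one concrete illustration of the general idea (a sketch with invented field names and header format, not the layout this ticket actually proposes), all of the validation metadata could be grouped ahead of the block data, so that a whole-file reader can fetch the header, the hashes, and the start of the data in one large contiguous read:

```python
import struct

# Hypothetical "whole-file-friendly" immutable share layout -- a
# sketch, not the actual Tahoe-LAFS share format.  Validation
# metadata (UEB, block hashes, share hashes) sits in front of the
# block data, so a whole-file download needs one contiguous read
# instead of several small scattered ones.
def pack_share(ueb, block_hashes, share_hashes, blocks):
    """ueb is bytes; *_hashes are lists of 32-byte digests; blocks is
    a list of equal-sized byte strings."""
    metadata = ueb + b"".join(block_hashes) + b"".join(share_hashes)
    header = struct.pack(
        ">4sQQQ",    # magic, metadata length, block size, block count
        b"SHR?",     # placeholder magic number
        len(metadata),
        len(blocks[0]) if blocks else 0,
        len(blocks),
    )
    return header + metadata + b"".join(blocks)
```

A random-access reader pays a small alacrity cost under such a layout -- it has to skip past metadata it doesn't need -- which is exactly the tradeoff described above.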
Replying to warner:
Just for context about my thinking: I am generally not that keen on increasing performance for some usages while decreasing it for others, even if the former usages are more common. One thing I don't like about that is that people form assumptions about expected performance from past observed performance, and the more consistent performance is across different use cases, the less wrong those assumptions will be.
This goes against a common (haha) engineering maxim of "optimize the common case". I think that maxim makes sense when a single user action is going to exercise a whole smorgasbord of different cases inside a system, so optimizing the most common of those internal cases results in a speedup for most or all possible user actions. But it is potentially problematic when different user actions exercise substantially different distributions of the internal cases.
In this particular issue I'll bet it doesn't matter much since the fewer-round-trips and the scatter-gather-reads (the first two improvements that Brian mentioned) are probably the big issues, and improving them would probably not increase the performance gap between different usages.
Another alternative would be to start using multiple variable-length records (i.e. files) on the backend. In all three of our current backend designs, we chose to use the minimal number of variable-length records (i.e. files) per share. That minimal number is 1 -- each share is stored in one file, including the share data, the hash tree nodes, and the leases. This choice minimizes the number of disk seeks needed to find the share data and the hash tree nodes, but if the share data is large and the reader isn't going to read all of it in order, then we may end up jumping around within it (or reading over data that we don't need), which might cost us an extra seek or a few anyway.
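To make the seek arithmetic concrete, here is a sketch of random access within a single-file share; the offsets, sizes, and node ordering are hypothetical, not the real on-disk format:

```python
HEADER_SIZE = 32
HASH_SIZE = 32                        # e.g. SHA-256d node size
NUM_BLOCKS = 1024
NUM_HASH_NODES = 2 * NUM_BLOCKS - 1   # full binary hash tree
BLOCK_SIZE = 4096
HASHES_OFFSET = HEADER_SIZE
# assume heap-ordered tree nodes with the leaf hashes stored last
LEAF_HASHES_OFFSET = HASHES_OFFSET + (NUM_HASH_NODES - NUM_BLOCKS) * HASH_SIZE
DATA_OFFSET = HASHES_OFFSET + NUM_HASH_NODES * HASH_SIZE

def read_block_with_leaf_hash(f, i):
    """Fetch block i plus its leaf hash: two seeks inside the one file."""
    f.seek(LEAF_HASHES_OFFSET + i * HASH_SIZE)
    leaf_hash = f.read(HASH_SIZE)
    f.seek(DATA_OFFSET + i * BLOCK_SIZE)
    block = f.read(BLOCK_SIZE)
    return leaf_hash, block
```

Once the file is open, every region is reachable within one file descriptor; the cost shows up only when reads aren't sequential.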
Another problem with storing both share data and hash tree nodes together in a single file is what to do if the share data shrinks or grows, which can happen with SDMF and MDMF but not with immutables. A related problem is an uploader that doesn't know the size of the share data when it begins the upload -- which currently can't happen, but which I hope will become possible in the future (#1288).
Designing a layout, and an algorithm to manage it, that optimizes performance can get complicated, and the recent security issue #1528 has really driven home to me how even moderate complexity hides bugs. (Mea culpa: it hid the bug from me, not from Brian -- Brian implemented this moderate complexity correctly, originally.) Even if we get it right, it takes engineering time to do so.
So an alternative which makes it simpler for us is to use more variable-length records (files) per share, i.e. the storage server creates one file to hold the share data and another to hold the hash tree nodes. This costs, I think, one extra disk seek to find the second file (given that you've already paid the several disk seeks to find the first file), but it makes things easier and might actually win back some of the performance lost to the extra seek:
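A sketch of what that might look like on disk (the directory layout and file names are invented for illustration):

```python
import os

def open_share(storage_dir, storage_index, shnum):
    """Open the two records of one share.  The share data and the
    hash tree nodes live in sibling files, so either one can grow,
    shrink, or be appended to without rewriting the other."""
    share_dir = os.path.join(storage_dir, "shares", storage_index)
    data = open(os.path.join(share_dir, "%d.data" % shnum), "rb")
    hashes = open(os.path.join(share_dir, "%d.hashes" % shnum), "rb")
    return data, hashes
```

The one extra disk seek to find the second file is the cost mentioned above; in exchange, the resizing and unknown-upload-size problems reduce to operations on a plain data file.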
Replying to zooko (comment:2):
For example, you could say:
The drawback of optimizing for the reader like that is that the writer then has to update all of those records whenever it changes the contents of any leaf (block). This is the opposite extreme from the layout where the writer does minimal work and the reader has to read from each of the separately located uncles. I think there may be a sweet-spot tradeoff in which the first five levels of the binary tree are stored in the first record, so that the writer always updates that first record plus the record(s) more local to its write, and the reader always reads that first record plus the record(s) more local to its read.
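To make that sweet spot concrete, here is a small sketch (the numbering convention and the function are mine, not from the ticket): with Merkle-tree nodes numbered 1, 2, 3, ... in flat binary-heap order, the top five levels share the first record, and every deeper node is assigned to a record keyed by its level-4 ancestor:

```python
TOP_LEVELS = 5  # levels 0-4 (nodes 1..31) share the first record

def record_for_node(n):
    """Which record holds Merkle-tree node n (1-indexed, heap order)?

    Record 0 holds the top five levels, which every writer updates
    and every reader fetches; records 1-16 each hold the subtree
    under one level-4 node, so the rest of any read or write stays
    local to the blocks it touches.
    """
    level = n.bit_length() - 1                    # depth of node n
    if level < TOP_LEVELS:
        return 0
    ancestor = n >> (level - (TOP_LEVELS - 1))    # its level-4 ancestor
    return ancestor - 2 ** (TOP_LEVELS - 1) + 1   # records 1..16
```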
Replying to zooko (comment:3):
Anybody who followed this idea closely may have noticed a minor error here -- the records would be 96 hash values wide, not 64, so downloading an entire record would take 3072 bytes, not 2048 bytes.
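(Those figures assume 32-byte SHA-256d hash values, which is what the numbers imply: 96 × 32 = 3072 bytes per record, where 64 × 32 would have been 2048.)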
Replying to zooko (comment:2):
I just noticed that Brian wrote some ideas about this in #600. He emphasized the importance of keeping things together in a single file in order to ease moving shares around. I agree that this is an important reason, and it was one of the reasons uppermost in our minds back then. Nowadays I perceive this as not such a big deal -- shares get migrated rarely, and when they do, a simple rsync does the right thing.
Duplicate of #1687?