eliminate hard limit on size of SDMFs #359
Reference: tahoe-lafs/trac-2024-07-25#359
We currently impose a hard limit on SDMFs of 3.5 MB. (It was recently raised from the initial value of 1 MB in order to support directories with up to 10,000 entries.)
We could remove this artificial limit entirely. There would remain "soft limits":

- Creating or updating an SDMF would take approximately (1 + N/k) * filesize of RAM.
- It would take approximately N/k * filesize of upload bandwidth to change even just one byte of the file. (If/when we implement a mutable upload helper, the client-to-helper bandwidth will be equal to the filesize.)
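As a rough illustration of those two soft limits, here is a sketch assuming the default 3-of-10 erasure coding (k=3, N=10); the function name is illustrative, not part of the codebase:

```python
def sdmf_update_cost(filesize, k=3, n=10):
    """Rough soft-limit costs of modifying an SDMF file.

    Assumes k-of-n erasure coding (default 3-of-10); returns
    approximate byte counts, not exact measurements.
    """
    expansion = n / k                  # each share is filesize/k, n shares total
    ram = (1 + expansion) * filesize   # plaintext plus all encoded shares in memory
    upload = expansion * filesize      # every share must be rewritten in full
    return ram, upload

# Even a one-byte change to a file at the current 3.5 MB hard limit
# costs ~15 MB of RAM and ~12 MB of upload bandwidth:
ram, upload = sdmf_update_cost(3_500_000)
```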
FYI, we don't have a mutable-file upload helper yet.
What's the limit on an immutable file?
Raising it to an extremely high limit was ticket #346. The current limit is that there is a 64-bit unsigned field which holds the offset in bytes of the next data element that comes after the share contents on the storage server's disk. See the implementation and the in-line docs in source:src/allmydata/immutable/layout.py@3864.

This means that each individual share is limited to a few bytes less than 2^64^. Therefore the overall file is limited to k*2^64^. There might be some other limitation that I've forgotten about, but we haven't encountered it in practice, where people have many times uploaded files in excess of 12 GiB.

Note that zooko's recent comments are about immutable files and their shares, whereas this ticket is about mutable files and shares, which use a different layout. However, the same general statements are true. Mutable files were designed after we had some experience with immutable files, but before I learned to always use 64-bit fields for everything. They've used somewhat larger offset fields since day 1, which are big enough to accommodate very large shares. The layout is described in source:src/allmydata/mutable/layout.py.
To be precise, they use 32-bit fields to hold the offsets of the signature, share_hash_chain, block_hash_tree, and share_data, then use a 64-bit field to hold the offset of the enc_privkey and EOF. So they can tolerate 2^64^-byte share_data sections, which is where the bulk of the share's data lives. The block_hash_tree section is smaller than the share_data section, but still scales linearly with filesize. Because of the 32-bit field for the share_data offset, the block_hash_tree must be somewhat shorter than 2^32^ bytes, limiting it to 2^27^ hashes, so 2^26^ segments, which at our default 128 KiB (2^17^) segsize means 2^43^ bytes, which is the limiting factor. By raising the segsize to e.g. 4 MiB (2^22^), this limit grows to 2^48^ bytes.

So, SDMF mutable files are limited by the share format to k*2^43^ bytes, or about 24 TiB. Until we implement MDMF and can process mutable files one segment at a time (instead of holding the whole file in RAM), we'll be soft-limited by available memory, so practically speaking the limit is a couple of GB.
If we stick with the same share format for MDMF (which was our goal: old clients should be able to keep using their SDMF code to read MDMF-generated files, unless we really do need a separate salt for each segment: #393), then MDMF files will be limited to k*2^43^ bytes with a RAM footprint of about x*128 KiB (where "x" is probably 2 or 3). An uploader-side max_segsize configuration change can scale those two values together, up to a filesize limit of k*2^64^ bytes and a RAM footprint of x*256 GiB.
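How the filesize limit and RAM footprint scale together with max_segsize can be sketched as follows; the function and its parameters are illustrative assumptions, with the segment count fixed at 2^26^ by the offset fields as described above:

```python
def mdmf_limits(segsize, k=3, x=3):
    """Sketch: MDMF (same share format) filesize cap vs. RAM footprint.

    'x' is the assumed small number of segments held in memory at once
    (probably 2 or 3); the 2**26 segment cap comes from the 32-bit
    share_data offset discussed above.
    """
    max_segments = 2**26
    filesize_cap = k * max_segments * segsize  # grows with segsize
    ram_footprint = x * segsize                # so does RAM, but only x segments' worth
    return filesize_cap, ram_footprint

mdmf_limits(128 * 1024)  # default segsize: cap k * 2**43, RAM x * 128 KiB
mdmf_limits(2**38)       # 256 GiB segsize: cap k * 2**64, RAM x * 256 GiB
```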
If we do change the share format for MDMF, then we should of course use 64-bit fields everywhere and remove this 2^43^ limit.
Finally, it turns out that this ticket is actually a dupe of #694, which was closed when we removed the hard limit on SDMF files in changeset:db939750a8831c1e back in June 2009. I'd initially imposed the arbitrary 3.5 MB limit to discourage people from using the (inefficient, memory-hungry) SDMF format in ways that would disappoint their hopes for high-performance behavior, but I was talked out of this, and Kevan implemented the fix, which was first released in 1.5.0.
For the record, my comment:65482 was about immutable files because David-Sarah asked about them in comment:65481. :-) Thanks for the description of the mutable file size limits.