store shares in single files, instead of 7 files and a directory #85
At the moment we have significant filesystem overhead in our share storage, mostly because common filesystems like ext3 consume a whole disk block (4096 bytes) even for a 1-byte file. Since we use 7 files in a separate directory for each share, that means 8*4096=32768 bytes consumed per share, even for 1-byte shares. As a result, storage consumption has a floor: every file of 102400 bytes or smaller consumes the same amount of storage, 3276800 bytes (3.3MB).
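For concreteness, here is a back-of-the-envelope sketch of that arithmetic in Python. The 25-of-100 encoding (100 shares per file) and the 4096-byte block size are assumptions made for the example, not facts stated in this ticket.

```python
BLOCK = 4096               # assumed ext3 block size
SHARES_PER_FILE = 100      # assumed 25-of-100 encoding: 100 shares per file
SHARES_NEEDED = 25

# Old layout: 7 files plus their directory per share -> 8 blocks minimum.
old_min_per_share = 8 * BLOCK                          # 32768 bytes
old_min_per_file = SHARES_PER_FILE * old_min_per_share # 3276800 bytes (~3.3MB)

# Every file whose shares fit within one data block hits that floor; with
# the assumed encoding that is every file up to 25 * 4096 = 102400 bytes.
largest_floor_file = SHARES_NEEDED * BLOCK             # 102400 bytes

print(old_min_per_share, old_min_per_file, largest_floor_file)
```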
Our storage format was chosen for simplicity and ease of implementation, but it carries a huge overhead. So the plan is to combine all 7 files into a single one, and to stop putting it in its own directory. That will reduce the minimum on-disk size of a share to one disk block (4096 bytes) instead of 8 (32768 bytes), and will move the floor down: every file of 81250 bytes or smaller will consume the same 409600 bytes (410kB), an 8x improvement.
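Continuing the same sketch (same assumed encoding and block size), the single-file layout drops the per-share floor to one block; the break-even filesize drops from 102400 to 81250, presumably because the combined file also has to hold the per-share metadata that used to live in the other six files.

```python
BLOCK = 4096
SHARES_PER_FILE = 100      # same assumed 25-of-100 encoding

# New layout: one file per share -> 1 block minimum.
new_min_per_share = 1 * BLOCK                          # 4096 bytes
new_min_per_file = SHARES_PER_FILE * new_min_per_share # 409600 bytes (~410kB)

old_min_per_file = SHARES_PER_FILE * 8 * BLOCK
print(old_min_per_file // new_min_per_file)            # -> 8 (the 8x improvement)
```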
Reducing this filesystem-blocksize overhead any further would require packing multiple shares (for different URIs) into a single file, which would complicate deleting and indexing them. That might be useful someday, but hopefully we can avoid it.
Also, we need to figure out a good place to put leases, once we implement them, but they can probably live in a separate database with different packing and access requirements.
I'm actively working on this one right now. The basics are in place, but the interfaces between the new bucket-proxies and the rest of the system are not working yet. I'm hoping to finish it tomorrow.
Done, in changeset:cd8648d39b897684, changeset:1f8e407d9cda19ed, changeset:7589a8ee82eb6531, changeset:35117d77a0bb2177, and changeset:4d868e6649c2c5d8. The new format increases the per-share byte overhead slightly: the layout described in storageserver.py:WriteBucketProxy adds about 36 bytes to allow the share to be self-describing. But the overhead caused by 4kB disk blocks is reduced by 8x (for small files).
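For readers without the changesets handy, here is a minimal sketch of how a self-describing single-file share could carry such a 36-byte prefix: nine 4-byte big-endian fields recording the offsets of the sections that previously lived in separate files. The field names below are illustrative guesses; the authoritative layout is the one in storageserver.py:WriteBucketProxy.

```python
import struct

# Hypothetical 36-byte header: nine 4-byte big-endian unsigned ints.
# Field names are illustrative; the real layout is defined by
# storageserver.py:WriteBucketProxy.
HEADER_FORMAT = ">9L"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)   # == 36

FIELDS = ("version", "segment_size", "data_size",
          "offset_data", "offset_plaintext_hash_tree",
          "offset_crypttext_hash_tree", "offset_block_hashes",
          "offset_share_hashes", "offset_uri_extension")

def pack_header(values):
    """Pack the nine header fields (given as a dict) into the 36-byte prefix."""
    return struct.pack(HEADER_FORMAT, *(values[name] for name in FIELDS))

def unpack_header(share_bytes):
    """Read the prefix back, making the share self-describing without any
    sibling files or containing directory."""
    return dict(zip(FIELDS, struct.unpack(HEADER_FORMAT,
                                          share_bytes[:HEADER_SIZE])))
```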