add a mutable-file cache #465
If the mutable-file retrieval process were allowed to keep an on-disk cache (indexed by SI+roothash, where SI is the storage index, and populated by either publish or retrieve), then many directory-traversal operations would run much faster.
The cache could store plaintext, or ciphertext (or, if/when we implement deep-traversal caps, it could contain the intermediate form). For a node that runs on behalf of a single user, the plaintext cache would be fastest. For a webapi node that works for multiple users, I wouldn't feel comfortable holding on to the plaintext for any longer than necessary (i.e. we want to maintain forward secrecy), so if a cache still seemed useful then I'd want it to be just a ciphertext cache.
The cache should only store one version per SI, so once we publish or discover a different roothash, that should replace the old cache entry. The cache should have a bounded size, with a random-discard policy. Of course we need some efficient way to manage that size: doing 'du -s' on the cache directory would be slow for a large cache, so either we should keep the cache from getting that large or do something more clever.
It would probably be enough to declare that the cache is implemented as a single directory, with one file per SI, where each file contains the base32-encoded roothash + newline + plaintext/ciphertext. The size bound is imposed by limiting the number of files in this directory, and the count is taken once at startup.
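For concreteness, here's a minimal sketch of that layout (not actual Tahoe code; `MutableFileCache`, `MAX_ENTRIES`, and the stdlib base32 calls are illustrative stand-ins for whatever we'd really use):

```python
import base64
import os
import random

MAX_ENTRIES = 1000  # assumed bound on cache size, enforced by file count


class MutableFileCache:
    """One file per SI: base32(roothash) + newline + plaintext/ciphertext."""

    def __init__(self, cachedir):
        self.cachedir = cachedir
        os.makedirs(cachedir, exist_ok=True)
        # Count files once at startup; much cheaper than 'du -s' on a
        # large cache directory.
        self.num_entries = len(os.listdir(cachedir))

    def _path(self, si):
        # si is the binary storage index; base32-encode it for a filename
        return os.path.join(self.cachedir,
                            base64.b32encode(si).decode("ascii").lower())

    def put(self, si, roothash, contents):
        # One version per SI: a new roothash simply overwrites the old entry.
        path = self._path(si)
        is_new = not os.path.exists(path)
        with open(path, "wb") as f:
            f.write(base64.b32encode(roothash).lower() + b"\n" + contents)
        if is_new:
            self.num_entries += 1
            self._maybe_discard()

    def get(self, si, roothash):
        # Accurate-cache lookup: only a hit if the roothash learned from
        # the servermap update matches the cached version.
        try:
            with open(self._path(si), "rb") as f:
                header, _, contents = f.read().partition(b"\n")
        except OSError:
            return None
        if header == base64.b32encode(roothash).lower():
            return contents
        return None

    def _maybe_discard(self):
        # Random-discard policy: evict arbitrary entries once over budget.
        while self.num_entries > MAX_ENTRIES:
            victim = random.choice(os.listdir(self.cachedir))
            os.remove(os.path.join(self.cachedir, victim))
            self.num_entries -= 1
```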
Note that because the cache is indexed by (SI,roothash), it is an accurate cache: a servermap-update is always performed (incurring one round-trip), and only the retrieve phase is bypassed upon a cache hit. This only helps retrieves of directories too large to fit in the initial read that the servermap-update performs (since there is already a small share-cache that holds these reads), which probably means 6 or more children per directory. These not-so-small directories could be fetched in a single round-trip instead of two.
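To show where the accurate cache would sit in the retrieve path, here's a hypothetical flow against the sketch above; `update_servermap`, `best_roothash`, `do_retrieve`, and `storage_index` are assumed names, not the real mutable-file API:

```python
def cached_retrieve(node, cache):
    # The servermap update is always performed (one round-trip), so the
    # cache never serves data for a roothash the grid no longer agrees on.
    servermap = node.update_servermap()        # assumed API: first RTT
    roothash = servermap.best_roothash()       # assumed API
    contents = cache.get(node.storage_index, roothash)
    if contents is not None:
        return contents                        # hit: retrieve phase bypassed
    contents = node.do_retrieve(servermap)     # miss: the second round-trip
    cache.put(node.storage_index, roothash, contents)
    return contents
```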
If we allowed the cache to be indexed by just SI (or if we were to introduce a separate cache that mapped from SI to somewhat-current-roothash), it would be an inaccurate cache, implementing a tradeoff between up-to-dateness and performance. In this mode, the node would be allowed to cache the state of a directory for some amount of time. We'd get zero-RTT retrieves for some directories, but we'd also run the risk of not noticing updates that someone else had made.
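The inaccurate variant might look like this, again against the hypothetical cache above: accept an entry by SI alone as long as its file is younger than some `max_age`, skipping even the servermap round-trip:

```python
import os
import time

def get_possibly_stale(cache, si, max_age=60.0):
    # Zero-RTT lookup: trust any cached version younger than max_age
    # seconds, at the cost of possibly missing someone else's update.
    path = cache._path(si)
    try:
        if time.time() - os.path.getmtime(path) > max_age:
            return None  # too old: fall back to servermap update + retrieve
        with open(path, "rb") as f:
            _roothash, _, contents = f.read().partition(b"\n")
        return contents
    except OSError:
        return None
```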
If you like this ticket, you might also like #606 (backupdb: add directory cache), #316 (add caching to tahoe proper?), and #300 (macfuse: need some sort of caching).
See also #1045 (Memory leak during massive file upload (or download) through SFTP frontend).
It appears that this problem is due to the current design of the ResponseCache in source:allmydata/mutable/common.py, and might be solved by replacing that cache (which stores share responses) with a mutable-file cache as described in this ticket.