macfuse: need some sort of caching #300
So, doing some initial experiments with macfuse and the Python FUSE bindings, it seems that the simple act of viewing a directory in the Finder generates a large number of calls through the FUSE API.
I ran a stub (loopback) filesystem with instrumentation of each FUSE call and opened a directory or two containing only a few files (having also tested a much larger directory and seen correspondingly larger numbers of calls). The tool inserted a 100ms delay before answering each call, which explains the spacing of the calls over time.
See the attached log.
Attachment tfuse.log (37981 bytes) added: log of FUSE calls.
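For illustration only, here is a minimal sketch of the kind of instrumented loopback filesystem described above, written against the fusepy bindings. The ticket's experiment used "python fuse bindings" (probably a different package), and nothing here — the script name, class name, or choice of fusepy — comes from the ticket or the attached log.

```python
#!/usr/bin/env python
"""Instrumented loopback filesystem (hypothetical sketch, fusepy bindings).

Each FUSE call is printed with a timestamp and answered after a 100ms
delay, mirroring the experiment described in this comment."""

import errno
import os
import sys
import time

from fuse import FUSE, FuseOSError, Operations  # fusepy


class InstrumentedLoopback(Operations):
    def __init__(self, root):
        self.root = root

    def _full(self, path):
        return os.path.join(self.root, path.lstrip("/"))

    def _trace(self, call, path):
        print("%.3f %s(%s)" % (time.time(), call, path))
        time.sleep(0.100)  # artificial delay, so call spacing shows up in the log

    def access(self, path, mode):
        self._trace("access", path)
        if not os.access(self._full(path), mode):
            raise FuseOSError(errno.EACCES)

    def getattr(self, path, fh=None):
        self._trace("getattr", path)
        st = os.lstat(self._full(path))
        return dict((key, getattr(st, key)) for key in (
            "st_atime", "st_ctime", "st_gid", "st_mode",
            "st_mtime", "st_nlink", "st_size", "st_uid"))

    def readdir(self, path, fh):
        self._trace("readdir", path)
        return [".", ".."] + os.listdir(self._full(path))

    def statfs(self, path):
        self._trace("statfs", path)
        stv = os.statvfs(self._full(path))
        return dict((key, getattr(stv, key)) for key in (
            "f_bavail", "f_bfree", "f_blocks", "f_bsize",
            "f_frsize", "f_namemax"))


if __name__ == "__main__":
    # usage: python tfuse_sketch.py <source-dir> <mountpoint>
    FUSE(InstrumentedLoopback(sys.argv[1]), sys.argv[2], foreground=True)
```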
So if I'm reading that log right, when the finder looks in a directory, it
makes the following calls:
about 102 calls to access(DIR)
14 calls to getattr(DIR)
3 calls to getattr(.DS_Store)
1 call to getattr(.hidden)
1 call to readdir(DIR)
21 calls to statfs()
for FILE in DIR:
    24 calls to access(FILE)
    6 calls to getattr(FILE)
    12 calls to access(FILE.swp)
    3 calls to getattr(FILE.swp)
And displaying that 5-file directory resulted in about 330 system calls.
Impressive! :-)
It sounds like everything except statfs() can be handled with the data from a
single dirnode, so caching it long enough to make sure that this batch of
330-ish calls can be fed with a single Tahoe dirnode fetch is an important
goal. We have a few numbers to suggest how long it takes to perform this
fetch:
http://allmydata.org/tahoe-figleaf-graph/hanford.allmydata.com-tahoe_speedstats_delay_SSK.html
suggests that it takes about 70ms for a Tahoe node to retrieve a small
mutable file over a DSL line. There will be some extra delay if we include web API time, or use more servers than those in our speed-net test, but I believe that any given directory should be fully retrievable in under a second.
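As a rough way to check that fetch cost for yourself, the dirnode contents can be timed through the Tahoe web API. This sketch is not part of the ticket: it assumes a local gateway on the default webport (3456), uses a placeholder dircap, and the JSON shape noted in the comment follows the documented `?t=json` directory listing.

```python
"""Rough timing of one dirnode fetch via the Tahoe-LAFS web API (sketch only)."""

import json
import time
import urllib.request

GATEWAY = "http://127.0.0.1:3456"   # assumed default local gateway
DIRCAP = "URI:DIR2:..."             # placeholder; substitute a real directory cap

start = time.time()
with urllib.request.urlopen("%s/uri/%s?t=json" % (GATEWAY, DIRCAP)) as resp:
    # t=json on a directory returns ["dirnode", {"children": {...}, ...}]
    nodetype, info = json.load(resp)
elapsed = time.time() - start

print("fetched %d children in %.0f ms" % (len(info["children"]), elapsed * 1000))
```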
So we'll need to choose a caching policy based upon the following criteria:
the finder issues a burst of calls that all need the same dirnode contents, in rapid succession
directory-viewing latency should stay closer to 100ms than to the second or so a fresh dirnode fetch takes
The cache entries should expire after some reasonable period of time. Longer
expiration times will produce surprises and frustration when a user updates a
directory on one machine and then fails to see the updates on a different
machine.
If the expiration time is more than a few seconds, the implementation will
require some sort of forced-expiration or local-update in the face of
locally-caused changes to the directory, to make sure you can see the changes
you just made. (If we didn't have caching, we wouldn't need this relatively complicated feature.)
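To make that concrete, here is a minimal sketch of a time-based cache with forced expiration or local update, as discussed above. It is an illustration only: `DirnodeCache`, the `fetch` callback, and the 10-second TTL are hypothetical choices, not anything specified in this ticket or implemented in Tahoe.

```python
import time


class DirnodeCache:
    """Hypothetical time-based dirnode cache (sketch only).

    Entries expire after `ttl` seconds, so a burst of FUSE calls can be fed
    from a single Tahoe dirnode fetch.  Locally-caused changes either evict
    the entry (invalidate) or patch it in place (update), so the user always
    sees the changes they just made."""

    def __init__(self, fetch, ttl=10.0, clock=time.time):
        self.fetch = fetch      # callable: dircap -> children mapping (e.g. via the web API)
        self.ttl = ttl
        self.clock = clock
        self._entries = {}      # dircap -> (fetched_at, children)

    def get(self, dircap):
        now = self.clock()
        entry = self._entries.get(dircap)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]                   # still fresh: serve the whole burst from cache
        children = self.fetch(dircap)         # one network round trip per directory per ttl
        self._entries[dircap] = (now, children)
        return children

    def invalidate(self, dircap):
        """Forced expiration after a locally-caused change to the directory."""
        self._entries.pop(dircap, None)

    def update(self, dircap, children):
        """Alternative: patch the cache with the new contents instead of expiring it."""
        self._entries[dircap] = (self.clock(), children)
```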
My straw-man suggestion is the following:
More data (specifically system-call traces) would be useful on the following cases:
when viewing a deeply nested directory, does the finder do a lot of calls for the ancestor directories? If so, that will increase the pressure to retain cached entries longer.
after a local change (say, creating or renaming a file), is the directory re-read? That will influence the modify-the-cache vs. expire-the-cache design decisions.
If you like this ticket, you might also like #606 (backupdb: add directory cache), #465 (add a mutable-file cache), and #316 (add caching to tahoe proper?).
The direct FUSE support in Tahoe-LAFS was removed in 4f8e3e5ae8fefc01df3177e737d8ce148edd60b9 (2011). The preferred route to a native filesystem-like interface is the SFTP frontend combined with something like sshfs.