Poor performance with large number of files via Windows FUSE? #321
Peter and Fabrice have reported problems with dragging a large folder into
the Windows FUSE frontend. We're still collecting data, but the implication
is that there is a super-linear slowdown somewhere, maybe in the FUSE plugin,
maybe in the local Tahoe node that it connects to. We expect to spend roughly
one second per file right now: our automated perfnet tests show 600ms per
immutable file upload and 300ms per directory update; prodnet has a different
number of servers, but I'd expect the values to be fairly close. Peter says
that one second per file is not enough to explain the slowdowns he is seeing.
We are currently running tests with additional instrumentation to figure out
where this time is being spent.
It's too bad we didn't implement #273 -- "How does tahoe handle lots of simultaneous file-upload tasks?" -- before now. If we had, then we would know already how the Tahoe node itself handles this load.
Err, I mean #173 -- "How does tahoe handle lots of simultaneous file-upload tasks?".
Unfortunately no: the FUSE plugin only gives the tahoe node one task at a time. No parallelism here.
Fine then -- let us add an automated performance measurement that asks "How does tahoe handle lots of sequential file-upload tasks?".
#327 -- "performance measurement of directories"
We've performed some log analysis and identified that the problem is simply
the dirnodes becoming too large. A directory with 353 children consumes
114305 bytes, and at 3-of-10 encoding requires about 400kB to be written on
each update. A 1Mbps SDSL line can deliver about 100kB/s, so it takes about 4
seconds to send out all the shares. The Retrieve that precedes the Publish
takes a third of this time, so it needs 1 or 2 seconds. The total time to
update a dirnode of this size is about 10 seconds; small directories take
about 2 seconds.
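To make the arithmetic explicit, here is a back-of-the-envelope check of those figures (purely illustrative; the dirnode size and the ~100kB/s uplink are the numbers quoted above):

```python
# Rough arithmetic behind the numbers above (illustrative only).
dirnode_bytes = 114305                        # measured size of a 353-child dirnode
k, n = 3, 10                                  # 3-of-10 erasure coding
total_share_bytes = dirnode_bytes * n / k     # ~381 kB pushed per update
uplink_bytes_per_sec = 100 * 1000             # ~100 kB/s on a 1Mbps SDSL line
publish_seconds = total_share_bytes / uplink_bytes_per_sec
print(round(total_share_bytes / 1000), "kB per update,",
      round(publish_seconds, 1), "seconds to push shares")   # ~381 kB, ~3.8 s
```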
One thing that surprised me was that dirnodes are twice as large as I'd
thought: 324 bytes per child. I guess my previous estimates (of 100-150) were
based on a design that we haven't yet implemented, in which we store binary
child caps instead of ASCII ones. So the contents of a dirnode are large enough
to take a non-trivial amount of time to upload. Also note that this means our
1MB limit on SDMF files imposes a roughly 3000-child limit on dirnodes (but
this could be easily raised by allowing larger segments).
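The same sort of arithmetic for the per-child size and the implied child-count ceiling (again just a check of the numbers above):

```python
# Per-child cost and the SDMF-imposed ceiling (illustrative only).
dirnode_bytes = 114305
children = 353
bytes_per_child = dirnode_bytes / children        # ~324 bytes with ASCII caps
sdmf_limit = 1000 * 1000                          # current 1MB SDMF file limit
max_children = sdmf_limit // int(bytes_per_child) # ~3000 children per dirnode
print(int(bytes_per_child), "bytes/child, max", max_children, "children")
```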
There are four things we can do about this.
The most significant is to do fewer dirnode updates. A FUSE plugin (with a
POSIX-like API) doesn't give us any advance notice of how many child
entries are going to be added, so the best we can do is a Nagle-like
algorithm that tries to batch writes together for efficiency. The basic
idea is that when a dirnode update request comes in, start a timer
(perhaps five seconds). Merge in any other update requests that arrive
during that time. When the timer expires, do the actual update. This will
help the lots-of-small-files case as long as the files are fairly small
and upload quickly. In the test we ran (with 1024 byte files), this would
probably have reduced the number of dirnode updates by a factor of 5.
The biggest problem is that this can't be done completely safely: it
requires lying to the close() call and pretending that the child has been
added when it actually hasn't. We could recover some safety by adding a
flush() or sync() call of some sort, and not returning from it until all
the pending Nagle timers have been fired early and their updates have completed.
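A minimal sketch of what that Nagle-like batching could look like, assuming some batched add-children call along the lines of the set_uris() interface mentioned later in this ticket; the class and method signatures here are illustrative, not the real IDirectoryNode API, and error handling is omitted:

```python
from twisted.internet import reactor, defer

class BatchingDirnodeUpdater:
    """Nagle-like batching sketch: collect add-child requests for a few
    seconds, then publish them all in a single dirnode update."""

    def __init__(self, dirnode, delay=5.0):
        self.dirnode = dirnode      # assumed to offer a batched set_uris()
        self.delay = delay
        self.pending = {}           # child name -> URI, merged until flush
        self.waiters = []           # Deferreds fired once the batch publishes
        self.timer = None

    def add_child(self, name, uri):
        """Queue a child. NOTE: the returned Deferred fires only after the
        batch publishes, but close() callers will have returned earlier."""
        self.pending[name] = uri
        d = defer.Deferred()
        self.waiters.append(d)
        if self.timer is None:
            self.timer = reactor.callLater(self.delay, self._flush)
        return d

    def flush_now(self):
        """Accelerate the timer, e.g. to honor an explicit flush()/sync()."""
        if self.timer is not None:
            self.timer.cancel()
            self._flush()

    def _flush(self):
        self.timer = None
        batch, self.pending = self.pending, {}
        waiters, self.waiters = self.waiters, []
        d = self.dirnode.set_uris(sorted(batch.items()))  # illustrative call
        def _fire(res):
            for w in waiters:
                w.callback(res)
            return res
        d.addCallback(_fire)
```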
Make dirnodes smaller. DSA-based mutable files (#217) and packing binary caps
into dirnodes (no ticket yet) would cut the per-child size roughly in half
(assuming I'm remembering my numbers correctly). Once dirnodes grow large
enough that the per-child entries dominate the fixed overhead (about 2kB,
so roughly 6 entries), this will cut about 50% off the large-dirnode update time.
We discovered an unnecessary retrieve during the directory-update
process. We need to update the API (#328) to remove it and provide the
safe-update semantics that were intended. Fixing this would shave about
10%-15% off the time needed to do a dirnode update (both large and small).
Serializing the directory contents (including encrypting the writecaps)
took 500ms for 353 entries. The dirnode could cache and reuse the
encrypted strings instead of generating new ones each time. This might
save about 5% of the large-dirnode update time. Ticket #329 describes
this.
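For the record, the caching idea in #329 might look something like the following sketch; the injected encryption routine is purely a stand-in for the real dirnode serialization code:

```python
class WritecapEncryptionCache:
    """Cache the encrypted form of each child's writecap so that unchanged
    children are not re-encrypted on every dirnode serialization (sketch)."""

    def __init__(self, encrypt):
        self._encrypt = encrypt   # stand-in for the real cap-encryption routine
        self._cache = {}          # (name, rwcap) -> encrypted string

    def encrypted_writecap(self, name, rwcap):
        key = (name, rwcap)
        if key not in self._cache:
            self._cache[key] = self._encrypt(name, rwcap)
        return self._cache[key]
```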
Zooko has started work on reducing the number of dirnode updates by adding an HTTP
interface to IDirectoryNode.set_uris() (allowing the HTTP client to add
multiple children at once). Mike is going to make the winFUSE plugin split the
upload process into separate upload-file-get-URI and dirnode-add-child
phases, which will make it possible for him to implement the Nagle-like timer
and batch the updates.
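From the plugin's side, the split might look roughly like this. "PUT /uri" is the WAPI's unlinked-upload endpoint; the batched add-children request is the new interface Zooko is working on, so its URL and payload shape below are assumptions for illustration only:

```python
import requests

NODE = "http://127.0.0.1:3456"   # local tahoe node's web gateway (assumed)

def upload_file_get_uri(data):
    # Phase 1: upload the file body on its own; the response body is the
    # new file's URI/cap.
    resp = requests.put(NODE + "/uri", data=data)
    resp.raise_for_status()
    return resp.text.strip()

def add_children(dir_cap, children):
    # Phase 2: attach many (name -> cap) entries in one dirnode update.
    # Endpoint name and JSON shape are illustrative, not the final API.
    resp = requests.post(
        NODE + "/uri/%s?t=set_children" % dir_cap,
        json={name: ["filenode", {"ro_uri": cap}] for name, cap in children},
    )
    resp.raise_for_status()
```

With that split, the plugin can upload several files, collect their caps, and issue a single add_children() call when the Nagle-like timer fires.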
Attachment NOTES (2195 bytes) added
some timing notes from our logfile analysis
Oh, we also noticed a large number of t=json queries being submitted by the
winFUSE plugin. At the beginning of the test (when the directory only had a
few entries, and updates took about 3 seconds), we were seeing about 5 such
queries per child entry. All of these queries require a directory fetch, and
most resulted in a 404 because the target filename wasn't present in the
directory. When dirnode updates started taking longer (10 seconds), we saw
fewer of these per update (maybe 1).
Early in the test, these queries took 210ms each; by the end of the test they
were taking one or two seconds each. This might represent 15%-30% of the time spent
doing the dirnode updates.
The plugin should do fewer of these queries: they are consuming network
bandwidth and slowing down the directory update. If it is doing them to see
if the file has been added to the directory yet, then it would be far more
efficient to simply wait for the response to the PUT call. If they are being
done for some other reason, then we should consider some sort of read cache
to reduce their impact.
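If a read cache does turn out to be the answer, a minimal sketch might look like this; the TTL, cache key, and invalidation policy here are all assumptions, not a design decision:

```python
import time

class DirnodeReadCache:
    """Short-lived cache of directory listings so repeated t=json probes
    of the same directory don't each trigger a full dirnode fetch (sketch)."""

    def __init__(self, fetch_listing, ttl=10.0):
        self._fetch = fetch_listing   # callable: dircap -> listing (e.g. via t=json)
        self._ttl = ttl
        self._cache = {}              # dircap -> (timestamp, listing)

    def get(self, dircap):
        now = time.time()
        hit = self._cache.get(dircap)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        listing = self._fetch(dircap)
        self._cache[dircap] = (now, listing)
        return listing

    def invalidate(self, dircap):
        # Call this after the plugin itself modifies the directory.
        self._cache.pop(dircap, None)
```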
MikeB: is this issue handled Well Enough for v0.9.0 now?
This issue is somewhat improved, and is hereby considered Good Enough for allmydata.org "Tahoe" v0.9.0.
(Further performance tuning might be applied before the Allmydata.com 3.0 product release, but that can be done after the allmydata.org "Tahoe" v0.9.0 release.)