leasedb: performance regression #1870
The 1818-leasedb branch has a performance regression that shows up when running the test suite -- in fact, the test suite cannot pass at the moment purely due to timeouts.
Since the regression does not show up when using make tmpfstest (which uses a memory-based tmpfs filesystem rather than disk), my tentative conclusion is that it is due to the latency of leasedb database syncs. There are currently many redundant syncs, both because every SQL query/update is in a separate transaction and because there are more SQL queries and updates than necessary per storage API request. We could also use a more relaxed consistency mode, if that is safe.
Here are my notes about this:
https://tahoe-lafs.org/pipermail/tahoe-dev/2012-December/007877.html
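To make the transaction overhead concrete: with sqlite's default rollback journal, every transaction ends in at least one fsync, so issuing each statement in its own transaction pays that cost per statement. A minimal sketch of the difference, using Python's sqlite3 module (the table and column names are made up for illustration and are not the actual leasedb schema or code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the on-disk leasedb
conn.execute("CREATE TABLE shares (storage_index TEXT, shnum INTEGER, "
             "used_space INTEGER, PRIMARY KEY (storage_index, shnum))")
conn.execute("INSERT INTO shares VALUES (?,?,?)", ("si1", 0, 0))
conn.commit()

# One transaction -- and, on disk, one journal sync -- per statement
# (roughly the per-statement behaviour described above):
conn.execute("UPDATE shares SET used_space=? WHERE storage_index=? AND shnum=?",
             (1024, "si1", 0))
conn.commit()
conn.execute("UPDATE shares SET used_space=? WHERE storage_index=? AND shnum=?",
             (2048, "si1", 0))
conn.commit()

# One transaction -- and one sync -- for the whole batch:
with conn:  # the connection context manager commits once on exit
    conn.execute("UPDATE shares SET used_space=? WHERE storage_index=? AND shnum=?",
                 (4096, "si1", 0))
    conn.execute("DELETE FROM shares WHERE storage_index=? AND shnum=?",
                 ("si1", 0))
```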
Bottom line: I believe we should turn on sqlite's synchronous = NORMAL, journal mode = WAL.
Also, I believe we should fix ticket #1893, which would reduce this load.
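For reference, a minimal sketch of how those pragmas would be applied on a Python sqlite3 connection (not the actual dbutil.py code). With journal_mode=WAL plus synchronous=NORMAL, sqlite only syncs the write-ahead log at checkpoints rather than on every commit, so a crash can lose the most recently committed transactions but should not corrupt the database:

```python
import sqlite3

conn = sqlite3.connect("leasedb.sqlite")  # hypothetical path

# journal_mode is persistent for the database file; the pragma returns the
# mode actually in effect.
print(conn.execute("PRAGMA journal_mode = WAL;").fetchone())  # e.g. ('wal',)

# synchronous is a per-connection setting.
conn.execute("PRAGMA synchronous = NORMAL;")
```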
I thought that mode caused a problem with file handle leakage? Or am I misremembering?
No, I remembered correctly. In current leasedb.py:
Replying to zooko:
I'm skeptical. We don't even know whether most of the increased latency is for mutable or immutable operations, and #1893 would only affect mutable writes. In any case, that can't be the cause of the regression, since trunk has always renewed leases on mutable writes.
Replying to daira (comment:8):
I didn't say that #1893 could be the cause of the regression!
But judging from my comments in https://tahoe-lafs.org/pipermail/tahoe-dev/2012-December/007877.html, it sounds like calls to leasedb might be happening 3X as often as they need to, or even more, so that might have a significant effect on performance.
I've split the file descriptor issue out to #2015.
Replying to zooko (comment:9):
Yes, but most of those calls would happen regardless of #1893, I think.
Now that #2015 is fixed, I think this no longer blocks merging the cloud branch, even though I would very much like to reduce or eliminate the remaining performance regression before merging.
I just ran all unit tests of trunk, and of 1819-cloud-merge-opensource, and the latter took a lot longer:
My next question was: is the 1819-cloud-merge-opensource branch taking longer to do the same tests as the master branch? Or is it that the extra tests on the new branch (1185 vs. 1139 on master) are what is making the test suite take longer? The answer is that it is at least partially that existing tests are slower, for example:
Next, I extended fdleakfinder to print out a histogram of how many times each filename was opened, and ran that on allmydata.test.test_cli.Backup.test_backup. This showed that the 1819 branch is not opening sqlite db files very often in this test (just once each), but that it is opening the share files a lot more often:
Interesting. Is it possible to identify where in the code the extra file opens are?
I'm not sure if this will help, but I ran the unit tests on master and 1819-cloud-merge-opensource and calculated the difference in time for all shared tests. Here are all the tests which are more than five seconds slower on the current cloud branch:
You can view the rest of the data I collected here: https://gist.github.com/markberger/136765c5e87480b8017e
Replying to zooko:
I'm not sure if this is relevant, but that test runs significantly faster on the cloud branch:
Edit: I posted the wrong test data.
Replying to zooko:
According to data I collected, the extra tests on 1819-cloud-merge-opensource take a total of 232.95s to execute, and the tests that both branches have in common account for a difference of about 628s.
markberger: can you test the hypothesis that the main problem is an increase in the latency of filesystem operations? To do that, repeat the comparison of trunk with 1819-cloud-merge-opensource, but using make tmpfstest for both.
After running the tests again using make tmpfstest, the increased latency does seem to be the largest factor in the regression. The 22s regression appears to occur in mutable update:
The rest of the tests have insignificant differences.
Thanks, that's very helpful.
I'm not too worried about the 22s regression. It's the rest of the difference shown in comment:90360 that we need to focus on, by reducing the number of syncs.
I was getting confused by the different measurements (mine and markberger's) of the timings of different test runs (comment:90360, comment:90361, comment:90364, comment:20, comment:21, comment:23).
So, I tried to reproduce markberger's observation from comment:90364 that the test case which takes the most additional time on the cloud branch is allmydata.test.test_cli.Cp.test_copy_using_filecap.
I also observe that this test takes about twice as long on the cloud branch as on trunk:
So then I ran it under strace, as instructed by the fdleakfinder docs. It took about 64 seconds to run this test under strace on the branch:
… and about 41 seconds to do it on trunk:
I ran fdleakfinder on the strace files. It says that on trunk 6499 files were opened during the course of the test, and on branch 13,999 files were opened.
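(For reference, a rough sketch of this kind of per-filename open count -- this is not fdleakfinder itself, and the open()/openat() line format below is only an assumption about what an strace -f log contains:)

```python
import re
import sys
from collections import Counter

# Count how many times each filename is opened in an strace log, given lines
# of the form:  open("/some/path", O_RDONLY) = 7   or  openat(AT_FDCWD, "/some/path", ...)
OPEN_RE = re.compile(r'\bopen(?:at)?\((?:[^,"]*,\s*)?"([^"]+)"')

counts = Counter()
with open(sys.argv[1]) as f:
    for line in f:
        m = OPEN_RE.search(line)
        if m:
            counts[m.group(1)] += 1

# Print the 20 most frequently opened paths.
for path, n in counts.most_common(20):
    print("%6d  %s" % (n, path))
```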
From eyeballing fdleakfinder's output, I see one kind of file that is used a lot on the branch and not at all on trunk. There are 6520 uniquely-named tempfiles located in /var/tmp used on the branch and 0 on trunk. Each one is opened only once on the branch. Here's an excerpt from fdleakfinder's report:
Here's the count:
A search engine query for "sqlite /var/tmp/" took me to this thread, in which sqlite's author Richard Hipp seems to be saying that sqlite uses these temp files when computing a SQL query that could require too much RAM, and that the files in /var/tmp hold a temporary table that sqlite generates to compute such a query: http://www.mail-archive.com/sqlite-users@sqlite.org/msg73481.html
The example in the sqlite mailing list discussion is the "UNION ALL" query, which we don't use. But apparently some SQL query that we are using is causing sqlite to generate a temporary table.
Eek. If there are queries using space linear in the number of shares stored, that won't scale at all.
I wonder which queries are creating the temporary tables? I'll see if I can find out.
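One way to poke at that is to ask sqlite for a query plan and look for "USE TEMP B-TREE" in the output. This is only a sketch with an invented query (and, as the later comments suggest, the files here turn out to be statement journals, which EXPLAIN QUERY PLAN does not report):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shares (storage_index TEXT, shnum INTEGER, "
             "used_space INTEGER, PRIMARY KEY (storage_index, shnum))")

# A query ordering by an unindexed column reports something like
# 'USE TEMP B-TREE FOR ORDER BY' in its plan; lookups keyed on the
# primary key should not.
for row in conn.execute("EXPLAIN QUERY PLAN "
                        "SELECT storage_index FROM shares ORDER BY used_space"):
    print(row)
```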
Only the following queries are used in that test:
(found by instrumenting leasedb.py and piping the output through |sort |uniq).
These are not queries that I would expect to use temporary tables, given that (storage_index, shnum) is the primary key on shares.
The temporary files are opened during INSERT OR REPLACE INTO and UPDATE queries. They are not opened during SELECT queries.
This is informative: http://www.sqlite.org/tempfiles.html
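The queries themselves are elided above; purely as hypothetical stand-ins with the same shape -- single-row statements keyed on the (storage_index, shnum) primary key -- they would look something like the following (column names other than the key are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shares (storage_index TEXT, shnum INTEGER, "
             "used_space INTEGER, state INTEGER, "
             "PRIMARY KEY (storage_index, shnum))")

# Single-row upsert keyed on the primary key:
conn.execute("INSERT OR REPLACE INTO shares VALUES (?,?,?,?)",
             ("si1", 0, 1024, 1))

# Single-row update; the WHERE clause pins down the full primary key:
conn.execute("UPDATE shares SET used_space=? "
             "WHERE storage_index=? AND shnum=?",
             (2048, "si1", 0))
conn.commit()
```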
Aha, I believe these are Statement Journal Files (section 2.5 of http://www.sqlite.org/tempfiles.html).
What's confusing is that "A statement journal is only created for an UPDATE or INSERT statement that might change multiple rows of a database [...]".
INSERT OR REPLACE INTO queries by definition never affect multiple rows. The UPDATE statement in comment:90372 cannot update multiple rows because the WHERE clause specifies exact values for storage_index and shnum, which together form the primary key, although I can see why sqlite might not "know" that in advance.
I added this to dbutil.py, and the strace output no longer shows any files in /var/tmp being opened:
1819-cloud-merge branch with PRAGMA temp_store = MEMORY:
and without:
So a small improvement (1 minute on the full test suite) but nothing spectacular :-(
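The dbutil.py change itself isn't shown above; as a rough sketch of the idea (assuming the pragma is simply issued once per connection), temp_store = MEMORY tells sqlite to keep its temporary structures -- which, per the observation above, includes whatever it was writing under /var/tmp -- in RAM:

```python
import sqlite3

def open_leasedb(path):
    # Hypothetical helper, not the actual dbutil.py code: open the database
    # and keep sqlite's temporary tables/journals in memory rather than in
    # files under /var/tmp.
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA temp_store = MEMORY;")
    return conn

conn = open_leasedb(":memory:")
```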
As we noticed in the dev-chat today, Cp.test_copy_using_filecap is mistakenly copying its entire basedir into the tahoe filesystem (which includes all 10 server basedirs, and their shares, which are being created during the cp by the files being copied in... it's remarkable that the process terminates at all). This resulted in something like 174 shares being created (per server), instead of 4. Fixing this doesn't address the overall regression, but does explain why this one test was taking 17 seconds even on trunk.
OTOH, leaving that test broken while we figure out the performance regression provides us with a convenient lots-of-shares stress test that's highly responsive to changes in the leasedb times. I'd much rather iterate on a 30-ish second test than on the full test suite.
Replying to warner:
Fixed in #2048.
That's fine, we'll just not pull that fix onto 1819-cloud-merge[-opensource] yet.
Actually we shouldn't pay any attention to the unfixed version of Cp.test_copy_using_filecap, because it would be expected to have different behaviour on trunk vs the cloud branch anyway, due to the different number of files in the base directory. So that was a red herring, although I'm glad we fixed it.
If we want a stress test, allmydata.test.test_mutable.Update.test_replace_locations_max_shares is pretty stressful :-p
Milestone renamed
renaming milestone
Moving open issues out of closed milestones.
Ticket retargeted after milestone closed