dirnodes could cache encrypted/serialized entries for speed #329
Labels
No Label
0.2.0
0.3.0
0.4.0
0.5.0
0.5.1
0.6.0
0.6.1
0.7.0
0.8.0
0.9.0
1.0.0
1.1.0
1.10.0
1.10.1
1.10.2
1.10a2
1.11.0
1.12.0
1.12.1
1.13.0
1.14.0
1.15.0
1.15.1
1.2.0
1.3.0
1.4.1
1.5.0
1.6.0
1.6.1
1.7.0
1.7.1
1.7β
1.8.0
1.8.1
1.8.2
1.8.3
1.8β
1.9.0
1.9.0-s3branch
1.9.0a1
1.9.0a2
1.9.0b1
1.9.1
1.9.2
1.9.2a1
LeastAuthority.com automation
blocker
cannot reproduce
cloud-branch
code
code-dirnodes
code-encoding
code-frontend
code-frontend-cli
code-frontend-ftp-sftp
code-frontend-magic-folder
code-frontend-web
code-mutable
code-network
code-nodeadmin
code-peerselection
code-storage
contrib
critical
defect
dev-infrastructure
documentation
duplicate
enhancement
fixed
invalid
major
minor
n/a
normal
operational
packaging
somebody else's problem
supercritical
task
trivial
unknown
was already fixed
website
wontfix
worksforme
No Milestone
No Assignees
3 Participants
Notifications
Due Date
No due date set.
Reference: tahoe-lafs/trac-2024-07-25#329
Loading…
Reference in New Issue
Block a user
No description provided.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
We found that for large directories (353 entries), it takes a non-trivial
amount of time to serialize the child entries into a single string: 500ms.
This currently represents about 5% of the time necessary to update the
directory, and might grow to 10% once we make some other performance
improvements.
I suspect (but do not have confirmation) that most of this time is spent
encrypting the child write-caps. We do a couple of hashes, an HMAC, and a
short AES encryption for each one.
Since most updates only modify one row, we could consider caching the
serialized forms and re-using them during update. Basically _unpack_contents
would return a dict mapping childname to (name, rocap, rwcap, metadata,
serialized), and all update functions would put a None in the 'serialized'
slot. _pack_contents would use the 'serialized' slot if present, otherwise it
would re-serialize the other parameters.
This is only a minor improvement, so I expect it will be a long time before
we decide to implement it, but I wanted to get the ideas down before we
forgot.
See also #414 (profiling on directory unpacking) and #383 (large directories take a long time to modify).
See also #327 (performance measurement of directories).
Kevan Carstensen expressed interest in working on this.
So, summarizing discussions with zooko and warner...
We first need to handle testing for backwards compatibility. This means that we'll want to capture a base32-encoded representation of the output of _pack_contents as it sits now when run against some small, known directory structure. We'll then use that to test _unpack_contents (to make sure that the unpacked directory structure is what we're expecting), and _pack contents (to make sure that packing the same known directory structure results in the same packed representation).
Then we can do the optimizations that this ticket suggests.
(that didn't really need to be a list, huh.)
I've started work on the first point by writing a little program to try to get the base32 representation of the output of _pack_contents. Unfortunately, I run into path issues when I try to run it.
At the top of my script, I have
to tell the Python interpreter about tahoe's support directory. The goal was to have a somewhat less robust version of the detection that the tahoe executable does when it first starts (since I know where the directory is on my system, and don't imagine that the script is going to be run elsewhere, since it's kind of a one-time deal to make tests work).
When I run the script, and try to import something from the tests directory, I get
which would seem to suggest that my path hacking doesn't do what I want it to.
Is there something easy that I'm missing that would make this work?
(also, my approach for getting this information is based on what of tahoe's code I've read, which isn't much. if there's a more obvious way to get the strings that we need, or if my program is flawed in some other way, please let me know)
Attachment get_dirnode_strings.py (1322 bytes) added
a first shot at making a test program
The way to do that is to set the
PYTHONPATH
environment variable before invoking thepython
interpreter. There are some hacks (mostly due to setuptools, I think) that need to run at interpreter startup time and learn about the available Python packages, so adding to yoursys.path
after interpreter startup won't work.It was a good idea though.
If you can't or don't want to set your
PYTHONPATH
, then you'll need to find thefoolscap
package, probably in/path-to-tahoe/tahoe/support/lib/python2.6/site-packages/foolscap-0.4.2-py2.6.egg
, and add that whole path (including thething.egg
part) to yoursys.path
, and then do likewise with the other packages.That fixed the issue -- thanks.
Just so I'm clear on the next part:
Currently,
_unpack_contents
returns a dictionary mapping the name of each child node to a pair(child node instance, metadata)
, where metadata is a dictionary of metadata._pack_contents
gets therocap
andrwcap
(if present) from the child node instance. What we want to do is_unpack_contents
to return a dict mapping the name of each child node to a triple(child node instance, metadata, serialized)
, where child node instance and metadata are as they are now, and serialized is the serialized from of the other data._pack_contents
to deal with this new representation; specifically, to useserialized
if present instead of serializing the other data, which we think is expensive.Adder
,MetadataSetter
andDeleter
to work with these changes; specifically, they should, when modifying a node, set the correspondingserialized
value to None, which will tell_pack_contents
that it needs to re-serialize that node._pack_contents
to the new_unpack_contents
, we should get the same underlying directory structure back (i.e.: all expected child node names are present, and metadata values are as they should be. Is there anything else we'd want to check?)_pack_contents
should be the same as the output from the old one, given the same directory structure.Adder
,Deleter
, andMetadataSetter
in the same way that we do now. Are there already tests to adequately verify this, or should I plan on writing those, too?(apologies if this is pedantic, but I'm still getting the hang of tahoe, and don't want to go off on an irrelevant tangent because I've made a bad assumption somewhere)
That sounds good to me!
I usually write code like this by adding a new unit test. You can run a
specific unit test with a minimum of fuss (and startup time and extraneous
noise) by doing e.g.
make quicktest TEST=allmydata.test.test_dirnode.DeepStats.test_stats
.That will get all the PYTHONPATH stuff set up for you. Anything you
print
from the unit test will get displayed, so I use this for one-offtools all the time.
Also, you can use
misc/run-with-pythonpath.py
, which will set up theright environment and then run the command of your choice. For example, if
you wrote a
foo.py
to do the stuff you just described, then you couldinvoke
python misc/run-with-pythonpath.py python foo.py
.I'm disappointed that the sys.path technique you tried didn't work.. it used
to.
Your approach to
_pack_contents
sounds great! Note that theserialized
form should contain the wholename+rocap+encrwcap+metadata
string (i.e. be sure to include the name).That way, if
_pack_contents
sees that it has a pre-serialized stringavailable, it can just append that to the list of entries that it's building,
and doesn't need to re-serialize the childname either. At the end of that
loop, it should do a single
"".join(entries)
to perform the finalassembly (rather than doing incremental
+=
operations, which would be alot slower).
We should make sure that the rocap/rwcap is the same too. Doing
child.get_uri() and comparing it against a constant is plenty.
test_dirnode.py should already have test coverage for the modifiers. It
exercises all the
NewDirectoryNode
methods, likeset_uri
andlist
. If they pass, then you've updated the modifier classessuccessfully.
Also, if it works on your platform (it's somewhat touchy right now), use our
test-coverage tools:
make quicktest-figleaf figleaf-output
, and checkto see that all the code you've touched is actually being run. The buildbot
has a link to the current coverage data (generated under py2.4; it seems that
py2.5 gives slightly different answers). I think it should be pretty easy to
make these changes and achieve the same or better coverage than before.
Also, I don't know if it'd be worth it, but you could get fancy and instead
of changing the data structure from a list of two-tuples to a list of
three-tuples, you could:
list
*setitem*
to remove that auxilliary valueget_both_items
method, which takes a key and returns a two-tuple of (aux-serialized, value) (which is really (aux-serialized, (child,metadata)))set_both_items
method which takes (key, aux-serialized, (child,metadata))Adder
andDeleter
don't have to change: they'll set the entries that have changed and leave the rest alone_pack_contents
usesget_both_items
and prefers the pre-serialized form if it's still thereSince
NewDirectoryNode.list
returns whatever_unpack_contents
returns, you might manage to write less code if you use this technique.
Otherwise you'll need to find all the callers of
_read
and update themto tolerate the new three-tuple (and add a wrapper to
list
to convertit into two-tuple form for external callers, so that it continues to honor
the same interface defined in
allmydata.interfaces.IDirectoryNode
).Oh, and of course, it would be a good idea to actually measure the time that serialization/deserialization takes before doing any of this work. Create 10/100/1k/10k/100k entries at random, call
_pack_contents
undertimeit.py
or some other sort of how-long-does-it-take loop, and figure out anA+Bx
curve (I'd expect the serialization time to be some constant A plus some per-entry time B, let's figure out what A and B are). Let's make sure that serialization is the culprit, it might be that call to_create_node
in unpack that we should focus on. And let's set a reasonable goal.. maybe we should be able to unpack+modify+pack 10k entries in less than a second, or something.The measuring part is my job -- #327 (performance measurement of directories).
Okay, I've got tests in place -- I'll attach them in a bit.
One minor nit: I was working on the assumption that
_pack_contents
and_unpack_contents
were inverses -- i.e., if I do_pack_contents(_unpack_contents(string))
, I should getstring
back. I made this a condition in my test function, but it failed --_pack_contents
would return a different string, even when fed with the unmodified dict returned from_unpack_contents
. But all of the data inside the dict seem the same -- at least the rocaps, rwcaps, and metadata. Is this normal?Also, the hardcoding of the base32-encoded representation of that directory tree is kind of horrific -- any ideas for a better way of doing that?
source:src/allmydata/dirnode.py#L201
The reason is that the encryption of the write-cap uses a random IV. There isn't an easy way to make that deterministic and still secure right now, so we can't rely on it for testing.
(Hm, for future reference -- i.e. after Tahoe v1.5.0 -- maybe we could generate the random IV as the secure hash of the write cap itself. That would be make it deterministic without, as far as I can currently see, losing security.)
As to the base32-encoded directory tree, that looks okay to me.
Okay. Given that, do you think that the other tests in
test_unpack_and_pack_behavior()
are going to be enough?I've finished doing a few things.
_pack_contents
and_unpack_contents
intest_dirnode.py
to not check the output of_pack_contents
for equality, since that will always fail at the moment. I've also moved a misplaced comment.CachingDict
, the data structure described by warner above (cool idea, by the way).CachingDict
todirnode.py
. It basically does what warner suggested that it do._pack_contents
,_unpack_contents
, and_create
to useCachingDict
s instead ofdict
s.With these done (tests.txt, dict.txt, optimizations.txt applied),
python setup.py test
passes on my machine.Attachment tests.txt (26728 bytes) added
tests for _unpack_contents, _pack_contents, and CachingDict
Attachment optimizations.txt (22867 bytes) added
changing dirnode.py to use CachingDict
I've looked over your patches and it looks good! I'll work on my benchmarks of dirnode now so I can see what difference it makes to CPU usage.
I noticed when running a test script that zooko had that I was throwing away *args and **kwargs in my CachingDict subclass. I'm attaching a patch that fixes that issue.
Attachment dict.2.txt (19958 bytes) added
(erm, not **kwargs -- typo)
changeset:e414c73877ba6337 adds benchmarking of unpack-followed-by-repack, which is the functionality that this ticket and Kevan's patch is about.
I've run the benchmarks, and am attaching the results.
To get the
unmodifed_results
file, I simply ranbench_dirnode.py
against an unmodified checkout from trunk.To get the
modified_results
file, I needed to change a call todict
inbench_dirnode.py
to refer todirnode.CachingDict
(otherwise_pack_contents
raises anAssertionError
). I'm attaching a patch that duplicates that change. I then applieddict.txt
,optimizations.txt
, andtests.txt
to an otherwise unmodified source tree, and ranbench_dirnode.py
against it.The results seem to favor the optimized implementation.
I'm also updating dict.txt again -- hopefully for the last time -- to fix another *args related issue that I found.
Attachment dict.txt (19947 bytes) added
CachingDict implementation
Attachment bench.txt (20384 bytes) added
benchmark modifications for testing optimized code
Attachment modified_benchmark (1977 bytes) added
Benchmarks for optimized code
Attachment unmodified_benchmark (1977 bytes) added
benchmarks for unoptimized code
could you summarize the benchmark numbers in terms of A+Bx factors? Also, it's not clear to me what those numbers point to as being the expensive part: is it the netstring parsing? or the encryption?
Summarizing into
A+Bx
is not the way I do it. Instead I look at a handful of data points, like this:This is a good example of why I do it this way: because it isn't linear! So any
A+Bx
summary would be off. I already have a patch which fixes the observable non-linearity there. I guess I might as well commit that one so that everyone can see it. There -- changeset:efafcfb91a09b4de, entitled "directories: keep track of your position as you decode netstring after netstring from an input buffer instead of copying the trailing part | This makes decoding linear in the number of netstrings instead of O(N^2^).". After this patch, the same benchmark on the same machine says:As to your question about what's the expensive part, after the optimization patch from Kevan plus a few that I have here in my sandbox, benchmarking shows:
and profiling shows:
Whoops, time for me to go to work! I hope to measure and commit the rest of my optimization patches today after work. There is one that I really want other people's input on before I commit it... That one generates the per-entry IV with a secure hash of the writecap instead of with
os.urandom(16)
.Fixed by changeset:903005a52830ba96 but see also the new #752 (speed up directories more). Thanks, Kevan!
Note that r3971 didn't actually commit the optimization code -- just the dictionary that it uses. If you want that, you should also apply the optimizations.txt patch and the tests.txt patch (which also contains tests for CachingDict).
Oh how embarassing that I thought that was the optimization patch. I was actually going to ask you what you thought about the mysterious fact that changeset:903005a52830ba96 seems to moderately speed up pack and unpack but not so much unpack-and-repack!
Now that I've applied your actual optimization patch, time to unpack and repack a 4096-entry directory dropped substantially. However, time to pack and time to unpack also changed, which suggests that my benchmark script is accidentally using your cache, or that extraneous factors on my system are screwing up the results or something.
Before
optimizations.txt
:after
optimizations.txt
:I've applied your patches plus one from me to fix a conflict between one of my optimizations and one of yours.
The patch to cache entries was and required follow-up merge patch changeset:34213cd2c70246f3.
At around the same time other patches to optimize directory processing were also committed: changeset:efafcfb91a09b4de (the big one that makes processing linear instead of O(N^2^)), changeset:c0d1e7deaec145d6, changeset:786ed012b3510135 (which required urgent security fix follow-up changeset:c1d5717cf0ecd68f.