warn users about the performance issues of mutable files #878
Reference: tahoe-lafs/trac-2024-07-25#878
Performance issues:
Currently, new users can carefully read the Tahoe-LAFS docs and still decide to use mutable files without being aware of these issues. To close this ticket, fix that.
Here's the thread where new user Jody Harris made it clear that a new user who does read the docs still doesn't learn about these issues: http://allmydata.org/pipermail/tahoe-dev/2010-January/003478.html
I'll take care of this.
I added the documentation to known_issues.txt, since there are proposals and tickets open that hope to fix this (which would seem to imply that it is a known issue).
Thoughts? Things that should be there but aren't?
After reading a message (http://allmydata.org/pipermail/tahoe-dev/2010-January/003488.html) on tahoe-dev, I realized that I had misunderstood mutable file modification when writing my first patch. While the process I described was accurate for certain operations (specifically directory modification), it didn't apply to file creation using the CLI or the WUI, the places where users would be creating mutable files, and the places where the warning would be relevant. I'm attaching a reworded patch that fixes this issue.
This ticket is a subset of #757 (there isn't a doc that says "which operations are efficient").
FWIW here are measurements of how many CPU cycles are needed to generate an RSA 2048 bit key: http://bench.cr.yp.to/results-sign.html (the ones labelled "ronald2048"). That is not measuring the same implementation of RSA as the one we use, but it is a good benchmark to show that generating RSA keys is expensive.
(http://allmydata.org/trac/tahoe/attachment/ticket/878/mutable_docs.txt#L21) :
"will be invalidated if the file is modified" -> "would be invalidated if the file were modified".
"tahoe-lafs" -> "Tahoe-LAFS" (three times)
While "billions of CPU cycles" is technically accurate, it would be more meaningful to users to say "perhaps an entire second on a desktop PC" (and maybe a parenthetical remark about small ARM boxes). We don't want to scare them away from using directories altogether, just help them understand why a loop that creates a million directories might take a million seconds.
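To make the cycles-versus-seconds translation concrete, here is a rough back-of-the-envelope sketch. The figures (2 billion cycles per directory creation, a 2 GHz core) are illustrative assumptions, not measurements of Tahoe-LAFS, and the function names are made up for this example.

```python
def cycles_to_seconds(cycles, clock_hz):
    """Convert a CPU cycle count to seconds at a given clock rate."""
    return cycles / clock_hz


def loop_cost_seconds(n_operations, seconds_per_operation):
    """Total time for a loop that repeats one fixed-cost operation."""
    return n_operations * seconds_per_operation


# Assumed, illustrative figures: about 2e9 cycles per directory creation
# (dominated by RSA key generation) on a 2 GHz core.
per_directory = cycles_to_seconds(2e9, 2e9)           # 1.0 second
total = loop_cost_seconds(1_000_000, per_directory)   # 1,000,000 seconds

print(f"{per_directory:.1f} s per directory, "
      f"{total / 86400:.1f} days for a million directories")
```

Under those assumptions, a million directories works out to roughly eleven and a half days, which is the kind of number a user can act on.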
Also, I believe the motivation for this ticket was specifically about large mutable files, so I'd emphasize the unfortunate-and-we-haven't-fixed-with-MDMF performance aspects (i.e. the cost=O(filesize) parts) rather than the unfortunate-and-we-haven't-fixed-with-ECDSA aspects (like the constant cost of creating new mutable files).
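A small sketch of why the cost=O(filesize) part hurts for large mutable files: if every modification costs work proportional to the whole file, then many small edits to a big file add up quickly. The function below is a hypothetical model for illustration only; it ignores encoding overhead and is not taken from the Tahoe-LAFS code.

```python
def bytes_rewritten(file_size_bytes, n_updates):
    """Model: each update, however small, costs work proportional to the
    whole file, so total work grows as file_size * n_updates."""
    return file_size_bytes * n_updates


MIB = 2 ** 20

# Illustrative numbers: ten one-byte edits to a 100 MiB mutable file.
total = bytes_rewritten(100 * MIB, 10)
print(f"{total // MIB} MiB processed for 10 small edits")
```

In this model, ten tiny edits to a 100 MiB file process 1000 MiB in total, regardless of how small each edit is.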
For Jody Harris, seconds elapsed on today's average PC might be more useful (or maybe not -- perhaps he prefers CPU cycles), but for Jonathan Ellis (the bug reporter of #757) CPU cycles are probably more useful. Also, I wonder about people who are running their Tahoe-LAFS gateway on a virtual machine. Would seconds-on-an-average-modern-CPU significantly underestimate the cost to them?
Like I said, "billions of CPU cycles" is more accurate (and more universal), but I think the most likely audience for this document will be well-served by having at least one human-meaningful unit of measure in there somewhere, even if only anecdotally. For example, I tell people that the unit tests currently take about 240s on my 2008-era laptop, and I tell them that "tahoe mkdir" takes about 800ms on the same machine. And I expect that people will know how their own hardware compares to a reference point like that. Let's not refuse to offer them a translation hint just because we can't give them an exact number of seconds for their particular hardware.
I'm updating the patch to include David-Sarah's suggestions. Thanks for the feedback. :)
zooko and I were talking in IRC, and concluded that the explanation of why RSA is used with mutable files is inappropriate for known_issues.txt. I'll remove it when I work on the cycles versus seconds issue.
I think I agree with Brian.
Without a meaningful human figure to put "billions of CPU cycles" into perspective, that paragraph is a tad scarier than it needs to be. My first instinct when reading this exchange was to try to work both figures in there, but the point of that paragraph seems a lot clearer with only seconds than with both cycles and seconds.
I moved the explanation of mutable file performance issues to docs/performance.txt, because that seemed like a more appropriate place for it.
Attachment mutable_docs.txt (35048 bytes) added
mutable file documentation
Looks good to me.
Applied as changeset:26c6b806d7922da1. Thank you!