show sizes in unambiguous way that doesn't get mistaken for different units #964
Reference: tahoe-lafs/trac-2024-07-25#964
When setting up a storage node, it took me a long time to figure out why the storage wasn't respecting my setting of 15GB of reserved space on the drive. I finally realized that, like hard drive manufacturers but unlike the rest of the planet, Tahoe counts sizes in base-10, not base-2: kilobytes are 1000 bytes, not 1024, and so on. This makes Tahoe's reports dramatically different from those of, say, `df -h`, and thus creates confusion.
A good way out of this mess is to spell it out explicitly: write "2^30^" or "10^9^" or "one billion".
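A small Python sketch makes the size of the gap concrete. It assumes, as the report above describes, that Tahoe reads "15GB" as 15 × 10^9^ bytes, and that `df -h` reports in powers of 1024 (true of GNU coreutils `df`; its `-H` flag uses powers of 1000). The helper name is hypothetical and not part of Tahoe-LAFS.

```python
# Hypothetical helper for illustration; not part of Tahoe-LAFS.
GB = 10**9   # SI gigabyte (base-10)
GiB = 2**30  # IEC gibibyte (base-2), the unit GNU `df -h` labels "G"


def gb_to_gib(gigabytes: float) -> float:
    """Convert a size given in base-10 gigabytes to base-2 gibibytes."""
    return gigabytes * GB / GiB


reserved_gb = 15  # the "15GB" reservation from the report above
print(f"{reserved_gb} GB = {reserved_gb * GB:,} bytes = {gb_to_gib(reserved_gb):.2f} GiB")
# prints: 15 GB = 15,000,000,000 bytes = 13.97 GiB
```

So a 15GB reservation corresponds to roughly 14G as `df -h` displays it.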
See also:
http://en.wikipedia.org/wiki/Binary_prefix
Hitherto I believe we've been using "GiB" to mean 2^30^ (per http://en.wikipedia.org/wiki/Binary_prefix ) and we may have sometimes been using "GB" to mean 10^9^. That latter usage, while technically correct, and accurate and meaningful to the vast majority of users (who do base-10 arithmetic in their heads but not base-2 arithmetic), is confusing to hackers like USSJoin. So we should avoid it, possibly by replacing uses of "GB" with "10^9^".
Alternatively, you could simply use GiB consistently everywhere (for instance, on the storage information page) and express the units in GiB, not 10^9^. That way people don't have to go "what's a 10^9^?" when looking at information pages.
We generally prefer base-10 arithmetic, because it is easier for users to use. For example, if you ask my mom how many 20-byte things she can store in a 2-terabyte bucket, she'll probably ask "What's a terabyte?", and if you tell her it is a trillion bytes, she'll say "Then I can store 100 billion of them.". If instead you tell her that it is 2^40^ bytes, then she'll either have to get out a calculator or she'll just give up.
In fact, I strongly suspect that a similar problem applies to computer hackers as well as to moms. Quick, how many 20-byte things can you store in a bucket of size 2^41^ bytes (two TiB)? I think it will take you longer to answer that question than it would take my mom to answer the base-10 variant of the question.
If you answered "about a hundred billion of them" then your answer was 10% off!
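The arithmetic behind both versions of the question, as a short Python check (integer division, so a fractional leftover object is dropped):

```python
# Base-10 question: how many 20-byte things fit in 2 TB (2 * 10**12 bytes)?
base10_count = 2 * 10**12 // 20
# Base-2 question: how many fit in 2 TiB (2**41 bytes)?
base2_count = 2**41 // 20

print(f"{base10_count:,}")  # 100,000,000,000
print(f"{base2_count:,}")   # 109,951,162,777
# How far the true base-2 count exceeds the "about a hundred billion" guess:
print(f"{base2_count / 10**11 - 1:.2%}")  # 9.95%
```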
Back when our buckets were on the order of thousands of elements in size, the approximation of 2^10^≅10^3^ was only 2.5% off. The approximation of 2^20^≅10^6^ is 5% off, 2^30^≅10^9^ is 7.5% off, and 2^40^≅10^12^ is 10% off.
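The percentages above are rounded; the exact relative errors of approximating 2^10k^ by 10^3k^ can be computed directly. A minimal Python sketch (the function name is illustrative):

```python
def prefix_error(k: int) -> float:
    """Relative error of approximating 2**(10*k) by 10**(3*k)."""
    return 2 ** (10 * k) / 10 ** (3 * k) - 1


for k, name in enumerate(["kilo", "mega", "giga", "tera"], start=1):
    print(f"2^{10 * k} vs 10^{3 * k} ({name}): {prefix_error(k):.2%}")
```

This prints 2.40%, 4.86%, 7.37% and 9.95%, slightly below the rounded 2.5%, 5%, 7.5% and 10%; the error compounds by about 2.4% with each step up in prefix.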
By the way, Apple products now report file sizes and filesystem spaces in base-10: http://support.apple.com/kb/TS2419
We're with USSJoin on this. Resist the hard disk manufacturers' conspiracy.
Replying to zooko:
This is not a convincing argument, since you're more likely to need to know how many 1 MiB files, say, can be stored in a terabyte. (BTW, your mom's estimate would be wildly wrong for 20-byte files due to overhead.)
What mattered was that there was a consistent convention. Since most uses of "GB" (for example) still mean 2^30^ bytes, Tahoe is going in the wrong direction to reduce confusion.
People have strong reasons for strong preferences on both sides. How about making this configurable, so then we can fight about the default instead of forcing one style on everyone? (Also, base-2 sizes should be the default.)
/me runs into the room waving his hands madly like a muppet. nooo!
Please don't contribute to the confusion by printing "x GB" and silently
using it to mean 2^30^. The entire non-computer world, the SI, and every
dictionary on the planet knows that the metric G suffix means giga means
10^9^. And while I find "GiB" pretty funny-looking, it is an unambiguous,
learnable, and eventually-straightforward term that clearly means 2^30^.
Let's not conflate the two. Sure, this helps the hard-drive manufacturers,
but it's a terminology bugfix, not a conspiracy :).
I'm -1 on having a config option for pretending GB=2^30^: someone looking at
the web page (and not at the config file) would be unable to learn the truth.
In places where we have evidence that people want both sorts of values, we
should give them both sorts of values. For example, on the "Storage Server
Status" page, we currently show abbreviated GB (10^9^) and unabbreviated
number-of-bytes:
My hope was that the "319728959488" would look enough like the "319.73" to
cue the reader into remembering that GB means 10^9^, but the original
poster's experience suggests this failed. Some other options for that
display:
Since this is a web page, we could also have a popup over the "319.73 GB"
line that displays a number of other formats, much like the popup we recently
added to Foolscap's log-web-viewer display to show timestamps in alternate
formats (UTC/local/short/long):
I'm -0 on having a config option that makes these pages display GiB
instead of GB, as long as it never ever tries to pretend that GB is
2^30^, and that there continues to be a full-number-of-bytes display so that
someone looking at the abbreviation has a chance to figure out what it means
and become confident in our consistent use of terms.
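One way to render a size redundantly, in the spirit of this proposal, is to print the SI value, the IEC value, and the exact byte count side by side. This is a hypothetical Python helper, not the code behind the "Storage Server Status" page, and the output layout is only an assumption:

```python
def describe_size(nbytes: int) -> str:
    """Render one size three ways: GB (10**9), GiB (2**30), and exact bytes."""
    return f"{nbytes / 10**9:.2f} GB ({nbytes / 2**30:.2f} GiB, {nbytes:,} bytes)"


print(describe_size(319728959488))
# prints: 319.73 GB (297.77 GiB, 319,728,959,488 bytes)
```

Because the full byte count is always shown, a reader can check either abbreviation against it instead of having to trust a convention.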
Huh? Except for programmer-driven test cases, I don't think there's any
particular quantization on filesizes. I'm not counting 1 MB or 1 MiB files,
I'm counting how many digital pictures I can stuff onto a disk, and they're
all sorts of random sizes. The only real quantization I can think of would be
the chapters on a ripped DVD image (according to wikipedia these are usually
1 GiB in size), but I really don't think "how many non-terminal DVD VOB files
can I fit on this disk" is a common question.
We're always using GB to mean 10^9^ and GiB to mean 2^30^. I fix the
code if I discover it doing otherwise.
This is what people call "a bike shed". The theory goes that few people are willing to contribute their opinions about designing nuclear power plants, because that is very complex and requires high expertise, but many people are willing to contribute their opinions about designing a bike shed, because it is simple enough that they can see how they would like it to be.
(Aside: I don't really like that metaphor of a "bike shed" because it belittles the concerns of the contributors. I actually agree with USSJoin, davidsarah, and kmarkley86 that user interface issues are important, including this one. Don't forget that the original post by USSJoin explained how he actually lost some of his time due to confusion. Wasting user time is not okay! Also, a design being simple and easy to understand doesn't mean that it doesn't matter how it is done!)
However, this issue has now distracted both David-Sarah and Brian from building nuclear power plants. Let's put a stop to the discussion. Our policy will be to express numbers in units that are as unambiguous as possible, so that a user who assumes that "GB" means 2^30^ and a user who assumes that "GB" means 10^9^ will both have a minimal chance of wasting their time with confusion. Specifically, the suggestions that Brian made in comment:75729 about redundantly listing the same value in different units would probably help.
That's the main idea -- to make the user interface sufficiently clear (even at the cost of redundancy) that nobody wastes their time mistaking the units. I believe this policy will satisfice.
We will continue to use KiB to mean 10^3^, MiB to mean 10^6^, GiB to mean 10^9^, TiB to mean 10^12^, etc., as per http://en.wikipedia.org/wiki/Binary_prefix, and never use KB to mean 2^10^, etc. However, as per the main idea above, we will probably try to reduce the use of KB at all in favor of less ambiguous designations.
Replying to warner:
You know what? This might have worked if the bytes display had included commas, like this: 319,728,959,488
I don't know about USSJoin, but for me, my eyeballs just slide right off of "319728959488" after the first couple of digits.
Proposed action items to make this ticket closable:
Title changed from "List sizes for storage using base-2 sizes, not base-10" to "show sizes in unambiguous way that doesn't get mistaken for different units".
I like the commas idea: my eyeballs slide off long numbers too. My hesitation is that every once in a rare while, I cut-and-paste a number like that into a calculator or Python REPL, and the commas would mess that up. But I think readability trumps cut-and-pasteability. So +1 on the commas.
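The cut-and-paste concern is real for Python in particular: a comma-grouped number pasted into the REPL is not a syntax error but a tuple. A hypothetical helper that strips the grouping characters before parsing sidesteps this:

```python
# What a pasted, comma-grouped number actually becomes in Python:
pasted = 319,728,959,488
print(pasted)  # (319, 728, 959, 488): a tuple, not an int


def parse_grouped(text: str) -> int:
    """Hypothetical helper: parse a number grouped with commas or (narrow) spaces."""
    for sep in (",", " ", "\u202f"):
        text = text.replace(sep, "")
    return int(text)


print(parse_grouped("319,728,959,488"))  # 319728959488
```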
You might also add units: (319,728,959,488 bytes). But maybe not.
Oh, you know, there might possibly be a CSS styling thing that lets you tell the system that this is a number, and that it ought to add comma-like things according to the current locale (since they'd be periods in Europe). I have a hazy memory that suggests doing this would also retain cut-and-pasteability, because the commas/periods would be purely visual: cut/paste would still get the original non-comma-ified number. Does this ring any bells for anyone, or am I completely imagining it?
The only way I know to pull off locale-formatted numbers is to wrap them in a span with a CSS class and use JavaScript to read those elements with parseInt()/parseFloat() and replace them with toLocaleString(). The locale will be picked up from whatever browser is being used, and the page degrades nicely to plain numbers if JavaScript is unavailable.
Replying to ScottD:
That's not worth the complexity IMHO. Separating the digit groups with spaces, e.g. 319 728 959 488,
is understood internationally, and personally I think it's more readable. In HTML, U+202F (NARROW NO-BREAK SPACE) might be better.
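Python's format mini-language can do the grouping, after which the comma can be swapped for a space or for U+202F NARROW NO-BREAK SPACE. A minimal sketch; the function name is hypothetical:

```python
NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE


def group_digits(n: int, sep: str = NNBSP) -> str:
    """Group the digits of n in threes, using sep instead of a comma."""
    return f"{n:,}".replace(",", sep)


print(group_digits(319728959488, " "))  # 319 728 959 488
# In HTML the same separator can be written as a numeric character reference:
print(group_digits(319728959488, "&#x202F;"))  # 319&#x202F;728&#x202F;959&#x202F;488
```

Unlike the JavaScript approach, the grouping is fixed when the page is rendered and ignores the browser's locale, which is exactly the simplicity being traded for.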
Replying to zooko:
Replying to Zancas (comment:14):
Argh! How did I screw that up‽ Thanks, Zancas, for noticing. Everyone reading this: disregard what I wrote and just believe that we're going to do what http://en.wikipedia.org/wiki/Binary_prefix says about how to spell the base-2 things. Also, as previously mentioned on this ticket, spelling out numbers with commas in place is unambiguous and is the standard format for integers in Internet English writing.
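For reference, the two prefix families as the Binary prefix article describes them, SI decimal prefixes versus IEC binary prefixes, tabulated in Python. Note that the SI symbol for kilo is a lowercase "k"; the dictionary names are just for illustration:

```python
# SI (decimal) and IEC (binary) prefixes, per the Binary prefix article.
SI = {"kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
IEC = {"KiB": 2**10, "MiB": 2**20, "GiB": 2**30, "TiB": 2**40}

for (si_name, si_bytes), (iec_name, iec_bytes) in zip(SI.items(), IEC.items()):
    print(f"1 {si_name:>3} = {si_bytes:>17,} bytes   1 {iec_name} = {iec_bytes:>17,} bytes")
```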