parse_abbreviated_size doesn't accept T for terabytes (and other quibbles with the regex it uses) #1812
Labels
No Label
0.2.0
0.3.0
0.4.0
0.5.0
0.5.1
0.6.0
0.6.1
0.7.0
0.8.0
0.9.0
1.0.0
1.1.0
1.10.0
1.10.1
1.10.2
1.10a2
1.11.0
1.12.0
1.12.1
1.13.0
1.14.0
1.15.0
1.15.1
1.2.0
1.3.0
1.4.1
1.5.0
1.6.0
1.6.1
1.7.0
1.7.1
1.7β
1.8.0
1.8.1
1.8.2
1.8.3
1.8β
1.9.0
1.9.0-s3branch
1.9.0a1
1.9.0a2
1.9.0b1
1.9.1
1.9.2
1.9.2a1
LeastAuthority.com automation
blocker
cannot reproduce
cloud-branch
code
code-dirnodes
code-encoding
code-frontend
code-frontend-cli
code-frontend-ftp-sftp
code-frontend-magic-folder
code-frontend-web
code-mutable
code-network
code-nodeadmin
code-peerselection
code-storage
contrib
critical
defect
dev-infrastructure
documentation
duplicate
enhancement
fixed
invalid
major
minor
n/a
normal
operational
packaging
somebody else's problem
supercritical
task
trivial
unknown
was already fixed
website
wontfix
worksforme
No Milestone
No Assignees
4 Participants
Notifications
Due Date
No due date set.
Reference: tahoe-lafs/trac-2024-07-25#1812
Loading…
Reference in New Issue
No description provided.
Delete Branch "%!s(<nil>)"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
stercor tried to use "
reserved_space = 1.5TiB
" intahoe.cfg
. This failed and used areserved_space
of 0 bytes (as shown on the server status page), becauseT
is not recognized by [parse_abbreviated_size in allmydata.util.abbreviate]source:git/src/allmydata/util/abbreviate.py@39a41f1d#L58.Note that the error due to an unparseable size is silent and only reported in the foolscap log. Continuing with a
reserved_space
of 0 bytes is a rather unhelpful failure mode. The code responsible for that is at source:git/src/allmydata/client.py@32377469#L248.The regex is
r"^(\d+)(kKmMgG?iB?bB?)$"
, which also allows strings ending in"BB"
or"Bb"
; I don't think that allowing the repeated B was intentional.Replying to davidsarah:
... especially because
parse_abbreviated_size
will then fail with aKeyError
(not aValueError
as it should) because only oneB
is stripped off, not both.Please review: https://github.com/davidsarah/tahoe-lafs/commit/d8baf7aa5fe967f2611c55d297fb522b16706f89Note that this only addresses the terabyte and double-B issues, not the error reporting. It also needs tests. I allowed E for exabytes as well.
Apparently I don't know my petabytes from my exabytes. Here's a fixed version:
https://github.com/davidsarah/tahoe-lafs/commit/84d55da319f199c1ff335186330d34ebf94868fe
Review needed for https://github.com/tahoe-lafs/tahoe-lafs/pull/26 (now with tests).
I don't understand why "I" and "i" are considered valid suffixes.
The "
^
" symbol in the re.match call used to parse out the "number" and "suffix" values is, if I understand correctly, redundant with the way re.match works.This is because the "pattern" (first) argument to re.match is applied to the beginning of the "string" (second) argument. The "
^
" specifies that the pattern is to match from the beginning of its target. So the two functions, the one intrinsic to re.match, and the one specified by "^
" are identical.This doesn't necessarily mean that the "
^
" should be removed from the code as it, debatably, increases readability.I've reviewed these changes and run the unittests they include.
I concur that the doubled "BB" suffix now correctly raises a
ValueError
.I concur that Tera-, Peta-, and Exa-bytes are now correctly handled as scales of 2^40^, 2^50^, and 2^60^ respectively.
I note that the "
^
" in the "pattern" argument to re.match is not strictly necessary, but assert that it is correct, and may enhance readability.I do not understand why the suffix's "I" and "i" are to be handled as they are.
It seems to me that patterns that contain only "I", "i", or e.g. "iB", are errors, but it's plausible I'm simply unfamiliar with established convention.
Whether "IB" or "I" on its own should be an error is debatable, but it is currently accepted, and I don't see any particular reason to change that. Use of
^
at the start of the pattern is simply a matter of style; I prefer the explicitness. So I'll commit this as-is, plus a change to source:git/docs/configuration.rst to mention the new prefixes.Change to configuration.rst is in changeset:eba64f2a, the rest is in the previous two patches.
Reopening to fix this:
Looks like we should not catch the exception, and allow it to be raised, since
init_storage()
is called before daemonization, so errors will appear on the 'tahoe start' stderr. The only thing that might need special-casing is "reserved_space=" (i.e. the default do-not-reserve-any-space), just in case that returns an empty string or something unparseable. This should be covered in unit tests.So this patch isn't actually ready, since it needs the improvements Brian described in comment:89589. Removing the "reviewed" tag.
Replying to zooko:
The patch has already been applied, but when I reopened the ticket I should have removed the "reviewed" tag since it was not applicable to any yet-to-be-applied patch.
Um, so since the patch was applied, but according to Brian (comment:89589), we should not catch the exception, does this mean that fixing the patch should be a blocker for Tahoe-LAFS v1.10? Assigning to Brian to clarify.
Please see the last patch on https://github.com/warner/tahoe-lafs/tree/1812-parse-size for review, it should report the parse error early, and updates the tests to match.
In changeset:b084396bddd1e12a: