parse_abbreviated_size doesn't accept T for terabytes (and other quibbles with the regex it uses) #1812

Closed
opened 2012-09-19 03:24:35 +00:00 by davidsarah · 16 comments
davidsarah commented 2012-09-19 03:24:35 +00:00
Owner

stercor tried to use "reserved_space = 1.5TiB" in tahoe.cfg. This failed and used a reserved_space of 0 bytes (as shown on the server status page), because T is not recognized by [parse_abbreviated_size in allmydata.util.abbreviate]source:git/src/allmydata/util/abbreviate.py@39a41f1d#L58.

Note that the error due to an unparseable size is silent and only reported in the foolscap log. Continuing with a reserved_space of 0 bytes is a rather unhelpful failure mode. The code responsible for that is at source:git/src/allmydata/client.py@32377469#L248.

The regex is r"^(\d+)(kKmMgG?iB?bB?)$", which also allows strings ending in "BB" or "Bb"; I don't think that allowing the repeated B was intentional.

stercor tried to use "`reserved_space = 1.5TiB`" in `tahoe.cfg`. This failed and used a `reserved_space` of 0 bytes (as shown on the server status page), because `T` is not recognized by [parse_abbreviated_size in allmydata.util.abbreviate]source:git/src/allmydata/util/abbreviate.py@39a41f1d#L58. Note that the error due to an unparseable size is silent and only reported in the foolscap log. Continuing with a `reserved_space` of 0 bytes is a rather unhelpful failure mode. The code responsible for that is at source:git/src/allmydata/client.py@32377469#L248. The regex is `r"^(\d+)(kKmMgG?iB?bB?)$"`, which also allows strings ending in `"BB"` or `"Bb"`; I don't think that allowing the repeated B was intentional.
tahoe-lafs added the
code
normal
defect
1.9.2
labels 2012-09-19 03:24:35 +00:00
tahoe-lafs added this to the soon milestone 2012-09-19 03:24:35 +00:00
davidsarah commented 2012-09-19 03:31:30 +00:00
Author
Owner

Replying to davidsarah:

The regex is r"^(\d+)(kKmMgG?iB?bB?)$", which also allows strings ending in "BB" or "Bb"; I don't think that allowing the repeated B was intentional.

... especially because parse_abbreviated_size will then fail with a KeyError (not a ValueError as it should) because only one B is stripped off, not both.

Replying to [davidsarah](/tahoe-lafs/trac-2024-07-25/issues/6844): > The regex is `r"^(\d+)(kKmMgG?iB?bB?)$"`, which also allows strings ending in `"BB"` or `"Bb"`; I don't think that allowing the repeated B was intentional. ... especially because `parse_abbreviated_size` will then fail with a `KeyError` (not a `ValueError` as it should) because only one `B` is stripped off, not both.
davidsarah commented 2012-09-19 04:28:14 +00:00
Author
Owner

Please review: https://github.com/davidsarah/tahoe-lafs/commit/d8baf7aa5fe967f2611c55d297fb522b16706f89

Note that this only addresses the terabyte and double-B issues, not the error reporting. It also needs tests. I allowed E for exabytes as well.

~~Please review: <https://github.com/davidsarah/tahoe-lafs/commit/d8baf7aa5fe967f2611c55d297fb522b16706f89>~~ Note that this only addresses the terabyte and double-B issues, not the error reporting. It also needs tests. I allowed E for exabytes as well.
davidsarah commented 2012-09-19 04:38:07 +00:00
Author
Owner

Apparently I don't know my petabytes from my exabytes. Here's a fixed version:
https://github.com/davidsarah/tahoe-lafs/commit/84d55da319f199c1ff335186330d34ebf94868fe

Apparently I don't know my petabytes from my exabytes. Here's a fixed version: <https://github.com/davidsarah/tahoe-lafs/commit/84d55da319f199c1ff335186330d34ebf94868fe>
davidsarah commented 2012-12-27 20:04:03 +00:00
Author
Owner

Review needed for https://github.com/tahoe-lafs/tahoe-lafs/pull/26 (now with tests).

Review needed for <https://github.com/tahoe-lafs/tahoe-lafs/pull/26> (now with tests).
tahoe-lafs modified the milestone from soon to 1.10.0 2012-12-27 20:04:03 +00:00
warner was unassigned by tahoe-lafs 2012-12-27 20:04:03 +00:00
zancas was assigned by tahoe-lafs 2012-12-27 20:04:03 +00:00

I don't understand why "I" and "i" are considered valid suffixes.

I don't understand why "I" and "i" are considered valid suffixes.

The "^" symbol in the re.match call used to parse out the "number" and "suffix" values is, if I understand correctly, redundant with the way re.match works.

This is because the "pattern" (first) argument to re.match is applied to the beginning of the "string" (second) argument. The "^" specifies that the pattern is to match from the beginning of its target. So the two functions, the one intrinsic to re.match, and the one specified by "^" are identical.

This doesn't necessarily mean that the "^" should be removed from the code as it, debatably, increases readability.

The "`^`" symbol in the re.match call used to parse out the "number" and "suffix" values is, if I understand correctly, redundant with the way re.match works. This is because the "pattern" (first) argument to re.match is applied to the beginning of the "string" (second) argument. The "`^`" specifies that the pattern is to match from the beginning of its target. So the two functions, the one intrinsic to re.match, and the one specified by "`^`" are identical. This doesn't necessarily mean that the "`^`" should be removed from the code as it, debatably, increases readability.

I've reviewed these changes and run the unittests they include.

I concur that the doubled "BB" suffix now correctly raises a ValueError.

I concur that Tera-, Peta-, and Exa-bytes are now correctly handled as scales of 2^40^, 2^50^, and 2^60^ respectively.

I note that the "^" in the "pattern" argument to re.match is not strictly necessary, but assert that it is correct, and may enhance readability.

I do not understand why the suffix's "I" and "i" are to be handled as they are.

It seems to me that patterns that contain only "I", "i", or e.g. "iB", are errors, but it's plausible I'm simply unfamiliar with established convention.

I've reviewed these changes and run the unittests they include. I concur that the doubled "BB" suffix now correctly raises a `ValueError`. I concur that Tera-, Peta-, and Exa-bytes are now correctly handled as scales of 2^40^, 2^50^, and 2^60^ respectively. I note that the "`^`" in the "pattern" argument to re.match is not strictly necessary, but assert that it is correct, and may enhance readability. I do not understand why the suffix's "I" and "i" are to be handled as they are. It seems to me that patterns that contain only "I", "i", or e.g. "iB", are errors, but it's plausible I'm simply unfamiliar with established convention.
davidsarah commented 2012-12-27 23:28:50 +00:00
Author
Owner

Whether "IB" or "I" on its own should be an error is debatable, but it is currently accepted, and I don't see any particular reason to change that. Use of ^ at the start of the pattern is simply a matter of style; I prefer the explicitness. So I'll commit this as-is, plus a change to source:git/docs/configuration.rst to mention the new prefixes.

Whether "IB" or "I" on its own should be an error is debatable, but it is currently accepted, and I don't see any particular reason to change that. Use of `^` at the start of the pattern is simply a matter of style; I prefer the explicitness. So I'll commit this as-is, plus a change to source:git/docs/configuration.rst to mention the new prefixes.
davidsarah commented 2012-12-27 23:42:48 +00:00
Author
Owner

Change to configuration.rst is in changeset:eba64f2a, the rest is in the previous two patches.

Change to configuration.rst is in changeset:eba64f2a, the rest is in the previous two patches.
tahoe-lafs added the
fixed
label 2012-12-27 23:42:48 +00:00
davidsarah commented 2012-12-29 02:33:54 +00:00
Author
Owner

Reopening to fix this:

Note that the error due to an unparseable size is silent and only reported in the foolscap log. Continuing with a reserved_space of 0 bytes is a rather unhelpful failure mode. The code responsible for that is at source:git/src/allmydata/client.py@32377469#L248.

Reopening to fix this: > Note that the error due to an unparseable size is silent and only reported in the foolscap log. Continuing with a reserved_space of 0 bytes is a rather unhelpful failure mode. The code responsible for that is at source:git/src/allmydata/client.py@32377469#L248.
tahoe-lafs removed the
fixed
label 2012-12-29 02:33:54 +00:00
tahoe-lafs added
minor
and removed
normal
labels 2013-01-31 15:50:42 +00:00

Looks like we should not catch the exception, and allow it to be raised, since init_storage() is called before daemonization, so errors will appear on the 'tahoe start' stderr. The only thing that might need special-casing is "reserved_space=" (i.e. the default do-not-reserve-any-space), just in case that returns an empty string or something unparseable. This should be covered in unit tests.

Looks like we should *not* catch the exception, and allow it to be raised, since `init_storage()` is called before daemonization, so errors will appear on the 'tahoe start' stderr. The only thing that might need special-casing is "reserved_space=" (i.e. the default do-not-reserve-any-space), just in case that returns an empty string or something unparseable. This should be covered in unit tests.

So this patch isn't actually ready, since it needs the improvements Brian described in comment:89589. Removing the "reviewed" tag.

So this patch isn't actually ready, since it needs the improvements Brian described in [comment:89589](/tahoe-lafs/trac-2024-07-25/issues/1812#issuecomment-89589). Removing the "reviewed" tag.
davidsarah commented 2013-03-10 23:54:20 +00:00
Author
Owner

Replying to zooko:

So this patch isn't actually ready, since it needs the improvements Brian described in comment:89589. Removing the "reviewed" tag.

The patch has already been applied, but when I reopened the ticket I should have removed the "reviewed" tag since it was not applicable to any yet-to-be-applied patch.

Replying to [zooko](/tahoe-lafs/trac-2024-07-25/issues/1812#issuecomment-89590): > So this patch isn't actually ready, since it needs the improvements Brian described in [comment:89589](/tahoe-lafs/trac-2024-07-25/issues/1812#issuecomment-89589). Removing the "reviewed" tag. The patch has already been applied, but when I reopened the ticket I should have removed the "reviewed" tag since it was not applicable to any yet-to-be-applied patch.

Um, so since the patch was applied, but according to Brian (comment:89589), we should not catch the exception, does this mean that fixing the patch should be a blocker for Tahoe-LAFS v1.10? Assigning to Brian to clarify.

Um, so since the patch was applied, but according to Brian ([comment:89589](/tahoe-lafs/trac-2024-07-25/issues/1812#issuecomment-89589)), we should *not* catch the exception, does this mean that fixing the patch should be a blocker for Tahoe-LAFS v1.10? Assigning to Brian to clarify.

Please see the last patch on https://github.com/warner/tahoe-lafs/tree/1812-parse-size for review, it should report the parse error early, and updates the tests to match.

Please see the last patch on <https://github.com/warner/tahoe-lafs/tree/1812-parse-size> for review, it should report the parse error early, and updates the tests to match.
David-Sarah Hopwood <david-sarah@jacaranda.org> commented 2013-03-20 23:25:42 +00:00
Author
Owner

In changeset:b084396bddd1e12a:

client.py: throw error when reserved_space= is unparseable. Closes #1812.

This should now fail quickly (during "tahoe start"). Previously this
would silently treat an unparseable size as "0", and the only way to
discover that it had had a problem would be to look at the foolscap log,
or examine the storage-service web page for the unexpected "Reserved
Size" number.
In changeset:b084396bddd1e12a: ``` client.py: throw error when reserved_space= is unparseable. Closes #1812. This should now fail quickly (during "tahoe start"). Previously this would silently treat an unparseable size as "0", and the only way to discover that it had had a problem would be to look at the foolscap log, or examine the storage-service web page for the unexpected "Reserved Size" number. ```
tahoe-lafs added the
fixed
label 2013-03-20 23:25:42 +00:00
Sign in to join this conversation.
No Milestone
No Assignees
4 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Reference: tahoe-lafs/trac-2024-07-25#1812
No description provided.