Crashed WUI due to abbreviate_time in common.py #1055

New Issue

tahoe-lafs · 2010-05-24T16:34:25Z

rycee commented

2010-05-24 16:34:25 +00:00

I've been having a few difficulties with a Tahoe-node I'm running. It produced a long error message whenever I accessed the /storage web path.

With a bit of sleuthing I tracked the problem down to common.py:93. The line

return "%dus" % (1000000*s)

seems to assume that s an integer but s actually is a float as can be seen earlier in the function. Changing this line to

return "%dus" % (1000000*int(s))

fixed the issue for me and /storage now shows without problem.

Looking through the other functions in the file it appears as if there are other cases of %d being used together with float values. Perhaps they can also cause problems?

I cannot produce a stack trace at this time since the node is currently in use but I can try again later if there is a need for it.

I've been having a few difficulties with a Tahoe-node I'm running. It produced a long error message whenever I accessed the /storage web path. With a bit of sleuthing I tracked the problem down to [common.py:93](http://allmydata.org/trac/tahoe-lafs/browser/src/allmydata/web/common.py#L93). The line ``` return "%dus" % (1000000*s) ``` seems to assume that s an integer but s actually is a float as can be seen earlier in the function. Changing this line to ``` return "%dus" % (1000000*int(s)) ``` fixed the issue for me and /storage now shows without problem. Looking through the other functions in the file it appears as if there are other cases of %d being used together with float values. Perhaps they can also cause problems? I cannot produce a stack trace at this time since the node is currently in use but I can try again later if there is a need for it.

tahoe-lafs added the

labels 2010-05-24 16:34:25 +00:00

tahoe-lafs added this to the undecided milestone 2010-05-24 16:34:25 +00:00

zooko commented

2010-05-27 21:39:08 +00:00

Let's fix this for v1.7.0.

zooko modified the milestone from undecided to 1.7.0

2010-05-27 21:39:08 +00:00

rycee commented

2010-05-28 07:49:42 +00:00

I have tried to reproduce this problem and have been unsuccesfull. I think what happened is that the computer clock had for some reason gotten very much out of sync prior to the previous pass by the expire crawler. Eventually the time got synchronized over ntp and the reported problem occurred. This should have caused the abbreviate_time function to recieve a very large negative number. This seems to rhyme well with the following python session:

>>> "%d" % (1000000 * -1234567891234.0)
'-1234567891233999872'
>>> "%d" % (1000000 * -12345678912345.0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: int argument required
>>> "%d" % (1000000 * int(-12345678912345.0))
'-12345678912345000000'

That is, python will handle floats as well as can be expected together with %d, as long as the number is small enough.

So, in conclusion, this error should be quite rare indeed and I would not protest if this bug report is closed as WONTFIX or similar. It may possibly be interesting to make the code a bit more robust in these cases, I would have appreciated a "check your clock dummy!" message :-)

I have tried to reproduce this problem and have been unsuccesfull. I think what happened is that the computer clock had for some reason gotten very much out of sync prior to the previous pass by the expire crawler. Eventually the time got synchronized over ntp and the reported problem occurred. This should have caused the abbreviate_time function to recieve a very large negative number. This seems to rhyme well with the following python session: ``` >>> "%d" % (1000000 * -1234567891234.0) '-1234567891233999872' >>> "%d" % (1000000 * -12345678912345.0) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: int argument required >>> "%d" % (1000000 * int(-12345678912345.0)) '-12345678912345000000' ``` That is, python will handle floats as well as can be expected together with %d, as long as the number is small enough. So, in conclusion, this error should be quite rare indeed and I would not protest if this bug report is closed as WONTFIX or similar. It may possibly be interesting to make the code a bit more robust in these cases, I would have appreciated a "check your clock dummy!" message :-)

zooko commented

2010-05-30 21:00:27 +00:00

Tahoe-LAFS is intended to function correctly in all respects even if the system clock jumps around to arbitrary values [*].

So instead of closing this as wontfix, we'll write a unit test that shows this bug. Thanks for tracking it down! If you want to help out with the next step, report to us what version of Python you have, because my version of Python doesn't behave like yours:

Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) 
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "%d" % (1000000 * -1234567891234.0)
'-1234567891233999872'
>>> "%d" % (1000000 * -12345678912345.0)
'-12345678912344999936'
>>> "%d" % (1000000 * int(-12345678912345.0))
'-12345678912345000000'

[*] Not that it currently does so, but I consider it to be a bug that it exhibits random misbehavior when the gettimeofday(2) gets kicked by ntp or by a sysadmin, due to Tahoe-LAFS relying on the ordering of events in the Twisted reactor and Twisted using the result from gettimeofday(2); see http://twistedmatrix.com/trac/ticket/2424

Tahoe-LAFS is intended to function correctly in all respects even if the system clock jumps around to arbitrary values [*]. So instead of closing this as `wontfix`, we'll write a unit test that shows this bug. Thanks for tracking it down! If you want to help out with the next step, report to us what version of Python you have, because my version of Python doesn't behave like yours: ``` Python 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) [GCC 4.2.1 (Apple Inc. build 5646)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> "%d" % (1000000 * -1234567891234.0) '-1234567891233999872' >>> "%d" % (1000000 * -12345678912345.0) '-12345678912344999936' >>> "%d" % (1000000 * int(-12345678912345.0)) '-12345678912345000000' ``` [*] Not that it currently *does* so, but I consider it to be a bug that it exhibits random misbehavior when the `gettimeofday(2)` gets kicked by ntp or by a sysadmin, due to Tahoe-LAFS relying on the ordering of events in the Twisted reactor and Twisted using the result from `gettimeofday(2)`; see <http://twistedmatrix.com/trac/ticket/2424>

rycee commented

2010-05-31 06:10:51 +00:00

Replying to zooko:

If you want to help out with the next step, report to us what version of Python you have, because my version of Python doesn't behave like yours

The session of comment 2 was taken with the following Python:

Python 2.5.5 (r255:77872, Apr 21 2010, 08:40:04) 
[GCC 4.4.3] on linux2

This is the stock Python of an amd64 Debian Sid installation. The error of the bug report occurred on a 32bit Lenny box and on this box I get the following result:

Python 2.5.2 (r252:60911, Jan 24 2010, 14:53:14) 
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> "%d" % (1000000 * -1234.0)
'-1234000000'
>>> "%d" % (1000000 * -12345.0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: int argument required

Thanks for following up on this bug!

Replying to [zooko](/tahoe-lafs/trac-2024-07-25/issues/1055#issuecomment-77536): > If you want to help out with the next step, report to us what version of Python you have, because my version of Python doesn't behave like yours The session of [comment 2](/tahoe-lafs/trac-2024-07-25/issues/1055#issuecomment-77535) was taken with the following Python: ``` Python 2.5.5 (r255:77872, Apr 21 2010, 08:40:04) [GCC 4.4.3] on linux2 ``` This is the stock Python of an amd64 Debian Sid installation. The error of the bug report occurred on a 32bit Lenny box and on this box I get the following result: ``` Python 2.5.2 (r252:60911, Jan 24 2010, 14:53:14) [GCC 4.3.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> "%d" % (1000000 * -1234.0) '-1234000000' >>> "%d" % (1000000 * -12345.0) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: int argument required ``` Thanks for following up on this bug!

rycee commented

2010-05-31 06:47:32 +00:00

Just a note that I cannot reproduce the "int argument required" error with Python 2.6.5 in Debian Sid. So I suppose it is something specific to Python 2.5?

zooko commented

2010-05-31 07:25:55 +00:00

Yeah I suppose it is a bug in Python 2.5. We have Python 2.5 buildslaves, so if you write a unit test that exercises this bug then we can add it to the test suite and know that it is fixed on our supported platforms.

zooko modified the milestone from 1.7.0 to 1.7.1

2010-06-18 23:56:39 +00:00

zooko commented

2010-07-17 07:34:57 +00:00

This just needs a simple unit test.

zooko modified the milestone from 1.7.1 to 1.8β

2010-07-17 07:34:57 +00:00

jacoblyles commented

2010-07-29 21:37:26 +00:00

Attachment WHAT.dpatch (2170 bytes) added

Added unit test to show bug in allmydata.web.common.abbreviate_time in python 2.5

**Attachment** WHAT.dpatch (2170 bytes) added Added unit test to show bug in allmydata.web.common.abbreviate_time in python 2.5

WHAT.dpatch

2.1 KiB

zooko commented

2010-07-29 22:14:28 +00:00

jacoblyles:

Thanks for the patch! Are you sure that your patch will trigger the bug, though? It looks like your patch will cause common.abbreviate_time() to be called with the argument -123000, which is an integer. common.abbreviate_time() will then multiply it by 1,000,000, making it be -123,000,000,000 As far as I understand, the bug shows up only when the argument is float instead of integer, and only when the argument is less than about negative 10 billion, and when the Python version is Python 2.5. If you could change your test to pass a float then we can run the test on our Python 2.5 buildbots.

jacoblyles: Thanks for the patch! Are you sure that your patch will trigger the bug, though? It looks like your patch will cause `common.abbreviate_time()` to be called with the argument `-123000`, which is an integer. `common.abbreviate_time()` will then multiply it by 1,000,000, making it be -123,000,000,000 As far as I understand, the bug shows up only when the argument is float instead of integer, and only when the argument is less than about negative 10 billion, and when the Python version is Python 2.5. If you could change your test to pass a float then we can run the test on our Python 2.5 buildbots.

jacoblyles commented

2010-07-29 22:45:15 +00:00

Yes, this will trigger the bug. common.abbreviate_time() casts its input to a float at common.py:82. I have run the test on my own machine with python 2.6 and python 2.5 to confirm that it fails with 2.5.

Yes, this will trigger the bug. common.abbreviate_time() casts its input to a float at [common.py:82](http://tahoe-lafs.org/trac/tahoe-lafs/browser/src/allmydata/web/common.py#L82). I have run the test on my own machine with python 2.6 and python 2.5 to confirm that it fails with 2.5.

davidsarah commented

2010-07-30 20:41:27 +00:00

Test looks good to me.

jacoblyles commented

2010-07-30 21:27:16 +00:00

Attachment web_common_25.dpatch (2546 bytes) added

This patch fixes an error with interpolating float values in "%d" format strings in python 2.5

**Attachment** web_common_25.dpatch (2546 bytes) added This patch fixes an error with interpolating float values in "%d" format strings in python 2.5

web_common_25.dpatch

2.5 KiB

jacoblyles commented

2010-07-30 21:29:20 +00:00

I've attached a patch that should fix the problem. The magic value to break "%d" in Python 2.5 is a float of size 2.0^31^ or larger. I replaced "%d" with "%.0f"

davidsarah commented

2010-07-30 23:27:37 +00:00

Fixed by changeset:d3e6e15e23bcc93f. Not sure why this wasn't auto-closed by trac.

tahoe-lafs added the

fixed

label 2010-07-30 23:27:37 +00:00

davidsarah closed this issue

2010-07-30 23:27:37 +00:00

zooko commented

2010-07-30 23:46:02 +00:00

jacoblyles: thanks! A couple of things to note for future reference:

I re-edited your patch description to be a little more detailed and to have the string "fixes #1055" in it. See also wiki/Patches for examples of good patch descriptions.
David-Sarah is adding unit tests of the adjacent formatting code in addition to the unit test you added for the one that your patch fixed.
It might be good to bundle the tests together into the fix-it patch. At least in this case it would have prevented me from forgetting to apply your tests. :-)

jacoblyles: thanks! A couple of things to note for future reference: 1. I re-edited your patch description to be a little more detailed and to have the string "fixes #1055" in it. See also [wiki/Patches](wiki/Patches) for examples of good patch descriptions. 2. David-Sarah is adding unit tests of the adjacent formatting code in addition to the unit test you added for the one that your patch fixed. 3. It might be good to bundle the tests together into the fix-it patch. At least in this case it would have prevented me from forgetting to apply your tests. :-)

davidsarah commented

2010-07-31 00:07:46 +00:00

Test patch committed in changeset:e276050d5ab2a937. It turned out that the instances of %d in abbreviate_rate and abbreviate_size could not have overflowed (and float underflow is not a problem here), so I just committed the original attachment:WHAT.dpatch.

Test patch committed in changeset:e276050d5ab2a937. It turned out that the instances of `%d` in `abbreviate_rate` and `abbreviate_size` could not have overflowed (and float underflow is not a problem here), so I just committed the original attachment:WHAT.dpatch.

zooko commented

2010-07-31 02:53:00 +00:00

Oh, and another thing: we need a snippet for source:NEWS to describe this change.