Crashed WUI due to abbreviate_time in common.py #1055
Labels
No Label
0.2.0
0.3.0
0.4.0
0.5.0
0.5.1
0.6.0
0.6.1
0.7.0
0.8.0
0.9.0
1.0.0
1.1.0
1.10.0
1.10.1
1.10.2
1.10a2
1.11.0
1.12.0
1.12.1
1.13.0
1.14.0
1.15.0
1.15.1
1.2.0
1.3.0
1.4.1
1.5.0
1.6.0
1.6.1
1.7.0
1.7.1
1.7β
1.8.0
1.8.1
1.8.2
1.8.3
1.8β
1.9.0
1.9.0-s3branch
1.9.0a1
1.9.0a2
1.9.0b1
1.9.1
1.9.2
1.9.2a1
LeastAuthority.com automation
blocker
cannot reproduce
cloud-branch
code
code-dirnodes
code-encoding
code-frontend
code-frontend-cli
code-frontend-ftp-sftp
code-frontend-magic-folder
code-frontend-web
code-mutable
code-network
code-nodeadmin
code-peerselection
code-storage
contrib
critical
defect
dev-infrastructure
documentation
duplicate
enhancement
fixed
invalid
major
minor
n/a
normal
operational
packaging
somebody else's problem
supercritical
task
trivial
unknown
was already fixed
website
wontfix
worksforme
No Milestone
No Assignees
2 Participants
Notifications
Due Date
No due date set.
Reference: tahoe-lafs/trac-2024-07-25#1055
Loading…
Reference in New Issue
No description provided.
Delete Branch "%!s(<nil>)"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
I've been having a few difficulties with a Tahoe-node I'm running. It produced a long error message whenever I accessed the /storage web path.
With a bit of sleuthing I tracked the problem down to common.py:93. The line
seems to assume that s an integer but s actually is a float as can be seen earlier in the function. Changing this line to
fixed the issue for me and /storage now shows without problem.
Looking through the other functions in the file it appears as if there are other cases of %d being used together with float values. Perhaps they can also cause problems?
I cannot produce a stack trace at this time since the node is currently in use but I can try again later if there is a need for it.
Let's fix this for v1.7.0.
I have tried to reproduce this problem and have been unsuccesfull. I think what happened is that the computer clock had for some reason gotten very much out of sync prior to the previous pass by the expire crawler. Eventually the time got synchronized over ntp and the reported problem occurred. This should have caused the abbreviate_time function to recieve a very large negative number. This seems to rhyme well with the following python session:
That is, python will handle floats as well as can be expected together with %d, as long as the number is small enough.
So, in conclusion, this error should be quite rare indeed and I would not protest if this bug report is closed as WONTFIX or similar. It may possibly be interesting to make the code a bit more robust in these cases, I would have appreciated a "check your clock dummy!" message :-)
Tahoe-LAFS is intended to function correctly in all respects even if the system clock jumps around to arbitrary values [*].
So instead of closing this as
wontfix
, we'll write a unit test that shows this bug. Thanks for tracking it down! If you want to help out with the next step, report to us what version of Python you have, because my version of Python doesn't behave like yours:[*] Not that it currently does so, but I consider it to be a bug that it exhibits random misbehavior when the
gettimeofday(2)
gets kicked by ntp or by a sysadmin, due to Tahoe-LAFS relying on the ordering of events in the Twisted reactor and Twisted using the result fromgettimeofday(2)
; see http://twistedmatrix.com/trac/ticket/2424Replying to zooko:
The session of comment 2 was taken with the following Python:
This is the stock Python of an amd64 Debian Sid installation. The error of the bug report occurred on a 32bit Lenny box and on this box I get the following result:
Thanks for following up on this bug!
Just a note that I cannot reproduce the "int argument required" error with Python 2.6.5 in Debian Sid. So I suppose it is something specific to Python 2.5?
Yeah I suppose it is a bug in Python 2.5. We have Python 2.5 buildslaves, so if you write a unit test that exercises this bug then we can add it to the test suite and know that it is fixed on our supported platforms.
This just needs a simple unit test.
Attachment WHAT.dpatch (2170 bytes) added
Added unit test to show bug in allmydata.web.common.abbreviate_time in python 2.5
jacoblyles:
Thanks for the patch! Are you sure that your patch will trigger the bug, though? It looks like your patch will cause
common.abbreviate_time()
to be called with the argument-123000
, which is an integer.common.abbreviate_time()
will then multiply it by 1,000,000, making it be -123,000,000,000 As far as I understand, the bug shows up only when the argument is float instead of integer, and only when the argument is less than about negative 10 billion, and when the Python version is Python 2.5. If you could change your test to pass a float then we can run the test on our Python 2.5 buildbots.Yes, this will trigger the bug. common.abbreviate_time() casts its input to a float at common.py:82. I have run the test on my own machine with python 2.6 and python 2.5 to confirm that it fails with 2.5.
Test looks good to me.
Attachment web_common_25.dpatch (2546 bytes) added
This patch fixes an error with interpolating float values in "%d" format strings in python 2.5
I've attached a patch that should fix the problem. The magic value to break "%d" in Python 2.5 is a float of size 2.0^31^ or larger. I replaced "%d" with "%.0f"
Fixed by changeset:d3e6e15e23bcc93f. Not sure why this wasn't auto-closed by trac.
jacoblyles: thanks! A couple of things to note for future reference:
Test patch committed in changeset:e276050d5ab2a937. It turned out that the instances of
%d
inabbreviate_rate
andabbreviate_size
could not have overflowed (and float underflow is not a problem here), so I just committed the original attachment:WHAT.dpatch.Oh, and another thing: we need a snippet for source:NEWS to describe this change.
Attachment NEWS-1055.dpatch.txt (2615 bytes) added