pycryptopp-related hang of unit tests on platforms using buggy Gnu as 2.20 (e.g. MinGW 5.1.x, Ubuntu Karmic) #853
Reference: tahoe-lafs/trac-2024-07-25#853
Kai posted this bug report http://allmydata.org/pipermail/tahoe-dev/2009-December/003298.html
For what it is worth, we have a buildbot which tests Tahoe-LAFS on WinXP, and it works on that machine, so we have a starting point from which to figure out what is different between that build machine [1] and your machine.
Here is the report of the most recent five builds on the Windows buildbot:
http://allmydata.org/buildbot/builders/windows
And here is the dump of the versions of various components on that machine:
http://allmydata.org/buildbot/builders/windows/builds/1651/steps/show-tool-versions/logs/stdio
Kai: after you run the test and the test freezes, could you please look in the _trial_temp subdirectory and attach the test.log file to this ticket?
Just out of curiosity, why are you using Tahoe-LAFS v1.5.0-r4108? The v1.5.0 release was -r4037 and the current head is -r4128. I guess you downloaded -r4108 back when it was the current head? I don't have any reason to think that this affects your bug.
So I've been looking at the code and I can't see how res can be None on [source:src/allmydata/util/iputil.py@4108#L98 line 98 of iputil.py]. _collect() will be invoked and the argument passed to it will be whatever was returned from _find_addresses_via_config(). Unless the thing that was returned from _find_addresses_via_config() is a Deferred, in which case the argument passed to _collect() will be whatever value is given to that Deferred when it is fired. (This is the way Twisted works.) _find_addresses_via_config() is defined on [source:src/allmydata/util/iputil.py@4108#L238 line 238]... Oh, okay I see where a None can be returned. From [source:src/allmydata/util/iputil.py@4108#L208 here]. Okay, so this suggests that on Kai's machine there is no route.exe executable on the path which emits a dotted-quad IPv4 address when executed as route.exe print.
The first thing we need to do is make Tahoe-LAFS handle this case more gracefully. Let's see, it is [source:src/allmydata/node.py@4108#L259] that invokes iputil.get_local_address_async(), and the result is passed to [source:src/allmydata/node.py@4108#L322 _setup_tub()]. What should we do when the answer to "list all of my IP addresses" is that there are none?
In any case it seems like maybe [source:src/allmydata/util/iputil.py@4108#L194 iputil's SequentialTrier] should return an empty set instead of None when it runs out of executables to query.
I'm investigating a cryptopp problem on my Windows Seven 64, using official Mingw-5.1.6, which shows up as a never-ending WebResultsRendering test_check.
I had to manually install pycryptopp, patching its config.h to disable the assembler parts (#define CRYPTOPP_DISABLE_ASM), in order to pass the Tahoe tests.
Kai, could you please run this command and report whether it loops forever:
Are you using 64-bit XP? What is your processor?
Thanks.
zooko: We are using r4108 because it is the latest generated tarball (the tarball builder seems to have been defunct since 21 November).
Please help me diagnose this pycryptopp problem. What version of pycryptopp?
There have been some bugs in amd64 Crypto++ assembly recently but all known problems are fixed in the most recent release of pycryptopp. There could be an interaction with Mingw -- I don't think many people have used Mingw to build pycryptopp before.
The tarballs stopped coming because some of the Supported Platforms stopped passing unit tests after r4108. That means that r4108 is the most recent version for which all the Supported Platforms passed all unit tests. r4108 is fine for now.
Attachment test.log (496 bytes) added
_trial_tmp\test.log
Replying to zooko:
I have no experience using darcs, so I downloaded the latest package in http://allmydata.org/source/tahoe/tarballs/ , that is, allmydata-tahoe-1.5.0-r4108.tar.bz2 .
Replying to zooko:
...
Oops. I set PATH=C:\mingw516\bin, which resulted in excluding route.exe from PATH!
As I had already installed cygwin tools including g++ and python, I thought that it was better to exclude them from PATH so as not to confuse the build process, and that setting PATH=C:\mingw516\bin was the easiest way to do it.
After installing Tahoe with PATH=C:\mingw516\bin, I started a new shell without changing PATH and started Tahoe with it. Tahoe started successfully and I could access the WUI with Firefox. However, when I tried to download a JPG file found in http://testgrid.allmydata.org:3567/ via my WUI, it froze again and no access to my WUI was possible, including access to the main screen. At this time, python.exe occupied ~100 % CPU.
Replying to Grumpf:
At this time, python occupied ~100 % CPU and had to be terminated with Task Manager. The content of _trial_tmp\test.log was:
The above test was done on my PC at home. WinXP Pro SP3 32 bit is installed, as the CPU is a PenM 1.2 GHz. When I first reported the problem, it was on the PC in my office, so I don't remember the detailed specs of the OS (sorry), but it was probably WinXP Pro SP3 32 bit running on a Core2Duo E7400.
Replying to zooko:
According to the WUI just after startup,
My versions
allmydata-tahoe: 1.5.0-r4108, foolscap: 0.4.2, pycryptopp: 0.5.17, zfec: 1.4.5, Twisted: 8.2.0, Nevow: 0.9.33, zope.interface: 3.5.2, python: 2.6.4, platform: Windows-XP-5.1.2600-SP3, sqlite: 3.5.9, simplejson: 2.0.9, pyopenssl: 0.9, argparse: 0.9.1, twisted: 8.2.0, nevow: 0.9.33-r17222, pyOpenSSL: 0.9, pyutil: 1.3.34, zbase32: 1.1.1, setuptools: 0.6c12dev, pysqlite: 2.4.1
Okay I moved the issue about route.exe not being found over to a new ticket: #854 (what to do when you can't find any IP address for yourself).
This ticket is now renamed to "unit tests hang on Windows (pycryptopp related?)". Let me see if I understand what has been reported so far:
... allmydata.test.test_checker.WebResultsRendering.test_check using the official "python.org" Python 2.6.4 (from http://www.python.org/download/ ) on Windows XP SP3 32-bit. His test.log doesn't have anything useful in it. Kai: would you please turn on verbose logging ( http://allmydata.org/trac/tahoe/browser/docs/logging.txt ) and try again?
... config.h? Does Tahoe-LAFS fail to build? Does it build but fail its unit tests? If so in what way? Also could you please run the pycryptopp unit tests.
Title changed from "networking failures on WinXP" to "unit tests hang on Windows (pycryptopp related?)".
Replying to zooko:
I couldn't turn on verbose logging mode either through the arguments of make or with SET FLOGTOTWISTED=1 etc. The content of test.log was the same as before.
I also tried SET FLOGFILE=flog.out, only to find an empty flog.out.
Zooko: Could you tell me what I am doing wrong with this, or how to hard-code the verbose mode flag into some .py files?
(Instead I manually inserted print inspect.stack()[0][1:3] after each line of WebResultsRendering.test_check and saw that u = uri.CHKFileURI("\x00"*16, "\x00"*32, 3, 10, 1234) causes the hang. And with respect to HashUtilTests, the cause was h1 = hashutil.convergence_hash(3, 10, 1000, "data", "secret"). But I didn't investigate further ...)
Additionally, when I downloaded and built pycryptopp independently from Tahoe, the unit test aborted:
At this time, the "sorry for the inconvenience" dialog appeared.
Moreover, the unit test of Crypto++ also aborted during the test of SHA-256.
It might be that Crypto++ doesn't compile with Mingw, but I'm not sure which part of the programs is responsible for our problem.
A working, if crappy, workaround is to downgrade GNU as from 2.20 (current MinGW) to the previous 2.19.1 (replacing both as.exe files and nothing else at all).
The problem is inherited from Crypto++.
Now to figure out whether it is a MinGW-centric problem, and to find a real fix...
Zooko: I suffer the exact same problems nodakai does. I pinned it down to Crypto++; the symptoms are never-ending Tahoe unit tests, a crash during the first pycryptopp unit test, and a never-ending Crypto++ SHA256 test.
Same underlying issue as http://www.mail-archive.com/cryptopp-users@googlegroups.com/msg05292.html ?
Maybe the fix mentioned in http://www.mail-archive.com/cryptopp-users@googlegroups.com/msg05299.html didn't make it into as 2.20.
Replying to davidsarah:
Sorry, the regression was in 2.20. It is fixed in some branch of 2.20, but not released.
Removing windows tag since this is known to happen also on Ubuntu Linux (https://bugs.launchpad.net/ubuntu/+bug/461303) when using the buggy as.
The pycryptopp bug is http://allmydata.org/trac/pycryptopp/ticket/31 .
Title changed from "unit tests hang on Windows (pycryptopp related?)" to "pycryptopp-related hang of unit tests on platforms using buggy Gnu as 2.20 (e.g. MinGW 5.1.x, Ubuntu Karmic)".
Here is the launchpad page which tracks this issue across all of the various operating systems and projects that it affects:
https://bugs.launchpad.net/ubuntu/+source/binutils/+bug/461303
The as bug is x86-specific. It's not clear whether it affects only x86-64, or also x86-32.
Hrm. So we could work-around this by releasing a new version of pycryptopp which inspects the version number of GNU assembler and the platform and tries to figure out if this particular version of GNU assembler has this bug, and if so then disable the assembly optimizations. Or, it could even be more clever and try compiling and executing, and if it segfaults then it turns off assembly optimization and tries again. But I think instead we should just close this as 'invalid', indicating that it is the problem of some other people (a combination of binutils to produce a new fixed release -- http://sourceware.org/bugzilla/show_bug.cgi?id=10856 -- and/or MingW to patch binutils or to upgrade binutils in MingW to the fixed release -- https://sourceforge.net/tracker/index.php?func=detail&aid=2913876&group_id=2435&atid=102435 ).
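For illustration only, the first option (inspect the assembler version and the platform, then turn off the assembly optimizations) could look roughly like the sketch below. It is not pycryptopp's actual build code. The function names and the list of x86 machine strings are made up for this example, and treating only the plain 2.20 release as suspect is an assumption based on the later comments in this ticket that binutils 2.20.1 and the 2.20.51 snapshot contain the fix. The text it parses is the first line printed by as --version.

```python
import re

# Assumption: only the plain 2.20 release is treated as suspect here.
SUSPECT_VERSIONS = {(2, 20)}

# Assumption: illustrative list of machine names that mean "x86 family".
X86_MACHINES = ("x86", "i386", "i686", "x86_64", "amd64")


def parse_gnu_as_version(version_output):
    """Return the version from `as --version` output as a tuple of ints.

    Returns None if no version number can be found.
    """
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    match = re.search(r"GNU assembler.*?(\d+(?:\.\d+)+)", first_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def extra_compile_macros(version_output, machine):
    """Return define_macros entries that disable Crypto++ assembly if needed."""
    version = parse_gnu_as_version(version_output)
    is_x86 = machine.lower() in X86_MACHINES
    if is_x86 and version in SUSPECT_VERSIONS:
        # Same switch that was patched into config.h by hand in this ticket.
        return [("CRYPTOPP_DISABLE_ASM", None)]
    return []
```

A build script could obtain the two inputs from subprocess.check_output(["as", "--version"]) and platform.machine(), and pass the result as define_macros to the extension. A version check like this can only recognise releases someone has already identified as bad, which is one argument for the self-test idea discussed below.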
Replying to zooko:
I don't entirely agree. This isn't the first time that pycryptopp/Crypto++ has failed in a platform-dependent way that was detected by unit tests -- see pycryptopp#17 and pycryptopp#24. If pycryptopp (or Crypto++) were to do the "more clever" behaviour above, then we'd have a good chance of heading off such problems in future. (Some bugs would be prevented, others would fail compilation rather than failing only if unit tests are run explicitly.) But that can be a separate ticket.
Well, the idea I floated above of attempting compilation, then testing, then recompiling a different way if the test fails strikes me as "too clever". I could imagine a bug in that process leading to problems even when there isn't a bug in Crypto++! On the other hand, I definitely think we should add a "quick start-up self-test" to pycryptopp, and in fact I have already done so but I haven't committed it to pycryptopp trunk yet. Once that is in place then Tahoe-LAFS can import pycryptopp, execute the quick start-up self-test, and if it fails Tahoe-LAFS can exit quickly and loudly.
What do you think?
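A rough sketch of what "exit quickly and loudly" could mean in practice, given that the symptom in this ticket is a hang rather than a wrong answer: run a known-answer test in a child process with a timeout. This is not the self-test mentioned above, which has not been published in this thread. hashlib is used purely as a stand-in for the library under test, and quick_self_test, CHILD_CODE and the 10-second default are names and values invented for this example.

```python
import subprocess
import sys

# Known-answer vector: SHA-256 of b"abc".
SHA256_ABC = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

# Program run in the child process. hashlib is only a stand-in here;
# a real check would call the library being tested instead.
CHILD_CODE = (
    "import hashlib, sys\n"
    "digest = hashlib.sha256(b'abc').hexdigest()\n"
    "sys.exit(0 if digest == %r else 1)\n" % SHA256_ABC
)


def quick_self_test(child_code=CHILD_CODE, timeout=10.0):
    """Run the self-test in a child process and return (ok, reason).

    Running in a child means a hang or a crash in native code cannot
    take the calling process down with it.
    """
    try:
        proc = subprocess.run([sys.executable, "-c", child_code], timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, "self-test timed out (possible hang)"
    if proc.returncode != 0:
        return False, "self-test failed with exit code %d" % proc.returncode
    return True, "ok"
```

A caller would check the result at start-up and stop with a clear message, for example by calling sys.exit(reason) when ok is false.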
By the way there is an idiom in Python packaging which is to try compiling a native-code extension module and if the compilation fails (such as if there is no C compiler present, or some necessary header files are missing), then go ahead and use the pure-Python implementation of that module instead. So maybe if we wanted to go down this route we could start by adapting that idiom.
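The import-time half of that idiom, which pairs with the build-time fallback, is usually just a guarded import. The sketch below assumes a hypothetical native module named _hypothetical_native_xor that exposes an xor function. Neither exists in pycryptopp or Tahoe-LAFS, and the XOR routine is only a placeholder for whatever the module really implements.

```python
import importlib


def pure_python_xor(data, key):
    """Pure-Python reference implementation (placeholder functionality)."""
    return bytes(byte ^ key for byte in data)


def load_xor(native_module_name="_hypothetical_native_xor"):
    """Return (implementation, origin), preferring the native extension.

    If the extension was never built or cannot be imported, fall back
    to the pure-Python version.
    """
    try:
        module = importlib.import_module(native_module_name)
    except ImportError:
        return pure_python_xor, "pure-python"
    return module.xor, "native"
```

Note that this fallback only covers an extension that is missing. An extension that builds and imports but misbehaves, as in this ticket, would still need something like a self-test to catch it.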
Replying to zooko:
+1.
FreeStorm also hit this bug, on Windows XP with MinGW. The symptoms were slightly different: when running the pycryptopp tests using python setup.py test, the error was
This was solved by downgrading to binutils 2.19.1 -- specifically, by extracting binutils-2.19.1-mingw32-bin.tar.gz over the MinGW installation directory.
Replying to davidsarah:
AdvancedInstall#Whatifthatdoesntwork has been updated to describe this.
I just tested this on MinGW, and it looks like the issue has been fixed there (probably because they upgraded to the latest snapshot of binutils, version 2.20.51.20100613, to fix an unrelated bug).
Thanks, Wei Dai. I've updated wiki/AdvancedInstall#Whatifthatdoesntwork.
binutils 2.20.1, released 2010-03-03, does not have the ChangeLog entry from http://sourceware.org/bugzilla/show_bug.cgi?id=10856#c5 but does have the patch to expr.c. Weird. But I guess it is fixed in binutils 2.20.1.