non-deterministic test hang on OpenBSD #2017
Reference: tahoe-lafs/trac-2024-07-25#2017
sickness's OpenBSD buildslave showed a test timeout:
(from https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/27)
Rerunning the tests with the exact same build (using Buildbot's "force rebuild" feature) resulted in success:
https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/28
In that run (build number 28), those tests took only a few seconds:
(from https://tahoe-lafs.org/buildbot-tahoe-lafs/builders/sickness%20OpenBSD%205.0%20x86%20py2.7/builds/28/steps/test/logs/timings)
So there is a non-deterministic bug that shows up on sickness's buildslave and causes those two tests to hang.
Questions:
Does this happen on any other buildslaves?
Did this ever happen before the recent patches which changed the behavior of iputil (b0883807361830c609dff1677c3cb34fd64d3ebb, f97b8e5e1df75284aa9b89dd830f8728040eab67, 08590b1f6a880d51751fdcacea6a007ebc568f2e, 16b245563db2f6ca71b9332b06debbe3e1d734b4, b31a4f6e870cb56efa40c785a868a944b964e8b9, a493ee0bb641175ecf918e28fce4d25df15994b6, 6104950ed8a7a356eed2218f2df958d074022eea, f77ec470d75f4b8fb81b1abca4ee3b73f1ad8b22, 8e31d66cd0b0821ccaa2c7c259e7d6f262ad4738, 6a445d73bc5253ec4ae0dec70af02e33bc869cf6)?
I suspect those iputil patches of causing this hang.
sickness: could you please run the unit tests from the current trunk version repeatedly with trial's --until-failure option?
./bin/tahoe debug trial --until-failure allmydata.test
(See wiki/HowToWriteTests for more options.) If you can reliably reproduce the problem, then would you use git to rewind to before those patches and see if that makes the problem go away? Thanks!
I did run the tests as requested (rsyncing the build subdir of the buildbot into /tmp/someotherdir because I don't know how to properly check out trunk) and here are the results:
so then I've run:
If it were the iputil patches, why does it only affect test_client_no_noise? Many other tests depend on the iputil address-finding code.
Actually, sickness's result in comment:92470 proves that it wasn't the iputil patches. (changeset:d85a75d7f689cb55ecddb319dc2057f38e4db87a/trunk was before those patches.)
sickness: we don't know how to proceed with this. It seems like there might be an old bug, since, per daira's comment:92472, changeset:d85a75d7f689cb55ecddb319dc2057f38e4db87a/trunk was before the recent iputil patches. So, could you investigate which version(s) of tahoe-lafs exhibit this failure on your OpenBSD system and report back? You might just start with the Tahoe-LAFS v1.10 release f9af0633d8da426cbcaed3ff05ab6d7128148bb0 and see whether your system can stably run this test on that version. Thanks!
I have to add that after the previous test exits, a process seems to still hang:
tahoe --quiet start test_runner/RunNode/test_client_no_noise/c1
and it seems to sit there forever until I manually kill it :/
Hm. Could you try editing the source code, i.e. the code of source:trunk/src/allmydata/test/test_runner.py, and add these lines somewhere near the top:
And then re-run that experiment?
The OpenBSD builder is gone. This could be re-opened and fixed by a dedicated OpenBSD maintainer but won't be otherwise.