Many tests are flaky #3412
As observed in this run, part of a PR that doesn't change any code, test_POST_upload_replace fails twice: once with a twisted.web.error and once with a KeyError.

In this build, allmydata.test.test_system.SystemTest fails with a FailTest (#3321).

This run reveals that test_filesystem is flaky.

This run also looks flaky in test_status_path_404_error.

In this run, test_POST_rename_file failed with the same errors as test_POST_upload_replace, suggesting all of test_web is flaky.

(A pull request was referenced, and the issue was retitled from "test_POST_upload_replace is flaky" to "Many tests are flaky".)

My plan is to just use this issue as an umbrella to capture most of the tests that are currently flaky and mark each with a 'retry' decorator to bypass the flakiness until someone has time to address the root cause.
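The PR itself isn't reproduced here; as a minimal sketch only, this is roughly what such a decorator could look like for an ordinary synchronous test method. The names `retry` and `attempts`, and the example test, are assumptions for illustration, not the PR's actual API.

```python
import random
import unittest
from functools import wraps


def retry(attempts=3):
    """Hypothetical decorator: re-run a synchronous test method on assertion failure."""
    def decorator(test_method):
        @wraps(test_method)
        def wrapper(*args, **kwargs):
            for remaining in range(attempts, 0, -1):
                try:
                    return test_method(*args, **kwargs)
                except AssertionError:
                    # Only assertion failures are retried in this sketch;
                    # other errors propagate immediately.
                    if remaining == 1:
                        raise  # out of attempts, let the failure surface
        return wrapper
    return decorator


class ExampleTests(unittest.TestCase):
    @retry(attempts=3)
    def test_sometimes_flaky(self):
        # Stand-in for a flaky assertion.
        self.assertLess(random.random(), 0.5)
```

As the following comments explain, this straightforward approach is not sufficient for trial tests that return Deferreds.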
I've flagged this as review-needed, but I'm not confident the PR covers all of the flaky tests. Even the most recent runs are still failing on deprecations (tracked separately in #3414). Given the amount of toil this work requires, I'd like to get an initial review of the approach and feedback on the issue generally.
The collective consensus has been that we'll live with flaky tests... though personally, I'd like to see some improvement here, as it seems to be an issue multiple times per week, and CI is slow - as a result, we have eyeballed failing tests and decided to merge anyway (IIRC, this introduced a bug into master at least once, because the test was actually failing, not just flaky).
So I guess if there's a quick and dirty "fix" like retrying, that's worth doing but we should discuss further before investing significant effort in addressing root causes.
I think everyone would like to see some improvement. It's just not clear how to make that happen.
This ticket is a partial duplicate of https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3321
In the PR, exarkun advised:
So I reached out for advice.
In the #twisted IRC channel, I got some advice from tos9, Glyph, and others regarding the fact that trial allows test methods to return a Deferred object, which may have one or more callbacks that will only be executed later, so wrapping those methods in retry will have little effect.

I created trial-retry to work out the kinks, and there I landed on a technique that seems to have the intended effect.
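trial-retry itself isn't quoted here; the following is only an illustrative sketch, under my own assumptions, of the general shape a Deferred-aware retry can take: instead of a try/except around the call, retries are chained through the errback of whatever Deferred the test method returns. The name `retry_deferred` and the example test are hypothetical.

```python
import random
from functools import wraps

from twisted.internet import defer, reactor, task
from twisted.trial import unittest


def retry_deferred(attempts=3):
    """Hypothetical decorator: re-run a (possibly Deferred-returning) test method on failure."""
    def decorator(test_method):
        @wraps(test_method)
        def wrapper(self, *args, **kwargs):
            def attempt(remaining):
                # maybeDeferred handles both plain and Deferred-returning methods.
                d = defer.maybeDeferred(test_method, self, *args, **kwargs)
                if remaining > 1:
                    # On failure, drop the Failure and run the method again.
                    d.addErrback(lambda _failure: attempt(remaining - 1))
                return d
            return attempt(attempts)
        return wrapper
    return decorator


class ExampleTests(unittest.TestCase):
    @retry_deferred(attempts=3)
    def test_async_flaky(self):
        # Stand-in for a flaky asynchronous assertion.
        return task.deferLater(reactor, 0, self.assertLess, random.random(), 0.5)
```

Even so, as the later comments note, this only helps when the failure actually propagates through the Deferred the method returns.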
Glyph had this to say about the approach:
It remains to be seen if the issue is merely theoretical or if such limitations might be triggered in this codebase.
Acknowledging this potential weakness of the approach, I plan to apply this technique to the work in progress and see if it improves the reliability of the test suite.
Turns out the problem is even more complicated than I imagined. As illustrated here, tests can create Deferred objects and do not even need to return them for them to be honored in the test results. And because the handling of those results happens outside the scope of the method, there's no decorator on the method that can intervene when those Deferreds fail... at least, not without some support from the test runner.
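To make that failure mode concrete, here is an illustrative sketch (my own construction, not from the ticket) of a test that creates a Deferred it never returns. The failure is attributed to the test by trial's own bookkeeping (e.g. unhandled-error logging when the Deferred is garbage-collected), so a decorator wrapping the method never sees it and cannot retry.

```python
from twisted.internet import defer
from twisted.trial import unittest


class OrphanDeferredExample(unittest.TestCase):
    def test_orphan_deferred(self):
        # This Deferred is never returned from the test method.  Its
        # unhandled Failure is reported via trial's error logging when the
        # Deferred is garbage-collected, outside anything a decorator on
        # this method could intercept.
        d = defer.Deferred()
        d.errback(RuntimeError("flaky failure"))
        del d
```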
Given that, I'm going to shelve this effort for now.
Here is some of the conversation about this effort:
I've thought about this some more and had some ideas. Like what if the asynchronous test could be synchronized then retried? That doesn't work because the event loop is already created for setup/teardown.
This is happening... a lot less? Going to close it, can open new one if new issues arise.