diff options
author | Anna Wojciechowska <anna.wojciechowska@qt.io> | 2024-10-16 19:52:18 +0200 |
---|---|---|
committer | Volker Hilsheimer <volker.hilsheimer@qt.io> | 2025-02-09 13:21:36 +0000 |
commit | 546d1dcc8bace489074784d56dc785816210f54f (patch) | |
tree | 0fc71361c8eb31fbb2fd628b3543f4a32d5a615e | |
parent | b28cacdaa4e4c3cd3778b549d8ccc6da26cca1ee (diff) |
QUIP-24: Refine rules for blacklisting automated tests
Refine the rules related to the blacklisting feature.
Clarify that blacklisting should be treated as a
temporary measure. Define a specific grace period
for resolving the issue, and ensure the issue
is addressed within that timeframe
Change-Id: I368fc394a996d55fcf8a5297c747fc1c66d38007
Reviewed-by: Axel Spoerl <axel.spoerl@qt.io>
Reviewed-by: Shawn Rutledge <shawn.rutledge@qt.io>
Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
-rw-r--r-- | quip-0024-Blacklisting-Rules.rst | 129 |
1 files changed, 129 insertions, 0 deletions
diff --git a/quip-0024-Blacklisting-Rules.rst b/quip-0024-Blacklisting-Rules.rst new file mode 100644 index 0000000..355d83a --- /dev/null +++ b/quip-0024-Blacklisting-Rules.rst @@ -0,0 +1,129 @@ +QUIP: 24 +Title: Rules and guidelines for ignoring results of flaky automatic tests +Author: Anna Wojciechowska, + Axel Spörl, + Edward Welbourne +Status: Active +Type: Process +Created: 2024-07-19 + +Motivation +========== + +The blacklisting functionality was introduced in Qt 5.4, replacing the previous +insignificant_test mechanism. Blacklisting was implemented as a tool to mitigate +the CI (Continuous Integration)-blocking effect of flaky tests. A blacklisted +test will run as usual; but if it fails, it prints BFAIL instead of FAIL, +and the exit code is successful. If it passes, it prints BPASS instead of PASS. +It enables integrations despite a failing automated test, thus creating +technical debt. While solving an immediate problem, it poses the risk of +faulty commits being merged instead of being prevented by a test failure. + +Since its introduction, there has been a steady increase in the number of +blacklisted automated tests. There are different approaches to calculating the +number of blacklisted tests. In this document, we refer to the blacklisted +functions metric, defined as the number of distinct blacklisted test functions +(without the distinction of data tags) that are run as part of the Health Check +Coin task in the dev branch. Using this metric, we counted 560 blacklisted test +functions, which constituted 2.8% of the total test functions as of July 8, 2024. + +The absolute number continues to rise, despite efforts to unblacklist +tests; the relative number due to the growth of The Qt Project remained steady +in the last year. Some tests have remained blacklisted since the mechanism's +introduction. As of July 2024, the average age of a blacklisted test is 3.6 +years, and 260 tests have been blacklisted for more than three years. Those +numbers show that the blacklisting feature sometimes has the effect of +permanently excluding test results, despite spending CI time to keep running +them. + +Recommendations +=============== + +This QUIP sets out recommended practice for the handling of flaky tests. +It clarifies that blacklisting is a form of quarantine, selectively ignoring +a flaky test so that it does not block development. Since that may allow +regressions to get integrated, that the test would have caught, this quarantine +does not change the fact that the flaky test is in need of fixing. This +technical debt should be tracked and visible to contributors responsible for +or interested in the relevant code. + +A test that sporadically crashes cannot be guarded against with blacklisting. +In such a case, other mechanisms for avoiding the problem should be used: see +`Best practices for test`_ for guidance on what to do in such a case. + +Review Guideline for Patches that Blacklist Tests +------------------------------------------------- + +A test may be blacklisted if all of the following conditions are met: + +- The test has demonstrated flakiness in integrations. + See more at `general information about qt test system + <https://wiki.qt.io/Qt_test_system>`_ +- A preliminary investigation has shown that it is difficult to provide + an immediate fix. +- There is a relevant open, prioritized as (at least) important (P2) and + assigned ticket at https://bugreports.qt.io, ensuring that the matter + is followed up upon. +- The ticket must be mentioned as a comment in the BLACKLIST file. + +Blacklisting the test means adding an entry to the test directory's BLACKLIST +file. This entry comprises a heading for each test to be blacklisted followed +by condition lines that identify the contexts in which the test is known to be +flaky. These conditions should be narrowly specified, to avoid ignoring the test +in contexts in which it is not known to be flaky. + +Each condition line added should be marked with a comment identifying a bug +report that shall remain open once the blacklisting has been integrated. Such +bug reports should also be identified by `Task-number` footers in the commit +message, to enable those later working on the ticket to find and revert the +blacklisting. Such bug reports should be assigned, prioritized and otherwise +annotated suitably to ensure they are visible to relevant contributors, to +enable them to suitably prioritize work to fix the flakiness within the context +of their work on Qt. + +Reviewers should verify, in addition to the preconditions above, that each +condition line added to the BLACKLIST file has a comment referencing a bug +report, as above, and that each such bug report: + +- Appears in a `Task-number` footer of the commit message (and *not* in + a `Fixes` footer), +- Describes the flakiness clearly, including relevant excerpts from logs of + builds exhibiting flaky failures (not just links to the logs), and +- Is suitably assigned, prioritized and annotated to ensure it shall get the + attention of relevant contributors. + +When to Retract a Blacklisting +------------------------------ + +Ideally the bug report relating to the blacklisted test shall be addressed, +leading to the blacklisting being retracted as part of fixing the problems that +cause it. + +Contributors may, in any case, make changes that cause the blacklisted test to +cease being flaky. One of the concerns with blacklisting is that a contribution +that the test should have caught may get through integration. This may lead to +the formerly flaky test always failing. Equally, since the underlying cause of +the flakiness is typically unknown, a contribution may fix it without those +involved being aware they are doing so. + +The Continuous Integration system keeps track of the history of results from +blacklisted tests (among others). This makes it possible to review the history +of past results for a particular test and configuration condition to determine +whether it is still flaky. A blacklisted test is only flaky if it does produce +a mix of BPASS and BFAIL outcomes. If it ceases being flaky, its blacklisting +should be retracted. See `Best practices for test +<https://doc.qt.io/qt-6/qttest-best-practices-qdoc.html>`_ for how to resolve +that. + + +References +========== + +- `Qt test lib tutorial`: https://doc.qt.io/qt-6/qttestlib-tutorial6.html + +- `QEXPECT_FAIL documentation`: https://doc.qt.io/qt-6/qtest.html#QEXPECT_FAIL + +- `Reproducing autotests fails`: https://wiki.qt.io/How_to_reproduce_autotest_fails + +- `Best practices for test`: https://doc.qt.io/qt-6/qttest-best-practices-qdoc.html +- `General Qt test system information`: https://wiki.qt.io/Qt_test_system |