QUIP-24: Refine rules for blacklisting automated tests

Refine the rules related to the blacklisting feature. Clarify that blacklisting should be treated as a temporary measure. Define a specific grace period for resolving the issue, and ensure the issue is addressed within that timeframe Change-Id: I368fc394a996d55fcf8a5297c747fc1c66d38007 Reviewed-by: Axel Spoerl <axel.spoerl@qt.io> Reviewed-by: Shawn Rutledge <shawn.rutledge@qt.io> Reviewed-by: Volker Hilsheimer <volker.hilsheimer@qt.io>
author: Anna Wojciechowska <anna.wojciechowska@qt.io> 2024-10-16 19:52:18 +0200
committer: Volker Hilsheimer <volker.hilsheimer@qt.io> 2025-02-09 13:21:36 +0000
commit: 546d1dcc8bace489074784d56dc785816210f54f (patch)
tree: 0fc71361c8eb31fbb2fd628b3543f4a32d5a615e
parent: b28cacdaa4e4c3cd3778b549d8ccc6da26cca1ee (diff)
1 files changed, 129 insertions, 0 deletions
diff --git a/quip-0024-Blacklisting-Rules.rst b/quip-0024-Blacklisting-Rules.rst
new file mode 100644
index 0000000..355d83a
--- /dev/null
+++ b/quip-0024-Blacklisting-Rules.rst
@@ -0,0 +1,129 @@
+QUIP: 24
+Title: Rules and guidelines for ignoring results of flaky automatic tests
+Author: Anna Wojciechowska,
+        Axel Spörl,
+        Edward Welbourne
+Status: Active
+Type: Process
+Created: 2024-07-19
+
+Motivation
+==========
+
+The blacklisting functionality was introduced in Qt 5.4, replacing the previous
+insignificant_test mechanism. Blacklisting was implemented as a tool to mitigate
+the CI (Continuous Integration)-blocking effect of flaky tests. A blacklisted
+test will run as usual; but if it fails, it prints BFAIL instead of FAIL,
+and the exit code is successful. If it passes, it prints BPASS instead of PASS.
+It enables integrations despite a failing automated test, thus creating
+technical debt. While solving an immediate problem, it poses the risk of
+faulty commits being merged instead of being prevented by a test failure.
+
+Since its introduction, there has been a steady increase in the number of
+blacklisted automated tests. There are different approaches to calculating the
+number of blacklisted tests. In this document, we refer to the blacklisted
+functions metric, defined as the number of distinct blacklisted test functions
+(without the distinction of data tags) that are run as part of the Health Check
+Coin task in the dev branch. Using this metric, we counted 560 blacklisted test
+functions, which constituted 2.8% of the total test functions as of July 8, 2024.
+
+The absolute number continues to rise, despite efforts to unblacklist
+tests; the relative number due to the growth of The Qt Project remained steady
+in the last year. Some tests have remained blacklisted since the mechanism's
+introduction. As of July 2024, the average age of a blacklisted test is 3.6
+years, and 260 tests have been blacklisted for more than three years. Those
+numbers show that the blacklisting feature sometimes has the effect of
+permanently excluding test results, despite spending CI time to keep running
+them.
+
+Recommendations
+===============
+
+This QUIP sets out recommended practice for the handling of flaky tests.
+It clarifies that blacklisting is a form of quarantine, selectively ignoring
+a flaky test so that it does not block development.  Since that may allow
+regressions to get integrated, that the test would have caught, this quarantine
+does not change the fact that the flaky test is in need of fixing. This
+technical debt should be tracked and visible to contributors responsible for
+or interested in the relevant code.
+
+A test that sporadically crashes cannot be guarded against with blacklisting.
+In such a case, other mechanisms for avoiding the problem should be used: see
+`Best practices for test`_ for guidance on what to do in such a case.
+
+Review Guideline for Patches that Blacklist Tests
+-------------------------------------------------
+
+A test may be blacklisted if all of the following conditions are met:
+
+- The test has demonstrated flakiness in integrations.
+  See more at `general information about qt test system
+  <https://wiki.qt.io/Qt_test_system>`_
+- A preliminary investigation has shown that it is difficult to provide
+  an immediate fix.
+- There is a relevant open, prioritized as (at least) important (P2) and
+  assigned ticket at https://bugreports.qt.io, ensuring that the matter
+  is followed up upon.
+- The ticket must be mentioned as a comment in the BLACKLIST file.
+
+Blacklisting the test means adding an entry to the test directory's BLACKLIST
+file. This entry comprises a heading for each test to be blacklisted followed
+by condition lines that identify the contexts in which the test is known to be
+flaky. These conditions should be narrowly specified, to avoid ignoring the test
+in contexts in which it is not known to be flaky.
+
+Each condition line added should be marked with a comment identifying a bug
+report that shall remain open once the blacklisting has been integrated. Such
+bug reports should also be identified by `Task-number` footers in the commit
+message, to enable those later working on the ticket to find and revert the
+blacklisting. Such bug reports should be assigned, prioritized and otherwise
+annotated suitably to ensure they are visible to relevant contributors, to
+enable them to suitably prioritize work to fix the flakiness within the context
+of their work on Qt.
+
+Reviewers should verify, in addition to the preconditions above, that each
+condition line added to the BLACKLIST file has a comment referencing a bug
+report, as above, and that each such bug report:
+
+- Appears in a `Task-number` footer of the commit message (and *not* in
+  a `Fixes` footer),
+- Describes the flakiness clearly, including relevant excerpts from logs of
+  builds exhibiting flaky failures (not just links to the logs), and
+- Is suitably assigned, prioritized and annotated to ensure it shall get the
+  attention of relevant contributors.
+
+When to Retract a Blacklisting
+------------------------------
+
+Ideally the bug report relating to the blacklisted test shall be addressed,
+leading to the blacklisting being retracted as part of fixing the problems that
+cause it.
+
+Contributors may, in any case, make changes that cause the blacklisted test to
+cease being flaky. One of the concerns with blacklisting is that a contribution
+that the test should have caught may get through integration. This may lead to
+the formerly flaky test always failing. Equally, since the underlying cause of
+the flakiness is typically unknown, a contribution may fix it without those
+involved being aware they are doing so.
+
+The Continuous Integration system keeps track of the history of results from
+blacklisted tests (among others). This makes it possible to review the history
+of past results for a particular test and configuration condition to determine
+whether it is still flaky. A blacklisted test is only flaky if it does produce
+a mix of BPASS and BFAIL outcomes. If it ceases being flaky, its blacklisting
+should be retracted. See  `Best practices for test
+<https://doc.qt.io/qt-6/qttest-best-practices-qdoc.html>`_ for how to resolve
+that.
+
+
+References
+==========
+
+- `Qt test lib tutorial`: https://doc.qt.io/qt-6/qttestlib-tutorial6.html
+
+- `QEXPECT_FAIL documentation`: https://doc.qt.io/qt-6/qtest.html#QEXPECT_FAIL
+
+- `Reproducing autotests fails`: https://wiki.qt.io/How_to_reproduce_autotest_fails
+
+- `Best practices for test`: https://doc.qt.io/qt-6/qttest-best-practices-qdoc.html
+- `General Qt test system information`: https://wiki.qt.io/Qt_test_system
author	Anna Wojciechowska <anna.wojciechowska@qt.io>	2024-10-16 19:52:18 +0200
committer	Volker Hilsheimer <volker.hilsheimer@qt.io>	2025-02-09 13:21:36 +0000
commit	546d1dcc8bace489074784d56dc785816210f54f (patch)
tree	0fc71361c8eb31fbb2fd628b3543f4a32d5a615e
parent	b28cacdaa4e4c3cd3778b549d8ccc6da26cca1ee (diff)