Coverage criteria tell us how much of the production code is exercised by a test. What they all miss is whether the assertions in those tests are strong enough to catch bugs. If we introduce a bug in the code, even in a line covered by a test, will the test break?
As mentioned earlier, coverage alone is not enough to determine whether a test suite is good. So far, we have judged the strength of our test suite by how much of the code it reaches. Now let’s think about the test suite’s fault detection capability: how many bugs can it reveal?
This is the idea behind mutation testing. In a nutshell, we purposefully insert a bug in the existing code and check whether the test suite breaks. If it does, that’s a point for the test suite. If it does not (all tests are green even with the bug in the code), we have found something to improve in our test suite. We then repeat the process: we create another buggy version of the program by changing something else in the code, and we check whether the test suite captures that bug.
These buggy versions are mutants of the original, supposedly correct, version of the program. If the test suite breaks when executed against a mutant, we say that the test suite kills that mutant. If it does not break, we say that the mutant survives. A test suite achieves 100% mutation coverage if it kills all possible mutants.
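To make "killed" and "survives" concrete, here is a minimal sketch. All names in it (`AgeCheck`, `MutantSketch`, and so on) are hypothetical and do not come from the text, and the mutant is written by hand rather than generated by a tool. The weak suite only checks values far from the boundary, so the mutant survives; the stronger suite adds boundary checks and kills it.

```java
// Hypothetical example: none of these names come from the original text.
interface AgeCheck {
    boolean isAdult(int age);
}

class MutantSketch {
    // Original, supposedly correct, implementation.
    static final AgeCheck ORIGINAL = age -> age >= 18;

    // Hand-written mutant: >= was changed to >.
    static final AgeCheck MUTANT = age -> age > 18;

    // A weak suite: it only checks values far from the boundary.
    // Returns true when all of its checks pass (the suite is "green").
    static boolean weakSuitePasses(AgeCheck impl) {
        return impl.isAdult(30) && !impl.isAdult(5);
    }

    // A stronger suite: it also checks the values around the boundary.
    static boolean strongSuitePasses(AgeCheck impl) {
        return weakSuitePasses(impl) && impl.isAdult(18) && !impl.isAdult(17);
    }

    // Percentage of mutants killed; assumes total > 0.
    static double mutationCoverage(int killed, int total) {
        return 100.0 * killed / total;
    }
}
```

Both suites are green on `ORIGINAL`. On `MUTANT`, the weak suite stays green (the mutant survives), while the stronger suite breaks on `isAdult(18)` (the mutant is killed).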
Mutation testing makes two interesting assumptions. First, the competent programmer hypothesis assumes that the program is written by a competent programmer and that the implemented version is either correct or differs from the correct program by a combination of simple errors. Second, the coupling effect says that a complex bug is caused by a combination of many small bugs. Therefore, if your test suite can catch simple bugs, it will also catch the more complex ones.
Pitest is the most popular open source tool for mutation testing in Java. Here are a few examples of mutators from its manual:
- Conditionals boundary: Relational operators such as `<` and `<=` are replaced by other relational operators.
- Increment: It replaces `i++` with `i--` and vice versa.
- Invert negatives: It negates variables; for example, `i` becomes `-i`.
- Math operators: It replaces mathematical operators; for example, a plus becomes a minus.
- True returns: It replaces entire boolean variables with `true`.
- Remove conditionals: It replaces entire `if` statements with a simple `if(true) {...}`.
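The sketch below shows what three of these mutators could do to a small method. It is only an illustration: `sumBelow` and the class name are hypothetical, and the mutants are written by hand at the source level, whereas Pitest applies its mutations to compiled bytecode.

```java
// Hypothetical example: hand-written, source-level versions of three mutators.
class MutatorSketch {
    // Original method: sums the values strictly below a limit.
    static int sumBelow(int[] values, int limit) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] < limit) {
                sum = sum + values[i];
            }
        }
        return sum;
    }

    // Conditionals boundary: the < in the if condition becomes <=.
    static int sumBelowBoundaryMutant(int[] values, int limit) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] <= limit) {
                sum = sum + values[i];
            }
        }
        return sum;
    }

    // Math operators: the plus becomes a minus.
    static int sumBelowMathMutant(int[] values, int limit) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] < limit) {
                sum = sum - values[i];
            }
        }
        return sum;
    }

    // Remove conditionals: the if condition is replaced with true.
    static int sumBelowNoConditionMutant(int[] values, int limit) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            if (true) {
                sum = sum + values[i];
            }
        }
        return sum;
    }
}
```

A test that calls these with the values 1 and 2 and a limit of 3 distinguishes only the math mutant from the original. Adding the value 3 to the input also exposes the boundary and remove-conditionals mutants, which is the kind of gap a surviving mutant points to.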
Running Pitest is simple, as it comes with plugins for Maven and Gradle. For example, I ran it against the `LeftPad` implementation and tests we wrote earlier; figure 3.11 shows the resulting report. As in a code coverage report, a line’s background color indicates whether all the mutants were killed by the test suite.
Figure 3.11 Part of a report generated by Pitest. Lines 26, 31, 32, 36, 38, 39, 43, and 44 have surviving mutants.
The next step is to evaluate the surviving mutants. It is very important to analyze each surviving mutant, as some may not be useful.
Remember that mutation testing tools do not know your code; they simply mutate it. This sometimes means they create mutants that are not useful. For example, in the line that contains `int pads = size - strLen`, Pitest mutated the `size` variable to `size++`. Our test suite does not catch this bug, but this is not a useful mutant: the `size` variable is not used after this line, so incrementing it has no effect on the program.
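The following sketch isolates that line to show why no test can kill this mutant. It is a simplified, hypothetical reduction, not the actual `LeftPad` code: only the `pads` computation is kept, and the method and class names are made up for illustration.

```java
// Hypothetical reduction of the mutated line; not the real LeftPad code.
class EquivalentMutantSketch {
    // Original computation.
    static int padsOriginal(int size, int strLen) {
        int pads = size - strLen;
        return pads;
    }

    // Mutant: size becomes size++. The post-increment yields the old value
    // of size to the subtraction, and size is never read again afterward.
    static int padsMutant(int size, int strLen) {
        int pads = size++ - strLen;
        return pads;
    }
}
```

Because both methods return the same value for every input, no assertion on the result can tell them apart.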
You should view mutation testing in the same way as coverage tools: it can augment the test suite engineered based on the program’s requirements.
Mutation testing faces various challenges in practice, including the cost. To use mutation testing, we must generate many mutants and execute the whole test suite with each one. This makes mutation testing quite expensive. Considerable research is dedicated to lowering the cost of mutation testing, such as reducing the number of mutants to try, detecting equivalent mutants (mutants that are identical to the original program in terms of behavior), and reducing the number of test cases or test case executions (see the work of Ferrari, Pizzoleto, and Offutt, 2018). As a community, we are taking steps toward a solution, but we are not there yet.
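A rough way to see where the cost comes from is to count test executions. The sketch below is an assumption-laden illustration, not a description of how any particular tool or cited technique works: the `failsOn` matrix is hypothetical input data, and stopping at the first failing test is just one simple way to reduce the number of test case executions.

```java
// Hypothetical cost model: failsOn[m][t] is true when test t fails on mutant m.
class MutationCostSketch {
    // Naive strategy: run every test against every mutant.
    static int naiveExecutions(boolean[][] failsOn) {
        int executions = 0;
        for (boolean[] mutant : failsOn) {
            executions += mutant.length;
        }
        return executions;
    }

    // Cheaper strategy: stop running tests against a mutant as soon as
    // one test fails, because the mutant is already killed at that point.
    static int earlyStopExecutions(boolean[][] failsOn) {
        int executions = 0;
        for (boolean[] mutant : failsOn) {
            for (boolean testFails : mutant) {
                executions++;
                if (testFails) {
                    break;
                }
            }
        }
        return executions;
    }
}
```

With three mutants and three tests, the naive strategy always costs nine executions. Surviving mutants still cost a full run of the suite under either strategy, which is one reason reducing the number of mutants matters as well.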
Despite the cost, mutation testing is highly beneficial. In a paper by Parsai and Demeyer (2020), the authors demonstrate that mutation coverage reveals additional weaknesses in the test suite compared to branch coverage and that it can do so with an acceptable performance overhead during project build. Even large companies like Google are investing in mutation testing in their systems, as reported by Petrović and Ivanković (2018).
Researchers are also exploring mutation testing in areas other than Java backend code. Yandrapally and Mesbah (2021) propose mutations for the Document Object Model (DOM) in HTML pages to assess whether web tests are strong enough. In addition, Tuya and colleagues (2006) proposed the use of mutation in SQL queries.
I suggest that you try to apply mutation testing, especially in more sensitive parts of your system. While running mutation testing for the entire system can be expensive, running it for a smaller set of classes is feasible and may give you valuable insights about what else to test.