4. Software Testing

Software testing is a critical process in software development, focusing on evaluating software to ensure it meets specified requirements and standards. It is instrumental in identifying defects, ensuring quality, and increasing confidence in the software’s functionality and performance.

For an overview of the subject of testing, please refer to the corresponding Wikipedia article.

4.1. Purpose

The primary purposes of software testing are:

Gaining Information About Quality: Assessing the software’s quality by identifying defects, ensuring it behaves as expected, and meets user needs and requirements.
Increasing Confidence in Correctness: Verifying that the software performs its intended functions correctly, thereby building confidence among the stakeholders, including developers, clients, and users.

4.2. Focus on Correctness

Software testing focuses on verifying the correctness of the software in relation to:

Functional Requirements

Testing whether the software performs its intended functions.

Includes unit testing, integration testing, system testing, and acceptance testing.

Dynamic Nonfunctional Requirements

Assessing aspects like performance, scalability, reliability, and more.

Involves performance testing, stress testing, load testing, reliability testing, and more.

4.3. Significance

Software testing is vital because it:

Identifies defects early in the development cycle, reducing the cost of bug fixes.
Ensures that the software meets the business and technical requirements that guided its design and development.
Provides assurance to the stakeholders that the software is reliable, secure, and high-performing.

4.4. Limitations of Software Testing

Software testing, while crucial, has certain inherent limitations. These constraints impact the extent to which testing can assure software quality and correctness.

Testing is Not Usually Exhaustive: Testing typically cannot cover every possible scenario or input combination, making it not equivalent to a formal proof of correctness. This limitation means that even after extensive testing, there might still be undetected defects.
Manual Testing is Tedious and Not Easily Replicable: Manual testing requires significant effort and time, and it is prone to human error. Due to its tedious nature, it can be challenging to replicate precisely, leading to inconsistencies in test results.
Testing is Relative to the Correctness of Specification: The effectiveness of testing is contingent on the accuracy and completeness of the software specifications. If these specifications are incorrect or incomplete, testing based on them can miss critical defects. Thus, the correctness of the software is only as reliable as the specifications against which it is tested.

These limitations underscore the need for a comprehensive approach to software quality assurance, combining rigorous testing with other methods like code reviews, static analysis, and formal verification techniques.

4.5. Alternatives to Testing

These are some alternatives and complementary approaches to software testing:

Transformational programming

Automatically derive correct executable program from specification

Static analysis

Formally proves the correctness of a system based on its source code.

Model checking

Assumes both the system and its specification are represented formally. The system is typically represented as a finite-state machine or other state-transition system. Model checking exhaustively verifies correctness w.r.t. a specification based on, e.g., temporal logic. This aproach is widely used in embedded systems.

Examples of model checking tools: SPIN, TLA+, etc. (see also this Wikipedia article on model checking tools)

The effectiveness of model checking is still relative to the correctness of the given specification.

4.6. Testing Types and Techniques

Manual Testing: Performed by human testers who interact with the software to find defects.
Automated Testing: Uses tools and scripts to automatically execute tests and compare actual outcomes with expected results.
Black-Box Testing: Focuses on input and output without considering internal code structure.
White-Box Testing: Involves testing internal structures or workings of an application, as opposed to its functionality (i.e., looking inside the ‘box’).

4.7. Levels of Testing

Testing is a vast software engineering topic, encompassing various levels:

unit testing
integration testing
system testing
acceptance testing

These levels are orthogonal to the various testing types, techniques, and tactics.

4.8. Unit Testing Techniques

The main value proposition of automated unit testing is that it encourages frequent regression testing by making it painless. During the last two decades, this “test-infected” mindset has gradually entered the mainstream including introductory computer science courses.

4.8.1. Unit Test Outcomes

The following unit test outcomes are usually possible for any given test:

The test passes, usually indicated as a green checkmark.
An assertion in the test fails, usually indicated as a yellow exclamation mark.
There is a runtime error before reaching any assertions, usually indicated as a red “x”.
The test times out or does not terminate at all.
The testing tool or entire system crashes; these outcomes are uncommon in today’s managed code environments.

At the unit testing level, the following techniques are of particular interest:

4.8.2. Ad-hoc testing

Also called example-based testing, where we provide one or more specific test cases, where we programmatically interact with the system under test (SUT) and then examine the result or effect of the interaction.

assert(isPalindrome("radar"))
assert(!isPalindrome("lidar"))

4.8.3. Table-Based Testing (also known as data-driven testing)

Here, we provide a table of two or more columns corresponding to arguments and expected results of the function or method under test. This technique allows for a more concise representation of several similar ad-hoc tests.

val palindromeTable =
  "string" | "result" |
  "a"      ! true |
  "aa"     ! true |
  "ab"     ! false |
  "mom"    ! true |
  "dad"    ! true |
  "kid"    ! false |
  "abba"   ! true |
  "appl"   ! false |
  "uncle"  ! false |
  "radar"  ! true |
  "lidar"  ! false |
  "hannah" | true

palindromeTable |> (
  (s, r) => assert(isPalindrome(s) == r)
)

4.8.4. Property-Based Testing

Here, we express the relationship between arguments and expected results as a universally quantified property.

\[\forall \texttt{s} \in \text{String} : \texttt{isPalindrome(s)} \Leftrightarrow (\texttt{s} = \texttt{s.reverse})\]

Using a suitable propert-based testing library, such as jqwik, we can express this property as executable code. Typically, such a library automatically generates a large number of argument values and then evaluates the property for each argument as a separate test.

@Property
boolean isPalindromeWorks(@ForAll final String s) {
  return isPalindrome(s) == new StringBuilder(s).reverse().toString().equals(s);
}

4.8.5. Stateless Testing

Orthogonal to the techniques discussed so far, stateless testing refers to the simple case where the function or method-under-test (MUT) is stateless, i.e., its result depends solely on its arguments and, possibly, the instance variables of an immutable object. Accordingly, stateless tests are typically simple and consist of these steps:

If we are testing a method, create an instance of the class providing the MUT.
Invoke the method.
Express assertions on the result.

4.8.6. Stateful Testing

In contrast to stateless testing, this refers to cases where we the system-under-test (SUT) is a stateful object and we want to test the correctness of the SUT in response to both observer and mutator methods. The challenge is that the space of possible interactions with a stateful object can blow up quickly if we want to test thoroughly.

For example, this test represents only one possible scenario involving the stateful offer and poll methods of a bounded buffer.

@Test
void testOffer2ThenPoll2() {
  final var value1 = "hello";
  final var value2 = "world";
  assertTrue(fixture.offer(value1));
  assertTrue(fixture.offer(value2));
  assertEquals(value1, fixture.poll());
  assertEquals(value2, fixture.poll());
  assertTrue(fixture.isEmpty());
}

Some testing libraries, however, support property-based stateful testing that exercise arbitrary scenarios involving the desired methods.

Using the jqwik library, assuming the action classes for invoking specific methods are defined separately, the following code will generate and exercise a large number of interactions involving the offer and poll methods.

@Provide
Arbitrary<ActionChain<SimpleQueue<String>>> simpleQueueActions() {
  return ActionChain
    .<SimpleQueue<String>>startWith(() -> new FixedArrayQueue<String>(5))
    .withAction(new OfferAction())
    .withAction(new PollAction());
}

@Property
void checkSimpleQueue(@ForAll("simpleQueueActions") final ActionChain<SimpleQueue<String>> chain) {
  chain.run();
}

4.9. Frameworks and Tools

Various frameworks and tools have arisen to make automated testing easier and more effective.

Frameworks: “XUnit” and similar frameworks for a variety of languages
Mocking
Testing patterns
Tools for GUI test automation, e.g. Selenium, Espresso
Build and dependency management tools
Continuous integration/deployment/delivery pipelines

These topics are typically covered in the COMP 370/470: Software Quality, Metrics, and Testing course

4.10. Code Coverage for Structure-Based Testing

Code coverage is a way to measure how thoroughly we are testing. With the help of an appropriate tool, such as JaCoCo or scoverage, we can generate coverage metrics during the build process.

Specific coverage metrics in ascending order of rigor include

Module/class coverage
Function/method coverage
Statement coverage
Edge coverage
Branch coverage
Condition/predicate coverage
Path coverage

4.11. Code Examples

arrayqueue-java-sbt

switch to java.util.Queue (?)

start with specification-based testing

then add property-based testing

consoleapp-java

testability requires modularity

modularity complicates scalability

start with specification-based testing

discuss branch coverage issue

then add property-based testing

4.12. Limitations of Testing (Revisited)

Let’s consider this example from Hillel Wayne’s blog.

def add(x: int, y: int): int {
  if (x == 12976 && y == 14867) {
    return x - y;
  }
  return x + y;
}

Possible questions include:

What is this function intended to do?
What does it actually do?
To what extent can testing help?
Is there any technique related to testing that can actually help?
Can formal methods help?

4.13. Conclusion

In conclusion, software testing is an integral part of the software development lifecycle. It not only ensures that the software is free from defects but also meets the functional and dynamic nonfunctional requirements, thereby increasing overall confidence in the software.

4.14. Further Reading

We have covered various aspects of testing in other works, including