Testing for concurrency on JVM

white ceramic mug on white ceramic saucer beside bread on white ceramic plate

Writing thread-safe code is in itself a major challenge and add to that the challenge of verifying if the code we have written is actually thread-safe, now we are looking at a problem where bugs are bound to creep in. As part of this post, we will look at a couple of approaches of how can we test for concurrency and what additional problems can arise when we decide to write tests for concurrent code.

Note: The post covers code commits that represent small digestible pieces of code. The original code repository along with references is mentioned in the end of the post.

For starters, let us consider the business problem we are trying to solve. We want to build a collection which can store at most one element. In order to achieve this, we extend the existing ArrayList collection and write our functionality on top of it. Here is the initial implementation of this functionality(Code commit)

  public boolean addIfCollectionIsEmpty(T element) {
    boolean isEmpty = super.isEmpty();
    if (isEmpty) {
      super.add(element);
    }
    return !isEmpty;
  }

In the first go, the above code looks correct. It is doing what it is required to do i.e. it first checks if the collection is empty and then goes ahead and add the element to the collection. This fulfills the contract of the method which is required to add to the collection only if it is empty. The code will also behave correctly in single threaded applications and we are even able to write a test to verify this behavior(Code commit).

public class AtMaxOneSizedCollectionIncorrectTest {

  private AtMaxOneSizedCollection<String> atMaxOneSizedCollection;

  @Before
  public void setUp() {
    this.atMaxOneSizedCollection = new AtMaxOneSizedCollection<>();
  }

  @Test
  public void testAtMaxOnceInsertionSuccessful() {
    atMaxOneSizedCollection.addIfCollectionIsEmpty("foo");
    atMaxOneSizedCollection.addIfCollectionIsEmpty("foo");
    assertEquals(atMaxOneSizedCollection.size(), 1);
  }
}

But the correctness of the code goes away as soon as multiple threads start accessing the addIfCollectionIsEmpty method. Now two threads can reach the empty collection check at the same time and both will have the view of an empty collection. This will result in both threads adding the element to the collection and in-turn break the contract of our implementation.

The above test will continue to pass giving us the wrong signal that our implementation is still correct as we are not testing for multiple thread access as part of the test. The addIfCollectionIsEmpty is invoked sequentially which will never lead to multiple threads reaching the empty collection check at the same time.

Now let us see a few approaches which test for concurrent thread-access but will lead to flakiness problems in our test suite. This is worse than having no tests as now it gives the fake confidence to the developer that their code is thread-safe when the test passes and leads to utter confusion when the test starts failing later.

We first attempt to write a test using the TestNG framework that provides support for writing tests with large thread pool(Code commit).

public class AtMaxOneSizedCollectionTestNgTest {

  private final AtMaxOneSizedCollection<String> atMaxOneSizedCollection = 
      new AtMaxOneSizedCollection<>();

  @Test(threadPoolSize = 10, invocationCount = 10)
  public void testCollectionWithMultipleThreads() {
    atMaxOneSizedCollection.addIfCollectionIsEmpty("foo");
    assertThat(atMaxOneSizedCollection.size(), is(1));
  }
}

In the above test we are running our test using a thread pool of size 10 which will invoke our code 10 times for each thread. Initially this feels like the right approach as we are now running our code with multiple threads instead of a single thread, so we are bound to catch errors due to thread-unsafe code. But once we run the test for couple of iterations, we notice that it results in unpredictable behavior. For one of the test runs, we see our test passing for all the invocations.

But we soon start seeing failures for future test runs. This confirms that we do have failures due to thread-unsafe implementation. Though it doesn’t provides us with concrete information on how to reproduce this failure. It’s just pure luck that we are able to see these failures on our development setup. If we were unlucky, we could have just checked-in this test case by patting ourselves on our back for writing well-performant and bug free multithreaded code. The look of surprise and disbelief would have been priceless when we saw the same test breaking the build in future runs.

Another similar approach which is commonly used to catch concurrency errors is to use stress test. Though similar to previous approach, stress test can only answer that the code is not thread-safe but doesn’t has any way to provide the means to reproduce the error aside from just running the test multiple times and praying that it fails. To write a stress test we make use of the tempus-fugit library. (Code commit)

public class AtMaxOneSizedCollectionStressTest {

  @Rule
  public ConcurrentRule concurrentRule = new ConcurrentRule();

  @Rule
  public RepeatingRule rule = new RepeatingRule();

  private static final AtMaxOneSizedCollection<String> atMaxOneSizedCollection = new AtMaxOneSizedCollection<>();

  @Test
  @Concurrent(count = 100)
  @Repeating(repetition = 100)
  public void runMultipleTimes() {
    atMaxOneSizedCollection.addIfCollectionIsEmpty("foo");
  }

  @AfterClass
  public static void annotatedTestRunMultipleTimes() {
    assertEquals(1, atMaxOneSizedCollection.size());
  }
}

While running the above stress test, we end up with similar flaky behavior as that of TestNG test. In one instance, we see our test passing for our current implementation but for another instance we are faced with failures due to concurrent thread-access.

Having such flaky tests turn out to be a major bottleneck in developer productivity as developers spend a big chunk of their time battling build failures due to such flaky runs. It also leads to the team mistrusting their test infrastructure as now they cannot answer confidently about whether a test failure was due to an actual bug or a flaky test. Developers soon get into a habit of just rerunning the build and brute forcing their code change into production when the flaky test ends up passing. Such a codebase is infused with landmines that can go off silently and bring down the production resulting in degraded customer experience.

What we actually intend to accomplish from writing a test is to test for all possible thread inter-leavings. We want to test for various scenarios where multiple threads access our code concurrently and run through these scenarios in all possible orderings such as thread1 executing before thread2, both threads executing concurrently and so on. One possible way to accomplish this is to make use of a framework such as MultithreadedTC that provides control over ordering of threads in a multi-threaded application. A sample test using this framework is as below(Code commit)

public class AtMaxOneSizedCollectionMultithreadedTest extends MultithreadedTest {

  private AtMaxOneSizedCollection<String> atMaxOneSizedCollection;

  @Override
  public void initialize() {
    this.atMaxOneSizedCollection = new AtMaxOneSizedCollection<>();
  }

  public void thread1() {
    atMaxOneSizedCollection.addIfCollectionIsEmpty("foo");
  }

  public void thread2() {
    atMaxOneSizedCollection.addIfCollectionIsEmpty("foo");
  }

  @Override
  public void finish() {
    assertEquals(1, atMaxOneSizedCollection.size());
  }

  @Test
  public void testAtMaxOneSizedCollection() throws Throwable {
    TestFramework.runManyTimes(new AtMaxOneSizedCollectionMultithreadedTest(), 100);
  }
}

Having a control over the exact ordering of threads accessing our code allows us to make sense of the failure and provides means to reproduce such errors. The above test is run 100 times for the test run. In order to catch any flakiness, I ran the test itself for 100 times resulting in 10,000 runs. We don’t see any flakiness during these runs and all the test runs result in failure due to our incorrect implementation.

Finally let’s make our implementation thread-safe and re-run our test suite. There are various ways to make our code thread safe ranging from using locks, an atomic size variable etc. I have decided to keep it simple by adding a synchronized keyword to the addIfCollectionIsEmpty method. I agree this might not be the most performant way of solving this issue but right now I am more interested in the testing aspect of concurrent programming(Code commit). We notice that our test suite is passing as we have made our implementation thread-safe

The best way to have correct concurrency in your codebase is to avoid concurrency altogether. Step back and have a look at your logic to see if you can avoid shared mutability. This will simplify your business logic and you won’t have to tackle the challenges that come along with testing the concurrent code. Also it is not always possible to test for concurrency through writing tests and for such cases approaches like model checking and static analysis might be useful. I plan to cover those techniques as part of future posts. I hope you have enjoyed going through this post and I have shared the code repository and references from which this post was heavily inspired. Do go through the code and the references for further details.

References