ConcurrencyTesting

The Mozilla platform is becoming increasingly concurrent --- it already has multi-threaded networking and (possibly) image-decoding code, and JavaScript "worker threads" are coming soon. Multi-threaded parsing is also being implemented. All this calls for more rigorous testing for potential concurrency bugs, in the same way that regression and performance tests are run today.

This page aims to provide a fairly high-level overview of concurrency testing in general, strategies that seem likely to work on the Mozilla code base, and particular tools that could work for Mozilla (or if necessary, proposals for in-house tools).

Overview of Concurrency Testing

There are several general strategies, roughly listed by their applicability to Mozilla's code base.

Static analysis

Static analyses enforce invariants by checking only a program's source code. One example invariant is that all lock operations follow a locking protocol (lock before unlock, etc.). With additional programmer-provided annotations, static analyses can discover more bugs or weed out false positives. An example annotation:

 lock l;
 int x LOCKEDBY(l);

which means that 'x' must only be accessed when the current thread holds the lock 'l'.
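
To illustrate, a checker enforcing this annotation would accept the first function below and flag the second. This is a sketch in the same invented notation as above; 'acquire' and 'release' are assumed lock operations, not a real API:

 lock l;
 int x LOCKEDBY(l);
 
 void ok() {
   l.acquire();
   x++;            // fine: the current thread holds l
   l.release();
 }
 
 void broken() {
   x++;            // flagged: x accessed without holding l
 }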

Almost all static analyses are either sound (no "false negatives," i.e. no missed errors), complete (no "false positives," i.e. all reported errors are "real"), or neither --- but they are almost never both sound and complete. A difficulty of analyzing C++ code is designing a (non-trivial) sound analysis that both understands pointers and isn't rendered useless by a flood of false positives. So pretty much any (non-trivial) static concurrency analysis for C++ has to choose between soundness and usefulness. Luckily, many "trivial" static analyses are quite useful --- type checking is one!

Dynamic analysis

Dynamic analyses usually instrument a binary to record memory accesses, lock acquisitions, and so forth. The instrumented binary is then run as normal to gather data, and desirable properties are checked during execution (or afterwards). One such property is that there are no data races, i.e., no unordered accesses to a shared memory location where at least one access is a write.
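
As a concrete example of the property, a dynamic race checker should flag the two unsynchronized increments in this small pthreads program (nothing Mozilla-specific; all names are made up):

 #include <cstdio>
 #include <pthread.h>
 
 int counter = 0;               // shared, never protected by a lock
 
 void* worker(void* unused) {
   counter++;                   // races with the increment in main()
   return NULL;
 }
 
 int main() {
   pthread_t t;
   pthread_create(&t, NULL, worker, NULL);
   counter++;                   // unordered with the child's access
   pthread_join(t, NULL);
   std::printf("counter = %d\n", counter);
   return 0;
 }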

Most dynamic analyses use one of two algorithms: "happens-before" and "lockset." Loosely, the "happens-before" algorithm checks that there is a (partial) order on accesses to shared memory. The order is determined by several types of events; for example, if the main thread accesses variable X, then spawns a child thread which also accesses X, then those two accesses of X are definitely ordered --- the child thread can't touch X until it has been spawned! Thus spawning a thread is one way to establish an order on memory accesses.
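
For instance, in the pthreads sketch below every pair of accesses to x is ordered by the spawn or the join, so a happens-before checker reports no race:

 #include <pthread.h>
 
 int x = 0;
 
 void* child(void* unused) {
   x++;                      // (2) ordered after (1) by the spawn
   return NULL;
 }
 
 int main() {
   pthread_t t;
   x = 1;                    // (1) happens before the child exists
   pthread_create(&t, NULL, child, NULL);
   pthread_join(t, NULL);    // the join orders (2) before (3)
   return x;                 // (3)
 }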

The "lockset" algorithm, again roughly speaking, keeps a set of locks for each shared memory location (L[v]). Initially, each L[v] contains all locks in the program. As the program runs, each spawned thread t track of the locks it owns, in a set L[t]. Each time a thread t accesses a shared memory location v, L[v] is to set to the intersection of L[v] and L[t]. If L[v] ever becomes empty, then there's a potential bug! This is an incomplete and potentially dense description, but consider the following cases that illustrate the algorithm:

  1. t holds no locks (L[t]={}) and accesses memory location v
    1. L[v] = L[v].intersect ({}) // t accesses v
    2. L[v] == {}
    3. Bug!
  2. t1 holds lock l1 (L[t1]={l1}), t2 holds lock l2 (L[t2]={l2}), and both access memory location v.
    1. L[v] = L[v].intersect ({ l1 }) // t1 accesses v
    2. L[v] == { l1 } // OK
    3. L[v] = L[v].intersect ({ l2 }) // t2 accesses v
    4. L[v] == { }
    5. Bug!
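
The bookkeeping itself is simple; the C++ sketch below is one way to picture it. The hook names on_lock/on_unlock/on_access are invented here --- a real tool calls their equivalents from the instrumented binary:

 #include <algorithm>
 #include <cstdio>
 #include <iterator>
 #include <map>
 #include <set>
 
 typedef int ThreadId;
 typedef int LockId;
 typedef const void* Addr;
 
 std::map<ThreadId, std::set<LockId> > locksHeld;       // L[t]
 std::map<Addr, std::set<LockId> > candidateLocks;      // L[v]
 
 void on_lock(ThreadId t, LockId l)   { locksHeld[t].insert(l); }
 void on_unlock(ThreadId t, LockId l) { locksHeld[t].erase(l); }
 
 void on_access(ThreadId t, Addr v) {
   const std::set<LockId>& lt = locksHeld[t];
   if (candidateLocks.find(v) == candidateLocks.end()) {
     // First access: L[v] is "all locks", and all-locks intersect L[t] == L[t].
     candidateLocks[v] = lt;
   } else {
     std::set<LockId> refined;
     std::set_intersection(candidateLocks[v].begin(), candidateLocks[v].end(),
                           lt.begin(), lt.end(),
                           std::inserter(refined, refined.begin()));
     candidateLocks[v].swap(refined);
   }
   if (candidateLocks[v].empty())
     std::printf("potential race on %p by thread %d\n", v, t);  // L[v] == {}
 }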

Both of these analyses (and basically all other dynamic analyses) suffer from the problem that they can only catch bugs exposed by a particular execution of the instrumented program. That is, a tool may report different bugs on different runs of the program. This can be ameliorated by running the program multiple times. One benefit over static analyses, especially for C++ code, is that dynamic analyses have perfectly precise information about what pointers actually point to --- they can simply observe the pointers' values at run time!

Model checking

Loosely speaking, model checkers attempt to check all possible "executions" of a "model." Most program models can be thought of as state machines: a "state" is the value of all variables at some point during execution, and "transitions" are program statements executed by a particular thread, changing one state into another. Note that more than one transition may be possible from a particular program state --- this means that more than one thread can take a step at that state.

To make the model-checking problem tractable, most model checkers require that models have only finite state. The checker then starts with the program's "start state" and does a complete exploration of the model's state space --- nothing fancy, just a depth- or breadth-first search of a graph starting from the "start state" node. Loosely speaking again, a model checker finds an error when some "assertion" fails at a particular program state. (Other types of errors are beyond the scope of this document.)
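
As a toy illustration --- a self-contained sketch, not any real checker --- the program below exhaustively explores all interleavings of two threads that each run "tmp = counter; counter = tmp + 1" without a lock, and reports the final state where the classic lost update occurs:

 #include <cstdio>
 #include <set>
 #include <stack>
 #include <vector>
 
 // A state: each thread's program counter and temporary, plus the shared counter.
 struct State {
   int pc[2], tmp[2], counter;
   bool operator<(const State& o) const {
     if (pc[0] != o.pc[0]) return pc[0] < o.pc[0];
     if (pc[1] != o.pc[1]) return pc[1] < o.pc[1];
     if (tmp[0] != o.tmp[0]) return tmp[0] < o.tmp[0];
     if (tmp[1] != o.tmp[1]) return tmp[1] < o.tmp[1];
     return counter < o.counter;
   }
 };
 
 // Transitions: each runnable thread may execute its next statement.
 std::vector<State> successors(const State& s) {
   std::vector<State> next;
   for (int t = 0; t < 2; ++t) {
     State n = s;
     if (s.pc[t] == 0)      { n.tmp[t] = s.counter; n.pc[t] = 1; }     // tmp = counter
     else if (s.pc[t] == 1) { n.counter = s.tmp[t] + 1; n.pc[t] = 2; } // counter = tmp + 1
     else continue;                          // this thread is done
     next.push_back(n);
   }
   return next;
 }
 
 int main() {
   State start = { {0, 0}, {0, 0}, 0 };
   std::set<State> visited;
   std::stack<State> work;
   work.push(start);
   while (!work.empty()) {                   // plain depth-first search
     State s = work.top(); work.pop();
     if (!visited.insert(s).second) continue;  // already explored this state
     if (s.pc[0] == 2 && s.pc[1] == 2 && s.counter != 2)
       std::printf("assertion failed: final counter == %d\n", s.counter);
     std::vector<State> succ = successors(s);
     for (size_t i = 0; i < succ.size(); ++i) work.push(succ[i]);
   }
   return 0;
 }

Running this finds the interleaving where both threads read counter==0 before either writes, leaving the final counter at 1 instead of 2.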

Model checkers are awesome when they can be used, but they need lots of memory and CPU time to check non-trivial programs. However, there are many clever optimizations that shrink the number of program states that need to be explored. There are also heuristics for exploring program states (e.g., breadth- vs. depth-first) that increase the likelihood of finding errors within an allotted amount of CPU time and memory.

Formal methods (verification)

Proofs using process calculi and the like. Bless the soul who attempts this on any part of Mozilla code!

Testing Strategies

These methods are not mutually exclusive --- they're quite complementary in fact!

  • Static concurrency analyses are performed along with our other static analyses (outparams, et al.), hopefully at compile time!
  • Dynamic checkers (possibly multiple ones) run a normally-built Firefox on a suite of tests, a la regression tests/reftests (the tests can be shared). The tests could be run multiple times.
  • Model checkers run on small, specially-constructed test harnesses of critical code. For example, a small driver program that AddRefs/Releases k objects in t threads (see the sketch below).
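
For the AddRef/Release example, the harness could be as small as the following sketch. RefCounted here is a made-up stand-in for an nsISupports-style object, with a deliberately non-atomic count of exactly the sort such a harness should expose:

 #include <cstdio>
 #include <pthread.h>
 
 struct RefCounted {
   int mRefCnt;
   RefCounted() : mRefCnt(1) {}
   void AddRef()  { ++mRefCnt; }                  // not thread-safe, on purpose
   void Release() { if (--mRefCnt == 0) { /* would delete this */ } }
 };
 
 const int K = 4;           // k objects
 const int T = 8;           // t threads
 const int ITERS = 100000;
 RefCounted objects[K];
 
 void* hammer(void* unused) {
   for (int i = 0; i < ITERS; ++i)
     for (int j = 0; j < K; ++j) {
       objects[j].AddRef();
       objects[j].Release();
     }
   return NULL;
 }
 
 int main() {
   pthread_t threads[T];
   for (int i = 0; i < T; ++i)
     pthread_create(&threads[i], NULL, hammer, NULL);
   for (int i = 0; i < T; ++i)
     pthread_join(threads[i], NULL);
   for (int j = 0; j < K; ++j)  // each count should still be 1 if AddRef/Release were safe
     std::printf("object %d: refcnt %d\n", j, objects[j].mRefCnt);
   return 0;
 }

Built with -pthread, a dynamic checker (or a model checker, for small K and T) run over a harness like this should report the racy refcount updates.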

Tools

Roughly in order of how easily they can be applied to Mozilla code.

  • Helgrind (valgrind tool): dynamic (happens-before) checker
    • + should work out-of-the-box
    • - Linux only (maybe on OS X now?)
    • + GPL (mostly)
    • - false positives
  • CHESS: "iterative context-bounding" model checker
    • + should work out-of-the-box
    • - Windows only
    • - license is a problem
    • + no false positives
    • (schach --- a prototype of an iterative context-bounding checker for Linux --- is being investigated by bent and cjones)
  • SharC: static and dynamic checker
    • - requires work to run on Mozilla
    • + we should start using their sharing annotations whether or not we run the SharC tool
    • - pthread only (as far as cjones knows)
    • - only works for C code (as far as cjones knows)
    • + BSD licensed
    • - false positives
  • Hybrid checker: dynamic (happens-before and lockset)
    • - might require significant work (no C++ impl exists, as far as cjones knows)
    • + or, could be implemented as a smaller modification of Helgrind
    • - false positives
  • RaceFuzzer: something like a hybrid between a dynamic checker and model checker
    • - requires significant work
    • - Java only (and no public release, as far as cjones knows)
    • - technique probably won't scale to a program of Firefox's size
    • + no false positives
  • SPIN: model checker
    • - resource requirements can be high, to put it mildly
    • - "good" models have to be hand-written in Promela, an "obscure" modeling language, to put it mildly
    • +/- can automatically generate models from C code, but the models are pretty crappy
    • + license looks OK
    • + no false positives

Plan (TODOs)

If you take one, please mark it as taken.

  • (??) add sharing annotations to existing multi-threaded code
  • (bent/cjones) evaluate tools above (except CHESS, for legal reasons)
    • if none suffices, write one
  • (??) decide on desired static analyses
    • use SharC, other tool, or write them with treehydra
  • (QA?) create set of test cases for dynamic analyses
    • same as regression tests?
  • (??) find least amount of platform code needed to usefully support non-trivial modules (such as worker threads)
    • create separate build target for each concurrent module? (like unit tests)
    • attempt to model-check these targets?
    • integrate with tinderbox?
  • (??) integrate testing into build system and/or tinderbox, wherever appropriate