Tuesday, December 18, 2007

MaceMC: Model Checking for the Systems Guys

A couple of months ago, Prof. Qiu directed me to a very cool paper about a model checker that won "Best Paper" in NSDI 2007. The NSDI '08 program committee met here in Austin yesterday to go through papers, and were kind enough to stick around and give an all-day workshop on their own research today. Amin Vahdat was a member of the program committee as well as being an author on the paper, so he gave a little presentation on it. (He also got into a bit of a tiff w/ Mothy Roscoe from ETH Zurich over whether his approach w/ Mace was right or not, but that's neither here nor there...)

The cool thing about the paper is that it introduces a model checker for a relevant class of liveness properties, rather than the normal safety properties that any old model checker off the street handles. Safety properties are essentially those properties that say something bad never happens. For example, you may have as a safety property that a password is never stored in plaintext. Liveness properties on the other hand state that something good will eventually happen. An example of a liveness property is that my waiter will eventually bring me my food, though it may take unbearably long (a safety property is that he never spits in it!).

Model checking, in a nutshell, is a way of systematically exploring the state space of a program and checking whether it meets a certain specification (this is a very rough nutshell!). In theory, for finite state machines, you can exhaustively test the state space and verify that it does in fact have the specific safety properties you desire by showing that the properties hold in every reachable state. In practice, this is intractable because of the "State Explosion" problem which results in a few more states than you would like (think combinatorially many more!). The nice thing about model checking safety properties (in an FSM) is that it is decidable; if you run the model checker long enough, it will either prove that the properties are met, or it will give you a counter example (a state that violates the property). On the other hand, liveness properties rarely have counter examples. In general, a counter example for a liveness property is an infinitely long trace where the liveness requirement is never met. Kind of hard to generate those buggers... In certain cases, for example loops in a deterministic system, counter examples can be shown, but this doesn't hold in general.

The guys in the paper (@ UCSD) have a special language (Mace) that they use for developing distributed systems, and this is the language that their model checker verifies. So, unlike many model checkers which are used to verify the correctness of little "toy" models written in some logic, this works directly upon the implementation itself and can find real-world bugs.

The Paper

No comments: