Automated Theorem Proving in Austin TX

Thursday, February 21, 2008

Model Checking for Fun and Profit

So, I'm starting to fall in love with model checking. It looks like it will lead to much of what I'm looking for from formal verification: user friendliness, push-buttonness, and utility in actual software development amongst other things. Now, it's never going to enable you to do full functional correctness on anything but the most trivial of programs, but that's not what's needed most of the time. Simply bring model checking to bear in a big way in normal software development would be a huge step forward for both formal methods and software engineering. Projects like SLAM are wonderful examples of the utility that a properly designed and integrated formal methods suite can bring to bear.

Wednesday, February 20, 2008

Billie G and the Boys

Turns out the Eclipse problem wasn't versioning, but rather syncing. For some reason, I had told unison not to sync ".metadata" directories. Naturally, this kept the Eclipse project metadata from getting copied over, yadda yadda yadda.

Anyways, Bill Gates gave a lecture today at UT, talking about the future of software etc. Most of it was standard "exciting time to be in CS" boiler plate that he's clearly given a million times, but some things he said were actually interesting. In particular, he made a brief statement about higher level languages, and how he doesn't believe that we are operating at a high enough level yet. In particular, he claims that we can get to within an order of magnitude of density of English specification. It made me realize that I'm not very familiar with the recent developments in highly dense and readable programming languages. I'm vaguely aware of Joy and Icon, but I don't have enough experience to speak on them.

P.S. If you can't tell, I'm turning this more into an actual blog/work journal. I started looking around for an application to do journaling when I remembered that I had this already set up!

Monday, February 18, 2008

Little bit of thinking, not much working

Today wasn't very productive. I got up early, waited in line (along with ~70 other idiots) to get my bracelet for the Bill Gates talk on Wednesday. Later that day I discovered that there was quite a surplus. Oh well.

I raced home and ran up to work. Once there, I did a whole lot of nothing for 2 hours. On the plus side, I figured out how to auto-forward my Lotus Notes mail to my outside address (use filters!), so I shouldn't need to actually go into NI as often. It'd be even easier if I had figured out how to setup Lotus on my laptop for remote access, but oh well. I don't want no stinking Lotus code on my nice clean laptop anyways.

I ran back home and then biked back to UT in time for the PL Lunch. While enjoying my tasty BBQ, I heard an interesting presentation on "Repairing Data Structures using Assertions" by Baseem Elkarablieh (dry run for his thesis proposal). In a nutshell, he used a boolean function (repOK()) that verified the integrity of a data structure to generate a search for a new data structure that satisfied the repOK() predicate. The basic approach is little more than an application of constraint solving (hardly novel when you consider their uses in test generation), but the really cool stuff came out of his work using static analysis to optimize the search process. On (moderately) complicated structures, he saw an order of magnitude increase in performance.

Relevant papers:
Assertion-based Repair of Complex Data Structures
STARC: Static Analysis for Efficient Repair of Complex Data

Afterwards, I ran home yet again (I'm going to be sore tomorrow) and tried to get cracking on some ideas I've been researching for work. But, I kept coming up with excuses for why I couldn't start working just yet; "My computers are out of sync" (run unison), "My folders are disorganized" (move massive files to file server, delete crap), and finally, the biggie: "Eclipse doesn't work". This is the only remotely legitimate excuse I came up with. See, for some bizarre reason, even though I synced my Projects directory between minerva and mercury, my work project workspace didn't show up in mercury. I suspect that this has to do with some subtle version difference between the Eclipse versions in Feisty and Gutsy (minerva is Gutsy), so I'm finally getting around to upgrading my long lived and trustworthy install of Feisty. It'll take a few more hours, so we'll see how it went in the morning...

Tuesday, December 18, 2007

MaceMC: Model Checking for the Systems Guys

A couple of months ago, Prof. Qiu directed me to a very cool paper about a model checker that won "Best Paper" in NSDI 2007. The NSDI '08 program committee met here in Austin yesterday to go through papers, and were kind enough to stick around and give an all-day workshop on their own research today. Amin Vahdat was a member of the program committee as well as being an author on the paper, so he gave a little presentation on it. (He also got into a bit of a tiff w/ Mothy Roscoe from ETH Zurich over whether his approach w/ Mace was right or not, but that's neither here nor there...)

The cool thing about the paper is that it introduces a model checker for a relevant class of liveness properties, rather than the normal safety properties that any old model checker off the street handles. Safety properties are essentially those properties that say something bad never happens. For example, you may have as a safety property that a password is never stored in plaintext. Liveness properties on the other hand state that something good will eventually happen. An example of a liveness property is that my waiter will eventually bring me my food, though it may take unbearably long (a safety property is that he never spits in it!).

Model checking, in a nutshell, is a way of systematically exploring the state space of a program and checking whether it meets a certain specification (this is a very rough nutshell!). In theory, for finite state machines, you can exhaustively test the state space and verify that it does in fact have the specific safety properties you desire by showing that the properties hold in every reachable state. In practice, this is intractable because of the "State Explosion" problem which results in a few more states than you would like (think combinatorially many more!). The nice thing about model checking safety properties (in an FSM) is that it is decidable; if you run the model checker long enough, it will either prove that the properties are met, or it will give you a counter example (a state that violates the property). On the other hand, liveness properties rarely have counter examples. In general, a counter example for a liveness property is an infinitely long trace where the liveness requirement is never met. Kind of hard to generate those buggers... In certain cases, for example loops in a deterministic system, counter examples can be shown, but this doesn't hold in general.

The guys in the paper (@ UCSD) have a special language (Mace) that they use for developing distributed systems, and this is the language that their model checker verifies. So, unlike many model checkers which are used to verify the correctness of little "toy" models written in some logic, this works directly upon the implementation itself and can find real-world bugs.

The Paper

Saturday, September 29, 2007

Formal Methods at Microsoft

The other night (coincidentally right before an initial interview with Microsoft) I ran across the Static Driver Verification project in the Windows Group at Microsoft. SDV is a tool that integrates a home grown model checker (SLAM) for drivers with a whole suite of specifications for Windows HW drivers. This is an amazing example of the potential benefits of formal verification in software when an appropriate focus is put on utility as well as theoretical interest.

The Microsoft Research group that produced SDV also wrote a tech report on their success in a technology transfer from a research project to an actual software tool usable by real developers both inside and outside of MS.

It makes me wonder whether or not a similar project could be undertaken for Linux. It seems that there would be a significant barrier from the severe lack of stable driver APIs, requiring a constant update of the formal specifications. Whatever spec language used would have to be easily maintainable by regular kernel developers as they update the interfaces in each release.

Friday, September 21, 2007

Slow to start

It seems this blog has been abandoned. There weren't any posts this summer as I was working at IBM in the Linux Technology Center on a project totally unrelated to ATP (we used Xen + libvirt; awesome stuff!).

But, now that the year has started up again, I've found myself employed by a large Austin company to integrate the ACL2 system into their existing product. Yeah! I'm not sure how much of the actual project I can disclose, but I'll try to put what I learn about ACL2 here as I go along.

Sunday, April 8, 2007

Counterexamples in ACL2

One of the cool features missing in ACL2 is the ability to create counterexamples to a claim upon a failed proof attempt. This isn't always possible of course¹, but it would still be a useful feature, since the existence of a counterexample would let users know that the attempt failed because the proposition wasn't a theorem, not because of a lack of capability in ACL2.

A failed search for counterexamples might also help ACL2 recognize why a proposition is in fact probably true, and find a new approach to proving it. This is pretty unlikely, but it could be a useful step in helping the ATP get over the problem of emulating human proof techniques. When we're stuck finding a proof for a theorem, and we suspect that the conjecture is false, we might try to find a counterexample. As we search for a counterexample, we may discover that a certain tact we pick is fruitless because a subtle condition of the conjecture that we missed covers this case. This newfound insight can help us find a new proof by observing some previously overlooked structure in the problem.

Some interesting work in this realm has been done in the form of QuickCheck, and its Lisp child ClickCheck. There is also a technical report on this topic by Myung Won Kim that I haven't made my way through yet.

1 Consider if a program were always able to construct a counterexample to every false claim. Then theorem proving would be decidable, since we could tell a whether a conjecture was a theorem by whether or not we can construct a counterexample to it. But it is in fact undecidable, so we have a contradiction.