The kinds of static analysis you're talking about are different from what Matt is talking about in his article.
Matt is talking about situations where you can make guarantees. That is, if his analyzer says that a register can never take on a negative value, that is a guarantee. It is not based on a heuristic; it is based on a proof. This guarantee is possible, despite the halting problem, because he designs an abstract representation of the program being analyzed. That representation loses a lot of information, but by using it, one can actually prove things about the program.
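To make that concrete, here is a minimal sketch in C of what such an abstract representation can look like (my own illustration, not code from Matt's article): each integer is abstracted to the set of signs it might have, and arithmetic is redefined over those sets.

    #include <stdio.h>

    /* Abstract value: the set of signs an integer might have, as a bitmask. */
    #define S_NEG  1u                        /* might be negative */
    #define S_ZERO 2u                        /* might be zero     */
    #define S_POS  4u                        /* might be positive */
    #define S_TOP  (S_NEG | S_ZERO | S_POS)  /* no information: {-, 0, +} */

    /* Abstract multiplication is exact on signs: neg * neg is pos, etc. */
    unsigned sign_mul(unsigned a, unsigned b) {
        unsigned r = 0;
        if ((a & S_ZERO) || (b & S_ZERO)) r |= S_ZERO;
        if (((a & S_NEG) && (b & S_NEG)) || ((a & S_POS) && (b & S_POS))) r |= S_POS;
        if (((a & S_NEG) && (b & S_POS)) || ((a & S_POS) && (b & S_NEG))) r |= S_NEG;
        return r;
    }

    /* Abstract addition is where information gets lost: pos + neg could
     * have any sign at all. */
    unsigned sign_add(unsigned a, unsigned b) {
        unsigned r = 0;
        if ((a & S_POS) && (b & (S_POS | S_ZERO))) r |= S_POS;
        if ((b & S_POS) && (a & S_ZERO))           r |= S_POS;
        if ((a & S_NEG) && (b & (S_NEG | S_ZERO))) r |= S_NEG;
        if ((b & S_NEG) && (a & S_ZERO))           r |= S_NEG;
        if ((a & S_ZERO) && (b & S_ZERO))          r |= S_ZERO;
        if (((a & S_POS) && (b & S_NEG)) || ((a & S_NEG) && (b & S_POS)))
            r |= S_TOP;                            /* sum could be anything */
        return r;
    }

    int main(void) {
        printf("%u\n", sign_mul(S_NEG, S_NEG));  /* 4 = {+}: a guarantee     */
        printf("%u\n", sign_add(S_POS, S_NEG));  /* 7 = {-, 0, +}: no answer */
        return 0;
    }

When the result is a strict subset of {-, 0, +}, that is the guarantee; when it is the whole set, the analysis is honestly saying it doesn't know.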
If I understand you correctly, you're talking about heuristics which allow one to say "It is very likely that this program will exhibit that behavior." It's not a guarantee, because it's not based on a proof, but rather on heuristics which try to match patterns in the code in question to patterns in code that was known to exhibit such behavior. Carmack talks about using such static analysis in his projects in a popular blog post (HN discussions: http://news.ycombinator.com/item?id=4543553 and http://news.ycombinator.com/item?id=3388290).
In simpler terms, Matt's approach avoids the halting problem by only providing guarantees on a limited set of behaviors. The approach you're talking about avoids the halting problem by not providing guarantees, but by providing likelihoods.
No, I'm talking about the same thing Matt is. Notice how, in his sign analysis example, the analyzer can return {-, 0, +}, indicating that it doesn't know the sign of some particular value. Thus, it's not an algorithm for determining the signs of values computed by an arbitrary program; it's only a heuristic that will sometimes tell you the sign. (And I don't mean "sometimes" probabilistically -- I mean "for some expressions in some programs". Given a particular expression in a particular program, it's deterministic.)
The art of static analysis, then, is to design heuristics that provide useful information in enough cases to be of practical value.
It's useful to make a distinction between the "sound" static analyses Matt Might posted about and heuristic-based, or "pragmatic," analyses.
Cousot's Analysis of Signs is sound. It can fail by returning {-,0,+}. However, when it succeeds, it does make guarantees. If it returns {0} for an integer, then it has proven that that integer will always and only have the value 0. Furthermore, if that integer can be 0, it is guaranteed the analysis will return a set containing 0.
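That guarantee has a testable shape. For the toy sign domain sketched above (again my illustration, not Cousot's formulation; it reuses sign_mul and the S_* masks from earlier), soundness says the concrete sign is always a member of the abstract result, which you can spot-check by brute force:

    #include <assert.h>

    /* Map a concrete integer to its (singleton) abstract sign. */
    unsigned sign_of(int v) {
        return v < 0 ? S_NEG : (v == 0 ? S_ZERO : S_POS);
    }

    /* Soundness spot-check: whatever concrete values flow in, the sign of
     * the concrete result is contained in the abstract result. */
    void check_mul_soundness(void) {
        for (int x = -5; x <= 5; x++)
            for (int y = -5; y <= 5; y++)
                assert(sign_mul(sign_of(x), sign_of(y)) & sign_of(x * y));
    }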
Pragmatic analyses have no such guarantees; if a pragmatic analysis of signs returns {0}, then anything could happen. They are "pragmatic" in that their value is determined by how correlated their output is with the answer on real programs; there are no theorems to prove about them.
If you can provide a guarantee based on a proof, then I do not consider that a heuristic. I consider there to be two differences between proof-based analysis and heuristic-based analysis:
1. Proof-based analysis may provide guarantees. In other words, no false positives, but false negatives are possible. Heuristics cannot provide guarantees, only probabilities, and they can have both false positives and false negatives.
2. Heuristic-based analyses require data to create. For example, let's say we analyzed some code and saw a pattern like this (a hypothetical snippet):
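    /* Hypothetical example, invented for illustration (parse() is assumed): */
    #include <stdlib.h>

    extern int parse(char *buf, size_t len);

    int handle(size_t len) {
        char *buf = malloc(len);
        if (buf == NULL)
            return -1;
        if (parse(buf, len) != 0)
            return -1;    /* early exit: buf is never freed on this path */
        free(buf);
        return 0;
    }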
A heuristic-based analysis may say, "The pattern of allocations and uses in this code is consistent with known memory leaks in other applications." That is, we have recognized a pattern in the code that, where it was seen elsewhere, accompanied a memory leak. This is a very different animal from proof-based analysis because it relies on historical data and pattern matching against it. But it may be a false positive. In contrast, a proof-based analysis does not need to look at any code but the code in question; it has proven what the behavior is. It could not produce a false positive.
There are two different kinds of proofs in discussion here. You're talking about the proofs the analyzer provides about the behavior of the program under analysis. I'm talking about what we can or cannot prove about the behavior of the analyzer itself. Can we prove that it will never construct an incorrect proof? Or that it will find a proof if one exists?
Those are all actually the same proof. The proofs that the analysis provides about the program are essentially the soundness proof of the analysis combined with its output. So if I prove that the signs of the values a variable can hold will be contained in the set found by my static analysis, and my analyzer outputs {+}, then that is a proof that the variable can only store positive values. It produces a proof in the same sense that, if I prove the Pythagorean theorem, then I can evaluate the expression sqrt(x^2+y^2) with x=3 and y=4 to get a proof that the corresponding line has length 5. Getting an incorrect proof out of that would be tantamount to an arithmetic error.
Note that most analyses do no actual automated theorem proving; they just, conceptually at least, instantiate existing proofs. If, by "find a proof," you mean "show the presence of bugs if there are any," then that's part of the same soundness proof. Conversely, showing the absence of bugs if there aren't any is completeness; you can't have both due to the halting problem.
Whether an analyzer correctly implements an analysis is a totally separate beast, and is an example of verification. Xavier Leroy has done some excellent work in verifying static analyzers.
I agree with your post, though I am curious to know if there is any use for "this function is likely to terminate" inside static analysis.
This guarantee is possible, despite the halting problem, because he designs an abstract representation of the program being analyzed. That representation loses a lot of information, but by using it, one can actually prove things about the program.
I would rather say: the halting problem shows that there exists at least one program for which it is impossible to prove termination, not that every termination problem is undecidable, as the trivial example "return 5;" shows.
The conclusion "this function is likely to terminate" may not be helpful, but "this function is likely to leak memory" may be extraordinarily helpful.
The reason that everyone brings up the halting problem in this context is not that we want to test whether programs halt. (Well, we may, but we're generally looking for other behaviors.) It's because for some behaviors, if you want to be able to handle all cases, the halting problem can be reduced to deciding that behavior, making it just as hard to decide. Matt provides an excellent demonstration of this with array indices.
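A sketch of that reduction, in the spirit of Matt's example (my own illustration): take any function f whose halting behavior you would like to decide, and ask whether the out-of-bounds write below is reachable.

    extern void f(void);  /* any function; may or may not ever return */

    int a[10];

    void probe(void) {
        f();              /* runs forever, or returns */
        a[10] = 1;        /* out of bounds; executes iff f() returns */
    }

An analyzer that could decide, for every program, whether an access like that is reachable would thereby decide whether f() halts.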
I'm confused by your restatement, because it does not restate what I said. You're talking about the halting problem. I'm talking about proving general behaviors in programs, despite the theoretical results that solving the halting problem is impossible and that the halting problem can be reduced to checking many behaviors. The solution, as Matt explains, is to find ways of describing behaviors so that checking them no longer encodes the halting problem. Using this technique, you can still have false negatives ("I don't know"), but you can guarantee no false positives.
PREfix incorporated analysis to identify likely exit functions. This helped cut down false positives in situations like:
    if (p == NULL)
        fatal_error();
    *p = 0;
With the inference-based analysis we did in PREfix, if we didn't recognize fatal_error() as a function that's likely to exit the program, we'd report a potential NULL pointer dereference on the next line. But we didn't have the source code for fatal_error(), so how could we possibly know that? Ideally it would be declared as a volatile (never-returning) function, but a lot of older code is very sloppy about that. So we employed various heuristics based on the names of the functions (and gave the user an opportunity to tune the results) ... it worked surprisingly well in practice.
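A toy version of that kind of name heuristic might look like the following (my sketch, not PREfix's actual code; the hint list is invented):

    #include <string.h>

    /* Guess that a function probably never returns if its name contains a
     * telltale substring. A pure heuristic: it can be wrong in both
     * directions, which is why letting the user tune the list matters. */
    int looks_like_exit_fn(const char *name) {
        static const char *hints[] = { "fatal", "panic", "abort", "exit", "die" };
        for (size_t i = 0; i < sizeof hints / sizeof hints[0]; i++)
            if (strstr(name, hints[i]) != NULL)
                return 1;
        return 0;
    }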
I am curious to know if there is any use for "this function is likely to terminate" inside static analysis.
Well, something like "this condition is likely to be true" could help estimate whether a loop will run many times, which leads to things like a better estimate of the cost of spilling values used in that loop to the stack instead of keeping them in registers.
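For instance (a hypothetical back-of-the-envelope model, not any particular compiler's heuristic): if the loop's back-edge is estimated to be taken with probability p, the loop runs about 1/(1-p) iterations in expectation, and a spill inside the loop body can be costed accordingly.

    /* Hypothetical spill-cost model: weight the per-iteration cost of a
     * spill by the loop's expected trip count, 1/(1-p) for back-edge
     * probability p (a geometric estimate; assumes 0 <= p < 1). */
    double weighted_spill_cost(double per_iter_cost, double backedge_prob) {
        double expected_iters = 1.0 / (1.0 - backedge_prob);
        return per_iter_cost * expected_iters;
    }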
the halting problem shows that there exists at least one program for which it is impossible to prove termination
I'm not sure that "every halting checker has some input where it gives the wrong answer" permits the conclusion "there is some input where they all give the wrong answer." Is "the machine must generate a proof of its answer" what lets you make that extra leap?