347 lines
14 KiB
HTML
347 lines
14 KiB
HTML
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
|
||
|
"http://www.w3.org/TR/html4/strict.dtd">
|
||
|
<html>
|
||
|
<head>
|
||
|
<title>Checker Developer Manual</title>
|
||
|
<link type="text/css" rel="stylesheet" href="menu.css">
|
||
|
<link type="text/css" rel="stylesheet" href="content.css">
|
||
|
<script type="text/javascript" src="scripts/menu.js"></script>
|
||
|
</head>
|
||
|
<body>
|
||
|
|
||
|
<div id="page">
|
||
|
<!--#include virtual="menu.html.incl"-->
|
||
|
|
||
|
<div id="content">
|
||
|
|
||
|
<h1 style="color:red">This Page Is Under Construction</h1>
|
||
|
|
||
|
<h1>Checker Developer Manual</h1>
|
||
|
|
||
|
<p>The static analyzer engine performs symbolic execution of the program and
|
||
|
relies on a set of checkers to implement the logic for detecting and
|
||
|
constructing bug reports. This page provides hints and guidelines for anyone
|
||
|
who is interested in implementing their own checker. The static analyzer is a
|
||
|
part of the Clang project, so consult <a href="http://clang.llvm.org/hacking.html">Hacking on Clang</a>
|
||
|
and <a href="http://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
|
||
|
for general developer guidelines and information. </p>
|
||
|
|
||
|
<ul>
|
||
|
<li><a href="#start">Getting Started</a></li>
|
||
|
<li><a href="#analyzer">Analyzer Overview</a></li>
|
||
|
<li><a href="#idea">Idea for a Checker</a></li>
|
||
|
<li><a href="#registration">Checker Registration</a></li>
|
||
|
<li><a href="#skeleton">Checker Skeleton</a></li>
|
||
|
<li><a href="#node">Exploded Node</a></li>
|
||
|
<li><a href="#bugs">Bug Reports</a></li>
|
||
|
<li><a href="#ast">AST Visitors</a></li>
|
||
|
<li><a href="#testing">Testing</a></li>
|
||
|
<li><a href="#commands">Useful Commands</a></li>
|
||
|
</ul>
|
||
|
|
||
|
<h2 id=start>Getting Started</h2>
|
||
|
<ul>
|
||
|
<li>To check out the source code and build the project, follow steps 1-4 of
|
||
|
the <a href="http://clang.llvm.org/get_started.html">Clang Getting Started</a>
|
||
|
page.</li>
|
||
|
|
||
|
<li>The analyzer source code is located under the Clang source tree:
|
||
|
<br><tt>
|
||
|
$ <b>cd llvm/tools/clang</b>
|
||
|
</tt>
|
||
|
<br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
|
||
|
<tt>test/Analysis</tt>.</li>
|
||
|
|
||
|
<li>The analyzer regression tests can be executed from the Clang's build
|
||
|
directory:
|
||
|
<br><tt>
|
||
|
$ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
|
||
|
</tt></li>
|
||
|
|
||
|
<li>Analyze a file with the specified checker:
|
||
|
<br><tt>
|
||
|
$ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
|
||
|
</tt></li>
|
||
|
|
||
|
<li>List the available checkers:
|
||
|
<br><tt>
|
||
|
$ <b>clang -cc1 -analyzer-checker-help</b>
|
||
|
</tt></li>
|
||
|
|
||
|
<li>See the analyzer help for different output formats, fine tuning, and
|
||
|
debug options:
|
||
|
<br><tt>
|
||
|
$ <b>clang -cc1 -help | grep "analyzer"</b>
|
||
|
</tt></li>
|
||
|
|
||
|
</ul>
|
||
|
|
||
|
<h2 id=analyzer>Static Analyzer Overview</h2>
|
||
|
The analyzer core performs symbolic execution of the given program. All the
|
||
|
input values are represented with symbolic values; further, the engine deduces
|
||
|
the values of all the expressions in the program based on the input symbols
|
||
|
and the path. The execution is path sensitive and every possible path through
|
||
|
the program is explored. The explored execution traces are represented with
|
||
|
<a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplidedGraph</a> object.
|
||
|
Each node of the graph is
|
||
|
<a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
|
||
|
which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
|
||
|
<p>
|
||
|
<a href="http://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
|
||
|
represents the corresponding location in the program (or the CFG graph).
|
||
|
<tt>ProgramPoint</tt> is also used to record additional information on
|
||
|
when/how the state was added. For example, <tt>PostPurgeDeadSymbolsKind</tt>
|
||
|
kind means that the state is the result of purging dead symbols - the
|
||
|
analyzer's equivalent of garbage collection.
|
||
|
<p>
|
||
|
<a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
|
||
|
represents abstract state of the program. It consists of:
|
||
|
<ul>
|
||
|
<li><tt>Environment</tt> - a mapping from source code expressions to symbolic
|
||
|
values
|
||
|
<li><tt>Store</tt> - a mapping from memory locations to symbolic values
|
||
|
<li><tt>GenericDataMap</tt> - constraints on symbolic values
|
||
|
</ul>
|
||
|
|
||
|
<h3>Interaction with Checkers</h3>
|
||
|
Checkers are not merely passive receivers of the analyzer core changes - they
|
||
|
actively participate in the <tt>ProgramState</tt> construction through the
|
||
|
<tt>GenericDataMap</tt> which can be used to store the checker-defined part
|
||
|
of the state. Each time the analyzer engine explores a new statement, it
|
||
|
notifies each checker registered to listen for that statement, giving it an
|
||
|
opportunity to either report a bug or modify the state. (As a rule of thumb,
|
||
|
the checker itself should be stateless.) The checkers are called one after another
|
||
|
in the predefined order; thus, calling all the checkers adds a chain to the
|
||
|
<tt>ExplodedGraph</tt>.
|
||
|
|
||
|
<h3>Representing Values</h3>
|
||
|
During symbolic execution, <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a>
|
||
|
objects are used to represent the semantic evaluation of expressions. They can
|
||
|
represent things like concrete integers, symbolic values, or memory locations
|
||
|
(which are memory regions). They are a discriminated union of "values",
|
||
|
symbolic and otherwise.
|
||
|
<p>
|
||
|
<a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol)
|
||
|
is meant to represent abstract, but named, symbolic value.
|
||
|
Symbolic values can have constraints associated with them. Symbols represent
|
||
|
an actual (immutable) value. We might not know what its specific value is, but
|
||
|
we can associate constraints with that value as we analyze a path.
|
||
|
<p>
|
||
|
|
||
|
<a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.
|
||
|
It is used to provide a lexicon of how to describe abstract memory. Regions can
|
||
|
layer on top of other regions, providing a layered approach to representing memory.
|
||
|
For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>,
|
||
|
but a <tt>FieldRegion</tt> which is a subregion of the <tt>VarRegion</tt> could
|
||
|
be used to represent the memory associated with a specific field of that object.
|
||
|
So how do we represent symbolic memory regions? That's what <a href="http://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a>
|
||
|
is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the
|
||
|
symbol is unique and has a unique name; that symbol names the region.
|
||
|
<p>
|
||
|
Let's see how the analyzer processes the expressions in the following example:
|
||
|
<p>
|
||
|
<pre class="code_example">
|
||
|
int foo(int x) {
|
||
|
int y = x * 2;
|
||
|
int z = x;
|
||
|
...
|
||
|
}
|
||
|
</pre>
|
||
|
<p>
|
||
|
Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated,
|
||
|
we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>, in
|
||
|
this case it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>.
|
||
|
Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>,
|
||
|
which references the value <b>currently bound</b> to <tt>x</tt>. That value is
|
||
|
symbolic; it's whatever <tt>x</tt> was bound to at the start of the function.
|
||
|
Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>,
|
||
|
and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When
|
||
|
we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions,
|
||
|
and create a new <tt>SVal</tt> that represents their multiplication (which in
|
||
|
this case is a new symbolic expression, which we might call <tt>$1</tt>). When we
|
||
|
evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>),
|
||
|
and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>)
|
||
|
to the <tt>MemRegion</tt> in the symbolic store.
|
||
|
<br>
|
||
|
The second line is similar. When we evaluate <tt>x</tt> again, we do the same
|
||
|
dance, and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note, two <tt>SVals</tt>
|
||
|
might reference the same underlying values.
|
||
|
|
||
|
<p>
|
||
|
To summarize, MemRegions are unique names for blocks of memory. Symbols are
|
||
|
unique names for abstract symbolic values. Some MemRegions represents abstract
|
||
|
symbolic chunks of memory, and thus are also based on symbols. SVals are just
|
||
|
references to values, and can reference either MemRegions, Symbols, or concrete
|
||
|
values (e.g., the number 1).
|
||
|
|
||
|
<!--
|
||
|
TODO: Add a picture.
|
||
|
<br>
|
||
|
Symbols<br>
|
||
|
FunctionalObjects are used throughout.
|
||
|
-->
|
||
|
<h2 id=idea>Idea for a Checker</h2>
|
||
|
Here are several questions which you should consider when evaluating your
|
||
|
checker idea:
|
||
|
<ul>
|
||
|
<li>Can the check be effectively implemented without path-sensitive
|
||
|
analysis? See <a href="#ast">AST Visitors</a>.</li>
|
||
|
|
||
|
<li>How high the false positive rate is going to be? Looking at the occurrences
|
||
|
of the issue you want to write a checker for in the existing code bases might
|
||
|
give you some ideas. </li>
|
||
|
|
||
|
<li>How the current limitations of the analysis will effect the false alarm
|
||
|
rate? Currently, the analyzer only reasons about one procedure at a time (no
|
||
|
inter-procedural analysis). Also, it uses a simple range tracking based
|
||
|
solver to model symbolic execution.</li>
|
||
|
|
||
|
<li>Consult the <a
|
||
|
href="http://llvm.org/bugs/buglist.cgi?query_format=advanced&bug_status=NEW&bug_status=REOPENED&version=trunk&component=Static%20Analyzer&product=clang">Bugzilla database</a>
|
||
|
to get some ideas for new checkers and consider starting with improving/fixing
|
||
|
bugs in the existing checkers.</li>
|
||
|
</ul>
|
||
|
|
||
|
<h2 id=registration>Checker Registration</h2>
|
||
|
All checker implementation files are located in <tt>clang/lib/StaticAnalyzer/Checkers</tt>
|
||
|
folder. Follow the steps below to register a new checker with the analyzer.
|
||
|
<ol>
|
||
|
<li>Create a new checker implementation file, for example <tt>./lib/StaticAnalyzer/Checkers/NewChecker.cpp</tt>
|
||
|
<pre class="code_example">
|
||
|
using namespace clang;
|
||
|
using namespace ento;
|
||
|
|
||
|
namespace {
|
||
|
class NewChecker: public Checker< check::PreStmt<CallExpr> > {
|
||
|
public:
|
||
|
void checkPreStmt(const CallExpr *CE, CheckerContext &Ctx) const {}
|
||
|
}
|
||
|
}
|
||
|
void ento::registerNewChecker(CheckerManager &mgr) {
|
||
|
mgr.registerChecker<NewChecker>();
|
||
|
}
|
||
|
</pre>
|
||
|
|
||
|
<li>Pick the package name for your checker and add the registration code to
|
||
|
<tt>./lib/StaticAnalyzer/Checkers/Checkers.td</tt>. Note, all checkers should
|
||
|
first be developed as experimental. Suppose our new checker performs security
|
||
|
related checks, then we should add the following lines under
|
||
|
<tt>SecurityExperimental</tt> package:
|
||
|
<pre class="code_example">
|
||
|
let ParentPackage = SecurityExperimental in {
|
||
|
...
|
||
|
def NewChecker : Checker<"NewChecker">,
|
||
|
HelpText<"This text should give a short description of the checks performed.">,
|
||
|
DescFile<"NewChecker.cpp">;
|
||
|
...
|
||
|
} // end "security.experimental"
|
||
|
</pre>
|
||
|
|
||
|
<li>Make the source code file visible to CMake by adding it to
|
||
|
<tt>./lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.
|
||
|
|
||
|
<li>Compile and see your checker in the list of available checkers by running:<br>
|
||
|
<tt><b>$clang -cc1 -analyzer-checker-help</b></tt>
|
||
|
</ol>
|
||
|
|
||
|
|
||
|
<h2 id=skeleton>Checker Skeleton</h2>
|
||
|
There are two main decisions you need to make:
|
||
|
<ul>
|
||
|
<li> Which events the checker should be tracking.
|
||
|
See <a href="http://clang.llvm.org/doxygen/classento_1_1CheckerDocumentation.html">CheckerDocumentation</a>
|
||
|
for the list of available checker callbacks.</li>
|
||
|
<li> What data you want to store as part of the checker-specific program
|
||
|
state. Try to minimize the checker state as much as possible. </li>
|
||
|
</ul>
|
||
|
|
||
|
<h2 id=bugs>Bug Reports</h2>
|
||
|
|
||
|
<h2 id=ast>AST Visitors</h2>
|
||
|
Some checks might not require path-sensitivity to be effective. Simple AST walk
|
||
|
might be sufficient. If that is the case, consider implementing a Clang
|
||
|
compiler warning. On the other hand, a check might not be acceptable as a compiler
|
||
|
warning; for example, because of a relatively high false positive rate. In this
|
||
|
situation, AST callbacks <tt><b>checkASTDecl</b></tt> and
|
||
|
<tt><b>checkASTCodeBody</b></tt> are your best friends.
|
||
|
|
||
|
<h2 id=testing>Testing</h2>
|
||
|
Every patch should be well tested with Clang regression tests. The checker tests
|
||
|
live in <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests,
|
||
|
execute the following from the <tt>clang</tt> build directory:
|
||
|
<pre class="code">
|
||
|
$ <b>TESTDIRS=Analysis make test</b>
|
||
|
</pre>
|
||
|
|
||
|
<h2 id=commands>Useful Commands/Debugging Hints</h2>
|
||
|
<ul>
|
||
|
<li>
|
||
|
While investigating a checker-related issue, instruct the analyzer to only
|
||
|
execute a single checker:
|
||
|
<br><tt>
|
||
|
$ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
|
||
|
</tt>
|
||
|
</li>
|
||
|
<li>
|
||
|
To dump AST:
|
||
|
<br><tt>
|
||
|
$ <b>clang -cc1 -ast-dump test.c</b>
|
||
|
</tt>
|
||
|
</li>
|
||
|
<li>
|
||
|
To view/dump CFG use <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt> checkers:
|
||
|
<br><tt>
|
||
|
$ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
|
||
|
</tt>
|
||
|
</li>
|
||
|
<li>
|
||
|
To see all available debug checkers:
|
||
|
<br><tt>
|
||
|
$ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
|
||
|
</tt>
|
||
|
</li>
|
||
|
<li>
|
||
|
To see which function is failing while processing a large file use
|
||
|
<tt>-analyzer-display-progress</tt> option.
|
||
|
</li>
|
||
|
<li>
|
||
|
While debugging execute <tt>clang -cc1 -analyze -analyzer-checker=core</tt>
|
||
|
instead of <tt>clang --analyze</tt>, as the later would call the compiler
|
||
|
in a separate process.
|
||
|
</li>
|
||
|
<li>
|
||
|
To view <tt>ExplodedGraph</tt> (the state graph explored by the analyzer) while
|
||
|
debugging, goto a frame that has <tt>clang::ento::ExprEngine</tt> object and
|
||
|
execute:
|
||
|
<br><tt>
|
||
|
(gdb) <b>p ViewGraph(0)</b>
|
||
|
</tt>
|
||
|
</li>
|
||
|
<li>
|
||
|
To see the <tt>ProgramState</tt> while debugging use the following command.
|
||
|
<br><tt>
|
||
|
(gdb) <b>p State->dump()</b>
|
||
|
</tt>
|
||
|
</li>
|
||
|
<li>
|
||
|
To see <tt>clang::Expr</tt> while debugging use the following command. If you
|
||
|
pass in a SourceManager object, it will also dump the corresponding line in the
|
||
|
source code.
|
||
|
<br><tt>
|
||
|
(gdb) <b>p E->dump()</b>
|
||
|
</tt>
|
||
|
</li>
|
||
|
<li>
|
||
|
To dump AST of a method that the current <tt>ExplodedNode</tt> belongs to:
|
||
|
<br><tt>
|
||
|
(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump()</b>
|
||
|
(gdb) <b>p C.getPredecessor()->getCodeDecl().getBody()->dump(getContext().getSourceManager())</b>
|
||
|
</tt>
|
||
|
</li>
|
||
|
</ul>
|
||
|
|
||
|
</div>
|
||
|
</div>
|
||
|
</body>
|
||
|
</html>
|