<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Clang - Performance</title>
<link type="text/css" rel="stylesheet" href="menu.css">
<link type="text/css" rel="stylesheet" href="content.css">
<style type="text/css">
</style>
</head>
<body>

<!--#include virtual="menu.html.incl"-->

<div id="content">

<!--*************************************************************************-->
<h1>Clang - Performance</h1>
<!--*************************************************************************-->

<p>This page tracks the compile-time performance of Clang on two
interesting benchmarks:</p>
<ul>
<li><i>Sketch</i>: The Objective-C example application shipped on
Mac OS X as part of Xcode. <i>Sketch</i> is indicative of a
"typical" Objective-C app. The source itself has a relatively
small amount of code (~7,500 lines of source code), but it relies
on the extensive Cocoa APIs to build its functionality. Like many
Objective-C applications, it includes
<tt>Cocoa/Cocoa.h</tt> in all of its source files, which represents a
significant stress test of the front-end's performance on lexing,
preprocessing, parsing, and syntax analysis.</li>
<li><i>176.gcc</i>: This is the gcc-2.7.2.2 code base as present in
SPECINT 2000. In contrast to Sketch, <i>176.gcc</i> consists of a
large amount of C source code (~220,000 lines) with few system
dependencies. This stresses the back-end's performance on generating
assembly code and debug information.</li>
</ul>

<!--*************************************************************************-->
<h2><a name="enduser">Experiments</a></h2>
<!--*************************************************************************-->

<p>Measurements are done by serially processing each file in the
respective benchmark, using Clang, gcc, and llvm-gcc as compilers. To
track the performance of the various subsystems, the timings have
been broken down into separate stages where possible:</p>

<ul>
<li><tt>-Eonly</tt>: This option runs the preprocessor but does not
produce any output. For gcc and llvm-gcc, the <tt>-MM</tt> option is used
as a rough equivalent to this step.</li>
<li><tt>-parse-noop</tt>: This option runs the parser on the input,
but without semantic analysis or any output. gcc and llvm-gcc have
no equivalent for this option.</li>
<li><tt>-fsyntax-only</tt>: This option runs the parser with semantic
analysis.</li>
<li><tt>-emit-llvm -O0</tt>: For Clang and llvm-gcc, this option
converts to the LLVM intermediate representation but doesn't
generate native code.</li>
<li><tt>-S -O0</tt>: This performs actual code generation to produce a
native assembler file.</li>
<li><tt>-S -O0 -g</tt>: This adds emission of debug information to
the assembly output.</li>
</ul>
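
<p>The serial, per-stage measurement described above could be scripted
roughly as follows. This is only a sketch: the page does not say how the
measurements were actually automated, so the helper names
(<tt>build_command</tt>, <tt>time_command</tt>, <tt>time_stage</tt>) are
hypothetical, and whether a particular compiler binary accepts each flag in
this exact form is an assumption that is not checked here.</p>

```python
import subprocess
import sys
import time

# Stage flags as listed on this page. Passing them directly to a compiler
# binary in this form is an assumption of the sketch.
STAGES = [
    "-Eonly",
    "-parse-noop",
    "-fsyntax-only",
    "-emit-llvm -O0",
    "-S -O0",
    "-S -O0 -g",
]


def build_command(compiler, stage_flags, source_file):
    """Return the argument list for one compiler invocation."""
    return [compiler] + stage_flags.split() + [source_file]


def time_command(argv):
    """Run one command and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(
        argv,
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


def time_stage(compiler, stage_flags, source_files):
    """Serially process every file for one stage and sum the times."""
    total = 0.0
    for source_file in source_files:
        total += time_command(build_command(compiler, stage_flags, source_file))
    return total
```

<p>Calling <tt>time_stage</tt> once per entry in <tt>STAGES</tt> would give
one total per stage for a given compiler and benchmark. A real harness would
likely repeat each run several times to reduce noise.</p>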

<p>This set of stages is chosen to be approximately additive; that is,
each subsequent stage simply adds some additional processing. The
timings measure the delta of the given stage from the previous
one. For example, the timings for <tt>-fsyntax-only</tt> below show
the difference between running with <tt>-fsyntax-only</tt> and running
with <tt>-parse-noop</tt> (for Clang) or <tt>-MM</tt> (for gcc and
llvm-gcc). This amounts to a fairly accurate measure of only the time
to perform semantic analysis (and parsing, in the case of gcc and llvm-gcc).</p>
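
<p>The delta computation can be sketched as below. The function name
<tt>stage_deltas</tt> and all of the numbers are placeholders chosen for
illustration; they are not measurements from this page. A stage that a
compiler does not support is skipped, so the next stage's delta absorbs that
work, which is why the gcc <tt>-fsyntax-only</tt> delta also includes
parsing.</p>

```python
def stage_deltas(cumulative, order):
    """Turn cumulative per-stage times into the extra time each stage adds.

    `cumulative` maps a stage name to the total time of a run with that
    stage's flags; `order` lists the stages from cheapest to most expensive.
    """
    deltas = {}
    previous = 0.0
    for stage in order:
        if stage not in cumulative:
            # For example, gcc has no equivalent of -parse-noop.
            continue
        deltas[stage] = cumulative[stage] - previous
        previous = cumulative[stage]
    return deltas


# Placeholder timings in seconds, purely illustrative.
clang_times = {"-Eonly": 1.0, "-parse-noop": 1.5, "-fsyntax-only": 2.5}
gcc_times = {"-MM": 1.0, "-fsyntax-only": 3.0}

clang_deltas = stage_deltas(clang_times, ["-Eonly", "-parse-noop", "-fsyntax-only"])
gcc_deltas = stage_deltas(gcc_times, ["-MM", "-parse-noop", "-fsyntax-only"])

print(clang_deltas)
print(gcc_deltas)
```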

<p>These timings are chosen to break down the compilation process for
Clang as much as possible. The graphs below show these numbers
combined so that it is easy to see how the time for a particular task
is divided among various components. For example, <tt>-S -O0</tt>
includes the time of <tt>-fsyntax-only</tt> and <tt>-emit-llvm -O0</tt>.</p>

<p>Note that we already know that the LLVM optimizers are substantially (30-40%)
faster than the GCC optimizers at a given <tt>-O</tt> level, so we focus only on
<tt>-O0</tt> compile time here.</p>

<!--*************************************************************************-->
<h2><a name="enduser">Timing Results</a></h2>
<!--*************************************************************************-->

<!--=======================================================================-->
<h3><a name="2008-10-31">2008-10-31</a></h3>
<!--=======================================================================-->

<h4 style="text-align:center">Sketch</h4>
<img class="img_slide"
src="timing-data/2008-10-31/sketch.png" alt="Sketch Timings">

<p>This shows Clang's substantial performance improvements in
preprocessing and semantic analysis: it is over 90% faster on
<tt>-fsyntax-only</tt>. As expected, time spent in code generation for this
benchmark is relatively small. One caveat: Clang's debug information
generation for Objective-C is very incomplete, which means the <tt>-S
-O0 -g</tt> numbers are not a fair comparison, since Clang is generating
substantially less output.</p>

<p>This chart also shows the effect of using precompiled headers (PCH)
on compile time. gcc and llvm-gcc see a large performance improvement
with PCH: about 4x in wall time. Unfortunately, Clang does not yet
have an implementation of PCH-style optimizations, but we are actively
working to address this.</p>

<h4 style="text-align:center">176.gcc</h4>
<img class="img_slide"
src="timing-data/2008-10-31/176.gcc.png" alt="176.gcc Timings">

<p>Unlike the <i>Sketch</i> timings, compilation of <i>176.gcc</i>
involves a large amount of code generation. The time spent in Clang's
LLVM IR generation and code generation is on par with gcc's code
generation time, but the improved parsing &amp; semantic analysis
performance means Clang still comes in at ~29% faster than gcc
on <tt>-S -O0 -g</tt> and ~20% faster than llvm-gcc.</p>
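
<p>This page does not state how its "N% faster" and "Nx" figures were
computed. As a hedged illustration only, the sketch below shows two common
ways to compare a baseline time with a new time: the percentage of time saved
and the speedup factor. The function names and the sample values are
hypothetical and are not taken from the charts.</p>

```python
def percent_less_time(baseline_seconds, new_seconds):
    """Percentage of the baseline time that the new run saves."""
    return 100.0 * (baseline_seconds - new_seconds) / baseline_seconds


def speedup_factor(baseline_seconds, new_seconds):
    """How many times faster the new run is than the baseline."""
    return baseline_seconds / new_seconds


# Placeholder values: a 4.0 s baseline and a 1.0 s new run.
saved = percent_less_time(4.0, 1.0)
factor = speedup_factor(4.0, 1.0)

print(saved, factor)
```

<p>With these placeholder inputs, the same pair of timings reads as either
75% less time or a 4x speedup, so it matters which convention a given figure
uses.</p>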

<p>These numbers indicate that Clang still has room for improvement in
several areas. Notably, our LLVM IR generation is significantly slower
than that of llvm-gcc, and both Clang and llvm-gcc incur a
significantly higher cost for adding debugging information compared to
gcc.</p>

</div>
</body>
</html>