1f51a89d39
These were ignored by git accidentally. We want ALL OF THEM since they all came in the llvm/clang source distribution.
524 lines
24 KiB
HTML
524 lines
24 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
|
|
"http://www.w3.org/TR/html4/strict.dtd">
|
|
<html>
|
|
<head>
|
|
<title>Clang Driver Manual</title>
|
|
<link type="text/css" rel="stylesheet" href="../menu.css">
|
|
<link type="text/css" rel="stylesheet" href="../content.css">
|
|
<style type="text/css">
|
|
td {
|
|
vertical-align: top;
|
|
}
|
|
</style>
|
|
</head>
|
|
<body>
|
|
|
|
<!--#include virtual="../menu.html.incl"-->
|
|
|
|
<div id="content">
|
|
|
|
<h1>Driver Design & Internals</h1>
|
|
|
|
<ul>
|
|
<li><a href="#intro">Introduction</a></li>
|
|
<li><a href="#features">Features and Goals</a>
|
|
<ul>
|
|
<li><a href="#gcccompat">GCC Compatibility</a></li>
|
|
<li><a href="#components">Flexible</a></li>
|
|
<li><a href="#performance">Low Overhead</a></li>
|
|
<li><a href="#simple">Simple</a></li>
|
|
</ul>
|
|
</li>
|
|
<li><a href="#design">Design</a>
|
|
<ul>
|
|
<li><a href="#int_intro">Internals Introduction</a></li>
|
|
<li><a href="#int_overview">Design Overview</a></li>
|
|
<li><a href="#int_notes">Additional Notes</a>
|
|
<ul>
|
|
<li><a href="#int_compilation">The Compilation Object</a></li>
|
|
<li><a href="#int_unified_parsing">Unified Parsing & Pipelining</a></li>
|
|
<li><a href="#int_toolchain_translation">ToolChain Argument Translation</a></li>
|
|
<li><a href="#int_unused_warnings">Unused Argument Warnings</a></li>
|
|
</ul>
|
|
</li>
|
|
<li><a href="#int_gcc_concepts">Relation to GCC Driver Concepts</a></li>
|
|
</ul>
|
|
</li>
|
|
</ul>
|
|
|
|
|
|
<!-- ======================================================================= -->
|
|
<h2 id="intro">Introduction</h2>
|
|
<!-- ======================================================================= -->
|
|
|
|
<p>This document describes the Clang driver. The purpose of this
|
|
document is to describe both the motivation and design goals
|
|
for the driver, as well as details of the internal
|
|
implementation.</p>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h2 id="features">Features and Goals</h2>
|
|
<!-- ======================================================================= -->
|
|
|
|
<p>The Clang driver is intended to be a production quality
|
|
compiler driver providing access to the Clang compiler and
|
|
tools, with a command line interface which is compatible with
|
|
the gcc driver.</p>
|
|
|
|
<p>Although the driver is part of and driven by the Clang
|
|
project, it is logically a separate tool which shares many of
|
|
the same goals as Clang:</p>
|
|
|
|
<p><b>Features</b>:</p>
|
|
<ul>
|
|
<li><a href="#gcccompat">GCC Compatibility</a></li>
|
|
<li><a href="#components">Flexible</a></li>
|
|
<li><a href="#performance">Low Overhead</a></li>
|
|
<li><a href="#simple">Simple</a></li>
|
|
</ul>
|
|
|
|
<!--=======================================================================-->
|
|
<h3 id="gcccompat">GCC Compatibility</h3>
|
|
<!--=======================================================================-->
|
|
|
|
<p>The number one goal of the driver is to ease the adoption of
|
|
Clang by allowing users to drop Clang into a build system
|
|
which was designed to call GCC. Although this makes the driver
|
|
much more complicated than might otherwise be necessary, we
|
|
decided that being very compatible with the gcc command line
|
|
interface was worth it in order to allow users to quickly test
|
|
clang on their projects.</p>
|
|
|
|
<!--=======================================================================-->
|
|
<h3 id="components">Flexible</h3>
|
|
<!--=======================================================================-->
|
|
|
|
<p>The driver was designed to be flexible and easily accommodate
|
|
new uses as we grow the clang and LLVM infrastructure. As one
|
|
example, the driver can easily support the introduction of
|
|
tools which have an integrated assembler; something we hope to
|
|
add to LLVM in the future.</p>
|
|
|
|
<p>Similarly, most of the driver functionality is kept in a
|
|
library which can be used to build other tools which want to
|
|
implement or accept a gcc like interface. </p>
|
|
|
|
<!--=======================================================================-->
|
|
<h3 id="performance">Low Overhead</h3>
|
|
<!--=======================================================================-->
|
|
|
|
<p>The driver should have as little overhead as possible. In
|
|
practice, we found that the gcc driver by itself incurred a
|
|
small but meaningful overhead when compiling many small
|
|
files. The driver doesn't do much work compared to a
|
|
compilation, but we have tried to keep it as efficient as
|
|
possible by following a few simple principles:</p>
|
|
<ul>
|
|
<li>Avoid memory allocation and string copying when
|
|
possible.</li>
|
|
|
|
<li>Don't parse arguments more than once.</li>
|
|
|
|
<li>Provide a few simple interfaces for efficiently searching
|
|
arguments.</li>
|
|
</ul>
|
|
|
|
<!--=======================================================================-->
|
|
<h3 id="simple">Simple</h3>
|
|
<!--=======================================================================-->
|
|
|
|
<p>Finally, the driver was designed to be "as simple as
|
|
possible", given the other goals. Notably, trying to be
|
|
completely compatible with the gcc driver adds a significant
|
|
amount of complexity. However, the design of the driver
|
|
attempts to mitigate this complexity by dividing the process
|
|
into a number of independent stages instead of a single
|
|
monolithic task.</p>
|
|
|
|
<!-- ======================================================================= -->
|
|
<h2 id="design">Internal Design and Implementation</h2>
|
|
<!-- ======================================================================= -->
|
|
|
|
<ul>
|
|
<li><a href="#int_intro">Internals Introduction</a></li>
|
|
<li><a href="#int_overview">Design Overview</a></li>
|
|
<li><a href="#int_notes">Additional Notes</a></li>
|
|
<li><a href="#int_gcc_concepts">Relation to GCC Driver Concepts</a></li>
|
|
</ul>
|
|
|
|
<!--=======================================================================-->
|
|
<h3><a name="int_intro">Internals Introduction</a></h3>
|
|
<!--=======================================================================-->
|
|
|
|
<p>In order to satisfy the stated goals, the driver was designed
|
|
to completely subsume the functionality of the gcc executable;
|
|
that is, the driver should not need to delegate to gcc to
|
|
perform subtasks. On Darwin, this implies that the Clang
|
|
driver also subsumes the gcc driver-driver, which is used to
|
|
implement support for building universal images (binaries and
|
|
object files). This also implies that the driver should be
|
|
able to call the language specific compilers (e.g. cc1)
|
|
directly, which means that it must have enough information to
|
|
forward command line arguments to child processes
|
|
correctly.</p>
|
|
|
|
<!--=======================================================================-->
|
|
<h3><a name="int_overview">Design Overview</a></h3>
|
|
<!--=======================================================================-->
|
|
|
|
<p>The diagram below shows the significant components of the
|
|
driver architecture and how they relate to one another. The
|
|
orange components represent concrete data structures built by
|
|
the driver, the green components indicate conceptually
|
|
distinct stages which manipulate these data structures, and
|
|
the blue components are important helper classes. </p>
|
|
|
|
<div style="text-align:center">
|
|
<a href="DriverArchitecture.png">
|
|
<img width=400 src="DriverArchitecture.png"
|
|
alt="Driver Architecture Diagram">
|
|
</a>
|
|
</div>
|
|
|
|
<!--=======================================================================-->
|
|
<h3><a name="int_stages">Driver Stages</a></h3>
|
|
<!--=======================================================================-->
|
|
|
|
<p>The driver functionality is conceptually divided into five stages:</p>
|
|
|
|
<ol>
|
|
<li>
|
|
<b>Parse: Option Parsing</b>
|
|
|
|
<p>The command line argument strings are decomposed into
|
|
arguments (<tt>Arg</tt> instances). The driver expects to
|
|
understand all available options, although there is some
|
|
facility for just passing certain classes of options
|
|
through (like <tt>-Wl,</tt>).</p>
|
|
|
|
<p>Each argument corresponds to exactly one
|
|
abstract <tt>Option</tt> definition, which describes how
|
|
the option is parsed along with some additional
|
|
metadata. The Arg instances themselves are lightweight and
|
|
merely contain enough information for clients to determine
|
|
which option they correspond to and their values (if they
|
|
have additional parameters).</p>
|
|
|
|
<p>For example, a command line like "-Ifoo -I foo" would
|
|
parse to two Arg instances (a JoinedArg and a SeparateArg
|
|
instance), but each would refer to the same Option.</p>
|
|
|
|
<p>Options are lazily created in order to avoid populating
|
|
all Option classes when the driver is loaded. Most of the
|
|
driver code only needs to deal with options by their
|
|
unique ID (e.g., <tt>options::OPT_I</tt>),</p>
|
|
|
|
<p>Arg instances themselves do not generally store the
|
|
values of parameters. In many cases, this would
|
|
simply result in creating unnecessary string
|
|
copies. Instead, Arg instances are always embedded inside
|
|
an ArgList structure, which contains the original vector
|
|
of argument strings. Each Arg itself only needs to contain
|
|
an index into this vector instead of storing its values
|
|
directly.</p>
|
|
|
|
<p>The clang driver can dump the results of this
|
|
stage using the <tt>-ccc-print-options</tt> flag (which
|
|
must precede any actual command line arguments). For
|
|
example:</p>
|
|
<pre>
|
|
$ <b>clang -ccc-print-options -Xarch_i386 -fomit-frame-pointer -Wa,-fast -Ifoo -I foo t.c</b>
|
|
Option 0 - Name: "-Xarch_", Values: {"i386", "-fomit-frame-pointer"}
|
|
Option 1 - Name: "-Wa,", Values: {"-fast"}
|
|
Option 2 - Name: "-I", Values: {"foo"}
|
|
Option 3 - Name: "-I", Values: {"foo"}
|
|
Option 4 - Name: "<input>", Values: {"t.c"}
|
|
</pre>
|
|
|
|
<p>After this stage is complete the command line should be
|
|
broken down into well defined option objects with their
|
|
appropriate parameters. Subsequent stages should rarely,
|
|
if ever, need to do any string processing.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<b>Pipeline: Compilation Job Construction</b>
|
|
|
|
<p>Once the arguments are parsed, the tree of subprocess
|
|
jobs needed for the desired compilation sequence are
|
|
constructed. This involves determining the input files and
|
|
their types, what work is to be done on them (preprocess,
|
|
compile, assemble, link, etc.), and constructing a list of
|
|
Action instances for each task. The result is a list of
|
|
one or more top-level actions, each of which generally
|
|
corresponds to a single output (for example, an object or
|
|
linked executable).</p>
|
|
|
|
<p>The majority of Actions correspond to actual tasks,
|
|
however there are two special Actions. The first is
|
|
InputAction, which simply serves to adapt an input
|
|
argument for use as an input to other Actions. The second
|
|
is BindArchAction, which conceptually alters the
|
|
architecture to be used for all of its input Actions.</p>
|
|
|
|
<p>The clang driver can dump the results of this
|
|
stage using the <tt>-ccc-print-phases</tt> flag. For
|
|
example:</p>
|
|
<pre>
|
|
$ <b>clang -ccc-print-phases -x c t.c -x assembler t.s</b>
|
|
0: input, "t.c", c
|
|
1: preprocessor, {0}, cpp-output
|
|
2: compiler, {1}, assembler
|
|
3: assembler, {2}, object
|
|
4: input, "t.s", assembler
|
|
5: assembler, {4}, object
|
|
6: linker, {3, 5}, image
|
|
</pre>
|
|
<p>Here the driver is constructing seven distinct actions,
|
|
four to compile the "t.c" input into an object file, two to
|
|
assemble the "t.s" input, and one to link them together.</p>
|
|
|
|
<p>A rather different compilation pipeline is shown here; in
|
|
this example there are two top level actions to compile
|
|
the input files into two separate object files, where each
|
|
object file is built using <tt>lipo</tt> to merge results
|
|
built for two separate architectures.</p>
|
|
<pre>
|
|
$ <b>clang -ccc-print-phases -c -arch i386 -arch x86_64 t0.c t1.c</b>
|
|
0: input, "t0.c", c
|
|
1: preprocessor, {0}, cpp-output
|
|
2: compiler, {1}, assembler
|
|
3: assembler, {2}, object
|
|
4: bind-arch, "i386", {3}, object
|
|
5: bind-arch, "x86_64", {3}, object
|
|
6: lipo, {4, 5}, object
|
|
7: input, "t1.c", c
|
|
8: preprocessor, {7}, cpp-output
|
|
9: compiler, {8}, assembler
|
|
10: assembler, {9}, object
|
|
11: bind-arch, "i386", {10}, object
|
|
12: bind-arch, "x86_64", {10}, object
|
|
13: lipo, {11, 12}, object
|
|
</pre>
|
|
|
|
<p>After this stage is complete the compilation process is
|
|
divided into a simple set of actions which need to be
|
|
performed to produce intermediate or final outputs (in
|
|
some cases, like <tt>-fsyntax-only</tt>, there is no
|
|
"real" final output). Phases are well known compilation
|
|
steps, such as "preprocess", "compile", "assemble",
|
|
"link", etc.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<b>Bind: Tool & Filename Selection</b>
|
|
|
|
<p>This stage (in conjunction with the Translate stage)
|
|
turns the tree of Actions into a list of actual subprocess
|
|
to run. Conceptually, the driver performs a top down
|
|
matching to assign Action(s) to Tools. The ToolChain is
|
|
responsible for selecting the tool to perform a particular
|
|
action; once selected the driver interacts with the tool
|
|
to see if it can match additional actions (for example, by
|
|
having an integrated preprocessor).
|
|
|
|
<p>Once Tools have been selected for all actions, the driver
|
|
determines how the tools should be connected (for example,
|
|
using an inprocess module, pipes, temporary files, or user
|
|
provided filenames). If an output file is required, the
|
|
driver also computes the appropriate file name (the suffix
|
|
and file location depend on the input types and options
|
|
such as <tt>-save-temps</tt>).
|
|
|
|
<p>The driver interacts with a ToolChain to perform the Tool
|
|
bindings. Each ToolChain contains information about all
|
|
the tools needed for compilation for a particular
|
|
architecture, platform, and operating system. A single
|
|
driver invocation may query multiple ToolChains during one
|
|
compilation in order to interact with tools for separate
|
|
architectures.</p>
|
|
|
|
<p>The results of this stage are not computed directly, but
|
|
the driver can print the results via
|
|
the <tt>-ccc-print-bindings</tt> option. For example:</p>
|
|
<pre>
|
|
$ <b>clang -ccc-print-bindings -arch i386 -arch ppc t0.c</b>
|
|
# "i386-apple-darwin9" - "clang", inputs: ["t0.c"], output: "/tmp/cc-Sn4RKF.s"
|
|
# "i386-apple-darwin9" - "darwin::Assemble", inputs: ["/tmp/cc-Sn4RKF.s"], output: "/tmp/cc-gvSnbS.o"
|
|
# "i386-apple-darwin9" - "darwin::Link", inputs: ["/tmp/cc-gvSnbS.o"], output: "/tmp/cc-jgHQxi.out"
|
|
# "ppc-apple-darwin9" - "gcc::Compile", inputs: ["t0.c"], output: "/tmp/cc-Q0bTox.s"
|
|
# "ppc-apple-darwin9" - "gcc::Assemble", inputs: ["/tmp/cc-Q0bTox.s"], output: "/tmp/cc-WCdicw.o"
|
|
# "ppc-apple-darwin9" - "gcc::Link", inputs: ["/tmp/cc-WCdicw.o"], output: "/tmp/cc-HHBEBh.out"
|
|
# "i386-apple-darwin9" - "darwin::Lipo", inputs: ["/tmp/cc-jgHQxi.out", "/tmp/cc-HHBEBh.out"], output: "a.out"
|
|
</pre>
|
|
|
|
<p>This shows the tool chain, tool, inputs and outputs which
|
|
have been bound for this compilation sequence. Here clang
|
|
is being used to compile t0.c on the i386 architecture and
|
|
darwin specific versions of the tools are being used to
|
|
assemble and link the result, but generic gcc versions of
|
|
the tools are being used on PowerPC.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<b>Translate: Tool Specific Argument Translation</b>
|
|
|
|
<p>Once a Tool has been selected to perform a particular
|
|
Action, the Tool must construct concrete Jobs which will be
|
|
executed during compilation. The main work is in translating
|
|
from the gcc style command line options to whatever options
|
|
the subprocess expects.</p>
|
|
|
|
<p>Some tools, such as the assembler, only interact with a
|
|
handful of arguments and just determine the path of the
|
|
executable to call and pass on their input and output
|
|
arguments. Others, like the compiler or the linker, may
|
|
translate a large number of arguments in addition.</p>
|
|
|
|
<p>The ArgList class provides a number of simple helper
|
|
methods to assist with translating arguments; for example,
|
|
to pass on only the last of arguments corresponding to some
|
|
option, or all arguments for an option.</p>
|
|
|
|
<p>The result of this stage is a list of Jobs (executable
|
|
paths and argument strings) to execute.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<b>Execute</b>
|
|
<p>Finally, the compilation pipeline is executed. This is
|
|
mostly straightforward, although there is some interaction
|
|
with options
|
|
like <tt>-pipe</tt>, <tt>-pass-exit-codes</tt>
|
|
and <tt>-time</tt>.</p>
|
|
</li>
|
|
|
|
</ol>
|
|
|
|
<!--=======================================================================-->
|
|
<h3><a name="int_notes">Additional Notes</a></h3>
|
|
<!--=======================================================================-->
|
|
|
|
<h4 id="int_compilation">The Compilation Object</h4>
|
|
|
|
<p>The driver constructs a Compilation object for each set of
|
|
command line arguments. The Driver itself is intended to be
|
|
invariant during construction of a Compilation; an IDE should be
|
|
able to construct a single long lived driver instance to use
|
|
for an entire build, for example.</p>
|
|
|
|
<p>The Compilation object holds information that is particular
|
|
to each compilation sequence. For example, the list of used
|
|
temporary files (which must be removed once compilation is
|
|
finished) and result files (which should be removed if
|
|
compilation fails).</p>
|
|
|
|
<h4 id="int_unified_parsing">Unified Parsing & Pipelining</h4>
|
|
|
|
<p>Parsing and pipelining both occur without reference to a
|
|
Compilation instance. This is by design; the driver expects that
|
|
both of these phases are platform neutral, with a few very well
|
|
defined exceptions such as whether the platform uses a driver
|
|
driver.</p>
|
|
|
|
<h4 id="int_toolchain_translation">ToolChain Argument Translation</h4>
|
|
|
|
<p>In order to match gcc very closely, the clang driver
|
|
currently allows tool chains to perform their own translation of
|
|
the argument list (into a new ArgList data structure). Although
|
|
this allows the clang driver to match gcc easily, it also makes
|
|
the driver operation much harder to understand (since the Tools
|
|
stop seeing some arguments the user provided, and see new ones
|
|
instead).</p>
|
|
|
|
<p>For example, on Darwin <tt>-gfull</tt> gets translated into two
|
|
separate arguments, <tt>-g</tt>
|
|
and <tt>-fno-eliminate-unused-debug-symbols</tt>. Trying to write Tool
|
|
logic to do something with <tt>-gfull</tt> will not work, because Tool
|
|
argument translation is done after the arguments have been
|
|
translated.</p>
|
|
|
|
<p>A long term goal is to remove this tool chain specific
|
|
translation, and instead force each tool to change its own logic
|
|
to do the right thing on the untranslated original arguments.</p>
|
|
|
|
<h4 id="int_unused_warnings">Unused Argument Warnings</h4>
|
|
<p>The driver operates by parsing all arguments but giving Tools
|
|
the opportunity to choose which arguments to pass on. One
|
|
downside of this infrastructure is that if the user misspells
|
|
some option, or is confused about which options to use, some
|
|
command line arguments the user really cared about may go
|
|
unused. This problem is particularly important when using
|
|
clang as a compiler, since the clang compiler does not support
|
|
anywhere near all the options that gcc does, and we want to make
|
|
sure users know which ones are being used.</p>
|
|
|
|
<p>To support this, the driver maintains a bit associated with
|
|
each argument of whether it has been used (at all) during the
|
|
compilation. This bit usually doesn't need to be set by hand,
|
|
as the key ArgList accessors will set it automatically.</p>
|
|
|
|
<p>When a compilation is successful (there are no errors), the
|
|
driver checks the bit and emits an "unused argument" warning for
|
|
any arguments which were never accessed. This is conservative
|
|
(the argument may not have been used to do what the user wanted)
|
|
but still catches the most obvious cases.</p>
|
|
|
|
<!--=======================================================================-->
|
|
<h3><a name="int_gcc_concepts">Relation to GCC Driver Concepts</a></h3>
|
|
<!--=======================================================================-->
|
|
|
|
<p>For those familiar with the gcc driver, this section provides
|
|
a brief overview of how things from the gcc driver map to the
|
|
clang driver.</p>
|
|
|
|
<ul>
|
|
<li>
|
|
<b>Driver Driver</b>
|
|
<p>The driver driver is fully integrated into the clang
|
|
driver. The driver simply constructs additional Actions to
|
|
bind the architecture during the <i>Pipeline</i>
|
|
phase. The tool chain specific argument translation is
|
|
responsible for handling <tt>-Xarch_</tt>.</p>
|
|
|
|
<p>The one caveat is that this approach
|
|
requires <tt>-Xarch_</tt> not be used to alter the
|
|
compilation itself (for example, one cannot
|
|
provide <tt>-S</tt> as an <tt>-Xarch_</tt> argument). The
|
|
driver attempts to reject such invocations, and overall
|
|
there isn't a good reason to abuse <tt>-Xarch_</tt> to
|
|
that end in practice.</p>
|
|
|
|
<p>The upside is that the clang driver is more efficient and
|
|
does little extra work to support universal builds. It also
|
|
provides better error reporting and UI consistency.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<b>Specs</b>
|
|
<p>The clang driver has no direct correspondent for
|
|
"specs". The majority of the functionality that is
|
|
embedded in specs is in the Tool specific argument
|
|
translation routines. The parts of specs which control the
|
|
compilation pipeline are generally part of
|
|
the <i>Pipeline</i> stage.</p>
|
|
</li>
|
|
|
|
<li>
|
|
<b>Toolchains</b>
|
|
<p>The gcc driver has no direct understanding of tool
|
|
chains. Each gcc binary roughly corresponds to the
|
|
information which is embedded inside a single
|
|
ToolChain.</p>
|
|
|
|
<p>The clang driver is intended to be portable and support
|
|
complex compilation environments. All platform and tool
|
|
chain specific code should be protected behind either
|
|
abstract or well defined interfaces (such as whether the
|
|
platform supports use as a driver driver).</p>
|
|
</li>
|
|
</ul>
|
|
</div>
|
|
</body>
|
|
</html>
|