Parsing C++ with GCC plugins, Part 1

You have probably heard about the recent release of GCC 4.5.0. One of the new features in this version is the support for plugins. You can now write a shared object (.so) that can be loaded into GCC and hooked into various stages of the compilation process.

In the past couple of months I have been working on a new project (what it’s about is a secret, for now; UPDATE: no longer a secret) that uses GCC and the new plugin feature in order to parse C++ and then to generate some code based on it.

Writing a plugin to accomplish this was both fun and frustrating. Fun because GCC has a very rich abstract syntax tree (AST, sometimes called C++ Tree in GCC documentation). The amount of information available about parsed C++ is amazing; there isn’t much you can’t infer about the code. It was frustrating because this AST is very complex and very poorly documented. So is the plugin API. Most of the time I was reading the AST headers to learn more about the API and studying the GCC compiler source code to understand how to use it.

While there are a few other plugins around (and more will probably be written in the future), most of them concentrate on either optimizations or code generation (a good example of the latter is LLVM’s DragonEgg plugin). The only exception is probably Mozilla’s Dehydra/Treehydra set of plugins. However, Dehydra simply exposes a flattened subset of GCC’s AST as a set of JavaScript objects (for example, there is no namespace or #include information). Treehydra relies on GIMPLE which is a representation one level below (towards the machine code) from the parsed C++.

As a result, there is little information and few source code examples that show how to work with GCC’s C++ AST. And since I have already figured out most of the basics, I was thinking about writing a series of blog posts that show how to use GCC plugins to parse C++. What you do based on this information is up to you. Some of the potential applications include static analysis, (source) code generation, documentation generation, binding to other languages, editor/IDE support, etc. In today’s post I am going to show how to set up the plugin infrastructure for this kind of task. If there is interest, future posts will cover various aspects of working with GCC’s AST. So if you would like to read more on this topic, drop a line in the comments and if there is enough interest, I will write more on GCC plugins.

The GCC plugin API is covered in Chapter 23, “Plugins”, of the GCC Internals documentation. As described in this chapter, there are several compilation events (or phases) that the plugin can register for. Unfortunately none of the existing events are suitable for the kind of task that we want to perform. What we want is to be called just after the AST has been constructed and before any other passes are performed. We don’t want to perform any other passes since that would only be a waste of time. All we need is the C++ AST. At first it may seem that PLUGIN_FINISH_UNIT is a good place to run our code. However, a number of passes are performed before it (you can test this by registering a callback for the PLUGIN_OVERRIDE_GATE event which will allow you to see all the passes that are being executed).

One way to achieve what we want would be to register a callback for the PLUGIN_OVERRIDE_GATE event. This callback is called before every pass and it allows the plugin to decide whether to run the pass in question. The first call to this callback will then by definition be before any other pass has run. We can then call our code from this first execution of the callback and then terminate GCC. Here is the skeleton for this callback:

extern "C" void
gate_callback (void* gcc_data, void*)
{
  // If there were errors during compilation,
  // let GCC handle the exit.
  //
  if (errorcount || sorrycount)
    return;
 
  int r (0);
 
  //
  // Process AST. Issue diagnostics and set r
  // to 1 in case of an error.
  //
 
  // Terminate GCC.
  //
  exit (r);
}

errorcount and sorrycount are GCC variables that count the errors and the “sorry, unimplemented” diagnostics issued so far. The plugin API includes all the internal GCC headers so a plugin can access all the data and call all the functions that the code in the GCC compiler itself can.

Now we have set up the entry point for our plugin in the overall compilation process. There is, however, another thing that we need to take care of: the compiler output. When you execute something like this:

g++ -fplugin=plugin.so -c test.cxx

g++ isn’t the executable that will actually load plugin.so. g++ is a compiler driver that runs several other programs under the hood in order to translate test.cxx to test.o (use the -v option to see what’s actually being executed by g++). It first runs the program called cc1plus which is the actual C++ compiler and which will load the plugin. The output of cc1plus is an assembly file. Once the assembly file is generated, g++ invokes as to translate the assembly file to test.o.

Our plugin is altering the GCC compilation process. Instead of the assembly file we want to generate something else (or maybe no output files at all in case of a static analysis tool). Do you see the problem now? While our plugin is producing some other output, g++ assumes it will produce an assembly file which it will then try to pass to the assembler.

While we can try to invoke cc1plus directly, it is an internal program of GCC and is invoked by g++ with some additional options which we would rather not deal with. Instead, we can ask g++ to produce an assembly file by passing -S instead of -c. In this case g++ is not going to invoke the assembler and nobody will care that the output assembly file does not exist.

So this part is sorted out then. Well, not quite. While we terminate GCC quite early, before any assembly can actually be generated, the output assembly file is still created. To get rid of this file we need to add the following line in our plugin_init():

asm_file_name = HOST_BIT_BUCKET;

HOST_BIT_BUCKET is defined as "/dev/null". Here is the complete source code for the skeleton of our plugin:

// GCC header includes to get the parse tree
// declarations. The order is important and
// doesn't follow any kind of logic.
//
 
#include <stdlib.h>
#include <gmp.h>
 
#include <cstdlib> // Include before GCC poisons
                   // some declarations.
 
extern "C"
{
#include "gcc-plugin.h"
 
#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "tree.h"
#include "intl.h"
 
#include "tm.h"
 
#include "diagnostic.h"
#include "c-common.h"
#include "c-pragma.h"
#include "cp/cp-tree.h"
}
 
#include <iostream>
 
using namespace std;
 
int plugin_is_GPL_compatible;
 
extern "C" void
gate_callback (void*, void*)
{
  // If there were errors during compilation,
  // let GCC handle the exit.
  //
  if (errorcount || sorrycount)
    return;
 
  int r (0);
 
  //
  // Process AST. Issue diagnostics and set r
  // to 1 in case of an error.
  //
  cerr << "processing " << main_input_filename << endl;
 
  exit (r);
}
 
extern "C" int
plugin_init (plugin_name_args* info,
             plugin_gcc_version* ver)
{
  int r (0);
 
  cerr << "starting " << info->base_name << endl;
 
  //
  // Parse options if any.
  //
 
  // Disable assembly output.
  //
  asm_file_name = HOST_BIT_BUCKET;
 
  // Register callbacks.
  //
  register_callback (info->base_name,
                     PLUGIN_OVERRIDE_GATE,
                     &gate_callback,
                     0);
  return r;
}

You can compile and try it out like so:

$ g++-4.5 -I`g++-4.5 -print-file-name=plugin`/include \
-fPIC -shared plugin.cxx -o plugin.so
 
$ g++-4.5 -S -fplugin=./plugin.so test.cxx
starting plugin
processing test.cxx

Update: Starting with version 4.7.0, GCC can be built either in C or C++ mode. And starting with version 4.8.0, it is always built as C++. If you try to run the above example using GCC built in the C++ mode, you will get an error saying that the plugin cannot be loaded because one or more symbols are undefined. The reason for this error is that now all the GCC symbols have C++ linkage while we include them as extern "C". The solution to this problem is to remove the extern "C" { } block around the include directives at the beginning of our plugin source code (note that the plugin functions that come after the includes, gate_callback() and plugin_init(), should still remain extern "C").

Another option that you will probably want to add to the plugin invocation is -x c++. It tells GCC that what’s being compiled is C++ regardless of the file extension. This is useful if you plan to compile, for example, C++ header files (in this case and without this option, GCC will try to generate a precompiled header instead of an assembly file). Having to remember to specify the two options (-S -x c++) could be quite inconvenient for the users of our plugin.

The plugin can also have options of its own which are specified on the g++ command line in the following form:

-fplugin-arg-<plugin-name>-<key>[=<value>]

This is quite verbose and can also become a major inconvenience for the users of our plugin. To address the above two problems it makes sense to create a driver for our plugin, similar to how g++ is a driver for cc1plus. The driver will automatically pass the -S -x c++ -fplugin=./plugin.so options to g++ and convert plugin options to the -fplugin-arg- format before passing them to g++.

For my project I wrote a plugin driver that uses the following conventions. The driver recognizes the commonly used options such as -I, -D, etc., and passes them to g++ as is. Other g++ options can be passed with the driver’s -x option (for example, -x -m32). If an argument to -x does not start with ‘-’, then it is treated as the g++ executable name. Everything else is converted to the -fplugin-arg- format and passed as plugin options which are then handled in the plugin code with the help of cli. So if you execute:

driver -x g++-4.5 -x -m32 --foo bar test.cxx

Then the g++ command line will look like this:

g++-4.5 -m32 -S -x c++ -fplugin=./plugin.so \
-fplugin-arg-plugin-foo=bar test.cxx
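
To make these conventions more concrete, here is a minimal sketch of the translation step. It is not the actual driver from the project: the translate() function, the value_options table (which says which plugin options take a value), the default g++ executable name, and the rule that any remaining argument is an input file are all assumptions made for illustration. A real driver would get the option descriptions from its option parser.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

typedef std::vector<std::string> strings;

// Translate driver arguments (without argv[0]) into a g++ command line.
//
// value_options is a hypothetical table of plugin option names (without
// the leading "--") that take a value.
//
strings
translate (const strings& args,
           const std::set<std::string>& value_options,
           const std::string& plugin_name = "plugin")
{
  assert (!plugin_name.empty ());

  std::string gxx ("g++"); // Default compiler executable (assumption).
  strings pass, plugin, files;

  for (std::size_t i (0); i < args.size (); ++i)
  {
    const std::string& a (args[i]);
    bool more (i + 1 < args.size ());

    if (a == "-x" && more)
    {
      // -x <arg>: extra g++ option if it starts with '-', otherwise
      // the g++ executable name.
      //
      const std::string& v (args[++i]);

      if (!v.empty () && v[0] == '-')
        pass.push_back (v);
      else
        gxx = v;
    }
    else if (a.compare (0, 2, "-I") == 0 || a.compare (0, 2, "-D") == 0)
    {
      // Commonly used options are passed as is. Handle both the
      // -Idir and -I dir forms.
      //
      pass.push_back (a);

      if (a.size () == 2 && more)
        pass.push_back (args[++i]);
    }
    else if (a.size () > 2 && a.compare (0, 2, "--") == 0)
    {
      // --key [value] becomes -fplugin-arg-<name>-<key>[=<value>].
      //
      std::string k (a.substr (2));
      std::string o ("-fplugin-arg-" + plugin_name + "-" + k);

      if (value_options.count (k) != 0 && more)
        o += "=" + args[++i];

      plugin.push_back (o);
    }
    else
      files.push_back (a); // Treat everything else as an input file.
  }

  // Assemble: executable, pass-through options, fixed options,
  // plugin options, input files.
  //
  strings r;
  r.push_back (gxx);
  r.insert (r.end (), pass.begin (), pass.end ());
  r.push_back ("-S");
  r.push_back ("-x");
  r.push_back ("c++");
  r.push_back ("-fplugin=./" + plugin_name + ".so");
  r.insert (r.end (), plugin.begin (), plugin.end ());
  r.insert (r.end (), files.begin (), files.end ());
  return r;
}
```

With foo listed in value_options, the arguments -x g++-4.5 -x -m32 --foo bar test.cxx translate to the g++ command line shown above. A real driver would then run the resulting command, for example with execvp() on POSIX systems.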

And that’s it for today. Remember to drop a line in the comments if you would like to read more about parsing C++ with GCC plugins.

12 Responses to “Parsing C++ with GCC plugins, Part 1”

  1. yichi Says:

    Cool!

  2. Kim Gräsman Says:

    I’d be interested in more detail on this, I’ve been meaning to build some static analysis tools, and leaning on GCC seems like a good way of making headway faster.

    Thanks!

  3. Dmitriy V'jukov Says:

    Hi,

    Thank you for the info. I would like to read more about plugins.
    Btw, is it possible to modify AST with plugins? What I would like to do is to add some arguments to some function calls. And it is possible to get the functionality of __FILE__, __LINE__, __FUNCTION__? Currently I have:

    #define INFO debug_info_t(__FILE__, __LINE__, __FUNCTION__)
    void foo(…, debug_info_t info);

    foo(…, INFO);

    I would like to eliminate those INFO parameters.

  4. Dmitriy V'jukov Says:

    Ah, small world! It’s accidentally happened so that it’s me who submitted XERCESC-1919. Thanks for the fix too ;)

  5. Yosh Says:

    Nice & concise introduction - thanks.

  6. Sebastien Binet Says:

    yes ! keep them coming.

    the gcc plugins area sorely lacks a getting-started or a how-to from first principles document.

    cheers,
    sebastien.

  7. Boris Kolpackov Says:

    Dmitriy,

    Yes, it is definitely possible to modify the tree. The GCC test suite contains a couple of basic examples of this:

    http://gcc.gnu.org/viewcvs/trunk/gcc/testsuite/gcc.dg/plugin/

    The finish_unit_plugin.c shows how to create a function and start_unit_plugin.c shows how to create a global variable. But for more serious modifications the GCC Internals documentation and the GCC source code are probably your best bet.

    Boris

  8. yoco Says:

    I am very interested in it! I always hope to create a refactoring tool for C++. But the parsing work is really killing me T_T

  9. Philip Craig Says:

    Hi please keep the series going, there is room for better documentation on analysing the AST with g++ plugins.

  10. Jonas Bülow Says:

    Nice introduction! I look forward to more on this topic from you.

  11. Philip Sajdak Says:

    Great post! It would be nice to read more about it.

  12. Chad Colgur Says:

    Thanks for the post. So right about the state of plug-in API documentation. Still, an incredibly useful addition to GCC.

    If you’re taking requests for this series of blog posts then static analysis would be of interest to me. My interest is program slicing, particularly amorphous slices (http://doi.ieeecomputersociety.org/10.1109/WPC.1997.601266).