
[Repost] Using libclang to Parse C++ (aka libclang 101)


In this post I’ll provide a quick tutorial for using libclang. I started playing around with libclang while implementing Reflang – an open source reflection framework for C++. Then I came to appreciate the amazing work done by its developers.

Please note that we will start with a minimal program and gradually add code to it. Scroll to the end of the post to view the complete solution.

libclang?

Clang, if you haven’t heard of it yet, is a wonderful C++ (and other C language family) compiler. Well, not exactly a compiler, but a front-end to the LLVM compiler.

You see, compilers have a very tough problem to solve, and so most of them split it into two easier problems:

Translating a programming language (C++ in our case) to some intermediate code. This is called the front-end, and is exactly what Clang does.
Translating the above intermediate code to machine code. This is called the back-end. Clang uses LLVM for that.

The neat thing about Clang is that it is also designed to be used as a library. There are many types of applications that must truly understand code: IDEs, documentation generators, static-analysis tools, etc. Instead of each of them having to implement C++ parsing (which is an extremely difficult task!), libclang can be used to correctly handle all language features and edge cases.

libclang!

And it’s so darn easy. Really. Those Clang folks did some awesome work. In the rest of this post we will use its C API to explore the following code:

[code language=”plain”]// header.hpp

class MyClass
{
public:
    int field;
    virtual void method() const = 0;

    static const int static_field;
    static int static_method();
};

[/code]

Basic example

Let’s look at the simplest of examples. The following program parses the above file and immediately exits:

[code language=”plain”]
#include <cstdlib>         // for exit()
#include <iostream>
#include <clang-c/Index.h> // This is libclang.
using namespace std;

int main()
{
    CXIndex index = clang_createIndex(0, 0);
    CXTranslationUnit unit = clang_parseTranslationUnit(
        index,
        "header.hpp", nullptr, 0,
        nullptr, 0,
        CXTranslationUnit_None);
    if (unit == nullptr)
    {
        cerr << "Unable to parse translation unit. Quitting." << endl;
        exit(-1);
    }

    // Release everything libclang allocated for us.
    clang_disposeTranslationUnit(unit);
    clang_disposeIndex(index);
}

[/code]

There are many 0s and nullptrs – these allow us to do some more advanced stuff (like pass argv & argc, use in-memory files, etc). Let’s not get into these.

So what do we have after clang_parseTranslationUnit() has finished successfully? We have a parsed Abstract Syntax Tree (AST) which we can traverse and inspect. Which is exactly what we’ll do.

Cursors

Pointers to the AST are called Cursors in libclang lingo. A Cursor can have a parent and children. It can also have related cursors (like a default value for a parameter, an explicit value to an enum entry, etc).

The ‘entry point’ cursor we will use is the cursor representing the Translation Unit (TU), which is a C++ term meaning a single file including all #included code. To get the TU’s cursor we will use the very descriptive clang_getTranslationUnitCursor(). Now that we have a cursor we can investigate it or iterate using it.

Visit children

Any cursor has a kind, which represents the essence of the cursor. Kind can be one of many, many options, all listed in the CXCursorKind enum in Index.h. A few examples are:

[code language=”plain”] /** \brief A C or C++ struct. */
CXCursor_StructDecl = 2,
/** \brief A C or C++ union. */
CXCursor_UnionDecl = 3,
/** \brief A C++ class. */
CXCursor_ClassDecl = 4,
/** \brief An enumeration. */
CXCursor_EnumDecl = 5,

[/code]

We can get the kind from a cursor using clang_getCursorKind().

For now, let’s visit all children of the TU:

[code language=”plain”] CXCursor cursor = clang_getTranslationUnitCursor(unit);
clang_visitChildren(
    cursor,
    [](CXCursor c, CXCursor parent, CXClientData client_data)
    {
        cout << "Cursor kind: " << clang_getCursorKind(c) << endl;
        return CXChildVisit_Recurse;
    },
    nullptr);

[/code]

The lambda passed as the second argument is called for every cursor visited. Inside we always return CXChildVisit_Recurse (although other options exist), because we want to explore everything in our file. The third argument (nullptr here) is handed to the lambda as client_data.

Output:

Cursor kind: 4
Cursor kind: 39
Cursor kind: 6
Cursor kind: 21
Cursor kind: 9
Cursor kind: 21

That’s a bit cryptic, and requires us to skip back and forth to Index.h. Fortunately, there’s a built-in function to convert a cursor kind to a string, but first we need to discuss libclang’s strings.

CXString

CXString is libclang’s own string type, returned by most functions that produce text. To retrieve an actually useful string (const char * for example), one must call clang_getCString(), and then clang_disposeString() when done, so that libclang can release the underlying memory.

Since we’re going to do this a lot, let’s create a helper function:

[code language=”plain”]ostream& operator<<(ostream& stream, const CXString& str)
{
    stream << clang_getCString(str);
    clang_disposeString(str);
    return stream;
}

[/code]

Print meaningful output

Now that we can extract strings, let’s modify our lambda to print something that is actually useful:

[code language=”plain”] CXCursor cursor = clang_getTranslationUnitCursor(unit);
clang_visitChildren(
    cursor,
    [](CXCursor c, CXCursor parent, CXClientData client_data)
    {
        cout << "Cursor '" << clang_getCursorSpelling(c) << "' of kind '"
             << clang_getCursorKindSpelling(clang_getCursorKind(c)) << "'\n";
        return CXChildVisit_Recurse;
    },
    nullptr);

[/code]

Output:

Cursor 'MyClass' of kind 'ClassDecl'
Cursor '' of kind 'CXXAccessSpecifier'
Cursor 'field' of kind 'FieldDecl'
Cursor 'method' of kind 'CXXMethod'
Cursor 'static_field' of kind 'VarDecl'
Cursor 'static_method' of kind 'CXXMethod'

Now, that’s friggin neat.

A more complicated example

I was very careful not to #include any header in header.hpp. Why? Well, merely adding an #include to header.hpp grows the output to 1.51MB. Ever got pissed at the compiler for taking so long? That’s why. It’s very educational to read such a file, but for everyone’s sake I won’t post it here.

Instead, let’s parse the following file:

[code language=”plain”]enum class Cpp11Enum
{
    RED = 10,
    BLUE = 20
};

struct Wowza
{
    virtual ~Wowza() = default;
    virtual void foo(int i = 0) = 0;
};

struct Badabang : Wowza
{
    void foo(int) override;

    bool operator==(const Badabang& o) const;
};

template <typename T>
void bar(T&& t);

[/code]

The same program’s output for this file:

Cursor 'Cpp11Enum' of kind 'EnumDecl'
Cursor 'RED' of kind 'EnumConstantDecl'
Cursor '' of kind 'IntegerLiteral'
Cursor 'BLUE' of kind 'EnumConstantDecl'
Cursor '' of kind 'IntegerLiteral'
Cursor 'Wowza' of kind 'StructDecl'
Cursor '~Wowza' of kind 'CXXDestructor'
Cursor 'foo' of kind 'CXXMethod'
Cursor 'i' of kind 'ParmDecl'
Cursor '' of kind 'IntegerLiteral'
Cursor 'Badabang' of kind 'StructDecl'
Cursor 'struct Wowza' of kind 'C++ base class specifier'
Cursor 'struct Wowza' of kind 'TypeRef'
Cursor 'foo' of kind 'CXXMethod'
Cursor '' of kind 'attribute(override)'
Cursor '' of kind 'ParmDecl'
Cursor 'operator==' of kind 'CXXMethod'
Cursor 'o' of kind 'ParmDecl'
Cursor 'struct Badabang' of kind 'TypeRef'
Cursor 'bar' of kind 'FunctionTemplate'
Cursor 'T' of kind 'TemplateTypeParameter'
Cursor 't' of kind 'ParmDecl'
Cursor 'T' of kind 'TypeRef'

Conclusion

libclang is awesome:

It allows checking whether code has been expanded from a macro, and jumping to that macro;
It allows checking the location (file+line+column) of each cursor;
It allows getting a function’s parameter names, types and return type;
It understands templates, auto, lambdas, and, well, everything in C++.
I hope this short post made you curious, and that you’ll also try exploring what this amazing API provides. Please do write a comment below if you have anything you want to add or ask!

Complete Code

For your convenience, here’s the complete code we implemented today:

[code language=”plain”]#include <cstdlib>
#include <iostream>
#include <clang-c/Index.h>
using namespace std;

// Streams a CXString and releases it afterwards.
ostream& operator<<(ostream& stream, const CXString& str)
{
    stream << clang_getCString(str);
    clang_disposeString(str);
    return stream;
}

int main()
{
    CXIndex index = clang_createIndex(0, 0);
    CXTranslationUnit unit = clang_parseTranslationUnit(
        index,
        "header.hpp", nullptr, 0,
        nullptr, 0,
        CXTranslationUnit_None);
    if (unit == nullptr)
    {
        cerr << "Unable to parse translation unit. Quitting." << endl;
        exit(-1);
    }

    // Print the spelling and kind of every cursor in the file.
    CXCursor cursor = clang_getTranslationUnitCursor(unit);
    clang_visitChildren(
        cursor,
        [](CXCursor c, CXCursor parent, CXClientData client_data)
        {
            cout << "Cursor '" << clang_getCursorSpelling(c) << "' of kind '"
                 << clang_getCursorKindSpelling(clang_getCursorKind(c)) << "'\n";
            return CXChildVisit_Recurse;
        },
        nullptr);

    clang_disposeTranslationUnit(unit);
    clang_disposeIndex(index);
}

[/code]
