llvm-py: Writing Compilers Using Python
                    PyCon India 2010


                    Mahadevan R
                 mdevan@mdevan.org



                  September 25, 2010




Mahadevan R    llvm-py: Writing Compilers Using Python   September 25, 2010   1 / 14
About This Talk

Outline
  Compilers overview, some terminology
      LLVM about the “Low Level Virtual Machine”
    llvm-py Python bindings for LLVM


About me:
    Open source developer
    Day job is as a software architect (medical imaging)
    Started my career in 1999; mostly C, C++, a little Python!
    ..also the author of llvm-py.


      Mahadevan R        llvm-py: Writing Compilers Using Python   September 25, 2010   2 / 14
Compilers    What is a Compiler?


What is a Compiler?


   What is a compiler?




     Mahadevan R         llvm-py: Writing Compilers Using Python    September 25, 2010   3 / 14
Compilers    What is a Compiler?


What is a Compiler?


   What is a compiler?
        Something that transforms “source code” into “something else”




     Mahadevan R         llvm-py: Writing Compilers Using Python    September 25, 2010   3 / 14
Compilers    What is a Compiler?


What is a Compiler?


   What is a compiler?
        Something that transforms “source code” into “something else”
   What is “source code”?




     Mahadevan R         llvm-py: Writing Compilers Using Python    September 25, 2010   3 / 14
Compilers    What is a Compiler?


What is a Compiler?


   What is a compiler?
        Something that transforms “source code” into “something else”
   What is “source code”?
        A well-structured, textual representation of a program




     Mahadevan R         llvm-py: Writing Compilers Using Python    September 25, 2010   3 / 14
Compilers    What is a Compiler?


What is a Compiler?


   What is a compiler?
        Something that transforms “source code” into “something else”
   What is “source code”?
        A well-structured, textual representation of a program
        Not: preprocessors, assemblers




     Mahadevan R         llvm-py: Writing Compilers Using Python    September 25, 2010   3 / 14
Compilers    What is a Compiler?


What is a Compiler?


   What is a compiler?
        Something that transforms “source code” into “something else”
   What is “source code”?
        A well-structured, textual representation of a program
        Not: preprocessors, assemblers
   What is “something else?”




     Mahadevan R         llvm-py: Writing Compilers Using Python    September 25, 2010   3 / 14
Compilers    What is a Compiler?


What is a Compiler?


   What is a compiler?
        Something that transforms “source code” into “something else”
   What is “source code”?
        A well-structured, textual representation of a program
        Not: preprocessors, assemblers
   What is “something else?”
        Assembly language




     Mahadevan R         llvm-py: Writing Compilers Using Python    September 25, 2010   3 / 14
Compilers    What is a Compiler?


What is a Compiler?


   What is a compiler?
        Something that transforms “source code” into “something else”
   What is “source code”?
        A well-structured, textual representation of a program
        Not: preprocessors, assemblers
   What is “something else?”
        Assembly language
        Another well-structured textual representation (e.g. cfront)




     Mahadevan R         llvm-py: Writing Compilers Using Python    September 25, 2010   3 / 14
Compilers    What is a Compiler?


What is a Compiler?


   What is a compiler?
        Something that transforms “source code” into “something else”
   What is “source code”?
        A well-structured, textual representation of a program
        Not: preprocessors, assemblers
   What is “something else?”
        Assembly language
        Another well-structured textual representation (e.g. cfront)
        Intermediate, binary representation (AOT compilers)




     Mahadevan R         llvm-py: Writing Compilers Using Python    September 25, 2010   3 / 14
Compilers    What is a Compiler?


What is a Compiler?


   What is a compiler?
        Something that transforms “source code” into “something else”
   What is “source code”?
        A well-structured, textual representation of a program
        Not: preprocessors, assemblers
   What is “something else?”
        Assembly language
        Another well-structured textual representation (e.g. cfront)
        Intermediate, binary representation (AOT compilers)
        In-memory executable code (JIT compilers)




     Mahadevan R         llvm-py: Writing Compilers Using Python    September 25, 2010   3 / 14
Compilers    What is a Compiler?


What is a Compiler?


   What is a compiler?
        Something that transforms “source code” into “something else”
   What is “source code”?
        A well-structured, textual representation of a program
        Not: preprocessors, assemblers
   What is “something else?”
        Assembly language
        Another well-structured textual representation (e.g. cfront)
        Intermediate, binary representation (AOT compilers)
        In-memory executable code (JIT compilers)
        Executable images




     Mahadevan R         llvm-py: Writing Compilers Using Python    September 25, 2010   3 / 14
Compilers    Inside a Compiler


Inside a Compiler


What does the compiler do with the source code?
    Lexical analysis produces tokens




      Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   4 / 14
Compilers    Inside a Compiler


Inside a Compiler


What does the compiler do with the source code?
    Lexical analysis produces tokens
    Parser eats tokens, produces AST




      Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   4 / 14
Compilers    Inside a Compiler


Inside a Compiler


What does the compiler do with the source code?
    Lexical analysis produces tokens
    Parser eats tokens, produces AST
    What is an AST? And what is abstract about it?




      Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   4 / 14
Compilers    Inside a Compiler


Inside a Compiler


What does the compiler do with the source code?
    Lexical analysis produces tokens
    Parser eats tokens, produces AST
    What is an AST? And what is abstract about it?
    What next after an AST?




      Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   4 / 14
Compilers    Inside a Compiler


Inside a Compiler


What does the compiler do with the source code?
    Lexical analysis produces tokens
    Parser eats tokens, produces AST
    What is an AST? And what is abstract about it?
    What next after an AST?
    What is an intermediate form (IR)?




      Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   4 / 14
Compilers    Inside a Compiler


Inside a Compiler


What does the compiler do with the source code?
    Lexical analysis produces tokens
    Parser eats tokens, produces AST
    What is an AST? And what is abstract about it?
    What next after an AST?
    What is an intermediate form (IR)?
    example: IRs in gcc: source → GENERIC → GIMPLE → RTL →
    backend




      Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   4 / 14
Compilers    Inside a Compiler


Inside a Compiler


What does the compiler do with the source code?
    Lexical analysis produces tokens
    Parser eats tokens, produces AST
    What is an AST? And what is abstract about it?
    What next after an AST?
    What is an intermediate form (IR)?
    example: IRs in gcc: source → GENERIC → GIMPLE → RTL →
    backend
    What is Static Single Assignment (SSA) form?




      Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   4 / 14
Compilers    Passes, Optimization and Code Generation


Passes, Optimization and Code Generation




   What is a “pass”?




     Mahadevan R       llvm-py: Writing Compilers Using Python            September 25, 2010   5 / 14
Compilers    Passes, Optimization and Code Generation


Passes, Optimization and Code Generation




   What is a “pass”?
   What is a “optimization”?




     Mahadevan R       llvm-py: Writing Compilers Using Python            September 25, 2010   5 / 14
Compilers    Passes, Optimization and Code Generation


Passes, Optimization and Code Generation




   What is a “pass”?
   What is a “optimization”?
   What is a “code generator”?




     Mahadevan R       llvm-py: Writing Compilers Using Python            September 25, 2010   5 / 14
Compilers    Passes, Optimization and Code Generation


Passes, Optimization and Code Generation




   What is a “pass”?
   What is a “optimization”?
   What is a “code generator”?
   What is a “code generator generator”?




     Mahadevan R       llvm-py: Writing Compilers Using Python            September 25, 2010   5 / 14
Compilers    The Frontend and The Backend


The Frontend and The Backend


   Everything before the first IR is usually called the frontend




     Mahadevan R        llvm-py: Writing Compilers Using Python             September 25, 2010   6 / 14
Compilers    The Frontend and The Backend


The Frontend and The Backend


   Everything before the first IR is usually called the frontend
   And everything after as the backend




     Mahadevan R        llvm-py: Writing Compilers Using Python             September 25, 2010   6 / 14
Compilers    The Frontend and The Backend


The Frontend and The Backend


   Everything before the first IR is usually called the frontend
   And everything after as the backend
   LLVM and llvm-py deal only with backends




     Mahadevan R        llvm-py: Writing Compilers Using Python             September 25, 2010   6 / 14
Compilers    The Frontend and The Backend


The Frontend and The Backend


   Everything before the first IR is usually called the frontend
   And everything after as the backend
   LLVM and llvm-py deal only with backends
   Frontends tend to be simpler




     Mahadevan R        llvm-py: Writing Compilers Using Python             September 25, 2010   6 / 14
Compilers    The Frontend and The Backend


The Frontend and The Backend


   Everything before the first IR is usually called the frontend
   And everything after as the backend
   LLVM and llvm-py deal only with backends
   Frontends tend to be simpler
   Many frontends for a backend is common (Scala, Groovy, Clojure:
   Java; C#, IronPython, etc: .NET)




     Mahadevan R        llvm-py: Writing Compilers Using Python             September 25, 2010   6 / 14
Compilers    The Frontend and The Backend


The Frontend and The Backend


   Everything before the first IR is usually called the frontend
   And everything after as the backend
   LLVM and llvm-py deal only with backends
   Frontends tend to be simpler
   Many frontends for a backend is common (Scala, Groovy, Clojure:
   Java; C#, IronPython, etc: .NET)
   To build a front end in Python:




     Mahadevan R        llvm-py: Writing Compilers Using Python             September 25, 2010   6 / 14
Compilers    The Frontend and The Backend


The Frontend and The Backend


   Everything before the first IR is usually called the frontend
   And everything after as the backend
   LLVM and llvm-py deal only with backends
   Frontends tend to be simpler
   Many frontends for a backend is common (Scala, Groovy, Clojure:
   Java; C#, IronPython, etc: .NET)
   To build a front end in Python:
        hand-coded lexer, (rec-desc) parser




     Mahadevan R         llvm-py: Writing Compilers Using Python             September 25, 2010   6 / 14
Compilers    The Frontend and The Backend


The Frontend and The Backend


   Everything before the first IR is usually called the frontend
   And everything after as the backend
   LLVM and llvm-py deal only with backends
   Frontends tend to be simpler
   Many frontends for a backend is common (Scala, Groovy, Clojure:
   Java; C#, IronPython, etc: .NET)
   To build a front end in Python:
        hand-coded lexer, (rec-desc) parser
        Antlr (generates Python)




     Mahadevan R         llvm-py: Writing Compilers Using Python             September 25, 2010   6 / 14
Compilers    The Frontend and The Backend


The Frontend and The Backend


   Everything before the first IR is usually called the frontend
   And everything after as the backend
   LLVM and llvm-py deal only with backends
   Frontends tend to be simpler
   Many frontends for a backend is common (Scala, Groovy, Clojure:
   Java; C#, IronPython, etc: .NET)
   To build a front end in Python:
        hand-coded lexer, (rec-desc) parser
        Antlr (generates Python)
        Yacc, Bison, Lemon (generates C, wrap to Python)




     Mahadevan R        llvm-py: Writing Compilers Using Python             September 25, 2010   6 / 14
Compilers    The Frontend and The Backend


The Frontend and The Backend


   Everything before the first IR is usually called the frontend
   And everything after as the backend
   LLVM and llvm-py deal only with backends
   Frontends tend to be simpler
   Many frontends for a backend is common (Scala, Groovy, Clojure:
   Java; C#, IronPython, etc: .NET)
   To build a front end in Python:
        hand-coded lexer, (rec-desc) parser
        Antlr (generates Python)
        Yacc, Bison, Lemon (generates C, wrap to Python)
        Sphinx (C++, wrap to Python)




     Mahadevan R        llvm-py: Writing Compilers Using Python             September 25, 2010   6 / 14
Compilers    The Frontend and The Backend


The Frontend and The Backend


   Everything before the first IR is usually called the frontend
   And everything after as the backend
   LLVM and llvm-py deal only with backends
   Frontends tend to be simpler
   Many frontends for a backend is common (Scala, Groovy, Clojure:
   Java; C#, IronPython, etc: .NET)
   To build a front end in Python:
        hand-coded lexer, (rec-desc) parser
        Antlr (generates Python)
        Yacc, Bison, Lemon (generates C, wrap to Python)
        Sphinx (C++, wrap to Python)
        various Python parser generators



     Mahadevan R        llvm-py: Writing Compilers Using Python             September 25, 2010   6 / 14
LLVM      What is LLVM?


What is LLVM?



   Primarily a set of libraries..




     Mahadevan R         llvm-py: Writing Compilers Using Python   September 25, 2010   7 / 14
LLVM      What is LLVM?


What is LLVM?



   Primarily a set of libraries..
   ..with which you can make a compiler backend..




     Mahadevan R         llvm-py: Writing Compilers Using Python   September 25, 2010   7 / 14
LLVM      What is LLVM?


What is LLVM?



   Primarily a set of libraries..
   ..with which you can make a compiler backend..
   ..or a VM with JIT support (but hard to get this right)




     Mahadevan R         llvm-py: Writing Compilers Using Python   September 25, 2010   7 / 14
LLVM      What is LLVM?


What is LLVM?



   Primarily a set of libraries..
   ..with which you can make a compiler backend..
   ..or a VM with JIT support (but hard to get this right)
   It provides:




     Mahadevan R         llvm-py: Writing Compilers Using Python   September 25, 2010   7 / 14
LLVM      What is LLVM?


What is LLVM?



   Primarily a set of libraries..
   ..with which you can make a compiler backend..
   ..or a VM with JIT support (but hard to get this right)
   It provides:
        IR data structure, with text and binary representations




     Mahadevan R         llvm-py: Writing Compilers Using Python   September 25, 2010   7 / 14
LLVM      What is LLVM?


What is LLVM?



   Primarily a set of libraries..
   ..with which you can make a compiler backend..
   ..or a VM with JIT support (but hard to get this right)
   It provides:
        IR data structure, with text and binary representations
        wide range of optimizations




     Mahadevan R         llvm-py: Writing Compilers Using Python   September 25, 2010   7 / 14
LLVM      What is LLVM?


What is LLVM?



   Primarily a set of libraries..
   ..with which you can make a compiler backend..
   ..or a VM with JIT support (but hard to get this right)
   It provides:
        IR data structure, with text and binary representations
        wide range of optimizations
        code generator (partly description-based) for many platforms




     Mahadevan R         llvm-py: Writing Compilers Using Python   September 25, 2010   7 / 14
LLVM      What is LLVM?


What is LLVM?



   Primarily a set of libraries..
   ..with which you can make a compiler backend..
   ..or a VM with JIT support (but hard to get this right)
   It provides:
        IR data structure, with text and binary representations
        wide range of optimizations
        code generator (partly description-based) for many platforms
        JIT-compile-execute from IR




     Mahadevan R         llvm-py: Writing Compilers Using Python   September 25, 2010   7 / 14
LLVM      What is LLVM?


What is LLVM?



   Primarily a set of libraries..
   ..with which you can make a compiler backend..
   ..or a VM with JIT support (but hard to get this right)
   It provides:
        IR data structure, with text and binary representations
        wide range of optimizations
        code generator (partly description-based) for many platforms
        JIT-compile-execute from IR
        link-time optimization (LTO)




     Mahadevan R         llvm-py: Writing Compilers Using Python   September 25, 2010   7 / 14
LLVM      What is LLVM?


What is LLVM?



   Primarily a set of libraries..
   ..with which you can make a compiler backend..
   ..or a VM with JIT support (but hard to get this right)
   It provides:
        IR data structure, with text and binary representations
        wide range of optimizations
        code generator (partly description-based) for many platforms
        JIT-compile-execute from IR
        link-time optimization (LTO)
        support for accurate garbage collection




     Mahadevan R         llvm-py: Writing Compilers Using Python   September 25, 2010   7 / 14
LLVM      LLVM Highlights


LLVM Highlights



   Written in readable C++




     Mahadevan R     llvm-py: Writing Compilers Using Python   September 25, 2010   8 / 14
LLVM      LLVM Highlights


LLVM Highlights



   Written in readable C++
   Reasonable documentation, helpful and mature community




     Mahadevan R     llvm-py: Writing Compilers Using Python   September 25, 2010   8 / 14
LLVM      LLVM Highlights


LLVM Highlights



   Written in readable C++
   Reasonable documentation, helpful and mature community
   clang is a now-famous subproject of LLVM




     Mahadevan R      llvm-py: Writing Compilers Using Python   September 25, 2010   8 / 14
LLVM      LLVM Highlights


LLVM Highlights



   Written in readable C++
   Reasonable documentation, helpful and mature community
   clang is a now-famous subproject of LLVM
   llvm-gcc is gcc frontend (C, C++, Java, Ada, Fortran, ObjC) +
   LLVM backend




     Mahadevan R      llvm-py: Writing Compilers Using Python   September 25, 2010   8 / 14
LLVM      LLVM Highlights


LLVM Highlights



   Written in readable C++
   Reasonable documentation, helpful and mature community
   clang is a now-famous subproject of LLVM
   llvm-gcc is gcc frontend (C, C++, Java, Ada, Fortran, ObjC) +
   LLVM backend
   Many users: Unladen Swallow, Iced Tea, Rubinus, llvm-lua, GHC,
   LDC




     Mahadevan R      llvm-py: Writing Compilers Using Python   September 25, 2010   8 / 14
LLVM      The LLVM IR


The LLVM IR

The IR looks like this (textual representation):

“Hello world” in LLVM Assembly
@msg = private constant [15 x i8 ] c " Hello , world !0 A 00"

declare i32 @puts ( i8 *)

define i32 @main ( i32 % argc , i8 ** % argv ) {
entry :
  %0 = getelementptr [15 x i8 ]* @msg , i64 0 , i64 0
  %1 = tail call i32 @puts ( i8 * %0)
  ret i32 undef
}



       Mahadevan R        llvm-py: Writing Compilers Using Python   September 25, 2010   9 / 14
LLVM      A Word About clang


A Word About clang



   A sub-project of the LLVM team




    Mahadevan R      llvm-py: Writing Compilers Using Python   September 25, 2010   10 / 14
LLVM      A Word About clang


A Word About clang



   A sub-project of the LLVM team
   clang is a C, Objective C and C++ frontend




    Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   10 / 14
LLVM      A Word About clang


A Word About clang



   A sub-project of the LLVM team
   clang is a C, Objective C and C++ frontend
   works with the LLVM backend (obviously!)




    Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   10 / 14
LLVM      A Word About clang


A Word About clang



   A sub-project of the LLVM team
   clang is a C, Objective C and C++ frontend
   works with the LLVM backend (obviously!)

   It can convert C/ObjC/C++ code to LLVM IR




    Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   10 / 14
LLVM      A Word About clang


A Word About clang



   A sub-project of the LLVM team
   clang is a C, Objective C and C++ frontend
   works with the LLVM backend (obviously!)

   It can convert C/ObjC/C++ code to LLVM IR
   which can be played around with using llvm-py




    Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   10 / 14
LLVM      A Word About clang


A Word About clang



   A sub-project of the LLVM team
   clang is a C, Objective C and C++ frontend
   works with the LLVM backend (obviously!)

   It can convert C/ObjC/C++ code to LLVM IR
   which can be played around with using llvm-py
   and then compiled “normally”




    Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   10 / 14
llvm-py    What is llvm-py?


What is llvm-py?




   Python bindings for LLVM




     Mahadevan R     llvm-py: Writing Compilers Using Python   September 25, 2010   11 / 14
llvm-py    What is llvm-py?


What is llvm-py?




   Python bindings for LLVM
   Including JIT




     Mahadevan R     llvm-py: Writing Compilers Using Python   September 25, 2010   11 / 14
llvm-py    What is llvm-py?


What is llvm-py?




   Python bindings for LLVM
   Including JIT
   Excellent for experimenting and prototyping




     Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   11 / 14
llvm-py    What is llvm-py?


What is llvm-py?




   Python bindings for LLVM
   Including JIT
   Excellent for experimenting and prototyping
   Available in Ubuntu, Debian, MacPorts (but may be out of date)




     Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   11 / 14
llvm-py    What is llvm-py?


What is llvm-py?




   Python bindings for LLVM
   Including JIT
   Excellent for experimenting and prototyping
   Available in Ubuntu, Debian, MacPorts (but may be out of date)
   Current version (0.6) works with LLVM 2.7




     Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   11 / 14
llvm-py    More on llvm-py


More on llvm-py




   Covers most of the LLVM APIs, esp. IR




     Mahadevan R      llvm-py: Writing Compilers Using Python   September 25, 2010   12 / 14
llvm-py    More on llvm-py


More on llvm-py




   Covers most of the LLVM APIs, esp. IR
   See examples to get started




     Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   12 / 14
llvm-py    More on llvm-py


More on llvm-py




   Covers most of the LLVM APIs, esp. IR
   See examples to get started
   User guide at http://www.mdevan.org/userguide.html




     Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   12 / 14
llvm-py    More on llvm-py


More on llvm-py




   Covers most of the LLVM APIs, esp. IR
   See examples to get started
   User guide at http://www.mdevan.org/userguide.html
   Some LLVM knowledge required: Modules, IR instructions, ...




     Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   12 / 14
llvm-py    More on llvm-py


More on llvm-py




   Covers most of the LLVM APIs, esp. IR
   See examples to get started
   User guide at http://www.mdevan.org/userguide.html
   Some LLVM knowledge required: Modules, IR instructions, ...
   Learning curve of LLVM is steep, llvm-py/Python helps!




     Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   12 / 14
llvm-py    More on llvm-py


More on llvm-py




   Covers most of the LLVM APIs, esp. IR
   See examples to get started
   User guide at http://www.mdevan.org/userguide.html
   Some LLVM knowledge required: Modules, IR instructions, ...
   Learning curve of LLVM is steep, llvm-py/Python helps!
   Users group: http://groups.google.com/group/llvm-py




     Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   12 / 14
llvm-py    llvm-py Internals


llvm-py Internals




    Wraps LLVM’s C API as a Python extension module




     Mahadevan R      llvm-py: Writing Compilers Using Python   September 25, 2010   13 / 14
llvm-py    llvm-py Internals


llvm-py Internals




    Wraps LLVM’s C API as a Python extension module
    Extends LLVM C API as it is incomplete




     Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   13 / 14
llvm-py    llvm-py Internals


llvm-py Internals




    Wraps LLVM’s C API as a Python extension module
    Extends LLVM C API as it is incomplete
    Does not use binding generators (Boost.Python, swig etc)




      Mahadevan R      llvm-py: Writing Compilers Using Python   September 25, 2010   13 / 14
llvm-py    llvm-py Internals


llvm-py Internals




    Wraps LLVM’s C API as a Python extension module
    Extends LLVM C API as it is incomplete
    Does not use binding generators (Boost.Python, swig etc)
    Public APIs are in Python, use extension module internally




      Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   13 / 14
llvm-py    llvm-py Internals


llvm-py Internals




    Wraps LLVM’s C API as a Python extension module
    Extends LLVM C API as it is incomplete
    Does not use binding generators (Boost.Python, swig etc)
    Public APIs are in Python, use extension module internally
    Mostly straightforward Python, a little metaclass magic




      Mahadevan R       llvm-py: Writing Compilers Using Python   September 25, 2010   13 / 14
Thanks     References and Thanks


Thanks!




                       Thanks for listening!

   Must read: Steven S. Muchnick. Advanced Compiler Design &
   Implementation.
   Me: Mahadevan R, mdevan@mdevan.org, http://www.mdevan.org/




    Mahadevan R      llvm-py: Writing Compilers Using Python      September 25, 2010   14 / 14

llvm-py: Writing Compilers In Python

  • 1.
    llvm-py: Writing CompilersUsing Python PyCon India 2010 Mahadevan R mdevan@mdevan.org September 25, 2010 Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 1 / 14
  • 2.
    About This Talk Outline Compilers overview, some terminology LLVM about the “Low Level Virtual Machine” llvm-py Python bindings for LLVM About me: Open source developer Day job is as a software architect (medical imaging) Started my career in 1999; mostly C, C++, a little Python! ..also the author of llvm-py. Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 2 / 14
  • 3.
    Compilers What is a Compiler? What is a Compiler? What is a compiler? Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
  • 4.
    Compilers What is a Compiler? What is a Compiler? What is a compiler? Something that transforms “source code” into “something else” Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
  • 5.
    Compilers What is a Compiler? What is a Compiler? What is a compiler? Something that transforms “source code” into “something else” What is “source code”? Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
  • 6.
    Compilers What is a Compiler? What is a Compiler? What is a compiler? Something that transforms “source code” into “something else” What is “source code”? A well-structured, textual representation of a program Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
  • 7.
    Compilers What is a Compiler? What is a Compiler? What is a compiler? Something that transforms “source code” into “something else” What is “source code”? A well-structured, textual representation of a program Not: preprocessors, assemblers Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
  • 8.
    Compilers What is a Compiler? What is a Compiler? What is a compiler? Something that transforms “source code” into “something else” What is “source code”? A well-structured, textual representation of a program Not: preprocessors, assemblers What is “something else?” Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
  • 9.
    Compilers What is a Compiler? What is a Compiler? What is a compiler? Something that transforms “source code” into “something else” What is “source code”? A well-structured, textual representation of a program Not: preprocessors, assemblers What is “something else?” Assembly language Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
  • 10.
    Compilers What is a Compiler? What is a Compiler? What is a compiler? Something that transforms “source code” into “something else” What is “source code”? A well-structured, textual representation of a program Not: preprocessors, assemblers What is “something else?” Assembly language Another well-structured textual representation (e.g. cfront) Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
  • 11.
    Compilers What is a Compiler? What is a Compiler? What is a compiler? Something that transforms “source code” into “something else” What is “source code”? A well-structured, textual representation of a program Not: preprocessors, assemblers What is “something else?” Assembly language Another well-structured textual representation (e.g. cfront) Intermediate, binary representation (AOT compilers) Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
  • 12.
    Compilers What is a Compiler? What is a Compiler? What is a compiler? Something that transforms “source code” into “something else” What is “source code”? A well-structured, textual representation of a program Not: preprocessors, assemblers What is “something else?” Assembly language Another well-structured textual representation (e.g. cfront) Intermediate, binary representation (AOT compilers) In-memory executable code (JIT compilers) Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
  • 13.
    Compilers What is a Compiler? What is a Compiler? What is a compiler? Something that transforms “source code” into “something else” What is “source code”? A well-structured, textual representation of a program Not: preprocessors, assemblers What is “something else?” Assembly language Another well-structured textual representation (e.g. cfront) Intermediate, binary representation (AOT compilers) In-memory executable code (JIT compilers) Executable images Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 3 / 14
  • 14.
    Compilers Inside a Compiler Inside a Compiler What does the compiler do with the source code? Lexical analysis produces tokens Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 4 / 14
  • 15.
    Compilers Inside a Compiler Inside a Compiler What does the compiler do with the source code? Lexical analysis produces tokens Parser eats tokens, produces AST Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 4 / 14
  • 16.
    Compilers Inside a Compiler Inside a Compiler What does the compiler do with the source code? Lexical analysis produces tokens Parser eats tokens, produces AST What is an AST? And what is abstract about it? Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 4 / 14
  • 17.
    Compilers Inside a Compiler Inside a Compiler What does the compiler do with the source code? Lexical analysis produces tokens Parser eats tokens, produces AST What is an AST? And what is abstract about it? What next after an AST? Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 4 / 14
  • 18.
    Compilers Inside a Compiler Inside a Compiler What does the compiler do with the source code? Lexical analysis produces tokens Parser eats tokens, produces AST What is an AST? And what is abstract about it? What next after an AST? What is an intermediate form (IR)? Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 4 / 14
  • 19.
    Compilers Inside a Compiler Inside a Compiler What does the compiler do with the source code? Lexical analysis produces tokens Parser eats tokens, produces AST What is an AST? And what is abstract about it? What next after an AST? What is an intermediate form (IR)? example: IRs in gcc: source → GENERIC → GIMPLE → RTL → backend Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 4 / 14
  • 20.
    Compilers Inside a Compiler Inside a Compiler What does the compiler do with the source code? Lexical analysis produces tokens Parser eats tokens, produces AST What is an AST? And what is abstract about it? What next after an AST? What is an intermediate form (IR)? example: IRs in gcc: source → GENERIC → GIMPLE → RTL → backend What is Static Single Assignment (SSA) form? Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 4 / 14
  • 21.
    Compilers Passes, Optimization and Code Generation Passes, Optimization and Code Generation What is a “pass”? Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 5 / 14
  • 22.
    Compilers Passes, Optimization and Code Generation Passes, Optimization and Code Generation What is a “pass”? What is a “optimization”? Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 5 / 14
  • 23.
    Compilers Passes, Optimization and Code Generation Passes, Optimization and Code Generation What is a “pass”? What is a “optimization”? What is a “code generator”? Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 5 / 14
  • 24.
    Compilers Passes, Optimization and Code Generation Passes, Optimization and Code Generation What is a “pass”? What is a “optimization”? What is a “code generator”? What is a “code generator generator”? Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 5 / 14
  • 25.
    Compilers The Frontend and The Backend The Frontend and The Backend Everything before the first IR is usually called the frontend Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
  • 26.
    Compilers The Frontend and The Backend The Frontend and The Backend Everything before the first IR is usually called the frontend And everything after as the backend Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
  • 27.
    Compilers The Frontend and The Backend The Frontend and The Backend Everything before the first IR is usually called the frontend And everything after as the backend LLVM and llvm-py deal only with backends Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
  • 28.
    Compilers The Frontend and The Backend The Frontend and The Backend Everything before the first IR is usually called the frontend And everything after as the backend LLVM and llvm-py deal only with backends Frontends tend to be simpler Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
  • 29.
    Compilers The Frontend and The Backend The Frontend and The Backend Everything before the first IR is usually called the frontend And everything after as the backend LLVM and llvm-py deal only with backends Frontends tend to be simpler Many frontends for a backend is common (Scala, Groovy, Clojure: Java; C#, IronPython, etc: .NET) Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
  • 30.
    Compilers The Frontend and The Backend The Frontend and The Backend Everything before the first IR is usually called the frontend And everything after as the backend LLVM and llvm-py deal only with backends Frontends tend to be simpler Many frontends for a backend is common (Scala, Groovy, Clojure: Java; C#, IronPython, etc: .NET) To build a front end in Python: Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
  • 31.
    Compilers The Frontend and The Backend The Frontend and The Backend Everything before the first IR is usually called the frontend And everything after as the backend LLVM and llvm-py deal only with backends Frontends tend to be simpler Many frontends for a backend is common (Scala, Groovy, Clojure: Java; C#, IronPython, etc: .NET) To build a front end in Python: hand-coded lexer, (rec-desc) parser Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
  • 32.
    Compilers The Frontend and The Backend The Frontend and The Backend Everything before the first IR is usually called the frontend And everything after as the backend LLVM and llvm-py deal only with backends Frontends tend to be simpler Many frontends for a backend is common (Scala, Groovy, Clojure: Java; C#, IronPython, etc: .NET) To build a front end in Python: hand-coded lexer, (rec-desc) parser Antlr (generates Python) Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
  • 33.
    Compilers The Frontend and The Backend The Frontend and The Backend Everything before the first IR is usually called the frontend And everything after as the backend LLVM and llvm-py deal only with backends Frontends tend to be simpler Many frontends for a backend is common (Scala, Groovy, Clojure: Java; C#, IronPython, etc: .NET) To build a front end in Python: hand-coded lexer, (rec-desc) parser Antlr (generates Python) Yacc, Bison, Lemon (generates C, wrap to Python) Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
  • 34.
    Compilers The Frontend and The Backend The Frontend and The Backend Everything before the first IR is usually called the frontend And everything after as the backend LLVM and llvm-py deal only with backends Frontends tend to be simpler Many frontends for a backend is common (Scala, Groovy, Clojure: Java; C#, IronPython, etc: .NET) To build a front end in Python: hand-coded lexer, (rec-desc) parser Antlr (generates Python) Yacc, Bison, Lemon (generates C, wrap to Python) Sphinx (C++, wrap to Python) Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
  • 35.
    Compilers The Frontend and The Backend The Frontend and The Backend Everything before the first IR is usually called the frontend And everything after as the backend LLVM and llvm-py deal only with backends Frontends tend to be simpler Many frontends for a backend is common (Scala, Groovy, Clojure: Java; C#, IronPython, etc: .NET) To build a front end in Python: hand-coded lexer, (rec-desc) parser Antlr (generates Python) Yacc, Bison, Lemon (generates C, wrap to Python) Sphinx (C++, wrap to Python) various Python parser generators Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 6 / 14
  • 36.
    LLVM What is LLVM? What is LLVM? Primarily a set of libraries.. Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
  • 37.
    LLVM What is LLVM? What is LLVM? Primarily a set of libraries.. ..with which you can make a compiler backend.. Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
  • 38.
    LLVM What is LLVM? What is LLVM? Primarily a set of libraries.. ..with which you can make a compiler backend.. ..or a VM with JIT support (but hard to get this right) Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
  • 39.
    LLVM What is LLVM? What is LLVM? Primarily a set of libraries.. ..with which you can make a compiler backend.. ..or a VM with JIT support (but hard to get this right) It provides: Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
  • 40.
    LLVM What is LLVM? What is LLVM? Primarily a set of libraries.. ..with which you can make a compiler backend.. ..or a VM with JIT support (but hard to get this right) It provides: IR data structure, with text and binary representations Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
  • 41.
    LLVM What is LLVM? What is LLVM? Primarily a set of libraries.. ..with which you can make a compiler backend.. ..or a VM with JIT support (but hard to get this right) It provides: IR data structure, with text and binary representations wide range of optimizations Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
  • 42.
    LLVM What is LLVM? What is LLVM? Primarily a set of libraries.. ..with which you can make a compiler backend.. ..or a VM with JIT support (but hard to get this right) It provides: IR data structure, with text and binary representations wide range of optimizations code generator (partly description-based) for many platforms Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
  • 43.
    LLVM What is LLVM? What is LLVM? Primarily a set of libraries.. ..with which you can make a compiler backend.. ..or a VM with JIT support (but hard to get this right) It provides: IR data structure, with text and binary representations wide range of optimizations code generator (partly description-based) for many platforms JIT-compile-execute from IR Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
  • 44.
    LLVM What is LLVM? What is LLVM? Primarily a set of libraries.. ..with which you can make a compiler backend.. ..or a VM with JIT support (but hard to get this right) It provides: IR data structure, with text and binary representations wide range of optimizations code generator (partly description-based) for many platforms JIT-compile-execute from IR link-time optimization (LTO) Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
  • 45.
    LLVM What is LLVM? What is LLVM? Primarily a set of libraries.. ..with which you can make a compiler backend.. ..or a VM with JIT support (but hard to get this right) It provides: IR data structure, with text and binary representations wide range of optimizations code generator (partly description-based) for many platforms JIT-compile-execute from IR link-time optimization (LTO) support for accurate garbage collection Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 7 / 14
  • 46.
    LLVM LLVM Highlights LLVM Highlights Written in readable C++ Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 8 / 14
  • 47.
    LLVM LLVM Highlights LLVM Highlights Written in readable C++ Reasonable documentation, helpful and mature community Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 8 / 14
  • 48.
    LLVM LLVM Highlights LLVM Highlights Written in readable C++ Reasonable documentation, helpful and mature community clang is a now-famous subproject of LLVM Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 8 / 14
  • 49.
    LLVM LLVM Highlights LLVM Highlights Written in readable C++ Reasonable documentation, helpful and mature community clang is a now-famous subproject of LLVM llvm-gcc is gcc frontend (C, C++, Java, Ada, Fortran, ObjC) + LLVM backend Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 8 / 14
  • 50.
    LLVM LLVM Highlights LLVM Highlights Written in readable C++ Reasonable documentation, helpful and mature community clang is a now-famous subproject of LLVM llvm-gcc is gcc frontend (C, C++, Java, Ada, Fortran, ObjC) + LLVM backend Many users: Unladen Swallow, Iced Tea, Rubinus, llvm-lua, GHC, LDC Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 8 / 14
  • 51.
    LLVM The LLVM IR The LLVM IR The IR looks like this (textual representation): “Hello world” in LLVM Assembly @msg = private constant [15 x i8 ] c " Hello , world !0 A 00" declare i32 @puts ( i8 *) define i32 @main ( i32 % argc , i8 ** % argv ) { entry : %0 = getelementptr [15 x i8 ]* @msg , i64 0 , i64 0 %1 = tail call i32 @puts ( i8 * %0) ret i32 undef } Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 9 / 14
  • 52.
    LLVM A Word About clang A Word About clang A sub-project of the LLVM team Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 10 / 14
  • 53.
    LLVM A Word About clang A Word About clang A sub-project of the LLVM team clang is a C, Objective C and C++ frontend Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 10 / 14
  • 54.
    LLVM A Word About clang A Word About clang A sub-project of the LLVM team clang is a C, Objective C and C++ frontend works with the LLVM backend (obviously!) Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 10 / 14
  • 55.
    LLVM A Word About clang A Word About clang A sub-project of the LLVM team clang is a C, Objective C and C++ frontend works with the LLVM backend (obviously!) It can convert C/ObjC/C++ code to LLVM IR Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 10 / 14
  • 56.
    LLVM A Word About clang A Word About clang A sub-project of the LLVM team clang is a C, Objective C and C++ frontend works with the LLVM backend (obviously!) It can convert C/ObjC/C++ code to LLVM IR which can be played around with using llvm-py Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 10 / 14
  • 57.
    LLVM A Word About clang A Word About clang A sub-project of the LLVM team clang is a C, Objective C and C++ frontend works with the LLVM backend (obviously!) It can convert C/ObjC/C++ code to LLVM IR which can be played around with using llvm-py and then compiled “normally” Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 10 / 14
  • 58.
    llvm-py What is llvm-py? What is llvm-py? Python bindings for LLVM Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 11 / 14
  • 59.
    llvm-py What is llvm-py? What is llvm-py? Python bindings for LLVM Including JIT Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 11 / 14
  • 60.
    llvm-py What is llvm-py? What is llvm-py? Python bindings for LLVM Including JIT Excellent for experimenting and prototyping Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 11 / 14
  • 61.
    llvm-py What is llvm-py? What is llvm-py? Python bindings for LLVM Including JIT Excellent for experimenting and prototyping Available in Ubuntu, Debian, MacPorts (but may be out of date) Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 11 / 14
  • 62.
    llvm-py What is llvm-py? What is llvm-py? Python bindings for LLVM Including JIT Excellent for experimenting and prototyping Available in Ubuntu, Debian, MacPorts (but may be out of date) Current version (0.6) works with LLVM 2.7 Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 11 / 14
  • 63.
    llvm-py More on llvm-py More on llvm-py Covers most of the LLVM APIs, esp. IR Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 12 / 14
  • 64.
    llvm-py More on llvm-py More on llvm-py Covers most of the LLVM APIs, esp. IR See examples to get started Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 12 / 14
  • 65.
    llvm-py More on llvm-py More on llvm-py Covers most of the LLVM APIs, esp. IR See examples to get started User guide at http://www.mdevan.org/userguide.html Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 12 / 14
  • 66.
    llvm-py More on llvm-py More on llvm-py Covers most of the LLVM APIs, esp. IR See examples to get started User guide at http://www.mdevan.org/userguide.html Some LLVM knowledge required: Modules, IR instructions, ... Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 12 / 14
  • 67.
    llvm-py More on llvm-py More on llvm-py Covers most of the LLVM APIs, esp. IR See examples to get started User guide at http://www.mdevan.org/userguide.html Some LLVM knowledge required: Modules, IR instructions, ... Learning curve of LLVM is steep, llvm-py/Python helps! Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 12 / 14
  • 68.
    llvm-py More on llvm-py More on llvm-py Covers most of the LLVM APIs, esp. IR See examples to get started User guide at http://www.mdevan.org/userguide.html Some LLVM knowledge required: Modules, IR instructions, ... Learning curve of LLVM is steep, llvm-py/Python helps! Users group: http://groups.google.com/group/llvm-py Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 12 / 14
  • 69.
    llvm-py llvm-py Internals llvm-py Internals Wraps LLVM’s C API as a Python extension module Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 13 / 14
  • 70.
    llvm-py llvm-py Internals llvm-py Internals Wraps LLVM’s C API as a Python extension module Extends LLVM C API as it is incomplete Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 13 / 14
  • 71.
    llvm-py llvm-py Internals llvm-py Internals Wraps LLVM’s C API as a Python extension module Extends LLVM C API as it is incomplete Does not use binding generators (Boost.Python, swig etc) Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 13 / 14
  • 72.
    llvm-py llvm-py Internals llvm-py Internals Wraps LLVM’s C API as a Python extension module Extends LLVM C API as it is incomplete Does not use binding generators (Boost.Python, swig etc) Public APIs are in Python, use extension module internally Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 13 / 14
  • 73.
    llvm-py llvm-py Internals llvm-py Internals Wraps LLVM’s C API as a Python extension module Extends LLVM C API as it is incomplete Does not use binding generators (Boost.Python, swig etc) Public APIs are in Python, use extension module internally Mostly straightforward Python, a little metaclass magic Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 13 / 14
  • 74.
    Thanks References and Thanks Thanks! Thanks for listening! Must read: Steven S. Muchnick. Advanced Compiler Design & Implementation. Me: Mahadevan R, mdevan@mdevan.org, http://www.mdevan.org/ Mahadevan R llvm-py: Writing Compilers Using Python September 25, 2010 14 / 14