A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]
1.
A(n abridged) tour of
the Rust compiler
Tom Lee
@tglee
Saturday, March 29, 14
2.
Brace yourselves.
• I’ve contributed some code to Rust
• Mostly fixing crate metadata quirks
• Haven’t touched most of the stuff I’m
covering today
• Sorry in advance for any lies I tell you
3.
What is this?
• We’re digging into the innards of Rust’s
compiler.
• Along the way, I’ll cover some “compilers
101” stuff that may not be common
knowledge.
• Not really covering any of the runtime stuff
-- data representation, garbage collection,
etc.
4.
Intro to compilers
• Most compilers follow a familiar pattern:
scan, parse, generate code
• A scanner converts raw source code into
a stream of tokens.
• A parser converts the stream of tokens
into an intermediate representation.
• A code generator emits the target code
(e.g. bytecode, x86_64 assembly, etc.)
5.
Intro to compilers
(cont.)
• Real-world compilers do other stuff too.
• Semantic analysis often follows the
parse phase.
For example, if the language is statically typed, a type checking
step might happen here.
• Often one or more optimization steps.
• The compiler may also be kind enough to
invoke external tools on your behalf.
6.
A 10,000 foot view of
Rust’s compiler
• Scan
• Parse
• Semantic Analysis
• (Optimizations occur somewhere here)
• Generate target code
• Link object files into an ELF/PE/Mach-O
binary.
7.
A 10,000 ft view (cont.)
• Where does it all begin?
• src/librustc/lib.rs
main(...) and run_compiler(...)
• src/librustc/driver/driver.rs
see compile_input and all the phase_X methods
like phase_1_parse_input,
phase_2_configure_and_expand, etc.
8.
Scanners
• Raw source code goes in e.g.
if (should_buy(goat_simulator)) { ... }
• Tokens come out e.g.
[IF, LPAREN, ID(“should_buy”), LPAREN, ID(“goat_simulator”), RPAREN, RPAREN,
LBRACE, ..., RBRACE]
• This simple translation makes the parser’s
job easier.
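A minimal sketch of a scanner for the snippet above, assuming a toy language with only `if`, parentheses, braces and identifiers. The `Token` enum and `scan` function are made up for illustration and are not the types used in libsyntax.

```rust
// Hypothetical token type for a toy language (not libsyntax's Token).
#[derive(Debug, Clone, PartialEq)]
enum Token {
    If,
    LParen,
    RParen,
    LBrace,
    RBrace,
    Id(String),
}

// Convert raw source text into a flat list of tokens.
fn scan(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        match c {
            // Whitespace separates tokens but produces none.
            c if c.is_whitespace() => {
                chars.next();
            }
            '(' => {
                chars.next();
                tokens.push(Token::LParen);
            }
            ')' => {
                chars.next();
                tokens.push(Token::RParen);
            }
            '{' => {
                chars.next();
                tokens.push(Token::LBrace);
            }
            '}' => {
                chars.next();
                tokens.push(Token::RBrace);
            }
            // Identifiers and keywords: consume the whole word, then classify it.
            c if c.is_alphabetic() || c == '_' => {
                let mut word = String::new();
                while let Some(&w) = chars.peek() {
                    if w.is_alphanumeric() || w == '_' {
                        word.push(w);
                        chars.next();
                    } else {
                        break;
                    }
                }
                tokens.push(if word == "if" { Token::If } else { Token::Id(word) });
            }
            other => return Err(format!("unexpected character: {:?}", other)),
        }
    }
    Ok(tokens)
}
```

Calling `scan("if (should_buy(goat_simulator)) { }")` yields the same shape of token list as on the slide, minus the elided body.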
9.
Rust’s Scanner
• Fully contained within libsyntax
• src/libsyntax/parse/lexer.rs
(another name for scanning is “lexical analysis”, ergo “lexer”)
Refer to the Reader trait
• src/libsyntax/parse/token.rs
Tokens and keywords defined here.
10.
Parsers
• Nom on a token stream from the
scanner/lexer e.g.
[IF, LPAREN, ID(“should_buy”), LPAREN, ID(“goat_simulator”),
RPAREN, RPAREN, LBRACE, ..., RBRACE]
• Apply grammar rules to convert the token
stream into an Abstract Syntax Tree
(or some other representative data structure)
11.
Abstract Syntax Trees
• Or “AST”
• Data structure representing the syntactic
structure of your source program.
• Abstract in that it omits unnecessary crap
(parentheses, quotes, etc.)
12.
Abstract Syntax Trees
(cont.)
If(
Call(
Id(“should_buy”),
[Id(“goat_simulator”)]),
[...])
example AST for input
“if (should_buy(goat_simulator)) { ... }”
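To make the token-stream-to-AST step concrete, here is a rough recursive-descent sketch that builds the `If(Call(...), [...])` tree shown above. The `Token`, `Expr` and `Parser` types are hypothetical and far simpler than what lives in libsyntax; call arguments are not comma-separated in this toy grammar.

```rust
// Hypothetical token and AST types for a toy language.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    If,
    LParen,
    RParen,
    LBrace,
    RBrace,
    Id(String),
}

#[derive(Debug, PartialEq)]
enum Expr {
    If(Box<Expr>, Vec<Expr>),   // condition, body
    Call(Box<Expr>, Vec<Expr>), // callee, arguments
    Id(String),
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Parser {
        Parser { tokens, pos: 0 }
    }

    fn peek(&self) -> Option<&Token> {
        self.tokens.get(self.pos)
    }

    // Consume the next token if it is the one we want, otherwise fail.
    fn expect(&mut self, want: Token) -> Result<(), String> {
        match self.tokens.get(self.pos) {
            Some(t) if *t == want => {
                self.pos += 1;
                Ok(())
            }
            other => Err(format!("expected {:?}, found {:?}", want, other)),
        }
    }

    // expr := "if" "(" expr ")" "{" expr* "}" | ID | ID "(" expr* ")"
    fn parse_expr(&mut self) -> Result<Expr, String> {
        match self.peek().cloned() {
            Some(Token::If) => {
                self.pos += 1;
                self.expect(Token::LParen)?;
                let cond = self.parse_expr()?;
                self.expect(Token::RParen)?;
                self.expect(Token::LBrace)?;
                let mut body = Vec::new();
                while self.peek() != Some(&Token::RBrace) {
                    body.push(self.parse_expr()?);
                }
                self.expect(Token::RBrace)?;
                Ok(Expr::If(Box::new(cond), body))
            }
            Some(Token::Id(name)) => {
                self.pos += 1;
                if self.peek() == Some(&Token::LParen) {
                    self.pos += 1;
                    let mut args = Vec::new();
                    while self.peek() != Some(&Token::RParen) {
                        args.push(self.parse_expr()?);
                    }
                    self.expect(Token::RParen)?;
                    Ok(Expr::Call(Box::new(Expr::Id(name)), args))
                } else {
                    Ok(Expr::Id(name))
                }
            }
            other => Err(format!("unexpected token: {:?}", other)),
        }
    }
}
```

Note that the parentheses and braces are consumed by `expect` but never stored in the tree, which is the "abstract" part.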
13.
Rust’s Parser and AST
• Also fully contained within libsyntax
• src/libsyntax/ast.rs
the Expr_ enum is an interesting starting point, containing the
AST representations of most Rust expressions.
• src/libsyntax/parse/mod.rs
see parse_crate_from_file
• src/libsyntax/parse/parser.rs
Most of the interesting stuff is in impl<'a> Parser<'a>.
Maybe check out parse_while_expr, for example.
14.
Semantic Analysis
• Language- & implementation-specific, but
there are common themes.
• Typically performed by analyzing and/or
annotating the AST (directly or indirectly).
• Statically typed languages often do type
checking etc. here.
15.
Semantic Analysis in
Rust
• Here we apply all the weird & wonderful
rules that make Rust unique.
• Mostly handled by src/librustc/middle/*.rs
• Name resolution (resolve.rs)
• Type checking (typeck/*.rs)
• Much, much more...
see phase_3_run_analysis_passes in compile_input
for the full details
16.
Semantic Analysis in Rust:
Name Resolution
• src/librustc/middle/resolve.rs
• Resolve names
“what does this name mean in this context?”
• Type? Function? Local variable?
• Rust has two namespaces: types and values
this is why you can e.g. refer to the str type and the str
module at the same time
• resolve_item seems to be the real
workhorse here.
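One way to picture "two namespaces" is a scoped symbol table keyed by (namespace, name). The sketch below is a toy model under that assumption; `Resolver`, `Namespace` and `Def` are invented names and say nothing about how resolve.rs is actually organised.

```rust
use std::collections::HashMap;

// Hypothetical model: every name lives in exactly one of two namespaces.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Namespace {
    Types,
    Values,
}

// What a name can turn out to mean once resolved.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Def {
    Struct,
    Function,
    Local,
}

// A stack of scopes; the last entry is the innermost scope.
struct Resolver {
    scopes: Vec<HashMap<(Namespace, String), Def>>,
}

impl Resolver {
    fn new() -> Resolver {
        Resolver { scopes: vec![HashMap::new()] }
    }

    fn push_scope(&mut self) {
        self.scopes.push(HashMap::new());
    }

    fn pop_scope(&mut self) {
        self.scopes.pop();
    }

    // Record a definition in the innermost scope.
    fn define(&mut self, ns: Namespace, name: &str, def: Def) {
        if let Some(scope) = self.scopes.last_mut() {
            scope.insert((ns, name.to_string()), def);
        }
    }

    // Search from the innermost scope outwards, within one namespace only.
    fn resolve(&self, ns: Namespace, name: &str) -> Option<Def> {
        let key = (ns, name.to_string());
        self.scopes.iter().rev().find_map(|scope| scope.get(&key).copied())
    }
}
```

Because the namespace is part of the key, a type and a function can share a name without clashing, and a local variable shadows only the value with that name.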
17.
Semantic Analysis in Rust:
Type Checking
• src/librustc/middle/typeck/mod.rs
see check_crate
• Infer and unify types.
• Using inferred & explicit type info, ensure
that the input program satisfies all of Rust’s
type rules.
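"Infer and unify" can be sketched with type variables and a substitution table. This is a textbook-style unifier over a made-up `Ty` type, assuming only ints, bools and single-argument functions; it is not a description of what typeck actually does.

```rust
// Hypothetical type representation: two base types, inference variables,
// and single-argument function types.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Bool,
    Var(usize),
    Fn(Box<Ty>, Box<Ty>),
}

// bindings[v] holds the type that variable v has been unified with, if any.
struct Infer {
    bindings: Vec<Option<Ty>>,
}

// True if variable v appears anywhere inside ty.
fn occurs(v: usize, ty: &Ty) -> bool {
    match ty {
        Ty::Var(w) => *w == v,
        Ty::Fn(a, r) => occurs(v, a) || occurs(v, r),
        _ => false,
    }
}

impl Infer {
    fn new() -> Infer {
        Infer { bindings: Vec::new() }
    }

    // Create a fresh, unbound type variable.
    fn fresh(&mut self) -> Ty {
        self.bindings.push(None);
        Ty::Var(self.bindings.len() - 1)
    }

    // Replace every bound variable in ty with what it is bound to.
    fn resolve(&self, ty: &Ty) -> Ty {
        match ty {
            Ty::Var(v) => match &self.bindings[*v] {
                Some(bound) => self.resolve(bound),
                None => ty.clone(),
            },
            Ty::Fn(a, r) => Ty::Fn(Box::new(self.resolve(a)), Box::new(self.resolve(r))),
            _ => ty.clone(),
        }
    }

    // Make two types equal by binding variables, or report a mismatch.
    fn unify(&mut self, a: &Ty, b: &Ty) -> Result<(), String> {
        let (a, b) = (self.resolve(a), self.resolve(b));
        match (&a, &b) {
            (Ty::Var(x), Ty::Var(y)) if x == y => Ok(()),
            (Ty::Var(v), other) | (other, Ty::Var(v)) => {
                // Refuse to build an infinite type such as t = fn(t) -> int.
                if occurs(*v, other) {
                    return Err(format!("infinite type: {:?} in {:?}", a, b));
                }
                self.bindings[*v] = Some(other.clone());
                Ok(())
            }
            (Ty::Int, Ty::Int) | (Ty::Bool, Ty::Bool) => Ok(()),
            (Ty::Fn(a1, r1), Ty::Fn(a2, r2)) => {
                self.unify(a1, a2)?;
                self.unify(r1, r2)
            }
            _ => Err(format!("type mismatch: {:?} vs {:?}", a, b)),
        }
    }
}
```

Unifying an unknown `fn(t) -> u` against `fn(Int) -> Bool` binds `t` to `Int` and `u` to `Bool`; a later attempt to use `t` as `Bool` is then a type error.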
18.
Semantic Analysis in Rust:
Rust-y Stuff
• A borrow checking pass enforces
memory safety rules
see src/librustc/middle/borrowck/doc.rs for details
• An effect checking pass to ensure that
unsafe operations occur in unsafe contexts.
see src/librustc/middle/effect.rs
• A kind checking pass enforces special
rules for built-in traits like Send and Drop
see src/librustc/middle/kind.rs
19.
Semantic Analysis in Rust:
More Rust-y Stuff
• A compute moves pass to determine
whether the use of a value will result in a
move in a given expression.
Important to enforce rules on non-copyable (“linear”) types.
see src/librustc/middle/moves.rs
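A toy illustration of why knowing "is this use a move?" matters. The sketch assumes each use of a variable is tagged as copyable or linear and flags any use after a move. It folds the move computation and the later use-after-move check into one function for brevity; `Kind`, `Use` and `check_moves` are invented names.

```rust
use std::collections::HashSet;

// Hypothetical classification of a variable's type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Copyable, // using the value copies it; the original stays usable
    Linear,   // using the value moves it; the original becomes unusable
}

// One use of a variable, in program order.
struct Use<'a> {
    var: &'a str,
    kind: Kind,
}

// Walk the uses in order and reject any use of an already-moved variable.
fn check_moves(uses: &[Use<'_>]) -> Result<(), String> {
    let mut moved: HashSet<&str> = HashSet::new();
    for u in uses {
        if moved.contains(u.var) {
            return Err(format!("use of moved value: {}", u.var));
        }
        if u.kind == Kind::Linear {
            moved.insert(u.var);
        }
    }
    Ok(())
}
```

Two uses of a copyable `n` pass, while a second use of a linear `s` is rejected.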
20.
Code Generators
• Takes an AST as input e.g.
If(Call(Id(“should_buy”), [Id(“goat_simulator”)]), [...])
• Emits some sort of target code e.g.
(some made up bytecode)
LOAD goat_simulator
CALL should_buy
JMPIF if_stmt_body_addr
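The AST-to-bytecode step can be sketched as a tree walk. The opcodes follow the made-up bytecode above; the extra `JMP`, `CALL_INDIRECT` and the `if_body_N` / `if_end_N` labels are further inventions so that the output is complete, and the `Expr` and `CodeGen` types are hypothetical.

```rust
// Hypothetical AST for a toy language.
enum Expr {
    If(Box<Expr>, Vec<Expr>),   // condition, body
    Call(Box<Expr>, Vec<Expr>), // callee, arguments
    Id(String),
}

// Accumulates emitted instructions and hands out unique label numbers.
struct CodeGen {
    code: Vec<String>,
    next_label: usize,
}

impl CodeGen {
    fn new() -> CodeGen {
        CodeGen { code: Vec::new(), next_label: 0 }
    }

    // Emit instructions for one expression, children first.
    fn emit(&mut self, expr: &Expr) {
        match expr {
            Expr::Id(name) => self.code.push(format!("LOAD {}", name)),
            Expr::Call(callee, args) => {
                // Arguments are pushed before the call itself.
                for arg in args {
                    self.emit(arg);
                }
                match &**callee {
                    Expr::Id(name) => self.code.push(format!("CALL {}", name)),
                    other => {
                        self.emit(other);
                        self.code.push("CALL_INDIRECT".to_string());
                    }
                }
            }
            Expr::If(cond, body) => {
                let n = self.next_label;
                self.next_label += 1;
                self.emit(cond);
                // Jump into the body when the condition holds, else skip it.
                self.code.push(format!("JMPIF if_body_{}", n));
                self.code.push(format!("JMP if_end_{}", n));
                self.code.push(format!("if_body_{}:", n));
                for e in body {
                    self.emit(e);
                }
                self.code.push(format!("if_end_{}:", n));
            }
        }
    }
}
```

For the example AST, the first three emitted lines are `LOAD goat_simulator`, `CALL should_buy` and `JMPIF if_body_0`, matching the shape of the bytecode on the slide.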
21.
Rust’s Code Generator
• First, Rust translates the analyzed, type-checked
AST into an LLVM module.
This is phase_4_translate_to_llvm
• src/librustc/middle/trans/base.rs
trans_crate is a good place to start
22.
Rust’s Code Generator
(cont.)
• src/librustc/back/link.rs
• Passes are run over the LLVM module to
write the target code to disk
this is phase_5_run_llvm_passes in driver.rs,
which calls the appropriate stuff on rustc::back::link
• We can tweak the output format using
command line options: assembly code,
LLVM bitcode files, object files, etc.
see build_session_options and the OutputType*
variants as used in driver.rs
23.
Rust’s Code Generator
(cont.)
• If you’re trying to build a native executable,
the previous step will produce object files...
• ... but LLVM won’t link our object files into
a(n ELF/PE/Mach-O) binary.
the link step is phase_6_link_output
• Rust calls out to the system’s cc program
to do the link step.
see link_binary, link_natively and get_cc_prog
in src/librustc/back/link.rs
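"Calling out to cc" boils down to spawning a child process with the object files as arguments. The sketch below assumes a `cc` on the PATH that accepts `-o`; `cc_args` and `link_with_cc` are hypothetical helpers, and the real link step passes many more flags than this.

```rust
use std::io;
use std::process::{Command, ExitStatus};

// Build the argument list: all object files, then "-o <output>".
fn cc_args(objects: &[&str], output: &str) -> Vec<String> {
    let mut args: Vec<String> = objects.iter().map(|o| o.to_string()).collect();
    args.push("-o".to_string());
    args.push(output.to_string());
    args
}

// Run the system C compiler driver as a linker and wait for it to finish.
fn link_with_cc(objects: &[&str], output: &str) -> io::Result<ExitStatus> {
    Command::new("cc").args(cc_args(objects, output)).status()
}
```

Using the C compiler driver rather than invoking the linker directly means the platform's usual startup files and default libraries get pulled in without having to name them.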