A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]
1. A(n abridged) tour of
the Rust compiler
Tom Lee
@tglee
Saturday, March 29, 14
2. Brace yourselves.
• I’ve contributed some code to Rust
• Mostly fixing crate metadata quirks
• Haven’t touched most of the stuff I’m
covering today
• Sorry in advance for any lies I tell you
3. What is this?
• We’re digging into the innards of Rust’s
compiler.
• Along the way, I’ll cover some “compilers
101” stuff that may not be common
knowledge.
• Not really covering any of the runtime stuff
-- data representation, garbage collection,
etc.
4. Intro to compilers
• Most compilers follow a familiar pattern:
scan, parse, generate code
• A scanner converts raw source code into
a stream of tokens.
• A parser converts the stream of tokens
into an intermediate representation.
• A code generator emits the target code
(e.g. bytecode, x86_64 assembly, etc.)
5. Intro to compilers
(cont.)
• Real-world compilers do other stuff too.
• Semantic analysis often follows the
parse phase.
For example, if the language is statically typed, a type checking
step might happen here.
• Often one or more optimization steps.
• The compiler may also be kind enough to
invoke external tools on your behalf.
6. A 10,000 foot view of
Rust’s compiler
• Scan
• Parse
• Semantic Analysis
• (Optimizations occur somewhere here)
• Generate target code
• Link object files into an ELF/PE/Mach-O
binary.
7. A 10,000 ft view (cont.)
• Where does it all begin?
• src/librustc/lib.rs
main(...) and run_compiler(...)
• src/librustc/driver/driver.rs
see compile_input and all the phase_X methods
like phase_1_parse_input,
phase_2_configure_and_expand, etc.
8. Scanners
• Raw source code goes in e.g.
if (should_buy(goat_simulator)) { ... }
• Tokens come out e.g.
[IF, LPAREN, ID(“should_buy”), LPAREN, ID(“goat_simulator”), RPAREN, RPAREN,
LBRACE, ..., RBRACE]
• This simple translation makes the parser’s
job easier.
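To make the "source in, tokens out" idea concrete, here is a minimal sketch of a toy scanner for the example above. It has nothing to do with rustc's real lexer: the Token enum and the scan function are made-up names for illustration, and it only knows about if, identifiers, parentheses and braces.

```rust
/// Toy token type (hypothetical, not rustc's token definitions).
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    If,
    LParen,
    RParen,
    LBrace,
    RBrace,
    Id(String),
}

/// Convert raw source text into a flat list of tokens.
pub fn scan(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        match c {
            // Whitespace separates tokens but produces none.
            c if c.is_whitespace() => { chars.next(); }
            '(' => { chars.next(); tokens.push(Token::LParen); }
            ')' => { chars.next(); tokens.push(Token::RParen); }
            '{' => { chars.next(); tokens.push(Token::LBrace); }
            '}' => { chars.next(); tokens.push(Token::RBrace); }
            // Identifiers and keywords: letters, digits and underscores.
            c if c.is_alphabetic() || c == '_' => {
                let mut word = String::new();
                while let Some(&d) = chars.peek() {
                    if d.is_alphanumeric() || d == '_' {
                        word.push(d);
                        chars.next();
                    } else {
                        break;
                    }
                }
                tokens.push(if word == "if" { Token::If } else { Token::Id(word) });
            }
            other => return Err(format!("unexpected character {:?}", other)),
        }
    }
    Ok(tokens)
}
```

Calling scan on "if (should_buy(goat_simulator)) { }" gives the same token sequence as the slide, minus the elided body.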
9. Rust’s Scanner
• Fully contained within libsyntax
• src/libsyntax/parse/lexer.rs
(another name for scanning is “lexical analysis”, ergo “lexer”)
Refer to the Reader trait
• src/libsyntax/parse/token.rs
Tokens and keywords defined here.
10. Parsers
• Nom on a token stream from the
scanner/lexer e.g.
[IF, LPAREN, ID(“should_buy”), LPAREN, ID(“goat_simulator”),
RPAREN, RPAREN, LBRACE, ..., RBRACE]
• Apply grammar rules to convert the token
stream into an Abstract Syntax Tree
(or some other representative data structure)
11. Abstract Syntax Trees
• Or “AST”
• Data structure representing the syntactic
structure of your source program.
• Abstract in that it omits unnecessary crap
(parentheses, quotes, etc.)
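As a rough sketch of "tokens in, AST out", here is a tiny recursive descent parser for the running example. Everything in it (Token, Expr, Parser, the one-argument call rule) is a hypothetical toy, not libsyntax. Note how the parentheses and braces show up in the token list but not in the resulting tree.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Token { If, LParen, RParen, LBrace, RBrace, Id(String) }

/// Toy AST: only identifiers, one-argument calls and `if`.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Id(String),
    Call(Box<Expr>, Vec<Expr>),
    If(Box<Expr>, Vec<Expr>),
}

pub struct Parser { tokens: Vec<Token>, pos: usize }

impl Parser {
    pub fn new(tokens: Vec<Token>) -> Self { Parser { tokens, pos: 0 } }

    fn peek(&self) -> Option<&Token> { self.tokens.get(self.pos) }

    /// Consume and return the next token, if any.
    fn advance(&mut self) -> Option<Token> {
        let tok = self.tokens.get(self.pos).cloned();
        self.pos += 1;
        tok
    }

    /// Consume the next token and check that it is the one we want.
    fn expect(&mut self, want: Token) -> Result<(), String> {
        let got = self.advance();
        if got.as_ref() == Some(&want) {
            Ok(())
        } else {
            Err(format!("expected {:?}, got {:?}", want, got))
        }
    }

    /// expr := IF '(' expr ')' '{' expr* '}' | ID '(' expr ')' | ID
    pub fn parse_expr(&mut self) -> Result<Expr, String> {
        match self.advance() {
            Some(Token::If) => {
                self.expect(Token::LParen)?;
                let cond = self.parse_expr()?;
                self.expect(Token::RParen)?;
                self.expect(Token::LBrace)?;
                let mut body = Vec::new();
                while self.peek().is_some() && self.peek() != Some(&Token::RBrace) {
                    body.push(self.parse_expr()?);
                }
                self.expect(Token::RBrace)?;
                Ok(Expr::If(Box::new(cond), body))
            }
            Some(Token::Id(name)) => {
                if self.peek() == Some(&Token::LParen) {
                    self.advance(); // skip '('
                    let arg = self.parse_expr()?;
                    self.expect(Token::RParen)?;
                    Ok(Expr::Call(Box::new(Expr::Id(name)), vec![arg]))
                } else {
                    Ok(Expr::Id(name))
                }
            }
            other => Err(format!("unexpected token {:?}", other)),
        }
    }
}
```

Feeding it the tokens for if (should_buy(goat_simulator)) { } produces an If node whose condition is a Call node, with no trace of the punctuation.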
13. Rust’s Parser and AST
• Also fully contained within libsyntax
• src/libsyntax/ast.rs
the Expr_ enum is an interesting starting point, containing the
AST representations of most Rust expressions.
• src/libsyntax/parse/mod.rs
see parse_crate_from_file
• src/libsyntax/parse/parser.rs
Most of the interesting stuff is in impl<'a> Parser<'a>.
Maybe check out parse_while_expr, for example.
14. Semantic Analysis
• Language- & implementation-specific, but
there are common themes.
• Typically performed by analyzing and/or
annotating the AST (directly or indirectly).
• Statically typed languages often do type
checking etc. here.
15. Semantic Analysis in
Rust
• Here we apply all the weird & wonderful
rules that make Rust unique.
• Mostly handled by src/librustc/middle/*.rs
• Name resolution (resolve.rs)
• Type checking (typeck/*.rs)
• Much, much more...
see phase_3_run_analysis_passes in compile_input
for the full details
16. Semantic Analysis in Rust:
Name Resolution
• src/librustc/middle/resolve.rs
• Resolve names
“what does this name mean in this context?”
• Type? Function? Local variable?
• Rust has two namespaces: types and values
this is why you can e.g. refer to the str type and the str
module at the same time
• resolve_item seems to be the real
workhorse here.
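A small sketch of the two namespaces in action, written against current Rust rather than the 2014 compiler. Widget and make are made-up names. A struct with named fields only takes a name in the type namespace, so a function can reuse the same name in the value namespace.

```rust
// A struct with named fields only claims a name in the type namespace...
pub struct Widget {
    pub id: u32,
}

// ...so a function with the same name can live in the value namespace.
#[allow(non_snake_case)]
pub fn Widget(id: u32) -> Widget {
    // `Widget { .. }` is a struct expression, resolved in the type namespace.
    Widget { id }
}

pub fn make() -> Widget {
    // Left of `=`: the type. Right of `=`: a call to the function.
    let w: Widget = Widget(7);
    w
}
```

Name resolution is what decides, at each use, which of the two Widgets is meant.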
17. Semantic Analysis in Rust:
Type Checking
• src/librustc/middle/typeck/mod.rs
see check_crate
• Infer and unify types.
• Using inferred & explicit type info, ensure
that the input program satisfies all of Rust’s
type rules.
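A tiny example of "infer and unify" from the user's side, in current Rust. The function name inferred is made up. The element type of the vector is unknown when it is created and only gets pinned down by later uses.

```rust
pub fn inferred() -> Vec<u8> {
    // At this point the element type is still an unknown to be inferred.
    let mut v = Vec::new();
    // Pushing a u8 unifies the unknown element type with u8.
    v.push(1u8);
    // v.push("two"); // would not compile: expected u8, found &str
    // The return type must also agree with what was inferred: Vec<u8>.
    v
}
```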
18. Semantic Analysis in Rust:
Rust-y Stuff
• A borrow checking pass enforces
memory safety rules
see src/librustc/middle/borrowck/doc.rs for details
• An effect checking pass to ensure that
unsafe operations occur in unsafe contexts.
see src/librustc/middle/effect.rs
• A kind checking pass enforces special
rules for built-in traits like Send and Drop
see src/librustc/middle/kind.rs
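To illustrate the kind of rule the effect checking pass polices, here is a minimal sketch in current Rust. The function name is made up. Creating a raw pointer is fine anywhere, but dereferencing it is only accepted inside an unsafe context.

```rust
pub fn read_through_raw_pointer(value: &i32) -> i32 {
    // Creating a raw pointer from a reference is a safe operation.
    let p: *const i32 = value;
    // let bad = *p; // rejected: dereferencing a raw pointer outside `unsafe`
    // The dereference is an unsafe operation, so it needs an unsafe block.
    // It is sound here because `p` comes from a live reference.
    unsafe { *p }
}
```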
19. Semantic Analysis in Rust:
More Rust-y Stuff
• A compute moves pass to determine
whether the use of a value will result in a
move in a given expression.
Important to enforce rules on non-copyable (“linear”) types.
see src/librustc/middle/moves.rs
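What "will this use result in a move?" looks like from the outside, sketched in current Rust with made-up function names. A String is not copyable, so passing it by value moves it. An i32 is copyable, so using it leaves the original usable.

```rust
/// Takes ownership of its argument: passing a String here moves it.
fn consume(s: String) -> usize {
    s.len()
}

pub fn move_vs_copy() -> (usize, i32) {
    let name = String::from("goat");
    let len = consume(name); // `name` is moved into `consume`
    // println!("{}", name); // would not compile: use of moved value

    let x = 5;
    let y = x; // i32 is Copy, so `x` is still usable afterwards
    (len, x + y)
}
```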
20. Code Generators
• Takes an AST as input e.g.
If(Call(Id(“should_buy”), [Id(“goat_simulator”)]), [...])
• Emits some sort of target code e.g.
(some made up bytecode)
LOAD goat_simulator
CALL should_buy
JMPIF if_stmt_body_addr
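A sketch of a code generator for the same made-up bytecode, walking a toy AST. Expr, emit and generate are hypothetical names, CALL_INDIRECT is an invented opcode for callees that are not plain names, and the body of the if is left out just as on the slide. Rust's real code generator targets LLVM and looks nothing like this.

```rust
/// Toy AST matching the example above.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Id(String),
    Call(Box<Expr>, Vec<Expr>),
    If(Box<Expr>, Vec<Expr>),
}

/// Walk the tree and append made-up bytecode instructions to `out`.
pub fn emit(expr: &Expr, out: &mut Vec<String>) {
    match expr {
        Expr::Id(name) => out.push(format!("LOAD {}", name)),
        Expr::Call(callee, args) => {
            // Arguments are evaluated first, then the call itself.
            for arg in args {
                emit(arg, out);
            }
            match &**callee {
                Expr::Id(name) => out.push(format!("CALL {}", name)),
                other => {
                    emit(other, out);
                    out.push("CALL_INDIRECT".to_string()); // hypothetical opcode
                }
            }
        }
        Expr::If(cond, _body) => {
            emit(cond, out);
            // Laying out the body at if_stmt_body_addr is omitted here.
            out.push("JMPIF if_stmt_body_addr".to_string());
        }
    }
}

/// Convenience wrapper returning the emitted instructions.
pub fn generate(expr: &Expr) -> Vec<String> {
    let mut out = Vec::new();
    emit(expr, &mut out);
    out
}
```

Running generate on the If(Call(...)) tree from the slide yields the three instructions shown above, in the same order.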
21. Rust’s Code Generator
• First, Rust translates the analyzed, type-
checked AST into an LLVM module.
This is phase_4_translate_to_llvm
• src/librustc/middle/trans/base.rs
trans_crate is a good place to start
22. Rust’s Code Generator
(cont.)
• src/librustc/back/link.rs
• Passes are run over the LLVM module to
write the target code to disk
this is phase_5_run_llvm_passes in driver.rs,
which calls the appropriate stuff on rustc::back::link
• We can tweak the output format using
command line options: assembly code,
LLVM bitcode files, object files, etc.
see build_session_options and the OutputType*
variants as used in driver.rs
23. Rust’s Code Generator
(cont.)
• If you’re trying to build a native executable,
the previous step will produce object files...
• ... but LLVM won’t link our object files into
a(n ELF/PE) binary.
this is phase_6_link_output
• Rust calls out to the system’s cc program
to do the link step.
see link_binary, link_natively and get_cc_prog
in src/librustc/back/link.rs