Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
A(n abridged) tour of
the Rust compiler
Tom Lee
@tglee
Saturday, March 29, 14
Brace yourselves.
• I’ve contributed some code to Rust
• Mostly fixing crate metadata quirks
• Haven’t touched most of the ...
What is this?
• We’re digging into the innards of Rust’s
compiler.
• Along the way, I’ll cover some “compilers
101” stuff ...
Intro to compilers
• Most compilers follow a familiar pattern:
scan, parse, generate code
• A scanner converts raw source ...
Intro to compilers
(cont.)
• Real-world compilers do other stuff too.
• Semantic analysis often follows the
parse phase.
F...
A 10,000 foot view of
Rust’s compiler
• Scan
• Parse
• Semantic Analysis
• (Optimizations occur somewhere here)
• Generate...
A 10,000 ft view (cont.)
• Where does it all begin?
• src/librustc/lib.rs
main(...) and run_compiler(...)
• src/librustc/d...
Scanners
• Raw source code goes in e.g.
if (should_buy(goat_simulator)) { ... }
• Tokens come out e.g.
[IF, LPAREN, ID(“sh...
Rust’s Scanner
• Fully contained within libsyntax
• src/libsyntax/parse/lexer.rs
(another name for scanning is “lexical an...
Parsers
•Nom on a token stream from the
scanner/lexer e.g.
[IF, LPAREN, ID(“should_buy”), LPAREN, ID(“goat_simulator”),
RP...
Abstract Syntax Trees
• Or “AST”
• Data structure representing the syntactic
structure of your source program.
• Abstract ...
Abstract Syntax Trees
(cont.)
If(
Call(
Id(“should_buy”),
[Id(“goat_simulator”)]),
[...])
example AST for input
“if (shoul...
Rust’s Parser and AST
• Also fully contained within libsyntax
• src/libsyntax/ast.rs
the Expr_ enum is an interesting star...
Semantic Analysis
• Language- & implementation-specific, but
there are common themes.
• Typically performed by analyzing an...
Semantic Analysis in
Rust
• Here we apply all the weird & wonderful
rules that make Rust unique.
• Mostly handled by src/l...
Semantic Analysis in Rust:
Name Resolution
• src/librustc/middle/resolve.rs
• Resolve names
“what does this name mean in t...
Semantic Analysis in Rust:
Type Checking
• src/librustc/middle/typeck/mod.rs
see check_crate
• Infer and unify types.
• Us...
Semantic Analysis in Rust:
Rust-y Stuff
•A borrow checking pass enforces
memory safety rules
see src/librustc/middle/borro...
Semantic Analysis in Rust:
More Rust-y Stuff
•A compute moves pass to determine
whether the use of a value will result in ...
Code Generators
• Takes an AST as input e.g.
If(Call(Id(“should_buy”), [Id(“goat_simulator”)]), [...])
• Emits some sort o...
Rust’s Code Generator
• First, Rust translates the analyzed, type-
checked AST into an LLVM module.
This is phase_4_transl...
Rust’s Code Generator
(cont.)
• src/librustc/back/link.rs
• Passes are run over the LLVM module to
write the target code t...
Rust’s Code Generator
(cont.)
• If you’re trying to build a native executable,
the previous step will produce object files....
Upcoming SlideShare
Loading in …5
×

A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]

2,246 views

Published on

Some notes on the internals of the Rust compiler.

Published in: Software
  • Be the first to comment

A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]

  1. 1. A(n abridged) tour of the Rust compiler Tom Lee @tglee Saturday, March 29, 14
  2. 2. Brace yourselves. • I’ve contributed some code to Rust • Mostly fixing crate metadata quirks • Haven’t touched most of the stuff I’m covering today • Sorry in advance for any lies I tell you Saturday, March 29, 14
  3. 3. What is this? • We’re digging into the innards of Rust’s compiler. • Along the way, I’ll cover some “compilers 101” stuff that may not be common knowledge. • Not really covering any of the runtime stuff -- data representation, garbage collection, etc. Saturday, March 29, 14
  4. 4. Intro to compilers • Most compilers follow a familiar pattern: scan, parse, generate code • A scanner converts raw source code into a stream of tokens. • A parser converts the stream of tokens into an intermediate representation. • A code generator emits the target code (e.g. bytecode, x86_64 assembly, etc.) Saturday, March 29, 14
  5. 5. Intro to compilers (cont.) • Real-world compilers do other stuff too. • Semantic analysis often follows the parse phase. For example, if the language is statically typed, a type checking step might happen here. • Often one or more optimization steps. • The compiler may also be kind enough to invoke external tools on your behalf. Saturday, March 29, 14
  6. 6. A 10,000 foot view of Rust’s compiler • Scan • Parse • Semantic Analysis • (Optimizations occur somewhere here) • Generate target code • Link object files into an ELF/PE/Mach-O binary. Saturday, March 29, 14
  7. 7. A 10,000 ft view (cont.) • Where does it all begin? • src/librustc/lib.rs main(...) and run_compiler(...) • src/librustc/driver/driver.rs see compile_input and all the phase_X methods like phase_1_parse_input, phase_2_configure_and_expand, etc. Saturday, March 29, 14
  8. 8. Scanners • Raw source code goes in e.g. if (should_buy(goat_simulator)) { ... } • Tokens come out e.g. [IF, LPAREN, ID(“should_buy”), LPAREN, ID(“goat_simulator”), RPAREN, RPAREN, LBRACE, ..., RBRACE] • This simple translation makes the parser’s job easier. Saturday, March 29, 14
  9. 9. Rust’s Scanner • Fully contained within libsyntax • src/libsyntax/parse/lexer.rs (another name for scanning is “lexical analysis”, ergo “lexer”) Refer to the Reader trait • src/libsyntax/parse/token.rs Tokens and keywords defined here. Saturday, March 29, 14
  10. 10. Parsers •Nom on a token stream from the scanner/lexer e.g. [IF, LPAREN, ID(“should_buy”), LPAREN, ID(“goat_simulator”), RPAREN, RPAREN, LBRACE, ..., RBRACE] •Apply grammar rules to convert the token stream into an Abstract Syntax Tree (or some other representative data structure) Saturday, March 29, 14
  11. 11. Abstract Syntax Trees • Or “AST” • Data structure representing the syntactic structure of your source program. • Abstract in that it omits unnecessary crap (parentheses, quotes, etc.) Saturday, March 29, 14
  12. 12. Abstract Syntax Trees (cont.) If( Call( Id(“should_buy”), [Id(“goat_simulator”)]), [...]) example AST for input “if (should_buy(goat_simulator)) { ... }” Saturday, March 29, 14
  13. 13. Rust’s Parser and AST • Also fully contained within libsyntax • src/libsyntax/ast.rs the Expr_ enum is an interesting starting point, containing the AST representations of most Rust expressions. • src/libsyntax/parse/mod.rs see parse_crate_from_file • src/libsyntax/parse/parser.rs Most of the interesting stuff is in impl<‘a> Parser<‘a>. Maybe check out parse_while_expr, for example. Saturday, March 29, 14
  14. 14. Semantic Analysis • Language- & implementation-specific, but there are common themes. • Typically performed by analyzing and/or annotating the AST (directly or indirectly). • Statically typed languages often do type checking etc. here. Saturday, March 29, 14
  15. 15. Semantic Analysis in Rust • Here we apply all the weird & wonderful rules that make Rust unique. • Mostly handled by src/librustc/middle/*.rs • Name resolution (resolve.rs) • Type checking (typeck/*.rs) • Much, much more... see phase_3_run_analysis_passes in compile_input for the full details Saturday, March 29, 14
  16. 16. Semantic Analysis in Rust: Name Resolution • src/librustc/middle/resolve.rs • Resolve names “what does this name mean in this context?” • Type? Function? Local variable? • Rust has two namespaces: types and values this is why you can e.g. refer to the str type and the str module at the same time • resolve_item seems to be the real workhorse here. Saturday, March 29, 14
  17. 17. Semantic Analysis in Rust: Type Checking • src/librustc/middle/typeck/mod.rs see check_crate • Infer and unify types. • Using inferred & explicit type info, ensure that the input program satisfies all of Rust’s type rules. Saturday, March 29, 14
  18. 18. Semantic Analysis in Rust: Rust-y Stuff •A borrow checking pass enforces memory safety rules see src/librustc/middle/borrowck/doc.rs for details •An effect checking pass to ensure that unsafe operations occur in unsafe contexts. see src/librustc/middle/effect.rs •A kind checking pass enforces special rules for built-in traits like Send and Drop see src/librustc/middle/kind.rs Saturday, March 29, 14
  19. 19. Semantic Analysis in Rust: More Rust-y Stuff •A compute moves pass to determine whether the use of a value will result in a move in a given expression. Important to enforce rules on non-copyable (”linear”) types. see src/librustc/middle/moves.rs Saturday, March 29, 14
  20. 20. Code Generators • Takes an AST as input e.g. If(Call(Id(“should_buy”), [Id(“goat_simulator”)]), [...]) • Emits some sort of target code e.g. (some made up bytecode) LOAD goat_simulator CALL should_buy JMPIF if_stmt_body_addr Saturday, March 29, 14
  21. 21. Rust’s Code Generator • First, Rust translates the analyzed, type- checked AST into an LLVM module. This is phase_4_translate_to_llvm • src/librustc/middle/trans/base.rs trans_crate is a good place to start Saturday, March 29, 14
  22. 22. Rust’s Code Generator (cont.) • src/librustc/back/link.rs • Passes are run over the LLVM module to write the target code to disk this is phase_5_run_llvm_passes in driver.rs, which calls the appropriate stuff on rustc::back::link • We can tweak the output format using command line options: assembly code, LLVM bitcode files, object files, etc. see build_session_options and the OutputType* variants as used in driver.rs Saturday, March 29, 14
  23. 23. Rust’s Code Generator (cont.) • If you’re trying to build a native executable, the previous step will produce object files... • ... but LLVM won’t link our object files into a(n ELF/PE) binary. this is phase_6_link_output • Rust calls out to the system’s cc program to do the link step. see link_binary, link_natively and get_cc_prog in src/librustc/back/link.rs Saturday, March 29, 14

×