A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]

Some notes on the internals of the Rust compiler.

Transcript of "A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]"

1. A(n abridged) tour of the Rust compiler
Tom Lee (@tglee)
Saturday, March 29, 14

2. Brace yourselves.
• I’ve contributed some code to Rust
• Mostly fixing crate metadata quirks
• Haven’t touched most of the stuff I’m covering today
• Sorry in advance for any lies I tell you

3. What is this?
• We’re digging into the innards of Rust’s compiler.
• Along the way, I’ll cover some “compilers 101” stuff that may not be common knowledge.
• Not really covering any of the runtime stuff (data representation, garbage collection, etc.)

4. Intro to compilers
• Most compilers follow a familiar pattern: scan, parse, generate code.
• A scanner converts raw source code into a stream of tokens.
• A parser converts the stream of tokens into an intermediate representation.
• A code generator emits the target code (e.g. bytecode or x86_64 assembly).

5. Intro to compilers (cont.)
• Real-world compilers do other stuff too.
• Semantic analysis often follows the parse phase. For example, if the language is statically typed, a type checking step might happen here.
• There are often one or more optimization steps.
• The compiler may also be kind enough to invoke external tools (such as a linker) on your behalf.

6. A 10,000 foot view of Rust’s compiler
• Scan
• Parse
• Semantic analysis
• (Optimizations occur somewhere here)
• Generate target code
• Link object files into an ELF/PE/Mach-O binary

7. A 10,000 ft view (cont.)
• Where does it all begin?
• src/librustc/lib.rs
  main(...) and run_compiler(...)
• src/librustc/driver/driver.rs
  see compile_input and all the phase_X functions, like phase_1_parse_input, phase_2_configure_and_expand, etc.

8. Scanners
• Raw source code goes in, e.g.
  if (should_buy(goat_simulator)) { ... }
• Tokens come out, e.g.
  [IF, LPAREN, ID("should_buy"), LPAREN, ID("goat_simulator"), RPAREN, RPAREN, LBRACE, ..., RBRACE]
• This simple translation makes the parser’s job easier.
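
To make the scanner slide concrete, here is a minimal hand-written scanner for just the toy input above. It is a sketch, not rustc’s lexer: the `Token` enum and the `scan` function are hypothetical names, and the token set covers only the handful of symbols this example needs (the `...` placeholder is treated as a single `Ellipsis` token).

```rust
/// Tokens for a tiny, made-up subset of a C-like language.
/// Hypothetical: these are not rustc's token types.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    If,
    LParen,
    RParen,
    LBrace,
    RBrace,
    Ellipsis, // the literal `...` placeholder from the slide
    Id(String),
}

/// Converts raw source text into a flat list of tokens.
pub fn scan(src: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = src.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        // Single-character punctuation maps directly to a token.
        let simple = match c {
            '(' => Some(Token::LParen),
            ')' => Some(Token::RParen),
            '{' => Some(Token::LBrace),
            '}' => Some(Token::RBrace),
            _ => None,
        };
        if let Some(tok) = simple {
            tokens.push(tok);
            i += 1;
        } else if c.is_whitespace() {
            i += 1; // whitespace separates tokens but produces none
        } else if chars[i..].starts_with(&['.', '.', '.']) {
            tokens.push(Token::Ellipsis);
            i += 3;
        } else if c.is_alphabetic() || c == '_' {
            // Identifiers and keywords: consume the longest run of word characters.
            let start = i;
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            let word: String = chars[start..i].iter().collect();
            // Keywords are just identifiers with reserved spellings.
            tokens.push(if word == "if" { Token::If } else { Token::Id(word) });
        } else {
            return Err(format!("unexpected character {:?} at position {}", c, i));
        }
    }
    Ok(tokens)
}
```

Scanning `if (should_buy(goat_simulator)) { ... }` yields the same token sequence shown on the slide.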

9. Rust’s Scanner
• Fully contained within libsyntax
• src/libsyntax/parse/lexer.rs
  (another name for scanning is “lexical analysis”, ergo “lexer”)
  Refer to the Reader trait.
• src/libsyntax/parse/token.rs
  Tokens and keywords are defined here.

10. Parsers
• Nom on a token stream from the scanner/lexer, e.g.
  [IF, LPAREN, ID("should_buy"), LPAREN, ID("goat_simulator"), RPAREN, RPAREN, LBRACE, ..., RBRACE]
• Apply grammar rules to convert the token stream into an Abstract Syntax Tree (or some other representative data structure)
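
A parser for the same toy token stream can be sketched as a recursive-descent parser, with one function per grammar rule. This is an assumption for illustration: the tiny grammar (`if (expr) { ... }`, where an expression is an identifier optionally applied to one argument), the `Parser` struct and the `Expr` variants are all hypothetical, and rustc’s real parser handles vastly more syntax.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Token { If, LParen, RParen, LBrace, RBrace, Ellipsis, Id(String) }

/// Hypothetical AST for the toy language.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Id(String),
    Call(Box<Expr>, Vec<Expr>), // callee, arguments
    If(Box<Expr>, Vec<Expr>),   // condition, body
    Elided,                     // the `...` placeholder
}

pub struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    pub fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    fn peek(&self) -> Option<&Token> {
        self.tokens.get(self.pos)
    }

    /// Consume the next token if it equals `want`, otherwise report an error.
    fn expect(&mut self, want: Token) -> Result<(), String> {
        match self.tokens.get(self.pos) {
            Some(t) if *t == want => {
                self.pos += 1;
                Ok(())
            }
            other => Err(format!("expected {:?}, found {:?}", want, other)),
        }
    }

    /// if_stmt := IF LPAREN expr RPAREN block
    pub fn parse_if(&mut self) -> Result<Expr, String> {
        self.expect(Token::If)?;
        self.expect(Token::LParen)?;
        let cond = self.parse_expr()?;
        self.expect(Token::RParen)?;
        let body = self.parse_block()?;
        Ok(Expr::If(Box::new(cond), body))
    }

    /// expr := ID [ LPAREN expr RPAREN ]
    fn parse_expr(&mut self) -> Result<Expr, String> {
        let name = match self.tokens.get(self.pos) {
            Some(Token::Id(name)) => name.clone(),
            other => return Err(format!("expected identifier, found {:?}", other)),
        };
        self.pos += 1;
        if self.peek() == Some(&Token::LParen) {
            self.pos += 1;
            let arg = self.parse_expr()?;
            self.expect(Token::RParen)?;
            Ok(Expr::Call(Box::new(Expr::Id(name)), vec![arg]))
        } else {
            Ok(Expr::Id(name))
        }
    }

    /// block := LBRACE ELLIPSIS RBRACE
    fn parse_block(&mut self) -> Result<Vec<Expr>, String> {
        self.expect(Token::LBrace)?;
        self.expect(Token::Ellipsis)?;
        self.expect(Token::RBrace)?;
        Ok(vec![Expr::Elided])
    }
}
```

The parentheses and braces are consumed by `expect` and never make it into the resulting tree.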

11. Abstract Syntax Trees
• Or “AST”
• A data structure representing the syntactic structure of your source program.
• Abstract in that it omits unnecessary crap (parentheses, quotes, etc.)

12. Abstract Syntax Trees (cont.)
  If(
    Call(
      Id("should_buy"),
      [Id("goat_simulator")]),
    [...])
Example AST for the input “if (should_buy(goat_simulator)) { ... }”
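
The AST notation above maps naturally onto a Rust enum. In this hypothetical sketch (`Expr` and its variants are illustrative names, not rustc’s `Expr_`), a `Display` impl prints a tree back in the slide’s notation; no parenthesis or brace tokens are stored anywhere, which is exactly what makes the tree “abstract”.

```rust
use std::fmt;

/// Hypothetical AST node type for the toy example.
#[derive(Debug, PartialEq)]
pub enum Expr {
    Id(String),
    Call(Box<Expr>, Vec<Expr>), // callee, arguments
    If(Box<Expr>, Vec<Expr>),   // condition, body
    Elided,                     // the `...` placeholder
}

/// Writes a list of nodes as `[a, b, c]`.
fn write_list(f: &mut fmt::Formatter<'_>, items: &[Expr]) -> fmt::Result {
    write!(f, "[")?;
    for (i, item) in items.iter().enumerate() {
        if i > 0 {
            write!(f, ", ")?;
        }
        write!(f, "{}", item)?;
    }
    write!(f, "]")
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // `{:?}` on a String prints it with surrounding double quotes.
            Expr::Id(name) => write!(f, "Id({:?})", name),
            Expr::Call(callee, args) => {
                write!(f, "Call({}, ", callee)?;
                write_list(f, args)?;
                write!(f, ")")
            }
            Expr::If(cond, body) => {
                write!(f, "If({}, ", cond)?;
                write_list(f, body)?;
                write!(f, ")")
            }
            Expr::Elided => write!(f, "..."),
        }
    }
}
```

Building the example tree and calling `to_string()` on it gives `If(Call(Id("should_buy"), [Id("goat_simulator")]), [...])`.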

13. Rust’s Parser and AST
• Also fully contained within libsyntax
• src/libsyntax/ast.rs
  The Expr_ enum is an interesting starting point, containing the AST representations of most Rust expressions.
• src/libsyntax/parse/mod.rs
  see parse_crate_from_file
• src/libsyntax/parse/parser.rs
  Most of the interesting stuff is in impl<'a> Parser<'a>. Maybe check out parse_while_expr, for example.

14. Semantic Analysis
• Language- & implementation-specific, but there are common themes.
• Typically performed by analyzing and/or annotating the AST (directly or indirectly).
• Statically typed languages often do type checking etc. here.

15. Semantic Analysis in Rust
• Here we apply all the weird & wonderful rules that make Rust unique.
• Mostly handled by src/librustc/middle/*.rs
• Name resolution (resolve.rs)
• Type checking (typeck/*.rs)
• Much, much more... see phase_3_run_analysis_passes in compile_input for the full details

16. Semantic Analysis in Rust: Name Resolution
• src/librustc/middle/resolve.rs
• Resolve names: “what does this name mean in this context?”
• Type? Function? Local variable?
• Rust has two namespaces: types and values.
  This is why you can e.g. refer to the str type and the str module at the same time.
• resolve_item seems to be the real workhorse here.
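
The two-namespace idea can be sketched as a resolver that keeps separate type and value tables in each lexical scope, so one name can mean a type in one position and a function or local variable in another. Everything here (`Resolver`, `Namespace`, `Def`) is hypothetical; resolve.rs is far more involved (imports, modules, globs and so on).

```rust
use std::collections::HashMap;

/// Which namespace a name is looked up in.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Namespace {
    Type,
    Value,
}

/// What a name resolved to (hypothetical, heavily simplified).
#[derive(Debug, Clone, PartialEq)]
pub enum Def {
    Struct,
    Function,
    Local(u32),
}

/// One lexical scope: a separate table per namespace.
#[derive(Default)]
struct Scope {
    types: HashMap<String, Def>,
    values: HashMap<String, Def>,
}

impl Scope {
    fn table(&self, ns: Namespace) -> &HashMap<String, Def> {
        match ns {
            Namespace::Type => &self.types,
            Namespace::Value => &self.values,
        }
    }

    fn table_mut(&mut self, ns: Namespace) -> &mut HashMap<String, Def> {
        match ns {
            Namespace::Type => &mut self.types,
            Namespace::Value => &mut self.values,
        }
    }
}

/// A stack of scopes; the innermost scope is last.
pub struct Resolver {
    scopes: Vec<Scope>,
}

impl Resolver {
    pub fn new() -> Self {
        Resolver { scopes: vec![Scope::default()] }
    }

    pub fn push_scope(&mut self) {
        self.scopes.push(Scope::default());
    }

    /// Leave the innermost scope (the root scope is never popped).
    pub fn pop_scope(&mut self) {
        if self.scopes.len() > 1 {
            self.scopes.pop();
        }
    }

    pub fn define(&mut self, ns: Namespace, name: &str, def: Def) {
        let scope = self.scopes.last_mut().expect("root scope always exists");
        scope.table_mut(ns).insert(name.to_string(), def);
    }

    /// Search from the innermost scope outwards, in one namespace only.
    pub fn resolve(&self, ns: Namespace, name: &str) -> Option<&Def> {
        self.scopes.iter().rev().find_map(|scope| scope.table(ns).get(name))
    }
}
```

Because lookups are per-namespace, defining a local `point` shadows only the value-namespace meaning of `point`; the type-namespace meaning is untouched.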

17. Semantic Analysis in Rust: Type Checking
• src/librustc/middle/typeck/mod.rs
  see check_crate
• Infer and unify types.
• Using inferred & explicit type info, ensure that the input program satisfies all of Rust’s type rules.
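
“Infer and unify” is usually explained with textbook unification: unknown types become inference variables, and each constraint unifies two types, binding variables as it goes. The sketch below is that textbook algorithm with an occurs check, over a toy type language; `Ty` and `Unifier` are hypothetical names, and this is not how typeck/ is actually structured.

```rust
/// A toy type language with inference variables.
#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    Int,
    Bool,
    Var(usize),           // an unknown type, e.g. from `let x = ...`
    Fn(Vec<Ty>, Box<Ty>), // parameter types, return type
}

/// Maps each inference variable to the type it is bound to, if any.
pub struct Unifier {
    bindings: Vec<Option<Ty>>,
}

impl Unifier {
    pub fn new() -> Self {
        Unifier { bindings: Vec::new() }
    }

    /// Create a new, unbound inference variable.
    pub fn fresh(&mut self) -> Ty {
        self.bindings.push(None);
        Ty::Var(self.bindings.len() - 1)
    }

    /// Substitute all known bindings into `ty`.
    pub fn resolve(&self, ty: &Ty) -> Ty {
        match ty {
            Ty::Var(v) => match &self.bindings[*v] {
                Some(bound) => self.resolve(bound),
                None => ty.clone(),
            },
            Ty::Fn(params, ret) => Ty::Fn(
                params.iter().map(|p| self.resolve(p)).collect(),
                Box::new(self.resolve(ret)),
            ),
            _ => ty.clone(),
        }
    }

    /// Does variable `v` appear inside `ty`? Binding it would create an infinite type.
    fn occurs(&self, v: usize, ty: &Ty) -> bool {
        match self.resolve(ty) {
            Ty::Var(w) => v == w,
            Ty::Fn(params, ret) => params.iter().any(|p| self.occurs(v, p)) || self.occurs(v, &ret),
            _ => false,
        }
    }

    /// Make `a` and `b` the same type, or report why they cannot be.
    pub fn unify(&mut self, a: &Ty, b: &Ty) -> Result<(), String> {
        let (a, b) = (self.resolve(a), self.resolve(b));
        match (&a, &b) {
            (Ty::Var(x), Ty::Var(y)) if x == y => Ok(()),
            (Ty::Var(v), other) | (other, Ty::Var(v)) => {
                if self.occurs(*v, other) {
                    return Err(format!("infinite type: {:?} vs {:?}", a, b));
                }
                self.bindings[*v] = Some(other.clone());
                Ok(())
            }
            (Ty::Int, Ty::Int) | (Ty::Bool, Ty::Bool) => Ok(()),
            (Ty::Fn(p1, r1), Ty::Fn(p2, r2)) if p1.len() == p2.len() => {
                for (x, y) in p1.iter().zip(p2.iter()) {
                    self.unify(x, y)?;
                }
                self.unify(r1, r2)
            }
            _ => Err(format!("type mismatch: {:?} vs {:?}", a, b)),
        }
    }
}
```

After unifying `fn(?1) -> bool` with `fn(int) -> ?2`, resolving `?1` gives `Int` and `?2` gives `Bool`; a later attempt to unify an `Int`-bound variable with `Bool` is reported as a mismatch.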

18. Semantic Analysis in Rust: Rust-y Stuff
• A borrow checking pass enforces memory safety rules.
  see src/librustc/middle/borrowck/doc.rs for details
• An effect checking pass ensures that unsafe operations occur only in unsafe contexts.
  see src/librustc/middle/effect.rs
• A kind checking pass enforces special rules for built-in traits like Send and Drop.
  see src/librustc/middle/kind.rs
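
Of the passes above, effect checking is the easiest to sketch: walk the tree while tracking whether we are inside an `unsafe` block, and report unsafe operations found outside one. The node types and `check_effects` below are hypothetical stand-ins, not the contents of effect.rs, and they model only two kinds of unsafe operation.

```rust
/// A hypothetical, heavily simplified syntax tree.
#[derive(Debug)]
pub enum Node {
    /// A block; `is_unsafe` is true for `unsafe { ... }`.
    Block { is_unsafe: bool, body: Vec<Node> },
    /// Dereferencing a raw pointer, e.g. `*ptr`; requires an unsafe context.
    DerefRawPtr(String),
    /// A call; `callee_is_unsafe` marks calls to `unsafe fn`s.
    Call { callee: String, callee_is_unsafe: bool },
}

/// Recursively collect an error for every unsafe operation outside an unsafe context.
pub fn check_effects(node: &Node, in_unsafe: bool, errors: &mut Vec<String>) {
    match node {
        Node::Block { is_unsafe, body } => {
            // Entering an `unsafe { ... }` block makes everything inside it an unsafe context.
            let inner = in_unsafe || *is_unsafe;
            for child in body {
                check_effects(child, inner, errors);
            }
        }
        Node::DerefRawPtr(name) if !in_unsafe => {
            errors.push(format!("dereference of raw pointer `{}` requires an unsafe block", name));
        }
        Node::Call { callee, callee_is_unsafe: true } if !in_unsafe => {
            errors.push(format!("call to unsafe function `{}` requires an unsafe block", callee));
        }
        // Safe operations, or unsafe ones already inside an unsafe context.
        _ => {}
    }
}
```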

19. Semantic Analysis in Rust: More Rust-y Stuff
• A “compute moves” pass determines whether the use of a value in a given expression will result in a move. This is important for enforcing the rules on non-copyable (“linear”) types.
  see src/librustc/middle/moves.rs
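
A compute-moves style analysis can be sketched for straight-line code: each by-value use of a copyable variable is a copy, any other by-value use is a move, and using a variable after it has been moved is an error. This is a hypothetical simplification (`check_moves`, `Var` and `UseKind` are made up), and a real analysis must follow control flow through branches and loops.

```rust
use std::collections::HashSet;

/// Whether a by-value use of a variable copies it or moves it.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum UseKind {
    Copied,
    Moved,
}

/// A declared variable and whether its type is implicitly copyable.
pub struct Var {
    pub name: &'static str,
    pub is_copy: bool,
}

/// Classify each by-value use (in straight-line order) as a copy or a move,
/// rejecting any use of a variable after it has been moved.
pub fn check_moves(vars: &[Var], uses: &[&str]) -> Result<Vec<UseKind>, String> {
    let mut moved: HashSet<&str> = HashSet::new();
    let mut kinds = Vec::new();
    for &name in uses {
        let var = vars
            .iter()
            .find(|v| v.name == name)
            .ok_or_else(|| format!("unknown variable `{}`", name))?;
        if moved.contains(name) {
            return Err(format!("use of moved value `{}`", name));
        }
        let kind = if var.is_copy { UseKind::Copied } else { UseKind::Moved };
        if kind == UseKind::Moved {
            moved.insert(name);
        }
        kinds.push(kind);
    }
    Ok(kinds)
}
```

With a copyable `n` and a non-copyable `s`, the uses `n, n, s` are classified as copy, copy, move, while `s, s` is rejected because the second use sees an already-moved value.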

20. Code Generators
• Takes an AST as input, e.g.
  If(Call(Id("should_buy"), [Id("goat_simulator")]), [...])
• Emits some sort of target code, e.g. (some made-up bytecode)
  LOAD goat_simulator
  CALL should_buy
  JMPIF if_stmt_body_addr
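
Here is a sketch of a code generator for a toy AST that emits made-up stack-machine bytecode like the slide’s. The LOAD/CALL/JMPIF instruction names follow the slide; the label scheme (`if_body_N`, `if_end_N`), the extra `JMP` that skips the body when the condition is false, and the `CodeGen` type are assumptions added to make the output complete.

```rust
/// Hypothetical toy AST (no `...` placeholder here; bodies hold real nodes).
#[derive(Debug, PartialEq)]
pub enum Expr {
    Id(String),
    Call(Box<Expr>, Vec<Expr>), // callee, arguments
    If(Box<Expr>, Vec<Expr>),   // condition, body
}

/// Collects emitted instructions and hands out unique label numbers.
pub struct CodeGen {
    pub code: Vec<String>,
    next_label: usize,
}

impl CodeGen {
    pub fn new() -> Self {
        CodeGen { code: Vec::new(), next_label: 0 }
    }

    pub fn emit(&mut self, expr: &Expr) -> Result<(), String> {
        match expr {
            Expr::Id(name) => self.code.push(format!("LOAD {}", name)),
            Expr::Call(callee, args) => {
                // Push arguments first, then call: a simple stack-machine convention.
                for arg in args {
                    self.emit(arg)?;
                }
                match &**callee {
                    Expr::Id(name) => self.code.push(format!("CALL {}", name)),
                    other => return Err(format!("unsupported callee {:?}", other)),
                }
            }
            Expr::If(cond, body) => {
                let n = self.next_label;
                self.next_label += 1;
                self.emit(cond)?;
                // Jump into the body if the condition is true, otherwise skip past it.
                self.code.push(format!("JMPIF if_body_{}", n));
                self.code.push(format!("JMP if_end_{}", n));
                self.code.push(format!("if_body_{}:", n));
                for stmt in body {
                    self.emit(stmt)?;
                }
                self.code.push(format!("if_end_{}:", n));
            }
        }
        Ok(())
    }
}
```

For the slide’s AST with a body containing a single hypothetical call `buy()`, this emits `LOAD goat_simulator`, `CALL should_buy`, `JMPIF if_body_0`, `JMP if_end_0`, `if_body_0:`, `CALL buy`, `if_end_0:`.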

21. Rust’s Code Generator
• First, Rust translates the analyzed, type-checked AST into an LLVM module.
  This is phase_4_translate_to_llvm.
• src/librustc/middle/trans/base.rs
  trans_crate is a good place to start.

22. Rust’s Code Generator (cont.)
• src/librustc/back/link.rs
• Passes are run over the LLVM module to write the target code to disk.
  This is phase_5_run_llvm_passes in driver.rs, which calls the appropriate stuff in rustc::back::link.
• We can tweak the output format using command line options: assembly code, LLVM bitcode files, object files, etc.
  see build_session_options and the OutputType* variants as used in driver.rs

23. Rust’s Code Generator (cont.)
• If you’re trying to build a native executable, the previous step will produce object files...
• ... but LLVM won’t link our object files into a(n ELF/PE) binary.
  This is phase_6_link_output.
• Rust calls out to the system’s cc program to do the link step.
  see link_binary, link_natively and get_cc_prog in src/librustc/back/link.rs
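
The final link step amounts to building and running an external command. The sketch below only builds a `std::process::Command` for a hypothetical `cc -o <output> <objects...>` invocation and leaves running it to the caller; `link_command` is an illustrative name, and the real link_binary/link_natively pass many more arguments (libraries, search paths and so on).

```rust
use std::process::Command;

/// Build (but do not run) a command asking the system C compiler driver to
/// link object files into a native executable. Hypothetical helper.
pub fn link_command(cc: &str, objects: &[&str], output: &str) -> Command {
    let mut cmd = Command::new(cc);
    cmd.arg("-o").arg(output); // name of the executable to produce
    cmd.args(objects); // object files emitted by the previous phase
    cmd
}
```

Calling `.status()` on the returned `Command` would actually spawn `cc` and wait for the link to finish; keeping construction separate makes the argument list easy to inspect with `get_program()` and `get_args()`.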