A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]


Some notes on the internals of the Rust compiler.

Published in: Software

  1. A(n abridged) tour of the Rust compiler. Tom Lee, @tglee. Saturday, March 29, 2014.
  2. Brace yourselves.
     • I’ve contributed some code to Rust.
     • Mostly fixing crate metadata quirks.
     • Haven’t touched most of the stuff I’m covering today.
     • Sorry in advance for any lies I tell you.
  3. What is this?
     • We’re digging into the innards of Rust’s compiler.
     • Along the way, I’ll cover some “compilers 101” stuff that may not be common knowledge.
     • Not really covering any of the runtime stuff: data representation, garbage collection, etc.
  4. Intro to compilers
     • Most compilers follow a familiar pattern: scan, parse, generate code.
     • A scanner converts raw source code into a stream of tokens.
     • A parser converts the stream of tokens into an intermediate representation.
     • A code generator emits the target code (e.g. bytecode or x86_64 assembly).
  5. Intro to compilers (cont.)
     • Real-world compilers do other stuff too.
     • Semantic analysis often follows the parse phase. For example, if the language is statically typed, a type checking step might happen here.
     • There are often one or more optimization steps.
     • The compiler may also be kind enough to invoke external tools (a linker, say) on your behalf.
  6. A 10,000 foot view of Rust’s compiler
     • Scan
     • Parse
     • Semantic analysis
     • (Optimizations occur somewhere here)
     • Generate target code
     • Link object files into an ELF/PE/Mach-O binary
  7. A 10,000 ft view (cont.)
     • Where does it all begin?
     • src/librustc/lib.rs: main(...) and run_compiler(...)
     • src/librustc/driver/driver.rs: see compile_input and all the phase_X functions, like phase_1_parse_input, phase_2_configure_and_expand, etc.
  8. Scanners
     • Raw source code goes in, e.g. if (should_buy(goat_simulator)) { ... }
     • Tokens come out, e.g. [IF, LPAREN, ID(“should_buy”), LPAREN, ID(“goat_simulator”), RPAREN, RPAREN, LBRACE, ..., RBRACE]
     • This simple translation makes the parser’s job easier.
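The scanning step on this slide can be sketched in a few lines. This is only a toy written in present-day Rust syntax (not the 2014 dialect the talk covers), and the `Token` enum and `scan` function are hypothetical names made up for illustration; they are not libsyntax's lexer.

```rust
// Toy token type mirroring the slide's example. Hypothetical; not libsyntax's Token.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    If,
    LParen,
    RParen,
    LBrace,
    RBrace,
    Id(String),
}

// Convert raw source text into a flat list of tokens.
// Anything outside this tiny vocabulary is reported as an error.
fn scan(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(&c) = chars.peek() {
        match c {
            // Whitespace only separates tokens; it produces nothing.
            c if c.is_whitespace() => {
                chars.next();
            }
            '(' => {
                chars.next();
                tokens.push(Token::LParen);
            }
            ')' => {
                chars.next();
                tokens.push(Token::RParen);
            }
            '{' => {
                chars.next();
                tokens.push(Token::LBrace);
            }
            '}' => {
                chars.next();
                tokens.push(Token::RBrace);
            }
            // Identifiers and keywords: consume a whole word, then classify it.
            c if c.is_alphabetic() || c == '_' => {
                let mut word = String::new();
                while let Some(&w) = chars.peek() {
                    if w.is_alphanumeric() || w == '_' {
                        word.push(w);
                        chars.next();
                    } else {
                        break;
                    }
                }
                tokens.push(if word == "if" { Token::If } else { Token::Id(word) });
            }
            other => return Err(format!("unexpected character: {:?}", other)),
        }
    }
    Ok(tokens)
}
```

Calling `scan("if (should_buy(goat_simulator)) { }")` should give the token list from the slide, minus the elided body. Keywords are recognised after the whole word is read, which keeps an identifier such as `iffy` from being split into `if` plus `fy`.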
  9. Rust’s Scanner
     • Fully contained within libsyntax.
     • src/libsyntax/parse/lexer.rs: another name for scanning is “lexical analysis”, ergo “lexer”. Refer to the Reader trait.
     • src/libsyntax/parse/token.rs: tokens and keywords are defined here.
  10. Parsers
      • Nom on a token stream from the scanner/lexer, e.g. [IF, LPAREN, ID(“should_buy”), LPAREN, ID(“goat_simulator”), RPAREN, RPAREN, LBRACE, ..., RBRACE]
      • Apply grammar rules to convert the token stream into an Abstract Syntax Tree (or some other representative data structure).
  11. Abstract Syntax Trees
      • Or “AST”.
      • A data structure representing the syntactic structure of your source program.
      • Abstract in that it omits unnecessary crap (parentheses, quotes, etc.).
  12. Abstract Syntax Trees (cont.)
      • Example AST for the input “if (should_buy(goat_simulator)) { ... }”:
          If(Call(Id(“should_buy”), [Id(“goat_simulator”)]), [...])
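To make the token-stream-to-AST step concrete, here is a minimal recursive-descent sketch in present-day Rust. The `Token`, `Expr` and `Parser` types are hypothetical stand-ins shaped after the slides' example and handle only that tiny grammar; libsyntax's real parser and `Expr_` enum are far richer.

```rust
// Hypothetical token and AST types, shaped after the slides' example.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    If,
    LParen,
    RParen,
    LBrace,
    RBrace,
    Id(String),
}

#[derive(Debug, PartialEq)]
enum Expr {
    Id(String),
    Call(Box<Expr>, Vec<Expr>), // callee, arguments
    If(Box<Expr>, Vec<Expr>),   // condition, body
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn new(tokens: Vec<Token>) -> Self {
        Parser { tokens, pos: 0 }
    }

    fn peek(&self) -> Option<&Token> {
        self.tokens.get(self.pos)
    }

    // Consume the next token if it matches `want`, otherwise report an error.
    fn expect(&mut self, want: Token) -> Result<(), String> {
        match self.tokens.get(self.pos) {
            Some(t) if *t == want => {
                self.pos += 1;
                Ok(())
            }
            other => Err(format!("expected {:?}, found {:?}", want, other)),
        }
    }

    // expr := "if" "(" expr ")" "{" expr* "}"  |  ID "(" expr ")"  |  ID
    fn parse_expr(&mut self) -> Result<Expr, String> {
        match self.peek().cloned() {
            Some(Token::If) => {
                self.pos += 1;
                self.expect(Token::LParen)?;
                let cond = self.parse_expr()?;
                self.expect(Token::RParen)?;
                self.expect(Token::LBrace)?;
                let mut body = Vec::new();
                while self.peek() != Some(&Token::RBrace) {
                    body.push(self.parse_expr()?);
                }
                self.expect(Token::RBrace)?;
                Ok(Expr::If(Box::new(cond), body))
            }
            Some(Token::Id(name)) => {
                self.pos += 1;
                if self.peek() == Some(&Token::LParen) {
                    // A call with exactly one argument, to keep the sketch short.
                    self.pos += 1;
                    let arg = self.parse_expr()?;
                    self.expect(Token::RParen)?;
                    Ok(Expr::Call(Box::new(Expr::Id(name)), vec![arg]))
                } else {
                    Ok(Expr::Id(name))
                }
            }
            other => Err(format!("unexpected token: {:?}", other)),
        }
    }
}
```

Feeding the slide's token list (with an empty body) to `Parser::new(tokens).parse_expr()` should produce the `If(Call(Id(..), [Id(..)]), [])` shape shown above. Note that the parentheses and braces leave no trace in the tree, which is what "abstract" refers to.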
  13. Rust’s Parser and AST
      • Also fully contained within libsyntax.
      • src/libsyntax/ast.rs: the Expr_ enum is an interesting starting point, containing the AST representations of most Rust expressions.
      • src/libsyntax/parse/mod.rs: see parse_crate_from_file.
      • src/libsyntax/parse/parser.rs: most of the interesting stuff is in impl<'a> Parser<'a>. Maybe check out parse_while_expr, for example.
  14. Semantic Analysis
      • Language- & implementation-specific, but there are common themes.
      • Typically performed by analyzing and/or annotating the AST (directly or indirectly).
      • Statically typed languages often do type checking etc. here.
  15. Semantic Analysis in Rust
      • Here we apply all the weird & wonderful rules that make Rust unique.
      • Mostly handled by src/librustc/middle/*.rs
      • Name resolution (resolve.rs)
      • Type checking (typeck/*.rs)
      • Much, much more... see phase_3_run_analysis_passes in compile_input for the full details.
  16. Semantic Analysis in Rust: Name Resolution
      • src/librustc/middle/resolve.rs
      • Resolve names: “what does this name mean in this context?”
      • Type? Function? Local variable?
      • Rust has two namespaces, types and values. This is why you can e.g. refer to the str type and the str module at the same time.
      • resolve_item seems to be the real workhorse here.
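The idea of looking a name up in separate type and value namespaces can be sketched with a single map keyed by (namespace, name). Everything here (`Namespace`, `Def`, `Scope`) is a hypothetical illustration in present-day Rust; it is not how resolve.rs is structured, and it ignores nested scopes, imports and modules entirely.

```rust
use std::collections::HashMap;

// The two namespaces mentioned on the slide.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Namespace {
    Type,
    Value,
}

// What a name can turn out to mean. Hypothetical and heavily simplified.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Def {
    Struct,
    Function,
    Local,
}

// A single flat scope; a real resolver would keep a stack of these.
#[derive(Default)]
struct Scope {
    names: HashMap<(Namespace, String), Def>,
}

impl Scope {
    // Record a definition, rejecting duplicates within the same namespace.
    fn define(&mut self, ns: Namespace, name: &str, def: Def) -> Result<(), String> {
        let key = (ns, name.to_string());
        if self.names.contains_key(&key) {
            return Err(format!("duplicate definition of `{}` in {:?} namespace", name, ns));
        }
        self.names.insert(key, def);
        Ok(())
    }

    // Answer "what does this name mean in this context?" for one namespace.
    fn resolve(&self, ns: Namespace, name: &str) -> Option<Def> {
        self.names.get(&(ns, name.to_string())).copied()
    }
}
```

With this layout, defining `point` once as `Def::Struct` in `Namespace::Type` and once as `Def::Function` in `Namespace::Value` succeeds, and each lookup returns the definition for the namespace it asked about. A second definition in the same namespace is rejected.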
  17. Semantic Analysis in Rust: Type Checking
      • src/librustc/middle/typeck/mod.rs: see check_crate.
      • Infer and unify types.
      • Using inferred & explicit type info, ensure that the input program satisfies all of Rust’s type rules.
  18. Semantic Analysis in Rust: Rust-y Stuff
      • A borrow checking pass enforces memory safety rules. See src/librustc/middle/borrowck/doc.rs for details.
      • An effect checking pass ensures that unsafe operations occur in unsafe contexts. See src/librustc/middle/effect.rs.
      • A kind checking pass enforces special rules for built-in traits like Send and Drop. See src/librustc/middle/kind.rs.
  19. Semantic Analysis in Rust: More Rust-y Stuff
      • A compute moves pass determines whether the use of a value will result in a move in a given expression. This is important for enforcing rules on non-copyable (“linear”) types. See src/librustc/middle/moves.rs.
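From the user's side, the move rule looks like this. The snippet is a small present-day Rust illustration of moves versus copies, with hypothetical function names; it shows the rule being enforced, not how moves.rs computes it.

```rust
// Takes ownership of its argument, so passing a String here moves it.
fn consume(s: String) -> usize {
    s.len()
}

fn demo() -> (usize, i32) {
    let name = String::from("goat_simulator");
    let n = consume(name); // `name` is moved into `consume` here
    // consume(name);      // would not compile: use of moved value `name`

    let x = 5;
    let y = x; // i32 is copyable, so `x` stays usable after this line
    (n, x + y)
}
```

Uncommenting the second `consume(name)` call makes the compiler reject the program, because `String` is not copyable and its first use was a move. The integer `x`, in contrast, can be used again after being assigned to `y`.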
  20. Code Generators
      • Take an AST as input, e.g. If(Call(Id(“should_buy”), [Id(“goat_simulator”)]), [...])
      • Emit some sort of target code, e.g. (some made-up bytecode):
          LOAD goat_simulator
          CALL should_buy
          JMPIF if_stmt_body_addr
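A tree walk that produces the slide's made-up bytecode might look like the following sketch. The `Expr` type and `emit` function are hypothetical, the instruction names come from the slide, and the `if_stmt_body_addr` label is left symbolic: laying out the body and resolving jump targets is deliberately omitted.

```rust
// Hypothetical AST type, shaped after the slides' example.
#[derive(Debug)]
enum Expr {
    Id(String),
    Call(Box<Expr>, Vec<Expr>), // callee, arguments
    If(Box<Expr>, Vec<Expr>),   // condition, body
}

// Walk the tree and append one line of made-up bytecode per instruction.
fn emit(expr: &Expr, out: &mut Vec<String>) -> Result<(), String> {
    match expr {
        Expr::Id(name) => out.push(format!("LOAD {}", name)),
        Expr::Call(callee, args) => {
            // Arguments are evaluated first, then the call itself.
            for arg in args {
                emit(arg, out)?;
            }
            match callee.as_ref() {
                Expr::Id(name) => out.push(format!("CALL {}", name)),
                other => return Err(format!("unsupported callee: {:?}", other)),
            }
        }
        Expr::If(cond, _body) => {
            // Evaluate the condition, then jump to the body if it holds.
            // Emitting the body and a real address is left out of this sketch.
            emit(cond, out)?;
            out.push("JMPIF if_stmt_body_addr".to_string());
        }
    }
    Ok(())
}
```

Running `emit` over the slide's AST should yield the three instructions listed above, in the same order: arguments are visited before the call, and the call before the conditional jump.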
  21. Rust’s Code Generator
      • First, Rust translates the analyzed, type-checked AST into an LLVM module. This is phase_4_translate_to_llvm.
      • src/librustc/middle/trans/base.rs: trans_crate is a good place to start.
  22. Rust’s Code Generator (cont.)
      • src/librustc/back/link.rs
      • Passes are run over the LLVM module to write the target code to disk. This is phase_5_run_llvm_passes in driver.rs, which calls the appropriate stuff on rustc::back::link.
      • We can tweak the output format using command line options: assembly code, LLVM bitcode files, object files, etc. See build_session_options and the OutputType* variants as used in driver.rs.
  23. Rust’s Code Generator (cont.)
      • If you’re trying to build a native executable, the previous step will produce object files...
      • ... but LLVM won’t link our object files into a(n ELF/PE) binary. That happens in phase_6_link_output.
      • Rust calls out to the system’s cc program to do the link step. See link_binary, link_natively and get_cc_prog in src/librustc/back/link.rs.
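Calling out to the system's `cc` for the link step can be sketched with `std::process::Command`. The `link_command` helper and the file names are hypothetical, and the argument list is the bare minimum a C compiler driver accepts for linking; the real link_natively passes many more flags. The sketch only builds the command so it can be inspected without a toolchain installed.

```rust
use std::process::Command;

// Build (but do not run) a `cc` invocation that links the given object files
// into `output`. Hypothetical helper; rustc's actual invocation is more involved.
fn link_command(objects: &[&str], output: &str) -> Command {
    let mut cmd = Command::new("cc");
    cmd.args(objects).arg("-o").arg(output);
    cmd
}
```

To actually perform the link, one would call `.status()` on the returned command and check that the exit status reports success. Delegating to `cc` means the platform's own linker, library search paths and startup files are picked up without the compiler having to know about them.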