This document outlines a project to compile a subset of Lisp to x86-64 assembly using Node.js without third-party libraries. It demonstrates the basic components of a compiler including parsing, generating an abstract syntax tree, and code generation. The goal is to compile simple Lisp expressions like (+ 3 (+ 1 2)) to assembly that computes the result. The document discusses parsing to an AST, generating assembly code, calling conventions, syscalls, and testing the generated code. It concludes with ideas for improvements like adding error handling and linking to libraries.
2. The premise
● Compile a subset of Lisp to assembly
● Using JavaScript/Node.js
● Without any third-party JavaScript libraries
● Without any third-party C/assembly libraries (e.g. libc)
○ But assembled with GCC rather than NASM/FASM, to simplify development on macOS
● In under an hour
3. To demonstrate
● Common compiler architecture
● Basic assembly is not hard
● Starting a compiler is not hard
● Improving your compiler is not hard
● You can (and should) write a compiler too!
5. What we’ll omit
● (Custom) function definitions
● Non-symbol/-numeric data types
● More than 3 function arguments
● A whole lot of safety
● A whole lot of error messaging
10. Writing the parser
● Parser takes a string
● Accumulates “tokens”
● Produces Abstract Syntax Tree (AST)
● Goal:
○ Input (string): “(+ 3 (+ 10 2))”
○ Output (JavaScript): ['+', 3, ['+', 10, 2]]
● Strategy:
○ Iterate over each character
○ Recurse on left parenthesis
○ Accumulate on space and right parenthesis
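The strategy above can be sketched in a few lines of Node.js. This is a minimal illustration, not necessarily the parser used in the talk: the names `parseList` and `parse` are hypothetical, and it assumes well-formed input containing only numbers, symbols, whitespace, and parentheses.

```javascript
// Parse the contents of one list starting at `pos`.
// Returns [items, nextPos], where nextPos is just past the closing parenthesis.
function parseList(source, pos) {
  const items = [];
  let token = '';

  // Turn the accumulated characters into a number or a symbol and store it.
  const flush = () => {
    if (token.length > 0) {
      const n = Number(token);
      items.push(Number.isNaN(n) ? token : n);
      token = '';
    }
  };

  while (pos < source.length) {
    const ch = source[pos];
    if (ch === '(') {
      // Recurse on left parenthesis.
      flush();
      const [child, next] = parseList(source, pos + 1);
      items.push(child);
      pos = next;
    } else if (ch === ')') {
      // Accumulate on right parenthesis, then return to the enclosing list.
      flush();
      return [items, pos + 1];
    } else if (ch === ' ' || ch === '\n' || ch === '\t') {
      // Accumulate on whitespace.
      flush();
      pos += 1;
    } else {
      token += ch;
      pos += 1;
    }
  }
  flush();
  return [items, pos];
}

// Parse a single top-level expression.
function parse(source) {
  const [items] = parseList(source, 0);
  return items[0];
}

console.log(JSON.stringify(parse('(+ 3 (+ 10 2))'))); // prints ["+",3,["+",10,2]]
```

Tracking a position index instead of slicing the string keeps the recursion simple; error handling for unbalanced parentheses is deliberately left out.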
15. Basic Assembly
● Alternate, human-readable representation of binary machine code (basically)
● Fixed set of registers (think: global integer variables)
○ e.g. RDI, RSI, RAX, etc.
○ Plus program memory (a stack)
● Numerous built-in operations
○ e.g. ADD, SUB, PUSH, POP, etc.
● Assign via MOV
○ e.g. MOV RDI, 1
● “Function” calls via CALL/RET
17. Calling convention: Background
● Assume System V AMD64 ABI
● Remember registers are:
○ Global
○ Finite
○ Faster (than stack)
● Function caller and callee must agree who preserves which register values
18. Calling convention: Caller
● Registers RDI, RSI, RDX, … are stored on the stack
● Parameter values are assigned to RDI, RSI, RDX, …
● Function is called
● Stack is popped into …, RDX, RSI, RDI to restore prior values
● Function return value is available in RAX
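The caller-side steps above map directly onto a short instruction sequence. The sketch below generates that sequence as text for a call whose arguments are all numeric literals. The helper name `emitCall` is hypothetical, only the first three parameter registers are used (matching the three-argument limit stated earlier), and stack alignment is ignored.

```javascript
// First three System V AMD64 parameter registers, in order.
const PARAM_REGISTERS = ['RDI', 'RSI', 'RDX'];

// Hypothetical helper: emit the caller side of a call to `name`,
// where every argument is a numeric literal.
function emitCall(name, args) {
  if (args.length > PARAM_REGISTERS.length) {
    throw new Error(`At most ${PARAM_REGISTERS.length} arguments are supported`);
  }
  const used = PARAM_REGISTERS.slice(0, args.length);
  const lines = [];

  // 1. Save the current values of the registers we are about to overwrite.
  used.forEach((reg) => lines.push(`PUSH ${reg}`));
  // 2. Assign the parameter values.
  used.forEach((reg, i) => lines.push(`MOV ${reg}, ${args[i]}`));
  // 3. Call the function.
  lines.push(`CALL ${name}`);
  // 4. Restore the saved registers in reverse order (the stack is LIFO).
  used.slice().reverse().forEach((reg) => lines.push(`POP ${reg}`));
  // The return value is now available in RAX.
  return lines;
}

console.log(emitCall('plus', [10, 2]).join('\n'));
```

For `emitCall('plus', [10, 2])` this produces two PUSH instructions, two MOV instructions, the CALL, and then POP RSI followed by POP RDI.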
19. Calling convention: Callee
● Preserve any callee-saved registers it uses (in System V: RBX, RBP, R12 to R15); RDI, RSI, RDX, etc. are the caller's responsibility
● Body logic
● Return value stored in RAX
● Restore preserved registers before RET
21. Writing the code generator
● Goal:
○ Take an AST (e.g. ['+', 3, ['+', 10, 2]])
○ Produce an assembly program that computes this expression and exits with the result
● Strategy:
○ Only supported AST elements are function calls and arguments
■ Arguments are numbers or function calls and arguments
○ Break out code generation into chunks by kind of AST element being compiled
■ E.g. compile_ast, compile_funcall, compile_argument
■ Include plus function as a built-in
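One way to arrange these chunks is sketched below, using the function names from the slide. It is an illustrative sketch rather than the talk's actual code: the `start:` label, the `out` line buffer, and the `destination` parameter are assumptions, and the program's entry-point directives and exit sequence are left out.

```javascript
const PARAM_REGISTERS = ['RDI', 'RSI', 'RDX'];

// Lisp function names mapped to assembly labels for built-ins.
const BUILTIN_FUNCTIONS = new Map([['+', 'plus']]);

// An argument is either a number or a nested function call.
function compile_argument(arg, destination, out) {
  if (Array.isArray(arg)) {
    compile_funcall(arg[0], arg.slice(1), destination, out);
  } else if (typeof arg === 'number') {
    out.push(`  MOV ${destination}, ${arg}`);
  } else {
    throw new Error(`Unsupported argument: ${arg}`);
  }
}

// Emit a call; if `destination` is set, copy the result out of RAX into it.
function compile_funcall(fun, args, destination, out) {
  const label = BUILTIN_FUNCTIONS.get(fun);
  if (!label) throw new Error(`Unknown function: ${fun}`);
  if (args.length > PARAM_REGISTERS.length) {
    throw new Error(`At most ${PARAM_REGISTERS.length} arguments are supported`);
  }
  const used = PARAM_REGISTERS.slice(0, args.length);
  used.forEach((reg) => out.push(`  PUSH ${reg}`));           // save
  args.forEach((arg, i) => compile_argument(arg, used[i], out)); // assign
  out.push(`  CALL ${label}`);
  used.slice().reverse().forEach((reg) => out.push(`  POP ${reg}`)); // restore
  if (destination) out.push(`  MOV ${destination}, RAX`);
}

// Emit the built-in plus function, then the code for the top-level call.
function compile_ast(ast) {
  const out = [
    'plus:',
    '  ADD RDI, RSI',
    '  MOV RAX, RDI',
    '  RET',
    'start:', // hypothetical label; the real entry label depends on the toolchain
  ];
  compile_funcall(ast[0], ast.slice(1), null, out);
  return out.join('\n');
}

console.log(compile_ast(['+', 3, ['+', 10, 2]]));
```

Passing a destination register down through `compile_argument` lets a nested call place its result directly where the outer call expects its parameter, so no temporaries are needed. Because the inner call restores RDI and RSI before copying RAX into its destination, the outer call's already-assigned parameters survive.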
24. Syscalls
● Special functions handled by the kernel
● Allow user-land programs to get access to kernel resources
● Each syscall is identified by a number, which differs per kernel
○ Linux: 1 -> write, 60 -> exit
○ FreeBSD: 4 -> write, 1 -> exit
○ macOS: 0x2000004 -> write, 0x2000001 -> exit
■ (0x2000000 plus the FreeBSD syscall number)
● Invoked with the SYSCALL instruction, much like CALL, but with the syscall number stored in RAX beforehand
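As a sketch of how a code generator might use these numbers, the snippet below picks the exit syscall for the current platform and emits the instructions that end the program with the computed result as its exit status. The table contents come from the list above; the names `SYSCALLS` and `emitExit` are hypothetical, and the sketch assumes the result to report is already in RAX.

```javascript
// Syscall numbers keyed by the platform names Node.js reports in process.platform.
const SYSCALLS = new Map([
  ['linux', { write: 1, exit: 60 }],
  ['freebsd', { write: 4, exit: 1 }],
  ['darwin', { write: 0x2000004, exit: 0x2000001 }], // 0x2000000 + FreeBSD number
]);

// Hypothetical helper: emit the instructions that exit with RAX as the status.
function emitExit(platform = process.platform) {
  const table = SYSCALLS.get(platform);
  if (!table) throw new Error(`No syscall table for platform: ${platform}`);
  return [
    'MOV RDI, RAX',                          // first syscall argument: exit status
    `MOV RAX, 0x${table.exit.toString(16)}`, // syscall number goes in RAX
    'SYSCALL',
  ];
}

console.log(emitExit('darwin').join('\n'));
```

Since the exit status is how the result is observed (for example via `echo $?` in a shell), only the low 8 bits of the value are visible on these systems.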
27. Improvements? Changes?
● Error messages!!
○ Track line and column numbers in parsing
○ A parser generator would not be much more useful here, especially if we get into read macros
● Comments/source in generated code
● Link against libc for additional functionality/bugs
○ Sockets, threads, string utilities, memory allocation, etc.
● Target C or LLVM IR instead
○ Infinite locals! Simpler output!
● Tests!
28. Further reading
● x86_64 calling convention
● macOS assembly programming
○ Stack alignment on macOS
○ Syscalls on macOS
● CHICKEN Scheme compilation process
● LLVM compiler tutorials
● Destination-driven code generation
○ Kent Dybvig’s original paper
○ One-pass code generation in V8
29. Source, blog post
● https://github.com/eatonphil/ulisp
● http://notes.eatonphil.com/compiler-basics-lisp-to-assembly.html