“lisp” to assembly
Compiler basics
Phil Eaton
Engineer & Manager at Capsule8
phil@capsule8.com
@phil_eaton
The premise
● Compile a subset of lisp to assembly
● Using JavaScript/Node.js
● Without any third-party JavaScript libraries
● Without any third-party C/assembly libraries (e.g. libc)
○ But GCC instead of NASM/FASM to simplify development on macOS
● In under an hour
To demonstrate
● Common compiler architecture
● Basic assembly is not hard
● Starting a compiler is not hard
● Improving your compiler is not hard
● You can (and should) write a compiler too!
What we’ll cover
● Parsing
● Code generation
○ Assembly
○ Syscalls
What we’ll omit
● (Custom) function definitions
● Non-symbol/-numeric data types
● More than 3 function arguments
● A whole lot of safety
● A whole lot of error messaging
Before we start coding...
Specify the language
● S-expressions/prefix notation for syntax
● Output is assembly
● Goal:
○ Compiler Input: (+ 3 (+ 1 2))
○ Generated Process 1: (+ 3 (+ 1 2))
○ Generated Process 2: (+ 3 3)
○ Generated Process 3: 6
○ Generated Output: 6
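The reduction steps above can be mirrored by a tiny reference evaluator over the nested-array form that the parser slides use later. This is only a sketch for checking expected results: `evaluate` is a hypothetical helper, not part of ulisp, and it assumes `+` on numbers is the only supported function.

```javascript
// Hypothetical reference evaluator (not from the talk): reduces an AST such as
// ['+', 3, ['+', 1, 2]] in the same order the generated process should.
function evaluate(ast) {
  // A bare number is already fully reduced.
  if (typeof ast === 'number') return ast;

  const [fn, ...args] = ast;
  if (fn !== '+') throw new Error(`Unsupported function: ${fn}`);

  // Reduce inner expressions first, then add the results.
  return args.map(evaluate).reduce((sum, n) => sum + n, 0);
}

console.log(evaluate(['+', 3, ['+', 1, 2]])); // 6
```

Comparing the compiled program's exit status against this value is one cheap way to test the compiler.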
Desired use
$ node ulisp.js '(+ 3 (+ 1 2))' > prog.S
$ gcc -mstackrealign -masm=intel prog.S
$ ./a.out
$ echo $?
6
Let’s dig in!
Writing the parser
● Parser takes a string
● Accumulates “tokens”
● Produces Abstract Syntax Tree (AST)
● Goal:
○ Input (string): "(+ 3 (+ 10 2))"
○ Output (JavaScript): ["+", 3, ["+", 10, 2]]
● Strategy:
○ Iterate over each character
○ Recurse on left parenthesis
○ Accumulate on space and right parenthesis
parser.js
● We write
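The slide leaves the parser to live coding. A minimal sketch of the strategy above (iterate over characters, recurse on a left parenthesis, accumulate on space and right parenthesis) might look like the following. It is an illustration under those assumptions, not necessarily the code written in the talk; the `flush` helper is a name introduced here.

```javascript
// Sketch of a recursive parser. Returns a pair:
// [tokens parsed at this nesting level, unparsed remainder of the input].
function parse(program) {
  const tokens = [];
  let current = '';

  // Push the accumulated token (if any), converting numeric text to a number.
  const flush = () => {
    if (current === '') return;
    const n = Number(current);
    tokens.push(Number.isNaN(n) ? current : n);
    current = '';
  };

  for (let i = 0; i < program.length; i++) {
    const char = program[i];
    if (char === '(') {
      // Recurse on the text after '(' and continue with what the child left over.
      const [nested, rest] = parse(program.slice(i + 1));
      tokens.push(nested);
      program = rest;
      i = -1; // the loop's i++ restarts at index 0 of the remainder
    } else if (char === ')') {
      flush();
      return [tokens, program.slice(i + 1)];
    } else if (char === ' ') {
      flush();
    } else {
      current += char;
    }
  }

  flush();
  return [tokens, ''];
}

console.log(JSON.stringify(parse('(+ 3 (+ 10 2))')));
// [[["+",3,["+",10,2]]],""]
```

The extra level of nesting in the result comes from treating the top level as a list of expressions; the first element is the AST for the program.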
Testing it
$ node
> const { parse } = require('./parser');
undefined
> console.log(JSON.stringify(parse('(+ 3 (+ 10 2)')));
[[["+",3,["+",10,2]]],""]
So... when do we compile?
First...
Basic Assembly
● Alternate representation of binary (human-readable)
○ Basically
● Fixed set of registers (think: global integer variables)
○ e.g. RDI, RSI, RAX, etc.
○ Plus program memory (a stack)
● Numerous built-in operations
○ e.g. ADD, SUB, PUSH, POP, etc.
● Assign via MOV
○ e.g. MOV RDI, 1
● “function” calls via CALL/RET
And now, the dry part...
Calling convention: Background
● Assume System V AMD64 ABI
● Remember registers are:
○ Global
○ Finite
○ Faster (than stack)
● Function caller and callee must agree who preserves which register values
Calling convention: Caller
● Registers RDI, RSI, RDX, … are stored on the stack
● Parameter values are assigned to RDI, RSI, RDX, …
● Function is called
● Stack is popped into …, RDX, RSI, RDI to restore prior values
● Function return value is available in RAX
Calling convention: Callee
● Preserve any registers it uses other than RDI, RSI, RDX, etc.
● Body logic
● Return value stored in RAX
● Restore preserved registers before RET
Show me an example!
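One way to see why the caller saves and restores registers is to model them in JavaScript as the "global integer variables" described earlier. This is a toy simulation, not real assembly: `registers`, `stack`, `plus`, and `callPlus` are names made up for the illustration.

```javascript
// Toy model: registers are global variables and the stack is an array.
const registers = { RDI: 7, RSI: 8, RDX: 9, RAX: 0 };
const stack = [];

// Callee: adds its two parameters. It overwrites RDI along the way,
// which is allowed because the caller is responsible for saving it.
function plus() {
  registers.RDI += registers.RSI;
  registers.RAX = registers.RDI; // return value goes in RAX
}

// Caller: follows the four caller-side steps from the slides.
function callPlus(a, b) {
  // 1. Store the parameter registers on the stack.
  stack.push(registers.RDI, registers.RSI, registers.RDX);
  // 2. Assign parameter values.
  registers.RDI = a;
  registers.RSI = b;
  // 3. Call the function.
  plus();
  // 4. Pop in reverse order to restore the prior values.
  registers.RDX = stack.pop();
  registers.RSI = stack.pop();
  registers.RDI = stack.pop();
  // The return value is available in RAX.
  return registers.RAX;
}

console.log(callPlus(1, 2)); // 3
console.log(registers.RDI, registers.RSI, registers.RDX); // 7 8 9
```

Without steps 1 and 4, the caller's previous RDI value (7 here) would be lost after the call.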
Writing the code generator
● Goal:
○ Take an AST (e.g. ["+", 3, ["+", 10, 2]])
○ Produce an assembly program that computes this expression and exits with the result
● Strategy:
○ Only supported AST elements are function calls and arguments
■ Arguments are numbers or function calls and arguments
○ Break out code generation into chunks by kind of AST element being compiled
■ E.g. compile_ast, compile_funcall, compile_argument
■ Include plus function as a built-in
compiler.js
● We write
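As with the parser, the code generator is written live. The sketch below follows the strategy above and should be read as one possible shape, not the talk's exact code. Assumptions made here: the entry label is `main`, the exit syscall number is the Linux one (60), only `+` with up to three arguments is supported, and the function names are camel-cased variants of the ones on the slide. The emitted text has not been run through an assembler as part of this sketch.

```javascript
// Registers that carry the first three parameters (System V AMD64).
const PARAM_REGISTERS = ['RDI', 'RSI', 'RDX'];
// Lisp function name -> assembly label of the built-in.
const BUILTINS = new Map([['+', 'plus']]);

// An argument is either a number or a nested function call.
function compileArgument(arg, destination, out) {
  if (Array.isArray(arg)) {
    compileCall(arg[0], arg.slice(1), destination, out);
  } else {
    out.push(`  MOV ${destination}, ${arg}`);
  }
}

// Emit the caller-side convention around a call to a built-in.
function compileCall(fn, args, destination, out) {
  if (!BUILTINS.has(fn)) throw new Error(`Unknown function: ${fn}`);
  if (args.length > PARAM_REGISTERS.length) {
    throw new Error('More than 3 arguments is not supported');
  }
  // Save parameter registers, fill them, call, then restore in reverse.
  PARAM_REGISTERS.forEach((reg) => out.push(`  PUSH ${reg}`));
  args.forEach((arg, i) => compileArgument(arg, PARAM_REGISTERS[i], out));
  out.push(`  CALL ${BUILTINS.get(fn)}`);
  [...PARAM_REGISTERS].reverse().forEach((reg) => out.push(`  POP ${reg}`));
  // Move the result to where the enclosing call expects it.
  if (destination) out.push(`  MOV ${destination}, RAX`);
}

function compile(ast) {
  const out = [
    '  .global main',
    '  .text',
    'plus:',
    '  ADD RDI, RSI',
    '  MOV RAX, RDI',
    '  RET',
    'main:',
  ];
  compileCall(ast[0], ast.slice(1), null, out);
  // Exit with the result as the status code (Linux syscall number assumed).
  out.push('  MOV RDI, RAX', '  MOV RAX, 60', '  SYSCALL');
  return out.join('\n') + '\n';
}

console.log(compile(['+', 3, ['+', 1, 2]]));
```

Because every nested call saves and restores the parameter registers itself, an already-computed first argument in RDI survives while the second argument is being evaluated.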
Communicating outside the process?
Syscalls
● Special functions handled by the kernel
● Allow user-land programs to get access to kernel resources
● Syscall identified by a number, differs per kernel
○ Linux: 1 -> write, 60 -> exit
○ FreeBSD: 4 -> write, 1 -> exit
○ macOS: 0x2000004 -> write, 0x2000001 -> exit
■ (0x2000000 plus the FreeBSD syscall number)
● Used like CALL, but syscall number stored in RAX beforehand
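The per-kernel numbers above could be kept in a small lookup table so the compiler emits the right exit sequence for its target. This is a hedged sketch: `SYSCALLS` and `emitExit` are hypothetical names, and the platform keys are chosen to match the strings Node.js reports in `process.platform`.

```javascript
// Syscall numbers as listed on the slide, keyed by platform name.
const SYSCALLS = {
  linux: { write: 1, exit: 60 },
  freebsd: { write: 4, exit: 1 },
  // macOS: 0x2000000 plus the FreeBSD syscall number.
  darwin: { write: 0x2000000 + 4, exit: 0x2000000 + 1 },
};

// Emit assembly lines that exit with the value held in `statusRegister`.
function emitExit(platform, statusRegister = 'RAX') {
  if (!Object.prototype.hasOwnProperty.call(SYSCALLS, platform)) {
    throw new Error(`Unsupported platform: ${platform}`);
  }
  const { exit } = SYSCALLS[platform];
  return [
    `  MOV RDI, ${statusRegister}`,      // first argument: exit status
    `  MOV RAX, 0x${exit.toString(16)}`, // syscall number goes in RAX
    '  SYSCALL',
  ];
}

console.log(emitExit('linux').join('\n'));
```

Passing `process.platform` as the first argument would select the table for the machine the compiler runs on, which is only correct when that machine is also the target.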
Generating a binary
$ node ulisp.js '(+ 3 (+ 1 2))' > prog.s
$ gcc -mstackrealign -masm=intel prog.s
$ ./a.out
$ echo $?
6
$ node ulisp.js '(+ 8 (+ 10 4))' > prog.s
$ gcc -mstackrealign -masm=intel prog.s
$ ./a.out
$ echo $?
22
We did it!
Improvements? Changes?
● Error messages!!
○ Track line and column numbers in parsing
○ A parser generator would not be much more useful, especially if we get into read macros
● Comments/source in generated code
● Link against libc for additional functionality/bugs
○ Sockets, threads, string utilities, memory allocation, etc.
● Target C or LLVM IR instead
○ Infinite locals! Simpler output!
● Tests!
Further reading
● x86_64 calling convention
● macOS assembly programming
○ Stack alignment on macOS
○ Syscalls on macOS
● CHICKEN Scheme compilation process
● LLVM compiler tutorials
● Destination-driven code generation
○ Kent Dybvig’s original paper
○ One-pass code generation in V8
Source, blog post
● https://github.com/eatonphil/ulisp
● http://notes.eatonphil.com/compiler-basics-lisp-to-assembly.html
Questions?
Thank you!

Compile Lisp to Assembly in an Hour


Editor's Notes

  1. What is a compiler? To me: any process that takes one programming language and produces another
  2. Cover “basics of”
  3. Sexps simplify the syntax
  4. Sometimes this stage is split out into lexing (building up tokens) and parsing, but our language is so simple right now we don’t need that
  5. The System V AMD64 ABI is used by macOS, FreeBSD, Linux, etc. “Global” means accessible by any “function”
  6. Gets more complicated when we need to deal with more than three (or six) parameters, will ignore
  7. Will not need to worry about this in our live coding
  8. First write compile_call, following the steps from the previous slide. Then write compile_arguments, which calls compile_call or MOVs a number into a register
  9. Kernel resource examples: writing to stdout, writing to disk, binding a socket to a port, etc.