Crystal internals
Part 1
Is a compiler a hard thing?
At Manas we usually do webapps
Let’s talk about webapps...
● HTML/CSS/JS
● React/Angular/Knockout
● Ruby/Erlang/Elixir
● Database (mysql/postgres)
● Elasticsearch
● Redis/Sidekiq/Background-jobs
● Docker, capistrano, deploy, servers
Easy…?
Let’s talk about compilers...
● HTML/CSS/JS
● React/Angular/Knockout
● Ruby/Erlang/Elixir
● Database (mysql/postgres)
● Elasticsearch
● Redis/Sidekiq/Background-jobs
● Docker, capistrano, deploy, servers
Easy!
No, let’s talk about usual programs
INPUT -> [PROCESSING…] -> OUTPUT
No, let’s talk about compilers
SOURCE CODE -> [PROCESSING…] -> EXECUTABLE
How do we go from source code to an executable?
Traditional stages of a compiler
class Foo
  def bar
    1 + 2
  end
end
● Lexer: [“class”, “Foo”, “;”, “def”, “bar”, “;”, “1”, “+”, “2”, “;”, “end”, “;”, “end”]
● Parser: ClassDef(“Foo”, body: [Def.new(“bar”)])
● Semantic (a.k.a. “type check”): make sure there are no type errors
● Codegen: generate machine code
Let’s start with the codegen phase
Goal: generate efficient assembly code for many architectures (32-bit, 64-bit,
Intel, ARM, etc.)
● Generating assembly code is hard
● Generating efficient assembly code is harder
● Generating assembly code for many architectures is hard/tedious/boring
Thus: writing a compiler is HARD! :-(
Well, not anymore...
Codegen
With LLVM, we generate LLVM IR (intermediate representation) instead of assembly,
and LLVM takes care of generating efficient assembly code for us!
The hardest part is solved :-)
define i32 @add(i32 %x, i32 %y) {
  %sum = add i32 %x, %y
  ret i32 %sum
}
Codegen: LLVM (example)
LLVM provides a nice API to generate IR
require "llvm"
mod = LLVM::Module.new("main")
mod.functions.add("add", [LLVM::Int32, LLVM::Int32], LLVM::Int32) do |func|
  func.basic_blocks.append do |builder|
    res = builder.add(func.params[0], func.params[1])
    builder.ret(res)
  end
end
puts mod
Remaining phases
● Lexer
● Parser
● Semantic
Lexer
● Kind of easy: go char by char until we get a keyword, identifier, number, etc.
● We won’t go into implementation details...
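To make the "char by char" idea concrete, here is a minimal sketch in Python (chosen only for illustration; Crystal's real lexer is written in Crystal and is far more complete). The token kinds, the keyword set and the function name `lex` are assumptions of this sketch, not names from the compiler.

```python
# A toy lexer: walk the source character by character and group characters
# into (kind, text) tokens. Token kinds and keywords are illustrative only.
KEYWORDS = {"class", "def", "end"}

def lex(source):
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\n":
            # A newline separates statements; the slides display it as ";"
            tokens.append(("newline", ";"))
            i += 1
        elif ch.isspace():
            i += 1
        elif ch.isdigit():
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("number", source[start:i]))
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            word = source[start:i]
            tokens.append(("keyword" if word in KEYWORDS else "ident", word))
        else:
            # Anything else is treated as a one-character operator
            tokens.append(("op", ch))
            i += 1
    return tokens

tokens = lex("class Foo\ndef bar\n1 + 2\nend\nend")
print([text for _, text in tokens])
```

Running it on the `class Foo` snippet from the "Traditional stages" slide should give the same flat token list shown there.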
Parser
● Kind of easy: go token by token and create a tree of expressions
● This tree is called an AST: Abstract Syntax Tree
● An AST is like a directed, acyclic graph
● We won’t go into implementation details...
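As a rough illustration of "go token by token and create a tree", here is a small recursive-descent parser in Python. It is only a sketch: the two-rule grammar, the tuple-shaped nodes ("Number", "Call") and the function names are invented for this example and do not mirror Crystal's parser.

```python
# Toy recursive-descent parser: turns a flat token list into a nested AST.
# Illustrative grammar:
#   expr := term ("+" term)*
#   term := NUMBER ("*" NUMBER)*

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def number():
        nonlocal pos
        tok = peek()
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        pos += 1
        return ("Number", int(tok))

    def term():
        nonlocal pos
        node = number()
        while peek() == "*":
            pos += 1
            node = ("Call", node, "*", number())
        return node

    def expr():
        nonlocal pos
        node = term()
        while peek() == "+":
            pos += 1
            node = ("Call", node, "+", term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse(["1", "+", "2", "*", "3"]))
```

Each grammar rule becomes one function, and operator precedence falls out of which rule calls which: `*` binds tighter because `expr` delegates to `term`.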
Semantic
● This is the fundamental piece of the compiler
● It takes an AST as input and analyzes it
● Analysis can result in:
○ Declaring types: for example “class Foo; end” will declare a type Foo
○ Checking methods: for example “Foo.bar” will check that “Foo” is a declared type and that the method “bar” exists in it, and has the correct arity and types
○ Giving each non-dead expression in the program a type
○ Gathering some info for the codegen phase: for example, knowing the local variables of a method and their types
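The first two kinds of analysis listed above (declaring types, then checking calls against them) can be sketched in a few lines of Python. This is a toy under stated assumptions: the node shapes `("ClassDef", name, methods)` and `("Call", type, method, args)`, the `SemanticError` class and the `analyze` function are all hypothetical, and only arity is checked, not argument types.

```python
# Toy semantic pass: declare types first, then check that each call refers
# to a declared type and an existing method with the right arity.

class SemanticError(Exception):
    pass

def analyze(nodes):
    types = {}  # type name -> {method name: arity}

    # Pass 1: declarations
    for node in nodes:
        if node[0] == "ClassDef":
            _, name, methods = node
            types.setdefault(name, {}).update(methods)

    # Pass 2: checks that rely on the declarations gathered above
    for node in nodes:
        if node[0] == "Call":
            _, type_name, method, args = node
            if type_name not in types:
                raise SemanticError(f"undefined type {type_name}")
            if method not in types[type_name]:
                raise SemanticError(f"undefined method {method} for {type_name}")
            if types[type_name][method] != len(args):
                raise SemanticError(f"wrong number of arguments for {type_name}.{method}")
    return types

program = [("ClassDef", "Foo", {"bar": 0}), ("Call", "Foo", "bar", [])]
print(analyze(program))
```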
Semantic
● The interesting part of the compiler is the semantic phase
● It’s just about processing an AST
● In Crystal’s compiler you just need to know one language: Crystal!
● No HTML/CSS/JS/JSX/etc.
● No untyped, dynamic languages: no Ruby/Erlang/Elixir. Type safe!
● Stuff is processed in memory
● No databases, no Elasticsearch, no Redis
Writing a compiler is easier than writing a web app! ^_^
(Or at least it’s more fun :-P)
Directory layout
● src/compiler/crystal
○ command/ : the command line interface
○ syntax/ : lexer, parser, ast, visitor, transformer
○ semantic/ : type declaration, method lookup, etc.
○ macros/ : macro expansion logic
○ codegen/ : codegen
○ tools/ : doc generator, formatter, init
○ compiler.cr : combines syntax + semantic + codegen
○ types.cr : all possible types in Crystal (Int32, String, unions, custom types, etc.)
○ program.cr : holds definitions of a program (holds Int32, String, etc.)
Directory layout
● src/compiler/crystal : ~43K LOC
○ command/ : ~300 LOC
○ syntax/ : ~10K LOC
○ semantic/ : ~12K LOC
○ macros/ : ~2K LOC
○ codegen/ : ~6K LOC
○ tools/ : ~7K LOC
○ compiler.cr : ~300 LOC
○ types.cr : ~2K LOC
○ program.cr : ~300 LOC
About 14K LOC to analyze source code.
One big Rails app at Manas has 14K LOC in “./app”
A compiler can’t be that hard! ;-)
Show me the code
# src/compiler/crystal/compiler.cr
def compile(source : Source | Array(Source), output_filename : String) : Result
  source = [source] unless source.is_a?(Array)
  program = new_program(source)
  node = parse program, source
  node = program.semantic node, @stats
  codegen program, node, source, output_filename unless @no_codegen
  Result.new program, node
end
What is a program?
Program
● Holds all types and top-level methods for a given compilation
● For example, if I compile “class Foo; end” and you compile “class Bar; end”,
the first program will have a type named “Foo”, and the second one won’t (but
it will have a type named “Bar”)
● It lets us test the compiler more easily, because we can use different Program
instances for each snippet of code that we want to test
● This is in contrast to having global variables hold all of a program’s data
● A Program is passed around in all phases of a compilation (except lexing and
parsing, which don’t need semantic info)
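The design choice described here (one Program object per compilation instead of global state) can be pictured with a small Python sketch. The `Program` class below, its `types` set and the `compile_snippet` helper are hypothetical stand-ins and not the compiler's actual API; the "parsing" is a deliberately naive string split.

```python
# Each compilation gets its own Program object holding its types, rather
# than module-level globals shared by every compilation.

class Program:
    def __init__(self):
        # Every program starts with some built-in types
        self.types = {"Int32", "String"}

    def declare(self, name):
        self.types.add(name)

def compile_snippet(source):
    program = Program()
    for statement in source.split(";"):
        words = statement.split()
        if len(words) == 2 and words[0] == "class":
            program.declare(words[1])
    return program

mine = compile_snippet("class Foo; end")
yours = compile_snippet("class Bar; end")
print("Foo" in mine.types, "Foo" in yours.types)  # True False
```

Because nothing leaks between the two `Program` instances, a test suite can compile many snippets in one process without one snippet's types affecting another.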
Show me the code
# src/compiler/crystal/compiler.cr
def compile(source : Source | Array(Source), output_filename : String) : Result
  source = [source] unless source.is_a?(Array)
  program = new_program(source)
  node = parse program, source # from source to Crystal::ASTNode
  node = program.semantic node, @stats # Semantic! :-)
  codegen program, node, source, output_filename unless @no_codegen
  Result.new program, node
end
Semantic
● The entry point for semantic analysis is in
src/compiler/crystal/semantic.cr
● Other files are in src/compiler/crystal/semantic/
● The file semantic.cr has comments that explain the overall algorithm :-)
Semantic: overall algorithm
● top level: declare classes, modules, macros, defs and other top-level stuff
● new methods: create `new` methods for every `initialize` method
● type declarations: process type declarations like `@x : Int32`
● check abstract defs: check that abstract defs are implemented
● class_vars_initializers: process initializers like `@@x = 1`
● instance_vars_initializers: process initializers like `@x = 1`
● main: process "main" code, calls and method bodies (the whole program).
● cleanup: remove dead code and other simplifications
● check recursive structs: check that structs are not recursive (impossible to
codegen)
Note!
● This algorithm didn’t come from the skies (nor from a textbook, nor from a paper)
● It’s not written in stone!
● It can definitely be improved: readability, performance, etc.
● It’s actually more like this…
Semantic
But before looking at each phase, we need to learn about the most useful pattern
for analyzing an AST...
The Visitor pattern
require "compiler/crystal/syntax"

class SumVisitor < Crystal::Visitor
  getter sum = 0

  def visit(node : Crystal::NumberLiteral)
    @sum += node.value.to_i
  end

  def visit(node : Crystal::ASTNode)
    true # true: continue visiting children nodes
  end
end

ast = Crystal::Parser.parse("foo(1 + 2, 3, [4])")
visitor = SumVisitor.new
ast.accept(visitor)
puts visitor.sum
The Visitor pattern
● We define a visit method for each node of interest
● We process the nodes
● We return true if we want to process children, false otherwise
● Example: if we only want to process class declarations, we could just define
visit(node : Crystal::ClassDef) and define some logic there (and return true,
because of nested class definitions)
● A visitor abstracts over the way nodes are composed
● ...though in many cases, for semantic purposes, we need and use the way a
node is composed (for example, to analyze a call we need to know the
argument types, so we check the arguments, not all children in a generic way)
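For readers who want to try the same idea without a Crystal toolchain, Python's standard `ast` module offers a comparable visitor. This is an analogy, not Crystal's API: `ast.NodeVisitor` dispatches on method names (`visit_Constant`, `visit_Call`) instead of overloads, and traversal of children continues through `generic_visit` instead of returning true. The `visit_Call` method illustrates the last bullet: it walks the arguments explicitly instead of all children generically.

```python
import ast

class SumVisitor(ast.NodeVisitor):
    def __init__(self):
        self.sum = 0

    def visit_Constant(self, node):
        # Called for every literal; only integer literals are summed.
        # Literals have no children, so nothing more needs traversing.
        if isinstance(node.value, int):
            self.sum += node.value

    def visit_Call(self, node):
        # Node-specific traversal: visit only the arguments, not the callee
        for arg in node.args:
            self.visit(arg)

tree = ast.parse("foo(1 + 2, 3, [4])")
visitor = SumVisitor()
visitor.visit(tree)
print(visitor.sum)  # 10
```

Nodes without a dedicated method (here `Module`, `Expr`, `BinOp` and `List`) fall back to `generic_visit`, which visits every child.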
Top level: declare classes, modules, macros, defs...
# src/compiler/crystal/semantic/top_level_visitor.cr
class Crystal::TopLevelVisitor < Crystal::SemanticVisitor
  # ...
end
Crystal::SemanticVisitor
● Located in semantic_visitor.cr
● This is a base visitor used in most of the phases of the semantic analysis
● It keeps track of the “current type”
● For example, in “class Foo; class Bar; baz; end; end”, the “current type” starts at the top level (the Program). When “class Foo” is found, the current type becomes “Foo” (we search for “Foo” in the current type). When “class Bar” is found, the current type becomes “Foo::Bar” (we search for “Bar” in the current type). When “baz” is found, it will be looked up inside the current type.
● But initially there’s no “Foo” inside the current type (the Program). Who defines it? … The top-level visitor!
Crystal::TopLevelVisitor
● Located in top_level_visitor.cr
● Defines classes, methods, etc.
● Given “class Foo; class Bar; baz; end; end”...
● current_type starts at Program
● When “class Foo” is found (ClassDef), we check if “Foo” exists in the current type. If not, we create it. If it exists as a different kind of type (for example, it’s a module), we give an error.
● We attach this type “Foo” to the AST node ClassDef. SemanticVisitor will use this in every subsequent phase.
● … the “baz” call is not analyzed here (unless it’s a macro)
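The "current type" bookkeeping and the declare-or-reuse rule can be sketched together in Python. Everything here is hypothetical and simplified: `TypeNode`, the dictionary-shaped AST nodes and the `declare` function are invented for illustration, and methods, macros and the `baz` call are ignored.

```python
# Toy top-level pass: walk nested class/module definitions, track the
# current type, and declare each type inside it.

class TypeNode:
    def __init__(self, name, kind, parent=None):
        self.name, self.kind, self.parent = name, kind, parent
        self.types = {}  # nested types, by name

    def full_name(self):
        # Types directly under the program are not prefixed
        if self.parent is None or self.parent.parent is None:
            return self.name
        return f"{self.parent.full_name()}::{self.name}"

def declare(node, current_type):
    existing = current_type.types.get(node["name"])
    if existing is None:
        existing = TypeNode(node["name"], node["kind"], current_type)
        current_type.types[node["name"]] = existing
    elif existing.kind != node["kind"]:
        raise TypeError(f"{node['name']} is not a {node['kind']}")
    node["type"] = existing  # attach the resolved type for later passes
    for child in node["body"]:
        declare(child, existing)  # the current type is now `existing`
    return existing

program = TypeNode("Program", "program")
source = {"kind": "class", "name": "Foo",
          "body": [{"kind": "class", "name": "Bar", "body": []}]}
foo = declare(source, program)
print(foo.types["Bar"].full_name())  # Foo::Bar
```

Declaring the same class again reuses the existing `TypeNode`, which is how a reopened class ends up as a single type.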
Crystal::TopLevelVisitor
● Many other things done in this visitor: methods and macros are added to
types, aliases and enums are defined, etc.
● Question: why are methods and macros defined at this phase?
When should macros be defined and expanded
class Foo
  macro inherited
    do_something
  end

  macro do_something
    def foo; 1; end
  end
end

class Bar < Foo; end

class Baz < Foo
  def foo; 3; end
end

puts Bar.new.foo # => 1
puts Baz.new.foo # => 3
● The “inherited” macro hook must be processed as soon as “Bar < Foo” and “Baz < Foo” are found
● The macro expands to “do_something”, which must expand to “def foo; 1; end”
● This must happen before we continue processing Baz’s body: “def foo; 3; end” must win and be the method found when doing “Baz.new.foo”
● Conclusion: methods, macros and hooks must be defined in the first pass, when defining types. Additionally, macros might be looked up in types in this same pass (like “do_something”)
● SemanticVisitor takes care to look up and expand calls that resolve to macro calls
Method overloads
● Crystal methods are very powerful! For example: optional type restrictions,
different number of arguments, default arguments, splat, etc.
● When methods are added to types we need to:
○ Know if a method replaces (redefines) an old method
○ Track whether a method is “stricter” than another method, to quickly know, given a call’s argument types, in which order the overloads are going to be tested
Method restrictions
def foo(x : Int32)
  puts 1
end

def foo(x)
  puts 2
end

foo(1)
foo('a')
● Given foo(1), both methods match it. However, the first overload
should be invoked because it has a stronger restriction than the
second overload.
● If we define the methods in a different order, it still works the
same
● This is because an argument with a type restriction is stronger than
one without one. We say that the first one is a restriction of the
second one (we should probably rename this to use stronger)
● This applies to types too: Int32 is stronger than Int32 |
String. And Bar is stronger than Foo, if Bar < Foo.
● Given two methods with the same name, if all arguments of a
method are stronger than the others’, the whole method is stronger
and should come first. Each type stores an ordered list of methods
indexed by method name, with this notion.
● If the methods are both stronger than each other, they have the
same restriction.
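The ordering rule above can be sketched as a small overload table in Python. This is a simplified model and not the logic in restrictions.cr: a restriction is a Python class or `None` (no restriction), "stronger" is approximated with `issubclass`, and the helper names (`stronger_or_equal`, `add_overload`, `call`) are invented for this example.

```python
# Toy overload table: a list of (restrictions, function) pairs kept ordered
# so that stricter overloads are tried first.

def stronger_or_equal(a, b):
    """True if restriction `a` accepts no more values than restriction `b`."""
    if b is None:
        return True   # anything is at least as strong as "no restriction"
    if a is None:
        return False
    return issubclass(a, b)

def method_stronger(r1, r2):
    return len(r1) == len(r2) and all(
        stronger_or_equal(a, b) for a, b in zip(r1, r2))

def add_overload(overloads, restrictions, func):
    for i, (existing, _) in enumerate(overloads):
        if method_stronger(restrictions, existing):
            if method_stronger(existing, restrictions):
                overloads[i] = (restrictions, func)  # same restriction: redefine
            else:
                overloads.insert(i, (restrictions, func))  # stricter goes first
            return
    overloads.append((restrictions, func))

def call(overloads, *args):
    for restrictions, func in overloads:
        if len(restrictions) == len(args) and all(
                r is None or isinstance(arg, r)
                for r, arg in zip(restrictions, args)):
            return func(*args)
    raise TypeError("no overload matches")

foo = []
add_overload(foo, (None,), lambda x: 2)  # def foo(x)
add_overload(foo, (int,), lambda x: 1)   # def foo(x : Int32), defined later
print(call(foo, 1), call(foo, "a"))      # 1 2
```

The unrestricted overload is registered first on purpose: the table still tries the `int` overload first, so definition order does not change which method is called.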
Method restrictions
● This logic is located at restrictions.cr
● A lot of cases to consider: generics, tuples, splats, etc.
● The code and algorithms could probably use a simpler, unified logic
and a cleanup, but first all of these concepts and definitions must be
defined much more formally
Semantic: new methods
● Located in new.cr
● TopLevelVisitor creates a `new` class method for every `initialize` method it finds (the logic for this is also in new.cr)
● Classes that end up without an `initialize` need a default, argless `self.new` method
● This phase is a bit messy right now because of some missing things related to generics…
class Foo
  def initialize(x : Int32)
    @x = x
  end

  # Generated from the above
  def self.new(x : Int32)
    instance = allocate
    instance.initialize(x)
    if instance.responds_to?(:finalize)
      ::GC.add_finalizer(instance)
    end
    instance # return the newly created object
  end
end
Semantic: type declarations
● Located in type_declaration_processor.cr (and type_declaration_visitor.cr and type_guess_visitor.cr)
● Combines info gathered by these two visitors to declare the types of instance and class variables.
● TypeDeclarationVisitor deals with explicit type declarations
● TypeGuessVisitor tries to “guess” the types of instance and class variables without explicit type annotations (for example @x = 1 and @x = Foo.new)
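The idea of guessing a variable's type from the shape of its assignment, with explicit declarations taking priority, might look like this in Python. It is a rough sketch covering only the two examples from the slide (an integer literal and `Foo.new`); the functions `guess_type` and `declared_type` are hypothetical, and the real TypeGuessVisitor works on AST nodes, not strings.

```python
import re

def guess_type(assignment):
    """Guess a type from the right-hand side of `@x = value`, by syntax only."""
    _, value = (part.strip() for part in assignment.split("=", 1))
    if re.fullmatch(r"-?\d+", value):
        return "Int32"  # an integer literal such as `1`
    match = re.fullmatch(r"([A-Z]\w*(?:::[A-Z]\w*)*)\.new(\(.*\))?", value)
    if match:
        return match.group(1)  # assume `Foo.new` returns a Foo
    return None  # cannot guess; an explicit declaration would be needed

def declared_type(name, explicit, assignments):
    # Explicit declarations such as `@x : Int32` take priority over guesses
    if name in explicit:
        return explicit[name]
    return guess_type(assignments[name]) if name in assignments else None

print(guess_type("@x = 1"), guess_type("@x = Foo.new"), guess_type("@x = bar(1)"))
```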
Semantic: check abstract defs
● Located in abstract_def_checker.cr
● Not a visitor: it traverses all types and, for those that have abstract defs, checks that subclasses or including types define those methods
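Since this check is a loop over types instead of an AST traversal, it is easy to sketch in Python. The dictionary layout (`parent`, `abstract`, `defs`) and the function name are assumptions of this example, and it only looks at direct subclasses; the real checker also has to deal with modules, overloads and type restrictions.

```python
# Toy abstract-def check: for every type with abstract defs, verify that
# each direct subclass defines those methods.

def check_abstract_defs(types):
    errors = []
    for name, info in types.items():
        for method in info.get("abstract", []):
            for sub_name, sub in types.items():
                if sub.get("parent") == name and method not in sub.get("defs", []):
                    errors.append(
                        f"abstract def {name}#{method} must be implemented by {sub_name}")
    return errors

types = {
    "Shape": {"parent": None, "abstract": ["area"], "defs": []},
    "Circle": {"parent": "Shape", "defs": ["area"]},
    "Square": {"parent": "Shape", "defs": []},
}
print(check_abstract_defs(types))
```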

More Related Content

What's hot

High Performance Computing on NYC Yellow Taxi Data Set
High Performance Computing on NYC Yellow Taxi Data SetHigh Performance Computing on NYC Yellow Taxi Data Set
High Performance Computing on NYC Yellow Taxi Data SetParag Ahire
 
Building flexible ETL pipelines with Apache Camel on Quarkus
Building flexible ETL pipelines with Apache Camel on QuarkusBuilding flexible ETL pipelines with Apache Camel on Quarkus
Building flexible ETL pipelines with Apache Camel on QuarkusIvelin Yanev
 
Deep Dive into the New Features of Apache Spark 3.1
Deep Dive into the New Features of Apache Spark 3.1Deep Dive into the New Features of Apache Spark 3.1
Deep Dive into the New Features of Apache Spark 3.1Databricks
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Databricks
 
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago MolaThe Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago MolaSpark Summit
 
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, AdjustShipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, AdjustAltinity Ltd
 
Improving Spark SQL at LinkedIn
Improving Spark SQL at LinkedInImproving Spark SQL at LinkedIn
Improving Spark SQL at LinkedInDatabricks
 
Low Code Integration with Apache Camel.pdf
Low Code Integration with Apache Camel.pdfLow Code Integration with Apache Camel.pdf
Low Code Integration with Apache Camel.pdfClaus Ibsen
 
Open core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageOpen core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageJulien Le Dem
 
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...confluent
 
Migrating ETL Workflow to Apache Spark at Scale in Pinterest
Migrating ETL Workflow to Apache Spark at Scale in PinterestMigrating ETL Workflow to Apache Spark at Scale in Pinterest
Migrating ETL Workflow to Apache Spark at Scale in PinterestDatabricks
 
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...Databricks
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Mapping Data Flows Training deck Q1 CY22
Mapping Data Flows Training deck Q1 CY22Mapping Data Flows Training deck Q1 CY22
Mapping Data Flows Training deck Q1 CY22Mark Kromer
 
Exadata
ExadataExadata
Exadatatalek
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkEnabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkKazuaki Ishizaki
 
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...Seattle Apache Flink Meetup
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataDataWorks Summit/Hadoop Summit
 
Airbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackAirbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackMichel Tricot
 

What's hot (20)

High Performance Computing on NYC Yellow Taxi Data Set
High Performance Computing on NYC Yellow Taxi Data SetHigh Performance Computing on NYC Yellow Taxi Data Set
High Performance Computing on NYC Yellow Taxi Data Set
 
Building flexible ETL pipelines with Apache Camel on Quarkus
Building flexible ETL pipelines with Apache Camel on QuarkusBuilding flexible ETL pipelines with Apache Camel on Quarkus
Building flexible ETL pipelines with Apache Camel on Quarkus
 
Deep Dive into the New Features of Apache Spark 3.1
Deep Dive into the New Features of Apache Spark 3.1Deep Dive into the New Features of Apache Spark 3.1
Deep Dive into the New Features of Apache Spark 3.1
 
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
Deep Dive into Stateful Stream Processing in Structured Streaming with Tathag...
 
The Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago MolaThe Pushdown of Everything by Stephan Kessler and Santiago Mola
The Pushdown of Everything by Stephan Kessler and Santiago Mola
 
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, AdjustShipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
Shipping Data from Postgres to Clickhouse, by Murat Kabilov, Adjust
 
Spark vs Hadoop
Spark vs HadoopSpark vs Hadoop
Spark vs Hadoop
 
Improving Spark SQL at LinkedIn
Improving Spark SQL at LinkedInImproving Spark SQL at LinkedIn
Improving Spark SQL at LinkedIn
 
Low Code Integration with Apache Camel.pdf
Low Code Integration with Apache Camel.pdfLow Code Integration with Apache Camel.pdf
Low Code Integration with Apache Camel.pdf
 
Open core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineageOpen core summit: Observability for data pipelines with OpenLineage
Open core summit: Observability for data pipelines with OpenLineage
 
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
KSQL-ops! Running ksqlDB in the Wild (Simon Aubury, ThoughtWorks) Kafka Summi...
 
Migrating ETL Workflow to Apache Spark at Scale in Pinterest
Migrating ETL Workflow to Apache Spark at Scale in PinterestMigrating ETL Workflow to Apache Spark at Scale in Pinterest
Migrating ETL Workflow to Apache Spark at Scale in Pinterest
 
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
The Killer Feature Store: Orchestrating Spark ML Pipelines and MLflow for Pro...
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Mapping Data Flows Training deck Q1 CY22
Mapping Data Flows Training deck Q1 CY22Mapping Data Flows Training deck Q1 CY22
Mapping Data Flows Training deck Q1 CY22
 
Exadata
ExadataExadata
Exadata
 
Enabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache SparkEnabling Vectorized Engine in Apache Spark
Enabling Vectorized Engine in Apache Spark
 
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
 
Apache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing dataApache Beam: A unified model for batch and stream processing data
Apache Beam: A unified model for batch and stream processing data
 
Airbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stackAirbyte @ Airflow Summit - The new modern data stack
Airbyte @ Airflow Summit - The new modern data stack
 

Similar to Crystal internals (part 1)

Dart the Better JavaScript
Dart the Better JavaScriptDart the Better JavaScript
Dart the Better JavaScriptJorg Janke
 
Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and...
Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and...Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and...
Apidays Paris 2023 - Forget TypeScript, Choose Rust to build Robust, Fast and...apidays
 
(1) c sharp introduction_basics_dot_net
(1) c sharp introduction_basics_dot_net(1) c sharp introduction_basics_dot_net
(1) c sharp introduction_basics_dot_netNico Ludwig
 
[Td 2015] what is new in visual c++ 2015 and future directions(ulzii luvsanba...
[Td 2015] what is new in visual c++ 2015 and future directions(ulzii luvsanba...[Td 2015] what is new in visual c++ 2015 and future directions(ulzii luvsanba...
[Td 2015] what is new in visual c++ 2015 and future directions(ulzii luvsanba...Sang Don Kim
 
Lecture 1 introduction to language processors
Lecture 1  introduction to language processorsLecture 1  introduction to language processors
Lecture 1 introduction to language processorsRebaz Najeeb
 
Road to sbt 1.0 paved with server
Road to sbt 1.0   paved with serverRoad to sbt 1.0   paved with server
Road to sbt 1.0 paved with serverEugene Yokota
 
Road to sbt 1.0: Paved with server (2015 Amsterdam)
Road to sbt 1.0: Paved with server (2015 Amsterdam)Road to sbt 1.0: Paved with server (2015 Amsterdam)
Road to sbt 1.0: Paved with server (2015 Amsterdam)Eugene Yokota
 
Sugar Presentation - YULHackers March 2009
Sugar Presentation - YULHackers March 2009Sugar Presentation - YULHackers March 2009
Sugar Presentation - YULHackers March 2009spierre
 
Power Leveling your TypeScript
Power Leveling your TypeScriptPower Leveling your TypeScript
Power Leveling your TypeScriptOffirmo
 
TI1220 Lecture 14: Domain-Specific Languages
TI1220 Lecture 14: Domain-Specific LanguagesTI1220 Lecture 14: Domain-Specific Languages
TI1220 Lecture 14: Domain-Specific LanguagesEelco Visser
 
mloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game developmentmloc.js 2014 - JavaScript and the browser as a platform for game development
mloc.js 2014 - JavaScript and the browser as a platform for game developmentDavid Galeano
 
Software Development Automation With Scripting Languages
Software Development Automation With Scripting LanguagesSoftware Development Automation With Scripting Languages
Software Development Automation With Scripting LanguagesIonela
 
Language Server Protocol - Why the Hype?
Language Server Protocol - Why the Hype?Language Server Protocol - Why the Hype?
Language Server Protocol - Why the Hype?mikaelbarbero
 
Structure-Compiler-phases information about basics of compiler. Pdfpdf
Structure-Compiler-phases information  about basics of compiler. PdfpdfStructure-Compiler-phases information  about basics of compiler. Pdfpdf
Structure-Compiler-phases information about basics of compiler. Pdfpdfovidlivi91
 
Build Great Networked APIs with Swift, OpenAPI, and gRPC
Build Great Networked APIs with Swift, OpenAPI, and gRPCBuild Great Networked APIs with Swift, OpenAPI, and gRPC
Build Great Networked APIs with Swift, OpenAPI, and gRPCTim Burks
 
Compiler_Lecture1.pdf
Compiler_Lecture1.pdfCompiler_Lecture1.pdf
Compiler_Lecture1.pdfAkarTaher
 

Crystal internals (part 1)

  • 2. Is a compiler a hard thing?
  • 3. At Manas we usually do webapps
  • 4. Let’s talk about webapps...
  • 5. Let’s talk about webapps... ● HTML/CSS/JS ● React/Angular/Knockout ● Ruby/Erlang/Elixir ● Database (mysql/postgres) ● Elasticsearch ● Redis/Sidekiq/Background-jobs ● Docker, capistrano, deploy, servers
  • 6. Let’s talk about webapps... ● HTML/CSS/JS ● React/Angular/Knockout ● Ruby/Erlang/Elixir ● Database (mysql/postgres) ● Elasticsearch ● Redis/Sidekiq/Background-jobs ● Docker, capistrano, deploy, servers Easy…?
  • 7. Let’s talk about compilers... ● HTML/CSS/JS ● React/Angular/Knockout ● Ruby/Erlang/Elixir ● Database (mysql/postgres) ● Elasticsearch ● Redis/Sidekiq/Background-jobs ● Docker, capistrano, deploy, servers Easy!
  • 8. Let’s talk about compilers...
  • 10. No, let’s talk about usual programs
  • 11. No, let’s talk about usual programs INPUT -> [PROCESSING…] -> OUTPUT
  • 12. No, let’s talk about compilers SOURCE CODE -> [PROCESSING…] -> EXECUTABLE
  • 13. No, let’s talk about compilers SOURCE CODE -> [PROCESSING…] -> EXECUTABLE How do we go from source code to an executable?
  • 14. Traditional stages of a compiler class Foo def bar 1 + 2 end end ● Lexer: [“class”, “Foo”, “;”, “def”, “bar”, “;”, “1”, “+”, “2”, “;”, “end”, “;”, “end”] ● Parser: ClassDef(“Foo”, body: [Def.new(“bar”)]) ● Semantic (a.k.a “type check”): make sure there are no type errors ● Codegen: generate machine code
  • 15. Let’s start with the codegen phase Goal: generate efficient assembly code for many architectures (32 bits, 64 bits, intel, arm, etc.) ● Generating assembly code is hard ● Generating efficient assembly code is harder ● Generating assembly code for many architectures is hard/tedious/boring
  • 16. Let’s start with the codegen phase Goal: generate efficient assembly code for many architectures (32 bits, 64 bits, intel, arm, etc.) ● Generating assembly code is hard ● Generating efficient assembly code is harder ● Generating assembly code for many architectures is hard/tedious/boring Thus: writing a compiler is HARD! :-(
  • 17. Let’s start with the codegen phase Goal: generate efficient assembly code for many architectures (32 bits, 64 bits, intel, arm, etc.) ● Generating assembly code is hard ● Generating efficient assembly code is harder ● Generating assembly code for many architectures is hard/tedious/boring Thus: writing a compiler is HARD! :-( Well, not anymore...
  • 20. Codegen: With LLVM, we generate LLVM IR (intermediate representation) instead of assembly, and LLVM takes care of generating efficient assembly code for us! The hardest part is solved :-)
  • 21. Codegen: LLVM (example) define i32 @add(i32 %x, i32 %y) { %0 = add i32 %x, %y ret i32 %0 }
  • 22. LLVM provides a nice API to generate IR require "llvm" mod = LLVM::Module.new("main") mod.functions.add("add", [LLVM::Int32, LLVM::Int32], LLVM::Int32) do |func| func.basic_blocks.append do |builder| res = builder.add(func.params[0], func.params[1]) builder.ret(res) end end puts mod
  • 23. Remaining phases ● Lexer ● Parser ● Semantic
  • 24. Lexer ● Kind of easy: go char by char until we get a keyword, identifier, number, etc. ● We won’t go into implementation details...
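The "go char by char" idea can be sketched in a few lines of Crystal. This is a toy illustration only, not the code in syntax/: the `lex` method, its token format (plain strings) and the choice to emit ";" for newlines are assumptions made to mirror the token list shown on the "Traditional stages of a compiler" slide.

```crystal
# Toy lexer sketch (hypothetical; the real lexer lives in src/compiler/crystal/syntax/).
# It walks the source one character at a time and groups characters into tokens.
def lex(source : String) : Array(String)
  tokens = [] of String
  i = 0
  while i < source.size
    char = source[i]
    if char == '\n'
      # A newline ends a statement; emit it as ";" like on the slide.
      tokens << ";"
      i += 1
    elsif char.whitespace?
      # Other whitespace only separates tokens.
      i += 1
    elsif char.ascii_letter?
      # Keyword or identifier: consume letters and digits.
      start = i
      while i < source.size && source[i].ascii_alphanumeric?
        i += 1
      end
      tokens << source[start...i]
    elsif char.ascii_number?
      # Integer literal: consume digits.
      start = i
      while i < source.size && source[i].ascii_number?
        i += 1
      end
      tokens << source[start...i]
    else
      # Anything else (for example "+") is a one-character token.
      tokens << char.to_s
      i += 1
    end
  end
  tokens
end

puts lex("class Foo\ndef bar\n1 + 2\nend\nend")
```

A real lexer would also attach a token kind and a source location to each token, which this sketch leaves out.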
  • 25. Parser ● Kind of easy: go token by token and create a tree of expressions ● This tree is called AST: Abstract Syntax Tree ● An AST is like a directed, acyclic graph ● We won’t go into implementation details...
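To make "go token by token and create a tree" concrete, here is a minimal sketch that only understands sums of integers. The `Node`, `Num` and `Add` classes and the `parse_sum` and `sexp` methods are hypothetical names for this illustration; the real parser and AST nodes are in syntax/ and cover the whole language.

```crystal
# Toy AST: a number literal and a binary addition (hypothetical node types).
abstract class Node
end

class Num < Node
  getter value : Int32

  def initialize(@value)
  end
end

class Add < Node
  getter left : Node
  getter right : Node

  def initialize(@left, @right)
  end
end

# Consumes tokens like ["1", "+", "2", "+", "3"] and builds a left-nested tree.
def parse_sum(tokens : Array(String)) : Node
  node = Num.new(tokens[0].to_i)
  pos = 1
  while pos + 1 < tokens.size && tokens[pos] == "+"
    node = Add.new(node, Num.new(tokens[pos + 1].to_i))
    pos += 2
  end
  node
end

# Prints the tree in a Lisp-like form to make its shape visible.
def sexp(node : Node) : String
  case node
  when Num then node.value.to_s
  when Add then "(+ #{sexp(node.left)} #{sexp(node.right)})"
  else          raise "unknown node"
  end
end

puts sexp(parse_sum(["1", "+", "2", "+", "3"]))
```

Each `Add` holds its operands as child nodes, which is the tree structure the later semantic phases walk over.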
  • 26. Semantic ● This is the fundamental piece of the compiler ● It takes an AST as input and analyzes it ● Analysis can result in: ○ Declaring types: for example “class Foo; end” will declare a type Foo ○ Checking methods: for example “Foo.bar” will check that “Foo” is a declared type and that the method “bar” exists in it, and has the correct arity and types ○ Giving each non-dead expression in the program a type ○ Gathering some info for the codegen phase: for example know the local variables of a method, and their type
  • 27. Semantic ● The interesting part of the compiler is the semantic phase ● It’s just about processing an AST ● In Crystal’s compiler you just need to know one language: Crystal! ● No HTML/CSS/JS/JSX/etc. ● No untyped, dynamic languages: no Ruby/Erlang/Elixir. Type safe! ● Stuff is processed in memory ● No databases, no Elasticsearch, no Redis
  • 28. Semantic ● The interesting part of the compiler is the semantic phase ● It’s just about processing an AST ● In Crystal’s compiler you just need to know one language: Crystal! ● No HTML/CSS/JS/JSX/etc. ● No untyped, dynamic languages: no Ruby/Erlang/Elixir. Type safe! ● Stuff is processed in memory ● No databases, no Elasticsearch, no Redis Writing a compiler is easier than writing a web app! ^_^
  • 29. Semantic ● The interesting part of the compiler is the semantic phase ● It’s just about processing an AST ● In Crystal’s compiler you just need to know one language: Crystal! ● No HTML/CSS/JS/JSX/etc. ● No untyped, dynamic languages: no Ruby/Erlang/Elixir. Type safe! ● Stuff is processed in memory ● No databases, no Elasticsearch, no Redis Writing a compiler is easier than writing a web app! ^_^ (Or at least it’s more fun :-P)
  • 31. Directory layout ● src/compiler/crystal ○ command/ ○ syntax/ ○ semantic/ ○ macros/ ○ codegen/ ○ tools/ ○ compiler.cr ○ types.cr ○ program.cr
  • 32. Directory layout ● src/compiler/crystal ○ command/ : the command line interface ○ syntax/ : lexer, parser, ast, visitor, transformer ○ semantic/ : type declaration, method lookup, etc. ○ macros/ : macro expansion logic ○ codegen/ : codegen ○ tools/ : doc generator, formatter, init ○ compiler.cr : combines syntax + semantic + codegen ○ types.cr : all possible types in Crystal (Int32, String, unions, custom types, etc.) ○ program.cr : holds definitions of a program (holds Int32, String, etc.)
  • 33. Directory layout ● src/compiler/crystal : ~43K LOC ○ command/ : ~300 LOC ○ syntax/ : ~10K LOC ○ semantic/ : ~12K LOC ○ macros/ : ~2K LOC ○ codegen/ : ~6K LOC ○ tools/ : ~7K LOC ○ compiler.cr : ~300 LOC ○ types.cr : ~2K LOC ○ program.cr : ~300 LOC
  • 34. Directory layout ● src/compiler/crystal : ~43K LOC ○ command/ : ~300 LOC ○ syntax/ : ~10K LOC ○ semantic/ : ~12K LOC ○ macros/ : ~2K LOC ○ codegen/ : ~6K LOC ○ tools/ : ~7K LOC ○ compiler.cr : ~300 LOC ○ types.cr : ~2K LOC ○ program.cr : ~300 LOC About 14K LOC to analyze source code.
  • 35. Directory layout ● src/compiler/crystal : ~43K LOC ○ command/ : ~300 LOC ○ syntax/ : ~10K LOC ○ semantic/ : ~12K LOC ○ macros/ : ~2K LOC ○ codegen/ : ~6K LOC ○ tools/ : ~7K LOC ○ compiler.cr : ~300 LOC ○ types.cr : ~2K LOC ○ program.cr : ~300 LOC About 14K LOC to analyze source code. One big Rails app at Manas has 14K LOC in “./app”
  • 36. Directory layout ● src/compiler/crystal : ~43K LOC ○ command/ : ~300 LOC ○ syntax/ : ~10K LOC ○ semantic/ : ~12K LOC ○ macros/ : ~2K LOC ○ codegen/ : ~6K LOC ○ tools/ : ~7K LOC ○ compiler.cr : ~300 LOC ○ types.cr : ~2K LOC ○ program.cr : ~300 LOC About 14K LOC to analyze source code. One big Rails app at Manas has 14K LOC in “./app” A compiler can’t be that hard! ;-)
  • 37. Show me the code
  • 38. Show me the code # src/compiler/crystal/compiler.cr def compile(source : Source | Array(Source), output_filename : String) : Result source = [source] unless source.is_a?(Array) program = new_program(source) node = parse program, source node = program.semantic node, @stats codegen program, node, source, output_filename unless @no_codegen Result.new program, node end
  • 40. Show me the code # src/compiler/crystal/compiler.cr def compile(source : Source | Array(Source), output_filename : String) : Result source = [source] unless source.is_a?(Array) program = new_program(source) node = parse program, source node = program.semantic node, @stats codegen program, node, source, output_filename unless @no_codegen Result.new program, node end What is a program?
  • 41. Program ● Holds all types and top-level methods for a given compilation ● For example, if I compile “class Foo; end” and you compile “class Bar; end”, the first program will have a type named “Foo”, and the second one won’t (but it will have a type named “Bar”) ● It lets us test the compiler more easily, because we can use different Program instances for each snippet of code that we want to test ● This is in contrast to having global variables hold all of a program’s data ● A Program is passed around in all phases of a compilation (except lexing and parsing, which don’t need semantic info)
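The benefit of one Program object per compilation, rather than global state, can be shown with a deliberately tiny stand-in. `ToyProgram`, `declare_type` and `type?` are hypothetical names for this sketch and say nothing about the real API in program.cr.

```crystal
# Toy stand-in for a Program: it owns the set of types declared in one compilation.
# (Hypothetical sketch; the real Crystal::Program holds far more than type names.)
class ToyProgram
  getter types = Set(String).new

  def declare_type(name : String) : Nil
    @types << name
  end

  def type?(name : String) : Bool
    @types.includes?(name)
  end
end

# Two independent "compilations" never see each other's types,
# which is what makes testing many snippets in one process easy.
first = ToyProgram.new
first.declare_type("Foo")

second = ToyProgram.new
second.declare_type("Bar")

puts first.type?("Foo")  # Foo was declared in the first program
puts second.type?("Foo") # the second program knows nothing about Foo
```

With global variables instead, the second snippet would inherit "Foo" from the first unless the globals were reset between tests.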
  • 42. Show me the code # src/compiler/crystal/compiler.cr def compile(source : Source | Array(Source), output_filename : String) : Result source = [source] unless source.is_a?(Array) program = new_program(source) node = parse program, source # from source to Crystal::ASTNode node = program.semantic node, @stats codegen program, node, source, output_filename unless @no_codegen Result.new program, node end What is a program?
  • 43. Show me the code # src/compiler/crystal/compiler.cr def compile(source : Source | Array(Source), output_filename : String) : Result source = [source] unless source.is_a?(Array) program = new_program(source) node = parse program, source node = program.semantic node, @stats # Semantic! :-) codegen program, node, source, output_filename unless @no_codegen Result.new program, node end What is a program?
  • 44. Semantic ● The entry point for semantic analysis is in src/compiler/crystal/semantic.cr ● Other files are in src/compiler/crystal/semantic/ ● The file semantic.cr has comments that explain the overall algorithm :-)
  • 45. Semantic: overall algorithm ● top level: declare classes, modules, macros, defs and other top-level stuff ● new methods: create `new` methods for every `initialize` method ● type declarations: process type declarations like `@x : Int32` ● check abstract defs: check that abstract defs are implemented ● class_vars_initializers: process initializers like `@@x = 1` ● instance_vars_initializers: process initializers like `@x = 1` ● main: process "main" code, calls and method bodies (the whole program). ● cleanup: remove dead code and other simplifications ● check recursive structs: check that structs are not recursive (impossible to codegen)
  • 46. Semantic: overall algorithm Note! ● This algorithm didn’t come from the Skies (nor from a textbook, nor from a paper) ● It’s not written in stone! ● It can definitely be improved: readability, performance, etc.
  • 47. Semantic: overall algorithm Note! ● It’s actually more like this…
  • 48. Semantic But before looking at each phase, we need to learn about the most useful pattern for analyzing an AST...
  • 50. require "compiler/crystal/syntax" class SumVisitor < Crystal::Visitor getter sum = 0 def visit(node : Crystal::NumberLiteral) @sum += node.value.to_i end def visit(node : Crystal::ASTNode) true # true: continue visiting children nodes end end ast = Crystal::Parser.parse("foo(1 + 2, 3, [4])") visitor = SumVisitor.new ast.accept(visitor) puts visitor.sum
  • 51. The Visitor pattern ● We define a visit method for each node of interest ● We process the nodes ● We return true if we want to process children, false otherwise ● Example: if we only want to process class declarations, we could just define visit(node : Crystal::ClassDef) and define some logic there (and return true, because of nested class definitions) ● A visitor abstracts over the way nodes are composed ● ...though in many cases, for semantic purposes, we need and use the way a node is composed (for example, to analyze a call we need to know the argument types, so we check the arguments, not all children in a generic way)
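The class-declaration case mentioned above could look roughly like this. It reuses the calls from the SumVisitor slide (`Crystal::Visitor`, `Crystal::Parser.parse`, `accept`) plus `Crystal::ClassDef`; the `ClassNameVisitor` class is a made-up name, and reading the class name through `node.name` is an assumption about the node's API rather than something the slides show.

```crystal
require "compiler/crystal/syntax"

# Collects the name of every class declaration in a piece of source code.
# (Sketch only; ClassNameVisitor is a hypothetical name.)
class ClassNameVisitor < Crystal::Visitor
  getter names = [] of String

  def visit(node : Crystal::ClassDef)
    # Assumption: ClassDef exposes its name as a path that prints as "Foo".
    @names << node.name.to_s
    true # keep going: class definitions can be nested
  end

  def visit(node : Crystal::ASTNode)
    true # visit children of every other node
  end
end

ast = Crystal::Parser.parse("class Foo; class Bar; end; end; class Baz; end")
visitor = ClassNameVisitor.new
ast.accept(visitor)
puts visitor.names
```

Note that this visitor records "Bar" without its enclosing "Foo": knowing the full nesting requires tracking the current type, which is what the semantic visitors add on top of the plain pattern.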
  • 52. Semantic: overall algorithm ● top level: declare classes, modules, macros, defs and other top-level stuff ● new methods ● type declarations ● check abstract defs ● class_vars_initializers ● instance_vars_initializers ● main ● cleanup ● check recursive structs
  • 53. Top level: declare classes, modules, macros, defs... # src/compiler/crystal/semantic/top_level_visitor.cr class Crystal::TopLevelVisitor < Crystal::SemanticVisitor # ... end
  • 54. Crystal::SemanticVisitor ● Located at semantic_visitor.cr ● This is a base visitor used in most of the phases of the semantic analysis ● It keeps track of the “current type” ● For example in “class Foo; class Bar; baz; end; end”, “current type” starts at the top-level (the Program). When “class Foo” is found, the current type becomes “Foo” (we search “Foo” in the current type). When “class Bar” is found, the current type becomes “Foo::Bar” (we search “Bar” in the current type). When “baz” is found, it will be looked up inside the current type. ● But initially there’s no “Foo” inside the current type (the Program). Who defines it? … The top-level visitor!
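The "current type" bookkeeping can be sketched with a stack that is pushed when a class definition is entered and popped when it is left. `ToyClassDef` and `CurrentTypeTracker` are hypothetical stand-ins for this illustration; the real SemanticVisitor works on compiler types, not strings.

```crystal
# Toy AST node: a class definition with nested class definitions.
# (Hypothetical; stands in for Crystal::ClassDef.)
class ToyClassDef
  getter name : String
  getter body : Array(ToyClassDef)

  def initialize(@name, @body = [] of ToyClassDef)
  end
end

# Tracks the current type as a stack of names while walking the tree.
class CurrentTypeTracker
  getter seen = [] of String

  def initialize
    @scope = [] of String # empty scope means top level (the Program)
  end

  def visit(node : ToyClassDef) : Nil
    @scope.push(node.name)                     # entering "class X"
    @seen << @scope.join("::")                 # the current type while inside it
    node.body.each { |child| visit(child) }    # nested definitions see the new scope
    @scope.pop                                 # leaving the class restores the outer type
  end
end

# Equivalent to: class Foo; class Bar; end; end; class Baz; end
tree = [
  ToyClassDef.new("Foo", [ToyClassDef.new("Bar")]),
  ToyClassDef.new("Baz"),
]

tracker = CurrentTypeTracker.new
tree.each { |node| tracker.visit(node) }
puts tracker.seen
```

Restoring the outer scope on the way out is what lets "Baz" resolve at the top level again after "Foo" has been processed.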
  • 55. Crystal::TopLevelVisitor ● Located at top_level_visitor.cr ● Defines classes, methods, etc. ● Given “class Foo; class Bar; baz; end; end”... ● current_type starts at Program ● When “class Foo” is found (ClassDef), we check if “Foo” exists in the current type. If not, we create it. If it exists with a different type (if it’s a module), we give an error. ● We attach this type “Foo” to the AST node ClassDef. SemanticVisitor will use this in every subsequent phase. ● … the “baz” call is not analyzed here (unless it’s a macro)
  • 56. Crystal::TopLevelVisitor ● Many other things done in this visitor: methods and macros are added to types, aliases and enums are defined, etc. ● Question: why are methods and macros defined at this phase?
  • 57. When should macros be defined and expanded? class Foo macro inherited do_something end macro do_something def foo; 1; end end end class Bar < Foo; end class Baz < Foo def foo; 3; end end puts Bar.new.foo # => 1 puts Baz.new.foo # => 3 ● The “inherited” macro hook must be processed as soon as “Bar < Foo” and “Baz < Foo” are found ● The macro expands to “do_something”, which must expand to “def foo; 1; end” ● This must happen before we continue processing Baz’s body: “def foo; 3; end” must win and be the method found when doing “Baz.new.foo” ● Conclusion: methods, macros and hooks must be defined in the first pass, when defining types. Additionally, macros might be looked up in types in this same pass (like “do_something”) ● SemanticVisitor takes care to look up and expand calls that resolve to macro calls
  • 58. Method overloads ● Crystal methods are very powerful! For example: optional type restrictions, different number of arguments, default arguments, splat, etc. ● When methods are added to types we need to: ○ Know if a method replaces (redefines) an old method ○ Track whether a method is “stricter” than another method, to quickly know, given a call’s argument types, in which order the overloads are going to be tested
  • 59. Method restrictions def foo(x : Int32) puts 1 end def foo(x) puts 2 end foo(1) foo('a') ● Given foo(1), both methods match it. However, the first overload should be invoked because it has a stronger restriction than the second overload. ● If we define the methods in a different order, it still works the same ● This is because an argument with a type restriction is stronger than one without one. We say that the first one is a restriction of the second one (we should probably rename this to use stronger) ● This applies to types too: Int32 is stronger than Int32 | String. And Bar is stronger than Foo, if Bar < Foo. ● Given two methods with the same name, if all arguments of a method are stronger than the others’, the whole method is stronger and should come first. Each type stores an ordered list of methods indexed by method name, with this notion. ● If the methods are both stronger than each other, they have the same restriction.
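The ordering rules above can be observed from user code. The sketch below defines the overloads from weakest to strongest on purpose, to show that definition order does not matter; the method names `foo` and `kind` and the `Animal`/`Dog` classes are only illustrative.

```crystal
# Overloads defined weakest first: the strongest matching restriction still wins.
def foo(x)
  "no restriction"
end

def foo(x : Int32 | String)
  "Int32 | String"
end

def foo(x : Int32)
  "Int32"
end

class Animal
end

class Dog < Animal
end

# A subclass restriction is stronger than its superclass restriction.
def kind(x : Animal)
  "Animal"
end

def kind(x : Dog)
  "Dog"
end

puts foo(1)   # matches all three overloads; Int32 is the strongest
puts foo("a") # matches the union and the unrestricted overload
puts foo('a') # only the unrestricted overload matches a Char
puts kind(Dog.new)
puts kind(Animal.new)
```

This only demonstrates the observable behaviour; how the compiler stores and compares the overloads is the subject of restrictions.cr.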
  • 60. Method restrictions def foo(x : Int32) puts 1 end def foo(x) puts 2 end foo(1) foo('a') ● This logic is located at restrictions.cr ● A lot of cases to consider: generics, tuples, splats, etc. ● The code and algorithms could probably use a simpler, unified logic and a cleanup, but first all of these concepts and definitions must be defined much more formally
  • 61. Semantic: overall algorithm ● top level ● new methods: create `new` methods for every `initialize` method ● type declarations ● check abstract defs ● class_vars_initializers ● instance_vars_initializers ● main ● cleanup ● check recursive structs
  • 62. Semantic: new methods ● Located at new.cr ● TopLevelVisitor creates a `new` class method for every `initialize` method it finds (the logic for this is also in new.cr) ● Classes that end up without an `initialize` need a default, argless `self.new` method ● This phase is a bit messy right now because of some missing things related to generics…
  • 63. Semantic: new methods class Foo def initialize(x : Int32) @x = x end # Generated from the above def self.new(x : Int32) instance = allocate instance.initialize(x) if instance.responds_to?(:finalize) ::GC.add_finalizer(instance) end instance end end
  • 64. Semantic: overall algorithm ● top level ● new methods ● type declarations: process type declarations like `@x : Int32` ● check abstract defs ● class_vars_initializers ● instance_vars_initializers ● main ● cleanup ● check recursive structs
  • 65. Semantic: type declarations ● Located at type_declaration_processor.cr (and type_declaration_visitor.cr and type_guess_visitor.cr) ● Combines info gathered by these two visitors to declare the types of instance and class variables ● TypeDeclarationVisitor deals with explicit type declarations ● TypeGuessVisitor tries to “guess” the types of instance and class variables that have no explicit type annotation (for example @x = 1 and @x = Foo.new)
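From the user's side, the two sources of information look like this. The `Counter` class is a made-up example: `@count` has an explicit declaration, while `@label` and `@history` have no annotation and rely on their types being guessed from the assigned literals.

```crystal
class Counter
  @count : Int32         # explicit type declaration
  @label = "counter"     # no annotation: guessed as String from the literal
  @history = [] of Int32 # no annotation: guessed as Array(Int32)

  getter count, label, history

  def initialize(@count)
  end

  def increment : Int32
    @history << @count
    @count += 1
  end
end

counter = Counter.new(41)
counter.increment

puts counter.count
puts typeof(counter.label)
puts typeof(counter.history)
```

If an instance variable's type can neither be found in a declaration nor guessed from assignments like these, the compiler asks for an explicit annotation.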
  • 66. Semantic: overall algorithm ● top level ● new methods ● type declarations ● check abstract defs: check that abstract defs are implemented ● class_vars_initializers ● instance_vars_initializers ● main ● cleanup ● check recursive structs
  • 67. Semantic: check abstract defs ● Located at abstract_def_checker.cr ● Not a visitor: it traverses all types and, for those that have abstract defs, checks that subclasses or including types define those methods
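A small program that passes this check might look as follows. `Shape`, `Named`, `Square` and `Circle` are illustrative names; the point is that abstract defs can come from an abstract superclass or from an included module, and both must be implemented.

```crystal
abstract class Shape
  abstract def area : Float64
end

module Named
  abstract def name : String
end

class Square < Shape
  def initialize(@side : Float64)
  end

  # Implements Shape#area; removing this method is a compile-time error.
  def area : Float64
    @side * @side
  end
end

class Circle < Shape
  include Named

  def initialize(@radius : Float64)
  end

  def area : Float64
    Math::PI * @radius * @radius
  end

  # Implements Named#name, required because Circle includes Named.
  def name : String
    "circle"
  end
end

shapes = [Square.new(2.0), Circle.new(1.0)] of Shape
shapes.each { |shape| puts shape.area }
```

Because the check runs over types rather than over calls, a missing implementation is reported even if the method is never invoked.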