Compilers Are Databases

JVM Languages Summit
Martin Odersky
TypeSafe and EPFL
Compilers...
Compilers and Data Bases
Compilers are Data Bases?
Put a square peg in a round hole?
This Talk ...
... reports on a new compiler architecture for dsc,
the Dotty Scala Compiler.
• It has a mostly functional architecture,
but uses a lot of low-level tricks for speed.
• Some of its concepts are inspired by
functional databases.
My Early Involvement in Compilers
80s Pascal, Modula-2
single pass, following the school of Niklaus Wirth.
95-96 Espresso, the 2nd Java compiler -> E Compiler -> Borland’s JBuilder
used an OO AST with one class per node
and all processing distributed between
methods on these nodes.
96-99 Pizza -> GJ -> javac (1.3+) -> scalac (1.x)
replaced OO AST with pattern matching.
Current Scala Compiler
2004-12 nsc compiler for Scala (2.0-2.10)
Made (some) use of functional capabilities of Scala
Added:
– REPL
– presentation compiler for IDEs (Eclipse, Ensime)
– run-time meta programming with toolboxes
It’s the codebase for the official scalac compiler for 2.11,
2.12 and beyond.
Next Generation Scala Compiler
2012 – now: Dotty
• Rethink compiler architecture
from the ground up.
• Introduce some language
changes with the aim of better
regularity.
• Status:
– Close to bootstrap
– But still rough around the edges
Compilers – Traditional View
Add Separate Compilation
Challenges
A compiler for a language like Scala faces quite a few
challenges.
Among the most important are:
» Complexity
» Speed
» Latency
» Reusability
Challenge: Complex Transformations
• Input language (Scala) is complicated.
• Output language (JVM bytecode) is also complicated.
• The semantic gap between the two is large.
Compare this with compilers that target simple languages
such as System F or SSA form.
Deep Transformation Pipeline
Source
-> Parser -> Typer -> FirstTransform
-> RefChecks -> ElimRepeated -> NormalizeFlags -> ExtensionMethods
-> TailRec -> PatternMatcher -> ExplicitOuter -> ExpandSAMs
-> Splitter -> SeqLiterals -> InterceptedMeths -> Literalize
-> Getters -> ClassTags -> ElimByName -> AugmentS2Traits
-> ResolveSuper -> Erasure
-> ValueClasses -> Mixin -> LazyVals -> Memoize
-> CapturedVars -> Constructors -> LambdaLift -> Flatten
-> ElimStaticThis -> RestoreScopes -> GenBCode
-> Bytecode
To achieve reliability, we need
– excellent modularity
– minimized side effects
⇒ Functional code rules!
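A minimal sketch of what "functional code rules" can look like: each phase is a named, pure `Tree => Tree` function, and the whole pipeline is just their composition. The toy IR (`Lit`, `Add`, `Neg`) and both phases are hypothetical stand-ins, not dsc's actual trees or phases.

```scala
object Pipeline {
  // Hypothetical toy IR standing in for compiler trees.
  sealed trait Tree
  final case class Lit(n: Int) extends Tree
  final case class Add(l: Tree, r: Tree) extends Tree
  final case class Neg(t: Tree) extends Tree

  // A phase: a name plus a side-effect-free tree transformation.
  final case class Phase(name: String, run: Tree => Tree)

  // Lift a local rewrite into a full bottom-up pass over the tree.
  def bottomUp(rewrite: Tree => Tree): Tree => Tree = {
    def go(t: Tree): Tree = t match {
      case Lit(_)    => rewrite(t)
      case Add(l, r) => rewrite(Add(go(l), go(r)))
      case Neg(e)    => rewrite(Neg(go(e)))
    }
    go
  }

  // Two small, self-contained phases (hypothetical).
  val elimDoubleNeg: Phase = Phase("elimDoubleNeg", bottomUp {
    case Neg(Neg(e)) => e
    case other       => other
  })

  val constFold: Phase = Phase("constFold", bottomUp {
    case Add(Lit(a), Lit(b)) => Lit(a + b)
    case Neg(Lit(a))         => Lit(-a)
    case other               => other
  })

  // The pipeline is plain function composition; no phase touches shared state.
  def compile(phases: List[Phase]): Tree => Tree =
    phases.foldLeft((t: Tree) => t)((acc, p) => acc.andThen(p.run))
}
```

Because no phase mutates its input, the original tree is still intact after the pipeline has run, so each phase can be tested and reasoned about in isolation.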
Challenge: Speed
• Current scalac achieves 500-700 loc/sec on idiomatic
Scala code.
• Can be much lower, depending on input.
• Everyone would like it to be faster.
• But this is very hard to achieve.
- FP does have costs.
- Optimizations are ineffective.
- No hotspots, costs are
smeared out widely.
Challenge: Latency
• Some applications require fast turnaround for small
changes more than high throughput.
• Examples:
– REPL
– Worksheet
– IDE Presentation Compiler
⇒ Need to keep things loaded
(program + data)
Challenge: Reusability
• A compiler has many clients:
– Command line
– Build tools
– IDEs
– REPL
– Meta-programming
⇒ Abstractions must not leak.
(FP helps)
A Question
Every compiler has to answer questions like this:
Say I have a class
class C[T] {
  def f(x: T): T = ...
}
At some point I change it to:
class C[T] {
  def f(x: T)(y: T): T = ...
}
What is the type signature of C.f?
Clearly, it depends on the time when the question is asked!
Time-Varying Answers
Initially: (x: T): T
After erasure: (x: Any): Any
After the edit: (x: T)(y: T): T
After uncurry: (x: T, y: T): T
After erasure: (x: Any, y: Any): Any
Naive Functional Approach
World1 -> IR1,1 -> ... -> IRn,1 -> Output1
World2 -> IR1,2 -> ... -> IRn,2 -> Output2
...
Worldk -> IR1,k -> ... -> IRn,k -> Outputk
How big is the world?
A More Practical Strategy
Taking Inspiration from FRP and Functional Databases:
• Treat every value as a time-varying function.
• So the question is not:
“What is the signature of C.f?”
but:
“What is the signature of C.f at a given point in time?”
⇒ Need to index every piece of information with the time
at which it holds.
Time in dsc
Period = (RunID, PhaseID)
• RunID is incremented for each compiler run
• PhaseID ranges from 1 (parser) to about 50 (backend)
[Figure: timeline of compiler runs Run1, Run2, Run3]
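One compact way to represent a period is to pack both components into a single `Int`, so that comparing two periods (run first, then phase) is a single integer comparison. The sketch below assumes a 7-bit phase field, which is enough for the roughly 50 phases above; the layout and the names `Periods`, `period`, `runId` and `phaseId` are hypothetical, not taken from dsc.

```scala
object Periods {
  // Assumed layout: low 7 bits hold the phase id, the remaining bits the run id.
  final val PhaseWidth = 7
  final val PhaseMask  = (1 << PhaseWidth) - 1 // 127

  def period(runId: Int, phaseId: Int): Int = {
    require(runId >= 0 && phaseId >= 0 && phaseId <= PhaseMask, s"bad period ($runId, $phaseId)")
    (runId << PhaseWidth) | phaseId
  }

  def runId(p: Int): Int   = p >>> PhaseWidth
  def phaseId(p: Int): Int = p & PhaseMask
}
```

With this encoding, every phase of run 1 compares as earlier than every phase of run 2, which matches the left-to-right reading of the timeline.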
Time-Indexed Values
sig(C.f, (Run 1, parser)) = (x: T): T
sig(C.f, (Run 1, erasure)) = (x: Any): Any
sig(C.f, (Run 2, parser)) = (x: T)(y: T): T
sig(C.f, (Run 2, uncurry)) = (x: T, y: T): T
sig(C.f, (Run 2, erasure)) = (x: Any, y: Any): Any
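The table of time-indexed values can be read as a step function of time: each entry holds from its period until the next entry takes over. A minimal sketch of that reading follows; the phase numbers (`Parser`, `Uncurry`, `Erasure`) are assumptions chosen for illustration, and signatures are kept as plain strings.

```scala
object TimeIndexed {
  // A period, compared lexicographically: first by run, then by phase.
  final case class Period(run: Int, phase: Int)
  implicit val periodOrdering: Ordering[Period] =
    Ordering.by((p: Period) => (p.run, p.phase))

  // Hypothetical phase ids.
  val Parser  = 1
  val Uncurry = 20
  val Erasure = 30

  // Each entry gives the signature of C.f from that period on.
  val sigOfCf: List[(Period, String)] = List(
    Period(1, Parser)  -> "(x: T): T",
    Period(1, Erasure) -> "(x: Any): Any",
    Period(2, Parser)  -> "(x: T)(y: T): T",
    Period(2, Uncurry) -> "(x: T, y: T): T",
    Period(2, Erasure) -> "(x: Any, y: Any): Any"
  )

  // The value at p comes from the latest entry that does not lie after p.
  def sigAt(p: Period): Option[String] =
    sigOfCf
      .filter { case (start, _) => periodOrdering.lteq(start, p) }
      .maxByOption(_._1)
      .map(_._2)
}
```

Only the points where the value changes are stored; every other period is answered from the nearest earlier entry.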
Task of the Compiler
• Compute all values needed for analysis and code
generation over all periods where they are relevant.
• Problem: The graph of this function is humongous!
• More work is needed to make it efficiently explorable.
• But for a start it looks like the right model.
Core Data Types
Abstract Syntax Trees
Types
References
Denotations
Symbols
Abstract Syntax Trees
• For instance, for x * 2:
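As a textual picture of such a tree, here is `x * 2` written as nested constructor calls: the operator `*` is selected as a member of `x` and then applied to the argument `2`. The node classes are a hypothetical simplification (names are plain strings, literals hold untyped values), not dsc's actual `Tree` hierarchy.

```scala
object XTimes2 {
  // Hypothetical, simplified untyped tree nodes.
  sealed trait Tree
  final case class Ident(name: String) extends Tree
  final case class Select(qual: Tree, name: String) extends Tree
  final case class Apply(fun: Tree, args: List[Tree]) extends Tree
  final case class Literal(value: Any) extends Tree

  // x * 2 is a method call: select `*` on x, then apply it to 2.
  val xTimes2: Tree = Apply(Select(Ident("x"), "*"), List(Literal(2)))

  // Print a tree back in explicit method-call form.
  def show(t: Tree): String = t match {
    case Ident(n)       => n
    case Select(q, n)   => s"${show(q)}.$n"
    case Apply(f, args) => args.map(show).mkString(s"${show(f)}(", ", ", ")")
    case Literal(v)     => v.toString
  }
}
```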
Tree Attributes
What about tree attributes?
In dsc, we simplified as much as we could.
We were left with just two attributes:
– Position (intrinsic)
– Type
The job of the type checker is to transform untyped trees into typed trees.
Typed Abstract Syntax Trees
For instance, for x * 2:
Whether a tree is typed or untyped is an important distinction,
one that merits being reflected in the type of the AST itself.
From Untyped to Typed Trees
Idea: parameterize the AST type Tree with the attribute
information it carries.
Typed tree: tpd.Tree = Tree[Type]
Untyped tree: untpd.Tree = Tree[Nothing]
This leads to the following class:
abstract class Tree[T] {
  def tpe: T                        // Type if typed, Nothing if untyped
  def withType(t: Type): Tree[Type] // attach a type, giving a typed tree
}
Question of Variance
• Question: Which of the following two subtype
relationships should hold?
tpd.Tree <: untpd.Tree
untpd.Tree <: tpd.Tree ?
• What is the more useful relationship?
(the first)
• What relationship do the variance
rules imply?
(the second)
class Tree[? T] {
  def tpe: T ...
}
Fixing class Tree
import scala.annotation.unchecked.uncheckedVariance
abstract class Tree[-T] {
  // T occurs covariantly here; the annotation suppresses the variance check
  def tpe: T @uncheckedVariance
  def withType(t: Type): Tree[Type]
}
This is an interesting exception to the variance rules, related to the
bottom type Nothing.
What can go “wrong” here? Given an untpd.Tree, I expect Nothing,
but I might get a Type.
This shows that it’s good to have an escape hatch in the form of
@uncheckedVariance.
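The following self-contained sketch exercises the fixed class. `Type` and `Ident` are hypothetical minimal stand-ins, and the aliases `TypedTree` and `UntypedTree` play the roles of `tpd.Tree` and `untpd.Tree`. It shows that, thanks to contravariance, a typed tree can be used wherever an untyped one is expected.

```scala
import scala.annotation.unchecked.uncheckedVariance

object Trees {
  // Hypothetical stand-in for dsc's Type.
  final case class Type(show: String)

  // T is the attribute a tree carries; contravariance gives Tree[Type] <: Tree[Nothing].
  abstract class Tree[-T] {
    def tpe: T @uncheckedVariance // covariant occurrence, check switched off
    def withType(t: Type): Tree[Type]
  }

  type TypedTree   = Tree[Type]    // role of tpd.Tree
  type UntypedTree = Tree[Nothing] // role of untpd.Tree

  // Hypothetical leaf node; myTpe stays null until a type is assigned.
  final class Ident[-T](val name: String, myTpe: Type) extends Tree[T] {
    def tpe: T @uncheckedVariance = myTpe.asInstanceOf[T]
    def withType(t: Type): Tree[Type] = new Ident[Type](name, t)
  }
}
```

The cast in `tpe` is where the "wrong" case above would surface: an `UntypedTree` that is actually typed returns a `Type` even though its static type promises `Nothing`.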
Types
• Types carry most of the essential information of trees
and symbols.
• There are two kinds of types:
– Value types: Int, Int => Int, (Boolean, String)
– Types of definitions: (x: Int)Int, Lo..Hi, Class(...)
• Both are represented as subtypes of the same class, Type, for
convenience.
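A minimal sketch of a single `Type` root covering both kinds. The case class names are hypothetical and chosen to mirror the examples above; a real compiler would more likely encode `Int => Int` as an application of `Function1` than as a dedicated case.

```scala
object Types {
  // One root class for both kinds of types (hypothetical, heavily simplified).
  sealed abstract class Type

  // Value types: the types values can have.
  final case class TypeRef(name: String) extends Type              // Int, Boolean
  final case class FunctionType(arg: Type, res: Type) extends Type // Int => Int
  final case class TupleType(elems: List[Type]) extends Type       // (Boolean, String)

  // Types of definitions: what a method, abstract type or class declares.
  final case class MethodType(params: List[(String, Type)], res: Type) extends Type // (x: Int)Int
  final case class TypeBounds(lo: Type, hi: Type) extends Type                      // Lo..Hi
  final case class ClassInfo(name: String, decls: List[String]) extends Type       // Class(...)

  def isValueType(tp: Type): Boolean = tp match {
    case _: TypeRef | _: FunctionType | _: TupleType  => true
    case _: MethodType | _: TypeBounds | _: ClassInfo => false
  }
}
```

Sharing one root keeps operations such as substitution or printing uniform, at the price of a runtime test like `isValueType` where the distinction matters.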
References
case class Select(qual: Tree, name: Name) {
  // what is its tpe?
}
case class Ident(name: Name) {
  // what is its tpe?
}
• Normally, these tree nodes would carry a “symbol”,
which acts as a reference to some definition.
• But there are no symbol attributes in dsc, for good
reason.
Traditional Scheme
That’s not very functional!
A Question of Meaning
Question: What is the meaning of obj.fun?
It depends on the period!
Does that mean that obj.fun has different types,
depending on period?
No, trees are immutable!
References
• A reference is a type.
• It contains (only)
– a name
– potentially a prefix
• References are immutable; they exist forever.
What about Overloads?
The name of a TermRef may be shared by several
overloaded members of a class.
How do we determine which member is meant?
(In a nutshell, that’s why overloading is so universally hated
by compiler writers)
Trick: Allow “signature” as part of term names.
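A minimal sketch of that trick: a `TermName` may carry a `Signature`, and a reference whose name includes a signature selects exactly one overloaded alternative, while a bare name still denotes all of them. The classes, the string encoding of parameter and result types, and the plain-string prefix are hypothetical simplifications.

```scala
object Overloads {
  // Hypothetical simplified signature: parameter types plus result type.
  final case class Signature(params: List[String], result: String)

  // A term name that may carry a signature to single out one alternative.
  final case class TermName(name: String, sig: Option[Signature] = None)

  // A declared member, and an immutable reference (prefix + name).
  final case class Member(name: String, sig: Signature)
  final case class TermRef(prefix: String, name: TermName)

  // Members a reference can denote; more than one means it is still overloaded.
  def alternatives(ref: TermRef, decls: List[Member]): List[Member] =
    decls.filter(m => m.name == ref.name.name && ref.name.sig.forall(_ == m.sig))
}
```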
What Does A Reference Reference?
Surely, a symbol?
No!
References capture more than a symbol,
and sometimes they do not refer to a unique
symbol at all.
References capture more than a symbol.
Consider:
class C[T] {
  def f(x: T): T
}
val prefix = new C[Int]
Then prefix.f:
resolves to C’s f
but at type (Int)Int, not (T)T
Both pieces of information are part of the meaning of
prefix.f.
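A minimal sketch of how the prefix enters the meaning: the type of `prefix.f` is `f`'s declared type with the class type parameter `T` replaced by the prefix's type argument `Int`. Scala compilers call this operation "as seen from" and handle much more (outer prefixes, bounds, this-types); the mini type language and the name `typeOfPrefixF` are hypothetical.

```scala
object AsSeenFrom {
  // Hypothetical mini type language.
  sealed trait Type
  final case class TypeParam(name: String) extends Type
  final case class Named(name: String) extends Type
  final case class Method(params: List[Type], result: Type) extends Type

  // Replace class type parameters by the type arguments found in the prefix.
  def subst(tp: Type, env: Map[String, Type]): Type = tp match {
    case TypeParam(n)    => env.getOrElse(n, tp)
    case Named(_)        => tp
    case Method(ps, res) => Method(ps.map(subst(_, env)), subst(res, env))
  }

  // f as declared in class C[T]: (T)T
  val declaredF: Type = Method(List(TypeParam("T")), TypeParam("T"))

  // Type of prefix.f when prefix: C[arg]
  def typeOfPrefixF(arg: Type): Type = subst(declaredF, Map("T" -> arg))
}
```

The declared type is left untouched; the prefix-specific type is computed from it, which is why a reference must remember its prefix and not just a symbol.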
References
Sometimes references point to no symbol at all.
We have already seen overloading.
Here’s another example using union types, which are newly
supported by dsc:
class A { def f: Int }
class B { def f: Int }
val prefix: A | B = if (...) new A else new B
prefix.f
What symbol is referenced by prefix.f ?
Denotations
The meaning of a reference is a denotation.
Non-overloaded denotations carry symbols (maybe) and
types (always).
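A sketch of that distinction with hypothetical stand-ins for symbols (`Sym`) and types: a single denotation always has a type but only optionally a symbol, while an overloaded reference denotes several alternatives. The `unionMember` helper covers only the union-type case shown earlier, where both `f` members have the same type, and returns `None` otherwise.

```scala
object Denotations {
  // Hypothetical stand-ins for symbols and types.
  final case class Sym(owner: String, name: String)
  final case class Type(show: String)

  sealed trait Denotation
  // One meaning: a type always, a symbol only sometimes.
  final case class SingleDenotation(symbol: Option[Sym], info: Type) extends Denotation
  // An overloaded reference denotes several alternatives at once.
  final case class MultiDenotation(alternatives: List[SingleDenotation]) extends Denotation

  // Member f selected on a union A | B: keep the common type, drop the symbol.
  def unionMember(onA: SingleDenotation, onB: SingleDenotation): Option[SingleDenotation] =
    if (onA.info == onB.info) Some(SingleDenotation(None, onA.info)) else None
}
```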
What Then Is A Symbol?
A symbol represents a declaration in some source
file.
It “lives” as long as the source file is unchanged.
It has a denotation depending on the period.
Denotation Transformers
• How do we compute new denotations from old ones?
• For a reference pre.f, we can recompute the member at
the new phase.
• For symbols?
uncurry.transDenot(<(x: A)(y: B): C>) = <(x: A, y: B): C>
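A minimal sketch of uncurry's denotation transformer over a hypothetical, string-based method info. It is a pure function from the old info to the new one, so the old denotation remains available for the phases before uncurry.

```scala
object Uncurry {
  // Hypothetical method info: one or more parameter lists and a result type.
  final case class Param(name: String, tpe: String)
  final case class MethodInfo(paramss: List[List[Param]], result: String) {
    def show: String =
      paramss.map(ps => ps.map(p => s"${p.name}: ${p.tpe}").mkString("(", ", ", ")")).mkString +
        s": $result"
  }

  // Denotation transformer: merge all parameter lists into one.
  // It builds a new info and leaves the old one untouched.
  def transDenot(info: MethodInfo): MethodInfo =
    if (info.paramss.lengthCompare(1) <= 0) info
    else MethodInfo(List(info.paramss.flatten), info.result)
}
```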
Caching Denotations
Symbols are memoized functions: Period => Denotation
Keep all denotations of a symbol at different phases as a
ring.
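A minimal sketch of such a ring, assuming each denotation is valid for a contiguous range of phases within one run (the phase ranges in the test are invented). Lookup may start at any element, for example the one used last, and walks forward, wrapping around, until it finds the denotation valid at the requested phase. The mutable `next` link is the kind of low-level detail that stays hidden behind the functional view `Period => Denotation`.

```scala
object DenotRing {
  // Hypothetical denotation, valid for phases firstPhase..lastPhase of one run.
  final class Denot(val info: String, val firstPhase: Int, val lastPhase: Int) {
    var next: Denot = this // ring link; a lone denotation points to itself
    def validAt(phase: Int): Boolean = firstPhase <= phase && phase <= lastPhase
  }

  // Link denotations (in phase order) into a ring and return the first one.
  def ring(ds: List[Denot]): Denot = {
    require(ds.nonEmpty, "a symbol has at least one denotation")
    ds.zip(ds.tail :+ ds.head).foreach { case (d, n) => d.next = n }
    ds.head
  }

  // Walk forward from `start`, wrapping around, to the denotation valid at `phase`.
  def at(start: Denot, phase: Int): Denot = {
    var cur = start
    while (!cur.validAt(phase)) {
      cur = cur.next
      if (cur eq start) throw new NoSuchElementException(s"no denotation at phase $phase")
    }
    cur
  }
}
```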
Putting it all Together
• ER diagram of core compiler architecture:
Lessons Learned
(Not done yet, still learning)
• Think databases for modeling.
• Think FP for transformations.
• Get efficiency through low-level techniques
(caching)
• But take care not to compromise the high-level
semantics.
To Find Out More
How to make it Fast
• Caching
– Symbols cache last denotation
– NamedTypes do the same
– Caches are stamped with validity interval (current period until the
next denotation transformer kicks in).
– Need to update only if the current period lies outside the validity interval
– Member lookup caches denotations
Not yet tried: Parallelization.
- Could be hard (similar to chess programs)
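The single-entry cache described above can be sketched as follows. `transform` is a hypothetical function that computes the denotation valid at a phase together with its validity interval; the symbol recomputes only when asked about a phase outside the cached interval. Runs are ignored here for brevity, and the class name `CachedSymbol` is an assumption.

```scala
object CachedDenot {
  // Hypothetical symbol that caches its last denotation together with the
  // phase interval [validFrom, validUntil] in which that denotation holds.
  final class CachedSymbol(transform: Int => (String, Int, Int)) {
    private var cached: String = ""
    private var validFrom = 1
    private var validUntil = 0 // empty interval: nothing cached yet
    var recomputations = 0     // instrumentation: number of cache misses

    def denotAt(phase: Int): String = {
      if (phase < validFrom || phase > validUntil) {
        val (d, from, until) = transform(phase)
        cached = d
        validFrom = from
        validUntil = until
        recomputations += 1
      }
      cached
    }
  }
}
```

Asking for every phase of a long stretch in which nothing changes costs one computation, not one per phase.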
Many forms of Caches
• Lazy vals
• Memoization
• LRU Caches
• Rely on
– Purely functional semantics
– Access to low-level imperative implementation code.
– Important to keep the levels of abstraction apart!
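As one example of a purely functional interface over an imperative implementation, the sketch below memoizes a function behind an LRU cache built on `java.util.LinkedHashMap` in access order. Callers only see `A => B`, so the result is a faithful memo only if `f` is pure. The name `lruMemo` is hypothetical, and this cache is not thread-safe.

```scala
object Caches {
  // Memoize f behind an LRU cache of the given capacity. Callers see a plain A => B;
  // the mutable, access-ordered LinkedHashMap is an implementation detail.
  def lruMemo[A, B](capacity: Int)(f: A => B): A => B = {
    val cache = new java.util.LinkedHashMap[A, B](16, 0.75f, true) {
      // Called by put: evict the least recently used entry once over capacity.
      override def removeEldestEntry(eldest: java.util.Map.Entry[A, B]): Boolean =
        size() > capacity
    }
    (a: A) =>
      if (cache.containsKey(a)) cache.get(a) // get also marks the entry as recently used
      else {
        val b = f(a)
        cache.put(a, b)
        b
      }
  }
}
```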
Optimization: Phase Fusion
• For modularity reasons, phases should be small. Each
phase should do one self-contained transform.
• But that means we end up with many phases.
• Problem: Repeated tree rewriting is a performance killer.
• Solution: Automatically fuse phases into one tree
traversal.
– Relies on a design pattern and a small amount of
introspection.
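A minimal sketch of the fusion idea, assuming every mini-phase is node-local: it rewrites only the node it is given, after that node's children have already been transformed. Running all rewrites at each node during one bottom-up traversal then replaces one traversal per phase. For the two phases in the test this gives the same result as separate traversals; in general, fusion is only valid when a phase does not depend on how later phases have rewritten the children it sees. The IR and phases are hypothetical, and the introspection mentioned above is omitted.

```scala
object Fusion {
  // Hypothetical toy IR.
  sealed trait Tree
  final case class Lit(n: Int) extends Tree
  final case class Add(l: Tree, r: Tree) extends Tree
  final case class Neg(t: Tree) extends Tree

  // A mini-phase rewrites a single node whose children are already transformed.
  type MiniPhase = Tree => Tree

  var traversals = 0 // instrumentation only

  // One bottom-up traversal applying f at every node.
  def traverse(f: Tree => Tree)(t: Tree): Tree = {
    def go(t: Tree): Tree = t match {
      case Lit(_)    => f(t)
      case Add(l, r) => f(Add(go(l), go(r)))
      case Neg(e)    => f(Neg(go(e)))
    }
    traversals += 1
    go(t)
  }

  // Unfused: one full traversal per phase.
  def runSeparately(phases: List[MiniPhase], t: Tree): Tree =
    phases.foldLeft(t)((acc, p) => traverse(p)(acc))

  // Fused: a single traversal that runs every phase's rewrite at each node, in order.
  def runFused(phases: List[MiniPhase], t: Tree): Tree =
    traverse(node => phases.foldLeft(node)((acc, p) => p(acc)))(t)
}
```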
1 of 50

Recommended

Iceberg: a fast table format for S3 by
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3DataWorks Summit
7.5K views30 slides
Introduction to Apache NiFi 1.11.4 by
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Timothy Spann
1K views32 slides
Performance Optimizations in Apache Impala by
Performance Optimizations in Apache ImpalaPerformance Optimizations in Apache Impala
Performance Optimizations in Apache ImpalaCloudera, Inc.
10.7K views63 slides
Apache Arrow Workshop at VLDB 2019 / BOSS Session by
Apache Arrow Workshop at VLDB 2019 / BOSS SessionApache Arrow Workshop at VLDB 2019 / BOSS Session
Apache Arrow Workshop at VLDB 2019 / BOSS SessionWes McKinney
2.5K views57 slides
Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste... by
Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...
Spark and Object Stores —What You Need to Know: Spark Summit East talk by Ste...Spark Summit
2.9K views41 slides
Apache NiFi Crash Course Intro by
Apache NiFi Crash Course IntroApache NiFi Crash Course Intro
Apache NiFi Crash Course IntroDataWorks Summit/Hadoop Summit
6.8K views47 slides

More Related Content

What's hot

Performance Analysis of Apache Spark and Presto in Cloud Environments by
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsPerformance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsDatabricks
1.7K views42 slides
Achieve Blazing-Fast Ingest Speeds with Apache Arrow by
Achieve Blazing-Fast Ingest Speeds with Apache ArrowAchieve Blazing-Fast Ingest Speeds with Apache Arrow
Achieve Blazing-Fast Ingest Speeds with Apache ArrowNeo4j
249 views30 slides
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim... by
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxData
740 views67 slides
Deep Dive into Apache Kafka by
Deep Dive into Apache KafkaDeep Dive into Apache Kafka
Deep Dive into Apache Kafkaconfluent
8K views35 slides
Analyzing 1.2 Million Network Packets per Second in Real-time by
Analyzing 1.2 Million Network Packets per Second in Real-timeAnalyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-timeDataWorks Summit
14.8K views47 slides
OSA Con 2022 - Arrow in Flight_ New Developments in Data Connectivity - David... by
OSA Con 2022 - Arrow in Flight_ New Developments in Data Connectivity - David...OSA Con 2022 - Arrow in Flight_ New Developments in Data Connectivity - David...
OSA Con 2022 - Arrow in Flight_ New Developments in Data Connectivity - David...Altinity Ltd
65 views31 slides

What's hot(20)

Performance Analysis of Apache Spark and Presto in Cloud Environments by Databricks
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsPerformance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud Environments
Databricks1.7K views
Achieve Blazing-Fast Ingest Speeds with Apache Arrow by Neo4j
Achieve Blazing-Fast Ingest Speeds with Apache ArrowAchieve Blazing-Fast Ingest Speeds with Apache Arrow
Achieve Blazing-Fast Ingest Speeds with Apache Arrow
Neo4j249 views
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim... by InfluxData
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxData740 views
Deep Dive into Apache Kafka by confluent
Deep Dive into Apache KafkaDeep Dive into Apache Kafka
Deep Dive into Apache Kafka
confluent8K views
Analyzing 1.2 Million Network Packets per Second in Real-time by DataWorks Summit
Analyzing 1.2 Million Network Packets per Second in Real-timeAnalyzing 1.2 Million Network Packets per Second in Real-time
Analyzing 1.2 Million Network Packets per Second in Real-time
DataWorks Summit14.8K views
OSA Con 2022 - Arrow in Flight_ New Developments in Data Connectivity - David... by Altinity Ltd
OSA Con 2022 - Arrow in Flight_ New Developments in Data Connectivity - David...OSA Con 2022 - Arrow in Flight_ New Developments in Data Connectivity - David...
OSA Con 2022 - Arrow in Flight_ New Developments in Data Connectivity - David...
Altinity Ltd65 views
Optimizing Apache Spark UDFs by Databricks
Optimizing Apache Spark UDFsOptimizing Apache Spark UDFs
Optimizing Apache Spark UDFs
Databricks804 views
The Apache Spark File Format Ecosystem by Databricks
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
Databricks2.1K views
Simplifying Disaster Recovery with Delta Lake by Databricks
Simplifying Disaster Recovery with Delta LakeSimplifying Disaster Recovery with Delta Lake
Simplifying Disaster Recovery with Delta Lake
Databricks1.2K views
Efficient Data Storage for Analytics with Apache Parquet 2.0 by Cloudera, Inc.
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
Cloudera, Inc.158.6K views
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise by DataWorks Summit
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
DataWorks Summit685 views
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka by Guido Schmutz
Self-Service Data Ingestion Using NiFi, StreamSets & KafkaSelf-Service Data Ingestion Using NiFi, StreamSets & Kafka
Self-Service Data Ingestion Using NiFi, StreamSets & Kafka
Guido Schmutz1.6K views
Thrift vs Protocol Buffers vs Avro - Biased Comparison by Igor Anishchenko
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Igor Anishchenko240.7K views
How to use Parquet as a basis for ETL and analytics by Julien Le Dem
How to use Parquet as a basis for ETL and analyticsHow to use Parquet as a basis for ETL and analytics
How to use Parquet as a basis for ETL and analytics
Julien Le Dem11.9K views
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw... by InfluxData
InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...InfluxDB IOx Tech Talks: The Impossible Dream:  Easy-to-Use, Super Fast Softw...
InfluxDB IOx Tech Talks: The Impossible Dream: Easy-to-Use, Super Fast Softw...
InfluxData476 views
Presto Summit 2018 - 09 - Netflix Iceberg by kbajda
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
kbajda3K views
LLAP: long-lived execution in Hive by DataWorks Summit
LLAP: long-lived execution in HiveLLAP: long-lived execution in Hive
LLAP: long-lived execution in Hive
DataWorks Summit17.2K views
Big Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and Parquet by DataWorks Summit
Big Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and ParquetBig Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and Parquet
Big Data Storage - Comparing Speed and Features for Avro, JSON, ORC, and Parquet
DataWorks Summit3.1K views
Apache BookKeeper: A High Performance and Low Latency Storage Service by Sijie Guo
Apache BookKeeper: A High Performance and Low Latency Storage ServiceApache BookKeeper: A High Performance and Low Latency Storage Service
Apache BookKeeper: A High Performance and Low Latency Storage Service
Sijie Guo5.9K views

Similar to Compilers Are Databases

Scala Days NYC 2016 by
Scala Days NYC 2016Scala Days NYC 2016
Scala Days NYC 2016Martin Odersky
74.2K views63 slides
Martin Odersky - Evolution of Scala by
Martin Odersky - Evolution of ScalaMartin Odersky - Evolution of Scala
Martin Odersky - Evolution of ScalaScala Italy
1.6K views47 slides
First Class Variables as AST Annotations by
 First Class Variables as AST Annotations First Class Variables as AST Annotations
First Class Variables as AST AnnotationsESUG
58 views42 slides
First Class Variables as AST Annotations by
First Class Variables as AST AnnotationsFirst Class Variables as AST Annotations
First Class Variables as AST AnnotationsMarcus Denker
5 views42 slides
Scala Days San Francisco by
Scala Days San FranciscoScala Days San Francisco
Scala Days San FranciscoMartin Odersky
65.9K views48 slides
Archi Modelling by
Archi ModellingArchi Modelling
Archi Modellingdilane007
308 views49 slides

Similar to Compilers Are Databases(20)

Martin Odersky - Evolution of Scala by Scala Italy
Martin Odersky - Evolution of ScalaMartin Odersky - Evolution of Scala
Martin Odersky - Evolution of Scala
Scala Italy1.6K views
First Class Variables as AST Annotations by ESUG
 First Class Variables as AST Annotations First Class Variables as AST Annotations
First Class Variables as AST Annotations
ESUG58 views
First Class Variables as AST Annotations by Marcus Denker
First Class Variables as AST AnnotationsFirst Class Variables as AST Annotations
First Class Variables as AST Annotations
Marcus Denker5 views
Scala Days San Francisco by Martin Odersky
Scala Days San FranciscoScala Days San Francisco
Scala Days San Francisco
Martin Odersky65.9K views
Archi Modelling by dilane007
Archi ModellingArchi Modelling
Archi Modelling
dilane007308 views
QuadIron An open source library for number theoretic transform-based erasure ... by Scality
QuadIron An open source library for number theoretic transform-based erasure ...QuadIron An open source library for number theoretic transform-based erasure ...
QuadIron An open source library for number theoretic transform-based erasure ...
Scality2.1K views
Metrics ekon 14_2_kleiner by Max Kleiner
Metrics ekon 14_2_kleinerMetrics ekon 14_2_kleiner
Metrics ekon 14_2_kleiner
Max Kleiner18.1K views
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ... by Jose Quesada (hiring)
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
Jose Quesada (hiring)10.1K views
Standardizing on a single N-dimensional array API for Python by Ralf Gommers
Standardizing on a single N-dimensional array API for PythonStandardizing on a single N-dimensional array API for Python
Standardizing on a single N-dimensional array API for Python
Ralf Gommers119 views
Data oriented design and c++ by Mike Acton
Data oriented design and c++Data oriented design and c++
Data oriented design and c++
Mike Acton33.6K views
Lecture1_computer vision-2023.pdf by ssuserff72e4
Lecture1_computer vision-2023.pdfLecture1_computer vision-2023.pdf
Lecture1_computer vision-2023.pdf
ssuserff72e417 views
Type Profiler: Ambitious Type Inference for Ruby 3 by mametter
Type Profiler: Ambitious Type Inference for Ruby 3Type Profiler: Ambitious Type Inference for Ruby 3
Type Profiler: Ambitious Type Inference for Ruby 3
mametter3.3K views

More from Martin Odersky

scalar.pdf by
scalar.pdfscalar.pdf
scalar.pdfMartin Odersky
958 views44 slides
Capabilities for Resources and Effects by
Capabilities for Resources and EffectsCapabilities for Resources and Effects
Capabilities for Resources and EffectsMartin Odersky
5.5K views29 slides
Preparing for Scala 3 by
Preparing for Scala 3Preparing for Scala 3
Preparing for Scala 3Martin Odersky
11.1K views48 slides
Simplicitly by
SimplicitlySimplicitly
SimplicitlyMartin Odersky
2.1K views33 slides
What To Leave Implicit by
What To Leave ImplicitWhat To Leave Implicit
What To Leave ImplicitMartin Odersky
3.7K views41 slides
What To Leave Implicit by
What To Leave ImplicitWhat To Leave Implicit
What To Leave ImplicitMartin Odersky
6.1K views67 slides

More from Martin Odersky(17)

Capabilities for Resources and Effects by Martin Odersky
Capabilities for Resources and EffectsCapabilities for Resources and Effects
Capabilities for Resources and Effects
Martin Odersky5.5K views
Implementing Higher-Kinded Types in Dotty by Martin Odersky
Implementing Higher-Kinded Types in DottyImplementing Higher-Kinded Types in Dotty
Implementing Higher-Kinded Types in Dotty
Martin Odersky11.9K views
The Evolution of Scala by Martin Odersky
The Evolution of ScalaThe Evolution of Scala
The Evolution of Scala
Martin Odersky44.4K views
Scala - The Simple Parts, SFScala presentation by Martin Odersky
Scala - The Simple Parts, SFScala presentationScala - The Simple Parts, SFScala presentation
Scala - The Simple Parts, SFScala presentation
Martin Odersky16.5K views
flatMap Oslo presentation slides by Martin Odersky
flatMap Oslo presentation slidesflatMap Oslo presentation slides
flatMap Oslo presentation slides
Martin Odersky24.9K views
Oscon keynote: Working hard to keep it simple by Martin Odersky
Oscon keynote: Working hard to keep it simpleOscon keynote: Working hard to keep it simple
Oscon keynote: Working hard to keep it simple
Martin Odersky33.5K views
Scala Talk at FOSDEM 2009 by Martin Odersky
Scala Talk at FOSDEM 2009Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009
Martin Odersky37.2K views

Recently uploaded

Zero to Automated in Under a Year by
Zero to Automated in Under a YearZero to Automated in Under a Year
Zero to Automated in Under a YearNetwork Automation Forum
15 views23 slides
Mini-Track: Challenges to Network Automation Adoption by
Mini-Track: Challenges to Network Automation AdoptionMini-Track: Challenges to Network Automation Adoption
Mini-Track: Challenges to Network Automation AdoptionNetwork Automation Forum
12 views27 slides
Special_edition_innovator_2023.pdf by
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdfWillDavies22
17 views6 slides
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf by
STKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdfSTKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdf
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdfDr. Jimmy Schwarzkopf
19 views29 slides
Case Study Copenhagen Energy and Business Central.pdf by
Case Study Copenhagen Energy and Business Central.pdfCase Study Copenhagen Energy and Business Central.pdf
Case Study Copenhagen Energy and Business Central.pdfAitana
16 views3 slides

Recently uploaded(20)

Special_edition_innovator_2023.pdf by WillDavies22
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdf
WillDavies2217 views
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf by Dr. Jimmy Schwarzkopf
STKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdfSTKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdf
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf
Case Study Copenhagen Energy and Business Central.pdf by Aitana
Case Study Copenhagen Energy and Business Central.pdfCase Study Copenhagen Energy and Business Central.pdf
Case Study Copenhagen Energy and Business Central.pdf
Aitana16 views
Attacking IoT Devices from a Web Perspective - Linux Day by Simone Onofri
Attacking IoT Devices from a Web Perspective - Linux Day Attacking IoT Devices from a Web Perspective - Linux Day
Attacking IoT Devices from a Web Perspective - Linux Day
Simone Onofri16 views
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... by Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker37 views
Voice Logger - Telephony Integration Solution at Aegis by Nirmal Sharma
Voice Logger - Telephony Integration Solution at AegisVoice Logger - Telephony Integration Solution at Aegis
Voice Logger - Telephony Integration Solution at Aegis
Nirmal Sharma39 views
AMAZON PRODUCT RESEARCH.pdf by JerikkLaureta
AMAZON PRODUCT RESEARCH.pdfAMAZON PRODUCT RESEARCH.pdf
AMAZON PRODUCT RESEARCH.pdf
JerikkLaureta26 views
The details of description: Techniques, tips, and tangents on alternative tex... by BookNet Canada
The details of description: Techniques, tips, and tangents on alternative tex...The details of description: Techniques, tips, and tangents on alternative tex...
The details of description: Techniques, tips, and tangents on alternative tex...
BookNet Canada127 views
Transcript: The Details of Description Techniques tips and tangents on altern... by BookNet Canada
Transcript: The Details of Description Techniques tips and tangents on altern...Transcript: The Details of Description Techniques tips and tangents on altern...
Transcript: The Details of Description Techniques tips and tangents on altern...
BookNet Canada136 views
Data Integrity for Banking and Financial Services by Precisely
Data Integrity for Banking and Financial ServicesData Integrity for Banking and Financial Services
Data Integrity for Banking and Financial Services
Precisely21 views
handbook for web 3 adoption.pdf by Liveplex
handbook for web 3 adoption.pdfhandbook for web 3 adoption.pdf
handbook for web 3 adoption.pdf
Liveplex22 views
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors by sugiuralab
TouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective SensorsTouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective Sensors
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors
sugiuralab19 views

Compilers Are Databases

  • 1. Compilers Are Databases JVM Languages Summit Martin Odersky TypeSafe and EPFL
  • 4. Compilers are Data Bases? 4 Put a square peg in a round hole?
  • 5. This Talk ... ... reports on a new compiler architecture for dsc, the Dotty Scala Compiler. • It has a mostly functional architecture, but uses a lot of low-level tricks for speed. • Some of its concepts are inspired by functional databases.
  • 6. My Early Involvement in Compilers 80s Pascal, Modula-2 single pass, following the school of Niklaus Wirth. 95-96 Espresso, the 2nd Java compiler  E Compiler  Borland’s JBuilder used an OO AST with one class per node and all processing distributed between methods on these nodes. 96-99 Pizza  GJ  javac (1.3+) -> scalac (1.x) replaced OO AST with pattern matching. 6
  • 7. Current Scala Compiler 2004-12 nsc compiler for Scala (2.0-2.10) Made (some) use of functional capabilities of Scala Added: – REPL – presentation compiler for IDEs (Eclipse, Ensime) – run-time meta programming with toolboxes It’s the codebase for the official scalac compiler for 2.11, 2.12 and beyond. 7
  • 8. Next Generation Scala Compiler 2012 – now: Dotty • Rethink compiler architecture from the ground up. • Introduce some language changes with the aim of better regularity. • Status: – Close to bootstrap – But still rough around the edges 8
  • 12. Challenges A compiler for a language like Scala faces quite a few challenges. Among the most important are: » Complexity » Speed » Latency » Reusability
  • 13. Challenge: Complex Transformations • Input language (Scala) is complicated. • Output language (JVM) is also complicated. • Semantic gap between the two is large. Compare with compilers to simple low-level languages such as System F or SSA. 13
  • 15. Challenge: Speed • Current scalac achieves 500-700 loc/sec on idiomatic Scala code. • Can be much lower, depending on input. • Everyone would like it to be faster. • But this is very hard to achieve. - FP does have costs. - Optimizations are ineffective. - No hotspots, costs are smeared out widely. 15
  • 16. Challenge: Latency • Some applications require fast turnaround for small changes more than high throughput. • Examples: – REPL – Worksheet – IDE Presentation Compiler  Need to keep things loaded (program + data) 16
  • 17. Challenge: Reusability • A compiler has many clients: – Command line – Build tools – IDEs – REPL – Meta-programming  Abstractions must not leak. (FP helps) 17
  • 18. A Question Every compiler has to answer questions like this: Say I have a class class C[T] { def f(x: T): T = ... } At some point I change it to: class C[T] { def f(x: T)(y: T): T = ... } What is the type signature of C.f? Clearly, it depends on the time when the question is asked! 18
  • 19. Time-Varying Answers Initially: (x: T): T After erasure: (x: Any): Any After the edit: (x: T)(y: T): T After uncurry: (x: T, y: T): T After erasure: (x: Any, y: Any): Any 19
  • 20. Naive Functional Approach World1  IR1,1  ...  IRn,1  Output1 World2  IR1,2  ...  IRn,2  Output2 . . . Worldk  IR1,k  ...  IRn,k  Outputk How big is the world? 20
  • 21. A More Practical Strategy Taking Inspiration from FRP and Functional Databases: • Treat every value as a time-varying function. • So the question is not: “What is the signature of C.f” ? but: “What is the signature of C.f at a given point in time” ?  Need to index every piece of information with the time where it holds. 21
  • 22. Time in dsc Period = (RunID, PhaseID) • RunIDs is incremented for each compiler run • PhaseID ranges from 1 (parser) to ~ 50 (backend) 22 Run1 Run2 Run3
  • 23. Time-Indexed Values sig(C.f, (Run 1, parser)) = (x: T): T sig(C.f, (Run 1, erasure)) = (x: Any): Any sig(C.f, (Run 2, erasure)) = (x: T)(y: T): T sig(C.f, (Run 2, uncurry)) = (x: T, y: T): T sig(C.f, (Run 2, erasure) = (x: Any, y: Any): Any 23
  • 24. Task of the Compiler • Compute all values needed for analysis and code generation over all periods where they are relevant. • Problem: The graph of this function is humongous! • More work is needed to make it efficiently explorable. • But for a start it looks like the right model.
  • 25. Core Data Types • Abstract Syntax Trees • Types • References • Denotations • Symbols
  • 26. Abstract Syntax Trees • For instance, for x * 2:
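As a rough illustration of the tree for x * 2, the following Scala sketch uses hypothetical, simplified node classes (`Ident`, `Select`, `Literal`, `Apply`); dsc's real tree classes carry more structure. The infix operation is represented as a method `*` selected from `x` and applied to the argument `2`.

```scala
// Hypothetical, simplified untyped AST node classes (not dsc's actual definitions).
sealed trait Tree
final case class Ident(name: String) extends Tree
final case class Select(qual: Tree, name: String) extends Tree
final case class Literal(value: Any) extends Tree
final case class Apply(fun: Tree, args: List[Tree]) extends Tree

// x * 2 is a method call: `*` is selected from `x` and applied to 2.
val xTimes2: Tree = Apply(Select(Ident("x"), "*"), List(Literal(2)))

// Render a tree in explicit method-call syntax.
def show(t: Tree): String = t match {
  case Ident(n)       => n
  case Select(q, n)   => s"${show(q)}.$n"
  case Literal(v)     => v.toString
  case Apply(f, args) => s"${show(f)}(${args.map(show).mkString(", ")})"
}
```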
  • 27. Tree Attributes What about tree attributes? In dsc, we simplified as much as we could. We were left with just two attributes: – Position (intrinsic) – Type The job of the type checker is to transform untyped trees into typed trees.
  • 28. Typed Abstract Syntax Trees For instance, for x * 2: The distinction between typed and untyped trees is important enough that it merits being reflected in the type of the AST itself.
  • 29. From Untyped to Typed Trees Idea: parameterize the AST type Tree with the attribute info it carries. Typed tree: tpd.Tree = Tree[Type] Untyped tree: untpd.Tree = Tree[Nothing] This leads to the following class: class Tree[T] { def tpe: T def withType(t: Type): Tree[Type] }
  • 30. Question of Variance • Question: Which of the following two subtype relationships should hold? tpd.Tree <: untpd.Tree or untpd.Tree <: tpd.Tree? • What is the more useful relationship? (the first) • What relationship do the variance rules imply? (the second, since T appears in the covariant result position of tpe) class Tree[? T] { def tpe: T ... }
  • 31. Fixing class Tree class Tree[-T] { def tpe: T @uncheckedVariance def withType(t: Type): Tree[Type] } This is an interesting exception to the variance rules, related to the bottom type Nothing. What can go “wrong” here? Given an untpd.Tree, I expect tpe to return Nothing, but I might get a Type. This shows that it’s good to have an escape hatch in the form of @uncheckedVariance.
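A minimal runnable sketch of this trick, assuming a hypothetical one-field `Type` class and a private constructor that stores `null` for untyped trees. It is not dsc's actual `Tree` class; it only shows that with `-T` and `@uncheckedVariance`, a typed tree can be used where an untyped one is expected, while the reverse is rejected.

```scala
import scala.annotation.unchecked.uncheckedVariance

// Hypothetical stand-in for the compiler's Type hierarchy.
final case class Type(name: String)

object Trees {
  // Contravariant in T, so that tpd.Tree <: untpd.Tree.
  class Tree[-T] private (typeOrNull: Type) {
    // T occurs in a covariant position here; @uncheckedVariance silences the check.
    // Untyped trees store null, so tpe must not be relied on for an untpd.Tree.
    def tpe: T @uncheckedVariance = typeOrNull.asInstanceOf[T @uncheckedVariance]
    def isTyped: Boolean = typeOrNull != null
    def withType(t: Type): Tree[Type] = new Tree[Type](t)
  }

  object Tree {
    def untyped: Tree[Nothing] = new Tree[Nothing](null)
  }
}

object tpd { type Tree = Trees.Tree[Type] }
object untpd { type Tree = Trees.Tree[Nothing] }

val typed: tpd.Tree = Trees.Tree.untyped.withType(Type("Int"))
val widened: untpd.Tree = typed // compiles: tpd.Tree <: untpd.Tree
// val narrowed: tpd.Tree = Trees.Tree.untyped // does not compile
```

The unsoundness the slide mentions is visible here: calling `tpe` on `widened` would statically claim a `Nothing` while actually returning a `Type`, which is why client code should not do that.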
  • 32. Types • Types carry most of the essential information of trees and symbols. • Two kinds of types. – Value types: Int, Int => Int, (Boolean, String) – Types of definitions: (x: Int)Int, Lo..Hi, Class(...) • Represented as subtypes of the same type “Type” for convenience.
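The following sketch shows both kinds of types as subclasses of one base class. The case classes are hypothetical simplifications chosen to mirror the examples on the slide (`ClassInfo` stands in for Class(...), and its fields are assumed), not dsc's real type hierarchy.

```scala
// Hypothetical, heavily simplified type representation.
sealed abstract class Type

// Value types
final case class TypeRef(name: String) extends Type                        // e.g. Int
final case class FunctionType(arg: Type, res: Type) extends Type           // e.g. Int => Int
final case class TupleType(elems: List[Type]) extends Type                 // e.g. (Boolean, String)

// Types of definitions
final case class MethodType(params: List[(String, Type)], res: Type) extends Type // e.g. (x: Int)Int
final case class TypeBounds(lo: Type, hi: Type) extends Type               // e.g. Lo..Hi
final case class ClassInfo(name: String, parents: List[Type]) extends Type // e.g. Class(...)

// Sharing one base class lets both kinds flow through the same APIs,
// while a match can still tell them apart.
def isValueType(tp: Type): Boolean = tp match {
  case _: TypeRef | _: FunctionType | _: TupleType  => true
  case _: MethodType | _: TypeBounds | _: ClassInfo => false
}
```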
  • 33. References case class Select(qual: Tree, name: Name) { /* what is its tpe? */ } case class Ident(name: Name) { /* what is its tpe? */ } • Normally, these tree nodes would carry a “symbol”, which acts as a reference to some definition. • But there are no symbol attributes in dsc, for good reason.
  • 35. A Question of Meaning Question: What is the meaning of obj.fun? It depends on the period! Does that mean that obj.fun has different types, depending on the period? No, trees are immutable!
  • 36. References • A reference is a type • It contains (only) – a name – potentially a prefix • References are immutable; they exist forever.
  • 37. What about Overloads? The name of a TermRef may be shared by several overloaded members of a class. How do we determine which member is meant? (In a nutshell, that’s why overloading is so universally hated by compiler writers) Trick: Allow “signature” as part of term names.
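A sketch of references as immutable values, assuming hypothetical `TypeRef`, `TermRef`, `TermName` and `Signature` classes: two overloaded alternatives of `f` share a simple name, and only the signature attached to the term name tells them apart.

```scala
// Hypothetical sketch: references are immutable types built from a prefix and a name.
sealed abstract class Type
case object NoPrefix extends Type
final case class TypeRef(prefix: Type, name: String) extends Type

// Hypothetical signature: parameter type names plus result type name.
final case class Signature(paramTypes: List[String], resultType: String)

// A term name that may carry a signature, which singles out one overloaded alternative.
final case class TermName(name: String, sig: Option[Signature] = None)

final case class TermRef(prefix: Type, name: TermName) extends Type

// class C { def f(x: Int): Int; def f(s: String): String }
val cRef = TypeRef(NoPrefix, "C")
val fOnInt = TermRef(cRef, TermName("f", Some(Signature(List("Int"), "Int"))))
val fOnString = TermRef(cRef, TermName("f", Some(Signature(List("String"), "String"))))
```

Because the references are plain immutable case classes, two references built from the same prefix and signed name are equal, which makes them usable as cache keys.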
  • 38. What Does A Reference Reference? Surely, a symbol? No! References capture more than a symbol, and sometimes they do not refer to a unique symbol at all.
  • 39. References capture more than a symbol. Consider: class C[T] { def f(x: T): T } val prefix = new C[Int] Then prefix.f resolves to C’s f, but at type (Int)Int, not (T)T. Both pieces of information are part of the meaning of prefix.f.
  • 40. References Sometimes references point to no symbol at all. We have already seen overloading. Here’s another example using union types, which are newly supported by dsc: class A { def f: Int } class B { def f: Int } val prefix: A | B = if (...) new A else new B prefix.f What symbol is referenced by prefix.f ?
  • 41. Denotations The meaning of a reference is a denotation. Non-overloaded denotations carry symbols (maybe) and types (always).
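To make the idea of a denotation concrete, the following sketch models a denotation as an optional symbol plus a type. All classes and the `memberDenot` helper are hypothetical; the type substitution is a minimal stand-in for computing a member's type as seen from a prefix such as C[Int], and the union-type case simply has no symbol.

```scala
// Hypothetical, simplified model of symbols, types and denotations.
sealed abstract class Type
final case class TypeParam(name: String) extends Type
final case class TypeRef(name: String, args: List[Type] = Nil) extends Type
final case class MethodType(params: List[Type], res: Type) extends Type

final case class Symbol(owner: String, name: String, info: Type)

// A denotation: maybe a symbol, always a type.
final case class Denotation(symbol: Option[Symbol], info: Type)

// class C[T] { def f(x: T): T }
val tParam = TypeParam("T")
val fSym = Symbol("C", "f", MethodType(List(tParam), tParam))
val typeParamsOf: Map[String, List[TypeParam]] = Map("C" -> List(tParam))

// Replace type parameters by the arguments bound in `env`.
def subst(tp: Type, env: Map[TypeParam, Type]): Type = tp match {
  case p: TypeParam      => env.getOrElse(p, p)
  case TypeRef(n, args)  => TypeRef(n, args.map(subst(_, env)))
  case MethodType(ps, r) => MethodType(ps.map(subst(_, env)), subst(r, env))
}

// Looking up a member on a prefix such as C[Int] keeps the symbol,
// but gives it the member's type as seen from that prefix.
def memberDenot(prefix: TypeRef, sym: Symbol): Denotation = {
  val env = typeParamsOf(prefix.name).zip(prefix.args).toMap
  Denotation(Some(sym), subst(sym.info, env))
}

// prefix: A | B, where both A and B define f: Int. No single symbol is meant.
val unionMember = Denotation(None, TypeRef("Int"))
```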
  • 42. What Then Is A Symbol? A symbol represents a declaration in some source file. It “lives” as long as the source file is unchanged. It has a denotation depending on the period.
  • 43. Denotation Transformers • How do we compute new denotations from old ones? • For references pre.f: we can recompute the member at the new phase. • For symbols: a phase provides a denotation transformer, for example uncurry.transDenot(<(x: A)(y: B): C>) = <(x: A, y: B): C>
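For the uncurry example, a denotation transformer can be sketched as a function from old denotations to new ones. The types and the `transDenot` function below are hypothetical simplifications; the transformer keeps the name and merges consecutive parameter lists in the type.

```scala
// Hypothetical, simplified types: only what is needed to show uncurrying.
sealed abstract class Type
final case class TypeRef(name: String) extends Type
final case class MethodType(params: List[(String, Type)], res: Type) extends Type

final case class Denotation(name: String, info: Type)

// Merge consecutive parameter lists: (x: A)(y: B): C becomes (x: A, y: B): C.
def uncurry(tp: Type): Type = tp match {
  case MethodType(ps1, MethodType(ps2, res)) => uncurry(MethodType(ps1 ++ ps2, res))
  case other                                 => other
}

// A denotation transformer for a hypothetical uncurry phase: same name, new info.
def transDenot(d: Denotation): Denotation = d.copy(info = uncurry(d.info))
```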
  • 44. Caching Denotations Symbols are memoized functions: Period → Denotation Keep all denotations of a symbol at different phases as a ring.
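A possible shape for such a ring, as a hypothetical sketch: each denotation of a symbol within one run is valid for an interval of phases (the phase numbers are made up), the denotations are linked circularly, and the symbol remembers the last one it returned so that nearby queries are cheap.

```scala
// Hypothetical sketch: a symbol's denotations within one run, each valid for an
// interval of phases and linked into a ring through `next`.
final class SymDenotation(val info: String, val firstPhase: Int, val lastPhase: Int) {
  var next: SymDenotation = this
  def validAt(phase: Int): Boolean = firstPhase <= phase && phase <= lastPhase
}

// Link the given denotations into a ring and return its first element.
def ring(ds: SymDenotation*): SymDenotation = {
  ds.zip(ds.tail :+ ds.head).foreach { case (a, b) => a.next = b }
  ds.head
}

// A symbol behaves like a memoized function from phase to denotation:
// it remembers the last denotation it returned and starts the search there.
final class Symbol(initial: SymDenotation) {
  private var current: SymDenotation = initial

  def denotAt(phase: Int): SymDenotation = {
    var d = current
    while (!d.validAt(phase)) {
      d = d.next
      if (d eq current) throw new NoSuchElementException(s"no denotation at phase $phase")
    }
    current = d
    d
  }
}

val fSym = new Symbol(ring(
  new SymDenotation("(x: T)(y: T): T", 1, 9),
  new SymDenotation("(x: T, y: T): T", 10, 19),
  new SymDenotation("(x: Any, y: Any): Any", 20, 50)
))
```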
  • 45. Putting it All Together • [Figure: ER diagram of the core compiler architecture]
  • 46. Lessons Learned (Not done yet, still learning) • Think databases for modeling. • Think FP for transformations. • Get efficiency through low-level techniques (caching) • But take care not to compromise the high-level semantics.
  • 47. To Find Out More
  • 48. How to make it Fast • Caching – Symbols cache last denotation – NamedTypes do the same – Caches are stamped with validity interval (current period until the next denotation transformer kicks in). – Need to update only if outside of validity period – Member lookup caches denotation Not yet tried: Parallelization. - Could be hard (similar to chess programs)
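The validity-stamp idea can be sketched as a cache that only recomputes when the current period falls outside the interval stamped on the cached value. Periods are simplified to plain `Int`s, and `StampedCache`, `transformerPhases` and `lookup` are hypothetical names used only for illustration.

```scala
// Hypothetical sketch of a validity-stamped cache. Periods are simplified to Ints.
// `compute(p)` returns a value together with the interval [from, until] in which it stays valid.
final class StampedCache[A](compute: Int => (A, Int, Int)) {
  private var cached: Option[(A, Int, Int)] = None
  var recomputations = 0

  def get(period: Int): A = cached match {
    case Some((value, from, until)) if from <= period && period <= until => value
    case _ =>
      recomputations += 1
      val entry = compute(period)
      cached = Some(entry)
      entry._1
  }
}

// Hypothetical phases at which a denotation transformer changes the value.
val transformerPhases = List(1, 10, 20)

// Valid from the last transformer at or before `period` until just before the next one.
def lookup(period: Int): (String, Int, Int) = {
  val from = transformerPhases.filter(_ <= period).max
  val until = transformerPhases.find(_ > period).map(_ - 1).getOrElse(Int.MaxValue)
  (s"denotation@$from", from, until)
}

val cache = new StampedCache[String](lookup)
```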
  • 49. Many Forms of Caches • Lazy vals • Memoization • LRU Caches • Rely on – Purely functional semantics – Access to low-level imperative implementation code. – Important to keep the levels of abstraction apart!
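Two of the cache forms above, sketched in Scala: `memoize` exposes a pure function while updating a mutable map internally, and `LruCache` uses `java.util.LinkedHashMap` in access order with an overridden `removeEldestEntry`. Both helpers are illustrations, not code from dsc, and memoization is only safe when the wrapped function is pure.

```scala
import scala.collection.mutable

// Memoization: callers see a pure function A => B, while a mutable hash map
// hides behind the interface. This is only sound if `f` itself is pure.
def memoize[A, B](f: A => B): A => B = {
  val cache = mutable.HashMap.empty[A, B]
  (a: A) => cache.getOrElseUpdate(a, f(a))
}

// An LRU cache on top of java.util.LinkedHashMap in access order:
// the least recently used entry is evicted once `capacity` is exceeded.
final class LruCache[K, V](capacity: Int)
    extends java.util.LinkedHashMap[K, V](16, 0.75f, true) {
  override protected def removeEldestEntry(eldest: java.util.Map.Entry[K, V]): Boolean =
    size() > capacity
}
```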
  • 50. Optimization: Phase Fusion • For modularity reasons, phases should be small. Each phase should do one self-contained transform. • But that means we end up with many phases. • Problem: Repeated tree rewriting is a performance killer. • Solution: Automatically fuse phases into one tree traversal. – Relies on a design pattern and a small amount of introspection.
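A minimal sketch of the fusion idea, with hypothetical names (`NodePhase`, `DropMulOne`, `DropAddZero`, `fused`) on a toy arithmetic tree rather than Scala ASTs: each phase rewrites a single node, and one bottom-up traversal runs every phase at each node instead of traversing the tree once per phase.

```scala
// Hypothetical toy tree; stands in for compiler ASTs.
sealed trait Tree
final case class Num(value: Int) extends Tree
final case class Add(l: Tree, r: Tree) extends Tree
final case class Mul(l: Tree, r: Tree) extends Tree

// A small phase only says how to rewrite one node whose children are already transformed.
trait NodePhase {
  def transformNode(t: Tree): Tree
}

object DropMulOne extends NodePhase {
  def transformNode(t: Tree): Tree = t match {
    case Mul(l, Num(1)) => l
    case other          => other
  }
}

object DropAddZero extends NodePhase {
  def transformNode(t: Tree): Tree = t match {
    case Add(l, Num(0)) => l
    case other          => other
  }
}

// One traversal for all phases: rebuild the children first, then run
// each phase's node rewrite on the result, in phase order.
def fused(phases: List[NodePhase])(t: Tree): Tree = {
  val withNewChildren = t match {
    case Add(l, r) => Add(fused(phases)(l), fused(phases)(r))
    case Mul(l, r) => Mul(fused(phases)(l), fused(phases)(r))
    case leaf      => leaf
  }
  phases.foldLeft(withNewChildren)((tree, phase) => phase.transformNode(tree))
}
```

This is equivalent to running the phases one after another only when each phase's rewrite of a node depends on that node and its already-transformed children; a phase that must see the complete output of the previous phase cannot be fused this way.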