Dive Into Catalyst
QCon Beijing 2015
Cheng Lian <lian@databricks.com>
“In almost every computation a great variety of arrangements
for the succession of the processes is possible, and various
considerations must influence the selection amongst them for
the purposes of a Calculating Engine. One essential object is
to choose that arrangement which shall tend to reduce to a
minimum the time necessary for completing the calculation.”
Ada Lovelace, 1843
What Is Catalyst?
Catalyst is a functional, extensible query
optimizer used by Spark SQL.
- Leverages advanced FP language (Scala)
features
- Contains a library for representing trees and
applying rules on them
Trees
Trees are the main data structure used in Catalyst
- A tree is composed of node objects
- A node has a node type and zero or more
children
- Node types are defined in Scala as
subclasses of the TreeNode class
Trees
Examples
- Literal(value: Int)
- Attribute(name: String)
- Add(left: TreeNode, right: TreeNode)
Add(
  Attribute(x),
  Add(Literal(1), Literal(2)))
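The node types above can be sketched as plain Scala case classes. This is a toy model for illustration only, not Catalyst's actual class definitions:

```scala
// Toy tree: node types are case classes, children are constructor fields.
sealed trait TreeNode
case class Literal(value: Int) extends TreeNode
case class Attribute(name: String) extends TreeNode
case class Add(left: TreeNode, right: TreeNode) extends TreeNode

// The example tree for the expression x + (1 + 2)
val tree = Add(Attribute("x"), Add(Literal(1), Literal(2)))
```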
Rules
Rules are functions that transform trees
- Typically functional
- Leverage pattern matching
- Used together with
- TreeNode.transform (synonym of transformDown)
- TreeNode.transformDown (pre-order traversal)
- TreeNode.transformUp (post-order traversal)
Rules
Examples
- Simple constant folding
tree.transform {
  case Add(Literal(c1), Literal(c2)) => Literal(c1 + c2)
  case Add(left, Literal(0)) => left
  case Add(Literal(0), right) => right
}
In this rule, the left-hand side of each case clause (e.g. Add(Literal(c1), Literal(c2))) is the pattern, the right-hand side (Literal(c1 + c2)) is the transformation, and each complete case clause is a rule.
Brainsuck
Brainsuck is an optimizing compiler for the Brainfuck programming language.
- Yeah, it's not a typo; I just wanna avoid saying dirty words all the time during the talk :^)
- Inspired by brainfuck optimization strategies by Mats Linander
- Goals
  - Illustrate the power and conciseness of the Catalyst optimizer
  - Stay as short as possible so that it squeezes into a single talk (292 LOC)
- Non-goal
  - Build a high-performance practical compiler
- Contains
  - A parser, IR, and interpreter for the language
  - A simplified version of the Catalyst library, consisting of the TreeNode class and the rule execution engine
  - A set of optimization rules
object MergeMoves extends Rule[Instruction] {
  override def apply(tree: Instruction) = tree.transformUp {
    case Move(n, Move(m, next)) => if (n + m == 0) next else Move(n + m, next)
  }
}

object MergeAdds extends Rule[Instruction] {
  override def apply(tree: Instruction) = tree.transformUp {
    case Add(n, Add(m, next)) => if (n + m == 0) next else Add(n + m, next)
  }
}

object Clears extends Rule[Instruction] {
  override def apply(tree: Instruction) = tree.transformUp {
    case Loop(Add(_, Halt), next) => Clear(next)
  }
}

object Scans extends Rule[Instruction] {
  override def apply(tree: Instruction) = tree.transformUp {
    case Loop(Move(n, Halt), next) => Scan(n, next)
  }
}

object MultisAndCopies extends Rule[Instruction] {
  override def apply(tree: Instruction) = tree.transform {
    case Loop(Add(-1, MoveAddPairs(seq, offset, Move(n, Halt))), next) if n == -offset =>
      seq.foldRight(Clear(next): Instruction) {
        case ((distance, 1), code) => Copy(distance, code)
        case ((distance, increment), code) => Multi(distance, increment, code)
      }
  }
}

object MoveAddPairs {
  type ResultType = (List[(Int, Int)], Int, Instruction)

  def unapply(tree: Instruction): Option[ResultType] = {
    def loop(tree: Instruction, offset: Int): Option[ResultType] = tree match {
      case Move(n, Add(m, inner)) =>
        loop(inner, offset + n).map { case (seq, finalOffset, next) =>
          ((offset + n, m) :: seq, finalOffset, next)
        }
      case inner => Some((Nil, offset, inner))
    }
    loop(tree, 0)
  }
}
The code above is the complete set of optimization rules in Brainsuck, mirroring the optimization rules in bfoptimization by Mats Linander.
Machine Model
A dead-simple simulation of a Turing machine
(figure: an infinite tape of cells, e.g. 0 0 3 1 …, with a pointer at the current cell)
Instruction Set

Instruction  Meaning
(Initial)    char array[infinitely large size] = {0}; char *ptr = array;
>            ++ptr;             // Move right
<            --ptr;             // Move left
+            ++*ptr;            // Increment
-            --*ptr;            // Decrement
.            putchar(*ptr);     // Put a char
,            *ptr = getchar();  // Read a char
[            while (*ptr) {     // Loop until *ptr is 0
]            }                  // End of the loop
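To make the semantics concrete, here is a naive interpreter for the eight instructions. This is only a sketch: the tape is fixed at 30,000 cells rather than infinite, and runBf is a name invented here, not part of Brainsuck:

```scala
import scala.collection.mutable

// Naive Brainfuck interpreter following the C semantics in the table above.
def runBf(src: String, input: List[Char] = Nil): String = {
  val tape = Array.fill(30000)(0) // "infinitely large" in practice
  val out  = new StringBuilder
  var in   = input

  // Precompute matching bracket positions.
  val jump = {
    val m     = mutable.Map.empty[Int, Int]
    val stack = mutable.Stack.empty[Int]
    for ((c, i) <- src.zipWithIndex) c match {
      case '[' => stack.push(i)
      case ']' => val j = stack.pop(); m(i) = j; m(j) = i
      case _   =>
    }
    m
  }

  var ptr = 0
  var pc  = 0
  while (pc < src.length) {
    src(pc) match {
      case '>' => ptr += 1
      case '<' => ptr -= 1
      case '+' => tape(ptr) += 1
      case '-' => tape(ptr) -= 1
      case '.' => out += tape(ptr).toChar
      case ',' => in match {
        case c :: rest => tape(ptr) = c.toInt; in = rest
        case Nil       => // no more input: leave the cell unchanged
      }
      case '[' => if (tape(ptr) == 0) pc = jump(pc)
      case ']' => if (tape(ptr) != 0) pc = jump(pc)
      case _   => // any other character is a comment
    }
    pc += 1
  }
  out.toString
}
```

For example, `runBf("++++++++[>++++++++<-]>+.")` computes 8 * 8 + 1 = 65 and prints "A".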
Sample Program
++++++++[>++++[>++>+++>+++>
+<<<<-]>+>+>->>+[<]<-]>>.>
---.+++++++..+++.>>.<-.<.++
+.------.--------.>>+.>++.
Output:
Hello World!
Brainsuck Instruction Set
Instruction       Meaning
Move(n)           ptr += n;
Add(n)            *ptr += n;
Loop(body)        while (*ptr) { body; }
Out               putchar(*ptr);
In                *ptr = getchar();
Scan(n)           while (*ptr) { ptr += n; }
Clear             *ptr = 0;
Copy(offset)      *(ptr + offset) = *ptr;
Multi(offset, n)  *(ptr + offset) += (*ptr) * n;
Halt              abort();
Scan, Clear, Copy, and Multi have no direct Brainfuck counterparts; they exist purely for optimization.
TreeNode
Forms trees, and supports functional
transformations
trait TreeNode[BaseType <: TreeNode[BaseType]] {
  self: BaseType =>

  def children: Seq[BaseType]
  protected def makeCopy(args: Seq[BaseType]): BaseType

  def transform(rule: PartialFunction[BaseType, BaseType]): BaseType = ...
  def transformDown(rule: PartialFunction[BaseType, BaseType]): BaseType = ...
  def transformUp(rule: PartialFunction[BaseType, BaseType]): BaseType = ...
  ...
}
TreeNode
Helper classes for leaf nodes and unary nodes
trait LeafNode[BaseType <: TreeNode[BaseType]] extends TreeNode[BaseType] {
  self: BaseType =>
  override def children = Seq.empty[BaseType]
  override def makeCopy(args: Seq[BaseType]) = this
}

trait UnaryNode[BaseType <: TreeNode[BaseType]] extends TreeNode[BaseType] {
  self: BaseType =>
  def child: BaseType
  override def children = Seq(child)
}
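The transform bodies elided above ("...") are only a few lines each. This is a simplified sketch of how they might be implemented (it mirrors Catalyst's approach but is not its actual code), plus a tiny hypothetical node type to exercise it:

```scala
trait TreeNode[BaseType <: TreeNode[BaseType]] {
  self: BaseType =>

  def children: Seq[BaseType]
  protected def makeCopy(args: Seq[BaseType]): BaseType

  def transform(rule: PartialFunction[BaseType, BaseType]): BaseType =
    transformDown(rule)

  // Pre-order: apply the rule to this node first, then recurse into children.
  def transformDown(rule: PartialFunction[BaseType, BaseType]): BaseType = {
    val afterRule = rule.applyOrElse(this, identity[BaseType])
    afterRule.makeCopy(afterRule.children.map(_.transformDown(rule)))
  }

  // Post-order: transform children first, then apply the rule to the result.
  def transformUp(rule: PartialFunction[BaseType, BaseType]): BaseType = {
    val afterChildren = makeCopy(children.map(_.transformUp(rule)))
    rule.applyOrElse(afterChildren, identity[BaseType])
  }
}

// A minimal demo node type (hypothetical; Brainsuck's real IR comes next).
sealed trait Expr extends TreeNode[Expr]
case class Lit(n: Int) extends Expr {
  def children = Nil
  protected def makeCopy(args: Seq[Expr]) = this
}
case class Plus(l: Expr, r: Expr) extends Expr {
  def children = Seq(l, r)
  protected def makeCopy(args: Seq[Expr]) = Plus(args(0), args(1))
}

val folded = Plus(Lit(1), Plus(Lit(2), Lit(3))).transformUp {
  case Plus(Lit(a), Lit(b)) => Lit(a + b)
}
// folded == Lit(6)
```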
IR
Represented as tree nodes
sealed trait Instruction extends TreeNode[Instruction] {
  def next: Instruction
  def run(machine: Machine): Unit
}

sealed trait LeafInstruction extends Instruction with LeafNode[Instruction] {
  self: Instruction =>
  def next: Instruction = this
}

sealed trait UnaryInstruction extends Instruction with UnaryNode[Instruction] {
  self: Instruction =>
  def next: Instruction = child
}
IR
Examples
case object Halt extends LeafInstruction {
  override def run(machine: Machine) = ()
}

case class Add(n: Int, child: Instruction) extends UnaryInstruction {
  override protected def makeCopy(args: Seq[Instruction]) =
    copy(child = args.head)
  override def run(machine: Machine) = machine.value += n
}

case class Move(n: Int, child: Instruction) extends UnaryInstruction {
  override protected def makeCopy(args: Seq[Instruction]) =
    copy(child = args.head)
  override def run(machine: Machine) = machine.pointer += n
}
IR
Examples
// +[<]+>.
Add(1,
  Loop(
    Move(-1,
      Halt),
    Add(1,
      Move(1,
        Out(
          Halt)))))
(figure: the same program drawn as a tree of instruction nodes)
Note the CPS style: each instruction node links directly to its continuation (next), so a whole program is a single deeply nested tree.
Rule Executor
Rules are functions organized as Batches
trait Rule[BaseType <: TreeNode[BaseType]] {
  def apply(tree: BaseType): BaseType
}

case class Batch[BaseType <: TreeNode[BaseType]](
  name: String,
  rules: Seq[Rule[BaseType]],
  strategy: Strategy)
Rule Executor
Each Batch has an execution strategy
sealed trait Strategy {
  def maxIterations: Int
}
A Batch of rules can be repeatedly applied to a
tree according to some Strategy
Apply once and only once:

case object Once extends Strategy {
  val maxIterations = 1
}
Apply repeatedly until the tree doesn't change:

final case class FixedPoint(maxIterations: Int) extends Strategy

object FixedPoint {
  val Unlimited = FixedPoint(-1)
}
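Putting Batch and Strategy together, the executor's core loop looks roughly like this. It is a standalone sketch, not Catalyst's actual RuleExecutor: the tree type is abstracted to any T with structural equality, and negative maxIterations means unlimited:

```scala
// Self-contained model of batch execution (simplified, hypothetical names).
trait Rule[T] { def apply(tree: T): T }

sealed trait Strategy { def maxIterations: Int }
case object Once extends Strategy { val maxIterations = 1 }
final case class FixedPoint(maxIterations: Int) extends Strategy
object FixedPoint { val Unlimited = FixedPoint(-1) }

case class Batch[T](name: String, rules: Seq[Rule[T]], strategy: Strategy)

def execute[T](batches: Seq[Batch[T]])(tree: T): T =
  batches.foldLeft(tree) { (start, batch) =>
    val limit = batch.strategy.maxIterations
    @annotation.tailrec
    def run(current: T, iteration: Int): T = {
      // One pass: apply every rule in the batch, in order.
      val next = batch.rules.foldLeft(current)((t, rule) => rule(t))
      // Stop at a fixed point, or when maxIterations is reached.
      if (next == current || (limit >= 0 && iteration >= limit)) next
      else run(next, iteration + 1)
    }
    run(start, 1)
  }
```

For instance, a toy rule over Int that decrements positive numbers reaches 0 under FixedPoint.Unlimited but only takes one step under Once.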
Finally…
Optimizations!
Optimization 1: Merge Moves
Patterns:
- >>>>
- <<<<
- >><<<>>
Basic idea:
- Merge adjacent moves to save instructions
Optimization 1: Merge Moves
Examples:
- Move(1, Move(1, next)) ⇒
  Move(2, next)
- Move(1, Move(-1, next)) ⇒
  next
Optimization 1: Merge Moves
object MergeMoves extends Rule[Instruction] {
  override def apply(tree: Instruction) = tree.transformUp {
    case Move(n, Move(m, next)) =>
      if (n + m == 0) next else Move(n + m, next)
  }
}
Strategy:
- FixedPoint
We’d like to merge all adjacent moves until
none can be found
Optimization 2: Merge Adds
Patterns
- ++++
- ----
- ++---++
Basic idea:
- Merge adjacent adds to save instructions
Optimization 2: Merge Adds
Examples:
- Add(1, Add(1, next)) ⇒
  Add(2, next)
- Add(1, Add(-1, next)) ⇒
  next
Optimization 2: Merge Adds
object MergeAdds extends Rule[Instruction] {
  override def apply(tree: Instruction) = tree.transformUp {
    case Add(n, Add(m, next)) =>
      if (n + m == 0) next else Add(n + m, next)
  }
}
Strategy:
- FixedPoint
We’d like to merge all adjacent adds until
none can be merged
Optimization 3: Clears
Patterns
- [-]
- [--]
- [+]
Basic idea:
- These loops actually zero the current cell.
Find and transform them to Clears.
Optimization 3: Clears
Examples:
- Loop(Add(-1, Halt), next) ⇒
Clear(next)
- Loop(Add(2, Halt), next) ⇒
Clear(next)
Optimization 3: Clears
object Clears extends Rule[Instruction] {
  override def apply(tree: Instruction) = tree.transform {
    case Loop(Add(_, Halt), next) =>
      Clear(next)
  }
}
Strategy:
- Once
After merging all adds, we can find all “clear”
loops within a single run
Optimization 4: Scans
Patterns
- [<]
- [<<]
- [>]
Basic idea:
- These loops move the pointer to the nearest zero cell in one direction. Find and transform them into Scans.
Optimization 4: Scans
Examples:
- Loop(Move(-1, Halt), next) ⇒
Scan(-1, next)
- Loop(Move(2, Halt), next) ⇒
Scan(2, next)
Optimization 4: Scans
object Scans extends Rule[Instruction] {
  override def apply(tree: Instruction) = tree.transform {
    case Loop(Move(n, Halt), next) =>
      Scan(n, next)
  }
}
Strategy:
- Once
After merging all moves, we can find all “scan”
loops within a single run
Optimization 5: Multiplications
Patterns:
- [->>++<<]
  *(ptr + 2) += (*ptr) * 2;
- [->>++<<<--->]
  *(ptr + 2) += (*ptr) * 2;
  *(ptr - 1) += (*ptr) * (-3);
Optimization 5: Multiplications
(animation: running [->>++<<] with the current cell holding 2; each iteration decrements the current cell and adds 2 to the cell two positions to the right, so the tape evolves 0 2 0 0 0 … → 0 1 0 2 0 … → 0 0 0 4 0 …)
Optimization 5: Multiplications
Examples:
- Loop(
    Add(-1,
      Move(2,
        Add(2,
          Move(-2, Halt)))), next) ⇒
  Multi(2, 2,
    Clear(next))
Optimization 5: Multiplications
Examples:
- Loop(
    Add(-1,
      Move(2, Add(2,
        Move(-3, Add(-3,
          Move(1, Halt)))))), next) ⇒
  Multi(2, 2,
    Multi(-1, -3,
      Clear(next)))
Optimization 5: Multiplications
This pattern is harder to recognize, since it has variable length:

tree.transform {
  case Loop(Add(-1, ???)) => ???
}
Optimization 5: Multiplications
What do we want to extract from the pattern?
- (offset, step) pairs
  For generating the corresponding sequence of Multi instructions
- finalOffset
  To check whether the pointer is reset to the original cell at the end of the loop body
Optimization 5: Multiplications
Examples:
- [->>++<<]
  Pairs: (2, 2) :: Nil, finalOffset: 2
- [->>++<---<]
  Pairs: (2, 2) :: (1, -3) :: Nil, finalOffset: 1
- [-<<<+++>-->>]
  Pairs: (-3, 3) :: (-2, -2) :: Nil, finalOffset: -2
Extractor Objects
In Scala, patterns can be defined as extractor objects with a method named unapply. Such methods can be used to recognize arbitrarily complex patterns.
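For example, a minimal extractor unrelated to Brainsuck, just to show the unapply contract:

```scala
// An extractor that recognizes even numbers and extracts the half.
object Even {
  def unapply(n: Int): Option[Int] = if (n % 2 == 0) Some(n / 2) else None
}

def describe(n: Int): String = n match {
  case Even(half) => s"even, half is $half"
  case _          => "odd"
}
// describe(10) == "even, half is 5"
```

When unapply returns Some(...), the pattern matches and the wrapped values are bound; when it returns None, matching falls through to the next case.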
Optimization 5: Multiplications
An extractor object which recognizes move-add
pairs and extracts offsets and steps:
object MoveAddPairs {
  type ResultType = (List[(Int, Int)], Int, Instruction)

  def unapply(tree: Instruction): Option[ResultType] = {
    def loop(tree: Instruction, offset: Int): Option[ResultType] = tree match {
      case Move(n, Add(m, inner)) =>
        loop(inner, offset + n).map { case (seq, finalOffset, next) =>
          ((offset + n, m) :: seq, finalOffset, next)
        }
      case inner => Some((Nil, offset, inner))
    }
    loop(tree, 0)
  }
}
Optimization 5: Multiplications
object MultisAndCopies extends Rule[Instruction] {
  override def apply(tree: Instruction): Instruction = tree.transform {
    case Loop(Add(-1, MoveAddPairs(seq, offset, Move(n, Halt))), next)
        if n == -offset =>
      seq.foldRight(Clear(next): Instruction) {
        case ((distance, increment), code) => Multi(distance, increment, code)
      }
  }
}
Strategy:
- Once
After merging moves and adds, we can find
all multiplication loops within a single run
Optimization 6: Copies
Patterns:
- [->>+<<]
- [->>+<<<+>]
Basic idea:
- These are actually special forms of multiplication loops, with 1 as the multiplier
- Replace Multi with a cheaper Copy
Optimization 6: Copies
Examples:
- Loop(
    Add(-1,
      Move(2, Add(1,
        Move(-3, Add(1,
          Move(1, Halt)))))), next) ⇒
  Copy(2,
    Copy(-1,
      Clear(next)))
Optimization 6: Copies
object MultisAndCopies extends Rule[Instruction] {
  override def apply(tree: Instruction): Instruction = tree.transform {
    case Loop(Add(-1, MoveAddPairs(seq, offset, Move(n, Halt))), next)
        if n == -offset =>
      seq.foldRight(Clear(next): Instruction) {
        case ((distance, 1), code)         => Copy(distance, code)
        case ((distance, increment), code) => Multi(distance, increment, code)
      }
  }
}
Strategy:
- Once
After merging moves and adds, we can find
all copy loops within a single run
Composing the Optimizer
val optimizer = new Optimizer {
  override def batches = Seq(
    Batch(
      "Contraction",
      MergeMoves :: MergeAdds :: Nil,
      FixedPoint.Unlimited),

    Batch(
      "LoopSimplification",
      Clears :: Scans :: MultisAndCopies :: Nil,
      Once)
  ).take(optimizationLevel)
}
Demo: Hanoi Tower
Towers of Hanoi in Brainf*ck
Written by Clifford Wolf <http://www.clifford.at/bfcpu/>
(demo output: ASCII-art animation of the discs moving between three pegs)
How does Spark SQL use Catalyst
The following structures are all represented as
TreeNode objects in Spark SQL
- Expressions
- Unresolved logical plans
- Resolved logical plans
- Optimized logical plans
- Physical plans
How does Spark SQL use Catalyst
The actual Catalyst library in Spark SQL is
more complex than the one shown in this talk.
It’s used for the following purposes:
- Analysis (logical ⇒ logical)
- Logical optimization (logical ⇒ logical)
- Physical planning (logical ⇒ physical)
References
- Spark SQL and Catalyst
  - Deep Dive into Spark SQL's Catalyst Optimizer (Databricks blog)
  - Spark SQL: Relational Data Processing in Spark (accepted by SIGMOD 2015)
- Brainfuck language and optimization
  - Brainfuck optimization strategies
  - Optimizing brainfuck compiler
  - Wikipedia: Brainfuck
- Brainsuck code repository
  - https://github.com/liancheng/brainsuck
Thanks
Q & A
 
Full stack analytics with Hadoop 2
Full stack analytics with Hadoop 2Full stack analytics with Hadoop 2
Full stack analytics with Hadoop 2
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
 
臺灣高中數學講義 - 第一冊 - 數與式
臺灣高中數學講義 - 第一冊 - 數與式臺灣高中數學講義 - 第一冊 - 數與式
臺灣高中數學講義 - 第一冊 - 數與式
 
Think Like Spark: Some Spark Concepts and a Use Case
Think Like Spark: Some Spark Concepts and a Use CaseThink Like Spark: Some Spark Concepts and a Use Case
Think Like Spark: Some Spark Concepts and a Use Case
 
Resilient Distributed Datasets
Resilient Distributed DatasetsResilient Distributed Datasets
Resilient Distributed Datasets
 
London Spark Meetup Project Tungsten Oct 12 2015
London Spark Meetup Project Tungsten Oct 12 2015London Spark Meetup Project Tungsten Oct 12 2015
London Spark Meetup Project Tungsten Oct 12 2015
 
Apache Spark: killer or savior of Apache Hadoop?
Apache Spark: killer or savior of Apache Hadoop?Apache Spark: killer or savior of Apache Hadoop?
Apache Spark: killer or savior of Apache Hadoop?
 
IBM Spark Meetup - RDD & Spark Basics
IBM Spark Meetup - RDD & Spark BasicsIBM Spark Meetup - RDD & Spark Basics
IBM Spark Meetup - RDD & Spark Basics
 
Apache Spark Introduction @ University College London
Apache Spark Introduction @ University College LondonApache Spark Introduction @ University College London
Apache Spark Introduction @ University College London
 
Introduction to Spark SQL & Catalyst
Introduction to Spark SQL & CatalystIntroduction to Spark SQL & Catalyst
Introduction to Spark SQL & Catalyst
 
Apache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell
Apache® Spark™ 1.6 presented by Databricks co-founder Patrick WendellApache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell
Apache® Spark™ 1.6 presented by Databricks co-founder Patrick Wendell
 

Similar to Dive into Catalyst

Spark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit
 
Workshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
Workshop "Can my .NET application use less CPU / RAM?", Yevhen TatarynovWorkshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
Workshop "Can my .NET application use less CPU / RAM?", Yevhen TatarynovFwdays
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaSpark Summit
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...Databricks
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionChetan Khatri
 
Deep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S OptimizerDeep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S OptimizerSpark Summit
 
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Chris Fregly
 
Go Faster With Native Compilation
Go Faster With Native CompilationGo Faster With Native Compilation
Go Faster With Native CompilationPGConf APAC
 
Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2Rajeev Rastogi (KRR)
 
Building an ML Platform with Ray and MLflow
Building an ML Platform with Ray and MLflowBuilding an ML Platform with Ray and MLflow
Building an ML Platform with Ray and MLflowDatabricks
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsDatabricks
 
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docxINFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docxcarliotwaycave
 
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course PROIDEA
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimizationg3_nittala
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)Paul Chao
 
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...Ontico
 
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized EngineApache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized EngineDataWorks Summit
 

Similar to Dive into Catalyst (20)

Spark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van Hovell
 
Workshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
Workshop "Can my .NET application use less CPU / RAM?", Yevhen TatarynovWorkshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
Workshop "Can my .NET application use less CPU / RAM?", Yevhen Tatarynov
 
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-MallaKerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
Kerberizing Spark: Spark Summit East talk by Abel Rincon and Jorge Lopez-Malla
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in production
 
Deep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S OptimizerDeep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
 
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + ...
 
Go Faster With Native Compilation
Go Faster With Native CompilationGo Faster With Native Compilation
Go Faster With Native Compilation
 
Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2Go faster with_native_compilation Part-2
Go faster with_native_compilation Part-2
 
Building an ML Platform with Ray and MLflow
Building an ML Platform with Ray and MLflowBuilding an ML Platform with Ray and MLflow
Building an ML Platform with Ray and MLflow
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDs
 
Hadoop map reduce concepts
Hadoop map reduce conceptsHadoop map reduce concepts
Hadoop map reduce concepts
 
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docxINFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
 
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
 
Advance python
Advance pythonAdvance python
Advance python
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimization
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
 
Spark workshop
Spark workshopSpark workshop
Spark workshop
 
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
 
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized EngineApache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
Apache Tajo: Query Optimization Techniques and JIT-based Vectorized Engine
 

Recently uploaded

The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 

Recently uploaded (20)

The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 

Dive into Catalyst

  • 1. Dive Into Catalyst QCon Beijing 2015 Cheng Lian <lian@databricks.com>
  • 2. “In almost every computation a great variety of arrangements for the succession of the processes is possible, and various considerations must influence the selection amongst them for the purposes of a Calculating Engine. One essential object is to choose that arrangement which shall tend to reduce to a minimum the time necessary for completing the calculation.” Ada Lovelace, 1843
  • 3. What Is Catalyst? Catalyst is a functional, extensible query optimizer used by Spark SQL. - Leverages advanced FP language (Scala) features - Contains a library for representing trees and applying rules on them
  • 4. Trees Tree is the main data structure used in Catalyst - A tree is composed of node objects - A node has a node type and zero or more children - Node types are defined in Scala as subclasses of the TreeNode class
  • 5. Trees Examples - Literal(value: Int) - Attribute(name: String) - Add(left: TreeNode, right: TreeNode) The tree for x + (1 + 2): Add( Attribute("x"), Add(Literal(1), Literal(2)))
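The example node types on this slide can be sketched as plain Scala case classes. This is an illustrative toy, not Catalyst's actual definitions; the names `Expr` and `children` here are this sketch's own assumptions.

```scala
// Toy sketch of the slide's example node types (not Catalyst's real classes).
// Case classes give us pattern matching and structural equality for free.
sealed trait Expr { def children: Seq[Expr] }
case class Literal(value: Int) extends Expr { def children = Nil }
case class Attribute(name: String) extends Expr { def children = Nil }
case class Add(left: Expr, right: Expr) extends Expr { def children = Seq(left, right) }

// The tree for x + (1 + 2):
val tree = Add(Attribute("x"), Add(Literal(1), Literal(2)))
```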
  • 6. Rules Rules are functions that transform trees - Typically functional - Leverage pattern matching - Used together with - TreeNode.transform (synonym of transformDown) - TreeNode.transformDown (pre-order traversal) - TreeNode.transformUp (post-order traversal)
  • 7. Rules Examples - Simple constant folding tree.transform { case Add(Literal(c1), Literal(c2)) => Literal(c1 + c2) case Add(left, Literal(0)) => left case Add(Literal(0), right) => right }
  • 8. Rules Examples - Simple constant folding tree.transform { case Add(Literal(c1), Literal(c2)) => Literal(c1 + c2) case Add(left, Literal(0)) => left case Add(Literal(0), right) => right } Pattern
  • 9. Rules Examples - Simple constant folding tree.transform { case Add(Literal(c1), Literal(c2)) => Literal(c1 + c2) case Add(left, Literal(0)) => left case Add(Literal(0), right) => right } Transformation
  • 10. Rules Examples - Simple constant folding tree.transform { case Add(Literal(c1), Literal(c2)) => Literal(c1 + c2) case Add(left, Literal(0)) => left case Add(Literal(0), right) => right } Rule
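The constant-folding rule on these slides can be made runnable as a self-contained sketch. The toy `Expr` types are restated so the snippet stands alone, and `transform` is approximated by an explicit bottom-up rewrite (an assumption of this sketch; Catalyst's `transform` is a generic tree traversal).

```scala
// Self-contained sketch of the slide's constant-folding rule.
// `transform` is approximated here by an explicit bottom-up rewrite.
sealed trait Expr
case class Literal(value: Int) extends Expr
case class Attribute(name: String) extends Expr
case class Add(left: Expr, right: Expr) extends Expr

def fold(e: Expr): Expr = e match {
  case Add(l, r) => (fold(l), fold(r)) match {
    case (Literal(c1), Literal(c2)) => Literal(c1 + c2)  // fold two constants
    case (left, Literal(0))         => left              // x + 0 => x
    case (Literal(0), right)        => right             // 0 + x => x
    case (left, right)              => Add(left, right)
  }
  case other => other
}
```

Folding x + (1 + 2) this way yields x + 3, mirroring the slide's three cases.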
  • 11. Brainsuck Brainsuck is an optimizing compiler of the Brainfuck programming language
  • 12. Brainsuck Brainsuck is an optimizing compiler of the Brainfuck programming language - Yeah, it’s not a typo, just wanna avoid saying dirty words all the time during the talk :^)
  • 13. Brainsuck Brainsuck is an optimizing compiler of the Brainfuck programming language - Inspired by brainfuck optimization strategies by Mats Linander
  • 14. Brainsuck Brainsuck is an optimizing compiler of the Brainfuck programming language - Goals Illustrate the power and conciseness of the Catalyst optimizer
  • 15. Brainsuck Brainsuck is an optimizing compiler of the Brainfuck programming language - Goals As short as possible so that I can squeeze it into a single talk (292 loc)
  • 16. Brainsuck Brainsuck is an optimizing compiler of the Brainfuck programming language - Non-goal Build a high performance practical compiler
  • 17. Brainsuck Brainsuck is an optimizing compiler of the Brainfuck programming language - Contains - Parser, IR, and interpreter of the language - A simplified version of the Catalyst library, consisting of the TreeNode class and the rule execution engine - A set of optimization rules
  • 18. Optimization rules in Brainsuck (the slide shows them alongside the corresponding rules in Mats Linander's bfoptimization):

object MergeMoves extends Rule[Instruction] {
  override def apply(tree: Instruction) = tree.transformUp {
    case Move(n, Move(m, next)) => if (n + m == 0) next else Move(n + m, next)
  }
}

object MergeAdds extends Rule[Instruction] {
  override def apply(tree: Instruction) = tree.transformUp {
    case Add(n, Add(m, next)) => if (n + m == 0) next else Add(n + m, next)
  }
}

object Clears extends Rule[Instruction] {
  override def apply(tree: Instruction) = tree.transformUp {
    case Loop(Add(_, Halt), next) => Clear(next)
  }
}

object Scans extends Rule[Instruction] {
  override def apply(tree: Instruction) = tree.transformUp {
    case Loop(Move(n, Halt), next) => Scan(n, next)
  }
}

object MultisAndCopies extends Rule[Instruction] {
  override def apply(tree: Instruction) = tree.transform {
    case Loop(Add(-1, MoveAddPairs(seq, offset, Move(n, Halt))), next) if n == -offset =>
      seq.foldRight(Clear(next): Instruction) {
        case ((distance, 1), code)         => Copy(distance, code)
        case ((distance, increment), code) => Multi(distance, increment, code)
      }
  }
}

object MoveAddPairs {
  type ResultType = (List[(Int, Int)], Int, Instruction)

  def unapply(tree: Instruction): Option[ResultType] = {
    def loop(tree: Instruction, offset: Int): Option[ResultType] = tree match {
      case Move(n, Add(m, inner)) =>
        loop(inner, offset + n).map { case (seq, finalOffset, next) =>
          ((offset + n, m) :: seq, finalOffset, next)
        }
      case inner => Some((Nil, offset, inner))
    }
    loop(tree, 0)
  }
}
  • 19. Machine Model A dead simple simulation of a Turing machine (tape diagram omitted: an infinite array of cells plus a data pointer)
  • 20. Instruction Set Instruction Meaning (Initial) char array[infinitely large size] = {0}; char *ptr = array; > ++ptr; // Move right < --ptr; // Move left + ++*ptr; // Increment - --*ptr; // Decrement . putchar(*ptr); // Put a char , *ptr = getchar(); // Read a char [ while (*ptr) { // Loop until *ptr is 0 ] } // End of the loop
  • 23. Brainsuck Instruction Set Instruction Meaning Move(n) ptr += n; Add(n) *ptr += n; Loop(body) while (*ptr) { body; } Out putchar(*ptr); In *ptr = getchar(); Scan(n) while (*ptr) { ptr += n; } Clear *ptr = 0; Copy(offset) *(ptr + offset) = *ptr; Multi(offset, n) *(ptr + offset) += (*ptr) * n; Halt abort();
  • 24. Brainsuck Instruction Set Instruction Meaning Move(n) ptr += n; Add(n) *ptr += n; Loop(body) while (*ptr) { body; } Out putchar(*ptr); In *ptr = getchar(); Scan(n) while (*ptr) { ptr += n; } Clear *ptr = 0; Copy(offset) *(ptr + offset) = *ptr; Multi(offset, n) *(ptr + offset) += (*ptr) * n; Halt abort(); For Optimization
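A minimal sketch of how a few of these instructions might act on the tape machine. The names mirror the talk's `Machine` and `run`, but the tape size, class shapes, and `Instr` name are this sketch's own assumptions.

```scala
// Minimal sketch of the tape machine and a few Brainsuck instructions.
class Machine(size: Int = 32) {
  val tape = Array.fill(size)(0)
  var pointer = 0
  def value: Int = tape(pointer)
  def value_=(v: Int): Unit = tape(pointer) = v
}

sealed trait Instr { def run(m: Machine): Unit }
case object Halt extends Instr { def run(m: Machine): Unit = () }
case class Move(n: Int, next: Instr) extends Instr {
  def run(m: Machine): Unit = { m.pointer += n; next.run(m) }  // ptr += n;
}
case class Add(n: Int, next: Instr) extends Instr {
  def run(m: Machine): Unit = { m.value += n; next.run(m) }    // *ptr += n;
}
case class Clear(next: Instr) extends Instr {
  def run(m: Machine): Unit = { m.value = 0; next.run(m) }     // *ptr = 0;
}

// "++ > +++" after MergeAdds/MergeMoves: Add(2), Move(1), Add(3)
val m = new Machine()
Add(2, Move(1, Add(3, Halt))).run(m)
```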
  • 25. TreeNode Forms trees, and supports functional transformations trait TreeNode[BaseType <: TreeNode[BaseType]] { self: BaseType => def children: Seq[BaseType] protected def makeCopy(args: Seq[BaseType]): BaseType def transform(rule: PartialFunction[BaseType, BaseType]): BaseType = ... def transformDown(rule: PartialFunction[BaseType, BaseType]): BaseType = ... def transformUp(rule: PartialFunction[BaseType, BaseType]): BaseType = ... ... }
  • 26. TreeNode Forms trees, and supports functional transformations trait TreeNode[BaseType <: TreeNode[BaseType]] { self: BaseType => def children: Seq[BaseType] protected def makeCopy(args: Seq[BaseType]): BaseType def transform(rule: PartialFunction[BaseType, BaseType]): BaseType = ... def transformDown(rule: PartialFunction[BaseType, BaseType]): BaseType = ... def transformUp(rule: PartialFunction[BaseType, BaseType]): BaseType = ... ... }
  • 27. TreeNode Forms trees, and supports functional transformations trait TreeNode[BaseType <: TreeNode[BaseType]] { self: BaseType => def children: Seq[BaseType] protected def makeCopy(args: Seq[BaseType]): BaseType def transform(rule: PartialFunction[BaseType, BaseType]): BaseType = ... def transformDown(rule: PartialFunction[BaseType, BaseType]): BaseType = ... def transformUp(rule: PartialFunction[BaseType, BaseType]): BaseType = ... ... }
  • 28. TreeNode Forms trees, and supports functional transformations trait TreeNode[BaseType <: TreeNode[BaseType]] { self: BaseType => def children: Seq[BaseType] protected def makeCopy(args: Seq[BaseType]): BaseType def transform(rule: PartialFunction[BaseType, BaseType]): BaseType = ... def transformDown(rule: PartialFunction[BaseType, BaseType]): BaseType = ... def transformUp(rule: PartialFunction[BaseType, BaseType]): BaseType = ... ... }
  • 29. TreeNode Helper classes for leaf nodes and unary nodes trait LeafNode[BaseType <: TreeNode[BaseType]] extends TreeNode[BaseType] { self: BaseType => override def children = Seq.empty[BaseType] override def makeCopy(args: Seq[BaseType]) = this } trait UnaryNode[BaseType <: TreeNode[BaseType]] extends TreeNode[BaseType] { self: BaseType => def child: BaseType override def children = Seq(child) }
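One way `transformDown` and `transformUp` can be built on top of `children` and `makeCopy` is sketched below. This is a hedged simplification (Catalyst's real implementation also avoids copying unchanged subtrees); the tiny `E`/`Lit`/`Plus` hierarchy exists only to exercise the trait.

```scala
// Sketch: pre-order and post-order rule application over children/makeCopy.
trait TreeNode[T <: TreeNode[T]] { self: T =>
  def children: Seq[T]
  protected def makeCopy(args: Seq[T]): T

  // Pre-order: apply the rule to this node first, then recurse into children.
  def transformDown(rule: PartialFunction[T, T]): T = {
    val afterRule = rule.applyOrElse(this, identity[T])
    afterRule.makeCopy(afterRule.children.map(_.transformDown(rule)))
  }

  // Post-order: recurse into children first, then apply the rule.
  def transformUp(rule: PartialFunction[T, T]): T = {
    val afterChildren = makeCopy(children.map(_.transformUp(rule)))
    rule.applyOrElse(afterChildren, identity[T])
  }
}

// A tiny expression hierarchy to exercise the trait:
sealed trait E extends TreeNode[E]
case class Lit(v: Int) extends E {
  def children = Seq.empty[E]
  protected def makeCopy(args: Seq[E]) = this
}
case class Plus(l: E, r: E) extends E {
  def children = Seq(l, r)
  protected def makeCopy(args: Seq[E]) = Plus(args(0), args(1))
}
```

With a constant-folding rule, `transformUp` folds the whole tree in one pass, while `transformDown` only folds subtrees whose children are already literals.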
  • 30. IR Represented as tree nodes sealed trait Instruction extends TreeNode[Instruction] { def next: Instruction def run(machine: Machine): Unit } sealed trait LeafInstruction extends Instruction with LeafNode[Instruction] { self: Instruction => def next: Instruction = this } sealed trait UnaryInstruction extends Instruction with UnaryNode[Instruction] { self: Instruction => def next: Instruction = child }
  • 31. IR Examples case object Halt extends LeafInstruction { override def run(machine: Machine) = () } case class Add(n: Int, child: Instruction) extends UnaryInstruction { override protected def makeCopy(args: Seq[Instruction]) = copy(child = args.head) override def run(machine: Machine) = machine.value += n } case class Move(n: Int, child: Instruction) extends UnaryInstruction { override protected def makeCopy(args: Seq[Instruction]) = copy(child = args.head) override def run(machine: Machine) = machine.pointer += n }
  • 34. Rule Executor Rules are functions organized as Batches trait Rule[BaseType <: TreeNode[BaseType]] { def apply(tree: BaseType): BaseType } case class Batch[BaseType <: TreeNode[BaseType]]( name: String, rules: Seq[Rule[BaseType]], strategy: Strategy)
  • 35. Rule Executor Each Batch has an execution strategy sealed trait Strategy { def maxIterations: Int } A Batch of rules can be repeatedly applied to a tree according to some Strategy
  • 36. Rule Executor Each Batch has an execution strategy sealed trait Strategy { def maxIterations: Int } Apply once and only once case object Once extends Strategy { val maxIterations = 1 }
  • 37. Rule Executor Each Batch has an execution strategy sealed trait Strategy { def maxIterations: Int } Apply repeatedly until the tree doesn’t change final case class FixedPoint(maxIterations: Int) extends Strategy object FixedPoint { val Unlimited = FixedPoint(-1) }
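The driver loop behind these strategies can be sketched as below: a batch of rules is applied repeatedly until the tree stops changing or the iteration budget runs out. The function name `executeBatch` and its exact shape are this sketch's assumptions, not the talk's code; `maxIterations < 0` plays the role of `FixedPoint.Unlimited`.

```scala
// Sketch of a fixed-point rule executor: apply every rule in the batch,
// repeat until nothing changes or maxIterations is reached.
def executeBatch[T](tree: T, rules: Seq[T => T], maxIterations: Int): T = {
  var current = tree
  var iterations = 0
  var changed = true
  while (changed && (maxIterations < 0 || iterations < maxIterations)) {
    val next = rules.foldLeft(current)((t, rule) => rule(t))
    changed = next != current
    current = next
    iterations += 1
  }
  current
}
```

With `maxIterations = 1` this degenerates to the Once strategy.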
  • 39. Optimization 1: Merge Moves Patterns: - >>>> - <<<< - >><<<>> Basic idea: - Merge adjacent moves to save instructions
  • 40. Optimization 1: Merge Moves Examples: - Move(1, Move(1, next)) ⇒ Move(2, next) - Move(1, Move(-1, next)) ⇒ next
  • 41. Optimization 1: Merge Moves object MergeMoves extends Rule[Instruction] { override def apply(tree: Instruction) = tree.transformUp { case Move(n, Move(m, next)) => if (n + m == 0) next else Move(n + m, next) } } Strategy: - FixedPoint We’d like to merge all adjacent moves until none can be found
  • 42. Optimization 2: Merge Adds Patterns - ++++ - ---- - ++---++ Basic idea: - Merge adjacent adds to save instructions
  • 43. Optimization 2: Merge Adds Examples: - Add(1, Add(1, next)) ⇒ Add(2, next) - Add(1, Add(-1, next)) ⇒ next
  • 44. Optimization 2: Merge Adds object MergeAdds extends Rule[Instruction] { override def apply(tree: Instruction) = tree.transformUp { case Add(n, Add(m, next)) => if (n + m == 0) next else Add(n + m, next) } } Strategy: - FixedPoint We’d like to merge all adjacent adds until none can be merged
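Both contraction rules follow the same shape. A self-contained toy version might look like the sketch below; it uses a stripped-down instruction chain and a direct bottom-up recursion standing in for TreeNode.transformUp, rather than the talk's actual TreeNode machinery.

```scala
// Stripped-down instruction chain, just enough for the two
// Contraction rules.
sealed trait Instruction
case object Halt extends Instruction
case class Add(n: Int, child: Instruction) extends Instruction
case class Move(n: Int, child: Instruction) extends Instruction

// Bottom-up recursion standing in for transformUp: rewrite the child
// first, then merge with it when both nodes are the same kind,
// dropping the node entirely when the amounts cancel out.
def contract(tree: Instruction): Instruction = tree match {
  case Add(n, child) => contract(child) match {
    case Add(m, next) => if (n + m == 0) next else Add(n + m, next)
    case next         => Add(n, next)
  }
  case Move(n, child) => contract(child) match {
    case Move(m, next) => if (n + m == 0) next else Move(n + m, next)
    case next          => Move(n, next)
  }
  case Halt => Halt
}
```

Because the recursion merges as it unwinds, a single pass contracts a whole run such as `++++`; the slides instead reach a fixed point by re-running the rule batch, which also covers cases where a cancellation exposes a new adjacent pair.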
  • 45. Optimization 3: Clears Patterns - [-] - [--] - [+] Basic idea: - These loops actually zero the current cell. Find and transform them to Clears.
  • 46. Optimization 3: Clears Examples: - Loop(Add(-1, Halt), next) ⇒ Clear(next) - Loop(Add(2, Halt), next) ⇒ Clear(next)
  • 47. Optimization 3: Clears object Clears extends Rule[Instruction] { override def apply(tree: Instruction) = tree.transform { case Loop(Add(n, Halt), next) => Clear(next) } } Strategy: - Once After merging all adds, we can find all “clear” loops within a single run
  • 48. Optimization 4: Scans Patterns - [<] - [<<] - [>] Basic idea: - These loops move the pointer to the most recent zero cell in one direction. Find and transform them to Scans.
  • 49. Optimization 4: Scans Examples: - Loop(Move(-1, Halt), next) ⇒ Scan(-1, next) - Loop(Move(2, Halt), next) ⇒ Scan(2, next)
  • 50. Optimization 4: Scans object Scans extends Rule[Instruction] { override def apply(tree: Instruction) = tree.transform { case Loop(Move(n, Halt), next) => Scan(n, next) } } Strategy: - Once After merging all moves, we can find all “scan” loops within a single run
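For reference, the runtime behavior that a Scan(n) instruction replaces: instead of interpreting the loop iteration by iteration, the machine can hop the pointer n cells at a time until it lands on a zero cell. A sketch on a toy tape (representing the tape as a Vector[Int] is an assumption of this sketch, not the talk's Machine class):

```scala
// Runtime sketch of Scan(n): hop the pointer by n until the current
// cell is zero, and return the new pointer position.
def scan(tape: Vector[Int], pointer: Int, n: Int): Int = {
  var p = pointer
  while (tape(p) != 0) p += n
  p
}
```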
  • 51. Optimization 5: Multiplications Patterns: - [->>++<<] *(ptr + 2) += (*ptr) * 2; - [->>++<<<--->] *(ptr + 2) += (*ptr) * 2; *(ptr - 1) += (*ptr) * (-3);
  • 52–62. Optimization 5: Multiplications — animated trace of [->>++<<] on the tape: starting from 0 2 0 0 0 ……, each iteration subtracts 1 from the current cell and adds 2 to the cell two positions to the right (0 1 0 0 0 → 0 1 0 2 0 → 0 0 0 2 0 → 0 0 0 4 0), so the loop computes *(ptr + 2) += (*ptr) * 2.
  • 63. Optimization 5: Multiplications Examples: - Loop( Add(-1, Move(2, Add(2, Move(-2, Halt)))), next) ⇒ Multi(2, 2, Clear(next))
  • 64. Optimization 5: Multiplications Examples: - Loop( Add(-1, Move(2, Add(2, Move(-3, Add(-3, Move(1, Halt)))))), next) ⇒ Multi(2, 2, Multi(-1, -3, Clear(next)))
  • 65. Optimization 5: Multiplications This pattern is relatively harder to recognize, since it has variable length: tree.transform { case Loop(Add(-1, ???), next) => ??? }
  • 66. Optimization 5: Multiplications What do we want to extract from the pattern? - (offset, step) pairs, for generating the corresponding Multi instruction sequence - finalOffset, to check whether the pointer is reset to the original cell at the end of the loop body
  • 67. Optimization 5: Multiplications Examples: - [->>++<<] - Pairs: (2,2) :: Nil, finalOffset: 2 - [->>++<---<] - Pairs: (2,2)::(1,-3)::Nil, finalOffset: 1 - [-<<<+++>-->>] - Pairs: (-3,3)::(-2,-2)::Nil, finalOffset: -2
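One way to see how these pairs arise is to compute them straight from the loop-body text. The helper below is hypothetical: it works on the raw string between the leading '-' and the closing ']', not on the IR, and unlike the MoveAddPairs extractor it also consumes the trailing moves, so for a well-formed multiplication loop the final offset comes out as 0 (the pointer is back at the original cell).

```scala
// Hypothetical helper: derive the (offset, step) pairs and the final
// pointer offset from a loop body string such as ">>++<---<".
def moveAddPairs(body: String): (List[(Int, Int)], Int) = {
  var offset = 0
  var pairs = List.empty[(Int, Int)]
  var i = 0
  while (i < body.length) {
    body(i) match {
      case '>' => offset += 1; i += 1
      case '<' => offset -= 1; i += 1
      case '+' | '-' =>
        // Fold a whole run of +/- into a single step at this offset.
        var step = 0
        while (i < body.length && (body(i) == '+' || body(i) == '-')) {
          step += (if (body(i) == '+') 1 else -1)
          i += 1
        }
        pairs = (offset, step) :: pairs
      case _ => i += 1 // ignore anything else
    }
  }
  (pairs.reverse, offset)
}
```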
  • 68. Extractor Objects In Scala, patterns can be defined as extractor objects with a method named unapply. This method can be used to recognize arbitrarily complex patterns.
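A minimal, generic illustration of the mechanism (the Email extractor is a standard Scala teaching example, not part of the talk's IR): an object with an unapply method can appear in a pattern, and the match succeeds exactly when unapply returns Some.

```scala
// A standard extractor object: recognizes "name@domain" strings and
// extracts the two parts via unapply.
object Email {
  def unapply(s: String): Option[(String, String)] =
    s.split("@") match {
      case Array(name, domain) => Some((name, domain))
      case _                   => None
    }
}

// Pattern matching calls Email.unapply behind the scenes.
val who = "lian@databricks.com" match {
  case Email(name, domain) => s"$name at $domain"
  case _                   => "not an email"
}
```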
  • 69. Optimization 5: Multiplications An extractor object which recognizes move-add pairs and extracts offsets and steps: object MoveAddPairs { type ResultType = (List[(Int, Int)], Int, Instruction) def unapply(tree: Instruction): Option[ResultType] = { def loop(tree: Instruction, offset: Int): Option[ResultType] = tree match { case Move(n, Add(m, inner)) => loop(inner, offset + n).map { case (seq, finalOffset, next) => ((offset + n, m) :: seq, finalOffset, next) } case inner => Some((Nil, offset, inner)) } loop(tree, 0) } }
  • 71. Optimization 5: Multiplications object MultisAndCopies extends Rule[Instruction] { override def apply(tree: Instruction): Instruction = tree.transform { case Loop(Add(-1, MoveAddPairs(seq, offset, Move(n, Halt))), next) if n == -offset => seq.foldRight(Clear(next): Instruction) { case ((distance, increment), code) => Multi(distance, increment, code) } } } Strategy: - Once After merging moves and adds, we can find all multiplication loops within a single run
  • 72. Optimization 6: Copies Patterns: - [->>+<<] - [->>+<<<+>] Basic idea: - These are actually special forms of multiplication loops with 1 as the multiplier - Replace Multi with a cheaper Copy
  • 73. Optimization 6: Copies Examples: - Loop( Add(-1, Move(2, Add(1, Move(-3, Add(1, Move(1, Halt)))))), next) ⇒ Copy(2, Copy(-1, Clear(next)))
  • 74. Optimization 6: Copies object MultisAndCopies extends Rule[Instruction] { override def apply(tree: Instruction): Instruction = tree.transform { case Loop(Add(-1, MoveAddPairs(seq, offset, Move(n, Halt))), next) if n == -offset => seq.foldRight(Clear(next): Instruction) { case ((distance, 1), code) => Copy(distance, code) case ((distance, increment), code) => Multi(distance, increment, code) } } } Strategy: - Once After merging moves and adds, we can find all copy loops within a single run
  • 75. Composing the Optimizer val optimizer = new Optimizer { override def batches = Seq( Batch( "Contraction", MergeMoves :: MergeAdds :: Nil, FixedPoint.Unlimited), Batch( "LoopSimplification", Clears :: Scans :: MultisAndCopies :: Nil, Once) ).take(optimizationLevel) }
  • 76. Demo: Hanoi Tower — Towers of Hanoi in Brainf*ck, written by Clifford Wolf <http://www.clifford.at/bfcpu/> (the slide shows the program's ASCII-art tower drawing)
  • 78. How does Spark SQL use Catalyst The following structures are all represented as TreeNode objects in Spark SQL - Expressions - Unresolved logical plans - Resolved logical plans - Optimized logical plans - Physical plans
  • 79. How does Spark SQL use Catalyst
  • 80. How does Spark SQL use Catalyst The actual Catalyst library in Spark SQL is more complex than the one shown in this talk. It’s used for the following purposes: - Analysis (logical ⇒ logical) - Logical optimization (logical ⇒ logical) - Physical planning (logical ⇒ physical)
  • 81. How does Spark SQL use Catalyst
  • 82. References - Spark SQL and Catalyst - Deep Dive into Spark SQL’s Catalyst Optimizer (Databricks blog) - Spark SQL: Relational Data Processing in Spark (SIGMOD 2015) - Brainfuck language and optimization - Brainfuck optimization strategies - Optimizing brainfuck compiler - Wikipedia: Brainfuck - Brainsuck code repository - https://github.com/liancheng/brainsuck