Introduction to Spark SQL and Catalyst / Spark SQLおよびCalalystの紹介

Introduction to 
SparkSQL & Catalyst 
Takuya UESHIN 
! 
ScalaMatsuri2014 2014/09/06(Sat)
Who am I? 
Takuya UESHIN 
@ueshin 
github.com/ueshin 
Nautilus Technologies, Inc. 
A Spark contributor 
2
Agenda 
What is SparkSQL? 
Catalyst in depth 
SparkSQL core in depth 
Interesting issues 
How to contribute 
3
What is SparkSQL?
What is SparkSQL? 
SparkSQL is one of 
Spark components. 
Executes SQL on Spark 
Builds SchemaRDD like LINQ 
Optimizes execution plan. 
5
What is SparkSQL? 
Catalyst provides a execution planning 
framework for relational operations. 
Including: 
SQL parser & analyzer 
Logical operators & general expressions 
Logical optimizer 
A framework to transform operator tree. 
6
What is SparkSQL? 
To execute query needs some steps. 
Parse 
Analyze 
Logical Plan 
Optimize 
Physical Plan 
Execute 
7
What is SparkSQL? 
To execute query needs some steps. 
Parse 
Analyze 
Logical Plan 
Optimize 
Physical Plan 
Execute 
8 
Catalyst
What is SparkSQL? 
To execute query needs some steps. 
Parse 
Analyze 
Logical Plan 
Optimize 
Physical Plan 
Execute 
9 
SQL core
Catalyst in depth
Catalyst in depth 
Provides a execution planning framework for 
relational operations. 
Row & DataType’s 
Trees & Rules 
Logical Operators 
Expressions 
Optimizations 
11
Row & DataType’s 
o.a.s.sql.catalyst.types.DataType 
Long, Int, Short, Byte, Float, Double, Decimal 
String, Binary, Boolean, Timestamp 
Array, Map, Struct 
o.a.s.sql.catalyst.expressions.Row 
Represents a single row. 
Can contain complex types. 
12
Trees & Rules 
o.a.s.sql.catalyst.trees.TreeNode 
Provides transformations of tree. 
foreach, map, flatMap, collect 
transform, transformUp, 
transformDown 
Used for operator tree, expression tree. 
13
Trees & Rules 
o.a.s.sql.catalyst.rules.Rule 
Represents a tree transform rule. 
o.a.s.sql.catalyst.rules.RuleExecutor 
A framework to transform trees 
based on rules. 
14
Logical Operators 
Basic Operators 
Project, Filter, … 
Binary Operators 
Join, Except, Intersect, Union, … 
Aggregate 
Generate, Distinct 
Sort, Limit 
InsertInto, WriteToFile 
15 
Project 
Filter 
Join 
Table Table
Expressions 
Literal 
Arithmetics 
UnaryMinus, Sqrt, MaxOf 
Add, Subtract, Multiply, … 
Predicates 
EqualTo, LessThan, LessThanOrEqual, GreaterThan, 
GreaterThanOrEqual 
Not, And, Or, In, If, CaseWhen 
16 
+ 
1 2
Expressions 
Cast 
GetItem, GetField 
Coalesce, IsNull, IsNotNull 
StringOperations 
Like, Upper, Lower, Contains, 
StartsWith, EndsWith, Substring, … 
17
Optimizations 
ConstantFolding 
NullPropagation 
ConstantFolding 
BooleanSimplification 
SimplifyFilters 
FilterPushdown 
CombineFilters 
PushPredicateThroughProject 
PushPredicateThroughJoin 
ColumnPruning 
18
Optimizations 
NullPropagation, ConstantFolding 
Replace expressions that can be evaluated 
with some literal value to the value. 
ex) 
1 + null => null 
1 + 2 => 3 
Count(null) => 0 
19
Optimizations 
BooleanSimplification 
Simplifies boolean expressions that can be determined. 
ex) 
false AND $right => false 
true AND $right => $right 
true OR $right => true 
false OR $right => $right 
If(true, $then, $else) => $then 
20
Optimizations 
SimplifyFilters 
Removes filters that can be evaluated 
trivially. 
ex) 
Filter(true, child) => child 
Filter(false, child) => empty 
21
Optimizations 
CombineFilters 
Merges two filters. 
ex) 
Filter($fc, Filter($nc, child)) => 
Filter(AND($fc, $nc), child) 
22
Optimizations 
PushPredicateThroughProject 
Pushes Filter operators through 
Project operator. 
ex) 
Filter(‘i === 1, Project(‘i, ‘j, child)) 
=> 
Project(‘i, ‘j, Filter(‘i === 1, child)) 
23
Optimizations 
PushPredicateThroughJoin 
Pushes Filter operators through Join 
operator. 
ex) 
Project(“left.i”.attr === 1, Join(left, 
right) => 
Join(Project(‘i === 1, left), right) 
24
Optimizations 
ColumnPruning 
Eliminates the reading of unused 
columns. 
ex) 
Join(left, right, LeftSemi, 
“left.id”.attr === “right.id”.attr) => 
Join(Project(‘id, left), right, LeftSemi) 
25
SQL core in depth
SQL core in depth 
Provides: 
Physical operators to build RDD 
Conversion from Existing RDD of Product to 
SchemaRDD support 
Parquet file read/write support 
JSON file read support 
Columnar in-memory table support 
27
SQL core in depth 
o.a.s.sql.SchemaRDD 
Extends RDD[Row]. 
Has logical plan tree. 
Provides LINQ-like interfaces to construct 
logical plan. 
select, where, join, orderBy, … 
Executes the plan. 
28
SQL core in depth 
o.a.s.sql.execution.SparkStrategies 
Converts logical plan to physical. 
Some rules are based on statistics 
of the operators. 
29
SQL core in depth 
Parquet read/write support 
Parquet: Columnar storage format for Hadoop 
Reads existing Parquet files. 
Converts Parquet schema to row schema. 
Writes new Parquet files. 
Currently DecimalType and TimestampType 
are not supported. 
30
SQL core in depth 
JSON read support 
Loads a JSON file (one object per line) 
Infers row schema from the entire 
dataset. 
Giving the schema is experimental. 
Inferring the schema by sampling is 
also experimental. 
31
SQL core in depth 
Columnar in-memory table support 
Caches table like RDD.cache, but as 
columnar style. 
Can prune unnecessary columns 
when read data. 
32
Interesting issues
Interesting issues 
Support the GroupingSet/ROLLUP/CUBE 
https://issues.apache.org/jira/browse/SPARK-2663 
Use statistics to skip partitions when 
reading from in-memory columnar 
data 
https://issues.apache.org/jira/browse/SPARK-2961 
34
Interesting issues 
Pluggable interface for shuffles 
https://issues.apache.org/jira/browse/SPARK-2044 
Sort-merge join 
https://issues.apache.org/jira/browse/SPARK-2213 
Cost-based join reordering 
https://issues.apache.org/jira/browse/SPARK-2216 
35
How to contribute
How to contribute 
See: Contributing to Spark 
Open an issue on JIRA 
Send pull-request at GitHub 
Communicate with committers and 
reviewers 
Congratulations! 
37
Conclusion 
Introduced SparkSQL & Catalyst 
Now you know them very well! 
And you know how to contribute. 
! 
Let’s contribute to Spark & SparkSQL!! 
38
1 of 38

Recommended

The Evolution of Scala / Scala進化論 by
The Evolution of Scala / Scala進化論The Evolution of Scala / Scala進化論
The Evolution of Scala / Scala進化論scalaconfjp
5.2K views37 slides
Building a Unified Data Pipline in Spark / Apache Sparkを用いたBig Dataパイプラインの統一 by
Building a Unified Data Pipline in Spark / Apache Sparkを用いたBig Dataパイプラインの統一Building a Unified Data Pipline in Spark / Apache Sparkを用いたBig Dataパイプラインの統一
Building a Unified Data Pipline in Spark / Apache Sparkを用いたBig Dataパイプラインの統一scalaconfjp
3.7K views37 slides
Scarab: SAT-based Constraint Programming System in Scala / Scala上で実現された制約プログラ... by
Scarab: SAT-based Constraint Programming System in Scala / Scala上で実現された制約プログラ...Scarab: SAT-based Constraint Programming System in Scala / Scala上で実現された制約プログラ...
Scarab: SAT-based Constraint Programming System in Scala / Scala上で実現された制約プログラ...scalaconfjp
3.3K views60 slides
Scala @ TechMeetup Edinburgh by
Scala @ TechMeetup EdinburghScala @ TechMeetup Edinburgh
Scala @ TechMeetup EdinburghStuart Roebuck
1.1K views28 slides
How Scala code is expressed in the JVM by
How Scala code is expressed in the JVMHow Scala code is expressed in the JVM
How Scala code is expressed in the JVMKoichi Sakata
8.3K views89 slides
Weaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo by
Weaving Dataflows with Silk - ScalaMatsuri 2014, TokyoWeaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
Weaving Dataflows with Silk - ScalaMatsuri 2014, TokyoTaro L. Saito
3K views37 slides

More Related Content

What's hot

The Why and How of Scala at Twitter by
The Why and How of Scala at TwitterThe Why and How of Scala at Twitter
The Why and How of Scala at TwitterAlex Payne
40.6K views31 slides
Scala Matsuri 2016: Japanese Text Mining with Scala and Spark by
Scala Matsuri 2016: Japanese Text Mining with Scala and SparkScala Matsuri 2016: Japanese Text Mining with Scala and Spark
Scala Matsuri 2016: Japanese Text Mining with Scala and SparkEduardo Gonzalez
3.9K views46 slides
A Brief, but Dense, Intro to Scala by
A Brief, but Dense, Intro to ScalaA Brief, but Dense, Intro to Scala
A Brief, but Dense, Intro to ScalaDerek Chen-Becker
1.7K views12 slides
Scala Refactoring for Fun and Profit (Japanese subtitles) by
Scala Refactoring for Fun and Profit (Japanese subtitles)Scala Refactoring for Fun and Profit (Japanese subtitles)
Scala Refactoring for Fun and Profit (Japanese subtitles)Tomer Gabel
6.6K views16 slides
A Brief Intro to Scala by
A Brief Intro to ScalaA Brief Intro to Scala
A Brief Intro to ScalaTim Underwood
23.1K views56 slides
Martin Odersky: What's next for Scala by
Martin Odersky: What's next for ScalaMartin Odersky: What's next for Scala
Martin Odersky: What's next for ScalaMarakana Inc.
4.4K views53 slides

What's hot(20)

The Why and How of Scala at Twitter by Alex Payne
The Why and How of Scala at TwitterThe Why and How of Scala at Twitter
The Why and How of Scala at Twitter
Alex Payne40.6K views
Scala Matsuri 2016: Japanese Text Mining with Scala and Spark by Eduardo Gonzalez
Scala Matsuri 2016: Japanese Text Mining with Scala and SparkScala Matsuri 2016: Japanese Text Mining with Scala and Spark
Scala Matsuri 2016: Japanese Text Mining with Scala and Spark
Eduardo Gonzalez3.9K views
Scala Refactoring for Fun and Profit (Japanese subtitles) by Tomer Gabel
Scala Refactoring for Fun and Profit (Japanese subtitles)Scala Refactoring for Fun and Profit (Japanese subtitles)
Scala Refactoring for Fun and Profit (Japanese subtitles)
Tomer Gabel6.6K views
A Brief Intro to Scala by Tim Underwood
A Brief Intro to ScalaA Brief Intro to Scala
A Brief Intro to Scala
Tim Underwood23.1K views
Martin Odersky: What's next for Scala by Marakana Inc.
Martin Odersky: What's next for ScalaMartin Odersky: What's next for Scala
Martin Odersky: What's next for Scala
Marakana Inc.4.4K views
Scala Days San Francisco by Martin Odersky
Scala Days San FranciscoScala Days San Francisco
Scala Days San Francisco
Martin Odersky65.9K views
From Ruby to Scala by tod esking
From Ruby to ScalaFrom Ruby to Scala
From Ruby to Scala
tod esking7.9K views
Scala : language of the future by AnsviaLab
Scala : language of the futureScala : language of the future
Scala : language of the future
AnsviaLab854 views
Martin Odersky - Evolution of Scala by Scala Italy
Martin Odersky - Evolution of ScalaMartin Odersky - Evolution of Scala
Martin Odersky - Evolution of Scala
Scala Italy1.6K views
Introduction to Scala by Saleem Ansari
Introduction to ScalaIntroduction to Scala
Introduction to Scala
Saleem Ansari2.6K views
What's a macro?: Learning by Examples / Scalaのマクロに実用例から触れてみよう! by scalaconfjp
What's a macro?: Learning by Examples / Scalaのマクロに実用例から触れてみよう!What's a macro?: Learning by Examples / Scalaのマクロに実用例から触れてみよう!
What's a macro?: Learning by Examples / Scalaのマクロに実用例から触れてみよう!
scalaconfjp3.2K views
Scala - The Simple Parts, SFScala presentation by Martin Odersky
Scala - The Simple Parts, SFScala presentationScala - The Simple Parts, SFScala presentation
Scala - The Simple Parts, SFScala presentation
Martin Odersky16.5K views

Viewers also liked

Xitrum Web Framework Live Coding Demos / Xitrum Web Framework ライブコーディング by
Xitrum Web Framework Live Coding Demos / Xitrum Web Framework ライブコーディングXitrum Web Framework Live Coding Demos / Xitrum Web Framework ライブコーディング
Xitrum Web Framework Live Coding Demos / Xitrum Web Framework ライブコーディングscalaconfjp
2.9K views42 slides
Solid And Sustainable Development in Scala by
Solid And Sustainable Development in ScalaSolid And Sustainable Development in Scala
Solid And Sustainable Development in ScalaKazuhiro Sera
10.4K views48 slides
Scalable Generator: Using Scala in SIer Business (ScalaMatsuri) by
Scalable Generator: Using Scala in SIer Business (ScalaMatsuri)Scalable Generator: Using Scala in SIer Business (ScalaMatsuri)
Scalable Generator: Using Scala in SIer Business (ScalaMatsuri)TIS Inc.
4.8K views25 slides
[ScalaMatsuri] グリー初のscalaプロダクト!チャットサービス公開までの苦労と工夫 by
[ScalaMatsuri] グリー初のscalaプロダクト!チャットサービス公開までの苦労と工夫[ScalaMatsuri] グリー初のscalaプロダクト!チャットサービス公開までの苦労と工夫
[ScalaMatsuri] グリー初のscalaプロダクト!チャットサービス公開までの苦労と工夫gree_tech
10.8K views58 slides
GitBucket: The perfect Github clone by Scala by
GitBucket: The perfect Github clone by ScalaGitBucket: The perfect Github clone by Scala
GitBucket: The perfect Github clone by Scalatakezoe
26.7K views37 slides
sbt, past and future / sbt, 傾向と対策 by
sbt, past and future / sbt, 傾向と対策sbt, past and future / sbt, 傾向と対策
sbt, past and future / sbt, 傾向と対策scalaconfjp
4.5K views23 slides

Viewers also liked(14)

Xitrum Web Framework Live Coding Demos / Xitrum Web Framework ライブコーディング by scalaconfjp
Xitrum Web Framework Live Coding Demos / Xitrum Web Framework ライブコーディングXitrum Web Framework Live Coding Demos / Xitrum Web Framework ライブコーディング
Xitrum Web Framework Live Coding Demos / Xitrum Web Framework ライブコーディング
scalaconfjp2.9K views
Solid And Sustainable Development in Scala by Kazuhiro Sera
Solid And Sustainable Development in ScalaSolid And Sustainable Development in Scala
Solid And Sustainable Development in Scala
Kazuhiro Sera10.4K views
Scalable Generator: Using Scala in SIer Business (ScalaMatsuri) by TIS Inc.
Scalable Generator: Using Scala in SIer Business (ScalaMatsuri)Scalable Generator: Using Scala in SIer Business (ScalaMatsuri)
Scalable Generator: Using Scala in SIer Business (ScalaMatsuri)
TIS Inc.4.8K views
[ScalaMatsuri] グリー初のscalaプロダクト!チャットサービス公開までの苦労と工夫 by gree_tech
[ScalaMatsuri] グリー初のscalaプロダクト!チャットサービス公開までの苦労と工夫[ScalaMatsuri] グリー初のscalaプロダクト!チャットサービス公開までの苦労と工夫
[ScalaMatsuri] グリー初のscalaプロダクト!チャットサービス公開までの苦労と工夫
gree_tech10.8K views
GitBucket: The perfect Github clone by Scala by takezoe
GitBucket: The perfect Github clone by ScalaGitBucket: The perfect Github clone by Scala
GitBucket: The perfect Github clone by Scala
takezoe26.7K views
sbt, past and future / sbt, 傾向と対策 by scalaconfjp
sbt, past and future / sbt, 傾向と対策sbt, past and future / sbt, 傾向と対策
sbt, past and future / sbt, 傾向と対策
scalaconfjp4.5K views
Node.js vs Play Framework (with Japanese subtitles) by Yevgeniy Brikman
Node.js vs Play Framework (with Japanese subtitles)Node.js vs Play Framework (with Japanese subtitles)
Node.js vs Play Framework (with Japanese subtitles)
Yevgeniy Brikman42.5K views
芸者東京とScala〜おみせやさんから脳トレクエストまでの軌跡〜 by scalaconfjp
芸者東京とScala〜おみせやさんから脳トレクエストまでの軌跡〜芸者東京とScala〜おみせやさんから脳トレクエストまでの軌跡〜
芸者東京とScala〜おみせやさんから脳トレクエストまでの軌跡〜
scalaconfjp5.5K views
Scala が支える医療系ウェブサービス #jissenscala by Kazuhiro Sera
Scala が支える医療系ウェブサービス #jissenscalaScala が支える医療系ウェブサービス #jissenscala
Scala が支える医療系ウェブサービス #jissenscala
Kazuhiro Sera30.3K views
Scala@SmartNews AdFrontend を Scala で書いた話 by Keiji Muraishi
Scala@SmartNews AdFrontend を Scala で書いた話Scala@SmartNews AdFrontend を Scala で書いた話
Scala@SmartNews AdFrontend を Scala で書いた話
Keiji Muraishi12.1K views
ビズリーチの新サービスをScalaで作ってみた 〜マイクロサービスの裏側 #jissenscala by takezoe
ビズリーチの新サービスをScalaで作ってみた 〜マイクロサービスの裏側 #jissenscalaビズリーチの新サービスをScalaで作ってみた 〜マイクロサービスの裏側 #jissenscala
ビズリーチの新サービスをScalaで作ってみた 〜マイクロサービスの裏側 #jissenscala
takezoe26.6K views
あなたのScalaを爆速にする7つの方法 by x1 ichi
あなたのScalaを爆速にする7つの方法あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法
x1 ichi9.1K views
Scala Warrior and type-safe front-end development with Scala.js by takezoe
Scala Warrior and type-safe front-end development with Scala.jsScala Warrior and type-safe front-end development with Scala.js
Scala Warrior and type-safe front-end development with Scala.js
takezoe9.3K views

Similar to Introduction to Spark SQL and Catalyst / Spark SQLおよびCalalystの紹介

20140908 spark sql & catalyst by
20140908 spark sql & catalyst20140908 spark sql & catalyst
20140908 spark sql & catalystTakuya UESHIN
8.8K views44 slides
Spark sql meetup by
Spark sql meetupSpark sql meetup
Spark sql meetupMichael Zhang
7.1K views30 slides
Spark Summit EU talk by Herman van Hovell by
Spark Summit EU talk by Herman van HovellSpark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van HovellSpark Summit
2.3K views50 slides
Introduction to Spark - DataFactZ by
Introduction to Spark - DataFactZIntroduction to Spark - DataFactZ
Introduction to Spark - DataFactZDataFactZ
1.6K views63 slides
Meetup ml spark_ppt by
Meetup ml spark_pptMeetup ml spark_ppt
Meetup ml spark_pptSnehal Nagmote
1K views51 slides
Spark SQL In Depth www.syedacademy.com by
Spark SQL In Depth www.syedacademy.comSpark SQL In Depth www.syedacademy.com
Spark SQL In Depth www.syedacademy.comSyed Hadoop
218 views35 slides

Similar to Introduction to Spark SQL and Catalyst / Spark SQLおよびCalalystの紹介(20)

20140908 spark sql & catalyst by Takuya UESHIN
20140908 spark sql & catalyst20140908 spark sql & catalyst
20140908 spark sql & catalyst
Takuya UESHIN8.8K views
Spark Summit EU talk by Herman van Hovell by Spark Summit
Spark Summit EU talk by Herman van HovellSpark Summit EU talk by Herman van Hovell
Spark Summit EU talk by Herman van Hovell
Spark Summit2.3K views
Introduction to Spark - DataFactZ by DataFactZ
Introduction to Spark - DataFactZIntroduction to Spark - DataFactZ
Introduction to Spark - DataFactZ
DataFactZ1.6K views
Spark SQL In Depth www.syedacademy.com by Syed Hadoop
Spark SQL In Depth www.syedacademy.comSpark SQL In Depth www.syedacademy.com
Spark SQL In Depth www.syedacademy.com
Syed Hadoop218 views
Apache Spark - Intro to Large-scale recommendations with Apache Spark and Python by Christian Perone
Apache Spark - Intro to Large-scale recommendations with Apache Spark and PythonApache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Apache Spark - Intro to Large-scale recommendations with Apache Spark and Python
Christian Perone9.5K views
Hyperspace: An Indexing Subsystem for Apache Spark by Databricks
Hyperspace: An Indexing Subsystem for Apache SparkHyperspace: An Indexing Subsystem for Apache Spark
Hyperspace: An Indexing Subsystem for Apache Spark
Databricks720 views
Apache spark sneha challa- google pittsburgh-aug 25th by Sneha Challa
Apache spark  sneha challa- google pittsburgh-aug 25thApache spark  sneha challa- google pittsburgh-aug 25th
Apache spark sneha challa- google pittsburgh-aug 25th
Sneha Challa578 views
Spark - The Ultimate Scala Collections by Martin Odersky by Spark Summit
Spark - The Ultimate Scala Collections by Martin OderskySpark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin Odersky
Spark Summit5.7K views
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production by Chetan Khatri
ScalaTo July 2019 - No more struggles with Apache Spark workloads in productionScalaTo July 2019 - No more struggles with Apache Spark workloads in production
ScalaTo July 2019 - No more struggles with Apache Spark workloads in production
Chetan Khatri289 views
Spark Streaming @ Berlin Apache Spark Meetup, March 2015 by Stratio
Spark Streaming @ Berlin Apache Spark Meetup, March 2015Spark Streaming @ Berlin Apache Spark Meetup, March 2015
Spark Streaming @ Berlin Apache Spark Meetup, March 2015
Stratio2.1K views
Spark SQL Deep Dive @ Melbourne Spark Meetup by Databricks
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
Databricks9K views
Jump Start on Apache Spark 2.2 with Databricks by Anyscale
Jump Start on Apache Spark 2.2 with DatabricksJump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with Databricks
Anyscale976 views
Introduction to Spark Datasets - Functional and relational together at last by Holden Karau
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at last
Holden Karau640 views
Fast federated SQL with Apache Calcite by Chris Baynes
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
Chris Baynes1.4K views

More from scalaconfjp

脆弱性対策のためのClean Architecture ~脆弱性に対するレジリエンスを確保せよ~ by
脆弱性対策のためのClean Architecture ~脆弱性に対するレジリエンスを確保せよ~脆弱性対策のためのClean Architecture ~脆弱性に対するレジリエンスを確保せよ~
脆弱性対策のためのClean Architecture ~脆弱性に対するレジリエンスを確保せよ~scalaconfjp
112 views50 slides
Alp x BizReach SaaS事業を営む2社がお互い気になることをゆるゆる聞いてみる会 by
Alp x BizReach SaaS事業を営む2社がお互い気になることをゆるゆる聞いてみる会Alp x BizReach SaaS事業を営む2社がお互い気になることをゆるゆる聞いてみる会
Alp x BizReach SaaS事業を営む2社がお互い気になることをゆるゆる聞いてみる会scalaconfjp
72 views20 slides
GraalVM Overview Compact version by
GraalVM Overview Compact versionGraalVM Overview Compact version
GraalVM Overview Compact versionscalaconfjp
2K views70 slides
Run Scala Faster with GraalVM on any Platform / GraalVMで、どこでもScalaを高速実行しよう by... by
Run Scala Faster with GraalVM on any Platform / GraalVMで、どこでもScalaを高速実行しよう by...Run Scala Faster with GraalVM on any Platform / GraalVMで、どこでもScalaを高速実行しよう by...
Run Scala Faster with GraalVM on any Platform / GraalVMで、どこでもScalaを高速実行しよう by...scalaconfjp
842 views37 slides
Monitoring Reactive Architecture Like Never Before / 今までになかったリアクティブアーキテクチャの監視... by
Monitoring Reactive Architecture Like Never Before / 今までになかったリアクティブアーキテクチャの監視...Monitoring Reactive Architecture Like Never Before / 今までになかったリアクティブアーキテクチャの監視...
Monitoring Reactive Architecture Like Never Before / 今までになかったリアクティブアーキテクチャの監視...scalaconfjp
224 views16 slides
Scala 3, what does it means for me? / Scala 3って、私にはどんな影響があるの? by Joan Goyeau by
Scala 3, what does it means for me? / Scala 3って、私にはどんな影響があるの? by Joan GoyeauScala 3, what does it means for me? / Scala 3って、私にはどんな影響があるの? by Joan Goyeau
Scala 3, what does it means for me? / Scala 3って、私にはどんな影響があるの? by Joan Goyeauscalaconfjp
318 views79 slides

More from scalaconfjp(20)

脆弱性対策のためのClean Architecture ~脆弱性に対するレジリエンスを確保せよ~ by scalaconfjp
脆弱性対策のためのClean Architecture ~脆弱性に対するレジリエンスを確保せよ~脆弱性対策のためのClean Architecture ~脆弱性に対するレジリエンスを確保せよ~
脆弱性対策のためのClean Architecture ~脆弱性に対するレジリエンスを確保せよ~
scalaconfjp112 views
Alp x BizReach SaaS事業を営む2社がお互い気になることをゆるゆる聞いてみる会 by scalaconfjp
Alp x BizReach SaaS事業を営む2社がお互い気になることをゆるゆる聞いてみる会Alp x BizReach SaaS事業を営む2社がお互い気になることをゆるゆる聞いてみる会
Alp x BizReach SaaS事業を営む2社がお互い気になることをゆるゆる聞いてみる会
scalaconfjp72 views
GraalVM Overview Compact version by scalaconfjp
GraalVM Overview Compact versionGraalVM Overview Compact version
GraalVM Overview Compact version
scalaconfjp2K views
Run Scala Faster with GraalVM on any Platform / GraalVMで、どこでもScalaを高速実行しよう by... by scalaconfjp
Run Scala Faster with GraalVM on any Platform / GraalVMで、どこでもScalaを高速実行しよう by...Run Scala Faster with GraalVM on any Platform / GraalVMで、どこでもScalaを高速実行しよう by...
Run Scala Faster with GraalVM on any Platform / GraalVMで、どこでもScalaを高速実行しよう by...
scalaconfjp842 views
Monitoring Reactive Architecture Like Never Before / 今までになかったリアクティブアーキテクチャの監視... by scalaconfjp
Monitoring Reactive Architecture Like Never Before / 今までになかったリアクティブアーキテクチャの監視...Monitoring Reactive Architecture Like Never Before / 今までになかったリアクティブアーキテクチャの監視...
Monitoring Reactive Architecture Like Never Before / 今までになかったリアクティブアーキテクチャの監視...
scalaconfjp224 views
Scala 3, what does it means for me? / Scala 3って、私にはどんな影響があるの? by Joan Goyeau by scalaconfjp
Scala 3, what does it means for me? / Scala 3って、私にはどんな影響があるの? by Joan GoyeauScala 3, what does it means for me? / Scala 3って、私にはどんな影響があるの? by Joan Goyeau
Scala 3, what does it means for me? / Scala 3って、私にはどんな影響があるの? by Joan Goyeau
scalaconfjp318 views
Functional Object-Oriented Imperative Scala / 関数型オブジェクト指向命令型 Scala by Sébasti... by scalaconfjp
Functional Object-Oriented Imperative Scala / 関数型オブジェクト指向命令型 Scala by Sébasti...Functional Object-Oriented Imperative Scala / 関数型オブジェクト指向命令型 Scala by Sébasti...
Functional Object-Oriented Imperative Scala / 関数型オブジェクト指向命令型 Scala by Sébasti...
scalaconfjp323 views
Scala ♥ Graal by Flavio Brasil by scalaconfjp
Scala ♥ Graal by Flavio BrasilScala ♥ Graal by Flavio Brasil
Scala ♥ Graal by Flavio Brasil
scalaconfjp315 views
Introduction to GraphQL in Scala by scalaconfjp
Introduction to GraphQL in ScalaIntroduction to GraphQL in Scala
Introduction to GraphQL in Scala
scalaconfjp721 views
Safety Beyond Types by scalaconfjp
Safety Beyond TypesSafety Beyond Types
Safety Beyond Types
scalaconfjp551 views
Reactive Kafka with Akka Streams by scalaconfjp
Reactive Kafka with Akka StreamsReactive Kafka with Akka Streams
Reactive Kafka with Akka Streams
scalaconfjp681 views
Reactive microservices with play and akka by scalaconfjp
Reactive microservices with play and akkaReactive microservices with play and akka
Reactive microservices with play and akka
scalaconfjp2K views
Scalaに対して意識の低いエンジニアがScalaで何したかの話, by 芸者東京エンターテインメント by scalaconfjp
Scalaに対して意識の低いエンジニアがScalaで何したかの話, by 芸者東京エンターテインメントScalaに対して意識の低いエンジニアがScalaで何したかの話, by 芸者東京エンターテインメント
Scalaに対して意識の低いエンジニアがScalaで何したかの話, by 芸者東京エンターテインメント
scalaconfjp810 views
DWANGO by ドワンゴ by scalaconfjp
DWANGO by ドワンゴDWANGO by ドワンゴ
DWANGO by ドワンゴ
scalaconfjp636 views
OCTOPARTS by M3, Inc. by scalaconfjp
OCTOPARTS by M3, Inc.OCTOPARTS by M3, Inc.
OCTOPARTS by M3, Inc.
scalaconfjp1.2K views
Try using Aeromock by Marverick, Inc. by scalaconfjp
Try using Aeromock by Marverick, Inc.Try using Aeromock by Marverick, Inc.
Try using Aeromock by Marverick, Inc.
scalaconfjp1.2K views
統計をとって高速化する
Scala開発 by CyberZ,Inc. by scalaconfjp
統計をとって高速化する
Scala開発 by CyberZ,Inc.統計をとって高速化する
Scala開発 by CyberZ,Inc.
統計をとって高速化する
Scala開発 by CyberZ,Inc.
scalaconfjp963 views
Short Introduction of Implicit Conversion by TIS, Inc. by scalaconfjp
Short Introduction of Implicit Conversion by TIS, Inc.Short Introduction of Implicit Conversion by TIS, Inc.
Short Introduction of Implicit Conversion by TIS, Inc.
scalaconfjp550 views
ビズリーチ x ScalaMatsuri by BIZREACH, Inc. by scalaconfjp
ビズリーチ x ScalaMatsuri  by BIZREACH, Inc.ビズリーチ x ScalaMatsuri  by BIZREACH, Inc.
ビズリーチ x ScalaMatsuri by BIZREACH, Inc.
scalaconfjp907 views
Solid and Sustainable Development in Scala by scalaconfjp
Solid and Sustainable Development in ScalaSolid and Sustainable Development in Scala
Solid and Sustainable Development in Scala
scalaconfjp539 views

Recently uploaded

ict act 1.pptx by
ict act 1.pptxict act 1.pptx
ict act 1.pptxsanjaniarun08
13 views17 slides
DSD-INT 2023 Delft3D FM Suite 2024.01 2D3D - New features + Improvements - Ge... by
DSD-INT 2023 Delft3D FM Suite 2024.01 2D3D - New features + Improvements - Ge...DSD-INT 2023 Delft3D FM Suite 2024.01 2D3D - New features + Improvements - Ge...
DSD-INT 2023 Delft3D FM Suite 2024.01 2D3D - New features + Improvements - Ge...Deltares
16 views12 slides
El Arte de lo Possible by
El Arte de lo PossibleEl Arte de lo Possible
El Arte de lo PossibleNeo4j
38 views35 slides
DSD-INT 2023 Baseline studies for Strategic Coastal protection for Long Islan... by
DSD-INT 2023 Baseline studies for Strategic Coastal protection for Long Islan...DSD-INT 2023 Baseline studies for Strategic Coastal protection for Long Islan...
DSD-INT 2023 Baseline studies for Strategic Coastal protection for Long Islan...Deltares
11 views30 slides
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra... by
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra....NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra...
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra...Marc Müller
38 views62 slides
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea... by
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...Safe Software
412 views59 slides

Recently uploaded(20)

DSD-INT 2023 Delft3D FM Suite 2024.01 2D3D - New features + Improvements - Ge... by Deltares
DSD-INT 2023 Delft3D FM Suite 2024.01 2D3D - New features + Improvements - Ge...DSD-INT 2023 Delft3D FM Suite 2024.01 2D3D - New features + Improvements - Ge...
DSD-INT 2023 Delft3D FM Suite 2024.01 2D3D - New features + Improvements - Ge...
Deltares16 views
El Arte de lo Possible by Neo4j
El Arte de lo PossibleEl Arte de lo Possible
El Arte de lo Possible
Neo4j38 views
DSD-INT 2023 Baseline studies for Strategic Coastal protection for Long Islan... by Deltares
DSD-INT 2023 Baseline studies for Strategic Coastal protection for Long Islan...DSD-INT 2023 Baseline studies for Strategic Coastal protection for Long Islan...
DSD-INT 2023 Baseline studies for Strategic Coastal protection for Long Islan...
Deltares11 views
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra... by Marc Müller
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra....NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra...
.NET Developer Conference 2023 - .NET Microservices mit Dapr – zu viel Abstra...
Marc Müller38 views
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea... by Safe Software
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...
Geospatial Synergy: Amplifying Efficiency with FME & Esri ft. Peak Guest Spea...
Safe Software412 views
DSD-INT 2023 Simulation of Coastal Hydrodynamics and Water Quality in Hong Ko... by Deltares
DSD-INT 2023 Simulation of Coastal Hydrodynamics and Water Quality in Hong Ko...DSD-INT 2023 Simulation of Coastal Hydrodynamics and Water Quality in Hong Ko...
DSD-INT 2023 Simulation of Coastal Hydrodynamics and Water Quality in Hong Ko...
Deltares11 views
DSD-INT 2023 Thermobaricity in 3D DCSM-FM - taking pressure into account in t... by Deltares
DSD-INT 2023 Thermobaricity in 3D DCSM-FM - taking pressure into account in t...DSD-INT 2023 Thermobaricity in 3D DCSM-FM - taking pressure into account in t...
DSD-INT 2023 Thermobaricity in 3D DCSM-FM - taking pressure into account in t...
Deltares9 views
Software evolution understanding: Automatic extraction of software identifier... by Ra'Fat Al-Msie'deen
Software evolution understanding: Automatic extraction of software identifier...Software evolution understanding: Automatic extraction of software identifier...
Software evolution understanding: Automatic extraction of software identifier...
DSD-INT 2023 HydroMT model building and river-coast coupling in Python - Bove... by Deltares
DSD-INT 2023 HydroMT model building and river-coast coupling in Python - Bove...DSD-INT 2023 HydroMT model building and river-coast coupling in Python - Bove...
DSD-INT 2023 HydroMT model building and river-coast coupling in Python - Bove...
Deltares17 views
DSD-INT 2023 FloodAdapt - A decision-support tool for compound flood risk mit... by Deltares
DSD-INT 2023 FloodAdapt - A decision-support tool for compound flood risk mit...DSD-INT 2023 FloodAdapt - A decision-support tool for compound flood risk mit...
DSD-INT 2023 FloodAdapt - A decision-support tool for compound flood risk mit...
Deltares13 views
Unmasking the Dark Art of Vectored Exception Handling: Bypassing XDR and EDR ... by Donato Onofri
Unmasking the Dark Art of Vectored Exception Handling: Bypassing XDR and EDR ...Unmasking the Dark Art of Vectored Exception Handling: Bypassing XDR and EDR ...
Unmasking the Dark Art of Vectored Exception Handling: Bypassing XDR and EDR ...
Donato Onofri711 views
Fleet Management Software in India by Fleetable
Fleet Management Software in India Fleet Management Software in India
Fleet Management Software in India
Fleetable11 views
Citi TechTalk Session 2: Kafka Deep Dive by confluent
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
confluent17 views
Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI... by Marc Müller
Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI...Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI...
Dev-Cloud Conference 2023 - Continuous Deployment Showdown: Traditionelles CI...
Marc Müller36 views
DSD-INT 2023 Dam break simulation in Derna (Libya) using HydroMT_SFINCS - Prida by Deltares
DSD-INT 2023 Dam break simulation in Derna (Libya) using HydroMT_SFINCS - PridaDSD-INT 2023 Dam break simulation in Derna (Libya) using HydroMT_SFINCS - Prida
DSD-INT 2023 Dam break simulation in Derna (Libya) using HydroMT_SFINCS - Prida
Deltares18 views
DSD-INT 2023 Simulating a falling apron in Delft3D 4 - Engineering Practice -... by Deltares
DSD-INT 2023 Simulating a falling apron in Delft3D 4 - Engineering Practice -...DSD-INT 2023 Simulating a falling apron in Delft3D 4 - Engineering Practice -...
DSD-INT 2023 Simulating a falling apron in Delft3D 4 - Engineering Practice -...
Deltares6 views
What Can Employee Monitoring Software Do?​ by wAnywhere
What Can Employee Monitoring Software Do?​What Can Employee Monitoring Software Do?​
What Can Employee Monitoring Software Do?​
wAnywhere21 views

Introduction to Spark SQL and Catalyst / Spark SQLおよびCalalystの紹介

  • 1. Introduction to SparkSQL & Catalyst Takuya UESHIN ! ScalaMatsuri2014 2014/09/06(Sat)
  • 2. Who am I? Takuya UESHIN @ueshin github.com/ueshin Nautilus Technologies, Inc. A Spark contributor 2
  • 3. Agenda What is SparkSQL? Catalyst in depth SparkSQL core in depth Interesting issues How to contribute 3
  • 5. What is SparkSQL? SparkSQL is one of Spark components. Executes SQL on Spark Builds SchemaRDD like LINQ Optimizes execution plan. 5
  • 6. What is SparkSQL? Catalyst provides a execution planning framework for relational operations. Including: SQL parser & analyzer Logical operators & general expressions Logical optimizer A framework to transform operator tree. 6
  • 7. What is SparkSQL? To execute query needs some steps. Parse Analyze Logical Plan Optimize Physical Plan Execute 7
  • 8. What is SparkSQL? To execute query needs some steps. Parse Analyze Logical Plan Optimize Physical Plan Execute 8 Catalyst
  • 9. What is SparkSQL? To execute query needs some steps. Parse Analyze Logical Plan Optimize Physical Plan Execute 9 SQL core
  • 11. Catalyst in depth Provides a execution planning framework for relational operations. Row & DataType’s Trees & Rules Logical Operators Expressions Optimizations 11
  • 12. Row & DataType’s o.a.s.sql.catalyst.types.DataType Long, Int, Short, Byte, Float, Double, Decimal String, Binary, Boolean, Timestamp Array, Map, Struct o.a.s.sql.catalyst.expressions.Row Represents a single row. Can contain complex types. 12
  • 13. Trees & Rules o.a.s.sql.catalyst.trees.TreeNode Provides transformations of tree. foreach, map, flatMap, collect transform, transformUp, transformDown Used for operator tree, expression tree. 13
  • 14. Trees & Rules o.a.s.sql.catalyst.rules.Rule Represents a tree transform rule. o.a.s.sql.catalyst.rules.RuleExecutor A framework to transform trees based on rules. 14
  • 15. Logical Operators Basic Operators Project, Filter, … Binary Operators Join, Except, Intersect, Union, … Aggregate Generate, Distinct Sort, Limit InsertInto, WriteToFile 15 Project Filter Join Table Table
  • 16. Expressions Literal Arithmetics UnaryMinus, Sqrt, MaxOf Add, Subtract, Multiply, … Predicates EqualTo, LessThan, LessThanOrEqual, GreaterThan, GreaterThanOrEqual Not, And, Or, In, If, CaseWhen 16 + 1 2
  • 17. Expressions Cast GetItem, GetField Coalesce, IsNull, IsNotNull StringOperations Like, Upper, Lower, Contains, StartsWith, EndsWith, Substring, … 17
  • 18. Optimizations ConstantFolding NullPropagation ConstantFolding BooleanSimplification SimplifyFilters FilterPushdown CombineFilters PushPredicateThroughProject PushPredicateThroughJoin ColumnPruning 18
  • 19. Optimizations NullPropagation, ConstantFolding Replace expressions that can be evaluated with some literal value to the value. ex) 1 + null => null 1 + 2 => 3 Count(null) => 0 19
  • 20. Optimizations BooleanSimplification Simplifies boolean expressions that can be determined. ex) false AND $right => false true AND $right => $right true OR $right => true false OR $right => $right If(true, $then, $else) => $then 20
  • 21. Optimizations SimplifyFilters Removes filters that can be evaluated trivially. ex) Filter(true, child) => child Filter(false, child) => empty 21
  • 22. Optimizations CombineFilters Merges two filters. ex) Filter($fc, Filter($nc, child)) => Filter(AND($fc, $nc), child) 22
  • 23. Optimizations PushPredicateThroughProject Pushes Filter operators through Project operator. ex) Filter(‘i === 1, Project(‘i, ‘j, child)) => Project(‘i, ‘j, Filter(‘i === 1, child)) 23
  • 24. Optimizations PushPredicateThroughJoin Pushes Filter operators through Join operator. ex) Project(“left.i”.attr === 1, Join(left, right) => Join(Project(‘i === 1, left), right) 24
  • 25. Optimizations ColumnPruning Eliminates the reading of unused columns. ex) Join(left, right, LeftSemi, “left.id”.attr === “right.id”.attr) => Join(Project(‘id, left), right, LeftSemi) 25
  • 26. SQL core in depth
  • 27. SQL core in depth Provides: Physical operators to build RDD Conversion from Existing RDD of Product to SchemaRDD support Parquet file read/write support JSON file read support Columnar in-memory table support 27
  • 28. SQL core in depth o.a.s.sql.SchemaRDD Extends RDD[Row]. Has logical plan tree. Provides LINQ-like interfaces to construct logical plan. select, where, join, orderBy, … Executes the plan. 28
  • 29. SQL core in depth o.a.s.sql.execution.SparkStrategies Converts logical plan to physical. Some rules are based on statistics of the operators. 29
  • 30. SQL core in depth Parquet read/write support Parquet: Columnar storage format for Hadoop Reads existing Parquet files. Converts Parquet schema to row schema. Writes new Parquet files. Currently DecimalType and TimestampType are not supported. 30
  • 31. SQL core in depth JSON read support Loads a JSON file (one object per line) Infers row schema from the entire dataset. Giving the schema is experimental. Inferring the schema by sampling is also experimental. 31
  • 32. SQL core in depth Columnar in-memory table support Caches table like RDD.cache, but as columnar style. Can prune unnecessary columns when read data. 32
  • 34. Interesting issues Support the GroupingSet/ROLLUP/CUBE https://issues.apache.org/jira/browse/SPARK-2663 Use statistics to skip partitions when reading from in-memory columnar data https://issues.apache.org/jira/browse/SPARK-2961 34
  • 35. Interesting issues Pluggable interface for shuffles https://issues.apache.org/jira/browse/SPARK-2044 Sort-merge join https://issues.apache.org/jira/browse/SPARK-2213 Cost-based join reordering https://issues.apache.org/jira/browse/SPARK-2216 35
  • 37. How to contribute See: Contributing to Spark Open an issue on JIRA Send pull-request at GitHub Communicate with committers and reviewers Congratulations! 37
  • 38. Conclusion Introduced SparkSQL & Catalyst Now you know them very well! And you know how to contribute. ! Let’s contribute to Spark & SparkSQL!! 38