SlideShare a Scribd company logo
1 of 26
Spark
-
The Ultimate Scala
Collections
Martin Odersky
Scala is
• functional
• object-oriented
• statically typed
• interoperates well with Java and Javascript
1st Invariant: A Scalable Language
• Flexible Syntax
• Flexible Types
• User-definable operators
• Higher-order functions
• Implicits
...
Make it relatively easy to build new DSLs on top of
Scala
3
DSLs on top of Scala
4
SBT
Chisel
Spray
Dispatch
Akka
ScalaTest
Squeryl
Specs
shapeless
Scalaz
Slick
Spiral
Opti{X}
Scala and Spark
5
JVM
Scala Runtime
Spark Runtime Cluster Manager
(e.g. Yarn, Mesos)
File System (e.g.
HDFS, Cassandra)
Scala REPL Scala Compiler
- A domain specific language
- Implemented in Scala
- Embedded in Scala as a host language
Two Reasons for DSLs
(1) Support new syntax.
(2) Support new functionality.
I find (2) much more interesting than (1).
What Kind of DSL is Spark?
• Centered around collections.
• Immutable data sets equipped with functional
transformers.
map reduce union
flatMap fold intersection
filter aggregate ...
These are exactly the Scala collection operations!
Spark vs Scala Collections
• So, is Spark exactly Scala Collections, but
running in a cluster?
• Not quite. Two main differences
• Spark is lazy, Scala collections are strict.
• Spark has added functionality, e.g. PairRDDs.
Collections Design Choices
Imperative Functional
Strict Lazy
vs
vs
java.util scala.collection.immutable
Scala
OCaml
C#
Spark
Scala Stream, views
Strict vs Lazy
val xs = data.map(f)
xs.filter(p).sum
xs.take(10).toString
Strict: map is evaluated once,
produces intermediate list xs
Lazy: map is evaluated twice,
no intermediate list.
Lazy needs a purely functional architecture.
How Scala Can Learn from Spark
• Redo views for Scala collections:
val ys = xs
.view
.map(f)
.filter(p)
.force
Lifting to views makes it clear when things get
evaluated.
• Add cache() operation for persistence
How Scala Can Learn from Spark
• Redo views for Scala collections:
val ys = xs
.view
.map(f)
.filter(p)
.to(Vector)
Lifting to views makes it clear when things get
evaluated.
• Add cache() operation for persistence
How Scala Can Learn from Spark
• Add pairwise operations such as
reduceByKey
groupByKey
lookup
• These work on a sequences of pairs,
analogously to PairRDDs.
• Provides a lightweight way to add map-like
operations to sequences.
• Most of the advanced types concepts are about flexibility,
less so about safety.
2nd Scala Invariant: It’s about the Types
14
Flexibility / Ease of Use
Safety
Scala
Trend in Type-systems
Goals of PL design
where we’d like it to move
Spark is A Multi-Language Platform
Supported Programming Languages:
• Python
• Scala
• Java
• R
Why Scala instead of Python?
• Native to Spark, can use
everything without translation.
• Types help (a lot).
Why Types Help
Functional operations do not have hidden
dependencies
- inputs are parameters
- outputs are results
• Hence, every interaction is given a type.
• Logic errors usually translate into type errors.
• This is a great asset for programming, and the
same should hold for data science!
How can Scala help Spark?
• Developer experience
• Infrastructure
• Spores
• Fusion
Infrastructure
18
JVM
Scala Runtime
Spark Runtime Cluster Manager
(e.g. Yarn, Mesos)
File System (e.g.
HDFS, Cassandra)
Scala REPL Scala CompilerJLine handling
Autocompletion
Use Lambdas, default methods
Serialization
Java 8 streams
Spores
• Problem: Closures use Java serialization, can
drag a very large dependency graph.
class C {
val data = someLargeCollection
val sum: Int = data.sum
doLater(x => println(sum))
}
All of data is serialized in the closure.
Spores: Safely serializable closures
Idea: Make explicit what a closure captures:
class C {
val data = someLargeCollection
val sum: Int = data.sum
doLater {
val s = sum
x => println(s))
}
Spores
Idea: Make explicit what a closure captures:
class C {
val data = someLargeCollection
val sum: Int = data.sum
doLater {
Spore {
val s = sum
x => println(s))
}
}
Spores make sure no hidden dependencies
remain.
Fusion
• Problem: Many Spark jobs are compute bound.
 Kay Ousterhout “Making sense of Spark performance”
- Lazy collections avoid intermediate results.
- Overhead because every operation is a separate closure.
Fusion
• Problem: Many Spark jobs are compute bound.
 Kay Ousterhout “Making sense of Spark performance”
- Lazy collections avoid intermediate results.
- Overhead because every operation is a separate closure.
• Fusion cuts through this, can combine several
closures in a tight loop.
• Spark implements fusion for selected operations
using dataframes.
• Would like to do the same for general Scala code.
The Dotty Linker
• A smart, whole program optimizer
• Analyses and tunes program and dependencies.
• removes dead code
• eliminates virtual dispatch, boxing, closures
PageRank example, 80K nodes:
Bytecode Objects Virtual CPU branch Running
size allocated calls miss rate time
Original 60KB app 3M 24M 12% 3.2 sec
+ 5.5MB lib
Optimized 920KB 490K 3M 3% 1.7 sec
Instead of a Summary
• Spark and Scala are beautiful examples of what
can be achieved by a bunch of dedicated grad
students using a language & system originally
written by another bunch of dedicated grad
students.
• I am looking forward to the next steps of their co-
evolution.
Thank You
Martin Odersky

More Related Content

What's hot

What's hot (20)

Performance Optimization Case Study: Shattering Hadoop's Sort Record with Spa...
Performance Optimization Case Study: Shattering Hadoop's Sort Record with Spa...Performance Optimization Case Study: Shattering Hadoop's Sort Record with Spa...
Performance Optimization Case Study: Shattering Hadoop's Sort Record with Spa...
 
Distributed ML in Apache Spark
Distributed ML in Apache SparkDistributed ML in Apache Spark
Distributed ML in Apache Spark
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in Spark
 
Big Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and ZeppelinBig Data visualization with Apache Spark and Zeppelin
Big Data visualization with Apache Spark and Zeppelin
 
Spark, Python and Parquet
Spark, Python and Parquet Spark, Python and Parquet
Spark, Python and Parquet
 
Simplifying Big Data Applications with Apache Spark 2.0
Simplifying Big Data Applications with Apache Spark 2.0Simplifying Big Data Applications with Apache Spark 2.0
Simplifying Big Data Applications with Apache Spark 2.0
 
Spark Summit San Francisco 2016 - Ali Ghodsi Keynote
Spark Summit San Francisco 2016 - Ali Ghodsi KeynoteSpark Summit San Francisco 2016 - Ali Ghodsi Keynote
Spark Summit San Francisco 2016 - Ali Ghodsi Keynote
 
Spark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir VolkSpark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir Volk
 
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
 
Jump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and DatabricksJump Start into Apache® Spark™ and Databricks
Jump Start into Apache® Spark™ and Databricks
 
Spark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn WhittickSpark Summit EU talk by Emlyn Whittick
Spark Summit EU talk by Emlyn Whittick
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
What No One Tells You About Writing a Streaming App: Spark Summit East talk b...
 
ETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
ETL to ML: Use Apache Spark as an end to end tool for Advanced AnalyticsETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
ETL to ML: Use Apache Spark as an end to end tool for Advanced Analytics
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
 
Spark Sql for Training
Spark Sql for TrainingSpark Sql for Training
Spark Sql for Training
 
Lessons from Running Large Scale Spark Workloads
Lessons from Running Large Scale Spark WorkloadsLessons from Running Large Scale Spark Workloads
Lessons from Running Large Scale Spark Workloads
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and AkkaStreaming Analytics with Spark, Kafka, Cassandra and Akka
Streaming Analytics with Spark, Kafka, Cassandra and Akka
 

Similar to Spark - The Ultimate Scala Collections by Martin Odersky

Martin Odersky - Evolution of Scala
Martin Odersky - Evolution of ScalaMartin Odersky - Evolution of Scala
Martin Odersky - Evolution of Scala
Scala Italy
 

Similar to Spark - The Ultimate Scala Collections by Martin Odersky (20)

Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
 
Martin Odersky - Evolution of Scala
Martin Odersky - Evolution of ScalaMartin Odersky - Evolution of Scala
Martin Odersky - Evolution of Scala
 
Scala Days San Francisco
Scala Days San FranciscoScala Days San Francisco
Scala Days San Francisco
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 
Scala final ppt vinay
Scala final ppt vinayScala final ppt vinay
Scala final ppt vinay
 
Spark real world use cases and optimizations
Spark real world use cases and optimizationsSpark real world use cases and optimizations
Spark real world use cases and optimizations
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
Why Functional Programming Is Important in Big Data Era
Why Functional Programming Is Important in Big Data EraWhy Functional Programming Is Important in Big Data Era
Why Functional Programming Is Important in Big Data Era
 
The Why and How of Scala at Twitter
The Why and How of Scala at TwitterThe Why and How of Scala at Twitter
The Why and How of Scala at Twitter
 
Spark core
Spark coreSpark core
Spark core
 
Scala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big DataScala and Spark are Ideal for Big Data
Scala and Spark are Ideal for Big Data
 
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
 
Ten tools for ten big data areas 03_Apache Spark
Ten tools for ten big data areas 03_Apache SparkTen tools for ten big data areas 03_Apache Spark
Ten tools for ten big data areas 03_Apache Spark
 
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
Sviluppare applicazioni nell'era dei "Big Data" con Scala e Spark - Mario Car...
 
Spark meetup TCHUG
Spark meetup TCHUGSpark meetup TCHUG
Spark meetup TCHUG
 
Introduction to Scala
Introduction to ScalaIntroduction to Scala
Introduction to Scala
 
Quick introduction to scala
Quick introduction to scalaQuick introduction to scala
Quick introduction to scala
 
Devoxx
DevoxxDevoxx
Devoxx
 
Develop realtime web with Scala and Xitrum
Develop realtime web with Scala and XitrumDevelop realtime web with Scala and Xitrum
Develop realtime web with Scala and Xitrum
 
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data PlatformsCassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
Cassandra Summit 2014: Apache Spark - The SDK for All Big Data Platforms
 

More from Spark Summit

Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Spark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Spark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
Spark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Spark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
Spark Summit
 

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Recently uploaded

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
JohnnyPlasten
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
shivangimorya083
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
shivangimorya083
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
shivangimorya083
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
Lars Albertsson
 

Recently uploaded (20)

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 

Spark - The Ultimate Scala Collections by Martin Odersky

  • 2. Scala is • functional • object-oriented • statically typed • interoperates well with Java and Javascript
  • 3. 1st Invariant: A Scalable Language • Flexible Syntax • Flexible Types • User-definable operators • Higher-order functions • Implicits ... Make it relatively easy to build new DSLs on top of Scala 3
  • 4. DSLs on top of Scala 4 SBT Chisel Spray Dispatch Akka ScalaTest Squeryl Specs shapeless Scalaz Slick Spiral Opti{X}
  • 5. Scala and Spark 5 JVM Scala Runtime Spark Runtime Cluster Manager (e.g. Yarn, Mesos) File System (e.g. HDFS, Cassandra) Scala REPL Scala Compiler - A domain specific language - Implemented in Scala - Embedded in Scala as a host language
  • 6. Two Reasons for DSLs (1) Support new syntax. (2) Support new functionality. I find (2) much more interesting than (1).
  • 7. What Kind of DSL is Spark? • Centered around collections. • Immutable data sets equipped with functional transformers. map reduce union flatMap fold intersection filter aggregate ... These are exactly the Scala collection operations!
  • 8. Spark vs Scala Collections • So, is Spark exactly Scala Collections, but running in a cluster? • Not quite. Two main differences • Spark is lazy, Scala collections are strict. • Spark has added functionality, e.g. PairRDDs.
  • 9. Collections Design Choices Imperative Functional Strict Lazy vs vs java.util scala.collection.immutable Scala OCaml C# Spark Scala Stream, views
  • 10. Strict vs Lazy val xs = data.map(f) xs.filter(p).sum xs.take(10).toString Strict: map is evaluated once, produces intermediate list xs Lazy: map is evaluated twice, no intermediate list. Lazy needs a purely functional architecture.
  • 11. How Scala Can Learn from Spark • Redo views for Scala collections: val ys = xs .view .map(f) .filter(p) .force Lifting to views makes it clear when things get evaluated. • Add cache() operation for persistence
  • 12. How Scala Can Learn from Spark • Redo views for Scala collections: val ys = xs .view .map(f) .filter(p) .to(Vector) Lifting to views makes it clear when things get evaluated. • Add cache() operation for persistence
  • 13. How Scala Can Learn from Spark • Add pairwise operations such as reduceByKey groupByKey lookup • These work on a sequences of pairs, analogously to PairRDDs. • Provides a lightweight way to add map-like operations to sequences.
  • 14. • Most of the advanced types concepts are about flexibility, less so about safety. 2nd Scala Invariant: It’s about the Types 14 Flexibility / Ease of Use Safety Scala Trend in Type-systems Goals of PL design where we’d like it to move
  • 15. Spark is A Multi-Language Platform Supported Programming Languages: • Python • Scala • Java • R Why Scala instead of Python? • Native to Spark, can use everything without translation. • Types help (a lot).
  • 16. Why Types Help Functional operations do not have hidden dependencies - inputs are parameters - outputs are results • Hence, every interaction is given a type. • Logic errors usually translate into type errors. • This is a great asset for programming, and the same should hold for data science!
  • 17. How can Scala help Spark? • Developer experience • Infrastructure • Spores • Fusion
  • 18. Infrastructure 18 JVM Scala Runtime Spark Runtime Cluster Manager (e.g. Yarn, Mesos) File System (e.g. HDFS, Cassandra) Scala REPL Scala CompilerJLine handling Autocompletion Use Lambdas, default methods Serialization Java 8 streams
  • 19. Spores • Problem: Closures use Java serialization, can drag a very large dependency graph. class C { val data = someLargeCollection val sum: Int = data.sum doLater(x => println(sum)) } All of data is serialized in the closure.
  • 20. Spores: Safely serializable closures Idea: Make explicit what a closure captures: class C { val data = someLargeCollection val sum: Int = data.sum doLater { val s = sum x => println(s)) }
  • 21. Spores Idea: Make explicit what a closure captures: class C { val data = someLargeCollection val sum: Int = data.sum doLater { Spore { val s = sum x => println(s)) } } Spores make sure no hidden dependencies remain.
  • 22. Fusion • Problem: Many Spark jobs are compute bound.  Kay Ousterhout “Making sense of Spark performance” - Lazy collections avoid intermediate results. - Overhead because every operation is a separate closure.
  • 23. Fusion • Problem: Many Spark jobs are compute bound.  Kay Ousterhout “Making sense of Spark performance” - Lazy collections avoid intermediate results. - Overhead because every operation is a separate closure. • Fusion cuts through this, can combine several closures in a tight loop. • Spark implements fusion for selected operations using dataframes. • Would like to do the same for general Scala code.
  • 24. The Dotty Linker • A smart, whole program optimizer • Analyses and tunes program and dependencies. • removes dead code • eliminates virtual dispatch, boxing, closures PageRank example, 80K nodes: Bytecode Objects Virtual CPU branch Running size allocated calls miss rate time Original 60KB app 3M 24M 12% 3.2 sec + 5.5MB lib Optimized 920KB 490K 3M 3% 1.7 sec
  • 25. Instead of a Summary • Spark and Scala are beautiful examples of what can be achieved by a bunch of dedicated grad students using a language & system originally written by another bunch of dedicated grad students. • I am looking forward to the next steps of their co- evolution.