SlideShare a Scribd company logo
Spark As A Distributed Scala
Write a lot to my blog www.Fruzenshtein.com
Currently interested in Scala, Akka, Spark…
Who is who?
Alexey Zvolinskiy
~4 years of Scala experience
Passing through Functional Programming
in Scala Specialization on Coursera
@Fruzenshtein
Why Scala?
What makes Scala so great?
1. Functional programming language*
2. Immutability
3. Type system
4. Collections API
5. Pattern matching
6. Implicit
Functional programming language
1. Function is a first class citizen
2. Totality
3. Determinism
4. Purity
A => B
A1
A2
…
An
B1
B2
…
Bn
A => BAi Bi A => BAi Bi
A => BAi Bi
Immutability
1. Makes a code more predictable
2. Reduces efforts to understand a code
3. Key to thread-safety
Books:
Java concurrency in practice
Effective Java 2nd Edition
Type system
1. Static typing
2. Type inference
3. Bounds Map[V, K]
List[T1 <: T2]
Set[+T]
Collections API
val numbers = List(1,2,3,4,5,6,7,8,9,10)
numbers.filter(_ % 2 == 0)
.map(_ * 10)
//List(20, 40, 60, 80, 100)
filter(n:Int => Boolean)
//(n => n % 2 == 0)
//(n => n * 10)
map(n:Int => Int)
Collections API
val groupsOfStudents = List(
List(("Alex", 65), ("Kate", 87), ("Sam", 98)),
List(("Peter", 84), ("Bob", 79), ("Samanta", 71)),
List(("Rob", 82), ("Jack", 55), ("Ann", 90))
)
groupsOfStudents.flatMap(students => students)
.groupBy(student => student._2 > 75)
.get(true).get
//List((Kate,87), (Sam,98), (Peter,84), (Bob,79), (Rob,82), (Ann,90))
And what?!
=Parallelism=
Idea of parallelism
How to divide a problem
into subproblems?
How to use a hardware
optimally?
Parallelism background
Scala parallel collections
val from0to100000: Range = 0 until 100000
val list = from0to100000.toList
//scala.collection.parallel.immutable.ParSeq[Int]
val parList = list.par
Some benchmarks
val list = from0to100000.toList
for (i <- 1 to 10) {
val t0 = System.currentTimeMillis()
list.filter(isPrime(_))
println(System.currentTimeMillis - t0)
}
def isPrime(n: Int): Boolean = ! (
(2 until n-1) exists (n % _ == 0)
)
val parList = list.par
for (i <- 1 to 10) {
val t1 = System.currentTimeMillis()
parList.filter(isPrime(_))
println(System.currentTimeMillis - t1)
}
7106
6467
6315
6275
6478
8732
6543
6296
6299
6286
5130
5106
4649
4568
4580
4446
4447
4437
4290
4476
Ok, but what about
Spark?!
Why distributed computations?
single machine
(shared memory)
Multiple nodes
(network)
Parallel collections
(scala)
RDDs
(spark)
Almost the same API
RDD example
Spark
Spark
Spark
val tweets: RDD[Tweet] = …
tweets.filter(
_.contains(“bigdata”)
)
Latency
Numbers from Jeff Dean http://research.google.com/people/jeff/ https://gist.github.com/2841832 Graph and scale by Thomas Lee
Computation model
memory disk network
seconds
-
days
weeks
-
months
weeks
-
years
Scala transformations & actions
1. Transformations are lazy
2. Actions are eager
map
filter
flatMap
…
reduce
collect
count
…
val tweets: RDD[Tweet] = …
tweets.filter(_.contains(“bigdata”))
.map(t => (t.author, t.body)
val tweets: RDD[Tweet] = …
tweets.filter(_.contains(“bigdata”))
.map(t => (t.author, t.body)
.collect()
Rules of thumb
1. Cache
2. Apply efficiently
3. Avoid shuffling
val tweets: RDD[Tweet] = …
val cachedTweets = tweets.cache()
cachedTweets.filter(_.contains(“USA”))
.map(t => (t.author, t.body)
cachedTweets.map(t => (t.author, t.body)
.filter(_.contains(“USA”))
Shuffling
(1, 240)
(2, 500)
(2, 105)
(3, 100)
(1, 200)
(1, 500)
(1, 450)
(3, 100)
(3, 100)
(2, [500, 105]) (1, [240, 200, 500, 450]) (3, [100, 100, 100])
groupByKey()
Transaction(id: Int, amount: Int)
We want to know how much money spent each client
Reduce before group
(1, 240)
(2, 605)
(3, 100)
(1, 700)
(1, 450)
(3, 200)
(2, [605]) (1, [240, 700, 450]) (3, [100, 200])
groupByKey()
(1, 240)
(2, 500)
(2, 105)
(3, 100)
(1, 200)
(1, 500)
(1, 450)
(3, 100)
(3, 100)
reduceByKey(…)
Thanks :)
@Fruzenshtein

More Related Content

What's hot

Xebicon2013 scala vsjava_final
Xebicon2013 scala vsjava_finalXebicon2013 scala vsjava_final
Xebicon2013 scala vsjava_final
Urs Peter
 
Lisp Programming Languge
Lisp Programming LangugeLisp Programming Languge
Lisp Programming Languge
Yaser Jaradeh
 
Gentle Introduction To Lisp
Gentle Introduction To LispGentle Introduction To Lisp
Gentle Introduction To Lisp
Damien Garaud
 
Verified Subtyping with Traits and Mixins
Verified Subtyping with Traits and MixinsVerified Subtyping with Traits and Mixins
Verified Subtyping with Traits and Mixins
Asankhaya Sharma
 
Linked lists c7
Linked lists c7Linked lists c7
Linked lists c7
Omar Al-Sabek
 
Introduction to lambda calculus
Introduction to lambda calculusIntroduction to lambda calculus
Introduction to lambda calculus
Afaq Siddiqui
 
Using VI Java from Scala
Using VI Java from ScalaUsing VI Java from Scala
Using VI Java from Scala
dcbriccetti
 
Getting Started With Scala
Getting Started With ScalaGetting Started With Scala
Getting Started With Scala
Meetu Maltiar
 
(3) collections algorithms
(3) collections algorithms(3) collections algorithms
(3) collections algorithms
Nico Ludwig
 
RxJS - The Reactive Extensions for JavaScript
RxJS - The Reactive Extensions for JavaScriptRxJS - The Reactive Extensions for JavaScript
RxJS - The Reactive Extensions for JavaScript
Viliam Elischer
 
OCL 2.4. (... 2.5)
OCL 2.4. (... 2.5)OCL 2.4. (... 2.5)
OCL 2.4. (... 2.5)
Edward Willink
 
Lisp
LispLisp
Object Oriented JavaScript
Object Oriented JavaScriptObject Oriented JavaScript
Object Oriented JavaScript
injulkarnilesh
 
Introducing Pattern Matching in Scala
 Introducing Pattern Matching  in Scala Introducing Pattern Matching  in Scala
Introducing Pattern Matching in Scala
Ayush Mishra
 
Safe navigation in OCL
Safe navigation in OCLSafe navigation in OCL
Safe navigation in OCL
Edward Willink
 
The Scala Refactoring Library: Problems and Perspectives
The Scala Refactoring Library: Problems and PerspectivesThe Scala Refactoring Library: Problems and Perspectives
The Scala Refactoring Library: Problems and Perspectives
Matthias Langer
 

What's hot (16)

Xebicon2013 scala vsjava_final
Xebicon2013 scala vsjava_finalXebicon2013 scala vsjava_final
Xebicon2013 scala vsjava_final
 
Lisp Programming Languge
Lisp Programming LangugeLisp Programming Languge
Lisp Programming Languge
 
Gentle Introduction To Lisp
Gentle Introduction To LispGentle Introduction To Lisp
Gentle Introduction To Lisp
 
Verified Subtyping with Traits and Mixins
Verified Subtyping with Traits and MixinsVerified Subtyping with Traits and Mixins
Verified Subtyping with Traits and Mixins
 
Linked lists c7
Linked lists c7Linked lists c7
Linked lists c7
 
Introduction to lambda calculus
Introduction to lambda calculusIntroduction to lambda calculus
Introduction to lambda calculus
 
Using VI Java from Scala
Using VI Java from ScalaUsing VI Java from Scala
Using VI Java from Scala
 
Getting Started With Scala
Getting Started With ScalaGetting Started With Scala
Getting Started With Scala
 
(3) collections algorithms
(3) collections algorithms(3) collections algorithms
(3) collections algorithms
 
RxJS - The Reactive Extensions for JavaScript
RxJS - The Reactive Extensions for JavaScriptRxJS - The Reactive Extensions for JavaScript
RxJS - The Reactive Extensions for JavaScript
 
OCL 2.4. (... 2.5)
OCL 2.4. (... 2.5)OCL 2.4. (... 2.5)
OCL 2.4. (... 2.5)
 
Lisp
LispLisp
Lisp
 
Object Oriented JavaScript
Object Oriented JavaScriptObject Oriented JavaScript
Object Oriented JavaScript
 
Introducing Pattern Matching in Scala
 Introducing Pattern Matching  in Scala Introducing Pattern Matching  in Scala
Introducing Pattern Matching in Scala
 
Safe navigation in OCL
Safe navigation in OCLSafe navigation in OCL
Safe navigation in OCL
 
The Scala Refactoring Library: Problems and Perspectives
The Scala Refactoring Library: Problems and PerspectivesThe Scala Refactoring Library: Problems and Perspectives
The Scala Refactoring Library: Problems and Perspectives
 

Viewers also liked

ELIXIR Webinar: Introducing TeSS
ELIXIR Webinar: Introducing TeSSELIXIR Webinar: Introducing TeSS
ELIXIR Webinar: Introducing TeSS
Niall Beard
 
WEB MINING: PATTERN DISCOVERY ON THE WORLD WIDE WEB - 2011
WEB MINING: PATTERN DISCOVERY ON THE WORLD WIDE WEB - 2011WEB MINING: PATTERN DISCOVERY ON THE WORLD WIDE WEB - 2011
WEB MINING: PATTERN DISCOVERY ON THE WORLD WIDE WEB - 2011
Mustafa TURAN
 
Magic Clusters and Where to Find Them 2.0 - Eugene Pirogov
Magic Clusters and Where to Find Them 2.0 - Eugene Pirogov Magic Clusters and Where to Find Them 2.0 - Eugene Pirogov
Magic Clusters and Where to Find Them 2.0 - Eugene Pirogov
Elixir Club
 
나프다 웨비너 1604: Elixir와 함수형 프로그래밍을 이용한 웹 개발
나프다 웨비너 1604: Elixir와 함수형 프로그래밍을 이용한 웹 개발나프다 웨비너 1604: Elixir와 함수형 프로그래밍을 이용한 웹 개발
나프다 웨비너 1604: Elixir와 함수형 프로그래밍을 이용한 웹 개발
Changwook Park
 
Control flow in_elixir
Control flow in_elixirControl flow in_elixir
Control flow in_elixir
Anna Neyzberg
 
Spring IO for startups
Spring IO for startupsSpring IO for startups
Spring IO for startups
Alex Fruzenshtein
 
Flowex: Flow-Based Programming with Elixir GenStage - Anton Mishchuk
Flowex: Flow-Based Programming with Elixir GenStage - Anton MishchukFlowex: Flow-Based Programming with Elixir GenStage - Anton Mishchuk
Flowex: Flow-Based Programming with Elixir GenStage - Anton Mishchuk
Elixir Club
 
Phoenix: Inflame the Web - Alex Troush
Phoenix: Inflame the Web - Alex TroushPhoenix: Inflame the Web - Alex Troush
Phoenix: Inflame the Web - Alex Troush
Elixir Club
 
GenStage and Flow - Jose Valim
GenStage and Flow - Jose Valim GenStage and Flow - Jose Valim
GenStage and Flow - Jose Valim
Elixir Club
 
Distributed system in Elixir
Distributed system in ElixirDistributed system in Elixir
Distributed system in Elixir
Changwook Park
 
Proteome array - antibody based proteome arrays
Proteome array - antibody based proteome arrays Proteome array - antibody based proteome arrays
Proteome array - antibody based proteome arrays
saraswathi rajakumar
 
Elixir & Phoenix 推坑
Elixir & Phoenix 推坑Elixir & Phoenix 推坑
Elixir & Phoenix 推坑
Chao-Ju Huang
 
Bottleneck in Elixir Application - Alexey Osipenko
 Bottleneck in Elixir Application - Alexey Osipenko  Bottleneck in Elixir Application - Alexey Osipenko
Bottleneck in Elixir Application - Alexey Osipenko
Elixir Club
 
Anatomy of an elixir process and Actor Communication
Anatomy of an elixir process and Actor CommunicationAnatomy of an elixir process and Actor Communication
Anatomy of an elixir process and Actor Communication
Mustafa TURAN
 
Build Your Own Real-Time Web Service with Elixir Phoenix
Build Your Own Real-Time Web Service with Elixir PhoenixBuild Your Own Real-Time Web Service with Elixir Phoenix
Build Your Own Real-Time Web Service with Elixir Phoenix
Chi-chi Ekweozor
 
Play vs Rails
Play vs RailsPlay vs Rails
Play vs Rails
Daniel Cukier
 
Elixir basics
Elixir basicsElixir basics
Elixir basics
Ruben Amortegui
 
Professional Programmer (3 Years Later)
Professional Programmer (3 Years Later)Professional Programmer (3 Years Later)
Professional Programmer (3 Years Later)
Gabriele Lana
 
Embedded Erlang, Nerves, and SumoBots
Embedded Erlang, Nerves, and SumoBotsEmbedded Erlang, Nerves, and SumoBots
Embedded Erlang, Nerves, and SumoBots
Frank Hunleth
 

Viewers also liked (20)

Big Data eBook
Big Data eBookBig Data eBook
Big Data eBook
 
ELIXIR Webinar: Introducing TeSS
ELIXIR Webinar: Introducing TeSSELIXIR Webinar: Introducing TeSS
ELIXIR Webinar: Introducing TeSS
 
WEB MINING: PATTERN DISCOVERY ON THE WORLD WIDE WEB - 2011
WEB MINING: PATTERN DISCOVERY ON THE WORLD WIDE WEB - 2011WEB MINING: PATTERN DISCOVERY ON THE WORLD WIDE WEB - 2011
WEB MINING: PATTERN DISCOVERY ON THE WORLD WIDE WEB - 2011
 
Magic Clusters and Where to Find Them 2.0 - Eugene Pirogov
Magic Clusters and Where to Find Them 2.0 - Eugene Pirogov Magic Clusters and Where to Find Them 2.0 - Eugene Pirogov
Magic Clusters and Where to Find Them 2.0 - Eugene Pirogov
 
나프다 웨비너 1604: Elixir와 함수형 프로그래밍을 이용한 웹 개발
나프다 웨비너 1604: Elixir와 함수형 프로그래밍을 이용한 웹 개발나프다 웨비너 1604: Elixir와 함수형 프로그래밍을 이용한 웹 개발
나프다 웨비너 1604: Elixir와 함수형 프로그래밍을 이용한 웹 개발
 
Control flow in_elixir
Control flow in_elixirControl flow in_elixir
Control flow in_elixir
 
Spring IO for startups
Spring IO for startupsSpring IO for startups
Spring IO for startups
 
Flowex: Flow-Based Programming with Elixir GenStage - Anton Mishchuk
Flowex: Flow-Based Programming with Elixir GenStage - Anton MishchukFlowex: Flow-Based Programming with Elixir GenStage - Anton Mishchuk
Flowex: Flow-Based Programming with Elixir GenStage - Anton Mishchuk
 
Phoenix: Inflame the Web - Alex Troush
Phoenix: Inflame the Web - Alex TroushPhoenix: Inflame the Web - Alex Troush
Phoenix: Inflame the Web - Alex Troush
 
GenStage and Flow - Jose Valim
GenStage and Flow - Jose Valim GenStage and Flow - Jose Valim
GenStage and Flow - Jose Valim
 
Distributed system in Elixir
Distributed system in ElixirDistributed system in Elixir
Distributed system in Elixir
 
Proteome array - antibody based proteome arrays
Proteome array - antibody based proteome arrays Proteome array - antibody based proteome arrays
Proteome array - antibody based proteome arrays
 
Elixir & Phoenix 推坑
Elixir & Phoenix 推坑Elixir & Phoenix 推坑
Elixir & Phoenix 推坑
 
Bottleneck in Elixir Application - Alexey Osipenko
 Bottleneck in Elixir Application - Alexey Osipenko  Bottleneck in Elixir Application - Alexey Osipenko
Bottleneck in Elixir Application - Alexey Osipenko
 
Anatomy of an elixir process and Actor Communication
Anatomy of an elixir process and Actor CommunicationAnatomy of an elixir process and Actor Communication
Anatomy of an elixir process and Actor Communication
 
Build Your Own Real-Time Web Service with Elixir Phoenix
Build Your Own Real-Time Web Service with Elixir PhoenixBuild Your Own Real-Time Web Service with Elixir Phoenix
Build Your Own Real-Time Web Service with Elixir Phoenix
 
Play vs Rails
Play vs RailsPlay vs Rails
Play vs Rails
 
Elixir basics
Elixir basicsElixir basics
Elixir basics
 
Professional Programmer (3 Years Later)
Professional Programmer (3 Years Later)Professional Programmer (3 Years Later)
Professional Programmer (3 Years Later)
 
Embedded Erlang, Nerves, and SumoBots
Embedded Erlang, Nerves, and SumoBotsEmbedded Erlang, Nerves, and SumoBots
Embedded Erlang, Nerves, and SumoBots
 

Similar to Spark as a distributed Scala

Introduction to parallel and distributed computation with spark
Introduction to parallel and distributed computation with sparkIntroduction to parallel and distributed computation with spark
Introduction to parallel and distributed computation with spark
Angelo Leto
 
The Scala Programming Language
The Scala Programming LanguageThe Scala Programming Language
The Scala Programming Language
league
 
Scala training workshop 02
Scala training workshop 02Scala training workshop 02
Scala training workshop 02
Nguyen Tuan
 
Functional programming with Scala
Functional programming with ScalaFunctional programming with Scala
Functional programming with Scala
Neelkanth Sachdeva
 
Functional Programming With Scala
Functional Programming With ScalaFunctional Programming With Scala
Functional Programming With Scala
Knoldus Inc.
 
Devoxx
DevoxxDevoxx
Spark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin OderskySpark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin Odersky
Spark Summit
 
Quick introduction to scala
Quick introduction to scalaQuick introduction to scala
Quick introduction to scala
Mohammad Hossein Rimaz
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
Albert Bifet
 
Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009
Martin Odersky
 
Lecture1
Lecture1Lecture1
Lecture1
Muhammad Fayyaz
 
The Evolution of Scala
The Evolution of ScalaThe Evolution of Scala
The Evolution of Scala
Martin Odersky
 
scalaliftoff2009.pdf
scalaliftoff2009.pdfscalaliftoff2009.pdf
scalaliftoff2009.pdfHiroshi Ono
 
scalaliftoff2009.pdf
scalaliftoff2009.pdfscalaliftoff2009.pdf
scalaliftoff2009.pdfHiroshi Ono
 
scalaliftoff2009.pdf
scalaliftoff2009.pdfscalaliftoff2009.pdf
scalaliftoff2009.pdfHiroshi Ono
 
scalaliftoff2009.pdf
scalaliftoff2009.pdfscalaliftoff2009.pdf
scalaliftoff2009.pdfHiroshi Ono
 
Scala final ppt vinay
Scala final ppt vinayScala final ppt vinay
Scala final ppt vinay
Viplav Jain
 
Scala for Java Programmers
Scala for Java ProgrammersScala for Java Programmers
Scala for Java Programmers
Eric Pederson
 
Metaprogramming in Scala 2.10, Eugene Burmako,
Metaprogramming  in Scala 2.10, Eugene Burmako, Metaprogramming  in Scala 2.10, Eugene Burmako,
Metaprogramming in Scala 2.10, Eugene Burmako,
Vasil Remeniuk
 
Artigo 81 - spark_tutorial.pdf
Artigo 81 - spark_tutorial.pdfArtigo 81 - spark_tutorial.pdf
Artigo 81 - spark_tutorial.pdf
WalmirCouto3
 

Similar to Spark as a distributed Scala (20)

Introduction to parallel and distributed computation with spark
Introduction to parallel and distributed computation with sparkIntroduction to parallel and distributed computation with spark
Introduction to parallel and distributed computation with spark
 
The Scala Programming Language
The Scala Programming LanguageThe Scala Programming Language
The Scala Programming Language
 
Scala training workshop 02
Scala training workshop 02Scala training workshop 02
Scala training workshop 02
 
Functional programming with Scala
Functional programming with ScalaFunctional programming with Scala
Functional programming with Scala
 
Functional Programming With Scala
Functional Programming With ScalaFunctional Programming With Scala
Functional Programming With Scala
 
Devoxx
DevoxxDevoxx
Devoxx
 
Spark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin OderskySpark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin Odersky
 
Quick introduction to scala
Quick introduction to scalaQuick introduction to scala
Quick introduction to scala
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
 
Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009Scala Talk at FOSDEM 2009
Scala Talk at FOSDEM 2009
 
Lecture1
Lecture1Lecture1
Lecture1
 
The Evolution of Scala
The Evolution of ScalaThe Evolution of Scala
The Evolution of Scala
 
scalaliftoff2009.pdf
scalaliftoff2009.pdfscalaliftoff2009.pdf
scalaliftoff2009.pdf
 
scalaliftoff2009.pdf
scalaliftoff2009.pdfscalaliftoff2009.pdf
scalaliftoff2009.pdf
 
scalaliftoff2009.pdf
scalaliftoff2009.pdfscalaliftoff2009.pdf
scalaliftoff2009.pdf
 
scalaliftoff2009.pdf
scalaliftoff2009.pdfscalaliftoff2009.pdf
scalaliftoff2009.pdf
 
Scala final ppt vinay
Scala final ppt vinayScala final ppt vinay
Scala final ppt vinay
 
Scala for Java Programmers
Scala for Java ProgrammersScala for Java Programmers
Scala for Java Programmers
 
Metaprogramming in Scala 2.10, Eugene Burmako,
Metaprogramming  in Scala 2.10, Eugene Burmako, Metaprogramming  in Scala 2.10, Eugene Burmako,
Metaprogramming in Scala 2.10, Eugene Burmako,
 
Artigo 81 - spark_tutorial.pdf
Artigo 81 - spark_tutorial.pdfArtigo 81 - spark_tutorial.pdf
Artigo 81 - spark_tutorial.pdf
 

More from Alex Fruzenshtein

Scala UA: Big Step To Functional Programming
Scala UA: Big Step To Functional ProgrammingScala UA: Big Step To Functional Programming
Scala UA: Big Step To Functional Programming
Alex Fruzenshtein
 
Akka: Actor Design & Communication Technics
Akka: Actor Design & Communication TechnicsAkka: Actor Design & Communication Technics
Akka: Actor Design & Communication Technics
Alex Fruzenshtein
 
Boost UI tests
Boost UI testsBoost UI tests
Boost UI tests
Alex Fruzenshtein
 
N аргументов не идти в QA
N аргументов не идти в QAN аргументов не идти в QA
N аргументов не идти в QA
Alex Fruzenshtein
 
XpDays - Automated testing of responsive design (GalenFramework)
XpDays - Automated testing of responsive design (GalenFramework)XpDays - Automated testing of responsive design (GalenFramework)
XpDays - Automated testing of responsive design (GalenFramework)
Alex Fruzenshtein
 

More from Alex Fruzenshtein (6)

Scala UA: Big Step To Functional Programming
Scala UA: Big Step To Functional ProgrammingScala UA: Big Step To Functional Programming
Scala UA: Big Step To Functional Programming
 
Akka: Actor Design & Communication Technics
Akka: Actor Design & Communication TechnicsAkka: Actor Design & Communication Technics
Akka: Actor Design & Communication Technics
 
Boost UI tests
Boost UI testsBoost UI tests
Boost UI tests
 
N аргументов не идти в QA
N аргументов не идти в QAN аргументов не идти в QA
N аргументов не идти в QA
 
XpDays - Automated testing of responsive design (GalenFramework)
XpDays - Automated testing of responsive design (GalenFramework)XpDays - Automated testing of responsive design (GalenFramework)
XpDays - Automated testing of responsive design (GalenFramework)
 
Test-case design
Test-case designTest-case design
Test-case design
 

Recently uploaded

一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 

Recently uploaded (20)

一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 

Spark as a distributed Scala

  • 1. Spark As A Distributed Scala
  • 2. Write a lot to my blog www.Fruzenshtein.com Currently interested in Scala, Akka, Spark… Who is who? Alexey Zvolinskiy ~4 years of Scala experience Passing through Functional Programming in Scala Specialization on Coursera @Fruzenshtein
  • 4. What makes Scala so great? 1. Functional programming language* 2. Immutability 3. Type system 4. Collections API 5. Pattern matching 6. Implicit
  • 5. Functional programming language 1. Function is a first class citizen 2. Totality 3. Determinism 4. Purity A => B A1 A2 … An B1 B2 … Bn A => BAi Bi A => BAi Bi A => BAi Bi
  • 6. Immutability 1. Makes a code more predictable 2. Reduces efforts to understand a code 3. Key to thread-safety Books: Java concurrency in practice Effective Java 2nd Edition
  • 7. Type system 1. Static typing 2. Type inference 3. Bounds Map[V, K] List[T1 <: T2] Set[+T]
  • 8. Collections API val numbers = List(1,2,3,4,5,6,7,8,9,10) numbers.filter(_ % 2 == 0) .map(_ * 10) //List(20, 40, 60, 80, 100) filter(n:Int => Boolean) //(n => n % 2 == 0) //(n => n * 10) map(n:Int => Int)
  • 9. Collections API val groupsOfStudents = List( List(("Alex", 65), ("Kate", 87), ("Sam", 98)), List(("Peter", 84), ("Bob", 79), ("Samanta", 71)), List(("Rob", 82), ("Jack", 55), ("Ann", 90)) ) groupsOfStudents.flatMap(students => students) .groupBy(student => student._2 > 75) .get(true).get //List((Kate,87), (Sam,98), (Peter,84), (Bob,79), (Rob,82), (Ann,90))
  • 11. Idea of parallelism How to divide a problem into subproblems? How to use a hardware optimally?
  • 13. Scala parallel collections val from0to100000: Range = 0 until 100000 val list = from0to100000.toList //scala.collection.parallel.immutable.ParSeq[Int] val parList = list.par
  • 14. Some benchmarks val list = from0to100000.toList for (i <- 1 to 10) { val t0 = System.currentTimeMillis() list.filter(isPrime(_)) println(System.currentTimeMillis - t0) } def isPrime(n: Int): Boolean = ! ( (2 until n-1) exists (n % _ == 0) ) val parList = list.par for (i <- 1 to 10) { val t1 = System.currentTimeMillis() parList.filter(isPrime(_)) println(System.currentTimeMillis - t1) } 7106 6467 6315 6275 6478 8732 6543 6296 6299 6286 5130 5106 4649 4568 4580 4446 4447 4437 4290 4476
  • 15. Ok, but what about Spark?!
  • 16. Why distributed computations? single machine (shared memory) Multiple nodes (network) Parallel collections (scala) RDDs (spark) Almost the same API
  • 17. RDD example Spark Spark Spark val tweets: RDD[Tweet] = … tweets.filter( _.contains(“bigdata”) )
  • 18. Latency Numbers from Jeff Dean http://research.google.com/people/jeff/ https://gist.github.com/2841832 Graph and scale by Thomas Lee
  • 19. Computation model memory disk network seconds - days weeks - months weeks - years
  • 20. Scala transformations & actions 1. Transformations are lazy 2. Actions are eager map filter flatMap … reduce collect count … val tweets: RDD[Tweet] = … tweets.filter(_.contains(“bigdata”)) .map(t => (t.author, t.body) val tweets: RDD[Tweet] = … tweets.filter(_.contains(“bigdata”)) .map(t => (t.author, t.body) .collect()
  • 21. Rules of thumb 1. Cache 2. Apply efficiently 3. Avoid shuffling val tweets: RDD[Tweet] = … val cachedTweets = tweets.cache() cachedTweets.filter(_.contains(“USA”)) .map(t => (t.author, t.body) cachedTweets.map(t => (t.author, t.body) .filter(_.contains(“USA”))
  • 22. Shuffling (1, 240) (2, 500) (2, 105) (3, 100) (1, 200) (1, 500) (1, 450) (3, 100) (3, 100) (2, [500, 105]) (1, [240, 200, 500, 450]) (3, [100, 100, 100]) groupByKey() Transaction(id: Int, amount: Int) We want to know how much money spent each client
  • 23. Reduce before group (1, 240) (2, 605) (3, 100) (1, 700) (1, 450) (3, 200) (2, [605]) (1, [240, 700, 450]) (3, [100, 200]) groupByKey() (1, 240) (2, 500) (2, 105) (3, 100) (1, 200) (1, 500) (1, 450) (3, 100) (3, 100) reduceByKey(…)