ZparkIO
Spark wrapped in ZIO
© 2020 DEMANDBASE|SLIDE 2
Senior Data Engineer
Demandbase
Leo Benkel
leobenkel
AGENDA
▪ Scala & Functional Programming
▪ Spark
▪ Future
▪ ZIO
▪ ZparkIO
▪ Installation
▪ Configuration
▪ Spark
▪ Helper functions
▪ From Futures
▪ In production
Scala & Functional Programming
Scala
● Programming language running on the JVM
● Designed with functional programming in mind
● Inspired by Haskell
Functional programming
● Reason in terms of type transformations
● Rooted in category theory, a branch of mathematics
● Pure functions: no side effects
● Immutable data
Monad, Monoid, Functor, Applicative
(m : M[A]).map(f: A => B) : M[B]
● Chain operations without intermediate variables
Monad, Monoid, Functor, Applicative
val m: M[A]
val f: A => B
val g: B => C
val h: C => D
val output: M[D] = m
  .map(f)
  .map(g)
  .map(h)
● Easier to read
● Does not compile if g is mistakenly applied before f
● The compiler is our friend
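To make the abstract M[A] concrete, here is a minimal sketch that uses Option as M. The functions f, g and h below are hypothetical; only their shapes (A => B, B => C, C => D) follow the slide.

```scala
object MapChain {
  // Hypothetical steps; each one changes the type of the value it receives
  val f: Int => String = i => i.toString  // A = Int    => B = String
  val g: String => Int = s => s.length    // B = String => C = Int
  val h: Int => Boolean = n => n > 1      // C = Int    => D = Boolean

  // M = Option: the chain reads top to bottom, with no intermediate variables
  def pipeline(m: Option[Int]): Option[Boolean] =
    m.map(f)
      .map(g)
      .map(h)
}
```

Writing `m.map(g)` first is rejected at compile time: `g` expects a `String`, but the `Option` holds an `Int`.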
Monad, Monoid, Functor, Applicative
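// Inside a for-comprehension every step returns the same M:
// f: A => M[B], g: B => M[C], h: C => M[D]
// (type ascriptions like `a: A <- m` are left out: in Scala 2 they make the
// compiler insert a withFilter call, which many effect types do not provide)
for {
  a <- m    // a: A
  b <- f(a) // b: B
  c <- g(b) // c: C
  d <- h(c) // d: D
} yield { d }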
● The code reads in the same order it executes
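In a for-comprehension, each step after `<-` must return the same wrapper M, so the steps have shapes like A => M[B] rather than the plain A => B used with map. The sketch below uses Option as M; the step functions parse, half and label are hypothetical. It also spells out the flatMap/map chain the compiler generates from the for-comprehension.

```scala
import scala.util.Try

object ForDesugar {
  // Hypothetical steps; each can fail, so each returns M[_] = Option[_]
  def parse(s: String): Option[Int] = Try(s.trim.toInt).toOption
  def half(n: Int): Option[Int] = if (n % 2 == 0) Some(n / 2) else None
  def label(n: Int): Option[String] = if (n >= 0) Some(s"value=$n") else None

  // Reads in the order the steps happen
  def withFor(input: String): Option[String] =
    for {
      a <- parse(input)
      b <- half(a)
      c <- label(b)
    } yield c

  // What the compiler turns the for-comprehension above into
  def desugared(input: String): Option[String] =
    parse(input).flatMap(a => half(a).flatMap(b => label(b).map(c => c)))
}
```

If any step returns None, the remaining steps are skipped and the whole result is None.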
Spark
Spark
● Distributed computing framework
● Dataset[A] exposes functional-style methods such as map, flatMap and filter
● Actions called from a single driver thread block: the driver waits for a job to finish before submitting the next one to the executors
● Transformations are lazy; actions are eager and synchronous
Spark - ETL
1. Load data
2. Transform
3. Aggregate
4. Save
From: https://www.astera.com/type/blog/etl-pipeline-vs-data-pipeline/
Spark - ETL
[Diagram: sources DB1 and DB2; transformations A => B, C => D and (B, C) => E; outputs DB3 and DB4]
Future
Future - the revelation - Spark Summit 2019
● Parallelizing with Apache Spark in Unexpected Ways
○ by Anna Holschuh
Fetching sources one at a time is not efficient
Fetching everything at the same time
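The idea can be sketched with plain `scala.concurrent.Future`. In the talk's setting each Future would wrap a Spark action, so the driver submits several jobs at once; here `load` and the source names are hypothetical stand-ins that just sleep.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

object FetchAll {
  // Hypothetical stand-in for loading one source, e.g. a Spark read followed by an action
  def load(name: String): String = {
    blocking(Thread.sleep(200)) // simulate a slow job that blocks the calling thread
    s"data from $name"
  }

  // One at a time: the second load only starts once the first has finished
  def sequential(): List[String] =
    List("DB1", "DB2").map(load)

  // At the same time: both Futures start as soon as they are created,
  // before the for-comprehension waits on either of them
  def concurrent(): List[String] = {
    val f1 = Future(load("DB1"))
    val f2 = Future(load("DB2"))
    val both = for {
      a <- f1
      b <- f2
    } yield List(a, b)
    Await.result(both, 10.seconds)
  }
}
```

Creating the Futures inside the for-comprehension instead (`a <- Future(load("DB1"))`) would start the second load only after the first completes, which brings back the sequential behaviour.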
Tried on a production project at Demandbase
Future: 20 min
Raw: 40 min
ZIO
ZIO
● https://zio.dev/
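● Wraps synchronous and asynchronous operations in one uniform type
● Lets you map over any effect, whatever it wraps, at the level of the whole program
● Fully lazy: nothing runs until the effect is executed by a runtime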
ZIO components
ZIO[R, E, A]
● R (Environment): what the effect requires in order to run
● E (Error): the type of error it can fail with
● A (Output): the type of value it succeeds with
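A rough way to picture ZIO[R, E, A] is as a function R => Either[E, A] that only runs when you hand it an environment. The sketch below is a deliberately simplified toy model under that assumption, not ZIO's actual implementation; the names Eff, access and Config are hypothetical. The variance annotations and the flatMap type parameters mirror the shape ZIO uses, so reading a value from R and failing with E compose in one for-comprehension.

```scala
object ToyZio {
  // Simplified model of ZIO[R, E, A] (not ZIO's real implementation):
  // a description that, given an environment R, yields an error E or an output A.
  // Nothing executes until `run` is called with an environment.
  final case class Eff[-R, +E, +A](run: R => Either[E, A]) {
    def map[B](f: A => B): Eff[R, E, B] =
      Eff[R, E, B](r => run(r).map(f))

    def flatMap[R1 <: R, E1 >: E, B](f: A => Eff[R1, E1, B]): Eff[R1, E1, B] =
      Eff[R1, E1, B](r => run(r).flatMap(a => f(a).run(r)))
  }

  object Eff {
    def succeed[A](a: A): Eff[Any, Nothing, A] = Eff[Any, Nothing, A](_ => Right(a))
    def fail[E](e: E): Eff[Any, E, Nothing] = Eff[Any, E, Nothing](_ => Left(e))
    // Read something from the environment (the "R" part)
    def access[R, A](f: R => A): Eff[R, Nothing, A] = Eff[R, Nothing, A](r => Right(f(r)))
  }

  // Hypothetical environment
  final case class Config(inputId: Int)

  val program: Eff[Config, String, Int] =
    for {
      id  <- Eff.access[Config, Int](_.inputId)
      out <- if (id > 0) Eff.succeed(id * 2) else Eff.fail(s"bad id: $id")
    } yield out

  // Laziness: defining `counted` does not touch the counter, running it does
  var runs = 0
  val counted: Eff[Any, Nothing, Int] = Eff[Any, Nothing, Int] { _ =>
    runs += 1
    Right(runs)
  }
}
```

Because the requirement `Config` is part of the type, the compiler refuses to run `program` without one.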
Nobody likes Future
● Everything is wrapped in ZIO (sync and async)
● Async with fibers: just call .fork
● Cancellable!
● Easy retries!
● Easy timeouts!
● Simpler methods with fewer arguments, thanks to the environment
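In ZIO a retry is a one-liner such as `effect.retry(Schedule.recurs(3))`. To show what that saves you from writing by hand, here is a plain-Scala sketch of a retry helper; `Retry.retry` is hypothetical and uses a by-name parameter as a stand-in for a lazy effect description, so the block can be re-evaluated on each attempt.

```scala
import scala.util.{Failure, Success, Try}

object Retry {
  // Re-run `thunk` until it succeeds or `maxRetries` extra attempts are used up.
  // `thunk` is by-name, so every attempt evaluates it again from scratch.
  @annotation.tailrec
  def retry[A](maxRetries: Int)(thunk: => A): Try[A] =
    Try(thunk) match {
      case s @ Success(_)               => s
      case Failure(_) if maxRetries > 0 => retry(maxRetries - 1)(thunk)
      case f @ Failure(_)               => f
    }
}
```

The hand-written version has no backoff, no fiber interruption and no composition with other policies, which is what ZIO's Schedule adds on top.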
I have heard people like memes
ZparkIO
Where to find it?
https://github.com/leobenkel/ZparkIO
What is it?
● Boilerplate for getting started with ZIO and Spark
● Lots of helper functions to make the code look smoother
● Easier to read the code
● Easier to implement retries
● Easier to implement timeout
● Easier to parallelize tasks
Example use cases
https://github.com/leobenkel/ZparkIO/tree/master/ProjectExample/src/main/scala/com/leobenkel/zparkioProjectExample
https://github.com/leobenkel/ZparkIO/tree/master/ProjectExample_MoreComplex/src/main/scala/com/leobenkel/zparkioProfileExampleMoreComplex
How to use?
object Main extends Application {}
Unit test the entire application
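// ScalaTest 3.0.x (in 3.1+ the class is org.scalatest.freespec.AnyFreeSpec)
import org.scalatest.FreeSpec
// unsafeRunSync returns a zio.Exit, whose cases are Exit.Success and Exit.Failure
import zio.Exit.{Failure, Success}

// TestWithSpark: test helper trait used by the project (its import is not shown on the slide)
class ApplicationTest extends FreeSpec with TestWithSpark {
  "Full application" - {
    "Run" in {
      TestApp.unsafeRunSync(TestApp.run("--spark-foo" :: "abc" :: Nil)) match {
        case Success(value) =>
          println(s"Read: $value")
          assertResult(0)(value) // run returns the process exit code
        case Failure(cause) => fail(cause.prettyPrint)
      }
    }
  }
}

object TestApp extends Application {}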
Application
trait Application extends ZparkioApp[Arguments, RuntimeEnv, String] {
  override def runApp(): ZIO[RuntimeEnv, Throwable, String] = {
    for {
      ...
    } yield { output }
  }

  override def makeEnvironment(
    cliService: Arguments,
    sparkService: SparkModule.Service
  ): RuntimeEnv = {
    RuntimeEnv(cliService, sparkService)
  }

  override def makeSparkBuilder: SparkModule.Builder[Arguments] = SparkBuilder

  override def makeCliBuilder: CommandLineArguments.Builder[Arguments] =
    new CommandLineArguments.Builder[Arguments] {
      override protected def createCli(args: List[String]): Arguments = {
        Arguments(args)
      }
    }
}
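The Application trait only fills in hooks (command-line parsing, environment construction, the program itself); the base class wires them together, which is why both `object Main extends Application {}` and `object TestApp extends Application {}` are complete programs. The sketch below shows one plausible way such wiring can look in plain Scala. It is a hypothetical simplification, not ZparkioApp's actual code: MiniApp, Args and Env are made-up names, and errors are modelled with Either instead of ZIO.

```scala
object AppWiring {
  // Hypothetical simplification of the idea: C = parsed command line,
  // ENV = runtime environment, OUTPUT = result of the program.
  trait MiniApp[C, ENV, OUTPUT] {
    protected def makeCli(args: List[String]): C
    protected def makeEnvironment(cli: C): ENV
    protected def runApp(env: ENV): Either[Throwable, OUTPUT]

    // Wiring written once in the base trait: parse, build env, run, map to an exit code
    final def run(args: List[String]): Int =
      runApp(makeEnvironment(makeCli(args))) match {
        case Right(_) => 0
        case Left(_)  => 1
      }
  }

  // Hypothetical concrete application
  final case class Args(raw: List[String])
  final case class Env(args: Args, appName: String)

  trait Application extends MiniApp[Args, Env, String] {
    protected def makeCli(args: List[String]): Args = Args(args)
    protected def makeEnvironment(cli: Args): Env = Env(cli, "Zparkio_test")
    protected def runApp(env: Env): Either[Throwable, String] =
      if (env.args.raw.nonEmpty) Right(s"${env.appName}: ${env.args.raw.mkString(" ")}")
      else Left(new IllegalArgumentException("no arguments"))
  }

  object TestApp extends Application
}
```

Keeping `run` final in the base trait means tests exercise exactly the same wiring as production.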
ZparkioApp
ZparkioApp[C <: CommandLineArguments.Service, ENV <: ZparkioApp.ZPEnv[C], OUTPUT]
● C: the command-line arguments class
● ENV: the environment for the zio.Runtime
● OUTPUT: the output of the run function
RuntimeEnv
case class RuntimeEnv(
  cliService: Arguments,
  sparkService: SparkModule.Service
) extends System.Live
    with Console.Live
    with Clock.Live
    with Random.Live
    with Blocking.Live
    with CommandLineArguments[Arguments]
    with Logger
    with FileIO.Live
    with SparkModule {
  lazy final override val cli: Arguments = cliService
  lazy final override val spark: SparkModule.Service = sparkService
  lazy final override val log: Logger.Service = new Log()
}
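RuntimeEnv builds the whole environment by mixing module traits into one case class and overriding their abstract service members. The same pattern can be sketched in plain Scala; the Logger and CommandLine modules and BufferLogger below are hypothetical stand-ins, not ZIO's or ZparkIO's real traits.

```scala
object ModulePattern {
  // Hypothetical "module" traits: each exposes one service through an abstract member
  trait Logger { def log: Logger.Service }
  object Logger {
    trait Service { def info(msg: String): Unit }
  }

  trait CommandLine { def cli: List[String] }

  // Hypothetical in-memory logger, handy in tests
  final class BufferLogger extends Logger.Service {
    val lines = scala.collection.mutable.ListBuffer.empty[String]
    def info(msg: String): Unit = lines += msg
  }

  // One environment value that mixes in every module, in the style of RuntimeEnv
  final case class Env(args: List[String], logger: BufferLogger)
      extends Logger with CommandLine {
    lazy val log: Logger.Service = logger
    lazy val cli: List[String] = args
  }

  // A function that declares only the modules it actually needs
  def program(env: Logger with CommandLine): Int = {
    env.log.info(s"got ${env.cli.size} arguments")
    env.cli.size
  }
}
```

Because `program` asks for `Logger with CommandLine` rather than the concrete Env, a test can hand it any value that provides those two modules, including fakes.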
Configurations
Configurations
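case class Arguments(input: List[String])
    extends ScallopConf(input) with CommandLineArguments.Service {
  val inputId: ScallopOption[Int] = opt[Int](
    default = Some(10),
    required = false,
    noshort = true
  )

  // Assumed addition (not on the original slide): the --spark-foo flag that the
  // test passes and SparkBuilder reads via arguments.sparkFoo()
  val sparkFoo: ScallopOption[String] = opt[String](
    name = "spark-foo",
    required = true,
    noshort = true
  )
}

object Arguments {
  def apply[A](f: Arguments => A): ZIO[CommandLineArguments[Arguments], Throwable, A] = {
    CommandLineArguments.get[Arguments](f)
  }
}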
Using Configurations
for {
  ...
  a <- Arguments(_.inputId())
  ...
} yield { ??? }
● No need to pass Arguments to all your methods
● Always accessible through the ZIO environment
ZIO(spark)
Building Spark
object SparkBuilder extends SparkModule.Builder[Arguments] {
  override protected final lazy val appName: String = "Zparkio_test"

  override protected def updateConfig(
    sparkBuilder: SparkSession.Builder,
    arguments: Arguments
  ): SparkSession.Builder = {
    sparkBuilder.config("spark.foo.bar", arguments.sparkFoo())
  }
}
Fetching SparkSession
for {
  ...
  spark <- SparkModule()
  ...
} yield { ??? }
● No need to pass SparkSession to all your methods
● Always accessible through the ZIO environment
Helper functions
Making Datasets
for {
  ...
  outputs <- ZDS { spark =>
    import spark.implicits._
    inputDS.map(_.toOutput)
  }
  ...
} yield { ??? }
● Lots of helper functions
Making Datasets
for {
  ...
  outputs: Dataset[CaseClass] <- ZDS(
    CaseClass(a = 1, b = "one"),
    CaseClass(a = 2, b = "two"),
    CaseClass(a = 3, b = "three")
  )
  ...
} yield { ??? }
● Helpful in tests
● Turns a Seq into a Dataset
Transforming Datasets
for {
  ...
  outputs: Dataset[OutputCaseClass] <- ZDS(
    CaseClass(a = 1, b = "one")
  ).zMap {
    case CaseClass(a, b) => Task(OutputCaseClass(a + b.length))
  }
  ...
} yield { ??? }
● No need to write nested _.map(_.map(???)) anymore
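The nested map appears whenever a collection sits inside an effect: one map for the effect, a second one for the elements. The sketch below reproduces the problem with `Future[List[A]]` standing in for a ZIO that produces a Dataset, and a hypothetical `mapInner` helper in the spirit of zMap. It is not ZparkIO's code, and unlike zMap it takes a plain function rather than one returning a Task.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object InnerMap {
  final case class CaseClass(a: Int, b: String)
  final case class OutputCaseClass(c: Int)

  // Hypothetical helper in the spirit of zMap (not ZparkIO's code): map over the
  // elements of the collection that sits inside the effect, in a single call.
  implicit class InnerMapOps[A](fa: Future[List[A]]) {
    def mapInner[B](f: A => B): Future[List[B]] = fa.map(_.map(f))
  }

  val inputs: Future[List[CaseClass]] =
    Future.successful(List(CaseClass(1, "one"), CaseClass(2, "two")))

  // Without the helper: one map for the Future, a second one for the List
  val nested: Future[List[OutputCaseClass]] =
    inputs.map(_.map { case CaseClass(a, b) => OutputCaseClass(a + b.length) })

  // With the helper: a single call
  val flat: Future[List[OutputCaseClass]] =
    inputs.mapInner { case CaseClass(a, b) => OutputCaseClass(a + b.length) }
}
```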
Broadcasting
for {
  ...
  authorIds: Broadcast[Array[Int]] <- ZDS.broadcast { spark =>
    import spark.implicits._
    posts.map(_.authorId).distinct.collect
  }
  ...
} yield { ??? }
● Broadcast easily
From Futures
From Future
import com.leobenkel.zparkio.ZFuture._
val z = (Future(???)(_)).toZIO
● https://github.com/leobenkel/ZparkIO/blob/master/Library/src/main/scala/com/leobenkel/zparkio/ZFuture.scala
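The `(Future(???)(_))` shape on the slide leaves Future's implicit ExecutionContext parameter open, so the value is a function `ExecutionContext => Future[A]` rather than a running Future. That is the same shape ZIO's `ZIO.fromFuture` accepts. The sketch below only illustrates this shape in plain Scala; `runWith` is a hypothetical stand-in for whatever the wrapper does, not the code in ZFuture.scala.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FutureThunk {
  // `Future(body)(_)` expands to `ec => Future(body)(ec)`: a function, not a started Future.
  // The computation only begins once someone supplies an ExecutionContext.
  val makeFuture: ExecutionContext => Future[Int] = Future(21 * 2)(_)

  // Hypothetical stand-in for a wrapper such as `.toZIO`: it chooses the
  // ExecutionContext itself, then waits for the result.
  def runWith[A](mk: ExecutionContext => Future[A], ec: ExecutionContext): A =
    Await.result(mk(ec), 5.seconds)
}
```

Leaving the ExecutionContext open lets the wrapper decide where the Future runs, instead of inheriting whatever implicit context was in scope where it was written.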
In production
Tried on a production project at Demandbase
ZIO: 2h
Future: 3h
● Faster
● Fewer errors, thanks to easy retries
● Cheaper, thanks to timeout limits
● Better error logs, thanks to fiber logs
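A timeout limit bounds how long a slow job can hold the cluster. The plain-Scala sketch below shows the closest standard-library equivalent, `Await.result` with a deadline; `slowJob` is a hypothetical stand-in for a long Spark action. One difference matters for cost: ZIO's `timeout` interrupts the running effect, whereas this version only stops waiting and leaves the Future running in the background.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.Try

object Timeout {
  // Hypothetical slow job standing in for a long-running Spark action
  def slowJob(): Future[Int] = Future {
    blocking(Thread.sleep(2000))
    42
  }

  // Give up waiting after `limit`. This does NOT cancel the underlying Future:
  // the job keeps running in the background after the TimeoutException.
  def withTimeout[A](f: Future[A], limit: FiniteDuration): Try[A] =
    Try(Await.result(f, limit))
}
```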
Fetching everything at the same time
What next?
What next?
● https://github.com/leobenkel/ZparkIO/issues
● A Giter8 template to make starting a project easier
● Builds for all Spark versions
THANK YOU
Questions?

More Related Content

What's hot

What's hot (7)

Sprint 20
Sprint 20Sprint 20
Sprint 20
 
Sprint 39 review
Sprint 39 reviewSprint 39 review
Sprint 39 review
 
Scripting Your Qt Application
Scripting Your Qt ApplicationScripting Your Qt Application
Scripting Your Qt Application
 
Copy Your Favourite Nokia App with Qt
Copy Your Favourite Nokia App with QtCopy Your Favourite Nokia App with Qt
Copy Your Favourite Nokia App with Qt
 
Thinking functional-in-scala
Thinking functional-in-scalaThinking functional-in-scala
Thinking functional-in-scala
 
ScilabTEC 2015 - Embedded Solutions
ScilabTEC 2015 - Embedded SolutionsScilabTEC 2015 - Embedded Solutions
ScilabTEC 2015 - Embedded Solutions
 
Open gl
Open glOpen gl
Open gl
 

Similar to 2020 03-26 - meet up - zparkio

Google Developer Fest 2010
Google Developer Fest 2010Google Developer Fest 2010
Google Developer Fest 2010
Chris Ramsdale
 
Performance measurement and tuning
Performance measurement and tuningPerformance measurement and tuning
Performance measurement and tuning
AOE
 
Google io bootcamp_2010
Google io bootcamp_2010Google io bootcamp_2010
Google io bootcamp_2010
Chris Ramsdale
 

Similar to 2020 03-26 - meet up - zparkio (20)

Serverless and React
Serverless and ReactServerless and React
Serverless and React
 
Native Java with GraalVM
Native Java with GraalVMNative Java with GraalVM
Native Java with GraalVM
 
Yannis Zarkadas. Enterprise data science workflows on kubeflow
Yannis Zarkadas. Enterprise data science workflows on kubeflowYannis Zarkadas. Enterprise data science workflows on kubeflow
Yannis Zarkadas. Enterprise data science workflows on kubeflow
 
Yannis Zarkadas. Stefano Fioravanzo. Enterprise data science workflows on kub...
Yannis Zarkadas. Stefano Fioravanzo. Enterprise data science workflows on kub...Yannis Zarkadas. Stefano Fioravanzo. Enterprise data science workflows on kub...
Yannis Zarkadas. Stefano Fioravanzo. Enterprise data science workflows on kub...
 
Google Developer Fest 2010
Google Developer Fest 2010Google Developer Fest 2010
Google Developer Fest 2010
 
Middy.js - A powerful Node.js middleware framework for your lambdas​
Middy.js - A powerful Node.js middleware framework for your lambdas​ Middy.js - A powerful Node.js middleware framework for your lambdas​
Middy.js - A powerful Node.js middleware framework for your lambdas​
 
Improving Apache Spark Downscaling
 Improving Apache Spark Downscaling Improving Apache Spark Downscaling
Improving Apache Spark Downscaling
 
Scilab Modelica conference 20150921
Scilab Modelica conference 20150921Scilab Modelica conference 20150921
Scilab Modelica conference 20150921
 
Performance measurement and tuning
Performance measurement and tuningPerformance measurement and tuning
Performance measurement and tuning
 
Remix & GraphQL: A match made in heaven with type-safety DX
Remix & GraphQL:  A match made in heaven with type-safety DXRemix & GraphQL:  A match made in heaven with type-safety DX
Remix & GraphQL: A match made in heaven with type-safety DX
 
Angular for Java Enterprise Developers: Oracle Code One 2018
Angular for Java Enterprise Developers: Oracle Code One 2018Angular for Java Enterprise Developers: Oracle Code One 2018
Angular for Java Enterprise Developers: Oracle Code One 2018
 
JBoss World 2010
JBoss World 2010JBoss World 2010
JBoss World 2010
 
High performance web programming with C++14
High performance web programming with C++14High performance web programming with C++14
High performance web programming with C++14
 
Building Web Apps Sanely - EclipseCon 2010
Building Web Apps Sanely - EclipseCon 2010Building Web Apps Sanely - EclipseCon 2010
Building Web Apps Sanely - EclipseCon 2010
 
Google io bootcamp_2010
Google io bootcamp_2010Google io bootcamp_2010
Google io bootcamp_2010
 
Getting start Java EE Action-Based MVC with Thymeleaf
Getting start Java EE Action-Based MVC with ThymeleafGetting start Java EE Action-Based MVC with Thymeleaf
Getting start Java EE Action-Based MVC with Thymeleaf
 
DevFest 2022 - Skaffold 2 Deep Dive Taipei.pdf
DevFest 2022 - Skaffold 2 Deep Dive Taipei.pdfDevFest 2022 - Skaffold 2 Deep Dive Taipei.pdf
DevFest 2022 - Skaffold 2 Deep Dive Taipei.pdf
 
Gain more freedom when migrating from Camunda 7 to 8.pdf
Gain more freedom when migrating from Camunda 7 to 8.pdfGain more freedom when migrating from Camunda 7 to 8.pdf
Gain more freedom when migrating from Camunda 7 to 8.pdf
 
Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...
Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...
Anais Dotis-Georgiou & Faith Chikwekwe [InfluxData] | Top 10 Hurdles for Flux...
 
Maximize the power of OSGi in AEM
Maximize the power of OSGi in AEM Maximize the power of OSGi in AEM
Maximize the power of OSGi in AEM
 

Recently uploaded

CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
anilsa9823
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Recently uploaded (20)

W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 

2020 03-26 - meet up - zparkio

  • 2. © 2020 DEMANDBASE|SLIDE 2 Senior Data Engineer Demandbase Leo Benkel leobenkel
  • 3. AGENDA ▪ Scala & Functional Programming ▪ Spark ▪ Future ▪ ZIO ▪ ZparkIO ▪ Installation ▪ Configuration ▪ Spark ▪ Helper functions ▪ From Futures ▪ In production
  • 4. Scala & Functional Programming
  • 5. © 2020 DEMANDBASE|SLIDE 5 Scala ● Programming language based on the JVM ● Built with Functional Programming in mind ● Inspired by Haskell
  • 6. © 2020 DEMANDBASE|SLIDE 6 Functional programming ● Reason in term of Type transformation ● Category Theory in Mathematics ● Pure function. No side effects ● Immutable
  • 7. © 2020 DEMANDBASE|SLIDE 7 Monad, Monoid, Functor, Applicative (m : M[A]).map(f: A => B) : M[B] ● Being able to chain operations without intermediate variables
  • 8. © 2020 DEMANDBASE|SLIDE 8 Monad, Monoid, Functor, Applicative val m: M[A] val f: A => B val g: B => C val h: C => D val output: M[D] = m .map(f) .map(g) .map(h) ● Easier to read ● Would not compile if used g before f as a mistake ● Compiler is our friend
  • 9. © 2020 DEMANDBASE|SLIDE 9 Monad, Monoid, Functor, Applicative for { a:A <- m b:B <- f(a) c:C <- g(b) d:D <- h(c) } yield { d } ● Can read the code in the same order it is happening
  • 10. Spark
  • 11. © 2020 DEMANDBASE|SLIDE 11 Spark ● Distributed computing framework ● Dataset[A] has methods related to Functional Programming, you can use map ● Driver will wait until jobs are completed to submit new ones to Executors ● Each operation is semi-lazy and synchronous
  • 12. © 2020 DEMANDBASE|SLIDE 12 Spark - ETL 1. Load data 2. Transform 3. Aggregate 4. Save From: https://www.astera.com/type/blog/etl-pipeline-vs-data-pipeline/
  • 13. © 2020 DEMANDBASE|SLIDE 13 Spark - ETL DB1 DB2 A => B C => D(B,C) => E BD3 BD4
  • 15. © 2020 DEMANDBASE|SLIDE 15 Future - the revelation - Spark Summit 2019 ● Parallelizing with Apache Spark in Unexpected Ways ○ from Anna Holschuh
  • 16. © 2020 DEMANDBASE|SLIDE 16 Fetching sources one at a time is not efficient
  • 17. © 2020 DEMANDBASE|SLIDE 17 Fetching everything at the same time
  • 18. © 2020 DEMANDBASE|SLIDE 18 Tried on production project at Demandbase Future 20min Raw 40min
  • 19. ZIO
  • 20. © 2020 DEMANDBASE|SLIDE 20 ZIO ● https://zio.dev/ ● Wrap sync and async operations smoothly ● Can use map across anything at a macro level ● Fully lazy
  • 21. © 2020 DEMANDBASE|SLIDE 21 ZIO component ZIO[R, E, A] ● Environment: Requirements to execute this Task ● Error ● Output
  • 22. © 2020 DEMANDBASE|SLIDE 22 Nobody likes Future ● Everything is wrapped in ZIO ( sync and async ) ● Async with Fibers, just call .fork ● Cancellable ! ● Easy retries ! ● Easy timeout ! ● Simpler methods with less arguments because of Environment
  • 23. © 2020 DEMANDBASE|SLIDE 23 I have heard, people like memes
  • 25. © 2020 DEMANDBASE|SLIDE 25 https://github.com/leobenkel/ZparkIO Where to find it?
  • 26. © 2020 DEMANDBASE|SLIDE 26 https://github.com/leobenkel/ZparkIO Where to find it?
  • 27. © 2020 DEMANDBASE|SLIDE 27 What is it? ● Boilerplate to start with ZIO and Spark ● Lots of helper functions to make the code looks smoother ● Easier to read the code ● Easier to implement retries ● Easier to implement timeout ● Easier to parallelize tasks
  • 28. © 2020 DEMANDBASE|SLIDE 28 Example use cases https://github.com/leobenkel/ZparkIO/tree/master/ProjectExample/src/main/ scala/com/leobenkel/zparkioProjectExample https://github.com/leobenkel/ZparkIO/tree/master/ProjectExample_MoreCo mplex/src/main/scala/com/leobenkel/zparkioProfileExampleMoreComplex
  • 29. © 2020 DEMANDBASE|SLIDE 29 How to use? object Main extends Application {}
  • 30. © 2020 DEMANDBASE|SLIDE 30 Unit test the entire application class ApplicationTest extends FreeSpec with TestWithSpark { "Full application" - { "Run" in { TestApp.unsafeRunSync(TestApp.run("--spark-foo" :: "abc" :: Nil)) match { case Success(value) => println(s"Read: $value") assertResult(0)(value) case Failure(cause) => fail(cause.prettyPrint) } } } } object TestApp extends Application {}
  • 31. © 2020 DEMANDBASE|SLIDE 31 Application trait Application extends ZparkioApp[Arguments, RuntimeEnv, String] { override def runApp(): ZIO[RuntimeEnv, Throwable, String] = { for { ... } yield { output } } override def makeEnvironment( cliService: Arguments, sparkService: SparkModule.Service ): RuntimeEnv = { RuntimeEnv(cliService, sparkService) } override def makeSparkBuilder: SparkModule.Builder[Arguments] = SparkBuilder override def makeCliBuilder: CommandLineArguments.Builder[Arguments] = new CommandLineArguments.Builder[Arguments] { override protected def createCli(args: List[String]): Arguments = { Arguments(args) } } }
  • 32. © 2020 DEMANDBASE|SLIDE 32 Application trait Application extends ZparkioApp[Arguments, RuntimeEnv, String] { override def runApp(): ZIO[RuntimeEnv, Throwable, String] = { for { ... } yield { output } } override def makeEnvironment( cliService: Arguments, sparkService: SparkModule.Service ): RuntimeEnv = { RuntimeEnv(cliService, sparkService) } override def makeSparkBuilder: SparkModule.Builder[Arguments] = SparkBuilder override def makeCliBuilder: CommandLineArguments.Builder[Arguments] = new CommandLineArguments.Builder[Arguments] { override protected def createCli(args: List[String]): Arguments = { Arguments(args) } } }
  • 33. © 2020 DEMANDBASE|SLIDE 33 Application trait Application extends ZparkioApp[Arguments, RuntimeEnv, String] { override def runApp(): ZIO[RuntimeEnv, Throwable, String] = { for { ... } yield { output } } override def makeEnvironment( cliService: Arguments, sparkService: SparkModule.Service ): RuntimeEnv = { RuntimeEnv(cliService, sparkService) } override def makeSparkBuilder: SparkModule.Builder[Arguments] = SparkBuilder override def makeCliBuilder: CommandLineArguments.Builder[Arguments] = new CommandLineArguments.Builder[Arguments] { override protected def createCli(args: List[String]): Arguments = { Arguments(args) } } }
  • 34. © 2020 DEMANDBASE|SLIDE 34 Application trait Application extends ZparkioApp[Arguments, RuntimeEnv, String] { override def runApp(): ZIO[RuntimeEnv, Throwable, String] = { for { ... } yield { output } } override def makeEnvironment( cliService: Arguments, sparkService: SparkModule.Service ): RuntimeEnv = { RuntimeEnv(cliService, sparkService) } override def makeSparkBuilder: SparkModule.Builder[Arguments] = SparkBuilder override def makeCliBuilder: CommandLineArguments.Builder[Arguments] = new CommandLineArguments.Builder[Arguments] { override protected def createCli(args: List[String]): Arguments = { Arguments(args) } } }
  • 35. © 2020 DEMANDBASE|SLIDE 35 Application trait Application extends ZparkioApp[Arguments, RuntimeEnv, String] { override def runApp(): ZIO[RuntimeEnv, Throwable, String] = { for { ... } yield { output } } override def makeEnvironment( cliService: Arguments, sparkService: SparkModule.Service ): RuntimeEnv = { RuntimeEnv(cliService, sparkService) } override def makeSparkBuilder: SparkModule.Builder[Arguments] = SparkBuilder override def makeCliBuilder: CommandLineArguments.Builder[Arguments] = new CommandLineArguments.Builder[Arguments] { override protected def createCli(args: List[String]): Arguments = { Arguments(args) } } }
  • 36. © 2020 DEMANDBASE|SLIDE 36 ZparkioApp ZparkioApp[C <: CommandLineArguments.Service, ENV <: ZparkioApp.ZPEnv[C], OUTPUT] ● Command line input class ● Environment for the zio.Runtime ● Output of the run function
  • 37. © 2020 DEMANDBASE|SLIDE 37 RuntimeEnv case class RuntimeEnv( cliService: Arguments, sparkService: SparkModule.Service ) extends System.Live with Console.Live with Clock.Live with Random.Live with Blocking.Live with CommandLineArguments[Arguments] with Logger with FileIO.Live with SparkModule { lazy final override val cli: Arguments = cliService lazy final override val spark: SparkModule.Service = sparkService lazy final override val log: Logger.Service = new Log() }
Configurations
case class Arguments(input: List[String])
  extends ScallopConf(input)
    with CommandLineArguments.Service {
  val inputId: ScallopOption[Int] = opt[Int](
    default = Some(10),
    required = false,
    noshort = true
  )
}
object Arguments {
  def apply[A](f: Arguments => A): ZIO[CommandLineArguments[Arguments], Throwable, A] = {
    CommandLineArguments.get[Arguments](f)
  }
}
Using Configurations
for {
  ...
  a <- Arguments(_.inputId())
  ...
} yield { ??? }
● No need to pass Arguments to all your methods
● Always accessible through the ZIO environment
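The reason no method needs an `Arguments` parameter is that each step is a computation that *requires* an environment, and the environment is supplied once, at the end. A minimal Reader-style sketch of that idea, without ZIO: `Need`, `CliArgs`, `Cli` and `program` are hypothetical names, and `Cli(_.inputId)` only imitates the shape of `Arguments(_.inputId())`.

```scala
// A minimal Reader-style effect: a computation that needs an environment R.
final case class Need[R, A](run: R => A) {
  def map[B](f: A => B): Need[R, B] = Need(r => f(run(r)))
  def flatMap[B](f: A => Need[R, B]): Need[R, B] = Need(r => f(run(r)).run(r))
}

// Hypothetical parsed arguments (the ScallopConf subclass plays this role in ZparkIO).
final case class CliArgs(inputId: Int, outputPath: String)

// Accessor in the spirit of `Arguments(_.inputId())`: read one field from the environment.
object Cli {
  def apply[A](f: CliArgs => A): Need[CliArgs, A] = Need(f)
}

// No step takes CliArgs as a parameter; both read it from the environment.
val program: Need[CliArgs, String] = for {
  id   <- Cli(_.inputId)
  path <- Cli(_.outputPath)
} yield s"$path/part-$id"
```

Running `program.run(CliArgs(10, "/tmp/out"))` supplies the arguments exactly once and yields `"/tmp/out/part-10"`; ZIO's environment generalizes this by also carrying errors, concurrency and several services at once.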
Building Spark
object SparkBuilder extends SparkModule.Builder[Arguments] {
  override protected final lazy val appName: String = "Zparkio_test"
  override protected def updateConfig(
    sparkBuilder: SparkSession.Builder,
    arguments: Arguments
  ): SparkSession.Builder = {
    sparkBuilder.config("spark.foo.bar", arguments.sparkFoo())
  }
}
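`SparkBuilder` reads as a template: the library creates the builder, lets the application adjust it through `updateConfig` using the parsed arguments, and then builds the session. The sketch below reproduces that hook without Spark, using an immutable, hypothetical `ConfBuilder` in place of `SparkSession.Builder`; `BaseBuilder`, `Args` and `MySparkBuilder` are illustrative names, and the real `SparkModule.Builder` may order its steps differently.

```scala
// Hypothetical, immutable stand-in for SparkSession.Builder.
final case class ConfBuilder(settings: Map[String, String] = Map.empty) {
  def appName(name: String): ConfBuilder = config("spark.app.name", name)
  def config(key: String, value: String): ConfBuilder =
    copy(settings = settings + (key -> value))
}

// Hypothetical arguments; on the slide this is a ScallopOption read with sparkFoo().
final case class Args(sparkFoo: String)

// Template in the spirit of SparkModule.Builder[C]: the base class owns the build
// sequence, subclasses only supply the name and any extra configuration.
abstract class BaseBuilder[C] {
  protected def appName: String
  protected def updateConfig(builder: ConfBuilder, arguments: C): ConfBuilder = builder

  final def build(arguments: C): Map[String, String] =
    updateConfig(ConfBuilder().appName(appName), arguments).settings
}

object MySparkBuilder extends BaseBuilder[Args] {
  override protected final lazy val appName: String = "Zparkio_test"
  override protected def updateConfig(builder: ConfBuilder, arguments: Args): ConfBuilder =
    builder.config("spark.foo.bar", arguments.sparkFoo)
}
```

Keeping `build` final means every application gets the same construction sequence, while the per-job settings stay in one small override.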
Fetching SparkSession
for {
  ...
  spark <- SparkModule()
  ...
} yield { ??? }
● No need to pass SparkSession to all your methods
● Always accessible through the ZIO environment
Making Datasets
for {
  ...
  outputs <- ZDS { spark =>
    import spark.implicits._
    inputDS.map(_.toOutput)
  }
  ...
} yield { ??? }
● Lots of helper functions
Making Datasets
for {
  ...
  outputs: Dataset[CaseClass] <- ZDS(
    CaseClass(a = 1, b = "one"),
    CaseClass(a = 2, b = "two"),
    CaseClass(a = 3, b = "three")
  )
  ...
} yield { ??? }
● Helpful in tests
● Turns a Seq into a Dataset
Transforming Datasets
for {
  ...
  outputs: Dataset[OutputCaseClass] <- ZDS(
    CaseClass(a = 1, b = "one")
  ).zMap { case CaseClass(a, b) =>
    Task(OutputCaseClass(a + b.length))
  }
  ...
} yield { ??? }
● No need to write _.map(_.map(???)) anymore
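`zMap` removes one level of nesting: the Dataset lives inside an effect, and the per-row function itself returns an effect. The same idea can be sketched with standard-library Futures, with a `List` standing in for the Dataset. `ZMapSyntax` and `zMapF` are hypothetical names, not part of ZparkIO, and the real `zMap` may run the per-row effects differently.

```scala
import scala.concurrent.{ExecutionContext, Future}

final case class CaseClass(a: Int, b: String)
final case class OutputCaseClass(value: Int)

object ZMapSyntax {
  // Hypothetical helper: the collection is inside a Future, and the per-element
  // function also returns a Future. The helper hides the flatMap + traverse plumbing.
  implicit final class ZMapOps[A](fa: Future[List[A]]) {
    def zMapF[B](f: A => Future[B])(implicit ec: ExecutionContext): Future[List[B]] =
      fa.flatMap(list => Future.traverse(list)(f))
  }
}

import ZMapSyntax._

val outputs: Future[List[OutputCaseClass]] =
  Future
    .successful(List(CaseClass(a = 1, b = "one"), CaseClass(a = 2, b = "two")))
    .zMapF { case CaseClass(a, b) => Future.successful(OutputCaseClass(a + b.length)) }(
      ExecutionContext.global
    )
```

Once awaited, `outputs` holds `List(OutputCaseClass(4), OutputCaseClass(5))`; the call site reads as a single transformation instead of a map nested inside a map.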
Broadcasting
for {
  ...
  authorIds: Broadcast[Array[Int]] <- ZDS.broadcast { spark =>
    import spark.implicits._
    posts.map(_.authorId).distinct.collect
  }
  ...
} yield { ??? }
● Broadcast easily
From Future
import com.leobenkel.zparkio.ZFuture._
val z = (Future(???)(_)).toZIO
● https://github.com/leobenkel/ZparkIO/blob/master/Library/src/main/scala/com/leobenkel/zparkio/ZFuture.scala
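The odd-looking `(Future(???)(_))` is placeholder syntax for `ec => Future(???)(ec)`: leaving the `ExecutionContext` argument open turns an eagerly started `Future` into a function of type `ExecutionContext => Future[A]`, so nothing runs until an executor is supplied. ZIO 1.x's `ZIO.fromFuture` takes a function of exactly this shape, so `toZIO` plausibly builds on it (not verified here). The sketch below shows only the deferral, with a hypothetical `Deferred` wrapper in place of ZIO.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical wrapper: holds the recipe for a Future, not a running Future.
final case class Deferred[A](make: ExecutionContext => Future[A]) {
  // Each call builds and starts a brand-new Future, so the work can be re-run (e.g. retried).
  def unsafeRun(timeout: FiniteDuration)(implicit ec: ExecutionContext): A =
    Await.result(make(ec), timeout)
}

val starts = new AtomicInteger(0)

// `Future { ... }(_)` expands to `ec => Future { ... }(ec)`: defining it starts nothing.
val deferred: Deferred[Int] = Deferred(Future { starts.incrementAndGet(); 42 }(_))
```

Running `deferred` twice executes the body twice, which is what makes retries possible; a plain `Future` value would only ever run once.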
Tried on a production project at Demandbase
ZIO: 2h
Future: 3h
● Faster
● Fewer errors because of easy retries
● Cheaper because of timeout limits
● Better error logs because of Fiber logs
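The "easy retry" point follows from effects being re-executable descriptions; ZIO expresses it with operators such as `retry` and `timeout`. A started `Future` cannot be re-run, so a standard-library retry has to take a factory instead. A minimal sketch, where `retry`, `calls` and `flakyCall` are hypothetical names:

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Re-run `make` until it succeeds or the attempts run out; the last failure propagates.
// A factory (() => Future[A]) is required because a Future cannot be restarted.
def retry[A](attempts: Int)(make: () => Future[A])(implicit ec: ExecutionContext): Future[A] =
  make().recoverWith {
    case _ if attempts > 1 => retry(attempts - 1)(make)
  }

// Hypothetical flaky source: fails on the first two calls, then answers.
val calls = new AtomicInteger(0)
def flakyCall()(implicit ec: ExecutionContext): Future[String] = Future {
  if (calls.incrementAndGet() < 3) throw new RuntimeException("transient failure")
  else "payload"
}

implicit val ec: ExecutionContext = ExecutionContext.global
// The caller-side Await timeout is the only time limit in this stdlib version.
val result: String = Await.result(retry(5)(() => flakyCall()), 5.seconds)
```

With five attempts allowed, the third call succeeds; with ZIO the same policy is a combinator on the effect rather than a hand-written recursion.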
Fetching everything at the same time
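The difference between fetching sources one at a time and all at once can be sketched with plain Futures. `Fetcher`, `fetch` and the 50 ms sleep are hypothetical stand-ins for slow reads; `sequentially` starts each fetch only after the previous one completes, while `allAtOnce` uses `Future.traverse`, which creates, and therefore starts, every Future up front.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{blocking, ExecutionContext, Future}

final class Fetcher(ec: ExecutionContext) {
  private implicit val executor: ExecutionContext = ec
  private val inFlight = new AtomicInteger(0)
  // Highest number of fetches observed running at the same time.
  val maxInFlight = new AtomicInteger(0)

  // Hypothetical slow source read.
  def fetch(source: String): Future[String] = Future {
    val now = inFlight.incrementAndGet()
    maxInFlight.updateAndGet((m: Int) => math.max(m, now))
    blocking(Thread.sleep(50))
    inFlight.decrementAndGet()
    s"data from $source"
  }

  // One source at a time: each fetch starts inside the previous Future's flatMap.
  def sequentially(sources: List[String]): Future[List[String]] =
    sources.foldLeft(Future.successful(List.empty[String])) { (acc, s) =>
      acc.flatMap(done => fetch(s).map(done :+ _))
    }

  // Everything at once: traverse builds every Future immediately, results keep input order.
  def allAtOnce(sources: List[String]): Future[List[String]] =
    Future.traverse(sources)(s => fetch(s))
}
```

Both methods return the same list in the same order; the sequential version never has more than one fetch in flight, whereas the parallel one is limited only by the executor.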
What next?
● https://github.com/leobenkel/ZparkIO/issues
● Giter8 to make starting a project easier
● Build for all Spark versions