SlideShare a Scribd company logo
1 of 11
C H A P T E R 0 6 : A D V A N C E D S P A R K
P R O G R A M M I N G
Learning Spark
by Holden Karau et. al.
Overview: Advanced Spark Programming
 Introduction
 Accumulators
 Accumulators and Fault Tolerance
 Custom Accumulators
 Broadcast Variables
 Optimizing Broadcasts
 Working on a Per-Partition Basis
 Piping to External Programs
 Numeric RDD Operations
 Conclusion
6.1 Introduction
 This chapter introduces a variety of advanced Spark
programming features that we didn’t get to cover in
the previous chapters.
 We introduce two types of shared variables:
 Accumulators to aggregate information
 Broadcast variables to efficiently distribute large values.
 Throughout this chapter we build an example using
ham radio operators’ call logs as the input.
 This chapter introduces how to use Spark’s language-
agnostic pipe() method to interact with other
programs through standard input and output.
6.2 Accumulators
 When we normally pass functions to Spark, such as a
map() function or a condition for filter(), they can use
variables defined outside them in the driver program, but
each task running on the cluster gets a new copy of each
variable, and updates from these copies are not
propagated back to the driver. Spark’s shared variables,
accumulators and broadcast variables, relax this
restriction for two common types of communication
patterns: aggregation of results and broadcasts.
 Spark supports accumulators of type Double, Long, and
Float
Edx and Coursera Courses
 Introduction to Big Data with Apache Spark
 Spark Fundamentals I
 Functional Programming Principles in Scala
6.3 Broadcast Variables
 Spark’s second type of shared variable, broadcast
variables, allows the program to efficiently send a large,
read-only value to all the worker nodes for use in one or
more Spark operations. They come in handy, for
example, if your application needs to send a large, read-
only lookup table to all the nodes, or even a large feature
vector in a machine learning algorithm.
 When we are broadcasting large values, it is important to
choose a data serialization format that is both fast and
compact, because the time to send the value over the
network can quickly become a bottleneck if it takes a long
time to either serialize a value or to send the serialized
value over the network.
6.4 Working on a Per-Partition Basis
 Working with data on a per-partition basis allows us
to avoid redoing setup work for each data item.
Operations like opening a database connection or
creating a random-number generator are examples
of setup steps that we wish to avoid doing for each
element.
 Spark has per-partition versions of map and for each
to help reduce the cost of these operations by letting
you run code only once for each partition of an RDD.
6.5 Piping to External Programs
 With three language bindings to choose from out of the
box, you may have all the options you need for writing
Spark applications. However, if none of Scala, Java, or
Python does what you need, then Spark provides a
general mechanism to pipe data to programs in other
languages, like R scripts.
 Spark provides a pipe() method on RDDs. Spark’s pipe()
lets us write parts of jobs using any language we want as
long as it can read and write to Unix standard streams.
 With pipe(), you can write a transformation of an RDD
that reads each RDD element from standard input as a
String, manipulates that String however you like, and
then writes the result(s) as Strings to standard output.
6.6 Numeric RDD Operations
 Spark’s numeric operations are
implemented with a streaming
algorithm that allows for building
up our model one element at a
time. The descriptive statistics
are all computed in a single pass
over the data and returned as a
StatsCounter object by calling
stats().
 If you want to compute only one
of these statistics, you can also
call the correspond‐ing method
directly on an RDD—for example,
rdd.mean() or rdd.sum().
6.7 Conclusion
 In this chapter, you have been introduced to some of
the more advanced Spark programming features that
you can use to make your programs more efficient or
expressive. Subsequent chapters cover deploying and
tuning Spark applications, as well as built-in libraries
for SQL and streaming and machine learning. We’ll
also start seeing more complex and more complete
sample applications that make use of much of the
functionality described so far, and that should help
guide and inspire your own usage of Spark.
Edx and Coursera Courses
 Introduction to Big Data with Apache Spark
 Spark Fundamentals I
 Functional Programming Principles in Scala

More Related Content

What's hot

Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark TutorialAhmet Bulut
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksDatabricks
 
Advanced spark training advanced spark internals and tuning reynold xin
Advanced spark training advanced spark internals and tuning reynold xinAdvanced spark training advanced spark internals and tuning reynold xin
Advanced spark training advanced spark internals and tuning reynold xincaidezhi655
 
Strata NYC 2015: What's new in Spark Streaming
Strata NYC 2015: What's new in Spark StreamingStrata NYC 2015: What's new in Spark Streaming
Strata NYC 2015: What's new in Spark StreamingDatabricks
 
Introduction to Apache Spark Developer Training
Introduction to Apache Spark Developer TrainingIntroduction to Apache Spark Developer Training
Introduction to Apache Spark Developer TrainingCloudera, Inc.
 
A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016 A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016 Databricks
 
Spark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit EU talk by Mikhail Semeniuk Hollin WilkinsSpark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit EU talk by Mikhail Semeniuk Hollin WilkinsSpark Summit
 
Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130Xuan-Chao Huang
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Databricks
 
From Pipelines to Refineries: Scaling Big Data Applications
From Pipelines to Refineries: Scaling Big Data ApplicationsFrom Pipelines to Refineries: Scaling Big Data Applications
From Pipelines to Refineries: Scaling Big Data ApplicationsDatabricks
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkSamy Dindane
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapePaco Nathan
 
Deep Dive of ADBMS Migration to Apache Spark—Use Cases Sharing
Deep Dive of ADBMS Migration to Apache Spark—Use Cases SharingDeep Dive of ADBMS Migration to Apache Spark—Use Cases Sharing
Deep Dive of ADBMS Migration to Apache Spark—Use Cases SharingDatabricks
 
Introduction to spark 2.0
Introduction to spark 2.0Introduction to spark 2.0
Introduction to spark 2.0datamantra
 
Taking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFramesTaking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFramesDatabricks
 

What's hot (19)

Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
 
Jump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on DatabricksJump Start with Apache Spark 2.0 on Databricks
Jump Start with Apache Spark 2.0 on Databricks
 
Advanced spark training advanced spark internals and tuning reynold xin
Advanced spark training advanced spark internals and tuning reynold xinAdvanced spark training advanced spark internals and tuning reynold xin
Advanced spark training advanced spark internals and tuning reynold xin
 
Strata NYC 2015: What's new in Spark Streaming
Strata NYC 2015: What's new in Spark StreamingStrata NYC 2015: What's new in Spark Streaming
Strata NYC 2015: What's new in Spark Streaming
 
Introduction to Apache Spark Developer Training
Introduction to Apache Spark Developer TrainingIntroduction to Apache Spark Developer Training
Introduction to Apache Spark Developer Training
 
A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016 A Deep Dive into Structured Streaming:  Apache Spark Meetup at Bloomberg 2016
A Deep Dive into Structured Streaming: Apache Spark Meetup at Bloomberg 2016
 
Spark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit EU talk by Mikhail Semeniuk Hollin WilkinsSpark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
Spark Summit EU talk by Mikhail Semeniuk Hollin Wilkins
 
Apache Spark 101
Apache Spark 101Apache Spark 101
Apache Spark 101
 
Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130Hadoop Spark Introduction-20150130
Hadoop Spark Introduction-20150130
 
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
Structuring Apache Spark 2.0: SQL, DataFrames, Datasets And Streaming - by Mi...
 
From Pipelines to Refineries: Scaling Big Data Applications
From Pipelines to Refineries: Scaling Big Data ApplicationsFrom Pipelines to Refineries: Scaling Big Data Applications
From Pipelines to Refineries: Scaling Big Data Applications
 
Scala and spark
Scala and sparkScala and spark
Scala and spark
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
How Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscapeHow Apache Spark fits into the Big Data landscape
How Apache Spark fits into the Big Data landscape
 
Data Science
Data ScienceData Science
Data Science
 
Spark sql
Spark sqlSpark sql
Spark sql
 
Deep Dive of ADBMS Migration to Apache Spark—Use Cases Sharing
Deep Dive of ADBMS Migration to Apache Spark—Use Cases SharingDeep Dive of ADBMS Migration to Apache Spark—Use Cases Sharing
Deep Dive of ADBMS Migration to Apache Spark—Use Cases Sharing
 
Introduction to spark 2.0
Introduction to spark 2.0Introduction to spark 2.0
Introduction to spark 2.0
 
Taking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFramesTaking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFrames
 

Viewers also liked

Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and howPetr Zapletal
 
Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)
Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)
Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)Sudhir Mallem
 
Spark, Python and Parquet
Spark, Python and Parquet Spark, Python and Parquet
Spark, Python and Parquet odsc
 
df: Dataframe on Spark
df: Dataframe on Sparkdf: Dataframe on Spark
df: Dataframe on SparkAlpine Data
 
Parquet - Data I/O - Philadelphia 2013
Parquet - Data I/O - Philadelphia 2013Parquet - Data I/O - Philadelphia 2013
Parquet - Data I/O - Philadelphia 2013larsgeorge
 
A Step to programming with Apache Spark
A Step to programming with Apache SparkA Step to programming with Apache Spark
A Step to programming with Apache SparkKnoldus Inc.
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark Mostafa
 
Code Review and other aspects of project organization
Code Review and other aspects of project organizationCode Review and other aspects of project organization
Code Review and other aspects of project organizationŁukasz Dumiszewski
 
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Databricks
 
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic RepartitioningHandling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic RepartitioningSpark Summit
 
Proof of Concept for Hadoop: storage and analytics of electrical time-series
Proof of Concept for Hadoop: storage and analytics of electrical time-seriesProof of Concept for Hadoop: storage and analytics of electrical time-series
Proof of Concept for Hadoop: storage and analytics of electrical time-seriesDataWorks Summit
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereGwen (Chen) Shapira
 
Spark and Online Analytics: Spark Summit East talky by Shubham Chopra
Spark and Online Analytics: Spark Summit East talky by Shubham ChopraSpark and Online Analytics: Spark Summit East talky by Shubham Chopra
Spark and Online Analytics: Spark Summit East talky by Shubham ChopraSpark Summit
 
Programming in Spark - Lessons Learned in OpenAire project
Programming in Spark - Lessons Learned in OpenAire projectProgramming in Spark - Lessons Learned in OpenAire project
Programming in Spark - Lessons Learned in OpenAire projectŁukasz Dumiszewski
 
Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015Holden Karau
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMRAmazon Web Services
 

Viewers also liked (20)

Overcoming cassandra query limitation spark
Overcoming cassandra query limitation sparkOvercoming cassandra query limitation spark
Overcoming cassandra query limitation spark
 
Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and how
 
Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)
Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)
Interactive SQL POC on Hadoop (Hive, Presto and Hive-on-Tez)
 
Spark, Python and Parquet
Spark, Python and Parquet Spark, Python and Parquet
Spark, Python and Parquet
 
df: Dataframe on Spark
df: Dataframe on Sparkdf: Dataframe on Spark
df: Dataframe on Spark
 
Parquet - Data I/O - Philadelphia 2013
Parquet - Data I/O - Philadelphia 2013Parquet - Data I/O - Philadelphia 2013
Parquet - Data I/O - Philadelphia 2013
 
Apache Spark
Apache Spark Apache Spark
Apache Spark
 
A Step to programming with Apache Spark
A Step to programming with Apache SparkA Step to programming with Apache Spark
A Step to programming with Apache Spark
 
Apache Spark Core
Apache Spark CoreApache Spark Core
Apache Spark Core
 
Programming in Spark using PySpark
Programming in Spark using PySpark      Programming in Spark using PySpark
Programming in Spark using PySpark
 
Code Review and other aspects of project organization
Code Review and other aspects of project organizationCode Review and other aspects of project organization
Code Review and other aspects of project organization
 
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
 
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic RepartitioningHandling Data Skew Adaptively In Spark Using Dynamic Repartitioning
Handling Data Skew Adaptively In Spark Using Dynamic Repartitioning
 
Proof of Concept for Hadoop: storage and analytics of electrical time-series
Proof of Concept for Hadoop: storage and analytics of electrical time-seriesProof of Concept for Hadoop: storage and analytics of electrical time-series
Proof of Concept for Hadoop: storage and analytics of electrical time-series
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be there
 
Spark and Online Analytics: Spark Summit East talky by Shubham Chopra
Spark and Online Analytics: Spark Summit East talky by Shubham ChopraSpark and Online Analytics: Spark Summit East talky by Shubham Chopra
Spark and Online Analytics: Spark Summit East talky by Shubham Chopra
 
Programming in Spark - Lessons Learned in OpenAire project
Programming in Spark - Lessons Learned in OpenAire projectProgramming in Spark - Lessons Learned in OpenAire project
Programming in Spark - Lessons Learned in OpenAire project
 
Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
 

Similar to Advanced Spark Programming Techniques

Apache Spark An Overview
Apache Spark An OverviewApache Spark An Overview
Apache Spark An OverviewMohit Jain
 
Spark and scala..................................... ppt.pptx
Spark and scala..................................... ppt.pptxSpark and scala..................................... ppt.pptx
Spark and scala..................................... ppt.pptxshivani22y
 
Introduction to Spark - DataFactZ
Introduction to Spark - DataFactZIntroduction to Spark - DataFactZ
Introduction to Spark - DataFactZDataFactZ
 
Apache Spark Introduction.pdf
Apache Spark Introduction.pdfApache Spark Introduction.pdf
Apache Spark Introduction.pdfMaheshPandit16
 
spark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark examplespark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark exampleShidrokhGoudarzi1
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonBenjamin Bengfort
 
Managing Big data Module 3 (1st part)
Managing Big data Module 3 (1st part)Managing Big data Module 3 (1st part)
Managing Big data Module 3 (1st part)Soumee Maschatak
 
Apache Spark: What's under the hood
Apache Spark: What's under the hoodApache Spark: What's under the hood
Apache Spark: What's under the hoodAdarsh Pannu
 
Apache spark sneha challa- google pittsburgh-aug 25th
Apache spark  sneha challa- google pittsburgh-aug 25thApache spark  sneha challa- google pittsburgh-aug 25th
Apache spark sneha challa- google pittsburgh-aug 25thSneha Challa
 
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKSCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKzmhassan
 
Spark cluster computing with working sets
Spark cluster computing with working setsSpark cluster computing with working sets
Spark cluster computing with working setsJinxinTang
 
Cascading on starfish
Cascading on starfishCascading on starfish
Cascading on starfishFei Dong
 
Big Data Analytics and Ubiquitous computing
Big Data Analytics and Ubiquitous computingBig Data Analytics and Ubiquitous computing
Big Data Analytics and Ubiquitous computingAnimesh Chaturvedi
 
Reactive app using actor model & apache spark
Reactive app using actor model & apache sparkReactive app using actor model & apache spark
Reactive app using actor model & apache sparkRahul Kumar
 
Hot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark frameworkHot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark frameworkSupriya .
 
Machine Learning with SparkR
Machine Learning with SparkRMachine Learning with SparkR
Machine Learning with SparkROlgun Aydın
 

Similar to Advanced Spark Programming Techniques (20)

Apache Spark An Overview
Apache Spark An OverviewApache Spark An Overview
Apache Spark An Overview
 
Spark and scala..................................... ppt.pptx
Spark and scala..................................... ppt.pptxSpark and scala..................................... ppt.pptx
Spark and scala..................................... ppt.pptx
 
Spark core
Spark coreSpark core
Spark core
 
Introduction to Spark - DataFactZ
Introduction to Spark - DataFactZIntroduction to Spark - DataFactZ
Introduction to Spark - DataFactZ
 
Apache Spark Introduction.pdf
Apache Spark Introduction.pdfApache Spark Introduction.pdf
Apache Spark Introduction.pdf
 
spark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark examplespark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark example
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
 
Managing Big data Module 3 (1st part)
Managing Big data Module 3 (1st part)Managing Big data Module 3 (1st part)
Managing Big data Module 3 (1st part)
 
Apache Spark: What's under the hood
Apache Spark: What's under the hoodApache Spark: What's under the hood
Apache Spark: What's under the hood
 
Apache spark sneha challa- google pittsburgh-aug 25th
Apache spark  sneha challa- google pittsburgh-aug 25thApache spark  sneha challa- google pittsburgh-aug 25th
Apache spark sneha challa- google pittsburgh-aug 25th
 
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARKSCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
SCALABLE MONITORING USING PROMETHEUS WITH APACHE SPARK
 
Apache Spark Tutorial
Apache Spark TutorialApache Spark Tutorial
Apache Spark Tutorial
 
Spark cluster computing with working sets
Spark cluster computing with working setsSpark cluster computing with working sets
Spark cluster computing with working sets
 
Cascading on starfish
Cascading on starfishCascading on starfish
Cascading on starfish
 
Big Data Analytics and Ubiquitous computing
Big Data Analytics and Ubiquitous computingBig Data Analytics and Ubiquitous computing
Big Data Analytics and Ubiquitous computing
 
Reactive app using actor model & apache spark
Reactive app using actor model & apache sparkReactive app using actor model & apache spark
Reactive app using actor model & apache spark
 
Hot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark frameworkHot-Spot analysis Using Apache Spark framework
Hot-Spot analysis Using Apache Spark framework
 
Spark
SparkSpark
Spark
 
Machine Learning with SparkR
Machine Learning with SparkRMachine Learning with SparkR
Machine Learning with SparkR
 
Spark Workshop
Spark WorkshopSpark Workshop
Spark Workshop
 

More from phanleson

Firewall - Network Defense in Depth Firewalls
Firewall - Network Defense in Depth FirewallsFirewall - Network Defense in Depth Firewalls
Firewall - Network Defense in Depth Firewallsphanleson
 
Mobile Security - Wireless hacking
Mobile Security - Wireless hackingMobile Security - Wireless hacking
Mobile Security - Wireless hackingphanleson
 
Authentication in wireless - Security in Wireless Protocols
Authentication in wireless - Security in Wireless ProtocolsAuthentication in wireless - Security in Wireless Protocols
Authentication in wireless - Security in Wireless Protocolsphanleson
 
E-Commerce Security - Application attacks - Server Attacks
E-Commerce Security - Application attacks - Server AttacksE-Commerce Security - Application attacks - Server Attacks
E-Commerce Security - Application attacks - Server Attacksphanleson
 
Hacking web applications
Hacking web applicationsHacking web applications
Hacking web applicationsphanleson
 
HBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table designHBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table designphanleson
 
HBase In Action - Chapter 10 - Operations
HBase In Action - Chapter 10 - OperationsHBase In Action - Chapter 10 - Operations
HBase In Action - Chapter 10 - Operationsphanleson
 
Hbase in action - Chapter 09: Deploying HBase
Hbase in action - Chapter 09: Deploying HBaseHbase in action - Chapter 09: Deploying HBase
Hbase in action - Chapter 09: Deploying HBasephanleson
 
Hướng Dẫn Đăng Ký LibertaGia - A guide and introduciton about Libertagia
Hướng Dẫn Đăng Ký LibertaGia - A guide and introduciton about LibertagiaHướng Dẫn Đăng Ký LibertaGia - A guide and introduciton about Libertagia
Hướng Dẫn Đăng Ký LibertaGia - A guide and introduciton about Libertagiaphanleson
 
Lecture 1 - Getting to know XML
Lecture 1 - Getting to know XMLLecture 1 - Getting to know XML
Lecture 1 - Getting to know XMLphanleson
 
Lecture 4 - Adding XTHML for the Web
Lecture  4 - Adding XTHML for the WebLecture  4 - Adding XTHML for the Web
Lecture 4 - Adding XTHML for the Webphanleson
 
Lecture 2 - Using XML for Many Purposes
Lecture 2 - Using XML for Many PurposesLecture 2 - Using XML for Many Purposes
Lecture 2 - Using XML for Many Purposesphanleson
 
SOA Course - SOA governance - Lecture 19
SOA Course - SOA governance - Lecture 19SOA Course - SOA governance - Lecture 19
SOA Course - SOA governance - Lecture 19phanleson
 
Lecture 18 - Model-Driven Service Development
Lecture 18 - Model-Driven Service DevelopmentLecture 18 - Model-Driven Service Development
Lecture 18 - Model-Driven Service Developmentphanleson
 
Lecture 15 - Technical Details
Lecture 15 - Technical DetailsLecture 15 - Technical Details
Lecture 15 - Technical Detailsphanleson
 
Lecture 10 - Message Exchange Patterns
Lecture 10 - Message Exchange PatternsLecture 10 - Message Exchange Patterns
Lecture 10 - Message Exchange Patternsphanleson
 
Lecture 9 - SOA in Context
Lecture 9 - SOA in ContextLecture 9 - SOA in Context
Lecture 9 - SOA in Contextphanleson
 
Lecture 07 - Business Process Management
Lecture 07 - Business Process ManagementLecture 07 - Business Process Management
Lecture 07 - Business Process Managementphanleson
 
Lecture 04 - Loose Coupling
Lecture 04 - Loose CouplingLecture 04 - Loose Coupling
Lecture 04 - Loose Couplingphanleson
 
Lecture 2 - SOA
Lecture 2 - SOALecture 2 - SOA
Lecture 2 - SOAphanleson
 

More from phanleson (20)

Firewall - Network Defense in Depth Firewalls
Firewall - Network Defense in Depth FirewallsFirewall - Network Defense in Depth Firewalls
Firewall - Network Defense in Depth Firewalls
 
Mobile Security - Wireless hacking
Mobile Security - Wireless hackingMobile Security - Wireless hacking
Mobile Security - Wireless hacking
 
Authentication in wireless - Security in Wireless Protocols
Authentication in wireless - Security in Wireless ProtocolsAuthentication in wireless - Security in Wireless Protocols
Authentication in wireless - Security in Wireless Protocols
 
E-Commerce Security - Application attacks - Server Attacks
E-Commerce Security - Application attacks - Server AttacksE-Commerce Security - Application attacks - Server Attacks
E-Commerce Security - Application attacks - Server Attacks
 
Hacking web applications
Hacking web applicationsHacking web applications
Hacking web applications
 
HBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table designHBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table design
 
HBase In Action - Chapter 10 - Operations
HBase In Action - Chapter 10 - OperationsHBase In Action - Chapter 10 - Operations
HBase In Action - Chapter 10 - Operations
 
Hbase in action - Chapter 09: Deploying HBase
Hbase in action - Chapter 09: Deploying HBaseHbase in action - Chapter 09: Deploying HBase
Hbase in action - Chapter 09: Deploying HBase
 
Hướng Dẫn Đăng Ký LibertaGia - A guide and introduciton about Libertagia
Hướng Dẫn Đăng Ký LibertaGia - A guide and introduciton about LibertagiaHướng Dẫn Đăng Ký LibertaGia - A guide and introduciton about Libertagia
Hướng Dẫn Đăng Ký LibertaGia - A guide and introduciton about Libertagia
 
Lecture 1 - Getting to know XML
Lecture 1 - Getting to know XMLLecture 1 - Getting to know XML
Lecture 1 - Getting to know XML
 
Lecture 4 - Adding XTHML for the Web
Lecture  4 - Adding XTHML for the WebLecture  4 - Adding XTHML for the Web
Lecture 4 - Adding XTHML for the Web
 
Lecture 2 - Using XML for Many Purposes
Lecture 2 - Using XML for Many PurposesLecture 2 - Using XML for Many Purposes
Lecture 2 - Using XML for Many Purposes
 
SOA Course - SOA governance - Lecture 19
SOA Course - SOA governance - Lecture 19SOA Course - SOA governance - Lecture 19
SOA Course - SOA governance - Lecture 19
 
Lecture 18 - Model-Driven Service Development
Lecture 18 - Model-Driven Service DevelopmentLecture 18 - Model-Driven Service Development
Lecture 18 - Model-Driven Service Development
 
Lecture 15 - Technical Details
Lecture 15 - Technical DetailsLecture 15 - Technical Details
Lecture 15 - Technical Details
 
Lecture 10 - Message Exchange Patterns
Lecture 10 - Message Exchange PatternsLecture 10 - Message Exchange Patterns
Lecture 10 - Message Exchange Patterns
 
Lecture 9 - SOA in Context
Lecture 9 - SOA in ContextLecture 9 - SOA in Context
Lecture 9 - SOA in Context
 
Lecture 07 - Business Process Management
Lecture 07 - Business Process ManagementLecture 07 - Business Process Management
Lecture 07 - Business Process Management
 
Lecture 04 - Loose Coupling
Lecture 04 - Loose CouplingLecture 04 - Loose Coupling
Lecture 04 - Loose Coupling
 
Lecture 2 - SOA
Lecture 2 - SOALecture 2 - SOA
Lecture 2 - SOA
 

Recently uploaded

ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxsqpmdrvczh
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Quarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayQuarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayMakMakNepo
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 

Recently uploaded (20)

ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptx
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Quarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up FridayQuarter 4 Peace-education.pptx Catch Up Friday
Quarter 4 Peace-education.pptx Catch Up Friday
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 

Advanced Spark Programming Techniques

  • 1. C H A P T E R 0 6 : A D V A N C E D S P A R K P R O G R A M M I N G Learning Spark by Holden Karau et. al.
  • 2. Overview: Advanced Spark Programming  Introduction  Accumulators  Accumulators and Fault Tolerance  Custom Accumulators  Broadcast Variables  Optimizing Broadcasts  Working on a Per-Partition Basis  Piping to External Programs  Numeric RDD Operations  Conclusion
  • 3. 6.1 Introduction  This chapter introduces a variety of advanced Spark programming features that we didn’t get to cover in the previous chapters.  We introduce two types of shared variables:  Accumulators to aggregate information  Broadcast variables to efficiently distribute large values.  Throughout this chapter we build an example using ham radio operators’ call logs as the input.  This chapter introduces how to use Spark’s language- agnostic pipe() method to interact with other programs through standard input and output.
  • 4. 6.2 Accumulators  When we normally pass functions to Spark, such as a map() function or a condition for filter(), they can use variables defined outside them in the driver program, but each task running on the cluster gets a new copy of each variable, and updates from these copies are not propagated back to the driver. Spark’s shared variables, accumulators and broadcast variables, relax this restriction for two common types of communication patterns: aggregation of results and broadcasts.  Spark supports accumulators of type Double, Long, and Float
  • 5. Edx and Coursera Courses  Introduction to Big Data with Apache Spark  Spark Fundamentals I  Functional Programming Principles in Scala
  • 6. 6.3 Broadcast Variables  Spark’s second type of shared variable, broadcast variables, allows the program to efficiently send a large, read-only value to all the worker nodes for use in one or more Spark operations. They come in handy, for example, if your application needs to send a large, read- only lookup table to all the nodes, or even a large feature vector in a machine learning algorithm.  When we are broadcasting large values, it is important to choose a data serialization format that is both fast and compact, because the time to send the value over the network can quickly become a bottleneck if it takes a long time to either serialize a value or to send the serialized value over the network.
  • 7. 6.4 Working on a Per-Partition Basis  Working with data on a per-partition basis allows us to avoid redoing setup work for each data item. Operations like opening a database connection or creating a random-number generator are examples of setup steps that we wish to avoid doing for each element.  Spark has per-partition versions of map and for each to help reduce the cost of these operations by letting you run code only once for each partition of an RDD.
  • 8. 6.5 Piping to External Programs  With three language bindings to choose from out of the box, you may have all the options you need for writing Spark applications. However, if none of Scala, Java, or Python does what you need, then Spark provides a general mechanism to pipe data to programs in other languages, like R scripts.  Spark provides a pipe() method on RDDs. Spark’s pipe() lets us write parts of jobs using any language we want as long as it can read and write to Unix standard streams.  With pipe(), you can write a transformation of an RDD that reads each RDD element from standard input as a String, manipulates that String however you like, and then writes the result(s) as Strings to standard output.
  • 9. 6.6 Numeric RDD Operations  Spark’s numeric operations are implemented with a streaming algorithm that allows for building up our model one element at a time. The descriptive statistics are all computed in a single pass over the data and returned as a StatsCounter object by calling stats().  If you want to compute only one of these statistics, you can also call the correspond‐ing method directly on an RDD—for example, rdd.mean() or rdd.sum().
  • 10. 6.7 Conclusion  In this chapter, you have been introduced to some of the more advanced Spark programming features that you can use to make your programs more efficient or expressive. Subsequent chapters cover deploying and tuning Spark applications, as well as built-in libraries for SQL and streaming and machine learning. We’ll also start seeing more complex and more complete sample applications that make use of much of the functionality described so far, and that should help guide and inspire your own usage of Spark.
  • 11. Edx and Coursera Courses  Introduction to Big Data with Apache Spark  Spark Fundamentals I  Functional Programming Principles in Scala