SlideShare a Scribd company logo
© 2015 IBM Corporation
Interactive Analytics Using Apache Spark
Bangalore Spark Enthusiasts Group
http://www.meetup.com/Bangalore-Spark-Enthusiasts/
1
Bagavath Subramaniam, IBM Analytics
Shally Sangal, IBM Analytics
© 2015 IBM Corporation
Agenda
▪ Overview of Interactive Analytics
▪ Spark Application User Types
▪ Spark Context
▪ Spark Shell
▪ Spark Submit
▪ Spark JDBC Thrift Server
▪ Apache Zeppelin
▪ Jupyter
▪ Spark Kernel
▪ Spark Job Server
▪ Livy
2
© 2015 IBM Corporation
Spark Application User Types
Data Scientist
▪ Data Exploration
▪ Data Wrangling
▪ Build Models from Data using Algorithms - Predict/Prescribe
▪ Knowledge in Statistics & Maths
▪ R/Python, Matlab/SPSS
▪ Ad-hoc analysis using Interactive Shells
Data Analyst
▪ Data Exploration and Visualization
▪ Understands data sources and relationships among them in an Enterprise
▪ Relates data to business and derives insights, can talk business language
▪ May have basic programming skills and analytic tools knowledge
▪ Ad-hoc analysis using canned reports
▪ Limited usage of interactive shells
3
© 2015 IBM Corporation
Typical User Roles
Business Analyst
▪ Industry Expert
▪ Understand business needs and works on solutions
▪ Improve business processes and design new systems to support them
▪ Not a programmer / Analytics expert
▪ Typical user of reporting systems
Data Engineer / Application Developer
▪ Programmer with S/W Engineering background
▪ Builds production data pipelines, data warehouses, reporting solutions and apps
▪ Productionize models built by data scientists
▪ Builds s/w applications to solve business problems
▪ Maintains, Monitors, Tunes data processing platform and applications
> Roles are often fluid and overlapping
4
© 2015 IBM Corporation
Interactive Tools for Spark
Apache Spark
IBM Spark Kernel
(Apache Toree)
Cloudera
Livy
Ooyala
Spark Job Server
5
© 2015 IBM Corporation
User and Tools
Primary set of tools for each role
Spark
Shell
Spark
Submit
Thrift
JDBC
Server
Zeppelin Spark
Kernel
Jupyter Livy Hue
Data Scientist
Data Analyst
Developer
Business Analyst
6
Spark Job
Server
© 2015 IBM Corporation
Spark Context
▪ Common thread for all Spark Interfaces
▪ Main entry point for Spark, represents the connection to a Spark cluster
▪ Standalone, Yarn, Mesos, Local
▪ Holds all the configuration - memory, cores, parallelism, compression
▪ Create RDDs, accumulators, broadcast variables
▪ Run Jobs, Cancel Jobs
▪ One Spark Context per JVM limitation, one application Id
▪ Supports parallel jobs from separate threads
▪ Scheduler mode - FIFO / Fair (within an Application)
▪ Fair Scheduler
− Pools - spark.scheduler.pool
− Weights/Priorities, Scheduling Mode, Weights
7
© 2015 IBM Corporation
Spark shell
Spark Shell
▪ Interactive shell ( spark-shell for scala, pyspark for python)
▪ spark-shell is based on scala REPL
▪ Instantiates Spark Context by default, also a UI
▪ Also gives sqlContext which is hiveContext
▪ Internally calls spark-submit
{SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name Spark shell
▪ All parameters of Spark submit can be passed to Spark shell as well
8
© 2015 IBM Corporation
Spark submit
▪ Launch/Submit a spark application to a spark cluster
▪ org.apache.spark.launcher.Main gets called with org.apache.spark.deploy.SparkSubmit
as a parameter along with the other params passed to spark-submit
org.apache.spark.launcher.Main org.apache.spark.deploy.SparkSubmit "$@"
▪ spark-submit --help => list of supported parameters
▪ kill a job ( spark-submit --kill) and get job status (spark-submit --status)
▪ spark-defaults.conf in SPARK_CONF_DIR
▪ Precedence - Explicit set on SparkConf, flags passed to spark-submit, values from
defaults
9
© 2015 IBM Corporation
▪ Web based notebook for interactive analytics.
▪ Provides built in Spark integration.
▪ Supports many interpreters such as Scala,
Pyspark,SparkSQL, Hive, Shell etc.
▪ It starts a Zeppelin server.
▪ Spawns one JVM per interpreter group.
▪ Server communicates with the Interpreter
Group using Thrift.
10
Apache Zeppelin
© 2015 IBM Corporation
Zeppelin Demo
To know more : http://zeppelin.incubator.apache.org/
11
© 2015 IBM Corporation
Jupyter Notebook
Web notebook for interactive data
analysis.Part of Jupyter ecosystem.
Evolved from IPython, works on the
IPython messaging Protocol.
Has the concept of Kernels - any
language kernel can be plugged in
which implements the protocol.
Spark kernel is available via Apache
Toree.
12
© 2015 IBM Corporation
Jupyter Notebook Demo
To know more : http://jupyter.org/
13
© 2015 IBM Corporation
Spark Kernel ( Apache Toree )
Kernel provides the foundation for interactive
applications to connect to use Spark.
Provides an interface that allows clients to
interact with a Spark Cluster. Clients can send
code snippets and libraries that are interpreted
and run against a pre configured SparkContext.
Acts as a proxy between your application and
the Spark Cluster.
14
© 2015 IBM Corporation
Kernel Architecture
Kernel uses ZeroMQ as its messaging
middleware using TCP sockets
and implements the IPython
message protocol.
It is architected in layers, where each
layer has a specific purpose
in processing of requests.
Provides concurrency and code
isolation by use of Akka
framework.
15
© 2015 IBM Corporation
How does it talk to Spark?
Kernel is launched by a spark-submit process. It works with local spark, Standalone Spark
Cluster as well as Spark with Yarn.
SPARK_HOME is a mandatory environment variable needed.
SPARK_OPTS is an optional environment variable - we can use to configure spark master,
deploy mode, driver memory, number of executors etc.
Uses the same Scala Interpreter as Spark shell. The interpreter holds a Spark Context and
the class server uri used to host compiled code.
16
© 2015 IBM Corporation
How to communicate with Kernel
Two forms of communication :
1. Client library for code execution
1. Directly talk to Kernel like Jupyter notebook
17
© 2015 IBM Corporation
Kernel Client Library
Written in Scala. Eliminates need to understand ZeroMQ
message protocol.
Enables treating the kernel as a remote service.
Shares majority of its code with the kernel’s
codebase.
Two steps to using the client :
1. Initialize the client with the connection details of the kernel.
2. Use the execute API to run code snippets with attached
callbacks.
18
© 2015 IBM Corporation
How to run Kernel and Client:
https://github.com/ibm-et/spark-kernel/wiki/Getting-Started-with-the-Spark-Kernel
https://github.com/ibm-et/spark-kernel/wiki/Guide-for-the-Spark-Kernel-Client
CODE DEMO
19
© 2015 IBM Corporation
Comm API
As part of the IPython message protocol, the Comm API allows developers to specify
custom messages to communicate data and perform actions on both the frontend(client) and
backend(kernel). This API is useful in scenarios where we want to do same actions for the
messages. Either client or kernel can start sending messages.
20
© 2015 IBM Corporation
Livy
Livy is an open source REST
interface for interacting with
Spark. It supports executing
code snippets of Python,
Scala, R.
It is currently used to power
the Spark snippets of Hadoop
Notebook in Hue.
Multiple contexts by using
multiple sessions or multiple
users to same session.
21
© 2015 IBM Corporation
LIVY CODE EXECUTION DEMO
To know more : https://github.com/cloudera/hue/tree/master/apps/spark/java
22
© 2015 IBM Corporation
Spark Job Server
JobServer provides a REST interface for submitting and managing Spark jobs/jars.
It is intended to be run as one or more independent processes, separate from the Spark
cluster or within the spark cluster. It works with Mesos as well as Yarn.
It supports multiple Spark Context. Runs SparkContext in their own forked JVM process.
This is available via a config parameter spark.jobserver.context-per-jvm. It is by default set
to false for local development mode, but recommended to be set to true for production
deployment.
It exposes APIs to upload your jars, get contexts, run jobs, get data, configure contexts etc.
It used Spray, Akka actors , Akka Cluster for separate contexts.
23
© 2015 IBM Corporation
JOB SERVER DEMO
To know more : https://github.com/spark-jobserver/spark-jobserver
24

More Related Content

What's hot

Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis MagdaApache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Databricks
 
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
Spark Summit
 
Project Zen: Improving Apache Spark for Python Users
Project Zen: Improving Apache Spark for Python UsersProject Zen: Improving Apache Spark for Python Users
Project Zen: Improving Apache Spark for Python Users
Databricks
 
Helium makes Zeppelin fly!
Helium makes Zeppelin fly!Helium makes Zeppelin fly!
Helium makes Zeppelin fly!
DataWorks Summit
 
Comparison of various streaming technologies
Comparison of various streaming technologiesComparison of various streaming technologies
Comparison of various streaming technologies
Sachin Aggarwal
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks
 
Migrating pipelines into Docker
Migrating pipelines into DockerMigrating pipelines into Docker
Migrating pipelines into Docker
DataWorks Summit/Hadoop Summit
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Alex Zeltov
 
Apache Zeppelin and Spark for Enterprise Data Science
Apache Zeppelin and Spark for Enterprise Data ScienceApache Zeppelin and Spark for Enterprise Data Science
Apache Zeppelin and Spark for Enterprise Data Science
Bikas Saha
 
Cooperative Data Exploration with iPython Notebook
Cooperative Data Exploration with iPython NotebookCooperative Data Exploration with iPython Notebook
Cooperative Data Exploration with iPython Notebook
DataWorks Summit/Hadoop Summit
 
Apache Zeppelin Helium and Beyond
Apache Zeppelin Helium and BeyondApache Zeppelin Helium and Beyond
Apache Zeppelin Helium and Beyond
DataWorks Summit/Hadoop Summit
 
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
Databricks
 
Streaming Sensor Data Slides_Virender
Streaming Sensor Data Slides_VirenderStreaming Sensor Data Slides_Virender
Streaming Sensor Data Slides_Virendervithakur
 
Advanced Visualization of Spark jobs
Advanced Visualization of Spark jobsAdvanced Visualization of Spark jobs
Advanced Visualization of Spark jobs
DataWorks Summit/Hadoop Summit
 
Apache Hive 2.0 SQL, Speed, Scale by Alan Gates
Apache Hive 2.0 SQL, Speed, Scale by Alan GatesApache Hive 2.0 SQL, Speed, Scale by Alan Gates
Apache Hive 2.0 SQL, Speed, Scale by Alan Gates
Big Data Spain
 
Spark Summit EU talk by William Benton
Spark Summit EU talk by William BentonSpark Summit EU talk by William Benton
Spark Summit EU talk by William Benton
Spark Summit
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
DataWorks Summit/Hadoop Summit
 
Simplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
Simplify and Boost Spark 3 Deployments with Hypervisor-Native KubernetesSimplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
Simplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
Databricks
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
huguk
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Alex Zeltov
 

What's hot (20)

Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis MagdaApache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
Apache Spark and Apache Ignite: Where Fast Data Meets the IoT with Denis Magda
 
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
Opaque: A Data Analytics Platform with Strong Security: Spark Summit East tal...
 
Project Zen: Improving Apache Spark for Python Users
Project Zen: Improving Apache Spark for Python UsersProject Zen: Improving Apache Spark for Python Users
Project Zen: Improving Apache Spark for Python Users
 
Helium makes Zeppelin fly!
Helium makes Zeppelin fly!Helium makes Zeppelin fly!
Helium makes Zeppelin fly!
 
Comparison of various streaming technologies
Comparison of various streaming technologiesComparison of various streaming technologies
Comparison of various streaming technologies
 
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
 
Migrating pipelines into Docker
Migrating pipelines into DockerMigrating pipelines into Docker
Migrating pipelines into Docker
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
 
Apache Zeppelin and Spark for Enterprise Data Science
Apache Zeppelin and Spark for Enterprise Data ScienceApache Zeppelin and Spark for Enterprise Data Science
Apache Zeppelin and Spark for Enterprise Data Science
 
Cooperative Data Exploration with iPython Notebook
Cooperative Data Exploration with iPython NotebookCooperative Data Exploration with iPython Notebook
Cooperative Data Exploration with iPython Notebook
 
Apache Zeppelin Helium and Beyond
Apache Zeppelin Helium and BeyondApache Zeppelin Helium and Beyond
Apache Zeppelin Helium and Beyond
 
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
SparkOscope: Enabling Apache Spark Optimization through Cross Stack Monitorin...
 
Streaming Sensor Data Slides_Virender
Streaming Sensor Data Slides_VirenderStreaming Sensor Data Slides_Virender
Streaming Sensor Data Slides_Virender
 
Advanced Visualization of Spark jobs
Advanced Visualization of Spark jobsAdvanced Visualization of Spark jobs
Advanced Visualization of Spark jobs
 
Apache Hive 2.0 SQL, Speed, Scale by Alan Gates
Apache Hive 2.0 SQL, Speed, Scale by Alan GatesApache Hive 2.0 SQL, Speed, Scale by Alan Gates
Apache Hive 2.0 SQL, Speed, Scale by Alan Gates
 
Spark Summit EU talk by William Benton
Spark Summit EU talk by William BentonSpark Summit EU talk by William Benton
Spark Summit EU talk by William Benton
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
 
Simplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
Simplify and Boost Spark 3 Deployments with Hypervisor-Native KubernetesSimplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
Simplify and Boost Spark 3 Deployments with Hypervisor-Native Kubernetes
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
Introduction to Big Data Analytics using Apache Spark and Zeppelin on HDInsig...
 

Viewers also liked

Apache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault ToleranceApache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault Tolerance
Sachin Aggarwal
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to Streaming
Databricks
 
Livy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache SparkLivy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache Spark
Jen Aman
 
Building a REST Job Server for Interactive Spark as a Service
Building a REST Job Server for Interactive Spark as a ServiceBuilding a REST Job Server for Interactive Spark as a Service
Building a REST Job Server for Interactive Spark as a Service
Cloudera, Inc.
 
Building a REST Job Server for interactive Spark as a service by Romain Rigau...
Building a REST Job Server for interactive Spark as a service by Romain Rigau...Building a REST Job Server for interactive Spark as a service by Romain Rigau...
Building a REST Job Server for interactive Spark as a service by Romain Rigau...
Spark Summit
 
700 Queries Per Second with Updates: Spark As A Real-Time Web Service
700 Queries Per Second with Updates: Spark As A Real-Time Web Service700 Queries Per Second with Updates: Spark As A Real-Time Web Service
700 Queries Per Second with Updates: Spark As A Real-Time Web Service
Spark Summit
 
How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
Legacy Typesafe (now Lightbend)
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
Rahul Jain
 
Hadoop spark performance comparison
Hadoop spark performance comparisonHadoop spark performance comparison
Hadoop spark performance comparison
arunkumar sadhasivam
 
Big Data Scala by the Bay: Interactive Spark in your Browser
Big Data Scala by the Bay: Interactive Spark in your BrowserBig Data Scala by the Bay: Interactive Spark in your Browser
Big Data Scala by the Bay: Interactive Spark in your Browser
gethue
 
Graph Data -- RDF and Property Graphs
Graph Data -- RDF and Property GraphsGraph Data -- RDF and Property Graphs
Graph Data -- RDF and Property Graphs
andyseaborne
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Jen Aman
 
Huawei Advanced Data Science With Spark Streaming
Huawei Advanced Data Science With Spark StreamingHuawei Advanced Data Science With Spark Streaming
Huawei Advanced Data Science With Spark Streaming
Jen Aman
 
Building Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLBuilding Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemML
Jen Aman
 
No More “Sbt Assembly”: Rethinking Spark-Submit Using CueSheet: Spark Summit ...
No More “Sbt Assembly”: Rethinking Spark-Submit Using CueSheet: Spark Summit ...No More “Sbt Assembly”: Rethinking Spark-Submit Using CueSheet: Spark Summit ...
No More “Sbt Assembly”: Rethinking Spark-Submit Using CueSheet: Spark Summit ...
Spark Summit
 
Huohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For SparkHuohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For Spark
Jen Aman
 
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandracodecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
DataStax Academy
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
Jen Aman
 
Bolt: Building A Distributed ndarray
Bolt: Building A Distributed ndarrayBolt: Building A Distributed ndarray
Bolt: Building A Distributed ndarray
Jen Aman
 
Recent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced AnalyticsRecent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced Analytics
Databricks
 

Viewers also liked (20)

Apache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault ToleranceApache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault Tolerance
 
Real-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to StreamingReal-Time Spark: From Interactive Queries to Streaming
Real-Time Spark: From Interactive Queries to Streaming
 
Livy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache SparkLivy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache Spark
 
Building a REST Job Server for Interactive Spark as a Service
Building a REST Job Server for Interactive Spark as a ServiceBuilding a REST Job Server for Interactive Spark as a Service
Building a REST Job Server for Interactive Spark as a Service
 
Building a REST Job Server for interactive Spark as a service by Romain Rigau...
Building a REST Job Server for interactive Spark as a service by Romain Rigau...Building a REST Job Server for interactive Spark as a service by Romain Rigau...
Building a REST Job Server for interactive Spark as a service by Romain Rigau...
 
700 Queries Per Second with Updates: Spark As A Real-Time Web Service
700 Queries Per Second with Updates: Spark As A Real-Time Web Service700 Queries Per Second with Updates: Spark As A Real-Time Web Service
700 Queries Per Second with Updates: Spark As A Real-Time Web Service
 
How to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOSHow to deploy Apache Spark 
to Mesos/DCOS
How to deploy Apache Spark 
to Mesos/DCOS
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
Hadoop spark performance comparison
Hadoop spark performance comparisonHadoop spark performance comparison
Hadoop spark performance comparison
 
Big Data Scala by the Bay: Interactive Spark in your Browser
Big Data Scala by the Bay: Interactive Spark in your BrowserBig Data Scala by the Bay: Interactive Spark in your Browser
Big Data Scala by the Bay: Interactive Spark in your Browser
 
Graph Data -- RDF and Property Graphs
Graph Data -- RDF and Property GraphsGraph Data -- RDF and Property Graphs
Graph Data -- RDF and Property Graphs
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
 
Huawei Advanced Data Science With Spark Streaming
Huawei Advanced Data Science With Spark StreamingHuawei Advanced Data Science With Spark Streaming
Huawei Advanced Data Science With Spark Streaming
 
Building Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLBuilding Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemML
 
No More “Sbt Assembly”: Rethinking Spark-Submit Using CueSheet: Spark Summit ...
No More “Sbt Assembly”: Rethinking Spark-Submit Using CueSheet: Spark Summit ...No More “Sbt Assembly”: Rethinking Spark-Submit Using CueSheet: Spark Summit ...
No More “Sbt Assembly”: Rethinking Spark-Submit Using CueSheet: Spark Summit ...
 
Huohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For SparkHuohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For Spark
 
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandracodecentric AG: CQRS and Event Sourcing Applications with Cassandra
codecentric AG: CQRS and Event Sourcing Applications with Cassandra
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
 
Bolt: Building A Distributed ndarray
Bolt: Building A Distributed ndarrayBolt: Building A Distributed ndarray
Bolt: Building A Distributed ndarray
 
Recent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced AnalyticsRecent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced Analytics
 

Similar to Interactive Analytics using Apache Spark

Five cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterFive cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark faster
Tim Ellison
 
20151015 zagreb spark_notebooks
20151015 zagreb spark_notebooks20151015 zagreb spark_notebooks
20151015 zagreb spark_notebooks
Andrey Vykhodtsev
 
Apache Big Data Europe 2016
Apache Big Data Europe 2016Apache Big Data Europe 2016
Apache Big Data Europe 2016
Tim Ellison
 
Scala & Spark Online Training
Scala & Spark Online TrainingScala & Spark Online Training
Scala & Spark Online Training
Learntek1
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
Big Data Spain
 
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
J On The Beach
 
Cloud scale predictive DevOps automation using Apache Spark: Velocity in Amst...
Cloud scale predictive DevOps automation using Apache Spark: Velocity in Amst...Cloud scale predictive DevOps automation using Apache Spark: Velocity in Amst...
Cloud scale predictive DevOps automation using Apache Spark: Velocity in Amst...
Romeo Kienzler
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8
Janu Jahnavi
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8
Janu Jahnavi
 
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
Luciano Resende
 
Getting Started with Spark Scala
Getting Started with Spark ScalaGetting Started with Spark Scala
Getting Started with Spark Scala
Knoldus Inc.
 
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkAn Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
Luciano Resende
 
JDD2015: Towards the Fastest (J)VM on the Planet! - Jaroslav Tulach
JDD2015: Towards the Fastest (J)VM on the Planet! - Jaroslav TulachJDD2015: Towards the Fastest (J)VM on the Planet! - Jaroslav Tulach
JDD2015: Towards the Fastest (J)VM on the Planet! - Jaroslav Tulach
PROIDEA
 
OpenDataPlane Project
OpenDataPlane ProjectOpenDataPlane Project
OpenDataPlane Project
GlobalLogic Ukraine
 
Serverless Java: JJUG CCC 2019
Serverless Java: JJUG CCC 2019Serverless Java: JJUG CCC 2019
Serverless Java: JJUG CCC 2019
Shaun Smith
 
spark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark examplespark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark example
ShidrokhGoudarzi1
 
Introduction to Apache Beam
Introduction to Apache BeamIntroduction to Apache Beam
Introduction to Apache Beam
Knoldus Inc.
 
A Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceA Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark Performance
Tim Ellison
 
20180417 hivemall meetup#4
20180417 hivemall meetup#420180417 hivemall meetup#4
20180417 hivemall meetup#4
Takeshi Yamamuro
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel Gateway
Luciano Resende
 

Similar to Interactive Analytics using Apache Spark (20)

Five cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterFive cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark faster
 
20151015 zagreb spark_notebooks
20151015 zagreb spark_notebooks20151015 zagreb spark_notebooks
20151015 zagreb spark_notebooks
 
Apache Big Data Europe 2016
Apache Big Data Europe 2016Apache Big Data Europe 2016
Apache Big Data Europe 2016
 
Scala & Spark Online Training
Scala & Spark Online TrainingScala & Spark Online Training
Scala & Spark Online Training
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
A Java Implementer's Guide to Boosting Apache Spark Performance by Tim Ellison.
 
Cloud scale predictive DevOps automation using Apache Spark: Velocity in Amst...
Cloud scale predictive DevOps automation using Apache Spark: Velocity in Amst...Cloud scale predictive DevOps automation using Apache Spark: Velocity in Amst...
Cloud scale predictive DevOps automation using Apache Spark: Velocity in Amst...
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8
 
Apache spark with java 8
Apache spark with java 8Apache spark with java 8
Apache spark with java 8
 
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
 
Getting Started with Spark Scala
Getting Started with Spark ScalaGetting Started with Spark Scala
Getting Started with Spark Scala
 
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkAn Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
 
JDD2015: Towards the Fastest (J)VM on the Planet! - Jaroslav Tulach
JDD2015: Towards the Fastest (J)VM on the Planet! - Jaroslav TulachJDD2015: Towards the Fastest (J)VM on the Planet! - Jaroslav Tulach
JDD2015: Towards the Fastest (J)VM on the Planet! - Jaroslav Tulach
 
OpenDataPlane Project
OpenDataPlane ProjectOpenDataPlane Project
OpenDataPlane Project
 
Serverless Java: JJUG CCC 2019
Serverless Java: JJUG CCC 2019Serverless Java: JJUG CCC 2019
Serverless Java: JJUG CCC 2019
 
spark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark examplespark example spark example spark examplespark examplespark examplespark example
spark example spark example spark examplespark examplespark examplespark example
 
Introduction to Apache Beam
Introduction to Apache BeamIntroduction to Apache Beam
Introduction to Apache Beam
 
A Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark PerformanceA Java Implementer's Guide to Better Apache Spark Performance
A Java Implementer's Guide to Better Apache Spark Performance
 
20180417 hivemall meetup#4
20180417 hivemall meetup#420180417 hivemall meetup#4
20180417 hivemall meetup#4
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel Gateway
 

Recently uploaded

一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 

Recently uploaded (20)

一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 

Interactive Analytics using Apache Spark

  • 1. © 2015 IBM Corporation Interactive Analytics Using Apache Spark Bangalore Spark Enthusiasts Group http://www.meetup.com/Bangalore-Spark-Enthusiasts/ 1 Bagavath Subramaniam, IBM Analytics Shally Sangal, IBM Analytics
  • 2. © 2015 IBM Corporation Agenda ▪ Overview of Interactive Analytics ▪ Spark Application User Types ▪ Spark Context ▪ Spark Shell ▪ Spark Submit ▪ Spark JDBC Thrift Server ▪ Apache Zeppelin ▪ Jupyter ▪ Spark Kernel ▪ Spark Job Server ▪ Livy 2
  • 3. © 2015 IBM Corporation Spark Application User Types Data Scientist ▪ Data Exploration ▪ Data Wrangling ▪ Build Models from Data using Algorithms - Predict/Prescribe ▪ Knowledge in Statistics & Maths ▪ R/Python, Matlab/SPSS ▪ Ad-hoc analysis using Interactive Shells Data Analyst ▪ Data Exploration and Visualization ▪ Understands data sources and relationships among them in an Enterprise ▪ Relates data to business and derives insights, can talk business language ▪ May have basic programming skills and analytic tools knowledge ▪ Ad-hoc analysis using canned reports ▪ Limited usage of interactive shells 3
  • 4. © 2015 IBM Corporation Typical User Roles Business Analyst ▪ Industry Expert ▪ Understand business needs and works on solutions ▪ Improve business processes and design new systems to support them ▪ Not a programmer / Analytics expert ▪ Typical user of reporting systems Data Engineer / Application Developer ▪ Programmer with S/W Engineering background ▪ Builds production data pipelines, data warehouses, reporting solutions and apps ▪ Productionize models built by data scientists ▪ Builds s/w applications to solve business problems ▪ Maintains, Monitors, Tunes data processing platform and applications > Roles are often fluid and overlapping 4
  • 5. © 2015 IBM Corporation Interactive Tools for Spark Apache Spark IBM Spark Kernel (Apache Toree) Cloudera Livy Ooyala Spark Job Server 5
  • 6. © 2015 IBM Corporation User and Tools Primary set of tools for each role Spark Shell Spark Submit Thrift JDBC Server Zeppelin Spark Kernel Jupyter Livy Hue Data Scientist Data Analyst Developer Business Analyst 6 Spark Job Server
  • 7. © 2015 IBM Corporation Spark Context ▪ Common thread for all Spark Interfaces ▪ Main entry point for Spark, represents the connection to a Spark cluster ▪ Standalone, Yarn, Mesos, Local ▪ Holds all the configuration - memory, cores, parallelism, compression ▪ Create RDDs, accumulators, broadcast variables ▪ Run Jobs, Cancel Jobs ▪ One Spark Context per JVM limitation, one application Id ▪ Supports parallel jobs from separate threads ▪ Scheduler mode - FIFO / Fair (within an Application) ▪ Fair Scheduler − Pools - spark.scheduler.pool − Weights/Priorities, Scheduling Mode, Weights 7
  • 8. © 2015 IBM Corporation Spark shell Spark Shell ▪ Interactive shell ( spark-shell for scala, pyspark for python) ▪ spark-shell is based on scala REPL ▪ Instantiates Spark Context by default, also a UI ▪ Also gives sqlContext which is hiveContext ▪ Internally calls spark-submit {SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name Spark shell ▪ All parameters of Spark submit can be passed to Spark shell as well 8
  • 9. © 2015 IBM Corporation Spark submit ▪ Launch/Submit a spark application to a spark cluster ▪ org.apache.spark.launcher.Main gets called with org.apache.spark.deploy.SparkSubmit as a parameter along with the other params passed to spark-submit org.apache.spark.launcher.Main org.apache.spark.deploy.SparkSubmit "$@" ▪ spark-submit --help => list of supported parameters ▪ kill a job ( spark-submit --kill) and get job status (spark-submit --status) ▪ spark-defaults.conf in SPARK_CONF_DIR ▪ Precedence - Explicit set on SparkConf, flags passed to spark-submit, values from defaults 9
  • 10. © 2015 IBM Corporation ▪ Web based notebook for interactive analytics. ▪ Provides built in Spark integration. ▪ Supports many interpreters such as Scala, Pyspark,SparkSQL, Hive, Shell etc. ▪ It starts a Zeppelin server. ▪ Spawns one JVM per interpreter group. ▪ Server communicates with the Interpreter Group using Thrift. 10 Apache Zeppelin
  • 11. © 2015 IBM Corporation Zeppelin Demo To know more : http://zeppelin.incubator.apache.org/ 11
  • 12. © 2015 IBM Corporation Jupyter Notebook Web notebook for interactive data analysis.Part of Jupyter ecosystem. Evolved from IPython, works on the IPython messaging Protocol. Has the concept of Kernels - any language kernel can be plugged in which implements the protocol. Spark kernel is available via Apache Toree. 12
  • 13. © 2015 IBM Corporation Jupyter Notebook Demo To know more : http://jupyter.org/ 13
  • 14. © 2015 IBM Corporation Spark Kernel ( Apache Toree ) Kernel provides the foundation for interactive applications to connect to use Spark. Provides an interface that allows clients to interact with a Spark Cluster. Clients can send code snippets and libraries that are interpreted and run against a pre configured SparkContext. Acts as a proxy between your application and the Spark Cluster. 14
  • 15. © 2015 IBM Corporation Kernel Architecture Kernel uses ZeroMQ as its messaging middleware using TCP sockets and implements the IPython message protocol. It is architected in layers, where each layer has a specific purpose in processing of requests. Provides concurrency and code isolation by use of Akka framework. 15
  • 16. © 2015 IBM Corporation How does it talk to Spark? Kernel is launched by a spark-submit process. It works with local spark, Standalone Spark Cluster as well as Spark with Yarn. SPARK_HOME is a mandatory environment variable needed. SPARK_OPTS is an optional environment variable - we can use to configure spark master, deploy mode, driver memory, number of executors etc. Uses the same Scala Interpreter as Spark shell. The interpreter holds a Spark Context and the class server uri used to host compiled code. 16
  • 17. © 2015 IBM Corporation How to communicate with Kernel Two forms of communication : 1. Client library for code execution 1. Directly talk to Kernel like Jupyter notebook 17
  • 18. © 2015 IBM Corporation Kernel Client Library Written in Scala. Eliminates need to understand ZeroMQ message protocol. Enables treating the kernel as a remote service. Shares majority of its code with the kernel’s codebase. Two steps to using the client : 1. Initialize the client with the connection details of the kernel. 2. Use the execute API to run code snippets with attached callbacks. 18
  • 19. © 2015 IBM Corporation How to run Kernel and Client: https://github.com/ibm-et/spark-kernel/wiki/Getting-Started-with-the-Spark-Kernel https://github.com/ibm-et/spark-kernel/wiki/Guide-for-the-Spark-Kernel-Client CODE DEMO 19
  • 20. © 2015 IBM Corporation Comm API As part of the IPython message protocol, the Comm API allows developers to specify custom messages to communicate data and perform actions on both the frontend(client) and backend(kernel). This API is useful in scenarios where we want to do same actions for the messages. Either client or kernel can start sending messages. 20
  • 21. © 2015 IBM Corporation Livy Livy is an open source REST interface for interacting with Spark. It supports executing code snippets of Python, Scala, R. It is currently used to power the Spark snippets of Hadoop Notebook in Hue. Multiple contexts by using multiple sessions or multiple users to same session. 21
  • 22. © 2015 IBM Corporation LIVY CODE EXECUTION DEMO To know more : https://github.com/cloudera/hue/tree/master/apps/spark/java 22
  • 23. © 2015 IBM Corporation Spark Job Server JobServer provides a REST interface for submitting and managing Spark jobs/jars. It is intended to be run as one or more independent processes, separate from the Spark cluster or within the spark cluster. It works with Mesos as well as Yarn. It supports multiple Spark Context. Runs SparkContext in their own forked JVM process. This is available via a config parameter spark.jobserver.context-per-jvm. It is by default set to false for local development mode, but recommended to be set to true for production deployment. It exposes APIs to upload your jars, get contexts, run jobs, get data, configure contexts etc. It used Spray, Akka actors , Akka Cluster for separate contexts. 23
  • 24. © 2015 IBM Corporation JOB SERVER DEMO To know more : https://github.com/spark-jobserver/spark-jobserver 24