© 2017 MapR Technologies 1
Spark and MapR Streams: A Motivating Example
Abstract
• Businesses are discovering the untapped potential of large datasets and data
streams through the use of technologies for big data processing and storage. By
leveraging these assets they’re creating a new generation of applications that
derive value from data they used to throw away.
• In this presentation we’ll discuss how to build operational environments for
these types of applications with the MapR Converged Data Platform and we’ll
walk through an example of a next-generation application that uses Java APIs
for MapR Streams, Apache Spark, Apache Hive, and MapR-DB.
• We’ll see how these technologies can be used to join and transform unbounded
datasets to find signals and derive new data streams for a financial scenario
involving real-time algorithmic trading and historical analysis using SQL.
• We’ll also discuss how MapR enables you to run real-time data applications with
the speed, reliability, and security you need for a production environment.
• Keywords: MapR, Spark, Kafka, NoSQL, JSON, Zeppelin, Hive, streaming
Contact Info
Ian Downard
Technical Evangelist at MapR Technologies
idownard@mapr.com
Personal Blog: http://bigendiandata.com
Twitter: @iandownard
Learning Goals
1. Appreciate the opportunity of the time we’re in.
2. Become familiar with MapR.
3. Become familiar with Spark.
4. Feel empowered.
Why Now?
• But Moore’s law has applied for a long time
• Why is data exploding now?
• Why not 10 years ago?
• Why not 20?
Because data wasn’t available?
• If it were just availability of data then existing big companies
would adopt big data technology first
They didn’t
Because processing it was too expensive?
• If it were just a net positive value then finance companies
should adopt first because they have higher opportunity value
/ byte
They didn’t
Backwards adoption
• Under almost any argument, startups would not have adopted
big data technology first
They did
Everywhere at Once?
• Something very strange is happening
– Big data is being applied at many different scales
– By large companies and small
Why?
Data Analytics Scaling Laws
• Analytics scaling is all about:
– Big gains for little initial effort
– Rapidly diminishing returns
• The key to net value is how costs scale
– Old school – exponential scaling
– Big data – linear scaling, low constant
• Cost/performance has radically changed
– Cluster computing, commodity hardware, data science frameworks…
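The scaling argument above can be made concrete with a toy model. The sketch below is an illustration only: the saturating value curve and both cost curves, including every constant in them, are assumptions chosen to mimic "big gains for little initial effort, rapidly diminishing returns". None of the numbers come from the slides.

```java
import java.util.function.DoubleUnaryOperator;

class NetValueSketch {
    // Assumed value curve: big early gains, rapidly diminishing returns.
    static double value(double scale) {
        return 1.0 - Math.exp(-scale / 300.0);
    }

    // Assumed "old school" cost: grows exponentially with scale.
    static double exponentialCost(double scale) {
        return 0.01 * (Math.exp(scale / 250.0) - 1.0);
    }

    // Assumed "big data" cost: linear in scale with a low constant.
    static double linearCost(double scale) {
        return 0.0001 * scale;
    }

    // Scans scales 0..2000 and returns the scale with the highest net value
    // (value minus cost) for the given cost model.
    static int bestScale(DoubleUnaryOperator cost) {
        int best = 0;
        double bestNet = Double.NEGATIVE_INFINITY;
        for (int s = 0; s <= 2000; s++) {
            double net = value(s) - cost.applyAsDouble(s);
            if (net > bestNet) {
                bestNet = net;
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("optimum scale, exponential cost: "
                + bestScale(NetValueSketch::exponentialCost));
        System.out.println("optimum scale, linear cost:      "
                + bestScale(NetValueSketch::linearCost));
    }
}
```

Under these assumed curves the linear model costs more at small scales but lets the net-value optimum sit at a much larger scale, which is the shape of the argument the slides make.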
[Plot: Value (0 to 1) vs. Scale (0 to 2,000)]
Most data isn’t worth much in isolation
First data is valuable
Later data is dregs
[Plot: Value vs. Scale]
Suddenly worth processing
But has high aggregate value
[Plot: Value vs. Scale]
If we can handle the scale
It’s really big
So what makes that possible?
[Plots: Value vs. Scale]
Net value optimum has a sharp peak well before maximum effort
[Plot: Value vs. Scale]
But scaling laws are changing both slope and shape
More than just a little
They are changing a LOT!
[Plot: Value vs. Scale]
Initially, linear cost scaling actually makes things worse
Then a tipping point is reached and things change radically …
MapR Overview
How do you persist data?
All major persistence abstractions are one of these:
• Files
• Streams
• Tables
[Diagram: example icons labeled “tokyo” and “User profiles”]
[Diagram: a pipeline from SOURCE DATA through STREAM PROCESSING & STORAGE to FINAL OUTPUT STORAGE, drawn with separate HDFS, Kafka, Spark, and Cassandra / Mongo clusters]
“Classic” streaming involves single-purpose clusters.
[Diagram: a pipeline from SOURCE DATA through STREAM PROCESSING & STORAGE to FINAL OUTPUT STORAGE, drawn with MapR-FS, MapR Streams, Spark, and MapR-DB]
MapR converges the data layer into a single cluster.
What is MapR?
A Converged Data Platform
MapR Converged Data Platform
[Diagram of the platform layers:]
• Open Source Engines & Tools, Commercial Engines & Applications, Cloud and Managed Services, Custom Apps, Search and Others
• Data Processing
• APIs: HDFS API, POSIX, NFS, HBase API, JSON API, Kafka API
• Web-Scale Storage (MapR-FS), Database (MapR-DB), Event Streaming (MapR Streams)
• Enterprise-Grade Platform Services: High Availability, Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace
• Unified Management and Monitoring
“Convergence” means…
• One cluster that does it all: Files + Tables + Streams
• Standard APIs for everything
• A distributed file system that looks “normal” (POSIX)
• Unified Management
• Global Namespace
• Mirroring, Replication, and Snapshots
– Synchronize files, tables, and streams across datacenters
– True failover for your applications
How do I use MapR?
• Installs on Linux (e.g. Ubuntu, Red Hat), typically onto a block
device, and typically as a cluster of 3 or more nodes.
• Packaged as a scriptable / web-based installer, cloud
marketplace offers, and Docker containers.
• Sandbox VMs for your laptop.
MapR In Action
Apply MapR as a data layer for containers.
[Diagram: Producer, Servlet Engine, HTTP Log, Browser]
Procedure
1. Download Sandbox
– Configure for Host-only Adapter
2. Clone the GitHub repo
3. Compile code
4. Build Docker images
5. Create the MapR Stream topics
6. Run the Docker containers
Docker / MapR demo commands
1. git clone https://github.com/mapr-demos/mapr-pacc-sample
2. maprcli stream create -path /apps/sensors -produceperm p -consumeperm p -topicperm p
3. maprcli stream topic create -path /apps/sensors -topic computer
4. /opt/mapr/kafka/kafka-0.9.0/bin/kafka-console-consumer.sh --new-consumer --bootstrap-server this.will.be.ignored:9092 --topic /apps/sensors:computer
5. docker run -it -e MAPR_CLDB_HOSTS=192.168.99.3 -e MAPR_CLUSTER=demo.cluster.com -e MAPR_CONTAINER_USER=mapr --name producer -i -t mapr-sensor-producer
6. docker run -it --privileged --cap-add SYS_ADMIN --cap-add SYS_RESOURCE --device /dev/fuse -e MAPR_CLDB_HOSTS=192.168.99.3 -e MAPR_CLUSTER=demo.cluster.com -e MAPR_CONTAINER_USER=mapr -e MAPR_MOUNT_PATH=/mapr -p 8080:8080 --device /dev/fuse --name web -i -t mapr-web-consumer
7. Open http://localhost:8080
8. Open http://192.168.99.3:8443
References
• MapR Sandbox
http://maprdocs.mapr.com/home/SandboxHadoop/t_install_sandbox_vbox.html
• MapR sample application
https://mapr.com/blog/getting-started-mapr-client-container/
• MapR Tutorials
https://mapr.com/developercentral/code/
Apache Spark
https://databricks.com/spark/about
Resilient Distributed Datasets (RDDs)
• RDDs let programmers perform in-memory computations on
large distributed datasets in a fault-tolerant manner.
• An RDD is a representation of data that may or may not be on
your local machine. It’s partitioned across the cluster (like a
distributed Java Collection).
• An RDD is immutable.
– JavaRDD<String> lines = sc.textFile("/path/to/data.log");
• When you read data, nothing gets loaded. You’re not even
opening it. You first declare the operations that you’re going to
perform; the data is loaded and operated upon only when you
perform an action that materializes it.
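Running Spark itself needs a Spark installation, so the sketch below uses plain java.util.stream as an analogy, not the Spark API. Like RDD transformations, intermediate stream operations are only declared, and nothing is evaluated until a terminal operation (the analogue of a Spark action) asks for a result. The class, method, and variable names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyEvaluationSketch {
    // Records which elements the pipeline actually touched, so we can see when work happens.
    static final List<String> touched = new ArrayList<>();

    // Declares the pipeline (like RDD transformations); evaluates nothing yet.
    static Stream<String> declareErrorLines(List<String> lines) {
        return lines.stream()
                .peek(touched::add)                      // side effect used only to observe evaluation
                .filter(line -> line.contains("ERROR"))  // lazy
                .map(String::toUpperCase);               // lazy
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "INFO start", "ERROR disk full", "INFO done", "ERROR timeout");

        Stream<String> errors = declareErrorLines(log);
        System.out.println("elements touched after declaring:  " + touched.size());

        // The terminal operation triggers the whole pipeline.
        List<String> result = errors.collect(Collectors.toList());
        System.out.println("elements touched after collecting: " + touched.size());
        System.out.println(result);
    }
}
```

The analogy stops at laziness: a Java stream runs in one JVM and can be consumed once, whereas an RDD is partitioned across a cluster and can be recomputed or cached.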
Resilient Distributed Datasets (RDDs)
1. Start by reading from files, a DB, etc. to create a top-level RDD.
2. Lazy transformations
.filter(), .map(), .sample(), .repartition() (which shuffles data)
3. Actions (retrieval of the data) trigger the work to finally run.
Some, such as .collect(), pull all the data into the driver JVM.
.count(), .collect(), .saveToCassandra() (from the Spark Cassandra Connector)
4. Once you have an RDD you’d like to work on, you can call
.cache() on it to keep it around, so you don’t have to derive it
again. By default, cache() keeps an RDD in memory (MEMORY_ONLY);
use .persist() with another storage level to also use disk.
Resilient Distributed Datasets (RDDs)
• The RDD is the building block of Spark.
– DataFrame, Dataset, DStream, etc. are all abstractions built on RDDs
• Immutable
• Operated on by lambda functions
• Lazily evaluated
• Kick off parallel execution with actions like collect(), count(),
etc.
What is Spark Streaming?
• Enables scalable, high-throughput, fault-tolerant stream
processing of live data
• Run continuous SQL queries on data pushed into Kafka
[Diagram: Data Sources, Data Sinks]
[Diagram: tail -f; MapR Streams store and expose stream data for processing; Output action]
Spark Streaming Architecture
• Processed results are pushed out in batches
[Diagram: the input data stream enters Spark Streaming, which splits it by batch interval (data from time 0 to 1 becomes RDD @ time 1, data from time 1 to 2 becomes RDD @ time 2, data from time 2 to 3 becomes RDD @ time 3); Spark then emits batches of processed results]
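As a rough picture of the batch-interval idea, the sketch below groups timestamped events into fixed-width batches using plain Java collections. It is not Spark Streaming code: the Event class, the millisecond timestamps, and the batchIntervalMs parameter are all assumptions for illustration, and timestamps are assumed to be non-negative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MicroBatchSketch {
    // A timestamped message, standing in for one record on the input stream.
    static final class Event {
        final long timestampMs;
        final String payload;

        Event(long timestampMs, String payload) {
            this.timestampMs = timestampMs;
            this.payload = payload;
        }
    }

    // Assigns each event to the batch whose interval contains its timestamp.
    // Key = batch start time in ms; value = payloads in arrival order
    // (each value plays the role of one per-interval RDD).
    static Map<Long, List<String>> toBatches(List<Event> events, long batchIntervalMs) {
        Map<Long, List<String>> batches = new TreeMap<>();
        for (Event e : events) {
            long batchStart = (e.timestampMs / batchIntervalMs) * batchIntervalMs;
            batches.computeIfAbsent(batchStart, k -> new ArrayList<>()).add(e.payload);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Event> input = new ArrayList<>();
        input.add(new Event(100, "a"));
        input.add(new Event(900, "b"));
        input.add(new Event(1200, "c"));
        input.add(new Event(2500, "d"));

        // With a 1000 ms interval: [0,1000) holds a and b, [1000,2000) holds c, [2000,3000) holds d.
        toBatches(input, 1000).forEach((start, batch) ->
                System.out.println("batch starting at " + start + " ms: " + batch));
    }
}
```

A real streaming job receives events continuously and closes each batch when its interval elapses; this sketch only shows the grouping rule.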
Spark In Action
• Spark Shell
• Spark SQL in Zeppelin
• Spark SQL Databricks Notebook
• Spark Streaming Java API
• Debugging Spark with IntelliJ
Databricks Cloud
• Spark notebook in the cloud
– https://community.cloud.databricks.com/
• Sample notebooks:
– https://databricks.com/resources/type/example-notebooks
Spark Shell (aka REPL)
• If you install Spark locally, you get this.
• Evaluates each command immediately when you type it in, and
shows you the output.
• Fantastic way to experiment, with tab completion.
Apache Zeppelin
Debugging Spark with IntelliJ
export SPARK_SUBMIT_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=4000
Monitoring
http://[hostname]:4040/jobs/
Spark Streaming + ML on MapR
• Predict the location and time of taxi requests.
– https://mapr.com/blog/monitoring-real-time-uber-data-using-spark-machine-learning-streaming-and-kafka-api-part-1
– https://mapr.com/blog/monitoring-real-time-uber-data-using-spark-machine-learning-streaming-and-kafka-api-part-2
[Diagram: streaming topic (location and time of taxi requests); K-means clustering (Spark ML on Uber dataset); classification models (Spark ML); predicted and actual pickup locations and times; ridership analytics (Zeppelin)]
Streaming + ML demo procedure
1. Create topics:
maprcli stream create -path /user/mapr/stream -produceperm p -consumeperm p -topicperm p
maprcli stream topic create -path /user/mapr/stream -topic ubers -partitions 3
maprcli stream topic create -path /user/mapr/stream -topic uberp -partitions 3
2. Create and save the kmeans model to /mapr/my.cluster.com/user/mapr/data/savemodel:
/opt/mapr/spark/spark-2.0.1/bin/spark-submit --class com.sparkml.uber.ClusterUber --master local[2] /home/mapr/mapr-sparkml-streaming-uber/target/mapr-sparkml-streaming-uber-1.0.jar
3. Send the test dataset to a stream (just to illustrate using a stream):
java -cp /home/mapr/mapr-sparkml-streaming-uber/target/mapr-sparkml-streaming-uber-1.0.jar:`mapr classpath` com.streamskafka.uber.MsgProducer /user/mapr/stream:ubers /mapr/my.cluster.com/user/mapr/data/uber.csv
4. Monitor the test dataset (optional, on nodeb):
java -cp /home/mapr/mapr-sparkml-streaming-uber/target/mapr-sparkml-streaming-uber-1.0.jar:`mapr classpath` com.streamskafka.uber.MsgConsumer /user/mapr/stream:ubers
5. Use the model to predict the cluster for incoming taxi telemetry, and output predictions to a topic:
/opt/mapr/spark/spark-2.0.1/bin/spark-submit --class com.sparkkafka.uber.SparkKafkaConsumerProducer --master local[2] /home/mapr/mapr-sparkml-streaming-uber/target/mapr-sparkml-streaming-uber-1.0-jar-with-dependencies.jar /user/mapr/data/savemodel /user/mapr/stream:ubers /user/mapr/stream:uberp
6. Read the predictions topic and put it into a format that we can analyze ad hoc in SQL:
/opt/mapr/spark/spark-2.0.1/bin/spark-submit --class com.sparkkafka.uber.SparkKafkaConsumer --master local[2] /home/mapr/mapr-sparkml-streaming-uber/target/mapr-sparkml-streaming-uber-1.0-jar-with-dependencies.jar /user/mapr/stream:uberp
7. Open http://nodea:4040
Real-Time Stock Market Analysis
https://mapr.com/appblueprint
https://github.com/mapr-demos/finserv-application-blueprint
Advanced Concept: Look Back for n Seconds on a Topic
[Diagram: a Data Topic and an Offset Topic drawn along a time axis]
Time:   t₀   t₁   t₂   t₃   t₄   t₅
Offset: 3253 3347 3467 3608 3798 3913
Offset Topic: Key = Time t, Value = Offset of Data Topic at t
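The lookup that the offset topic enables can be sketched with an in-memory sorted map. This is an illustration under assumptions, not the demo's actual code: the class and method names are hypothetical, times are whole seconds with t₀ = 0 and one entry per second, and a TreeMap stands in for the offset topic.

```java
import java.util.Map;
import java.util.TreeMap;

class OffsetLookbackSketch {
    // In-memory stand-in for the offset topic:
    // key = time in seconds, value = offset of the data topic at that time.
    private final TreeMap<Long, Long> offsetsByTime = new TreeMap<>();

    void record(long timeSeconds, long dataTopicOffset) {
        offsetsByTime.put(timeSeconds, dataTopicOffset);
    }

    // Returns the data-topic offset to start reading from in order to replay
    // the last nSeconds, or -1 if nothing was recorded at or before (now - n).
    long offsetForLookback(long nowSeconds, long nSeconds) {
        Map.Entry<Long, Long> entry = offsetsByTime.floorEntry(nowSeconds - nSeconds);
        return entry == null ? -1L : entry.getValue();
    }

    public static void main(String[] args) {
        OffsetLookbackSketch index = new OffsetLookbackSketch();
        long[] offsets = {3253, 3347, 3467, 3608, 3798, 3913}; // values from the slide, t0..t5
        for (int t = 0; t < offsets.length; t++) {
            index.record(t, offsets[t]); // assumption: one entry per second, t0 = 0
        }
        // Looking back 2 seconds from t5 resolves to the entry recorded at t3.
        System.out.println(index.offsetForLookback(5, 2));
    }
}
```

A real consumer would then position itself on the data topic at the returned offset, for example with the Kafka consumer's seek call, and read forward from there.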
MapR Streams vs Kafka
Call To Action
• You can foster innovation just by making data available.
• Seeking career advancement?
– Coursera classes on data science, ML, Spark, etc.
– Be a polyglot.
– Enable data science from development to production.
• You can apply those skills in ANY industry.
• Don’t be afraid of not knowing much.
• 87% of career builders attribute career benefit to completing
online courses (Harvard Business Review, Coursera)
– Be better equipped for current job, find a new job, change career.
All Industries
• ETL / DW optimization
• Mainframe optimization
• Real-time application & network monitoring
• Security information & event management
Web 2.0
• Recommendation engines & targeting
• Customer 360
• Click-stream analysis
• Social media analysis
• Ad optimization
Healthcare
• Patient system of record
• Smart hospitals
• Biometrics
• Patient vital monitoring
Telecom
• Fraud detection
• Crowd-based antenna optimization
• Charging & billing
• Equipment monitoring & preventative maintenance
• Smart meter analysis
Oil & Gas
• Pump monitoring & alerting
• Seismic trace identification
• Equipment maintenance
• Safety & environment
• Security
Financial Services
• Real-time fraud/risk monitoring
• Mobile notifications of transactions
Retail
• Real-time supply chain optimization
• Customer location optimization
• Real-time coupons
Ad Tech
• Ad targeting & optimization
• Global campaign dashboards
Have an interesting use case? Let’s talk!

More Related Content

What's hot

Demystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep LearningDemystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep LearningCarol McDonald
 
Applying Machine Learning to Live Patient Data
Applying Machine Learning to  Live Patient DataApplying Machine Learning to  Live Patient Data
Applying Machine Learning to Live Patient DataCarol McDonald
 
How Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health CareHow Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health CareCarol McDonald
 
MapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Technologies
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataCarol McDonald
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...Carol McDonald
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Carol McDonald
 
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataState of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataMathieu Dumoulin
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
 
Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscapeMapR Technologies
 
#EarthOnAWS | AWS Public Sector Summit 2017
#EarthOnAWS | AWS Public Sector Summit 2017#EarthOnAWS | AWS Public Sector Summit 2017
#EarthOnAWS | AWS Public Sector Summit 2017Amazon Web Services
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureMapR Technologies
 
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...MapR Technologies
 
NoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBNoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBMapR Technologies
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsMapR Technologies
 

What's hot (20)

Demystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep LearningDemystifying AI, Machine Learning and Deep Learning
Demystifying AI, Machine Learning and Deep Learning
 
Applying Machine Learning to Live Patient Data
Applying Machine Learning to  Live Patient DataApplying Machine Learning to  Live Patient Data
Applying Machine Learning to Live Patient Data
 
How Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health CareHow Big Data is Reducing Costs and Improving Outcomes in Health Care
How Big Data is Reducing Costs and Improving Outcomes in Health Care
 
MapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data PlatformMapR Streams and MapR Converged Data Platform
MapR Streams and MapR Converged Data Platform
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming Data
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real- T...
 
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
Applying Machine Learning to IOT: End to End Distributed Pipeline for Real-Ti...
 
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor DataState of the Art Robot Predictive Maintenance with Real-time Sensor Data
State of the Art Robot Predictive Maintenance with Real-time Sensor Data
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscape
 
#EarthOnAWS | AWS Public Sector Summit 2017
#EarthOnAWS | AWS Public Sector Summit 2017#EarthOnAWS | AWS Public Sector Summit 2017
#EarthOnAWS | AWS Public Sector Summit 2017
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,...
 
NoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DBNoSQL Application Development with JSON and MapR-DB
NoSQL Application Development with JSON and MapR-DB
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 

Viewers also liked

Data Pipelines Made Simple with Apache Kafka
Data Pipelines Made Simple with Apache KafkaData Pipelines Made Simple with Apache Kafka
Data Pipelines Made Simple with Apache Kafkaconfluent
 
JustGiving – Serverless Data Pipelines, API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines,  API, Messaging and Stream ProcessingJustGiving – Serverless Data Pipelines,  API, Messaging and Stream Processing
JustGiving – Serverless Data Pipelines, API, Messaging and Stream ProcessingLuis Gonzalez
 
Josh Vanderbaan 8.3 sportsmanship case study
Josh Vanderbaan 8.3 sportsmanship case studyJosh Vanderbaan 8.3 sportsmanship case study
Josh Vanderbaan 8.3 sportsmanship case studyJosh Vanderbaan
 
Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Multi-Datacenter Kafka - Strata San Jose 2017
Multi-Datacenter Kafka - Strata San Jose 2017Gwen (Chen) Shapira
 
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...
Strata+Hadoop 2017 San Jose - The Rise of Real Time: Apache Kafka and the Str...confluent
 
Seattle spark-meetup-032317
Seattle spark-meetup-032317Seattle spark-meetup-032317
Seattle spark-meetup-032317Nan Zhu
 
Virus Marburgo - María José Merlo Nieva
Virus Marburgo - María José Merlo NievaVirus Marburgo - María José Merlo Nieva
Virus Marburgo - María José Merlo NievaMariaJose0742
 
Presentación de protocolo de investigación
Presentación de protocolo de investigaciónPresentación de protocolo de investigación
Presentación de protocolo de investigaciónGloria lastre
 
ゲームアイディア発想における発想法の有効性評価
ゲームアイディア発想における発想法の有効性評価ゲームアイディア発想における発想法の有効性評価
ゲームアイディア発想における発想法の有効性評価Kazuki Miyanishi
 
SOLD! 1133 Calle Las Trancas in Thousand Oaks' Lynn Ranch - 5+3.5 nearly 3000...
SOLD! 1133 Calle Las Trancas in Thousand Oaks' Lynn Ranch - 5+3.5 nearly 3000...SOLD! 1133 Calle Las Trancas in Thousand Oaks' Lynn Ranch - 5+3.5 nearly 3000...
SOLD! 1133 Calle Las Trancas in Thousand Oaks' Lynn Ranch - 5+3.5 nearly 3000...Ryan Huggins
 
3Com 3C16476A
3Com 3C16476A3Com 3C16476A
3Com 3C16476Asavomir
 
Metodos de programcion no lineal
Metodos de programcion no linealMetodos de programcion no lineal
Metodos de programcion no linealAngel Jhoan
 
Deontologia e etica profissional
Deontologia e etica profissionalDeontologia e etica profissional
Deontologia e etica profissionalMonalisa Ferreira
 
Sem group investor presentation february 2017 final
Sem group investor presentation february 2017 finalSem group investor presentation february 2017 final
Sem group investor presentation february 2017 finalSemGroupCorporation
 
Tapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile AdvertisingTapjoy: Building a Real-Time Data Science Service for Mobile Advertising
Tapjoy: Building a Real-Time Data Science Service for Mobile AdvertisingSingleStore
 
Ingesting Drone Data into Big Data Platforms
Ingesting Drone Data into Big Data Platforms Ingesting Drone Data into Big Data Platforms
Ingesting Drone Data into Big Data Platforms Timothy Spann
 
A primer on building real time data-driven products
A primer on building real time data-driven productsA primer on building real time data-driven products
A primer on building real time data-driven productsLars Albertsson
 
Ferramentas web 2.0 ao serviço da organização, arquivo e divulgação de docume...
Ferramentas web 2.0 ao serviço da organização, arquivo e divulgação de docume...Ferramentas web 2.0 ao serviço da organização, arquivo e divulgação de docume...
Ferramentas web 2.0 ao serviço da organização, arquivo e divulgação de docume...João Paulo Proença
 

Viewers also liked (20)

Data Pipelines Made Simple with Apache Kafka
Data Pipelines Made Simple with Apache KafkaData Pipelines Made Simple with Apache Kafka
Spark and MapR Streams: A Motivating Example

  • 1. © 2017 MapR Technologies 1 Spark and MapR Streams: A Motivating Example
  • 2. Abstract
      • Businesses are discovering the untapped potential of large datasets and data streams through the use of technologies for big data processing and storage. By leveraging these assets, they’re creating a new generation of applications that derive value from data they used to throw away.
      • In this presentation we’ll discuss how to build operational environments for these types of applications with the MapR Converged Data Platform, and we’ll walk through an example of a next-generation application that uses Java APIs for MapR Streams, Apache Spark, Apache Hive, and MapR-DB.
      • We’ll see how these technologies can be used to join and transform unbounded datasets to find signals and derive new data streams for a financial scenario involving real-time algorithmic trading and historical analysis using SQL.
      • We’ll also discuss how MapR enables you to run real-time data applications with the speed, reliability, and security you need for a production environment.
      • Keywords: MapR, Spark, Kafka, NoSQL, JSON, Zeppelin, Hive, streaming
  • 3. Contact Info
      Ian Downard, Technical Evangelist at MapR Technologies
      Email: idownard@mapr.com
      Personal blog: http://bigendiandata.com
      Twitter: @iandownard
  • 4. Learning Goals
      1. Appreciate the opportunity of the time we’re in.
      2. Become familiar with MapR.
      3. Become familiar with Spark.
      4. Feel empowered.
  • 5. Why Now?
      • But Moore’s law has applied for a long time
      • Why is data exploding now?
      • Why not 10 years ago?
      • Why not 20?
  • 6. Because data wasn’t available?
      • If it were just availability of data, then existing big companies would have adopted big data technology first
  • 7. Because data wasn’t available?
      • If it were just availability of data, then existing big companies would have adopted big data technology first
      They didn’t
  • 8. Because processing it was too expensive?
      • If it were just a net positive value, then finance companies would have adopted first, because they have a higher opportunity value per byte
  • 9. Because processing it was too expensive?
      • If it were just a net positive value, then finance companies would have adopted first, because they have a higher opportunity value per byte
      They didn’t
  • 10. Backwards adoption
      • Under almost any argument, startups would not have adopted big data technology first
  • 11. Backwards adoption
      • Under almost any argument, startups would not have adopted big data technology first
      They did
  • 12. Everywhere at Once?
      • Something very strange is happening
          – Big data is being applied at many different scales
          – By large companies and small
  • 13. Everywhere at Once?
      • Something very strange is happening
          – Big data is being applied at many different scales
          – By large companies and small
      Why?
  • 14. Data Analytics Scaling Laws
      • Analytics scaling is all about:
          – Big gains for little initial effort
          – Rapidly diminishing returns
      • The key to net value is how costs scale:
          – Old school: exponential scaling
          – Big data: linear scaling, low constant
      • Cost/performance has radically changed:
          – Cluster computing, commodity hardware, data science frameworks…
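The net-value argument on this slide can be made concrete with a toy model. The sketch below is only an illustration: the saturating value curve, both cost curves, and every constant are assumptions picked for the example, not figures from the deck. It scans scales from 0 to 2,000 and reports where value minus cost peaks under each cost model.

```java
import java.util.function.DoubleUnaryOperator;

// Toy model of the scaling argument; every curve and constant here is an
// assumption chosen for illustration, not a figure from the slides.
final class ScalingSketch {

    // Diminishing returns: value saturates towards 1.0 as scale grows.
    static double value(double scale) {
        return 1.0 - Math.exp(-scale / 500.0);
    }

    // "Old school" cost: grows exponentially with scale.
    static double exponentialCost(double scale) {
        return 0.01 * Math.exp(scale / 300.0);
    }

    // "Big data" cost: linear with a low constant.
    static double linearCost(double scale) {
        return 0.0002 * scale;
    }

    // Scans scales 0..2000 and returns the scale with the highest net value.
    static int bestScale(DoubleUnaryOperator cost) {
        int best = 0;
        double bestNet = Double.NEGATIVE_INFINITY;
        for (int s = 0; s <= 2000; s++) {
            double net = value(s) - cost.applyAsDouble(s);
            if (net > bestNet) {
                bestNet = net;
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println("optimum, exponential cost: " + bestScale(ScalingSketch::exponentialCost));
        System.out.println("optimum, linear cost:      " + bestScale(ScalingSketch::linearCost));
    }
}
```

Under these assumed curves the optimum moves to a noticeably larger scale when costs are linear, which is the shape of the argument the following chart slides make.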
  • 15. [Chart: Value (0 to 1) vs. Scale (0 to 2,000)] Most data isn’t worth much in isolation. First data is valuable; later data is dregs.
  • 16. [Chart: Value vs. Scale] Suddenly worth processing. First data is valuable; later data is dregs, but has high aggregate value.
  • 17. [Chart: Value vs. Scale] If we can handle the scale. It’s really big.
  • 18. So what makes that possible?
  • 19. [Chart: Value vs. Scale]
  • 20. [Two charts: Value vs. Scale] Net value optimum has a sharp peak well before maximum effort.
  • 21. [Chart: Value vs. Scale] But scaling laws are changing both slope and shape.
  • 22. [Chart: Value vs. Scale] More than just a little.
  • 23. [Chart: Value vs. Scale] They are changing a LOT!
  • 24. [Chart: Value vs. Scale]
  • 25. [Chart: Value vs. Scale]
  • 26. [Chart: Value vs. Scale]
  • 27. [Chart: Value vs. Scale]
  • 28. [Chart: Value vs. Scale] Initially, linear cost scaling actually makes things worse. Then a tipping point is reached and things change radically…
  • 29. MapR Overview
  • 30. How do you persist data?
  • 31. All major persistence abstractions are one of these: files, streams, tables.
  • 32. “Classic” streaming involves single-purpose clusters. [Diagram columns: source data; stream processing & storage; final output storage. Components shown: Kafka, Spark, HDFS, Cassandra / Mongo]
  • 33. MapR converges the data layer into a single cluster. [Diagram columns: source data; stream processing & storage; final output storage. Components shown: MapR Streams, Spark, MapR-FS, MapR-DB]
  • 34. What is MapR? A Converged Data Platform
  • 35. MapR Converged Data Platform [Diagram]
      • Top layer: open source engines & tools, commercial engines & applications, custom apps, cloud and managed services
      • Data processing APIs: HDFS API, POSIX, NFS, HBase API, JSON API, Kafka API
      • Data layer: web-scale storage (MapR-FS), database (MapR-DB), event streaming (MapR Streams)
      • Enterprise-grade platform services: high availability, real time, unified security, multi-tenancy, disaster recovery, global namespace
      • Unified management and monitoring
  • 36. “Convergence” means…
      • One cluster that does it all: Files + Tables + Streams
      • Standard APIs for everything
      • A distributed file system that looks “normal” (POSIX)
      • Unified management
      • Global namespace
      • Mirroring, replication, and snapshots
          – Synchronize files, tables, and streams across datacenters
          – True failover for your applications
  • 37. How do I use MapR?
      • Installs on Linux (e.g. Ubuntu, Red Hat), typically to a block device and typically on a cluster of 3 or more nodes.
      • Packaged as a scriptable / web-based installer, cloud marketplace offers, and Docker containers.
      • Sandbox VMs for your laptop.
  • 38. MapR In Action
  • 39. Apply MapR as a data layer for containers. [Diagram labels: producer, servlet engine, HTTP, log, browser]
  • 40. Procedure
      1. Download the Sandbox
          – Configure for Host-only Adapter
      2. Download the GitHub repo
      3. Compile the code
      4. Build the Docker images
      5. Create the MapR Streams topics
      6. Run the Docker containers
  • 41. Procedure
      • Download the Sandbox
          – Configure for Host-only Adapter
  • 42. Docker / MapR demo commands
      1. git clone https://github.com/mapr-demos/mapr-pacc-sample
      2. maprcli stream create -path /apps/sensors -produceperm p -consumeperm p -topicperm p
      3. maprcli stream topic create -path /apps/sensors -topic computer
      4. /opt/mapr/kafka/kafka-0.9.0/bin/kafka-console-consumer.sh --new-consumer --bootstrap-server this.will.be.ignored:9092 --topic /apps/sensors:computer
      5. docker run -it -e MAPR_CLDB_HOSTS=192.168.99.3 -e MAPR_CLUSTER=demo.cluster.com -e MAPR_CONTAINER_USER=mapr --name producer mapr-sensor-producer
      6. docker run -it --privileged --cap-add SYS_ADMIN --cap-add SYS_RESOURCE --device /dev/fuse -e MAPR_CLDB_HOSTS=192.168.99.3 -e MAPR_CLUSTER=demo.cluster.com -e MAPR_CONTAINER_USER=mapr -e MAPR_MOUNT_PATH=/mapr -p 8080:8080 --name web mapr-web-consumer
      7. Open http://localhost:8080
      8. Open http://192.168.99.3:8443
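Command 4 above addresses the topic as `/apps/sensors:computer`, that is, the stream path and the topic name joined by a colon, while command 3 passes the same two parts separately. A minimal sketch of splitting such a name in Java follows; the class `TopicName` and its validation rules are hypothetical helpers written for this illustration, not part of any MapR or Kafka API.

```java
// Hypothetical helper, not a MapR or Kafka class: splits "/stream/path:topic".
final class TopicName {
    final String streamPath;
    final String topic;

    private TopicName(String streamPath, String topic) {
        this.streamPath = streamPath;
        this.topic = topic;
    }

    // Assumes the last ':' separates the stream path from the topic name.
    static TopicName parse(String fullName) {
        int sep = fullName.lastIndexOf(':');
        if (!fullName.startsWith("/") || sep < 2 || sep == fullName.length() - 1) {
            throw new IllegalArgumentException("expected /stream/path:topic, got " + fullName);
        }
        return new TopicName(fullName.substring(0, sep), fullName.substring(sep + 1));
    }

    public static void main(String[] args) {
        TopicName name = TopicName.parse("/apps/sensors:computer");
        System.out.println(name.streamPath + " | " + name.topic);
    }
}
```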
  • 43. References
      • MapR Sandbox: http://maprdocs.mapr.com/home/SandboxHadoop/t_install_sandbox_vbox.html
      • MapR sample application: https://mapr.com/blog/getting-started-mapr-client-container/
      • MapR Tutorials: https://mapr.com/developercentral/code/
  • 44. Apache Spark
  • 45. https://databricks.com/spark/about
  • 46. Resilient Distributed Datasets (RDDs)
      • RDDs let programmers perform in-memory computations on large distributed datasets in a fault-tolerant manner.
      • An RDD is a representation of data that may or may not be on your local machine. It’s partitioned across the cluster (like a distributed Java Collection).
      • An RDD is immutable.
          – JavaRDD<String> lines = sc.textFile("/path/to/data.log");
      • When you read data, nothing gets loaded. You’re not even opening the file. You first declare the operations you’re going to perform; the data is loaded and operated upon only at the end, when you perform an action that materializes it.
  • 47. Resilient Distributed Datasets (RDDs)
      1. Start by reading from files, a database, etc. to create a top-level RDD.
      2. Apply lazy transformations: .filter(), .map(), .sample(), and shuffles such as .repartition().
      3. Actions trigger the work to finally run: .count(), .collect(), or .saveToCassandra() (from the Spark Cassandra connector). Note that .collect() pulls all the data into the driver JVM.
      4. Once you have an RDD you’d like to keep working on, you can call .cache() on it to keep it around, so you don’t have to derive it again. By default, cache() keeps the RDD in memory (storage level MEMORY_ONLY); use persist() with another storage level if you want it to spill to disk.
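The point of step 4, deriving a result once and reusing it, can be shown with plain Java. This is an analogy only: the memoizing `Supplier` below is a hypothetical helper and says nothing about how Spark implements `cache()`; it just shows a second read not triggering a second derivation.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Analogy only: a memoizing Supplier, not Spark's cache() implementation.
final class CacheSketch {

    // Wraps an expensive derivation so that it runs at most once.
    static <T> Supplier<T> cached(Supplier<T> derive) {
        return new Supplier<T>() {
            private boolean computed;
            private T value;

            @Override
            public synchronized T get() {
                if (!computed) {
                    value = derive.get();
                    computed = true;
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger derivations = new AtomicInteger();
        Supplier<Integer> expensive = () -> {
            derivations.incrementAndGet();   // stands in for re-running a lineage
            return 42;
        };
        Supplier<Integer> kept = cached(expensive);
        kept.get();
        kept.get();
        System.out.println("derivations: " + derivations.get());
    }
}
```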
  • 48. Resilient Distributed Datasets (RDDs)
      • The RDD is the building block of Spark.
          – DataFrame, Dataset, DStream, etc. are all abstractions built on RDDs
      • Immutable
      • Operated on by lambda functions
      • Lazily evaluated
      • Kick off parallel execution with actions like collect(), count(), etc.
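The "declare first, run on an action" behaviour summarized above can be tried without a cluster. The sketch below uses `java.util.stream`, which is not Spark and runs in a single JVM, but has the same shape: intermediate operations are only recorded, and a terminal operation plays the role of an RDD action. The log lines and the `inspected` counter are made up for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Analogy only: java.util.stream is not Spark, but it shares the
// "declare transformations first, run them when an action is called" model.
final class LazySketch {

    // Counts how many elements the filter has actually inspected.
    static final AtomicInteger inspected = new AtomicInteger();

    // Declares a pipeline over hypothetical log lines; nothing runs yet.
    static Stream<String> errorLines(List<String> lines) {
        return lines.stream()
                .filter(line -> {
                    inspected.incrementAndGet();
                    return line.contains("ERROR");
                })
                .map(String::trim);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("INFO start", " ERROR disk full ", "INFO done", "ERROR timeout");
        Stream<String> pipeline = errorLines(lines);
        System.out.println("inspected before action: " + inspected.get());
        // Terminal operation, comparable to an RDD action such as collect().
        List<String> errors = pipeline.collect(Collectors.toList());
        System.out.println("errors: " + errors + ", inspected after action: " + inspected.get());
    }
}
```

The counter stays unchanged until `collect` is called, at which point every input line is inspected once.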
  • 49. What is Spark Streaming?
      • Enables scalable, high-throughput, fault-tolerant stream processing of live data
      • Run continuous SQL queries on data pushed into Kafka
      [Diagram: data sources → Spark Streaming → data sinks]
  • 50. MapR Streams store and expose stream data for processing. [Diagram labels: tail -f; output action]
  • 51. Spark Streaming Architecture
      • Processed results are pushed out in batches
      [Diagram: Spark Streaming divides the input data stream by batch interval (data from time 0 to 1, 1 to 2, 2 to 3) into RDD @ time 1, RDD @ time 2, and RDD @ time 3; Spark then emits batches of processed results]
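The batch-interval idea on this slide can be sketched in plain Java. The code below is not the Spark Streaming API; the `Event` type, the timestamps, and the 1000 ms interval are assumptions made for the example. It only shows how events arriving over time fall into consecutive fixed-width batches, each of which would then be processed as one unit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of micro-batching; Event and the interval are
// hypothetical and this is not the Spark Streaming API.
final class MicroBatchSketch {

    static final class Event {
        final long timestampMs;
        final String payload;

        Event(long timestampMs, String payload) {
            this.timestampMs = timestampMs;
            this.payload = payload;
        }
    }

    // Assigns each event to the batch covering [k * interval, (k + 1) * interval).
    // Assumes non-negative timestamps.
    static Map<Long, List<String>> toBatches(List<Event> events, long batchIntervalMs) {
        Map<Long, List<String>> batches = new TreeMap<>();
        for (Event e : events) {
            long batchIndex = e.timestampMs / batchIntervalMs;
            batches.computeIfAbsent(batchIndex, k -> new ArrayList<>()).add(e.payload);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        events.add(new Event(100, "a"));
        events.add(new Event(900, "b"));
        events.add(new Event(1500, "c"));
        events.add(new Event(2100, "d"));
        // With a 1000 ms interval: batch 0 = [a, b], batch 1 = [c], batch 2 = [d]
        toBatches(events, 1000).forEach((index, payloads) ->
                System.out.println("batch " + index + " -> " + payloads));
    }
}
```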
  • 52. Spark In Action
  • 53. Spark In Action
      • Spark Shell
      • Spark SQL in Zeppelin
      • Spark SQL Databricks Notebook
      • Spark Streaming Java API
      • Debugging Spark with IntelliJ
  • 54. Databricks Cloud
      • Spark notebook in the cloud
          – https://community.cloud.databricks.com/
      • Sample notebooks:
          – https://databricks.com/resources/type/example-notebooks
  • 55. Spark Shell (aka REPL)
      • If you install Spark locally, you get this.
      • Evaluates each command as soon as you type it in, and shows you the output.
      • A fantastic way to experiment, with tab completion.
  • 56. Apache Zeppelin
  • 57. Debugging Spark with IntelliJ
      export SPARK_SUBMIT_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=4000
  • 58. Monitoring
      http://[hostname]:4040/jobs/
  • 59. © 2017 MapR Technologies 60 Spark Streaming + ML on MapR • Predict the location and time of Taxi requests. – https://mapr.com/blog/monitoring-real-time-uber-data-using-spark-machine-learning-streaming-and-kafka-api-part-1 – https://mapr.com/blog/monitoring-real-time-uber-data-using-spark-machine-learning-streaming-and-kafka-api-part-2 streaming topic: location and time of taxi requests Predicted and actual pickup locations and times Classification Models (Spark ML) Ridership analytics (Zeppelin) Kmeans Clustering (Spark ML on Uber dataset)
  • 60. © 2017 MapR Technologies 61 Streaming + ML demo procedure
      1. Create topics:
         maprcli stream create -path /user/mapr/stream -produceperm p -consumeperm p -topicperm p
         maprcli stream topic create -path /user/mapr/stream -topic ubers -partitions 3
         maprcli stream topic create -path /user/mapr/stream -topic uberp -partitions 3
      2. Create and save the kmeans model to /mapr/my.cluster.com/user/mapr/data/savemodel:
         /opt/mapr/spark/spark-2.0.1/bin/spark-submit --class com.sparkml.uber.ClusterUber --master local[2] /home/mapr/mapr-sparkml-streaming-uber/target/mapr-sparkml-streaming-uber-1.0.jar
      3. Send the test dataset to a stream (just to illustrate using a stream):
         java -cp /home/mapr/mapr-sparkml-streaming-uber/target/mapr-sparkml-streaming-uber-1.0.jar:`mapr classpath` com.streamskafka.uber.MsgProducer /user/mapr/stream:ubers /mapr/my.cluster.com/user/mapr/data/uber.csv
      4. Monitor the test dataset (optional, on nodeb):
         java -cp /home/mapr/mapr-sparkml-streaming-uber/target/mapr-sparkml-streaming-uber-1.0.jar:`mapr classpath` com.streamskafka.uber.MsgConsumer /user/mapr/stream:ubers
      5. Use the model to predict the cluster for incoming taxi telemetry, and output the predictions to a topic:
         /opt/mapr/spark/spark-2.0.1/bin/spark-submit --class com.sparkkafka.uber.SparkKafkaConsumerProducer --master local[2] /home/mapr/mapr-sparkml-streaming-uber/target/mapr-sparkml-streaming-uber-1.0-jar-with-dependencies.jar /user/mapr/data/savemodel /user/mapr/stream:ubers /user/mapr/stream:uberp
      6. Read the predictions topic and put it into a format that we can analyze ad hoc in SQL:
         /opt/mapr/spark/spark-2.0.1/bin/spark-submit --class com.sparkkafka.uber.SparkKafkaConsumer --master local[2] /home/mapr/mapr-sparkml-streaming-uber/target/mapr-sparkml-streaming-uber-1.0-jar-with-dependencies.jar /user/mapr/stream:uberp
      7. Open http://nodea:4040
  • 61. © 2017 MapR Technologies 62 Real-Time Stock Market Analysis https://mapr.com/appblueprint https://github.com/mapr-demos/finserv-application-blueprint
  • 62. © 2017 MapR Technologies 63 Advanced Concept: Look Back for n Seconds on a Topic • Offset Topic: Key = Time t, Value = Offset of Data Topic at t [Figure: a Data Topic and an Offset Topic drawn along a time axis t₀, t₁, t₂, t₃, t₄, t₅, with offsets 3253, 3347, 3467, 3608, 3798, 3913]
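The offset-topic idea can be sketched as a sorted in-memory index. This is a minimal illustration, not code from the demo repository: `OffsetIndex`, `record`, and `offsetForLookBack` are hypothetical names, times are assumed to be whole seconds, and a real application would populate the index by consuming the offset topic rather than calling `record` by hand. Looking back n seconds means finding the latest recorded time at or before now − n and returning the data-topic offset stored for it.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.OptionalLong;
import java.util.TreeMap;

// Sketch of an offset index: key = time t (seconds), value = data-topic offset at t.
final class OffsetIndex {
    private final NavigableMap<Long, Long> offsetByTime = new TreeMap<>();

    // Stores one (time, offset) message as it would be read from the offset topic.
    void record(long timeSec, long dataTopicOffset) {
        offsetByTime.put(timeSec, dataTopicOffset);
    }

    /**
     * Returns the data-topic offset to start reading from in order to look back
     * lookBackSec seconds from nowSec, or empty if the index does not reach
     * that far into the past.
     */
    OptionalLong offsetForLookBack(long nowSec, long lookBackSec) {
        Map.Entry<Long, Long> entry = offsetByTime.floorEntry(nowSec - lookBackSec);
        return entry == null ? OptionalLong.empty() : OptionalLong.of(entry.getValue());
    }
}
```

A consumer would then seek its position on the data topic to the returned offset (the Kafka consumer API provides `seek` for this) and read forward from there. The sorted map makes each lookup logarithmic in the number of recorded timestamps.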
  • 63. © 2017 MapR Technologies 64 MapR Streams vs Kafka
  • 64. © 2017 MapR Technologies 65 Call To Action
  • 65. © 2017 MapR Technologies 66 Call To Action • You can foster innovation just by making data available. • Seeking career advancement? – Coursera classes on data science, ML, Spark, etc. – Be a polyglot. – Enable data science from development to production. • You can apply those skills in ANY industry. • Don’t be afraid of not knowing much. • 87% of career builders attribute career benefit to completing online courses (Harvard Business Review, Coursera) – Be better equipped for your current job, find a new job, or change careers.
  • 66. © 2017 MapR Technologies 67 All Industries Web 2.0 Healthcare Telecom • ETL / DW optimization • Mainframe optimization • Real-time application & network monitoring • Security information & event management • Recommendation engines & targeting • Customer 360 • Click-stream analysis • Social media analysis • Ad optimization • Patient system of record • Smart hospitals • Biometrics • Patient vital monitoring • Fraud detection • Crowd-based antenna optimization • Charging & billing • Equipment monitoring & preventative maintenance • Smart meter analysis Have an interesting use case? Let’s talk! Oil & Gas Financial Services Retail Ad Tech • Pump monitoring & alerting • Seismic trace identification • Equipment maintenance • Safety & environment • Security • Real-time fraud/risk monitoring • Mobile notifications of transactions • Real-time supply chain optimization • Customer location optimization • Real-time coupons • Ad targeting & optimization • Global campaign dashboards