®
© 2015 MapR Technologies 1
© 2014 MapR Technologies
Jim Bates, Solutions Engineering
JUL, 2016
Some Use Cases for Big Data
Fraud and Risk
Operational Optimization
Opportunity Targeting
Predictive Maintenance
Internet of Things
Customer 360
Data Warehousing
Who Says They Can Do Them?
Which Ones are Right for Spark?
The “Hadoop” View of the World
The Real View of the World
Not all tools are a hammer…
Not all solutions are Hadoop…
Not all solutions are Spark…
What are the right questions to ask?
• The question is not “Who can do it?” but “How is it done?”
• The question is “What does it mean for your business?”
• The question is “How do you get the fastest ROI?”
• The question is “How well can I maintain my solution?”
Agenda
• Why Spark today
• Spark history and prognostication
• Spark architectures with and without Hadoop
• Spark success examples
MapR: the Production Choice for Big Data Applications
• Best product: Converged Data Platform for big data; Apache open source + innovation
• High growth: >100% billings growth; 700+ customers
• <1% churn
• 18% of customers with >50 apps**
• 382% average 3-year ROI*
* IDC – “The Business Value of MapR”, 2016.
** TechValidate Research, 2015
Data Transported by Elephant
[Diagram: data moves from Sources to Apps through separate systems, fed by streaming and batch loads]
• Streaming Cluster (TIBCO, IBM, Kafka)
• Real-time Analytics (Hadoop, Spark)
• Operational Cluster (HBase, Cassandra)
• Enterprise Storage (system of record)
MapR Gives Unique Advantage to Your Business:
The Easiest Ingestion and Consumption of Your Data
• Real-time applications
• NFS and FUSE client for file-based applications
• Hadoop APIs for Hadoop applications
• ODBC & JDBC for SQL-based applications
• Mission-critical and SLA-dependent applications
Agenda
• Why Spark today
• Spark history and prognostication
• Spark architectures with and without Hadoop
• Spark success examples
Why the continual execution engine shift?
• MapReduce is powerful, but hard
– Needed to convert models to map & reduce constructs
– Brittle and prone to fracture on change
• Impala
– Low-latency SQL/HiveQL queries
• Drill
– Low-latency ANSI SQL queries
• Tez
– Nice improvement but late to the game
• Spark
• Flink: the latest flavor
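The point that MapReduce forces a model into map and reduce constructs is easier to see side by side. The following is a plain-Python sketch, not Hadoop or Spark code: word count written as explicit map, shuffle, and reduce phases, next to the same result stated as one chained expression in the style that higher-level engines such as Spark encourage. The function names and sample documents are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import chain


def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every word in every line.
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    # Stand-in for the framework's shuffle/sort: group values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reducer: sum the grouped counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}


def word_count_mapreduce(lines):
    return reduce_phase(shuffle_phase(map_phase(lines)))


def word_count_chained(lines):
    # The same computation stated as one chained expression.
    words = chain.from_iterable(line.lower().split() for line in lines)
    return dict(Counter(words))


docs = ["Spark or Hadoop", "Spark with Hadoop"]
print(word_count_mapreduce(docs))  # {'spark': 2, 'or': 1, 'hadoop': 2, 'with': 1}
```

Both functions return the same counts; the difference is how much of the job's structure the programmer has to spell out, and how much has to change when the model changes.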
Analytics & ETL: Batch or Streaming?
• The ever-increasing drive to know everything NOW!
[Chart: value (y-axis) versus time (x-axis)]
But why the Spark explosion?
• Libraries!
• Flexibility!
• Interaction!
Agenda
• Why Spark today
• Spark history and prognostication
• Spark architectures with and without Hadoop
• Spark success examples
The Push to 2.0
• DataFrames
• SparkR
• Data Sources
• Project Tungsten
• Streaming ML
• Kafka Connectors
• ML Pipelines
• Better Debug
• Dataset API
• Notebooks
The Push to 2.0
• Performance Increases
• Structured Streaming
• Unified Datasets and DataFrames
Tungsten
Streaming with Structure
• Real-Time is the driving factor
• Most Apps need both batch and interactive
• Goal: combine both
• Adds windowing, sessions, sources and sinks
• Built-in ML
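Windowing groups streaming events by time as well as by key. The following plain-Python sketch (not the Spark API) shows the simplest case, a tumbling window: each event is assigned to a fixed, non-overlapping window by flooring its timestamp, and counts are kept per (window, key). The event tuple shape and the sample values are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key in fixed, non-overlapping time windows.

    events: iterable of (timestamp_seconds, key) tuples (hypothetical shape).
    Returns {(window_start, key): count}.
    """
    counts = defaultdict(int)
    for ts, key in events:
        # Floor the timestamp to the start of its window.
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "click"), (3, "click"), (7, "view"), (12, "click")]
print(tumbling_window_counts(events, 10))
# {(0, 'click'): 2, (0, 'view'): 1, (10, 'click'): 1}
```

In Spark 2.x the same grouping is typically written with `pyspark.sql.functions.window` inside a `groupBy`, for example `events.groupBy(window("ts", "10 seconds"), "kind").count()`, where the column names are again hypothetical.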
Combining Data
• Merge the DataFrame and Dataset APIs
• Fewer lines of code
• Richer semantics
• Cleaner code movement throughout Spark
Agenda
• Why Spark today
• Spark history and prognostication
• Spark architectures with and without Hadoop
• Spark success examples
Spark Stack Offers Variety of Functionality…
• Spark SQL (SQL)
• Spark Streaming (streaming)
• MLlib (machine learning)
• GraphX (graph computation)
• Spark (general execution engine)
• Cluster managers: Mesos, Hadoop YARN
• Distributed File System (HDFS, MapR-FS, S3, …)
Spark Architectures
• databricks
• Google
• Amazon
• Azure
• Hadoop Players
The MapR Platform including Spark
• Notebooks*, Workflow Management, Commercial Apps, QSS
• Spark Streaming, Spark Machine Learning (MLlib), Spark SQL, GraphX, SparkR
• Spark Core
• Mesos, YARN, Myriad
• MapR-FS, MapR-DB, MapR Streams
• Installer, Management, and Monitoring
* Zeppelin or Jupyter Notebooks
Agenda
• Why Spark today
• Spark history and prognostication
• Spark architectures with and without Hadoop
• Spark success examples
Analytics & ETL: Batch or Streaming?
• The ever-increasing drive to know everything NOW!
[Chart: value (y-axis) versus time (x-axis)]
Advanced Analytics
Data at rest → data in motion → future
Descriptive
● What happened
● Why did it happen
● Discovery in nature
● Batch analytics
Predictive
● What will happen
● Combines historical data with rules and algorithms
● ML (batch + real time)
Streaming
● Volume, velocity and variety
● Agility is key to success
● Analyse data as it happens
● Triggers and alarms
● Anomaly detection
● Continuous ETL and analytics
Prescriptive
● What + When + Why
● Suggestions to take advantage of future opportunity or mitigate risks
Decreasing Job Latencies
[Chart: job latency scale from hours to minutes, seconds, and milliseconds; data persisted on-disk versus data persisted in-memory]
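The latency slide contrasts data persisted on disk with data persisted in memory. As a toy illustration (plain Python, not Spark), the sketch below runs the same three-pass computation twice: once re-reading its input from a file on every pass, and once reading it a single time and reusing the in-memory copy. The class and function names are hypothetical, and counting disk reads stands in for measuring latency.

```python
import os
import tempfile


class CountingFileSource:
    """Hypothetical source that reads integers from a file and counts disk reads."""

    def __init__(self, path):
        self.path = path
        self.disk_reads = 0

    def load(self):
        self.disk_reads += 1
        with open(self.path) as fh:
            return [int(line) for line in fh if line.strip()]


def passes_from_disk(source, passes):
    # Every pass re-reads its input, as chained on-disk jobs would.
    return [sum(x * p for x in source.load()) for p in range(1, passes + 1)]


def passes_from_memory(source, passes):
    # Read once, keep the data in memory, and reuse it on every pass.
    data = source.load()
    return [sum(x * p for x in data) for p in range(1, passes + 1)]


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1\n2\n3\n")
    path = tmp.name

disk_src = CountingFileSource(path)
mem_src = CountingFileSource(path)
on_disk = passes_from_disk(disk_src, passes=3)
in_memory = passes_from_memory(mem_src, passes=3)
os.remove(path)

print(on_disk, disk_src.disk_reads)   # [6, 12, 18] 3
print(in_memory, mem_src.disk_reads)  # [6, 12, 18] 1
```

In Spark the comparable idea is usually expressed by calling `cache()` or `persist()` on a dataset that several actions reuse.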
Trinity of Real Time
• Global Messaging System: real-time producers publish to topics (Topic 1, Topic 2); Spark + MapR Streams integration
• Transformation Layer
• Key Value Store: Spark + MapR-DB integration; APIs for real-time operational analytics
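The three parts of this slide (a messaging layer, a transformation layer, and a key-value store) can be mimicked in a few lines to show how data moves between them. In this plain-Python sketch a `deque` stands in for a MapR Streams or Kafka topic and a `dict` stands in for MapR-DB; the device names, readings, and the choice to keep a running maximum per device are all hypothetical.

```python
from collections import deque


def produce(topic, readings):
    # Real-time producers append messages to a topic (a deque stands in here).
    for reading in readings:
        topic.append(reading)


def transform_and_store(topic, kv_store):
    # Transformation layer: consume every pending message and upsert a
    # per-device running maximum into the key-value store (a dict here).
    while topic:
        device, value = topic.popleft()
        kv_store[device] = max(value, kv_store.get(device, value))
    return kv_store


topic = deque()
store = {}
produce(topic, [("truck-1", 70), ("truck-2", 55), ("truck-1", 82)])
transform_and_store(topic, store)
print(store)  # {'truck-1': 82, 'truck-2': 55}
```

Keeping the three roles separate means producers, transformations, and the serving store can each be scaled or replaced independently, which is the design the slide describes.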
Batch Analytics & ETL
Global Ad-tech: End of Day ETL and Analytics
• Online ad auctions: ad-impressions data published to Streams/Kafka topics
• Data is ingested into the Hadoop cluster (MapR-FS)
• Data aggregation based on filters (GroupBy based on geographical locations, ad-sites, etc.)
• Advanced predictive analytics
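The aggregation step on this slide is a group-by over several keys. A minimal plain-Python sketch, assuming each ad impression is a record with hypothetical `region`, `ad_site`, and `bid` fields, counts impressions and sums spend per (region, ad site) pair:

```python
from collections import defaultdict


def aggregate_impressions(impressions):
    """Count impressions and sum bids per (region, ad_site).

    impressions: iterable of dicts with hypothetical keys
    'region', 'ad_site', and 'bid'.
    """
    totals = defaultdict(lambda: {"impressions": 0, "spend": 0.0})
    for imp in impressions:
        key = (imp["region"], imp["ad_site"])
        totals[key]["impressions"] += 1
        totals[key]["spend"] += imp["bid"]
    return dict(totals)


day = [
    {"region": "US-East", "ad_site": "news", "bid": 0.50},
    {"region": "US-East", "ad_site": "news", "bid": 0.25},
    {"region": "EU", "ad_site": "sports", "bid": 1.00},
]
result = aggregate_impressions(day)
print(result)
```

In Spark the equivalent would usually be a DataFrame `groupBy("region", "ad_site")` followed by `agg(F.count("*"), F.sum("bid"))`, with `F` imported from `pyspark.sql.functions`.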
Predictive Analytics
Customer 360 & Behavior Prediction
● Data sources: website clickstream topics, internal and external data sources, support tickets, DBMS, email, CRM
● Real-time/offline clickstream analysis on MapR-FS (EDH/EDL)
● Prediction modelling
● Attribution modelling
● Cohort analysis
● Customer lifetime value analysis
● Attrition modelling
● Response modelling
● Churn modelling
● Outcome: 360-degree customer view and customer behavior prediction, for a better conversion rate and lower attrition ($$$)
● Eliminate latency due to data movement between clusters
● Eliminate redundant storage with MapR Streams and lower the TCO
● HA, DR, NFS, snapshots, data protection
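A 360-degree customer view comes down to combining records from many sources on a shared customer key. The plain-Python sketch below assumes every source already carries the same customer ID, which is a simplifying assumption; the source shapes, field names, and sample records are hypothetical.

```python
def build_customer_360(crm, clickstream, tickets):
    """Merge per-customer data from several hypothetical sources.

    crm: {customer_id: {"name": ..., "segment": ...}}
    clickstream: iterable of (customer_id, page) events
    tickets: iterable of (customer_id, status) pairs
    """
    profiles = {}

    def profile(cust_id):
        # Create an empty profile the first time a customer is seen.
        return profiles.setdefault(cust_id, {"page_views": 0, "open_tickets": 0})

    for cust_id, record in crm.items():
        profile(cust_id).update(record)
    for cust_id, _page in clickstream:
        profile(cust_id)["page_views"] += 1
    for cust_id, status in tickets:
        if status == "open":
            profile(cust_id)["open_tickets"] += 1
    return profiles


crm = {"c1": {"name": "Ada", "segment": "gold"}}
clicks = [("c1", "/pricing"), ("c1", "/support"), ("c2", "/home")]
tickets = [("c1", "open"), ("c1", "closed")]
views = build_customer_360(crm, clicks, tickets)
print(views)
```

Customers seen only in the clickstream (such as `c2` here) still get a profile, so behavior can be analysed before a CRM record exists.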
Prescriptive Analytics
Prescriptive Analytics: Automotive Operational and Manufacturing
● Sources: GPS, telematic data, telephone, truck fleet, each published to topics
● Data generated by cars is stored locally
● Data modelling/secondary ETL: data is converted from proprietary formats to Parquet
● Identify emission patterns
● Route optimization
● Customer service requests
● How does throttling affect other factors such as fuel consumption, emissions, etc.?
● Image and video analysis
● Time series analysis for threshold breach
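For threshold breaches in time series, a common starting point is to flag runs of consecutive samples above a limit, so that a single noisy reading does not raise an alarm. The sketch below is plain Python; the threshold, minimum run length, and emission readings are hypothetical values chosen for illustration.

```python
def threshold_breaches(series, threshold, min_consecutive=1):
    """Return (start_index, end_index) pairs for runs of values above
    threshold that last at least min_consecutive samples."""
    runs = []
    start = None
    for i, value in enumerate(series):
        if value > threshold:
            if start is None:
                start = i  # a new breach run begins
        else:
            if start is not None and i - start >= min_consecutive:
                runs.append((start, i - 1))
            start = None
    # Close a run that is still open at the end of the series.
    if start is not None and len(series) - start >= min_consecutive:
        runs.append((start, len(series) - 1))
    return runs


emissions = [10, 12, 31, 33, 11, 35, 9, 40, 41, 42]
print(threshold_breaches(emissions, threshold=30, min_consecutive=2))
# [(2, 3), (7, 9)]
```

The single reading of 35 at index 5 is ignored because it does not meet the two-sample minimum; lowering `min_consecutive` to 1 would report it as well.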
Streaming Analytics and Analytic Applications
Analytics Application: Implementation
● Input: sales incentive data published to MapR Streams topics
  ● 60 events/sec
  ● 10 MB/event
  ● Table-based topics
● Fast-changing data (example: credit date) and append-only data (50% of events)
● Pre-computed: pre-compute analytics with Spark Streaming on data in motion, producing delta aggregates, level 1 and 2 aggregates, level 3 aggregates, and advanced ML analytics
● Stale data: aggregates calculated using snapshots (MapR-FS)
● On-demand: DB application and search application, backed by MapR-DB object profile JSON docs
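Delta aggregates suggest updating pre-computed results incrementally instead of recomputing them from history. One common way to do that, sketched here in plain Python under the assumption that the aggregate of interest is an average, is to store a (count, total) pair per key so that each micro-batch delta can be merged by simple addition. The sales-rep keys and amounts are hypothetical.

```python
def apply_delta(stored, delta):
    """Merge a micro-batch of delta aggregates into stored aggregates.

    Both arguments map a key to a (count, total) pair; a new dict is
    returned and the stored aggregates are left unchanged.
    """
    merged = dict(stored)
    for key, (count, total) in delta.items():
        old_count, old_total = merged.get(key, (0, 0))
        merged[key] = (old_count + count, old_total + total)
    return merged


def average(aggregates, key):
    # Derive the average on demand from the stored (count, total) pair.
    count, total = aggregates[key]
    return total / count


stored = {"rep-1": (2, 300)}
delta = {"rep-1": (1, 150), "rep-2": (1, 80)}
merged = apply_delta(stored, delta)
print(merged)                     # {'rep-1': (3, 450), 'rep-2': (1, 80)}
print(average(merged, "rep-1"))   # 150.0
```

Storing counts and totals rather than averages is the design choice that makes the merge associative: deltas can be applied in any batch size and still give the same result.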
Ask the questions early and often
• The question is not “Who can do it?” but “How is it done?”
• The question is “What does it mean for your business?”
• The question is “How do you get the fastest ROI?”
• The question is “How well can I maintain my solution?”
• Add your own questions to the list
Q&A
@mapr maprtech
jbates@mapr.com
Engage with us!
MapR
maprtech
mapr-technologies
Apache Spark, With or Without Hadoop? - StampedeCon 2016

  • 1. ® © 2015 MapR Technologies 1 ® © 2014 MapR Technologies Jim Bates, Solutions Engineering JUL, 2016
  • 2. ® © 2015 MapR Technologies 2 Some Use Cases for Big Data Fraud and Risk Operational Optimization Opportunity Targeting Predictive Maintenance Internet of Things Customer 360 Data Warehousing Who Says They Can do Them? Which Ones are Right for Spark?
  • 3. ® © 2015 MapR Technologies 3 The “Hadoop” View of the World
  • 4. ® © 2015 MapR Technologies 4 The Real View of the World
  • 5. ® © 2015 MapR Technologies 5 Not all tools are a hammer… Not all solutions are Hadoop… Not all solutions are Spark…
  • 6. ® © 2015 MapR Technologies 6 What are the right questions to ask? • The question is not “Who can do it?” but “How is it done?” • The question is “What does it mean for your business?” • The question is “How do you get the fastest ROI?” • The question is “How well can I maintain my solution”
  • 7. ® © 2015 MapR Technologies 7 Agenda • Why Spark today • Spark history and prognostication • Spark architectures with and without Hadoop • Spark success examples
  • 8. ® © 2015 MapR Technologies 8 <1% MapR: the Production Choice for Big Data Applications Best Product High Growth >100% Billings Growth 18% Customers with >50 apps** 382% Avg. 3-yr ROI* 700+ CustomersBig Data Converged Data Platform Apache Open Source Churn + Innovation * IDC – “The Business Value of MapR”, 2016. ** - TechValidate Research, 2015
  • 9. ® © 2015 MapR Technologies 9 Data Transported by Elephant Streaming Real-time Analytics (Hadoop, Spark) Operational Cluster (HBase, Cassandra) Streaming Cluster (TIBCO, IBM, Kafka) Batch Loads Sources Apps Enterprise Storage (system of record)
  • 10. ® © 2015 MapR Technologies 10
  • 11. ® © 2015 MapR Technologies 11 MapR Gives Unique Advantage to Your Business: The Easiest Ingestion and Consumption of Your Data Real-time applications NFS and Fuse client for file-based applications HadoopAPIs for Hadoop applications ODBC & JDBC for SQL-based applications Mission critical and SLA dependent applications
  • 12. ® © 2015 MapR Technologies 12 Agenda • Why Spark today • Spark history and prognostication • Spark architectures with and without Hadoop • Spark success examples
  • 13. Why the continual execution engine shift? • MapReduce is powerful, but hard – Needed to convert models to map & reduce constructs – Brittle and prone to fracture on change • Impala – Low-latency SQL/HiveQL queries • Drill – Low-latency ANSI SQL queries • Tez – Nice improvement but late to the game • Spark • Flink: the latest flavor
  • 14. Analytics & ETL: Batch or Streaming? • The ever-increasing drive to know everything NOW! (chart: value vs. time)
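Slide 13's complaint that MapReduce requires converting models "to map & reduce constructs" is easiest to see on a small job. The following is a minimal plain-Python sketch of word count split into map, shuffle, and reduce phases. It is not Hadoop's Java API, and the function names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map: emit a (word, 1) pair for every whitespace-separated token
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle: group intermediate values by key (the framework does this between map and reduce)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: collapse all values for one key into a single result
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Spark on Hadoop", "Spark without Hadoop"]))
```

Every new question has to be re-expressed in this shape, which is where the brittleness comes from. In Spark's RDD API the same job is usually a single chain such as `lines.flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(operator.add)`.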
  • 15. But why the Spark explosion? • Libraries! • Flexibility! • Interaction!
  • 16. Agenda • Why Spark today • Spark history and prognostication • Spark architectures with and without Hadoop • Spark success examples
  • 17. The Push to 2.0 • DataFrames • SparkR • Data Sources • Project Tungsten • Streaming ML • Kafka Connectors • ML Pipelines • Better Debug • Dataset API • Notebooks
  • 18. The Push to 2.0 • Performance Increases • Structured Streaming • Unified Datasets and DataFrames
  • 19. Tungsten
  • 20. Streaming with Structure • Real time is the driving factor • Most apps need both batch and interactive • Goal to combine both • Adds windowing, sessions, sources and sinks • Built-in ML
  • 21. Combining Data • Merge the DataFrame and Dataset APIs • Fewer lines of code • Richer semantics • Cleaner code movement throughout Spark
  • 22. Agenda • Why Spark today • Spark history and prognostication • Spark architectures with and without Hadoop • Spark success examples
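Slide 20 lists windowing among the additions in Structured Streaming. As a rough illustration of what a tumbling (fixed, non-overlapping) event-time window computes, here is a plain-Python sketch. It does not use Spark; the event tuples and the `tumbling_window_counts` helper are hypothetical. In PySpark the equivalent grouping would typically be expressed by passing `pyspark.sql.functions.window` to `groupBy`.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per (window_start, key) using fixed, non-overlapping windows.

    events: iterable of (event_time_seconds, key) tuples. Arrival order does not
    matter, because each event is bucketed by its own timestamp, not by when it
    happened to be processed.
    """
    counts = Counter()
    for event_time, key in events:
        # Align the timestamp down to the start of its window
        window_start = event_time - (event_time % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


if __name__ == "__main__":
    # The event at t=10 arrives last but still lands in the [10, 20) window
    clicks = [(0, "ad-a"), (4, "ad-b"), (9, "ad-a"), (17, "ad-b"), (10, "ad-a")]
    for (start, key), n in sorted(tumbling_window_counts(clicks, 10).items()):
        print(f"[{start}, {start + 10}) {key}: {n}")
```

Bucketing by event time rather than processing time is the design choice that lets late or out-of-order events still count toward the correct window.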
  • 23. The Spark Stack Offers a Variety of Functionality… • Spark SQL (SQL) • Spark Streaming (streaming) • MLlib (machine learning) • GraphX (graph computation) • Spark (general execution engine) • Cluster managers: Mesos, Hadoop YARN • Distributed File System (HDFS, MapR-FS, S3, …)
  • 24. Spark Architectures • Databricks
  • 25. Spark Architectures • Databricks • Google
  • 26. Spark Architectures • Databricks • Google • Amazon • Azure • Hadoop Players
  • 27. The MapR Platform including Spark • Notebooks* • Workflow Management • Commercial Apps • QSS • Installer, Management, and Monitoring • Spark Streaming • Spark Machine Learning (MLlib) • Spark SQL • GraphX • SparkR • Spark Core • MapR-FS • MapR-DB • MapR Streams (* Zeppelin or Jupyter notebooks)
  • 28. The MapR Platform including Spark • Notebooks* • Workflow Management • Commercial Apps • QSS • Installer, Management, and Monitoring • Spark Streaming • Spark Machine Learning (MLlib) • Spark SQL • GraphX • SparkR • Spark Core • MapR-FS • MapR-DB • MapR Streams • Mesos • YARN • Myriad (* Zeppelin or Jupyter notebooks)
  • 29. The MapR Platform including Spark • Installer, Management, and Monitoring • MapR-FS • MapR-DB • MapR Streams • Mesos • YARN • Myriad • Notebooks • Workflow Management • Commercial Apps • QSS • Spark Streaming • Spark MLlib • Spark SQL • GraphX • SparkR • Spark Core
  • 30. Agenda • Why Spark today • Spark history and prognostication • Spark architectures with and without Hadoop • Spark success examples
  • 31. Analytics & ETL: Batch or Streaming? • The ever-increasing drive to know everything NOW! (chart: value vs. time)
  • 32. Advanced Analytics (Data-At-Rest, Data-In-Motion, Future) • Descriptive: what happened; why did it happen; discovery in nature; batch analytics • Predictive: what will happen; combines historical data with rules and algorithms; ML (batch + real time) • Prescriptive: what + when + why; suggestions to take advantage of future opportunities or mitigate risks • Streaming: volume, velocity and variety; agility is key to success; analyse data as it happens; triggers and alarms; anomaly detection; continuous ETL and analytics
  • 33. Decreasing Job Latencies (chart: hours, minutes, seconds, milliseconds) • Data persisted on disk • Data persisted in memory
  • 34. Trinity of Real Time • Real-time producers • Global messaging system (Topic 1, Topic 2) • Spark + MapR Streams integration • Transformation layer • Spark + MapR-DB integration • Key-value store • Real-time operational analytics • APIs
  • 35. Batch Analytics & ETL
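The "Trinity of Real Time" on slide 34 is a flow from real-time producers through a messaging topic and a transformation layer into a key-value store. A toy, single-process sketch of that flow follows, assuming a hypothetical in-memory `Topic` class and a plain dict as the key-value store. Neither is the MapR Streams, Kafka, or MapR-DB API, and the Fahrenheit-to-Celsius transformation is only a placeholder.

```python
from collections import deque


class Topic:
    """Hypothetical in-memory stand-in for a message topic (not a real messaging API)."""

    def __init__(self):
        self._log = deque()

    def publish(self, message):
        self._log.append(message)

    def poll(self):
        # Return the oldest unread message, or None when the topic is drained
        return self._log.popleft() if self._log else None


def transform(reading):
    # Placeholder transformation layer: convert a Fahrenheit reading to Celsius
    sensor_id, fahrenheit = reading
    return sensor_id, round((fahrenheit - 32) * 5 / 9, 1)


def run_pipeline(readings):
    topic = Topic()
    kv_store = {}  # stand-in for a key-value store
    for reading in readings:  # real-time producers publish to the topic
        topic.publish(reading)
    while (message := topic.poll()) is not None:
        key, value = transform(message)
        kv_store[key] = value  # keep the latest value per key for operational queries
    return kv_store


if __name__ == "__main__":
    print(run_pipeline([("s1", 212.0), ("s2", 32.0), ("s1", 50.0)]))
```

Decoupling producers from the transformation layer through a topic is what lets each stage scale or fail independently; in a real deployment the loop would run continuously rather than until the topic drains.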
  • 36. Global Ad-tech: End-of-Day ETL and Analytics • Online ad auctions: ad impressions • Data streams / Kafka topics • Data is ingested into the Hadoop cluster • Data aggregation based on filters (GroupBy based on geographical locations, ad sites, etc.) • Advanced predictive analytics • MapR-FS
  • 37. Predictive Analytics
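The end-of-day job on slide 36 aggregates ad impressions with a GroupBy on geography and ad site. A minimal plain-Python sketch of that rollup follows, assuming hypothetical impression records with `region`, `site`, and `bid` fields; the real schema is not given in the slides.

```python
from collections import defaultdict


def aggregate_impressions(impressions, keys=("region", "site")):
    """End-of-day rollup: impression count and total bid per group of `keys`.

    impressions: iterable of dicts with hypothetical 'region', 'site' and 'bid' fields.
    """
    totals = defaultdict(lambda: {"impressions": 0, "bid_total": 0.0})
    for imp in impressions:
        group = tuple(imp[k] for k in keys)  # composite GroupBy key
        totals[group]["impressions"] += 1
        totals[group]["bid_total"] += imp["bid"]
    return dict(totals)


if __name__ == "__main__":
    day = [
        {"region": "EU", "site": "news", "bid": 0.5},
        {"region": "EU", "site": "news", "bid": 0.25},
        {"region": "US", "site": "sports", "bid": 1.0},
    ]
    for group, stats in sorted(aggregate_impressions(day).items()):
        print(group, stats)
```

With a Spark DataFrame the same rollup would roughly be `df.groupBy("region", "site").agg(F.count("*"), F.sum("bid"))`, with `pyspark.sql.functions` imported as `F`.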
  • 38. Customer 360 & Behavior Prediction • Website clickstream • Topics • Real-time/offline clickstream analysis • MapR-FS EDH/EDL • Internal data sources • External data sources • Support tickets • DBMS • Email • CRM • Prediction modelling • Attribution modelling • Cohort analysis • Customer lifetime value analysis • Attrition modelling • Response modelling • Churn modelling • Eliminate latency due to data movement between clusters • Eliminate redundant storage with MapR Streams and lower the TCO • 360-degree customer view • Customer behavior prediction • Better conversion rate and lower attrition ($$$) • Offline • Real time • HA, DR, NFS, snapshots, data protection
  • 39. Prescriptive Analytics
  • 40. Prescriptive Analytics: Automotive Operational and Manufacturing • GPS • Telematic data • Telephone • Truck fleet • Topics • Data generated from cars are stored locally • Data modelling/secondary ETL: data is converted from a proprietary format to Parquet • Identify emission patterns • Route optimization • Customer service requests • How does throttling affect other factors such as fuel consumption, emissions, etc.? • Image and video analysis • Time-series analysis for threshold breach
  • 41. Streaming Analytics and Analytic Applications
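Slide 40 mentions time-series analysis for threshold breaches. One simple definition of a breach, assumed here rather than taken from the slide, is a run of at least `min_consecutive` samples above a threshold, which filters out single noisy spikes. A sketch, with the sample emission readings being hypothetical:

```python
def threshold_breaches(samples, threshold, min_consecutive=1):
    """Return (start_time, end_time) spans where value > threshold for at least
    min_consecutive consecutive samples.

    samples: list of (time, value) pairs, already ordered by time.
    """
    breaches = []
    run_start = run_end = None
    run_len = 0
    for t, value in samples:
        if value > threshold:
            if run_len == 0:
                run_start = t  # a new run above the threshold begins
            run_end = t
            run_len += 1
        else:
            if run_len >= min_consecutive:
                breaches.append((run_start, run_end))
            run_len = 0
    if run_len >= min_consecutive:  # close a run that lasts to the end of the series
        breaches.append((run_start, run_end))
    return breaches


if __name__ == "__main__":
    emissions = [(0, 80), (1, 95), (2, 97), (3, 85), (4, 99), (5, 101)]
    print(threshold_breaches(emissions, threshold=90, min_consecutive=2))
```

The function makes one pass over the series, so the same logic could be applied per vehicle over a sliding buffer in a streaming job.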
  • 42. On-Demand Pre-Computed Analytics Application: Implementation • Sales incentive data • MapR Streams topics: 60 events/sec; 10 MB/event; table-based topics • Fast-changing data (ex: credit date) • Append-only (50% of events) • MapR-DB • MapR-FS • DB application • Search application • Stale data: aggregates calculated using snapshots • Level 1 and 2 aggregates • Level 3 aggregates • Delta aggregates • Advanced ML analytics • Pre-compute analytics with Spark Streaming on data-in-motion • Object profile JSON docs
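Slide 42 combines aggregates calculated from snapshots with delta aggregates computed on data in motion. Taking the stated rates at face value, 60 events/sec × 10 MB/event is 600 MB/s, or about 51.8 TB per day (600 MB × 86,400 s), which is a strong reason not to recompute from raw events on demand. The sketch below shows why the snapshot-plus-delta approach works for additive measures such as count and sum: partial aggregates merge without revisiting raw data. The `Aggregate` class and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Aggregate:
    count: int = 0
    total: float = 0.0

    def add(self, value):
        # Fold one new event into the running aggregate
        self.count += 1
        self.total += value

    def merge(self, other):
        # Count and sum are additive, so partial aggregates combine exactly
        return Aggregate(self.count + other.count, self.total + other.total)

    @property
    def mean(self):
        # Mean is not additive itself, but it is derived from the additive parts
        return self.total / self.count if self.count else 0.0


def delta_aggregate(events):
    delta = Aggregate()
    for value in events:
        delta.add(value)
    return delta


if __name__ == "__main__":
    snapshot = Aggregate(count=4, total=100.0)  # e.g. computed in batch from a snapshot
    delta = delta_aggregate([10.0, 30.0])       # events that arrived after the snapshot
    current = snapshot.merge(delta)
    print(current, round(current.mean, 2))
```

Measures that are not decomposable this way (exact distinct counts, medians) need a different strategy, such as approximate sketches or periodic full recomputation.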
  • 43. Ask the questions early and often • The question is not “Who can do it?” but “How is it done?” • The question is “What does it mean for your business?” • The question is “How do you get the fastest ROI?” • The question is “How well can I maintain my solution?” • Add your own questions to the list
  • 44. Q&A • Engage with us! • @mapr • maprtech • MapR • mapr-technologies • jbates@mapr.com