SlideShare a Scribd company logo
Spark and Hadoop
Perfect Together
Arun Murthy
Hortonworks Co-Founder
@acmurthy
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Data Operating System
Enable all data and applications
TO BE
accessible and shared
BY
any end-users
Page3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Data Operating System
YARN: data operating system
Governance Security
Operations
Resource management
Data access: batch, interactive, real-time
Storage
Commodity Appliance Cloud
Built on a centralized architecture of
shared enterprise services:
Scalable tiered storage
Resource and workload management
Trusted data governance & metadata management
Consistent operations
Comprehensive security
Developer APIs and tools
Hadoop/YARN-powered data operating system
100% open source, multi-tenant data platform for
any application, any data set, anywhere.
Page4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Elegant Developer APIs
DataFrames, Machine Learning, and SQL
Made for Data Science
All apps need to get predictive at scale and fine granularity
Democratize Machine Learning
Spark is doing to ML on Hadoop what Hive did for SQL on
Hadoop
Community
Broad developer, customer and partner interest
Realize Value of Data Operating System
A key tool in the Hadoop toolbox
Why We Love Spark at Hortonworks
YARN
Scala
Java
Python
R
APIs
Spark Core Engine
Spark
SQL
Spark
Streaming
MLlib GraphX
1 ° ° ° ° ° ° ° ° °
° ° ° ° ° ° ° ° ° °
°
N
HDFS
Page5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Let’s Talk Real-World Use-Cases
Page6 © Hortonworks Inc. 2011 – 2015. All Rights ReservedPage6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
DATA FLOW
MANAGEMENT
Streaming & Machine Learning for Web Analytics
USE CASE:
Cost: Storage and
processing not economical
at scale
Silos: Separate clusters —
Hadoop & Spark
Analytics: Retroactive view
limited predictive
capabilities
CHALLENGE
Spark Streaming for
ingesting events in real-
time
SparkML for predictive
analytics
SOLUTION
Cost: Hardware costs
reduced 25-50%
YARN: Shared cluster for
Spark, MapReduce, Hive
etc.
Analytics: Processes 10
billion events daily at 20
milliseconds per event
IMPACT
HDP
Page7 © Hortonworks Inc. 2011 – 2015. All Rights ReservedPage7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
DATA FLOW
MANAGEMENT
Claims Re-imbursement Processing with Spark
USE CASE: INSURANCE
Overwhelmed by data
ingest rates
Team expertise in R
Lots of key features like
textual features not
incorporated
CHALLENGE
Use Spark to optimize
claims reimbursements
process
Leverage Spark’s machine
learning capabilities to
process and analyze all
claims.
SOLUTION
Insurance companies can
now process all claims and
detect over and under
payment
More accurate over and
under payment detection
IMPACT
HDP
Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
On-going Customer Use Cases with Spark
HealthCare
Entity Resolution
• Different feeds represent entities in different
forms
– need to identify same entities
• No standard packages available to do the
job.
• Additional features need to be derived to
expand context and disambiguate
Requirement
• Extensible entity resolution framework
– mix supervised and unsupervised learning
John Smith J. Smith
A B
E
DC
H
GF
name name
location location
=
Page9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
On-going Customer Use Cases with Spark
Finance
Efficiently bring HBase Data into
Spark
• HBase is the operational store of record
• End user Analytics via Spark
Requirement:
• Efficient scans via predicate pushdown
via co-processors ° ° ° ° ° ° ° ° °HDFS
YARN
Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Spark + Hadoop - The Road Ahead
Innovate at the Core
Seamless Data Access
Data Science Acceleration
Page11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Data Sources
Data
Science
Acceleration
Data Science ToolsNotebook
Data Science Libraries
Apache Hadoop, Hive and HBase
InnovateattheCore
Apache Spark
Innovate at the Core
Storage
– RDD Sharing with HDFS Memory Tier
Resource Management
– Spark on YARN improvements
Security
– Enhance SparkSQL Security
– Enhance Wire Encryption
Governance
– Integration with Atlas
Operations
– Ambari to Support Multiple Spark Versions
Zeppelin
Governance
Security
Operations
APIs
Spark Core EngineSpark Core Engine
Spark
SQL
Spark
Streaming
MLlib GraphX
1 ° ° ° ° ° ° ° NHDFS
YARN
SQL
Hive
NoSQL
Hbase
Enterprise Ready
Seamless
DataAccess
Spark + Hadoop - The Road Ahead
ORC FileNiFiHBase Custom
Page12 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Seamless Data Access
Data Access
– Hive ORC
– Hbase Connector
– NiFi Streams
Spark + Hadoop - The Road Ahead
Data Sources
Data
Science
Acceleration
Data Science ToolsNotebook
Data Science Libraries
Apache Hadoop, Hive and HBase
InnovateattheCore
Apache Spark
Zeppelin
Governance
Security
Operations
APIs
Spark Core EngineSpark Core Engine
Spark
SQL
Spark
Streaming
MLlib GraphX
1 ° ° ° ° ° ° ° NHDFS
YARN
SQL
Hive
NoSQL
Hbase
Enterprise Ready
Seamless
DataAccess
ORC FileNiFiHBase Custom
Page13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Data Science Acceleration
Apache Zeppelin
– Spark development and visualization
Data Science Libraries
– Jump start tools leveraging Spark Machine
Learning (ie Magellan, Entity Resolution)
Core Machine Learning
– One vs Rest classifiers, multi-class
classifiers
– Serializing Machine Learning pipelines
– Data set loader for public data sets
Spark + Hadoop - The Road Ahead
Data Sources
Data
Science
Acceleration
Data Science ToolsNotebook
Data Science Libraries
Apache Hadoop, Hive and HBase
InnovateattheCore
Apache Spark
Zeppelin
Governance
Security
Operations
APIs
Spark Core EngineSpark Core Engine
Spark
SQL
Spark
Streaming
MLlib GraphX
1 ° ° ° ° ° ° ° NHDFS
YARN
SQL
Hive
NoSQL
Hbase
Enterprise Ready
Seamless
DataAccess
ORC FileNiFiHBase Custom
Page14 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Introducing Apache Zeppelin Open Web-based Notebook
for interactive analytics
Features
Ad-hoc experimentation
Deeply integrated with Spark + Hadoop
Supports multiple language backends
Incubating at Apache
Use Case
Data exploration and discovery
Visualization
Interactive snippet-at-a-time experience
“Modern Data Science Studio”
Page15 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Geospatial Insight
Where do people go on weekends?
Does usage pattern change with time?
Predict the drop off point of a user?
Predict the location where next pick up can be expected?
Identify crime hotspots
How do these hotspots evolve with time?
Predict the likelihood of crime occurring at a given neighborhood
Predict climate at fairly granular level
Climate insurance: do I need to buy insurance for my crops?
Climate as a factor in crime: Join climate dataset with Crimes
Page16 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Magellan on Spark Packages
What
– Brings GeoSpatial Analytics to Big Data powered by Spark
– Available at Spark Packages
http://spark-packages.org/package/harsha2010/magellan
Key Features
– Parse geospatial data and metadata into Shapes + Metadata
– Python and Scala support
– Efficient Geometric Queries
- ESRI Hive Library
- simple and intuitive syntax
– Scalable implementations of common algorithms
Learn More
– Magellan Blog: http://tinyurl.com/magellanBlog
Find Sample Magellan Notebook:
http://tinyurl.com/zeppelinNotebooks
Page17 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Spark + Hadoop - The Road Ahead
Innovate at the Core
Seamless Data Access
Data Science Acceleration
Page18 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Magellan Walkthrough
Page19 © Hortonworks Inc. 2011 – 2015. All Rights Reserved
Maximizing Revenue for UBER drivers
Data Insight Opportunity
– Uber publishes anonymized GeoSpatial trip data
– City of San Francisco has an active Open Data program
- demographics, neighborhoods, traffic
Challenges
– What neighborhood should a driver hangout to maximize revenue?
Solution
– Leverage Spark to transform and aggregate data
– Use Magellan to do GeoSpatial Queries
uber-magellan-nb
uber-magellan-nb
uber-magellan-nb
uber-magellan-nb
Hortonworks Data Platform
Perfect Together
+ Hadoop

More Related Content

What's hot

Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power SystemsDelivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Hortonworks
 
Introduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupIntroduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability Meetup
Saptak Sen
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
DataWorks Summit
 
Hortonworks Data In Motion Series Part 4
Hortonworks Data In Motion Series Part 4Hortonworks Data In Motion Series Part 4
Hortonworks Data In Motion Series Part 4
Hortonworks
 
MiNiFi 0.0.1 MeetUp talk
MiNiFi 0.0.1 MeetUp talkMiNiFi 0.0.1 MeetUp talk
MiNiFi 0.0.1 MeetUp talk
Joe Percivall
 
Powering Big Data Success On-Prem and in the Cloud
Powering Big Data Success On-Prem and in the CloudPowering Big Data Success On-Prem and in the Cloud
Powering Big Data Success On-Prem and in the Cloud
Hortonworks
 
The Elephant in the Clouds
The Elephant in the CloudsThe Elephant in the Clouds
The Elephant in the Clouds
DataWorks Summit/Hadoop Summit
 
Welcome to Apache Hadoop's Teenage Years, Arun Murthy Keynote
Welcome to Apache Hadoop's Teenage Years, Arun Murthy KeynoteWelcome to Apache Hadoop's Teenage Years, Arun Murthy Keynote
Welcome to Apache Hadoop's Teenage Years, Arun Murthy Keynote
DataWorks Summit/Hadoop Summit
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Overview of new features in Apache Ranger
Overview of new features in Apache RangerOverview of new features in Apache Ranger
Overview of new features in Apache Ranger
DataWorks Summit
 
Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
DataWorks Summit
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
DataWorks Summit
 
Running Enterprise Workloads with an open source Hybrid Cloud Data Architecture
Running Enterprise Workloads with an open source Hybrid Cloud Data ArchitectureRunning Enterprise Workloads with an open source Hybrid Cloud Data Architecture
Running Enterprise Workloads with an open source Hybrid Cloud Data Architecture
DataWorks Summit
 
Building a data-driven authorization framework
Building a data-driven authorization frameworkBuilding a data-driven authorization framework
Building a data-driven authorization framework
DataWorks Summit
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications
Hortonworks
 
Apache Hadoop Crash Course - HS16SJ
Apache Hadoop Crash Course - HS16SJApache Hadoop Crash Course - HS16SJ
Apache Hadoop Crash Course - HS16SJ
DataWorks Summit/Hadoop Summit
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
Hortonworks
 
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataHortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Mats Johansson
 
Hortonworks, Novetta and Noble Energy Webinar
Hortonworks, Novetta and Noble Energy Webinar Hortonworks, Novetta and Noble Energy Webinar
Hortonworks, Novetta and Noble Energy Webinar Hortonworks
 

What's hot (20)

Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power SystemsDelivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
Delivering a Flexible IT Infrastructure for Analytics on IBM Power Systems
 
Introduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupIntroduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability Meetup
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
 
Hortonworks Data In Motion Series Part 4
Hortonworks Data In Motion Series Part 4Hortonworks Data In Motion Series Part 4
Hortonworks Data In Motion Series Part 4
 
MiNiFi 0.0.1 MeetUp talk
MiNiFi 0.0.1 MeetUp talkMiNiFi 0.0.1 MeetUp talk
MiNiFi 0.0.1 MeetUp talk
 
Powering Big Data Success On-Prem and in the Cloud
Powering Big Data Success On-Prem and in the CloudPowering Big Data Success On-Prem and in the Cloud
Powering Big Data Success On-Prem and in the Cloud
 
The Elephant in the Clouds
The Elephant in the CloudsThe Elephant in the Clouds
The Elephant in the Clouds
 
Welcome to Apache Hadoop's Teenage Years, Arun Murthy Keynote
Welcome to Apache Hadoop's Teenage Years, Arun Murthy KeynoteWelcome to Apache Hadoop's Teenage Years, Arun Murthy Keynote
Welcome to Apache Hadoop's Teenage Years, Arun Murthy Keynote
 
Apache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the unionApache Hadoop YARN: state of the union
Apache Hadoop YARN: state of the union
 
Overview of new features in Apache Ranger
Overview of new features in Apache RangerOverview of new features in Apache Ranger
Overview of new features in Apache Ranger
 
Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1Hortonworks Data in Motion Webinar Series - Part 1
Hortonworks Data in Motion Webinar Series - Part 1
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
Data in the Cloud Crash Course
Data in the Cloud Crash CourseData in the Cloud Crash Course
Data in the Cloud Crash Course
 
Running Enterprise Workloads with an open source Hybrid Cloud Data Architecture
Running Enterprise Workloads with an open source Hybrid Cloud Data ArchitectureRunning Enterprise Workloads with an open source Hybrid Cloud Data Architecture
Running Enterprise Workloads with an open source Hybrid Cloud Data Architecture
 
Building a data-driven authorization framework
Building a data-driven authorization frameworkBuilding a data-driven authorization framework
Building a data-driven authorization framework
 
Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications Pivotal - Advanced Analytics for Telecommunications
Pivotal - Advanced Analytics for Telecommunications
 
Apache Hadoop Crash Course - HS16SJ
Apache Hadoop Crash Course - HS16SJApache Hadoop Crash Course - HS16SJ
Apache Hadoop Crash Course - HS16SJ
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise HadoopHDP Advanced Security: Comprehensive Security for Enterprise Hadoop
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
 
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataHortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
 
Hortonworks, Novetta and Noble Energy Webinar
Hortonworks, Novetta and Noble Energy Webinar Hortonworks, Novetta and Noble Energy Webinar
Hortonworks, Novetta and Noble Energy Webinar
 

Similar to Spark Summit EMEA - Arun Murthy's Keynote

Spark + Hadoop Perfect together
Spark + Hadoop Perfect togetherSpark + Hadoop Perfect together
Spark + Hadoop Perfect together
Isheeta Sanghi
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGskumpf
 
Hadoop and Spark – Perfect Together
Hadoop and Spark – Perfect TogetherHadoop and Spark – Perfect Together
Hadoop and Spark – Perfect Together
Hortonworks
 
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Spark Summit
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
Mac Moore
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
Hortonworks
 
Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015
Mac Moore
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Hortonworks
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Innovative Management Services
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2
Hortonworks
 
OOP 2014
OOP 2014OOP 2014
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior GraphsPredicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
Hortonworks
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
Data Con LA
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
DataWorks Summit
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
DataWorks Summit
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
Hortonworks
 
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Hortonworks
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
POSSCON
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Hortonworks
 
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks
 

Similar to Spark Summit EMEA - Arun Murthy's Keynote (20)

Spark + Hadoop Perfect together
Spark + Hadoop Perfect togetherSpark + Hadoop Perfect together
Spark + Hadoop Perfect together
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 
Hadoop and Spark – Perfect Together
Hadoop and Spark – Perfect TogetherHadoop and Spark – Perfect Together
Hadoop and Spark – Perfect Together
 
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
Hadoop and Spark-Perfect Together-(Arun C. Murthy, Hortonworks)
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
 
Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
 
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
Open-BDA Hadoop Summit 2014 - Mr. Slim Baltagi (Building a Modern Data Archit...
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2
 
OOP 2014
OOP 2014OOP 2014
OOP 2014
 
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior GraphsPredicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
 
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder HortonworksThe Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
The Future of Hadoop by Arun Murthy, PMC Apache Hadoop & Cofounder Hortonworks
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
 
Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration Hortonworks Oracle Big Data Integration
Hortonworks Oracle Big Data Integration
 
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
Discover Red Hat and Apache Hadoop for the Modern Data Architecture - Part 3
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big DataCombine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
Combine Apache Hadoop and Elasticsearch to Get the Most of Your Big Data
 
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
Hortonworks and Red Hat Webinar_Sept.3rd_Part 1
 

More from Hortonworks

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
Hortonworks
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Hortonworks
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
Hortonworks
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Hortonworks
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
Hortonworks
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Hortonworks
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Hortonworks
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
Hortonworks
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
Hortonworks
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
Hortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
Hortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Hortonworks
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Hortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
Hortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Hortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
Hortonworks
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
Hortonworks
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Hortonworks
 

More from Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Recently uploaded

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
CatarinaPereira64715
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 

Spark Summit EMEA - Arun Murthy's Keynote

  • 1. Spark and Hadoop Perfect Together Arun Murthy Hortonworks Co-Founder @acmurthy © Hortonworks Inc. 2011 – 2015. All Rights Reserved
  • 2. Data Operating System Enable all data and applications TO BE accessible and shared BY any end-users
  • 3. Page3 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Data Operating System YARN: data operating system Governance Security Operations Resource management Data access: batch, interactive, real-time Storage Commodity Appliance Cloud Built on a centralized architecture of shared enterprise services: Scalable tiered storage Resource and workload management Trusted data governance & metadata management Consistent operations Comprehensive security Developer APIs and tools Hadoop/YARN-powered data operating system 100% open source, multi-tenant data platform for any application, any data set, anywhere.
  • 4. Page4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Elegant Developer APIs DataFrames, Machine Learning, and SQL Made for Data Science All apps need to get predictive at scale and fine granularity Democratize Machine Learning Spark is doing to ML on Hadoop what Hive did for SQL on Hadoop Community Broad developer, customer and partner interest Realize Value of Data Operating System A key tool in the Hadoop toolbox Why We Love Spark at Hortonworks YARN Scala Java Python R APIs Spark Core Engine Spark SQL Spark Streaming MLlib GraphX 1 ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° ° N HDFS
  • 5. Page5 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Let’s Talk Real-World Use-Cases
  • 6. Page6 © Hortonworks Inc. 2011 – 2015. All Rights ReservedPage6 © Hortonworks Inc. 2011 – 2015. All Rights Reserved DATA FLOW MANAGEMENT Streaming & Machine Learning for Web Analytics USE CASE: Cost: Storage and processing not economical at scale Silos: Separate clusters — Hadoop & Spark Analytics: Retroactive view limited predictive capabilities CHALLENGE Spark Streaming for ingesting events in real- time SparkML for predictive analytics SOLUTION Cost: Hardware costs reduced 25-50% YARN: Shared cluster for Spark, MapReduce, Hive etc. Analytics: Processes 10 billion events daily at 20 milliseconds per event IMPACT HDP
  • 7. Page7 © Hortonworks Inc. 2011 – 2015. All Rights ReservedPage7 © Hortonworks Inc. 2011 – 2015. All Rights Reserved DATA FLOW MANAGEMENT Claims Re-imbursement Processing with Spark USE CASE: INSURANCE Overwhelmed by data ingest rates Team expertise in R Lots of key features like textual features not incorporated CHALLENGE Use Spark to optimize claims reimbursements process Leverage Spark’s machine learning capabilities to process and analyze all claims. SOLUTION Insurance companies can now process all claims and detect over and under payment More accurate over and under payment detection IMPACT HDP
  • 8. Page8 © Hortonworks Inc. 2011 – 2015. All Rights Reserved On-going Customer Use Cases with Spark HealthCare Entity Resolution • Different feeds represent entities in different forms – need to identify same entities • No standard packages available to do the job. • Additional features need to be derived to expand context and disambiguate Requirement • Extensible entity resolution framework – mix supervised and unsupervised learning John Smith J. Smith A B E DC H GF name name location location =
  • 9. Page9 © Hortonworks Inc. 2011 – 2015. All Rights Reserved On-going Customer Use Cases with Spark Finance Efficiently bring HBase Data into Spark • HBase is the operational store of record • End user Analytics via Spark Requirement: • Efficient scans via predicate pushdown via co-processors ° ° ° ° ° ° ° ° °HDFS YARN
  • 10. Page10 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Spark + Hadoop - The Road Ahead Innovate at the Core Seamless Data Access Data Science Acceleration
  • 11. Page11 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Data Sources Data Science Acceleration Data Science ToolsNotebook Data Science Libraries Apache Hadoop, Hive and HBase InnovateattheCore Apache Spark Innovate at the Core Storage – RDD Sharing with HDFS Memory Tier Resource Management – Spark on YARN improvements Security – Enhance SparkSQL Security – Enhance Wire Encryption Governance – Integration with Atlas Operations – Ambari to Support Multiple Spark Versions Zeppelin Governance Security Operations APIs Spark Core EngineSpark Core Engine Spark SQL Spark Streaming MLlib GraphX 1 ° ° ° ° ° ° ° NHDFS YARN SQL Hive NoSQL Hbase Enterprise Ready Seamless DataAccess Spark + Hadoop - The Road Ahead ORC FileNiFiHBase Custom
  • 12. Page12 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Seamless Data Access Data Access – Hive ORC – Hbase Connector – NiFi Streams Spark + Hadoop - The Road Ahead Data Sources Data Science Acceleration Data Science ToolsNotebook Data Science Libraries Apache Hadoop, Hive and HBase InnovateattheCore Apache Spark Zeppelin Governance Security Operations APIs Spark Core EngineSpark Core Engine Spark SQL Spark Streaming MLlib GraphX 1 ° ° ° ° ° ° ° NHDFS YARN SQL Hive NoSQL Hbase Enterprise Ready Seamless DataAccess ORC FileNiFiHBase Custom
  • 13. Page13 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Data Science Acceleration Apache Zeppelin – Spark development and visualization Data Science Libraries – Jump start tools leveraging Spark Machine Learning (ie Magellan, Entity Resolution) Core Machine Learning – One vs Rest classifiers, multi-class classifiers – Serializing Machine Learning pipelines – Data set loader for public data sets Spark + Hadoop - The Road Ahead Data Sources Data Science Acceleration Data Science ToolsNotebook Data Science Libraries Apache Hadoop, Hive and HBase InnovateattheCore Apache Spark Zeppelin Governance Security Operations APIs Spark Core EngineSpark Core Engine Spark SQL Spark Streaming MLlib GraphX 1 ° ° ° ° ° ° ° NHDFS YARN SQL Hive NoSQL Hbase Enterprise Ready Seamless DataAccess ORC FileNiFiHBase Custom
  • 14. Page14 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Introducing Apache Zeppelin Open Web-based Notebook for interactive analytics Features Ad-hoc experimentation Deeply integrated with Spark + Hadoop Supports multiple language backends Incubating at Apache Use Case Data exploration and discovery Visualization Interactive snippet-at-a-time experience “Modern Data Science Studio”
  • 15. Page15 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Geospatial Insight Where do people go on weekends? Does usage pattern change with time? Predict the drop off point of a user? Predict the location where next pick up can be expected? Identify crime hotspots How do these hotspots evolve with time? Predict the likelihood of crime occurring at a given neighborhood Predict climate at fairly granular level Climate insurance: do I need to buy insurance for my crops? Climate as a factor in crime: Join climate dataset with Crimes
  • 16. Page16 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Magellan on Spark Packages What – Brings GeoSpatial Analytics to Big Data powered by Spark – Available at Spark Packages http://spark-packages.org/package/harsha2010/magellan Key Features – Parse geospatial data and metadata into Shapes + Metadata – Python and Scala support – Efficient Geometric Queries - ESRI Hive Library - simple and intuitive syntax – Scalable implementations of common algorithms Learn More – Magellan Blog: http://tinyurl.com/magellanBlog Find Sample Magellan Notebook: http://tinyurl.com/zeppelinNotebooks
  • 17. Page17 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Spark + Hadoop - The Road Ahead Innovate at the Core Seamless Data Access Data Science Acceleration
  • 18. Page18 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Magellan Walkthrough
  • 19. Page19 © Hortonworks Inc. 2011 – 2015. All Rights Reserved Maximizing Revenue for UBER drivers Data Insight Opportunity – Uber publishes anonymized GeoSpatial trip data – City of San Francisco has an active Open Data program - demographics, neighborhoods, traffic Challenges – What neighborhood should a driver hangout to maximize revenue? Solution – Leverage Spark to transform and aggregate data – Use Magellan to do GeoSpatial Queries

Editor's Notes

  1. NEED SPEAKER NOTES
  2. Ensuring Spark is well integrated with YARN, Ambari, and Ranger enables enterprise to deploy Spark apps with confidence, and since HDP is available across Windows, Linux, on-premises and cloud deployment environments, it just makes it that much easier for enterprises to adopt it.
  3. TALK TRACK MACHINE LEARNING WITH SPARK WEBTRENDS PROVIDES DIGITAL MARKETING ANALYTICS STATS + PROCESSES 10 BILLION EVENTS PER DAY + AT AN AVERAGE SPEED OF 20 MILLISECONDS PER EVENT [Back ground] Listen to Peter Crossley Hadoop Summit keynote(min 22:22): http://brightcove.fora.tv/services/player/bcpid4287593805001?bckey=AQ~~,AAACbMgRlRk~,KnD13XNmCDZZPWkNmxmPMFTH2h0USbHh&bclid=4287661488001&bctid=4285724101001 Read recent article on Web Trends by Peter Crossley: http://insidebigdata.com/2015/07/14/strategic-big-data-pivot-leads-webtrends-to-success/
  4. Overwhelmed by data ingest rates, reduce sample data to fit into edge node Random Forest models in R Team expertise in R and not Scala/ Java Lots of key features like textual features not incorporated (cannot handle feature blowup in R)
  5. HBase has efficient scans, however can Spark leverage it? Push predicates and prune columns
  6. HBase has efficient scans, however can Spark leverage it? Push predicates and prune columns
  7. Allow for RDD sharing with the HDFS Memory Tier. Improve dynamic resource allocation via YARN. Mature SparkSQL and Spark Streaming to GA quality. HDFS Memory Tier There are many use cases where Spark today feels less than ideal. For example, using Spark in a shared environment, with a middle tier fielding request from multiple tier is a common problem. SparkContext is a heavyweight object and is tied to a specific user session. Using Spark in shared environment requires features such as RDD sharing with HDFS memory tier and SparkContext sharing. There are other areas of improvement in Spark’s YARN integration. For example, today YARN applications logs are published when the application finishes running. This model does not work for Spark Streaming, which is a long running application and doesn’t get a chance to publish its logs. Spark’s YARN ATS integration also needs work to help it scale and not become a bottleneck. SparkSQL is another critical area where we want to add more value by bringing Hive level features such as security (SPARK-11265), ACID and vectorization features to SparkSQL and make it GA in our platform over the coming months.
  8. Seamless Data Access One Hive Seamless use of capabilities across Spark and Hive via SQL including common file formats. Deliver connectors for HBase (HFile). Spark is a great data processing engine, and it provides more value when it can process more data. The value of data lake is that it brings more data under one roof and opens new opportunity for insights and to drive efficiency. The data lake is delivered by YARN and Hadoop to provide massive scale and run all types of workload to take advantage of that data. With the DataSource API, Spark provides a first class way to bring data from external sources while leveraging these systems for filtering, predicate pushdown etc. We used DataSource API to bring ORC data into Spark. And now we are working to bring HBase data efficiently into processing with Spark. This will be different from existing Spark + HBase connector in that it will leverage HBase for predicate pushdown, column filtering and will be more efficient. Look for a tech preview of this feature in the coming months.
  9. Data science notebooks and automation for the most common analysis scenarios. Zeppelin Include support for GeoSpatial and Entity Resolution. Magellan
  10. TALK TRACK Ad-hoc experimentation Spark, Hive, Shell, Flink, Tajo, Ignite, Lens, etc Deeply integrated with Spark + Hadoop Can be managed via Ambari Stacks Supports multiple language backends Pluggable “Interpreters” Incubating at Apache 100% open source and open community [NEXT SLIDE]
  11. Hive ESRI Hive (thin wrapper on ESRI Java) Magellan available on Github (https://github.com/harsha2010/magellan) Can parse and understand most widely used formats GeoJSON, ESRIShapefile All geometries supported 1.0.3 released (http://spark-packages.org/package/harsha2010/magellan) Broadcast join available for common scenarios Work in progress (targeted 1.0.4) Geohash Join optimization Map Matching algorithm using Markov Models Python and Scala support Please give it a try and give us feedback!
  12. One of our Insurance customers is using Spark to optimize the claims reimbursements and are using Spark’s machine learning capabilities to process and analyze all claims.
  13. We have seen rapid adoption of Spark in our customer base and we want to thank our customers for choosing Spark on HDP. We also want to thank our partners Microsoft, Databricks, HP, NFLabs and the community on sharing this journey with us.