SlideShare a Scribd company logo
1 of 44
Expect More from Hadoop!


©MapR Technologies - Confidential   1
My Background

     University, Startups
       – Aptex, MusicMatch, ID Analytics, Veoh
       – big data since before it was big

     Open source
       – even before the internet
       – Apache Hadoop, Mahout, Zookeeper, Drill
       – bought the beer at first HUG

 MapR
 Founding member of Apache Drill


©MapR Technologies - Confidential       2
MapR Technologies

     Enterprise quality distribution for Hadoop
       – Many extensions beyond basic Hadoop
     Super strong team
       – Long history of successful startups
     Strong supporter of Apache Drill
       –      and open source in general




©MapR Technologies - Confidential   3
meta-Hadoop?



©MapR Technologies - Confidential         4
meta
                    Meta- (from Greek: μετά = "after", "beyond",
                    "with", "adjacent", "self"), is a…




©MapR Technologies - Confidential         5
Answering
             Beyond ≠ yesterday’s
                      problems


©MapR Technologies - Confidential   6
Philosophy First




                              What is History?



©MapR Technologies - Confidential    7
The study of the past

(what came before now)


©MapR Technologies - Confidential   8
What is the future?

        (it comes after now)


©MapR Technologies - Confidential   9
©MapR Technologies - Confidential   10
©MapR Technologies - Confidential   11
But the future also
                     has a past!



©MapR Technologies - Confidential   12
the future of the past
             is not
     the past of the future



©MapR Technologies - Confidential   13
Do you remember the
                       future?



©MapR Technologies - Confidential   14
©MapR Technologies - Confidential   15
©MapR Technologies - Confidential   16
©MapR Technologies - Confidential   17
Those are
                                    yesterday’s
                                      answers

©MapR Technologies - Confidential        18
and also
                                     the seeds
                                    of tomorrow

©MapR Technologies - Confidential        19
Guys wearing
                                      Fedoras



©MapR Technologies - Confidential      20
Hadoop has
                                     a history


©MapR Technologies - Confidential       21
Hadoop also
                                       has a
                                      future

©MapR Technologies - Confidential        22
The Old Future of Hadoop

     Implementing yet another Google paper
       –   Map-reduce and HDFS, and Yarn and Tez
       –   more and more, but not really different


     Eco-system additions (more Google papers)
       –   simpler programming (Hive and Pig and Crunch) (Sawzall, FlumeJava, etc)
       –   key-value store (big table)
       –   ad hoc query (Dremel)
       –   also not really different


     Stands apart from other computing
       –   required by HDFS and other limitations

©MapR Technologies - Confidential            23
The New Future of Hadoop

     Real-time processing
       –   Combines real-time and long-time


     Integration with traditional IT
       –   No need to stand apart


     Integration with new technologies
       –   Solr, Node.js, Twisted all should work directly on Hadoop


     Fast and flexible computation
       –   Drill logical plan language


©MapR Technologies - Confidential             24
Example #1
                                    Search Abuse


©MapR Technologies - Confidential        25
History matrix

                                    One row per user

                                    One column per thing




©MapR Technologies - Confidential        26
Recommendation based on
                                    cooccurrence

                                    Cooccurrence gives item-item
                                    mapping

                                    One row and column per thing




©MapR Technologies - Confidential         27
Cooccurrence matrix can also be
                                    implemented as a search index




©MapR Technologies - Confidential         28
SolR
                                                              SolR
                          Complete    Cooccurrence            Indexer
                                                            Solr
                                                            Indexer
                            history     (Mahout)          indexing




                                        Item meta-             Index
                                           data               shards




©MapR Technologies - Confidential                    29
SolR
                                                             SolR
                                User                         Indexer
                                                           Solr
                                        Web tier           Indexer
                              history                     search




                                        Item meta-
                                                              Index
                                           data              shards




©MapR Technologies - Confidential                    30
Objective Results

     At a very large credit card company


     History is all transactions, all web interaction


     Processing time cut from 20 hours per day to 3


     Recommendation engine load time decreased from 8 hours to 3
      minutes




©MapR Technologies - Confidential       31
Scaling Estimates – Twitter Fire hose

     Old School – 8+ separate              MapR – one platform
      clusters, 20-25 nodes                  –   5-10 nodes total, any node does any
       –   >3 Kafka nodes                        job
       –   >2 TwitterLogger                  –   Full HA included,
       –   5-10 Hadoop                           backups included,
       –   >3 Storm                              disaster recovery included
       –   3 zookeepers (or not?)
       –   NAS for web storage
       –   >2 web servers




©MapR Technologies - Confidential   32
Example #2
                         Web Technology


©MapR Technologies - Confidential   33
Real-time   Fast analysis
                                         data     (Storm)




                                                   Analytic
                                                                   Raw logs
                                                   output




©MapR Technologies - Confidential                             34
Large analysis
                                                    (map-reduce)




                                    Analytic
                                                       Raw logs
                                    output




©MapR Technologies - Confidential              35
Presentation
                                    Browser
                                                tier (d3 +
                                      query
                                                 node.js)




                                                 Analytic
                                                                 Raw logs
                                                 output




©MapR Technologies - Confidential                           36
Old School Storm: Complex architecture
               Twitter
                                    Twitter
                                       API                              Kafka
                                                        Kafka             API
                TwitterLogger                            Kafka
                                                          Kafka
                                                       Cluster
                                                        Cluster
                                                         Cluster
                                                                                        Storm
                                               Kafka                            Storm



                                                                                            Web
                                              Flume                                         Data
                                                                                            NAS

                                                       HDFS
                                                       Data
                                      Hadoop                                                    http



                                                                                          Web-server
©MapR Technologies - Confidential                                  37
MapR: One Platform with Streaming Writes
          Twitter


                           Twitter
                              API                                                      http



                                       Catcher                                       Web-server
                    TwitterLogger       Catcher                 Storm

                                            NFS     NFS                 NFS     NFS
                       Optional
                                     HDFS
                      MapReduce               Topic                           Web
                                      API
                                              Queue                           Data
            MapR
                                                  Users can also run extended
                                                  analytics/MapReduce on the stored
                                                  data

©MapR Technologies - Confidential                   38
©MapR Technologies - Confidential   39
Objective Results

     Real-time + long-time analysis is seamless


     Web tier can be rooted directly on Hadoop cluster


     No need to move data




©MapR Technologies - Confidential     40
The future is
                               not what we
                                thought it
                                 would be

©MapR Technologies - Confidential   41
It is better!



©MapR Technologies - Confidential   42
Get Involved!


                                       Tweet:
                                    #strataconf
                                        #mapr
                                    @ted_dunning


©MapR Technologies - Confidential         43
Get Involved!

     Join Apache Drill!
       – drill-dev-subscribe@incubator.apache.org
       – Follow @apachedrill


     Join MapR!
       –   jobs@mapr.com

     Download these slides
       –   http://www.mapr.com/company/events/strata-conference-2-2-27-13

     Contact me:
       – tdunning@maprtech.com
       – tdunning@apache.org
       – @ted_dunning



©MapR Technologies - Confidential           44

More Related Content

What's hot

Graphlab dunning-clustering
Graphlab dunning-clusteringGraphlab dunning-clustering
Graphlab dunning-clusteringTed Dunning
 
Goto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedGoto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedTed Dunning
 
Design Patterns for working with Fast Data
Design Patterns for working with Fast DataDesign Patterns for working with Fast Data
Design Patterns for working with Fast DataMapR Technologies
 
Polyvalent recommendations
Polyvalent recommendationsPolyvalent recommendations
Polyvalent recommendationsTed Dunning
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Carol McDonald
 
Addressing Emerging Challenges in Designing HPC Runtimes
Addressing Emerging Challenges in Designing HPC RuntimesAddressing Emerging Challenges in Designing HPC Runtimes
Addressing Emerging Challenges in Designing HPC Runtimesinside-BigData.com
 
Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19ExtremeEarth
 
Bda-dunning-2012-12-06
Bda-dunning-2012-12-06Bda-dunning-2012-12-06
Bda-dunning-2012-12-06Ted Dunning
 

What's hot (8)

Graphlab dunning-clustering
Graphlab dunning-clusteringGraphlab dunning-clustering
Graphlab dunning-clustering
 
Goto amsterdam-2013-skinned
Goto amsterdam-2013-skinnedGoto amsterdam-2013-skinned
Goto amsterdam-2013-skinned
 
Design Patterns for working with Fast Data
Design Patterns for working with Fast DataDesign Patterns for working with Fast Data
Design Patterns for working with Fast Data
 
Polyvalent recommendations
Polyvalent recommendationsPolyvalent recommendations
Polyvalent recommendations
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
 
Addressing Emerging Challenges in Designing HPC Runtimes
Addressing Emerging Challenges in Designing HPC RuntimesAddressing Emerging Challenges in Designing HPC Runtimes
Addressing Emerging Challenges in Designing HPC Runtimes
 
Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19Scalable Deep Learning in ExtremeEarth-phiweek19
Scalable Deep Learning in ExtremeEarth-phiweek19
 
Bda-dunning-2012-12-06
Bda-dunning-2012-12-06Bda-dunning-2012-12-06
Bda-dunning-2012-12-06
 

Viewers also liked

CMU Lecture on Hadoop Performance
CMU Lecture on Hadoop PerformanceCMU Lecture on Hadoop Performance
CMU Lecture on Hadoop PerformanceMapR Technologies
 
Practical Computing Wiith Chaos
Practical Computing Wiith ChaosPractical Computing Wiith Chaos
Practical Computing Wiith ChaosMapR Technologies
 
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael HausenblasBerlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael HausenblasMapR Technologies
 
Buzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal RecommendationsBuzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal RecommendationsMapR Technologies
 
Hadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchHadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchMapR Technologies
 
Graphlab Ted Dunning Clustering
Graphlab Ted Dunning  ClusteringGraphlab Ted Dunning  Clustering
Graphlab Ted Dunning ClusteringMapR Technologies
 
Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill MapR Technologies
 

Viewers also liked (8)

CMU Lecture on Hadoop Performance
CMU Lecture on Hadoop PerformanceCMU Lecture on Hadoop Performance
CMU Lecture on Hadoop Performance
 
Practical Computing Wiith Chaos
Practical Computing Wiith ChaosPractical Computing Wiith Chaos
Practical Computing Wiith Chaos
 
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael HausenblasBerlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
Berlin Buzz Words - Apache Drill by Ted Dunning & Michael Hausenblas
 
Buzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal RecommendationsBuzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal Recommendations
 
Strata New York 2012
Strata New York 2012Strata New York 2012
Strata New York 2012
 
Hadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 MarchHadoop Summit - Hausenblas 20 March
Hadoop Summit - Hausenblas 20 March
 
Graphlab Ted Dunning Clustering
Graphlab Ted Dunning  ClusteringGraphlab Ted Dunning  Clustering
Graphlab Ted Dunning Clustering
 
Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill Berlin Hadoop Get Together Apache Drill
Berlin Hadoop Get Together Apache Drill
 

Similar to Dunning strata-2012-27-02

Boston hug-2012-07
Boston hug-2012-07Boston hug-2012-07
Boston hug-2012-07Ted Dunning
 
Predictive Analytics San Diego
Predictive Analytics San DiegoPredictive Analytics San Diego
Predictive Analytics San DiegoMapR Technologies
 
The power of hadoop in business
The power of hadoop in businessThe power of hadoop in business
The power of hadoop in businessMapR Technologies
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Codemotion
 
How to Make Hadoop Easy, Dependable and Fast
How to Make Hadoop Easy, Dependable and FastHow to Make Hadoop Easy, Dependable and Fast
How to Make Hadoop Easy, Dependable and FastMapR Technologies
 
Super-Fast Clustering Report in MapR
Super-Fast Clustering Report in MapRSuper-Fast Clustering Report in MapR
Super-Fast Clustering Report in MapRData Science London
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Mathieu Dumoulin
 
What is the past future tense of data?
What is the past future tense of data?What is the past future tense of data?
What is the past future tense of data?Ted Dunning
 
Boston Hug by Ted Dunning 2012
Boston Hug by Ted Dunning 2012Boston Hug by Ted Dunning 2012
Boston Hug by Ted Dunning 2012MapR Technologies
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...DataWorks Summit/Hadoop Summit
 
Introduction to Mahout
Introduction to MahoutIntroduction to Mahout
Introduction to MahoutTed Dunning
 
Introduction to Mahout given at Twin Cities HUG
Introduction to Mahout given at Twin Cities HUGIntroduction to Mahout given at Twin Cities HUG
Introduction to Mahout given at Twin Cities HUGMapR Technologies
 
Hug france-2012-12-04
Hug france-2012-12-04Hug france-2012-12-04
Hug france-2012-12-04Ted Dunning
 
Container and Kubernetes without limits
Container and Kubernetes without limitsContainer and Kubernetes without limits
Container and Kubernetes without limitsAntje Barth
 
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Mathieu Dumoulin
 
David Loureiro - Presentation at HP's HPC & OSL TES
David Loureiro - Presentation at HP's HPC & OSL TESDavid Loureiro - Presentation at HP's HPC & OSL TES
David Loureiro - Presentation at HP's HPC & OSL TESSysFera
 
Cmu Lecture on Hadoop Performance
Cmu Lecture on Hadoop PerformanceCmu Lecture on Hadoop Performance
Cmu Lecture on Hadoop PerformanceTed Dunning
 

Similar to Dunning strata-2012-27-02 (20)

Boston hug-2012-07
Boston hug-2012-07Boston hug-2012-07
Boston hug-2012-07
 
Predictive Analytics San Diego
Predictive Analytics San DiegoPredictive Analytics San Diego
Predictive Analytics San Diego
 
The power of hadoop in business
The power of hadoop in businessThe power of hadoop in business
The power of hadoop in business
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
 
How to Make Hadoop Easy, Dependable and Fast
How to Make Hadoop Easy, Dependable and FastHow to Make Hadoop Easy, Dependable and Fast
How to Make Hadoop Easy, Dependable and Fast
 
Super-Fast Clustering Report in MapR
Super-Fast Clustering Report in MapRSuper-Fast Clustering Report in MapR
Super-Fast Clustering Report in MapR
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
 
What is the past future tense of data?
What is the past future tense of data?What is the past future tense of data?
What is the past future tense of data?
 
Boston Hug by Ted Dunning 2012
Boston Hug by Ted Dunning 2012Boston Hug by Ted Dunning 2012
Boston Hug by Ted Dunning 2012
 
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
 
Introduction to Mahout
Introduction to MahoutIntroduction to Mahout
Introduction to Mahout
 
Introduction to Mahout given at Twin Cities HUG
Introduction to Mahout given at Twin Cities HUGIntroduction to Mahout given at Twin Cities HUG
Introduction to Mahout given at Twin Cities HUG
 
Hug france-2012-12-04
Hug france-2012-12-04Hug france-2012-12-04
Hug france-2012-12-04
 
Hug france-2012-12-04
Hug france-2012-12-04Hug france-2012-12-04
Hug france-2012-12-04
 
Container and Kubernetes without limits
Container and Kubernetes without limitsContainer and Kubernetes without limits
Container and Kubernetes without limits
 
Introduction to Spark
Introduction to SparkIntroduction to Spark
Introduction to Spark
 
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
 
David Loureiro - Presentation at HP's HPC & OSL TES
David Loureiro - Presentation at HP's HPC & OSL TESDavid Loureiro - Presentation at HP's HPC & OSL TES
David Loureiro - Presentation at HP's HPC & OSL TES
 
Cmu Lecture on Hadoop Performance
Cmu Lecture on Hadoop PerformanceCmu Lecture on Hadoop Performance
Cmu Lecture on Hadoop Performance
 

More from MapR Technologies

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscapeMapR Technologies
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureMapR Technologies
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsMapR Technologies
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionMapR Technologies
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformMapR Technologies
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareMapR Technologies
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsMapR Technologies
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsMapR Technologies
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR Technologies
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLMapR Technologies
 

More from MapR Technologies (20)

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscape
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 

Recently uploaded

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Recently uploaded (20)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Dunning strata-2012-27-02

  • 1. Expect More from Hadoop! ©MapR Technologies - Confidential 1
  • 2. My Background  University, Startups – Aptex, MusicMatch, ID Analytics, Veoh – big data since before it was big  Open source – even before the internet – Apache Hadoop, Mahout, Zookeeper, Drill – bought the beer at first HUG  MapR  Founding member of Apache Drill ©MapR Technologies - Confidential 2
  • 3. MapR Technologies  Enterprise quality distribution for Hadoop – Many extensions beyond basic Hadoop  Super strong team – Long history of successful startups  Strong supporter of Apache Drill – and open source in general ©MapR Technologies - Confidential 3
  • 5. meta Meta- (from Greek: μετά = "after", "beyond", "with", "adjacent", "self"), is a… ©MapR Technologies - Confidential 5
  • 6. Answering Beyond ≠ yesterday’s problems ©MapR Technologies - Confidential 6
  • 7. Philosophy First What is History? ©MapR Technologies - Confidential 7
  • 8. The study of the past (what came before now) ©MapR Technologies - Confidential 8
  • 9. What is the future? (it comes after now) ©MapR Technologies - Confidential 9
  • 10. ©MapR Technologies - Confidential 10
  • 11. ©MapR Technologies - Confidential 11
  • 12. But the future also has a past! ©MapR Technologies - Confidential 12
  • 13. the future of the past is not the past of the future ©MapR Technologies - Confidential 13
  • 14. Do you remember the future? ©MapR Technologies - Confidential 14
  • 15. ©MapR Technologies - Confidential 15
  • 16. ©MapR Technologies - Confidential 16
  • 17. ©MapR Technologies - Confidential 17
  • 18. Those are yesterday’s answers ©MapR Technologies - Confidential 18
  • 19. and also the seeds of tomorrow ©MapR Technologies - Confidential 19
  • 20. Guys wearing Fedoras ©MapR Technologies - Confidential 20
  • 21. Hadoop has a history ©MapR Technologies - Confidential 21
  • 22. Hadoop also has a future ©MapR Technologies - Confidential 22
  • 23. The Old Future of Hadoop  Implementing yet another Google paper – Map-reduce and HDFS, and Yarn and Tez – more and more, but not really different  Eco-system additions (more Google papers) – simpler programming (Hive and Pig and Crunch) (Sawzall, FlumeJava, etc) – key-value store (big table) – ad hoc query (Dremel) – also not really different  Stands apart from other computing – required by HDFS and other limitations ©MapR Technologies - Confidential 23
  • 24. The New Future of Hadoop  Real-time processing – Combines real-time and long-time  Integration with traditional IT – No need to stand apart  Integration with new technologies – Solr, Node.js, Twisted all should work directly on Hadoop  Fast and flexible computation – Drill logical plan language ©MapR Technologies - Confidential 24
  • 25. Example #1 Search Abuse ©MapR Technologies - Confidential 25
  • 26. History matrix One row per user One column per thing ©MapR Technologies - Confidential 26
  • 27. Recommendation based on cooccurrence Cooccurrence gives item-item mapping One row and column per thing ©MapR Technologies - Confidential 27
  • 28. Cooccurrence matrix can also be implemented as a search index ©MapR Technologies - Confidential 28
  • 29. SolR SolR Complete Cooccurrence Indexer Solr Indexer history (Mahout) indexing Item meta- Index data shards ©MapR Technologies - Confidential 29
  • 30. SolR SolR User Indexer Solr Web tier Indexer history search Item meta- Index data shards ©MapR Technologies - Confidential 30
  • 31. Objective Results  At a very large credit card company  History is all transactions, all web interaction  Processing time cut from 20 hours per day to 3  Recommendation engine load time decreased from 8 hours to 3 minutes ©MapR Technologies - Confidential 31
  • 32. Scaling Estimates – Twitter Fire hose  Old School – 8+ separate  MapR – one platform clusters, 20-25 nodes – 5-10 nodes total, any node does any – >3 Kafka nodes job – >2 TwitterLogger – Full HA included, – 5-10 Hadoop backups included, – >3 Storm disaster recovery included – 3 zookeepers (or not?) – NAS for web storage – >2 web servers ©MapR Technologies - Confidential 32
  • 33. Example #2 Web Technology ©MapR Technologies - Confidential 33
  • 34. Real-time Fast analysis data (Storm) Analytic Raw logs output ©MapR Technologies - Confidential 34
  • 35. Large analysis (map-reduce) Analytic Raw logs output ©MapR Technologies - Confidential 35
  • 36. Presentation Browser tier (d3 + query node.js) Analytic Raw logs output ©MapR Technologies - Confidential 36
  • 37. Old School Storm: Complex architecture Twitter Twitter API Kafka Kafka API TwitterLogger Kafka Kafka Cluster Cluster Cluster Storm Kafka Storm Web Flume Data NAS HDFS Data Hadoop http Web-server ©MapR Technologies - Confidential 37
  • 38. MapR: One Platform with Streaming Writes Twitter Twitter API http Catcher Web-server TwitterLogger Catcher Storm NFS NFS NFS NFS Optional HDFS MapReduce Topic Web API Queue Data MapR Users can also run extended analytics/MapReduce on the stored data ©MapR Technologies - Confidential 38
  • 39. ©MapR Technologies - Confidential 39
  • 40. Objective Results  Real-time + long-time analysis is seamless  Web tier can be rooted directly on Hadoop cluster  No need to move data ©MapR Technologies - Confidential 40
  • 41. The future is not what we thought it would be ©MapR Technologies - Confidential 41
  • 42. It is better! ©MapR Technologies - Confidential 42
  • 43. Get Involved! Tweet: #strataconf #mapr @ted_dunning ©MapR Technologies - Confidential 43
  • 44. Get Involved!  Join Apache Drill! – drill-dev-subscribe@incubator.apache.org – Follow @apachedrill  Join MapR! – jobs@mapr.com  Download these slides – http://www.mapr.com/company/events/strata-conference-2-2-27-13  Contact me: – tdunning@maprtech.com – tdunning@apache.org – @ted_dunning ©MapR Technologies - Confidential 44

Editor's Notes

  1. Take all of Twitter400 x 10^6 tweets per day < 400 GB per day < 40MB/s
  2. Kafka is a message Queuing system
  3. Catcher is a processorAll of the systems can be run out of Hadoop. Warden can be configured to run Storm as well. Simple Architecture – all from one platform. The green blocks are data that is available for other analytics.