Case Study of BigData use with MapR M7 in the Enterprise Datacenter
  Zeljko Dodlek
  Sales Director DACH
  zdodlek@maprtech.com
  +49 (0) 151 120 555 07
©MapR Technologies - Confidential   1
Agenda
* Ancestry Case Study
* MapR Overview
* Q&A

Ancestry Use Case (page 1)

What does Ancestry do?
Ancestry.com is an online family history service that uses machine
learning and several other statistical techniques to provide services
such as ancestry information and DNA sequencing to its users.

Business challenges
* 10 billion records in a 4 PB data store
* 40,000 record collections (dates of birth and death, census, military status, ...)
* 2+ million subscribers
* 10+ million registered users
* DNA matching added to their offering

Ancestry Use Case (page 2)

Why MapR?
* HA requirements for the NameNode & TaskTracker
* Easy way to ingest data into the cluster
* Safe way to run different jobs on the same cluster
* Unified file & table platform

Configuration
3 separate clusters
* DNA Matching
* Machine Learning
* Data Mining

MapRTech Overview
* Enterprise-grade Hadoop distribution
* Innovations in the areas of the data platform, MapReduce and HBase
* Enabling customers to depend on our Hadoop distribution
  – No single points of failure
  – Guaranteeing SLAs
  – Easy to install, run and expand
* Professional services: installation, consulting and training
* 24x7 support

MapR Distribution




MapR’s value addition




Distribution made for the enterprise

Expanding Hadoop Use Cases
[Figure: ways of accessing a MapR cluster. Blue = MapR innovations]
* Hadoop APIs for Hadoop applications
* NFS for file-based applications
* ODBC and JDBC for SQL-based applications
* Real-time applications
* Mission-critical and SLA-dependent applications

No NameNode Architecture
[Diagram: left, other distributions (HDFS Federation) with several NameNodes covering namespaces A-F, each over its own DataNodes, plus a NAS appliance; right, MapR with A-F spread in multiple copies across the cluster nodes]

Other Distributions (HDFS Federation)      MapR
Multiple single points of failure          HA w/ automatic failover and re-replication
Limited to 50M files per NameNode          Up to 1T files (> 5000x advantage)
Performance bottleneck                     Higher performance
Commercial NAS required                    100% commodity hardware
Metadata must fit in memory                Metadata is persisted to disk

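The file-count and memory rows of the comparison can be checked with a little arithmetic. The sketch below is only illustrative: the file limits are the figures quoted on the slide and in the speaker notes, while the per-object heap cost and the one-block-per-file simplification are assumptions (a commonly cited rule of thumb for HDFS), not numbers from this deck. The function names are hypothetical.

```python
# Figures quoted in the deck.
HDFS_FILES_SLIDE = 50_000_000        # slide: "Limited to 50M files per NameNode"
HDFS_FILES_NOTES = 200_000_000       # speaker notes: "200M at the maximum"
MAPR_FILES = 1_000_000_000_000       # slide: "Up to 1T files"

# ASSUMPTION (not from the deck): rule of thumb of roughly 150 bytes of
# NameNode heap per namespace object.
BYTES_PER_OBJECT = 150
# ASSUMPTION: each file contributes one file entry plus one block entry.
OBJECTS_PER_FILE = 2


def scale_factor(mapr_files: int, hdfs_files: int) -> float:
    """How many times more files the larger limit allows."""
    return mapr_files / hdfs_files


def namenode_heap_gib(files: int) -> float:
    """Rough heap needed to hold the metadata for `files` files in memory."""
    return files * OBJECTS_PER_FILE * BYTES_PER_OBJECT / 2**30


if __name__ == "__main__":
    for label, limit in [("slide", HDFS_FILES_SLIDE), ("notes", HDFS_FILES_NOTES)]:
        print(
            f"{label}: {limit:,} files -> "
            f"{scale_factor(MAPR_FILES, limit):,.0f}x fewer than 1T, "
            f"about {namenode_heap_gib(limit):.1f} GiB of heap"
        )
```

With the 200M figure from the notes the ratio is exactly 5000x; with the 50M figure from the slide it is 20,000x. Under the stated assumptions, holding metadata for 200M files in memory needs on the order of tens of GiB of heap, which is consistent with the notes' remark that this requires a very high-end server.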
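Simplifying HBase Architecture
[Diagram: three storage stacks, top to bottom; the middle and right stacks are identified in the speaker notes as MapR M5 and MapR M7]

Other Distributions    MapR M5    MapR M7
HBase
JVM
DFS                    HBase
JVM                    JVM
ext3                   MapR       Unified
Disks                  Disks      Disks
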
Selected MapR Customers
[Customer logos not recoverable; each line lists the use cases shown for one customer]
* Intrusion detection & prevention; forensic analysis
* Recommendation engine; family tree connections
* Global threat analytics; virus analysis
* Major credit card company: recommendation engine; fraud detection and prevention
* Log analysis; HBase
* Clickstream analysis; quality profiling/field failure analysis
* Fraud detection; channel analytics
* Advertising exchange analysis and optimization
* Customer sentiment; network analytics
* Customer revenue analytics; ETL offload
* Customer targeting; social media analysis
* Monitors and measures behavior of online shoppers

Thank You







Editor's Notes

  1. MapR’s innovations have also expanded the use cases that are possible with Hadoop. We support the full Hadoop API set. MapR also provides NFS support, so any file-based application can access the cluster with no changes or rewrites required. MapR provides ODBC support, so any database application or SQL-based tool can access and manipulate data in a MapR cluster. MapR supports real-time streaming access, which greatly expands the applications that are possible with Hadoop and moves it beyond its batch limitation. Finally, the full HA, DR and data protection capabilities of MapR allow mission-critical apps to be deployed safely and allow administrators to meet stringent SLA targets.
  2. The NameNode in Hadoop today is a single point of failure, a scalability limitation, and a performance bottleneck. With MapR there is no dedicated NameNode. The NameNode function is distributed across the cluster. This provides major advantages in terms of HA, data loss avoidance, scalability and performance. With other distributions you have a bottleneck regardless of the number of nodes in the cluster. With other distributions the maximum number of files that you can support is 200M, and that is with an extremely high-end server. 50% of the Hadoop processing at Facebook goes into packing and unpacking files to try to work around this limitation. MapR scales uniformly.
  3. (ed. Note: this slide is a great whiteboard slide to summarize M7.) The stack on the left is a representation of the HBase architecture found in all other distributions. HBase is deployed on a JVM and stores its data in the HDFS layer, which runs on a JVM and in turn stores its data in the Linux file system (ext3), which writes the data to disk. This stack results in a lot of administrative tasks, performance issues, and reliability issues. A lot of the infrastructure within HBase is an attempt to make up for the deficiencies in HDFS. You basically have a database solution that needs to deal with random IO but runs on top of a write-once file system. The middle stack shows how MapR simplified the lower part of the stack with our M5 edition, which replaced HDFS and the dependency on the Linux file system with a random read/write storage layer. However, HBase is still a separate infrastructure running on top of the storage layer within M5. The region servers are separate, and users still experience downtime and delays when recovering from node failures and snapshots. With M7 on the far right, MapR has now unified tables and files into a single data platform. We’ve eliminated the separate HBase infrastructure. The environment is much simpler to manage because the various redundant components are gone. We provide a uniform data management layer across files and tables and a consistent data protection layer. Recovery from node failures takes seconds, there is 100% data locality, and HBase can read directly from snapshots. Files and tables are in the same namespace, volumes, and directories.
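
Note 1 says that any file-based application can reach the cluster over NFS with no changes. As a minimal sketch of what that means in practice, the code below uses nothing but ordinary Python file I/O; the only cluster-specific part is the directory it is pointed at. The mount path `/mapr/my.cluster.com` and the function names are assumptions made for illustration, not values taken from this deck.

```python
import csv
import tempfile
from pathlib import Path

# ASSUMPTION: hypothetical location where the cluster is NFS-mounted.
DEFAULT_MOUNT = Path("/mapr/my.cluster.com")


def append_records(root, relative_path, rows):
    """Append CSV rows to a file under `root` using plain file I/O."""
    target = Path(root) / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("a", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return target


def count_records(root, relative_path):
    """Count the CSV rows in a file under `root`."""
    with (Path(root) / relative_path).open(newline="") as handle:
        return sum(1 for _ in csv.reader(handle))


if __name__ == "__main__":
    # Fall back to a temporary directory when no cluster is mounted,
    # so the sketch still runs on a laptop.
    root = DEFAULT_MOUNT if DEFAULT_MOUNT.is_dir() else Path(tempfile.mkdtemp())
    append_records(root, "ingest/census.csv", [["1900", "Smith"], ["1910", "Jones"]])
    print(count_records(root, "ingest/census.csv"))
```

The code has no Hadoop client dependency: the same functions work against a local directory or a mounted cluster path, which is the sense in which file-based applications need no rewrite.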