Hortonworks
Enterprise Apache Hadoop



March 5, 2013




© Hortonworks Inc. 2013    Page 1
Hortonworks
•  Who is Hortonworks

•  Our Approach

•  Customer Use Cases




Housekeeping Items
•  Restrooms on 2nd and 4th Floors

•  Hadoop Summit
   –  March 20-21 in Amsterdam
   –  Pre-conference training on March 18-19
       –  Discount Code Amst13Spon20

•  Download Sandbox
   –  QR code on the postcard on the table




A Brief History of Apache Hadoop

•  Timeline, 2004 to 2013: Apache project established; Yahoo! begins to operate at scale; Hortonworks Data Platform; Enterprise Hadoop (2013)
•  2005: Yahoo! creates team under E14 to work on Hadoop
   –  Focus on INNOVATION
•  2008: Yahoo! team extends focus to operations to support multiple projects & growing clusters
   –  Focus on OPERATIONS
•  2011: Hortonworks created to focus on “Enterprise Hadoop”. Starts with 24 key Hadoop engineers from Yahoo!
   –  STABILITY



Hortonworks Snapshot

We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution

•  Headquarters: Palo Alto, CA
•  Employees: 180+ and growing
•  Investors: Benchmark, Index, Yahoo

Develop
•  We employ the core architects, builders and operators of Apache Hadoop
•  We drive innovation within Apache Software Foundation projects

Distribute
•  We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform
•  We engineer, test & certify HDP for enterprise usage

Support
•  We are uniquely positioned to deliver the highest quality of Hadoop support
•  We enable the ecosystem to work better with Hadoop


Endorsed by Strategic Partners




Hortonworks
•  Who is Hortonworks
•  Our approach
  –    Leading Open Source Hadoop innovation
  –    Addressing “Enterprise Hadoop” Requirements
  –    Enabling Interoperability of the Ecosystem
  –    Ensuring No Lock-In: 100% Open Source
•  Patterns of Use




Apache Community Leadership
[Diagram: Apache Hadoop (Design & Develop, Test & Patch, Release) surrounded by Apache Pig, Apache Hive, Apache HBase, Apache HCatalog, Apache Ambari and other Apache projects]

Apache Software Foundation Guiding Principles
•  Release early & often
•  Transparency, respect, meritocracy

Key Roles held by Hortonworkers
•  VP & PMC Members
   –  Arun Murthy (Hadoop), Daniel Dai (Pig), Mahadev Konar (Zookeeper)
•  Release Managers
   –  Matt Foley (Hadoop 1.x), Arun Murthy (Hadoop 2.x), Ashutosh Chauhan (Hive), Daniel Dai (Pig), Alan Gates (HCatalog), Mahadev Konar (Ambari)
•  Committers
   –  54 across all Hadoop-related projects

“We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.”
   - Jeff Kelly: Wikibon

Leadership that Starts at the Core
•  Driving next generation Hadoop
   –  YARN, MapReduce2, HDFS2, High
      Availability, Disaster Recovery


•  420k+ lines of code authored since 2006
   –  More than twice the nearest contributor


•  Deeply integrating with the ecosystem
   –  Enabling new deployment platforms
        –  (e.g. Windows & Azure, Linux & VMware HA)
   –  Creating deeply engineered solutions
        –  (e.g. Teradata big data appliance)



•  All Apache, NO holdbacks
   –  100% of code contributed to Apache




Driving Enterprise Hadoop Innovation
[Chart: Lines of Code by Company, one bar per project on a 0% to 100% scale; legend: Hortonworks, Yahoo!, Cloudera, Other. Source: Apache Software Foundation]

Project        Hortonworks Committers    Cloudera Committers
Hadoop Core            19                        9
Pig                     5                        1
Hive                    1                        0
HCatalog                5                        0
HBase                   3                        7
Ambari                 14                        0


Hortonworks Process for Enterprise Hadoop
A virtuous cycle: development and issue fixes are done upstream, and stable project releases flow downstream.

•  Upstream Community Projects
   –  Apache Hadoop (Design & Develop, Test & Patch, Release), together with Apache Pig, Apache Hive, Apache HBase, Apache HCatalog, Apache Ambari and other Apache projects
•  Downstream Enterprise Product
   –  Hortonworks Data Platform: Design & Develop, Integrate & Test, Package & Certify, Distribute
•  Flows between the two
   –  Stable Project Releases
   –  Fixed Issues

No Lock-in: Integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects


Hortonworks
•  Who is Hortonworks
•  Our approach
  –    Leading Open Source Hadoop Innovation
  –    Addressing “Enterprise Hadoop” Requirements
  –    Enabling Interoperability of the Ecosystem
  –    Ensuring NO LOCK-IN: 100% Open Source
•  Patterns of use




Enhancing the Core of Apache Hadoop
Deliver high-scale storage & processing with enterprise-ready platform services

[Diagram: stack of HADOOP CORE (Distributed Storage & Processing) and PLATFORM SERVICES (Enterprise Readiness)]

Unique Focus Areas:
•  Bigger, faster, more flexible
   –  Continued focus on speed & scale and enabling near-real-time apps
•  Tested & certified at scale
   –  Run ~1300 system tests on large Yahoo clusters for every release
•  Enterprise-ready services
   –  High availability, disaster recovery, snapshots, security, …

Hortonworkers are the architects, operators, and builders of core Hadoop


Data Services for Full Data Lifecycle

Provide data services to store, process & access data in many ways

[Diagram: DATA SERVICES (Store, Process and Access Data) added to the stack of HADOOP CORE (Distributed Storage & Processing) and PLATFORM SERVICES (Enterprise Readiness)]

Unique Focus Areas:
•  Apache HCatalog
   –  Metadata services for consistent table access to Hadoop data
•  Apache Hive
   –  Explore & process Hadoop data via SQL & ODBC-compliant BI tools

Hortonworks enables Hadoop data to be accessed via existing tools & systems




Operational Services for Ease of Use

Include complete operational services for productive operations & management

[Diagram: OPERATIONAL SERVICES (Manage & Operate at Scale) and DATA SERVICES (Store, Process and Access Data) added to the stack of HADOOP CORE (Distributed Storage & Processing) and PLATFORM SERVICES (Enterprise Readiness)]

Unique Focus Area:
•  Apache Ambari:
   –  Provision, manage & monitor a cluster
   –  Complete REST APIs to integrate with existing operational tools
   –  Job & task visualizer to diagnose issues

Only Hortonworks provides a complete open source Hadoop management tool




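To make the "REST APIs to integrate with existing operational tools" point concrete, here is a minimal sketch of how an external tool might prepare a call that lists the clusters an Ambari server manages. It is an illustration, not part of the original deck: the host name, credentials and helper name `build_cluster_list_request` are hypothetical, and it assumes Ambari's v1 endpoint layout (`/api/v1/clusters`), HTTP basic authentication and the default port 8080. The request is only built here, not sent.

```python
import base64
import urllib.request


def build_cluster_list_request(host: str, user: str, password: str,
                               port: int = 8080) -> urllib.request.Request:
    """Build (but do not send) a GET request for Ambari's cluster list.

    Assumes the v1 REST layout and HTTP basic auth; adjust for your setup.
    """
    url = f"http://{host}:{port}/api/v1/clusters"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical host and credentials, for illustration only.
request = build_cluster_list_request("ambari.example.com", "admin", "admin")
print(request.get_method(), request.full_url)

# To actually call the server (requires a reachable Ambari instance):
# import json
# with urllib.request.urlopen(request, timeout=10) as response:
#     payload = json.load(response)  # inspect the payload to find the cluster entries
```

An existing monitoring or ticketing tool could poll an endpoint like this on a schedule and raise an alert from the response, which is the kind of integration the slide refers to.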
Deployable Across a Range of Options

Only Hortonworks allows you to deploy seamlessly across any deployment option
•  Linux & Windows
•  Azure, Rackspace & other clouds
•  Virtual platforms
•  Big data appliances

[Diagram: HORTONWORKS DATA PLATFORM (HDP), made up of OPERATIONAL SERVICES (Manage & Operate at Scale), DATA SERVICES (Store, Process and Access Data), HADOOP CORE (Distributed Storage & Processing) and PLATFORM SERVICES (Enterprise Readiness), deployed on OS, Cloud, VM or Appliance]




HDP: Enterprise Hadoop Distribution

Hortonworks Data Platform (HDP): Enterprise Hadoop
•  The ONLY 100% open source and complete distribution
•  Enterprise grade, proven and tested at scale
•  Ecosystem endorsed to ensure interoperability

[Diagram: HORTONWORKS DATA PLATFORM (HDP), made up of OPERATIONAL SERVICES (Manage & Operate at Scale), DATA SERVICES (Store, Process and Access Data), HADOOP CORE (Distributed Storage & Processing) and PLATFORM SERVICES (Enterprise Readiness), deployed on OS, Cloud, VM or Appliance]




HDP 1.2: Data Services Improvements

Hortonworks Data Platform (HDP): Enterprise Hadoop
•  The ONLY 100% open source and complete distribution
•  Enterprise grade, proven and tested at scale
•  Ecosystem endorsed to ensure interoperability

[Diagram: HORTONWORKS DATA PLATFORM (HDP) components, deployed on OS, Cloud, VM or Appliance]
•  OPERATIONAL SERVICES: Ambari, Oozie
•  DATA SERVICES: Flume, Sqoop, Pig, Hive, HCatalog, HBase
•  HADOOP CORE: WebHDFS, HDFS, MapReduce, YARN (in 2.0)
•  PLATFORM SERVICES: Enterprise Readiness (High Availability, Disaster Recovery, Snapshots, Security, etc…)




Latest Hortonworks Announcements
Two releases in January 2013

•  January 15: Hortonworks Data Platform 1.2
   –  Hortonworks Brings Enterprise Manageability to 100% Open Source Apache Hadoop Distribution
•  January 22: Hortonworks Sandbox
   –  Hortonworks accelerates Hadoop skills development with an easy-to-use, flexible and extensible platform to learn, evaluate and use Apache Hadoop


Latest Hortonworks Announcements
February 2013

•  February 20: New Apache projects
   –  Hortonworks fuels open source by releasing three new projects: Knox, Tez and Stinger
•  February 25: HDP available on Microsoft Windows
   –  To help Hadoop adoption, Hortonworks releases HDP on Microsoft Windows




Hortonworks
•  Who is Hortonworks
•  Our approach
  –    Leading Open Source Hadoop Innovation
  –    Addressing “Enterprise Hadoop” Requirements
  –    Enabling Interoperability of the Ecosystem
  –    Ensuring No Lock-in: 100% Open Source
•  Patterns of use




Existing Data Architecture
[Diagram: existing data architecture in three layers, with tools alongside]
•  APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
•  DATA SYSTEMS: Traditional Repos (RDBMS, EDW, MPP)
•  DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); OLTP, POS Systems
•  DEV & DATA TOOLS: Build & Test
•  OPERATIONAL TOOLS: Manage & Monitor




An Emerging Data Architecture
[Diagram: emerging data architecture in three layers, with tools alongside]
•  APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
•  DATA SYSTEMS: Traditional Repos (RDBMS, EDW, MPP); Hortonworks Data Platform
•  DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media); OLTP, POS Systems; Mobile Data
•  DEV & DATA TOOLS: Build & Test
•  OPERATIONAL TOOLS: Manage & Monitor




Interoperating With Your Tools
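[Diagram: the emerging data architecture annotated with partner tools; the labels "Microsoft Applications" and "Viewpoint" appear in the diagram]
•  APPLICATIONS: Microsoft Applications
•  DATA SYSTEMS: Traditional Repos; Hortonworks Data Platform
•  DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media); OLTP, POS Systems; Mobile Data
•  DEV & DATA TOOLS
•  OPERATIONAL TOOLS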




Hortonworks
•  Who is Hortonworks
•  Our approach
  –    Leading Open Source Hadoop Innovation
  –    Addressing “Enterprise Hadoop” Requirements
  –    Enabling Interoperability of the Ecosystem
  –    Ensuring No Lock-In: 100% Open Source
•  Patterns of use




Hortonworks
•  Who is Hortonworks
•  Our approach
•  Patterns of use




Operational Data Refinery
Patterns: Refine | Explore | Enrich

Collect data and apply a known algorithm to it in a trusted operational process

1   Capture
    Capture all data
2   Process
    Parse, cleanse, apply structure & transform
3   Exchange
    Push to existing data warehouse for use with existing analytic tools

[Diagram: the data architecture with markers 1, 2 and 3 showing where each step takes place]
•  APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
•  DATA SYSTEMS: Traditional Repos (RDBMS, EDW, MPP); Hortonworks Data Platform
•  DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)




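The refinery steps (capture, process, exchange) can be sketched in a few lines of plain Python. This is only a toy illustration of the pattern under stated assumptions, not HDP code: the log format, the field names and the function names (`capture`, `parse`, `process`, `exchange`) are all hypothetical, and a real deployment would run the equivalent logic on the cluster and load the result into the warehouse with its own bulk loader.

```python
import csv
import io
import re
from typing import Iterable, Iterator, Optional

# Hypothetical raw web-log format, used only for this illustration:
# "<ip> <timestamp> <path> <status>"
LOG_PATTERN = re.compile(r"^(?P<ip>\S+) (?P<ts>\S+) (?P<path>\S+) (?P<status>\d{3})$")


def capture(raw_lines: Iterable[str]) -> Iterator[str]:
    """Step 1, Capture: keep every record, including malformed ones."""
    for line in raw_lines:
        yield line.rstrip("\n")


def parse(line: str) -> Optional[dict]:
    """Parse a raw line into named fields, or return None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def process(lines: Iterable[str]) -> Iterator[dict]:
    """Step 2, Process: parse, cleanse, apply structure and transform."""
    for line in lines:
        record = parse(line)
        if record is None:  # cleanse: drop records that cannot be parsed
            continue
        yield {
            "ip": record["ip"],
            "day": record["ts"][:10],         # transform: timestamp to day
            "path": record["path"].lower(),   # transform: normalize case
            "status": int(record["status"]),  # apply structure: typed column
        }


def exchange(rows: Iterable[dict]) -> str:
    """Step 3, Exchange: emit CSV that a warehouse bulk loader could ingest."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["ip", "day", "path", "status"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


raw = [
    "10.0.0.1 2013-03-05T09:15:00 /Index.html 200\n",
    "this line is garbage\n",
    "10.0.0.2 2013-03-05T09:16:30 /cart 500\n",
]
refined_csv = exchange(process(capture(raw)))
print(refined_csv)
```

The stages are generators chained together, so raw data is captured in full and only structured, cleansed rows reach the exchange step.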
Big Data Exploration & Visualization
Patterns: Refine | Explore | Enrich

Collect data and perform iterative investigation for value

1   Capture
    Capture all data
2   Process
    Parse, cleanse, apply structure & transform
3   Exchange
    Explore and visualize with analytics tools supporting Hadoop

[Diagram: the data architecture with markers 1, 2 and 3 showing where each step takes place]
•  APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
•  DATA SYSTEMS: Traditional Repos (RDBMS, EDW, MPP); Hortonworks Data Platform
•  DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensor data, social media)




                                                                                                                                                                      Page 27
Application Enrichment
Refine    Explore       Enrich

Collect data, analyze and present salient results for online apps

APPLICATIONS
•  Custom Applications
•  Enterprise Applications

DATA SYSTEMS
•  HORTONWORKS DATA PLATFORM
•  TRADITIONAL REPOS: RDBMS, EDW, MPP, NOSQL

DATA SOURCES
•  Traditional Sources (RDBMS, OLTP, OLAP)
•  New Sources (web logs, email, sensor data, social media)

1   Capture
    –  Capture all data
2   Process
    –  Parse, cleanse, apply structure & transform
3   Exchange
    –  Incorporate data directly into applications

                                                                                                                                                                      Page 28
Key 2013 “Enterprise Hadoop” Initiatives

Invest In:
–  Platform Services
    –  DR, Snapshot, …
–  Data Services
    –  In support of Refine, Explore, Enrich
–  Operational Services
    –  Manageability, Security, …

HORTONWORKS DATA PLATFORM (HDP)
OPERATIONAL SERVICES | DATA SERVICES | HADOOP CORE | PLATFORM SERVICES

•  Tez / “Stinger”: Interactive Query
•  Ambari: Manage & Operate
•  HBase: Online Data
•  “Gateway”: Secure Access
•  “Herd”: Data Integration
•  “Continuum”: Biz Continuity




                                                                                                          Page 29
Stinger: Make Hive Best for All Needs

Interactive (5s – 1m)
•  Parameterized Reports
•  Drilldown
•  Visualization
•  Exploration

Non-Interactive (1m – 1h)
•  Data preparation
•  Incremental batch processing
•  Dashboards / Scorecards

Batch (1h+)
•  Operational batch processing
•  Enterprise Reports
•  Data Mining

Axis: Data Size

Improve Latency & Throughput
•  Query engine improvements
•  New “Optimized RCFile” column store
•  Next-gen runtime (elim’s M/R latency)

Extend Deep Analytical Ability
•  Analytics functions
•  Improved SQL coverage
•  Continued focus on core Hive use cases

                                                                                                                 Page 30
Flexible Support Subscription Programs
    Leverage Hortonworks Expertise: Subscription and Support delivered and
    backed by Hadoop experts; subscriptions based on nodes or storage

      Developer Support
      “How to” guidance for developers and archs
         –  12 x 5, Web only
         –  All Sev: 1 business day
         –  1 seat
         –  Application Design Advice
         –  Code Review

      Enterprise Support
      Operations support for critical clusters
         –  24 x 7, Phone & Web
         –  Sev 1: 1 Hour; Sev 2: 4 Bus Hour
         –  5 Contacts
         –  Patches & Updates
         –  Cluster Design, Install, Maintain, Performance

      Additional Options

      Standard Support
      Operations support for dev & test clusters
         –  12 x 5, Web only
         –  All Sev: 1 business day
         –  3 Contacts
         –  Patches & Updates
         –  Cluster Design, Install, Maintain, Performance

      Essential Support*
      Operations support for small research clusters
         –  12 x 5, Web only
         –  All Sev: 1 business day
         –  3 Contacts
         –  Patches & Updates
         –  Cluster Design, Install, Maintain, Performance

* Limited in size and no expansion

                                                                                                              Page 31
Hortonworks: Best In Class Hadoop Support
•  Experienced enterprise support team
   –  Experience supporting enterprise clients in production
   –  Core engineers have real operational
      experience: built and supported 44+K nodes in production
   –  Extensive experience in commercial big data offerings
      including HDP, MapR, Karmasphere




•  Global 24x7 operation – support based in Sunnyvale, UK & India

•  Stringent case management processes ensure high-quality customer
   service & responsiveness




                                                                    Page 32
Transferring Our Hadoop Expertise to You
                              The expert source for
                              Apache Hadoop training & certification

                              •  World-class training programs designed to
                                 help you learn fast
                                 – Role-based, hands-on classes with 50% lab time

                              •  Expert consulting services
                                 – Programs designed to transfer knowledge


                              •  Industry-leading Hadoop Sandbox program
                                 – Fastest way to learn Apache Hadoop
                                 – Multi-level tutorials for wide applicability
                                 – Customizable and updateable


                                                                                  Page 33
Summary
• Leading the Innovation in Core Hadoop
• Addressing the requirements for Enterprise usage
• Enabling interoperability of the ecosystem
• No lock-in. 100% Open Source.

• Best in industry support with flexible pricing model

• Find out more
  – www.hortonworks.com

  – http://hortonworks.com/hadoop-training/


                                                         Page 34

Hortonworks Presentation at Big Data London
