Hortonworks & Systems Integrators
Mitch Ferguson
VP, Business Development

Rikin Shah
Dir, Field Engineering


September 5, 2012



© Hortonworks Inc. 2012         Page 1
Big data changes the game

Transactions + Interactions + Observations = BIG DATA

[Chart: data volume against increasing data variety and complexity]

   Megabytes   ERP        Purchase detail, Purchase record, Payment record
   Gigabytes   CRM        Segmentation, Offer details, Customer Touches, Support Contacts
   Terabytes   WEB        Web logs, Offer history, A/B testing, Behavioral Targeting, Dynamic Pricing, Search Marketing, Affiliate Networks, Dynamic Funnels
   Petabytes   BIG DATA   Mobile Web, Sentiment, User Click Stream, SMS/MMS, Speech to Text, Social Interactions & Feeds, Spatial & GPS Coordinates, Sensors / RFID / Devices, Business Data Feeds, External Demographics, User Generated Content, HD Video, Audio, Images, Product/Service Logs


Hortonworks Snapshot
The industry leading and only 100% open source Apache Hadoop distribution

•  Headquarters: Sunnyvale, CA
•  100+ Employees
•  Formed with core Apache Hadoop engineering team from Yahoo!
•  40+ engineers and architects including 25+ Hadoop committers

Most experienced open source leadership team
   –  Rob Bearden – CEO (JBoss, SpringSource, i2, Oracle)
   –  Shaun Connolly – VP Strategy (VMW, SpringSource, Red Hat, JBoss)
   –  Mitch Ferguson – VP BD (SpringSource, VMware)
   –  John Kreisa – VP Marketing (Red Hat, Cloudera, MarkLogic, Bus Obj)
   –  Ari Zilka – CPO (Terracotta, Accenture, Walmart.com)
   –  Greg Pavlik – VP Eng. (Oracle SOA & Integration platform)

Business model focused on customer success: Hadoop support, services & training
   –  Subscription support for Hortonworks Data Platform
   –  Training business: Private and public classes available for developers & administrators



Hortonworks Business Strategy

  Enable the next gen data management platform

• Accelerate the adoption of Apache Hadoop

• Create a vibrant ecosystem
   – ISVs, IHV, Systems Integrators

• Provide world-class enterprise Support & Training




Hortonworks Vision & Role

                                We believe that by the end of 2015,
                                more than half the world's data will be
                                processed by Apache Hadoop.



  1       Be diligent stewards of the open source core

  2       Be tireless innovators beyond the core

  3       Provide robust data platform services & open APIs

  4       Enable the ecosystem at each layer of the stack

  5       Make the platform enterprise-ready & easy to use


Enabling Hadoop as Enterprise Big Data Platform



[Diagram: Hortonworks Data Platform at the center, with the following groups around it]

•  Applications, Business Tools, Development Tools, Open APIs and access,
   Data Movement & Integration, Data Management Systems, Systems Management
•  Installation & Configuration, Administration, Monitoring,
   High Availability, Replication, Multi-tenancy, ..
•  DEVELOPER: Data Platform Services & Open APIs
   (Metadata, Indexing, Search, Security, Management, Data Extract & Load, APIs)



Hortonworks Partner Eco-System




Hortonworks & SIs
 Our business models are 100% Complementary

• Systems Integrators are a cornerstone of our business model
• Enable high-value & repeatable solutions
• Leverage multi-party relationships to accelerate business




                                            Systems Integrator



                                 Customer

Why Hortonworks?
•  The most Apache Hadoop experience and expertise
   –  Reliable Hadoop from the experts, project leaders, architects and
      builders
   –  Collectively over 90 years of operational Hadoop experience
      (at least double that of the closest competitor)

•  Influence community direction
   –  Provides a direct connection to drive innovation in the community

•  Focus on the ecosystem
   –  Roadmap and vision to provide access to the wide ecosystem of
      enterprise applications, such as Teradata

•  Industry momentum
   –  Collaborate across partners (ISVs/IHVs/SIs) to enable high-value
      solutions



Hortonworks Apache Hadoop Leadership

Hortonworkers… the builders, operators and core architects of Apache Hadoop

•  Most experienced team running Hadoop in production at scale (> 5 years, 42000 nodes)
•  All “stable” releases of Apache Hadoop have been shipped by Hortonworkers

“We have noticed more activity over the last year from Hortonworks’ engineers
on building out Apache Hadoop’s more innovative features. These include
YARN, Ambari and HCatalog..”
                                   - Jeff Kelly, Wikibon

Leadership
• VP and PMC of Hadoop: Arun Murthy
• Core Architect of YARN: Arun Murthy
• Core Architect of MapReduce2: Arun Murthy
• VP & PMC of Pig: Daniel Dai
• VP of Zookeeper: Mahadev Konar
• Inventor of HCatalog: Alan Gates
• Project Lead for Ambari: Mahadev Konar
• Original Project Lead: Eric Baldeschwieler


Hortonworks Data Platform




Hortonworks Data Platform

[Diagram: Hortonworks Data Platform component stack]

Operate
   •  Management & Monitoring Services (Ambari, Zookeeper)
   •  Workflow & Scheduling (Oozie)

Develop / Interact
   •  Non-Relational Database (HBase)
   •  Scripting (Pig)
   •  Query (Hive)
   •  Metadata Services (HCatalog)
   •  Distributed Processing (MapReduce)
   •  Distributed Storage (HDFS)

Integrate
   •  Data Integration Services (HCatalog APIs, WebHDFS,
      Talend Open Studio for Big Data, Sqoop, Flume)

Apache Hadoop Release Management

[Timeline: Hadoop 1 release line showing 1.0.1, 1.0.2, 1.0.3, HDP 1.0, 1.1.1, 1.1.2]

•    Apache Hadoop Release management is run by Hortonworks
      •  Matt Foley, Release Manager for Hadoop 1
      •  Arun Murthy, Release Manager for Hadoop 2
      •  Ashutosh Chauhan, Release Manager for Hive
      •  Daniel Dai, Release Manager for Pig
      •  Alan Gates, Release Manager for HCatalog
•    Hadoop Core releases validated (and fixed) by Hortonworks
      •  ~1300 end to end system tests run in house using our IP before any release can be made
•    Hortonworks Data Platform is released directly from Apache Hadoop branches


Full Stack High Availability




Full Stack High Availability

•  Failover and restart for
     •  NameNode
     •  JobTracker
     •  HBase and other services to come…
•  Open API allows use of proven HA from multiple vendors (Red Hat & VMware)
•  Minimized changes to clients and configuration
•  Complementary to 2.0 HA efforts
•  Server & Operating System failure detection and VM restart
•  Smart resource management ensures sufficient resources are
   available to restart VMs

[Diagram: HA Cluster. A Core Switch connects two Rack Switches; the HA pairs are split across the racks: NameNode + HA Manager, JobTracker + HA Manager, other daemons + HA Manager]

Addresses HA needs on stable Apache Hadoop 1.0

Capacity Scheduler Delivers Multi-tenancy

• Queue definition
  – % of total system memory
  – % CPU utilization (not slot count)
• Queues per team
  – Soft and hard limits, so you can use the entire cluster if available
  – Ownership and security built in
• Proactive resource management
  – Lots of rules and observation points
  – Don’t start another task if it will blow up the node
  – Don’t start another task if other workloads are spinning up
• Better than Fair + Preemption (HDP Supports All)
  – Utilization not measured by slot count (can blow up a node /
    cluster)
  – Doesn’t start all tasks automatically (proactive vs. reactive)
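
The queue limits above can be illustrated with a small sketch. It is not the Capacity Scheduler's implementation or its configuration format: the `Queue` class, the `next_queue` function and all percentages are hypothetical, chosen only to show how a soft limit (guaranteed share), a hard limit (cap) and a check made before a task starts might fit together.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Queue:
    """Hypothetical per-team queue; values are percentages of cluster memory."""
    name: str
    guaranteed_pct: float      # soft limit: the share the team is entitled to
    max_pct: float             # hard limit: never exceeded, even when idle capacity exists
    used_pct: float = 0.0
    pending_tasks: int = 0


def next_queue(queues: List[Queue], task_pct: float) -> Optional[Queue]:
    """Pick the queue whose task should start next, or None if nothing may start."""
    free_pct = 100.0 - sum(q.used_pct for q in queues)
    if task_pct > free_pct:
        return None            # proactive check: do not overcommit the cluster
    eligible = [
        q for q in queues
        if q.pending_tasks > 0 and q.used_pct + task_pct <= q.max_pct
    ]
    if not eligible:
        return None
    # Serve the queue that is furthest below its guaranteed share first.
    # A queue may run above its guarantee when no other queue is waiting.
    return min(eligible, key=lambda q: q.used_pct / q.guaranteed_pct)
```

In this sketch a team with pending work can grow past its guaranteed share up to its hard limit while other teams are idle, and the decision is made before the task launches rather than after the node is already overloaded.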
HCatalog
                          METADATA




Metadata Services
Apache HCatalog provides flexible metadata
services across tools and external access
 •  Consistency of metadata and data models across tools
    (MapReduce, Pig, HBase and Hive)
 •  Accessibility: share data as tables in and out of HDFS
 •  Availability: enables flexible, thin-client access via REST API




[Diagram: raw Hadoop data (inconsistent, unknown, tool-specific access) passes through HCatalog (table access, aligned metadata, REST API); shared table and schema management opens the platform]



Options Lead to Complexity
 Feature         MapReduce         Pig                                             Hive
 Record format   Key value pairs   Tuple                                           Record
 Data model      User defined      int, float, string, bytes, maps, tuples, bags   int, float, string, maps, structs, lists
 Schema          Encoded in app    Declared in script or read by loader            Read from metadata
 Data location   Encoded in app    Declared in script                              Read from metadata
 Data format     Encoded in app    Declared in script                              Read from metadata


•    Pig and MR users need to know a lot to write their apps
•    When data schema, location, or format changes, Pig and MR apps must be
     rewritten, retested, and redeployed
•    Hive users have to load data from Pig/MR users to have access to it
Hadoop Ecosystem

[Diagram: three tools reading and writing HDFS data nodes (dn1 … dnN)]

   •  Hive (SQL): SQL interface; DDL goes to the metastore (tables, partitions,
      files, types); DML goes through the SerDe interface and Input/OutputFormat
   •  Pig (scripting): Load/Store interface over Input/OutputFormat
   •  MapReduce (Java): Input/OutputFormat



Opening up Metadata to MR & Pig

[Diagram: the same three tools, now sharing an HCat Metadata layer over HDFS data nodes (dn1 … dnN)]

   •  Hive (SQL): SQL interface
   •  Pig (scripting): HCatLoad/Store interface
   •  MapReduce (Java): HCatInput/OutputFormat
   •  HCat Metadata layer: SerDe interface, backed by the metastore
      (tables, partitions, files, types)



Tools With HCatalog
 Feature         MapReduce + HCatalog                       Pig + HCatalog                                  Hive
 Record format   Record                                     Tuple                                           Record
 Data model      int, float, string, maps, structs, lists   int, float, string, bytes, maps, tuples, bags   int, float, string, maps, structs, lists
 Schema          Read from metadata                         Read from metadata                              Read from metadata
 Data location   Read from metadata                         Read from metadata                              Read from metadata
 Data format     Read from metadata                         Read from metadata                              Read from metadata

•    Pig/MR users can read schema from metadata
•    Pig/MR users are insulated from schema, location, and format changes
•    All users have access to other users’ data as soon as it is committed
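
A toy illustration of the points above: when schema, location and format live in shared metadata, a reader asks for a table by name and is unaffected when the storage details change. This sketch does not use the HCatalog API; `TableInfo`, `Metastore`, `plan_read` and the paths are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TableInfo:
    """What a shared metadata service records about one table (hypothetical)."""
    schema: List[Tuple[str, str]]   # (column name, type) pairs
    location: str                   # where the files live
    data_format: str                # how the files are encoded


class Metastore:
    """Minimal in-memory stand-in for a shared table registry."""

    def __init__(self) -> None:
        self._tables: Dict[str, TableInfo] = {}

    def create(self, name: str, info: TableInfo) -> None:
        self._tables[name] = info   # create or replace the table definition

    def describe(self, name: str) -> TableInfo:
        return self._tables[name]


def plan_read(metastore: Metastore, table: str) -> str:
    """A reader that knows only the table name; everything else is looked up."""
    info = metastore.describe(table)
    columns = ", ".join(name for name, _ in info.schema)
    return f"read {columns} from {info.location} as {info.data_format}"
```

If the table is later moved or re-encoded, only the metastore entry changes; `plan_read` and any job built on it stay the same, which is the insulation the slide describes.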
Metadata Services



[Diagram: Existing Infrastructure (applications, data stores, visualization) calls HCatalog over REST (ddl, dml); HCatalog issues create and describe against the metastore and DML against Hive, HBase and Pig on the Hadoop Cluster]




Services Integration

Provides RESTful API as “front door” for Hadoop

•    Opens the door to languages other than Java
•    Thin clients via web services vs. fat-clients in gateway
•    Insulation from interface changes release to release

[Diagram: Existing & New Applications call WebHDFS and HCatalog RESTful Web Services, which front MapReduce, Pig, Hive and HCatalog over HDFS, HBase and an External Store]




     Opens Hadoop to integration with existing and new applications
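
As a minimal sketch of the thin-client idea, the helper below only builds a WebHDFS request URL of the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`. The function name `webhdfs_url`, the host name and the example path are hypothetical, and the default port of 50070 is an assumption about the NameNode HTTP port on a Hadoop 1 cluster; check the cluster's own configuration.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, path: str, op: str, port: int = 50070, **params: str) -> str:
    """Build a WebHDFS REST URL for one file system operation.

    `op` is a WebHDFS operation name such as LISTSTATUS, GETFILESTATUS or OPEN.
    Extra query parameters (for example user.name) are passed as keywords.
    """
    query = urlencode({"op": op, **params})
    # quote() escapes unsafe characters in the HDFS path but keeps "/" separators.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical host and path, for illustration only.
listing_url = webhdfs_url("namenode.example.com", "/user/demo/logs", "LISTSTATUS")
```

Any HTTP client in any language can issue a GET against such a URL, which is what lets non-Java applications reach HDFS without installing Hadoop client libraries.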


Data Integration Services

•  Intuitive graphical data
   integration tools for HDFS,
   Hive, HBase, HCatalog and Pig

•  Oozie scheduling allows you to
   manage and stage jobs

•  Connectors for any database,
   business application or system

•  Integrated HCatalog storage

 Bridge the gap between
 legacy data & Hadoop

 Simplify and speed development

Teradata and Hortonworks Partner to Provide
                the First Enterprise Reference Architecture
                          for Hadoop and Big Data

        Partnership provides clear path to enterprise for Hadoop
    •  Reference architecture that provides guidance on best applications
         for Teradata, Teradata Aster, and Hadoop

    •  Clear partnership between industry and community leaders

    •  Deeper integration to ease data movement in/out of Hadoop

    •  Joint R&D and go-to-market


Ambari
Cluster Provisioning
Configuration Management
Monitoring




Ambari Architecture
•  Installs your cluster onto target HW for you
•  Manage, reconfigure from one place
•  Monitor key and meaningful Hadoop metrics, not just OS / HW
•  Scalable in line w/ Hadoop itself

[Diagram: an operator uses the php portal view of the Ambari controller (Nagios, Ganglia, Puppet); the controller manages the Hadoop worker nodes n1 … nN, and each worker node runs a data and task sink (Nagios, Ganglia, Puppet)]



Ambari
                          Live Demonstration




Why HDP?

ONLY Hortonworks Data Platform provides…
•  Tightly aligned to core Apache Hadoop development line
   - Reduces risk for customers who may add custom coding or projects

•  Enterprise Integration
  - HCatalog provides scalable, extensible integration point to Hadoop data

•  Most reliable Hadoop distribution
  - Full stack high availability on v1 delivers the strongest SLA guarantees

•  Multi-tenant scheduling and resource management
  - Capacity and fair scheduling optimizes cluster resources

•  Integration with operations, eases cluster management
  - Ambari is the most open/complete operations platform for Hadoop clusters




Hortonworks Support Subscriptions
Objective: help organizations to successfully develop
and deploy solutions based upon Apache Hadoop
• Full-lifecycle technical support available
  – Developer support for design, development and POCs
  – Production support for staging and production environments
       – Up to 24x7 with 1-hour response times

• Delivered by the Apache Hadoop experts
  – Backed by development team that has released every major
    version of Apache Hadoop since 0.1

• Forward-compatibility
  – Hortonworks’ leadership role helps ensure bug fixes and patches
    can be included in future versions of Hadoop projects



Hortonworks Training
Objective: help organizations overcome Hadoop
knowledge gaps
• Expert role-based training for developers,
  administrators & data analysts
  – Heavy emphasis on hands-on labs
  – Extensive schedule of public training courses available
    (hortonworks.com/training)

• Comprehensive certification programs



• Customized, on-site courses available

Thank You!
Questions & Answers





Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx

  • 1.
    Hortonworks & SystemsIntegrators Mitch Ferguson VP, Business Development Rikin Shah Dir, Field Engineering September 5, 2012 © Hortonworks Inc. 2012 Page 1
  • 2.
    Big data changesthe game Transactions + Interactions Petabytes BIG DATA Mobile Web + Observations Sentiment User Click Stream SMS/MMS = BIG DATA Speech to Text Social Interactions & Feeds Terabytes WEB Web logs Spatial & GPS Coordinates A/B testing Sensors / RFID / Devices Behavioral Targeting Gigabytes CRM Business Data Feeds Dynamic Pricing Segmentation External Demographics Search Marketing Customer Touches User Generated Content ERP Megabytes Affiliate Networks Purchase detail Support Contacts HD Video, Audio, Images Dynamic Funnels Purchase record Offer details Offer history Product/Service Logs Payment record Increasing Data Variety and Complexity © Hortonworks Inc. 2012
  • 3.
    Hortonworks Snapshot The industry leading and only 100% open source Apache Hadoop distribution •  Headquarters Most experienced open source leadership team Sunnyvale, CA –  Rob Bearden – CEO (JBoss, SpringSource, i2, Oracle) –  Shaun Connolly – VP Strategy (VMW, SpringSource, Red Hat, JBoss) •  100+ Employees –  Mitch Ferguson: VP BD (SringSource, VMWare) •  Formed with core –  John Kreisa – VP Marketing (Red Hat, Cloudera, MarkLogic, Bus Obj) Apache Hadoop –  Ari Zilka – CPO (Teracotta, Accenture, Walmart.com) engineering team –  Greg Pavlik – VP Eng. (Oracle SOA & Integration platform) from Yahoo! •  40+ engineers and architects including Business model focused on customer success: 25+ Hadoop Hadoop support, services & training committers – Subscription support for Hortonworks Data Platform – Training business: Private and public classes available for developers & administrators © Hortonworks Inc. 2012
  • 4.
    Hortonworks Business Strategy Enable the next gen data management platform • Accelerate the adoption of Apache Hadoop • Create a vibrant eco-systems – ISVs, IHV, Systems Integrators • Provide world-class enterprise Support & Training © Hortonworks Inc. 2012
  • 5.
    Hortonworks Vision &Role We believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop. 1 Be diligent stewards of the open source core 2 Be tireless innovators beyond the core 3 Provide robust data platform services & open APIs 4 Enable the ecosystem at each layer of the stack 5 Make the platform enterprise-ready & easy to use © Hortonworks Inc. 2012
  • 6.
    Enabling Hadoop asEnterprise Big Data Platform Applications, Installation & Business Tools, Configuration, Development Tools, Administration, Open APIs and access Monitoring, Data Movement & Integration, High Availability, Data Management Systems, Replication, Systems Management Hortonworks Multi-tenancy, .. Data Platform DEVELOPER Data Platform Services & Open APIs Metadata, Indexing, Search, Security, Management, Data Extract & Load, APIs © Hortonworks Inc. 2012
  • 7.
    Hortonworks Partner Eco-System © Hortonworks Inc. 2012
  • 8.
    Hortonworks & SIs Our business models are 100% Complementary • Systems Integrators are a corner-stone of our business model • Enable high-value & repeatable solutions • Leverage multi-party relationships to accelerate business Systems Integrator Customer © Hortonworks Inc. 2012
  • 9.
    Why Hortonworks? •  Themost Apache Hadoop experience and expertise –  Reliable Hadoop from the experts, project leaders, architects and builders –  Collectively over 90 years operational Hadoop experience (at least double that of the closest competitor) •  Influence community direction –  Provides a direct connection to drive innovation in the community •  Focus on the ecosystem –  Roadmap and vision to provide access to the wide ecosystem of enterprise application, such as Teradata •  Industry momentum –  Collaborate across partners (ISVs/IHVs/SIs) to enable high-value solutions © Hortonworks Inc. 2012
  • 10.
    Hortonworks Apache HadoopLeadership Hortonworkers… the builders, Leadership operators and core architects • VP and PMC of Hadoop of Apache Hadoop Arun Murthy • Core Architect of YARN Arun Murthy •  Most experienced team running Hadoop • Core Architect MapReduce2 in production at scale (> 5 years, 42000 nodes) Arun Murthy •  All “stable” releases of Apache Hadoop • VP & PMC of Pig Daniel Dai have been shipped by Hortonworkers • VP of Zookeeper Mahdev Konar “We have noticed more activity over the • Inventor of HCatalog last year from Hortonworks’ engineers Alan Gates on building out Apache Hadoop’s more • Project Lead for Ambari innovative features. These include Mahedev Konar YARN, Ambari and HCatalog..” • Original Project Lead Eric Baldschweiler - Jeff Kelly: Wkibon © Hortonworks Inc. 2012
  • 11.
    Hortonworks Data Platform ©Hortonworks Inc. 2012
  • 12.
    Hortonworks Data Platform Develop Interact Non-Relational Database Talend Open Studio for Big Data, Sqoop, Flume) Scripting Query Management & Monitoring Services (Pig) (Hive) Data Integration Services Workflow & Scheduling (HCatalog APIs, WebHDFS, (HBase) Metadata Services (Ambari, Zookeeper) (HCatalog) (Oozie) Operate Distributed Processing Integrate (MapReduce) Distributed Storage (HDFS) Hortonworks Data Platform © Hortonworks Inc. 2012
  • 13.
    Apache Hadoop ReleaseManagement 1.1.1 1.1.2 Hadoop 1 1.0.1 1.0.2 1.0.3 HDP 1.0 •  Apache Hadoop Release management is run by Hortonworks •  Matt Foley, Release Manager for Hadoop 1 •  Arun Murthy, Release Manager for Hadoop 2 •  Ashutosh Chauhan, Release Manager for Hive •  Daniel Dai, Release Manager for Pig •  Alan Gates, Release Manager for Hcat •  Hadoop Core releases validated (and fixed) by Hortonworks •  ~1300 end to end system tests run in house using our IP before any release can be made •  Hortonworks Data Platform is released directly from Apache Hadoop branches © Hortonworks Inc. 2012
  • 14.
    Full Stack HighAvailability © Hortonworks Inc. 2012
  • 15.
    Full Stack HighAvailability HA HA •  Failover and restart for •  NameNode •  JobTracker •  HBase and other services to come… HA Cluster Core Switch Rack Switch Rack Switch •  Open API allows use of Proven HA from multiple vendors (Red Hat & VMWare) Namenode Namenode HA Manager HA Manager •  Minimized changes to clients and configuration Job Tracker Job Tracker •  Complementary to 2.0 HA efforts HA Manager HA Manager •  Server & Operating System failure Etc. daemon Etc. daemon detection and VM restart HA Manager HA Manager •  Smart resource management ensures sufficient resources are HA Pairs available to restart VMs Addresses HA needs on stable Apache Hadoop 1.0 © Hortonworks Inc. 2012
  • 16.
    Capacity Scheduler Delivers Multi-tenancy
    • Queue definition
      – % of total system memory
      – % CPU utilization (not slot count)
    • Queues per team
      – Soft and hard limits, so you can use the entire cluster if available
      – Ownership and security built in
    • Proactive resource management
      – Lots of rules and observation points
      – Don’t start another task if it will blow up the node
      – Don’t start another task if other workloads are spinning up
    • Better than Fair + Preemption (HDP supports all)
      – Utilization not measured by slot count (can blow up a node / cluster)
      – Doesn’t start all tasks automatically (proactive vs. reactive)
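The admission rules on this slide can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Capacity Scheduler's actual algorithm or configuration: the `Queue` class, the `can_start_task` function and all of their parameters are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    """Hypothetical per-team queue; percentages are shares of the whole cluster."""
    name: str
    guaranteed_pct: float  # soft limit: share the team is entitled to
    max_pct: float         # hard limit: ceiling even when the cluster is idle
    used_pct: float = 0.0


def can_start_task(queue, task_pct, cluster_free_pct,
                   node_free_mb, task_mb, other_queues_waiting):
    """Decide proactively whether one more task may start."""
    if task_mb > node_free_mb:
        return False  # would overcommit memory on the node
    if task_pct > cluster_free_pct:
        return False  # the cluster has no spare capacity
    new_usage = queue.used_pct + task_pct
    if new_usage <= queue.guaranteed_pct:
        return True   # still inside the team's guaranteed share
    # Above the soft limit: borrow idle capacity only up to the hard
    # limit, and only while no other workload is spinning up.
    return new_usage <= queue.max_pct and not other_queues_waiting
```

With a queue guaranteed 30% and capped at 60%, a task that pushes usage to 35% is admitted while the rest of the cluster is idle and refused once another queue has work waiting.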
  • 17.
    HCatalog METADATA
  • 18.
    Metadata Services
    Apache HCatalog provides flexible metadata services across tools and external access
    • Consistency of metadata and data models across tools (MapReduce, Pig, HBase and Hive)
    • Accessibility: share data as tables in and out of HDFS
    • Availability: enables flexible, thin-client access via REST API
    [Diagram: without HCatalog: raw Hadoop data; inconsistent, unknown; tool-specific access. With HCatalog: table access, aligned metadata, REST API. Caption: shared table and schema management opens the platform]
  • 19.
    Options Lead to Complexity
    Feature       | MapReduce       | Pig                                           | Hive
    Record format | Key value pairs | Tuple                                         | Record
    Data model    | User defined    | int, float, string, bytes, maps, tuples, bags | int, float, string, maps, structs, lists
    Schema        | Encoded in app  | Declared in script or read by loader          | Read from metadata
    Data location | Encoded in app  | Declared in script                            | Read from metadata
    Data format   | Encoded in app  | Declared in script                            | Read from metadata
    • Pig and MR users need to know a lot to write their apps
    • When data schema, location, or format change, Pig and MR apps must be rewritten, retested, and redeployed
    • Hive users have to load data from Pig/MR users to have access to it
  • 20.
    Hadoop Ecosystem
    [Diagram: three tools over HDFS data nodes dn1, dn2, dn3 … dnN]
    • Hive (SQL). Interface: SerDe, on top of Input/OutputFormat; DDL and DML go through the metastore (tables, partitions, files, types)
    • Pig (scripting). Interface: Load/Store, on top of Input/OutputFormat
    • MapReduce (Java). Interface: Input/OutputFormat
  • 21.
    Opening up Metadata to MR & Pig
    [Diagram: an HCat metadata layer sits between the tools and the metastore, over HDFS data nodes dn1, dn2, dn3 … dnN]
    • Hive (SQL). Interface: SerDe
    • Pig (scripting). Interface: HCatLoad/Store
    • MapReduce (Java). Interface: HCatInput/OutputFormat
    • Metastore: tables, partitions, files, types
  • 22.
    Tools With HCatalog
    Feature       | MapReduce + HCatalog                     | Pig + HCatalog                                | Hive
    Record format | Record                                   | Tuple                                         | Record
    Data model    | int, float, string, maps, structs, lists | int, float, string, bytes, maps, tuples, bags | int, float, string, maps, structs, lists
    Schema        | Read from metadata                       | Read from metadata                            | Read from metadata
    Data location | Read from metadata                       | Read from metadata                            | Read from metadata
    Data format   | Read from metadata                       | Read from metadata                            | Read from metadata
    • Pig/MR users can read schema from metadata
    • Pig/MR users are insulated from schema, location, and format changes
    • All users have access to other users’ data as soon as it is committed
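The difference between "encoded in app" and "read from metadata" can be sketched with a toy metastore. This is an illustration only and does not use the HCatalog API: the `METASTORE` dictionary, the `web_logs` table, its path and fields, and the `parse_record` function are all hypothetical.

```python
# Toy stand-in for a shared metastore. The table name, location and
# schema below are made up for illustration.
METASTORE = {
    "web_logs": {
        "location": "/data/web_logs",
        "format": "tsv",
        "schema": [("user_id", int), ("url", str), ("latency_ms", float)],
    }
}

SEPARATORS = {"tsv": "\t", "csv": ","}


def parse_record(table, line):
    """Parse one text record using only what the metastore declares."""
    meta = METASTORE[table]
    sep = SEPARATORS[meta["format"]]
    values = line.rstrip("\n").split(sep)
    # Field names and types come from metadata, not from the application.
    return {name: cast(value)
            for (name, cast), value in zip(meta["schema"], values)}
```

If the table's owner later switches the stored format from `tsv` to `csv`, only the metastore entry changes; `parse_record` and its callers stay as they are, which is the insulation the slide describes.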
  • 23.
    Metadata Services
    [Diagram: HCatalog at the center, backed by the existing metastore and the Hadoop cluster infrastructure. Labels: applications, data stores, visualization; REST (ddl, dml; create, describe); DML links to Hive, HBase and Pig]
  • 24.
    Services Integration
    Provides RESTful API as “front door” for Hadoop
    • Opens the door to languages other than Java
    • Thin clients via web services vs. fat clients in gateway
    • Insulation from interface changes release to release
    [Diagram: existing & new applications call RESTful web services (WebHDFS, HCatalog) in front of MapReduce, Pig, Hive, HCatalog, HDFS, HBase and an external store]
    Opens Hadoop to integration with existing and new applications
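As a rough illustration of the thin-client idea, a WebHDFS request is a plain HTTP URL of the form `http://<host>:<port>/webhdfs/v1/<path>?op=<OPERATION>`, so any language with an HTTP library can use it. The helper below only builds such URLs; the function name `webhdfs_url`, the host name and the port are assumptions made for this sketch, not values from the deck.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for an absolute HDFS path and operation."""
    query = urlencode({"op": op, **params})
    # quote() escapes spaces and other unsafe characters but keeps "/".
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical NameNode address; list a directory, then open a file.
list_url = webhdfs_url("namenode.example.com", 50070, "/user/demo", "LISTSTATUS")
open_url = webhdfs_url("namenode.example.com", 50070,
                       "/user/demo/a b.txt", "OPEN", offset=0)
```

A client would issue an HTTP GET on either URL with whatever HTTP library it prefers; no Hadoop client libraries are needed on the calling side.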
  • 25.
    Data Integration Services
    • Intuitive graphical data integration tools for HDFS, Hive, HBase, HCatalog and Pig
    • Oozie scheduling allows you to manage and stage jobs
    • Connectors for any database, business application or system
    • Integrated HCatalog storage
    Bridge the gap between legacy data & Hadoop
    Simplify and speed development
  • 26.
    Teradata and Hortonworks Partner to Provide the First Enterprise Reference Architecture for Hadoop and Big Data
    Partnership provides clear path to enterprise for Hadoop
    • Reference architecture that provides guidance on best applications for Teradata, Teradata Aster, and Hadoop
    • Clear partnership between industry and community leaders
    • Deeper integration to ease data movement in/out of Hadoop
    • Joint R&D and go-to-market
  • 27.
  • 28.
    Ambari Architecture
    • Installs your cluster onto target HW for you
    • Manage, reconfigure from one place
    • Monitor key and meaningful Hadoop metrics, not just OS / HW
    • Scalable in line w/ Hadoop itself
    [Diagram: an operator views a PHP portal on the Ambari controller (Nagios, Ganglia, Puppet); worker nodes n1, n2, n3 … nN run Hadoop with Puppet, Nagios and Ganglia; label: data and task sink]
  • 29.
    Ambari Live Demonstration
  • 30.
    Why HDP?
    ONLY Hortonworks Data Platform provides…
    • Tightly aligned to core Apache Hadoop development line
      – Reduces risk for customers who may add custom coding or projects
    • Enterprise integration
      – HCatalog provides scalable, extensible integration point to Hadoop data
    • Most reliable Hadoop distribution
      – Full stack high availability on v1 delivers the strongest SLA guarantees
    • Multi-tenant scheduling and resource management
      – Capacity and fair scheduling optimize cluster resources
    • Integration with operations, eases cluster management
      – Ambari is the most open/complete operations platform for Hadoop clusters
  • 31.
    Hortonworks Support Subscriptions
    Objective: help organizations to successfully develop and deploy solutions based upon Apache Hadoop
    • Full-lifecycle technical support available
      – Developer support for design, development and POCs
      – Production support for staging and production environments
      – Up to 24x7 with 1-hour response times
    • Delivered by the Apache Hadoop experts
      – Backed by development team that has released every major version of Apache Hadoop since 0.1
    • Forward-compatibility
      – Hortonworks’ leadership role helps ensure bug fixes and patches can be included in future versions of Hadoop projects
  • 32.
    Hortonworks Training
    Objective: help organizations overcome Hadoop knowledge gaps
    • Expert role-based training for developers, administrators & data analysts
      – Heavy emphasis on hands-on labs
      – Extensive schedule of public training courses available (hortonworks.com/training)
    • Comprehensive certification programs
    • Customized, on-site courses available
  • 33.
    Thank You! Questions & Answers