Introduction to the
Hortonworks Data Platform
Ari Zilka, Chief Products Officer
June 20, 2012




© Hortonworks Inc. 2012             Page 1
Who is Ari



                              Ari Zilka
                              Chief Products Officer
                              •    Bicoastal
                              •    Motorcycles
                              •    Technology




Hortonworks Data Platform

    Delivers enterprise-grade functionality on a proven
    Apache Hadoop distribution to ease management,
    simplify use and ease integration into the enterprise

    •  Simplify deployment to get started quickly and easily
    •  Monitor and manage any size cluster with a familiar console and tools
    •  Only platform to include data integration services to interact
       with any data source
    •  Metadata services open the platform for integration with
       existing applications
    •  Dependable high availability architecture




The only 100% open source data platform for Apache Hadoop

Enabling Hadoop as Enterprise Big
Data Platform


 Applications, Business Tools, Development Tools,
 Data Movement & Integration, Data Management Systems,
 Systems Management, Infrastructure

 Usability, Installation & Configuration, Administration,
 Monitoring, Data Extract & Load

 Hortonworks Data Platform

 DEVELOPER
 Data Platform Services & Open APIs
 Metadata, Indexing, Search, Security, Management, HA, DR,
 Replication, Multi-tenancy, ...




Management & Monitoring Svcs

Hortonworks Management Center
   – View the health of cluster operations,
     server utilization and performance levels
   – Customizable dashboards
   – APIs for integration into third-party
     monitoring tools
   – 100% open source management &
     monitoring, powered by Apache Ambari,
     Puppet, Nagios and Ganglia
   – Simple wizard-based installation,
     configuration & provisioning of any size
     Hadoop cluster

Optimize performance for your Hadoop cluster
Simplify Installation and provisioning

Simple Installation
•    Step-by-step install across multiple
     nodes
•    Automated compatibility and
     dependency checks
•    Analyzes/recommends optimal
     services configuration
•    Automatically configures mount
     points in the cluster



     Simple wizard-based installation,
     configuration & provisioning of any
     size Hadoop cluster


HMC Architecture




Demonstration




                             Hortonworks Data Platform
                             •    Hortonworks Management Center
                             •    HCatalog & Data Integration Services
                             •    High Availability




Metadata Services
Apache HCatalog provides flexible metadata
services across tools and external access
 •  Consistency of metadata and data models across tools
    (MapReduce, Pig, HBase and Hive)
 •  Accessibility: share data as tables in and out of HDFS
 •  Availability: enables flexible, thin-client access via REST API




   •  Raw Hadoop data
   •  Inconsistent, unknown
   •  Tool-specific access

   HCatalog: Table access, Aligned metadata, REST API

   Shared table and schema management opens the platform



Data Integration Services

•  Intuitive graphical data
   integration tools for HDFS,
   Hive, HBase, HCatalog and Pig

•  Oozie scheduling allows you to
   manage and stage jobs

•  Connectors for any database,
   business application or system

•  Integrated HCatalog storage

 Bridge the gap between
 legacy data & Hadoop

 Simplify and speed development

Metadata Services



     Existing Infrastructure: applications, data stores, visualization
     REST: ddl, dml
     HCatalog: metastore (create, describe)
     Hadoop Cluster: Hive (DML), HBase (DML), Pig (DML)




Demonstration




                             Hortonworks Data Platform
                             •    Hortonworks Management Center
                             •    HCatalog & Data Integration Services
                             •    High Availability




Full Stack High Availability

HA Cluster
   Built on stable, proven Apache Hadoop release
   Complementary to Hadoop 2.0 HA efforts

•  Failover and restart for
     •  NameNode
     •  JobTracker
     •  Other services to come…

•  Open API allows use of proven
   HA from multiple vendors

•  Minimized changes to clients and
   configuration

•  Server & operating system failure
   detection and VM restart

•  Smart resource management
   ensures sufficient resources are
   available to restart VMs
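
The resource check behind the last bullet can be sketched in a few lines of Python. This is purely illustrative and is not the platform's or any vendor's HA implementation: the function, host names and memory figures are all hypothetical. It shows only the general idea of restarting a failed VM on a surviving host that has enough free capacity.

```python
def pick_restart_host(hosts, failed_host, vm_memory_gb):
    """Return a surviving host with enough free memory for the VM, or None.

    hosts maps a (hypothetical) host name to its free memory in GB.
    """
    candidates = [
        (free_gb, name)
        for name, free_gb in hosts.items()
        if name != failed_host and free_gb >= vm_memory_gb
    ]
    if not candidates:
        return None  # no host can take the VM; the restart must wait
    # Prefer the host with the most free memory.
    return max(candidates)[1]


# Hypothetical cluster: host-b has failed and its 12 GB VM needs a new home.
hosts = {"host-a": 8, "host-b": 32, "host-c": 16}
target = pick_restart_host(hosts, failed_host="host-b", vm_memory_gb=12)
print(target)
```

Returning None when no host qualifies keeps the decision explicit: a caller can queue the restart or raise an alert instead of overcommitting a host.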


Demonstration




                             Hortonworks Data Platform
                             •    Hortonworks Management Center
                             •    HCatalog & Data Integration Services
                             •    High Availability




What next?

1   Download Hortonworks Data Platform
    hortonworks.com/download

2   Use the getting started guide
    hortonworks.com/get-started

3   Learn more… get support

       •  Expert role-based training
       •  Courses for admins, developers and operators
       •  Certification program
       •  Custom onsite options
       hortonworks.com/training

       Hortonworks Support
       •  Full lifecycle technical support across four service levels
       •  Delivered by Apache Hadoop Experts/Committers
       •  Forward-compatible
       hortonworks.com/support


Hortonworks Support Subscriptions
Objective: help organizations to successfully develop
and deploy solutions based upon Apache Hadoop
• Full-lifecycle technical support available
  – Developer support for design, development and POCs
  – Production support for staging and production environments
       – Up to 24x7 with 1-hour response times

• Delivered by the Apache Hadoop experts
  – Backed by development team that has released every major
    version of Apache Hadoop since 0.1

• Forward-compatibility
  – Hortonworks’ leadership role helps ensure bug fixes and patches
    can be included in future versions of Hadoop projects



Hortonworks Training
Objective: help organizations overcome Hadoop
knowledge gaps
• Expert role-based training for developers,
  administrators & data analysts
  – Heavy emphasis on hands-on labs
  – Extensive schedule of public training courses available
    (hortonworks.com/training)

• Comprehensive certification programs



• Customized, on-site courses available

Questions & Answers

                                           TRY
                                           download at hortonworks.com

                                           LEARN
                                           Hortonworks University

                                           FOLLOW
                                           Twitter: @hortonworks
                                           Facebook: facebook.com/hortonworks

                                           MORE EVENTS
                                           hortonworks.com/events




                             Further questions & comments: events@hortonworks.com
