Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx

Webinar introducing Hortonworks Data Platform HDP for Systems Integrators.

Published in: Technology

  1. Hortonworks & Systems Integrators
     Mitch Ferguson, VP, Business Development
     Rikin Shah, Dir, Field Engineering
     September 5, 2012
     © Hortonworks Inc. 2012
  2. Big data changes the game
     Transactions + Interactions + Observations = BIG DATA
     [Chart: data volume (megabytes, gigabytes, terabytes, petabytes) against increasing data variety and complexity, with layers ERP, CRM, WEB and BIG DATA. Example data sources shown: purchase detail, purchase record, payment record, offer details, offer history, segmentation, customer touches, support contacts, web logs, A/B testing, behavioral targeting, dynamic pricing, dynamic funnels, affiliate networks, search marketing, mobile web, sentiment, user click stream, SMS/MMS, speech to text, social interactions & feeds, spatial & GPS coordinates, sensors / RFID / devices, business data feeds, external demographics, user generated content, product/service logs, HD video, audio, images.]
  3. Hortonworks Snapshot
     The industry leading and only 100% open source Apache Hadoop distribution
     • Headquarters: Sunnyvale, CA
     • 100+ employees
     • Formed with core Apache Hadoop engineering team from Yahoo!
     • 40+ engineers and architects, including 25+ Hadoop committers
     Most experienced open source leadership team
     – Rob Bearden: CEO (JBoss, SpringSource, i2, Oracle)
     – Shaun Connolly: VP Strategy (VMW, SpringSource, Red Hat, JBoss)
     – Mitch Ferguson: VP BD (SpringSource, VMware)
     – John Kreisa: VP Marketing (Red Hat, Cloudera, MarkLogic, Bus Obj)
     – Ari Zilka: CPO (Terracotta, Accenture, Walmart.com)
     – Greg Pavlik: VP Eng. (Oracle SOA & Integration platform)
     Business model focused on customer success: Hadoop support, services & training
     – Subscription support for Hortonworks Data Platform
     – Training business: private and public classes available for developers & administrators
  4. Hortonworks Business Strategy
     Enable the next gen data management platform
     • Accelerate the adoption of Apache Hadoop
     • Create a vibrant ecosystem: ISVs, IHVs, Systems Integrators
     • Provide world-class enterprise Support & Training
  5. Hortonworks Vision & Role
     We believe that by the end of 2015, more than half the world's data will be processed by Apache Hadoop.
     1 Be diligent stewards of the open source core
     2 Be tireless innovators beyond the core
     3 Provide robust data platform services & open APIs
     4 Enable the ecosystem at each layer of the stack
     5 Make the platform enterprise-ready & easy to use
  6. Enabling Hadoop as Enterprise Big Data Platform
     [Diagram: Hortonworks Data Platform at the center.]
     • Left side: Applications, Business Tools, Development Tools, Open APIs and access, Data Movement & Integration, Data Management Systems, Systems Management
     • Right side: Installation & Configuration, Administration, Monitoring, High Availability, Replication, Multi-tenancy, ..
     • DEVELOPER: Data Platform Services & Open APIs (Metadata, Indexing, Search, Security, Management, Data Extract & Load, APIs)
  7. Hortonworks Partner Eco-System
  8. Hortonworks & SIs
     Our business models are 100% complementary
     • Systems Integrators are a cornerstone of our business model
     • Enable high-value & repeatable solutions
     • Leverage multi-party relationships to accelerate business
     [Diagram labels: Systems Integrator, Customer]
  9. Why Hortonworks?
     • The most Apache Hadoop experience and expertise
       – Reliable Hadoop from the experts, project leaders, architects and builders
       – Collectively over 90 years operational Hadoop experience (at least double that of the closest competitor)
     • Influence community direction
       – Provides a direct connection to drive innovation in the community
     • Focus on the ecosystem
       – Roadmap and vision to provide access to the wide ecosystem of enterprise applications, such as Teradata
     • Industry momentum
       – Collaborate across partners (ISVs/IHVs/SIs) to enable high-value solutions
  10. Hortonworks Apache Hadoop Leadership
     Hortonworkers… the builders, operators and core architects of Apache Hadoop
     • Most experienced team running Hadoop in production at scale (> 5 years, 42000 nodes)
     • All “stable” releases of Apache Hadoop have been shipped by Hortonworkers
     “We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog..” - Jeff Kelly, Wikibon
     Leadership
     • VP and PMC of Hadoop: Arun Murthy
     • Core Architect of YARN: Arun Murthy
     • Core Architect MapReduce2: Arun Murthy
     • VP & PMC of Pig: Daniel Dai
     • VP of Zookeeper: Mahadev Konar
     • Inventor of HCatalog: Alan Gates
     • Project Lead for Ambari: Mahadev Konar
     • Original Project Lead: Eric Baldeschwieler
  11. Hortonworks Data Platform
  12. Hortonworks Data Platform
     [Diagram: platform components grouped under Develop, Interact, Operate and Integrate.]
     • Data Integration Services (HCatalog APIs, WebHDFS, Talend Open Studio for Big Data, Sqoop, Flume)
     • Scripting (Pig)
     • Query (Hive)
     • Non-Relational Database (HBase)
     • Workflow & Scheduling (Oozie)
     • Metadata Services (HCatalog)
     • Management & Monitoring Services (Ambari, Zookeeper)
     • Distributed Processing (MapReduce)
     • Distributed Storage (HDFS)
  13. Apache Hadoop Release Management
     [Timeline: Hadoop 1 releases 1.0.1, 1.0.2, 1.0.3 (HDP 1.0), 1.1.1, 1.1.2]
     • Apache Hadoop release management is run by Hortonworks
       – Matt Foley, Release Manager for Hadoop 1
       – Arun Murthy, Release Manager for Hadoop 2
       – Ashutosh Chauhan, Release Manager for Hive
       – Daniel Dai, Release Manager for Pig
       – Alan Gates, Release Manager for HCat
     • Hadoop Core releases validated (and fixed) by Hortonworks
       – ~1300 end to end system tests run in house using our IP before any release can be made
     • Hortonworks Data Platform is released directly from Apache Hadoop branches
  14. Full Stack High Availability
  15. Full Stack High Availability
     • Failover and restart for
       – NameNode
       – JobTracker
       – HBase and other services to come…
     • Open API allows use of proven HA from multiple vendors (Red Hat & VMware)
     • Minimized changes to clients and configuration
     • Complementary to 2.0 HA efforts
     • Server & operating system failure detection and VM restart
     • Smart resource management ensures sufficient resources are available to restart VMs
     Addresses HA needs on stable Apache Hadoop 1.0
     [Diagram: HA cluster with a core switch and rack switches; HA pairs of NameNode, JobTracker and other daemons, each with an HA Manager.]
  16. Capacity Scheduler Delivers Multi-tenancy
     • Queue definition
       – % of total system memory
       – % CPU utilization (not slot count)
     • Queues per team
       – Soft and hard limits, so you can use the entire cluster if available
       – Ownership and security built in
     • Proactive resource management
       – Lots of rules and observation points
       – Don’t start another task if it will blow up the node
       – Don’t start another task if other workloads are spinning up
     • Better than Fair + Preemption (HDP supports all)
       – Utilization not measured by slot count (can blow up a node / cluster)
       – Doesn’t start all tasks automatically (proactive vs. reactive)
  17. HCatalog: METADATA
  18. Metadata Services
     Apache HCatalog provides flexible metadata services across tools and external access
     • Consistency of metadata and data models across tools (MapReduce, Pig, HBase and Hive)
     • Accessibility: share data as tables in and out of HDFS
     • Availability: enables flexible, thin-client access via REST API
     HCatalog: shared table and schema management opens the platform
     [Diagram: from raw Hadoop data, inconsistent or unknown metadata and tool specific access, to table access, aligned metadata and a REST API.]
  19. Options Lead to Complexity
     Feature       | MapReduce       | Pig                                           | Hive
     Record format | Key value pairs | Tuple                                         | Record
     Data model    | User defined    | int, float, string, bytes, maps, tuples, bags | int, float, string, maps, structs, lists
     Schema        | Encoded in app  | Declared in script or read by loader          | Read from metadata
     Data location | Encoded in app  | Declared in script                            | Read from metadata
     Data format   | Encoded in app  | Declared in script                            | Read from metadata
     • Pig and MR users need to know a lot to write their apps
     • When data schema, location, or format change, Pig and MR apps must be rewritten, retested, and redeployed
     • Hive users have to load data from Pig/MR users to have access to it
  20. Hadoop Ecosystem
     [Diagram: Hive (SQL), Pig (scripting) and MapReduce (Java), each with its own interface: SQL and SerDe for Hive, Load/Store for Pig, and Input/OutputFormat underneath all three. Other labels: DML, DDL; metastore (tables, partitions, files, types); HDFS data nodes dn1, dn2, dn3, …, dnN.]
  21. Opening up Metadata to MR & Pig
     [Diagram: Hive (SQL), Pig (scripting) and MapReduce (Java) above an HCat metadata layer. Interfaces: SQL and SerDe for Hive, HCatLoad/Store for Pig, HCatInput/OutputFormat for MapReduce. Other labels: metastore (tables, partitions, files, types); HDFS data nodes dn1, dn2, dn3, …, dnN.]
  22. Tools With HCatalog
     Feature       | MapReduce + HCatalog                     | Pig + HCatalog                                | Hive
     Record format | Record                                   | Tuple                                         | Record
     Data model    | int, float, string, maps, structs, lists | int, float, string, bytes, maps, tuples, bags | int, float, string, maps, structs, lists
     Schema        | Read from metadata                       | Read from metadata                            | Read from metadata
     Data location | Read from metadata                       | Read from metadata                            | Read from metadata
     Data format   | Read from metadata                       | Read from metadata                            | Read from metadata
     • Pig/MR users can read schema from metadata
     • Pig/MR users are insulated from schema, location, and format changes
     • All users have access to other users’ data as soon as it is committed
  23. Metadata Services
     [Diagram labels: applications, data stores, visualization; REST (ddl, dml; create, describe); HCatalog; DML paths for Hive, HBase and Pig; existing metastore; Hadoop infrastructure cluster.]
  24. Services Integration
     Provides RESTful API as “front door” for Hadoop
     • Opens the door to languages other than Java
     • Thin clients via web services vs. fat-clients in gateway
     • Insulation from interface changes release to release
     Opens Hadoop to integration with existing and new applications
     [Diagram: Existing & New Applications call RESTful Web Services (WebHDFS, HCatalog), which front MapReduce, Pig, Hive and HCatalog over HDFS, HBase and an External Store.]
  25. Data Integration Services
     • Intuitive graphical data integration tools for HDFS, Hive, HBase, HCatalog and Pig
     • Oozie scheduling allows you to manage and stage jobs
     • Connectors for any database, business application or system
     • Integrated HCatalog storage
     Bridge the gap between legacy data & Hadoop
     Simplify and speed development
  26. Teradata and Hortonworks Partner to Provide the First Enterprise Reference Architecture for Hadoop and Big Data
     Partnership provides clear path to enterprise for Hadoop
     • Reference architecture that provides guidance on best applications for Teradata, Teradata Aster, and Hadoop
     • Clear partnership between industry and community leaders
     • Deeper integration to ease data movement in/out of Hadoop
     • Joint R&D and go-to-market
  27. Ambari: Cluster Provisioning, Configuration Management, Monitoring
  28. Ambari Architecture
     • Installs your cluster onto target HW for you
     • Manage, reconfigure from one place
     • Monitor key and meaningful Hadoop metrics, not just OS / HW
     • Scalable in line w/ Hadoop itself
     [Diagram labels: worker nodes n1, n2, n3, …, nN (data and task; Hadoop; Puppet, Nagios, Ganglia; sink); Ambari controller (Nagios, Ganglia, Puppet, php portal); operator view.]
  29. Ambari Live Demonstration
  30. Why HDP?
     ONLY Hortonworks Data Platform provides…
     • Tightly aligned to core Apache Hadoop development line
       - Reduces risk for customers who may add custom coding or projects
     • Enterprise Integration
       - HCatalog provides scalable, extensible integration point to Hadoop data
     • Most reliable Hadoop distribution
       - Full stack high availability on v1 delivers the strongest SLA guarantees
     • Multi-tenant scheduling and resource management
       - Capacity and fair scheduling optimizes cluster resources
     • Integration with operations, eases cluster management
       - Ambari is the most open/complete operations platform for Hadoop clusters
  31. Hortonworks Support Subscriptions
     Objective: help organizations to successfully develop and deploy solutions based upon Apache Hadoop
     • Full-lifecycle technical support available
       – Developer support for design, development and POCs
       – Production support for staging and production environments
       – Up to 24x7 with 1-hour response times
     • Delivered by the Apache Hadoop experts
       – Backed by development team that has released every major version of Apache Hadoop since 0.1
     • Forward-compatibility
       – Hortonworks’ leadership role helps ensure bug fixes and patches can be included in future versions of Hadoop projects
  32. Hortonworks Training
     Objective: help organizations overcome Hadoop knowledge gaps
     • Expert role-based training for developers, administrators & data analysts
       – Heavy emphasis on hands-on labs
       – Extensive schedule of public training courses available (hortonworks.com/training)
     • Comprehensive certification programs
     • Customized, on-site courses available
  33. Thank You! Questions & Answers
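
Slide 16 describes queues defined as a percentage of cluster memory, with soft and hard limits that let a team borrow idle capacity. The sketch below is one way to picture that rule. It is a toy model, not the Capacity Scheduler's code: the `Queue` fields, `grantable_mb` and `over_guarantee` are hypothetical names chosen for illustration, and only memory is modelled.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    """Hypothetical queue record: a guaranteed share and a hard ceiling."""
    name: str
    capacity_pct: float      # soft limit: share the queue is guaranteed
    max_capacity_pct: float  # hard limit: share the queue may never exceed
    used_mb: int = 0


def over_guarantee(queue, cluster_mb):
    """True when the queue is currently borrowing beyond its guaranteed share."""
    return queue.used_mb > cluster_mb * queue.capacity_pct / 100


def grantable_mb(queue, queues, cluster_mb, request_mb):
    """How much of request_mb can be granted without breaking the hard limit
    or exceeding the memory that is actually free in the cluster."""
    hard_cap_mb = cluster_mb * queue.max_capacity_pct / 100
    free_mb = cluster_mb - sum(q.used_mb for q in queues)
    headroom_mb = min(hard_cap_mb - queue.used_mb, free_mb)
    return max(0, min(request_mb, int(headroom_mb)))
```

With a 100,000 MB cluster, a queue guaranteed 40% but capped at 100% can grow past its guarantee while the rest of the cluster is idle, whereas a queue capped at 70% is held to that ceiling however much memory is free.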

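
Slides 19 and 22 contrast applications that hard-code schema, location and format with applications that read them from shared metadata. The following sketch illustrates why the second approach insulates readers from change. It is a toy in-memory registry and does not use the HCatalog API: `METASTORE`, `plan_read` and `relocate`, along with the table name and paths, are all hypothetical.

```python
# Toy metadata registry. This models the idea only; it is not HCatalog.
METASTORE = {
    "web_logs": {
        "location": "/data/web_logs",
        "format": "text",
        "schema": [("ts", "string"), ("user_id", "int"), ("url", "string")],
    }
}


def plan_read(table):
    """Resolve everything a reader needs from metadata, nothing from the app."""
    entry = METASTORE[table]
    columns = [name for name, _type in entry["schema"]]
    return entry["location"], entry["format"], columns


def relocate(table, new_location, new_format):
    """An administrator moves or re-encodes the data; readers are not edited."""
    METASTORE[table]["location"] = new_location
    METASTORE[table]["format"] = new_format
```

A job that calls `plan_read("web_logs")` picks up a new location or format on its next run, whereas a job with the path and format written into its source would have to be rewritten, retested and redeployed, which is the cost slide 19 points out.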
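
Slide 24 presents WebHDFS as a RESTful "front door" that lets thin clients in any language reach Hadoop. As a rough illustration, the sketch below builds WebHDFS request URLs in Python. The `/webhdfs/v1` path prefix and the `op` query parameter follow the WebHDFS REST API; the helper name `webhdfs_url`, the host name and the default port 50070 (the usual NameNode HTTP port on Hadoop 1) are assumptions for this example.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, path, op, port=50070, params=None):
    """Build a WebHDFS URL for an absolute HDFS path and an operation name.

    `params` holds optional extra query parameters, e.g. {"user.name": "demo"}.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    query = urlencode({"op": op, **(params or {})})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Directory listing and file read for a hypothetical cluster.
list_url = webhdfs_url("namenode.example.com", "/user/demo", "LISTSTATUS")
open_url = webhdfs_url(
    "namenode.example.com", "/user/demo/part-00000", "OPEN",
    params={"user.name": "demo"},
)
```

Against a live cluster, either URL could be fetched with any HTTP client (for example `urllib.request.urlopen`), which is the point of the slide: no Java client or Hadoop libraries are needed on the calling side.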