Hortonworks Data Platform for Systems Integrators Webinar 9-5-2012.pptx

Webinar introducing the Hortonworks Data Platform (HDP) for Systems Integrators.

Published in: Technology

  • 1. Hortonworks & Systems Integrators
      Mitch Ferguson, VP, Business Development
      Rikin Shah, Dir, Field Engineering
      September 5, 2012
      © Hortonworks Inc. 2012
  • 2. Big data changes the game
      Transactions + Interactions + Observations = BIG DATA
      (Chart: data volume grows from ERP to BIG DATA along an axis labeled "Increasing Data Variety and Complexity")
      • ERP (Megabytes)
      • CRM (Gigabytes)
      • WEB (Terabytes)
      • BIG DATA (Petabytes)
      Example data sources shown on the chart: Purchase detail, Purchase record, Payment record, Segmentation, Offer details, Offer history, Customer Touches, Support Contacts, Web logs, A/B testing, Behavioral Targeting, Dynamic Pricing, Dynamic Funnels, Search Marketing, Affiliate Networks, Business Data Feeds, External Demographics, User Generated Content, Mobile Web, Sentiment, User Click Stream, SMS/MMS, Speech to Text, Social Interactions & Feeds, Spatial & GPS Coordinates, Sensors / RFID / Devices, HD Video, Audio, Images, Product/Service Logs
  • 3. Hortonworks Snapshot
      The industry-leading and only 100% open source Apache Hadoop distribution
      • Headquarters: Sunnyvale, CA
      • 100+ employees
      • Formed with core Apache Hadoop engineering team from Yahoo!
      • 40+ engineers and architects, including 25+ Hadoop committers
      Most experienced open source leadership team
          – Rob Bearden, CEO (JBoss, SpringSource, i2, Oracle)
          – Shaun Connolly, VP Strategy (VMW, SpringSource, Red Hat, JBoss)
          – Mitch Ferguson, VP BD (SpringSource, VMware)
          – John Kreisa, VP Marketing (Red Hat, Cloudera, MarkLogic, Bus Obj)
          – Ari Zilka, CPO (Terracotta, Accenture, …)
          – Greg Pavlik, VP Eng. (Oracle SOA & Integration platform)
      Business model focused on customer success: Hadoop support, services & training
          – Subscription support for Hortonworks Data Platform
          – Training business: private and public classes available for developers & administrators
  • 4. Hortonworks Business Strategy
      Enable the next-gen data management platform
      • Accelerate the adoption of Apache Hadoop
      • Create a vibrant ecosystem: ISVs, IHVs, Systems Integrators
      • Provide world-class enterprise Support & Training
  • 5. Hortonworks Vision & Role
      We believe that by the end of 2015, more than half the world’s data will be processed by Apache Hadoop.
      1. Be diligent stewards of the open source core
      2. Be tireless innovators beyond the core
      3. Provide robust data platform services & open APIs
      4. Enable the ecosystem at each layer of the stack
      5. Make the platform enterprise-ready & easy to use
  • 6. Enabling Hadoop as Enterprise Big Data Platform
      (Diagram: Hortonworks Data Platform at the center, flanked by two groups of capabilities)
      • Ecosystem side: Applications, Business Tools, Development Tools, Open APIs and access, Data Movement & Integration, Data Management Systems, Systems Management
      • Operations side: Installation & Configuration, Administration, Monitoring, High Availability, Replication, Multi-tenancy, ..
      • Developer: Data Platform Services & Open APIs (Metadata, Indexing, Search, Security, Management, Data Extract & Load, APIs)
  • 7. Hortonworks Partner Eco-System
  • 8. Hortonworks & SIs
      Our business models are 100% complementary
      • Systems Integrators are a cornerstone of our business model
      • Enable high-value & repeatable solutions
      • Leverage multi-party relationships to accelerate business
      (Diagram labels: Systems Integrator, Customer)
  • 9. Why Hortonworks?
      • The most Apache Hadoop experience and expertise
          – Reliable Hadoop from the experts, project leaders, architects and builders
          – Collectively over 90 years of operational Hadoop experience (at least double that of the closest competitor)
      • Influence community direction
          – Provides a direct connection to drive innovation in the community
      • Focus on the ecosystem
          – Roadmap and vision to provide access to the wide ecosystem of enterprise applications, such as Teradata
      • Industry momentum
          – Collaborate across partners (ISVs/IHVs/SIs) to enable high-value solutions
  • 10. Hortonworks Apache Hadoop Leadership
      Hortonworkers… the builders, operators and core architects of Apache Hadoop
      • Most experienced team running Hadoop in production at scale (> 5 years, 42000 nodes)
      • All “stable” releases of Apache Hadoop have been shipped by Hortonworkers
      Leadership
      • VP and PMC of Hadoop: Arun Murthy
      • Core Architect of YARN: Arun Murthy
      • Core Architect of MapReduce2: Arun Murthy
      • VP & PMC of Pig: Daniel Dai
      • VP of ZooKeeper: Mahadev Konar
      • Inventor of HCatalog: Alan Gates
      • Project Lead for Ambari: Mahadev Konar
      • Original Project Lead: Eric Baldeschwieler
      “We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog..” (Jeff Kelly, Wikibon)
  • 11. Hortonworks Data Platform
  • 12. Hortonworks Data Platform
      (Architecture diagram, grouped under the labels Develop, Interact, Operate and Integrate)
      • Scripting (Pig)
      • Query (Hive)
      • Non-Relational Database (HBase)
      • Metadata Services (HCatalog)
      • Workflow & Scheduling (Oozie)
      • Management & Monitoring Services (Ambari, ZooKeeper)
      • Data Integration Services (HCatalog APIs, WebHDFS, Talend Open Studio for Big Data, Sqoop, Flume)
      • Distributed Processing (MapReduce)
      • Distributed Storage (HDFS)
  • 13. Apache Hadoop Release Management
      (Release timeline figure: Hadoop 1 releases 1.0.1, 1.0.2, 1.0.3, 1.1.1, 1.1.2; HDP 1.0)
      • Apache Hadoop release management is run by Hortonworks
          – Matt Foley, Release Manager for Hadoop 1
          – Arun Murthy, Release Manager for Hadoop 2
          – Ashutosh Chauhan, Release Manager for Hive
          – Daniel Dai, Release Manager for Pig
          – Alan Gates, Release Manager for HCatalog
      • Hadoop Core releases validated (and fixed) by Hortonworks
          – ~1300 end-to-end system tests run in house using our IP before any release can be made
      • Hortonworks Data Platform is released directly from Apache Hadoop branches
  • 14. Full Stack High Availability
  • 15. Full Stack High Availability
      • Failover and restart for
          – NameNode
          – JobTracker
          – HBase and other services to come…
      • Open API allows use of proven HA from multiple vendors (Red Hat & VMware)
      • Minimized changes to clients and configuration
      • Complementary to 2.0 HA efforts
      • Server & Operating System failure detection and VM restart
      • Smart resource management ensures sufficient resources are available to restart VMs
      Addresses HA needs on stable Apache Hadoop 1.0
      (Diagram: HA cluster with a core switch and rack switches; HA pairs of NameNode, JobTracker and other daemons, each with an HA Manager)
  • 16. Capacity Scheduler Delivers Multi-tenancy
      • Queue definition
          – % of total system memory
          – % CPU utilization (not slot count)
      • Queues per team
          – Soft and hard limits, so you can use the entire cluster if available
          – Ownership and security built in
      • Proactive resource management
          – Lots of rules and observation points
          – Don’t start another task if it will blow up the node
          – Don’t start another task if other workloads are spinning up
      • Better than Fair + Preemption (HDP supports all)
          – Utilization is not measured by slot count (slot counting can blow up a node / cluster)
          – Doesn’t start all tasks automatically (proactive vs. reactive)
  • 17. HCatalog METADATA
  • 18. Metadata Services
      Apache HCatalog provides flexible metadata services across tools and external access
      • Consistency of metadata and data models across tools (MapReduce, Pig, HBase and Hive)
      • Accessibility: share data as tables in and out of HDFS
      • Availability: enables flexible, thin-client access via REST API
      (Diagram: HCatalog, shared table and schema management, opens the platform)
      • Without HCatalog: raw Hadoop data; inconsistent, unknown; tool-specific access
      • With HCatalog: table access; aligned metadata; REST API
  • 19. Options Lead to Complexity
      Feature       | MapReduce       | Pig                                           | Hive
      Record format | Key value pairs | Tuple                                         | Record
      Data model    | User defined    | int, float, string, bytes, maps, tuples, bags | int, float, string, maps, structs, lists
      Schema        | Encoded in app  | Declared in script or read by loader          | Read from metadata
      Data location | Encoded in app  | Declared in script                            | Read from metadata
      Data format   | Encoded in app  | Declared in script                            | Read from metadata
      • Pig and MR users need to know a lot to write their apps
      • When data schema, location, or format change, Pig and MR apps must be rewritten, retested, and redeployed
      • Hive users have to load data from Pig/MR users to have access to it
  • 20. Hadoop Ecosystem
      (Diagram: three tools, each with its own interface to data in HDFS)
      • Hive (SQL): interface is SQL; SerDe; Input/OutputFormat; DML and DDL; metastore
      • Pig (scripting): interface is Load/Store; Input/OutputFormat
      • MapReduce (Java): interface is Input/OutputFormat
      • Metastore: tables, partitions, files, types
      • HDFS: data nodes dn1, dn2, dn3, ..., dnN
  • 21. Opening up Metadata to MR & Pig
      (Diagram: the same stack with an HCat metadata layer added)
      • Hive (SQL): interface is SQL; SerDe
      • Pig (scripting): interface is HCatLoad/Store
      • MapReduce (Java): interface is HCatInput/OutputFormat
      • Metastore: tables, partitions, files, types
      • HDFS: data nodes dn1, dn2, dn3, ..., dnN
  • 22. Tools With HCatalog
      Feature       | MapReduce + HCatalog                     | Pig + HCatalog                                | Hive
      Record format | Record                                   | Tuple                                         | Record
      Data model    | int, float, string, maps, structs, lists | int, float, string, bytes, maps, tuples, bags | int, float, string, maps, structs, lists
      Schema        | Read from metadata                       | Read from metadata                            | Read from metadata
      Data location | Read from metadata                       | Read from metadata                            | Read from metadata
      Data format   | Read from metadata                       | Read from metadata                            | Read from metadata
      • Pig/MR users can read schema from metadata
      • Pig/MR users are insulated from schema, location, and format changes
      • All users have access to other users’ data as soon as it is committed
  • 23. Metadata Services
      (Diagram labels: applications, data stores, visualization; REST with DDL and DML, create, describe; Hive, Pig and HBase with DML; HCatalog; metastore; existing infrastructure; Hadoop cluster)
  • 24. Services Integration
      Provides RESTful API as “front door” for Hadoop
      • Opens the door to languages other than Java
      • Thin clients via web services vs. fat clients in gateway
      • Insulation from interface changes release to release
      Opens Hadoop to integration with existing and new applications
      (Diagram: Existing & New Applications; RESTful Web Services: WebHDFS, HCatalog; MapReduce, Pig, Hive, HCatalog; HDFS, HBase, External Store)
  • 25. Data Integration Services
      • Intuitive graphical data integration tools for HDFS, Hive, HBase, HCatalog and Pig
      • Oozie scheduling allows you to manage and stage jobs
      • Connectors for any database, business application or system
      • Integrated HCatalog storage
      Bridge the gap between legacy data & Hadoop
      Simplify and speed development
  • 26. Teradata and Hortonworks Partner to Provide the First Enterprise Reference Architecture for Hadoop and Big Data
      Partnership provides clear path to enterprise for Hadoop
      • Reference architecture that provides guidance on best applications for Teradata, Teradata Aster, and Hadoop
      • Clear partnership between industry and community leaders
      • Deeper integration to ease data movement in/out of Hadoop
      • Joint R&D and go-to-market
  • 27. Ambari
      Cluster Provisioning
      Configuration Management
      Monitoring
  • 28. Ambari Architecture
      • Installs your cluster onto target HW for you
      • Manage, reconfigure from one place
      • Monitor key and meaningful Hadoop metrics, not just OS / HW
      • Scalable in line w/ Hadoop itself
      (Diagram labels: data and task nodes n1, n2, n3, ..., nN; worker node running Hadoop with Puppet, Nagios and Ganglia; sink; Ambari controller with Nagios, Ganglia and Puppet; php portal; operator view)
  • 29. Ambari Live Demonstration
  • 30. Why HDP?
      ONLY Hortonworks Data Platform provides…
      • Tightly aligned to core Apache Hadoop development line
          – Reduces risk for customers who may add custom coding or projects
      • Enterprise integration
          – HCatalog provides scalable, extensible integration point to Hadoop data
      • Most reliable Hadoop distribution
          – Full stack high availability on v1 delivers the strongest SLA guarantees
      • Multi-tenant scheduling and resource management
          – Capacity and fair scheduling optimize cluster resources
      • Integration with operations eases cluster management
          – Ambari is the most open/complete operations platform for Hadoop clusters
  • 31. Hortonworks Support Subscriptions
      Objective: help organizations to successfully develop and deploy solutions based upon Apache Hadoop
      • Full-lifecycle technical support available
          – Developer support for design, development and POCs
          – Production support for staging and production environments
          – Up to 24x7 with 1-hour response times
      • Delivered by the Apache Hadoop experts
          – Backed by development team that has released every major version of Apache Hadoop since 0.1
      • Forward-compatibility
          – Hortonworks’ leadership role helps ensure bug fixes and patches can be included in future versions of Hadoop projects
  • 32. Hortonworks Training
      Objective: help organizations overcome Hadoop knowledge gaps
      • Expert role-based training for developers, administrators & data analysts
          – Heavy emphasis on hands-on labs
          – Extensive schedule of public training courses available (
      • Comprehensive certification programs
      • Customized, on-site courses available
  • 33. Thank You!
      Questions & Answers
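
Slide 16 describes per-team queues with soft and hard limits and a scheduler that declines to start a task that would overload the cluster. The sketch below is a toy model of that admission rule only, not the Capacity Scheduler's actual implementation. The names `Queue`, `guaranteed_pct`, `max_pct` and `can_start_task` are hypothetical and were chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    """Hypothetical per-team queue; percentages are shares of the whole cluster."""
    name: str
    guaranteed_pct: float  # soft limit: share the team is promised
    max_pct: float         # hard limit: share the team may never exceed
    used_pct: float = 0.0  # share currently in use by this queue


def can_start_task(queue: Queue, task_pct: float, cluster_used_pct: float) -> bool:
    """Decide proactively whether one more task may start.

    A queue may grow past its soft limit when the cluster has idle capacity,
    but never past its hard limit, and never past the cluster's total capacity.
    """
    if queue.used_pct + task_pct > queue.max_pct:
        return False  # would break the queue's hard limit
    if cluster_used_pct + task_pct > 100.0:
        return False  # would oversubscribe the cluster
    return True


# A queue already at its 30% soft limit can still borrow idle capacity.
analytics = Queue("analytics", guaranteed_pct=30.0, max_pct=100.0, used_pct=30.0)
print(can_start_task(analytics, task_pct=10.0, cluster_used_pct=50.0))
```

The soft limit does not appear in the admission check on purpose: in this model it only matters when deciding who gives capacity back, which the sketch leaves out.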
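
Slides 19 to 22 argue that Pig and MapReduce applications should read schema, location and format from shared metadata instead of encoding them in the application. The sketch below illustrates that idea with an in-memory stand-in. It does not use the HCatalog API, and `METASTORE`, `STORAGE` and `read_table` are hypothetical names invented for this example.

```python
import csv
import io
import json

# Hypothetical in-memory stand-ins for a metastore and a file system.
METASTORE = {
    "clicks": {"format": "csv", "columns": ["user", "url"], "location": "clicks_v1"},
}
STORAGE = {
    "clicks_v1": "alice,/home\nbob,/cart\n",
}


def read_table(name: str) -> list:
    """Read a table by name; schema, location and format all come from metadata."""
    meta = METASTORE[name]
    raw = STORAGE[meta["location"]]
    if meta["format"] == "csv":
        rows = csv.reader(io.StringIO(raw))
        return [dict(zip(meta["columns"], row)) for row in rows]
    if meta["format"] == "jsonl":
        return [json.loads(line) for line in raw.splitlines() if line]
    raise ValueError("unsupported format: " + meta["format"])


before = read_table("clicks")

# The data owner moves the table and changes its format; only metadata is updated.
STORAGE["clicks_v2"] = (
    '{"user": "alice", "url": "/home"}\n'
    '{"user": "bob", "url": "/cart"}\n'
)
METASTORE["clicks"] = {"format": "jsonl", "columns": ["user", "url"], "location": "clicks_v2"}

after = read_table("clicks")
print(before == after)
```

The caller's code is the same before and after the migration, which is the insulation from schema, location and format changes that slide 22 lists.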
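
Slide 24 presents WebHDFS as a RESTful "front door" that lets thin clients in any language reach Hadoop. As a minimal sketch, the helper below builds a WebHDFS request URL using the documented `/webhdfs/v1/<path>?op=<OPERATION>` layout. The function name `webhdfs_url`, the host `namenode.example.com`, the path and the user are assumptions for illustration, and port 50070 is assumed to be the NameNode HTTP port of a default Hadoop 1 setup.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, path: str, op: str,
                params: Optional[Dict[str, str]] = None) -> str:
    """Build a WebHDFS REST URL for an absolute HDFS path and an operation."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /data/logs")
    query = {"op": op}
    query.update(params or {})
    # quote() leaves "/" intact and escapes characters that are unsafe in a URL path.
    return "http://{}:{}/webhdfs/v1{}?{}".format(host, port, quote(path), urlencode(query))


# Hypothetical host, path and user; LISTSTATUS lists a directory.
url = webhdfs_url("namenode.example.com", 50070, "/data/logs", "LISTSTATUS",
                  {"user.name": "alice"})
print(url)
```

Any HTTP client can then issue a GET against the resulting URL, so no Java client library or Hadoop installation is needed on the calling side.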