Hadoop Operations, Innovations and Enterprise Readiness with Hortonworks Data Platform v1.2


Hortonworks continues to innovate throughout all Hadoop-related projects, packaging the most enterprise-ready components, such as Ambari, into the Hortonworks Data Platform (HDP). Please join us in this interactive webinar as we present real-world use cases of Enterprise customers that are finding success with HDP and their Big Data initiatives. We will also introduce new features from version 1.2 of the Hortonworks Data Platform and explain how it has become the leading 100% open source distribution choice for the Enterprise.

In this webinar we will outline how enterprise customers are successful with HDP and also review some of the newest features in version 1.2, including:

-How to provision a cluster
-How to manage and monitor a cluster using completely open source tools
-How to perform diagnostics to identify issues in a cluster

  • Committed to building 100% open source Hadoop for the Enterprise
  • So how does this get brought together into our distribution? It is really pretty straightforward, but also unique: We start with the group of open source projects that I described and that we are continually driving in the OSS community. [CLICK] We then package the appropriate versions of those open source projects, integrate and test them using a full suite, including all the IP for regression testing contributed by Yahoo, and [CLICK] contribute all of the bug fixes back to the open source tree. From there, we package and certify a distribution in the form of the Hortonworks Data Platform (HDP), which includes both Hadoop Core and the related projects required by the Enterprise user, and provide it to our customers. Through this application of Enterprise software development process to the open source projects, the result is a 100% open source distribution that has been packaged, tested and certified by Hortonworks. It is also 100% in sync with the open source trees.
  • 100% Open Source: eliminating Lock-In
  • Quarterly Cadence: regular innovation every three months. Validated & Tested by our ecosystem partners. Embargo Date: January 15.
  • HDP tracks closely to Apache project releases. CDH forks early and patches CDH distributions off to the side of the Apache community projects, resulting in unnecessary drift and risk of lock-in. The “+923.423” and the “+541” parts of the version numbers represent how many patches these components have drifted away from the corresponding Apache projects. While some drift can be expected, patches and changes on the order of hundreds result in lock-in and actually eliminate the virtuous cycle that the upstream community should help drive.
  • I can’t really talk about Hortonworks without first taking a moment to talk about the history of Hadoop. What we now know as Hadoop really started back in 2005, when Eric Baldeschwieler – known as “E14” – started to work on a project to build a large scale data storage and processing technology that would allow Yahoo to store and process massive amounts of data to underpin its most critical application, Search. The initial focus was on building out the technology – the key components being HDFS and MapReduce – that would become the Core of what we think of as Hadoop today, and continuing to innovate it to meet the needs of this specific application. By 2008, Hadoop usage had greatly expanded inside of Yahoo, to the point that many applications were now using this data management platform, and as a result the team’s focus extended to include Operations: now that applications were beginning to propagate around the organization, sophisticated capabilities for operating it at scale were necessary. It was also at this time that usage began to expand well beyond Yahoo, with many notable organizations (including Facebook and others) adopting Hadoop as the basis of their large scale data processing and storage applications and necessitating a focus on operations to support what was by now a large variety of critical business applications. In 2011, recognizing that more mainstream adoption of Hadoop was beginning to take off and with an objective of facilitating it, the core team left – with the blessing of Yahoo – to form Hortonworks. The goal of the group was to facilitate broader adoption by addressing the Enterprise capabilities that would enable a larger number of organizations to adopt and expand their usage of Hadoop. [note: if useful as a talk track, Cloudera was formed in 2008, well BEFORE the operational expertise of running Hadoop at scale was established inside of Yahoo]
  • In summary, by addressing these elements, we can provide an Enterprise Hadoop distribution which includes the Core Services, Platform Services, Data Services and Operational Services required by the Enterprise user. And all of this is done in 100% open source, and tested at scale by our team (together with our partner Yahoo) to bring Enterprise process to an open source approach. And finally, this is the distribution that is endorsed by the ecosystem to ensure interoperability in your environment.
  • As the volume of data has exploded, we increasingly see organizations acknowledge that not all data belongs in a traditional database. The drivers are both cost (as volumes grow, database licensing costs can become prohibitive) and technology (databases are not optimized for very large datasets). Instead, we increasingly see Hadoop – and HDP in particular – being introduced as a complement to the traditional approaches. It is not replacing the database but rather complementing it, and as such it must integrate easily with existing tools and approaches. This means it must interoperate with: existing applications, such as Tableau, SAS, Business Objects, etc.; existing databases and data warehouses, for loading data to and from the data warehouse; development tools used for building custom applications; and operational tools for managing and monitoring.
  • Eric and team created the Hadoop project as open source, and that is and always will be central to our approach. We believe strongly that the technology needs to be community driven and open source. In terms of open source mechanics, Apache Hadoop is governed by the Apache Software Foundation, which provides structure to what inside a commercial software company would be a tightly governed development, test and release process. When we think of Core Hadoop, the ASF has helped to manage this process for several years now. However, as Hadoop has become more widely used, it has spawned a set of ancillary open source projects that introduce capabilities required for more mainstream use. These projects are generally classified as being related to either “Data Services” (those that enable the storage, processing, and accessing of data) or “Operational Services” (those that enable the management and operations of the infrastructure). The projects within these categories are run as independent projects with their own teams, and include some of the technologies you likely know of: Data Services include projects such as Hive, Pig, HBase and HCatalog, while Operational Services include Apache Ambari and more. Hortonworkers have always played a critical role in the development, test and release process for Core Apache Hadoop, but also play leading roles in these ancillary projects that are required for enterprise usage. This includes every role from committer and release manager to, in many cases, project lead. For example, Arun Murthy is the project lead for Core Hadoop. Current Hortonworks PMC members by project:
        Hadoop: Arun Murthy, Deveraj Das, Enis Soztutar, Giridharan Kesavan, Jitendra Nath Pandy, Mahadev Konar, Matt Foley, Owen O'Malley, Sanjay Radia, Suresh Srinivas, Nicholas Sze, Vinod Kumar Vavilapalli
        Pig: Daniel Dai, Alan Gates, Giridharan Kesavan, Ashutosh Chauhan, Thejas Nair
        Hive: Ashutosh Chauhan
        HBase: None
        Oozie: Deveraj Das, Alan Gates
        Sqoop: None
        Flume: None
        Bigtop: Alan Gates, Steve Loughran, Owen O'Malley
        Incubator (not a Hadoop project but shows who's helping grow new projects in Apache): Arun Murthy, Deveraj Das, Alan Gates, Mahadev Konar, Steve Loughran, Owen O'Malley, Enis Soztutar
  • We are believers in open source: we believe it is the most efficient way to develop enterprise software. But more importantly, we believe that 100% open source is the best approach for our customers. In the data management market in particular, our customers are acutely aware of the implication of growing their database usage with a proprietary vendor who can then exert pricing pressure (Oracle). Particularly when it comes to data storage, which we can all anticipate will continue to grow exponentially, you don’t want to be penalized for scale. By choosing an open source approach, organizations can build their operational processes on open technologies, without concern that they will be locked in to a particular vendor. And they can be confident that as their usage grows, they can choose from flexible pricing alternatives – by node or by storage – that align best to their needs. It is ultimately about mitigating risk, and in this regard open source has been proven as the safest approach. I would also caution you to look beyond the open source label used by some vendors: are they harvesting open source work, forking the code and then working independently (“fork early / patch often”)? Or, like Hortonworks, have they embraced and committed to the community open source approach, which will allow them to stay in sync with the innovation of the community? In the Hadoop community, Hortonworks is unquestioned in taking the community-driven approach.
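
The speaker notes above read the “+N” suffix of a CDH version string as the number of patches applied on top of the upstream Apache release. As a rough illustration of that reading, the sketch below splits such a string into its upstream version and patch count. The helper name `parse_patch_drift` is hypothetical, and treating the integer before the first dot of the suffix as the patch count is an assumption taken from the notes, not from Cloudera documentation; whatever follows that dot is left uninterpreted.

```python
def parse_patch_drift(version: str) -> tuple[str, int]:
    """Split a version such as '0.20.2 +923.418' into (upstream, patches).

    Assumption (from the speaker notes, not from vendor documentation):
    the integer right after '+' counts patches beyond the upstream release.
    A string with no '+' suffix is reported as zero patches of drift.
    """
    upstream, _, suffix = version.replace(" ", "").partition("+")
    if not suffix:
        return upstream, 0
    # Keep only the integer part before the first dot of the suffix.
    return upstream, int(suffix.split(".")[0])


if __name__ == "__main__":
    # Sample strings copied from the slide 6 comparison table.
    for sample in ["2.0.0alpha +541", "0.9.0 +148", "3.2.0"]:
        upstream, patches = parse_patch_drift(sample)
        print(f"{upstream}: {patches} patches beyond upstream")
```

Running the loop over the table's CDH columns gives a quick per-component view of the drift the notes describe.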

    1. Hadoop Operations & Enterprise Readiness: HDP 1.2. Jim Walker, Jeff Sposetti. © Hortonworks Inc. 2013
    2. Hortonworks Snapshot: We develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution
        Develop
            • We employ the core architects, builders and operators of Apache Hadoop
            • We drive innovation within Apache Software Foundation projects
        Distribute
            • We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform
            • We engineer, test & certify HDP for enterprise usage
        Support
            • We are uniquely positioned to deliver the highest quality of Hadoop support
            • We enable the ecosystem to work better with Hadoop
        Endorsed by Strategic Partners
    3. Hortonworks Process for Enterprise Hadoop
        Upstream Community Projects, Downstream Enterprise Product: a virtuous cycle when development & fixed issues are done upstream & stable project releases flow downstream
        [Diagram: upstream Apache projects (Hadoop, Pig, Hive, HCatalog, HBase, Ambari, other Apache projects) cycle through Design & Develop, Test & Patch, and Release; stable project releases flow downstream to the Hortonworks Data Platform (Integrate & Test, Package & Certify, Distribute); fixed issues flow back upstream]
        No Lock-in: Integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects
    4. Hortonworks Data Platform 1.2
        • Quarterly cadence
            – HDP is aligned tightly with the open source community software releases, not a patchwork
            – Regular open source innovation based on an open community
        • Ecosystem validation
            – Packaged and tested with our key development partner, Yahoo!, across hundreds of nodes
            – Ambari is the preferred management tool for integration with Microsoft System Center and Teradata Viewpoint, today.
    5. HDP 1.2 Summary
        Hortonworks Data Platform 1.2: Hortonworks Data Platform outpaces the competition to extend leadership through 100% open source Enterprise Apache Hadoop
        Focus areas:
            1. Ambari: continued innovation with a complete, free and open cluster management tool
                • Existing: Provision, Manage and Monitor your Hadoop infrastructure
                • New: Root Cause Analysis with job diagnostics, usage heat maps
                • Improved: Ecosystem integration and user interface
            2. Enhanced security model and performance for Hive and HCatalog
            3. Apache Mahout: now included in the HDP distribution
    6. HDP Certifies Latest Stable Components
        Apache Project   HDP 1.2   CDH 3u5           CDH 4.1.2
        Hadoop           1.1.2     0.20.2 +923.418   2.0.0alpha +541
        Pig              0.10.1    0.8.1 +51.39      0.10.0 +48
        Hive             0.10.0    0.7.1 +42.56      0.9.0 +148
        HCatalog         0.5.0     n/a               n/a
        HBase            0.94.2    0.90.6 +84.73     0.92.1 +154
        Sqoop            1.4.2     1.3.0 +5.88       1.4.1 +51
        Oozie            3.2.0     3.2.0             3.2.0
        Zookeeper        3.4.5     3.3.5 +19.5       3.4.3 +25
        Ambari           1.2.0     n/a               n/a
        Flume            1.3.0     0.9.4 +25.46      1.2.0 +119
        Mahout           0.7.0     0.5 +9.7          0.7 +4
        Source: http://files.cloudera.com/pdf/datasheet/cdh4.1_spec_sheet.pdf
    7. A Brief History of Apache Hadoop
        [Timeline, 2004 to 2013; milestones: Apache Project Established; Yahoo! begins to Operate at scale; Hortonworks Data Platform; Enterprise Hadoop]
        • 2005: Yahoo! creates team under E14 to work on Hadoop (Focus on INNOVATION)
        • 2008: Yahoo team extends focus to operations to support multiple projects & growing clusters (Focus on OPERATIONS)
        • 2011: Hortonworks created to focus on “Enterprise Hadoop”. Starts with 24 key Hadoop engineers from Yahoo (STABILITY)
    8. HDP: Enterprise Hadoop Distribution
        [Diagram: Hortonworks Data Platform (HDP) stack. Operational Services: Manage & Operate at Scale. Data Services: Store, Process and Access Data. Hadoop Core: Distributed Storage & Processing. Platform Services: Enterprise Readiness]
        Hortonworks Data Platform (HDP): Enterprise Hadoop
        • The ONLY 100% open source and complete distribution
        • Enterprise grade, proven and tested at scale
        • Ecosystem endorsed to ensure interoperability
    9. Next-Generation Data Architecture
        [Diagram layers]
        • APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
        • DEV & DATA TOOLS: Build & Test
        • OPERATIONAL TOOLS: Manage & Monitor
        • DATA SYSTEMS: Hortonworks Data Platform alongside Traditional Repos (RDBMS, EDW, MPP)
        • DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP) and New Sources (web logs, email, sensor data, social media); further labels on the slide: OLTP, POS Systems, Mobile Data
    10. HDP 1.2: Operational Services Improvements
        [Diagram: HDP stack with Ambari and Oozie highlighted under Operational Services; Platform Services: High Availability, Disaster Recovery, Snapshots, Security, etc.]
        Apache Ambari 1.2: Hortonworks open source approach continues to accelerate enterprise adoption of Hadoop
            – Open Source Approach: The only 100% open source Apache Hadoop cluster management tool
            – Baseline Features: Delivers all necessary tools/functions to provision, manage and monitor an Apache Hadoop cluster
            – Innovation: Provides ability to zoom into cluster usage and performance metrics for jobs and tasks to identify root cause of bottlenecks or operations issues
            – Interoperable: Includes APIs for integrating with Microsoft System Center, Teradata Viewpoint, and other systems
        Also Upgraded Oozie & Zookeeper
    11. HDP 1.2: New Ambari Features
        • Job Diagnostics: Visualize and troubleshoot Hadoop job execution and performance
        • Cluster History: View historical job execution & performance
        • REST interface: Provides external access to Ambari for existing tools. Facilitates integration with Microsoft System Center and Teradata Viewpoint
        • Instant Insight: View health of Core Hadoop (HDFS, MapReduce) and related projects
        • Cluster Navigation: “Quick link” buttons jump into namenode web UI for a server
        [Screenshot: Apache Ambari Dashboard]
    12. Demo
    13. HDP 1.2: Platform Service Improvements
        [Diagram: HDP stack with Platform Services highlighted: High Availability, Disaster Recovery, Snapshots, Security, etc.]
        Security: Extend platform services for security, a KEY requirement for enterprise adoption of Hadoop
            – Enhanced security architecture & pluggable authentication model controls access to Hive tables and metastore
            – Aligns and improves Hive & HCatalog authentication models
        High Availability: Full stack HA on Hadoop 1.0
            – Extended HA to Hive & HCatalog Metastore
    14. HDP 1.2: Data Services Improvements
        [Diagram: HDP stack with Data Services highlighted: Flume, Pig, Hive, Mahout, HBase, Sqoop, HCatalog]
        Data Services Updates
            – Upgraded Pig and Flume
            – Added Mahout (0.7.0) to distribution
        Hive, HCatalog & HBase: Continue to innovate & improve the data services with open source contributions to HCatalog, Hive and HBase
            – Concurrency improvements for Hive and consistent security for Hive & HCatalog
            – Performance and operational enhancements for HBase
            – Improved Java developer productivity via certified Cascading framework
    17. Apache Community Leadership
        [Diagram: Apache projects (Hadoop, Pig, Hive, HCatalog, HBase, Ambari, other Apache projects) around a Design & Develop, Test & Patch, Release cycle]
        Apache Software Foundation Guiding Principles
            • Release early & often
            • Transparency, respect, meritocracy
        Key Roles held by Hortonworkers
            • PMC Members
                – Managing community projects
                – Mentoring new incubator projects
                – About 20 Hortonworkers managing community
            • Committers
                – Authoring, reviewing & editing code
                – About 50 Hortonworkers across projects
            • Release Managers
                – Testing & releasing projects
                – Hortonworkers across key projects like Hadoop, Hive, Pig, HCatalog, Ambari, HBase
        “We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.” - Jeff Kelly, Wikibon
    18. True Enterprise Class Open Source
        • 100% Open Source. No Holdbacks.
            – Only true implementation of OSS Apache Hadoop
            – Preferred by the software vendors that you rely on
        • Flexible Deployment
            – No License Fee for usage
        • Community Open Source Mitigates Lock-In
            – Proprietary Open Source = Lock-In
            – Open communities always trump “open source”
    19. THANK YOU!!
        • Download Hortonworks Sandbox: www.hortonworks.com/sandbox
        • Download Hortonworks Data Platform: www.hortonworks.com/download
        • Register for Enterprise Hadoop Series: www.hortonworks.com/webinars
        • Follow us: @hortonworks, @jaymce, @jsposetti
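
The slides mention that Ambari exposes a REST interface so that existing tools can reach it from outside. The following is a minimal sketch of what such a client might look like, not something taken from the deck. It assumes the commonly documented `/api/v1/clusters` resource, HTTP Basic authentication, and port 8080; the host name, credentials, helper names, and the sample response are all hypothetical, and the response shape should be checked against the Ambari version in use.

```python
import base64
import json
import urllib.request


def build_clusters_request(host: str, user: str, password: str,
                           port: int = 8080) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the cluster list.

    Assumptions: the resource path /api/v1/clusters and Basic auth;
    host, port and credentials are placeholders for illustration.
    """
    url = f"http://{host}:{port}/api/v1/clusters"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def cluster_names(response_body: str) -> list[str]:
    """Extract cluster names from a JSON body shaped like
    {"items": [{"Clusters": {"cluster_name": ...}}]} (assumed shape)."""
    payload = json.loads(response_body)
    return [item["Clusters"]["cluster_name"] for item in payload.get("items", [])]


if __name__ == "__main__":
    request = build_clusters_request("ambari.example.com", "admin", "admin")
    print(request.full_url)
    # Hypothetical response body, used here instead of a live server.
    sample = '{"items": [{"Clusters": {"cluster_name": "demo"}}]}'
    print(cluster_names(sample))
```

To query a real server, the prepared request would be passed to `urllib.request.urlopen` and the decoded body handed to `cluster_names`; the sketch keeps the two steps separate so the parsing can be exercised without network access.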