Mrinal Devadas, Hortonworks: Making Sense Of Big Data


Published in: Technology
  • In that capacity, Arun allows Hortonworks to be instrumental in working with the community to drive the roadmap for Core Hadoop, where the focus today is on things like YARN, MapReduce2, HDFS2 and more. For Core Hadoop, in absolute terms, Hortonworkers have contributed more than twice as many lines of code as the next closest contributor, and even more if you include Yahoo, our development partner. Taking such a prominent role also enables us to ensure that our distribution integrates deeply with the ecosystem: both through choice of deployment platforms such as Windows, Azure and more, and through deeply engineered solutions with key partners such as Teradata. And consistent with our approach, all of this is done in 100% open source.

    1. © Hortonworks Inc. 2013. Hortonworks: Community Driven Enterprise Apache Hadoop. Mrinal Devadas, Systems Architect, mdevadas@hortonworks.com
    2. Hortonworks
       • Who is Hortonworks
       • Our Approach
       • Patterns of Use
    3. A Brief History of Apache Hadoop
       • Focus on INNOVATION. 2005: Yahoo! creates team under E14 to work on Hadoop
       • Focus on OPERATIONS. 2008: Yahoo team extends focus to operations to support multiple projects & growing clusters
       • STABILITY. 2011: Hortonworks created to focus on “Enterprise Hadoop”. Starts with 24 key Hadoop engineers from Yahoo
       • [Timeline, 2004 to 2013: Apache Project Established; Yahoo! begins to operate at scale; Enterprise Hadoop; Hortonworks Data Platform]
    4. Hortonworks Snapshot
       • We distribute the only 100% Open Source Enterprise Hadoop Distribution: Hortonworks Data Platform
       • We engineer, test & certify HDP for enterprise usage
       • We employ the core architects, builders and operators of Apache Hadoop
       • We drive innovation within Apache Software Foundation projects
       • We are uniquely positioned to deliver the highest quality of Hadoop support
       • We enable the ecosystem to work better with Hadoop
       Develop, Distribute, Support: we develop, distribute and support the ONLY 100% open source Enterprise Hadoop distribution
       Endorsed by Strategic Partners
       Headquarters: Palo Alto, CA
       Employees: 200+ and growing
       Investors: Benchmark, Index, Yahoo
    5. Hortonworks
       • Who is Hortonworks
       • Our approach
         – Leading Open Source Hadoop innovation
         – Addressing “Enterprise Hadoop” Requirements
         – Enabling Interoperability of the Ecosystem
         – Ensuring No Lock-In: 100% Open Source
       • Patterns of Use
    6. Apache Community Leadership
       Apache Software Foundation Guiding Principles
       • Release early & often
       • Transparency, respect, meritocracy
       Key Roles held by Hortonworkers
       • PMC Members
         – Managing community projects
         – Mentoring new incubator projects
         – Over 20 Hortonworkers managing community
       • Committers
         – Authoring, reviewing & editing code
         – Over 50 Hortonworkers across projects
       • Release Managers
         – Testing & releasing projects
         – Hortonworkers across key projects like Hadoop, Hive, Pig, HCatalog, Ambari, HBase
       [Diagram: Apache Hadoop, Apache Pig, Apache HCatalog, Apache HBase, Apache Hive, Apache Ambari and other Apache projects cycle through Design & Develop, Test & Patch, Release]
       “We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.” - Jeff Kelly: Wikibon
    7. Leadership that Starts at the Core
       • Driving next generation Hadoop
         – YARN, MapReduce2, HDFS2, High Availability, Disaster Recovery
       • 420k+ lines authored since 2006
         – More than twice nearest contributor
       • Deeply integrating w/ ecosystem
         – Enabling new deployment platforms (ex. Windows & Azure, Linux & VMware HA)
         – Creating deeply engineered solutions (ex. Teradata big data appliance)
       • All Apache, NO holdbacks
         – 100% of code contributed to Apache
    8. Driving Enterprise Hadoop Innovation
       [Chart: Hortonworks committers vs. Cloudera committers, and Lines Of Code By Company (Hortonworks, Yahoo!, Cloudera, Other; 0% to 100%) for Ambari, HBase, Hive/HCatalog, Pig and Hadoop Core. Source: Apache Software Foundation]
    9. Hortonworks Process for Enterprise Hadoop
       • Upstream Community Projects (Apache Hadoop, Apache Pig, Apache Hive, Apache HCatalog, Apache HBase, Apache Ambari, other Apache projects): Design & Develop, Test & Patch, Release
       • Downstream Enterprise Product (Hortonworks Data Platform): Design & Develop, Integrate & Test, Package & Certify, Distribute
       • Stable project releases flow downstream; fixed issues flow upstream
       • Virtuous cycle when development & fixed issues done upstream & stable project releases flow downstream
       • No Lock-in: Integrated, tested & certified distribution lowers risk by ensuring close alignment with Apache projects
       “We have noticed more activity over the last year from Hortonworks’ engineers on building out Apache Hadoop’s more innovative features. These include YARN, Ambari and HCatalog.” - Jeff Kelly: Wikibon
    10. Hortonworks
        • Who is Hortonworks
        • Our approach
          – Leading Open Source Hadoop Innovation
          – Addressing “Enterprise Hadoop” Requirements
          – Enabling Interoperability of the Ecosystem
          – Ensuring NO LOCK-IN: 100% Open Source
        • Patterns of use
    11. Enhancing the Core of Apache Hadoop
        Deliver high-scale storage & processing with enterprise-ready platform services
        Unique Focus Areas:
        • Bigger, faster, more flexible: continued focus on speed & scale and enabling near-real-time apps
        • Tested & certified at scale: run ~1300 system tests on large Yahoo clusters for every release
        • Enterprise-ready services: high availability, disaster recovery, snapshots, security, …
        Hortonworkers are the architects, operators, and builders of core Hadoop
        [Diagram: HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness)]
    12. Data Services for Full Data Lifecycle
        Provide data services to store, process & access data in many ways
        Unique Focus Areas:
        • Apache HCatalog: metadata services for consistent table access to Hadoop data
        • Apache Hive: explore & process Hadoop data via SQL & ODBC-compliant BI tools
        Hortonworks enables Hadoop data to be accessed via existing tools & systems
        [Diagram: DATA SERVICES (Store, Process and Access Data); HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness)]
    13. Operational Services for Ease of Use
        Include complete operational services for productive operations & management
        Unique Focus Area:
        • Apache Ambari: provision, manage & monitor a cluster; complete REST APIs to integrate with existing operational tools; job & task visualizer to diagnose issues
        Only Hortonworks provides a complete open source Hadoop management tool
        [Diagram: OPERATIONAL SERVICES (Manage & Operate at Scale); DATA SERVICES (Store, Process and Access Data); HADOOP CORE (Distributed Storage & Processing); PLATFORM SERVICES (Enterprise Readiness)]
    14. Deployable Across a Range of Options
        Only Hortonworks allows you to deploy seamlessly across any deployment option
        • Linux & Windows
        • Azure, Rackspace & other clouds
        • Virtual platforms
        • Big data appliances
        [Diagram: HORTONWORKS DATA PLATFORM (HDP) = OPERATIONAL SERVICES (Manage & Operate at Scale), DATA SERVICES (Store, Process and Access Data), HADOOP CORE (Distributed Storage & Processing), PLATFORM SERVICES (Enterprise Readiness), deployed on OS, Cloud, VM, Appliance]
    15. HDP: Enterprise Hadoop Distribution
        Hortonworks Data Platform (HDP), Enterprise Hadoop
        • The ONLY 100% open source and complete distribution
        • Enterprise grade, proven and tested at scale
        • Ecosystem endorsed to ensure interoperability
        [Diagram: HORTONWORKS DATA PLATFORM (HDP) = OPERATIONAL SERVICES (Manage & Operate at Scale), DATA SERVICES (Store, Process and Access Data), HADOOP CORE (Distributed Storage & Processing), PLATFORM SERVICES (Enterprise Readiness), deployed on OS, Cloud, VM, Appliance]
    16. Hortonworks
        • Who is Hortonworks
        • Our approach
          – Leading Open Source Hadoop Innovation
          – Addressing “Enterprise Hadoop” Requirements
          – Enabling Interoperability of the Ecosystem
          – Ensuring No Lock-in: 100% Open Source
        • Patterns of use
    18. Next-Generation Data Architecture
        [Diagram layers]
        • APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications
        • DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP) and ENTERPRISE HADOOP PLATFORM
        • DATA SOURCES: OLTP, POS SYSTEMS; Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensors, social media)
        • OPERATIONAL TOOLS: Manage & Monitor
        • DEV & DATA TOOLS: Build & Test
    19. Interoperating With Your Tools
        [Diagram layers]
        • APPLICATIONS: Microsoft Applications
        • DATA SYSTEMS: TRADITIONAL REPOS and HORTONWORKS DATA PLATFORM
        • DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP); New Sources (web logs, email, sensors, social media)
        • DEV & DATA TOOLS
        • OPERATIONAL TOOLS: Viewpoint
    20. Hortonworks
        • Who is Hortonworks
        • Our approach
          – Leading Open Source Hadoop Innovation
          – Addressing “Enterprise Hadoop” Requirements
          – Enabling Interoperability of the Ecosystem
          – Ensuring No Lock-In: 100% Open Source
        • Patterns of use
    21. True Enterprise Class Open Source
        • Community-driven Approach Mitigates Lock-In
          – Identify & introduce enterprise requirements into public domain
          – Work with community to advance & incubate open source projects
          – Apply Enterprise Rigor for the most stable and reliable distribution
        • 100% Open Source. No Holdbacks.
          – Only true implementation of OSS Apache Hadoop
          – Preferred by the software vendors that you rely on
          – Proprietary Open Source = Lock-In
          – Open communities always trump “open source”
        • Flexible Deployment
          – No License Fee for usage
    22. Hortonworks
        • Who is Hortonworks
        • Our approach
        • Patterns of use
    23. Hadoop Common Patterns of Use
        • Big Data: Transactions, Interactions, Observations
        • Business Cases
        • HORTONWORKS DATA PLATFORM: Refine, Explore, Enrich
        • Batch, Interactive, Online
        • “Right-time” Access to Data
    24. Operational Data Refinery
        Transform & refine ALL sources of data. Also known as Data Reservoir or Catch Basin
        Steps (pattern highlighted: Refine):
        1 Capture
        2 Process
        3 Distribute & Retain
        [Diagram: DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP), New Sources (web logs, email, sensor data, social media); DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP), HORTONWORKS DATA PLATFORM (Refine, Explore, Enrich); APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications]
    25. Big Data Exploration & Visualization
        Leverage “data lake” to perform iterative investigation for value
        Steps:
        1 Capture
        2 Process
        3 Explore & Visualize
        [Diagram: DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP), New Sources (web logs, email, sensor data, social media); DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP), HORTONWORKS DATA PLATFORM (Refine, Explore, Enrich); APPLICATIONS: Business Analytics, Custom Applications, Enterprise Applications]
    26. Application Enrichment
        Create intelligent applications: collect data, create analytical models and deliver to online apps
        Steps:
        1 Capture
        2 Process & Compute
        3 Deliver Model
        [Diagram: DATA SOURCES: Traditional Sources (RDBMS, OLTP, OLAP), New Sources (web logs, email, sensor data, social media); DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP), NOSQL, HORTONWORKS DATA PLATFORM (Refine, Explore, Enrich); APPLICATIONS: Custom Applications, Enterprise Applications]
    27. Flexible Support Subscription Programs
        Leverage Hortonworks Expertise: Subscription and Support delivered and backed by Hadoop experts; subscriptions based on nodes or storage
        • Developer Support: “How to” guidance for developers and archs
          – 12 x 5, Web only
          – All Sev: 1 business day
          – Application Design Advice, Code Review
          – 1 seat
        • Essential Support*: Operations support for small research clusters
          – 12 x 5, Web only
          – All Sev: 1 business day
          – Cluster Design, Install, Maintain, Performance
          – 3 Contacts
          – Patches & Updates
        • Standard Support: Operations support for dev & test clusters
          – 12 x 5, Web only
          – All Sev: 1 business day
          – Cluster Design, Install, Maintain, Performance
          – 3 Contacts
          – Patches & Updates
        • Enterprise Support: Operations support for critical clusters
          – 24 x 7, Phone & Web
          – Sev 1: 1 Hour; Sev 2: 4 Bus Hour
          – Cluster Design, Install, Maintain, Performance
          – 5 Contacts
          – Patches & Updates
        * Limited in size and no expansion
        Additional Options
    28. Hortonworks: Best In Class Hadoop Support
        • Experienced enterprise support team
          – Experience supporting enterprise clients in production
          – Core engineers have real operational experience: built and supported 44+K nodes in production
          – Extensive experience in commercial big data offerings including HDP, MapR, Karmasphere
        • Global 24x7 operation: support based in Sunnyvale, UK & India
        • Stringent case management processes ensure high quality customer service & responsiveness
    29. Transferring Our Hadoop Expertise to You
        The expert source for Apache Hadoop training & certification
        • World class training programs designed to help you learn fast
          – Role-based hands-on classes with 50% lab time
        • Expert consulting services
          – Programs designed to transfer knowledge
        • Industry leading Hadoop Sandbox program
          – Fastest way to learn Apache Hadoop
          – Multi-level tutorials for wide applicability
          – Customizable and updateable
    30. Introducing Hortonworks Data Platform for Windows
        Enterprise Apache Hadoop
        March 2013
    31. Why Apache Hadoop on Windows?
        • According to IDC, Windows Server held 73% market share in 2012
          – Hadoop was traditionally built for Linux servers, so there are a large number of underserved organizations
        • According to a 2012 Barclays CIO study, big data outranks virtualization as the #1 trend driving spending initiatives
          – Unstructured data growth exceeds 80% year/year in most enterprises
        • Apache Hadoop is the de facto big data platform for processing massive amounts of unstructured data
          – Complementary to existing Microsoft technologies
          – There is a huge untapped community of Windows developers and ecosystem partners
        • A strong Microsoft-Hortonworks partnership and 18 months of development make this a natural next step
    32. Hortonworks Data Platform for Windows
        • Enterprise-grade Apache Hadoop on Windows
          – Enables same experience for Hadoop on Windows & Linux
        • More partners, more developers for Hadoop
          – Makes native Apache Hadoop available to Windows ecosystem
          – More options for Windows focused organizations
        • Hortonworks focus: Enterprise Apache Hadoop for all platforms
          – Trusted reliable production-ready distribution for on-premise Hadoop on Windows deployments
        • Built with joint investment and contributions from Microsoft
          – Deep engineering relationship ensures tight integration and maximum performance
        HDP is the first and only distribution available on Windows & Linux
    33. Seamless Interoperability with Your Microsoft Tools
        • Integrated with Microsoft tools for native big data analysis
          – Bi-directional connectors for SQL Server and SQL Azure through Sqoop
          – Excel ODBC integration through Hive
        • Addressing demand for Hadoop on Windows
          – Ideal for Windows customers with Hadoop operational experience
        • Enables most common Hadoop workloads in the Enterprise
          – Data refinement and ETL offload for high-volume data landing
          – Data exploration for discovery of new business opportunities
          – Data enrichment for fine-tuned delivery and recommendation engines
        [Diagram: APPLICATIONS: Microsoft Applications; DATA SYSTEMS: HORTONWORKS DATA PLATFORM For Windows; DATA SOURCES: MOBILE DATA, OLTP, POS SYSTEMS, Traditional Sources (RDBMS, OLTP, OLAP), New Sources (web logs, email, sensor data, social media)]
    34. Inside HDP for Windows
        Hortonworks Data Platform (HDP) For Windows
        • 100% Open Source Enterprise Hadoop
        • Component and version compatible with HDInsight
        • Availability
          – Beta release available now
        [Diagram: HADOOP CORE (Distributed Storage & Processing): HDFS, WEBHDFS, MAP REDUCE; DATA SERVICES (Store, Process and Access Data): HCATALOG, HIVE, PIG, SQOOP; OPERATIONAL SERVICES (Manage & Operate at Scale): OOZIE; PLATFORM SERVICES]
    35. Maximize Your Hadoop Deployment Choice
        • Use HDP for Windows for on-premises deployment on Windows Server
          – Ideal for Windows users with Hadoop experience
          – Perfect next step for those who are ready to move from POC to production
        • Use HDInsight for Microsoft tooling and Management and Provisioning
          – HDInsight Service that offers full benefit of Windows Azure (e.g. elasticity & low cost): available in Preview today
          – HDInsight Server for full integration of Hadoop with Microsoft tools on premises: Developer Preview available today
        • Full interoperability and deployment choice across platforms
          – Implement big data applications that run on-premise & cloud
          – By leveraging open source HDP, enables seamless interoperability across environments: Linux, Windows, Windows Azure
    36. Summary
        • Leading the Innovation in Core Hadoop
        • Addressing the requirements for Enterprise usage
        • Enabling interoperability of the ecosystem
        • No lock-in. 100% Open Source.
        • Best in industry support with flexible pricing model
        • Find out more
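
Slide 13 says Apache Ambari offers REST APIs for integrating with existing operational tools. As a rough illustration only, the Python sketch below builds (without sending) a GET request that would list the services of one cluster. The host name, the cluster name `demo` and the `admin` credentials are placeholders invented for this example. The `/api/v1/clusters/...` path, port 8080 and the `X-Requested-By` header follow Ambari's commonly documented conventions rather than anything stated in the slides, so check them against the Ambari version in use.

```python
import base64
import urllib.request


def ambari_request(base_url, path, user, password):
    """Build, but do not send, a GET request for an Ambari REST resource."""
    # HTTP Basic authentication: base64("user:password").
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    url = f"{base_url.rstrip('/')}/api/v1/{path.lstrip('/')}"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Basic {token}",
            # Ambari expects this header on modifying calls; harmless on GET.
            "X-Requested-By": "ambari",
        },
        method="GET",
    )


# Hypothetical host, cluster name and credentials, for illustration only.
req = ambari_request(
    "http://ambari.example.com:8080", "clusters/demo/services", "admin", "admin"
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` against a reachable Ambari server would return a JSON document describing the cluster's services; the call is left out here so the sketch runs without a cluster.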