Hortonworks
Eric Baldeschwieler, Co-Founder and CEO
September 2011
Overview for Cowen Big Data Day 2011
© Hortonworks Inc. 2011

Slide 2: Agenda
  • Hortonworks
  • Apache Hadoop
  • Use cases
  • Hadoop in the Enterprise
  • Market
  • Strategy

Slide 3: About Hortonworks – Basics
  • Founded – July 1st, 2011
      • 22 architects & committers from Yahoo!
  • Mission – Architect the future of Big Data
      • Revolutionize and commoditize the storage and processing of Big Data via open source
  • Vision – Half of the world's data will be stored in Hadoop within five years

Slide 4: About Hortonworks – Game Plan
  • Support the growth of a huge Apache Hadoop ecosystem
  • Invest in ease of use, management, and other enterprise features
  • Define APIs for ISVs, OEMs and others to integrate with Apache Hadoop
  • Continue to invest in advancing the Hadoop core, remain the experts
  • Contribute all of our work to Apache
  • Profit by providing training & support to the Hadoop community

Slide 5: Credentials
  • Technical: key architects and committers from Yahoo! Hadoop engineering team
      • Delivered every major Apache Hadoop release since 0.1
      • Highest concentration of Apache Hadoop committers
      • Driving innovation across entire Apache Hadoop stack
      • Experience managing world’s largest deployment
      • Access to Yahoo!’s 1,000+ users and 42k+ nodes for testing, QA, etc.
  • Business operations: team of highly successful open source veterans
      • Led by Rob Bearden, former COO of SpringSource & JBoss
  • Investors: backed by Benchmark Capital and Yahoo!

Slide 6: What is Apache Hadoop?
  • Set of open source projects
  • Owned by Apache Software Foundation
  • Transforms commodity hardware into a service that:
      • Stores petabytes of data reliably (HDFS)
      • Allows huge distributed computations (MapReduce)
  • Key attributes:
      • Redundant and reliable
          • Doesn’t stop or lose data even if hardware fails
      • Easy to program
      • Extremely powerful
          • Allows the development of big data algorithms & tools
      • Batch processing centric
      • Runs on commodity hardware
          • Computers & network

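The MapReduce model named on this slide can be illustrated with a minimal, single-process sketch. It assumes a word-count job and uses plain Python rather than the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical and chosen only for illustration. On a real cluster the framework distributes these phases across machines and reads its input from HDFS.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for each word in one input line."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["big data", "big clusters"]))
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, the work can in principle be split across many commodity machines, which is the property the slide's "huge distributed computations" bullet relies on.
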
Slide 7: Typical Hadoop Applications
  • data analytics
  • advertising optimization
  • machine learning
  • search ranking
  • Mail anti-spam
  • advertising data systems
  • audience, ad and search pipelines
  • ad selection
  • Website personalization
  • Content Optimization
  • ad inventory prediction
  • user interest prediction

Slide 8: Who Builds Hadoop?
  • Chart: lines of code contributed since Hadoop inception

Slide 9: Who Builds Hadoop?
  • Chart: lines of code contributed in 2011

Slide 10: Apache Hadoop – A Brief History
  • […], early adopters (2006 – present)
      • Scale and productize Hadoop
  • Other Internet Companies (2008 – present)
      • Add tools / frameworks, enhance Hadoop
  • Service Providers (2010 – present)
      • Provide training, support, hosting
  • Wide Enterprise Adoption (Nascent / 2011)
      • Funds further development, enhancements

Slide 11: HADOOP @ YAHOO!
  • 40K+ Servers
  • 170 PB Storage
  • 5M+ Monthly Jobs
  • 1000+ Active users
© Yahoo 2011

Slide 12: CASE STUDY – YAHOO! HOMEPAGE
  • Personalized for each visitor
  • Result: twice the engagement
  • Personalized modules: News Interests, Top Searches, Recommended links
  • Reported click gains:
      • +43% clicks vs. editor selected
      • +79% clicks vs. randomly selected
      • +160% clicks vs. one size fits all

Slide 13: CASE STUDY – YAHOO! HOMEPAGE
  • SCIENCE HADOOP CLUSTER (Weekly Categorization models)
      • Input: USER BEHAVIOR
      • Machine learning to build ever better categorization models
      • Output: CATEGORIZATION MODELS (weekly)
  • PRODUCTION HADOOP CLUSTER (Five Minute Production)
      • Input: USER BEHAVIOR
      • Identify user interests using Categorization models
      • Output: SERVING MAPS (every 5 minutes), Users - Interests
  • SERVING SYSTEMS
      • Build customized home pages with latest data (thousands / second)
  • ENGAGED USERS

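The two cadences on this slide (weekly model building, five-minute serving maps) can be sketched as a toy pipeline. Everything below is an illustrative assumption, not Yahoo!'s implementation: the keyword-lookup "model", the function names, and the sample data are all hypothetical, and a real deployment would run these steps as Hadoop jobs over far larger inputs.

```python
from collections import Counter


def train_categorization_model(labeled_pages):
    """Weekly 'science' step (assumed): learn a keyword -> category lookup."""
    model = {}
    for text, category in labeled_pages:
        for word in text.lower().split():
            model.setdefault(word, category)  # first label seen wins
    return model


def build_serving_map(model, click_log):
    """Five-minute 'production' step (assumed): map each user to a top interest."""
    interests = {}
    for user, text in click_log:
        counts = interests.setdefault(user, Counter())
        for word in text.lower().split():
            if word in model:
                counts[model[word]] += 1
    # Users with no recognized interests are left out of the map.
    return {user: counts.most_common(1)[0][0]
            for user, counts in interests.items() if counts}


def serve_homepage(serving_map, user, default="top stories"):
    """Serving step (assumed): look up the user's interest, else fall back."""
    return serving_map.get(user, default)


if __name__ == "__main__":
    model = train_categorization_model([
        ("football scores", "sports"),
        ("election results", "politics"),
    ])
    serving_map = build_serving_map(model, [
        ("u1", "football final scores"),
        ("u1", "election day"),
        ("u2", "weather"),
    ])
    print(serve_homepage(serving_map, "u1"))
    print(serve_homepage(serving_map, "u2"))
```

The design point the sketch tries to capture is the split in cadence: the expensive model is rebuilt rarely, while the cheap lookup table that the serving systems read is refreshed often.
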
Slide 14: CASE STUDY – YAHOO! MAIL
Enabling quick response in the spam arms race
  • Diagram labels: SCIENCE, PRODUCTION
  • 450M mail boxes
  • 5B+ deliveries/day
  • Antispam models retrained every few hours on Hadoop
  • “40% less spam than Hotmail and 55% less spam than Gmail”

Slide 15: Hadoop in the Enterprise

Slide 16: Big Data Platforms
  • Chart: Cost per TB, Adoption
  • Size of bubble = cost effectiveness of solution
  • Source:

Slide 17: Traditional Enterprise Architecture – Data Silos + ETL
  • Traditional Data Warehouses, BI & Analytics: EDW, Data Marts, BI / Analytics
  • Serving Applications: Web Serving, NoSQL, RDBMS, …
  • Unstructured Systems: Serving Logs, Social Media, Sensor Data, Text Systems, …
  • Connected by: Traditional ETL & Message buses

Slide 18: Hadoop Enterprise Architecture – Connecting All of Your Big Data
  • Traditional Data Warehouses, BI & Analytics: EDW, Data Marts, BI / Analytics
  • Serving Applications: Web Serving, NoSQL, RDBMS, …
  • Unstructured Systems: Serving Logs, Social Media, Sensor Data, Text Systems, …
  • Connected by: Traditional ETL & Message buses
  • Apache Hadoop: EsTsL (s = Store), Custom Analytics

Slide 19: Hadoop Enterprise Architecture – Connecting All of Your Big Data
  • Traditional Data Warehouses, BI & Analytics: EDW, Data Marts, BI / Analytics
  • Serving Applications: Web Serving, NoSQL, RDBMS, …
  • Unstructured Systems: Serving Logs, Social Media, Sensor Data, Text Systems, …
  • Connected by: Traditional ETL & Message buses
  • Apache Hadoop: EsTsL (s = Store), Custom Analytics
  • Callouts:
      • Gartner predicts 800% data growth over next 5 years
      • 80-90% of data produced today is unstructured

Slide 20: The Hadoop Market

Slide 21: Market Drivers for Apache Hadoop
  • Business drivers
      • Identified high value projects that require use of more data
      • Belief that there is great ROI in mastering big data
  • Financial drivers
      • Growing cost of data systems as proportion of IT spend
      • Cost advantage of commodity hardware + open source
          • Enables departmental-level big data strategies
  • Technical drivers
      • Existing solutions failing under growing requirements
          • 3Vs - Volume, velocity, variety
      • Proliferation of unstructured data
  • Significant opportunity for Hadoop in enterprise data architectures

Slide 22: Market Opportunity for Hadoop
  • Current
      • Apache Hadoop can become de facto platform for managing unstructured data in the enterprise
      • Enable new breed of applications to be built on top of Apache Hadoop
  • Future
      • Hadoop becomes the next generation enterprise data architecture

Slide 23: Market Dynamics
  • Technology & knowledge gaps are preventing Apache Hadoop from becoming an enterprise standard
      • Difficult to install and deploy Hadoop projects
      • Lack of technical content to assist
      • Demand for knowledgeable developers far exceeds supply
  • Virtually every F500 company is constructing a Hadoop strategy
      • But most are still in POC/experimentation phase with Hadoop
  • Top ISV/OEMs working to create Hadoop strategies
      • Driven by customer demand
  • Community is becoming increasingly confused by all of the noise
      • Multiple distributions, many vendor announcements
      • Fear of market fragmentation

Slide 24: Conclusion
  • There is not a Hadoop market to “win” today
      • Most organizations haven’t moved to full-scale production
      • Lack of mass adoption limiting short-term monetization opportunities
      • Need to drive Apache Hadoop as a unifying standard
  • In order to succeed, we need to enable the market
      • Continue investment to overcome technology gaps
      • Enable a vibrant partner ecosystem
      • Expand availability of content and services to address knowledge gaps
  • How will Hortonworks do that?

Slide 25: Hortonworks Strategy

Hortonworks for Financial Analysts Presentation


Editor's Notes

  • #5 Our commitment to Apache has already changed the market! Ultimately, contributing the code that matters and making it work is the currency in open source
  • #10 Our commitment is to continue growing our contribution
  • #11 For more information on the history of Hadoop, see: http://developer.yahoo.com/blogs/hadoop/posts/2011/01/the-backstory-of-yahoo-and-hadoop/