Hortonworks for Financial Analysts Presentation



Hortonworks presentation from Cowen Big Data Day for financial industry analysts

Published in: Technology


  • Our commitment to Apache has already changed the market! Ultimately, contributing the code that matters and making it work is the currency in open source.
  • Our commitment is to continue growing our contribution
  • For more information on the history of Hadoop, see: http://developer.yahoo.com/blogs/hadoop/posts/2011/01/the-backstory-of-yahoo-and-hadoop/

    1. 1. Hortonworks<br />Eric Baldeschwieler, Co-Founder and CEO<br />September 2011<br />Overview for Cowen Big Data Day 2011<br />© Hortonworks Inc. 2011<br />
    2. 2. Agenda<br />Hortonworks<br />Apache Hadoop<br />Use cases<br />Hadoop in the Enterprise<br />Market<br />Strategy<br />2<br />© Hortonworks Inc. 2011<br />
    3. 3. About Hortonworks – Basics<br />Founded – July 1st, 2011<br /> 22 architects & committers from Yahoo!<br />Mission – Architect the future of Big Data<br /> Revolutionize and commoditize the storage and processing of Big Data via open source<br />Vision – Half of the world’s data will be stored in Hadoop within five years<br />3<br />© Hortonworks Inc. 2011<br />
    4. 4. About Hortonworks – Game Plan<br />Support the growth of a huge Apache Hadoop ecosystem<br />Invest in ease of use, management, and other enterprise features<br />Define APIs for ISVs, OEMs and others to integrate with Apache Hadoop<br />Continue to invest in advancing the Hadoop core, remain the experts<br />Contribute all of our work to Apache<br />Profit by providing training & support to the Hadoop community<br />4<br />© Hortonworks Inc. 2011<br />
    5. 5. Credentials<br />Technical: key architects and committers from Yahoo! Hadoop engineering team<br />Delivered every major Apache Hadoop release since 0.1<br />Highest concentration of Apache Hadoop committers<br />Driving innovation across entire Apache Hadoop stack<br />Experience managing world’s largest deployment<br />Access to Yahoo!’s 1,000+ users and 42k+ nodes for testing, QA, etc.<br />Business operations: team of highly successful open source veterans<br />Led by Rob Bearden, former COO of SpringSource & JBoss<br />Investors: backed by Benchmark Capital and Yahoo!<br />5<br />© Hortonworks Inc. 2011<br />
    6. 6. What is Apache Hadoop?<br />Set of open source projects <br />Owned by Apache Software Foundation<br />Transforms commodity hardware into a service that:<br />Stores petabytes of data reliably (HDFS)<br />Allows huge distributed computations (MapReduce)<br />Key attributes:<br />Redundant and reliable<br />Doesn’t stop or lose data even if hardware fails<br />Easy to program<br />Extremely powerful<br />Allows the development of big data algorithms & tools<br />Batch processing centric <br />Runs on commodity hardware<br />Computers & network<br />6<br />© Hortonworks Inc. 2011<br />
    7. 7. Typical Hadoop Applications<br />7<br />data analytics<br />advertising optimization<br />machine learning search ranking<br />Mail anti-spam<br />advertising data systems<br />audience, ad and search pipelines<br />ad selection<br />Website personalization<br />Content Optimization<br />ad inventory prediction<br />user interest prediction<br />© Hortonworks Inc. 2011<br />
    8. 8. Who Builds Hadoop?<br />Lines of code contributed since Hadoop inception<br />8<br />© Hortonworks Inc. 2011<br />
    9. 9. Who Builds Hadoop?<br />Lines of code contributed in 2011<br />9<br />© Hortonworks Inc. 2011<br />
    10. 10. Apache Hadoop<br />A Brief History<br />Early adopters<br />Scale and productize Hadoop<br />2006 – present<br />Other Internet Companies<br />Add tools / frameworks, enhance Hadoop<br />2008 – present<br />Service Providers<br />Provide training, support, hosting<br />2010 – present<br />Wide Enterprise Adoption<br />Funds further development, enhancements<br />Nascent / 2011<br />10<br />© Hortonworks Inc. 2011<br />
    11. 11. HADOOP @ YAHOO!<br />40K+ Servers<br />170 PB Storage<br />5M+ Monthly Jobs<br />1000+ Active users<br />© Yahoo 2011<br />11<br />
    12. 12. CASE STUDY<br />YAHOO! HOMEPAGE<br />Personalized<br />for each visitor<br />Result: <br />twice the engagement<br />News Interests<br />Top Searches<br />Recommended links<br />+43% clicks<br />vs. editor selected<br />+79% clicks<br />vs. randomly selected<br />+160% clicks<br />vs. one size fits all<br />© Yahoo 2011<br />12<br />
    13. 13. CASE STUDY<br />YAHOO! HOMEPAGE<br />SCIENCE<br />HADOOP<br />CLUSTER<br />Serving Maps<br />Users - Interests<br />Five Minute Production<br />Weekly Categorization models<br />»Machine learning to build ever better categorization models<br />CATEGORIZATION<br />MODELS (weekly)<br />USER<br />BEHAVIOR<br />PRODUCTION<br />HADOOP<br />CLUSTER<br />»Identify user interests using Categorization models<br />SERVING<br />MAPS<br />(every 5 minutes)<br />USER<br />BEHAVIOR<br />Build customized home pages with latest data (thousands / second)<br />SERVING SYSTEMS<br />ENGAGED USERS<br />© Yahoo 2011<br />13<br />
    17. 17. CASE STUDY<br />YAHOO! MAIL<br />Enabling quick response in the spam arms race<br />SCIENCE<br />450M mail boxes<br />5B+ deliveries/day<br />Antispam models retrained every few hours on Hadoop<br />PRODUCTION<br />“40% less spam than Hotmail and 55% less spam than Gmail”<br />© Yahoo 2011<br />14<br />
    20. 20. Hadoop in the Enterprise<br />© Hortonworks Inc. 2011<br />15<br />
    21. 21. Big Data Platforms<br />Cost per TB, Adoption<br />Size of bubble = cost effectiveness of solution<br />Source: <br />16<br />© Hortonworks Inc. 2011<br />
    22. 22. Traditional Enterprise Architecture<br />Data Silos + ETL<br />17<br />Traditional Data Warehouses, <br />BI & Analytics<br />Serving Applications<br />Web Serving<br />NoSQL<br />RDBMS<br />…<br />Traditional ETL &<br />Message buses<br />EDW<br />Data Marts<br />BI / Analytics<br />Traditional ETL &<br />Message buses<br />Serving Logs<br />Social Media<br />Sensor Data<br />Text Systems<br />…<br />Unstructured Systems<br />© Hortonworks Inc. 2011<br />
    23. 23. Hadoop Enterprise Architecture<br />Connecting All of Your Big Data<br />18<br />Traditional Data Warehouses, <br />BI & Analytics<br />Serving Applications<br />Web Serving<br />NoSQL<br />RDBMS<br />…<br />Traditional ETL &<br />Message buses<br />EDW<br />Data Marts<br />BI / Analytics<br />Apache Hadoop<br />EsTsL (s = Store) <br />Custom Analytics<br />Traditional ETL &<br />Message buses<br />Serving Logs<br />Social Media<br />Sensor Data<br />Text Systems<br />…<br />Unstructured Systems<br />© Hortonworks Inc. 2011<br />
    24. 24. Hadoop Enterprise Architecture<br />Connecting All of Your Big Data<br />19<br />Traditional Data Warehouses, <br />BI & Analytics<br />Serving Applications<br />Web Serving<br />NoSQL<br />RDBMS<br />…<br />Traditional ETL &<br />Message buses<br />EDW<br />Data Marts<br />BI / Analytics<br />Apache Hadoop<br />EsTsL (s = Store) <br />Custom Analytics<br />Gartner predicts <br />800% data growth <br />over next 5 years<br />80-90% of data <br />produced today <br />is unstructured<br />Traditional ETL &<br />Message buses<br />Serving Logs<br />Social Media<br />Sensor Data<br />Text Systems<br />…<br />Unstructured Systems<br />© Hortonworks Inc. 2011<br />
    25. 25. The Hadoop Market<br />© Hortonworks Inc. 2011<br />20<br />
    26. 26. Market Drivers for Apache Hadoop<br />Business drivers<br />Identified high value projects that require use of more data<br />Belief that there is great ROI in mastering big data<br />Financial drivers<br />Growing cost of data systems as proportion of IT spend<br />Cost advantage of commodity hardware + open source <br />Enables departmental-level big data strategies <br />Technical drivers<br />Existing solutions failing under growing requirements<br />3Vs - Volume, velocity, variety<br />Proliferation of unstructured data<br />21<br />Significant opportunity for Hadoop in enterprise data architectures<br />© Hortonworks Inc. 2011<br />
    27. 27. Market Opportunity for Hadoop<br />Current<br />Apache Hadoop can become de facto platform for managing unstructured data in the enterprise<br />Enable new breed of applications to be built on top of Apache Hadoop<br />Future<br />Hadoop becomes the next generation enterprise data architecture<br />22<br />© Hortonworks Inc. 2011<br />
    28. 28. Market Dynamics<br />Technology & knowledge gaps are preventing Apache Hadoop from becoming an enterprise standard<br />Difficult to install and deploy Hadoop projects <br />Lack of technical content to assist<br />Demand for knowledgeable developers far exceeds supply<br />Virtually every F500 company is constructing a Hadoop strategy<br />But most are still in POC/experimentation phase with Hadoop<br />Top ISV/OEMs working to create Hadoop strategies<br />Driven by customer demand<br />Community is becoming increasingly confused by all of the noise<br />Multiple distributions, many vendor announcements<br />Fear of market fragmentation<br />23<br />© Hortonworks Inc. 2011<br />
    29. 29. Conclusion<br />There is not a Hadoop market to “win” today<br />Most organizations haven’t moved to full-scale production<br />Lack of mass adoption limiting short-term monetization opportunities<br />Need to drive Apache Hadoop as a unifying standard<br />In order to succeed, we need to enable the market<br />Continue investment to overcome technology gaps<br />Enable a vibrant partner ecosystem<br />Expand availability of content and services to address knowledge gaps<br /> How will Hortonworks do that?<br />24<br />© Hortonworks Inc. 2011<br />
    30. 30. Hortonworks Strategy<br />© Hortonworks Inc. 2011<br />25<br />
    31. 31. Hortonworks Strategy #1<br />Overcome Technology Gaps<br />Make Apache Hadoop projects easier to install, manage & use<br />Regular sustaining releases<br />Projects released as binary (RPM, .deb)<br />Open source Management & Monitoring<br />Make Apache Hadoop more robust<br />Performance gains<br />High availability<br />Administration & monitoring <br />All done within Apache Hadoop community<br />Develop collaboratively with community<br />Complete transparency<br />All code contributed back to Apache<br />Anyone should be able to easily deploy the Hadoop projects from Apache<br />26<br />© Hortonworks Inc. 2011<br />
    34. 34. Hortonworks Strategy #2<br />Enable a Vibrant Ecosystem<br />Unify the community around a strong Apache Hadoop offering<br />Make Apache Hadoop easier to integrate & extend<br />Work closely with partners to define and build open APIs<br />Everything contributed back to Apache<br />Provide enablement services as necessary to optimize integration<br />27<br />Integration & Services Partners<br />Hadoop Application Partners<br />DW, Analytics & BI Partners<br />Serving & Unstructured Data Systems Partners<br />Hardware Partners<br />Cloud & Hosting Platform Partners<br />© Hortonworks Inc. 2011<br />
    35. 35. Hortonworks Strategy #3<br />Overcome Knowledge Gaps<br />Improve user experience with Apache Hadoop software<br />Binaries, installers, etc.<br />Expand Apache Hadoop technical content<br />Core content on Apache.org<br />Docs, installation guides, etc.<br />Advanced tools on Hortonworks.com<br />Best practices, screencasts, forums, etc.<br />Extensive Hadoop training & certification program<br />Expert technical support services<br />28<br />© Hortonworks Inc. 2011<br />
    36. 36. Rationale for Hortonworks Strategy<br />Strong interest from community (enterprises and ISV/OEMs) in a complete, enterprise-viable, Apache Hadoop platform<br />Strong desire for core to remain unified and strong, avoid UNIX wars II<br />Freemium model seen as a barrier to growth and adoption<br />Highly defensible because of Hortonworks leadership in core projects<br />Proven experience executing open source business models<br />Rob Bearden & Benchmark<br />29<br />© Hortonworks Inc. 2011<br />
    37. 37. 30<br />Thank You.<br />© Hortonworks Inc. 2011<br />