Hortonworks for Financial Analysts Presentation

Hortonworks presentation from Cowen Big Data Day for financial industry analysts

Published in: Technology

  • Our commitment to Apache has already changed the market! Ultimately, contributing the code that matters and making it work is the currency in open source
  • Our commitment is to continue growing our contribution
  • For more information on the history of Hadoop, see: http://developer.yahoo.com/blogs/hadoop/posts/2011/01/the-backstory-of-yahoo-and-hadoop/
  • Transcript

    • 1. Hortonworks
      Eric Baldeschwieler, Co-Founder and CEO
      September 2011
      Overview for Cowen Big Data Day 2011
      © Hortonworks Inc. 2011
    • 2. Agenda
      Hortonworks
      Apache Hadoop
      Use cases
      Hadoop in the Enterprise
      Market
      Strategy
    • 3. About Hortonworks – Basics
      Founded – July 1st, 2011
      22 architects & committers from Yahoo!
      Mission – Architect the future of Big Data
      Revolutionize and commoditize the storage and processing of Big Data via open source
      Vision – Half of the world's data will be stored in Hadoop within five years
    • 4. About Hortonworks – Game Plan
      Support the growth of a huge Apache Hadoop ecosystem
      Invest in ease of use, management, and other enterprise features
      Define APIs for ISVs, OEMs and others to integrate with Apache Hadoop
      Continue to invest in advancing the Hadoop core, remain the experts
      Contribute all of our work to Apache
      Profit by providing training & support to the Hadoop community
    • 5. Credentials
      Technical: key architects and committers from Yahoo! Hadoop engineering team
      Delivered every major Apache Hadoop release since 0.1
      Highest concentration of Apache Hadoop committers
      Driving innovation across entire Apache Hadoop stack
      Experience managing world’s largest deployment
      Access to Yahoo!’s 1,000+ users and 42k+ nodes for testing, QA, etc.
      Business operations: team of highly successful open source veterans
      Led by Rob Bearden, former COO of SpringSource & JBoss
      Investors: backed by Benchmark Capital and Yahoo!
    • 6. What is Apache Hadoop?
      Set of open source projects
      Owned by Apache Software Foundation
      Transforms commodity hardware into a service that:
      Stores petabytes of data reliably (HDFS)
      Allows huge distributed computations (MapReduce)
      Key attributes:
      Redundant and reliable
      Doesn’t stop or lose data even if hardware fails
      Easy to program
      Extremely powerful
      Allows the development of big data algorithms & tools
      Batch processing centric
      Runs on commodity hardware
      Computers & network
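To make the "huge distributed computations (MapReduce)" point on the slide above concrete, here is a minimal single-process sketch of the MapReduce programming model, written as a word count. It is an illustration only and does not use the Hadoop API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and a real Hadoop job would run the map and reduce steps in parallel across many machines, reading from and writing to HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key.

    In a real cluster the framework does this between map and reduce;
    here it is an in-memory dictionary (an assumption for illustration).
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop"]))
```

Because each map call sees only one line and each reduce call sees only one key, both steps can be spread over many commodity machines, which is the property the slide attributes to Hadoop.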
    • 7. Typical Hadoop Applications
      data analytics
      advertising optimization
      machine learning search ranking
      Mail anti-spam
      advertising data systems
      audience, ad and search pipelines
      ad selection
      Website personalization
      Content Optimization
      ad inventory prediction
      user interest prediction
    • 8. Who Builds Hadoop? Lines of code contributed since Hadoop inception
    • 9. Who Builds Hadoop? Lines of code contributed in 2011
    • 10. Apache Hadoop: A Brief History
      Early adopters (2006 – present): scale and productize Hadoop
      Other Internet companies (2008 – present): add tools / frameworks, enhance Hadoop
      Service providers (2010 – present): provide training, support, hosting
      Wide enterprise adoption (nascent / 2011): funds further development, enhancements
    • 11. HADOOP @ YAHOO!
      40K+ Servers
      170 PB Storage
      5M+ Monthly Jobs
      1000+ Active users
      © Yahoo 2011
    • 12. CASE STUDY
      YAHOO! HOMEPAGE
      Personalized for each visitor
      Result: twice the engagement
      News Interests
      Top Searches
      Recommended links
      +43% clicks vs. editor selected
      +79% clicks vs. randomly selected
      +160% clicks vs. one size fits all
    • 13. CASE STUDY
      YAHOO! HOMEPAGE
      • Serving Maps
      • Users - Interests
      • Five Minute Production
      • Weekly Categorization models
      SCIENCE HADOOP CLUSTER
      » Machine learning to build ever better categorization models
      CATEGORIZATION MODELS (weekly)
      USER BEHAVIOR
      PRODUCTION HADOOP CLUSTER
      » Identify user interests using categorization models
      SERVING MAPS (every 5 minutes)
      USER BEHAVIOR
      Build customized home pages with latest data (thousands / second)
      SERVING SYSTEMS
      ENGAGED USERS
    • 17. CASE STUDY
      YAHOO! MAIL
      Enabling quick response in the spam arms race
      SCIENCE
      • 450M mail boxes
      • 5B+ deliveries/day
      • Antispam models retrained every few hours on Hadoop
      PRODUCTION

      40% less spam than Hotmail and 55% less spam than Gmail

    • 20. Hadoop in the Enterprise
    • 21. Big Data Platforms: Cost per TB, Adoption
      Size of bubble = cost effectiveness of solution
      Source:
    • 22. Traditional Enterprise Architecture: Data Silos + ETL
      Traditional Data Warehouses, BI & Analytics: EDW, Data Marts, BI / Analytics
      Serving Applications: Web Serving, NoSQL, RDBMS
      Unstructured Systems: Serving Logs, Social Media, Sensor Data, Text Systems
      Traditional ETL & Message buses
    • 23. Hadoop Enterprise Architecture: Connecting All of Your Big Data
      Traditional Data Warehouses, BI & Analytics: EDW, Data Marts, BI / Analytics
      Serving Applications: Web Serving, NoSQL, RDBMS
      Unstructured Systems: Serving Logs, Social Media, Sensor Data, Text Systems
      Apache Hadoop: EsTsL (s = Store), Custom Analytics
      Traditional ETL & Message buses
    • 24. Hadoop Enterprise Architecture: Connecting All of Your Big Data
      Traditional Data Warehouses, BI & Analytics: EDW, Data Marts, BI / Analytics
      Serving Applications: Web Serving, NoSQL, RDBMS
      Unstructured Systems: Serving Logs, Social Media, Sensor Data, Text Systems
      Apache Hadoop: EsTsL (s = Store), Custom Analytics
      Traditional ETL & Message buses
      Gartner predicts 800% data growth over next 5 years
      80-90% of data produced today is unstructured
    • 25. The Hadoop Market
    • 26. Market Drivers for Apache Hadoop
      Business drivers
      Identified high value projects that require use of more data
      Belief that there is great ROI in mastering big data
      Financial drivers
      Growing cost of data systems as proportion of IT spend
      Cost advantage of commodity hardware + open source
      Enables departmental-level big data strategies
      Technical drivers
      Existing solutions failing under growing requirements
      3Vs - Volume, velocity, variety
      Proliferation of unstructured data
      Significant opportunity for Hadoop in enterprise data architectures
    • 27. Market Opportunity for Hadoop
      Current
      Apache Hadoop can become de facto platform for managing unstructured data in the enterprise
      Enable new breed of applications to be built on top of Apache Hadoop
      Future
      Hadoop becomes the next generation enterprise data architecture
    • 28. Market Dynamics
      Technology & knowledge gaps are preventing Apache Hadoop from becoming an enterprise standard
      Difficult to install and deploy Hadoop projects
      Lack of technical content to assist
      Demand for knowledgeable developers far exceeds supply
      Virtually every F500 company is constructing a Hadoop strategy
      But most are still in POC/experimentation phase with Hadoop
      Top ISV/OEMs working to create Hadoop strategies
      Driven by customer demand
      Community is becoming increasingly confused by all of the noise
      Multiple distributions, many vendor announcements
      Fear of market fragmentation
    • 29. Conclusion
      There is not a Hadoop market to “win” today
      Most organizations haven’t moved to full-scale production
      Lack of mass adoption limiting short-term monetization opportunities
      Need to drive Apache Hadoop as a unifying standard
      In order to succeed, we need to enable the market
      Continue investment to overcome technology gaps
      Enable a vibrant partner ecosystem
      Expand availability of content and services to address knowledge gaps
      How will Hortonworks do that?
    • 30. Hortonworks Strategy
    • 31. Hortonworks Strategy #1: Overcome Technology Gaps
      Make Apache Hadoop projects easier to install, manage & use
      Regular sustaining releases
      Projects released as binary (RPM, .deb)
      Open source Management & Monitoring
      Make Apache Hadoop more robust
      Performance gains
      High availability
      Administration & monitoring
      All done within Apache Hadoop community
      • Develop collaboratively with community
      • Complete transparency
      • All code contributed back to Apache
      Anyone should be able to easily deploy the Hadoop projects from Apache
    • 34. Hortonworks Strategy #2: Enable a Vibrant Ecosystem
      Unify the community around a strong Apache Hadoop offering
      Make Apache Hadoop easier to integrate & extend
      Work closely with partners to define and build open APIs
      Everything contributed back to Apache
      Provide enablement services as necessary to optimize integration
      Integration & Services Partners
      Hadoop Application Partners
      DW, Analytics & BI Partners
      Serving & Unstructured Data Systems Partners
      Hardware Partners
      Cloud & Hosting Platform Partners
    • 35. Hortonworks Strategy #3: Overcome Knowledge Gaps
      Improve user experience with Apache Hadoop software
      Binaries, installers, etc.
      Expand Apache Hadoop technical content
      Core content on Apache.org
      Docs, installation guides, etc.
      Advanced tools on Hortonworks.com
      Best practices, screencasts, forums, etc.
      Extensive Hadoop training & certification program
      Expert technical support services
    • 36. Rationale for Hortonworks Strategy
      Strong interest from community (enterprises and ISV/OEMs) in a complete, enterprise-viable, Apache Hadoop platform
      Strong desire for core to remain unified and strong, avoid UNIX wars II
      Freemium model seen as a barrier to growth and adoption
      Highly defensible because of Hortonworks leadership in core projects
      Proven experience executing open source business models
      Rob Bearden & Benchmark
    • 37. Thank You.