Dunning strata-2012-27-02

Published in: Technology

Notes
  • Take all of Twitter: 400 x 10^6 tweets per day < 400 GB per day < 40 MB/s
  • Kafka is a message queuing system.
  • Catcher is a processor. All of the systems can be run out of Hadoop. Warden can be configured to run Storm as well. Simple architecture: all from one platform. The green blocks are data that is available for other analytics.
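The fire-hose estimate in the first note can be checked with a little arithmetic. The sketch below assumes about 1,000 bytes per tweet, which is not stated in the slides; it is simply the size that turns 400 x 10^6 tweets into 400 GB. The function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_mb_per_s(tweets_per_day, bytes_per_tweet):
    """Average ingest rate in MB/s if the daily volume arrived evenly."""
    bytes_per_day = tweets_per_day * bytes_per_tweet
    return bytes_per_day / SECONDS_PER_DAY / 1e6


tweets_per_day = 400 * 10**6   # figure from the note
bytes_per_tweet = 1000         # ASSUMPTION: upper bound per tweet, not from the slides

daily_gb = tweets_per_day * bytes_per_tweet / 1e9
rate = sustained_rate_mb_per_s(tweets_per_day, bytes_per_tweet)

print(f"{daily_gb:.0f} GB per day, about {rate:.2f} MB/s on average")
```

Under that assumption the average works out to under 5 MB/s, so the 40 MB/s bound in the note leaves plenty of headroom for bursts.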

    1. Expect More from Hadoop! ©MapR Technologies - Confidential
    2. My Background
        • University, startups
            – Aptex, MusicMatch, ID Analytics, Veoh
            – big data since before it was big
        • Open source
            – even before the internet
            – Apache Hadoop, Mahout, Zookeeper, Drill
            – bought the beer at first HUG
        • MapR
        • Founding member of Apache Drill
    3. MapR Technologies
        • Enterprise quality distribution for Hadoop
            – Many extensions beyond basic Hadoop
        • Super strong team
            – Long history of successful startups
        • Strong supporter of Apache Drill
            – and open source in general
    4. meta-Hadoop?
    5. meta: Meta- (from Greek: μετά = "after", "beyond", "with", "adjacent", "self"), is a…
    6. Beyond ≠ answering yesterday’s problems
    7. Philosophy First: What is History?
    8. The study of the past (what came before now)
    9. What is the future? (it comes after now)
    10. (no text on slide)
    11. (no text on slide)
    12. But the future also has a past!
    13. the future of the past is not the past of the future
    14. Do you remember the future?
    15. (no text on slide)
    16. (no text on slide)
    17. (no text on slide)
    18. Those are yesterday’s answers
    19. and also the seeds of tomorrow
    20. Guys wearing Fedoras
    21. Hadoop has a history
    22. Hadoop also has a future
    23. The Old Future of Hadoop
        • Implementing yet another Google paper
            – Map-reduce and HDFS, and Yarn and Tez
            – more and more, but not really different
        • Eco-system additions (more Google papers)
            – simpler programming (Hive and Pig and Crunch) (Sawzall, FlumeJava, etc.)
            – key-value store (Bigtable)
            – ad hoc query (Dremel)
            – also not really different
        • Stands apart from other computing
            – required by HDFS and other limitations
    24. The New Future of Hadoop
        • Real-time processing
            – Combines real-time and long-time
        • Integration with traditional IT
            – No need to stand apart
        • Integration with new technologies
            – Solr, Node.js, Twisted all should work directly on Hadoop
        • Fast and flexible computation
            – Drill logical plan language
    25. Example #1: Search Abuse
    26. History matrix
        • One row per user
        • One column per thing
    27. Recommendation based on cooccurrence
        • Cooccurrence gives item-item mapping
        • One row and column per thing
    28. Cooccurrence matrix can also be implemented as a search index
    29. [Diagram labels: complete history; cooccurrence (Mahout); Solr indexing; item meta-data; Solr indexers; index shards]
    30. [Diagram labels: user history; web tier; Solr search; item meta-data; Solr indexers; index shards]
    31. Objective Results
        • At a very large credit card company
        • History is all transactions, all web interaction
        • Processing time cut from 20 hours per day to 3
        • Recommendation engine load time decreased from 8 hours to 3 minutes
    32. Scaling Estimates – Twitter Fire hose
        • Old School – 8+ separate clusters, 20-25 nodes
            – >3 Kafka nodes
            – >2 TwitterLogger
            – 5-10 Hadoop
            – >3 Storm
            – 3 zookeepers (or not?)
            – NAS for web storage
            – >2 web servers
        • MapR – one platform
            – 5-10 nodes total, any node does any job
            – Full HA included, backups included, disaster recovery included
    33. Example #2: Web Technology
    34. [Diagram labels: real-time data; fast analysis (Storm); raw logs; analytic output]
    35. [Diagram labels: large analysis (map-reduce); raw logs; analytic output]
    36. [Diagram labels: browser; query; presentation tier (d3 + node.js); raw logs; analytic output]
    37. Old School Storm: Complex architecture
        [Diagram labels: Twitter; Twitter API; Kafka API; TwitterLogger; Kafka cluster; Storm; Flume; web data; NAS; HDFS data; Hadoop; http; web-server]
    38. MapR: One Platform with Streaming Writes
        [Diagram labels: Twitter; Twitter API; http; Catcher; web-server; TwitterLogger; Storm; NFS; optional HDFS; MapReduce; topic queue; web data; API; MapR]
        • Users can also run extended analytics/MapReduce on the stored data
    39. (no text on slide)
    40. Objective Results
        • Real-time + long-time analysis is seamless
        • Web tier can be rooted directly on Hadoop cluster
        • No need to move data
    41. The future is not what we thought it would be
    42. It is better!
    43. Get Involved!
        • Tweet: #strataconf #mapr @ted_dunning
    44. Get Involved!
        • Join Apache Drill!
            – drill-dev-subscribe@incubator.apache.org
            – Follow @apachedrill
        • Join MapR!
            – jobs@mapr.com
        • Download these slides
            – http://www.mapr.com/company/events/strata-conference-2-2-27-13
        • Contact me:
            – tdunning@maprtech.com
            – tdunning@apache.org
            – @ted_dunning
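
As a rough illustration of slides 26 to 28, the sketch below builds an item-by-item cooccurrence table from a small user-by-item history matrix and uses it to score unseen items for a user. The toy data, function names, and raw-count scoring are all assumptions for illustration; the slides do not say how cooccurrences are weighted or filtered, and this is not Mahout or Solr code.

```python
def cooccurrence(history):
    """Count, for each pair of items, how many users have both.

    history: one row per user, one column per item, entries 0 or 1.
    Returns a square item-by-item table (one row and column per item).
    """
    n_items = len(history[0])
    counts = [[0] * n_items for _ in range(n_items)]
    for row in history:
        seen = [j for j, flag in enumerate(row) if flag]
        for a in seen:
            for b in seen:
                if a != b:
                    counts[a][b] += 1
    return counts


def recommend(user_row, counts, top_n=2):
    """Score items the user has not seen by summing cooccurrence counts
    with the items in the user's history, then return the best item ids."""
    scores = {}
    for item, flag in enumerate(user_row):
        if not flag:
            continue
        for other, count in enumerate(counts[item]):
            if count and not user_row[other]:
                scores[other] = scores.get(other, 0) + count
    # Highest score first; ties broken by item id for a stable order.
    return sorted(scores, key=lambda j: (-scores[j], j))[:top_n]


# Hypothetical history matrix: 4 users x 4 items.
history = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
counts = cooccurrence(history)
picks = recommend([1, 0, 0, 0], counts)
print(counts[0], picks)
```

The lookup in `recommend` behaves much like a search query: the user's history supplies the query terms, and each item's row of cooccurring items plays the role of an indexed document field, which is one way to read the "search index" remark on slide 28.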
