Dunning strata-2012-27-02

Published in: Technology

  • Take all of Twitter: 400 x 10^6 tweets per day < 400 GB per day < 40 MB/s
  • Kafka is a message queuing system
  • Catcher is a processor. All of the systems can be run out of Hadoop. Warden can be configured to run Storm as well. Simple architecture, all from one platform. The green blocks are data that is available for other analytics.
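The fire hose estimate in the first note can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal units (1 GB = 10^9 bytes) and a load spread evenly over the day, neither of which the note states. The function names are illustrative.

```python
# Back-of-envelope check of the Twitter fire hose estimate from the note.
# Assumptions (not stated in the slides): decimal units, evenly spread load.
TWEETS_PER_DAY = 400 * 10**6         # from the note
BYTES_PER_DAY_BOUND = 400 * 10**9    # "< 400 GB per day"
SECONDS_PER_DAY = 24 * 60 * 60


def implied_bytes_per_tweet(bytes_per_day, tweets_per_day):
    """Per-tweet size implied by the daily volume bound."""
    return bytes_per_day / tweets_per_day


def average_rate_mb_per_s(bytes_per_day):
    """Average sustained rate in MB/s for a given daily volume."""
    return bytes_per_day / SECONDS_PER_DAY / 10**6


bytes_per_tweet = implied_bytes_per_tweet(BYTES_PER_DAY_BOUND, TWEETS_PER_DAY)
rate = average_rate_mb_per_s(BYTES_PER_DAY_BOUND)

print(f"implied size per tweet: {bytes_per_tweet:.0f} bytes")
print(f"average rate: {rate:.2f} MB/s")
```

Under these assumptions the daily bound corresponds to about 1 KB per tweet and an average rate well below the 40 MB/s quoted in the note, so that figure leaves room for bursts.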
  • Transcript

    • 1. Expect More from Hadoop! ©MapR Technologies - Confidential
    • 2. My Background
        • University, Startups
            – Aptex, MusicMatch, ID Analytics, Veoh
            – big data since before it was big
        • Open source
            – even before the internet
            – Apache Hadoop, Mahout, Zookeeper, Drill
            – bought the beer at first HUG
        • MapR
        • Founding member of Apache Drill
    • 3. MapR Technologies
        • Enterprise quality distribution for Hadoop
            – Many extensions beyond basic Hadoop
        • Super strong team
            – Long history of successful startups
        • Strong supporter of Apache Drill
            – and open source in general
    • 4. meta-Hadoop?
    • 5. meta: Meta- (from Greek: μετά = "after", "beyond", "with", "adjacent", "self"), is a…
    • 6. Answering Beyond ≠ yesterday’s problems
    • 7. Philosophy First: What is History?
    • 8. The study of the past (what came before now)
    • 9. What is the future? (it comes after now)
    • 12. But the future also has a past!
    • 13. the future of the past is not the past of the future
    • 14. Do you remember the future?
    • 18. Those are yesterday’s answers
    • 19. and also the seeds of tomorrow
    • 20. Guys wearing Fedoras
    • 21. Hadoop has a history
    • 22. Hadoop also has a future
    • 23. The Old Future of Hadoop
        • Implementing yet another Google paper
            – Map-reduce and HDFS, and Yarn and Tez
            – more and more, but not really different
        • Eco-system additions (more Google papers)
            – simpler programming (Hive and Pig and Crunch) (Sawzall, FlumeJava, etc)
            – key-value store (big table)
            – ad hoc query (Dremel)
            – also not really different
        • Stands apart from other computing
            – required by HDFS and other limitations
    • 24. The New Future of Hadoop
        • Real-time processing
            – Combines real-time and long-time
        • Integration with traditional IT
            – No need to stand apart
        • Integration with new technologies
            – Solr, Node.js, Twisted all should work directly on Hadoop
        • Fast and flexible computation
            – Drill logical plan language
    • 25. Example #1: Search Abuse
    • 26. History matrix
        • One row per user
        • One column per thing
    • 27. Recommendation based on cooccurrence
        • Cooccurrence gives item-item mapping
        • One row and column per thing
    • 28. Cooccurrence matrix can also be implemented as a search index
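Slides 26 to 28 can be sketched in a few lines of plain Python. This is a minimal illustration, not the Mahout or Solr pipeline from the talk. It assumes raw cooccurrence counts with a hypothetical `min_count` threshold, and it treats each item as a "document" whose indicator field lists the items that cooccur with it. The toy `history` data and all function names are invented for the example.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(history):
    """Turn a user -> items history into item-item cooccurrence counts."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in history.values():
        # Every pair of items seen by the same user cooccurs once.
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def build_index(counts, min_count=2):
    """One 'document' per item; its indicator field holds the items that
    cooccur with it at least min_count times (threshold is an assumption)."""
    return {item: {other for other, n in row.items() if n >= min_count}
            for item, row in counts.items()}


def recommend(index, user_items, top_n=3):
    """Query the index with a user's history: score each unseen item by how
    many of its indicators appear in that history."""
    scores = {}
    for item, indicators in index.items():
        if item in user_items:
            continue
        overlap = len(indicators & user_items)
        if overlap:
            scores[item] = overlap
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Toy history matrix: one row per user, one column per thing.
history = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b", "d"},
    "u3": {"b", "c", "d"},
    "u4": {"a", "b"},
}
counts = cooccurrence(history)
index = build_index(counts)
print(recommend(index, {"a", "c"}))
```

Here a recommendation is just a query: the user's history is the query text and the indicator sets are the indexed field, which is why a search engine can stand in for the recommender.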
    • 29. [Diagram components: complete history, cooccurrence (Mahout), item meta-data, Solr indexing (two SolR Indexer boxes), index shards]
    • 30. [Diagram components: user history, Web tier, item meta-data, Solr search (two SolR Indexer boxes), index shards]
    • 31. Objective Results
        • At a very large credit card company
        • History is all transactions, all web interaction
        • Processing time cut from 20 hours per day to 3
        • Recommendation engine load time decreased from 8 hours to 3 minutes
    • 32. Scaling Estimates – Twitter Fire hose
        • Old School – 8+ separate clusters, 20-25 nodes
            – >3 Kafka nodes
            – >2 TwitterLogger
            – 5-10 Hadoop
            – >3 Storm
            – 3 zookeepers (or not?)
            – NAS for web storage
            – >2 web servers
        • MapR – one platform
            – 5-10 nodes total, any node does any job
            – Full HA included, backups included, disaster recovery included
    • 33. Example #2: Web Technology
    • 34. [Diagram components: real-time data, fast analysis (Storm), raw logs, analytic output]
    • 35. [Diagram components: large analysis (map-reduce), raw logs, analytic output]
    • 36. [Diagram components: browser, presentation tier (d3 + node.js), query, analytic output, raw logs]
    • 37. Old School Storm: Complex architecture
        [Diagram components: Twitter, Twitter API, TwitterLogger, Kafka API, Kafka Cluster (three boxes), Storm, Flume, HDFS Data, Hadoop, Web Data, NAS, http, Web-server]
    • 38. MapR: One Platform with Streaming Writes
        [Diagram components: Twitter, Twitter API, TwitterLogger, Catcher (two boxes), Storm, NFS (four links), optional HDFS API, MapReduce, Topic Queue, Web Data, http, Web-server, MapR]
        • Users can also run extended analytics/MapReduce on the stored data
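The slides do not describe how the Catcher or the topic queue work internally. As a rough illustration of streaming writes through an ordinary file interface such as NFS, the sketch below assumes a topic is a plain append-only file of JSON lines, with a reader that tracks its own byte offset. The file name, message fields, and function names are hypothetical, and a temporary directory stands in for an NFS mount of the cluster.

```python
import json
import os
import tempfile


def append_message(topic_path, message):
    """Append one JSON-encoded message as a single line to the topic file."""
    with open(topic_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(message) + "\n")


def read_new_messages(topic_path, offset):
    """Return (messages, new_offset) for complete lines written at or after
    the given byte offset. A trailing partial line is left for the next call."""
    messages = []
    with open(topic_path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a line that is still being written
            messages.append(json.loads(line))
            offset += len(line)
    return messages, offset


# Stand-in for a directory on an NFS mount (hypothetical layout).
topic_dir = tempfile.mkdtemp()
topic = os.path.join(topic_dir, "tweets.log")

append_message(topic, {"id": 1, "text": "hello"})
append_message(topic, {"id": 2, "text": "world"})
first, offset = read_new_messages(topic, 0)

append_message(topic, {"id": 3, "text": "again"})
second, offset = read_new_messages(topic, offset)

print(len(first), [m["id"] for m in second])
```

Because the queue is just a file, the same data stays available to batch jobs without being copied, which is the point the slide makes about running MapReduce on the stored data.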
    • 40. Objective Results
        • Real-time + long-time analysis is seamless
        • Web tier can be rooted directly on Hadoop cluster
        • No need to move data
    • 41. The future is not what we thought it would be
    • 42. It is better!
    • 43. Get Involved!
        • Tweet: #strataconf #mapr @ted_dunning
    • 44. Get Involved!
        • Join Apache Drill!
            – drill-dev-subscribe@incubator.apache.org
            – Follow @apachedrill
        • Join MapR!
            – jobs@mapr.com
        • Download these slides
            – http://www.mapr.com/company/events/strata-conference-2-2-27-13
        • Contact me:
            – tdunning@maprtech.com
            – tdunning@apache.org
            – @ted_dunning