Introduction to Amazon Elastic MapReduce (English)

Slides for “Hadoop on the Cloud / The True Value of Amazon Elastic MapReduce” (Amazon Web Services, Jeff Barr).

http://www.eventbrite.com/event/1278974447/efblike

  1. Amazon Elastic MapReduce
  2. MY BACKGROUND
     • Based in Seattle, WA
     • Education:
       – BS in Computer Science, The American University, 1985
       – Graduate student in Digital Media, University of Washington, 2010
     • Background:
       – Microsoft Visual Studio team
       – Consulting to startups and VCs
       – Amazon employee since 2002
     • Evangelist:
       – Speak
       – Write
       – Tweet
     • Author, “Host Your Web Site in the Cloud”
     • Email: jbarr@amazon.com
     • Twitter: @jeffbarr
  3. AGENDA
     • What is Big Data?
     • Elastic MapReduce overview
     • Example use cases
     • Ecosystem and tools
     • Upcoming features
     • Discussion
  4. WHAT IS BIG DATA?
     • Doesn’t refer just to volume
       – You can benefit from Big Data infrastructure without having a ton of data
       – Many existing technologies have little problem physically handling large volumes
     • Challenges result from the combination of data volume, data structure, and usage demands on that data, usually tied to timeliness
     • Big Data tools are needed to provide a holistic view of enterprise data and systematically harness it for insights and trends
  5. WHAT IS AMAZON ELASTIC MAPREDUCE?
     • Enables customers to easily, securely, and cost-effectively process vast amounts of data:
       – Spin up hundreds of instances
       – Process hundreds of terabytes of data
     • Hosted Hadoop framework running on Amazon’s web-scale infrastructure
  6. Launch and monitor job flows
     • AWS Management Console
     • Command line interface
     • REST API
  7. WHY USE AMAZON ELASTIC MAPREDUCE?
     • Elastic MapReduce removes “MUCK” from Big Data processing:
       – Hard to manage compute clusters
       – Hard to tune Hadoop
       – Hard to monitor running job flows
       – Hard to debug Hadoop jobs
       – Hadoop issues prevent smooth operation in the cloud
  8. PROBLEMS CUSTOMERS SOLVE WITH ELASTIC MAPREDUCE
     • Targeted advertising / clickstream analysis
     • Data warehousing applications
     • Bioinformatics (genome analysis)
     • Financial simulation (Monte Carlo simulation)
     • File processing (resizing JPEGs)
     • Web indexing
     • Data mining and BI
  9. HARDWARE REQUIREMENTS FOR USE CASES
     • Data or I/O intensive (m1/m2 instances)
       – Data warehouse
       – Data mining
         • Click stream, logs, events, etc.
     • Compute or I/O intensive (c1, cc1/HPC instances)
       – Credit ratings
       – Fraud models
       – Portfolio analysis
       – VaR calculation
  10. CLICKSTREAM ANALYSIS – RAZORFISH AND BEST BUY
      • Best Buy came to Razorfish
        – 3.5 billion records, 71 million unique cookies, 1.7 million targeted ads required per day
      [Diagram: a user who recently purchased a home theater system and is searching for video games; the targeted ad system serves 1.7 million ads per day.]
      • Leveraged AWS and Elastic MapReduce
        – 100-node cluster on demand
        – Processing time dropped from 2+ days to 8 hours
        – Increased ROAS (Return on Advertising Spend) by 500%
  11. CLICKSTREAM ANALYSIS – ARCHITECTURE
  12. WHAT IS MAPREDUCE?
      • Invented by Google
      • New processing model
      • Highly scalable
      • Easy to understand
      • Industry standard
      • Something worth knowing
  13. ELASTIC MAPREDUCE MODEL – OVERVIEW
      • Take input data
      • Break it into sub-problems
      • Distribute them to worker nodes
      • Worker nodes process the sub-problems in parallel
      • Take the output of the worker nodes and reduce it to an answer
  14. MAPREDUCE EXAMPLE – WORD COUNT
      [Diagram: in the map phase, mappers read the input documents and emit (word, document) pairs: “This”, Doc1; “This”, Doc2; “This”, Doc3; “Word”, Doc1; “Word”, Doc3. The pairs are sorted by word. In the reduce phase, reducers count the pairs for each word, producing the output “This”, 3 and “Word”, 2.]
  15. ELASTIC MAPREDUCE MODEL – DETAILED
  16. ELASTIC MAPREDUCE IN ACTION – S3 LOG FILE
  17. ELASTIC MAPREDUCE IN ACTION – STEP 1
  18. ELASTIC MAPREDUCE IN ACTION – STEP 2
  19. ELASTIC MAPREDUCE IN ACTION – STEP 3
  20. ELASTIC MAPREDUCE IN ACTION – STEP 4
  21. ELASTIC MAPREDUCE IN ACTION – STEP 5
  22. ELASTIC MAPREDUCE IN ACTION – STEP 6
  23. ELASTIC MAPREDUCE IN ACTION – STEP 7
  24. ELASTIC MAPREDUCE IN ACTION – RESULTS
  25. NOTES / ATTRIBUTES
      • Mapper and reducer in Java JAR files
      • Scale as large as needed
        – Data
        – Processing
        – Add nodes (even while running) to speed up
      • No need to manage intermediate data
      • Suitable for certain types of problems
        – Record-oriented input
        – No dependencies between records
      • No more MUCK: focus on your problem
  26. HADOOP + R
  27. Thank You
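
The word count example on slide 14 can be sketched on a single machine in Python. This is a minimal illustration under stated assumptions, not Elastic MapReduce's API. The document texts are hypothetical because the slide shows only the emitted pairs. The names `map_phase`, `reduce_phase` and `word_count` are illustrative. The mapper emits a (word, document) pair for each word, the sort brings equal keys together, and the reducer counts the pairs for each word. With the chosen inputs, this reproduces the slide's output of “This”, 3 and “Word”, 2.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(doc_id, text):
    """Mapper: emit one (word, doc_id) pair for every word in a document."""
    return [(word, doc_id) for word in text.split()]


def reduce_phase(word, doc_ids):
    """Reducer: the count for a word is the number of pairs it received."""
    return word, len(doc_ids)


def word_count(documents):
    # Map phase: each document is handled independently (in parallel on a cluster).
    intermediate = []
    for doc_id, text in documents.items():
        intermediate.extend(map_phase(doc_id, text))

    # Sort phase: order pairs by word so equal keys sit next to each other.
    intermediate.sort(key=itemgetter(0))

    # Reduce phase: one reducer call per distinct word.
    return [
        reduce_phase(word, [doc_id for _, doc_id in pairs])
        for word, pairs in groupby(intermediate, key=itemgetter(0))
    ]


# Hypothetical inputs chosen to reproduce the pairs shown on slide 14.
docs = {"Doc1": "This Word", "Doc2": "This", "Doc3": "This Word"}
print(word_count(docs))  # [('This', 3), ('Word', 2)]
```

On a real cluster, the map calls, the sort and the reduce calls would each be spread across worker nodes. The shape of the computation is the same.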
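
Slide 25 notes that the model suits record-oriented input with no dependencies between records. The sketch below shows what that means in the style of Hadoop Streaming. In that style, the mapper and reducer exchange tab-separated key/value lines rather than being packaged as the Java JAR files the slide mentions. This is an assumption-labeled illustration: `mapper`, `reducer` and `run_local` are hypothetical names. `run_local` stands in for the cluster's shuffle with a plain `sorted` call.

```python
from itertools import groupby


def mapper(lines):
    """Streaming-style mapper: each input line (record) is processed on its own."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Streaming-style reducer: input arrives sorted by key, so equal keys are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Hypothetical local stand-in for a job: map, then sort (the shuffle), then reduce."""
    return list(reducer(sorted(mapper(lines))))


print(run_local(["This Word", "This", "This Word"]))  # ['This\t3', 'Word\t2']
```

The mapper never looks at more than one line at a time, so the input can be split across any number of nodes. Only the sort between the two stages needs to see all keys. This fits the slide's point that there is no need to manage intermediate data yourself.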
