Amazon Elastic MapReduce

Peter Sirota

Hw09 Making Hadoop Easy On Amazon Web Services

  1. Amazon Elastic MapReduce
     Peter Sirota
  2. Amazon Elastic MapReduce
     • Enables customers to easily and cost-effectively process vast amounts of data.
     • Utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon.
     • Launched in the US in April and in the EU in July of 2009.
  3. Amazon Elastic MapReduce
     • Large-scale data processing has a lot of MUCK, and we want to remove it for our customers:
        • Hard to manage compute clusters
        • Hard to tune Hadoop
        • Hadoop issues preventing smooth operation in the cloud
     Amazon.com Confidential
  4. Hadoop made simple and easy
  5. Amazon Elastic MapReduce
     [Architecture diagram: the Web Console or command-line tools deploy the application to Elastic MapReduce, which runs Hadoop on Amazon EC2 instances, reads the input dataset from an input bucket in Amazon S3, writes results to an output bucket in Amazon S3, and notifies the user at the end so the results can be retrieved.]
  6. Amazon Elastic MapReduce Benefits
     • Elastic: Uses as many or as few EC2 instances as needed. Spin up large or small job flows in minutes.
     • Easy to use: Get up and running quickly with an easy-to-use web console, robust command-line clients, and sample jobs. No configuration necessary.
     • Reliable: Fault-tolerant service built on top of battle-tested AWS infrastructure. Automatically retries failed tasks.
     • Cost effective: We monitor the progress of your jobs and turn off resources when the job flow is done.
  7. Problems customers solve with Elastic MapReduce
     • Data mining (log processing, click-stream analysis, similarities, etc.)
     • Bioinformatics (genome analysis)
     • Financial simulation (Monte Carlo simulation)
     • File processing (resizing JPEGs)
     • Web indexing
  8. Customer Feedback
     • Pros:
        • Amazon Elastic MapReduce makes it easy to run Hadoop applications.
        • Reliable platform for production data processing
     • Challenges:
        • Simple tasks such as log processing require fluency in MapReduce.
        • Hadoop applications are difficult to develop.
  9. New Features
     • Support for Apache Pig – August 2009
        • Batch and interactive mode
        • Concurrent access to multiple file systems
        • Loading resources from Amazon S3
        • Additional Piggybank functions
        • Integration with the Elastic MapReduce Client and Web Console
  10. New Features
     • Support for Apache Hive 0.4 – Today
        • Batch and interactive mode
        • Integration with the Elastic MapReduce Client and Web Console
        • Additions to Hive:
           • Load table partitions automatically from Amazon S3
           • Specify an off-instance metadata store
           • Optimized data writes to Amazon S3
           • Reference resources on Amazon S3
  11. Amazon Elastic MapReduce Ecosystem
     • Karmasphere Studio for Hadoop – NetBeans IDE for development, debugging, deployment and management of Hadoop jobs
        • Deploy Hadoop jobs to Elastic MapReduce
        • Monitor progress of Elastic MapReduce job flows
        • Amazon S3 file browser
        • Elastic MapReduce HDFS browser
  12. Amazon Elastic MapReduce Ecosystem
     • Support for Cloudera’s Hadoop distribution (private beta)
        • Optionally use Cloudera’s Hadoop while executing Elastic MapReduce job flows
        • Get support from Cloudera for the Elastic MapReduce job flows
  13. Q&A
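Slides 5 and 6 describe the basic job flow. Input data sits in an Amazon S3 bucket, Hadoop runs on EC2 instances, results land in an output bucket, and the instances are turned off when the job flow is done. The following is a minimal sketch of that flow expressed as keyword arguments for the `run_job_flow` call of today's boto3 EMR client. This is an assumption on our part: the talk refers to the 2009-era Web Console and command-line client, not boto3. All bucket names, script names, the instance type, and the instance count are hypothetical placeholders. The release label must be chosen from the current EMR release guide.

```python
from typing import Any, Dict


def build_streaming_job_flow(
    name: str,
    input_uri: str,
    output_uri: str,
    mapper_uri: str,
    reducer_uri: str,
    release_label: str,
    instance_type: str = "m5.xlarge",  # hypothetical choice
    instance_count: int = 3,  # hypothetical choice
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR client run_job_flow() (sketch)."""
    # -files ships both scripts from S3 to every node; the commands below
    # then refer to them by their base names.
    files = f"{mapper_uri},{reducer_uri}"
    mapper_name = mapper_uri.rsplit("/", 1)[-1]
    reducer_name = reducer_uri.rsplit("/", 1)[-1]
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Terminate the cluster once all steps finish, so the EC2
            # instances are not left running after the job flow is done.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "streaming step",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    # command-runner.jar launches the hadoop-streaming tool.
                    "Jar": "command-runner.jar",
                    "Args": [
                        "hadoop-streaming",
                        "-files", files,
                        "-mapper", f"python3 {mapper_name}",
                        "-reducer", f"python3 {reducer_name}",
                        "-input", input_uri,
                        "-output", output_uri,
                    ],
                },
            }
        ],
        # Default EMR role names; an account may use different roles.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With AWS credentials configured, the request would be submitted as `boto3.client("emr").run_job_flow(**request)`. That call returns a dictionary containing the new cluster's `JobFlowId`. Setting `KeepJobFlowAliveWhenNoSteps` to `False` makes the cluster terminate after its last step. This mirrors the "turn off resources when job flow is done" point on slide 6.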
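Slide 8 notes that even simple log processing requires fluency in MapReduce. The following is a minimal Hadoop Streaming-style job in Python that roughly illustrates what that involves: it counts requests per HTTP status code. It assumes input in Apache common log format. The single-file layout with a `map` or `reduce` argument, and the file name `logcount.py`, are hypothetical conventions of this sketch. Hadoop Streaming feeds records to the mapper on stdin and expects tab-separated key/value lines on stdout. It also sorts mapper output by key before the reducer sees it, which is why the reducer can simply group consecutive lines.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Map phase: emit a tab-separated (status code, 1) line per log record."""
    for line in lines:
        fields = line.split()
        # Assumes Apache common log format, where the HTTP status code is the
        # second-to-last whitespace-separated field (the last is the byte count).
        if len(fields) >= 2 and fields[-2].isdigit():
            yield f"{fields[-2]}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Reduce phase: sum the counts for each key.

    Input must already be sorted by key, which Hadoop's shuffle/sort step
    guarantees for streaming reducers (and `sort` does for local testing).
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Hypothetical convention: "logcount.py map" or "logcount.py reduce".
    if len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
        phase = mapper if sys.argv[1] == "map" else reducer
        for record in phase(sys.stdin):
            print(record)
```

You can approximate the pipeline locally with `cat access.log | python3 logcount.py map | sort | python3 logcount.py reduce`. On a cluster, you would pass the same script to a streaming step as `-mapper "python3 logcount.py map"` and `-reducer "python3 logcount.py reduce"`, with `-files` shipping it to the nodes.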