Hw09 Making Hadoop Easy On Amazon Web Services
 

    Presentation Transcript

    • Amazon Elastic MapReduce (Peter Sirota)
    • Amazon Elastic MapReduce
        • Enables customers to easily and cost-effectively process vast amounts of data.
        • Utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon.
        • Launched in the US in April and in the EU in July of 2009.
    • Amazon Elastic MapReduce
        • Large-scale data processing has a lot of MUCK, and we want to remove it for our customers.
        • Hard to manage compute clusters
        • Hard to tune Hadoop
        • Hadoop issues preventing smooth operation in the cloud
        Amazon.com Confidential
    • Hadoop made simple and easy
    • Amazon Elastic MapReduce [Diagram. Labels: input data uploaded to an input bucket in Amazon S3; application deployed through the web console or command line tools; Hadoop running on Amazon EC2 instances under Elastic MapReduce; results written to an output bucket in Amazon S3; notification at the end; get results.]
    • Amazon Elastic MapReduce Benefits
        • Elastic: Uses as many or as few EC2 instances as needed. Spin up large or small job flows in minutes.
        • Easy to use: Get up and running quickly with an easy-to-use web console, robust command line clients and sample jobs. No configuration necessary.
        • Reliable: Fault-tolerant service built on top of battle-tested AWS infrastructure. Automatically retries failed tasks.
        • Cost Effective: We monitor progress of your jobs and turn off resources when the job flow is done.
    • Problems customers solve with Elastic MapReduce
        • Data mining (log processing, click stream analysis, similarities, etc.)
        • Bio-informatics (genome analysis)
        • Financial simulation (Monte Carlo simulation)
        • File processing (resize JPEGs)
        • Web indexing
    • Customer Feedback
        • Pros:
            • Amazon Elastic MapReduce makes it easy to run Hadoop applications.
            • Reliable platform for production data processing
        • Challenges:
            • Simple tasks such as log processing require fluency in MapReduce
            • Hadoop applications are difficult to develop
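To make the "fluency in MapReduce" challenge concrete, here is a minimal sketch of a log-processing task written as explicit map and reduce steps in plain Python. It is not code from the talk and does not use Hadoop or Elastic MapReduce; the sample log lines, the assumption that they follow the Common Log Format, and the function names `map_status`, `reduce_counts` and `count_statuses` are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical sample lines in Common Log Format; not taken from the talk.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Oct/2009:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Oct/2009:10:00:01 +0000] "GET /missing HTTP/1.1" 404 128',
    '10.0.0.1 - - [01/Oct/2009:10:00:02 +0000] "GET /about.html HTTP/1.1" 200 256',
]


def map_status(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (status_code, 1) for each well-formed log line."""
    fields = line.split()
    # Assumption: the status code is the second-to-last whitespace-separated field.
    if len(fields) >= 2 and fields[-2].isdigit():
        yield fields[-2], 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the emitted values for each key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_statuses(lines: Iterable[str]) -> Dict[str, int]:
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_status(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    print(count_statuses(SAMPLE_LOG))
```

Even this small count has to be recast as key/value emission followed by aggregation, which is the kind of overhead that higher-level tools such as Pig and Hive aim to hide.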
    • New Features
        • Support for Apache Pig – August 2009
        • Batch and interactive mode
        • Concurrent access to multiple file systems
        • Loading resources from Amazon S3
        • Additional Piggybank functions
        • Integration with Elastic MapReduce Client and Web Console
    • New Features
        • Support for Apache Hive 0.4 – Today
        • Batch and interactive mode
        • Integration with Elastic MapReduce Client and Web Console
        • Additions to Hive
            • Load table partitions automatically from Amazon S3
            • Specify an off-instance metadata store
            • Optimized data writes to Amazon S3
            • Reference resources on Amazon S3
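The slide does not say how table partitions are discovered in Amazon S3. As an illustration only, the sketch below assumes the data is laid out in Hive-style `name=value` directories under a table prefix and derives the distinct partition specs from a list of object keys. The key names, the `logs/` prefix and the functions `partition_spec` and `discover_partitions` are hypothetical, and the code makes no calls to S3 or Hive.

```python
from typing import Dict, Iterable, List


def partition_spec(key: str, table_prefix: str) -> Dict[str, str]:
    """Extract name=value directory segments that follow the table prefix."""
    if not key.startswith(table_prefix):
        return {}
    relative = key[len(table_prefix):]
    spec: Dict[str, str] = {}
    # Skip the last segment, which is assumed to be the file name.
    for segment in relative.split("/")[:-1]:
        name, sep, value = segment.partition("=")
        if sep and name and value:
            spec[name] = value
    return spec


def discover_partitions(keys: Iterable[str], table_prefix: str) -> List[Dict[str, str]]:
    """Return each distinct partition spec once, in first-seen order."""
    seen: List[Dict[str, str]] = []
    for key in keys:
        spec = partition_spec(key, table_prefix)
        if spec and spec not in seen:
            seen.append(spec)
    return seen


if __name__ == "__main__":
    # Hypothetical object keys, as a bucket listing might return them.
    keys = [
        "logs/dt=2009-10-01/part-00000",
        "logs/dt=2009-10-01/part-00001",
        "logs/dt=2009-10-02/part-00000",
        "other/dt=2009-10-03/part-00000",
    ]
    for spec in discover_partitions(keys, "logs/"):
        print(spec)
```

Each spec returned here corresponds to one partition that would otherwise have to be registered by hand.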
    • Amazon Elastic MapReduce Ecosystem
        • Karmasphere Studio for Hadoop – NetBeans IDE for development, debugging, deployment and management of Hadoop jobs
        • Deploy Hadoop jobs to Elastic MapReduce
        • Monitor progress of Elastic MapReduce job flows
        • Amazon S3 file browser
        • Elastic MapReduce HDFS browser
    • Amazon Elastic MapReduce Ecosystem
        • Support for Cloudera’s Hadoop distribution (private beta)
        • Optionally use Cloudera’s Hadoop while executing Elastic MapReduce job flows
        • Get support from Cloudera for the Elastic MapReduce job flows
    • Q&A