Amazon Elastic MapReduce

       Peter Sirota
Amazon Elastic MapReduce
• Enables customers to easily and cost-effectively process vast amounts of data.
• Utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon.
• Launched in the US in April and EU in July of 2009
  
Amazon Elastic MapReduce
• Large scale data processing has a lot of MUCK and we want to remove it for our customers
  • Hard to manage compute clusters
  • Hard to tune Hadoop
  • Hadoop issues preventing smooth operation in the cloud

Amazon.com Confidential
  
Hadoop made simple and easy
  
Amazon Elastic MapReduce




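[Diagram: the user uploads input data to an input S3 bucket in Amazon S3 and deploys the application through the web console or command line tools. Elastic MapReduce runs Hadoop on Amazon EC2 instances, which read the input dataset from the input S3 bucket and write the output results to an output S3 bucket. At the end of the job flow, Elastic MapReduce notifies the user, who gets the results from Amazon S3.]
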
Amazon Elastic MapReduce
Benefits
• Elastic: Uses as many or as few EC2 instances as needed. Spin up large or small job flows in minutes.
• Easy to use: Get up and running quickly with easy-to-use web console, robust command line clients and sample jobs. No configuration necessary.
• Reliable: Fault tolerant service built on top of battle-tested AWS infrastructure. Automatically retries failed tasks.
• Cost Effective: We monitor progress of your jobs and turn off resources when job flow is done.

Problems customers solve with Elastic MapReduce
• Data mining (Log processing, click stream analysis, similarities, etc.)
• Bio-informatics (Genome analysis)
• Financial simulation (Monte Carlo simulation)
• File processing (resize jpegs)
• Web indexing
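
Of the workloads listed above, Monte Carlo simulation maps onto MapReduce especially directly: each map task runs an independent batch of random trials, and the reduce step adds up the partial results. The sketch below is a local, single-process illustration only and does not use Hadoop or Elastic MapReduce. It estimates π as a stand-in for a financial model, and the names `simulate_task`, `combine` and `estimate_pi` are hypothetical, chosen for this example.

```python
import random
from functools import reduce
from typing import Tuple


def simulate_task(seed: int, samples: int) -> Tuple[int, int]:
    """Map step: one independent task counts random points that fall
    inside the unit quarter circle."""
    rng = random.Random(seed)  # per-task seed keeps tasks independent and repeatable
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits, samples


def combine(a: Tuple[int, int], b: Tuple[int, int]) -> Tuple[int, int]:
    """Reduce step: add two partial (hits, samples) results."""
    return a[0] + b[0], a[1] + b[1]


def estimate_pi(num_tasks: int = 4, samples_per_task: int = 20000) -> float:
    """Run the tasks one after another here; on a cluster each task
    could run on a different node."""
    partials = [simulate_task(seed, samples_per_task) for seed in range(num_tasks)]
    hits, total = reduce(combine, partials)
    return 4.0 * hits / total
```

Because the tasks share no state, adding more of them (or more instances to run them on) increases the number of trials without changing the code, which is the property that makes this kind of workload a natural fit for an elastic cluster.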
  


  
Customer Feedback
• Pros:
  • Amazon Elastic MapReduce makes it easy to run Hadoop applications.
  • Reliable platform for production data-processing
• Challenges:
  • Simple tasks such as log processing require fluency in MapReduce
  • Hadoop applications are difficult to develop
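
To make the first challenge concrete, here is roughly what even a trivial log-processing task looks like once it is phrased as a map step and a reduce step. This is a minimal local sketch in plain Python, not Hadoop code. It assumes log lines in the Common Log Format, where the HTTP status code is the second-to-last field, and the function names and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def map_log_line(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (status_code, 1) for each well-formed log line."""
    parts = line.split()
    # Assumption: the status code is the second-to-last field.
    # Lines that do not fit this layout are skipped.
    if len(parts) >= 2 and parts[-2].isdigit():
        yield parts[-2], 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the counts emitted for each key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def count_status_codes(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the map and reduce steps over an iterable of log lines."""
    pairs = (pair for line in lines for pair in map_log_line(line))
    return reduce_counts(pairs)


# Hypothetical sample input for illustration.
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Oct/2009:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Oct/2009:10:00:01 +0000] "GET /missing HTTP/1.1" 404 128',
    '10.0.0.1 - - [01/Oct/2009:10:00:02 +0000] "GET /about.html HTTP/1.1" 200 2048',
    'garbage line',
]
```

Calling `count_status_codes(SAMPLE_LOG)` counts two `200` responses and one `404`, and ignores the malformed line. Higher-level tools such as Pig and Hive let the same question be asked as a short script or query instead of hand-written map and reduce functions.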
  
New Features
• Support for Apache Pig – August 2009
  • Batch and interactive mode
  • Concurrent access to multiple file systems
  • Loading resources from Amazon S3
  • Additional Piggybank functions
  • Integration with Elastic MapReduce Client and Web Console
  
New Features
• Support for Apache Hive 0.4 – Today
  • Batch and interactive mode
  • Integration with Elastic MapReduce Client and Web Console
  • Additions to Hive
    • Load table partitions automatically from Amazon S3
    • Specify an off-instance metadata store
    • Optimized data writes to Amazon S3
    • Reference resources on Amazon S3
  
Amazon Elastic MapReduce Ecosystem
• Karmasphere Studio for Hadoop – NetBeans IDE for development, debugging, deployment and management of Hadoop jobs
  • Deploy Hadoop jobs to Elastic MapReduce
  • Monitor progress of Elastic MapReduce job flows
  • Amazon S3 file browser
  • Elastic MapReduce HDFS browser
  
Amazon Elastic MapReduce Ecosystem
• Support for Cloudera’s Hadoop distribution (private beta)
  • Optionally use Cloudera’s Hadoop while executing Elastic MapReduce job flows
  • Get support from Cloudera for the Elastic MapReduce job flows
  
Q&A	
  

More Related Content

What's hot

AWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big DataAWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big DataAmazon Web Services
 
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...Amazon Web Services
 
Apache Lens at Hadoop meetup
Apache Lens at Hadoop meetupApache Lens at Hadoop meetup
Apache Lens at Hadoop meetupamarsri
 
Hadoop World Vertica
Hadoop World VerticaHadoop World Vertica
Hadoop World VerticaOmer Trajman
 
20181027 deep learningcommunity_aws
20181027 deep learningcommunity_aws20181027 deep learningcommunity_aws
20181027 deep learningcommunity_awsHirokuni Uchida
 
Workshop: Deploy a Deep Learning Framework on Amazon ECS
Workshop: Deploy a Deep Learning Framework on Amazon ECSWorkshop: Deploy a Deep Learning Framework on Amazon ECS
Workshop: Deploy a Deep Learning Framework on Amazon ECSAmazon Web Services
 
AWS Office Hours: Amazon Elastic MapReduce
AWS Office Hours: Amazon Elastic MapReduce AWS Office Hours: Amazon Elastic MapReduce
AWS Office Hours: Amazon Elastic MapReduce Amazon Web Services
 
Cloud Native Data Pipelines
Cloud Native Data PipelinesCloud Native Data Pipelines
Cloud Native Data PipelinesBill Liu
 
Apache Kylin - Balance between space and time - Hadoop Summit 2015
Apache Kylin -  Balance between space and time - Hadoop Summit 2015Apache Kylin -  Balance between space and time - Hadoop Summit 2015
Apache Kylin - Balance between space and time - Hadoop Summit 2015Debashis Saha
 
Apache Kylin 1.5 Updates
Apache Kylin 1.5 UpdatesApache Kylin 1.5 Updates
Apache Kylin 1.5 UpdatesYang Li
 
Randall Hunt - AWS Midwest Community Day Keynote
Randall Hunt - AWS Midwest Community Day KeynoteRandall Hunt - AWS Midwest Community Day Keynote
Randall Hunt - AWS Midwest Community Day KeynoteAWS Chicago
 
Comparison of AWS, GCP & Azure web solutions
Comparison of AWS, GCP & Azure web solutionsComparison of AWS, GCP & Azure web solutions
Comparison of AWS, GCP & Azure web solutionsbasit raza
 
Scalding: Twitter's New DSL for Hadoop
Scalding: Twitter's New DSL for HadoopScalding: Twitter's New DSL for Hadoop
Scalding: Twitter's New DSL for HadoopDataWorks Summit
 
Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Jim Dowling
 
Design cube in Apache Kylin
Design cube in Apache KylinDesign cube in Apache Kylin
Design cube in Apache KylinYang Li
 

What's hot (18)

AWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big DataAWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
AWS Summit Tel Aviv - Startup Track - Data Analytics & Big Data
 
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
 
Apache Lens at Hadoop meetup
Apache Lens at Hadoop meetupApache Lens at Hadoop meetup
Apache Lens at Hadoop meetup
 
Hadoop World Vertica
Hadoop World VerticaHadoop World Vertica
Hadoop World Vertica
 
EMR AWS Demo
EMR AWS DemoEMR AWS Demo
EMR AWS Demo
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
 
20181027 deep learningcommunity_aws
20181027 deep learningcommunity_aws20181027 deep learningcommunity_aws
20181027 deep learningcommunity_aws
 
Workshop: Deploy a Deep Learning Framework on Amazon ECS
Workshop: Deploy a Deep Learning Framework on Amazon ECSWorkshop: Deploy a Deep Learning Framework on Amazon ECS
Workshop: Deploy a Deep Learning Framework on Amazon ECS
 
AWS Office Hours: Amazon Elastic MapReduce
AWS Office Hours: Amazon Elastic MapReduce AWS Office Hours: Amazon Elastic MapReduce
AWS Office Hours: Amazon Elastic MapReduce
 
Cloud Native Data Pipelines
Cloud Native Data PipelinesCloud Native Data Pipelines
Cloud Native Data Pipelines
 
The Evolution of Apache Kylin
The Evolution of Apache KylinThe Evolution of Apache Kylin
The Evolution of Apache Kylin
 
Apache Kylin - Balance between space and time - Hadoop Summit 2015
Apache Kylin -  Balance between space and time - Hadoop Summit 2015Apache Kylin -  Balance between space and time - Hadoop Summit 2015
Apache Kylin - Balance between space and time - Hadoop Summit 2015
 
Apache Kylin 1.5 Updates
Apache Kylin 1.5 UpdatesApache Kylin 1.5 Updates
Apache Kylin 1.5 Updates
 
Randall Hunt - AWS Midwest Community Day Keynote
Randall Hunt - AWS Midwest Community Day KeynoteRandall Hunt - AWS Midwest Community Day Keynote
Randall Hunt - AWS Midwest Community Day Keynote
 
Comparison of AWS, GCP & Azure web solutions
Comparison of AWS, GCP & Azure web solutionsComparison of AWS, GCP & Azure web solutions
Comparison of AWS, GCP & Azure web solutions
 
Scalding: Twitter's New DSL for Hadoop
Scalding: Twitter's New DSL for HadoopScalding: Twitter's New DSL for Hadoop
Scalding: Twitter's New DSL for Hadoop
 
Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021Hops fs huawei internal conference july 2021
Hops fs huawei internal conference july 2021
 
Design cube in Apache Kylin
Design cube in Apache KylinDesign cube in Apache Kylin
Design cube in Apache Kylin
 

Viewers also liked

Configuración de una conexión ordenador – teléfono móvil mediante una red WIFI
Configuración de una conexión ordenador – teléfono móvil mediante una red WIFIConfiguración de una conexión ordenador – teléfono móvil mediante una red WIFI
Configuración de una conexión ordenador – teléfono móvil mediante una red WIFIPaco Herraiz Ortega
 
Thibaud laurent cv septembre 2013
Thibaud laurent cv septembre 2013Thibaud laurent cv septembre 2013
Thibaud laurent cv septembre 2013laurent9425
 
TUCBC GRADUATION 2015 LECTURE - Prof Ademola Adedipe
TUCBC GRADUATION 2015 LECTURE - Prof Ademola AdedipeTUCBC GRADUATION 2015 LECTURE - Prof Ademola Adedipe
TUCBC GRADUATION 2015 LECTURE - Prof Ademola AdedipeAdemola ADEDIPE
 
Film Festival Clip
Film Festival Clip Film Festival Clip
Film Festival Clip Sara Porch
 
Gobierno de it_bdo_consulting_gobierno_de_ti_2(1)
Gobierno de it_bdo_consulting_gobierno_de_ti_2(1)Gobierno de it_bdo_consulting_gobierno_de_ti_2(1)
Gobierno de it_bdo_consulting_gobierno_de_ti_2(1)grangurusv
 
The activities of pumelo fruit juice (citrus maxima var
The activities of pumelo fruit juice (citrus maxima varThe activities of pumelo fruit juice (citrus maxima var
The activities of pumelo fruit juice (citrus maxima varAlexander Decker
 
Space Quarterly: September 2011
Space Quarterly:  September 2011Space Quarterly:  September 2011
Space Quarterly: September 2011Bill Duncan
 
Jornada territorios inteligentes 20140303 antaresii La Palma
Jornada territorios inteligentes 20140303 antaresii La PalmaJornada territorios inteligentes 20140303 antaresii La Palma
Jornada territorios inteligentes 20140303 antaresii La PalmaJoaquin Larrosa
 
Disciplina Organizacao de Eventos (I) (IFSP Campus Cubatao) (aula 14)
Disciplina Organizacao de Eventos (I) (IFSP Campus Cubatao) (aula 14)Disciplina Organizacao de Eventos (I) (IFSP Campus Cubatao) (aula 14)
Disciplina Organizacao de Eventos (I) (IFSP Campus Cubatao) (aula 14)Aristides Faria
 
Informativo GBOEX 2/2014
Informativo GBOEX 2/2014Informativo GBOEX 2/2014
Informativo GBOEX 2/2014GBOEX
 
Libro cocina saludable lody
Libro cocina saludable lodyLibro cocina saludable lody
Libro cocina saludable lodyCristy Julio
 
Tallers intercicle claustre metod.
Tallers intercicle claustre metod.Tallers intercicle claustre metod.
Tallers intercicle claustre metod.ceippuigdenvalls
 
Lean Start up @ Betahaus 2011
Lean Start up @ Betahaus 2011Lean Start up @ Betahaus 2011
Lean Start up @ Betahaus 2011Andreas Cem Vogt
 
Bass Pro Shops Case Study
Bass Pro Shops Case Study Bass Pro Shops Case Study
Bass Pro Shops Case Study Experian Hitwise
 

Viewers also liked (20)

Uruguay horrorizado: Pablo Borrás y sus homicidios
Uruguay horrorizado: Pablo Borrás y sus homicidiosUruguay horrorizado: Pablo Borrás y sus homicidios
Uruguay horrorizado: Pablo Borrás y sus homicidios
 
Configuración de una conexión ordenador – teléfono móvil mediante una red WIFI
Configuración de una conexión ordenador – teléfono móvil mediante una red WIFIConfiguración de una conexión ordenador – teléfono móvil mediante una red WIFI
Configuración de una conexión ordenador – teléfono móvil mediante una red WIFI
 
Thibaud laurent cv septembre 2013
Thibaud laurent cv septembre 2013Thibaud laurent cv septembre 2013
Thibaud laurent cv septembre 2013
 
Reabilitaçao para dor crônica
Reabilitaçao para dor crônicaReabilitaçao para dor crônica
Reabilitaçao para dor crônica
 
CDU - Brochure Tkn Chem
CDU - Brochure Tkn ChemCDU - Brochure Tkn Chem
CDU - Brochure Tkn Chem
 
TUCBC GRADUATION 2015 LECTURE - Prof Ademola Adedipe
TUCBC GRADUATION 2015 LECTURE - Prof Ademola AdedipeTUCBC GRADUATION 2015 LECTURE - Prof Ademola Adedipe
TUCBC GRADUATION 2015 LECTURE - Prof Ademola Adedipe
 
Film Festival Clip
Film Festival Clip Film Festival Clip
Film Festival Clip
 
Depuradora 4t
Depuradora 4t Depuradora 4t
Depuradora 4t
 
Nevera Aeg S73520CMW2
Nevera Aeg S73520CMW2Nevera Aeg S73520CMW2
Nevera Aeg S73520CMW2
 
Gobierno de it_bdo_consulting_gobierno_de_ti_2(1)
Gobierno de it_bdo_consulting_gobierno_de_ti_2(1)Gobierno de it_bdo_consulting_gobierno_de_ti_2(1)
Gobierno de it_bdo_consulting_gobierno_de_ti_2(1)
 
The activities of pumelo fruit juice (citrus maxima var
The activities of pumelo fruit juice (citrus maxima varThe activities of pumelo fruit juice (citrus maxima var
The activities of pumelo fruit juice (citrus maxima var
 
Space Quarterly: September 2011
Space Quarterly:  September 2011Space Quarterly:  September 2011
Space Quarterly: September 2011
 
Indoor video wall from soniq
Indoor video wall from soniqIndoor video wall from soniq
Indoor video wall from soniq
 
Jornada territorios inteligentes 20140303 antaresii La Palma
Jornada territorios inteligentes 20140303 antaresii La PalmaJornada territorios inteligentes 20140303 antaresii La Palma
Jornada territorios inteligentes 20140303 antaresii La Palma
 
Disciplina Organizacao de Eventos (I) (IFSP Campus Cubatao) (aula 14)
Disciplina Organizacao de Eventos (I) (IFSP Campus Cubatao) (aula 14)Disciplina Organizacao de Eventos (I) (IFSP Campus Cubatao) (aula 14)
Disciplina Organizacao de Eventos (I) (IFSP Campus Cubatao) (aula 14)
 
Informativo GBOEX 2/2014
Informativo GBOEX 2/2014Informativo GBOEX 2/2014
Informativo GBOEX 2/2014
 
Libro cocina saludable lody
Libro cocina saludable lodyLibro cocina saludable lody
Libro cocina saludable lody
 
Tallers intercicle claustre metod.
Tallers intercicle claustre metod.Tallers intercicle claustre metod.
Tallers intercicle claustre metod.
 
Lean Start up @ Betahaus 2011
Lean Start up @ Betahaus 2011Lean Start up @ Betahaus 2011
Lean Start up @ Betahaus 2011
 
Bass Pro Shops Case Study
Bass Pro Shops Case Study Bass Pro Shops Case Study
Bass Pro Shops Case Study
 

Similar to Hw09 Making Hadoop Easy On Amazon Web Services

B3 - Business intelligence apps on aws
B3 - Business intelligence apps on awsB3 - Business intelligence apps on aws
B3 - Business intelligence apps on awsAmazon Web Services
 
Cloud Architectures - Jinesh Varia - GrepTheWeb
Cloud Architectures - Jinesh Varia - GrepTheWebCloud Architectures - Jinesh Varia - GrepTheWeb
Cloud Architectures - Jinesh Varia - GrepTheWebjineshvaria
 
BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1Milind gunjan
 
Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...
Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...
Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...Amazon Web Services
 
AWS re:Invent 2016 recap (part 2)
AWS re:Invent 2016 recap (part 2) AWS re:Invent 2016 recap (part 2)
AWS re:Invent 2016 recap (part 2) Julien SIMON
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
Distributed-ness: Distributed computing & the clouds
Distributed-ness: Distributed computing & the cloudsDistributed-ness: Distributed computing & the clouds
Distributed-ness: Distributed computing & the cloudsRobert Coup
 
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduceAmazon Web Services
 
Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS Amazon Web Services
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...Amazon Web Services
 
Seeding The Cloud
Seeding The CloudSeeding The Cloud
Seeding The CloudTed Leung
 
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...Amazon Web Services
 
Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)Robert Grossman
 
Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Sid Anand
 
Analytics on the Cloud with Tableau on AWS
Analytics on the Cloud with Tableau on AWSAnalytics on the Cloud with Tableau on AWS
Analytics on the Cloud with Tableau on AWSAmazon Web Services
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSjavier ramirez
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Amazon Web Services
 
High-Throughput Genomics on AWS - LFS309 - re:Invent 2017
High-Throughput Genomics on AWS - LFS309 - re:Invent 2017High-Throughput Genomics on AWS - LFS309 - re:Invent 2017
High-Throughput Genomics on AWS - LFS309 - re:Invent 2017Amazon Web Services
 

Similar to Hw09 Making Hadoop Easy On Amazon Web Services (20)

B3 - Business intelligence apps on aws
B3 - Business intelligence apps on awsB3 - Business intelligence apps on aws
B3 - Business intelligence apps on aws
 
Cloud Architectures - Jinesh Varia - GrepTheWeb
Cloud Architectures - Jinesh Varia - GrepTheWebCloud Architectures - Jinesh Varia - GrepTheWeb
Cloud Architectures - Jinesh Varia - GrepTheWeb
 
BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1
 
Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...
Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...
Think Big Data, Think Cloud - AWS Presentation - AWS Cloud Storage for the En...
 
AWS re:Invent 2016 recap (part 2)
AWS re:Invent 2016 recap (part 2) AWS re:Invent 2016 recap (part 2)
AWS re:Invent 2016 recap (part 2)
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
Distributed-ness: Distributed computing & the clouds
Distributed-ness: Distributed computing & the cloudsDistributed-ness: Distributed computing & the clouds
Distributed-ness: Distributed computing & the clouds
 
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
 
Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS Build Data Lakes and Analytics on AWS
Build Data Lakes and Analytics on AWS
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
Build Data Lakes & Analytics on AWS: Patterns & Best Practices - BDA305 - Ana...
 
Seeding The Cloud
Seeding The CloudSeeding The Cloud
Seeding The Cloud
 
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS...
 
AWS Big Data Landscape
AWS Big Data LandscapeAWS Big Data Landscape
AWS Big Data Landscape
 
Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)Processing Big Data (Chapter 3, SC 11 Tutorial)
Processing Big Data (Chapter 3, SC 11 Tutorial)
 
Big Data on AWS
Big Data on AWSBig Data on AWS
Big Data on AWS
 
Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)
 
Analytics on the Cloud with Tableau on AWS
Analytics on the Cloud with Tableau on AWSAnalytics on the Cloud with Tableau on AWS
Analytics on the Cloud with Tableau on AWS
 
Big Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWSBig Data, Ingeniería de datos, y Data Lakes en AWS
Big Data, Ingeniería de datos, y Data Lakes en AWS
 
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
Build Data Lakes and Analytics on AWS: Patterns & Best Practices - BDA305 - A...
 
High-Throughput Genomics on AWS - LFS309 - re:Invent 2017
High-Throughput Genomics on AWS - LFS309 - re:Invent 2017High-Throughput Genomics on AWS - LFS309 - re:Invent 2017
High-Throughput Genomics on AWS - LFS309 - re:Invent 2017
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Hw09 Making Hadoop Easy On Amazon Web Services

  • 2. Amazon Elastic MapReduce
      - Enables customers to easily and cost-effectively process vast amounts of data.
      - Utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon.
      - Launched in the US in April and in the EU in July of 2009.
  • 3. Amazon Elastic MapReduce
      - Large-scale data processing has a lot of MUCK, and we want to remove it for our customers:
          - Hard to manage compute clusters
          - Hard to tune Hadoop
          - Hadoop issues preventing smooth operation in the cloud
      Amazon.com Confidential
  • 4. Hadoop made simple and easy
  • 5. Amazon Elastic MapReduce [Diagram: an application is deployed through the web console or command line tools; input data is placed in an input S3 bucket; Elastic MapReduce runs Hadoop on Amazon EC2 instances over the input dataset; output results are written to an output S3 bucket; at the end the user is notified and gets the results from Amazon S3.]
  • 6. Amazon Elastic MapReduce Benefits
      - Elastic: Uses as many or as few EC2 instances as needed. Spin up large or small job flows in minutes.
      - Easy to use: Get up and running quickly with an easy-to-use web console, robust command line clients, and sample jobs. No configuration necessary.
      - Reliable: Fault-tolerant service built on top of battle-tested AWS infrastructure. Automatically retries failed tasks.
      - Cost effective: We monitor the progress of your jobs and turn off resources when the job flow is done.
  • 7. Problems customers solve with Elastic MapReduce
      - Data mining (log processing, click stream analysis, similarities, etc.)
      - Bioinformatics (genome analysis)
      - Financial simulation (Monte Carlo simulation)
      - File processing (resize JPEGs)
      - Web indexing
  • 8. Customer Feedback
      - Pros:
          - Amazon Elastic MapReduce makes it easy to run Hadoop applications.
          - Reliable platform for production data processing
      - Challenges:
          - Simple tasks such as log processing require fluency in MapReduce
          - Hadoop applications are difficult to develop
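
To make the "fluency in MapReduce" challenge concrete, here is a minimal sketch of how even a simple log-processing task has to be phrased as a map step and a reduce step. It is plain Python run locally, not Hadoop or Elastic MapReduce code, and it is not taken from the slides. The function names (`map_log_line`, `reduce_counts`, `run_job`), the sample log lines, and the assumed log layout (status code as the second-to-last field) are all hypothetical.

```python
from collections import defaultdict


def map_log_line(line):
    """Map step: emit (status_code, 1) for one access-log line."""
    fields = line.split()
    # Assumption: the HTTP status code is the second-to-last field.
    if len(fields) >= 2 and fields[-2].isdigit():
        yield fields[-2], 1


def reduce_counts(key, values):
    """Reduce step: sum all counts emitted for one status code."""
    return key, sum(values)


def run_job(lines):
    """Local stand-in for the framework: map, group by key, then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_log_line(line):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())


# Hypothetical sample input, for illustration only.
sample = [
    '10.0.0.1 - - [02/Oct/2009:10:00:00] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [02/Oct/2009:10:00:01] "GET /missing HTTP/1.1" 404 128',
    '10.0.0.1 - - [02/Oct/2009:10:00:02] "GET /about.html HTTP/1.1" 200 734',
]

if __name__ == "__main__":
    print(run_job(sample))
```

In a real job flow the grouping in `run_job` would be done by the framework across many machines; the developer still has to supply the two functions, which is the fluency the feedback refers to.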
  • 9. New Features
      - Support for Apache Pig – August 2009
          - Batch and interactive mode
          - Concurrent access to multiple file systems
          - Loading resources from Amazon S3
          - Additional Piggybank functions
          - Integration with Elastic MapReduce Client and Web Console
  • 10. New Features
      - Support for Apache Hive 0.4 – Today
          - Batch and interactive mode
          - Integration with Elastic MapReduce Client and Web Console
          - Additions to Hive:
              - Load table partitions automatically from Amazon S3
              - Specify an off-instance metadata store
              - Optimized data writes to Amazon S3
              - Reference resources on Amazon S3
  • 11. Amazon Elastic MapReduce Ecosystem
      - Karmasphere Studio for Hadoop – NetBeans IDE for development, debugging, deployment, and management of Hadoop jobs
          - Deploy Hadoop jobs to Elastic MapReduce
          - Monitor progress of Elastic MapReduce job flows
          - Amazon S3 file browser
          - Elastic MapReduce HDFS browser
  • 12. Amazon Elastic MapReduce Ecosystem
      - Support for Cloudera’s Hadoop distribution (private beta)
          - Optionally use Cloudera’s Hadoop while executing Elastic MapReduce job flows
          - Get support from Cloudera for the Elastic MapReduce job flows