SlideShare a Scribd company logo
Elastic MapReduce Office Hours April 13th, 2011
Introduction Richard Cole, Lead Engineer Adam Gray, Product Manager
Office Hours IS Simply, Office Hours is a program the enables a technical audience the ability to interact with AWS technical experts. We look to improve this program by soliciting feedback from Office Hours attendees.  Please let us know what you would like to see.
Office Hours is NOT Support If you have a problem with your current deployment please visit our forums or our support website http://aws.amazon.com/premiumsupport/ A place to find out about upcoming services We do not typically disclose information about our services until they are available
Agenda What’s New How-to Demonstrations Resize a running job flow Launch a Hive-based Data Warehouse (Contextual Advertising Example) Question and Answer  Please begin submitting questions now
What’s New?
S3 Multipart Upload ,[object Object]
Can begin upload process before a Hadoop task has finished
Parallel uploads and earlier start mean data-intensive applications finish significantly fasterUsage: ./elastic-mapreduce--create --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop  –args “-c,fs.s3n.multipart.uploads.enabled=true, -c,fs.s3n.multipart.uploads.split.size=524288000”
Elastic IP Address Integration ,[object Object]
Reference the same IP address for a long-running job flow even if it has to be restarted
Reference transient job flows in a consistent way each time they are launchedUsage: ./elastic-mapreduce --create --eip [existing_ip]
EC2 Instance Tagging ,[object Object]
Useful if you are running multiple job flows concurrently or managing a large numbers of Amazon EC2 instances,[object Object]
Example 1 – Resize a running job flow ,[object Object]
Customize cluster size to support varying resource needs (e.g., query support during the day versus batch processing overnight)Job Flow Job Flow Job Flow Data Warehouse (Batch Processing) Allocate  4 instances Expand to  25 instances Expand to  9 instances 3 Hours Data Warehouse (Steady State) Data Warehouse (Steady State) Shrink to  9 instances Expand to  25 instances Time remaining: Time remaining: 14 Hours Time remaining: 7 Hours
Job Flow Architecture
Launch a Job Flow // Launch a job flow with 8 core nodes ./elastic-mapreduce--create --alive --instance-group master --instance-type m2.2xlarge --instance-count 1 --instance-group core --instance-type m1.small --instance-count 8 Created job flow  j-ABABABABAB  // Describe job flow ./elastic-mapreduce --describe j-ABABABABAB
Resize Job Flow // Add 8 m2.4xlarge TASK Nodes ./elastic-mapreduce --jobflow j-ABABABABAB --add-instance-group task --instance-type m2.4xlarge --instance-count 8 // Expand CORE Nodes from 8 to 12 ./elastic-mapreduce --jobflow j-ABABABABAB --modify-instance-group core --instance-count 12 // Shrink TASK Nodes from 8 to 6 ./elastic-mapreduce --jobflow j-ABABABABAB --modify-instance-group task --instance-count 6
Terminate Job Flow // Terminate Job Flow ./elastic-mapreduce--terminate j-ABABABABAB
Example 2 – Launch a Hive-based Data Warehouse Apache Hive Data warehouse for Hadoop Open source project started at Facebook Turns data on Hadoop into a virtually limitless data warehouse Provides data summarization, ad hoc querying and analysis Enables SQL-like queries on structured and unstructured data ,[object Object],Inherits linear scalability of Hadoop
Launch a Hive Cluster(Contextual Advertising Example) // Launch a Hive cluster with cluster compute nodes ./elastic-mapreduce --create --alive --hive-interactive --name "Hive Job Flow" --instance-type cc1.4xlarge Created job flow  j-ABABABABAB  // SSH to Master Node ./elastic-mapreduce --ssh --jobflow j-ABABABABAB  // Run a Hive Session on the Master Node hadoop@domU-12-31-39-07-D2-14:~$ hive -d SAMPLE=s3://elasticmapreduce/samples/hive-ads -d DAY=2009-04-13 -d HOUR=08 -d NEXT_DAY=2009-04-13 -d NEXT_HOUR=09 -d OUTPUT=s3://mybucket/samples/output  hive>
Define Impressions Table Show Tables hive> show tables;  OK Time taken: 3.51 seconds hive>  Define Impressions Table ADD JAR ${SAMPLE}/libs/jsonserde.jar ; CREATE EXTERNAL TABLE impressions (  requestBeginTimestring, adId string, impressionId string, referrer string, userAgentstring, userCookie string, ipstring )  PARTITIONED BY (dt string)  ROW FORMAT  serde'com.amazon.elasticmapreduce.JsonSerde'  with serdeproperties ( 'paths'='requestBeginTime, adId, impressionId,  					referrer, userAgent, userCookie, ip' ) LOCATION '${SAMPLE}/tables/impressions' ;
Recover Partitions The table is partitioned based on time, we can tell Hive about the existence of a single partition using the following statement. ALTER TABLE impressions ADD PARTITION (dt='2009-04-13-08-05') ; If we were to query the table at this point the results would contain data from just the single partition. We can instruct Hive to recover all partitions by inspecting the data stored in Amazon S3 using the RECOVER PARTITIONS statement. ALTER TABLE impressions RECOVER PARTITIONS ;
Define Clicks Table We follow the same process to define the clicks table and recover its partitions. CREATE EXTERNAL TABLE clicks ( impressionIdstring  )  PARTITIONED BY (dt string)  ROW FORMAT  SERDE 'com.amazon.elasticmapreduce.JsonSerde'  WITH SERDEPROPERTIES ( 'paths'='impressionId' )  LOCATION '${SAMPLE}/tables/clicks'  ;  ALTER TABLE clicks RECOVER PARTITIONS ;
Define Output Table We are going to combine the clicks and impressions tables so that we have a record of whether or not each impression resulted in a click. We'd like this data stored in Amazon S3 so that it can be used as input to other job flows.  CREATE EXTERNAL TABLE joined_impressions (  requestBeginTimestring, adId string, impressionId string, referrer string, userAgentstring, userCookie string, ip string, clicked Boolean  )  PARTITIONED BY (day string, hour string)  	STORED AS SEQUENCEFILE  	LOCATION '${OUTPUT}/joined_impressions'  ;

More Related Content

What's hot

AWS for Startups, London - Programming AWS
AWS for Startups, London - Programming AWSAWS for Startups, London - Programming AWS
AWS for Startups, London - Programming AWS
Amazon Web Services
 
Gaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your LogsGaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your Logs
Amazon Web Services
 
使用 Blox 實現容器任務調度與資源編排
使用 Blox 實現容器任務調度與資源編排使用 Blox 實現容器任務調度與資源編排
使用 Blox 實現容器任務調度與資源編排
Amazon Web Services
 
AWS re:Invent 2016 : announcement, technical demos and feedbacks
AWS re:Invent 2016 : announcement, technical demos and feedbacksAWS re:Invent 2016 : announcement, technical demos and feedbacks
AWS re:Invent 2016 : announcement, technical demos and feedbacks
Emmanuel Quentin
 
AWS Partner Presentation - SAP
AWS Partner Presentation - SAP AWS Partner Presentation - SAP
AWS Partner Presentation - SAP
Amazon Web Services
 
Understanding The Benefits Of Amazon EC2
Understanding The Benefits Of Amazon EC2Understanding The Benefits Of Amazon EC2
Understanding The Benefits Of Amazon EC2
Intelligentia IT Systems Pvt. Ltd.
 
Using Amazon CloudWatch Events, AWS Lambda and Spark Streaming to Process E...
Using Amazon CloudWatch Events,  AWS Lambda and Spark Streaming  to Process E...Using Amazon CloudWatch Events,  AWS Lambda and Spark Streaming  to Process E...
Using Amazon CloudWatch Events, AWS Lambda and Spark Streaming to Process E...
Julien SIMON
 
Introduction to Amazon EC2
Introduction to Amazon EC2Introduction to Amazon EC2
Introduction to Amazon EC2
Amazon Web Services
 
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
Amazon Web Services
 
Amazon Batch: 實現簡單且有效率的批次運算
Amazon Batch: 實現簡單且有效率的批次運算Amazon Batch: 實現簡單且有效率的批次運算
Amazon Batch: 實現簡單且有效率的批次運算
Amazon Web Services
 
Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016
Laurent Bernaille
 
Running Docker clusters on AWS (June 2016)
Running Docker clusters on AWS (June 2016)Running Docker clusters on AWS (June 2016)
Running Docker clusters on AWS (June 2016)
Julien SIMON
 
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot Instances
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot InstancesWorkshop; Deploy a Deep Learning Framework on Amazon ECS and Spot Instances
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot Instances
Amazon Web Services
 
Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017
Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017
Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017
Amazon Web Services
 
AWS basics session
AWS basics sessionAWS basics session
AWS basics session
Sharad Gupta
 
Introduction to Batch Processing on AWS
Introduction to Batch Processing on AWSIntroduction to Batch Processing on AWS
Introduction to Batch Processing on AWS
Amazon Web Services
 
Amazon EC2 Masterclass
Amazon EC2 MasterclassAmazon EC2 Masterclass
Amazon EC2 Masterclass
Amazon Web Services
 
AWS Cloud Formation
AWS Cloud Formation AWS Cloud Formation
AWS Cloud Formation
Adam Book
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWS
Amazon Web Services
 
An introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
An introduction to AWS CloudFormation - Pop-up Loft Tel AvivAn introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
An introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
Amazon Web Services
 

What's hot (20)

AWS for Startups, London - Programming AWS
AWS for Startups, London - Programming AWSAWS for Startups, London - Programming AWS
AWS for Startups, London - Programming AWS
 
Gaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your LogsGaining Operational Insights out of Your Logs
Gaining Operational Insights out of Your Logs
 
使用 Blox 實現容器任務調度與資源編排
使用 Blox 實現容器任務調度與資源編排使用 Blox 實現容器任務調度與資源編排
使用 Blox 實現容器任務調度與資源編排
 
AWS re:Invent 2016 : announcement, technical demos and feedbacks
AWS re:Invent 2016 : announcement, technical demos and feedbacksAWS re:Invent 2016 : announcement, technical demos and feedbacks
AWS re:Invent 2016 : announcement, technical demos and feedbacks
 
AWS Partner Presentation - SAP
AWS Partner Presentation - SAP AWS Partner Presentation - SAP
AWS Partner Presentation - SAP
 
Understanding The Benefits Of Amazon EC2
Understanding The Benefits Of Amazon EC2Understanding The Benefits Of Amazon EC2
Understanding The Benefits Of Amazon EC2
 
Using Amazon CloudWatch Events, AWS Lambda and Spark Streaming to Process E...
Using Amazon CloudWatch Events,  AWS Lambda and Spark Streaming  to Process E...Using Amazon CloudWatch Events,  AWS Lambda and Spark Streaming  to Process E...
Using Amazon CloudWatch Events, AWS Lambda and Spark Streaming to Process E...
 
Introduction to Amazon EC2
Introduction to Amazon EC2Introduction to Amazon EC2
Introduction to Amazon EC2
 
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
Using Amazon Cloudwatch Events, AWS Lambda and Spark Streaming to Process EC2...
 
Amazon Batch: 實現簡單且有效率的批次運算
Amazon Batch: 實現簡單且有效率的批次運算Amazon Batch: 實現簡單且有效率的批次運算
Amazon Batch: 實現簡單且有效率的批次運算
 
Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016Feedback on AWS re:invent 2016
Feedback on AWS re:invent 2016
 
Running Docker clusters on AWS (June 2016)
Running Docker clusters on AWS (June 2016)Running Docker clusters on AWS (June 2016)
Running Docker clusters on AWS (June 2016)
 
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot Instances
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot InstancesWorkshop; Deploy a Deep Learning Framework on Amazon ECS and Spot Instances
Workshop; Deploy a Deep Learning Framework on Amazon ECS and Spot Instances
 
Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017
Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017
Distributed Serverless Stack Tracing and Monitoring - DevDay Los Angeles 2017
 
AWS basics session
AWS basics sessionAWS basics session
AWS basics session
 
Introduction to Batch Processing on AWS
Introduction to Batch Processing on AWSIntroduction to Batch Processing on AWS
Introduction to Batch Processing on AWS
 
Amazon EC2 Masterclass
Amazon EC2 MasterclassAmazon EC2 Masterclass
Amazon EC2 Masterclass
 
AWS Cloud Formation
AWS Cloud Formation AWS Cloud Formation
AWS Cloud Formation
 
Building Your First Big Data Application on AWS
Building Your First Big Data Application on AWSBuilding Your First Big Data Application on AWS
Building Your First Big Data Application on AWS
 
An introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
An introduction to AWS CloudFormation - Pop-up Loft Tel AvivAn introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
An introduction to AWS CloudFormation - Pop-up Loft Tel Aviv
 

Viewers also liked

Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Amazon Web Services
 
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduce
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduceMatthew Bishop - A Quick Introduction to AWS Elastic MapReduce
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduce
huguk
 
AWS Customer Presentation - kikin
AWS Customer Presentation -  kikinAWS Customer Presentation -  kikin
AWS Customer Presentation - kikin
Amazon Web Services
 
AWS 101 business seminar in Taipei
AWS 101 business seminar in TaipeiAWS 101 business seminar in Taipei
AWS 101 business seminar in Taipei
Amazon Web Services
 
Architecting for the Cloud: Demo and Best Practicses - Janakiram MSV
Architecting for the Cloud: Demo and Best Practicses - Janakiram MSVArchitecting for the Cloud: Demo and Best Practicses - Janakiram MSV
Architecting for the Cloud: Demo and Best Practicses - Janakiram MSV
Amazon Web Services
 
Running databases on AWS
Running databases on AWSRunning databases on AWS
Running databases on AWS
Amazon Web Services
 
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWS
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWSAWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWS
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWS
Amazon Web Services
 
Bringing Wireless Sensing to its full potential
Bringing Wireless Sensing to its full potentialBringing Wireless Sensing to its full potential
Bringing Wireless Sensing to its full potential
Adrian Hornsby
 
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Amazon Web Services
 
How to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutesHow to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutes
Vladimir Simek
 
Technical Track
Technical TrackTechnical Track
Technical Track
Amazon Web Services
 

Viewers also liked (11)

Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)Masterclass Webinar - Amazon Elastic MapReduce (EMR)
Masterclass Webinar - Amazon Elastic MapReduce (EMR)
 
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduce
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduceMatthew Bishop - A Quick Introduction to AWS Elastic MapReduce
Matthew Bishop - A Quick Introduction to AWS Elastic MapReduce
 
AWS Customer Presentation - kikin
AWS Customer Presentation -  kikinAWS Customer Presentation -  kikin
AWS Customer Presentation - kikin
 
AWS 101 business seminar in Taipei
AWS 101 business seminar in TaipeiAWS 101 business seminar in Taipei
AWS 101 business seminar in Taipei
 
Architecting for the Cloud: Demo and Best Practicses - Janakiram MSV
Architecting for the Cloud: Demo and Best Practicses - Janakiram MSVArchitecting for the Cloud: Demo and Best Practicses - Janakiram MSV
Architecting for the Cloud: Demo and Best Practicses - Janakiram MSV
 
Running databases on AWS
Running databases on AWSRunning databases on AWS
Running databases on AWS
 
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWS
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWSAWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWS
AWS Tech Summit - Berlin 2011 - Choosing and Running Databases on AWS
 
Bringing Wireless Sensing to its full potential
Bringing Wireless Sensing to its full potentialBringing Wireless Sensing to its full potential
Bringing Wireless Sensing to its full potential
 
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
 
How to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutesHow to run your Hadoop Cluster in 10 minutes
How to run your Hadoop Cluster in 10 minutes
 
Technical Track
Technical TrackTechnical Track
Technical Track
 

Similar to AWS Office Hours: Amazon Elastic MapReduce

Richard Cole of Amazon Gives Lightning Tallk at BigDataCamp
Richard Cole of Amazon Gives Lightning Tallk at BigDataCampRichard Cole of Amazon Gives Lightning Tallk at BigDataCamp
Richard Cole of Amazon Gives Lightning Tallk at BigDataCamp
BigDataCamp
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
Ian Massingham
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
Amazon Web Services
 
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
AWS Chicago
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Yahoo Developer Network
 
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
 Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data... Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
Big Data Spain
 
Nosql hands on handout 04
Nosql hands on handout 04Nosql hands on handout 04
Nosql hands on handout 04
Krishna Sankar
 
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
Amazon Web Services
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
Andrew Lamb
 
Tuning and optimizing webcenter spaces application white paper
Tuning and optimizing webcenter spaces application white paperTuning and optimizing webcenter spaces application white paper
Tuning and optimizing webcenter spaces application white paper
Vinay Kumar
 
AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and Drupal
Promet Source
 
Hadoop & Zing
Hadoop & ZingHadoop & Zing
Hadoop & Zing
Long Dao
 
ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...
Altinity Ltd
 
AWS Hadoop and PIG and overview
AWS Hadoop and PIG and overviewAWS Hadoop and PIG and overview
AWS Hadoop and PIG and overview
Dan Morrill
 
Clogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewClogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overview
Madhur Nawandar
 
Apache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBaseApache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBase
Nag Arvind Gudiseva
 
Advance Sql Server Store procedure Presentation
Advance Sql Server Store procedure PresentationAdvance Sql Server Store procedure Presentation
Advance Sql Server Store procedure Presentation
Amin Uddin
 
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
Amazon Web Services Korea
 
hadoop&zing
hadoop&zinghadoop&zing
hadoop&zing
zingopen
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
Chester Chen
 

Similar to AWS Office Hours: Amazon Elastic MapReduce (20)

Richard Cole of Amazon Gives Lightning Tallk at BigDataCamp
Richard Cole of Amazon Gives Lightning Tallk at BigDataCampRichard Cole of Amazon Gives Lightning Tallk at BigDataCamp
Richard Cole of Amazon Gives Lightning Tallk at BigDataCamp
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
 
Amazon EMR Masterclass
Amazon EMR MasterclassAmazon EMR Masterclass
Amazon EMR Masterclass
 
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
AWS user group Serverless in September - Chris Johnson Bidler "Go Serverless ...
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
 
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
 Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data... Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
Big Data Web applications for Interactive Hadoop by ENRICO BERTI at Big Data...
 
Nosql hands on handout 04
Nosql hands on handout 04Nosql hands on handout 04
Nosql hands on handout 04
 
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
AWS November Webinar Series - Advanced Analytics with Amazon Redshift and the...
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
 
Tuning and optimizing webcenter spaces application white paper
Tuning and optimizing webcenter spaces application white paperTuning and optimizing webcenter spaces application white paper
Tuning and optimizing webcenter spaces application white paper
 
AutoScaling and Drupal
AutoScaling and DrupalAutoScaling and Drupal
AutoScaling and Drupal
 
Hadoop & Zing
Hadoop & ZingHadoop & Zing
Hadoop & Zing
 
ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...ClickHouse materialized views - a secret weapon for high performance analytic...
ClickHouse materialized views - a secret weapon for high performance analytic...
 
AWS Hadoop and PIG and overview
AWS Hadoop and PIG and overviewAWS Hadoop and PIG and overview
AWS Hadoop and PIG and overview
 
Clogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewClogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overview
 
Apache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBaseApache Drill with Oracle, Hive and HBase
Apache Drill with Oracle, Hive and HBase
 
Advance Sql Server Store procedure Presentation
Advance Sql Server Store procedure PresentationAdvance Sql Server Store procedure Presentation
Advance Sql Server Store procedure Presentation
 
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
AWS Batch를 통한 손쉬운 일괄 처리 작업 관리하기 - 윤석찬 (AWS 테크에반젤리스트)
 
hadoop&zing
hadoop&zinghadoop&zing
hadoop&zing
 
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
SF Big Analytics 20190612: Building highly efficient data lakes using Apache ...
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
Amazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
Amazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
Amazon Web Services
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Amazon Web Services
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
Amazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
Amazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Amazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
Amazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Amazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
Federico Razzoli
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
danishmna97
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Malak Abu Hammad
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 

Recently uploaded (20)

Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Webinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data WarehouseWebinar: Designing a schema for a Data Warehouse
Webinar: Designing a schema for a Data Warehouse
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfUnlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdf
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 

AWS Office Hours: Amazon Elastic MapReduce

  • 1. Elastic MapReduce Office Hours April 13th, 2011
  • 2. Introduction Richard Cole, Lead Engineer Adam Gray, Product Manager
  • 3. Office Hours IS Simply, Office Hours is a program the enables a technical audience the ability to interact with AWS technical experts. We look to improve this program by soliciting feedback from Office Hours attendees. Please let us know what you would like to see.
  • 4. Office Hours is NOT Support If you have a problem with your current deployment please visit our forums or our support website http://aws.amazon.com/premiumsupport/ A place to find out about upcoming services We do not typically disclose information about our services until they are available
  • 5. Agenda What’s New How-to Demonstrations Resize a running job flow Launch a Hive-based Data Warehouse (Contextual Advertising Example) Question and Answer Please begin submitting questions now
  • 7.
  • 8. Can begin upload process before a Hadoop task has finished
  • 9. Parallel uploads and earlier start mean data-intensive applications finish significantly fasterUsage: ./elastic-mapreduce--create --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop –args “-c,fs.s3n.multipart.uploads.enabled=true, -c,fs.s3n.multipart.uploads.split.size=524288000”
  • 10.
  • 11. Reference the same IP address for a long-running job flow even if it has to be restarted
  • 12. Reference transient job flows in a consistent way each time they are launchedUsage: ./elastic-mapreduce --create --eip [existing_ip]
  • 13.
  • 14.
  • 15.
  • 16. Customize cluster size to support varying resource needs (e.g., query support during the day versus batch processing overnight)Job Flow Job Flow Job Flow Data Warehouse (Batch Processing) Allocate 4 instances Expand to 25 instances Expand to 9 instances 3 Hours Data Warehouse (Steady State) Data Warehouse (Steady State) Shrink to 9 instances Expand to 25 instances Time remaining: Time remaining: 14 Hours Time remaining: 7 Hours
  • 18. Launch a Job Flow // Launch a job flow with 8 core nodes ./elastic-mapreduce--create --alive --instance-group master --instance-type m2.2xlarge --instance-count 1 --instance-group core --instance-type m1.small --instance-count 8 Created job flow j-ABABABABAB // Describe job flow ./elastic-mapreduce --describe j-ABABABABAB
  • 19. Resize Job Flow // Add 8 m2.4xlarge TASK Nodes ./elastic-mapreduce --jobflow j-ABABABABAB --add-instance-group task --instance-type m2.4xlarge --instance-count 8 // Expand CORE Nodes from 8 to 12 ./elastic-mapreduce --jobflow j-ABABABABAB --modify-instance-group core --instance-count 12 // Shrink TASK Nodes from 8 to 6 ./elastic-mapreduce --jobflow j-ABABABABAB --modify-instance-group task --instance-count 6
  • 20. Terminate Job Flow // Terminate Job Flow ./elastic-mapreduce--terminate j-ABABABABAB
  • 21.
  • 22. Launch a Hive Cluster(Contextual Advertising Example) // Launch a Hive cluster with cluster compute nodes ./elastic-mapreduce --create --alive --hive-interactive --name "Hive Job Flow" --instance-type cc1.4xlarge Created job flow j-ABABABABAB // SSH to Master Node ./elastic-mapreduce --ssh --jobflow j-ABABABABAB // Run a Hive Session on the Master Node hadoop@domU-12-31-39-07-D2-14:~$ hive -d SAMPLE=s3://elasticmapreduce/samples/hive-ads -d DAY=2009-04-13 -d HOUR=08 -d NEXT_DAY=2009-04-13 -d NEXT_HOUR=09 -d OUTPUT=s3://mybucket/samples/output hive>
  • 23. Define Impressions Table Show Tables hive> show tables; OK Time taken: 3.51 seconds hive> Define Impressions Table ADD JAR ${SAMPLE}/libs/jsonserde.jar ; CREATE EXTERNAL TABLE impressions ( requestBeginTimestring, adId string, impressionId string, referrer string, userAgentstring, userCookie string, ipstring ) PARTITIONED BY (dt string) ROW FORMAT serde'com.amazon.elasticmapreduce.JsonSerde' with serdeproperties ( 'paths'='requestBeginTime, adId, impressionId, referrer, userAgent, userCookie, ip' ) LOCATION '${SAMPLE}/tables/impressions' ;
  • 24. Recover Partitions The table is partitioned based on time, we can tell Hive about the existence of a single partition using the following statement. ALTER TABLE impressions ADD PARTITION (dt='2009-04-13-08-05') ; If we were to query the table at this point the results would contain data from just the single partition. We can instruct Hive to recover all partitions by inspecting the data stored in Amazon S3 using the RECOVER PARTITIONS statement. ALTER TABLE impressions RECOVER PARTITIONS ;
  • 25. Define Clicks Table We follow the same process to define the clicks table and recover its partitions. CREATE EXTERNAL TABLE clicks ( impressionIdstring ) PARTITIONED BY (dt string) ROW FORMAT SERDE 'com.amazon.elasticmapreduce.JsonSerde' WITH SERDEPROPERTIES ( 'paths'='impressionId' ) LOCATION '${SAMPLE}/tables/clicks' ; ALTER TABLE clicks RECOVER PARTITIONS ;
  • 26. Define Output Table We are going to combine the clicks and impressions tables so that we have a record of whether or not each impression resulted in a click. We'd like this data stored in Amazon S3 so that it can be used as input to other job flows. CREATE EXTERNAL TABLE joined_impressions ( requestBeginTimestring, adId string, impressionId string, referrer string, userAgentstring, userCookie string, ip string, clicked Boolean ) PARTITIONED BY (day string, hour string) STORED AS SEQUENCEFILE LOCATION '${OUTPUT}/joined_impressions' ;
  • 27. Define Local Impressions Table Next, we create some temporary tables in the job flow's local HDFS partition to store intermediate impression and click data. CREATE TABLE tmp_impressions( requestBeginTimestring, adId string, impressionId string, referrer string, userAgentstring, userCookie string, ip string ) STORED AS SEQUENCEFILE ; We insert data from the impressions table for the time duration we're interested in. Note that because the impressions table is partitioned only the relevant partitions will be read. INSERT OVERWRITE TABLE tmp_impressions SELECT from_unixtime(cast((cast(i.requestBeginTime as bigint) / 1000) as int)) requestBeginTime, i.adId, i.impressionId, i.referrer, i.userAgent, i.userCookie, i.ip FROM impressions i WHERE i.dt >= '${DAY}-${HOUR}-00' AND i.dt < '${NEXT_DAY}-${NEXT_HOUR}-00' ;
  • 28. Define Local Clicks Table For clicks, we extend the period of time over which we join by 20 minutes. Meaning we accept a click that occurred up to 20 minutes after the impression. CREATE TABLE tmp_clicks( impressionIdstring ) STORED AS SEQUENCEFILE; INSERT OVERWRITE TABLE tmp_clicks SELECT impressionId FROM clicks c WHERE c.dt>= '${DAY}-${HOUR}-00' AND c.dt < '${NEXT_DAY}-${NEXT_HOUR}-20' ;
  • 29. Join Tables Now we combine the impressions and clicks tables using a left outer join. This way any impressions that did not result in a click are preserved. INSERT OVERWRITE TABLE joined_impressions PARTITION (day='${DAY}', hour='${HOUR}') SELECT i.requestBeginTime, i.adId, i.impressionId, i.referrer, i.userAgent, i.userCookie, i.ip, (c.impressionId is not null) clicked FROM tmp_impressionsi LEFT OUTER JOIN tmp_clicksc ON i.impressionId = c.impressionId ;
  • 30. Terminate Interactive Session // Terminate Job Flow ./elastic-mapreduce--terminate j-ABABABABAB
  • 31. Question & Answer Visithttp://aws.amazon.com/officehours to watch recorded sessions and to sign up for upcoming sessions.