SlideShare a Scribd company logo
1 of 35
Download to read offline
November 14, 2014 | Las Vegas, NV 
Steve McPherson
instance AMI DB on 
instance 
instance with 
CloudWatch 
Elastic IP optimized 
instance 
Amazon 
WorkSpaces 
assignment/ 
task 
Amazon EMR cluster MapR M3 
engine 
MapR M5 
engine 
MapR M7 
engine 
engine 
Kinesis-enabled 
app 
new! Amazon 
Route 53 
hosted zone route table 
solid state disks 
AWS Direct Connect 
router 
Amazon RDS 
customer 
gateway 
attribute 
VPC peering 
Auto Scaling 
Amazon S3 bucket with 
objects 
object AWS Import/Export 
AWS Storage 
Gateway 
Amazon EBS volume snapshot cached 
volume 
virtual tape 
library 
Elastic Beanstalk 
Amazon Glacier archive vault 
CloudFront download 
distribution Node.js 
streaming 
distribution 
items 
DynamoDB table attributes global 
secondary 
index 
Amazon 
RDS DB Kinesis 
instance 
RDS DB 
instance standby 
(Multi-AZ) Oracle DB 
instance 
MS SQL 
instance 
PostgreSQL 
instance 
PIOP Redis Memcached 
new! new! new! new! 
AWS CloudTrail 
instances 
Amazon SimpleDB domain Amazon Redshift 
new! 
DW1 
Dense Compute 
ElastiCache 
DW2 
Dense Compute 
edge location 
AWS Toolkit for 
Visual Studio 
applicatiJoanvaScript 
stack 
Amazon VPC VPN 
connection 
virtual private 
gateway 
alarm 
stack 
Internet 
gateway 
.NET 
RDS DB 
instance read 
replica 
IAMJava Python (boto) 
AWS CLI 
permissions role 
MFA token 
new! 
new! new! 
AWS OpsWorks 
elastic network 
instance 
data encryption PHP 
key 
AWS Data Pipeline 
monitoring 
new! 
new! 
deployment CloudWatch 
Elastic Load 
Balancing 
SQL master 
new! new! 
Amazon EC2 
new! 
SQL slave 
encrypted 
data 
AWS Tools for 
Windows 
PowerShell 
non-cached 
volume 
users 
IAM add-on 
deployments 
bucket 
deployments 
new! 
permissions 
iOS 
resources 
cache node 
stack 
AWS OpsWorks layers 
apps 
new! 
new! apps 
new! 
Amazon SNS 
new! 
Human Intelligence 
Tasks (HIT) 
AWS Simple Icons: Deployment & Management 
instances 
new! 
new! new! 
Ruby 
new! 
instances 
new! 
resources permissions 
new! 
topic 
new! 
template 
AWS Toolkit 
for Eclipse 
Amazon SES 
traditional server 
Elastic 
Transcoder 
email 
monitoring 
Requester 
email notification HTTP notification 
Amazon 
CloudSearch 
SDF metadata 
Amazon SQS 
item 
message 
Amazon SWF 
decider 
layers 
worker 
disk tape storage 
Internet user 
Amazon 
Mechanical Turk 
client mobile client multimedia 
workers 
corporate 
data center 
generic database 
Android 
AWS Security 
Token Service 
AWS cloud 
AWS Management 
Console 
virtual private cloud forums 
MySQL DB 
instance 
queue AMAZON 
EMR
Big decisions need Big Data 
Server 
Purchase 
Social 
Media
Extract Transform Load to 
Data Warehouse 
Report 
Generation 
Ad Hoc 
Analysis
Hadoop 
Hadoop can help
Difficult, expensive, and time consuming to operate 
Hadoop 
But Hadoop needs help
Amazon EMR makes Hadoop easy
Extract Transform & Load 
Data Warehouse 
Report Generation & Ad Hoc Analysis 
Amazon S3 
•MapReduce API 
•Scoop 
•Spark 
•Cascading 
•Pig 
•MR 
•Hive 
•Spark 
•Cascading 
•Pig 
•Presto 
•Hive 
•Spark-SQL 
•Lingual 
•Parquet 
•ORC 
•SEQ 
•Text 
Extract 
Transform & Load 
Data Warehouse 
Report Generation 
Ad Hoc Analysis 
write 
read
Amazon S3 is your Data Lake 
Amazon S3
Amazon EMR with Amazon S3 is your Data Warehouse 
Hive,Pig, 
Cascading 
Spark 
Presto 
HBase 
Amazon S3
Disaster Recovery built in 
Cluster 1 
Cluster 2 
Cluster 3 
Cluster 4 
Amazon S3 
Availability Zone 
Availability Zone
Amazon EMR reads from and writes to AWS data sources 
Amazon S3 
bucket 
Amazon 
Kinesis 
Amazon 
DynamoDB 
Amazon S3 
bucket 
Amazon 
DynamoDB 
Amazon 
Redshift
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014
Client/Sensor Recording 
Service 
Aggregator/ 
Sequencer 
Continuous 
Processor 
Data 
Warehouse 
Analytics and 
Reporting
Client/Sensor Recording 
Service 
Aggregator/ 
Sequencer 
Continuous 
Processor 
Data 
Warehouse 
Analytics and 
Reporting 
Kafka
Streaming Data Repository 
Amazon Kinesis
Amazon Kinesis + Amazon EMR= Fewer Moving Parts 
Client/ Sensor Recording 
Service 
Aggregator/ 
Sequencer 
Continuous 
Processor for 
Dashboard 
Data 
Warehouse 
Analytics and 
Reporting 
Amazon Kinesis Amazon EMR 
Logging Streaming Data Repository Data Processing 
Log4J
Processing 
Input 
• User 
• Dev 
push to 
Hive 
Pig 
Cascading 
pull from 
Spark 
Amazon Kinesis 
Amazon DynamoDB
Processing Amazon Kinesis data from Amazon EMR using Hive 
privatestaticfinalKinesisAppender.class 
SELECT 
FROM 
WHERE 
InstanceTime | InstnaceID| Message 
11/13/2014:07:51 InstanceID123Cannot find resource XYZ
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014
(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014
Amazon S3
Long-Running Clusters 
Scheduled Jobs
Amazon EMR integrates with your tools
Recent Integrations
Recent Integrations
http://emr.looker.com
Flexible and cost effective –Burst when you need to 
672 
0.113 
75.936 
On Demand pricing 
672 
14.784 
Reserved Instance 
672 
accepted 
10.08 
Spot Instance 
0.015 
481 
0.022 
Bill 
$10.08
Setup for security
Flexible, Reliable, Scalable, Secure, and Low-Cost Data Warehouse
AWS Big Data Blog 
•R 
•Amazon Kinesis 
•Visualization with Tableau 
•Bootstrap actions and steps 
http://blogs.aws.amazon.com/bigdata/
Get started today 
http://aws.amazon.com/elasticmapreduce/
http://bit.ly/awsevals

More Related Content

What's hot

Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Amazon Web Services
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
Optimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics WorkloadsOptimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics WorkloadsAmazon Web Services
 
Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSAmazon Web Services
 
Optimizing Storage for Big Data/Analytics Workloads
Optimizing Storage for Big Data/Analytics WorkloadsOptimizing Storage for Big Data/Analytics Workloads
Optimizing Storage for Big Data/Analytics WorkloadsAmazon Web Services
 
Co 4, session 2, aws analytics services
Co 4, session 2, aws analytics servicesCo 4, session 2, aws analytics services
Co 4, session 2, aws analytics servicesm vaishnavi
 
Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Web Services
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudAmazon Web Services
 
Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
Getting started with Amazon DynamoDB
Getting started with Amazon DynamoDBGetting started with Amazon DynamoDB
Getting started with Amazon DynamoDBAmazon Web Services
 
Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSightAmazon Web Services
 
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQLNEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQLAmazon Web Services
 
AWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAmazon Web Services
 
Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3Amazon Web Services
 
Data Design for Microservices - DevDay Austin 2017 Day 2
Data Design for Microservices - DevDay Austin 2017 Day 2Data Design for Microservices - DevDay Austin 2017 Day 2
Data Design for Microservices - DevDay Austin 2017 Day 2Amazon Web Services
 

What's hot (20)

Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2Scalable Data Analytics - DevDay Austin 2017 Day 2
Scalable Data Analytics - DevDay Austin 2017 Day 2
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
 
Optimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics WorkloadsOptimizing Storage for Big Data Analytics Workloads
Optimizing Storage for Big Data Analytics Workloads
 
Architecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWSArchitecting a Serverless Data Lake on AWS
Architecting a Serverless Data Lake on AWS
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Optimizing Storage for Big Data/Analytics Workloads
Optimizing Storage for Big Data/Analytics WorkloadsOptimizing Storage for Big Data/Analytics Workloads
Optimizing Storage for Big Data/Analytics Workloads
 
Co 4, session 2, aws analytics services
Co 4, session 2, aws analytics servicesCo 4, session 2, aws analytics services
Co 4, session 2, aws analytics services
 
Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview Amazon Athena Capabilities and Use Cases Overview
Amazon Athena Capabilities and Use Cases Overview
 
Amazon Kinesis Data Streams
Amazon Kinesis Data StreamsAmazon Kinesis Data Streams
Amazon Kinesis Data Streams
 
Building Data Lakes in the AWS Cloud
Building Data Lakes in the AWS CloudBuilding Data Lakes in the AWS Cloud
Building Data Lakes in the AWS Cloud
 
Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2Databases in the Cloud - DevDay Austin 2017 Day 2
Databases in the Cloud - DevDay Austin 2017 Day 2
 
BDA311 Introduction to AWS Glue
BDA311 Introduction to AWS GlueBDA311 Introduction to AWS Glue
BDA311 Introduction to AWS Glue
 
Introduction to Amazon Athena
Introduction to Amazon AthenaIntroduction to Amazon Athena
Introduction to Amazon Athena
 
Getting started with Amazon DynamoDB
Getting started with Amazon DynamoDBGetting started with Amazon DynamoDB
Getting started with Amazon DynamoDB
 
Getting Started with Amazon QuickSight
Getting Started with Amazon QuickSightGetting Started with Amazon QuickSight
Getting Started with Amazon QuickSight
 
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQLNEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
NEW LAUNCH! Intro to Amazon Athena. Analyze data in S3, using SQL
 
AWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWSAWS Summit Auckland - Building a Server-less Data Lake on AWS
AWS Summit Auckland - Building a Server-less Data Lake on AWS
 
Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3
 
Data Design for Microservices - DevDay Austin 2017 Day 2
Data Design for Microservices - DevDay Austin 2017 Day 2Data Design for Microservices - DevDay Austin 2017 Day 2
Data Design for Microservices - DevDay Austin 2017 Day 2
 

Viewers also liked

(EDU203) Instructing on the Cloud: Using AWS to Aid Professors and Teach Stud...
(EDU203) Instructing on the Cloud: Using AWS to Aid Professors and Teach Stud...(EDU203) Instructing on the Cloud: Using AWS to Aid Professors and Teach Stud...
(EDU203) Instructing on the Cloud: Using AWS to Aid Professors and Teach Stud...Amazon Web Services
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAmazon Web Services
 
AWS meetup「Apache Spark on EMR」
AWS meetup「Apache Spark on EMR」AWS meetup「Apache Spark on EMR」
AWS meetup「Apache Spark on EMR」SmartNews, Inc.
 
Kafka and Stream Processing, Taking Analytics Real-time, Mike Spicer
Kafka and Stream Processing, Taking Analytics Real-time, Mike SpicerKafka and Stream Processing, Taking Analytics Real-time, Mike Spicer
Kafka and Stream Processing, Taking Analytics Real-time, Mike Spicerconfluent
 
Disaster Recovery of on-premises IT infrastructure with AWS
Disaster Recovery of on-premises IT infrastructure with AWSDisaster Recovery of on-premises IT infrastructure with AWS
Disaster Recovery of on-premises IT infrastructure with AWSAmazon Web Services
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Amazon Web Services
 
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...Amazon Web Services
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftAmazon Web Services
 
深入淺出 AWS 大數據工具
深入淺出 AWS 大數據工具深入淺出 AWS 大數據工具
深入淺出 AWS 大數據工具Amazon Web Services
 

Viewers also liked (10)

(EDU203) Instructing on the Cloud: Using AWS to Aid Professors and Teach Stud...
(EDU203) Instructing on the Cloud: Using AWS to Aid Professors and Teach Stud...(EDU203) Instructing on the Cloud: Using AWS to Aid Professors and Teach Stud...
(EDU203) Instructing on the Cloud: Using AWS to Aid Professors and Teach Stud...
 
AWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions ShowcaseAWS Webcast - Informatica - Big Data Solutions Showcase
AWS Webcast - Informatica - Big Data Solutions Showcase
 
AWS meetup「Apache Spark on EMR」
AWS meetup「Apache Spark on EMR」AWS meetup「Apache Spark on EMR」
AWS meetup「Apache Spark on EMR」
 
Kafka and Stream Processing, Taking Analytics Real-time, Mike Spicer
Kafka and Stream Processing, Taking Analytics Real-time, Mike SpicerKafka and Stream Processing, Taking Analytics Real-time, Mike Spicer
Kafka and Stream Processing, Taking Analytics Real-time, Mike Spicer
 
Disaster Recovery of on-premises IT infrastructure with AWS
Disaster Recovery of on-premises IT infrastructure with AWSDisaster Recovery of on-premises IT infrastructure with AWS
Disaster Recovery of on-premises IT infrastructure with AWS
 
Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift Uses and Best Practices for Amazon Redshift
Uses and Best Practices for Amazon Redshift
 
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
Best Practices for Building a Data Lake with Amazon S3 - August 2016 Monthly ...
 
Building Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon RedshiftBuilding Your Data Warehouse with Amazon Redshift
Building Your Data Warehouse with Amazon Redshift
 
深入淺出 AWS 大數據工具
深入淺出 AWS 大數據工具深入淺出 AWS 大數據工具
深入淺出 AWS 大數據工具
 
IAM Introduction
IAM IntroductionIAM Introduction
IAM Introduction
 

Similar to (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014

Amazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto MeetupAmazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto Meetupstevemcpherson
 
AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014
AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014
AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014Amazon Web Services
 
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduceAmazon Web Services
 
Your First 10 million Users on the AWS Cloud
Your First 10 million Users on the AWS CloudYour First 10 million Users on the AWS Cloud
Your First 10 million Users on the AWS CloudAmazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
Your First 10 Million Users with Amazon Web Services
Your First 10 Million Users with Amazon Web ServicesYour First 10 Million Users with Amazon Web Services
Your First 10 Million Users with Amazon Web ServicesAmazon Web Services
 
Scaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersScaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersAmazon Web Services
 
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRSpark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRAmazon Web Services
 
AWS re:Invent 2016 recap (part 2)
AWS re:Invent 2016 recap (part 2) AWS re:Invent 2016 recap (part 2)
AWS re:Invent 2016 recap (part 2) Julien SIMON
 
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017Amazon Web Services
 
Building a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWSBuilding a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWSAmazon Web Services
 
4K Media Workflows on AWS By Usman Shakeel of Amzaon AWS
4K Media Workflows on AWS By Usman Shakeel of Amzaon AWS4K Media Workflows on AWS By Usman Shakeel of Amzaon AWS
4K Media Workflows on AWS By Usman Shakeel of Amzaon AWSETCenter
 
Architecting Cloud Apps
Architecting Cloud AppsArchitecting Cloud Apps
Architecting Cloud Appsjineshvaria
 
Best Practices Scaling Web Application Up to Your First 10 Million Users
Best Practices Scaling Web Application Up to Your First 10 Million UsersBest Practices Scaling Web Application Up to Your First 10 Million Users
Best Practices Scaling Web Application Up to Your First 10 Million UsersAmazon Web Services
 
Build A Website on AWS for Your First 10 Million Users
Build A Website on AWS for Your First 10 Million UsersBuild A Website on AWS for Your First 10 Million Users
Build A Website on AWS for Your First 10 Million UsersAmazon Web Services
 
Escalando para sus primeros 10 millones de usuarios
Escalando para sus primeros 10 millones de usuariosEscalando para sus primeros 10 millones de usuarios
Escalando para sus primeros 10 millones de usuariosAmazon Web Services LATAM
 
AWS Webcast - Understanding database options
AWS Webcast - Understanding database optionsAWS Webcast - Understanding database options
AWS Webcast - Understanding database optionsAmazon Web Services
 
Basics AWS Presentation
Basics AWS PresentationBasics AWS Presentation
Basics AWS PresentationShyam Kumar
 

Similar to (BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014 (20)

Amazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto MeetupAmazon EMR Facebook Presto Meetup
Amazon EMR Facebook Presto Meetup
 
AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014
AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014
AWS Cloud Kata 2014 | Jakarta - 2-1 AWS Intro and Scale 2014
 
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce(BDT208) A Technical Introduction to Amazon Elastic MapReduce
(BDT208) A Technical Introduction to Amazon Elastic MapReduce
 
4K Media Workflows on AWS
4K Media Workflows on AWS4K Media Workflows on AWS
4K Media Workflows on AWS
 
Your First 10 million Users on the AWS Cloud
Your First 10 million Users on the AWS CloudYour First 10 million Users on the AWS Cloud
Your First 10 million Users on the AWS Cloud
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
Your First 10 Million Users with Amazon Web Services
Your First 10 Million Users with Amazon Web ServicesYour First 10 Million Users with Amazon Web Services
Your First 10 Million Users with Amazon Web Services
 
Scaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million UsersScaling Up to Your First 10 Million Users
Scaling Up to Your First 10 Million Users
 
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRSpark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
 
AWS re:Invent 2016 recap (part 2)
AWS re:Invent 2016 recap (part 2) AWS re:Invent 2016 recap (part 2)
AWS re:Invent 2016 recap (part 2)
 
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
Building a Data Processing Pipeline on AWS - AWS Summit SG 2017
 
AWS 101
AWS 101AWS 101
AWS 101
 
Building a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWSBuilding a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWS
 
4K Media Workflows on AWS By Usman Shakeel of Amzaon AWS
4K Media Workflows on AWS By Usman Shakeel of Amzaon AWS4K Media Workflows on AWS By Usman Shakeel of Amzaon AWS
4K Media Workflows on AWS By Usman Shakeel of Amzaon AWS
 
Architecting Cloud Apps
Architecting Cloud AppsArchitecting Cloud Apps
Architecting Cloud Apps
 
Best Practices Scaling Web Application Up to Your First 10 Million Users
Best Practices Scaling Web Application Up to Your First 10 Million UsersBest Practices Scaling Web Application Up to Your First 10 Million Users
Best Practices Scaling Web Application Up to Your First 10 Million Users
 
Build A Website on AWS for Your First 10 Million Users
Build A Website on AWS for Your First 10 Million UsersBuild A Website on AWS for Your First 10 Million Users
Build A Website on AWS for Your First 10 Million Users
 
Escalando para sus primeros 10 millones de usuarios
Escalando para sus primeros 10 millones de usuariosEscalando para sus primeros 10 millones de usuarios
Escalando para sus primeros 10 millones de usuarios
 
AWS Webcast - Understanding database options
AWS Webcast - Understanding database optionsAWS Webcast - Understanding database options
AWS Webcast - Understanding database options
 
Basics AWS Presentation
Basics AWS PresentationBasics AWS Presentation
Basics AWS Presentation
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Recently uploaded

Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Muhammad Tiham Siddiqui
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInThousandEyes
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024Brian Pichman
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosErol GIRAUDY
 
EMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarEMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarThousandEyes
 
Introduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationIntroduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationKnoldus Inc.
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3DianaGray10
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applicationsnooralam814309
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0DanBrown980551
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsDianaGray10
 
UiPath Studio Web workshop series - Day 1
UiPath Studio Web workshop series  - Day 1UiPath Studio Web workshop series  - Day 1
UiPath Studio Web workshop series - Day 1DianaGray10
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTxtailishbaloch
 
AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024Brian Pichman
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxNeo4j
 
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdfQ4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdfTejal81
 
How to release an Open Source Dataweave Library
How to release an Open Source Dataweave LibraryHow to release an Open Source Dataweave Library
How to release an Open Source Dataweave Libraryshyamraj55
 
20140402 - Smart house demo kit
20140402 - Smart house demo kit20140402 - Smart house demo kit
20140402 - Smart house demo kitJamie (Taka) Wang
 
3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud DataEric D. Schabell
 
Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.IPLOOK Networks
 

Recently uploaded (20)

Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)Trailblazer Community - Flows Workshop (Session 2)
Trailblazer Community - Flows Workshop (Session 2)
 
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedInOutage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
Outage Analysis: March 5th/6th 2024 Meta, Comcast, and LinkedIn
 
CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024CyberSecurity - Computers In Libraries 2024
CyberSecurity - Computers In Libraries 2024
 
Scenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenariosScenario Library et REX Discover industry- and role- based scenarios
Scenario Library et REX Discover industry- and role- based scenarios
 
EMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? WebinarEMEA What is ThousandEyes? Webinar
EMEA What is ThousandEyes? Webinar
 
Introduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its applicationIntroduction to RAG (Retrieval Augmented Generation) and its application
Introduction to RAG (Retrieval Augmented Generation) and its application
 
UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3UiPath Studio Web workshop Series - Day 3
UiPath Studio Web workshop Series - Day 3
 
Graphene Quantum Dots-Based Composites for Biomedical Applications
Graphene Quantum Dots-Based Composites for  Biomedical ApplicationsGraphene Quantum Dots-Based Composites for  Biomedical Applications
Graphene Quantum Dots-Based Composites for Biomedical Applications
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0LF Energy Webinar - Unveiling OpenEEMeter 4.0
LF Energy Webinar - Unveiling OpenEEMeter 4.0
 
Automation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projectsAutomation Ops Series: Session 2 - Governance for UiPath projects
Automation Ops Series: Session 2 - Governance for UiPath projects
 
UiPath Studio Web workshop series - Day 1
UiPath Studio Web workshop series  - Day 1UiPath Studio Web workshop series  - Day 1
UiPath Studio Web workshop series - Day 1
 
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENTSIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
SIM INFORMATION SYSTEM: REVOLUTIONIZING DATA MANAGEMENT
 
AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024AI Workshops at Computers In Libraries 2024
AI Workshops at Computers In Libraries 2024
 
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptxGraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
GraphSummit Copenhagen 2024 - Neo4j Vision and Roadmap.pptx
 
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdfQ4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
Q4 2023 Quarterly Investor Presentation - FINAL - v1.pdf
 
How to release an Open Source Dataweave Library
How to release an Open Source Dataweave LibraryHow to release an Open Source Dataweave Library
How to release an Open Source Dataweave Library
 
20140402 - Smart house demo kit
20140402 - Smart house demo kit20140402 - Smart house demo kit
20140402 - Smart house demo kit
 
3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data3 Pitfalls Everyone Should Avoid with Cloud Data
3 Pitfalls Everyone Should Avoid with Cloud Data
 
Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.Introduction - IPLOOK NETWORKS CO., LTD.
Introduction - IPLOOK NETWORKS CO., LTD.
 

(BDT308) Using Amazon Elastic MapReduce as Your Scalable Data Warehouse | AWS re:Invent 2014

  • 1. November 14, 2014 | Las Vegas, NV Steve McPherson
  • 2. instance AMI DB on instance instance with CloudWatch Elastic IP optimized instance Amazon WorkSpaces assignment/ task Amazon EMR cluster MapR M3 engine MapR M5 engine MapR M7 engine engine Kinesis-enabled app new! Amazon Route 53 hosted zone route table solid state disks AWS Direct Connect router Amazon RDS customer gateway attribute VPC peering Auto Scaling Amazon S3 bucket with objects object AWS Import/Export AWS Storage Gateway Amazon EBS volume snapshot cached volume virtual tape library Elastic Beanstalk Amazon Glacier archive vault CloudFront download distribution Node.js streaming distribution items DynamoDB table attributes global secondary index Amazon RDS DB Kinesis instance RDS DB instance standby (Multi-AZ) Oracle DB instance MS SQL instance PostgreSQL instance PIOP Redis Memcached new! new! new! new! AWS CloudTrail instances Amazon SimpleDB domain Amazon Redshift new! DW1 Dense Compute ElastiCache DW2 Dense Compute edge location AWS Toolkit for Visual Studio applicatiJoanvaScript stack Amazon VPC VPN connection virtual private gateway alarm stack Internet gateway .NET RDS DB instance read replica IAMJava Python (boto) AWS CLI permissions role MFA token new! new! new! AWS OpsWorks elastic network instance data encryption PHP key AWS Data Pipeline monitoring new! new! deployment CloudWatch Elastic Load Balancing SQL master new! new! Amazon EC2 new! SQL slave encrypted data AWS Tools for Windows PowerShell non-cached volume users IAM add-on deployments bucket deployments new! permissions iOS resources cache node stack AWS OpsWorks layers apps new! new! apps new! Amazon SNS new! Human Intelligence Tasks (HIT) AWS Simple Icons: Deployment & Management instances new! new! new! Ruby new! instances new! resources permissions new! topic new! template AWS Toolkit for Eclipse Amazon SES traditional server Elastic Transcoder email monitoring Requester email notification HTTP notification Amazon CloudSearch SDF metadata Amazon SQS item message Amazon SWF decider layers worker disk tape storage Internet user Amazon Mechanical Turk client mobile client multimedia workers corporate data center generic database Android AWS Security Token Service AWS cloud AWS Management Console virtual private cloud forums MySQL DB instance queue AMAZON EMR
  • 3. Big decisions need Big Data Server Purchase Social Media
  • 4. Extract Transform Load to Data Warehouse Report Generation Ad Hoc Analysis
  • 6. Difficult, expensive, and time consuming to operate Hadoop But Hadoop needs help
  • 7. Amazon EMR makes Hadoop easy
  • 8. Extract Transform & Load Data Warehouse Report Generation & Ad Hoc Analysis Amazon S3 •MapReduce API •Scoop •Spark •Cascading •Pig •MR •Hive •Spark •Cascading •Pig •Presto •Hive •Spark-SQL •Lingual •Parquet •ORC •SEQ •Text Extract Transform & Load Data Warehouse Report Generation Ad Hoc Analysis write read
  • 9. Amazon S3 is your Data Lake Amazon S3
  • 10. Amazon EMR with Amazon S3 is your Data Warehouse Hive,Pig, Cascading Spark Presto HBase Amazon S3
  • 11. Disaster Recovery built in Cluster 1 Cluster 2 Cluster 3 Cluster 4 Amazon S3 Availability Zone Availability Zone
  • 12. Amazon EMR reads from and writes to AWS data sources Amazon S3 bucket Amazon Kinesis Amazon DynamoDB Amazon S3 bucket Amazon DynamoDB Amazon Redshift
  • 15. Client/Sensor Recording Service Aggregator/ Sequencer Continuous Processor Data Warehouse Analytics and Reporting
  • 16. Client/Sensor Recording Service Aggregator/ Sequencer Continuous Processor Data Warehouse Analytics and Reporting Kafka
  • 17. Streaming Data Repository Amazon Kinesis
  • 18. Amazon Kinesis + Amazon EMR= Fewer Moving Parts Client/ Sensor Recording Service Aggregator/ Sequencer Continuous Processor for Dashboard Data Warehouse Analytics and Reporting Amazon Kinesis Amazon EMR Logging Streaming Data Repository Data Processing Log4J
  • 19. Processing Input • User • Dev push to Hive Pig Cascading pull from Spark Amazon Kinesis Amazon DynamoDB
  • 20. Processing Amazon Kinesis data from Amazon EMR using Hive privatestaticfinalKinesisAppender.class SELECT FROM WHERE InstanceTime | InstnaceID| Message 11/13/2014:07:51 InstanceID123Cannot find resource XYZ
  • 26. Amazon EMR integrates with your tools
  • 30. Flexible and cost effective –Burst when you need to 672 0.113 75.936 On Demand pricing 672 14.784 Reserved Instance 672 accepted 10.08 Spot Instance 0.015 481 0.022 Bill $10.08
  • 32. Flexible, Reliable, Scalable, Secure, and Low-Cost Data Warehouse
  • 33. AWS Big Data Blog •R •Amazon Kinesis •Visualization with Tableau •Bootstrap actions and steps http://blogs.aws.amazon.com/bigdata/
  • 34. Get started today http://aws.amazon.com/elasticmapreduce/