March 19, 2015 | Facebook Presto Meetup
Steve McPherson
Amazon EMR
Amazon EMR makes cluster management easy
• Setup and configuration
• Node monitoring and replacement
• Log aggregation
• CloudWatch integration
• Expand and shrink on demand
• Integration with Spot
• AWS Support
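To make "expand and shrink on demand" concrete, here is a minimal sketch that assembles an `aws emr modify-instance-groups` call for resizing one instance group. It only builds the command and does not run it. The helper name `build_resize_command` and the instance group ID `ig-EXAMPLE` are hypothetical placeholders, not values from the talk.

```python
import shlex
from typing import List


def build_resize_command(instance_group_id: str, instance_count: int) -> List[str]:
    """Build (but do not run) an AWS CLI call that resizes one EMR instance group.

    instance_group_id is a placeholder here; a real ID comes from the cluster
    description of a running cluster.
    """
    if instance_count < 0:
        raise ValueError("instance_count must be non-negative")
    return [
        "aws", "emr", "modify-instance-groups",
        "--instance-groups",
        f"InstanceGroupId={instance_group_id},InstanceCount={instance_count}",
    ]


# Hypothetical example: grow an instance group to 10 nodes.
command = build_resize_command("ig-EXAMPLE", 10)
print(" ".join(shlex.quote(part) for part in command))
```

Building the argument list separately from executing it keeps the sketch safe to run anywhere; passing the list to `subprocess.run` would issue the real request against an AWS account.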
Extract, Transform & Load | Data Warehouse | Report Generation & Ad Hoc Analysis
Amazon S3
• MapReduce API
• Sqoop
• Spark
• Cascading
• Pig
• MR
• Hive
• Spark
• Cascading
• Pig
• Presto
• Hive
• Spark-SQL
• Lingual
• Parquet
• ORC
• SEQ
• Text
[Diagram: Extract, Transform & Load; Data Warehouse; Report Generation; Ad Hoc Analysis; arrows labeled "write" and "read"]
Different clusters for different workloads
• Hive, Pig, Cascading
• Presto
• Spark
• HBase
Amazon S3
Cluster launch parameters: name, ami-version, applications, ec2-attributes, instance-groups, bootstrap-action
#wait 5 minutes
hive
presto-cli --catalog hive
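
The launch parameters and the `presto-cli --catalog hive` step above can be tied together in a rough sketch. The Python helper below assembles an `aws emr create-cluster` command from those parameter names without running it. Every value is an assumption made for illustration: the cluster name, AMI version, key pair, instance type, node count, and especially the bootstrap script path are hypothetical placeholders, not the ones shown in the talk. The AWS CLI documents the option as `--bootstrap-actions`, which is the spelling used here.

```python
import shlex
from typing import List


def build_create_cluster_command(
    name: str,
    ami_version: str,
    key_name: str,
    instance_type: str,
    core_count: int,
    bootstrap_path: str,
) -> List[str]:
    """Build (but do not run) an `aws emr create-cluster` argument list."""
    return [
        "aws", "emr", "create-cluster",
        "--name", name,
        "--ami-version", ami_version,
        # Hive is installed so that Presto can read tables through the hive catalog.
        "--applications", "Name=Hive",
        "--ec2-attributes", f"KeyName={key_name}",
        "--instance-groups",
        f"InstanceGroupType=MASTER,InstanceCount=1,InstanceType={instance_type}",
        f"InstanceGroupType=CORE,InstanceCount={core_count},InstanceType={instance_type}",
        # Hypothetical bootstrap script location; substitute a real installer path.
        "--bootstrap-actions", f"Path={bootstrap_path},Name=InstallPresto",
    ]


# All values below are placeholders chosen for illustration only.
command = build_create_cluster_command(
    name="presto-demo",
    ami_version="3.3.2",
    key_name="my-key-pair",
    instance_type="m3.xlarge",
    core_count=2,
    bootstrap_path="s3://my-bucket/install-presto.sh",
)
print(" ".join(shlex.quote(part) for part in command))
```

Once such a cluster is up, the remaining steps are the ones listed above: define tables in `hive`, then query them with `presto-cli --catalog hive`. Calling the helper with different names and sizes is one way to describe separate clusters per workload that share the same data in Amazon S3.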
Get started today
http://aws.amazon.com/elasticmapreduce/
