1
LOOK BEFORE
YOU LEAP
Migrating On-Premises Hadoop to AWS
2
Today’s Speakers
Cobus Bernard
Senior Developer Advocate at
Amazon Web Services
cobusb@amazon.com
Jason Baick
Senior Director of Product
Marketing, Unravel Data
jbaick@unraveldata.com
3
1. Top reasons customers choose AWS for their cloud
migration journey
2. Advantages of planning out your Hadoop migration
to AWS
3. Demo: Migration assessment capabilities to ensure
risk-free migration
Today’s
Agenda
4
• Top reasons customers choose AWS for their
cloud migration journey
AWS
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Cobus Bernard
Sr Developer Advocate, Amazon Web Services
Migrating Hadoop to Amazon
EMR
Updated: 28-June-2018
Amazon EMR
• Managed clusters for Hadoop, Spark, Presto, or any other
applications in the Apache/Hadoop stack
• Integrated with the AWS platform via EMRFS – connectors for
Amazon S3, Amazon DynamoDB, Amazon Kinesis, Amazon
Redshift, and AWS KMS
• Secure, with support for AWS IAM roles, Kerberos, AWS KMS, S3 client-side
encryption, Hadoop transparent encryption, and Amazon VPC; HIPAA-eligible
• Built-in support for resizing clusters and integration with the
Amazon EC2 Spot market to help lower costs
EMR Automation
• EC2 provisioning
• Cluster setup
• Hadoop configuration
• Installing applications
• Job submission
• Monitoring and failure handling
[Diagram labels: YARN, Pig, SQL; Amazon EMR; EMRFS; Amazon S3]
Decouple Storage and Compute
Persistent Cluster – Interactive Queries
(Spark-SQL | Presto)
Transient Cluster - Batch Jobs
(X hours nightly) – Add/Remove Nodes
External Metastore
Workload specific clusters
(Different sizes, Different Versions)
Amazon S3
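The appeal of a transient cluster for nightly batch jobs is mostly arithmetic: with data kept in Amazon S3, compute is only paid for while the job runs. A minimal sketch of that comparison follows. The node count, hourly rate, and run time are hypothetical placeholders, not AWS prices, and the function name is illustrative only.

```python
def monthly_cluster_cost(nodes, hourly_rate_per_node, hours_per_day, days=30):
    """Compute-only cost of a cluster that runs `hours_per_day` each day.

    All inputs are placeholders; real pricing depends on instance type,
    region, EMR surcharge, and purchase option (On-Demand, Spot, Reserved).
    """
    return nodes * hourly_rate_per_node * hours_per_day * days


# Hypothetical figures for illustration only.
NODES = 10
RATE = 0.50          # assumed cost per node-hour
BATCH_HOURS = 4      # the "X hours nightly" from the slide, chosen arbitrarily

persistent = monthly_cluster_cost(NODES, RATE, hours_per_day=24)
transient = monthly_cluster_cost(NODES, RATE, hours_per_day=BATCH_HOURS)
savings = persistent - transient
```

Storage cost in Amazon S3 is left out on purpose, since it is the same in both cases once storage and compute are decoupled.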
Compute Flexibility
• General (batch processing): M5 family, M4 family
• Compute (machine learning): C5 family, C4 family
• Memory (interactive analysis): X1 family, R4 family
• Storage (large HDFS): D2 family, I3 family
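The pairings on the Compute Flexibility slide can be expressed as a small lookup table. This is a sketch only: the grouping of workloads to categories is read from the slide layout, and the names `INSTANCE_FAMILIES` and `suggest_families` are hypothetical, not part of any AWS API.

```python
# Category -> example workload and EC2 instance families, as listed on the slide.
INSTANCE_FAMILIES = {
    "general": {"workload": "batch processing", "families": ["M5", "M4"]},
    "compute": {"workload": "machine learning", "families": ["C5", "C4"]},
    "memory": {"workload": "interactive analysis", "families": ["X1", "R4"]},
    "storage": {"workload": "large hdfs", "families": ["D2", "I3"]},
}


def suggest_families(workload):
    """Return (category, families) for a workload name from the slide."""
    key = workload.strip().lower()
    for category, entry in INSTANCE_FAMILIES.items():
        if entry["workload"] == key:
            return category, entry["families"]
    raise KeyError(f"no instance family listed for workload: {workload!r}")


category, families = suggest_families("Machine Learning")
```

A real sizing decision would look at measured CPU, memory, and I/O rather than a workload label; the table is only a starting point.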
19
• Advantages of planning out your Hadoop
migration to AWS
Unravel
20
Unravel: Extensible Data Operations Platform
21
Data pipelines are being
re-architected for cloud.
Cloud markets growing 7.5 times faster
than on-premises.
7.5x
22
• On-premises: Hadoop, Spark, Kafka, NoSQL
• IaaS: Hadoop, Spark, Kafka, NoSQL
• PaaS: Amazon EMR
• Cloud native: Amazon Redshift
Spectrum of platforms for analytics in the cloud
Unravel is your partner for your cloud journey
Migration paths:
• Lift and Shift
• Cost Reduction
• Workload Fit
Stages: Operate, Migrate, Operate
23
Common Questions During Migration
• I need full accounting controls for chargeback.
• What instance types should I be spinning up?
• How do I reliably forecast capacity?
• What apps should I migrate first?
• Is lift-and-shift the right approach?
• How do I tier storage to optimize my costs?
24
Unravel Cloud Migration Workflow
• Cluster Discovery: understand your cluster and application workloads
• Cloud Mapping: map your on-premises cluster to an AWS deployment
• Validate and Manage: compare the performance of the app in the new
environment and tune it using Unravel insights and recommendations
25
Problem we solve: Cloud Sticker Shock
• On-premises data workloads are too complex and are not delivering the
expected value, and enterprises lack the skills to effect change
• Cloud data services are promising but are complex themselves
• Cloud migration projects are challenged by
- Unexpected costs
- Lack of data and insights to make the right migration decisions
26
Accelerated cloud migration assessment
Without Unravel (timeline segments shown: 2 days, 14 days, 7 days, 7 days, 7 days)
Discovery Meeting
• Project scoping
• Identify stakeholders
• Define milestones, KPIs, and project timeline
Technical Discovery
• Detailed technical scoping
• Define use cases
• Data source identification and systems access controls
• Manual data gathering
Manual Correlation
• Creation of manual reports
• Correlation of data silos
• Build analytical models for ‘best-fit’ recommendations
• Manual adjustments and model revisions during iterations
Completed Assessment
With Unravel (timeline segments shown: 2 days, 8 days)
Discovery Meeting
• Project scoping
• Identify stakeholders
• Define milestones, KPIs, and project timeline
Technical Discovery
• Detailed technical scoping
• Define use cases
• Product installation
• Begin workload data gathering
Completed Assessment
• Detailed migration assessment report
• Detailed insights and recommendations
Value of Unravel
27
Unravel is central to your Data Operations:
Current and future state
On-premises Cloud/Hybrid
• Unified Visibility
• Optimize Performance
• Automate Operations
• Optimize Costs
• Accelerate App Deployment
• SLA Assurance
• Baseline Workloads
• Instance mapping
• Capacity Forecasting
Assess & Plan
Journey to the cloud
1 2 3
28
• Demo: Migration assessment capabilities to
ensure risk-free migration
Unravel
29
Demo
30
Cloud Migration
Assessment Offer
31
Cloud Migration Assessment Offer
Day 1: DISCOVERY MEETING
• Project scoping
• Identify stakeholders
• Define milestones, KPIs, and project timeline
Day 2-5: TECHNICAL DISCOVERY
• Detailed technical scoping
• Define use cases
• Product installation
• Begin workload data gathering
• Define data gathering interval
Day 10 (+ Ω): INITIAL READOUT
• Metrics readout
• Infrastructure summary
• Workload summary
• Summary of insights
• Summary of recommendations
Day 15 (+ ∆): COMPLETED ASSESSMENT
• Detailed migration assessment report
• Detailed insights and recommendations
• Define customer next steps
• Summary of insights
• Summary of recommendations
Ω = Data gathering interval; ∆ = Technical teams validate
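The day markers on this slide (Day 1, Day 2-5, Day 10 + Ω, Day 15 + ∆) can be turned into calendar dates for a given start date. The sketch below makes several assumptions the slide does not state: days are calendar days, Day 1 is the start date itself, and Ω and ∆ are independent offsets applied exactly as written on the slide. The function and key names are hypothetical.

```python
from datetime import date, timedelta


def assessment_schedule(start, gathering_interval_days=0, validation_days=0):
    """Map the slide's day markers to dates.

    gathering_interval_days is the slide's Ω; validation_days is its ∆.
    Assumes calendar days and that Day 1 falls on `start`.
    """
    def day(n, extra=0):
        # Day n is (n - 1) days after the start, plus any extra offset.
        return start + timedelta(days=n - 1 + extra)

    return {
        "discovery_meeting": day(1),
        "technical_discovery_start": day(2),
        "technical_discovery_end": day(5),
        "initial_readout": day(10, gathering_interval_days),
        "completed_assessment": day(15, validation_days),
    }


# Example with arbitrary values: start 1 June 2020, Ω = 3 days, ∆ = 2 days.
schedule = assessment_schedule(date(2020, 6, 1), 3, 2)
```

If the validation period is meant to start only after the extended readout, the completed-assessment date would need Ω added as well; the slide does not say either way.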
32
Unravel provides insights and recommendations
to accelerate migration
• Ensure the best possible transition with insights and guidance before,
during, and after migration
• Recommendations for the best apps to migrate
• Mapping on-premises infrastructure to cloud instance types
- Lift and shift
- Cost reduction
- Workload fit
• Cloud capacity planning and chargeback reporting
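To make the difference between mapping strategies concrete, here is a minimal sketch. It assumes, without confirmation from the slides, that "lift and shift" sizes an instance from the host's provisioned capacity while "cost reduction" sizes it from observed peak usage; "workload fit" is not modeled. The catalog is a tiny hand-picked subset of M5 sizes, and `smallest_fit` and `map_host` are hypothetical names, not Unravel or AWS APIs.

```python
from typing import NamedTuple


class Shape(NamedTuple):
    name: str
    vcpus: int
    memory_gib: int


# Small illustrative catalog, ordered from smallest to largest.
CATALOG = [
    Shape("m5.xlarge", 4, 16),
    Shape("m5.2xlarge", 8, 32),
    Shape("m5.4xlarge", 16, 64),
]


def smallest_fit(vcpus, memory_gib):
    """Return the first (smallest) catalog entry that covers both needs."""
    for shape in CATALOG:
        if shape.vcpus >= vcpus and shape.memory_gib >= memory_gib:
            return shape.name
    raise ValueError("no catalog entry is large enough")


def map_host(host, strategy):
    """Pick an instance type for one on-premises host (assumed semantics)."""
    if strategy == "lift_and_shift":
        # Assumption: match the hardware the host was provisioned with.
        return smallest_fit(host["vcpus"], host["memory_gib"])
    if strategy == "cost_reduction":
        # Assumption: match what the host was actually observed to use.
        return smallest_fit(host["peak_vcpus_used"], host["peak_memory_gib_used"])
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical host: 16 cores / 64 GiB provisioned, peaking at 6 cores / 20 GiB.
host = {"vcpus": 16, "memory_gib": 64, "peak_vcpus_used": 6, "peak_memory_gib_used": 20}
lifted = map_host(host, "lift_and_shift")
reduced = map_host(host, "cost_reduction")
```

With these made-up numbers, the usage-based mapping lands on a smaller instance than the capacity-based one, which is the kind of gap a migration assessment is meant to surface.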
33
Try Unravel for Amazon EMR on AWS Marketplace
14-day free trial: get $2,000 in AWS credits for signing up
34
www.unraveldata.com
hello@unraveldata.com
