BUSI758B
Big Data Analytics On
Amazon Web Services
Yelp saved $55,000 in upfront hardware costs.
Unilever processes genetic sequences 20 times faster.
Swipely generates insight from millions of credit card
transactions.
Expedia processes clickstream data from its global
network of websites.
The Big Question is: How?
The Answer is:
Some Background on Cloud
Computing and AWS
What is Cloud Computing?
• Cloud computing is a model for enabling ubiquitous, convenient, on-demand
network access to a shared pool of configurable computing resources (e.g.,
networks, servers, storage, applications, and services) that can be rapidly
provisioned and released with minimal management effort or service provider
interaction. (NIST definition)
• This cloud model is composed of five essential characteristics, three
service models, and four deployment models.
Essential Characteristics:
• On-demand self-service.
• Broad network access.
• Resource pooling.
• Rapid elasticity.
• Measured service.
Service Models:
IaaS providers: AWS, HP Cloud, Rackspace.
PaaS providers: Google App Engine, Heroku, Red Hat OpenShift.
SaaS providers: Salesforce, LinkedIn, Taleo.
Deployment Models:
• Public Cloud
• Private Cloud
• Hybrid Cloud
• Community Cloud*
* NIST defines a community cloud as cloud infrastructure provisioned for exclusive use by a specific
community of consumers from organizations that have shared concerns (e.g., mission, security requirements,
policy, and compliance considerations).
Now, a few questions:
1. Which service model does AWS fall into?
2. What are the advantages of using a cloud platform
for big data?
3. How does AWS leverage those advantages to provide
big data analytics?
Advantages of a Cloud Platform
• Ability to scale the infrastructure.
• OPEX instead of CAPEX.
• Custom solutions as needed.
• Easier and faster deployment.
• Helps focus on core business solutions and analytics.
So it can safely be said that the cloud platform acts as an enabler of big data
technology.
AWS Big Data Analytics:
Elastic MapReduce (EMR)
Hadoop as a Service
• Amazon Elastic MapReduce supports the Hadoop
software ecosystem (Hadoop 1.x and Hadoop 2.x).
• The Amazon EMR control software is responsible for the
automated arrangement, coordination, and
management of the Hadoop cluster.
• Amazon Elastic MapReduce also supports MapR, an
Apache Hadoop-derived distribution.
Integrated with Tools
Amazon EMR gives you root access to the cluster.
Any additional software required can be installed and configured on the cluster before
Hadoop starts by creating a bootstrap action.
* Spark is installed using a bootstrap action.
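As a rough illustration of what a bootstrap action looks like when a cluster is created programmatically, the sketch below builds one entry in the shape that, to the best of our understanding, the boto3 EMR `run_job_flow` call accepts under `BootstrapActions`. The helper name, bucket, and script path are hypothetical placeholders, and nothing here contacts AWS.

```python
# Hypothetical helper: builds one bootstrap-action entry. The field names
# ("Name", "ScriptBootstrapAction", "Path", "Args") follow the boto3 EMR
# request shape as we understand it; verify against the current docs.
def bootstrap_action(name, script_s3_path, args=()):
    return {
        "Name": name,
        "ScriptBootstrapAction": {
            "Path": script_s3_path,  # script stored in S3, run before Hadoop starts
            "Args": list(args),
        },
    }

# The bucket and script below are placeholders, not real resources.
action = bootstrap_action(
    "install-extra-tools",
    "s3://example-bucket/bootstrap/install_tools.sh",
    ["--verbose"],
)
print(action)
```

A list of such entries would be passed alongside the rest of the cluster definition; the script itself is an ordinary shell script that installs or configures the extra software.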
MapReduce Engine
• Job/Task
• Roles of servers:
  a) Master node
  b) Core node
  c) Task node
• Step: unit of work
The MapReduce engine implements the distributed processing
framework of Hadoop.
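To make the map and reduce phases concrete, here is a minimal single-process sketch of the classic word-count job. It only mimics the programming model in plain Python; it is not Hadoop code, and the function names are our own.

```python
from collections import defaultdict

def map_phase(line):
    # Emit a (word, 1) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    # Group all values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Sum the counts collected for one word.
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data on AWS", "big clusters on demand"])
print(counts)
```

On a real cluster the map calls run in parallel across nodes and the shuffle moves data over the network; the logic per record stays this simple.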
MapReduce Engine (cont.)
Hadoop       AWS
Name Node    Master Node
Data Node    Core Node
Additional concepts: task nodes and steps
Task node: Task nodes are optional. You can add task nodes when you start
the cluster, or you can add task groups to a running cluster. Because they do
not store data and can be added to and removed from a cluster, you can use task
nodes to manage the EC2 instance capacity your cluster uses, increasing
capacity to handle peak loads and decreasing it later.
Steps: A step contains one or more Hadoop jobs. It is an instruction to
manipulate data using Hadoop jobs.
The maximum number of pending and active steps allowed in a cluster is 256.
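The toy model below sketches the two ideas from this slide: task nodes can be added and removed freely because they hold no data, and a cluster accepts at most 256 pending and active steps. The `Cluster` class and its methods are purely illustrative and do not correspond to any AWS API.

```python
from dataclasses import dataclass, field

MAX_PENDING_AND_ACTIVE_STEPS = 256  # limit quoted on the slide

@dataclass
class Cluster:
    core_nodes: int              # store data, so they are not resized here
    task_nodes: int = 0          # optional, hold no data
    steps: list = field(default_factory=list)

    def resize_task_nodes(self, count):
        # Task capacity can go up for peak load and back down afterwards.
        if count < 0:
            raise ValueError("task node count cannot be negative")
        self.task_nodes = count

    def add_step(self, name):
        # Reject new work once the step queue is full.
        if len(self.steps) >= MAX_PENDING_AND_ACTIVE_STEPS:
            raise RuntimeError("step limit reached")
        self.steps.append(name)

cluster = Cluster(core_nodes=2)
cluster.resize_task_nodes(4)   # scale out for a peak
cluster.add_step("word-count")
cluster.resize_task_nodes(0)   # scale back in
```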
Massively Parallel
• Virtual instances are much easier to scale.
• Quick and cost-effective scaling.
• Dynamic resizing while a job is running.
• A distributed Hadoop system in the true sense.
• Multiple clusters can access the same data.
Cost-Effective AWS Wrapper
• Spot Instances.
• Pay as you go.
• Automatic cluster termination after job completion.
• Licensed software bundled with the infrastructure.
• Economies of scale.
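Two of these cost levers, Spot Instances and automatic termination, show up directly in the cluster definition. The sketch below builds an `Instances` fragment in the shape we understand boto3's `run_job_flow` to accept; the helper name, instance type, counts, and bid price are placeholder assumptions, and no AWS call is made.

```python
def instances_config(core_count, spot_task_count, spot_bid_usd):
    """Build an 'Instances' fragment with on-demand master and core nodes
    and optional Spot task nodes. Instance types are placeholders."""
    groups = [
        {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
         "InstanceType": "m5.xlarge", "InstanceCount": 1},
        {"Name": "core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
         "InstanceType": "m5.xlarge", "InstanceCount": core_count},
    ]
    if spot_task_count > 0:
        groups.append({
            "Name": "task-spot", "InstanceRole": "TASK", "Market": "SPOT",
            "BidPrice": f"{spot_bid_usd:.2f}",  # passed as a string
            "InstanceType": "m5.xlarge", "InstanceCount": spot_task_count,
        })
    return {
        "InstanceGroups": groups,
        # False asks for the cluster to terminate once its steps finish.
        "KeepJobFlowAliveWhenNoSteps": False,
    }

config = instances_config(core_count=2, spot_task_count=4, spot_bid_usd=0.10)
print(config)
```

Putting only the task group on Spot matches the earlier point that task nodes hold no data, so losing them to a Spot interruption costs compute time rather than stored data.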
Integrated with AWS Services
• Amazon EMR is integrated with other Amazon Web Services such as Amazon EC2,
Amazon S3, DynamoDB, Amazon RDS, CloudWatch, and AWS Data Pipeline.
• You can easily access data stored in AWS from an EMR cluster and use the
functionality offered by other Amazon Web Services to manage your cluster and
store its output.
Compute
• EC2
Networking
• VPC
• ELB
• Route 53
Storage
• EBS
• S3
• Glacier
Data Services
• RDS
• DynamoDB
• Redshift
Deployment and Management
• AWS Management Console
• AWS Command Line Interface
• AWS IAM
• CloudWatch
Life Cycle of an EMR Cluster
How to Launch and Connect to an EMR Cluster: Quick Demo
Click Create Cluster.
• Provide a cluster name for easier identification.
• Set Termination Protection to 'Yes' to prevent accidental
termination of the cluster.
• Enable logging so that cluster activity is logged automatically.
• Provide an S3 folder location for the logs.
• Enable debugging so that problems with cluster activity can be
investigated.
• Tags are optional but always encouraged.
• A tag is a key/value pair that is associated with every resource in the cluster.
• Tags help in monitoring and managing cluster resources easily.
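The same console choices (name, termination protection, log location, tags) can be expressed as request parameters. The sketch below assembles a partial request in the shape we understand boto3's `run_job_flow` to use; the helper name, bucket, and tag values are hypothetical, and the console's debugging option is not covered here. A real request would also need instance details and IAM roles.

```python
def cluster_request(name, log_uri, tags):
    """Assemble a partial cluster request from the console-style settings."""
    if not log_uri.startswith("s3://"):
        raise ValueError("log location must be an S3 URI")
    return {
        "Name": name,                               # for easier identification
        "LogUri": log_uri,                          # S3 folder for cluster logs
        "Instances": {"TerminationProtected": True},
        # Tags are key/value pairs attached to the cluster's resources.
        "Tags": [{"Key": k, "Value": v} for k, v in sorted(tags.items())],
    }

# All values below are placeholders.
request = cluster_request(
    "busi758b-demo",
    "s3://example-bucket/emr-logs/",
    {"course": "BUSI758B", "owner": "demo-user"},
)
print(request)
```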
BigData- On - AWS Cloud -1