SlideShare a Scribd company logo
1 of 52
Big Data in the Cloud
Ganesh Raja
Solutions Architect AWS
graja@amazon.com
©Amazon.com, Inc. and its affiliates. All rights reserved.
When you do a directory listing on a folder and the lights
start to flicker
What is Big Data ?
When your data sets become
so large that you have to start innovating how to Collect,
Store, Organize, Analyze and Share it
Its tough because of
Velocity, Volume and Variety
What is Big Data ?
Big Data
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
The cost of data generation
is falling
GB TB
PB
ZB
EB
Unconstrained Data Growth
Big Data is now moving fast …
• IT/ Application server logs
IT Infrastructure logs, Metering,
Audit logs, Change logs
• Web sites / Mobile Apps/ Ads
Clickstream, User Engagement
• Sensor data
Weather, Smart Grids, Wearables
• Social Media, User Content
450MM+ Tweets/day
The volume of data
is increasing
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Lower cost,
higher throughput
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Lower cost,
higher throughput
Highly
constrained
Data Volume
Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011
IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Generated data
Available for analysis
Elastic & Highly Scalable
+
No capital expense
+
Pay-per-use
+
On-demand
Cloud Computing
$0
= Remove constraints
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Accelerated
3.5 Billion records, 71 Million unique cookies, 1.7 Million targeted ads per day
Analyzed customer clicks and impressions with Elastic MapReduce
“…no upfront investment in hardware, no hardware procurement delay, and no
additional operations staff was hired.”
“Because of the richness of the algorithm and the flexibility of the platform to support it
at scale, our first client campaign experienced a 500% increase in their return on
ad spend from a similar campaign a year before.”
Targeted Ad
User recently
purchased a sports
movie and is
searching for video
games
Case Study: Razorfish
Results:
500% return on ad spend
From 2 months procurement
time to a minute
How?
Big Data Technology
Technologies and techniques for
working productively with data,
at any scale
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
AWS
Data Pipeline
Amazon S3,
Amazon Glacier,
AWS Storage Gateway,
Amazon DynamoDB,
Amazon Redshift,
Amazon RDS,
Amazon Kinesis
Amazon EC2,
Amazon Elastic
MapReduce
Amazon EC2
Amazon S3,
Amazon Redshift,
Amazon RDS
Amazon
Simple Storage Service
(S3)
EMR is Hadoop in the Cloud
Hadoop is an open-source framework for
parallel processing huge amounts of data
on a cluster of machines
What is Amazon Elastic MapReduce (EMR)?
How does it work?
1. Put the data
into S3 (or HDFS)
2. Launch your cluster.
Choose:
• Hadoop distribution
• How many nodes
• Node type (hi-CPU, hi-
memory, etc.)
• Hadoop apps (Hive, Pig,
HBase)
3. Get the results.
S3 EMR Cluster
How does it work?
S3 EMR Cluster
You can easily
resize the cluster.
How does it work?
S3 EMR Cluster
Use Spot instances
to save time and
money.
How does it work?
S3 EMR Cluster
Launch parallel
clusters against the
same data source
How does it work?
S3 EMR Cluster
When the work is
complete, you can
terminate the
cluster (and stop
paying)
Cost to run a 100 node Elastic MapReduce Cluster
INR 450/hour
($7.5/h)
Cost to run a 100 node Elastic MapReduce Cluster
Photos: renee_mcgurk https://www.flickr.com/photos/51018933@N08/5355664961/in/photostream/
Calgary Reviews https://www.flickr.com/photos/calgaryreviews/6328302248/in/photostream/
Each day AWS adds the equivalent
server capacity to a global, $7B
enterprise
Thousands of Customers, 5+ Millions of Clusters
The Zoo
Apache
Kafka
Amazon
Kinesis
Apache
Flume
Storm
Apache
Spark
Apache
Spark
Streaming
Hadoop/
EMR
Redshift S3
DynamoDB
Hive Pig Shark
HDFS
Impala
?
EMR makes it easy to use Hive and Pig
Pig:
• High-level programming
language (Pig Latin)
• Supports UDFs
• Ideal for data flow/ETL
Hive:
• Data Warehouse for Hadoop
• SQL-like query language (HiveQL)
• Initially developed at Facebook
HBase:
• Column-oriented database
• Runs on top of HDFS
• Ideal for sparse data
• Random, read/write access
• Ideal for very large tables (billions of
rows, millions of columns)
Mahout:
• Machine learning library
• Supports recommendation
mining, clustering,
classification, and frequent
itemset mining
EMR makes it easy to use HBase and Mahout
EMR makes it easy to use Spark and Shark
Shark:
• Hive on Spark
• Up to 100x Faster
• Compatible with Hive
• Used at Yahoo, Airbnb, etc
• Download via BA
Spark:
• In-memory MapReduce
• Up to 100x faster than Hadoop
• Access HDFS, HBase, S3
• Developed at UCBerkely
• Download via BA
Ganglia
• Scalable distributed monitoring
• View performance of the cluster and
individual nodes
• Open source
R:
• Language and software
environment for statistical
computing and graphics
• Open source
EMR makes it easy to use other tools and
applications
EMR Also Supports MapR’s Hadoop Distributions
In addition to the Amazon Hadoop distribution, EMR supports
two MapR Hadoop distributions
M3
M5
M7
Key features of MapR
NFS
No NameNode
JobTracker high availability
Cluster mirroring (disaster recovery)
Compression
Integrates with Hadoop Eco-System
EMR
Amazon S3 Amazon EMR
Amazon S3 Amazon EMR Amazon Redshift
Amazon Redshift
• Easy to provision, scale, operate
• No up-front costs, pay-as-you-go
• Fast: Columnar storage, compression,
Specialized nodes
• 1-100 node clusters 2TB - 1.6PB
• $999 per TB per year
• Transparent backups, restore, failover
• Security in transit, at rest, for backups, VPC
Data Warehousing the AWS way
Amazon S3 Amazon EMR Amazon Redshift
Amazon S3 Amazon EMR Amazon Redshift
Reporting
+BI
Amazon Redshift
AWS
Marketplace
AWS BI Partners
Amazon Redshift
Reporting
+BI
On-premise
Systems
3 million (100 * 30 thousand) queries are executed per hour on 100
HANA instances.
Each SAP HANA instance is deployed on an Amazon Web Services (AWS) Linux
server.
Each AWS Server has 16 cores, 60 GB of RAM and 600 million rows
(Total: 1776 cores, 6.6TB RAM and 60 billion rows).”
Rolls-Royce performance for the cost of a Ford.USD 400 / Hour
Experiment
often
Fail quickly,
at low cost
More
Innovation
Data is the
New Oil
How does Rajinikanth do Big
Data ?
Thank you!
Ganesh Raja
Solutions Architect AWS
graja@amazon.com
©Amazon.com, Inc. and its affiliates. All rights reserved.

More Related Content

What's hot

Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Precisely
 
Welcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewWelcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewAmazon Web Services
 
Big Data Analytics on AWS - Carlos Conde - AWS Summit Paris
Big Data Analytics on AWS - Carlos Conde - AWS Summit ParisBig Data Analytics on AWS - Carlos Conde - AWS Summit Paris
Big Data Analytics on AWS - Carlos Conde - AWS Summit ParisAmazon Web Services
 
#EarthOnAWS | AWS Public Sector Summit 2017
#EarthOnAWS | AWS Public Sector Summit 2017#EarthOnAWS | AWS Public Sector Summit 2017
#EarthOnAWS | AWS Public Sector Summit 2017Amazon Web Services
 
BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1Milind gunjan
 
Modern Data Warehousing with Amazon Redshift
Modern Data Warehousing with Amazon RedshiftModern Data Warehousing with Amazon Redshift
Modern Data Warehousing with Amazon RedshiftAmazon Web Services
 
大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)
大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)
大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)Amazon Web Services
 
Amazon big success using big data analytics
Amazon big success using big data analyticsAmazon big success using big data analytics
Amazon big success using big data analyticsKovid Academy
 
AWS May Webinar Series - Getting Started with Amazon EMR
AWS May Webinar Series - Getting Started with Amazon EMRAWS May Webinar Series - Getting Started with Amazon EMR
AWS May Webinar Series - Getting Started with Amazon EMRAmazon Web Services
 
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMRCost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMRProvectus
 
Using Big Data to Driving Big Engagement
Using Big Data to Driving Big EngagementUsing Big Data to Driving Big Engagement
Using Big Data to Driving Big EngagementAmazon Web Services
 
Big Data and High Performance Computing Solutions in the AWS Cloud
Big Data and High Performance Computing Solutions in the AWS CloudBig Data and High Performance Computing Solutions in the AWS Cloud
Big Data and High Performance Computing Solutions in the AWS CloudAmazon Web Services
 
Getting Started with Amazon DynamoDB
Getting Started with Amazon DynamoDBGetting Started with Amazon DynamoDB
Getting Started with Amazon DynamoDBAmazon Web Services
 
Introducing AWS DeepLens - AWS Online Tech Talks
Introducing AWS DeepLens - AWS Online Tech TalksIntroducing AWS DeepLens - AWS Online Tech Talks
Introducing AWS DeepLens - AWS Online Tech TalksAmazon Web Services
 
Best Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSBest Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSAmazon Web Services
 
AWS Summit 2011: Big Data Analytics in the AWS cloud
AWS Summit 2011: Big Data Analytics in the AWS cloudAWS Summit 2011: Big Data Analytics in the AWS cloud
AWS Summit 2011: Big Data Analytics in the AWS cloudAmazon Web Services
 

What's hot (20)

AWS Big Data Solution Days
AWS Big Data Solution DaysAWS Big Data Solution Days
AWS Big Data Solution Days
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
Welcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution OverviewWelcome & AWS Big Data Solution Overview
Welcome & AWS Big Data Solution Overview
 
Big Data Analytics on AWS - Carlos Conde - AWS Summit Paris
Big Data Analytics on AWS - Carlos Conde - AWS Summit ParisBig Data Analytics on AWS - Carlos Conde - AWS Summit Paris
Big Data Analytics on AWS - Carlos Conde - AWS Summit Paris
 
#EarthOnAWS | AWS Public Sector Summit 2017
#EarthOnAWS | AWS Public Sector Summit 2017#EarthOnAWS | AWS Public Sector Summit 2017
#EarthOnAWS | AWS Public Sector Summit 2017
 
BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1
 
Deep Dive in Big Data
Deep Dive in Big DataDeep Dive in Big Data
Deep Dive in Big Data
 
Modern Data Warehousing with Amazon Redshift
Modern Data Warehousing with Amazon RedshiftModern Data Warehousing with Amazon Redshift
Modern Data Warehousing with Amazon Redshift
 
大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)
大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)
大數據運算媒體業案例分享 (Big Data Compute Case Sharing for Media Industry)
 
AWS Big Data Platform
AWS Big Data PlatformAWS Big Data Platform
AWS Big Data Platform
 
Amazon big success using big data analytics
Amazon big success using big data analyticsAmazon big success using big data analytics
Amazon big success using big data analytics
 
AWS May Webinar Series - Getting Started with Amazon EMR
AWS May Webinar Series - Getting Started with Amazon EMRAWS May Webinar Series - Getting Started with Amazon EMR
AWS May Webinar Series - Getting Started with Amazon EMR
 
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMRCost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
Cost Optimization for Apache Hadoop/Spark Workloads with Amazon EMR
 
Securing Your Big Data on AWS
Securing Your Big Data on AWSSecuring Your Big Data on AWS
Securing Your Big Data on AWS
 
Using Big Data to Driving Big Engagement
Using Big Data to Driving Big EngagementUsing Big Data to Driving Big Engagement
Using Big Data to Driving Big Engagement
 
Big Data and High Performance Computing Solutions in the AWS Cloud
Big Data and High Performance Computing Solutions in the AWS CloudBig Data and High Performance Computing Solutions in the AWS Cloud
Big Data and High Performance Computing Solutions in the AWS Cloud
 
Getting Started with Amazon DynamoDB
Getting Started with Amazon DynamoDBGetting Started with Amazon DynamoDB
Getting Started with Amazon DynamoDB
 
Introducing AWS DeepLens - AWS Online Tech Talks
Introducing AWS DeepLens - AWS Online Tech TalksIntroducing AWS DeepLens - AWS Online Tech Talks
Introducing AWS DeepLens - AWS Online Tech Talks
 
Best Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSBest Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWS
 
AWS Summit 2011: Big Data Analytics in the AWS cloud
AWS Summit 2011: Big Data Analytics in the AWS cloudAWS Summit 2011: Big Data Analytics in the AWS cloud
AWS Summit 2011: Big Data Analytics in the AWS cloud
 

Viewers also liked

AWS Cloud Kata | Kuala Lumpur - Getting to Scale on AWS
AWS Cloud Kata | Kuala Lumpur - Getting to Scale on AWSAWS Cloud Kata | Kuala Lumpur - Getting to Scale on AWS
AWS Cloud Kata | Kuala Lumpur - Getting to Scale on AWSAmazon Web Services
 
AWS Cloud Kata | Bangkok - Getting to Scale on AWS
AWS Cloud Kata | Bangkok - Getting to Scale on AWSAWS Cloud Kata | Bangkok - Getting to Scale on AWS
AWS Cloud Kata | Bangkok - Getting to Scale on AWSAmazon Web Services
 
AWS Summit 2013 | India - Scaling Seamlessly and Going Global with the Cloud,...
AWS Summit 2013 | India - Scaling Seamlessly and Going Global with the Cloud,...AWS Summit 2013 | India - Scaling Seamlessly and Going Global with the Cloud,...
AWS Summit 2013 | India - Scaling Seamlessly and Going Global with the Cloud,...Amazon Web Services
 
Introducing Database Offerings on AWS - Technical 101
Introducing Database Offerings on AWS - Technical 101Introducing Database Offerings on AWS - Technical 101
Introducing Database Offerings on AWS - Technical 101Amazon Web Services
 
AWS Webcast - Website Hosting in the Cloud
AWS Webcast - Website Hosting in the CloudAWS Webcast - Website Hosting in the Cloud
AWS Webcast - Website Hosting in the CloudAmazon Web Services
 
Wicked rugby
Wicked rugbyWicked rugby
Wicked rugbyDklumb4
 
cloud computing in e commerce
cloud computing in e commercecloud computing in e commerce
cloud computing in e commercesteffz
 
Social Networking Project (website) full documentation
Social Networking Project (website) full documentation Social Networking Project (website) full documentation
Social Networking Project (website) full documentation Tenzin Tendar
 
AWS for Start-ups - Architectural Best Practices & Automating Your Infrastruc...
AWS for Start-ups - Architectural Best Practices & Automating Your Infrastruc...AWS for Start-ups - Architectural Best Practices & Automating Your Infrastruc...
AWS for Start-ups - Architectural Best Practices & Automating Your Infrastruc...Amazon Web Services
 
使用 Amazon Pinpoint 讓你的行動 App 更精準接觸客群
使用 Amazon Pinpoint 讓你的行動 App 更精準接觸客群使用 Amazon Pinpoint 讓你的行動 App 更精準接觸客群
使用 Amazon Pinpoint 讓你的行動 App 更精準接觸客群Amazon Web Services
 

Viewers also liked (11)

AWS Cloud Kata | Kuala Lumpur - Getting to Scale on AWS
AWS Cloud Kata | Kuala Lumpur - Getting to Scale on AWSAWS Cloud Kata | Kuala Lumpur - Getting to Scale on AWS
AWS Cloud Kata | Kuala Lumpur - Getting to Scale on AWS
 
AWS Cloud Kata | Bangkok - Getting to Scale on AWS
AWS Cloud Kata | Bangkok - Getting to Scale on AWSAWS Cloud Kata | Bangkok - Getting to Scale on AWS
AWS Cloud Kata | Bangkok - Getting to Scale on AWS
 
AWS Summit 2013 | India - Scaling Seamlessly and Going Global with the Cloud,...
AWS Summit 2013 | India - Scaling Seamlessly and Going Global with the Cloud,...AWS Summit 2013 | India - Scaling Seamlessly and Going Global with the Cloud,...
AWS Summit 2013 | India - Scaling Seamlessly and Going Global with the Cloud,...
 
Introducing Database Offerings on AWS - Technical 101
Introducing Database Offerings on AWS - Technical 101Introducing Database Offerings on AWS - Technical 101
Introducing Database Offerings on AWS - Technical 101
 
AWS Webcast - Website Hosting in the Cloud
AWS Webcast - Website Hosting in the CloudAWS Webcast - Website Hosting in the Cloud
AWS Webcast - Website Hosting in the Cloud
 
Wicked rugby
Wicked rugbyWicked rugby
Wicked rugby
 
cloud computing in e commerce
cloud computing in e commercecloud computing in e commerce
cloud computing in e commerce
 
AWS for Startups
AWS for StartupsAWS for Startups
AWS for Startups
 
Social Networking Project (website) full documentation
Social Networking Project (website) full documentation Social Networking Project (website) full documentation
Social Networking Project (website) full documentation
 
AWS for Start-ups - Architectural Best Practices & Automating Your Infrastruc...
AWS for Start-ups - Architectural Best Practices & Automating Your Infrastruc...AWS for Start-ups - Architectural Best Practices & Automating Your Infrastruc...
AWS for Start-ups - Architectural Best Practices & Automating Your Infrastruc...
 
使用 Amazon Pinpoint 讓你的行動 App 更精準接觸客群
使用 Amazon Pinpoint 讓你的行動 App 更精準接觸客群使用 Amazon Pinpoint 讓你的行動 App 更精準接觸客群
使用 Amazon Pinpoint 讓你的行動 App 更精準接觸客群
 

Similar to Aaum Analytics event - Big data in the cloud

Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Amazon Web Services
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudAmazon Web Services
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSAmazon Web Services
 
ABD312_Deep Dive Migrating Big Data Workloads to AWS
ABD312_Deep Dive Migrating Big Data Workloads to AWSABD312_Deep Dive Migrating Big Data Workloads to AWS
ABD312_Deep Dive Migrating Big Data Workloads to AWSAmazon Web Services
 
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014Amazon Web Services
 
Data & Analytics - Session 1 - Big Data Analytics
Data & Analytics - Session 1 -  Big Data AnalyticsData & Analytics - Session 1 -  Big Data Analytics
Data & Analytics - Session 1 - Big Data AnalyticsAmazon Web Services
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSAmazon Web Services
 
Value of Data Beyond Analytics by Darin Briskman
 Value of Data Beyond Analytics by Darin Briskman Value of Data Beyond Analytics by Darin Briskman
Value of Data Beyond Analytics by Darin BriskmanSameer Kenkare
 
Journey Through the AWS Cloud - Big Data Analysis
Journey Through the AWS Cloud - Big Data AnalysisJourney Through the AWS Cloud - Big Data Analysis
Journey Through the AWS Cloud - Big Data AnalysisAmazon Web Services
 
Choosing the Right Data Storage Solution
Choosing the Right Data Storage SolutionChoosing the Right Data Storage Solution
Choosing the Right Data Storage SolutionAmazon Web Services
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftAmazon Web Services
 
Fundamentals of big data analytics and Hadoop
Fundamentals of big data analytics and HadoopFundamentals of big data analytics and Hadoop
Fundamentals of big data analytics and HadoopArchana Gopinath
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...Amazon Web Services
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceAmazon Web Services
 
(HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, A...
(HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, A...(HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, A...
(HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, A...Amazon Web Services
 
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...Amazon Web Services
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...Amazon Web Services
 

Similar to Aaum Analytics event - Big data in the cloud (20)

Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
Scaling your Analytics with Amazon Elastic MapReduce (BDT301) | AWS re:Invent...
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
Big Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWSBig Data Architectural Patterns and Best Practices on AWS
Big Data Architectural Patterns and Best Practices on AWS
 
ABD312_Deep Dive Migrating Big Data Workloads to AWS
ABD312_Deep Dive Migrating Big Data Workloads to AWSABD312_Deep Dive Migrating Big Data Workloads to AWS
ABD312_Deep Dive Migrating Big Data Workloads to AWS
 
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
(BDT201) Big Data and HPC State of the Union | AWS re:Invent 2014
 
Data & Analytics - Session 1 - Big Data Analytics
Data & Analytics - Session 1 -  Big Data AnalyticsData & Analytics - Session 1 -  Big Data Analytics
Data & Analytics - Session 1 - Big Data Analytics
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
 
Value of Data Beyond Analytics by Darin Briskman
 Value of Data Beyond Analytics by Darin Briskman Value of Data Beyond Analytics by Darin Briskman
Value of Data Beyond Analytics by Darin Briskman
 
Journey Through the AWS Cloud - Big Data Analysis
Journey Through the AWS Cloud - Big Data AnalysisJourney Through the AWS Cloud - Big Data Analysis
Journey Through the AWS Cloud - Big Data Analysis
 
Choosing the Right Data Storage Solution
Choosing the Right Data Storage SolutionChoosing the Right Data Storage Solution
Choosing the Right Data Storage Solution
 
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon RedshiftData warehousing in the era of Big Data: Deep Dive into Amazon Redshift
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift
 
Big Data Architectural Patterns
Big Data Architectural PatternsBig Data Architectural Patterns
Big Data Architectural Patterns
 
Fundamentals of big data analytics and Hadoop
Fundamentals of big data analytics and HadoopFundamentals of big data analytics and Hadoop
Fundamentals of big data analytics and Hadoop
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
 
Deep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduceDeep Dive: Amazon Elastic MapReduce
Deep Dive: Amazon Elastic MapReduce
 
(HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, A...
(HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, A...(HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, A...
(HLS402) Getting into Your Genes: The Definitive Guide to Using Amazon EMR, A...
 
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
AWS Summit 2013 | Singapore - Big Data Analytics, Presented by AWS, Intel and...
 
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
AWS re:Invent 2016: Big Data Architectural Patterns and Best Practices on AWS...
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 
Using Data Lakes
Using Data LakesUsing Data Lakes
Using Data Lakes
 

Recently uploaded

Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 

Recently uploaded (20)

Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 

Aaum Analytics event - Big data in the cloud

  • 1. Big Data in the Cloud Ganesh Raja Solutions Architect AWS graja@amazon.com ©Amazon.com, Inc. and its affiliates. All rights reserved.
  • 2. When you do a directory listing on a folder and the lights start to flicker What is Big Data ?
  • 3. When your data sets become so large that you have to start innovating how to Collect, Store, Organize, Analyze and Share it Its tough because of Velocity, Volume and Variety What is Big Data ?
  • 4. Big Data Generation Collection & storage Analytics & computation Collaboration & sharing
  • 5. The cost of data generation is falling
  • 6. GB TB PB ZB EB Unconstrained Data Growth Big Data is now moving fast … • IT/ Application server logs IT Infrastructure logs, Metering, Audit logs, Change logs • Web sites / Mobile Apps/ Ads Clickstream, User Engagement • Sensor data Weather, Smart Grids, Wearables • Social Media, User Content 450MM+ Tweets/day
  • 7. The volume of data is increasing
  • 8. Generation Collection & storage Analytics & computation Collaboration & sharing Lower cost, higher throughput
  • 9. Generation Collection & storage Analytics & computation Collaboration & sharing Lower cost, higher throughput Highly constrained
  • 10. Data Volume Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011 IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares Generated data Available for analysis
  • 11. Elastic & Highly Scalable + No capital expense + Pay-per-use + On-demand Cloud Computing $0 = Remove constraints
  • 12. Generation Collection & storage Analytics & computation Collaboration & sharing Accelerated
  • 13.
  • 14. 3.5 Billion records, 71 Million unique cookies, 1.7 Million targeted ads per day Analyzed customer clicks and impressions with Elastic MapReduce “…no upfront investment in hardware, no hardware procurement delay, and no additional operations staff was hired.” “Because of the richness of the algorithm and the flexibility of the platform to support it at scale, our first client campaign experienced a 500% increase in their return on ad spend from a similar campaign a year before.” Targeted Ad User recently purchased a sports movie and is searching for video games Case Study: Razorfish
  • 15. Results: 500% return on ad spend From 2 months procurement time to a minute
  • 16. How?
  • 17. Big Data Technology Technologies and techniques for working productively with data, at any scale
  • 18. Generation Collection & storage Analytics & computation Collaboration & sharing AWS Data Pipeline Amazon S3, Amazon Glacier, AWS Storage Gateway, Amazon DynamoDB, Amazon Redshift, Amazon RDS, Amazon Kinesis Amazon EC2, Amazon Elastic MapReduce Amazon EC2 Amazon S3, Amazon Redshift, Amazon RDS
  • 20. EMR is Hadoop in the Cloud Hadoop is an open-source framework for parallel processing huge amounts of data on a cluster of machines What is Amazon Elastic MapReduce (EMR)?
  • 21. How does it work? 1. Put the data into S3 (or HDFS) 2. Launch your cluster. Choose: • Hadoop distribution • How many nodes • Node type (hi-CPU, hi- memory, etc.) • Hadoop apps (Hive, Pig, HBase) 3. Get the results. S3 EMR Cluster
  • 22. How does it work? S3 EMR Cluster You can easily resize the cluster.
  • 23. How does it work? S3 EMR Cluster Use Spot instances to save time and money.
  • 24. How does it work? S3 EMR Cluster Launch parallel clusters against the same data source
  • 25. How does it work? S3 EMR Cluster When the work is complete, you can terminate the cluster (and stop paying)
  • 26. Cost to run a 100 node Elastic MapReduce Cluster INR 450/hour ($7.5/h)
  • 27. Cost to run a 100 node Elastic MapReduce Cluster Photos: renee_mcgurk https://www.flickr.com/photos/51018933@N08/5355664961/in/photostream/ Calgary Reviews https://www.flickr.com/photos/calgaryreviews/6328302248/in/photostream/
  • 28. Each day AWS adds the equivalent server capacity to a global, $7B enterprise
  • 29. Thousands of Customers, 5+ Millions of Clusters
  • 31. EMR makes it easy to use Hive and Pig Pig: • High-level programming language (Pig Latin) • Supports UDFs • Ideal for data flow/ETL Hive: • Data Warehouse for Hadoop • SQL-like query language (HiveQL) • Initially developed at Facebook
  • 32. HBase: • Column-oriented database • Runs on top of HDFS • Ideal for sparse data • Random, read/write access • Ideal for very large tables (billions of rows, millions of columns) Mahout: • Machine learning library • Supports recommendation mining, clustering, classification, and frequent itemset mining EMR makes it easy to use HBase and Mahout
  • 33. EMR makes it easy to use Spark and Shark Shark: • Hive on Spark • Up to 100x Faster • Compatible with Hive • Used at Yahoo, Airbnb, etc • Download via BA Spark: • In-memory MapReduce • Up to 100x faster than Hadoop • Access HDFS, HBase, S3 • Developed at UCBerkely • Download via BA
  • 34. Ganglia • Scalable distributed monitoring • View performance of the cluster and individual nodes • Open source R: • Language and software environment for statistical computing and graphics • Open source EMR makes it easy to use other tools and applications
  • 35. EMR Also Supports MapR’s Hadoop Distributions In addition to the Amazon Hadoop distribution, EMR supports two MapR Hadoop distributions M3 M5 M7 Key features of MapR NFS No NameNode JobTracker high availability Cluster mirroring (disaster recovery) Compression
  • 36. Integrates with Hadoop Eco-System EMR
  • 38. Amazon S3 Amazon EMR Amazon Redshift
  • 39. Amazon Redshift • Easy to provision, scale, operate • No up-front costs, pay-as-you-go • Fast: Columnar storage, compression, Specialized nodes • 1-100 node clusters 2TB - 1.6PB • $999 per TB per year • Transparent backups, restore, failover • Security in transit, at rest, for backups, VPC Data Warehousing the AWS way
  • 40. Amazon S3 Amazon EMR Amazon Redshift
  • 41. Amazon S3 Amazon EMR Amazon Redshift Reporting +BI
  • 44.
  • 45. 3 million (100 * 30 thousand) queries are executed per hour on 100 HANA instances.
  • 46. Each SAP HANA instance is deployed on an Amazon Web Services (AWS) Linux server. Each AWS Server has 16 cores, 60 GB of RAM and 600 million rows (Total: 1776 cores, 6.6TB RAM and 60 billion rows).”
  • 47. Rolls-Royce performance for the cost of a Ford.USD 400 / Hour
  • 48.
  • 51. How does Rajinikanth do Big Data ?
  • 52. Thank you! Ganesh Raja Solutions Architect AWS graja@amazon.com ©Amazon.com, Inc. and its affiliates. All rights reserved.