ianmas@amazon.com
@IanMmmm
LARGE SCALE DATA
ANALYSIS WITH AWS



Ian Massingham – Technical Evangelist
THE MORE DATA YOU COLLECT
THE MORE VALUE YOU CAN
DERIVE FROM IT!
THE COST OF DATA
GENERATION IS FALLING!
We are constantly producing more data
From all types of industries
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
Lower cost,
higher throughput
Highly
constrained
+ ELASTIC AND HIGHLY SCALABLE
+ NO UPFRONT CAPITAL EXPENSE
+ ONLY PAY FOR WHAT YOU USE
+ AVAILABLE ON-DEMAND
= REMOVE CONSTRAINTS
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
AWS Import / Export
AWS Direct Connect
Inbound data transfer is free
Multipart upload to S3
Physical media
AWS Direct Connect
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
Amazon S3,
Amazon Glacier,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
AWS Storage Gateway,
Data on Amazon EC2
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
Amazon EC2
Amazon Elastic
MapReduce
AMAZON EC2

ELASTIC COMPUTE CLOUD!
3 HOURS

FOR $4828.85/hr!
Instead of
$20+ MILLION
in infrastructure!
SIMULATING

205,000 COMPOUNDS!
18 HOURS

FOR $1833.33/hr!
Instead of
$68+ MILLION
in infrastructure!
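The hourly figures above can be turned into a total cost per run. This is a minimal sketch, assuming the quoted hourly rate held for the whole run and that nothing else was billed; the function name `run_cost` is illustrative and not from the talk.

```python
from decimal import Decimal


def run_cost(hours, hourly_rate):
    """Total cost of a run: duration multiplied by the quoted hourly rate."""
    return Decimal(hours) * Decimal(hourly_rate)


# Figures quoted on the slides above.
three_hour_run = run_cost(3, "4828.85")
eighteen_hour_run = run_cost(18, "1833.33")

print(three_hour_run)     # 14486.55
print(eighteen_hour_run)  # 32999.94
```

Under these assumptions each run costs tens of thousands of dollars, against the tens of millions quoted for equivalent infrastructure.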
GPU INSTANCES
G2: 1x NVIDIA Kepler GK104, 8 vCPU (Intel Xeon E5-2670), $0.65/h
CG1: 2x NVIDIA Fermi M2050, 16 vCPU (Intel Xeon X5570), $2.10/h
ON A SINGLE INSTANCE
COMPUTE TIME: 4h
COST: 4h x $2.10 = $8.40
ON MULTIPLE INSTANCES
COMPUTE TIME: 1h
COST: 1h x 4 x $2.10 = $8.40
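The arithmetic on this slide can be sketched as follows. The sketch assumes perfectly linear scaling (the work divides evenly across instances) and ignores any billing granularity; `job_cost` and its parameters are hypothetical names used only for illustration.

```python
from decimal import Decimal

CG1_HOURLY_RATE = Decimal("2.10")  # per-instance price quoted on the slide


def job_cost(total_instance_hours, instances, hourly_rate=CG1_HOURLY_RATE):
    """Return (wall_clock_hours, cost), assuming perfectly linear scaling."""
    wall_clock_hours = Decimal(total_instance_hours) / instances
    cost = wall_clock_hours * instances * hourly_rate
    return wall_clock_hours, cost


single = job_cost(4, instances=1)
parallel = job_cost(4, instances=4)

print(single)    # 4 hours of wall-clock time, cost 8.40
print(parallel)  # 1 hour of wall-clock time, cost 8.40
```

The cost is the same in both cases; only the time to a result changes.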
AMAZON ELASTIC
MAPREDUCE

HADOOP AS A SERVICE!
•  SPLITS DATA INTO PIECES
•  LETS PROCESSING OCCUR
•  GATHERS THE RESULTS!
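The three steps above can be sketched in plain Python as a word count. This is a local illustration of the split / process / gather pattern only, not the Amazon Elastic MapReduce or Hadoop API; all function names are hypothetical.

```python
from collections import Counter


def split_into_pieces(records, pieces):
    """Split the input into roughly equal chunks (the 'split' step)."""
    return [records[i::pieces] for i in range(pieces)]


def map_chunk(chunk):
    """Process one chunk independently: count the words in its lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def gather(partials):
    """Merge the per-chunk results into one final result (the 'gather' step)."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(lines, pieces=2):
    chunks = split_into_pieces(lines, pieces)
    # On a cluster each chunk would be processed on a different node.
    partials = [map_chunk(chunk) for chunk in chunks]
    return gather(partials)


lines = ["store analyze share", "generate store", "analyze analyze"]
result = word_count(lines)
print(result["analyze"])  # 3
```

Because each chunk is processed without reference to the others, adding more workers shortens the wall-clock time without changing the result.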
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
Amazon S3,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
Data on Amazon EC2
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
BATCH
PROCESSING
GENERATE ➔ STREAM PROCESSING ➔ SHARE!
AMAZON KINESIS

REAL-TIME DATA STREAM PROCESSING!
Real-time response to content in semi-structured data streams
Relatively simple computations on data (aggregates, filters, sliding window, etc.)
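A filter combined with a sliding-window aggregate can be sketched as below. This is plain Python over an in-memory iterable, not the Amazon Kinesis API; the function name, the `(timestamp, value)` event shape and the `min_value` filter are assumptions made for illustration.

```python
from collections import deque


def sliding_window_averages(events, window_seconds, min_value=0):
    """Yield (timestamp, average) over the last `window_seconds` of events.

    `events` is an iterable of (timestamp, value) pairs in timestamp order.
    Values below `min_value` are filtered out before aggregation.
    """
    window = deque()
    total = 0.0
    for ts, value in events:
        if value < min_value:  # filter step
            continue
        window.append((ts, value))
        total += value
        # Evict events that have fallen out of the window.
        while window and window[0][0] <= ts - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)


events = [(0, 10), (1, 20), (2, -5), (5, 30), (11, 40)]
averages = list(sliding_window_averages(events, window_seconds=10))
print(averages)  # [(0, 10.0), (1, 15.0), (5, 20.0), (11, 35.0)]
```

Each event is handled once as it arrives and only the current window is kept in memory, which is why computations of this kind suit stream processing.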
•  Hourly server logs: how your systems went wrong an hour ago
•  Weekly / Monthly Bill: what you spent this past billing cycle
•  Daily customer report from your website: tells you what deal or ad to try next time
•  Daily fraud reports: tells you if there was fraud yesterday
•  Daily business reports: tells me how customers used AWS services yesterday
•  Real-time metrics: what just went wrong now
•  Real-time spending alerts/caps: guaranteeing you can’t overspend
•  Real-time analysis: what to offer the current customer now
•  Real-time detection: blocks fraudulent use now
•  Fast ETL into Amazon Redshift: how are customers using services now
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
Amazon S3,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
Data on Amazon EC2
Amazon EC2
Amazon Elastic
MapReduce
Amazon S3,
Amazon Glacier,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
AWS Storage Gateway,
Data on Amazon EC2
AWS Import / Export
AWS Direct Connect
GENERATE ➔ STREAM PROCESSING ➔ SHARE!
Amazon S3,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
Data on Amazon EC2
Amazon Kinesis
Stream Processing on
Amazon EC2
WANT TO KNOW MORE?
aws.amazon.com/solutions/case-studies/big-data/
ianmas@amazon.com
@IanMmmm
THANK YOU



Ian Massingham – Technical Evangelist
Further References
Atomic Fiction Case-Study Video
https://www.youtube.com/watch?v=ljHo1_5sWxo
Slideshare with full details on the Schrodinger Materials Science case-study
http://www.slideshare.net/insideHPC/cycle-computing-recordbreaking-peta-scale-hpc-run
Real-time Streaming and Querying with Amazon Kinesis and Amazon EMR Video
https://www.youtube.com/watch?v=NIa33ZwFa8E

Cloud World Forum: Large Scale Data Analysis on AWS