ianmas@amazon.com
@IanMmmm
LARGE SCALE DATA ANALYSIS WITH AWS

Ian Massingham – Technical Evangelist
THE MORE DATA YOU COLLECT, THE MORE VALUE YOU CAN DERIVE FROM IT
THE COST OF DATA GENERATION IS FALLING
We are constantly producing more data
From all types of industries
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE
Lower cost, higher throughput
Highly constrained
+ ELASTIC AND HIGHLY SCALABLE
+ NO UPFRONT CAPITAL EXPENSE
+ ONLY PAY FOR WHAT YOU USE
+ AVAILABLE ON-DEMAND
= REMOVE CONSTRAINTS
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE
AWS Import / Export
AWS Direct Connect
Inbound data transfer is free
Multipart upload to S3
Physical media
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE
Amazon S3,
Amazon Glacier,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
AWS Storage Gateway,
Data on Amazon EC2
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE
Amazon EC2
Amazon Elastic MapReduce
AMAZON ELASTIC MAPREDUCE

HADOOP AS A SERVICE
•  SPLITS DATA INTO PIECES
•  LETS PROCESSING OCCUR
•  GATHERS THE RESULTS
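
The three steps above can be sketched in a few lines of Python. This is only a single-process illustration of the split / process / gather pattern, not how Hadoop or Amazon EMR is implemented; the function names and the word-count task are assumptions chosen for the example.

```python
from collections import Counter
from functools import reduce


def split_into_pieces(records, piece_size):
    """Split the input into fixed-size pieces (stand-ins for input splits)."""
    return [records[i:i + piece_size] for i in range(0, len(records), piece_size)]


def process_piece(piece):
    """Map step: count words within one piece, independently of the others."""
    counts = Counter()
    for line in piece:
        counts.update(line.lower().split())
    return counts


def gather(partials):
    """Reduce step: merge the per-piece counts into one result."""
    return reduce(lambda a, b: a + b, partials, Counter())


def word_count(records, piece_size=2):
    pieces = split_into_pieces(records, piece_size)
    # On a cluster each piece would be processed on a different node;
    # here the pieces are simply processed one after another.
    partials = [process_piece(piece) for piece in pieces]
    return gather(partials)


if __name__ == "__main__":
    lines = ["generate store analyze share", "store analyze", "share share"]
    print(word_count(lines))
```

Because each piece is processed without reference to the others, adding more nodes lets more pieces be handled at once, which is the property the service relies on.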
[Diagram: Amazon EMR and HDFS alongside Kinesis, S3, DynamoDB, RDS, Redshift and AWS Data Pipeline. Labels: Data management; Analytics languages/engines (Pig)]
EMR + IMPALA DEMO
STARTING AN EMR CLUSTER WITH HADOOP ECOSYSTEM TOOLS PRE-INSTALLED
COPY & LOAD OUR DATASET
$ scp -i EMRKeyPair.pem ~/aws/hadoop/LHRarrivals*.csv hadoop@ec2-54-76-242-238.eu-west-1.compute.amazonaws.com:

$ ssh -i EMRKeyPair.pem hadoop@ec2-54-76-242-238.eu-west-1.compute.amazonaws.com

$ hadoop fs -mkdir /data/
$ hadoop fs -put <uploaded_files> /data/
$ hadoop fs -ls -h -R /data/

Or, at scale, use S3DistCp (distributed copy) to load the data from S3 in parallel:

$ . /home/hadoop/impala/conf/impala.conf
$ hadoop jar /home/hadoop/lib/emr-s3distcp-1.0.jar -Dmapreduce.job.reduces=30 --src s3://s3bucketname/ --dest hdfs://$HADOOP_NAMENODE_HOST:$HADOOP_NAMENODE_PORT/data/ --outputCodec 'none'

** Run on a cluster master node
CREATE EXTERNAL TABLE
$ # check the size of our data set
$ wc -l LHRarrivals*.csv

   850 LHRarrivals2.csv
  1526 LHRarrivals.csv
  2376 total

$ impala-shell

Welcome to the Impala shell.

> create EXTERNAL TABLE flights (
    input STRING, id BIGINT, widget STRING, source STRING,
    resultnum BIGINT, pageurl STRING, scheduled STRING, flightnumber STRING,
    airport STRING, status STRING, terminal STRING
  )
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  LOCATION '/data/';
> select count(*) from flights;

This should return a count(*) of 2376, matching the size of the data set.
DEMO OF ODBC ACCESS
This part runs on Amazon WorkSpaces, using the Simba Cloudera Impala ODBC Driver.
Set up an SSH tunnel to the master node so that the WorkSpaces desktop can reach the Impala ODBC port through port 25010.
A previously configured system DSN lets us work with the data from our EMR/Impala cluster directly within Microsoft Excel.
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE
Amazon S3,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
Data on Amazon EC2
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE
BATCH PROCESSING
GENERATE ➔ STREAM PROCESSING ➔ SHARE
AMAZON KINESIS

REAL-TIME DATA STREAM PROCESSING
Real-time response to content in semi-structured data streams

Relatively simple computations on data (aggregates, filters, sliding window, etc.)
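
As an illustration of the kind of computation meant here, the following Python sketch combines a filter with a sliding-window aggregate over a stream of records. It does not use the Amazon Kinesis API; the record fields (ts, status, latency_ms), the 60-second window and the assumption that records arrive in timestamp order are all hypothetical choices for the example.

```python
from collections import deque


class SlidingWindowAverage:
    """Average of the values seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        """Add one event, evict expired ones, and return the current average."""
        self.events.append((timestamp, value))
        self.total += value
        # Evict events that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window_seconds:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


def error_latency_averages(records, window_seconds=60):
    """Filter to error records, then emit a running windowed average latency."""
    window = SlidingWindowAverage(window_seconds)
    return [
        window.add(record["ts"], record["latency_ms"])
        for record in records
        if record["status"] == "error"  # filter step
    ]


if __name__ == "__main__":
    stream = [
        {"ts": 0, "status": "error", "latency_ms": 100},
        {"ts": 10, "status": "ok", "latency_ms": 5},
        {"ts": 30, "status": "error", "latency_ms": 200},
        {"ts": 70, "status": "error", "latency_ms": 300},
    ]
    print(error_latency_averages(stream))
```

Each record is handled once as it arrives and only the events inside the window are kept in memory, which is what makes this style of computation suitable for real-time streams.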
Hourly server logs: how your systems went wrong an hour ago
Weekly / monthly bill: what you spent this past billing cycle
Daily customer report from your website: tells you what deal or ad to try next time
Daily fraud reports: tell you if there was fraud yesterday
Daily business reports: tell you how customers used AWS services yesterday
Real-time metrics: what just went wrong now
Real-time spending alerts/caps: guaranteeing you can’t overspend
Real-time analysis: what to offer the current customer now
Real-time detection: blocks fraudulent use now
Fast ETL into Amazon Redshift: how customers are using services now
GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE
Amazon S3,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
Data on Amazon EC2
Amazon EC2
Amazon Elastic
MapReduce
Amazon S3,
Amazon Glacier,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
AWS Storage Gateway,
Data on Amazon EC2
AWS Import / Export
AWS Direct Connect
GENERATE ➔ STREAM PROCESSING ➔ SHARE
Amazon S3,
Amazon DynamoDB,
Amazon RDS,
Amazon Redshift,
Data on Amazon EC2
Amazon Kinesis
Stream Processing on
Amazon EC2
WANT TO KNOW MORE?
aws.amazon.com/solutions/case-studies/big-data/

2014 Import.io Data Summit - Including Hadoop/Impala Getting Started Demo

Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 

2014 Import.io Data Summit - Including Hadoop/Impala Getting Started Demo

  • 1. ianmas@amazon.com @IanMmmm LARGE SCALE DATA ANALYSIS WITH AWS
 
 Ian Massingham – Technical Evangelist
  • 2. THE MORE DATA YOU COLLECT THE MORE VALUE YOU CAN DERIVE FROM IT!
  • 3.
  • 4.
  • 5. THE COST OF DATA GENERATION IS FALLING!
  • 6. We are constantly producing more data
  • 7. From all types of industries
  • 8.
  • 9.
  • 10. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
  • 11. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE! Lower cost, higher throughput
  • 12. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE! Lower cost, higher throughput Highly constrained
  • 13. + ELASTIC AND HIGHLY SCALABLE + NO UPFRONT CAPITAL EXPENSE + ONLY PAY FOR WHAT YOU USE + AVAILABLE ON-DEMAND = REMOVE CONSTRAINTS
  • 14. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
  • 15. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE! AWS Import / Export AWS Direct Connect
  • 16. Inbound data transfer is free Multipart upload to S3 Physical media AWS Direct Connect
  • 17. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE! Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Redshift, AWS Storage Gateway, Data on Amazon EC2
  • 18. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE! Amazon EC2 Amazon Elastic MapReduce
  • 19.
  • 21. •  SPLITS DATA INTO PIECES •  LETS PROCESSING OCCUR •  GATHERS THE RESULTS!
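The three steps on this slide can be sketched in plain Python. This is a toy, single-process illustration of the split / process / gather pattern, not how Hadoop or EMR is implemented; the function names and the (terminal, status) records are invented for the example.

```python
from collections import Counter
from functools import reduce


def split_into_pieces(records, n_pieces):
    """Split: deal the records out into n_pieces roughly equal chunks."""
    return [records[i::n_pieces] for i in range(n_pieces)]


def map_piece(piece):
    """Process: count records per terminal within a single piece."""
    return Counter(terminal for terminal, _status in piece)


def gather(partials):
    """Gather: merge the per-piece counts into one overall result."""
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical (terminal, status) records, invented for illustration.
records = [
    ("T5", "LANDED"),
    ("T3", "EXPECTED"),
    ("T5", "LANDED"),
    ("T2", "CANCELLED"),
    ("T3", "LANDED"),
]

pieces = split_into_pieces(records, 3)
totals = gather(map_piece(piece) for piece in pieces)
print(dict(totals))
```

In a real cluster each piece would be processed on a different node; here the pieces are simply processed one after another so the shape of the computation is visible.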
  • 22. HDFS EMR Kinesis S3 DynamoDB Data management Pig Analytics languages/engines RDS Redshift AWS Data Pipeline
  • 23. EMR + IMPALA DEMO
  • 24. STARTING AN EMR CLUSTER WITH HADOOP ECOSYSTEM TOOLS PRE-INSTALLED
  • 25. COPY & LOAD OUR DATASET
     $ scp -i EMRKeyPair.pem ~/aws/hadoop/LHRarrivals*.csv hadoop@ec2-54-76-242-238.eu-west-1.compute.amazonaws.com:
     $ ssh -i EMRKeyPair.pem hadoop@ec2-54-76-242-238.eu-west-1.compute.amazonaws.com
     $ hadoop fs -mkdir /data/
     $ hadoop fs -put <uploaded_files> /data/
     $ hadoop fs -ls -h -R /data/
     or at scale, Distributed Copy using S3DistCp to parallel load from S3
     $ . /home/hadoop/impala/conf/impala.conf
     $ hadoop jar /home/hadoop/lib/emr-s3distcp-1.0.jar -Dmapreduce.job.reduces=30 --src s3://s3bucketname/ --dest hdfs://$HADOOP_NAMENODE_HOST:$HADOOP_NAMENODE_PORT/data/ --outputCodec 'none'
     ** Run on a cluster master node
  • 26. CREATE EXTERNAL TABLE
     $ # check the size of our data set
     $ wc -l LHRarrivals*.csv
        850 LHRarrivals2.csv
       1526 LHRarrivals.csv
       2376 total
     $ impala-shell
     Welcome to the Impala shell.
     > create EXTERNAL TABLE flights ( input STRING, id BIGINT, widget STRING, source STRING, resultnum BIGINT, pageurl STRING, scheduled STRING, flightnumber STRING, airport STRING, status STRING, terminal STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/data/';
     > select count(*) from flights;
     Should return count(*) 2376 reflecting the size of the data set
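As a rough illustration of what the comma-delimited table definition on this slide implies, the sketch below maps one line of text onto the eleven columns of the flights table. It is plain Python, not Impala code. The sample line is invented, and the handling of unparseable numbers (returning None, standing in for NULL) is an assumption made for the example.

```python
# Column names and types copied from the CREATE EXTERNAL TABLE statement.
COLUMNS = [
    "input", "id", "widget", "source", "resultnum", "pageurl",
    "scheduled", "flightnumber", "airport", "status", "terminal",
]
BIGINT_COLUMNS = {"id", "resultnum"}


def parse_row(line):
    """Split one line on every comma and pair the fields with the columns."""
    fields = line.rstrip("\n").split(",")
    row = {}
    for name, value in zip(COLUMNS, fields):
        if name in BIGINT_COLUMNS:
            try:
                row[name] = int(value)
            except ValueError:
                row[name] = None  # assumption: treat bad numbers as NULL
        else:
            row[name] = value
    return row


# Hypothetical line, invented for illustration; not taken from the dataset.
sample = "LHR,1,arrivals,demo,7,http://example.com/arrivals,10:05,BA123,Paris,LANDED,T5\n"
row = parse_row(sample)
print(row["flightnumber"], row["terminal"])
```

Because the split happens on every comma, a field that itself contains a comma would shift the columns after it, which is worth checking before loading scraped data this way.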
  • 27. DEMO OF ODBC ACCESS
     This part runs on Amazon WorkSpaces using the Simba Cloudera Impala ODBC Driver.
     Set up an SSH tunnel to the master node so that we can connect from the WorkSpaces desktop to the Impala ODBC port on port 25010.
     A previously configured system DSN allows us to work with the data from our EMR/Impala cluster directly within Microsoft Excel.
  • 28. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE! Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2
  • 29. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
  • 30. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE! BATCH PROCESSING
  • 31. GENERATE ➔ ➔ SHARE! STREAM PROCESSING
  • 32. AMAZON KINESIS
 REAL-TIME DATA STREAM PROCESSING!
  • 33. Real-time response to content in semi-structured data streams
 
 Relatively simple computations on data (aggregates, filters, sliding window, etc.)
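The kind of "relatively simple computation" mentioned here, a filter combined with a sliding-window aggregate, can be sketched in a few lines of Python. This is a generic illustration and does not use the Amazon Kinesis API; the function name, the event format (timestamp in seconds, payload string) and the sample events are assumptions made for the example, and events are assumed to arrive in time order.

```python
from collections import deque


def sliding_window_counts(events, window_seconds, predicate):
    """For each event, yield (timestamp, count), where count is the number
    of events matching predicate in the last window_seconds seconds."""
    window = deque()  # timestamps of matching events still inside the window
    for ts, payload in events:
        if predicate(payload):  # filter
            window.append(ts)
        # Evict timestamps that have fallen out of the window (ts - w, ts].
        while window and window[0] <= ts - window_seconds:
            window.popleft()
        yield ts, len(window)  # aggregate


# Hypothetical stream of (timestamp_seconds, payload) events.
events = [(0, "ok"), (10, "error"), (20, "error"), (65, "ok"), (75, "error")]

counts = list(sliding_window_counts(events, 60, lambda p: p == "error"))
print(counts)
```

Only the timestamps of matching events are kept, so memory use is bounded by the number of matches inside one window rather than by the length of the stream.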
  • 34. Hourly server logs: how your systems went wrong an hour ago
     Weekly / Monthly Bill: What you spent this past billing cycle
     Daily customer report from your website: tells you what deal or ad to try next time
     Daily fraud reports: tells you if there was fraud yesterday
     Daily business reports: tells me how customers used AWS services yesterday
     Real-time metrics: what just went wrong now
     Real-time spending alerts/caps: guaranteeing you can’t overspend
     Real-time analysis: what to offer the current customer now
     Real-time detection: blocks fraudulent use now
     Fast ETL into Amazon Redshift: how are customers using services now
  • 35. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE!
  • 36. GENERATE ➔ STORE ➔ ANALYZE ➔ SHARE! Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2 Amazon EC2 Amazon Elastic MapReduce Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon RDS, Amazon Redshift, AWS Storage Gateway, Data on Amazon EC2 AWS Import / Export AWS Direct Connect
  • 37. GENERATE ➔ ➔ SHARE! STREAM PROCESSING
  • 38. GENERATE ➔ ➔ SHARE! STREAM PROCESSING Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Redshift, Data on Amazon EC2 Amazon Kinesis Stream Processing on Amazon EC2
  • 39. WANT TO KNOW MORE? aws.amazon.com/solutions/case-studies/big-data/
  • 40. ianmas@amazon.com @IanMmmm LARGE SCALE DATA ANALYSIS WITH AWS
 
 Ian Massingham – Technical Evangelist