Big Data on AWS 
Johann Romefort
Agenda 
• What is Big Data? 
• What is AWS? 
• Presenting the tools: How Big Data and AWS fit 
together
What is Big Data? 
• It sits at the intersection of the 3 Vs of data: 
• Velocity (batch / real time / streaming) 
• Volume (terabytes / petabytes) 
• Variety (structured / semi-structured / unstructured)
Why is everybody talking about it? 
• The cost of generating data has gone down 
• By 2015, 3B people will be online, pushing the volume of data created to 8 zettabytes 
• More data = More insights = Better decisions 
• Processing is becoming easier and cheaper thanks to cloud platforms
Data flow and constraints 
Generate 
Ingest / Store 
Process 
Visualize / Share 
The 3 Vs introduce heterogeneity, which makes each of these steps hard to achieve
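The four stages above can be sketched as a chain of small functions. This is only an illustration: the event fields, the two sources, and the in-memory list standing in for storage are all assumptions, not part of the original deck.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def generate() -> Iterator[Dict]:
    """Generate: hypothetical events coming from two kinds of sources."""
    yield {"source": "sensor", "value": 21.5}
    yield {"source": "clickstream", "value": 1.0}
    yield {"source": "sensor", "value": 22.1}


def ingest(events: Iterable[Dict]) -> List[Dict]:
    """Ingest / Store: keep well-formed events (a list stands in for storage)."""
    return [e for e in events if "source" in e and "value" in e]


def process(stored: List[Dict]) -> Counter:
    """Process: count events per source."""
    return Counter(e["source"] for e in stored)


def visualize(counts: Counter) -> str:
    """Visualize / Share: render a plain-text summary, one source per line."""
    return "\n".join(f"{src}: {n}" for src, n in sorted(counts.items()))


report = visualize(process(ingest(generate())))
print(report)
```

Each stage only depends on the output of the previous one, which is what makes it possible to swap a single stage for a managed service later.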
What is AWS? 
• AWS is a cloud computing platform 
• On-demand delivery of IT resources 
• Pay-as-you-go pricing model
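A small arithmetic sketch of what pay-as-you-go means in practice. The hourly rate is borrowed from the "$0.25/h" starting price quoted on the Redshift slide and is used purely for illustration; the usage patterns (30-day month, 22 working days of 8 hours) are assumptions.

```python
def on_demand_cost(hourly_rate: float, hours_used: float) -> float:
    """Cost under pay-as-you-go: you are billed only for the hours you use."""
    return hourly_rate * hours_used


HOURLY_RATE = 0.25          # USD per hour, illustrative figure from the deck
HOURS_IN_MONTH = 24 * 30    # assumption: a 30-day month

# Resource left running all month.
always_on = on_demand_cost(HOURLY_RATE, HOURS_IN_MONTH)

# Same resource, started only during working hours (assumed 22 days x 8 hours).
business_hours = on_demand_cost(HOURLY_RATE, 22 * 8)

print(f"always on:      ${always_on:.2f}")
print(f"business hours: ${business_hours:.2f}")
```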
Cloud Computing 
Compute + Storage + Networking 
Adapts dynamically to ever-changing needs, closely matching the 
requirements of user infrastructure and applications
How does AWS help 
with Big Data? 
• Removes constraints on the ingestion, storage, and 
processing layers and adapts closely to demand. 
• Provides a collection of integrated tools to address 
the 3 Vs of Big Data 
• Virtually unlimited storage and processing capacity 
suits changing data storage and analysis 
requirements.
Computing Solutions 
for Big Data on AWS 
EC2 EMR 
Kinesis 
Redshift
Computing Solutions 
for Big Data on AWS 
EC2 
All-purpose computing instances 
Dynamic provisioning and resizing 
Lets you scale your infrastructure 
at low cost 
Use Case: Well suited for running custom or proprietary 
applications (e.g. SAP HANA, Tableau…)
Computing Solutions 
for Big Data on AWS 
EMR 
‘Hadoop in the cloud’ 
Adapts to the complexity of the analysis 
and the volume of data to process 
Use Case: Offline processing of very large volumes of data, 
possibly unstructured (Variety variable)
Computing Solutions 
for Big Data on AWS 
Kinesis 
Stream processing 
Real-time data 
Scales to adapt to the flow of 
inbound data 
Use Case: Complex Event Processing, click streams, 
sensor data, computation over time windows
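"Computation over a window of time" can be illustrated without any AWS dependency. The sketch below counts click events per fixed (tumbling) 60-second window; the event shape `(timestamp_seconds, page)` and the window size are assumptions chosen for the example, and in a real deployment the events would be read from a stream rather than a list.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_window_counts(
    events: Iterable[Tuple[float, str]], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, key) using fixed, non-overlapping windows."""
    counts: Counter = Counter()
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical click stream: (seconds since start, page visited).
clicks = [(0.5, "/home"), (12.0, "/home"), (59.9, "/cart"),
          (60.0, "/home"), (61.2, "/cart")]

counts = tumbling_window_counts(clicks, window_seconds=60)
for (start, page), n in sorted(counts.items()):
    print(f"window starting at {start}s, {page}: {n}")
```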
Computing Solutions 
for Big Data on AWS 
Redshift 
Data warehouse in the cloud 
Scales to petabytes 
Supports SQL querying 
Start small for just $0.25/h 
Use Case: BI analysis, use of legacy ODBC/JDBC software 
to analyze or visualize data
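To give a feel for the kind of SQL a BI tool would send to a data warehouse, here is a minimal sketch that uses Python's built-in `sqlite3` as a local stand-in. It does not connect to Redshift; the table name, columns, and values are hypothetical, and a real setup would use an ODBC/JDBC or PostgreSQL-compatible driver instead.

```python
import sqlite3

# In-memory SQLite database standing in for a warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor_readings (machine_id TEXT, energy_kwh REAL)")
conn.executemany(
    "INSERT INTO sensor_readings VALUES (?, ?)",
    [("press-1", 1.5), ("press-1", 2.5), ("oven-2", 5.0)],
)

# Typical BI-style aggregate: readings and total energy per machine.
rows = conn.execute(
    """
    SELECT machine_id, COUNT(*) AS readings, SUM(energy_kwh) AS total_kwh
    FROM sensor_readings
    GROUP BY machine_id
    ORDER BY total_kwh DESC
    """
).fetchall()
conn.close()

for machine_id, readings, total_kwh in rows:
    print(machine_id, readings, total_kwh)
```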
Storage Solutions 
for Big Data on AWS 
DynamoDB Redshift 
S3 Glacier
Storage Solutions 
for Big Data on AWS 
DynamoDB 
NoSQL database 
Consistent 
Low-latency access 
Flexible key-value 
data model 
Use Case: Offline processing of very large volumes of data, 
possibly unstructured (Variety variable)
Storage Solutions 
for Big Data on AWS 
S3 
Versatile storage system 
Low-cost 
Fast data retrieval 
Use Case: Backups and disaster recovery, media storage, 
storage for data analysis
Storage Solutions 
for Big Data on AWS 
Glacier 
Archive storage for cold data 
Extremely low-cost 
Optimized for infrequently accessed 
data 
Use Case: Storing raw data logs. Storing media archives. 
Magnetic tape replacement
What makes AWS different 
when it comes to big data?
Integrated Environment for Big Data 
Given the 3 Vs, data processing and storage usually require 
a collection of tools. 
AWS Big Data solutions come already integrated with one 
another 
AWS Big Data solutions also integrate with the whole AWS 
ecosystem (Security, Identity Management, Logging, Backups, 
Management Console…)
Example of products interacting with 
each other.
Tightly integrated, rich 
environment of tools 
+ 
On-demand scaling that tracks 
processing requirements 
= 
Extremely cost-effective and easy-to-deploy 
solution for big data needs
Use Case: 
Real-time IoT Analytics 
Gather data in real time from sensors deployed in a 
factory and send it for immediate processing 
• Error detection: real-time detection of hardware 
problems 
• Optimization and energy management
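Real-time error detection of this kind usually comes down to evaluating a rule over recent readings. The sketch below is one possible rule, not the one used in the project: it raises an alert when at least `min_hits` readings exceed a threshold within a trailing time window. The reading format, threshold, and window length are assumptions, and readings are assumed to arrive in time order.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


def detect_overheating(
    readings: Iterable[Tuple[float, float]],
    window_seconds: float,
    threshold: float,
    min_hits: int,
) -> List[float]:
    """Return the timestamps at which the rule fires.

    The rule fires on a reading above `threshold` when, including it, at least
    `min_hits` such readings fall within the trailing `window_seconds`.
    """
    recent_hits: Deque[float] = deque()  # timestamps of over-threshold readings
    alerts: List[float] = []
    for timestamp, value in readings:
        over = value > threshold
        if over:
            recent_hits.append(timestamp)
        # Drop hits that have fallen out of the trailing window.
        while recent_hits and recent_hits[0] <= timestamp - window_seconds:
            recent_hits.popleft()
        if over and len(recent_hits) >= min_hits:
            alerts.append(timestamp)
    return alerts


# Hypothetical temperature readings: (seconds, degrees Celsius).
readings = [(0, 70.0), (5, 91.0), (10, 92.5), (15, 95.0), (40, 93.0)]
alerts = detect_overheating(readings, window_seconds=30, threshold=90.0, min_hits=3)
print(alerts)
```

Requiring several hits inside the window, rather than a single one, is a simple way to avoid alerting on an isolated noisy reading.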
First Version of the infrastructure 
[Architecture diagram. Labels: sensors; aggregate data; on customer site; Node.js stream processor; evaluate rules over time window; MongoDB; feed algorithm; in-house Hadoop cluster; write raw data for further processing; backup]
Second Version of the infrastructure 
[Architecture diagram. Labels: sensors; aggregate data; on customer site; Kinesis; evaluate rules over time window; Redshift for BI analysis; write raw data for archiving; Glacier]
Thank You 
romefort@gmail.com 
follow me on @romefort
