WEB LOG ANALYSIS
System Overview
[Diagram: visitors send HTTP requests to web servers on Amazon EC2 (1)
and to the Amazon CloudFront content delivery network (2). Log files
from both are stored in a log files repository on Amazon S3 (3). An
Amazon Elastic MapReduce Hadoop cluster (4), extended with Spot
Instances (5), processes the logs. Results go to an analytics database
on Amazon RDS (6, 7), which the analyst uses.]
Amazon Web Services provides services and infrastructure to
build reliable, fault-tolerant, and highly available web applications
in the cloud. In production environments, these applications can
generate huge amounts of log information.
This data can be an important source of knowledge for any
company that is operating web applications. Analyzing logs can
reveal information such as traffic patterns, user behavior,
marketing profiles, etc.
[Service icons: Amazon EC2, Amazon EC2 Spot, Amazon S3, Amazon CloudFront, Amazon EMR. Series: AWS Reference Architectures]
However, as the web application grows and the number of visitors
increases, storing and analyzing web logs becomes increasingly
challenging.
This diagram shows how to use Amazon Web Services to build a
scalable and reliable large-scale log analytics platform. The core
component of this architecture is Amazon Elastic MapReduce, a
web service that enables analysts to process large amounts of
data easily and cost-effectively using a hosted Hadoop framework.
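To make the MapReduce idea concrete, here is a minimal local sketch of the map, shuffle, and reduce phases applied to web logs. It is plain Python, not Amazon Elastic MapReduce or Hadoop code. The sample lines, the Common Log Format-like layout, and the function names `map_line`, `shuffle`, `reduce_counts`, and `count_hits` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample lines in a Common Log Format-like layout (assumed data).
SAMPLE_LOGS = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2023:13:55:40 +0000] "GET /about.html HTTP/1.1" 200 1200',
    '203.0.113.5 - - [10/Oct/2023:13:56:01 +0000] "GET /index.html HTTP/1.1" 304 0',
    'malformed line',
]


def map_line(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (requested_path, 1) for each parseable log line."""
    parts = line.split('"')
    if len(parts) < 3:
        return  # no quoted request section, so skip the line
    request = parts[1].split()
    if len(request) >= 2:
        yield request[1], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_counts(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_hits(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an iterable of log lines."""
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(shuffle(pairs))


if __name__ == "__main__":
    print(count_hits(SAMPLE_LOGS))
```

In a real cluster the map and reduce phases would run in parallel across many nodes. The sketch runs them sequentially only to show how the data flows from raw log lines to per-page hit counts.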
1 The web front-end servers run on Amazon Elastic Compute
Cloud (Amazon EC2) instances.
2 Amazon CloudFront is a content delivery network that
distributes static files to customers with low latency and
high data transfer speeds. This service also generates
valuable log information.
3 Log files are periodically uploaded to Amazon Simple
Storage Service (Amazon S3), a highly available and
reliable data store. Data is sent in parallel from multiple web
servers or edge locations.
4 An Amazon Elastic MapReduce cluster processes the data
set. Amazon Elastic MapReduce uses a hosted Hadoop
framework, which processes the data in a parallel job flow.
5 When Amazon EC2 has unused capacity, it offers EC2
instances at a reduced cost, called the Spot Price. This
price fluctuates based on availability and demand. If your
workload is flexible in terms of time of completion or required
capacity, you can dynamically extend the capacity of your
cluster using Spot Instances and significantly reduce the
cost of running your job flows.
6 Data processing results are pushed back to a relational
database using tools like Apache Hive. The database
can be an Amazon Relational Database Service (Amazon
RDS) instance. Amazon RDS makes it easy to set up,
operate, and scale a relational database in the cloud.
7 Like many services, Amazon RDS instances are
priced on a pay-as-you-go model. After analysis, the
database can be backed up into Amazon S3 as a database
snapshot and then terminated. The database can then be
recreated from the snapshot whenever needed.
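The cost effect of extending a cluster with Spot Instances can be illustrated with simple arithmetic. The sketch below is not an AWS pricing calculation: the hourly prices, the instance-hour total, and the function names `blended_cost` and `savings_fraction` are hypothetical, and real Spot Prices fluctuate with availability and demand.

```python
def blended_cost(total_instance_hours: float, spot_fraction: float,
                 on_demand_price: float, spot_price: float) -> float:
    """Cost of a job when a fraction of its instance-hours runs on Spot Instances."""
    if not 0.0 <= spot_fraction <= 1.0:
        raise ValueError("spot_fraction must be between 0 and 1")
    spot_hours = total_instance_hours * spot_fraction
    on_demand_hours = total_instance_hours - spot_hours
    return on_demand_hours * on_demand_price + spot_hours * spot_price


def savings_fraction(total_instance_hours: float, spot_fraction: float,
                     on_demand_price: float, spot_price: float) -> float:
    """Relative saving compared with running every instance-hour at the on-demand price."""
    baseline = total_instance_hours * on_demand_price
    mixed = blended_cost(total_instance_hours, spot_fraction,
                         on_demand_price, spot_price)
    return 1.0 - mixed / baseline


if __name__ == "__main__":
    # Hypothetical figures: 100 instance-hours, 60% of them on Spot,
    # on-demand at 0.10 per hour and Spot at 0.03 per hour.
    cost = blended_cost(100, 0.6, on_demand_price=0.10, spot_price=0.03)
    saving = savings_fraction(100, 0.6, on_demand_price=0.10, spot_price=0.03)
    print(f"blended cost: {cost:.2f}, saving: {saving:.0%}")
```

With these assumed figures, 40 instance-hours at 0.10 plus 60 instance-hours at 0.03 comes to about 5.80, against 10.00 for an all on-demand run.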

AWS Log Analytics Architecture
