Centralized Logs/Data, Data Analytics,
Visualization and ETL (Extract, Transform and Load)
Services
Mr. Subramanyam Tirumani Vemala
subramanyam.vemala@capgemini.com
Centralized Logs, Data Analytics and
Visualization:
This presentation discusses multiple approaches for processing:
1. Centralized Logs,
2. Data Analytics,
3. Visualization and
4. ETL (Extract, Transform and Load)
The approach to implement should be chosen based on the exact
requirements.
Approach-1: Elasticsearch and Kibana (third-party
tools)
Approach-2: Amazon Elasticsearch + built-in
Kibana (AWS services)
Amazon Elasticsearch – A sample
Architecture:
Approach-3: AWS Glue (An AWS ETL service)
• AWS Glue is a fully managed extract, transform, and load (ETL) service
that makes it easy for customers to prepare and load their data for
analytics.
• You can create and run an ETL job with a few clicks in the AWS
Management Console. You simply point AWS Glue to your data stored
on AWS, and AWS Glue discovers your data and stores the associated
metadata (e.g. table definition and schema) in the AWS Glue Data
Catalog.
• Once cataloged, your data is immediately searchable, queryable, and
available for ETL.
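The cataloging step described above can be pictured with a toy sketch. This is not the AWS Glue API: `infer_schema`, the `app_logs` table name, and the sample records are all hypothetical, and the code only illustrates the idea of scanning data to derive a table definition (column names and types) that a catalog could store.

```python
from typing import Any


def infer_schema(records: list[dict[str, Any]]) -> dict[str, str]:
    """Derive a column -> type-name mapping by scanning sample records.

    Toy stand-in for the discovery step; not the AWS Glue API.
    """
    schema: dict[str, str] = {}
    for record in records:
        for column, value in record.items():
            type_name = type(value).__name__
            previous = schema.get(column)
            if previous is None:
                schema[column] = type_name
            elif previous != type_name:
                # Conflicting types across records: fall back to string.
                schema[column] = "str"
    return schema


# Hypothetical sample data standing in for files stored on AWS.
sample_records = [
    {"host": "web-1", "status": 200, "latency_ms": 12.5},
    {"host": "web-2", "status": 500, "latency_ms": 230.0},
]

# A dictionary plays the role of the data catalog in this sketch.
catalog = {"app_logs": infer_schema(sample_records)}
print(catalog)
```

Once the schema is stored under a table name, later jobs can look it up instead of re-reading the raw data, which is the point of cataloging before ETL.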
AWS Glue – an AWS ETL service (Extract,
Transform and Load) – A sample Architecture:
AWS Glue – Another sample Architecture:
Approach-4: AWS Data Pipeline – ETL (Extract,
Transform and Load)
• AWS Data Pipeline helps you move, integrate, and process data across
AWS compute and storage resources, as well as your on-premises
resources.
• AWS Data Pipeline supports integration of data and activities across
multiple AWS regions.
• With AWS Data Pipeline, you can regularly access your data where it’s
stored, transform and process it at scale, and efficiently transfer the
results to AWS services such as Amazon S3, Amazon RDS, Amazon
DynamoDB, and Amazon EMR.
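The access, transform, and transfer flow in the bullets above follows the general extract-transform-load pattern. A minimal sketch in plain Python, assuming a hypothetical CSV source and an in-memory list as the target (none of these names come from AWS Data Pipeline itself):

```python
import csv
import io

# Hypothetical source data; in practice this would be read from where the data is stored.
RAW_CSV = "order_id,amount\n1,10.50\n2,4.25\n3,not-a-number\n"


def extract(text: str) -> list[dict[str, str]]:
    """Read rows from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[dict[str, float]]:
    """Convert fields to numeric types and skip rows that fail to parse."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append(
                {"order_id": int(row["order_id"]), "amount": float(row["amount"])}
            )
        except ValueError:
            continue  # malformed row: leave it out of the result
    return cleaned


def load(rows: list[dict[str, float]], target: list) -> int:
    """Append the transformed rows to the target and report how many were loaded."""
    target.extend(rows)
    return len(rows)


warehouse: list = []  # stand-in for a target such as a database table
loaded = load(transform(extract(RAW_CSV)), warehouse)
print(loaded, warehouse)
```

Keeping the three stages as separate functions mirrors how a scheduled pipeline chains independent activities, so each stage can be retried or replaced on its own.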
AWS Data Pipeline - A sample Architecture:
Approach-5: AWS Lake Formation
• AWS Lake Formation is a service that makes it easy to set up a secure
data lake in days. A data lake is a centralized, curated, and secured
repository that stores all your data, both in its original form and
prepared for analysis.
• A data lake enables you to break down data silos and combine
different types of analytics to gain insights and guide better business
decisions.
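The idea of keeping data both in its original form and prepared for analysis can be sketched as two zones in one store. The code below is only an illustration: the in-memory `lake` dictionary, the `raw/` and `curated/` prefixes, and the sample records are assumptions, not part of AWS Lake Formation.

```python
import json

# Hypothetical in-memory stand-in for data lake storage, keyed by path.
lake: dict[str, str] = {}


def ingest_raw(name: str, payload: str) -> str:
    """Store the data unchanged under the raw zone."""
    key = f"raw/{name}"
    lake[key] = payload
    return key


def curate(raw_key: str) -> str:
    """Write a cleaned copy under the curated zone, leaving the raw copy intact."""
    records = [json.loads(line) for line in lake[raw_key].splitlines() if line.strip()]
    cleaned = [
        {"user": r["user"].strip().lower(), "amount": float(r["amount"])}
        for r in records
    ]
    curated_key = raw_key.replace("raw/", "curated/", 1)
    lake[curated_key] = json.dumps(cleaned)
    return curated_key


raw_key = ingest_raw(
    "orders.jsonl",
    '{"user": " Alice ", "amount": "10.5"}\n{"user": "BOB", "amount": "3"}\n',
)
curated_key = curate(raw_key)
print(sorted(lake))
```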
AWS Data Lake – A sample Architecture:
Lake Formation and Data Lake storage - A
sample Architecture:
Lake Formation – Security:
Approach-6: Amazon EMR – Elastic MapReduce -
Managed Hadoop Framework. (Big Data)
• Amazon Elastic MapReduce (Amazon EMR) is a web service that enables
businesses, researchers, data analysts, and developers to easily and cost-
effectively process vast amounts of data.
• Amazon EMR is the industry-leading cloud big data platform for processing
vast amounts of data using open source tools such as Apache
Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto.
• With EMR you can run petabyte-scale analysis at less than half of the
cost of traditional on-premises solutions and over 3x faster than standard
Apache Spark.
• For short-running jobs, you can spin up and spin down clusters and pay per
second for the instances used. For long-running workloads, you can
create highly available clusters that automatically scale to meet demand.
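The MapReduce model that gives the service its name can be shown in a few lines of plain Python. This sketch does not use EMR or Hadoop; the function names and the sample log lines are hypothetical, and everything runs in one process rather than across a cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input; a real job would read from distributed storage.
log_lines = ["ERROR disk full", "INFO job done", "ERROR disk slow"]
counts = reduce_phase(shuffle(map_phase(log_lines)))
print(counts)
```

On a cluster, the map and reduce functions run in parallel on many nodes and the shuffle moves data between them; the logic per record stays the same as in this single-process version.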
How Amazon EMR (Elastic MapReduce) works:
Approach-7: Amazon Kinesis - Easily collect, process,
and analyze video and data streams in real time.
• Amazon Kinesis makes it easy to collect, process, and analyze real-time,
streaming data so you can get timely insights and react quickly to new
information.
• Amazon Kinesis offers key capabilities to cost-effectively process streaming
data at any scale, along with the flexibility to choose the tools that best suit
the requirements of your application.
• With Amazon Kinesis, you can ingest real-time data such as video, audio,
application logs, website clickstreams, and IoT telemetry data for machine
learning, analytics, and other applications.
• Amazon Kinesis enables you to process and analyze data as it arrives and
respond instantly instead of having to wait until all your data is collected
before the processing can begin.
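Processing data as it arrives, rather than after everything is collected, can be sketched with a generator that emits a result each time a time window closes. This is plain Python, not the Amazon Kinesis API; `tumbling_window_counts` and the sample clickstream are hypothetical, and the sketch assumes events arrive ordered by timestamp.

```python
from collections import Counter
from typing import Iterable, Iterator


def tumbling_window_counts(
    events: Iterable[tuple[int, str]], window_seconds: int
) -> Iterator[tuple[int, dict[str, int]]]:
    """Yield (window_start, counts) as soon as each window closes.

    Assumes events are (timestamp_seconds, key) pairs in timestamp order.
    """
    current_start = None
    counts: Counter = Counter()
    for timestamp, key in events:
        window_start = timestamp - (timestamp % window_seconds)
        if current_start is None:
            current_start = window_start
        elif window_start != current_start:
            # A new window began: emit the finished one without waiting for more data.
            yield current_start, dict(counts)
            current_start, counts = window_start, Counter()
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)  # flush the last open window


# Hypothetical clickstream: (seconds since start, page).
clickstream = [(0, "home"), (3, "cart"), (7, "home"), (12, "home"), (14, "checkout")]
results = list(tumbling_window_counts(clickstream, window_seconds=10))
print(results)
```

Because the function is a generator, a consumer sees the first window's counts as soon as the first event of the next window arrives, which is the behaviour the bullet above describes.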
Amazon Kinesis – Data Analytics:
Amazon Kinesis – Data Streams:
Amazon Kinesis – Video Streams:
Amazon Kinesis – Data Firehose:
Some slides for more information:
AWS Analytics services: The AWS service to use
depends on the use case.
AWS Data Lake: The choice of data storage
service depends on the type of input data.
AWS Data movement: Use this when live data
streaming is needed, for example with Amazon MSK or Amazon Kinesis.
Visualization Tools: Choose the right one based
on the approach selected above.
1. Amazon QuickSight – from AWS.
2. Kibana - Kibana is a popular open source visualization tool designed
to work with Elasticsearch.
AWS Analytics Services: The service
is chosen based on the exact need.
1. Amazon Elasticsearch Service
2. Athena
3. Redshift
4. EMR
5. Kinesis
6. Elasticsearch (https://www.elastic.co/)
Appendix 1:
https://aws.amazon.com/solutions/implementations/centralized-logging/#:~:text=The%20Centralized%20Logging%20solution%20enables,multiple%20accounts%20and%20AWS%20Regions.
https://aws.amazon.com/glue/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc
https://aws.amazon.com/lake-formation/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc
https://www.elastic.co/
https://aws.amazon.com/big-data/datalakes-and-analytics/
Appendix 2:
https://aws.amazon.com/datapipeline/#:~:text=AWS%20Data%20Pipeline%20is%20a,data%20sources%2C%20at%20specified%20intervals.&text=AWS%20Data%20Pipeline%20also%20allows,in%20on%2Dpremises%20data%20silos.
https://aws.amazon.com/blogs/big-data/analyze-data-in-amazon-dynamodb-using-amazon-sagemaker-for-real-time-prediction/
https://aws.amazon.com/emr/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc
