Introduction To
Data Analysis, Storage, & Processing Solutions
Data analysis and data analytics solutions
Generally, analysis is the examination of something in order to understand
its nature or determine its essential features.
Specifically, data analysis is the process of compiling, processing, and
analyzing data so that you can use it to make informed decisions.
Analytics is the systematic analysis of data.
Data analytics is the specific analytical process applied to that data.
Why use data analytics? 🤔
It’s simple: to base business and other decisions on data rather than on
intuition alone. Specific use cases include:
● Customer Personalization
● Fraud Detection
● Security Threat Detection
● User Behaviour
● Financial modeling and forecasting
and many more...
Components of a data analytics solution
Steps of a data analysis solution
1. Get the data [Collect, Store]: Know where your data comes from.
2. Discover and analyze your data [Analyze/Process]: Know the
options for processing your data.
3. Visualize and learn from your data [Consume/Visualize]: Know
what you need to learn from your data.
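The three steps above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the sample records, the field names `region` and `amount`, and the function names are all hypothetical and do not come from the slides.

```python
import json
from collections import defaultdict

# Hypothetical sample data standing in for a real source system.
RAW = (
    '[{"region": "north", "amount": 120.0},'
    ' {"region": "south", "amount": 80.0},'
    ' {"region": "north", "amount": 30.0}]'
)

def collect(raw: str) -> list[dict]:
    """Step 1: get the data (here, parse JSON already held in memory)."""
    return json.loads(raw)

def analyze(records: list[dict]) -> dict[str, float]:
    """Step 2: process the data (total amount per region)."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record["region"]] += record["amount"]
    return dict(totals)

def visualize(totals: dict[str, float]) -> str:
    """Step 3: consume the result (a plain-text bar chart, one '#' per 10 units)."""
    lines = []
    for region, total in sorted(totals.items()):
        bar = "#" * int(total // 10)
        lines.append(f"{region:<6} {bar} {total:.0f}")
    return "\n".join(lines)

totals = analyze(collect(RAW))
report = visualize(totals)
print(report)
```

In a real solution each step would typically be a separate service, for example storage for step 1, a processing engine for step 2, and a dashboard tool for step 3.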
Challenges of data analytics 😑
Knowledge check
Scenario
My business has a set of 15 JSON data files that are
each about 2.5 GB in size. They are placed on a file
server once an hour. They must be ingested as soon
as they arrive in this location. This data must be
combined with all transactions from the financial
dashboard for this same period, then compared to the
recommendations from the marketing engine. All data
is fully cleansed. The results from this time period
must be made available to decision makers by 10
minutes after the hour in the form of financial
dashboards.
Based on the scenario, which of
the following Vs pose a
challenge for this business?
● Volume
● Velocity
● Variety
● Veracity
● Value
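To put rough numbers on the scenario, the arithmetic below estimates the hourly volume and the sustained throughput needed to meet the deadline. It assumes decimal units (1 GB = 1000 MB) and that the full 10 minutes is available for processing; both are simplifying assumptions, not facts from the scenario.

```python
# Figures taken from the scenario text.
FILES_PER_HOUR = 15
FILE_SIZE_GB = 2.5
DEADLINE_MINUTES = 10

# Data arriving each hour and each day.
hourly_volume_gb = FILES_PER_HOUR * FILE_SIZE_GB
daily_volume_gb = hourly_volume_gb * 24

# Sustained throughput needed to finish within the deadline
# (assumption: 1 GB = 1000 MB, whole window usable for processing).
required_mb_per_s = hourly_volume_gb * 1000 / (DEADLINE_MINUTES * 60)

print(f"Hourly volume: {hourly_volume_gb} GB")
print(f"Daily volume:  {daily_volume_gb} GB")
print(f"Required rate: {required_mb_per_s} MB/s")
```

Under these assumptions the business receives 37.5 GB per hour (900 GB per day) and must sustain about 62.5 MB/s, before counting the time needed to join the financial and marketing data.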
Volume - data storage
When businesses have more data than they are able to process and analyze,
they have a volume problem.
Classification of data source types:
● Structured data
● Semistructured data
● Unstructured data
Unstructured data is every file we store, every picture we take, and every email we send.
Introduction to
Amazon S3
Amazon S3 is object
storage built to store and
retrieve any amount of
data from anywhere.
It is well suited to
storing your semistructured
and unstructured data on
the internet.
Amazon S3 concepts
How does S3 store your data?
- Amazon S3 stores data as objects within buckets.
How do you access your content?
- Through the object key, which uniquely identifies an object within its bucket.
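As a minimal sketch of how a bucket and an object key combine to address an object, the helper below builds a virtual-hosted-style S3 URL. The function name `s3_object_url`, the bucket name, the key, and the default Region are hypothetical values chosen for illustration.

```python
from urllib.parse import quote

def s3_object_url(bucket: str, key: str, region: str = "us-east-1") -> str:
    """Build a virtual-hosted-style URL from a bucket name and an object key.

    The key is percent-encoded; '/' is kept so key prefixes stay readable.
    """
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"

# Hypothetical bucket and key.
url = s3_object_url("example-analytics-bucket", "raw/2024/sales data.json")
print(url)
```

Whether the URL can actually be fetched depends on the object's permissions; this sketch only shows how the address is formed.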
Data Analysis solution on Amazon S3
● Decoupling of storage from compute and data processing
● Centralized data architecture
● Integration with clusterless and serverless AWS services
● Standardized Application Programming Interfaces (APIs)
Knowledge Check
Which of the following elements does an Amazon S3 object URL contain?
● Object key
● Bucket
● User key
● Access token
Introduction to
data lakes
● A data lake is a centralized repository that allows you to store structured, semistructured, and unstructured data at any scale.
Benefits of data lakes
● Single source of truth
● Store any type of data, regardless of structure
● Data can be analyzed using artificial intelligence (AI) and machine learning (ML)
Introduction to data storage methods
- Data Lakes
- Data Warehouse
Data Warehouse
A data warehouse is a central repository of structured data from many data
sources. This data is transformed, aggregated, and prepared for business
reporting and analysis.
Data Marts
Traditional data warehousing: pros and cons
Pros                         | Cons
Fast data retrieval          | Costly to implement
Curated data sets            | Maintenance can be challenging
Centralized storage          | Security concerns
Better business intelligence | Hard to scale to meet demand
Amazon Redshift
It is a cloud-based, scalable, secure environment for your data warehouse.
Benefits of Amazon Redshift
● Faster performance: claimed to be up to 10x faster than other data warehouses
● Easy to set up, deploy, and manage
● Secure
● Scales quickly to meet your needs
Data storage on a big scale
We have discussed several recommendations for storing data:
● When storing individual objects or files, AWS recommends Amazon S3.
● When storing massive volumes of data, both semistructured and
unstructured, AWS recommends building a data lake on Amazon S3.
● When storing massive amounts of structured data for complex analysis,
AWS recommends storing your data in Amazon Redshift.
Apache Hadoop
Hadoop uses a distributed processing architecture, in which a task is mapped to a cluster of
commodity servers for processing.
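The idea of splitting a task across workers and then combining the results can be illustrated with a single-process word count in the map/shuffle/reduce style. This is a conceptual sketch only: it does not use any Hadoop API, the input chunks are made up, and each chunk merely stands in for the slice of data one server would handle.

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}

# Hypothetical input, one string per worker.
chunks = ["data lake data", "lake warehouse", "data"]
mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

In a real cluster the map calls would run in parallel on different machines, and the framework would handle the shuffle and any failed tasks.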
Velocity - data processing
When businesses need rapid insights from the data they are collecting, but the systems
in place simply cannot meet the need, there's a velocity problem.
Data processing means the collection and manipulation of data to produce meaningful
information. It is divided into two types: batch processing and stream processing.
Introduction to data processing methods
Introduction to batch data processing
Batch processing is the execution of a series of programs, or jobs, on one or
more computers without manual intervention.
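A minimal sketch of that definition: a fixed series of jobs runs over a complete data set, one after another, with no manual step in between. The job names, the record fields, and the sample batch are hypothetical.

```python
from typing import Callable

# A job takes the full batch of rows and returns a new batch.
Job = Callable[[list[dict]], list[dict]]

def drop_incomplete(rows: list[dict]) -> list[dict]:
    """Job 1: discard rows with a missing amount."""
    return [row for row in rows if row.get("amount") is not None]

def to_cents(rows: list[dict]) -> list[dict]:
    """Job 2: add the amount expressed in whole cents."""
    return [{**row, "amount_cents": round(row["amount"] * 100)} for row in rows]

def run_batch(rows: list[dict], jobs: list[Job]) -> list[dict]:
    """Run every job in order; each job sees the previous job's output."""
    for job in jobs:
        rows = job(rows)
    return rows

# Hypothetical batch collected over one hour.
hourly_batch = [
    {"id": 1, "amount": 12.5},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 3.2},
]
result = run_batch(hourly_batch, [drop_incomplete, to_cents])
print(result)
```

In practice a scheduler would trigger `run_batch` at a set time or when the input is complete, which is what removes the need for manual intervention.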
Batch processing architecture
● Amazon EMR - It is used for processing vast amounts of data, including
extract, transform, and load (ETL) operations.
● AWS Glue - It is a fully managed ETL service that helps us with
data discovery, conversion, mapping, and job scheduling.
● AWS Lambda - It is a serverless compute service that runs your code in
response to events and automatically manages the underlying compute
resources for you.
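As one illustration of code that runs in response to an event, the sketch below shows a Python Lambda handler that reads the bucket name and object key from an Amazon S3 event notification. The bucket name, the key, and the returned dictionary shape are assumptions made for the example; a real function would go on to start the processing step.

```python
from urllib.parse import unquote_plus

def lambda_handler(event, context):
    """Collect the S3 location of every object named in the event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append(f"s3://{bucket}/{key}")
    return {"processed": objects}

# Hypothetical, trimmed-down event for local testing.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-ingest-bucket"},
                "object": {"key": "incoming/sales+2024.json"},
            }
        }
    ]
}
response = lambda_handler(sample_event, None)
print(response)
```

Calling the handler directly with a sample event, as above, is a simple way to check the parsing logic before deploying.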
Batch processing architecture
Basic batch processing architecture using Amazon EMR
Basic batch processing architecture using AWS Glue
Introduction to stream data processing
Stream processing is the collection and processing of a constant stream of
data.
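In contrast to a batch job, a stream processor updates its result as each record arrives. The sketch below keeps a rolling average over the most recent readings; the window size, the function name, and the sample readings are hypothetical.

```python
from collections import deque
from typing import Iterable, Iterator

def rolling_average(events: Iterable[float], window: int = 3) -> Iterator[float]:
    """Yield the average of the last `window` values after each new event."""
    recent: deque[float] = deque(maxlen=window)  # old values drop off automatically
    for value in events:
        recent.append(value)
        yield sum(recent) / len(recent)

# Hypothetical readings; in practice these would arrive continuously.
readings = [10.0, 20.0, 30.0, 40.0]
averages = list(rolling_average(readings))
print(averages)
```

Because the function is a generator, it can consume an unbounded source and emit an up-to-date value after every event instead of waiting for the data set to be complete.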
Benefits of stream processing
Stream processing architecture
1. Amazon Kinesis - It makes it easy to collect, process, and analyze
real-time streaming data so you can get timely insights and react quickly to
new information.
2. Amazon Athena - It is used for querying data directly in Amazon S3 using standard SQL.
3. Amazon QuickSight - It is used to produce insightful dashboards and
reports.
How does stream processing take place?
Stream processing architecture
Combined processing architecture
Thank you!!!

More Related Content

What's hot

ABD330_Combining Batch and Stream Processing to Get the Best of Both Worlds
ABD330_Combining Batch and Stream Processing to Get the Best of Both WorldsABD330_Combining Batch and Stream Processing to Get the Best of Both Worlds
ABD330_Combining Batch and Stream Processing to Get the Best of Both Worlds
Amazon Web Services
 
Explore Your Data Using Amazon QuickSight and Build Your First Machine Learni...
Explore Your Data Using Amazon QuickSight and Build Your First Machine Learni...Explore Your Data Using Amazon QuickSight and Build Your First Machine Learni...
Explore Your Data Using Amazon QuickSight and Build Your First Machine Learni...
Amazon Web Services
 
Deep Dive on Amazon QuickSight - January 2017 AWS Online Tech Talks
Deep Dive on Amazon QuickSight - January 2017 AWS Online Tech TalksDeep Dive on Amazon QuickSight - January 2017 AWS Online Tech Talks
Deep Dive on Amazon QuickSight - January 2017 AWS Online Tech Talks
Amazon Web Services
 
ABD312_Deep Dive Migrating Big Data Workloads to AWS
ABD312_Deep Dive Migrating Big Data Workloads to AWSABD312_Deep Dive Migrating Big Data Workloads to AWS
ABD312_Deep Dive Migrating Big Data Workloads to AWS
Amazon Web Services
 
Amazon QuickSight
Amazon QuickSightAmazon QuickSight
Amazon QuickSight
Amazon Web Services
 
Serverless Streaming Data Processing using Amazon Kinesis Analytics
Serverless Streaming Data Processing using Amazon Kinesis AnalyticsServerless Streaming Data Processing using Amazon Kinesis Analytics
Serverless Streaming Data Processing using Amazon Kinesis Analytics
Amazon Web Services
 
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Amazon Web Services
 
AWS & Database Analytics
AWS & Database AnalyticsAWS & Database Analytics
AWS & Database Analytics
Amazon Web Services
 
Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200
Amazon Web Services
 
Visualization with Amazon QuickSight
Visualization with Amazon QuickSightVisualization with Amazon QuickSight
Visualization with Amazon QuickSight
Amazon Web Services
 
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
Amazon Web Services
 
Creating Rich, Interactive Business Dashboards in Amazon QuickSight (ANT339) ...
Creating Rich, Interactive Business Dashboards in Amazon QuickSight (ANT339) ...Creating Rich, Interactive Business Dashboards in Amazon QuickSight (ANT339) ...
Creating Rich, Interactive Business Dashboards in Amazon QuickSight (ANT339) ...
Amazon Web Services
 
AWS Community Day Nordics 2018 - Aino Health: Transition to serverless and le...
AWS Community Day Nordics 2018 - Aino Health: Transition to serverless and le...AWS Community Day Nordics 2018 - Aino Health: Transition to serverless and le...
AWS Community Day Nordics 2018 - Aino Health: Transition to serverless and le...
Rolf Koski
 
ABD311_Deploying Amazon QuickSight For Enterprise
ABD311_Deploying Amazon QuickSight For EnterpriseABD311_Deploying Amazon QuickSight For Enterprise
ABD311_Deploying Amazon QuickSight For Enterprise
Amazon Web Services
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
Amazon Web Services
 
AWS Initiate Berlin - Einführung in AWS - Eine Übersicht
AWS Initiate Berlin - Einführung in AWS - Eine ÜbersichtAWS Initiate Berlin - Einführung in AWS - Eine Übersicht
AWS Initiate Berlin - Einführung in AWS - Eine Übersicht
Amazon Web Services
 
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Amazon Web Services
 
Building Data Lakes with AWS
Building Data Lakes with AWSBuilding Data Lakes with AWS
Building Data Lakes with AWS
Amazon Web Services
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
Amazon Web Services
 
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
Amazon Web Services
 

What's hot (20)

ABD330_Combining Batch and Stream Processing to Get the Best of Both Worlds
ABD330_Combining Batch and Stream Processing to Get the Best of Both WorldsABD330_Combining Batch and Stream Processing to Get the Best of Both Worlds
ABD330_Combining Batch and Stream Processing to Get the Best of Both Worlds
 
Explore Your Data Using Amazon QuickSight and Build Your First Machine Learni...
Explore Your Data Using Amazon QuickSight and Build Your First Machine Learni...Explore Your Data Using Amazon QuickSight and Build Your First Machine Learni...
Explore Your Data Using Amazon QuickSight and Build Your First Machine Learni...
 
Deep Dive on Amazon QuickSight - January 2017 AWS Online Tech Talks
Deep Dive on Amazon QuickSight - January 2017 AWS Online Tech TalksDeep Dive on Amazon QuickSight - January 2017 AWS Online Tech Talks
Deep Dive on Amazon QuickSight - January 2017 AWS Online Tech Talks
 
ABD312_Deep Dive Migrating Big Data Workloads to AWS
ABD312_Deep Dive Migrating Big Data Workloads to AWSABD312_Deep Dive Migrating Big Data Workloads to AWS
ABD312_Deep Dive Migrating Big Data Workloads to AWS
 
Amazon QuickSight
Amazon QuickSightAmazon QuickSight
Amazon QuickSight
 
Serverless Streaming Data Processing using Amazon Kinesis Analytics
Serverless Streaming Data Processing using Amazon Kinesis AnalyticsServerless Streaming Data Processing using Amazon Kinesis Analytics
Serverless Streaming Data Processing using Amazon Kinesis Analytics
 
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT333) - AWS re:Invent 2018
 
AWS & Database Analytics
AWS & Database AnalyticsAWS & Database Analytics
AWS & Database Analytics
 
Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200Building Your Data Lake on AWS - Level 200
Building Your Data Lake on AWS - Level 200
 
Visualization with Amazon QuickSight
Visualization with Amazon QuickSightVisualization with Amazon QuickSight
Visualization with Amazon QuickSight
 
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
Business Intelligence in Minutes with Amazon Athena and Amazon QuickSight - A...
 
Creating Rich, Interactive Business Dashboards in Amazon QuickSight (ANT339) ...
Creating Rich, Interactive Business Dashboards in Amazon QuickSight (ANT339) ...Creating Rich, Interactive Business Dashboards in Amazon QuickSight (ANT339) ...
Creating Rich, Interactive Business Dashboards in Amazon QuickSight (ANT339) ...
 
AWS Community Day Nordics 2018 - Aino Health: Transition to serverless and le...
AWS Community Day Nordics 2018 - Aino Health: Transition to serverless and le...AWS Community Day Nordics 2018 - Aino Health: Transition to serverless and le...
AWS Community Day Nordics 2018 - Aino Health: Transition to serverless and le...
 
ABD311_Deploying Amazon QuickSight For Enterprise
ABD311_Deploying Amazon QuickSight For EnterpriseABD311_Deploying Amazon QuickSight For Enterprise
ABD311_Deploying Amazon QuickSight For Enterprise
 
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 TiVo: How to Scale New Products with a Data Lake on AWS and Qubole TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
TiVo: How to Scale New Products with a Data Lake on AWS and Qubole
 
AWS Initiate Berlin - Einführung in AWS - Eine Übersicht
AWS Initiate Berlin - Einführung in AWS - Eine ÜbersichtAWS Initiate Berlin - Einführung in AWS - Eine Übersicht
AWS Initiate Berlin - Einführung in AWS - Eine Übersicht
 
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P...
 
Building Data Lakes with AWS
Building Data Lakes with AWSBuilding Data Lakes with AWS
Building Data Lakes with AWS
 
Data Warehouses and Data Lakes
Data Warehouses and Data LakesData Warehouses and Data Lakes
Data Warehouses and Data Lakes
 
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018
 

Similar to Introduction to Data Analysis, Storage & Processing Solutions

Finding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS AnalyticsFinding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS Analytics
Amazon Web Services
 
Azure data lakes
Azure data lakesAzure data lakes
Azure data lakes
Vishwas N
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
Amazon Web Services
 
Data Engineering
Data EngineeringData Engineering
Data Engineering
kiansahafi
 
AWS Big Data Solution Days
AWS Big Data Solution DaysAWS Big Data Solution Days
AWS Big Data Solution Days
Amazon Web Services
 
Data Warehousing AWS 12345
Data Warehousing AWS 12345Data Warehousing AWS 12345
Data Warehousing AWS 12345
AkhilSinghal21
 
Aws centralized logs
Aws centralized logsAws centralized logs
Aws centralized logs
Subramanyam Vemala
 
AWS Data Lakes & Best Practices - GoDgtl
AWS Data Lakes & Best Practices - GoDgtlAWS Data Lakes & Best Practices - GoDgtl
AWS Data Lakes & Best Practices - GoDgtl
Mezzybatliwala
 
AWS Data Lakes and Best Practices
AWS Data Lakes and Best PracticesAWS Data Lakes and Best Practices
AWS Data Lakes and Best Practices
PeeterParkar
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
Amazon Web Services
 
Modern Data Architectures for Business Outcomes
Modern Data Architectures for Business OutcomesModern Data Architectures for Business Outcomes
Modern Data Architectures for Business Outcomes
Amazon Web Services
 
Modern Data Architectures for Business Outcomes
Modern Data Architectures for Business OutcomesModern Data Architectures for Business Outcomes
Modern Data Architectures for Business Outcomes
Amazon Web Services
 
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWSAWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
Amazon Web Services
 
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
Amazon Web Services
 
Agile enterprise analytics on aws
Agile enterprise analytics on awsAgile enterprise analytics on aws
Agile enterprise analytics on aws
Don Gillis
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS
Amazon Web Services
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
Amazon Web Services
 
Data Warehouse
Data Warehouse Data Warehouse
Data Warehouse
MadhuriNigam1
 
AWS Big Data Platform
AWS Big Data PlatformAWS Big Data Platform
AWS Big Data Platform
Amazon Web Services
 
Unleashing the Power of your Data
Unleashing the Power of your DataUnleashing the Power of your Data
Unleashing the Power of your Data
Itai Yaffe
 

Similar to Introduction to Data Analysis, Storage & Processing Solutions (20)

Finding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS AnalyticsFinding Meaning in the Noise: Understanding Big Data with AWS Analytics
Finding Meaning in the Noise: Understanding Big Data with AWS Analytics
 
Azure data lakes
Azure data lakesAzure data lakes
Azure data lakes
 
Fast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWSFast Track to Your Data Lake on AWS
Fast Track to Your Data Lake on AWS
 
Data Engineering
Data EngineeringData Engineering
Data Engineering
 
AWS Big Data Solution Days
AWS Big Data Solution DaysAWS Big Data Solution Days
AWS Big Data Solution Days
 
Data Warehousing AWS 12345
Data Warehousing AWS 12345Data Warehousing AWS 12345
Data Warehousing AWS 12345
 
Aws centralized logs
Aws centralized logsAws centralized logs
Aws centralized logs
 
AWS Data Lakes & Best Practices - GoDgtl
AWS Data Lakes & Best Practices - GoDgtlAWS Data Lakes & Best Practices - GoDgtl
AWS Data Lakes & Best Practices - GoDgtl
 
AWS Data Lakes and Best Practices
AWS Data Lakes and Best PracticesAWS Data Lakes and Best Practices
AWS Data Lakes and Best Practices
 
Database and Analytics on the AWS Cloud
Database and Analytics on the AWS CloudDatabase and Analytics on the AWS Cloud
Database and Analytics on the AWS Cloud
 
Modern Data Architectures for Business Outcomes
Modern Data Architectures for Business OutcomesModern Data Architectures for Business Outcomes
Modern Data Architectures for Business Outcomes
 
Modern Data Architectures for Business Outcomes
Modern Data Architectures for Business OutcomesModern Data Architectures for Business Outcomes
Modern Data Architectures for Business Outcomes
 
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWSAWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
AWS Summit Stockholm 2014 – B4 – Business intelligence on AWS
 
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
AWS re:Invent 2016: How to Build a Big Data Analytics Data Lake (LFS303)
 
Agile enterprise analytics on aws
Agile enterprise analytics on awsAgile enterprise analytics on aws
Agile enterprise analytics on aws
 
(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS(BDT317) Building A Data Lake On AWS
(BDT317) Building A Data Lake On AWS
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
 
Data Warehouse
Data Warehouse Data Warehouse
Data Warehouse
 
AWS Big Data Platform
AWS Big Data PlatformAWS Big Data Platform
AWS Big Data Platform
 
Unleashing the Power of your Data
Unleashing the Power of your DataUnleashing the Power of your Data
Unleashing the Power of your Data
 

Recently uploaded

Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
Tamralipta Mahavidyalaya
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
timhan337
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
Pavel ( NSTU)
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
Vikramjit Singh
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
BhavyaRajput3
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
Levi Shapiro
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
DhatriParmar
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
RaedMohamed3
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
Jheel Barad
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
TechSoup
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
Sandy Millin
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
Delapenabediema
 

Recently uploaded (20)

Home assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdfHome assignment II on Spectroscopy 2024 Answers.pdf
Home assignment II on Spectroscopy 2024 Answers.pdf
 
Honest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptxHonest Reviews of Tim Han LMA Course Program.pptx
Honest Reviews of Tim Han LMA Course Program.pptx
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
Synthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptxSynthetic Fiber Construction in lab .pptx
Synthetic Fiber Construction in lab .pptx
 
Digital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and ResearchDigital Tools and AI for Teaching Learning and Research
Digital Tools and AI for Teaching Learning and Research
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
 
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
June 3, 2024 Anti-Semitism Letter Sent to MIT President Kornbluth and MIT Cor...
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
Palestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptxPalestine last event orientationfvgnh .pptx
Palestine last event orientationfvgnh .pptx
 
Instructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptxInstructions for Submissions thorugh G- Classroom.pptx
Instructions for Submissions thorugh G- Classroom.pptx
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 
Introduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp NetworkIntroduction to AI for Nonprofits with Tapp Network
Introduction to AI for Nonprofits with Tapp Network
 
2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...2024.06.01 Introducing a competency framework for languag learning materials ...
2024.06.01 Introducing a competency framework for languag learning materials ...
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
 
The Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official PublicationThe Challenger.pdf DNHS Official Publication
The Challenger.pdf DNHS Official Publication
 

Introduction to Data Analysis, Storage & Processing Solutions

  • 1. Introduction To Data Analysis, Storage, & Processing Solutions
  • 2. Data analysis and data analytics solutions Generally, analysis is an examination of something in order to understand its nature or determine its essential features. Specifically, data analysis is the process of compiling, processing, and analyzing data so that you can use it to make informed decisions. Analytics is the systematic analysis of data. Data analytics is the specific analytical process being applied.
  • 3. Why use data analytics? 🤔 It’s simple: to make business and other decisions based on data rather than on intuition alone. Specific use cases include: ● Customer personalization ● Fraud detection ● Security threat detection ● User behaviour ● Financial modeling and forecasting ● and many more...
  • 4. Components of data analytics solution
  • 5. Steps of a data analysis solution 1. Get the data [Collect, Store]: Know where your data comes from. 2. Discover and analyze your data [Analyze/Process]: Know the options for processing your data. 3. Visualize and learn from your data [Consume/Visualize]: Know what you need to learn from your data.
  • 7. Knowledge check Scenario My business has a set of 15 JSON data files that are each about 2.5 GB in size. They are placed on a file server once an hour. They must be ingested as soon as they arrive in this location. This data must be combined with all transactions from the financial dashboard for this same period, then compared to the recommendations from the marketing engine. All data is fully cleansed. The results from this time period must be made available to decision makers by 10 minutes after the hour in the form of financial dashboards. Based on the scenario, which of the following Vs pose a challenge for this business? ● Volume ● Velocity ● Variety ● Veracity ● Value
  • 8. Volume - data storage When businesses have more data than they are able to process and analyze, they have a volume problem. Classification of data source types: ● Structured data ● Semistructured data ● Unstructured data
  • 9. Unstructured data is every file we store, every picture we take, and email we send.
  • 10. Introduction to Amazon S3 Amazon S3 is object storage built to store and retrieve any amount of data from anywhere. It is well suited to storing semistructured and unstructured data on the internet.
  • 11. Amazon S3 concepts How does S3 store your data? - Amazon S3 stores data as objects within buckets. How do you access your content? - Through the object key.
  • 12. Data Analysis solution on Amazon S3 ● Decoupling of storage from compute and data processing ● Centralized data architecture ● Integration with clusterless and serverless AWS services ● Standardized Application Programming Interfaces (APIs)
  • 13. Knowledge Check Which of the following elements does an Amazon S3 object URL contain? ● Object key ● Bucket ● User key ● Access token
  • 14. Introduction to data lakes ● A data lake is a centralized repository that allows you to store structured, semistructured, and unstructured data at any scale.
  • 15. Benefits of data lakes ● Single source of Truth ● Store any type of data, regardless of structure ● Can be analyzed using Artificial Intelligence and Machine Learning
  • 16. Introduction to data storage methods - Data Lakes - Data Warehouse
  • 17. Data WareHouse A data warehouse is a central repository of structured data from many data sources. This data is transformed, aggregated, and prepared for business reporting and analysis.
  • 19. Traditional data warehousing: pros and cons Pros: ● Fast data retrieval ● Curated data sets ● Centralized storage ● Better business intelligence Cons: ● Costly to implement ● Maintenance can be challenging ● Security concerns ● Hard to scale to meet demand
  • 20. Amazon Redshift It is a cloud-based, scalable, secure environment for your data warehouse. Benefits of Amazon Redshift: ● Faster performance (10x faster than other data warehouses) ● Easy to set up, deploy, and manage ● Secure ● Scales quickly to meet your needs
  • 21. Data storage at scale We have discussed several recommendations for storing data: ● When storing individual objects or files, AWS recommends Amazon S3. ● When storing massive volumes of data, both semistructured and unstructured, AWS recommends building a data lake on Amazon S3. ● When storing massive amounts of structured data for complex analysis, AWS recommends storing your data in Amazon Redshift.
  • 22. Apache Hadoop Hadoop uses a distributed processing architecture, in which a task is mapped to a cluster of commodity servers for processing.
  • 23. Velocity - data processing When businesses need rapid insights from the data they are collecting, but the systems in place simply cannot meet the need, there's a velocity problem. Data processing means the collection and manipulation of data to produce meaningful information. Data processing is divided into two parts:
  • 24. Introduction to data processing methods
  • 25. Introduction to batch data processing Batch processing is the execution of a series of programs, or jobs, on one or more computers without manual intervention.
  • 26. Batch processing architecture ● Amazon EMR - It is used for processing vast amounts of data. It performs extract, transform, and load (ETL) operations. ● AWS Glue - It is used for processing vast amounts of data. It helps with data discovery, conversion, mapping, and job scheduling. ● AWS Lambda - It is a serverless compute service that runs your code in response to events and automatically manages the underlying compute resources for you.
  • 27. Batch processing architecture Basic Batch processing architecture using Amazon EMR Basic Batch processing architecture using AWS Glue
  • 28. Introduction to stream data processing Stream processing is the collection and processing of a constant stream of data.
  • 29. Benefits of stream processing
  • 30. Stream processing architecture 1. Amazon Kinesis - It makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. 2. Amazon Athena - It is used for querying data directly in Amazon S3. 3. Amazon QuickSight - It is used to produce insightful dashboards and reports.
  • 31. How does stream processing take place?
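
As a rough illustration of slides 11 and 13, an S3 object URL can be assembled from the bucket name and the object key alone. The sketch below is in Python and assumes the virtual-hosted-style URL format. The helper name `s3_object_url`, the bucket `example-bucket`, the key, and the region are all hypothetical, and no AWS SDK is involved.

```python
from urllib.parse import quote


def s3_object_url(bucket: str, key: str, region: str) -> str:
    """Build a virtual-hosted-style URL from a bucket name and an object key.

    Assumption: the bucket uses the standard regional endpoint format.
    """
    # Percent-encode the key, but keep "/" so key prefixes stay readable.
    encoded_key = quote(key, safe="/")
    return f"https://{bucket}.s3.{region}.amazonaws.com/{encoded_key}"


# Hypothetical bucket, key, and region, used only for illustration.
url = s3_object_url("example-bucket", "reports/2024/sales data.json", "us-east-1")
print(url)
```

Note that the URL carries only the bucket and the object key. Nothing in it identifies a user or grants access, which matches the answer given in the editor's notes.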

Editor's Notes

  1. Hadoop Video:
  2. <<You may be wondering why there are two different topics, analysis and analytics, when they sound so similar>><<My story with this>>
  3. A data analysis solution has many components. The analytics performed in each of these components may require different services and different approaches.
  4. Data analysis solutions incorporate many forms of analytics to store, process, and visualize data. Planning a data analysis solution begins with knowing what you need out of that solution, i.e. looking at the big picture. What does the existing solution look like? What is the end result of the model’s output?
  5. An object key is the unique identifier for an object in the bucket. There is no user key or access token built into the URL itself.
  6. Every Amazon S3 object URL contains the bucket and object key for the item. An object key is the unique identifier for an object in the bucket. There is no user key or access token built into the URL itself.
  7. In the last topic, we discussed data storage and Amazon S3. Now it’s time to discuss how the data is organized in this service. Amazon S3 is an amazing object container. Like any bucket, you can put content in it in a neat and orderly fashion, or you can just dump it in. But no matter how the data gets there, once it’s there, you need a way to organize it in a meaningful way so you can find it when you need it.
  8. Need to add Knowledge Check section. Think about it.
  9. As the volume of data has increased, so have the options for storing data. Traditional storage methods such as data warehouses are still very popular and relevant. However, data lakes have become more popular recently. These new options can confuse businesses that are trying to be financially wise and technically relevant. So which is better: data warehouses or data lakes? Neither and both. They are different solutions that can be used together to maintain existing data warehouses while taking full advantage of the benefits of data lakes.
  10. A data warehouse is a central repository of information coming from one or more data sources. Data flows into a data warehouse from transactional systems, relational databases, and other sources. These data sources can include structured, semistructured, and unstructured data. These data sources are transformed into structured data before they are stored in the data warehouse. Data is stored within the data warehouse using a schema. A schema defines how data is stored within tables, columns, and rows.... Business analysts, data scientists, and decision makers access the data through business intelligence (BI) tools, SQL clients, and other analytics applications....
  11. Data warehouses can be massive. Analyzing these huge stores of data can be confusing. Many organizations need a way to limit the tables to those that are most relevant to the analytics users will be performing. Because data marts are generally a copy of data already contained in a data warehouse, they are often fast and simple to implement.
  12. Amazon Redshift Spectrum works like a data lake. Video source: https://www.youtube.com/watch?v=_qKm6o1zK3U (AWS data warehouse)
  13. Each of the AWS processing services we will cover in the next lesson incorporates a temporary storage layer that houses data while it is being processed and analyzed. This data is eventually moved to permanent storage within one of the other solutions we have already discussed. Apache Hadoop can consume data from an Amazon S3 data lake and process it in batches, by script, or in real time. Hadoop can also be used for AI and machine learning analysis. When many people think of working with a massive volume of fast-moving data, the first thing that comes to mind is Hadoop. Within AWS, Hadoop frameworks are implemented using Amazon EMR and AWS Glue.
  14. Scheduled batch processing represents data that is processed in a very large volume on a regularly scheduled basis. For instance, once a week or once a day. It is generally the same amount of data with each load, making these workloads predictable. Periodic batch processing is a batch of data that is processed at irregular times. These workloads are often run once a certain amount of data has been collected. This can make them unpredictable and hard to plan around. Near Real-time processing represents streaming data that is processed in small individual batches. The batches are continuously collected and processed within minutes of the data generation. Real-time processing represents streaming data that is processed in very small individual batches. The batches are continuously collected and processed within milliseconds of the data generation.
  15. Data is collected into batches asynchronously. The batch is sent to a processing system when specific conditions are met, such as a specified time of day. The results of the processing job are then sent to a storage location that can be queried later as needed.
  16. Batch processing can be performed in different ways using AWS services. The first architecture diagram depicts the components and the data flow of a basic batch analytics system using a traditional approach. This approach uses Amazon S3 for storing data, AWS Lambda for intermediate file-level ETL, Amazon EMR for aggregated ETL (heavy lifting, consolidated transformation, and loading engine), and Amazon Redshift as the data warehouse hosting the data needed for reporting. The second architecture diagram depicts the same data flow but uses AWS Glue for aggregated ETL. AWS Glue is a fully managed service, as opposed to Amazon EMR, which requires management and configuration of all of the components within the service. AWS Glue helps with data discovery, conversion, mapping, and job scheduling. In simple words, it simplifies data processing.
  17. Stream data processing gives companies the ability to get insights from their data within seconds of the data being collected.
  18. Consuming data in parallel allows multiple users to work simultaneously on the same data.
  19. In this architecture, sensor data is being collected in the form of a stream. The streaming data is being collected from the sensor devices by Amazon Kinesis Data Firehose. This service is configured to send the data to be processed using Amazon Kinesis Data Analytics. This service filters the data for relevant records and sends the data into another Kinesis Data Firehose process, which places the results into an Amazon S3 bucket at the serving layer. Using Amazon Athena, the data in the Amazon S3 bucket can now be queried to produce insightful dashboards and reports using Amazon QuickSight.
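
The streaming flow described in these notes (collect sensor records, filter for the relevant ones, deliver the results in small batches) can be sketched in plain Python. This is only an in-memory analogy, not the Kinesis API: the function names, the `temperature` field, the threshold of 30.0, and the batch size of 2 are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator, List


def filter_relevant(records: Iterable[Dict], threshold: float) -> Iterator[Dict]:
    """Pass through only readings above a hypothetical relevance threshold."""
    for record in records:
        if record["temperature"] > threshold:
            yield record


def micro_batches(records: Iterable[Dict], batch_size: int) -> Iterator[List[Dict]]:
    """Group a stream into small batches, emitting one whenever it fills up."""
    batch: List[Dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


# Made-up sensor readings standing in for the incoming stream.
sensor_stream = [
    {"sensor": "a", "temperature": 21.5},
    {"sensor": "b", "temperature": 35.0},
    {"sensor": "a", "temperature": 36.2},
    {"sensor": "c", "temperature": 19.8},
    {"sensor": "b", "temperature": 40.1},
]

# Filter first, then deliver in small batches, as in the notes.
delivered = list(
    micro_batches(filter_relevant(sensor_stream, threshold=30.0), batch_size=2)
)
for batch in delivered:
    print(batch)
```

Because both stages are generators, each record moves through the filter and into a batch as soon as it arrives, which is the property that separates stream processing from a scheduled batch job.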