Big Data Analytics
Adoption & Technologies
Instructor
Dendej Sawarnkatat
dendej@gmail.com
Agenda
• Big Data Adoption & Planning
• Big Data Analytics Lifecycle
• Enterprise Technologies
BIG DATA ADOPTION & PLANNING
Organization Prerequisites
• Data management and Big Data
governance frameworks.
• Sound processes and sufficiently skilled
personnel.
• Data quality assessment.
• Longevity of the Big Data environment
• Environment expansion roadmap
Data Procurement
• The nature of the business may make
external data very valuable.
• External data sources may include
government data sources and commercial
data markets.
• The greater the volume and variety of data
that can be supplied, the higher the
chances are of finding hidden insights
from patterns.
Data Costs
• Government-provided data, such as
geospatial data, may be free.
• Most commercially relevant data will need
to be purchased or subscribed to.
• A substantial budget may still be required
to obtain external data.
Privacy
• Analyzing datasets jointly can reveal
confidential information about
organizations or individuals.
• Addressing this requires:
• an understanding of the nature of the data
• knowledge of data privacy regulations
• special techniques for data tagging and
anonymization.
• For example, a car’s GPS log or smart meter
readings, collected over an extended
period of time, can reveal an individual’s
location and behavior.
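As a rough illustration of tagging and anonymization, the sketch below replaces a user identifier with a keyed hash and rounds GPS coordinates to a coarser grid. The key, field names, tag value and sample records are all hypothetical. Keyed hashing is pseudonymization rather than full anonymization, so this is only a starting point under those assumptions.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would come from a managed key store.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(user_id: str) -> str:
    """Replace an identifier with a keyed hash: records stay linkable but not directly identifying."""
    return hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def coarsen(lat: float, lon: float, places: int = 2) -> tuple:
    """Round coordinates (0.01 degree of latitude is roughly 1 km) to blur exact locations."""
    return (round(lat, places), round(lon, places))


def anonymize_record(record: dict) -> dict:
    lat, lon = coarsen(record["lat"], record["lon"])
    # The "tag" field is an assumed convention for marking how the data was treated.
    return {"user": pseudonymize(record["user_id"]), "lat": lat, "lon": lon, "tag": "location:coarse"}


# Hypothetical GPS log entries for one driver.
gps_log = [
    {"user_id": "driver-17", "lat": 13.756331, "lon": 100.501762},
    {"user_id": "driver-17", "lat": 13.745570, "lon": 100.534620},
]
anonymized = [anonymize_record(r) for r in gps_log]
```

Both output records carry the same pseudonym, so trips can still be grouped per driver without storing the original ID.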
Security
• Securing Big Data involves ensuring that
the data networks and repositories are
sufficiently secured.
• Need to employ proper mechanisms:
• Authentication
• Authorization
• Access Levels
Provenance
• Refers to information about the source of
the data and how it has been processed.
• Helps determine the authenticity and quality of
data.
• Can be used for auditing purposes.
Data States
• At different stages in the analytics lifecycle, data
will be in different states.
• These states are:
• data-in-motion
• data-in-use
• data-at-rest.
• Whenever Big Data changes state, it should trigger
the capture of provenance information that is
recorded as metadata.
• Initially, a dataset’s provenance record can be
initialized by recording information that
captures its pedigree.
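The idea of capturing provenance as metadata on every state change can be sketched as follows. The `Dataset` class, its field names and the sample dataset are hypothetical and chosen only for illustration. The three state names come from the slide.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

STATES = ("data-in-motion", "data-in-use", "data-at-rest")


@dataclass
class Dataset:
    name: str
    source: str
    state: str = "data-in-motion"
    provenance: list = field(default_factory=list)

    def __post_init__(self):
        # Initialize the provenance record with the dataset's pedigree.
        self._record("created", f"source={self.source}")

    def _record(self, event: str, detail: str) -> None:
        self.provenance.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "event": event,
            "state": self.state,
            "detail": detail,
        })

    def change_state(self, new_state: str, detail: str) -> None:
        if new_state not in STATES:
            raise ValueError(f"unknown state: {new_state}")
        self.state = new_state
        # Every state change triggers the capture of provenance metadata.
        self._record("state-change", detail)


ds = Dataset(name="smart_meter_readings", source="meter gateway feed")
ds.change_state("data-at-rest", "written to storage")
ds.change_state("data-in-use", "loaded for cleansing")
```

After these calls, `ds.provenance` holds three entries: the initial pedigree and one entry per state change, which could later support auditing.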
Trusted Result
• Provenance information is essential to
being able to realize the value of the
analytic result.
• It answers two questions: what is the origin
of the data, and which steps or algorithms were
used to process the data that led to the result?
Clouds
• Provide remote environments that can host IT
infrastructure.
• A Big Data environment may necessitate that
some or all of that environment be hosted
within a cloud.
• Example
• An enterprise that runs its CRM system in a
cloud decides to add a Big Data solution in
order to run analytics on its CRM data.
• This data can then be shared with its primary
Big Data environment that resides within the
enterprise boundaries.
Cloud Justifications
• Inadequate in-house hardware resources
• Upfront capital investment for system
procurement is not available
• A sandbox environment is needed so that existing
business processes are not impacted
• The Big Data initiative is a proof of concept
• Datasets that need to be processed are
already cloud resident
• The limits of available in-house computing and
storage resources are being reached.
BIG DATA ANALYTICS LIFECYCLE
Figure: the nine stages of the Big Data analytics lifecycle.
Business Case Evaluation
• Presents a clear understanding of the
justification, motivation and goals of
carrying out the analysis.
• Helps decision-makers understand the
business resources that will need to be
utilized and which business challenges the
analysis will tackle.
Data Identification
• Identifying the datasets required for the
analysis project and their sources.
• A wider variety of data sources may
increase the probability of finding hidden
patterns and correlations.
• Data sources can be internal and/or
external to the enterprise, depending on the
business scope of the analysis project and
the nature of the business problems being
addressed.
Data Acquisition and Filtering
• Both internal and external data need to be
persisted once they are generated or enter
the enterprise boundary.
• Metadata can be added via automation to
data from both internal and external
sources to improve classification and
querying.
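A minimal sketch of automated metadata tagging at acquisition time follows. The function name `with_metadata`, the metadata fields and the sample payload are assumptions made for illustration, not part of any particular tool.

```python
import hashlib
import json
from datetime import datetime, timezone


def with_metadata(payload: bytes, source: str, origin: str) -> dict:
    """Wrap a raw payload with automatically derived metadata."""
    return {
        "metadata": {
            "source": source,
            "origin": origin,  # "internal" or "external"
            "size_bytes": len(payload),
            "sha256": hashlib.sha256(payload).hexdigest(),
            "acquired_at": datetime.now(timezone.utc).isoformat(),
        },
        "payload": payload.decode("utf-8"),
    }


# Hypothetical external reading arriving at the enterprise boundary.
raw = json.dumps({"meter": "M-1", "kwh": 3.2}).encode("utf-8")
record = with_metadata(raw, source="smart-meter-feed", origin="external")
```

Fields such as source, size and checksum can later be used to classify and query the persisted data.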
Data Extraction
• Extracts disparate data and transforms it
into a format that the underlying Big Data
solution can use.
• Example: comments and user IDs are extracted
from an XML document.
• Example: the user ID and coordinates of a user
are extracted from a single JSON field.
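The two extraction examples can be sketched with the Python standard library. The XML element names, the JSON field name `location` and its comma-separated layout are assumptions, since the slide's figures are not available.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML document holding comments and user IDs.
xml_doc = """
<comments>
  <comment><userId>17</userId><text>Great service</text></comment>
  <comment><userId>42</userId><text>Late delivery</text></comment>
</comments>
""".strip()

# Extract (user ID, comment) pairs from the XML document.
root = ET.fromstring(xml_doc)
comments = [
    (c.findtext("userId"), c.findtext("text"))
    for c in root.iter("comment")
]

# Hypothetical JSON document: one field packs the user ID and coordinates together.
json_doc = '{"location": "17,13.7563,100.5018"}'
user_id, lat, lon = json.loads(json_doc)["location"].split(",")
position = {"user_id": user_id, "lat": float(lat), "lon": float(lon)}
```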
Data Validation and Cleansing
• Establishes often complex validation rules and
removes any known invalid data.
• Can be used to examine interconnected datasets in
order to fill in missing valid data.
• The presence of invalid data can result in spikes.
• Although such data appears abnormal, it may be
indicative of a new pattern.
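A small sketch of rule-based validation and of filling a missing value from an interconnected dataset follows. The records, the rule (usage cannot be negative) and the field names are hypothetical. A real pipeline might flag abnormal rows for review instead of dropping them, since they may indicate a new pattern.

```python
# Hypothetical records: two interconnected datasets keyed by customer_id.
readings = [
    {"customer_id": 1, "kwh": 3.1, "region": "north"},
    {"customer_id": 2, "kwh": -5.0, "region": "south"},  # invalid: negative usage
    {"customer_id": 3, "kwh": 2.4, "region": None},      # missing region
]
customers = {1: "north", 2: "south", 3: "east"}          # customer_id -> region


def is_valid(row: dict) -> bool:
    """Validation rule (assumed): energy usage must be present and non-negative."""
    return row["kwh"] is not None and row["kwh"] >= 0


cleansed = []
for row in readings:
    if not is_valid(row):
        continue  # remove known invalid data
    if row["region"] is None:
        # Fill the missing value from the interconnected dataset.
        row = {**row, "region": customers.get(row["customer_id"])}
    cleansed.append(row)
```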
Data Aggregation and
Representation
• Integrates multiple datasets to arrive
at a unified view.
• This stage can become complicated because of
differences in data structure and semantics.
• It is important to understand that the same data
can be stored in many different forms.
• Example: two datasets are aggregated using
the Id field.
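Aggregating two datasets on a shared Id field amounts to a join. The sketch below uses plain dictionaries, and the dataset names and values are made up for illustration.

```python
# Hypothetical datasets that share an Id field.
customers = [
    {"Id": 1, "name": "Ann"},
    {"Id": 2, "name": "Bo"},
]
orders = [
    {"Id": 1, "total": 120.0},
    {"Id": 2, "total": 75.5},
    {"Id": 3, "total": 10.0},  # no matching customer
]

# Index one dataset by Id so each lookup is a dictionary access.
orders_by_id = {row["Id"]: row for row in orders}

# Inner join on Id: keep only the Ids present in both datasets.
unified = [
    {**cust, **orders_by_id[cust["Id"]]}
    for cust in customers
    if cust["Id"] in orders_by_id
]
```

Here `unified` holds one merged record per Id found in both datasets. Whether unmatched rows should be kept (an outer join) depends on the analysis.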
Data Analysis
• Usually iterative in nature until the
appropriate pattern or correlation is
uncovered.
• Can be as challenging as combining data
mining and complex statistical analysis
techniques to discover patterns and
anomalies.
• Can be used to generate a statistical or
mathematical model that depicts
relationships between variables.
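One simple instance of a model that depicts a relationship between two variables is a least-squares line. The observations below are invented, and the helper `fit_line` is a hypothetical name. Real analyses would normally use a statistics library and check the fit.

```python
from statistics import mean

# Hypothetical observations: advertising spend (x) and units sold (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = fit_line(x, y)
```

For this toy data the fitted line is y = 2x + 1. Iterating on such models with different variables is one way the analysis stage proceeds.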
Data Analysis Types
1. Confirmatory analysis
2. Exploratory analysis
Confirmatory Data Analysis
• A deductive approach in which the cause of the
phenomenon being investigated is proposed
beforehand; the proposed cause is called a hypothesis.
• The data is then analyzed to prove or
disprove the hypothesis and provide
definitive answers to specific questions.
• Data sampling techniques are typically used.
• Unexpected findings or anomalies are usually
ignored since a predetermined cause was
assumed.
Exploratory Data Analysis
• An inductive approach that is closely
associated with data mining.
• No hypothesis or predetermined assumptions
are generated.
• The data is explored through analysis to
develop an understanding of the cause of the
phenomenon.
• May not provide definitive answers.
• Used to find a general direction that can
facilitate the discovery of patterns or
anomalies.
Data Visualization
• Data visualization techniques and tools are used
to graphically communicate the analysis results
for effective interpretation by business users.
Data Visualization
• Provide users with the ability to perform visual
analysis
• Allow the discovery of answers to questions that
users have not yet even formulated.
Utilization of Analysis Results
• Determines how and where processed
analysis data can be further leveraged.
• Common areas that are explored during
this stage include the following:
1. Input for Enterprise Systems
2. Business Process Optimization
3. Alerts
Input for Enterprise Systems
• The results may be fed directly into
enterprise systems to enhance and
optimize their behaviors and performance.
• New models may be used to improve the
programming logic within existing
enterprise systems or may form the basis
of new systems.
Business Process Optimization
• The identified patterns, correlations and
anomalies discovered during the data
analysis are used to refine business
processes.
• Models may also lead to opportunities to
improve business process logic.
Alerts
• Data analysis results can be used as input
for existing alerts or may form the basis of
new alerts.
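A threshold check is one simple way analysis results can feed alerts. In this sketch the metric names and threshold values are hypothetical and are assumed to have come from an earlier analysis.

```python
def check_alerts(readings: dict, thresholds: dict) -> list:
    """Return one alert message per metric that exceeds its threshold."""
    return [
        f"ALERT: {metric}={value} exceeds {thresholds[metric]}"
        for metric, value in readings.items()
        if metric in thresholds and value > thresholds[metric]
    ]


# Hypothetical thresholds derived from analysis results.
thresholds = {"failed_logins_per_min": 20, "cart_abandon_rate": 0.6}

alerts = check_alerts(
    {"failed_logins_per_min": 35, "cart_abandon_rate": 0.4},
    thresholds,
)
```

Only the first metric exceeds its threshold here, so `alerts` contains a single message.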
ENTERPRISE TECHNOLOGIES
Overview
• An enterprise operates as a layered
system.
• The strategic layer constrains the tactical
layer, which directs the operational layer.
• The alignment of layers is captured
through metrics and performance
indicators.
• These allow the operational layer to gain insight
into how its processes are executing.
Transformation of Data
• Process measurements are aggregated and
enhanced with additional meaning to
become key performance indicators (KPIs).
• KPIs are used by the tactical layer to assess
corporate performance, or business execution.
• KPIs are used in conjunction with other
measurements and understandings to assess
critical success factors (CSFs).
• This forms a series of enrichments: data becomes
information, information becomes knowledge and
knowledge becomes wisdom.
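To make the step from process measurements to KPIs concrete, the sketch below aggregates raw fulfilment times and adds business meaning through a target. The measurements and the 24-hour target are invented for illustration.

```python
from statistics import mean

# Hypothetical operational measurements: order fulfilment times in hours.
fulfilment_hours = [20, 26, 18, 30, 22]
TARGET_HOURS = 24  # assumed service target

# KPI 1: average fulfilment time (aggregation of raw measurements).
avg_hours = mean(fulfilment_hours)

# KPI 2: share of orders fulfilled within target (measurement plus business meaning).
on_time_rate = sum(h <= TARGET_HOURS for h in fulfilment_hours) / len(fulfilment_hours)
```

With these numbers the average is 23.2 hours and 60% of orders meet the target. The second figure only becomes meaningful once the target is known.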
Online Transaction Processing
(OLTP)
• A software system that processes
transaction-oriented data.
• Completes an activity in real time rather than
in batches.
• Stores operational data that is normalized.
OLTP Data
• A common source of structured data and
serves as input to many analytic processes.
• Big Data analysis results can be used to
augment OLTP data stored in the
underlying relational databases.
• The queries supported by OLTP systems
consist of simple insert, delete and
update operations with sub-second
response times.
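The kind of short insert, update and delete transactions an OLTP system handles can be sketched with Python's built-in `sqlite3` module. The `account` table and its rows are hypothetical, and an in-memory database stands in for a production relational system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)")

# Each 'with conn' block is one transaction: committed on success, rolled back on error.
with conn:
    conn.execute("INSERT INTO account (id, owner, balance) VALUES (?, ?, ?)", (1, "Ann", 100.0))
    conn.execute("INSERT INTO account (id, owner, balance) VALUES (?, ?, ?)", (2, "Bo", 50.0))

# A transfer: both updates succeed together or not at all.
with conn:
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = ?", (1,))
    conn.execute("UPDATE account SET balance = balance + 30 WHERE id = ?", (2,))

with conn:
    conn.execute("DELETE FROM account WHERE id = ?", (2,))

rows = conn.execute("SELECT id, owner, balance FROM account ORDER BY id").fetchall()
```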
Online Analytical Processing
(OLAP)
• A system that is used for processing data
analysis queries.
• Forms an integral part of business
intelligence, data mining and machine
learning processes.
• Can serve as both a data source and
a data sink that receives
data.
• Is used in diagnostic, predictive and
prescriptive analytics.
OLAP
• Performs long-running, complex queries
against a multidimensional database
whose structure is optimized for
performing advanced analytics.
• Stores historical data that is aggregated
and denormalized to support fast
reporting capability.
• Uses databases that store historical data in
multidimensional structures.
OLAP
• Provides answers to complex queries
based on the relationships between
multiple aspects of the data.
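Answering a query across several aspects (dimensions) of the data can be sketched as a roll-up over fact rows. The dimensions, fact rows and the helper `roll_up` are hypothetical. A real OLAP system would precompute and index such aggregates rather than scan rows.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, region, product, sales)
facts = [
    (2023, "north", "widget", 100),
    (2023, "south", "widget", 80),
    (2024, "north", "widget", 120),
    (2024, "north", "gadget", 60),
]
DIMENSIONS = ("year", "region", "product")


def roll_up(rows, by):
    """Sum sales grouped by the chosen dimensions."""
    idx = [DIMENSIONS.index(d) for d in by]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


sales_by_year = roll_up(facts, by=("year",))
sales_by_region_product = roll_up(facts, by=("region", "product"))
```

The same fact rows answer different questions depending on which dimensions are kept, which is the essence of a multidimensional query.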
Extract Transform Load (ETL)
• A process of loading data from a source
system into a target system.
• The source system can be a database, a
flat file, or an application.
• Similarly, the target system can be a
database or some other storage system.
• Represents the main operation through
which data warehouses are fed data.
ETL Steps
1. The required data is obtained or extracted from
the sources.
2. The extracts are modified or transformed by the
application of rules.
3. The data is inserted or loaded into the target
system.
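The three steps can be sketched end to end with the standard library. An in-memory CSV stands in for a flat-file source and an in-memory SQLite database for the target. The column names and transformation rules are assumptions chosen for illustration.

```python
import csv
import io
import sqlite3

# 1. Extract: read rows from the source (an in-memory CSV stands in for a flat file).
source = io.StringIO("id,name,amount\n1, ann ,10.5\n2,BO,n/a\n3,cy,7\n")
extracted = list(csv.DictReader(source))


# 2. Transform: apply rules (trim and title-case names, drop non-numeric amounts).
def transform(row):
    try:
        amount = float(row["amount"])
    except ValueError:
        return None
    return (int(row["id"]), row["name"].strip().title(), amount)


transformed = [t for t in map(transform, extracted) if t is not None]

# 3. Load: insert into the target system (an in-memory SQLite database here).
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE sales (id INTEGER, name TEXT, amount REAL)")
with target:
    target.executemany("INSERT INTO sales VALUES (?, ?, ?)", transformed)

loaded = target.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()
```

The row with the non-numeric amount is rejected during the transform step, so only two rows reach the target.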
Data Warehouses
• Central and enterprise-wide repository
consisting of historical and current data.
Data Warehouses
• Store data pertaining to multiple business
entities from different operational systems
• Heavily used by BI to run various analytical
queries.
• Usually interface with an OLAP system to
support multi-dimensional analytical
queries.
Data Warehouses
• Data is periodically extracted, validated,
transformed and consolidated into a single
denormalized database.
• Over time, the amount of data contained
continues to increase, which leads to
slower query response times for data
analysis tasks.
• This is usually resolved by including optimized
databases, called analytical databases, to
handle reporting and data analysis tasks.
Data Mart
• A subset of the data stored in a data
warehouse that typically belongs to a
department, division, or specific line of
business.
• Data warehouses can have multiple data
marts.
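One lightweight way to picture a data mart is as a department-specific view over a warehouse table. The table, view and rows below are hypothetical. Real data marts are often separate physical stores, so this is only a sketch of the subset relationship.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE warehouse_sales (department TEXT, product TEXT, amount REAL)")
with dw:
    dw.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?)", [
        ("marketing", "campaign-a", 500.0),
        ("finance", "audit", 300.0),
        ("marketing", "campaign-b", 250.0),
    ])

# Data mart for one department, expressed as a view over the warehouse table.
dw.execute("""
    CREATE VIEW marketing_mart AS
    SELECT product, amount FROM warehouse_sales WHERE department = 'marketing'
""")

mart_rows = dw.execute(
    "SELECT product, amount FROM marketing_mart ORDER BY product"
).fetchall()
```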
Data Mart
• Enterprise-wide data is collected and
business entities are then extracted.
• Domain-specific entities are persisted into
the data warehouse via an ETL process.
Traditional BI
• Primarily utilizes descriptive and
diagnostic analytics.
• Provides information on historical and
current events.
• Only provides answers to correctly
formulated questions.
Traditional BI
• Correctly formulating questions requires
an understanding of business problems
and issues and of the data itself.
BI Report Types
BI reports can be categorized based on
different KPIs:
• ad-hoc reports
• dashboards
Ad-hoc Reports
• A process that involves manually processing
data to produce custom-made reports.
• Usually focuses on a specific area of the
business, such as its marketing or supply chain
management.
• The generated custom reports are detailed and
often tabular in nature.
Dashboards
• A holistic view of key business areas.
• The information displayed is generated at
periodic intervals in real-time or near-real-time.
• The presentation of data is graphical in nature,
using bar charts, pie charts and gauges.
Big Data BI
• Builds upon traditional BI by combining data
stored in the data warehouse with semi-structured
and unstructured data sources.
• Comprises both predictive and prescriptive
analytics.
• Facilitates the development of an
enterprise-wide understanding of business performance.
Big Data BI vs Traditional BI
• Traditional BI analyses generally focus on
individual business processes.
• Big Data BI analyses focus on multiple
business processes simultaneously.
• Helps reveal patterns and anomalies across a
broader scope within the enterprise.
• Leads to data discovery by identifying insights
and information that may have been
previously absent or unknown.
Hybrid Data Warehouse
• A uniform and central repository of
structured, semi-structured and unstructured
data.
• Provides Big Data BI tools with all of
the required data.
• Eliminates the need to connect to multiple
data sources to retrieve or access data.
• Establishes a standardized data access layer
across a range of data sources.

Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 

001 More introduction to big data analytics

  • 1. Big Data Analytics Adoption & Technologies Instructor Dendej Sawarnkatat dendej@gmail.com
  • 2. Agenda • Big Data Adoption & Planning • Big Data Analytics Lifecycle • Enterprise Technologies
  • 3. BIG DATA ADOPTION & PLANNING
  • 4. Organization Prerequisites • Data management and Big Data governance frameworks. • Sound processes and sufficiently skilled personnel. • Data quality assessment. • Longevity of the Big Data environment. • Environment expansion roadmap.
  • 5. Data Procurement • The nature of the business may make external data very valuable. • External data sources may include government data sources and commercial data markets. • The greater the volume and variety of data that can be supplied, the higher the chances of finding hidden insights from patterns.
  • 6. Data Costs • Government-provided data, such as geospatial data, may be free. • Most commercially relevant data will need to be purchased or subscribed to. • A substantial budget may still be required to obtain external data.
  • 7. Privacy • Performing analytics on datasets jointly can reveal confidential information about organizations or individuals. • Addressing privacy concerns requires an understanding of the nature of the data and of data privacy regulations, as well as special techniques for data tagging and anonymization. • For example, a car’s GPS log or smart meter readings collected over an extended period of time can reveal an individual’s location and behavior.
  • 9. Security • Securing Big Data involves ensuring that the data networks and repositories are sufficiently secured. • Proper mechanisms need to be employed: • Authentication • Authorization • Access levels
  • 10. Provenance • Refers to information about the source of the data and how it has been processed. • Helps determine the authenticity and quality of data. • Can be used for auditing purposes.
  • 11. Data States • At different stages in the analytics lifecycle, data will be in different states. • These states are: • data-in-motion • data-in-use • data-at-rest. • Whenever Big Data changes state, it should trigger the capture of provenance information that is recorded as metadata. • At the beginning, the data’s provenance record can be initialized by recording information that captures its pedigree.
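The idea on the Data States slide, that every state change should append provenance metadata, can be sketched roughly as follows. The three state names come from the slide; the `ProvenanceRecord` class, its fields and the sample source name are hypothetical and only illustrate one possible layout.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# State names taken from the slide; everything else here is an assumption.
STATES = {"data-in-motion", "data-in-use", "data-at-rest"}


@dataclass
class ProvenanceRecord:
    source: str   # pedigree: where the data came from
    state: str    # current data state
    history: list = field(default_factory=list)  # metadata entries, oldest first

    def change_state(self, new_state: str, note: str) -> None:
        """Append a provenance entry whenever the data changes state."""
        if new_state not in STATES:
            raise ValueError(f"unknown state: {new_state}")
        self.history.append({
            "from": self.state,
            "to": new_state,
            "note": note,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        self.state = new_state


# Initialize the record when the data enters the environment, then track it.
record = ProvenanceRecord(source="smart-meter-feed", state="data-in-motion")
record.change_state("data-at-rest", "persisted to raw storage")
record.change_state("data-in-use", "loaded for cleansing")
```

Keeping the history as an append-only list means the steps that led to a result can be replayed later for auditing.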
  • 12. Trusted Result • Provenance information is essential to realizing the value of the analytic result. • It answers two questions: what is the origin of the data, and what steps or algorithms were used to process the data that led to the result?
  • 14. Clouds • Provide remote environments that can host IT infrastructure. • Adopting a Big Data environment may necessitate that some or all of that environment be hosted within a cloud. • Example: • An enterprise that runs its CRM system in a cloud decides to add a Big Data solution in order to run analytics on its CRM data. • This data can then be shared with its primary Big Data environment that resides within the enterprise boundaries.
  • 15. Cloud Justifications • In-house hardware resources are inadequate. • Upfront capital investment for system procurement is not available. • A sandbox environment is needed so that existing business processes are not impacted. • The Big Data initiative is a proof of concept. • The datasets that need to be processed are already cloud resident. • The limits of available in-house computing and storage resources are being reached.
  • 16. BIG DATA ANALYTICS LIFECYCLE
  • 17. The nine stages of the Big Data analytics lifecycle
  • 18. Business Case Evaluation • Presents a clear understanding of the justification, motivation and goals of carrying out the analysis. • Helps decision-makers understand the business resources that will need to be utilized and which business challenges the analysis will tackle.
  • 19. Data Identification • Identifies the datasets required for the analysis project and their sources. • A wider variety of data sources may increase the probability of finding hidden patterns and correlations. • Data sources can be internal and/or external to the enterprise, depending on the business scope of the analysis project and the nature of the business problems being addressed.
  • 20. Data Acquisition and Filtering • Both internal and external data need to be persisted once they are generated or enter the enterprise boundary. • Metadata can be added via automation to data from both internal and external data sources to improve classification and querying.
  • 21. Data Extraction • Extracts disparate data and transforms it into a format that the underlying Big Data solution can use. • Example: comments and user IDs are extracted from an XML document. • Example: the user ID and coordinates of a user are extracted from a single JSON field.
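The two extraction examples on this slide can be illustrated with a minimal sketch using only the Python standard library. The XML layout, the `userId` attribute, the comma-separated JSON field and all sample values are assumptions made for illustration; the slide does not show the actual documents.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML input: one <comment> element per user comment.
xml_doc = """
<comments>
  <comment userId="u17">Great product</comment>
  <comment userId="u42">Arrived late</comment>
</comments>
"""


def extract_comments(xml_text):
    """Return (user_id, comment_text) pairs from the XML document."""
    root = ET.fromstring(xml_text.strip())
    return [(c.get("userId"), c.text) for c in root.iter("comment")]


# Hypothetical JSON input: user ID and coordinates packed into one field.
json_doc = '{"user": "u17, 13.7563, 100.5018"}'


def extract_user_location(json_text):
    """Split the single JSON field into user ID, latitude and longitude."""
    raw = json.loads(json_text)["user"]
    user_id, lat, lon = (part.strip() for part in raw.split(","))
    return {"user_id": user_id, "lat": float(lat), "lon": float(lon)}
```

Calling `extract_comments(xml_doc)` and `extract_user_location(json_doc)` yields plain tuples and dictionaries that a later stage can validate and aggregate.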
  • 22. Data Validation and Cleansing • Establishes often complex validation rules and removes any known invalid data. • Data validation can be used to examine interconnected datasets in order to fill in missing valid data. • Example: the presence of invalid data results in spikes. Although such data appears abnormal, it may be indicative of a new pattern.
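A small sketch of this stage, assuming hypothetical smart meter readings and a separate meter registry as the interconnected dataset. The single validation rule (non-negative consumption) is an assumption. Rejected rows are kept rather than discarded, because abnormal-looking data may point to a new pattern.

```python
# Hypothetical readings; one has a missing region and one an invalid value.
readings = [
    {"meter_id": "m1", "region": "north", "kwh": 12.5},
    {"meter_id": "m2", "region": None, "kwh": 9.1},
    {"meter_id": "m3", "region": "south", "kwh": -4.0},
]

# Hypothetical related dataset used to fill in missing values.
meter_registry = {"m1": "north", "m2": "east", "m3": "south"}


def is_valid(row):
    """Assumed validation rule: consumption must be present and non-negative."""
    return row["kwh"] is not None and row["kwh"] >= 0


def cleanse(rows, registry):
    """Fill missing regions from the registry, then split valid from invalid rows."""
    cleaned, rejected = [], []
    for row in rows:
        row = dict(row)  # copy so the input is left untouched
        if row["region"] is None:
            row["region"] = registry.get(row["meter_id"])
        (cleaned if is_valid(row) else rejected).append(row)
    return cleaned, rejected


cleaned, rejected = cleanse(readings, meter_registry)
```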
  • 23. Data Aggregation and Representation • Integrates multiple datasets to arrive at a unified view. • This stage can become complicated because of differences in data structure and semantics. • It is important to understand that the same data can be stored in many different forms. • Example: two datasets are aggregated together using the Id field.
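As a rough illustration of aggregating two datasets on an Id field, the sketch below also reconciles one semantic difference ("last name" versus "surname") before merging. The datasets, field names and alias table are hypothetical.

```python
# Two hypothetical datasets that describe the same entities differently.
dataset_a = [{"id": 1, "surname": "Lee"}, {"id": 2, "surname": "Cho"}]
dataset_b = [
    {"id": 1, "last name": "Lee", "city": "Bangkok"},
    {"id": 3, "last name": "Kim", "city": "Seoul"},
]

# Assumed semantic mapping: both labels mean the same thing.
FIELD_ALIASES = {"last name": "surname"}


def normalize(row):
    """Rename aliased fields so both datasets share one vocabulary."""
    return {FIELD_ALIASES.get(key, key): value for key, value in row.items()}


def merge_by_id(left, right):
    """Combine rows that share an id into one unified record per id."""
    merged = {}
    for row in list(left) + list(right):
        row = normalize(row)
        merged.setdefault(row["id"], {}).update(row)
    return [merged[key] for key in sorted(merged)]


unified = merge_by_id(dataset_a, dataset_b)
```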
  • 24. Data Analysis • Usually iterative in nature until the appropriate pattern or correlation is uncovered. • Can be as challenging as combining data mining and complex statistical analysis techniques to discover patterns and anomalies. • Used to generate a statistical or mathematical model that depicts relationships between variables.
  • 25. Data Analysis Types 1. Confirmatory analysis 2. Exploratory analysis
  • 26. Confirmatory Data Analysis • A deductive approach where the cause of the phenomenon being investigated is proposed beforehand. The proposed cause is called a hypothesis. • The data is then analyzed to prove or disprove the hypothesis and provide definitive answers to specific questions. • Data sampling techniques are typically used. • Unexpected findings or anomalies are usually ignored since a predetermined cause was assumed.
  • 27. Exploratory Data Analysis • An inductive approach that is closely associated with data mining. • No hypothesis or predetermined assumptions are generated. • The data is explored through analysis to develop an understanding of the cause of the phenomenon. • May not provide definitive answers. • Used to find a general direction that can facilitate the discovery of patterns or anomalies.
  • 28. Data Visualization • Data visualization techniques and tools are used to graphically communicate the analysis results for effective interpretation by business users.
  • 29. Data Visualization • Provides users with the ability to perform visual analysis. • Allows the discovery of answers to questions that users have not yet even formulated.
  • 30. Utilization of Analysis Results • Determines how and where processed analysis data can be further leveraged. • Common areas that are explored during this stage include the following: 1. Input for Enterprise Systems 2. Business Process Optimization 3. Alerts
  • 31. Input for Enterprise Systems • The results may be fed directly into enterprise systems to enhance and optimize their behaviors and performance. • New models may be used to improve the programming logic within existing enterprise systems or may form the basis of new systems.
  • 32. Business Process Optimization • The identified patterns, correlations and anomalies discovered during the data analysis are used to refine business processes. • Models may also lead to opportunities to improve business process logic.
  • 33. Alerts • Data analysis results can be used as input for existing alerts or may form the basis of new alerts.
  • 35. Overview • An enterprise is executed as a layered system. • The strategic layer constrains the tactical layer, which directs the operational layer. • The alignment of layers is captured through metrics and performance indicators. • These allow the operational layer to gain insight into how its processes are executing.
  • 36. Transformation of Data • Process measurements are aggregated and enhanced with additional meaning to become key performance indicators (KPIs). • KPIs are used by the tactical layer to assess corporate performance, or business execution. • KPIs are used in conjunction with other measurements and understandings to assess critical success factors (CSFs). • This forms a series of data enrichment: the transformation of data into information, information into knowledge and knowledge into wisdom.
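To make the step from raw process measurements to KPIs concrete, here is a minimal sketch. The delivery times, the 24-hour target and the two KPI names are invented for illustration and are not taken from the slides.

```python
from statistics import mean

# Hypothetical operational measurements: delivery time per order, in hours.
delivery_hours = [20.0, 30.0, 22.0, 50.0]

# Assumed business target that gives the raw numbers additional meaning.
TARGET_HOURS = 24.0


def on_time_rate(measurements, target):
    """Share of measurements that meet the target."""
    on_time = sum(1 for m in measurements if m <= target)
    return on_time / len(measurements)


# Aggregated measurements become KPIs the tactical layer can assess.
kpi = {
    "avg_delivery_hours": mean(delivery_hours),
    "on_time_rate": on_time_rate(delivery_hours, TARGET_HOURS),
}
```

The raw list is data; the averages and rates set against a target are information that can be compared across periods.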
  • 37. Online Transaction Processing (OLTP) • A software system that processes transaction-oriented data. • Completes an activity in real time and is not batch-processed. • Stores operational data that is normalized.
  • 38. OLTP Data • A common source of structured data that serves as input to many analytic processes. • Big Data analysis results can be used to augment OLTP data stored in the underlying relational databases. • The queries supported by OLTP systems consist of simple insert, delete and update operations with sub-second response times.
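The kind of short insert, update and delete operations that OLTP systems handle can be sketched with Python's built-in `sqlite3` module. SQLite stands in for a production relational database here, and the `account` table and its rows are hypothetical.

```python
import sqlite3

# In-memory database as a stand-in for an operational OLTP store.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, owner TEXT NOT NULL, balance REAL NOT NULL)"
)

# Each "with" block is one transaction: committed on success, rolled back on error.
with oltp:
    oltp.execute("INSERT INTO account (id, owner, balance) VALUES (?, ?, ?)", (1, "Ann", 100.0))
    oltp.execute("INSERT INTO account (id, owner, balance) VALUES (?, ?, ?)", (2, "Bob", 50.0))

with oltp:  # a transfer: both updates succeed or neither does
    oltp.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (30.0, 1))
    oltp.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (30.0, 2))

with oltp:
    oltp.execute("DELETE FROM account WHERE id = ?", (1,))

rows = oltp.execute("SELECT id, owner, balance FROM account ORDER BY id").fetchall()
```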
  • 39. Online Analytical Processing (OLAP) • A system that is used for processing data analysis queries. • Forms an integral part of business intelligence, data mining and machine learning processes. • Can serve as both a data source and a data sink that is capable of receiving data. • Used in diagnostic, predictive and prescriptive analytics.
  • 40. OLAP • Performs long-running, complex queries against a multidimensional database whose structure is optimized for performing advanced analytics. • Stores historical data that is aggregated and denormalized to support fast reporting capability. • Uses databases that store historical data in multidimensional structures.
  • 41. OLAP • Provides answers to complex queries based on the relationships between multiple aspects of the data.
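An OLAP-style question combines several dimensions of the data at once. The sketch below approximates this with a `GROUP BY` over a denormalized table in SQLite; a real OLAP system would use a multidimensional store, and the `sales` table, its dimensions and its figures are hypothetical.

```python
import sqlite3

# Denormalized historical facts with three dimensions: region, product, year.
olap = sqlite3.connect(":memory:")
olap.execute("CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount REAL)")
olap.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("north", "laptop", 2023, 100.0),
        ("north", "laptop", 2024, 150.0),
        ("north", "phone", 2024, 80.0),
        ("south", "laptop", 2024, 120.0),
    ],
)

# Aggregate across two dimensions (region and year), rolling up over product.
by_region_year = olap.execute(
    "SELECT region, year, SUM(amount) FROM sales "
    "GROUP BY region, year ORDER BY region, year"
).fetchall()
```

Changing the `GROUP BY` columns corresponds to viewing the same facts along a different combination of dimensions.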
  • 42. Extract Transform Load (ETL) • A process of loading data from a source system into a target system. • The source system can be a database, a flat file, or an application. • Similarly, the target system can be a database or some other storage system. • Represents the main operation through which data warehouses are fed data.
  • 43. ETL Steps 1. The required data is obtained or extracted from the sources. 2. The extracts are modified or transformed by the application of rules. 3. The data is inserted or loaded into the target system.
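The three ETL steps above can be sketched as three small functions. The flat-file source is simulated with an in-memory CSV string, the target is an in-memory SQLite database, and the transformation rules (skip rows without an amount, upper-case the currency) are assumptions chosen for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source; the second row is missing its amount.
source_csv = "order_id,amount,currency\n1,10.50,usd\n2,,usd\n3,7.25,eur\n"


def extract(text):
    """Step 1: obtain the required rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Step 2: apply rules (drop incomplete rows, fix types, normalize codes)."""
    out = []
    for row in rows:
        if not row["amount"]:
            continue
        out.append((int(row["order_id"]), float(row["amount"]), row["currency"].upper()))
    return out


def load(rows, conn):
    """Step 3: insert the transformed rows into the target system."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    with conn:
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")
load(transform(extract(source_csv)), warehouse)
loaded = warehouse.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
```

Keeping each step as a separate function makes it possible to swap the source or target without touching the transformation rules.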
  • 44. Data Warehouses • A central, enterprise-wide repository consisting of historical and current data.
  • 45. Data Warehouses • Store data pertaining to multiple business entities from different operational systems. • Heavily used by BI to run various analytical queries. • Usually interface with an OLAP system to support multi-dimensional analytical queries.
  • 46. Data Warehouses • Data is periodically extracted, validated, transformed and consolidated into a single denormalized database. • Over time, the amount of data contained will continue to increase, which leads to slower query response times for data analysis tasks. • This is usually resolved by including optimized databases, called analytical databases, to handle reporting and data analysis tasks.
  • 47. Data Mart • A subset of the data stored in a data warehouse that typically belongs to a department, division, or specific line of business. • Data warehouses can have multiple data marts.
  • 48. Data Mart • Enterprise-wide data is collected and business entities are then extracted. • Domain-specific entities are persisted into the data warehouse via an ETL process.
  • 49. Traditional BI • Primarily utilizes descriptive and diagnostic analytics. • Provides information on historical and current events. • Only provides answers to correctly formulated questions.
  • 50. Traditional BI • Correctly formulating questions requires an understanding of business problems and issues and of the data itself.
  • 51. BI Report Types • BI reports can be categorized based on different KPIs: • ad-hoc reports • dashboards
  • 52. Ad-hoc Reports • A process that involves manually processing data to produce custom-made reports. • Usually focuses on a specific area of the business, such as its marketing or supply chain management. • The generated custom reports are detailed and often tabular in nature.
  • 53. Dashboards • A holistic view of key business areas. • The information displayed is generated at periodic intervals in real time or near-real time. • The presentation of data is graphical in nature, using bar charts, pie charts and gauges.
  • 54. Big Data BI • Builds upon traditional BI by combining data stored in the warehouse with semi-structured and unstructured data sources. • Comprises both predictive and prescriptive analytics. • Facilitates the development of an enterprise-wide understanding of business performance.
  • 55. Big Data BI vs Traditional BI • Traditional BI analyses generally focus on individual business processes. • Big Data BI analyses focus on multiple business processes simultaneously. • Helps reveal patterns and anomalies across a broader scope within the enterprise. • Leads to data discovery by identifying insights and information that may have been previously absent or unknown.
  • 56. Hybrid Data Warehouse • A uniform and central repository of structured, semi-structured and unstructured data. • Provides essential Big Data BI tools with all of the required data. • Eliminates the need to connect to multiple data sources to retrieve or access data. • Establishes a standardized data access layer across a range of data sources.

Editor's Notes

  1. How to learn: emphasis on programming with Java. Apology for the document: it is not quite complete. Some parts are irrelevant, some were added only because they are interesting, some are missing, and some are not part of this document. Students must follow the lecture for undocumented details.
  2. Most commercially relevant data will need to be purchased and may involve the continuation of subscription costs to ensure the delivery of updates to procured datasets.
  3. Even analyzing separate datasets that contain seemingly benign data can reveal private information when the datasets are analyzed jointly. This can lead to intentional or inadvertent breaches of privacy. Addressing these privacy concerns requires an understanding of the nature of data being accumulated and relevant data privacy regulations, as well as special techniques for data tagging and anonymization. For example, telemetry data, such as a car’s GPS log or smart meter data readings, collected over an extended period of time can reveal an individual’s location and behavior.
  4. Some of the components of Big Data solutions lack the robustness of traditional enterprise solution environments when it comes to access control and data security. Big Data security further involves establishing data access levels for different categories of users.
  5. Helps determine the authenticity and quality of data, and it can be used for auditing purposes. Maintaining provenance as large volumes of data are acquired, combined and put through multiple processing stages can be a complex task.
  6. At different stages in the analytics lifecycle, data will be in different states because it may be in transmission, in processing or in storage. As data enters the analytic environment, its provenance record can be initialized by recording information that captures the pedigree of the data.
  7. The goal of capturing provenance is to be able to reason over the generated analytic results. When provenance information is captured on the way to generating analytic results as in Figure 3.3, the results can be more easily trusted and thereby used with confidence.
  8. Provide remote environments that can host IT infrastructure for large-scale storage and processing. The adoption of a Big Data environment may necessitate that some or all of that environment be hosted within a cloud.
  9. The limits of available computing and storage resources used by an in-house Big Data solution are being reached.
  10. Data sources can be internal and/or external to the enterprise, depending on the business scope of the analysis project and the nature of the business problems being addressed.
  11. Performing this stage can become complicated because of differences in: • Data Structure – Although the data format may be the same, the data model may be different. • Semantics – A value that is labeled differently in two different datasets may mean the same thing, for example “surname” and “last name”.
  12. An inductive approach that is closely associated with data mining. The data is explored through analysis to develop an understanding of the cause of the phenomenon.
  13. For example, an online store can be fed processed customer-related analysis results that may impact how it generates product recommendations.
  14. An example is consolidating transportation routes as part of a supply chain process.
  15. For example, alerts may be created to inform users via email or SMS text about an event that requires them to take corrective action.
  16. >>> Strategic layer constrains tactical >>> Tactical directs operational >>> alignment captured through metric and PI >>>> gain insight
  17. Ultimately, this series of enrichment corresponds with the transformation of data into information, information into knowledge and knowledge into wisdom.
  18. OLTP systems, e.g. a point of sale system, execute business processes in support of corporate operations. Examples include ticket reservation systems, banking and point of sale systems.
  19. They are relevant to Big Data in that they can serve as both a data source and a data sink that is capable of receiving data. >>>> OLAP is an integral part of BI, data mining & ML. >>>> can be both data source and sink >>>> used in diagnostic, predictive, and prescriptive analytics
  20. >>>> complex queries of multi-dimensional data >>>> historical data is aggregated and denormalized to be reported >>>> use multi-dimensional historical data
  21. A Big Data solution encompasses the ETL feature-set for converting data of different types.
  22. Over time this leads to slower query response times for data analysis tasks. To resolve this shortcoming, data warehouses usually contain optimized databases, called analytical databases, to handle reporting and data analysis tasks. An analytical database can exist as a separate DBMS, as in the case of an OLAP database.
  23. As previously explained, data warehouses and data marts contain consolidated and validated information about enterprise-wide business entities. Traditional BI cannot function effectively without data marts because they contain the optimized and segregated data that BI requires for reporting purposes. Without data marts, data needs to be extracted from the data warehouse via an ETL process on an ad-hoc basis whenever a query needs to be run. This increases the time and effort to execute queries and generate reports. Traditional BI uses data warehouses and data marts for reporting and data analysis because they allow complex data analysis queries with multiple joins and aggregations to be issued.
  24. Big Data BI requires the analysis of unstructured, semi-structured and structured data residing in the enterprise data warehouse. This requires a “next-generation” data warehouse that uses new features and technologies to store cleansed data originating from a variety of sources in a single uniform data format. >>> Data lakes and data warehouses are both widely used for storing big data, but they are not interchangeable terms. >>>>> A data lake is a vast pool of raw data, the purpose for which is not yet defined. >>>>> A data warehouse is a repository for structured, filtered data that has already been processed for a specific purpose.