DATA WAREHOUSE DESIGN ON THE CLOUD
A BIGDATA APPROACH
PANCHALESWAR NAYAK, SR ARCHITECT (CLOUD & BIGDATA)
AGENDA
• WHAT IS BUSINESS INTELLIGENCE (BI)
• WHAT IS A DATA WAREHOUSE (DW)
• WHAT ARE DATA MARTS
• WHAT IS DATA MINING
• LOGICAL ARCHITECTURE OF ETL (EXTRACT, TRANSFORM AND LOAD)
• DATA WAREHOUSE (DW) DESIGN METHODOLOGIES
• BILL INMON’S TOP-DOWN APPROACH
• RALPH KIMBALL'S BOTTOM-UP APPROACH
• THE NEW 3V DATA PROBLEM (VOLUME, VELOCITY, VARIETY)
• THE ARCHITECTURE FOR THE NEXT GENERATION OF DATA WAREHOUSING
BUSINESS INTELLIGENCE (BI)
• Business intelligence, or BI, is an umbrella term that refers to a variety of
software applications used to analyze an organization's raw data.
• BI as a discipline is made up of several related activities, including data mining,
online analytical processing, querying and reporting.
WHAT IS A DATA WAREHOUSE (DW)
A data warehouse (DW or DWH), also known as an enterprise data
warehouse (EDW), is a system used for reporting and data analysis, and is
considered a core component of a business intelligence environment. DWs are
central repositories of integrated data from one or more disparate sources.
A DW is a relational database schema that stores historical data and metadata from one or more operational
systems in a way that facilitates reporting and analysis of the data, aggregated to
various levels.
WHAT IS A DATA MART
• The DATA MART is a subset of the DATA WAREHOUSE that is usually oriented to
a specific business line or team.
• DATA MARTS are small slices of the DATA WAREHOUSE.
• Whereas data warehouses have enterprise-wide depth, the information
in data marts pertains to a single department.
DATA WAREHOUSE VS DATA MART
• THE MAIN DIFFERENCE IS THE INFORMATION SCOPE THEY STORE.
• DATA WAREHOUSE:
• Data warehouses store all kinds of data related to the whole system.
• A data warehouse is usually much bigger than a data mart, because it keeps a lot more data.
• Usually integrates a large number of data sources in order to feed its database.
• Holds multiple subject areas
• Holds very detailed information
• Does not necessarily use a dimensional model but feeds dimensional models.
• DATA MART
• Data marts store specific subject information, becoming much more focused on these functionalities,
for example, finance, or sales.
• A data mart has a lot less integration to do, since its data is very specific.
• May hold more summarized data
• Concentrates on integrating information from a given subject area or set of data sources.
• Is built focused on a dimensional model using a star schema.
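As a rough illustration of the last point, the sketch below builds a tiny star schema with Python's standard-library sqlite3 module. The table and column names (fact_sales, dim_date, dim_product) and the sample rows are assumptions invented for this example; they do not come from the deck.

```python
import sqlite3

# Hypothetical sales data mart: one fact table surrounded by dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);
INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
INSERT INTO fact_sales VALUES
    (20240101, 1, 100.0), (20240101, 2, 20.0), (20240201, 1, 50.0);
""")


def sales_by_category(connection):
    """Summarize the fact table by a dimension attribute (a typical mart query)."""
    return connection.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()


summary = sales_by_category(conn)
```

The fact table holds only keys and measures, while descriptive attributes live in the dimension tables, which is why the summary query is a join followed by a GROUP BY.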
WHAT IS DATA MINING
• Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different
perspectives and summarizing it into useful information: information that can be used to increase revenue, cut
costs, or both.
It is the data-driven discovery and modeling of hidden patterns in a volume of data.
Technically, data mining is the process of finding correlations or patterns among dozens of fields in large
relational databases.
It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the
relationships identified.
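To make "finding correlations among fields" concrete, here is a minimal sketch that computes the Pearson correlation for every pair of fields and reports the strongest one. The field names (ad_spend, visits, returns) and the values are made-up sample data, and real data mining tools use far richer techniques than this.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical rows, as they might be read from a relational table.
records = [
    {"ad_spend": 10, "visits": 100, "returns": 7},
    {"ad_spend": 20, "visits": 210, "returns": 3},
    {"ad_spend": 30, "visits": 290, "returns": 8},
    {"ad_spend": 40, "visits": 405, "returns": 4},
]


def strongest_pair(rows):
    """Return (field_a, field_b, r) for the pair with the largest |r|."""
    fields = sorted(rows[0])
    best = None
    for i, a in enumerate(fields):
        for b in fields[i + 1:]:
            r = pearson([row[a] for row in rows], [row[b] for row in rows])
            if best is None or abs(r) > abs(best[2]):
                best = (a, b, r)
    return best


best_pair = strongest_pair(records)
```

With this sample data, ad_spend and visits come out as the most strongly related pair of fields.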
LOGICAL ARCHITECTURE OF TRADITIONAL ETL SYSTEM
[Diagram: Original Data, ETL Tools, Transformed Data, Online Data Store, Data Mart and BI Tools, connected by Transform, Load and Query steps]
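The extract, transform and load steps named in this architecture can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the RAW_ORDERS rows, the orders table and the cleaning rules are hypothetical, and an in-memory SQLite database stands in for the data store.

```python
import sqlite3

# Hypothetical raw rows as they might arrive from an operational source.
RAW_ORDERS = [
    {"id": "1", "customer": " alice ", "total": "19.90"},
    {"id": "2", "customer": "BOB", "total": "5.00"},
    {"id": "2", "customer": "BOB", "total": "5.00"},  # duplicate row
]


def extract():
    """Extract: read the original data from the source."""
    return list(RAW_ORDERS)


def transform(rows):
    """Transform: fix types, normalize names, drop duplicate ids."""
    seen = set()
    cleaned = []
    for row in rows:
        order_id = int(row["id"])
        if order_id in seen:
            continue
        seen.add(order_id)
        cleaned.append(
            (order_id, row["customer"].strip().title(), float(row["total"]))
        )
    return cleaned


def load(connection, rows):
    """Load: write the transformed rows into the data store."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    connection.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    connection.commit()


store = sqlite3.connect(":memory:")
load(store, transform(extract()))

# Query: what a BI tool would read from the store.
loaded = store.execute(
    "SELECT id, customer, total FROM orders ORDER BY id"
).fetchall()
```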
DATA WAREHOUSE DESIGN METHODOLOGIES
• Bill Inmon is sometimes referred to as the "father of data warehousing". His
design methodology is based on a top-down approach and defines a data
warehouse in these terms:
• Subject oriented - Data in a data warehouse is categorized on the basis of the
subject area and hence it is "subject oriented".
• Integrated - Data is integrated from different disparate data sources, and hence
universal naming conventions, measurements, classifications and so on are used in the
data warehouse. The data warehouse provides an enterprise consolidated view of
data and therefore it is designated as an integrated solution.
• Non-volatile - Once the data is integrated and loaded into the data warehouse it can
only be read. Users cannot make changes to the data and this practice makes the
data non-volatile.
• Time variant - Data is stored for long periods of time quantified in years and has a
date and timestamp and therefore it is described as "time variant".
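Two of these properties, non-volatile and time variant, lend themselves to a small sketch. The Warehouse and CustomerSnapshot classes below are hypothetical names chosen for this example; they only illustrate the idea of append-only, dated rows and are not Inmon's own design.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a loaded row cannot be changed (non-volatile)
class CustomerSnapshot:
    customer_id: int
    city: str
    as_of: date  # every row carries its date (time variant)


class Warehouse:
    """Append-only store: history is added to, never overwritten."""

    def __init__(self):
        self._rows = []

    def load(self, row):
        self._rows.append(row)

    def city_as_of(self, customer_id, when):
        """Return the city that was current for the customer on a given date."""
        candidates = [
            r for r in self._rows
            if r.customer_id == customer_id and r.as_of <= when
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.as_of).city


dw = Warehouse()
dw.load(CustomerSnapshot(1, "Pune", date(2020, 1, 1)))
dw.load(CustomerSnapshot(1, "Delhi", date(2022, 6, 1)))  # a move adds a row

city_2019 = dw.city_as_of(1, date(2019, 1, 1))
city_2021 = dw.city_as_of(1, date(2021, 1, 1))
city_2023 = dw.city_as_of(1, date(2023, 1, 1))
```

Because old rows are kept, the same question can be answered for any point in time, which is what allows reporting over periods of years.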
BILL INMON’S TOP-DOWN APPROACH
[Diagram: Bill Inmon top-down approach. Sources (Sales, Finance, HR, Marketing, Other Sources) are extracted into a Staging Area (cleaning, scrubbing, de-duplication, transformation) and loaded into the Data Warehouse, which is shown with Data Marts 1-5.]
• Bill Inmon saw a need to integrate data from different OLTP systems into a centralized
repository (called a data warehouse) with a so-called top-down approach.
• He envisions the DW at the center of the "corporate information factory" (CIF), which provides a
logical framework for delivering Business Intelligence (BI), business analytics and
business management capabilities.
PROS AND CONS OF TOP-DOWN APPROACH
• PROS
• Highly consistent dimensional view of data across data marts as all data marts are loaded from the
centralized repository (data warehouse).
• Proven to be flexible to support business changes as it looks at the organization as a whole, not at each
function or business process of the organization.
• Generating new dimensional data marts against the data stored in the data warehouse is a relatively simple
task.
• CONS
• It represents a very large project with a very broad scope and hence the up-front cost for implementing a
data warehouse using the top-down methodology is significant.
• The duration of time from the start of the project to the point that end users start to experience the initial benefits of
the solution can be substantial.
• The top-down methodology can be inflexible and unresponsive to changing departmental or business
process needs in today's dynamically changing environment.
RALPH KIMBALL'S BOTTOM-UP APPROACH
[Diagram: Ralph Kimball's bottom-up approach. Sources (Sales, Finance, HR, Marketing, Other Sources) are extracted into a Staging Area (cleaning, scrubbing, de-duplication, transformation); five Load steps are shown alongside Data Marts 1-5 and the Data Warehouse.]
• Ralph Kimball's bottom-up approach proposes creating a business (bus) matrix that contains all the common elements used by
data marts (such as conformed/shared dimensions, measures, etc.) defined for the enterprise as a whole.
• The user can then design and develop solutions that support analysis across business processes, for example for cross-selling.
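A matrix of this kind can be pictured as a mapping from business processes to the dimensions they use. The sketch below is a simplified illustration; the process and dimension names are assumptions invented for the example.

```python
# Hypothetical matrix: business process -> dimensions it uses.
BUS_MATRIX = {
    "sales": {"date", "product", "customer"},
    "inventory": {"date", "product", "warehouse"},
    "billing": {"date", "customer"},
}


def conformed_dimensions(matrix):
    """Dimensions used by more than one process; candidates for sharing."""
    counts = {}
    for dims in matrix.values():
        for dim in dims:
            counts[dim] = counts.get(dim, 0) + 1
    return sorted(dim for dim, n in counts.items() if n > 1)


def shared_between(matrix, process_a, process_b):
    """Dimensions along which two processes can be analyzed together."""
    return sorted(matrix[process_a] & matrix[process_b])


conformed = conformed_dimensions(BUS_MATRIX)
sales_billing = shared_between(BUS_MATRIX, "sales", "billing")
```

Dimensions that appear in several rows are the ones worth defining once for the whole enterprise, since they are what makes analysis across data marts possible.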
BOTTOM-UP DATA WAREHOUSE DESIGN APPROACH
• Ralph Kimball is a renowned author on the subject of data warehousing. His
design methodology is called dimensional modeling or the Kimball
methodology.
• The approach emphasizes delivering the value of the data warehouse to the users as quickly
as possible.
• A data warehouse is a copy of the transactional data specifically structured
for analytical querying and reporting in order to support the decision support
system.
• Data marts are first created to provide reporting and analytical capabilities for
specific business/functional processes, and later these data marts can
be unioned together to create a comprehensive Enterprise Data
Warehouse.
• The bottom-up approach focuses on one business process at a time, so the
return on investment (ROI) can come as soon as the first data mart is created.
• However, if it is not carefully planned, you might lose the big picture of the Enterprise Data
Warehouse by missing some dimensions or by creating redundant dimensions when
you are too focused on an individual business process.
PROBLEM WITH OLD DATA WAREHOUSE
• CAN HANDLE ONLY A VERY LIMITED NUMBER OF DATA SOURCES (MAYBE AROUND 25-30)
• CANNOT HANDLE A LARGE NUMBER OF DATA SOURCES
• GLOBAL SCHEMA REQUIRED
• A PROGRAMMER OR DATA ENGINEER WAS REQUIRED FOR EACH DATA SOURCE:
• To understand the data schema
• To write the local-to-global mapping (scripting language)
• To clean the data
• To run the ETL
• SIGNIFICANT HUMAN INVOLVEMENT WAS REQUIRED FOR ADDING A NEW DATA SOURCE.
• SCALABILITY ISSUES
• AGILITY ISSUES
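The local-to-global mapping mentioned above is the part that had to be written by hand for each source. A minimal sketch, assuming two hypothetical sources ("crm" and "billing") and a made-up global schema, might look like this:

```python
# Hypothetical global schema shared by the whole warehouse.
GLOBAL_FIELDS = ("customer_id", "full_name", "country")

# One hand-written mapping per source: local field name -> global field name.
SOURCE_MAPPINGS = {
    "crm": {"cust_no": "customer_id", "name": "full_name", "ctry": "country"},
    "billing": {
        "id": "customer_id",
        "customer_name": "full_name",
        "country_code": "country",
    },
}


def to_global(source, record):
    """Rename a source record's fields to the global schema.

    Fields with no mapping are dropped; missing global fields stay None.
    """
    mapping = SOURCE_MAPPINGS[source]
    out = {field: None for field in GLOBAL_FIELDS}
    for local_name, value in record.items():
        global_name = mapping.get(local_name)
        if global_name is not None:
            out[global_name] = value
    return out


crm_row = to_global(
    "crm", {"cust_no": 7, "name": "A. Rao", "ctry": "IN", "fax": "n/a"}
)
billing_row = to_global("billing", {"id": 7, "customer_name": "A. Rao"})
```

Every new source needs another entry in SOURCE_MAPPINGS, written by someone who understands that source's schema, which is where the scalability and agility issues come from.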
THE NEW 3V DATA PROBLEM
• VOLUME
• DATA TOO BIG TO BE HANDLED AND PROCESSED BY A SINGLE SERVER
• VELOCITY
• CONTINUOUS DATA FLOW WITH TOO HIGH AN INGESTION SPEED TO BE HANDLED BY A STATIC
DATA WAREHOUSE
• VARIETY
• TOO UNSTRUCTURED TO FIT INTO A ROW-AND-COLUMN DATABASE
THE NEW DATA ARCHITECTURE WITH HADOOP
[Diagram: Original Data, ETL Tools (Extract, Transform, Load), Hadoop, Transformed Data, Online Data Store, Data Mart and BI Tools (Query)]
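Hadoop addresses the volume problem by splitting work across many machines. The sketch below imitates that map-and-reduce style in plain Python on a single machine; it does not use any Hadoop API, and the event data and function names are assumptions made for illustration.

```python
from collections import Counter
from functools import reduce


def partition(records, n_parts):
    """Split the records into n_parts chunks, as if spread over servers."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_phase(chunk):
    """Each worker counts event types in its own chunk only."""
    return Counter(event["type"] for event in chunk)


def reduce_phase(partials):
    """Combine the per-chunk counts into one overall result."""
    return reduce(lambda a, b: a + b, partials, Counter())


# Hypothetical event stream, too "big" for one server in this toy setting.
events = [
    {"type": "click"}, {"type": "view"}, {"type": "click"},
    {"type": "buy"}, {"type": "view"}, {"type": "click"},
]

totals = reduce_phase(map_phase(chunk) for chunk in partition(events, 3))
```

No single worker ever needs to see the whole dataset; only the small partial counts are brought together at the end.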
THE ARCHITECTURE FOR THE NEXT GENERATION OF DATA WAREHOUSING

Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(Harvard毕业证书)哈佛大学毕业证如何办理
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
在线办理(英国UCA毕业证书)创意艺术大学毕业证在读证明一模一样
 

Data Warehouse Design on Cloud, A Big Data Approach Part_One

  • 1. DATA WAREHOUSE DESIGN ON THE CLOUD: A BIG DATA APPROACH
      PANCHALESWAR NAYAK, SR ARCHITECT (CLOUD & BIG DATA)
      [Diagram labels: Public Clouds, Private Cloud]
  • 2. AGENDA
      • WHAT IS BUSINESS INTELLIGENCE (BI)
      • WHAT IS A DATA WAREHOUSE (DW)
      • WHAT IS A DATA MART
      • WHAT IS DATA MINING
      • LOGICAL ARCHITECTURE OF ETL (EXTRACT, TRANSFORM AND LOAD)
      • DATA WAREHOUSE (DW) DESIGN METHODOLOGIES
      • BILL INMON'S TOP-DOWN APPROACH
      • RALPH KIMBALL'S BOTTOM-UP APPROACH
      • THE NEW 3V DATA PROBLEM (VOLUME, VELOCITY, VARIETY)
      • THE ARCHITECTURE FOR THE NEXT GENERATION OF DATA WAREHOUSING
  • 3. BUSINESS INTELLIGENCE (BI)
      • Business intelligence, or BI, is an umbrella term that refers to a variety of software applications used to analyze an organization's raw data.
      • BI as a discipline is made up of several related activities, including data mining, online analytical processing, querying and reporting.
  • 4. WHAT IS A DATA WAREHOUSE (DW)
      • A data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of a business intelligence environment. DWs are central repositories of integrated data from one or more disparate sources.
      • It is a relational database schema that stores historical data and metadata from one or more operational systems in a way that facilitates reporting and analysis of the data, aggregated to various levels.
  • 5. WHAT IS A DATA MART
      • A data mart is a subset of the data warehouse that is usually oriented to a specific business line or team.
      • Data marts are small slices of the data warehouse.
      • Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department.
  • 6. DATA WAREHOUSE VS DATA MART
      • THE MAIN DIFFERENCE IS THE SCOPE OF THE INFORMATION THEY STORE.
      • DATA WAREHOUSE:
          • Stores all kinds of data related to the whole system.
          • Is usually much bigger than a data mart, because it keeps a lot more data.
          • Usually integrates a large number of data sources in order to feed its database.
          • Holds multiple subject areas.
          • Holds very detailed information.
          • Does not necessarily use a dimensional model, but feeds dimensional models.
      • DATA MART:
          • Stores information on a specific subject, for example finance or sales, and is much more focused on that function.
          • Has a lot less integration to do, since its data is very specific.
          • May hold more summarized data.
          • Concentrates on integrating information from a given subject area or set of data sources.
          • Is built around a dimensional model using a star schema.
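The star schema mentioned for data marts can be illustrated with a small sketch. The table and column names below (`fact_sales`, `dim_date`, `dim_product`) and the sample rows are hypothetical and not taken from the slides; the example uses Python's built-in `sqlite3` module with an in-memory database purely for illustration.

```python
import sqlite3


def build_sales_mart():
    """Create a tiny, hypothetical sales data mart as a star schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Dimension tables describe the "who/what/when" of each fact.
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER
        );
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY, name TEXT, category TEXT
        );
        -- The fact table holds measures plus one key per dimension.
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            quantity INTEGER,
            amount REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20240101, 2024, 1), (20240201, 2024, 2)],
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [
            (20240101, 1, 2, 20.0),
            (20240101, 3, 1, 5.0),
            (20240201, 2, 1, 30.0),
            (20240201, 1, 1, 10.0),
        ],
    )
    return conn


def sales_by_category(conn):
    """Aggregate the fact table along the product dimension."""
    rows = conn.execute(
        """
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()
    return dict(rows)
```

Calling `sales_by_category(build_sales_mart())` joins the central fact table to one dimension and summarizes by category, which is the typical query shape a star schema is designed to serve.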
  • 7. WHAT IS DATA MINING
      • Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both.
      • It is the data-driven discovery and modeling of hidden patterns in a volume of data.
      • Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.
      • It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified.
  • 8. LOGICAL ARCHITECTURE OF TRADITIONAL ETL SYSTEM
      [Diagram labels: Online Data Store, ETL Tools, Original Data, Transformed Data, Transform, Load, Data Mart, Query, BI Tools]
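As a rough illustration of the flow named in the diagram labels above (original data, transform, load, query), here is a minimal sketch in plain Python. The record fields (`order_id`, `customer`, `amount`), the cleaning rules, and the use of a dictionary as the "data mart" are all assumptions made for the example; a real ETL tool would do considerably more.

```python
def extract(source_rows):
    """Extract: copy the raw rows out of the (hypothetical) online data store."""
    return list(source_rows)


def transform(rows):
    """Transform: normalize names, parse amounts, and drop duplicate orders."""
    seen_order_ids = set()
    cleaned = []
    for row in rows:
        if row["order_id"] in seen_order_ids:
            continue  # de-duplication: keep the first occurrence only
        seen_order_ids.add(row["order_id"])
        cleaned.append(
            {
                "order_id": row["order_id"],
                "customer": row["customer"].strip().title(),
                "amount": round(float(row["amount"]), 2),
            }
        )
    return cleaned


def load(rows, mart):
    """Load: write the transformed rows into the data mart, keyed by order."""
    for row in rows:
        mart[row["order_id"]] = row
    return mart


def query_total(mart):
    """Query: the kind of aggregate a BI tool might request from the mart."""
    return sum(row["amount"] for row in mart.values())
```

A usage example: `load(transform(extract(raw_rows)), {})` returns the populated mart, and `query_total` then reads from it without touching the original source rows.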
  • 9. DATA WAREHOUSE DESIGN METHODOLOGIES
      • Bill Inmon is sometimes referred to as the "father of data warehousing". His design methodology is based on a top-down approach and defines a data warehouse in these terms:
          • Subject oriented: Data in a data warehouse is categorized on the basis of subject area, and hence it is "subject oriented".
          • Integrated: Data is integrated from disparate data sources, and hence universal naming conventions, measurements, classifications and so on are used in the data warehouse. The data warehouse provides a consolidated enterprise view of the data and is therefore described as an integrated solution.
          • Non-volatile: Once the data is integrated and loaded into the data warehouse, it can only be read. Users cannot make changes to the data, and this practice makes the data non-volatile.
          • Time variant: Data is stored for long periods of time, measured in years, and carries a date and timestamp, and is therefore described as "time variant".
  • 10. BILL INMON'S TOP-DOWN APPROACH
      [Diagram: Bill Inmon top-down approach. Labels: SALES, FINANCE, HR, MARKETING, OTHER SOURCES, EXTRACT, STAGING AREA (CLEANING, SCRUBBING, DE-DUPLICATION, TRANSFORMATION), LOAD, DATA WAREHOUSE, DATA MART1 to DATA MART5]
      • Bill Inmon saw a need to integrate data from different OLTP systems into a centralized repository (called a data warehouse) with a so-called top-down approach.
      • He envisions the DW at the center of the "Corporate Information Factory" (CIF), which provides a logical framework for delivering business intelligence (BI), business analytics and business management capabilities.
  • 11. PROS AND CONS OF TOP-DOWN APPROACH
      • PROS
          • Highly consistent dimensional view of data across data marts, as all data marts are loaded from the centralized repository (data warehouse).
          • Proven to be flexible in supporting business changes, as it looks at the organization as a whole, not at each function or business process of the organization.
          • Generating new dimensional data marts against the data stored in the data warehouse is a relatively simple task.
      • CONS
          • It represents a very large project with a very broad scope, and hence the up-front cost of implementing a data warehouse using the top-down methodology is significant.
          • The time from the start of the project to the point at which end users start to experience initial benefits of the solution can be substantial.
          • The top-down methodology can be inflexible and unresponsive to changing departmental or business process needs in today's dynamically changing environment.
  • 12. RALPH KIMBALL'S BOTTOM-UP APPROACH
      [Diagram: Ralph Kimball's bottom-up approach. Labels: SALES, FINANCE, HR, MARKETING, OTHER SOURCES, EXTRACT, STAGING AREA (CLEANING, SCRUBBING, DE-DUPLICATION, TRANSFORMATION), LOAD (one per data mart), DATA MART1 to DATA MART5, DATA WAREHOUSE]
      • Ralph Kimball's bottom-up approach proposes creating a business matrix that contains all the common elements used by the data marts (such as conformed/shared dimensions, measures, etc.), defined for the enterprise as a whole.
      • The user can then design and develop solutions that support analysis across business processes, for example for cross-selling.
  • 13. BOTTOM-UP DATA WAREHOUSE DESIGN APPROACH
      • Ralph Kimball is a renowned author on the subject of data warehousing. His design methodology is called dimensional modeling or the Kimball methodology.
      • The approach emphasizes delivering the value of the data warehouse to the users as quickly as possible.
      • A data warehouse is a copy of the transactional data specifically structured for analytical querying and reporting, in order to support the decision support system.
      • Data marts are created first to provide reporting and analytical capabilities for specific business/functional processes, and later these data marts can be unioned together to create a comprehensive enterprise data warehouse.
      • The bottom-up approach focuses on one business process at a time, so the return on investment (ROI) can begin as soon as the first data mart is created.
      • However, if it is not carefully planned and you are too focused on an individual business process, you might lose the big picture of the enterprise data warehouse by missing some dimensions or by creating redundant dimensions.
  • 14. PROBLEMS WITH THE OLD DATA WAREHOUSE
      • CAN HANDLE ONLY A VERY LIMITED NUMBER OF DATA SOURCES (MAYBE AROUND 25-30)
      • CANNOT HANDLE A LARGE NUMBER OF DATA SOURCES
      • GLOBAL SCHEMA REQUIRED
      • A PROGRAMMER OR DATA ENGINEER WAS REQUIRED FOR EACH DATA SOURCE:
          • To understand the data schema
          • To write the local-to-global mapping (scripting language)
          • To clean the data
          • To run the ETL
      • SIGNIFICANT HUMAN INVOLVEMENT WAS REQUIRED TO ADD A NEW DATA SOURCE.
      • SCALABILITY ISSUES
      • AGILITY ISSUES
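To make the "local-to-global mapping" burden concrete, here is a minimal sketch of what such a hand-written mapping might look like. The global schema columns, the source names (`crm`, `billing`) and their field names are hypothetical; the point is only that every new source needs its own mapping entry written by someone who understands that source's schema.

```python
# Hypothetical global schema shared by the whole warehouse.
GLOBAL_SCHEMA = ("customer_id", "full_name", "country")

# One hand-written mapping per source: local field name -> global column.
SOURCE_MAPPINGS = {
    "crm": {"cust_no": "customer_id", "name": "full_name", "ctry": "country"},
    "billing": {
        "id": "customer_id",
        "customer_name": "full_name",
        "country_code": "country",
    },
}


def to_global(source, record):
    """Rename a source record's fields to the global schema.

    Raises KeyError if nobody has written a mapping for the source yet,
    which is exactly the manual step the slide describes.
    """
    mapping = SOURCE_MAPPINGS[source]
    # Start with every global column empty so missing fields are explicit.
    row = {column: None for column in GLOBAL_SCHEMA}
    for local_field, value in record.items():
        global_field = mapping.get(local_field)
        if global_field is not None:  # unmapped local fields are dropped
            row[global_field] = value
    return row
```

With two sources this is manageable; with hundreds of sources, maintaining one such mapping per source by hand is where the scalability and agility issues listed above come from.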
  • 15. THE NEW 3V DATA PROBLEM
      • VOLUME
          • DATA TOO BIG TO BE HANDLED AND PROCESSED BY A SINGLE SERVER
      • VELOCITY
          • CONTINUOUS DATA FLOW INGESTED AT TOO HIGH A SPEED TO BE HANDLED BY A STATIC DATA WAREHOUSE
      • VARIETY
          • DATA TOO UNSTRUCTURED TO FIT INTO A ROW-AND-COLUMN DATABASE
  • 16. THE NEW DATA ARCHITECTURE WITH HADOOP
      [Diagram labels: Online Data Store, Hadoop, ETL Tools, Original Data, Transformed Data, Extract, Transform, Load, Data Mart, Query, BI Tools]
  • 17. THE ARCHITECTURE FOR THE NEXT GENERATION OF DATA WAREHOUSING