SlideShare a Scribd company logo
1 of 21
DATA WAREHOUSING/MINING
PREPARE BY: R. KRISHNARAJ
DATA WAREHOUSING INTRODUCTION
• A data warehouse is a central repository of information that can be
analyzed to make more informed decisions.
THE WAREHOUSING APPROACH
Data
Warehouse
Clients
Source Source
Source
. . .
Extractor/
Monitor
Integration System
. . .
Metadata
Extractor/
Monitor
Extractor/
Monitor
 Information
integrated in
advance
 Stored in wh for
direct querying
and analysis
ADVANTAGES OF WAREHOUSING APPROACH
• High query performance
• But not necessarily most current information
• Doesn’t interfere with local processing at sources
• Complex queries at warehouse
• OLTP at information sources
• Information copied at warehouse
• Can modify, annotate, summarize, restructure, etc.
• Can store historical information
• Security, no auditing
• Has caught on in industry
DATA WAREHOUSE EVOLUTION
TIME
2000
1995
1990
1985
1980
1960 1975
Information-
Based
Management
Data
Revolution
“Middle
Ages”
“Prehistoric
Times”
Relational
Databases
PC’s and
Spreadsheets
End-user
Interfaces
1st DW
Article
DW
Confs.
Vendor DW
Frameworks
Company
DWs
“Building the
DW”
Inmon (1992)
Data Replication
Tools
WHAT IS A DATA WAREHOUSE?
A PRACTITIONERS VIEWPOINT
“A data warehouse is simply a single, complete, and consistent
store of data obtained from a variety of sources and made
available to end users in a way they can understand and use it in a
business context.”
-- Barry Devlin, IBM Consultant
A DATA WAREHOUSE IS...
• Stored collection of diverse data
• A solution to data integration problem
• Single repository of information
• Subject-oriented
• Organized by subject, not by application
• Used for analysis, data mining, etc.
• Optimized differently from transaction-oriented db
• User interface aimed at executive
A DATA WAREHOUSE IS... (CONTINUED)
• Large volume of data (Gb, Tb)
• Non-volatile
• Historical
• Time attributes are important
• Updates infrequent
• May be append-only
• Examples
• All transactions ever at WalMart
• Complete client histories at insurance firm
• Stockbroker financial information and portfolios
SUMMARY
Operational Systems
Enterprise
Modeling
Business
Information Guide
Data
Warehouse
Catalog
Data Warehouse
Population
Data
Warehouse
Business Information
Interface
WAREHOUSE IS A SPECIALIZED DB
Standard DB
• Mostly updates
• Many small transactions
• Mb - Gb of data
• Current snapshot
• Index/hash on p.k.
• Raw data
• Thousands of users (e.g.,
clerical users)
Warehouse
 Mostly reads
 Queries are long and complex
 Gb - Tb of data
 History
 Lots of scans
 Summarized, reconciled data
 Hundreds of users (e.g.,
decision-makers, analysts)
DATA WAREHOUSE ARCHITECTURES:
CONCEPTUAL VIEW
• Single-layer
• Every data element is stored once only
• Virtual warehouse
• Two-layer
• Real-time + derived data
• Most commonly used approach in
industry today
“Real-time data”
Operational
systems
Informational
systems
Derived Data
Real-time data
Operational
systems
Informational
systems
THREE-LAYER ARCHITECTURE: CONCEPTUAL
VIEW
• Transformation of real-time data to derived data really requires
two steps
Derived Data
Real-time data
Operational
systems
Informational
systems
Reconciled Data
Physical Implementation
of the Data Warehouse
View level
“Particular informational
needs”
WAREHOUSE ARCHITECTURE
Source Source Source
Extractor/
Monitor
Extractor/
Monitor
Extractor/
Monitor
Integrator
Warehouse
Query & Analysis
Client Client
...
Metadata
WHAT IS DATA MINING?
• Data Mining is:
• (1) The efficient discovery of previously unknown, valid,
potentially useful, understandable patterns in large datasets
• (2) The analysis of (often large) observational data sets to find
unsuspected relationships and to summarize the data in
novel ways that are both understandable and useful to the
data owner
OVERVIEW OF TERMS
• Data: a set of facts (items) D, usually stored in a database
• Pattern: an expression E in a language L, that describes a subset
of facts
• Attribute: a field in an item I in D.
• Interestingness: a function ID,L that maps an expression E in L
into a measure space M
KNOWLEDGE DISCOVERY
EXAMPLES OF LARGE DATASETS
• Government: IRS, NGA, …
• Large corporations
• WALMART: 20M transactions per day
• MOBIL: 100 TB geological databases
• AT&T 300 M calls per day
• Credit card companies
• Scientific
• NASA, EOS project: 50 GB per hour
• Environmental datasets
EXAMPLES OF DATA MINING APPLICATIONS
1. Fraud detection: credit cards, phone cards
2. Marketing: customer targeting
3. Data Warehousing: Walmart
4. Astronomy
5. Molecular biology
THE DATA MINING PROCESS
1. Understand the domain
2. Create a dataset:
Select the interesting attributes
Data cleaning and preprocessing
3. Choose the data mining task and the specific algorithm
4. Interpret the results, and possibly return to 2
HOW DATA MINING IS USED
1. Identify the problem
2. Use data mining techniques to transform the data into informatio
3. Act on the information
4. Measure the results
THANK YOU

More Related Content

Similar to DWIntro.pptx

Data Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_OneData Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_OnePanchaleswar Nayak
 
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothThe Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothAdaryl "Bob" Wakefield, MBA
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementTony Bain
 
Introduction to Datawarehousing
Introduction to  DatawarehousingIntroduction to  Datawarehousing
Introduction to Datawarehousingkarunakar81987
 
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraLow-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraCaserta
 
Data mining concept and methods for basic
Data mining concept and methods for basicData mining concept and methods for basic
Data mining concept and methods for basicNivaTripathy2
 
The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!DataWorks Summit/Hadoop Summit
 
Data warehousing and data mart
Data warehousing and data martData warehousing and data mart
Data warehousing and data martAmit Sarkar
 
Data warehouse introduction
Data warehouse introductionData warehouse introduction
Data warehouse introductionMurli Jha
 
Day 1 (Lecture 1): Data Management- The Foundation of all Analytics
Day 1 (Lecture 1): Data Management- The Foundation of all AnalyticsDay 1 (Lecture 1): Data Management- The Foundation of all Analytics
Day 1 (Lecture 1): Data Management- The Foundation of all AnalyticsAseda Owusua Addai-Deseh
 
Building your big data solution
Building your big data solution Building your big data solution
Building your big data solution WSO2
 
Data Lakes: A Logical Approach for Faster Unified Insights
Data Lakes: A Logical Approach for Faster Unified InsightsData Lakes: A Logical Approach for Faster Unified Insights
Data Lakes: A Logical Approach for Faster Unified InsightsDenodo
 
Difference between Database vs Data Warehouse vs Data Lake
Difference between Database vs Data Warehouse vs Data LakeDifference between Database vs Data Warehouse vs Data Lake
Difference between Database vs Data Warehouse vs Data Lakejeetendra mandal
 
Dataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsDataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsQuontra Solutions
 
Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Vibrant Technologies & Computers
 

Similar to DWIntro.pptx (20)

Data Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_OneData Warehouse Design on Cloud ,A Big Data approach Part_One
Data Warehouse Design on Cloud ,A Big Data approach Part_One
 
Dw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhanDw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhan
 
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need BothThe Marriage of the Data Lake and the Data Warehouse and Why You Need Both
The Marriage of the Data Lake and the Data Warehouse and Why You Need Both
 
Overview of dbms
Overview of dbmsOverview of dbms
Overview of dbms
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data Management
 
Introduction to Datawarehousing
Introduction to  DatawarehousingIntroduction to  Datawarehousing
Introduction to Datawarehousing
 
Big data
Big dataBig data
Big data
 
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraLow-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
 
Data mining notes
Data mining notesData mining notes
Data mining notes
 
Data mining concept and methods for basic
Data mining concept and methods for basicData mining concept and methods for basic
Data mining concept and methods for basic
 
The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!The key to unlocking the Value in the IoT? Managing the Data!
The key to unlocking the Value in the IoT? Managing the Data!
 
Data warehousing and data mart
Data warehousing and data martData warehousing and data mart
Data warehousing and data mart
 
Data warehouse
Data warehouse Data warehouse
Data warehouse
 
Data warehouse introduction
Data warehouse introductionData warehouse introduction
Data warehouse introduction
 
Day 1 (Lecture 1): Data Management- The Foundation of all Analytics
Day 1 (Lecture 1): Data Management- The Foundation of all AnalyticsDay 1 (Lecture 1): Data Management- The Foundation of all Analytics
Day 1 (Lecture 1): Data Management- The Foundation of all Analytics
 
Building your big data solution
Building your big data solution Building your big data solution
Building your big data solution
 
Data Lakes: A Logical Approach for Faster Unified Insights
Data Lakes: A Logical Approach for Faster Unified InsightsData Lakes: A Logical Approach for Faster Unified Insights
Data Lakes: A Logical Approach for Faster Unified Insights
 
Difference between Database vs Data Warehouse vs Data Lake
Difference between Database vs Data Warehouse vs Data LakeDifference between Database vs Data Warehouse vs Data Lake
Difference between Database vs Data Warehouse vs Data Lake
 
Dataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsDataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra Solutions
 
Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.Data ware housing - Introduction to data ware housing process.
Data ware housing - Introduction to data ware housing process.
 

More from KRISHNARAJ207

INTERNSHIP PRESENTATION.pptx
INTERNSHIP PRESENTATION.pptxINTERNSHIP PRESENTATION.pptx
INTERNSHIP PRESENTATION.pptxKRISHNARAJ207
 
22BAB09-OPERATION MANAGEMENT.pptx
22BAB09-OPERATION MANAGEMENT.pptx22BAB09-OPERATION MANAGEMENT.pptx
22BAB09-OPERATION MANAGEMENT.pptxKRISHNARAJ207
 
BATCH-4 PROJECT PPT.pdf
BATCH-4 PROJECT PPT.pdfBATCH-4 PROJECT PPT.pdf
BATCH-4 PROJECT PPT.pdfKRISHNARAJ207
 
BATCH-4 PROJECT .pdf
BATCH-4 PROJECT .pdfBATCH-4 PROJECT .pdf
BATCH-4 PROJECT .pdfKRISHNARAJ207
 
ankurjobrotation-140217103220-phpapp02.pptx
ankurjobrotation-140217103220-phpapp02.pptxankurjobrotation-140217103220-phpapp02.pptx
ankurjobrotation-140217103220-phpapp02.pptxKRISHNARAJ207
 
methodsofperformanceappraisal-170222015553.pptx
methodsofperformanceappraisal-170222015553.pptxmethodsofperformanceappraisal-170222015553.pptx
methodsofperformanceappraisal-170222015553.pptxKRISHNARAJ207
 
work culture ppt.pptx
work culture ppt.pptxwork culture ppt.pptx
work culture ppt.pptxKRISHNARAJ207
 
Elements of Systems Design.ppt
Elements of Systems Design.pptElements of Systems Design.ppt
Elements of Systems Design.pptKRISHNARAJ207
 
ultrasonic-welding-828-SNggldd.pptx
ultrasonic-welding-828-SNggldd.pptxultrasonic-welding-828-SNggldd.pptx
ultrasonic-welding-828-SNggldd.pptxKRISHNARAJ207
 

More from KRISHNARAJ207 (20)

INTERNSHIP PRESENTATION.pptx
INTERNSHIP PRESENTATION.pptxINTERNSHIP PRESENTATION.pptx
INTERNSHIP PRESENTATION.pptx
 
22BAB09-OPERATION MANAGEMENT.pptx
22BAB09-OPERATION MANAGEMENT.pptx22BAB09-OPERATION MANAGEMENT.pptx
22BAB09-OPERATION MANAGEMENT.pptx
 
22BA005.pptx
22BA005.pptx22BA005.pptx
22BA005.pptx
 
BATCH-4 PROJECT PPT.pdf
BATCH-4 PROJECT PPT.pdfBATCH-4 PROJECT PPT.pdf
BATCH-4 PROJECT PPT.pdf
 
BATCH-4 PROJECT .pdf
BATCH-4 PROJECT .pdfBATCH-4 PROJECT .pdf
BATCH-4 PROJECT .pdf
 
ankurjobrotation-140217103220-phpapp02.pptx
ankurjobrotation-140217103220-phpapp02.pptxankurjobrotation-140217103220-phpapp02.pptx
ankurjobrotation-140217103220-phpapp02.pptx
 
UNIT-1.ppt
UNIT-1.pptUNIT-1.ppt
UNIT-1.ppt
 
methodsofperformanceappraisal-170222015553.pptx
methodsofperformanceappraisal-170222015553.pptxmethodsofperformanceappraisal-170222015553.pptx
methodsofperformanceappraisal-170222015553.pptx
 
Unit 5 ED.pptx
Unit 5 ED.pptxUnit 5 ED.pptx
Unit 5 ED.pptx
 
work culture ppt.pptx
work culture ppt.pptxwork culture ppt.pptx
work culture ppt.pptx
 
Control.ppt
Control.pptControl.ppt
Control.ppt
 
22BA001 IE S 2.pptx
22BA001 IE S 2.pptx22BA001 IE S 2.pptx
22BA001 IE S 2.pptx
 
22BA001 IE 1.pptx
22BA001 IE 1.pptx22BA001 IE 1.pptx
22BA001 IE 1.pptx
 
22BA003 IE 1.pptx
22BA003 IE 1.pptx22BA003 IE 1.pptx
22BA003 IE 1.pptx
 
Elements of Systems Design.ppt
Elements of Systems Design.pptElements of Systems Design.ppt
Elements of Systems Design.ppt
 
Financial IS.pptx
Financial IS.pptxFinancial IS.pptx
Financial IS.pptx
 
dbms.ppt
dbms.pptdbms.ppt
dbms.ppt
 
DFD1.ppt
DFD1.pptDFD1.ppt
DFD1.ppt
 
group behaviour.ppt
group behaviour.pptgroup behaviour.ppt
group behaviour.ppt
 
ultrasonic-welding-828-SNggldd.pptx
ultrasonic-welding-828-SNggldd.pptxultrasonic-welding-828-SNggldd.pptx
ultrasonic-welding-828-SNggldd.pptx
 

Recently uploaded

Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 

Recently uploaded (20)

Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 

DWIntro.pptx

  • 2. DATA WAREHOUSING INTRODUCTION • A data warehouse is a central repository of information that can be analyzed to make more informed decisions.
  • 3. THE WAREHOUSING APPROACH Data Warehouse Clients Source Source Source . . . Extractor/ Monitor Integration System . . . Metadata Extractor/ Monitor Extractor/ Monitor  Information integrated in advance  Stored in wh for direct querying and analysis
  • 4. ADVANTAGES OF WAREHOUSING APPROACH • High query performance • But not necessarily most current information • Doesn’t interfere with local processing at sources • Complex queries at warehouse • OLTP at information sources • Information copied at warehouse • Can modify, annotate, summarize, restructure, etc. • Can store historical information • Security, no auditing • Has caught on in industry
  • 5. DATA WAREHOUSE EVOLUTION TIME 2000 1995 1990 1985 1980 1960 1975 Information- Based Management Data Revolution “Middle Ages” “Prehistoric Times” Relational Databases PC’s and Spreadsheets End-user Interfaces 1st DW Article DW Confs. Vendor DW Frameworks Company DWs “Building the DW” Inmon (1992) Data Replication Tools
  • 6. WHAT IS A DATA WAREHOUSE? A PRACTITIONERS VIEWPOINT “A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.” -- Barry Devlin, IBM Consultant
  • 7. A DATA WAREHOUSE IS... • Stored collection of diverse data • A solution to data integration problem • Single repository of information • Subject-oriented • Organized by subject, not by application • Used for analysis, data mining, etc. • Optimized differently from transaction-oriented db • User interface aimed at executive
  • 8. A DATA WAREHOUSE IS... (CONTINUED) • Large volume of data (Gb, Tb) • Non-volatile • Historical • Time attributes are important • Updates infrequent • May be append-only • Examples • All transactions ever at WalMart • Complete client histories at insurance firm • Stockbroker financial information and portfolios
  • 9. SUMMARY Operational Systems Enterprise Modeling Business Information Guide Data Warehouse Catalog Data Warehouse Population Data Warehouse Business Information Interface
  • 10. WAREHOUSE IS A SPECIALIZED DB Standard DB • Mostly updates • Many small transactions • Mb - Gb of data • Current snapshot • Index/hash on p.k. • Raw data • Thousands of users (e.g., clerical users) Warehouse  Mostly reads  Queries are long and complex  Gb - Tb of data  History  Lots of scans  Summarized, reconciled data  Hundreds of users (e.g., decision-makers, analysts)
  • 11. DATA WAREHOUSE ARCHITECTURES: CONCEPTUAL VIEW • Single-layer • Every data element is stored once only • Virtual warehouse • Two-layer • Real-time + derived data • Most commonly used approach in industry today “Real-time data” Operational systems Informational systems Derived Data Real-time data Operational systems Informational systems
  • 12. THREE-LAYER ARCHITECTURE: CONCEPTUAL VIEW • Transformation of real-time data to derived data really requires two steps Derived Data Real-time data Operational systems Informational systems Reconciled Data Physical Implementation of the Data Warehouse View level “Particular informational needs”
  • 13. WAREHOUSE ARCHITECTURE Source Source Source Extractor/ Monitor Extractor/ Monitor Extractor/ Monitor Integrator Warehouse Query & Analysis Client Client ... Metadata
  • 14. WHAT IS DATA MINING? • Data Mining is: • (1) The efficient discovery of previously unknown, valid, potentially useful, understandable patterns in large datasets • (2) The analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner
  • 15. OVERVIEW OF TERMS • Data: a set of facts (items) D, usually stored in a database • Pattern: an expression E in a language L, that describes a subset of facts • Attribute: a field in an item I in D. • Interestingness: a function ID,L that maps an expression E in L into a measure space M
  • 17. EXAMPLES OF LARGE DATASETS • Government: IRS, NGA, … • Large corporations • WALMART: 20M transactions per day • MOBIL: 100 TB geological databases • AT&T 300 M calls per day • Credit card companies • Scientific • NASA, EOS project: 50 GB per hour • Environmental datasets
  • 18. EXAMPLES OF DATA MINING APPLICATIONS 1. Fraud detection: credit cards, phone cards 2. Marketing: customer targeting 3. Data Warehousing: Walmart 4. Astronomy 5. Molecular biology
  • 19. THE DATA MINING PROCESS 1. Understand the domain 2. Create a dataset: Select the interesting attributes Data cleaning and preprocessing 3. Choose the data mining task and the specific algorithm 4. Interpret the results, and possibly return to 2
  • 20. HOW DATA MINING IS USED 1. Identify the problem 2. Use data mining techniques to transform the data into informatio 3. Act on the information 4. Measure the results

Editor's Notes

  1. The slides for this text are organized into several modules. Each lecture contains about enough material for a 1.25 hour class period. (The time estimate is very approximate--it will vary with the instructor, and lectures also differ in length; so use this as a rough guideline.) This lecture is the first of two in Module (1). Module (1): Introduction (DBMS, Relational Model) Module (2): Storage and File Organizations (Disks, Buffering, Indexes) Module (3): Database Concepts (Relational Queries, DDL/ICs, Views and Security) Module (4): Relational Implementation (Query Evaluation, Optimization) Module (5): Database Design (ER Model, Normalization, Physical Design, Tuning) Module (6): Transaction Processing (Concurrency Control, Recovery) Module (7): Advanced Topics