Data Mining & Data Warehousing
RDBMS
• Data Warehousing
– Who needs Data Warehousing
– Architecture of Data warehousing
– Types of Data Warehousing
– Components of Data Warehouse
– Applications
– Advantages and Disadvantages
• Data Mining
Data Warehouse
Definition: According to Inmon, a data warehouse is a subject-oriented,
integrated, time-variant, and non-volatile collection of data. He defined these
terms as follows:
Subject-oriented: Data that gives information about a particular subject
instead of about a company's ongoing operations.
Integrated: Data that is gathered into the data warehouse from a variety of
sources and merged into a coherent whole.
Time-variant: All data in the data warehouse is identified with a particular time
period.
Non-volatile: Data is stable in a data warehouse. More data is added, but data
is never removed.
• Data warehousing is a technique for collecting and managing data from
varied sources to provide meaningful business insights. It is a blend of
technologies and components that allows the strategic use of data.
A data warehouse works as a central repository where
information arrives from one or more data sources. Data flows
into a data warehouse from transactional systems and
other relational databases.
Data may be:
• Structured
• Semi-structured
• Unstructured
Who needs a Data Warehouse?
A data warehouse is useful for many types of users, such as:
• Decision makers who rely on large amounts of data.
• Users who use customized, complex processes to obtain information
from multiple data sources.
• People who want simple technology to access the
data.
• People who want a systematic approach to
making decisions.
• Users who need fast performance on a huge amount of data, which is a
necessity for reports, grids, or charts.
• Anyone who wants to discover 'hidden patterns' in
data flows and groupings, for which a data warehouse is a first step.
Architecture
Three-Tier Data Warehouse Architecture
Generally, a data warehouse adopts a three-tier architecture. The
three tiers are as follows:
1. Bottom Tier − The bottom tier of the architecture is the data warehouse database
server. It is typically a relational database system. Back-end tools and utilities
feed data into the bottom tier. They perform the extract,
clean, load, and refresh functions.
2. Middle Tier − In the middle tier, we have the OLAP server, which can be implemented
in either of the following ways:
• By Relational OLAP (ROLAP), which is an extended relational database management
system. ROLAP maps operations on multidimensional data to standard
relational operations.
• By the Multidimensional OLAP (MOLAP) model, which directly implements
multidimensional data and operations.
3. Top Tier − This tier is the front-end client layer. It holds
the query tools, reporting tools, analysis tools, and data
mining tools.
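The ROLAP idea from the middle tier, mapping multidimensional operations onto standard relational ones, can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the `sales` table, its `region` and `quarter` dimensions, and the sample figures are hypothetical and do not come from the slides.

```python
import sqlite3


def build_sales_table():
    """Create an in-memory fact table with two dimensions (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount REAL)")
    rows = [
        ("North", "Q1", 100.0),
        ("North", "Q2", 150.0),
        ("South", "Q1", 80.0),
        ("South", "Q2", 120.0),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn


def roll_up_by_region(conn):
    """Roll-up: aggregate away the quarter dimension with GROUP BY."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


def slice_quarter(conn, quarter):
    """Slice: fix one value of the quarter dimension with WHERE."""
    cur = conn.execute(
        "SELECT region, amount FROM sales WHERE quarter = ? ORDER BY region",
        (quarter,),
    )
    return cur.fetchall()


conn = build_sales_table()
print(roll_up_by_region(conn))    # [('North', 250.0), ('South', 200.0)]
print(slice_quarter(conn, "Q1"))  # [('North', 100.0), ('South', 80.0)]
```

A MOLAP server would instead keep the data in a multidimensional array structure, so the same roll-up would not pass through SQL at all.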
Types of Data Warehouse
The three main types of data warehouse are:
1. Enterprise Data Warehouse:
An Enterprise Data Warehouse is a centralized warehouse. It provides decision support services
across the enterprise. It offers a unified approach for organizing and representing data. It
also provides the ability to classify data by subject and to grant access according to
those divisions.
2. Operational Data Store:
An Operational Data Store (ODS) is a data store used when
neither the data warehouse nor the OLTP systems support an organization's reporting needs. An
ODS is refreshed in real time. Hence, it is widely preferred for routine activities
such as storing employee records.
3. Data Mart:
A data mart is a subset of the data warehouse. It is specially designed for a particular line of
business, such as sales or finance. In an independent data mart, data is
collected directly from the sources.
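One way to picture a data mart as a subset of the warehouse is a database view restricted to a single line of business. The sketch below uses sqlite3 and assumes a hypothetical `warehouse_facts` table; the table, column, and view names are illustrative only, and real data marts are often separate physical stores rather than views.

```python
import sqlite3


def build_warehouse():
    """Create a tiny in-memory warehouse table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE warehouse_facts (business_line TEXT, metric TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
        [
            ("sales", "revenue", 500.0),
            ("finance", "costs", 300.0),
            ("sales", "units", 42.0),
        ],
    )
    return conn


def create_data_mart(conn, business_line):
    """Expose one line of business as a view named <business_line>_mart."""
    # SQLite does not accept bound parameters inside CREATE VIEW, so the name
    # is validated as a plain identifier before it is placed into the SQL text.
    if not business_line.isidentifier():
        raise ValueError("business_line must be a simple identifier")
    view_name = f"{business_line}_mart"
    conn.execute(
        f"CREATE VIEW {view_name} AS "
        f"SELECT metric, value FROM warehouse_facts "
        f"WHERE business_line = '{business_line}'"
    )
    return view_name


conn = build_warehouse()
mart = create_data_mart(conn, "sales")
print(conn.execute(f"SELECT metric, value FROM {mart} ORDER BY metric").fetchall())
# [('revenue', 500.0), ('units', 42.0)]
```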
Components of Data Warehouse
The four components of a data warehouse are:
• Load Manager: The load manager is also called the front component. It performs all the
operations associated with the extraction and loading of data into the warehouse. These
operations include transformations that prepare the data for entry into the data warehouse.
• Warehouse Manager: The warehouse manager performs operations associated with the
management of the data in the warehouse. These include analyzing data to
ensure consistency, creating indexes and views, generating denormalizations and
aggregations, transforming and merging source data, and archiving and backing up data.
• Query Manager: The query manager is also known as the back-end component. It performs all the
operations related to the management of user queries. It
directs queries to the appropriate tables and schedules the
execution of queries.
• End-user access tools:
These are categorized into five groups: 1. data reporting tools, 2. query tools, 3.
application development tools, 4. EIS tools, and 5. OLAP tools and data mining tools.
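The division of work between the load manager, warehouse manager, and query manager can be sketched as three small Python functions. This is a toy model under assumptions: the function names mirror the component names, while the row format (`region`, `amount`) and the in-memory dictionary standing in for the warehouse are hypothetical.

```python
from collections import defaultdict


def load_manager(source_rows):
    """Extract and transform: drop incomplete rows, tidy names, coerce types."""
    cleaned = []
    for row in source_rows:
        if row.get("amount") is None or not row.get("region"):
            continue  # skip rows that cannot be loaded
        cleaned.append(
            {"region": row["region"].strip().title(), "amount": float(row["amount"])}
        )
    return cleaned


def warehouse_manager(detail_rows):
    """Manage stored data: keep the detail rows and build an aggregate table."""
    totals = defaultdict(float)
    for row in detail_rows:
        totals[row["region"]] += row["amount"]
    return {"detail": detail_rows, "region_totals": dict(totals)}


def query_manager(warehouse, region=None):
    """Direct a query to the appropriate table: aggregate if possible, else detail."""
    if region is not None:
        return warehouse["region_totals"].get(region, 0.0)
    return warehouse["detail"]


sources = [
    {"region": " north ", "amount": "10.5"},
    {"region": "North", "amount": 4.5},
    {"region": "south", "amount": None},  # incomplete, dropped on load
]
warehouse = warehouse_manager(load_manager(sources))
print(query_manager(warehouse, "North"))  # 15.0
```

Answering the regional total from the precomputed aggregate rather than the detail rows is the kind of routing decision the slide attributes to the query manager.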
Applications of Data Warehouse
• Airline:
In the airline industry, it is used for operational purposes such as crew assignment, analysis of route profitability, and frequent flyer
program promotions.
• Banking:
It is widely used in the banking sector to manage available resources effectively. Some banks also use it for
market research and for performance analysis of products and operations.
• Healthcare:
The healthcare sector uses data warehouses to strategize and predict outcomes, generate patients' treatment reports, and
share data with tie-in insurance companies, medical aid services, etc.
• Public sector:
In the public sector, a data warehouse is used for intelligence gathering. It helps government agencies maintain and
analyze tax records and health policy records for every individual.
• Telecommunication:
A data warehouse is used in this sector for product promotions, sales decisions, and distribution decisions.
• Hospitality industry:
This industry uses warehouse services to design and estimate advertising and promotion campaigns that
target clients based on their feedback and travel patterns.
Advantages & Disadvantages
Advantages of Data Warehouse:
• A data warehouse allows business users to quickly access critical data from several sources all in
one place.
• A data warehouse provides consistent information on various cross-functional activities. It
also supports ad hoc reporting and queries.
• A data warehouse helps to integrate many sources of data, which reduces stress on the production
system.
• A data warehouse helps to reduce total turnaround time for analysis and reporting.
• Restructuring and integration make the data easier to use for reporting and analysis.
• Because critical data from a number of sources is available in a single
place, users save the time otherwise spent retrieving data from multiple sources.
• A data warehouse stores a large amount of historical data. This helps users analyze different
time periods and trends to make future predictions.
Disadvantages of Data Warehouse:
• It is not an ideal option for unstructured data.
• Creating and implementing a data warehouse is a time-consuming affair.
• A data warehouse can become outdated relatively quickly.
• It is difficult to make changes to data types and ranges, data source schemas, indexes, and
queries.
• A data warehouse may seem easy, but it can be too complex for average
users.
• Despite best efforts at project management, the scope of a data warehousing project
tends to increase.
• Sometimes warehouse users develop different business rules.
• Organisations need to spend a lot of their resources on training and implementation.
Data Mining
• Data mining is defined as extracting information from huge sets of data. In
other words, data mining is the procedure of mining
knowledge from data.
or
• Data mining is looking for hidden, valid, and potentially useful patterns in
huge data sets. It is all about discovering unsuspected or
previously unknown relationships in the data.
• It is a multi-disciplinary skill that uses machine learning, statistics, AI, and
database technology.
• The insights derived via data mining can be used for marketing, fraud
detection, scientific discovery, etc.
• Data mining is also called knowledge discovery, knowledge extraction,
data/pattern analysis, information harvesting, etc.
Data Mining Techniques
Data mining also involves other processes such as Data Cleaning,
Data Integration, Data Transformation, Data Mining, Pattern
Evaluation, and Data Presentation.
There are several techniques which are used to extract information.
Implementation Process
1. Business understanding: In this phase, business and data mining goals are
established.
• First, you need to understand the business and client objectives. You need to define
what your client wants (which many times even they do not know themselves).
• Take stock of the current data mining scenario. Factor resources, assumptions,
constraints, and other significant factors into your assessment.
2. Data understanding: In this phase, a sanity check on the data is
performed to check whether it is appropriate for the data mining
goals.
• First, data is collected from the multiple data sources available in the
organization.
• Next, the properties of the acquired data are explored.
3. Data preparation: In this phase, data is made production ready.
• The data from different sources should be selected, cleaned,
transformed, formatted, analyzed, and constructed (if
required).
• Data cleaning is a process to "clean" the data by smoothing
noisy data and filling in missing values.
• Data transformation: Data transformation operations contribute toward
the success of the mining process.
• Smoothing: It helps to remove noise from the data.
• Aggregation: Summary or aggregation operations are applied to the data. For example, the
weekly sales data is aggregated to calculate the monthly and yearly totals.
• Generalization: In this step, low-level data is replaced by higher-level concepts
with the help of concept hierarchies. For example, the city is replaced by the
county.
• Normalization: Normalization is performed when the attribute data are scaled up or
scaled down. Example: Data should fall in the range -2.0 to 2.0 post-normalization.
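The cleaning and transformation operations listed for data preparation can be sketched in plain Python. The specific choices here are assumptions made for illustration, not methods named in the slides: missing values are filled with the mean, smoothing uses a centered moving average, and normalization uses min-max scaling into the -2.0 to 2.0 range. All function names and sample values are hypothetical.

```python
def fill_missing(values):
    """Data cleaning: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def smooth(values, window=3):
    """Smoothing: centered moving average, with shorter windows at the edges."""
    half = window // 2
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        result.append(sum(chunk) / len(chunk))
    return result


def aggregate_monthly(weekly_sales):
    """Aggregation: sum (month, amount) weekly records into monthly totals."""
    totals = {}
    for month, amount in weekly_sales:
        totals[month] = totals.get(month, 0) + amount
    return totals


def generalize(cities, hierarchy):
    """Generalization: replace each city by its county via a concept hierarchy."""
    return [hierarchy.get(city, city) for city in cities]


def normalize(values, low=-2.0, high=2.0):
    """Normalization: min-max scale the values into the range [low, high]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [(low + high) / 2 for _ in values]  # constant column
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]


print(fill_missing([1.0, None, 3.0]))                           # [1.0, 2.0, 3.0]
print(aggregate_monthly([("Jan", 10), ("Jan", 20), ("Feb", 5)]))  # {'Jan': 30, 'Feb': 5}
print(generalize(["Oxford"], {"Oxford": "Oxfordshire"}))         # ['Oxfordshire']
print(normalize([0, 5, 10]))                                     # [-2.0, 0.0, 2.0]
```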
4. Modelling: In this phase, mathematical models are used to determine
data patterns.
• Based on the business objectives, suitable modelling techniques should be
selected for the prepared dataset.
• Create a scenario to test the quality and validity of the model.
• Run the model on the prepared dataset.
5. Evaluation: In this phase, the patterns identified are evaluated against the
business objectives.
• Results generated by the data mining model should be evaluated against
the business objectives.
• A go or no-go decision is taken on moving the model to the deployment
phase.
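The modelling and evaluation phases can be illustrated with a deliberately tiny example: fit a model on prepared data, test it on held-out data, and turn the result into a go or no-go decision. Everything here is an assumption for illustration: the single-feature threshold classifier, the sample rows, and the 0.8 accuracy requirement standing in for a business objective.

```python
def train_threshold_model(rows):
    """Pick the threshold on one feature that maximises training accuracy.

    rows is a list of (feature_value, label) pairs with label 0 or 1;
    the model predicts 1 when feature_value >= threshold.
    """
    candidates = sorted({x for x, _ in rows})
    best_threshold, best_accuracy = candidates[0], -1.0
    for t in candidates:
        acc = accuracy(t, rows)
        if acc > best_accuracy:
            best_threshold, best_accuracy = t, acc
    return best_threshold


def accuracy(threshold, rows):
    """Fraction of rows where the thresholded prediction matches the label."""
    return sum((x >= threshold) == bool(y) for x, y in rows) / len(rows)


def go_no_go(holdout_accuracy, required=0.8):
    """Evaluation: compare the result with a (hypothetical) business target."""
    return "go" if holdout_accuracy >= required else "no-go"


# Hypothetical prepared dataset, split into training and test scenarios.
train = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]
holdout = [(2, 0), (4, 0), (7, 1), (9, 1)]

threshold = train_threshold_model(train)
score = accuracy(threshold, holdout)
print(threshold, score, go_no_go(score))  # 6 1.0 go
```

Keeping the test scenario separate from the training rows is what lets the evaluation step say something about rows the model has not seen.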
6. Deployment: In the deployment phase, you ship your data mining discoveries to
everyday business operations.
• The knowledge or information discovered during the data mining process should be
made easy to understand for non-technical stakeholders.
• A detailed deployment plan for shipping, maintenance, and monitoring of data
mining discoveries is created.
• A final project report is created with lessons learned and key experiences from
the project. This helps to improve the organization's business policy.
Data Mining & Data Warehousing

More Related Content

What's hot

What's hot (20)

Artificial Intelligence: Data Mining
Artificial Intelligence: Data MiningArtificial Intelligence: Data Mining
Artificial Intelligence: Data Mining
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Business intelligence concepts & application
Business intelligence concepts & applicationBusiness intelligence concepts & application
Business intelligence concepts & application
 
Big data
Big dataBig data
Big data
 
Data mining tutorial
Data mining tutorialData mining tutorial
Data mining tutorial
 
Data mining Introduction
Data mining IntroductionData mining Introduction
Data mining Introduction
 
Data mining techniques unit 1
Data mining techniques  unit 1Data mining techniques  unit 1
Data mining techniques unit 1
 
Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data mining
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Basic analtyics & advanced analtyics
Basic analtyics & advanced analtyicsBasic analtyics & advanced analtyics
Basic analtyics & advanced analtyics
 
Data warehousing and data mining
Data warehousing and data miningData warehousing and data mining
Data warehousing and data mining
 
Data warehouse and data mining
Data warehouse and data miningData warehouse and data mining
Data warehouse and data mining
 
Data Mining
Data MiningData Mining
Data Mining
 
Data mining
Data miningData mining
Data mining
 
Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)Data mining & data warehousing (ppt)
Data mining & data warehousing (ppt)
 
Introduction to Big Data & Analytics
Introduction to Big Data & AnalyticsIntroduction to Big Data & Analytics
Introduction to Big Data & Analytics
 
Data mining in marketing
Data mining in marketingData mining in marketing
Data mining in marketing
 
Business analytics and data mining
Business analytics and data miningBusiness analytics and data mining
Business analytics and data mining
 
Unit 4 Advanced Data Analytics
Unit 4 Advanced Data AnalyticsUnit 4 Advanced Data Analytics
Unit 4 Advanced Data Analytics
 

Similar to Data Mining & Data Warehousing

Cognos datawarehouse
Cognos datawarehouseCognos datawarehouse
Cognos datawarehousessuser7fc7eb
 
Data Mart Lake Ware.pptx
Data Mart Lake Ware.pptxData Mart Lake Ware.pptx
Data Mart Lake Ware.pptxBalasundaramSr
 
ETL processes , Datawarehouse and Datamarts.pptx
ETL processes , Datawarehouse and Datamarts.pptxETL processes , Datawarehouse and Datamarts.pptx
ETL processes , Datawarehouse and Datamarts.pptxParnalSatle
 
presentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptxpresentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptxvipush1
 
Data warehousing.pptx
Data warehousing.pptxData warehousing.pptx
Data warehousing.pptxAnusuya123
 
DWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxDWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxSalehaMariyam
 
Introduction to Big Data Analytics
Introduction to Big Data AnalyticsIntroduction to Big Data Analytics
Introduction to Big Data AnalyticsUtkarsh Sharma
 
data warehousing
data warehousingdata warehousing
data warehousing143sohil
 
Data warehousing and data mart
Data warehousing and data martData warehousing and data mart
Data warehousing and data martAmit Sarkar
 
Unit-IV-Introduction to Data Warehousing .pptx
Unit-IV-Introduction to Data Warehousing .pptxUnit-IV-Introduction to Data Warehousing .pptx
Unit-IV-Introduction to Data Warehousing .pptxHarsha Patel
 
Warehouse Planning and Implementation
Warehouse Planning and ImplementationWarehouse Planning and Implementation
Warehouse Planning and ImplementationSHIKHA GAUTAM
 
Business Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemBusiness Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemKiran kumar
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.pptSumathiG8
 

Similar to Data Mining & Data Warehousing (20)

Cognos datawarehouse
Cognos datawarehouseCognos datawarehouse
Cognos datawarehouse
 
Data Mart Lake Ware.pptx
Data Mart Lake Ware.pptxData Mart Lake Ware.pptx
Data Mart Lake Ware.pptx
 
DATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptxDATA WAREHOUSING.2.pptx
DATA WAREHOUSING.2.pptx
 
ETL processes , Datawarehouse and Datamarts.pptx
ETL processes , Datawarehouse and Datamarts.pptxETL processes , Datawarehouse and Datamarts.pptx
ETL processes , Datawarehouse and Datamarts.pptx
 
presentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptxpresentationofism-complete-1-100227093028-phpapp01.pptx
presentationofism-complete-1-100227093028-phpapp01.pptx
 
Data warehousing.pptx
Data warehousing.pptxData warehousing.pptx
Data warehousing.pptx
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 
Data Mining
Data MiningData Mining
Data Mining
 
DWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptxDWDM Unit 1 (1).pptx
DWDM Unit 1 (1).pptx
 
Data Warehouse
Data Warehouse Data Warehouse
Data Warehouse
 
Introduction to Big Data Analytics
Introduction to Big Data AnalyticsIntroduction to Big Data Analytics
Introduction to Big Data Analytics
 
data warehousing
data warehousingdata warehousing
data warehousing
 
Data warehousing and data mart
Data warehousing and data martData warehousing and data mart
Data warehousing and data mart
 
Unit-IV-Introduction to Data Warehousing .pptx
Unit-IV-Introduction to Data Warehousing .pptxUnit-IV-Introduction to Data Warehousing .pptx
Unit-IV-Introduction to Data Warehousing .pptx
 
Data mining notes
Data mining notesData mining notes
Data mining notes
 
Warehouse Planning and Implementation
Warehouse Planning and ImplementationWarehouse Planning and Implementation
Warehouse Planning and Implementation
 
Business Intelligence Data Warehouse System
Business Intelligence Data Warehouse SystemBusiness Intelligence Data Warehouse System
Business Intelligence Data Warehouse System
 
Oracle sql plsql & dw
Oracle sql plsql & dwOracle sql plsql & dw
Oracle sql plsql & dw
 
20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt20IT501_DWDM_PPT_Unit_I.ppt
20IT501_DWDM_PPT_Unit_I.ppt
 
Unit 1
Unit 1Unit 1
Unit 1
 

More from AAKANKSHA JAIN

Random forest and decision tree
Random forest and decision treeRandom forest and decision tree
Random forest and decision treeAAKANKSHA JAIN
 
Dimension reduction techniques[Feature Selection]
Dimension reduction techniques[Feature Selection]Dimension reduction techniques[Feature Selection]
Dimension reduction techniques[Feature Selection]AAKANKSHA JAIN
 
Inheritance in OOPs with java
Inheritance in OOPs with javaInheritance in OOPs with java
Inheritance in OOPs with javaAAKANKSHA JAIN
 
Distributed Database Design and Relational Query Language
Distributed Database Design and Relational Query LanguageDistributed Database Design and Relational Query Language
Distributed Database Design and Relational Query LanguageAAKANKSHA JAIN
 
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUESDISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUESAAKANKSHA JAIN
 
Distributed Database Management System
Distributed Database Management SystemDistributed Database Management System
Distributed Database Management SystemAAKANKSHA JAIN
 
DETECTION OF MALICIOUS EXECUTABLES USING RULE BASED CLASSIFICATION ALGORITHMS
DETECTION OF MALICIOUS EXECUTABLES USING RULE BASED CLASSIFICATION ALGORITHMSDETECTION OF MALICIOUS EXECUTABLES USING RULE BASED CLASSIFICATION ALGORITHMS
DETECTION OF MALICIOUS EXECUTABLES USING RULE BASED CLASSIFICATION ALGORITHMSAAKANKSHA JAIN
 
Fingerprint matching using ridge count
Fingerprint matching using ridge countFingerprint matching using ridge count
Fingerprint matching using ridge countAAKANKSHA JAIN
 
Image processing second unit Notes
Image processing second unit NotesImage processing second unit Notes
Image processing second unit NotesAAKANKSHA JAIN
 
Advance image processing
Advance image processingAdvance image processing
Advance image processingAAKANKSHA JAIN
 

More from AAKANKSHA JAIN (12)

Random forest and decision tree
Random forest and decision treeRandom forest and decision tree
Random forest and decision tree
 
Dimension reduction techniques[Feature Selection]
Dimension reduction techniques[Feature Selection]Dimension reduction techniques[Feature Selection]
Dimension reduction techniques[Feature Selection]
 
Inheritance in OOPs with java
Inheritance in OOPs with javaInheritance in OOPs with java
Inheritance in OOPs with java
 
OOPs with java
OOPs with javaOOPs with java
OOPs with java
 
Probability
ProbabilityProbability
Probability
 
Distributed Database Design and Relational Query Language
Distributed Database Design and Relational Query LanguageDistributed Database Design and Relational Query Language
Distributed Database Design and Relational Query Language
 
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUESDISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
DISTRIBUTED DATABASE WITH RECOVERY TECHNIQUES
 
Distributed Database Management System
Distributed Database Management SystemDistributed Database Management System
Distributed Database Management System
 
DETECTION OF MALICIOUS EXECUTABLES USING RULE BASED CLASSIFICATION ALGORITHMS
DETECTION OF MALICIOUS EXECUTABLES USING RULE BASED CLASSIFICATION ALGORITHMSDETECTION OF MALICIOUS EXECUTABLES USING RULE BASED CLASSIFICATION ALGORITHMS
DETECTION OF MALICIOUS EXECUTABLES USING RULE BASED CLASSIFICATION ALGORITHMS
 
Fingerprint matching using ridge count
Fingerprint matching using ridge countFingerprint matching using ridge count
Fingerprint matching using ridge count
 
Image processing second unit Notes
Image processing second unit NotesImage processing second unit Notes
Image processing second unit Notes
 
Advance image processing
Advance image processingAdvance image processing
Advance image processing
 

Recently uploaded

Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage examplePragyanshuParadkar1
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
Effects of rheological properties on mixing
Effects of rheological properties on mixingEffects of rheological properties on mixing
Effects of rheological properties on mixingviprabot1
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 

Recently uploaded (20)

Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
young call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Serviceyoung call girls in Green Park🔝 9953056974 🔝 escort Service
young call girls in Green Park🔝 9953056974 🔝 escort Service
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
DATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage exampleDATA ANALYTICS PPT definition usage example
DATA ANALYTICS PPT definition usage example
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
Effects of rheological properties on mixing
Effects of rheological properties on mixingEffects of rheological properties on mixing
Effects of rheological properties on mixing
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 

Data Mining & Data Warehousing

  • 1. Data Mining & Data Warehousing
  • 2. RDBMS • Data Warehousing – Who need Data Warehousing – Architecture of Data warehousing – Types of Data Warehousing – Components of Data Warehouse – Applications – Advantage and Disadvantages • Data Mining • Engage your Audience • Capture Audience Attention
  • 3. Data Warehouse Definition: According to Inmon, a data warehouse is a subject oriented, integrated, time-variant, and non-volatile collection of data. He defined the terms in the sentence as follows: Subject Oriented: Data that gives information about a particular subject instead of about a company's ongoing operations. Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole. Time-variant: All data in the data warehouse is identified with a particular time period. Non-volatile: Data is stable in a data warehouse. More data is added but data Is never removed. • A data warehousing is a technique for collecting and managing data from varied sources to provide meaningful business insights. It is a blend of technologies and components which allows the strategic use of data.
  • 4. Continued. A data warehouse works as a central repository where information arrives from one or more data sources. Data flows into a data warehouse from transactional systems and other relational databases. Data may be: • Structured • Semi-structured • Unstructured
  • 5. Who needs a data warehouse? A data warehouse is useful for many types of users, such as: • Decision makers who rely on large amounts of data. • Users who use customized, complex processes to obtain information from multiple data sources. • People who want simple technology to access the data. • People who want a systematic approach for making decisions. • Users who need fast performance on a huge amount of data for reports, grids, or charts. • Users who want to discover 'hidden patterns' of data flows and groupings, for which a data warehouse is a first step.
  • 6. Architecture Three-Tier Data Warehouse Architecture Generally, a data warehouse adopts a three-tier architecture. The three tiers are as follows: 1. Bottom Tier: The bottom tier of the architecture is the data warehouse database server, which is a relational database system. Back-end tools and utilities feed data into the bottom tier. These tools and utilities perform the extract, clean, load, and refresh functions. 2. Middle Tier: The middle tier holds the OLAP server, which can be implemented in either of the following ways. • By Relational OLAP (ROLAP), which is an extended relational database management system. ROLAP maps the operations on multidimensional data to standard relational operations. • By the Multidimensional OLAP (MOLAP) model, which directly implements multidimensional data and operations.
  • 7. Architecture Three-Tier Data Warehouse Architecture 3. Top Tier: This tier is the front-end client layer. It holds the query tools, reporting tools, analysis tools, and data mining tools.
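
The ROLAP idea in the middle tier, mapping a multidimensional operation onto a standard relational one, can be sketched as follows. This is a minimal illustration only, assuming a hypothetical `sales` fact table with `region`, `product`, and `quarter` dimensions held in an in-memory SQLite database; the `roll_up` helper is not from the slides. It shows a roll-up along one dimension expressed as an ordinary `GROUP BY`.

```python
import sqlite3

# Hypothetical fact table: one row per (region, product, quarter) sale.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, product TEXT, quarter TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("North", "Widget", "Q1", 100.0),
        ("North", "Widget", "Q2", 150.0),
        ("North", "Gadget", "Q1", 80.0),
        ("South", "Widget", "Q1", 120.0),
        ("South", "Gadget", "Q2", 60.0),
    ],
)


def roll_up(conn, dimension):
    """Roll the sales data up along one dimension using a plain GROUP BY."""
    allowed = {"region", "product", "quarter"}
    if dimension not in allowed:
        # Only known column names may be interpolated into the SQL text.
        raise ValueError(f"unknown dimension: {dimension}")
    rows = conn.execute(
        f"SELECT {dimension}, SUM(amount) FROM sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    ).fetchall()
    return dict(rows)


by_region = roll_up(conn, "region")
by_product = roll_up(conn, "product")
```

A MOLAP server would instead keep such totals in a multidimensional array structure; the relational version above trades that for reuse of an existing RDBMS.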
  • 8. Types of Data Warehouse The three main types of data warehouse are: 1. Enterprise Data Warehouse: An Enterprise Data Warehouse is a centralized warehouse. It provides decision support services across the enterprise. It offers a unified approach for organizing and representing data. It also provides the ability to classify data according to subject and to give access according to those divisions. 2. Operational Data Store: An Operational Data Store (ODS) is a data store used when neither the data warehouse nor the OLTP systems support an organization's reporting needs. An ODS is refreshed in real time. Hence, it is widely preferred for routine activities such as storing employee records. 3. Data Mart: A data mart is a subset of the data warehouse. It is specially designed for a particular line of business, such as sales or finance. In an independent data mart, data can be collected directly from the sources.
  • 9. Components of Data Warehouse The four components of a data warehouse are: • Load Manager: The load manager is also called the front component. It performs all the operations associated with the extraction and loading of data into the warehouse. These operations include transformations that prepare the data for entry into the data warehouse. • Warehouse Manager: The warehouse manager performs operations associated with the management of the data in the warehouse. These include analysis of data to ensure consistency, creation of indexes and views, generation of denormalizations and aggregations, transformation and merging of source data, and archiving and backing up data. • Query Manager: The query manager is also known as the backend component. It performs all the operations related to the management of user queries. It directs queries to the appropriate tables and schedules the execution of queries. • End-user access tools: These are categorized into five groups: 1. Data reporting tools 2. Query tools 3. Application development tools 4. EIS tools 5. OLAP tools and data mining tools.
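
The load manager's extract, transform, and load duties can be sketched in a few lines. This is a hedged illustration, not a description of any particular product: the CSV source, the `fact_sales` table, and the three helper functions are all hypothetical names chosen for the example. Rows that cannot be parsed are set aside instead of being loaded.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from an
# operational system rather than an inline string.
SOURCE_CSV = """customer,amount,date
 alice ,100.50,2024-01-05
BOB,not_a_number,2024-01-06
carol,75,2024-01-07
"""


def extract(text):
    """Read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Tidy names and parse amounts; set aside rows that cannot be parsed."""
    clean, rejected = [], []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append(row)
            continue
        clean.append((row["customer"].strip().title(), amount, row["date"]))
    return clean, rejected


def load(conn, rows):
    """Append the prepared rows to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
clean_rows, rejected_rows = transform(extract(SOURCE_CSV))
load(conn, clean_rows)
loaded = conn.execute(
    "SELECT customer, amount FROM fact_sales ORDER BY customer"
).fetchall()
```

Keeping the rejected rows, instead of silently dropping them, gives the warehouse manager something to audit during its consistency checks.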
  • 10. Applications of Data Warehouse • Airline: In airline systems, it is used for operational purposes such as crew assignment, analysis of route profitability, and frequent flyer program promotions. • Banking: It is widely used in the banking sector to manage available resources effectively. A few banks also use it for market research and for performance analysis of products and operations. • Healthcare: The healthcare sector uses data warehouses to strategize and predict outcomes, generate patients' treatment reports, and share data with tie-in insurance companies, medical aid services, etc. • Public sector: In the public sector, a data warehouse is used for intelligence gathering. It helps government agencies maintain and analyze tax records and health policy records for every individual. • Telecommunication: A data warehouse is used in this sector for product promotions, sales decisions, and distribution decisions. • Hospitality industry: This industry uses warehouse services to design and evaluate advertising and promotion campaigns that target clients based on their feedback and travel patterns.
  • 11. Advantages & Disadvantages Advantages of Data Warehouse: • A data warehouse allows business users to quickly access critical data from multiple sources, all in one place. • A data warehouse provides consistent information on various cross-functional activities. It also supports ad-hoc reporting and querying. • A data warehouse helps to integrate many sources of data, which reduces stress on the production system. • A data warehouse helps to reduce total turnaround time for analysis and reporting. • Restructuring and integration make the data easier to use for reporting and analysis. • Because critical data from a number of sources is available in a single place, users save the time of retrieving data from each source separately. • A data warehouse stores a large amount of historical data. This helps users analyze different time periods and trends to make future predictions.
  • 12. Advantages & Disadvantages Disadvantages of Data Warehouse: • It is not an ideal option for unstructured data. • Creating and implementing a data warehouse is a time-consuming affair. • A data warehouse can become outdated relatively quickly. • It is difficult to make changes in data types and ranges, data source schemas, indexes, and queries. • A data warehouse may seem easy to use, but it is often too complex for average users. • Despite best efforts at project management, the scope of a data warehousing project tends to increase. • Sometimes warehouse users develop different business rules. • Organisations need to spend a lot of resources on training and implementation.
  • 13. Data Mining • Data mining is defined as extracting information from huge sets of data. In other words, data mining is the procedure of mining knowledge from data. • Alternatively, data mining is the search for hidden, valid, and potentially useful patterns in huge data sets. It is all about discovering unsuspected or previously unknown relationships in the data. • It is a multi-disciplinary skill that uses machine learning, statistics, AI, and database technology. • The insights derived via data mining can be used for marketing, fraud detection, scientific discovery, etc. • Data mining is also called knowledge discovery, knowledge extraction, data/pattern analysis, information harvesting, etc.
  • 14. Data Mining Techniques Data mining also involves other processes such as Data Cleaning, Data Integration, Data Transformation, Data Mining, Pattern Evaluation and Data Presentation. There are several techniques which are used to extract information.
  • 15. Implementation Process 1. Business understanding: In this phase, business and data mining goals are established. • First, you need to understand the business and client objectives. You need to define what your client wants (which clients often do not know themselves). • Take stock of the current data mining scenario. Factor resources, assumptions, constraints, and other significant factors into your assessment.
  • 16. 2. Data understanding: In this phase, a sanity check is performed on the data to determine whether it is appropriate for the data mining goals. • First, data is collected from the multiple data sources available in the organization. • Next, the properties of the acquired data are explored. 3. Data preparation: In this phase, data is made production ready. • The data from different sources should be selected, cleaned, transformed, formatted, analyzed, and constructed (if required). • Data cleaning is a process to "clean" the data by smoothing noisy data and filling in missing values.
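
The two cleaning operations named above, filling in missing values and smoothing noisy data, can be sketched as follows. The slides do not say which methods to use, so mean imputation and a centred moving average are assumptions made for this example, as are the function names and the sample readings.

```python
from statistics import mean


def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def moving_average(values, window=3):
    """Smooth a series with a centred moving average.

    The window shrinks at both ends so the output keeps the input length.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


# Hypothetical sensor readings with one gap and one noisy spike (40.0).
readings = [10.0, None, 14.0, 40.0, 12.0]
filled = fill_missing(readings)
smoothed = moving_average(filled)
```

Mean imputation is the simplest choice and can distort a skewed attribute; a median or a model-based fill may suit such data better.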
  • 17. Continued. • Data transformation: Data transformation operations contribute toward the success of the mining process. • Smoothing: It helps to remove noise from the data. • Aggregation: Summary or aggregation operations are applied to the data. For example, weekly sales data is aggregated to calculate the monthly and yearly totals. • Generalization: In this step, low-level data is replaced by higher-level concepts with the help of concept hierarchies. For example, the city is replaced by the county. • Normalization: Normalization is performed when the attribute data are scaled up or scaled down. Example: Data should fall in the range -2.0 to 2.0 post-normalization.
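
Three of the transformations above can be shown on toy data. This is a sketch under stated assumptions: the record layout, the `CITY_TO_COUNTY` hierarchy, and the function names are hypothetical, and min-max scaling is only one of several normalization methods. It is used here because it maps values directly into a chosen range such as -2.0 to 2.0.

```python
from collections import defaultdict


def aggregate_monthly(weekly_sales):
    """Aggregation: sum (month, amount) weekly records into monthly totals."""
    totals = defaultdict(float)
    for month, amount in weekly_sales:
        totals[month] += amount
    return dict(totals)


# Hypothetical concept hierarchy mapping a city to its county.
CITY_TO_COUNTY = {
    "Pasadena": "Los Angeles County",
    "Oakland": "Alameda County",
}


def generalize(cities, hierarchy):
    """Generalization: replace each city with its higher-level concept."""
    return [hierarchy.get(city, "Unknown") for city in cities]


def min_max_scale(values, new_min=-2.0, new_max=2.0):
    """Normalization: rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        # A constant attribute has no spread; place it mid-range.
        return [(new_min + new_max) / 2 for _ in values]
    span = new_max - new_min
    return [new_min + (v - lo) * span / (hi - lo) for v in values]


weekly = [("2024-01", 100.0), ("2024-01", 50.0), ("2024-02", 70.0)]
monthly = aggregate_monthly(weekly)
counties = generalize(["Pasadena", "Oakland", "Springfield"], CITY_TO_COUNTY)
scaled = min_max_scale([0.0, 5.0, 10.0])
```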
  • 18. Continued. 4. Modelling: In this phase, mathematical models are used to determine data patterns. • Based on the business objectives, suitable modelling techniques should be selected for the prepared dataset. • Create a scenario to test the quality and validity of the model. • Run the model on the prepared dataset. 5. Evaluation: In this phase, the patterns identified are evaluated against the business objectives. • Results generated by the data mining model should be evaluated against the business objectives. • A go or no-go decision is taken on moving the model to the deployment phase.
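
The modelling and evaluation phases can be illustrated with a deliberately small example: fit a model on prepared data, test it on held-out data, and compare the error with a business threshold to reach a go or no-go decision. Everything specific here is an assumption for the sketch: the one-variable least-squares model, the spend-versus-sales figures, and the `MAX_ACCEPTABLE_ERROR` threshold are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, xs, ys):
    """Average absolute gap between predictions and actual values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical prepared dataset: advertising spend vs. sales.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
# Held-out data forms the test scenario for model quality.
test_x, test_y = [5.0, 6.0], [11.0, 13.5]

model = fit_line(train_x, train_y)
error = mean_abs_error(model, test_x, test_y)

MAX_ACCEPTABLE_ERROR = 1.0  # assumed business threshold, not from the slides
go = error <= MAX_ACCEPTABLE_ERROR
```

Evaluating on data the model never saw during fitting is what makes the go or no-go decision meaningful; an error measured on the training data alone would be optimistic.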
  • 19. Continued. 6. Deployment: In the deployment phase, you ship your data mining discoveries to everyday business operations. • The knowledge or information discovered during the data mining process should be made easy to understand for non-technical stakeholders. • A detailed deployment plan for shipping, maintenance, and monitoring of data mining discoveries is created. • A final project report is created with the lessons learned and key experiences from the project. This helps to improve the organization's business policy.