Department of Information Technology
Database Technologies (ITB4201)
Dr. C.V. Suresh Babu
Professor
Department of IT
Hindustan Institute of Science & Technology
Introduction to Data Mining & Data Warehousing
Action Plan
• Introduction
• Objectives
• What is Data Mining?
• Data Mining Applications
• Data Warehousing
• Advantages and disadvantages
• Trends and Current Issues
• Future Research Possibilities
Introduction
What is Data Mining?
Data mining is the process of examining large amounts of raw data and
transforming that data into useful information.
What is Data Warehousing?
A data warehouse is a computerized collection of data, integrated from
many sources and stored for analysis and mining.
Objectives
• Explore the business applications of data mining & warehousing.
• Explain the advantages & disadvantages.
• Uncover software used in data mining.
• Find what data mining is used for.
• Discover current trends, regulation, and future uses of the technology.
What is Data Mining?
Data mining is the practice of
searching through large amounts
of computerized data to find useful
patterns or trends (American
Heritage Dictionary, 2008).
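As a rough illustration of "finding useful patterns" in data, the sketch below counts which items are bought together most often. The baskets, the function name frequent_pairs, and the min_support threshold are all hypothetical and chosen only for this example; real data mining tools use far more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs that occur together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


patterns = frequent_pairs(baskets, min_support=2)
print(patterns)
```

Pairs that pass the threshold (here, items that share at least two baskets) are the kind of "useful pattern" a retailer might act on, for example by placing those items near each other.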
Data Mining Applications
• Banking: detect fraudulent activity
• Insurance: risk assessment
• Medicine/Healthcare: enhance research
• Retail: track consumer buying trends
Cross-Industry Standard Process for Data Mining (CRISP-DM)
- Understanding the business
- Understanding the data
- Data preparation
- Modeling
- Evaluation
- Deployment
What is a Data Warehouse?
A Practitioner's Viewpoint
“A data warehouse is simply a single, complete, and consistent
store of data obtained from a variety of sources and made
available to end users in a way they can understand and use it in
a business context.”
-- Barry Devlin, IBM Consultant
Data Mining Advantages
• Improves customer satisfaction and service
• Saves time and money
• Increases sales effectiveness
• Increases profitability
Data Mining Disadvantages
– Requires skilled technical users to interpret and analyze data from the warehouse
– Validity of the patterns
• Patterns must be checked against real-world circumstances
– Unable to identify causal relationships
– Reserved for the few instead of the many
A Data Warehouse is...
• Stored collection of diverse data
– A solution to the data integration problem
– Single repository of information
• Subject-oriented
– Organized by subject, not by application
– Used for analysis, data mining, etc.
• Optimized differently from transaction-oriented databases
• User interface aimed at executives
A Data Warehouse is... (continued)
• Large volume of data (GB, TB)
• Non-volatile
– Historical
– Time attributes are important
• Updates are infrequent
• May be append-only
• Examples
– Every transaction ever made at Walmart
– Complete client histories at an insurance firm
– Stockbroker financial information and portfolios
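The properties listed above (historical, time-stamped, append-only, queried for analysis rather than for single transactions) can be sketched with Python's built-in sqlite3 module. The table name sales_fact, its columns, and the sample rows are assumptions made for this illustration; a production warehouse would use a dedicated platform and a fuller schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE sales_fact (
        sale_date TEXT NOT NULL,   -- time attribute, stored as an ISO date
        store     TEXT NOT NULL,
        product   TEXT NOT NULL,
        amount    REAL NOT NULL
    )
    """
)


def append_sales(rows):
    """Append new facts; existing rows are never updated or deleted."""
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()


# Two separate loads: history accumulates instead of being overwritten.
append_sales([
    ("2023-01-15", "north", "bread", 120.0),
    ("2023-01-20", "south", "bread", 80.0),
    ("2023-02-03", "north", "milk", 60.0),
])
append_sales([("2023-02-10", "north", "bread", 100.0)])

# An analytical query: total sales per month across all stores.
monthly_totals = conn.execute(
    """
    SELECT substr(sale_date, 1, 7) AS month, SUM(amount)
    FROM sales_fact
    GROUP BY month
    ORDER BY month
    """
).fetchall()
print(monthly_totals)
```

The query scans and aggregates many rows by a time attribute, which is the access pattern a warehouse is optimized for, in contrast to the single-row reads and writes of a transaction-oriented database.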
Data Warehousing Advantages
– Easier access to information
– Reduced data inconsistency
– Decreased computing cost
– Increased productivity
– Increased company profits
Data Warehousing Disadvantages
– Data must be extracted, cleaned, and loaded
• About 80% of the overall process
– User variability
• Users need proper training
– Difficult to maintain
• Incongruence among systems
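The extract, clean, and load work mentioned above can be sketched in a few lines. Everything here is hypothetical: the two source systems (crm_rows and billing_rows), their field names, and their date formats are invented to show why incongruent sources make this step costly.

```python
from datetime import datetime

# Hypothetical raw records from two source systems with different conventions.
crm_rows = [
    {"name": " Alice Smith ", "joined": "2021-03-05"},
    {"name": "BOB JONES", "joined": "2020-11-30"},
]
billing_rows = [
    {"customer": "carol white", "start": "05/01/2022"},  # MM/DD/YYYY
    {"customer": "", "start": "07/04/2022"},             # missing name
]


def extract():
    """Pull rows from each source into one common (name, date, format) layout."""
    for row in crm_rows:
        yield row["name"], row["joined"], "%Y-%m-%d"
    for row in billing_rows:
        yield row["customer"], row["start"], "%m/%d/%Y"


def clean(records):
    """Normalize names and dates; skip records that have no name."""
    for name, raw_date, fmt in records:
        name = " ".join(name.split()).title()
        if not name:
            continue
        date = datetime.strptime(raw_date, fmt).date().isoformat()
        yield {"name": name, "since": date}


def load(records, warehouse):
    """Append the cleaned records to the target store."""
    warehouse.extend(records)
    return warehouse


warehouse = load(clean(extract()), [])
print(warehouse)
```

Even in this toy case most of the code deals with reconciling formats and rejecting bad rows rather than with loading, which is consistent with the slide's point that preparation dominates the effort.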
Current Issues
Data Quality
– Duplicated records
– Lack of data standards
– Human error
Interoperability
– Lack of communication among existing systems
Mission Creep
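The data quality issues above often interact: without a shared data standard, the same entity is recorded in slightly different ways and shows up as duplicates. A minimal sketch of one way to detect such duplicates follows; the record fields and the normalization rules are assumptions for illustration, and real record matching usually needs fuzzier comparison.

```python
def normalize(record):
    """Build a comparison key that ignores case, spacing, and phone punctuation."""
    name = " ".join(record["name"].lower().split())
    phone = "".join(ch for ch in record["phone"] if ch.isdigit())
    return name, phone


def deduplicate(records):
    """Keep the first record seen for each normalized key."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical customer records entered by hand in two systems.
records = [
    {"name": "Alice Smith", "phone": "555-0100"},
    {"name": "alice  smith", "phone": "(555) 0100"},
    {"name": "Bob Jones", "phone": "555-0199"},
]
unique = deduplicate(records)
print(unique)
```

The first two records differ only in capitalization, spacing, and phone formatting, so they collapse to the same key and only the first is kept.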
Trends & Current Issues
• 4 major trends
– Data: a growing amount is collected to be sifted
– Hardware: growing performance & storage
– Scientific computing: theory, experiment, simulation
– Business: meeting a higher standard in order to foresee risks, opportunities, and benefits for the company
• The field is growing quickly, renewed by frequently discovered new methodologies
• Uses and methodologies are applied to medicine, marketing, operations, & other areas
• The government is closely reviewing the uses of data mining because of its potential for both good and bad
• Counterterrorism data mining has been done, but in some instances it has been deemed a violation of privacy
Future Research Possibilities
• Government’s uses for data mining.
–National Security
–Terrorism Detection
• Identity theft through data mining.
Conclusion/Analysis
• Data mining is the extraction of information that can predict future trends & behaviors
• Requires a large amount of data to be collected and then stored in a data warehouse
• Possible violation of privacy in some circumstances
• Government is getting involved with regulation, even though the counterterrorism program may itself be a privacy violation
Test Yourself
1. What is true about data mining?
A. Data Mining is defined as the procedure of extracting information from huge sets of data
B. Data mining also involves other processes such as Data Cleaning, Data Integration, Data Transformation
C. Data mining is the procedure of mining knowledge from data.
D. All of the above
2. A goal of data mining includes which of the following?
A. To explain some observed event or condition
B. To confirm that data exists
C. To analyze data for expected relationships
D. To create a new data warehouse
3. A data warehouse is which of the following?
A. Can be updated by end users.
B. Contains numerous naming conventions and formats.
C. Organized around important subject areas.
D. Contains only current data.
4. Which of the following features usually applies to data in a data warehouse?
A. Data are often deleted
B. Most applications consist of transactions
C. Data are rarely deleted
D. Relatively few records are processed by applications
5. Which of the following statements is true?
A. The data warehouse consists of data marts and operational data
B. The data warehouse is used as a source for the operational data
C. The operational data are used as a source for the data warehouse
D. All of the above
Answers
1. D. All of the above
2. A. To explain some observed event or condition
3. C. Organized around important subject areas.
4. C. Data are rarely deleted
5. C. The operational data are used as a source for the data warehouse

More Related Content

What's hot

Survey of Object Oriented Database
Survey of Object Oriented DatabaseSurvey of Object Oriented Database
Survey of Object Oriented DatabaseEditor IJMTER
 
Open Source Platforms Integration for the Development of an Architecture of C...
Open Source Platforms Integration for the Development of an Architecture of C...Open Source Platforms Integration for the Development of an Architecture of C...
Open Source Platforms Integration for the Development of an Architecture of C...Eswar Publications
 
Information Retrieval based on Cluster Analysis Approach
Information Retrieval based on Cluster Analysis ApproachInformation Retrieval based on Cluster Analysis Approach
Information Retrieval based on Cluster Analysis ApproachAIRCC Publishing Corporation
 
Data mining and business intelligence
Data mining and business intelligenceData mining and business intelligence
Data mining and business intelligencechirag patil
 
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...vtunotesbysree
 
A Comparative Study of RDBMs and OODBMs in Relation to Security of Data
A Comparative Study of RDBMs and OODBMs in Relation to Security of DataA Comparative Study of RDBMs and OODBMs in Relation to Security of Data
A Comparative Study of RDBMs and OODBMs in Relation to Security of Datainscit2006
 
Corporate data handling
Corporate data handlingCorporate data handling
Corporate data handlingJaipal Dhobale
 
Comparative Study on Graph-based Information Retrieval: the Case of XML Document
Comparative Study on Graph-based Information Retrieval: the Case of XML DocumentComparative Study on Graph-based Information Retrieval: the Case of XML Document
Comparative Study on Graph-based Information Retrieval: the Case of XML DocumentIJAEMSJORNAL
 
ontology based- data_integration.ali_aljadaa.1125048
ontology based- data_integration.ali_aljadaa.1125048ontology based- data_integration.ali_aljadaa.1125048
ontology based- data_integration.ali_aljadaa.1125048AliAlJadaa
 
DBMS Architectures and Features - Lecture 7 - Introduction to Databases (1007...
DBMS Architectures and Features - Lecture 7 - Introduction to Databases (1007...DBMS Architectures and Features - Lecture 7 - Introduction to Databases (1007...
DBMS Architectures and Features - Lecture 7 - Introduction to Databases (1007...Beat Signer
 
Entity resolution for hierarchical data using attributes value comparison ove...
Entity resolution for hierarchical data using attributes value comparison ove...Entity resolution for hierarchical data using attributes value comparison ove...
Entity resolution for hierarchical data using attributes value comparison ove...IAEME Publication
 
Metadata for digital long-term preservation
Metadata for digital long-term preservationMetadata for digital long-term preservation
Metadata for digital long-term preservationMichael Day
 

What's hot (20)

Survey of Object Oriented Database
Survey of Object Oriented DatabaseSurvey of Object Oriented Database
Survey of Object Oriented Database
 
Recovery techniques
Recovery techniquesRecovery techniques
Recovery techniques
 
Overview of dbms
Overview of dbmsOverview of dbms
Overview of dbms
 
Open Source Platforms Integration for the Development of an Architecture of C...
Open Source Platforms Integration for the Development of an Architecture of C...Open Source Platforms Integration for the Development of an Architecture of C...
Open Source Platforms Integration for the Development of an Architecture of C...
 
Information Retrieval based on Cluster Analysis Approach
Information Retrieval based on Cluster Analysis ApproachInformation Retrieval based on Cluster Analysis Approach
Information Retrieval based on Cluster Analysis Approach
 
D1803012022
D1803012022D1803012022
D1803012022
 
Data mining and business intelligence
Data mining and business intelligenceData mining and business intelligence
Data mining and business intelligence
 
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...
VTU 7TH SEM CSE DATA WAREHOUSING AND DATA MINING SOLVED PAPERS OF DEC2013 JUN...
 
Indexing and Retrieval of Audio
Indexing and Retrieval of AudioIndexing and Retrieval of Audio
Indexing and Retrieval of Audio
 
A Comparative Study of RDBMs and OODBMs in Relation to Security of Data
A Comparative Study of RDBMs and OODBMs in Relation to Security of DataA Comparative Study of RDBMs and OODBMs in Relation to Security of Data
A Comparative Study of RDBMs and OODBMs in Relation to Security of Data
 
Corporate data handling
Corporate data handlingCorporate data handling
Corporate data handling
 
Database Management & Models
Database Management & ModelsDatabase Management & Models
Database Management & Models
 
Comparative Study on Graph-based Information Retrieval: the Case of XML Document
Comparative Study on Graph-based Information Retrieval: the Case of XML DocumentComparative Study on Graph-based Information Retrieval: the Case of XML Document
Comparative Study on Graph-based Information Retrieval: the Case of XML Document
 
ontology based- data_integration.ali_aljadaa.1125048
ontology based- data_integration.ali_aljadaa.1125048ontology based- data_integration.ali_aljadaa.1125048
ontology based- data_integration.ali_aljadaa.1125048
 
Cs2305 programming paradigms lecturer notes
Cs2305   programming paradigms lecturer notesCs2305   programming paradigms lecturer notes
Cs2305 programming paradigms lecturer notes
 
DBMS Architectures and Features - Lecture 7 - Introduction to Databases (1007...
DBMS Architectures and Features - Lecture 7 - Introduction to Databases (1007...DBMS Architectures and Features - Lecture 7 - Introduction to Databases (1007...
DBMS Architectures and Features - Lecture 7 - Introduction to Databases (1007...
 
03 Object Dbms Technology
03 Object Dbms Technology03 Object Dbms Technology
03 Object Dbms Technology
 
Entity resolution for hierarchical data using attributes value comparison ove...
Entity resolution for hierarchical data using attributes value comparison ove...Entity resolution for hierarchical data using attributes value comparison ove...
Entity resolution for hierarchical data using attributes value comparison ove...
 
Metadata for digital long-term preservation
Metadata for digital long-term preservationMetadata for digital long-term preservation
Metadata for digital long-term preservation
 
Comparision
ComparisionComparision
Comparision
 

Similar to Introduction to Data warehousiing and Mining

krithi-talk-impact.ppt
krithi-talk-impact.pptkrithi-talk-impact.ppt
krithi-talk-impact.pptKRISHNARAJ207
 
A Survey on Data Mining
A Survey on Data MiningA Survey on Data Mining
A Survey on Data MiningIOSR Journals
 
20211011112936_PPT01-Introduction to Big Data.pptx
20211011112936_PPT01-Introduction to Big Data.pptx20211011112936_PPT01-Introduction to Big Data.pptx
20211011112936_PPT01-Introduction to Big Data.pptxSyauqiAsyhabira1
 
Handling and Processing Big Data
Handling and Processing Big DataHandling and Processing Big Data
Handling and Processing Big DataUmair Shafique
 
Data Mining & Data Warehousing
Data Mining & Data WarehousingData Mining & Data Warehousing
Data Mining & Data WarehousingAAKANKSHA JAIN
 
final oracle presentation
final oracle presentationfinal oracle presentation
final oracle presentationPriyesh Patel
 
Information & Data Architecture
Information & Data ArchitectureInformation & Data Architecture
Information & Data ArchitectureSammer Qader
 
BVRM 402 IMS Database Concept.pptx
BVRM 402 IMS Database Concept.pptxBVRM 402 IMS Database Concept.pptx
BVRM 402 IMS Database Concept.pptxDrNilimaThakur
 
BIG DATA CHAPTER 2 IN DSS.pptx
BIG DATA CHAPTER 2 IN DSS.pptxBIG DATA CHAPTER 2 IN DSS.pptx
BIG DATA CHAPTER 2 IN DSS.pptxmuflehaljarrah
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousingamooool2000
 
Data Warehouse: A Primer
Data Warehouse: A PrimerData Warehouse: A Primer
Data Warehouse: A PrimerIJRTEMJOURNAL
 

Similar to Introduction to Data warehousiing and Mining (20)

dwdm unit 1.ppt
dwdm unit 1.pptdwdm unit 1.ppt
dwdm unit 1.ppt
 
krithi-talk-impact.ppt
krithi-talk-impact.pptkrithi-talk-impact.ppt
krithi-talk-impact.ppt
 
krithi-talk-impact.ppt
krithi-talk-impact.pptkrithi-talk-impact.ppt
krithi-talk-impact.ppt
 
Data Warehouse Questions
Data Warehouse QuestionsData Warehouse Questions
Data Warehouse Questions
 
DWM
DWMDWM
DWM
 
A Survey on Data Mining
A Survey on Data MiningA Survey on Data Mining
A Survey on Data Mining
 
Applying Big Data
Applying Big DataApplying Big Data
Applying Big Data
 
20211011112936_PPT01-Introduction to Big Data.pptx
20211011112936_PPT01-Introduction to Big Data.pptx20211011112936_PPT01-Introduction to Big Data.pptx
20211011112936_PPT01-Introduction to Big Data.pptx
 
Handling and Processing Big Data
Handling and Processing Big DataHandling and Processing Big Data
Handling and Processing Big Data
 
Data mining
Data miningData mining
Data mining
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Data Mining & Data Warehousing
Data Mining & Data WarehousingData Mining & Data Warehousing
Data Mining & Data Warehousing
 
final oracle presentation
final oracle presentationfinal oracle presentation
final oracle presentation
 
Information & Data Architecture
Information & Data ArchitectureInformation & Data Architecture
Information & Data Architecture
 
BVRM 402 IMS UNIT V
BVRM 402 IMS UNIT VBVRM 402 IMS UNIT V
BVRM 402 IMS UNIT V
 
BVRM 402 IMS Database Concept.pptx
BVRM 402 IMS Database Concept.pptxBVRM 402 IMS Database Concept.pptx
BVRM 402 IMS Database Concept.pptx
 
BIG DATA CHAPTER 2 IN DSS.pptx
BIG DATA CHAPTER 2 IN DSS.pptxBIG DATA CHAPTER 2 IN DSS.pptx
BIG DATA CHAPTER 2 IN DSS.pptx
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
 
Data Warehouse: A Primer
Data Warehouse: A PrimerData Warehouse: A Primer
Data Warehouse: A Primer
 

More from Dr. C.V. Suresh Babu (20)

Data analytics with R
Data analytics with RData analytics with R
Data analytics with R
 
Association rules
Association rulesAssociation rules
Association rules
 
Clustering
ClusteringClustering
Clustering
 
Classification
ClassificationClassification
Classification
 
Blue property assumptions.
Blue property assumptions.Blue property assumptions.
Blue property assumptions.
 
Introduction to regression
Introduction to regressionIntroduction to regression
Introduction to regression
 
DART
DARTDART
DART
 
Mycin
MycinMycin
Mycin
 
Expert systems
Expert systemsExpert systems
Expert systems
 
Dempster shafer theory
Dempster shafer theoryDempster shafer theory
Dempster shafer theory
 
Bayes network
Bayes networkBayes network
Bayes network
 
Bayes' theorem
Bayes' theoremBayes' theorem
Bayes' theorem
 
Knowledge based agents
Knowledge based agentsKnowledge based agents
Knowledge based agents
 
Rule based system
Rule based systemRule based system
Rule based system
 
Formal Logic in AI
Formal Logic in AIFormal Logic in AI
Formal Logic in AI
 
Production based system
Production based systemProduction based system
Production based system
 
Game playing in AI
Game playing in AIGame playing in AI
Game playing in AI
 
Diagnosis test of diabetics and hypertension by AI
Diagnosis test of diabetics and hypertension by AIDiagnosis test of diabetics and hypertension by AI
Diagnosis test of diabetics and hypertension by AI
 
A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”
 
A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”
 

Recently uploaded

Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonJericReyAuditor
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerunnathinaik
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaVirag Sontakke
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxAnaBeatriceAblay2
 

Recently uploaded (20)

Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lesson
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
internship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developerinternship ppt on smartinternz platform as salesforce developer
internship ppt on smartinternz platform as salesforce developer
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Painted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of IndiaPainted Grey Ware.pptx, PGW Culture of India
Painted Grey Ware.pptx, PGW Culture of India
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
 

Introduction to Data warehousiing and Mining

  • 1. Department of Information Technology 1Data base Technologies (ITB4201) Dr. C.V. Suresh Babu Professor Department of IT Hindustan Institute of Science & Technology Introduction to Data Mining & Data Warehousing
  • 2. Department of Information Technology 2Data base Technologies (ITB4201) Action Plan • Introduction • Objectives • What is Data Mining? • Data Mining Applications • Data Warehousing • Advantages and disadvantages • Trends and Current Issues • Future Research Possibilities
  • 3. Department of Information Technology 3Data base Technologies (ITB4201) Introduction What is Data Mining? Data Mining is the process of collecting large amounts of raw data and transforming that data into useful information. Data Warehousing? A Data Warehouse is a computerized collection of mined data.
  • 4. Department of Information Technology 4Data base Technologies (ITB4201) Objectives • Explore the business applications of data mining & warehousing • Explain the advantages & disadvantages • Uncover software used in data mining. • Find what data mining is used for. • Discover current trends, regulation, and future uses of the technology.
  • 5. Department of Information Technology 5Data base Technologies (ITB4201) What is Data Mining? Data mining is the practice of searching through large amounts of computerized data to find useful patterns or trends (American Heritage Dictionary, 2008).
  • 6. Department of Information Technology 6Data base Technologies (ITB4201) Data Mining Applications Banking Detect Fraudulent Activity Insurance Risk Assessment Medicine/Healthcare Enhance Research Retail Track consumer buying trends
  • 7. Department of Information Technology 7Data base Technologies (ITB4201) Cross-Industry Standard Process for Data Mining - Understanding the business - Understanding the data - Data preparation - Modeling - Evaluation - Deployment CRISP-DM
  • 8. Department of Information Technology 8Data base Technologies (ITB4201) What is a Data Warehouse? A Practitioners Viewpoint “A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.” -- Barry Devlin, IBM Consultant
  • 9. Department of Information Technology 9Data base Technologies (ITB4201) Data Mining Advantages • Improves Customer Satisfaction/service • Saves Time and Money • Increases Sales Effectiveness • Increases profitability
  • 10. Department of Information Technology 10Data base Technologies (ITB4201) Data Mining Disadvantages –Require skilled technical users to interpret and analyze data from warehouse –Validity of the patterns • Related to real world circumstances –Unable to Identify Casual Relationships –Reserved for the few instead of the many
  • 11. A Data Warehouse is... • A stored collection of diverse data – A solution to the data integration problem – A single repository of information • Subject-oriented – Organized by subject, not by application – Used for analysis, data mining, etc. • Optimized differently from a transaction-oriented database • Given a user interface aimed at executives
  • 12. A Data Warehouse is... (continued) • A large volume of data (GB, TB) • Non-volatile – Historical – Time attributes are important • Updated infrequently • Possibly append-only • Examples – All transactions ever at WalMart – Complete client histories at an insurance firm – Stockbroker financial information and portfolios
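The non-volatile, append-only, time-stamped properties listed above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not how any particular warehouse product works: `SaleFact` and `AppendOnlyFactTable` are hypothetical names, and the rows are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a stored row cannot be modified afterwards
class SaleFact:
    sale_date: date  # the time attribute that makes historical analysis possible
    store_id: str
    amount: float


class AppendOnlyFactTable:
    """Toy fact table that exposes append and read operations only."""

    def __init__(self):
        self._rows = []

    def append(self, row):
        self._rows.append(row)

    def total_by_year(self):
        """Aggregate sale amounts per calendar year."""
        totals = {}
        for row in self._rows:
            year = row.sale_date.year
            totals[year] = totals.get(year, 0.0) + row.amount
        return totals


table = AppendOnlyFactTable()
table.append(SaleFact(date(2019, 3, 1), "S1", 120.0))
table.append(SaleFact(date(2019, 7, 9), "S2", 80.0))
table.append(SaleFact(date(2020, 1, 15), "S1", 50.0))

yearly_totals = table.total_by_year()
print(yearly_totals)
```

Because there is no update or delete method, history accumulates rather than being overwritten, which is what makes year-over-year analysis possible.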
  • 13. Data Warehousing Advantages – Access to information – Reduced data inconsistency – Decreased computing cost – Increased productivity – Increased company profits
  • 14. Data Warehousing Disadvantages – Data must be extracted, cleaned, and loaded • 80% of the overall process – User variability • Requires proper training – Difficult to maintain • Incongruence among systems
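The extract, clean, and load steps named above can be sketched as three small functions. This is a minimal sketch under assumptions: the field names `customer` and `amount`, the cleaning rules, and the in-memory list standing in for the warehouse are all hypothetical.

```python
def extract(source_rows):
    """Pull raw rows from an operational source (here, just a list of dicts)."""
    return list(source_rows)


def clean(rows):
    """Normalize names, convert amounts to numbers, and drop rows with no customer."""
    cleaned = []
    for row in rows:
        name = row.get("customer", "").strip().title()
        if not name:
            continue  # assumption: a row without a customer name is unusable
        cleaned.append({"customer": name, "amount": float(row["amount"])})
    return cleaned


def load(warehouse, rows):
    """Append the cleaned rows to the warehouse store."""
    warehouse.extend(rows)
    return warehouse


# Hypothetical raw data with inconsistent formatting.
raw = [
    {"customer": "  alice smith ", "amount": "19.99"},
    {"customer": "", "amount": "5"},
    {"customer": "BOB JONES", "amount": "7.5"},
]

warehouse = load([], clean(extract(raw)))
print(warehouse)
```

Even in this toy version most of the code sits in the cleaning step, which is consistent with the slide's point that preparation dominates the overall effort.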
  • 15. Current Issues • Data Quality – Duplicated records – Lack of data standards – Human error • Interoperability – Lack of communication among existing systems • Mission Creep
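The duplicated-records issue above often arises because the same entity is entered with different spacing or capitalization. The following sketch shows one simple way to detect such duplicates; the record fields and the normalization rule are assumptions chosen for illustration.

```python
def normalize(record):
    """Build a comparison key that ignores case and extra whitespace."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return (name, email)


def deduplicate(records):
    """Keep the first occurrence of each normalized record."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical customer records entered by different people.
records = [
    {"name": "Alice Smith", "email": "alice@example.com"},
    {"name": "alice  smith", "email": " Alice@Example.com "},
    {"name": "Bob Jones", "email": "bob@example.com"},
]

unique_records = deduplicate(records)
print(len(unique_records))
```

Agreeing on one normalization rule is itself a small data standard, which is why the lack of standards and duplicate records tend to appear together.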
  • 16. Trends & Current Issues • 4 major trends • Data: a growing amount is collected and must be sifted • Hardware: growing performance & storage • Scientific computing: theory, experiment, simulation • Business: must meet a higher standard in order to foresee risks, opportunities, and benefits for the company • The field is growing quickly, with new methodologies discovered frequently • Uses & methodologies are applied to medicine, marketing, operations, & other areas • The government is closely reviewing the uses of data mining because of its potential for both good and bad • Counterterrorism data mining has been done, but in some instances has been deemed a violation of privacy
  • 17. Future Research Possibilities • Government’s uses for data mining – National security – Terrorism detection • Identity theft through data mining
  • 18. Conclusion/Analysis • Data mining is the extraction of information that can predict future trends & behaviors • It requires a large amount of data to be collected and then stored in a data warehouse • It is a possible violation of privacy in some circumstances • Government is getting involved with regulation, even though its counterterrorism program is itself a possible violation
  • 19. Test Yourself 1. What is true about data mining? A. Data mining is defined as the procedure of extracting information from huge sets of data B. Data mining also involves other processes such as data cleaning, data integration, and data transformation C. Data mining is the procedure of mining knowledge from data D. All of the above 2. A goal of data mining includes which of the following? A. To explain some observed event or condition B. To confirm that data exists C. To analyze data for expected relationships D. To create a new data warehouse 3. A data warehouse is which of the following? A. Can be updated by end users B. Contains numerous naming conventions and formats C. Organized around important subject areas D. Contains only current data 4. Which of the following features usually applies to data in a data warehouse? A. Data are often deleted B. Most applications consist of transactions C. Data are rarely deleted D. Relatively few records are processed by applications 5. Which of the following statements is true? A. The data warehouse consists of data marts and operational data B. The data warehouse is used as a source for the operational data C. The operational data are used as a source for the data warehouse D. All of the above
  • 20. Answers 1. D. All of the above 2. A. To explain some observed event or condition 3. C. Organized around important subject areas 4. C. Data are rarely deleted 5. C. The operational data are used as a source for the data warehouse