This presentation discusses the following topics:
Introduction
Objectives
What is Data Mining?
Data Mining Applications
Data Warehousing
Advantages and disadvantages
Trends and Current Issues
Future Research Possibilities
Introduction to Data Warehousing and Mining
Department of Information Technology, Database Technologies (ITB4201)
Dr. C.V. Suresh Babu
Professor
Department of IT
Hindustan Institute of Science & Technology
Introduction to Data Mining
& Data Warehousing
Action Plan
• Introduction
• Objectives
• What is Data Mining?
• Data Mining Applications
• Data Warehousing
• Advantages and disadvantages
• Trends and Current Issues
• Future Research Possibilities
Introduction
What is Data Mining?
Data mining is the process of analyzing large amounts of raw data and
transforming that data into useful information.
What is Data Warehousing?
A data warehouse is a computerized collection of data gathered from a
variety of sources and stored for analysis, including data mining.
Objectives
• Explore the business applications of data mining &
warehousing
• Explain the advantages & disadvantages
• Uncover software used in data mining.
• Find what data mining is used for.
• Discover current trends, regulation, and future uses of the
technology.
What is Data Mining?
Data mining is the practice of
searching through large amounts
of computerized data to find useful
patterns or trends (American
Heritage Dictionary, 2008).
Data Mining Applications
Banking
Detect Fraudulent Activity
Insurance
Risk Assessment
Medicine/Healthcare
Enhance Research
Retail
Track consumer buying trends
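As a rough illustration of the retail case above, buying trends can be approximated by counting how often pairs of items appear in the same basket. This is a minimal sketch, not a production method: the function name `pair_support` and the basket contents are hypothetical and invented for the example.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so ("bread", "milk") and ("milk", "bread")
        # are counted as the same pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items()}


# Hypothetical baskets, invented for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "milk", "eggs"],
    ["milk", "butter"],
    ["bread", "butter"],
]

support = pair_support(baskets)
# The pair bought together most often.
top_pair = max(support, key=support.get)
```

Here `top_pair` is the pair that co-occurs in the largest share of baskets; a real retail analysis would work on far larger data and usually apply minimum-support thresholds.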
Cross-Industry Standard Process for Data Mining
- Understanding the business
- Understanding the data
- Data preparation
- Modeling
- Evaluation
- Deployment
CRISP-DM
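The six phases listed above can be pictured as an ordered pipeline. The sketch below is only an illustration under stated assumptions: every function body, the toy transaction amounts, and the threshold rule are hypothetical placeholders, and real CRISP-DM projects typically loop back between phases rather than running once straight through.

```python
# Phase names follow the slide; the bodies are hypothetical placeholders.

def business_understanding(ctx):
    ctx["goal"] = "flag unusually large transactions"
    return ctx


def data_understanding(ctx):
    # Invented transaction amounts; None stands for a missing value.
    ctx["raw"] = [120.0, 95.0, None, 15000.0, 110.0]
    return ctx


def data_preparation(ctx):
    # Drop missing values before modeling.
    ctx["clean"] = [x for x in ctx["raw"] if x is not None]
    return ctx


def modeling(ctx):
    # Toy "model": a threshold at ten times the (upper) median amount.
    ordered = sorted(ctx["clean"])
    median = ordered[len(ordered) // 2]
    ctx["threshold"] = 10 * median
    return ctx


def evaluation(ctx):
    # Check what the toy model flags on the prepared data.
    ctx["flagged"] = [x for x in ctx["clean"] if x > ctx["threshold"]]
    return ctx


def deployment(ctx):
    ctx["report"] = f"{len(ctx['flagged'])} transaction(s) flagged"
    return ctx


PHASES = [
    business_understanding,
    data_understanding,
    data_preparation,
    modeling,
    evaluation,
    deployment,
]


def run_crisp_dm():
    """Run each phase once, in the order given on the slide."""
    ctx = {}
    for phase in PHASES:
        ctx = phase(ctx)
    return ctx


result = run_crisp_dm()
```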
What is a Data Warehouse?
A Practitioner's Viewpoint
“A data warehouse is simply a single, complete, and consistent
store of data obtained from a variety of sources and made
available to end users in a way they can understand and use it in
a business context.”
-- Barry Devlin, IBM Consultant
Data Mining
Advantages
• Improves Customer Satisfaction/service
• Saves Time and Money
• Increases Sales Effectiveness
• Increases profitability
Data Mining
Disadvantages
–Requires skilled technical users to interpret and analyze data
from warehouse
–Validity of the patterns
• Related to real world circumstances
–Unable to Identify Causal Relationships
–Reserved for the few instead of the many
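The point about causal relationships can be shown with a small worked example. In this sketch, two hypothetical series (ice cream sales and sunburn cases) are both generated from temperature, so they correlate perfectly even though neither causes the other. All numbers and variable names are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: both series depend only on temperature.
temperature = [10, 15, 20, 25, 30]
ice_cream_sales = [2 * t + 5 for t in temperature]
sunburn_cases = [3 * t + 1 for t in temperature]

# A mined pattern would report a strong link between the two series,
# but the correlation alone says nothing about what causes what.
r = pearson(ice_cream_sales, sunburn_cases)
```

The coefficient `r` comes out at essentially 1, yet the shared driver (temperature) is invisible unless an analyst brings in outside knowledge.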
A Data Warehouse is...
• Stored collection of diverse data
– A solution to data integration problem
– Single repository of information
• Subject-oriented
– Organized by subject, not by application
– Used for analysis, data mining, etc.
• Optimized differently from transaction-oriented db
• User interface aimed at executive
A Data Warehouse is... (continued)
• Large volume of data (GB, TB)
• Non-volatile
– Historical
– Time attributes are important
• Updates infrequent
• May be append-only
• Examples
– All transactions ever at WalMart
– Complete client histories at insurance firm
– Stockbroker financial information and portfolios
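The non-volatile, append-only, time-stamped character described above can be sketched with Python's built-in `sqlite3` module. This is a toy model, not how a real warehouse is built: the table name `sales_history`, its columns, and the sample rows are assumptions made for the example, and triggers are just one simple way to block updates and deletes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_history (
    sale_date TEXT NOT NULL,   -- time attribute (ISO 8601 date)
    store     TEXT NOT NULL,
    amount    REAL NOT NULL
);
-- Reject any attempt to change or remove historical rows.
CREATE TRIGGER sales_no_update BEFORE UPDATE ON sales_history
BEGIN SELECT RAISE(ABORT, 'sales_history is append-only'); END;
CREATE TRIGGER sales_no_delete BEFORE DELETE ON sales_history
BEGIN SELECT RAISE(ABORT, 'sales_history is append-only'); END;
""")

# Hypothetical rows, invented for illustration.
conn.executemany(
    "INSERT INTO sales_history VALUES (?, ?, ?)",
    [
        ("2022-03-01", "north", 100.0),
        ("2022-07-15", "south", 50.0),
        ("2023-01-10", "north", 75.0),
    ],
)
conn.commit()


def try_update(connection):
    """Return True if an UPDATE succeeds, False if the trigger blocks it."""
    try:
        connection.execute("UPDATE sales_history SET amount = 0")
        return True
    except sqlite3.DatabaseError:
        return False


update_allowed = try_update(conn)

# Historical analysis by year, using the time attribute.
yearly_totals = conn.execute(
    "SELECT substr(sale_date, 1, 4) AS year, SUM(amount) "
    "FROM sales_history GROUP BY year ORDER BY year"
).fetchall()
```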
Data Warehousing
Advantages
–Access to information
–Data Consistency
–Decrease Computing Cost
–Productivity Increase
–Increase company profits
Data Warehousing
Disadvantages
–Data must be extracted, cleaned, and loaded
• 80% of the overall process
–User Variability
• Proper Training
–Difficult to Maintain
• Incongruence among systems
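The extract, clean, and load work mentioned above might look like the following in miniature. It is a sketch under assumptions: the CSV content, the `purchases` table, and the cleaning rules (trim names, drop rows without a numeric amount) are all hypothetical choices made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source data with typical quality problems.
RAW_CSV = """customer,amount
 Alice ,100
bob,
Carol,abc
dave,42.5
"""


def extract(text):
    """Read the raw source into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Normalize names and drop rows whose amount is missing or not numeric."""
    cleaned = []
    for row in rows:
        name = row["customer"].strip().title()
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue
        cleaned.append((name, amount))
    return cleaned


def load(rows, connection):
    """Write the cleaned rows into the target table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS purchases (customer TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    connection.commit()


conn = sqlite3.connect(":memory:")
load(clean(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT customer, amount FROM purchases ORDER BY customer"
).fetchall()
```

Only two of the four source rows survive cleaning here, which hints at why this stage takes such a large share of the overall effort.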
Current Issues
Data Quality
– Duplicated records
– Lack of Data Standards
– Human Error
Interoperability
– Lack of communication among existing systems
Mission Creep
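For the duplicated-records issue listed under data quality above, a simple approach is to compare records on a normalized key. The sketch below assumes records are dictionaries with `name` and `email` fields; both the field names and the normalization rule are hypothetical, and real systems often need fuzzier matching.

```python
def normalize(record):
    """Build a comparison key; the chosen fields are an assumption."""
    return (record["name"].strip().lower(), record["email"].strip().lower())


def deduplicate(records):
    """Keep the first occurrence of each normalized key, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical customer records, invented for illustration.
records = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": " ann lee ", "email": "ANN@example.com"},
    {"name": "Raj Patel", "email": "raj@example.com"},
]

unique_records = deduplicate(records)
```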
Trends & Current Issues
• 4 Major Trends
– Data: a growing amount is collected and must be sifted
– Hardware: growing performance & storage
– Scientific Computing: theory, experiment, simulation
– Business: must meet a higher standard in order to foresee risks, opportunities, and
benefits for the company
• The field is growing quickly, renewed by frequent discoveries of new methodology
• Methods are being applied to medicine, marketing, operations, & other areas
• The government is closely reviewing the uses of data mining because of its potential
for both good and bad
• Counterterrorism data mining has been done, but in some instances has been
deemed a violation of privacy
Future Research Possibilities
• Government’s uses for data mining.
–National Security
–Terrorism Detection
• Identity theft through data mining.
Conclusion/Analysis
• Data mining is the extraction of information that can
predict future trends & behaviors
• Requires a large amount of data to be collected, and then
stored in data warehouse
• Possible violation of privacy in some circumstances
• Government is becoming involved in regulation, even though its own
counterterrorism program may be a violation of privacy
Test Yourself
1. What is true about data mining?
A. Data Mining is defined as the procedure of extracting information from huge sets of data
B. Data mining also involves other processes such as Data Cleaning, Data Integration, Data Transformation
C. Data mining is the procedure of mining knowledge from data.
D. All of the above
2. A goal of data mining includes which of the following?
A. To explain some observed event or condition
B. To confirm that data exists
C. To analyze data for expected relationships
D. To create a new data warehouse
3. A data warehouse is which of the following?
A. Can be updated by end users.
B. Contains numerous naming conventions and formats.
C. Organized around important subject areas.
D. Contains only current data.
4. Which of the following features usually applies to data in a data warehouse?
A. Data are often deleted
B. Most applications consist of transactions
C. Data are rarely deleted
D. Relatively few records are processed by applications
5. Which of the following statements is true?
A. The data warehouse consists of data marts and operational data
B. The data warehouse is used as a source for the operational data
C. The operational data are used as a source for the data warehouse
D. All of the above
Answers
1. What is true about data mining?
D. All of the above
2. A goal of data mining includes which of the following?
A. To explain some observed event or condition
3. A data warehouse is which of the following?
C. Organized around important subject areas.
4. Which of the following features usually applies to data in a data warehouse?
C. Data are rarely deleted
5. Which of the following statements is true?
C. The operational data are used as a source for the data warehouse