Introduction to dm and dw
1. Introduction to Data Mining
and Data Warehousing
Ms. T. K. Anusuya
Assistant Professor
Department of Computer Science
Bon Secours College for Women, Thanjavur
INTRODUCTION TO DATA MINING AND DATA WAREHOUSING 1
2. Introduction to Data Mining
What is Data Mining?
Why Data Mining?
Data Extraction
Data Warehouse
Process of Data mining
Evolution of Database Technology
Data Mining Applications
Data Mining Functionalities
Major Issues of Data Mining
3. Data Mining
What is Data Mining?
Data mining is the process of extracting usable information
from a larger set of raw data. It involves analysing
patterns in large batches of data using one or more software
tools. ... Data mining is also known as Knowledge
Discovery in Data (KDD).
Strictly speaking, data mining is the analysis step of the
knowledge discovery in databases (KDD) process. Data mining,
the extraction of hidden predictive information from large
databases, is a new technology with great potential to help
companies focus on the most important information in their
data warehouses.
5. Data Extraction
Data extraction is the act or
process of retrieving data out of
(usually unstructured or poorly
structured) data sources for
further data processing
or data storage (data migration).
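As a rough illustration of extraction from a poorly structured source, the sketch below pulls order records out of free-text notes with a regular expression. The sample notes, the field names, and the `extract_orders` helper are hypothetical and chosen only for this example; real sources usually need more robust parsing.

```python
import re

# Hypothetical, poorly structured source: free-text order notes.
RAW_NOTES = """\
Order 1001 placed by alice on 2021-03-04, total $25.50
order 1002 - bob - 2021-03-05 - total $7.00
(no order here, just a comment)
"""

# Assumed layout: the word "order", an id, then a date and a dollar total
# somewhere later on the same line.
ORDER_PATTERN = re.compile(
    r"order\s+(?P<order_id>\d+).*?"
    r"(?P<date>\d{4}-\d{2}-\d{2}).*?"
    r"\$(?P<total>\d+\.\d{2})",
    re.IGNORECASE,
)


def extract_orders(text):
    """Return one dict per line that matches the order pattern."""
    records = []
    for line in text.splitlines():
        match = ORDER_PATTERN.search(line)
        if match:
            records.append({
                "order_id": int(match.group("order_id")),
                "date": match.group("date"),
                "total": float(match.group("total")),
            })
    return records


# Structured records, ready for further processing or storage.
orders = extract_orders(RAW_NOTES)
for record in orders:
    print(record)
```

Lines that do not match the assumed layout are skipped rather than guessed at, which keeps the extracted records clean at the cost of possibly missing some data.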
6. Data Warehouse
What is Data Warehouse?
Data warehousing is the electronic storage of a large
amount of information by a business or organization. A data
warehouse is designed to run queries and analysis on
historical data derived from transactional sources for
business intelligence and data mining purposes.
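A minimal sketch of an analytical query over historical data, using Python's built-in `sqlite3` module as a stand-in for a warehouse. The `sales_history` table, its columns, and the sample rows are assumptions made up for this example; a production warehouse would use a dedicated engine and a much larger schema.

```python
import sqlite3

# In-memory database standing in for a warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_history (sale_year INTEGER, region TEXT, amount REAL)"
)

# Hypothetical historical rows, as if loaded from transactional sources.
conn.executemany(
    "INSERT INTO sales_history VALUES (?, ?, ?)",
    [
        (2019, "North", 120.0),
        (2019, "South", 80.0),
        (2020, "North", 150.0),
        (2020, "South", 95.0),
    ],
)

# Analytical query: total sales per year across all regions.
yearly_totals = conn.execute(
    "SELECT sale_year, SUM(amount) FROM sales_history "
    "GROUP BY sale_year ORDER BY sale_year"
).fetchall()
conn.close()

for year, total in yearly_totals:
    print(year, total)
```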
7. Data mining Process
Data mining is the process of
discovering patterns from large
data sets involving methods at
the intersection of machine
learning, statistics and database
systems. Data mining is an
interdisciplinary subfield of
computer science and statistics
with an overall goal to extract
information from a data set and
transform the information into a
comprehensible structure for
further use.
8. Data mining Process
The related terms data dredging,
data fishing and data snooping
refer to the use of data mining
methods to sample parts of a
larger data set that are too small
for reliable statistical inferences to
be made about the validity of any
patterns discovered.
10. Data Mining Applications
Data analysis and decision support
◦ Market analysis and management
◦ Target marketing, customer relationship management (CRM),
market basket analysis, cross selling, market segmentation
◦ Risk analysis and management
◦ Forecasting, customer retention, improved underwriting, quality
control, competitive analysis
◦ Fraud detection and detection of unusual patterns (outliers)
Other Applications
◦ Text mining (news group, email, documents) and Web mining
◦ Stream data mining
◦ Bioinformatics and bio-data analysis
11. Data Mining Functionalities
Concept description: Characterization and discrimination
◦ Generalize, summarize, and contrast data characteristics
Association (correlation and causality)
◦ Diaper → Beer [support 0.5%, confidence 75%]
Classification and Prediction
◦ Construct models (functions) that describe and distinguish classes or
concepts for future prediction
◦ Presentation: decision-tree, classification rule, neural network
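The two figures attached to an association rule are its support and its confidence. The sketch below computes both for a diaper and beer rule over a small, made-up set of baskets; the data and the helper names `support` and `confidence` are hypothetical, so the resulting numbers differ from those on the slide.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for basket in transactions if items <= set(basket))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(both) / support(antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical market baskets, invented for illustration.
baskets = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "beer", "bread"},
    {"diaper", "bread"},
    {"milk", "bread"},
    {"milk"},
    {"bread"},
    {"milk", "bread"},
]

# Rule: diaper -> beer
rule_support = support(baskets, {"diaper", "beer"})
rule_confidence = confidence(baskets, {"diaper"}, {"beer"})
print(f"support={rule_support:.3f}, confidence={rule_confidence:.2f}")
```

Here 3 of the 8 baskets contain both items, and 3 of the 4 baskets with diapers also contain beer, so the rule has support 0.375 and confidence 0.75.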
12. Data Mining Functionalities
Cluster analysis
◦ Class label is unknown: Group data to form new classes, e.g., cluster
houses to find distribution patterns
◦ Maximizing intra-class similarity & minimizing interclass similarity
Outlier analysis
◦ Outlier: a data object that does not comply with the general behavior of
the data
◦ Useful in fraud detection, rare events analysis
Trend and evolution analysis
◦ Trend and deviation: regression analysis
◦ Sequential pattern mining, periodicity analysis
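One simple way to flag objects that do not comply with the general behavior of the data is a modified z-score based on the median and the median absolute deviation (MAD). This is only one of many outlier tests and is not taken from the slides; the constant 0.6745, the threshold of 3.5, the sample amounts, and the `find_outliers` helper are all assumptions for illustration.

```python
from statistics import median


def find_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds `threshold`.

    Modified z-score = 0.6745 * |x - median| / MAD, a common convention
    (an assumption here, not something the slides prescribe).
    """
    center = median(values)
    mad = median(abs(x - center) for x in values)
    if mad == 0:
        # All deviations collapse to zero; the score is undefined, so
        # this sketch reports no outliers rather than dividing by zero.
        return []
    return [x for x in values if 0.6745 * abs(x - center) / mad > threshold]


# Hypothetical transaction amounts with one suspicious value.
amounts = [10, 12, 11, 13, 12, 11, 95]
outliers = find_outliers(amounts)
print(outliers)
```

Using the median rather than the mean keeps the single extreme value from distorting the baseline it is compared against.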
13. Major Issues in Data Mining
Mining methodology
◦ Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web
◦ Mining knowledge in multidimensional space.
◦ Data mining as an interdisciplinary effort
◦ Performance: efficiency, effectiveness, and scalability
◦ Boosting the power of discovery in a networked environment
◦ Handling uncertainty, noise, or incompleteness of data
◦ Pattern evaluation and pattern- or constraint-guided mining
User interaction
◦ Interactive mining of knowledge at multiple levels of abstraction
◦ Data mining query languages and ad-hoc mining
◦ Expression and visualization of data mining results
14. Major Issues in Data Mining
Efficiency and Scalability
◦ Efficiency and scalability of data mining algorithms
◦ Parallel, distributed, and incremental mining algorithms
Diversity of Database Types
◦ Handling complex types of data
◦ Mining dynamic, networked, and global data repositories
Applications and social impacts
◦ Social impacts of data mining
◦ Privacy protection and privacy-preserving data mining
◦ Domain-specific data mining & invisible data mining
15. Applications and Trends in
Data mining
Data mining applications
Social impact of data mining
Trends in data mining
Summary
16. Data Mining Applications
Data mining is a young discipline with wide and diverse applications
◦ There is still a nontrivial gap between general principles of data mining and
domain-specific, effective data mining tools for particular applications
Some application domains (covered in this chapter)
◦ Biomedical and DNA data analysis
◦ Financial data analysis
◦ Retail industry
◦ Telecommunication industry
17. Social implication of Data mining
Data mining technologies are used in business in many ways, such as
user security, inventory and order management systems, and product
management. Data mining can also influence our leisure time,
including dining and entertainment.
18. Trends in Data warehousing
Datafication of the enterprise requires more capable data warehouses (IoT)
Physical and logical consolidation helps reduce costs
Hadoop optimizes data warehouse environments: its distributed file system (HDFS)
and parallel MapReduce paradigm excel at processing very large data sets
Engineered systems
On-demand analytics environments
Data compression enables higher volumes
In-database analytics simplifies analysis (SQL, R)
Consolidation: private clouds give more flexibility and reduce costs
Business analytics becomes more accessible
Increased performance with Flash and DRAM
High availability
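The MapReduce paradigm mentioned in the Hadoop item can be sketched in plain Python as three phases: map, shuffle, and reduce. This is a single-process toy that only illustrates the idea; it does not use Hadoop or HDFS, and the function names and sample documents are assumptions for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key to get a count per word."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical input split into two "documents"; in a real cluster each
# could be mapped on a different node in parallel.
documents = ["data mining finds patterns", "data warehouse stores data"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread across many machines, which is what makes the approach suitable for very large data sets.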
19. Trends in Data Mining
In the next few years, data warehousing is expected to see high growth in
the software industry, especially in
Optimizing queries
Indexing very large tables
Enhancing SQL
Improving data compression methods
Expanding dimensional modelling
Real Time Data Warehousing
Data Visualization
Parallel processing software implementations for data warehouse appliances
Multidimensional Analysis and Predictive Analytics
20. Conclusion
A data warehouse (DW) is designed to support business decisions by
allowing data consolidation, analysis, and reporting at different
aggregate levels.
Data warehousing is the process of compiling and organizing data into one
common database, whereas data mining refers to the process of extracting
meaningful data from that database.
21. Conclusion
The major trends in data mining include
Datafication of the enterprise
Open-source Hadoop with its distributed file system (HDFS)
On-demand analytics environments
In-database analytics and in-memory technologies
Use of Flash and DRAM for better performance