Introduction to Data Mining
and Data Warehousing
Ms. T. K. Anusuya
Assistant Professor
Department of Computer Science
Bon Secours College for Women, Thanjavur
INTRODUCTION TO DATA MINING AND DATA WAREHOUSING 1
Introduction to Data Mining
What is Data Mining?
Why Data Mining?
Data Extraction
Data Warehouse
Process of Data mining
Evolution of Database Technology
Data Mining Applications
Data Mining Functionalities
Major Issues of Data Mining
Data Mining
What is Data Mining?
Data mining is a process used to extract usable data from a
larger set of raw data. It involves analysing patterns in
large batches of data using one or more software tools. ...
The term is often used as a synonym for Knowledge Discovery
in Data (KDD); strictly speaking, data mining is the analysis
step of the knowledge discovery in databases (KDD) process.
Data mining, the extraction of hidden predictive information
from large databases, is a new technology with great
potential to help companies focus on the most important
information in their data warehouses.
Data Mining (KDD)
Data Extraction
Data extraction is the act or
process of retrieving data out of
(usually unstructured or poorly
structured) data sources for
further data processing
or data storage (data migration).
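As a rough illustration of extraction from a poorly structured source, the sketch below pulls typed records out of free-form text lines with a regular expression. The line format, the field names, and the `extract_records` helper are all assumptions made for this example; real sources will differ.

```python
import re

# Hypothetical semi-structured input; the line layout is an assumption.
RAW_LINES = [
    "2024-01-05 order=1001 customer=alice amount=25.50",
    "garbage line without fields",
    "2024-01-06 order=1002 customer=bob amount=40.00",
]

# Named groups describe the fields we expect to find in a usable line.
PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) order=(?P<order>\d+) "
    r"customer=(?P<customer>\w+) amount=(?P<amount>\d+\.\d+)"
)


def extract_records(lines):
    """Return one dict per line that matches PATTERN; skip the rest."""
    records = []
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # unparseable line: ignored in this sketch
        record = match.groupdict()
        record["order"] = int(record["order"])
        record["amount"] = float(record["amount"])
        records.append(record)
    return records


records = extract_records(RAW_LINES)
```

The resulting list of dictionaries is then ready for further processing or for loading into a storage system.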
Data Warehouse
What is a Data Warehouse?
Data warehousing is the electronic storage of a large
amount of information by a business or organization. A data
warehouse is designed to run queries and analysis on
historical data derived from transactional sources for
business intelligence and data mining purposes.
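To make "query and analysis on historical data" concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a warehouse. The `sales` table, its columns, and the `yearly_sales` helper are hypothetical and chosen only for illustration; a production warehouse would use a dedicated system and schema.

```python
import sqlite3


def yearly_sales(rows):
    """Load (sale_date, region, amount) rows and total them per year and region."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT substr(sale_date, 1, 4) AS year, region, SUM(amount) "
        "FROM sales "
        "GROUP BY substr(sale_date, 1, 4), region "
        "ORDER BY year, region"
    ).fetchall()
    conn.close()
    return result


# Hypothetical historical records copied from a transactional source.
history = [
    ("2022-03-01", "North", 100.0),
    ("2022-07-15", "North", 50.0),
    ("2022-09-09", "South", 70.0),
    ("2023-01-20", "North", 30.0),
]
summary = yearly_sales(history)
```

The aggregate query summarizes many transactional rows into a few figures per year and region, which is the kind of analysis a warehouse is meant to serve.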
Data mining Process
Data mining is the process of
discovering patterns from large
data sets involving methods at
the intersection of machine
learning, statistics and database
systems. Data mining is an
interdisciplinary subfield of
computer science and statistics
with an overall goal to extract
information from a data set and
transform the information into a
comprehensible structure for
further use.
Data mining Process
The related terms data dredging,
data fishing and data snooping
refer to the use of data mining
methods to sample parts of a
larger data set that are too small
for reliable statistical inferences to
be made about the validity of any
patterns discovered.
Evolution of Database Technology
Data Mining Applications
Data analysis and decision support
◦ Market analysis and management
◦ Target marketing, customer relationship management (CRM),
market basket analysis, cross selling, market segmentation
◦ Risk analysis and management
◦ Forecasting, customer retention, improved underwriting, quality
control, competitive analysis
◦ Fraud detection and detection of unusual patterns (outliers)
Other Applications
◦ Text mining (newsgroups, email, documents) and Web mining
◦ Stream data mining
◦ Bioinformatics and bio-data analysis
Data Mining Functionalities
Concept description: Characterization and discrimination
◦ Generalize, summarize, and contrast data characteristics
Association (correlation and causality)
◦ Diaper → Beer [support 0.5%, confidence 75%]
Classification and Prediction
◦ Construct models (functions) that describe and distinguish classes or
concepts for future prediction
◦ Presentation: decision-tree, classification rule, neural network
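The two bracketed numbers attached to the Diaper and Beer association rule are conventionally its support and confidence. The sketch below shows how both measures can be computed from a list of transactions. The toy baskets and the function names are assumptions made for this example, so the support value here differs from the 0.5% on the slide.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for basket in transactions if items <= basket)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, divided by
    the support of the antecedent alone."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # rule never applies in this data
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / base


# Hypothetical market baskets, invented for illustration.
baskets = [
    {"diaper", "beer"},
    {"diaper", "beer", "milk"},
    {"diaper", "beer", "bread"},
    {"diaper", "milk"},
    {"milk", "bread"},
    {"bread"},
    {"milk"},
    {"eggs", "bread"},
]

rule_support = support(baskets, {"diaper", "beer"})          # 3 of 8 baskets
rule_confidence = confidence(baskets, {"diaper"}, {"beer"})  # 3 of the 4 diaper baskets
```

Support measures how often the rule's items occur together in all transactions, while confidence measures how often the consequent appears given that the antecedent is present.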
Data Mining Functionalities
Cluster analysis
◦ Class label is unknown: Group data to form new classes, e.g., cluster
houses to find distribution patterns
◦ Maximizing intra-class similarity & minimizing interclass similarity
Outlier analysis
◦ Outlier: a data object that does not comply with the general behavior of
the data
◦ Useful in fraud detection, rare events analysis
Trend and evolution analysis
◦ Trend and deviation: regression analysis
◦ Sequential pattern mining, periodicity analysis
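As one simple way to picture outlier analysis, the sketch below flags values whose z-score (distance from the mean in standard deviations) exceeds a threshold. The z-score rule, the threshold of 2.0, and the sample amounts are assumptions made for this example; the slides do not prescribe a particular method.

```python
import statistics


def find_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical transaction amounts; 95 does not follow the general behaviour.
amounts = [10, 12, 11, 13, 12, 95]
suspicious = find_outliers(amounts)
```

A single extreme value also inflates the mean and standard deviation, so more robust measures (for example, ones based on the median) are often preferred in practice.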
Major Issues in Data Mining
Mining methodology
◦ Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web
◦ Mining knowledge in multidimensional space.
◦ Data mining: an interdisciplinary effort
◦ Performance: efficiency, effectiveness, and scalability
◦ Boosting the power of discovery in a networked environment
◦ Handling uncertainty, noise, or incompleteness of data
◦ Pattern evaluation and pattern- or constraint-guided mining
User interaction
◦ Interactive mining of knowledge at multiple levels of abstraction
◦ Data mining query languages and ad-hoc mining
◦ Expression and visualization of data mining results
Major Issues in Data Mining
Efficiency and Scalability
◦ Efficiency and scalability of data mining algorithms
◦ Parallel, distributed, and incremental mining algorithms
Diversity of Database Types
◦ Handling complex types of data
◦ Mining dynamic, networked, and global data repositories
Applications and social impacts
◦ Social impacts of data mining
◦ Privacy-preserving data mining
◦ Domain-specific data mining & invisible data mining
Applications and Trends in
Data mining
Data mining applications
Social impact of data mining
Trends in data mining
Summary
Data Mining Applications
Data mining is a young discipline with wide and diverse applications
◦ There is still a nontrivial gap between general principles of data mining and
domain-specific, effective data mining tools for particular applications
Some application domains (covered in this chapter)
◦ Biomedical and DNA data analysis
◦ Financial data analysis
◦ Retail industry
◦ Telecommunication industry
Social Implications of Data Mining
Businesses use data mining technologies in many ways, for example in
user security, inventory and order management systems, and product
management. Data mining can also influence how we spend our leisure
time, including dining and entertainment.
Trends in Data warehousing
Datafication of the enterprise requires more capable data warehouses (IoT)
Physical and logical consolidation helps reduce costs
Hadoop optimizes data warehouse environments: its distributed file system (HDFS)
and parallel MapReduce paradigm excel at processing very large data sets
Engineered systems
On-demand analytics environments
Data compression enables higher data volumes
In-database analytics simplifies analysis (SQL, R)
Consolidation: private clouds give more flexibility and reduce costs
Business analytics becomes more accessible
Increased performance with Flash and DRAM
High availability
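The MapReduce paradigm mentioned in the Hadoop item can be pictured with a small single-process word count. This is only a sketch of the map, shuffle, and reduce steps in plain Python; it does not use Hadoop or HDFS, and the function names are chosen for this example.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: combine the values of each key into one count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    pairs = []
    for doc in documents:  # in a real cluster, map tasks run in parallel
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle(pairs))


counts = word_count(["data mining", "Data warehouse"])
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread across many machines, which is what makes the paradigm suitable for very large data sets.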
Trends in Data Mining
In the next few years, data warehousing is expected to see high growth in
the software industry, especially in:
Optimizing queries
Indexing very large tables
Enhancing SQL
Improving data compression methods
Expanding dimensional modelling
Real-time data warehousing
Data visualization
Parallel-processing software implementations for data warehouse appliances
Multidimensional Analysis and Predictive Analytics
Conclusion
A data warehouse (DW) is designed to support business decisions by
allowing data consolidation, analysis, and reporting at different
aggregate levels.
Data warehousing is the process of compiling and organizing data into
one common database, whereas data mining refers to the process of
extracting meaningful data from that database.
Conclusion
The major trends in data mining include:
◦ Datafication of the enterprise
◦ The open-source Hadoop framework with its distributed file system (HDFS)
◦ On-demand analytics environments
◦ In-database analytics and in-memory technologies
◦ Use of Flash and DRAM for better performance