SlideShare a Scribd company logo
1 of 16
1 I NAME OF PRESENTER
Data Mining
Ashis Kumar Chanda
Department of Computer Science and Engineering
University of Dhaka
2 I NAME OF PRESENTERCSE, DU2
Key concepts
 What is Data mining
 Why learn Data mining
 Data type
 Warehouse & OLAP
 Data Cleaning, Integration
 Associations, Item sets, Support, Confidence
3 I NAME OF PRESENTERCSE, DU3
Data Mining
 Data mining refers to Knowledge mining
from large amount of data
 Also known as “Knowledge Discovery from
Data” or KDD
 Target is to find a Hidden Pattern
4 I NAME OF PRESENTER
 We can’t get all type of information through Query
 Query not support Statistical analysis
 Again, we can apply artificial intelligence & find new
patterns or structures
CSE, DU4
Why learn data mining
Query provide values but data mining provides idea that help
to take (business ) decision
Ex: Women live at “Dhanmondi” & older than 40 years
most frequently buy “Jamdani Shari” at “Arong”
5 I NAME OF PRESENTERCSE, DU5
Data type
 Tabular (Transaction data) Most commonly
used
 Spatial Data (Remote sensing data/
encoded data)
 Tree Data ( xml )
 Graphs (www, bio-molecular)
 Sequence (DNA, activity log)
 Text, multimedia data
6 I NAME OF PRESENTERCSE, DU6
Warehouse & OLAP
Ware House
Data Source
Warehouse is an archive of information gathered from
multiple sources
Suppose a Banking database where each has a data source
that stores all transactions of that area. And all data source
will provide a clean/safe copy at Warehouse
7 I NAME OF PRESENTERCSE, DU7
Warehouse & OLAP
There is several issues about Warehouse:
 When and how to gather data
 What schema/pattern to use
 Data transformation & cleaning
 How to update
“Warehouse is a collection of data marts”
Where data mart is store of data in specialized pattern
8 I NAME OF PRESENTERCSE, DU8
Warehouse & OLAP
OLAP: Online Analytical Processing
OLAP tools support interactive analysis of summary Information
OLAP permits an analyst to view different summaries of
multidimensional data
Item name
Dress
Fig: Data Cube
9 I NAME OF PRESENTERCSE, DU9
Data cleaning
There may be some missing data, duplicate data, dirty data
So we need to data cleaning
Some methods:
 Ignore the tuple (not effective unless tuple contain many
missing attribute)
 Fill missing values (time consuming)
 Fill with a global value (like: unknown)
 Use mean attribute
 Use most probable value
10 I NAME OF PRESENTERCSE, DU10
11 I NAME OF PRESENTERCSE, DU11
Associations & Item sets
Associations:
An associations is a rule of the form if X then Y
It is denoted as X-> Y
Example: if there is an exam then I read
Item Sets:
For any rule if X->Y & Y->X Then X, Y are called item-set
Example:
People buying school books in January also by notebook
People buying school note books in January also by book
12 I NAME OF PRESENTERCSE, DU12
Support & confidence
Support:
The proportion of transactions in the data set which contains
the itemset
Confidence:
The conditional probability that an item appears in a
transaction when another item appears.
13 I NAME OF PRESENTERCSE, DU13
Support & confidence
Support for {I₁,I₂}
= support_count(I1 U I2)/ |D|
= 4/9
Confidence for I1 → I2
=support_count(I1 U I2) /
support_count(I1)
= 4/6
14 I NAME OF PRESENTERCSE, DU14
Association rules
Where, support count(AUB) is the number of transactions
containing the itemsets AUB, and support count(A) is the
number of transactions containing the itemset A.
•Association rules can be generated as follows:
1. For each frequent itemset l, generate all nonempty subsets
of l.
2. For every nonempty subset s of l, output the rule “s → (l-
s)” if support count(l)/support count(s) >= min_conf,
where min_conf is the minimum confidence threshold.
15 I NAME OF PRESENTERCSE, DU15
Summary
Basic topics: Data mining, Data cleaning, Warehouse, OLAP
Term: Association, Item-set, Support, Confidence
16 I NAME OF PRESENTERCSE, DU16
References
- Data Mining Concepts & Techniques
by J. Han & M. Kamber
- Database system Concept
by Abraham Sillberschatz, Korth, Sudarshan
- Lecture of Dr. S. Srinath
Institute of Technology at Madras, India

More Related Content

What's hot

Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning Matteo Manca
 
A basic course on Research data management, part 4: caring for your data, or ...
A basic course on Research data management, part 4: caring for your data, or ...A basic course on Research data management, part 4: caring for your data, or ...
A basic course on Research data management, part 4: caring for your data, or ...Leon Osinski
 
A classification of methods for frequent pattern mining
A classification of methods for frequent pattern miningA classification of methods for frequent pattern mining
A classification of methods for frequent pattern miningIOSR Journals
 
DataVsStatistics
DataVsStatisticsDataVsStatistics
DataVsStatisticsjpheintz
 
A basic course on Research data management, part 1: what and why
A basic course on Research data management, part 1: what and whyA basic course on Research data management, part 1: what and why
A basic course on Research data management, part 1: what and whyLeon Osinski
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsDataminingTools Inc
 
Data mining nouman javed
Data mining   nouman javedData mining   nouman javed
Data mining nouman javednouman javed
 
Research trends in data warehousing and data mining
Research trends in data warehousing and data miningResearch trends in data warehousing and data mining
Research trends in data warehousing and data miningEr. Nawaraj Bhandari
 
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...ijsrd.com
 
Data pre processing
Data pre processingData pre processing
Data pre processingpommurajopt
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingsuganmca14
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data ManagementDaniel JACOB
 
EDI Training Module 11: Publishing Data in the EDI Repository
EDI Training Module 11:  Publishing Data in the EDI RepositoryEDI Training Module 11:  Publishing Data in the EDI Repository
EDI Training Module 11: Publishing Data in the EDI RepositoryEnvironmental Data Initiative
 
Data Mining: Classification and analysis
Data Mining: Classification and analysisData Mining: Classification and analysis
Data Mining: Classification and analysisDataminingTools Inc
 
A Study of Various Projected Data Based Pattern Mining Algorithms
A Study of Various Projected Data Based Pattern Mining AlgorithmsA Study of Various Projected Data Based Pattern Mining Algorithms
A Study of Various Projected Data Based Pattern Mining Algorithmsijsrd.com
 
Data Warehouse By Piyush
Data Warehouse By PiyushData Warehouse By Piyush
Data Warehouse By Piyushastronish
 

What's hot (20)

Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning
 
A basic course on Research data management, part 4: caring for your data, or ...
A basic course on Research data management, part 4: caring for your data, or ...A basic course on Research data management, part 4: caring for your data, or ...
A basic course on Research data management, part 4: caring for your data, or ...
 
23.database
23.database23.database
23.database
 
A classification of methods for frequent pattern mining
A classification of methods for frequent pattern miningA classification of methods for frequent pattern mining
A classification of methods for frequent pattern mining
 
DataVsStatistics
DataVsStatisticsDataVsStatistics
DataVsStatistics
 
A basic course on Research data management, part 1: what and why
A basic course on Research data management, part 1: what and whyA basic course on Research data management, part 1: what and why
A basic course on Research data management, part 1: what and why
 
Data Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlationsData Mining: Mining ,associations, and correlations
Data Mining: Mining ,associations, and correlations
 
Data mining nouman javed
Data mining   nouman javedData mining   nouman javed
Data mining nouman javed
 
Research trends in data warehousing and data mining
Research trends in data warehousing and data miningResearch trends in data warehousing and data mining
Research trends in data warehousing and data mining
 
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
Simulation and Performance Analysis of Long Term Evolution (LTE) Cellular Net...
 
1 db terms
1 db terms1 db terms
1 db terms
 
Data pre processing
Data pre processingData pre processing
Data pre processing
 
Database
DatabaseDatabase
Database
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Data structures
Data structuresData structures
Data structures
 
Research Data Management
Research Data ManagementResearch Data Management
Research Data Management
 
EDI Training Module 11: Publishing Data in the EDI Repository
EDI Training Module 11:  Publishing Data in the EDI RepositoryEDI Training Module 11:  Publishing Data in the EDI Repository
EDI Training Module 11: Publishing Data in the EDI Repository
 
Data Mining: Classification and analysis
Data Mining: Classification and analysisData Mining: Classification and analysis
Data Mining: Classification and analysis
 
A Study of Various Projected Data Based Pattern Mining Algorithms
A Study of Various Projected Data Based Pattern Mining AlgorithmsA Study of Various Projected Data Based Pattern Mining Algorithms
A Study of Various Projected Data Based Pattern Mining Algorithms
 
Data Warehouse By Piyush
Data Warehouse By PiyushData Warehouse By Piyush
Data Warehouse By Piyush
 

Similar to Data Mining

Data Mining @ BSU Malolos 2019
Data Mining @ BSU Malolos 2019Data Mining @ BSU Malolos 2019
Data Mining @ BSU Malolos 2019Edwin S. Garcia
 
MS Sql Server: Introduction To Database Concepts
MS Sql Server: Introduction To Database ConceptsMS Sql Server: Introduction To Database Concepts
MS Sql Server: Introduction To Database ConceptsDataminingTools Inc
 
MS SQL SERVER: Introduction To Database Concepts
MS SQL SERVER: Introduction To Database ConceptsMS SQL SERVER: Introduction To Database Concepts
MS SQL SERVER: Introduction To Database Conceptssqlserver content
 
Data mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationData mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationDr. Abdul Ahad Abro
 
SIM PASCA CHAPTER 4.pdf
SIM PASCA CHAPTER 4.pdfSIM PASCA CHAPTER 4.pdf
SIM PASCA CHAPTER 4.pdfAdiSuputrq
 
Business Intelligence
Business IntelligenceBusiness Intelligence
Business IntelligenceSukirti Garg
 
Data Mining and Data Warehousing
Data Mining and Data WarehousingData Mining and Data Warehousing
Data Mining and Data WarehousingAswathy S Nair
 
DMDW Lesson 04 - Data Mining Theory
DMDW Lesson 04 - Data Mining TheoryDMDW Lesson 04 - Data Mining Theory
DMDW Lesson 04 - Data Mining TheoryJohannes Hoppe
 
Data warehousing interview questions
Data warehousing interview questionsData warehousing interview questions
Data warehousing interview questionsSatyam Jaiswal
 
Master Minds on Data Science - Arno Siebes
Master Minds on Data Science - Arno SiebesMaster Minds on Data Science - Arno Siebes
Master Minds on Data Science - Arno SiebesMedia Perspectives
 
Introduction to Data Science With R Notes
Introduction to Data Science With R NotesIntroduction to Data Science With R Notes
Introduction to Data Science With R NotesLakshmiSarvani6
 
DMDW Lesson 03 - Data Warehouse Theory
DMDW Lesson 03 - Data Warehouse TheoryDMDW Lesson 03 - Data Warehouse Theory
DMDW Lesson 03 - Data Warehouse TheoryJohannes Hoppe
 
DMML1_overview.ppt
DMML1_overview.pptDMML1_overview.ppt
DMML1_overview.pptbutest
 
Data Mining Concepts and Techniques
Data Mining Concepts and TechniquesData Mining Concepts and Techniques
Data Mining Concepts and TechniquesPratik Tambekar
 
A Survey on Approaches for Frequent Item Set Mining on Apache Hadoop
A Survey on Approaches for Frequent Item Set Mining on Apache HadoopA Survey on Approaches for Frequent Item Set Mining on Apache Hadoop
A Survey on Approaches for Frequent Item Set Mining on Apache HadoopIJTET Journal
 

Similar to Data Mining (20)

Data Mining @ BSU Malolos 2019
Data Mining @ BSU Malolos 2019Data Mining @ BSU Malolos 2019
Data Mining @ BSU Malolos 2019
 
MS Sql Server: Introduction To Database Concepts
MS Sql Server: Introduction To Database ConceptsMS Sql Server: Introduction To Database Concepts
MS Sql Server: Introduction To Database Concepts
 
MS SQL SERVER: Introduction To Database Concepts
MS SQL SERVER: Introduction To Database ConceptsMS SQL SERVER: Introduction To Database Concepts
MS SQL SERVER: Introduction To Database Concepts
 
Data mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationData mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, Classification
 
Lec 1 introduction
Lec 1 introductionLec 1 introduction
Lec 1 introduction
 
SIM PASCA CHAPTER 4.pdf
SIM PASCA CHAPTER 4.pdfSIM PASCA CHAPTER 4.pdf
SIM PASCA CHAPTER 4.pdf
 
Data mining
Data miningData mining
Data mining
 
Business Intelligence
Business IntelligenceBusiness Intelligence
Business Intelligence
 
Data Mining and Data Warehousing
Data Mining and Data WarehousingData Mining and Data Warehousing
Data Mining and Data Warehousing
 
DMDW Lesson 04 - Data Mining Theory
DMDW Lesson 04 - Data Mining TheoryDMDW Lesson 04 - Data Mining Theory
DMDW Lesson 04 - Data Mining Theory
 
Data warehousing interview questions
Data warehousing interview questionsData warehousing interview questions
Data warehousing interview questions
 
Master Minds on Data Science - Arno Siebes
Master Minds on Data Science - Arno SiebesMaster Minds on Data Science - Arno Siebes
Master Minds on Data Science - Arno Siebes
 
Introduction to Data Science With R Notes
Introduction to Data Science With R NotesIntroduction to Data Science With R Notes
Introduction to Data Science With R Notes
 
DMDW Lesson 03 - Data Warehouse Theory
DMDW Lesson 03 - Data Warehouse TheoryDMDW Lesson 03 - Data Warehouse Theory
DMDW Lesson 03 - Data Warehouse Theory
 
Database system Handbook 4th muhammad sharif.pdf
Database system Handbook 4th muhammad sharif.pdfDatabase system Handbook 4th muhammad sharif.pdf
Database system Handbook 4th muhammad sharif.pdf
 
Database system Handbook 4th muhammad sharif.pdf
Database system Handbook 4th muhammad sharif.pdfDatabase system Handbook 4th muhammad sharif.pdf
Database system Handbook 4th muhammad sharif.pdf
 
DMML1_overview.ppt
DMML1_overview.pptDMML1_overview.ppt
DMML1_overview.ppt
 
Data Mining Concepts and Techniques
Data Mining Concepts and TechniquesData Mining Concepts and Techniques
Data Mining Concepts and Techniques
 
A Survey on Approaches for Frequent Item Set Mining on Apache Hadoop
A Survey on Approaches for Frequent Item Set Mining on Apache HadoopA Survey on Approaches for Frequent Item Set Mining on Apache Hadoop
A Survey on Approaches for Frequent Item Set Mining on Apache Hadoop
 
Data warehousing and Data mining
Data warehousing and Data mining Data warehousing and Data mining
Data warehousing and Data mining
 

More from Ashis Chanda

Understanding medical concepts and codes through NLP methods
Understanding medical concepts and codes through NLP methodsUnderstanding medical concepts and codes through NLP methods
Understanding medical concepts and codes through NLP methodsAshis Chanda
 
Information extraction from EHR
Information extraction from EHRInformation extraction from EHR
Information extraction from EHRAshis Chanda
 
Understanding Natural Language Queries over Relational Databases
Understanding Natural Language Queries over Relational DatabasesUnderstanding Natural Language Queries over Relational Databases
Understanding Natural Language Queries over Relational DatabasesAshis Chanda
 
Multi-class Image Classification using Deep Convolutional Networks on extreme...
Multi-class Image Classification using Deep Convolutional Networks on extreme...Multi-class Image Classification using Deep Convolutional Networks on extreme...
Multi-class Image Classification using Deep Convolutional Networks on extreme...Ashis Chanda
 
Full resolution image compression with recurrent neural networks
Full resolution image compression with recurrent neural networksFull resolution image compression with recurrent neural networks
Full resolution image compression with recurrent neural networksAshis Chanda
 
Iterative deepening search
Iterative deepening searchIterative deepening search
Iterative deepening searchAshis Chanda
 
Periodic pattern mining
Periodic pattern miningPeriodic pattern mining
Periodic pattern miningAshis Chanda
 
An efficient approach to mine flexible periodic patterns in time series datab...
An efficient approach to mine flexible periodic patterns in time series datab...An efficient approach to mine flexible periodic patterns in time series datab...
An efficient approach to mine flexible periodic patterns in time series datab...Ashis Chanda
 
Frequent Pattern Growth Algorithm (FP growth method)
Frequent Pattern Growth Algorithm (FP growth method)Frequent Pattern Growth Algorithm (FP growth method)
Frequent Pattern Growth Algorithm (FP growth method)Ashis Chanda
 

More from Ashis Chanda (11)

Understanding medical concepts and codes through NLP methods
Understanding medical concepts and codes through NLP methodsUnderstanding medical concepts and codes through NLP methods
Understanding medical concepts and codes through NLP methods
 
Word2vector
Word2vectorWord2vector
Word2vector
 
Information extraction from EHR
Information extraction from EHRInformation extraction from EHR
Information extraction from EHR
 
Understanding Natural Language Queries over Relational Databases
Understanding Natural Language Queries over Relational DatabasesUnderstanding Natural Language Queries over Relational Databases
Understanding Natural Language Queries over Relational Databases
 
Multi-class Image Classification using Deep Convolutional Networks on extreme...
Multi-class Image Classification using Deep Convolutional Networks on extreme...Multi-class Image Classification using Deep Convolutional Networks on extreme...
Multi-class Image Classification using Deep Convolutional Networks on extreme...
 
Full resolution image compression with recurrent neural networks
Full resolution image compression with recurrent neural networksFull resolution image compression with recurrent neural networks
Full resolution image compression with recurrent neural networks
 
Iterative deepening search
Iterative deepening searchIterative deepening search
Iterative deepening search
 
Periodic pattern mining
Periodic pattern miningPeriodic pattern mining
Periodic pattern mining
 
An efficient approach to mine flexible periodic patterns in time series datab...
An efficient approach to mine flexible periodic patterns in time series datab...An efficient approach to mine flexible periodic patterns in time series datab...
An efficient approach to mine flexible periodic patterns in time series datab...
 
Frequent Pattern Growth Algorithm (FP growth method)
Frequent Pattern Growth Algorithm (FP growth method)Frequent Pattern Growth Algorithm (FP growth method)
Frequent Pattern Growth Algorithm (FP growth method)
 
Apriori algorithm
Apriori algorithmApriori algorithm
Apriori algorithm
 

Recently uploaded

Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesShubhangi Sonawane
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDThiyagu K
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIShubhangi Sonawane
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 

Recently uploaded (20)

Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural ResourcesEnergy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
Energy Resources. ( B. Pharmacy, 1st Year, Sem-II) Natural Resources
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-IIFood Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
Food Chain and Food Web (Ecosystem) EVS, B. Pharmacy 1st Year, Sem-II
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 

Data Mining

  • 1. 1 I NAME OF PRESENTER Data Mining Ashis Kumar Chanda Department of Computer Science and Engineering University of Dhaka
  • 2. 2 I NAME OF PRESENTERCSE, DU2 Key concepts  What is Data mining  Why learn Data mining  Data type  Warehouse & OLAP  Data Cleaning, Integration  Associations, Item sets, Support, Confidence
  • 3. 3 I NAME OF PRESENTERCSE, DU3 Data Mining  Data mining refers to Knowledge mining from large amount of data  Also known as “Knowledge Discovery from Data” or KDD  Target is to find a Hidden Pattern
  • 4. 4 I NAME OF PRESENTER  We can’t get all type of information through Query  Query not support Statistical analysis  Again, we can apply artificial intelligence & find new patterns or structures CSE, DU4 Why learn data mining Query provide values but data mining provides idea that help to take (business ) decision Ex: Women live at “Dhanmondi” & older than 40 years most frequently buy “Jamdani Shari” at “Arong”
  • 5. 5 I NAME OF PRESENTERCSE, DU5 Data type  Tabular (Transaction data) Most commonly used  Spatial Data (Remote sensing data/ encoded data)  Tree Data ( xml )  Graphs (www, bio-molecular)  Sequence (DNA, activity log)  Text, multimedia data
  • 6. 6 I NAME OF PRESENTERCSE, DU6 Warehouse & OLAP Ware House Data Source Warehouse is an archive of information gathered from multiple sources Suppose a Banking database where each has a data source that stores all transactions of that area. And all data source will provide a clean/safe copy at Warehouse
  • 7. 7 I NAME OF PRESENTERCSE, DU7 Warehouse & OLAP There is several issues about Warehouse:  When and how to gather data  What schema/pattern to use  Data transformation & cleaning  How to update “Warehouse is a collection of data marts” Where data mart is store of data in specialized pattern
  • 8. 8 I NAME OF PRESENTERCSE, DU8 Warehouse & OLAP OLAP: Online Analytical Processing OLAP tools support interactive analysis of summary Information OLAP permits an analyst to view different summaries of multidimensional data Item name Dress Fig: Data Cube
  • 9. 9 I NAME OF PRESENTERCSE, DU9 Data cleaning There may be some missing data, duplicate data, dirty data So we need to data cleaning Some methods:  Ignore the tuple (not effective unless tuple contain many missing attribute)  Fill missing values (time consuming)  Fill with a global value (like: unknown)  Use mean attribute  Use most probable value
  • 10. 10 I NAME OF PRESENTERCSE, DU10
  • 11. 11 I NAME OF PRESENTERCSE, DU11 Associations & Item sets Associations: An associations is a rule of the form if X then Y It is denoted as X-> Y Example: if there is an exam then I read Item Sets: For any rule if X->Y & Y->X Then X, Y are called item-set Example: People buying school books in January also by notebook People buying school note books in January also by book
  • 12. 12 I NAME OF PRESENTERCSE, DU12 Support & confidence Support: The proportion of transactions in the data set which contains the itemset Confidence: The conditional probability that an item appears in a transaction when another item appears.
  • 13. 13 I NAME OF PRESENTERCSE, DU13 Support & confidence Support for {I₁,I₂} = support_count(I1 U I2)/ |D| = 4/9 Confidence for I1 → I2 =support_count(I1 U I2) / support_count(I1) = 4/6
  • 14. 14 I NAME OF PRESENTERCSE, DU14 Association rules Where, support count(AUB) is the number of transactions containing the itemsets AUB, and support count(A) is the number of transactions containing the itemset A. •Association rules can be generated as follows: 1. For each frequent itemset l, generate all nonempty subsets of l. 2. For every nonempty subset s of l, output the rule “s → (l- s)” if support count(l)/support count(s) >= min_conf, where min_conf is the minimum confidence threshold.
  • 15. 15 I NAME OF PRESENTERCSE, DU15 Summary Basic topics: Data mining, Data cleaning, Warehouse, OLAP Term: Association, Item-set, Support, Confidence
  • 16. 16 I NAME OF PRESENTERCSE, DU16 References - Data Mining Concepts & Techniques by J. Han & M. Kamber - Database system Concept by Abraham Sillberschatz, Korth, Sudarshan - Lecture of Dr. S. Srinath Institute of Technology at Madras, India