Data Mining
Prepared by
R. Abhinav Bharadwaj
Overview
• Introduction
• Explanation of Data Mining Techniques
• Advantages
• Applications
• Privacy
Data Mining
• What is Data Mining?
  • “The process of semi automatically analyzing large databases to find useful patterns” (Silberschatz)
  • KDD – “Knowledge Discovery in Databases” (3)
  • “Attempts to discover rules and patterns from data”
  • Discover rules → make predictions
• Areas of Use
  • Internet – Discover needs of customers
  • Economics – Predict stock prices
  • Science – Predict environmental change
  • Medicine – Match patients with similar problems → cure
Example of Data Mining
• A credit card company wants to discover information about its clients from its databases. It wants to find:
  • Clients who respond to promotions in “junk mail”
  • Clients who are likely to switch to a competitor
  • Clients who are likely not to pay
  • Services that clients use, so that services affiliated with the credit card company can be promoted
  • Anything else that may help the company provide or promote services to help its clients and ultimately make more money
Data Mining & Data Warehousing
• Data Warehouse: “is a repository (or archive) of information gathered from multiple sources, stored under a unified schema, at a single site.” (Silberschatz)
  • Collect data → store in a single repository
  • Allows for easier query development, as a single repository can be queried
• Data Mining:
  • Analyzing databases or data warehouses to discover patterns in the data and gain knowledge
  • Knowledge is power
Discovery of Knowledge
Data Mining Techniques
• Classification
• Clustering
• Regression
• Association Rules
Classification
• Classification: Given a set of items that each belong to one of several classes, and given past instances (training instances) with their associated class, classification is the process of predicting the class of a new item.
  • The goal is to identify the class to which the new item belongs.
• Example: A bank wants to classify its home loan customers into groups according to their response to bank advertisements. The bank might use the classes “Responds Rarely”, “Responds Sometimes” and “Responds Frequently”.
  • The bank will then attempt to find rules about the customers who respond frequently and sometimes.
  • The rules could be used to predict the needs of potential customers.
Technique for Classification
• Decision-Tree Classifiers
[Figure: decision tree. The root node splits on Job (Carpenter, Engineer, Doctor); each branch then splits on Income, with thresholds ranging from <30K to >100K, and ends in a Bad or Good leaf.]
Predicting credit risk of a person with the jobs specified.
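The tree on this slide can be read as nested rules: first branch on the job, then compare the income against a cut-off for that job. The sketch below illustrates that idea only. The function name `credit_risk` and the per-job cut-offs are assumptions chosen for illustration, since the slide's figure does not show clearly which threshold belongs to which branch.

```python
# Hypothetical cut-offs: the minimum income at which each job is rated
# a "good" credit risk. These values are assumptions, not the slide's.
GOOD_RISK_MIN_INCOME = {
    "carpenter": 30_000,
    "engineer": 50_000,
    "doctor": 100_000,
}


def credit_risk(job: str, income: float) -> str:
    """Walk a two-level decision tree: split on job, then on income."""
    threshold = GOOD_RISK_MIN_INCOME.get(job.lower())
    if threshold is None:
        return "unknown"  # the tree has no branch for this job
    return "good" if income >= threshold else "bad"


if __name__ == "__main__":
    for job, income in [("Engineer", 62_000), ("Doctor", 62_000)]:
        print(job, income, credit_risk(job, income))
```

In practice the splits and cut-offs would be learned from training instances rather than written by hand.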
Clustering
• “Clustering algorithms find groups of items that are similar. … It divides a data set so that records with similar content are in the same group, and groups are as different as possible from each other.” (2)
• Example: An insurance company could use clustering to group clients by their age, location and types of insurance purchased.
• The categories are unspecified, and this is referred to as ‘unsupervised learning’.
Clustering
• Group Data into Clusters
  • Similar data is grouped in the same cluster
  • Dissimilar data is grouped in different clusters
• How is this achieved?
  • K-Nearest Neighbor
    • A classification method that classifies a point by calculating the distances between the point and points in the training data set. Then it assigns the point to the class that is most common among its k nearest neighbors (where k is an integer). (2) Note that this requires labeled training data, unlike clustering proper.
  • Hierarchical
    • Group data into trees
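The k-nearest-neighbor description above can be sketched in a few lines of Python. This is a minimal illustration, assuming Euclidean distance and a simple majority vote; the name `knn_classify` and the toy points are made up for the example.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_classify(point, training, k=3):
    """Return the most common label among the k training points nearest to `point`.

    `training` is a list of (features, label) pairs.
    """
    nearest = sorted(training, key=lambda item: dist(point, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled training data: two well-separated groups.
training = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B"),
]

if __name__ == "__main__":
    print(knn_classify((2.0, 2.0), training))
    print(knn_classify((6.0, 5.0), training))
```

An odd k is a common choice with two classes because it avoids tied votes.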
Regression
• “Regression deals with the prediction of a value, rather than a class.” (1, p747)
  • Example: Find out whether there is a relationship between smoking and cancer-related illness in patients.
• Given values: X1, X2, …, Xn
• Objective: predict variable Y
• One way is to estimate coefficients a0, a1, a2, …, an such that
  • Y = a0 + a1X1 + a2X2 + … + anXn
• Linear Regression
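For the simplest case of a single predictor (n = 1, so Y = a0 + a1X1), the coefficients can be estimated with ordinary least squares. The sketch below assumes that fitting method, which the slide does not name; the function name `fit_line` and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Estimate a0 and a1 in Y = a0 + a1*X by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1


# Hypothetical data lying exactly on the line Y = 1 + 2X.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

if __name__ == "__main__":
    a0, a1 = fit_line(xs, ys)
    print(f"Y = {a0:.2f} + {a1:.2f} * X")
```

With several predictors the same idea applies, but the coefficients are usually found by solving a linear system rather than with these closed-form sums.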
Regression
• Example graph:
  • Line of Best Fit
  • Curve Fitting
Association Rules
• “An association algorithm creates rules that describe how often events have occurred together.” (2)
• Example: When a customer buys a hammer, then 90% of the time they will buy nails.
Association Rules
• Support: “is a measure of what fraction of the population satisfies both the antecedent and the consequent of the rule” (1, p748)
• Example:
  • Hotdog buns and hotdog sausages are bought together in a large fraction of all purchases = high support
  • Hotdog buns and hangers are bought together in only 0.005% of all purchases = low support
• Rules with high support are worth careful attention
  • E.g. hotdog sausages should be placed near hotdog buns in supermarkets if there is also high confidence.
Association Rules
• Confidence: “is a measure of how often the consequent is true when the antecedent is true.” (1, p748)
• Example:
  • 90% of hotdog bun purchases are accompanied by hotdog sausages.
• High confidence is meaningful, as we can derive rules:
  • Hotdog bun → Hotdog sausage
• Two rules may have different confidence levels and have the same support.
  • E.g. Hotdog sausage → Hotdog bun may have a much lower confidence than Hotdog bun → Hotdog sausage, yet both can have the same support.
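Support and confidence can be computed directly from their definitions. The sketch below uses a made-up list of shopping baskets (the `transactions` data and the function names are assumptions for illustration) to show how a rule and its reverse share the same support but can differ in confidence.

```python
def support(antecedent, consequent, transactions):
    """Fraction of all transactions that contain both item sets."""
    both = antecedent | consequent
    return sum(both <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that also contain the consequent."""
    with_antecedent = [t for t in transactions if antecedent <= t]
    both = antecedent | consequent
    return sum(both <= t for t in with_antecedent) / len(with_antecedent)


# Hypothetical shopping baskets, one set of items per purchase.
transactions = [
    {"buns", "sausages"},
    {"buns", "sausages", "mustard"},
    {"sausages"},
    {"sausages", "mustard"},
    {"buns", "sausages"},
    {"hangers"},
    {"sausages"},
    {"mustard"},
]

if __name__ == "__main__":
    buns, sausages = {"buns"}, {"sausages"}
    print("support:", support(buns, sausages, transactions))
    print("buns -> sausages:", confidence(buns, sausages, transactions))
    print("sausages -> buns:", confidence(sausages, buns, transactions))
```

In this toy data every basket with buns also has sausages, but only half of the baskets with sausages have buns, so the two rules have confidence 1.0 and 0.5 at the same support of 0.375.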
Advantages of Data Mining
• Provides new knowledge from existing data
  • Public databases
  • Government sources
  • Company databases
• Old data can be used to develop new knowledge
• New knowledge can be used to improve services or products
• Improvements lead to:
  • Bigger profits
  • More efficient service
Uses of Data Mining
• Sales/Marketing
  • Diversify target market
  • Identify clients' needs to increase response rates
• Risk Assessment
  • Identify customers that pose high credit risk
• Fraud Detection
  • Identify people misusing the system, e.g. people who have two Social Security Numbers
• Customer Care
  • Identify customers likely to change providers
  • Identify customer needs
Applications of Data Mining
[Figure: applications of data mining (4). Source: IDC, 1998]
Privacy Concerns
• Effective data mining requires large sources of data
• To achieve a wide spectrum of data, multiple data sources are linked
• Linking sources can be problematic for privacy. For example, if the following histories of a customer were linked:
  • Shopping history
  • Credit history
  • Bank history
  • Employment history
• The user's life story could be painted from the collected data
References
1. Silberschatz, Korth, Sudarshan, “Database System Concepts”, 5th Edition, McGraw-Hill, 2005
2. http://www.twocrows.com/glossary.htm, “Two Crows, Data Mining Glossary”
3. http://en.wikipedia.org/wiki/Data_mining, “Wikipedia”
4. http://phoenix.phys.clemson.edu/tutorials/excel/regression.html
5. http://wwwmaths.anu.edu.au/~steve/pdcn.pdf
