Data Mining
Prepared by
R. Abhinav Bharadwaj
Overview
• Introduction
• Explanation of Data Mining Techniques
• Advantages
• Applications
• Privacy
Data Mining
• What is Data Mining?
  • “The process of semi automatically analyzing large databases to find useful patterns” (Silberschatz)
  • KDD – “Knowledge Discovery in Databases” (3)
  • “Attempts to discover rules and patterns from data”
  • Discover rules → make predictions
• Areas of Use
  • Internet – Discover needs of customers
  • Economics – Predict stock prices
  • Science – Predict environmental change
  • Medicine – Match patients with similar problems → cure
Example of Data Mining
• A credit card company wants to discover information about its clients from its databases. It wants to find:
  • Clients who respond to promotions sent as “junk mail”
  • Clients who are likely to switch to a competitor
  • Clients who are likely not to pay
  • Services that clients use, so that services affiliated with the credit card company can be promoted
  • Anything else that may help the company provide or promote services that help its clients and ultimately make more money
Data Mining & Data Warehousing
• Data Warehouse: “is a repository (or archive) of information gathered from multiple sources, stored under a unified schema, at a single site.” (Silberschatz)
  • Collect data → store in a single repository
  • Allows for easier query development, as only a single repository needs to be queried
• Data Mining:
  • Analyzing databases or data warehouses to discover patterns in the data and so gain knowledge
  • Knowledge is power.
Discovery of Knowledge
Data Mining Techniques
• Classification
• Clustering
• Regression
• Association Rules
Classification
• Classification: given a set of items that each belong to one of several classes, and given past instances (training instances) with their associated classes, classification is the process of predicting the class of a new item.
  • The goal is to identify the class to which the new item belongs.
• Example: a bank wants to classify its home loan customers into groups according to their response to bank advertisements. The bank might use the classes “Responds Rarely”, “Responds Sometimes” and “Responds Frequently”.
  • The bank will then attempt to find rules about the customers that respond frequently and sometimes.
  • The rules could be used to predict the needs of potential customers.
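One simple way to predict the class of a new item from training instances is a majority vote among its nearest neighbours. The sketch below assumes two made-up numeric features per customer (age and adverts seen per month) and made-up training data; the bank example does not specify either.

```python
from collections import Counter
from math import dist

# Hypothetical training instances: (age, adverts seen per month) -> response class.
TRAINING = [
    ((25, 2), "Responds Rarely"),
    ((27, 3), "Responds Rarely"),
    ((40, 6), "Responds Sometimes"),
    ((42, 7), "Responds Sometimes"),
    ((58, 12), "Responds Frequently"),
    ((60, 11), "Responds Frequently"),
]

def classify(item, training=TRAINING, k=3):
    """Predict the class of `item` by majority vote among its k nearest training instances."""
    nearest = sorted(training, key=lambda pair: dist(pair[0], item))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((59, 10)))  # two of the three nearest neighbours respond frequently
```

In practice the features would need scaling so that one attribute does not dominate the distance.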
Technique for Classification
• Decision-Tree Classifiers
[Figure: decision tree. The root node tests Job (Carpenter, Engineer, Doctor); each branch leads to an Income node whose branches end in a Bad or Good leaf. Income thresholds shown: <30K, <40K, <50K, >50K, >90K, >100K.]
Predicting credit risk of a person with the jobs specified.
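A decision tree of this shape can be applied with a pair of lookups: first branch on the job, then compare income against that branch's cut-off. This is only a sketch; the cut-off values and the `credit_risk` helper are illustrative assumptions, not values read off the figure.

```python
# Hypothetical tree: the income cut-offs are illustrative assumptions.
TREE = {
    "attribute": "job",
    "branches": {
        "Carpenter": {"attribute": "income", "threshold": 30_000},
        "Engineer": {"attribute": "income", "threshold": 50_000},
        "Doctor": {"attribute": "income", "threshold": 90_000},
    },
}

def credit_risk(person, tree=TREE):
    """Walk the tree: choose the branch for the person's job, then test income."""
    income_node = tree["branches"].get(person["job"])
    if income_node is None:
        return None  # the tree has no branch for this job
    return "Good" if person["income"] >= income_node["threshold"] else "Bad"

print(credit_risk({"job": "Engineer", "income": 60_000}))
print(credit_risk({"job": "Doctor", "income": 50_000}))
```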
Clustering
• “Clustering algorithms find groups of items that are similar. … It divides a data set so that records with similar content are in the same group, and groups are as different as possible from each other.” (2)
• Example: an insurance company could use clustering to group clients by their age, location and types of insurance purchased.
• The categories are not specified in advance; this is referred to as ‘unsupervised learning’.
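As a rough illustration of grouping records without predefined categories, the sketch below repeatedly merges the two closest groups until a chosen number of groups remains. This bottom-up, single-linkage merge is one possible approach chosen for brevity, and the client data are made up.

```python
from math import dist

def cluster(points, k):
    """Merge the two closest groups repeatedly until only k groups remain (k >= 1)."""
    groups = [[p] for p in points]
    while len(groups) > k:
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                # Single linkage: distance between the closest members of two groups.
                gap = min(dist(a, b) for a in groups[i] for b in groups[j])
                if best is None or gap < best[0]:
                    best = (gap, i, j)
        _, i, j = best
        groups[i].extend(groups.pop(j))  # j > i, so index i stays valid
    return groups

# Hypothetical clients described by (age, number of policies held).
clients = [(23, 1), (25, 1), (24, 2), (61, 4), (63, 5), (60, 4)]
for group in cluster(clients, 2):
    print(sorted(group))
```

No class labels are supplied anywhere; the two groups emerge from the distances alone.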
Clustering
• Group data into clusters
  • Similar data is grouped in the same cluster
  • Dissimilar data is grouped in different clusters
• How is this achieved?
  • K-Nearest Neighbor
    • Strictly a classification method rather than a clustering algorithm: it classifies a point by calculating the distances between the point and the points in the training data set, then assigns the point to the class that is most common among its k nearest neighbors (where k is an integer). (2)
  • Hierarchical
    • Group data into trees of nested clusters
Regression
• “Regression deals with the prediction of a value, rather than a class.” (1, p747)
• Example: find out whether there is a relationship between smoking and cancer-related illness in patients.
• Given values: X1, X2, ..., Xn
• Objective: predict the variable Y
• One way is to estimate coefficients a0, a1, ..., an such that
  Y = a0 + a1X1 + a2X2 + … + anXn
• This is linear regression.
Regression
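For a single predictor X1, the coefficients a0 and a1 of Y = a0 + a1X1 can be estimated by ordinary least squares, which yields the line of best fit. The sketch below uses the closed-form solution; the `fit_line` name and the sample data are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Return (a0, a1) minimising the squared error of y = a0 + a1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx            # slope
    a0 = mean_y - a1 * mean_x  # intercept
    return a0, a1

# Made-up data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a0, a1 = fit_line(xs, ys)
print(a0, a1)
```

With more than one predictor the same idea applies, but the coefficients are usually found with a linear algebra routine rather than by hand.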
 Example graph:
 Line of Best Fit
 Curve Fitting
Association Rules
• “An association algorithm creates rules that describe how often events have occurred together.” (2)
• Example: when a customer buys a hammer, 90% of the time they also buy nails.
Association Rules
• Support: “is a measure of what fraction of the population satisfies both the antecedent and the consequent of the rule” (1, p748)
• Example:
  • Hotdog buns and hotdog sausages are bought together in a large fraction of all purchases = high support
  • Hotdog buns and hangers are bought together in 0.005% of all purchases = low support
• Rules with high support are worth careful attention, since they apply to many customers
  • E.g. hotdog sausages should be placed near hotdog buns in supermarkets if there is also high confidence.
Association Rules
• Confidence: “is a measure of how often the consequent is true when the antecedent is true.” (1, p748)
• Example:
  • 90% of hotdog bun purchases are accompanied by hotdog sausages.
• High confidence is meaningful, as we can derive rules from it.
  • Hotdog bun → Hotdog sausage
• Two rules may have different confidence levels yet the same support.
  • E.g. Hotdog sausage → Hotdog bun may have a much lower confidence than Hotdog bun → Hotdog sausage, yet both can have the same support.
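Support and confidence can be computed directly from a list of purchases. The sketch below follows the two quoted definitions; the baskets are made up, and the function names are chosen here for illustration.

```python
# Hypothetical shopping baskets, one set of items per purchase.
BASKETS = [
    {"bun", "sausage"},
    {"bun", "sausage", "ketchup"},
    {"sausage"},
    {"sausage", "ketchup"},
    {"bun", "sausage"},
]

def support(antecedent, consequent, baskets=BASKETS):
    """Fraction of all baskets that contain both the antecedent and the consequent."""
    both = antecedent | consequent
    return sum(both <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets=BASKETS):
    """Fraction of baskets containing the antecedent that also contain the consequent.

    Assumes the antecedent occurs in at least one basket.
    """
    with_antecedent = [b for b in baskets if antecedent <= b]
    return sum(consequent <= b for b in with_antecedent) / len(with_antecedent)

bun, sausage = {"bun"}, {"sausage"}
print(support(bun, sausage), confidence(bun, sausage))  # bun -> sausage
print(support(sausage, bun), confidence(sausage, bun))  # sausage -> bun
```

Both rules share the same support because support counts baskets containing both items, whichever way the rule points; the confidences differ because the denominators differ.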
Advantages of Data Mining
• Provides new knowledge from existing data
  • Public databases
  • Government sources
  • Company databases
• Old data can be used to develop new knowledge
• New knowledge can be used to improve services or products
• Improvements lead to:
  • Bigger profits
  • More efficient service
Uses of Data Mining
• Sales/Marketing
  • Diversify target market
  • Identify clients' needs to increase response rates
• Risk Assessment
  • Identify customers that pose high credit risk
• Fraud Detection
  • Identify people misusing the system, e.g. people who have two Social Security Numbers
• Customer Care
  • Identify customers likely to change providers
  • Identify customer needs
Applications of Data Mining
[Figure: applications of data mining (4). Source: IDC, 1998]
Privacy Concerns
• Effective data mining requires large sources of data
• To achieve a wide spectrum of data, multiple data sources are linked
• Linking sources can be problematic for privacy. For example, if the following histories of a customer were linked:
  • Shopping history
  • Credit history
  • Bank history
  • Employment history
• The user's life story could be pieced together from the collected data
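To make the linking concern concrete, the sketch below merges records from separate sources into one profile per customer. The shared customer id, the field names and the `link_profiles` helper are all hypothetical.

```python
# Hypothetical records from three separate sources, keyed by a shared customer id.
shopping = {"c42": ["nappies", "cough syrup"]}
credit = {"c42": {"missed_payments": 2}}
employment = {"c42": {"employer": "Acme Ltd"}}

def link_profiles(**sources):
    """Combine per-source records into one profile per customer id."""
    profiles = {}
    for name, records in sources.items():
        for customer_id, value in records.items():
            profiles.setdefault(customer_id, {})[name] = value
    return profiles

profile = link_profiles(shopping=shopping, credit=credit, employment=employment)["c42"]
print(profile)
```

Each source on its own reveals little, but the combined profile describes one person in considerable detail.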
References
1. Silberschatz, Korth, Sudarshan, “Database System Concepts”, 5th Edition, McGraw-Hill, 2005
2. http://www.twocrows.com/glossary.htm, “Two Crows, Data Mining Glossary”
3. http://en.wikipedia.org/wiki/Data_mining, “Wikipedia”
4. http://phoenix.phys.clemson.edu/tutorials/excel/regression.html
5. http://wwwmaths.anu.edu.au/~steve/pdcn.pdf
