Data Mining
www.StudsPlanet.com
Agenda
 What is Data Mining?
 Data Mining Tasks
 Challenges in Data mining
What is Data Mining?
 Data mining is an integral part of knowledge
discovery in databases (KDD), the overall process
of converting raw data into useful information.
This process consists of a series of transformation
steps, from data preprocessing to postprocessing
of data mining results.
Process of Knowledge Discovery in Databases (KDD)
[Figure: Input Data -> Data Preprocessing -> Data Mining -> Postprocessing -> Information.
Preprocessing: normalization, data subsetting.
Postprocessing: filtering patterns, visualization, pattern interpretation.]
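The KDD stages named on this slide can be sketched as a small pipeline. This is only an illustration under stated assumptions: the function names (`preprocess`, `mine`, `postprocess`), the toy readings, and the threshold are hypothetical and do not come from the slides, and the "mining" step is a deliberately trivial stand-in for a real algorithm.

```python
def min_max_normalize(values):
    """Normalization: scale numbers to [0, 1]; constant input maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(records, keep):
    """Data subsetting (keep rows passing `keep`), then normalization."""
    subset = [r for r in records if keep(r)]
    return min_max_normalize(subset)


def mine(values, threshold):
    """Toy mining step (assumption): report values above a threshold."""
    return [(i, v) for i, v in enumerate(values) if v > threshold]


def postprocess(patterns, top_n):
    """Filtering patterns: keep the top_n strongest, strongest first."""
    return sorted(patterns, key=lambda p: p[1], reverse=True)[:top_n]


raw = [12.0, -1.0, 30.0, 18.0, 24.0]  # -1.0 stands in for a bad reading
clean = preprocess(raw, keep=lambda v: v >= 0)
patterns = mine(clean, threshold=0.5)
information = postprocess(patterns, top_n=2)
print(information)
```

Each stage takes the previous stage's output, which mirrors the left-to-right flow of the figure from input data to information.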
Data Mining Tasks
 Data mining tasks are generally divided into two
categories:
1. Predictive tasks
2. Descriptive tasks
Predictive Tasks
 Objective: Predict the value of a specific
attribute (the target, or dependent variable) based
on the values of other attributes (the explanatory,
or independent variables).
Example: Judge whether a patient has a specific
disease based on his/her medical test results.
Descriptive Tasks
 Objective: Derive patterns (correlations, trends,
trajectories) that summarize the underlying
relationships in the data.
Example: Identifying web pages that are accessed
together (a human-interpretable pattern).
Data Mining Tasks [contd.]
 Classification [Predictive]
 Clustering [Descriptive]
 Association Rule Discovery [Descriptive]
 Sequential Pattern Discovery [Descriptive]
 Regression [Predictive]
 Deviation Detection [Predictive]
Classification: Definition
 Given a collection of records:
 Each record contains a set of attributes; one of the
attributes is the class.
 Find a model for the class attribute as a function of
the values of the other attributes.
 Goal: previously unseen records should be
assigned a class as accurately as possible.
 A test set is used to determine the accuracy of the model.
Usually, the given data set is divided into training and
test sets, with the training set used to build the model and
the test set used to validate it.
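The train/test procedure described above can be sketched as follows. The slides do not name a classifier, so this example assumes a 1-nearest-neighbor rule purely for illustration; the records, the two numeric attributes, and the fixed split are made up.

```python
def nearest_neighbor_predict(train, x):
    """Return the class of the training record closest to x (1-NN)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    _, label = min(train, key=lambda rec: sq_dist(rec[0], x))
    return label


def accuracy(train, test):
    """Fraction of test records whose predicted class matches the true class."""
    correct = sum(1 for x, y in test if nearest_neighbor_predict(train, x) == y)
    return correct / len(test)


# Each record: (attributes, class). Values are hypothetical.
records = [
    ((1.0, 1.2), "no"), ((0.8, 0.9), "no"), ((1.1, 0.7), "no"), ((0.9, 1.1), "no"),
    ((3.0, 3.2), "yes"), ((3.1, 2.8), "yes"), ((2.9, 3.0), "yes"), ((3.2, 3.1), "yes"),
]

# Hold out one record per class for testing; build the model on the rest.
train = records[0:3] + records[4:7]
test = [records[3], records[7]]

print(accuracy(train, test))
```

The test records are never used to build the model, so the reported accuracy estimates how well previously unseen records are classified.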
Classification: Example
 Direct Marketing
 Goal: Reduce the cost of mailing by targeting a set of
consumers likely to buy a new product.
 Approach:
 Use the data for a similar product introduced before.
 We know which customers decided to buy and which decided
otherwise. This {buy, don’t buy} decision forms the class
attribute.
 Collect various demographic, lifestyle, and company-interaction
related information about all such customers.
 Type of business, where they stay, how much they earn, etc.
 Use this information as input attributes to learn a classifier
model. (from Berry & Linoff, 1997)
Clustering: Definition
 Given a set of data points, each having a set
of attributes, and a similarity measure among
them, find clusters such that
 Data points in one cluster are more similar to one
another.
 Data points in separate clusters are less similar to
one another.
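One way to realize this definition is k-means, sketched below for one-dimensional points. The slides do not name a clustering algorithm, so k-means is an assumption here, as are the sample points, the starting centroids, and the use of absolute distance as the (inverse) similarity measure.

```python
def k_means_1d(points, centroids, iterations=10):
    """Minimal 1-D k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


points = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
centroids, clusters = k_means_1d(points, centroids=[1.0, 10.0])
print(centroids)
print(clusters)
```

Points that end up in the same cluster are closer to each other than to points in the other cluster, which is the property the definition asks for.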
Clustering: Example
 Document Clustering:
 Goal: To find groups of documents that are similar to
each other based on the important terms appearing in
them.
 Approach: To identify frequently occurring terms in
each document. Form a similarity measure based on the
frequencies of different terms. Use it to cluster.
 Gain: Information Retrieval can utilize the clusters to
relate a new document or search term to clustered
documents.
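The approach above (count terms, then compare documents by their term frequencies) can be sketched with cosine similarity. Cosine similarity is one common choice and is an assumption here, since the slides only say "a similarity measure based on the frequencies of different terms"; the three sample documents are made up.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count how often each lowercase, whitespace-separated term occurs."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = {
    "d1": "stocks fell as markets closed",
    "d2": "markets rallied and stocks rose",
    "d3": "the team won the final match",
}
tf = {name: term_frequencies(text) for name, text in docs.items()}

sim_12 = cosine_similarity(tf["d1"], tf["d2"])  # two shared terms
sim_13 = cosine_similarity(tf["d1"], tf["d3"])  # no shared terms
print(sim_12, sim_13)
```

A clustering algorithm would group d1 with d2 rather than d3 because their similarity is higher. In practice, very common words are usually filtered out before counting.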
Illustrating Document Clustering

Category        Total Articles   Correctly Placed
Financial       555              364
Foreign         341              260
National        273              36
Metro           943              746
Sports          738              573
Entertainment   354              278

Clustering Points: 3204 articles of the Los Angeles Times.
Similarity Measure: How many words are common in these
documents (after some word filtering). (Introduction to Data Mining, 2007)
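The table's counts can be turned into per-category and overall placement rates. The numbers below are copied from the table; the variable names are illustrative only.

```python
# (total articles, correctly placed) per category, as given in the table.
results = {
    "Financial": (555, 364),
    "Foreign": (341, 260),
    "National": (273, 36),
    "Metro": (943, 746),
    "Sports": (738, 573),
    "Entertainment": (354, 278),
}

per_category = {name: correct / total for name, (total, correct) in results.items()}
total_articles = sum(total for total, _ in results.values())
total_correct = sum(correct for _, correct in results.values())
overall = total_correct / total_articles

for name, rate in per_category.items():
    print(f"{name:<14}{rate:.1%}")
print(f"{'Overall':<14}{overall:.1%}")
```

The category totals add up to the 3204 articles stated on the slide, and 2257 of them (about 70%) are correctly placed. National is the clear outlier at roughly 13%, while the other categories fall between about 66% and 79%.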
Association Rule Discovery: Definition
Given a set of records, each of which contains some number of items
from a given collection, produce dependency rules that predict the
occurrence of an item based on the occurrences of other items.
Apriori principle: If an itemset is frequent, then all of its subsets
are also frequent.

TID   Items
1     Bread, Coke, Milk
2     Beer, Bread
3     Beer, Coke, Diaper, Milk
4     Beer, Bread, Diaper, Milk
5     Coke, Diaper, Milk

Rules Discovered:
{Milk} -> {Coke}
{Diaper, Milk} -> {Beer}
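The two rules can be checked against the five transactions by computing support and confidence, the standard measures for association rules. The slide does not state these measures, so the helper names and the choice to report them are assumptions; the transactions are the ones in the table.

```python
transactions = [
    {"Bread", "Coke", "Milk"},            # TID 1
    {"Beer", "Bread"},                    # TID 2
    {"Beer", "Coke", "Diaper", "Milk"},   # TID 3
    {"Beer", "Bread", "Diaper", "Milk"},  # TID 4
    {"Coke", "Diaper", "Milk"},           # TID 5
]


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def support(itemset):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset) / len(transactions)


def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    return count(antecedent | consequent) / count(antecedent)


print(support({"Milk", "Coke"}), confidence({"Milk"}, {"Coke"}))
print(support({"Diaper", "Milk", "Beer"}), confidence({"Diaper", "Milk"}, {"Beer"}))
```

{Milk} -> {Coke} holds in 3 of the 4 transactions that contain Milk, and {Diaper, Milk} -> {Beer} holds in 2 of the 3 that contain both Diaper and Milk. The Apriori principle shows up in the counts as well: no itemset occurs more often than any of its subsets.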
Other Mining Tasks in a Nutshell
 Sequential Pattern Discovery
In point-of-sale transaction sequences,
 Computer Bookstore:
(Intro_To_Visual_C) (C++_Primer) -->
(Perl_for_dummies, Tcl_Tk)
 Regression: Predict a continuous-valued attribute,
e.g., using neural networks.
 Deviation Detection: Detect deviations from normal
behavior, e.g., credit card fraud.
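As a rough illustration of deviation detection, the sketch below flags card charges that lie far from the average. The z-score rule, the threshold of 2.0, and the charge amounts are all assumptions chosen for the example; the slides do not prescribe a method, and real fraud detection uses far richer models.

```python
import statistics


def find_deviations(amounts, threshold=2.0):
    """Return amounts more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)  # population standard deviation
    if stdev == 0:
        return []
    return [a for a in amounts if abs(a - mean) / stdev > threshold]


# Hypothetical charges on one card; the last one is unusually large.
charges = [20.0, 25.0, 22.0, 19.0, 24.0, 21.0, 23.0, 500.0]
suspicious = find_deviations(charges)
print(suspicious)
```

Only the 500.0 charge is flagged here. A single extreme value also inflates the mean and standard deviation, which is one reason robust statistics such as the median are often preferred in practice.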
Challenges of Data Mining
 Scalability
 Dimensionality
 Complex and Heterogeneous Data
 Data Quality
 Data Ownership and Distribution
 Privacy Preservation
 Streaming Data
References
 Tan, P., Steinbach, M., & Kumar, V. (2006).
Introduction to Data Mining. Addison-Wesley.

