Data Mining Steps
> Problem Definition
Market Analysis – Customer Profiling, Identifying Customer Requirements, Cross Market Analysis, Target Marketing, Determining Customer Purchasing Patterns
Corporate Analysis and Risk Management – Finance Planning and Asset Evaluation, Resource Planning, Competition
Fraud Detection
Customer Retention
Production Control
Science Exploration
> Data Preparation
Data preparation is about constructing a dataset from one or
more data sources to be used for exploration and modeling. It is
good practice to start with an initial dataset to become familiar
with the data, to discover first insights, and to understand any
possible data quality issues. The datasets provided in these
projects were obtained from kaggle.com.
Variable selection and description
Numerical – Ratio, Interval
Categorical – Ordinal, Nominal
Simplifying variables: From continuous to discrete
Formatting the data
Basic data integrity checks: missing data, outliers
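The preparation steps listed above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the `ages` values, the bin cut points, and the 1.5 * IQR outlier rule are assumptions chosen for the example, not part of the original slides.

```python
import statistics

# Hypothetical raw ages; None marks a missing value.
ages = [23, 35, None, 41, 29, 120, 38, None, 33, 27]

# Integrity check 1: count missing values and keep the observed ones.
missing = sum(1 for a in ages if a is None)
present = [a for a in ages if a is not None]

# Integrity check 2: flag outliers with the 1.5 * IQR rule (one common convention).
q1, _, q3 = statistics.quantiles(present, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [a for a in present if a < low or a > high]

# Simplify a continuous variable into discrete groups (cut points are arbitrary).
def age_group(age):
    if age < 30:
        return "under 30"
    if age < 40:
        return "30-39"
    return "40+"

groups = [age_group(a) for a in present if a not in outliers]
print("missing:", missing)
print("outliers:", outliers)
print("groups:", groups)
```

Whether to drop, cap, or keep flagged outliers depends on the problem; the sketch only identifies them.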
> Data Exploration
Data Exploration is about describing the data by means of
statistical and visualization techniques.
· Data Visualization:
o Univariate analysis explores variables (attributes) one by one. Variables can be either categorical or numerical.
Univariate Analysis - Categorical
Statistics  Visualization  Description
Count       Bar Chart      The number of values of the specified variable.
Count%      Pie Chart      The percentage of values of the specified variable.
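As a small illustration of the two categorical statistics, the sketch below computes Count and Count% with `collections.Counter`. The `colors` data is hypothetical.

```python
from collections import Counter

# Hypothetical categorical variable.
colors = ["red", "blue", "red", "green", "blue", "red"]

counts = Counter(colors)            # Count: the heights of a bar chart
total = sum(counts.values())
percent = {value: 100 * n / total   # Count%: the slices of a pie chart
           for value, n in counts.items()}

for value, n in counts.most_common():
    print(f"{value:<6} {n:>3} {percent[value]:6.1f}%")
```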
Univariate Analysis - Numerical
Statistics                Visualization  Equation  Description
Count                     Histogram      N         The number of values (observations) of the variable.
Minimum                   Box Plot       Min       The smallest value of the variable.
Maximum                   Box Plot       Max       The largest value of the variable.
Mean                      Box Plot       -         The sum of the values divided by the count.
Median                    Box Plot       -         The middle value. An equal number of values lies below and above the median.
Mode                      Histogram      -         The most frequent value. There can be more than one mode.
Quantile                  Box Plot       -         A set of 'cut points' that divide a set of data into groups containing equal numbers of values (Quartile, Quintile, Percentile, ...).
Range                     Box Plot       Max-Min   The difference between maximum and minimum.
Variance                  Histogram      -         A measure of data dispersion.
Standard Deviation        Histogram      -         The square root of variance.
Coefficient of Variation  Histogram      -         The standard deviation divided by the mean.
Skewness                  Histogram      -         A measure of symmetry or asymmetry in the distribution of data.
Kurtosis                  Histogram      -         A measure of whether the data are peaked or flat relative to a normal distribution.
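The numerical statistics above can be computed with Python's `statistics` module. This is a sketch under stated assumptions: the data `x` is hypothetical, population (not sample) formulas are used for variance and standard deviation, and skewness and kurtosis use the simple moment-based definitions, with kurtosis reported as excess over the normal distribution's value of 3.

```python
import statistics

x = [2, 4, 4, 5, 7, 9, 11]   # hypothetical observations

n = len(x)                                   # Count
minimum, maximum = min(x), max(x)            # Minimum, Maximum
mean = statistics.fmean(x)                   # Mean
median = statistics.median(x)                # Median
modes = statistics.multimode(x)              # Mode (may return several values)
quartiles = statistics.quantiles(x, n=4)     # Quantile (here: quartile cut points)
value_range = maximum - minimum              # Range
variance = statistics.pvariance(x)           # Variance (population)
std_dev = statistics.pstdev(x)               # Standard deviation (population)
coeff_var = std_dev / mean                   # Standard deviation divided by mean

# Moment-based skewness and excess kurtosis.
m3 = sum((v - mean) ** 3 for v in x) / n
m4 = sum((v - mean) ** 4 for v in x) / n
skewness = m3 / std_dev ** 3
kurtosis = m4 / std_dev ** 4 - 3

print(n, minimum, maximum, mean, median, modes, value_range)
print(round(variance, 3), round(std_dev, 3), round(coeff_var, 3))
print(round(skewness, 3), round(kurtosis, 3))
```

A positive skewness indicates a longer right tail; a negative excess kurtosis indicates a flatter shape than the normal distribution.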
Note: There are two types of numerical variables, interval and
ratio. An interval variable has values whose differences are
interpretable, but it does not have a true zero. A good example
is temperature in degrees Celsius. Data on an interval scale
can be added and subtracted but cannot be meaningfully
multiplied or divided. For example, we cannot say that one day
is twice as hot as another day. In contrast, a ratio variable has
values with a true zero and can be added, subtracted, multiplied
or divided (e.g., weight).
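A short worked example makes the distinction concrete. The two temperatures are hypothetical; the conversion to Kelvin (a scale with a true zero) is added here for illustration and is not part of the original note.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to Kelvin, which has a true zero."""
    return c + 273.15

day_a_c, day_b_c = 10.0, 20.0   # hypothetical daily temperatures in Celsius

# Differences are meaningful on an interval scale: both scales agree.
diff_c = day_b_c - day_a_c
diff_k = celsius_to_kelvin(day_b_c) - celsius_to_kelvin(day_a_c)

# Ratios are not: the Celsius ratio suggests "twice as hot",
# but on the ratio scale (Kelvin) the second day is only about 3.5% warmer.
ratio_c = day_b_c / day_a_c
ratio_k = celsius_to_kelvin(day_b_c) / celsius_to_kelvin(day_a_c)

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 4))
```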
o Bivariate analysis is the simultaneous analysis of two variables (attributes). It explores the relationship between two variables: whether an association exists and how strong it is.
There are three types of bivariate analysis.
1. Numerical & Numerical
Scatter Plot, Linear Correlation …
2. Categorical & Categorical
Stacked Column Chart, Combination Chart, Chi-square Test
3. Numerical & Categorical
Line Chart with Error Bars, Combination Chart, Z-test and t-test
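Two of the measures named above can be sketched directly from their textbook definitions: the Pearson linear correlation for a numerical pair and the chi-square statistic for a categorical pair. The function names and the sample data are hypothetical, and the sketch computes only the test statistic, not a p-value.

```python
import math

def pearson_r(xs, ys):
    """Pearson linear correlation coefficient between two numerical variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def chi_square(table):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Numerical & Numerical: hypothetical study hours vs. exam score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
r = pearson_r(hours, score)

# Categorical & Categorical: hypothetical 2x2 table of counts.
table = [[30, 10],
         [20, 40]]
chi2 = chi_square(table)

print(round(r, 4), round(chi2, 3))
```

A value of `r` near 1 or -1 indicates a strong linear association; a larger chi-square statistic indicates a larger departure from independence.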
> Modeling
· Predictive modeling is the process by which a model is created to predict an outcome.
o If the outcome is categorical it is called classification, and if the outcome is numerical it is called regression.
· Descriptive modeling, or clustering, is the assignment of observations into clusters so that observations in the same cluster are similar.
· Finally, association rules can find interesting associations amongst observations.
Classification algorithms:
Frequency Table – ZeroR, OneR, Naive Bayesian, Decision Tree
Covariance Matrix – Linear Discriminant Analysis, Logistic Regression
Similarity Functions – K Nearest Neighbors
Others – Artificial Neural Network, Support Vector Machine
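To give a feel for the frequency-table family, here is a minimal sketch of its two simplest members: ZeroR predicts the majority class and ignores all attributes, while OneR picks the single attribute whose per-value majority rule makes the fewest training errors. The toy weather rows and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training rows: (outlook, windy, play); the last column is the class.
rows = [
    ("sunny", "no", "no"),
    ("sunny", "yes", "no"),
    ("overcast", "no", "yes"),
    ("rainy", "no", "yes"),
    ("rainy", "yes", "no"),
    ("overcast", "yes", "yes"),
    ("sunny", "no", "yes"),
    ("rainy", "no", "yes"),
]
labels = [row[-1] for row in rows]

def zero_r(labels):
    """ZeroR: always predict the most frequent class."""
    return Counter(labels).most_common(1)[0][0]

def one_r(rows, n_features):
    """OneR: return (feature index, training errors, rule) for the best single attribute."""
    best = None
    for f in range(n_features):
        table = defaultdict(Counter)          # frequency table: value -> class counts
        for row in rows:
            table[row[f]][row[-1]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in table.items()}
        errors = sum(1 for row in rows if rule[row[f]] != row[-1])
        if best is None or errors < best[1]:  # ties keep the earlier attribute
            best = (f, errors, rule)
    return best

baseline = zero_r(labels)
feature, errors, rule = one_r(rows, n_features=2)
print(baseline, feature, errors, rule)
```

Both are mainly useful as baselines: a more complex classifier should at least beat ZeroR on held-out data.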
Regression algorithms:
Frequency Table – Decision Tree
Covariance Matrix – Multiple Linear Regression
Similarity Functions – K Nearest Neighbors
Others – Artificial Neural Network, Support Vector Machine
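As a small regression illustration, the sketch below fits a least-squares line with a single predictor, which is the simplest special case of multiple linear regression. The function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data that lies exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept   # predict the outcome for a new x = 6
print(slope, intercept, prediction)
```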
Clustering algorithms:
Hierarchical – Agglomerative, Divisive
Partitive – K Means, Self-Organizing Map
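A minimal sketch of K Means on one-dimensional data shows the two alternating steps: assign each point to its nearest center, then move each center to the mean of its points. The function name, the starting centers, and the fixed iteration count are assumptions made for the example.

```python
def k_means_1d(points, centers, iterations=10):
    """Plain K Means on 1-D points with given starting centers."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]   # hypothetical observations
centers, clusters = k_means_1d(points, centers=[0.0, 5.0])
print(centers, clusters)
```

In practice the result depends on the starting centers, so implementations usually try several initializations.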
> Evaluation
· Evaluation helps to find the best model that represents our data and to estimate how well the chosen model will work in the future. Methods include Hold-Out and Cross-Validation.
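The two evaluation schemes differ only in how observations are split into training and test sets. The sketch below generates index splits for both; the function names, the 30% test fraction, and the fixed seed are assumptions for illustration.

```python
import random

def hold_out(n, test_fraction=0.3, seed=0):
    """Hold-Out: one random split of indices 0..n-1 into (train, test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = n - round(n * test_fraction)
    return idx[:cut], idx[cut:]

def k_fold(n, k):
    """Cross-Validation: yield k (train, test) splits; each index is tested once."""
    idx = list(range(n))
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

train, test = hold_out(10)
splits = list(k_fold(10, 5))

print(len(train), len(test))
for tr, te in splits:
    print(te, tr)
```

With Hold-Out the model is scored once on the test portion; with Cross-Validation it is trained and scored k times and the scores are averaged.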
> Deployment
The concept of deployment in predictive data mining refers to
the application of a model for prediction to new data.