SlideShare a Scribd company logo
1 of 22
BASIC CONCEPT OF
CLASSIFICATION
DR. C.V. SURESH BABU
(CentreforKnowledgeTransfer)
institute
DATA MINING
Data mining in general terms
means mining or digging deep
into data that is in different
forms to gain patterns, and to
gain knowledge on that
pattern. In the process of data
mining, large data sets are first
sorted, then patterns are
identified and relationships are
established to perform data
analysis and solve problems.
(CentreforKnowledgeTransfer)
institute
CLASSIFICATION
It is a data analysis task, i.e. the process of finding a model that
describes and distinguishes data classes and concepts. Classification is
the problem of identifying to which of a set of categories
(subpopulations), a new observation belongs to, on the basis of a
training set of data containing observations and whose categories
membership is known.
Example: Before starting any project, we need to check its feasibility. In
this case, a classifier is required to predict class labels such as ‘Safe’
and ‘Risky’ for adopting the Project and to further approve it. It is a
two-step process such as :
1. Learning Step (Training Phase)
2. Classification Step
(CentreforKnowledgeTransfer)
institute
LEARNING STEP (TRAINING PHASE):
Construction of Classification Model
• Different Algorithms are used to
build a classifier by making the
model learn using the training set
available.
• The model has to be trained for the
prediction of accurate results.
(CentreforKnowledgeTransfer)
institute
CLASSIFICATION STEP
• Model used to predict class labels
and testing the constructed model
on test data and hence estimate
the accuracy of the classification
rules.
(CentreforKnowledgeTransfer)
institute
TRAINING AND TESTING:
• Suppose there is a person who is sitting under a fan and the fan
starts falling on him, he should get aside in order not to get hurt.
• So, this is his training part to move away.
• While Testing if the person sees any heavy object coming towards
him or falling on him and moves aside then the system is tested
positively and if the person does not move aside then the system is
negatively tested.
• Same is the case with the data, it should be trained in order to get
the accurate and best results.
(CentreforKnowledgeTransfer)
institute
CLASSIFICATION MODELS
• Classifiers can be categorized into two major types:
1. Discriminative
2. Generative
(CentreforKnowledgeTransfer)
institute
DISCRIMINATIVE
• It is a very basic classifier and determines just one class for
each row of data. It tries to model just by depending on the
observed data, depends heavily on the quality of data rather
than on distributions.
Example: Logistic Regression
Acceptance of a student at a University (Test and Grades need
to be considered)
(CentreforKnowledgeTransfer)
institute
GENERATIVE
• It models the distribution of individual classes and tries to learn the model that
generates the data behind the scenes by estimating assumptions and
distributions of the model. Used to predict the unseen data.
Example: Naive Bayes Classifier
• Detecting Spam emails by looking at the previous data.
• Suppose 100 emails and that too divided in 1:4 i.e. Class A: 25%(Spam emails) and Class B:
75%(Non-Spam emails).
• Now if a user wants to check that if an email contains the word cheap, then that may be
termed as Spam.
• It seems to be that in Class A(i.e. in 25% of data), 20 out of 25 emails are spam and rest
not.
• And in Class B(i.e. in 75% of data), 70 out of 75 emails are not spam and rest are spam.
• So, if the email contains the word cheap, what is the probability of it being spam ?? (= 80%)
(CentreforKnowledgeTransfer)
institute
CLASSIFIERS OF MACHINE LEARNING
• Decision Trees
• Bayesian Classifiers
• Neural Networks
• K-Nearest Neighbour
• Support Vector Machines
• Linear Regression
• Logistic Regression
(CentreforKnowledgeTransfer)
institute
ASSOCIATED TOOLS AND LANGUAGES
Used to mine/ extract useful information from raw data.
• Main Languages used: R, SAS, Python, SQL
• Major Tools used: RapidMiner, Orange, KNIME, Spark, Weka
• Libraries used: Jupyter, NumPy, Matplotlib, Pandas, ScikitLearn,
NLTK, TensorFlow, Seaborn, Basemap, etc.
(CentreforKnowledgeTransfer)
institute
REAL–LIFE EXAMPLES
• Market Basket Analysis:
It is a modeling technique that has been associated with frequent
transactions of buying some combination of items.
Example: Amazon and many other Retailers use this technique. While
viewing some products, certain suggestions for the commodities are shown
that some people have bought in the past.
• Weather Forecasting:
Changing Patterns in weather conditions needs to be observed based on
parameters such as temperature, humidity, wind direction. This keen
observation also requires the use of previous records in order to predict it
accurately.
(CentreforKnowledgeTransfer)
institute
ADVANTAGES
• Mining Based Methods are cost-effective and efficient
• Helps in identifying criminal suspects
• Helps in predicting the risk of diseases
• Helps Banks and Financial Institutions to identify defaulters so
that they may approve Cards, Loan, etc.
(CentreforKnowledgeTransfer)
institute
DISADVANTAGES
• Privacy: When the data is either are chances that a company
may give some information about their customers to other
vendors or use this information for their profit.
Accuracy Problem: Selection of Accurate model must be there in
order to get the best accuracy and result.
(CentreforKnowledgeTransfer)
institute
APPLICATIONS
• Marketing and Retailing
• Manufacturing
• Telecommunication Industry
• Intrusion Detection
• Education System
• Fraud Detection
(CentreforKnowledgeTransfer)
institute
SUMMARY
• Choosing the correct classification method, like decision trees,
Bayesian networks, or neural networks.
• Need a sample of data, where all class values are known. Then
the data will be divided into two parts, a training set, and a test
set.
(CentreforKnowledgeTransfer)
institute
ASSOCIATION RULES
• Association rules are "if-then" statements, that help to show
the probability of relationships between data items, within
large data sets in various types of databases. Association rule
mining has a number of applications and is widely used to help
discover sales correlations in transactional data or in medical
data sets.
(CentreforKnowledgeTransfer)
institute
MEDICINE
• Doctors can use association rules to help diagnose patients.
There are many variables to consider when making a diagnosis,
as many diseases share symptoms. By using association rules
and machine learning-fueled data analysis, doctors can
determine the conditional probability of a given illness by
comparing symptom relationships in the data from past cases.
As new diagnoses get made, the machine learning model can
adapt the rules to reflect the updated data.
(CentreforKnowledgeTransfer)
institute
RETAIL
• Retailers can collect data about purchasing patterns, recording
purchase data as item barcodes are scanned by point-of-sale
systems. Machine learning models can look for co-occurrence
in this data to determine which products are most likely to be
purchased together. The retailer can then adjust marketing and
sales strategy to take advantage of this information.
(CentreforKnowledgeTransfer)
institute
USER EXPERIENCE (UX) DESIGN
• Developers can collect data on how consumers use a website
they create. They can then use associations in the data to
optimize the website user interface -- by analyzing where
users tend to click and what maximizes the chance that they
engage with a call to action, for example.
(CentreforKnowledgeTransfer)
institute
ENTERTAINMENT
• Services like Netflix and Spotify can use association rules to
fuel their content recommendation engines. Machine learning
models analyze past user behavior data for frequent patterns,
develop association rules and use those rules to recommend
content that a user is likely to engage with, or organize content
in a way that is likely to put the most interesting content for a
given user first.
(CentreforKnowledgeTransfer)
institute
HOW ASSOCIATION RULES WORK
• Association rule mining, at a basic level, involves the use
of machine learning models to analyze data for patterns, or co-
occurrences, in a database. It identifies frequent if-then
associations, which themselves are the association rules.
• An association rule has two parts: an antecedent (if) and a
consequent (then). An antecedent is an item found within the
data. A consequent is an item found in combination with the
antecedent.

More Related Content

What's hot

What is Data analytics and it's importance ?
What is Data analytics and it's importance ?What is Data analytics and it's importance ?
What is Data analytics and it's importance ?AbhayDhupar
 
Chapter 6 data analysis iec11
Chapter 6 data analysis iec11Chapter 6 data analysis iec11
Chapter 6 data analysis iec11Ho Cao Viet
 
Missing Data and data imputation techniques
Missing Data and data imputation techniquesMissing Data and data imputation techniques
Missing Data and data imputation techniquesOmar F. Althuwaynee
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data MiningValerii Klymchuk
 
01 Introduction to Data Mining
01 Introduction to Data Mining01 Introduction to Data Mining
01 Introduction to Data MiningValerii Klymchuk
 
Research Method EMBA chapter 12
Research Method EMBA chapter 12Research Method EMBA chapter 12
Research Method EMBA chapter 12Mazhar Poohlah
 
What makes a good decision tree?
What makes a good decision tree?What makes a good decision tree?
What makes a good decision tree?Rupak Roy
 
Mba ii rm unit-4.1 data analysis & presentation a
Mba ii rm unit-4.1 data analysis & presentation aMba ii rm unit-4.1 data analysis & presentation a
Mba ii rm unit-4.1 data analysis & presentation aRai University
 
Decision tree induction
Decision tree inductionDecision tree induction
Decision tree inductionthamizh arasi
 
A Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of DiseasesA Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of Diseasesijsrd.com
 
Data Analysis and Statistics
Data Analysis and StatisticsData Analysis and Statistics
Data Analysis and StatisticsT.S. Lim
 
Classification and decision tree classifier machine learning
Classification and decision tree classifier machine learningClassification and decision tree classifier machine learning
Classification and decision tree classifier machine learningFrancisco E. Figueroa-Nigaglioni
 
Basics of data_interpretation
Basics of data_interpretationBasics of data_interpretation
Basics of data_interpretationVasista Vinuthan
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and predictionDataminingTools Inc
 
Mba2216 week 11 data analysis part 01
Mba2216 week 11 data analysis part 01Mba2216 week 11 data analysis part 01
Mba2216 week 11 data analysis part 01Stephen Ong
 

What's hot (20)

03 Data Mining Techniques
03 Data Mining Techniques03 Data Mining Techniques
03 Data Mining Techniques
 
What is Data analytics and it's importance ?
What is Data analytics and it's importance ?What is Data analytics and it's importance ?
What is Data analytics and it's importance ?
 
Data preparation
Data preparationData preparation
Data preparation
 
Classification
ClassificationClassification
Classification
 
Chapter 6 data analysis iec11
Chapter 6 data analysis iec11Chapter 6 data analysis iec11
Chapter 6 data analysis iec11
 
Missing Data and data imputation techniques
Missing Data and data imputation techniquesMissing Data and data imputation techniques
Missing Data and data imputation techniques
 
04 Classification in Data Mining
04 Classification in Data Mining04 Classification in Data Mining
04 Classification in Data Mining
 
01 Introduction to Data Mining
01 Introduction to Data Mining01 Introduction to Data Mining
01 Introduction to Data Mining
 
Dma unit 2
Dma unit  2Dma unit  2
Dma unit 2
 
Data analysis
Data analysisData analysis
Data analysis
 
Research Method EMBA chapter 12
Research Method EMBA chapter 12Research Method EMBA chapter 12
Research Method EMBA chapter 12
 
What makes a good decision tree?
What makes a good decision tree?What makes a good decision tree?
What makes a good decision tree?
 
Mba ii rm unit-4.1 data analysis & presentation a
Mba ii rm unit-4.1 data analysis & presentation aMba ii rm unit-4.1 data analysis & presentation a
Mba ii rm unit-4.1 data analysis & presentation a
 
Decision tree induction
Decision tree inductionDecision tree induction
Decision tree induction
 
A Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of DiseasesA Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of Diseases
 
Data Analysis and Statistics
Data Analysis and StatisticsData Analysis and Statistics
Data Analysis and Statistics
 
Classification and decision tree classifier machine learning
Classification and decision tree classifier machine learningClassification and decision tree classifier machine learning
Classification and decision tree classifier machine learning
 
Basics of data_interpretation
Basics of data_interpretationBasics of data_interpretation
Basics of data_interpretation
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
 
Mba2216 week 11 data analysis part 01
Mba2216 week 11 data analysis part 01Mba2216 week 11 data analysis part 01
Mba2216 week 11 data analysis part 01
 

Similar to Basic Concept of Classification

Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAjaved75
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2Roger Barga
 
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSEXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSeditorijettcs
 
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSEXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSeditorijettcs
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!Khalid Salama
 
MA- UNIT -1.pptx for ipu bba sem 5, complete pdf
MA- UNIT -1.pptx for ipu bba sem 5, complete pdfMA- UNIT -1.pptx for ipu bba sem 5, complete pdf
MA- UNIT -1.pptx for ipu bba sem 5, complete pdfzm2pfgpcdt
 
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...Egyptian Engineers Association
 
Supervised Machine Learning
Supervised Machine LearningSupervised Machine Learning
Supervised Machine LearningAnkit Rai
 
data-science-lifecycle-ebook.pdf
data-science-lifecycle-ebook.pdfdata-science-lifecycle-ebook.pdf
data-science-lifecycle-ebook.pdfDanilo Cardona
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceSpartan60
 
Introduction to Data mining
Introduction to Data miningIntroduction to Data mining
Introduction to Data miningHadi Fadlallah
 

Similar to Basic Concept of Classification (20)

Data mining
Data miningData mining
Data mining
 
Data mining
Data miningData mining
Data mining
 
Data Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATAData Science.pptx NEW COURICUUMN IN DATA
Data Science.pptx NEW COURICUUMN IN DATA
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
 
Data Mining
Data MiningData Mining
Data Mining
 
Data mining
Data miningData mining
Data mining
 
Data Analytics Life Cycle
Data Analytics Life CycleData Analytics Life Cycle
Data Analytics Life Cycle
 
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSEXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
 
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSEXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
 
Data Mining - The Big Picture!
Data Mining - The Big Picture!Data Mining - The Big Picture!
Data Mining - The Big Picture!
 
Seminar Presentation
Seminar PresentationSeminar Presentation
Seminar Presentation
 
Data Science and Analysis.pptx
Data Science and Analysis.pptxData Science and Analysis.pptx
Data Science and Analysis.pptx
 
Data mining
Data miningData mining
Data mining
 
MA- UNIT -1.pptx for ipu bba sem 5, complete pdf
MA- UNIT -1.pptx for ipu bba sem 5, complete pdfMA- UNIT -1.pptx for ipu bba sem 5, complete pdf
MA- UNIT -1.pptx for ipu bba sem 5, complete pdf
 
Chapter 1.pdf
Chapter 1.pdfChapter 1.pdf
Chapter 1.pdf
 
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
لموعد الإثنين 03 يناير 2022 143 مبادرة #تواصل_تطوير المحاضرة ال 143 من المباد...
 
Supervised Machine Learning
Supervised Machine LearningSupervised Machine Learning
Supervised Machine Learning
 
data-science-lifecycle-ebook.pdf
data-science-lifecycle-ebook.pdfdata-science-lifecycle-ebook.pdf
data-science-lifecycle-ebook.pdf
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Introduction to Data mining
Introduction to Data miningIntroduction to Data mining
Introduction to Data mining
 

More from Dr. C.V. Suresh Babu

Diagnosis test of diabetics and hypertension by AI
Diagnosis test of diabetics and hypertension by AIDiagnosis test of diabetics and hypertension by AI
Diagnosis test of diabetics and hypertension by AIDr. C.V. Suresh Babu
 
A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”Dr. C.V. Suresh Babu
 
A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”Dr. C.V. Suresh Babu
 
A study on “the impact of data analytics in covid 19 health care system”
A study on “the impact of data analytics in covid 19 health care system”A study on “the impact of data analytics in covid 19 health care system”
A study on “the impact of data analytics in covid 19 health care system”Dr. C.V. Suresh Babu
 
A study on the impact of data analytics in COVID-19 health care system
A study on the impact of data analytics in COVID-19 health care systemA study on the impact of data analytics in COVID-19 health care system
A study on the impact of data analytics in COVID-19 health care systemDr. C.V. Suresh Babu
 
A study on the impact of new normal in learning & teaching processes chem...
A study on the impact of new normal in learning & teaching processes chem...A study on the impact of new normal in learning & teaching processes chem...
A study on the impact of new normal in learning & teaching processes chem...Dr. C.V. Suresh Babu
 

More from Dr. C.V. Suresh Babu (20)

Data analytics with R
Data analytics with RData analytics with R
Data analytics with R
 
Association rules
Association rulesAssociation rules
Association rules
 
Blue property assumptions.
Blue property assumptions.Blue property assumptions.
Blue property assumptions.
 
DART
DARTDART
DART
 
Mycin
MycinMycin
Mycin
 
Expert systems
Expert systemsExpert systems
Expert systems
 
Dempster shafer theory
Dempster shafer theoryDempster shafer theory
Dempster shafer theory
 
Bayes network
Bayes networkBayes network
Bayes network
 
Bayes' theorem
Bayes' theoremBayes' theorem
Bayes' theorem
 
Knowledge based agents
Knowledge based agentsKnowledge based agents
Knowledge based agents
 
Rule based system
Rule based systemRule based system
Rule based system
 
Formal Logic in AI
Formal Logic in AIFormal Logic in AI
Formal Logic in AI
 
Production based system
Production based systemProduction based system
Production based system
 
Game playing in AI
Game playing in AIGame playing in AI
Game playing in AI
 
Diagnosis test of diabetics and hypertension by AI
Diagnosis test of diabetics and hypertension by AIDiagnosis test of diabetics and hypertension by AI
Diagnosis test of diabetics and hypertension by AI
 
A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”
 
A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”A study on “impact of artificial intelligence in covid19 diagnosis”
A study on “impact of artificial intelligence in covid19 diagnosis”
 
A study on “the impact of data analytics in covid 19 health care system”
A study on “the impact of data analytics in covid 19 health care system”A study on “the impact of data analytics in covid 19 health care system”
A study on “the impact of data analytics in covid 19 health care system”
 
A study on the impact of data analytics in COVID-19 health care system
A study on the impact of data analytics in COVID-19 health care systemA study on the impact of data analytics in COVID-19 health care system
A study on the impact of data analytics in COVID-19 health care system
 
A study on the impact of new normal in learning & teaching processes chem...
A study on the impact of new normal in learning & teaching processes chem...A study on the impact of new normal in learning & teaching processes chem...
A study on the impact of new normal in learning & teaching processes chem...
 

Recently uploaded

RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 

Recently uploaded (20)

RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 

Basic Concept of Classification

  • 2. (CentreforKnowledgeTransfer) institute DATA MINING Data mining in general terms means mining or digging deep into data that is in different forms to gain patterns, and to gain knowledge on that pattern. In the process of data mining, large data sets are first sorted, then patterns are identified and relationships are established to perform data analysis and solve problems.
  • 3. (CentreforKnowledgeTransfer) institute CLASSIFICATION It is a data analysis task, i.e. the process of finding a model that describes and distinguishes data classes and concepts. Classification is the problem of identifying to which of a set of categories (subpopulations), a new observation belongs to, on the basis of a training set of data containing observations and whose categories membership is known. Example: Before starting any project, we need to check its feasibility. In this case, a classifier is required to predict class labels such as ‘Safe’ and ‘Risky’ for adopting the Project and to further approve it. It is a two-step process such as : 1. Learning Step (Training Phase) 2. Classification Step
  • 4. (CentreforKnowledgeTransfer) institute LEARNING STEP (TRAINING PHASE): Construction of Classification Model • Different Algorithms are used to build a classifier by making the model learn using the training set available. • The model has to be trained for the prediction of accurate results.
  • 5. (CentreforKnowledgeTransfer) institute CLASSIFICATION STEP • Model used to predict class labels and testing the constructed model on test data and hence estimate the accuracy of the classification rules.
  • 6. (CentreforKnowledgeTransfer) institute TRAINING AND TESTING: • Suppose there is a person who is sitting under a fan and the fan starts falling on him, he should get aside in order not to get hurt. • So, this is his training part to move away. • While Testing if the person sees any heavy object coming towards him or falling on him and moves aside then the system is tested positively and if the person does not move aside then the system is negatively tested. • Same is the case with the data, it should be trained in order to get the accurate and best results.
  • 7. (CentreforKnowledgeTransfer) institute CLASSIFICATION MODELS • Classifiers can be categorized into two major types: 1. Discriminative 2. Generative
  • 8. (CentreforKnowledgeTransfer) institute DISCRIMINATIVE • It is a very basic classifier and determines just one class for each row of data. It tries to model just by depending on the observed data, depends heavily on the quality of data rather than on distributions. Example: Logistic Regression Acceptance of a student at a University (Test and Grades need to be considered)
  • 9. (CentreforKnowledgeTransfer) institute GENERATIVE • It models the distribution of individual classes and tries to learn the model that generates the data behind the scenes by estimating assumptions and distributions of the model. Used to predict the unseen data. Example: Naive Bayes Classifier • Detecting Spam emails by looking at the previous data. • Suppose 100 emails and that too divided in 1:4 i.e. Class A: 25%(Spam emails) and Class B: 75%(Non-Spam emails). • Now if a user wants to check that if an email contains the word cheap, then that may be termed as Spam. • It seems to be that in Class A(i.e. in 25% of data), 20 out of 25 emails are spam and rest not. • And in Class B(i.e. in 75% of data), 70 out of 75 emails are not spam and rest are spam. • So, if the email contains the word cheap, what is the probability of it being spam ?? (= 80%)
  • 10. (CentreforKnowledgeTransfer) institute CLASSIFIERS OF MACHINE LEARNING • Decision Trees • Bayesian Classifiers • Neural Networks • K-Nearest Neighbour • Support Vector Machines • Linear Regression • Logistic Regression
  • 11. (CentreforKnowledgeTransfer) institute ASSOCIATED TOOLS AND LANGUAGES Used to mine/ extract useful information from raw data. • Main Languages used: R, SAS, Python, SQL • Major Tools used: RapidMiner, Orange, KNIME, Spark, Weka • Libraries used: Jupyter, NumPy, Matplotlib, Pandas, ScikitLearn, NLTK, TensorFlow, Seaborn, Basemap, etc.
  • 12. (CentreforKnowledgeTransfer) institute REAL–LIFE EXAMPLES • Market Basket Analysis: It is a modeling technique that has been associated with frequent transactions of buying some combination of items. Example: Amazon and many other Retailers use this technique. While viewing some products, certain suggestions for the commodities are shown that some people have bought in the past. • Weather Forecasting: Changing Patterns in weather conditions needs to be observed based on parameters such as temperature, humidity, wind direction. This keen observation also requires the use of previous records in order to predict it accurately.
  • 13. (CentreforKnowledgeTransfer) institute ADVANTAGES • Mining Based Methods are cost-effective and efficient • Helps in identifying criminal suspects • Helps in predicting the risk of diseases • Helps Banks and Financial Institutions to identify defaulters so that they may approve Cards, Loan, etc.
  • 14. (CentreforKnowledgeTransfer) institute DISADVANTAGES • Privacy: When the data is either are chances that a company may give some information about their customers to other vendors or use this information for their profit. Accuracy Problem: Selection of Accurate model must be there in order to get the best accuracy and result.
  • 15. (CentreforKnowledgeTransfer) institute APPLICATIONS • Marketing and Retailing • Manufacturing • Telecommunication Industry • Intrusion Detection • Education System • Fraud Detection
  • 16. (CentreforKnowledgeTransfer) institute SUMMARY • Choosing the correct classification method, like decision trees, Bayesian networks, or neural networks. • Need a sample of data, where all class values are known. Then the data will be divided into two parts, a training set, and a test set.
  • 17. (CentreforKnowledgeTransfer) institute ASSOCIATION RULES • Association rules are "if-then" statements, that help to show the probability of relationships between data items, within large data sets in various types of databases. Association rule mining has a number of applications and is widely used to help discover sales correlations in transactional data or in medical data sets.
  • 18. (CentreforKnowledgeTransfer) institute MEDICINE • Doctors can use association rules to help diagnose patients. There are many variables to consider when making a diagnosis, as many diseases share symptoms. By using association rules and machine learning-fueled data analysis, doctors can determine the conditional probability of a given illness by comparing symptom relationships in the data from past cases. As new diagnoses get made, the machine learning model can adapt the rules to reflect the updated data.
  • 19. (CentreforKnowledgeTransfer) institute RETAIL • Retailers can collect data about purchasing patterns, recording purchase data as item barcodes are scanned by point-of-sale systems. Machine learning models can look for co-occurrence in this data to determine which products are most likely to be purchased together. The retailer can then adjust marketing and sales strategy to take advantage of this information.
  • 20. (CentreforKnowledgeTransfer) institute USER EXPERIENCE (UX) DESIGN • Developers can collect data on how consumers use a website they create. They can then use associations in the data to optimize the website user interface -- by analyzing where users tend to click and what maximizes the chance that they engage with a call to action, for example.
  • 21. (CentreforKnowledgeTransfer) institute ENTERTAINMENT • Services like Netflix and Spotify can use association rules to fuel their content recommendation engines. Machine learning models analyze past user behavior data for frequent patterns, develop association rules and use those rules to recommend content that a user is likely to engage with, or organize content in a way that is likely to put the most interesting content for a given user first.
  • 22. (CentreforKnowledgeTransfer) institute HOW ASSOCIATION RULES WORK • Association rule mining, at a basic level, involves the use of machine learning models to analyze data for patterns, or co- occurrences, in a database. It identifies frequent if-then associations, which themselves are the association rules. • An association rule has two parts: an antecedent (if) and a consequent (then). An antecedent is an item found within the data. A consequent is an item found in combination with the antecedent.