The document discusses classification and prediction in data mining. Classification uses a model, or classifier, to predict categorical labels for data; it is a two-step process of learning classification rules from training data and then applying those rules to classify new test data. Prediction estimates continuous-valued attributes rather than categorical labels. Decision tree induction is a common classification technique that builds a tree structure in which internal nodes represent attribute tests and leaf nodes hold class predictions; at each step it selects the attribute that best splits the training data, building the tree top-down. An example shows building a decision tree to predict a drug class from patient data.
3. What is Classification?
• In Classification, a model or classifier is constructed to predict categorical labels
• Data classification is a two-step process
1. Learning: a classification algorithm analyzes the training data and derives classification rules.
2. Classification: the classification rules are applied to test data (with known class labels) to estimate accuracy, and then to new data whose class label is unknown, producing a class label.
Data Mining: Classification and Prediction 3
4. What is Classification?
• Learning step:
• A classification algorithm builds the classifier by analyzing or “learning from” a training
set made up of database tuples and their associated class labels.
• The individual tuples making up the training set are referred to as training tuples.
• Data tuples can be referred to as samples, examples, instances, data points, or objects
• This is a supervised learning step
• The class label of each training tuple is provided
• This process can be viewed as the learning of a mapping or function y = f(X)
• that predicts the associated class label y of a given tuple X
• This mapping is represented in the form of classification rules, decision trees, or
mathematical formulae
5. What is Classification?
• Classification Step
• The model is used for classification.
• A test set is used, made up of test tuples and their associated class labels.
• Randomly selected tuples from the general data set
• The accuracy of a classifier on a given test set is the percentage of test set tuples that
are correctly classified by the classifier.
• The associated class label of each test tuple is compared with the learned classifier’s class
prediction for that tuple.
• If the accuracy of the classifier is considered acceptable, the classifier can be used to
classify future data tuples for which the class label is not known.
7. What is Prediction?
• Data prediction is a two-step process, similar to data classification
• There is no class attribute
• The attribute values to be predicted are continuous-valued (ordered) rather than categorical (discrete-valued)
• The attribute to be predicted is referred to simply as the predicted attribute
• Prediction can also be viewed as a mapping or function y = f(X)
8. How are classification and prediction different?
• Data classification classifies categorical attribute values
• Data prediction predicts continuous-valued attribute values
• Test data is used to assess the accuracy of a classifier
• The accuracy of a predictor is estimated by computing an error based on the difference between the predicted value and the actual known value of y for each of the test tuples, X
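The two evaluation schemes above differ only in what is compared: exact label matches for a classifier versus a numeric error for a predictor. A minimal Python sketch (function names are our own) contrasts them:

```python
# Minimal sketch: accuracy for a classifier vs. error for a predictor.

def accuracy(y_true, y_pred):
    """Fraction of test tuples whose predicted class matches the known class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between predicted and actual values of y."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Classifier: 3 of 4 test tuples are classified correctly.
print(accuracy(["A", "A", "B", "B"], ["A", "B", "B", "B"]))  # 0.75

# Predictor: continuous-valued target, so evaluation is error-based.
print(round(mean_squared_error([2.0, 4.0, 6.0], [2.5, 3.5, 6.0]), 4))  # 0.1667
```
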
9. Decision Tree Induction
• It is the learning of decision trees from class-labeled training tuples.
• A decision tree is a flowchart-like tree structure, where
• Each internal node (non-leaf node) denotes a test on an attribute,
• Each branch represents an outcome of the test, and
• Each leaf node (or terminal node) holds a class label.
• The topmost node in a tree is the root node
• Example: Is a person fit?
• Binary decision tree
• Non-binary decision tree
Fig. Decision tree for the concept "being fit": the root tests Age < 30?; on Yes, "Eats lots of pizza?" (Yes → Unfit, No → Fit); on No, "Exercises daily?" (Yes → Fit, No → Unfit).
10. Decision Tree Induction
• How are decision trees used for classification?
• Given a tuple, X, for which the associated class label is unknown, the attribute values of
the tuple are tested against the decision tree.
• A path is traced from the root to a leaf node, which holds the class prediction for that
tuple.
• Advantages of decision trees:
• Do not require any domain knowledge or parameter setting
• Can handle high-dimensional data
• The learning and classification steps of decision tree induction are simple and fast
• Generally have good accuracy
11. Decision Tree Induction
• Attribute selection measures
• Used to select the attribute that best partitions the tuples into distinct
classes.
• Information gain, Gain Ratio, Gini Index
• An early decision tree algorithm is ID3 (Iterative Dichotomiser)
• The C4.5 algorithm (a successor of ID3) serves as a benchmark for newer supervised learning algorithms
• Classification and Regression Trees (CART)
• Adopt a greedy (i.e., nonbacktracking) approach in which decision trees
are constructed in a top-down recursive divide-and-conquer manner
12. Decision Tree Induction
• Information Gain
• ID3 uses information gain as its attribute selection measure
• The expected information needed to classify a tuple in D is given by
Info(D) = −∑_{i=1}^{m} p_i log2(p_i)
• where p_i is the probability that an arbitrary tuple in D belongs to class C_i, estimated as |C_i,D|/|D|
• Info(D) is also known as the entropy of D.
• The expected information required to classify a tuple from D based on the partitioning by attribute A:
Info_A(D) = ∑_{j=1}^{v} (|Dj|/|D|) × Info(Dj)
13. Decision Tree Induction
• Information Gain
• Information gain is defined as the difference between the original information requirement (based on just the proportion of classes) and the new requirement (obtained after partitioning on A)
Gain(A) = Info(D) − Info_A(D)
• The attribute 𝐴 with the highest information gain, 𝑮𝒂𝒊𝒏(𝑨), is chosen as the splitting
attribute at node 𝑁.
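The two measures above are straightforward to compute directly. A minimal Python sketch (function and variable names are our own) reproduces the Info(D) value from the worked example later in these slides:

```python
from collections import Counter
from math import log2

def info(labels):
    """Info(D) = -sum_i p_i log2(p_i): entropy of the class distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_a(values, labels):
    """Info_A(D): weighted entropy after partitioning D on attribute A."""
    parts = {}
    for v, y in zip(values, labels):
        parts.setdefault(v, []).append(y)
    return sum(len(p) / len(labels) * info(p) for p in parts.values())

# Class distribution 5 x Drug A and 9 x Drug B, as in the drug example below.
labels = ["A"] * 5 + ["B"] * 9
print(round(info(labels), 4))  # 0.9403

# A hypothetical attribute that separates the classes perfectly:
# its Info_A(D) is 0, so Gain(A) = Info(D) - Info_A(D) equals Info(D).
values = ["p"] * 5 + ["q"] * 9
print(round(info(labels) - info_a(values, labels), 4))  # 0.9403
```
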
14. Decision Tree Induction
• Decision Tree Generation Algorithm
• Input:
• Data partition, D,
which is a set of training tuples and their associated class labels;
• Attribute_list,
the set of candidate attributes;
• Attribute_selection_method,
a procedure to determine the splitting criterion that “best” partitions the data tuples into
individual classes. This criterion consists of a splitting attribute and, possibly, either a split point
or splitting subset.
15. Decision Tree Induction
• Generate_decision_tree Algorithm
• Method
1. create a node N;
2. if tuples in D are all of the same class, C then
3. return N as a leaf node labeled with the class C;
4. if Attribute_list is empty then
5. return N as a leaf node labeled with the majority class in D;
6. apply Attribute_selection_method(D, Attribute_list) to find the “best” splitting
criterion;
7. label node N with splitting criterion;
16. Decision Tree Induction
• Decision Tree Generation Algorithm
• Method
8. if splitting_attribute is discrete-valued and multiway splits allowed then
9. Attribute_list ← Attribute_list − Splitting_attribute; // remove splitting attribute
10. for each outcome j of splitting criterion
11. let Dj be the set of data tuples in D satisfying outcome j; // a partition
12. if Dj is empty then
13. attach a leaf labeled with the majority class in D to node N;
14. else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
15. End for
16. return N;
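Steps 1–16 above can be sketched in runnable Python, using information gain as the Attribute_selection_method and the drug training data from the example that follows (attribute indices 0–3 stand for Age, Gender, BP and Cholesterol; names are ours, and for brevity the sketch branches only on attribute values that actually occur, so step 13's empty-partition case never fires):

```python
from collections import Counter
from math import log2

def info(labels):
    """Info(D): entropy of the class distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_attribute(rows, labels, attrs):
    """Attribute_selection_method: highest information gain (as in ID3)."""
    def gain(a):
        parts = {}
        for row, y in zip(rows, labels):
            parts.setdefault(row[a], []).append(y)
        new_info = sum(len(p) / len(labels) * info(p) for p in parts.values())
        return info(labels) - new_info
    return max(attrs, key=gain)

def generate_decision_tree(rows, labels, attrs):
    # Steps 2-3: all tuples of the same class C -> leaf labeled C.
    if len(set(labels)) == 1:
        return labels[0]
    # Steps 4-5: attribute list empty -> leaf labeled with the majority class.
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Steps 6-7: find and record the "best" splitting criterion at node N.
    a = best_attribute(rows, labels, attrs)
    node = {"attr": a, "branches": {}}
    remaining = [x for x in attrs if x != a]  # step 9: remove splitting attribute
    # Steps 10-14: grow one subtree per outcome of the splitting criterion.
    for value in {row[a] for row in rows}:
        sub = [(r, y) for r, y in zip(rows, labels) if r[a] == value]
        node["branches"][value] = generate_decision_tree(
            [r for r, _ in sub], [y for _, y in sub], remaining)
    return node  # step 16

# Training tuples P1-P14 from the drug example (Age, Gender, BP, Cholesterol).
rows = [
    ("<=30", "F", "High", "Normal"), ("<=30", "F", "High", "High"),
    ("31-50", "F", "High", "Normal"), (">50", "F", "Normal", "Normal"),
    (">50", "M", "Low", "Normal"), (">50", "M", "Low", "High"),
    ("31-50", "M", "Low", "High"), ("<=30", "F", "Normal", "Normal"),
    ("<=30", "M", "Low", "Normal"), (">50", "M", "Normal", "Normal"),
    ("<=30", "M", "Normal", "High"), ("31-50", "F", "Normal", "High"),
    ("31-50", "M", "High", "Normal"), (">50", "F", "Normal", "High"),
]
drug = ["Drug A", "Drug A", "Drug B", "Drug B", "Drug B", "Drug A", "Drug B",
        "Drug A", "Drug B", "Drug B", "Drug B", "Drug B", "Drug B", "Drug A"]

tree = generate_decision_tree(rows, drug, [0, 1, 2, 3])
print(tree["attr"])               # 0: Age is chosen as the root
print(tree["branches"]["31-50"])  # Drug B
```
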
17. Example
Patient ID Age Sex BP Cholesterol Class: Drug
P1 <=30 F High Normal Drug A
P2 <=30 F High High Drug A
P3 31…50 F High Normal Drug B
P4 >50 F Normal Normal Drug B
P5 >50 M Low Normal Drug B
P6 >50 M Low High Drug A
P7 31…50 M Low High Drug B
P8 <=30 F Normal Normal Drug A
P9 <=30 M Low Normal Drug B
P10 >50 M Normal Normal Drug B
P11 <=30 M Normal High Drug B
P12 31…50 F Normal High Drug B
P13 31…50 M High Normal Drug B
P14 >50 F Normal High Drug A
P15 31…50 F Low Normal ?
18. Example
• Reduced Training Data
• Establish the target classification: which drug to advise?
• 5/14 → Drug A
• 9/14 → Drug B
Age Gender BP Cholesterol Class: Drug
<=30 F High Normal Drug A
<=30 F High High Drug A
31…50 F High Normal Drug B
>50 F Normal Normal Drug B
>50 M Low Normal Drug B
>50 M Low High Drug A
31…50 M Low High Drug B
<=30 F Normal Normal Drug A
<=30 M Low Normal Drug B
>50 M Normal Normal Drug B
<=30 M Normal High Drug B
31…50 F Normal High Drug B
31…50 M High Normal Drug B
>50 F Normal High Drug A
19. Example
• Calculate the expected information (entropy) of the class attribute Drug
Info(D) = −(5/14) log2(5/14) − (9/14) log2(9/14) = 0.9403
• Calculate information gain of remaining attributes to determine the root node
20. Example
• Attribute: Age
• <=30 → 5, 31-50 → 4, >50 → 5
• 3 distinct values for attribute Age, so we need 3 entropy calculations
<=30: 3 Drug A, 2 Drug B → Info(<=30) = −(3/5) log2(3/5) − (2/5) log2(2/5) ≈ 0.9710
31-50: 0 Drug A, 4 Drug B → Info(31-50) = 0 (all tuples in one class)
>50: 2 Drug A, 3 Drug B → Info(>50) = −(2/5) log2(2/5) − (3/5) log2(3/5) ≈ 0.9710
Info_Age(D) = (5/14) × 0.9710 + (4/14) × 0 + (5/14) × 0.9710 = 0.6936
Gain(Age) = Info(D) − Info_Age(D) = 0.9403 − 0.6936 = 0.2467
21. Example
• Attribute: Gender
• M → 7, F → 7
• 2 distinct values for attribute Gender, so we need 2 entropy calculations
F: 4 Drug A, 3 Drug B → Info(F) = −(4/7) log2(4/7) − (3/7) log2(3/7) ≈ 0.9852
M: 1 Drug A, 6 Drug B → Info(M) = −(1/7) log2(1/7) − (6/7) log2(6/7) ≈ 0.5917
Info_Gender(D) = (7/14) × 0.9852 + (7/14) × 0.5917 = 0.7885
Gain(Gender) = Info(D) − Info_Gender(D) = 0.9403 − 0.7885 = 0.1518
22. Example
• Attribute: BP
• High → 4, Normal → 6, Low → 4
• 3 distinct values for attribute BP, so we need 3 entropy calculations
High: 2 Drug A, 2 Drug B → Info(High) = −(2/4) log2(2/4) − (2/4) log2(2/4) = 1.0000
Normal: 2 Drug A, 4 Drug B → Info(Normal) = −(2/6) log2(2/6) − (4/6) log2(4/6) ≈ 0.9183
Low: 1 Drug A, 3 Drug B → Info(Low) = −(1/4) log2(1/4) − (3/4) log2(3/4) ≈ 0.8113
Info_BP(D) = (4/14) × 1.0000 + (6/14) × 0.9183 + (4/14) × 0.8113 = 0.9111
Gain(BP) = Info(D) − Info_BP(D) = 0.9403 − 0.9111 = 0.0292
23. Example: partitioning on Cholesterol
Cholesterol Class: Drug
High Drug A
High Drug A
High Drug B
High Drug B
High Drug B
High Drug A
Cholesterol Class: Drug
Normal Drug A
Normal Drug B
Normal Drug B
Normal Drug B
Normal Drug A
Normal Drug B
Normal Drug B
Normal Drug B
24. Example
• Attribute: Cholesterol
• High → 6, Normal → 8
• 2 distinct values for attribute Cholesterol, so we need 2 entropy calculations
High: 3 Drug A, 3 Drug B → Info(High) = −(3/6) log2(3/6) − (3/6) log2(3/6) = 1.0000
Normal: 2 Drug A, 6 Drug B → Info(Normal) = −(2/8) log2(2/8) − (6/8) log2(6/8) ≈ 0.8113
Info_Cholesterol(D) = (6/14) × 1.0000 + (8/14) × 0.8113 = 0.8922
Gain(Cholesterol) = Info(D) − Info_Cholesterol(D) = 0.9403 − 0.8922 = 0.0481
25. Example
• Recap
Gain(Age) = 0.2467
Gain(Gender) = 0.1518
Gain(BP) = 0.0292
Gain(Cholesterol) = 0.0481
• Age has the highest information gain, so we choose Age as the root node.
Age: <=30 → ?, 31-50 → Drug B, >50 → ?
Repeat the steps for the <=30 and >50 partitions.
26. Example
Final decision tree:
Age: <=30 → test Gender, 31-50 → Drug B, >50 → test Cholesterol
Gender (on the <=30 branch): Male → Drug B, Female → Drug A
Cholesterol (on the >50 branch): Normal → Drug B, High → Drug A
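Classifying a tuple then amounts to tracing a root-to-leaf path. A small Python sketch encodes the learned tree above as a nested dict (our own, hypothetical encoding) and classifies patient P15 from the table:

```python
# The learned tree from the example, encoded as nested dicts:
# internal nodes test one attribute; leaves are class labels.
tree = {
    "attr": "Age",
    "branches": {
        "<=30": {"attr": "Gender",
                 "branches": {"M": "Drug B", "F": "Drug A"}},
        "31-50": "Drug B",
        ">50": {"attr": "Cholesterol",
                "branches": {"Normal": "Drug B", "High": "Drug A"}},
    },
}

def classify(node, tuple_x):
    """Trace a path from the root to a leaf, testing one attribute per node."""
    while isinstance(node, dict):
        node = node["branches"][tuple_x[node["attr"]]]
    return node

# Patient P15 from the example: Age 31-50, F, BP Low, Cholesterol Normal.
p15 = {"Age": "31-50", "Gender": "F", "BP": "Low", "Cholesterol": "Normal"}
print(classify(tree, p15))  # Drug B
```
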
27. Decision Tree Induction
• What if the splitting attribute A is continuous-valued?
• The test at node N has two possible outcomes, corresponding to the conditions A ≤ split_point and A > split_point, respectively
• where split_point is the split point returned by Attribute_selection_method as part of the splitting criterion.
• When A is discrete-valued and a binary tree must be produced
• The test at node N is of the form "A ∈ S_A",
• where S_A is the splitting subset for A returned by Attribute_selection_method as part of the splitting criterion.
29. Attribute Selection Measures
• Gain Ratio
• The information gain measure is biased toward tests with many outcomes.
• C4.5, a successor of ID3, uses an extension to information gain known as gain ratio
• Applies a kind of normalization to information gain using a “split information” value defined
analogously with Info(D) as
SplitInfo_A(D) = −∑_{j=1}^{v} (|Dj|/|D|) × log2(|Dj|/|D|)
• This represents the potential information generated by splitting the training data set, D, into v partitions, corresponding to the v outcomes of a test on attribute A.
• The gain ratio is defined as
GainRatio(A) = Gain(A) / SplitInfo_A(D)
• The attribute with the maximum gain ratio is selected as the splitting attribute.
30. Attribute Selection Measures
• Gain Ratio
• Computation of the gain ratio for the attribute Weight.
• Attribute Weight has three values, Heavy, Average and Light, containing 5, 6 and 4 tuples respectively (15 in total).
SplitInfo_Weight(D) = −(5/15) × log2(5/15) − (6/15) × log2(6/15) − (4/15) × log2(4/15) = 1.5656
Gain(Weight) = 0.0622
GainRatio(Weight) = 0.0622 / 1.5656 = 0.040
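The computation above can be checked in a few lines of Python (function names ours; Gain(Weight) = 0.0622 is taken as given from the slide):

```python
from math import log2

def split_info(partition_sizes):
    """SplitInfo_A(D) over the v partitions produced by a test on attribute A."""
    n = sum(partition_sizes)
    return -sum((s / n) * log2(s / n) for s in partition_sizes)

# Attribute Weight: Heavy/Average/Light with 5, 6 and 4 tuples (15 total).
si = split_info([5, 6, 4])
gain_weight = 0.0622               # Gain(Weight), given in the slide

print(round(si, 4))                # 1.5656
print(round(gain_weight / si, 3))  # 0.04
```
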
31. Attribute Selection Measures
• Gini Index
• The Gini index is used in CART
• The Gini index measures the impurity of D, a data partition or set of training tuples, as
Gini(D) = 1 − ∑_{i=1}^{m} p_i²
• where p_i is the probability that a tuple in D belongs to class C_i and is estimated by |C_i,D|/|D|.
• The sum is computed over m classes.
• The Gini index considers a binary split for each attribute
• If a binary split on A partitions D into D1 and D2, the Gini index of D given that partitioning is
Gini_A(D) = (|D1|/|D|) × Gini(D1) + (|D2|/|D|) × Gini(D2)
32. Attribute Selection Measures
• Gini Index
• For a discrete-valued attribute, the subset that gives the minimum gini index for
that attribute is selected as its splitting subset.
• For continuous-valued attributes, each possible split-point must be considered.
• The reduction in impurity that would be incurred by a binary split on a discrete-
or continuous-valued attribute A is
ΔGini(A) = Gini(D) − Gini_A(D)
• The attribute that maximizes the reduction in impurity is selected as the splitting attribute.
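A minimal Python sketch of the three Gini quantities just defined (names ours; the 5/9 class split mirrors the drug example, and the binary split shown is hypothetical):

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum_i p_i^2, computed over the m classes."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(part1, part2):
    """Gini_A(D) for a binary split of D into partitions D1 and D2."""
    n = len(part1) + len(part2)
    return len(part1) / n * gini(part1) + len(part2) / n * gini(part2)

# D has 5 "A" and 9 "B" tuples; some attribute splits it into D1 and D2.
d = ["A"] * 5 + ["B"] * 9
d1 = ["A"] * 4 + ["B"] * 2
d2 = ["A"] * 1 + ["B"] * 7

print(round(gini(d), 4))                       # 0.4592
print(round(gini(d) - gini_split(d1, d2), 4))  # 0.1437, the reduction ΔGini(A)
```
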
35. Bayesian Classification
• Bayesian classifiers are statistical classifiers
• They predict class membership probabilities.
• They are based on Bayes' theorem.
• They exhibit high accuracy and speed when applied to large databases.
• A simple Bayesian classifier is known as the naïve Bayesian classifier
• Assumes that the effect of an attribute value on a given class is independent of the values of
the other attributes: class conditional independence.
• Bayesian belief networks are graphical models, allow the representation of
dependencies among subsets of attributes
36. Bayesian Classification
• Bayes’ Theorem
• Let 𝑿 be a data tuple (X is considered “evidence”).
• Let 𝑯 be some hypothesis, such as that the data tuple 𝑿 belongs to a specified class 𝑪.
• Determine 𝑷 𝑯|𝑿 , the probability that the hypothesis 𝑯 holds given the “evidence” or
observed data tuple X.
• 𝑷 𝑯|𝑿 is the posterior probability of 𝑯 conditioned on X.
• 𝑷 𝑯 is the prior probability of 𝑯.
• 𝑷 𝑿|𝑯 is the posterior probability of 𝑿 conditioned on H.
• 𝑷 𝑿 is the prior probability of 𝑿.
• “How are these probabilities estimated?”
P(H|X) = P(X|H) P(H) / P(X)   …Bayes' Theorem
37. Naïve Bayesian Classification
• A simple Bayesian classifier is known as the naïve Bayesian classifier
• Assumes that the effect of an attribute value on a given class is independent of
the values of the other attributes: class conditional independence.
• It is made to simplify the computations involved and, in this sense, is
considered “naïve.”
38. Naïve Bayesian Classification
• Let 𝐷 be a training set of tuples and their associated class labels.
• Suppose that there are m classes, 𝐶1, 𝐶2, … 𝐶𝑚.
• Given a tuple, 𝑋 = (𝑥1, 𝑥2, … , 𝑥𝑛) depicting 𝑛 measurements made on the tuple from 𝑛
attributes, the classifier will predict that 𝑋 belongs to the class having the highest
posterior probability, conditioned on 𝑋.
• The naïve Bayesian classifier predicts that tuple X belongs to the class 𝐶𝑖 if and only if
𝑃 𝐶𝑖|𝑋 > 𝑃 𝐶𝑗|𝑋 𝑓𝑜𝑟 1 ≤ 𝑗 ≤ 𝑚, 𝑗 ≠ 𝑖
• We maximize 𝑃 𝐶𝑖|𝑋
• The class 𝐶𝑖 for which 𝑃 𝐶𝑖|𝑋 is maximized is called the maximum posteriori
hypothesis.
39. Naïve Bayesian Classification
• By Bayes’ theorem
P(Ci|X) = P(X|Ci) P(Ci) / P(X)
• As 𝑃 𝑋 is constant for all classes, only 𝑃 𝑋|𝐶𝑖 𝑃 𝐶𝑖 need be maximized.
• The naïve Bayesian classifier predicts that tuple X belongs to the class Ci if and only if
𝑃 𝐶𝑖|𝑋 > 𝑃 𝐶𝑗|𝑋 𝑓𝑜𝑟 1 ≤ 𝑗 ≤ 𝑚, 𝑗 ≠ 𝑖
• The class 𝐶𝑖 for which 𝑃 𝐶𝑖|𝑋 is maximized is called the maximum posteriori
hypothesis.
• The class prior probabilities may be estimated by
P(Ci) = |C_i,D| / |D|, where |C_i,D| is the number of training tuples of class C_i in D
40. Naïve Bayesian Classification
• In order to reduce computation in evaluating 𝑃 𝑋|𝐶𝑖 , the naive assumption of class
conditional independence is made.
P(X|Ci) = ∏_{k=1}^{n} P(x_k|Ci) = P(x1|Ci) × P(x2|Ci) × ⋯ × P(xn|Ci)
• In theory, Bayesian classifiers have the minimum error rate in comparison to all other classifiers; in practice this does not always hold, owing to inaccuracies in the assumptions made.
41. Naïve Bayesian Classification: Example
RID age income student credit_rating class: Buys_computer
1 youth high no fair no
2 youth high no excellent no
3 middle_aged high no fair yes
4 senior medium no fair yes
5 senior low yes fair yes
6 senior low yes excellent no
7 middle_aged low yes excellent yes
8 youth medium no fair no
9 youth low yes fair yes
10 senior medium yes fair yes
11 youth medium yes excellent yes
12 middle_aged medium no excellent yes
13 middle_aged high yes fair yes
14 senior medium no excellent no
42. Naïve Bayesian Classification: Example
• Let C1 be the class Buys_computer = yes and C2 be the class Buys_computer = no
• The tuple we wish to classify is
𝑋 = (𝑎𝑔𝑒 = 𝑦𝑜𝑢𝑡ℎ, 𝑖𝑛𝑐𝑜𝑚𝑒 = 𝑚𝑒𝑑𝑖𝑢𝑚, 𝑠𝑡𝑢𝑑𝑒𝑛𝑡 = 𝑦𝑒𝑠, 𝑐𝑟𝑒𝑑𝑖𝑡 𝑟𝑎𝑡𝑖𝑛𝑔 = 𝑓𝑎𝑖𝑟)
• We need to maximize 𝑃 𝑋|𝐶𝑖 𝑃 𝐶𝑖 , for 𝑖 = 1,2
• Calculate 𝑃 𝐶𝑖 , for 𝑖 = 1,2
• 𝑃 𝐵𝑢𝑦𝑠_𝑐𝑜𝑚𝑝𝑢𝑡𝑒𝑟 = 𝑦𝑒𝑠 =
• 𝑃 𝐵𝑢𝑦𝑠_𝑐𝑜𝑚𝑝𝑢𝑡𝑒𝑟 = 𝑛𝑜 =
• Calculate 𝑃 𝑋|𝐶𝑖 , for 𝑖 = 1,2
9/14 = 0.643
5/14 = 0.357
43. Naïve Bayesian Classification: Example
RID age income student credit_rating class: Buys_computer
1 youth high no fair no
2 youth high no excellent no
3 middle_aged high no fair yes
4 senior medium no fair yes
5 senior low yes fair yes
6 senior low yes excellent no
7 middle_aged low yes excellent yes
8 youth medium no fair no
9 youth low yes fair yes
10 senior medium yes fair yes
11 youth medium yes excellent yes
12 middle_aged medium no excellent yes
13 middle_aged high yes fair yes
14 senior medium no excellent no
45. Naïve Bayesian Classification: Example
• Now we calculate, from the above probabilities,
P(X|Buys_computer = yes) = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
Similarly,
P(X|Buys_computer = no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019
• To find the class, Ci, that maximizes P(X|Ci) P(Ci), we compute
P(X|Buys_computer = yes) P(Buys_computer = yes) = 0.044 × 0.643 = 0.028
P(X|Buys_computer = no) P(Buys_computer = no) = 0.019 × 0.357 = 0.007
• Therefore, the naïve Bayesian classifier predicts 𝑩𝒖𝒚𝒔_𝒄𝒐𝒎𝒑𝒖𝒕𝒆𝒓 = 𝒚𝒆𝒔 for tuple X.
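The worked example above can be reproduced in code. Below is a minimal sketch (variable and function names are my own, not from the slides) that estimates P(X|Ci) P(Ci) directly from the 14-tuple Buys_computer table:

```python
# Each row: (age, income, student, credit_rating, class)
data = [
    ("youth", "high", "no", "fair", "no"),
    ("youth", "high", "no", "excellent", "no"),
    ("middle_aged", "high", "no", "fair", "yes"),
    ("senior", "medium", "no", "fair", "yes"),
    ("senior", "low", "yes", "fair", "yes"),
    ("senior", "low", "yes", "excellent", "no"),
    ("middle_aged", "low", "yes", "excellent", "yes"),
    ("youth", "medium", "no", "fair", "no"),
    ("youth", "low", "yes", "fair", "yes"),
    ("senior", "medium", "yes", "fair", "yes"),
    ("youth", "medium", "yes", "excellent", "yes"),
    ("middle_aged", "medium", "no", "excellent", "yes"),
    ("middle_aged", "high", "yes", "fair", "yes"),
    ("senior", "medium", "no", "excellent", "no"),
]
X = ("youth", "medium", "yes", "fair")  # tuple to classify

def score(cls):
    """P(X|Ci) * P(Ci) under class-conditional independence."""
    rows = [r for r in data if r[4] == cls]
    prior = len(rows) / len(data)               # P(Ci)
    likelihood = 1.0
    for k, value in enumerate(X):               # product of P(xk|Ci)
        likelihood *= sum(1 for r in rows if r[k] == value) / len(rows)
    return likelihood * prior

scores = {c: score(c) for c in ("yes", "no")}
prediction = max(scores, key=scores.get)        # "yes", matching the slide
```

The scores come out as roughly 0.028 for yes and 0.007 for no, agreeing with the hand calculation.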
46. Naïve Bayesian Classification: Example
RID age income student credit_rating class: Buys_computer
1 youth high no fair no
2 youth high no excellent no
3 middle_aged high no fair yes
4 senior medium no fair yes
5 senior low yes fair yes
6 senior low yes excellent yes
7 middle_aged low yes excellent yes
8 youth medium no fair no
9 youth low yes fair yes
10 senior medium yes fair yes
11 youth medium yes excellent yes
12 middle_aged medium no excellent yes
13 middle_aged high yes fair yes
14 senior medium no excellent no
Calculate P(Ci), for i = 1, 2:
P(Buys_computer = yes) = 10/14 = 0.714
P(Buys_computer = no) = 4/14 = 0.286
48. Naïve Bayesian Classification: Example
• Now we calculate, from the above probabilities,
P(X|Buys_computer = yes) = 0.200 × 0.400 × 0.700 × 0.600 = 0.034
Similarly,
P(X|Buys_computer = no) = 0.750 × 0.500 × 0 × 0.500 = 0
• To find the class, Ci, that maximizes P(X|Ci) P(Ci), we compute
P(X|Buys_computer = yes) P(Buys_computer = yes) = 0.034 × 0.714 = 0.024
P(X|Buys_computer = no) P(Buys_computer = no) = 0 × 0.286 = 0
• Therefore, the naïve Bayesian classifier predicts 𝑩𝒖𝒚𝒔_𝒄𝒐𝒎𝒑𝒖𝒕𝒆𝒓 = 𝒚𝒆𝒔 for tuple X.
Is this classification correct?
49. Naïve Bayesian Classification
• A zero probability value cancels the effect of all the other (conditional) probabilities on Ci involved in the product.
• To avoid the effect of a zero probability value, the Laplacian correction (or Laplace estimator) is used: we add one to each count.
50. Naïve Bayesian Classification
• E.g. If we have a training database D having 1500 tuples.
• Out of which, 1000 tuples are of class Buys_computer = yes.
• For income attribute we have
• 0 tuples for income = low,
• 960 tuple for income = medium,
• 40 tuples for income = high.
• Using the Laplacian correction for the three quantities, we pretend that we have 1 extra tuple for
each income-value pair.
1/1003 = 0.001,   961/1003 = 0.958,   41/1003 ≈ 0.041
• The “corrected” probability estimates are close to their “uncorrected” counterparts, yet the zero
probability value is avoided.
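The Laplace-corrected estimates above can be sketched in a few lines (variable names are my own): add 1 to each value count and, correspondingly, the number of distinct values to the class total.

```python
# Income counts among the 1000 tuples with Buys_computer = yes (from the slide).
counts = {"low": 0, "medium": 960, "high": 40}
total = sum(counts.values())          # 1000
q = len(counts)                       # 3 distinct income values
# Laplacian correction: pretend one extra tuple per income-value pair.
corrected = {v: (n + 1) / (total + q) for v, n in counts.items()}
# corrected["low"] = 1/1003, "medium" = 961/1003, "high" = 41/1003
```

The zero for income = low becomes a small nonzero probability while the other estimates barely move.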
51. Rule-Based Classification
• The learned model is represented as a set of IF-THEN rules.
• An IF-THEN rule is an expression of the form
IF condition THEN conclusion
• Example: R1: IF age = youth AND student = yes THEN Buys_computer = yes
• R1 can also be written as
R1: (age = youth) ∧ (student = yes) ⇒ (Buys_computer = yes)
The IF part (a set of attribute tests) is the rule antecedent or precondition; the THEN part is the rule consequent, which holds the class prediction.
52. Rule-Based Classification
• If the condition in a rule antecedent holds true for a given tuple, the rule antecedent is satisfied and the rule is said to cover the tuple.
• Evaluation of a rule R:
coverage(R) = n_covers / |D|
accuracy(R) = n_correct / n_covers
where n_covers is the number of tuples covered by R, n_correct is the number of tuples correctly classified by R, and |D| is the number of tuples in D.
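These two measures can be sketched for rule R1 (IF age = youth AND student = yes THEN Buys_computer = yes) on the 14-tuple Buys_computer data; only the attributes R1 tests are kept here, and the variable names are my own.

```python
D = [  # (age, student, class) for RIDs 1..14
    ("youth", "no", "no"), ("youth", "no", "no"), ("middle_aged", "no", "yes"),
    ("senior", "no", "yes"), ("senior", "yes", "yes"), ("senior", "yes", "no"),
    ("middle_aged", "yes", "yes"), ("youth", "no", "no"), ("youth", "yes", "yes"),
    ("senior", "yes", "yes"), ("youth", "yes", "yes"), ("middle_aged", "no", "yes"),
    ("middle_aged", "yes", "yes"), ("senior", "no", "no"),
]
covered = [t for t in D if t[0] == "youth" and t[1] == "yes"]
n_covers = len(covered)                               # 2 (RIDs 9 and 11)
n_correct = sum(1 for t in covered if t[2] == "yes")  # 2
coverage = n_covers / len(D)     # 2/14 ≈ 0.143
accuracy = n_correct / n_covers  # 2/2 = 1.0
```

R1 covers only two tuples (low coverage) but classifies both correctly (perfect accuracy).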
53. Rule-Based Classification: Example
RID age income student credit_rating class: Buys_computer
1 youth high no fair no
2 youth high no excellent no
8 youth medium no fair no
9 youth low yes fair yes
11 youth medium yes excellent yes
12 middle_aged medium no excellent yes
13 middle_aged high yes fair yes
14 senior medium no excellent no
54. Rule-Based Classification
• If a rule is satisfied by X, the rule is said to be triggered.
X= (age = youth, income = medium, student = yes, credit rating = fair)
• X satisfies the rule R1, which triggers the rule.
• If R1 is the only rule satisfied, then the rule fires by returning the class prediction for X.
• If more than one rule is triggered, we need a conflict resolution strategy.
• Size ordering: assigns the highest priority to the triggering rule that has the “toughest”
requirements
• Rule ordering: prioritizes the rules beforehand. The ordering may be class-based or rule-based.
• Class-based ordering: the classes are sorted in order of decreasing “importance”.
• Rule-based ordering: the rules are organized into one long priority list.
55. Rule-Based Classification
• Extracting rules from a decision tree
• One rule is created for each path from the root to a leaf node.
• Each splitting criterion along a given path is logically ANDed to form the rule
antecedent (“IF” part).
• The leaf node holds the class prediction, forming the rule consequent (“THEN”
part).
56. Rule-Based Classification
• Extracting rules from a decision tree
[Decision tree: the root tests age?; the youth branch tests student? (no → Buys_computer = no, yes → Buys_computer = yes); the middle_aged branch is a leaf (Buys_computer = yes); the senior branch tests credit_rating? (fair → Buys_computer = yes, excellent → Buys_computer = no)]
57. Rule-Based Classification
• The rules extracted from the decision tree are
R1: IF age = senior AND credit_rating = excellent THEN Buys_computer = no
R2: IF age = senior AND credit_rating = fair THEN Buys_computer = yes
R3: IF age = middle_aged THEN Buys_computer = yes
R4: IF age = youth AND student = yes THEN Buys_computer = yes
R5: IF age = youth AND student = no THEN Buys_computer = no
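Because one rule is created per root-to-leaf path, the extracted rules are mutually exclusive and can be applied as a simple chain of checks. A minimal sketch (function name is my own; note that on the training data the senior tuples with excellent credit are class no, so the senior branches below follow the data):

```python
def classify(t):
    """Apply the decision-tree rules to a tuple given as a dict."""
    age, student, credit = t["age"], t["student"], t["credit_rating"]
    if age == "middle_aged":                         # middle_aged leaf
        return "yes"
    if age == "youth":                               # youth: decided by student
        return "yes" if student == "yes" else "no"
    # age == "senior": decided by credit_rating
    return "no" if credit == "excellent" else "yes"

X = {"age": "youth", "income": "medium", "student": "yes", "credit_rating": "fair"}
# The youth/student=yes rule covers X, so classify(X) returns "yes".
```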
60. Prediction
• Numeric prediction is the task of predicting continuous (or ordered) values for
given input.
• A widely used approach for numeric prediction is regression.
• Regression is used to model the relationship between one or more independent
or predictor variables and a dependent or response variable.
• The predictor variables are the attributes of interest describing the tuple.
• The response variable is what we want to predict.
• Example: in the tuple X = (age = youth, income = medium, student = yes, credit_rating = fair, Buys_computer = ?), the first four attributes are the predictor variables and Buys_computer is the response variable.
61. Prediction: Linear Regression
• Straight-line regression analysis involves a response variable, 𝑦, and a single
predictor variable, 𝑥.
• It is the simplest form of regression, modeling y as a linear function of x:
y = b + wx
• 𝑏 and 𝑤 are regression coefficients specifying the Y-intercept and slope of the line.
• The coefficients can also be thought of as weights:
y = w0 + w1x
• These coefficients can be solved for by the method of least squares, which estimates
the best-fitting straight line as the one that minimizes the error between the actual
data and the estimate of the line.
62. Prediction: Linear Regression
• The regression coefficients can be estimated by
w1 = Σ_{i=1}^{|D|} (xi − x̄)(yi − ȳ) / Σ_{i=1}^{|D|} (xi − x̄)²
w0 = ȳ − w1·x̄
where x̄ is the mean of x1, …, x|D| and ȳ is the mean of y1, …, y|D|.
63. Prediction: Linear Regression
Age (x)   Avg. amount spent on medical expenses (per month in Rs.) (y)
15 100
20 135
25 135
37 150
40 250
45 270
48 290
50 360
55 375
61 400
64 500
67 1000
70 1500
x̄ = 45.92
ȳ = 420.38
The regression coefficients are
w1 = 16.89
w0 = −355.32
The equation of the least-squares (best-fitting) line is
y = −355.32 + 16.89x
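The least-squares estimates for this data set can be checked with a short script (a direct transcription of the w1 and w0 formulas; variable names are my own):

```python
# Age vs. average monthly medical expenses, from the table above.
xs = [15, 20, 25, 37, 40, 45, 48, 50, 55, 61, 64, 67, 70]
ys = [100, 135, 135, 150, 250, 270, 290, 360, 375, 400, 500, 1000, 1500]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
# w1 = sum (xi - x_bar)(yi - y_bar) / sum (xi - x_bar)^2
w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
w0 = y_bar - w1 * x_bar
# w1 ≈ 16.89, w0 ≈ -355.32, so y ≈ -355.32 + 16.89x
```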
64. Prediction: Linear Regression
[Scatter plot of the age / medical-expenses data above with the fitted least-squares line y = 16.891x − 355.32]
65. Classifier Accuracy Measures
• Confusion Matrix:
• Given m classes, a confusion matrix is a table of at least size m by m,
• where the entry in row i and column j shows the number of tuples of class i that were labeled by the classifier as class j.
66. Classifier Accuracy Measures
• Confusion matrix for 1000 tuples (rows = actual class, columns = predicted class):
Actual \ Predicted   Class – Low   Class – Medium   Class – High
Class – Low          250           10               0
Class – Medium       10            440              10
Class – High         0             10               270
67. Classifier Accuracy Measures
• Classifier Accuracy
• The percentage of test set tuples that are correctly classified by the classifier.
• Also referred to as the overall recognition rate of the classifier.
• Error Measure
• The error rate or misclassification rate of a classifier M is simply
1 − Acc(M)
where Acc(M) is the accuracy of M.
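Accuracy and error rate can be read off the 3-class confusion matrix shown earlier: the correctly classified tuples sit on the diagonal. A minimal sketch (variable names are my own):

```python
# Confusion matrix (rows = actual class, columns = predicted class),
# classes in the order Low, Medium, High.
M = [
    [250, 10, 0],    # actual Low
    [10, 440, 10],   # actual Medium
    [0, 10, 270],    # actual High
]
total = sum(sum(row) for row in M)             # 1000 tuples
correct = sum(M[i][i] for i in range(len(M)))  # diagonal entries = 960
accuracy = correct / total                     # 0.96
error_rate = 1 - accuracy                      # 0.04
```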
68. Classifier Accuracy Measures
• Confusion Matrix: Given 2 classes
• Positive tuples: tuples of the main class of interest.
• Negative tuples: all other tuples.
• True positives: the positive tuples that were correctly labeled by the classifier.
• True negatives: the negative tuples that were correctly labeled by the classifier.
• False positives: the negative tuples that were incorrectly labeled as positive.
• False negatives: the positive tuples that were incorrectly labeled as negative.
69. Classifier Accuracy Measures
• We would like to be able to assess how well the classifier can recognize the positive tuples and how well it can recognize the negative tuples.
• Sensitivity (true positive (recognition) rate)
• The proportion of positive tuples that are correctly identified.
sensitivity = t_pos / pos
• Specificity (true negative rate)
• The proportion of negative tuples that are correctly identified.
specificity = t_neg / neg
• Precision
precision = t_pos / (t_pos + f_pos)
70. Classifier Accuracy Measures
• It can be shown that accuracy is a function of sensitivity and specificity.
accuracy = sensitivity × pos/(pos + neg) + specificity × neg/(pos + neg)
accuracy = (t_pos + t_neg) / (total no. of tuples)
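The decomposition of accuracy into sensitivity and specificity can be checked numerically. The counts below are hypothetical (not from the slides), chosen only to exercise the formulas:

```python
# Hypothetical 2-class counts: 100 positive and 200 negative tuples.
t_pos, f_neg = 90, 10    # positives: correctly / incorrectly labeled
t_neg, f_pos = 180, 20   # negatives: correctly / incorrectly labeled
pos, neg = t_pos + f_neg, t_neg + f_pos

sensitivity = t_pos / pos                  # 0.90
specificity = t_neg / neg                  # 0.90
precision = t_pos / (t_pos + f_pos)        # 90/110
accuracy = (t_pos + t_neg) / (pos + neg)   # 270/300 = 0.90
# accuracy equals the weighted combination of sensitivity and specificity:
weighted = sensitivity * pos / (pos + neg) + specificity * neg / (pos + neg)
```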
71. Predictor Accuracy Measures
• Instead of focusing on whether the predicted value 𝑦′𝑖 is an “exact” match with actual
value 𝑦𝑖 , we check how far off the predicted value is from the actual known value.
• Loss functions measure the error between the actual value yi and the predicted value y′i:
absolute error: |yi − y′i|
squared error: (yi − y′i)²
• The test error (rate), or generalization error, is the average loss over the test set:
mean absolute error = Σ_{i=1}^{d} |yi − y′i| / d
mean squared error = Σ_{i=1}^{d} (yi − y′i)² / d
• If we were to take the square root of the mean squared error, the resulting error measure
is called the root mean squared error.
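These loss measures can be sketched over a small hypothetical test set (the values below are assumed, purely for illustration):

```python
actual = [10.0, 12.0, 15.0, 20.0]     # yi
predicted = [11.0, 11.0, 16.0, 18.0]  # y'i
d = len(actual)
mae = sum(abs(y - yp) for y, yp in zip(actual, predicted)) / d    # (1+1+1+2)/4
mse = sum((y - yp) ** 2 for y, yp in zip(actual, predicted)) / d  # (1+1+1+4)/4
rmse = mse ** 0.5   # root mean squared error
```

Note how the squared error penalizes the single large miss (20 vs. 18) more heavily than the absolute error does.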
72. Predictor Accuracy Measures
• Relative measures of error include
relative absolute error = Σ_{i=1}^{d} |yi − y′i| / Σ_{i=1}^{d} |yi − ȳ|
relative squared error = Σ_{i=1}^{d} (yi − y′i)² / Σ_{i=1}^{d} (yi − ȳ)²
where ȳ is the mean of the actual values y1, …, yd.
• We can take the root of the relative squared error to obtain the root relative squared
error so that the resulting error is of the same magnitude as the quantity predicted.
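The relative measures normalize the loss by how far the actual values sit from their own mean, i.e. by the error of the trivial "always predict the mean" predictor. A sketch over the same kind of hypothetical values as before (assumed, not from the slides):

```python
actual = [10.0, 12.0, 15.0, 20.0]     # yi
predicted = [11.0, 11.0, 16.0, 18.0]  # y'i
y_bar = sum(actual) / len(actual)     # 14.25
rae = sum(abs(y - yp) for y, yp in zip(actual, predicted)) / \
      sum(abs(y - y_bar) for y in actual)
rse = sum((y - yp) ** 2 for y, yp in zip(actual, predicted)) / \
      sum((y - y_bar) ** 2 for y in actual)
rrse = rse ** 0.5   # root relative squared error
```

Values below 1 mean the predictor beats the mean-only baseline.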
73. Accuracy Measures
• Evaluating the Accuracy of a Classifier or Predictor
Holdout
Random Subsampling
Cross Validation
Bootstrap
74. Accuracy Measures
• Holdout
• The given data are randomly partitioned into two independent sets, a training set and a
test set.
• Two-thirds of the data are allocated to the training set, and the remaining one-third is
allocated to the test set.
[Diagram: the data is partitioned into a training set, used to derive the model, and a test set, used to estimate its accuracy]
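The 2/3 : 1/3 holdout split above can be sketched as follows (the seed and the stand-in data are assumed, for reproducibility of the example):

```python
import random

data = list(range(30))        # stand-in identifiers for the tuples in D
random.seed(42)               # assumed seed, only for a repeatable example
random.shuffle(data)          # random partitioning
cut = (2 * len(data)) // 3    # two-thirds for training
train, test = data[:cut], data[cut:]
# The model is derived on `train` and its accuracy estimated on `test`;
# the two sets are independent (disjoint).
```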
75. Accuracy Measures
• Random Subsampling
• A variation of the holdout method in which the holdout method is repeated 𝒌 times.
• The overall accuracy estimate is taken as the average of the accuracies obtained from
each iteration
[Diagram: in each of the k iterations, the data is randomly split into training set i (used to derive a model) and test set i (used to estimate its accuracy)]
76. Accuracy Measures
• Cross Validation
[Diagram: the data is divided into k mutually exclusive folds (partitions) D1, D2, …, Dk; in iteration i, fold Di serves as the test set and the remaining k − 1 folds form the training set]
77. Accuracy Measures
• Cross Validation
• Each sample is used the same number of times for training and once for testing.
• For Classification, the accuracy estimate is the overall number of correct classifications
from the k iterations, divided by the total number of tuples in the initial data.
• For Prediction, the error estimate can be computed as the total loss from the k iterations,
divided by the total number of initial tuples.
• Leave-one-out
• 𝑘 is set to the number of initial tuples. So, only one sample is “left out” at a time for the test set.
• Stratified cross-validation
• The folds are stratified so that the class distribution of the tuples in each fold is approximately
the same as that in the initial data
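The fold mechanics described above can be sketched as follows. The round-robin fold assignment is my own choice (any scheme producing k mutually exclusive partitions works):

```python
def k_fold_splits(n_tuples, k):
    """Yield (train, test) index lists; each fold serves once as test set."""
    folds = [list(range(i, n_tuples, k)) for i in range(k)]  # k disjoint folds
    for i in range(k):
        test = folds[i]
        train = [t for j, f in enumerate(folds) if j != i for t in f]
        yield train, test

# Every tuple appears in exactly one test set across the k iterations,
# so the accuracy estimate divides total correct classifications by n_tuples.
total_tested = sum(len(test) for _, test in k_fold_splits(14, 7))  # 14
```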
78. Accuracy Measures
• Bootstrap
• The bootstrap method samples the given training tuples uniformly with replacement.
• i.e., each time a tuple is selected, it is equally likely to be selected again and re-added to the training set.
• .632 Bootstrap
• On average, 63.2% of the original data tuples will end up in the bootstrap sample, and the remaining 36.8% will form the test set.
• Each tuple has a probability of 1/d of being selected, so the probability of not being chosen is (1 − 1/d).
• We have to select d times, so the probability that a tuple will not be chosen during this whole time is (1 − 1/d)^d.
• If d is large, this probability approaches e⁻¹ ≈ 0.368.
• Thus, 36.8% of tuples will not be selected for training and thereby end up in the test set, and the
remaining 63.2% will form the training set.
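The 63.2% figure can be checked empirically by drawing one bootstrap sample (the seed and sample size are assumed, for a repeatable example):

```python
import random

random.seed(0)                 # assumed seed
d = 10000                      # number of tuples in the original data
# Sample d tuples uniformly with replacement, as the bootstrap prescribes.
sample = [random.randrange(d) for _ in range(d)]
in_training = len(set(sample)) / d   # fraction of distinct tuples drawn
left_out = 1 - in_training           # fraction ending up in the test set
# in_training is close to 0.632 and left_out close to 1/e ≈ 0.368.
```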
79. Accuracy Measures
• Bootstrap
• .632 Bootstrap
• Repeat the sampling procedure 𝑘 times, where in each iteration, we use the current test set to
obtain an accuracy estimate of the model obtained from the current bootstrap sample.
• The overall accuracy of the model is
Acc(M) = Σ_{i=1}^{k} (0.632 × Acc(Mi)_test_set + 0.368 × Acc(Mi)_train_set)