Slides for the 2016/2017 edition of the Data Mining and Text Mining Course at the Politecnico di Milano. The course is also part of the joint program with the University of Illinois at Chicago.
DMTM 2015 - 12 Classification Rules
1. Prof. Pier Luca Lanzi
Classification: Rule Induction
Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
2. Prof. Pier Luca Lanzi
The Weather Dataset
Outlook Temp Humidity Windy Play
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
3. Prof. Pier Luca Lanzi
A Rule Set to Classify the Data
• IF (humidity = high) and (outlook = sunny)
THEN play=no (3.0/0.0)
• IF (outlook = rainy) and (windy = TRUE)
THEN play=no (2.0/0.0)
• OTHERWISE play=yes (9.0/0.0)
• Confusion Matrix
§ yes no -- classified as
§ 7 2 | yes
§ 3 2 | no
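The counts in parentheses, e.g. (3.0/0.0), can be checked against the training data. The following is a minimal sketch, assuming the three rules are applied as an ordered list in which the first matching rule fires; the names `RULES`, `first_matching_rule` and `rule_stats` are illustrative and do not come from the slides.

```python
from collections import Counter

COLUMNS = ["outlook", "temp", "humidity", "windy", "play"]
RAW = """\
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No"""
DATA = [dict(zip(COLUMNS, line.lower().split())) for line in RAW.splitlines()]

# Ordered rule list: (conditions, predicted class).
# The empty condition plays the role of OTHERWISE.
RULES = [
    ({"humidity": "high", "outlook": "sunny"}, "no"),
    ({"outlook": "rainy", "windy": "true"}, "no"),
    ({}, "yes"),
]

def first_matching_rule(instance):
    """Return the index of the first rule whose conditions all hold."""
    for index, (conditions, _) in enumerate(RULES):
        if all(instance[attr] == value for attr, value in conditions.items()):
            return index
    raise ValueError("no rule fired")  # unreachable with a default rule

covered, errors = Counter(), Counter()
for instance in DATA:
    index = first_matching_rule(instance)
    covered[index] += 1
    if RULES[index][1] != instance["play"]:
        errors[index] += 1

# (instances covered, instances misclassified) for each rule, in order
rule_stats = [(covered[i], errors[i]) for i in range(len(RULES))]
print(rule_stats)  # [(3, 0), (2, 0), (9, 0)]
```

These are counts on the training data only, so they say nothing about how the rule set behaves on unseen instances.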
4. Prof. Pier Luca Lanzi
Let’s Check the First Rule
IF (humidity = high) and (outlook = sunny) THEN play=no (3.0/0.0)
The rule covers three instances, all with play = no:
Outlook Temp Humidity Windy Play
Sunny Hot High False No
Sunny Hot High True No
Sunny Mild High False No
5. Prof. Pier Luca Lanzi
Then, The Second Rule
IF (outlook = rainy) and (windy = TRUE) THEN play=no (2.0/0.0)
The rule covers two instances, both with play = no:
Outlook Temp Humidity Windy Play
Rainy Cool Normal True No
Rainy Mild High True No
6. Prof. Pier Luca Lanzi
Finally, the Third Rule
OTHERWISE play=yes (9.0/0.0)
The default rule covers the nine instances not matched by the first two rules, all with play = yes:
Outlook Temp Humidity Windy Play
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Overcast Cool Normal True Yes
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
7. Prof. Pier Luca Lanzi
A Simpler Solution
• IF (outlook = sunny) THEN play IS no
ELSE IF (outlook = overcast) THEN play IS yes
ELSE IF (outlook = rainy) THEN play IS yes
(6/14 instances correct)
• Confusion Matrix
§ yes no -- classified as
§ 4 5 | yes
§ 3 2 | no
9. Prof. Pier Luca Lanzi
What Is a Classification Rule?
Why Rules?
• They are IF-THEN rules
§ The IF part states a condition over the data
§ The THEN part includes a class label
• What types of conditions?
§ Propositional, with attribute-value comparisons
§ First-order Horn clauses, with variables
• Why rules? Because sets of IF-THEN rules are among the most expressive
and most human-readable representations for hypotheses
10. Prof. Pier Luca Lanzi
Coverage and Accuracy
• IF (humidity = high) and (outlook = sunny)
THEN play=no (3.0/0.0)
• n_covers = number of examples covered by the rule
• n_correct = number of examples correctly classified by the rule
• coverage(R) = n_covers / |D|, where |D| is the size of the training data set
• accuracy(R) = n_correct / n_covers
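As a worked instance of the two measures, the sketch below evaluates the rule above on the weather data. The function name `coverage_and_accuracy` and the reduced three-column encoding of the dataset are assumptions made for illustration.

```python
# (outlook, humidity, play) for the 14 weather instances, in slide order
ROWS = [
    ("sunny", "high", "no"), ("sunny", "high", "no"),
    ("overcast", "high", "yes"), ("rainy", "high", "yes"),
    ("rainy", "normal", "yes"), ("rainy", "normal", "no"),
    ("overcast", "normal", "yes"), ("sunny", "high", "no"),
    ("sunny", "normal", "yes"), ("rainy", "normal", "yes"),
    ("sunny", "normal", "yes"), ("overcast", "high", "yes"),
    ("overcast", "normal", "yes"), ("rainy", "high", "no"),
]
DATA = [dict(zip(("outlook", "humidity", "play"), row)) for row in ROWS]

def coverage_and_accuracy(data, conditions, predicted, target="play"):
    """coverage = n_covers / |D|, accuracy = n_correct / n_covers."""
    covered = [r for r in data
               if all(r[attr] == value for attr, value in conditions.items())]
    n_correct = sum(r[target] == predicted for r in covered)
    coverage = len(covered) / len(data)
    # Assumption: a rule that covers nothing gets accuracy 0.0
    accuracy = n_correct / len(covered) if covered else 0.0
    return coverage, accuracy

coverage, accuracy = coverage_and_accuracy(
    DATA, {"humidity": "high", "outlook": "sunny"}, "no")
print(coverage, accuracy)  # coverage is 3/14, accuracy is 1.0
```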
11. Prof. Pier Luca Lanzi
Conflict Resolution
• If more than one rule is triggered, we need conflict resolution
• Size ordering: assign the highest priority to the triggered rule
that has the “toughest” requirement (i.e., the one with the most attribute
tests)
• Class-based ordering: rules are grouped by class, in decreasing order of
prevalence or misclassification cost per class
• Rule-based ordering (decision list): rules are organized into one
long priority list, according to some measure of rule quality or by
experts
12. Prof. Pier Luca Lanzi
Two Approaches for Rule Learning
• Direct Methods
§ Directly learn the rules from the training data
• Indirect Methods
§ Learn decision tree, then convert to rules
§ Learn neural networks, then extract rules
14. Prof. Pier Luca Lanzi
Inferring Rudimentary Rules
• OneRule (1R) learns a simple rule involving one attribute
§ Assumes nominal attributes
§ Learns a set of rules that all test one particular attribute
• Basic version
§ One branch for each value
§ Each branch assigns most frequent class
§ Error rate: proportion of instances that don’t belong to the
majority class of their corresponding branch
§ Choose attribute with lowest error rate
§ “missing” is treated as a separate value
15. Prof. Pier Luca Lanzi
Pseudo-Code for OneRule
For each attribute,
    For each value of the attribute,
    make a rule as follows:
        count how often each class appears
        find the most frequent class
        make the rule assign that class to
        this attribute-value
    Calculate the error rate of the rules
Choose the rules with the smallest error rate
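The pseudo-code can be sketched in Python for nominal attributes as follows. This is an illustrative version, not a reference implementation: the function name `one_rule` is made up, and two tie-breaking choices are assumptions (the first class seen wins within a branch, the first attribute wins between attributes with equal error).

```python
from collections import Counter, defaultdict

COLUMNS = ["outlook", "temp", "humidity", "windy", "play"]
RAW = """\
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No"""
DATA = [dict(zip(COLUMNS, line.lower().split())) for line in RAW.splitlines()]

def one_rule(data, attributes, target):
    """Return (errors, attribute, {value: class}) with the fewest errors."""
    best = None
    for attr in attributes:
        # For each value of the attribute, count how often each class appears
        counts = defaultdict(Counter)
        for row in data:
            counts[row[attr]][row[target]] += 1
        # Each branch predicts its most frequent class (first seen on ties)
        rules = {value: max(c, key=c.get) for value, c in counts.items()}
        # Errors = instances outside the majority class of their branch
        errors = sum(sum(c.values()) - c[rules[value]]
                     for value, c in counts.items())
        if best is None or errors < best[0]:
            best = (errors, attr, rules)
    return best

errors, attribute, rules = one_rule(DATA, COLUMNS[:-1], "play")
print(attribute, rules, f"{errors}/{len(DATA)}")
```

On the nominal weather data this selects outlook with 4 errors out of 14 on the training set; humidity ties at 4/14 and loses only because of the assumed tie-breaking order.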
17. Prof. Pier Luca Lanzi
OneRule and Numerical Attributes
• Applies simple supervised discretization
• Sort instances according to attribute’s values
• Place breakpoints where class changes (majority class)
• This procedure is however very sensitive to noise since one
example with an incorrect class label may produce a separate
interval. This is likely to lead to overfitting.
• In the case of the temperature,
64 65 68 69 70 71 72 72 75 75 80 81 83 85
Yes | No | Yes Yes Yes | No No Yes | Yes Yes | No | Yes Yes | No
18. Prof. Pier Luca Lanzi
OneRule and Numerical Attributes
• To limit overfitting, enforce a minimum number of instances in
the majority class per interval.
• For instance, in the case of the temperature, if we set the
minimum number of majority class instances to 3, we have
Original intervals:
64 65 68 69 70 71 72 72 75 75 80 81 83 85
Yes | No | Yes Yes Yes | No No Yes | Yes Yes | No | Yes Yes | No
Join the intervals to get at least 3 examples of the majority class:
64 65 68 69 70 71 72 72 75 75 80 81 83 85
Yes No Yes Yes Yes | No No Yes Yes Yes | No Yes Yes No
Join the adjacent intervals with the same majority class, which leaves a single breakpoint between 75 and 80 (at 77.5):
64 65 68 69 70 71 72 72 75 75 80 81 83 85
Yes No Yes Yes Yes No No Yes Yes Yes | No Yes Yes No
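The two joining steps can be sketched as below. This is one possible reading of the procedure, not the exact routine used by any particular tool: an interval is closed once its majority class has at least `min_majority` instances and the next instance has a different class and a different value. The names `partition`, `merge_same_majority` and `breakpoints` are illustrative, and ties are broken in favour of the first class seen.

```python
from collections import Counter

def majority(labels):
    counts = Counter(labels)
    return max(counts, key=counts.get)  # ties: first class seen wins

def partition(values, labels, min_majority=3):
    """Greedy left-to-right split into intervals of (value, label) pairs."""
    pairs = sorted(zip(values, labels), key=lambda pair: pair[0])
    intervals, current = [], []
    for i, (value, label) in enumerate(pairs):
        current.append((value, label))
        current_labels = [lab for _, lab in current]
        top = majority(current_labels)
        if current_labels.count(top) >= min_majority and i + 1 < len(pairs):
            next_value, next_label = pairs[i + 1]
            # Close the interval only where the class changes and the
            # breakpoint would not fall between two equal values
            if next_label != top and next_value != value:
                intervals.append(current)
                current = []
    if current:
        intervals.append(current)  # leftover instances form the last interval
    return intervals

def merge_same_majority(intervals):
    """Join adjacent intervals that predict the same majority class."""
    merged = []
    for interval in intervals:
        if merged and (majority([lab for _, lab in merged[-1]])
                       == majority([lab for _, lab in interval])):
            merged[-1] = merged[-1] + interval
        else:
            merged.append(interval)
    return merged

def breakpoints(intervals):
    """Midpoints between the last value of one interval and the first of the next."""
    return [(left[-1][0] + right[0][0]) / 2
            for left, right in zip(intervals, intervals[1:])]

temperature = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
play = "yes no yes yes yes no no yes yes yes no yes yes no".split()

step1 = partition(temperature, play, min_majority=3)
step2 = merge_same_majority(step1)
print(breakpoints(step1))  # [70.5, 77.5]
print(breakpoints(step2))  # [77.5]
```

The last interval (No Yes Yes No) is a tie, so which class it predicts depends on the tie-breaking assumption.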
19. Prof. Pier Luca Lanzi
OneRule Applied to the Numerical Version of the Weather Dataset
Attribute    Rules                     Errors  Total errors
Outlook      Sunny → No                2/5     4/14
             Overcast → Yes            0/4
             Rainy → Yes               2/5
Temperature  ≤ 77.5 → Yes              3/10    5/14
             > 77.5 → No*              2/4
Humidity     ≤ 82.5 → Yes              1/7     3/14
             > 82.5 and ≤ 95.5 → No    2/6
             > 95.5 → Yes              0/1
Windy        False → Yes               2/8     5/14
             True → No*                3/6
21. Prof. Pier Luca Lanzi
Sequential Covering Algorithms
• Consider the set E of positive and negative examples
• Repeat
§ Learn one rule with high accuracy, any coverage
§ Remove positive examples covered by this rule
• Until all the examples are covered
22. Prof. Pier Luca Lanzi
Basic Sequential Covering Algorithm
procedure Covering(Examples, Classifier)
input: a set of positive and negative examples for class c
    // rule set is initially empty
    Classifier = {}
    while PositiveExamples(Examples) != {}
        // find the best rule possible
        Rule = FindBestRule(Examples)
        // check if we need more rules
        if Stop(Examples, Rule, Classifier) break
        // remove covered examples and update the model
        Examples = Examples \ Cover(Rule, Examples)
        Classifier = Classifier ∪ {Rule}
    endwhile
    // post-process the rules (sort them, simplify them, etc.)
    Classifier = PostProcessing(Classifier)
output: Classifier
23. Prof. Pier Luca Lanzi
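Finding the Best Rule Possible
[Figure: general-to-specific search tree over candidate rules]
Most general rule:
IF true THEN Play=yes
Candidates with one condition:
IF Wind=yes THEN Play=yes
IF Wind=No THEN Play=yes
IF Humidity=Normal THEN Play=yes
IF Humidity=High THEN Play=yes
IF … THEN …
Candidates with two conditions:
IF Humidity=Normal AND Wind=yes THEN Play=yes
IF Humidity=Normal AND Wind=No THEN Play=yes
IF Humidity=Normal AND Outlook=Rainy THEN Play=yes
Precision values shown in the figure: P=5/10=0.5, P=6/8=0.75, P=6/7=0.86, P=3/7=0.43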
24. Prof. Pier Luca Lanzi
Another Viewpoint
[Figure: three scatter plots of instances of classes a and b in the x-y plane; the second plot adds a split at x = 1.2 and the third a further split at y = 2.6]
If true then class = a
If x > 1.2 then class = a
If x > 1.2 and y > 2.6
then class = a
25. Prof. Pier Luca Lanzi
And Another Viewpoint
[Figure: four panels showing sequential covering: (i) Original Data, (ii) Step 1, (iii) Step 2 with rule R1, (iv) Step 3 with rules R1 and R2]
26. Prof. Pier Luca Lanzi
Learning Just One Rule
LearnOneRule(Attributes, Examples, k)
    // BH = best hypothesis, CH = candidate hypotheses,
    // NCH = new candidate hypotheses
    init BH to the most general hypothesis
    init CH to {BH}
    while CH not empty do
        generate the next more specific candidates NCH from CH
        // check all the NCH for a hypothesis that
        // improves the performance of BH
        update BH
        update CH with the k best NCH
    endwhile
    return a rule “IF BH THEN prediction”
27. Prof. Pier Luca Lanzi
An Example Using Contact Lens Data
Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
Young           Myope                   No           Reduced               None
Young           Myope                   No           Normal                Soft
Young           Myope                   Yes          Reduced               None
Young           Myope                   Yes          Normal                Hard
Young           Hypermetrope            No           Reduced               None
Young           Hypermetrope            No           Normal                Soft
Young           Hypermetrope            Yes          Reduced               None
Young           Hypermetrope            Yes          Normal                Hard
Pre-presbyopic  Myope                   No           Reduced               None
Pre-presbyopic  Myope                   No           Normal                Soft
Pre-presbyopic  Myope                   Yes          Reduced               None
Pre-presbyopic  Myope                   Yes          Normal                Hard
Pre-presbyopic  Hypermetrope            No           Reduced               None
Pre-presbyopic  Hypermetrope            No           Normal                Soft
Pre-presbyopic  Hypermetrope            Yes          Reduced               None
Pre-presbyopic  Hypermetrope            Yes          Normal                None
Presbyopic      Myope                   No           Reduced               None
Presbyopic      Myope                   No           Normal                None
Presbyopic      Myope                   Yes          Reduced               None
Presbyopic      Myope                   Yes          Normal                Hard
Presbyopic      Hypermetrope            No           Reduced               None
Presbyopic      Hypermetrope            No           Normal                Soft
Presbyopic      Hypermetrope            Yes          Reduced               None
Presbyopic      Hypermetrope            Yes          Normal                None
28. Prof. Pier Luca Lanzi
First Step: the Most General Rule
• Rule we seek:
If ?
then recommendation = hard
• Possible tests (hard instances covered / instances covered):
Age = Young                             2/8
Age = Pre-presbyopic                    1/8
Age = Presbyopic                        1/8
Spectacle prescription = Myope          3/12
Spectacle prescription = Hypermetrope   1/12
Astigmatism = no                        0/12
Astigmatism = yes                       4/12
Tear production rate = Reduced          0/12
Tear production rate = Normal           4/12
29. Prof. Pier Luca Lanzi
Adding the First Clause
• Rule with best test added,
If astigmatism = yes
then recommendation = hard
• Instances covered by modified rule,
Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
Young           Myope                   Yes          Reduced               None
Young           Myope                   Yes          Normal                Hard
Young           Hypermetrope            Yes          Reduced               None
Young           Hypermetrope            Yes          Normal                Hard
Pre-presbyopic  Myope                   Yes          Reduced               None
Pre-presbyopic  Myope                   Yes          Normal                Hard
Pre-presbyopic  Hypermetrope            Yes          Reduced               None
Pre-presbyopic  Hypermetrope            Yes          Normal                None
Presbyopic      Myope                   Yes          Reduced               None
Presbyopic      Myope                   Yes          Normal                Hard
Presbyopic      Hypermetrope            Yes          Reduced               None
Presbyopic      Hypermetrope            Yes          Normal                None
30. Prof. Pier Luca Lanzi
Extending the First Rule
• Current state,
If astigmatism = yes
and ?
then recommendation = hard
• Possible tests,
Age = Young                             2/4
Age = Pre-presbyopic                    1/4
Age = Presbyopic                        1/4
Spectacle prescription = Myope          3/6
Spectacle prescription = Hypermetrope   1/6
Tear production rate = Reduced          0/6
Tear production rate = Normal           4/6
31. Prof. Pier Luca Lanzi
The Second Rule
• Rule with best test added:
If astigmatism = yes
and tear production rate = normal
then recommendation = hard
• Instances covered by modified rule
Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
Young           Myope                   Yes          Normal                Hard
Young           Hypermetrope            Yes          Normal                Hard
Pre-presbyopic  Myope                   Yes          Normal                Hard
Pre-presbyopic  Hypermetrope            Yes          Normal                None
Presbyopic      Myope                   Yes          Normal                Hard
Presbyopic      Hypermetrope            Yes          Normal                None
32. Prof. Pier Luca Lanzi
Adding the Third Clause
• Current state:
If astigmatism = yes
and tear production rate = normal
and ?
then recommendation = hard
• Possible tests:
Age = Young                             2/2
Age = Pre-presbyopic                    1/2
Age = Presbyopic                        1/2
Spectacle prescription = Myope          3/3
Spectacle prescription = Hypermetrope   1/3
• Tie between the first and the fourth test,
we choose the one with greater coverage
33. Prof. Pier Luca Lanzi
The Final Result
• Final rule:
If astigmatism = yes
and tear production rate = normal
and spectacle prescription = myope
then recommendation = hard
• Second rule for recommending “hard lenses”
(built from instances not covered by first rule):
If age = young and astigmatism = yes
and tear production rate = normal
then recommendation = hard
• These two rules cover all “hard lenses”
• Process is repeated with other two classes
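The whole walk-through can be reproduced with a short greedy covering sketch. It assumes the selection criterion used in the example (highest accuracy p/t, ties broken by greater coverage, then by the order in which tests are listed); the function names `learn_one_rule` and `covering` and the shortened attribute names are illustrative.

```python
ATTRS = ["age", "spectacle", "astigmatism", "tear"]
RAW = """\
young myope no reduced none
young myope no normal soft
young myope yes reduced none
young myope yes normal hard
young hypermetrope no reduced none
young hypermetrope no normal soft
young hypermetrope yes reduced none
young hypermetrope yes normal hard
pre-presbyopic myope no reduced none
pre-presbyopic myope no normal soft
pre-presbyopic myope yes reduced none
pre-presbyopic myope yes normal hard
pre-presbyopic hypermetrope no reduced none
pre-presbyopic hypermetrope no normal soft
pre-presbyopic hypermetrope yes reduced none
pre-presbyopic hypermetrope yes normal none
presbyopic myope no reduced none
presbyopic myope no normal none
presbyopic myope yes reduced none
presbyopic myope yes normal hard
presbyopic hypermetrope no reduced none
presbyopic hypermetrope no normal soft
presbyopic hypermetrope yes reduced none
presbyopic hypermetrope yes normal none"""
DATA = [dict(zip(ATTRS + ["lenses"], line.split())) for line in RAW.splitlines()]

def matches(rule, row):
    return all(row[attr] == value for attr, value in rule.items())

def learn_one_rule(rows, target):
    """Add one test at a time until only target instances are covered."""
    rule, covered = {}, rows
    while any(r["lenses"] != target for r in covered) and len(rule) < len(ATTRS):
        best_score, best_test = None, None
        for attr in ATTRS:
            if attr in rule:
                continue
            for value in dict.fromkeys(r[attr] for r in covered):
                subset = [r for r in covered if r[attr] == value]
                p = sum(r["lenses"] == target for r in subset)
                score = (p / len(subset), p)  # accuracy first, then coverage
                if best_score is None or score > best_score:
                    best_score, best_test = score, (attr, value)
        attr, value = best_test
        rule[attr] = value
        covered = [r for r in covered if r[attr] == value]
    return rule

def covering(rows, target):
    """Learn rules for one class until all its instances are covered."""
    rules, remaining = [], list(rows)
    while any(r["lenses"] == target for r in remaining):
        rule = learn_one_rule(remaining, target)
        rules.append(rule)
        remaining = [r for r in remaining if not matches(rule, r)]
    return rules

hard_rules = covering(DATA, "hard")
for rule in hard_rules:
    print(rule)
# {'astigmatism': 'yes', 'tear': 'normal', 'spectacle': 'myope'}
# {'age': 'young', 'astigmatism': 'yes', 'tear': 'normal'}
```

This sketch removes every instance a rule covers and has no stopping test or pruning step, so it will happily fit noise on less clean data.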
34. Prof. Pier Luca Lanzi
Testing for the Best Rule
• Measure 1: Accuracy (p/t)
§ t = total number of instances covered by the rule,
p = number of these that are positive
§ Produces rules that do not cover negative instances,
as quickly as possible
§ May produce rules with very small coverage
(special cases or noise?)
• Measure 2: Information gain p × (log(p/t) - log(P/T))
§ P and T are the positive and total numbers before the new
condition was added
§ Information gain emphasizes positive rather than negative
instances
• These measures interact with the pruning mechanism used
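To see how the two measures differ, the sketch below scores two candidate tests from the contact lens example at the point where the current rule covers P = 4 hard instances out of T = 6. The base-2 logarithm is an assumption (the formula does not fix the base), and the function names are illustrative.

```python
import math

def accuracy(p, t):
    """Measure 1: fraction of covered instances that are positive."""
    return p / t

def information_gain(p, t, P, T):
    """Measure 2: p * (log(p/t) - log(P/T)), with log base 2 assumed.

    Requires p > 0, since log(0) is undefined.
    """
    return p * (math.log2(p / t) - math.log2(P / T))

P, T = 4, 6          # rule so far: astigmatism = yes and tear rate = normal
young = (2, 2)       # Age = Young covers 2 instances, both hard
myope = (3, 3)       # Spectacle prescription = Myope covers 3, all hard

acc_young, acc_myope = accuracy(*young), accuracy(*myope)
gain_young = information_gain(*young, P, T)
gain_myope = information_gain(*myope, P, T)

print(acc_young, acc_myope)                        # 1.0 1.0
print(round(gain_young, 2), round(gain_myope, 2))  # 1.17 1.75
```

Both tests are perfectly accurate, so accuracy alone cannot separate them; information gain prefers the test that keeps more positive instances.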
35. Prof. Pier Luca Lanzi
Eliminating Instances
• Why do we need to eliminate instances?
§ Otherwise, the next rule is identical to previous rule
• Why do we remove positive instances?
§ To ensure that the next rule is different
• Why do we remove negative instances?
§ Prevent underestimating accuracy of rule
§ Compare rules R2 and R3 in the following diagram
37. Prof. Pier Luca Lanzi
Missing Values and Numeric Attributes
• Missing values usually fail the test
• Covering algorithm must either
§ Use other tests to separate out positive instances
§ Leave them uncovered until later in the process
• In some cases it is better to treat “missing” as a separate value
• Numeric attributes are treated as in decision trees
38. Prof. Pier Luca Lanzi
Stopping Criterion and Rule Pruning
• The process usually stops when there is no significant
improvement by adding the new rule
• Rule pruning is similar to post-pruning of decision trees
• Reduced Error Pruning:
§ Remove one of the conjuncts in the rule
§ Compare error rate on validation set
§ If error improves, prune the conjunct
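Reduced error pruning of a single rule can be sketched as follows. The validation instances are invented purely for illustration and are not part of the weather dataset. Two further assumptions: the error of a rule is measured only on the validation instances it covers, and a rule that covers none gets error 1.0.

```python
def error_rate(conditions, predicted, validation, target="play"):
    """Fraction of covered validation instances the rule gets wrong."""
    covered = [r for r in validation
               if all(r[attr] == value for attr, value in conditions.items())]
    if not covered:
        return 1.0  # assumption: an unused rule is treated as worst case
    return sum(r[target] != predicted for r in covered) / len(covered)

def reduced_error_pruning(conditions, predicted, validation):
    """Drop conjuncts one at a time while the validation error improves."""
    conditions = dict(conditions)
    improved = True
    while improved and conditions:
        improved = False
        current = error_rate(conditions, predicted, validation)
        for attr in list(conditions):
            candidate = {a: v for a, v in conditions.items() if a != attr}
            if error_rate(candidate, predicted, validation) < current:
                conditions, improved = candidate, True
                break
    return conditions

# Hypothetical validation set (made up for this example)
FIELDS = ("outlook", "temp", "humidity", "play")
VALIDATION = [dict(zip(FIELDS, row)) for row in [
    ("sunny", "hot", "high", "no"),
    ("sunny", "hot", "high", "yes"),
    ("sunny", "mild", "high", "no"),
    ("sunny", "mild", "high", "no"),
    ("rainy", "hot", "high", "yes"),
    ("overcast", "hot", "high", "yes"),
]]

# An over-specific rule: the test on temp is not needed
rule = {"humidity": "high", "outlook": "sunny", "temp": "hot"}
pruned = reduced_error_pruning(rule, "no", VALIDATION)
print(pruned)  # {'humidity': 'high', 'outlook': 'sunny'}
```

Removing the test on temp lowers the validation error from 1/2 to 1/4, while removing either of the remaining tests does not improve it, so pruning stops there.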
39. Prof. Pier Luca Lanzi
Rules vs. Trees
• Rule sets can be more readable
• Decision trees suffer from replicated subtrees
• Rule sets are collections of local models, trees represent models
over the whole domain
• The covering algorithm concentrates on one class at a time
whereas decision tree learner takes all classes into account
40. Prof. Pier Luca Lanzi
Mining Association Rules for Classification
41. Prof. Pier Luca Lanzi
Mining Association Rules for Classification
(the CBA algorithm)
• Association rule mining assumes that the data consist of a set of
transactions. Thus, the typical tabular representation of data used
in classification must be mapped into such a format.
• Association rule mining is then applied to the new dataset and
the search is focused on association rules in which the tail
identifies a class label
X⇒ ci (where ci is a class label)
• The association rules are pruned using the pessimistic error-based
method used in C4.5
• Finally, rules are sorted to build the final classifier.
50. Prof. Pier Luca Lanzi
Summary
• Advantages of Rule-Based Classifiers
§ As highly expressive as decision trees
§ Easy to interpret
§ Easy to generate
§ Can classify new instances rapidly
§ Performance comparable to decision trees
• Two approaches: direct and indirect methods
51. Prof. Pier Luca Lanzi
Summary
• Direct methods typically apply a sequential covering approach
§ Grow a single rule
§ Remove the instances covered by the rule
§ Prune the rule (if necessary)
§ Add the rule to the current rule set
§ Repeat
• Other approaches exist
§ Specific-to-general exploration (RISE)
§ Post-processing of neural networks,
association rules, decision trees, etc.
52. Prof. Pier Luca Lanzi
Homework
• Generate the rule set for the Weather dataset by repeatedly
applying the procedure to learn one rule until no improvement
can be produced or the covered examples are too few
• Check the problems provided in the previous exams and apply
both OneRule and Sequential Covering to generate the first rule.
Then, check the result with one of the implementations available
in Weka