Data mining
Assignment week 4




BARRY KOLLEE

10349863
Assignment 4

Exercise 1: Pruning
1. Which problem do we try to address when using pruning?

“Overfitting and lack of generalization beyond training data, i.e. models that describe the training data
(too) well, but do not model the principles and characteristics underlying the data.”

At the schematic level, pruning replaces a part of the tree (a subtree) with a single node. The
difference is shown in the two diagrams below:




2. Describe the purpose of separating the data into training, development, and test data.

“Training data is used to build the model, and test data to test it. Just the training data by itself is not
able to measure to what extent the model will perform on (i.e. generalize to) unseen data. Test data
measures this, but we should not use the test data to directly inform our model construction. For this
purpose a third set is used: the development data set, which behaves like the test set but the feedback
can be used to change the model.”

We use the training set to fit the classifier that we later apply to the data. In general, the more
training data we have, the more accurate the resulting model tends to be.

The other two sets are used to evaluate the performance of the classifier. The development set
is used to compare the accuracy of different configurations of our classifier. It’s called the development
set because we evaluate against it repeatedly while the model is still being developed.

In the end we have a model that performs well on the development data. To estimate how well
this final model will deal with new, unseen data we use the test data, which has played no part in building it.
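
The three-way split described above can be sketched in a few lines of Python. This is only an illustration: the function name split_dataset, the 60/20/20 proportions and the fixed random seed are assumptions made for the example and do not come from the assignment.

```python
import random

def split_dataset(instances, train_frac=0.6, dev_frac=0.2, seed=0):
    """Shuffle the instances and cut them into training, development and test sets."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)   # fixed seed so the split is reproducible
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    train = shuffled[:n_train]                   # used to build the model
    dev = shuffled[n_train:n_train + n_dev]      # used to compare configurations
    test = shuffled[n_train + n_dev:]            # used once, for the final estimate
    return train, dev, test

train, dev, test = split_dataset(range(100))
print(len(train), len(dev), len(test))
```

Every instance ends up in exactly one of the three sets, so feedback from the development set never leaks into the final test estimate.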






Exercise 2: Information Gain and Attributes with Many Values.

Information gain is defined as:

       Gain(S, A) = H(S) - sum over v in Values(A) of (|Sv| / |S|) * H(Sv)

According to this definition, information gain favors attributes with many values.
Why? Give an example.


We use a training set (shown in the table below) with:

        •        n instances
        •        k ordinary attributes A1 … Ak, plus one attribute A* that takes a different value for every instance


            Instance    A1    …    Ak       A*    class
            1           T     …    Black    V1    C1
            2           T     …    White    V2    C2
            ..          ..    …    …        …     …
            n           F     …    Black    Vn    Cn

Suppose every instance is classified as either ‘+’ or ‘-’. Because each instance has its own value Vi of
A*, splitting on A* produces n subsets that each contain exactly one instance. Each subset is therefore
either purely ‘+’ or purely ‘-’. We note this as follows.




                                [1+, 0-]    if instance i is classified as ‘+’
       SVi (A*) = {
                                [0+, 1-]    if instance i is classified as ‘-’



We can calculate the entropy (uncertainty) of both kinds of subset, using the convention 0 log2 0 = 0:



       H(S+) = - ((1/1) log2 (1/1) + (0/1) log2 (0/1)) = 0

       H(S-) = - ((0/1) log2 (0/1) + (1/1) log2 (1/1)) = 0


To calculate the information gain we apply the following formula, where the sum runs over all values v of A*:


       Gain(S, A*) = H(S) - sum over v of (|Sv(A*)| / |S|) * H(Sv(A*))

       Gain(S, A*) = H(S) - (weighted sum of the H(S+) terms + weighted sum of the H(S-) terms)
       Gain(S, A*) = H(S) - (0 + 0)
       Gain(S, A*) = H(S)


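We see that H(S+) and H(S-) are both 0, so nothing is subtracted and the information gain equals H(S), the
highest value it can reach. An attribute with a different value for every instance always achieves this maximum,
even though it tells us nothing about unseen data. This is why information gain favors attributes with many values.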
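
The calculation above can be checked numerically with a small Python sketch. The four-instance data set below is hypothetical and only serves as an illustration: a_star stands in for A* (one value per instance) and a_1 for an ordinary two-valued attribute. The function names are likewise chosen for this example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy H(S) of a list of class labels, in bits."""
    total = len(labels)
    # Only labels that actually occur are counted, so 0 * log2(0) never has to be evaluated.
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain(S, A) = H(S) - sum over v of (|Sv| / |S|) * H(Sv)."""
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Hypothetical toy data: four instances, two of each class.
labels = ["+", "+", "-", "-"]
a_star = ["V1", "V2", "V3", "V4"]   # a different value for every instance, like A*
a_1 = ["T", "T", "F", "T"]          # an ordinary attribute with two values

print(information_gain(a_star, labels))   # equals H(S), here 1 bit
print(information_gain(a_1, labels))      # strictly smaller
```

On this toy data the many-valued attribute reaches the full entropy H(S), while the ordinary attribute recovers only part of it.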




Exercise 3: Missing Attribute Values
Consider the following set of training instances. Instance 2 has a missing value for attribute a1.

Apply at least two different strategies for dealing with missing attribute values and show how they
work in this concrete example.

Example 1 :

We can predict the true/false value of the missing attribute ‘a1’ by looking at the values of
attribute a2. Within a2 there’s an equal chance of having a ‘true’ value and having a ‘false’
value (50 % each). We could assume the same distribution for attribute a1. In conclusion: with this way of
thinking, the value at the question mark could be ‘false’.

Example 2:

We can also focus on the class attribute. Within a2 we can state the following:
   •    There’s a 100 % chance of having a ‘+’ class when the attribute value is ‘true’.
   •    There’s a 50 % chance of having a ‘+’ class when the attribute value is ‘false’.

With this way of thinking we should write down the ‘true’ value at the question mark.

Example 3:

Now we look only at attribute a1. From its known values we can estimate the probability of each value
that could replace the question mark:


       P(true) = 2/3
       P(false) = 1/3

The most probable value is ‘true’, so we write down ‘true’ at the question mark.
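
Two ways of using these probabilities can be sketched in Python. The original table of training instances is not reproduced here, so the a1 column below is a hypothetical stand-in chosen only so that it yields P(true) = 2/3 and P(false) = 1/3. The function names are also assumptions made for this illustration.

```python
from collections import Counter

# Hypothetical a1 column; None marks the missing value of instance 2.
a1 = [True, None, True, False]

def value_distribution(column):
    """Relative frequency of each known value, ignoring missing entries."""
    known = [v for v in column if v is not None]
    counts = Counter(known)
    return {value: count / len(known) for value, count in counts.items()}

def impute_most_common(column):
    """Strategy A: replace each missing entry with the most frequent known value."""
    dist = value_distribution(column)
    most_common = max(dist, key=dist.get)
    return [most_common if v is None else v for v in column]

def impute_fractional(column):
    """Strategy B: replace each missing entry with weights over all known values."""
    dist = value_distribution(column)
    return [dict(dist) if v is None else {v: 1.0} for v in column]

print(value_distribution(a1))
print(impute_most_common(a1))
print(impute_fractional(a1)[1])
```

Strategy A commits to a single value, whereas strategy B keeps the uncertainty by letting the instance count partly as ‘true’ and partly as ‘false’.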







Exercise 4: Regression Trees

1. What are the stopping conditions for decision trees predicting discrete
classes?

       1.   All instances under a node have the same label.
       2.   All attributes have been used along a branch.
       3.   There are no instances under a node.


Because every training instance carries exactly one label from a predefined set of outcomes, we can
check whether all instances under a node share the same label. We’ve seen this with the weather example
from the lecture, where the outcome is either ‘Yes’ or ‘No’.
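
The three stopping conditions can be written as one small check. This is a sketch rather than the lecture's implementation: the function name should_stop and the attribute names ‘outlook’ and ‘windy’ are assumptions used only for the example.

```python
def should_stop(labels, remaining_attributes):
    """Return the reason to stop splitting at a node, or None to keep splitting.

    labels: class labels of the training instances under the node.
    remaining_attributes: attributes not yet used on the path from the root.
    """
    if not labels:
        return "no instances under this node"
    if len(set(labels)) == 1:
        return "all instances have the same label"
    if not remaining_attributes:
        return "all attributes have been used along this branch"
    return None

print(should_stop(["Yes", "Yes", "Yes"], ["outlook", "windy"]))
print(should_stop(["Yes", "No"], []))
print(should_stop(["Yes", "No"], ["windy"]))
```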




2. Why and how do the stopping conditions have to be changed for decision
trees that predict numerical values (e.g., regression trees)?

1. Measure the standard deviation of the target values of all instances under a node. If this value is
below a predefined threshold, we stop.
2. and 3. stay as before.

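Instead of predicting a discrete value like ‘yes’ or ‘no’, a regression tree predicts a numerical value that
can be any point within a range. For example, for temperature we predict a particular number of degrees
instead of ‘hot’ or ‘warm’. Numerical values under a node are rarely exactly equal, so the condition ‘all
instances have the same label’ would almost never hold. Requiring a small spread instead still gives us a
usable stopping condition within our decision tree.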
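
The changed first condition can be sketched as follows, using the population standard deviation from Python's standard library. The threshold max_std = 1.0, the function name and the temperature values are assumptions picked for this illustration.

```python
from statistics import pstdev

def should_stop_regression(targets, remaining_attributes, max_std=1.0):
    """Return the reason to stop splitting a regression-tree node, or None.

    targets: numerical target values of the training instances under the node.
    max_std: predefined threshold on the standard deviation (hypothetical value).
    """
    if not targets:
        return "no instances under this node"
    if pstdev(targets) < max_std:
        return "standard deviation below threshold"
    if not remaining_attributes:
        return "all attributes have been used along this branch"
    return None

# Temperatures that are close together: stop and predict, for example, their mean.
print(should_stop_regression([20.1, 20.4, 19.8], ["humidity"]))
# Temperatures that are far apart: keep splitting.
print(should_stop_regression([5.0, 25.0], ["humidity"]))
```

A leaf created this way would typically predict the mean of its target values, which is why a small spread is a sensible reason to stop.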




Data mining assignment 4

  • 1. Data mining Assignment week 4 BARRY KOLLEE 10349863
  • 2. Assignment  4     Exercise 1: Pruning 1. Which problem do we try to address when using pruning? “Overfitting and lack of generalization beyond training data, i.e. models that describe the training data (too) well, but do not model the principles and characteristics underlying the data.” On schema level we state that pruning merges a part of the tree together into one node. The difference is descripted within the two schema’s below: 2. Describe the purpose of separating the data into training, development, and test data. “Training data is used to build the model, and test data to test it. Just the Training data by itself is not able to measure to what extend the model will perform (i.e.. generalize to) on unseen data. Test data measures this, but we should not use the test data to directly inform our model construction. For this purpose a third set is used: the development data set, which behaves like the test set but the feedback can be used to change the model” We create our training set to increase the accuracy of the classifier, which we use on the data. The more data we train the more accurate the resulting model will be. The other two sets are used to evaluate the performance of the classifier we use. The development set is used to evaluate the accuracy of different configurations of our classifier. It’s called the development set because we continuously need to evaluate the classification performance. In the end we’ve got a model, which has a great performance on the test data. To get estimates on how good the new model will deal with new data we use the test data. 2
  • 3. Exercise 2: Information Gain and Attributes with Many Values.

    Information gain is defined as:

    $$\mathrm{Gain}(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \, H(S_v)$$

    According to this definition, information gain favors attributes with many values. Why? Give an example.

    We use a training set with (as shown in the table):

    • n instances
    • attributes A1 … Ak, plus an attribute A* that has a different value (V1 … Vn) for every instance

    Instance | A1 | …  | Ak    | A* | class
    1        | T  | …  | Black | V1 | C1
    2        | T  | …  | White | V2 | C2
    ..       | .. | …  | …     | …  | …
    n        | F  | …  | Black | Vn | Cn

    Assume the class is binary, so every instance is classified as either ‘+’ or ‘-’. If we split on A*, every value Vi selects exactly one instance, and that instance is either a plus or a minus. We note this as follows:

    $$S_{V_i}(A^*) = [1+, 0-] \quad \text{or} \quad [0+, 1-]$$

    We can calculate the entropy (uncertainty) of both kinds of subset, using the convention $0 \log_2 0 = 0$:

    $$H(S^{+}) = -\left(\tfrac{1}{1}\log_2\tfrac{1}{1} + \tfrac{0}{1}\log_2\tfrac{0}{1}\right) = 0$$
    $$H(S^{-}) = -\left(\tfrac{0}{1}\log_2\tfrac{0}{1} + \tfrac{1}{1}\log_2\tfrac{1}{1}\right) = 0$$

    To calculate the information gain we apply the formula:

    $$\mathrm{Gain}(S, A^*) = H(S) - \sum_{v} \frac{|S_v(A^*)|}{|S|} \, H(S_v(A^*))$$

    Every term in the sum is a weighted $H(S^{+})$ or $H(S^{-})$, so:

    $$\mathrm{Gain}(S, A^*) = H(S) - (0 + 0) = H(S)$$

    Because the entropy of every subset is 0, nothing is subtracted from H(S), and A* reaches the highest information gain possible, even though it says nothing useful about unseen instances. In general, the more values an attribute has, the smaller and purer its subsets tend to be, which is why information gain favors attributes with many values.
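    The argument above can be checked numerically. The following is a minimal sketch with a made-up four-instance training set; the helper names `entropy` and `information_gain` and the data values are assumptions chosen for illustration, not the table from the assignment.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """H(S) minus the weighted entropy of the subsets induced by an attribute."""
    n = len(labels)
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Hypothetical data: A* has a unique value per instance, colour has two values.
labels = ["+", "-", "+", "-"]
a_star = ["V1", "V2", "V3", "V4"]
colour = ["Black", "White", "Black", "Black"]

gain_a_star = information_gain(a_star, labels)
gain_colour = information_gain(colour, labels)
```

    Here `gain_a_star` equals `entropy(labels)`, the maximum, while `gain_colour` is smaller, although A* is useless for classifying new instances.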
  • 4. Exercise 3: Missing Attribute Values

    Consider the following set of training instances. Instance 2 has a missing value for attribute a1. Apply at least two different strategies for dealing with missing attribute values and show how they work in this concrete example.

    [Table: training instances with attributes a1 and a2 and a class; a1 is ‘?’ for instance 2]

    Example 1: We can predict the true/false value of the missing attribute a1 by looking at the values of a2. Within a2 there is an equal chance of a ‘true’ and a ‘false’ value (50 % each). If we assume the same holds for a1, the missing value would be ‘false’, because that brings a1 closer to an equal split.

    Example 2: We can also focus on the class attribute. Within a2 we can state the following:

    • When the value is ‘true’, there is a 100 % chance of a ‘+’ class.
    • When the value is ‘false’, there is a 50 % chance of a ‘+’ class.

    With this way of thinking we should fill in ‘true’ at the question mark.

    Example 3: Now we only look at attribute a1. From the instances where a1 is known we can estimate the probability of each value that could replace the question mark:

    P(true) = 2/3
    P(false) = 1/3

    The most probable value is therefore ‘true’.
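    Example 3 can be sketched in code as follows. The training table is not reproduced here, so the list `a1` below is a hypothetical column chosen only to match P(true) = 2/3 and P(false) = 1/3; the function names are likewise assumptions.

```python
from collections import Counter
from fractions import Fraction

def value_distribution(values):
    """Relative frequency of each known value; None marks a missing value."""
    known = [v for v in values if v is not None]
    counts = Counter(known)
    return {value: Fraction(count, len(known)) for value, count in counts.items()}

def impute_most_common(values):
    """Replace every missing value with the most frequent known value."""
    dist = value_distribution(values)
    most_common = max(dist, key=dist.get)
    return [most_common if v is None else v for v in values]

# Hypothetical a1 column: instance 2 (index 1) is missing.
a1 = [True, None, True, False]

dist = value_distribution(a1)    # P(true) = 2/3, P(false) = 1/3
filled = impute_most_common(a1)  # the gap is filled with True
```

    Instead of committing to the most common value, the same distribution could also be used to split the instance into fractional instances weighted 2/3 and 1/3.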
  • 5. Exercise 4: Regression Trees

    1. What are the stopping conditions for decision trees predicting discrete classes?

       1. All instances under a node have the same label.
       2. All attributes have been used along a branch.
       3. There are no instances under a node.

    Because every instance carries one of a fixed set of labels, only one of the predefined outcomes can be the correct one. We have seen this with the weather example from the lecture. Since the outcomes are predefined (‘yes’ or ‘no’), we can stop as soon as a node contains only one of them.

    2. Why and how do the stopping conditions have to be changed for decision trees that predict numerical values (e.g., regression trees)?

       1. Measure the standard deviation of all instances under a node. If this value is below a predefined value, we stop.
       2. and 3. as before.

    Instead of a fixed value like ‘yes’ or ‘no’, the target is now a number that can be any point within a range. For example, for temperature we predict a particular number of degrees instead of ‘hot’ or ‘warm’. Numerical targets are rarely exactly equal, so the condition ‘all instances have the same label’ would almost never apply; requiring the values to be close enough (a low standard deviation) takes its place. In this way we can still put several stopping conditions in our decision tree.
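    The modified stopping conditions could be sketched like this. The function name `should_stop`, the threshold of 1.0, and the sample temperatures are hypothetical; the population standard deviation is used here, which is one possible choice.

```python
import statistics

def should_stop(targets, remaining_attributes, min_std=1.0):
    """Decide whether a regression-tree node should become a leaf.

    targets: numerical target values of the instances under the node.
    remaining_attributes: attributes not yet used along this branch.
    min_std: predefined threshold (an illustrative assumption).
    """
    if not targets:                # condition 3: no instances under the node
        return True
    if not remaining_attributes:   # condition 2: all attributes have been used
        return True
    # condition 1 (changed): the target values are close enough together
    return statistics.pstdev(targets) < min_std

stop_similar = should_stop([20.0, 20.5, 19.8], ["humidity"])  # low spread
stop_spread = should_stop([5.0, 30.0], ["humidity"])          # high spread
```

    With these values `stop_similar` is true and `stop_spread` is false, so only the second node would be split further.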