Data mining
Assignment week 1




BARRY KOLLEE

10349863
Assignment 1

Exercise 1: Data Mining in General
Describe in half a page to one page two scenarios to which you think one could apply data
mining. Preferably these two scenarios should be relevant to your professional or personal
interests. Describe what you would like to predict with data mining methods and what the
relevant attributes in these applications are. Describe also what type of data you would use
and what kind of problems you could anticipate.

What is the chance that next Sunday's 14:30 football match will be cancelled?

Football matches in my competition have often been cancelled because of bad weather. Unfortunately, it
always takes a while before we are notified of a cancellation. It would be great to be able to predict
whether a match will go ahead, so that we can plan our Sunday with or without the team.

To predict whether the match will go ahead, we should base our decision on several attributes.
Attributes we could take into account are:

       •   Amount of rain (mm): the amount of rain that has fallen over a certain period
           (hours/days/weeks).
       •   Temperature (degrees): the temperature should definitely be taken into account.
       •   Humidity: a numerical value, for example the amount of water vapour (in grams) per cubic
           metre of air.
       •   Weather conditions: an attribute with discrete values such as sunny, overcast and rainy.
       •   Has scattered sand (yes/no): whether the groundsman has used sand to drain water from the
           field.
       •   Has artificial grass (yes/no): on artificial grass we need to worry less about the weather
           conditions.
       •   Will another match be played at the same time (yes/no)? If there is only one artificial grass
           field, another team may get priority.

The data could come from several weather stations and from the football club where the match will be
played. The main problem is to identify the conditions under which we cannot play: how do the weather
conditions when matches are cancelled differ from those when we do play?
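
As a minimal sketch of how such data could be organised, each past match can be stored as one record that holds the attributes listed above plus the known outcome. The field names, the example values and the helper cancellation_rate_by_weather are assumptions made for illustration only; they do not come from a real data set.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MatchRecord:
    """One past match, described by the attributes above (hypothetical names)."""
    rain_mm: float               # rain fallen in the chosen period before the match
    temperature_c: float         # temperature in degrees Celsius
    humidity: float              # numerical humidity value
    weather: str                 # discrete value: "sunny", "overcast" or "rainy"
    has_scattered_sand: bool     # did the groundsman use sand to drain the field?
    has_artificial_grass: bool
    other_match_same_time: bool
    cancelled: bool              # the outcome we would like to predict


def cancellation_rate_by_weather(records):
    """Return, per weather condition, the fraction of matches that were cancelled."""
    totals = defaultdict(int)
    cancelled = defaultdict(int)
    for record in records:
        totals[record.weather] += 1
        if record.cancelled:
            cancelled[record.weather] += 1
    return {weather: cancelled[weather] / totals[weather] for weather in totals}


# Made-up example records, only to show the shape of the data.
history = [
    MatchRecord(0.0, 18.0, 7.0, "sunny", False, False, False, cancelled=False),
    MatchRecord(2.5, 12.0, 9.0, "overcast", False, False, False, cancelled=False),
    MatchRecord(30.0, 8.0, 8.0, "rainy", True, False, False, cancelled=True),
    MatchRecord(25.0, 9.0, 8.5, "rainy", False, True, False, cancelled=False),
]

rates = cancellation_rate_by_weather(history)
print(rates)
```

Comparing cancellation rates per weather condition is only a first look at the data; a real prediction would combine all attributes.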

What is the chance that Ajax wins its next football match against Real Madrid?

Next Tuesday Ajax plays against Real Madrid, and we would like to predict the outcome of this match.
Instead of hoping for an octopus that shows us the outcome of every match, we would like to make the
prediction ourselves. There are several attributes we can take into account:

       •   Number of players
       •   Number of injuries in team
       •   Number of goals during previous matches
       •   Results of the clubs in their national competitions
       •   Difference in value of the players (how much is each player worth?)

The main problem in this example is that football involves so many variables that it is really hard to
pinpoint the bottlenecks in the prediction. Referees, for example, can also influence the outcome of a
match.
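
One way to turn the attributes listed above into input for a prediction method is to describe the match by the difference between the two teams on each attribute. The sketch below assumes hypothetical field names, and the numbers are invented and do not reflect the real clubs.

```python
from dataclasses import dataclass


@dataclass
class TeamStats:
    """Attributes of one team before the match (hypothetical field names)."""
    available_players: int
    injuries: int
    goals_previous_matches: int
    league_points: int           # stands in for results in the national competition
    squad_value_million: float   # total value of the players


def difference_features(team, opponent):
    """Describe a match by the difference between the two teams per attribute."""
    return {
        "available_players": team.available_players - opponent.available_players,
        "injuries": team.injuries - opponent.injuries,
        "goals_previous_matches": team.goals_previous_matches
        - opponent.goals_previous_matches,
        "league_points": team.league_points - opponent.league_points,
        "squad_value_million": team.squad_value_million
        - opponent.squad_value_million,
    }


# Invented numbers, for illustration only.
team_a = TeamStats(22, 3, 10, 25, 100.0)
team_b = TeamStats(24, 1, 14, 28, 400.0)

features = difference_features(team_a, team_b)
print(features)
```

A positive value means the first team scores higher on that attribute, a negative value that the opponent does.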




Exercise 2: Training and Test Data
Describe the difference between a training set and a test set. What would happen if we do
not make that distinction and combine all available data into one single set?

Training set:

The training set is the data from which the model learns. If we want to predict whether we will play
football tomorrow, the learning method uses these examples to determine how much weight each attribute
should get. It may learn, for instance, that when the weather is sunny it no longer matters whether we
have an artificial grass field.

Test set:

A test set is used to check whether the model built from the training set does what it needs to do: does
it predict well on examples it has not seen before?

The main difference between the two is that the training set is used to set the weights, while the test
set is kept apart and is only used to check whether those weights have been set properly. If we did not
make this distinction and combined all data into one set, we would evaluate the model on the same
examples it learned from. The result would look better than it really is, and we would not notice badly
set weights. For example, the model might wrongly predict that we cannot play football on sunny days
because only one artificial field is available next to 7 regular football fields.
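
The effect of mixing the two sets can be illustrated with a small sketch. The examples, the helper split_train_test and the deliberately naive MemorizingModel are made up for this illustration; the model simply remembers every training example, so it always looks perfect on the data it was trained on.

```python
import random


def split_train_test(records, test_fraction=0.25, seed=0):
    """Shuffle the records and split them into a training set and a test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


class MemorizingModel:
    """Deliberately naive model: it remembers every training example by heart."""

    def fit(self, examples):
        self.memory = {features: label for features, label in examples}
        return self

    def predict(self, features):
        # Attribute combinations that were never seen fall back to a fixed guess.
        return self.memory.get(features, "played")


def accuracy(model, examples):
    """Fraction of examples for which the prediction equals the known outcome."""
    correct = sum(model.predict(features) == label for features, label in examples)
    return correct / len(examples)


# Made-up examples: ((weather, has_artificial_grass), outcome)
data = [
    (("sunny", True), "played"),
    (("sunny", False), "played"),
    (("overcast", True), "played"),
    (("overcast", False), "played"),
    (("rainy", True), "played"),
    (("rainy", False), "cancelled"),
    (("snowy", True), "cancelled"),
    (("snowy", False), "cancelled"),
]

train_set, test_set = split_train_test(data)
model = MemorizingModel().fit(train_set)
train_accuracy = accuracy(model, train_set)
test_accuracy = accuracy(model, test_set)
print("accuracy on training set:", train_accuracy)
print("accuracy on test set:", test_accuracy)
```

The accuracy on the training set is perfect by construction, because the model only has to repeat what it has memorised. Only the accuracy on the separate test set says something about how well it predicts matches it has not seen.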

Exercise 3: Data Characteristics
Briefly describe and provide an example for each of the following concepts:

1. Feature or attribute

A feature or attribute describes a characteristic of an object. For example, the amount of rain (mm) is
an attribute of a football match.

2. Instance

An instance is a single member of a class. If we take the class footballclubsInEurope, then the football
club Ajax is an instance of it. Or, if we take the class of students following the course Data mining, then
I am an instance of this class.

3. Classes.

A class is a category under which we can group properties and functionality, and it should have
instances. If we take all footballclubsInEurope, then Ajax and Real Madrid are instances of this group.
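
The three concepts can be sketched together in a few lines of code. The class below, its attribute names and the describe method are assumptions chosen to match the footballclubsInEurope example; they are not part of the assignment.

```python
from dataclasses import dataclass


@dataclass
class FootballClubInEurope:
    """A class groups properties (attributes) and functionality (methods)."""
    name: str      # attribute
    country: str   # attribute

    def describe(self):
        """Functionality shared by every instance of the class."""
        return f"{self.name} ({self.country})"


# Two instances of the same class, each with its own attribute values.
ajax = FootballClubInEurope(name="Ajax", country="Netherlands")
real_madrid = FootballClubInEurope(name="Real Madrid", country="Spain")

print(ajax.describe())
print(real_madrid.describe())
print(isinstance(ajax, FootballClubInEurope))
```

Here name and country are the attributes, ajax and real_madrid are instances, and FootballClubInEurope is the class they belong to.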



