Data Mining
Homework Week 1
Submit to Blackboard in electronic form before 11 am on November 11, 2010.
For questions, please contact the teaching assistants:
  Spyros Martzoukos: S.Martzoukos@uva.nl (English only!)
  Jiyin He: j.he@uva.nl (English only!)
Exercise 1: Data Mining in General
Describe, in half a page to one page, two scenarios to which you think one could apply data mining. Preferably, these two scenarios should be relevant to your professional or personal interests. Describe what you would like to predict with data mining methods and what the relevant attributes in these applications are. Also describe what type of data you would use and what kinds of problems you could anticipate.

Exercise 2: Probabilities
How can Bayes' rule be derived from simpler definitions, such as the definition of conditional probability, the symmetry of the joint probability, and the chain rule? Give a step-wise derivation, mentioning which rule you applied at each step.

Exercise 3: Entropy
3.1 Assume a variable X with three possible values: a, b, and c. If p(a) = 0.4 and p(b) = 0.25, what is the entropy of X, i.e., what is H(X)? [You can use a calculator for this exercise.]
3.2 Assuming the probability values from the previous exercise, what is the minimum number of bits we need, on average, to represent the values of X? [You can use a calculator for this exercise.]
3.3 Assume a variable X with three possible values: a, b, and c. What is the probability distribution with the highest entropy? Which one(s) has/have the lowest? Explain, in a sentence or two and in your own words, why these distributions have the highest and lowest entropies.
3.4 In general, if a variable X has n possible values, what is the maximum entropy?
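
Note: the entropy used in Exercise 3 is the Shannon entropy, H(X) = -sum over x of p(x) * log2 p(x), measured in bits. If you would like to double-check a calculator result, a short Python sketch of this formula might look like the following; the function name and the fair-coin distribution used in the example are purely illustrative and not part of the exercises.

    import math

    def entropy(probs):
        """Shannon entropy in bits: H(X) = -sum_x p(x) * log2 p(x).

        Terms with p(x) = 0 contribute 0 by convention.
        """
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Illustrative check (not one of the exercise distributions):
    # a fair coin has entropy 1 bit.
    print(entropy([0.5, 0.5]))  # -> 1.0
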