Data Mining
Homework Week 1
(Submit to Blackboard in electronic form before 11 am on November 11, 2010)
For questions, please contact the teaching assistants
Spyros Martzoukos: S.Martzoukos@uva.nl (English only!)
Jiyin He: j.he@uva.nl (English only!)
Exercise 1: Data Mining in General
Describe in half a page to one page two scenarios to which you think one could apply
data mining. Preferably these two scenarios should be relevant to your professional
or personal interests. Describe what you would like to predict with data mining
methods and what the relevant attributes in these applications are. Describe also
what type of data you would use and what kind of problems you could anticipate.
Exercise 2: Probabilities
How can Bayes’ rule be derived from simpler definitions, such as the definition
of conditional probability, symmetry of joint probability, the chain rule? Give a
step-wise derivation, mentioning which rule you applied at each step.
Exercise 3: Entropy
3.1
Assume a variable X with three possible values: a, b, and c. If p(a) = 0.4, and
p(b) = 0.25, what is the entropy of X, i.e., what is H(X)? [You can use a
calculator for this exercise.]
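In place of a calculator, a short script can be used. The following is a minimal sketch, assuming the standard base-2 Shannon entropy H(X) = -sum over x of p(x) log2 p(x); the function name entropy_bits and the fair-coin example are illustrative assumptions and not part of the assignment.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution.

    `probs` is a sequence of probabilities that should sum to 1.
    (Hypothetical helper name, chosen for illustration only.)
    """
    if any(p < 0 for p in probs):
        raise ValueError("probabilities must be non-negative")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p == 0 contribute nothing, since p*log2(p) -> 0 as p -> 0.
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Illustrative check on a fair coin (not the distribution from the exercise).
print(entropy_bits([0.5, 0.5]))  # 1.0
```

The same function accepts a list of three probabilities, so it can be used to check a hand calculation once the missing probability has been worked out.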
3.2
Assuming the probability values in the previous exercise, what is the minimum
number of bits that we need to use on average to represent the values of X? [You
can use a calculator for this exercise.]
3.3
Assume a variable X with three possible values: a, b, and c. What is the probability
distribution with the highest entropy? Which one(s) has/have the lowest one?
Explain in a sentence or two and in your own words why these distributions have
the highest and lowest entropies.
3.4
In general, if a variable X has n possible values, what is the maximum entropy?