53. ● Examples: temperature in Kelvin, length, time, counts
Attribute Type: Description, Examples, Operations
● Nominal
– Description: The values of a nominal attribute are just different names, i.e., nominal attributes provide only enough information to distinguish one object from another. (=, ≠)
– Examples: zip codes, employee ID numbers, eye color, sex: {male, female}
– Operations: mode, entropy, contingency correlation, χ² test
● Ordinal
– Description: The values of an ordinal attribute provide enough information to order objects. (<, >)
– Examples: hardness of minerals, {good, better, best}, grades, street numbers
– Operations: median, percentiles, rank correlation, run tests, sign tests
● Interval
– Description: For interval attributes, the differences between values are meaningful, i.e., a unit of measurement exists. (+, -)
– Examples: calendar dates, temperature in Celsius or Fahrenheit
– Operations: mean, standard deviation, Pearson's correlation, t and F tests
● Ratio
– Description: For ratio variables, both differences and ratios are meaningful. (*, /)
– Examples: temperature in Kelvin, monetary quantities, counts, age, mass, length, electrical current
– Operations: geometric mean, harmonic mean, percent variation
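The difference between interval and ratio attributes can be checked with a few lines of arithmetic. The sketch below is only an illustration: the temperatures, eye colors, and ratings are made-up sample values, and the helper name `celsius_to_kelvin` is not from the slides. It shows that differences agree on both temperature scales, while a ratio is only meaningful on the Kelvin scale, and it applies the mode and the median to a nominal and an ordinal attribute.

```python
from statistics import median_low, mode

def celsius_to_kelvin(c):
    """Convert degrees Celsius to Kelvin."""
    return c + 273.15

# Made-up sample temperatures (assumption, for illustration only).
t1_c, t2_c = 10.0, 20.0
t1_k, t2_k = celsius_to_kelvin(t1_c), celsius_to_kelvin(t2_c)

# Interval scale: differences are meaningful and agree on both scales.
diff_c = t2_c - t1_c
diff_k = t2_k - t1_k

# Ratio scale: only Kelvin has a true zero, so only its ratio is meaningful.
ratio_c = t2_c / t1_c   # 2.0, but 20 C is not "twice as hot" as 10 C
ratio_k = t2_k / t1_k   # roughly 1.035

# Nominal attribute: only equality tests apply, so the mode is a valid summary.
eye_colors = ["brown", "blue", "brown", "green"]
most_common_color = mode(eye_colors)

# Ordinal attribute: order is meaningful, so the median is a valid summary.
order = ["good", "better", "best"]
ratings = ["good", "best", "better", "better", "best"]
ranks = [order.index(r) for r in ratings]
median_rating = order[median_low(ranks)]

print(diff_c, round(diff_k, 6), ratio_c, round(ratio_k, 4))
print(most_common_color, median_rating)
```

Converting the ordinal labels to ranks keeps their order but does not make the gaps between ranks meaningful, which is why the median is used here rather than the mean.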
Attribute Level: Transformation, Comments
● Nominal
– Transformation: Any permutation of values
– Comments: If all employee ID numbers were reassigned, would it
63. ● Examples: Generic graph and HTML Links
● Data objects are nodes, links are properties
[Figure: generic graph with numeric labels 5, 2, 1, 2, 5]
<a href="papers/papers.html#bbbb">
Data Mining </a>
<li>
<a href="papers/papers.html#aaaa">
Graph Partitioning </a>
<li>
<a href="papers/papers.html#aaaa">
Parallel Solution of Sparse Linear System of Equations </a>
<li>
<a href="papers/papers.html#ffff">
N-Body Computation and Dense Linear System Solvers
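As a rough sketch of how HTML links can be treated as graph data, the snippet below collects the `href` targets of a page and stores them as that page's outgoing edges. The page name `index.html`, the class `LinkCollector`, and the function `build_link_graph` are hypothetical names chosen for illustration; only the two link targets are taken from the example above.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def build_link_graph(pages):
    """Map each page (node) to the list of pages it links to (edges)."""
    graph = {}
    for page, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        graph[page] = collector.links
    return graph

# "index.html" is a hypothetical source page holding two of the links.
pages = {
    "index.html": (
        '<a href="papers/papers.html#bbbb">Data Mining</a>'
        '<li><a href="papers/papers.html#aaaa">Graph Partitioning</a>'
    ),
}
graph = build_link_graph(pages)
print(graph)
```

Here each page is a data object (node) and each hyperlink is a directed edge, stored as an adjacency list.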
69. Missing Values
● Reasons for missing values
– Information is not collected
(e.g., people decline to give their age and weight)
– Attributes may not be applicable to all cases
(e.g., annual income is not applicable to children)
● Handling missing values
– Eliminate Data Objects (unless many missing)
– Estimate Missing Values (e.g., average or most common value)
– Ignore the Missing Value During Analysis
– Replace with all possible values (weighted by their
probabilities)
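The handling strategies listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a prescribed method: the records are made up, `None` is assumed to mark a missing value, and the helper names (`drop_incomplete`, `observed`, `impute`, `value_weights`) are hypothetical.

```python
from collections import Counter
from statistics import mean

# Made-up records (assumption); None marks a missing value.
records = [
    {"age": 25, "eye_color": "brown"},
    {"age": None, "eye_color": "blue"},
    {"age": 35, "eye_color": None},
    {"age": 30, "eye_color": "brown"},
]

def drop_incomplete(rows):
    """Eliminate data objects that have any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def observed(rows, attr):
    """Return the values of attr that are present (ignore missing ones)."""
    return [r[attr] for r in rows if r[attr] is not None]

def impute(rows, attr, numeric):
    """Estimate missing values: mean if numeric, most common value otherwise."""
    present = observed(rows, attr)
    fill = mean(present) if numeric else Counter(present).most_common(1)[0][0]
    return [{**r, attr: fill if r[attr] is None else r[attr]} for r in rows]

def value_weights(rows, attr):
    """Empirical probability of each observed value, usable as weights when
    a missing entry is replaced by all possible values."""
    counts = Counter(observed(rows, attr))
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

complete = drop_incomplete(records)            # eliminate data objects
mean_age = mean(observed(records, "age"))      # ignore missing values
filled_age = impute(records, "age", numeric=True)
filled_eye = impute(records, "eye_color", numeric=False)
eye_weights = value_weights(records, "eye_color")

print(len(complete), mean_age, filled_age[1]["age"], filled_eye[2]["eye_color"])
print(eye_weights)
```

Which strategy is appropriate depends on how many values are missing and why; dropping records, for example, discards half of this small sample.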
100. Dr. Oner Celepcikay
Dept. of Information Technology &
School of Computer and Information Sciences
University of the Cumberlands
Chapter 2 Assignment
1. What's noise? How can noise be reduced in a dataset?
2. Define outlier. Describe 2 different approaches to detect
outliers in a dataset.
3. Give 2 examples in which aggregation is useful.
4. What's stratified sampling? Why is it preferred?
5. Provide a brief description of what Principal Components
Analysis (PCA) does. [Hint: See
Appendix A and your lecture notes.] State what the input and
the output of PCA are.
6. What's the difference between dimensionality reduction and
feature selection?
7. What's the difference between feature selection and feature
extraction?
8. Give two examples of data in which feature extraction would
be useful.
9. What's data discretization and when is it needed?
10. How are correlation and covariance used in data
pre-processing? (See pp. 76-78.)