2. βΊ Learning Task
β Given: Expression profiles of leukemia patients and healthy
persons.
β Compute: A model distinguishing if a person has leukemia from
expression data.
βΊ Classification Task
β Given: Expression profile of a new patient + a learned model
β Determine: If a patient has leukemia or not.
3. οΆ Often high dimension of data.
οΆ Hard to put up simple rules.
οΆ Amount of data.
ο Need automated ways to deal with the data.
ο Use computers β data processing, statistical analysis, try to learn
patterns from the data (Machine Learning)
4. o Binary Classification problem
o The data above the red line
belongs to class βxβ
o The data below the red line
belongs to class βoβ
ο§ Examples: SVM, Probabilistic
Classifier
x
x
x
x
xx
x
x
x
x
o
o
o
o
o
o
o
o
o o
o
o
o
5. o Classification of high-dimensional
data sets
o No need for feature selection
o Quadratic programming problem
o Finds an optimal solution.
o Most successful current text
classification method
6. o Lots of possible linear separator
o Select one that maximizes the margin!
o Separator depends only on a small
number of training examples.
f(x) =-1
=+1
10. 0
x
ο§ Datasets that are linearly separable work out
great
ο§ But what are we going to do if the dataset is just too
hard?
0
x
11. 0
x
ο§ How about β¦ mapping data to a higher-dimensional
space:
0
x
12. βΊ Mapping to transform training data into a higher
dimension
βΊ With the new dimension, it searches for the linear
optimal separating hyperplane.
βΊ With an appropriate nonlinear mapping, data from two
classes can always be separated by a hyperplane.
βΊ SVM finds this hyperplane using support vectors.