As we said, this is the game we are playing. In NLP it has always been clear that the raw information in a sentence is not sufficient, as is, to make a good predictor. Better functions of the input were generated, and learning was done in terms of those features.
Badges game: don't give me the answer! Start thinking about how to write a program that will figure out whether my name has a + or a – next to it.
Good treatment in Bishop, Ch. 3. This is the classic Wiener filtering solution. The text omits the 0.5 factor; in any case we use the gradient and η (text) or R (these notes) to modulate the step size.
1.
CS 446: Machine Learning. Gerald DeJong, [email_address], 3-0491, 3320 SC. Recent approval for a TA, to be named later.
Given: Examples (x, f(x)) of some unknown function f
Find: A good approximation to f
x provides some representation of the input
The process of mapping a domain element into a representation is called Feature Extraction. (Hard; ill-understood; important)
x ∈ {0,1}^n or x ∈ R^n
The target function (label)
f(x) ∈ {-1, +1}: Binary Classification
f(x) ∈ {1, 2, 3, …, k-1}: Multi-class Classification
f(x) ∈ R: Regression
8.
Example and Hypothesis Spaces. X: Example Space – the set of all well-formed inputs [with a distribution]. H: Hypothesis Space – the set of all well-formed outputs. [Figure: points in X labeled + and –.]
10.
A Learning Problem. y = f(x1, x2, x3, x4): an unknown Boolean function of the inputs x1, x2, x3, x4. [Figure: f as a black box with inputs x1–x4 and output y; spaces X and H.]
11.
y = f(x1, x2, x3, x4) — Unknown function. Training Set:

Example | x1 x2 x3 x4 | y
   1    |  0  0  1  0 | 0
   2    |  0  1  0  0 | 0
   3    |  0  0  1  1 | 1
   4    |  1  0  0  1 | 1
   5    |  0  1  1  0 | 0
   6    |  1  1  0  0 | 0
   7    |  0  1  0  1 | 0
Consider conjunctive rules of the form y = x_i ∧ x_j ∧ x_k …
No simple rule explains the data. The same is true for simple clauses.
Rule                  | Counterexample
y = c                 | 1100 → 0
y = x1                | 1100 → 0
y = x2                | 0100 → 0
y = x3                | 0110 → 0
y = x4                | 0101 → 0
y = x1 ∧ x2           | 1100 → 0
y = x1 ∧ x3           | 0011 → 1
y = x1 ∧ x4           | 0011 → 1
y = x2 ∧ x3           | 0011 → 1
y = x2 ∧ x4           | 0011 → 1
y = x3 ∧ x4           | 1001 → 1
y = x1 ∧ x2 ∧ x3      | 0011 → 1
y = x1 ∧ x2 ∧ x4      | 0011 → 1
y = x1 ∧ x3 ∧ x4      | 0011 → 1
y = x2 ∧ x3 ∧ x4      | 0011 → 1
y = x1 ∧ x2 ∧ x3 ∧ x4 | 0011 → 1
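A quick brute-force check confirms the claim; this is a sketch in Python, with the training set copied from the slide, enumerating every non-empty conjunction over x1..x4 and verifying each one is contradicted by some training example:

```python
from itertools import combinations

# Training set from the slide: (x1, x2, x3, x4) -> y
data = [
    ((0, 0, 1, 0), 0),
    ((0, 1, 0, 0), 0),
    ((0, 0, 1, 1), 1),
    ((1, 0, 0, 1), 1),
    ((0, 1, 1, 0), 0),
    ((1, 1, 0, 0), 0),
    ((0, 1, 0, 1), 0),
]

def conjunction(indices):
    """Rule y = x_i AND x_j AND ... over the given variable indices."""
    return lambda x: int(all(x[i] for i in indices))

def counterexample(rule):
    """Return the first training example the rule misclassifies, or None."""
    for x, y in data:
        if rule(x) != y:
            return (x, y)
    return None

# Check all 15 non-empty conjunctions over {x1, x2, x3, x4}.
inconsistent = []
for k in range(1, 5):
    for idx in combinations(range(4), k):
        if counterexample(conjunction(idx)) is not None:
            inconsistent.append(idx)

print(len(inconsistent), "of 15 conjunctions have a counterexample")
```

Every one of the 15 conjunctive rules is ruled out by some example, matching the table above.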
Target function (concept): The true function f (unknown).
Hypothesis: A proposed function h, believed to be similar to f.
Concept: A Boolean function. Examples for which f(x) = 1 are positive examples; those for which f(x) = 0 are negative examples (instances). (Sometimes used interchangeably with “Hypothesis.”)
Classifier: A discrete-valued function. The possible values of f, {1, 2, …, K}, are the classes or class labels.
Hypothesis space: The space of all hypotheses that can, in principle, be output by the learning algorithm.
Version Space: The space of all hypotheses in the hypothesis space that have not yet been ruled out.
Our prediction on the d-th example x is therefore:
Let t_d be the target value for this example (a real value; it represents u · x)
A convenient error function of the data set is:
(i (subscript) – vector component; j (superscript) – time; d – example #.) Assumption: x ∈ R^n; u ∈ R^n is the target weight vector; the target (label) is t_d = u · x. Noise has been added, so possibly no weight vector is consistent with the data.
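Written out under the setup above, the squared-error function and its gradient take the standard LMS form; a sketch in the notation of these notes, including the 0.5 factor the notes mention:

```latex
% LMS error over the data set D and its gradient with respect to w_i
E(\mathbf{w}) = \frac{1}{2} \sum_{d \in D} \left( t_d - \mathbf{w} \cdot \mathbf{x}_d \right)^2,
\qquad
\frac{\partial E}{\partial w_i} = -\sum_{d \in D} \left( t_d - \mathbf{w} \cdot \mathbf{x}_d \right) x_{d,i}
```

so the gradient-descent update with step size R is w^{j+1}_i = w^j_i + R Σ_d (t_d − w^j · x_d) x_{d,i}.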
In the general (non-separable) case the learning rate R must decrease to zero to guarantee convergence; it cannot decrease too quickly nor too slowly.
The learning rate is also called the step size. There are more sophisticated algorithms (e.g., conjugate gradient) that choose the step size automatically and converge faster.
There is only one “basin” for linear threshold units, so a local minimum is the global minimum. However, choosing a good starting point can make the algorithm converge much faster.
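The gradient-descent procedure described above can be sketched as follows. The synthetic data (a hidden weight vector u with targets t_d = u · x_d plus a little noise, as in the notes' assumption) is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic data: targets t_d = u . x_d plus noise.
n, m = 5, 200
u = rng.normal(size=n)                  # hidden target weight vector (assumption)
X = rng.normal(size=(m, n))             # examples x_d
t = X @ u + 0.01 * rng.normal(size=m)   # noisy targets t_d

w = np.zeros(n)                         # starting point
R = 0.1                                 # step size (learning rate)
for _ in range(1000):
    grad = -(X.T @ (t - X @ w)) / m     # gradient of (1/2m) sum_d (t_d - w.x_d)^2
    w -= R * grad                       # move downhill; one basin, so this finds the global minimum

print(float(np.max(np.abs(w - u))))     # small residual, due only to the added noise
```

A fixed step size suffices here because the error surface is a single quadratic bowl; in the online, non-separable setting the decreasing schedule discussed above is what guarantees convergence.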
46.
Computational Issues. Assume the data is linearly separable. Sample complexity: suppose we want to ensure that our LTU has an error rate (on new examples) of less than ε with high probability (at least 1 − δ). How large must m (the number of examples) be in order to achieve this? It can be shown that for n-dimensional problems m = O(1/ε [ln(1/δ) + (n+1) ln(1/ε)]). Computational complexity: what can be said? It can be shown that there exists a polynomial-time algorithm for finding a consistent LTU (by reduction to linear programming). (Online algorithms have inverse quadratic dependence on the margin.)
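Plugging numbers into the sample-complexity bound gives a rough feel for its scale. Note this is purely illustrative: the constant hidden in the O(·) is dropped, so the value below is not a literal guarantee:

```python
import math

def sample_bound(eps, delta, n):
    """Illustrative m = (1/eps) * (ln(1/delta) + (n+1) * ln(1/eps)); O() constant dropped."""
    return math.ceil((1.0 / eps) * (math.log(1.0 / delta) + (n + 1) * math.log(1.0 / eps)))

# e.g., 10% error, 95% confidence, 10-dimensional inputs
m = sample_bound(0.1, 0.05, 10)
print(m)
```

The bound grows only linearly in the dimension n and logarithmically in 1/δ, which is what makes LTU learning sample-efficient.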
Set the gradient ∇J(w) = 0 and solve for w. Can be accomplished using SVD methods.
Fisher Linear Discriminant:
A direct computation method.
Probabilistic methods (naive Bayes):
Produces a stochastic classifier that can be viewed as a linear
threshold unit.
Winnow:
A multiplicative update algorithm with the property that it can
handle large numbers of irrelevant attributes.
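Winnow's multiplicative update can be sketched as follows. The data set, the target disjunction, and the parameter choices (α = 2, threshold n) are illustrative assumptions, not from the slides:

```python
# Minimal sketch of Winnow for monotone disjunctions.
def winnow(examples, n, alpha=2.0):
    """examples: list of (x, y), x a 0/1 tuple of length n, y in {0, 1}."""
    w = [1.0] * n
    theta = float(n)                        # fixed threshold
    for x, y in examples:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0
        if pred == 0 and y == 1:            # promotion: multiply weights of active attributes
            w = [wi * alpha if xi else wi for wi, xi in zip(w, x)]
        elif pred == 1 and y == 0:          # demotion: divide weights of active attributes
            w = [wi / alpha if xi else wi for wi, xi in zip(w, x)]
    return w, theta

# Hypothetical target concept y = x1 OR x2 among n = 8 attributes (6 irrelevant).
n = 8
data = [((1,0,0,1,0,0,1,0), 1), ((0,1,1,0,0,0,0,0), 1),
        ((0,0,1,1,1,0,0,1), 0), ((0,0,0,0,0,1,1,1), 0),
        ((1,1,0,0,1,0,0,0), 1), ((0,0,1,0,0,1,0,0), 0)]
w, theta = winnow(data * 10, n)             # a few passes over the data
preds = [1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0 for x, _ in data]
print(preds)
```

Because the updates are multiplicative, weights of irrelevant attributes stay small, which is the source of Winnow's mistake bound of O(k log n) for k-literal disjunctions.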
48.
Summary of LMS algorithms for LTUs. Local search: begins with an initial weight vector and modifies it iteratively to minimize an error function. The error function is loosely related to the goal of minimizing the number of classification errors. Memory: the classifier is constructed from the training examples; the examples can then be discarded. Online or batch: both online and batch variants of the algorithms can be used.
Now we can compute explicitly: We can do a similar computation for the means:
J(w) = |m_P − m_N|^2 / (s_P^2 + s_N^2) = (w^T S_B w) / (w^T S_W w)
We are looking for the value of w that maximizes this expression.
This is a generalized eigenvalue problem; when S_W is nonsingular, it is just an eigenvalue problem. The solution can be written, without solving the eigenvalue problem, as:
w = S_W^{-1} (m_P − m_N)
This is the Fisher Linear Discriminant .
1 : We converted a d-dimensional problem to a 1-dimensional problem and suggested a solution that makes some sense.
2: We have a solution that makes sense; how do we make it a classifier? And how good is it?
It turns out that both problems can be solved if we make assumptions. E.g., suppose the data consists of two classes of points generated according to normal distributions with the same covariance. Then:
The solution is optimal.
Classification can be done by choosing a threshold, which can be computed.
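Putting the pieces together, here is a minimal sketch of the Fisher Linear Discriminant on synthetic two-class Gaussian data with a shared covariance (the assumption under which the solution is optimal); the means, covariance, and threshold choice are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed synthetic data: two Gaussian classes with the same covariance.
mean_p, mean_n = np.array([2.0, 2.0]), np.array([0.0, 0.0])
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
P = rng.multivariate_normal(mean_p, cov, 200)   # positive class
N = rng.multivariate_normal(mean_n, cov, 200)   # negative class

m_p, m_n = P.mean(axis=0), N.mean(axis=0)
# Within-class scatter S_W = sum of the two class scatter matrices
S_W = (P - m_p).T @ (P - m_p) + (N - m_n).T @ (N - m_n)
w = np.linalg.solve(S_W, m_p - m_n)             # w = S_W^{-1} (m_P - m_N)

# Classify by thresholding the 1-d projection, halfway between projected means
threshold = 0.5 * (w @ m_p + w @ m_n)
acc = (np.sum(P @ w > threshold) + np.sum(N @ w <= threshold)) / 400
print(round(acc, 3))
```

The projection reduces the d-dimensional problem to one dimension, exactly as described in point 1 above; the midpoint threshold is one simple choice, and under the equal-covariance Gaussian assumption the optimal threshold can be computed in closed form.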