Paper Study: Melding the Data-Decisions Pipeline by ChenYiHuang5
Melding the Data-Decisions Pipeline: Decision-Focused Learning for Combinatorial Optimization, from AAAI 2019.
I derive the equations myself and, applying the same derivation procedure, arrive at the same results as the two cited CMU papers [Donti et al. 2017; Amos et al. 2017].
Learning a Nonlinear Embedding by Preserving Class Neighbourhood Structure (final) by WooSung Choi
Salakhutdinov, Ruslan, and Geoffrey E. Hinton. "Learning a nonlinear embedding by preserving class neighbourhood structure." International Conference on Artificial Intelligence and Statistics. 2007.
Analysis of Feature Selection Algorithms (Branch & Bound and Beam Search) by Parinda Rajapaksha
The Branch & Bound and Beam Search algorithms are illustrated in the context of feature selection. The presentation is structured as follows:
- Motivation
- Introduction
- Analysis
- Algorithm
- Pseudo Code
- Illustration of examples
- Applications
- Observations and Recommendations
- Comparison between two algorithms
- References
Introduction to machine learning terminology.
Applications within High Energy Physics and outside HEP.
* Basic problems: classification and regression.
* Nearest neighbours approach and spatial indices
* Overfitting (intro)
* Curse of dimensionality
* ROC curve, ROC AUC
* Bayes optimal classifier
* Density estimation: KDE and histograms
* Parametric density estimation
* Mixtures for density estimation and EM algorithm
* Generative approach vs discriminative approach
* Linear decision rule, intro to logistic regression
* Linear regression
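As an illustration of one of the topics above, ROC AUC can be computed directly from its rank interpretation: the probability that a randomly chosen positive example outscores a randomly chosen negative one. A minimal sketch (function name and data are this sketch's own, not from the lecture):

```python
def roc_auc(labels, scores):
    """ROC AUC via its rank statistic: the probability that a random
    positive example receives a higher score than a random negative one
    (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfect ranker separates the classes completely.
print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]))  # 1.0
```

This is equivalent to the area under the ROC curve obtained by sweeping the decision threshold, which is why 0.5 corresponds to random scoring.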
Branch and Bound Feature Selection for Hyperspectral Image Classification by Sathishkumar Samiappan
Feature selection (FS) is a classical combinatorial problem in pattern recognition and data mining, of major importance in classification and regression scenarios. In this paper, a hybrid approach that combines branch-and-bound (BB) search with Bhattacharyya-distance-based feature selection is presented for classifying hyperspectral data using Support Vector Machine (SVM) classifiers. The performance of this hybrid approach is compared to another hybrid approach that uses genetic algorithm (GA) based feature selection in place of BB, and to baseline SVMs with no feature reduction. Experimental results using hyperspectral data show that, under small sample size situations, the BB approach performs better than both the GA approach and the SVM with no feature selection.
Overview of the state-of-the-art Time Series Clustering based on literature study; distance metrics, prototypes, time-series preprocessing, and clustering algorithms
Overview of Optimization Algorithms in Deep Learning by Khang Pham
An overview of function optimization in general and in deep learning. The slides cover algorithms ranging from basic batch and stochastic gradient descent to state-of-the-art methods such as Momentum, Adagrad, RMSprop, and Adam.
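As a concrete reference for the last of those methods, here is a minimal sketch of a single Adam update, following the standard bias-corrected formulation; the function name and the scalar state-passing style are this sketch's own assumptions, not from the slides:

```python
import math

def adam_step(p, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter p with gradient g.
    m, v are the running first/second moment estimates; t is the
    1-based step count used for bias correction."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)      # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)      # bias-corrected second moment
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v

# Minimizing f(x) = x**2 (gradient 2x) drives x toward 0.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 1001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.01)
```

The division by the square root of the second moment is what gives Adam its per-parameter adaptive step size, the trait it shares with Adagrad and RMSprop.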
Talk on Optimization for Deep Learning, which gives an overview of gradient descent optimization algorithms and highlights some current research directions.
What is boosting
Boosting algorithm
Building models using GBM
Main algorithm parameters
Fine-tuning models
Hyperparameters in GBM
Validating GBM models
This is the fourth slide deck for the machine learning workshop at Hulu. Machine learning methods are summarized at the beginning of the deck, and boosting trees are then introduced. Boosting trees are recommended when the number of features is not too large (<1000).
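To make the outline above concrete, here is a minimal from-scratch sketch of gradient boosting for squared-error regression, using depth-1 trees (stumps) as weak learners; the function names and toy usage are this sketch's own, not from the workshop:

```python
def fit_stump(xs, residuals):
    """Best threshold split on a 1-D feature minimizing the squared
    error of the residuals; returns (threshold, left_mean, right_mean)."""
    best = (float("inf"), xs[0], 0.0, 0.0)
    for t in sorted(set(xs))[:-1]:  # the largest value leaves the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if err < best[0]:
            best = (err, t, lm, rm)
    return best[1], best[2], best[3]

def gbm_fit(xs, ys, n_trees=100, lr=0.2):
    """Each round fits a stump to the current residuals and adds a
    shrunken copy of it to the ensemble (gradient boosting with
    squared-error loss)."""
    f0 = sum(ys) / len(ys)
    pred = [f0] * len(ys)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, pred)]
        t, lm, rm = fit_stump(xs, residuals)
        trees.append((t, lm, rm))
        pred = [p + lr * (lm if x <= t else rm) for x, p in zip(xs, pred)]
    return f0, lr, trees

def gbm_predict(model, x):
    f0, lr, trees = model
    return f0 + sum(lr * (lm if x <= t else rm) for t, lm, rm in trees)
```

The learning rate `lr` is the shrinkage hyperparameter: smaller values need more trees but usually generalize better, which is the trade-off the "hyperparameters in GBM" topic refers to.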
Accelerating Random Forests in Scikit-Learn by Gilles Louppe
Random Forests are without contest one of the most robust, accurate and versatile tools for solving machine learning tasks. Implementing this algorithm properly and efficiently remains, however, a challenging task involving issues that are easily overlooked if not considered with care. In this talk, we present the Random Forests implementation developed within the Scikit-Learn machine learning library. In particular, we describe the iterative team efforts that led us to gradually improve our codebase and eventually make Scikit-Learn's Random Forests one of the most efficient implementations in the scientific ecosystem, across all libraries and programming languages. Algorithmic and technical optimizations that have made this possible include:
- An efficient formulation of the decision tree algorithm, tailored for Random Forests;
- Cythonization of the tree induction algorithm;
- CPU cache optimizations, through low-level organization of data into contiguous memory blocks;
- Efficient multi-threading through GIL-free routines;
- A dedicated sorting procedure, taking into account the properties of data;
- Shared pre-computations whenever critical.
Overall, we believe that lessons learned from this case study extend to a broad range of scientific applications and may be of interest to anybody doing data analysis in Python.
Support Vector Machine Techniques for Nonlinear Equalization by Shamman Noor Shoudha
Equalization techniques have long been used to counteract the effects of communication channels and non-linearities. Traditional nonlinear equalization techniques, however, are fraught with challenges. As such, research has been ongoing into defining the equalization problem as a classification problem, so that machine learning techniques can be applied. Following that approach, support vector machine techniques provide an efficient way to define boundaries for classifying non-linear symbol constellations in communication systems. Using BPSK modulation as a baseline with a two-tap channel filter model, this research validates the application of support vector machine techniques to correctly define symbol boundaries. The performance of support vector machines is directly related to the SNR and the extent of the non-linearities. The bit-error-rate performance shows that this approach is viable, providing results comparable to traditional methods as well as neural networks. As a further addition to the results in the reference paper, this research shows that SVMs do not generalize well to channel conditions and SNRs different from those of the training dataset. A filter-bank SVM approach is shown to improve BER performance in these varying conditions.
We consider the problem of finding anomalies in high-dimensional data using popular PCA-based anomaly scores. The naive algorithms for computing these scores explicitly compute the PCA of the covariance matrix, which uses space quadratic in the dimensionality of the data. We give the first streaming algorithms that use space that is linear or sublinear in the dimension. We prove general results showing that any sketch of a matrix that satisfies a certain operator norm guarantee can be used to approximate these scores. We instantiate these results with powerful matrix sketching techniques such as Frequent Directions and random projections to derive efficient and practical algorithms for these problems, which we validate over real-world data sets. Our main technical contribution is to prove matrix perturbation inequalities for operators arising in the computation of these measures.
-Proceedings: https://arxiv.org/abs/1804.03065
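As a toy illustration of the kind of score the abstract refers to, the sketch below computes a rank-1 PCA anomaly score (the squared residual left after projecting out the top principal component) without ever forming the d x d covariance matrix; the function names are this sketch's own, and a real streaming version would replace the exact power iteration with a matrix sketch such as Frequent Directions:

```python
def top_pc(X, iters=200):
    """Top principal component of the rows of X (a list of lists),
    by power iteration on the mean-centered data; the d x d covariance
    matrix is never formed explicitly."""
    n = len(X)
    mu = [sum(col) / n for col in zip(*X)]
    Xc = [[x - m for x, m in zip(row, mu)] for row in X]
    v = [1.0] * len(mu)
    for _ in range(iters):
        Xv = [sum(a * b for a, b in zip(row, v)) for row in Xc]            # X_c v
        w = [sum(Xc[i][j] * Xv[i] for i in range(n)) for j in range(len(mu))]  # X_c^T (X_c v)
        norm = sum(wi * wi for wi in w) ** 0.5
        v = [wi / norm for wi in w]
    return mu, v

def anomaly_score(x, mu, v):
    """Squared norm of the residual of x after removing its component
    along the top principal component: large for points off the subspace."""
    xc = [a - m for a, m in zip(x, mu)]
    proj = sum(a * b for a, b in zip(xc, v))
    return sum(c * c for c in xc) - proj * proj
```

Points lying on the dominant subspace score near zero; points far off it score high, which is the anomaly signal the streaming algorithms in the paper approximate in small space.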
A lecture organized by the Barmaja group @parmg_sa
https://www.meetup.com/parmg_sa/events/238339639/
Delivered in Riyadh, at the Badir incubator, on 20 Jumada al-Akhirah 1438 AH (18 March 2017).
This report is based on my final report for the course CommE 5051: Mathematical Principles of Machine Learning, National Taiwan University, spring 2018. In this report, some theoretical principles of domain adaptation established in the literature are briefly presented.
Quality Estimation for Machine Translation Using the Joint Method of Evaluati... by Lifeng (Aaron) Han
This is a short presentation for the poster of the WMT13 shared task. The paper introduces our participation in the WMT13 shared tasks on Quality Estimation for machine translation without using reference translations. We submitted results for Task 1.1 (sentence-level quality estimation), Task 1.2 (system selection), and Task 2 (word-level quality estimation). In Task 1.1, we used an enhanced version of the BLEU metric, without reference translations, to evaluate translation quality. In Task 1.2, we utilized a probabilistic model, Naïve Bayes (NB), as a classification algorithm with features borrowed from traditional evaluation metrics. In Task 2, to take contextual information into account, we employed a discriminative undirected probabilistic graphical model, the Conditional Random Field (CRF), in addition to the NB algorithm. Training experiments on past WMT corpora showed that the designed methods yielded promising results, especially the statistical CRF and NB models. The official results show that our CRF model achieved the highest F-score, 0.8297, in the binary classification of Task 2.
This paper considers the problem of learning probability distributions through the Bellman dynamics in distributional reinforcement learning. Previous work learns a finite number of statistics of each return distribution with a neural network, but this constrains the functional form of the return distribution, limiting expressiveness, and makes maintaining the predefined statistics difficult. To remove these restrictions, this paper proposes learning deterministic (pseudo-random) samples of the return distribution using Maximum Mean Discrepancy (MMD), a hypothesis-testing technique. By implicitly matching all moments between the return distribution and the Bellman target, the method guarantees convergence of the distributional Bellman operator, and a finite-sample analysis of the distribution approximation is presented. Experimental results show that the proposed method outperforms distributional RL baselines and achieves the best performance on Atari games among agents that do not use a distributed setup.
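For intuition, the quantity at the heart of that method, the squared MMD between two sample sets under a Gaussian kernel, can be estimated as below; this is a generic scalar sketch with names of my own, not the paper's agent:

```python
import math

def mmd2(xs, ys, sigma=1.0):
    """Biased estimator of the squared Maximum Mean Discrepancy between
    two scalar sample sets under a Gaussian (RBF) kernel. Matching the
    kernel mean embeddings implicitly matches all moments, which is the
    property the paper exploits for distributional RL targets."""
    k = lambda a, b: math.exp(-((a - b) ** 2) / (2 * sigma ** 2))
    kxx = sum(k(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(k(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(k(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy
```

Identical samples give an MMD of zero, and the estimate grows as the two sample sets drift apart, so it can serve as a training loss between predicted return samples and Bellman-target samples.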
Social media has many positive impacts on our culture. It has increased connections between people and created an environment in which you can share your opinions, pictures, and much more.
This presentation covers the introduction, history, and internal management systems of operating systems, and how process scheduling and file management work in Windows.
Pushing the limits of ePRTC: 100ns holdover for 100 days by Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Removing Uninteresting Bytes in Software Fuzzing by Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries: Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security-analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher overall coverage. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns, and DIAR helps you find such seeds.
- These are the slides of a talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops, ICSTW 2022.
UiPath Test Automation using UiPath Test Suite series, part 5 by DianaGray10
Welcome to part 5 of the UiPath Test Automation using UiPath Test Suite series. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Enhancing Adoption of Open Source Libraries: A Case Study on Albumentations.AI by Vladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
The Art of the Pitch: WordPress Relationships and Sales by Laura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... by SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Securing your Kubernetes cluster: a step-by-step guide to success! by KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs by Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many features provide convenience and capability while sacrificing security. This best-practices guide outlines steps users can take to better protect personal devices and information.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... by James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Epistemic Interaction - tuning interfaces to provide information for AI support by Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
4. SUPPORT VECTOR MACHINES
• The SVM is a large-margin classifier that searches for the hyperplane that maximizes the margin between the positive samples and the negative samples.
5. SUPPORT VECTOR MACHINES
• Measures of the capacity of a learning machine: VC dimension, fat-shattering dimension.
• The capacity of a learning machine is related to the margin on the training data:
  - As the margin goes up, the VC dimension may go down, and thus the upper bound on the test error goes down. (Vapnik 79)
6. SUPPORT VECTOR MACHINES
• SVMs’ theoretical guarantees are much weaker than their actual performance: the margin-based upper bounds on the test error are too loose.
• This motivates the SVM-based voting algorithm.
7. SVM BASED VOTING
• Previous work (Dijkstra 02): use an SVM for parse reranking directly.
  - Positive samples: the parse with the highest f-score for each sentence.
• First try:
  - Tree kernel: compute the dot product on the space of all subtrees. (Collins 02)
  - Linear kernel: rich features. (Collins 00)
8. SVM BASED VOTING ALGORITHM
• Use pairwise parses as samples.
• Let x_ij be the j-th candidate parse for the i-th sentence in the training data.
• Let x_i1 be the parse with the highest f-score among all parses for the i-th sentence.
• Positive samples: (x_i1, x_ij), j > 1
• Negative samples: (x_ij, x_i1), j > 1
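The construction on this slide can be sketched in generic Python (names are this sketch's own; `parses` holds the candidates for one sentence as (parse, f_score) pairs):

```python
def pairwise_samples(parses):
    """Build the pairwise training samples: pair the best-f-score parse
    x_i1 of a sentence with every other candidate x_ij, labelling
    (x_i1, x_ij) positive and (x_ij, x_i1) negative.
    parses: list of (parse, f_score) tuples for one sentence."""
    best = max(parses, key=lambda pf: pf[1])
    rivals = [pf for pf in parses if pf is not best]
    pos = [((best[0], p), +1) for p, _ in rivals]  # (x_i1, x_ij) -> +1
    neg = [((p, best[0]), -1) for p, _ in rivals]  # (x_ij, x_i1) -> -1
    return pos + neg
```

Each labelled pair encodes an ordering preference rather than an absolute judgment, which is what the pairwise formulation on the later slides relies on.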
9. PREFERENCE KERNELS
• Let (t1, t2), (v1, v2) be two pairs of parses.
• K: the base kernel, linear or tree kernel.
• The preference kernel is defined as:
  PK((t1, t2), (v1, v2)) = K(t1, v1) - K(t1, v2) - K(t2, v1) + K(t2, v2)
• A pair (t1, t2) represents the difference between a good parse and a bad one; the preference kernel computes the similarity between the two differences.
10. SVM BASED VOTING
• Decision function f of the SVM, for each pair of parses:
  f(x1, x2) = score(x1) - score(x2)
  score(x) = Σ_{i=1}^{N_s} a_i y_i (K(s_i1, x) - K(s_i2, x))
• (s_i1, s_i2) is the i-th support vector.
• N_s is the total number of support vectors.
• y_i is the class of (s_i1, s_i2) and can be {-1, 1}.
• a_i is the Lagrange multiplier solved by the SVM.
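The decision rule on this slide can be sketched as follows (generic Python of my own; K is the base kernel and each support vector is itself a pair of parses):

```python
def score(x, support_vectors, alphas, ys, K):
    """score(x) = sum_i a_i * y_i * (K(s_i1, x) - K(s_i2, x)),
    where (s_i1, s_i2) is the i-th support vector."""
    return sum(a * y * (K(s1, x) - K(s2, x))
               for (s1, s2), a, y in zip(support_vectors, alphas, ys))

def decide(x1, x2, support_vectors, alphas, ys, K):
    """f(x1, x2) = score(x1) - score(x2); positive means x1 is preferred."""
    return (score(x1, support_vectors, alphas, ys, K)
            - score(x2, support_vectors, alphas, ys, K))
```

Because f depends on the two candidates only through their individual scores, the voting step can rank an n-best list by score alone instead of comparing all pairs.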
11. THEORETICAL ISSUES
• Justifying the Preference Kernel
• Justifying Pairwise Samples
• Margin Based Bound for the SVM Based Voting Algorithm
13. JUSTIFYING THE PAIRWISE SAMPLES
• The SVM using single parses as samples searches for a decision function score constrained by:
  - score(x_i1) > 0
  - score(x_ij) < 0, j > 1
  which is too strong.
• Pairwise:
  - score(x_i1) > score(x_ij), j > 1
14. MARGIN BASED BOUND FOR SVM BASED VOTING
• Loss function of voting:
  l_vote(x, f) = 1 if f(x*) < f(x), 0 otherwise
• Loss function of classification:
  l_class(x1, x2, g_f) = 1 if x2 = x* and g_f(x1, x2) = 1,
                         1 if x1 = x* and g_f(x1, x2) = -1,
                         0 otherwise
  (x* denotes the best parse of the sentence)
• The expected voting loss is equal to the expected classification loss. (Herbrich 2000)
15. EXPERIMENTS – WSJ TREEBANK
• N-best parsing results (Collins 02)
• SVM-light (Joachims 98)
• Two kernels K used in the preference kernel:
  - Linear kernel
  - Tree kernel (very slow)
16. EXPERIMENTS – LINEAR KERNEL
• The training data are cut into slices. Slice i contains the two pairwise samples ((p_k1, p_ki), 1) and ((p_ki, p_k1), -1) for each sentence k.
• 22 SVMs trained on 22 slices of the training data.
• 2 days to train one SVM on a Pentium III 1.13 GHz.
17. RESULTS
Experimental results on section 23

≤40 words (2245 sentences)
Model         LR     LP     CBs   0 CBs  2 CBs  f-score
Collins 99    88.5%  88.7%  0.92  66.7%  87.1%  88.6%
Charniak 00   90.1%  90.1%  0.74  70.1%  89.6%  90.1%
Collins 00    90.1%  90.4%  0.75  70.7%  89.6%  90.3%
SVM - linear  89.9%  90.3%  0.73  71.7%  89.4%  90.1%

≤100 words (2416 sentences)
Model         LR     LP     CBs   0 CBs  2 CBs  f-score
Collins 99    88.1%  88.3%  1.06  64.0%  85.1%  88.2%
Charniak 00   89.6%  89.5%  0.88  67.6%  87.7%  89.6%
Collins 00    89.6%  89.5%  0.87  68.3%  87.7%  89.8%
SVM - linear  89.4%  89.8%  0.89  69.2%  87.6%  89.6%
18. CONCLUSIONS
• Using an SVM approach:
  - achieves state-of-the-art results;
  - the SVM with a linear kernel is superior to the tree kernel in both speed and accuracy.