2. Liblinear
SVM:
Finds a hyperplane that separates the sample data into classes
SVR:
Finds a hyperplane that fits the sample data, used to predict continuous values
Example:
PASS  Grade  w1   w2   w3     w4
T     95     4.7  118  1M     172
T     70     3    121  1.2M   181
F     55     3.6  102  0.8M   173
F     48     2.7  108  0.85M  183
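As a rough sketch, the rows of the example table can be written in the sparse text format that LIBLINEAR reads, `<label> <index>:<value> ...` with 1-based feature indices. The column roles are an assumption here: PASS is taken as the class label for SVM, Grade as the numeric target for SVR, and w1..w4 as the features. "1M" is assumed to mean 1,000,000.

```python
# Rows copied from the example table: (PASS, Grade, [w1, w2, w3, w4]).
# Assumption: "1M" means 1,000,000.
rows = [
    ("T", 95, [4.7, 118, 1_000_000, 172]),
    ("T", 70, [3, 121, 1_200_000, 181]),
    ("F", 55, [3.6, 102, 800_000, 173]),
    ("F", 48, [2.7, 108, 850_000, 183]),
]


def to_liblinear_line(label, features):
    """Format one sample as '<label> <index>:<value> ...' (indices start at 1)."""
    pairs = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
    return f"{label} {pairs}"


# Classification (SVM): PASS is encoded as +1 / -1.
svm_lines = [to_liblinear_line("+1" if p == "T" else "-1", x) for p, _, x in rows]

# Regression (SVR): the numeric Grade is the target value.
svr_lines = [to_liblinear_line(g, x) for _, g, x in rows]

for line in svm_lines:
    print(line)
```

The features here differ in scale by several orders of magnitude, so scaling them to a common range before training is usually worth considering.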
6. SOP
Text classification
Text segmentation
N-Gram, HMM
Segmenters for Python (open source)
Loso (囉嗦)
http://opensource.plurk.com/Loso_Chinese_Segmentation_System/
jieba (結巴)
https://github.com/fxsjy/jieba
Smallseg
https://code.google.com/p/smallseg/
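To illustrate the N-gram idea without any segmenter installed, here is a minimal sketch that splits a sentence into overlapping character bigrams. The sample sentence and the function name `char_ngrams` are made up for illustration; the segmenters listed above use dictionaries and statistical models rather than this naive split.

```python
def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text, left to right."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Hypothetical input: "machine learning is fun" in Chinese.
sample = "機器學習很有趣"
bigrams = char_ngrams(sample, 2)
print(bigrams)
```

Every bigram becomes a candidate token, so real words such as 機器 and 學習 appear alongside spurious ones such as 器學; filtering those out is left to a later step.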
7. SOP
Text classification
Feature selection
Garbage in, garbage out
Ex.: Wiki title index
http://dumps.wikimedia.org/zhwiktionary/
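One way to read the title-index example is as a vocabulary filter: only tokens that appear as a known title are kept as features. The sketch below assumes that reading. The small `title_index` set is a hypothetical stand-in for titles loaded from a dump file, and `select_features` is an illustrative name.

```python
from collections import Counter

# Hypothetical stand-in for a title index loaded from a dump file.
title_index = {"機器", "學習", "有趣"}


def select_features(tokens, vocabulary):
    """Count only the tokens that appear in the vocabulary."""
    return Counter(t for t in tokens if t in vocabulary)


# Candidate tokens, for example from a naive bigram split of a sentence.
tokens = ["機器", "器學", "學習", "習很", "很有", "有趣"]
features = select_features(tokens, title_index)
print(dict(features))
```

Tokens outside the vocabulary never reach the classifier, which is the point of "garbage in, garbage out".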
LIBSVM vs LIBLINEAR
LIBSVM: O(n^2) to O(n^3) training time in the number of samples n
LIBLINEAR: O(n)
In practice, LIBSVM becomes painfully slow at around 10k samples.
http://tinyurl.com/ke4btjv
8. SOP
Text classification
Train
Format
-s type: solver type (default 1)
0 -- L2-regularized logistic regression (primal)
1 -- L2-regularized L2-loss support vector classification (dual)
2 -- L2-regularized L2-loss support vector classification (primal)
3 -- L2-regularized L1-loss support vector classification (dual)
4 -- support vector classification by Crammer and Singer
5 -- L1-regularized L2-loss support vector classification
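The solver names combine a regularizer (L1 or L2) with a loss (L1-loss, L2-loss, or logistic). The sketch below writes these pieces out for the two-class case, following the usual definitions: L1-loss is the hinge loss, L2-loss is its square, and the objective is the regularizer plus C times the summed loss. The function names are illustrative, and the multi-class Crammer and Singer solver (type 4) is not covered.

```python
import math


def l1_loss(margin):
    """Hinge loss: max(0, 1 - y * w.x)."""
    return max(0.0, 1.0 - margin)


def l2_loss(margin):
    """Squared hinge loss: max(0, 1 - y * w.x) ** 2."""
    return max(0.0, 1.0 - margin) ** 2


def logistic_loss(margin):
    """Logistic regression loss: log(1 + exp(-y * w.x))."""
    return math.log(1.0 + math.exp(-margin))


def objective(w, samples, C=1.0, loss=l2_loss, l1_regularized=False):
    """Regularizer plus C times the summed loss over (x, y) samples, y in {+1, -1}."""
    if l1_regularized:
        reg = sum(abs(wj) for wj in w)          # L1: sum of |w_j|
    else:
        reg = 0.5 * sum(wj * wj for wj in w)    # L2: 0.5 * w.w
    total = sum(loss(y * sum(wj * xj for wj, xj in zip(w, x))) for x, y in samples)
    return reg + C * total


# Hypothetical weights and samples, chosen only to make the arithmetic easy to follow.
w = [1.0, 0.0]
samples = [([0.5, 0.0], 1), ([2.0, 0.0], 1)]
print(objective(w, samples))                                     # L2-regularized, L2-loss
print(objective(w, samples, loss=l1_loss, l1_regularized=True))  # L1-regularized, L1-loss
```

The primal and dual variants of a solver type minimize the same objective and differ only in how the optimization is carried out.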
9. SOP
Text classification
Train
-c cost
set the parameter C (default 1)
-p epsilon
set the epsilon in loss function of epsilon-SVR (default 0.1)
-e epsilon
set tolerance of termination criterion
-B bias
if bias >= 0, instance x becomes [x; bias]; if < 0, no bias term added (default -1)
-wi weight
weights adjust the parameter C of different classes (see README for details)
-v n
n-fold cross validation mode
-q
quiet mode (no outputs)
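As a sketch of how these options fit together, the helper below assembles an argument list for LIBLINEAR's `train` program, whose usage is `train [options] training_set_file [model_file]`. The helper name `build_train_command`, the file name `train.txt`, and the assumption that `train` is on the PATH are all illustrative. The command is only built here, not executed.

```python
def build_train_command(data_file, solver=1, cost=1.0, bias=-1.0,
                        class_weights=None, folds=None, quiet=False):
    """Build an argv list for LIBLINEAR's `train` from the options above."""
    cmd = ["train", "-s", str(solver), "-c", str(cost), "-B", str(bias)]
    # -wi weight: one flag per class label i, e.g. -w1 2.0
    for label, weight in sorted((class_weights or {}).items()):
        cmd += [f"-w{label}", str(weight)]
    if folds is not None:
        cmd += ["-v", str(folds)]   # n-fold cross validation mode
    if quiet:
        cmd.append("-q")            # quiet mode
    cmd.append(data_file)
    return cmd


# Hypothetical call: 5-fold cross validation with default solver and cost.
cmd = build_train_command("train.txt", folds=5, quiet=True)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` once the LIBLINEAR binaries are built and the data file exists.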
12. Reference
LIBLINEAR
A Library for Large Linear Classification
http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf
L1, L2-Regularization
L1 vs. L2 Regularization and feature selection
http://cs.nyu.edu/~rostami/presentations/L1_vs_L2.pdf
L1-norm Regularization
http://cseweb.ucsd.edu/~saul/teaching/cse291s07/L1norm.pdf
Sparsity and Some Basics of L1 Regularization
http://freemind.pluskid.org/machine-learning/sparsity-and-some-basics-of-l1-regularization/