20 mins of Liblinear

Published in: Technology, Education

  1. LIBLINEAR IN 20 MINS
     Chandler Huang, previa [at] gmail.com
  2. 2. Liblinear SVM: Looking for a hyper-plane to separate sampledata SVR: Looking for a hyper-plane to predict datadistribution Example:PASS Grade w1 w2 w3 w4T 95 4.7 118 1M 172T 70 3 121 1.2M 181F 55 3.6 102 0.8M 173F 48 2.7 108 0.85M 183
  3. 3. Liblinear Both solve with different
  4. Python wrapper of Liblinear
     liblinear.py
       liblinear = CDLL(path.join(dirname, '../liblinear.so.1'))
       Classes: feature_node, problem, parameter, model
     liblinearutil.py
       import liblinear
       load_model(), save_model(), evaluations(), train(), predict()
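The wrapper on slide 4 talks to the shared library through ctypes. As a rough sketch of the idea, the block below defines a ctypes structure with the same two fields as Liblinear's C feature node (an int index and a double value) and packs a sparse sample into an array terminated by index -1. The names FeatureNode and to_sparse_array are hypothetical; the real wrapper's class is feature_node and its construction code may differ.

```python
from ctypes import Structure, c_double, c_int


class FeatureNode(Structure):
    """Hypothetical mirror of the C struct { int index; double value; }."""

    _fields_ = [("index", c_int), ("value", c_double)]


def to_sparse_array(features):
    """Pack {index: value} into a C array that ends with index -1."""
    # Drop zero entries and sort by feature index (sparse representation).
    items = sorted((i, v) for i, v in features.items() if v != 0)
    array = (FeatureNode * (len(items) + 1))()
    for slot, (i, v) in enumerate(items):
        array[slot].index = i
        array[slot].value = v
    array[len(items)].index = -1  # terminator node
    return array


nodes = to_sparse_array({3: 118.0, 1: 95.0, 2: 0.0})
pairs = [(n.index, n.value) for n in nodes]
```

Only the non-zero features are stored, which is what keeps high-dimensional text data cheap to pass to the library.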
  5. 5. SOP Text classification Text segmentation Feature selection Train model Verify testing data
  6. 6. SOP Text classification Text segmentation N-Gram, HMM Segmentor for Python (Opensource) 囉嗦(Loso) http://opensource.plurk.com/Loso_Chinese_Segmentation_System/ 結巴(jieba) https://github.com/fxsjy/jieba Smallseg https://code.google.com/p/smallseg/
  7. 7. SOP Text classification Feature selection Garbage in garbage out EX: Wiki title index http://dumps.wikimedia.org/zhwiktionary/ Libsvm vs Liblinear Libsvm:O(n2) or O(n3) Liblinear: O(n) in practice libsvm becomes painfully slow at 10k samples. http://tinyurl.com/ke4btjv
  8. 8. SOP Text classification Train Format Solver type(default 1)0 -- L2-regularized logistic regression (primal)1 -- L2-regularized L2-loss support vector classification(dual)2 -- L2-regularized L2-loss support vector classification(primal)3 -- L2-regularized L1-loss support vector classification(dual)4 -- support vector classification by Crammer and Singer5 -- L1-regularized L2-loss support vector classification
  9. 9. SOP Text classification Train -c cost set the parameter C (default 1) -p epsilon set the epsilon in loss function of epsilon-SVR (default 0.1) -e epsilon set tolerance of termination criterion -B bias if bias >= 0, instance x becomes [x; bias]; if < 0, no bias term added (default -1) -wi weight weights adjust the parameter C of different classes (see README for details) -v n n-fold cross validation mode -q quiet mode (no outputs)
  10. 10. SOP Text classification Verify testing data Using predict()
  11. LIVE DEMO
  12. 12. Reference LIBLINEAR A Library for Large Linear Classication http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf L1, L2-Regularization L1 vs. L2 Regularization and feature selection http://cs.nyu.edu/~rostami/presentations/L1_vs_L2.pdf L1-norm Regularization http://cseweb.ucsd.edu/~saul/teaching/cse291s07/L1norm.pdf Sparsity and Some Basics of L1 Regularization http://freemind.pluskid.org/machine-learning/sparsity-and-some-basics-of-l1-regularization/
  13. 13. Reference Segmentor 四款python中文分词系统简单测试 http://hi.baidu.com/fooying/item/6ae7a0e26087e8d7eb34c9e8 MMSEG http://technology.chtsai.org/mmseg/ 開源中國,中文分詞庫 http://tinyurl.com/k564x9k
  14. THANKS
