LIBLINEAR IN 20 MINS
Chandler Huang
previa [at] gmail.com
Liblinear
- SVM: find a hyperplane that separates the sample data into classes
- SVR: find a hyperplane (linear function) that fits the data to predict a real-valued target
- Example:

  PASS  Grade  w1   w2   w3     w4
  T     95     4.7  118  1M     172
  T     70     3    121  1.2M   181
  F     55     3.6  102  0.8M   173
  F     48     2.7  108  0.85M  183
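LIBLINEAR reads training data in the same sparse text format as LIBSVM. Each line holds one instance, written as `<label> <index>:<value> ...`, and feature indices start at 1 in ascending order.

The sketch below turns the example table into that format. It rests on these assumptions:
- PASS is the SVM class label, with T mapped to 1 and F mapped to -1.
- Grade is the SVR regression target.
- w1..w4 are features 1..4.
- The "M" suffix means millions.

The helper names are hypothetical.

```python
# Example table from the slide: PASS (T/F), Grade, then features w1..w4.
TABLE = [
    ("T", "95", "4.7", "118", "1M", "172"),
    ("T", "70", "3", "121", "1.2M", "181"),
    ("F", "55", "3.6", "102", "0.8M", "173"),
    ("F", "48", "2.7", "108", "0.85M", "183"),
]


def parse_value(text: str) -> float:
    """Parse '1.2M' as 1.2 million (assumed meaning of the suffix); else a plain float."""
    if text.endswith("M"):
        return float(text[:-1] + "e6")
    return float(text)


def fmt(value: float) -> str:
    """Print integral values without a trailing '.0'."""
    return str(int(value)) if value.is_integer() else repr(value)


def to_sparse_line(label: str, features: list) -> str:
    """One instance as '<label> <index>:<value> ...'; indices start at 1, zeros omitted."""
    pairs = [f"{i}:{fmt(v)}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([label] + pairs)


def table_to_lines(task: str) -> list:
    """task='svm': label is PASS mapped to 1/-1 (assumption); task='svr': label is Grade."""
    lines = []
    for passed, grade, *raw in TABLE:
        features = [parse_value(x) for x in raw]
        label = ("1" if passed == "T" else "-1") if task == "svm" else grade
        lines.append(to_sparse_line(label, features))
    return lines


print("\n".join(table_to_lines("svm")))
```

Note the scale gap between w3 (millions) and the other columns. Scaling features to a comparable range before training is usually advisable.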
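Liblinear
- Both linear SVM and logistic regression solve the unconstrained problem
  $$\min_{w}\; \frac{1}{2} w^{T} w + C \sum_{i=1}^{l} \xi(w; x_i, y_i)$$
  over training pairs $(x_i, y_i)$, $y_i \in \{-1, +1\}$, with penalty parameter $C > 0$,
- with different loss functions $\xi(w; x_i, y_i)$:
  - L1-loss SVM: $\max(1 - y_i w^{T} x_i,\, 0)$
  - L2-loss SVM: $\max(1 - y_i w^{T} x_i,\, 0)^{2}$
  - Logistic regression: $\log(1 + e^{-y_i w^{T} x_i})$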
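The LIBLINEAR paper uses one formulation for both linear SVM and logistic regression. Both minimize ½wᵀw + C Σᵢ ξ(w; xᵢ, yᵢ) with yᵢ ∈ {-1, +1}. Only the loss ξ changes:
- hinge loss for the L1-loss SVM
- squared hinge loss for the L2-loss SVM
- logistic loss for logistic regression

The plain-Python sketch below evaluates that objective for a given w on toy data and shows the hyperplane decision rule. The numbers are made up, there is no bias term, and the code does not solve the optimization.

```python
import math


def dot(w, x):
    """Inner product w^T x of two dense vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def l1_loss(w, x, y):
    """Hinge loss (L1-loss SVM): max(1 - y w^T x, 0)."""
    return max(1.0 - y * dot(w, x), 0.0)


def l2_loss(w, x, y):
    """Squared hinge loss (L2-loss SVM): max(1 - y w^T x, 0)^2."""
    return max(1.0 - y * dot(w, x), 0.0) ** 2


def lr_loss(w, x, y):
    """Logistic loss log(1 + exp(-y w^T x)), arranged so exp() cannot overflow."""
    z = y * dot(w, x)
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def objective(w, data, C, loss):
    """0.5 * w^T w + C * sum of loss(w, x_i, y_i) over (x_i, y_i) in data."""
    return 0.5 * dot(w, w) + C * sum(loss(w, x, y) for x, y in data)


def predict_label(w, x):
    """Which side of the hyperplane w^T x = 0 the point falls on (+1 or -1)."""
    return 1 if dot(w, x) >= 0 else -1


# Toy example with made-up numbers: two features, three labelled points.
w = [0.5, -0.25]
data = [([2.0, 0.0], 1), ([0.0, 2.0], -1), ([1.0, 1.0], 1)]
print(objective(w, data, 1.0, l1_loss))  # 1.40625
print(objective(w, data, 1.0, l2_loss))  # 0.96875
```

Swapping the loss argument is the only change between the three models, which mirrors how one library can cover all of them.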
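Python wrapper of Liblinear
- liblinear.py
  - Loads the C library through ctypes (CDLL from ctypes, path from os; dirname is the directory containing liblinear.py):
    liblinear = CDLL(path.join(dirname, '../liblinear.so.1'))
  - Classes: feature_node, problem, parameter, model
- liblinearutil.py
  - import liblinear
  - Functions: load_model(), save_model(), evaluations(), train(), predict()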
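liblinear.py mirrors LIBLINEAR's C structs as ctypes Structure classes. The sketch below shows the idea for feature_node only. It is not the actual liblinear.py code, which also builds problem, parameter and model and handles the bias term.

Each instance becomes a C array of (index, value) pairs. The array ends with a node whose index is -1, as LIBLINEAR's C interface expects. `to_feature_nodes` is a hypothetical helper, and no shared library is loaded here.

```python
from ctypes import Structure, c_double, c_int


class feature_node(Structure):
    """Same layout as the C struct: an int index and a double value."""
    _fields_ = [("index", c_int), ("value", c_double)]


def to_feature_nodes(features):
    """Convert {index: value} into a C array of feature_node ending with index -1.

    Zero values are dropped (the format is sparse) and indices are sorted,
    since LIBLINEAR expects ascending feature indices.
    """
    items = sorted((i, v) for i, v in features.items() if v != 0)
    nodes = (feature_node * (len(items) + 1))()  # ctypes arrays start zero-filled
    for slot, (index, value) in enumerate(items):
        nodes[slot].index = index
        nodes[slot].value = value
    nodes[len(items)].index = -1  # end-of-instance marker
    return nodes


row = to_feature_nodes({3: 1_000_000.0, 1: 4.7, 2: 0.0})
print([(n.index, n.value) for n in row])  # [(1, 4.7), (3, 1000000.0), (-1, 0.0)]
```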
SOP
- Text classification
  - Text segmentation
  - Feature selection
  - Train model
  - Verify testing data
SOP
- Text classification
  - Text segmentation
    - N-Gram, HMM
    - Segmentors for Python (open source)
      - 囉嗦 (Loso): http://opensource.plurk.com/Loso_Chinese_Segmentation_System/
      - 結巴 (jieba): https://github.com/fxsjy/jieba
      - Smallseg: https://code.google.com/p/smallseg/
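Word segmentation is not the only way to get features from Chinese text. An N-gram approach can use overlapping character n-grams directly, with no dictionary.

Below is a minimal sketch of character bigram features; the function names are hypothetical. With a segmentor such as jieba installed, `jieba.lcut(text)` returns a word list that could be counted in the same way.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Overlapping character n-grams; whitespace is skipped, punctuation kept."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def ngram_counts(text, n=2):
    """Bag of character n-grams, ready to be mapped to feature indices."""
    return Counter(char_ngrams(text, n))


print(char_ngrams("結巴中文分詞"))  # ['結巴', '巴中', '中文', '文分', '分詞']
```

The trade-off is many more (and noisier) features than word segmentation produces. Sparse, high-dimensional input of this kind is where a linear solver is a good fit.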
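SOP
- Text classification
  - Feature selection
    - Garbage in, garbage out
    - E.g. the title index of the Chinese Wiktionary dump: http://dumps.wikimedia.org/zhwiktionary/
  - Libsvm vs Liblinear
    - Libsvm: O(n²) to O(n³) training time in the number of training instances n
    - Liblinear: roughly O(n)
    - In practice, kernel Libsvm training slows down sharply as the training set grows, while Liblinear stays fast on large, sparse data such as text
    - http://tinyurl.com/ke4btjv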
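In this pipeline, feature selection means deciding which tokens become features. The sketch below rests on these assumptions:
- Documents are already segmented into token lists.
- `allowed_terms` stands in for a term list such as titles extracted from the Wiktionary dump; loading the dump is not shown.
- The document-frequency threshold `min_df` is an extra assumption.

Indices start at 1 to match LIBLINEAR's data format.

```python
from collections import Counter


def build_vocabulary(docs, allowed_terms, min_df=2):
    """Map each kept term to a 1-based feature index.

    A term is kept if it is in allowed_terms and appears in at least
    min_df documents; everything else is treated as noise.
    """
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    kept = sorted(t for t, df in doc_freq.items() if df >= min_df and t in allowed_terms)
    return {term: index for index, term in enumerate(kept, start=1)}


def vectorize(tokens, vocab):
    """Term counts as {feature_index: count}; tokens outside vocab are dropped."""
    counts = Counter(t for t in tokens if t in vocab)
    return {vocab[t]: c for t, c in counts.items()}


docs = [["機器", "學習", "的"], ["機器", "翻譯", "的"], ["學習", "機器"]]
vocab = build_vocabulary(docs, allowed_terms={"機器", "學習", "翻譯"})
print(vectorize(["機器", "機器", "的", "學習"], vocab))
```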
SOP
- Text classification
  - Train
    - Format
    - Solver type (-s, default 1)
      0 -- L2-regularized logistic regression (primal)
      1 -- L2-regularized L2-loss support vector classification (dual)
      2 -- L2-regularized L2-loss support vector classification (primal)
      3 -- L2-regularized L1-loss support vector classification (dual)
      4 -- support vector classification by Crammer and Singer
      5 -- L1-regularized L2-loss support vector classification
SOP
- Text classification
  - Train
      -c cost: set the parameter C (default 1)
      -p epsilon: set the epsilon in the loss function of epsilon-SVR (default 0.1)
      -e epsilon: set the tolerance of the termination criterion
      -B bias: if bias >= 0, instance x becomes [x; bias]; if < 0, no bias term is added (default -1)
      -wi weight: weights adjust the parameter C of different classes (see README for details)
      -v n: n-fold cross validation mode
      -q: quiet mode (no outputs)
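In the Python wrapper, these flags are passed as one option string, e.g. `train(y, x, '-s 1 -c 1')`. With `-v`, `train()` returns the cross-validation result instead of a model.

Below is a sketch of a hypothetical helper, `make_train_options`, that assembles such a string from keyword arguments and rejects obviously invalid values. It covers only `-s` and a subset of the flags listed above.

```python
def make_train_options(solver=1, cost=1.0, bias=-1.0, class_weights=None,
                       folds=None, quiet=True):
    """Assemble a LIBLINEAR option string such as '-s 1 -c 1 -B 1 -q'."""
    if cost <= 0:
        raise ValueError("C must be positive")
    opts = [f"-s {solver}", f"-c {cost:g}"]
    if bias >= 0:  # only a non-negative bias adds the extra [x; bias] feature
        opts.append(f"-B {bias:g}")
    for label, weight in sorted((class_weights or {}).items()):
        opts.append(f"-w{label} {weight:g}")  # scales C for that class label
    if folds is not None:
        if folds < 2:
            raise ValueError("cross validation needs at least 2 folds")
        opts.append(f"-v {folds}")
    if quiet:
        opts.append("-q")
    return " ".join(opts)


print(make_train_options())  # -s 1 -c 1 -q
print(make_train_options(solver=0, cost=4, bias=1,
                         class_weights={1: 2, -1: 0.5}, folds=5, quiet=False))
```

Keeping the options in one place makes it easy to loop over several C values with -v before training the final model.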
SOP
- Text classification
  - Verify testing data
    - Run predict() on the testing data and check the reported accuracy
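In liblinearutil, `predict(y, x, m)` returns `(p_labels, p_acc, p_vals)`. `p_acc` holds three numbers, the same ones `evaluations()` computes:
- accuracy
- mean squared error
- squared correlation coefficient

Below is a stand-alone re-implementation sketch of those metrics for checking predictions by hand. It is written for illustration and is not the library's own code.

```python
def evaluate(true_values, predicted_values):
    """Return (accuracy %, mean squared error, squared correlation coefficient)."""
    if len(true_values) != len(predicted_values) or not true_values:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(true_values)
    pairs = list(zip(true_values, predicted_values))
    correct = sum(1 for t, p in pairs if t == p)
    mse = sum((p - t) ** 2 for t, p in pairs) / n
    sum_t, sum_p = sum(true_values), sum(predicted_values)
    sum_tt = sum(t * t for t in true_values)
    sum_pp = sum(p * p for p in predicted_values)
    sum_tp = sum(t * p for t, p in pairs)
    denominator = (n * sum_pp - sum_p ** 2) * (n * sum_tt - sum_t ** 2)
    # SCC is undefined when either sequence is constant; report NaN then.
    scc = (n * sum_tp - sum_t * sum_p) ** 2 / denominator if denominator else float("nan")
    return 100.0 * correct / n, mse, scc


acc, mse, scc = evaluate([1, 1, -1, -1], [1, -1, -1, -1])
print(acc, mse)  # 75.0 1.0
```

Accuracy is the number to watch for classification. MSE and SCC matter for regression (SVR).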
LIVE DEMO
Reference
- LIBLINEAR: A Library for Large Linear Classification
  http://www.csie.ntu.edu.tw/~cjlin/papers/liblinear.pdf
- L1, L2 regularization
  - L1 vs. L2 Regularization and feature selection
    http://cs.nyu.edu/~rostami/presentations/L1_vs_L2.pdf
  - L1-norm Regularization
    http://cseweb.ucsd.edu/~saul/teaching/cse291s07/L1norm.pdf
  - Sparsity and Some Basics of L1 Regularization
    http://freemind.pluskid.org/machine-learning/sparsity-and-some-basics-of-l1-regularization/
Reference
- Segmentor
  - A simple test of four Python Chinese word segmentation systems (四款python中文分词系统简单测试)
    http://hi.baidu.com/fooying/item/6ae7a0e26087e8d7eb34c9e8
  - MMSEG
    http://technology.chtsai.org/mmseg/
  - OSChina (開源中國): Chinese word segmentation libraries
    http://tinyurl.com/k564x9k
THANKS
