A Comparative Study of SVM, TSVM and SVM-kNN in Text Categorization
By: Sy-Quan Nguyen, Minh-Hoang Nguyen, Phi-Dung Tran
Instructor: Prof. Quang-Thuy Ha; Tuan-Quang Nguyen
Table of contents
Document Classification: Motivation
Text Categorization (11/01/11)
[Figure: a categorization system sorting documents into categories such as Sports, Business, Education and Science]
Document Classification: Problem Definition
Flavors of Classification
Classification Methods
Steps in Document Classification
Support Vector Machine (SVM)
SVM - History and Applications
SVM
Problem
SVM - Separable Case
SVM Linear
SVM - Nonlinear
Transductive SVM
TSVM - Overview
[Figure: decision boundary after SVM compared with after TSVM]
TSVM - Algorithm
SVM - KNN
Problem
Semi-Supervised
Algorithm SVM
Algorithm KNN
Pseudocode KNN
Example
Algorithm SVM-KNN
Pseudocode SVM-KNN [1]
Model SVM1
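[Flowchart, order reconstructed from the box labels on the slide]
1. Start from the initial training set and train SVM1.
2. Use SVM1 to predict labels for all the remaining unlabeled data.
3. Choose 2n boundary vectors; these form the new testing set.
4. Use KNN so that the boundary vectors get new labels.
5. Put them in the training set and retrain a new SVM2.
6. Repeat until the size of the training set is m times the whole data set.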
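The flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `train_linear_svm`, `knn_label` and `svm_knn` are hypothetical, labels are assumed to be +1/-1, the "SVM" is a tiny linear one trained by subgradient descent on the hinge loss (a stand-in for whatever SVM package was actually used), and "boundary vectors" are taken to be the n unlabeled points closest to the decision boundary on each side. The parameters n, k and m follow the slide.

```python
import math

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Tiny linear SVM (stand-in): subgradient descent on the L2-regularised hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]            # L2 shrinkage
            if margin < 1.0:                                    # inside the margin: hinge step
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return lambda x: sum(wj * xj for wj, xj in zip(w, x)) + b   # decision function

def knn_label(x, X, y, k=3):
    """Majority label (+1/-1) among the k training points nearest to x."""
    nearest = sorted(zip(X, y), key=lambda p: math.dist(x, p[0]))[:k]
    return 1 if sum(lab for _, lab in nearest) >= 0 else -1

def svm_knn(X_lab, y_lab, X_unlab, n=1, k=3, m=0.5):
    """Grow the training set with KNN-labelled boundary vectors, retraining each round."""
    X_train, y_train, pool = list(X_lab), list(y_lab), list(X_unlab)
    total = len(X_train) + len(pool)
    f = train_linear_svm(X_train, y_train)                      # SVM1
    while pool and len(X_train) < m * total:
        scored = sorted(pool, key=lambda x: abs(f(x)))          # closest to the boundary first
        boundary = ([x for x in scored if f(x) >= 0][:n] +      # n from the predicted-positive side
                    [x for x in scored if f(x) < 0][:n])        # n from the predicted-negative side
        # Label all boundary vectors with KNN before any of them joins the training set.
        new_labels = [knn_label(x, X_train, y_train, k) for x in boundary]
        for x, lab in zip(boundary, new_labels):
            pool.remove(x)
            X_train.append(x)
            y_train.append(lab)
        f = train_linear_svm(X_train, y_train)                  # retrained SVM2
    return f, X_train, y_train
```

A call such as `svm_knn([[-2.0], [2.0]], [-1, 1], unlabeled_points, n=1, k=1, m=1.0)` keeps iterating until every unlabeled point has been moved into the training set; smaller values of m stop earlier.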
Experiment
 
 
References (slide footer: 11/01/11, Data Mining: Principles and Algorithms)
Thank You

 

Text categorization

Editor's Notes

  1. This is the case in many application areas of machine learning, for example,
  2. For instance, in content-based image retrieval a user usually poses several example images as a query and asks the system to return similar images. In this situation there are many unlabeled examples (i.e., the images that exist in the database) but only a few labeled ones. Another instance is online web page recommendation. When a user is surfing the Internet, he may occasionally encounter some interesting web pages and want the system to bring him similarly interesting pages. It is difficult to require the user to confirm more interesting pages as training examples, because the user may not know where they are. In this instance there are many unlabeled examples but only a few labeled ones. In some cases there is only one labeled training example to rely on. If the initial weakly useful predictor cannot be generated from this single example, the above-mentioned SSL techniques cannot be applied. [4]
  3. Unlike the linear kernel, the RBF kernel nonlinearly maps examples into a higher-dimensional space, so it can handle the case where the relation between class labels and attributes is nonlinear.
  4. This method is very simple: if example x1 has k nearest examples in the feature space and the majority of them have the same label y1, then x1 is assigned to class y1. Although the KNN method rests in theory on a limit theorem, its decision involves only a small number of nearest neighbors, so adopting it can avoid the problem of imbalanced examples. Moreover, KNN depends mainly on a limited number of nearby neighbors rather than on a decision boundary.
  5. Examples located near the boundary are easily misclassified, yet they are likely to be the support vectors; we call them boundary vectors. We therefore pick out these boundary vectors, whose labels are only fuzzily assigned by the weaker classifier, the SVM.
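Note 3's point about the RBF kernel can be made concrete with the XOR pattern, which no straight line separates in the original two-dimensional space. The sketch below assumes the usual form K(x, z) = exp(-gamma * ||x - z||^2). The names `rbf_kernel` and `decision`, the value gamma = 1.0, and the equal-weight kernel expansion are illustrative assumptions; the expansion is not a trained SVM, only a demonstration that a function built from RBF kernels can separate what a linear function cannot.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

# XOR layout: opposite corners of the unit square share a class.
points = [(0, 0), (1, 1), (0, 1), (1, 0)]
labels = [+1, +1, -1, -1]

def decision(x, gamma=1.0):
    """Equal-weight kernel expansion sum_i y_i * K(x_i, x) (illustrative, not a trained SVM)."""
    return sum(y * rbf_kernel(p, x, gamma) for p, y in zip(points, labels))

predictions = [1 if decision(p) > 0 else -1 for p in points]
print(predictions == labels)  # True: every XOR point falls on the correct side
```

At each corner the decision value works out to plus or minus (1 - e^(-gamma))^2, so the sign is correct for any gamma > 0.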