SlideShare a Scribd company logo
1 of 37
Text Categorization Chapter 16 Foundations of Statistical Natural Language Processing
Outline ,[object Object],[object Object],[object Object],[object Object],[object Object]
Part I ,[object Object]
Classification ,[object Object],[object Object],Parse trees Sentence PP attachment The word’s seneses Context of a word Disambiguation topics Document Text categorization Languages Document Language identification Document authors Document Author identification The word’s (POS) tags Context of a word Tagging Categories Object Problem
Task Description ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Task Formulation ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],A data representation model g(x) = 0 x1 x2 w w = (1,1) b = -1 w x2 + b < 0 w x1 + b > 0 (0,1) (1,0)
Evaluation(1) ,[object Object],[object Object],[object Object],Contingency table d c No was assigned b a Yes was assigned No is correct Yes is correct
Evaluation(2) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Part II ,[object Object]
E.g.  A trained decision tree for category “earnings” Doc = {cts=1, net =3} Node1  7681 articles P(c|n1) = 0.3000 split: cts  value: 2 Node2  5977 articles P(c|n2) = 0.116 split: net  value: 1 Node5  1704 articles P(c|n5) = 0.943 split: vs  value: 2 Node3 5436 articles P(c|n3) = 0.050 Node4 541 articles P(c|n4) = 0.649 Node6 301 articles P(c|n6) = 0.694 Node7 1403  articles P(c|n7) = 0.996 cts < 2 cts >= 2 net<1 Net>= 1 vs <2 vs >= 2
A Closer Look on the E.g. ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data Presentation Model (1) ,[object Object],[object Object],[object Object],[object Object],[object Object],Ref to: Chap 5
Data Presentation Model (2) ,[object Object],[object Object],[object Object],[object Object]
Training Procedure: Growing (1) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Entropy of parent Node Proportion of elements that passed on to the left nodes Ref. Machine Learning
Training Procedure: Growing (2) ,[object Object],[object Object],[object Object],[object Object],cts < 2 cts >= 2 Node1  7681 articles P(c|n1) = 0.3000 split: cts  value: 2 Node2  5977 articles2 p(c|n) = 0.116 Node5  1704 articles P(c|n5) = 0.943
Training Procedure: pruning (1) ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Ref to :chap3 (3.7.1) machine learning
Training Procedure: pruning (2) ,[object Object],[object Object]
Discussion ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Part III ,[object Object],[object Object],[object Object],[object Object]
Basic Idea ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data Presentation Model ,[object Object],[object Object],[object Object]
Model Class ,[object Object],[object Object],[object Object],[object Object]
Training Process: Generalized Iterative Scaling ,[object Object],[object Object],[object Object],[object Object],[object Object]
The Principle of Maximum Entropy ,[object Object],[object Object],[object Object],[object Object],[object Object]
Application to Text Categorization ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Part VI ,[object Object]
Models ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Perceptron learning Procedure: gradient descent ,[object Object],[object Object],[object Object]
Perceptron learning Procedure: Basic Idea ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],j-th item in the weight vector   j-th item in the input vector   expected output & output
Why ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
E.g. w x x w+x s’ s Yes No
Discussion ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Part V ,[object Object]
Nearest Neighbor ,[object Object]
K Nearest Neighbor ,[object Object],[object Object]
Discussion ,[object Object],[object Object],[object Object],[object Object],[object Object]
Thanks!

More Related Content

What's hot

Tdm probabilistic models (part 2)
Tdm probabilistic  models (part  2)Tdm probabilistic  models (part  2)
Tdm probabilistic models (part 2)
KU Leuven
 

What's hot (20)

Presentation on Text Classification
Presentation on Text ClassificationPresentation on Text Classification
Presentation on Text Classification
 
Tdm probabilistic models (part 2)
Tdm probabilistic  models (part  2)Tdm probabilistic  models (part  2)
Tdm probabilistic models (part 2)
 
Lec 4,5
Lec 4,5Lec 4,5
Lec 4,5
 
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
 
Mapping Subsets of Scholarly Information
Mapping Subsets of Scholarly InformationMapping Subsets of Scholarly Information
Mapping Subsets of Scholarly Information
 
Data Mining: Concepts and Techniques — Chapter 2 —
Data Mining:  Concepts and Techniques — Chapter 2 —Data Mining:  Concepts and Techniques — Chapter 2 —
Data Mining: Concepts and Techniques — Chapter 2 —
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Ir models
Ir modelsIr models
Ir models
 
Topic Modeling
Topic ModelingTopic Modeling
Topic Modeling
 
Chapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text miningChapter 11 cluster advanced : web and text mining
Chapter 11 cluster advanced : web and text mining
 
Latent dirichletallocation presentation
Latent dirichletallocation presentationLatent dirichletallocation presentation
Latent dirichletallocation presentation
 
Cluster Analysis
Cluster AnalysisCluster Analysis
Cluster Analysis
 
Capter10 cluster basic
Capter10 cluster basicCapter10 cluster basic
Capter10 cluster basic
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
3.1 clustering
3.1 clustering3.1 clustering
3.1 clustering
 
Learning to Rank - From pairwise approach to listwise
Learning to Rank - From pairwise approach to listwiseLearning to Rank - From pairwise approach to listwise
Learning to Rank - From pairwise approach to listwise
 
10 clusbasic
10 clusbasic10 clusbasic
10 clusbasic
 
Introduction to Clustering algorithm
Introduction to Clustering algorithmIntroduction to Clustering algorithm
Introduction to Clustering algorithm
 
Boolean,vector space retrieval Models
Boolean,vector space retrieval Models Boolean,vector space retrieval Models
Boolean,vector space retrieval Models
 

Viewers also liked

Text categorization with Lucene and Solr
Text categorization with Lucene and SolrText categorization with Lucene and Solr
Text categorization with Lucene and Solr
Tommaso Teofili
 
Text classification
Text classificationText classification
Text classification
James Wong
 
Text clustering
Text clusteringText clustering
Text clustering
KU Leuven
 

Viewers also liked (20)

Text Categorization
Text CategorizationText Categorization
Text Categorization
 
Text categorization with Lucene and Solr
Text categorization with Lucene and SolrText categorization with Lucene and Solr
Text categorization with Lucene and Solr
 
Tutorial on Text Categorization, EACL, 2003
Tutorial on Text Categorization, EACL, 2003Tutorial on Text Categorization, EACL, 2003
Tutorial on Text Categorization, EACL, 2003
 
NLP for Robotics
NLP for RoboticsNLP for Robotics
NLP for Robotics
 
Textmining Predictive Models
Textmining Predictive ModelsTextmining Predictive Models
Textmining Predictive Models
 
[ppt]
[ppt][ppt]
[ppt]
 
It
ItIt
It
 
Text categorization
Text categorizationText categorization
Text categorization
 
Text categorization using Rough Set
Text categorization using Rough SetText categorization using Rough Set
Text categorization using Rough Set
 
[DL Hacks] Learning Transferable Features with Deep Adaptation Networks
[DL Hacks] Learning Transferable Features with Deep Adaptation Networks[DL Hacks] Learning Transferable Features with Deep Adaptation Networks
[DL Hacks] Learning Transferable Features with Deep Adaptation Networks
 
Text Classification/Categorization
Text Classification/CategorizationText Classification/Categorization
Text Classification/Categorization
 
Text classification
Text classificationText classification
Text classification
 
Chainerを使ったらカノジョができたお話
Chainerを使ったらカノジョができたお話Chainerを使ったらカノジョができたお話
Chainerを使ったらカノジョができたお話
 
NLP
NLPNLP
NLP
 
Text clustering
Text clusteringText clustering
Text clustering
 
Text categorization
Text categorizationText categorization
Text categorization
 
ICML2016読み会 概要紹介
ICML2016読み会 概要紹介ICML2016読み会 概要紹介
ICML2016読み会 概要紹介
 
Icml読み会 deep speech2
Icml読み会 deep speech2Icml読み会 deep speech2
Icml読み会 deep speech2
 
Meta-Learning with Memory Augmented Neural Network
Meta-Learning with Memory Augmented Neural NetworkMeta-Learning with Memory Augmented Neural Network
Meta-Learning with Memory Augmented Neural Network
 
ベイジアンネットとレコメンデーション -第5回データマイニング+WEB勉強会@東京
ベイジアンネットとレコメンデーション -第5回データマイニング+WEB勉強会@東京ベイジアンネットとレコメンデーション -第5回データマイニング+WEB勉強会@東京
ベイジアンネットとレコメンデーション -第5回データマイニング+WEB勉強会@東京
 

Similar to 20070702 Text Categorization

Jörg Stelzer
Jörg StelzerJörg Stelzer
Jörg Stelzer
butest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
butest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
butest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
butest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
butest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
butest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
butest
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
butest
 
Machine learning and Neural Networks
Machine learning and Neural NetworksMachine learning and Neural Networks
Machine learning and Neural Networks
butest
 
Machine Learning: Decision Trees Chapter 18.1-18.3
Machine Learning: Decision Trees Chapter 18.1-18.3Machine Learning: Decision Trees Chapter 18.1-18.3
Machine Learning: Decision Trees Chapter 18.1-18.3
butest
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
butest
 
lecture_mooney.ppt
lecture_mooney.pptlecture_mooney.ppt
lecture_mooney.ppt
butest
 

Similar to 20070702 Text Categorization (20)

Jörg Stelzer
Jörg StelzerJörg Stelzer
Jörg Stelzer
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Machine Learning and Statistical Analysis
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
ppt
pptppt
ppt
 
Machine learning and Neural Networks
Machine learning and Neural NetworksMachine learning and Neural Networks
Machine learning and Neural Networks
 
Lect4
Lect4Lect4
Lect4
 
Classifiers
ClassifiersClassifiers
Classifiers
 
Machine Learning: Decision Trees Chapter 18.1-18.3
Machine Learning: Decision Trees Chapter 18.1-18.3Machine Learning: Decision Trees Chapter 18.1-18.3
Machine Learning: Decision Trees Chapter 18.1-18.3
 
nnml.ppt
nnml.pptnnml.ppt
nnml.ppt
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 
DWDM-AG-day-1-2023-SEC A plus Half B--.pdf
DWDM-AG-day-1-2023-SEC A plus Half B--.pdfDWDM-AG-day-1-2023-SEC A plus Half B--.pdf
DWDM-AG-day-1-2023-SEC A plus Half B--.pdf
 
[ppt]
[ppt][ppt]
[ppt]
 
[ppt]
[ppt][ppt]
[ppt]
 
Discovering Novel Information with sentence Level clustering From Multi-docu...
Discovering Novel Information with sentence Level clustering  From Multi-docu...Discovering Novel Information with sentence Level clustering  From Multi-docu...
Discovering Novel Information with sentence Level clustering From Multi-docu...
 
lecture_mooney.ppt
lecture_mooney.pptlecture_mooney.ppt
lecture_mooney.ppt
 

Recently uploaded

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 

Recently uploaded (20)

Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 

20070702 Text Categorization

  • 1. Text Categorization Chapter 16 Foundations of Statistical Natural Language Processing
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10. E.g. A trained decision tree for category “earnings” Doc = {cts=1, net =3} Node1 7681 articles P(c|n1) = 0.3000 split: cts value: 2 Node2 5977 articles P(c|n2) = 0.116 split: net value: 1 Node5 1704 articles P(c|n5) = 0.943 split: vs value: 2 Node3 5436 articles P(c|n3) = 0.050 Node4 541 articles P(c|n4) = 0.649 Node6 301 articles P(c|n6) = 0.694 Node7 1403 articles P(c|n7) = 0.996 cts < 2 cts >= 2 net<1 Net>= 1 vs <2 vs >= 2
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31. E.g. w x x w+x s’ s Yes No
  • 32.
  • 33.
  • 34.
  • 35.
  • 36.