General guide in NLP

微光國際資訊有限公司 2018/05/16

About this chapter
• Introduction - Machine Learning in NLP
• Pipelines
• Tokenization
• Bag of Words and TF-IDF
• Topic Model
• Library implementation
• Visualization

Introduction - Machine Learning in NLP

Machine Learning
Supervised
• K-NN
• Logistic regression
• Decision tree
• Classification
Unsupervised
• K-means
• Clustering
• Topic Model
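
To make the supervised / unsupervised split concrete, here is a minimal sketch in plain Python. The 1-D data points, the labels "short" and "long", and the function names are all hypothetical and chosen only for illustration. The K-NN function needs labels; the K-means function sees only the raw points.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Supervised: vote among the k labeled points nearest to `query`."""
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def kmeans_1d(points, centers, iterations=10):
    """Unsupervised: group unlabeled points around the given starting centers."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign each point to its closest center.
            idx = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[idx].append(p)
        # Move each center to the mean of its cluster (unchanged if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical toy data: (value, label) pairs.
labeled = [(1.0, "short"), (1.5, "short"), (2.0, "short"),
           (8.0, "long"), (9.0, "long"), (9.5, "long")]
unlabeled = [x for x, _ in labeled]  # same points with the labels thrown away

prediction = knn_predict(labeled, 8.4)
centers, clusters = kmeans_1d(unlabeled, centers=[0.0, 10.0])
print(prediction)
print(clusters)
```

K-means recovers the two groups without ever seeing a label, which is the setting topic models work in as well.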

Let's define Topic model
“Topic models provide a simple way to analyze large volumes of unlabeled text. A "topic" consists of a cluster of words that frequently occur together.”
(MALLET, CS at UMass)
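
The phrase "words that frequently occur together" can be illustrated by counting word pairs that share a document. This is only the intuition behind a topic, not a topic model; the four toy documents and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(documents):
    """Count how many documents each unordered word pair appears in together."""
    counts = Counter()
    for doc in documents:
        # sorted(set(...)) gives each pair one canonical (alphabetical) order per document.
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical, already tokenized documents.
docs = [
    ["fox", "dog", "jump"],
    ["fox", "dog", "lazy"],
    ["stock", "market", "price"],
    ["market", "price", "fox"],
]

pairs = cooccurrence(docs)
print(pairs[("dog", "fox")], pairs[("market", "price")], pairs[("dog", "market")])
```

Pairs such as ("dog", "fox") and ("market", "price") co-occur in more than one document, while ("dog", "market") never does; a topic model groups words along such patterns without any labels.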

Pipelines
Document -> preprocess -> word vector -> Model -> Visualization
preprocess
• tokenization
• stopword removal
• lemmatization
• argument
word vector
• BoW
• Dictionary
Model
• LDA
• LSA
• PLSA

Tokenization
The quick brown fox jumps over the lazy dog.
jumps -> jump
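
A minimal sketch of the preprocessing steps applied to the example sentence, using only the Python standard library. The stopword set and the lemma lookup are toy assumptions standing in for a real stopword list and lemmatizer.

```python
import re

STOPWORDS = {"the", "a", "an"}   # illustrative only, not a standard list
LEMMAS = {"jumps": "jump"}       # toy lookup standing in for a lemmatizer

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def preprocess(text):
    """Tokenize, drop stopwords, then map each token to its lemma."""
    kept = [t for t in tokenize(text) if t not in STOPWORDS]
    return [LEMMAS.get(t, t) for t in kept]

sentence = "The quick brown fox jumps over the lazy dog."
tokens = tokenize(sentence)
cleaned = preprocess(sentence)
print(tokens)
print(cleaned)
```

The cleaned result is the word list quick, brown, fox, jump, over, lazy, dog.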

Bag of Words and TF-IDF
quick: 1
brown: 2
fox: 3
jump: 4
over: 5
lazy: 6
dog: 7
Document: 0 1 4 4 3 0 7
Document: 12 0 1 2 9 3 1
Document: 10 5 20 4 0 29 3
Dimension: 7
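
A sketch of how a dictionary and bag-of-words vectors could be built, assuming word ids start at 1 as in the numbering above. The second and third documents are hypothetical, and the function names are illustrative rather than any library's API.

```python
from collections import Counter

def build_dictionary(documents):
    """Assign each new word the next id, starting at 1, in order of first appearance."""
    dictionary = {}
    for doc in documents:
        for token in doc:
            if token not in dictionary:
                dictionary[token] = len(dictionary) + 1
    return dictionary

def bag_of_words(doc, dictionary):
    """Return a count vector with one entry per dictionary word."""
    counts = Counter(doc)
    vector = [0] * len(dictionary)
    for token, idx in dictionary.items():
        vector[idx - 1] = counts[token]  # ids are 1-based, list positions 0-based
    return vector

docs = [
    ["quick", "brown", "fox", "jump", "over", "lazy", "dog"],
    ["fox", "fox", "dog"],    # hypothetical document
    ["lazy", "dog", "lazy"],  # hypothetical document
]

dictionary = build_dictionary(docs)
vectors = [bag_of_words(d, dictionary) for d in docs]
print(dictionary)
print(vectors)
```

Every document becomes a vector of the same length, the dictionary size, regardless of how many words the document contains.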

Bag of Words and TF-IDF
Term Frequency (word occurrence frequency) × Inverse Document Frequency (weight value)
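
A minimal sketch of the product above. It assumes one common variant: term frequency as count divided by document length, and inverse document frequency as log(N / df), with N the number of documents and df the number of documents containing the term. Libraries often use smoothed variants, so their numbers may differ. The toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per document, weight = tf * idf."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical, already preprocessed documents.
docs = [["quick", "fox", "jump"], ["lazy", "dog"], ["fox", "dog", "dog"]]
weights = tf_idf(docs)
print(weights[0])
```

In the first document "quick" appears in no other document, so it gets a higher weight than "fox", which also appears in the third.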

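[Figure: topic model diagram. Labels: topic weight parameter; parameter for word weights within a topic; document; number of topics; word; total number of words]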
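
The labels above (a topic weight parameter, a word-within-topic weight parameter, documents, topics, words) resemble the quantities in Latent Dirichlet Allocation, which the pipeline lists as LDA. Assuming that is what the diagram depicts, the generative story can be sketched as follows. The names `alpha` and `beta`, the vocabulary, and all numeric values are assumptions for illustration; this samples a document from the model and does not fit one.

```python
import random

def sample_dirichlet(concentration, size, rng):
    """Draw a probability vector from a symmetric Dirichlet distribution."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(vocab, topic_word, num_words, alpha, rng):
    """Sample topic weights for one document, then sample each word via a topic."""
    num_topics = len(topic_word)
    doc_topic = sample_dirichlet(alpha, num_topics, rng)  # topic weights of this document
    words = []
    for _ in range(num_words):
        topic = rng.choices(range(num_topics), weights=doc_topic)[0]
        words.append(rng.choices(vocab, weights=topic_word[topic])[0])
    return doc_topic, words

rng = random.Random(0)  # fixed seed so repeated runs give the same sample
vocab = ["quick", "brown", "fox", "jump", "over", "lazy", "dog"]
num_topics = 2
beta = 0.5              # assumed concentration for word weights within a topic
# One word distribution per topic, shared by all documents.
topic_word = [sample_dirichlet(beta, len(vocab), rng) for _ in range(num_topics)]

doc_topic, words = generate_document(vocab, topic_word, num_words=8, alpha=1.0, rng=rng)
print(doc_topic)
print(words)
```

Fitting a topic model runs this story in reverse: given only the words, it estimates the topic weights per document and the word weights per topic.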
