Text classification
Outline
 Purpose of the survey
 Introduction
 Applications of text classification
 Approaches and methods in text classification
 Summary
Purpose of the survey
To review the state of the art for the text classification problem.
Introduction
 Text classification (TC) is a text mining task and one of the important fields in natural language processing.
 Text classification assigns one or more classes to a document according to its content.
Applications of text classification
o CRM tasks
o Social media
o E-mail spam filtering
o Sentiment analysis
o Commercial world
o Question answering systems and dialogue agents
o Others
Approaches and methods in text classification

Methods in text classification
o Rule-based (also called rule classification): uses handcrafted rules to classify text.
o Statistical: uses machine learning and deep learning.
Machine learning for text classification
Uses a bag-of-words (BOW) representation to extract features from text for use in machine learning algorithms.
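A minimal sketch of the BOW idea, building a document-term count matrix with scikit-learn; the library choice and the two toy documents are assumptions, not taken from the slides.

```python
# Minimal bag-of-words sketch (scikit-learn and the toy documents are illustrative).
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "cheap pills buy now",        # spam-like toy text
    "meeting agenda for monday",  # ham-like toy text
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)          # sparse document-term count matrix

print(vectorizer.get_feature_names_out())   # vocabulary learned from the corpus
print(X.toarray())                          # each row is the BOW vector of one document
```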
Machine learning algorithms for text classification
 Decision Trees.
 Support Vector Machines.
 Naïve Bayes.
 K-Nearest Neighbors.
 Hidden Markov Models.
Decision Trees
 A decision tree is a tree whose internal nodes are tests and whose leaf nodes are categories.
 Their ability to learn disjunctive expressions and their robustness to noisy data make them convenient for document classification.
 Decision-tree learning cannot guarantee returning the globally optimal tree.
 High cost.
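A minimal sketch of a decision-tree text classifier over BOW features; scikit-learn and the tiny spam/ham dataset are illustrative assumptions, not part of the original material.

```python
# Toy decision-tree text classifier over bag-of-words features.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

train_texts = ["win a free prize now", "free cash offer",
               "project status update", "schedule the team meeting"]
train_labels = ["spam", "spam", "ham", "ham"]

model = make_pipeline(CountVectorizer(), DecisionTreeClassifier(random_state=0))
model.fit(train_texts, train_labels)          # greedy, locally optimal splits

print(model.predict(["free prize meeting"]))  # classify an unseen toy document
```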
Decision Trees
▷ Harrag, El-Qawasmeh & Pichappan used a decision tree for Arabic text classification. They suggested a hybrid technique that applies a document frequency threshold together with an embedded information gain criterion as the preferred feature selection criterion.
▷ Vateekul & Kubat worked on imbalanced, large-scale, and multi-label data and tried to reduce these costs with FDT ("fast decision-tree induction").
▷ Johnson, Oles, Zhang & Goetz (2002) combined an FDT with a modern method for converting a decision tree to a rule set.
K-Nearest Neighbors
▷ Applied to text categorization in the early 1990s; a strong baseline in benchmark evaluations.
▷ Among the top-performing methods in TC evaluations; scalable to large TC applications.
▷ Also called:
○ Case-based learning
○ Memory-based learning
○ Lazy learning
K-Nearest Neighbors
▷ Using only the single closest example to determine the category is subject to errors due to:
○ A single atypical example.
○ Noise (i.e. an error) in the category label of a single training example.
▷ A more robust alternative is to find the k most similar examples and return the majority category among these k examples.
▷ The value of k is typically odd to avoid ties; 3 and 5 are most common.
▷ No feature selection is necessary.
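A minimal k-NN sketch with k = 3 and TF-IDF features; scikit-learn and the toy sentiment corpus are assumptions made for illustration only.

```python
# k-NN text classification with an odd k (k = 3) to avoid ties.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

train_texts = ["great movie, loved it", "wonderful acting", "terrible plot",
               "boring and slow", "excellent soundtrack", "awful ending"]
train_labels = ["pos", "pos", "neg", "neg", "pos", "neg"]

knn = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=3))
knn.fit(train_texts, train_labels)        # "lazy" learning: just stores the examples

print(knn.predict(["loved the acting"]))  # majority vote of the 3 nearest neighbors
```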
KNN
 Hierarchical KNN (high performance on both small and large datasets), with two steps:
 Step 1: select a high k.
 Step 2: select neighbor features.
 KNN with documents indexed by n-grams (unigrams and bigrams).
 KNN combined with k-means to group documents into clusters, followed by weighted voting.
Naïve Bayes
 Simple, common, and very fast.
 A baseline method.
 Naïve Bayes is not so naïve: a good, dependable baseline for text classification (but not the best)!
 Very good in domains with many equally important features.
 Popular for document categorization.
 Conditional independence assumption: features are independent of each other given the class.
 Needs a very large number of training examples.
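A minimal Naïve Bayes sketch using word counts (the multinomial variant commonly used for text); scikit-learn and the toy spam/ham data are assumptions for illustration.

```python
# Multinomial Naive Bayes over word counts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["buy cheap meds", "limited offer click here",
               "lunch tomorrow?", "minutes of the last meeting"]
train_labels = ["spam", "spam", "ham", "ham"]

nb = make_pipeline(CountVectorizer(), MultinomialNB())
nb.fit(train_texts, train_labels)         # estimates P(class) and P(word | class)

print(nb.predict(["cheap offer"]))        # class with the highest posterior
print(nb.predict_proba(["cheap offer"]))  # posterior under the independence assumption
```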
Naïve Bayes
 Singhal & Sharma: eliminating features leads to improved performance.
 Posterior estimation with dependencies between features, combined with reducing the dimensionality of the feature space.
 Using NB without the feature independence assumption and splitting related features (high performance as the dataset grows).
Hidden Markov Model
 The HMM is a sequential model of text.
 A simple process generates a sequence of words:
 generate states y1, ..., yn;
 generate words w1, ..., wn from Pr(W | Y = yi).
 Classification, however, is not as simple as generation.
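A minimal sketch of the generative process above, assuming NumPy; the states, vocabulary, and transition/emission probabilities are made up for illustration.

```python
# Sample a word sequence from a toy HMM: states first, then a word per state.
import numpy as np

states = ["SPORT", "FINANCE"]
vocab = ["goal", "match", "stock", "market"]
start = np.array([0.5, 0.5])                 # Pr(y1)
trans = np.array([[0.8, 0.2],                # Pr(y_t | y_{t-1})
                  [0.3, 0.7]])
emit = np.array([[0.45, 0.45, 0.05, 0.05],   # Pr(w | y = SPORT)
                 [0.05, 0.05, 0.45, 0.45]])  # Pr(w | y = FINANCE)

rng = np.random.default_rng(0)
y = rng.choice(2, p=start)
words = []
for _ in range(5):
    words.append(vocab[rng.choice(4, p=emit[y])])  # generate w_t from Pr(W | Y = y_t)
    y = rng.choice(2, p=trans[y])                  # generate the next state
print(words)
```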
HMM
 Frasconi, Soda & Vullo: represent documents as a series of pages (high performance with large documents).
 Use a "minimum message length" estimator to choose the optimal number of states for higher performance.
Support Vector Machine
 The SVM was proposed by Vapnik; it provides "a maximal margin separating hyperplane" between two classes of data and has non-linear extensions.
 Represents a text document as a vector.
 A popular supervised learning model used for binary classification.
▷ Why SVM for text?
○ High-dimensional input space
○ Few irrelevant features
○ Sparse document vectors
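A minimal linear-SVM sketch over sparse TF-IDF vectors, matching the high-dimensional, sparse setting listed above; scikit-learn and the toy corpus are assumptions.

```python
# Linear SVM on sparse TF-IDF document vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = ["the team won the final", "a thrilling last-minute goal",
               "shares fell after the earnings report", "the central bank raised rates"]
train_labels = ["sport", "sport", "finance", "finance"]

svm = make_pipeline(TfidfVectorizer(), LinearSVC())
svm.fit(train_texts, train_labels)   # fits a maximal-margin separating hyperplane

print(svm.predict(["goal scored in the final minute"]))
```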
SVM
 Yao & Fan: used a weighted kernel function that depends on features of the training data for interference detection.
 Rennie & Rifkin: applied SVMs to the task of multiclass text classification.
 Joseph, Yun and Yanqing (2015): used a Word2Vec representation with SVMs to capture semantic features.
Deep learning for text classification
No manual feature extraction is required.
Deep learning
 Around 2010, DL started outperforming other ML techniques, first in speech and vision, then in NLP.
 Several big improvements in NLP in recent years.
 Leverages different levels of representation:
 words & characters;
 syntax & semantics.
Deep learning – why?
o Manually designed features are often over-specified, incomplete, and take a long time to design and validate.
o Learned features are easy to adapt and fast to learn.
o Deep learning can learn in both supervised and unsupervised settings.
o Deep learning provides a very flexible, (almost?) universal, learnable framework for representing world, visual, and linguistic information.
Convolutional NN
 Convolutional Neural Networks for text classification (CNNs, 2014).
 Main CNN idea for text: compute vectors for n-grams and group them afterwards.
 Use a single 1-dimensional convolution layer followed by a max-pooling layer that combines neighboring vectors.
 The goal is to learn a region-based text embedding.
 Fast to train and powerful for text classification.
 Learning an optimal kernel size is challenging.
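A minimal sketch of such a 1-D CNN text classifier, assuming Keras/TensorFlow; the vocabulary size, sequence length, and kernel size are placeholder choices, not values from the slides.

```python
# 1-D convolution + max pooling over word embeddings for binary text classification.
from tensorflow.keras import layers, models

VOCAB_SIZE, MAX_LEN = 20_000, 200            # placeholder sizes

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),          # integer-encoded, padded word ids
    layers.Embedding(VOCAB_SIZE, 128),       # learned word vectors
    layers.Conv1D(128, 5, activation="relu"),# region (n-gram) features, kernel size 5
    layers.GlobalMaxPooling1D(),             # max pooling over the sequence
    layers.Dense(1, activation="sigmoid"),   # binary text classification
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, ...) on integer-encoded, padded sequences.
```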
Recurrent NN
 Recurrent NNs have received much attention because of their superior ability to preserve sequence information over time.
 Tai et al. (2015) generalized the LSTM to the Tree-LSTM, where each LSTM unit gains information from its children units.
 The LSTM can remember long sequences and has forget gates.
 High cost (O(n²)).
Bidirectional LSTM
▷ It involves duplicating the first recurrent layer in the network so that the input sequence is processed both forwards and backwards.
▷ Remarkable performance on sentences, more so than on documents.
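A minimal Keras sketch of the duplicated (forward and backward) recurrent layer, assuming TensorFlow; all sizes are placeholders.

```python
# Bidirectional LSTM over word embeddings for binary text classification.
from tensorflow.keras import layers, models

VOCAB_SIZE, MAX_LEN = 20_000, 200

model = models.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 128),
    layers.Bidirectional(layers.LSTM(64)),   # forward + backward pass over the text
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```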
Recurrent Convolutional NN (2015)
▷ Captures contextual information by maintaining a state over all previous inputs.
▷ Remarkable performance in document classification.
AC-BLSTM
▷ Asymmetric Convolutional Bidirectional LSTM (AC-BLSTM, 2017).
▷ Remarkable performance on both sentence and document classification tasks.
Hierarchical Attention Networks
▷ HAN (2016).
▷ Assume a document has L sentences Si and each sentence contains Ti words.
▷ It consists of several parts:
○ a word sequence encoder,
○ a word-level attention layer,
○ a sentence encoder, and
○ a sentence-level attention layer.
Rule-based classification
▷ Based on linguistic rules that capture the elements and attributes of a document in order to assign it to a category.
▷ A rule-based approach is flexible, powerful, and easy to express.
▷ It requires understanding of the text (meaning, relevancy, relationships between concepts, etc.).
▷ Provides a true representation of the language.
▷ Supports writing simpler rules with a higher level of abstraction.
▷ Makes it easier to improve accuracy over time.
▷ But it does not scale well to very large rule sets.
▷ An old method, but still in use.
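A toy rule-based classifier to make the idea concrete; the keyword rules and categories are invented for illustration only.

```python
# Hand-written keyword rules assign a document to the best-matching category.
RULES = {
    "sport":   {"match", "goal", "tournament", "coach"},
    "finance": {"stock", "market", "earnings", "dividend"},
}

def classify(text: str, default: str = "other") -> str:
    words = set(text.lower().split())
    # pick the category whose keyword set overlaps the document the most
    best = max(RULES, key=lambda cat: len(RULES[cat] & words))
    return best if RULES[best] & words else default

print(classify("the coach praised the goal"))           # -> sport
print(classify("quarterly earnings beat the market"))   # -> finance
print(classify("nothing relevant here"))                # -> other
```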
Thanks!
Any questions?