2. Supervised Learning:
Classification: Predict a discrete value (label) associated with a feature vector.
Regression: Predict a real number associated with a feature vector.
E.g., use linear regression to fit a curve to data.
5. Using Distance Matrix for Classification:
The simplest approach is probably nearest neighbors:
Remember the training data.
When predicting the label of a new example:
Find the nearest example in the training data.
Predict the label associated with that example (a minimal sketch follows).
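A minimal sketch of nearest-neighbor prediction in Python. The tiny dataset and the choice of Euclidean distance are illustrative assumptions, not from the slides:

    import math

    # Illustrative training data: (feature vector, label) pairs.
    training_data = [
        ((1.0, 1.0), "A"),
        ((1.2, 0.8), "A"),
        ((5.0, 5.0), "B"),
        ((4.8, 5.2), "B"),
    ]

    def euclidean(u, v):
        """Euclidean distance between two feature vectors."""
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    def nearest_neighbor_predict(x):
        """Find the nearest training example and predict its label."""
        nearest = min(training_data, key=lambda pair: euclidean(pair[0], x))
        return nearest[1]

    print(nearest_neighbor_predict((1.1, 0.9)))  # -> "A"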
9. Advantages and Disadvantages of KNN:
Advantages:
Learning is fast; there is no explicit training phase.
No theory required.
Easy to explain the method and its results.
Disadvantages:
Memory intensive, and predictions can take a long time.
No model to shed light on the process that generated the data.
10. Naïve Bayes Text Classification:
Why?
Learn which news articles are of interest.
Learn to classify web pages by category.
Basic Intuition:
A simple (naïve) classification method based on Bayes’ rule.
Relies on a very simple representation of documents: bag of words.
12. Naïve Bayes Text Classification:
Bayes’ Rule:
For a document d and class c: P(c|d) = P(d|c) P(c) / P(d)
Goal of Classifier: find the most probable class, cMAP = argmax_c P(c|d) = argmax_c P(d|c) P(c) (the denominator P(d) is the same for every class, so it can be dropped).
13. Learn to Classify Text using Naïve Bayes:
Target concept Interesting?: Document → {+, -}
Represent each document by a vector of words: one attribute per word position in the document.
Learning: use the training examples to estimate P(+), P(-), P(doc|+), P(doc|-).
Naïve Bayes conditional independence assumption:
P(doc|Vj) = ∏i P(ai = Wk | Vj)
where P(ai = Wk | Vj) is the probability that the word in position i is Wk, given class Vj (a minimal training sketch follows).
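A minimal sketch of this learning step in Python, assuming the bag-of-words representation and the Laplace (add-one) smoothing used later in this example; the function and variable names are illustrative:

    from collections import Counter

    def train_naive_bayes(docs):
        """docs: list of (list_of_words, label) pairs.
        Returns class priors P(v) and smoothed word probabilities P(w|v)."""
        vocab = {w for words, _ in docs for w in words}
        priors, cond = {}, {}
        for v in {label for _, label in docs}:
            class_docs = [words for words, label in docs if label == v]
            priors[v] = len(class_docs) / len(docs)          # e.g. P(+)
            counts = Counter(w for words in class_docs for w in words)
            n = sum(counts.values())  # total word positions in class v
            # Laplace smoothing: P(Wk|v) = (nk + 1) / (n + |vocabulary|)
            cond[v] = {w: (counts[w] + 1) / (n + len(vocab))
                       for w in vocab}
        return priors, cond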
14. An Example: Movie Review
Dictionary: 10 unique words
<I, loved, the, movie, hated, a, great, good, poor, acting>
15. Steps:
Convert the documents into feature sets, where the attributes are the possible words and the values are the number of times each word occurs in the given document (a sketch of this conversion follows the table).

Doc | I | loved | the | movie | hated | a | great | good | poor | acting | Class
 1  | 1 |   1   |  1  |   1   |       |   |       |      |      |        |   +
 2  | 1 |       |  1  |   1   |   1   |   |       |      |      |        |   -
 3  |   |       |     |   2   |       | 1 |   1   |  1   |      |        |   +
 4  |   |       |     |       |       |   |       |      |  1   |   1    |   -
 5  |   |       |     |   1   |       | 1 |   1   |  1   |      |   1    |   +
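The conversion itself is mechanical. A short sketch, assuming each document arrives as a plain whitespace-separated string; it is demonstrated on the test sentence used at the end of this example:

    from collections import Counter

    DICTIONARY = ["I", "loved", "the", "movie", "hated",
                  "a", "great", "good", "poor", "acting"]

    def to_feature_counts(document):
        """Map a document string to per-word counts over the dictionary."""
        counts = Counter(document.split())
        return {word: counts[word] for word in DICTIONARY}

    print(to_feature_counts("I hated the poor acting"))
    # {'I': 1, 'loved': 0, 'the': 1, ..., 'poor': 1, 'acting': 1}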
Let us look at the probabilities per outcome (+ or -).
16. Naïve Bayes…
Documents with positive outcomes:
P(+) = 3/5 = 0.6
Compute: P(I|+), P(loved|+), P(the|+), P(movie|+), P(a|+), P(great|+), P(good|+), P(acting|+)
Let n be the total number of words in the (+) documents (here n = 14), and let nk be the number of times word k occurs in those (+) documents.
Then, with Laplace smoothing, P(Wk|+) = (nk + 1)/(n + |Vocabulary|) (a quick numerical check follows the table).

Doc | I | loved | the | movie | hated | a | great | good | poor | acting | Class
 1  | 1 |   1   |  1  |   1   |       |   |       |      |      |        |   +
 3  |   |       |     |   2   |       | 1 |   1   |  1   |      |        |   +
 5  |   |       |     |   1   |       | 1 |   1   |  1   |      |   1    |   +
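As a quick check of the smoothing formula with these counts (n = 14, |Vocabulary| = 10):

    print((4 + 1) / (14 + 10))  # P(movie|+) = 5/24 ≈ 0.2083
    print((0 + 1) / (14 + 10))  # P(poor|+)  = 1/24 ≈ 0.0417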
17. Naïve Bayes…
P(I|+) = 0.0833      P(acting|+) = 0.0833
P(loved|+) = 0.0833  P(poor|+) = 0.0417
P(the|+) = 0.0833    P(hated|+) = 0.0417
P(movie|+) = 0.2083  P(great|+) = 0.1250
P(a|+) = 0.1250      P(good|+) = 0.1250
Now, the documents with the negative class (here n = 6 words in the (-) documents, so P(Wk|-) = (nk + 1)/(6 + 10)):

Doc | I | loved | the | movie | hated | a | great | good | poor | acting | Class
 2  | 1 |       |  1  |   1   |   1   |   |       |      |      |        |   -
 4  |   |       |     |       |       |   |       |      |  1   |   1    |   -
18. P(I|-) = 0.1250     P(acting|-) = 0.1250
P(loved|-) = 0.0625  P(poor|-) = 0.1250
P(the|-) = 0.1250    P(hated|-) = 0.1250
P(movie|-) = 0.1250  P(great|-) = 0.0625
P(a|-) = 0.0625      P(good|-) = 0.0625
Now, let’s classify a new sentence w.r.t. our training samples:
Test document: I hated the poor acting
If Vj = +:
P(+)*P(I|+)*P(hated|+)*P(the|+)*P(poor|+)*P(acting|+) ≈ 6.03 × 10^(-7)
If Vj = -:
P(-)*P(I|-)*P(hated|-)*P(the|-)*P(poor|-)*P(acting|-) ≈ 1.22 × 10^(-5)
Since 1.22 × 10^(-5) > 6.03 × 10^(-7), the classifier predicts the negative class (-).
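A short end-to-end sketch reproducing these numbers in Python; the per-class word counts are read off the tables above, and the helper names are illustrative:

    # Word counts per class, taken from the tables above.
    pos_counts = {"I": 1, "loved": 1, "the": 1, "movie": 4, "hated": 0,
                  "a": 2, "great": 2, "good": 2, "poor": 0, "acting": 1}
    neg_counts = {"I": 1, "loved": 0, "the": 1, "movie": 1, "hated": 1,
                  "a": 0, "great": 0, "good": 0, "poor": 1, "acting": 1}
    VOCAB_SIZE = 10  # |Vocabulary|

    def smoothed(counts, n):
        """Laplace-smoothed P(Wk|class) for every dictionary word."""
        return {w: (c + 1) / (n + VOCAB_SIZE) for w, c in counts.items()}

    def score(words, prior, probs):
        """P(class) times the product of P(w|class) over the document."""
        result = prior
        for w in words:
            result *= probs[w]
        return result

    p_pos = smoothed(pos_counts, 14)  # 14 word positions in (+) docs
    p_neg = smoothed(neg_counts, 6)   #  6 word positions in (-) docs
    test = "I hated the poor acting".split()
    print(score(test, 0.6, p_pos))  # ≈ 6.03e-07
    print(score(test, 0.4, p_neg))  # ≈ 1.22e-05  -> predict class -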