Mathematical Background For
Natural Language Processing
Anantharaman Narayana Iyer
Narayana dot Anantharaman at gmail dot com
14 Aug 2015
Why Probability for NLP?
• Consider the following example scenarios:
• You are travelling in an autorickshaw on a busy road in Bangalore
and are on a call with your friend.
• We are watching a Hollywood English film. Often we may not
understand exactly every word that is spoken, either due to the
accent of the speaker or because a word is slang that not
everyone outside the context can relate to.
• We are reading tweets that are cryptic, with several misspelled
words, emoticons, hashtags and so on.
• The commonality in all the above cases is the presence of noise along
with the signal
• The noise or ambiguities result in uncertainty of interpretation
• In order for an NLP application to process such text, we need an
appropriate mathematical machinery.
• Probability theory is our tool to handle such cases.
What is a Random Variable?
• A is a Boolean-valued RV if A denotes an event
and there is some degree of uncertainty as to
whether A occurs.
• Example: It will rain in Manchester during the 4th
Cricket test match between India and England
• Probability of A is the fraction of possible
worlds in which A is true
• The area of blue rectangle = 1
• A Random Variable is not a variable in the
traditional sense. It is rather a function
mapping outcomes to values.
[Figure: a rectangle of total area 1, partitioned into worlds where A is true and worlds where A is false]
Axioms of Probability
The following axioms always hold:
• 0 <= P(A) <= 1
• P(True) = 1
• P(False) = 0
• P(A or B) = P(A) + P(B) – P(A and B)
Note: We can diagrammatically represent the above and verify these
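The axioms can also be verified numerically on a small "possible worlds" model. A minimal sketch, where each world is a hypothetical tweet and the events A (has a hashtag) and B (has a URL) are made-up for illustration:

```python
# Toy sanity check of the probability axioms over a small set of "worlds".
# The tweets and events below are hypothetical, purely for illustration.
tweets = [
    {"hashtag": True,  "url": False},
    {"hashtag": True,  "url": True},
    {"hashtag": False, "url": True},
    {"hashtag": False, "url": False},
    {"hashtag": False, "url": False},
]

def prob(pred):
    """P(event) as the fraction of worlds (tweets) where pred holds."""
    return sum(1 for t in tweets if pred(t)) / len(tweets)

p_a = prob(lambda t: t["hashtag"])                     # P(A)
p_b = prob(lambda t: t["url"])                         # P(B)
p_a_or_b = prob(lambda t: t["hashtag"] or t["url"])    # P(A or B)
p_a_and_b = prob(lambda t: t["hashtag"] and t["url"])  # P(A and B)

assert 0 <= p_a <= 1                                   # axiom 1
assert abs(p_a_or_b - (p_a + p_b - p_a_and_b)) < 1e-12 # inclusion-exclusion
```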
Multivalued Discrete Random Variables
Examples of multivalued RVs
• Number of hashtags in a Tweet
• Number of URLs in a tweet
• Number of tweets that mention the
term iPhone in a given day
• Number of tweets sent by Times Now
channel per day
• The word appearing at a given position in a
natural language sentence, drawn from a
given vocabulary
Random Variable Example
• We can treat the number of hashtags contained in a given tweet (h) as a Random
Variable because:
• A tweet has non-negative number of hashtags: 0 to n, where n is the maximum number of
hashtags bounded by the size of the tweet. For example, if a tweet has 140 characters and if
the minimum string length required to specify a hashtag is 3 (one # character, one character
for the tag, one delimiter), then n = floor(140 / 3) = 46
• As we observe, h is a discrete RV taking values: 0 <= h <= 46
• Thus the probability of finding a hashtag in a tweet satisfies:
0 < P(hashtag in a tweet) < 1
P(h = 0 ∨ h = 1 ∨ … ∨ h = 46) = 1, where h is the number of hashtags in the tweet
P(h = hi, h = hj) = 0 when i ≠ j – hi and hj are two different hashtag counts
• If we are given a set of tweets, e.g., 500, we can compute the average number of
hashtags per tweet. This is real valued and hence a continuous variable.
• We can find the probability distribution for the continuous variable also (see later
slides)
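The empirical distribution of h can be estimated directly from a corpus. A minimal sketch over a made-up mini-corpus (counting tokens that start with `#` as hashtags):

```python
from collections import Counter

# Estimate P(h = k) as the relative frequency of tweets with k hashtags.
# The mini-corpus below is made-up, purely for illustration.
tweets = [
    "Budget session starts today #budget2015",
    "No tags here, just text",
    "#rail #budget big announcements expected #india",
    "Watching the debate live",
]

counts = [sum(1 for tok in t.split() if tok.startswith("#")) for t in tweets]
freq = Counter(counts)
n = len(tweets)
p = {k: v / n for k, v in freq.items()}  # empirical distribution of h

assert abs(sum(p.values()) - 1.0) < 1e-12  # probabilities sum to 1
```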
Joint Distribution of Discrete Random Variables
• Consider 2 RVs X and Y, where X and Y
can take discrete values. The joint
distribution is given by:
P(X = x, Y = y)
The above satisfies:
1. P(X = x, Y = y) >= 0
2. Σi Σj P(X = xi, Y = yj) = 1, where the
summation runs over all i and all j
This concept can be generalized to n
variables
HASHTAG RT URL SNAME
0 0 1 0
3 0 1 1
1 0 0 0
0 0 0 0
0 0 1 0
0 0 0 0
1 0 0 1
0 0 0 0
0 0 0 0
0 0 0 0
1 0 1 0
1 1 0 0
1 0 1 0
1 0 0 0
0 0 1 0
3 0 0 0
0 1 0 0
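The joint distribution of two of the columns above, say HASHTAG and URL, can be computed from the table by counting co-occurrences. A sketch using the 17 rows shown:

```python
from collections import Counter

# Empirical joint distribution P(HASHTAG = x, URL = y) from the sample
# table above (17 tweets; columns HASHTAG, RT, URL, SNAME).
rows = [
    (0,0,1,0),(3,0,1,1),(1,0,0,0),(0,0,0,0),(0,0,1,0),(0,0,0,0),
    (1,0,0,1),(0,0,0,0),(0,0,0,0),(0,0,0,0),(1,0,1,0),(1,1,0,0),
    (1,0,1,0),(1,0,0,0),(0,0,1,0),(3,0,0,0),(0,1,0,0),
]

joint = Counter((ht, url) for ht, _, url, _ in rows)
n = len(rows)
p = {xy: c / n for xy, c in joint.items()}

# The joint distribution is non-negative and sums to 1:
assert all(v >= 0 for v in p.values())
assert abs(sum(p.values()) - 1.0) < 1e-12
```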
Frequency Distributions for our Tweet Corpora
Some Observations
• Frequency distribution shows distinguishable
patterns across the different corpora. For example, in
the case of the Union budget, the number of tweets
that have one or more hashtags is roughly 50% of total
tweets, while that for the Railway budget is roughly 25%
• The expectation value of
hashtags/Retweets/URLs/Screen names per tweet
can be treated as a vector of real valued Random
Variables and used for classification.
[Chart: Frequency Distribution for Dataset 10 (Tweets on General Elections) – x: count 0–10, y: number of tweets (0–400); series: htfreq, rtfreq, urlfreq, snfreq]
[Chart: Frequency Distribution for Dataset 1 (Tweets on Railway Budget) – x: count 0–10, y: number of tweets (0–450); series: htfreq, rtfreq, urlfreq, snfreq]
[Chart: Frequency Distribution for Dataset 19 (Tweets on Union Budget) – x: count 0–10, y: number of tweets (0–500); series: htfreq, rtfreq, urlfreq, snfreq]
Random Variables: Illustration
• Let us illustrate the definition of Random Variables and derive the sum and
product rules of probability with the example below:
• Suppose we follow tweets from 2 media channels, say, Times News (x1) and Business News
(x2). x1 is a general news channel and x2 is focused more on business news.
• Suppose we measured the tweets generated by either of these 2 channels over a 1 month
duration and observed the following:
• Total tweets generated (union of tweets from both channels) = 1200
• Break up of the tweets received as given in the table below:
        x1    x2
y1     600    40
y2     200   360
Random Variables
• We can model the example we stated using 2
random variables:
• Source of tweets (X), Type of tweet (Y)
• X and Y are 2 RVs that can take values:
• X = x where x ∈ {x1, x2}, Y = y where y ∈ {y1, y2}
• Also, from the given data, we can compute
prior probabilities and likelihood:
P(X = x1) = (600 + 200)/1200 = 2/3 = 0.67
P(X = x2) = 1 – 2/3 = 1/3 = 0.33
P(Y = y1 | X = x1) = 600 / (600 + 200) = 3/4 = 0.75
P(Y = y2 | X = x1) = 0.25
P(Y = y1 | X = x2) = 0.1
P(Y = y2 | X = x2) = 0.9
        x1    x2
y1     600    40
y2     200   360
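The priors and likelihoods above follow mechanically from the count table. A sketch that reproduces them:

```python
# Priors and likelihoods from the 2x2 count table of the example
# (cells keyed by (source x, type y); 1200 tweets total).
counts = {("x1", "y1"): 600, ("x2", "y1"): 40,
          ("x1", "y2"): 200, ("x2", "y2"): 360}
total = sum(counts.values())  # 1200

def p_x(x):
    """Prior P(X = x): fraction of all tweets from channel x."""
    return sum(c for (xi, _), c in counts.items() if xi == x) / total

def p_y_given_x(y, x):
    """Likelihood P(Y = y | X = x): fraction of channel-x tweets of type y."""
    col = sum(c for (xi, _), c in counts.items() if xi == x)
    return counts[(x, y)] / col

assert abs(p_x("x1") - 2/3) < 1e-12
assert abs(p_y_given_x("y1", "x1") - 0.75) < 1e-12
```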
Conditional Probability
• Conditional probability is the probability
of one event, given the other event has
occurred.
• Example:
• Assume that we know the probability of finding a
hashtag in a tweet. Suppose we have a tweet
corpus C on a particular domain, where there is an
increased probability of finding a hashtag. In this
example, we have some prior notion of the
probability of finding a hashtag in a tweet. When
given the additional fact that the tweet was drawn
from corpus C, we can revise our probability
estimate for the hashtag to P(hashtag|C). This is
called the posterior probability
Sum Rule
The sum rule (marginalization): P(X) = Σ_Y P(X, Y)
In our example:
P(X = x1) =
P(X = x1, Y = y1) + P(X = x1, Y = y2)
We can calculate the other
probabilities P(x2), P(y1), P(y2) similarly.
Note:
P(X = x1) + P(X = x2) = 1
        x1    x2
y1     600    40
y2     200   360
Product Rule and Generalization
From the product rule, we have:
P(X, Y) = P(Y|X) P(X)
We can generalize this to n variables (the chain rule):
P(An, …, A1) = P(An | An-1, …, A1) P(An-1, …, A1)
For example, with n = 4:
P(A4, A3, A2, A1) = P(A4|A3, A2, A1) P(A3|A2, A1)
P(A2|A1) P(A1)
        x1    x2
y1     600    40
y2     200   360
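The chain rule is what lets language models score a sentence term by term. A sketch, with the extra simplifying (first-order Markov) assumption that each conditioning context is truncated to the previous word; the probabilities below are made-up illustrative numbers, not estimates from any corpus:

```python
# Chain rule P(w1, ..., wn) = P(w1) * prod_i P(wi | w1..w_{i-1}),
# simplified here by a first-order Markov assumption:
# P(wi | w1..w_{i-1}) ≈ P(wi | w_{i-1}). All numbers are made-up.
p_first = {"the": 0.5, "a": 0.5}
p_next = {("the", "cat"): 0.4, ("the", "dog"): 0.6,
          ("a", "cat"): 0.3, ("a", "dog"): 0.7}

def sentence_prob(words):
    """P(w1) * prod over i of P(wi | w_{i-1})."""
    p = p_first[words[0]]
    for prev, cur in zip(words, words[1:]):
        p *= p_next[(prev, cur)]
    return p

assert abs(sentence_prob(["the", "cat"]) - 0.5 * 0.4) < 1e-12
```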
Bayes Theorem
From product rule, we have:
P(X, Y) = P(Y|X) P(X)
We know: P(X, Y) = P(Y, X), hence:
P(Y|X) P(X) = P(X|Y) P(Y)
From the above, we derive:
P(Y|X) = P(X|Y) P(Y) / P(X)
The above is the Bayes Theorem
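Bayes theorem can be checked numerically on the running tweet example: starting only from the priors P(X) and likelihoods P(Y|X), the posterior P(X = x1 | Y = y1) it produces must match the direct count 600/(600 + 40) from the table.

```python
# Bayes theorem P(Y|X) P(X) = P(X|Y) P(Y) on the tweet example:
# X = source channel, Y = type of tweet.
p_x = {"x1": 2/3, "x2": 1/3}                       # priors from the table
p_y_given_x = {("y1", "x1"): 0.75, ("y2", "x1"): 0.25,
               ("y1", "x2"): 0.10, ("y2", "x2"): 0.90}

# P(Y = y1) by the sum rule (the normalizing denominator):
p_y1 = sum(p_y_given_x[("y1", x)] * p_x[x] for x in p_x)

# Posterior P(X = x1 | Y = y1) by Bayes theorem:
posterior = p_y_given_x[("y1", "x1")] * p_x["x1"] / p_y1

# Matches the direct count from the table: 600 / (600 + 40)
assert abs(posterior - 600/640) < 1e-12
```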
Independence
• Independent Variables: Knowing Y does not alter our belief on X
From product rule, we know:
P(X, Y) = P(X|Y) P(Y)
If X and Y are independent random variables:
P(X|Y) = P(X)
Hence: P(X, Y) = P(X) P(Y)
We write: X ⊥ Y to denote X, Y are independent
• Conditional Independence
• Informally, suppose X and Y are not independent on their own, but become independent on
observing another variable Z. This is denoted by: X ⊥ Y | Z
• Definition: Let X, Y, Z be discrete random variables. X is conditionally independent of Y given Z
if the probability distribution governing X is independent of the value of Y given a value of Z.
P(X|Y, Z) = P(X|Z)
Also: P(X, Y | Z) = P(X|Y, Z) P(Y|Z) = P(X|Z) P(Y|Z)
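The factorization P(X, Y | Z) = P(X|Z) P(Y|Z) can be verified cell by cell on a small joint distribution constructed to be conditionally independent; the numbers below are arbitrary, chosen only so each conditional sums to 1:

```python
import itertools

# A joint distribution P(X, Y, Z) built as P(z) P(x|z) P(y|z), so X and Y
# are conditionally independent given Z by construction. Numbers are arbitrary.
p_z = {0: 0.4, 1: 0.6}
p_x_z = {(0, 0): 0.2, (1, 0): 0.8, (0, 1): 0.7, (1, 1): 0.3}  # P(x|z)
p_y_z = {(0, 0): 0.5, (1, 0): 0.5, (0, 1): 0.9, (1, 1): 0.1}  # P(y|z)

joint = {(x, y, z): p_z[z] * p_x_z[(x, z)] * p_y_z[(y, z)]
         for x, y, z in itertools.product([0, 1], repeat=3)}

# Verify P(X=x, Y=y | Z=z) == P(X=x|Z=z) * P(Y=y|Z=z) for every cell.
for x, y, z in joint:
    p_xy_given_z = joint[(x, y, z)] / p_z[z]
    assert abs(p_xy_given_z - p_x_z[(x, z)] * p_y_z[(y, z)]) < 1e-12
```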
Graph Models: Bayesian Networks
• Graph models are also called Bayesian networks, belief networks and
probabilistic networks
• Nodes and edges between the nodes
• Each node corresponds to a random variable X and is annotated with
the probability distribution of X
• If there is a directed edge from vertex X to vertex Y, it means X has an
influence on Y
• This influence is specified by the conditional probability P(Y|X)
• The graph is a DAG (directed acyclic graph)
• Nodes and edges define the structure of the network and the conditional
probabilities are the parameters given the structure
Examples
• Preparation for the exam R, and the marks obtained in the exam M
• Marketing budget B and the advertisements A
• Nationality of Team N and chance of qualifying for quarter final of
world cup, Q
• In all cases, the Probability distribution P respects the graph G
Representing the joint distributions
• Consider P(A, B, C) = P(A) P(B|A) P(C|A, B). This can
be represented as a graph (fig a)
• Key Concept: Factorization – each vertex has a factor
• Exercise: Draw the graph for P(X1, X2, X3, X4, X5)
• The joint probability distribution with conditional
probability assumptions respects the associated
graph. (fig b)
• The graph describing the distribution is useful in 2
ways:
• Visualization of conditional dependencies
• Inferencing
• Determining Conditional Independence of a
distribution is vital for tractable inference
[Fig (a): directed graph over A, B, C representing P(A) P(B|A) P(C|A, B); Fig (b): graph over A, B, C with conditional independence assumptions]
Continuous Random Variables
f(x) ≥ 0, for all x
∫ f(x) dx = 1
P(0 ≤ X ≤ 1) = ∫₀¹ f(x) dx
• A random variable X is said to be continuous if it takes
continuous values – for example, a real valued number
representing some physical quantity that can vary at random.
• Suppose there is a sentiment analyser that outputs a sentiment
polarity value as a real number in the range (-1, 1). The output of
this system is an example of a continuous RV
• While the polarity as mentioned in the above example is a scalar
variable, we also encounter vectors of continuous random
variables. For example, some sentic computing systems model
emotions as a multi-dimensional vector of real numbers.
• Likewise a vector whose elements are the average values of
hashtag, URL, Screen Names, Retweets per tweet, averaged over
a corpus constitutes a vector of continuous Random Variables
Notes on Probability density P(x)
Canonical Cases
• Head to tail
• Tail to tail
• Head to head
[Figures: three 3-node graphs over A, B and C illustrating the head-to-tail, tail-to-tail and head-to-head configurations]
Joint Distribution of Continuous Random Variables
HASHTAG RT SCREEN NAME URL CORPUS
0.462 0.65 0.534 0.336 Elections
0.434 0.482 0.41 0.548 Elections
0.452 0.454 0.398 0.58 Elections
0.232 0.254 0.146 0.284 Railway
0.248 0.296 0.218 0.32 Railway
0.64 0.312 0.18 0.158 Union
0.38 0.572 0.45 0.34 Elections
0.646 0.296 0.164 0.178 Union
0.314 0.262 0.162 0.338 Railway
0.672 0.318 0.176 0.188 Union
0.32 0.31 0.196 0.314 Railway
0.542 0.586 0.422 0.334 Elections
0.638 0.31 0.172 0.2 Union
0.41 0.636 0.37 0.348 Elections
0.382 0.58 0.418 0.344 Elections
0.416 0.492 0.414 0.58 Elections
0.546 0.63 0.434 0.318 Elections
0.512 0.676 0.47 0.368 Elections
0.672 0.314 0.166 0.158 Union
0.422 0.678 0.4 0.37 Elections
0.694 0.298 0.112 0.182 Union
0.44 0.46 0.442 0.582 Elections
0.502 0.668 0.462 0.388 Elections
0.418 0.49 0.388 0.562 Elections
0.478 0.62 0.41 0.308 Elections
0.5 0.624 0.516 0.334 Elections
Expectation Value
• For discrete variables:
• Expectation value: E[f(X)] = Σ f(x) p(x); with f(x) = x this is the mean E[X] = Σ x p(x)
• If a random sample is picked from the distribution, the expectation is simply
the average value of f(x) over the sample
• Thus, the values of hashtag, screen name, etc. shown in the previous slide
correspond to their respective expectation values
• For continuous variables:
• E[f(X)] = ∫ f(x) p(x) dx over the range of values X can take
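For a discrete RV this is a one-line computation. A sketch for the hashtag count, with an illustrative (made-up) distribution:

```python
# Expectation of the hashtag count h: E[h] = sum over k of k * p(k).
# The distribution below is illustrative, not taken from the datasets.
p = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}

e_h = sum(k * pk for k, pk in p.items())  # mean hashtags per tweet

assert abs(sum(p.values()) - 1.0) < 1e-12  # valid distribution
```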
Common Distributions
• Normal: X ~ N(μ, σ²)
• E.g. the height of the entire population
f(x) = (1 / (σ√(2π))) exp(−(x − μ)² / (2σ²))
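The density above translates directly into code:

```python
import math

# Normal density f(x) = 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))
def normal_pdf(x, mu=0.0, sigma=1.0):
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# The standard normal peaks at the mean, with height 1/sqrt(2*pi):
assert abs(normal_pdf(0.0) - 1.0 / math.sqrt(2.0 * math.pi)) < 1e-12
# The density is symmetric about the mean:
assert abs(normal_pdf(1.5) - normal_pdf(-1.5)) < 1e-12
```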
Naïve Bayes Classifier
• Derive on the board.
Naïve Bayes Classifier: Worked out example
• Classify the tweets into one of the three corpora (Railway, Union,
Elections), given a dataset.
• Split the dataset into 75% training and 25% test
• Estimate the parameters using the training data
• Evaluate the model with the test data
• Report the classifier accuracy
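A minimal sketch of the classifier for this exercise, using the presence of hashtags, retweets, URLs and screen names as binary features with Bernoulli-style Naive Bayes and Laplace smoothing. The training examples below are made-up toy data, not the actual datasets from the slides:

```python
import math
from collections import Counter, defaultdict

# Toy training set: (features, corpus label), features = (ht, rt, url, sn) in {0,1}.
train = [
    ((1, 0, 0, 0), "Union"), ((1, 0, 0, 0), "Union"), ((1, 1, 0, 0), "Union"),
    ((0, 0, 1, 0), "Railway"), ((0, 0, 0, 0), "Railway"), ((1, 0, 0, 0), "Railway"),
    ((1, 1, 1, 1), "Elections"), ((0, 1, 1, 0), "Elections"), ((1, 1, 0, 1), "Elections"),
]

labels = Counter(y for _, y in train)            # class counts for the priors
feat_counts = defaultdict(Counter)               # per-class feature value tallies
for x, y in train:
    for i, v in enumerate(x):
        feat_counts[y][(i, v)] += 1

def log_posterior(x, y):
    """log P(y) + sum_i log P(x_i | y), with add-one (Laplace) smoothing."""
    lp = math.log(labels[y] / len(train))
    for i, v in enumerate(x):
        lp += math.log((feat_counts[y][(i, v)] + 1) / (labels[y] + 2))
    return lp

def classify(x):
    """Pick the class with the highest (log) posterior."""
    return max(labels, key=lambda y: log_posterior(x, y))

print(classify((1, 1, 1, 0)))  # most probable corpus for this feature vector
```

In a full solution the 75/25 split would be applied to the real datasets and accuracy reported on the held-out 25%.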
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models
 
An Overview of Naïve Bayes Classifier
An Overview of Naïve Bayes Classifier An Overview of Naïve Bayes Classifier
An Overview of Naïve Bayes Classifier
 
Introduction to Artificial Intelligence
Introduction to Artificial IntelligenceIntroduction to Artificial Intelligence
Introduction to Artificial Intelligence
 
Introduction To Applied Machine Learning
Introduction To Applied Machine LearningIntroduction To Applied Machine Learning
Introduction To Applied Machine Learning
 
Recurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRURecurrent Neural Networks, LSTM and GRU
Recurrent Neural Networks, LSTM and GRU
 
MaxEnt (Loglinear) Models - Overview
MaxEnt (Loglinear) Models - OverviewMaxEnt (Loglinear) Models - Overview
MaxEnt (Loglinear) Models - Overview
 

Recently uploaded

Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
kalichargn70th171
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
Ortus Solutions, Corp
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
vrstrong314
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Anthony Dahanne
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
Max Andersen
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
MayankTawar1
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
Globus
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
varshanayak241
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
KrzysztofKkol1
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 

Recently uploaded (20)

Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 
Quarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden ExtensionsQuarkus Hidden and Forbidden Extensions
Quarkus Hidden and Forbidden Extensions
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
How to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good PracticesHow to Position Your Globus Data Portal for Success Ten Good Practices
How to Position Your Globus Data Portal for Success Ten Good Practices
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 

given vocabulary

Natural Language Processing: L03 Maths for NLP
• 6. Random Variable Example
• We can treat the number of hashtags contained in a given tweet (h) as a Random Variable because:
• A tweet has a non-negative number of hashtags, 0 to n, where n is the maximum number of hashtags bounded by the size of the tweet. For example, if a tweet has 140 characters and the minimum string length required to specify a hashtag is 3 (one # character, one character for the tag, one delimiter), then n = floor(140 / 3) = 46
• As we observe, h is a discrete RV taking values 0 <= h <= 46
• Thus the probability of finding a hashtag in a tweet satisfies: 0 < P(hashtag in a tweet) < 1
• P(h = 0 ∨ h = 1 ∨ h = 2 ∨ … ∨ h = 46) = 1, where h is the number of hashtags in the tweet
• P(h = hi, h = hj) = 0 when i ≠ j, where hi and hj are 2 different values of the hashtag count
• If we are given a set of tweets, e.g. 500, we can compute the average number of hashtags per tweet. This is real valued and hence a continuous variable.
• We can find the probability distribution for the continuous variable also (see later slides)
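The hashtag-count RV described above can be computed directly; a minimal sketch (the sample tweets are hypothetical stand-ins for a real corpus):

```python
import re
from collections import Counter

# Hypothetical sample corpus; in practice these would come from the Twitter API.
tweets = [
    "Watching the #UnionBudget2015 live #India",
    "No tags in this one",
    "#Elections #Results #India trending today",
]

def hashtag_count(tweet):
    """Value of the discrete RV h: the number of hashtags in one tweet."""
    return len(re.findall(r"#\w+", tweet))

counts = [hashtag_count(t) for t in tweets]
dist = Counter(counts)                       # frequency of each observed value of h
total = len(tweets)
p = {h: n / total for h, n in dist.items()}  # empirical distribution P(h)

print(counts)           # [2, 0, 3]
print(sum(p.values()))  # 1.0 -- the axioms require the probabilities to sum to 1
```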
• 7. Joint Distribution of Discrete Random Variables
• Consider 2 RVs X and Y, where X and Y can take discrete values. The joint distribution is given by: P(X = x, Y = y)
• The above satisfies:
  1. P(X, Y) >= 0
  2. Σi Σj P(X = xi, Y = yj) = 1, where the summation runs over all i and all j
• This concept can be generalized to n variables

Sample per-tweet counts:

HASHTAG  RT  URL  SNAME
0        0   1    0
3        0   1    1
1        0   0    0
0        0   0    0
0        0   1    0
0        0   0    0
1        0   0    1
0        0   0    0
0        0   0    0
0        0   0    0
1        0   1    0
1        1   0    0
1        0   1    0
1        0   0    0
0        0   1    0
3        0   0    0
0        1   0    0
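An empirical joint distribution can be built from counts like those in the table above; a small sketch over the HASHTAG and URL columns:

```python
from collections import Counter

# (hashtag, url) pairs taken from the table on this slide.
rows = [(0, 1), (3, 1), (1, 0), (0, 0), (0, 1), (0, 0), (1, 0), (0, 0),
        (0, 0), (0, 0), (1, 1), (1, 0), (1, 1), (1, 0), (0, 1), (3, 0), (0, 0)]

n = len(rows)
joint = {xy: c / n for xy, c in Counter(rows).items()}  # P(X = x, Y = y)

assert all(p >= 0 for p in joint.values())    # condition 1: non-negative
assert abs(sum(joint.values()) - 1.0) < 1e-9  # condition 2: sums to 1
print(joint[(0, 0)])  # P(hashtag = 0, url = 0) = 6/17
```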
• 8. Frequency Distributions for our Tweet Corpora
• Some Observations
• The frequency distribution shows distinguishable patterns across the different corpora. For example, in the case of the Union budget, the number of tweets that have one or more hashtags is roughly 50% of total tweets, while for the Railway budget it is roughly 25%
• The expectation value of hashtags/retweets/URLs/screen names per tweet can be treated as a vector of real-valued Random Variables and used for classification.
[Charts: frequency distributions of htfreq, rtfreq, urlfreq and snfreq (counts 0–10 per tweet) for Dataset 10 (Tweets on General Elections), Dataset 1 (Tweets on Railway Budget) and Dataset 19 (Tweets on Union Budget)]
• 9. Random Variables: Illustration
• Let us illustrate the definition of Random Variables and derive the sum and product rules of conditional probability with the example below:
• Suppose we follow tweets from 2 media channels, say, Times News (x1) and Business News (x2). x1 is a general news channel and x2 is focused more on business news.
• Suppose we measured the tweets generated by these 2 channels over a 1 month duration and observed the following:
• Total tweets generated (union of tweets from both channels) = 1200
• Break up of the tweets received, by source and type, as given in the table below:

      y1    y2
x1   600   200
x2    40   360
• 10. Random Variables
• We can model the example we stated using 2 random variables: the source of a tweet (X) and the type of a tweet (Y)
• X and Y are 2 RVs that can take values X = x where x ∈ {x1, x2}, and Y = y where y ∈ {y1, y2}
• Also, from the given data, we can compute prior probabilities and likelihoods:
P(X = x1) = (600 + 200)/1200 = 2/3 ≈ 0.67
P(X = x2) = 1 − 2/3 = 1/3 ≈ 0.33
P(Y = y1 | X = x1) = 600 / (600 + 200) = 3/4 = 0.75
P(Y = y2 | X = x1) = 0.25
P(Y = y1 | X = x2) = 40 / (40 + 360) = 0.1
P(Y = y2 | X = x2) = 0.9

      y1    y2
x1   600   200
x2    40   360
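The priors and likelihoods above can be recomputed mechanically from the raw counts; a small sketch:

```python
# Counts from the slide's table: rows are sources x1/x2, columns are types y1/y2.
counts = {("x1", "y1"): 600, ("x1", "y2"): 200,
          ("x2", "y1"): 40,  ("x2", "y2"): 360}
total = sum(counts.values())  # 1200

def prior(x):
    """P(X = x): marginalize the joint counts over Y."""
    return sum(n for (xi, _), n in counts.items() if xi == x) / total

def likelihood(y, x):
    """P(Y = y | X = x): count for (x, y) normalized by the row total for x."""
    row = sum(n for (xi, _), n in counts.items() if xi == x)
    return counts[(x, y)] / row

print(round(prior("x1"), 2))   # 0.67
print(likelihood("y1", "x1"))  # 0.75
print(likelihood("y1", "x2"))  # 0.1
```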
• 11. Conditional Probability
• Conditional probability is the probability of one event given that another event has occurred.
• Example:
• Assume that we know the probability of finding a hashtag in a tweet. Suppose we have a tweet corpus C on a particular domain in which there is an increased probability of finding a hashtag. In this example, we have some notion (a prior) of the probability of finding a hashtag in a tweet. When given the additional fact that the tweet was drawn from corpus C, we can revise our probability estimate for the hashtag to P(hashtag|C). This is called the posterior probability
• 12. Sum Rule
• In our example: P(X = x1) = P(X = x1, Y = y1) + P(X = x1, Y = y2)
• We can calculate the other probabilities similarly: P(x2), P(y1), P(y2)
• Note: P(X = x1) + P(X = x2) = 1

      y1    y2
x1   600   200
x2    40   360
• 13. Product Rule and Generalization
• From the product rule, we have: P(X, Y) = P(Y|X) P(X)
• We can generalize this into the chain rule: P(An, …, A1) = P(An | An−1, …, A1) P(An−1, …, A1)
• For example, with n = 4: P(A4, A3, A2, A1) = P(A4 | A3, A2, A1) P(A3 | A2, A1) P(A2 | A1) P(A1)

      y1    y2
x1   600   200
x2    40   360
• 14. Bayes Theorem
• From the product rule, we have: P(X, Y) = P(Y|X) P(X)
• We know P(X, Y) = P(Y, X), hence: P(Y|X) P(X) = P(X|Y) P(Y)
• From the above, we derive: P(Y|X) = P(X|Y) P(Y) / P(X)
• The above is Bayes' Theorem
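Applied to the two-channel example of slides 9–10, Bayes' theorem inverts the likelihoods into a posterior over the source; a quick numerical check:

```python
# Quantities from the two-channel example (slides 9-10).
p_x1, p_x2 = 2 / 3, 1 / 3  # priors P(X)
p_y1_given_x1 = 0.75       # likelihood P(Y = y1 | X = x1)
p_y1_given_x2 = 0.1        # likelihood P(Y = y1 | X = x2)

# Sum rule gives the evidence P(Y = y1).
p_y1 = p_y1_given_x1 * p_x1 + p_y1_given_x2 * p_x2

# Bayes' theorem: posterior P(X = x1 | Y = y1).
p_x1_given_y1 = p_y1_given_x1 * p_x1 / p_y1

print(round(p_y1, 4))           # 0.5333 -> 640 of the 1200 tweets are of type y1
print(round(p_x1_given_y1, 4))  # 0.9375 -> 600 of those 640 came from x1
```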
• 15. Independence
• Independent Variables: Knowing Y does not alter our belief about X
From the product rule, we know: P(X, Y) = P(X|Y) P(Y)
If X and Y are independent random variables: P(X|Y) = P(X)
Hence: P(X, Y) = P(X) P(Y)
We write X ⊥ Y to denote that X and Y are independent
• Conditional Independence
• Informally, suppose X and Y are not independent when taken alone, but are independent on observing another variable Z. This is denoted by: X ⊥ Y | Z
• Definition: Let X, Y, Z be discrete random variables. X is conditionally independent of Y given Z if the probability distribution governing X is independent of the value of Y given a value of Z: P(X|Y, Z) = P(X|Z)
Also: P(X, Y | Z) = P(X|Y, Z) P(Y|Z) = P(X|Z) P(Y|Z)
• 16. Graph Models: Bayesian Networks
• Graph models are also called Bayesian networks, belief networks and probabilistic networks
• They consist of nodes and edges between the nodes
• Each node corresponds to a random variable X and the value of the node is the probability of X
• If there is a directed edge from vertex X to vertex Y, it means X has an influence on Y
• This influence is specified by the conditional probability P(Y|X)
• The graph is a DAG
• Nodes and edges define the structure of the network, and the conditional probabilities are the parameters given the structure
• 17. Examples
• Preparation for the exam R and the marks obtained in the exam M
• Marketing budget B and the advertisements A
• Nationality of a team N and its chance of qualifying for the quarter final of the world cup, Q
• In all cases, the probability distribution P respects the graph G
• 18. Representing the joint distributions
• Consider P(A, B, C) = P(A) P(B|A) P(C|A, B). This can be represented as a graph (fig a)
• Key Concept: Factorization – each vertex has a factor
• Exercise: Draw the graph for P(X1, X2, X3, X4, X5)
• The joint probability distribution with conditional probability assumptions respects the associated graph (fig b)
• The graph describing the distribution is useful in 2 ways:
• Visualization of conditional dependencies
• Inferencing
• Determining the conditional independence of a distribution is vital for tractable inference
[Fig (a) and Fig (b): directed graphs over the nodes A, B, C]
• 19. Continuous Random Variables
• A random variable X is said to be continuous if it takes continuous values – for example, a real valued number representing some physical quantity that can vary at random.
• Its probability density f satisfies: f(x) ≥ 0 for all x; ∫ f(x) dx = 1; and, for example, P(0 ≤ X ≤ 1) = ∫₀¹ f(x) dx
• Suppose there is a sentiment analyser that outputs a sentiment polarity value as a real number in the range (-1, 1). The output of this system is an example of a continuous RV
• While the polarity mentioned in the above example is a scalar variable, we also encounter vectors of continuous random variables. For example, some sentic computing systems model emotions as a multi-dimensional vector of real numbers.
• Likewise, a vector whose elements are the average values of hashtags, URLs, screen names and retweets per tweet, averaged over a corpus, constitutes a vector of continuous Random Variables
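The density conditions above can be checked numerically for a simple case; here a hypothetical uniform density on (-1, 1) stands in for the sentiment-polarity RV:

```python
# Uniform density on (-1, 1) as a toy model of the sentiment polarity RV.
def f(x):
    return 0.5 if -1.0 < x < 1.0 else 0.0

# Midpoint Riemann-sum approximations of the integral conditions on the slide.
n = 100_000
dx = 2.0 / n
xs = [-1.0 + (i + 0.5) * dx for i in range(n)]

total = sum(f(x) * dx for x in xs)                     # should be ~1
p_0_1 = sum(f(x) * dx for x in xs if 0.0 <= x <= 1.0)  # P(0 <= X <= 1), ~0.5

print(round(total, 6), round(p_0_1, 6))
```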
  • 20. Notes on Probability density P(x)
• 21. Canonical Cases
• Head to tail: A → B → C
• Tail to tail: A ← B → C
• Head to head: A → B ← C
• 22. Joint Distribution of Continuous Random Variables

HASHTAG  RT     SCREEN NAME  URL    CORPUS
0.462    0.65   0.534        0.336  Elections
0.434    0.482  0.41         0.548  Elections
0.452    0.454  0.398        0.58   Elections
0.232    0.254  0.146        0.284  Railway
0.248    0.296  0.218        0.32   Railway
0.64     0.312  0.18         0.158  Union
0.38     0.572  0.45         0.34   Elections
0.646    0.296  0.164        0.178  Union
0.314    0.262  0.162        0.338  Railway
0.672    0.318  0.176        0.188  Union
0.32     0.31   0.196        0.314  Railway
0.542    0.586  0.422        0.334  Elections
0.638    0.31   0.172        0.2    Union
0.41     0.636  0.37         0.348  Elections
0.382    0.58   0.418        0.344  Elections
0.416    0.492  0.414        0.58   Elections
0.546    0.63   0.434        0.318  Elections
0.512    0.676  0.47         0.368  Elections
0.672    0.314  0.166        0.158  Union
0.422    0.678  0.4          0.37   Elections
0.694    0.298  0.112        0.182  Union
0.44     0.46   0.442        0.582  Elections
0.502    0.668  0.462        0.388  Elections
0.418    0.49   0.388        0.562  Elections
0.478    0.62   0.41         0.308  Elections
0.5      0.624  0.516        0.334  Elections
• 23. Expectation Value
• For discrete variables:
• Expectation value: E[f(X)] = Σ f(x) p(x)
• If a random sample is picked from the distribution, the expectation is simply the average value of f(x)
• Thus, the values of hashtag, screen name etc. shown in the previous slide correspond to their respective expectation values
• For continuous variables:
• E[f(X)] = ∫ f(x) p(x) dx over the range of values X can take
• 24. Common Distributions
• Normal: X ~ N(μ, σ²), e.g. the height of an entire population
• f(x) = (1 / (σ√(2π))) exp(−(x − μ)² / (2σ²))
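The density formula can be coded directly; a minimal sketch:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, as given on the slide."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# The density peaks at the mean; for the standard normal the peak is 1/sqrt(2*pi).
print(round(normal_pdf(0.0, 0.0, 1.0), 4))  # 0.3989
```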
• 25. Naïve Bayes Classifier
• Derive on the board.
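Since the slide defers the derivation to the board, a compact version using Bayes' theorem (slide 14) and conditional independence (slide 15):

```latex
% Bayes' theorem applied to class y and features x_1, ..., x_n,
% dropping the constant evidence term P(x_1, ..., x_n):
P(y \mid x_1, \dots, x_n) \propto P(y)\, P(x_1, \dots, x_n \mid y)
% Naive assumption: the features are conditionally independent given y,
% so the chain rule (slide 13) collapses into a product of per-feature terms:
P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)
% Decision rule:
\hat{y} = \arg\max_{y}\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```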
• 26. Naïve Bayes Classifier: Worked out example
• Classify the tweets into one of the three corpora (Railway, Union, Elections), given a dataset.
• Split the dataset into 75% training and 25% test
• Estimate the parameters using the training data
• Evaluate the model with the test data
• Report the classifier accuracy
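A sketch of the exercise, with a few rows in the format of slide 22 standing in for the real dataset; modelling each class-conditional feature density as a Gaussian is one simple choice, not necessarily the one used in the lecture:

```python
import math
from collections import defaultdict

# Hypothetical rows in the format of slide 22: (hashtag, rt, sname, url), corpus label.
data = [
    ((0.462, 0.650, 0.534, 0.336), "Elections"),
    ((0.434, 0.482, 0.410, 0.548), "Elections"),
    ((0.380, 0.572, 0.450, 0.340), "Elections"),
    ((0.232, 0.254, 0.146, 0.284), "Railway"),
    ((0.248, 0.296, 0.218, 0.320), "Railway"),
    ((0.314, 0.262, 0.162, 0.338), "Railway"),
    ((0.640, 0.312, 0.180, 0.158), "Union"),
    ((0.646, 0.296, 0.164, 0.178), "Union"),
    ((0.672, 0.318, 0.176, 0.188), "Union"),
]

def fit(rows):
    """Estimate the prior and per-feature Gaussian parameters for each class."""
    by_class = defaultdict(list)
    for x, y in rows:
        by_class[y].append(x)
    model = {}
    for y, xs in by_class.items():
        n = len(xs)
        means = [sum(col) / n for col in zip(*xs)]
        vars_ = [sum((v - m) ** 2 for v in col) / n + 1e-6  # smoothed variance
                 for col, m in zip(zip(*xs), means)]
        model[y] = (n / len(rows), means, vars_)
    return model

def log_gauss(v, m, var):
    return -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)

def predict(model, x):
    """argmax_y log P(y) + sum_i log P(x_i | y) -- the naive Bayes decision rule."""
    return max(model, key=lambda y: math.log(model[y][0]) +
               sum(log_gauss(v, m, s)
                   for v, m, s in zip(x, model[y][1], model[y][2])))

model = fit(data)
print(predict(model, (0.45, 0.60, 0.45, 0.35)))  # close to the Elections averages
```

With a real dataset, the 75/25 split would be done before `fit`, and accuracy is the fraction of held-out rows whose predicted label matches the true corpus.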