Slide 9
Let us distinguish two types of sequence data
● Continuous time series
– Stock share price
– Daily temperature in Cologne
● Categorical (discrete) sequences (focus)
– Sunny/Rainy weather sequence
– Human mobility
– Web navigation
– Song listening sequences
Slide 13
Markov Chain Model
● Stochastic Model
● Transitions between states
[Diagram: Markov chain with three states S1, S2, S3; edges labeled with transition probabilities 1/2, 1/2, 1/3, 2/3 and 1]
Slide 14
Markov Chain Model
● Markovian property
– The next state in a sequence depends only on the current one, not on the sequence of preceding ones
[Diagram: the same three-state Markov chain as on the previous slide]
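Stated as a formula (a standard restatement of the bullet above):

```latex
% First order Markov property: the next state depends only on the current state
P(X_{t+1} = s \mid X_1 = x_1, \ldots, X_t = x_t) = P(X_{t+1} = s \mid X_t = x_t)
```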
Slide 20
Maximum Likelihood (MLE)
● Given some sequence data, how can we determine the parameters?
● MLE: maximize the likelihood of the observed data
See ref [1]
[1] http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0102070
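The formula itself did not survive extraction; a standard reconstruction for a first order Markov chain, with n_ij the number of observed transitions from state i to state j:

```latex
% Likelihood of the data as a function of the transition probabilities p_ij;
% it is maximized by the normalized transition counts.
\mathcal{L}(p) = \prod_{i,j} p_{ij}^{\,n_{ij}}
\quad\Rightarrow\quad
\hat{p}_{ij} = \frac{n_{ij}}{\sum_{k} n_{ik}}
```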
Slide 26
Full example
Transition matrix (MLE), with rows the current state and columns the next state (states Sunny, Rainy):

        Sunny  Rainy
Sunny    5/7    2/7
Rainy    2/3    1/3

Likelihood of the given sequence: we calculate the probability of the sequence under the assumption that we start with Sunny.
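A minimal sketch of this full example in Python. The sequence below is a hypothetical one chosen so that its transition counts reproduce the matrix above; the actual sequence from the slide is not in the extracted text:

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy sequence whose transition counts match the slide's matrix.
seq = list("SSSSSSRRSRS")  # S = Sunny, R = Rainy

# MLE: count transitions and normalize per source state.
counts = Counter(zip(seq, seq[1:]))
totals = Counter(seq[:-1])
P = {(a, b): Fraction(counts[(a, b)], totals[a])
     for a in "SR" for b in "SR"}
print(P)  # -> 5/7, 2/7, 2/3, 1/3

# Likelihood of the sequence, assuming we start with Sunny (probability 1).
lik = Fraction(1)
for a, b in zip(seq, seq[1:]):
    lik *= P[(a, b)]
print(lik, float(lik))
```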
Slide 30
Higher order Markov Chain models
● Drop the memoryless assumption?
● Models of increasing order
– 2nd order MC model
– 3rd order MC model
– ...
● 2nd order example: the next state depends on the two preceding states, p(x_t | x_{t-1}, x_{t-2})
Slide 31
Higher order to first order transformation
● Transform state space
● 2nd order example – new compound states formed from pairs of consecutive original states
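A minimal sketch of this transformation, assuming the compound states are simply pairs of consecutive original states:

```python
# Turn a sequence over states into a sequence over compound states
# (overlapping pairs of consecutive states), so a 2nd order chain over
# the original states becomes a 1st order chain over the pairs.
def to_second_order(seq):
    return list(zip(seq, seq[1:]))

seq = ["S", "S", "R", "S", "R"]
print(to_second_order(seq))
# [('S', 'S'), ('S', 'R'), ('R', 'S'), ('S', 'R')]
```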
Slide 33
Reset states
[Figure: example sequences wrapped in reset states R]
● Marking start and end of sequences
● Makes the transformation easier (the number of transitions stays the same)
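A sketch of wrapping sequences in reset states before the transformation; the helper name and the padding scheme (one reset symbol per model order on each side) are assumptions:

```python
# Wrap every sequence in reset states so that transitions into and out of
# the reset symbol mark sequence boundaries; with `order` resets per side,
# every real transition keeps a full history even at the boundaries.
# The reset symbol must not collide with a real state symbol.
def add_resets(sequences, order=1, reset="R"):
    return [[reset] * order + seq + [reset] * order for seq in sequences]

print(add_resets([["a", "b", "a"], ["b", "b"]]))
# [['R', 'a', 'b', 'a', 'R'], ['R', 'b', 'b', 'R']]
```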
Slide 34
Comparing models
● 1st vs. 2nd order
● Statistical model comparison necessary
● Nested models → higher order always fits better
● Account for potential overfitting
Slide 35
Model comparison
● Likelihood ratio test
– Ratio between the likelihoods of models of order m and k
– The test statistic follows a Chi2 distribution whose degrees of freedom equal the difference in the number of free parameters
– Only for nested models
● Akaike Information Criterion (AIC)
– AIC = 2k - 2 ln(L), where k is the number of free parameters and L the maximized likelihood
– The lower the better
● Bayesian Information Criterion (BIC)
– BIC = k ln(n) - 2 ln(L), where n is the number of observations; again, the lower the better
● Bayes Factors
– Ratio of evidences (marginal likelihoods)
● Cross validation
See http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0102070
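A sketch of the first three criteria in Python; the log likelihoods, parameter counts, and observation count below are hypothetical stand-ins for values obtained from fitted 1st and 2nd order models:

```python
import math
from scipy.stats import chi2

# Hypothetical fitted values for nested 1st and 2nd order models
# (log likelihood, number of free parameters).
log_lik_1, k1 = -120.0, 2
log_lik_2, k2 = -115.0, 4   # higher order always fits at least as well

# Likelihood ratio test: 2*(logL2 - logL1) ~ Chi2 with k2 - k1 dof.
lr = 2 * (log_lik_2 - log_lik_1)
p_value = chi2.sf(lr, df=k2 - k1)

# Information criteria: penalize parameters; lower is better.
n = 100  # hypothetical number of observed transitions
aic = lambda ll, k: 2 * k - 2 * ll
bic = lambda ll, k: k * math.log(n) - 2 * ll
print(p_value)
print(aic(log_lik_1, k1), aic(log_lik_2, k2))
print(bic(log_lik_1, k1), bic(log_lik_2, k2))
```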
Slide 37
AIC example
[Figure: an example sequence delimited by reset states R, with the estimated 1st order parameters (e.g. 5/8, 2/8, 2/3, 1/3, ...) and the corresponding 2nd order parameters]
Example on blackboard
Slide 40
Hidden Markov Models
● Extends Markov chain model
● Hidden state sequence
● Observed emissions
[Figure caption: What is the weather like?]
Slide 41
Forward-Backward algorithm
● Given emission sequence
● Probability of emission sequence?
● Probable sequence of hidden states?
[Figure: hidden state sequence and corresponding observed emission sequence]
Check out YouTube tutorial: https://www.youtube.com/watch?v=7zDARfKVm7s
Further material: cs229.stanford.edu/section/cs229-hmm.pdf
Slide 42
Setup
[Figure: HMM with two hidden states; transition probabilities 0.7/0.3 and 0.6/0.4, emission probabilities 0.9/0.1 and 0.2/0.8, and a reset/start state R with uniform 0.5/0.5 probabilities]
Note: The literature usually uses a start probability and a uniform end probability for the forward-backward algorithm.
Slide 44
Forward
[Figure: one forward step on the setup from the previous slide; intermediate forward values 0.4, 0.1, 0.034 and 0.144]
What is the probability of going to each possible state at t2 given t1?
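A minimal forward pass in Python, assuming the transition matrix [[0.7, 0.3], [0.6, 0.4]], emission matrix [[0.9, 0.1], [0.2, 0.8]] and uniform 0.5/0.5 start probabilities as read off the setup figure; the observation sequence is hypothetical, so the code does not claim to reproduce the slide's exact values:

```python
import numpy as np

# Assumed from the setup figure: 2 hidden states, 2 observation symbols.
A = np.array([[0.7, 0.3],   # hidden state transition probabilities
              [0.6, 0.4]])
B = np.array([[0.9, 0.1],   # emission probabilities per hidden state
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])   # uniform start probabilities

obs = [0, 1, 0, 0]          # hypothetical observation sequence (symbol indices)

# Forward pass: alpha[t, i] = P(obs[0..t], hidden state i at time t)
alpha = np.zeros((len(obs), 2))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, len(obs)):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

print(alpha)                # forward probabilities per time step
print(alpha[-1].sum())      # probability of the whole emission sequence
```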
Slide 48
Backward
[Figure: one backward step on the same setup; intermediate backward values 0.31 and 0.28]
What is the probability of arriving at t4 given each possible state at t3?
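The matching backward pass under the same assumed setup and hypothetical observations:

```python
import numpy as np

A = np.array([[0.7, 0.3], [0.6, 0.4]])   # assumed transition matrix
B = np.array([[0.9, 0.1], [0.2, 0.8]])   # assumed emission matrix
obs = [0, 1, 0, 0]                        # hypothetical observations

# Backward pass: beta[t, i] = P(obs[t+1..] | hidden state i at time t)
beta = np.ones((len(obs), 2))             # beta at the final step is 1
for t in range(len(obs) - 2, -1, -1):
    beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

print(beta)   # e.g. the second-to-last row answers the question above
```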
Slide 53
Learning parameters
● Train parameters of HMM
● No closed-form MLE solution is known
● Baum-Welch algorithm
– Special case of EM algorithm
– Uses Forward-Backward
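A sketch using the hmmlearn library, assuming its CategoricalHMM API (available in recent versions); fit() runs Baum-Welch, i.e. EM with the forward-backward pass, internally. The observation sequence is hypothetical:

```python
import numpy as np
from hmmlearn import hmm   # assumed dependency: pip install hmmlearn

# Observed emission sequence as a column vector of symbol indices.
X = np.array([[0], [1], [0], [0], [1], [1], [0]])

# Two hidden states; fit() runs Baum-Welch (EM) using forward-backward.
model = hmm.CategoricalHMM(n_components=2, n_iter=100, random_state=0)
model.fit(X)

print(model.transmat_)       # learned transition probabilities
print(model.emissionprob_)   # learned emission probabilities
```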