Hidden Markov Models

Presentation Transcript

  • Markov Chains and Hidden Markov Models. Nasir and Rajab. Dr. Pan. Spring 2014
  • MARKOV CHAINS: A Markov chain is a stochastic process that satisfies one key assumption, the Markovian property. A stochastic process {X_t} is said to have the Markovian property if the state of the system at time t+1 depends only on the state of the system at time t:
    $$P[X_{t+1} = x_{t+1} \mid X_t = x_t, X_{t-1} = x_{t-1}, \ldots, X_1 = x_1, X_0 = x_0] = P[X_{t+1} = x_{t+1} \mid X_t = x_t]$$
  • The n-step transition matrix
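A hedged reminder of the standard definition, assuming a time-homogeneous chain whose one-step transition matrix is written P (a symbol not used elsewhere in this transcript): the n-step transition probabilities are the entries of the n-th matrix power, and they compose by matrix multiplication.

```latex
p_{ij}^{(n)} = P[X_{t+n} = j \mid X_t = i], \qquad
P^{(n)} = P^{n}, \qquad
P^{(m+n)} = P^{(m)} P^{(n)}
```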
  • Irreducible Markov chain: a Markov chain is irreducible if the corresponding graph is strongly connected. Recurrent and transient states: in the slide's diagram, A and B are transient states, while C and D are recurrent states. Once the process moves from B to D, it never comes back.
  • The period of a state: a Markov chain is periodic if all of its states have a period k > 1; otherwise it is aperiodic. Ergodic: a Markov chain is ergodic if (1) the corresponding graph is strongly connected and (2) it is not periodic.
  • Markov Chain Example • Based on the weather today, what will it be tomorrow? • Assuming only four possible weather states: Sunny, Cloudy, Rainy, Snowing
  • Markov Chain Structure • Each state is an observable event • At each time interval the state changes to another state or stays the same (q_t ∈ {S1, S2, S3, S4}), where S1 = Sunny, S2 = Cloudy, S3 = Rainy, S4 = Snowing
  • Markov Chain Structure (diagram of the four states: Sunny, Cloudy, Rainy, Snowy)
  • Markov Chain Transition Probabilities • Transition probability matrix (rows: state at time t; columns: state at time t + 1):
        State   S1    S2    S3    S4    Total
        S1      a11   a12   a13   a14   1
        S2      a21   a22   a23   a24   1
        S3      a31   a32   a33   a34   1
        S4      a41   a42   a43   a44   1
  • Markov Chain Transition Probabilities • Probabilities for tomorrow’s weather based on today’s weather (rows: time t; columns: time t + 1):
        State     Sunny   Cloudy   Rainy   Snowing
        Sunny     0.6     0.3      0.1     0.0
        Cloudy    0.2     0.4      0.3     0.1
        Rainy     0.1     0.2      0.5     0.2
        Snowing   0.0     0.3      0.2     0.5
  • State-transition diagram for the weather chain (Sunny, Cloudy, Rainy, Snowing), with each edge labeled by the corresponding transition probability from the preceding table.
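The weather table can be used directly to look more than one day ahead. The sketch below, in base R, assumes only the probabilities shown on the slide; the variable names (`P`, `P2`, `today`, `in_two_days`) are illustrative. It squares the matrix to get two-step transition probabilities and applies them to a day that is known to be sunny.

```r
# Weather transition matrix from the slide (rows: today, columns: tomorrow)
states <- c("Sunny", "Cloudy", "Rainy", "Snowing")
P <- matrix(c(0.6, 0.3, 0.1, 0.0,
              0.2, 0.4, 0.3, 0.1,
              0.1, 0.2, 0.5, 0.2,
              0.0, 0.3, 0.2, 0.5),
            nrow = 4, byrow = TRUE, dimnames = list(states, states))

# Each row must be a probability distribution over tomorrow's weather
stopifnot(all(abs(rowSums(P) - 1) < 1e-12))

# Two-step transition matrix: weather two days ahead given today's weather
P2 <- P %*% P

# Distribution two days ahead when today is known to be sunny
today <- c(Sunny = 1, Cloudy = 0, Rainy = 0, Snowing = 0)
in_two_days <- today %*% P2
print(round(in_two_days, 2))
```

Starting from a sunny day, the two-day-ahead distribution is simply the Sunny row of `P2`.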
  • Markov Chain Models: A Markov Chain Model for DNA. The diagram shows a begin state and one state per base, A (adenine), C (cytosine), G (guanine), T (thymine); each arrow is a state transition. Transition probabilities shown for the transitions out of g:
    $$\Pr(X_i = a \mid X_{i-1} = g) = 0.1$$
    $$\Pr(X_i = c \mid X_{i-1} = g) = 0.1$$
    $$\Pr(X_i = g \mid X_{i-1} = g) = 0.4$$
    $$\Pr(X_i = t \mid X_{i-1} = g) = 0.1$$
  • The Probability of a Sequence for a Given Markov Chain Model (diagram: begin state, states A, C, G, T, end state):
    $$\Pr(cggt) = \Pr(c \mid \text{begin}) \Pr(g \mid c) \Pr(g \mid g) \Pr(t \mid g) \Pr(\text{end} \mid t)$$
  • Markov Chain Notation • The transition parameters can be denoted by $a_{x_{i-1} x_i}$, where
    $$a_{x_{i-1} x_i} = \Pr(X_i = x_i \mid X_{i-1} = x_{i-1})$$
    • Similarly, we can denote the probability of a sequence x as
    $$a_{B x_1} \prod_{i=2}^{M} a_{x_{i-1} x_i} = \Pr(x_1) \prod_{i=2}^{M} \Pr(x_i \mid x_{i-1})$$
    where $a_{B x_1}$ represents the transition from the begin state • This gives a probability distribution over sequences of length M
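The product formula for a sequence probability can be sketched in base R as follows. The transition values are the CpG model values that appear later in this transcript; the uniform begin-state probabilities and the function name `seq_prob` are assumptions made for illustration, and no end state is modeled.

```r
alphabet <- c("a", "c", "g", "t")

# Transition matrix a[x_{i-1}, x_i]; values taken from the CpG ("+") table
A <- matrix(c(0.18, 0.27, 0.43, 0.12,
              0.17, 0.37, 0.27, 0.19,
              0.16, 0.34, 0.38, 0.12,
              0.08, 0.36, 0.38, 0.18),
            nrow = 4, byrow = TRUE, dimnames = list(alphabet, alphabet))

# ASSUMPTION: uniform begin-state probabilities a_{B x_1}; not given on the slides
begin <- c(a = 0.25, c = 0.25, g = 0.25, t = 0.25)

# Pr(x) = a_{B x_1} * prod_{i=2..M} a_{x_{i-1} x_i}
seq_prob <- function(s, begin, A) {
  x <- strsplit(s, "")[[1]]
  p <- begin[[x[1]]]
  for (i in seq_along(x)[-1]) {
    p <- p * A[x[i - 1], x[i]]
  }
  p
}

p_cggt <- seq_prob("cggt", begin, A)
print(p_cggt)
```

For long sequences the product underflows quickly, so summing log probabilities is the usual alternative.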
  • Estimating the Model Parameters: Given some data (e.g. a set of sequences from CpG islands), how can we determine the probability parameters of our model? Two approaches: (1) maximum likelihood estimation; (2) a Bayesian approach. The "p" in CpG indicates that the C and the G are next to each other in sequence, regardless of being single- or double-stranded. In a CpG site, both C and G are found on the same strand of DNA or RNA and are connected by a phosphodiester bond, which is a covalent bond.
  • Maximum Likelihood Estimation • Let’s use a very simple sequence model: every position is independent of the others, and every position is generated from the same multinomial distribution. We want to estimate the parameters Pr(a), Pr(c), Pr(g), Pr(t), and we’re given the sequences accgcgctta, gcttagtgac, tagccgttac. Then the maximum likelihood estimates are the observed frequencies of the bases:
    $$\Pr(a) = \frac{n_a}{\sum_i n_i}$$
    $$\Pr(a) = \frac{6}{30} = 0.2, \quad \Pr(c) = \frac{9}{30} = 0.3, \quad \Pr(g) = \frac{7}{30} = 0.233, \quad \Pr(t) = \frac{8}{30} = 0.267$$
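The frequency counts behind these estimates can be reproduced with a few lines of base R. This is a minimal sketch using the three sequences from the slide; the variable names are illustrative.

```r
seqs <- c("accgcgctta", "gcttagtgac", "tagccgttac")
alphabet <- c("a", "c", "g", "t")

# Split all sequences into single bases; fixing the factor levels keeps
# bases with a zero count in the table
bases <- factor(unlist(strsplit(seqs, "")), levels = alphabet)
counts <- table(bases)

# Maximum likelihood estimate: n_a / sum_i n_i
mle <- counts / sum(counts)
print(round(mle, 3))
```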
  • Maximum Likelihood Estimation • Suppose instead we saw the following sequences: gccgcgcttg, gcttggtggc, tggccgttgc • Then the maximum likelihood estimates are
    $$\Pr(a) = \frac{0}{30} = 0, \quad \Pr(c) = \frac{9}{30} = 0.3, \quad \Pr(g) = \frac{13}{30} = 0.433, \quad \Pr(t) = \frac{8}{30} = 0.267$$
  • A Bayesian Approach • A more general form: m-estimates
    $$\Pr(a) = \frac{n_a + p_a m}{\left(\sum_i n_i\right) + m}$$
    where $p_a$ is the prior probability of a and m is the number of “virtual” instances • With m = 8 and uniform priors, for the sequences gccgcgcttg, gcttggtggc, tggccgttgc:
    $$\Pr(c) = \frac{9 + 0.25 \times 8}{30 + 8} = \frac{11}{38}$$
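A minimal base R sketch of the m-estimate, using the slide's sequences, m = 8 and a uniform prior; the variable names are illustrative. It also shows that the base a, which never occurs in the data, no longer gets probability zero.

```r
seqs <- c("gccgcgcttg", "gcttggtggc", "tggccgttgc")
alphabet <- c("a", "c", "g", "t")

# Observed base counts n_a (a never occurs in these sequences)
n <- table(factor(unlist(strsplit(seqs, "")), levels = alphabet))

m <- 8                 # number of "virtual" instances
prior <- rep(1 / 4, 4) # uniform prior p_a over the four bases

# m-estimate: (n_a + p_a * m) / (sum_i n_i + m)
m_est <- (as.vector(n) + prior * m) / (sum(n) + m)
names(m_est) <- alphabet
print(round(m_est, 3))
```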
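  • Estimation for 1st Order Probabilities: To estimate a 1st order parameter, such as Pr(c|g), we count the number of times that c follows the history g in our given sequences. Using Laplace estimates with the sequences gccgcgcttg, gcttggtggc, tggccgttgc:
    $$\Pr(a \mid g) = \frac{0 + 1}{12 + 4}, \quad \Pr(c \mid g) = \frac{7 + 1}{12 + 4}, \quad \Pr(g \mid g) = \frac{3 + 1}{12 + 4}, \quad \Pr(t \mid g) = \frac{2 + 1}{12 + 4}$$
    and likewise for the other histories, for example
    $$\Pr(a \mid c) = \frac{0 + 1}{7 + 4}$$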
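The transition counts and Laplace estimates can be reproduced with the base R sketch below. It counts adjacent pairs within each sequence (not across sequence boundaries), which is the counting that matches the numbers on the slide; the variable names are illustrative.

```r
seqs <- c("gccgcgcttg", "gcttggtggc", "tggccgttgc")
alphabet <- c("a", "c", "g", "t")

# counts[from, to] = number of times base `to` follows base `from`
counts <- matrix(0, 4, 4, dimnames = list(from = alphabet, to = alphabet))
for (s in seqs) {
  x <- strsplit(s, "")[[1]]
  for (i in 2:length(x)) {
    counts[x[i - 1], x[i]] <- counts[x[i - 1], x[i]] + 1
  }
}

# Laplace estimate: add one pseudocount per possible next base.
# Dividing a matrix by a length-4 vector divides row i by element i.
laplace <- (counts + 1) / (rowSums(counts) + length(alphabet))
print(round(laplace, 3))
```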
  • Example Application: Markov Chains for Discrimination • Suppose we want to distinguish CpG islands from other sequence regions • Given sequences from CpG islands and sequences from other regions, we can construct • a model to represent CpG islands • a null model to represent the other regions
  • Markov Chains for Discrimination • Parameters estimated for the CpG (+) and null (-) models from human sequences containing 48 CpG islands, 60,000 nucleotides. Rows give the previous base and columns the next base, so the entry in row a, column c is Pr(c | a).
        CpG (+)   a     c     g     t
        a         .18   .27   .43   .12
        c         .17   .37   .27   .19
        g         .16   .34   .38   .12
        t         .08   .36   .38   .18

        Null (-)  a     c     g     t
        a         .30   .21   .28   .21
        c         .32   .30   .08   .30
        g         .25   .24   .30   .21
        t         .18   .24   .29   .29
  • Figures: the CpG matrix and the null matrix.
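One common way to use two such models for discrimination is a log-odds score: sum, over the transitions in a sequence, the log ratio of the CpG and null transition probabilities. The slides do not spell out the scoring rule, so the base R sketch below is an assumption about how the two matrices might be applied; `log_odds` is a hypothetical helper and the begin-state term is ignored.

```r
alphabet <- c("a", "c", "g", "t")
cpg <- matrix(c(0.18, 0.27, 0.43, 0.12,
                0.17, 0.37, 0.27, 0.19,
                0.16, 0.34, 0.38, 0.12,
                0.08, 0.36, 0.38, 0.18),
              nrow = 4, byrow = TRUE, dimnames = list(alphabet, alphabet))
null_model <- matrix(c(0.30, 0.21, 0.28, 0.21,
                       0.32, 0.30, 0.08, 0.30,
                       0.25, 0.24, 0.30, 0.21,
                       0.18, 0.24, 0.29, 0.29),
                     nrow = 4, byrow = TRUE, dimnames = list(alphabet, alphabet))

# Sum of log2(a+ / a-) over consecutive base pairs; positive scores favor
# the CpG model, negative scores favor the null model
log_odds <- function(s) {
  x <- strsplit(s, "")[[1]]
  pairs <- cbind(x[-length(x)], x[-1]) # (previous base, next base)
  sum(log2(cpg[pairs] / null_model[pairs]))
}

score_cg <- log_odds("cgcg")
score_at <- log_odds("atat")
print(c(cgcg = score_cg, atat = score_at))
```

A CG-rich string scores positive and an AT-rich string scores negative, as the two tables suggest.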
  • Hidden Markov Models (HMM): “A doubly stochastic process with an underlying stochastic process that is not observable (it is hidden), but can only be observed through another set of stochastic processes that produce the sequence of observed symbols” (Rabiner & Juang, 1986)
  • Difference between Markov chains and HMM • In a Markov chain, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but the output, which depends on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states.
  • HMM Example • Suppose we want to determine the average annual temperature at a particular location on earth over a series of years. • We consider two annual temperatures, "hot" (H) and "cold" (C). The state is the average annual temperature, and the transition from one state to the next is a Markov process with the following transition probabilities (rows: time t; columns: time t + 1):
        State   H     C
        H       0.7   0.3
        C       0.4   0.6
  • Since we cannot directly observe the state (temperature) in the past, we instead observe the size of tree rings.
  • Now suppose that current research indicates a correlation between the size of tree growth rings and temperature. We consider only three different tree ring sizes: small (S), medium (M) and large (L). The probabilistic relationship between annual temperature and tree ring sizes is (rows: state; columns: observation):
        State   S     M     L
        H       0.1   0.4   0.5
        C       0.7   0.2   0.1
  • Table 1: State sequence probabilities
        State   Probability   Normalized
        HHHH    0.000412      0.042787
        HHHC    0.000035      0.003635
        HHCH    0.000706      0.073320
        HHCC    0.000212      0.022017
        HCHH    0.000050      0.005193
        HCHC    0.000004      0.000415
        HCCH    0.000302      0.031364
        HCCC    0.000091      0.009451
        CHHH    0.001098      0.114031
        CHHC    0.000094      0.009762
        CHCH    0.001882      0.195451
        CHCC    0.000564      0.058573
        CCHH    0.000470      0.048811
        CCHC    0.000040      0.004154
        CCCH    0.002822      0.293073
        CCCC    0.000847      0.087963
    To find the optimal state sequence in the dynamic programming (DP) sense, we simply choose the sequence with the highest probability, namely CCCH.
  • Table 2: HMM probabilities
        Position   0          1          2          3
        P(H)       0.188182   0.519576   0.228788   0.804029
        P(C)       0.811818   0.480424   0.771212   0.195971
    From Table 2, the optimal sequence in the HMM sense, which picks the most probable state at each position, is CHCH. In this example the optimal DP sequence (CCCH) differs from the optimal HMM sequence (CHCH), and all state transitions are valid. Note that the DP solution and the HMM solution are not necessarily the same: the DP solution must have valid state transitions, while this is not necessarily the case for the HMM solution.
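As a check on one row of Table 1: the transcript does not show the initial state distribution or the observed ring sizes, so this worked example assumes the values from the cited Stamp tutorial, namely initial probabilities π = (0.6, 0.4) for (H, C) and the observation sequence O = (S, M, S, L). The symbols π, a and b (initial, transition and observation probabilities) are notation introduced here. Under those assumptions the HHHH entry is reproduced as:

```latex
P(HHHH, O) = \pi_H \, b_H(S) \cdot a_{HH} \, b_H(M) \cdot a_{HH} \, b_H(S) \cdot a_{HH} \, b_H(L)
           = 0.6 \cdot 0.1 \cdot 0.7 \cdot 0.4 \cdot 0.7 \cdot 0.1 \cdot 0.7 \cdot 0.5
           \approx 0.000412
```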
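Both tables can be reproduced by brute-force enumeration of the 16 state sequences, sketched below in base R. The transition and observation matrices come from the slides; the initial distribution `pi0` and the observation sequence `obs` are not shown in this transcript and are assumed from the cited Stamp tutorial. Enumeration is used here only because the example is tiny; it is not the dynamic programming or forward-backward computation itself.

```r
st <- c("H", "C")
A <- matrix(c(0.7, 0.3,
              0.4, 0.6), nrow = 2, byrow = TRUE, dimnames = list(st, st))
B <- matrix(c(0.1, 0.4, 0.5,
              0.7, 0.2, 0.1), nrow = 2, byrow = TRUE,
            dimnames = list(st, c("S", "M", "L")))
pi0 <- c(H = 0.6, C = 0.4)   # ASSUMPTION: initial distribution (Stamp)
obs <- c("S", "M", "S", "L") # ASSUMPTION: observed ring sizes (Stamp)

# All 2^4 candidate state sequences, one per row
paths <- expand.grid(rep(list(st), length(obs)), stringsAsFactors = FALSE)

# Joint probability of each state sequence with the observations
path_prob <- apply(paths, 1, function(p) {
  pr <- pi0[[p[1]]] * B[p[1], obs[1]]
  for (t in 2:length(obs)) {
    pr <- pr * A[p[t - 1], p[t]] * B[p[t], obs[t]]
  }
  pr
})
names(path_prob) <- apply(paths, 1, paste, collapse = "")
normalized <- path_prob / sum(path_prob)

# "DP sense": the single most probable state sequence (Table 1)
best_path <- names(which.max(path_prob))

# "HMM sense": most probable state at each position (Table 2)
posterior_H <- sapply(seq_along(obs),
                      function(t) sum(normalized[paths[[t]] == "H"]))
hmm_path <- paste(ifelse(posterior_H > 0.5, "H", "C"), collapse = "")

print(best_path)
print(round(posterior_H, 6))
print(hmm_path)
```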
  • R code for the CpG matrix:
        library(markovchain)
        # States and row-wise transition probabilities of the CpG model
        DNAStates <- c("A", "C", "G", "T")
        byRow <- TRUE
        DNAMatrix <- matrix(data = c(0.18, 0.27, 0.43, 0.12,
                                     0.17, 0.37, 0.27, 0.19,
                                     0.16, 0.34, 0.38, 0.12,
                                     0.08, 0.36, 0.38, 0.18),
                            byrow = byRow, nrow = 4,
                            dimnames = list(DNAStates, DNAStates))
        # Build the markovchain object and draw its transition graph
        mcDNA <- new("markovchain", states = DNAStates, byrow = byRow,
                     transitionMatrix = DNAMatrix, name = "DNA")
        plot(mcDNA)
    R code for the null matrix:
        # Same states, with the transition probabilities of the null model
        DNAStates <- c("A", "C", "G", "T")
        byRow <- TRUE
        DNAMatrix <- matrix(data = c(0.30, 0.21, 0.28, 0.21,
                                     0.32, 0.30, 0.08, 0.30,
                                     0.25, 0.24, 0.30, 0.21,
                                     0.18, 0.24, 0.29, 0.29),
                            byrow = byRow, nrow = 4,
                            dimnames = list(DNAStates, DNAStates))
        mcDNAnull <- new("markovchain", states = DNAStates, byrow = byRow,
                         transitionMatrix = DNAMatrix, name = "DNA")
        plot(mcDNAnull)
  • References:
      - http://www.scs.leeds.ac.uk/scs-only/teaching-materials/HiddenMarkovModels/html_dev/main.html
      - Mark Stamp, "A Revealing Introduction to Hidden Markov Models", Department of Computer Science, San Jose State University, September 28, 2012