INTRODUCTION TO HIDDEN
MARKOV MODELS
Mohan Kumar Yadav
M.Sc Bioinformatics
JNU JAIPUR
HIDDEN MARKOV MODEL (HMM)
Real-world structures and processes often produce
observable outputs.
– The outputs are usually sequential.
– The events producing the outputs cannot be seen directly.
Problem: how to construct a model of the structure or
process given only the observations.
HISTORY OF HMM
• Basic theory developed and published in the 1960s and 70s
• No widespread understanding or application until the late
1980s
• Why?
– The theory was published in mathematical journals that were
not widely read.
– There was insufficient tutorial material for readers to understand
and apply the concepts.
Andrey Andreyevich Markov
1856-1922

Andrey Andreyevich Markov was a Russian
mathematician.
He is best known for his work on stochastic
processes.
A primary subject of his research later became
known as Markov chains and Markov processes.
HIDDEN MARKOV MODEL
• A Hidden Markov Model (HMM) is a statistical model in
which the system being modeled is assumed to be
a Markov process with hidden (unobserved) states.
• Markov chain property: the probability of each
subsequent state depends only on the
previous state.
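Written with the notation used on the components slide (x’s for states, y’s for output symbols), the Markov chain property is the first line below. The second line is the standard HMM output assumption, which this slide does not spell out: each output depends only on the hidden state that emits it.

```latex
\begin{aligned}
P(x_t \mid x_{t-1}, x_{t-2}, \ldots, x_1) &= P(x_t \mid x_{t-1}) \\
P(y_t \mid x_1, \ldots, x_t,\; y_1, \ldots, y_{t-1}) &= P(y_t \mid x_t)
\end{aligned}
```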
EXAMPLE OF HMM
• Coin toss:
– A sequence of heads and tails is produced using 2 coins
– You are in a room with a wall
– A person behind the wall flips a coin and tells you the result
– Which coin is selected, and the toss itself, are hidden
– You cannot observe the events, only their outputs (heads, tails)
– The problem is then to build a model that explains the observed
sequence of heads and tails.
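The coin-toss setup can be sketched as a small generative program that separates what the model must infer (the hidden coin sequence) from what the observer actually hears (heads and tails). This is a minimal illustration, not part of the original slides: the coin names, the switching probabilities and the coin biases are all assumed values.

```python
import random

# Hypothetical two-coin HMM; every number here is assumed for illustration.
STATES = ["coin1", "coin2"]
START = {"coin1": 0.5, "coin2": 0.5}
# Probability that the person behind the wall keeps or switches coins.
TRANSITION = {
    "coin1": {"coin1": 0.7, "coin2": 0.3},
    "coin2": {"coin1": 0.4, "coin2": 0.6},
}
# Each coin has its own (assumed) bias toward heads.
EMISSION = {
    "coin1": {"H": 0.5, "T": 0.5},
    "coin2": {"H": 0.8, "T": 0.2},
}

def draw(dist, rng):
    """Sample one outcome from an {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]

def simulate(n_tosses, seed=None):
    """Return (hidden coin sequence, observed heads/tails sequence)."""
    rng = random.Random(seed)
    coins, tosses = [], []
    coin = draw(START, rng)
    for _ in range(n_tosses):
        coins.append(coin)
        tosses.append(draw(EMISSION[coin], rng))  # the only thing you hear
        coin = draw(TRANSITION[coin], rng)        # hidden choice of next coin
    return coins, tosses

hidden, observed = simulate(10, seed=1)
print("observed:", "".join(observed))
print("hidden:  ", hidden)
```

Fixing the seed makes a run reproducible; the modelling task is to recover something like `hidden` (or the parameters that generated it) from `observed` alone.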
EXAMPLE OF HMM
• Weather
– The weather is observed once each day
– State 1: rain
– State 2: cloudy
– State 3: sunny
– What is the probability that the weather for the next 7 days will be
sun, sun, rain, rain, sun, cloudy, sun?

– Each state corresponds to a physically observable event, so this is an observable Markov chain; in an HMM the states themselves are hidden
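Because each weather state is observed directly, the question on this slide is answered by multiplying one transition probability per day. The slide gives no transition values, so the matrix below is an assumed, illustrative one, and this minimal sketch conditions on the first listed day being sunny (its probability is taken as 1).

```python
# Transition matrix for the three weather states. The slide gives no numbers,
# so these values are assumed for illustration (each row sums to 1).
A = {
    "rain":   {"rain": 0.4, "cloudy": 0.3, "sunny": 0.3},
    "cloudy": {"rain": 0.2, "cloudy": 0.6, "sunny": 0.2},
    "sunny":  {"rain": 0.1, "cloudy": 0.1, "sunny": 0.8},
}

def sequence_probability(days, transition, first_day_prob=1.0):
    """Probability of a sequence of observable states in a Markov chain.

    first_day_prob is P(days[0]); the default of 1.0 conditions on the
    first day being known.
    """
    p = first_day_prob
    for today, tomorrow in zip(days, days[1:]):
        p *= transition[today][tomorrow]  # depends only on the previous day
    return p

days = ["sunny", "sunny", "rain", "rain", "sunny", "cloudy", "sunny"]
print(sequence_probability(days, A))
```

With these assumed values the result is 0.8 × 0.1 × 0.4 × 0.3 × 0.1 × 0.2 = 1.92 × 10⁻⁴, up to floating-point rounding.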
HMM COMPONENTS
• A set of states (x’s)
• A set of possible output symbols (y’s)
• A state transition matrix (a’s)
– probability of making a transition from one state to the
next
• An output emission matrix (b’s)
– probability of emitting/observing a symbol in a
particular state
• An initial probability vector
– probability of starting in a particular state
– Not shown in the diagrams; sometimes a single start state is assumed, with probability 1
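One way to keep these components together in code is a small container that also checks that every probability distribution sums to 1. This is a hypothetical sketch: the class name `HMM`, its field names and the `validate` method are illustrative choices rather than an existing library API, and the coin probabilities in the example are assumed.

```python
from dataclasses import dataclass
from typing import Dict, List

Dist = Dict[str, float]  # maps an outcome to its probability

@dataclass
class HMM:
    """Hypothetical container for the components of an HMM."""
    states: List[str]        # hidden states (x's)
    symbols: List[str]       # possible output symbols (y's)
    trans: Dict[str, Dist]   # a's: trans[i][j] = P(next state j | state i)
    emit: Dict[str, Dist]    # b's: emit[i][k] = P(symbol k | state i)
    start: Dist              # initial probability vector

    def validate(self, tol: float = 1e-9) -> None:
        """Raise ValueError unless each distribution has the right keys and sums to 1."""
        def check(name: str, dist: Dist, keys: List[str]) -> None:
            if set(dist) != set(keys):
                raise ValueError(f"{name} must give a probability for each of {keys}")
            if abs(sum(dist.values()) - 1.0) > tol:
                raise ValueError(f"{name} does not sum to 1")

        check("start", self.start, self.states)
        for s in self.states:
            check(f"trans[{s}]", self.trans[s], self.states)
            check(f"emit[{s}]", self.emit[s], self.symbols)

# Example: two coins, always starting with coin1 (initial probability 1).
coins = HMM(
    states=["coin1", "coin2"],
    symbols=["H", "T"],
    trans={"coin1": {"coin1": 0.7, "coin2": 0.3},
           "coin2": {"coin1": 0.4, "coin2": 0.6}},
    emit={"coin1": {"H": 0.5, "T": 0.5},
          "coin2": {"H": 0.8, "T": 0.2}},
    start={"coin1": 1.0, "coin2": 0.0},
)
coins.validate()  # passes silently when every row is a valid distribution
```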
EXAMPLE OF HMM
[Figure: state diagram of ‘Rain’ and ‘Dry’ with self-loops Rain→Rain 0.3 and Dry→Dry 0.8, and transitions Rain→Dry 0.7 and Dry→Rain 0.2]
• Two states: ‘Rain’ and ‘Dry’.
• Transition probabilities: P(‘Rain’|‘Rain’)=0.3, P(‘Dry’|‘Rain’)=0.7,
P(‘Rain’|‘Dry’)=0.2, P(‘Dry’|‘Dry’)=0.8
• Initial probabilities: say P(‘Rain’)=0.4, P(‘Dry’)=0.6.
CALCULATION OF HMM
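As a worked illustration using the Rain/Dry model (the four-day sequence is chosen here for illustration), the probability of a state sequence is the initial probability of its first state multiplied by one transition probability per step:

```latex
\begin{aligned}
P(\text{Dry},\text{Dry},\text{Rain},\text{Rain})
  &= P(\text{Dry})\,P(\text{Dry}\mid\text{Dry})\,P(\text{Rain}\mid\text{Dry})\,P(\text{Rain}\mid\text{Rain}) \\
  &= 0.6 \times 0.8 \times 0.2 \times 0.3 = 0.0288
\end{aligned}
```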
COMMON HMM TYPES
• Ergodic (fully connected):
– Every state of the model can be reached in a single step from
every other state of the model.

• Bakis (left-right):
– As time increases, the state index increases or stays the same, i.e. states proceed from left to right
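The difference between the two types shows up in the transition matrix: an ergodic model has no zero entries, while a Bakis model has zeros below the diagonal, so the state index never decreases. The helper functions below are hypothetical; the random ergodic weights, the `max_jump` parameter and the equal sharing of probability among allowed moves are assumptions made for illustration.

```python
import random

def ergodic_matrix(n, seed=None):
    """Fully connected: every entry > 0, so any state is reachable in one step."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        weights = [rng.random() + 1e-3 for _ in range(n)]  # strictly positive
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows

def bakis_matrix(n, max_jump=1):
    """Left-right: from state i only states i..i+max_jump are allowed.

    Allowed moves share the probability equally (an assumption).
    """
    rows = []
    for i in range(n):
        allowed = range(i, min(n, i + max_jump + 1))
        p = 1.0 / len(allowed)
        rows.append([p if j in allowed else 0.0 for j in range(n)])
    return rows

def is_left_right(a):
    """True if no transition moves to a lower-numbered state."""
    return all(a[i][j] == 0.0 for i in range(len(a)) for j in range(i))

for row in bakis_matrix(4):
    print(row)
```

In the Bakis example the last state can only loop back to itself, which is typical of left-right models used for signals with a clear beginning and end.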
HMM IN BIOINFORMATICS
• Hidden Markov Models (HMMs) are probabilistic
models for modeling and
representing biological sequences.
• They provide a principled way to find genes, align
sequences, and locate regulatory
elements such as promoters.
PROBLEMS OF HMM
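• Three problems must be solved for HMMs to be
useful in real-world applications:
● 1) Evaluation: how likely is an observed sequence under a given model?
● 2) Decoding: which hidden state sequence best explains the observations?
● 3) Learning: how should the model parameters be adjusted to best fit the observations?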
EVALUATION PROBLEM
Given a set of HMMs, which one is most
likely to have produced the observation sequence?
GACGAAACCCTGTCTCTATTTATCC
[Figure: the sequence is scored against HMM 1, HMM 2, HMM 3, …, HMM n, asking p(HMM-1)?, p(HMM-2)?, p(HMM-3)?, …, p(HMM-n)?]
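The evaluation problem is usually solved with the forward algorithm, which computes P(sequence | model) by summing over all hidden state paths in time proportional to (sequence length) × (number of states)². The sketch below scores the slide's sequence against three small models; HMM-1 to HMM-3 and all their probabilities are invented for illustration, and plain probabilities are used, which underflow on long sequences (practical implementations use scaling or log probabilities).

```python
def forward(obs, states, start, trans, emit):
    """Forward algorithm: P(obs | model), summed over every hidden state path."""
    # alpha[s] = P(symbols seen so far, hidden state at the current step is s)
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit[s][symbol] * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())

# Hypothetical emission profiles (assumed values).
UNIFORM = {b: 0.25 for b in "ACGT"}
AT_RICH = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
GC_RICH = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}

# Each model: (states, start, trans, emit). All three are invented examples.
MODELS = {
    "HMM-1": (["bg"], {"bg": 1.0}, {"bg": {"bg": 1.0}}, {"bg": UNIFORM}),
    "HMM-2": (
        ["at", "gc"],
        {"at": 0.5, "gc": 0.5},
        {"at": {"at": 0.9, "gc": 0.1}, "gc": {"at": 0.1, "gc": 0.9}},
        {"at": AT_RICH, "gc": GC_RICH},
    ),
    "HMM-3": (["gc"], {"gc": 1.0}, {"gc": {"gc": 1.0}}, {"gc": GC_RICH}),
}

seq = "GACGAAACCCTGTCTCTATTTATCC"
scores = {name: forward(seq, *model) for name, model in MODELS.items()}
for name, p in scores.items():
    print(f"{name}: P(seq | model) = {p:.3e}")
print("highest-scoring model:", max(scores, key=scores.get))
```

Picking the model with the largest P(sequence | model) treats all models as equally likely beforehand; unequal prior probabilities would have to be multiplied in.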
DECODING PROBLEM
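The decoding problem asks for the single most likely hidden state path for an observed sequence. The Viterbi algorithm finds it by dynamic programming: at each step it keeps, for every state, only the best path ending there, then traces the winning path back. In this minimal sketch the 'fair'/'biased' coin model and its probabilities are assumptions chosen for illustration.

```python
def viterbi(obs, states, start, trans, emit):
    """Viterbi algorithm: most likely hidden state path and its joint probability."""
    # best[s] = probability of the best path ending in state s at the current step
    best = {s: start[s] * emit[s][obs[0]] for s in states}
    backpointers = []  # one {state: best previous state} dict per later step
    for symbol in obs[1:]:
        new_best, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[r] * trans[r][s])
            new_best[s] = best[prev] * trans[prev][s] * emit[s][symbol]
            pointers[s] = prev
        best = new_best
        backpointers.append(pointers)
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]

# Hypothetical fair/biased coin model (all values assumed).
STATES = ["fair", "biased"]
START = {"fair": 0.5, "biased": 0.5}
TRANS = {"fair": {"fair": 0.9, "biased": 0.1},
         "biased": {"fair": 0.1, "biased": 0.9}}
EMIT = {"fair": {"H": 0.5, "T": 0.5}, "biased": {"H": 0.9, "T": 0.1}}

path, p = viterbi("HHHHHHTTTTTT", STATES, START, TRANS, EMIT)
print(path)
```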
TRAINING PROBLEM

From raw sequence data…
AATAGAGAGGTTCGACTCTGCAT
TTCCCAAATACGTAATGCTTACGG
TACACGACCCAAGCTCTCTGCTT
GAATCCCAAATCTGAGCGGACAG
ATGAGGGGGCGCAGAGGAAAAA
CAGGTTTTGGACCCTACATAAAN
AGAGAGGTTCGTAAATAGAGAGG
TTCGACTCTGCATTTCCCAAATAC
GTAATGCTTACGGTTAAATAGAGA
GGTTCGACTCTGCATTTCCCAAA
TACGTAATGCTTACGGTACACGA
CCCAAGCTCTCTGCTTGTAACTT
GTTTTNGTCGCAGCTGGTCTTGC
CTTTGCTGGGGCTGCTGAC

…to transition probabilities. How?

Transition probabilities (row = current state, column = next state):

        A+    C+    G+    T+    A-    C-    G-    T-
  A+   0.17  0.26  0.42  0.11  0.01  0.01  0.01  0.01
  C+   0.16  0.36  0.26  0.18  0.01  0.01  0.01  0.01
  G+   0.15  0.33  0.37  0.11  0.01  0.01  0.01  0.01
  T+   0.07  0.35  0.37  0.17  0.01  0.01  0.01  0.01
  A-   0.01  0.01  0.01  0.01  0.29  0.20  0.27  0.20
  C-   0.01  0.01  0.01  0.01  0.31  0.29  0.07  0.29
  G-   0.01  0.01  0.01  0.01  0.24  0.23  0.29  0.20
  T-   0.01  0.01  0.01  0.01  0.17  0.23  0.28  0.28
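How the transition probabilities are obtained depends on whether the hidden path is known. If every training sequence comes with its state labels, the maximum-likelihood estimate of a transition from state k to state l is the number of observed k→l transitions divided by the total number of transitions out of k. If the path is unknown, the Baum-Welch (expectation-maximization) algorithm is the usual choice; it is not shown here. The sketch below covers only the labeled case, assuming the '+' and '-' suffixes mark two sequence classes (for instance CpG-island and non-island regions); the `label` helper, the toy training sequences and the pseudocount of 1 (which keeps unseen transitions from getting probability 0) are assumptions.

```python
from collections import defaultdict

# The eight states from the table: a base plus a '+' or '-' class label.
STATES = [base + sign for sign in "+-" for base in "ACGT"]

def label(seq, sign):
    """Hypothetical helper: tag every base in seq with the same class label."""
    return [base + sign for base in seq]

def estimate_transitions(paths, states, pseudocount=1.0):
    """Estimate a[src][dst] from known state paths by counting transitions."""
    counts = {src: defaultdict(float) for src in states}
    for path in paths:
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1
    a = {}
    for src in states:
        # Denominator: transitions out of src plus one pseudocount per target state.
        total = sum(counts[src].values()) + pseudocount * len(states)
        a[src] = {dst: (counts[src][dst] + pseudocount) / total for dst in states}
    return a

# Made-up training data: one '+' sequence and one '-' sequence.
paths = [label("CGCG", "+"), label("ATTA", "-")]
a = estimate_transitions(paths, STATES)
print(round(a["C+"]["G+"], 3))
```

Here C+ is followed by G+ twice and by nothing else, so the estimate is (2 + 1) / (2 + 8) = 0.3; with real data, many labeled sequences would be counted the same way.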



HMM-APPLICATION

• DNA sequence analysis
• Protein family profiling
• Predprediction
• Splicing signals prediction
• Prediction of genes
• Horizontal gene transfer
• Radiation hybrid mapping, linkage analysis
• Prediction of DNA functional sites
• CpG island
HMM-APPLICATION
• Speech Recognition
• Vehicle Trajectory Projection
• Gesture Learning for Human-Robot Interface
• Positron Emission Tomography (PET)
• Optical Signal Detection
• Digital Communications
• Music Analysis
HMM-BASED TOOLS
• GENSCAN (Burge 1997)
• FGENESH (Solovyev 1997)
• HMMgene (Krogh 1997)
• GENIE (Kulp 1996)
• GENMARK (Borodovsky & McIninch 1993)
• VEIL (Henderson, Salzberg, & Fasman 1997)
BIOINFORMATICS RESOURCES
• PROBE www.ncbi.nlm.nih.gov/
• BLOCKS www.blocks.fhcrc.org/
• META-MEME
www.cse.ucsd.edu/users/bgrundy/metameme.1.0.html
• SAM
www.cse.ucsc.edu/research/compbio/sam.html
• HMMER hmmer.wustl.edu/
• HMMpro www.netid.com/
• GENEWISE www.sanger.ac.uk/Software/Wise2/
• PSI-BLAST www.ncbi.nlm.nih.gov/BLAST/newblast.html
• PFAM www.sanger.ac.uk/Pfam/
Thank You!
