# Hidden Markov Chain and Bayes Belief Networks (Doctor Consortium)

Hidden Markov chain and Bayes belief networks, doctor consortium. I made these slides myself; I hope they are helpful for you.

Published in: Education, Technology
### Hidden Markov Chain and Bayes Belief Networks (Doctor Consortium)

1. Graphical Models of Probability
   - Graphical models use directed or undirected graphs over a set of random variables to specify variable dependencies explicitly, allowing less restrictive independence assumptions while limiting the number of parameters that must be estimated.
   - Bayesian Networks: directed acyclic graphs that indicate causal structure.
   - Markov Networks: undirected graphs that capture general dependencies.
   (Middleware, CCNT, ZJU, 10/02/11)
2. Hidden Markov Model (Yueshen Xu, Zhejiang Univ CCNT)
3. Overview
   - Markov Chain
   - HMM
   - Three Core Problems and Algorithms
   - Application
4. Markov Chain Instance
   We can regard the weather as three states: state 1: Rain, state 2: Cloudy, state 3: Sun. With long-term observation we can obtain the transition matrix:

   | Today \ Tomorrow | Rain | Cloudy | Sun |
   |------------------|------|--------|-----|
   | Rain             | 0.4  | 0.3    | 0.3 |
   | Cloudy           | 0.2  | 0.6    | 0.2 |
   | Sun              | 0.1  | 0.1    | 0.8 |
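Under the convention that the chain starts in a known state, the matrix above lets us score any forecast sequence by multiplying one-step transition probabilities. A minimal sketch (the helper name and the starting-state convention are my own):

```python
# Weather Markov chain from the slide: three states and the one-step
# transition matrix P[today][tomorrow].
states = ["Rain", "Cloudy", "Sun"]
P = {
    "Rain":   {"Rain": 0.4, "Cloudy": 0.3, "Sun": 0.3},
    "Cloudy": {"Rain": 0.2, "Cloudy": 0.6, "Sun": 0.2},
    "Sun":    {"Rain": 0.1, "Cloudy": 0.1, "Sun": 0.8},
}

def sequence_probability(seq):
    """Probability of observing seq[1:], given the chain starts in seq[0]."""
    prob = 1.0
    for today, tomorrow in zip(seq, seq[1:]):
        prob *= P[today][tomorrow]
    return prob

# Given that today is Sun: chance of Sun tomorrow, then Rain the day after.
p = sequence_probability(["Sun", "Sun", "Rain"])  # 0.8 * 0.1 = 0.08
```

Note the Markov property at work: each factor depends only on the current state, so the whole computation is a single pass over consecutive pairs.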
5. Definition
   One-step transition probability: P(q_{t+1} = S_j | q_t = S_i, q_{t-1}, ..., q_1) = P(q_{t+1} = S_j | q_t = S_i) = a_ij.
   That is to say, the evolution of the stochastic process relies only on the current state and has nothing to do with earlier states. We call this the Markov property, and such a process is regarded as a Markov process.
   State space: {S_1, ..., S_N}; state sequence: q_1, q_2, ..., q_T.
6. Keystone
   State transition matrix A = [a_ij], where a_ij = P(q_{t+1} = S_j | q_t = S_i), with a_ij >= 0 and sum_j a_ij = 1 for every i.
   Initial state probability vector pi = (pi_1, ..., pi_N), where pi_i = P(q_1 = S_i).
7. HMM
   An HMM is a doubly stochastic process, consisting of two parallel parts:
   - Markov chain (pi, A): describes the transitions of the states, which are unobservable, by means of the transition probability matrix. It produces the state sequence q_1, q_2, ..., q_T.
   - Common stochastic process (B): describes the stochastic process of the observable events. It produces the observation sequence o_1, o_2, ..., o_T.
   This split between the unobservable state sequence and the observable event sequence is the core feature of an HMM.
8. Example
   States S1, S2, S3, with transition probabilities and per-arc output distributions:
   - a11 = 0.3, outputs (a: 0.8, b: 0.2)
   - a12 = 0.5, outputs (a: 1.0, b: 0.0)
   - a13 = 0.2, outputs (a: 0.0, b: 1.0)
   - a22 = 0.4, outputs (a: 0.3, b: 0.7)
   - a23 = 0.6, outputs (a: 0.5, b: 0.5)
   What is the probability that this stochastic process produces the sequence "aab"?
9. Instance 1: S1 -> S1 -> S2 -> S3
   0.3 * 0.8 * 0.5 * 1.0 * 0.6 * 0.5 = 0.036
10. Instance 2: S1 -> S2 -> S2 -> S3
    0.5 * 1.0 * 0.4 * 0.3 * 0.6 * 0.5 = 0.018
11. Instance 3: S1 -> S1 -> S1 -> S3
    0.3 * 0.8 * 0.3 * 0.8 * 0.2 * 1.0 = 0.01152
    Therefore, the total probability is 0.036 + 0.018 + 0.01152 = 0.06552.
    We only know "aab"; we do not know the state path "S? S? S?". That is the point.
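The three-path enumeration can be checked by brute force: sum the probability of every state path that emits the sequence. This is a sketch assuming, as the diagram suggests, that S1 is the start state and S3 the only final state; the dictionaries transcribe the arc probabilities from slide 8, under which the three nonzero paths emit "aab" and reproduce exactly the products computed above.

```python
from itertools import product

# Arc-emission HMM from slide 8: transition probability and output
# distribution attached to each arc (missing arcs have probability 0).
trans = {("S1", "S1"): 0.3, ("S1", "S2"): 0.5, ("S1", "S3"): 0.2,
         ("S2", "S2"): 0.4, ("S2", "S3"): 0.6}
emit = {("S1", "S1"): {"a": 0.8, "b": 0.2},
        ("S1", "S2"): {"a": 1.0, "b": 0.0},
        ("S1", "S3"): {"a": 0.0, "b": 1.0},
        ("S2", "S2"): {"a": 0.3, "b": 0.7},
        ("S2", "S3"): {"a": 0.5, "b": 0.5}}

def total_probability(obs, start="S1", final="S3"):
    """Sum of path probabilities over all state paths emitting obs."""
    total = 0.0
    states = ["S1", "S2", "S3"]
    for middle in product(states, repeat=len(obs) - 1):
        path = (start,) + middle + (final,)
        p = 1.0
        for arc, symbol in zip(zip(path, path[1:]), obs):
            p *= trans.get(arc, 0.0) * emit.get(arc, {}).get(symbol, 0.0)
        total += p
    return total

total_probability("aab")  # 0.036 + 0.018 + 0.01152 = 0.06552
```

Brute force is exponential in the sequence length, which is precisely why the forward algorithm on the next slides matters.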
12. Description
    An HMM can be identified by the parameters below:
    - N: the number of states
    - M: the number of observable events per state
    - A: the state transition matrix
    - B: the observable-event (emission) probability matrix
    - pi: the initial state probabilities
    We generally record it as lambda = (pi, A, B).
13. Three Core Problems
    - Evaluation: given the observation sequence O and the model lambda, how do we calculate P(O | lambda)?
    - Optimization (decoding): building on problem 1, how do we choose a state sequence Q so that the observation sequence O is explained best? (We know O, but we do not know Q.)
    - Training: building on problem 1, how do we adjust the parameters of the model to maximize P(O | lambda)?
14. Solution
    There is no need to expound these algorithms here, since we focus on the application context:
    - Evaluation: dynamic programming (Forward and Backward algorithms)
    - Optimization: dynamic programming (Viterbi algorithm)
    - Training: iterative (Baum-Welch and maximum likelihood estimation)
    You can think these methods over and derive them after the workshop.
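As a taste of the evaluation solution, here is a minimal sketch of the forward algorithm for a state-emission HMM lambda = (pi, A, B). The two-state coin-style parameters below are invented for illustration, not taken from the slides (note this convention differs from the arc-emission example earlier).

```python
def forward(obs, pi, A, B):
    """Return P(O | lambda) by dynamic programming over alpha_t(i)."""
    n = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) a_ij) * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)

pi = [0.6, 0.4]
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [{"H": 0.1, "T": 0.9}, {"H": 0.8, "T": 0.2}]
p = forward("HTH", pi, A, B)
```

The trick is the same as in any dynamic program: alpha_t(i) summarizes all paths ending in state i at time t, so the cost is O(N^2 T) instead of exponential path enumeration.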
15. Application Context
    Just think it over:
    - What is the feature of an HMM? Which kinds of problem can it describe and model?
    - Two stochastic sequences, where one depends on the other (the two are related).
    - One can be "seen", but the other cannot: the "iceberg" problem.
    - Think about the three core problems.
    We can draw a conclusion: use one sequence to deduce and predict the other, that is, find out who is behind.
16. Application Context (1): Voice Recognition
    - Statistical description: the characteristic pattern of the voice, obtained by sampling: T = t_1, t_2, ..., t_n; the word sequence W(n) = W_1, W_2, ..., W_n. What we care about is P(W(n) | T).
    - Formal description: what we have to solve is k = arg max { P(W(n) | T) }.
17. Application Context (1): Voice Recognition
    Recognition framework: speech database -> feature extraction (waveform -> features) -> Baum-Welch re-estimation -> converged? (No: re-estimate again; Yes: end), yielding the trained HMMs lambda_1, lambda_2, ..., lambda_7.
18. Application Context (2): Text Information Extraction
    Figure out the HMM model:
    - Q1: What are the states and what are the observation events? States: what you want to extract; observation events: text blocks, individual words, etc.
    - Q2: How do we figure out the parameters, such as a_ij? Through training samples.
19. Application Context (2): Text Information Extraction
    Extraction framework: training samples -> document partitioning -> HMM -> partitioning state list -> extracted sequence. Example state sets: country, state, city, street; title, author, email, abstract.
20. Application Context (3): Other Fields
    - Face recognition
    - POS tagging
    - Web data extraction
    - Bioinformatics
    - Network intrusion detection
    - Handwriting recognition
    - Document categorization
    - Multiple sequence alignment
    - ...
    Which field are you interested in?
22. Bayes Belief Network (Yueshen Xu, too)
23. Overview
    - Bayes Theorem
    - Naive Bayes Theorem
    - Bayes Belief Network
    - Application
24. Bayes Theorem
    Basic Bayes formula (condition inversion): P(A | B) = P(B | A) P(A) / P(B).
    Here P(A) is the prior probability and P(A | B) is the posterior probability; the denominator expands by the complete (total) probability formula, P(B) = sum_i P(B | A_i) P(A_i).
    Basic of basics, but vital.
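To make the condition inversion concrete, here is the formula applied to a hypothetical diagnostic test; all the numbers below are invented for illustration.

```python
# Prior, likelihood, and false-positive rate (hypothetical values).
p_d = 0.01          # prior P(D): disease prevalence
p_pos_d = 0.99      # likelihood P(+ | D)
p_pos_nd = 0.05     # false-positive rate P(+ | not D)

# Complete probability formula: P(+) = P(+|D)P(D) + P(+|~D)P(~D)
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)

# Bayes formula (condition inversion): P(D | +) = P(+|D)P(D) / P(+)
posterior = p_pos_d * p_d / p_pos
```

Even with a 99% accurate test, the posterior here is only about 1/6: the small prior dominates, which is exactly the intuition the prior/posterior distinction captures.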
25. Naive Bayes Theorem
    The naive Bayes theorem is a simple probabilistic result based on applying Bayes' theorem with strong (conditional) independence assumptions: by the chain rule and conditional independence, P(C | F_1, ..., F_n) is proportional to P(C) * prod_i P(F_i | C).
    Naive Bayes is a simple Bayes net: a class node C whose children are the features F_1, F_2, ..., F_n.
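The factorization P(C | F_1..F_n) proportional to P(C) * prod_i P(F_i | C) is only a few lines of code. The classes and word likelihoods below are made up for illustration:

```python
def naive_bayes_posteriors(features, prior, likelihood):
    """Unnormalised class scores P(C) * prod_i P(F_i | C), normalised at the end."""
    scores = {c: prior[c] for c in prior}
    for c in prior:
        for f in features:
            scores[c] *= likelihood[c][f]
    z = sum(scores.values())           # normaliser; plays the role of P(F_1..F_n)
    return {c: s / z for c, s in scores.items()}

prior = {"spam": 0.5, "ham": 0.5}
likelihood = {"spam": {"offer": 0.8, "meeting": 0.1},
              "ham":  {"offer": 0.2, "meeting": 0.6}}
post = naive_bayes_posteriors(["offer"], prior, likelihood)
```

Normalising at the end avoids ever computing the evidence term explicitly, which is why the proportionality form is the one used in practice.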
26. Bayes Belief Network: Graph Structure
    - Directed acyclic graph (DAG)
    - Nodes are random variables
    - Edges indicate causal influences
    Example network: Burglary and Earthquake are parents of Alarm; JohnCalls and MaryCalls are descendants of Alarm.
27. Bayes Belief Network: Conditional Probability Table
    - Each node has a conditional probability table (CPT) that gives the probability of each of its values given every possible combination of values for its parents.
    - Roots (sources) of the DAG, which have no parents, are given prior probabilities: P(B) = .001, P(E) = .002.

    | B | E | P(A \| B, E) |
    |---|---|--------------|
    | T | T | .95  |
    | T | F | .94  |
    | F | T | .29  |
    | F | F | .001 |

    | A | P(J \| A) | P(M \| A) |
    |---|-----------|-----------|
    | T | .90 | .70 |
    | F | .05 | .01 |
28. Bayes Belief Network: Joint Distributions
    - A Bayesian network implicitly defines a joint distribution via conditional independence: P(x_1, ..., x_n) = prod_i P(x_i | parents(X_i)).
    - Example: P(J, M, A, ~B, ~E) = P(J | A) P(M | A) P(A | ~B, ~E) P(~B) P(~E).
    - Therefore an inefficient approach to inference is: 1) compute the joint distribution using this equation; 2) compute any desired conditional probability using the joint distribution.
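The burglary network's joint distribution is short enough to write out in full, using the CPT numbers from the previous slide; the helper names are my own.

```python
# CPTs of the burglary network: P(X=True) for each parent configuration.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def joint(j, m, a, b, e):
    """P(J=j, M=m, A=a, B=b, E=e) = P(B)P(E)P(A|B,E)P(J|A)P(M|A)."""
    def p(event, prob):  # P(X=x) from the stored P(X=True)
        return prob if event else 1 - prob
    return (p(b, P_B) * p(e, P_E) * p(a, P_A[(b, e)])
            * p(j, P_J[a]) * p(m, P_M[a]))

# Both neighbours call, the alarm rang, but no burglary and no earthquake:
pr = joint(j=True, m=True, a=True, b=False, e=False)
# = 0.90 * 0.70 * 0.001 * 0.999 * 0.998
```

Summing `joint` over all 32 assignments gives exactly 1, which is a handy sanity check that the factorization really defines a distribution.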
29. Conditional Independence & D-separation
    - D-separation: let X, Y and Z be three sets of nodes. If X and Y are d-separated by Z, then X and Y are conditionally independent given Z.
    - A is d-separated from B given C if every undirected path between them is blocked.
    - Path blocking: three cases that expand on the three basic independence structures (chain, fork, and collider).
30. Application: Simple Document Classification (1)
    - Step 1: assume for the moment that there are only two mutually exclusive classes, S and ~S (e.g., spam and not spam), such that every element (email) is in one or the other: P(S | D) + P(~S | D) = 1.
    - Step 2: what we care about is the posterior P(S | D) of a document D, obtained via Bayes' theorem from the prior P(S) and the likelihood P(D | S).
31. Application: Simple Document Classification (2)
    - Step 3: dividing one posterior by the other gives the ratio P(S | D) / P(~S | D), which can be re-factored into the prior ratio times a product of per-word likelihood ratios.
    - Step 4: take the logarithm of all these ratios to reduce the amount of calculation: classify the document as spam iff ln(P(S | D) / P(~S | D)) > 0, with the per-word ratios estimated from known training samples.
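Steps 3 and 4 can be sketched as a log-odds decision rule; the word-ratio estimates below are hypothetical stand-ins for values learned from training samples.

```python
import math

def log_ratio(words, p_s, p_w_s, p_w_ns):
    """ln(P(S|D) / P(~S|D)) under naive Bayes: prior log-odds plus
    per-word log likelihood ratios."""
    r = math.log(p_s / (1 - p_s))
    for w in words:
        r += math.log(p_w_s[w] / p_w_ns[w])
    return r

# Hypothetical per-word likelihoods P(w|S) and P(w|~S) from training.
p_w_s  = {"free": 0.30, "hello": 0.05}
p_w_ns = {"free": 0.01, "hello": 0.20}

# Classify as spam iff the log posterior ratio is positive.
is_spam = log_ratio(["free", "free", "hello"], 0.5, p_w_s, p_w_ns) > 0
```

Working in log space turns the long product into a sum, which both cuts the amount of calculation and avoids floating-point underflow on long documents.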
32. Application: Overall
    - Medical diagnosis: the Pathfinder system outperforms leading experts in the diagnosis of lymph-node disease.
    - Microsoft applications: problem diagnosis (printer problems); recognizing user intents for HCI.
    - Text categorization and spam filtering.
    - Student modeling for intelligent tutoring systems.
    - Biochemical data analysis: predicting mutagenicity.
    - And many more. Which field are you interested in?