Gene Prediction Using Hidden Markov Model and Recurrent Neural Network

  1. 1. Gene Prediction Using Hidden Markov Model & Recurrent Neural Network Ahmed Hani AlGhidani MSc Student in Computer Science at Cairo University Research and SDE at RDI Egypt ahmed.hani@rdi-eg.com
  2. 2. Agenda • DNA Structure - Eukaryotic and Prokaryotic Cells • Gene Prediction Methods - Empirical Methods - Ab initio Methods • Hidden Markov Model - Existing HMM-based systems • Recurrent Neural Network • Other Methods
  3. 3. DNA Structure
  4. 4. DNA Structure (Cont.) • Prokaryotic Cells • Most of the DNA is coding • No introns • Promoters
  5. 5. DNA Structure (Cont.) • Eukaryotic Cells • Exons (coding) • Introns (non-coding) • Acceptors (3’ end of an intron, reading 5’ to 3’) • Donors (5’ start of an intron, reading 5’ to 3’)
  6. 6. DNA Structure (Cont.) • Eukaryotic Cells (cont.)
  7. 7. Gene Prediction • Identify the exon regions that are translated into amino acids (protein)
  8. 8. Gene Prediction (Cont.) • Empirical methods are used mainly for prokaryotic cells • Most of the prokaryotic genome is coding, with no introns • Feature engineering method • Open Reading Frames (ORFs)
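
The ORF idea on this slide can be sketched as a scan for a start codon followed, in the same reading frame, by a stop codon. This is a minimal illustration and not the presenter's code: it assumes the standard start codon ATG and stop codons TAA, TAG and TGA, looks only at the forward strand, and uses a hypothetical `min_codons` threshold.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=2):
    """Return (start, end, frame) for each forward-strand ORF.

    `end` is exclusive and includes the stop codon. `min_codons` is an
    illustrative threshold on the number of codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # first start codon opens an ORF
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the ORF and keep scanning
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATTTTAGGG"))
```

A fuller prokaryotic gene finder would also scan the reverse complement and score candidates, which this sketch leaves out.
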
  9. 9. Gene Prediction (Cont.)
  10. 10. Gene Prediction (Cont.) • Pros - Simple and easy to implement - Works well with prokaryotic DNA because of its simplicity • Cons - Poor performance on large sequences - Performs poorly on complex DNA such as eukaryotic DNA
  11. 11. Gene Prediction (Cont.) • Ab initio methods for eukaryotic cells • Depend on statistical methods and computational models • Feature engineering may be involved in the computations • Hidden Markov Model and Recurrent Neural Networks
  12. 12. Hidden Markov Model • The basic idea is Markov Chains • Finite set of states • Transition Matrix
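
To make the two ingredients on this slide concrete, here is a small sketch of a Markov chain with a finite state set and a transition matrix. The state names and all probabilities are hypothetical values chosen for illustration; they are not taken from the slides.

```python
# Finite set of states (illustrative names).
STATES = ["exon", "intron"]

# Initial distribution and transition matrix; all numbers are made up.
INITIAL = {"exon": 0.5, "intron": 0.5}
TRANSITIONS = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.2, "intron": 0.8},
}


def path_probability(path):
    """Probability of a state sequence under the Markov assumption:
    each state depends only on the state immediately before it."""
    prob = INITIAL[path[0]]
    for prev, cur in zip(path, path[1:]):
        prob *= TRANSITIONS[prev][cur]
    return prob


if __name__ == "__main__":
    # 0.5 * 0.9 * 0.1
    print(path_probability(["exon", "exon", "intron"]))
```

Each row of the transition matrix sums to 1, since from any state the chain must move to some state (possibly itself).
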
  13. 13. Hidden Markov Model (Cont.)
  14. 14. Hidden Markov Model (Cont.) • In practice, the patterns or classes we want to predict often cannot be observed directly • We need observable indicators (visible states) to infer the hidden patterns
  15. 15. Hidden Markov Model (Cont.)
  16. 16. Hidden Markov Model (Cont.) • Observation Probability Estimation (problem 1) - Estimate the probability of an observation sequence given the model • Optimal Hidden State Sequence (problem 2) - Determine the most likely sequence of hidden states for the observations • HMM Parameter Estimation (problem 3) - Find the model parameters that maximize the probability of the training observations
  17. 17. Hidden Markov Model (Cont.) • In Gene Prediction, the observations are the A, C, G, T sequences, and the hidden states are Exons, Introns and Other • Use the training data to set the model parameters (problem 3) using the Baum-Welch algorithm • For the given observations, we predict the states (problem 2) using the Viterbi algorithm
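
The first of the three problems above, the probability of an observation sequence given the model, is commonly solved with the forward algorithm. The sketch below assumes a toy two-state model; the state names and every probability are hypothetical and only serve to make the code runnable.

```python
# Toy model; all names and numbers are assumptions for illustration.
STATES = ["coding", "noncoding"]
START_P = {"coding": 0.5, "noncoding": 0.5}
TRANS_P = {
    "coding": {"coding": 0.9, "noncoding": 0.1},
    "noncoding": {"coding": 0.1, "noncoding": 0.9},
}
EMIT_P = {
    "coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs | model), summing over all hidden state paths."""
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


if __name__ == "__main__":
    print(forward("ACGT", STATES, START_P, TRANS_P, EMIT_P))
```

For long sequences the products underflow, so practical implementations rescale `alpha` at each step or work with log-probabilities.
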
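
The decoding step described on this slide (problem 2) can be sketched with a log-space Viterbi implementation. The hidden state names follow the slide; the start, transition and emission probabilities are hypothetical placeholders, where a real system would estimate them from training data, for example with Baum-Welch.

```python
import math

STATES = ["Exon", "Intron", "Other"]
# All probabilities below are made-up placeholders, not trained values.
START_P = {"Exon": 0.2, "Intron": 0.2, "Other": 0.6}
TRANS_P = {
    "Exon": {"Exon": 0.8, "Intron": 0.15, "Other": 0.05},
    "Intron": {"Exon": 0.15, "Intron": 0.8, "Other": 0.05},
    "Other": {"Exon": 0.1, "Intron": 0.0, "Other": 0.9},
}
EMIT_P = {
    "Exon": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "Intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "Other": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}


def _log(p):
    # log(0) is treated as minus infinity so impossible moves never win.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state path for `obs`."""
    # scores[s] = best log-probability of any path ending in state s
    scores = {s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}
    backpointers = []
    for symbol in obs[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: scores[r] + _log(trans_p[r][s]))
            pointers[s] = best_prev
            new_scores[s] = (scores[best_prev] + _log(trans_p[best_prev][s])
                             + _log(emit_p[s][symbol]))
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    print(viterbi("ACGCGTTA", STATES, START_P, TRANS_P, EMIT_P))
```

Working in log space turns products into sums, which avoids numerical underflow on long DNA sequences.
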
  18. 18. Hidden Markov Model (Cont.)
  19. 19. Hidden Markov Model (Cont.)
  20. 20. Neural Network (Cont.) • Largely unexplored area in bioinformatics • No need for feature engineering • Often outperforms classical machine learning • Based on a biological philosophy!
  21. 21. Neural Network (Cont.)
  22. 22. Recurrent Neural Networks
  23. 23. Recurrent Neural Networks (Cont.)
  24. 24. Recurrent Neural Networks (Cont.) • Acceptor/Donor experiments
  25. 25. Recurrent Neural Networks (Cont.) • Exon/intron experiments are still in progress • Dataset size is 800K sequences • Sequences are not fixed-size • LSTM instead of a vanilla RNN • TensorFlow
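
Because the sequences are not fixed-size, a batch usually has to be padded to a common length before it is fed to an LSTM. The sketch below shows one common approach, one-hot encoding plus zero padding with a mask, in plain Python. It is an assumption about preprocessing, not the presenter's pipeline, and the function names are hypothetical.

```python
ALPHABET = "ACGT"


def one_hot(base):
    """Encode a single nucleotide as a 4-element one-hot vector."""
    return [1.0 if base == b else 0.0 for b in ALPHABET]


def pad_batch(seqs):
    """One-hot encode sequences and zero-pad them to the longest one.

    Returns (batch, mask): batch[i][t] is the vector for position t of
    sequence i, and mask[i][t] is 1 for real positions, 0 for padding.
    """
    max_len = max(len(s) for s in seqs)
    batch, mask = [], []
    for s in seqs:
        pad = max_len - len(s)
        encoded = [one_hot(b) for b in s.upper()]
        encoded += [[0.0] * len(ALPHABET) for _ in range(pad)]
        batch.append(encoded)
        mask.append([1] * len(s) + [0] * pad)
    return batch, mask


if __name__ == "__main__":
    batch, mask = pad_batch(["ACG", "T"])
    print(mask)
```

The mask lets the model ignore padded positions, so short sequences are not scored on artificial zeros.
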
  26. 26. Other Methods • Naive Bayesian + Statistical Features • Hidden Markov Model Support Vector Machine (HMM-SVM) • Open Reading Frames + Hidden Markov Model • Open Reading Frames + Statistical Features + Hidden Markov Model
  27. 27. References • http://bpg.utoledo.edu/~afedorov/lab/eid.html • http://www.ece.drexel.edu/gailr/ECE-S690-503/markov_models.ppt.pdf • http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-7-62 • https://github.com/AhmedHani/Hidden-Markov-Model • https://ahmedhanibrahim.wordpress.com/2015/10/25/hidden-markov-models-hmms-part-i/ • http://www.cbcb.umd.edu/software/GlimmerHMM/man.shtml?tid%5B%5D=44&=Apply • http://www.math.uwaterloo.ca/~aghodsib/courses/w05stat440/w05stat440-notes/feb27.pdf • https://en.wikipedia.org/wiki/GLIMMER • https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-096-algorithms-for-computational-biology-spring-2005/lecture-notes/lecture7.pdf • https://www.cs.us.es/~fran/students/julian/gene_finding/gene_finding.html • http://www.nature.com/nbt/journal/v25/n8/full/nbt0807-883.html • http://gobics.de/mario/papers/diss.pdf • https://www.ncbi.nlm.nih.gov/books/NBK21132/ • https://archive.ics.uci.edu/ml/datasets/Molecular+Biology+(Splice-junc-