MOM 2010
Bioinformatics: Reality Check & Challenges
Dr. Souham Meshoul, Information Technology Dept., CCIS, KSU (smeshoul@ksu.edu.sa)
Dr. Nikhat Siddiqi, Biochemistry Dept., CS, KSU (nikkat@ksu.edu.sa)
MOM 2010, May 31st
Outline
  Introduction
  Central Dogma of Molecular Biology
  Biological Data representation
  How Computers can be useful in Biology
  Intelligent bioinformatics
  Challenges
  Conclusion
Introduction
The human body is made up of an estimated 10^13 cells. Nearly every cell contains 23 pairs of chromosomes, which together carry roughly 20,000 to 25,000 protein-coding genes. The genome as a whole comprises some 3 billion DNA base pairs.
Biological data explosion
http://bip.weizmann.ac.il/education/course/introbioinfo/04/lect1/introbioinfo04/index.htm
Introduction
Introduction
What is Bioinformatics?
Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline.
Bioinformatics is the electronic infrastructure of biology: the science that encompasses the methods used to collect, store, retrieve, analyze, and correlate the mountain of complex biological information.
Source: http://ccb.wustl.edu/
Introduction
Bioinformatics vs. Computational Biology
Source: http://ccb.wustl.edu/
Introduction
The problem:
 We have a basic understanding of how gene sequences code for specific proteins.
 We lack the information needed to fully understand the role of DNA in specific diseases, or the functions of the thousands of proteins that are produced.
The goals: provide scientists with a means to explain:
 Biological processes.
 Malfunctions in these processes that lead to diseases.
 How drugs are discovered and how they act (their mode of action).
Central Dogma of Molecular Biology
Central Dogma of Molecular Biology
Central Dogma of Molecular Biology
DNA is responsible for all the hereditary information in an organism.
Central Dogma of Molecular Biology
Deoxyribonucleic Acid (DNA)
In eukaryotic cells, DNA is found mainly inside a special compartment of the cell called the nucleus. The cell is very small, and organisms have many DNA molecules per cell, so each DNA molecule must be tightly packaged. This packaged form of DNA is called a chromosome.
Deoxyribonucleic Acid (DNA): Composition
What is DNA made of?
DNA is made of chemical building blocks called nucleotides. Each nucleotide carries one of four nitrogenous bases: adenine (A), thymine (T), guanine (G) or cytosine (C). The order, or sequence, of these bases determines which biological instructions a strand of DNA contains. As a purely illustrative example, the sequence ATCGTT might instruct for blue eyes, while ATCGCT might instruct for brown.
Deoxyribonucleic Acid (DNA): Genes
Genes are the unique coding sections of DNA. A gene is ultimately transcribed into messenger RNA (mRNA), and that mRNA is in turn translated into protein.
Deoxyribonucleic Acid (DNA): Function
What does DNA do?
DNA contains the instructions an organism needs to develop, survive and reproduce. To carry out these functions, DNA sequences must be converted into messages that can be used to produce proteins. Proteins are the complex molecules that do most of the work in our bodies.
To form a strand of DNA, nucleotides are linked into chains in which phosphate and sugar groups alternate.
Deoxyribonucleic Acid (DNA): Replication
Chromosomes are located in the nucleus of a cell. Before a cell divides, its DNA must be duplicated in a process called replication. Replication ensures that each daughter cell receives a full complement of chromosomes.
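A pairs with T and C pairs with G, so each strand of the double helix fully determines its partner. That is why either strand can serve as the template for a complete copy. The minimal Python sketch below shows only this base-pairing rule. The function name `reverse_complement` is our own, and real replication involves enzymes and far more than rewriting strings.

```python
# Watson-Crick base pairing: A<->T, C<->G
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the partner strand of a 5'->3' DNA string, also written 5'->3'.

    The two strands are antiparallel, so the complement is reversed
    to keep the conventional 5'->3' reading direction.
    """
    return strand.upper().translate(COMPLEMENT)[::-1]


parent = "GTAAAGTCCCGTTAGC"
partner = reverse_complement(parent)
print(partner)  # GCTAACGGGACTTTAC
# Copying the copy gives back the original strand
assert reverse_complement(partner) == parent
```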
Deoxyribonucleic Acid (DNA): Transcription
The information in the DNA of chromosomes is copied into another nucleic acid, ribonucleic acid (RNA), in a process called transcription.
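Viewed as strings, an mRNA has the same sequence as the DNA coding (sense) strand, with uracil (U) in place of thymine (T). Equivalently, it is the reverse complement of the template strand that is actually read during transcription. The sketch below assumes error-free input and ignores promoters, splicing and other regulation. Both function names are illustrative.

```python
# Template DNA base -> paired RNA base (U replaces T in RNA)
TEMPLATE_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe_coding(coding_strand: str) -> str:
    """mRNA = coding (sense) strand with every T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def transcribe_template(template_strand: str) -> str:
    """mRNA = reverse complement of the template (antisense) strand, in RNA letters."""
    return template_strand.upper().translate(TEMPLATE_TO_RNA)[::-1]


print(transcribe_coding("ATGTTTGGCTAA"))    # AUGUUUGGCUAA
print(transcribe_template("TTAGCCAAACAT"))  # AUGUUUGGCUAA
```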
Deoxyribonucleic Acid (DNA): Translation
The information from the DNA, now in the form of a linear RNA sequence, is decoded in a process called translation to form a protein, another biological polymer.
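Translation reads the mRNA three bases (one codon) at a time and maps each codon to an amino acid. The sketch below builds the standard genetic code from a compact 64-letter string, with codons enumerated in U, C, A, G order. It assumes that the reading frame starts at the first base and stops at the first stop codon; it does not search for a start codon.

```python
from itertools import product

# Standard genetic code; codons enumerated in UCAG order, third base varying fastest.
# '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon from its first base until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):  # a trailing incomplete codon is ignored
        aa = CODON_TABLE[mrna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGUUUGGCUAA"))  # MFG
```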
Biological Data Representation
Biological Data Representation
Biological Data Representation
Strings: to represent DNA, RNA and amino-acid sequences
 DNA: {A,C,G,T}
 RNA: {A,C,G,U}
 Protein: {A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y}
 e.g. 5’ GTAAAGTCCCGTTAGC 3’
Image source: www.ebi.ac.uk/microarray/biology_intro.htm
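With the string representation, checking that a sequence is well formed reduces to testing membership in the alphabets listed above. The sketch below also computes GC content as one example of a simple statistic over such strings. The names `ALPHABETS`, `is_valid` and `gc_content` are illustrative.

```python
from collections import Counter

ALPHABETS = {
    "DNA": frozenset("ACGT"),
    "RNA": frozenset("ACGU"),
    "protein": frozenset("ACDEFGHIKLMNPQRSTVWY"),
}


def is_valid(seq: str, kind: str) -> bool:
    """True if seq is non-empty and uses only letters of the given alphabet."""
    return bool(seq) and set(seq.upper()) <= ALPHABETS[kind]


def gc_content(dna: str) -> float:
    """Fraction of G and C bases in a DNA string."""
    counts = Counter(dna.upper())
    return (counts["G"] + counts["C"]) / len(dna)


seq = "GTAAAGTCCCGTTAGC"
print(is_valid(seq, "DNA"), is_valid(seq, "RNA"))  # True False
print(gc_content(seq))  # 0.5
```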
Biological Data Representation
Trees: to represent the evolution of various organisms
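An evolutionary tree can be stored as a recursive structure: a leaf is a taxon name, and an internal node is a tuple of subtrees. The sketch below serializes such a tree to the widely used Newick text format. It records topology only, with no branch lengths. The three-taxon tree is illustrative.

```python
from typing import List, Tuple, Union

# A tree is either a leaf (taxon name) or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def to_newick(tree: Tree) -> str:
    """Serialize a tree to Newick format (topology only, no branch lengths)."""
    if isinstance(tree, str):
        return tree
    return "(" + ",".join(to_newick(child) for child in tree) + ")"


def leaves(tree: Tree) -> List[str]:
    """List the taxa at the tips of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


tree = (("human", "chimpanzee"), "mouse")
print(to_newick(tree) + ";")  # ((human,chimpanzee),mouse);
print(leaves(tree))           # ['human', 'chimpanzee', 'mouse']
```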
Biological Data Representation
Sets of 3D points: to represent the protein structure
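When a structure is a list of 3D atom coordinates, two conformations can be compared with the root-mean-square deviation (RMSD) of corresponding atoms. The sketch below makes two assumptions: atoms are matched by position in the lists, and the structures are already superimposed, so no optimal rotation (e.g. Kabsch) is computed. The coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z), e.g. in angstroms


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root-mean-square deviation between two equally long lists of matched atoms.

    Assumes atom i in coords_a corresponds to atom i in coords_b and that the
    two structures are already superimposed (no rotation is fitted here).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))


model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]  # hypothetical atoms
shifted = [(x + 1.0, y, z) for x, y, z in model]
print(rmsd(model, model))    # 0.0
print(rmsd(model, shifted))  # 1.0
```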
Biological Data Representation
Graphs: to represent metabolic and signaling pathways.
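In a pathway graph, metabolites (or proteins) are nodes and reactions (or interactions) are directed edges. Questions such as "how many steps separate A from B?" then become graph searches. The sketch below uses breadth-first search over a simplified fragment of early glycolysis. The graph omits enzymes, reversibility and branching into other pathways.

```python
from collections import deque
from typing import Dict, List, Optional

# Directed graph: metabolite -> metabolites it can be converted into (simplified).
PATHWAY: Dict[str, List[str]] = {
    "glucose": ["glucose-6-phosphate"],
    "glucose-6-phosphate": ["fructose-6-phosphate"],
    "fructose-6-phosphate": ["fructose-1,6-bisphosphate"],
    "fructose-1,6-bisphosphate": [],
}


def shortest_route(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search: fewest reaction steps from start to goal, or None."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


print(shortest_route(PATHWAY, "glucose", "fructose-1,6-bisphosphate"))
```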
How can computers be useful for biology?
How can computers be useful to biology?
 Computing technology for storing DNA sequences and for assembling them from fragments.
 Data storage and access.
 Study of the organization and evolution of genomes through comparative genome analysis.
 Visualisation tools and techniques, and sequence analysis.
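Assembling a sequence from fragments means finding overlaps between reads and merging them. The sketch below uses the classic greedy heuristic: repeatedly merge the pair of fragments with the longest suffix-prefix overlap. It assumes error-free reads taken from one strand. Production assemblers handle sequencing errors, repeats and both strands, typically with overlap or de Bruijn graphs. The reads are illustrative pieces of the example sequence shown earlier.

```python
from typing import List


def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: List[str]) -> str:
    """Greedy assembly: repeatedly merge the pair with the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags[0]


reads = ["GTAAAGTC", "AGTCCCGT", "CCGTTAGC"]
print(greedy_assemble(reads))  # GTAAAGTCCCGTTAGC
```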
How can computers be useful to biology?
 Structuring and organizing large databases using a common ontology (e.g. the Gene Ontology Consortium).
 Accessing data from different databases using the same query language.
 Many areas of biology use images to communicate their results. This calls for tools and techniques for searching, describing, manipulating and analyzing features within these images.
How can computers be useful to biology?
 Database maintenance: checking the consistency of databases so that their content is valid and error-free.
 Storing protein sequences together with their structure and function.
 Tools and techniques for manipulating protein sequences and for predicting protein secondary and tertiary structure…
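A consistency check can be expressed as rules over each record. The sketch below assumes a hypothetical record layout: accession, sequence, a per-residue secondary-structure string in the common three-state H/E/C convention, and a free-text function. It then reports duplicate accessions, invalid residues, and structure annotations that do not line up with the sequence. The record format, field names and accessions are all illustrative, not any real database's schema.

```python
from dataclasses import dataclass
from typing import List

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")
SS_STATES = set("HEC")  # helix, strand, coil (assumed 3-state convention)


@dataclass
class ProteinRecord:  # hypothetical record layout
    accession: str
    sequence: str
    secondary_structure: str  # one H/E/C state per residue
    function: str


def check_records(records: List[ProteinRecord]) -> List[str]:
    """Return a list of consistency problems (empty list = no problems found)."""
    problems = []
    seen = set()
    for rec in records:
        if rec.accession in seen:
            problems.append(f"{rec.accession}: duplicate accession")
        seen.add(rec.accession)
        if not rec.sequence or not set(rec.sequence) <= AMINO_ACIDS:
            problems.append(f"{rec.accession}: invalid or empty sequence")
        if len(rec.secondary_structure) != len(rec.sequence):
            problems.append(f"{rec.accession}: structure length does not match sequence")
        elif not set(rec.secondary_structure) <= SS_STATES:
            problems.append(f"{rec.accession}: unknown secondary-structure state")
    return problems


db = [
    ProteinRecord("P001", "MKTAYIA", "CHHHHCC", "hypothetical"),
    ProteinRecord("P002", "MKTXYIA", "CHHHHC", "hypothetical"),
]
for problem in check_records(db):
    print(problem)
```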
How can computers be useful for biology?
Several databases: genome databases, protein sequence databases, metabolic databases, microarray databases.
 EMBL-EBI (Europe), GenBank-NCBI (USA), DDBJ (Japan)
The Basic Local Alignment Search Tool (BLAST) program for fast database searches.
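Part of what makes BLAST fast is that it does not align the query against every database sequence from scratch. It first finds short word matches ("seeds") through an index and then extends only those. The sketch below shows just the exact-word indexing and lookup step. Real BLAST also uses scored neighbourhood words, extension with a scoring matrix, and statistical significance estimates, none of which are shown here. The database contents and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_index(database: Dict[str, str], k: int) -> Dict[str, List[Tuple[str, int]]]:
    """Map every length-k word in the database to the places where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def find_seeds(query: str, index: Dict[str, List[Tuple[str, int]]], k: int) -> List[Tuple[int, str, int]]:
    """Exact word hits as (query_pos, seq_id, db_pos)."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for seq_id, dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, seq_id, dpos))
    return hits


database = {"seq1": "GTAAAGTCCCGTTAGC"}  # hypothetical database entry
index = build_index(database, k=4)
print(find_seeds("TCCCG", index, k=4))  # [(0, 'seq1', 6), (1, 'seq1', 7)]
```

Both hits lie on the same diagonal (db_pos minus query_pos is 6 for each), which is the kind of evidence a seed-and-extend method would use to pick a region worth extending.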
How can computers be useful for biology?
Several algorithms:
 Sequence comparison algorithms: Needleman-Wunsch (global alignment, 1970), Smith-Waterman (local sequence alignment, 1981).
 BLAST, FASTA, CLUSTALW, MEME, etc.
http://en.wikipedia.org/wiki/Category:Bioinformatics_algorithms
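Needleman-Wunsch and Smith-Waterman fill the same dynamic-programming matrix. Each cell holds the best score of aligning two prefixes, obtained from a match/mismatch step or a gap in either sequence. The local variant additionally floors every cell at zero, so an alignment can start anywhere, and it reports the best cell rather than the bottom-right one. The sketch below assumes a linear gap penalty and illustrative scores (+1 match, -1 mismatch, -2 gap). It returns the score only, without the traceback that recovers the alignment itself.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2, local: bool = False) -> int:
    """Optimal alignment score with a linear gap penalty.

    local=False: Needleman-Wunsch (global alignment).
    local=True:  Smith-Waterman (local alignment).
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalized
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may start anywhere
                best = max(best, score)
            H[i][j] = score
    return best if local else H[rows - 1][cols - 1]


print(alignment_score("ACGT", "AGT"))                       # 1
print(alignment_score("TTACGTAA", "GGACGTCC", local=True))  # 4
```

In the global example, the best alignment deletes C (three matches, one gap: 3 - 2 = 1). In the local example, only the shared core ACGT is reported.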
Intelligent Bioinformatics
  Data generation in biology/bioinformatics is outpacing methods of data analysis.
  Data interpretation and hypothesis generation require intelligence.
  AI offers established methods for knowledge representation and “intelligent” data interpretation.
  The use of AI in bioinformatics is predicted to increase.
Intelligent Bioinformatics
 Search problems: sequence alignment.
 Learning problems: gene regulatory networks.
 Clustering problems: gene expression data processing.
 Prediction problems: secondary and tertiary structure.
 Data mining: inferring knowledge from large biological databases.
Intelligent Bioinformatics
AI and computational intelligence techniques and models:
 Basic search techniques, A*, Branch and Bound
 Genetic Algorithms, Simulated Annealing, Particle Swarm Optimization
 Neural Networks, Support Vector Machines, K-Nearest Neighbors, Hidden Markov Models, …
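K-nearest neighbors is among the simplest of the listed techniques. A new gene expression profile receives the majority label of the k most similar labelled profiles. The sketch below uses Euclidean distance and k = 3. The three-condition profiles and the "induced"/"repressed" labels are hypothetical, chosen only to show the mechanics.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def knn_predict(training: List[Tuple[Sequence[float], str]],
                query: Sequence[float], k: int = 3) -> str:
    """Label a profile by majority vote among its k nearest training profiles."""
    neighbours = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical log-ratio expression profiles over three conditions
training = [
    ((2.1, 1.9, 2.2), "induced"),
    ((1.8, 2.0, 2.4), "induced"),
    ((2.0, 2.3, 1.7), "induced"),
    ((-2.0, -1.8, -2.1), "repressed"),
    ((-1.9, -2.2, -2.0), "repressed"),
    ((-2.3, -2.0, -1.7), "repressed"),
]
print(knn_predict(training, (1.7, 2.1, 1.9)))  # induced
```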
Intelligent Bioinformatics: An example: Sequence Alignment
S1 = AGGTC, S2 = GTTCG, S3 = TGAAC
Possible alignment:
AGGT-C-
-G-TTCG
TG-AAC-
For lengthy sequences the problem is very hard to solve: it is an optimization problem.
Search strategy
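Scoring function
Intelligent Bioinformatics: An example: Sequence Alignment
Let $S = \{s_1, \dots, s_k\}$, where each $s_i$ is a string defined over an alphabet $A$. Aligning the sequences in $S$ means obtaining $S' = \{s_1', \dots, s_k'\}$ such that:
1. Each sequence $s_i'$ is an extension of $s_i$ defined over the alphabet $A' = A \cup \{-\}$, i.e. deleting the gap symbols "-" from $s_i'$ gives back $s_i$.
2. For all $i, j$: $\mathrm{length}(s_i') = \mathrm{length}(s_j')$.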
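The definition gives a direct validity test for a candidate alignment, and a scoring function turns alignment into the optimization problem that a search strategy explores. The sketch below checks both conditions and scores the alignment with a sum-of-pairs function. Sum-of-pairs is one common choice, assumed here rather than taken from the slides, and the +1/-1/-2 values (gap against gap = 0) are illustrative. The example uses one alignment of S1 = AGGTC, S2 = GTTCG, S3 = TGAAC, read column by column from the example slide.

```python
from itertools import combinations
from typing import List

GAP = "-"


def is_alignment(originals: List[str], aligned: List[str]) -> bool:
    """Condition 1: each row extends its sequence; condition 2: all rows have equal length."""
    if len(originals) != len(aligned) or not aligned:
        return False
    extends = all(row.replace(GAP, "") == seq for seq, row in zip(originals, aligned))
    same_length = len({len(row) for row in aligned}) == 1
    return extends and same_length


def pair_score(x: str, y: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Score one pair of symbols in a column (gap against gap scores 0)."""
    if x == GAP and y == GAP:
        return 0
    if x == GAP or y == GAP:
        return gap
    return match if x == y else mismatch


def sum_of_pairs(aligned: List[str]) -> int:
    """Sum-of-pairs score: add pair_score over every pair of rows in every column."""
    return sum(
        pair_score(column[i], column[j])
        for column in zip(*aligned)
        for i, j in combinations(range(len(column)), 2)
    )


S = ["AGGTC", "GTTCG", "TGAAC"]
S_prime = ["AGGT-C-", "-G-TTCG", "TG-AAC-"]
print(is_alignment(S, S_prime))  # True
print(sum_of_pairs(S_prime))     # -13
```

A search strategy (exhaustive dynamic programming, A*, or a genetic algorithm, for instance) would look for the valid alignment that maximizes this score.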
Intelligent Bioinformatics
An example: the robot scientist
Source: BBC News, University of Wales
 Designed for the study of functional genomics
 Tested on yeast metabolic pathways
 Uses knowledge representation schemes
Utilizes a Prolog database to store background biological information.

 Prolog can inspect biological information, infer knowledge, and make predictions.
 The optimal hypothesis is determined using machine learning, which weighs probabilities and associated costs.
Intelligent Bioinformatics
Another example: the robot scientist
Ross D. King, et al., Nature, January 2004
Intelligent Bioinformatics
Another example: the robot scientist
 Performance similar to humans
 Performance significantly better than “naïve” or “random” selection of experiments
Ross D. King, et al., Nature, January 2004
Challenges
Data fusion or integration: integrating a wide variety of data sources, such as clinical and genomic data, will allow us to use disease symptoms to predict genetic mutations and vice versa. Integrating GIS data, such as maps and weather systems, with crop health and genotype data will allow us to predict successful outcomes of agricultural experiments.
Challenges
 Large-scale comparative genomics: developing tools that can compare whole genomes will push forward the discovery rate in this field of bioinformatics.
 Modeling and visualization of full networks of complex systems, for example to predict how a system (or cell) reacts to a drug.
Challenges
 Comparing complex biological observations, such as gene expression patterns and protein networks.
 Converting biological observations into a model that a computer can understand.
 Moreover, we are more than the sum of our parts… COMPLEXITY
Challenges
 Important information still needs to be decoded: genome sequencing, microarrays
 Exciting research potential: leads to important discoveries
 SmartMoney ranks bioinformatics #1 among the next hot jobs
 Saves time and money
http://smartmoney.com/consumer/index.cfm?story=working-june02
OUR CHALLENGE: START WORKING ON BIOINFORMATICS
J. Cohen: “Computer scientists should be encouraged to learn biology as biologists computer science to prepare themselves for an intellectually stimulating and financially rewarding future.”
D. Knuth: “…the number of radically new results in pure computer science is likely to decrease, while scientists continue working on biological challenges for the next 500 years….”
L. Adleman: “..biological life can be equated with computation…”
Resources
 ISCB: http://www.iscb.org/
 NCBI: http://ncbi.nlm.nih.gov/
 http://www.bioinformatics.org/
 Journals: IEEE/ACM
 Conferences: ISMB, RECOMB, PSB…
 http://kbrin.a-bldg.louisville.edu/CECS694/
Conclusion
Bioinformatics is all about how computer science can enhance biology and how biology can stimulate computer science.