Hi, welcome to this bioinformatics course. I’m Etienne. I have a bachelor's degree in mathematics and informatics, a master's degree in computer science, a master's degree in computer science and mathematics for integrative biology, and a PhD in biotechnology and bioinformatics. During this course, I will call "informaticians" the people who have a background in information systems, such as computer scientists, and "biologists" the people who have a background in the life sciences, such as biology, ecology, agriculture, and so on. [Next]
We will talk about fundamentals, then career opportunities, and end with the applications of bioinformatics. Don’t try to memorize the slide content. The aim of this talk is to give you an overview of bioinformatics as a discipline. At the end of the talk, you will be able to explain what role bioinformatics plays in the wider biotechnology area, give some application examples, and describe the career opportunities for bioinformaticians. This is an interactive talk: if you have a question, don’t hesitate to interrupt me.
How we present the fundamentals of bioinformatics depends on the audience. Here, we have to use some terms most often used by informaticians, but I’m sure you have strong computational backgrounds.
This part covers:
- Bioinformatics within the biotechnology area
- What bioinformatics is, and the main areas developed in bioinformatics, such as metabolic pathways, epigenetics, genomics, transcriptomics, and proteomics
- Two key concepts between biology and computer science: the lactose operon and the central dogma of molecular biology
We cannot speak about bioinformatics fundamentals without biotechnology. According to the United Nations Convention on Biological Diversity, biotechnology is any technological application that uses biological systems, living organisms, or derivatives thereof, to make or modify products or processes for specific use, for example biochemistry and genetics applied to genetically modified organisms (GMOs).
From this definition, we can draw a simple map of the biotechnology area. This map is organized around four principal axes: green biotech is for plants, red biotech is for animals and healthcare, blue biotech is for aquatic bioengineering, and white biotech is for industry. It is important to note that this map is just one representation of the area. To give you an idea, a reference bioscience dictionary, Spellex BioScientific, identified more than 13,000 biotechnical terms in its 2011 version. So you may ask me: where in the figure does bioinformatics fall? Informatics (information technology with computers) is used in each of these four branches. Bioinformatics tools (informatics in the bio world) are therefore used in every applied domain: agriculture for green biotech, bioengineering such as biodegradation and bioremediation for white biotech, medicine for red biotech, and education. Now, let us talk about two important fields: the bio world and the informatics world.
The bio world is very large and complex. Large, because there are billions of different elements to study, from the smallest invisible elements (the nucleotides A, C, G, T, U) up to ecosystems and environmental factors. Complex, because each element interacts with the others, and some interactions are unknown. Researchers in biology work in a specific domain (for example physiology), on a specialized theme (for example cell growth and apoptosis in cancer development), and at any given moment on one subject based on a single hypothesis to verify (for example, in an epigenetic context, DNA methylation modifies the cell growth profile). The problem here is having the right information at the right moment in the research process.
Fig. Historical milestones that have placed bioinformatics at the heart of 21st century biology, from the determination of the first amino acid sequence, to the development of an archive of 500 billion nucleotide sequences. Some major milestones are denoted in black; key computing innovations are indicated in purple; example databases are indicated in blue; organizations and institutions in green; numbers of sequences in red, the growing mass of which is highlighted both in the red curve and the background gradient – the impact of genomic sequencing in the mid ‘90s is clear.
On the other hand, we have the informatics world, which specializes in information manipulation (remember that information is not the same as data).
In this focus, we have the bioinformatics world, which revolves around:
We have spoken about biological species and the interactions between them. The main issue is: how can a mathematician understand these interactions? Biologists offer two key tools in this direction: the lactose operon, which details the mechanism of gene expression, and the central dogma of molecular biology, which describes the interactions. The gene regions of the DNA in the nucleus of the cell are copied (transcribed) into RNA, and the RNA travels to protein production sites, where it is translated into proteins. In short, DNA → RNA → protein is the central dogma of molecular biology. Imagine: there are trillions of cells in your body, and the DNA of each of them is churning out thousands of RNAs, which in turn cause thousands of proteins to be produced, every moment. One of them is making your hair strong, another is giving the glitter in your eyes, another is carrying oxygen to different parts of the body, and yet another is helping to make proteins themselves! No wonder the famous life scientist Russell Doolittle exclaimed: “We are our proteins.”
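For informaticians, the DNA → RNA → protein flow can be read as two string transformations. The following is a minimal sketch, not a production tool: the function names `transcribe` and `translate` and the example gene are hypothetical, the input is assumed to be the coding strand read in frame from its first base, and the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Transcription: copy the coding strand into mRNA (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translation: read the mRNA codon by codon until a stop codon."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon: the ribosome releases the chain
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy gene, used only to illustrate the two steps.
gene = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
mrna = transcribe(gene)
protein = translate(mrna)
print(mrna)
print(protein)
```

Real genes add introns, regulation, and degradation, which this sketch deliberately leaves out.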
With the study of the lactose operon, François Jacob, André Lwoff and Jacques Monod were the first scientists to describe a system for regulating gene transcription. They proposed the existence of two classes of genes that differ in their function: structural genes and regulatory genes. The concept of gene regulation was born from this work (Nobel Prize in Physiology or Medicine, 1965).
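The regulation logic of the lactose operon can be sketched as a small Boolean model. This is a deliberately simplified illustration: it only encodes the repressor and operator interaction (repressor bound means no transcription of the structural genes), it assumes that the presence of lactose releases the repressor, and it ignores every other layer of control. The function names are hypothetical.

```python
def repressor_bound(lactose_present: bool) -> bool:
    """Assumption: the regulatory gene product (the repressor) sits on the
    operator unless lactose is present to release it."""
    return not lactose_present


def structural_genes_transcribed(lactose_present: bool) -> bool:
    """The structural genes are transcribed only when the operator is free."""
    return not repressor_bound(lactose_present)


def operon_state(lactose_present: bool) -> str:
    """Name the state of the operon: 'induced' or 'repressed'."""
    return "induced" if structural_genes_transcribed(lactose_present) else "repressed"


for lactose in (False, True):
    print(f"lactose present: {lactose} -> operon is {operon_state(lactose)}")
```

The point of the sketch is the split between a regulatory element (the repressor) and the structural genes it controls, which is the concept the slide introduces.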
Research and academic institutes have also become big players in the employment market, as more candidates look to acquire a PhD and some essential research skills in the hope that this will lead to better opportunities in the future.
This figure represents an informatician’s perception of bioinformatics. Bioinformatics has many application sides: biology, pharmacology, and so on. Each application side is accessible through an ad hoc interface adapted to the user's environment.
On the other hand, we can view bioinformatics as an integrated set of tools, processes, and databanks, chosen according to the aims of our work. In that case, we have
In all cases, the practice of bioinformatics depends on several parameters. One of the most important is the context and profile of the practitioner. A basic user is often a specialist in the life sciences, which means we have different practices. From a biological standpoint, bioinformatics means taking the computing tools that are useful for solving known biological problems (BIO-informatics). From a computing point of view, bioinformatics is the construction of programs and processes relevant to biology (computer or computational biology). And here lies the relationship between bioinformatics and computational biology. Why do I bother to make this distinction? What emerges from this perception is that both the biologist and the bioinformatician end up being something of a computer scientist. There are also computer scientists who work on bioinformatics with a rudimentary level of biology gained through experience. For an informatician, bioinformatics refers to three things:
- Data manipulation
- Program building
- Model design
For example… next
This is the lab template. The context is a biological one, based on a real biological problem and a given hypothesis. I avoid heavy computer science vocabulary here. When you read this template as an informatician, you have a different view. You want to understand:
- The process used to build the tools
- The architecture of the system
- The algorithm implementation
- The quality of the resulting data
- And so on
This model illustrates the growth of a tumor and how it resists chemical treatment. A tumor consists of two kinds of cells: stem cells (blue) and transitory cells (all other colors).
HOW IT WORKS
During mitosis, a stem cell can divide either asymmetrically or symmetrically. In asymmetric mitosis, one of the two daughter cells remains a stem cell, replacing its parent. So a stem cell effectively never dies: it is quasi reincarnated after each division. The other daughter cell turns into a transitory cell that moves outward.
Young transitory cells may divide, breeding other transitory cells. The transitory cells stop dividing at a certain age and change color from red to white to black, eventually dying.
A stem cell may also divide symmetrically into two stem cells (blue). In this example the original stem cell divides symmetrically only once. The first stem cell remains static, but the second stem cell moves to the right. This activity, in which the cell advances into distant sites and creates another tumor colony, is called metastasis. Notice that the metastasis is red. It is made of cells that die young, when they are still red, rather than ending as black dots as in the static tumor. As the disease progresses, cells die younger and younger.
Epidemiologists use bioinformatics models as well. Our last example models an epidemiological question. HIV or other diseases can be modeled to show how they will spread or subside under certain conditions.
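To make "spread or subside under certain conditions" concrete, here is a minimal sketch of a generic SIR compartment model (susceptible, infected, removed) integrated with a simple Euler step. It is an assumption for illustration only: it is not the model shown on the slide, it is not calibrated for HIV or any real disease, and the names `simulate_sir`, `beta`, and `gamma` and all numeric values are hypothetical.

```python
def simulate_sir(population, initially_infected, beta, gamma, days, dt=0.1):
    """Euler integration of dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I,
    dR/dt = gamma*I. Returns a list of (time, S, I, R) tuples."""
    s = float(population - initially_infected)
    i = float(initially_infected)
    r = 0.0
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / population * dt
        new_removals = gamma * i * dt
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((step * dt, s, i, r))
    return history


# Condition 1: transmission rate above removal rate, so the disease spreads first.
spreading = simulate_sir(1000, 10, beta=0.5, gamma=0.1, days=60)
# Condition 2: transmission rate below removal rate, so the disease subsides.
subsiding = simulate_sir(1000, 10, beta=0.05, gamma=0.1, days=60)

print("peak infected (spreading):", round(max(row[2] for row in spreading)))
print("infected after 60 days (subsiding):", round(subsiding[-1][2], 2))
```

Whether the infection grows at the start depends on whether `beta` exceeds `gamma`, which is the kind of condition such models let an epidemiologist explore.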
BIOINFORMATICS. Dr. Etienne Z. GNIMPIEBA Etienne.firstname.lastname@example.org May 2012
Plan
• PART I: Fundamentals
• PART II: Career
• PART III: Applications
Bioinformatics Fundamentals Plan
• From biotechnology to bioinformatics
• Bioinformatics world
• FOCUS: main areas
• 2 Key concepts between biology and computer science
Bioinformatics Fundamentals From Biotechnology to Bioinformatics 1
"Any technological application that uses biological systems, living organisms, or derivatives thereof, to make or modify products or processes for specific use." 1
1 The United Nations Convention on Biological Diversity, 2008
Bioinformatics Fundamentals From Biotechnology to Bioinformatics: apply area 2 Reduce dependence on fertilizer, pesticide, agrochemical Good yield Increase nutritional quality Novel substance Agriculture Reduce vulnerability in crop plant Pharmacogenomics Bio-process Biochemical Gene therapy Biosystems Genetic test (DNA) Medicine Bioinformatics DNA Vaccines Organism adapt. Cloning Environment Clinical trials contamination Education Biotechnology Training Programs (BTPs) 2 Spellex BioScientific, v.2011 6 6
Bioinformatics Fundamentals Challenges 4
• Accumulating mass of data
• Biological systems complexity
• Development of new research interest on DNA
[Figure: timeline of historical milestones, 1950 to 2010]
4 Attwood T. K., 2012
Bioinformatics Fundamentals Challenges 5
• Accumulating mass of data
• Biological systems complexity
• Development of new research interest on DNA
5 MiPPI, 2007
Bioinformatics Fundamentals Informatics world 6• Data Manipulation / • Math – Calculus Management – Representation tools – Modeling & predicting tools –Creation (Learning, interpreting, deducing, simulation, .. ) –Acquire / Collect – Formalisms – Exploration tools – Optimization tools • Process – Theories –Experiment –Organize – Inference tools process design –Store – Statistics –Algorithm –Secure – Graphics (Surfaces, Volumes) –Process –Validate (standard, norms, safety) – Comparison and 3D Matching (Vision, recognition) –Workflow –Analyze (statistics, mining) –Visualize • Material –Share (security, import, export, clean, …) – Server – Archiving – Network – Storage supports – Processor • Art & music • Physics• Software – Quantum computing – Design (Human machine interaction) – Data manipulation tools – Signal treatment tools – Usefulness (beauty, – Programming tools attractiveness) – Biomedical material – Artificial intelligence tools interaction (electric, optic – Philosophy – High computing tools fiber, Wi-Fi, radio wave) – Signal – Singling tools – Electrostatics – Web 10 6Etienne Gnimpieba, 2012 – Robotics 10
Bioinformatics Fundamentals Bioinformatics World: some topics 7 Genome Sequence Protein Sequence• Finding Genes in Genomic DNA • Sequence Alignment• Characterizing Repeats in Genomic DNA Dynamic Programming for Local vs Global Alignment• Duplications in the Genome • Multiple Alignment and Consensus Patterns• Secondary Structure “Prediction” • Scoring schemes and Matching statistics (How to tell if a given alignment or match is statistically significant) Genomics • Expression Analysis Structures • Large scale cross referencing of information • Basic Protein Geometry and Least-Squares Fitting • Function Classification and Orthologs • Calculating a helix axis in 3D via fitting a line • The Genomic vs. Single molecule Perspective • Calculation of Volume and Surface • Genome Comparisons • Structural Alignment • Structural Genomics • Genome Trees Databases • Relational Database Concepts • Natural Join as "where“ selection on cross product Modeling & Simulation • Array Referencing (perl/dbm) • Protein Units? • Molecular Simulation • sequence, structure • How to measure the change in a vector • motifs, modules, domains (gradient) • Clustering and Trees • UPGMA • Parameter Sets • single-linkage • Number Density • multiple linkage • Poisson-Boltzman Equation • Parsimony, Maximum likelihood • Lattice Models and Simplification • The Bias Problem 7 Etienne Gnimpieba, 2012 11
Bioinformatics Fundamentals Bioinformatics World: some topics 8 Experiment Compulation Information Technology Hardware & Instrumentation Mathematical & Physical Models Methodology & ExpertiseDNA Sequence Genome sequencing Geomonic data Statistical Gene & Genome Organization analysis genetics Sequence Physiology (and beyond)MolecularEvolution Proteomics Protein structure prediction,Protein Structure,Folding, Function, protein dynamics, protein folding& Interaction and designMetabolicPathways FunctionalRegulation genomics Data standards, Signaling (microarrays, data representations, Dynamical Networks 2D-PAGE, etc.) and analytical tools for systems modelingPhysiology & Cell complex biological dataBiology Interspecies Interaction High-techEcology & field ecologyEnvironment Computational ecology 8 SABU M. THAMPI, Dept. of CSE, LBS College of Engineering, Kasaragod, Kerala-671542, 2011 12
Bioinformatics Fundamentals Key concept: central dogma of Molecular Biology 9,10 DNA DNA E Transcription Degradation Gene mRNA Repression Translation Degradation E Catalyse S P 13 9 Barbeillini, 2003 10 Etienne Gnimpieba, 2012 13
Bioinformatics Fundamentals Key concept: Lactose Operon (Lac) 11
Genes and their binding sites
• In the "induced" state, the lac repressor is NOT bound to the operator site.
• In the "repressed" state, the repressor IS bound to the operator.
11 blc.arizona.edu
Bioinformatics Career Where can you be a bioinformatician? 12
• Public institution
  – University (research project, training)
  – Research center (research project)
  – State & Federal agency (FDA, )
• Companies
  – Pharmaceuticals
  – Biotech
  – Agricultural & food
  – Health
  – Information systems
• Owner (your own boss)
  – Contractor (entrepreneur)
  – Consultant
• International institutions
  – WHO
  – UN
[Figure labels: fundamental research; applied research; development research (product); use, commercialization, market]
12 Etienne Gnimpieba, 2012
Bioinformatics Career What do you do in Bioinformatics?
As informaticians, you have a lot of tasks
• DNA computing
• Neural computing
• Evolutionary computing
• Immuno-computing
• Swarm-computing
• Cellular-computing
• Visualization
• Decision making
• Sequence Assembly
• Genomic Sequence Analysis
• Functional genomics
• Genotyping
• Proteomics
• Pharmacogenomics
• Integrative computing
• Database Administration
• Algorithms
• Databases and information systems
• Web technologies
• Artificial intelligence and soft computing
• Information and computation theory
• Software engineering
• Data mining
• Image processing
• Modeling and simulation
• Signal processing
• Discrete mathematics
• Control and system theory
• Statistics
Bioinformatics Career How to become a bioinformatician?
Skills Needed
• Database administration and programming skills (SQL Server, Oracle, Sybase, MySQL, CORBA, PERL, Java, C, C++, web scripting)
• Genomic sequence analysis
• Molecular modeling programs
• Biologists and computer scientists
• Skills for data analysis, storage and retrieval
• Skills to filter information and form possible relationships between datasets
Training
• Bachelor
• Master
• MD
• PhD
• High school diploma
Eligibility (biopharmaceutical)
• Life Sciences Graduates
• Computer Sciences Graduates
• Databases Specialists
• Engineering Graduates
• Marketing and Management Graduates
• MDs, RNs and Medical Professionals
Bioinformatics Career Who does bioinformatics?
More than 100 profile denominations according to: country, company, domain, experience, education profile, competence
From BIO based profile to Informatics based profile
• Bioinformatician
  – Cheminformatician
  – Computational Biologist
  – Gene Analyst
  – Genomic Scientist
  – Molecular Modeler
  – Phylogeneticist
  – Protein Analyst
  – Scientific Curator
  – Structural Analyst
• Biomedical Computer Scientist
• Geneticist
• Computational Biologist
• Biostatistician
• Scientist
• Biomedical Chemist
• Clinical Data Manager
• Molecular Microbiologist
• Software/Database Programmer
• Medical Writer/Technical Writer
• Research Associates and Research Scientists
• Data analyst
• Data designer
Bioinformatics Career Career profile: an example An example of a Bioinformatician work profile 22 22
Bioinformatics Career Summary Part II 13 Data manipulation • Cloud • Databank • Database • Data designer • Information manipulation Informatics • Create/collect informationBio/life • Statistic analysis • Date inference, learning • Model from data • Model from SB • Large scale model Modeling & learning SB 13 Etienne Gnimpieba, 2012 23
Bioinformatics Applications Overview 14 Pharma- Biology Ad Hoc Interface cologyPART I: FundamentalsPART II: CareerPART III: Applications Tools Tools Ad Hoc Interface Ad Hoc Interface Tools Tools Ecology Medicine CORE Tools Tools Tools Tools Computer Molecular Science Ad Hoc Interface Nutrition 14 COSBI Report, 2010 25
Bioinformatics Applications Small synopsis view of bioinformatics 15 15 Korean Bioinformation Center, 2010 26
Bioinformatics Applications Informatician’s view of bioinformatics
• Data manipulation
  – Data analysis
  – Designing database and databank
  – Management (collect, store, explore, secure)
  – Inference / mining
  – Statistics
• Model design
  – From biological process to mathematical formalism
  – Model checking and validation
• Program building
  – Data analyzing tools (implement algorithm)
  – Integration tools (data, program, model)
  – Modeling & Simulink tools
  – Data protection tools
  – …
Bioinformatics Applications Example 1 Data Manipulation Molecular online tools and Bioextract Server.
Lab #1 Molecular online tools and server 16
Context
Statement of problem / Case study: The FXN gene provides instructions for making a protein called frataxin. This protein is found in cells throughout the body, with the highest levels in the heart, spinal cord, liver, pancreas, and muscles. The protein is used for voluntary movement (skeletal muscles). Within cells, frataxin is found in energy-producing structures called mitochondria. Although its function is not fully understood, frataxin appears to help assemble clusters of iron and sulfur molecules that are critical for the function of many proteins, including those needed for energy production. Mutations in the FXN gene cause Friedreich ataxia. Friedreich ataxia is a genetic condition that affects the nervous system and causes movement problems. Most people with Friedreich ataxia begin to experience the signs and symptoms of the disorder around puberty.
Biological Hypothesis
Reduced expression of frataxin is the cause of Friedreich ataxia (FRDA), a lethal neurodegenerative disease; how about liver cancer?
0. Specification & aims
Aim: The purpose of this experiment is to introduce online biological exploration tools for the human genome. We simulate the application (FXN gene and pancreatic cancer). Now we can understand how a researcher can come to identify cross biological knowledge available in data banks.
Keywords:
Bio: FXN, Frataxin, pancreatic cancer, CDKN4
Math: HMM
Informatics: programming, bioinformatics tools, getting and exporting data
Resolution process
T1. Genome exploration
Objective: use Ensembl online tools to locate FXN on the human genome and identify the genes implicated in pancreatic cancer. Afterwards, get the appropriate data (sequence) in FASTA and BLAST format.
T1.1. Locate a given gene on the human genome
T1.2. Get a genomic sequence from NCBI
T1.3. Get the protein information and sequence from EBI
T1.4. Save the exported sequence data in the data folder
T2. Sequences manipulation
Objective: find similar sequences using BLAST tools and make an alignment on the given sequences.
T2.1. Find similar sequences using the BLAST tool
T2.2. Align the generated sequences with the ClustalW tool
T2.3. Visualize the result using a phylogenetic tree in Jalview
T3. Bioextract server
Objective: use a server tool to optimize the data manipulation process, applied on the Bioextract server.
T3.1. Server initialization
T3.2. Pancreatic cancer & Frataxin (FXN)
T3.3. Mapping, alignment
T3.4. Workflow save & reuse
[Figure labels: Frataxin molecule structure (pymol); FXN on chromosome 9; Biological DB; Tools; Pancreas anatomy; Pancreatic cancer]
Acquired skills
Online and server tools:
- Query biological DB (fasta, Html, txt, figure formats)
- Sequence tools (protein and gene): mapping (tmap), alignment (clustalw2)
- Manage data result (select, keep, map, export)
- Build and reuse workflow
Conclusion: ?
16 Korean Bioinformation Center, 2010
Bioinformatics Applications Biostatistics: gene expression data analysis
Gene expression data (microarray, NGS) analysis process:
• Biological question (differentially expressed genes, sample class prediction, etc.)
• Experimental design
• Microarray experiment
• Image analysis
• Normalization
• Estimation, testing, clustering, discrimination
• Biological verification and interpretation
Bioinformatics Applications Example 3 Model design Mathematical modeling of molecular nutrition From food to molecule: folate absorption, metabolism, and distribution 32
Bioinformatics Applications Model design: Molecular nutrition and nutrigenomic 17 17 Achuthsankar S. Nair, 2007
Bioinformatics Applications Example 2 Model design Mathematical modeling of Biological systems Folate mediate one carbon metabolism: MTHFR (gene) mutation and cancer genesis 34
Bioinformatics Applications Mathematical modeling of Biological systems 18
Folate metabolism (folic acid or Vitamin B9) and pathogenesis
[Figure: simulation plots, all against time in hours. Panels: formalization of the model of metabolic networks (rate of change of the metabolite vector m(t, P) with initial condition m(t0, P) = m0(P) and reaction rates vij = f(t, mij, Pij)); transmethylation pathway (AdoMet and AdoHcy in µM, AdoMet/AdoHcy ratio); homocysteine to methionine conversion with rate constant kc; DNA and DNA-CH3 amounts in µM (states versus time); uracil methylation (dUMP and dTMP in µM, dUMP/dTMP ratio).]
18 J. M. Scott, 1994
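The slide relates homocysteine and methionine through a single rate constant kc, but the signs and operators of its rate equations are not legible in this transcript. The sketch below therefore assumes the simplest reading, first-order conversion with d[Hcy]/dt = -kc·[Hcy] and d[Met]/dt = +kc·[Hcy], and evaluates its closed-form solution. The function name and all numeric values are hypothetical and are not taken from the slide.

```python
import math


def transmethylation_state(hcy0, met0, kc, t):
    """Closed-form solution of the ASSUMED first-order conversion
    homocysteine -> methionine with rate constant kc (per hour).
    Returns (homocysteine, methionine) concentrations at time t (hours)."""
    remaining_fraction = math.exp(-kc * t)
    hcy = hcy0 * remaining_fraction
    met = met0 + hcy0 * (1.0 - remaining_fraction)
    return hcy, met


# Hypothetical initial concentrations (µM) and rate constant (1/h).
HCY0, MET0, KC = 10.0, 20.0, 0.5

for hours in (0.0, 0.5, 1.0, 1.5):
    hcy, met = transmethylation_state(HCY0, MET0, KC, hours)
    print(f"t = {hours:.1f} h  Hcy = {hcy:.2f} µM  Met = {met:.2f} µM")
```

Under this assumption the total of the two pools stays constant, which is a quick sanity check for any numerical solver applied to the full metabolic network.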
Bioinformatics Applications Example 4 Model design Drug-DNA interaction  Saffroy & al., 2004  Chango & al., 2008 36
Bioinformatics Applications Model design: drug-DNA interaction 19
• Ligand (drug molecule): evaluate the uploaded molecule through Lipinski's Rule of Five
• Protein/DNA: predict the possible target protein allosteric site
• Target protein ready for docking
• Docking & scoring
 Saffroy & al., 2004  Chango & al., 2008
19 B. Jayaram, 2011
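The ligand evaluation step on this slide uses Lipinski's Rule of Five. As a minimal sketch of how an informatician might code that filter: the `Molecule` class, the function names, and the two example molecules are hypothetical, the property values are assumed to be computed elsewhere, and tolerating at most one violation is the commonly quoted reading of the rule rather than something stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    name: str
    molecular_weight: float  # in daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(molecule: Molecule) -> list:
    """Return the list of Rule of Five criteria that the molecule breaks."""
    violations = []
    if molecule.molecular_weight > 500:
        violations.append("molecular weight above 500 Da")
    if molecule.log_p > 5:
        violations.append("logP above 5")
    if molecule.h_bond_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if molecule.h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    return violations


def passes_rule_of_five(molecule: Molecule, max_violations: int = 1) -> bool:
    """Assumption: a candidate is kept if it breaks at most one criterion."""
    return len(lipinski_violations(molecule)) <= max_violations


# Hypothetical molecules with made-up property values.
small = Molecule("small_candidate", 320.4, 2.1, 2, 5)
bulky = Molecule("bulky_candidate", 910.0, 7.3, 8, 15)

for candidate in (small, bulky):
    print(candidate.name, passes_rule_of_five(candidate), lipinski_violations(candidate))
```

A filter like this would run before docking, so that only drug-like ligands reach the more expensive docking and scoring step.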
Bioinformatics Applications Example 5 Model design 3D Modeling /simulation in biology  Saffroy & al., 2004  Chango & al., 2008 38
Bioinformatics Applications Model design: 3D Modeling 20, 21 Google Body browser E-cell project 20 Google, 2011 21 E-Cell.org, 2011 39
Bioinformatics Applications Example 6 Model design Cancer tumor model  Saffroy & al., 2004  Chango & al., 2008 40
Bioinformatics Applications Model design: cancer tumor development 22 22 Northwestern, 2010 41
Bioinformatics Applications Example 7 Model design Epidemiology: HIV spread 42
Bioinformatics Applications Model design: HIV spread 23 23 Northwestern, 2010 43