A proposed undergraduate bioinformatics curriculum for computer scientists



  1. 1. A proposed undergraduate bioinformatics curriculum for computer scientists Crossing the Interdisciplinary Boundaries Drs. Travis Doom (CS), Michael Raymer (CS), Dan Krane (Bio), and Oscar Garcia (CS) This work supported by NSF grant #EIA-0122582                             
  2. 2. Overview <ul><li>What is bioinformatics? </li></ul><ul><ul><li>The genome as an information source </li></ul></ul><ul><ul><li>Bioinformatics problems </li></ul></ul><ul><li>How do people learn bioinformatics? </li></ul><ul><li>A bioinformatics curriculum </li></ul>
  3. 3. Genomic information: from genes to proteins <ul><li>TATAAGCTGACT GTC ACTGA </li></ul>one codon 3apr.pdb 4 Bases: A,G,C,T 20 Amino Acids Protein: Structural or Enzyme
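The gene-to-protein mapping on this slide can be sketched in a few lines of Python. This is an illustration only, not part of the original slides: the function name `translate` and the compact table layout are assumptions, and the table is the standard genetic code with `*` marking a stop codon.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# '*' marks a stop codon. The compact string layout is an illustrative choice.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding strand three bases (one codon) at a time.

    Trailing bases that do not fill a whole codon are ignored.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    # GTC is the codon singled out on the slide.
    print(translate("GTC"))
```

The 4 bases give 4 × 4 × 4 = 64 codons, which map onto the 20 amino acids plus stop signals, so several codons encode the same amino acid.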
  4. 4. Bioinformatics Problems <ul><li>Sequence alignment </li></ul><ul><ul><li>Given a gene, search a database for similar genes </li></ul></ul><ul><li>Protein folding </li></ul>GCTATAATGCGTGT*CCA*CGCA GC*A*AATGC*TGTACCATCGCA
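As a rough illustration of the sequence alignment problem, the sketch below scores a global alignment with the Needleman-Wunsch dynamic-programming recurrence. The slides do not name a specific algorithm or scoring scheme, so the function name and the scores (match +1, mismatch -1, gap -1) are assumptions chosen for the example.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Return the best global alignment score of a and b (Needleman-Wunsch).

    Only the score is computed; recovering the alignment itself would
    require keeping the full table and tracing back through it.
    """
    # prev[j] is the best score for aligning the processed prefix of a
    # against b[:j]. Row 0 aligns the empty prefix, so it is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]  # aligning a[:i] against the empty string
        for j, cb in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

Searching a database for similar genes then amounts to scoring the query against each entry and ranking the results; production tools use faster heuristics, but the recurrence above is the underlying idea.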
  5. 5. Bioinformatics Problems <ul><li>Complementarity </li></ul><ul><ul><li>Shape </li></ul></ul><ul><ul><li>Chemical </li></ul></ul><ul><ul><li>Electrostatic </li></ul></ul>? Drug Lead Screening/Docking
  6. 6. The Role of Computation <ul><li>Target Identification: Pattern Recognition, Data Mining, Dynamic Programming </li></ul><ul><ul><li>Finding proteins that are likely to be related to disease & determining their active sites </li></ul></ul><ul><ul><li>Finding genes that code for these proteins </li></ul></ul><ul><li>Finding drug leads: Databases, Parallel Systems, Graph Theory, etc. </li></ul><ul><ul><li>Database screening </li></ul></ul><ul><ul><li>Docking </li></ul></ul><ul><li>Refining leads: Knowledge-Based & Expert Systems, AI, Pattern Recognition, Graph Theory </li></ul><ul><ul><li>Toxicology & delivery </li></ul></ul>
  7. 7. Growth of biological databases [Figures: GenBank base-pair growth (source: GenBank); growth of 3D structures (source: http://www.rcsb.org/pdb/holdings.html)]
  9. 9. Bioinformatics Problems <ul><li>Site Recognition </li></ul><ul><ul><li>Active site </li></ul></ul><ul><ul><li>Other binding sites </li></ul></ul><ul><li>Data Integration </li></ul><ul><ul><li>Indexing, retrieval </li></ul></ul><ul><ul><li>Formatting </li></ul></ul>
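The indexing and retrieval side of data integration can be illustrated with a minimal sketch that reads FASTA-formatted text and builds a lookup table from record identifier to sequence. The function name `index_fasta` is hypothetical, and the sketch assumes records small enough to hold in memory.

```python
from typing import Dict, Iterable, List


def index_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Build an in-memory index mapping each record identifier to its sequence."""
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Identifier = first whitespace-delimited token after '>'.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None:
            # Sequence data may be wrapped across several lines.
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}
```

Once the index is built, retrieval is a dictionary lookup by identifier; the formatting problem listed on the slide is what makes the parsing step necessary in the first place.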
  10. 10. A growing research industry [Figure: bar chart of average quarterly financing, cash inflow ($M) by year for 96, 97, 98, 99, and Qt1 00, split into Bioinformatics Related (B) and Other Biotech (O); chart labels read 37% / 63%, ">95% O, <5% B", and "3x". Bar values as extracted: 3,500; 2,007; 1,222; 1,354; 10,896.] Source: Ernst & Young 13th & 14th Annual Reports, Biospace.
  11. 11. Overview <ul><li>What is bioinformatics? </li></ul><ul><li>How do people learn bioinformatics? </li></ul><ul><ul><li>Learning a bilingual discipline </li></ul></ul><ul><ul><li>Why undergraduates? </li></ul></ul><ul><li>A bioinformatics curriculum </li></ul>
  12. 12. Bioinformatics in the US <ul><li>The demand is growing </li></ul><ul><ul><li>The National Institute of General Medical Sciences (NIGMS) has issued a report showing a critical need for researchers from other disciplines who can perform the kind of modeling and data analysis that biological researchers require. </li></ul></ul><ul><li>Graduate programs are flourishing </li></ul><ul><ul><li>Approximately 20 US universities started graduate programs in Bioinformatics last year. </li></ul></ul><ul><ul><li>New graduate programs are being proposed at many universities across the nation. </li></ul></ul>
  13. 13. The Problem <ul><li>Bioinformatics is interdisciplinary </li></ul><ul><ul><li>Students must possess a strong grasp of computer science fundamentals </li></ul></ul><ul><ul><li>Students must possess a strong grasp of biochemistry to recognize and appreciate the results </li></ul></ul><ul><ul><li>Learning to speak the languages of both fields is essential </li></ul></ul><ul><ul><li>Learning to “think” as a bioinformatician requires training in both the scientific method and solid engineering design methodology </li></ul></ul><ul><li>We believe this can (and must) be done at the undergraduate level </li></ul>
  14. 14. The Problem <ul><li>To pursue a career or graduate study in bioinformatics, a CS student must be familiar with: </li></ul><ul><ul><li>“Classical” CS: introductory programming, data structures, formal and comparative languages (complexity and optimization algorithms), probability and statistics </li></ul></ul><ul><ul><li>“Contemporary” CS: AI algorithms (search, optimization, list processing, pattern recognition, etc.), databases (storage, transmission, and processing of large data sets), modeling and simulation </li></ul></ul><ul><ul><li>Biology: genetics, molecular bio, cellular bio, gene expression, replication, recombination, repair, and the experimental tools of molecular biology (~2.5 years) </li></ul></ul><ul><ul><li>Chemistry: inorganic and organic chemistry (~2 years) </li></ul></ul>
  15. 15. Overview <ul><li>What is Bioinformatics? </li></ul><ul><li>How do people learn bioinformatics? </li></ul><ul><li>How are we facilitating learning in bioinformatics at Wright State University? </li></ul><ul><ul><li>NSF CISE Educational Innovation Award </li></ul></ul><ul><ul><li>Towards an accredited undergraduate program in bioinformatics </li></ul></ul>
  16. 16. NSF Educational Innovation <ul><li>The NSF’s Directorate for Computer and Information Science and Engineering has awarded WSU an Educational Innovation grant. </li></ul><ul><ul><li>Crossing the interdisciplinary barrier: An integrated undergraduate program in bioinformatics </li></ul></ul><ul><ul><li>Three-year plan: Fall 2001 to Summer 2004. </li></ul></ul><ul><ul><li>Goal: An interdisciplinary baccalaureate bioinformatics program in Computer Science at WSU to serve as a national model of excellence </li></ul></ul>
  17. 17. The Big Picture <ul><li>Graduate programs accept students with either bachelor’s degrees in CS or Biology </li></ul><ul><ul><li>The majority of the first year of graduate study is generally consumed with remedial coursework in the other discipline </li></ul></ul><ul><li>Undergraduate programs must incorporate: </li></ul><ul><ul><li>More specific (and shorter) biology and chemistry sequences </li></ul></ul><ul><ul><li>More focused computer science foundation </li></ul></ul><ul><ul><li>Redesignate traditional “core” CS with contemporary areas of IT knowledge </li></ul></ul>
  18. 18. Goal: Integrating research <ul><li>Integrating research into the undergraduate curriculum </li></ul><ul><ul><li>Academic collaborations </li></ul></ul><ul><ul><li>Industry collaborations for research and internship </li></ul></ul><ul><li>Why is bioinformatics a rich field for integration? </li></ul><ul><ul><li>Apply the tools to new data </li></ul></ul><ul><ul><li>Develop new tools </li></ul></ul>
  19. 19. Goal: Minimal New Resources <ul><li>Bio/CS 2xx – Introduction to Bioinformatics </li></ul><ul><ul><li>Tools-oriented approach to bioinformatics emphasizing data structure in DNA, string representation in Perl, data searches, pairwise alignment, substitution patterns, protein structure prediction and modeling, proteomics, and the use of web-based bioinformatic tools </li></ul></ul><ul><li>Bio/CS 4xx – Algorithms for Bioinformatics </li></ul><ul><ul><li>Theory-oriented approach to the application of contemporary algorithms to bioinformatics. Graph theory, complexity theory, dynamic programming and optimization techniques are introduced in the context of application toward solving specific computational problems in molecular genetics </li></ul></ul>
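To give a flavor of the string-oriented exercises such an introductory course might contain, here is a small sketch of two basic operations on DNA held as a string: taking the reverse complement and searching for a motif. The course description names Perl; Python is used here purely for illustration, and both function names are hypothetical.

```python
from typing import List

# Translation table pairing each base with its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_all(dna: str, motif: str) -> List[int]:
    """Return the 0-based start of every occurrence of motif, overlaps included."""
    positions = []
    start = dna.find(motif)
    while start != -1:
        positions.append(start)
        start = dna.find(motif, start + 1)  # step by one to allow overlaps
    return positions
```

Searching both a sequence and its reverse complement covers motifs on either strand, which is why the two operations are usually taught together.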
  20. 20. Goal: Strong CS BS program <ul><li>This degree program should produce a different but strong CS BS student: </li></ul><ul><ul><li>We use the CAC guidelines as a rule for “core” CS </li></ul></ul><ul><ul><li>Other components developed in close collaboration with Biology and an industry panel </li></ul></ul><ul><li>CAC guidelines include: </li></ul><ul><ul><li>Algorithms, data structures, software design, programming languages (variety), computer org. & arch., discrete math, calculus, statistics, lab science, and development of oral, written, and social/ethical skills </li></ul></ul>
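  21. 21. Towards a CAC accredited program <ul><li>Courses Removed (80 QH removed) </li></ul><ul><ul><li>3xx-04 Digital Sys. Design </li></ul></ul><ul><ul><li>4xx-04 Concurrent Software </li></ul></ul><ul><ul><li>4xx-04 Formal Languages </li></ul></ul><ul><ul><li>4xx-04 Software Engineering </li></ul></ul><ul><ul><li>xxx-20 CS Electives package </li></ul></ul><ul><ul><li>1xx-16 Physics sequence </li></ul></ul><ul><ul><li>xxx-04 Science elective </li></ul></ul><ul><ul><li>xxx-24 Concentration reqs. (MTH/SCI/ENG) </li></ul></ul><ul><li>Courses Added (82 QH added) </li></ul><ul><ul><li>2xx-04 Intro. Bioinformatics </li></ul></ul><ul><ul><li>4xx-04 Artificial Intelligence </li></ul></ul><ul><ul><li>4xx-04 Algorithms for Bioinf. </li></ul></ul><ul><ul><li>4xx-04 Databases </li></ul></ul><ul><ul><li>xxx-08 Focused CS electives </li></ul></ul><ul><ul><li>1xx-15 Inorganic Chemistry </li></ul></ul><ul><ul><li>2xx-18 Organic Chemistry </li></ul></ul><ul><ul><li>xxx-29 Biology sequence </li></ul></ul>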
  22. 22. Towards a CAC accredited program <ul><li>195 Total Quarter Credit Hours </li></ul><ul><ul><li>42 General Education (as per CS) </li></ul></ul><ul><ul><li>66 Computer Science / Engineering (Vs. 82) </li></ul></ul><ul><ul><ul><li>Includes AI, Databases, two new bioinformatics courses; excludes Digital System Design, Formal Languages, Software Eng., Concurrent Software </li></ul></ul></ul><ul><ul><li>29 Biology (~two year sequence) (Vs. 24 Concentration) </li></ul></ul><ul><ul><li>33 Chemistry (two year sequence) (Vs. 19 MTH/Sci) </li></ul></ul><ul><ul><li>25 Mathematics (as per CS) </li></ul></ul><ul><li>Approved Winter 2002 </li></ul>
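The credit-hour breakdown on this slide can be checked with simple arithmetic. The snippet below is only a convenience sketch; the dictionary keys paraphrase the slide's category names and the helper name is hypothetical.

```python
# Quarter credit hours per area, copied from the slide.
CREDIT_HOURS = {
    "General Education": 42,
    "Computer Science / Engineering": 66,
    "Biology": 29,
    "Chemistry": 33,
    "Mathematics": 25,
}


def total_hours(plan: dict) -> int:
    """Sum the quarter credit hours across all areas of a degree plan."""
    return sum(plan.values())


if __name__ == "__main__":
    print(total_hours(CREDIT_HOURS))
```

The five areas add up to the 195 quarter credit hours stated on the slide.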
  23. 23. An undergraduate textbook Fundamental Concepts in Bioinformatics I. Molecular Biology and Biological Chemistry II. Data searches and pairwise alignments III. Substitution patterns IV. Distance-based methods of phylogenetics V. Character-based approaches to phylogenetics VI. Gene recognition: Prokaryotic Genomes VII. Gene recognition: Eukaryotic Genomes VIII. Protein folding IX. Proteomics Appendix 1: A gentle introduction to programming & data structures Appendix 2: Enzyme kinetics Appendix 3: Sample programs in Perl and worksets
  24. 24. Questions? [email_address] http://birg.cs.wright.edu
  25. 25. Simplified Diagram of Modern IT & CS Software Hardware Databases Networking Logic Databases Machine Reasoning Data Warehousing Web Programming WWW Data Mining Video on Demand Parallelism Human-Computer Interaction Searching Classical View Modern IT View
  26. 26. Three Possible Views of Bioinformatics Computer Science Biology Is it Genomics in CS? Computer Science Biology Or is it CS in Biology? Or is it an independent discipline? This argues for the formation of interdisciplinary centers broader than either the bio or the informatics disciplines. See: “Impact of Emerging Technologies on the Biological Sciences” at http://www.nsf.gov/bio/pubs/stctech/stcmain.html A C T G
  27. 27. Sister program in Biology <ul><li>200 credit hour program in Biological Sciences </li></ul><ul><ul><li>42 General Education </li></ul></ul><ul><ul><li>63 Biology (~four year sequence) </li></ul></ul><ul><ul><ul><li>Includes two new bioinformatics courses </li></ul></ul></ul><ul><ul><li>28 Computer Science (~three year sequence) </li></ul></ul><ul><ul><li>33 Chemistry (two year sequence) </li></ul></ul><ul><ul><li>34 Mathematics and Physics </li></ul></ul><ul><li>Close collaboration with the department of computer science and an industrial panel </li></ul>
  28. 28. Bioinformatics Overview <ul><li>Genomics </li></ul><ul><ul><li>emphasis on genetics, chemical and physical aspects of flow of genetic information from DNA to proteins, gene expression, replication, recombination, and repair </li></ul></ul><ul><ul><li>Databases, Data Mining, Neural Networks, Pattern Recognition, etc. </li></ul></ul><ul><li>Proteomics </li></ul><ul><ul><li>Study of how genes make proteins. Emphasis on the structure and properties of proteins and ligands </li></ul></ul><ul><ul><li>Molecular modeling, Pattern Recognition, Data Mining, etc. </li></ul></ul>
  29. 29. Molecular Evolution
      XLRHODOP    1           ggtagaacagcttcagttgggatcacaggcttcta   35
                               ||||||||||||||||||||||||||||||||||
      XL23808  1171 tgggtcatactgtagaacagcttcagttgggatcacaggcttcta 1215

      XLRHODOP   36 gggatcctttgggcaaaaaagaaacacagaaggcattctttctat   80
                    |||||||||||||||||||||||||||||||||||||||||||||
      XL23808  1216 gggatcctttgggcaaaaaagaaacacagaaggcattctttctat 1260

      XLRHODOP   81 acaagaaaggactttatagagctgctaccatgaacggaacagaag  125
                    |||||||||||||||||||||||||||||||||||||||||||||
      XL23808  1261 acaagaaaggactttatagagctgctaccatgaacggaacagaag 1305

      XLRHODOP  126 gtccaaatttttatgtccccatgtccaacaaaactggggtggtac  170
                    |||||||||||||||||||||||||||||||||||||||||||||
      XL23808  1306 gtccaaatttttatgtccccatgtccaacaaaactggggtggtac 1350
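The similarity visible in this alignment can be quantified as percent identity. The sketch below is an illustration, not the method used to produce the slide: the function name is hypothetical, and it assumes the two strings are already aligned to equal length with no gaps. The example sequences are the first 35 aligned positions shown on the slide.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length aligned sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# First aligned block from the slide: XLRHODOP positions 1-35 against
# the last 35 bases of the XL23808 line (positions 1181-1215).
QUERY = "ggtagaacagcttcagttgggatcacaggcttcta"
SUBJECT = "tgggtcatactgtagaacagcttcagttgggatcacaggcttcta"[10:]

if __name__ == "__main__":
    print(round(percent_identity(QUERY, SUBJECT), 2))
```

Distance-based phylogenetic methods start from pairwise measures like this one, computed for every pair of sequences under study.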
  30. 30. Drug discovery life cycle [Figure: timeline, 0 to 16 years] Discovery (2 to 10 Years); Preclinical Testing (Lab and Animal Testing); Phase I (20-30 Healthy Volunteers used to check for safety and dosage); Phase II (100-300 Patient Volunteers used to check for efficacy and side effects); Phase III (1000-5000 Patient Volunteers used to monitor reactions to long-term drug use); FDA Review & Approval; Post-Marketing Testing. $600-700 Million, 7 to 15 Years
  31. 31. Benefits of bioinformatics <ul><li>Every major pharmaceutical company now employs bioinformatics techniques to improve drug design (among other business aspects) </li></ul><ul><li>Increased understanding of evolution at the genetic/molecular level (phylogenetics) </li></ul><ul><li>Our best glimpse yet at the molecular mechanisms that regulate life at a cellular level and possibilities for simulating some aspects with a computer (basic science) </li></ul>