
Molecular biology in the information era

Talk at Winter School organized by the IU Genetik Klubü, Department of Molecular Biology and Genetics, Istanbul University

Published in: Education

  1. Molecular Biology in the Information Era. Winter School 2015. Andrés Aravena, PhD, Istanbul University, Department of Molecular Biology and Genetics. 7 March 2015
  2. My name is Andrés Aravena. Türkçe bilmiyorum (I don't speak Turkish)! I am: a new Assistant Professor at the Molecular Biology and Genomics Department; Mathematical Engineer, U. of Chile; PhD Informatics, U. Rennes 1, France; PhD Mathematical Modeling, U. of Chile; not a biologist but an applied mathematician who can speak "biologist language".
  3. I will speak about the past, present and future (facts, opinion and guess): what I've done before, so you can understand why I'm here; what I'm doing now at Istanbul University; what I foresee from my "outsider" point of view.
  4. I've worked on big and small computers and on telecommunication networks. Between 2003 and 2014 I was the chief research engineer of the main bioinformatics group in my country, in the top research center (CMM) of the top university (University of Chile).
  5. I come from Chile
  6. Chile: small country of ~17 million people; university rankings similar to Turkish ones; Spanish colony 500 years ago (so the language is Spanish); independent republic 200 years ago; first Latin American country to recognize the Turkish Republic; OECD member; everyday life very similar to Turkey.
  7. Chilean economy, exports: 1st world producer of copper; 2nd world producer of salmon; fruits: peaches, grapes, apples, avocado; wine: exported worldwide. Official data for 2014.
  8. The natural question was: how can we improve these industries using Molecular Biology and Bioinformatics?
  9. Fruits: peach and grapes. Gene expression analysis for industrial applications. Peach: response to cold stress. Grape: development related to seed and grape size (Sultaniye).
  10. Fishes: salmon. Farmed salmon are fed with cheap vegetable protein, but wild salmon eat animal protein. How is the salmon's metabolism affected by the diet? Which genes change their expression because of the change in food? Gene expression analysis using microarrays; fish selection for breeding using microarrays (patent pending).
  11. Fishes: salmon genomic sequence. ... and sequencing of the whole Salmo salar genome (a 10 million dollar project).
  12. Wine. Chilean wine travels long distances to its final markets. Any yeast contamination means big economic losses (people stop buying all Chilean brands). Quality control is usually done by growing samples for 3 days. But time is expensive: there is a penalty for shipping delays. We designed a qPCR method for rapid detection of yeast contamination. It is currently used by one major wine producer in Chile. It may be sold to Roche.
  13. Mining industry: molecular biology to extract copper. A little chemistry: copper is part of a compound, with sulfur and iron. Ferric acid separates it: $\mathrm{Cu_2S + 4\,Fe^{3+} \rightarrow 2\,Cu^{2+} + 4\,Fe^{2+} + S}$. The resulting $\mathrm{Cu^{2+}}$ is soluble and is recovered. But all the $\mathrm{Fe^{3+}}$ transforms to $\mathrm{Fe^{2+}}$ and the reaction stops. There are bacteria that "eat" electrons and keep the reaction going: $\mathrm{Fe^{2+} \rightarrow Fe^{3+} + e^-}$.
  14. Why is it important? The biological method is much better than the standard one: cheaper; reduced contamination; enables building new mines (it is like discovering oil reserves for the country). The goal is to understand and improve the bacteria involved so this technology can be used extensively.
  15. Most of the results are still an industrial secret. We had a research contract with the main mining company: state owned, big enough to pay for long-term research. Few papers, many patents.
  16. Bioidentification: monitoring the presence of good bacteria. We need to control the "ecosystem" in the mine. Molecular Biology methods are fast, sensitive and reliable. They can be used on site with a metagenomic approach, without culturing. Key problem: design probes that match a taxonomic branch, not a specific strain. The probes should be tolerant to the mutations that occur in environmental samples with many strains. Classical tools don't work at large scales.
  17. Design of probes for complex samples. I designed and built a solution using a super-computer. The calculation took one day on 32 processors (one processor-month). The resulting probes worked as expected. They can be used in qPCR or in microarrays.
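
The probe-design problem on slides 16 and 17 can be illustrated with a toy sketch. This is not the method used in the talk, which is not described in detail; it only shows the idea of keeping candidate probes that match every strain of a target group within a small number of mismatches while matching no strain outside it. The sequences, the function names and the `max_mismatches` parameter are all hypothetical.

```python
def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def matches(probe, sequence, max_mismatches):
    """True if the probe aligns somewhere in the sequence (no gaps)
    with at most max_mismatches mismatching positions."""
    k = len(probe)
    return any(
        hamming(probe, sequence[i:i + k]) <= max_mismatches
        for i in range(len(sequence) - k + 1)
    )


def candidate_probes(targets, non_targets, k, max_mismatches=1):
    """Return k-mers of the first target that match every target strain
    (tolerating mutations) and match no non-target strain."""
    reference = targets[0]
    kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    return sorted(
        p for p in kmers
        if all(matches(p, t, max_mismatches) for t in targets)
        and not any(matches(p, n, max_mismatches) for n in non_targets)
    )


# Made-up toy sequences: two strains of a target group and one outsider.
targets = ["ACGTACGTTGCA", "ACGTACCTTGCA"]
non_targets = ["TTTTGGGGCCCC"]
probes = candidate_probes(targets, non_targets, k=8, max_mismatches=1)
```

This brute-force search compares every candidate against every window of every sequence, so it only works on toy inputs; real environmental samples need far more efficient indexing, which is consistent with the computing time quoted on slide 17.
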
  18. Automatic Interpretation of Results using a Statistical Classification Model
  19. Publications. The microarray was published in: N. Ehrenfeld, A. Aravena, A. Reyes-Jara, N. Barreto, R. Assar, A. Maass, P. Parada, Design and use of oligonucleotide microarrays for identification of Biomining microorganisms. Advanced Materials Research 71-73 (2009) 155-158.
  20. Patents. The method and the probes have been patented in: USA, Number: US 7 853 408 B2, Date: 14/12/2010; South Africa, Number: 2006/06828, Date: 26/03/2008; Australia, Number: 2006203551, Date: 15/09/2011; Mexico, Number: PXMX 32/2006, Date: November 2012; Peru, Number: PE 5838, Date: 29/10/2010; China, Number: 200810095172.6, Date: 2013; Chile, Number: DPI-660-2007, Date: 06/05/2013; Argentina, Number: AR056179.
  21. Functional genomics. How do the bacteria work? To improve the process we need to see inside the black box. We sequenced the complete genome of 3 bacteria: Acidithiobacillus ferrooxidans, Acidithiobacillus thiooxidans and Leptospirillum ferrooxidans. We paid over USD $150K. Today it costs USD $5K. Hint: sequence assembly requires a big computer. It does not work on a regular PC.
  22. Modeling Metabolism. We predict which genes code for enzymes. Each enzyme catalyzes a reaction with a known stoichiometry. Every reaction gives an equation. All the equations plus boundary conditions give a model that predicts metabolite concentrations. We can predict how the cell adapts to environmental changes.
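
The step from reactions to equations on slide 22 can be sketched with a toy network. Each reaction contributes one column of a stoichiometric matrix S, and the rate of change of each metabolite is the matching row of S multiplied by the vector of reaction rates (fluxes) v. The three reactions, the metabolite names and the flux values below are invented for illustration and do not come from the talk.

```python
# Toy network (hypothetical): uptake -> A, A -> B, B -> export.
metabolites = ["A", "B"]
reactions = ["uptake", "convert", "export"]

# Stoichiometric matrix S: one row per metabolite, one column per reaction.
# An entry is +n if the reaction produces n units, -n if it consumes n.
S = [
    [1, -1, 0],   # A: produced by uptake, consumed by convert
    [0, 1, -1],   # B: produced by convert, consumed by export
]


def rates_of_change(S, v):
    """Compute dx/dt = S . v, one value per metabolite."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """A flux vector is a steady state when no metabolite accumulates."""
    return all(abs(dx) <= tol for dx in rates_of_change(S, v))


balanced = [2.0, 2.0, 2.0]      # same flux through every step
unbalanced = [2.0, 1.0, 1.0]    # uptake faster than conversion: A piles up

dxdt_balanced = rates_of_change(S, balanced)
dxdt_unbalanced = rates_of_change(S, unbalanced)
```

In a genome-scale model the same matrix has thousands of columns, and the boundary conditions mentioned on the slide would constrain which flux vectors are allowed.
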
  23. Modeling Regulation. From the genome sequence we can predict which genes code for transcription factors and where they bind. They form a putative regulatory network. But current methods produce too many false positives: we expected ~4K regulations and we got 25K. I integrate this model with microarray data to find the "most probable" regulatory network using a parsimony criterion.
  24. Systems Biology: beyond Bioinformatics. A very active research area that aims to understand the cell as a system with complex interactions. The focus is not on the genes but on the genome. The key is to understand networks: regulatory, metabolic, signaling, protein-protein interaction.
  25. The present: why computers in Molecular Biology and Genetics?
  26. DNA is digital information. All experimental values in science are measured with an observational error (e.g. temperature is 10.2 ± 0.05°C, pressure is 101215 ± 125 Pa). Except genetic sequences: nucleotides are either A, C, T or G. There is no "average" or "intermediate case". So it is natural to use computers and information theory to model DNA, but there is another reason ...
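
Because each nucleotide is one of exactly four symbols, a sequence can be stored without loss in two bits per base, and information-theoretic quantities such as Shannon entropy apply directly. The sketch below illustrates both ideas; the example sequence and the function names are made up for illustration.

```python
import math
from collections import Counter

# Two bits are enough to tell four symbols apart.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
DECODE = {bits: base for base, bits in CODE.items()}


def pack(seq):
    """Encode a DNA string as an integer, two bits per nucleotide."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def unpack(value, length):
    """Recover the DNA string; the length is needed to keep leading A's."""
    bases = []
    for _ in range(length):
        bases.append(DECODE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


def entropy_bits(seq):
    """Shannon entropy of the nucleotide frequencies, in bits per base."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


seq = "ACGTACGT"
packed = pack(seq)
restored = unpack(packed, len(seq))
```

The round trip through `pack` and `unpack` is exact, which is the point of the slide: unlike a temperature reading, the stored value carries no measurement error.
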
  27. (Slide without text.)
  28. Science converges to Molecular Biology. Physicists, mathematicians, computer scientists and engineers have turned their attention to molecular biology questions. They come looking with new eyes and creating new theoretical and practical tools. Molecular Biology has always interacted with other disciplines. Just consider the word "Biochemistry".
  29. The Internet makes Molecular Biology theory accessible to more people. Before the Internet: top science was accessible only to researchers with money to make complex experiments or buy expensive books and journals; finding references took several weeks by regular mail; professors had the only copy of the textbooks.
  30. Today: all journals are accessible online; references are downloaded in minutes at low cost (free when the article is Open Access); the experimental results of each article are also free.
  31. Anyone can analyze this data. Structured data is easy to process to discover new knowledge. The software for this meta-analysis is also Open Source. Scientists can adapt the programs' source code to solve their specific questions. Anyone can download these programs without cost. If the analysis requires big computational power you can rent it at low cost.
  32. You don't need your own super-computer. You can rent Cloud computers. Companies like Amazon.com and Google sell their spare computer power at low prices. This enables researchers to carry out computations that would be impossible otherwise.
  33. The World is Flat. This democratization of knowledge provides an exciting challenge. Rich countries no longer have the monopoly of knowledge. We can be players in the big leagues, on a level playing field. We can read the same books and the same articles, use the same machines and the same programs. Anyone could make the next scientific breakthrough, in New York, New Delhi or Istanbul. But the same opportunity is open to everyone else.
  34. There are more PhD students than ever, and many of them will be in Molecular Biology. Cyranoski et al. 2011. "Education: The PhD Factory." Nature 472: 276–79.
  35. More players come to the game. Emerging economies push up the number of researchers worldwide. India graduates more than a million engineers each year, many of them in biotechnology. Egypt has 35,000 PhD students and Israel 10,000. Many of them will find jobs in Molecular Biology companies or academia. Hays, Thomas. 2011. "PhDs: Israel Also Trains Plenty." Nature 473 (7347): 284.
  36. How will we be different?
  37. The success of Molecular Biology generates Big Data. Advances in molecular biology technology have produced: new generation sequencers; microarrays; mass spectrometers; real-time PCR. They produce: reproducible experimental results; in big volumes; at low cost.
  38. The cost of data production is falling. National Human Genome Research Institute. http://genome.gov/sequencingcosts
  39. Extracting Information from Raw Data: surviving the Data Tsunami. In a few years we went from a lack of data to an excess of it. We need to learn how to extract biological meaning from big volumes of data. Classical methods are not enough. What is significant? What is the "null hypothesis"?
  40. If we don't fully analyze our own experimental data, someone else will. And they will publish it.
  41. The plan: what we will teach
  42. Teaching "Introduction to Data Science". The students will learn: how to handle experimental data; how to communicate with scientists of other data-oriented disciplines; how to produce publication-quality reports with reproducible results; how to get raw data, extract relevant information and filter it using several selection criteria; how to store and retrieve it in efficient and useful ways; how to transform it, organize it, categorize it, display it, and show and understand the results.
  43. Teaching "Scientific Computing". Teach Python and BioPython to analyze, model, evaluate and predict the behavior of genomic and molecular biology entities. The students should be able to interact with high-end servers, use command line tools and be comfortable in computing environments other than Microsoft Windows. Tools include Unix command line tools, SQL and the R statistical package. The students should be able to understand how computer networks work and what their limitations are.
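
As a flavour of what such a course might cover, here is a small plain-Python sketch that computes the GC content and the reverse complement of a DNA sequence. It is an illustration only, not material from the actual course, and the example sequence is made up. Biopython's `Bio.Seq.Seq` class offers a `reverse_complement()` method for the same task, but the standard library is enough here.

```python
# Translation table mapping each nucleotide to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


seq = "ATGCGTAC"  # made-up example sequence
rc = reverse_complement(seq)
gc = gc_content(seq)
```
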
  44. The idea is not to be experts on computers, but to have the concepts and language to work in interdisciplinary groups
  45. Let's start learning Data Science. To test these ideas, next week we start an Introduction to Data Science Workshop. The mathematical tools can be explored together with the biological context, so they make sense and are easier to learn. I will give you a link at the end of this talk. If you are interested, visit the webpage and send an email. After all, maybe I'm just crazy.
  46. "Every normal student is capable of good mathematical reasoning if attention is directed to activities of his interest." Jean Piaget, 1976, Swiss psychologist and philosopher
  47. A secret: you can also learn at home. Everything we will show is available on the Internet. You just need to look for it. But it is in English. Translation takes too long. Translated science is obsolete science.
  48. The Future: my personal prediction
  49. "It is hard to make predictions, especially about the future." Danish proverb
  50. Molecular Biology has become mainstream. Genomic tools are also used outside academia. Several companies provide "personalized DNA services": 23andMe, partially owned by Google; the Genographic Project, created by the National Geographic Society and IBM. Both offer to trace the ancestry and migrations of the human population. Anyone can find out their true origins.
  51. Molecular Biology will follow the path of computers. Today PCR thermocyclers are expensive devices found in universities and research centers, very much like desktop computers were in the 70's and 80's. Nowadays computers are low-cost and found everywhere. Will the same happen with PCR?
  52. PCR future. Today only a few companies produce PCR thermocyclers, just as only a few produce smartphones such as the iPhone or Samsung's. Nevertheless you can see smartphones everywhere, and this is a big opportunity for creators of software applications. The value is in the apps. Ask Nokia or Blackberry.
  53. "A computer on every desk and in every home, all running Microsoft software." Bill Gates, Microsoft's founding mission.
  54. PCR is the new PC. Gates set this goal in the late 70's, when it was not obvious that people would even see a computer in their lives. PCR technology is now in the same state that Personal Computers were in 1975. If PCR machines become inexpensive, and there is "a PCR on every desk and home", in hospitals, restaurants and high schools, then who will be making "software apps" for them?
  55. If PCR machines are available everywhere, applications can be: determining ancestry (e.g. race horses, farm animals, fishes); detection of unwanted organisms; marker-assisted breeding; food quality control (e.g. in a university canteen); security and control of Genetically Modified Organisms; polymorphism detection; clinical diagnosis; personalized medicine; police forensic analysis.
  56. Software for PCR: the specific parameters of an application, that is, DNA extraction protocols; primer design; amplification protocols; detection methods. I think we should prepare our students to make these "apps". They should have easy access to low-cost thermocyclers and use them frequently and creatively. Then, like in the computer industry, they may create completely new applications that we cannot foresee now.
  57. New tools for new science
  58. New instruments trigger advances in Molecular Biology and in other sciences. They are usually named after their inventor. Galileo created modern science when he made his own telescope; Newton also invented a new kind of telescope, still used today; Bunsen enabled spectrometry analysis with his burner; Svedberg: ultracentrifuge (16S); Sanger: DNA sequencing method; Southern: blot method for specific DNA detection; PCR: to amplify DNA samples.
  59. Scientific Instrumentation. I propose to create a course on "Scientific Instrumentation", initially using software tools. Making instruments is now "software", not craftsmanship. We can understand this with a biological analogy: designs in digital files are like genes; 3D printers are like ribosomes, producing physical versions of the design; online collaboration is like evolution, as designs are changed to improve their fitness.
  60. It is not rocket science
  61. It is not heart surgery
  62. Teşekkür ederim (thank you). andres.aravena@istanbul.edu.tr
  63. http://anaraven.github.io/data-science-workshop/
