Molecular Biology in the Information Era
Winter School 2015
Andrés Aravena, PhD - Istanbul University
Department of Molecular Biology and Genetics - 7 March 2015
My name is Andrés Aravena
Türkçe bilmiyorum! (I don't speak Turkish)
I am
New Assistant Professor at the Department of Molecular Biology and Genetics
Mathematical Engineer, U. of Chile
PhD Informatics, U. Rennes 1, France
PhD Mathematical Modeling, U. of Chile
not a Biologist
but an Applied Mathematician who can speak "biologist language"
I will speak about
The Past, Present and Future
Facts, opinions and guesses
What I've done before
so you can understand why I'm here
What I'm doing now at Istanbul University
What I foresee from my "outsider" point of view
I've worked on
Big and small computers
Telecommunication Networks
Between 2003 and 2014 I was the chief research engineer
of the main bioinformatics group
in the top research center (CMM)
of the top university (University of Chile)
in my country
I come from Chile
Chile
Small country of ~17 million people
University rankings similar to those of Turkish universities
Spanish colony 500 years ago (so the language is Spanish)
Independent Republic 200 years ago
First Latin American country to recognize the Turkish Republic
OECD member
Everyday life very similar to Turkey
Chilean Economy: Exports
World's largest producer of copper
World's second-largest producer of salmon
Fruits: peaches, grapes, apples, avocados
Wine: exported worldwide
Official data for 2014
The natural question was
How can we improve these industries using Molecular Biology and Bioinformatics?
Fruits
Peach and Grapes
Gene expression analysis for industrial applications:
Peach: response to cold stress
Grape: development related to seed and grape size (Sultaniye)
Fish
Salmon
Farmed salmon are fed cheap vegetable protein
But wild salmon eat animal protein
How is the salmon's metabolism affected by the diet?
Which genes change their expression because of the change in food?
Gene expression analysis using microarrays
Fish selection for breeding using microarrays (patent pending)
Fish
Salmon Genomic Sequence
... and sequencing of the whole Salmo salar genome
(a 10-million-dollar project)
Wine
Chilean wine travels long distances to final markets
Any yeast contamination means big economic losses
(people stop buying all Chilean brands)
Quality control is usually done by growing samples for 3 days
But time is expensive: there are penalties for shipping delays
We designed a qPCR method for rapid detection of yeast contamination
It is currently used by one major wine producer in Chile. It may be sold to Roche.
Mining industry
Molecular biology to extract copper
A little chemistry:
Copper is part of a compound, with sulfur and iron.
Ferric iron (Fe3+) in acid solution separates it.
Cu2S + 4Fe3+ → 2Cu2+ + 4Fe2+ + S
The resulting Cu2+ is soluble and is recovered.
But all the Fe3+ is transformed into Fe2+ and the reaction stops
There are bacteria that "eat" e- and keep the reaction going
Fe2+ → Fe3+ + e-
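As a quick sanity check, the two reactions on this slide can be tested for mass and charge balance. The sketch below is only an illustration: the species table and the function names are assumptions made for this example, not part of the original work.

```python
from collections import Counter

# Each species maps to (element counts, electric charge).
# Hypothetical table written only for this example.
SPECIES = {
    "Cu2S": ({"Cu": 2, "S": 1}, 0),
    "Fe3+": ({"Fe": 1}, +3),
    "Fe2+": ({"Fe": 1}, +2),
    "Cu2+": ({"Cu": 1}, +2),
    "S": ({"S": 1}, 0),
    "e-": ({}, -1),
}

def totals(side):
    """Return (element totals, total charge) for a list of (coefficient, species)."""
    atoms = Counter()
    charge = 0
    for coeff, name in side:
        elements, q = SPECIES[name]
        for element, n in elements.items():
            atoms[element] += coeff * n
        charge += coeff * q
    return atoms, charge

def is_balanced(reactants, products):
    """A reaction is balanced when atoms and charge agree on both sides."""
    return totals(reactants) == totals(products)

# Chemical leaching step: Cu2S + 4Fe3+ -> 2Cu2+ + 4Fe2+ + S
leaching = ([(1, "Cu2S"), (4, "Fe3+")], [(2, "Cu2+"), (4, "Fe2+"), (1, "S")])
# Bacterial step that regenerates ferric iron: Fe2+ -> Fe3+ + e-
regeneration = ([(1, "Fe2+")], [(1, "Fe3+"), (1, "e-")])

print(is_balanced(*leaching), is_balanced(*regeneration))
```

Four bacterial regeneration steps return the four Fe3+ ions consumed by one leaching step, which is why the bacteria keep the reaction going.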
Why is it important?
The biological method is much better than the standard one
Enables building new mines
It is like discovering oil reserves for the country
Reduced pollution
Cheaper
The goal is to understand and improve the bacteria involved so this technology can be used extensively
Most of the results are still industrial secrets
We had a research contract with the main mining company
State-owned, big enough to pay for long-term research
Few papers, many patents
Bioidentification
Monitoring the presence of good bacteria
We need to control the "ecosystem" in the mine
Molecular Biology methods are fast, sensitive and reliable
They can be used in situ: a metagenomic approach, with no culturing
Key problem: design probes that match a taxonomic branch, not a specific strain
The probes should be tolerant to mutations that occur in environmental samples with many strains
Classical tools don't work at large scale
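The idea of a probe that matches a whole taxonomic branch while tolerating mutations can be illustrated with a toy example. This is a minimal sketch, not the method actually used: the sequences are invented, the function names are hypothetical, and real probe design also has to consider hybridization thermodynamics.

```python
def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def probe_hits(probe, sequence, max_mismatches=1):
    """True if the probe matches some window of the sequence within the mismatch budget."""
    k = len(probe)
    return any(
        hamming(probe, sequence[i:i + k]) <= max_mismatches
        for i in range(len(sequence) - k + 1)
    )

def coverage(probe, strains, max_mismatches=1):
    """Fraction of the strains in a group that the probe would detect."""
    hits = sum(probe_hits(probe, s, max_mismatches) for s in strains)
    return hits / len(strains)

# Toy sequences, invented for illustration only.
target_group = ["ACGTTGCAAGGCT", "ACGTTGCTAGGCT", "TTACGTTGCAAGG"]
other_group = ["GGGCCCAAATTTG", "CATCATCATCATC"]
probe = "GTTGCAAG"

print(coverage(probe, target_group), coverage(probe, other_group))
```

A good probe has high coverage on the target branch and low coverage everywhere else. Scanning every candidate probe against every known sequence in this brute-force way is what makes the problem expensive at large scale.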
Design of probes for complex samples
I designed and built a solution using a super-computer
The calculation took one day on 32 processors (about one processor-month)
The resulting probes worked as expected
They can be used in qPCR or on microarrays.
Automatic Interpretation of Results
using a Statistical Classification Model
Publications
The microarray was published in
N. Ehrenfeld, A. Aravena, A. Reyes-Jara, N. Barreto, R. Assar, A. Maass, P. Parada, Design and use of oligonucleotide microarrays for identification of Biomining microorganisms. Advanced Materials Research 71-73 (2009) 155-158.
Patents
The method and the probes have been patented in
USA, Number: US 7 853 408 B2, Date: 14/12/2010;
South Africa, Number: 2006/06828, Date: 26/03/2008;
Australia, Number: 2006203551, Date: 15/09/2011;
Mexico, Number: PXMX 32/2006, Date: November 2012;
Peru, Number: PE 5838, Date: 29/10/2010;
China, Number: 200810095172.6, Date: 2013;
Chile, Number: DPI-660-2007, Date: 06/05/2013;
Argentina, Number: AR056179
Functional genomics
How do the bacteria work?
To improve the process we need to see inside the black box. We sequenced the complete genomes of 3 bacteria
Acidithiobacillus ferrooxidans
Acidithiobacillus thiooxidans
Leptospirillum ferrooxidans
We paid over USD 150K. Today it costs USD 5K
Hint: Sequence assembly requires a big computer. It does not work on a regular PC
Modeling Metabolism
We predict which genes code for enzymes
Each enzyme catalyzes a reaction, with a known stoichiometry
Every reaction gives an equation
All the equations plus boundary conditions give a model that predicts metabolite concentrations
We can predict how the cell adapts to environmental changes
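The step from "every reaction gives an equation" to a model can be sketched with a stoichiometric matrix S, where the rate of change of each metabolite is S·v for a vector v of reaction fluxes. The three-reaction network below is a hypothetical toy, chosen only to show the bookkeeping. It is not one of the bacterial models described in the talk.

```python
# Rows are metabolites, columns are reactions (hypothetical toy network):
#   R1: (outside) -> A,   R2: A -> B,   R3: B -> (outside)
METABOLITES = ["A", "B"]
S = [
    [1, -1, 0],   # A is produced by R1 and consumed by R2
    [0, 1, -1],   # B is produced by R2 and consumed by R3
]

def concentration_change(S, fluxes):
    """Rate of change of each metabolite: dC/dt = S . v"""
    return [sum(s * v for s, v in zip(row, fluxes)) for row in S]

def is_steady_state(S, fluxes, tol=1e-9):
    """True when no metabolite accumulates or is depleted."""
    return all(abs(rate) <= tol for rate in concentration_change(S, fluxes))

balanced = [2.0, 2.0, 2.0]     # uptake, conversion and export all equal
bottleneck = [2.0, 1.0, 1.0]   # conversion slower than uptake: A accumulates

for name, v in [("balanced", balanced), ("bottleneck", bottleneck)]:
    rates = dict(zip(METABOLITES, concentration_change(S, v)))
    print(name, rates, is_steady_state(S, v))
```

The boundary conditions mentioned on the slide correspond here to fixing the uptake and export fluxes. Genome-scale models follow the same principle with thousands of reactions.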
Modeling Regulation
From the genome sequence we can predict which genes code for transcription factors and where they bind
They form a putative regulatory network.
But current methods produce too many false positives
We expected ~4K regulations. We got 25K regulations.
I integrate this model with microarray data to find the "most probable" regulatory network using a parsimony criterion
Systems Biology
beyond Bioinformatics
A very active research area that aims to understand the cell as a system with complex interactions
The focus is not on the genes but on the genome
The key is to understand networks
regulatory
metabolic
signaling
protein-protein interaction
The present
Why Computers in Molecular Biology and Genetics?
DNA is digital information
All experimental values in science are measured with an observational error.
(e.g. temperature is 10.2 ± 0.05°C, pressure is 101215 ± 125 Pa)
Except genetic sequences: nucleotides are either A, C, T or G.
There is no "average" or "intermediate case"
So it is natural to use computers and information theory to model DNA
but there is another reason ...
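The "digital" nature of DNA can be made concrete with a standard information-theory measure. With four possible symbols, a nucleotide carries at most 2 bits, and the Shannon entropy of a sequence tells how close it comes to that limit. The function name below is an assumption made for this sketch.

```python
from collections import Counter
from math import log2

def entropy_bits(sequence):
    """Shannon entropy, in bits per symbol, of the observed symbol frequencies."""
    counts = Counter(sequence)
    n = len(sequence)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Toy sequences, invented for illustration only.
for seq in ["ACGTACGTACGT", "AAAAAAAAAAAT", "AAAAAAAAAAAA"]:
    print(seq, entropy_bits(seq))
```

A sequence that uses the four nucleotides equally often reaches the 2-bit maximum, while a repetitive sequence carries much less information per position.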
Science converges to Molecular Biology
Physicists, mathematicians, computer scientists and engineers have turned their attention to molecular biology questions.
They come looking with new eyes and creating new theoretical and practical tools.
Molecular Biology has always interacted with other disciplines
Just consider the word "Biochemistry"
Internet makes Molecular Biology theory accessible to more people
Before Internet times
top science was accessible only to researchers with money to run complex experiments or buy expensive books and journals
finding references took several weeks by regular mail
professors had the only copy of the textbooks
Today
all journals are accessible online
references are downloaded in minutes at low cost (free when the article is Open Access)
the experimental results of each article are also free
Anyone can analyze this data
Structured data is easy to process to discover new knowledge.
The software for this meta-analysis is also Open Source
Scientists can adapt the programs' internal code to answer their specific questions
Anyone can download these programs at no cost.
If the analysis requires big computational power you can rent it at low cost
You don't need your own super-computer
You can rent Cloud computers
Companies like Amazon.com and Google sell their spare computer power at low prices
This enables researchers to carry out computations that would be impossible otherwise.
The World is Flat
This democratization of knowledge provides an exciting challenge.
Rich countries no longer have a monopoly on knowledge.
We can be players in the big leagues, on a level playing field.
We can read the same books and the same articles, use the same machines and the same programs.
Anyone could make the next scientific breakthrough, whether in New York, New Delhi or Istanbul.
But the same opportunity is open to everyone else.
There are more PhD students than ever
And many of them will be in Molecular Biology
Cyranoski et al. 2011. “Education: The PhD Factory.” Nature 472: 276–79.
More players come to the game
Emerging economies push up the number of researchers worldwide
India graduates more than a million engineers each year, many of them in biotechnology
Egypt has 35,000 PhD students and Israel 10,000.
Many of them will find jobs in Molecular Biology companies or academia
Hays, Thomas. 2011. “PhDs: Israel Also Trains Plenty.” Nature 473 (7347): 284.
How will we be different?
Success of Molecular Biology generates Big Data
Advances in molecular biology technology have produced
new-generation sequencers
microarrays
mass spectrometers
real-time PCR
They produce
reproducible experimental results
in big volumes
at low cost
The cost of producing data is falling
National Human Genome Research Institute. http://genome.gov/sequencingcosts
Extracting Information from Raw Data
Surviving the Data Tsunami
In a few years we went from a lack of data to an excess of it
We need to learn how to extract biological meaning from big volumes of data
Classical methods are not enough
What is significant? What is the "null hypothesis"?
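One way to make the question "what is significant?" concrete is a permutation test, where the null hypothesis is that the group labels do not matter. The sketch below is a generic illustration with invented measurements. It is offered as one possible approach, not as the method advocated in the talk, and the function name and default parameters are assumptions.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Estimate how often a random relabeling gives a mean difference
    at least as large as the observed one (two-sided)."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Invented expression values for one gene under two conditions.
control = [10.1, 10.3, 9.9, 10.2, 10.0, 10.4]
treated = [12.0, 12.2, 11.9, 12.1, 12.3, 11.8]

print(permutation_p_value(control, treated))
print(permutation_p_value(control, control))
```

When thousands of genes are tested at once, as on a microarray, some small p-values will appear by chance alone, so the results also need a correction for multiple testing.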
If we don't fully analyze our own experimental data, someone else will
And they will publish it
The plan
what we will teach
Teaching "Introduction to Data Science"
The students will learn
how to handle experimental data
how to communicate with scientists of other data-oriented disciplines
how to produce publication-quality reports with reproducible results
how to get raw data, extract relevant information and filter it using several selection criteria
how to store and retrieve it in efficient and useful ways
how to transform it, organize it, categorize it, display it and understand the results
Teaching "Scientific Computing"
Teach Python and BioPython to analyze, model, evaluate and predict the behavior of genomic and molecular biology entities.
The students should be able to interact with high-end servers, use command line tools and be comfortable in computing environments other than Microsoft Windows.
Tools include Unix command line tools, SQL and the R statistical package.
The students should be able to understand how computer networks work and what their limitations are.
The idea is not to be experts in computers, but to have the concepts and language to work in interdisciplinary groups
Let's start learning Data Science
To test these ideas, next week we start an Introduction to Data Science Workshop
The mathematical tools can be explored together with the biological context, so they make sense and are easier to learn.
I will give you a link at the end of this talk.
If you are interested, visit the webpage and send an email.
After all, maybe I'm just crazy
“Every normal student is capable of good mathematical reasoning if attention is directed to activities of his interest”
Jean Piaget, 1976
Swiss psychologist and philosopher
A Secret
You can also learn at home
Everything we will show is available on the Internet
You just need to look for it
But it is in English
Translation takes too long
Translated science is obsolete science
The Future
My personal prediction
“It is hard to make predictions, especially about the future”
Danish proverb
Molecular Biology has become mainstream
Genomic tools are also used outside academia.
Several companies provide "personalized DNA services".
23andMe, partially owned by Google.
The Genographic Project, created by the National Geographic Society and IBM.
Both offer to trace ancestry and migrations of the human population.
Anyone can find out what their true origins are.
Molecular Biology will follow the path of computers
Today PCR thermocyclers are expensive devices found in universities and research centers, very much like desktop computers were in the 70's and 80's.
Nowadays computers are low-cost and found everywhere.
Will the same happen with PCR?
PCR future
Today only a few companies produce PCR thermocyclers, just as only a few companies produce smartphones, such as Apple with the iPhone and Samsung.
Nevertheless you can see smartphones everywhere.
And this is a big opportunity for creators of software applications.
The value is in the apps. Ask Nokia or Blackberry
“A computer on every desk and in every home, all running Microsoft software”
Bill Gates, Microsoft’s founding mission.
PCR is the new PC
Gates set this goal in the late 70's, when it was not obvious if people would even see a computer in their lives.
PCR technology is now in the same state that Personal Computers were in 1975.
If PCR machines become inexpensive, and there is "a PCR on every desk and home", in hospitals, restaurants and high schools, then who will be making "software apps" for them?
If PCR machines are available everywhere
applications can be:
Determining ancestry (e.g. race horses, farm animals, fish)
Detection of unwanted organisms
Marker-assisted breeding
Food quality control (e.g. in a university canteen)
Security and control of Genetically Modified Organisms
Polymorphism detection
Clinical diagnosis
Personalized medicine
Police forensic analysis
Software for PCR
the specific parameters of an application:
DNA extraction protocols
Primer design
Amplification protocols
Detection methods
I think we should prepare our students to make these "apps".
They should have easy access to low-cost thermocyclers and use them frequently and creatively.
Then, like in the computer industry, they may create completely new applications that we cannot foresee now.
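To give a flavor of what a PCR "app" might compute, here is a minimal sketch of a primer sanity check. It uses GC content and the Wallace rule of thumb, Tm = 2·(A+T) + 4·(G+C) °C, which is only a rough estimate for short oligonucleotides. The length and GC thresholds are commonly quoted guidelines taken here as assumptions, and the primer sequences are invented.

```python
def gc_content(primer):
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)

def wallace_tm(primer):
    """Rough melting temperature in degrees Celsius (Wallace rule of thumb)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

def looks_reasonable(primer, min_len=18, max_len=25, min_gc=0.4, max_gc=0.6):
    """Check length and GC content against the assumed guideline ranges."""
    return (min_len <= len(primer) <= max_len
            and min_gc <= gc_content(primer) <= max_gc)

# Invented candidate primers, for illustration only.
for candidate in ["AGCTGACCTGAAGCTGATCC", "ATATATATATATATATATAT"]:
    print(candidate, gc_content(candidate), wallace_tm(candidate),
          looks_reasonable(candidate))
```

A real application would also check primer specificity against the target genome and pair forward and reverse primers with similar melting temperatures.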
New tools for new science
New instruments trigger advances in Molecular Biology
and in other sciences
They are usually named after their inventor
Galileo created modern science when he made his own telescope
Newton also invented a new kind of telescope, still used today
Bunsen enabled spectrometry analysis with his burner
Svedberg ultracentrifuge (16S)
Sanger DNA sequencing method
Southern blot method for specific DNA detection
PCR to amplify DNA samples
Scientific Instrumentation
I propose to create a course on "Scientific Instrumentation", initially using software tools.
Making instruments is now "software", not craftsmanship.
We can understand this with a biological analogy.
Designs in digital files are like genes.
3D printers are like ribosomes, producing physical versions of the design.
Online collaboration is like evolution: designs are changed to improve their fitness.
It is not rocket science
It is not heart surgery
Teşekkür ederim (Thank you)
andres.aravena@istanbul.edu.tr
http://anaraven.github.io/data-science-workshop/

More Related Content

Similar to Molecular biology in the information era

SMi Group's 3d Cell Culture 2019 conference
SMi Group's 3d Cell Culture 2019 conferenceSMi Group's 3d Cell Culture 2019 conference
SMi Group's 3d Cell Culture 2019 conferenceDale Butler
 
Conference-The-future-will-be-digital-and-biology-but who-will-lead-watson-go...
Conference-The-future-will-be-digital-and-biology-but who-will-lead-watson-go...Conference-The-future-will-be-digital-and-biology-but who-will-lead-watson-go...
Conference-The-future-will-be-digital-and-biology-but who-will-lead-watson-go...Manuel GEA - Bio-Modeling Systems
 
Emerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsEmerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsmikaelhuss
 
China Medical University Student ePaper2
China Medical University Student ePaper2China Medical University Student ePaper2
China Medical University Student ePaper2Isabelle Chiu
 
Deep learning for biomedical discovery and data mining I
Deep learning for biomedical discovery and data mining IDeep learning for biomedical discovery and data mining I
Deep learning for biomedical discovery and data mining IDeakin University
 
Synthetic biology, Artificial intelligence, quantum computing - in genetics
Synthetic biology, Artificial intelligence, quantum computing - in geneticsSynthetic biology, Artificial intelligence, quantum computing - in genetics
Synthetic biology, Artificial intelligence, quantum computing - in geneticsSUMESHM13
 
Biochips seminar report
Biochips seminar reportBiochips seminar report
Biochips seminar reportGolam Murshid
 
Newsletter 224
Newsletter 224Newsletter 224
Newsletter 224ESTHHUB
 
Hartung - Lush Prize Conference 2014
Hartung - Lush Prize Conference 2014Hartung - Lush Prize Conference 2014
Hartung - Lush Prize Conference 2014LushPrize
 
BEACON 101: Sequencing tech
BEACON 101: Sequencing techBEACON 101: Sequencing tech
BEACON 101: Sequencing techc.titus.brown
 
Biomaterials & Tissue engineering - London - Agenda
Biomaterials & Tissue engineering - London - AgendaBiomaterials & Tissue engineering - London - Agenda
Biomaterials & Tissue engineering - London - AgendaTony Couch
 
Biocomputing
BiocomputingBiocomputing
Biocomputingijtsrd
 
application_of_bioinformatics_in_various_fields.ppt
application_of_bioinformatics_in_various_fields.pptapplication_of_bioinformatics_in_various_fields.ppt
application_of_bioinformatics_in_various_fields.pptshankjunk
 
application_of_bioinformatics_in_various_fields.ppt
application_of_bioinformatics_in_various_fields.pptapplication_of_bioinformatics_in_various_fields.ppt
application_of_bioinformatics_in_various_fields.pptshankjunk
 
Using Healthcare Data for Research @ The Hyve - Campus Party 2016
Using Healthcare Data for Research @ The Hyve - Campus Party 2016Using Healthcare Data for Research @ The Hyve - Campus Party 2016
Using Healthcare Data for Research @ The Hyve - Campus Party 2016Kees van Bochove
 
Essay On College Education. 24 Greatest College Essay Examples RedlineSP
Essay On College Education. 24 Greatest College Essay Examples  RedlineSPEssay On College Education. 24 Greatest College Essay Examples  RedlineSP
Essay On College Education. 24 Greatest College Essay Examples RedlineSPMelissa Otero
 
MNTL000_2016 Review 8_RVSD
MNTL000_2016 Review 8_RVSDMNTL000_2016 Review 8_RVSD
MNTL000_2016 Review 8_RVSDJonathan Lin
 
Pathology is being disrupted by Data Integration, AI & Blockchain
Pathology is being disrupted by Data Integration, AI & BlockchainPathology is being disrupted by Data Integration, AI & Blockchain
Pathology is being disrupted by Data Integration, AI & BlockchainNatalio Krasnogor
 

Similar to Molecular biology in the information era (20)

SMi Group's 3d Cell Culture 2019 conference
SMi Group's 3d Cell Culture 2019 conferenceSMi Group's 3d Cell Culture 2019 conference
SMi Group's 3d Cell Culture 2019 conference
 
Conference-The-future-will-be-digital-and-biology-but who-will-lead-watson-go...
Conference-The-future-will-be-digital-and-biology-but who-will-lead-watson-go...Conference-The-future-will-be-digital-and-biology-but who-will-lead-watson-go...
Conference-The-future-will-be-digital-and-biology-but who-will-lead-watson-go...
 
Emerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomicsEmerging challenges in data-intensive genomics
Emerging challenges in data-intensive genomics
 
China Medical University Student ePaper2
China Medical University Student ePaper2China Medical University Student ePaper2
China Medical University Student ePaper2
 
Deep learning for biomedical discovery and data mining I
Deep learning for biomedical discovery and data mining IDeep learning for biomedical discovery and data mining I
Deep learning for biomedical discovery and data mining I
 
Synthetic biology, Artificial intelligence, quantum computing - in genetics
Synthetic biology, Artificial intelligence, quantum computing - in geneticsSynthetic biology, Artificial intelligence, quantum computing - in genetics
Synthetic biology, Artificial intelligence, quantum computing - in genetics
 
Biochips seminar report
Biochips seminar reportBiochips seminar report
Biochips seminar report
 
Newsletter 224
Newsletter 224Newsletter 224
Newsletter 224
 
Hartung - Lush Prize Conference 2014
Hartung - Lush Prize Conference 2014Hartung - Lush Prize Conference 2014
Hartung - Lush Prize Conference 2014
 
Introduction
IntroductionIntroduction
Introduction
 
BEACON 101: Sequencing tech
BEACON 101: Sequencing techBEACON 101: Sequencing tech
BEACON 101: Sequencing tech
 
Biomaterials & Tissue engineering - London - Agenda
Biomaterials & Tissue engineering - London - AgendaBiomaterials & Tissue engineering - London - Agenda
Biomaterials & Tissue engineering - London - Agenda
 
Biocomputing
BiocomputingBiocomputing
Biocomputing
 
application_of_bioinformatics_in_various_fields.ppt
application_of_bioinformatics_in_various_fields.pptapplication_of_bioinformatics_in_various_fields.ppt
application_of_bioinformatics_in_various_fields.ppt
 
application_of_bioinformatics_in_various_fields.ppt
application_of_bioinformatics_in_various_fields.pptapplication_of_bioinformatics_in_various_fields.ppt
application_of_bioinformatics_in_various_fields.ppt
 
MicrofluidicsCongressAgenda
MicrofluidicsCongressAgendaMicrofluidicsCongressAgenda
MicrofluidicsCongressAgenda
 
Using Healthcare Data for Research @ The Hyve - Campus Party 2016
Using Healthcare Data for Research @ The Hyve - Campus Party 2016Using Healthcare Data for Research @ The Hyve - Campus Party 2016
Using Healthcare Data for Research @ The Hyve - Campus Party 2016
 
Essay On College Education. 24 Greatest College Essay Examples RedlineSP
Essay On College Education. 24 Greatest College Essay Examples  RedlineSPEssay On College Education. 24 Greatest College Essay Examples  RedlineSP
Essay On College Education. 24 Greatest College Essay Examples RedlineSP
 
MNTL000_2016 Review 8_RVSD
MNTL000_2016 Review 8_RVSDMNTL000_2016 Review 8_RVSD
MNTL000_2016 Review 8_RVSD
 
Pathology is being disrupted by Data Integration, AI & Blockchain
Pathology is being disrupted by Data Integration, AI & BlockchainPathology is being disrupted by Data Integration, AI & Blockchain
Pathology is being disrupted by Data Integration, AI & Blockchain
 

Recently uploaded

DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 

Recently uploaded (20)

DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 

Molecular biology in the information era

  • 1. Molecular Biology in the Information Era. WinterSchool2015. Andrés Aravena, PhD - Istanbul University, Department of Molecular Biology and Genetics - 7 March 2015
  • 2. My name is Andrés Aravena. Türkçe bilmiorum! ("I don't speak Turkish!") I am: a new Assistant Professor at the Molecular Biology and Genetics Department; Mathematical Engineer, U. of Chile; PhD Informatics, U. Rennes 1, France; PhD Mathematical Modeling, U. of Chile; not a Biologist, but an Applied Mathematician who can speak "biologist language".
  • 3. I will speak about the Past, Present and Future: facts, opinion and guess. What I've done before, so you can understand why I'm here. What I'm doing now at Istanbul University. What I foresee from my "outsider" point of view.
  • 4. I've worked on: big and small computers; telecommunication networks. Between 2003 and 2014 I was the chief research engineer of the main bioinformatics group in my country, in the top research center (CMM) of the top university (University of Chile).
  • 5. I come from Chile
  • 6. Chile: Small country of ~17 million people. University rankings similar to Turkish ones. Spanish colony 500 years ago (so the language is Spanish). Independent Republic 200 years ago. First Latin American country to recognize the Turkish Republic. OECD member. Everyday life very similar to Turkey.
  • 7. Chilean Economy: Exports. 1st world producer of copper. 2nd world producer of salmon. Fruits: peaches, grapes, apples, avocado. Wine: exported worldwide. Official data for 2014.
  • 8. The natural question was: How can we improve these industries using Molecular Biology and Bioinformatics?
  • 9. Fruits: Peach and Grapes. Gene expression analysis for industrial applications. Peach: response to cold stress. Grape: development related to seed and grape size (Sultaniye).
  • 10. Fishes: Salmon. Farmed salmon are fed with cheap vegetable protein, but wild salmon eat animal protein. How is salmon's metabolism affected by the diet? Which genes change their expression because of the changes in food? Gene expression analysis using microarrays. Fish selection for breeding using microarrays (patent pending).
  • 11. Fishes: Salmon Genomic Sequence ... and sequencing of the whole Salmo salar genome (a 10 million dollar project).
  • 12. Wine: Chilean wine travels long distances to final markets. Any yeast contamination means big economic losses (people stop buying all Chilean brands). Quality control is usually done by growing samples for 3 days. But time is expensive: there is a penalty for shipping delays. We designed a qPCR method for rapid detection of yeast contamination. It is currently used by one major wine producer in Chile. It may be sold to Roche.
  • 13. Mining industry: molecular biology to extract copper. A little chemistry: Copper is part of a compound, with Sulfur and Iron. Ferric iron (Fe3+) separates it: Cu2S + 4Fe3+ → 2Cu2+ + 4Fe2+ + S. The resulting Cu2+ is soluble and is recovered. But all Fe3+ transforms to Fe2+ and the reaction stops. There are bacteria that "eat" e- and keep the reaction going: Fe2+ → Fe3+ + e-.
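
The leaching reaction on this slide can be checked by counting atoms and charges on each side. The following is a minimal Python sketch of that bookkeeping; the data layout and the helper name `totals` are illustrative assumptions and are not part of the talk.

```python
from collections import Counter

# Each species is (coefficient, atom counts, electric charge).
# Reaction from the slide: Cu2S + 4 Fe3+ -> 2 Cu2+ + 4 Fe2+ + S
reactants = [(1, {"Cu": 2, "S": 1}, 0), (4, {"Fe": 1}, +3)]
products = [(2, {"Cu": 1}, +2), (4, {"Fe": 1}, +2), (1, {"S": 1}, 0)]

def totals(side):
    """Return (atom counts, net charge) for one side of a reaction."""
    atoms, charge = Counter(), 0
    for coeff, formula, q in side:
        for element, n in formula.items():
            atoms[element] += coeff * n
        charge += coeff * q
    return atoms, charge

left_atoms, left_charge = totals(reactants)
right_atoms, right_charge = totals(products)

# The reaction is balanced when both atoms and charge agree.
balanced = left_atoms == right_atoms and left_charge == right_charge
print(balanced, left_charge, right_charge)
```

The four Fe3+ ions each accept one electron, which is why the reaction stalls once the ferric iron is used up unless something regenerates it.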
  • 14. Why is it important? The biological method is much better than the standard one: reduced contamination, cheaper. The goal is to understand and improve the involved bacteria so this technology can be used extensively. It enables building new mines. It is like discovering petrol reserves for the country.
  • 15. Most of the results are still industrial secrets. We had a research contract with the main mining company: state owned, big enough to pay for long term research. Few papers, many patents.
  • 16. Bioidentification: Monitoring the presence of good bacteria. We need to control the "ecosystem" in the mine. Molecular Biology methods are fast, sensitive and reliable. They can be used in place: a metagenomic approach, with no culture. Key problem: design probes that match a taxonomic branch, not a specific strain. The probes should be tolerant to mutations that occur in environmental samples with many strains. Classical tools don't work at big scales.
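
The requirement that a probe tolerates mutations across strains can be illustrated with a toy mismatch count. This is only a sketch under simple assumptions (substitutions only, a fixed mismatch tolerance, invented sequences); it is not the patented design method mentioned in the talk, and the function names are hypothetical.

```python
def mismatches(probe, window):
    """Count positions where two equal-length sequences differ (Hamming distance)."""
    return sum(a != b for a, b in zip(probe, window))

def probe_hits(probe, target, max_mismatches=1):
    """True if the probe matches some window of the target within the tolerance."""
    k = len(probe)
    return any(
        mismatches(probe, target[i:i + k]) <= max_mismatches
        for i in range(len(target) - k + 1)
    )

# Hypothetical sequences, invented for this illustration only.
strains = {
    "strain_A": "TTGACCGTAAGCTTAGGC",
    "strain_B": "TTGACCGTTAGCTTAGGC",  # one substitution relative to strain_A
    "outgroup": "GGCATATCCGGAATTCAA",
}
probe = "ACCGTAAGCT"

# A useful probe covers every strain of the branch and misses the outgroup.
covered = {name: probe_hits(probe, seq) for name, seq in strains.items()}
print(covered)
```

With a tolerance of one mismatch the probe covers both related strains and rejects the unrelated sequence. Scanning every candidate probe against every genome in this naive way grows quickly with data size, which is consistent with the remark that classical tools do not work at big scales.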
  • 17. Design of probes for complex samples: I designed and built a solution using a super-computer. The calculation took one day on 32 processors (about one processor-month). The resulting probes worked as expected. They can be used in qPCR or in microarrays.
  • 18. Automatic Interpretation of Results using a Statistical Classification Model
  • 19. Publications: The microarray was published in N. Ehrenfeld, A. Aravena, A. Reyes-Jara, N. Barreto, R. Assar, A. Maass, P. Parada, Design and use of oligonucleotide microarrays for identification of Biomining microorganisms. Advanced Materials Research 71-73 (2009) 155-158.
  • 20. Patents: The method and the probes have been patented in: USA, Number: US 7 853 408 B2, Date: 14/12/2010; South Africa, Number: 2006/06828, Date: 26/03/2008; Australia, Number: 2006203551, Date: 15/09/2011; Mexico, Number: PXMX 32/2006, Date: November 2012; Peru, Number: PE 5838, Date: 29/10/2010; China, Number: 200810095172.6, Date: 2013; Chile, Number: DPI-660-2007, Date: 06/05/2013; Argentina, Number: AR056179.
  • 21. Functional genomics: How do the bacteria work? To improve the process we need to see inside the black box. We sequenced the complete genome of 3 bacteria: Acidithiobacillus ferrooxidans, Acidithiobacillus thiooxidans, Leptospirillum ferrooxidans. We paid over USD $150K. Today it costs USD $5K. Hint: Sequence assembly requires a big computer. It does not work on a regular PC.
  • 22. Modeling Metabolism: We predict which genes code for enzymes. Each enzyme catalyzes a reaction, with a known stoichiometry. Every reaction gives an equation. All equations plus boundary conditions give a model to predict metabolite concentrations. We can predict how the cell adapts to environmental changes.
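
The step from "every reaction gives an equation" to a model can be sketched with a stoichiometric matrix S, where the rate of change of each metabolite is dx/dt = S·v for a vector of reaction fluxes v. The three-reaction network below is a hypothetical toy example, not one of the models from the talk.

```python
# Rows: metabolites A and B.
# Columns: reactions r1 (uptake -> A), r2 (A -> B), r3 (B -> export).
S = [
    [1, -1, 0],   # A is produced by r1 and consumed by r2
    [0, 1, -1],   # B is produced by r2 and consumed by r3
]

def concentration_change(S, fluxes):
    """Return dx/dt = S . v, the net rate of change of each metabolite."""
    return [sum(s * v for s, v in zip(row, fluxes)) for row in S]

# Equal fluxes through the chain: nothing accumulates (steady state).
steady = concentration_change(S, [2.0, 2.0, 2.0])

# Uptake faster than conversion: metabolite A accumulates.
accumulating = concentration_change(S, [2.0, 1.0, 1.0])

print(steady, accumulating)
```

Boundary conditions, such as limits on the uptake flux, then restrict which flux vectors are allowed, and that is what lets a model of this kind predict how the cell responds when the environment changes.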
  • 23. Modeling Regulation: From the genome sequence we can predict which genes code for transcription factors and where they bind. They form a putative regulatory network. But current methods produce too many false positives: we expected ~4K regulations and we got 25K. I integrate this model with microarray data to find the "most probable" regulatory network using a parsimony criterion.
  • 24. Systems Biology, beyond Bioinformatics: A very active research area that aims to understand the cell as a system with complex interactions. The focus is not on the genes, it is on the genome. The key is to understand networks: regulatory, metabolic, signaling, protein-protein interaction.
  • 25. The present: Why Computers in Molecular Biology and Genetics?
  • 26. DNA is digital information: All experimental values in science are measured with an observational error (e.g. temperature is 10.2 ± 0.05°C, pressure is 101215 ± 125 Pa). Except genetic sequences: nucleotides are either A, C, T or G. There is no "average" or "intermediate case". So it is natural to use computers and information theory to model DNA. But there is another reason ...
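
Because each nucleotide is one of exactly four symbols, a sequence can be stored with two bits per base and recovered without any error. The sketch below illustrates this; the particular bit assignment (A=00, C=01, G=10, T=11) and the function names are arbitrary choices made for this example.

```python
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASE = {bits: base for base, bits in CODE.items()}

def encode(seq):
    """Pack a DNA string into an integer, two bits per nucleotide."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value

def decode(value, length):
    """Recover the DNA string of the given length from its integer encoding."""
    bases = []
    for _ in range(length):
        bases.append(BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))

seq = "GATTACA"
packed = encode(seq)
restored = decode(packed, len(seq))

# The round trip is exact: there is no measurement error to propagate.
print(packed.bit_length() <= 2 * len(seq), restored == seq)
```

The length has to be stored alongside the integer, since leading "A" bases are encoded as zero bits and would otherwise be lost.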
  • 27.
  • 28. Science converges to Molecular Biology: Physicists, mathematicians, computer scientists and engineers turned their attention to molecular biology questions. They come looking with new eyes and creating new theoretical and practical tools. Molecular Biology has always interacted with other disciplines. Just consider the word "Biochemistry".
  • 29. Internet makes Molecular Biology theory accessible to more people. Before Internet times: top science was accessible only to researchers with money to make complex experiments or buy expensive books and journals; finding references took several weeks by regular mail; professors had the only copy of the textbooks.
  • 30. Today: all journals are accessible online; references are downloaded in minutes at low cost (free when the article is Open Access); experimental results of each article are also free.
  • 31. Anyone can analyze this data: Structured data is easy to process to discover new knowledge. The software for this meta-analysis is also Open Source. Scientists can adapt the program's internal code to solve their specific question. Anyone can download these programs without cost. If the analysis requires big computational power, you can rent it at low cost.
  • 32. You don't need your own super-computer: You can rent Cloud computers. Companies like Amazon.com and Google sell their spare computer power at low prices. This enables researchers to carry out computations that would be impossible otherwise.
  • 33. The World is Flat: This democratization of knowledge provides an exciting challenge. Rich countries no longer have the monopoly of knowledge. We can be players in the big leagues, on a level surface. We can read the same books and the same articles, use the same machines and the same programs. Anyone could make the new scientific breakthrough, either in New York, New Delhi or Istanbul. But the same opportunity presents itself to everyone else.
  • 34. There are more PhD students than ever, and many of them will be in Molecular Biology. Cyranoski et al. 2011. “Education: The PhD Factory.” Nature 472: 276–79.
  • 35. More players come to the game: Emerging economies push up the number of researchers worldwide. India graduates more than a million engineers each year, many of them in biotechnology. Egypt has 35,000 PhD students and Israel 10,000. Many of them will find jobs in Molecular Biology companies or academia. Hays, Thomas. 2011. “PhDs: Israel Also Trains Plenty.” Nature 473 (7347). Nature Publishing Group: 284–84.
  • 36. How will we be different?
  • 37. Success of Molecular Biology generates Big Data: Advances in molecular biology technology have produced new generation sequencers, microarrays, mass spectrometers and real-time PCR. They produce reproducible experimental results, in big volumes, at low cost.
  • 38. Data production cost is falling. National Human Genome Research Institute. http://genome.gov/sequencingcosts
  • 39. Extracting Information from Raw Data: Surviving the Data Tsunami. In a few years we passed from lack of data to excess of it. We need to learn how to extract biological meaning from big volumes of data. Classical methods are not enough. What is significant? What is the "null hypothesis"?
  • 40. If we don't fully analyze our own experimental data, someone else will. And they will publish it.
  • 41. The plan: what we will teach
  • 42. Teaching "Introduction to Data Science": The students will learn how to handle experimental data, how to communicate with scientists of other data-oriented disciplines, and how to produce publication quality reports with reproducible results. How to get raw data, extract relevant information, and filter it using several selection criteria. How to store and retrieve it in efficient and useful ways. How to transform it, organize it, categorize it, display it, and show and understand the results.
  • 43. Teaching "Scientific Computing": Teach Python and BioPython to analyze, model, evaluate and predict the behavior of genomic and molecular biology entities. The students should be able to interact with high end servers, use command line tools and be comfortable in computing environments other than Microsoft Windows. Tools include Unix command line tools, SQL and the R statistical package. The students should be able to understand how computer networks work and what their limitations are.
  • 44. The idea is not to be experts on computers, but to have the concepts and language to work in interdisciplinary groups
  • 45. Let's start learning Data Science: To test these ideas, next week we start an Introduction to Data Science Workshop. The mathematical tools can be explored together with the biological context, so they make sense and are easier to learn. I will give you a link at the end of this talk. If you are interested, visit the webpage and send an email. After all, maybe I'm just crazy.
  • 46. “Every normal student is capable of good mathematical reasoning if attention is directed to activities of his interest” (Jean Piaget, 1976, Swiss psychologist and philosopher)
  • 47. A Secret: You can also learn at home. Everything we will show is available on the Internet. You just need to look for it. But it is in English. Translation takes too long. Translated science is obsolete science.
  • 49. “It is hard to make predictions, especially about the future” (Danish proverb)
  • 50. Molecular Biology has become mainstream: Genomic tools are also used outside academia. Several companies provide "personalized DNA services": 23andMe, partially owned by Google, and the Genographic Project, created by the National Geographic Society and IBM. Both offer to trace ancestry and migrations of the human population. Any person can learn their true origins.
  • 51. Molecular Biology will follow the path of computers: Today PCR thermocyclers are expensive devices found in universities and research centers, very much like desktop computers were in the 70's and 80's. Nowadays computers are low-cost and found everywhere. Will the same happen with PCR?
  • 52. PCR future: Today only a few companies produce PCR thermocyclers, just as only a few produce smartphones such as the iPhone and Samsung. Nevertheless you can see smartphones everywhere. And this is a big opportunity for creators of software applications. The value is in the apps. Ask Nokia or Blackberry.
  • 53. “A computer on every desk and in every home, all running Microsoft software” (Bill Gates, Microsoft’s founding mission)
  • 54. PCR is the new PC: Gates set this goal in the late 70's, when it was not obvious if people would even see a computer in their lives. PCR technology is now in the same state that Personal Computers were in 1975. If PCR machines become inexpensive, and there is "a PCR on every desk and home", in hospitals, restaurants and high schools, then who will be making "software apps" for them?
  • 55. If PCR machines are available everywhere, applications can be: determining ancestry (e.g. race horses, farm animals, fishes); detection of unwanted organisms; marker-assisted breeding; food quality control (e.g. in a university canteen); security and control of Genetically Modified Organisms; polymorphism detection; clinical diagnosis; personalized medicine; police forensic analysis.
  • 56. Software for PCR: the specific parameters of an application, such as DNA extraction protocols, primer design, amplification protocols and detection methods. I think we should prepare our students to make these "apps". They should have easy access to low-cost thermocyclers, and use them frequently and creatively. Then, like in the computer industry, they may create completely new applications that we cannot foresee now.
  • 57. New tools for new science
  • 58. New Instruments trigger advances in Molecular Biology and in other sciences. They are usually named according to their inventor. Galileo created modern science when he made his own telescope. Newton also invented a new kind of telescope, still used today. Bunsen enabled spectrometry analysis with his burner. Svedberg: ultracentrifuge (16S). Sanger: DNA sequencing method. Southern: blot method for specific DNA detection. PCR: to amplify DNA samples.
  • 59. Scientific Instrumentation: I propose to create a course on "Scientific Instrumentation", initially using software tools. Making instruments is now "software", not craftsmanship. We can understand this with a biological analogy. Designs in digital files are like genes. 3D printers are like ribosomes, producing physical versions of the design. Online collaboration is like evolution: designs are changed to improve their fitness.
  • 60. It is not rocket science
  • 61. It is not heart surgery
  • 62.