Molecular Biology in the
Andrés Aravena, PhD - Istanbul University
Department of Molecular Biology and Genetics - 7 March 2015
My name is Andrés Aravena
I don't speak Turkish! (Türkçe bilmiyorum!)
New Assistant Professor at Molecular Biology and Genetics
Mathematical Engineer, U. of Chile
PhD Informatics, U Rennes 1, France
PhD Mathematical Modeling, U. of Chile
not a Biologist
but an Applied Mathematician who can speak "biologist language"
I will speak about
The Past, Present and Future
Facts, opinions and guesses
What I've done before
so you can understand why I'm here
What I'm doing now at Istanbul University
What I foresee from my "outsider" point of view
I've worked on
Big and small computers
Between 2003 and 2014 I was the chief research engineer
of the main bioinformatics group in my country
in the top research center (CMM)
in the top university (University of Chile)
of my country
Small country of ~17 million people
University rankings similar to Turkish ones
Spanish colony 500 years ago (so language is Spanish)
Independent Republic 200 years ago
First Latin American country to recognize the Turkish Republic
Everyday life very similar to Turkey
Chilean Economy: Exports
1st world producer of copper
2nd world producer of salmon
Fruits: peaches, grapes, apples,
Wine: exported worldwide
Official data for 2014
The natural question was
How can we improve these
using Molecular Biology and Bioinformatics?
Peach and Grapes
Gene expression analysis for industrial applications:
Peach: response to cold stress
Grape: development related to seed and grape size (Sultaniye)
Farmed salmon are fed with cheap vegetable protein
But wild salmon eat animal protein
How is the salmon's metabolism affected by the diet?
Which genes change their expression because of the changes in food?
Gene expression analysis using
Fish selection for breeding using
microarrays (patent pending)
Salmon Genomic Sequence
... and sequencing of whole Salmo salar genome
(10 million dollars project)
Chilean wine travels long distances to final markets
Any yeast contamination means big economic losses
(people stop buying all Chilean brands)
Quality control is usually done growing samples for 3 days
But time is expensive: penalty for shipping delays
We designed a qPCR method for rapid detection of yeast contamination
It is currently used by one major wine producer in Chile. It may be
sold to Roche.
molecular biology to extract copper
A little chemistry:
Copper is part of a compound, with Sulfur and Iron.
Ferric iron (Fe3+) separates it.
Cu2S + 4Fe3+ → 2Cu2+ + 4Fe2+ + S
Resulting Cu2+ is soluble and is recovered.
But eventually all the Fe3+ is reduced to Fe2+ and the reaction stops
There are bacteria that "eat" e- and keep the reaction going
Fe2+ → Fe3+ + e-
Why is it important?
The biological method is much better than the standard one
The goal is to understand and improve the involved bacteria so this
technology can be used extensively
Enables building new mines
It is like discovering petrol reserves for the country
Most of the results are still industrial secrets
We had a research contract with the main mining company
State owned, big enough to pay for long term research
Few papers, many patents
Monitoring the presence of good bacteria
We need to control the "ecosystem" on the mine
Molecular Biology methods are fast, sensitive and reliable
They can be used in situ: a metagenomic approach. No culture
Key problem: Design probes that match a taxonomic branch, not a
The probes should be tolerant to mutations that occur in
environmental samples with many strains
Classical tools don't work on big scales
Design of probes for complex samples
I designed and built a solution using a super-computer
The calculation took one day on 32 processors (about one processor-month)
Resulting probes worked as expected
They can be used in qPCR or on microarrays.
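To make the idea of a mutation-tolerant probe concrete, here is a minimal sketch. It is not the method used in the project: it only counts which target sequences a candidate probe hits when a small number of mismatches is allowed. The function names, the toy sequences and the mismatch threshold are all assumptions made for illustration.

```python
def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_mismatches(probe, target):
    """Smallest number of mismatches of the probe against any window of the target."""
    k = len(probe)
    if len(target) < k:
        return k  # the probe cannot fit; count every position as a mismatch
    return min(hamming(probe, target[i:i + k])
               for i in range(len(target) - k + 1))


def matches(probe, targets, max_mismatches=1):
    """Names of the targets hit by the probe within the given tolerance."""
    return [name for name, seq in targets.items()
            if best_mismatches(probe, seq) <= max_mismatches]


# Toy data (hypothetical): two strains of the wanted branch and one outsider.
targets = {
    "strain_A": "ACGTTGCAAGGCTTAC",
    "strain_B": "ACGTTGCTAGGCTTAC",  # one mutation relative to strain_A
    "outsider": "TTTTCCCCGGGGAAAA",
}
probe = "TTGCAAGGCT"

hits = matches(probe, targets, max_mismatches=1)
strict_hits = matches(probe, targets, max_mismatches=0)
print(hits)         # both strains of the branch
print(strict_hits)  # only the strain that matches exactly
```

A probe that requires an exact match misses strain_B, while a tolerant one covers the whole branch without hitting the outsider. Scanning every candidate probe against every genome in this brute-force way is what becomes too slow on big scales.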
Automatic Interpretation of Results
using a Statistical Classification Model
The microarray was published in
N. Ehrenfeld, A. Aravena, A. Reyes-Jara, N. Barreto, R. Assar, A. Maass,
P. Parada, Design and use of oligonucleotide microarrays for identification
of Biomining microorganisms. Advanced Materials Research 71-73
The method and the probes have been patented in
USA, Number: US 7 853 408 B2, Date: 14/12/2010;
South Africa, Number: 2006/06828, Date: 26/03/2008;
Australia, Number: 2006203551, Date: 15/09/2011;
Mexico, Number: PXMX 32/2006, Date: November 2012;
Peru, Number: PE 5838, Date: 29/10/2010;
China, Number: 200810095172.6, Date: 2013;
Chile, Number: DPI-660-2007, Date: 06/05/2013;
Argentina, Number: AR056179
How do the bacteria work?
To improve the process we need to see inside the black box. We
sequenced the complete genome of 3 bacteria
We paid over USD 150K. Today it costs USD 5K
Hint: Sequence assembly requires a big computer. It does not work
on a regular PC
We predict which genes code for enzymes
Each enzyme catalyzes a reaction,
with a known stoichiometry
Every reaction gives an equation
All equations plus boundary
conditions give a model to predict
We can predict how the cell adapts
to environmental changes
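The chain of ideas above (one equation per reaction, plus boundary conditions) can be sketched with a toy network. This is only an illustration, not the model built for the bacteria: the three reactions, the metabolite names and the flux values are hypothetical. At steady state the production and consumption of every internal metabolite must cancel, that is S·v = 0, where S is the stoichiometric matrix and v the vector of reaction rates.

```python
# Toy network (hypothetical):
#   R1: (outside) -> A    uptake
#   R2: A -> B            conversion
#   R3: B -> (outside)    secretion
reactions = {
    "R1": {"A": +1},
    "R2": {"A": -1, "B": +1},
    "R3": {"B": -1},
}
metabolites = ["A", "B"]
names = list(reactions)

# Stoichiometric matrix S: one row per metabolite, one column per reaction.
S = [[reactions[r].get(m, 0) for r in names] for m in metabolites]


def net_production(S, v):
    """Net rate of change of each metabolite for the flux vector v (S times v)."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S, v, tol=1e-9):
    """True when no metabolite accumulates or is depleted."""
    return all(abs(x) <= tol for x in net_production(S, v))


# Boundary condition (assumed): the uptake flux is 10 units.
balanced = [10, 10, 10]   # every metabolite is consumed as fast as it is produced
unbalanced = [10, 4, 4]   # metabolite A would accumulate

print(S)
print(is_steady_state(S, balanced), is_steady_state(S, unbalanced))
```

Genome-scale models follow the same principle with hundreds of reactions, and typically use linear programming to choose, among all steady states, the one that optimizes an objective such as growth.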
From the genome sequence we can predict which genes code for
transcription factors and where they bind
They form a putative regulatory network.
But current methods produce too many false positives
We expected ~4K regulations. We got 25K regulations.
I integrate this model with microarray data to find the "most
probable" regulatory network using a parsimony criterion
A very active research area that aims to understand the cell as a
system with complex interactions
The focus is not on the genes, it is on the genome
The key is to understand networks
Why Computers in Molecular
Biology and Genetics?
DNA is digital information
All experimental values in science are measured with an observational error
(e.g. temperature is 10.2 ± 0.05°C, pressure is 101215 ± 125 Pa)
Except genetic sequences: Nucleotides are either A, C, T or G.
There is no "average" or "intermediate case"
So it is natural to use computers and information theory to model DNA
but there is another reason ...
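As a small illustration of treating DNA as digital information, the sketch below computes the Shannon entropy of a sequence, in bits per nucleotide. With four equally likely symbols the maximum is 2 bits per nucleotide. The function name and the example sequences are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def entropy_bits(seq):
    """Shannon entropy of the symbol frequencies in seq, in bits per symbol."""
    counts = Counter(seq)
    n = len(seq)
    return sum(-(c / n) * log2(c / n) for c in counts.values())


uniform = "ACGT" * 5         # all four nucleotides equally frequent
biased = "AAAAAAAAAAAAAAAT"  # almost only A

print(entropy_bits(uniform))  # 2 bits: the maximum for a 4-letter alphabet
print(entropy_bits(biased))   # much lower: the sequence is highly predictable
```

Because each position holds exactly one of four symbols, such quantities are computed from exact counts, with no measurement error involved.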
Science converges to Molecular Biology
Physicists, mathematicians, computer scientists and engineers have turned
their attention to molecular biology questions.
They come looking with new eyes and creating new theoretical and
Molecular Biology has always interacted with other disciplines
Just consider the word "Biochemistry"
Internet makes Molecular Biology theory
accessible to more people
Before Internet times
top science was accessible only to researchers with money to
make complex experiments or
buy expensive books and journals
finding references took several weeks by regular mail
professors had the only copy of the textbooks
Now
all journals are accessible online
references are downloaded in minutes at low cost
experimental results of each article are also free (when the article is Open Access)
Anyone can analyze this data
Structured data is easy to process to discover new knowledge.
The software for this meta-analysis is also Open Source
Scientists can adapt the program's internal code to solve their specific
Anyone can download these programs without cost.
If the analysis requires big computational power you can rent it at low cost
You don't need your own super-computer
You can rent Cloud computers
Companies like Amazon.com and Google sell their spare computer
power at low prices
This enables researchers to carry out computations that would be
The World is Flat
This democratization of knowledge provides an exciting challenge.
Rich countries no longer have a monopoly on knowledge.
We can be players in the big leagues, on a level playing field.
We can read the same books and the same articles, use the same
machines and the same programs.
Anyone could make the new scientific breakthrough, either in New
York, New Delhi or Istanbul.
But the same opportunity presents itself to everyone else.
There are more PhD students than ever
And many of them will be in Molecular Biology
Cyranoski et al. 2011. “Education: The PhD Factory.” Nature 472: 276–79.
More players come to the game
Emerging economies push up the number of researchers worldwide
India graduates more than a million engineers each year. Many of
them in biotechnology
Egypt has 35,000 PhD students and Israel 10,000.
Many of them will find jobs in Molecular Biology companies or
Hays, Thomas. 2011. “PhDs: Israel Also Trains Plenty.” Nature 473 (7347). Nature Publishing Group: 284–84.
Success of Molecular Biology generates Big Data
Advances in molecular biology technology have produced
new generation sequencers
reproducible experimental results
in big volumes
at low cost
Data production cost is falling
National Human Genome Research Institute. http://genome.gov/sequencingcosts
Extracting Information from Raw Data
Surviving the Data Tsunami
In a few years we went from a lack of data to an excess of it
We need to learn how to extract biological meaning from big volumes
Classical methods are not enough
What is significant? What is the "null hypothesis"?
If we don't fully analyze our own
experimental data, someone else will
And they will publish it
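One simple way to make the question "what is the null hypothesis?" concrete is a permutation test. Here the null hypothesis is that the group labels do not matter, so we relabel the measurements in every possible way and ask how often a difference as large as the observed one appears. This is a generic sketch, not a method described in this talk, and the expression values are invented.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation test on the difference of means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    # Enumerate every way of relabelling n_a of the pooled values as "group A".
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [x for i, x in enumerate(pooled) if i not in chosen]
        total += 1
        # Small tolerance so rounding never hides an equally large difference.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Invented expression levels for one gene under two conditions.
control = [5.1, 4.9, 5.3, 5.0]
treated = [6.8, 7.1, 6.9, 7.3]

p = exact_permutation_pvalue(control, treated)
print(p)
```

With four values per group there are only 70 possible relabellings, so the smallest p-value this test can give is 2/70. When thousands of genes are tested at once, many will look "significant" by chance, which is one reason classical methods alone are not enough.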
Teaching "Introduction to Data Science"
The students will learn
how to handle experimental data
how to communicate with scientists of other data-oriented
how to produce publication quality reports with reproducible
How to get raw data, extract relevant information, and filter it using
several selection criteria.
How to store and retrieve it in efficient and useful ways.
How to transform it, organize it, categorize it, display, show and
understand the results.
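As a hedged illustration of the get, filter, store and retrieve steps listed above, this sketch uses only Python's standard library and an in-memory SQLite database. The table name, the column names, the threshold and the measurements are invented for the example.

```python
import csv
import io
import sqlite3

# Raw data as it might arrive in a CSV file (values invented).
raw = """gene,sample,expression
geneA,control,5.1
geneA,treated,7.2
geneB,control,0.3
geneB,treated,0.4
geneC,control,NA
"""

# Get the raw data and filter it: drop missing values and low signals.
rows = []
for rec in csv.DictReader(io.StringIO(raw)):
    if rec["expression"] == "NA":
        continue
    value = float(rec["expression"])
    if value >= 1.0:  # assumed detection threshold
        rows.append((rec["gene"], rec["sample"], value))

# Store it in a database (in memory here; a file name would make it persistent).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE expression (gene TEXT, sample TEXT, value REAL)")
db.executemany("INSERT INTO expression VALUES (?, ?, ?)", rows)

# Retrieve a summary with SQL: mean expression per sample.
summary = db.execute(
    "SELECT sample, AVG(value) FROM expression GROUP BY sample ORDER BY sample"
).fetchall()
print(summary)
```

The same pattern scales from this handful of rows to large tables, because the filtering and the summary are written once and applied to whatever data arrives.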
Teaching "Scientific Computing"
Teach Python and BioPython to analyze, model, evaluate and predict
the behavior of genomic and molecular biology entities.
The students should be able to interact with high end servers, use
command line tools and be comfortable in computing environments
other than Microsoft Windows.
Tools include Unix command line tools, SQL and the R statistical
The student should be able to understand how computer networks
work and what their limitations are.
The idea is not to be experts in
computers, but to have the
concepts and language to work in
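A first exercise in such a course could look like the sketch below. It is written in plain Python so that it runs without extra packages; Biopython offers comparable and more complete tools. The function names and the example sequence are assumptions made for illustration.

```python
# Translation table that maps each nucleotide to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite DNA strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(seq):
    """Fraction of nucleotides that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


seq = "ATGGCGTACC"
print(reverse_complement(seq))
print(gc_content(seq))
```

Short functions like these are enough to start asking biological questions of real sequence files, which keeps the programming tied to the biology.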
Let's start learning Data Science
To test these ideas, next week we start an
Introduction to Data Science Workshop
The mathematical tools can be explored together with the biological
context, so they make sense and are easier to learn.
I will give you a link at the end of this talk.
If you are interested visit the webpage and send an email.
after all, maybe I'm just crazy
Every normal student is capable of good
mathematical reasoning if attention is
directed to activities of his interest
Jean Piaget, 1976
Swiss psychologist and philosopher
You can also learn at home
Everything we will show is available on the Internet
You just need to look for it
But it is in English
Translation takes too long
Translated science is obsolete science
It is hard to make predictions, especially
about the future
Molecular Biology has become mainstream
Genomic tools are also used outside academia.
Several companies provide "personalized DNA services".
Both offer to trace ancestry and migrations of the human population.
Anyone can learn what their true origins are.
23andMe, partially owned by Google.
The Genographic project, created by the National Geographic Society
Molecular Biology will follow the path of
Today PCR thermocyclers are expensive devices found in universities
and research centers, very much like desktop computers were in the
70's and 80's.
Nowadays computers are low-cost and found everywhere.
Will the same happen with PCR?
Today only a few companies produce PCR thermocyclers, just as only a few
companies produce smartphones, such as Apple (iPhone) and Samsung.
Nevertheless you can see them everywhere.
And this is a big opportunity for creators of software applications.
The value is in the apps. Ask Nokia or Blackberry
A computer on every desk and in every
home, all running Microsoft software
Microsoft’s founding mission.
PCR is the new PC
Gates set this goal in the late 70's, when it was not obvious if people
would even see a computer in their lives.
PCR technology is now in the same state that Personal Computers
were in 1975. If PCR machines become inexpensive,
and there is "a PCR on every desk and home",
and high schools,
then who will be making "software apps" for them?
If PCR machines are available everywhere
applications can be:
Determining ancestry (e.g. race horses, farm animals, fishes)
Detection of unwanted organisms
Food quality control (e.g. in a university canteen)
Security and control of Genetically Modified Organisms
Police forensic analysis
Software for PCR
the specific parameters of an application
I think we should prepare our students to make these "apps".
They should have easy access to low-cost thermocyclers, use them
frequently and creatively.
Then, like in the computer industry, they may create completely new
applications that we cannot foresee now.
DNA extraction protocols
New Instruments trigger advances in Molecular Biology
and in other sciences
They are usually named after their inventor
Galileo created modern science when he made his own telescope
Newton also invented a new kind of telescope, still used today
Bunsen enabled spectrometry analysis with his burner
Svedberg ultracentrifuge (16S)
Sanger DNA sequencing method
Southern blot method for specific DNA detection
PCR to amplify DNA samples
I propose creating a course on "Scientific Instrumentation",
initially using software tools.
Making instruments is now "software", not craftsmanship.
We can understand this with a biological analogy.
Designs in digital files are like genes.
3D printers are like ribosomes, producing physical versions of the designs.
Online collaboration is like evolution: designs are changed to
improve their fitness.