Bda2015 tutorial-part1-intro
1. 16th December 2015
Genomics 3.0: Big Data in
Precision Medicine
Asoke K Talukder, Ph.D
InterpretOmics, Bangalore, India
17th December 2009
Big Data Analytics 2015
Hyderabad 16-18 December, 2015
Acknowledgement
• BDA2015 Technical committee
• Authors & Publishers making their articles Open Access on the Web
• Open Source Software/Foundation
• Authors of Open Source & Open Domain software
• NCBI & other open domain databases
• Wikipedia & other sites that believe in Bhikshu
Economy
Disclaimer
• During my research for this tutorial, I referred to many texts and
presentations available on the Web or obtained from colleagues and
professionals. I have tried to credit the creators of the artifacts
used in this presentation; if I have missed crediting an original
author, the omission is unintentional and regretted.
About the Speaker
• Dr. Asoke K. Talukder is a computer scientist who has worked for
companies such as Fujitsu-ICIM, Microsoft, Oracle, Informix, Digital,
Hewlett Packard, ICL, Sequoia, Northern Telecom, NEC, KredietBank,
iGate and Cellnext. He has authored or edited six books, two of which
have been translated into Chinese, and has published many peer-reviewed
research papers. He is a recipient of many international awards,
including the All India Radio/Doordarshan award, ICIM Professional
Excellence Award, ICL Excellence Award, IBM Solutions Excellence Award
and Simagine GSMWorld Award. He has been listed in “Who’s Who in the
World”, “Who’s Who in Science and Engineering”, and “Outstanding
Scientists of the 21st Century”. He holds an M.Sc. (Physics, with a
Biophysics major) and a Ph.D. in Computer Science. He was the
DaimlerChrysler Chair Professor at IIIT, Adjunct Professor in the
Department of CSE at NIT Warangal, and Adjunct Faculty in CE at NITK,
Surathkal. He is Co-founder and Chief Scientific Officer of
InterpretOmics, a data sciences and systems biology company.
Structure of the Tutorial
• Introduction to Omic Sciences
• Omic Sciences Challenges
• Computational Biology
• Algorithms and Data Mining in Biology
• Blood Biopsy – a case study
Goal of this Tutorial
• This tutorial defines the role of Big Data and Data Sciences in
biology and the life sciences. Chemistry and physics have given us some
understanding of biology; advances in technology are now making the
next leap possible. We need mathematics and computers to solve the
grand challenges in biology, for a better understanding of life and of
the genome, the building block of life. This will help solve practical
problems such as disease management and the management of food and the
environment.
Leading causes of death (U.S., 1999)
Rank  Cause                        Number of deaths  % of total deaths
1     heart disease                725,192           30.3
2     malignant neoplasm           549,192           23.0
3     cerebrovascular disease      167,366           7.0
4     chronic lower respiratory    124,181           5.2
5     accidents                    97,860            4.1
6     diabetes mellitus            68,399            2.9
7     influenza, pneumonia         63,730            2.7
8     Alzheimer’s disease          44,536            1.9
9     nephritis & related          35,525            1.5
10    septicemia                   30,680            1.3
11    … all other                                    20.2
      all causes (total)           2,391,399         100.0
Source: National Vital Statistics Reports 49(11):1-87, 2001.
Classification of Disease
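The "% total deaths" column is each cause's count divided by the all-causes total. The sketch below recomputes the first three rows, assuming 2,391,399 is the 1999 all-causes total; the dictionary and function names are illustrative only.

```python
# Recompute "% of total deaths" for the top three causes in the table.
# Assumption: 2,391,399 is the all-causes total for the U.S. in 1999.
TOTAL_DEATHS_1999 = 2_391_399

top_causes = {
    "heart disease": 725_192,
    "malignant neoplasm": 549_192,
    "cerebrovascular disease": 167_366,
}

def percent_of_total(deaths, total=TOTAL_DEATHS_1999):
    """Return deaths as a percentage of total, rounded to one decimal place."""
    return round(100 * deaths / total, 1)

shares = {cause: percent_of_total(n) for cause, n in top_causes.items()}
for cause, pct in shares.items():
    print(f"{cause}: {pct}%")
```

The printed shares agree with the 30.3, 23.0 and 7.0 values in the table.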
Genomics and World Health
• “It is now believed that the information generated by
genomics will, in the long-term, have major benefits for the
prevention, diagnosis and management of many diseases
which hitherto have been difficult or impossible to control.
These include communicable and genetic diseases,
together with other common killers or causes of chronic
ill-health, including cardiovascular disease, cancer, diabetes,
the major psychoses, dementia, rheumatic disease,
asthma, and many others.”
– Genomics and World Health, Report of the Advisory
Committee on Health Research, presented to the Director-General
of WHO on 20 December 2001; Ref: Jeffrey D.
Sachs, WHO, Geneva, 2002
Genomics and Food Chain
• To develop high-nutrient food and high-yield crops, we need to
understand the genetic structure of plants and of disease vectors.
• We also need GMO (Genetically Modified Organism) crops that can grow
and produce in hostile environments such as drought-affected or highly
saline areas.
Genomics and Energy
• Most of our energy comes from fossil fuels such as coal and
petroleum, which formed from ancient organisms over millions of
years.
• Can we culture organisms that shorten this cycle from millions of
years to a few years?
• Can we generate biofuels that are economical and commercially viable?
Genomics and Environment
• Can we culture organisms that will help the carbon cycle and reduce
atmospheric CO2?
• Can we culture organisms or plants that will desalinate sea water and
produce fresh drinking water?
• Can we culture organisms or plants that will clean the environment and
accelerate the biodegradation of waste?
Landmark Discoveries
• 1941 Genes code for single proteins
• 1944 Proof that DNA carries genetic information
• 1949 The concept of sickle cell anaemia as a “molecular disease”
• 1953 Structure of insulin determined
• 1953 Multistage mutational theory of cancer by Nordling
• 1953 Field Cancerization theory of cancer
• 1953 Structure of nucleic acid (DNA) determined
• 1956 Monogenic disease due to a single amino acid substitution of the β-chain of haemoglobin
• 1960 The X-ray crystallographic structure of haemoglobin
• 1961 The genetic code, messenger RNA, gene regulation
• 1972 Recombinant DNA, cloning and gene isolation
• 1974 Direct demonstration of a human gene deletion
• 1975 Southern blotting*
• 1976 Proto-oncogenes
• 1977 DNA sequencing
• 1978 Human gene library
• 1979 Restriction fragment length polymorphism used for prenatal diagnosis; stop codon mutation demonstrated in human globin messenger RNA; cellular oncogenes
• 1979–81 Human genes cloned and sequenced
• 1985 “Disease genes” isolated by positional cloning; polymerase chain reaction (PCR)
• 2000 The Human Genome Project: completion of 90% draft
Questions Biologists Often Ask
Biologists need answers to a number of questions:
• How can we extract all the knowledge contained in a given sequence or
structural data?
– analysis
– prediction of certain properties
• How can software tools help design drugs and cure diseases based on
available data?
– tools for the early drug discovery process
– tools to predict and treat diseases before they manifest
Omic Sciences
• Genomics – the "basic recipe" book defining the characteristics of an
individual, a population or a living species
• Transcriptomics – the study of how the "basic recipes" are transcribed
into RNA on the way to the final product: the proteins
• Proteomics – the study of all proteins produced by genome expression
• Metabolomics – the study of interactions between proteins and all
"metabolites" (sugar, fat, biomolecules, etc.) of a cell or a biological entity
• Physiomics – the study of interactions with physiology
• Fluxomics – the study of dynamic changes of molecules within a cell over
time
• Sociomics – the study of all social and cultural ecosystems that interact
with the genomes
• Epigenomics – the study of the environmental imprint on the "coat" that
covers the genetic material in the genome
• Phenomics – the study of phenotypes
• Bibliomics – the study of the literature
Genomics
• Genomics is the study of the genomes of organisms. The
field includes intensive efforts to determine the entire DNA
sequence of organisms and fine-scale genetic mapping
efforts. The field also includes studies of intragenomic
phenomena such as heterosis, epistasis, pleiotropy and
other interactions between loci and alleles within the
genome. In contrast, the investigation of the roles and
functions of single genes is a primary focus of molecular
biology or genetics and is a common topic of modern
medical and biological research. Research on single genes
does not fall into the definition of genomics unless the aim
of this genetic, pathway, and functional information analysis
is to elucidate its effect on, place in, and response to the
entire genome's networks.
Gene
• With the exception of viruses, which are intracellular parasites, living
organisms are divided into two general classes. First, there are
eukaryotes whose cells have a complex compartmentalized internal
structure; they comprise algae, fungi, plants and animals. Second, there
are prokaryotes, single-celled microorganisms with a simple internal
organization, which comprise bacteria and related organisms. Genetic
information is transferred from one generation to the next by subcellular
structures called chromosomes. Prokaryotes usually have a single
circular chromosome, while most eukaryotes have more than two and in
some cases up to several hundred. For example, humans have 23 pairs, one
member of each pair being inherited from each parent. Twenty-two pairs
are called autosomes and one pair is called the sex chromosomes. The
latter are designated X and Y; females have two X chromosomes (XX),
while males have one X and one Y (XY).
Genetics Vs Genomics
• Genetics is Biology
• Genomics is Statistical Data Mining
• Genetics is Confirmatory
• Genomics is Exploratory
• Genetics is hypothesis driven
• Genomics is hypothesis creating
Genomics 3.0
• Genomics 1.0: started with the Human Genome Project; used by
academics and researchers to understand disease dynamics and the
genotype-phenotype association of a living system, at a time when
clinicians treated the symptoms of a disease (the phenotype)
• Genomics 2.0: entered the clinic and pharmaceutical companies
through translational genomics. It is used today as a tool for diagnosing
non-communicable and genetic diseases. Clinicians use Genomics
2.0 to treat not just the symptoms but the disease itself
• Genomics 3.0: will deal with holistic precision medicine and will be
driven by the big-data genomic analytics of the 21st century. Genomics 3.0
will be used to detect disease onset while it is still asymptomatic. It will
not just treat a disease, but treat the patient and cure the disease
What is a System?
• A system is a whole entity made of a set of interacting or
interdependent components that form an integrated whole
• It can be a collection of elements (often called 'components') and
relationships that differ from the relationships of the set or its
elements to other elements or sets
• Interdependent components may have some properties, or may exhibit
no property at all, outside the whole
• When these components are combined, they form a whole system with
static and dynamic properties completely different from the properties
of the individual components
Systems Biology
• Systems Biology is about the integration of modeling, simulation,
experimentation, databases, and bioinformatic approaches
• Predictive understanding of microbial and plant systems for advancing
clinical medicine, high-yield crops, high-nutrient produce, biofuels,
biological control of carbon cycling, cleaning up contaminated
environments, etc.
The Synergy
Genomics
Transcriptomics
Proteomics
Metabolomics
Fluxomics
Sociomics
Epigenomics
Systems Biology
........
Bibliomics
Model
• Scientific modelling is an activity to make a particular function
or entity of the real world easier to define, quantify, visualize,
understand, or simulate by referencing it to existing and
usually commonly accepted knowledge
• A simulator should be able to model the actual system in reduced or
enlarged space and time
• Key issues in simulation include representation of the true
characteristics, function, and behaviours of the original
system in a space that can be manipulated or changed as
desired
• However, in many cases the similarity is only approximate or
even intentionally distorted.
Deductive and Inductive Science
Ref: Sylvia Wassertheil-Smoller, Biostatistics and Epidemiology, Springer, 2003
Figure (deductive vs inductive science), items shown: Physical Science
(Law of Gravitation, Newton's Laws of Motion, E = mc²); Chemical/Molecular
Properties; Statistics; Biological Phenomenon; Simulation (Model fitting);
Wireless Mobile Communication; Clinical Trial
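Inductive science infers a model's parameters from observations rather than deducing them from first principles. As a minimal sketch of "model fitting", assuming a straight-line model y = a + b·x, the code below estimates a and b by ordinary least squares; the data points are hypothetical and the function name is illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and co-variation of x and y, around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical noisy observations scattered around y = 1 + 2x.
xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
a, b = fit_line(xs, ys)
print(f"intercept={a:.2f}, slope={b:.2f}")
```

The fitted parameters (about 1.06 and 1.97) are inferred from the data, which is the inductive direction the figure contrasts with laws such as E = mc².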
Technical Attractions of
Simulation
• Ability to compress time, expand time
• Ability to control sources of variation
• Avoids errors in measurement
• Ability to stop and review
• Ability to restore system state
• Facilitates replication
• Modeler can control level of detail
Discrete-Event Simulation: Modeling, Programming, and Analysis by G. Fishman, 2001
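"Ability to compress time" comes from the way a discrete-event simulator advances its clock: it jumps directly from one event to the next instead of ticking through idle time. The sketch below illustrates this with a single first-come, first-served server, an example system chosen here for illustration (not taken from Fishman's text); the function name and fixed service time are assumptions.

```python
import heapq
from collections import deque

def simulate_single_server(arrivals, service_time):
    """Event-driven simulation of one first-come, first-served server.

    arrivals: arrival times of customers; service_time: fixed service duration.
    Returns departure times in the order customers leave.
    """
    # Event list ordered by time; each event is (time, kind, customer_id).
    # On equal times, "arrive" sorts before "depart".
    events = [(t, "arrive", i) for i, t in enumerate(arrivals)]
    heapq.heapify(events)
    waiting = deque()
    busy = False
    departures = []
    while events:
        clock, kind, customer = heapq.heappop(events)  # jump straight to the next event
        if kind == "arrive":
            if busy:
                waiting.append(customer)
            else:
                busy = True
                heapq.heappush(events, (clock + service_time, "depart", customer))
        else:  # a departure frees the server
            departures.append(clock)
            if waiting:
                heapq.heappush(events, (clock + service_time, "depart", waiting.popleft()))
            else:
                busy = False
    return departures

print(simulate_single_server([0, 1, 2, 10], service_time=3))  # [3, 6, 9, 13]
```

The idle gap between time 9 and time 10 costs no computation, and the event list can be saved and restored, which relates to the "stop and review" and "restore system state" points above.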
Precision Medicine Will Transform the Health Care Industry
Precision medicine will significantly impact:
• Pharmaceuticals
• Biotechnology
• Healthcare industry
• Health insurance
• Medicine: diagnostics, therapy, prevention, wellness
• Nutrition
• Assessments of environmental toxicities
• Academia and medical schools
Healthcare System
New ideas need new organizational structures
Watson-Crick Model of DNA
• Based on X-ray data from Rosalind Franklin, Watson and Crick recognized
that the 3.4 Angstrom period suggested a double helix.
• Based on Chargaff’s rule ([A]=[T] and [C]=[G]), they recognized that the
two strands must be held together by H-bonds between purine and
pyrimidine pairs.
• They accepted the assumption that nucleotides were held together by
phosphodiester bonds, with phosphate as the chain backbone.
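Chargaff's rule follows directly from complementary base pairing: every A on one strand faces a T on the other, and every C faces a G. The minimal sketch below builds the partner strand of a hypothetical sequence and counts bases over the whole duplex; the function name is illustrative.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def duplex_composition(strand):
    """Count bases over both strands of a double-stranded DNA built from one strand."""
    strand = strand.upper()
    partner = "".join(COMPLEMENT[base] for base in strand)  # Watson-Crick partner strand
    return Counter(strand + partner)

counts = duplex_composition("ATGCGTACCA")  # hypothetical sequence
print(counts["A"] == counts["T"], counts["C"] == counts["G"])  # True True
```

For any input strand the duplex counts satisfy [A]=[T] and [C]=[G], because pairing forces them.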
Watson & Crick
• James D. Watson and Francis Crick, using X-ray data collected by
Rosalind Franklin, proposed the double helix structure of the DNA
molecule in 1953. Their article, "Molecular Structure of Nucleic Acids:
A Structure for Deoxyribose Nucleic Acid", is celebrated for its
treatment of the B form of DNA (B-DNA) and as the source of
Watson-Crick base pairing of nucleotides. Together with Maurice
Wilkins, they were awarded the Nobel Prize in Physiology or Medicine
in 1962.
Interactions within a Cell
Figure: animal and plant cells, with the nucleus, ribosome, endoplasmic
reticulum and Golgi body labelled
Ribosome: site where proteins are made
Watson-Crick Model of DNA
• Chains run in an antiparallel orientation
• Bases are stacked perpendicular to the helix axis and associate
through hydrogen bonds
• Each turn spans 34 Angstroms and contains 10 base pairs (a rise of
3.4 Angstroms per base pair)
• Major and minor grooves lie within the helix
• The double helix has a diameter of 20 Angstroms
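The helix dimensions above translate directly into length and turn counts for a DNA segment. The sketch below uses the slide's idealized values (34 Angstroms and 10 base pairs per turn); measured B-DNA is closer to 10.5 base pairs per turn, so treat the results as approximate. The function name is illustrative.

```python
# Idealized B-DNA parameters as stated on the slide.
ANGSTROM_PER_TURN = 34.0
BP_PER_TURN = 10
RISE_PER_BP = ANGSTROM_PER_TURN / BP_PER_TURN  # 3.4 Angstroms per base pair

def helix_geometry(n_bp):
    """Return (length in Angstroms, number of helical turns) for n_bp base pairs."""
    return n_bp * RISE_PER_BP, n_bp / BP_PER_TURN

length, turns = helix_geometry(21)
print(f"{length:.1f} Angstroms, {turns:.1f} turns")  # 71.4 Angstroms, 2.1 turns
```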
Nucleotide Base Pairing
Nucleotides pair by forming H-bonds between bases. The
pairing is the basis for the antiparallel strands associating with
each other.
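Because the two strands are antiparallel, the partner of a strand read 5'→3' is its complement read in reverse. The minimal sketch below computes this reverse complement and checks whether two strands, both written 5'→3', pair base by base; the function names are illustrative.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand):
    """Return the partner strand written 5'->3' (complement, read in reverse)."""
    return "".join(PAIR[base] for base in reversed(strand.upper()))

def pairs_correctly(top, bottom):
    """Check that bottom (5'->3') is an antiparallel Watson-Crick partner of top (5'->3')."""
    if len(top) != len(bottom):
        return False
    # Walk top forwards and bottom backwards, so facing bases line up.
    return all(PAIR[a] == b for a, b in zip(top.upper(), reversed(bottom.upper())))

print(reverse_complement("ATGC"))        # GCAT
print(pairs_correctly("ATGC", "GCAT"))   # True
```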
Proteins play key roles in a living system
• Three examples of protein functions
– Catalysis: almost all chemical reactions in a living cell are catalyzed
by protein enzymes. Example: alcohol dehydrogenase oxidizes alcohols to
aldehydes or ketones.
– Transport: some proteins transport various substances, such as oxygen,
ions, and so on. Example: haemoglobin carries oxygen.
– Information transfer: hormones, for example insulin, which controls the
amount of sugar in the blood.
Amino acid: Basic unit of protein
Figure: an amino acid, with a central carbon (C) bonded to an amino group
(NH3+), a carboxylic acid group (COO−), a hydrogen (H) and a side chain (R)
Different side chains, R, determine the properties of the 20 amino acids.
Proteins are linear polymers of amino acids
Figure: amino acids (side chains R1, R2, R3) linked by peptide bonds into a
chain running from an NH3+ end to a COO− end; each peptide bond forms with
the release of one H2O
A carboxylic acid group condenses with an amino group, releasing a water
molecule and forming a peptide bond.
The amino acid sequence is called the primary structure (the figure labels
residues with one-letter codes: A, F, N, G, S, T, D, K).
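Since each peptide bond releases one water molecule, a linear peptide of n amino acids weighs the sum of its free amino acids minus (n − 1) waters. The sketch below applies this using approximate average masses for only three amino acids (glycine, alanine, serine); the table is deliberately partial and the function name is illustrative.

```python
# Approximate average masses (g/mol) of free amino acids; only three are included here.
AMINO_ACID_MASS = {"G": 75.07, "A": 89.09, "S": 105.09}
WATER_MASS = 18.015

def peptide_mass(sequence):
    """Average mass of a linear peptide: free amino acids minus one water per peptide bond."""
    if not sequence:
        raise ValueError("empty sequence")
    total = sum(AMINO_ACID_MASS[aa] for aa in sequence.upper())
    peptide_bonds = len(sequence) - 1
    return total - peptide_bonds * WATER_MASS

print(f"{peptide_mass('GAS'):.2f}")  # 233.22
```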
Gene is protein’s blueprint,
genome is life’s blueprint
Figure: a genome (DNA) made up of many genes, each gene encoding a protein
Gene is protein’s blueprint,
Genome is life’s blueprint
Figure: genes in a genome encoding proteins that form a network (labelled: Glycolysis network)
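A molecular network such as glycolysis can be represented as a directed graph in which metabolites are nodes and enzyme-catalysed reactions are edges. The sketch below is a simplified, one-directional version (cofactors omitted, reversible steps drawn in the forward direction only) and uses breadth-first search to recover the chain of intermediates from glucose to pyruvate; the graph layout and function name are illustrative assumptions.

```python
from collections import deque

# Simplified glycolysis as a directed graph: substrate -> products (cofactors omitted).
GLYCOLYSIS = {
    "glucose": ["glucose-6-phosphate"],
    "glucose-6-phosphate": ["fructose-6-phosphate"],
    "fructose-6-phosphate": ["fructose-1,6-bisphosphate"],
    "fructose-1,6-bisphosphate": ["glyceraldehyde-3-phosphate", "dihydroxyacetone phosphate"],
    "dihydroxyacetone phosphate": ["glyceraldehyde-3-phosphate"],
    "glyceraldehyde-3-phosphate": ["1,3-bisphosphoglycerate"],
    "1,3-bisphosphoglycerate": ["3-phosphoglycerate"],
    "3-phosphoglycerate": ["2-phosphoglycerate"],
    "2-phosphoglycerate": ["phosphoenolpyruvate"],
    "phosphoenolpyruvate": ["pyruvate"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search; returns the shortest list of nodes from start to goal, or None."""
    previous = {start: None}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back through predecessors
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                frontier.append(nxt)
    return None

path = shortest_path(GLYCOLYSIS, "glucose", "pyruvate")
print(" -> ".join(path))
```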
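Amino acid sequence is encoded by DNA base sequence in a gene
Standard genetic code (rows: first letter; columns: second letter;
within each row block the third letter runs G, A, C, T)
First   Second letter
letter  G           A           C           T
G       GGG Gly     GAG Glu     GCG Ala     GTG Val
        GGA Gly     GAA Glu     GCA Ala     GTA Val
        GGC Gly     GAC Asp     GCC Ala     GTC Val
        GGT Gly     GAT Asp     GCT Ala     GTT Val
A       AGG Arg     AAG Lys     ACG Thr     ATG Met
        AGA Arg     AAA Lys     ACA Thr     ATA Ile
        AGC Ser     AAC Asn     ACC Thr     ATC Ile
        AGT Ser     AAT Asn     ACT Thr     ATT Ile
C       CGG Arg     CAG Gln     CCG Pro     CTG Leu
        CGA Arg     CAA Gln     CCA Pro     CTA Leu
        CGC Arg     CAC His     CCC Pro     CTC Leu
        CGT Arg     CAT His     CCT Pro     CTT Leu
T       TGG Trp     TAG Stop    TCG Ser     TTG Leu
        TGA Stop    TAA Stop    TCA Ser     TTA Leu
        TGC Cys     TAC Tyr     TCC Ser     TTC Phe
        TGT Cys     TAT Tyr     TCT Ser     TTT Phe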
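The codon table can be used directly to translate a coding DNA sequence, three bases at a time. The sketch below builds the same standard code from the compact 64-letter string used in NCBI's genetic code tables (bases in T, C, A, G order, one-letter amino acid codes, "*" for stop) instead of the three-letter names on the slide. It assumes the reading frame starts at the first base, and the function name is illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TTT, TTC, TTA, TTG, TCT, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna, stop_at_stop=True):
    """Translate coding DNA codon by codon, starting at index 0; trailing partial codons are ignored."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGCTTGGTAA"))  # MAW  (ATG Met, GCT Ala, TGG Trp, then TAA Stop)
```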
Our life is maintained by
molecular network systems
Molecular network
system in a cell
(From ExPASy Biochemical Pathways; http://www.expasy.org/cgi-bin/show_thumbnails.pl?2)
End of Part I & II
InterpretOmics
Office: Shezan Lavelle, 5th Floor,
#15 Walton Road, Bengaluru 560001
Lab: #329, 7th Main, HAL 2nd Stage,
Indiranagar, Bengaluru 560008
Phone: +91(80)46623800