OMICS ERA
Dr. Hetalkumar Panchal
Associate Professor
Gujarat Agricultural Biotechnology Institute (GABI)
Navsari Agriculture University,
Athwa Farm, Surat – 395007
swamihetal@gmail.com
29th Refresher Course: Bio-Sciences and Bio-Engineering (ID)
(02/06/2014 to 2/06/2014)
UGC-Academic Staff College,
Sardar Patel University,
Vallabh Vidyanagar - 38120, Dist. Anand (Gujarat)
• OMICS
– The term “omic” is derived from the Latin suffix
“ome”, meaning mass or many. Thus, OMICS
involves a mass (large number) of measurements
per endpoint (Jackson et al., 2006).
• Integration of OMICS data
– Efficient integration of data from different OMICS layers
can greatly facilitate the discovery of the true causes
and states of disease; this integration is mostly done
with dedicated software (Andrew et al., 2006).
What is ‘omics’?
• In a biological context, the suffix -omics refers to
the study of large sets of biological
molecules (Smith et al., 2005).
• The realization that DNA alone does not regulate
complex biological processes (a lesson of the
Human Genome Project, HGP, 2001) triggered the rapid
development of several fields in molecular biology that
are collectively described by the term OMICS.
• The OMICS field ranges from
– Genomics (focused on the genome)
– Proteomics (focused on large sets of proteins, the
proteome)
– Metabolomics (focused on large sets of small
molecules, the metabolome).
TYPES OF OMICS
Genomics
Computational genomics
Epigenomics
Functional genomics
Immunomics
Metagenomics
Pathogenomics
Regenomics
Personal genomics
Proteomics
Psychogenomics
GENOMICS
• The field of genomics has been divided into 3 major
categories.
– Genotyping (focused on the genome sequence)
• The physiological function of genes and the elucidation of the
role of specific genes in disease susceptibility (Syvanen, 2001)
– Transcriptomics (focused on genomic expression)
• The abundance of specific mRNA transcripts in a biological
sample is a reflection of the expression levels of the
corresponding genes (Manning et al., 2007)
– Epigenomics (focused on epigenetic regulation of
genome expression)
• Study of epigenetic processes (regulation of expression that does not involve
changes in the DNA sequence) on a large (ultimately genome-wide) scale (Feinberg, 2007)
GENOTYPING
• Goal
– Identification of the physiological function of genes
– Role of specific genes in disease susceptibility (Syvanen et al., 2001)
• Commonly used markers
– Among the different variations (insertions, deletions, SNPs, etc.), single
nucleotide polymorphisms (SNPs) are the most commonly investigated
(Sachidanandam et al., 2001) and can be used as markers for diseases.
– Tag SNPs (an informative subset of SNPs) and fine mapping are then
used to identify the true cause of a phenotype (Patil et al., 2001).
• Application
– Identification of genes associated with disease
• Recent improvements in genotyping
– Array-based genotyping techniques allow up to 1 million SNPs to be
assessed simultaneously per assay, enabling genotyping across the
entire genome in genome-wide association studies (GWAS)
(Jelly et al., 2010).
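To make the idea of SNPs as disease markers concrete, the sketch below runs a basic allelic chi-square test for a single SNP in a case-control comparison. It is a minimal illustration that assumes allele counts have already been tallied. The function names and the example counts are hypothetical, and real GWAS pipelines add quality control, covariates and multiple-testing correction.

```python
import math


def allelic_chi_square(case_counts: tuple[int, int],
                       control_counts: tuple[int, int]) -> float:
    """Pearson chi-square statistic (1 d.f.) for a 2x2 allele-count table.

    Each argument is (count of allele A, count of allele B) in that group.
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele frequency were the same in both groups
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def chi_square_p_value_1df(chi2: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    # For 1 d.f., P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical SNP: allele A is more frequent in cases than in controls.
stat = allelic_chi_square((60, 40), (40, 60))
p = chi_square_p_value_1df(stat)
print(f"chi2 = {stat:.2f}, p = {p:.4g}")
```

For these counts the statistic is 8.0. In a GWAS the same test is repeated for every genotyped SNP, which is why a stringent genome-wide threshold (conventionally p < 5 × 10⁻⁸) is applied.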
TRANSCRIPTOMICS
• Gene expression profiling
– The identification and characterization of the mixture of mRNA that is
present in a specific sample.
• Principle
– The abundance of specific mRNA transcripts in a biological sample
is a reflection of the expression levels of the corresponding genes
(Manning et al., 2007).
• Application
– To associate differences in mRNA mixtures originating from different
groups of individuals to phenotypic differences between the groups
(Nachtomy et al., 2007).
• Challenge
– Unlike the genome, the transcriptome is highly variable over
time, between cell types and in response to environmental changes
(Celis et al., 2000).
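The sketch below illustrates the principle above. If transcript abundance reflects expression level, then comparing mean abundance between two groups of individuals gives a first indication of differentially expressed genes. The gene names, the values and the pseudocount of 1 are hypothetical. Proper differential-expression analysis also models replicate variability and tests for significance, which this sketch does not do.

```python
import math
from statistics import mean


def log2_fold_change(group_a: list[float], group_b: list[float],
                     pseudocount: float = 1.0) -> float:
    """log2 of mean expression in group_b relative to group_a.

    The pseudocount keeps the ratio finite when a gene is not detected.
    """
    return math.log2((mean(group_b) + pseudocount) / (mean(group_a) + pseudocount))


# Hypothetical normalized expression values (e.g. counts per million)
# for three genes, measured in three samples from each group.
expression = {
    "gene_1": ([10.0, 12.0, 14.0], [40.0, 50.0, 45.0]),
    "gene_2": ([30.0, 28.0, 32.0], [31.0, 29.0, 30.0]),
    "gene_3": ([80.0, 90.0, 85.0], [20.0, 18.0, 22.0]),
}

# Rank genes by the magnitude of their change between the two groups.
ranked = sorted(expression,
                key=lambda g: abs(log2_fold_change(*expression[g])),
                reverse=True)
for gene in ranked:
    print(gene, round(log2_fold_change(*expression[gene]), 2))
```

A positive value means higher expression in the second group, and a negative value means lower expression. Using the absolute value for ranking treats up- and down-regulation alike.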
EPIGENOMICS
• Epigenetic processes
– Mechanisms other than changes in the DNA sequence that affect gene
transcription and gene silencing [30-32].
– Several epigenetic mechanisms exist, but epigenomics focuses mainly on
two: DNA methylation and histone modification [28, 33-39].
– RNA interference (RNAi) has recently attracted considerable attention [31, 40, 41].
• Goal
– The focus of epigenomics is to study epigenetic processes on a
large (ultimately genome-wide) scale to assess their effect on
disease [28, 29].
• Association with disease
– Hypermethylation of CpG islands located in promoter regions of
genes is associated with gene silencing [28, 36]. Altered gene silencing
plays a causal role in human disease [31, 34, 37, 38, 42].
– Histone proteins are involved in the structural packaging of DNA
in the chromatin complex. Post-translational histone
modifications such as acetylation and methylation are believed
to regulate chromatin structure and therefore gene expression [34, 37].
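CpG-island hypermethylation is central to this slide, so it may help to see how CpG islands are usually flagged in a sequence. The sketch assumes the widely cited Gardiner-Garden and Frommer style thresholds: a length of at least 200 bp, GC content above 50% and a CpG observed/expected ratio above 0.6. The function names and example sequences are hypothetical, and genome-scale tools add sliding windows and refinements that are not shown here.

```python
def cpg_statistics(seq: str) -> tuple[float, float]:
    """Return (GC fraction, CpG observed/expected ratio) for a DNA sequence."""
    seq = seq.upper()
    length = len(seq)
    if length == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c_count + g_count) / length
    # Expected CpG count if C and G were placed independently along the sequence
    expected_cpg = c_count * g_count / length
    obs_exp = cpg_count / expected_cpg if expected_cpg else 0.0
    return gc_fraction, obs_exp


def is_cpg_island_candidate(seq: str, min_length: int = 200,
                            min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the classic length / GC / CpG observed-to-expected thresholds."""
    gc_fraction, obs_exp = cpg_statistics(seq)
    return len(seq) >= min_length and gc_fraction > min_gc and obs_exp > min_obs_exp


# Hypothetical promoter-like (CpG-rich) and AT-rich sequences
print(is_cpg_island_candidate("CGGCGCTACG" * 25))
print(is_cpg_island_candidate("ATTATAAGCA" * 25))
```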
PROTEOMICS
• Proteomics provides insights into the role of proteins in biological systems. The
proteome consists of all proteins present in a specific cell type or tissue; it is
highly variable over time and between cell types, and it changes in response to the
environment, which is a major challenge (Fliser et al., 2007).
• The overall function of cells can be described by their (intra- and intercellular)
proteins and the abundance of these proteins (Sellers et al., 2003).
• Although protein levels are linked to mRNA levels (the transcriptome), post-
translational modifications (PTMs) and environmental interactions mean that protein
abundance cannot be predicted from gene expression analysis alone (Hanash et al., 2008).
• Tools for proteomics
– Two main approaches, based on detection by
• mass spectrometry (MS) and
• protein microarrays using capturing agents such as antibodies.
• Major focuses
– Identification of proteins and of proteins interacting in protein complexes
– Quantification of protein abundance. The abundance of a specific protein is
related to its role in cell function (Fliser et al., 2007).
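Mass-spectrometry-based identification typically starts by digesting proteins into peptides and comparing measured masses with predicted ones. The sketch below shows that prediction step under simplifying assumptions. It uses trypsin's usual rule (cleave after K or R, but not before P) and standard monoisotopic residue masses. It also ignores missed cleavages and post-translational modifications, which real search engines must account for. The protein sequence is hypothetical.

```python
# Monoisotopic residue masses in daltons (residue = amino acid minus H2O).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # mass added per positive charge


def tryptic_digest(protein: str) -> list[str]:
    """Cut after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return peptides


def peptide_mz(peptide: str, charge: int = 1) -> float:
    """m/z of an unmodified peptide carrying `charge` extra protons."""
    neutral_mass = sum(RESIDUE_MASS[r] for r in peptide) + WATER
    return (neutral_mass + charge * PROTON) / charge


for pep in tryptic_digest("AAKGGRPLLKDD"):
    print(pep, round(peptide_mz(pep), 4))
```

In this example, the R in "GGRPLLK" is not cleaved because it is followed by P.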
METABOLOMICS
• The metabolome consists of small molecules (e.g.
lipids or vitamins) that are also known as metabolites
(Claudino et al., 2007).
• Metabolites are involved in energy transfer in
cells (metabolism), interacting with other biological
molecules along metabolic pathways.
• Metabolic phenotypes are the by-products of
interactions between genetic, environmental, lifestyle
and other factors (Holmes et al., 2008).
• The metabolome is highly variable and time-dependent,
and it consists of a wide range of chemical
structures.
• An important challenge of metabolomics is to acquire
qualitative and quantitative information under
environmental perturbation (Jelly et al., 2010).
METABOLITES, METABOLOME & METABONOMICS
Metabolites are the intermediates and products of metabolism.
Within the context of metabolomics, a metabolite is usually
defined as any molecule less than 1 kDa in size.
Metabolome refers to the complete set of small-molecule
metabolites (such as metabolic intermediates, hormones and
other signaling molecules, and secondary metabolites) to be
found within a biological sample.
The word was coined in analogy with transcriptomics and
proteomics; like the transcriptome and the proteome, the
metabolome is dynamic, i.e. changing from second to second.
Metabonomics is defined as "the quantitative measurement of
the dynamic multiparametric metabolic response of living
systems to pathophysiological stimuli or genetic modification".
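The 1 kDa convention and the dynamic nature of the metabolome translate directly into common preprocessing steps. The sketch below assumes a peak table has already been extracted from the instrument data. It keeps features under 1,000 Da and applies total-sum normalization so that samples with different overall signal can be compared. Total-sum normalization is only one of several options, and the feature names and values are hypothetical.

```python
def normalize_metabolite_profile(peaks: dict[str, tuple[float, float]],
                                 max_mass_da: float = 1000.0) -> dict[str, float]:
    """Keep features lighter than max_mass_da and scale intensities to sum to 1.

    `peaks` maps a feature id to (neutral mass in Da, raw intensity).
    """
    kept = {fid: intensity for fid, (mass, intensity) in peaks.items()
            if mass < max_mass_da}
    total = sum(kept.values())
    if total == 0:
        return {fid: 0.0 for fid in kept}
    return {fid: intensity / total for fid, intensity in kept.items()}


# Hypothetical peak table from one sample (masses are approximate monoisotopic values)
sample = {
    "glucose": (180.063, 300.0),
    "citrate": (192.027, 100.0),
    "large_peptide": (5200.0, 900.0),  # above 1 kDa, so not treated as a metabolite
}
print(normalize_metabolite_profile(sample))
```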
METAGENOMICS
Metagenomics is the study of
metagenomes, genetic material recovered
directly from environmental samples. The
broad field may also be referred to as
environmental genomics, ecogenomics or
community genomics.
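Metagenomic reads come from a whole community, so a frequent first analysis is to summarize its diversity. The sketch below assumes reads have already been assigned to taxa (that step is not shown). It computes the Shannon index, which combines richness and evenness into a single number. The taxa and counts are invented for illustration.

```python
import math


def shannon_index(taxon_counts: dict[str, int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(taxon_counts.values())
    h = 0.0
    for count in taxon_counts.values():
        if count > 0:
            p = count / total
            h -= p * math.log(p)
    return h


# Hypothetical read counts assigned to genera in two environmental samples
even_sample = {"Bacillus": 25, "Pseudomonas": 25, "Streptomyces": 25, "Vibrio": 25}
skewed_sample = {"Bacillus": 97, "Pseudomonas": 1, "Streptomyces": 1, "Vibrio": 1}
print(round(shannon_index(even_sample), 3), round(shannon_index(skewed_sample), 3))
```

Both samples contain the same four genera. The skewed sample scores lower because one genus dominates it.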
COMPUTATIONAL GENOMICS
Computational genomics (often referred to as
Computational Genetics) refers to the use of
computational and statistical analysis to decipher
biology from genome sequences and related
data,[1] including both DNA and RNA sequence as
well as other "post-genomic" data (i.e. experimental
data obtained with technologies that require the
genome sequence, such as genomic DNA
microarrays). Because the field combines these data with
computational and statistical approaches to
understanding gene function and with statistical
association analysis, it is also often referred to as
Computational and Statistical Genetics/Genomics.
EPIGENETICS
Genomic modifications that alter gene
expression that cannot be attributed to
modification of the primary DNA sequence
and that are heritable mitotically and
meiotically are classified as epigenetic
modifications. DNA methylation and histone
modification are among the best
characterized epigenetic processes.
FUNCTIONAL GENOMICS
Functional genomics is a field of molecular biology that
attempts to make use of the vast wealth of data
produced by genomic projects (such as genome
sequencing projects) to describe gene (and protein)
functions and interactions. Unlike genomics, functional
genomics focuses on the dynamic aspects such as
gene transcription, translation, and protein–protein
interactions, as opposed to the static aspects of the
genomic information such as DNA sequence or
structures. Functional genomics attempts to answer
questions about the function of DNA at the levels of
genes, RNA transcripts, and protein products. A key
characteristic of functional genomics studies is their
genome-wide approach to these questions, generally
involving high-throughput methods rather than a more
traditional “gene-by-gene” approach.
IMMUNOMICS
Immunomics is the study of immune system
regulation and response to pathogens using
genome-wide approaches. With the rise of
genomic and proteomic technologies,
scientists have been able to visualize
biological networks and infer interrelationships
between genes and/or proteins; recently, these
technologies have been used to help better
understand how the immune system functions
and how it is regulated.
PATHOGENOMICS
Pathogen infections are among the leading causes of
infirmity and mortality among humans and other animals
in the world.[1] Until recently, it has been difficult to
compile information to understand the generation of
pathogen virulence factors as well as pathogen
behaviour in a host environment. The study of
pathogenomics attempts to use genomic and
metagenomic data gathered from high-throughput
technologies (e.g. sequencing or DNA microarrays) to
understand microbial diversity and interaction as well as
host-microbe interactions involved in disease states.
The bulk of pathogenomics research concerns
pathogens that affect human health; however, studies
also exist for plant- and animal-infecting microbes.
REGENOMICS
Regenomics represents the merger of two fields of
scientific endeavor: Regenerative medicine[1] and
genomics.[2][3][4] New technologies to reprogram aged
somatic cells back to pluripotency and to restore
telomere length are currently used in research in
regenerative medicine,[5] though FDA-approved cellular
therapies using reprogrammed cells are currently not
available in the United States.[6] The culture and
banking of somatic cells also allows the parallel
sequencing of their nuclear DNA to provide individuals
with potentially valuable information to guide their
lifestyle choices and, perhaps one day, preventive
strategies in which cell types are prepared in advance
for individuals in high-risk disease categories, e.g.
cardiac progenitor cells for individuals at high
risk of heart disease.
PERSONAL GENOMICS
Personal genomics is the branch of genomics concerned with
the sequencing and analysis of the genome of an individual.
The genotyping stage employs different techniques, including
single-nucleotide polymorphism (SNP) analysis chips (typically
0.02% of the genome), or partial or full genome sequencing.
Once the genotypes are known, the individual's genotype can
be compared with the published literature to determine the
likelihood of trait expression and disease risk.
Use of personal genomics in predictive and precision medicine
Predictive medicine is the use of the information produced by
personal genomics techniques when deciding what medical
treatments are appropriate for a particular individual. Precision
medicine is focused on “a new taxonomy of human disease
based on molecular biology”.
APPLICATION OF DIFFERENT OMICS
Stress-responsive Transcription Factor Database
(STIFDB)
Database for Annotation, Visualization and Integrated Discovery (DAVID)
http://caps.ncbs.res.in/stifdb2/
http://david.abcc.ncifcrf.gov/
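DAVID's functional annotation tools report whether annotation terms are over-represented in a user's gene list relative to a background. The core idea can be sketched with a one-sided hypergeometric test. DAVID itself reports a modified Fisher exact p-value (the EASE score), so this sketch illustrates the principle rather than reproducing DAVID's numbers. All counts are hypothetical.

```python
from math import comb


def enrichment_p_value(hits_in_list: int, list_size: int,
                       annotated_in_background: int, background_size: int) -> float:
    """One-sided hypergeometric p-value P(X >= hits_in_list).

    X counts genes carrying an annotation term when `list_size` genes are drawn
    without replacement from a background of `background_size` genes, of which
    `annotated_in_background` carry the term.
    """
    upper = min(list_size, annotated_in_background)
    favourable = sum(
        comb(annotated_in_background, i)
        * comb(background_size - annotated_in_background, list_size - i)
        for i in range(hits_in_list, upper + 1)
    )
    return favourable / comb(background_size, list_size)


# Hypothetical example: 12 of 200 stress-responsive genes carry a term
# that is found on 150 of 20,000 background genes (about 1.5 expected by chance).
print(enrichment_p_value(12, 200, 150, 20000))
```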
'OMICS' DATA REPOSITORIES
I. SEQUENCE SIMILARITY SEARCH
Find a protein sequence: text search
Based on Pair-Wise Comparisons
BLOSUM scoring matrix
PAM scoring matrix
Dynamic Programming Algorithms
Global Similarity: Needleman-Wunsch
(GAP/BestFit)
Local Similarity: Smith-Waterman (SSEARCH)
Heuristic Algorithms (Sequence Database Searching)
FASTA: Based on K-Tuples (2-Amino Acid)
BLAST: Triples of Conserved Amino Acids
Gapped-BLAST: Allow Gaps in Segment Pairs
(NREF)
PHI-BLAST: Pattern-Hit Initiated Search (NCBI)
PSI-BLAST: Iterative Search (NCBI)
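FASTA gains its speed by first looking for short identical words (k-tuples) shared by the query and a database sequence, then grouping those matches by diagonal. Dense diagonals mark regions worth aligning properly. The sketch below shows only that seeding step and assumes k = 2, as listed for proteins above. BLAST differs in that it also accepts similar, not just identical, words scored with a substitution matrix; this sketch does not model that. The sequences are hypothetical.

```python
from collections import defaultdict


def kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every length-k word in seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_hits(query: str, subject: str, k: int = 2) -> dict[int, int]:
    """Count identical k-word matches per diagonal (subject_pos - query_pos)."""
    subject_index = kmer_index(subject, k)
    diagonals = defaultdict(int)
    for i in range(len(query) - k + 1):
        for j in subject_index.get(query[i:i + k], []):
            diagonals[j - i] += 1
    return dict(diagonals)


# Hypothetical protein fragments: the query occurs in the subject shifted by one residue.
print(diagonal_hits("MKTAYIAK", "GMKTAYIAKQ", k=2))
```

All seven word matches fall on diagonal 1. In FASTA, a diagonal like this would be selected for rescoring and alignment.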
SEQUENCE SEARCH BY TEXT OR UNIQUE ID
Entrez (http://www.ncbi.nlm.nih.gov/Entrez/)
(http://pir.georgetown.edu/pirwww/search/textsearch.html)
PAIR-WISE COMPARISONS
Scoring matrix
Global and local similarity: dynamic programming (Needleman-Wunsch, Smith-Waterman)
(http://www.ebi.ac.uk/emboss/align/)
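The two dynamic programming algorithms differ mainly in how the score matrix is initialized and where the answer is read off. The sketch below returns scores only, with no traceback. As a simplifying assumption, it uses a fixed match/mismatch score and a linear gap penalty. Tools such as EMBOSS needle and water instead use a BLOSUM or PAM matrix and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned entirely to gaps
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned entirely to gaps
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]               # global: both sequences fully consumed


def smith_waterman(a: str, b: str, match: int = 1,
                   mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score: cells are floored at 0 and the maximum cell wins."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(needleman_wunsch("GATTACA", "GATGACA"))
print(smith_waterman("XXGATTACAYY", "ZZGATTACAWW"))
```

The local score ignores the unrelated flanks entirely. A global alignment of the same pair would be penalized for them.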
FASTA SEARCH
(http://www.ebi.ac.uk/fasta33/)
(http://pir.georgetown.edu/pirwww/search/fasta.html)
GAPPED-BLAST SEARCH
(http://pir.georgetown.edu/pirwww/search/pirnref.shtml)
(http://www.ncbi.nlm.nih.gov/BLAST/)
A BLAST RESULT
THE DIFFERENT VERSIONS OF BLAST
PSI-BLAST ITERATIVE SEARCH
(http://www.ncbi.nlm.nih.gov/BLAST/)
PSI-BLAST
II. FAMILY CLASSIFICATION METHODS
Multiple Sequence Alignment and Phylogenetic Analysis
ClustalW Multiple Sequence Alignment
Alignment Editor & Phylogenetic Trees
Searches Based on Family Information
PROSITE Pattern Search
Motif and Profile Search
Hidden Markov Models (HMMs)
MULTIPLE SEQUENCE ALIGNMENT
ClustalW
(http://pir.georgetown.edu/pirwww/search/multaln.html)
ALIGNMENT EDITOR (JALVIEW)
(http://www.ebi.ac.uk/clustalw/)
ALIGNMENT EDITOR (GENEDOC)
(http://www.psc.edu/biomed/genedoc/)
PHYLOGENETIC ANALYSIS
Tree Programs: (http://evolution.genetics.washington.edu/phylip.html)
Tree Searches: (http://pauling.mbu.iisc.ernet.in/~pali/index.html)
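Distance-based tree programs, such as those in the PHYLIP package, start from a matrix of pairwise distances between aligned sequences. The minimal sketch below computes the uncorrected p-distance. PHYLIP's distance programs (e.g. protdist, dnadist) instead apply evolutionary-model corrections. The resulting matrix would then be passed to a clustering method such as neighbor-joining or UPGMA, which is not shown. The alignment is hypothetical.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites, ignoring columns where either sequence has a gap."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must come from the same alignment")
    compared = differences = 0
    for x, y in zip(seq1, seq2):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else 0.0


def distance_matrix(alignment: dict[str, str]) -> list[list[float]]:
    """Pairwise p-distances in the order the sequences appear in `alignment`."""
    names = list(alignment)
    return [[p_distance(alignment[a], alignment[b]) for b in names] for a in names]


# Hypothetical aligned fragments (e.g. the output of a ClustalW run)
aln = {"seqA": "MKTAYIAK", "seqB": "MKTAHIAK", "seqC": "MRTA-LAR"}
for row in distance_matrix(aln):
    print([round(d, 3) for d in row])
```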
PHYLOGENETIC TREES
(IGFBP SUPERFAMILY)
(Radial Tree)
(Phylogram)
PROSITE PATTERN SEARCH
(http://pir.georgetown.edu/pirwww/search/patmatch.html)
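PROSITE patterns use a compact syntax that maps almost one-to-one onto regular expressions, and that mapping is essentially how a pattern search works. The sketch below is a simplified translator that assumes well-formed patterns without the rarer syntax, such as '>' inside brackets. It is not the ScanProsite implementation. The pattern shown is the N-glycosylation site (PROSITE PS00001), and the query sequence is hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression.

    Handles x (any residue), [..] (allowed), {..} (forbidden), repeats such as
    x(2) or x(2,4), and the < / > terminal anchors.
    """
    regex_parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Split an element such as "x(2,4)" into core "x" and repeat bounds 2, 4
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            part = "."
        elif core.startswith("{") and core.endswith("}"):
            part = "[^" + core[1:-1] + "]"
        else:
            part = core  # a single residue or a [..] class is already valid regex
        if low:
            part += "{" + low + ("," + high if high else "") + "}"
        regex_parts.append(prefix + part + suffix)
    return "".join(regex_parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")
print(regex, [m.start() for m in re.finditer(regex, "AANVSPQNGTA")])
```

The first N in the query is rejected because the fourth position is P. Only the motif starting at position 7 (0-based) matches.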
PROFILE SEARCH
(http://bmerc-www.bu.edu/bioinformatics/profile_request.html)
HIDDEN MARKOV MODEL SEARCH
(http://www.sanger.ac.uk/Software/Pfam/search.shtml)
(http://smart.embl-heidelberg.de)
III. STRUCTURAL PREDICTION METHODS
Signal Peptide: SIGFIND, SignalP
Transmembrane Helix: TMHMM, TMAP
2D Prediction (α-helix, β-sheet, Coiled-coils): PHD, JPred
3D Modeling: Homology Modeling (Modeller, SWISS-MODEL), Threading, Ab-initio Prediction
STRUCTURE PREDICTION: A GUIDE
(http://speedy.embl-heidelberg.de/gtsp/flowchart2.html)
PROTEIN PREDICTION SERVER
(http://www.cbs.dtu.dk/services/)
SIGNAL PEPTIDE PREDICTION
(http://www.stepc.gr/~synaptic/sigfind.html)
(http://www.cbs.dtu.dk/services/SignalP-2.0)
TRANSMEMBRANE HELIX
(http://www.cbs.dtu.dk/services/TMHMM/)
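TMHMM predicts transmembrane helices with a hidden Markov model. An older and much simpler approach, shown here only to illustrate what such predictors look for, is the Kyte-Doolittle hydropathy plot. The sketch averages residue hydropathy over a sliding window, using the window of 19 residues and threshold of 1.6 that Kyte and Doolittle suggested for membrane-spanning segments. It is not TMHMM's method, and the sequence is hypothetical.

```python
# Kyte-Doolittle hydropathy values per residue
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of each window; entry i covers seq[i:i + window]."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_windows(seq: str, window: int = 19,
                         threshold: float = 1.6) -> list[int]:
    """Start positions (0-based) of windows hydrophobic enough to span a membrane."""
    return [i for i, h in enumerate(hydropathy_profile(seq, window)) if h > threshold]


# Hypothetical sequence: a hydrophobic stretch flanked by charged residues
sequence = "KDEKRDEKSN" + "LIVALLFAVILGLIAVLVF" + "RKDEKRDESK"
hits = candidate_tm_windows(sequence)
print(hits[0], hits[-1], len(hits))
```

Neighbouring windows overlap, so a single helix produces a run of consecutive start positions rather than one hit.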
PROTEIN STRUCTURE PREDICTION
(http://cmgm.stanford.edu/WWW/www_predict.html)
(http://restools.sdsc.edu/biotools/biotools9.html)
STRUCTURE PREDICTION SERVER
(http://cubic.bioc.columbia.edu/predictprotein/)
(http://www.compbio.dundee.ac.uk/WWW_Servers/JPred/jpred.html)
3D-MODELLING
(http://www.salilab.org/modeller/modeller.html)
(http://www.expasy.ch/swissmod/SWISS-MODEL.html)
IV. PROTEIN FAMILY DATABASES
Whole Proteins
PIR: Superfamilies and Families
COG (Clusters of Orthologous Groups) of Complete Genomes
ProtoNet: Automated Hierarchical Classification of Proteins
Protein Domains
Pfam: Alignments and HMM Models of Protein Domains
SMART: Protein Domain Families
Protein Motifs
PROSITE: Protein Patterns and Profiles
BLOCKS: Protein Sequence Motifs and Alignments
PRINTS: Protein Sequence Motifs and Signatures
Integrated Family Databases
iProClass: Superfamilies/Families, Domains, Motifs, Rich Links
InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART
PROTEIN CLUSTERING
(http://www.ncbi.nlm.nih.gov/COG/)
PROTEIN DOMAINS
Pfam (http://www.sanger.ac.uk/Software/Pfam/)
SMART (http://smart.embl-heidelberg.de/smart/show_motifs.pl)
PROTEIN MOTIFS
PROSITE is a database of protein families and domains.
It consists of biologically significant sites, patterns and
profiles. (http://www.expasy.ch/prosite/)
INTEGRATED FAMILY CLASSIFICATION
InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom,
Pfam, SMART, TIGRFAMs and PIRSF.
(http://www.ebi.ac.uk/interpro/search.html)
V. DATABASES OF PROTEIN FUNCTIONS
Metabolic Pathways, Enzymes, and Compounds
Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
EcoCyc: Encyclopedia of E. coli Genes and Metabolism
MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
WIT: Functional Curation and Metabolic Models
BRENDA: Enzyme Database
UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
Klotho: Collection and Categorization of Biological Compounds
Cellular Regulation and Gene Networks
EpoDB: Genes Expressed during Human Erythropoiesis
BIND: Descriptions of interactions, molecular complexes and pathways
DIP: Catalogs experimentally determined interactions between proteins
RegulonDB: Escherichia coli Pathways and Regulation
KEGG METABOLIC & REGULATORY PATHWAYS
(http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00590+874)
KEGG is a suite of databases and associated software, integrating our current knowledge
on molecular interaction networks, the information of genes and proteins, and of chemical
compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html)
BIOCYC (ECOCYC/METACYC METABOLIC PATHWAYS)
The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)
PROTEIN-PROTEIN INTERACTIONS: DIP
(http://dip.doe-mbi.ucla.edu/)
PROTEIN-PROTEIN INTERACTION: BIND
(http://www.bind.ca/)
BIOCARTA CELLULAR PATHWAYS
(http://www.biocarta.com/index.asp)
VI. DATABASES OF PROTEIN STRUCTURES
Protein Structure and Classification
PDB: Structure Determined by X-ray Crystallography and NMR
CATH: Hierarchical Classification of Protein Domain Structures
SCOP: Familial and Structural Protein Relationships
FSSP: Protein Fold Family Database
Protein Sequence-Structure Relationship
PIR-NRL3D: Protein Sequence-Structure Database
PIR-RESID: Protein Structure/Post-Translational Modifications
HSSP: Families and Alignments of Structurally-Conserved Regions
PDB STRUCTURE DATA
(http://www.rcsb.org/pdb/)
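PDB entries are distributed as text files in which each ATOM or HETATM record stores its fields in fixed columns. The sketch below reads those columns directly and assumes the legacy PDB format; the archive now also uses mmCIF. For real work, a maintained parser such as Biopython's Bio.PDB is a safer choice. The two records below are illustrative and are not quoted from a specific entry.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(pdb_text: str) -> list[Atom]:
    """Read ATOM/HETATM records using the fixed columns of the legacy PDB format."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                name=line[12:16].strip(),      # columns 13-16
                res_name=line[17:20].strip(),  # columns 18-20
                chain=line[21],                # column 22
                res_seq=int(line[22:26]),      # columns 23-26
                x=float(line[30:38]),          # columns 31-38
                y=float(line[38:46]),          # columns 39-46
                z=float(line[46:54]),          # columns 47-54
            ))
    return atoms


pdb_text = """\
ATOM      1  N   MET A   1      38.198  19.582  28.157
ATOM      2  CA  MET A   1      38.961  19.092  29.342
"""
n_atom, ca_atom = parse_atoms(pdb_text)
# Euclidean distance between the two atoms, in angstroms
bond = math.dist((n_atom.x, n_atom.y, n_atom.z), (ca_atom.x, ca_atom.y, ca_atom.z))
print(n_atom.res_name, n_atom.chain, n_atom.res_seq, round(bond, 3))
```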
PDBSUM: Summary and Analysis
(http://www.biochem.ucl.ac.uk/bsm/pdbsum)
PDB: EXPERIMENTAL 3D STRUCTURE REPOSITORY
(http://www.rcsb.org/pdb/)
Rat gamma-crystallin, chain A, B.
Can you do a text search at PIR to find this?
PDBSUM: Summary and Analysis
(http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/)
Search, 3-D structure summary, 2-D structure
PROTEIN STRUCTURAL CLASSIFICATION
CATH: Hierarchical domain classification of protein structures
(http://www.biochem.ucl.ac.uk/bsm/cath_new/)
PROTEIN STRUCTURAL CLASSIFICATION
(http://scop.mrc-lmb.cam.ac.uk/scop/)
The SCOP database aims to provide a detailed and comprehensive
description of the structural and evolutionary relationships between
all proteins whose structure is known, including all entries in the PDB.
VII. PROTEOMIC RESOURCES
GELBANK (http://gelbank.anl.gov): 2D-gel patterns from completed
genomes; SWISS-2DPAGE (http://www.expasy.org/ch2d/)
PEP: Predictions for Entire Proteomes (http://cubic.bioc.columbia.edu/pep/):
summarized analyses of protein sequences
Proteome BioKnowledge Library (http://www.proteome.com): detailed
information on human, mouse and rat proteomes
Proteome Analysis Database (http://www.ebi.ac.uk/proteome/): online
application of InterPro and CluSTr for the functional classification of
proteins in whole genomes
Expression profiling databases: GNF (http://expression.gnf.org/cgi-bin/index.cgi;
human and mouse transcriptome), SMD (http://genome-www5.stanford.edu/MicroArray/SMD/;
Stanford microarray data analysis), EBI Microarray Informatics
(http://www.ebi.ac.uk/microarray/index.html; managing, storing and analyzing microarray data)
2D-GEL IMAGE DATABASES (2)
(http://us.expasy.org/ch2d/2d-index.html)
(http://us.expasy.org/cgi-bin/nice2dpage.pl?P06493)
VIII. PROTEOME ANALYSIS
(http://www.ebi.ac.uk/proteome)
EXPRESSION PROFILING
Human and Mouse Transcriptome
(http://expression.gnf.org/cgi-bin/index.cgi)
(http://genome-www.stanford.edu/serum/)
OMICS TOOLS
(http://omictools.com/)
OMICS TOOLS METAGENOMICS ANALYSIS
OMICS TOOLS MASS SPECTROMETRY ANALYSIS
OMICS TOOLS COMMON TOOLS
Q & A
QUESTIONS & ANSWERS
Omics era

Omics era

  • 1.
    OMICS ERA Dr. HetalkumarPanchal Associate Professor Gujarat Agricultural Biotechnology Institute (GABI) Navsari Agriculture University, Athwa Farm, Surat – 395007 swamihetal@gmail.com 29th Refresher Course :Bio-Sciences and Bio-Enginering (ID) (02/06/2014 to 2/06/2014) UGC-Academic Staff College, Sardar Patel University, Vallbh Vidyangar -38120, Dist. Anand, (Gujarat)
  • 2.
    • OMICS – Theterm ‘‘omic’’ is derived from the Latin suffix ‘‘ome’’ meaning mass or many. Thus, OMICS involve a mass (large number) of measurements per endpoint. (Jackson et al., 2006) • Integration of OMICS data – Efficient integration of data from different OMICS can greatly facilitate the discovery of true causes and states of disease, mostly done by softwares (Andrew et al., 2006). What is ‘omics’?
  • 3.
    What is ‘omics’? •In biological context , suffix –omics is used to refer to the study of large sets of biological molecules (Smith et al., 2005) • The realization that DNA is not alone regulate complex biological processes (as a result of HGP, 2001), triggered the rapid development of several fields in molecular biology that together are described with the term OMICS. • The OMICS field ranges from – Genomics (focused on the genome) – Proteomics (focused on large sets of proteins, the proteome) – Metabolomics (focused on large sets of small molecules, the metabolome).
  • 4.
    TYPES OF OMICS Genomics Computationalgenomics Epigenomics Functional genomics Immunomics Metagenomics Pathogenomics Regenomics Personal genomics Proteomics Psychogenomics 4
  • 5.
    GENOMICS • The fieldof genomics has been divided into 3 major categories. – Genotyping (focused on the genome sequence), • The physiological function of genes and the elucidation of the role of specific genes in disease susceptibility (Syvanen, 2001) – Transcriptomics (focused on genomic expression) • The abundance of specific mRNA transcripts in a biological sample is a reflection of the expression levels of the corresponding genes (Manning et al., 2007) – Epigenomics (focused on epigenetic regulation of genome expression) • Study of epigenetic processes (expression activities not involving DNA) on a large (ultimately genome-wide) scale (Feinberg, 2007)
  • 6.
    GENOTYPING • Goal – Identificationof the physiological function of genes – Role of specific genes in disease susceptibility (syvanen et al., 2001) • Common Parameter used – Among different variations (insertions, deletions, SNPs, etc.), single nucleotide polymorphisms (SNPs) are the most commonly investigated (Sachidanandam et al., 2001) and can be used as markers for diseases. – Tag SNPs (informative subset of SNPs) and fine mapping are further used to identify true cause of phenotype (patil et al., 2001). • Application – Identification of genes associated with disease • Recent improvement in genotyping – Array-based genotyping techniques, allowing the simultaneous assessment (up to 1 million SNPs) per assay, leads to the genotyping of entire genome known as genome-wide association studies (GWAS) Jelly et al., 2010)
  • 7.
    TRANSCRIPTOMICS • Gene expressionprofiling – The identification and characterization of the mixture of mRNA that is present in a specific sample. • Principle – The abundance of specific mRNA transcripts in a biological sample is a reflection of the expression levels of the corresponding genes (Manning et al., 2007). • Application – To associate differences in mRNA mixtures originating from different groups of individuals to phenotypic differences between the groups (Nachtomy et al., 2007). • Challenge – The transcriptome in contrast to the genome is highly variable over time, between cell types and environmental changes (Celis et al., 2000).
  • 8.
    EPIGENOMICS • Epigenetic processes –Mechanisms other than changes in DNA sequence that cause effect in gene transcription and gene silencing30-32. – Number of mechanisms of epigenomics but is mainly based on two mechanisms, DNA methylation and histone modification28 33- 39. – Recently RNAi has acquired considerable attention31 40 41. • Goal – The focus of epigenomics is to study epigenetic processes on a large (ultimately genome-wide) scale to assess the effect on disease28 29. • Association with disease – Hypermethylation of CpG islands located in promoter regions of genes is related to gene silencing. 28 36. Altered gene silencing plays a causal role in human disease31 34 37 38 42. – Histone proteins are involved in the structural packaging of DNA in the chromatin complex. Post translational histone modifications such as acetylation and methylation are believed to regulate chromatin structure and therefore gene expression34 37
  • 9.
    PROTEOMICS • Proteomics providesinsights into the role proteins in biological systems. The proteome consists of all proteins present in specific cell types or tissue and highly variable over time, between cell types and will change in response to changes in its environment, a major challenge (Fliser et al., 2007). • The overall function of cells can be described by the proteins (intra- and inter- cellular )and the abundance of these proteins (Sellers et al., 2003) • Although all proteins are directly correlated to mRNA (transcriptome) , post translational modifications (PTM) and environmental interactions impede to predict from gene expression analysis alone (Hanash et al., 2008) • Tools for proteomics – Mainly two different approaches that are based on detection by • mass spectrometry (MS) and • protein microarrays using capturing agents such as antibodies. • Major focuses – the identification of proteins and proteins interacting in protein-complexes – Then the quantification of the protein abundance. The abundance of a specific protein is related to its role in cell function (Fliser et al., 2007)
  • 10.
    METABOLOMICS • The metabolomeconsists of small molecules (e.g. lipids or vitamins) that are also known as metabolites (Claudino et al., 2007). • Metabolites are involved in the energy transmission in cells (metabolism) by interacting with other biological molecules following metabolic pathways. • Metabolic phenotypes are the by-products of interactions between genetic, environmental, lifestyle and other factors (Holmes et al., 2008). • The metabolome is highly variable and time dependent, and it consists of a wide range of chemical structures. • An important challenge of metabolomics is to acquire qualitative and quantitative information with preturbance of environment (Jelly et al., 2010)
  • 11.
    METABOLITES, METABOLOME &METABONOMICS Metabolites are the intermediates and products of metabolism. Within the context of metabolomics, a metabolite is usually defined as any molecule less than 1 kDa in size. Metabolome refers to the complete set of small-molecule metabolites (such as metabolic intermediates, hormones and other signaling molecules, and secondary metabolites) to be found within a biological sample. The word was coined in analogy with transcriptomics and proteomics; like the transcriptome and the proteome, the metabolome is dynamic, i.e. changing from second to second. Metabonomics is defined as "the quantitative measurement of the dynamic multiparametric metabolic response of living systems to pathophysiological stimuli or genetic modification". 11
  • 12.
    METAGENOMICS, Metagenomics is thestudy of metagenomes, genetic material recovered directly from environmental samples. The broad field may also be referred to as environmental genomics, ecogenomics or community genomics. 12
  • 13.
    COMPUTATIONAL GENOMICS Computational genomics(often referred to as Computational Genetics) refers to the use of computational and statistical analysis to decipher biology from genome sequences and related data,[1] including both DNA and RNA sequence as well as other "post-genomic" data (i.e. experimental data obtained with technologies that require the genome sequence, such as genomic DNA microarrays). These, in combination with computational and statistical approaches to understanding the function of the genes and statistical association analysis, this field is also often referred to as Computational and Statistical Genetics/genomics. 13
  • 14.
    EPIGENETICS Genomic modifications thatalter gene expression that cannot be attributed to modification of the primary DNA sequence and that are heritable mitotically and meiotically are classified as epigenetic modifications. DNA methylation and histone modification are among the best characterized epigenetic processes 14
  • 15.
    FUNCTIONAL GENOMICS Functional genomicsis a field of molecular biology that attempts to make use of the vast wealth of data produced by genomic projects (such as genome sequencing projects) to describe gene (and protein) functions and interactions. Unlike genomics, functional genomics focuses on the dynamic aspects such as gene transcription, translation, and protein–protein interactions, as opposed to the static aspects of the genomic information such as DNA sequence or structures. Functional genomics attempts to answer questions about the function of DNA at the levels of genes, RNA transcripts, and protein products. A key characteristic of functional genomics studies is their genome-wide approach to these questions, generally involving high-throughput methods rather than a more traditional “gene-by-gene” approach. 15
  • 16.
    IMMUNOMICS Immunomics is thestudy of immune system regulation and response to pathogens using genome-wide approaches. With the rise of genomic and proteomic technologies, scientists have been able to visualize biological networks and infer interrelationships between genes and/or proteins; recently, these technologies have been used to help better understand how the immune system functions and how it is regulated. 16
  • 17.
    PATHOGENOMICS Pathogen infections areamong the leading causes of infirmity and mortality among humans and other animals in the world.[1] Until recently, it has been difficult to compile information to understand the generation of pathogen virulence factors as well as pathogen behaviour in a host environment. The study of Pathogenomics attempts to utilize genomic and metagenomics data gathered from high through-put technologies (e.g. sequencing or DNA microarrays), to understand microbe diversity and interaction as well as host-microbe interactions involved in disease states. The bulk of pathogenomics research concerns itself with pathogens that affect human health; however, studies also exist for plant and animal infecting microbes. 17
  • 18.
    REGENOMICS Regenomics represents themerger of two fields of scientific endeavor: Regenerative medicine[1] and genomics.[2][3][4] New technologies to reprogram aged somatic cells back to pluripotency and to restore telomere length are currently used in research in regenerative medicine,[5] though FDA-approved cellular therapies using reprogrammed cells are currently not available in the United States.[6] The culture and banking of somatic cells also allows the parallel sequencing of their nuclear DNA to provide individuals with potentially valuable information for guiding them in lifestyle choices, but also one day, potentially in preventative strategies where cell types are made in advance for high risk categories of disease, i.e. preparing cardiac progenitor cells for individuals at high risk for heart disease. 18
PERSONAL GENOMICS
• Personal genomics is the branch of genomics concerned with the sequencing and analysis of the genome of an individual.
• The genotyping stage employs different techniques, including single-nucleotide polymorphism (SNP) analysis chips (typically covering about 0.02% of the genome) or partial or full genome sequencing.
• Once the genotypes are known, the individual's genotype can be compared with the published literature to determine the likelihood of trait expression and disease risk.
• Use of personal genomics in predictive and precision medicine
– Predictive medicine is the use of the information produced by personal genomics techniques when deciding what medical treatments are appropriate for a particular individual.
– Precision medicine is focused on "a new taxonomy of human disease based on molecular biology".
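As a rough illustration of comparing a genotype with published associations, the Python sketch below computes an additive risk score: each copy of a risk allele adds that SNP's effect weight. The SNP names and weights are hypothetical placeholders, not values from any study; a real score would rely on published effect sizes and a validated model.

```python
# Hypothetical risk alleles and per-copy effect weights (illustrative only,
# not taken from any published study).
RISK_ALLELES = {
    "snp_A": ("T", 0.20),
    "snp_B": ("G", 0.35),
    "snp_C": ("A", -0.10),
}


def additive_score(genotypes: dict) -> float:
    """Sum weight x (copies of the risk allele) over the SNPs that were genotyped."""
    score = 0.0
    for snp, (risk_allele, weight) in RISK_ALLELES.items():
        genotype = genotypes.get(snp)
        if genotype is None:
            continue  # SNP not typed (e.g. absent from the chip): no contribution
        score += weight * genotype.count(risk_allele)
    return score


# A diploid genotype is written as two allele letters, e.g. "AG".
person = {"snp_A": "TT", "snp_B": "AG", "snp_C": "CC"}
print(round(additive_score(person), 2))  # 0.75
```

Missing SNPs are skipped rather than imputed, which is one of several simplifying assumptions in this sketch.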
• Stress-responsive Transcription Factors DataBase (STIFDB)
• Database for Annotation, Visualization and Integrated Discovery (DAVID)
I. SEQUENCE SIMILARITY SEARCH
• Find a protein sequence: text search
• Pair-wise comparisons: BLOSUM and PAM scoring matrices
• Dynamic programming algorithms
– Global similarity: Needleman-Wunsch (GAP)
– Local similarity: Smith-Waterman (BestFit, SSEARCH)
• Heuristic algorithms (sequence database searching)
– FASTA: based on k-tuples (2 amino acids)
– BLAST: triples of conserved amino acids
– Gapped-BLAST: allows gaps in segment pairs (NREF)
– PHI-BLAST: pattern-hit initiated search (NCBI)
– PSI-BLAST: iterative search (NCBI)
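The heuristic methods listed above start from short exact word matches instead of a full dynamic-programming alignment. The Python sketch below shows a FASTA-like first step under simplifying assumptions: it indexes every k-letter word (k = 2, matching the 2-amino-acid k-tuples above) in a subject sequence, collects words shared with the query, and counts hits per diagonal. Real FASTA and BLAST then score and extend these seeds (BLAST also accepts similar, not only identical, words), which is omitted here.

```python
from collections import Counter, defaultdict


def kmer_index(seq: str, k: int) -> dict:
    """Map each k-letter word to the list of positions where it starts in seq."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def word_hits(query: str, subject: str, k: int = 2) -> list:
    """Return (query_pos, subject_pos) pairs where both sequences share an exact k-word."""
    index = kmer_index(subject, k)
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


def best_diagonal(hits: list):
    """Diagonal = subject_pos - query_pos; several hits on one diagonal suggest an ungapped match."""
    counts = Counter(j - i for i, j in hits)
    return counts.most_common(1)[0] if counts else None


hits = word_hits("HEAGAWGHEE", "PAWHEAE")
print(hits)                 # [(0, 3), (1, 4), (4, 1), (7, 3)]
print(best_diagonal(hits))  # (3, 2)
```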
SEQUENCE SEARCH BY TEXT OR UNIQUE ID
• Entrez (http://www.ncbi.nlm.nih.gov/Entrez/)
• PIR text search (http://pir.georgetown.edu/pirwww/search/textsearch.html)
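Entrez text searches can also be run programmatically through NCBI's E-utilities, whose esearch endpoint takes a database name and an Entrez query. The Python sketch below only builds the request URL (no network call is made); the example query and the retmax value are illustrative assumptions, and NCBI's usage guidelines (rate limits, tool/email parameters) should be checked before sending real requests.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities; esearch.fcgi returns the IDs matching a query.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an esearch URL for an Entrez database (e.g. 'protein', 'nucleotide')."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


url = esearch_url("protein", "gamma crystallin[Title] AND rat[Organism]")
print(url)
```

The returned ID list can then be passed to the efetch utility to retrieve the records themselves.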
PAIR-WISE COMPARISONS
• Scoring matrix
• Global and local similarity: dynamic programming (Needleman-Wunsch, Smith-Waterman)
• EMBOSS Align (http://www.ebi.ac.uk/emboss/align/)
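The dynamic-programming idea behind both Needleman-Wunsch (global) and Smith-Waterman (local) alignment can be sketched in a few lines of Python. This is a score-only sketch that assumes a simple match/mismatch score and a linear gap penalty in place of a BLOSUM or PAM matrix with affine gaps; the traceback that recovers the alignment itself is omitted.

```python
def align_score(a: str, b: str, local: bool = False,
                match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal alignment score of a and b.

    local=False: Needleman-Wunsch (global); local=True: Smith-Waterman (local).
    match/mismatch are simplified stand-ins for a substitution matrix such as BLOSUM62.
    """
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, rows):
            F[i][0] = i * gap
        for j in range(1, cols):
            F[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(F[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                       F[i - 1][j] + gap,    # a[i-1] against a gap in b
                       F[i][j - 1] + gap)    # b[j-1] against a gap in a
            if local:
                cell = max(cell, 0)          # local: restart instead of going negative
                best = max(best, cell)
            F[i][j] = cell
    return best if local else F[-1][-1]


print(align_score("AAA", "A"))              # -3 (one match, two gaps)
print(align_score("AAA", "A", local=True))  # 1
```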
II. FAMILY CLASSIFICATION METHODS
• Multiple sequence alignment and phylogenetic analysis
– ClustalW multiple sequence alignment
– Alignment editor and phylogenetic trees
• Searches based on family information
– PROSITE pattern search
– Motif and profile search
– Hidden Markov models (HMMs)
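Profile searches score a sequence against position-specific preferences derived from an alignment. As a minimal Python sketch, assuming ungapped aligned sequences, a pseudocount of 1 and a uniform background over the 20 amino acids, the code below builds a log-odds position-specific scoring matrix (PSSM) and slides it along a query. The profiles and HMMs used by family databases add position-specific gap handling and better background models.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment: list, pseudocount: float = 1.0) -> list:
    """Log2-odds score per column for each amino acid, versus a uniform background."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Gaps or non-standard letters in this column are simply not counted.
        column = [seq[col] for seq in alignment if seq[col] in AMINO_ACIDS]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2((column.count(aa) + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_hit(pssm: list, seq: str):
    """Return (start, score) of the highest-scoring window, or None if seq is too short."""
    width = len(pssm)
    best = None
    for start in range(len(seq) - width + 1):
        # Residues outside the 20-letter alphabet score 0 (treated as neutral).
        score = sum(pssm[k].get(seq[start + k], 0.0) for k in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best


pssm = build_pssm(["ACDE", "ACDE", "ACDF"])
print(best_hit(pssm, "WWACDEWW")[0])  # 2
```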
PHYLOGENETIC ANALYSIS
• Tree programs: PHYLIP (http://evolution.genetics.washington.edu/phylip.html)
• Tree searches: (http://pauling.mbu.iisc.ernet.in/~pali/index.html)
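Distance-based tree building is one of the approaches offered by tree programs such as PHYLIP. The Python sketch below implements UPGMA, a simple distance method that repeatedly joins the closest pair of clusters and averages distances by cluster size. It assumes a roughly constant rate of evolution (a molecular clock), returns only the Newick topology without branch lengths, and uses a made-up distance matrix.

```python
def upgma(labels: list, dist: list) -> str:
    """Cluster taxa from a symmetric distance matrix; return a Newick topology."""
    n = len(labels)
    clusters = {i: (labels[i], 1) for i in range(n)}                    # id -> (newick, size)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}  # keys have i < j
    next_id = n
    while len(clusters) > 1:
        i, j = min(d, key=d.get)                                         # closest pair
        (newick_i, size_i), (newick_j, size_j) = clusters.pop(i), clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster: size-weighted average.
        merged = {
            k: (size_i * d[min(i, k), max(i, k)] + size_j * d[min(j, k), max(j, k)])
               / (size_i + size_j)
            for k in clusters
        }
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        for k, v in merged.items():
            d[(k, next_id)] = v                                          # k < next_id keeps order
        clusters[next_id] = (f"({newick_i},{newick_j})", size_i + size_j)
        next_id += 1
    newick, _ = next(iter(clusters.values()))
    return newick + ";"


# Hypothetical pairwise distances between four sequences.
labels = ["A", "B", "C", "D"]
dist = [[0, 2, 8, 8],
        [2, 0, 8, 8],
        [8, 8, 0, 4],
        [8, 8, 4, 0]]
print(upgma(labels, dist))  # ((A,B),(C,D));
```

Neighbor-joining, which does not assume a molecular clock, is the usual alternative when rates vary between lineages.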
HIDDEN MARKOV MODEL SEARCH
• Pfam (http://www.sanger.ac.uk/Software/Pfam/search.shtml)
• SMART (http://smart.embl-heidelberg.de)
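Pfam and SMART compare sequences against profile hidden Markov models. A full profile HMM (match, insert and delete states for every alignment column, as in HMMER) is beyond a short example, so the Python sketch below shows the core dynamic-programming step, the Viterbi algorithm, on a toy two-state model. The states, alphabet and probabilities are hypothetical: 'B' (background) and 'D' (domain) emit 'h' (hydrophobic) or 'p' (polar) residues.

```python
import math


def viterbi(obs: str, states: str, start_p: dict, trans_p: dict, emit_p: dict):
    """Most probable state path for obs, computed in log space to avoid underflow."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor state r for being in state s at position t.
            prev, score = max(((r, V[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                              key=lambda pair: pair[1])
            V[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):  # follow back-pointers to position 0
        path.append(back[t][path[-1]])
    path.reverse()
    return "".join(path), V[-1][last]


# Hypothetical toy model: 'B' = background, 'D' = domain.
start_p = {"B": 0.9, "D": 0.1}
trans_p = {"B": {"B": 0.9, "D": 0.1}, "D": {"B": 0.1, "D": 0.9}}
emit_p = {"B": {"h": 0.3, "p": 0.7}, "D": {"h": 0.8, "p": 0.2}}

path, log_prob = viterbi("pphhhhhhpp", "BD", start_p, trans_p, emit_p)
print(path)  # BBDDDDDDBB
```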
III. STRUCTURAL PREDICTION METHODS
• Signal peptide: SIGFIND, SignalP
• Transmembrane helix: TMHMM, TMAP
• 2D prediction (α-helix, β-sheet, coiled coils): PHD, JPred
• 3D modeling: homology modeling (Modeller, SWISS-MODEL), threading, ab initio prediction
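Transmembrane-helix predictors such as TMHMM are trained probabilistic models. A much older and simpler technique, shown here instead, is the Kyte-Doolittle hydropathy plot: average a hydropathy scale over a sliding window and flag stretches that stay above a threshold. The Python sketch below uses the Kyte-Doolittle scale with the commonly quoted 19-residue window and 1.6 threshold; treat these settings, and the output, as a rough screen rather than a prediction.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Average hydropathy of each window; entry i covers seq[i:i + window]."""
    return [sum(KD[aa] for aa in seq[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def tm_candidates(seq: str, window: int = 19, threshold: float = 1.6) -> list:
    """Merge overlapping above-threshold windows into (start, end) spans, end exclusive."""
    segments = []
    for start, avg in enumerate(hydropathy_profile(seq, window)):
        if avg < threshold:
            continue
        end = start + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # overlaps previous span: extend it
        else:
            segments.append((start, end))
    return segments


# Toy sequence: a 20-residue leucine stretch flanked by lysines.
seq = "K" * 10 + "L" * 20 + "K" * 10
print(tm_candidates(seq))  # [(5, 35)]
```

The reported span is wider than the leucine stretch because windows that include a few flanking lysines still average above the threshold.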
IV. PROTEIN FAMILY DATABASES
• Whole proteins
– PIR: superfamilies and families
– COG (Clusters of Orthologous Groups) of complete genomes
– ProtoNet: automated hierarchical classification of proteins
• Protein domains
– Pfam: alignments and HMM models of protein domains
– SMART: protein domain families
• Protein motifs
– PROSITE: protein patterns and profiles
– BLOCKS: protein sequence motifs and alignments
– PRINTS: protein sequence motifs and signatures
• Integrated family databases
– iProClass: superfamilies/families, domains, motifs, rich links
– InterPro: integrates Pfam, PRINTS, PROSITE, ProDom, SMART
PROTEIN DOMAINS
• Pfam (http://www.sanger.ac.uk/Software/Pfam/)
• SMART (http://smart.embl-heidelberg.de/smart/show_motifs.pl)
PROTEIN MOTIFS
• PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles (http://www.expasy.ch/prosite/).
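PROSITE patterns use a compact syntax: elements are separated by '-', 'x' means any residue, [..] lists allowed residues, {..} lists forbidden ones, (n) or (n,m) gives repetition, and '<' or '>' anchor the pattern to the N- or C-terminus. The Python sketch below translates that core syntax into a regular expression. It does not handle every extension (for example '>' inside square brackets), so treat it as an illustration rather than a replacement for ScanProsite.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate the core PROSITE pattern syntax into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    anchor_start = pattern.startswith("<")
    anchor_end = pattern.endswith(">")
    pattern = pattern.lstrip("<").rstrip(">")
    parts = []
    for element in pattern.split("-"):
        # An element is a residue spec optionally followed by (n) or (n,m).
        core, low, high = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            regex = "."                         # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"     # any residue except these
        else:
            regex = core                        # single residue or [allowed set]
        if low is not None:
            regex += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(regex)
    return ("^" if anchor_start else "") + "".join(parts) + ("$" if anchor_end else "")


# PROSITE's N-glycosylation site pattern: N-{P}-[ST]-{P}.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)                                # N[^P][ST][^P]
print(re.search(regex, "MKNGTAP").start())  # 2
```

Note that `re.finditer` reports non-overlapping matches only, so overlapping motif occurrences need a lookahead-based search.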
INTEGRATED FAMILY CLASSIFICATION
• InterPro: an integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs and PIRSF (http://www.ebi.ac.uk/interpro/search.html)
V. DATABASES OF PROTEIN FUNCTIONS
• Metabolic pathways, enzymes and compounds
– Enzyme Classification: classification and nomenclature of enzyme-catalysed reactions (EC-IUBMB)
– KEGG (Kyoto Encyclopedia of Genes and Genomes): metabolic pathways
– LIGAND (at KEGG): chemical compounds, reactions and enzymes
– EcoCyc: encyclopedia of E. coli genes and metabolism
– MetaCyc: metabolic encyclopedia (metabolic pathways)
– WIT: functional curation and metabolic models
– BRENDA: enzyme database
– UM-BBD: microbial biocatalytic reactions and biodegradation pathways
– Klotho: collection and categorization of biological compounds
• Cellular regulation and gene networks
– EpoDB: genes expressed during human erythropoiesis
– BIND: descriptions of interactions, molecular complexes and pathways
– DIP: catalogs experimentally determined interactions between proteins
– RegulonDB: Escherichia coli pathways and regulation
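EC numbers encode the IUBMB enzyme classification as four dot-separated levels: class, subclass, sub-subclass and serial number (for example, EC 3.4.21.4 is trypsin). The small Python parser below is a sketch that splits an EC number into those levels and names the top-level class; partially specified numbers such as 3.4.-.- are accepted, and class 7 (translocases), added in 2018, is included.

```python
# Top-level EC classes (class 7, translocases, was added in 2018).
EC_CLASSES = {
    1: "Oxidoreductases", 2: "Transferases", 3: "Hydrolases",
    4: "Lyases", 5: "Isomerases", 6: "Ligases", 7: "Translocases",
}


def parse_ec(ec: str) -> dict:
    """Split an EC number such as 'EC 3.4.21.4' into its four hierarchy levels."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    fields = text.split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec!r}")
    # '-' marks an unspecified level; non-numeric serials (e.g. preliminary 'n' numbers) stay text.
    levels = [None if f == "-" else int(f) if f.isdigit() else f for f in fields]
    return {
        "class": levels[0],
        "class_name": EC_CLASSES.get(levels[0], "unknown"),
        "subclass": levels[1],
        "sub_subclass": levels[2],
        "serial": levels[3],
    }


print(parse_ec("EC 3.4.21.4"))
```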
KEGG METABOLIC & REGULATORY PATHWAYS
• KEGG is a suite of databases and associated software integrating our current knowledge of molecular interaction networks, information on genes and proteins, and chemical compounds and reactions (http://www.genome.ad.jp/kegg/kegg2.html).
• Pathway map example (http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00590+874)
BIOCYC (ECOCYC/METACYC METABOLIC PATHWAYS)
• The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)
VI. DATABASES OF PROTEIN STRUCTURES
• Protein structure and classification
– PDB: structures determined by X-ray crystallography and NMR
– CATH: hierarchical classification of protein domain structures
– SCOP: familial and structural protein relationships
– FSSP: protein fold family database
• Protein sequence-structure relationship
– PIR-NRL3D: protein sequence-structure database
– PIR-RESID: protein structure/post-translational modifications
– HSSP: families and alignments of structurally conserved regions
PDB: EXPERIMENTAL 3D STRUCTURE REPOSITORY
• PDB (http://www.rcsb.org/pdb/)
• Rat gamma-crystallin, chains A and B. Can you do a text search at PIR to find this?
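Structures downloaded from the PDB in the legacy PDB text format store atoms in fixed-width ATOM/HETATM records. The Python sketch below reads alpha-carbon (CA) coordinates using the standard column positions. The two sample records are made up for illustration, and some entries are distributed only in mmCIF format, which needs a different parser (for example Biopython's).

```python
def parse_atoms(pdb_text: str, atom_name: str = "CA") -> list:
    """Extract atoms named atom_name from ATOM/HETATM records (fixed PDB columns)."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        if line[12:16].strip() != atom_name:  # columns 13-16: atom name
            continue
        atoms.append({
            "res_name": line[17:20].strip(),  # columns 18-20
            "chain": line[21],                # column 22
            "res_seq": int(line[22:26]),      # columns 23-26
            "xyz": (float(line[30:38]),       # columns 31-38: x
                    float(line[38:46]),       # columns 39-46: y
                    float(line[46:54])),      # columns 47-54: z
        })
    return atoms


# Made-up records in PDB column layout (not taken from a real entry).
SAMPLE = """\
ATOM      1  N   GLY A   1      11.281   7.584  -6.300
ATOM      2  CA  GLY A   1      11.104   6.134  -6.504
"""
print(parse_atoms(SAMPLE))
```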
PROTEIN STRUCTURAL CLASSIFICATION
• CATH: hierarchical domain classification of protein structures (http://www.biochem.ucl.ac.uk/bsm/cath_new/)
PROTEIN STRUCTURAL CLASSIFICATION
• SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/): the SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in the PDB.
VII. PROTEOMIC RESOURCES
• GELBANK (http://gelbank.anl.gov): 2D-gel patterns from completed genomes
• SWISS-2DPAGE (http://www.expasy.org/ch2d/)
• PEP: Predictions for Entire Proteomes (http://cubic.bioc.columbia.edu/pep/): summarized analyses of protein sequences
• Proteome BioKnowledge Library (http://www.proteome.com): detailed information on human, mouse and rat proteomes
• Proteome Analysis Database (http://www.ebi.ac.uk/proteome/): online application of InterPro and CluSTr for the functional classification of proteins in whole genomes
• Expression profiling databases
– GNF (http://expression.gnf.org/cgi-bin/index.cgi): human and mouse transcriptome
– SMD (http://genome-www5.stanford.edu/MicroArray/SMD/): Stanford microarray data analysis
– EBI Microarray Informatics (http://www.ebi.ac.uk/microarray/index.html): managing, storing and analyzing microarray data
2D-GEL IMAGE DATABASES (2)
• SWISS-2DPAGE (http://us.expasy.org/ch2d/2d-index.html)
• (http://us.expasy.org/cgi-bin/nice2dpage.pl?P06493)
EXPRESSION PROFILING
• Human and mouse transcriptome (http://expression.gnf.org/cgi-bin/index.cgi)
• (http://genome-www.stanford.edu/serum/)
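Expression profiling resources such as these compare transcript abundance across samples or conditions. A common first summary is the log2 fold change between group means; the Python sketch below computes it with a pseudocount of 1 (an assumption that avoids division by zero for unexpressed genes). The gene names and values are hypothetical, and real analyses normalize the data and test significance before interpreting fold changes.

```python
import math


def log2_fold_change(treated: list, control: list, pseudocount: float = 1.0) -> float:
    """log2((mean(treated) + pc) / (mean(control) + pc)); positive means higher in treated."""
    mean_t = sum(treated) / len(treated)
    mean_c = sum(control) / len(control)
    return math.log2((mean_t + pseudocount) / (mean_c + pseudocount))


# Hypothetical expression values (arbitrary units) for two made-up genes.
expression = {
    "gene_1": {"treated": [15.0, 15.0], "control": [3.0, 3.0]},
    "gene_2": {"treated": [4.0, 6.0], "control": [10.0, 12.0]},
}
for gene, groups in expression.items():
    print(gene, round(log2_fold_change(groups["treated"], groups["control"]), 2))
```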
OMICS TOOLS: MASS SPECTROMETRY ANALYSIS
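A basic step in mass-spectrometry-based proteomics is comparing observed m/z values with the theoretical masses of candidate peptides. The Python sketch below computes a peptide's monoisotopic mass from standard monoisotopic residue masses plus one water, and its m/z at a given charge by adding protons. It assumes unmodified residues (no fixed cysteine modification, no oxidation), so values for real search settings will differ.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids, unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O for the free N- and C-termini
PROTON = 1.007276   # proton mass, added once per positive charge


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mz(seq: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (peptide_mass(seq) + charge * PROTON) / charge


print(round(peptide_mass("G"), 4))  # 75.032 (free glycine)
```

Leucine and isoleucine have identical masses, which is why mass alone cannot distinguish them.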
Q & A: QUESTIONS & ANSWERS