Comparative proteogenomics using mass spectrometry data from multiple genomes can address problems that a single genome approach cannot. It helps identify rare post-translational modifications, resolve "one-hit wonders" by looking for correlated peptides in orthologous proteins across species, and identify programmed frameshifts and sequencing errors. The approach is demonstrated through an analysis of mass spectrometry data from three Shewanella bacteria genomes, improving gene predictions and annotations compared to existing tools.
Genome
The genome contains all the biological information required to build
and maintain any given living organism.
The genome contains the organism's molecular history.
Decoding the biological information encoded in these molecules will
have an enormous impact on our understanding of biology.
1865 Mendel discovers laws of genetics
1900 Rediscovery of Mendel’s genetics
1944 DNA identified as hereditary material
1953 DNA structure
1960’s Genetic code
1977 Advent of DNA sequencing
1975-79 First human genes isolated
1986 DNA sequencing automated
~50 years
1990 Human genome project officially begins
1995 First whole genome
1999 First human chromosome
2003 ‘Finished’ human genome sequence
How much can sequence data
alone tell us?
• The answer is that a DNA sequence taken in isolation from a
single organism reveals very little.
• The vast majority of DNA in most organisms is noncoding. Protein
coding sequences or genes cannot function as isolated units without
interaction with noncoding DNA and neighboring genes.
• This genomic environment is specific to each organism. In order to
understand this we need to look at similar genes in different
organisms, to determine how function and position have changed
over the course of evolution.
• By understanding evolutionary processes we can gain a greater
insight into what makes a gene and the wider processes of genetics
and inheritance.
Comparative Genomics
• Study of the relationship of genome structure and function across
different biological species or strains
• One of the important goals of the field is the identification of the
mechanisms of eukaryotic genome evolution.
• It is however often complicated by the multiplicity of events that have
taken place throughout the history of individual lineages, leaving
only distorted and superimposed traces in the genome of each living
organism.
• For this reason comparative genomics studies of small model
organisms (for example the model Caenorhabditis elegans and
closely related Caenorhabditis briggsae) are of great importance to
advance our understanding of general mechanisms of evolution.
• "Proteome" is a blend of "protein" and "genome".
• Studying the proteome gives a much better understanding of an organism
than genomics alone, for two reasons.
• First, the level of transcription of a gene gives only a rough estimate
of its level of expression into a protein. An mRNA produced in
abundance may be degraded rapidly or translated inefficiently,
resulting in a small amount of protein.
• Second, many proteins experience post-translational
modifications that profoundly affect their activities; for
example, some proteins are not active until they become
phosphorylated.
Human Genome Project
• An international scientific research project with a primary goal of
determining the sequence of chemical base pairs which make up
DNA, and of identifying and mapping the approximately 20,000-
25,000 genes of the human genome from both a physical and
functional standpoint
• The project began in October 1990 and was initially headed by Ari
Patrinos, head of the Office of Biological and Environmental
Research in the U.S. Department of Energy's Office of Science.
• While the objective of the Human Genome Project is to understand
the genetic makeup of the human species, the project has also
focused on several other nonhuman organisms such as E. coli, the
fruit fly, and the laboratory mouse.
http://www.ornl.gov/sci/techresources/Human_Genome/research/function.shtml
Original goals
• Construction of a high-resolution genetic map of the human genome;
• production of a variety of physical maps of all human
chromosomes and of the DNA of selected model organisms;
• determination of the complete sequence of human and selected
model-organism DNA;
• development of capabilities for collecting, storing, distributing, and
analyzing the data produced; and
• creation of appropriate technologies necessary to achieve these
objectives.
The Human Genome Project
This was a huge technical undertaking, so further
aims of the project were:
• Develop and improve technologies for: DNA sequencing, physical
and genetic mapping, database design, informatics, public access
• Genome projects of 5 model organisms, e.g. E. coli, S.
cerevisiae, C. elegans, D. melanogaster, M. musculus:
(1) to provide information about these organisms, and
(2) as test cases for refinement and implementation of various tools
required for the HGP
• There are approximately 23,000 genes in human beings, the same
range as in mice and roundworms. Understanding how these genes
express themselves will provide clues to how diseases are caused
http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
Advantages of the Human Genome Project:
• Knowledge of the effects of DNA variation among individuals can
revolutionize the ways we diagnose, treat and even prevent a number
of diseases that affect human beings.
• It provides clues to the understanding of human biology.
• The functions of human genes and other DNA regions often are
revealed by studying their parallels in nonhumans.
• To enable such comparisons, HGP researchers have obtained
complete genomic sequences for the bacterium Escherichia coli, the
yeast Saccharomyces cerevisiae, the roundworm Caenorhabditis
elegans, the fruit fly Drosophila melanogaster, the laboratory
mouse, and many other organisms.
• The availability of complete genome sequences generated both
inside and outside the HGP is driving a major breakthrough in
fundamental biology as scientists compare entire genomes to gain
new insights into evolutionary, biochemical, genetic, metabolic, and
physiological pathways.
To use this data to understand
physiology, behaviour, disease and variation between
species/individuals, we need to know:
• The evolutionary history of every genetic element (every base)
• Evolutionary forces shaping the genome
• Structural and sequence variation in the population and between
species.
Comparative genomics studies differences between genome
sequences, pinpointing changes over time. Comparing the
number and type of changes against the background of expected “neutral”
changes provides a better understanding of the forces that shaped
genomes and traits.
Introduction
• Mass spectrometry recently emerged as a valuable technique for
proteogenomic annotations that improves on the state of the art in
predicting genes and other features. Previous proteogenomic
approaches were limited to a single genome and did not take advantage
of analyzing mass spectrometry data from multiple genomes at once.
We show that such a comparative proteogenomics approach allows one
to address the problems that remained beyond the reach of the
traditional “single proteome” approach in mass spectrometry.
• In particular, we show how comparative proteogenomics addresses the
notoriously difficult problem of “one-hit-wonders” in proteomics, improves
on the existing gene prediction tools in genomics and allows
identification of rare post-translational modifications.
http://genome.cshlp.org/content/18/7/1133.short
Developments
• Since the sequencing of the first genome, Haemophilus influenzae,
in 1995, the number of sequenced genomes has been rising sharply.
Every sequencing project is followed by annotation of the genome to
identify genes, pathways, etc.
• MS-Genome software for automated proteogenomic annotation of
bacterial genomes was developed and applied for improving
annotation of Shewanella oneidensis MR-1, a model bacterium for
studies of bioremediation and metal reduction. However, the synergy
between MS/MS data from different species was never explored in
the past. We show that such comparative proteogenomics analysis
sheds new light on the annotations of both genomes and proteomes.
Continued…
• Similar to Expressed Sequence Tag (EST) studies, mass spectrometry
experiments generate Expressed Protein Tags (EPTs) that provide
valuable information about expressed proteins. Unlike ESTs, EPTs
are relatively uniformly distributed along the protein length and provide
information about translational starts, proteolytic events and
post-translational modifications, making it nontrivial to transfer
existing EST approaches to the EPT domain.
• Here, we will analyze MS/MS data sets for the three Shewanella
bacteria representing multiple growth conditions: S. oneidensis MR-1
(~14.5 million spectra), S. frigidimarina (~0.955 million spectra) and S.
putrefaciens CN-32 (~0.768 million spectra). In addition to predicting new
genes and finding errors in existing annotations, we will show that
MS/MS data help to identify programmed frameshifts, a difficult
problem in genomics. We will also demonstrate that comparative
analysis of peptides across species is helpful in resolving the dilemma
of “one-hit-wonders” in proteomics.
Methods
• Peptide identification
Peptide identification had been performed for S. oneidensis (So) earlier. The MS/MS spectra were acquired on
ion-trap mass spectrometers using electrospray ionization. The authors used
InsPecT to search the spectra of each species against a database
containing the six-frame translation of the genome along with common
contaminants and a decoy database of the same size.
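A minimal sketch of what a six-frame translation means, assuming the standard genetic code. The function names are illustrative and are not taken from InsPecT; a real search database would also split the translations at stop codons and add the contaminant and decoy sequences.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate one reading frame; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate both strands in all three offsets (frames +1..+3, -1..-3)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames
```

Searching spectra against all six frames, rather than against annotated proteins only, is what lets peptides from unannotated or mis-annotated regions be found.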
The InsPecT score threshold was selected for each case so that the number
of identifications in the decoy database was at most 1% of the number of
identifications in the target database, keeping the false discovery rate
under control. After this filtering step, we obtain the peptides in all three
species that do not match the annotated proteins in these genomes.
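The threshold selection described above can be sketched as follows. This is an illustrative sketch, not InsPecT's own code: it assumes higher scores are better and that each identification carries a single score, and the function name is hypothetical.

```python
def score_threshold_for_fdr(target_scores, decoy_scores, max_ratio=0.01):
    """Return the most permissive cutoff at which decoy hits are at most
    max_ratio times the target hits, or None if no cutoff qualifies.

    Assumption: an identification is accepted when score >= cutoff.
    """
    candidates = sorted(set(target_scores) | set(decoy_scores))
    for cutoff in candidates:
        n_target = sum(s >= cutoff for s in target_scores)
        n_decoy = sum(s >= cutoff for s in decoy_scores)
        if n_target > 0 and n_decoy <= max_ratio * n_target:
            return cutoff
    return None
```

With max_ratio=0.01 this mirrors the 1% decoy-to-target limit used for the InsPecT search; the same function with max_ratio=0.05 corresponds to a 5% limit.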
• Analyzing late start codons
We describe an algorithm for predicting "late" start codons, i.e. the (correct) start
codons that are located downstream from the wrongly annotated start codons.
While a late start codon implies a “missing” peptide at the beginning of the
protein, such missing peptides can also be caused by low peptide detectability or
may simply represent signal peptides.
However, noncovered peptides at the beginning of the protein that cannot be
explained by the signal peptide consensus sequence point to late start codons.
Within 18 residues of the start, there are 33 cases of N-terminal-most
noncovered peptides in So. Many of them start with an ATG start codon or start
immediately after a start codon. The distribution of codons for amino acids at
positions 1 and -1 in the peptides is non-uniform, which argues that
these cannot all be artifacts.
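As a rough illustration only (this is not the authors' algorithm), candidate late starts can be listed by scanning for in-frame start codons between the annotated start and the N-terminal-most observed peptide. The function name, the 0-based codon indexing and the default restriction to ATG are assumptions made for the sketch.

```python
# Assumption: only ATG is considered by default; pass ("ATG", "GTG", "TTG")
# to allow the alternative start codons also used by bacteria.
START_CODONS = ("ATG",)


def candidate_late_starts(gene_dna, first_peptide_start, start_codons=START_CODONS):
    """List 0-based codon indices of in-frame start codons located downstream
    of the annotated start (codon 0) and no later than the first peptide.

    gene_dna: coding-strand sequence beginning at the annotated start codon.
    first_peptide_start: 0-based residue index of the N-terminal-most peptide.
    """
    gene_dna = gene_dna.upper()
    hits = []
    for codon_index in range(1, first_peptide_start + 1):
        codon = gene_dna[3 * codon_index:3 * codon_index + 3]
        if codon in start_codons:
            hits.append(codon_index)
    return hits
```

A candidate at the peptide's own position or one codon before it matches the observation that many of these peptides start with, or immediately after, a start codon.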
• Correlated peptides
Traditional MS/MS analysis is focused on identification of proteins and is less
concerned with the question of which peptides in a protein are observed and
which are not. In a typical mass spectrometry experiment, some peptides with low
detectability are always missed, resulting in highly non-uniform protein
coverage by identified peptides.
Peptide detectability depends on protein abundance, peptide
length, peptide hydrophobicity, etc., and several groups are using large data
sets to develop methods for predicting it. Peptides identified by MS/MS in
two species are called correlated peptides if they are observed at the same
position in the protein alignment or one of them spans the other.
For example, if one peptide is located at position (start1, end1) and the other at
(start2, end2) in the alignment, then they are considered correlated if
start1 ≤ start2 ≤ end2 ≤ end1
or
start2 ≤ start1 ≤ end1 ≤ end2
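The containment condition above translates directly into code. A minimal sketch, assuming each peptide is given as a (start, end) pair of alignment coordinates; the function name is illustrative.

```python
def are_correlated(pep1, pep2):
    """Return True if one peptide's alignment interval spans the other's.

    Each peptide is a (start, end) pair in alignment coordinates.
    Identical intervals count as correlated (same position).
    """
    start1, end1 = pep1
    start2, end2 = pep2
    return (start1 <= start2 <= end2 <= end1) or (start2 <= start1 <= end1 <= end2)
```

Note that a partial overlap, where neither interval contains the other, does not count as correlated under this definition.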
• Identification of post-translational modifications
MS-Alignment was used to identify PTMs in each of the three
organisms in a blind mode, in the range from -200 to 250 Da.
Common contaminants like keratin were included in protein sequence
databases. A decoy database of the same size as the actual protein
database, containing shuffled sequences was used to control error
rate. Any hits to decoy database are expected to be incorrect
identifications.
A score cutoff is chosen such that the number of PTMs identified
in the decoy database is at most 5% of the number of identifications in
the target database. This provides a controlled PTM site-specific false
discovery rate of 5%. All spectra that were identified in the regular
InsPecT search were removed. After this post-processing of the
MS-Alignment results, 9917, 7649, and 6709 PTMs were obtained in So, Sf
and Sp, respectively.
Results
• Multiple Shewanella genomes
The three Shewanella species used in this study were sequenced. The
protein orthology assignments across different Shewanella species
were prepared using INPARANOID and subsequently aligned by MUSCLE.
Figure: Expression of orthologous genes across the three species. (A) The
number of orthologs shared between different species. There are 2590
orthologous genes present in all three species (referred to as “shared
genes”). (B) The number of expressed shared genes among the three
species; 1052 shared genes are expressed in all three species.
• Protein Identification
MS-based protein identification can be done to analyze the
expression of pathways or functional categories. Having proteomic
data for these three species allows us to compare the expression of
pathways and identify which pathways are conserved or differentially
expressed across these species.
• Resolving one-hit wonders
There are 1052 shared genes that are expressed in all three species.
However, as per guidelines, we require at least two peptides to
consider a protein as expressed. Since almost every analysis of
MS/MS data sets reveals a large number of proteins with a single
identified peptide, it leads to a significant reduction in the number of
identified proteins.
While orthologous one-hit-wonders are strong indicators of protein
expression, peptides identified at the same orthologous positions in
different species provide overwhelming evidence that the proteins are
expressed.
We should observe orthologous peptides in closely related species.
We thus check if the only peptide observed in the protein is
correlated between multiple species. If the peptide identification is
spurious, it is very unlikely that the peptide will be at the same position
as the observed peptides in its orthologs.
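The check described above can be sketched as a small decision rule. This is a hedged illustration: the two-peptide requirement comes from the slides, while the function names and the use of (start, end) alignment coordinates are assumptions for the example.

```python
def spans(outer, inner):
    """True if the interval `outer` contains the interval `inner`."""
    return outer[0] <= inner[0] <= inner[1] <= outer[1]


def is_expressed(peptides, ortholog_peptides):
    """Decide whether a protein counts as expressed.

    peptides: (start, end) alignment intervals of peptides identified in
        this species' protein.
    ortholog_peptides: intervals of peptides identified in the orthologous
        proteins of the other species, in the same alignment coordinates.
    """
    if len(peptides) >= 2:
        return True  # the usual two-peptide rule is already satisfied
    if len(peptides) == 1:
        only = peptides[0]
        # A one-hit wonder is rescued if its peptide is correlated with a
        # peptide observed in an ortholog.
        return any(spans(only, o) or spans(o, only) for o in ortholog_peptides)
    return False
```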
Aligned amino acid sequences of the shared gene (annotated as
hypothetical lipoprotein). The identified peptides are shown in blue.
Identification of programmed frameshifts and
sequencing errors
• A frameshift occurs when a ribosome skips one or more nucleotides in
an mRNA sequence, thereby changing the reading frame to produce
different protein sequence from the original frame. In programmed
frameshifts, this phenomenon is built into the translational machinery.
Mass spectrometry provides experimental evidence for the actual
translation products and allows one to detect frameshifts. The
presence of peptides from two different reading frames within the
region of a predicted gene may represent
(1) an incorrect peptide identification,
(2) an insertion/deletion sequencing error,
(3) overlapping genes in different frames or
(4) a programmed frameshift.
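A minimal sketch of the first screening step, flagging predicted genes whose region contains peptides from more than one reading frame. The input format, a list of (gene_id, frame) pairs, is an assumption for illustration; deciding which of the four explanations applies still needs the follow-up analysis described above.

```python
from collections import defaultdict


def genes_with_mixed_frames(peptide_hits):
    """Return {gene_id: sorted frames} for genes with peptides in >1 frame.

    peptide_hits: iterable of (gene_id, frame) pairs, one per peptide mapped
        inside a predicted gene region; frames are labels such as '+1'.
    """
    frames_by_gene = defaultdict(set)
    for gene_id, frame in peptide_hits:
        frames_by_gene[gene_id].add(frame)
    return {
        gene: sorted(frames)
        for gene, frames in frames_by_gene.items()
        if len(frames) > 1
    }
```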
Proteolytic events
An in vivo proteolytic event can be observed as a non-tryptic peptide.
However, non-tryptic peptides may also be observed due to other reasons
such as degradation of tryptic peptides or incorrect peptide identifications.
After applying the filtering approach and removing the cuts explained by trypsin
specificity, we obtain some putative proteolytic sites in these three species.
To check whether these are conserved between multiple organisms, we
map them onto the alignment of orthologous proteins. The number of proteolytic
sites conserved between two or more organisms was greater than expected by
chance, which argues that the conserved sites reported here
are not the result of non-specific degradation.
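To make "cuts explained by trypsin specificity" concrete, here is a simplified sketch. It assumes trypsin cleaves after K or R and, as a simplification, ignores the common exception for a following proline; the function name and coordinate convention are illustrative.

```python
def unexplained_cuts(protein, start, end):
    """Return cut positions of the peptide protein[start:end] that are not
    explained by trypsin specificity or by the protein termini.

    A cut position p is the boundary between protein[p - 1] and protein[p].
    Assumption: trypsin cleaves after K or R (proline rule not modelled).
    """
    cuts = []
    for cut in (start, end):
        at_protein_terminus = cut == 0 or cut == len(protein)
        after_k_or_r = cut > 0 and protein[cut - 1] in "KR"
        if not (at_protein_terminus or after_k_or_r):
            cuts.append(cut)
    return cuts
```

Cut positions returned by such a filter are only putative proteolytic sites; mapping them onto the ortholog alignment, as described above, is what separates conserved sites from non-specific degradation.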
Post-translational modifications
• While algorithms for blind searches for unexpected modifications have been
developed, they have had to rely on the “strength in numbers” principle to
distinguish real modifications from computational artifacts. As a result,
biologically important modifications that appear only a few times in the
genome are likely to be classified as computational artifacts.
Blind PTM searches with MS-Alignment find all possible mass offsets
without a priori knowledge of which modifications may be present in the
sample. Since blind searches may yield thousands of modifications, the
strength in numbers approach considers frequent modifications reliable and
discards rare modifications as unreliable.
After the post processing of MS-Alignment results, we find 162 different
modifications that are observed in all three species. While 74 of these
represent chemical adducts that are expected in mass spectrometry
experiments, 88 others reveal biologically interesting modifications as well
as other potentially important modifications that remain unknown.
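One way to picture the cross-species filter is as a set intersection over modification types. The sketch below is an assumption-laden illustration, not the MS-Alignment post-processing itself: it represents a modification as a (residue, mass offset) pair and rounds offsets to the nearest dalton before comparing.

```python
def shared_modifications(mods_by_species):
    """Return the modification types observed in every species.

    mods_by_species: dict mapping a species label to an iterable of
        (residue, mass_offset_da) pairs.
    Assumption: offsets are compared after rounding to the nearest Da.
    """
    rounded = [
        {(residue, round(offset)) for residue, offset in mods}
        for mods in mods_by_species.values()
    ]
    return set.intersection(*rounded) if rounded else set()
```

A modification seen only a few times in each genome can survive this filter, because support comes from recurrence across species instead of from raw counts within one data set.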
Some more Applications:
•“RNA editing” is difficult to confirm by MS-based analysis of a single genome
since amino acid mutations can also be explained by DNA sequencing errors or
false peptide identifications. While mass spectrometry is routinely used for
confirming RNA editing events in a case-by-case fashion, it was never used for
genome-wide discovery of RNA editing. The comparative proteogenomics
analysis of related species would be a simple way to rule out such alternative
explanations and to confirm RNA editing.
• While “signal peptides” are important for understanding protein function, they
are difficult to confirm experimentally, and computational tools are used to fill
the gap. Comparative proteogenomics opens a possibility to construct the first
reliable data set of all signal peptides in a set of genomes and to study
evolution of signal peptides across multiple species.
• “Operon prediction” in bacterial genomes is an important but still unsolved
problem. Since peptide detectability varies between species, we expect that a
comparative proteogenomics approach based on signatures may minimize
errors and improve on existing operon predictions.
• Comparative analysis of a large and growing number
of diverse sequenced genomes is revolutionizing the
pace of gene discovery.
• A common theme of these efforts is the integration of
various types of genomic evidence such as clustering
of genes on the chromosome, protein fusion
events, occurrence profiles or signatures and shared
regulatory sites to infer functional coupling for proteins
participating in related cellular processes.
• It is primarily focused on which components (e.g.
metabolic enzymes) are actually present and which
should be present but cannot be identified and thus
provides a rather specific and precise notion of what is
actually missing.
Source- Missing genes in metabolic pathways: a comparative genomics approach by Andrei Osterman and
Ross Overbeek
Human coenzyme A biosynthesis
• Only the gene for human pantothenate kinase (PANK) was known.
• Given the conservation of this pathway at the functional level between
humans and bacteria, bacterial genes were projected onto the human genome
using comparative genomics.
Source- Complete Reconstitution of the Human Coenzyme A Biosynthetic Pathway via Comparative Genomics
by Matthew Daugherty, Boris Polanuyer, Michael Farrell, Michael Scholle, Athanasios Lykidis, Valérie de
Crécy-Lagard, and Andrei Osterman
Pathway enzymes: PANK, PPCS, PPCDC, PPAT, DPCK
• PSI-BLAST searches identified three proteins in the human cDNA
sequence database as strong homologs of E. coli CoA biosynthesis
enzymes.
•One homolog was found for PPCDC and two homologs were found for
DPCK.
•No reliable homologs could be found for E. coli PPCS or PPAT
•The predicted human PPCDC appeared to be a mono-functional enzyme.
This is in contrast to most bacteria, in which PPCDC is fused with PPCS
forming a bi-functional protein.
•Among prokaryotic genomes, only streptococci and enterococci contain
mono-functional PPCDC genes. Bacterial mono-functional PPCDC from
these organisms shows the highest sequence similarity to human mono-
functional PPCDC.
• In the same bacterial genomes, PPCS is also mono-functional and is
found in the same operon with PPCDC. Using this unique mono-functional
PPCS from Streptococcus pneumoniae, a candidate for human mono-
functional PPCS was identified with a reliable similarity.
•There was marginal similarity between PPCS domains of bacterial bi-
functional proteins and putative human mono-functional PPCS.
•Biochemical analysis in rat and pig suggested the existence of a non-
dissociable complex, potentially a bi-functional fusion protein, of PPAT and
DPCK.
•Based on the biochemical evidence of PPAT/DPCK fusion, additional
searches in the human expressed sequence tag database were
performed, revealing that the predicted human DPCK open reading frame
was potentially 5′-truncated.
The analysis of publicly available human genomic data allows us to
establish chromosomal localization of all genes encoding the final four
steps of CoA biosynthesis. Most of them exist as single copies:
a. PPCS on chromosome 1
b. PPCDC on chromosome 15
c. PPAT/DPCK on chromosome 17
Comparative proteome analysis of Helicobacter pylori clinical
strains by two-dimensional gel electrophoresis
Objective: To investigate the pathogenic properties of Helicobacter pylori
by comparing the proteome maps of H. pylori clinical strains.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3190097/
Introduction
• Gastric cancer is the fourth most common cancer and the second
most common cause of cancer deaths.
• There is a multistep progression stage from pre-malignancy to
invasive malignancy.
• Thus, early gastric cancer diagnosis for effective preventive
strategies and therapeutics against gastric cancer is urgently
required.
• Based on the results of epidemiological and clinical studies, the World
Health Organization (WHO) declared H. pylori a definite
carcinogen in 1994.
• The evidence comes mainly from epidemiological studies showing
that the risk of developing stomach diseases is higher in
persons infected with cag pathogenicity island (cagPAI)-positive
H. pylori than in those infected with cagPAI-negative strains.
• Gastric epithelial cell damage may be a consequence of the
inflammatory responses induced by H. pylori infection.
• cagA and iceA have been proposed as biomarkers that might
predict the risk for symptomatic clinical outcomes.
• However, they cannot explain why H. pylori strains isolated from
asymptomatic patients express CagA and VacA at the same frequency
as strains isolated from
patients with peptic ulcer or gastric cancer.
• Neither iceA nor the combination of iceA/vacA/cagA is helpful in predicting
the clinical presentation of H. pylori infection. H. pylori
strain-specific factors may influence the pathogenicity of different H. pylori
isolates and the presentation of a clinical outcome.
• There should be other proteins/ factors cooperating with CagA and
VacA to induce or promote the development of disease.
Necessity of proteome analysis
• The complete genome of strains provides sufficient genetic
information for proteome analysis of H. pylori.
• The fact that 35 genes of H. pylori 11637 give rise to 93 proteins
suggests that H. pylori proteins undergo a high degree of
post-translational modification.
• A comparative proteome map of H. pylori strains should be
beneficial to investigate the pathogenic properties of these
organisms.
Experiment
• Materials and methods :
• Two wild-type H. pylori strains:
YN8 (isolated from biopsy tissue of a gastric cancer patient)
YN14 (isolated from biopsy tissue of a gastritis and duodenal ulcer patient)
Experimental procedure:
H. pylori and protein preparation
Protein assay
Two-dimensional gel electrophoresis
In-gel digestion
• Quadrupole time-of-flight (Q-TOF) mass
spectrometry analysis and database search
The peptide analysis was carried out as per the protocols of the
supplier (Bio-Rad).
Mass spectra were obtained using the Q-TOF mass
spectrometer.
MS/MS data were searched against the NCBInr protein sequence
database (http://www.ncbi.nlm.nih.gov) using Mascot
(http://www.matrixscience.com).
Results
• The protein compositions of H. pylori YN8 and YN14
were initially separated on 2-DE gels and visualized by silver
staining.
[Figure: 2-DE gels of YN8 and YN14]
• The protein spots were separated over the molecular
weight (Mr) range of 10–200 kDa and the pI range of
3–10.
• The gel revealed prominent individual proteins with
several protein “families” (most notably as clusters of
bands).
• Although several main spots/clusters were found at
the same position, some proteins visually varied in
expression level.
Protein identification
• Because most expressed proteins were spread
over the center of the pI 3–10 gel, we further
selected the 2-DE experiments of pI 5–8.
[Figure: 2-DE gels (pI 5–8) of YN8 and YN14]
• Then Q-TOF was performed for protein identification with statistical
confidence.
• Seven of nine protein spots were identified using the protein database
(http://www.ncbi.nlm.nih.gov)
• Interestingly, the same amino acid sequence has a
different protein definition depending on the individual
strain, e.g., IVESDAITALIQR is defined as hydantoin
utilization protein A in H. pylori HPKX, and as a
hypothetical protein in H. pylori B128.
• Two of nine proteins are unknown.
Discussion
• H. pylori strains display a high interstrain genomic divergence. This
high variation at the genomic level does not provide evidence of a
functional protein difference between strains because silent
mutations happen naturally.
• Disordered proteins are considered as disease-initiating factors.
• Thus, authors focused on protein expression levels of H. pylori
isolates.
• From this study, different H. pylori isolates have individual protein
expression levels. The presence or absence of some protein spots
on 2-DE was thought to be useful for H. pylori infection
characterization.
• Disease-specific proteins were thought to be
responsible for the clinical presentation induced by H.
pylori infection. However, none of the seven identified
proteins showed similarities with virulence factors.
• E.g. Dsb family of redox proteins:
H. pylori isolated from
gastric cancer showed a strongly increased level of DsbB-like
protein compared to the strain isolated from
gastritis. This suggests that a strain that produces
more redox proteins when it colonizes human gastric
mucosa may portend a higher risk for gastric cancer.
• Taylor (1992) speculated that H. pylori strains
undergo genomic rearrangements to adapt to a new
human host environment.
• H. pylori strains vary in the proteins they express or
repress when they infect a human host, not only in terms of
virulence proteins but also in terms of physiological proteins.
Inference
• Comparative analysis of proteins is, to date, a
better method to find a new disease-specific
protein antigen.
• In this preliminary study, variation at the
protein level, of H. pylori isolated from
patients with gastric cancer and gastritis was
confirmed.
• This reveals completely unexpected
complexity and diversity in protein expression.
Future directions
• Comprehensively identify the structural and
functional components encoded in the human
genome
• Develop a detailed understanding of the heritable
variation in the human genome
• Understand evolutionary variation across species
and the mechanisms underlying it
• Develop robust strategies for identifying the
genetic contributions to disease and drug
response
• Develop strategies to identify gene variants
that contribute to good health and resistance
to disease
• Develop genome and proteome approaches to
detect illness and thus accelerate drug
discovery.
Bibliography
• http://genome.cshlp.org/content/18/7/1133.short
• http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3190097/
• Complete Reconstitution of the Human Coenzyme A
Biosynthetic Pathway via Comparative Genomics by
Matthew Daugherty, Boris Polanuyer, Michael
Farrell, Michael Scholle, Athanasios Lykidis, Valérie de
Crécy-Lagard, and Andrei Osterman
• http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml