Multi-Level Comparison of CGI Presence and
Genomic Architecture Across Animal Phylogeny
Christopher Carroll, Lauren Kordonowy, Lindsay Havens, Kaelina Lombardo, Dr. David
Plachetzki, Dr. Matthew MacManes
Introduction:
CpG islands are high-density clusters of cytosine and guanine nucleotides. The islands are
characterized by a lack of 5’-cytosine methylation, contrary to what is typical of background CpG
dinucleotides. This attribute alone makes them curious genomic structures, as ~80% of CpG dinucleotides
in mammalian genomes are estimated to be methylated¹. Additionally, vertebrate genomes are noted to
have low levels of CpGs because methylated cytosines frequently mutate into thymines (through
deamination), a change that becomes fixed during the next cycle of replication, converting CpGs to TpGs².
Research has shown not only that CpG islands (CGIs) correlate strongly with promoter regions and even
exonic regions, but also that the methylation state of CGIs is important for proper transcription. In
mammalian genomes, it is estimated that about 50% of gene promoters are associated with one or more
CGIs². One study found that 72% of promoters in the human genome were associated with CGIs³. A
commonly held view is that the methylation state of CGIs regulates transcription by recruiting proteins
of the transcription machinery that recognize unmethylated CpG moieties. This machinery is thought to
alter chromatin configuration, thus physically regulating transcription. Whereas unmethylated CGIs allow
transcription and gene expression to occur, methylated CGIs prevent the transcriptional machinery from
accessing the transcription start sites (TSSs) of genes³,⁴. Irregular methylation of CGIs in promoter
regions is associated with disease, most notably cancer¹,⁴,⁵. Additional studies have suggested that the
biological relevance of CGIs goes beyond aiding transcription. In the mouse, CGIs in genes located on the
X chromosome became methylated after preceding mechanisms had already silenced the gene, leading to the
hypothesis that methylation is involved in the stabilization of gene silencing and X inactivation⁶. CGIs
are also consistently implicated in genomic imprinting; indeed, CpG methylation is one of the few
well-documented factors of imprinting. The majority of imprinted genes have methylated CGIs on only one
of the parental alleles. Experiments in mouse involving the deletion of the methyltransferase gene Dnmt1
resulted in mice that were deficient in methylation and lacked imprinting⁸. Therefore, CGIs are useful
gene markers as well as important epigenetic elements that merit further research.
Since CGIs have been established as useful gene markers and have been suggested to play roles in so
many genetic regulatory processes, this study attempted to obtain evolutionary context for CGI
development by running a statistical analysis across a representative sample of thirty-four animal
genomes. To provide alternative windows through which to understand and explain possible patterns of
CGIs across this sample, we also examined the relationships between CGIs and genome size, background GC
content, transcription factor diversity, number of unique protein isoforms, and phylogeny. The
phylogenetic logic underpinning this study lies in the tree constructed by Dr. Plachetzki and associates⁷.
Bioinformatic approaches to analyzing genomic CGI content necessarily involve setting statistical
thresholds. Because there is no definitive boundary at which a CGI can be said to start or stop,
definitions of CGIs are somewhat arbitrary by nature, although the accuracy of a given set of boundaries
can be tested on known CGI locations. The validated parameters can then be applied to an entire
analysis. The first definition of CGIs was proposed by Gardiner-Garden and Frommer as 200-bp sequences
with GC content ≥50% and an observed/expected CpG ratio ≥0.6. These parameters were refined
experimentally by Takai and Jones, whose parameters comprised a sequence length of 500 bp with GC content
≥55% and an observed/expected CpG ratio ≥0.65. These thresholds removed most of the Alu elements that
were called as CGIs under the original parameters, yet retained most of the 5’-region CGIs⁹. This study
took advantage of the methods created by Takai and Jones.
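The two sets of thresholds quoted above can be sketched as a simple window test. This is a minimal illustration, not the Takai and Jones program itself: the names `CgiCriteria`, `gc_fraction`, `cpg_obs_exp`, and `meets_criteria` are hypothetical, the quoted lengths are treated as minimum window lengths, and the observed/expected ratio is computed with the standard formula (number of CpG × N) / (number of C × number of G), where N is the window length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CgiCriteria:
    """Thresholds for calling a sequence window a CpG island (hypothetical container)."""
    min_length: int     # minimum window length in bp
    min_gc: float       # minimum GC fraction, 0 to 1
    min_obs_exp: float  # minimum observed/expected CpG ratio


# Threshold values as quoted in the text above.
GARDINER_GARDEN_FROMMER = CgiCriteria(min_length=200, min_gc=0.50, min_obs_exp=0.60)
TAKAI_JONES = CgiCriteria(min_length=500, min_gc=0.55, min_obs_exp=0.65)


def gc_fraction(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    n_c, n_g = seq.count("C"), seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (n_c * n_g)


def meets_criteria(seq: str, criteria: CgiCriteria) -> bool:
    """True if the window satisfies all three thresholds."""
    return (
        len(seq) >= criteria.min_length
        and gc_fraction(seq) >= criteria.min_gc
        and cpg_obs_exp(seq) >= criteria.min_obs_exp
    )
```

A window of 500 bp with exactly 50% GC, for example, can pass the earlier definition while failing the stricter 55% GC requirement, which is the kind of borderline call the refinement was meant to remove.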
This study hypothesizes that the number of CGIs per genome should increase as a function of genome
complexity, that is, with an increase in the number of unique transcription factors (TFs) and coding
sequences (CDSs), which give rise to greater expression variation and control. This study also predicts
that CGI distribution across animal phylogeny should have explanatory power for the relatedness of
species and the topology of the given phylogenetic tree.
Methods:
Although the algorithm proposed by Takai and Jones was used in this study, one notable factor was
altered for the full analysis. Defining a CGI involves considering the GC content of a stretch of DNA,
but this consideration depends on the background GC content. Statistical thresholds for CGIs have
involved setting a fixed GC% parameter; the Takai and Jones algorithm used GC content ≥55%. However, GC
content varies widely within genomes and between species. Average GC content of 100-kb fragments in
humans ranges from 35% to 60%, a range twice as wide as that found in teleostean fishes¹⁰. Ignoring this
dynamic range and assuming a fixed percentage across species might cause data to be erroneously
analyzed. A previous study found that 28% of human gene promoters had CpG content similar to that of the
background GC; these were classified as low CpG concentration (LCG) promoters³. To account for this, this
study employed a GC content threshold of ≥15% of the background GC. The remaining parameters and code
set by Takai and Jones were used unaltered.
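A background-relative GC threshold of this kind might be computed as in the sketch below. The text does not spell out how the 15% figure is applied, so the interpretation here (background GC plus 15 percentage points) is an assumption made only for illustration, as is the choice to ignore ambiguous bases when measuring the background. The function names are hypothetical.

```python
def background_gc(contigs):
    """Genome-wide GC fraction pooled over all contigs.

    ASSUMPTION: N and other ambiguous bases are ignored, so the
    denominator counts only A, C, G, and T.
    """
    gc = at = 0
    for seq in contigs:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        at += seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def gc_threshold(background, offset=0.15):
    """Per-genome GC cutoff for calling CGIs.

    ASSUMPTION: the cutoff is the background GC fraction plus 15
    percentage points, capped at 1.0. The source does not confirm
    this reading of its threshold.
    """
    return min(1.0, background + offset)


# Hypothetical toy genome: three short contigs, one with ambiguous bases.
toy_contigs = ["AATT", "GGCC", "ACGTNN"]
toy_background = background_gc(toy_contigs)
toy_cutoff = gc_threshold(toy_background)
```

Under this reading, a genome with 40% background GC would use a 55% cutoff, matching the fixed Takai and Jones value, while GC-poor or GC-rich genomes would shift the cutoff accordingly.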
Thirty-four genomes, representing the spectrum of animal evolution and diversity, were analyzed.
The annotation software CEGMA was used to assess the completeness of the genome assemblies¹¹. Assembly
quality was measured as the number of contigs per Mb of genome size. CGI data were characterized both as
the raw count of CGIs per genome and as CGI density, that is, the number of CGIs per Mb of genome. This
study also looked for relationships between transcriptional diversity and CGIs. To this end,
TransDecoder was used to cluster similar isoforms, ensuring that the proteins considered in the results
were unique¹². Additionally, Pfam was used to identify unique transcription factors. This gives a
reliable reference for the relative complexity of transcription between species¹³.
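The two per-megabase measures described above (contigs per Mb as a proxy for assembly quality, and CGIs per Mb as CGI density) reduce to the same normalization. The sketch below is illustrative only: `per_megabase` and `summarize_genome` are hypothetical names, and the numbers in the example are invented, not results from this study.

```python
def per_megabase(count, genome_size_bp):
    """Normalize a raw count to a count per megabase (1 Mb = 1,000,000 bp)."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return count / (genome_size_bp / 1_000_000)


def summarize_genome(n_contigs, n_cgis, genome_size_bp):
    """Collect the raw and size-normalized measures for one genome."""
    return {
        "genome_size_mb": genome_size_bp / 1_000_000,
        "contigs_per_mb": per_megabase(n_contigs, genome_size_bp),
        "cgi_count": n_cgis,
        "cgi_density_per_mb": per_megabase(n_cgis, genome_size_bp),
    }


# Invented example values for a single hypothetical genome.
example = summarize_genome(n_contigs=6_000, n_cgis=30_000, genome_size_bp=3_000_000_000)
```

Reporting both the raw count and the density matters because the raw count is expected to grow with genome size on its own, whereas the density is already normalized for it.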
To understand whether the number of CGIs or CGI density gave phylogenetic signal, that is, the degree
to which CGI data can explain the topology of the tree and the relatedness of species, the data were
compared to the phylogenetic tree constructed using Bayesian methods by Plachetzki and associates⁷
(Figure 1). The tree is rooted by the animal outgroup consisting of Monosiga brevicollis and
Salpingoeca rosetta. For comparison with any signal given by the CGI data, unique TFs, background GC
content, genome size, average O/E, and unique CDSs were also tested for phylogenetic signal. Each of
these genomic features was plotted against the number of CGIs and against CGI density for a regression
analysis. Each feature was also plotted against genome size to serve as the null hypothesis in each
case. The P-value and R-squared value are reported in each correlation graph, and the P-value for each
phylogenetic signal analysis is given.
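The regression step described above can be illustrated with an ordinary least-squares fit of one genomic feature against another. This is a minimal sketch using only the Python standard library; it returns the slope, intercept, and R-squared but does not compute a P-value, which would need a statistics package. The function name `linear_fit` and the toy data are hypothetical and are not values from this study.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r_squared). R-squared is reported as 0.0
    when y has no variance, since it is undefined in that case.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x has no variance; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 0.0
    return slope, intercept, r_squared


# Toy data: a hypothetical feature (x) against a hypothetical CGI count (y).
toy_x = [1.0, 2.0, 3.0]
toy_y = [1.0, 3.0, 2.0]
toy_slope, toy_intercept, toy_r2 = linear_fit(toy_x, toy_y)
```

Running the same fit with genome size on the x-axis gives the baseline against which the other relationships are judged.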
Figure 1: Topology of the tree constructed using Bayesian methods
Results:
Figure 21: Assembly quality (# contigs/Mb) does not display phylogenetic signal.
Figure 22: Genome size (Mb) does not display phylogenetic signal.
Figure 23: Unique TFs (# unique TFs) interestingly display similarly insignificant phylogenetic signal
to that of genome size.
Figure 24: Unique CDSs (# unique CDSs) do not display phylogenetic signal.
Figure 25: Average O/E GC content of CGIs gives significant phylogenetic signal.
Figure 26: Like average O/E GC content of CGIs, background GC displays phylogenetic signal.
Figure 27: Unique TFs/unique CDSs display significant phylogenetic signal.
Discussion:
The results of this study reject the hypothesis: neither CGI density nor the number of CGIs per
genome relates significantly to an increase in unique CDSs or unique TFs across a representative sample
of the animal kingdom. Instead, genome size was the biggest contributor to an increase in CGIs,
positively driving the correlations. This does not necessarily indicate a lack of relationship between
CGI distribution and significant aspects of genome architecture such as CDSs or TFs, but it casts
uncertainty on the correlational data. Two of the most interesting results from the phylogenetic
analysis are that average O/E GC content of CGIs (Figure 25) and background GC content (Figure 26) both
display significant phylogenetic signal, suggesting that general genomic GC content has been maintained
throughout animal phylogeny. Another interesting result is that the ratio of TFs to CDSs (Figure 27)
also gives significant phylogenetic signal, even though the ratio correlates strongly with genome size.
The number of CGIs (Figure 29) shows punctuation across the species, which is echoed, to a less
exaggerated degree, by the TFs/CDSs ratio (Figure 27). This observation underpins a general one about
this study’s analysis: in all relationships tested, there is a large degree of variation. Even if the
relationships explain a quarter to half of the variation across the entire kingdom, as is observed, it
is difficult to ignore the strong outliers. As the observed punctuation in the phylogenetic graphs
suggests, greater resolution might be obtained from this kind of analysis by decreasing its scope,
investigating a single area of punctuation such as the vertebrates, and comparing results for each
punctuated group. Genome size (Figure 22) may be said to demonstrate a small amount of this punctuation
as well.
As in all studies requiring statistical parameters, this study is limited by the definitions it uses.
Because no structural, physical border separates a CGI from the background, defined parameters must be
used to conduct the investigation, and as such any definition of a CGI is somewhat arbitrary. Therefore,
this study has operated on the assumption that the parameters capture, to the degree of confidence
reflected in the literature, the biologically relevant moieties. This limitation highlights the need for
biochemical, functional assays to supplement and aid this kind of research.
Figure 28: CGI density (# CGIs/Mb) does not display phylogenetic signal.
Figure 29: Number of CGIs (# CGIs), although nearer to significance than CGI density, does not give
phylogenetic signal.
References:
1) Zhao, Z., Han, L. (2009) CpG islands: algorithms and applications in methylation studies. BBRC. 382: 4; 643-645
2) Ioshikhes, I. P., Zhang, M. Q. (2000) Large-scale human promoter mapping using CpG islands. Nature Genetics. 26: 61-63
3) Saxonov, S., Berg, P., Brutlag, D. L. (2006) A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters. PNAS. 103: 5; 1412-1417
4) Deaton, A. M., Bird, A. (2011) CpG islands and regulation of transcription. Genes and Dev. 25: 1010-1022
5) Elango, N., Yi, S. V. (2008) DNA methylation and structural and functional bimodality of vertebrate promoters. Mol Biol Evol. 25: 8; 1602-1608
6) Bird, A. (2002) DNA methylation patterns and epigenetic memory. Genes and Dev. 16: 6-21
7) Borowiec, M. L., Lee, E. K., Chiu, J. C., Plachetzki, D. C. (2015) Dissecting phylogenetic signal and accounting for bias in whole-genome data sets: a case study of the Metazoa. BioRxiv. doi: http://dx.doi.org/10.1101/013946
8) Feil, R., Khosla, S. (1999) Genomic imprinting in mammals: an interplay between chromatin and DNA methylation? Trends in Genetics. 15: 11; 431-435
9) Zhao, Z., Han, L. (2009) CpG islands: algorithms and applications in methylation studies. BBRC. 382: 4; 643-645
10) Romiguier, J., Ranwez, V., Douzery, E. J. P., Galtier, N. (2010) Contrasting GC-content dynamics across 33 mammalian genomes: relationship with life-history traits and chromosome sizes. Genome Res. 20: 1001-1009
11) Parra, G., Bradnam, K., Korf, I. (2007) CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 23: 1061-1067
12) Haas, B., Papanicolaou, A. (2015) TransDecoder (Find Coding Regions Within Transcripts). http://transdecoder.github.io/
13) Finn, R. D. et al. (2014) Pfam: the protein families database. Nucl. Acids Res. 42: D1; D222-D230

More Related Content

What's hot

Cellular Transforming Genes in Cancer
Cellular Transforming Genes in CancerCellular Transforming Genes in Cancer
Cellular Transforming Genes in CancerDeedee Chatham
 
Minimal and Compact
Minimal and CompactMinimal and Compact
Minimal and CompactJoshua Gefen
 
Transcriptional signaling pathways inversely regulated in alzheimer's disease...
Transcriptional signaling pathways inversely regulated in alzheimer's disease...Transcriptional signaling pathways inversely regulated in alzheimer's disease...
Transcriptional signaling pathways inversely regulated in alzheimer's disease...Elsa von Licy
 
Chromosome 7 in lung cancer_Journal club
Chromosome 7 in lung cancer_Journal clubChromosome 7 in lung cancer_Journal club
Chromosome 7 in lung cancer_Journal clubAIIMS
 
Genetic variation and its role in health pharmacology
Genetic variation and its role in health pharmacologyGenetic variation and its role in health pharmacology
Genetic variation and its role in health pharmacologyDeepak Kumar
 
Science-2015-Gantz-442-4
Science-2015-Gantz-442-4Science-2015-Gantz-442-4
Science-2015-Gantz-442-4PricyBark0
 
Association mapping
Association mapping Association mapping
Association mapping Preeti Kapoor
 
Targeting PIM kinase to overcome drug resistance in NSCLC - Dr Kathy Gately
Targeting PIM kinase to overcome drug resistance in NSCLC - Dr Kathy GatelyTargeting PIM kinase to overcome drug resistance in NSCLC - Dr Kathy Gately
Targeting PIM kinase to overcome drug resistance in NSCLC - Dr Kathy GatelyHannahMcCarthy31
 
The_Journal_of_Immunology_2009_Udyavar
The_Journal_of_Immunology_2009_UdyavarThe_Journal_of_Immunology_2009_Udyavar
The_Journal_of_Immunology_2009_UdyavarAkshata Udyavar PhD
 
The Effects of Genetic Alteration on Reprogramming of Fibroblasts into Induc...
The Effects of Genetic Alteration on Reprogramming of  Fibroblasts into Induc...The Effects of Genetic Alteration on Reprogramming of  Fibroblasts into Induc...
The Effects of Genetic Alteration on Reprogramming of Fibroblasts into Induc...remedypublications2
 
Brown and Feder 2005
Brown and Feder 2005Brown and Feder 2005
Brown and Feder 2005Rebecca Brown
 

What's hot (20)

Cellular Transforming Genes in Cancer
Cellular Transforming Genes in CancerCellular Transforming Genes in Cancer
Cellular Transforming Genes in Cancer
 
Biomed central
Biomed centralBiomed central
Biomed central
 
GWAS
GWASGWAS
GWAS
 
JoB spike in manuscript 2014
JoB spike in manuscript 2014JoB spike in manuscript 2014
JoB spike in manuscript 2014
 
Minimal and Compact
Minimal and CompactMinimal and Compact
Minimal and Compact
 
Transcriptional signaling pathways inversely regulated in alzheimer's disease...
Transcriptional signaling pathways inversely regulated in alzheimer's disease...Transcriptional signaling pathways inversely regulated in alzheimer's disease...
Transcriptional signaling pathways inversely regulated in alzheimer's disease...
 
Chromosome 7 in lung cancer_Journal club
Chromosome 7 in lung cancer_Journal clubChromosome 7 in lung cancer_Journal club
Chromosome 7 in lung cancer_Journal club
 
Genetic variation and its role in health pharmacology
Genetic variation and its role in health pharmacologyGenetic variation and its role in health pharmacology
Genetic variation and its role in health pharmacology
 
Poster
PosterPoster
Poster
 
Science-2015-Gantz-442-4
Science-2015-Gantz-442-4Science-2015-Gantz-442-4
Science-2015-Gantz-442-4
 
Association mapping
Association mapping Association mapping
Association mapping
 
Targeting PIM kinase to overcome drug resistance in NSCLC - Dr Kathy Gately
Targeting PIM kinase to overcome drug resistance in NSCLC - Dr Kathy GatelyTargeting PIM kinase to overcome drug resistance in NSCLC - Dr Kathy Gately
Targeting PIM kinase to overcome drug resistance in NSCLC - Dr Kathy Gately
 
The_Journal_of_Immunology_2009_Udyavar
The_Journal_of_Immunology_2009_UdyavarThe_Journal_of_Immunology_2009_Udyavar
The_Journal_of_Immunology_2009_Udyavar
 
The Effects of Genetic Alteration on Reprogramming of Fibroblasts into Induc...
The Effects of Genetic Alteration on Reprogramming of  Fibroblasts into Induc...The Effects of Genetic Alteration on Reprogramming of  Fibroblasts into Induc...
The Effects of Genetic Alteration on Reprogramming of Fibroblasts into Induc...
 
Bmc research note
Bmc research noteBmc research note
Bmc research note
 
Poster_mainFin1
Poster_mainFin1Poster_mainFin1
Poster_mainFin1
 
QTL MAPPING & ANALYSIS
QTL MAPPING & ANALYSIS  QTL MAPPING & ANALYSIS
QTL MAPPING & ANALYSIS
 
sclabas2003
sclabas2003sclabas2003
sclabas2003
 
Brown and Feder 2005
Brown and Feder 2005Brown and Feder 2005
Brown and Feder 2005
 
GP120 pdf
GP120 pdfGP120 pdf
GP120 pdf
 

Viewers also liked

Denuncia contra Dina Chuquimia presentada al Ministerio de Transparencia
Denuncia contra Dina Chuquimia presentada al Ministerio de TransparenciaDenuncia contra Dina Chuquimia presentada al Ministerio de Transparencia
Denuncia contra Dina Chuquimia presentada al Ministerio de TransparenciaJuan Macias
 
Tabla periódica mostrando los elementos químicos que son bioelementos
Tabla periódica mostrando los elementos químicos que son bioelementosTabla periódica mostrando los elementos químicos que son bioelementos
Tabla periódica mostrando los elementos químicos que son bioelementosAlexander Ulloa
 
Распад колониальной системы
Распад колониальной системыРаспад колониальной системы
Распад колониальной системыПётр Ситник
 
Análisis qué pasa sí what if
Análisis qué pasa sí  what if Análisis qué pasa sí  what if
Análisis qué pasa sí what if SST Asesores SAC
 
Comunicado: Ante el acoso a la ciudadanía
Comunicado: Ante el acoso a la ciudadaníaComunicado: Ante el acoso a la ciudadanía
Comunicado: Ante el acoso a la ciudadaníaFUSADES
 
La paz mundial
La paz mundialLa paz mundial
La paz mundialjoneiver
 
Cv special ed. 6 (1)
Cv special ed. 6 (1)Cv special ed. 6 (1)
Cv special ed. 6 (1)Jay Singh
 

Viewers also liked (12)

Parménides de elea
Parménides de eleaParménides de elea
Parménides de elea
 
Denuncia contra Dina Chuquimia presentada al Ministerio de Transparencia
Denuncia contra Dina Chuquimia presentada al Ministerio de TransparenciaDenuncia contra Dina Chuquimia presentada al Ministerio de Transparencia
Denuncia contra Dina Chuquimia presentada al Ministerio de Transparencia
 
Physiology of bone 2
Physiology of bone 2Physiology of bone 2
Physiology of bone 2
 
Tabla periódica mostrando los elementos químicos que son bioelementos
Tabla periódica mostrando los elementos químicos que son bioelementosTabla periódica mostrando los elementos químicos que son bioelementos
Tabla periódica mostrando los elementos químicos que son bioelementos
 
Physiology of bone
Physiology of bonePhysiology of bone
Physiology of bone
 
Распад колониальной системы
Распад колониальной системыРаспад колониальной системы
Распад колониальной системы
 
Análisis qué pasa sí what if
Análisis qué pasa sí  what if Análisis qué pasa sí  what if
Análisis qué pasa sí what if
 
Comunicado: Ante el acoso a la ciudadanía
Comunicado: Ante el acoso a la ciudadaníaComunicado: Ante el acoso a la ciudadanía
Comunicado: Ante el acoso a la ciudadanía
 
La paz mundial
La paz mundialLa paz mundial
La paz mundial
 
Crónica:
Crónica:  Crónica:
Crónica:
 
Alfabeto
AlfabetoAlfabeto
Alfabeto
 
Cv special ed. 6 (1)
Cv special ed. 6 (1)Cv special ed. 6 (1)
Cv special ed. 6 (1)
 

Similar to CGI.Paper

Pells et al [2015] PLoS ONE 10[7] e0131102
Pells et al [2015] PLoS ONE 10[7] e0131102Pells et al [2015] PLoS ONE 10[7] e0131102
Pells et al [2015] PLoS ONE 10[7] e0131102Steve Pells
 
Science-2015-Siklenka-science.aab2006
Science-2015-Siklenka-science.aab2006Science-2015-Siklenka-science.aab2006
Science-2015-Siklenka-science.aab2006Sarah Kimmins
 
Epiroadmap20
Epiroadmap20Epiroadmap20
Epiroadmap20n0rr
 
Will the real proteins please stand up
Will the real proteins please stand upWill the real proteins please stand up
Will the real proteins please stand upChris Southan
 
AJP_12-0313_Araten_et_al_Word_Version
AJP_12-0313_Araten_et_al_Word_VersionAJP_12-0313_Araten_et_al_Word_Version
AJP_12-0313_Araten_et_al_Word_VersionJonathan Karten
 
Genetic Dna And Bioinformatics ( Accession No. Xp Essay
Genetic Dna And Bioinformatics ( Accession No. Xp EssayGenetic Dna And Bioinformatics ( Accession No. Xp Essay
Genetic Dna And Bioinformatics ( Accession No. Xp EssayJessica Deakin
 
Report- Genome wide association studies.
Report- Genome wide association studies.Report- Genome wide association studies.
Report- Genome wide association studies.Varsha Gayatonde
 
ASEE-GSW_2015_submission_75
ASEE-GSW_2015_submission_75ASEE-GSW_2015_submission_75
ASEE-GSW_2015_submission_75Sam Yang
 
Multiplex pcr detection of gstm1, gstt1, and gstp1 gene variants
Multiplex pcr detection of gstm1, gstt1, and gstp1 gene variantsMultiplex pcr detection of gstm1, gstt1, and gstp1 gene variants
Multiplex pcr detection of gstm1, gstt1, and gstp1 gene variantsReema Mohammed
 
Cancer Res-2014-Chakraborty-3489-500
Cancer Res-2014-Chakraborty-3489-500Cancer Res-2014-Chakraborty-3489-500
Cancer Res-2014-Chakraborty-3489-500Rachel Stupay
 
Analysis of loss of heterozygosity of the tumor
Analysis of loss of heterozygosity of the tumorAnalysis of loss of heterozygosity of the tumor
Analysis of loss of heterozygosity of the tumorAlexander Decker
 
Focusing the diversity of Gardnerella vaginalis through the lens of ecotypes
Focusing the diversity of Gardnerella vaginalis through the lens of ecotypesFocusing the diversity of Gardnerella vaginalis through the lens of ecotypes
Focusing the diversity of Gardnerella vaginalis through the lens of ecotypesRoxana Hickey
 
Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art...
 Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art... Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art...
Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art...Healthcare and Medical Sciences
 
Short Tandem Repeats in plants: Genomic distribution and function prediction
Short Tandem Repeats in plants: Genomic distribution and function predictionShort Tandem Repeats in plants: Genomic distribution and function prediction
Short Tandem Repeats in plants: Genomic distribution and function predictionRana Asif Abbas
 
Gene expression profile analysis of human hepatocellular carcinoma using sage...
Gene expression profile analysis of human hepatocellular carcinoma using sage...Gene expression profile analysis of human hepatocellular carcinoma using sage...
Gene expression profile analysis of human hepatocellular carcinoma using sage...Ahmed Madni
 
QMB_Poster_Tom_Kelly
QMB_Poster_Tom_KellyQMB_Poster_Tom_Kelly
QMB_Poster_Tom_KellyTom Kelly
 
An Investigation Of The Rigor Of Interpretation Rules
An Investigation Of The Rigor Of Interpretation RulesAn Investigation Of The Rigor Of Interpretation Rules
An Investigation Of The Rigor Of Interpretation RulesNick Brown
 

Similar to CGI.Paper (20)

2013_WCBSURC.pptx
2013_WCBSURC.pptx2013_WCBSURC.pptx
2013_WCBSURC.pptx
 
Pells et al [2015] PLoS ONE 10[7] e0131102
Pells et al [2015] PLoS ONE 10[7] e0131102Pells et al [2015] PLoS ONE 10[7] e0131102
Pells et al [2015] PLoS ONE 10[7] e0131102
 
Science-2015-Siklenka-science.aab2006
Science-2015-Siklenka-science.aab2006Science-2015-Siklenka-science.aab2006
Science-2015-Siklenka-science.aab2006
 
Epiroadmap20
Epiroadmap20Epiroadmap20
Epiroadmap20
 
Will the real proteins please stand up
Will the real proteins please stand upWill the real proteins please stand up
Will the real proteins please stand up
 
AJP_12-0313_Araten_et_al_Word_Version
AJP_12-0313_Araten_et_al_Word_VersionAJP_12-0313_Araten_et_al_Word_Version
AJP_12-0313_Araten_et_al_Word_Version
 
Genetic Dna And Bioinformatics ( Accession No. Xp Essay
Genetic Dna And Bioinformatics ( Accession No. Xp EssayGenetic Dna And Bioinformatics ( Accession No. Xp Essay
Genetic Dna And Bioinformatics ( Accession No. Xp Essay
 
Report- Genome wide association studies.
Report- Genome wide association studies.Report- Genome wide association studies.
Report- Genome wide association studies.
 
ASEE-GSW_2015_submission_75
ASEE-GSW_2015_submission_75ASEE-GSW_2015_submission_75
ASEE-GSW_2015_submission_75
 
Multiplex pcr detection of gstm1, gstt1, and gstp1 gene variants
Multiplex pcr detection of gstm1, gstt1, and gstp1 gene variantsMultiplex pcr detection of gstm1, gstt1, and gstp1 gene variants
Multiplex pcr detection of gstm1, gstt1, and gstp1 gene variants
 
Cancer Res-2014-Chakraborty-3489-500
Cancer Res-2014-Chakraborty-3489-500Cancer Res-2014-Chakraborty-3489-500
Cancer Res-2014-Chakraborty-3489-500
 
Analysis of loss of heterozygosity of the tumor
Analysis of loss of heterozygosity of the tumorAnalysis of loss of heterozygosity of the tumor
Analysis of loss of heterozygosity of the tumor
 
Focusing the diversity of Gardnerella vaginalis through the lens of ecotypes
Focusing the diversity of Gardnerella vaginalis through the lens of ecotypesFocusing the diversity of Gardnerella vaginalis through the lens of ecotypes
Focusing the diversity of Gardnerella vaginalis through the lens of ecotypes
 
Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art...
 Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art... Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art...
Image analysis; Spinocellular carcinoma; Melanoma; Basal cell carcinoma; Art...
 
Oncogene_2010_Ocak
Oncogene_2010_OcakOncogene_2010_Ocak
Oncogene_2010_Ocak
 
PIIS0016508514604509
PIIS0016508514604509PIIS0016508514604509
PIIS0016508514604509
 
Short Tandem Repeats in plants: Genomic distribution and function prediction
Short Tandem Repeats in plants: Genomic distribution and function predictionShort Tandem Repeats in plants: Genomic distribution and function prediction
Short Tandem Repeats in plants: Genomic distribution and function prediction
 
Gene expression profile analysis of human hepatocellular carcinoma using sage...
Gene expression profile analysis of human hepatocellular carcinoma using sage...Gene expression profile analysis of human hepatocellular carcinoma using sage...
Gene expression profile analysis of human hepatocellular carcinoma using sage...
 
QMB_Poster_Tom_Kelly
QMB_Poster_Tom_KellyQMB_Poster_Tom_Kelly
QMB_Poster_Tom_Kelly
 
An Investigation Of The Rigor Of Interpretation Rules
An Investigation Of The Rigor Of Interpretation RulesAn Investigation Of The Rigor Of Interpretation Rules
An Investigation Of The Rigor Of Interpretation Rules
 

CGI.Paper

  • 1. Multi-Level Comparison of CGI Presence and Genomic Architecture Across Animal Phylogeny Christopher Carroll, Lauren Kordonowy, Lindsay Havens, Kaelina Lombardo, Dr. David Plachetzki, Dr. Matthew MacManes
  • 2. Introduction: CpG islands are high-density clusters of cytosine and guanine nucleotides. The islands are characterized by a lack of 5’-cytosine methylation, contrary to what is typical of background CpG dinucleotides. This attribute alone makes them curious genomic structures,as ~80% of CpG dinucleotides in mammalian genomes are estimated to be methylated1 . Additionally, vertebrate genomes are noted to have low levels of CpGs due to the fact that the methylation of the cytosines frequently causes the nucleotides to mutate into thymines during the next cycle of replication, altering CpGs to TpGs2 . Research has unveiled that not only do CGIs strongly correlate with promoter regions and even exonic regions, but also that the methylation state of CGIs is important for proper transcription to take place. In mammalian genomes, it is estimated that about 50% of gene promoters are associated with one or more CGIs 2 . One study found that 72% of promoters in the human genome were present with CGIs 3 . A commonly held view is that methylation state of CGIs regulates transcription by recruiting proteins involved in the transcription machinery that recognize unmethylated CpG moieties. This machinery is thought to alter chromatin configurations, thus regulating transcription by causing a physical intervention. As unmethylated CpG islands (CGIs) allow transcription and gene expression to occur, methylated CGIs prevent the transcriptional machinery from accessing the TSSs of genes3, 4 . Irregular methylation of CGIs in promoter regions of genes is associated with disease and disorders, most notably cancer1, 4, 5 . Additional studies have suggested that the biological relevance of CGIs goes beyond aiding in transcription. In the mouse, CGIs in genes located on the X chromosome became methylated after preceding mechanisms had already silenced the gene, leading to the hypothesis that methylation is involved in the stabilization of gene silencing and X inactivation6 . 
CGIs are also consistently implicated in genomic imprinting, indeed CpG methylation is one of the few factors of imprinting that are well documented. The majority of imprinted genes have methylated CGIs on only one of the parental alleles. Experiments in mouse involving the deletion of the methyltransferase gene Dnmt1 resulted in mice that were deficient in methylation and lacked imprinting7 . Therefore,CGIs are useful gene markers,as well as important epigenetic elements that will cue further research. Since CGIs have been established as useful gene markers, and have been suggested to play roles in so many genetic regulatory processes,this study attempted to obtain evolutionary context of CGI development by running a statistical analysis across a representative sample of thirty-four animal genomes. To provide alternative windows through which to understand and explain possible patterns of CGIs across our spectrum, we also looked at relationships of CGIs and genome size, background GC content, transcription factor diversity, number of unique protein isoforms, and phylogeny. The phylogenetic logic underpinning this study lies in the tree constructed by Dr. Plachetzki and associates8 . Bioinformatic approaches to analyzing genomic CGI content necessarily involve setting statistical thresholds. Because there is not definitive boundary at which a CGI can be said to stop and start existing, definitions for CGIs are a bit arbitrary by nature, although the accuracy of these boundaries can be tested on known CGI locations. The parameters can be thus applied to an entire analysis. The first definitions for CGIs were proposed by Gardiner-Gardner and Frommer as 200-bp sequences of CG content ≥50%, with an observed/expected ratio of CpG ≥0.6. These parameters were refined experimentally by Takai and Jones. Their parameters were comprised of a sequence length of 500-bp with GC content ≥55% and an observed/expected CpG ratio ≥0.65. 
These thresholds removed most of the Alu- elements that were called as a CGI with the original parameters,yet maintained most of the 5’ region CGIs9 . This study took advantage of the methods created by Takai and Jones.
  • 3. This study hypothesizes that the number of CGIs per genome should increase as a function of genome complexity, that is, with an increase in the number of unique TFs and CDSs, which give rise to greater expression variation and control. Accordingly, this study also predicts that CGI distribution across animal phylogeny should carry explanatory power for the relatedness of species and the topology of the given phylogenetic tree.
Methods:
While the algorithm proposed by Takai and Jones was used in this study, one notable factor was altered for the full analysis. Defining a CGI involves considering the GC content of a stretch of DNA, but this consideration is underpinned by a relation to the background GC content. Statistical definitions of CGIs have typically fixed this as an absolute GC% parameter; the Takai and Jones algorithm uses GC content ≥55%. However, GC content varies widely both within genomes and between species. Average GC content of 100-kb fragments in humans ranges from 35% to 60%, a range twice as wide as that found in teleost fishes10. Ignoring this dynamic range and assuming a fixed percentage across species might cause data to be analyzed erroneously. A previous study found that 28% of human gene promoters had CpG content similar to that of the genomic background; these were classified as low CpG concentration (LCG) promoters3. To account for this, this study employed a GC content threshold of ≥15% of the background GC. The remaining parameters and code set by Takai and Jones were used unaltered.
Thirty-four genomes, representing the spectrum of animal evolution and diversity, were analyzed. The annotation software CEGMA was used to assess the completeness of the genome assemblies11. Assembly quality was measured as the number of contigs per Mb of genome size.
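The background-relative GC threshold described above might be computed along the following lines. This is a sketch under a stated assumption, not the code used in the study: the text does not say whether "≥15% of the background GC" is additive or relative, so the margin is treated here as 15 percentage points above the genome-wide GC fraction, with the relative reading noted in a comment. The function names are hypothetical.

```python
def background_gc(sequences):
    """Genome-wide GC fraction over unambiguous bases (A, C, G, T).

    `sequences` is any iterable of strings, for example one per contig.
    Ambiguous bases such as N are ignored.
    """
    gc = at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    total = gc + at
    return gc / total if total else 0.0


def adjusted_gc_threshold(bg_gc, margin=0.15):
    """GC cutoff for calling a CGI, relative to the background.

    ASSUMPTION: the margin is additive (percentage points above the
    background). A relative reading would be bg_gc * (1 + margin).
    The result is capped at 1.0 because GC content is a fraction.
    """
    return min(1.0, bg_gc + margin)


if __name__ == "__main__":
    contigs = ["AACCGGTT", "ATATATGC", "NNNNACGT"]  # toy data only
    bg = background_gc(contigs)
    print(bg, adjusted_gc_threshold(bg))
```

Under the additive reading, a genome with a background GC fraction near 0.40 would get a cutoff near 0.55, which is close to the fixed Takai and Jones value, whereas AT-rich or GC-rich genomes would get proportionally shifted cutoffs.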
CGI data were characterized both as the raw count of CGIs per genome and as CGI density, that is, the number of CGIs per Mb of genome. This study also looked for relationships between transcription diversity and CGIs. To this end, TransDecoder was used to cluster similar isoforms, ensuring that the proteins considered in the results were unique12. Additionally, Pfam was used to identify unique transcription factors. This gives a reliable reference for the relative complexity of transcription between species13.
To determine whether the number of CGIs or CGI density gave phylogenetic signal, that is, the degree to which some relationship of CGIs can explain the topology of the tree and the relatedness of species, the data were compared to the phylogenetic tree constructed using Bayesian methods by Plachetzki and associates7 (Figure 1). The tree is rooted by the animal outgroup consisting of Monosiga brevicollis and Salpingoeca rosetta. To provide a comparison for any signal given by the CGI data, unique TFs, background GC content, genome size, average O/E, and unique CDSs were also tested for phylogenetic signal.
The aforementioned genomic features of each species were plotted against the number of CGIs and against CGI density for a regression analysis. Each feature was also plotted against genome size to serve as the null hypothesis in each case. The P-value and R-squared value are reported in each correlation graph, and the P-values for each respective phylogenetic signal analysis are given.
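The density measure and the regression summary described above can be sketched in plain Python. This is an illustrative ordinary least-squares fit, not the study's analysis script: the function names are hypothetical, the numbers in the usage example are made-up toy values rather than results, and the P-value is omitted because it requires a t-distribution (a statistics package such as SciPy's `scipy.stats.linregress` reports it alongside the slope and correlation).

```python
def cgi_density(n_cgis, genome_size_bp):
    """Number of CGIs per megabase of genome."""
    return n_cgis / (genome_size_bp / 1e6)


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r_squared). Requires at least two points
    and non-constant xs and ys.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Toy values for illustration only; these are NOT data from the study.
    genome_sizes_mb = [100.0, 500.0, 1200.0, 3000.0]
    cgi_counts = [2000, 9000, 15000, 40000]
    slope, intercept, r2 = linear_fit(genome_sizes_mb, cgi_counts)
    print(slope, intercept, r2)
```

Fitting each feature against genome size in the same way gives the baseline against which the CGI relationships are judged.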
  • 4. Figure 1: Topology of the tree constructed using Bayesian methods.
Results:
  • 7. Figure 21: Assembly quality does not display phylogenetic signal. (# contigs/Mb)
Figure 22: Genome size does not display phylogenetic signal. (Mb)
Figure 23: Unique TFs, like genome size, do not display significant phylogenetic signal. (# Unique TFs)
  • 8. Figure 24: Unique CDSs do not display phylogenetic signal. (# Unique CDSs)
Figure 25: Average O/E GC content of CGIs gives significant phylogenetic signal.
Figure 26: Like average O/E GC content of CGIs, background GC displays phylogenetic signal.
Figure 27: Unique TFs/Unique CDSs display significant phylogenetic signal.
  • 9. Discussion:
The results of this study reject the hypothesis: neither CGI density nor the number of CGIs per genome relates significantly to an increase in unique CDSs or unique TFs across a representative sample of the animal kingdom. Instead, genome size was the biggest contributor to an increase in CGIs, positively driving the correlations. This does not necessarily indicate a lack of relationship between CGI distribution and significant aspects of genome architecture such as CDSs or TFs, but it does cast uncertainty on the interpretation of the correlational data.
Two of the most interesting results from the phylogenetic analysis are that the average O/E GC content of CGIs (Figure 25) and background GC content (Figure 26) both display significant phylogenetic signal, suggesting that general genomic GC content has been maintained throughout animal phylogeny. Another interesting result is that the ratio of TFs to CDSs (Figure 27) also gives significant phylogenetic signal, even though the ratio correlates strongly with genome size. The number of CGIs (Figure 29) shows punctuation across the species, which is echoed, to a less exaggerated degree, by TFs/CDSs (Figure 27). This observation underpins a general one about this study’s analysis: in all relationships tested, there is a large degree of variation. Even if the relationships explain a quarter to half of the variation in the data across the entire kingdom, as is observed, it is difficult to ignore the strong outliers. As the observed punctuation in the phylogenetic graphs suggests, greater resolution might be obtained from this kind of analysis by narrowing its scope to a punctuated group such as the vertebrates and comparing results between such groups. Genome size (Figure 22) may be said to show a small amount of this punctuation as well.
As in all studies requiring statistical parameters, this study is limited by the definitions it uses. Because there is no structural, physical border that separates a CGI from the background, defined parameters must be used to conduct the investigation, and any definition of a CGI is therefore somewhat arbitrary. This study has operated on the assumption that the parameters capture, to the degree of confidence reported in the literature, the biologically relevant moieties. This limitation underscores the need for biochemical, functional assays to supplement and aid this kind of research.
Figure 28: CGI density does not display phylogenetic signal. (# CGIs/Mb)
Figure 29: Number of CGIs does not give significant phylogenetic signal, although it comes closer to significance than CGI density. (# CGIs)
  • 10. References:
1) Zhao, Z. Han, L. (2009) CpG islands: algorithms and applications in methylation studies. BBRC. 382: 4; 643-645
2) Ioshikhes, I. P. Zhang, M. Q. (2000) Large-scale human promoter mapping using CpG islands. Nature Genetics. 26: 61-63
3) Saxonov, S. Berg, P. Brutlag, D. L. (2006) A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters. PNAS. 103: 5; 1412-1417
4) Deaton, A. M. Bird, A. (2011) CpG islands and regulation of transcription. Genes and Dev. 25: 1010-1022
5) Elango, N. Yi, S. V. (2008) DNA methylation and structural and functional bimodality of vertebrate promoters. Mol Biol Evol. 25: 8; 1602-1608
6) Bird, A. (2002) DNA methylation patterns and epigenetic memory. Genes and Dev. 16: 6-21
7) Borowiec, M. L. Lee, E. K. Chiu, J. C. Plachetzki, D. C. (2015) Dissecting phylogenetic signal and accounting for bias in whole-genome data sets: a case study of the Metazoa. BioRxiv. doi: http://dx.doi.org/10.1101/013946
8) Feil, R. Khosla, S. (1999) Genomic imprinting in mammals: an interplay between chromatin and DNA methylation? Trends in Genetics. 15: 11; 431-435
9) Zhao, Z. Han, L. (2009) CpG islands: algorithms and applications in methylation studies. BBRC. 382: 4; 643-645
10) Romiguier, J. Ranwez, V. Douzery, E. J. P. Galtier, N. (2010) Contrasting GC-content dynamics across 33 mammalian genomes: relationship with life-history traits and chromosome sizes. Genome Res. 20: 1001-1009
11) Parra, G. Bradnam, K. Korf, I. (2007) CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics. 23: 1061-1067
12) Haas, B. Papanicolaou, A. (2015) TransDecoder (Find Coding Regions Within Transcripts). http://transdecoder.github.io/
13) Finn, R. D. et al (2014) Pfam: the protein families database. Nucl. Acids Res. 42: D1; D222-D230