Organizational Heterogeneity
of Human Genome:
Significant variation of recombination rate of
100 kbp sequences within GC ranges
Svetlana Frenkel
Valery Kirzhner
Abraham Korol
Department of Evolutionary and Environmental Biology
Institute of Evolution
University of Haifa
Some aspects of intra-genome heterogeneity
• Varying gene density
• Clusters of tissue-specific and housekeeping genes
• Linkage disequilibrium (LD) blocks
• Mutation and recombination rates
• Conserved and ultraconserved segments
• Localization of inversions, deletions, insertions and duplications
Genome Heterogeneity: GC content
From: Costantini, M., Clay, O., Auletta, F., Bernardi, G. (2006) An isochore map of human chromosomes. Genome Res., 16, 536-541.
From: UHN Microarray Centre's CpG Island Database, http://data.microarrays.ca/cpg/index.htm
The level of redness denotes the relative number of CpG islands that can be located on the chromosome in that region.
Genome Signature
Samuel Karlin et al., 1997
Local:
• preliminary searches of candidates for gene alignment
• detecting candidate regulatory signals
• detecting promoter regions
• detecting repetitive elements
• duplications of genomic
• horizontal gene transfer
Genome-wide:
• phylogenetic analysis
• species recognition
• whole-genome sequence comparisons
Linguistic-like methods
• Detecting all “words” with a certain maximal length
• Characterizing the sequence “vocabulary”
• Scoring the occurrences of fixed-length “words” from a predefined “vocabulary”
• Comparison of “word” frequencies obtained from different sequences
• Comparison of the “vocabularies” of different sequences
• Compositional Spectra Analysis
Compositional Spectra
A linguistic-like method of genome analysis based on occurrences of “words” in the A, C, G, T alphabet.
A compositional spectrum (CS) is measured as a histogram of imperfect word occurrences.
From: V. Kirzhner et al., 2002-2005
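A minimal Python sketch of how such a histogram could be computed. It assumes that an "imperfect occurrence" is a sequence window that differs from a word in at most a fixed number of positions. The function names, the toy word list and the mismatch threshold are illustrative assumptions, not the authors' implementation.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def compositional_spectrum(seq, words, max_mismatches=3):
    """Count imperfect occurrences of each word in seq.

    A window of seq counts as an occurrence of a word when it differs
    from that word in at most max_mismatches positions (assumed rule).
    All words are assumed to have the same length.
    """
    k = len(words[0])
    counts = [0] * len(words)
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        for idx, word in enumerate(words):
            if hamming(window, word) <= max_mismatches:
                counts[idx] += 1
    return counts


# Toy example with 4-letter words; the slides use longer words.
spectrum = compositional_spectrum("ACGTACGT", ["ACGT", "TTTT"], max_mismatches=1)
print(spectrum)
```

Dividing the counts by their sum would turn the histogram into relative frequencies, which makes spectra of segments with different lengths comparable.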
Methods: calculating distances
[Diagram: spectra F(Si, W), F(S’i, W), F(Sj, W), F(S’j, W), F(Si, W’), F(Sj, W’) for the 5’/3’ strand orientations, with the pairwise distances d1, d’1, d2, d’2]
Distance measures:
• Manhattan (city block) distance
• Spearman rank correlation ρ (d = 1 − ρ)
• Kendall distance τ
d = min(di, d’i, dj, d’j)
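The minimum over strand orientations can be sketched in Python as follows. The sketch assumes that the primed symbols denote the reverse-complement strand and that the four distances are the pairings of the two strands of one segment with the two strands of the other. Both are readings of the diagram, not statements from the slides, and the helper names are hypothetical.

```python
def manhattan(u, v):
    """Manhattan (city block) distance between two spectra."""
    return sum(abs(a - b) for a, b in zip(u, v))


def reverse_complement(seq):
    """Reverse-complement strand of a DNA sequence over A, C, G, T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def strand_aware_distance(f_i, f_i_rc, f_j, f_j_rc, dist=manhattan):
    """Smallest distance over the four strand pairings (assumed pairing).

    f_i and f_i_rc are the spectra of segment i and of its reverse
    complement; f_j and f_j_rc are the same for segment j.
    """
    return min(
        dist(f_i, f_j),
        dist(f_i, f_j_rc),
        dist(f_i_rc, f_j),
        dist(f_i_rc, f_j_rc),
    )


print(reverse_complement("AACG"))
print(strand_aware_distance([1, 0], [0, 1], [0, 1], [1, 0]))
```

Passing a different `dist` function, for example one based on rank correlation, reuses the same strand handling for the other distance measures listed on the slide.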
Methods: Detection of Organizational Pattern groups of segments
Neighbor-Joining clustering with “adaptive cutoff”
[Figure: clustering tree. Labels: genome segment number; Low to High scale; relative distance between two clusters; maximal distance between segments]
Analysis of Organizational Pattern
groups of segments
Significant variation of evolutionary features of 100 kbp sequences within GC ranges
Testing for potential association between the genome-wide distribution of organizational patterns (OPs) and various evolutionary and structural features reveals inter-OP heterogeneity in such features as SNP and indel frequency, recombination rate, number of segmental duplications, size of linkage disequilibrium blocks, and proportion of evolutionarily conserved sequence.
Estimation of heterogeneity between OP groups
[Figure: recombination rate plotted against GC]
[Figure: −log(FDR-corrected p-value) plotted against GC]
Values printed on the figure:
0.22 8.8×10^-5 8.8×10^-5 8.8×10^-5 8.8×10^-5 8.8×10^-5 8.8×10^-5 8.8×10^-5 8.8×10^-5 0.03 1.9×10^-3 0.01 0.11 3.9×10^-3
2.3 5.1 86.1 48.6 81.9 35.7 21.0 26.0 46.7 36.6 13.6 15.7 15.5 16.9
• Kruskal–Wallis non-parametric rank test
• 10,000 segment reshuffles to estimate the test critical value
• FDR correction for multiple comparisons
• Reshuffled sequences within every segment as control
Detecting the words related to
recombination rate
        Average RR in the     Proportion of correct classifications
        compared OPGs         of segments to OP groups, %
GC, %   low RR   high RR      all words   set of 47 words   set of 8 words
35      0.82     0.93         98.60       98.62             76.03
36      0.62     1.16         98.40       96.56             82.34
37      0.83     1.28         94.10       93.88             80.47
38      0.80     1.46         99.58       99.17             98.33
39      0.91     1.59         97.32       97.32             96.55
40      0.96     1.50         100.0       100.0             100.0
41      1.13     1.81         98.80       98.50             98.50
42      1.05     1.80         100.0       100.0             99.62
43      1.29     1.99         97.48       96.98             95.46
44      1.44     1.83         99.01       99.21             98.81
45      1.35     2.06         100         98.93             98.22
46      1.30     1.88         98.53       98.53             97.35
47      1.15     1.74         94.62       94.61             91.48
48      1.33     2.04         98.78       98.77             97.55
Oligonucleotides that showed high importance in more than half of the OPG comparisons when classifying 100 kbp segments by high and low recombination rate

Columns: oligonucleotide; GC, %; times it appeared in the list of 10 most important variables; times it appeared as the most important variable. Previously described patterns and their references are aligned below each oligonucleotide.

Oligonucleotide   GC, %   In top 10 (times)   Most important (times)

CAGCCAGGTT        60      11                  4
    -CCNCCNTNNCCNC-    Myers et al. 2008
    -CAGCCAGGTT----

GACCGGACTG        70      10                  1
    ---CCTCCCT--       Myers et al. 2005
    -GACCGGACTG-
    -CCNCCNTNNCCNC-    Myers et al. 2008
    ---GACCGGACTG--

CGCCGGGACT        80      10                  3
    -CCNCCNTNNCCNC-    Myers et al. 2008
    --CGCCGGGACT---

GCGTAGGCTA        60      9                   0
    -CCNCCNTNNCCNC-    Myers et al. 2008
    ---GCGTAGGCTA--

TGGGCCCGGC        90      8                   4
    n/a

GGCGTGCGCG        90      8                   1
    -GGNGGNAGGGG-      Zheng et al. 2010
    -GGCGTGCGCG--
    -CCNCCNTNNCCNC-    Myers et al. 2008
    ---GGCGTGCGCG--

CCCGGTATCG        70      8                   0
    -CCNCCNTNNCCNC-    Myers et al. 2008
    --CCCGGTATCG---

GCCCTTTCCT        60      7                   0
    ---CCTCCCT--       Myers et al. 2005
    -GCCCTTTCCT-
    -CCNCCNTNNCCNC-    Myers et al. 2008
    ---GCCCTTTCCT--
    -CCTCCCTNNCCAC-    Myers et al. 2008
    ---GCCCTTTCCT--
Functionally related genes tend to reside in organizationally similar genomic regions
Genes that provided the GO enrichment of the four organizational pattern clusters with the most significant GO enrichments:
• L2-a cluster is enriched for the “mitochondrion”, “intracellular non-membrane-bounded organelle”, “nuclear envelope” and “ribonucleoprotein complex” GO terms;
• L2-h cluster is enriched for the “G-protein-coupled receptor protein signaling pathway” and “sensory perception of smell” GO terms;
• H1-i cluster is enriched for the “epithelial cell differentiation” and “epithelium development” GO terms;
• H2-a cluster is enriched for the “skeletal system development” GO term.
Paz A, Frenkel S, Snir S, Kirzhner V, Korol A. 2014. BMC Genomics 15:252.
Thank you for your attention
Acknowledgments
Dr. Valery Kirzhner
Prof. Abraham Korol
Prof. Edward Trifonov
Dr. Arnon Paz and Dr. Zeev Frenkel
This work was supported by
The Israeli Ministry of Immigrant Absorption
The Israel Council for Higher Education
Calculating compositional spectra
[Diagram: a word set (…, AGTAGTTACA, CTACTATAGT, GACGACTCCA, TCGTCGTCGA, GAACGTACCT, TCTATATCCA, AGGTACTACA, CTCGCGACCG, …); the word CTACTATAGT is shown with imperfect variants such as CTACTAAAGT, CTAGTAAAGT, CTAGTAACGT, CGCCTAAAGT, CCACTAAGGT, …; numbers shown: 3676; 256 × 3676 = 941056; 86.7%]
Additional slide
Spearman's rank correlation coefficient rho
• Spearman's rank correlation coefficient is a non-parametric measure of correlation
• ρ is given by:
  $$\rho = 1 - \frac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}$$
• where:
  • D_i = x_i − y_i = the difference between the ranks of corresponding values X_i and Y_i, and
  • n = the number of values in each data set (same for both sets).
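A short Python sketch of the rank-difference formula and of the derived distance d = 1 − ρ. It assumes there are no tied values, which is the case in which the formula is exact. The function names are illustrative.

```python
def simple_ranks(values):
    """Ranks 1..n of the values; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def spearman_rho(x, y):
    """Spearman's rho from squared rank differences (no ties assumed)."""
    n = len(x)
    rank_x, rank_y = simple_ranks(x), simple_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def spearman_distance(x, y):
    """Distance d = 1 - rho; ranges from 0 (same order) to 2 (reversed)."""
    return 1 - spearman_rho(x, y)


print(spearman_rho([1, 2, 3], [1, 3, 2]))
```

With tied values the ranks would need to be averaged and ρ computed as the Pearson correlation of the ranks instead.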
Additional slide
The Kendall tau distance
• The Kendall tau distance is a metric that counts the number of pairwise disagreements between two lists. The larger the distance, the more dissimilar the two lists are.
• The Kendall tau distance between two lists τ1 and τ2 is
  $$K(\tau_1, \tau_2) = \left|\{(i, j) : i < j,\ (\tau_1(i) < \tau_1(j) \wedge \tau_2(i) > \tau_2(j)) \vee (\tau_1(i) > \tau_1(j) \wedge \tau_2(i) < \tau_2(j))\}\right|$$
• K(τ1, τ2) will be equal to 0 if the two lists are identical and n(n − 1) / 2 (where n is the list size) if one list is the reverse of the other. Often the Kendall tau distance is normalized by dividing by n(n − 1) / 2, so a value of 1 indicates maximum disagreement. The normalized Kendall tau distance therefore lies in the interval [0,1].
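The pairwise-disagreement count can be sketched directly in Python. The sketch assumes both lists give a rank or score for the same n items in the same item order; the function name and the `normalize` flag are illustrative.

```python
from itertools import combinations


def kendall_tau_distance(t1, t2, normalize=False):
    """Number of item pairs that the two lists order differently.

    t1[i] and t2[i] are the ranks (or scores) of item i in each list.
    With normalize=True the count is divided by n(n - 1) / 2.
    """
    n = len(t1)
    disagreements = sum(
        1
        for i, j in combinations(range(n), 2)
        if (t1[i] - t1[j]) * (t2[i] - t2[j]) < 0
    )
    if normalize:
        return disagreements / (n * (n - 1) / 2)
    return disagreements


print(kendall_tau_distance([1, 2, 3, 4, 5], [3, 4, 1, 2, 5]))
```

This direct count takes quadratic time in the list length, which is acceptable for short spectra; a merge-sort based inversion count would be the usual choice for long lists.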
Additional slide

TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 

Organizational Patterns Reveal Genome Heterogeneity

  • 1. Organizational Heterogeneity of Human Genome: Significant variation of recombination rate of 100 kbp sequences within GC ranges. Svetlana Frenkel, Valery Kirzhner, Abraham Korol. Department of Evolutionary and Environmental Biology, Institute of Evolution, University of Haifa.
  • 2. Some aspects of intra-genome heterogeneity: varying gene density; clusters of tissue-specific and housekeeping genes; linkage disequilibrium (LD) blocks; mutation and recombination rates; conserved and ultraconserved segments; localization of inversions, deletions, insertions and duplications.
  • 3. Genome Heterogeneity: GC content. From: Costantini, M., Clay, O., Auletta, F., Bernardi, G. (2006) An isochore map of human chromosomes. Genome Res., 16, 536-541. From: UHN Microarray Centre's CpG Island Database, http://data.microarrays.ca/cpg/index.htm. The level of redness denotes the relative number of CpG islands that can be located on the chromosome in that region.
  • 4. Genome Signature (Samuel Karlin et al., 1997). Local: preliminary searches of candidates for gene alignment; detecting candidate regulatory signals; detecting promoter regions; detecting repetitive elements; duplications of genomic; horizontal gene transfer. Genome-wide: phylogenetic analysis; species recognition; whole-genome sequence comparisons.
  • 5. Linguistic-like methods: detecting all “words” up to a certain maximal length; characterizing the sequence “vocabulary”; scoring the occurrences of fixed-length “words” from a predefined “vocabulary”; comparison of “word” frequencies obtained from different sequences; comparison of the “vocabularies” of different sequences; Compositional Spectra Analysis.
  • 6. Compositional Spectra. A linguistic-like method of genome analysis based on occurrences of “words” in the A,C,G,T alphabet. Compositional spectrum (CS) is measured as a histogram of imperfect word occurrences. From: V. Kirzhner et al., 2002-2005.
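A compositional spectrum of the kind described on this slide can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes that an "imperfect" occurrence means a match with at most `max_mismatches` substitutions (Hamming distance), that all words have the same length, and that the histogram is normalized to sum to 1. The function names and the toy word set are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def compositional_spectrum(seq, words, max_mismatches=1):
    """Histogram of imperfect occurrences of each word in seq.

    Assumption: a window counts as an occurrence of a word when it
    differs from the word in at most max_mismatches positions.
    """
    k = len(words[0])
    counts = [0] * len(words)
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        for idx, word in enumerate(words):
            if hamming(window, word) <= max_mismatches:
                counts[idx] += 1
    total = sum(counts)
    # Normalize so that spectra of segments with different lengths are comparable.
    return [c / total if total else 0.0 for c in counts]


# Toy example with a hypothetical two-word vocabulary.
spectrum = compositional_spectrum("ACGT", ["ACG", "TTT"], max_mismatches=2)
print(spectrum)
```

Scanning every window against every word is quadratic in practice; it is kept here only because it makes the definition easy to read.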
  • 7. Methods: calculating distances. Distance measures: Manhattan (city block) distance; Spearman rank correlation ρ (d = 1 − ρ); Kendall distance τ. Resulting distance: d = min(di, d’i, dj, d’j). [Diagram labels: spectra F(Si, W), F(S’i, W), F(Sj, W), F(S’j, W), F(Si, W’), F(Sj, W’); strand ends 5’ and 3’; distances d1, d’1, d2, d’2.]
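The "take the minimum of several distances" rule on this slide can be sketched as below. This is an illustration under stated assumptions, not the authors' code: it assumes that the primed spectra are alternative spectra of the same segment (for example, one per strand orientation), that every spectrum is a list of frequencies over the same word set, and that Manhattan distance is the measure in use. The function names are hypothetical.

```python
def manhattan(f, g):
    """City-block distance between two equal-length spectra."""
    if len(f) != len(g):
        raise ValueError("spectra must have the same length")
    return sum(abs(a - b) for a, b in zip(f, g))


def min_spectrum_distance(spectra_i, spectra_j, dist=manhattan):
    """Smallest distance over all pairings of the supplied spectra.

    spectra_i, spectra_j: lists of alternative spectra for segments i and j
    (assumed here: one spectrum per strand orientation).
    """
    return min(dist(f, g) for f in spectra_i for g in spectra_j)


# Toy spectra over a three-word vocabulary (made-up numbers).
fi, fi_alt = [0.5, 0.5, 0.0], [0.0, 0.5, 0.5]
fj, fj_alt = [0.0, 0.4, 0.6], [0.6, 0.4, 0.0]
print(min_spectrum_distance([fi, fi_alt], [fj, fj_alt]))
```

Passing `dist` as a parameter lets the same wrapper be reused with a rank-based measure such as 1 − ρ.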
  • 8. Methods: Detection of Organizational Pattern groups of segments. Neighbor-Joining clustering with an “adaptive cutoff”. [Figure: clustering tree over genome segment number; labels: relative distance between two clusters, maximal distance between segments, scale from Low to High.]
  • 9. Analysis of Organizational Pattern groups of segments
  • 10. Significant variation of evolutionary features of 100 kbp sequences within GC ranges. Testing for potential association between the genome-wide distribution of organizational patterns (OPs) and various evolutionary and structural features reveals inter-OP heterogeneity in SNP and indel frequency, recombination rate, number of segmental duplications, size of linkage disequilibrium blocks, and proportion of evolutionarily conserved sequence.
  • 11. Estimation of heterogeneity between OP groups. [Figure; axis labels: GC, Recombination Rate.]
  • 12. Estimation of heterogeneity between OP groups. Method: Kruskal–Wallis non-parametric rank test; 10,000 segment reshuffles to estimate the test critical value; FDR correction for multiple comparisons; reshuffled sequences within every segment as a control. [Figure: −log(FDR-corrected p-value) versus GC. Value labels on the plot, in the order extracted: 0.22, 8.8×10⁻⁵ (eight times), 0.03, 1.9×10⁻³, 0.01, 0.11, 3.9×10⁻³; and 2.3, 5.1, 86.1, 48.6, 81.9, 35.7, 21.0, 26.0, 46.7, 36.6, 13.6, 15.7, 15.5, 16.9.]
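The testing scheme named on this slide (Kruskal–Wallis test, reshuffling to calibrate it, FDR correction) can be sketched in plain Python as below. This is a hedged illustration only: it assumes that the reshuffling permutes values among OP groups, that the FDR correction is Benjamini–Hochberg, and that the H statistic is used without a tie correction (the permutation step makes that acceptable for a sketch). None of these details are stated on the slide, and all function names are hypothetical.

```python
import random


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, pos = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[pos:pos + len(g)])
        total += rank_sum * rank_sum / len(g)
        pos += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


def permutation_p_value(groups, n_shuffles=10000, seed=0):
    """Share of reshuffles whose H is at least the observed H."""
    rng = random.Random(seed)
    observed = kruskal_h(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled, pos = [], 0
        for size in sizes:
            shuffled.append(pooled[pos:pos + size])
            pos += size
        if kruskal_h(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_shuffles + 1)


def benjamini_hochberg(p_values):
    """FDR-adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=p_values.__getitem__)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In use, one p-value would be computed per GC value with `permutation_p_value`, and the resulting list passed once through `benjamini_hochberg`.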
  • 13. Detecting the words related to recombination rate. Columns: GC content (%); average recombination rate (RR) in the compared OP groups (low RR, high RR); proportion of correct classifications of segments to OP groups (%) using all words, a set of 47 words, and a set of 8 words.

        GC, %   Low RR   High RR   All words   47 words   8 words
        35      0.82     0.93      98.60       98.62      76.03
        36      0.62     1.16      98.40       96.56      82.34
        37      0.83     1.28      94.10       93.88      80.47
        38      0.80     1.46      99.58       99.17      98.33
        39      0.91     1.59      97.32       97.32      96.55
        40      0.96     1.50      100.0       100.0      100.0
        41      1.13     1.81      98.80       98.50      98.50
        42      1.05     1.80      100.0       100.0      99.62
        43      1.29     1.99      97.48       96.98      95.46
        44      1.44     1.83      99.01       99.21      98.81
        45      1.35     2.06      100         98.93      98.22
        46      1.30     1.88      98.53       98.53      97.35
        47      1.15     1.74      94.62       94.61      91.48
        48      1.33     2.04      98.78       98.77      97.55
  • 14. Oligonucleotides that showed high importance in more than half of the OPG comparisons when classifying 100 kbp segments by high and low recombination rate. Columns: oligonucleotide; GC (%); times it appeared in the list of the 10 most important variables; times it appeared as the most important variable; previously described pattern, followed by the oligonucleotide aligned to it; reference.

        Oligonucleotide   GC, %   In top 10   Most important   Previously described pattern / alignment   Reference
        CAGCCAGGTT        60      11          4                -CCNCCNTNNCCNC- / -CAGCCAGGTT----          Myers et al. 2008
        GACCGGACTG        70      10          1                ---CCTCCCT-- / -GACCGGACTG-                Myers et al. 2005
                                                               -CCNCCNTNNCCNC- / ---GACCGGACTG--          Myers et al. 2008
        CGCCGGGACT        80      10          3                -CCNCCNTNNCCNC- / --CGCCGGGACT---          Myers et al. 2008
        GCGTAGGCTA        60      9           0                -CCNCCNTNNCCNC- / ---GCGTAGGCTA--          Myers et al. 2008
        TGGGCCCGGC        90      8           4                n/a
        GGCGTGCGCG        90      8           1                -GGNGGNAGGGG- / -GGCGTGCGCG--              Zheng et al. 2010
                                                               -CCNCCNTNNCCNC- / ---GGCGTGCGCG--          Myers et al. 2008
        CCCGGTATCG        70      8           0                -CCNCCNTNNCCNC- / --CCCGGTATCG---          Myers et al. 2008
        GCCCTTTCCT        60      7           0                ---CCTCCCT-- / -GCCCTTTCCT-                Myers et al. 2005
                                                               -CCNCCNTNNCCNC- / ---GCCCTTTCCT--          Myers et al. 2008
                                                               -CCTCCCTNNCCAC- / ---GCCCTTTCCT--          Myers et al. 2008
  • 15. Functionally related genes tend to reside in organizationally similar genomic regions. Shown are the genes underlying the GO enrichment of the four organizational pattern clusters with the most significant GO enrichments. The L2-a cluster is enriched for the “mitochondrion”, “intracellular non-membrane-bounded organelle”, “nuclear envelope” and “ribonucleoprotein complex” GO terms; the L2-h cluster is enriched for the “G-protein-coupled receptor protein signaling pathway” and “sensory perception of smell” GO terms; the H1-i cluster is enriched for the “epithelial cell differentiation” and “epithelium development” GO terms; the H2-a cluster is enriched for the “skeletal system development” GO term. Paz A, Frenkel S, Snir S, Kirzhner V, Korol A. 2014. BMC Genomics 15:252.
  • 16. Thank you for your attention. Acknowledgments: Dr. Valery Kirzhner, Prof. Abraham Korol, Prof. Edward Trifonov, Dr. Arnon Paz and Dr. Zeev Frenkel. This work was supported by The Israeli Ministry of Immigrant Absorption and The Israel Council for Higher Education.
  • 18. Spearman's rank correlation coefficient ρ (additional slide). Spearman's rank correlation coefficient is a non-parametric measure of correlation. ρ is given by $\rho = 1 - \frac{6 \sum_{i=1}^{n} D_i^2}{n(n^2 - 1)}$, where $D_i = x_i - y_i$ is the difference between the ranks of corresponding values $X_i$ and $Y_i$, and $n$ is the number of values in each data set (same for both sets).
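As a small worked illustration of Spearman's coefficient, the sketch below uses the standard rank-difference formula ρ = 1 − 6·ΣD² / (n(n² − 1)). It assumes there are no tied values within either data set, since this closed form is exact only in that case. The function names are hypothetical.

```python
def simple_ranks(values):
    """Ranks 1..n by increasing value (assumes no ties)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def spearman_rho(x, y):
    """Spearman's rho from squared rank differences (no-ties formula)."""
    if len(x) != len(y):
        raise ValueError("both data sets must have the same size")
    n = len(x)
    rx, ry = simple_ranks(x), simple_ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Two toy spectra; the corresponding distance is d = 1 - rho.
rho = spearman_rho([10, 20, 30, 40], [1, 3, 2, 4])
print(rho, 1 - rho)
```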
  • 19. The Kendall tau distance (additional slide). The Kendall tau distance is a metric that counts the number of pairwise disagreements between two lists. The larger the distance, the more dissimilar the two lists are. The Kendall tau distance between two lists τ1 and τ2 is $K(\tau_1, \tau_2) = \left|\{(i, j) : i < j,\ (\tau_1(i) < \tau_1(j) \wedge \tau_2(i) > \tau_2(j)) \vee (\tau_1(i) > \tau_1(j) \wedge \tau_2(i) < \tau_2(j))\}\right|$, where τ1(i) and τ2(i) are the ranks of element i in the two lists. K(τ1,τ2) equals 0 if the two lists are identical and n(n − 1) / 2 (where n is the list size) if one list is the reverse of the other. The Kendall tau distance is often normalized by dividing by n(n − 1) / 2, so that a value of 1 indicates maximum disagreement. The normalized Kendall tau distance therefore lies in the interval [0,1].
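The pairwise-disagreement count described on this slide can be sketched directly. The version below assumes that each list holds the ranks (or scores) of the same n elements in the same order and that there are no ties; it is a quadratic-time illustration, and the function name is hypothetical.

```python
from itertools import combinations


def kendall_tau_distance(a, b, normalize=False):
    """Number of element pairs that the two rankings order differently."""
    if len(a) != len(b):
        raise ValueError("both lists must have the same size")
    n = len(a)
    discordant = sum(
        1
        for i, j in combinations(range(n), 2)
        if (a[i] - a[j]) * (b[i] - b[j]) < 0  # opposite relative order
    )
    if normalize:
        # Divide by the maximum possible count, n(n - 1) / 2.
        return discordant / (n * (n - 1) / 2)
    return discordant


print(kendall_tau_distance([1, 2, 3, 4], [1, 3, 2, 4]))
print(kendall_tau_distance([1, 2, 3, 4], [4, 3, 2, 1], normalize=True))
```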