A New Method to Identify and Study Palindromic DNA
Devin Petersohn, Matt Spencer, Chi-Ren Shyu
University of Missouri – Informatics Institute, Department of Computer Science
A genetic palindrome is a DNA sequence that reads
the same on both strands when each strand is read
from its 5’ end to its 3’ end. Palindromes are
studied because they are known to be a source of
diseases, including cancer.
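The definition above amounts to saying that a palindrome equals its own reverse complement. A minimal Python sketch of that check follows; the function names are ours for illustration and are not taken from the poster's implementation.

```python
# Watson-Crick pairing used to build the opposite strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindrome(seq: str) -> bool:
    """True when both strands read the same in the 5'-to-3' direction."""
    return seq.upper() == reverse_complement(seq)


print(is_palindrome("GAATTC"))  # True: the EcoRI recognition site
print(is_palindrome("GATTAC"))  # False
```

Because no base is its own complement, a sequence that passes this check always has even length.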
Palindromes
[Figure: cruciform structures, with strand ends labeled 5’ and 3’.]
Cruciforms (displayed above), which palindromic
DNA can form, are associated with both beneficial
and harmful functions. Knowing where palindromes
are located is therefore important for researching
the effect that cruciforms have on cell functions.
Module 4: Extract Palindromes
Module 5: Iterative Doubling
Module 6: Index and Retrieval
In Module 4, palindromes are identified from the
filtered k-blocks by finding cases where
identical sequences on opposing strands
overlap. This gives us all palindromes of lengths
in the range [k, 2k) without the use of heuristics
and with no false positives.
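The overlap test can be sketched in plain Python as follows. This is our reading of the step, not the poster's code: two windows of width k match when one equals the reverse complement of the other, and if they start d < k bases apart, the span covering both is a palindrome of length k + d. For brevity the sketch scans an unfiltered toy sequence; `extract_palindromes` and its output layout are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def extract_palindromes(sequence: str, k: int) -> List[Tuple[int, int, str]]:
    """Return (start, length, sequence) for palindromes with length in [k, 2k)."""
    # Index every forward-strand window of width k by its sequence.
    positions = defaultdict(list)
    for p in range(len(sequence) - k + 1):
        positions[sequence[p:p + k]].append(p)

    found = []
    for q in range(len(sequence) - k + 1):
        # The window at q, read on the opposite strand.
        target = reverse_complement(sequence[q:q + k])
        for p in positions.get(target, ()):
            # Keep overlapping pairs only; p <= q avoids reporting a pair twice.
            if 0 <= q - p < k:
                found.append((p, q + k - p, sequence[p:q + k]))
    return sorted(found)


print(extract_palindromes("TTGAATTCAG", k=6))
# [(1, 8, 'TGAATTCA'), (2, 6, 'GAATTC')]
```

Every reported span is a true palindrome by construction, which is consistent with the claim of no false positives: the matched windows force each base in the span to pair with its mirror position.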
In Module 5, the size of our k-blocks is doubled:
each k-block is hashed together with the next one
in the genome, the two sequences are joined, and
a new k-block is made with the new k being twice
as large.
Modules 2-5 are repeated until no palindromes
are found in an iteration. This process
guarantees that all palindromes are located, as
larger palindromes are always extensions of
smaller ones.
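A minimal sketch of the doubling step, in plain Python rather than Spark. It assumes that "the next one" means the k-block starting exactly k bases downstream on the same chromosome, and that blocks are keyed by (genome, chromosome, position); `double_blocks` is a hypothetical name.

```python
from typing import Dict, Tuple

Key = Tuple[str, str, int]  # (genome, chromosome, position)


def double_blocks(blocks: Dict[Key, str], k: int) -> Dict[Key, str]:
    """Join each k-block with its downstream neighbour to form a 2k-block."""
    doubled = {}
    for (genome, chrom, pos), seq in blocks.items():
        # Key lookup stands in for the hash join on position.
        neighbour = blocks.get((genome, chrom, pos + k))
        if neighbour is not None:
            doubled[(genome, chrom, pos)] = seq + neighbour
    return doubled


sequence = "ACGTACGTACGTAC"
blocks = {
    ("toy_genome", "chr1", p): sequence[p:p + 6]
    for p in range(len(sequence) - 6 + 1)
}
doubled = double_blocks(blocks, k=6)
print(sorted(key[2] for key in doubled))  # [0, 1, 2]
```

In this sketch a doubled block exists only where both halves are present, so starting from k = 6 the successive iterations work with k = 12, 24, 48, and so on.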
In Module 6, the extracted palindromes are stored
in a database in the form of a Spark RDD. This
allows indexing by species, chromosome,
sequence length, and more. The database
greatly simplifies further exploration of
palindromes, even in multi-species analyses.
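To illustrate the kind of lookup this enables, here is a plain-Python stand-in for the indexed store. It does not use Spark; the record fields, the function names, and the toy records are all hypothetical and are not results from the poster.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Palindrome:
    species: str
    chromosome: str
    start: int
    length: int


Index = Dict[Tuple[str, str], List[Palindrome]]


def build_index(records: List[Palindrome]) -> Index:
    """Group records by (species, chromosome) for direct lookup."""
    index: Index = defaultdict(list)
    for record in records:
        index[(record.species, record.chromosome)].append(record)
    return index


def at_least(index: Index, species: str, chromosome: str, min_length: int) -> List[Palindrome]:
    """Retrieve palindromes on one chromosome that meet a length threshold."""
    return [r for r in index.get((species, chromosome), []) if r.length >= min_length]


# Made-up records, for illustration only.
records = [
    Palindrome("species_a", "chr1", 100, 6),
    Palindrome("species_a", "chr1", 250, 48),
    Palindrome("species_a", "chr2", 40, 12),
    Palindrome("species_b", "chr1", 10, 24),
]
index = build_index(records)
hits = at_least(index, "species_a", "chr1", min_length=12)
print(hits)  # only the 48 bp record on species_a chr1 qualifies
```

On an RDD the same query would typically be a `filter` over the stored records; the dictionary here only mimics the idea of keying by species and chromosome.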
In Module 2, the coarse-grained filter uses the
fact that a palindromic sequence is present on
both the forward and reverse strands of the same
chromosome. We therefore remove any sequence
that does not fit this criterion, as it cannot
be part of a palindrome.
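A sketch of this filter under our own assumptions about the data layout: each k-block is a tuple of a (genome, chromosome, position) key and a (sequence, reverse complement) value, and a block survives when its reverse complement also occurs as a forward-strand sequence on the same chromosome. `make_blocks` and `coarse_filter` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")
Key = Tuple[str, str, int]            # (genome, chromosome, position)
KBlock = Tuple[Key, Tuple[str, str]]  # key, (sequence, reverse complement)


def make_blocks(genome: str, chrom: str, sequence: str, k: int) -> List[KBlock]:
    """Toy helper: every window of width k plus its reverse complement."""
    blocks = []
    for pos in range(len(sequence) - k + 1):
        window = sequence[pos:pos + k]
        blocks.append(((genome, chrom, pos), (window, window.translate(COMPLEMENT)[::-1])))
    return blocks


def coarse_filter(blocks: List[KBlock]) -> List[KBlock]:
    """Keep k-blocks whose sequence occurs on both strands of one chromosome."""
    forward: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for (genome, chrom, _pos), (seq, _rc) in blocks:
        forward[(genome, chrom)].add(seq)
    # A sequence lies on the reverse strand exactly when its reverse
    # complement lies on the forward strand.
    return [
        block for block in blocks
        if block[1][1] in forward[(block[0][0], block[0][1])]
    ]


kept = coarse_filter(make_blocks("toy_genome", "chr1", "TTGAATTCAG", k=6))
print([key[2] for key, _ in kept])  # [1, 2, 3]
```

In the toy sequence the windows at positions 0 and 4 are dropped because their reverse complements never appear on the forward strand; the three survivors all lie inside the palindrome TGAATTCA.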
Module 1: Sequence Processing
Module 2: Coarse-Grained Filter
Module 3: Fine-Grained Filter
In Module 3, the fine-grained filter checks each
k-block for a complementary core. A k-block might
be part of a palindrome with length in [k, 2k) if
it has a complementary core around which the
flanking nucleotides are complementary. Without a
complementary core, we know the k-block is not
part of a palindrome in this length range, but it
could still be part of a larger palindrome.
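One way to read the complementary-core test is sketched below; this is our interpretation rather than the poster's code. A palindrome shorter than 2k that contains a k-block must have its center inside that block, so the sketch looks for a center inside the block around which every mirrored pair of bases that fits within the block is complementary. `candidate_centers` and `passes_fine_filter` are hypothetical names.

```python
from typing import List

PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}


def candidate_centers(block: str) -> List[int]:
    """Offsets c such that the block could sit inside a palindrome centered
    between block[c - 1] and block[c]."""
    k = len(block)
    centers = []
    for c in range(1, k):
        # Number of mirrored pairs that fit inside the block around center c.
        reach = min(c, k - c)
        if all((block[c - 1 - i], block[c + i]) in PAIRS for i in range(reach)):
            centers.append(c)
    return centers


def passes_fine_filter(block: str) -> bool:
    return bool(candidate_centers(block))


print(candidate_centers("GAATTC"))  # [3]: the block itself is a palindrome
print(candidate_centers("TGAATT"))  # [4]: could extend to TGAATTCA
print(candidate_centers("AAAAAA"))  # []: no complementary core
```

A block with no candidate center is only ruled out for the current length range, which matches the point above that it may still belong to a larger palindrome.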
In Module 1, a sliding window scans the raw
genome sequence and collects all subsequences
of 6 base pairs together with their reverse
complements. Each is stored in a tuple with the
genome, chromosome, and position information as
the key. We call this tuple a “k-block”, with
our initial k being 6.
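The k-block construction can be sketched as a small generator. The step of one base per window, the 0-based positions, and the exact tuple layout are assumptions made for illustration; the sketch also ignores ambiguous bases such as N.

```python
from typing import Iterator, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Key: (genome, chromosome, position); value: (subsequence, reverse complement).
KBlock = Tuple[Tuple[str, str, int], Tuple[str, str]]


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def k_blocks(genome: str, chromosome: str, sequence: str, k: int = 6) -> Iterator[KBlock]:
    """Slide a window of width k over the sequence, one base at a time."""
    sequence = sequence.upper()
    for pos in range(len(sequence) - k + 1):
        window = sequence[pos:pos + k]
        yield (genome, chromosome, pos), (window, reverse_complement(window))


blocks = list(k_blocks("toy_genome", "chr1", "ACGAATTCGT"))
print(len(blocks))  # 5 windows of width 6 in a 10-base sequence
print(blocks[2])    # (('toy_genome', 'chr1', 2), ('GAATTC', 'GAATTC'))
```

Keeping the reverse complement next to each window means later steps can compare strands with simple equality checks on the stored strings.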
Findings
[Figure: Observed and Expected Palindromic DNA Occurrences. Series: Observed, Expected. Horizontal axis ticks: 6, 12, 24, 48; vertical axis: logarithmic, from 0.0001 to 1E+10.]
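The poster does not state how the expected counts were computed. As an illustration only, one simple null model treats bases as independent and equally likely: a window of even length L is then a palindrome with probability (1/4)^(L/2), because each base in the second half must complement its mirror base in the first half. The genome size used below is a hypothetical round number, not a value from the poster.

```python
def expected_palindromes(genome_length: int, length: int) -> float:
    """Expected number of windows of the given length that are palindromic,
    assuming independent, uniformly distributed bases."""
    if length % 2:
        # No base is its own complement, so odd lengths cannot be palindromic.
        return 0.0
    windows = max(genome_length - length + 1, 0)
    return windows * 0.25 ** (length // 2)


HYPOTHETICAL_GENOME_LENGTH = 3_000_000_000  # illustrative size only

for length in (6, 12, 24, 48):
    print(length, expected_palindromes(HYPOTHETICAL_GENOME_LENGTH, length))
```

Under this model the expected count falls by a factor of four for every two bases added, so long palindromes are expected to be extremely rare by chance alone.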
[Figures: four panels titled "GC Content and Center Bases", one each for Length 6, Length 12, Length 24, and Length 48. Legend: center bases AT, CG, GC, TA. Axes: GC Content, AT Content.]
• The longest palindrome in the dataset was found in I. tridecemlineatus (ground squirrel), with a length of 101,980 bp.
• Extraordinarily long palindromes are abundant in the Gorilla gorilla genome.
• 13 of 24 Gorilla chromosomes have palindromes over 6 kb long.
System Architecture
Future Work & Implications
• There is a genetic bias toward certain lengths and compositions of palindromic DNA
• Properly identifying this bias could lead to innovations in disease treatment
• Plants are very different from animals in their genetic makeup. Study of their palindromic makeup is vital to continued

Poster_Devin_Petersohn_Jeff_City
