Comparison of Genomic DNA to cDNA Alignment Methods
Miguel Galves and Zanoni Dias
Institute of Computing – Unicamp – Campinas – SP – Brazil
{miguel.galves,zanoni}@ic.unicamp.br
Scylla Bioinformatics – Campinas – SP – Brazil
{miguel,zanoni}@scylla.com.br
Agenda
• Introduction
• Problem
• Aligners
• Data set
• Subsets
• Evaluation Methods
• Results: Exact Alignments
• Results: EST Alignments
• Running Time Comparison
• Conclusions
Introduction
• Identifying genes in uncharacterized DNA sequences is one of the greatest challenges in genomics
• EST-to-DNA alignment is one of the most common methods for this task
• ESTs (expressed sequence tags) are key to understanding the inner workings of an organism
– The human genome has an estimated 30,000 to 35,000 genes
– Alternative splicing plays an important role in transcript and protein diversity
Problem
[Figure: an mRNA with its exon and intron regions labeled, and the resulting mature mRNA]
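The point of the figure can be reproduced in a few lines. Below is a minimal Python sketch. It assumes exon boundaries are known as 0-based, half-open intervals on the genomic sequence, and `splice` is a hypothetical helper, not part of any tool discussed here. Joining the exons yields the mature mRNA, so aligning that mRNA back to its gene must place each intron inside a single long gap.

```python
def splice(genome: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (0-based, half-open) of `genome` into a mature transcript."""
    ordered = sorted(exons)
    for start, end in ordered:
        if not 0 <= start < end <= len(genome):
            raise ValueError(f"invalid exon interval {(start, end)}")
    # Adjacent exons must not overlap; the bases between them form an intron.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start < prev_end:
            raise ValueError("exons overlap")
    return "".join(genome[start:end] for start, end in ordered)


# Toy gene: exon 1, an intron with the canonical GT...AG boundaries, exon 2.
genome = "AAAA" + "GTCCCCAG" + "TTTT"
mature = splice(genome, [(0, 4), (12, 16)])
print(mature)  # AAAATTTT
```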
Problem: How to Solve It?
• Classic algorithms
– Dynamic programming
• Heuristic-based algorithms
– Multi-step approaches
– Built on other tools, such as BLAST, and on local alignments
Aligners
• Java implementations of global and semi-global alignment
– Affine gap penalty function
– Linear space
– Global algorithm by Miller and Myers (1988)
– Semi-global algorithm derived from the global one
• Heuristic-based algorithms
– sim4, Spidey and est_genome
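The slides do not show the aligner's code. As an illustration only, here is a minimal Python sketch of a semi-global alignment score with an affine gap penalty, using Gotoh-style three-state recurrences. It makes two assumptions. First, the whole cDNA must be aligned, while unaligned genomic ends are free. Second, the four-number score systems used later, such as (1,-2,-1,0), are read as (match, mismatch, gap open, gap extension), with a gap of length k scoring open + k × extension. Only two DP rows are kept, so the score itself uses linear space. The sketch does not return an alignment, though. Recovering one in linear space needs the divide-and-conquer traceback of the Miller and Myers algorithm cited on the slide, which is not reproduced here.

```python
NEG_INF = float("-inf")


def semiglobal_affine_score(cdna: str, genome: str, match: int = 1, mismatch: int = -2,
                            gap_open: int = -1, gap_ext: int = 0) -> float:
    """Best score for aligning ALL of `cdna` against any stretch of `genome`.

    Assumed gap model: a gap of length k scores gap_open + k * gap_ext.
    Leading and trailing genomic bases are skipped at no cost.
    """
    n = len(genome)
    # Row 0 (empty cDNA prefix): skipping any genomic prefix is free.
    h_prev = [0.0] * (n + 1)        # best score of any alignment ending at (i-1, j)
    x_prev = [NEG_INF] * (n + 1)    # best ending with a cDNA base against a gap
    for i in range(1, len(cdna) + 1):
        h_cur = [0.0] * (n + 1)
        x_cur = [NEG_INF] * (n + 1)
        # Column 0: the first i cDNA bases sit in one gap.
        x_cur[0] = h_cur[0] = gap_open + i * gap_ext
        y = NEG_INF                 # best ending with a genomic base against a gap
        for j in range(1, n + 1):
            sub = match if cdna[i - 1] == genome[j - 1] else mismatch
            x_cur[j] = max(x_prev[j] + gap_ext, h_prev[j] + gap_open + gap_ext)
            y = max(y + gap_ext, h_cur[j - 1] + gap_open + gap_ext)
            h_cur[j] = max(h_prev[j - 1] + sub, x_cur[j], y)
        h_prev, x_prev = h_cur, x_cur
    # Trailing genomic bases are free: take the best cell of the last row.
    return max(h_prev)


genome = "AAAA" + "CCCCCCCC" + "TTTT"   # toy gene: exon, intron, exon
cdna = "AAAATTTT"                       # its spliced transcript
print(semiglobal_affine_score(cdna, genome, gap_open=-1))   # 7.0
print(semiglobal_affine_score(cdna, genome, gap_open=-10))  # -2.0
```

With the extension penalty at 0, an 8-base intron and an 8,000-base intron cost the same under this assumed model. That is why, even with a gap-open penalty of -10, the spliced alignment (-2) still beats the best gapless one (-4) in this toy case.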
Data Set
• Human genome database
– Built from FASTA and GenBank flat-format files from the NCBI repository
• Filtering criteria (these entries were excluded)
– Genes, mRNAs and CDSs with the /pseudo tag
– mRNAs without any CDS
– Genes without any mRNA
– CDSs matching invalid patterns
• 23,124 genes and 27,448 mRNAs stored in the database
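The four filtering criteria can be expressed as a single pass over parsed records. Below is a minimal Python sketch with two stated assumptions. The `Gene`, `MRNA` and `CDS` classes are hypothetical stand-ins for parsed GenBank features, not Biopython's types. The slide also does not say what "invalid patterns" are, so `cds_looks_valid` assumes a well-formed CDS starts with ATG, ends with a stop codon and has a length divisible by three.

```python
from dataclasses import dataclass, field
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass
class CDS:
    seq: str
    pseudo: bool = False            # carries a /pseudo qualifier


@dataclass
class MRNA:
    cds: list[CDS] = field(default_factory=list)
    pseudo: bool = False


@dataclass
class Gene:
    mrnas: list[MRNA] = field(default_factory=list)
    pseudo: bool = False


def cds_looks_valid(seq: str) -> bool:
    """Assumed stand-in for the slide's 'invalid patterns' check."""
    seq = seq.upper()
    return (len(seq) >= 6 and len(seq) % 3 == 0
            and seq.startswith("ATG") and seq[-3:] in STOP_CODONS)


def filter_gene(gene: Gene) -> Optional[Gene]:
    """Apply the four filtering criteria; return None if the gene is discarded."""
    if gene.pseudo:
        return None
    kept = []
    for mrna in gene.mrnas:
        good_cds = [c for c in mrna.cds if not c.pseudo and cds_looks_valid(c.seq)]
        if not mrna.pseudo and good_cds:        # drop pseudo mRNAs and mRNAs with no CDS
            kept.append(MRNA(cds=good_cds))
    return Gene(mrnas=kept) if kept else None   # drop genes with no mRNA
```

Whether an mRNA with one bad CDS should lose only that CDS or be dropped entirely is another unstated detail. This sketch keeps the mRNA as long as at least one valid CDS remains.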
Subsets
• Subset 1: 66 genes from chromosome Y with fewer than 100,000 bases
• Subset 2: 50 complete genes from chromosome Y with fewer than 100,000 bases
• Subset 3: 8056 complete genes from all chromosomes with fewer than 100,000 bases
• Subset 4: 493 artificial ESTs based on complete genes from chromosome 6 with fewer than 100,000 bases
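The slides do not say how the 493 artificial ESTs were built. One plausible approach, offered purely as an assumption, is to cut a random fragment from a gene's mature mRNA and add a low rate of substitution errors to mimic single-pass sequencing. In this Python sketch, `make_artificial_est` and its `length`, `error_rate` and `seed` parameters are hypothetical. Real ESTs also contain insertions, deletions and low-quality ends, which the sketch ignores.

```python
import random

BASES = "ACGT"


def make_artificial_est(mrna: str, length: int, error_rate: float, seed: int = 0) -> str:
    """Hypothetical EST simulator: a random mRNA fragment with substitution errors."""
    if not 0 < length <= len(mrna):
        raise ValueError("length must be between 1 and len(mrna)")
    rng = random.Random(seed)                 # seeded for reproducible subsets
    start = rng.randrange(len(mrna) - length + 1)
    fragment = list(mrna[start:start + length])
    for i, base in enumerate(fragment):
        if rng.random() < error_rate:         # substitute with a different base
            fragment[i] = rng.choice([b for b in BASES if b != base])
    return "".join(fragment)


mrna = "ACGT" * 30
clean = make_artificial_est(mrna, 50, error_rate=0.0, seed=1)
noisy = make_artificial_est(mrna, 50, error_rate=0.02, seed=1)
print(clean in mrna, len(noisy))  # True 50
```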
Evaluation methods
• Number of gaps introduced in the aligned gene sequence
• Delta exons
• Base similarity percentage
• Mismatch percentage
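The slide names the four metrics without defining them. Below is a minimal Python sketch of one way to compute them from a single gapped alignment, where every definition is an assumption:
- Extra gaps are gap runs in the genomic row.
- Exons are aligned blocks separated by cDNA gaps of at least `min_intron` bases (a hypothetical threshold).
- Delta exons is predicted exons minus annotated exons.
- Base similarity is identical bases divided by cDNA length.
- Mismatch percentage is mismatched columns divided by columns that pair two bases.

```python
import re
from dataclasses import dataclass


@dataclass
class AlignmentMetrics:
    extra_gaps: int         # gap runs opened in the genomic row
    delta_exons: int        # predicted exon count minus annotated exon count
    similarity_pct: float   # identical bases / cDNA length
    mismatch_pct: float     # mismatched columns / columns with two bases


def evaluate(cdna_row: str, genome_row: str, annotated_exons: int,
             min_intron: int = 20) -> AlignmentMetrics:
    """Score one gapped alignment ('-' = gap); all definitions are assumptions."""
    if len(cdna_row) != len(genome_row):
        raise ValueError("alignment rows must have equal length")
    # The mRNA comes from this gene, so any gap in the genomic row is an error.
    extra_gaps = len(re.findall(r"-+", genome_row))
    # Exons: aligned blocks split by cDNA gaps of at least min_intron bases,
    # ignoring the free leading/trailing genomic overhang.
    core = cdna_row.strip("-")
    blocks = [b for b in re.split("-{%d,}" % min_intron, core) if b]
    pairs = [(a, b) for a, b in zip(cdna_row, genome_row) if a != "-" and b != "-"]
    matches = sum(a == b for a, b in pairs)
    cdna_len = len(cdna_row) - cdna_row.count("-")
    return AlignmentMetrics(
        extra_gaps=extra_gaps,
        delta_exons=len(blocks) - annotated_exons,
        similarity_pct=100.0 * matches / cdna_len if cdna_len else 0.0,
        mismatch_pct=100.0 * (len(pairs) - matches) / len(pairs) if pairs else 0.0,
    )


spliced = evaluate("AAAA" + "-" * 20 + "TTTT", "AAAA" + "C" * 20 + "TTTT", annotated_exons=2)
print(spliced)
# AlignmentMetrics(extra_gaps=0, delta_exons=0, similarity_pct=100.0, mismatch_pct=0.0)
```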
Experimental method
• Using subsets 1 and 2, one alignment strategy and two score systems (out of 15 previously defined) were chosen:
– Semi-global aligner
– (1,-2,-1,0) and (1,-2,-10,0) score systems
• The classic semi-global aligner was then compared to sim4, Spidey and est_genome on subsets 3 and 4
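The difference between the two chosen score systems can be seen by scoring one fixed spliced alignment under each. Below is a minimal Python sketch. It assumes the tuple means (match, mismatch, gap open, gap extension) and that a gap of length k costs open + k × extension, since the slides do not spell out the convention. `score_alignment` is a hypothetical helper.

```python
import re


def score_alignment(top: str, bottom: str, system: tuple[int, int, int, int]) -> int:
    """Score a fixed gapped alignment under an assumed (match, mismatch, open, extend) system."""
    match, mismatch, gap_open, gap_ext = system
    if len(top) != len(bottom):
        raise ValueError("rows must have equal length")
    score = sum((match if a == b else mismatch)
                for a, b in zip(top, bottom) if a != "-" and b != "-")
    # Each maximal run of '-' in either row is one gap: open + length * extend.
    for row in (top, bottom):
        score += sum(gap_open + len(run) * gap_ext for run in re.findall(r"-+", row))
    return score


spliced_cdna = "AAAA" + "-" * 8 + "TTTT"    # toy mRNA with one 8-base intron gap
gene = "AAAACCCCCCCCTTTT"
for system in [(1, -2, -1, 0), (1, -2, -10, 0)]:
    print(system, score_alignment(spliced_cdna, gene, system))
# (1, -2, -1, 0) 7
# (1, -2, -10, 0) -2
```

Both systems set the extension term to 0, so under this reading an intron costs only its opening penalty, regardless of length.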
Results: Exact Alignments
Extra Gap
Strategy                Avg      SD    %Score 0
SG(1, -2, -1, 0)       0.00    0.00     100.00%
SG(1, -2, -10, 0)      0.00    0.00     100.00%
sim4                   1.11    1.63      54.56%
est_genome            16.99   21.49      27.84%
Spidey                 0.15    1.39      97.43%
Results: Exact Alignments
Delta Exons
Strategy                Avg      SD    %Score 0
SG(1, -2, -1, 0)       0.00    0.00     100.00%
SG(1, -2, -10, 0)      0.01    0.07      99.91%
sim4                  -0.01    0.20      97.46%
est_genome            -0.14    0.30      76.79%
Spidey                -4.04    3.10       0.00%
Results: Exact Alignments
Base Similarity
Strategy                Avg      SD %Score 100%
SG(1, -2, -1, 0)     99.89%   0.49%      53.56%
SG(1, -2, -10, 0)    99.89%   0.49%      53.49%
sim4                 99.39%   1.34%      22.79%
est_genome           53.83%  35.00%      18.11%
Spidey               80.34%  36.49%      44.25%
Results: Exact Alignments
Mismatch Percentage
Strategy                Avg      SD %Score 100%
SG(1, -2, -1, 0)      0.00%   0.00%     100.00%
SG(1, -2, -10, 0)     0.01%   0.03%      99.47%
sim4                  0.17%   0.21%      36.68%
est_genome            1.19%   1.26%      21.55%
Spidey                0.15%   0.98%      90.65%
Results: EST Alignments
Running Time Comparison
Aligner         EST-to-DNA        mRNA-to-DNA
                (sec/alignment)   (sec/alignment)
sim4            0.013             0.170
Spidey          0.066             0.140
est_genome      0.640             3.400
Semi-global     0.670             5.170
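The per-alignment times can be turned into relative slowdowns and, with an extrapolation the slides do not report, into a rough wall-clock estimate. Below is a back-of-the-envelope Python sketch. It assumes the averages hold for every input and that each of Subset 3's 8056 complete genes takes one alignment at the mRNA-to-DNA rate. Under those assumptions, the semi-global aligner is about 51.5 times slower than sim4 on ESTs and about 30.4 times slower on mRNAs. That works out to roughly 11.6 hours versus 0.4 hours for the whole subset.

```python
# Average seconds per alignment (EST-to-DNA, mRNA-to-DNA), from the table above.
TIMES = {
    "sim4": (0.013, 0.170),
    "Spidey": (0.066, 0.140),
    "est_genome": (0.640, 3.400),
    "Semi-global": (0.670, 5.170),
}
SUBSET3_GENES = 8056  # complete genes in Subset 3


def slowdown(aligner: str, baseline: str = "sim4") -> tuple[float, float]:
    """How many times slower `aligner` is than `baseline`, per column."""
    est, mrna = TIMES[aligner]
    base_est, base_mrna = TIMES[baseline]
    return est / base_est, mrna / base_mrna


def projected_hours(aligner: str, n_alignments: int = SUBSET3_GENES) -> float:
    """Extrapolated mRNA-to-DNA run time, assuming one alignment per gene."""
    return n_alignments * TIMES[aligner][1] / 3600


for name in TIMES:
    est_x, mrna_x = slowdown(name)
    print(f"{name:<12} {est_x:5.1f}x EST  {mrna_x:5.1f}x mRNA  ~{projected_hours(name):.1f} h")
```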
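Conclusions
• The classic semi-global algorithm produces good results: on exact alignments, SG(1, -2, -1, 0) had the best average on all four metrics
– Running time is a problem (it was the slowest aligner tested), although it can be improved
• sim4 produces the best results among the external tools tested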
Thanks
