Pairwise sequence Alignment

                    Dr Avril Coghlan
                   alc@sanger.ac.uk

Note: this talk contains animations which can only be seen by
downloading and using ‘View Slide show’ in Powerpoint
Sequence comparison
• How can we compare the human & Drosophila
  melanogaster Eyeless protein sequences?
  One method is a dotplot
• A dotplot is a graphical (visual) approach
  Regions of local similarity between the 2 sequences appear as diagonal
       lines of coloured cells (‘dots’)
  [Figure: dotplot of fruitfly Eyeless against human Eyeless; window-size = 10, threshold = 5]
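The windowed dotplot described above can be sketched in a few lines of Python. This is a minimal illustration, assuming one common definition: a cell (i, j) is coloured when the windows starting at position i of the first sequence and position j of the second share at least `threshold` identical letters. The function names `dotplot` and `render` are hypothetical, and the short example sequences and the smaller window (4) and threshold (3) are chosen only so the output fits on screen; they are not the settings used for the Eyeless figure.

```python
def dotplot(seq1, seq2, window=10, threshold=5):
    """Return the set of (i, j) cells whose windows share >= threshold identical letters."""
    dots = set()
    for i in range(len(seq1) - window + 1):
        for j in range(len(seq2) - window + 1):
            # Count identical letters at the same offset within the two windows
            matches = sum(a == b for a, b in zip(seq1[i:i + window], seq2[j:j + window]))
            if matches >= threshold:
                dots.add((i, j))
    return dots


def render(seq1, seq2, dots, window):
    """Draw the dotplot as text: '*' for a coloured cell, '.' otherwise."""
    n_rows = len(seq1) - window + 1
    n_cols = len(seq2) - window + 1
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(n_cols))
        for i in range(n_rows)
    )


seq1 = "QKGSYPVRSTC"
seq2 = "QKGSGPVRSTC"
dots = dotplot(seq1, seq2, window=4, threshold=3)
print(render(seq1, seq2, dots, window=4))
```

The similar region shows up as a run of `*` along the main diagonal, which is the text equivalent of the diagonal line of coloured cells.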
Sequence alignment
• A second method for comparing sequences is a
  sequence alignment
• An alignment is an arrangement in columns of 2
  sequences, highlighting their similarity
  The sequences are padded with gaps (dashes) so that wherever
  possible, alignment columns contain identical letters from the two
  sequences involved
  An insertion or deletion is represented by ‘–’ (a gap)
  The symbol “|” is used to represent matches
  eg. here is an alignment for amino acid sequences
  “QKGSYPVRSTC” & “QKGSGPVRSTC”:

            Q K G S Y P V R S T C
            | | | |   | | | | | |
            Q K G S G P V R S T C
            1 2 3 4 5 6 7 8 9 10 11

  This alignment has 11 columns
  There are 10 matches
  There is 1 mismatch
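The column counts for an alignment like this one can be checked with a short Python sketch. It assumes the alignment is given as two equal-length strings with '-' for gaps; the function name `summarize` is hypothetical.

```python
def summarize(row1, row2, gap="-"):
    """Count matches, mismatches and gap columns, and build the '|' match line."""
    if len(row1) != len(row2):
        raise ValueError("alignment rows must have the same number of columns")
    counts = {"matches": 0, "mismatches": 0, "gaps": 0}
    match_line = []
    for a, b in zip(row1, row2):
        if a == gap or b == gap:
            counts["gaps"] += 1          # insertion or deletion column
            match_line.append(" ")
        elif a == b:
            counts["matches"] += 1       # identical letters
            match_line.append("|")
        else:
            counts["mismatches"] += 1    # substitution column
            match_line.append(" ")
    return counts, "".join(match_line)


row1 = "QKGSYPVRSTC"
row2 = "QKGSGPVRSTC"
counts, match_line = summarize(row1, row2)
print(row1)
print(match_line)
print(row2)
print(len(row1), "columns:", counts)
```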
Sequence alignment
• An alignment of the human and fruitfly
  (Drosophila melanogaster) Eyeless proteins:
What does an alignment mean?
• An alignment tells you what mutations
  occurred in the sequences since the sequences
  shared a common ancestor
  eg. an alignment of the human & fruitfly Eyeless suggests:
  (i) there were probably deletion(s) at the start of the human
  Eyeless, or insertion(s) at the start of fruitfly Eyeless




  (ii) there was probably a G→N substitution in human Eyeless, or a N→G
         substitution in fruitfly Eyeless (see arrow)
How do we make an alignment?
• Given two or more sequences, what is the best way
  to align them to each other?
  We want the alignment columns to contain identical letters
• Comparison of similar sequences of similar length is
  straightforward
  eg. for amino acid sequences “QKGSYPVRSTC” & “QKGSGPVRSTC”, we
       line up the identical letters in columns:

               Q K G S Y P V R S T C            sequence 1
               | | | |   | | | | | |
               Q K G S G P V R S T C            sequence 2

  The alignment implies that one mutation occurred since the two
  sequences shared a common ancestor
  That is, the alignment implies there was a G→Y substitution in
  sequence 1 or a Y→G substitution in sequence 2
Problem
• Are there other possible plausible alignments for
  sequences “QKGSYPVRSTC” & “QKGSGPVRSTC”?
Answer
• Are there other possible plausible alignments for
  sequences “QKGSYPVRSTC” & “QKGSGPVRSTC”?
  There are many other possible alignments, eg. :

  Q K G S Y - P V R S T C
  | | |       | | | | | |
  Q K G - S G P V R S T C

  Q K G S - Y P V R S T C
  | | | |       | | | | |
  Q K G S G P - V R S T C

  Q K G - - - - - S Y P V R S T C
  | | |           |           | |
  Q K G S G P V R S - - - - - T C

  Q K - G S Y P V R S T C
  | |                   |
  Q K G S G P V R S T - C

  etc. etc. etc. . . .
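Each of the alternatives above is a legitimate alignment of the same two sequences: removing the gaps from each row gives back the original sequence. A small Python check of that property, as a sketch only (the function name `is_alignment_of` is hypothetical, and rejecting columns that contain two gaps is an assumed convention):

```python
def is_alignment_of(row1, row2, seq1, seq2, gap="-"):
    """True if the two gapped rows form an alignment of seq1 and seq2."""
    same_length = len(row1) == len(row2)
    spells_seq1 = row1.replace(gap, "") == seq1
    spells_seq2 = row2.replace(gap, "") == seq2
    # Assumed convention: a column may not consist of two gaps
    no_double_gap = all(not (a == gap and b == gap) for a, b in zip(row1, row2))
    return same_length and spells_seq1 and spells_seq2 and no_double_gap


SEQ1 = "QKGSYPVRSTC"
SEQ2 = "QKGSGPVRSTC"
alternatives = [
    ("QKGSY-PVRSTC", "QKG-SGPVRSTC"),
    ("QKGS-YPVRSTC", "QKGSGP-VRSTC"),
    ("QKG-----SYPVRSTC", "QKGSGPVRS-----TC"),
    ("QK-GSYPVRSTC", "QKGSGPVRST-C"),
]
results = [is_alignment_of(r1, r2, SEQ1, SEQ2) for r1, r2 in alternatives]
print(results)
```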
Number of possible pairwise alignments
• There are lots of different possible alignments for
  two sequences that are both of length n
  The number of possible alignments of 2 seqs of length n letters (amino
  acids/nucleotides) is $\binom{2n}{n}$ (“2n choose n”)
  $\binom{2n}{n}$ can be calculated as
      $$\binom{2n}{n} = \frac{(2n)!}{n! \, n!}$$
  where n! (‘n factorial’) = n * (n - 1) * (n - 2) * (n - 3) * ... * 3 * 2 * 1
• For example, for “QKGSYPVRSTC” &
  “QKGSGPVRSTC”, n (length) = 11 letters
  The number of possible alignments of these two sequences is
      $$\binom{2 \times 11}{11} = \binom{22}{11} = \frac{(2 \times 11)!}{11! \, 11!} = \frac{22!}{39916800 \times 39916800}$$
  = 1.124001e+21/1.593351e+15 = 705,432 possible alignments
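The arithmetic for n = 11 can be reproduced in Python using exact integers. The second function is an illustrative cross-check under an assumed counting convention that is not stated on the slide: an alignment is identified by which k letters of one sequence are paired, in order, with which k letters of the other, giving C(n, k) * C(n, k) alignments with k paired columns. Summing over k gives the same total. Both function names are hypothetical.

```python
from math import comb, factorial


def alignments_by_factorials(n):
    """(2n)! / (n! * n!), computed with exact integer arithmetic."""
    return factorial(2 * n) // (factorial(n) * factorial(n))


def alignments_by_pairs(n):
    """Sum over k of the ways to choose k letters from each sequence to pair up (assumed convention)."""
    return sum(comb(n, k) ** 2 for k in range(n + 1))


n = 11
print(factorial(n))                 # 11!
print(alignments_by_factorials(n))  # 22! / (11! * 11!)
print(alignments_by_pairs(n))       # same total, counted by paired columns
print(comb(2 * n, n))               # built-in binomial coefficient
```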
Number of possible pairwise alignments
• Even for relatively short sequences, $\binom{2n}{n}$ is large, so
  there are lots of possible alignments
  eg. for two sequences that are both 11 letters long, there are
  705,432 possible alignments
• In fact, the number of possible alignments, $\binom{2n}{n}$,
  increases exponentially with the sequence length (n)
  ie. $\binom{2n}{n}$ is approximately $2^{2n}/\sqrt{\pi n}$, so it grows nearly as fast as $2^{2n}$

  [Figure: plot of the number of possible alignments (vertical axis) against the length of the sequences, n (horizontal axis). For two sequences 17 letters long (n=17), there are 2.3 billion possible alignments]
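The growth of the count can be tabulated directly. The sketch below prints the exact value of "2n choose n" for a few lengths next to 2^(2n) and next to the standard Stirling-based estimate 4^n / sqrt(pi * n); the estimate and the function name `stirling_estimate` are additions for illustration and are not taken from the slides.

```python
from math import comb, pi, sqrt


def stirling_estimate(n):
    """Approximate (2n choose n) as 4**n / sqrt(pi * n)."""
    return 4 ** n / sqrt(pi * n)


print(f"{'n':>3} {'2n choose n':>15} {'estimate':>15} {'2^(2n)':>15}")
for n in (1, 5, 11, 17, 20):
    exact = comb(2 * n, n)
    print(f"{n:>3} {exact:>15,} {round(stirling_estimate(n)):>15,} {4 ** n:>15,}")
```

For n = 17 the exact count is a little over 2.3 billion, matching the figure quoted for the plot, while 2^(2n) itself is several times larger.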
• Many of the possible alignments for 2 seqs are
  implausible as they imply many mutations occurred
  (but it’s known mutations are rare)
  eg. for amino acid sequences “QKGSYPVRSTC” & “QKGSGPVRSTC”, the
        alignment made by lining the identical letters into columns only
        implies one mutation:
  Q K G S Y P V R S T C
  | | | |   | | | | | |
  Q K G S G P V R S T C

  This alignment implies that 1 G→Y or Y→G substitution occurred

  Many of the alternative alignments for these two sequences imply
  that many more mutations occurred, eg. :

  Q K G S Y - P V R S T C
  | | |       | | | | | |
  Q K G - S G P V R S T C

  This alignment implies that 1 S→Y or Y→S substitution occurred;
  that 1 insertion of S or deletion of S occurred;
  and that 1 deletion of G or insertion of G occurred
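The number of mutations implied by each candidate alignment can be counted with a short Python sketch. It assumes, as in the example above, that every mismatch column is one substitution and every gap column is one insertion or deletion; treating a run of adjacent gaps as a single event would be a different modelling choice. The function name `implied_mutations` is hypothetical.

```python
def implied_mutations(row1, row2, gap="-"):
    """Return (substitutions, indel columns) implied by an alignment."""
    substitutions = 0
    indels = 0
    for a, b in zip(row1, row2):
        if a == gap or b == gap:
            indels += 1            # one inserted or deleted letter
        elif a != b:
            substitutions += 1     # one substituted letter
    return substitutions, indels


candidates = {
    "ungapped": ("QKGSYPVRSTC", "QKGSGPVRSTC"),
    "alternative 1": ("QKGSY-PVRSTC", "QKG-SGPVRSTC"),
    "alternative 2": ("QKG-----SYPVRSTC", "QKGSGPVRS-----TC"),
    "alternative 3": ("QK-GSYPVRSTC", "QKGSGPVRST-C"),
}
for name, (row1, row2) in candidates.items():
    subs, indels = implied_mutations(row1, row2)
    print(f"{name}: {subs} substitution(s), {indels} indel(s), {subs + indels} in total")
```

The ungapped alignment implies a single mutation, while every alternative listed here implies at least three, which is why the ungapped one is the most plausible.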
Further Reading
•   Chapter 3 in Introduction to Computational Genomics, Cristianini & Hahn
•   Practical on pairwise alignment in R in the Little Book of R for
    Bioinformatics:
    https://a-little-book-of-r-for-bioinformatics.readthedocs.org/en/latest/src/chapter4.html

Editor's Notes

  1. Made
  2. Made alignment of human.fa and fly.fa using Needleman-Wunsch with default parameters at: http://emboss.bioinformatics.nl/cgi-bin/emboss/needle (EMBOSS needle) Human Eyeless (PAX6) from: http://www.treefam.org/cgi-bin/TFseq.pl?id=ENST00000379111.1 D. melanogaster Eyeless from: http://www.treefam.org/cgi-bin/TFseq.pl?id=FBtr0100396.5 Viewed in Jalview, and saved as humanfly_needlemanwunsch.png
  3. Made
  4. Made
  5. In R factorial(22)/( (factorial(11)) * (factorial(11)) )
  6. N.B. (2n choose n) = the binomial coefficient = the number of ways that n things can be 'chosen' from a set of 2n things = (2n)!/(n! * n!). This can be shown to be proportional to 2^(2*n) (Deonier, Tavare & Waterman book page 158-9). Graph made using wolfram alpha at http://www.wolframalpha.com/ and typing “plot 2n choose n from 1 to 20”.
  7. Made