Dr Aron Fazekas - Plant DNA Barcoding; data workflow



Dr Fazekas's process for checking and editing DNA sequences before publishing them on BOLD.

Published in: Education, Technology
  • Assumptions: the BOLD project already exists, and the raw data have just come back from the sequencer.
  • Every base is critical. Other principles: homology
  • Mention orientation
  • Mention orientation
  • Contigs need to agree… ABI software will make mistakes from time to time.
  • It is important to look at the sequence… here many gaps were inserted (an extreme example, but it can happen on a smaller scale).
  • Delete the old alignment or make a new one; develop methods to back-check the aligned file against the original.
  • Relevant points: outliers and oddities. Single sequences: how do we know they are what they are?

    1. Plant DNA Barcoding: data workflow. Aron Fazekas, University of Guelph
    2. Plant DNA Barcoding: data workflow. Workflow outline: raw sequence editing; data alignment; re-edit the sequence file; upload to BOLD; quality checks using BOLD / GenBank
    3. Sequence editing: primer trimming
    4. Sequence editing: primer trimming. 5’ GTTATGCATGAACGTAATGCTC GAGCATTACGT…
    5. Sequence editing: primer trimming
    6. Sequence editing: editing miscalls
    7. Sequence editing: congruence between forward and reverse reads
    8. Sequence alignment. After editing, the data need to be aligned (Kelchner 2000, Ann. Missouri Bot. Gard.):
       • rbcL: easy to align; most programs work well
       • matK: tricky to align; TransAlign seems to do the best job
       • trnH: difficult (impossible between genera?)
       • ITS: difficult (impossible between genera?)
       Tools: Clustal (www.clustal.org); TransAlign (http://www.biomedcentral.com/1471-2105/6/156); K-Align (http://www.ebi.ac.uk/Tools/msa/kalign/)
    9. Sequence alignment: problems to look for after alignment:
       • primers not trimmed
       • gaps at the ends
       • gaps in the middle (protein-coding regions)
       • translation shows stop codons
    10. Examples from real data submitted for publication: primers not trimmed (trnH-psbA); gaps at the ends
    11. rbcL: gaps in the middle of a coding region (real data submitted for publication)
    12. Translate coding regions (rbcL, matK) to ensure no stop codons are present
    13. Edit both the alignment file and the original sequence file
    14. Can trnH-psbA (or other non-coding sequences) be aligned across diverse species?
    15. Upload to BOLD
    16. After the data are edited and aligned, use BOLD to create a tree
    17. Check for misplaced taxa and remove them from the dataset; check for singleton species and make a list of them
    18. BOLD BLAST check
    19. GenBank BLAST check
    20. GenBank BLAST check
    21. GenBank BLAST
    22. Acknowledgements: Sujeevan Ratnasingham & the BOLD team
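
Primer trimming (slides 3 to 5) can be sketched in Python. This is a minimal sketch, not Dr Fazekas's actual procedure: it assumes exact, mismatch-free primer matches, and the function names, primers and read in the example are hypothetical. The forward primer is cut from the 5′ side of the read, while the reverse primer is cut from the 3′ side as its reverse complement, which is why read orientation matters.

```python
# Minimal sketch of primer trimming, assuming exact (mismatch-free) primer matches.
# Function names are hypothetical, not part of any BOLD or sequencer software.

# IUPAC-aware complement table (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def trim_primers(read: str, fwd_primer: str, rev_primer: str) -> str:
    """Cut the forward primer (and anything upstream of it) from the 5' side,
    and the reverse-complemented reverse primer (and anything downstream of it)
    from the 3' side. A primer that is not found leaves that end unchanged."""
    read = read.upper()
    fwd = fwd_primer.upper()
    start = read.find(fwd)
    if start != -1:
        read = read[start + len(fwd):]
    rev_rc = reverse_complement(rev_primer)
    end = read.rfind(rev_rc)
    if end != -1:
        read = read[:end]
    return read


# Hypothetical primers and read, for illustration only.
print(trim_primers("AAGGCCACGTTGCAGGGAAA", "AAGGCC", "TTTCCC"))  # ACGTTGCA
```

As a quick orientation check, `reverse_complement("ACGTAATGCTC")` returns `"GAGCATTACGT"`, so the last 11 bases of the first string on slide 4 and the second string are reverse complements of each other.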
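
One way to support the miscall editing on slide 6 is to list every position that is not an unambiguous A, C, G or T, so each one can be rechecked against the trace. The helper below is hypothetical and only finds ambiguity codes; a wrong but unambiguous base call still has to be caught by eye or by comparing the forward and reverse reads.

```python
# Hypothetical helper: flag bases that need a second look in the trace file.
# Run it on unaligned reads, since gap characters ('-') would also be flagged.
def ambiguous_positions(seq: str) -> list[tuple[int, str]]:
    """Return (1-based position, base) for every character that is not A, C, G or T,
    e.g. N or IUPAC ambiguity codes such as R and Y."""
    return [(i + 1, base) for i, base in enumerate(seq.upper()) if base not in "ACGT"]


print(ambiguous_positions("ACNGTRAC"))  # [(3, 'N'), (6, 'R')]
```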
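
The speaker note for slide 7 ("contigs need to agree") can be illustrated with a simple comparison of the forward read against the reverse complement of the reverse read. This is a sketch under a strong, hypothetical simplification: both reads are assumed to be trimmed to exactly the same region with no indels, so positions line up one-to-one. Real contig assembly handles offsets and indels, which this does not.

```python
# Minimal sketch: find disagreements between forward and reverse reads.
# Assumes both reads are already trimmed to exactly the same region and
# contain no indels (hypothetical simplification), so positions line up.
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (IUPAC codes kept)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def read_disagreements(forward: str, reverse: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, forward base, reverse-read base) wherever the
    forward read and the reverse complement of the reverse read differ."""
    rev_rc = reverse_complement(reverse)
    if len(forward) != len(rev_rc):
        raise ValueError("reads must cover the same region with no indels")
    return [
        (i + 1, f, r)
        for i, (f, r) in enumerate(zip(forward.upper(), rev_rc))
        if f != r
    ]


print(read_disagreements("AACGTTTG", "CAAACGTA"))  # [(1, 'A', 'T')]
```

Each reported position is a candidate miscall in one of the two reads; the chromatograms decide which read is right.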
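
Two of the post-alignment problems on slides 9 to 11 (gaps at the ends, gaps in the middle of a coding region) can be screened per sequence. The sketch below is hypothetical: it reports end gaps and every internal gap run, and singles out internal runs whose length is not a multiple of 3, since those would shift the reading frame in rbcL or matK. Large end gaps in only a few sequences can also hint at untrimmed primers, but deciding that still needs a look at the alignment.

```python
import re


def gap_report(aligned: str) -> dict:
    """Summarise gaps ('-') in one aligned sequence (hypothetical helper).

    Returns the number of leading and trailing gap columns, each internal gap
    run as (1-based alignment column, length), and the internal runs whose
    length is not a multiple of 3 (frame-shifting in a coding region)."""
    lead = len(aligned) - len(aligned.lstrip("-"))
    trail = len(aligned) - len(aligned.rstrip("-"))
    core = aligned[lead:len(aligned) - trail]
    # Offset match positions in 'core' back to 1-based alignment columns.
    internal = [(m.start() + lead + 1, len(m.group())) for m in re.finditer(r"-+", core)]
    return {
        "leading_gaps": lead,
        "trailing_gaps": trail,
        "internal_gaps": internal,
        "frame_shifting_gaps": [g for g in internal if g[1] % 3 != 0],
    }


print(gap_report("--ACG-TA---CGT-"))
# {'leading_gaps': 2, 'trailing_gaps': 1, 'internal_gaps': [(6, 1), (9, 3)], 'frame_shifting_gaps': [(6, 1)]}
```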
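
The translation check on slide 12 can be sketched without any external library. The codon table below is built from the standard genetic code; its amino-acid assignments are the same as NCBI translation table 11 (bacterial, archaeal and plant plastid), which differs only in start codons, so the stop codons are TAA, TAG and TGA. The reading frame of a barcode fragment depends on the primers used, so `frame` is an assumption the user must supply; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str, frame: int = 0) -> str:
    """Translate a DNA sequence from a 0-based frame, ignoring alignment gaps.
    Codons containing ambiguous bases become 'X'; a trailing partial codon is dropped."""
    seq = seq.upper().replace("-", "")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def stop_codon_positions(seq: str, frame: int = 0) -> list[int]:
    """1-based codon positions of stop codons in the translation."""
    return [i + 1 for i, aa in enumerate(translate(seq, frame)) if aa == "*"]


print(translate("ATGAAATAA"))             # MK*
print(stop_codon_positions("ATGTAGAAA"))  # [2]
```

Because a barcode fragment is only part of the gene, any stop codon in the expected frame points to a miscall, an untrimmed primer or an indel that shifted the frame.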
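
Slide 13 and its speaker note suggest editing both the alignment file and the original sequence file, then back-checking one against the other. A minimal, hypothetical way to do that is to remove gaps from each aligned sequence and compare it with the original sequence of the same name; any edit made in only one file shows up as a mismatch. The sketch assumes both files have already been read into dictionaries keyed by sequence name (file parsing is left out).

```python
def backcheck(aligned: dict[str, str], original: dict[str, str]) -> dict[str, str]:
    """Return {name: problem} for sequences that are missing from one file, or
    whose ungapped aligned form differs from the original (hypothetical helper)."""
    problems = {}
    for name in sorted(set(aligned) | set(original)):
        if name not in aligned:
            problems[name] = "missing from alignment"
        elif name not in original:
            problems[name] = "missing from original"
        elif aligned[name].replace("-", "").upper() != original[name].upper():
            problems[name] = "sequence differs"
    return problems


print(backcheck({"a": "AC-GT", "b": "A-A"}, {"a": "ACGT", "b": "AAT", "c": "G"}))
# {'b': 'sequence differs', 'c': 'missing from alignment'}
```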
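
Slides 16 and 17 use a BOLD tree to spot misplaced taxa and singleton species. The sketch below does not reproduce BOLD; it swaps in a simpler offline check under stated assumptions. It uses Kimura two-parameter (K2P) distances, a model commonly used in DNA barcoding, to find each sequence's nearest neighbour, flags any sequence whose nearest neighbour carries a different species name, and lists species represented by a single sequence. All names and data in the example are hypothetical, and ties are resolved in input order.

```python
import math
from collections import Counter

PURINES = {"A", "G"}


def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences, counted over
    columns where both have an unambiguous base (gaps and N are skipped).
    Raises ValueError if no columns are comparable or the sequences are too
    divergent for the formula (a log argument reaches zero or below)."""
    sites = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue
        sites += 1
        if x != y:
            # Transition: purine<->purine or pyrimidine<->pyrimidine.
            if (x in PURINES) == (y in PURINES):
                transitions += 1
            else:
                transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / sites, transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def singleton_species(species_of: dict[str, str]) -> list[str]:
    """Species names that occur for exactly one sequence."""
    counts = Counter(species_of.values())
    return sorted(s for s, n in counts.items() if n == 1)


def nearest_neighbour_conflicts(seqs: dict[str, str], species_of: dict[str, str]) -> list[tuple[str, str]]:
    """(sequence, nearest other sequence) pairs whose species names differ.
    Singletons are skipped, since their nearest neighbour is always another species."""
    singles = set(singleton_species(species_of))
    flagged = []
    for name in seqs:
        if species_of[name] in singles:
            continue
        others = [o for o in seqs if o != name]
        nearest = min(others, key=lambda o: k2p_distance(seqs[name], seqs[o]))
        if species_of[nearest] != species_of[name]:
            flagged.append((name, nearest))
    return flagged


# Hypothetical data: s5 is labelled Species B but is identical to s1 (Species A).
seqs = {"s1": "ACGTACGTAC", "s2": "ACGTACGTAT", "s3": "TTGTACGTAC",
        "s4": "TTGTACGTAT", "s5": "ACGTACGTAC", "s6": "ACGTACGTCC"}
species_of = {"s1": "Species A", "s2": "Species A", "s3": "Species B",
              "s4": "Species B", "s5": "Species B", "s6": "Species C"}
print(singleton_species(species_of))                 # ['Species C']
print(nearest_neighbour_conflicts(seqs, species_of))  # [('s1', 's5'), ('s5', 's1')]
```

A flagged pair only says that two labels conflict; deciding which specimen is misidentified (or mislabelled) still needs the vouchers, as the slides do by removing misplaced taxa after inspection.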
