Amy Driskell - Information management and data Quality

Tracking progress through the laboratory pipeline, keeping all required products together, consistent data assessment, analysis-lab feedback loop, key elements of a data management database (LIMS)

Published in: Education, Technology, Business
  • An example workflow. This workflow was very straightforward: everything worked the first time, so we didn’t have to rerun anything. Reaction templates and cocktails are on the left, reaction thermocycles on the right.

    1. Managing Data Flow Through the Barcoding Pipeline
        Amy Driskell, Laboratories of Analytical Biology (LAB), National Museum of Natural History, Smithsonian Institution
    2. What is the “pipeline”?
        [Diagram of pipeline stages: Specimen, LIMS, Data QC, Data Deposition]
    3. Outline
        1. BEFORE the LIMS
        2. LIMS
            – Data recorded
            – Exploring laboratory success/failure
            – Tracking project completion
        3. Data QC
            – Criteria and data requirements
            – Checking for contamination and validity
    4. Critical data management BEFORE specimen enters the laboratory pipeline
        • Data elements (“metadata”) necessary for laboratory processing:
            – Taxonomy, collection information, etc.
        IMPORTANT!
        • Assess laboratory successes/failures in light of this information
        • Tailor/change lab protocols
    5. Careful Metadata Collection at Specimen Collection or Harvest
        • Metadata can be formatted at the beginning of a project (e.g. at specimen collection) to guarantee a smooth information transfer into the LIMS
        • Multiple sources for metadata:
            – Spreadsheets
            – Field Information Management Systems (FIMS)
            – Museum databases
            – Fusion tables
    6. Rockin’ It “Old School” -- Spreadsheets
        • Modified BOLD specimen spreadsheet for use in field/museum
        • Additional fields desired by PIs
        • Modified easily to interface with multiple kinds of databases
        • 96-well format – 2D barcoded tubes, extraction plates
        • NOT directly connected to other databases, including LIMS
    7. An Elegant Solution: BiocodeMoorea FIMS
        • Actively connected to their LIMS
        http://biocode.berkeley.edu/
    8. bioValidator – cleaning up the collection of metadata
        • Many aspects of metadata require specific formats: digital lat/long, meters, names
        • bioValidator enforces adherence to formatting and other rules
        • Photo matcher
        http://biovalidator.sourceforge.net/
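The kind of format checking described on this slide can be sketched in a few lines of Python. This is only an illustration of rule-based metadata validation, not bioValidator's actual rule set or code; the field names (`decimal_latitude`, `decimal_longitude`, `elevation_m`, `genus`) and the individual rules are assumptions made for the example.

```python
import re

# Hypothetical field names and rules; bioValidator's real rules are not shown in the slides.
def validate_record(record):
    """Return a list of human-readable problems found in one specimen record."""
    problems = []
    # Latitude and longitude must be decimal degrees within valid ranges.
    for field, limit in (("decimal_latitude", 90.0), ("decimal_longitude", 180.0)):
        raw = record.get(field, "")
        try:
            value = float(raw)
        except (TypeError, ValueError):
            problems.append(f"{field}: '{raw}' is not a decimal number")
            continue
        if abs(value) > limit:
            problems.append(f"{field}: {value} is outside +/-{limit}")
    # Elevation must be a plain number of meters, with no unit suffix.
    elevation = str(record.get("elevation_m", ""))
    if elevation and not re.fullmatch(r"-?\d+(\.\d+)?", elevation):
        problems.append(f"elevation_m: '{elevation}' should be a number in meters")
    # Genus should be a single capitalized word.
    genus = record.get("genus", "")
    if not re.fullmatch(r"[A-Z][a-z]+", genus):
        problems.append(f"genus: '{genus}' is not a single capitalized word")
    return problems
```

Running a check like this at collection time, before the spreadsheet is handed to the lab, is what keeps the later transfer into the LIMS free of hand corrections.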
    9. Museum Collection Databases
        • Sampling directly from existing collections?
        • Some museum databases cannot link directly to lab-based information systems (LIMS)
        • Requires output from collection database, input into lab database – no automatic updates
    10. Why?
        1. Downstream insertion of data into other databases simplified
        2. Because metadata has important uses in the lab
            • Determine possible causes of failure: taxonomy, collection event, specimen age
            • Adjust extraction or amplification protocols
            • Design new primers – e.g. smaller fragments
    11. Specimens enter the lab, metadata enters the LIMS
        [Diagram of pipeline stages: Specimen & Metadata, LIMS, Data QC, Data Deposition]
    12. What is a LIMS?
        • An electronic lab “notebook” (aka database) to replace our traditional paper lab notebooks.
        • Tracks a specimen through lab processes from extraction through to barcode sequence completion (data QC may use external software).
        • Records every lab procedure.
        • Provides information to guide further lab efforts – success rates, “redo” lists
        • Records the physical location of extracts, etc.
    13. My requirements for a LIMS
        • I want a system that records every piece of information about each specimen/extract for which I produce a barcode sequence.
        • I want my procedures and protocols to be transparent enough so that anyone can reproduce my results.
        • This includes my QC procedures.
        • Currently no good place to publish these data.
    14. Data to be recorded
        • Extraction: protocol, digestion time, etc.
        • PCR: recipes, DNA concentration, cycling parameters, clean-up method (PCR machine, brand of enzyme, lot #)
        • Gel photos
        • Sequencing: recipe, clean-up, machine, etc.
        • Bonus: success or failure can be mapped back to any of these recorded values. Maybe the Taq was bad? Or the PCR machine needs repair?
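Mapping success or failure back to a recorded value amounts to grouping reactions by that value and comparing pass rates. The sketch below assumes each reaction is available as a dictionary with a boolean `passed` flag; the field names `taq_lot` and `thermocycler` and the sample records are invented for illustration and do not come from any particular LIMS.

```python
from collections import defaultdict

def success_rate_by(reactions, field):
    """Group reaction records by one recorded field and return the pass fraction per value."""
    counts = defaultdict(lambda: [0, 0])  # value -> [passed, total]
    for reaction in reactions:
        tally = counts[reaction[field]]
        tally[0] += 1 if reaction["passed"] else 0
        tally[1] += 1
    return {value: passed / total for value, (passed, total) in counts.items()}

# Invented example records: every failure shares one Taq lot, not one machine.
reactions = [
    {"taq_lot": "A17", "thermocycler": "PCR-1", "passed": True},
    {"taq_lot": "A17", "thermocycler": "PCR-2", "passed": True},
    {"taq_lot": "B02", "thermocycler": "PCR-1", "passed": False},
    {"taq_lot": "B02", "thermocycler": "PCR-2", "passed": False},
]
```

With these records, grouping by `taq_lot` separates the failures cleanly while grouping by `thermocycler` does not, which is the pattern that would point to a bad reagent lot rather than a machine in need of repair.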
    15. • A LIMS can be homegrown (like LAB’s barcoding LIMS, or SI’s plant barcoding LIMS) – relatively simple relational databases
        • Sophisticated, commercially produced – Geneious plug-in MooreaBiocode LIMS (plug-in is free)
            • Software updated and maintained
            • Plugs into the Geneious data analysis software
        http://software.mooreabiocode.org
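To make "relatively simple relational database" concrete, here is a minimal two-table sketch using Python's built-in `sqlite3` module. The table and column names are assumptions chosen for the example; they are not the schema of LAB's LIMS or of the Geneious plug-in.

```python
import sqlite3

# Hypothetical schema: one row per extraction, many PCR attempts per extraction.
SCHEMA = """
CREATE TABLE extraction (
    extraction_id TEXT PRIMARY KEY,
    specimen_id   TEXT NOT NULL,
    protocol      TEXT,
    plate_well    TEXT
);
CREATE TABLE pcr (
    pcr_id        INTEGER PRIMARY KEY,
    extraction_id TEXT NOT NULL REFERENCES extraction(extraction_id),
    primer_pair   TEXT,
    taq_lot       TEXT,
    passed        INTEGER
);
"""

def open_lims(path=":memory:"):
    """Create the tables in a new SQLite database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def pcr_history(conn, specimen_id):
    """Return every PCR attempt for a specimen, oldest first."""
    rows = conn.execute(
        "SELECT p.primer_pair, p.taq_lot, p.passed "
        "FROM pcr p JOIN extraction e ON e.extraction_id = p.extraction_id "
        "WHERE e.specimen_id = ? ORDER BY p.pcr_id",
        (specimen_id,),
    )
    return rows.fetchall()
```

Keeping each lab step in its own table, linked back to the extraction, is what lets one specimen accumulate repeated attempts without overwriting earlier records.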
    16. Workflow
    17. Mapping workflow elements to success
    18. Tracking project progress & identifying next steps
        • Which specimens have completed barcodes?
        • Which specimens need additional labwork?
        • Which specimens should be abandoned?
        • Where are the original DNA extracts or tissue samples?
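The first three questions on this slide can be answered by a simple classification over per-specimen records. The sketch below is one possible rule, assuming each record carries a `has_barcode` flag and a `pcr_attempts` count; those field names and the cutoff of three attempts are assumptions for illustration, not a policy stated in the talk.

```python
def next_step(specimen, max_attempts=3):
    """Classify one specimen as 'complete', 'redo', or 'abandon'."""
    if specimen["has_barcode"]:
        return "complete"
    # Assumed rule: stop retrying after max_attempts failed PCR attempts.
    if specimen["pcr_attempts"] >= max_attempts:
        return "abandon"
    return "redo"

def progress_report(specimens, max_attempts=3):
    """Return specimen IDs grouped by their next step."""
    report = {"complete": [], "redo": [], "abandon": []}
    for specimen in specimens:
        report[next_step(specimen, max_attempts)].append(specimen["specimen_id"])
    return report
```

The `redo` list is the working "redo" list for the next round of lab work, and the ratio of `complete` to the total gives the project completion figure.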
    19. Project Progress
    20. Raw data enters the QC process
        [Diagram of pipeline stages: Specimen, LIMS, Data QC, Data Deposition]
    21. Data QC
        • OUTSIDE of LIMS database
        • “Clean up” raw data – trim, examine quality
        • Assemble passed traces (“contig”) for a specimen
        • Examine/edit contigs
        • Check validity of resulting sequences
    22. My data QC ethos
        • All criteria for each step of data analysis are recorded
        • For raw trace processing: trimming criteria, length and quality requirements, binning criteria
        • For assembly: assembly parameters, product length, etc.
        • Hand editing is minimized*
        • It should be possible for anyone to recreate the barcode sequence
    23. Any DNA sequence analysis software can be used for data QC
        • Sequencher (Gene Codes) & Geneious (Biomatters)
            – Trim ends of raw sequences with adjustable criteria, explore effects of trim criteria
            – Discard short or poor sequences
            – Assemble trimmed reads with stringent, but adjustable criteria
            – Output completed sequences
        • Geneious LIMS is plugged into the data analysis software – direct communication – binning*
        • Sequencher data must be exported and imported into LIMS
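Trimming with adjustable, recorded criteria can be sketched as follows. This is a deliberately simple end-trimming rule (drop bases from each end until one meets a minimum quality score), not the algorithm used by Sequencher or Geneious; the parameter names and default values are assumptions for the example.

```python
def trim_read(bases, quals, min_quality=20, min_length=100):
    """Trim low-quality ends from a read and record the criteria that were applied.

    bases is the called sequence; quals is a list of per-base quality scores
    of the same length.
    """
    start, end = 0, len(bases)
    # Advance past leading bases below the quality threshold.
    while start < end and quals[start] < min_quality:
        start += 1
    # Back up past trailing bases below the quality threshold.
    while end > start and quals[end - 1] < min_quality:
        end -= 1
    trimmed = bases[start:end]
    return {
        "sequence": trimmed,
        "kept": len(trimmed) >= min_length,  # discard reads that end up too short
        "criteria": {"min_quality": min_quality, "min_length": min_length},
    }
```

Returning the criteria alongside the result is the point of the exercise: anyone given the raw trace and the stored criteria can reproduce the same trimmed read.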
    24. Data analysis
        Here are the traces. You can see some FIMS data in the document fields (e.g. identified by, tissue id). You will also notice a binning column (see the following slide).
    25. Binning
        Automatic categorization of reads and assemblies
        • Change binning parameters, examine effects
        • Trimming and assembly dialog boxes similar
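Binning can be thought of as a small set of threshold rules applied to each read or assembly. The sketch below bins a read by its length and the fraction of bases at or above a quality cutoff; the bin labels and every threshold value are assumptions for illustration, not the defaults of any particular program.

```python
def bin_read(quals, min_quality=20, high=(400, 0.90), medium=(200, 0.70)):
    """Assign a read to 'high', 'medium', or 'low' from its per-base quality scores.

    Each bin is a (minimum length, minimum fraction of high-quality bases) pair.
    """
    length = len(quals)
    if length == 0:
        return "low"
    high_quality_fraction = sum(q >= min_quality for q in quals) / length
    for name, (min_len, min_frac) in (("high", high), ("medium", medium)):
        if length >= min_len and high_quality_fraction >= min_frac:
            return name
    return "low"
```

Because the thresholds are arguments, the same reads can be re-binned under different parameters to see how many change category, which mirrors the "change binning parameters, examine effects" step on the slide.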
    26. Final Steps: Is it a contaminant? Is it identified correctly?
        • A number of procedures for identifying contamination or incorrect identification
            – BLASTing: database of known contaminants; GenBank; BOLD
            – Quick and dirty assembly tests – NJ trees
            – Geneious taxonomy verification tool
    27. Verify Taxonomy
        • BLASTs your sequences
        • Gets the NCBI taxonomy for the best hit(s)
        • Compares to the taxonomy from the FIMS
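The comparison step can be sketched independently of BLAST itself. The code below assumes the best hit's taxonomy and the FIMS taxonomy have already been retrieved as dictionaries keyed by rank; the rank list, the dictionary layout, and the "agree at least to family" rule are assumptions for the example and are not taken from the Geneious tool.

```python
RANKS = ("phylum", "class", "order", "family", "genus")

def deepest_agreement(fims_taxonomy, hit_taxonomy):
    """Return the lowest rank down to which the two taxonomies agree, or None."""
    agreed = None
    for rank in RANKS:
        expected = fims_taxonomy.get(rank)
        observed = hit_taxonomy.get(rank)
        if not expected or not observed or expected.lower() != observed.lower():
            break
        agreed = rank
    return agreed

def flag_for_review(fims_taxonomy, hit_taxonomy, required_rank="family"):
    """Flag a sequence whose best hit disagrees with the field ID above required_rank."""
    agreed = deepest_agreement(fims_taxonomy, hit_taxonomy)
    return agreed is None or RANKS.index(agreed) < RANKS.index(required_rank)
```

A fish specimen whose best hit agrees only at phylum level (for example, a human sequence) would be flagged as a likely contaminant, while a hit in a different genus of the same family would pass this check and could be reviewed separately as a possible misidentification.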
    28. Good, clean, barcode sequences
        • Feed back into LIMS*
            – Monitor progress
            – Connect sequences and traces to specimen data
        • Prepare for output to databases
            – GenBank or BOLD upload packages
        [Diagram of pipeline stages: Specimen, LIMS, Data QC, Data Deposition]
    29. Positive Information Flow from field or museum to final data deposition
        1. Collect metadata to flow easily into LIMS and other databases
        2. Record all aspects of all laboratory procedures (LIMS)
        3. Use the LIMS for reporting and protocol investigation, monitoring of project progress
        4. Input information and data from QC procedures into LIMS*
        5. LIMS outputs upload packages for public databases