The Barcode of Life
     Data Portal
(http://bol.uvm.edu)
 Dr. David E Schindel, Executive Secretary
    Michael Trizna, Database Specialist
 Consortium for the Barcode of Life (CBOL)
           Smithsonian Institution
             Washington, DC
          www.barcodeoflife.org;
  SchindelD@si.edu and TriznaM@si.edu
Contents of Presentation
Crowd-sourced open source software
How does Data Portal complement BOLD
and GenBank?
Data Portal capabilities
Case Study: Smithsonian frozen bird
tissue project
An Experiment in Museum Tissue
 Mining and Fast Data Release
  Tissue sampling winter/spring
  Sequencing completed in September
  Sequence quality control in October
  Taxonomic checking in early November
   – Obvious errors removed
   – Minor discrepancies remain
  Data released for Adelaide Conference
   – Crowd-sourced annotation by community
   – Will data be mis-used?
Unique Data Portal Capabilities
 Creating customized datasets from public
 and/or your private data
 Online library of standard datasets
 Support sharing within project teams using
 Connect IDs, easy link to Working Groups
 Running different identification analyses
 based on different methodologies:
  – Standard sequence input using FASTA format
  – Use standard or customized datasets
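The slides name FASTA as the standard sequence input but do not show how the Data Portal reads it. As a rough sketch only, the parser below shows what a minimal FASTA reader could look like; the function name `parse_fasta` and the example query are hypothetical and are not taken from the Data Portal.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} dictionary."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A '>' line starts a new record; the rest of the line is its header.
            header = line[1:].strip()
            if header in records:
                raise ValueError(f"duplicate FASTA header: {header}")
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequence lines may wrap; join them and normalize to upper case.
            records[header] += line.upper()
    return records


# Hypothetical query: the identifier and bases are made up for illustration.
example = """>query_1 hypothetical COI fragment
acgtacgtac
GTACGT
"""
queries = parse_fasta(example.splitlines())
print(queries)
```

A dictionary keyed by header is only one possible design; a real pipeline might prefer an established parser such as the one in Biopython.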
Barcode Aggregator




 727,170 public records
Summary Statistics per Family
Creating Customized Datasets
Existing Data Analysis Packages
  LIST of packages
  – BLOG
  – BRONX
  – Kernel
  – CAOS
  – USEARCH
  – BLAST
  Output of identification routines as
  probabilities of assignment
Data Analysis Methods Session
 New packages presented Friday
 afternoon:
  – Damon Little: Automatic Plants Barcode
    pipeline (from raw traces to trimmed/edited
    sequences)
  – Ka Hou Chu: Composite Vector Method
    (profile trees for faster alignment and
    tree-based analysis)
  – Alain Franc: Matching Next Generation results
    to Sanger-based reference records
Sample output
CONNECT for Data Portal
    Collaboration
The USNM Bird Project
USNM Division of Birds frozen tissue
collection:
– 21,104 specimens, 2512 species
Which new ones to sample/barcode?
Public records for birds
– All public bird COI records: 10,967
– All BARCODE records in GenBank: 8,419
– BARCODE with taxonomic names: 7,965
– BARCODE, name and 2 traces: 2,388
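The four counts above read as successively stricter filters on the same pool of records. The sketch below illustrates that idea under stated assumptions: the tiers are nested, "2 traces" means at least two trace files, and the `Record` fields and sample accessions are hypothetical rather than the GenBank or Data Portal schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Record:
    accession: str               # hypothetical identifier
    marker: str                  # e.g. "COI"
    barcode_keyword: bool        # record carries the BARCODE keyword
    species_name: Optional[str]  # None when no taxonomic name is attached
    trace_count: int             # number of trace files linked to the record


def tier_counts(records: List[Record]) -> Dict[str, int]:
    """Count records passing each successively stricter filter."""
    coi = [r for r in records if r.marker == "COI"]
    barcode = [r for r in coi if r.barcode_keyword]
    named = [r for r in barcode if r.species_name]
    traced = [r for r in named if r.trace_count >= 2]  # assumption: "2 traces" = at least 2
    return {
        "public COI": len(coi),
        "BARCODE": len(barcode),
        "BARCODE + name": len(named),
        "BARCODE + name + 2 traces": len(traced),
    }


# Made-up records, for illustration only.
sample = [
    Record("REC001", "COI", True, "Genus alpha", 2),
    Record("REC002", "COI", True, "Genus beta", 1),
    Record("REC003", "COI", True, None, 2),
    Record("REC004", "COI", False, "Genus gamma", 0),
]
counts = tier_counts(sample)
print(counts)
```

Each tier trades coverage for reliability, which is the choice the "What testing dataset to use?" slide raises.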
Moving Data Among
 BOLD, GenBank, Data Portal
  – USNM Excel spreadsheet (KE-Emu source)
  – BOLD: split into projects that consist of 2-4 plates
  – Local database that holds all fields from the original spreadsheet
  – Data Portal Aggregator database
Creating a ‘Pick List’
Spreadsheet of tissue samples compared
with:
– ITIS taxonomy
– Clements species list in BOLD
– Counts of GenBank and/or public BOLD
  records
– Geographic information
Screenshot of USNM list side-by-side with
BOLD records
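The slides do not state the rule used to choose tissues. As one possible illustration, the sketch below builds a pick list by keeping species with few or no public records. The selection rule, the `build_pick_list` name, the `max_public` threshold, and the sample data are all assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_pick_list(
    tissues: List[Tuple[str, str]],
    public_counts: Dict[str, int],
    max_public: int = 0,
) -> List[Tuple[str, int, List[str]]]:
    """Return (species, public record count, tissue IDs) for under-sampled species.

    tissues       : (tissue_id, species) pairs from the collection spreadsheet
    public_counts : species -> number of existing public records
    max_public    : keep species with at most this many public records
    """
    by_species: Dict[str, List[str]] = defaultdict(list)
    for tissue_id, species in tissues:
        by_species[species].append(tissue_id)

    rows = []
    for species, ids in by_species.items():
        n_public = public_counts.get(species, 0)  # species absent from the counts has 0
        if n_public <= max_public:
            rows.append((species, n_public, sorted(ids)))

    # Least-covered species first, then alphabetical for a stable order.
    rows.sort(key=lambda row: (row[1], row[0]))
    return rows


# Made-up tissue IDs and species names, for illustration only.
tissues = [
    ("T001", "Genus alpha"),
    ("T002", "Genus beta"),
    ("T003", "Genus alpha"),
    ("T004", "Genus gamma"),
]
public_counts = {"Genus beta": 5, "Genus gamma": 0}
pick_list = build_pick_list(tissues, public_counts)
for species, n_public, ids in pick_list:
    print(species, n_public, ids)
```

Taxonomy checks and geographic information, which the slide also lists, would need extra columns and are left out here.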
Identifying Samples to be Subsampled
Side-by-Side Lists
USNM Bird Dataset
3150 tissues sampled
168 failed sequences
94 problematic sequences
166 clustered badly
2761 ‘BARCODE-ready’ samples
1,147 ‘first-BARCODE’ species
91% increase over 1,259 barcoded species
(3,892 listed in BOLD includes BINs, others)
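The 91% figure can be reproduced from the numbers on this slide. The snippet below is a quick arithmetic check that uses only the slide's values; the variable names are illustrative. Subtracting the three excluded categories from the tissues sampled gives 2,722 rather than 2,761. The slides do not explain the difference, so the categories may overlap or be counted differently.

```python
# Values taken from the slide.
sampled = 3150
failed = 168
problematic = 94
clustered_badly = 166
barcode_ready = 2761
first_barcode_species = 1147
previously_barcoded_species = 1259

# Increase in barcoded species contributed by the new 'first-BARCODE' species.
increase_pct = 100 * first_barcode_species / previously_barcoded_species
print(f"Increase in barcoded species: {increase_pct:.0f}%")

# Share of sampled tissues that ended up 'BARCODE-ready'.
ready_pct = 100 * barcode_ready / sampled
print(f"BARCODE-ready share of sampled tissues: {ready_pct:.1f}%")

# Plain subtraction of the three excluded categories; this does not equal
# barcode_ready, and the slides do not explain the difference.
remaining = sampled - (failed + problematic + clustered_badly)
print(f"Sampled minus excluded categories: {remaining}")
```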
Two problematic clades, USNM data
  Flycatchers: Family Tyrannidae
   – Sublegatus arenarum, S. modestus, S.
     obscurior, S. sp.
   – Conopias parvus, C. albovittatus
   – Myiarchus ferox, M. swainsoni, M. sp.
  Hummingbirds: Family Trochilidae
   – Phaethornis longuemareus
  Inconsistencies within USNM dataset
  Incompatibilities with public, other data
Resolving Mis-identified
      Specimens
What testing dataset to use?
ID trees and analytical routines could use:
– All public bird COI records: 10,967
– All BARCODE records in GenBank: 8,419
– BARCODE with taxonomic names: 7,965
– BARCODE, name and 2 traces: 2,388
Which ones have reliable taxonomic IDs?
Preparing a Data Release Paper
 Summary statistics from Data Portal
 Figures from BOLD

More Related Content

What's hot

The National Center for Biotechnology Information (NCBI) Pathogen Analysis Pi...
The National Center for Biotechnology Information (NCBI) Pathogen Analysis Pi...The National Center for Biotechnology Information (NCBI) Pathogen Analysis Pi...
The National Center for Biotechnology Information (NCBI) Pathogen Analysis Pi...
ExternalEvents
 
DAS game: how a programmer thinks
DAS game: how a programmer thinksDAS game: how a programmer thinks
DAS game: how a programmer thinks
Rafael C. Jimenez
 
Assembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence dataAssembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence data
Rutger Vos
 
Claudia medina: Linking Health Records for Population Health Research in Brazil.
Claudia medina: Linking Health Records for Population Health Research in Brazil.Claudia medina: Linking Health Records for Population Health Research in Brazil.
Claudia medina: Linking Health Records for Population Health Research in Brazil.
Flávio Codeço Coelho
 
Rap db(rice annotation project data base)
Rap db(rice annotation project data base)Rap db(rice annotation project data base)
Rap db(rice annotation project data base)
PrajaktaKale17
 
Kegg databse
Kegg databseKegg databse
Kegg databse
Rashi Srivastava
 
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Alejandra Gonzalez-Beltran
 
2011 03-provenance-workshop-edingurgh
2011 03-provenance-workshop-edingurgh2011 03-provenance-workshop-edingurgh
2011 03-provenance-workshop-edingurgh
Jun Zhao
 
ICAR 2015 Poster - Araport
ICAR 2015 Poster - AraportICAR 2015 Poster - Araport
ICAR 2015 Poster - Araport
Araport
 
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
CEDAR: Center for Expanded Data Annotation and Retrieval
 
Textming chancediscovery
Textming chancediscoveryTextming chancediscovery
Textming chancediscovery
Chris Yukna
 
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
CEDAR: Center for Expanded Data Annotation and Retrieval
 
Taxanomic websites IPNI,plant list,tropicos
Taxanomic websites IPNI,plant list,tropicosTaxanomic websites IPNI,plant list,tropicos
Taxanomic websites IPNI,plant list,tropicos
Himanshi Chauhan
 
creation of DNA barcoding database with website
creation of DNA barcoding database with websitecreation of DNA barcoding database with website
creation of DNA barcoding database with website
JunaidAKG
 
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
CEDAR: Center for Expanded Data Annotation and Retrieval
 
Dynamic Semantic Metadata in Biomedical Communications
Dynamic Semantic Metadata in Biomedical CommunicationsDynamic Semantic Metadata in Biomedical Communications
Dynamic Semantic Metadata in Biomedical Communications
Tim Clark
 
Fabricio Silva: Cloud Computing Technologies for Genomic Big Data Analysis
Fabricio  Silva: Cloud Computing Technologies for Genomic Big Data AnalysisFabricio  Silva: Cloud Computing Technologies for Genomic Big Data Analysis
Fabricio Silva: Cloud Computing Technologies for Genomic Big Data Analysis
Flávio Codeço Coelho
 
Model Organism Linked Data
Model Organism Linked DataModel Organism Linked Data
Model Organism Linked Data
Michel Dumontier
 

What's hot (20)

The National Center for Biotechnology Information (NCBI) Pathogen Analysis Pi...
The National Center for Biotechnology Information (NCBI) Pathogen Analysis Pi...The National Center for Biotechnology Information (NCBI) Pathogen Analysis Pi...
The National Center for Biotechnology Information (NCBI) Pathogen Analysis Pi...
 
DAS game: how a programmer thinks
DAS game: how a programmer thinksDAS game: how a programmer thinks
DAS game: how a programmer thinks
 
Assembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence dataAssembling the Tree of Life from public DNA sequence data
Assembling the Tree of Life from public DNA sequence data
 
Claudia medina: Linking Health Records for Population Health Research in Brazil.
Claudia medina: Linking Health Records for Population Health Research in Brazil.Claudia medina: Linking Health Records for Population Health Research in Brazil.
Claudia medina: Linking Health Records for Population Health Research in Brazil.
 
Rap db(rice annotation project data base)
Rap db(rice annotation project data base)Rap db(rice annotation project data base)
Rap db(rice annotation project data base)
 
Kegg databse
Kegg databseKegg databse
Kegg databse
 
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
Metagenomic Data Provenance and Management using the ISA infrastructure --- o...
 
2011 03-provenance-workshop-edingurgh
2011 03-provenance-workshop-edingurgh2011 03-provenance-workshop-edingurgh
2011 03-provenance-workshop-edingurgh
 
ICAR 2015 Poster - Araport
ICAR 2015 Poster - AraportICAR 2015 Poster - Araport
ICAR 2015 Poster - Araport
 
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
The CEDAR Workbench: An Ontology-Assisted Environment for Authoring Metadata ...
 
Textming chancediscovery
Textming chancediscoveryTextming chancediscovery
Textming chancediscovery
 
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
Metadata in the BioSample Online Repository are Impaired by Numerous Anomalie...
 
Taxanomic websites IPNI,plant list,tropicos
Taxanomic websites IPNI,plant list,tropicosTaxanomic websites IPNI,plant list,tropicos
Taxanomic websites IPNI,plant list,tropicos
 
creation of DNA barcoding database with website
creation of DNA barcoding database with websitecreation of DNA barcoding database with website
creation of DNA barcoding database with website
 
ENVS 604 Fall 2012
ENVS 604 Fall 2012ENVS 604 Fall 2012
ENVS 604 Fall 2012
 
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (S...
 
Dynamic Semantic Metadata in Biomedical Communications
Dynamic Semantic Metadata in Biomedical CommunicationsDynamic Semantic Metadata in Biomedical Communications
Dynamic Semantic Metadata in Biomedical Communications
 
Fabricio Silva: Cloud Computing Technologies for Genomic Big Data Analysis
Fabricio  Silva: Cloud Computing Technologies for Genomic Big Data AnalysisFabricio  Silva: Cloud Computing Technologies for Genomic Big Data Analysis
Fabricio Silva: Cloud Computing Technologies for Genomic Big Data Analysis
 
Model Organism Linked Data
Model Organism Linked DataModel Organism Linked Data
Model Organism Linked Data
 
Plant names: Obstacles and Solutions to access information about plants
Plant names: Obstacles and Solutions to access information about plantsPlant names: Obstacles and Solutions to access information about plants
Plant names: Obstacles and Solutions to access information about plants
 

Viewers also liked

Amy Driskell - The Barcoding pipeline
Amy Driskell - The Barcoding pipelineAmy Driskell - The Barcoding pipeline
Amy Driskell - The Barcoding pipeline
Consortium for the Barcode of Life (CBOL)
 
Julie Stahlhut - Terrestrial invertebrates
Julie Stahlhut - Terrestrial invertebrates Julie Stahlhut - Terrestrial invertebrates
Julie Stahlhut - Terrestrial invertebrates
Consortium for the Barcode of Life (CBOL)
 
Steven Stones-Havas - Geneious: Biocode and LIMS
Steven Stones-Havas - Geneious: Biocode and LIMSSteven Stones-Havas - Geneious: Biocode and LIMS
Steven Stones-Havas - Geneious: Biocode and LIMS
Consortium for the Barcode of Life (CBOL)
 
Sarah Adamowicz - Pros and Cons of Collecting Specimens for Barcoding vs. Sam...
Sarah Adamowicz - Pros and Cons of Collecting Specimens for Barcoding vs. Sam...Sarah Adamowicz - Pros and Cons of Collecting Specimens for Barcoding vs. Sam...
Sarah Adamowicz - Pros and Cons of Collecting Specimens for Barcoding vs. Sam...
Consortium for the Barcode of Life (CBOL)
 
Dr Robert Hanner & Dr Dirk Steinke - Campaigns on BOLD
Dr Robert Hanner & Dr Dirk Steinke - Campaigns on BOLDDr Robert Hanner & Dr Dirk Steinke - Campaigns on BOLD
Dr Robert Hanner & Dr Dirk Steinke - Campaigns on BOLD
Consortium for the Barcode of Life (CBOL)
 
Navigating the Benefits Maze & Exercising Your Rights
Navigating the Benefits Maze & Exercising Your RightsNavigating the Benefits Maze & Exercising Your Rights
Navigating the Benefits Maze & Exercising Your Rightscedwvugraphics
 
Dirk Steinke - Marine invertebrates
Dirk Steinke - Marine invertebratesDirk Steinke - Marine invertebrates
Dirk Steinke - Marine invertebrates
Consortium for the Barcode of Life (CBOL)
 

Viewers also liked (7)

Amy Driskell - The Barcoding pipeline
Amy Driskell - The Barcoding pipelineAmy Driskell - The Barcoding pipeline
Amy Driskell - The Barcoding pipeline
 
Julie Stahlhut - Terrestrial invertebrates
Julie Stahlhut - Terrestrial invertebrates Julie Stahlhut - Terrestrial invertebrates
Julie Stahlhut - Terrestrial invertebrates
 
Steven Stones-Havas - Geneious: Biocode and LIMS
Steven Stones-Havas - Geneious: Biocode and LIMSSteven Stones-Havas - Geneious: Biocode and LIMS
Steven Stones-Havas - Geneious: Biocode and LIMS
 
Sarah Adamowicz - Pros and Cons of Collecting Specimens for Barcoding vs. Sam...
Sarah Adamowicz - Pros and Cons of Collecting Specimens for Barcoding vs. Sam...Sarah Adamowicz - Pros and Cons of Collecting Specimens for Barcoding vs. Sam...
Sarah Adamowicz - Pros and Cons of Collecting Specimens for Barcoding vs. Sam...
 
Dr Robert Hanner & Dr Dirk Steinke - Campaigns on BOLD
Dr Robert Hanner & Dr Dirk Steinke - Campaigns on BOLDDr Robert Hanner & Dr Dirk Steinke - Campaigns on BOLD
Dr Robert Hanner & Dr Dirk Steinke - Campaigns on BOLD
 
Navigating the Benefits Maze & Exercising Your Rights
Navigating the Benefits Maze & Exercising Your RightsNavigating the Benefits Maze & Exercising Your Rights
Navigating the Benefits Maze & Exercising Your Rights
 
Dirk Steinke - Marine invertebrates
Dirk Steinke - Marine invertebratesDirk Steinke - Marine invertebrates
Dirk Steinke - Marine invertebrates
 

Similar to Dr David Schindel and Mike Trizna - BOL Data Portal

Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
Anshika Bansal
 
Dr Robert Hanner - Barcode Data standards for animals, plants & fungi
Dr Robert Hanner - Barcode Data standards for animals, plants & fungiDr Robert Hanner - Barcode Data standards for animals, plants & fungi
Dr Robert Hanner - Barcode Data standards for animals, plants & fungi
Consortium for the Barcode of Life (CBOL)
 
The Human Cell Atlas Data Coordination Platform
The Human Cell Atlas Data Coordination PlatformThe Human Cell Atlas Data Coordination Platform
The Human Cell Atlas Data Coordination Platform
Laura Clarke
 
Biological databases
Biological databasesBiological databases
Biological databases
Sarfaraz Nasri
 
Introduction to Biological databases
Introduction to Biological databasesIntroduction to Biological databases
Major biological nucleotide databases
Major biological nucleotide databasesMajor biological nucleotide databases
Major biological nucleotide databases
Vidya Kalaivani Rajkumar
 
Data retrieval tools
Data retrieval toolsData retrieval tools
Data retrieval tools
Vidya Kalaivani Rajkumar
 
Understanding Genome
Understanding Genome Understanding Genome
Understanding Genome
Rajendra K Labala
 
Selected innovations in Biodiversity Informatics
Selected innovations inBiodiversity InformaticsSelected innovations inBiodiversity Informatics
Selected innovations in Biodiversity Informatics
Tony Rees
 
Texas sla presentation finding sci tech grey literature information
Texas sla presentation  finding sci tech grey literature informationTexas sla presentation  finding sci tech grey literature information
Texas sla presentation finding sci tech grey literature information
Matthew Von Hendy
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects
Carole Goble
 
Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
Scott Edmunds: Revolutionizing Data Dissemination: GigaScienceScott Edmunds: Revolutionizing Data Dissemination: GigaScience
Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
GigaScience, BGI Hong Kong
 
Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...
Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...
Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...ICZN
 
2 Discovery and Acquisition of Data1.pptx
2 Discovery and Acquisition of Data1.pptx2 Discovery and Acquisition of Data1.pptx
2 Discovery and Acquisition of Data1.pptx
vijayapraba1
 
Ondex: Data integration and visualisation
Ondex: Data integration and visualisationOndex: Data integration and visualisation
Ondex: Data integration and visualisation
Biogeeks
 
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingScott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
GigaScience, BGI Hong Kong
 
Data science training in hyderabad
Data science training in hyderabadData science training in hyderabad
Data science training in hyderabad
Geohedrick
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
mikaelhuss
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"
GigaScience, BGI Hong Kong
 
Great Science, Technology, Engineering and Medicine Resources Web Search Univ...
Great Science, Technology, Engineering and Medicine Resources Web Search Univ...Great Science, Technology, Engineering and Medicine Resources Web Search Univ...
Great Science, Technology, Engineering and Medicine Resources Web Search Univ...
Matthew Von Hendy
 

Similar to Dr David Schindel and Mike Trizna - BOL Data Portal (20)

Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
 
Dr Robert Hanner - Barcode Data standards for animals, plants & fungi
Dr Robert Hanner - Barcode Data standards for animals, plants & fungiDr Robert Hanner - Barcode Data standards for animals, plants & fungi
Dr Robert Hanner - Barcode Data standards for animals, plants & fungi
 
The Human Cell Atlas Data Coordination Platform
The Human Cell Atlas Data Coordination PlatformThe Human Cell Atlas Data Coordination Platform
The Human Cell Atlas Data Coordination Platform
 
Biological databases
Biological databasesBiological databases
Biological databases
 
Introduction to Biological databases
Introduction to Biological databasesIntroduction to Biological databases
Introduction to Biological databases
 
Major biological nucleotide databases
Major biological nucleotide databasesMajor biological nucleotide databases
Major biological nucleotide databases
 
Data retrieval tools
Data retrieval toolsData retrieval tools
Data retrieval tools
 
Understanding Genome
Understanding Genome Understanding Genome
Understanding Genome
 
Selected innovations in Biodiversity Informatics
Selected innovations inBiodiversity InformaticsSelected innovations inBiodiversity Informatics
Selected innovations in Biodiversity Informatics
 
Texas sla presentation finding sci tech grey literature information
Texas sla presentation  finding sci tech grey literature informationTexas sla presentation  finding sci tech grey literature information
Texas sla presentation finding sci tech grey literature information
 
The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects The swings and roundabouts of a decade of fun and games with Research Objects
The swings and roundabouts of a decade of fun and games with Research Objects
 
Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
Scott Edmunds: Revolutionizing Data Dissemination: GigaScienceScott Edmunds: Revolutionizing Data Dissemination: GigaScience
Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
 
Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...
Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...
Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...
 
2 Discovery and Acquisition of Data1.pptx
2 Discovery and Acquisition of Data1.pptx2 Discovery and Acquisition of Data1.pptx
2 Discovery and Acquisition of Data1.pptx
 
Ondex: Data integration and visualisation
Ondex: Data integration and visualisationOndex: Data integration and visualisation
Ondex: Data integration and visualisation
 
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data HandlingScott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
Scott Edmunds: GigaScience - Big-Data, Data Citation and Future Data Handling
 
Data science training in hyderabad
Data science training in hyderabadData science training in hyderabad
Data science training in hyderabad
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
 
Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"Scott Edmunds: Data Dissemination in the era of "Big-Data"
Scott Edmunds: Data Dissemination in the era of "Big-Data"
 
Great Science, Technology, Engineering and Medicine Resources Web Search Univ...
Great Science, Technology, Engineering and Medicine Resources Web Search Univ...Great Science, Technology, Engineering and Medicine Resources Web Search Univ...
Great Science, Technology, Engineering and Medicine Resources Web Search Univ...
 

More from Consortium for the Barcode of Life (CBOL)

John La Salle - Opening Plenary
John La Salle - Opening PlenaryJohn La Salle - Opening Plenary
John La Salle - Opening Plenary
Consortium for the Barcode of Life (CBOL)
 

More from Consortium for the Barcode of Life (CBOL) (20)

Andrew Lowe - Opening Plenary
Andrew Lowe - Opening PlenaryAndrew Lowe - Opening Plenary
Andrew Lowe - Opening Plenary
 
Axel Hausmann - Invertebrates Plenary
Axel Hausmann - Invertebrates PlenaryAxel Hausmann - Invertebrates Plenary
Axel Hausmann - Invertebrates Plenary
 
Hannah McPherson - Plants Plenary
Hannah McPherson - Plants PlenaryHannah McPherson - Plants Plenary
Hannah McPherson - Plants Plenary
 
Rebecca Johnson - Opening Plenary
Rebecca Johnson - Opening PlenaryRebecca Johnson - Opening Plenary
Rebecca Johnson - Opening Plenary
 
K.A. Seifert - Algae, Protists & Fungi Plenary
K.A. Seifert - Algae, Protists & Fungi PlenaryK.A. Seifert - Algae, Protists & Fungi Plenary
K.A. Seifert - Algae, Protists & Fungi Plenary
 
Scott Miller - Opening Plenary
Scott Miller - Opening PlenaryScott Miller - Opening Plenary
Scott Miller - Opening Plenary
 
Bruce Deagle - Opening Plenary
Bruce Deagle - Opening PlenaryBruce Deagle - Opening Plenary
Bruce Deagle - Opening Plenary
 
Ralph Imondi - Opening Plenary
Ralph Imondi - Opening PlenaryRalph Imondi - Opening Plenary
Ralph Imondi - Opening Plenary
 
Damon Little - Opening Plenary
Damon Little - Opening PlenaryDamon Little - Opening Plenary
Damon Little - Opening Plenary
 
Natasha de Vere - Plants Plenary
Natasha de Vere - Plants PlenaryNatasha de Vere - Plants Plenary
Natasha de Vere - Plants Plenary
 
Robert Hanner - Closing Plenary
Robert Hanner - Closing PlenaryRobert Hanner - Closing Plenary
Robert Hanner - Closing Plenary
 
Paul Hebert - Saturday Closing Plenary
Paul Hebert - Saturday Closing PlenaryPaul Hebert - Saturday Closing Plenary
Paul Hebert - Saturday Closing Plenary
 
Conrad Schoch - Saturday Closing Plenary
Conrad Schoch - Saturday Closing PlenaryConrad Schoch - Saturday Closing Plenary
Conrad Schoch - Saturday Closing Plenary
 
Xin Zhou - Saturday Closing Plenary
Xin Zhou - Saturday Closing PlenaryXin Zhou - Saturday Closing Plenary
Xin Zhou - Saturday Closing Plenary
 
Pierre Taberlet - Saturday Closing Plenary
Pierre Taberlet - Saturday Closing PlenaryPierre Taberlet - Saturday Closing Plenary
Pierre Taberlet - Saturday Closing Plenary
 
Stoeckle - All Birds Barcoding Initiative
Stoeckle - All Birds Barcoding Initiative Stoeckle - All Birds Barcoding Initiative
Stoeckle - All Birds Barcoding Initiative
 
Weiland Meyer - Algae, Protists & Fungi Plenary
Weiland Meyer - Algae, Protists & Fungi PlenaryWeiland Meyer - Algae, Protists & Fungi Plenary
Weiland Meyer - Algae, Protists & Fungi Plenary
 
Alain Franc - Algae, Protists & Fungi Plenary
Alain Franc - Algae, Protists & Fungi PlenaryAlain Franc - Algae, Protists & Fungi Plenary
Alain Franc - Algae, Protists & Fungi Plenary
 
Marieka Gryzenhout - Algae, Protists & Fungi Plenary
Marieka Gryzenhout - Algae, Protists & Fungi PlenaryMarieka Gryzenhout - Algae, Protists & Fungi Plenary
Marieka Gryzenhout - Algae, Protists & Fungi Plenary
 
John La Salle - Opening Plenary
John La Salle - Opening PlenaryJohn La Salle - Opening Plenary
John La Salle - Opening Plenary
 

Recently uploaded

Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 
The French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free downloadThe French Revolution Class 9 Study Material pdf free download
The French Revolution Class 9 Study Material pdf free download
Vivekanand Anglo Vedic Academy
 
Supporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptxSupporting (UKRI) OA monographs at Salford.pptx
Supporting (UKRI) OA monographs at Salford.pptx
Jisc
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
The Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptxThe Accursed House by Émile Gaboriau.pptx
The Accursed House by Émile Gaboriau.pptx
DhatriParmar
 
Advantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO PerspectiveAdvantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO Perspective
Krisztián Száraz
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBCSTRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
STRAND 3 HYGIENIC PRACTICES.pptx GRADE 7 CBC
kimdan468
 
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...TESDA TM1 REVIEWER  FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
TESDA TM1 REVIEWER FOR NATIONAL ASSESSMENT WRITTEN AND ORAL QUESTIONS WITH A...
Dr David Schindel and Mike Trizna - BOL Data Portal

  • 1. The Barcode of Life Data Portal (http://bol.uvm.edu) Dr. David E Schindel, Executive Secretary Michael Trizna, Database Specialist Consortium for the Barcode of Life (CBOL) Smithsonian Institution Washington, DC www.barcodeoflife.org; SchindelD@si.edu and TriznaM@si.edu
  • 2. Contents of Presentation Crowd-sourced open source software How does Data Portal complement BOLD and GenBank? Data Portal capabilities Case Study: Smithsonian frozen bird tissue project
  • 3. An Experiment in Museum Tissue Mining and Fast Data Release Tissue sampling winter/spring Sequencing completed in September Sequence quality control in October Taxonomic checking in early November – Obvious errors removed – Minor discrepancies remain Data released for Adelaide Conference – Crowd-sourced annotation by community – Will data be mis-used?
  • 4. Unique Data Portal Capabilities Creating customized datasets from public and/or your private data Online library of standard datasets Support sharing within project teams using Connect IDs, easy link to Working Groups Running different identification analyses based on different methodologies: – Standard sequence input using FASTA format – Use standard or customized datasets
  • 5. Barcode Aggregator 727,170 public records
  • 8. Existing Data Analysis Packages LIST of packages – BLOG – BRONX – Kernel – CAOS – USEARCH – BLAST Output of identification routines as probabilities of assignment
  • 9. Data Analysis Methods Session New packages presented Friday afternoon: – Damon Little: Automatic Plants Barcode pipeline (from raw traces to trimmed/edited sequences) – Ka Hou Chu: Composite Vector Method (profile trees for faster alignment and tree- based analysis) – Alain Franc: Matching Next Generation results to Sanger-based reference records
  • 12. CONNECT for Data Portal Collaboration
  • 14. The USNM Bird Project USNM Division of Birds frozen tissue collection: – 21,104 specimens, 2,512 species Which new ones to sample/barcode? Public records for birds – All public bird COI records: 10,967 – All BARCODE records in GenBank: 8,419 – BARCODE with taxonomic names: 7,965 – BARCODE, name and 2 traces: 2,388
  • 15. Moving Data Among BOLD, GenBank, Data Portal – USNM Excel Spreadsheet (KE-Emu Source) – BOLD: Split into projects that consist of 2-4 plates – Local database that holds all fields from the original spreadsheet – Data Portal Aggregator database
  • 16. Creating a ‘Pick List’ Spreadsheet of tissue samples compared with: – ITIS taxonomy – Clements species list in BOLD – Counts of GenBank and/or public BOLD records – Geographic information Screenshot of USNM list side-by-side with BOLD records
  • 17. Identifying Samples to be Subsampled
  • 19. USNM Bird Dataset 3150 tissues sampled 168 failed sequences 94 problematic sequences 166 clustered badly 2761 ‘BARCODE-ready’ samples 1,147 ‘first-BARCODE’ species 91% increase over 1,259 barcoded species (3,892 listed in BOLD includes BINs, others)
  • 20. Two problematic clades, USNM data Flycatchers: Family Tyrannidae – Sublegatus arenarum, S. modestus, S. obscurior, S. sp. – Conopias parvus, C. albovittatus – Myiarchus ferox, M. swainsoni, M. sp. Hummingbirds: Family Trochilidae – Phaethornis longuemareus Inconsistencies within USNM dataset Incompatibilities with public, other data
  • 23. What testing dataset to use? ID trees and analytical routines could use: – All public bird COI records: 10,967 – All BARCODE records in GenBank: 8,419 – BARCODE with taxonomic names: 7,965 – BARCODE, name and 2 traces: 2,388 Which ones have reliable taxonomic IDs?
  • 24. Preparing a Data Release Paper Summary statistics from Data Portal Figures from BOLD
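
The ‘Pick List’ step in slide 16 compares the tissue spreadsheet against counts of public records, so that species with few or no public barcodes can be sampled first. The slides do not show how the comparison was actually done; the following is only a minimal Python sketch of the idea. The function name `build_pick_list`, the row fields, and all counts in the sample data are hypothetical. Only the species names are taken from slide 20.

```python
from collections import Counter


def build_pick_list(tissue_species, public_record_species):
    """Rank species in a tissue collection by how few public records they have.

    Both arguments are plain lists of species names, one entry per tissue
    sample or per public record (a hypothetical input format).
    """
    tissue_counts = Counter(tissue_species)
    public_counts = Counter(public_record_species)
    rows = [
        {
            "species": species,
            "tissues": n_tissues,
            "public_records": public_counts.get(species, 0),
        }
        for species, n_tissues in tissue_counts.items()
    ]
    # Species with the fewest public records come first; ties sort by name.
    rows.sort(key=lambda row: (row["public_records"], row["species"]))
    return rows


# Illustrative data only: these counts are invented for the example.
tissues = [
    "Myiarchus ferox",
    "Myiarchus ferox",
    "Conopias parvus",
    "Phaethornis longuemareus",
]
public = [
    "Myiarchus ferox",
    "Myiarchus ferox",
    "Myiarchus ferox",
    "Phaethornis longuemareus",
]

pick_list = build_pick_list(tissues, public)
for row in pick_list:
    print(row)
```

A real pick list would also need the taxonomic reconciliation (ITIS, the BOLD species list) and the geographic information listed on the slide, which this sketch leaves out.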