SlideShare a Scribd company logo
Description & annotation of biomedical experimental data sets:
                      work in progress
                                                                      Jen Ferguson • Harvard School of Public Health


                                                                 1
                                                                       Objective
                       Biomedical data deposition is on the rise as more scientists make their
                       experimental data openly available. However, lack of context and metadata can
                       create obstacles to the understanding and reuse of these data sets.
                                                                                                                                  4   Conclusions
                       Researchers from the Harvard School of Public Health Bioinformatics Core
                       Group have attempted to address this issue by assembling a team of curators to
                                                                                                                        Work on this curation project has yielded useful
                       annotate and contextualize this data. The curation team includes life sciences
                                                                                                                        insights for both scientists and librarians.
                       graduate students and postdoctoral associates, plus one science librarian (JF).

                                                                                                                        Similarities:
                                                                                                                                 What information is relevant for curation?
                                                                                                                                 How much metadata is enough/too much?

                                                                                                                        Differences:
                                                                                                                                Areas of expertise
                                                                                                                                Perspectives and concepts of scale

                                                                              2   Methods                                There are valuable roles for librarians to
                                                                                                                         play in the curation process.

                                                                                                                         Some of the most worthwhile contributions
                                                                ISATools, an open source software suite, is used to
                                                                                                                         that librarians can offer may occur prior to
                                                                annotate experimental descriptions and data sets.
                                                                                                                         the actual curation process, in areas such
                                                                Curators draw upon information from PubMed
                                                                                                                         as training and software development.
                                                                papers and associated data sets (GEO files) to
                                                                create ISATab records.




                                                                                                                                                 3
                                                                                                                                                     Results
                                                                                                                                  As of March 2011, HSPH has collected over 50
                                                                Curators annotate and describe both raw and                       annotated studies comprising 900+ assays in their
                                                                derived data files for each investigation, as well as             internal data management system.
                                                                supplying metadata for the investigation as a whole.
                                                                                                                                 The ultimate goal:
                                                                 Once annotated, the records are validated using                         create records that clearly tie curated,
                                                                 ISAvalidator and sent to an internal data                               metadata-enriched data sets to published
                                                                 management system.                                                      works, and
 *      References                                                                                                                        make this information openly available
1.    Field, D., et al. "'Omics Data Sharing." Science
      326.5950 (2009): 234-36. Print.
                                                                                                                                          in public repositories.
2.    ISA Team. "Isa - About". January 24 2011. http://
      isatab.sourceforge.net/index.html.
3.    Rocca-Serra, P., et al. "Isa Software Suite: Supporting
      Standards-Compliant Experimental Annotation and
      Enabling Curation at the Community Level."
      Bioinformatics 26.18 (2010): 2354-56. Print.
4.    Rocca-Serra, Phillippe, Eamonn Maguire, and
      Susanna-Assunta Sanson. "The Isa Infrastructure:
      standards & Software for Annotating, Managing and
      Sharing Life Science Investigation." Ed. Oxford,
      University of. 2010. Print.

More Related Content

Similar to Description & annotation of biomedical experimental data sets: work in progress

GARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant ScienceGARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant Science
David Johnson
 
THE Jisc Supplement 25 Nov 2009
THE Jisc Supplement 25 Nov 2009THE Jisc Supplement 25 Nov 2009
THE Jisc Supplement 25 Nov 2009
Fiona Salvage
 
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific DataNIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
Susanna-Assunta Sansone
 
Scio12 sem web_final
Scio12 sem web_finalScio12 sem web_final
Scio12 sem web_final
Kristi Holmes
 
Mining public domain data as a basis for drug repurposing
Mining public domain data as a basis for drug repurposingMining public domain data as a basis for drug repurposing
Evolving Roles in Scholarly Communications
Evolving Roles in Scholarly Communications Evolving Roles in Scholarly Communications
Evolving Roles in Scholarly Communications
LIBER Europe
 
Collaboration and Sharing
Collaboration and SharingCollaboration and Sharing
Collaboration and Sharing
Jisc
 
Michener Plenary PPSR2012
Michener Plenary PPSR2012Michener Plenary PPSR2012
Michener Plenary PPSR2012
CitizenScience.org
 
Informatics in Clinical Practice: Designing and Implementing an Electronic Re...
Informatics in Clinical Practice: Designing and Implementing an Electronic Re...Informatics in Clinical Practice: Designing and Implementing an Electronic Re...
Informatics in Clinical Practice: Designing and Implementing an Electronic Re...
Health Informatics New Zealand
 
The BioAssay Research Database
The BioAssay Research DatabaseThe BioAssay Research Database
The BioAssay Research DatabaseRajarshi Guha
 
Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database
Towards a Simple, Standards-Compliant, and Generic Phylogenetic DatabaseTowards a Simple, Standards-Compliant, and Generic Phylogenetic Database
Towards a Simple, Standards-Compliant, and Generic Phylogenetic DatabaseHilmar Lapp
 
Data analysis – using computers
Data analysis – using computersData analysis – using computers
Data analysis – using computersNoonapau
 
Lurking in the lab: analysis of data from molecular biology laboratory instr...
Lurking in the lab:  analysis of data from molecular biology laboratory instr...Lurking in the lab:  analysis of data from molecular biology laboratory instr...
Lurking in the lab: analysis of data from molecular biology laboratory instr...
jenferguson
 
From metadata to data curation: the role of libraries in data exchange
From metadata to data curation: the role of libraries in data exchangeFrom metadata to data curation: the role of libraries in data exchange
From metadata to data curation: the role of libraries in data exchange
LIBER Europe
 
Full text
Full textFull text
Full textbutest
 
Full text
Full textFull text
Full textbutest
 

Similar to Description & annotation of biomedical experimental data sets: work in progress (20)

GARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant ScienceGARNet workshop on Integrating Large Data into Plant Science
GARNet workshop on Integrating Large Data into Plant Science
 
NETTAB 2012
NETTAB 2012NETTAB 2012
NETTAB 2012
 
THE Jisc Supplement 25 Nov 2009
THE Jisc Supplement 25 Nov 2009THE Jisc Supplement 25 Nov 2009
THE Jisc Supplement 25 Nov 2009
 
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific DataNIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
NIH iDASH meeting on data sharing - BioSharing, ISA and Scientific Data
 
Scio12 sem web_final
Scio12 sem web_finalScio12 sem web_final
Scio12 sem web_final
 
Mining public domain data as a basis for drug repurposing
Mining public domain data as a basis for drug repurposingMining public domain data as a basis for drug repurposing
Mining public domain data as a basis for drug repurposing
 
Evolving Roles in Scholarly Communications
Evolving Roles in Scholarly Communications Evolving Roles in Scholarly Communications
Evolving Roles in Scholarly Communications
 
Collaboration and Sharing
Collaboration and SharingCollaboration and Sharing
Collaboration and Sharing
 
Michener Plenary PPSR2012
Michener Plenary PPSR2012Michener Plenary PPSR2012
Michener Plenary PPSR2012
 
Informatics in Clinical Practice: Designing and Implementing an Electronic Re...
Informatics in Clinical Practice: Designing and Implementing an Electronic Re...Informatics in Clinical Practice: Designing and Implementing an Electronic Re...
Informatics in Clinical Practice: Designing and Implementing an Electronic Re...
 
2012 researching your first review article class
2012 researching your first review article class2012 researching your first review article class
2012 researching your first review article class
 
The BioAssay Research Database
The BioAssay Research DatabaseThe BioAssay Research Database
The BioAssay Research Database
 
Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database
Towards a Simple, Standards-Compliant, and Generic Phylogenetic DatabaseTowards a Simple, Standards-Compliant, and Generic Phylogenetic Database
Towards a Simple, Standards-Compliant, and Generic Phylogenetic Database
 
Data analysis – using computers
Data analysis – using computersData analysis – using computers
Data analysis – using computers
 
Lurking in the lab: analysis of data from molecular biology laboratory instr...
Lurking in the lab:  analysis of data from molecular biology laboratory instr...Lurking in the lab:  analysis of data from molecular biology laboratory instr...
Lurking in the lab: analysis of data from molecular biology laboratory instr...
 
From metadata to data curation: the role of libraries in data exchange
From metadata to data curation: the role of libraries in data exchangeFrom metadata to data curation: the role of libraries in data exchange
From metadata to data curation: the role of libraries in data exchange
 
Full text
Full textFull text
Full text
 
Full text
Full textFull text
Full text
 
M&e for ciat seminar
M&e for ciat seminarM&e for ciat seminar
M&e for ciat seminar
 
Introduction to Database Research Projects @ CWHR
Introduction to Database Research Projects @ CWHRIntroduction to Database Research Projects @ CWHR
Introduction to Database Research Projects @ CWHR
 

Description & annotation of biomedical experimental data sets: work in progress

  • 1. Description & annotation of biomedical experimental data sets: work in progress Jen Ferguson • Harvard School of Public Health 1 Objective Biomedical data deposition is on the rise as more scientists make their experimental data openly available. However, lack of context and metadata can create obstacles to the understanding and reuse of these data sets. 4 Conclusions Researchers from the Harvard School of Public Health Bioinformatics Core Group have attempted to address this issue by assembling a team of curators to Work on this curation project has yielded useful annotate and contextualize this data. The curation team includes life sciences insights for both scientists and librarians. graduate students and postdoctoral associates, plus one science librarian (JF). Similarities: What information is relevant for curation? How much metadata is enough/too much? Differences: Areas of expertise Perspectives and concepts of scale 2 Methods There are valuable roles for librarians to play in the curation process. Some of the most worthwhile contributions ISATools, an open source software suite, is used to that librarians can offer may occur prior to annotate experimental descriptions and data sets. the actual curation process, in areas such Curators draw upon information from PubMed as training and software development. papers and associated data sets (GEO files) to create ISATab records. 3 Results As of March 2011, HSPH has collected over 50 Curators annotate and describe both raw and annotated studies comprising 900+ assays in their derived data files for each investigation, as well as internal data management system. supplying metadata for the investigation as a whole. The ultimate goal: Once annotated, the records are validated using create records that clearly tie curated, ISAvalidator and sent to an internal data metadata-enriched data sets to published management system. works, and * References make this information openly available 1.  Field, D., et al. "'Omics Data Sharing." Science 326.5950 (2009): 234-36. Print. in public repositories. 2.  ISA Team. "Isa - About". January 24 2011. http:// isatab.sourceforge.net/index.html. 3.  Rocca-Serra, P., et al. "Isa Software Suite: Supporting Standards-Compliant Experimental Annotation and Enabling Curation at the Community Level." Bioinformatics 26.18 (2010): 2354-56. Print. 4.  Rocca-Serra, Phillippe, Eamonn Maguire, and Susanna-Assunta Sanson. "The Isa Infrastructure: standards & Software for Annotating, Managing and Sharing Life Science Investigation." Ed. Oxford, University of. 2010. Print.