The document describes a project by researchers at the Harvard School of Public Health to annotate and contextualize biomedical experimental data sets to make them more understandable and reusable. The curation team includes graduate students, postdocs, and a science librarian who use ISATools software to annotate descriptions, metadata, and raw/derived data files from published studies. As of March 2011, over 50 studies comprising 900+ assays had been annotated and collected in their internal system. The goal is to clearly link curated data sets to published works and make this information openly available.
A sponsored supplement produced for Jisc on how researchers can cope with the data deluge of modern research techniques. Published by Times Higher Education on 25 November 2009
Online databases containing high throughput screening and other property data continue to proliferate in number. Many pharmaceutical chemists will have used databases such as PubChem, ChemSpider, DrugBank, BindingDB and many others. This work will report on the potential value of these databases for providing data to be used to repurpose drugs using cheminformatics-based approaches (e.g. docking, ligand-based machine learning methods). This work will also discuss the potentially related applications of the Open PHACTS project, a European Union Innovative Medicines Initiative project, that is utilizing semantic web based approaches to integrate large scale chemical and biological data in new ways. We will report on how compound and data quality should be taken into account when utilizing data from online databases and how their careful curation can provide high quality data that can be used to underpin the delivery of molecular models that can in turn identify new uses for old drugs.
Presentation Title: Grand Challenges and Big Data: Implications for Public Participation in Scientific Research
Presenter: William Michener, Professor and PI/Director of DataONE, University Libraries, University of New Mexico
A sponsored supplement produced for Jisc on how researchers can cope with the data deluge of modern research techniques. Published by Times Higher Education on 25 November 2009
Online databases containing high throughput screening and other property data continue to proliferate in number. Many pharmaceutical chemists will have used databases such as PubChem, ChemSpider, DrugBank, BindingDB and many others. This work will report on the potential value of these databases for providing data to be used to repurpose drugs using cheminformatics-based approaches (e.g. docking, ligand-based machine learning methods). This work will also discuss the potentially related applications of the Open PHACTS project, a European Union Innovative Medicines Initiative project, that is utilizing semantic web based approaches to integrate large scale chemical and biological data in new ways. We will report on how compound and data quality should be taken into account when utilizing data from online databases and how their careful curation can provide high quality data that can be used to underpin the delivery of molecular models that can in turn identify new uses for old drugs.
Presentation Title: Grand Challenges and Big Data: Implications for Public Participation in Scientific Research
Presenter: William Michener, Professor and PI/Director of DataONE, University Libraries, University of New Mexico
Description & annotation of biomedical experimental data sets: work in progress
1. Description & annotation of biomedical experimental data sets:
work in progress
Jen Ferguson • Harvard School of Public Health
1
Objective
Biomedical data deposition is on the rise as more scientists make their
experimental data openly available. However, lack of context and metadata can
create obstacles to the understanding and reuse of these data sets.
4 Conclusions
Researchers from the Harvard School of Public Health Bioinformatics Core
Group have attempted to address this issue by assembling a team of curators to
Work on this curation project has yielded useful
annotate and contextualize this data. The curation team includes life sciences
insights for both scientists and librarians.
graduate students and postdoctoral associates, plus one science librarian (JF).
Similarities:
What information is relevant for curation?
How much metadata is enough/too much?
Differences:
Areas of expertise
Perspectives and concepts of scale
2 Methods There are valuable roles for librarians to
play in the curation process.
Some of the most worthwhile contributions
ISATools, an open source software suite, is used to
that librarians can offer may occur prior to
annotate experimental descriptions and data sets.
the actual curation process, in areas such
Curators draw upon information from PubMed
as training and software development.
papers and associated data sets (GEO files) to
create ISATab records.
3
Results
As of March 2011, HSPH has collected over 50
Curators annotate and describe both raw and annotated studies comprising 900+ assays in their
derived data files for each investigation, as well as internal data management system.
supplying metadata for the investigation as a whole.
The ultimate goal:
Once annotated, the records are validated using create records that clearly tie curated,
ISAvalidator and sent to an internal data metadata-enriched data sets to published
management system. works, and
* References make this information openly available
1. Field, D., et al. "'Omics Data Sharing." Science
326.5950 (2009): 234-36. Print.
in public repositories.
2. ISA Team. "Isa - About". January 24 2011. http://
isatab.sourceforge.net/index.html.
3. Rocca-Serra, P., et al. "Isa Software Suite: Supporting
Standards-Compliant Experimental Annotation and
Enabling Curation at the Community Level."
Bioinformatics 26.18 (2010): 2354-56. Print.
4. Rocca-Serra, Phillippe, Eamonn Maguire, and
Susanna-Assunta Sanson. "The Isa Infrastructure:
standards & Software for Annotating, Managing and
Sharing Life Science Investigation." Ed. Oxford,
University of. 2010. Print.