Sabina Leonelli


Web Semantics in Action: Web 3.0 in e-Science

11:25 – 11:50 Sabina Leonelli: An HPSSB Approach to Gene Ontology

Published in: Technology, Education
  1. An HPSSB (History, Philosophy and Social Studies of Biology) Approach to Biomedical Ontologies. Sabina Leonelli, ESRC Centre for Genomics in Society, Department of Sociology and Philosophy, University of Exeter
  2. An HPSSB Perspective on the Epistemic Role of e-Science
     Experimental science is characterised as encompassing a variety of ways of knowing and communicating beyond what can be formalised, e.g. modelling, experimental practices, tacit familiarity with instruments and materials. This awareness needs to carry over to e-Science: it should not attempt to replace laboratory activities, but complement them (attention: pointing to new directions does not mean guiding research; experimentation has an exploratory quality).
     • History of biology: ‘big science’ infrastructure since WWII; history of model organism research in biology, and of relations between biological and medical research
     • Philosophy of biology: the role of data, theories, different types of models, instruments and materials in experimental practices; epistemic functions of classification
     • Social studies of biology: social organisation of science; forms of and conditions for cooperation and communication; power relations among actors; institutional and economic context
  3. Case Study: The Gene Ontology
     • Arguably the most successful bio-ontology to date
     • Developed for use by community databases as a standard for the annotation of gene products → history steeped in model organism research
     • Good tool for data sharing:
       – Choice of terms is based on research interests of users
       – Dynamic system: can be updated to reflect scientific developments
     • Flexibility comes from appropriate curation:
       – Manual and labour-intensive (impossible to automate)
       – Research interests vary across epistemic cultures: how to choose relevant and intelligible labels?
  4. The Classification Problem: stability of classificatory categories versus dynamism and diversity of research practices → Can classification through standard categories enable collaborative research without at the same time stifling its development and pluralism?
  5. GO as a Classification System
     Making data travel across different epistemic communities, to facilitate cross-species, integrative research: classification of both biological phenomena and data
     • Data are associated with biological phenomena via machine-readable labels
     • Users can automatically assess the relevance of data as evidence for claims about those phenomena
     • To re-use data towards new discoveries, users need to assess their reliability within their own research context: meta-data enable users to ‘situate’ information through their own expertise and tacit knowledge
     = data are de-contextualised for travel and re-contextualised for appropriation in a new context
     = access is differential: users can choose parameters for their queries depending on their interests and expertise
  6. Classification of ‘mined’ data
  7. Classification of data provenance: EVIDENCE CODES
     • Experimental evidence codes
       – IMP - Inferred from Mutant Phenotype
       – IDA - Inferred from Direct Assay
       – IGI - Inferred from Genetic Interaction
       – IPI - Inferred from Physical Interaction
       – IEP - Inferred from Expression Pattern
     • Computational analysis
       – IEA - Inferred from Electronic Annotation
       – RCA - Reviewed Computational Analysis
       – ISS - Inferred from Sequence Similarity
     • Author statement
       – TAS - Traceable Author Statement
       – NAS - Non-traceable Author Statement
     • Curatorial statement
       – IC - Inferred by Curator
       – ND - No biological Data available
  8. GO as an Expert Community
     The threat of imperialism vs. GO as ‘service to biology’: whoever chooses labels and what counts as meta-data determines the nomenclature and protocols used as standard across biology (and thus the interpretation of data as well as experimental set-ups)
       1. De-contextualisation: separating data from information about ‘local’ features of data production
       2. Abstraction: simplifying, eliminating or modifying characteristics of data to be standardised
       3. Knowledge-stabilisation: defining terms and relations to mirror (what curators see as) the consensus
       4. Situating: associating each dataset with a specific term (and thus a specific phenomenon)
     Solution: curator as mediator between the requirements of e-Science (consistency, computability, ease of use and wide intelligibility) and the diverse practices characterising experimental biology
     • GO curators develop specific expertise to tackle the threat
       – Cross-disciplinary training > awareness of diverse epistemic cultures
       – Experience ‘at the bench’ > awareness of what users need and look for
     • Community involvement (content meetings, feedback, crowdsourcing, user training workshop and online
  9. GO as a Scientific Institution
     However: the emergence of separate expertise is itself an obstacle to dialogue with users. Curators face two severe problems:
     • It is impossible to serve users without consultation, yet users do not provide feedback: lack of interest, time, expertise
     • Duplication and proliferation of labels need to be minimised, yet each curator and each ontology has a different perception of, and function in, the field
     Solution: consortia as regulatory centres. Standardisation as a tool to serve diversity in the epistemic practices and interests of users:
     • Centralising expertise
     • Centralising procedures
  10. The Gene Ontology Consortium
     • Michael Ashburner, 1998: the terms used for data classification should be the ones used to describe research interests
     • July 1998: first meeting of the consortium, with members from the Saccharomyces Genome Database, Mouse Genome Informatics, FlyBase and the Berkeley Drosophila Genome Project
     • October 1999: funding application (NIH, AstraZeneca)
     • 2000-1: rapid expansion, including the Zebrafish Information Network, the Rat Genome Database, The Arabidopsis Information Resource and Gramene
     • 2002: central office in Cambridge
     • Grants from the National Human Genome Research Institute (NHGRI), NIH, EU, AstraZeneca, Incyte Genomics, United States Department of Agriculture, Research and Education Service and the UK Medical Research Council
     • De facto standard for classification, annotation and dissemination of genomic data in model organism biology
     • In parallel: birth of the Open Biomedical Ontologies Consortium
  11. The Institutional Role of Consortia: Enforcing Collaboration
     • Encourage feedback loops among curators:
       – Rules for bio-ontology development
       – Organisation of curator meetings and communication
       – Enhancing accountability and clear division of labour
     • Encourage dialogue with users:
       – ‘Content meetings’
       – Experiment with peer review procedures (e.g. Reactome)
       – Liaise with industry to align their data sharing practices
     • Co-operate with journals (linking data disclosure with publication), e.g. Plant Physiology and TAIR: enforcing feedback on GO
     • Train users and curators
       – Workshops at conferences and elsewhere
       – Enforce institutionalisation within universities (e.g. Stanford Biomedical Informatics; graduate training in UK systems biology)
  12. The multiple identities of GO
     • GO needs to play several epistemic roles in biology:
       – Classification system
       – Expert community
       – Regulatory institution
     • It exemplifies and regulates epistemic and social relations between virtual (in silico) and material (wet) practices in biology
     • Despite institutionalisation within biology, it is still far from having resolved the tensions between curators’ vision of what technology can do for science and user needs and practices:
       – Handling dissent on terms or definitions
       – Providing sufficient meta-data to assess data provenance
       – Non-overlapping datasets and checking data quality
       – Long-term maintenance, strategies for revision and updating (how has GO actually been revised?)
  13. Thanks to ESRC for funding and several bio-ontology curators (including the GO team at EBI) for their patience and availability for interviews
     • (in preparation) On the Role of Theory in Data-Driven Research: The Case of Bio-Ontologies.
     • (2010) Documenting the Emergence of Bio-Ontologies: Or, Why Researching Bioinformatics Requires HPSSB. History and Philosophy of the Life Sciences.
     • (2010) Packaging Data for Re-Use: Databases in Model Organism Biology. In Howlett, P and Morgan, MS (eds) How Well Do ‘Facts’ Travel. CUP.
     • (2009) On the Locality of Data and Claims About Phenomena. Philosophy of Science 76, 5.
     • (2009) Centralising Labels to Distribute Data: The Regulatory Role of Genomic Consortia. In Atkinson et al (eds.) Handbook for Genetics and Society: Mapping the New Genomic Era. Routledge, pp. 469-485.
     • (2008) Bio-Ontologies as Tools for Integration in Biology. Biological Theory 3, 1: 8-11.
  14. Abstract
     This paper reflects on the analytic challenges emerging from the study of bioinformatic tools recently created to store and disseminate biological data, such as databases, repositories and bio-ontologies. I focus my discussion on the Gene Ontology, a term that defines three entities at once: a classification system facilitating the distribution and use of genomic data as evidence towards new insights; an expert community specialised in the curation of those data; and a scientific institution promoting the use of this tool among experimental biologists. These three dimensions of the Gene Ontology can be clearly distinguished analytically, but are tightly intertwined in practice. I suggest that this is true of all bioinformatic tools: they need to be understood simultaneously as epistemic, social and institutional entities, since they shape the knowledge extracted from data and at the same time regulate the organisation, development and communication of research. This viewpoint has one important implication for the methodologies used to study these tools, namely the need to integrate historical, philosophical and sociological approaches. I illustrate this claim through examples of misunderstandings that may result from a narrowly disciplinary study of the Gene Ontology, as I experienced them in my own research.
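
Illustrative sketch (not part of the original slides). The evidence-code grouping on slide 7, together with the idea on slide 5 that users choose query parameters according to their own interests and expertise, could be sketched roughly as follows. The category names mirror the slide; the abbreviations are the standard Gene Ontology evidence codes. The `Annotation` record, the `category_of` and `filter_by_category` helpers, and the sample gene products and term labels are all hypothetical, invented here for illustration only; they are not the GO Consortium's software or data.

```python
from dataclasses import dataclass

# Evidence codes grouped into the four categories listed on slide 7.
EVIDENCE_CATEGORIES = {
    "experimental": {"IMP", "IDA", "IGI", "IPI", "IEP"},
    "computational": {"IEA", "RCA", "ISS"},
    "author_statement": {"TAS", "NAS"},
    "curatorial": {"IC", "ND"},
}


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record linking a gene product to a term, with provenance."""
    gene_product: str
    term: str
    evidence_code: str


def category_of(code: str) -> str:
    """Return the slide-7 category that an evidence code belongs to."""
    for category, codes in EVIDENCE_CATEGORIES.items():
        if code in codes:
            return category
    raise ValueError(f"unknown evidence code: {code}")


def filter_by_category(annotations, accepted_categories):
    """Keep only annotations whose evidence falls in a category the user trusts."""
    return [
        a for a in annotations
        if category_of(a.evidence_code) in accepted_categories
    ]


# Made-up sample data: identifiers and term labels are placeholders.
SAMPLE = [
    Annotation("gene_a", "term_x", "IDA"),
    Annotation("gene_b", "term_x", "IEA"),
    Annotation("gene_c", "term_y", "TAS"),
]

if __name__ == "__main__":
    # A bench biologist might only accept experimentally supported annotations.
    for annotation in filter_by_category(SAMPLE, {"experimental"}):
        print(annotation.gene_product, annotation.term, annotation.evidence_code)
```

The point of the sketch is only that provenance meta-data make access differential: two users running the same query with different accepted categories retrieve different evidence, which is how each re-situates the data in their own research context.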