Building a Semantic Web Image Repository for Biological Research Images
Jun Zhao, Graham Klyne and David Shotton
Image Bioinformatics Research Group, Department of Zoology, University of Oxford, UK
FlyTED: The Drosophila Testis Gene Expression Image Database
  • Publish research data
  • Use existing tools to build a biological image repository
  • Browsable and searchable by research biologists
  • Accessible to wider communities, e.g. Semantic Web, Linked Data
  • Loosely coupled software architecture maximises the opportunity to replace or update the components used
Images
  “… so that people can come and see the images, and they will notice something special about the genes” (Dr Helen White-Cooper, Drosophila testis expert)
Where we started from
  • In situ gene expression images of the ~1500 genes of the testis of Drosophila melanogaster
  • ≥1 image for each wild type gene
  • Possibly ≥1 image for any of the 6 different mutant strains having defective sperm maturation
  • Images stored in the file system
  • Metadata in spreadsheets, but not expressed using controlled keywords
  • No way to search for images: researchers had to search through the hard disk whenever they needed an image
The goal
  • Publish Drosophila gene expression images to the Web
  • Make them accessible and searchable to our biological researchers as well as third parties
  • Quick, easy and cost-effective approach
EPrints 3.0 (http://eprints.org/software/)
  • A digital repository software system
  • Quick and easy to deploy
  • Built-in user interface
  • Programmatic data access through a repository-specific protocol: OAI-PMH
  • Support for domain-specific image metadata, e.g. Serpent Project (http://archive.serpentproject.com/)
  [Image: a Piglet Squid from the Serpent Project]
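To make the OAI-PMH access route concrete, here is a minimal Python sketch of what a harvester does: it builds a `ListRecords` request and reads Dublin Core fields out of the XML reply. The base URL and the sample record are hypothetical placeholders, not taken from FlyTED, and the reply is embedded in the code so that nothing is fetched over the network.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical OAI-PMH base URL (an assumption, not the FlyTED address).
BASE_URL = "http://repository.example.org/cgi/oai2"

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL of an OAI-PMH ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

# Invented sample reply, shaped like an OAI-PMH ListRecords response.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example gene, wild type</dc:title>
          <dc:type>Image</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

def parse_records(xml_text):
    """Return one dict per record with its identifier and Dublin Core titles."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        titles = [e.text for e in rec.iterfind(".//dc:title", NS)]
        records.append({"identifier": identifier, "titles": titles})
    return records

if __name__ == "__main__":
    print(list_records_url(BASE_URL))
    print(parse_records(SAMPLE_RESPONSE))
```

A real harvester would also follow the protocol's resumption tokens to page through large result sets; that is left out here to keep the sketch short.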
Our gene expression images
  • Gene name
  • Strain name
  • >1 expression location
  • Slide name
  • Creation date
  • …
Adaptation of EPrints
  • Basic structure cannot hold the domain metadata
  • Customize the underlying database:
      - Add additional metadata fields to the database schema
      - Keep both images and their metadata files as blobs
  • Customize the user interface: CSS
  • https://milos2.zoo.ox.ac.uk/svn/ImageWeb/FlyTED/Trunk/
Issues
  • Difficult to query metadata programmatically
  • Limited flexibility in the user interface
FlyTED on the Semantic Web
  • Data become programmatically accessible: http://www.fly-ted.org/sparql
  • Images can be used in more flexible UIs
  • Semantic Web faceted browsers:
      - Exhibit, from MIT SIMILE: JavaScript, runs in Web browsers (http://simile.mit.edu/exhibit/)
      - jSpace, from clarkparsia: Java Web application (http://clarkparsia.com/jspace/)
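As an illustration of what "programmatically accessible" means for a SPARQL endpoint, the following Python sketch builds a SPARQL protocol GET URL and flattens a result document in the standard SPARQL JSON results format. The endpoint address is the one given on the slide; the `ex:` vocabulary, the query and the sample result are hypothetical, since the slides do not show the FlyTED schema. The sketch does not contact the server.

```python
import json
from urllib.parse import urlencode

ENDPOINT = "http://www.fly-ted.org/sparql"  # endpoint named on the slide

# Hypothetical vocabulary: the real FlyTED property names are not in the slides.
QUERY = """
PREFIX ex: <http://example.org/flyted/vocab#>
SELECT ?image ?gene WHERE {
  ?image ex:geneName ?gene .
} LIMIT 10
"""

def sparql_get_url(endpoint, query):
    """Encode a query as a SPARQL protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})

def rows_from_results(results_json):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    doc = json.loads(results_json)
    variables = doc["head"]["vars"]
    return [
        {var: binding[var]["value"] for var in variables if var in binding}
        for binding in doc["results"]["bindings"]
    ]

# Invented sample reply in the SPARQL JSON results format.
SAMPLE_RESULTS = json.dumps({
    "head": {"vars": ["image", "gene"]},
    "results": {"bindings": [
        {"image": {"type": "uri", "value": "http://example.org/flyted/image/1"},
         "gene": {"type": "literal", "value": "example-gene"}},
    ]},
})

if __name__ == "__main__":
    print(sparql_get_url(ENDPOINT, QUERY))
    print(rows_from_results(SAMPLE_RESULTS))
```

Whether a given endpoint returns JSON or XML results depends on its configuration and on the `Accept` header sent with the request, so a real client would need to set that header or handle both formats.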
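Publish FlyTED on the Semantic Web
  [Diagram: publication pipeline]
  • FlyTED database (relational), exposed through OAI-PMH together with the free-text metadata files
  • Local harvesting and transformer script, producing N3 metadata
  • Command-line Jena model loader, populating the Jena RDF database
  • Joseki SPARQL endpoint, queried by jSpace through SPARQL over HTTP
  • HTTP-based Babel, producing the JSON data consumed by Exhibit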
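The transformer step of this pipeline turns harvested metadata into N3. The Python sketch below shows the general idea under stated assumptions: the `http://example.org/flyted/` namespace, the `ex:` property names and the sample values are all hypothetical, chosen only to mirror the metadata fields listed earlier (gene name, strain name, expression location, slide name). It is not the project's actual script.

```python
from urllib.parse import quote

# Hypothetical namespace; the real FlyTED URIs are not given in the slides.
BASE = "http://example.org/flyted/"

def n3_literal(value):
    """Quote a string as an N3 literal, escaping backslashes, quotes and newlines."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return '"' + escaped + '"'

def record_to_n3(image_id, record):
    """Serialise one metadata record as N3, one predicate-object pair per value."""
    subject = "<" + BASE + "image/" + quote(image_id, safe="") + ">"
    pairs = []
    for field, values in record.items():
        if isinstance(values, str):
            values = [values]  # treat a single value as a one-element list
        for value in values:
            pairs.append("ex:" + field + " " + n3_literal(value))
    if not pairs:
        raise ValueError("record has no metadata fields")
    body = " ;\n    ".join(pairs)
    return ("@prefix ex: <" + BASE + "vocab#> .\n\n"
            + subject + "\n    " + body + " .\n")

# Invented example record; field names are assumptions.
EXAMPLE = {
    "geneName": "example-gene",
    "strainName": "wild type",
    "expressionLocation": ["location A", "location B"],
    "slideName": "slide-0001",
}

N3_TEXT = record_to_n3("img-0001", EXAMPLE)

if __name__ == "__main__":
    print(N3_TEXT)
```

Output of this kind can then be loaded into an RDF store, which is the role the slide assigns to the command-line Jena model loader.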
FlyTED in Exhibit
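Exhibit loads its data as a JSON document with a top-level `items` array, in which each item carries a `label` and, optionally, a `type`. In the pipeline described in these slides that JSON is produced by Babel; the hand-written Python converter below is only a stand-in to show the shape of the data, and its field names and sample record are hypothetical.

```python
import json

def to_exhibit_items(records, label_field="slideName", item_type="Image"):
    """Wrap metadata records in the {"items": [...]} structure Exhibit reads."""
    items = []
    for rec in records:
        item = {"type": item_type, "label": rec[label_field]}
        for key, value in rec.items():
            if key != label_field:
                item[key] = value  # lists become multi-valued properties
        items.append(item)
    return {"items": items}

# Invented example record; field names are assumptions.
RECORDS = [
    {
        "slideName": "slide-0001",
        "geneName": "example-gene",
        "strainName": "wild type",
        "expressionLocation": ["location A", "location B"],
    },
]

EXHIBIT_JSON = json.dumps(to_exhibit_items(RECORDS), indent=2)

if __name__ == "__main__":
    print(EXHIBIT_JSON)
```

Each property of an item can then serve as a facet in the browser, which is what lets biologists filter images by gene, strain or expression location.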
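Functionality Measurement
  [Table: row and column labels were lost in extraction. Cell values in reading order: Yes, Yes, Partial, Yes, Yes, Yes, Yes, No, Yes, Partial, No, Yes, Partial, Yes, Yes, No, No, No]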
Performance
  [Figure: panels labelled Exhibit, Exhibit, jSpace, jSpace]
Summary
  • We built an image repository based on EPrints
  • We used existing tools (the OAIHarvester2 API and Joseki) to make metadata accessible through SPARQL
  • We consumed these images using existing faceted browsers (Exhibit and jSpace) in order to present them in more flexible user interfaces
  • Potentially, we can replace existing components with new tools, e.g. OAI2LOD, Joseki/SDB
To take home
  • Publish your data
  • Take a look at our “Exhibit” of FlyTED images: http://www.fly-ted.org/exhibit/exhibit_flyted.html
  • Play with and make a link to our SPARQL endpoint: http://www.fly-ted.org/sparql
  • VoID: Vocabulary of Interlinked Datasets
Since then
  • More images
  • Enrich the Fly Anatomy Ontology
  • Link with others: the FlyWeb Project
      - BDGP (http://www.fruitfly.org/): Drosophila gene expression images in embryos
      - FlyBase (http://www.flybase.org/): genomic Drosophila database
      - Bio2RDF (http://bio2rdf.org/): RDFized PubMed, Medline, UniProt, GO database, etc.
Screen shot of FlyWeb
Acknowledgement
  • David Shotton, Graham Klyne, and Alistair Miles
  • Dr Helen White-Cooper and her research group
  • JISC and BBSRC
  • EPrints Southampton team
  • HP Labs, SIMILE project (MIT), and Clark & Parsia
Thank you!

Jun Zhao, ESWC 2008
