A Semantic Framework for Biomedical Image
Discovery
Ahmad C. Bukhari, Mate Levente Nagy, Michael Krauthammer, Paolo Ciccarese,
Christopher J. O. Baker
Background and Motivation
• Rapid development in biomedical research produces a
continuous stream of new knowledge.
• Efficient literature access practices are essential for transferring
information from the research community to peer investigators
and other healthcare practitioners.
• Images depict the key findings of research papers and help
readers better understand the underlying biological concepts.
Background and Motivation
Protein and DNA sequence images: provide an exact specification of the
composition of a biological entity.
Pathway diagrams: signaling flow / interacting proteins.
Background and Motivation
MRI Scans - locations of specific brain
activity.
Gel Imaging - information about DNA and
protein manipulation
Ultrasound
X-rays
Graphs
Background and Motivation
• Making biomedical image content explicit is essential for medical
decision-making tasks such as:
  ▪ Diagnosis, treatment and follow-up
  ▪ Data management and secondary use for biomedical research
  ▪ Assessment of care delivery
• However, the issues associated with knowledge management and
utility operations unique to image data have only recently gained
recognition.
Background and Motivation
In our previous work, we developed the Yale Image Finder (YIF).
Yale Image Finder
• Yale Image Finder (YIF) is one of the most widely accessed biomedical image
search engines.
• It retrieves biomedical images and associated data based on queries made
over the metadata of the images.
• YIF also searches the text inside images, using an image segmentation
method followed by optical character recognition (OCR).
• The YIF repository currently holds over two million biomedical images and
associated metadata in its index.
Challenges
• Searching for images of a certain type is error-prone because images remain opaque to
information retrieval and knowledge extraction engines.
• In the life sciences, spreadsheets, databases and XML files continue to be the
conventional formats used to store experimental data, e.g. Biota, DrugBank and the Open
Microscopy Environment (OME).
• Because the data exist only in these legacy formats, data integration is frequently impeded
and scientific knowledge discovery is significantly slowed.
  ▪ Interoperability and reusability
  ▪ Data integration
  ▪ Image provenance (orphan data)
  ▪ Semantic search is not possible
Proposed Solution
• To overcome these issues and to accelerate the adoption of YIF for next
generation biomedical applications, we have developed a publicly
accessible semantic API for biomedical images with multiple modalities, called
iCyrus.
• iCyrus is powered by a dedicated semantic architecture that exposes the YIF
content as linked image data.
Proposed Solution
• iCyrus permits integration with related information resources and consumption by
linked data-aware data services.
• To facilitate the ad hoc integration of image data with other online data
resources, we also built semantic web services for iCyrus, such that it is
compatible with the SADI framework.
• We have extended iCyrus functionality further through the incorporation of
Domeo.
• The iCyrus triplestore currently holds more than thirty-five million triples and
can be accessed and operated through syntactic or semantic query interfaces.
iCyrus Process diagram
Stage 1
• In stage 1, image datasets are acquired from the Yale
Image Finder repository to build a knowledge base for the
iCyrus API.
• Connections with YIF and PubMed are established
concurrently to cross-check the image metadata.
• Image redundancy is resolved and the
metadata are completed.
• The cleaned image data are stored in MySQL, as storage
parallel to the triplestore.
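The cleaning steps of stage 1 can be sketched as follows. This is a minimal illustration rather than the iCyrus pipeline itself: the record fields (`image_id`, `pmcid`, `caption`, `pmid`), the choice of duplicate key, and the in-memory dictionary standing in for a PubMed cross-check are all assumptions made for the example.

```python
def clean_records(yif_records, pubmed_lookup):
    """Drop duplicate image records and fill in missing PubMed IDs.

    yif_records   -- list of dicts with hypothetical keys
                     image_id, pmcid, caption, pmid
    pubmed_lookup -- dict mapping pmcid -> pmid; a stand-in for a real
                     PubMed cross-check (assumption)
    """
    seen = set()
    cleaned = []
    for record in yif_records:
        # Assumed duplicate key: same article and same image identifier.
        key = (record["pmcid"], record["image_id"])
        if key in seen:
            continue
        seen.add(key)
        merged = dict(record)
        # Complete the metadata only where the value is missing.
        if not merged.get("pmid"):
            merged["pmid"] = pubmed_lookup.get(merged["pmcid"])
        cleaned.append(merged)
    return cleaned
```

The cleaned list could then be written to a relational table; that step is omitted here because the deck does not describe the MySQL schema.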
Stage 2
• The foremost task in semantic data publication is defining
appropriate semantic vocabularies.
• Reuse is considered good practice in Semantic Web
application development, and it is generally accepted that
well-known semantic vocabularies should be reused where possible.
• To identify appropriate semantic mappings between
available ontologies and the YIF metadata, we created a
Java program that suggests possible mappings.
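A mapping suggester of the kind described above might look like the sketch below. The deck does not say how the Java program scores candidates, so the use of plain string similarity (Python's `difflib`), the threshold of 0.6, and the sample field name `pubmed_id` are assumptions; only the predicate names come from the slides.

```python
from difflib import SequenceMatcher


def suggest_mappings(fields, predicates, threshold=0.6):
    """Suggest ontology predicates for each metadata field name.

    Scores every (field, predicate) pair by case-insensitive string
    similarity and keeps the pairs at or above the threshold,
    best match first.
    """
    suggestions = {}
    for field in fields:
        scored = []
        for predicate in predicates:
            score = SequenceMatcher(None, field.lower(), predicate.lower()).ratio()
            if score >= threshold:
                scored.append((score, predicate))
        scored.sort(reverse=True)
        suggestions[field] = [predicate for _, predicate in scored]
    return suggestions


candidates = ["hasPubMedID", "hasPMCID", "imageFeature"]
print(suggest_mappings(["pubmed_id"], candidates))
```

Suggestions produced this way still need manual review, which matches the evaluation step described on the next slide.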
Stage 2 (Continued)
A cursory evaluation of the derived mappings showed that there were
three types of results:
① Mappings that fully met our requirements, which suggested
predicates such as hasPubMedID and hasPMCID in the FRBR-aligned
Bibliographic Ontology (FaBiO)
② Mappings that were insufficiently defined, like the imageFeature
property that exists in the DICOM Ontology
③ Mappings with hosted resources that did not appear trustworthy.
To manage the new vocabularies that fulfill the requirements
of iCyrus and SEBI, we introduced the BIM ontology.
BIM Ontology
◼ The Biomedical Image (BIM) ontology provides the semantic vocabularies for all modules of
SEBI.
◼ BIM vocabularies can be categorised into four types:
  ▪ Annotation vocabularies: hasImageAnnotationSet, hasSequenceType
  ▪ Provenance vocabularies: hasAnnVerfiBy, hasCreatedBy
  ▪ Feature and function vocabularies: hasSequenceMotif, hasConservedResidue
  ▪ Semantic service vocabularies: SADI service inputs and outputs
◼ It maintains provenance for semantic annotations.
BIM Model of automatic sequence annotation by a web
BIM crowd-source modeling of a biomedical image
BIM modeling for image-associated text
UNB-VPS (Semantic Vocabularies Publishing Server)
http://cbakerlab.unbsj.ca/unbvps
• UNB-VPS is a vocabulary publishing server deployed at the UNB SJ campus and is used to publish the
semantic vocabularies.
The BIM Ontology
Stage 2 (Continued)
• We developed a customizable, domain-dependent schema mapper
for the curation of initial mappings based on our MySQL-stored
data.
• With the help of the Jena Model API and the domain-dependent
schema mapper, we RDFized the image data.
• In the final step, we stored the RDFized data in a Sesame
triplestore.
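To make the RDFization step concrete, the sketch below turns one relational row into N-Triples lines using a column-to-predicate mapping. It is not the authors' Jena-based code: it uses plain string formatting instead of the Jena Model API, and the `example.org` namespaces, the column names, and the `hasCaption` predicate are hypothetical placeholders.

```python
BASE = "http://example.org/icyrus/"   # hypothetical resource namespace
VOCAB = "http://example.org/vocab#"   # hypothetical vocabulary namespace


def escape_literal(text):
    """Escape the characters that N-Triples requires in string literals."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def row_to_ntriples(row, mapping):
    """Serialize one row as N-Triples, one line per mapped column.

    row     -- dict of column name -> value; must contain image_id
    mapping -- dict of column name -> predicate local name (schema mapping)
    """
    subject = f"<{BASE}image/{row['image_id']}>"
    lines = []
    for column, predicate in mapping.items():
        value = row.get(column)
        if value is None:
            continue  # skip columns with no data for this row
        literal = escape_literal(str(value))
        lines.append(f'{subject} <{VOCAB}{predicate}> "{literal}" .')
    return lines


row = {"image_id": "42", "pmid": "123", "caption": 'A "gel" image'}
mapping = {"pmid": "hasPubMedID", "caption": "hasCaption"}
print("\n".join(row_to_ntriples(row, mapping)))
```

Keeping the mapping as data rather than code is what makes a schema mapper customizable: a different table only needs a different dictionary.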
Stage 3
• To expose the linked open data, we configured a
Sesame triplestore and deployed SNORQL, an
AJAX front end for exploring RDF SPARQL
endpoints.
• SNORQL lets users view and export data in
XML, XHTML and JSON (JavaScript Object
Notation), a format popular among web developers.
• To facilitate navigation through the iCyrus linked data
for technical users, we deployed Pubby, a
Linked Data interface for local and remote SPARQL endpoints.
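A client can also reach a SPARQL endpoint such as the one described above without a browser front end. The sketch below builds a standard SPARQL protocol GET request that asks for JSON results. The endpoint URL is a hypothetical placeholder (the slides do not give the iCyrus endpoint address), and the query is a generic one that makes no assumption about the iCyrus vocabulary.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL


def build_select_request(endpoint, query):
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
request = build_select_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing `request` to `urllib.request.urlopen` would send the query; that call is left out so the example runs without network access.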
iCyrus SNORQL endpoint
iCyrus Linked data explorer
Stage 4
• To demonstrate iCyrus’ general usability as a semantic image
API, we developed a number of web services
using the SADI framework to advertise their availability.
• Web services are an effective medium for using
software functionality in distributed environments
without deploying the entire application on the client
machine.
• A plethora of bioinformatics software is available on the
internet, but most tools have their own access
criteria and information exchange formats.
Stage 4 (Continued)
• To get full benefit out of these utilities, output should be available in an integrated and
  ▪ Available web service protocols: REST, SOAP/WSDL, etc.
  ▪ Non-semantic
  ▪ Schema-oriented, hence interoperability and scalability issues
SADI Framework - Stage 4 (Continued)
• The SADI framework is a set of conventions for creating HTTP-based semantic
web services that can be automatically discovered and orchestrated.
• SADI services consume RDF document(s) as input and produce RDF
document(s) as output.
  ▪ This addresses the interoperability problem.
• The SADI framework is designed to achieve semantic interoperability among web
services designed for different purposes.
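The RDF-in, RDF-out convention can be illustrated with a small function modeled on the getAlternateGeneName use case. This is only a sketch of the service logic, not a SADI implementation: triples are plain Python tuples, the predicate URIs are hypothetical, and the alias table holds placeholder values rather than real gene data. The one convention it tries to show is that the output statements describe the same subject URI that arrived in the input.

```python
HAS_SYMBOL = "http://example.org/vocab#hasGeneSymbol"            # hypothetical
HAS_ALT_NAME = "http://example.org/vocab#hasAlternateGeneName"   # hypothetical

# Placeholder alias table; the values are illustrative, not real gene aliases.
ALIASES = {"IDS": ["IDS-ALIAS-1", "IDS-ALIAS-2"]}


def get_alternate_gene_names(input_triples, aliases=ALIASES):
    """Return new triples that attach alternate names to each input subject.

    input_triples -- iterable of (subject, predicate, object) tuples
    """
    output = []
    for subject, predicate, obj in input_triples:
        if predicate != HAS_SYMBOL:
            continue  # ignore statements the service does not understand
        for alias in aliases.get(obj, []):
            # The output is attached to the same subject as the input.
            output.append((subject, HAS_ALT_NAME, alias))
    return output


gene = "http://example.org/gene/IDS"
for triple in get_alternate_gene_names([(gene, HAS_SYMBOL, "IDS")]):
    print(triple)
```

In a deployed service the tuples would be parsed from and serialized to RDF documents exchanged over HTTP.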
Use Case (getAlternateGeneName)
SADI Service Modeling
iCyrus with Federated Query Client
• We configured iCyrus’ SADI services registry with the
SHARE federated query engine to illustrate SHARE’s
automatic information discovery feature.
• As an example, we ran the following query: Display
images of the ‘IDS gene’ from all documents along with
their captions and extend the search with alternate
gene names.
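The example request can be approximated by hand as a single SPARQL query whose filter covers the gene symbol and its alternate names. This is not how SHARE resolves the request internally (SHARE discovers and invokes SADI services automatically); it is a manual sketch in which the `ex:hasCaption` predicate, the `example.org` prefix, and the alias value are hypothetical.

```python
def build_caption_query(gene_names):
    """Build a SPARQL query matching captions that mention any given name."""
    for name in gene_names:
        # Only allow simple symbols so nothing needs escaping in the query.
        if not name.replace("-", "").isalnum():
            raise ValueError(f"unsupported gene name: {name!r}")
    conditions = " || ".join(
        f'CONTAINS(LCASE(?caption), "{name.lower()}")' for name in gene_names
    )
    return (
        "PREFIX ex: <http://example.org/vocab#>\n"
        "SELECT ?image ?caption WHERE {\n"
        "  ?image ex:hasCaption ?caption .\n"
        f"  FILTER ({conditions})\n"
        "}"
    )


# "IDS-ALIAS-1" is a placeholder for an alternate gene name.
print(build_caption_query(["IDS", "IDS-ALIAS-1"]))
```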
iCyrus Plugin for DOMEO
• The Domeo collection of software components provides a rich set of
features that can be further extended through the development of new
software plugins.
• To provide additional information about the scientific content in
Domeo, we mashed up metadata provided by Yale Image Finder through
the iCyrus SPARQL endpoint.
• We developed a server-side software connector that, given a PubMed
Central document, is able to query the iCyrus SPARQL endpoint for all the
metadata related to the document’s images.
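One task of a connector like the one described above is turning the endpoint's answer into per-image metadata. The sketch below groups a response in the standard SPARQL JSON results format by image. It assumes a query of the form `SELECT ?image ?property ?value`; those variable names and the sample URIs are illustrative, not taken from the iCyrus connector.

```python
import json
from collections import defaultdict


def group_image_metadata(sparql_json):
    """Group SPARQL JSON result bindings into {image URI: {property: value}}.

    Expects the variables ?image, ?property and ?value in every binding
    (an assumption about the query, not about iCyrus itself).
    """
    data = json.loads(sparql_json)
    images = defaultdict(dict)
    for binding in data["results"]["bindings"]:
        image = binding["image"]["value"]
        prop = binding["property"]["value"]
        images[image][prop] = binding["value"]["value"]
    return dict(images)


sample = json.dumps({
    "head": {"vars": ["image", "property", "value"]},
    "results": {"bindings": [
        {"image": {"type": "uri", "value": "http://example.org/image/1"},
         "property": {"type": "uri", "value": "http://example.org/vocab#hasCaption"},
         "value": {"type": "literal", "value": "Western blot"}},
    ]},
})
print(group_image_metadata(sample))
```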
Domeo Environment
iCyrus images are crowd-annotated, and the annotations
are stored back in the triplestore.
SEBI as an Extension of iCyrus
SEBI stands for Semantic Enrichment of Biomedical Images.
• An image-first knowledge discovery framework that fosters efficient biomedical
literature searching and facilitates biological image discovery
and reuse.
Working
• SEBI unlocks information associated with and contained in biomedical
sequence images.
• It uses the information extracted from images to harvest new image
annotations from heterogeneous online biomedical resources, e.g. BLAST and
HMMER.
• SEBI incorporates knowledge infrastructure components and services,
including image feature extraction, Semantic Web data services and linked open data.
Semantic Sequence Image Enrichment
Related Image Finding in SEBI
• SEBI employs cosine similarity along with a fuzzy rule engine to discover and categorize the related images.
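The combination of cosine similarity and fuzzy categorization can be sketched as below. The slide does not describe SEBI's features or rules, so everything specific here is an assumption: similarity is computed over bag-of-words vectors of image text, the three category labels are invented for the example, and simple triangular membership functions stand in for the fuzzy rule engine.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def triangular(x, left, peak, right):
    """Triangular fuzzy membership: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def categorize(score):
    """Pick the category with the highest membership (labels are assumed)."""
    memberships = {
        "unrelated": triangular(score, -0.5, 0.0, 0.5),
        "somewhat related": triangular(score, 0.0, 0.5, 1.0),
        "highly related": triangular(score, 0.5, 1.0, 1.5),
    }
    return max(memberships, key=memberships.get)


score = cosine_similarity("IDS gene sequence alignment",
                          "sequence alignment of the IDS gene")
print(round(score, 2), categorize(score))
```

A fuller rule engine would combine several inputs, such as caption similarity and shared annotations, before choosing a category.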
Who will benefit from this work?
A clinician looks for a visual representation of a disease or condition.
A researcher searches for studies with certain types of analyses.
Students seek diagrams that elucidate complex processes such as DNA
replication.
A professional or educator looks for an image for a presentation.
A patient wants to better understand their disease.
Any questions?
Scan for project page
Feedback welcome: @bukharig8
