Current advances to bridge the usability-expressivity gap in biomedical semantic search (and visualizing linked data)


I presented a talk at the Protege research meeting on the 'Current advances to bridge the usability-expressivity gap in biomedical semantic search (and visualizing linked data)' https://sites.google.com/site/protegeresearchmeeting/meeting-materials/current-advances-to-bridge-the-usability-expressivity-gap-in-semantic-search

Published in: Engineering

  1. CURRENT ADVANCES TO BRIDGE THE USABILITY-EXPRESSIVITY GAP IN BIOMEDICAL SEMANTIC SEARCH (AND VISUALIZING LINKED DATA)
     Maulik R. Kamdar, Biomedical Informatics PhD Program, 3rd April 2015
  2. QUERYING HETEROGENEOUS DATASETS ON THE LINKED DATA WEB
     André Freitas, Edward Curry, João Gabriel Oliveira and Seán O'Riain. Internet Computing, February 2012.
     EVALUATING THE USABILITY OF NATURAL LANGUAGE QUERY LANGUAGES AND INTERFACES TO SEMANTIC WEB KNOWLEDGE BASES
     Esther Kaufmann and Abraham Bernstein. Journal of Web Semantics, November 2010.
  3. INTRODUCTION
     • Opportunities
       - Linked Data builds on existing Web infrastructure (URIs and HTTP) and Semantic Web standards (RDF, RDFS, vocabularies).
       - It reduces barriers to data publication, consumption, reuse and availability, and adds a fine-grained structure.
       - Previously siloed databases can be exposed as data graphs (D2R, Google Refine), then interlinked and integrated with other datasets to create a global-scale interlinked dataspace.
     • Challenges
       - Users must know which exposed datasets potentially contain the data they want, where those datasets are located, and what their data model is.
       - The syntax of structured query languages like SPARQL.
       - Data sources are heterogeneous, distributed, loosely connected (yet!) and use different descriptors for the same entity.
  4. USABILITY-EXPRESSIVITY GAP
  5. USABILITY-EXPRESSIVITY GAP
  6. USABILITY-EXPRESSIVITY GAP
  7. USABILITY-EXPRESSIVITY GAP
  8. USABILITY-EXPRESSIVITY GAP
  9. USABILITY-EXPRESSIVITY GAP
  10. EXISTING APPROACHES
      • Information Retrieval Approaches
        - Entity-centric Search (SWSE, Sindice)
        - Structure Search (Semplore): use of inverted indexes and user feedback strategies
      • Natural Language Queries
        - Question Answering (PowerAqua, FREyA)
        - Difficult to expand across domains
        - Best-effort Natural Language Interfaces (Treo)
        - Habitability Problem: users need guidance and support
        - WordNet/Wikipedia semantic approximation techniques
      • Structured SPARQL Queries
  11. CHALLENGE DIMENSIONS
      • Query expressivity
        - Query datasets by referencing elements in the data model, and operate over the data (aggregate results, express conditional statements).
      • Usability
        - An easy-to-operate, intuitive, and task-efficient query interface.
      • Vocabulary-level semantic matching
        - Semantically match query terms to dataset vocabulary-level terms.
      • Entity reconciliation
        - Match entities expressed in the query to semantically equivalent dataset entities.
      • Semantic tractability mechanisms
        - Answer queries not supported by explicit dataset statements (for example, “Is Natalie Portman an Actress?” can be supported by the statement “Natalie Portman starred in Star Wars”).
  12. GOOGLE KNOWLEDGE GRAPH
  13. GOOGLE KNOWLEDGE GRAPH
  14. BIOMEDICAL MOTIVATION
      Figure labels (compound counts): ~300 000 compounds; ~300 interesting compounds; ~10 interesting compounds; ~5 compounds.
      Other figure labels: Literature, Virtual Screening, Query databases, Hypothesis Generation, (Linked) Data.
      Example questions:
      • “Are there Drugs with molecular weight under 400 tested against ‘Colon Cancer’?”
      • “Do any Publications refer to assays using ‘Aspirin’ as the primary Drug in treatment of ‘Prostate Cancer’?”
  15. REVEALD: A USER-DRIVEN DOMAIN-SPECIFIC INTERACTIVE SEARCH PLATFORM FOR BIOMEDICAL RESEARCH
      Maulik R. Kamdar, Dimitris Zeginis, Ali Hasnain, Stefan Decker and Helena F. Deus. Journal of Biomedical Informatics, February 2014.
  16. CHALLENGES
      • Users must know which exposed datasets potentially contain the data they want, and what their data model is.
      • Biomedical data sources are large, heterogeneous, and too dynamic for reliable data centralization.
      • Assembling SPARQL queries to create the aggregated information for bioinformatics analysis still poses a high cognitive entry barrier.
      • A human-readable and, more specifically, domain-specific representation of query results is required.
      • None of the previous systems were tested in biomedical domains, except DistilBio, VIQUEN and Cuebee.
      • There is a trade-off between expressivity and usability.
  17. BACKGROUND: CANCO DOMAIN-SPECIFIC MODEL
      Zeginis, Dimitris, et al. "A collaborative methodology for developing a semantic model for interlinking Cancer Chemoprevention linked-data sources." Semantic Web 5.2 (2014): 127-142.
  18. BACKGROUND: CANCO DOMAIN-SPECIFIC MODEL
      Zeginis, Dimitris, et al. "A collaborative methodology for developing a semantic model for interlinking Cancer Chemoprevention linked-data sources." Semantic Web 5.2 (2014): 127-142.
  19. LIFE SCIENCES LINKED OPEN DATA CLOUD
      Figure labels: Life Sciences; 53 datasets; ~3 Billion Triples.
      Cyganiak, R. and Jentzsch, A. (2014) The Linking Open Data cloud diagram. http://lod-cloud.net/ [Accessed: March 23, 2013]
  20. BACKGROUND: CATALOGUING & LINKING
      1248 concepts and 1255 properties were harvested from more than 53 Linked Biomedical Data Sources (LBDS) (Life Sciences Linked Open Data, LSLOD, catalogue) and linked to the CanCO Query Elements.
      Hasnain, Ali, et al. "Cataloguing and linking life sciences LOD cloud." 1st International Workshop on Ontology Engineering in a Data-driven World (OEDW 2012).
  21. BACKGROUND: ENTITY RECONCILIATION
  22. BACKGROUND: FEDERATED ARCHITECTURE
      Class mappings shown in the diagram:
        Chebi:Compound void-ext:subClassOf Granatum:Molecule
        Pubchem:Compound void-ext:subClassOf Granatum:Molecule
      Triple patterns shown in the diagram:
        ?molec a Granatum:Molecule
        ?molec a Chebi:Compound
        ?molec a Pubchem:Compound
      Other diagram labels: SPARQL Query; Query Engine; Transformed Query (four instances); Chebi, DrugBank, UniProt, Others (Life Sciences Linked Open Data, LSLOD); LSLOD Catalogue; CanCO; Saved Queries; Rule Templates; Experimental Datasets; Query Logging; Transformation; Cataloguing & Links Creation; RDFization; Social Collaborative Workspace.
      Hasnain, Ali, et al. "A Roadmap for navigating the Life Sciences Linked Open Data Cloud." International Semantic Technology (JIST2014) conference. 2014.
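The query transformation suggested by the slide above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual query engine: it assumes the catalogue is available as a plain mapping from source classes to domain-model classes, and the names `SUBCLASS_OF` and `transform_pattern` are hypothetical.

```python
from typing import Dict, List

# Hypothetical excerpt of the catalogue: source class -> domain-model class.
# Only the two mappings visible on the slide are included.
SUBCLASS_OF: Dict[str, str] = {
    "Chebi:Compound": "Granatum:Molecule",
    "Pubchem:Compound": "Granatum:Molecule",
}


def transform_pattern(variable: str, domain_class: str,
                      catalogue: Dict[str, str]) -> List[str]:
    """Return one rdf:type triple pattern per source class mapped to domain_class."""
    sources = sorted(src for src, target in catalogue.items()
                     if target == domain_class)
    return [f"{variable} a {src}" for src in sources]


patterns = transform_pattern("?molec", "Granatum:Molecule", SUBCLASS_OF)
for pattern in patterns:
    print(pattern)
```

Each returned pattern would then be sent to the data source that defines the corresponding class; how the real engine routes and merges the sub-queries is not covered by this sketch.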
  23. BACKGROUND: FEDERATED ARCHITECTURE
      • Non-intuitive
      • SPARQL, RDF and schema knowledge required
      • Domain-specific visualization of results is not possible
  24. REVEALD SEARCH PLATFORM
      • ReVeaLD (Real-Time Visual Explorer and Aggregator of Linked Data) is a user-driven, domain-specific search platform.
      • Users intuitively formulate advanced search queries using a click-input-select mechanism.
      • Results are visualized in a domain-suitable format.
      • It is an Entity-centric and Visual Query Search System.
      • Assembly of the query is governed by a Domain-specific Language (DSL), which in this case is the Cancer Chemoprevention Ontology (CanCO).
  25. REVEALD SEARCH PLATFORM
      Demo: https://www.youtube.com/watch?v=6HHK4ASIkJM&hd=1
  26. REVEALD SEARCH PLATFORM
      Demo: https://www.youtube.com/watch?v=6HHK4ASIkJM&hd=1
  27. DSL VISUAL REPRESENTATION
      • Concept Map Visualization
  28. VISUAL QUERY BUILDER
      CanCO DSL
  29. VISUAL QUERY BUILDER
      CanCO DSL
  30. VISUAL QUERY MODEL
      Natural-language query: All Assays, which Target Estrogen Receptors present in Human (Organism), and which identify potential Chemopreventive Agents with Molecular Weight < 300
      SPARQL Translation:
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX granatum: <http://chem.deri.ie/granatum/>
        PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
        SELECT DISTINCT * WHERE {
          ?x0_Assay a granatum:Assay ;
                    granatum:hasInput ?x1_Target ;
                    granatum:identify ?x2_ChemopreventiveAgent ;
                    granatum:outcome_method ?x3_outcome_method .
          ?x1_Target granatum:title ?x4_title .
          ?x2_ChemopreventiveAgent granatum:molecularWeight ?x10_molecularWeight ;
                    granatum:SMILESnotation ?x9_SMILESnotation ;
                    granatum:hasFormula ?x7_hasFormula ;
                    granatum:HBD ?x5_Hydrogen_Bond_Donors ;
                    granatum:HBA ?x6_Hydrogen_Bond_Acceptors ;
                    granatum:TPSA ?x8_Topological_Polar_Surface_Area .
          FILTER regex(xsd:string(?x4_title), "estrogen receptor", "is")
          FILTER ( xsd:double(?x10_molecularWeight) < 300 )
        } LIMIT 100
      Data sources shown in the diagram: Pubchem, ChEBI, Uniprot
      Query URL: http://srvgal78.deri.ie:8080/explorer?type=sampleQuery&nodes=17-1-30-33-73-78-91-81-82-92-98-63&links=17.1-17.30-1.33-17.73-17.78-1.91-30.81-30.82-30.92-30.98-33.63&filters=1.91.c.estrogen%20receptor|30.98.lt.300|33.63.c.human&flexible=1
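As a rough illustration of how the `filters` parameter of the query URL above might relate to the SPARQL FILTER clauses, here is a small sketch. The interpretation of each entry as node.property.operator.value, and of `c` as "contains" and `lt` as "less than", is an assumption inferred from the slide, not documented behaviour; `parse_filters` and `to_sparql` are hypothetical helper names.

```python
from typing import List, NamedTuple
from urllib.parse import unquote


class Filter(NamedTuple):
    node: int    # assumed: id of the concept node in the visual query
    prop: int    # assumed: id of the property node being constrained
    op: str      # assumed operator code: "c" (contains) or "lt" (less than)
    value: str   # URL-decoded comparison value


def parse_filters(raw: str) -> List[Filter]:
    """Split 'n.p.op.value|n.p.op.value|...' into Filter records."""
    filters = []
    for part in raw.split("|"):
        # maxsplit=3 keeps any dots inside the value (e.g. "3.5") intact.
        node, prop, op, value = part.split(".", 3)
        filters.append(Filter(int(node), int(prop), op, unquote(value)))
    return filters


def to_sparql(f: Filter, var: str) -> str:
    """Render one filter as a SPARQL FILTER clause on variable var.

    Simplification: values are inserted without escaping quotes.
    """
    if f.op == "c":
        return f'FILTER regex(xsd:string({var}), "{f.value}", "is")'
    if f.op == "lt":
        return f"FILTER ( xsd:double({var}) < {f.value} )"
    raise ValueError(f"unsupported operator code: {f.op}")


filters = parse_filters("1.91.c.estrogen%20receptor|30.98.lt.300|33.63.c.human")
print(to_sparql(filters[0], "?x4_title"))
print(to_sparql(filters[1], "?x10_molecularWeight"))
```

Under these assumptions, the first two filters reproduce the two FILTER clauses of the SPARQL translation on the slide; the mapping from node ids to variable names is left out and supplied by hand here.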
  31. REVEALD DATA BROWSER
  32. REVEALD DATA BROWSER
  33. REVEALD DATA BROWSER
  34. REVEALD DATA BROWSER
  35. GRAPHIC RULES
      • Query: SELECT * WHERE {<clickedURI> ?p ?o}
      • Results are subjected to a set of Graphic Rules, which follow the Event-Condition-Action (ECA) paradigm and provide visual representations using the Fresnel Display Vocabulary.
      • Example:
        - Event: each triple retrieved as a query execution result, for example
          <http://www4.wiwiss.fu-berlin.de/drugbank/resource/targets/844> <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/pdbIdPage> “http://www.pdb.org/pdb/explore/explore.do?structureId=1IVO”
        - Condition: the predicate is sdf_file or pdbIdPage, and the object is an http URL
        - Action: HTTP GET and invoke a specific Resource Renderer
        - Resource Renderer: GLMol Molecular Viewer
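The Event-Condition-Action flow on the slide above can be sketched as follows. This is a simplified illustration under stated assumptions: the rule structure, the names `Triple`, `GraphicRule` and `choose_renderer`, and the `"text"` fallback label are hypothetical, and the action only returns the renderer name rather than performing the HTTP GET.

```python
from typing import Callable, List, NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


class GraphicRule(NamedTuple):
    condition: Callable[[Triple], bool]  # the "Condition" part of the rule
    renderer: str                        # renderer the "Action" would invoke


def is_structure_link(t: Triple) -> bool:
    """Condition from the slide: predicate sdf_file or pdbIdPage, object an http URL."""
    local_name = t.predicate.rstrip("/").rsplit("/", 1)[-1]
    return local_name in {"sdf_file", "pdbIdPage"} and t.obj.startswith("http")


RULES: List[GraphicRule] = [GraphicRule(is_structure_link, "GLMol")]


def choose_renderer(t: Triple, rules: List[GraphicRule]) -> str:
    """Event: one retrieved triple. Return the first matching renderer name."""
    for rule in rules:
        try:
            if rule.condition(t):
                return rule.renderer
        except Exception:
            # A broken rule is skipped so the triple can still be shown as text.
            continue
    return "text"


example = Triple(
    "http://www4.wiwiss.fu-berlin.de/drugbank/resource/targets/844",
    "http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/pdbIdPage",
    "http://www.pdb.org/pdb/explore/explore.do?structureId=1IVO",
)
print(choose_renderer(example, RULES))
```

Falling back to a plain textual rendering when no rule matches, or when a rule raises, mirrors the behaviour the talk describes for corrupt Graphic Rules.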
  36. SINGLE ENTITY SEARCH
  37. EVALUATION
      • Tracking Real-time User Experience (TRUE) methodology, widely used in the HCI community to evaluate computer games.
      • Game-based evaluation in which domain users are given tasks to complete, and time and interactions are tracked using Google Analytics.
      • Subjective evaluation in which users were asked to fill out a survey.
      • The evaluation focused on two usability questions:
        - Does the users' familiarity with the DSL affect the time needed to formulate the query?
        - Does a constrained (smaller) DSL lead to less time needed for query formulation?
  38. EVALUATION RESULTS
  39. EVALUATION RESULTS
  40. EVALUATION RESULTS
  41. OTHER IMPLEMENTATIONS: LINKED TCGA
      Saleem, M., Kamdar, M. R., et al. (2014). Big linked cancer data: Integrating linked TCGA and PubMed. Web Semantics: Science, Services and Agents on the World Wide Web, 27, 34-41. http://srvgal78.deri.ie/tcga-pubmed/
  42. OTHER IMPLEMENTATIONS: LINKED TCGA
      Saleem, M., Kamdar, M. R., et al. (2014). Big linked cancer data: Integrating linked TCGA and PubMed. Web Semantics: Science, Services and Agents on the World Wide Web, 27, 34-41. http://srvgal78.deri.ie/tcga-pubmed/
  43. OTHER IMPLEMENTATIONS: LINKED TCGA
      Saleem, M., Kamdar, M. R., et al. (2014). Big linked cancer data: Integrating linked TCGA and PubMed. Web Semantics: Science, Services and Agents on the World Wide Web, 27, 34-41. http://srvgal78.deri.ie/tcga-pubmed/
  44. OTHER IMPLEMENTATIONS: LINKED TCGA
      Saleem, M., Kamdar, M. R., et al. (2014). Big linked cancer data: Integrating linked TCGA and PubMed. Web Semantics: Science, Services and Agents on the World Wide Web, 27, 34-41. http://srvgal78.deri.ie/tcga-pubmed/
  45. OTHER IMPLEMENTATIONS: LINKEDPPI
      Kazemzadeh, L., Kamdar, M. R., et al. LinkedPPI: Enabling Intuitive, Integrative Protein-Protein Interaction Discovery. Linked Science, 48.
  46. OTHER IMPLEMENTATIONS: LINKEDPPI
      Kazemzadeh, L., Kamdar, M. R., et al. LinkedPPI: Enabling Intuitive, Integrative Protein-Protein Interaction Discovery. Linked Science, 48.
  47. DISCUSSION
      • DSL Incrementation Mechanism
        - Extend the current model represented in the Visual Query Builder by adding new concepts and properties.
        - Use or merge publicly available extensions of the DSL.
      • No reliance on the Federated Query Engine, SPARQL Endpoint, underlying DSL and Graphic Rules.
      • Corrupt Graphic Rules result in a textual representation of the relevant triple.
      • Domain-specific Languages increase usability and enable abstraction of underlying data models.
      Challenge dimensions (table):
        Query expressivity: Medium (SELECT, FILTER, OPTIONAL)
        Usability: Medium (Entity-centric Search, VQS)
        Vocabulary-level semantic matching: Low (Indexed Term URI to Concept)
        Entity reconciliation: Low (owl:sameAs for same unique keys)
        Semantic tractability mechanisms: None
  48. FUTURE WORK
      • Ontologies, indexed term labels and catalogue as elements in a Controlled Natural Language to increase usability
      • Results pipelined to any problem-solving method (like Autodock Vina, visualization, ML algorithm etc.)
      • Faceted Search, Related Entity Recognition based on Feature-based Similarity Measures
      • Allowing users of the platform to provide their own DSL, data sources, and graphic rules
      • SPARQL Endpoint availability and latency
      • Ontology Reuse instead of Ontology Alignment!
  49. Thank You! maulikrk@stanford.edu
