CURRENT ADVANCES TO BRIDGE THE
USABILITY-EXPRESSIVITY GAP IN
BIOMEDICAL SEMANTIC SEARCH
(AND VISUALIZING LINKED DATA)
Maulik R. Kamdar
Biomedical Informatics PhD Program
3rd April 2015
QUERYING HETEROGENEOUS
DATASETS ON THE LINKED DATA WEB
André Freitas, Edward Curry, João Gabriel
Oliveira and Seán O'Riain
Internet Computing
February 2012
EVALUATING THE USABILITY OF
NATURAL LANGUAGE QUERY
LANGUAGES AND INTERFACES TO
SEMANTIC WEB KNOWLEDGE BASES
Esther Kaufmann and Abraham Bernstein
Journal Of Web Semantics
November 2010
INTRODUCTION
¢  Opportunities
—  Builds on existing Web Infrastructure (URIs and HTTP)
and Semantic Web Standards (RDF, RDFS, vocabularies)
—  Reduce barriers to data publication, consumption, reuse
and availability, adding a fine-grained structure.
—  Expose previously siloed databases as data graphs (D2R,
Google Refine) to be interlinked and integrated with other
datasets to create a global-scale interlinked dataspace.
¢  Challenges
—  Users must be aware of which exposed datasets potentially contain
the data they want, where those datasets are located, and their data model.
—  Syntax of structured query languages like SPARQL
—  Heterogeneous, different descriptors for same entity,
loosely-connected (yet!) and distributed data sources
USABILITY-EXPRESSIVITY GAP
EXISTING APPROACHES
¢  Information Retrieval Approaches
—  Entity-centric Search (SWSE, Sindice)
—  Structure Search (Semplore) – use of inverted indexes
and user feedback strategies
¢  Natural Language Queries
—  Question Answering (PowerAqua, FREyA)
—  Difficult to expand across domains
—  Best-effort Natural Language Interfaces (Treo)
—  Habitability Problem - users need guidance and support
—  WordNet/Wikipedia semantic approximation techniques
¢  Structured SPARQL Queries
CHALLENGE DIMENSIONS
¢  Query expressivity
—  Query datasets by referencing elements in the data model, operate
over the data (aggregate results, express conditional statements).
¢  Usability
—  An easy-to-operate, intuitive, and task-efficient query interface.
¢  Vocabulary-level semantic matching
—  Semantically match query terms to dataset vocabulary-level terms.
¢  Entity reconciliation
—  Match entities expressed in the query to semantically equivalent
dataset entities.
¢  Semantic tractability mechanisms
—  Answer queries not supported by explicit dataset statements
(for example, “Is Natalie Portman an Actress?” can be supported by
the statement “Natalie Portman starred in Star Wars”).
GOOGLE KNOWLEDGE GRAPH
BIOMEDICAL MOTIVATION
[Figure labels: Literature, Virtual Screening, Query databases, Hypothesis Generation, (Linked) Data. Compound counts shown: ~300 000 compounds, ~300 interesting compounds, ~10 interesting compounds, ~5 compounds.]
“Are there Drugs with molecular weight
under 400 tested against ‘Colon Cancer’?”
“Do any Publications refer to assays using ‘Aspirin’ as
the primary Drug in treatment of ‘Prostate Cancer’?”
REVEALD: A USER-DRIVEN
DOMAIN-SPECIFIC INTERACTIVE
SEARCH PLATFORM FOR
BIOMEDICAL RESEARCH
Maulik R. Kamdar, Dimitris Zeginis, Ali Hasnain,
Stefan Decker and Helena F. Deus
Journal of Biomedical Informatics
February 2014
CHALLENGES
¢  Users must be aware of which exposed datasets potentially
contain the data they want and their data model.
¢  Large, heterogeneous biomedical data sources, which
are too dynamic for reliable data centralization
¢  The assembly of SPARQL queries to create the
aggregated information for bioinformatics analysis
still poses a high cognitive entry barrier.
¢  Human-readable, and more specifically, domain-specific
representation of query results is required.
¢  None of the previous systems were tested in biomedical
domains, except DistilBio, VIQUEN and Cuebee
¢  Trade-off between expressivity and usability.
BACKGROUND: CANCO DOMAIN-SPECIFIC MODEL
Zeginis, Dimitris, et al. "A collaborative methodology for developing a semantic model for interlinking Cancer Chemoprevention linked-data sources." Semantic Web 5.2 (2014): 127-142.
LIFE SCIENCES LINKED OPEN DATA CLOUD
Life Sciences: 53 datasets, ~3 Billion Triples
Cyganiak,R. and Jentzsch,A. (2014) The Linking Open Data cloud diagram. http://lod-cloud.net/ [Accessed: March 23, 2013]
BACKGROUND: CATALOGUING & LINKING
1248 Concepts and 1255 properties were harvested
from more than 53 Linked Biomedical Data Sources
(LBDS) (Life Sciences Linked Open Data – LSLOD
catalogue) and linked to the CanCO Query Elements.
Hasnain, Ali, et al. "Cataloguing and linking life sciences LOD cloud." 1st International Workshop on Ontology Engineering in a Data-driven World (OEDW 2012).
BACKGROUND: ENTITY RECONCILIATION
BACKGROUND: FEDERATED ARCHITECTURE
Catalogue links:
Chebi:Compound void-ext:subClassOf Granatum:Molecule
Pubchem:Compound void-ext:subClassOf Granatum:Molecule
Example triple patterns:
?molec a Granatum:Molecule
?molec a Chebi:Compound
?molec a Pubchem:Compound
[Architecture diagram. Labelled components: SPARQL Query; Query Engine; Transformation; Transformed Query (four instances); Query Logging; Saved Queries; Rule Templates; CanCO; LSLOD Catalogue; Cataloguing & Links Creation; RDFization; Experimental Datasets; Social Collaborative Workspace; Life Sciences Linked Open Data (LSLOD) sources: Chebi, DrugBank, UniProt, Others.]
Hasnain, Ali, et al. "A Roadmap for navigating the Life Sciences Linked Open Data Cloud." International Semantic Technology (JIST2014) conference. 2014.
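The slide's triple patterns suggest that a pattern over a CanCO/Granatum concept is rewritten into one pattern per source class linked to it through void-ext:subClassOf. A minimal Python sketch of that idea follows. The `CATALOGUE` dictionary and the function name `transform_type_pattern` are hypothetical and only illustrate the rewriting step; they are not the actual query engine.

```python
# Hypothetical in-memory catalogue: each CanCO/Granatum concept maps to the
# source classes linked to it via void-ext:subClassOf (names from the slide).
CATALOGUE = {
    "Granatum:Molecule": ["Chebi:Compound", "Pubchem:Compound"],
}


def transform_type_pattern(variable: str, concept: str) -> list[str]:
    """Rewrite '?var a <concept>' into one type pattern per linked source class."""
    source_classes = CATALOGUE.get(concept, [])
    if not source_classes:
        # No links recorded for this concept: keep the pattern unchanged.
        return [f"{variable} a {concept}"]
    return [f"{variable} a {source_class}" for source_class in source_classes]


for pattern in transform_type_pattern("?molec", "Granatum:Molecule"):
    print(pattern)
```

Each rewritten pattern could then be sent to the endpoint that hosts the corresponding class, which is one plausible reading of the "Transformed Query" boxes in the diagram.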
BACKGROUND: FEDERATED ARCHITECTURE
Ø Non-intuitive
Ø SPARQL, RDF,
Schema knowledge
required
Ø Domain-specific
visualization of
results is not possible
REVEALD SEARCH PLATFORM
¢  ReVeaLD (Real-Time Visual Explorer and Aggregator of
Linked Data) is a user-driven,
domain-specific search platform.
¢  Intuitively formulate advanced search queries
using a click-input-select mechanism
¢  Visualize the results in a domain-suitable format.
¢  Entity-centric and Visual Query Search System
¢  Assembly of the query is governed by a Domain-specific
Language (DSL), which in this case is the
Cancer Chemoprevention Ontology (CanCO)
REVEALD SEARCH PLATFORM
Demo: https://www.youtube.com/watch?v=6HHK4ASIkJM&hd=1
DSL VISUAL REPRESENTATION
¢  Concept Map Visualization
VISUAL QUERY BUILDER
CanCO DSL
VISUAL QUERY MODEL
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX granatum: <http://chem.deri.ie/granatum/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT * WHERE
{
  ?x0_Assay a granatum:Assay ;
    granatum:hasInput ?x1_Target ;
    granatum:identify ?x2_ChemopreventiveAgent ;
    granatum:outcome_method ?x3_outcome_method .
  ?x1_Target granatum:title ?x4_title .
  ?x2_ChemopreventiveAgent
    granatum:molecularWeight ?x10_molecularWeight ;
    granatum:SMILESnotation ?x9_SMILESnotation ;
    granatum:hasFormula ?x7_hasFormula ;
    granatum:HBD ?x5_Hydrogen_Bond_Donors ;
    granatum:HBA ?x6_Hydrogen_Bond_Acceptors ;
    granatum:TPSA ?x8_Topological_Polar_Surface_Area .
  FILTER regex(xsd:string(?x4_title), "estrogen receptor", "is")
  FILTER ( xsd:double(?x10_molecularWeight) < 300 )
} LIMIT 100
Data sources: Pubchem, ChEBI, Uniprot
↑ SPARQL Translation
All Assays, which Target Estrogen Receptors present in Human (Organism), and which
identify potential Chemopreventive Agents with Molecular Weight < 300
http://srvgal78.deri.ie:8080/explorer?type=sampleQuery&nodes=17-1-30-33-73-78-91-81-82-92-98-63&links=17.1-17.30-1.33-17.73-17.78-1.91-30.81-30.82-30.92-30.98-33.63&filters=1.91.c.estrogen%20receptor|30.98.lt.300|33.63.c.human&flexible=1
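The explorer URL above appears to serialize the visual query as node IDs, links between node pairs, and filters. The Python sketch below decodes those three parameters using only the standard library. The reading of each link as a (source, target) pair and of the operator codes `c` as "contains" and `lt` as "less than" is an assumption inferred from the accompanying SPARQL (a regex filter and a `<` comparison), not a documented format; `parse_visual_query` and `OPERATORS` are illustrative names.

```python
from urllib.parse import parse_qs, urlparse

URL = (
    "http://srvgal78.deri.ie:8080/explorer?type=sampleQuery"
    "&nodes=17-1-30-33-73-78-91-81-82-92-98-63"
    "&links=17.1-17.30-1.33-17.73-17.78-1.91-30.81-30.82-30.92-30.98-33.63"
    "&filters=1.91.c.estrogen%20receptor|30.98.lt.300|33.63.c.human&flexible=1"
)

# Assumed operator meanings, inferred from the SPARQL translation on the slide.
OPERATORS = {"c": "contains", "lt": "less than"}


def parse_visual_query(url):
    """Decode the nodes, links and filters encoded in an explorer URL."""
    params = parse_qs(urlparse(url).query)
    nodes = [int(node) for node in params["nodes"][0].split("-")]
    links = [
        tuple(int(part) for part in link.split("."))
        for link in params["links"][0].split("-")
    ]
    filters = []
    for raw in params["filters"][0].split("|"):
        # maxsplit=3 keeps any dots inside the filter value intact.
        source, target, op, value = raw.split(".", 3)
        filters.append((int(source), int(target), OPERATORS.get(op, op), value))
    return {"nodes": nodes, "links": links, "filters": filters}


query = parse_visual_query(URL)
for source, target, op, value in query["filters"]:
    print(f"link {source}.{target}: {op} {value!r}")
```

Note that the URL carries a third filter on "human" that has no counterpart in the SPARQL shown on the slide, so the mapping between the two is only partially visible here.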
REVEALD DATA BROWSER
GRAPHIC RULES
¢  Query : SELECT * WHERE {<clickedURI> ?p ?o}
¢  Results are subjected to a set of Graphic Rules, which
follow the Event-Condition-Action paradigm (ECA)
and provide visual representations using
Fresnel Display Vocabulary.
¢  Example :
—  Event: Each retrieved triple as query execution result
<http://www4.wiwiss.fu-berlin.de/drugbank/resource/targets/844>
<http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/pdbIdPage>
“http://www.pdb.org/pdb/explore/explore.do?structureId=1IVO”
—  Condition: sdf_file or pdbIdPage (Predicate) + http (Object)
—  Action: HTTP GET and invoke a specific Resource Renderer
—  Resource Renderer: GLMol Molecular Viewer
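The Event-Condition-Action example above can be sketched in Python as follows. This is only an illustration of the paradigm: `GraphicRule`, `structure_condition` and `select_renderer` are hypothetical names, the rule is not expressed in the Fresnel Display Vocabulary, and the HTTP GET that the action performs is left out so that the sketch stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable

Triple = tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class GraphicRule:
    condition: Callable[[Triple], bool]  # the "Condition" part of the rule
    renderer: str  # name of the Resource Renderer the "Action" would invoke


def structure_condition(triple: Triple) -> bool:
    """Predicate ends in sdf_file or pdbIdPage, and the object is an http URL."""
    _, predicate, obj = triple
    return predicate.endswith(("sdf_file", "pdbIdPage")) and obj.startswith("http")


RULES = [GraphicRule(structure_condition, "GLMol Molecular Viewer")]


def select_renderer(triple: Triple) -> str:
    """Event: one retrieved triple. Return the renderer of the first matching rule."""
    for rule in RULES:
        if rule.condition(triple):
            return rule.renderer
    return "text"  # assumed fallback: plain textual representation of the triple


example = (
    "http://www4.wiwiss.fu-berlin.de/drugbank/resource/targets/844",
    "http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/pdbIdPage",
    "http://www.pdb.org/pdb/explore/explore.do?structureId=1IVO",
)
print(select_renderer(example))
```

Keeping the rules in a plain list means that a triple matching no rule simply falls through to a textual rendering.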
SINGLE ENTITY SEARCH
EVALUATION
¢  Tracking Real-time User Experience Methodology (TRUE)
- widely used in the HCI community to evaluate computer games
¢  Game-based evaluation where domain users are given tasks to complete
and time and interactions are tracked using Google Analytics
¢  Subjectivistic evaluation where users were asked to fill out a survey.
¢  The main purpose of this evaluation focused on two usability concerns:
—  Does familiarity of the users with the DSL affect the time needed to
formulate the query?
—  Does a constrained (smaller) DSL lead to less time needed for
query formulation?
EVALUATION RESULTS
OTHER IMPLEMENTATIONS: LINKED TCGA
Saleem, M., Kamdar, M. R., et al. (2014). Big linked cancer data: Integrating linked TCGA and PubMed. Web Semantics: Science, Services and Agents on the World Wide Web, 27, 34-41.
http://srvgal78.deri.ie/tcga-pubmed/
OTHER IMPLEMENTATIONS: LINKEDPPI
Kazemzadeh, L., Kamdar, M. R., et al. LinkedPPI: Enabling Intuitive, Integrative Protein-Protein Interaction Discovery. Linked Science, 48.
DISCUSSION
¢  DSL Incrementation Mechanism
—  Extend the current model represented in the Visual Query
Builder by adding new concepts and properties.
—  Use or merge publicly available extensions of the DSL
¢  No reliance on the Federated Query Engine, SPARQL
Endpoint, underlying DSL and Graphic Rules.
¢  Corrupt Graphic Rules result in the textual
representation of the relevant triple.
¢  Domain-specific Languages increase usability and
enable abstraction of underlying data models
Query expressivity: Medium (SELECT, FILTER, OPTIONAL)
Usability: Medium (Entity-centric Search, VQS)
Vocabulary-level semantic matching: Low (Indexed Term URI to Concept)
Entity reconciliation: Low (owl:sameAs for same unique keys)
Semantic tractability mechanisms: None
FUTURE WORK
¢  Ontologies, indexed term labels and catalogue as elements
in a Controlled Natural Language to increase usability
¢  Results pipelined to any Problem-solving method (like
Autodock Vina, visualization, ML algorithm etc.)
¢  Faceted Search, Related Entity Recognition based on
Feature-based Similarity Measures
¢  Allowing users of the platform to provide their own DSL,
data sources, and graphic rules.
¢  SPARQL Endpoint availability and latency
¢  Ontology Reuse instead of Ontology Alignment!
Thank You!
maulikrk@stanford.edu
