Digital Enterprise Research Institute                                          www.deri.ie




          Querying Heterogeneous Datasets on
                 the Linked Data Web:
          Challenges, Approaches, and Trends
                 André Freitas, Edward Curry, João G. Oliveira,
                                  Seán O’Riain




© Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
IEEE Internet Computing




  A. Freitas, E. Curry, J. G. Oliveira, and S. O’Riain, “Querying Heterogeneous
  Datasets on the Linked Data Web: Challenges, Approaches, and Trends,”
  IEEE Internet Computing, vol. 16, no. 1, pp. 24-33, 2012.




   http://doi.ieeecomputersociety.org/10.1109/MIC.2011.141
   http://andrefreitas.org




  Motivation
Querying Data over the Web




       Figure: (a) a natural language query issued over two search engines;
        (b) the corresponding SPARQL representation; and (c) the semantic gap
        between the user’s information needs and the data representation.
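The gap named in (c) can be made concrete with a small sketch. The question, the DBpedia-style prefixes and property names, and the helper `unmatched_terms` are all assumptions chosen for illustration; they are not taken from the slide's figure.

```python
# Illustrative only: the question, prefixes, and property names below are
# assumptions in the style of DBpedia, not the content of the slide's figure.
QUESTION = "From which university did the wife of Barack Obama graduate?"

SPARQL = """
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?university WHERE {
  dbr:Barack_Obama dbo:spouse ?person .
  ?person dbo:almaMater ?university .
}
"""

# Content words a user might type, and the vocabulary terms the query needs.
USER_TERMS = ["wife", "graduate", "university"]
DATASET_TERMS = ["spouse", "almaMater"]


def unmatched_terms(user_terms, dataset_terms):
    """Return user terms with no exact (case-insensitive) match in the dataset vocabulary."""
    vocabulary = {term.lower() for term in dataset_terms}
    return [term for term in user_terms if term.lower() not in vocabulary]


if __name__ == "__main__":
    # Every user term is unmatched: plain string lookup cannot bridge the gap.
    print(unmatched_terms(USER_TERMS, DATASET_TERMS))
```

In this toy case none of the user's words appears literally in the dataset vocabulary, which is why exact matching alone is not enough to translate the question into the structured query.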
Expressivity-Usability Trade-Off




   Expressivity–usability trade-off for querying over structured data.
   The blue dots indicate that an ideal query mechanism for linked data must
    provide both high expressivity and high usability.




  Challenges
Challenges


           The analysis examines existing approaches from
            the perspective of the usability-expressivity
            trade-off.

           This focus guides the categorization and
            analysis of existing challenges, approaches,
            and trends.
Challenge Dimensions


           Query Expressivity
               Ability to query datasets by referencing elements
                  in data model structure, as well as to operate
                  over the data (aggregate results, express
                  conditional statements, etc.)
           Usability
               Easy-to-operate, intuitive, and task-efficient
                  query interface
           Vocabulary-level Semantic Matching
               Ability to semantically match user query terms to
                  dataset vocabulary-level terms
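Vocabulary-level semantic matching can be sketched in a few lines. This is a minimal illustration, not a method described in the slides: the dataset vocabulary, the hand-written synonym table (standing in for an external knowledge source), the function names, and the 0.6 threshold are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical dataset vocabulary and synonym table, invented for illustration.
DATASET_VOCABULARY = ["spouse", "almaMater", "birthPlace", "starring"]
SYNONYMS = {
    "wife": ["spouse"],
    "husband": ["spouse"],
    "graduate": ["almaMater"],
}


def string_similarity(a, b):
    """Case-insensitive character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_term(query_term, vocabulary=DATASET_VOCABULARY, threshold=0.6):
    """Map a user query term to a dataset vocabulary term, or return None."""
    # Step 1: try the synonym table (a stand-in for a semantic resource).
    for candidate in SYNONYMS.get(query_term.lower(), []):
        if candidate in vocabulary:
            return candidate
    # Step 2: fall back to the most similar string, if it is similar enough.
    best = max(vocabulary, key=lambda term: string_similarity(query_term, term))
    return best if string_similarity(query_term, best) >= threshold else None


if __name__ == "__main__":
    for term in ["wife", "birthplace", "xyz"]:
        print(term, "->", match_term(term))
```

String similarity only recovers spelling variants such as "birthplace"; a pair like "wife" and "spouse" needs semantic knowledge, which the synonym table fakes here.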
Challenge Dimensions


           Entity Reconciliation
               Ability to match entities expressed in the query to
                  semantically equivalent dataset entities
           Semantic Tractability
               Ability to answer queries not supported by
                  explicit dataset statements
                     – For example, “Is Natalie Portman an Actress?” can be
                       supported by the statement “Natalie Portman starred
                       Star Wars,” instead of an explicit statement “Natalie
                       Portman occupation Actress,” which might not be
                       present in the dataset
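The Natalie Portman example can be written as a toy sketch. The triples, the predicate names (`starred_in`, `type`, `occupation`), and the single inference rule are assumptions made for illustration; a real system would need a far richer way of relating statements.

```python
# Toy triple set; predicate names and the rule below are illustrative assumptions.
TRIPLES = {
    ("Natalie Portman", "starred_in", "Star Wars"),
    ("Star Wars", "type", "Film"),
}


def explicit(subject, predicate, obj):
    """True only if the exact statement is present in the dataset."""
    return (subject, predicate, obj) in TRIPLES


def supports_actress(person):
    """Answer 'is this person an actress?' from explicit or indirect evidence."""
    # Direct evidence: an explicit occupation statement.
    if explicit(person, "occupation", "Actress"):
        return True
    # Indirect evidence: the person starred in something typed as a film.
    return any(
        s == person and p == "starred_in" and explicit(o, "type", "Film")
        for (s, p, o) in TRIPLES
    )


if __name__ == "__main__":
    print(explicit("Natalie Portman", "occupation", "Actress"))
    print(supports_actress("Natalie Portman"))
```

The explicit lookup fails because no occupation statement exists, while the rule-based check succeeds by using the starring statement as supporting evidence.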




  Approaches
Approaches


           Information Retrieval approaches
                 Entity-centric search
                 Structure search
           Natural Language approaches
                 Question Answering
                 Semantic best-effort natural language interfaces
Entity-Centric Search




   e.g. Sindice
Structure Search




   e.g. Semplore
Question Answering




   e.g. FreyA
Semantic Best-Effort/NL




  e.g. Treo
Comparative Analysis (Approaches)
Addressing the Challenges


           The functionality analysis of existing
            approaches provides insights into how the
            major challenges should be addressed.
           This set of strategic functionalities defines
            the set of trends.
Linked Data Web




  Trends
Trends


           Complementary Search and Query Services
           User Interaction and Feedback Mechanisms
           Semantic Best-Effort Query Model
           Natural Language Processing Techniques
           Distributional Semantic Model
           External Knowledge Sources for Semantic
            Enrichment
           Integrated Entity Reconciliation Techniques
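As a rough illustration of the "Distributional Semantic Model" trend, the sketch below estimates term relatedness from co-occurrence counts. The four-sentence corpus, the sentence-level context window, and the function names are assumptions for illustration; a real distributional model would be built from a large text collection and typically uses weighting beyond raw counts.

```python
import math
from collections import Counter

# Tiny invented corpus, used only to make the idea runnable.
CORPUS = [
    "his wife is his spouse",
    "her husband is her spouse",
    "wife and husband married",
    "the film was shot in the city",
]


def context_vector(word, corpus=CORPUS):
    """Count the words that co-occur with `word` in the same sentence."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        if word in tokens:
            counts.update(token for token in tokens if token != word)
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relatedness(a, b):
    """Relatedness of two words as the cosine of their context vectors."""
    return cosine(context_vector(a), context_vector(b))


if __name__ == "__main__":
    print(relatedness("wife", "spouse"))
    print(relatedness("wife", "film"))
```

In this corpus "wife" and "spouse" share contexts and score above zero, while "wife" and "film" share none, which is the kind of signal that can support vocabulary-level matching without a hand-built synonym list.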
Further Reading



       A. Freitas, E. Curry, J. G. Oliveira, and S. O’Riain, A Distributional
        Structured Semantic Space for Querying RDF Graph Data, International
        Journal of Semantic Computing, vol. 5, no. 4, pp. 433-462, 201
       S. O’Riain, E. Curry, and A. Harth, XBRL and Open Data for Global Financial
        Ecosystems: A Linked Data Approach, International Journal of Accounting
        Information Systems, vol. 13, no. 2, pp. 141-162, 2012.
       A. Freitas, E. Curry, and S. O’Riain, A Distributional Approach for
        Terminology-Level Semantic Search on the Linked Data Web, in 27th ACM
        Symposium On Applied Computing (SAC 2012), 2012.
       A. Freitas, J. G. Oliveira, S. O’Riain, and E. Curry, A Multidimensional
        Semantic Space for Data Model Independent Queries over RDF Data, in
        Fifth IEEE International Conference on Semantic Computing (ICSC 2011).
       A. Freitas, T. Knap, S. O’Riain, and E. Curry, W3P: Building an OPM based
        provenance model for the Web, Future Generation Computer Systems, vol.
        27, no. 6, pp. 766-774, Jun. 2011.
