Digital Enterprise Research Institute                                                    www.deri.ie




                 Implementing Semantic Web applications:
                   reference architecture and challenges

                             Benjamin Heitmann, Sheila Kinsella,
                              Conor Hayes, and Stefan Decker

                      Workshop on Semantic Web Enabled Software Engineering 2009




© Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
Introduction



     Focus of Semantic Web research until now:
         benefits of Semantic Web technology
     Less research on:
         costs, effort, challenges of Semantic Web technology
     Result:
         estimating the cost/benefit trade-off of Semantic Web
         technologies is difficult
         this is an obstacle to the uptake of Semantic Web technologies
         by real-world projects
     Our contributions:
         identify main challenges and outline Software Engineering
         solutions


Benjamin.Heitmann@deri.org
Overview




      Empirical Analysis of 98 Semantic Web applications
         architectural analysis + app functionality questionnaire
      Reference Architecture for Semantic Web
      applications
      Main challenges of implementing Semantic Web
      technologies
         and their effect on an example application
      Approaches for mitigating the challenges




Empirical analysis - Architectural




      Goal: identify common functionality
      Result: a set of components that allows comparison between apps
      98 papers about apps from the Semantic Web Challenge 2003-2008
      and the Scripting for the Semantic Web Challenge 2006-2008


Reference Architecture for Semantic Web applications




            Empirical basis: architectural analysis
            provides standard decomposition criteria
            allows comparison of functionality




Empirical analysis - Functionality




      Goal: characterise capabilities of components
      Result: statistics about the range of variations for each
      component
      Results for 37 apps validated by their authors
      Survey covers 27 properties in 7 areas of functionality




Empirical analysis - Functionality variations (examples)




        Data Interface: data sources used
        (external/decentralised/evolving?)
        Persistent Storage: Semantic Web standards
        supported (e.g. RDF, OWL, SPARQL?)
        User Interface: generic/domain specific
        Data Integration: manual/automatic
        Search Service: structured/unstructured data
        Authoring: read-only/edit/create new data
        Crawling: one-time/continuous




Implementation challenges (1)




    1. Integrating noisy and heterogeneous data

            integration service is very common (72%)
            expensive: 80% require manual intervention
            76% allow updating data after initial integration
            Reasons:
                use of non-standard terms
                incorrect usage of vocabularies
                multiple URIs for the same objects and incorrect
                merging
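
The "multiple URIs for the same objects" problem can be sketched in a few lines. This is an illustration only, not the method used by the surveyed applications: it assumes equivalences are already available as pairs of URIs (for example taken from owl:sameAs statements), and the function names `consolidate` and `rewrite` as well as all example URIs are hypothetical.

```python
def consolidate(same_as_pairs):
    """Map every URI to one canonical URI per equivalence class (union-find)."""
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        # Follow parent links to the root, shortening the path on the way.
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller root so the result is deterministic.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    return {uri: find(uri) for uri in list(parent)}


def rewrite(triples, canonical):
    """Replace subjects and objects by their canonical URI; literals pass through."""
    return {(canonical.get(s, s), p, canonical.get(o, o)) for s, p, o in triples}


pairs = [
    ("http://a.example/alice", "http://b.example/alice"),
    ("http://b.example/alice", "http://c.example/al"),
]
canonical = consolidate(pairs)
triples = {
    ("http://b.example/alice", "foaf:name", "Alice"),
    ("http://c.example/al", "foaf:knows", "http://a.example/bob"),
}
merged = rewrite(triples, canonical)
```

Trusting every equivalence statement blindly is exactly how the "incorrect merging" listed above arises: one wrong pair fuses two unrelated objects, which is one reason manual intervention remains common.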




Implementation challenges (2)




    2. Missing or belated conventions and standards
       70% allow access or importing of external data
       60% can export data or are reusable as source
       only 1/3 allow creation of new data
       Reason: standards are just emerging:
           Linked Data principles: 2006, ~7 years after RDF (1999)
           RDFa for embedding RDF in HTML: finalised 2008
           GRDDL for converting (X)HTML to RDF: finalised 2007
           SPARQL update: not finalised
           RDF forms and RDF pushback: not finalised



Implementation challenges (3)



    3. Mismatch of data models and APIs between components
         components have different data models (majority of apps):
             object oriented (92%), relational database, graph based
         slow, non-native APIs between components
    4. Distribution of application logic across multiple components
         logic is included not just in code but also in queries, rules
         and formal vocabularies
             58% using inferencing, 24% using queries
    Result of 3+4: higher maintenance costs and performance loss due
    to non-native API overhead
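
The data model mismatch in point 3 can be illustrated with a small sketch that maps graph-based data onto an object-oriented model. Everything here is an assumption made for illustration: the `Post` class, the function `posts_from_triples` and the example data are hypothetical, and the predicates are written as plain prefixed strings rather than resolved through an RDF library.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """Object-oriented view of a post: a fixed set of attributes."""
    uri: str
    title: str = ""
    replies: list = field(default_factory=list)


def posts_from_triples(triples):
    """Map (subject, predicate, object) triples onto Post objects.

    Returns the posts keyed by URI plus the triples that have no
    matching attribute in the class.
    """
    posts = {}
    dropped = []
    for s, p, o in triples:
        post = posts.setdefault(s, Post(uri=s))
        if p == "dc:title":
            post.title = o
        elif p == "sioc:has_reply":
            post.replies.append(o)
        else:
            # The graph allows arbitrary predicates, the class does not.
            dropped.append((s, p, o))
    return posts, dropped


example_triples = [
    ("ex:p1", "dc:title", "Hello"),
    ("ex:p1", "sioc:has_reply", "ex:p2"),
    ("ex:p1", "ex:mood", "happy"),
]
posts, dropped = posts_from_triples(example_triples)
```

The `dropped` list makes the cost visible: each new property in the data needs either a change to the class or an extra mapping rule, which is part of the maintenance overhead named above.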
Example Application: SIOC explorer




 1 - Integration: all data is RDF+SIOC, still 2 integration steps required
 2 - Unclear best practices: every SIOC exporter requires different crawling
 3 - Mismatched data models: graph/relational/OO
     Mismatched APIs: Ruby <-> Java, SPARQL (slow)
 4 - Distributed app logic: crawler, integration, primary app logic



Mitigating the challenges (1)




   1. Delegating generic functionality to external providers
      72% implement integration, 3 components required
          delegating generic integration simplifies the architecture
          drawback: application-specific integration may still be
          necessary


Mitigating the challenges (2)


   2. Assembling applications from components:
      most apps in the survey were created on a case-by-case basis:
            multiple libraries
            multiple programming languages
            mismatch of native APIs
            distributed application logic
      provide frameworks / software factories to assemble and
      customise complete applications:
            provide generic data integration
            implement best practices and guidelines
            centralise application logic
            allow app-specific customisation
      inspiration: Ruby on Rails, CakePHP, Django (Python), Struts (Java)
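
A rough sketch of the framework idea: generic steps ship with the framework, app-specific steps are registered in one place, and the whole processing chain is visible in a single object. This is a hypothetical toy, not an existing framework; the `Application` class, its `step` decorator and the example step are invented for illustration.

```python
class Application:
    """Hypothetical mini-framework that centralises the processing steps."""

    def __init__(self):
        # Generic functionality provided by the framework itself.
        self._steps = [("normalise", self._normalise)]

    @staticmethod
    def _normalise(triples):
        # Generic step: strip stray whitespace from every term.
        return [(s.strip(), p.strip(), o.strip()) for s, p, o in triples]

    def step(self, name):
        """Decorator that registers an app-specific step after the generic ones."""
        def register(func):
            self._steps.append((name, func))
            return func
        return register

    def run(self, triples):
        # All application logic runs through this single, ordered list.
        for _name, func in self._steps:
            triples = func(triples)
        return triples


app = Application()


@app.step("keep-titles")
def keep_titles(triples):
    # App-specific customisation: this app only needs titles.
    return [t for t in triples if t[1] == "dc:title"]


result = app.run([(" ex:p1 ", "dc:title", "Hello "), ("ex:p1", "ex:mood", "happy")])
```

Registering steps by decorator follows the convention-over-configuration style of the frameworks listed as inspiration; a real framework would also need to cover storage, crawling and the user interface.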
Summary




          main challenges of implementing SemWeb tech:
              cost of integrating noisy or heterogeneous data
              (non-RDF and RDF data)
              missing or belated standards and conventions
              mismatch of data models and APIs between components
              distribution of application logic across components
          approaches to mitigate the challenges:
              delegate generic functionality to external services
              support assembly of complete applications with frameworks
          empirical foundation: analysis of 98 Semantic Web applications

