From Linked Data to Semantic Applications


In this talk we will discuss how to build (today) semantically intelligent systems, i.e. systems with the ability to process and interpret information by its meaning. We will take a multidisciplinary perspective, showing how recent advances in other computer science areas such as Information Retrieval and Natural Language Processing can enable, together with Linked Data and Semantic Web resources, the construction of the next generation of information systems. A summary of the core principles and available resources from these areas will give a concrete understanding of how to jump-start your own semantic system.

Published in: Technology

From Linked Data to Semantic Applications

  1. 1. Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
  2. 2. The Semantic Web vision and Linked Data. Multi-disciplinary perspective: Linked Data, IR, NLP. Case study: Treo (talking to the Linked Data Web). Semantic application patterns. Take-away message.
  3. 3. 2001: Software which is able to understand meaning (intelligent, flexible). Leveraging the Web for information scale.
  4. 4. What was the plan to achieve it? Build a Semantic Web stack which covers both representation and reasoning.
  5. 5. Adoption: no significant data growth. Ontologies are not straightforward to build: people are not familiar with the tools and principles, and it is difficult to keep consistency at Web scale. Scalability.
  6. 6. Problems: consistency; scalability. Logic World, Web World.
  7. 7. 2006: The Web as a huge database. Fundamental step for data creation.
  8. 8. Where is the intelligence and flexibility? We will come back to this point in a minute.
  9. 9. Data model features: graph-based data model; extensible schema; entity-centric data integration. Specific features: designed over open Web standards; based on the Web infrastructure (HTTP, URIs).
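The three data model features on slide 9 can be illustrated with a minimal Python sketch. The `example.org` namespace, the property names and the `integrate` helper are assumptions made for illustration; they do not come from the talk or from any Linked Data library.

```python
from collections import defaultdict

EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Two independent sources describing overlapping entities
# as (subject, predicate, object) triples.
source_a = [
    (EX + "Barack_Obama", EX + "spouse", EX + "Michelle_Obama"),
]
source_b = [
    (EX + "Barack_Obama", EX + "birthPlace", EX + "Honolulu"),
    (EX + "Michelle_Obama", EX + "almaMater", EX + "Princeton_University"),
]


def integrate(*sources):
    """Merge triples from several sources, grouping them by subject URI."""
    entities = defaultdict(lambda: defaultdict(set))
    for triples in sources:
        for subject, predicate, obj in triples:
            # A predicate never seen before simply becomes a new key:
            # the schema is extended without any migration step.
            entities[subject][predicate].add(obj)
    return entities


graph = integrate(source_a, source_b)
print(sorted(graph[EX + "Barack_Obama"]))
```

Because both sources use the same URI for the same entity, their statements end up under one key, which is the sense in which the integration is entity-centric.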
  10. 10. Positives: solid adoption in the Open Data context (eGovernment, eScience, etc.); existing data is relevant (you can build real applications). Negatives: data consumption is a problem; data generation beyond database mapping/triplification is also a problem; still far from the Semantic Web vision.
  11. 11. How to address the previous challenges? Linked Data: Web-scale structured data representation. Information Retrieval: search, approximation and ranking strategies; scalability. Natural Language Processing (NLP): analysing natural language; semantic approximation (distributional semantics).
  12. 12. IBM Watson approach
  13. 13. With Linked Data we are still in the DB world. From which university did the wife of Barack Obama graduate?
  14. 14. With Linked Data we are still in the DB world (but slightly worse).
  15. 15. From which university did the wife of Barack Obama graduate?
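One way to read slides 13 to 15: answering this question over Linked Data still means writing a structured query, which requires knowing the entity identifiers and property names in advance. The sketch below evaluates two triple patterns over a toy in-memory graph. The triples, the short identifiers and the `query` helper are illustrative assumptions, not DBpedia data and not a SPARQL engine.

```python
# Toy graph; identifiers are shortened, hypothetical stand-ins for URIs.
TRIPLES = [
    ("Barack_Obama", "spouse", "Michelle_Obama"),
    ("Barack_Obama", "almaMater", "Columbia_University"),
    ("Michelle_Obama", "almaMater", "Princeton_University"),
    ("Michelle_Obama", "almaMater", "Harvard_Law_School"),
]

# The question as two triple patterns; terms starting with "?" are variables.
PATTERNS = [
    ("Barack_Obama", "spouse", "?wife"),
    ("?wife", "almaMater", "?university"),
]


def match_pattern(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or reject if it is already bound differently.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(patterns, triples):
    """Evaluate the patterns one by one, joining on shared variables."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


for solution in query(PATTERNS, TRIPLES):
    print(solution["?university"])
```

The query only works because its author already knew that the graph says `spouse` rather than "wife" and `almaMater` rather than "graduate", which is the vocabulary gap the rest of the talk addresses.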
  16. 16. Demonstration: direction, path
  17. 17. Using NLP: transform natural language queries into triple patterns. Steps: entity recognition; dependency parsing; query pattern detection; query planning. Example: “From which university did the wife of Barack Obama graduate?” Dependency parse: prep(graduate-10, From-1), det(university-3, which-2), pobj(From-1, university-3), aux(graduate-10, did-4), det(wife-6, the-5), nsubj(graduate-10, wife-6), prep(wife-6, of-7), nn(Obama-9, Barack-8), pobj(of-7, Obama-9), root(ROOT-0, graduate-10). POS tags: From/IN which/WDT university/NN did/VBD the/DT wife/NN of/IN Barack/NNP Obama/NNP graduate/VB ?/.
  18. 18. Using NLP [figure: Query]
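The dependency relations listed on slide 17 are in the textual `relation(head-index, dependent-index)` form. As a rough sketch of how such output could be turned into data that later query-planning steps can work with, the Python below parses that text and rebuilds the multi-word entity name. The helper names `parse_dependencies` and `compound_name` are hypothetical, and this is not presented as how Treo implements the step.

```python
import re

# Dependency relations for the example question, as given on slide 17.
PARSE = (
    "prep(graduate-10, From-1) det(university-3, which-2) "
    "pobj(From-1, university-3) aux(graduate-10, did-4) "
    "det(wife-6, the-5) nsubj(graduate-10, wife-6) prep(wife-6, of-7) "
    "nn(Obama-9, Barack-8) pobj(of-7, Obama-9) root(ROOT-0, graduate-10)"
)

# relation(head-index, dependent-index)
DEP_PATTERN = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def parse_dependencies(text):
    """Return a list of (relation, (head, index), (dependent, index)) tuples."""
    return [
        (rel, (head, int(head_idx)), (dep, int(dep_idx)))
        for rel, head, head_idx, dep, dep_idx in DEP_PATTERN.findall(text)
    ]


def compound_name(deps, head):
    """Rebuild a multi-word name from the `nn` modifiers attached to `head`."""
    tokens = [head] + [dep for rel, h, dep in deps if rel == "nn" and h == head]
    # Sort by token position so the words come out in sentence order.
    return " ".join(word for word, _ in sorted(tokens, key=lambda t: t[1]))


deps = parse_dependencies(PARSE)
print(compound_name(deps, ("Obama", 9)))
```

Following `nsubj`, `prep` and `pobj` from `graduate` leads through `wife` and `of` to `Obama`, which is the path a rule-based step could map onto a sequence of triple patterns.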
  19. 19. Using IR: entity search. Build an entity index (instances); extract terms from URIs and index the terms using your favourite IR framework; search instances by keywords.
  20. 20. Using IR [figure: Query, Linked Data Web]
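The entity search steps on slide 19 can be sketched without any IR framework. The example below swaps in a plain in-memory inverted index where the slide suggests "your favourite IR framework" (for instance Lucene/Solr). The function names and the scoring rule (number of matched keywords) are assumptions chosen for brevity, and the three URIs are only sample input.

```python
import re
from collections import defaultdict

URIS = [
    "http://dbpedia.org/resource/Barack_Obama",
    "http://dbpedia.org/resource/Michelle_Obama",
    "http://dbpedia.org/resource/Princeton_University",
]


def uri_terms(uri):
    """Extract lowercase terms from the local name of a URI."""
    local_name = re.split(r"[/#]", uri.rstrip("/"))[-1]
    # Split on underscores, camelCase boundaries and digit runs.
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", local_name)
    return [word.lower() for word in words]


def build_index(uris):
    """Map each term to the set of URIs whose local name contains it."""
    index = defaultdict(set)
    for uri in uris:
        for term in uri_terms(uri):
            index[term].add(uri)
    return index


def search(index, keywords):
    """Return URIs ordered by number of matched keywords, then by URI."""
    scores = defaultdict(int)
    for keyword in keywords.lower().split():
        for uri in index.get(keyword, ()):
            scores[uri] += 1
    return sorted(scores, key=lambda uri: (-scores[uri], uri))


index = build_index(URIS)
print(search(index, "Barack Obama"))
```

A real index would add ranking signals beyond keyword overlap, but even this version shows why instances are comparatively easy to find: their names vary far less than the vocabulary used for properties.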
  21. 21. Using NLP and IR: use distributional semantics to semantically match query terms to predicates and classes. Distributional principle: words that co-occur tend to have related meanings. This allows the creation of a comprehensive semantic model from unstructured text, based on statistical patterns over large amounts of text, with no human annotations. Distributional semantics can be used to compute a semantic relatedness measure between two words.
  22. 22. Using NLP and IR: computation of a measure of “semantic proximity” between two terms. Allows approximate semantic matching between query terms and dataset vocabulary terms. It supports a reasoning-like behavior based on the knowledge embedded in the corpus.
  23. 23. Using NLP and IR [figure: Query, Linked Data Web]. Which properties are semantically related to ‘wife’?
  24. 24. Using NLP and IR [figure: Query, Linked Data Web]
  25. 25. Using NLP and IR [figure: Query, Linked Data Web]
  26. 26. Using NLP and IR [figure: Query, Linked Data Web]
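The question on slide 23, which properties are semantically related to ‘wife’, can be illustrated with a deliberately tiny distributional model: each word is represented by the words it co-occurs with, and relatedness is the cosine between two such vectors. The five-sentence corpus, the candidate property names and the function names are invented for illustration. This is a sketch of the distributional principle from slide 21, not the model used in Treo.

```python
import math
from collections import Counter, defaultdict

# Toy corpus standing in for "large amounts of text".
CORPUS = [
    "his wife and spouse attended the ceremony",
    "the spouse or wife of the president",
    "she is the wife and legal spouse",
    "the university awarded the degree",
    "the birth place is a city",
]


def cooccurrence_vectors(sentences):
    """Represent each word by counts of the words sharing a sentence with it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j, context in enumerate(tokens):
                if i != j:
                    vectors[word][context] += 1
    return vectors


def relatedness(vectors, a, b):
    """Cosine similarity between the co-occurrence vectors of two words."""
    va, vb = vectors.get(a, {}), vectors.get(b, {})
    dot = sum(count * vb.get(term, 0) for term, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = cooccurrence_vectors(CORPUS)
candidates = ["city", "spouse", "university"]  # hypothetical property labels
ranked = sorted(candidates, key=lambda p: relatedness(vectors, "wife", p), reverse=True)
print(ranked)
```

Frequent words such as "the" dominate raw counts, so a model built from a real corpus would normally weight or filter terms before comparing vectors.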
  27. 27. Using NLP and IR: semantic approximation in databases (as in any IR system) is semantic best-effort. It needs some level of user disambiguation, refinement and feedback. As we move in the direction of semantic systems we should expect the need for principled dialog mechanisms (as in human communication). Pull the user interaction back into the system.
  28. 28. Derived from the experience of developing Treo. Not restricted to queries over Linked Data. The following list is not intended to be complete.
  29. 29. Pattern #1: Maximize the amount of knowledge in your semantic application. Meaning interpretation depends on knowledge. Using LOD: DBpedia, Freebase and YAGO can give you a very comprehensive set of instances and their types. Wikipedia can provide you with a comprehensive distributional semantic model.
  30. 30. Pattern #2: Allow your database to grow. Dynamic schema. Entity-centric data integration.
  31. 31. Pattern #3: Once the database grows in complexity, use semantic search instead of structured queries. Instances can be used as pivot entities to reduce the search space: they are easier to search, with higher specificity and lower vocabulary variation.
  32. 32. Pattern #4: Use distributional semantics and semantic relatedness for robust semantic matching. Distributional semantics allows your application to digest (and make use of) large amounts of unstructured information. Multilingual solution. Can be complemented with WordNet.
  33. 33. Pattern #5: POS tags, syntactic parsing and rules will go a long way towards interpreting natural language queries and sentences. Use them to explore the regularities in natural language. Define a scope for natural language processing in your application (restrict by domain, syntactic complexity). These tools are easy to use and quite robust (at least for English).
  34. 34. Pattern #6: Provide a user dialog mechanism in the application. Improve the semantic model with user feedback.
  35. 35. Part of the Semantic Web vision can be addressed today with a multi-disciplinary perspective: Linked Data, IR and NLP. You can build your own IBM Watson-like application. Both data and tools are available and ready to use: the barrier is the mindset. Large opportunity for new solutions.
  36. 36. NLP: WordNet, VerbNet, Stanford parser, C&C parser/Boxer, NLTK, DBpedia Spotlight, GATE, UIMA. IR: Lucene/Solr, Terrier. Datasets: DBpedia, Freebase, YAGO. Tools that will be available soon: Treo, Treo-ESA, Graphia.
  38. 38. andrefreitas.org, andre (dot) freitas – at – deri (dot) org