Fact forge aimsa2012


This presentation describes FactForge, a public data service. It is a reason-able view of a segment of the LOD cloud and the largest body of general knowledge on which inference has been performed, supplied with a reference layer for quick access.

Published in: Technology, Education

  1. FactForge: Data Service or the Diversity of Inferred Knowledge over LOD. Mariana Damova, PhD, Kiril Simov, Zdravko Tashev, Atanas Kiryakov. AIMSA’2012, September 2012
  2. Ontotext
     • Top-5 provider of core Semantic Technology
       - Established in year 2000; offices in Bulgaria, UK, USA
       - Active both in research and commercial projects (FP7 funding for 10 years)
     • 360° semantic technology, a unique portfolio:
       - Semantic Databases: high-performance RDF DBMS, scalable reasoning
       - Semantic Search: text mining (IE), metadata generation, Information Retrieval (IR)
       - Web Mining: focused crawling, screen scraping, data fusion
       - Linked Data Management and Data Integration
     • Good recognition in the SemTech community
       - Ontotext pages are ranked #1 for “semantic annotation” and “semantic repository” at GYM, #3 for “linked data management” at Google
     • Several joint ventures and subsidiaries
       - Innovantage: leading online recruitment intelligence provider in UK
  3. Ontotext Clients (selected)
     • British Broadcasting Corporation (BBC)
       - Ran its World Cup 2010 sites on top of OWLIM
       - Since Mar’12: BBC Sports
       - 2012 Olympics sections are driven by OWLIM and a Concept Extraction service developed by Ontotext
     • Press Association (UK)
       - Analysis of sports news
       - Concept extraction
       - Linked data generation
     • Top-3 USA media (not allowed to name)
     • The National Archives (UK): contracted Ontotext to implement a semantic KB and semantic search for the Government Web Archive
     • British Museum (UK): Ontotext leads the development of Phase 3 of the ResearchSpace project on collaborative research in cultural heritage; the British Museum’s public SPARQL endpoint is powered by OWLIM
     • de Bibliothek (Holland): aggregation of data from 150 library databases
  4. Semantic Web and Linked Open Data
     • Semantic Web: a set of standards that enable computers to interpret the semantics of data on the web
     • Linked Open Data: a set of principles for publishing structured data and interlinking it so that it can be browsed the way HTML pages are browsed
       - Use URIs to identify things.
       - Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.
       - Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML.
       - Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
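The second and third Linked Data principles on slide 4 (HTTP URIs that can be dereferenced to RDF) can be sketched with the Python standard library. This is only an illustration, not part of the talk: the function names are hypothetical, and `http://dbpedia.org/resource/Germany` is assumed to be the expansion of the `dbpedia:Germany` identifier used later in the slides. The sketch builds the request without sending it; `dereference` would perform the actual lookup.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen

# Media type for RDF/XML, the format named on the slide.
RDF_XML = "application/rdf+xml"


def build_rdf_request(uri: str) -> Request:
    """Prepare an HTTP request that asks for RDF/XML via content negotiation."""
    scheme = urlparse(uri).scheme
    if scheme not in ("http", "https"):
        # Principle 2: only HTTP URIs can be looked up this way.
        raise ValueError(f"not an HTTP URI: {uri!r}")
    return Request(uri, headers={"Accept": RDF_XML})


def dereference(uri: str, timeout: float = 10.0) -> bytes:
    """Fetch the description of the thing named by `uri` (needs network access)."""
    with urlopen(build_rdf_request(uri), timeout=timeout) as response:
        return response.read()


# Hypothetical example URI; nothing is sent over the network here.
request = build_rdf_request("http://dbpedia.org/resource/Germany")
print(request.full_url, request.get_header("Accept"))
```

Whether a given server honours the `Accept` header, and which RDF serializations it offers, depends on the publisher.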
  5. Linked Open Data cloud
     [Figure: LOD cloud diagrams for 2008, 2009, and 2011]
     2011: 295 datasets, more than 30 billion triples
  6. Linked Open Data is maturing
     • LOD cloud grows by billions of triples yearly
     • Technologies and guidelines about
       - how to produce linked data fast
       - how to assure their quality
       - how to provide vertical oriented data services
     • LOD2, LATC, baseKB
  7. This talk is about reasoning and coping with the diversity of data on the web of data
  8. Outline
     • FactForge (beta)
     • Reference Layer
     • Access Modes
     • Querying
       - Airports around London
       - US city: a subject of a novel
       - US city: contactInformation
     • Challenges
     • Conclusion
  9. FactForge (beta): the largest body of heterogeneous general knowledge on which inference has been performed
     - powered by OWLIM 5.2
     - supporting SPARQL 1.1
  10. Datasets: REASON-ABLE VIEW of LOD datasets
      • Number of explicit statements: 1,686,804,539
      • Implicit statements: 1,264,199,839
      • Retrievable statements: 12,646,674,554
      • Datasets: CIA FactBook, DBpedia 3.7, Freebase, NY Times, Lexvo, Wordnet 3.0, Geonames, Lingvoj, MusicBrainz
      • Materialization is performed with respect to the semantics of OWL-Horst (optimized)
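To put the three statement counts on slide 10 in proportion, the short calculation below adds the explicit and implicit statements and compares the sum with the retrievable count. The slide does not say how the retrievable figure is derived, so the ratios are only descriptive; the function and key names are illustrative.

```python
# Figures copied from slide 10.
EXPLICIT = 1_686_804_539
IMPLICIT = 1_264_199_839
RETRIEVABLE = 12_646_674_554


def summarize(explicit: int, implicit: int, retrievable: int) -> dict:
    """Relate the three counts to each other without interpreting them."""
    stored = explicit + implicit  # explicit plus inferred statements
    return {
        "explicit_plus_implicit": stored,
        "implicit_per_explicit": implicit / explicit,
        "retrievable_per_stored": retrievable / stored,
    }


summary = summarize(EXPLICIT, IMPLICIT, RETRIEVABLE)
for name, value in summary.items():
    print(f"{name}: {value:,.3f}" if isinstance(value, float) else f"{name}: {value:,}")
```

Roughly three inferred statements are materialized for every four explicit ones, and the retrievable count is a little over four times the sum of the two.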
  11. Reference Layer
      • PROTON: lightweight upper-level ontology, ~500 classes, ~150 properties, http://www.ontotext.com/proton-ontology
      • Linking at schema level:
        (1) using rdfs:subClassOf and rdfs:subPropertyOf statements;
        (2) using OWL expressions where there is a difference in the conceptualization;
        (3) using inference rules if additional individuals are necessary in the repository to support the mapping
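Option (1) on slide 11 can be illustrated with a toy forward-chaining loop over the two RDFS entailment rules involved (rdfs9 for `rdfs:subClassOf`, rdfs7 for `rdfs:subPropertyOf`). This is a minimal sketch, not how OWLIM materializes data: triples are plain string tuples, and the two mapping statements and the `ex:` individuals are hypothetical, chosen to match the names used in the querying slide.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"


def materialize(triples):
    """Apply rdfs9 and rdfs7 repeatedly until no new triple is produced."""
    closure = set(triples)
    while True:
        subclass = {(s, o) for s, p, o in closure if p == SUBCLASS}
        subprop = {(s, o) for s, p, o in closure if p == SUBPROP}
        new = set()
        for s, p, o in closure:
            if p == RDF_TYPE:
                # rdfs9: an instance of a class is an instance of its superclass.
                new.update((s, RDF_TYPE, sup) for sub, sup in subclass if sub == o)
            # rdfs7: a statement with a property also holds for its superproperty.
            new.update((s, sup, o) for sub, sup in subprop if sub == p)
        if new <= closure:
            return closure
        closure |= new


# Hypothetical schema-level mappings from a dataset vocabulary to the reference layer.
mappings = {
    ("dbp-ont:Politician", SUBCLASS, "prot:Politician"),
    ("dbp-ont:birthPlace", SUBPROP, "prot:birthPlace"),
}
# Hypothetical instance data expressed in the dataset vocabulary.
data = {
    ("ex:person1", RDF_TYPE, "dbp-ont:Politician"),
    ("ex:person1", "dbp-ont:birthPlace", "ex:city1"),
}

inferred = materialize(mappings | data) - mappings - data
print(sorted(inferred))
```

After materialization the same individual can be reached through either vocabulary, which is what lets a single reference-layer query cover several datasets.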
  12. Access modes: RDF Search - retrieve a ranked list of URIs related to literals which contain specific keywords
  13. Access modes (contd): Exploration - traversing the data, one resource at a time
  14. Access modes (contd): Exploration - traversing the data, one resource at a time, inspecting inferred knowledge
      - locatedIn: Bulgaria, Eastern Europe
      - Geonames types/FeatureCodes (dc:type P.PPL)
      - parentFeature: Bulgaria, Europe
      - containsLocation: Cherno More Sports Complex, Varna Archeological Museum
      - isBirthPlaceOf: Aleksander Kraev, Martin Hristov…
  15. Access modes (contd): Exploration - traversing the data, one resource at a time, inspecting inferred knowledge
      - locatedIn: Europe
      - subRegionOf: Europe
      - hasContactInfo: website via Freebase
      - containsLocation
      - partOf …
  16. Access modes (contd): SPARQL endpoint
  17. Access modes (contd): RelFinder
  18. Querying
      • Using LOD concepts:
          SELECT * WHERE {
            ?Person dbp-ont:birthPlace ?BirthPlace ;
                    rdf:type dbp-ont:Politician .
            ?BirthPlace geo-ont:parentFeature dbpedia:Germany .
          }
      • Using the intermediary layer:
          SELECT * WHERE {
            ?Person prot:birthPlace ?BirthPlace ;
                    rdf:type prot:Politician .
            ?BirthPlace prot:subRegionOf dbpedia:Germany .
          }
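The queries on slide 18 omit their PREFIX declarations and the call to the endpoint. The sketch below shows one way to complete the "Using LOD concepts" query and wrap it in a SPARQL Protocol GET request, written so that each subject's triple patterns end with a period. Several things here are assumptions rather than details from the talk: the namespace IRIs are the commonly used ones for DBpedia and Geonames, the endpoint `http://example.org/sparql` is a placeholder, and the helper names are hypothetical. The request is built but not sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Assumed namespace IRIs; the slide only shows the prefixes.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbp-ont": "http://dbpedia.org/ontology/",
    "geo-ont": "http://www.geonames.org/ontology#",
}

LOD_QUERY = """SELECT * WHERE {
  ?Person dbp-ont:birthPlace ?BirthPlace ;
          rdf:type dbp-ont:Politician .
  ?BirthPlace geo-ont:parentFeature dbpedia:Germany .
}"""


def with_prefixes(body: str, prefixes: dict) -> str:
    """Prepend one PREFIX declaration per namespace to a query body."""
    header = "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items())
    return f"{header}\n{body}"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Encode the query as the `query` parameter of a SPARQL Protocol GET request."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


full_query = with_prefixes(LOD_QUERY, PREFIXES)
# Placeholder endpoint: substitute the real service URL before sending.
request = build_sparql_request("http://example.org/sparql", full_query)
# Decode the URL again to confirm the query survives encoding unchanged.
sent_query = parse_qs(urlsplit(request.full_url).query)["query"][0]
print(sent_query)
```

The reference-layer variant would use the same request code with the `prot:` prefix bound to the PROTON namespace, which the slides do not spell out.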
  19. Find airports near London
      • Standard LOD query (DBpedia): 13 results
      • PROTON query (DBpedia and Geonames): 20 results
  20. Find airports near London: results comparison, using the geospatial index of OWLIM
  21. City: a subject of a science fiction author
  22. OWLIM 5.0 and SPARQL 1.1: exemplary queries
      • GROUP BY, min: minimal and maximal population counts of European countries
      • Federated query between FactForge and LinkedLifeData: drugs that cure the disease from which Alexander Graham Bell died
      • Literal index over dates: world governors in office between 1980 and 2005
      • Literal index over digits: European countries with population above 20 million
      • Geospatial index: show the distance from London of airports located at most 50 miles away from it
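The last query on slide 22 (airports at most 50 miles from London, with their distance) is answered in FactForge by OWLIM's geospatial index. The sketch below does not reproduce that index; it swaps in a plain haversine great-circle calculation over a tiny in-memory list to show what the query computes. The coordinates are approximate illustrative values, not data taken from FactForge, and the function names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def within_radius(origin, places, max_miles):
    """Return (name, distance) pairs no farther than max_miles, nearest first."""
    hits = []
    for name, (lat, lon) in places.items():
        distance = haversine_miles(origin[0], origin[1], lat, lon)
        if distance <= max_miles:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Approximate coordinates, for illustration only.
LONDON = (51.5074, -0.1278)
AIRPORTS = {
    "Heathrow": (51.4700, -0.4543),
    "Gatwick": (51.1537, -0.1821),
    "Manchester": (53.3537, -2.2750),
}

for name, miles in within_radius(LONDON, AIRPORTS, 50):
    print(f"{name}: {miles:.1f} miles")
```

A linear scan like this is fine for a handful of points; an index is what makes the same filter practical over millions of located resources.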
  23. Challenges and usage
      • Clean data
        - Clean up input data
      • At model level
        - Contradiction detection
        - Consistency checking
      • Curation and upgrading methodology
      • FactForge has been used as data layer infrastructure in FP7 projects, like RENDER
      • FactForge has been used in tasks of linked data generation from unstructured data and metadata enrichment of structured data, providing linkage to the entire LOD cloud, for example:
        - The National Archives (UK)
        - EDAMAM, a food recommendation app
  24. Acknowledgements
      • Partial funding
      • Colleagues
        - Ivan Peikov, Ontotext
        - Rouslan Velkov, Ontotext
        - Barry Bishop, Ontotext
        - Barry Norton, Ontotext
        - Marin Dimitrov, Ontotext
        - Alex Simov, Ontotext
        - Jordan Dichev, Ontotext
        - Konstantin Penchev, Ontotext
      • Links
        - http://ff-dev.ontotext.com
        - http://www.ontotext.com/owlim
        - http://www.ontotext.com/factforge
      • Email: info@factforge.net
  25. Thank you for your attention! mariana.damova@ontotext.com