Mariana Damova - Ontotext


  1. 1. OWLIM
      Mariana Damova, PhD
      DM2E, Vienna, November 2012
  2. 2. Ontotext
      • Top-5 provider of core Semantic Technology
        – Established in 2000; offices in Bulgaria, UK, USA
        – Active in both research and commercial projects (FP7 funding for 10 years)
      • 360° semantic technology – unique portfolio:
        – Semantic Databases: high-performance RDF DBMS, scalable reasoning
        – Semantic Search: text-mining (IE), metadata generation, Information Retrieval (IR)
        – Web Mining: focused crawling, screen scraping, data fusion
        – Linked Data Management and Data Integration
      • Good recognition in the SemTech community
        – Ontotext pages are ranked #1 for “semantic annotation” and “semantic repository” at GYM, #3 for “linked data management” at Google
      • Several joint ventures and subsidiaries
        – Innovantage: leading online recruitment intelligence provider in UK
  3. 3. Ontotext Clients (selected)
      • British Broadcasting Corporation (BBC)
        – Ran its World Cup 2010 sites on top of OWLIM
        – Since Mar’12 BBC Sports – 2012 Olympics sections are driven by OWLIM and a Concept Extraction service developed by Ontotext
      • Press Association (UK)
        – Analysis of sports news
        – Concept extraction
        – Linked data generation
      • Top-3 USA media (not allowed to name)
      • The National Archives (UK) contracted Ontotext to implement a semantic KB and semantic search for the Government Web Archive
      • British Museum (UK)
        – Ontotext leads the development of Phase 3 of the ResearchSpace project on collaborative research in cultural heritage
        – British Museum’s public SPARQL end-point is powered by OWLIM
      • de Bibliothek (Holland)
        – Aggregation of data from 150 library databases
  4. 4. Semantic Technologies
      • Semantic technologies (RDF, LOD) allow for an unprecedented ease of integration of heterogeneous data sources
        – Already adopted in the pharmaceutical and publishing industries
        – Cultural heritage is next
      • BBC – when MySQL was replaced with OWLIM in their “Dynamic Semantic Publishing” architecture, the BBC team observed a considerable reduction in the complexity of database design, query specification and application development, and in query evaluation time.
      • Source: BBC World Cup 2010 dynamic semantic publishing. Jem Rayfield, Senior Technical Architect, BBC News and Knowledge. mic_sem.html
  5. 5. OWLIM
  6. 6. Semantic Repository for RDFS and OWL
      • OWLIM is a family of scalable semantic repositories
        – OWLIM-Lite: in-memory, fastest, scales to ~100 million statements
        – OWLIM-SE: file-based, sameAs & query optimizations, scales to 20 billion statements
        – OWLIM-Enterprise: replication cluster deployment for resilience and high-performance parallel query answering
      • OWLIM provides
        – Management, integration and analysis of heterogeneous data
        – Combined with light-weight, high-performance reasoning
        – The inference is based on logical rule-entailment
        – Full RDFS, OWL Horst, restricted OWL-Lite, OWL 2 QL and OWL 2 RL
        – Custom semantics can be defined via rules and axiomatic triples
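
Rule-entailment, as mentioned on slide 6, means deriving new statements by applying rules to existing ones until nothing new appears. The following is a minimal sketch of that general idea in Python, using two standard RDFS rules (rdfs9 and rdfs11). It is not OWLIM's engine or rule language; the `materialize` function and the `ex:` identifiers are hypothetical names chosen for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def materialize(triples: Set[Triple]) -> Set[Triple]:
    """Apply two RDFS-style rules repeatedly until a fixpoint is reached."""
    closure = set(triples)
    while True:
        new = set()
        sub_class = [(s, o) for (s, p, o) in closure if p == "rdfs:subClassOf"]
        for (s, p, o) in closure:
            for (sub, sup) in sub_class:
                if sub != o:
                    continue
                if p == "rdfs:subClassOf":
                    # rdfs11: subClassOf is transitive
                    new.add((s, "rdfs:subClassOf", sup))
                elif p == "rdf:type":
                    # rdfs9: an instance of a class is an instance of its superclasses
                    new.add((s, "rdf:type", sup))
        if new <= closure:
            return closure
        closure |= new


# Hypothetical example data
explicit = {
    ("ex:Painting", "rdfs:subClassOf", "ex:Artwork"),
    ("ex:Artwork", "rdfs:subClassOf", "ex:MuseumObject"),
    ("ex:monaLisa", "rdf:type", "ex:Painting"),
}
inferred = materialize(explicit) - explicit
```

Here `inferred` holds the statements that were never asserted explicitly, such as `ex:monaLisa rdf:type ex:MuseumObject`. Custom semantics would correspond to adding further rules of the same shape.
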
  7. 7. OWLIM in the Cultural Heritage Domain
      Selected commercial projects
      • ResearchSpace project, funded by the Andrew W. Mellon Foundation
        – Support for collaborative web-based research, information sharing and web publishing for the cultural heritage scholarly community
        – An Ontotext-led international consortium
      • The Polish Digital National Museum aggregates artifacts from over 70 contributing cultural institutions in the Digital Libraries Federation PIONIER Network, using the OWLIM repository of Ontotext
      • LODAC (Linked Open Data in Academia), Japan’s National Institute of Informatics, aggregates various information across multiple Japanese resources as LOD. The system uses 8 OWLIM nodes and aggregates 19 collections with 700 000 entities and 15M triples.
      • SemTech for Cultural Heritage project, funded by ITCC
        – Semantic publishing of Bulgarian cultural heritage to Europeana
        – Establishing a Bulgarian technical aggregator for Europeana
      Selected research projects
      • MOLTO FP7 project: a use case in cultural heritage for a semantic knowledge representation infrastructure for querying RDF and presenting query results; includes close to 9K museum objects from two collections of The Gothenburg City
      • Charisma (Cultural Heritage Advanced Research Infrastructures): an EU-funded integrating activity project with a consortium of 21 partners and metadata from 6 major European cultural institutions; has selected the OWLIM repository of Ontotext
  8. 8. OWLIM PERFORMANCE
      • OWLIM is a scalable, robust and efficient triple store
        – Serving the two most important web-sites for the London Olympic Games
          • Official Olympics website
          • BBC Olympics website
        – Performance highlights
          • OWLIM loads the 100M and the 200M datasets almost twice as fast as the next best product (17 min. for 100M)
          • Best query performance among those repositories that can handle update and multi-client query tasks (5,285 Query-mixes-per-hour, where a query mix contains 25 queries; e.g. about 100 queries/sec)
          • OWLIM v5 is 43% faster than v.4.3 on the BSBM Explore and Update scenario
          • OWLIM v5 requires between 25% and 70% less storage space
      • OWL 2 RL-type languages have proven to be the only feasible approach for reasoning with billions of statements
  9. 9. Reasoning complexity
  10. 10. owl:sameAs Optimization
      A way to handle equivalent statements through a single master node. The effect is efficient and compact handling of inferred statements, resulting in 4-6 times more statements available to query than were explicitly introduced.
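
The master-node idea on slide 10 can be illustrated with a union-find structure: every group of owl:sameAs-equivalent identifiers is represented by one master, statements are stored once against that master, and the equivalent forms are generated only when queried. This is a sketch of the general technique under that assumption, not OWLIM's actual implementation; the class, functions and identifiers below are hypothetical.

```python
class SameAsIndex:
    """Union-find over node identifiers; each equivalence class has one master."""

    def __init__(self):
        self._parent = {}

    def find(self, node):
        """Return the master node of the equivalence class containing `node`."""
        self._parent.setdefault(node, node)
        root = node
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node directly at the master
        while self._parent[node] != root:
            self._parent[node], node = root, self._parent[node]
        return root

    def add_same_as(self, a, b):
        """Record `a owl:sameAs b` by merging the two classes."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Deterministic choice of master: the lexicographically smaller id
            self._parent[max(ra, rb)] = min(ra, rb)

    def members(self):
        """Map each master to the set of identifiers it stands for."""
        groups = {}
        for node in list(self._parent):
            groups.setdefault(self.find(node), set()).add(node)
        return groups


def store(index, triples):
    """Compact storage: rewrite subjects and objects to their master nodes."""
    return {(index.find(s), p, index.find(o)) for (s, p, o) in triples}


def expand(index, stored):
    """Statements visible to a query: one per combination of equivalent ids."""
    groups = index.members()
    return {
        (s2, p, o2)
        for (s, p, o) in stored
        for s2 in groups.get(s, {s})
        for o2 in groups.get(o, {o})
    }


# Hypothetical identifiers for the same city in three datasets
index = SameAsIndex()
index.add_same_as("dbpedia:Vienna", "geonames:Vienna")
index.add_same_as("geonames:Vienna", "freebase:Vienna")

stored = store(index, {
    ("dbpedia:Vienna", "ex:locatedIn", "ex:Austria"),
    ("geonames:Vienna", "ex:label", '"Wien"'),
})
visible = expand(index, stored)
```

In this toy example two stored statements answer queries phrased with any of the three identifiers, which is the kind of multiplication the slide describes.
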
  11. 11. OWLIM Replication Cluster
      • Distribution through data replication is used to ensure:
        – Better handling of concurrent user requests
        – Failover support
      • How does it work?
        – Every user request is pushed into a transaction queue
        – Each data write request is multiplexed to all repository instances
        – Each read request is dispatched to only one of the instances
        – To ensure load balancing, each read request is sent to the instance with the smallest execution queue at that point in time
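
The dispatch rules on slide 11 can be sketched in a few lines of Python. This is a toy, single-process model written for illustration only: the `Cluster` and `Instance` classes are hypothetical, a "read" is reduced to a membership test, and nothing here reflects how OWLIM-Enterprise is actually implemented.

```python
from collections import deque


class Instance:
    """One repository replica with its own execution queue."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.triples = set()

    def drain(self):
        """Execute queued requests in order; return the results of the reads."""
        results = []
        while self.queue:
            kind, triple = self.queue.popleft()
            if kind == "write":
                self.triples.add(triple)
            else:
                results.append(triple in self.triples)
        return results


class Cluster:
    """Front end holding the transaction queue and routing requests."""

    def __init__(self, names):
        self.transactions = deque()
        self.instances = [Instance(n) for n in names]

    def submit(self, kind, triple):
        # Every user request first enters the transaction queue
        self.transactions.append((kind, triple))

    def dispatch(self):
        """Route queued requests; return (kind, [instance names]) per request."""
        routed = []
        while self.transactions:
            kind, triple = self.transactions.popleft()
            if kind == "write":
                # Writes are multiplexed to all instances
                targets = self.instances
            else:
                # Reads go to the single instance with the shortest queue
                targets = [min(self.instances, key=lambda i: len(i.queue))]
            for inst in targets:
                inst.queue.append((kind, triple))
            routed.append((kind, [inst.name for inst in targets]))
        return routed


cluster = Cluster(["a", "b"])
t = ("ex:s", "ex:p", "ex:o")
cluster.submit("write", t)
for _ in range(3):
    cluster.submit("read", t)
routing = cluster.dispatch()
```

Because each instance receives writes in the same order as the transaction queue, a read routed to any replica sees every write submitted before it, at least in this simplified model.
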
  12. 12. Geo-spatial index
      • Geo-spatial information concerns the geometry of points, shapes and distances relative to the surface of the Earth (or any spherical object).
      • When using OWLIM-SE, all angles are in decimal degrees, with the latitude ranging from -90 to +90 degrees and the longitude ranging from -180 to +180 degrees.
      • Airports have a reference point given by latitude, longitude and altitude.
      • Political boundaries can be specified by polygons where each vertex is a 2-dimensional latitude/longitude pair.
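
To make the coordinate conventions on slide 12 concrete, here is a small Python sketch that validates decimal-degree latitude/longitude pairs and computes a great-circle distance with the haversine formula. It does not use or imitate the OWLIM-SE geo-spatial API; the function names and the mean Earth radius constant are assumptions of this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def check_coordinates(lat_deg, lon_deg):
    """Reject values outside the decimal-degree ranges given on the slide."""
    if not -90.0 <= lat_deg <= 90.0:
        raise ValueError(f"latitude out of range: {lat_deg}")
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError(f"longitude out of range: {lon_deg}")
    return lat_deg, lon_deg


def check_polygon(vertices):
    """A polygon is a list of (latitude, longitude) pairs with >= 3 vertices."""
    if len(vertices) < 3:
        raise ValueError("a polygon needs at least three vertices")
    return [check_coordinates(lat, lon) for (lat, lon) in vertices]


def great_circle_km(a, b):
    """Haversine distance between two (latitude, longitude) points, in km."""
    lat1, lon1 = (math.radians(v) for v in check_coordinates(*a))
    lat2, lon2 = (math.radians(v) for v in check_coordinates(*b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


# A quarter of the equator: from (0, 0) to (0, 90)
quarter_equator_km = great_circle_km((0.0, 0.0), (0.0, 90.0))
```

Altitude, which the slide mentions for airports, is ignored here; the distance is measured along the surface of an idealised sphere.
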
  13. 13. RDF Rank
      • OWLIM-SE includes a plug-in that allows for efficient calculation of a modification of PageRank over RDF graphs
      • Computation of rank values is fast, e.g.
        – Ranking 400M LOD statements takes 310 sec (27 iterations)
      • Results are available through a system predicate
      • Example: get the 100 most important nodes in the RDF graph
          SELECT ?n {?n rank:hasRDFRank ?r} ORDER BY DESC(?r) LIMIT 100
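
For readers unfamiliar with PageRank, the sketch below runs the textbook power iteration over a tiny RDF graph, treating each triple as a link from subject to object. The slide says OWLIM-SE computes a modification of PageRank, and the details of that modification are not given, so this is plain PageRank and not the plug-in's algorithm. The `rdf_rank` function, its parameters and the `ex:` identifiers are hypothetical.

```python
def rdf_rank(triples, damping=0.85, iterations=50):
    """Plain PageRank by power iteration; each triple is an edge s -> o."""
    nodes = sorted({s for (s, _, o) in triples} | {o for (s, _, o) in triples})
    out_links = {n: [] for n in nodes}
    for (s, _, o) in triples:
        out_links[s].append(o)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            # A node without outgoing links spreads its rank over all nodes
            targets = out_links[n] or nodes
            share = damping * rank[n] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical graph: three objects point at one collection
graph = [
    ("ex:vase", "ex:partOf", "ex:collection"),
    ("ex:coin", "ex:partOf", "ex:collection"),
    ("ex:mask", "ex:partOf", "ex:collection"),
    ("ex:collection", "ex:highlight", "ex:vase"),
]
ranks = rdf_rank(graph)
# Equivalent of ORDER BY DESC(?r) LIMIT 2 in the query on the slide
top_two = sorted(ranks, key=ranks.get, reverse=True)[:2]
```

Sorting by rank and truncating mirrors what the SPARQL example does with `ORDER BY DESC(?r) LIMIT 100`.
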
  14. 14. Define: nested repositories
      “Nested repositories” represent a new data management concept for RDF data:
      • a mechanism for sharing data stored across multiple repositories, where
      • one of them contains a large body of knowledge which gets embedded in other repositories,
      • each containing more specific data, which are interlinked with the common body of knowledge