OWLIM

Mariana Damova, PhD



     DM2E
Vienna, November 2012
Ontotext
   – Top-5 provider of core Semantic Technology
   – Established in 2000; offices in Bulgaria, the UK and the USA
   – Active in both research and commercial projects (FP7 funding for 10 years)

• 360° semantic technology: a unique portfolio
   – Semantic Databases: high-performance RDF DBMS, scalable reasoning
   – Semantic Search: text mining (IE), metadata generation, Information Retrieval (IR)
   – Web Mining: focused crawling, screen scraping, data fusion
   – Linked Data Management and Data Integration

• Good recognition in the SemTech community
   – Ontotext pages are ranked #1 for “semantic annotation” and “semantic repository” at
     GYM, and #3 for “linked data management” at Google

• Several joint ventures and subsidiaries
   – Innovantage: leading online recruitment intelligence provider in the UK
Ontotext Clients (selected)

•  British Broadcasting Corporation (BBC)
      – Ran its World Cup 2010 site on top of OWLIM
      – Since March 2012, the BBC Sport and 2012 Olympics sections have been driven
        by OWLIM and a Concept Extraction service developed by Ontotext
•  Press Association (UK)
      – Analysis of sports news
      – Concept extraction
      – Linked data generation
•  A top-3 US media company (name confidential)
•  The National Archives (UK): contracted Ontotext to implement a semantic
   knowledge base and semantic search for the Government Web Archive
•  British Museum (UK): Ontotext leads the development of Phase 3 of the
   ResearchSpace project on collaborative research in cultural heritage;
   the British Museum’s public SPARQL endpoint is powered by OWLIM
•  de Bibliotheek (the Netherlands): aggregation of data from 150 library databases
Semantic Technologies


•   Semantic technologies (RDF, LOD) allow for an unprecedented ease of
    integration of heterogeneous data sources
      – Already adopted in the pharmaceutical and publishing industries
      – Cultural heritage is next

•   BBC: when MySQL was replaced with OWLIM in the “Dynamic Semantic
    Publishing” architecture, the BBC team observed a considerable reduction in
    the complexity of database design, query specification and application
    development, as well as in query evaluation time. Source: Jem Rayfield,
    Senior Technical Architect, BBC News and Knowledge, “BBC World Cup 2010
    dynamic semantic publishing”,
    http://www.bbc.co.uk/blogs/bbcinternet/2010/07/bbc_world_cup_2010_dynamic_sem.html
OWLIM
Semantic Repository for RDFS and OWL

• OWLIM is a family of scalable semantic repositories
   • OWLIM-Lite: in-memory, fastest, scales to ~100 million statements
   • OWLIM-SE: file-based, sameAs & query optimizations, scales to 20 billion
     statements
   • OWLIM-Enterprise: replication cluster deployment for resilience and high
     performance parallel query-answering

• OWLIM provides
    – Management, integration and analysis of heterogeneous data
    – Combined with lightweight, high-performance reasoning
    – Inference based on logical rule entailment
    – Full RDFS, OWL Horst, restricted OWL Lite, OWL 2 QL and OWL 2 RL
    – Custom semantics, defined via rules and axiomatic triples
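Rule entailment means deriving, from the explicit data, every statement the rules license and making those statements available to queries. The sketch below illustrates the idea with a naive fixpoint loop over two textbook RDFS rules; the rule choice, the `ex:` identifiers and the function names are illustrative assumptions, not OWLIM's rule language or API.

```python
# Minimal forward-chaining sketch of rule-based entailment. Assumption: two
# textbook RDFS rules stand in for OWLIM's much larger rule sets; this is not
# OWLIM's rule syntax or engine.
Triple = tuple[str, str, str]

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def subclass_transitivity(kb: set[Triple]) -> set[Triple]:
    # (A subClassOf B), (B subClassOf C)  =>  (A subClassOf C)
    sub = [(s, o) for s, p, o in kb if p == SUBCLASS]
    return {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}


def type_propagation(kb: set[Triple]) -> set[Triple]:
    # (x type A), (A subClassOf B)  =>  (x type B)
    types = [(s, o) for s, p, o in kb if p == RDF_TYPE]
    sub = [(s, o) for s, p, o in kb if p == SUBCLASS]
    return {(x, RDF_TYPE, b) for x, a in types for a2, b in sub if a == a2}


RULES = (subclass_transitivity, type_propagation)


def materialise(explicit: set[Triple]) -> set[Triple]:
    """Apply every rule repeatedly until no new triple appears (a fixpoint)."""
    closure = set(explicit)
    while True:
        new: set[Triple] = set()
        for rule in RULES:
            new |= rule(closure)
        new -= closure
        if not new:
            return closure
        closure |= new


explicit = {
    ("ex:Painting", SUBCLASS, "ex:Artwork"),
    ("ex:Artwork", SUBCLASS, "ex:ManMadeObject"),
    ("ex:monaLisa", RDF_TYPE, "ex:Painting"),
}
closure = materialise(explicit)
inferred = closure - explicit
for triple in sorted(inferred):
    print(triple)
```

Three explicit triples yield three inferred ones here. Re-running every rule over the whole closure each round is the simplest correct strategy; a real engine would avoid re-deriving known statements.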
OWLIM in the Cultural Heritage Domain

Selected commercial projects
•  ResearchSpace project, funded by the Andrew W. Mellon Foundation: support for
   collaborative web-based research, information sharing and web publishing for the
   cultural heritage scholarly community. An Ontotext-led international consortium.
•  The Polish Digital National Museum aggregates artifacts from over 70 contributing
   cultural institutions in the Digital Libraries Federation PIONIER Network using
   Ontotext's OWLIM repository
•  LODAC (Linked Open Data in Academia): Japan's National Institute of Informatics
   aggregates information from multiple Japanese resources as LOD. The system uses
   8 OWLIM nodes and aggregates 19 collections with 700,000 entities and 15M triples.
•  SemTech for Cultural Heritage project, funded by ITCC: semantic publishing of
   Bulgarian cultural heritage to Europeana and establishment of a Bulgarian
   technical aggregator for Europeana
Selected research projects
•  MOLTO FP7 project: a cultural heritage use case for a semantic knowledge
   representation infrastructure for querying RDF and presenting query results;
   includes close to 9K museum objects from two collections of the City of Gothenburg
•  Charisma (Cultural Heritage Advanced Research Infrastructures), an EU-funded
   integrating activity project run by a consortium of 21 partners, has selected
   Ontotext's OWLIM repository for metadata from 6 major European cultural institutions
OWLIM PERFORMANCE



•   OWLIM is a scalable, robust and efficient triple store
     – Serving the two most important websites for the London 2012 Olympic Games
         • Official Olympics website
         • BBC Olympics website
     – Performance highlights
         • OWLIM loads the 100M and the 200M datasets almost twice as fast as the next best product
           (17 min for 100M)
         • Best query performance among the repositories that can handle update and multi-client
           query tasks (5,285 query mixes per hour, where a query mix contains 25 queries, i.e. about
           37 queries/sec)
         • OWLIM v5 is 43% faster than v4.3 on the BSBM Explore and Update scenario
         • OWLIM v5 requires between 25% and 70% less storage space

•   OWL 2 RL-type languages have proven to be the only feasible approach for
    reasoning with billions of statements
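The throughput figures above can be cross-checked with simple arithmetic. The sketch assumes, as the slide states, that every query mix contains exactly 25 queries, and treats the 17 minutes as the wall-clock time for the whole 100M-statement load.

```python
# Cross-check of the throughput figures quoted on the slide. Assumptions: every
# query mix has exactly 25 queries, and 17 minutes is the full 100M load time.
QUERY_MIXES_PER_HOUR = 5_285
QUERIES_PER_MIX = 25
STATEMENTS_LOADED = 100_000_000
LOAD_SECONDS = 17 * 60

queries_per_hour = QUERY_MIXES_PER_HOUR * QUERIES_PER_MIX
queries_per_second = queries_per_hour / 3600
load_rate = STATEMENTS_LOADED / LOAD_SECONDS  # statements per second

print(f"{queries_per_hour:,} queries/h = {queries_per_second:.1f} queries/s")
print(f"load rate: about {load_rate:,.0f} statements/s")
```

This works out to 132,125 queries per hour (roughly 37 queries per second) and a load rate of roughly 98,000 statements per second.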
Reasoning complexity
owl:sameAs Optimization

A way of handling equivalent nodes through a single master node. This keeps the
handling of inferred statements efficient and compact, while making 4-6 times more
statements available to queries than were explicitly asserted
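One way to picture the master-node idea is an equivalence-class index: every owl:sameAs pair merges two classes, each statement is stored once against the master nodes of its classes, and a query may substitute any class member. The sketch uses union-find, which is our choice for illustration; the deck does not describe how OWLIM implements this, and the `ex:`, `geo:` and `wd:` identifiers are hypothetical.

```python
# Sketch of owl:sameAs handling through one master node per equivalence class.
# Union-find is our choice for illustration; the deck does not describe OWLIM's
# internals. All identifiers below are hypothetical.
from collections import Counter

Triple = tuple[str, str, str]


class SameAsIndex:
    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, node: str) -> str:
        """Return the master node of node's equivalence class."""
        self.parent.setdefault(node, node)
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[node] != root:  # path compression
            next_node = self.parent[node]
            self.parent[node] = root
            node = next_node
        return root

    def same_as(self, a: str, b: str) -> None:
        """Record `a owl:sameAs b` by merging the two classes."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def class_sizes(self) -> Counter:
        return Counter(self.find(n) for n in list(self.parent))


def canonicalise(triples: set[Triple], index: SameAsIndex) -> set[Triple]:
    # Store each statement once, rewritten onto master nodes.
    return {(index.find(s), p, index.find(o)) for s, p, o in triples}


def visible_count(stored: set[Triple], index: SameAsIndex) -> int:
    # Any class member may replace the master in subject or object position.
    sizes = index.class_sizes()
    return sum(sizes[s] * sizes[o] for s, _, o in stored)


index = SameAsIndex()
index.same_as("ex:vienna", "geo:vienna")
index.same_as("ex:vienna", "wd:vienna")
explicit = {
    ("ex:vienna", "ex:inCountry", "ex:austria"),
    ("ex:exhibition1", "ex:heldIn", "wd:vienna"),
}
stored = canonicalise(explicit, index)
print(len(stored), visible_count(stored, index))  # 2 stored, 6 visible
```

Two stored statements stand for six queryable ones, the same kind of multiplication the slide's figure describes. The count ignores substitution in predicate position and the sameAs statements themselves.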
OWLIM Replication Cluster

• Distribution through data replication is used to ensure:
   – Better handling of concurrent user requests
   – Failover support
• How does it work?
   – Every user request is pushed onto a transaction queue
   – Each data write request is multiplexed to all repository instances
   – Each read request is dispatched to only one of the instances
   – To ensure load balancing, each read request is sent to the instance
     with the shortest execution queue at that point in time
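The dispatch policy described above can be modelled in a few lines: writes fan out to every replica, while each read goes to the single replica with the shortest queue. The class and method names, and the `healthy` flag used to model failover, are hypothetical; this is not the OWLIM-Enterprise API.

```python
# Toy model of the replication cluster's dispatch policy. Class and method
# names are hypothetical; this is not OWLIM-Enterprise's API.
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    healthy: bool = True
    queue: deque = field(default_factory=deque)  # pending requests


class ReplicationCluster:
    def __init__(self, names: list[str]) -> None:
        self.instances = [Instance(n) for n in names]

    def _live(self) -> list[Instance]:
        # Failover: instances marked unhealthy are skipped.
        live = [inst for inst in self.instances if inst.healthy]
        if not live:
            raise RuntimeError("no live repository instances")
        return live

    def write(self, update: str) -> None:
        # Every write is multiplexed to all live instances so replicas stay identical.
        for inst in self._live():
            inst.queue.append(update)

    def read(self, query: str) -> str:
        # Each read goes to exactly one instance: the one with the shortest queue.
        target = min(self._live(), key=lambda inst: len(inst.queue))
        target.queue.append(query)
        return target.name


cluster = ReplicationCluster(["node-a", "node-b", "node-c"])
cluster.write("INSERT DATA { ... }")  # one queued item on every node
first = cluster.read("SELECT ...")    # tie on queue length; min() keeps list order
second = cluster.read("SELECT ...")   # node-a is now busier
cluster.instances[2].healthy = False  # simulate node-c failing
third = cluster.read("SELECT ...")
print(first, second, third)
```

The toy never drains its queues; a real cluster would process requests concurrently and would have to resynchronise a recovered replica before routing reads to it again.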
Geo-spatial index

• Geo-spatial information concerns the geometry of points, shapes and distances relative to the
  surface of the Earth (or any spherical object).
• In OWLIM-SE, all angles are expressed in decimal degrees, with latitude ranging from -90 to
  +90 degrees and longitude ranging from -180 to +180 degrees.

• Airports have a reference point given by latitude, longitude and altitude.
• Political boundaries can be specified by polygons in which each vertex is a two-dimensional
  latitude/longitude pair.
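The two kinds of geometry mentioned above (points and polygons in decimal degrees) support distance and containment tests. The sketch below assumes a spherical Earth with a mean radius of 6371 km (haversine formula) and a flat ray-casting test over latitude/longitude vertices that ignores the antimeridian and the poles; the function names are hypothetical and this is not OWLIM-SE's implementation.

```python
# Sketch of the two geometric primitives the slide mentions. Assumptions: a
# spherical Earth with mean radius 6371 km, and a planar ray-casting test for
# polygons (no antimeridian or pole handling). Not OWLIM-SE's implementation.
import math

EARTH_RADIUS_KM = 6371.0


def check_coordinates(lat: float, lon: float) -> None:
    """Reject angles outside the decimal-degree ranges used by OWLIM-SE."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude {lat} outside [-90, +90]")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude {lon} outside [-180, +180]")


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points via the haversine formula."""
    for lat, lon in ((lat1, lon1), (lat2, lon2)):
        check_coordinates(lat, lon)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def in_polygon(lat: float, lon: float, vertices: list[tuple[float, float]]) -> bool:
    """Ray casting over (lat, lon) vertices, treated as a flat plane."""
    inside = False
    for i, (lat1, lon1) in enumerate(vertices):
        lat2, lon2 = vertices[(i + 1) % len(vertices)]
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            cross = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < cross:
                inside = not inside
    return inside


square = [(0.0, 0.0), (0.0, 10.0), (10.0, 10.0), (10.0, 0.0)]
print(in_polygon(5.0, 5.0, square), in_polygon(5.0, 15.0, square))
print(round(distance_km(0.0, 0.0, 0.0, 180.0)))
```

Two points on opposite sides of the equator are half a great circle apart, about 20,015 km under the stated radius.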
RDF Rank

• OWLIM-SE includes a plug-in that efficiently computes a variant of
  PageRank over RDF graphs
• Computation of rank values is fast, e.g.
   – Ranking 400M LOD statements takes 310 sec (27 iterations)

• Results are available through a system predicate
• Example: get the 100 most important nodes in the RDF graph
      PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
      SELECT ?n WHERE { ?n rank:hasRDFRank ?r }
      ORDER BY DESC(?r) LIMIT 100
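To show what a rank value over an RDF graph means, here is plain PageRank computed by power iteration, treating every triple as a directed edge from subject to object. The damping factor of 0.85, the uniform redistribution of rank from dangling nodes and the convergence tolerance are assumptions; the deck does not say how OWLIM modifies PageRank, so this is not RDF Rank itself.

```python
# Power-iteration PageRank over the graph induced by RDF triples. Assumptions:
# every triple is an edge subject -> object, damping 0.85, and dangling nodes
# spread their rank uniformly. OWLIM's actual "modification" of PageRank is
# not described in the deck, so this is plain PageRank, not RDF Rank itself.
Triple = tuple[str, str, str]


def page_rank(triples: list[Triple], damping: float = 0.85,
              tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    out_links: dict[str, list[str]] = {n: [] for n in nodes}
    for s, _, o in triples:
        out_links[s].append(o)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing edges is redistributed evenly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new = {v: base for v in nodes}
        for v in nodes:
            for target in out_links[v]:
                new[target] += damping * rank[v] / len(out_links[v])
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


triples = [
    ("ex:painting1", "ex:heldBy", "ex:museum"),
    ("ex:painting2", "ex:heldBy", "ex:museum"),
    ("ex:painting3", "ex:heldBy", "ex:museum"),
    ("ex:museum", "ex:locatedIn", "ex:london"),
]
ranks = page_rank(triples)
# Rough analogue of ORDER BY DESC(?r) LIMIT 100 in the SPARQL query.
top = sorted(ranks, key=ranks.get, reverse=True)[:100]
print(top)
```

Nodes that accumulate many incoming links, directly or indirectly, end up at the top; in a real dataset a fuller version would also skip literal objects.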
Define: nested repositories

“Nested repositories” are a new data management concept for RDF data:
•   a mechanism for sharing data across multiple
    repositories, where
•   one repository contains a large body of common
    knowledge that is embedded in the other
    repositories,
•   each of which contains more specific data that is
    interlinked with the common body of
    knowledge
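The slide defines nested repositories only conceptually. A minimal sketch, assuming that a nested repository reads through to the shared repository while keeping its own writes local, could look like this; the class and method names are hypothetical.

```python
# Sketch of the nested-repository idea. Assumptions (the deck gives only the
# concept): a nested repository reads through to the shared parent, while its
# writes stay local. Class and method names are hypothetical.
from typing import Optional

Triple = tuple[str, str, str]


class Repository:
    def __init__(self, parent: Optional["Repository"] = None) -> None:
        self.local: set[Triple] = set()
        self.parent = parent  # the shared body of knowledge, if any

    def add(self, triple: Triple) -> None:
        # Specific data goes into this repository only; the parent is untouched.
        self.local.add(triple)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> set[Triple]:
        """Return triples matching the pattern (None acts as a wildcard)."""
        pattern = (s, p, o)
        hits = {t for t in self.local
                if all(q is None or q == v for q, v in zip(pattern, t))}
        if self.parent is not None:
            hits |= self.parent.match(s, p, o)
        return hits


common = Repository()
common.add(("ex:vienna", "rdfs:label", "Vienna"))

archive = Repository(parent=common)
archive.add(("ex:manuscript1", "ex:writtenIn", "ex:vienna"))

# The nested repository sees both its own data and the shared knowledge...
print(archive.match(s="ex:vienna"))
# ...but the shared repository does not see the specific data.
print(common.match(s="ex:manuscript1"))
```

Keeping the large common body in one place means it is stored once, while each nested repository can link its specific data to it.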
http://www.ontotext.com/owlim




                       mariana.damova@ontotext.com
