Europeana datainaction nov2012



  1. 1. Europeana Semantic Data in Action (a Pilot Service Based on OWLIM) • Mariana Damova, PhD (with contributions to the work by Antoine Isaac, Valentine Charles, Zdravko Tashev, Svetoslav Petrov) • Europeana AGM, November 2012
  2. 2. September 2012
  3. 3. Europeana Data Standards • Unified metadata • ESE – Europeana Semantic Elements • Dublin Core & Europeana fields • 36 fields: flat, with limited ability to express semantic links • Dublin Core fields: dc:title, dc:creator, dc:subject, dc:description, dc:publisher, … • Europeana fields: europeana:provider, europeana:dataProvider, europeana:rights, europeana:type, europeana:isShownBy and/or europeana:isShownAt, … • EDM – Europeana Data Model • Basic data model • Two contextual classes
  4. 4. Europeana Data in EDM • 268 GB of data in RDF • 20M+ cultural objects, with linkages to other datasets, mainly DBpedia • EDM model • SKOS
  5. 5. Semantic Technologies – Main Features • Semantic technologies (RDF, LOD) allow for an unprecedented ease of integration of heterogeneous data sources – Already adopted in the pharmaceutical and publishing industries • BBC: when MySQL was replaced with OWLIM in their “Dynamic Semantic Publishing” architecture, the BBC team observed a considerable reduction in the complexity of database design, query specification, and application development, and in query evaluation time. • Source: “BBC World Cup 2010 dynamic semantic publishing”, Jem Rayfield, Senior Technical Architect, BBC News and Knowledge. mic_sem.html
  6. 6. Linking Open Data • Linking Open Data (LOD): a W3C SWEO Community project • Initiative for publishing “linked data”: a set of principles that allows RDF data spread across different servers to be browsed the way HTML is browsed
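
One way to picture "browsing RDF the way HTML is browsed" is HTTP content negotiation: a client asks for the same URI a browser would open, but requests an RDF serialization instead of HTML. The following is a minimal sketch of that idea, not the pilot's own client code. The helper name `linked_data_request` is hypothetical, the DBpedia URI is only an illustration, and the request is built but never sent.

```python
from urllib.request import Request

# RDF serializations a linked-data client might ask for (illustrative choice).
RDF_MEDIA_TYPES = ("text/turtle", "application/rdf+xml")


def linked_data_request(uri: str) -> Request:
    """Build (but do not send) a GET request that asks for RDF, not HTML.

    Hypothetical helper: it only sets the Accept header; whether a given
    server honours content negotiation is up to that server.
    """
    accept = ", ".join(RDF_MEDIA_TYPES)
    return Request(uri, headers={"Accept": accept})


# Illustrative resource URI; any dereferenceable linked-data URI would do.
req = linked_data_request("http://dbpedia.org/resource/Amedeo_Modigliani")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; following the URIs found in the returned RDF is what lets a client move from one server's data to another's.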
  7. 7. Semantic Technologies and Cultural Heritage • Combining facts and knowledge from different datasets • Need for convincing real-life use cases demonstrating the benefits of these technologies • The cultural heritage domain can become a useful use case for the application of semantic technologies. • MacManus, the Founder and Editor-in-Chief of ReadWriteWeb, defined an exemplary test for the Semantic Web: cities around the world which have Modigliani artworks
  8. 8. FactForge of Ontotext solves the Modigliani query by combining knowledge from 6 datasets from the Linked Open Data Cloud
  9. 9. OWLIM – a scalable, robust and efficient triple store – Serving the two most important websites for the London Olympic Games • Official Olympics website • BBC Olympics website – Performance highlights • OWLIM loads the 100M and the 200M datasets almost twice as fast as the next best product (17 min. for 100M) • Best query performance among those repositories that can handle update and multi-client query tasks (5,285 query mixes per hour, where a query mix contains 25 queries; e.g. about 100 queries/sec) • OWLIM v5 is 43% faster than v4.3 on the BSBM Explore and Update scenario • OWLIM v5 requires between 25% and 70% less storage space • OWL 2 RL-type languages have proven to be the only feasible approach for reasoning with billions of statements
  10. 10. Reason-able View with Europeana Data in EDM • 268 GB of data • Cultural objects data and linkages to other datasets • Loaded into OWLIM with inference with respect to OWL-Horst Optimized • Dataset size: NumberOfStatements=3,899,531,218; NumberOfExplicitStatements=993,332,911; NumberOfEntities=264,523,842 • EDM model • SKOS
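
The dataset-size figures on this slide can be turned into a rough picture of what inference adds. The sketch below assumes that NumberOfStatements counts explicit plus inferred statements; that reading is an assumption about how the repository reports the figure, not something the slide states. The variable names are illustrative.

```python
# Figures quoted on the slide for the loaded repository.
total_statements = 3_899_531_218     # assumed: explicit + inferred
explicit_statements = 993_332_911
entities = 264_523_842

# Statements materialised by the reasoner, under the assumption above.
inferred_statements = total_statements - explicit_statements

# How much larger the repository is than the explicit input.
expansion_ratio = total_statements / explicit_statements

# Average number of statements mentioning each entity (rough density measure).
statements_per_entity = total_statements / entities

print(f"inferred statements: {inferred_statements:,}")
print(f"expansion ratio:     {expansion_ratio:.2f}x")
print(f"statements/entity:   {statements_per_entity:.1f}")
```

Under that assumption, the reasoner contributes roughly 2.9 billion statements, so the repository holds about 3.9 times as many statements as were loaded explicitly.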
  11. 11. SPARQL endpoint
  12. 12. Semantic Queries over Structured Data • Available objects with their aggregators • Data providers that have contributed content to Europeana • Datasets from Italy • Objects from the 18th century provided to Europeana • The original URL, the copyright and the Creative Commons rights of objects provided by The European Library • Copyrights and Creative Commons rights of Europeana objects per provider • Enrichment statements produced by Europeana for objects provided by institutions from the United Kingdom • List of Europeana enriched objects from Sweden, their equivalents and related entities • Time enrichment statements produced by Europeana for provided objects • The complete ordered list of Europeana aggregators and the specific data providers they gather
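
As an illustration of what one of the listed queries might look like, the sketch below builds a SPARQL query for "data providers that have contributed content to Europeana" and wraps it in a SPARQL Protocol GET request. It is a guess at the query shape, not the pilot's actual query: it assumes the data attaches `edm:dataProvider` to `ore:Aggregation` resources as EDM describes, the endpoint URL is a placeholder (the slides do not give one), and the function names are hypothetical. Nothing is sent over the network.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint: the slides do not state the pilot's SPARQL URL.
ENDPOINT = "http://example.org/sparql"

# Namespaces of the EDM and OAI-ORE vocabularies.
PREFIXES = """\
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX ore: <http://www.openarchives.org/ore/terms/>
"""


def data_providers_query(limit: int = 100) -> str:
    """Return a SPARQL query listing distinct data providers (assumed shape)."""
    return PREFIXES + f"""\
SELECT DISTINCT ?dataProvider
WHERE {{
  ?aggregation a ore:Aggregation ;
               edm:dataProvider ?dataProvider .
}}
ORDER BY ?dataProvider
LIMIT {limit}
"""


def build_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


query = data_providers_query(limit=10)
request = build_request(query)
print(query)
print(request.full_url)
```

The other queries in the list would follow the same pattern with different graph patterns, for example filtering on country or adding `edm:rights` to the selected variables, depending on how the pilot data is actually modelled.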
  13. 13. Europeana objects with their images
  14. 14. Other cultural heritage sources available for interlinking: Gothenburg City Museum objects • Oil paintings from the GIM collection • Paintings valued at less than 5,000 Swedish kronor • Paintings with a Gothenburg motif • Portraits and their painters • Museum objects from Swedish museums • Museum objects more than 30 centimeters in height • Paintings given as a present to the Gothenburg City Museum
  15. 15. Linking Open Data Cloud
  16. 16. Outlook … • Europeana Creative: PSP project led by the Austrian National Library • 26 partners • Objective: experimenting with re-use of cultural content for creativity • Project: Europeana re-use framework and 6 pilots in different domains such as education, tourism, etc. • Ontotext: participates in the infrastructure for re-use with the semantic repository OWLIM, and in data integration • Sofia, 13 March 2012
  17. 17. Ontotext • Top-5 provider of core Semantic Technology – Established in 2000; offices in Bulgaria, UK, USA – Active both in research and commercial projects (FP7 funding for 10 years) • 360° semantic technology, a unique portfolio: – Semantic Databases: high-performance RDF DBMS, scalable reasoning – Semantic Search: text mining (IE), metadata generation, Information Retrieval (IR) – Web Mining: focused crawling, screen scraping, data fusion – Linked Data Management and Data Integration • Good recognition in the SemTech community – Ontotext pages are ranked #1 for “semantic annotation” and “semantic repository” at GYM, #3 for “linked data management” at Google • Several joint ventures and subsidiaries – Innovantage: leading online recruitment intelligence provider in the UK
  18. 18. Ontotext Clients (selected) • British Broadcasting Corporation (BBC) – Ran its World Cup 2010 sites on top of OWLIM – Since Mar’12, the BBC Sports 2012 Olympics sections have been driven by OWLIM and a Concept Extraction service developed by Ontotext • Press Association (UK) – Analysis of sports news – Concept extraction – Linked data generation • Top-3 USA media (not allowed to name) • The National Archives (UK): contracted Ontotext to implement a semantic KB and semantic search for the Government Web Archive • British Museum (UK): Ontotext leads the development of Phase 3 of the ResearchSpace project on collaborative research in cultural heritage; the British Museum’s public SPARQL endpoint is powered by OWLIM
  19. 19. Ontotext in the Cultural Heritage Domain • Selected commercial projects – ResearchSpace, a project funded by the Andrew W. Mellon Foundation: support for collaborative web-based research, information sharing and web publishing for the cultural heritage scholarly community; an Ontotext-led international consortium. – The Polish Digital National Museum aggregates artifacts from over 70 contributing cultural institutions in the Digital Libraries Federation PIONIER Network, using the OWLIM repository of Ontotext. – LODAC (Linked Open Data in Academia), Japan’s National Institute of Informatics: aggregates various information across multiple Japanese resources as LOD. The system uses 8 OWLIM nodes and aggregates 19 collections with 700,000 entities and 15M triples. – SemTech for Cultural Heritage, a project funded by ITCC: semantic publishing of Bulgarian cultural heritage to Europeana; establishing a Bulgarian technical aggregator for Europeana. • Selected research projects – MOLTO, an FP7 project: a use case in cultural heritage for a semantic knowledge representation infrastructure for querying RDF and presenting query results; includes close to 9K museum objects from two collections of The Gothenburg City – Charisma (Cultural Heritage Advanced Research Infrastructures): an EU-funded integrating activity project with a consortium of 21 partners and metadata from 6 major European cultural institutions; it has selected the OWLIM repository of Ontotext.
  20. 20. Thank you for your attention!