What do we want computers to do for us?


Published on

This slide deck has been prepared for a workshop on Linked Data Publishing and Semantic Processing using the Redlink platform (http://redlink.co). The workshop delivered at the Department of Information Engineering, Computer Science and Mathematics at Università degli Studi dell'Aquila aimed at providing a general understanding of Semantic Web Technologies and how these can be used in real world use cases such as Salzburgerland Tourismus.

A brief introduction has been also included on MICO (Media in Context) a European Union part-funded research project to provide cross-media analysis solutions for online multimedia producers.

Published in: Technology, Education

What do we want computers to do for us?

  1. 1. A framework for knowledge extraction, linked data and semantic search. What do we want computers to do for us?
  2. 2. We have data. • From 2005 to 2020, the digital universe will grow in size by a factor of 300, from 30 exabytes to 40 trillion gigabyte (40 ZB). • From now until 2020, the digital universe will about double every two years. • Volumes of data are projected to reach 5.247 GB per person with emerging economies playing an increasingly important role (producing two thirds of the world data by the end of this decade). • Only 0.5% of this data is used today for analysis. • The amount of information individuals create themselves - writing documents, taking pictures, recording audio - is far less than the information being created about them in the digital universe. [IDC I V I E W, 2012]
  3. 3. What do we want computers to do for us? Text Images/Video Audio "language": "de" Categorisation, Summarisation, Search, Question/Answer, … "label": "outdoor" Suggest tags, Image search, … Automatic Speech Recognition, Speaker identification, Music classification, … [Andrew NG, 2011]We want computers to process data.
  4. 4. Natural Language Processing We use it everyday. [J U RAFSKY & MARTIN, 2008] a theoretically motivated range of computational techniques for analysing naturally occurring text/speech for the purpose of achieving human-like language processing.
  5. 5. Features extraction in text/speech. Levels of knowledge encoding in language data. INPUT Morphologic Syntactic Semantic FEATURES NLP { Parser Lexical DB Stemming AnaphoraPos Tagging NER
  6. 6. TEXT NLP FEATURES WISDOM What do we want computers to do with a text? STRUCTURED DATA CONTEXT We want computers to make sense of unstructured data. KNOWLEDGE { Semantic Lifting
  7. 7. TEXT WISDOM A practical example. CONTEXT Combining Semantic Web technologies with NLP technologies. KNOWLEDGE Lucoli "label": "Lucoli" "values": ["13.338889"], "predicate": "http:// www.w3.org/2003/01/ geo/wgs84_pos#long" "values": ["42.29194444444445"], "predicate": "http://www.w3.org/2003/01/geo/ wgs84_pos#lat" "values": [ ! ! ! ! ! ], "predicate": "http:// xmlns.com/foaf/0.1/ depiction" About 20 minutes car drive from L’Aquila. …
  8. 8. How we started. Building an open platform for knowledge extraction, linked data and semantic search. ! Delivering the world’s most advanced open source content analysis and making linked data publishing and information discovery accessible to anyone.
  9. 9. • Incorporating requirements from industry partners: • CMS companies • System integrators • Tool providers • Inheriting 6 years of IP with R&D on: • Semantic Information Management and Publishing (RDF and Semantic Web Technology) • Semantic Processing • Conceptual Search
  10. 10. CONTENT ANALYSIS LINKED DATA PUBLISHING 1 3 Linked Data Cloud Technology Stack Text Legacy Data Audio/Images (under development) CONTENT DISCOVERY2 • Enterprise Linked Data • Content Enhancement • Semantic Search
  11. 11. • Semantic enhancement process chaining • Multiple NLP features extraction facilities • Multiple language support • Content classification and sentiment analysis • Graduated as Top Level Project of the Apache Foundation in September 2012 STANBOL.APACHE.ORG A Toolbox for Semantic Processing.
  12. 12. SOLR.APACHE.ORG The Highly Scalable Search Server. • Based on Apache Lucene • Various language specific processing procedures • Highly scalable (Solr cloud) and highly configurable • Ultra fast indexing/searching, indexes can be merged/ optimised • Semantic Search available with an easy-to-install Redlink Plugin
  13. 13. DEV.REDLINK.IO/PLUGINS/SOLR Adding Semantic Search to Apache Solr. • Boost your existing Apache Solr installation with semantic enhancements via Redlink Content Analysis • Watch the screencast • Learn more• Customising the semantic enhancements with user-created vocabularies and Redlink NLP extraction facilities
  14. 14. Managing vocabularies. Vocabularies DEV.REDLINK.IO/API/1.0-BETA.html#linked-data • Build your first app • Learn more • Redlink allows users to create their own Linked Data server for managing vocabularies or publishing datasets for Linked (Open) Data projects • Datasets managed with Redlink can be made available for content analysis and linking • Datasets can be either private (Linked Enterprise Data) or public (Linked Open Data) ! • Public Datasets such as DBpedia, Freebase and GeoNames are available for de-referencing and interlinking
  15. 15. • Read-Write Linked Data • Triple store with transactions, versioning and rule-based reasoning • SPARQL and LDPath query languages • Transparent Linked Data Caching • Graduated as Top Level Project of the Apache Foundation in November 2013 MARMOTTA.APACHE.ORG The Open Platform for Linked Data.
  16. 16. An Open Linked Data Project for Tourism in Salzburg • Cross platform publishing as more travellers massively begin using mobile devices • Multiple Web CMSs (both proprietary and open source) to be managed simultaneously • Costly manual curation and interlinking • Increasing demand for content syndication (from big players like foursquare as well as from local application developers) • Need for better SEO especially for events and sites (too regional to be understood by commercial search engines)
  17. 17. Remixing existing content and creating new value. A magazine running on WordPress An online booking system freshly updated content on locations and events a database containing: events, facilities, accommodations, … Everything we know already from Wikipedia the World’s largest encyclopedia Using Linked Data to make sense of the information
  18. 18. Linked Data Publishing • Data from the online booking system (Feratel) is enriched and transformed in triples using identified vocabularies and ontologies • Triples are stored in the Redlink triple store in a dedicated context • RDF data and SPARQL end-points are published to the data website (data.salzburgerland.com) running CKAN as Linked Open Data • CKAN makes the data accessibile to third parties in various formats by querying Redlink
  19. 19. Transforming Feratel Data in Semantic Knowledge from SOAP to Linked Data
  20. 20. Ontologies provide a mean to hold everything together Data Modelling with LODE
  21. 21. Using LODE: An ontology for Linking Open Descriptions of Events Adding the relationships between things
  22. 22. Florianifeier with RDF different data sources are integrated to provide robot-friendly information that describe real world things <subject><predicate><object>
  23. 23. Semantic Lifting and Linked Data Principles • A “word” or “phrase” becomes an identifier used to denote “things” (named entities) existing in the real world 1.Real-world thing are unambiguously represented with web addresses (URI) 2.By accessing these web addresses (HTTP-URI) usable data is sent in return using standard formats (RDF, SPARQL) 3.This data includes links to other data so that people can discover more things "label":"May", "reference": “http://dbpedia.org/ resource/May” ! Type: Thing "values"["13.7446"],"predicate": "http:// www.w3.org/2003/01/geo/wgs84_pos#long" values"["47.10222"],"predicate": “http:// www.w3.org/2003/01/geo/wgs84_pos#lat” "reference": “http://dbpedia.org/page/Unternberg” ! Type: Place “label":"Florianifeier", "reference":“http:// rdf.salzburgerland.com/ events/event/dea7fde1-5583-4002-97eb-007 4a182fa9c.html”! Type: Event Tim Berners-Lee. LANGUAGE EVENT THING LOCATION ENGLISH FLORIANIFEIER MAY UNTERNBERG [Très Riches Heures du duc de Berry, Raymond Cazelles et Johannes Rathofe] “This May don't miss the Florianifeier, we'll have fun as usual in Unternberg”
  24. 24. Dynamic Semantic Publishing with ordLiftW • Data from the Redlink triple store is made available for content enrichment and can be edited using WordLift, a semantic plugin for WordPress.
  25. 25. Data Curation • Using Linked Data the Web becomes my new CMS • information is automatically imported in WordPress • posts are connected with entities • properties for each entity can be edited using WordPress • any change is automatically reflected in the triple-store and re-published as Open Data Using Linked Data and WordLift the Web becomes your new CMS. editing a blog post editing an entity
  26. 26. Web Search 19.900 results no answer Touristic applications attempting to discover events in Salzburgerland. “Which events occur in May in Lungau?” Linked Open Data Query 5 result 5 answer Unternberg is a village in the area of Lungauon google.at!!
  27. 27. Better SEO using Semantic Markup Florianifeier Unternberg • Using schema.org the data from the triple-store is added to the pages as semantic markup • Search engines can finally “recognise” entities that were previously unknown (i.e. Florianifeier) ordLiftW
  28. 28. •Media in cross-media context, allowing to analyse media resources as well as connected content, including video, images, audio, text, link structure and metadata; •Investigate cross-media analysis along the complete, distributed analysis chain, namely extraction, metadata publishing, querying and recommendations; •Contribute its main software development results as Open Source components to two established Apache projects, Apache Marmotta and Apache Stanbol, simplifying the use of the technology in industrial products. What do we want computers to do with Media? MICO-PROJECT.EU
  29. 29. “Show me the tempo-regional fragments where Lewis Jones is right beside Connor Macfarlane?” MICO-PROJECT.EU PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX mm: <http://linkedmultimedia.org/sparql- mm/functions#> PREFIX ma: <http://www.w3.org/ns/ma-ont#> PREFIX dct: <http://purl.org/dc/terms/> ! SELECT (mm:boundingBox(?l1,?l2) AS ?left_right) WHERE { ?f1 ma:locator ?l1; dct:subject ?p1. ?p1 foaf:name "Lewis Jones". ?f2 ma:locator ?l2; dct:subject ?p2. ?p2 foaf:name "Connor Macfarlane". ! FILTER mm:rightBeside(?l1,?l2) FILTER mm:temporalOverlaps(?l1,?l2) } We want computers to process media.
  30. 30. GRAZIE! foaf:name “Andrea Volpini" Hopefully soon in the Linked Data Cloud!
  31. 31. CREDITS ANDREW NG, 2011 J U RAFSKY & MARTIN, 2008 Webscale IA using Linked Open Data on slideshare by reduxd LODE linking open descriptions of events aswc 2009 on slideshare by Raphael Troncy Semantic SEO in the post-Hummingbird era on slideshare by Kim Renberg and Andrea Volpini Querying of metadata, media content and context in MICO a demo by Thomas Kurz this presentation is the result of many inspiring ideas and amazing work from other people and here is the list: any idea, graphics or meme belonging to us is available for sharing, copying and re-mixing under creative commons license 3.0