ViSTA-TV Workpackage 6: External Data Service for Metadata Enrichment & Novel TV Recommendations

ViSTA-TV project:
Video Stream Analytics for Viewers in the TV Industry

  • 1. Video Stream Analytics for Viewers and the TV Industry WP6: External Data Service
  • 2. Objectives
  • 3. WP6 Objectives
      • O.6.1 External data service design
          • Analysis of candidate sources
          • Analysis of data extracted
      • O.6.2 External data service employed
          • Enrich the EPG data
          • Enrich feature extraction data
          • Discover links between programs for novel recommendations
      • O.6.3 Publish data to the Linked Open Data cloud
    The external data service supports the recommendation process by improving the connectivity between TV programs, which standard EPG metadata does not surface.
  • 4. ViSTA-TV External Data Service (diagram; stage labels: load, enrich, publish, load)
  • 5. External Data Service (diagram)
      Example synopsis (po:long_synopsis): "In this episode, Larry meets two veterans who each lost a limb in World War 2 to ask how differently we treat today's injured soldiers. Plus a look back at the iconic Green Cross Code films. With Stuart Hall and Miriam Stoppard"
      Credits (po:credit): "Larry Lamb", "Miriam Stoppard", "Stuart Hall"
      Synopsis concepts: "World War II", "Television Program", "Green Cross Code", "Tom Stoppard", "David Prowse"
      Diagram labels: EPG DWH, Language Detection, Concept tagging, Title, Synopsis, Credits, DBpedia:<concept>, DBpedia:<LABEL>, rdfs:label, dc:subject
  • 6. Zattoo Data Service: RDF
      Example episode (rdf:type po:episode):
        po:pid                "9966901"
        dc:title              "Die allerbeste Sebastian Winkler Show"
        zattoo:episode_title  "mit Motsi Mabuse, Lady Bitch Ray und Sarah Brendel"
        po:long_synopsis      "(Premiere in Einsfestival )"
        po:masterbrand
        po:category
        po:credit             po:role "guest", po:alias "Sarah Brendel"
        po:credit             po:role "guest", po:alias "Motsi Mabuse"
        po:credit             po:role "guest", po:alias "Lady Bitch Ray"
      Namespace prefixes: po, zattoo, dc, rdf (rdf syntax ns#)
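The episode description on slide 6 can be read as a small set of subject-predicate-object triples. The following is a minimal sketch in plain Python, not the project's actual implementation: the node identifiers (`_:episode`, `_:credit0`, ...) and the helper names `objects` and `credited` are hypothetical, and the prefixed names are kept as plain strings because the namespace URIs are not given in the slide.

```python
# Triples as (subject, predicate, object) tuples, using the prefixed
# names from slide 6. Node identifiers are hypothetical placeholders.
EPISODE = "_:episode"

triples = [
    (EPISODE, "rdf:type", "po:episode"),
    (EPISODE, "po:pid", "9966901"),
    (EPISODE, "dc:title", "Die allerbeste Sebastian Winkler Show"),
    (EPISODE, "zattoo:episode_title",
     "mit Motsi Mabuse, Lady Bitch Ray und Sarah Brendel"),
]

# Each credit is an intermediate node carrying a role and an alias.
for i, name in enumerate(["Sarah Brendel", "Motsi Mabuse", "Lady Bitch Ray"]):
    credit = f"_:credit{i}"
    triples += [
        (EPISODE, "po:credit", credit),
        (credit, "po:role", "guest"),
        (credit, "po:alias", name),
    ]


def objects(triples, subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def credited(triples, episode, role):
    """Return the aliases of all credits of an episode with the given role."""
    names = []
    for credit in objects(triples, episode, "po:credit"):
        if role in objects(triples, credit, "po:role"):
            names.extend(objects(triples, credit, "po:alias"))
    return names
```

Calling `credited(triples, EPISODE, "guest")` walks from the episode through its credit nodes to the guest names, which is the kind of link-following the enrichment and recommendation steps rely on.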
  • 8. Enrichments Service
  • 15. LOD Linking Service (diagram; label: WP5)
  • 16. Recommendations
  • 17. LOD for recommendations
      • LOD datasets provide additional information that can be used to produce novel TV recommendations
      • The challenge is to identify the links that are most useful in the recommendation process
      • We started to analyze the datasets to identify features that can help select the right links to use
  • 18. Current & Future Work
  • 19. Current & Future Work
      1. Continuously adding new sources
      2. Continuous improvement of EPG enrichment quality
          • complementary services
          • crowdsourcing
      3. Defining a LOD-based notion of serendipity
      4. Further studies on the LOD patterns and their suitability for recommendations
      5. Applying the approach in other domains, e.g. books
  • 20. 1. Adding new sources

      Dataset         Objects    Triples    Links to ...
      DBpedia         3.77 mil   400 mil    27.2 mil
      Freebase        23 mil     337 mil    3.9 mil
      BBC                        60 mil     43.237
      BBC music                  20 mil     23.000
      NYT             10.467     345.889    23.400
      MusicBrainz                178 mil    855.754
      Flickr          1.95 mil   5.61 mil   3.400.000
      LinkedMDB       503.242    6 mil      162 756
      GeoNames        8 mil      94 mil     0
      LinkedGeoData   1 bil      20 bil     53204
  • 21. 2. Data cleaning
      Example synopsis: "Following the grandeur of Baroque, Rococo art is often dismissed as frivolous and unserious, but Waldemar Januszczak disagrees. […] The first episode is about travel in the 18th century and how it impacted greatly on some of the finest art ever made. The world was getting smaller and took on new influences shown in the glorious Bavarian pilgrimage architecture, Canaletto's romantic Venice and the blossoming of exotic designs and tastes all over Europe. The Rococo was art expressing itself in new, exciting ways."
      Enrichment errors:
        • Type misclassification: "Canaletto" annotated as ontology:Location
        • URI mis-annotation: "Rococo" annotated as dbpedia:Rococo_(band)
      Remedies:
        • Integration of the results of different text annotators
        • Validation through crowdsourcing tasks
      Collaboration with: Silvia Giannini
  • 22. 2. Data cleaning
      Type & URI alignment

      Extractor   Label       DBpedia ontology class                DBpedia URI
                  Canaletto   ontology:Location                     dbpedia:Canaletto
      TextRazor   Canaletto   dbpedia-owl:[Artist, Agent, Person]   dbpedia:Canaletto
                  Canaletto   dbpedia-owl:[Artist, Agent, Person]   dbpedia:Canaletto
                  Canaletto   dbpedia-owl:[Artist, Agent, Person]   dbpedia:Canaletto

      Output fields per extractor:
        • Label, NERD ontology class, sameAs link
        • Label, DBpedia ontology class, DBpedia URI
        • Label, DBpedia category, Wikipedia page
        • Label, DBpedia ontology class, DBpedia URI

      Voting system: <Canaletto, dbpedia-owl:[Artist, Agent, Person], dbpedia:Canaletto> 3/4
  • 23. Automatic integration of text annotators for enrichment
      Diagram labels: Program synopsis, results integration, Aggregated enrichment (based on majority vote)
      Validate:
        • Relevance of labels
        • Types of the relevant labels
      Analysis of collected data for:
        • Voting system validation (also URIs)
        • Parameter tuning (e.g., complementarity handling)
      What if:
        • there is a tie?
        • the majority of annotators are wrong?
        • more granular alignment ontologies are adopted to avoid a missing type (or the type owl:Thing)?
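The voting step described on slides 22 and 23 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the project's code: the `Annotation` type and the `majority_vote` function are hypothetical names, and the rule used here is a strict majority (more than half of the annotators must agree), which is one possible reading of "majority vote". A tie therefore yields no winner, which leaves the tie question raised on the slide open rather than answering it.

```python
from collections import Counter
from typing import NamedTuple, Optional, Sequence, Tuple


class Annotation(NamedTuple):
    """One annotator's output after type and URI alignment (hypothetical type)."""
    label: str
    types: Tuple[str, ...]
    uri: str


def majority_vote(
    annotations: Sequence[Annotation],
) -> Tuple[Optional[Annotation], int, int]:
    """Return (winner, votes for the top candidate, number of annotators).

    The winner is None when no annotation is backed by a strict majority,
    which also covers ties and empty input.
    """
    total = len(annotations)
    if total == 0:
        return None, 0, 0
    top, votes = Counter(annotations).most_common(1)[0]
    if votes * 2 > total:
        return top, votes, total
    return None, votes, total


# The Canaletto example from slide 22: three annotators agree, one differs.
ARTIST = Annotation("Canaletto", ("Artist", "Agent", "Person"), "dbpedia:Canaletto")
LOCATION = Annotation("Canaletto", ("Location",), "dbpedia:Canaletto")
canaletto_votes = [LOCATION, ARTIST, ARTIST, ARTIST]
```

With `canaletto_votes`, the function returns the artist annotation with 3 of 4 votes, matching the 3/4 shown on the slide. A 2-2 split returns `None`, so such cases could be routed to the crowdsourcing validation instead.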
  • 24. LOD & Serendipity
  • 25. 3. LOD-based Serendipity
      Collaboration with:
  • 26. LOD-based Serendipity
  • 27. Diversity
  • 28. 4. LOD-based Patterns for Diversity
      LOD-based method for increasing diversity in recommendations:
        • extracts all the patterns from an RDF dataset → clusters generated & measured for diversity
        • fed into two statistical models to determine which semantic patterns can extract subsets of Linked Data to improve diversity in recommendations
        • data characterization step to choose the model
        • diversity measures, e.g. entropy & semantic similarity
        • IMDB & DBpedia
        • noisiness, size & sparsity of LOD
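Slide 28 names entropy as one of the diversity measures applied to the generated clusters. As a minimal sketch, assuming that each item in a cluster carries a single category label (the slide does not say which attribute is measured), Shannon entropy over the label distribution can be computed as below. The function name `shannon_entropy` is illustrative.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_entropy(labels: Iterable[Hashable]) -> float:
    """Shannon entropy (in bits) of the label distribution of a cluster.

    0.0 means all items share one label; higher values mean the labels
    are spread more evenly, i.e. the cluster is more diverse.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * math.log2(count / total)
        for count in counts.values()
    )
```

A cluster of four items with four distinct labels scores 2.0 bits, while a cluster whose items all share one label scores 0.0, so a pattern producing the first kind of cluster would rank as more diverse under this measure.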
  • 29. Applied to ‘Books’ Domain
  • 30. References
      • Valentina Maccatrozzo, Lora Aroyo and Willem Robert van Hage, Crowdsourced Evaluation of Semantic Patterns for Recommendations, User Modeling, Adaptation, and Personalization, Rome, Italy, July 10-14, 2013.
      • Valentina Maccatrozzo, Davide Ceolin and Lora Aroyo, LOD Enrichment of TV Programs, in W3C Italy Event: Linked Open Data: where are we?, Rome, Italy, February 20-21, 2014.
      • Valentina Maccatrozzo, Davide Ceolin, Lora Aroyo and Paul Groth, Semantic Pattern-based Recommender, Extended Semantic Web Conference (ESWC2014), Heraclion, Greece, May 25-29, 2014.
      • Davide Ceolin, Luc Moreau, Kieron O'Hara, Wan Fokkink, Willem Robert van Hage, Valentina Maccatrozzo, Alistair Sackley, Guus Schreiber and Nigel Shadbolt, Two procedures for analyzing the reliability of open government data, Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU'2014), Montpellier, FR, 15 Jul 2014.