Media Fragments Indexing using Social Media



With more and more video shared on the Web, many video sharing platforms have implemented the practice of sharing a video object from a certain time point (deep-linking). However, although many media fragments are created, annotated and shared, major search engines still do not index video objects at a fine-grained level at Web scale. To solve this problem, this paper proposes the Twitter Media Fragment Indexer, which monitors Tweet text and uses the embedded URLs that point to video fragments as the medium for creating a large-scale index of media fragments. A preliminary evaluation has shown that this system can successfully index media fragments at large scale.
This is a presentation from the LIME workshop at ESWC2014.


  1. Media Fragment Indexing Using Social Media
     Yunjia Li (1), Raphael Troncy (2), Mike Wald (1) and Gary Wills (1)
     (1) School of Electronics and Computer Science, University of Southampton, UK
     (2) EURECOM, Sophia Antipolis, France
  2. Agenda
     • Media Fragments
     • Media Fragment Indexing Framework
     • Survey on Media Fragment URI Implementations on Video Sharing Platforms
     • Indexing Media Fragments Using Twitter
     • Conclusions and Future Work
  3. Media Fragment
     • Denotes the content inside a multimedia resource
     • Dimensions defined in the Media Fragments URI 1.0 specification:
       – Temporal dimension: …,7
       – Spatial dimension (a rectangular area): …,240,180,240
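To make the two dimensions concrete, here is a minimal sketch of reading them from a URI fragment. It handles only the simplest forms of the Media Fragments URI 1.0 syntax, `t=start,end` in plain seconds and `xywh=x,y,w,h` in pixels (optionally prefixed with `pixel:`). Clock-time values, the `npt:` prefix, `percent:` units and the other dimensions are deliberately left out. The function name `parse_media_fragment` and the `example.org` URIs are hypothetical.

```python
from urllib.parse import urlsplit, unquote


def parse_media_fragment(uri):
    """Return a dict with optional 'temporal' and 'spatial' entries.

    Simplified: 't' accepts plain seconds only ('t=3,7', 't=3', 't=,7');
    'xywh' accepts pixel values only ('xywh=160,120,320,240').
    Percent-based regions are not supported and would raise ValueError.
    """
    result = {}
    fragment = urlsplit(uri).fragment
    for pair in fragment.split("&"):
        name, sep, value = pair.partition("=")
        if not sep:
            continue  # skip empty or malformed name/value pairs
        value = unquote(value)
        if name == "t":
            start_s, _, end_s = value.partition(",")
            start = float(start_s) if start_s else 0.0  # missing start means 0
            end = float(end_s) if end_s else None       # None = until the end
            result["temporal"] = (start, end)
        elif name == "xywh":
            if value.startswith("pixel:"):
                value = value[len("pixel:"):]
            x, y, w, h = (int(v) for v in value.split(","))
            result["spatial"] = (x, y, w, h)
    return result


print(parse_media_fragment("http://example.org/video.mp4#t=3,7"))
# {'temporal': (3.0, 7.0)}
print(parse_media_fragment("http://example.org/video.mp4#xywh=160,120,320,240"))
# {'spatial': (160, 120, 320, 240)}
```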
  4. Current Situation
     • Uploading, sharing and tagging multimedia is easy
     • Searching for a complete multimedia resource on major search engines is easy
     • But searching for multimedia resources at a fine-grained level on major search engines is difficult:
       – Availability of annotations: only a limited number of annotations are linked to media fragments
       – SEO problem:
         • The landing page is not search-engine-friendly
         • Everything is on the same page, and the notion of a media fragment is not explicitly embedded in the HTML
  5. Media Fragment Indexing Framework
  6. Google’s Ajax Content Crawler
     • The crawler is designed to index Ajax content
     • It replaces the token “#!” in URLs with “_escaped_fragment_”
     *Diagram from crawling/docs/getting-started
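The URL rewriting on this slide can be sketched as below, following Google's AJAX crawling scheme (which Google deprecated in 2015) as we understand it: everything after `#!` becomes the value of the `_escaped_fragment_` query parameter, joined with `&` if the URL already has a query string, and a small set of characters in that value is percent-encoded. The helper names and the `example.org` host are hypothetical, and the inverse mapping assumes `_escaped_fragment_` is the last query parameter.

```python
from urllib.parse import unquote

# Bytes percent-encoded in the fragment value under the AJAX crawling scheme:
# %00-%20, '#', '%', '&', '+', and %7F-%FF.
_ESCAPED = set(range(0x00, 0x21)) | {0x23, 0x25, 0x26, 0x2B} | set(range(0x7F, 0x100))


def _escape_value(text):
    return "".join(
        "%{:02X}".format(b) if b in _ESCAPED else chr(b)
        for b in text.encode("utf-8")
    )


def pretty_to_ugly(url):
    """Map a '#!' (pretty) URL to the '_escaped_fragment_' form the crawler requests."""
    base, sep, fragment = url.partition("#!")
    if not sep:
        return url  # no '#!' token: not an AJAX-crawlable URL
    joiner = "&" if "?" in base else "?"
    return base + joiner + "_escaped_fragment_=" + _escape_value(fragment)


def ugly_to_pretty(url):
    """Inverse mapping, assuming '_escaped_fragment_' is the last parameter."""
    base, sep, value = url.partition("_escaped_fragment_=")
    if not sep:
        return url
    base = base[:-1]  # drop the '?' or '&' that introduced the parameter
    return base + "#!" + unquote(value)
```

With the slide's example, `pretty_to_ugly("http://example.org/replay/1#!t=3,7")` yields `http://example.org/replay/1?_escaped_fragment_=t=3,7`. The characters `=` and `,` are not in the escaped set, so the temporal fragment passes through unchanged.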
  7. Key Ideas
     • The fragment information must be included in the URL
       – Syntax: W3C Media Fragments URI 1.0 specification
     • Prepare two pages for every media fragment:
       – the original landing page for end users
       – a snapshot page for SEO
     • The landing page keeps the original user interaction
       – Highlight the media fragment when the page opens
     • The SEO (snapshot) page
       – includes ONLY the annotations of the media fragment
       – embeds a rich snippet
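One way to build the SEO snapshot page for a single fragment is sketched below. The page contains only that fragment's annotations, wrapped in schema.org `VideoObject` Microdata so that a rich snippet can be derived. The choice of properties (`name`, `url`, `embedUrl`, a repeated `description`) is our assumption about a reasonable minimal snippet, not the authors' exact markup, and `snapshot_page` is a hypothetical name. A production page would likely need further properties, such as a thumbnail URL and an upload date.

```python
from html import escape


def snapshot_page(title, pretty_url, video_url, annotations):
    """Render a minimal snapshot page for one media fragment.

    Only the annotations attached to this fragment are included, wrapped in
    schema.org VideoObject Microdata (itemscope/itemtype/itemprop attributes).
    All inserted values are HTML-escaped.
    """
    items = "\n".join(
        f'      <li itemprop="description">{escape(a)}</li>' for a in annotations
    )
    return f"""<!DOCTYPE html>
<html>
<body>
  <div itemscope itemtype="http://schema.org/VideoObject">
    <h1 itemprop="name">{escape(title)}</h1>
    <link itemprop="url" href="{escape(pretty_url)}">
    <link itemprop="embedUrl" href="{escape(video_url)}">
    <ul>
{items}
    </ul>
  </div>
</body>
</html>"""
```

For the example on the next slide, the snapshot page for `#t=3,7` would list only "Terrace Theater". The landing page, in contrast, shows every annotation of the video.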
  8. The Solution
     1: Submit the pretty URL replay/1#!t=3,7 to the crawler.
     2: The crawler asks the server for replay/1?_escaped_fragment_=t=3,7.
     3: The server redirects the request to the snapshot page it generates, Snapshot/1?_escaped_fragment_=t=3,7. The snapshot page contains only the annotations and Microdata for “#t=3,7” (here, “Terrace Theater”).
     4: The snapshot page is returned to the crawler under the URL replay/1#!t=3,7.
     5: A user searches for the keyword “Terrace Theater”.
     6: Google includes replay/1#!t=3,7 in the search results.
     7: The user clicks the link and requests the document at replay/1#!t=3,7.
     8: The server returns the landing page, which contains both “Terrace Theater” and “Linked Data”.
     9: The landing page highlights the media fragment by playing the video from 3 s to 7 s.
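Steps 2 and 3 amount to a simple dispatch on the server. A request that carries `_escaped_fragment_` is sent to the snapshot page, and every other request gets the interactive landing page. This is a minimal sketch that assumes the `replay/<id>` and `Snapshot/<id>` path layout shown in the diagram. `route` is hypothetical, and a real server would issue an HTTP redirect rather than return a tuple.

```python
from urllib.parse import urlsplit, parse_qs, quote


def route(request_url):
    """Decide which page answers a request (simplified)."""
    parts = urlsplit(request_url)
    params = parse_qs(parts.query)
    if "_escaped_fragment_" in params:
        # Crawler request: send it to the server-generated snapshot page.
        fragment = params["_escaped_fragment_"][0]
        snapshot = parts.path.replace("/replay/", "/Snapshot/", 1)
        return ("snapshot", snapshot + "?_escaped_fragment_=" + quote(fragment, safe="=,:"))
    # Browser request: the '#!t=3,7' part is never sent to the server, so the
    # landing page's own script must read it and start playback at 3 s.
    return ("landing", parts.path)
```

The split is necessary because browsers never send the URL fragment to the server. Only the crawler's rewritten request exposes `t=3,7` on the server side.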
  9. Discussion
     • The Media Fragment Indexing Framework solves the SEO problem of media fragments
     • The scalability of this method depends largely on whether a large number of annotations are linked to media fragments
     • Where can media fragment annotations be found?
       – Timed text, transcripts, speech recognition
       – Manual annotations on each video sharing platform
       – Social media (Twitter)
  10. Survey on Media Fragment URI Implementation
  11. Media Fragments and Social Media
      • The deep-linking function
      • A Media Fragment URL can be embedded in a Tweet
      • The text of the Tweet is the annotation of the URL
      • Get annotations by filtering Tweets that contain MF URIs
  12. Filter Tweets by Media Fragment URIs
      • Problem:
        – Any URL in a Tweet is potentially an MF URI
        – Too many false-positive cases
        – Any of them could be an MF URI, so each would need to be identified manually
      • Workaround:
        – Identify platforms that (partially) implement MF URIs
        – Only keep Tweets containing URLs from those domains
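The workaround can be sketched as a host whitelist. A Tweet is kept only if at least one of its URLs points to one of the platforms the survey identified. The domain list is our assumption, built from the filter keywords and including `youtu.be`, YouTube's short-link host. The function names are hypothetical, and the sketch assumes the URLs have already been extracted from the Tweet and expanded.

```python
from urllib.parse import urlsplit

# Hypothetical whitelist derived from the survey's filter keywords.
MF_DOMAINS = {"youtube.com", "youtu.be", "dailymotion.com", "vimeo.com",
              "vbox7.com", "viddler.com"}


def host_matches(url, domains=MF_DOMAINS):
    """True if the URL's host is a whitelisted domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in domains)


def filter_tweets(tweets):
    """Keep (text, urls) pairs that contain at least one whitelisted URL."""
    return [(text, urls) for text, urls in tweets
            if any(host_matches(u) for u in urls)]
```

Matching on the parsed host, rather than searching the raw text for "youtube", avoids false positives such as `notyoutube.com`.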
  13. Survey Methodology
      • Find a list of video sharing platforms
        – 59 websites are targeted in the survey
        – Some of them have access restrictions
      • Go through each website manually to see whether it provides a deep-linking function, such as:
        – a social sharing button from a certain time point
        – a deep-linking option in the right-click menu
  14. Survey Results (1)
      • 9 websites partially implemented MF URIs
        – …, Dailymotion, Hulu, Vbox7, Viddler, Vimeo, Tudou, Youku and YouTube
      • They use different syntaxes to encode the temporal dimension
        – Most of them use the URI query, except YouTube and Vimeo
        – Parameter names: “start”, “t”, “st”, etc.
        – Only Hulu implemented the end time
      • Only YouTube partially implemented the spatial dimension
        – This is an external function implemented by Clickberry
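Because the platforms encode the start time differently, an indexer has to normalise it before publishing. The sketch below looks for a parameter named `t`, `start` or `st` in the fragment and then in the query. It accepts values that are either plain seconds or YouTube-style `1h2m3s`, and emits the Media Fragments form `#t=N`. The lookup order and the value formats are assumptions, since the slide does not record which platform uses which name. The function names are hypothetical, and values that already carry an end time (such as `3,7`) are not handled.

```python
import re
from urllib.parse import urlsplit, parse_qs

START_PARAMS = ("t", "start", "st")  # parameter names reported in the survey
_HMS = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s?)?$")


def to_seconds(value):
    """Parse '90', '90s' or '1m30s' into seconds; return None if unparseable."""
    m = _HMS.match(value)
    if not value or m is None:
        return None
    h, mnt, s = (int(g) if g else 0 for g in m.groups())
    return h * 3600 + mnt * 60 + s


def start_time(url):
    """Look for a start-time parameter in the fragment first, then the query."""
    parts = urlsplit(url)
    for source in (parts.fragment, parts.query):
        params = parse_qs(source)
        for name in START_PARAMS:
            if name in params:
                seconds = to_seconds(params[name][0])
                if seconds is not None:
                    return seconds
    return None


def to_mf_fragment(url):
    """Return the Media Fragments form '#t=N' of the start time, or None."""
    seconds = start_time(url)
    return None if seconds is None else f"#t={seconds}"
```

For instance, `to_mf_fragment("https://www.youtube.com/watch?v=abc&t=1m30s")` gives `#t=90`.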
  15. Survey Results (2)
      • Only 9 websites partially implemented MF URIs; however:
        – those websites cover most of the videos shared on the Web (eBizMBA report)
      • Select filter keywords based on the survey results:
        – Twitter is banned in China, so Tudou and Youku are ignored
        – Hulu has access restrictions outside the U.S.
      • Filter keywords: “YouTube”, “Dailymotion”, “Vbox7”, “Vimeo” and “Viddler”
  16. Indexing Media Fragments Using Twitter
  17. Twitter Media Fragment Indexer
      • Collect Tweets filtered by the keywords
      • Extract MF URIs from the Tweets and parse the media fragment information
      • Use the Media Fragment Indexing Framework to publish the Tweets as media fragment annotations
      • Embed rich snippets in the snapshot pages
      • Create a sitemap for Google to crawl the snapshot pages
      • When a user searches Google for keywords from the Tweet, the result links to the video at the corresponding start time
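The sitemap step can be sketched with the standard library. Listing the `#!` (pretty) URLs reflects our reading of the AJAX crawling scheme, under which the crawler derives the `_escaped_fragment_` request itself. The function name and URLs are hypothetical. Note that the sitemaps.org protocol limits a single sitemap file to 50,000 URLs, so a larger index would need several files.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) listing each distinct URL once."""
    ET.register_namespace("", SITEMAP_NS)  # serialise without an 'ns0:' prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in dict.fromkeys(urls):  # de-duplicate while keeping first-seen order
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="unicode")
```

ElementTree escapes characters such as `&` in the `<loc>` text automatically, as the sitemap protocol requires.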
  18. The Detailed Workflow
  19. Indexing Results (1)
      • Monitored a 50-hour non-stop Twitter stream
      • Filter phrase: “youtube, dailymotion, vimeo, vbox7, viddler”
      • 5,779,858 Tweets examined, of which 5,269,742 contain URLs
      • 32,754 Tweets contain MF URIs, with 32,796 MF URIs in total
      • Media Fragment URIs shared on each website:

        Website       No. of MF URIs  Share (%)
        YouTube               32,666     99.604
        Dailymotion              101      0.308
        Vbox7                      0      0.000
        Viddler                    0      0.000
        Vimeo                     29      0.088
        Total                 32,796    100.000
  20. Indexing Results (2)
      • 13,088 distinct videos were found
      • 17,854 distinct MF URIs for the sitemap
        – Many Tweets share the same video but different fragments
        – Many retweets
        – Some videos are not available in the UK
      • 17,479 URLs (97.9%) in the sitemap have been indexed by Google
      • Only 775 URLs are indexed as VideoObject, even though rich snippets are embedded in all snapshot pages
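Part of the gap between the 32,796 MF URIs collected and the 17,854 distinct ones in the sitemap comes from repeated shares and retweets of the same fragment. Separately, several distinct fragments can point into one video, which is why distinct videos are fewer again. The sketch below computes the two counts. It assumes each MF URI has already been reduced to a hypothetical `(video_id, start, end)` triple, and the sample data are purely illustrative.

```python
def count_distinct(fragments):
    """Return (distinct videos, distinct media fragments).

    `fragments` is an iterable of (video_id, start, end) triples, one per
    MF URI found in a Tweet; retweets simply repeat the same triple.
    """
    fragments = list(fragments)
    videos = {video_id for video_id, _, _ in fragments}
    return len(videos), len(set(fragments))


sample = [
    ("abc", 30, None),   # original Tweet
    ("abc", 30, None),   # retweet of the same fragment
    ("abc", 95, None),   # another fragment of the same video
    ("xyz", 12, None),
]
print(count_distinct(sample))  # (2, 3)
```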
  21. Demo
      • Search for “Chris Eppstein”
      • This landing page opens, and the video starts playing from the time indicated in the Tweet containing the keywords “Chris Eppstein”
  22. Conclusions and Future Work
  23. Conclusions
      • Introduced the Media Fragment Indexing Framework
      • Proposed using social media to acquire more annotations for media fragments
      • Surveyed MF URI implementations on major video sharing platforms
      • Twitter Media Fragment Indexer
        – Monitors the Tweet stream and automatically creates media fragment annotations
        – Indexes media fragments in Google
        – YouTube is the most important domain for sharing media fragments on Twitter
  24. Future Work
      • How valid are Tweets as media fragment annotations?
        – much noisy and unrelated text
        – many retweets
      • Experiment on a larger scale (billions of Tweets and continuous monitoring)
      • Extend the methodology to other media fragment annotations, such as timed text
      • Extract named entities from Tweets and further link media fragments to the Linked Data Cloud
  25. Questions?