This document discusses technologies for video fragment creation and annotation for the purpose of video hyperlinking. It describes video temporal segmentation to shots and scenes to break videos into fragments. It also discusses visual concept detection and event detection for annotating fragments so meaningful hyperlinks between fragments can be identified. An example approach is described that uses visual features to detect both abrupt and gradual shot transitions with high accuracy at 7-8 times faster than real-time.
Authors/Presenters: Vasileios Mezaris and Benoit Huet.
Video hyperlinking is the introduction of links that originate from pieces of video material and point to other relevant content, be it video or any other form of digital content. The tutorial presents the state of the art in video hyperlinking approaches and in relevant enabling technologies, such as video analysis and multimedia indexing and retrieval. Several alternative strategies, based on text, visual and/or audio information are introduced, evaluated and discussed, providing the audience with details on what works and what doesn’t on real broadcast material.
Fast object re-detection and localization in video for spatio-temporal fragme...LinkedTV
Fast object re-detection and localization in video for spatio-temporal fragment creation, Jul. 2013, San Jose, California, USA. Talk provided by Vasileios Mezaris.
Authors/Presenters: Vasileios Mezaris and Benoit Huet.
Video hyperlinking is the introduction of links that originate from pieces of video material and point to other relevant content, be it video or any other form of digital content. The tutorial presents the state of the art in video hyperlinking approaches and in relevant enabling technologies, such as video analysis and multimedia indexing and retrieval. Several alternative strategies, based on text, visual and/or audio information are introduced, evaluated and discussed, providing the audience with details on what works and what doesn’t on real broadcast material.
Semantics at the multimedia fragment level SSSW 2013Raphael Troncy
"Semantics at the multimedia fragment level or how enabling the remixing of online media" - Invited Talk given at the Semantic Web Summer School (SSSW), 12 July 2013
Authors/Presenters: Vasileios Mezaris and Benoit Huet.
Video hyperlinking is the introduction of links that originate from pieces of video material and point to other relevant content, be it video or any other form of digital content. The tutorial presents the state of the art in video hyperlinking approaches and in relevant enabling technologies, such as video analysis and multimedia indexing and retrieval. Several alternative strategies, based on text, visual and/or audio information are introduced, evaluated and discussed, providing the audience with details on what works and what doesn’t on real broadcast material.
Fast object re-detection and localization in video for spatio-temporal fragme...LinkedTV
Fast object re-detection and localization in video for spatio-temporal fragment creation, Jul. 2013, San Jose, California, USA. Talk provided by Vasileios Mezaris.
Authors/Presenters: Vasileios Mezaris and Benoit Huet.
Video hyperlinking is the introduction of links that originate from pieces of video material and point to other relevant content, be it video or any other form of digital content. The tutorial presents the state of the art in video hyperlinking approaches and in relevant enabling technologies, such as video analysis and multimedia indexing and retrieval. Several alternative strategies, based on text, visual and/or audio information are introduced, evaluated and discussed, providing the audience with details on what works and what doesn’t on real broadcast material.
Semantics at the multimedia fragment level SSSW 2013Raphael Troncy
"Semantics at the multimedia fragment level or how enabling the remixing of online media" - Invited Talk given at the Semantic Web Summer School (SSSW), 12 July 2013
The importance of Linked Media to the Future WebLinkedTV
If the future Web will be able to fully leverage the scale and quality of online media, a Web scale layer of structured, interlinked media annotations is needed, which we will call Linked Media, inspired by the Linked Data movement for making structured, interlinked descriptions of resources better available online. Mobile and tablet devices, as well as connected TVs, introduce novel application domains that will benefit from broad understanding and acceptance of Linked Media standards. In this talk, I will provide an overview of current practices and specification efforts in the domain of video and Web content integration, drawing from the LinkedTV and MediaMixer projects. From this, I will present a vision for a Linked Media layer on the future Web will can empower new media-centric applications in a world of ubiquitous online multimedia.
Interactive Video Search - Tutorial at ACM Multimedia 2015klschoef
This is the presentation given by Klaus Schoeffmann and Frank Hopfgartner at the ACM Multimedia 2015 Tutorial in Brisbane, Australia (October 26, 2015). #acmmm15
Find paper here:
http://dl.acm.org/citation.cfm?id=2807417
Video Browsing - The Need for Interactive Video Search (Talk at CBMI 2014)klschoef
These are the slides from my keynote talk about Video Browsing on June 18, 2014, at the International Workshop on Content-Based Multimedia Indexing (CBMI) 2014.
Remixing Media on the Semantic Web (ISWC 2014 Tutorial) Pt 1 Media Fragment S...LinkedTV
In this session we will introduce the W3C Media Fragment URI specification, highlighting how media fragments can be incorporated into known media description schema, with a focus on the W3C Media Ontology and the Open Annotation Model. We will also discuss extensions to these ontologies to more richly link media fragments to the concepts they represent, re-using Linked Data as a Web-wide knowledge graph about concepts. We will briefly demonstrate various approaches to visual, audio and textual analysis in order to generate meaningful media fragments out of a media resource, as well as look at available annotation tools for semantically describing online media. Finally, we show how existing text around media (subtitles, transcripts) can be used for fragment annotation through Named Entity Recognition services (NERD) and a combined approach for generating a semantic description of media from analysis, metadata and entity recognition (TV2RDF).
Remixing Media on the Semantic Web (ISWC2014 Tutorial) Pt 2 Linked Media: An...LinkedTV
The second session looks at how using Linked Data principles for media fragment annotation publication and retrieval (Linked Media) can enable online media fragment re-use:
Introducing the Linked Media principles
Publishing Linked Media using dedicated multimedia RDF repositories
Retrieval of media resources that illustrate linked data concepts
Using the Linked Data graph to find relevant links between distinct media assets (examples with SPARQL)
Retrieval of links between annotated media to enable topical browsing (using the TVEnricher service)
Examples of Linked Media at scale: VideoLyzard and HyperTED
LinkedTV. Engaging TV viewers with AudioVisual heritage on second screens EUscreen
'LinkedTV. Engaging TV viewers with AudioVisual heritage on second screens' by Lyndon Nixon (MODUL University, Vienna) and Lotte Belice Baltussen (Sound and Vision, Hilversum) - a presentation held at EUscreenXL Rome Conference 'From Audience to User: Engaging with Audiovisual Heritage Online' (http://blog.euscreen.eu/conference-programme).
How Open Data Can Enhance Interactive TelevisionLinkedTV
The presentation was delivered by Lyndon Nixon, STI International Consulting and Research GmbH, Austria, during the ngnlab.eu Workshop http://ngnlab.eu/index.php/ngnlabeu-workshop, held in Bratislava during September 20th, 2012. The workshop was co-located with the 5th joint IFIP Wireless and Mobile Networking Conference (WMNC 2012 http://wmnc.fiit.stuba.sk.
Purpose of the workshop is bringing together researchers and experts from academia as well as from business which came from Germany, Nederlands, Spain, Austria and Slovakia.
Av Relaties: Zoeken En Contextualisering In Linkedtv En AxesLinkedTV
The talk was delivered by by Lotte Belice Baltussen, Sound and Vision at the iMMovator Cross Media Café "Uit het lab", 12 February 2013, the Media Park in Hilversum, The Netherlands.
More information please visit: http://bit.ly/10gYc7L
Implementation of Hyperlinks in videos with HTML5LinkedTV
This presentation was presented by Rolf Fricke, Condat AG, during Xinnovations 2012 in Humboldt University in Berlin on September 11th, 2012.
The main objective of the LinkedTV project is the integration of hyperlinks in videos to open up new possibilities for the interactive, seamless usage of video on the Web. One challenge is the placement of tags and hyperlinks above the video layer, which should be closely associated to the underlying media fragments for persons or objects shown in the video. As the media fragments dynamically appear, move and disappear a precise synchronization of the overlays and related media fragments is needed. We plan to implement these features together with further user interface features on the basis of the HTML5 elements video, CSS3 and web workers. As we target WebTV as well as Broadcast TV, we plan to provide a restricted HbbTV 1.1 implementation for TV-sets, but we finally expect to profit from a HTML5 integration in upcoming HbbTV releases.
Survey of Semantic Media Annotation Tools - towards New Media Applications wi...LinkedTV
Semantic annotation of media resources has been a focus in research since many years, the closing of the "semantic gap" being seen as key to signicant improvements in media retrieval and browsing and
enabling new media applications and services. However, current tools and services exhibit varied approaches which do not easily integrate and
act as a barrier to wider uptake of semantic annotation of online multimedia.
In this paper, we outline the Linked Media principles which can help form a consensus on media annotation approaches, survey current media annotation tools against these principles and present two emerging
toolsets which can support Linked Media conformant annotation, closing with a call to future semantic media annotation tools and services to follow the same principles and ensure the growth of a Linked Media
layer of semantic descriptions of online media which can be an enabler to richer future online media services.
LinkedTV - an added value enrichment solution for AV content providersLinkedTV
Linked Television is offering a solution for audiovisual content owners to semi-automatically enrich media with links to additional information and content related to objects and topics in the program and build client applications which access this data and provide new added value services to consumers.
Kurzer Vortrag über ein Konzept zu einem verlinkten Fernsehdienst über HbbTV. Der Dienst wird im Rahmen des EU-geförderten Forschungs- und Entwicklungsprojekt LinkedTV (www.linkedtv.eu) u.a. beim Rundfunk Berlin-Brandenburg, rbb, in Zusammenarbeit mit internationalen Partnern entwickelt und getestet.
TV newscasts report about the latest event-related facts oc- curring in the world. Relying exclusively on them is, however, insufficient to fully grasp the context of the story being reported. In this paper, we propose an approach that retrieves and analyzes related documents from the Web to automatically generate semantic annotations that provide viewers and experts comprehensive information about the news. Using different Semantic Web and information retrieval techniques, we generate what we call Semantic Snapshot of a Newscast (NSS)
An introduction to HbbTV Hybrid Broadcast Broadband TV. What is it? How does it work? Red button and #HbbTV Service examples.
Presented at Thailand's Engineering Expo November 29, 2014 and at Thailand's Set-top box Committee's Public Hearing at Thailand's Engineering Institute (EIT) November 20, 2014.
View our MVNO SERVICE PRESENTATION for an easy read on some of the MVNO services we offer
★ MVNO/MNO NEGOTIATIONS
✓ Negotiation of MVNO wholesale agreement
✓ Negotiation of additional terms
✓ Contract review and advice
★ PRODUCT AND SALES PLANNING
✓ Product/Service Planning
✓ Market Segmentation and data
✓ Market forecasts and modelling
✓ Marketing Planning and costing
★ DEVELOPMENT OF BUSINESS PLAN
✓ Corporate Strategy
✓ Financial planning and modelling
✓ Investment analysis
✓ Operation planning
✓ Product and Marketing
★ LICENSE APPLICATION
✓ Advise and experience in relation to submitting for MVNO license
✓ Business planning and modelling with a focus on the application
✓ Preparation of License Application documentation as required by the telecom regulator
★ MVNO WORKSHOP
For more information please visit: www.yozzo.com
Yozzo's annual free report and info graphic with: Figures, tables, information and statistics about Thailand's Telecom Market end of 2015 ★
✔ Blended MOU
✔ Blended ARPU
✔ MVNO in Thailand
✔ 4G subscribers in Thailand
✔ Mobile Revenue per/minute
✔ AIS Highlights end of 2015
✔ DTAC Highlights end of 2015
✔ True Move Highlights end of 2015
✔ Mobile Internet usage in Thailand
✔ Thailand’s Mobile Subscriber Growth
✔ Smartphone sales in Thailand 2015
✔ Mobile Operator Market Shares 2015
✔ Amount of smartphone users in Thailand
✔ Types of Internet connections in Thailand
✔ Thailand’s Mobile user’s consumption and more…
¹ MVNO Definition: http://www.yozzo.com/mvno-wiki/mvno-definition
² The History of MVNO | http://www.yozzo.com/mvno-wiki/the-history-of-mvno | August 2016 | Yozzo.com
³ Why MVNOs in Thailand have failed: http://www.yozzo.com/news-and-information/mvno-mobile-operator-s/why-mvnos-in-thailand-have-failed
✔ ส่วนแบ่งการตลาดของบริการโทรศัพท์เคลื่อนที่ (ร้อยละ) : MOBILE MARKET SHARE %
✔ จำนวนผู้ใช้บริการโทรศัพท์เคลื่อนที่ (MOBILE SUBSCRIBERS)
✔ สัดส่วนรายรับของผู้ให้บริการในตลาดบริการโทรศัพท์เคลื่อนที่ ( MOBILE REVENUE % )
✔ อัตราการขยายตัวของผู้ใช้บริการโทรศัพท์เคลื่อนที่ (ร้อยละ) : MOBILE GROWTH RATE %
✔ อัตราการเข้าถึง (การใช้) บริการโทรศัพท์เคลื่อนที่ : MOBILE PENETRATION
✔ รายรับเฉลี่ยจากบริการโทรศัพท์เคลื่อนที่โดยรวมการเชื่อมต่อ (บาท/เลขหมาย/เดือน) : MOBILE ARPU EXCLUDED IC
✔ MOBILE MOU (MINUTE/MONTH)
✔ รายรับเฉลี่ยจากการให้บริการโทรศัพท์เคลื่อนที่ต่อนาที : MOBILE REVENUE PER MINUTE (RPM)
✔ สัดส่วนบริการเสียงและไม่ใช่เสียง (Mobile Non-voice/Voice Ratio)
✔ จำนวนผู้ใช้บริการโทรศัพท์ประจำที่ (Fixed Line Subscribers)
✔ สัดส่วนการเข้าถึง (การใช้) บริการโทรศัพท์ประจำที่ต่อจำนวนประชากร (ร้อยละ) : Fixed Line Penetration per Population %
✔ สัดส่วนการเข้าถึง (การใช้) บริการโทรศัพท์ประจำที่ต่อจำนวนครัวเรือน(ร้อยละ) : Fixed Line Penetration per Household %
✔ เปรียบเทียบสัดส่วนจำนวนผู้ใช้บริการระหว่างโทรศัพท์เคลื่อนที่
The importance of Linked Media to the Future WebLinkedTV
If the future Web will be able to fully leverage the scale and quality of online media, a Web scale layer of structured, interlinked media annotations is needed, which we will call Linked Media, inspired by the Linked Data movement for making structured, interlinked descriptions of resources better available online. Mobile and tablet devices, as well as connected TVs, introduce novel application domains that will benefit from broad understanding and acceptance of Linked Media standards. In this talk, I will provide an overview of current practices and specification efforts in the domain of video and Web content integration, drawing from the LinkedTV and MediaMixer projects. From this, I will present a vision for a Linked Media layer on the future Web will can empower new media-centric applications in a world of ubiquitous online multimedia.
Interactive Video Search - Tutorial at ACM Multimedia 2015klschoef
This is the presentation given by Klaus Schoeffmann and Frank Hopfgartner at the ACM Multimedia 2015 Tutorial in Brisbane, Australia (October 26, 2015). #acmmm15
Find paper here:
http://dl.acm.org/citation.cfm?id=2807417
Video Browsing - The Need for Interactive Video Search (Talk at CBMI 2014)klschoef
These are the slides from my keynote talk about Video Browsing on June 18, 2014, at the International Workshop on Content-Based Multimedia Indexing (CBMI) 2014.
Remixing Media on the Semantic Web (ISWC 2014 Tutorial) Pt 1 Media Fragment S...LinkedTV
In this session we will introduce the W3C Media Fragment URI specification, highlighting how media fragments can be incorporated into known media description schema, with a focus on the W3C Media Ontology and the Open Annotation Model. We will also discuss extensions to these ontologies to more richly link media fragments to the concepts they represent, re-using Linked Data as a Web-wide knowledge graph about concepts. We will briefly demonstrate various approaches to visual, audio and textual analysis in order to generate meaningful media fragments out of a media resource, as well as look at available annotation tools for semantically describing online media. Finally, we show how existing text around media (subtitles, transcripts) can be used for fragment annotation through Named Entity Recognition services (NERD) and a combined approach for generating a semantic description of media from analysis, metadata and entity recognition (TV2RDF).
Remixing Media on the Semantic Web (ISWC2014 Tutorial) Pt 2 Linked Media: An...LinkedTV
The second session looks at how using Linked Data principles for media fragment annotation publication and retrieval (Linked Media) can enable online media fragment re-use:
Introducing the Linked Media principles
Publishing Linked Media using dedicated multimedia RDF repositories
Retrieval of media resources that illustrate linked data concepts
Using the Linked Data graph to find relevant links between distinct media assets (examples with SPARQL)
Retrieval of links between annotated media to enable topical browsing (using the TVEnricher service)
Examples of Linked Media at scale: VideoLyzard and HyperTED
LinkedTV. Engaging TV viewers with AudioVisual heritage on second screens EUscreen
'LinkedTV. Engaging TV viewers with AudioVisual heritage on second screens' by Lyndon Nixon (MODUL University, Vienna) and Lotte Belice Baltussen (Sound and Vision, Hilversum) - a presentation held at EUscreenXL Rome Conference 'From Audience to User: Engaging with Audiovisual Heritage Online' (http://blog.euscreen.eu/conference-programme).
How Open Data Can Enhance Interactive TelevisionLinkedTV
The presentation was delivered by Lyndon Nixon, STI International Consulting and Research GmbH, Austria, during the ngnlab.eu Workshop http://ngnlab.eu/index.php/ngnlabeu-workshop, held in Bratislava during September 20th, 2012. The workshop was co-located with the 5th joint IFIP Wireless and Mobile Networking Conference (WMNC 2012 http://wmnc.fiit.stuba.sk.
Purpose of the workshop is bringing together researchers and experts from academia as well as from business which came from Germany, Nederlands, Spain, Austria and Slovakia.
Av Relaties: Zoeken En Contextualisering In Linkedtv En AxesLinkedTV
The talk was delivered by by Lotte Belice Baltussen, Sound and Vision at the iMMovator Cross Media Café "Uit het lab", 12 February 2013, the Media Park in Hilversum, The Netherlands.
More information please visit: http://bit.ly/10gYc7L
Implementation of Hyperlinks in videos with HTML5LinkedTV
This presentation was presented by Rolf Fricke, Condat AG, during Xinnovations 2012 in Humboldt University in Berlin on September 11th, 2012.
The main objective of the LinkedTV project is the integration of hyperlinks in videos to open up new possibilities for the interactive, seamless usage of video on the Web. One challenge is the placement of tags and hyperlinks above the video layer, which should be closely associated to the underlying media fragments for persons or objects shown in the video. As the media fragments dynamically appear, move and disappear a precise synchronization of the overlays and related media fragments is needed. We plan to implement these features together with further user interface features on the basis of the HTML5 elements video, CSS3 and web workers. As we target WebTV as well as Broadcast TV, we plan to provide a restricted HbbTV 1.1 implementation for TV-sets, but we finally expect to profit from a HTML5 integration in upcoming HbbTV releases.
Survey of Semantic Media Annotation Tools - towards New Media Applications wi...LinkedTV
Semantic annotation of media resources has been a focus in research since many years, the closing of the "semantic gap" being seen as key to signicant improvements in media retrieval and browsing and
enabling new media applications and services. However, current tools and services exhibit varied approaches which do not easily integrate and
act as a barrier to wider uptake of semantic annotation of online multimedia.
In this paper, we outline the Linked Media principles which can help form a consensus on media annotation approaches, survey current media annotation tools against these principles and present two emerging
toolsets which can support Linked Media conformant annotation, closing with a call to future semantic media annotation tools and services to follow the same principles and ensure the growth of a Linked Media
layer of semantic descriptions of online media which can be an enabler to richer future online media services.
LinkedTV - an added value enrichment solution for AV content providersLinkedTV
Linked Television is offering a solution for audiovisual content owners to semi-automatically enrich media with links to additional information and content related to objects and topics in the program and build client applications which access this data and provide new added value services to consumers.
Kurzer Vortrag über ein Konzept zu einem verlinkten Fernsehdienst über HbbTV. Der Dienst wird im Rahmen des EU-geförderten Forschungs- und Entwicklungsprojekt LinkedTV (www.linkedtv.eu) u.a. beim Rundfunk Berlin-Brandenburg, rbb, in Zusammenarbeit mit internationalen Partnern entwickelt und getestet.
TV newscasts report about the latest event-related facts oc- curring in the world. Relying exclusively on them is, however, insufficient to fully grasp the context of the story being reported. In this paper, we propose an approach that retrieves and analyzes related documents from the Web to automatically generate semantic annotations that provide viewers and experts comprehensive information about the news. Using different Semantic Web and information retrieval techniques, we generate what we call Semantic Snapshot of a Newscast (NSS)
An introduction to HbbTV Hybrid Broadcast Broadband TV. What is it? How does it work? Red button and #HbbTV Service examples.
Presented at Thailand's Engineering Expo November 29, 2014 and at Thailand's Set-top box Committee's Public Hearing at Thailand's Engineering Institute (EIT) November 20, 2014.
View our MVNO SERVICE PRESENTATION for an easy read on some of the MVNO services we offer
★ MVNO/MNO NEGOTIATIONS
✓ Negotiation of MVNO wholesale agreement
✓ Negotiation of additional terms
✓ Contract review and advice
★ PRODUCT AND SALES PLANNING
✓ Product/Service Planning
✓ Market Segmentation and data
✓ Market forecasts and modelling
✓ Marketing Planning and costing
★ DEVELOPMENT OF BUSINESS PLAN
✓ Corporate Strategy
✓ Financial planning and modelling
✓ Investment analysis
✓ Operation planning
✓ Product and Marketing
★ LICENSE APPLICATION
✓ Advise and experience in relation to submitting for MVNO license
✓ Business planning and modelling with a focus on the application
✓ Preparation of License Application documentation as required by the telecom regulator
★ MVNO WORKSHOP
For more information please visit: www.yozzo.com
Yozzo's annual free report and info graphic with: Figures, tables, information and statistics about Thailand's Telecom Market end of 2015 ★
✔ Blended MOU
✔ Blended ARPU
✔ MVNO in Thailand
✔ 4G subscribers in Thailand
✔ Mobile Revenue per/minute
✔ AIS Highlights end of 2015
✔ DTAC Highlights end of 2015
✔ True Move Highlights end of 2015
✔ Mobile Internet usage in Thailand
✔ Thailand’s Mobile Subscriber Growth
✔ Smartphone sales in Thailand 2015
✔ Mobile Operator Market Shares 2015
✔ Amount of smartphone users in Thailand
✔ Types of Internet connections in Thailand
✔ Thailand’s Mobile user’s consumption and more…
¹ MVNO Definition: http://www.yozzo.com/mvno-wiki/mvno-definition
² The History of MVNO | http://www.yozzo.com/mvno-wiki/the-history-of-mvno | August 2016 | Yozzo.com
³ Why MVNOs in Thailand have failed: http://www.yozzo.com/news-and-information/mvno-mobile-operator-s/why-mvnos-in-thailand-have-failed
✔ ส่วนแบ่งการตลาดของบริการโทรศัพท์เคลื่อนที่ (ร้อยละ) : MOBILE MARKET SHARE %
✔ จำนวนผู้ใช้บริการโทรศัพท์เคลื่อนที่ (MOBILE SUBSCRIBERS)
✔ สัดส่วนรายรับของผู้ให้บริการในตลาดบริการโทรศัพท์เคลื่อนที่ ( MOBILE REVENUE % )
✔ อัตราการขยายตัวของผู้ใช้บริการโทรศัพท์เคลื่อนที่ (ร้อยละ) : MOBILE GROWTH RATE %
✔ อัตราการเข้าถึง (การใช้) บริการโทรศัพท์เคลื่อนที่ : MOBILE PENETRATION
✔ รายรับเฉลี่ยจากบริการโทรศัพท์เคลื่อนที่โดยรวมการเชื่อมต่อ (บาท/เลขหมาย/เดือน) : MOBILE ARPU EXCLUDED IC
✔ MOBILE MOU (MINUTE/MONTH)
✔ รายรับเฉลี่ยจากการให้บริการโทรศัพท์เคลื่อนที่ต่อนาที : MOBILE REVENUE PER MINUTE (RPM)
✔ สัดส่วนบริการเสียงและไม่ใช่เสียง (Mobile Non-voice/Voice Ratio)
✔ จำนวนผู้ใช้บริการโทรศัพท์ประจำที่ (Fixed Line Subscribers)
✔ สัดส่วนการเข้าถึง (การใช้) บริการโทรศัพท์ประจำที่ต่อจำนวนประชากร (ร้อยละ) : Fixed Line Penetration per Population %
✔ สัดส่วนการเข้าถึง (การใช้) บริการโทรศัพท์ประจำที่ต่อจำนวนครัวเรือน(ร้อยละ) : Fixed Line Penetration per Household %
✔ เปรียบเทียบสัดส่วนจำนวนผู้ใช้บริการระหว่างโทรศัพท์เคลื่อนที่
Re-using Media on the Web tutorial: Media Fragment Creation and AnnotationMediaMixerCommunity
Explain approaches to visual, audio and textual media analysis to automatically generate meaningful media fragments out of a media resource. Demonstrate latest results in the areas of video fragmentation, visual conceopt and event detection, face detection, object re-detection, and the use of speech recognition and keyword extraction from text for supporting multimedia analysis.
Lecture given on January 28, 2019 to post-graduate students of the Computer Engineering and Media program, at the School of Journalism and Media, Aristotle University of Thessaloniki.
How to prepare a perfect video abstract for your research paper – Pubrica.pdfPubrica
A video abstract is a series of moving pictures taken from a lengthier movie that is significantly shorter than the original yet retains the original's essential meaning.
Learn More : https://bit.ly/3JVyrCW
Reference: https://pubrica.com/services/publication-support/Video-Abstract/
Why Pubrica:
When you order our services, we promise you the following – Plagiarism free | always on Time | 24*7 customer support | Written to international Standard | Unlimited Revisions support | Medical writing Expert | Publication Support | Bio statistical experts | High-quality Subject Matter Experts.
Contact us:
Web: https://pubrica.com/
Blog: https://pubrica.com/academy/
Email: sales@pubrica.com
WhatsApp : +91 9884350006
United Kingdom: +44-1618186353
How to prepare a perfect video abstract for your research paper – Pubrica.pptxPubrica
A video abstract is a series of moving pictures taken from a lengthier movie that is significantly shorter than the original yet retains the original's essential meaning.
Learn More : https://bit.ly/3JVyrCW
Reference: https://pubrica.com/services/publication-support/Video-Abstract/
Why Pubrica:
When you order our services, we promise you the following – Plagiarism free | always on Time | 24*7 customer support | Written to international Standard | Unlimited Revisions support | Medical writing Expert | Publication Support | Bio statistical experts | High-quality Subject Matter Experts.
Contact us:
Web: https://pubrica.com/
Blog: https://pubrica.com/academy/
Email: sales@pubrica.com
WhatsApp : +91 9884350006
United Kingdom: +44-1618186353
What to curate? Preserving and Curating Software-Based Artneilgrindley
This is a presentation given at the CHArt (Computers and History of Art) conference held in London in November 2011. The slides on the title page are images taken from works exhibited at the V&A Decode exhibition.
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Software Architecture: Introduction to the abstraction (May 2014_Split)Henry Muccini
This is an introductory presentation on Software Architecture that I made at the University of Split, in Croatia.
It shows what does it mean abstraction and why it is so important.
Review on content based video lecture retrievaleSAT Journals
Abstract Recent advances in multimedia technologies allow the capture and storage of video data with relatively inexpensive computers. Furthermore, the new possibilities offered by the information highways have made a large amount of video data publicly available. However, without appropriate search techniques all these data are hardly usable. Users are not satisfied with the video retrieval systems that provide analogue VCR functionality. For example, a user analyses a soccer video will ask for specific events such as goals. Content-based search and retrieval of video data becomes a challenging and important problem. Therefore, the need for tools that can be manipulate the video content in the same way as traditional databases manage numeric and textual data is significant. Therefore, a more efficient method for video retrieval in WWW or within large lecture video archives is urgently needed. This project presents an approach for automated video indexing and video search in large lecture video archives. First of all, we apply automatic video segmentation and key-frame detection to offer a visual guideline for the video content navigation. Subsequently, we extract textual metadata by applying video Optical Character Recognition (OCR) technology on key-frames and Automatic Speech Recognition on lecture audio tracks. Keywords—Feature extraction, video annotation, video browsing, video retrieval, video structure analysis
Jiri ece-01-03 adaptive temporal averaging and frame prediction based surveil...Ijripublishers Ijri
Global interconnect planning becomes a challenge as semiconductor technology continuously scales. Because of the increasing wire resistance and higher capacitive coupling in smaller features, the delay of global interconnects becomes large compared with the delay of a logic gate, introducing a huge performance gap that needs to be resolved A novel equalized global link architecture and driver– receiver co design flow are proposed for high-speed and low-energy on-chip communication by utilizing a continuous-time linear equalizer (CTLE). The proposed global link is analyzed using a linear system method, and the formula of CTLE eye opening is derived to provide high-level design guidelines and insights.
Compared with the separate driver–receiver design flow, over 50% energy reduction is observed.
Similar to Video Hyperlinking Tutorial (Part B) (20)
LinkedTV Deliverable 9.3 Final LinkedTV Project ReportLinkedTV
This document comprises the final report of LinkedTV. It includes a publishable summary of the project's scientific results and technological outcomes, a plan for use and dissemination of foreground IP and a list of dissemination activities (publications and events)
LinkedTV Deliverable 6.5 - Final evaluation of the LinkedTV ScenariosLinkedTV
The deliverable presents the results of evaluating the final
scenario demonstrators LinkedNews and LinkedCulture in the LinkedTV project. We tested specifically user satisfaction with the enriched TV experience we enabled for cultural heritage and news TV programs. We also supported the evaluation of other aspects of the LinkedTV technologies in the trials, specifically the personalization and content curation.
LinkedTV Deliverable 5.7 - Validation of the LinkedTV ArchitectureLinkedTV
The LinkedTV architecture lays the foundation for the
LinkedTV system. It consists of the integrating platform for the end-to-end functionality, the backend components and the supporting client components. Since the architecture of a software system has a fundamental impact on quality
attributes, it is important to evaluate its design. The document at hand reports on the validation of the LinkedTV architecture.
LinkedTV Deliverable 4.7 - Contextualisation and personalisation evaluation a...LinkedTV
This deliverable covers all the aspects of evaluation of the overall LinkedTV personalization workflow, as well as re-evaluations of techniques where newer technology and / or algorithmic capacity offer new insight into the general performance. The implicit contextualized personalization workflow, the implicit uncontextualized workflow in the premises of the final LinkedTV application, the advances
in context tracking given new technologies emerged and the outlook of video recommendation beyond LinkedTV is measured and analyzed in this document.
LinkedTV Deliverable 3.8 - Design guideline document for concept-based presen...LinkedTV
This document presents guidelines on how to setup enriched video experiences.
We provide user-centric guidelines on the named entities that should be detected and selected to effectively enrich video news broadcasts. This is presented in the form of a user study.
We selected 5 news videos and manually extracted the
candidate entities from various sources, such as the transcript, visual content and related articles. An expert was asked to also provide interesting entities for the videos. The resulting 99 candidate entities were presented to 50 participants via an online survey. The participants rated the level of interestingness of the entities and the usefulness of
information from Wikipedia about these entities. Analysis of
the results shows that users prefer entities of the type
organization and person and have little interest for entities of the type location. They also indicate that subtitles are not
enough as a source of interesting entities and that the amount of interesting entities can be improved by the combined use of subtitles with entities extracted from related articles or entities suggested by an expert. The expert suggestions showed to be more accurate than any other source of entities. Wikipedia seems to be a suitable source of additional information about the entities in the news, but should be complemented with additional sources.
We provide engineering guidelines on how to present,
aggregate and process content for TV program companion
applications. We describe the content processing pipeline that was developed in WP3 to feed the content for the LinkedNews and Linked Culture demonstrators. This shows how content from the Web can be re-purposed to enrich videos by extracting the core display content and presenting it in a uniform way to the user.
LinkedTV Deliverable 2.7 - Final Linked Media Layer and EvaluationLinkedTV
This deliverable presents the evaluation of content annotation and content enrichment systems that are part of the final tool set developed within the LinkedTV consortium. The evaluations were performed on both the Linked News and Linked Culture trial content, as well as on other content annotated for this purpose. The evaluation spans three languages: German (Linked News), Dutch (Linked
Culture) and English. Selected algorithms and tools were also subject to benchmarking in two international contests: MediaEval 2014 and TAC’14. Additionally, the Microposts 2015 NEEL Challenge is being organized with the support of LinkedTV.
LinkedTV Deliverable 1.6 - Intelligent hypervideo analysis evaluation, final ...LinkedTV
This deliverable describes the conducted evaluation activities for assessing the performance of a number of developed methods for intelligent hypervideo analysis and the usability of the implemented Editor Tool for supporting video annotation and enrichment. Based on the performance evaluations reported in D1.4 regarding a set of LinkedTV analysis components, we extended our experiments for assessing the effectiveness of newer versions of these methods as well as of entirely new techniques, concerning the accuracy and the time efficiency
of the analysis. For this purpose, in-house experiments and participations at international benchmarking activities were made, and the outcomes are reported in this deliverable. Moreover, we present the results of user trials regarding the developed Editor Tool, where groups of experts assessed its usability and the supported functionalities, and
evaluated the usefulness and the accuracy of the implemented video segmentation approaches based on the analysis requirements of the LinkedTV scenarios. By this deliverable we complete the reporting of WP1 evaluations that aimed to assess the efficiency of the developed
multimedia analysis methods throughout the project, according to the analysis requirements of the LinkedTV scenarios.
LinkedTV Deliverable 5.5 - LinkedTV front-end: video player and MediaCanvas A...LinkedTV
The LinkedTV media player and API has evolved from a single player and limited API in version 1 to a toolkit to allow rapid development and creation of different kind of applications within the HTML5 / multiscreen space. The main reason for this transition is that during the course of the Linked TV project different partners had different requirements for their scenarios. Instead of trying to fit all these requirements into one player and, most likely, compromise on the functionalities of the scenarios we wanted to offer something that would allow all partners a satisfiable solution.
Therefore the Springfield Multiscreen Toolkit, or short SMT, has been developed. The aim for the SMT was to allow flexibility for developing multiscreen applications. Also from a commercial point of view a toolkit with examples is more interesting than a pure player as it gives the freedom of developing new ideas with the LinkedTV platform.
LinkedTV tools for Linked Media applications (LIME 2015 workshop talk)LinkedTV
A brief introduction to tools from the LinkedTV project which can be used together to build new media applications based on conceptual linking of media fragments.
LinkedTV Deliverable D4.6 Contextualisation solution and implementationLinkedTV
This deliverable presents the WP4 contextualisation final im-plementation. As contextualization has a high impact on all the other modules of WP4 (especially personalization and recom-mendation), the deliverable intends to provide a picture of the final WP4 workflow implementation.
LinkedTV Deliverable D3.7 User Interfaces selected and refined (version 2)LinkedTV
This report describes the LinkedTV user interfaces. Based on the results user studies and the initial evaluation of the year 2 prototype we selected and refined the interfaces. We selected a single screen application that uses HbbTV technology to provide additional information about a TV program as an overlay on the TV broadcast. In addition, we worked towards TV program companion applications that are tailored for two domains: news and cultural heritage. With these applications we demonstrate different types of interaction modes, such as synchronized content on a second screen, and bookmarking chapters combined with the exploration of related content after the program. The interfaces are built on top of the Multiscreen Toolkit. We created a component-based infrastructure that allows us to quickly create tailored companion applications by reusing and configuring interface components. In the final part of the project we finalize this approach and test it by applying it to a new domain.
LinkedTV Deliverable D2.6 LinkedTV Framework for Generating Video Enrichments...LinkedTV
This deliverable describes the final LinkedTV framework that provides a set of possible enrichment resources for seed video content using techniques such as text and web mining, information extraction and information retrieval technologies. The enrichment content is obtained from four type of sources: a) by crawling and indexing web sites described in a white list specified by the content partners,
b) by querying the API or SPARQL endpoint of the Europeana digital library network which is publicly exposed, c) by querying multiple social networking APIs, d) by hyperlinking to other parts of TV programs within the same collection using a Solr index. This deliverable
also describes an additional content annotation functionality, namely labelling enrichment (as well as seed) content with thematic topics, as well as the process of exposing content annotations to this module and to the filtering services of LinkedTV’s personalization workflow. We illustrate the enrichment workflow for the two main scenarios of LinkedTV which have lead to the development of the LinkedCulture and LinkedNews applications, which respectively use the TVEnricher and TVNewsEnricher enrichment services. The original title of this deliverable from the DoW was Advanced concept labelling by complementary Web mining.
LinkedTV Deliverable D1.5 The Editor Tool, final release LinkedTV
This document reports on the design and implementation of the final version of the editor tool (ET) v2.0, where its purpose is to serve the program editing teams of broadcasters that have adopted LinkedTV’s interactive television solution into their workflow. Two of these teams are currently represented in the LinkedTV project, namely the RBB team and the AVROTROS team (formerly known as AVRO).
The main purpose of the ET is to provide a means to correct and curate automatically generated annotations and hyperlinks created by the audiovisual and textual analysis technologies developed in WP 1 and 2 of the LinkedTV project. Without the intervention of human editors to correct this data, there is a reasonable risk of exposing inappropriate, incorrect or irrelevant information to the viewers of a LinkedTV interactive broadcast.
LinkedTV Deliverable D1.4 Visual, text and audio information analysis for hyp...LinkedTV
Having extensively evaluated the performance of the technologies included in the first release of WP1 multimedia analysis tools, using content from the LinkedTV scenarios and by participating in international benchmarking activities, concrete decisions regarding the
appropriateness and the importance of each individual method or combination of methods were made, which, combined with an updated list of information needs for each scenario, led to a new set of analysis requirements that had to be addressed through the release of the final set of analysis techniques of WP1. To this end, coordinated efforts on three directions, including
(a) the improvement of a number of methods in terms of accuracy and time efficiency,
(b) the development of new technologies and (c) the definition of synergies between methods for obtaining new types of information via multimodal processing, resulted in the final bunch of multimedia analysis methods for video hyperlinking. Moreover, the different developed analysis modules have been integrated into a web-based infrastructure, allowing the fully automatic linking of the multitude of WP1 technologies and the overall LinkedTV platform.
LinkedTV D8.6 Market and Product Survey for LinkedTV Services and TechnologyLinkedTV
D8.6 presents the results of the market analysis for LinkedTV products and services and consists of
two parts: an overall analysis of current and future
developments in the TV and digital video market and a specific market analysis of potential LinkedTV customers and competitors. Based on the market analysis it was possible to provide a first rough estimation of the LinkedTV market potential and to position LinkedTV on the market.
This deliverable presents the LinkedTV Public Demonstrator which will be an online, publicly accessible Website collecting showcases of the key project outputs which form together our LinkedTV solution: the Editor Tool, Platform and Player, complemented by demonstrations of the provision of this solution for the content of two European broadcasters: the LinkedCulture and LinkedNews scenario demonstrators.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Knowledge engineering: from people to machines and back
Video Hyperlinking Tutorial (Part B)
1. Video Hyperlinking
Part B: Video Fragment Creation and Annotation
for Hyperlinking
Vasileios Mezaris
Information Technologies Institute (ITI)
Centre for Research and Technology Hellas (CERTH)
IEEE ICIP’14 Tutorial, Oct. 2014
ACM MM’14 Tutorial, Nov. 2014
Information Technologies Institute
Centre for Research and Technology Hellas
2. Overview
• Introduction – overall motivation
• Technologies for media fragment creation and annotation
– Video temporal segmentation to shots
– Video temporal segmentation to scenes
– Visual concept detection
– Event detection
– Other potentially relevant technologies
• Concluding remarks
For each presented technology, we go through:
– Problem statement
– Brief overview of the literature
– A closer look at one or two approaches
– Indicative experiments and results, demos
– Conclusions
– Additional reading (references)
Information Technologies Institute 2.2
Centre for Research and Technology Hellas
3. Introduction - motivation
• The text hyperlinking analogy
A conference website
(multiple web pages)
Origin of hyperlinks
Target
Information Technologies Institute 2.3
Centre for Research and Technology Hellas
4. Introduction - motivation
• The text hyperlinking analogy
– Web sites accommodating hyperlinks, and the origins and targets of these
hyperlinks represent information at different levels of granularity
Web site: a collection of interlinked web
pages, typically by the same creator(s),
altogether serving the same purpose
Origin of a hyperlink: a very elementary
text (a single word or a short phrase);
an iconic representation of a topic
Target of a hyperlink: a focused, yet
explanatory and understandable even
on its own piece of information
An entire video
*Practical restriction: how do you
select a 2D image that appears on
your screen for e.g. 1/25 sec?
A very elementary
part of the video
(a 2D image? a shot?)*
A meaningful story-telling
part of the video
(scene / chapter / topic)
Information Technologies Institute 2.4
Centre for Research and Technology Hellas
5. Introduction - motivation
• We have: media items (videos)
• We want to: perform video hyperlinking
• Thus, we need to:
– Break down each video to fragments that can serve as the origin and the
target of hyperlinks
– Annotate each fragment so that we can identify meaningful pairs of fragments
(i.e., identify two fragments as the origin and the target of the same link)
• We could do this either
– Manually: + accuracy, - speed, cost, effort only feasible for low-volume,
high-value media items
– Automatically: - accuracy, + speed, cost, effort the only option for handling
high volume of content
– …as is often the case, in many real-world applications finding middle ground
also makes sense: try to automate the process as much as possible, but still
keep the human in the loop
Information Technologies Institute 2.5
Centre for Research and Technology Hellas
6. Introduction - motivation
• We look into automatic analysis techniques in this part of the talk
• Break down each video to fragments that can serve as the origin and the
target of hyperlinks
– Video temporal segmentation to shots
– Video temporal segmentation to scenes
– Other potentially relevant technologies: object re-detection
• Annotate each fragment so that we can identify meaningful pairs of
fragments (i.e., identify two fragments as the origin and the target of the
same link)
– Visual concept detection
– Event detection
– Other potentially relevant technologies: near-duplicate detection
Information Technologies Institute 2.6
Centre for Research and Technology Hellas
7. Fragment creation: shots
• What is a shot: a sequence of consecutive frames taken without
interruption by a single camera
• Shot segmentation
– Temporal decomposition of videos by detecting the boundaries or the
changes between the shots
– Shots can serve as the origins of hyperlinks
– Is an important pre-processing step for further analysis (scene segmentation,
concept/event detection, object re-detection)
• Shot change is manifested by a shift in visual content
– Two basic types of transitions: ABRUPT transitions
Shot k Shot k+1
Information Technologies Institute 2.7
Centre for Research and Technology Hellas
8. Fragment creation: shots
GRADUAL transitions
Dissolve
Transition effects
Shot k Shot k+1
Wipe
Shot m Shot m+1
Fade in – Fade out
Shot n Shot n+1
Information Technologies Institute 2.8
Centre for Research and Technology Hellas
9. Fragment creation: shots
• The analysis method should be robust to artifacts such as illumination
changes, fast camera movement, rapid local (object) motion
Sudden illumination change
Camera motion
Partial occlusion due to object motion (car)
Information Technologies Institute 2.9
Centre for Research and Technology Hellas
10. Related work
• Can generally be organized according to
– Data to work with: uncompressed vs. compressed video
– Features to use (also depends on the data)
– Threshold-based vs. learning-based methods
• Compressed video methods
– Reduce computational complexity by avoiding decoding, exploiting encoder
results
• Macroblock information of specific frames (e.g. intra-coded, skipped)
• DC coefficients of the compressed images
• Motion vectors included in the compressed data stream
– Generally, very fast but not as accurate as uncompressed video methods
Information Technologies Institute 2.10
Centre for Research and Technology Hellas
11. Related work
• Uncompressed video methods
– Pair-wise pixel comparisons
– Global visual feature comparisons (e.g. color histogram comparison)
– Edge-based approaches, e.g. evaluating an edge change ratio
– Motion-based approaches
– Local visual features / Bag of Visual Words
• Some features more computationally expensive than others
– Deciding using experimentally-defined thresholds: often hard to tune the
alternative is to use machine learning techniques (often Support Vector
Machines (SVMs)) for learning from different features
• General remark: high detection accuracy and relatively low computational
complexity are possible when working with uncompressed data
Information Technologies Institute 2.11
Centre for Research and Technology Hellas
12. An indicative approach
• Uncompressed video approach (based on 1)
• Detects both abrupt and gradual shot transitions by:
– Extracting global and local visual features (HSV histograms, ORB descriptors)
for every frame and evaluating the differences between every pair of
consecutives frames (abrupt transitions)
– Analyzing the sequence of similarity scores, to identify patterns that
correspond to progressive changes of the visual content and then applying
dissolve, wipe and movement detectors (gradual transitions)
1 E. Apostolidis, V. Mezaris, "Fast Shot Segmentation Combining Global and Local Visual Descriptors", Proc. IEEE Int.
Conf. on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, May 2014.
Information Technologies Institute 2.12
Centre for Research and Technology Hellas
13. An indicative approach
• Gradual transition detection
– Step 1: pair-wise frame similarity scores F(x) (blue curve) moving average G(k) (red)
– Step 2: compute first order deriv. of G(k) and store local minima, maxima in Gmin, Gmax
– Step 3: If frame k is the p-th element in Gmin, compute
– how
D(k) F(Gmin(p))F(Gmax(p1)) F(Gmin(p))F(Gmax(p))
– Step 4: Compute C(k) (green) as the sum of the values of D(k) within a sliding window
– Step 5: Potential gradual transitions are identified by thresholding C(k)
– Detection: Gradual transitions are defined by evaluating each potential transition using
(a) dissolve detector, (b) wipe detection and (c) detector for object/camera movement
1
0,9
0,8
0,7
0,6
0,5
0,4
0,3
0,2
0,1
0
Correct
detection
Erroneous
detection
2370
2374
2378
2382
2386
2390
2394
2398
2402
2406
2410
2414
2418
2422
2426
2430
2434
2438
2442
2446
2450
2454
2458
2462
2466
2470
2474
2478
2482
2486
2490
2494
2498
2502
2506
2510
2514
2518
2522
2526
2530
2534
2538
2542
2546
2550
2554
2558
2562
2566
2570
2574
2578
2582
2586
2590
2594
2598
2602
2606
2610
2614
2618
2622
2626
2630
Calculated similarity scores Moving average of similarity scores Dissimilarity values of detected shot boundaries
Information Technologies Institute 2.13
Centre for Research and Technology Hellas
14. Indicative experiments and results
• Dataset
– About 7 hours of video
• 150 min. of news shows
• 140 min. of cultural heritage shows
• 140 min. of various other genres
• Ground-truth (generated via manual annotation)
– 3647 shot changes
• 3216 abrupt transitions
• 431 gradual transitions
• System specifications
– Intel Core i7 processor at 3.4GHz
– 8GB RAM memory
Information Technologies Institute 2.14
Centre for Research and Technology Hellas
15. Indicative experiments and results
• Detection accuracy expressed in terms of:
– Precision (P): the fraction of detected shots that correspond to actual shots of
the videos
– Recall (R): the fraction of actual shots of the videos, that have been
successfully detected
– F-Score: 2(PR)/(P+R)
• Time performance: runs ~ 7-8 times
faster than real-time (utilizing multi-threading)
Experimental Results
Precision 94.3 %
Recall 94.1 %
F-Score 0.942
• Demo video http://www.youtube.com/watch?v=0IeVkXRTYu8
• Software available at http://mklab.iti.gr/project/video-shot-segm
Information Technologies Institute 2.15
Centre for Research and Technology Hellas
16. Shot detection conclusions
• Overall accuracy of shot detection methods is high, sufficient for any
application
• Detection of abrupt transitions is very easy; gradual transitions & handling
of intense motion still a bit more challenging
• Several times faster-than-real-time processing is feasible
• More elaborate motion-based analysis is still possible (but would increase
the computational load)
Information Technologies Institute 2.16
Centre for Research and Technology Hellas
17. Shot detection: additional reading
• E. Apostolidis, V. Mezaris, "Fast Shot Segmentation Combining Global and Local Visual Descriptors", Proc. IEEE
Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, May 2014.
• E. Tsamoura, V. Mezaris, and I. Kompatsiaris, “Gradual transition detection using color coherence and other
criteria in a video shot meta-segmentation framework”, Proc. 15th IEEE Int. Conf. on Image Processing, 2008.
• V. Chasanis, A. Likas, and N. Galatsanos, “Simultaneous detection of abrupt cuts and dissolves in videos using
support vector machines,” Pattern Recogn. Lett., vol. 30, no. 1, pp. 55–65, Jan. 2009.
• Z. Qu, Y. Liu, L. Ren, Y. Chen, and R. Zheng, “A method of shot detection based on color and edge features,”
Proc. 1st IEEE Symposium on Web Society, 2009. SWS ’09. 2009, pp. 1–4.
• J. Lankinen and J.-K. Kamarainen, “Video shot boundary detection using visual bag-of-words,” Proc. Int. Conf.
on Computer Vision Theory and Applications (VISAPP), Barcelona, Spain, 2013.
• J. Li, Y. Ding, Y. Shi, and W. Li, “A divide-and-rule scheme for shot boundary detection based on sift,” JDCTA,
pp. 202–214, 2010.
• C.-W. Su, H.-Y.M. Liao, H.-R. Tyan, K.-C. Fan, L.-H. Chen, “A motion-tolerant dissolve detection algorithm”, IEEE
Transactions on Multimedia, vol. 7, no. 6, pp. 1106–1113, 2005.
• K. D. Seo, S. Park, S. H. Jung, “Wipe scene-change detector based on visual rhythm spectrum”, IEEE
Transactions on Consumer Electronics, vol. 55, no. 2, pp. 831–838, May 2009.
• D. Lelescu and D. Schonfeld, “Statistical sequential analysis for real-time video scene change detection on
compressed multimedia bitstream”, IEEE Transactions on Multimedia,,vol. 5, no. 1, pp. 106–117, 2003.
• J. H. Nam and A.H. Tewfik, “Detection of gradual transitions in video sequences using b-spline interpolation”,
IEEE Transactions on Multimedia, vol. 7, no. 4, pp. 667–679, 2005.
• C. Grana and R. Cucchiara, “Linear transition detection as a unified shot detection approach”, IEEE
Transactions on Circuits and Systems for Video Technology, vol. 17, no. 4, pp. 483–489, 2007.
• Z. Cernekova, N. Nikolaidis, and I. Pitas, “Temporal video segmentation by graph partitioning”, Proc. IEEE Int.
Conf. on Acoustics, Speech and Signal Processing, ICASSP 2006.
Information Technologies Institute 2.17
Centre for Research and Technology Hellas
18. Fragment creation: scenes
• What is a scene: a higher-level temporal video segment that is elementary
in terms of semantic content, covering either a single event or several
related events taking place in parallel
• Scene segmentation: temporal decomposition of videos into such basic
• Scene change is not manifested by just a change in visual content
Scene Change?
story-telling units
Information Technologies Institute 2.18
Centre for Research and Technology Hellas
19. Problem statement
• Basic assumptions
– A shot cannot belong to more than one scenes
– Scene boundaries are a subset of the visual shot boundaries of the video
• Then, scene segmentation can (and typically is) performed by
• Shot segmentation, and
• Shot grouping
Shot boundaries
Inter-shot similarity
One
scene
Information Technologies Institute 2.19
Centre for Research and Technology Hellas
20. Related work
• Can generally be organized according to
– Data to work with: uni-modal vs. multi-modal
– Dependence or not on specific-domain knowledge; domain of choice
– Algorithms used
• Uni-modal vs. multi-modal
– Uni-modal methods use one type of information, typically visual cues
– Multi-modal ones may combine visual cues, audio, speech transcripts, …
• Domain-specific vs. domain-independent
– Domain-independent methods are generally applicable
– News-domain (e.g. using knowledge of news structure), TV broadcast domain
(e.g. based on advertisement detection), etc.
• Algorithms
– Graph-based, e.g. the Scene Transition Graph
– Clustering-based, e.g. using hierarchical clustering
– Based on statistical methods, e.g. on Markov Chain Monte Carlo (MCMC)
Information Technologies Institute 2.20
Centre for Research and Technology Hellas
21. An indicative approach
• Based on the Scene Transition Graph (STG) algorithm 2
Shot 115 Shot 116 Shot 117
Shot 120 Shot 119 Shot 118
Shot 121
cut edge
Shots 116 & 118
Shots 115 & 120
Shots 117 & 119
Shot 121
2 M. Yeung, B.-L. Yeo, and B. Liu. Segmentation of video by clustering and graph analysis. Computer Vision Image
Understanding, 71(1):94–109, July 1998.
Information Technologies Institute 2.21
Centre for Research and Technology Hellas
22. An indicative approach
• Introduces two
extensions of the STG 3
– Fast STG
approximation
(scenes as convex
sets of shots; linking
transitivity rules)
– Generalized STG
(probabilistic merging
of multiple STGs
created with
different parameter
values, different
features)
3 P. Sidiropoulos, V. Mezaris, I. Kompatsiaris, H. Meinedo, M. Bugalho, and I. Trancoso. Temporal video segmentation
to scenes using high-level audiovisual features. IEEE Transactions on Circuits and Systems for Video Technology,
21(8):1163 –1177, August 2011.
Information Technologies Institute 2.22
Centre for Research and Technology Hellas
23. Indicative experiments and results
• Dataset
– 513 min. of documentaries (A)
– 643 min. of movies (B)
• Ground-truth (generated via manual annotation)
– 3459 (in A) + 6665 (in B) = 10125 shot changes
– 525 (in A) + 357 (in B) = 882 scene changes
• System specifications
– Intel Core i7 processor at 3.4GHz
– 8GB RAM memory
Information Technologies Institute 2.23
Centre for Research and Technology Hellas
24. Indicative experiments and results
• Detection accuracy expressed in terms of:
– Coverage (C): to what extent frames belonging to the same scene are
correctly grouped together (optimal value 100%)
– Overflow (O): the quantity of frames that, although not belonging to the same
scene, are erroneously grouped together (optimal value 0%)
– F-Score = 2C(1−O)/(C+(1−O))
Coverage (%) Overflow (%) F-Score (%)
Documentaries 76.96 20.80 78.06
Movies 73.55 26.11 73.72
• Time performance: the algorithm runs in 0,015x real time (i.e., ~67 times
faster than real-time), as long as the features have been extracted
• Software available at http://mklab.iti.gr/project/video-shot-segm (using low-level
visual features only)
Information Technologies Institute 2.24
Centre for Research and Technology Hellas
25. Indicative experiments and results
• Contribution of different modalities (on a different dataset)
V: low-level visual
features
VC: visual concept
detection results
A: low-level audio
features
AE: audio event
detection results
Information Technologies Institute 2.25
Centre for Research and Technology Hellas
26. Scene detection conclusions
• Automatic scene segmentation less accurate than shot segmentation…
• …but the results are good enough for improving access to meaningful
fragments in various applications (e.g. retrieval, video hyperlinking)
• Using more than just low-level visual features helps a lot
• The choice of domain-specific vs. domain-independent method should be
taken seriously
Information Technologies Institute 2.26
Centre for Research and Technology Hellas
27. Scene detection: additional reading
• M. Yeung, B.-L. Yeo, and B. Liu, “Segmentation of video by clustering and graph analysis”, Computer Vision Image
Understanding, vol. 71, no. 1, pp. 94–109, July 1998.
• P. Sidiropoulos, V. Mezaris, I. Kompatsiaris, H. Meinedo, M. Bugalho, and I. Trancoso, “Temporal video segmentation to
scenes using high-level audiovisual features”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 21,
no. 8, pp. 1163 –1177, August 2011.
• P. Sidiropoulos, V. Mezaris, I. Kompatsiaris, J. Kittler, “Differential Edit Distance: A metric for scene segmentation
evaluation”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 6, pp. 904-914, June 2012.
• Z. Rasheed and M. Shah, “Detection and representation of scenes in videos”, IEEE Transactions on Multimedia, vol. 7,
no. 6, pp. 1097–1105, December 2005.
• C.-W. Ngo, Y.-F. Ma, and H.-J. Zhang, “Video summarization and scene detection by graph modeling”, IEEE Transactions
on Circuits and Systems for Video Technology, vol. 15, no. 2, pp. 296–305, February 2005.
• Y. Zhao, T. Wang, P. Wang, W. Hu, Y. Du, Y. Zhang, G. Xu, “Scene segmentation and categorization using N-cuts”, IEEE
Conf. on Computer Vision and Pattern Recognition, 2007.
• X. Zhu, A.K. Elmagarmid, X. Xue, L. Wu, and A.C. Catlin, “Insightvideo: toward hierarchical video content organization for
efficient browsing, summarization and retrieval”, IEEE Transactions on Multimedia, vol. 7, no. 4, pp. 648– 666, August
2005.
• B. T. Truong, S. Venkatesh, and C. Dorai, “Scene extraction in motion pictures”, IEEE Transactions on Circuits and
Systems for Video Technology, vol. 13, no. 1, pp. 5–15, January 2003.
• X.-S. Hua, L. Lu, and H.-J. Zhang, “Optimization-based automated home video editing system”, IEEE Transactions on
Circuits and Systems for Video Technology, vol. 14, no. 5, pp. 572–583, May 2004.
• D. Gatica-Perez, A. Loui, and M.-T. Sun, “Finding structure in consumer videos by probabilistic hierarchical clustering”,
IEEE Transactions on Circuits and Systems for Video Technology, 2002.
Information Technologies Institute 2.27
Centre for Research and Technology Hellas
28. Scene detection: additional reading
• J. Liao and B. Zhang, “A robust clustering algorithm for video shots using haar wavelet transformation”, Proc.
SIGMOD2007 Workshop on Innovative Database Research (IDAR2007), Beijing, China, June 2007.
• Y. Zhai and M. Shah, “Video scene segmentation using markov chain monte carlo”, IEEE Transactions on Multimedia,
vol. 8, no. 4, pp. 686–697, August 2006.
• M. Sugano, K. Hoashi, K. Matsumoto, and Y. Nakajima, “Shot boundary determination on MPEG compressed domain
and story segmentation experiments for trecvid 2004, in trec video retrieval evaluation forum”, Proc. TREC Video
Retrieval Evaluation (TRECVID), Washington, DC, USA, pp. 109–120, 2004.
• L. Xie, P. Xu, S.-F. Chang, A. Divakaran, and H. Sun, “Structure analysis of soccer video with domain knowledge and
hidden markov models”, Pattern Recogn. Lett., vol. 25, no. 7, pp. 767–775, May 2004.
• H. Lu, Z. Li, and Y.-P. Tan, “Model-based video scene clustering with noise analysis”, Proc. Int. Symposium on Circuits
and Systems, ISCAS ’04, vol. 2, pp. 105–108, May 2004.
• Y. Ariki, M. Kumano, and K. Tsukada, “Highlight scene extraction in real time from baseball live video”, Proc. 5th ACM
SIGMM international workshop on Multimedia information retrieval, MIR ’03, pp. 209–214, New York, NY, USA, 2003.
• U. Iurgel, R. Meermeier, S. Eickeler, and G. Rigoll, “New approaches to audio-visual segmentation of tv news for
automatic topic retrieval”, Proc. IEEE Int. Conf. on the Acoustics, Speech, and Signal Processing (ICASSP ’01), vol. 3, pp.
1397–1400, Washington, DC, USA, 2001.
• Y. Cao, W. Tavanapong, K. Kim, and J.H. Oh, “Audio-assisted scene segmentation for story browsing”, Proc. 2nd Int.
Conf. on Image and Video Retrieval, CIVR’03, pp. 446–455, Springer-Verlag.
• A. Velivelli, C.-W. Ngo, and T. S. Huang, “Detection of documentary scene changes by audio-visual fusion”, Proc. 2nd Int.
Conf. on Image and Video Retrieval, CIVR’03, pp. 227–238, Springer-Verlag.
Information Technologies Institute 2.28
Centre for Research and Technology Hellas
29. Fragment annotation: visual
concept detection
• Goal: assign one or more semantic concepts to temporal video fragments
(typically, shots), from a predefined concept list
– Input: visual fragment or representative visual information (e.g. keyframes)
– Output: concept labels and associated confidence scores (DoC)
• Application: find similar fragments to interlink (concept-based annotation,
image/video search and retrieval; also clustering, summarization, further
analysis e.g. event detection)
• Concept detection is challenging: semantic gap, annotation effort,
computational requirements,…
hand: 0.97,
sky: 0.93,
sea: 0.91,
boat: 0.91, …
Sample
keyframe
Information Technologies Institute 2.29
Centre for Research and Technology Hellas
30. Related work
• Typical concept detection system: made of independent concept detectors
– Feature extraction (typically local features; choices of IP detectors/ descriptors)
– Feature encoding (Bag of Words, Fisher vectors, VLAD,…)
– Training/classification (supervised learning; need for annotated training data)
– Late fusion and output score post-processing
• How to build a competitive system
– Use color-, rotation- , scale- invariant descriptors; SoA encoding of them
– Fuse multiple descriptors and machine learning-based detectors
– Exploit concept correlations (e.g., sun & sky often appearing together)
– Exploit temporal information (videos)
Information Technologies Institute 2.30
Centre for Research and Technology Hellas
31. • Feature extraction
– Visual features
Related work
• Global vs. local
• Popular local descriptors SIFT, Color SIFT , SURF; also look at binary local descriptors
• Interest point detection: Harris-Laplace, Hessian, dense sampling
• Based on Deep Learning / Convolutional Neural Networks (e.g. OverFeat, Caffe)
– Motion features (STIP, MoSIFT, feature trajectories,…)
– Others modalities (text, audio): of limited use
• Feature encoding
– Bag-of-words (BoW): codebook construction (K-means); hard/soft assignment
– Fisher vectors: characterize each keyframe by a gradient vector
– Others: VLAD, Super Vector (SV),…
– Capture spatial information: Pyramidal decomposition,…
– Capture information at multiple scales: Pyramid Histogram Of visual Words
(PHOW)
Information Technologies Institute 2.31
Centre for Research and Technology Hellas
32. • Machine learning
Related work
– Binary classification: Support Vector Machines (SVMs), Logistic Regression,…
• Linear vs. kernel methods
– Multi-label learning approaches; Stacking
• Fusion
– Of features (early / late)
– Of detectors
• Post-processing: temporal re-ranking,…
Information Technologies Institute 2.32
Centre for Research and Technology Hellas
33. An indicative approach
• Concept detection using a two-layer stacking architecture
– 1st layer: Build multiple fast and independent concept detectors
• Use keyframes, and visual tomographs to capture the motion information
• Extract high-dimensional feature vectors
• Train Linear Support Vector Machine / Logistic Regression classifiers
• Easily scalable but does not capture concept correlations
– 2nd layer: Exploit concept correlations and refine the scores
• Construct low-dimensional model vectors
• Use multi-label learning to capture the concept correlations (e.g. ML-kNN or Label
Powerset algorithms)
• Use temporal re-ranking to exploit temporal information
Information Technologies Institute 2.33
Centre for Research and Technology Hellas
34. Video tomographs
• Video tomographs: 2-dimensional
slices with one dimension in time
and one dimension in space
• We extract two tomographs; use
them together with keyframes
• The two tomographs are processed
in the same way as keyframes 4
(in latest experiments, just 3 SIFT
or color-SIFT variants, with dense
sampling and VLAD encoding)
4 P. Sidiropoulos, V. Mezaris, I. Kompatsiaris, "Video tomographs and a base detector selection strategy for improving
large-scale video concept detection", IEEE Trans. on Circuits and Systems for Video Tech., 24(7), pp. 1251-1264, July 2014.
Information Technologies Institute 2.34
Centre for Research and Technology Hellas
35. 2nd-layer of stacking architecture
The proposed
2nd layer
The first
layer
(baseline)
Information Technologies Institute 2.35
Centre for Research and Technology Hellas
36. 2nd-layer of stacking architecture
• Exploit concept correlations
– Construct model vectors by concatenating the responses of the concept
detectors on a validation set
– Use them to train 2nd-layer classifier
• ML-kNN as the 2nd-layer classifier 5
– A lazy style multi-label learning algorithm; uses label correlations in the
neighbourhood of the tested instance to infer posterior probabilities.
• Label Powersets (LP) as the 2nd-layer classifier 5
– Searches for subsets of labels that appear together in the training set;
considers each set as a separate class in order to solve a multi-class problem
5 F. Markatopoulou, V. Mezaris, I. Kompatsiaris. A Comparative Study on the Use of Multi-Label Classification
Techniques for Concept-Based Video Indexing and Annotation. Proc. 20th Int. Conf. on Multimedia Modeling
(MMM'14), Jan. 2014.
Information Technologies Institute 2.36
Centre for Research and Technology Hellas
37. Video tomographs: indicative results
• Experimental setup
– The TRECVID Semantic Indexing Task: Using the concept detectors retrieve for
each concept a ranked list of 2000 test shots that are mostly related with it
– Dataset: TRECVID 2013 (~800 and ~200 hours of internet archive videos for
training and testing), 38 concepts (13 of them motion-related)
– Evaluation: Mean Extended Inferred Average Precision (MxinfAP)
Information Technologies Institute 2.37
Centre for Research and Technology Hellas
38. Two-layer stacking: indicative results
• Experimental setup
– Indexing: for a concept, assess how well the top retrieved shots relate to it
– Annotation: for a shot, assess how well the top retrieved concepts describe it
– Dataset: TRECVID 2013 (internet videos; ~800 hours for training; ~200 hours
for testing)
– Input: model vectors for 346 concepts
– Output: refined scores for 38 concepts
– Evaluation: Mean Extended Inferred Average Precision (MXinfAP), Mean
Precision at depth 3 (MP@3)
MXinfAP (%) (indexing) MAP@3 (%) (annotation)
Method 2nd layer 1st and 2nd layer 2nd layer 1st and 2nd layer
Baseline 23.53 23.53 77.66 77.66
LP 24.2 (+2.8%) 24.72 (+5.1%) 80.98 (+4.3%) 79.1 (+1.9%)
ML-kNN 18.1 (-23.1%) 23.61 (+0.3%) 77.01 (-0.8%) 79.34 (+2.2%)
Information Technologies Institute 2.38
Centre for Research and Technology Hellas
39. Concept detection conclusions
• Concept detection has progressed a lot
• Results far from perfect; yet, already useful in a variety of applications
(hyperlinking, retrieval, further analysis of fragments)
• Motion information is important (but, extracting traditional motion
descriptors more computationally expensive than working with
keyframes / tomographs)
• Linear SVMs very popular (due to the size of the problem)
• Exploiting concept correlations is very important
• Computationally-efficient concept detection, considering hundreds or
thousands of concepts, is still a challenge
• We are using binary local descriptors for real-time detection of a few
hundred concepts simultaneously, with good results (but, is real-time
always fast enough?)
Demo: http://multimedia.iti.gr/mediamixer/demonstrator.html
Information Technologies Institute 2.39
Centre for Research and Technology Hellas
40. Concept detection: additional reading
• P. Sidiropoulos, V. Mezaris, and I. Kompatsiaris, “Enhancing video concept detection with the use of tomographs”, Proc. IEEE
Int. Conference. on Image Processing (ICIP 2013), Melbourne, Australia, 2013.
• F. Markatopoulou, V. Mezaris, I. Kompatsiaris, “A Comparative Study on the Use of Multi-Label Classification
Techniques for Concept-Based Video Indexing and Annotation”, Proc. 20th Int. Conf. on Multimedia Modeling (MMM'14),
Jan. 2014.
• C. Snoek, M. Worringm, “Concept-Based Video Retrieval”, in Foundations and Trends in Information Retrieval, vol. 2, no. 4,
pp. 215–322, 2009.
• A. W. M. Smeulders, M.Worring, S. Santini, A.Gupta, and R. Jain, “Content- based image retrieval at the end of the early
years”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 1349–1380, 2000.
• G. Nasierding, A. Kouzani, “Empirical Study of Multi-label Classification Methods for Image Annotation and Retrieval”, Proc.
2010 Int. Conf. on Digital Image Computing: Techniques and Applications, China, pp. 617–622.
• G.J. Qi, X.S. Hua, Y. Rui, J. Tang, T. Mei, H. Zhang, “Correlative multi-label video annotation”, Proc. 15th Int. Conf. on
Multimedia, pp. 17–26.
• M.-L. Zhang and Z.-H. Zhou, “ML-KNN: A lazy learning approach to multi-label learning”, Pattern Recognition, vol. 40, no. 7,
2007.
• A. F. Smeaton, P. Over, and W. Kraaij, “Evaluation campaigns and TRECVid”, Proc. 8th ACM Int. Workshop on Multimedia
Information Retrieval (MIR '06), pp. 321-330, 2006.
• B. Safadi and G. Quenot, “Re-ranking by Local Re-Scoring for Video Indexing and Retrieval”, in C. Macdonald, I. Ounis, and I.
Ruthven, editors, CIKM, pp. 2081-2084, 2011.
• J. C. Van Gemert, C. J. Veenman, A. Smeulders, J.-M. Geusebroek, “Visual word ambiguity”, IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 32, no. 7, pp. 1271–1283, 2010.
• G. Csurka and F. Perronnin, “Fisher vectors: Beyond bag-of visual-words image representations”, Computer Vision, Imaging
and Computer Graphics. Theory and Applications. Springer CCIS vol. 229, 2011.
• H. Jegou, F. Perronnin, M. Douze, J. Sanchez, P. Perez, and C. Schmid, “Aggregating local image descriptors into compact
codes”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 9, pp. 1704–1716, 2012.
Information Technologies Institute 2.40
Centre for Research and Technology Hellas
41. Fragment annotation: event detection
• Extending visual concept detection results with more
elaborate annotations: event labels
hands, sky,
sea, boat, …
Fishing in
Toroneos
Bay
Information Technologies Institute 2.41
Centre for Research and Technology Hellas
42. Problem statement
• Objective:
– Automatically detect high-level events in large video collections
– Events are defined as “purposeful activities, involving people, acting on
objects and interacting with each other to achieve some result”
“Changing a vehicle tire” “Skiing” “Working on a sewing project”
– Exploit an annotated video dataset
– Represent videos with suitable feature vectors {(x ,y) є X x {-1,1}}
– Learn an appropriate event detector f: X → [0,1]
Information Technologies Institute 2.42
Centre for Research and Technology Hellas
43. Related work
• Low-level feature-based approaches
– Extract one or more low-level features (SIFT, MoSIFT, LFCC, ASR-based, etc.)
– Combine features (late fusion, early fusion, etc.)
– Motion visual features usually offer the most significant information
• Model vector-based approaches
– Exploit a semantic model vector (i.e., automatic visual concept detection
results) as a feature
– The inspiration behind this approach is that high-level events can be better
recognized by looking at their constituting semantic entities
– This video representation can also be useful for explaining why a specific
event was detected
• Hybrid approaches: combination of low-level features and model vectors
– Experimental results show improved event detection performance when low-level
features and model vectors are combined
Information Technologies Institute 2.43
Centre for Research and Technology Hellas
44. An indicative approach
Model vector representation
• Temporal video segmentation
– Shot segmentation, keyframes at fixed time intervals, etc.
• Low-level visual feature extraction
• Application of a set of trained visual concept detectors
– TRECVID SIN task, SVM-based
– Detectors may be seemingly irrelevant to the sought events
• A visual model vector is formed for each video keyframe
fishing
vehicle
– Concatenate the responses (confidence scores) of all the detectors
• Video representation
– Sequence of model vectors or an overall model vector, e.g., averaging the
model vectors for all shots of a video
0.51
0.33
...
0.91
...
0.47
Information Technologies Institute 2.44
Centre for Research and Technology Hellas
45. An indicative approach
Event detection using Discriminant Analysis (DA)
• Use a DA algorithm to extract the most significant concept information for
the detection of the target events 6, 7
– Mixture Subclass DA (MSDA): Linear or kernel-based method
– Generalized subclass DA (GSDA): fast kernel-based method
• Kernel subclass methods
– Identify a nonlinear feature transformation and a suitable subclass partition in
order to map the data into a new space
– In this space subclasses belonging to different classes are expected to be
linearly separable
2,1 2,2 2,1 2,2
1,1 1,2 1,1 1,2
6 N. Gkalelis, V. Mezaris, I. Kompatsiaris, T. Stathaki, "Mixture subclass discriminant analysis link to restricted Gaussian
model and other generalizations", IEEE Trans. on Neural Networks and Learning Systems, vol. 24, no. 1, pp. 8-21, Jan. 2013.
7 N. Gkalelis, V. Mezaris, "Video event detection using generalized subclass discriminant analysis and linear support vector
machines", Proc. ACM Int. Conf. on Multimedia Retrieval (ICMR), Glasgow, UK, April 2014.
Information Technologies Institute 2.45
Centre for Research and Technology Hellas
46. An indicative approach
• Following DA, Linear Support Vector Machine classifiers are used for
learning each event 7
• Advantages
– The effect of noise or irrelevant features to the classification problem is
minimized → classification performance may be improved
– Classifier is applied in the lower dimensional subspace → classifier training and
testing times are reduced
– The application of kernel subclass DA yields linear separable classes (or
subclasses) → linear classifiers may be used (better generalization, faster)
2,1 2,2 2,1 2,2
1,1 1,2 1,1 1,2
7 N. Gkalelis, V. Mezaris, "Video event detection using generalized subclass discriminant analysis and linear support vector
machines", Proc. ACM Int. Conf. on Multimedia Retrieval (ICMR), Glasgow, UK, April 2014.
Information Technologies Institute 2.46
Centre for Research and Technology Hellas
47. Indicative experiments and results
• TRECVID 2010
– 3 target events
– Development: 1745 videos
– Evaluation: 1742 videos
• TRECVID 2012
– A publicly available annotated
subset is used 8
– 25 target events
– Development: 8840 videos
– Evaluation: 4434 videos
8 A. Habibian, K.E.A. van de Sande, C.G.M. Snoek ,
"Recommendations for video event recognition using
concept vocabularies", Proc. ICMR 2013, pp. 89-96.
Information Technologies Institute 2.47
Centre for Research and Technology Hellas
48. Indicative experiments and results
TRECVID 2010 TRECVID 2012
• In learning one MED 2012 event,
GSDA was ~25% faster than
traditional Kernel SDA
• Train:
• Testing times: similar for GSDA-LSVM
and KSVM
• We have developed a GSDA-LSVM
extension 60x faster than LSVM in
training, for large-scale problems*
*per CV cycle
Information Technologies Institute 2.48
Centre for Research and Technology Hellas
49. Event detection conclusions
• Performance of the event detection system increases by
– Discarding irrelevant or noisy concept detections
– Effectively combining multiple classifiers
– Exploiting the subclass structure of the event data
• General hints
– Importance of low level features: visual motion features are the most
important followed by visual static features; for some events, audio features
provide complementary information
– However, the use of motion features in large-scale video databases has high
computational cost (associated with their extraction; optimal trade-off
between extraction time and quality of representation should be investigated)
– Combining low-level features with model vectors provides small but
noticeable performance gains
– The machine learning methods that one uses can make a difference in both
detection speed and accuracy
Information Technologies Institute 2.49
Centre for Research and Technology Hellas
50. Event detection: additional reading
• P. Over, J. Fiscus, G. Sanders et. al., "TRECVID 2013 -- An Overview of the Goals, Tasks, Data, Evaluation
Mechanisms and Metrics", Proc. TRECVID 2013 Workshop, November 2013, Gaithersburg, MD, USA.
• N. Gkalelis, V. Mezaris, I. Kompatsiaris, T. Stathaki, "Mixture subclass discriminant analysis link to
restricted Gaussian model and other generalizations", IEEE Transactions on Neural Networks and Learning
Systems, vol. 24, no. 1, pp. 8-21, January 2013.
• N. Gkalelis, V. Mezaris, "Video event detection using generalized subclass discriminant analysis and linear
support vector machines", Proc. ACM Int. Conf. on Multimedia Retrieval (ICMR), Glasgow, UK, April 2014.
• N. Gkalelis, V. Mezaris, and I. Kompatsiaris, “High-level event detection in video exploiting discriminant
concepts,” in Proc. 9th Int. Workshop CBMI, Madrid, Spain, June 2011, pp. 85–90.
• M. Merler, B. Huang, L. Xie, G. Hua, and A. Natsev, “Semantic model vectors for complex video event
recognition,” IEEE Trans. Multimedia, vol. 14, no. 1, pp. 88–101, Feb. 2012.
• N. Gkalelis, V. Mezaris, M. Dimopoulos, I. Kompatsiaris, T. Stathaki, "Video event detection using a
subclass recoding error-correcting output codes framework", Proc. IEEE Int. Conf. on Multimedia and Expo
(ICME 2013), San Jose, CA, USA, July 2013.
• Y.-G. Jiang, S. Bhattacharya, S.-F. Chang, and M. Shah, “High-level event recognition in unconstrained
videos,” Int. J. Multimed. Info. Retr., Nov. 2013.
• A. Habibian, K. van de Sande, C.G.M. Snoek, "Recommendations for Video Event Recognition Using
Concept Vocabularies", Proc. ACM Int. Conf. on Multimedia Retrieval, Dallas, Texas, USA, April 2013.
• Z. Ma, Y. Yang, Z. Xu, N. Sebe, A. Hauptmann, "We Are Not Equally Negative: Fine-grained Labeling for
Multimedia Event Detection", Proc. ACM Multimedia 2013 (MM’13), Barcelona, Spain, October 2013.
• C. Tzelepis, N. Gkalelis, V. Mezaris, I. Kompatsiaris, "Improving event detection using related videos and
Relevance Degree Support Vector Machines", Proc. ACM Multimedia 2013 (MM’13), Barcelona, Spain,
October 2013.
Information Technologies Institute 2.50
Centre for Research and Technology Hellas
51. Other relevant analysis technologies
• Object re-detection: a particular case of image matching
• Main goal: find instances of a specific object within a single video or a
collection of videos
– Input: object of interest + video files
– Processing: similarity estimation by means of image matching (using local
features)
– Output: detected instances of the object of interest
Semi-automatic
process!
Information Technologies Institute 2.51
Centre for Research and Technology Hellas
52. Other relevant analysis technologies
• An indicative object re-detection approach: the sequential processing of
video frames is replaced by a structure-based one, using the results of
shot segmentation 9 Example
Check shot 2
1
No Detection!
detection!
Move to the
next shot
Check the frames
of this shot
Detect and highlight
the object of interest
Demo video: http://www.youtube.com/watch?v=0IeVkXRTYu8
9 L. Apostolidis, V. Mezaris, I. Kompatsiaris, "Fast object re-detection and localization in video for spatio-temporal
fragment creation", Proc. MMIX'13 at IEEE ICME 2013, San Jose, CA, USA, July 2013.
Information Technologies Institute 2.52
Centre for Research and Technology Hellas
53. Other relevant analysis technologies
• Near-duplicate frame detection
• Rationale: often the repetition of the same frames indicates related
content
– E.g. in News, video coverage of an evolving story will often repeat the same
iconic footage of the event over a period of days
– Can help us link between pieces of related video content (as long as it’s not
e.g. just a frame showing the anchorperson…)
• Typical detection process
– Local feature extraction (e.g. SIFT), encoding (e.g. VLAD), indexing (e.g. KD-trees)
– Given a query frame, find the N closest matches to it
– Then, perform geometric verification to decide if they are indeed near-duplicates
or not
Information Technologies Institute 2.53
Centre for Research and Technology Hellas
54. Concluding remarks
• We discussed different classes of techniques for video fragmentation and
annotation, to support video hyperlinking, but others also exist, e.g.
– Face detection, tracking, clustering, recognition
– ASR, OCR, and textual transcript analysis
• Is some cases the automatic analysis results remain far from perfect
(manual) annotations; yet, these results may still be useful in automating
the process of video hyperlinking
• We will see in the next part of the tutorial how most of these results can
be used for video hyperlinking
Information Technologies Institute 2.54
Centre for Research and Technology Hellas
55. Contributors
• Mr. Evlampios Apostolodis (CERTH-ITI)
• Dr. Nikolaos Gkalelis (CERTH-ITI)
• Ms. Fotini Markatopoulou (CERTH-ITI & QMUL)
Information Technologies Institute 2.55
Centre for Research and Technology Hellas
56. Thank you!
More information (including links to various demos):
http://www.iti.gr/~bmezaris
bmezaris@iti.gr
Information Technologies Institute 2.56
Centre for Research and Technology Hellas