TubeTagger
                        — YouTube-based Concept Detection —


                             Adrian Ulges, Markus Koch
                           Damian Borth, Thomas M. Breuel

                    German Research Center for Artificial Intelligence (DFKI) &
                                  University of Kaiserslautern


                                     December 6 2009




D.Borth: : TubeTagger                           1                           December 6 2009
Outline



    Motivation


    TubeTagger System


    TubeTagger Web Demo


    Summary




Data, data, data

       “...TV, video on demand, Internet video, and P2P video will
     account for over 91 percent of global consumer traffic by 2013...”
                        “Cisco Visual Networking Index”, 2009


          “...more than 20 hours of video uploaded to YouTube every
                       minute, 1 billion views per day...”

                                          , 2009


    Increasing importance of video live streams: Obama’s inauguration,
         Michael Jackson’s memorial service, music video debuts...
                                              , 2009

Content-based Video Retrieval

                        How to search in large video databases?

           [Figure: example keyframes labeled “beach”, “soccer”, “interview”; diagram label “Supervised Machine Learning”]


           Video Concept Detection [MediaMill, Columbia, IBM,...]
           → as key building block of CBVR
           Supervised Machine Learning
           → need labeled training data


Training Data
    State-of-the-art Datasets
           TRECVID
           LSCOM

    Acquisition
           collaborative           [Ayache07]

           precise & high quality
           time consuming effort

           vocabulary size does not scale
           → from hundreds to thousands of concepts [Hauptmann07]
           missing new emerging concepts of interest
           → e.g. “Michael Jackson”, “Obama”, “iPod” [1]

           [1] top-ranked searches in 2009-Q3 according to “Google Insights for Search” for web search,
               news search, and product search respectively



Web-video as training data




    Pros
           available at large scale
           high variability of data
           enriched with
                  tags
                  comments
                  ratings
           could allow automatic concept learning

    Cons
           weakly labeled
                  incomplete / subjective
                  coarse
           focus of interest
           domain change
           video portal as black box




TubeTagger System Overview




    Notation
           t ∈ T := tags (≈ concepts)
           x ∈ Xdb := keyframe of a video
           P(t|x) := probability of tag t for keyframe x

TubeTagger System Overview




    Training Data Acquisition
           use YouTube as primary training data source
           use tags as weak labels
                  pos. = tagged
                  neg. = all non-tagged

    Learning Pipelines
           Visual Concept Learning
           Semantic Concept Learning
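
The weak-labeling rule on this slide (positives are videos carrying the tag, negatives are all videos without it) can be illustrated with a minimal sketch. The function name `weak_labels` and the toy metadata are hypothetical and not part of TubeTagger; real tags would come from the video portal.

```python
from typing import Dict, List, Set, Tuple


def weak_labels(video_tags: Dict[str, Set[str]], concept: str) -> Tuple[List[str], List[str]]:
    """Split video ids into weak positives (tagged with `concept`) and negatives (all others)."""
    positives = sorted(v for v, tags in video_tags.items() if concept in tags)
    negatives = sorted(v for v, tags in video_tags.items() if concept not in tags)
    return positives, negatives


# Hypothetical toy metadata: video id -> set of user-supplied tags.
videos = {
    "v1": {"soccer", "goal"},
    "v2": {"traffic", "city"},
    "v3": {"soccer", "funny"},
}

pos, neg = weak_labels(videos, "soccer")
print(pos, neg)
```

Because every untagged video counts as a negative, incomplete tagging shows up as label noise, which is the "weakly labeled" drawback listed among the cons.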

Visual Concept Learning




    Features
           keyframe extraction
           → x ∈ Xdb
           bag-of-visual-words descriptors [Sivic06]
                  SIFT [Lowe99]
                  SURF [Bay06]

    Statistical Models
           SVM [Schölkopf01]
           PAMIR [Grangier08, Paredes09]
           Max. Entropy [Deselaers05]
           → output: P(t|x)
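
As a rough illustration of the bag-of-visual-words idea, the sketch below assigns each local descriptor of a keyframe to its nearest codebook entry and returns a normalized histogram. It is not the TubeTagger implementation: the 2-D toy descriptors and the three-entry codebook are assumptions made for readability (real SIFT or SURF descriptors have far more dimensions, and a codebook is usually learned by clustering).

```python
from typing import List, Sequence


def nearest_codeword(descriptor: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook entry closest to `descriptor` (squared Euclidean distance)."""
    def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sqdist(descriptor, codebook[k]))


def bovw_histogram(descriptors: Sequence[Sequence[float]],
                   codebook: Sequence[Sequence[float]]) -> List[float]:
    """Normalized visual-word histogram for one keyframe."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Hypothetical toy data: a 3-word codebook and 4 local descriptors from one keyframe.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.0, 6.0)]

hist = bovw_histogram(descriptors, codebook)
print(hist)  # [0.25, 0.5, 0.25]
```

A histogram like this is the fixed-length keyframe representation that a statistical model would take as input to estimate P(t|x).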

Semantic Concept Learning




    Query Formulation
           concepts = textual queries
           mapping of text queries to concept vocabulary
                   “computer” → Monitor, Windows-Desktop, iPhone
                   “funny” → Muppets, Cats, Commercials
           learning of tag co-occurrences
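
One simple way to picture "learning of tag co-occurrences" is to count how often two tags appear on the same video. The sketch below is an assumption about what such statistics could look like, not the method used in TubeTagger; the function name and the toy tag sets are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import DefaultDict, Iterable, Set


def tag_cooccurrence(video_tags: Iterable[Set[str]]) -> DefaultDict[str, Counter]:
    """Count, for each tag, how often every other tag appears on the same video."""
    counts: DefaultDict[str, Counter] = defaultdict(Counter)
    for tags in video_tags:
        for a, b in permutations(sorted(set(tags)), 2):
            counts[a][b] += 1
    return counts


# Hypothetical toy tag sets, one per video.
videos = [
    {"computer", "monitor"},
    {"computer", "iphone"},
    {"computer", "monitor", "windows-desktop"},
]

cooc = tag_cooccurrence(videos)
related = [tag for tag, _ in cooc["computer"].most_common(1)]
print(related)  # ['monitor']
```

Each row of such a co-occurrence table could serve as a bag-of-words style vector for a tag, which is one possible reading of how a text query is mapped onto the concept vocabulary.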

Semantic Concept Learning




 Feature
     bag-of-words features
               t ∈ T → h_t
               q := h_q
     mapping to concepts as weights
               w(q, t) := \langle h_t, h_q \rangle

 Approach
     T_5 = the five tags t with the highest weight w(q, t)
     final fusion
               P(q|x) = \sum_{t \in T_5} \frac{w(q,t)}{\sum_{t' \in T_5} w(q,t')} \, P(t|x)
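
The fusion step can be sketched in a few lines: compute w(q, t) as an inner product, keep the top-k tags, normalize their weights, and take the weighted sum of the detector outputs P(t|x). The vectors, probabilities, and the choice k = 2 in the example are hypothetical toy values (the slide uses the top 5); how h_t and h_q are built is not specified here.

```python
from typing import Dict, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product <a, b>."""
    return sum(x * y for x, y in zip(a, b))


def query_score(h_q: Sequence[float],
                tag_vectors: Dict[str, Sequence[float]],
                tag_probs: Dict[str, float],
                k: int = 5) -> float:
    """P(q|x) as the weight-normalized sum of P(t|x) over the top-k tags by w(q, t)."""
    weights = {t: dot(h_t, h_q) for t, h_t in tag_vectors.items()}
    top = sorted(weights, key=weights.get, reverse=True)[:k]
    norm = sum(weights[t] for t in top)
    if norm == 0:
        return 0.0  # query shares nothing with any tag
    return sum(weights[t] / norm * tag_probs[t] for t in top)


# Hypothetical toy data: 3-dimensional bag-of-words vectors and detector outputs for one keyframe.
tag_vectors = {"underwater": [1, 0, 1], "fish": [0, 1, 1], "traffic": [0, 0, 1]}
tag_probs = {"underwater": 0.9, "fish": 0.3, "traffic": 0.8}
h_q = [2, 1, 0]

score = query_score(h_q, tag_vectors, tag_probs, k=2)
print(round(score, 3))  # 0.7
```

Here "traffic" has weight 0 and falls outside the top 2, so its high detector score does not influence the result.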

Experiments

 Dataset
        1200 hrs. (= 750k keyframes)
        233 concepts
        per concept:
               150 videos for training
               50 videos for testing
        [Figure: example keyframes for “soccer” and “traffic”]




 Clips/Concept Evaluation
        10 concepts
        trained on N clips
        tested on 200k keyframes
        saturation at 100-150 clips per concept for SURF+PAMIR

Experiments
    Feature & Classifier Evaluation
           81 concepts
           video level testing (avg. fusion)
           results:
                  MAP: 22.4%
                  MAP: 15.4% (6 times faster)

           MAP (%) by model and feature:
                  model     SURF    SIFT
                  SVMs      20.4    22.4
                  PAMIR     15.4    18.4
                  MAXENT    14.1    15.5
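
For readers unfamiliar with the evaluation terms, the sketch below shows one common reading of "video level testing (avg. fusion)" and of average precision: keyframe scores are averaged per video, videos are ranked by that score, and precision is averaged over the ranks of the relevant videos. The slides do not say which average-precision variant was used, so this non-interpolated form is an assumption, and the scores are hypothetical toy values. MAP is then the mean of this value over all concepts.

```python
from typing import Dict, List, Sequence


def video_score(keyframe_scores: Sequence[float]) -> float:
    """Average fusion: mean of the keyframe-level scores of one video."""
    return sum(keyframe_scores) / len(keyframe_scores)


def average_precision(ranked_relevance: Sequence[bool]) -> float:
    """Non-interpolated average precision of a ranked list of relevance flags."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Hypothetical toy data for one concept: keyframe scores and ground truth per video.
keyframe_scores: Dict[str, List[float]] = {"a": [0.9, 0.7], "b": [0.2, 0.4], "c": [0.6, 0.6]}
relevant = {"a": True, "b": True, "c": False}

ranking = sorted(keyframe_scores, key=lambda v: video_score(keyframe_scores[v]), reverse=True)
ap = average_precision([relevant[v] for v in ranking])
print(ranking, round(ap, 3))  # ['a', 'c', 'b'] 0.833
```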


    Performance Distribution




           [Figure: per-concept performance distribution; labeled concepts: “origami”, “soccer”, “bus”, “restaurant”, “operation-room”]
                             showing 78 representative concepts out of 233

Insights in Web-based Concept Detection

 Good Concepts
        “boat/ship”, “pyramids”
        broad community of YouTube users
        often “interesting”, “spectacular”
        [Figure: example keyframe for “boat/ship”]

 Redundant Concepts
        “drummer”, “fencing”
        series of clips, not sufficiently diverse
        problem: generalization
        [Figure: example keyframe for “drummer”]

 Bad Concepts
        “fence”, “operation-room”
        not regularly used as a tag
        [Figure: example keyframe for “fence”]



TubeTagger Web Demo




    User Interface
       1. select a concept
       2. or enter a query
       3. switch between video and keyframe level

TubeTagger Web Demo




    Deep Tagging
           frame-accurate concept detection beyond coarse tags
           find a concept (e.g. “Christmas Tree”) within a video clip
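
Deep tagging can be pictured as a per-keyframe decision on the detector output P(t|x). The sketch below returns the timestamps whose score reaches a threshold; the threshold of 0.5, the function name, and the toy scores are assumptions for illustration and are not taken from the demo.

```python
from typing import List, Sequence, Tuple


def deep_tag(keyframes: Sequence[Tuple[float, float]], threshold: float = 0.5) -> List[float]:
    """Timestamps (seconds) of keyframes whose concept score P(t|x) reaches `threshold`."""
    return [timestamp for timestamp, prob in keyframes if prob >= threshold]


# Hypothetical toy scores for one concept: (timestamp in seconds, P(t|x)).
keyframes = [(0.0, 0.1), (4.0, 0.7), (8.0, 0.9), (12.0, 0.2)]

hits = deep_tag(keyframes)
print(hits)  # [4.0, 8.0]
```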

TubeTagger Web Demo




    Text-based Search
           matches queries to concepts
           e.g. “diving” → “underwater”, “shipwreck”, “fish”, ...

Summary



    TubeTagger - YouTube-based Concept Detection
           utilize YouTube videos as training source for visual
           concept detector learning
           utilize YouTube tags for semantic model generation
           web demo providing:
                   deep tagging
                   tag recommendation
                   text-based search




questions?


                           project site: www.dfki.de/moonvid

                  web demo at: http://madm.dfki.de/demo/tubetagger/




TubeTagger – YouTube-based Concept Detection

  • 1. TubeTagger — YouTube-based Concept Detection — Adrian Ulges, Markus Koch Damian Borth, Thomas M. Breuel German Research Center for Artificial Intelligence (DFKI) & University of Kaiserslautern December 6 2009 D.Borth: : TubeTagger 1 December 6 2009
  • 2. Outline Motivation TubeTagger System TubeTagger Web Demo Summary D.Borth: : TubeTagger 2 December 6 2009
  • 6. Data, data, data: “...TV, video on demand, Internet video, and P2P video will account for over 91 percent of global consumer traffic by 2013...” (“Cisco Visual Networking Index”, 2009). “...more than 20 hours of video uploaded to YouTube every minute, 1 billion views per day...” (2009). Increasing importance of video live streams: Obama’s inauguration, Michael Jackson’s memorial service, music video debuts... (2009).
  • 10. Content-based Video Retrieval (CBVR): How to search in large video databases? Video Concept Detection [MediaMill, Columbia, IBM,...] → as key building block of CBVR (example concept labels in the figure: “beach”, “soccer”, “interview”). Supervised Machine Learning → need labeled training data.
  • 12. Training Data. State-of-the-art Datasets: TRECVID, LSCOM. Acquisition: collaborative [Ayache07]; precise & high quality; time consuming effort. Vocabulary size does not scale → from hundreds to thousands of concepts [Hauptmann07]. Missing new emerging concepts of interest → e.g. “Michael Jackson”, “Obama”, “iPod” (top ranked searches 2009-Q3 by “Google Insights for Search” for web search, news search and product search respectively).
  • 15. Web-video as training data. Pros: available at large scale; high variability of data; enriched with tags, comments, ratings; could allow automatic concept learning. Cons: weakly labeled; incomplete / subjective; coarse; focus of interest; domain change; video portal as black box.
  • 17. TubeTagger System Overview. Notation: t ∈ T := tags (≈ concepts); x ∈ X_db := keyframe of a video; P(t|x) := probability of tag t for keyframe x.
  • 20. TubeTagger System Overview. Training Data Acquisition: use YouTube as primary training data source; use tags as weak labels; pos. = tagged, neg. = all non-tagged. Learning Pipelines: Visual Concept Learning; Semantic Concept Learning.
  • 24. Visual Concept Learning. Features: keyframe extraction → x ∈ X_db; bag-of-visual-words descriptors [Sivic06]: SIFT [Lowe99], SURF [Bay06]. Statistical Models: SVM [Schölkopf01]; PAMIR [Grangier08, Paredes09]; Max. Entropy [Deselaers05] → output: P(t|x).
  • 26. Semantic Concept Learning. Query Formulation: concepts = textual queries; mapping of text queries to concept vocabulary, e.g. “computer” → Monitor, Windows-Desktop, iPhone; “funny” → Muppets, Cats, Commercials; learning of tag co-occurrences.
  • 28. Semantic Concept Learning. Feature: bag-of-words features, $t \in T \rightarrow h_t$, $q := h_q$; mapping to concepts as weights $w(q,t) := \langle h_t, h_q \rangle$. Approach: $T_5$ := the five tags $t$ with the largest $w(q,t)$; final fusion: $P(q|x) = \frac{\sum_{t \in T_5} w(q,t)\, P(t|x)}{\sum_{t \in T_5} w(q,t)}$.
  • 31. Experiments. Dataset: 1200 hrs. (= 750k keyframes); 233 concepts; per concept: 150 videos for training, 50 videos for testing (example concepts shown: “soccer”, “traffic”). Clips/Concept Evaluation: 10 concepts; trained on N clips; tested on 200k keyframes; saturation at 100–150 clips/concept for SURF+PAMIR.
  • 33. Experiments. Feature & Classifier Evaluation: 81 concepts; video level testing (avg. fusion). MAP in % by model and feature: SVMs 20.4 (SURF), 22.4 (SIFT); PAMIR 15.4 (SURF), 18.4 (SIFT); MAXENT 14.1 (SURF), 15.5 (SIFT). Results: MAP 22.4%; MAP 15.4% (6 times faster). Performance Distribution: figure showing 78 representative concepts out of 233 (labels in the figure: “origami”, “soccer”, “bus”, “restaurant”, “operation-room”).
  • 34. Insights into Web-based Concept Detection. Good Concepts: “boat/ship”, “pyramids”; broad community of YouTube users; often “interesting”, “spectacular”. Redundant Concepts: “drummer”, “fencing”; series of clips, not sufficiently diverse; problem: generalization. Bad Concepts: “fence”, “operation-room”; not regularly used as a tag.
  • 35. TubeTagger Web Demo. User Interface: 1. select a concept, 2. or enter a query, 3. switch between video or keyframe level.
  • 36. TubeTagger Web Demo. Deep Tagging: frame-accurate concept detection; beyond coarse tags; find concept (e.g. “Christmas Tree”) within a video clip.
  • 37. TubeTagger Web Demo. Text-based Search: matches queries to concepts, e.g. “diving” → “underwater”, “shipwreck”, “fish”, ...
  • 38. Summary. TubeTagger - YouTube-based Concept Detection: utilize YouTube videos as training source for visual concept detector learning; utilize YouTube tags for semantic model generation; web demo providing: deep tagging, tag recommendation, text-based search.
  • 39. Questions? Project site: www.dfki.de/moonvid; web demo at: http://madm.dfki.de/demo/tubetagger/
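
The query-to-concept mapping and final fusion from the Semantic Concept Learning slides can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the whitespace tokenizer, the toy tag histograms, and the detector scores are all assumptions made for the example. It also assumes that each tag histogram h_t counts words co-occurring with tag t, and that the fused score is the weight-normalized sum over the top five tags.

```python
from collections import Counter


def bag_of_words(text):
    """Word histogram of a lower-cased, whitespace-tokenized string (assumed tokenizer)."""
    return Counter(text.lower().split())


def inner_product(h_a, h_b):
    """Inner product <h_a, h_b> of two sparse word histograms."""
    return sum(count * h_b.get(word, 0) for word, count in h_a.items())


def top_k_weights(h_q, tag_histograms, k=5):
    """Return w(q, t) for the k tags with the largest positive weight."""
    weights = {t: inner_product(h_t, h_q) for t, h_t in tag_histograms.items()}
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return {t: w for t, w in ranked[:k] if w > 0}


def fuse(weights, p_tag_given_x):
    """Weight-normalized sum of detector outputs P(t|x) over the selected tags."""
    total = sum(weights.values())
    if total == 0:
        return 0.0  # the query matches no concept in the vocabulary
    return sum(w * p_tag_given_x.get(t, 0.0) for t, w in weights.items()) / total


# Hypothetical toy data: histograms of words co-occurring with each tag.
tag_histograms = {
    "underwater": Counter({"diving": 3, "fish": 2}),
    "shipwreck": Counter({"diving": 1, "wreck": 4}),
    "cats": Counter({"funny": 5}),
}
# Hypothetical detector outputs P(t|x) for one keyframe x.
detector_scores = {"underwater": 0.8, "shipwreck": 0.4, "cats": 0.9}

query_weights = top_k_weights(bag_of_words("diving"), tag_histograms)
score = fuse(query_weights, detector_scores)
```

In this toy setup the query "diving" maps only to "underwater" and "shipwreck", so the unrelated "cats" detector does not influence the fused score even though its output is high.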