Feature Selection Methods for Bag-of-(visual)-Words Approaches
          Schmiedeke, Kelm and Sikora
             Communication Systems Group
              Technische Universität Berlin

                     4 October, 2012
Motivation                                                           2




                     sports


        Schmiedeke: “Feature Selection Methods for BoW Approaches”
Lessons from last year                                                 3




 Features derived from metadata (esp. tags)
 outperform visual and ASR ones
  • Metadata:                 Naive Bayes (non translated)
  • Visual feat.:             SVM (avg. pooled histograms)
  • ASR transcripts:          kNN (JSD)


 Uploaders mainly contribute to a single category




This year's question                                                  4




 Does feature selection improve results achieved
 with BoW model?




Feature Selection/ Transformation                                     5




 Mutual information:



 Term Frequency:



 PCA (Eigenvalue decomposition):
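
 The formulas on this slide did not survive extraction. As a rough sketch only, the two selection scores could be computed as below. This assumes the common text-classification definition of mutual information between term presence and class membership, and a plain per-class occurrence count for term frequency; the authors' exact formulas may differ. The function names and the input format (one token list per video) are hypothetical.

```python
import math
from collections import Counter

def mutual_information(docs, labels, term, cls):
    """MI (in bits) between presence of `term` and membership in `cls`.

    docs   -- list of token lists, one per video (assumed input format)
    labels -- list of class names aligned with docs
    """
    n = len(docs)
    # Contingency table: (term present?, document in class?) -> count
    counts = Counter((term in d, y == cls) for d, y in zip(docs, labels))
    mi = 0.0
    for t in (True, False):
        for c in (True, False):
            n_tc = counts[(t, c)]
            if n_tc == 0:
                continue  # the 0 * log(0) term is taken as 0
            n_t = counts[(t, True)] + counts[(t, False)]
            n_c = counts[(True, c)] + counts[(False, c)]
            mi += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
    return mi

def term_frequency(docs, labels, term, cls):
    """Total number of occurrences of `term` in documents of class `cls`."""
    return sum(d.count(term) for d, y in zip(docs, labels) if y == cls)
```

 A term that occurs in every document of one class and in no other document gets the highest score; a term that never occurs gets 0.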




Feature Selection                                                              6




    Concepts for term selection:

Top terms for religion:    Top terms for politics:     Top terms for health:
bibl    (0.0897)           lunch (0.1200)              jama    (0.0495)
jesu    (0.0797)           obama (0.1113)              health (0.0378)
god     (0.0796)           polit (0.0982)              report (0.0357)
unleaven(0.0782)           grittv (0.0881)             harta (0.0227)
eeli    (0.0782)           flander (0.0861)            exceric (0.0211)
davideel(0.0781)           laura (0.0855)              yoga    (0.0203)
ministri(0.0780)           economi(0.0747)             study (0.0192)

…                          …                           …

daytripp (0.0)             sonnet   (0.0)              ilsr     (0.0)
adagio (0.0)               screenplai (0.0)            resystem (0.0)
acustica (0.0)             acustica (0.0)              acustica (0.0)



Feature Selection                                                              7




    Top-k-Union:

Top terms for religion:    Top terms for politics:     Top terms for health:
bibl    (0.0897)           lunch (0.1200)              jama    (0.0495)
jesu    (0.0797)           obama (0.1113)              health (0.0378)
god     (0.0796)           polit (0.0982)              report (0.0357)
unleaven(0.0782)           grittv (0.0881)             harta (0.0227)
eeli    (0.0782)           flander (0.0861)            exceric (0.0211)
davideel(0.0781)           laura (0.0855)              yoga    (0.0203)
ministri(0.0780)           economi(0.0747)             study (0.0192)

…                          …                           …

daytripp (0.0)             sonnet   (0.0)              ilsr     (0.0)
adagio (0.0)               screenplai (0.0)            resystem (0.0)
acustica (0.0)             acustica (0.0)              acustica (0.0)



Feature Selection                                                              8




    Top-k:

Top terms for religion:    Top terms for politics:     Top terms for health:
bibl    (0.0897)           lunch (0.1200)              jama    (0.0495)
jesu    (0.0797)           obama (0.1113)              health (0.0378)
god     (0.0796)           polit (0.0982)              report (0.0357)
unleaven(0.0782)           grittv (0.0881)             harta (0.0227)
eeli    (0.0782)           flander (0.0861)            exceric (0.0211)
davideel(0.0781)           laura (0.0855)              yoga    (0.0203)
ministri(0.0780)           economi(0.0747)             study (0.0192)

…                          …                           …

daytripp (0.0)             sonnet   (0.0)              ilsr     (0.0)
adagio (0.0)               screenplai (0.0)            resystem (0.0)
acustica (0.0)             acustica (0.0)              acustica (0.0)



Feature Selection                                                              9




    Union>Th:

Top terms for religion:    Top terms for politics:     Top terms for health:
bibl    (0.0897)           lunch (0.1200)              jama    (0.0495)
jesu    (0.0797)           obama (0.1113)              health (0.0378)
god     (0.0796)           polit (0.0982)              report (0.0357)
unleaven(0.0782)           grittv (0.0881)             harta (0.0227)
eeli    (0.0782)           flander (0.0861)            exceric (0.0211)
davideel(0.0781)           laura (0.0855)              yoga    (0.0203)
ministri(0.0780)           economi(0.0747)             study (0.0192)

…                          …                           …

daytripp (0.0)             sonnet   (0.0)              ilsr     (0.0)
adagio (0.0)               screenplai (0.0)            resystem (0.0)
acustica (0.0)             acustica (0.0)              acustica (0.0)
         0.0002                     0.0002                      0.0001

Feature Selection                                                              10




  Intersection>Th:

Top terms for religion:    Top terms for politics:     Top terms for health:
bibl    (0.0897)           lunch (0.1200)              jama    (0.0495)
jesu    (0.0797)           obama (0.1113)              health (0.0378)
god     (0.0796)           polit (0.0982)              report (0.0357)
…                          …                           …
web                        appl                        gossip
python                     googl                       interview
xbox                       teen                        iphon
big                        music                       san
expo                       tv                          texa
…                          …                           …
daytripp (0.0)             sonnet     (0.0)            ilsr       (0.0)
adagio (0.0)               screenplai (0.0)            resystem (0.0)
acustica (0.0)             acustica (0.0)              acustica (0.0)
         0.0002                     0.0002                      0.0001
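
 The slides name four selection strategies without defining them. One plausible reading is sketched below; it is an interpretation, not the authors' confirmed method. It assumes per-class term scores stored as `scores[class][term]` and per-class thresholds like the values shown under each column. The function names, and the choice to rank the global Top-k by each term's best score over all classes, are assumptions.

```python
def top_k_union(scores, k):
    """Union of the k best-scoring terms of every class."""
    selected = set()
    for term_scores in scores.values():
        ranked = sorted(term_scores, key=term_scores.get, reverse=True)
        selected.update(ranked[:k])
    return selected

def top_k(scores, k):
    """k best terms overall, ranked by their maximum score over classes (assumed)."""
    best = {}
    for term_scores in scores.values():
        for term, s in term_scores.items():
            best[term] = max(s, best.get(term, s))
    return set(sorted(best, key=best.get, reverse=True)[:k])

def union_above(scores, thresholds):
    """Terms whose score exceeds the threshold in at least one class."""
    return {t for c, ts in scores.items()
            for t, s in ts.items() if s > thresholds[c]}

def intersection_above(scores, thresholds):
    """Terms whose score exceeds the threshold in every class."""
    per_class = [{t for t, s in ts.items() if s > thresholds[c]}
                 for c, ts in scores.items()]
    return set.intersection(*per_class) if per_class else set()

# Values copied from the slide; terms missing from a class are simply unscored.
scores = {
    "religion": {"bibl": 0.0897, "jesu": 0.0797, "acustica": 0.0},
    "politics": {"lunch": 0.1200, "obama": 0.1113, "acustica": 0.0},
    "health":   {"jama": 0.0495, "health": 0.0378, "acustica": 0.0},
}
thresholds = {"religion": 0.0002, "politics": 0.0002, "health": 0.0001}

print(sorted(top_k_union(scores, 1)))
print(sorted(union_above(scores, thresholds)))
```

 Under this reading, the union strategies keep class-specific terms such as "bibl" or "obama", while the intersection keeps only terms that score above the threshold for all classes at once.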

Official runs                                                           11




  Bag of clustered SURF features transformed
  using PCA
  • Result does not benefit from transformation

                          official run        without FS/FT
      mAP                       0.2301              0.2309
      CA                       41.63 %             41.71 %




Official runs                                                           12




  Bag of filtered ASR transcript terms (Union>Th)
  • Result does benefit from selection


                          official run        without FS/FT
      mAP                       0.1035              0.0522
      CA                       32.53 %             26.54 %




Official runs                                                           13




  Bag of clustered SURF features filtered using MI
  and Intersection>Th strategy
  • Result does slightly benefit from selection

                          official run        without FS/FT
      mAP                       0.2259              0.2221
      CA                       40.80 %             40.78 %




Official runs                                                            14




  Bag of filtered terms derived from tags, title and
  descriptions (Union>Th)
  • Result does benefit from selection

                           official run        without FS/FT
       mAP                       0.5225              0.4146
       CA                       58.18 %             55.70 %




Official runs                                                           15




  Bag of clustered SURF features transformed
  using PCA and decision fusion using uploader
  • Result does benefit from transformation

                          official run        without FS/FT
      mAP                       0.3304              0.2988
      CA                       52.14 %             49.19 %




Conclusion & Future Work                                              16




 FS showed potential for improving the results

 The choice between MI and TF is not critical; both
 methods achieve roughly the same results
    • Metadata (mAP) : MI12004 (0.5277) vs. TF14976 (0.5275)



 Investigation of different scaling schemes (NB)

 Use of class-independent selection score (MI)


Backup                                                                17




Extracting visual features                                             19




  SURF features are extracted from each key frame
  • At keypoints and at a regular grid


  Vocabulary is built using hierarchical clustering
  on SURF features of development set
  • 4096/8196 codewords


  Term vector for a single video is obtained by bin-wise
  pooling of its key frames' term vectors
  • avg


MediaEval 2012: Tagging Task                                         20




 Question: What is the video's blip.tv category?
 Blip.tv database (cc): ~ 3300 h
  • 5288 training videos
  • 9550 test videos
 Official evaluation measure is Mean
 Average Precision (mAP)
 Workshop will be held 4-5 October 2012 in Pisa,
 Italy
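
 For reference, mAP can be sketched as below. This is the textbook definition, assuming one ranked list of test videos per category in which every relevant video appears; the official evaluation script may differ in details. The function names and the boolean relevance lists are hypothetical.

```python
def average_precision(ranked_relevance):
    """AP of one ranked list; each entry is True if that video is relevant."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / hits if hits else 0.0

def mean_average_precision(rankings):
    """Mean of the per-category APs; `rankings` maps category -> ranked list."""
    return sum(average_precision(r) for r in rankings.values()) / len(rankings)

rankings = {
    "sports":   [True, False, True],   # AP = (1/1 + 2/3) / 2
    "politics": [False, True],         # AP = (1/2) / 1
}
print(round(mean_average_precision(rankings), 4))
```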

