Feature Selection Methods for Bag-of-(visual)-Words Approaches
Schmiedeke, Kelm and Sikora
Communication Systems Group
Technische Universität Berlin

4 October 2012
Motivation




                     sports


        Schmiedeke: “Feature Selection Methods for BoW Approaches”
Lessons from last year

 Features derived from metadata (esp. tags)
 outperform visual and ASR features
  • Metadata:                 Naive Bayes (non-translated)
  • Visual features:          SVM (average-pooled histograms)
  • ASR transcripts:          kNN (Jensen-Shannon divergence, JSD)

 Uploaders mainly contribute to a single category




This year's question

 Does feature selection improve the results achieved
 with the BoW model?




Feature Selection / Transformation

 Mutual information (MI)
 Term Frequency (TF)
 PCA (Eigenvalue decomposition)
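
As a rough illustration of the first two scores listed above, the sketch below computes a corpus-wide term frequency and the mutual information between a term's presence and membership in one class. It uses the textbook two-by-two definition of MI, which may differ from the exact variant the authors used. The function names and the toy documents (built from stems that appear in the talk) are assumptions made for this example.

```python
import math
from collections import Counter


def term_frequency(docs):
    """Count how often each term occurs across all documents."""
    counts = Counter()
    for terms in docs:
        counts.update(terms)
    return counts


def mutual_information(docs, labels, term, target):
    """MI in bits between presence of `term` and membership in class `target`."""
    n = len(docs)
    # 2x2 contingency table keyed by (term_present, doc_in_class).
    joint = Counter()
    for terms, label in zip(docs, labels):
        joint[(term in terms, label == target)] += 1
    mi = 0.0
    for (present, in_class), n_tc in joint.items():
        n_t = joint[(present, True)] + joint[(present, False)]    # marginal over class
        n_c = joint[(True, in_class)] + joint[(False, in_class)]  # marginal over term
        mi += (n_tc / n) * math.log2(n * n_tc / (n_t * n_c))
    return mi


# Toy corpus (hypothetical): each document is a list of stemmed terms.
docs = [["bibl", "god"], ["god", "jesu"], ["obama", "polit"], ["polit", "economi"]]
labels = ["religion", "religion", "politics", "politics"]

tf = term_frequency(docs)
# "god" occurs in every religion document and in no other, so MI is 1 bit.
mi_god = mutual_information(docs, labels, "god", "religion")
```

A term that never occurs, or that occurs equally often inside and outside the class, gets a score of 0.0 under this definition, which is consistent with the zero-score terms shown at the bottom of the term tables.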




Feature Selection

    Concepts for term selection:

Top terms for religion:    Top terms for politics:     Top terms for health:
bibl    (0.0897)           lunch (0.1200)              jama    (0.0495)
jesu    (0.0797)           obama (0.1113)              health (0.0378)
god     (0.0796)           polit (0.0982)              report (0.0357)
unleaven(0.0782)           grittv (0.0881)             harta (0.0227)
eeli    (0.0782)           flander (0.0861)            exceric (0.0211)
davideel(0.0781)           laura (0.0855)              yoga    (0.0203)
ministri(0.0780)           economi(0.0747)             study (0.0192)

…                          …                           …

daytripp (0.0)             sonnet   (0.0)              ilsr     (0.0)
adagio (0.0)               screenplai (0.0)            resystem (0.0)
acustica (0.0)             acustica (0.0)              acustica (0.0)



Feature Selection




    Top-k-Union:

Top terms for religion:    Top terms for politics:     Top terms for health:
bibl    (0.0897)           lunch (0.1200)              jama    (0.0495)
jesu    (0.0797)           obama (0.1113)              health (0.0378)
god     (0.0796)           polit (0.0982)              report (0.0357)
unleaven(0.0782)           grittv (0.0881)             harta (0.0227)
eeli    (0.0782)           flander (0.0861)            exceric (0.0211)
davideel(0.0781)           laura (0.0855)              yoga    (0.0203)
ministri(0.0780)           economi(0.0747)             study (0.0192)

…                          …                           …

daytripp (0.0)             sonnet   (0.0)              ilsr     (0.0)
adagio (0.0)               screenplai (0.0)            resystem (0.0)
acustica (0.0)             acustica (0.0)              acustica (0.0)



Feature Selection




    Top-k:

Top terms for religion:    Top terms for politics:     Top terms for health:
bibl    (0.0897)           lunch (0.1200)              jama    (0.0495)
jesu    (0.0797)           obama (0.1113)              health (0.0378)
god     (0.0796)           polit (0.0982)              report (0.0357)
unleaven(0.0782)           grittv (0.0881)             harta (0.0227)
eeli    (0.0782)           flander (0.0861)            exceric (0.0211)
davideel(0.0781)           laura (0.0855)              yoga    (0.0203)
ministri(0.0780)           economi(0.0747)             study (0.0192)

…                          …                           …

daytripp (0.0)             sonnet   (0.0)              ilsr     (0.0)
adagio (0.0)               screenplai (0.0)            resystem (0.0)
acustica (0.0)             acustica (0.0)              acustica (0.0)



Feature Selection

    Union>Th:

Top terms for religion:    Top terms for politics:     Top terms for health:
bibl    (0.0897)           lunch (0.1200)              jama    (0.0495)
jesu    (0.0797)           obama (0.1113)              health (0.0378)
god     (0.0796)           polit (0.0982)              report (0.0357)
unleaven(0.0782)           grittv (0.0881)             harta (0.0227)
eeli    (0.0782)           flander (0.0861)            exceric (0.0211)
davideel(0.0781)           laura (0.0855)              yoga    (0.0203)
ministri(0.0780)           economi(0.0747)             study (0.0192)

…                          …                           …

daytripp (0.0)             sonnet   (0.0)              ilsr     (0.0)
adagio (0.0)               screenplai (0.0)            resystem (0.0)
acustica (0.0)             acustica (0.0)              acustica (0.0)
th:      0.0002                     0.0002                      0.0001

Feature Selection




  Intersection>Th:

Top terms for religion:    Top terms for politics:     Top terms for health:
bibl    (0.0897)           lunch (0.1200)              jama    (0.0495)
jesu    (0.0797)           obama (0.1113)              health (0.0378)
god     (0.0796)           polit (0.0982)              report (0.0357)
…                          …                           …
web                        appl                        gossip
python                     googl                       interview
xbox                       teen                        iphon
big                        music                       san
expo                       tv                          texa
…                          …                           …
daytripp (0.0)             sonnet     (0.0)            ilsr       (0.0)
adagio (0.0)               screenplai (0.0)            resystem (0.0)
acustica (0.0)             acustica (0.0)              acustica (0.0)
th:      0.0002                     0.0002                      0.0001
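
The selection strategies named on these slides can be sketched as set operations over per-class term scores. This is one plausible reading of the names Top-k-Union, Union>Th and Intersection>Th, not a confirmed description of the authors' implementation. The function names are hypothetical, and the scores for the term "web" are invented for the example (the slide lists it without a score); the other values are taken from the tables above.

```python
def top_k_union(scores, k):
    """Union over classes of each class's k best-scoring terms."""
    selected = set()
    for per_term in scores.values():
        ranked = sorted(per_term, key=per_term.get, reverse=True)
        selected.update(ranked[:k])
    return selected


def union_above(scores, th):
    """Terms whose score exceeds the class threshold in at least one class."""
    return {term
            for cls, per_term in scores.items()
            for term, score in per_term.items()
            if score > th[cls]}


def intersection_above(scores, th):
    """Terms whose score exceeds the class threshold in every class."""
    per_class = [{term for term, score in per_term.items() if score > th[cls]}
                 for cls, per_term in scores.items()]
    return set.intersection(*per_class) if per_class else set()


# scores[class][term]; the "web" values are made up for illustration.
scores = {
    "religion": {"bibl": 0.0897, "jesu": 0.0797, "web": 0.0010, "acustica": 0.0},
    "politics": {"lunch": 0.1200, "obama": 0.1113, "web": 0.0012, "acustica": 0.0},
    "health":   {"jama": 0.0495, "health": 0.0378, "web": 0.0008, "acustica": 0.0},
}
# Per-class thresholds as shown under the three columns.
th = {"religion": 0.0002, "politics": 0.0002, "health": 0.0001}

print(sorted(top_k_union(scores, 1)))
print(sorted(union_above(scores, th)))
print(sorted(intersection_above(scores, th)))
```

Under this reading, the union variants keep class-specific vocabulary, while the intersection keeps only terms that carry some signal for every class, such as the general-purpose terms listed in the middle of the table.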

Official runs




  Bag of clustered SURF features transformed
  using PCA
  • Result does not benefit from transformation

                          official run        without FS/FT
      mAP                       0.2301              0.2309
      CA                       41.63 %             41.71 %
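
For readers unfamiliar with the transformation used in this run, the following is a minimal sketch of PCA on a small feature matrix. It centres the data, builds the covariance matrix and finds only the leading eigenvector. Power iteration is used here as a dependency-free stand-in for a full eigenvalue decomposition, so it is not the authors' procedure, and the number of components they kept is not stated on the slide. All function names and the toy data are assumptions.

```python
import math


def covariance_matrix(rows):
    """Return (covariance matrix, mean-centred rows) for a list of feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered


def top_eigenvector(matrix, iterations=200):
    """Leading eigenvector of a symmetric matrix via power iteration."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero matrix: no direction of variance
            break
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """Coordinates of each centred row along one principal component."""
    return [sum(c[j] * component[j] for j in range(len(component)))
            for c in centered]


# Toy histograms (hypothetical) lying exactly on the line y = 2x.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
cov, centered = covariance_matrix(rows)
component = top_eigenvector(cov)
reduced = project(centered, component)
```

In practice a numerical library's eigen-decomposition would return all components at once; the sketch only shows where the eigenvectors of the covariance matrix enter the transformation.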




Official runs

  Bag of filtered ASR transcript terms (Union>Th)
  • Result does benefit from selection


                          official run        without FS/FT
      mAP                       0.1035              0.0522
      CA                       32.53 %             26.54 %




Official runs

  Bag of clustered SURF features filtered using MI
  and the Intersection>Th strategy
  • Result does benefit slightly from selection

                          official run        without FS/FT
      mAP                       0.2259              0.2221
      CA                       40.80 %             40.78 %




Official runs




  Bag of filtered terms derived from tags, title and
  descriptions (Union>Th)
  • Result does benefit from selection

                           official run        without FS/FT
       mAP                       0.5225              0.4146
       CA                       58.18 %             55.70 %




Official runs




  Bag of clustered SURF features transformed
  using PCA and decision fusion using uploader
  • Result does benefit from transformation

                          official run        without FS/FT
      mAP                       0.3304              0.2988
      CA                       52.14 %             49.19 %




Conclusion & Future Work

 FS showed potential for improving the results

 The choice between MI and TF is not critical; both
 methods achieve roughly the same results
    • Metadata (mAP): MI12004 (0.5277) vs. TF14976 (0.5275)

 Investigation of different scaling schemes (NB)

 Use of a class-independent selection score (MI)


Backup




Extracting visual features

  SURF features are extracted from each key frame
  • At keypoints and on a regular grid

  Vocabulary is built using hierarchical clustering
  on SURF features of the development set
  • 4096/8196 codewords

  Term vector for a single video is obtained by bin-wise
  pooling of each key frame's term vector
  • Average pooling (avg)
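
The pooling step can be sketched as follows, assuming each key frame has already been reduced to a list of codeword indices (the quantisation of SURF descriptors against the vocabulary is not shown). The function names and the tiny four-word vocabulary are hypothetical.

```python
def codeword_histogram(codeword_ids, vocab_size):
    """Term vector of one key frame: how often each visual word occurs."""
    hist = [0] * vocab_size
    for idx in codeword_ids:
        hist[idx] += 1
    return hist


def average_pool(frame_histograms):
    """Bin-wise average of the key-frame term vectors of one video."""
    if not frame_histograms:
        raise ValueError("need at least one key frame")
    n = len(frame_histograms)
    # zip(*...) groups the values of the same bin across all key frames.
    return [sum(bins) / n for bins in zip(*frame_histograms)]


# Two key frames of one video, quantised against a 4-word vocabulary.
frames = [
    codeword_histogram([0, 0, 2], vocab_size=4),
    codeword_histogram([1, 2, 2], vocab_size=4),
]
video_vector = average_pool(frames)
```

Averaging keeps the video-level vector on the same scale regardless of how many key frames a video has, which is one reason to prefer it over summing raw counts.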


MediaEval 2012: Tagging Task

 Question: What is each video's blip.tv category?
 Blip.tv database (CC-licensed): ~ 3300 h
  • 5288 training videos
  • 9550 test videos
 Official evaluation measure is Mean
 Average Precision (mAP)
 Workshop will be held 4-5 October 2012 in Pisa,
 Italy
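
To make the evaluation measure concrete, here is a small sketch of mean average precision over categories. It assumes that, for each category, the test videos are ranked by classifier confidence and that every relevant video appears somewhere in the ranking. The official task scoring may differ in details such as ranking depth; the function names and toy rankings are hypothetical.

```python
def average_precision(ranked_relevance):
    """AP for one category.

    ranked_relevance[i] is True if the video at rank i + 1 belongs to the category.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-category AP values."""
    return sum(average_precision(r) for r in rankings.values()) / len(rankings)


# Toy rankings for two categories (made-up relevance judgements).
rankings = {
    "sports": [True, False, True],   # AP = (1/1 + 2/3) / 2
    "health": [False, True],         # AP = (1/2) / 1
}
score = mean_average_precision(rankings)
```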
