SlideShare a Scribd company logo
1 of 8
Interactive and Context-Aware
Tag Spell Check and Correction
            F. Bonchi, Yahoo! Research Barcelona
   O. Frieder, Georgetown University, Washington DC, US
 F. M. Nardini, F. Silvestri, H. Vahabi, ISTI – CNR, Pisa, Italy
Motivations
Data Model:
Weighteg Tag Co-occurrence Graph
                 Consider the
           Neighborhoods of a tag’s
            contexts. Tags in these
          neighborhoods are ranked
            according to a measure
                   function.
Measures
Experimental Settings
• We assess the effectiveness and the efficiency of our
  spell check and correction technique.
• We run our experiments on a dataset of tags obtained
  by a crawl of YouTube videos:
   – 568,458 distinct videos
   – 5,817,896 tags in total
• After preprocessing the dataset by removing noisy tags
  (i.e., the ones having a low frequency) the dataset
  contains 4,357,971 tags, 434,156 of which are unique.
• Human editors to assess a random subset of 250 set of
  tags associated with the video.
Effectiveness
Efficiency
Conclusions
• We introduced a model for context-aware, interactive, tag spell
  check and correction:
    – our described approach exploits this context together with the
      available co-occurrences
• of tags in a set of resources to provide an interactive tag spell
  correction.
• Experiments on a dataset of tags coming from YouTube videos show
  that our method is effective and outperforms two important
  baselines.
• We also prove that the method is efficient as it is able to check and
  correct a tag in two milliseconds with a good trade-off in terms of
  average precision and average coverage.

In our future investigation, we intend to increase the degree of
interactivity of the system by enabling “as-you-type” tag spell check
and correction.

More Related Content

Similar to Interactive and Context-Aware Tag Spell Check and Correction

Deep neural networks for Youtube recommendations
Deep neural networks for Youtube recommendationsDeep neural networks for Youtube recommendations
Deep neural networks for Youtube recommendationsAryan Khandal
 
BU - Wellesely iGEM 2011 World Finals
BU - Wellesely iGEM 2011 World FinalsBU - Wellesely iGEM 2011 World Finals
BU - Wellesely iGEM 2011 World FinalsConsuelo Valdes
 
150219 agbt giab_poster_marc
150219 agbt giab_poster_marc150219 agbt giab_poster_marc
150219 agbt giab_poster_marcGenomeInABottle
 
ENTERTAINMENT CONTENT RECOMMENDATION SYSTEM USING MACHINE LEARNING
ENTERTAINMENT CONTENT RECOMMENDATION SYSTEM USING MACHINE LEARNINGENTERTAINMENT CONTENT RECOMMENDATION SYSTEM USING MACHINE LEARNING
ENTERTAINMENT CONTENT RECOMMENDATION SYSTEM USING MACHINE LEARNINGIRJET Journal
 
Agile Mumbai 2022 - Rohit Handa | Combining Human and Artificial Intelligence...
Agile Mumbai 2022 - Rohit Handa | Combining Human and Artificial Intelligence...Agile Mumbai 2022 - Rohit Handa | Combining Human and Artificial Intelligence...
Agile Mumbai 2022 - Rohit Handa | Combining Human and Artificial Intelligence...AgileNetwork
 
IRJET - Support Vector Machine versus Naive Bayes Classifier:A Juxtaposition ...
IRJET - Support Vector Machine versus Naive Bayes Classifier:A Juxtaposition ...IRJET - Support Vector Machine versus Naive Bayes Classifier:A Juxtaposition ...
IRJET - Support Vector Machine versus Naive Bayes Classifier:A Juxtaposition ...IRJET Journal
 
Metaphors as design points for collaboration 2012
Metaphors as design points for collaboration 2012Metaphors as design points for collaboration 2012
Metaphors as design points for collaboration 2012KM Chicago
 
A Research Paper on BFO and PSO Based Movie Recommendation System | J4RV4I1016
A Research Paper on BFO and PSO Based Movie Recommendation System | J4RV4I1016A Research Paper on BFO and PSO Based Movie Recommendation System | J4RV4I1016
A Research Paper on BFO and PSO Based Movie Recommendation System | J4RV4I1016Journal For Research
 
IRJET- Sentimental Analysis for Students’ Feedback using Machine Learning App...
IRJET- Sentimental Analysis for Students’ Feedback using Machine Learning App...IRJET- Sentimental Analysis for Students’ Feedback using Machine Learning App...
IRJET- Sentimental Analysis for Students’ Feedback using Machine Learning App...IRJET Journal
 
Content - Based Recommendations Enhanced with Collaborative Information
Content - Based Recommendations Enhanced with Collaborative InformationContent - Based Recommendations Enhanced with Collaborative Information
Content - Based Recommendations Enhanced with Collaborative InformationAlessandro Liparoti
 
Web Apollo Tutorial for the i5K copepod research community.
Web Apollo Tutorial for the i5K copepod research community.Web Apollo Tutorial for the i5K copepod research community.
Web Apollo Tutorial for the i5K copepod research community.Monica Munoz-Torres
 
Tags Prediction from Movie Plot Synopsis Using Machine Learning
Tags Prediction from Movie Plot Synopsis Using Machine LearningTags Prediction from Movie Plot Synopsis Using Machine Learning
Tags Prediction from Movie Plot Synopsis Using Machine LearningIRJET Journal
 
Content-based filtering with applications on tv viewing data
Content-based filtering with applications on tv viewing dataContent-based filtering with applications on tv viewing data
Content-based filtering with applications on tv viewing dataElaine Cecília Gatto
 
IRJET- Fusion Method for Image Reranking and Similarity Finding based on Topi...
IRJET- Fusion Method for Image Reranking and Similarity Finding based on Topi...IRJET- Fusion Method for Image Reranking and Similarity Finding based on Topi...
IRJET- Fusion Method for Image Reranking and Similarity Finding based on Topi...IRJET Journal
 
Visual Object Category Recognition
Visual Object Category RecognitionVisual Object Category Recognition
Visual Object Category RecognitionAshish Gupta
 
Proceedings Template - WORD
Proceedings Template - WORDProceedings Template - WORD
Proceedings Template - WORDbutest
 
IRJET- Semantic Analysis of Online Customer Queries
IRJET-  	  Semantic Analysis of Online Customer QueriesIRJET-  	  Semantic Analysis of Online Customer Queries
IRJET- Semantic Analysis of Online Customer QueriesIRJET Journal
 

Similar to Interactive and Context-Aware Tag Spell Check and Correction (20)

Deep neural networks for Youtube recommendations
Deep neural networks for Youtube recommendationsDeep neural networks for Youtube recommendations
Deep neural networks for Youtube recommendations
 
BU - Wellesely iGEM 2011 World Finals
BU - Wellesely iGEM 2011 World FinalsBU - Wellesely iGEM 2011 World Finals
BU - Wellesely iGEM 2011 World Finals
 
150219 agbt giab_poster_marc
150219 agbt giab_poster_marc150219 agbt giab_poster_marc
150219 agbt giab_poster_marc
 
ENTERTAINMENT CONTENT RECOMMENDATION SYSTEM USING MACHINE LEARNING
ENTERTAINMENT CONTENT RECOMMENDATION SYSTEM USING MACHINE LEARNINGENTERTAINMENT CONTENT RECOMMENDATION SYSTEM USING MACHINE LEARNING
ENTERTAINMENT CONTENT RECOMMENDATION SYSTEM USING MACHINE LEARNING
 
Agile Mumbai 2022 - Rohit Handa | Combining Human and Artificial Intelligence...
Agile Mumbai 2022 - Rohit Handa | Combining Human and Artificial Intelligence...Agile Mumbai 2022 - Rohit Handa | Combining Human and Artificial Intelligence...
Agile Mumbai 2022 - Rohit Handa | Combining Human and Artificial Intelligence...
 
IRJET - Support Vector Machine versus Naive Bayes Classifier:A Juxtaposition ...
IRJET - Support Vector Machine versus Naive Bayes Classifier:A Juxtaposition ...IRJET - Support Vector Machine versus Naive Bayes Classifier:A Juxtaposition ...
IRJET - Support Vector Machine versus Naive Bayes Classifier:A Juxtaposition ...
 
Metaphors as design points for collaboration 2012
Metaphors as design points for collaboration 2012Metaphors as design points for collaboration 2012
Metaphors as design points for collaboration 2012
 
A Research Paper on BFO and PSO Based Movie Recommendation System | J4RV4I1016
A Research Paper on BFO and PSO Based Movie Recommendation System | J4RV4I1016A Research Paper on BFO and PSO Based Movie Recommendation System | J4RV4I1016
A Research Paper on BFO and PSO Based Movie Recommendation System | J4RV4I1016
 
IRJET- Sentimental Analysis for Students’ Feedback using Machine Learning App...
IRJET- Sentimental Analysis for Students’ Feedback using Machine Learning App...IRJET- Sentimental Analysis for Students’ Feedback using Machine Learning App...
IRJET- Sentimental Analysis for Students’ Feedback using Machine Learning App...
 
Content - Based Recommendations Enhanced with Collaborative Information
Content - Based Recommendations Enhanced with Collaborative InformationContent - Based Recommendations Enhanced with Collaborative Information
Content - Based Recommendations Enhanced with Collaborative Information
 
Web Apollo Tutorial for the i5K copepod research community.
Web Apollo Tutorial for the i5K copepod research community.Web Apollo Tutorial for the i5K copepod research community.
Web Apollo Tutorial for the i5K copepod research community.
 
Tags Prediction from Movie Plot Synopsis Using Machine Learning
Tags Prediction from Movie Plot Synopsis Using Machine LearningTags Prediction from Movie Plot Synopsis Using Machine Learning
Tags Prediction from Movie Plot Synopsis Using Machine Learning
 
Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?
 
Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?
 
Estimating the overall sentiment score by inferring modus ponens law
Estimating the overall sentiment score by inferring modus ponens lawEstimating the overall sentiment score by inferring modus ponens law
Estimating the overall sentiment score by inferring modus ponens law
 
Content-based filtering with applications on tv viewing data
Content-based filtering with applications on tv viewing dataContent-based filtering with applications on tv viewing data
Content-based filtering with applications on tv viewing data
 
IRJET- Fusion Method for Image Reranking and Similarity Finding based on Topi...
IRJET- Fusion Method for Image Reranking and Similarity Finding based on Topi...IRJET- Fusion Method for Image Reranking and Similarity Finding based on Topi...
IRJET- Fusion Method for Image Reranking and Similarity Finding based on Topi...
 
Visual Object Category Recognition
Visual Object Category RecognitionVisual Object Category Recognition
Visual Object Category Recognition
 
Proceedings Template - WORD
Proceedings Template - WORDProceedings Template - WORD
Proceedings Template - WORD
 
IRJET- Semantic Analysis of Online Customer Queries
IRJET-  	  Semantic Analysis of Online Customer QueriesIRJET-  	  Semantic Analysis of Online Customer Queries
IRJET- Semantic Analysis of Online Customer Queries
 

Recently uploaded

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Recently uploaded (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Interactive and Context-Aware Tag Spell Check and Correction

  • 1. Interactive and Context-Aware Tag Spell Check and Correction F. Bonchi, Yahoo! Research Barcelona O. Frieder, Georgetown University, Washington DC, US F. M. Nardini, F. Silvestri, H. Vahabi, ISTI – CNR, Pisa, Italy
  • 3. Data Model: Weighteg Tag Co-occurrence Graph Consider the Neighborhoods of a tag’s contexts. Tags in these neighborhoods are ranked according to a measure function.
  • 5. Experimental Settings • We assess the effectiveness and the efficiency of our spell check and correction technique. • We run our experiments on a dataset of tags obtained by a crawl of YouTube videos: – 568,458 distinct videos – 5,817,896 tags in total • After preprocessing the dataset by removing noisy tags (i.e., the ones having a low frequency) the dataset contains 4,357,971 tags, 434,156 of which are unique. • Human editors to assess a random subset of 250 set of tags associated with the video.
  • 8. Conclusions • We introduced a model for context-aware, interactive, tag spell check and correction: – our described approach exploits this context together with the available co-occurrences • of tags in a set of resources to provide an interactive tag spell correction. • Experiments on a dataset of tags coming from YouTube videos show that our method is effective and outperforms two important baselines. • We also prove that the method is efficient as it is able to check and correct a tag in two milliseconds with a good trade-off in terms of average precision and average coverage. In our future investigation, we intend to increase the degree of interactivity of the system by enabling “as-you-type” tag spell check and correction.