FILTERING TWEETS RELATED TO AN ENTITY
TEAM: GROUP 8, PROJECT 19
• MALLIKARJUN B R(201307681)
• APRATIM UTKARSH(201305516)
• RISHABH LADHA(201101014)
• KARTIK DUBEY(201001117)
Introduction
• One of the major problems in monitoring the online reputation of companies is
deciding which entity a mention actually refers to.
• Given a tweet, we need to decide whether it refers to a particular entity or not.
• The problem is particularly hard in microblogging services such as Twitter.
APPROACH
• Supervised machine learning is used to decide whether a tweet belongs to an entity or
not.
• The dataset from RepLab is used, together with the home page and Wikipedia page of each entity.
• The pipeline pre-processes the above data and extracts features from it to
train an SVM.
• Test data goes through the same procedure; the output is predicted using the
weight vector obtained from the trained model.
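The prediction step described above can be sketched as a linear decision function. This is a minimal illustration, assuming a linear kernel (the slides do not state which kernel was used); the weights, bias, and feature values are hypothetical and are not taken from the trained model.

```python
from typing import Sequence


def predict(weights: Sequence[float], bias: float, features: Sequence[float]) -> int:
    """Return +1 (related) or -1 (unrelated) from the sign of w . x + b."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score >= 0 else -1


# Hypothetical weight vector over four similarity features (illustration only).
WEIGHTS = [2.0, -2.0, 1.0, 1.0]
BIAS = -0.5

related = predict(WEIGHTS, BIAS, [0.8, 0.1, 0.4, 0.3])    # score = 1.6
unrelated = predict(WEIGHTS, BIAS, [0.1, 0.7, 0.0, 0.1])  # score = -1.6
```

With a non-linear kernel the model is not reducible to a single weight vector, so this form applies only under the linear-kernel assumption.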
Architecture: Training
Architecture: Testing
Pre-Processing
• Extract user mentions and URLs
• Convert hashtags to words by removing the hash symbol
• Remove all punctuation
• Convert text to lower case
• Remove accents and convert non-ASCII characters to their ASCII equivalents
• Remove stop-words based on the list of stop words for English.
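The steps above can be sketched in a few lines of Python using only the standard library. The stop-word set here is a tiny illustrative stand-in (the slides do not name the list used), and the regular expressions for mentions and URLs are assumptions rather than the project's actual patterns.

```python
import re
import string
import unicodedata

# Tiny illustrative stop-word list (assumption; not the list used in the project).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "on", "and", "for", "from"}

MENTION_RE = re.compile(r"@\w+")      # assumed pattern for user mentions
URL_RE = re.compile(r"https?://\S+")  # assumed pattern for URLs


def preprocess(tweet: str):
    """Return (tokens, mentions, urls) for a single tweet."""
    # Extract user mentions and URLs, then take them out of the text.
    mentions = MENTION_RE.findall(tweet)
    urls = URL_RE.findall(tweet)
    text = MENTION_RE.sub(" ", URL_RE.sub(" ", tweet))
    # Convert hashtags to words by removing the hash symbol.
    text = text.replace("#", "")
    # Remove ASCII punctuation and convert to lower case.
    text = text.translate(str.maketrans("", "", string.punctuation)).lower()
    # Strip accents; characters without an ASCII equivalent are dropped.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Remove stop words.
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return tokens, mentions, urls


tokens, mentions, urls = preprocess(
    "Loving the new #iPhone from @Apple! Caf\u00e9 time http://t.co/abc"
)
# tokens   -> ['loving', 'new', 'iphone', 'cafe', 'time']
# mentions -> ['@Apple']
# urls     -> ['http://t.co/abc']
```

Mentions and URLs are extracted before punctuation is removed, since stripping punctuation first would destroy the "@" and "://" markers that identify them.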
Features
• Similarity w.r.t related tweets
• Similarity w.r.t unrelated tweets
• Keyword similarity using the WordNet database
• Web similarity
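The first two features can be illustrated as follows. The slides do not define the similarity measure, so this sketch assumes cosine similarity over term counts, comparing a tweet against the pooled vocabulary of tweets labelled related and unrelated; the function names and toy data are hypothetical. The WordNet and web features need external resources and are not shown.

```python
import math
from collections import Counter
from typing import Iterable, List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def profile(token_lists: Iterable[List[str]]) -> Counter:
    """Pool the tokens of several tweets into one term-count vector."""
    counts: Counter = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts


def similarity_features(tweet_tokens: List[str],
                        related: List[List[str]],
                        unrelated: List[List[str]]) -> List[float]:
    """Return [similarity to related tweets, similarity to unrelated tweets]."""
    tweet = Counter(tweet_tokens)
    return [cosine(tweet, profile(related)), cosine(tweet, profile(unrelated))]


# Toy training tweets for the ambiguous name "apple" (made up for illustration).
related = [["apple", "iphone", "launch"], ["new", "iphone"]]
unrelated = [["apple", "pie", "recipe"], ["fresh", "apple", "juice"]]

feats = similarity_features(["new", "iphone", "review"], related, unrelated)
```

Here the tweet shares "new" and "iphone" with the related profile and nothing with the unrelated one, so the first feature is positive and the second is zero.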
Tools Used
• CMU POS-Tagger
-http://www.ark.cs.cmu.edu/TweetNLP/
• Stanford CoreNLP (POS Tagger and Lemmatizer):
-http://nlp.stanford.edu/software/corenlp.shtml
• WordNet
-http://lyle.smu.edu/~tspell/jaws/index.html?utm_source=twitterfeed&utm_medium=twitter
• Jsoup Parser
-http://jsoup.org/
• LIBSVM (For Multi-Class Classification)
-http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Evaluation and Results
• The corpus consists of tweets and a list of 61 entities.
• A model was trained for each entity separately using LIBSVM.
• Using the test data for each entity, we calculated the accuracy for the entire dataset.
• Per-entity accuracy varies from 40% to 96%. Overall accuracy is 80%.
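The evaluation above can be sketched as follows. The slides do not say whether the overall figure pools all test tweets or averages the per-entity scores, so this sketch computes both; the entity names and label pairs are toy data, not the project's results.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (gold label, predicted label)


def accuracy(pairs: List[Pair]) -> float:
    """Fraction of pairs where the prediction matches the gold label."""
    if not pairs:
        raise ValueError("cannot compute accuracy on an empty list")
    return sum(1 for gold, pred in pairs if gold == pred) / len(pairs)


def evaluate(results: Dict[str, List[Pair]]):
    """Return (per-entity accuracy, pooled accuracy, mean of per-entity accuracies)."""
    per_entity = {entity: accuracy(pairs) for entity, pairs in results.items()}
    pooled = accuracy([pair for pairs in results.values() for pair in pairs])
    macro = sum(per_entity.values()) / len(per_entity)
    return per_entity, pooled, macro


# Toy predictions for two hypothetical entities.
results = {
    "entity_a": [(1, 1), (1, 1), (-1, -1), (-1, 1)],
    "entity_b": [(1, -1), (-1, -1)],
}
per_entity, pooled, macro = evaluate(results)
```

The two overall figures differ whenever entities have different numbers of test tweets, which is why it matters which one a reported number refers to.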
Conclusion
• In this work we tackled the problem of company name disambiguation on Twitter.
• The main goal of this task was to classify tweets as relevant or not to a given
target entity.
• We explored several types of features, namely similarity between keywords and
TF-IDF of n-grams, as well as external resources such as Freebase
and Wikipedia.
• Results show that it is possible to achieve an accuracy over 0.90 for individual entities.
THANKS

IRE2014 Filtering Tweets Related to an entity
