Recognizing Organizations in Twitter Posts
Supervisors: Prof. Dr. Dan Cristea, Dr. Diana Trandabăţ
Graduate: Elena-Oana Tăbăranu
Introduction
• Twitter: a website founded in 2006 that lets its users post short
  messages of at most 140 characters
• 500,000 new accounts created per day and 140 million posts per day
  in March 2011 (Twitter, 2011)
• 19% of posts mention a company or product name; of the posts that
  express a sentiment about the brand, over 50% are positive and 33%
  are explicitly critical (Jansen, 2009)

UAIC, Faculty of Computer Science
System description
• For a company whose name is ambiguous, the system classifies all
  available posts: those that refer to that organization are labelled
  positive (true), and those that refer to something else are labelled
  negative (false).
• Example: (figure not preserved)


Input data
• The organizers of the WePS Evaluation Workshop: Searching Information
  about Entities in the Web competition provided 500 names and 700
  Twitter posts for each company, in English, Spanish or both.




Modules
1. Extraction of company profiles
2. Classification of Twitter posts



Extraction of company profiles
Algorithm steps:
1. Fetch the home page of the organization's website
2. Extract keywords from the home page: title, metadata, headings,
   links
3. Save the extracted information (the 25 most frequent words)
4. Extend the profile with terms suggested by Google Sets
   (optional)
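A minimal Python sketch of steps 2 and 3, using only the standard library. It assumes the home page HTML has already been downloaded (step 1) and reads "metadata" as the `keywords` and `description` `<meta>` tags; the stop-word list, the tokenization rule and the names `KeywordParser` and `build_profile` are illustrative assumptions, not the original system's code. The optional Google Sets extension (step 4) is not reproduced.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stop-word list; the presentation does not say which one was used.
STOP_WORDS = {"the", "and", "for", "with", "our", "your", "you", "are", "from", "this", "that"}


class KeywordParser(HTMLParser):
    """Collects text from <title>, <h1>-<h6> and <a> elements, plus the
    content attribute of <meta name="keywords"> and <meta name="description">."""

    TEXT_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6", "a"}

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open elements whose text we keep
        self.chunks = []     # raw text fragments that feed the profile

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if (attrs.get("name") or "").lower() in {"keywords", "description"}:
                self.chunks.append(attrs.get("content") or "")
        elif tag in self.TEXT_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop up to and including the most recent matching open tag.
            while self.open_tags:
                if self.open_tags.pop() == tag:
                    break

    def handle_data(self, data):
        if self.open_tags:  # keep text only while inside a tracked element
            self.chunks.append(data)


def build_profile(html, size=25):
    """Return the `size` most frequent keywords (ties keep first-seen order)."""
    parser = KeywordParser()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOP_WORDS)
    return [w for w, _ in counts.most_common(size)]


if __name__ == "__main__":
    page = ('<html><head><title>Alcatel-Lucent | Networks</title>'
            '<meta name="keywords" content="networks, telecom, broadband"></head>'
            '<body><h1>Broadband networks</h1><p>ignored paragraph text</p>'
            '<a href="/x">Telecom solutions</a></body></html>')
    print(build_profile(page))
```

Restricting the text to these elements keeps body paragraphs out of the profile, which matches the slide's list of sources; counting raw frequencies treats every source equally.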




Classification of Twitter posts
Algorithm steps:
1. Extract the information for each post entity: company, identifier,
   language and content.
2. Compute the label of each post (true or false):
   1. Clean the post.
   2. Compute the similarity between the post and the profile: simple
      matching, Levenshtein distance, WordNet.
3. Save the posts labelled true.
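The sketch below illustrates step 2 with two of the three similarity measures: simple (exact) matching, and matching that tolerates a small Levenshtein distance. The cleaning rules, the `threshold` and `max_distance` parameters, and the choice to ignore the company name itself (assumed to appear in every candidate post) are hypothetical; the WordNet variant is omitted because it needs an external resource such as NLTK's WordNet corpus.

```python
import re


def clean_post(text):
    """Hypothetical cleaning: lower-case, drop URLs and @mentions,
    strip '#' from hashtags, keep alphanumeric tokens only."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    text = text.replace("#", " ")
    return re.findall(r"[a-z0-9]+", text)


def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def similarity(tokens, profile, max_distance=0):
    """Count post tokens matching some profile word.
    max_distance=0 is simple matching; > 0 also accepts near-misses."""
    profile = set(profile)
    hits = 0
    for tok in tokens:
        if tok in profile or (max_distance > 0 and
                              any(levenshtein(tok, p) <= max_distance for p in profile)):
            hits += 1
    return hits


def label_post(text, profile, company, max_distance=0, threshold=1):
    """True if the cleaned post shares at least `threshold` words with the profile.
    The (single-word) company name is excluded, since every candidate post contains it."""
    tokens = [t for t in clean_post(text) if t != company.lower()]
    return similarity(tokens, profile, max_distance) >= threshold


if __name__ == "__main__":
    profile = ["networks", "telecom", "broadband", "alcatel", "lucent", "solutions"]
    post = "Alcatel rolls out new brodband network"
    print(label_post(post, profile, "alcatel"))                  # False: no exact match
    print(label_post(post, profile, "alcatel", max_distance=1))  # True: "brodband", "network" are 1 edit away
```

Exact matching misses misspellings and inflected forms that are common in short posts; a distance threshold of 1 recovers some of them at the cost of more comparisons per token.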




Results (I)

Configuration                   F-measure (alpha=0.5)   Precision   Recall
Full WordNet                    0.14                    0.65        0.15
Partial WordNet                 0.10                    0.63        0.09
Simple matching + Google Sets   0.08                    0.62        0.07
Simple matching                 0.03                    0.59        0.03




Results (II)

runName                         query     F-measure (alpha=0.5)   precision   recall   numSamples   true_true   true_false   false_false   false_true
Partial WordNet                 alcatel   0.38                    0.38        0.38     481          173         287          10            7
Full WordNet                    alcatel   0.55                    0.54        0.55     481          252         208          9             8
Simple matching                 alcatel   0.26                    0.27        0.25     481          115         345          16            1
Simple matching + Google Sets   alcatel   0.33                    0.34        0.33     481          151         309          12            5
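The slide does not define the columns. One reading that reproduces all four rows to two decimals is: precision = (true_true + false_false) / numSamples, recall = true_true / (true_true + true_false), and F = van Rijsbergen's F-measure with alpha = 0.5 (the harmonic mean of the two). The sketch recomputes the table under that assumption; `scores_from_counts` is a hypothetical helper, and the result is a consistency check, not the official WePS-3 scoring.

```python
def f_measure(precision, recall, alpha=0.5):
    """Van Rijsbergen's F-measure: 1 / (alpha/P + (1 - alpha)/R).
    With alpha = 0.5 this equals the harmonic mean of P and R."""
    if precision == 0 or recall == 0:
        return 0.0
    return 1.0 / (alpha / precision + (1 - alpha) / recall)


def scores_from_counts(true_true, true_false, false_false, false_true, num_samples):
    """ASSUMED reading of the table columns (not stated on the slide):
    precision column = (true_true + false_false) / num_samples
    recall column    = true_true / (true_true + true_false)
    false_true is not used by either formula under this reading."""
    precision = (true_true + false_false) / num_samples
    recall = true_true / (true_true + true_false)
    return precision, recall, f_measure(precision, recall)


# Counts copied from the Results (II) table (query "alcatel", numSamples = 481).
RUNS = {
    "Partial WordNet": (173, 287, 10, 7),
    "Full WordNet": (252, 208, 9, 8),
    "Simple matching": (115, 345, 16, 1),
    "Simple matching + Google Sets": (151, 309, 12, 5),
}

if __name__ == "__main__":
    for name, counts in RUNS.items():
        p, r, f = scores_from_counts(*counts, num_samples=481)
        print(f"{name:30} F={f:.2f} P={p:.2f} R={r:.2f}")
```

The same formula should not be expected to reproduce Results (I): those figures summarize all companies, and an average of per-company F-measures generally differs from the F-measure of averaged precision and recall.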




Conclusions
• A company's Twitter posts contain both terms that rarely appear on
  its home page (names and products of competing companies) and words
  denoting concepts similar to those in the profile.
• The task is difficult: posts contain few words, so only minimal
  context is available for disambiguating the entities.

Improvements
• A company profile could include words from several sources
  (Wikipedia, DBpedia)
• Tags could be given different weights
• Posts identified as positive for a company could serve as the
  corpus of a sentiment analysis system
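If "tags" in the second bullet refers to the HTML elements used during profile extraction (title, metadata, headings, links), the weighting could be sketched as below. The weight values, the `words_by_tag` input format and the `weighted_profile` helper are hypothetical; the idea is that a word seen once in the title can outrank a word repeated in navigation links.

```python
from collections import Counter

# Hypothetical weights per source element; the slide only suggests weighting.
TAG_WEIGHTS = {"title": 3.0, "meta": 2.0, "h1": 2.0, "a": 1.0}


def weighted_profile(words_by_tag, weights=TAG_WEIGHTS, size=25, default_weight=1.0):
    """words_by_tag maps a tag name to the words extracted from that tag.
    Each occurrence adds the tag's weight instead of 1; unknown tags use default_weight."""
    scores = Counter()
    for tag, words in words_by_tag.items():
        w = weights.get(tag, default_weight)
        for word in words:
            scores[word] += w
    return [word for word, _ in scores.most_common(size)]


if __name__ == "__main__":
    extracted = {"title": ["alcatel"], "a": ["login", "login"], "h1": ["networks"]}
    print(weighted_profile(extracted, size=1))              # title word wins
    print(weighted_profile(extracted, weights={}, size=1))  # plain counts: most repeated word wins
```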



Bibliography
• WePS-3: Searching Information about Entities in the Web. [Online]
  http://nlp.uned.es/weps/weps-3
• Surender Reddy Yerva, Zoltan Miklos, Karl Aberer. It Was Easy, When
  Apples and Blackberries Were Only Fruits.
• Paul Kalmar. Bootstrapping Websites for Classification of
  Organization Names on Twitter.
• M.A. Garcia-Cumbreras, M. Garcia-Vega, F. Martinez-Santiago,
  J.M. Perea-Ortega. SINAI at WePS-3: Online Reputation Management.
• Twitter Blog. [Online] http://blog.twitter.com/2011/03/numbers.html
• Bernard J. Jansen, Mimi Zhang, Kate Sobel, Abdur Chowdury. Twitter
  Power: Tweets as Electronic Word of Mouth. Journal of the American
  Society for Information Science and Technology, 60(11):2169–2188,
  2009.
• Sergey Brin, Lawrence Page. The Anatomy of a Large-Scale Hypertextual
  Web Search Engine. Computer Science Department, Stanford University,
  Stanford, CA 94305, USA, 1998.
