Spanish Corpus for Sentiment Analysis towards Brands
1. Spanish Corpus for Sentiment Analysis towards Brands
María Navas-Loro, Víctor Rodríguez-Doncel,
Idafen Santana-Perez, Alberto Sánchez
Technical University of Madrid
mnavas@fi.upm.es
SPECOM, 14th September 2017
5. Main problem
Although corpora can be found in several domains, such as medicine or tourism, or for general opinions...
… there is none for opinions towards brands in Spanish!
16. Corpus building
Pipeline: 1. Selection of the brands | 2. Acquisition of tweets | 3. Sifting | 4. Tagging | 5. Transformation
1. Selection of the brands
Sector        Brands
BEVERAGES     Cruzcampo, Heineken, Estrella Galicia, Mahou
AUTOMOTIVE    Citroën, Fiat, Hyundai, Kia, Peugeot, Toyota
BANKING       Bankia, Bankinter, BBVA, Sabadell, ING, La Caixa/Caixabank, Santander
FOOD          Auchan, Bimbo, Hacendado, Milka, Pascual, Puleva
RETAIL        Alcampo, Carrefour, Decathlon, Ikea, Leroy Merlin, Mediamarkt, Mercadona
TELECOM       Amena, Lowi, Movistar, Orange, Vodafone, Yoigo
SPORTS        Adidas, Nike, Reebok
17. Corpus building
2. Acquisition of tweets
• Collected from Twitter between 1 and 7 February 2017.
• Using only keywords related to the brand names.
• Applying a single filter: Spanish-language tweets only.
• Excluding retweets.
• At the end of this step, more than 23,000 tweets remained.
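The acquisition filters above can be combined into one predicate over tweet records. This is a minimal sketch, not the authors' collection code: it assumes a simplified, hypothetical tweet dictionary with `text`, `lang`, `created_at` and `is_retweet` keys (not the Twitter API's field names), a hypothetical subset of the brand names from the table, a UTC collection window, and case-insensitive substring matching of keywords.

```python
from datetime import datetime, timezone

# Hypothetical subset of brand keywords; the real list covers every brand in the table.
BRAND_KEYWORDS = ("movistar", "mahou", "ikea", "nike", "leroy merlin", "amena")

# Collection window 1-7 February 2017, assumed UTC; the upper bound is exclusive.
WINDOW_START = datetime(2017, 2, 1, tzinfo=timezone.utc)
WINDOW_END = datetime(2017, 2, 8, tzinfo=timezone.utc)


def keep_tweet(tweet: dict) -> bool:
    """Return True if a (hypothetical) tweet record passes every acquisition filter."""
    if tweet["is_retweet"]:  # retweets are excluded
        return False
    if tweet["lang"] != "es":  # the single filter: Spanish-language tweets only
        return False
    if not (WINDOW_START <= tweet["created_at"] < WINDOW_END):
        return False
    text = tweet["text"].lower()
    # Keywords are just the brand names, matched as substrings of the lowercased text.
    return any(keyword in text for keyword in BRAND_KEYWORDS)


if __name__ == "__main__":
    sample = {
        "text": "Hoy abren la nueva tienda de Leroy Merlin",
        "lang": "es",
        "created_at": datetime(2017, 2, 3, 10, 30, tzinfo=timezone.utc),
        "is_retweet": False,
    }
    print(keep_tweet(sample))  # prints True
```

Plain keyword matching deliberately over-collects: "amena" is also an ordinary Spanish adjective and Sabadell is also a city, so tweets like these still have to be screened out in the Sifting step.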
18. Corpus building
3. Sifting
• Manual and automatic screening removed:
• Repeated tweets.
• The same tweet re-posted with a different URL.
• Tweets not about the actual brand (polysemous names).
• Tweets in other languages…
• The final corpus contains 4,548 tweets.
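The automatic part of this screening (exact repeats, and the same tweet re-posted with a different URL) could look like the following minimal sketch. It assumes that two tweets count as repeats when they are identical after removing URLs, lowercasing and collapsing whitespace; the polysemy and language checks are not modelled here.

```python
import re

# Rough URL matcher: any http(s) link up to the next whitespace.
URL_PATTERN = re.compile(r"https?://\S+")


def normalize(text: str) -> str:
    """Drop URLs, lowercase and collapse whitespace so URL-only variants compare equal."""
    without_urls = URL_PATTERN.sub("", text)
    return " ".join(without_urls.lower().split())


def remove_duplicates(texts: list[str]) -> list[str]:
    """Keep the first occurrence of each tweet, treating URL-only variants as repeats."""
    seen: set[str] = set()
    kept = []
    for text in texts:
        key = normalize(text)
        if key not in seen:
            seen.add(key)
            kept.append(text)
    return kept


if __name__ == "__main__":
    tweets = [
        "Oferta en Ikea https://t.co/abc",
        "Oferta en Ikea https://t.co/xyz",
        "Otra cosa",
    ]
    print(remove_duplicates(tweets))
```

Keeping the first occurrence preserves the original URL of the earliest post; choosing a different representative would only change which copy survives, not the count.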
19. Corpus building
4. Tagging
• BEVERAGES was tagged by three taggers; the whole corpus by one.
• Tags are assigned per post, and a single post may carry several emotions.
• The taggers followed specific criteria, which are available with the corpus along with inter-annotator agreement figures (Kappa). Example criteria:
HAPPINESS is only given to products already acquired, not to future
purchases.
If a desired product is not found, SATISFACTION and SADNESS are
tagged.
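The slide mentions inter-annotator agreement and Kappa values without naming the variant. As an illustration only, Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), can be computed between two taggers for one emotion tag by treating each post as a binary item (tag present or absent); with three taggers one could average the pairwise values or use Fleiss' kappa instead. This sketch is an assumption, not the corpus's published procedure.

```python
from collections import Counter


def cohens_kappa(labels_a: list, labels_b: list) -> float:
    """Cohen's kappa between two taggers who labelled the same items, in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both taggers must label the same, non-empty set of items")
    n = len(labels_a)
    # Observed agreement p_o: share of items where both taggers chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement p_e: product of each tagger's label frequencies, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both taggers used one identical label throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # 1 = the tagger assigned SADNESS to the post, 0 = did not (hypothetical data).
    tagger_1 = [1, 1, 0, 0]
    tagger_2 = [1, 0, 0, 0]
    print(cohens_kappa(tagger_1, tagger_2))  # prints 0.5
```

Here p_o = 0.75 and p_e = (2·1 + 2·3) / 16 = 0.5, so κ = 0.25 / 0.5 = 0.5.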
20. Corpus building
5. Transformation
• The corpus is represented as Linked Data, creating new classes where needed and reusing the following ontologies:
• Marl and Onyx.
• SIOC.
• GoodRelations.
• It is also linked to external databases such as:
• Thomson Reuters’ PermID.
• DBpedia.
23. SAB corpus connections
The SAB corpus links to external resources and defines its own vocabulary while also reusing several existing ontologies.
Figure: SAB vocabulary. Dimensions linked to each Post:
• Meaningful Brands: Marketplace, Personal Wellbeing, Collective Wellbeing.
• Marketing Mix: Product, Price, Promotion, Place.
• Sentiment Analysis (emotions): Hate / Love, Satisfaction / Dissatisfaction, Happiness / Sadness, Trust / Fear.
• Purchase Funnel: Awareness, Evaluation, Purchase, Postpurchase, Review.
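To make these dimensions concrete, the following minimal sketch encodes them as a dictionary of allowed values and checks one post's annotation against it. The keys mirror the properties used in the example post (lps:isInPurchaseFunnel, lps:hasMarketingMix, lps:hasMeaningfulBrand, onyx:hasEmotion); apart from the values that appear in that example, local names such as `personalWellbeing` are hypothetical spellings of the figure's labels.

```python
# Hypothetical encoding of the SAB dimensions; value spellings not seen in the
# example post are assumptions, not the corpus's actual URIs.
SAB_VOCABULARY: dict[str, set[str]] = {
    "isInPurchaseFunnel": {"awareness", "evaluation", "purchase", "postPurchase", "review"},
    "hasMarketingMix": {"product", "price", "promotion", "place"},
    "hasMeaningfulBrand": {"marketplace", "personalWellbeing", "collectiveWellbeing"},
    "hasEmotion": {
        "hate", "love", "satisfaction", "dissatisfaction",
        "happiness", "sadness", "trust", "fear",
    },
}


def invalid_values(annotation: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return the (property, value) pairs of an annotation that fall outside the vocabulary."""
    problems = []
    for prop, values in annotation.items():
        allowed = SAB_VOCABULARY.get(prop, set())  # unknown properties allow no values
        for value in values:
            if value not in allowed:
                problems.append((prop, value))
    return problems


if __name__ == "__main__":
    post = {
        "isInPurchaseFunnel": ["postPurchase"],
        "hasMarketingMix": ["price"],
        "hasMeaningfulBrand": ["marketplace"],
        "hasEmotion": ["hate", "dissatisfaccion"],
    }
    print(invalid_values(post))  # prints [('hasEmotion', 'dissatisfaccion')]
```

A check like this catches misspelled values before they become dangling resources in the generated RDF.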
24. Example of a post
lps:826812979421257730 a sioc:Post ;
sioc:id "826812979421257730" ;
sioc:content "Ya me quede sin credito?? Hace 3 dias tengo credito nomas... Movistar y la concha de tu hermana"@es ;
marl:describesObject lps:Movistar ;
lps:isInPurchaseFunnel lps:postPurchase;
lps:hasMarketingMix lps:price;
lps:hasMeaningfulBrand lps:marketplace;
onyx:hasEmotion lps:hate, lps:dissatisfaction ;
marl:hasPolarity marl:negative ;
marl:forDomain "TELCO" .
lps:hate a onyx:Emotion ;
rdfs:label "odio"@es, "hate"@en .
lps:dissatisfaction a onyx:Emotion ;
rdfs:label "insatisfaccion"@es, "dissatisfaction"@en .
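A post like the one above can be generated from an annotated tweet with plain string formatting. This is a minimal sketch, not the corpus's conversion tool: the function name and parameters are hypothetical, it assumes the lps:, sioc:, marl: and onyx: prefixes are declared elsewhere, and it omits the lps: marketing properties. It escapes the tweet text because a double-quoted Turtle literal may not contain a raw line break, quote or backslash.

```python
def escape_literal(text: str) -> str:
    """Escape a string for use inside a double-quoted Turtle literal."""
    return (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def post_to_turtle(tweet_id: str, content: str, brand: str,
                   emotions: list[str], polarity: str, domain: str) -> str:
    """Render one annotated tweet as a Turtle block (hypothetical helper)."""
    emotion_list = ", ".join(f"lps:{emotion}" for emotion in emotions)
    return "\n".join([
        f"lps:{tweet_id} a sioc:Post ;",
        f'  sioc:id "{tweet_id}" ;',
        f'  sioc:content "{escape_literal(content)}"@es ;',
        f"  marl:describesObject lps:{brand} ;",
        f"  onyx:hasEmotion {emotion_list} ;",
        f"  marl:hasPolarity marl:{polarity} ;",
        f'  marl:forDomain "{domain}" .',
    ])


if __name__ == "__main__":
    print(post_to_turtle("826812979421257730", "Ya me quede sin credito??",
                         "Movistar", ["hate", "dissatisfaction"], "negative", "TELCO"))
```

For production use, an RDF library that handles prefixes and escaping would be more robust than hand-built strings.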
25. Example of information for a brand and a company
lps:Movistar a gr:Brand ;
rdfs:seeAlso <http://dbpedia.org/resource/Movistar> ;
rdfs:label "Movistar" .
lps:1-5000062703 a gr:Business ;
rdfs:label "Telefonica de Espana, S.A.U.";
rdfs:seeAlso <https://opencorporates.com/companies/es/82018474> ;
owl:sameAs permid:1-5000062703 .
27. Contributions and future lines
Some contributions of the SAB corpus:
• It fills a gap in Spanish sentiment analysis resources.
• It offers a language-independent representation for sentiment analysis towards brands.
• It offers Linked Data information that:
• Keeps the corpus from becoming outdated (for instance, when names change).
• Provides data about brands beyond the text (CEOs…).
Future lines:
• Full annotation of all the aspects.
• More links.
• More tweets.
• Semantic annotation of emotional keywords.
28. Bibliography
Breslin, J.G., Decker, S., et al.: SIOC: an approach to connect web-based communities. International Journal of Web Based Communities 2(2), 133-142 (2006)
Sanchez Rada, J.F., Torres, M., et al.: A linked data approach to sentiment and emotion analysis of Twitter in the financial domain. In: 2nd International Workshop on Finance and Economics on the Semantic Web (2014)
Hepp, M.: GoodRelations: An ontology for describing products and services offers on the web. In: International Conference on Knowledge Engineering and Knowledge Management, pp. 329-346. Springer (2008)
Thomson Reuters’ PermID: https://permid.org/
DBpedia: http://dbpedia.org/
29. Link to the corpus
http://sabcorpus.linkeddata.es/
Thank you for your attention