1. [Big Data] Simple Exercise of Consumer
Preferences Analysis Based on Tweets
for SAB Miller LATAM Brands By Gustavo Pabón – May 2014
2. Our group mission:
To own and nurture local and
international brands that are the
first choice of the consumer
5. Our group mission:
To own and nurture local and
international brands that are the
first choice of the consumer
How can consumer preferences be measured?
Twitter may help.
7. This deck presents the results of a simple
consumer preference analysis based on tweets
collected from April 2, 2014 to April 26, 2014.
On a scale from 1 to 5*, the weighted average consumer
preference for SAB Miller LATAM brands was:
4.76
* The scale is explained on the next slide.
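One possible reading of the "weighted average" above is a mean of per-group ratings weighted by the number of tweets in each group. The deck does not say which weights were used, so the sketch below assumes tweet counts. The function name and the sample numbers are hypothetical and are not data from the study.

```python
from typing import Iterable, Tuple


def weighted_average(groups: Iterable[Tuple[float, int]]) -> float:
    """Mean rating weighted by tweet count.

    Each item is (average_rating, tweet_count). Weighting by tweet count
    is an assumption; the deck does not state the weights it used.
    """
    groups = list(groups)
    total = sum(count for _, count in groups)
    if total == 0:
        raise ValueError("no tweets to average")
    return sum(rating * count for rating, count in groups) / total


# Hypothetical illustration only (not figures from the study).
example = [(5.0, 30), (3.0, 10)]
print(round(weighted_average(example), 2))
```

Under this reading, a group with many tweets pulls the overall figure toward its own rating more than a group with few tweets.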
8. © SABMiller plc 2012
Internal Use / Confidential / Secret
Exercise Summary
Tweet sample: streaming date range and filter
Tweets were streamed from April 2, 2014 to April 26, 2014, with a gap from April 12 to
April 15, 2014. The gap was due to a technical issue in the streaming program. The
filter was based on the keywords listed on the next two slides.
Tweet sample size
The raw data set contained 33,853 tweets.
The first selection step removed tweets not related to SAB Miller LATAM / Global
Brands using a bag-of-words technique. This reduced the sample to 20,044 tweets.
The second selection step removed tweets not related to consumer preferences
using crowdsourcing (Amazon Mechanical Turk). This reduced the sample to 3,669
tweets.
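The size of each reduction can be checked with a few lines of arithmetic. The sketch below uses only the three counts reported on this slide; the constant and function names are chosen here for illustration.

```python
# Tweet counts reported on this slide.
RAW = 33_853            # tweets streamed
BRAND_RELATED = 20_044  # left after the bag-of-words brand filter
PREFERENCE = 3_669      # left after the crowdsourced preference filter


def retention(kept: int, total: int) -> float:
    """Share of `total` that survived a filtering step, in percent."""
    return 100.0 * kept / total


step1 = retention(BRAND_RELATED, RAW)
step2 = retention(PREFERENCE, BRAND_RELATED)
overall = retention(PREFERENCE, RAW)

print(f"Brand filter kept      {step1:.1f}% of the raw tweets")
print(f"Preference filter kept {step2:.1f}% of the brand-related tweets")
print(f"Overall                {overall:.1f}% of the raw tweets were rated")
```

Roughly one streamed tweet in nine ends up in the rated sample, which is worth keeping in mind when reading the per-country and per-brand figures.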
Consumer preference scale from 1 to 5
Using crowdsourcing, each tweet was classified by 10 different people into three
categories: 1 (negative preference), 3 (neutral), 5 (positive preference). If more than 6
people agreed on a preference, the tweet was assigned that preference; otherwise, the
tweet was classified as neutral.
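The consolidation rule described above can be sketched as follows. This is a minimal illustration, assuming each rater's vote arrives as one of the scale values 1, 3 or 5; the function and constant names are hypothetical.

```python
from collections import Counter
from typing import Sequence

NEGATIVE, NEUTRAL, POSITIVE = 1, 3, 5


def consolidate(votes: Sequence[int], min_agree: int = 7) -> int:
    """Return the label chosen by more than 6 raters, else NEUTRAL.

    With 10 raters, "more than 6" means at least 7 matching votes.
    """
    if not votes:
        return NEUTRAL
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else NEUTRAL


print(consolidate([POSITIVE] * 7 + [NEGATIVE] * 3))  # 7 raters agree
print(consolidate([POSITIVE] * 6 + [NEGATIVE] * 4))  # only 6 agree
```

The first call returns 5 because seven raters agree; the second falls back to 3 (neutral) because six matching votes do not clear the threshold.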
Why crowdsourcing?
It is very difficult for an automatic sentiment analysis program to work with tweets. They
are often poorly written and contain a lot of slang and sarcasm. In addition, Spanish
internet language is not as well studied as English. Human raters typically agree 79% of the
time*, while a program is at most 70% accurate. The first run of an automatic
sentiment analysis was able to classify just 151 tweets.
* Taken from Ogneva, M. "How Companies Can Use Sentiment Analysis to Improve Their Business". Retrieved 2012-12-13.
9. Keywords used for streaming filter (1 of 2)
Global brands
@Grolsch, #Grolsch, @Miller_Global, @MillerCoors, #MillerGenuineDraft, @Birra_Peroni, @peroniclub,
#PeroniNastroAzzurro, #peroni, @Pilsner_Urquell, #PilsnerUrquell, @MillerLite, #millerlite, @MGD_Argentina,
@MillerLiteAR, @MillerLiteCol, @millerlitehn, @MillerPanama, @MillerLitepa, @Miller_SLV.
Argentina’s brands
@CervezaIsenbeck, #isenbeck, @Warsteiner, @WarsteinerAR, #Warsteiner.
Colombia’s brands
@CervezaAguila, #AguilaLight, #aguila, #CervezaAguila, @clubcolombia, #ClubColombia, #clubcolombiadorada,
#clubcolombiaroja, #clubcolombianegra, @cervezacostena, #CervezaCosteña, #CervezaCostena, @PilsenCerveza,
#Pilsen, @CervezaPoker, #CervezaPoker, @pokerligera, #pokerligera, #colaypola, @ReddsColombia, #redds.
Ecuador’s brands
@ClubPremiumEc, #ClubPremium, #ClubPremiumRoja, #ClubPremiumNegra, @cervezaconquer, @PilsenerEcuador,
@Miller_Ecuador
10. Keywords used for streaming filter (2 of 2)
El Salvador’s brands
@BarenaHN, #Barena, @Barena_Peru, @PilsenerSV, @PilsenerLiteSV, #PilsenerLite, #Pilsener, @RegiaSV,
#regiaextra, @SupremaSV, #cervezasuprema, @GoldenSV, #cervezagolden
Honduras’ brands
@ImperialHN, #CervezaImperial, #imperialhn, @PortRoyalhn_com, @SalvaVidaHn, #salvavida, #salvavidahn,
#cervezasalvavida, @BarenaHN, #Barena
Panamá’s brands
@cervezaaltlas, #cervezaatlas, @Cerveza_BALBOA, #cervezabalboa
Peru’s brands
@cerarequipena, #cervezaarequipena, #cervezaarequipeña, #arequipeña, #arequipena, #Barena, @Barena_Peru,
@CristalPeru, #cervezacristal, @cusquenaperu, #cusqueñaperu, #cusquenaperu, #cervezacusqueña,
#cervezacusquena, #cusqueñamalta, @Pilsen_Callao, #PilsenCallao, @Pilsen_Trujillo, #PilsenTrujillo,
#CervezaSanJuan, @Backus_Ice, #BackusIce
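A keyword filter of this kind can be approximated offline as shown below. This is a sketch under assumptions: it matches whole @handles and #hashtags case-insensitively, which may differ from how the streaming program actually matched, and it uses only a small subset of the keywords listed on these two slides. The function name is hypothetical.

```python
import re

# A small subset of the keywords listed on the two keyword slides.
KEYWORDS = {
    "@CervezaAguila",
    "#ClubColombia",
    "#CervezaCosteña",
    "@CristalPeru",
    "#Grolsch",
}

# Handles and hashtags: '@' or '#' followed by word characters.
_TOKEN = re.compile(r"[@#]\w+")
_TRACKED = {keyword.casefold() for keyword in KEYWORDS}


def matches_filter(text: str) -> bool:
    """True if the tweet text contains any tracked handle or hashtag."""
    tokens = (token.casefold() for token in _TOKEN.findall(text))
    return any(token in _TRACKED for token in tokens)


print(matches_filter("Nada como una #clubcolombia bien fría"))
print(matches_filter("I like beer"))
```

Because whole tokens are compared, a tweet tagged #ClubColombiaDorada matches only if that longer hashtag is itself in the keyword set, which is why the slides list such variants separately.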
12. Consumer preference rating consolidated by country
13. Consumer preference rating consolidated by country
Number of tweets
14. Consumer preference rating consolidated by country
Average consumer preference rate
15. Consumer preference rating consolidated by country
Using SSD (sum of squared distances) on “Number of tweets”
and “rate”, El Salvador had the highest rate: 4.83.
16. Consumer preference rating consolidated by country
Using SSD, Argentina had the lowest rate: 4.32.
17. Consumer preference rating consolidated by brand
18. Consumer preference rating consolidated by brand
Using SSD, Pilsener (El Salvador) had the highest rate: 4.85.
19. Consumer preference rating consolidated by brand
Using SSD, Barena had the lowest rate: 3.94.
20. Consumer preference rating consolidated by date and day of week
21. Consumer preference rating consolidated by date
Gap due to a technical issue in the streaming program.
22. Consumer preference rating consolidated by day of week
23. Consumer preference rating consolidated by day of week
From Wednesday to Friday, both the number of tweets and
the rate increase.
24. Consumer preference rating consolidated by day of week
From Friday to Sunday, both the number of tweets and
the rate decrease.
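A day-of-week consolidation like the one charted on these slides can be sketched as a simple group-by. The code below assumes each rated tweet is available as a (timestamp, rating) pair; the function name and the sample rows are hypothetical and are not data from the study.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def by_weekday(rows: Iterable[Tuple[datetime, int]]) -> Dict[str, Tuple[int, float]]:
    """Map weekday name -> (number of tweets, mean rating)."""
    buckets: Dict[str, List[int]] = defaultdict(list)
    for posted_at, rating in rows:
        # datetime.weekday() returns 0 for Monday through 6 for Sunday.
        buckets[DAYS[posted_at.weekday()]].append(rating)
    return {day: (len(r), sum(r) / len(r)) for day, r in buckets.items()}


# Hypothetical rows for illustration only.
rows = [
    (datetime(2014, 4, 2, 20, 0), 5),   # a Wednesday
    (datetime(2014, 4, 2, 21, 30), 3),  # same Wednesday
    (datetime(2014, 4, 4, 22, 15), 5),  # a Friday
]
print(by_weekday(rows))
```

Grouping on `posted_at.hour` instead of the weekday would give the per-hour view in the same way.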
25. Consumer preference rating consolidated by hour
26. Consumer preference rating consolidated by hour
Most tweets were posted between 10 am and 1 am.
On average, however, rates do not
change much across hours.
27. Word picture of common words in positive-preference tweets.
28. Word picture of common words in negative-preference tweets.
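The word frequencies behind a word picture can be approximated with a simple counter. This is a sketch under assumptions: the deck does not say how the pictures were produced, the stop-word list here is a tiny illustrative one, and the function name and sample tweets are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Tiny illustrative Spanish stop-word list (an assumption; a real one is longer).
STOP_WORDS = {"la", "el", "de", "y", "una", "un", "con", "que", "en"}

# Runs of letters only (Unicode-aware, so accented letters and ñ are kept).
_WORD = re.compile(r"[^\W\d_]+")


def common_words(tweets: Iterable[str], top: int = 10) -> List[Tuple[str, int]]:
    """Return the `top` most frequent non-stop words across the tweets."""
    counts: Counter = Counter()
    for text in tweets:
        words = _WORD.findall(text.casefold())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top)


# Hypothetical tweets for illustration only.
sample = ["Una cerveza fría", "cerveza con amigos"]
print(common_words(sample, top=1))
```

Running this separately on the positive and on the negative tweets yields the two word lists, which a word-cloud tool can then render with size proportional to frequency.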
29. Conclusion
From this simple exercise I conclude that
sentiment and opinion analysis of tweets related
to SAB Miller LATAM brands can be an
alternative tool to effectively measure
consumer preferences.