Sentiment analysis during the World Cup using Big Data

The social sentiment analysis technology developed by IBM Brazil analyzes what is being posted on social networks about any topic, company, or person, without requiring a hashtag. All public posts in Portuguese are captured by an IBM artificial-intelligence system that is trained to judge whether the sentiment of each post is positive, neutral, or negative. The technology can analyze posts on a wide range of subjects and styles, including slang, sarcasm, and colloquial language. Nicknamed FAMA, the solution was used during Brazil's matches in the 2013 Confederations Cup and was then extended to cover all 64 matches of the 2014 FIFA World Cup. In this talk I cover the motivation, technical details, and results of this effort, which brought together football, social networks, and technology.

Read more at http://alanbraz.wordpress.com/2014/08/07/tdc2014/

Published in: Data & Analytics

  1. Sentiment analysis during the World Cup using Big Data. Alan Braz, IBM Research, @alanbraz. © 2014 IBM Corporation, IBM Research – Brazil.
  2. Alan Braz, Research Software Engineer, IBM Research – Brazil. 2002–2005: UNICAMP, BSc in Computer Science. Aug–Nov 2005: IBM GBS, Java developer intern. 2005–2007: IBM GBS, Java developer (WWER). 2007–2010: IBM GBS, technical leader (eAC). 2009–2012: IBM GBS, Agile coach and instructor (GenO). 2009–today: Metrocamp, SE, RUP, Agile grad teacher. 2010–2012: IBM GBS, software architect (Blue Community). 2009–2013: UNICAMP, MSc in Agile Software Engineering. Feb 2013–today: IBM Research – Brazil, Research Software Engineer. www.alanbraz.com.br, @alanbraz
  3. Innovation and Comfort. Diagram labels: Innovation; Radical Innovation. Trial-and-error: start-ups. Science-based: scientific method (empirical); logical deduction (mathematics).
  4. Science-based innovation
  5. The World is our Lab: 12 Labs Worldwide in 10 Countries. Labs shown: Almaden, Watson, China, Austin, Israel, Japan, Switzerland, India, Ireland, Australia, Africa. IBM Research worldwide has 1600+ PhDs across a diversity of disciplines: Behavioral Science, Chemistry, Electrical Engineering, Computer Science, Materials Science, Mathematical Science, Physics, Services Science.
  9. IBM Research – Brazil (São Paulo and Rio de Janeiro). Research areas: Smarter Natural Resources (natural resources modeling, analytics, and logistics); Systems of Engagement and Insights; Social Data Analytics (analytics and modeling of social and human data and applications); Smarter Devices (micro/nanotechnologies aimed at addressing smarter planet challenges). A team of world-class researchers in close connection to the other 12 IBM Research labs and to the world's best scientific, academic, and development communities.
  10. System U: Modeling People from Social Media, based on social behaviors (e.g., when tweeting). Five Factor Model: Openness, Conscientious, Extroverted, Agreeable, Neuroticism. Ford's 12 "Universal Needs": Structure, Challenge, Excitement, Liberty, Harmony, Closeness, Practicality, Self-expression, Curiosity, Ideals, Love, Stability. Five Values: Self-transcendence, Conservation, Self-enhancement, Hedonism, Openness-to-Change.
  11. Project: Social Media Behavior Simulation. Team: Maira Gatti, Ana Appel, Claudio Pinhanez, Rogério de Paula, Cicero dos Santos, Alexander Rademaker, Paulo Cavalin, Samuel Barbosa, Daniel Gribel. Goal: to create a tool for companies to explore the impact and result of social media actions through simulation. Applications: exploration of effort size and impact of marketing campaigns; determination of counter-information measures in viral media outbreaks. Example: simulation of the Obama/Romney Twitter campaigns in the last month before the election. Obama's network: 23,856,961 followers. Romney's network: 1,675,792 followers. Sample, Sept 22 to Oct 29, 2012: Obama's network, 5.6M tweets, 24,526 active users, 3,594 followers; Romney's network, 5.1M tweets, 28,145 active users, 5,498 followers.
  13. Video: Ei! https://www.youtube.com/watch?v=b7IvNyLvizQ
  14. Ei! 194 Million Brazilians Helping their National Team's Coach. An app made specifically for one person: Luiz Felipe Scolari, coach of the Brazilian national soccer team. Ei! is an app that identifies, filters, and analyzes all the Twitter comments that Brazilians have made during the games. With the touch of a button, Scolari will know the country's consensus. At half time: which players the audience likes and dislikes, what changes should be made, which tactics should be explored, which player needs to be brought on. After the game: his country's perspective on the team, the players, and his performance as a coach.
  15. Challenges. Real-time issues: up to 5 million tweets per match; up to 20 thousand tweets per minute. Texting vs. writing (casual language), for example: "nao disse , Balotelli ia meter gol hoje , um golaço ainda , madero aquele negoo"; "hora de colocar o Leandro né Felipão ? u.u"; "vou ser repetitivo de novo , mas : na minha epoca de jovem torcedor da seleção brasileira , brasil nao tomava gol de p### de chile não viu"; "jah to vendo o Brasil faze nois passa vergonha na copa ! ! ! pq meu g-zuis ..."; "acho q o ronaldinho tem que ser totula"; "Com todo o respeito , Luis Fabiano , popcorn men hahahahaha beijo para quem entendeu , pior piada ever ! Haha"
  16. Social Sentiment Analysis is Difficult. Examples, tagged by match: (CHEvATM) "Diego costa merece errar por ter escolhido outra seleçao pra jogar"; (BRAvITA) "Itália perdendo o segundo jogador lesionado com TRINTA minutos de jogo. Prandelli deve tá jogando o Football Manager 2013."; (BRAvITA) "PAAAAAAAARTIU ASSISTIR JOGO DO Brazil!"; (BRAvITA) "Vacilo, Jô ia entrar e fazer mais um"; (BRAvMEX) "o que aconteceu com a seleção ? Pqp"; (BRAvURU) "no momento dançando show das poderosas de sutiã e short jeans"; (RMAvATM) "BALE AMOR FAÇA AQUELE LINDO GOL QUE PROMETEU PRA MIM ONTEM A NOITE"; (BRAvMEX) "Brazil vai ganhando do México, vingando-se das Olimpíadas, num jogo que vale tanto quanto troco em bala."; (SAOvCOR) "o ganso so quer fazer jogada genial"; (SAOvCOR) "Com essa Fabulosa em campo o Sao Paulo sempre vai fazer gol contra o Corinthians, entenda tecnico retranqueiro do c#######"; (SAOvCOR) "Mano meu pai ganho 500 conto no jogo do bixo kkkk"
  17. Ei! Social Sentiment Solution
  18. InfoSphere Streams: A Platform for Real-Time Analytics on Big Data. Key Big Data challenge: velocity. Volume: terabytes per second, petabytes per day. Variety: all kinds of data, all kinds of analytics. Velocity: insights in microseconds. Powerful analytics: millions of events per second, microsecond latency, real-time delivery, traditional and non-traditional data sources. Use cases: algorithmic trading, telco churn prediction, smart grid, cyber security, government / law enforcement, ICU monitoring, environment monitoring.
  19. http://www.ibm.com/developerworks/analytics/
  21. Streams Runtime Illustrated. An optimizing scheduler assigns processing elements (PEs) to hosts and continually manages resource allocation. Runs on commodity hardware (x86 hosts): laptops, blades, or high-performance clusters. Hosts and jobs can be added dynamically, and new jobs work with existing jobs. Operator labels in the diagram: Meters, Company Filter, Usage Model, Usage Contract, Temp Action, Text Extract, Degree History, Compare History, Store History, Season Adjust, Daily Adjust.
  22. Ei! is Built on FAMA: Real-Time Social Media Polarity Analysis Tool for the Portuguese Language. FAMA is a social sentiment analysis tool for the Portuguese language developed by IBM Research – Brazil. FAMA processes text related to topics of interest that appear in social media (Twitter, Facebook, ReclameFacil, etc.) or in private text repositories such as customer complaints or call center logs. FAMA can determine the polarity related to the topics of interest: positive, negative, or neutral. FAMA can find the most commonly used terms and their co-occurrences with the topics of interest. The name comes from Fama, the Roman goddess of gossip and rumor (Pheme in Greek mythology).
  23. FAMA: Real-Time Social Media Polarity Analysis in Portuguese. Architecture diagram labels: InfoSphere Streams (stream computing), JSONs, Text Analytics, Text Classifier, learned database, classified database, dashboard user interface.
  24. Construction of the Learned Database from Manual Analysis of Tweet Samples. The data for the learned database is created by manual inspection of tweets: about 2000 tweets from 4 friendly matches, labeled by 15 different coders with different degrees of interest in and knowledge of soccer, who use a tool to display, collect, and process the data.
  25. FAMA Analysis of a Tweet: Example of Text Classification. Tweet: "vou ser repetitivo de novo , mas : na minha epoca de jovem torcedor da seleção brasileira , brasil nao tomava gol de p### de chile não viu". Feature: bad word. Verbs: vou, ser, tomava. Nouns: epoca, brasil, gol, chile, seleção. Adjectives: repetitivo, jovem, brasileira, palavrão. Lemmas: vou: ir (to go); ser: ser (to be); tomava: tomar (to suffer); p###: palavrão (bad word).
  26. FAMA (2013): Social Sentiment Analysis with a Naïve Bayes Classifier. Learning a classifier: a supervised learning algorithm takes a manually annotated corpus (e.g., "hj vai dar Brazil!", positive; "Felipão é mt burrro", negative; "O jogo começa as 16h", neutral) and produces function H, the Naive Bayes classifier. Sentiment analysis: function H maps a new tweet (e.g., "neymar ta jogando mt hj!!!") to positive, neutral, or negative.
  27. Game - Timeline
  28. Confederations Cup Final: Brazil 3x0 Spain
  29. Players and Main Topics
  30. Players and Main Topics. Inspired by Social Media Streams (formerly TwitterVis) http://arena1.watson.ibm.com:8080/cav/
  33. www.craquedasredes.com.br The social sentiment analysis technology developed by IBM Brazil analyzes what is being posted on social networks about any topic, company, or person, without requiring a hashtag. All public posts in Portuguese are captured by an IBM artificial-intelligence system that is trained to judge whether the sentiment of each post is positive, neutral, or negative. The technology can analyze posts on a wide range of subjects and styles, including slang, sarcasm, and colloquial language.
  34. Video: Copa https://www.youtube.com/watch?v=748YIZn-p4U
  35. Limitations of the Naive Bayes Approach: Extra Labeling Needed. Brazil x Uruguay semi-final, penalty kick for Uruguay: David Luiz committed it, Júlio César saved it. Naive Bayes results: when David Luiz committed the penalty, too much neutral; when Júlio César saved it, too much neutral and too much negative.
  36. Deep Learning Applied to Social Sentiment Analysis. Learning a deep learning classifier: a deep learning algorithm takes a large-scale non-annotated corpus plus a manually annotated corpus (e.g., "hj vai dar Brazil!", positive; "Felipão é mt burrro", negative; "O jogo começa as 16h", neutral) and produces function N, a multi-layer neural network. Sentiment analysis: function N maps a new tweet (e.g., "neymar ta jogando mt hj!!!") to positive, neutral, or negative.
  37. Brazil x Uruguay: Improvements with Deep Learning. Penalty kick for Uruguay: David Luiz commits it, Júlio César saves it. Comparison: Naive Bayes vs. Deep CNN.
  38. Brazil x Uruguay: Improvements with Deep Learning on Players' Scores. Events: David Luiz commits the penalty; Júlio César saves the penalty. Comparison: Naive Bayes (FAMA) vs. Deep CNN (Deep FAMA).
  39. Deep FAMA Covering All 64 Games of World Cup 2014. All 64 WC'14 games; 53M posts processed; 34M posts about the games; peak of 72K/minute; 5.8M different users. Delivered by a team composed of Research, GBS, GTS, SWG, and Software Lab BR. Uses the full IBM portfolio: InfoSphere Streams, WebSphere, DB2, Cognos BI, all running on SoftLayer.
  40. Brazil 1x7 Germany: Social Anatomy of the Largest Event in Social Network History. Globally: 35.6M tweets (world record). 6.8M posts in Portuguese (19% of world). Peak of 72K/minute (after the 5th goal). 1.4M tweets after the game. Timeline annotations: 5th goal (peak of 72K/minute); David Luiz interview (positive effects). David Luiz saves the image of Brazil after the game: without the 271K positive comments about the David Luiz interview, Brazil's post-game positive posts would decrease from 44% to 25%. Sentiment breakdown (category labels not preserved in the transcript): first half, 1.7M posts: 32%, 13%, 55%; entire game, 4.4M posts: 33%, 13%, 54%.
  41. Results Used by TV Globo, ESPN, and TV Band. Globo 2nd-screen app: 1M downloads, 1.1M page views. ESPN Brazil: 28K page views.
  42. Ei! Social Sentiment Solution
  43. http://bigdatauniversity.com/bdu-wp/bdu-course/big-data-fundamentals/
  44. https://www.coursera.org/course/mmds
  45. www.bluemix.net Articles and tutorials in Portuguese: www.ibm.com/developerworks/br/ facebook.com/ibmbluemix twitter.com/ibmbluemix IBM Research – Brazil: http://www.research.ibm.com/brazil/ Alan Braz, alanbraz@br.ibm.com, @alanbraz
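
Slide 15 names casual, texting-style language as a core challenge. One common first step is to normalize tokens before classifying them; the Python sketch below illustrates that idea. It is an assumption for illustration, not a description of FAMA's actual preprocessing: `SLANG` and `normalize` are hypothetical names, and the table holds only a handful of common Portuguese chat abbreviations.

```python
import re

# Hypothetical lookup table: a few common Portuguese chat abbreviations.
SLANG = {
    "hj": "hoje",
    "mt": "muito",
    "q": "que",
    "pq": "porque",
    "nao": "não",
    "jah": "já",
}

def normalize(tweet):
    """Lowercase, collapse character floods, tokenize, and expand slang."""
    text = tweet.lower()
    # Collapse runs of 3+ identical characters into one ("paaaartiu" -> "partiu").
    text = re.sub(r"(.)\1{2,}", r"\1", text)
    # Words (including accented letters) or single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return [SLANG.get(tok, tok) for tok in tokens]

print(normalize("neymar ta jogando mt hj!!!"))
print(normalize("PAAAAAAAARTIU ASSISTIR JOGO DO Brazil!"))
```

Collapsing only runs of three or more characters leaves legitimate double letters, such as the "ss" in "assistir", untouched.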
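
Slide 26 describes learning a Naive Bayes classifier (function H) from a manually annotated corpus. A minimal Python sketch of a multinomial Naive Bayes classifier with add-one smoothing follows. It is trained only on the three labeled examples shown on the slide, so it illustrates the technique and not FAMA's real model or features; the class and function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and keep runs of letters (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # documents per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # log P(label) + sum of log P(token | label)
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for tok in tokens:
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# The three annotated examples from slide 26.
CORPUS = [
    ("hj vai dar Brazil!", "positive"),
    ("Felipão é mt burrro", "negative"),
    ("O jogo começa as 16h", "neutral"),
]

model = NaiveBayes()
model.train(CORPUS)
print(model.predict("hj vai dar Brazil"))
```

With so few training examples, most unseen tweets are decided by a single shared token, which is why slide 24 reports a corpus of about 2000 manually labeled tweets.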
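
The figures on slide 40 can be cross-checked with a few lines of arithmetic. The Python sketch below uses only numbers from the slide. The 19% share and the per-second rate follow directly. The 44% to 25% figure is recomputed under two assumptions that the slide does not state: that the percentage base is the 1.4M post-game tweets, and that the 271K interview comments stay in the total while no longer counting as positive.

```python
# Figures taken from slide 40.
world_tweets = 35_600_000
portuguese_posts = 6_800_000
peak_per_minute = 72_000
post_game_posts = 1_400_000      # assumed base for the 44% figure
positive_share = 0.44
interview_positive = 271_000

# Share of Portuguese posts in the global total (slide: 19%).
portuguese_share = portuguese_posts / world_tweets

# Peak rate expressed per second.
peak_per_second = peak_per_minute / 60

# Positive share if the interview comments did not count as positive,
# keeping the total unchanged (assumption; slide: 25%).
positive_posts = positive_share * post_game_posts
positive_without_interview = (positive_posts - interview_positive) / post_game_posts

print(f"Portuguese share: {portuguese_share:.1%}")
print(f"Peak rate: {peak_per_second:.0f} posts/second")
print(f"Positive without interview: {positive_without_interview:.1%}")
```

Under these assumptions the recomputed values round to the slide's 19% and 25%. Removing the 271K comments from the total as well would give a different share, so the choice of base matters when reading the slide.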
