A Study on the Spatio-Temporal Trend of Brand Index Using Twitter Message Sentiment Analysis

Cho, Seung Woo, et al. "Investigating Temporal and Spatial Trends of Brand Images Using Twitter Opinion Mining." Information Science and Applications (ICISA), 2014 International Conference on. IEEE, 2014.

Abstract
Twitter Data
Social Science, Human, Art, Medical, Economy
Sentiment Analysis

Twitter Crawling
Twitter API
• Streaming API: get 1% of all Twitter data in real time
• REST API
- Search API: get Twitter data matching a keyword
Collection period: 2013.9.9 (Mon) 9:35 pm ~ now
About 10,000 to 15,000 tweets per day
Total: 1,220,000 tweets (as of 2013.11.2, Sat)

Data Pre-Processing
• Keep only tweets that contain more than 3 Korean characters and that were posted within a 500 km radius of Seoul, Korea.
- Purpose: to remove foreign-language tweets and special characters.
• Remove tweets that contain only location information.
• Remove retweets.
‫ويتكلم‬ ‫نهائيا‬ ‫السمع‬ ‫فقد‬ ‫متعب‬ ‫ابو‬ ‫الملك‬ ‫ان‬ ‫خبر‬ ‫اكد‬ ‫المستوى‬ ‫رفيع‬ ‫وامير‬ ‫موثوق‬ ‫صدر‬
‫مفهوم‬ ‫وغير‬ ‫مترابط‬ ‫غير‬ ‫كالم‬((‫تخريف‬::)) Sat Oct 12 00:06:37 KST 2013
I'm at Club ELLUI - @ellui_seoul (서울특별시) w/ 2
others http://t.co/zhcrncosKH::Sat Oct 12 00:02:06 KST 2013
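The filtering rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline (the speaker notes say they used Twitter4J from Java): the Seoul coordinates, the `RT @` and `I'm at ` prefix heuristics, and the function names `haversine_km` and `keep_tweet` are all hypothetical choices made for this example.

```python
import math
import re

SEOUL = (37.5665, 126.9780)               # approximate lat/lon of Seoul (assumption)
HANGUL = re.compile(r"[\uac00-\ud7a3]")   # precomposed Hangul syllables


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def keep_tweet(text, coords, min_hangul=4, radius_km=500.0):
    """Return True if a tweet passes the filters listed on the slide."""
    if text.startswith("RT @"):        # retweet heuristic (assumption)
        return False
    if text.startswith("I'm at "):     # location check-in heuristic (assumption)
        return False
    if len(HANGUL.findall(text)) < min_hangul:   # "more than 3" Korean characters
        return False
    return haversine_km(SEOUL, coords) <= radius_km
```

For example, `keep_tweet("배가 아파서 병원에 갔다.", (37.5, 127.0))` passes all four checks, while the check-in tweet shown above is rejected by the prefix rule even though it contains Korean characters.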

Korean Morpheme Analyzer
- 꼬꼬마 Korean Morpheme Analyzer
- 한나눔 Korean Morpheme Analyzer
- Komoran Korean Morpheme Analyzer
- Lucene Korean Analyzer
- 은전한닢 Korean Morpheme Analyzer
• Performance of the analyzer
• Foreign language and slang tagging
• Sentiment-related word tagging (slang, verb, emoticon)
• Has a good dictionary
• No need to worry about word spacing
• But unable to recognize many emoticons, or metaphor, sarcasm, and irony

Korean Morpheme Analyzer
> 배가 아파서 병원에 갔다.
배 NN,F,배,*,*,*,*,*
가 JKS,F,가,*,*,*,*,*
아파서 VA+EC,F,아파서,Inflect,VA,EC,아프/VA+ㅏ서/EC,*
병원 NN,T,병원,*,*,*,*,*
에 JKB,F,에,*,*,*,*,*
갔 VV+EP,T,갔,Inflect,VV,EP,가/VV+ㅏㅆ/EP,*
다 EF,F,다,*,*,*,*,*
. SF,*,*,*,*,*,*,*
EOS
Noun, Verb, Adjective, Adverb, Root
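The analyzer output above has one morpheme per line: the surface form, then a comma-separated feature list whose first field is the tag. A minimal sketch of pulling out the content words follows. The tags `NN`, `VV` and `VA` appear in the sample; mapping `MAG` to Adverb and `XR` to Root is an assumption based on the Sejong tag set, and the function names are hypothetical.

```python
# Coarse tag -> part-of-speech name. NN, VV, VA appear in the sample output;
# MAG (adverb) and XR (root) are assumptions taken from the Sejong tag set.
CONTENT_TAGS = {"NN": "Noun", "VV": "Verb", "VA": "Adjective",
                "MAG": "Adverb", "XR": "Root"}


def parse_line(line):
    """Split one analyzer line into (surface, first_tag, base_form)."""
    surface, features = line.split(None, 1)
    fields = features.split(",")
    first_tag = fields[0].split("+")[0]          # "VA+EC" -> "VA"
    base = surface
    if len(fields) > 6 and fields[3] == "Inflect":
        base = fields[6].split("/")[0]           # "아프/VA+ㅏ서/EC" -> "아프"
    return surface, first_tag, base


def content_words(output):
    """Yield (base_form, part_of_speech) for content morphemes only."""
    for line in output.splitlines():
        line = line.strip()
        if not line or line == "EOS" or line.startswith(">"):
            continue                             # skip blanks, end marker, input echo
        _, tag, base = parse_line(line)
        if tag in CONTENT_TAGS:
            yield base, CONTENT_TAGS[tag]
```

Run on the sample sentence, this keeps 배 and 병원 as nouns, reduces 아파서 to the adjective stem 아프 and 갔 to the verb stem 가, and drops particles, endings and punctuation.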

Building Sentiment Dictionary
1. Manually labeled Twitter data
• 6 days of Twitter data (2013.9.9, 9.16, 9.23, 9.30, 10.7, 10.14)
• Labeled positive and negative sets of Noun, Adjective, Verb, Root (8 sets in total)
• Labeled by 4 people
2. Naver movie review data with ratings
• 20,000 reviews from 2 movies
• 545 positive, 545 negative, and 545 neutral reviews
[Figure: two bar charts over ratings 1 to 10, one for Movie 1 and one for Movie 2; legend: positive, negative]

Sentiment Classification
• SVM Classifier
- 1. Training set: 150 positive, 150 negative (Twitter data)
- 2. Test set: 545 positive, 545 negative (movie review data)
Accuracy = 70.64220183486239% (770/1090) (classification)
Mean squared error = 1.1743119266055047 (regression)
Squared correlation coefficient = 0.18400994471523438 (regression)
• Naïve Bayes Classifier
• SO-PMI Classifier
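The slide names Naïve Bayes as a candidate classifier but gives no implementation detail. The sketch below assumes a multinomial model with add-one smoothing over morpheme tokens; the class name and the `accuracy` helper are hypothetical, and nothing here is the authors' code. The accuracy helper simply reproduces the ratio reported for the SVM run (770 correct out of 1090).

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)      # log prior
            for w in tokens:
                if w in self.vocab:                    # ignore words unseen in training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(predicted, gold):
    """Fraction of predictions that match the gold labels."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

A classifier trained on a handful of made-up token lists such as `["좋", "영화"]` (positive) and `["싫", "영화"]` (negative) will label a new document containing 좋 as positive; real training data would be the labeled tweet sets described on the slide.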

Building Sentiment Dictionary
Unlabeled & labeled data set
Ternary classifiers: Naïve Bayes, SO-PMI, SVM
• SO-PMI: positive set, negative set, neutral set
• SVM: positive set, negative set, neutral set
• Naïve Bayes: positive set, negative set, neutral set

Introduction
• Twitter Crawling
• Data Pre-processing
• Korean Morphology Analysis
• Twitter Opinion Mining
- Sentiment Dictionary
- Evaluating performance of candidate classifiers
- Sentiment Classification
• Visualize Associative Relationship of Terms
• Relationship with Brand Index

Sentiment of Brand Index
Diagram nodes: Samsung, Galaxy S2, Battery, LCD, Price, ….
Legend: brand (keyword); related nouns (attribute)
Correlated word classes: Adjective, Verb, Noun, Adverb, …
Correlation labels in the diagram: good, good, nice, good, good
Positive words: Nice, pretty, lovely, …
Negative words: Bad, terrible, …
Determining objectivity: PMI(word, pword) + PMI(word, nword)
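The PMI expressions on this slide can be illustrated with document-level co-occurrence counts. The sketch below is an assumption-laden example, not the authors' method: it treats each tweet as a set of tokens, uses base-2 logs, and adds a small constant `eps` to the co-occurrence count so the log is defined when two words never appear together. `so_pmi` is the standard semantic-orientation difference; `objectivity_score` is the sum shown on the slide, for which the deck gives no threshold.

```python
import math


def pmi(docs, w1, w2, eps=0.01):
    """PMI(w1, w2) = log2(p(w1 & w2) / (p(w1) * p(w2))), docs being token sets."""
    n = len(docs)
    c1 = sum(w1 in d for d in docs)
    c2 = sum(w2 in d for d in docs)
    if c1 == 0 or c2 == 0:
        return 0.0                     # no evidence either way (illustrative choice)
    c12 = sum(w1 in d and w2 in d for d in docs)
    return math.log2(((c12 + eps) / n) / ((c1 / n) * (c2 / n)))


def so_pmi(docs, word, pword, nword):
    """Semantic orientation: positive if word leans toward pword, negative toward nword."""
    return pmi(docs, word, pword) - pmi(docs, word, nword)


def objectivity_score(docs, word, pword, nword):
    """The sum shown on the slide under 'Determining Objectivity'."""
    return pmi(docs, word, pword) + pmi(docs, word, nword)
```

With four made-up documents `{"battery", "good"}`, `{"battery", "good"}`, `{"battery", "bad"}` and `{"price", "bad"}`, `so_pmi` comes out positive for "battery" and negative for "price", which matches the intuition that an attribute inherits the polarity of the words it co-occurs with.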

Scenario

Editor's Notes

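Social network services (SNS) started and expanded, which gave rise to personal big data, and data mining on big data came to the fore. A new discipline, twitterology, has emerged. The word is a coinage meaning "the study of Twitter": it joins Twitter, a social network service, with the suffix -logy, which denotes a field of study. Twitter's real-time information is used in research in sociology, economics, medicine, linguistics, and other fields.
The Streaming API (real time) and the REST API (420 requests per 15 minutes; requesting every 15 minutes returns 420 items) were implemented with the Twitter4J library. Only 1% of all data can be received. (Presented by Seungwoo.) Collection started on September 9 at 9:35 pm and is still running, averaging 10,000 to 15,000 tweets per day. As of 2013.11.2, 1.22 million tweets have been accumulated.
Tweets with 3 or fewer Korean characters are not collected (special characters, English, and Japanese are all dropped). Location information such as imap is removed. Only data within a 500 km radius of Seoul is collected (otherwise tweets from the whole world come in; this keeps only those from Korea).
The 은전한닢 morpheme analyzer runs on Linux and is called from Java.
1. Training set. Positive: 150 of the results of searching the DB for '좋' ("good"). Negative: 150 of the results of searching the DB for '싫' ("dislike"). 2. Test set. Positive: 545 movie reviews. Negative: 545 movie reviews. Results when movie reviews that match nothing in the dictionary are also included: optimization finished, #iter = 73 nu = 0.16326140616206591 obj = -32.23746306073249, rho = 0.11723225832508417 nSV = 61, nBSV = 38 Total nSV = 61 Accuracy = 70.64220183486239% (770/1090) (classification) Mean squared error = 1.1743119266055047 (regression) Squared correlation coefficient = 0.18400994471523438 (regression)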
p(word1 & word2) is the probability that word1 and word2 co-occur. The ratio of p(word1 & word2) to p(word1) p(word2) measures the degree of statistical dependence between the words. The log of the ratio corresponds to a form of correlation.
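Written out, the ratio this note refers to is the standard pointwise mutual information. The base-2 logarithm is an assumption here, since the note does not state a base:

```latex
\[
\mathrm{PMI}(\mathit{word}_1, \mathit{word}_2)
  = \log_2 \frac{p(\mathit{word}_1 \,\&\, \mathit{word}_2)}
                {p(\mathit{word}_1)\, p(\mathit{word}_2)}
\]
```

If the two words are statistically independent, the numerator equals the denominator and the PMI is 0; words that co-occur more often than chance get a positive value, and words that avoid each other get a negative one.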
Scenario: a company that published a clarifying article after negative press coverage.