This document describes a sentiment analysis tool developed by Ravindra Chaudhary and Sachin Singh under the guidance of Mrs. Smita Tiwari. It uses a Naive Bayes classifier to analyze tweets and classify the sentiment as positive, negative, or neutral. The methodology involves collecting tweets using the Twitter API, preprocessing the text by removing URLs, hashtags, numbers, and other unnecessary words. Features are then extracted such as capitalized words and emoticons. The preprocessed text and features are fed into the Naive Bayes classifier to predict the sentiment. The tool was implemented using technologies like NET BEANS IDE, WAMP SERVER, MYSQL, HTML5 and CSS. Future work could involve converting it to a
Project Report
On
Sentiment Analysis Tool
Submitted as partial fulfillment for the award of
BACHELOR OF TECHNOLOGY
DEGREE
Session 2015-16
In
Information Technology
By
RAVINDRA CHAUDHARY (1203213037)
SACHIN SINGH (1203213039)
Under the guidance of
Ms. SMITA TIWARI
ABES ENGINEERING COLLEGE, GHAZIABAD
AFFILIATED
TO
Dr. A.P.J. ABDUL KALAM TECHNICAL UNIVERSITY, LUCKNOW,
UTTAR PRADESH
(Formerly UPTU)
Student’s Declaration
We hereby declare that the work being presented in this
report entitled Sentiment Analysis Tool is an authentic record of
our own work carried out under the supervision of Ms.
SMITA TIWARI.
Signature of students
Ravindra Chaudhary
Sachin Singh
Date:
This is to certify that the above statement made by the
candidates is correct to the best of my knowledge.
Signature of HOD: (Dr. P.C. Vashist), Information Technology. Date: ................
Signature of Supervisor: (Ms. Smita Tiwari), Associate Professor, Information Technology
ACKNOWLEDGEMENT
It gives us a great sense of pleasure to present the report of
the B. Tech. project undertaken during B. Tech. Fourth Year.
We owe a special debt of gratitude to Professor Ms. SMITA
TIWARI and the Department of Information Technology, ABES
Engineering College, Ghaziabad, for her constant support and
guidance throughout the course of our work. Her sincerity,
thoroughness and perseverance have been a constant source
of inspiration for us. It is only through her cognizant efforts that our
endeavors have seen the light of day. We also take the
opportunity to acknowledge the contribution of Professor Dr.
P.C. Vashisth, Head, Department of Information Technology,
ABES Engineering College, Ghaziabad, for full support and
assistance during the development of the project. We also would
not like to miss the opportunity to acknowledge the
contribution of all faculty members of the department for their
kind assistance and cooperation during the development of
our project. Last but not least, we acknowledge our friends
for their contribution to the completion of the project.
RAVINDRA CHAUDHARY
SACHIN SINGH
TABLE OF CONTENTS
Inner Title Page
Declaration
Acknowledgment
Abstract
1. Introduction
1.1. Motivation
1.2. Domain Introduction
2. Objective
3. Methodology
3.1. Method of Sentiment Analysis
3.1.1. Data Acquisition
3.1.2. Tokenizer
3.1.3. Pre-Processing
3.1.4. Feature Extraction
3.1.5. Classification and Prediction
4. Details of Project Report Work
4.1. Data Acquisition
4.2. Human Labelling
4.3. Feature Extraction
4.4. Classification
ABSTRACT
This project addresses the problem of sentiment analysis on Twitter, that is, classifying tweets according to the sentiment expressed in them: positive or negative. Twitter is an online micro-blogging and social-networking platform which allows users to write short status updates with a maximum length of 140 characters. It is a rapidly expanding service with over 200 million registered users [24], of which 100 million are active users; half of them log on to Twitter on a daily basis, generating nearly 250 million tweets per day [20]. Given this large amount of usage, we hope to achieve a reflection of public sentiment by analyzing the sentiments expressed in the tweets. Analyzing public sentiment is important for many applications, such as firms trying to find out the response to their products in the market, predicting political elections, and predicting socioeconomic phenomena like the stock exchange. The aim of this project is to develop a functional classifier for accurate and automatic sentiment classification of an unknown tweet stream.
Chapter 1: INTRODUCTION
1.1 Motivation
We have chosen to work with Twitter since we feel it is a better approximation of public sentiment than conventional internet articles and web blogs. The reason is that the amount of relevant data is much larger for Twitter than for traditional blogging sites. Moreover, the response on Twitter is more prompt and also more general (since the number of users who tweet is substantially larger than the number who write web blogs on a daily basis). Sentiment analysis of the public is highly critical in macro-scale socioeconomic phenomena like predicting the stock market rate of a particular firm. This could be done by analyzing overall public sentiment towards that firm over time and using economics tools to find the correlation between public sentiment and the firm’s stock market value. Firms can also estimate how well their product is being received in the market, and in which areas of the market it is having a favorable response and in which a negative response (since Twitter allows us to download a stream of geo-tagged tweets for particular locations). If firms can get this information they can analyze the reasons behind a geographically differentiated response, and so they can market their product in a more optimized manner by looking for appropriate solutions like creating suitable market segments. Predicting the results of popular political elections and polls is also an emerging application of sentiment analysis. One such study, on predicting the outcome of federal elections in Germany, concluded that Twitter is a good reflection of offline sentiment.
1.2 Domain Introduction
This project of analyzing sentiments of tweets comes under the domain of “Pattern Classification” and “Data Mining”. Both of these terms are very closely related and intertwined, and they can be formally defined as the process of discovering “useful” patterns in large sets of data, either automatically (unsupervised) or semi-automatically (supervised). The project would rely heavily on techniques of “Natural Language Processing” in extracting significant patterns and features from the large data set of tweets, and on “Machine Learning” techniques for accurately classifying individual unlabeled data samples (tweets) according to whichever pattern model best describes them.
The features that can be used for modeling patterns and classification can be divided into two main groups: formal language based and informal blogging based. Language based features are those that deal with formal linguistics and include prior sentiment polarity of individual words and phrases, and parts of speech tagging of the sentence. Prior sentiment polarity means that some words and phrases have a natural innate tendency for expressing particular and specific sentiments in general. For example, the word “excellent” has a strong positive connotation while the word “evil” possesses a strong negative connotation. So whenever a word with positive connotation is used in a sentence, chances are that the entire sentence would be expressing a positive sentiment. Parts of speech tagging, on the other hand, is a syntactical approach to the problem. It means to automatically identify which part of speech each individual word of a sentence belongs to: noun, pronoun, adverb, adjective, verb, interjection, etc. Patterns can be extracted from analyzing the frequency distribution of these parts of speech (either individually or collectively with some other part of speech) in a particular class of labeled tweets. Twitter based features are more informal and relate to how people express themselves on online social platforms and compress their sentiments into the limited space of 140 characters offered by Twitter. They include Twitter hashtags, retweets, word capitalization, word lengthening [13], question marks, presence of URLs in tweets, exclamation marks, internet emoticons and internet shorthand/slang.
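The idea of prior sentiment polarity described above can be illustrated with a minimal Python sketch. The lexicon, its scores and the function name are hypothetical and chosen only for illustration; the report does not state which polarity lexicon was used.

```python
# Hypothetical, hand-picked prior-polarity lexicon (illustration only).
PRIOR_POLARITY = {
    "excellent": 1.0,
    "good": 0.5,
    "boring": -0.5,
    "evil": -1.0,
}


def prior_polarity_score(sentence):
    """Sum the prior polarities of the known words in a sentence."""
    words = sentence.lower().split()
    # Strip simple punctuation so that "excellent!" still matches "excellent".
    cleaned = [w.strip(".,!?:;\"'") for w in words]
    return sum(PRIOR_POLARITY.get(w, 0.0) for w in cleaned)


print(prior_polarity_score("The service was excellent!"))
print(prior_polarity_score("An evil and boring plot."))
```

A positive total suggests a positive sentence and a negative total a negative one; words missing from the lexicon contribute nothing.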
Classification techniques can also be divided into two categories: supervised vs. unsupervised and non-adaptive vs. adaptive/reinforcement techniques. The supervised approach is when we have pre-labeled data samples available and we use them to train our classifier. Training the classifier means using the pre-labeled samples to extract features that best model the patterns and differences between each of the individual classes, and then classifying an unlabeled data sample according to whichever pattern best describes it. For example, if we come up with a highly simplified model that neutral tweets contain 0.3 exclamation marks per tweet on average while sentiment-bearing tweets contain 0.8, and if the tweet we have to classify contains 1 exclamation mark, then (ignoring all other possible features) the tweet would be classified as subjective, since 1 exclamation mark is closer to the model of 0.8 exclamation marks. Unsupervised classification is when we do not have any labeled data for training. In addition to this, adaptive classification techniques deal with feedback from the environment. In our case feedback from the environment can be in the form of a human telling the classifier whether it has done a good or poor job in classifying a particular tweet, and the classifier needs to learn from this feedback. There are two further types of adaptive techniques: passive and active. Passive techniques are the ones which use the feedback only to learn about the environment (in this case this could mean improving our models for tweets belonging to each of the three classes) but do not use this improved learning in the current classification algorithm, while the active approach continuously keeps changing its classification algorithm according to what it learns in real time.
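The exclamation-mark example above can be sketched in a few lines of Python. This is only a toy illustration of choosing the closest class model, using the two averages quoted in the text; the function and variable names are assumptions made for the sketch.

```python
# Class models from the text: average exclamation marks per tweet.
CLASS_MODELS = {
    "neutral": 0.3,
    "sentiment-bearing": 0.8,
}


def classify_by_exclamations(tweet, models=CLASS_MODELS):
    """Return the class whose average count is closest to the tweet's count."""
    count = tweet.count("!")
    return min(models, key=lambda label: abs(models[label] - count))


print(classify_by_exclamations("Just landed in Delhi!"))   # one exclamation mark
print(classify_by_exclamations("Train departs at 9 am"))   # no exclamation mark
```

With one exclamation mark the distance to 0.8 is smaller than the distance to 0.3, so the tweet is assigned to the sentiment-bearing class, as in the text.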
There are several metrics proposed for computing and comparing the results of our experiments. Some of the most popular metrics include: Precision, Recall, Accuracy, F1-measure, True rate and False alarm rate (each of these metrics is calculated individually for each class and then averaged for the overall classifier).
Table 1: A Typical 2x2 Confusion Matrix
                  Machine says yes    Machine says no
Human says yes    tp                  fn
Human says no     fp                  tn
Precision (P): $P = \frac{tp}{tp + fp}$
Recall (R): $R = \frac{tp}{tp + fn}$
Accuracy (A): $A = \frac{tp + tn}{tp + tn + fp + fn}$
F1-measure: $F_1 = \frac{2 \cdot P \cdot R}{P + R}$
True Rate (T): $T = \frac{tp}{tp + fn}$
False-alarm Rate (F): $F = \frac{fp}{fp + tn}$
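As a worked illustration, the metrics can be computed from the four confusion-matrix counts as in the following Python sketch. The counts in the usage example are made up, the function name is hypothetical, and the false-alarm rate is taken here as fp / (fp + tn), the usual false-positive-rate definition, which is an assumption of this sketch.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute per-class metrics from 2x2 confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Assumption: false-alarm rate = share of "human says no" items
    # that the machine marked as "yes".
    false_alarm = fp / (fp + tn)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f1": f1,
        "true_rate": recall,  # same formula as recall in the text
        "false_alarm_rate": false_alarm,
    }


# Made-up counts for a single class.
metrics = classification_metrics(tp=40, fn=10, fp=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a multi-class classifier these values would be computed once per class and then averaged, as the text describes.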
Chapter 2: OBJECTIVE
• To implement a Naive Bayes algorithm for automatic classification of text into positive and negative.
• To use sentiment analysis to determine whether the attitude of the masses is positive, negative or neutral toward the subject of interest.
Chapter 3: METHODOLOGY
3.1 Methods of Sentiment Analysis
3.1.1. DATA ACQUISITION
• Download the text using the Twitter API.
3.1.2. TOKENIZER
• Using a POS (part of speech) tagger.
3.1.3. PRE-PROCESSING
• Remove slang (non-English) words.
• Replace emoticons by their polarity.
• Remove URLs, hashtags (#) and numbers.
• Replace a sequence of repeated characters, e.g. “coooool” by “cool”.
• Remove nouns and prepositions.
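Some of the pre-processing steps listed above can be sketched with Python's standard `re` module. This is a minimal sketch, not the project's implementation: the emoticon map is hypothetical, a hashtag is assumed to be removed together with its word, and the slang, noun and preposition steps are left out because they need a dictionary and a POS tagger.

```python
import re

# Hypothetical emoticon-to-polarity map (illustration only).
EMOTICON_POLARITY = {
    ":)": "POSITIVE",
    ":D": "POSITIVE",
    ":(": "NEGATIVE",
    ":S": "NEGATIVE",
}


def preprocess(tweet):
    """Apply a subset of the listed pre-processing steps to one tweet."""
    # Remove URLs first so their punctuation is not mistaken for emoticons.
    text = re.sub(r"https?://\S+", " ", tweet)
    # Replace emoticons by their polarity.
    for emoticon, polarity in EMOTICON_POLARITY.items():
        text = text.replace(emoticon, " " + polarity + " ")
    # Remove hashtags (assumption: the whole tag is dropped) and numbers.
    text = re.sub(r"#\w+", " ", text)
    text = re.sub(r"\d+", " ", text)
    # Collapse three or more repeated letters to two: "coooool" -> "cool".
    text = re.sub(r"([A-Za-z])\1{2,}", r"\1\1", text)
    # Normalize whitespace.
    return " ".join(text.split())


print(preprocess("This is coooool :D #fun 2016 http://t.co/abc"))
```

Collapsing to two letters rather than one keeps words such as "cool" intact while still normalizing lengthened spellings.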
3.1.4. FEATURE EXTRACTION
• Percentage of capitalized words
• No. of -ve/+ve capitalized words
• No. of +ve/-ve hashtags
• No. of +ve/-ve emoticons
• No. of negations
• No. of special characters, for example: @ # % ^ *
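The features listed above could be computed along the following lines. This is a rough sketch under stated assumptions: the word, emoticon and negation lists are tiny hypothetical stand-ins, a "capitalized word" is taken to mean a fully upper-case alphabetic word of more than one letter, and tokens are split on whitespace only.

```python
# Hypothetical lexicons (illustration only).
POSITIVE_WORDS = {"good", "great", "love", "happy"}
NEGATIVE_WORDS = {"bad", "boring", "hate", "sad"}
POSITIVE_EMOTICONS = {":)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":S"}
NEGATIONS = {"not", "no", "never"}
SPECIAL_CHARACTERS = set("@#%^*")


def extract_features(tweet):
    """Return the listed features for one raw tweet as a dictionary."""
    tokens = tweet.split()
    words = [t.strip(".,!?") for t in tokens]
    # Assumption: "capitalized" means fully upper-case, longer than one letter.
    capitalized = [w for w in words if len(w) > 1 and w.isalpha() and w.isupper()]
    hashtags = [t[1:].lower() for t in tokens if t.startswith("#")]
    return {
        "pct_capitalized": 100.0 * len(capitalized) / len(tokens) if tokens else 0.0,
        "pos_capitalized": sum(w.lower() in POSITIVE_WORDS for w in capitalized),
        "neg_capitalized": sum(w.lower() in NEGATIVE_WORDS for w in capitalized),
        "pos_hashtags": sum(h in POSITIVE_WORDS for h in hashtags),
        "neg_hashtags": sum(h in NEGATIVE_WORDS for h in hashtags),
        "pos_emoticons": sum(t in POSITIVE_EMOTICONS for t in tokens),
        "neg_emoticons": sum(t in NEGATIVE_EMOTICONS for t in tokens),
        "negations": sum(w.lower() in NEGATIONS for w in words),
        "special_chars": sum(ch in SPECIAL_CHARACTERS for ch in tweet),
    }


features = extract_features("I LOVE this phone :D #great not BAD @you")
for name, value in features.items():
    print(name, value)
```

Because hashtags and emoticons are counted here, such a function would have to run on the raw tweet, before any pre-processing step that removes them.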
3.1.5. CLASSIFICATION AND PREDICTION
• The model is built to predict the sentiment of new tweets.
• The extracted features are then fed to the classifier.
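To make the classification step concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, written in plain Python. It uses bag-of-words counts rather than the features listed earlier, and the class name, the training sentences and the smoothing choice are assumptions made for illustration, not the project's actual implementation.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial Naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.class_counts = Counter()            # tweets seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocabulary = set()

    def train(self, labelled_tweets):
        for text, label in labelled_tweets:
            self.class_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data for illustration.
classifier = NaiveBayesClassifier()
classifier.train([
    ("i love this phone", "positive"),
    ("what a great day", "positive"),
    ("i hate this movie", "negative"),
    ("this is a boring day", "negative"),
])
print(classifier.predict("i love this great day"))
print(classifier.predict("boring movie"))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the add-one smoothing keeps unseen words from forcing a probability of zero.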
Figure 1: Data Flow Diagram
Chapter 4: DETAILS OF PROJECT REPORT WORK
The process of designing a functional classifier for sentiment analysis can be broken down into four basic categories. They are as follows:
I. Data Acquisition
II. Human Labelling
III. Feature Extraction
IV. Classification
4.1. Data Acquisition:
Data in the form of raw tweets is acquired by using the Python library “tweetstream”, which provides a package for the simple Twitter API [26]. This API allows two modes of accessing tweets: SampleStream and FilterStream. SampleStream simply delivers a small, random sample of all the tweets streaming in real time. FilterStream delivers tweets which match certain criteria. It can filter the delivered tweets according to three criteria:
• Specific keyword(s) to track/search for in the tweets
• Specific Twitter user(s) according to their user IDs
• Tweets originating from specific location(s) (only for geo-tagged tweets).
A programmer can specify any single one of these filtering criteria or a combination of several of them. But for our purpose we have no such restriction and will thus stick to the SampleStream mode. Since we wanted to increase the generality of our data, we acquired it in portions at different points of time instead of acquiring all of it in one go. If we had used the latter approach, the generality of the tweets might have been compromised, since a significant portion of the tweets would be referring to some trending topic and would thus have more or less the same general mood or sentiment. This phenomenon was observed when we were going through our sample of acquired tweets. For example, the sample acquired near Christmas and New Year’s had a significant portion of tweets referring to these joyous events, which were thus of a generally positive sentiment. Sampling our data in portions at different points in time should thus minimize this problem.
A tweet acquired by this method has a lot of raw information in it which we may or may not find useful for our particular application. It comes in the form of the Python “dictionary” data type with various key-value pairs. A list of some key-value pairs is given below:
• Whether a tweet has been favorited
• User ID
• Screen name of the user
• Original text of the tweet
• Presence of hashtags
• Whether it is a re-tweet
• Language under which the Twitter user has registered their account
• Geo-tag location of the tweet
• Date and time when the tweet was created
Since this is a lot of information, we keep only the information that we need and discard the rest. For our particular application we iterate through all the tweets in our sample and save the actual text content of the tweets in a separate file, provided that the language of the Twitter user’s account is specified to be English. The original text content of the tweet is given under the dictionary key “text” and the language of the user’s account is given under “lang”.
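The extraction step just described could look roughly like the sketch below. It assumes each tweet is a flat Python dictionary with the keys “text” and “lang”, and that English is encoded as "en"; real Twitter payloads may nest or name these fields differently, and the function name and sample data are hypothetical.

```python
def english_tweet_texts(tweets):
    """Yield the text of each tweet whose account language is English."""
    for tweet in tweets:
        # Assumption: flat dictionary with "lang" and "text" keys.
        if tweet.get("lang") == "en" and "text" in tweet:
            yield tweet["text"]


# Made-up sample of acquired tweets.
sample = [
    {"text": "Good morning!", "lang": "en", "retweeted": False},
    {"text": "Bonjour", "lang": "fr"},
    {"lang": "en"},  # entry without any text content
]
texts = list(english_tweet_texts(sample))
print(texts)
```

The resulting strings could then be written to a separate file for the later labelling and training steps.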
Since human labelling is an expensive process, we further filter the tweets to be labelled so that we have the greatest amount of variation in tweets without loss of generality. The filtering criteria applied are stated below:
• Remove retweets (any tweet which contains the string “RT”)
• Remove very short tweets (tweets with length less than 20 characters)
• Remove non-English tweets (by comparing the words of the tweets with a list of 2,000 common English words; tweets with less than 15% of their content matching are discarded)
• Remove similar tweets (by comparing every tweet with every other tweet; tweets with more than 90% of their content matching some other tweet are discarded)
After this filtering roughly 30% of tweets remain for human labelling on average per sample, which made a total of 10,173 tweets to be labelled.
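The four filtering criteria listed above can be sketched as follows. Several details are assumptions made for illustration: the common-word list is a tiny stand-in for the 2,000-word list, "content matching" between tweets is measured with `difflib.SequenceMatcher`, and each tweet is compared only with the tweets already kept, so the first tweet of a near-duplicate group survives.

```python
from difflib import SequenceMatcher

# Tiny stand-in for the list of 2,000 common English words (hypothetical).
COMMON_ENGLISH_WORDS = {
    "the", "a", "is", "i", "to", "and", "of", "this", "my", "for", "in", "it",
}


def english_ratio(text):
    """Fraction of the tweet's words found in the common-word list."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w in COMMON_ENGLISH_WORDS for w in words) / len(words)


def is_similar(a, b, threshold=0.9):
    """Assumption: similarity is the SequenceMatcher ratio of the two strings."""
    return SequenceMatcher(None, a, b).ratio() > threshold


def filter_for_labelling(tweets):
    kept = []
    for text in tweets:
        if "RT" in text:                 # retweet
            continue
        if len(text) < 20:               # very short tweet
            continue
        if english_ratio(text) < 0.15:   # probably not English
            continue
        if any(is_similar(text, other) for other in kept):  # near-duplicate
            continue
        kept.append(text)
    return kept


candidates = [
    "RT @user this is the best day of my life",
    "so short",
    "I went to the market and it is closed today",
    "I went to the market and it is closed today!",
    "Das Wetter war heute wirklich sehr kalt",
    "This is my favourite song of the whole year",
]
for tweet in filter_for_labelling(candidates):
    print(tweet)
```

The pairwise similarity check grows quadratically with the number of tweets, which is acceptable for samples of this size but would need a different approach for much larger collections.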
4.2. Human Labelling:
For the purpose of human labelling we made three copies of the tweets so that they can be labelled by four individual sources. This is done so that we can take the average opinion of people on the sentiment of the tweet, and in this way the noise and inaccuracies in labelling can be minimized. Generally speaking, the more copies of labels we can get the better, but we have to keep the cost of labelling in mind; hence we arrived at the reasonable figure of three.
We labelled the tweets in four classes according to the sentiments expressed/observed in the tweets: positive, negative, neutral/objective and ambiguous. We gave the following guidelines to our labelers to help them in the labelling process:
P o s i t i v e : If t he e nt i r e t w e e t ha s a
p o s i t i ve / ha p p y/ e xc i t e d/ j o yf ul a t t i t ud e o r i f s o m e t hi ng i s
m e nt i o ne d w i t h p o s i t i ve c o nno t a t i o ns . A l s o i f m o r e t ha n
o ne s e nt i m e nt i s e xp r e s s e d i n t he t w e e t b ut t he p o s i t i ve
s e nt i m e nt i s m o r e d o m i na nt . E xa m p l e : “ 4 m o r e y e a r s o f
b e i n g i n s h i t h o l e A u s t r a l i a t h e n I m o v e t o t h e U S A ! : D ” .
• N e g a t i v e: If t he e nt i r e t w e e t ha s a
ne g a t i ve / s a d / d i sp l e as e d a t t i t ud e o r i f s o m e t hi ng i s
m e nt i o ne d w i t h ne g a t i ve c o nno t a t i o ns . A l s o i f m o r e t ha n
o ne s e nt i m e nt i s e xp r e s s e d i n t he t w e e t b ut t he ne g a t i ve
s e nt i m e nt i s m o r e d o m i na nt . E xa m p l e : “ I wa n t a n a n d roi d
n o w t h i s i P h o n e i s b o r i n g : S ” .
• N e u t r a l / O b j e ct i v e : If t he c r e a t o r o f t w e e t e xp r e s s e s no
p e r s o na l s e nt i m e nt / o p i ni o n i n t he t w e e t a nd m e r e l y
t r a ns m i t s i nf o r m a t i o n. A d ve r t i s e m e nt s o f d i f f e r e nt
21. 13
p r o d uc t s w o ul d b e l a b e l l e d und e r t hi s c a t e g o r y.
E xa m p l e : “ U S H o u s e S p e a k e r v o ws t o s t o p O b am a
c o n t r a c e p t i v e r u l e . . . h t t p : / / t . c o / c y E W q K l E ” .
• Ambiguous: If more than one sentiment is expressed in
the tweet and they are equally potent, with no one particular
sentiment standing out and becoming more obvious.
Also if it is obvious that some personal opinion is being
expressed but, due to lack of reference to context,
it is difficult/impossible to accurately decipher the
sentiment expressed. Example: “I kind of like heroes and
don’t like it at the same time...”. Finally, if the context of
the tweet is not apparent from the information available.
Example: “That’s exactly how I feel about avenger’s haha”.
• <Blank>: Leave the tweet unlabeled if it belongs to some
language other than English so that it is ignored in the
training data.
Besides this, labelers were instructed to keep personal biases
out of labelling and make no assumptions, i.e. judge the tweet
not from any past extra personal information but only from
the information provided in the current individual tweet.
Once we had labels from four sources, our next step was to
combine opinions of three people to get an averaged opinion.
We did this through majority vote.
So, for example, if a particular tweet had two labels in
agreement, we would label the overall tweet as such. But if
all three labels were different, we labelled the tweet as
“unable to reach a majority vote”. We arrived at the following
statistics for each class after going through majority voting.
• Positive: 2543 tweets
• Negative: 1877 tweets
• Neutral: 4543 tweets
• Ambiguous: 451 tweets
• Unable to reach majority vote: 390 tweets
• Unlabeled non-English tweets: 369 tweets
So if we include only those tweets for which we have been
able to achieve a positive, negative or neutral majority vote,
we are left with 8963 tweets for our training set. Out of these,
4543 are objective tweets and 4420 are subjective tweets
(sum of positive and negative tweets).
We also calculated the human-human agreement for our tweet
labelling task.
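The majority-vote rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in the project: the function name `majority_label` and the exact fallback string are hypothetical, and each tweet is assumed to arrive with exactly the labels of its annotators.

```python
from collections import Counter

# Hypothetical marker for tweets on which no two annotators agree.
NO_MAJORITY = "unable to reach a majority vote"


def majority_label(labels):
    """Return the label chosen by more than half of the annotators.

    With three annotators this means at least two labels must agree;
    otherwise the tweet falls into the NO_MAJORITY bucket.
    """
    label, votes = Counter(labels).most_common(1)[0]
    if votes * 2 > len(labels):
        return label
    return NO_MAJORITY


print(majority_label(["positive", "positive", "neutral"]))
print(majority_label(["positive", "negative", "neutral"]))
```

Tweets that end up in the no-majority bucket, like the ambiguous and non-English ones, would then be left out of the training set, as described above.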
4.3. Feature Extraction
Now that we have arrived at our training set we need to
extract useful features from it which can be used in the
process of classification. But first we will discuss some text
formatting techniques which will aid us in feature extraction:
• Tokenization: It is the process of breaking a stream of
text up into words, symbols and other meaningful
elements called “tokens”. Tokens can be separated by
whitespace characters and/or punctuation characters. It
is done so that we can look at tokens as individual
components that make up a tweet [19].
• URLs and user references (identified by tokens “http” and
“@”) are removed if we are interested in only analyzing
the text of the tweet.
• Punctuation marks and digits/numerals may be removed
if, for example, we wish to compare the tweet to a list of
English words.
• Lowercase Conversion: A tweet may be normalized by
converting it to lowercase, which makes its comparison
with an English dictionary easier.
• Stemming: It is the text normalizing process of reducing
a derived word to its root or stem [28]. For example a
stemmer would reduce the words “stemmer”,
“stemmed”, “stemming” to the root word “stem”.
The advantage of stemming is that it makes comparison
between words simpler, as we do not need to deal with
complex grammatical transformations of the word. In our
case we employed the algorithm of “Porter stemming” on
both the tweets and the dictionary, whenever there was
a need of comparison.
• Stop-words removal: Stop words are a class of some
extremely common words which hold no additional
information when used in a text and are thus claimed to be
useless [19]. Examples include “a”, “an”, “the”, “he”, “she”,
“by”, “on”, etc. It is sometimes convenient to remove these
words because they hold no additional information since they
are used almost equally in all classes of text, for example
when computing prior-sentiment-polarity of words in a tweet
according to their frequency of occurrence in different
classes and using this polarity to calculate the average
sentiment of the tweet over the set of words used in that
tweet.
• Parts-of-Speech Tagging: POS-Tagging is the process of
assigning a tag to each word in the sentence as to which
grammatical part of speech that word belongs to, i.e.
noun, verb, adjective, adverb, coordinating conjunction
etc.
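Several of the text formatting steps listed above can be combined into one small routine. The sketch below is only an illustration under stated assumptions: the function name `preprocess` is hypothetical, the stop-word set is just the handful of examples quoted in the text, tokens are split on whitespace only, and stemming and POS tagging are left out because they would need an external library.

```python
import re

# Illustrative stop-word subset taken from the examples in the text.
STOP_WORDS = {"a", "an", "the", "he", "she", "by", "on"}


def preprocess(tweet, remove_stop_words=True):
    """Tokenize a tweet and apply the simple normalization steps."""
    # Tokenization on whitespace; drop URLs and user references,
    # identified by tokens starting with "http" or "@".
    tokens = [t for t in tweet.split() if not t.startswith(("http", "@"))]
    # Lowercase conversion, then strip punctuation and digits.
    cleaned = [re.sub(r"[^a-z]", "", t.lower()) for t in tokens]
    # Discard tokens that became empty (pure punctuation or digits).
    cleaned = [t for t in cleaned if t]
    if remove_stop_words:
        cleaned = [t for t in cleaned if t not in STOP_WORDS]
    return cleaned


print(preprocess("@bob I want an Android now! http://t.co/x 123"))
```

Whether stop words are removed is kept as a flag because, as noted above, their removal is convenient for some features but not required for all of them.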
Now that we have discussed some of the text formatting
techniques employed by us, we will move to the list of
features that we have explored. As we will see below, a
feature is any variable which can help our classifier in
differentiating between the different classes. There are two
kinds of classification in our system (as will be discussed in
detail in the next section), the objectivity/subjectivity
classification and the positivity/negativity classification. As
the name suggests, the former is for differentiating between
objective and subjective classes while the latter is for
differentiating between positive and negative classes.
The list of features explored for objective/subjective
classification is as below:
• Number of exclamation marks in a tweet
• Number of question marks in a tweet
• Presence of exclamation marks in a tweet
• Presence of question marks in a tweet
• Presence of URL in a tweet
• Presence of emoticons in a tweet
• Unigram word models calculated using Naive Bayes
• Prior polarity of words through online lexicon MPQA
• Number of digits in a tweet
• Number of capitalized words in a tweet
• Number of capitalized characters in a tweet
• Number of punctuation marks/symbols in a tweet
• Ratio of non-dictionary words to the total number of words
in the tweet
• Length of the tweet
• Number of adjectives in a tweet
• Number of comparative adjectives in a tweet
• Number of superlative adjectives in a tweet
• Number of base-form verbs in a tweet
• Number of past tense verbs in a tweet
• Number of present participle verbs in a tweet
• Number of past participle verbs in a tweet
• Number of 3rd person singular present verbs in a tweet
• Number of non-3rd person singular present verbs in a tweet
• Number of adverbs in a tweet
• Number of personal pronouns in a tweet
• Number of possessive pronouns in a tweet
• Number of singular proper nouns in a tweet
• Number of plural proper nouns in a tweet
• Number of cardinal numbers in a tweet
• Number of possessive endings in a tweet
• Number of wh-pronouns in a tweet
• Number of adjectives of all forms in a tweet
• Number of verbs of all forms in a tweet
• Number of nouns of all forms in a tweet
• Number of pronouns of all forms in a tweet
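A few of the surface-level features in the list above need nothing more than string operations. The following sketch shows how such features might be computed; the function name `surface_features` and the dictionary keys are hypothetical, "capitalized word" is assumed to mean a word whose first character is uppercase, and the POS-based counts are omitted because they require a tagger.

```python
import string


def surface_features(tweet):
    """Compute a handful of simple count/presence features for one tweet."""
    words = tweet.split()
    return {
        "num_exclamation": tweet.count("!"),
        "num_question": tweet.count("?"),
        "has_exclamation": "!" in tweet,
        "has_question": "?" in tweet,
        # A URL is assumed to be signalled by the substring "http".
        "has_url": "http" in tweet,
        "num_digits": sum(ch.isdigit() for ch in tweet),
        # Assumption: a capitalized word starts with an uppercase letter.
        "num_capitalized_words": sum(w[0].isupper() for w in words),
        "num_capital_chars": sum(ch.isupper() for ch in tweet),
        "num_punctuation": sum(ch in string.punctuation for ch in tweet),
        "length": len(tweet),
    }


features = surface_features("Wow! Is this REAL? 42 http://t.co/x")
print(features)
```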
The list of features explored for positive/negative
classification is given below:
• Overall emoticon score (where 1 is added to the score in
case of a positive emoticon, and 1 is subtracted in case
of a negative emoticon)
• Overall score from online polarity lexicon MPQA (where
presence of a strong positive word in the tweet increases
the score by 1.0 and the presence of a weak negative word
would decrease the score by 0.5)
• Unigram word models calculated using Naive Bayes
• Number of total emoticons in the tweet
• Number of positive emoticons in a tweet
• Number of negative emoticons in a tweet
• Number of positive words from MPQA lexicon in tweet
• Number of negative words from MPQA lexicon in tweet
• Number of base-form verbs in a tweet
• Number of past tense verbs in a tweet
• Number of present participle verbs in a tweet
• Number of past participle verbs in a tweet
• Number of 3rd person singular present verbs in a tweet
• Number of non-3rd person singular present verbs in a tweet
• Number of plural nouns in a tweet
• Number of singular proper nouns in a tweet
• Number of cardinal numbers in a tweet
• Number of prepositions or coordinating conjunctions in a
tweet
• Number of adverbs in a tweet
• Number of wh-adverbs in a tweet
• Number of verbs of all forms in a tweet
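The two score features at the top of this list can be illustrated with a short sketch. Everything concrete here is an assumption made for the example: the emoticon sets and the four lexicon entries are invented samples, not the real MPQA lexicon or its file format, and the weights for weak positive (+0.5) and strong negative (-1.0) words are inferred by symmetry from the two cases the text states.

```python
# Hypothetical sample emoticon sets, not an exhaustive list.
POSITIVE_EMOTICONS = {":)", ":-)", ":D", ";)"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":S", ":'("}

# Hypothetical entries in the spirit of a polarity lexicon:
# word -> (polarity, strength).
LEXICON = {
    "love": ("positive", "strong"),
    "nice": ("positive", "weak"),
    "hate": ("negative", "strong"),
    "boring": ("negative", "weak"),
}

# Strong words move the score by 1.0 and weak words by 0.5.
WEIGHT = {"strong": 1.0, "weak": 0.5}


def emoticon_score(tokens):
    """+1 for every positive emoticon, -1 for every negative one."""
    positives = sum(t in POSITIVE_EMOTICONS for t in tokens)
    negatives = sum(t in NEGATIVE_EMOTICONS for t in tokens)
    return positives - negatives


def lexicon_score(tokens):
    """Sum the signed strength of every token found in the lexicon."""
    score = 0.0
    for token in tokens:
        entry = LEXICON.get(token.lower())
        if entry is None:
            continue
        polarity, strength = entry
        sign = 1.0 if polarity == "positive" else -1.0
        score += sign * WEIGHT[strength]
    return score


tokens = "I want an android now this iPhone is boring :S".split()
print(emoticon_score(tokens), lexicon_score(tokens))
```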
Next we will give the mathematical reasoning of how we
calculate the unigram word models using Naive Bayes. The
basic concept is to calculate the probability of a word
belonging to any of the possible classes from our training
sample. Using mathematical formulae we will demonstrate an
example of calculating the probability of a word belonging to the
objective and subjective classes. Similar steps would need to be
taken for positive and negative classes as well.
We will start by calculating the probability of a word in our
training data belonging to a particular class:
Figure 2: Probability Formula 1
We now state the Bayes’ rule [19]. According to this rule, if
we need to find the probability of whether a tweet is
objective, we need to calculate the probability of tweet given
the objective class and the prior probability of objective
class. The term P(tweet) can be substituted with P(tweet |
obj) + P(tweet | subj) (strictly, each term is weighted by its
class prior; the priors cancel under the equal-class-size
assumption made below).
Figure 3: Probability Formula 2
Now if we assume independence of the unigrams inside the
tweet (i.e. the occurrence of a word in a tweet will not affect
the probability of occurrence of any other word in the tweet)
we can approximate the probability of tweet given the
objective class to a mere product of the probability of all the
words in the tweet belonging to objective class. Moreover, if
we assume equal class sizes for both objective and
subjective class we can ignore the prior probability of the
objective class. Hence we are left with the following
formula, in which there are two distinct terms and both of
them are easily calculated through the formula mentioned
above.
Figure 4: Probability Formula 3
Now that we have the probability of objectivity given a
particular tweet, we can easily calculate the probability of
subjectivity given that same tweet by simply subtracting the
earlier term from 1. This is because probabilities must always
add to 1. So if we have information of P(obj | tweet) we
automatically know P(subj | tweet).
Figure 5: Probability Formula 4
Finally we calculate P(obj | tweet) for every tweet and use
this term as a single feature in our objectivity/subjectivity
classification.
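The derivation above can be made concrete with a small sketch. Because the formulas in Figures 2 to 5 are not reproduced in this text, the code assumes the simplest reading: the probability of a word given a class is its relative frequency among all words of that class, and P(obj | tweet) is the product of the objective word probabilities divided by the sum of the objective and subjective products (equal priors). The function names and the 0.5 fallback for tweets unseen in both classes are hypothetical choices made for the example.

```python
from collections import Counter
from math import prod


def word_probs(tokenized_tweets):
    """Relative frequency of each word within one class (assumed estimate)."""
    counts = Counter(w for tweet in tokenized_tweets for w in tweet)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def p_obj_given_tweet(tweet, p_word_obj, p_word_subj):
    """P(obj | tweet) under unigram independence and equal class priors."""
    like_obj = prod(p_word_obj.get(w, 0.0) for w in tweet)
    like_subj = prod(p_word_subj.get(w, 0.0) for w in tweet)
    denom = like_obj + like_subj
    # Assumption: fall back to 0.5 when neither class has seen the words.
    return like_obj / denom if denom else 0.5


objective = [["speaker", "vows", "rule"], ["rule", "vote"]]
subjective = [["i", "like", "rule"], ["i", "hate", "it"]]
p_obj = word_probs(objective)
p_subj = word_probs(subjective)

print(p_obj_given_tweet(["rule"], p_obj, p_subj))
print(1 - p_obj_given_tweet(["rule"], p_obj, p_subj))  # P(subj | tweet)
print(p_obj_given_tweet(["vows"], p_obj, p_subj))
```

The last call shows what happens to a word seen in only one class: without any smoothing, the probability collapses to exactly 1 for that class.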
There are two main potential problems with this approach.
The first is that if we include every unique word used in the
data set then the list of words will be too large, making the
computation too expensive and time-consuming. To solve this
we only include words which have been used at least 5 times
in our data. This reduces the size of our dictionary for
objective/subjective classification from 11,216 to 2,320,
while for positive/negative classification the unigram dictionary
size is reduced from 6,502 to 1,235 words.
The second potential problem arises if, in our training set, a
particular word appears in one class only and does
not appear at all in the other class (for example if the word
is misspelled only once). If we have such a scenario then our
classifier will always classify a tweet to that particular class
(regardless of any other features present in the tweet) just
because of the presence of that single word. This is a very
harsh approach and results in over-fitting. To avoid this we
make use of the technique known as “Laplace Smoothing”.
We replace the formula for calculating the probability of a
word belonging to a class with the following formula:
Figure 6: Probability Formula 5
In this formula “x” is a constant factor called the smoothing
factor, which we have arbitrarily selected to be 1. How this
works is that even if the count of a word in a particular class
is zero, the numerator still has a small value so the
probability of a word belonging to some class will never be
equal to zero. Instead, if the probability would have been zero
according to the earlier formula, it would be replaced by a very
small non-zero probability.
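Both remedies discussed above, pruning rare words and smoothing the word probabilities, can be sketched briefly. Since the formula in Figure 6 is not reproduced in this text, the code assumes the standard additive form (count + x) / (total + x·|V|) with smoothing factor x = 1; the function names are hypothetical.

```python
from collections import Counter


def build_vocabulary(tokenized_tweets, min_count=5):
    """Keep only words used at least `min_count` times in the data."""
    counts = Counter(w for tweet in tokenized_tweets for w in tweet)
    return {w for w, c in counts.items() if c >= min_count}


def smoothed_word_prob(word, class_counts, vocabulary, x=1.0):
    """Additively smoothed probability of `word` within one class.

    Assumed form: (count + x) / (total + x * |V|), which is never zero
    and still sums to 1 over the vocabulary.
    """
    total = sum(class_counts[w] for w in vocabulary)
    return (class_counts[word] + x) / (total + x * len(vocabulary))


vocabulary = {"rule", "vote", "hate"}
objective_counts = Counter({"rule": 3, "vote": 1})  # "hate" never seen here

for w in sorted(vocabulary):
    print(w, smoothed_word_prob(w, objective_counts, vocabulary))
```

In this toy example the word "hate" never occurs in the objective class, yet it receives a probability of 1/7 instead of 0, so a single word can no longer force the classification on its own.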
The final issue we have in feature selection is choosing
the best features from a large number of features. Our
ultimate aim is to achieve the greatest accuracy of our
classifier while using the least number of features. This is
because adding new features adds to the dimensionality of our
classification problem and thus adds to the complexity of our
classifier. This increase in complexity may not necessarily
be linear and may even be quadratic, so it is preferred to keep
the number of features to a minimum. Another issue we have with
too many features is that our training data may be over-fit
and it may confuse the classifier when doing classification
on an unknown test set, so the accuracy of the classifier may
even decrease. To solve this issue we select the most
pertinent features by computing the information gain of all
the features under exploration and then selecting the
features with highest information gain. We used the WEKA
machine learning tool for this task of feature selection [17].
We explored a total of 33 features for objectivity/subjectivity
classification and used WEKA to calculate the information
gain from each of these features.
This graph is basically the super-imposition of 10 different
graphs, each one arrived at through one fold out of the 10-fold
cross validation we performed. Since all the
graphs overlap closely, the results for each fold are
almost the same, which shows us that the features we select
will perform best in all the scenarios. We selected the best 5
features from this graph which are as follows:
1. Unigram word models (for prior probabilities of words
belonging to objective/subjective classes)
2. Presence of URL in tweet
3. Presence of emoticons in tweet
4. Number of personal pronouns in tweet
5. Number of exclamation marks in tweet
Similarly we explored 22 features for positive/negative
classification and used WEKA to calculate the information
gain from each of these features.
This graph is basically the super-imposition of 10 different
graphs, each one arrived at through one fold out of the 10-fold
cross validation we performed. Since all the
graphs overlap closely, the results for each fold are
almost the same, which shows us that the features we select
will perform best in all the scenarios. We selected the best 5
features, out of which 2 were redundant, and we were
left with only 3 features for our positive/negative
classification, which are as follows:
1. Unigram word models (for prior probabilities of words
belonging to positive or negative classes)
2. Number of positive emoticons in tweet
3. Number of negative emoticons in tweet
The redundant features, which we chose to ignore because they
provided no extra information in the presence of the above
features, are as follows:
• Emoticon score for the tweet
• MPQA score for the tweet
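The information-gain ranking used for feature selection in this section was computed with WEKA. As a rough illustration of the quantity being ranked, the sketch below computes information gain for a discrete feature in plain Python; it is not WEKA's implementation, the function names are hypothetical, and continuous features would need to be discretized before being passed in.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


labels = ["obj", "obj", "subj", "subj"]
has_url = [1, 1, 0, 0]        # toy feature that separates the classes perfectly
has_question = [1, 0, 1, 0]   # toy feature that carries no information

print(information_gain(has_url, labels))
print(information_gain(has_question, labels))
```

Features would then be sorted by this value and only the top few kept, which mirrors the selection of the best 5 features described above.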
4.4. Classification
Pattern classification is the process through which data is
divided into different classes according to some common
patterns which are found in one class and which differ to some
degree from the patterns found in the other classes. The
ultimate aim of our project is to design a classifier which
accurately classifies tweets in the following four sentiment
classes: positive, negative, neutral and ambiguous.
There can be two kinds of sentiment classifications in this
area: contextual sentiment analysis and general sentiment
analysis. Contextual sentiment analysis deals with
classifying specific parts of a tweet according to the context
provided, for example for the tweet “4 more years of being in
shithole Australia then I move to the USA :D” a contextual
sentiment classifier would identify Australia with negative
sentiment and USA with a positive sentiment. On the other
hand general sentiment analysis deals with the general
sentiment of the entire text (tweet in this case) as a whole.
Thus for the tweet mentioned earlier, since there is an overall
positive attitude, an accurate general sentiment classifier
would identify it as positive. For our particular project we will
only be dealing with the latter case, i.e. general (overall)
sentiment analysis of the tweet as a whole.
The classification approach generally followed in this domain
is a two-step approach. First Objectivity Classification is
done, which deals with classifying a tweet or a phrase as
either objective or subjective. After this we perform Polarity
Classification (only on tweets classified as subjective by the
objectivity classification) to determine whether the tweet is
positive, negative or both (some researchers include the both
category and some don’t). This was presented by Wilson et
al. and reports better accuracy than a simple one-step
approach [16].
We propose a novel approach which is slightly different from
the approach proposed by Wilson et al. [16]. We propose that
in the first step each tweet should undergo two classifiers: the
objectivity classifier and the polarity classifier. The former
would try to classify a tweet between objective and subjective
classes, while the latter would do so between the positive and
negative classes. We use the short-listed features for these
classifications and use the Naive Bayes algorithm so that
after the first step we have two numbers from 0 to 1
representing each tweet. One of these numbers is the
probability of the tweet belonging to the objective class and the
other number is the probability of the tweet belonging to the
positive class. Since we can easily calculate the two remaining
probabilities of subjective and negative by simple subtraction
from 1, we don’t need those two probabilities.
So in the second step we would treat each of these two
numbers as separate features for another classification, in
which the feature size would be just 2. We use WEKA and
apply the following Machine Learning algorithms for this
second classification to arrive at the best result:
• K-Means Clustering
• Support Vector Machine
• Logistic Regression
• K Nearest Neighbors
• Naive Bayes
• Rule Based Classifiers
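To illustrate the second step, the sketch below classifies a tweet from its two first-step numbers, P(objective) and P(positive), using a plain k-nearest-neighbours vote, one of the algorithms listed above. It is a stand-in written for this example rather than the WEKA setup used in the project: the function name `knn_predict`, the value k = 3 and all training points are invented for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Each point is (P(objective), P(positive)) from the first step.
# These values are made up for the example.
points = [
    (0.90, 0.50), (0.85, 0.60), (0.80, 0.40),   # neutral
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.85),   # positive
    (0.10, 0.10), (0.20, 0.20), (0.15, 0.10),   # negative
]
labels = ["neutral"] * 3 + ["positive"] * 3 + ["negative"] * 3

print(knn_predict(points, labels, (0.12, 0.88)))
print(knn_predict(points, labels, (0.90, 0.45)))
```

Because the second-step feature space has only two dimensions, any of the listed algorithms can be swapped in at this point without changing the first step.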
To better understand how this works we show a plot of an actual
test set from one of our cross-validations on the 2-dimensional
space mentioned above. The labels are the actual
ground truth and the distribution shows how the classified
data points are actually scattered throughout the space. As
we go right the tweets become increasingly objective
and as we go up the tweets become more positive. The
results for our classification approach are mentioned in the
next section of this report.
4.5. Tweet Mode Web Application
We designed a web application which performs real-time
sentiment analysis on Twitter on tweets that match
particular keywords provided by the user. For example if a
user is interested in performing sentiment analysis on tweets
which contain the word “Obama” he/she will enter that
keyword and the web application will perform the appropriate
sentiment analysis and display the results for the user.
The web application has been implemented using the Google
App Engine service [21] because it can be used as a free web
hosting service and it provides a layer of abstraction to the
developer from the low level web operations so it is easier to
learn. We implemented our algorithm in Python and integrated
it with the GUI for our website using HTML and JavaScript using
the Jinja2 template engine [23]. We used the Google Visualization
Chart API for presenting our results in a graphical,
easy-to-understand manner [22].
We have three ways of performing sentiment analysis on our
website and we will discuss each of them one by one:
• Tweet Score
• Tweet Compare
• Tweet Stats
4.5.1. Tweet Score
This feature calculates the popularity score of the keyword,
which is a number from 100 to -100. A more positive
popularity score suggests that the keyword is highly
positively popular on Twitter, while a more negative
popularity score suggests that the keyword is highly
negatively popular on Twitter. A popularity score close to 0
suggests that the keyword has either mixed opinions or is not
a popular topic on Twitter. The popularity score is dependent
on two ratios:
• Number of positive classified tweets / Number of negative
classified tweets
• Number of tweets acquired / Time in past needed to
explore the REST API
The first ratio suggests that if the number of positive tweets is
larger than negative tweets on a particular keyword, the
keyword would have overall popular opinion and vice versa.
The second ratio suggests that the less time in the past we
need to explore the REST API to get the 1,000 tweets, the
more people are talking about the keyword
on Twitter, hence the keyword is popular on Twitter. However
it gives no information about the positivity or negativity of
the keyword, and so the higher the second ratio is, the more
the popularity score from the first ratio is shifted to the extreme
ends (away from zero); whether this is in the positive or negative
direction depends on whether there are more
positive or negative tweets. Finally a maximum of 10 tweets
are displayed for each class (positive, negative and neutral)
so that the user develops confidence in our classifier.
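The text describes what the popularity score depends on but does not state its formula, so the following sketch is purely a hypothetical combination that has the stated properties: it stays between -100 and 100, its sign follows the balance of positive and negative tweets, and a higher tweet rate pushes it away from zero. The function name, the normalised balance term, the exponent and the `rate_scale` parameter are all assumptions made for this example and should not be read as the formula used by the web application.

```python
from math import copysign


def popularity_score(n_pos, n_neg, n_tweets, hours_spanned, rate_scale=100.0):
    """Hypothetical popularity score in [-100, 100].

    balance: sign and strength of opinion, from positive vs negative counts.
    rate: tweets per hour, used only to push the score away from zero.
    """
    if n_pos + n_neg == 0 or hours_spanned <= 0:
        return 0.0
    balance = (n_pos - n_neg) / (n_pos + n_neg)      # in [-1, 1]
    rate = n_tweets / hours_spanned                  # tweets per hour
    # Exponent in (0, 1]: the busier the topic, the smaller the exponent,
    # and the closer |balance| ** exponent moves towards 1.
    exponent = 1.0 / (1.0 + rate / rate_scale)
    return 100.0 * copysign(abs(balance) ** exponent, balance)


# Same opinion balance, but the first keyword gathers its tweets faster.
print(popularity_score(600, 200, n_tweets=1000, hours_spanned=10))
print(popularity_score(600, 200, n_tweets=1000, hours_spanned=1000))
```

With the same 600 positive and 200 negative tweets, the keyword whose 1,000 tweets were gathered in 10 hours scores further from zero than the one that needed 1,000 hours, which is the qualitative behaviour described above.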
4.5.2. Tweet Compare
This feature compares the popularity score of two or three
different keywords and replies with which keyword is
currently most popular on Twitter. This can have many
interesting applications, for example having our web
application recommend to users between movies, songs and
products/brands.
4.5.3. Tweet Stats:
This feature is for long-term sentiment analysis. We input a number of
popular keywords on Twitter, for which a backend operation runs every
hour, calculates the popularity score for the tweets generated on each
keyword within that one-hour time frame and stores the result against
that hour in a database. We can have a maximum of about 300 such keywords
as per Google's bandwidth requirements. Once we have a reasonable amount
of data we can use it to plot graphs of popularity score against time and
visualize how the popularity score changes in response to certain events.
Once we have collected enough data we can also use it to look for
correlation with socio-economic phenomena like stock exchange rates and
political elections. Work on this has been done before by Tumasjan et al.
[4] and Bollen et al. [9].
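The hourly bookkeeping described above could look roughly like the sketch below. The table layout, function names and sample values are all hypothetical, and the standard-library sqlite3 module stands in for whatever database the application actually uses.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create the (hypothetical) popularity table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS popularity ("
        "keyword TEXT NOT NULL, hour TEXT NOT NULL, score REAL NOT NULL, "
        "PRIMARY KEY (keyword, hour))"
    )
    return conn


def record_score(conn, keyword, hour, score):
    """Store one popularity score for one keyword and one hour."""
    conn.execute(
        "INSERT OR REPLACE INTO popularity VALUES (?, ?, ?)",
        (keyword, hour, score),
    )
    conn.commit()


def score_series(conn, keyword):
    """Return (hour, score) pairs in time order, ready for plotting."""
    rows = conn.execute(
        "SELECT hour, score FROM popularity WHERE keyword = ? ORDER BY hour",
        (keyword,),
    )
    return rows.fetchall()


conn = open_store()
record_score(conn, "example", "2012-01-01T11:00", 0.4)
record_score(conn, "example", "2012-01-01T10:00", 0.1)
series = score_series(conn, "example")
```

Storing the hour as an ISO-style string keeps the rows sortable by time with a plain ORDER BY.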
Figure 7: Home Page
Figure 8: Search Page
Figure 9: Result
Figure 10: Positive Tweets
Figure 11: SQL Database
Figure 12: SQL Database 2
Figure 13: Positive Dataset 1
Figure 14: Positive Dataset 2
Figure 15: Negative Dataset 1
Figure 16: Negative Dataset 2
Chapter 5: RESULT DISCUSSION
We will first present our results for the objective/subjective and
positive/negative classifications. These results act as the first step of
our classification approach. We only use the short-listed features for
both of these results. This means that for the objective/subjective
classification we have 5 features and for the positive/negative
classification we have 3 features. For both of these results we use the
Naïve Bayes classification algorithm, because that is the algorithm we
employ at the first step of our actual classification approach.
Furthermore, all the figures reported are the result of 10-fold cross
validation: we take the average of the 10 values we get from the cross
validation.
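The averaging over 10 folds can be sketched as follows. This is a generic illustration rather than the code used for the reported figures: the helper names are hypothetical, the folds are contiguous (real data would normally be shuffled first), and a majority-class baseline stands in for the Naïve Bayes classifier.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(samples, labels, train_and_score, k=10):
    """Hold out each fold once, score on it, and average the k scores."""
    scores = []
    for test_idx in k_fold_indices(len(samples), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(samples)) if i not in held_out]
        scores.append(train_and_score(
            [samples[i] for i in train_idx], [labels[i] for i in train_idx],
            [samples[i] for i in test_idx], [labels[i] for i in test_idx],
        ))
    return sum(scores) / len(scores)


def majority_baseline(train_x, train_y, test_x, test_y):
    """Stand-in classifier: always predict the most common training label."""
    majority = Counter(train_y).most_common(1)[0][0]
    return sum(1 for y in test_y if y == majority) / len(test_y)


# Hypothetical toy data: 15 objective and 5 positive items.
toy_labels = ["obj"] * 15 + ["pos"] * 5
toy_samples = list(range(20))
mean_accuracy = cross_validate(toy_samples, toy_labels, majority_baseline)
```

Any function with the same four arguments, returning a score for the held-out fold, can be passed in place of `majority_baseline`.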
In addition to the above, we impose one condition while reporting the
results of polarity classification (which differentiates between the
positive and negative classes): only tweets labelled subjective are used
to calculate these results. However, in the final classification approach
this condition is removed, and both objectivity and polarity
classifications are applied to all tweets regardless of whether they are
labelled objective or subjective.
If we compare these results to those provided by Wilson et al. [16]
(results are displayed in Table 2 and Table 3 of this report), we see
that the accuracy of the neutral class falls from 82.1% to 73% if we use
our classification instead of theirs. However, for all other classes we
report significantly better results. Although the results presented by
Wilson et al. are not from Twitter data, they are for phrase-level
sentiment analysis, which is very close in concept to Twitter sentiment
analysis. Next we will compare our results with those presented by Go et
al. [2]. The results presented in that paper are as follows:
If we compare these results to ours, we see that they are more or less
similar. However, we arrive at comparable results with just 10 features
and about 9,000 training examples. In contrast, they used about 1.6
million noisy labels. Their labels were noisy in the sense that tweets
containing positive emoticons were labelled as positive, while those with
negative emoticons were labelled as negative. The rest of the tweets
(which did not contain any emoticon) were discarded from the dataset. In
this way they hoped to achieve high results without human labelling, but
at the cost of using an extremely large dataset.
Next we will present our results for the complete classification. We note
that the best results are reached when a Support Vector Machine (SVM) is
applied at the second stage of the classification process. Hence the
results below pertain only to SVM. These results use a total of two
features: P(objectivity | tweet) and P(positivity | tweet). But if we
include all the features employed in step 1 of the classification
process, we have a list of 8 shortlisted features (3 for polarity
classification and 5 for objectivity classification). The following
results are reported after conducting 10-fold cross validation:
In comparison with these results, Kouloumpis et al. [7] report an average
F-measure of 68%. However, when they include another portion of their
data in their classification process (which they call the HASH data),
their average F-measure drops to 65%. In contrast, we achieve an average
F-measure of more than 70%, which is better than either of these results.
Moreover, we make use of only 8 features and 9,000 labelled tweets, while
their process involves about 15 features in total and more than 220,000
tweets in their training set. Our unigram word models are also simpler
than theirs, because they incorporate negation into their word models.
However, like in the case of (1-9), their tweets are not labelled by
humans, but rather undergo noisy labelling in two ways: labels acquired
from positive and negative emoticons, and labels acquired from hashtags.
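For readers unfamiliar with the metric, the average F-measure used in this comparison can be computed along the following lines. This is a generic sketch using the standard definition (harmonic mean of precision and recall, averaged over classes); the function names and toy labels are illustrative and are not taken from the experiments.

```python
def f_measure(gold, predicted, target):
    """F-measure of one class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_f_measure(gold, predicted):
    """Unweighted mean of the per-class F-measures."""
    classes = sorted(set(gold))
    return sum(f_measure(gold, predicted, c) for c in classes) / len(classes)


# Hypothetical toy labels for the three classes.
gold = ["pos", "pos", "neg", "neg", "obj", "obj"]
pred = ["pos", "neg", "neg", "neg", "obj", "pos"]
avg_f = average_f_measure(gold, pred)
```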
Finally, we conclude that our classification approach provides an
improvement in accuracy while using only the simplest features and a
small dataset. However, there are still a number of things we would like
to consider as future work, which we mention in the next section.
Chapter 6: CONCLUSION AND FUTURE RECOMMENDATIONS
The task of sentiment analysis, especially in the domain of
micro-blogging, is still in the developing stage and far from complete.
So we propose a couple of ideas which we feel are worth exploring in the
future and may result in further improved performance. Right now we have
worked with only the very simplest unigram models; we can improve those
models by adding extra information such as the closeness of the word to a
negation word. We could specify a window prior to the word under
consideration (for example, a window of 2 or 3 words), and the effect of
negation may be incorporated into the model if the negation lies within
that window. The closer the negation word is to the unigram word whose
prior polarity is to be calculated, the more it should affect the
polarity. For example, if the negation is right next to the word, it may
simply reverse the polarity of that word, and the farther the negation is
from the word, the smaller its effect should be.
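One way the negation window might be realised is sketched below. Everything here is an assumption made for illustration: the negation word list, the linear decay with distance and the function name are not part of the current system.

```python
# Illustrative list only; a real system would need a fuller lexicon.
NEGATION_WORDS = {"not", "no", "never", "n't", "cannot"}


def negated_polarity(tokens, index, prior_polarity, window=3):
    """Adjust the prior polarity of tokens[index] for a preceding negation.

    A negation directly before the word reverses its polarity; one farther
    back (but still inside the window) has a linearly weaker effect.
    """
    for distance in range(1, window + 1):
        pos = index - distance
        if pos < 0:
            break
        if tokens[pos].lower() in NEGATION_WORDS:
            # strength is 1.0 when adjacent and shrinks towards 1/window.
            strength = 1.0 - (distance - 1) / window
            return prior_polarity * (1.0 - 2.0 * strength)
    return prior_polarity


adjacent = negated_polarity(["this", "is", "not", "good"], 3, 0.8)
distant = negated_polarity(["not", "a", "very", "good", "film"], 3, 0.8)
outside = negated_polarity(["not", "a", "really", "very", "good"], 4, 0.8)
```

With a window of 3, an adjacent negation flips 0.8 to -0.8, a negation three words back only shrinks it, and a negation outside the window leaves it unchanged.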
Apart from this, we are currently focusing only on unigrams, and the
effect of bigrams and trigrams may be explored. As reported in the
literature review section, using bigrams along with unigrams usually
enhances performance. However, for bigrams and trigrams to be an
effective feature we need a much larger labelled dataset than our meagre
9,000 tweets. Right now we are exploring Parts of Speech (POS) separately
from the unigram models; in future we can try to incorporate POS
information within our unigram models. So instead of calculating a single
probability for each word, like P(word | obj), we could have multiple
probabilities for each word according to the Part of Speech the word
belongs to. For example we may have P(word | obj, verb), P(word | obj,
noun) and P(word | obj, adjective). Pang et al. [5] used a somewhat
similar approach and claim that appending POS information to every
unigram results in no significant change in performance (with Naive Bayes
performing slightly better and SVM showing a slight decrease in
performance), while there is a significant decrease in accuracy if only
adjective unigrams are used as features. However, these results are for
the classification of reviews and remain to be verified for sentiment
analysis on micro-blogging websites like Twitter.
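The POS-conditioned probabilities could be estimated by simple counting, as in the sketch below. The input format, tag names and function name are hypothetical, no smoothing is applied, and the tags would in practice come from a POS tagger.

```python
from collections import Counter


def pos_conditioned_probs(tagged_tweets):
    """Estimate P(word | label, pos) by relative frequency.

    tagged_tweets: iterable of (label, [(word, pos), ...]) pairs.
    Returns a dict keyed by (word, label, pos).
    """
    joint = Counter()    # counts of (word, label, pos)
    context = Counter()  # counts of (label, pos)
    for label, tokens in tagged_tweets:
        for word, pos in tokens:
            joint[(word, label, pos)] += 1
            context[(label, pos)] += 1
    return {
        key: count / context[(key[1], key[2])]
        for key, count in joint.items()
    }


# Hypothetical toy data with hand-assigned tags.
data = [
    ("obj", [("report", "noun"), ("report", "verb"), ("news", "noun")]),
    ("subj", [("love", "verb")]),
]
probs = pos_conditioned_probs(data)
```

Here "report" receives a separate probability as a noun and as a verb within the objective class, instead of the single value P(word | obj).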
One more feature that is worth exploring is whether information about the
relative position of a word in a tweet has any effect on the performance
of the classifier. Although Pang et al. explored a similar feature and
reported negative results, their results were based on reviews, which are
very different from tweets, and they worked with an extremely simple
model.
One potential problem with our research is that the sizes of the three
classes are not equal. The objective class, which contains 4,543 tweets,
is about twice the size of the positive and negative classes, which
contain 2,543 and 1,877 tweets respectively. The problem with unequal
classes is that the classifier tries to increase the overall accuracy of
the system by increasing the accuracy of the majority class, even if that
comes at the cost of a decrease in accuracy for the minority classes.
That is the very reason why we report significantly higher accuracies for
the objective class than for the positive or negative classes. To
overcome this problem and have the classifier exhibit no bias towards any
of the classes, it is necessary to label more data (tweets) so that all
three of our classes are almost equal in size.
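The size of the imbalance is easy to quantify from the class counts given above. The short sketch below (variable names are illustrative) computes each class's share, the accuracy a classifier would get by always predicting the majority class, and how many extra labelled tweets each minority class would need to match the objective class.

```python
class_sizes = {"objective": 4543, "positive": 2543, "negative": 1877}

total = sum(class_sizes.values())
shares = {name: size / total for name, size in class_sizes.items()}

# A classifier that always answers with the largest class already reaches
# this accuracy, which is why overall accuracy alone can be misleading.
majority_class = max(class_sizes, key=class_sizes.get)
majority_baseline_accuracy = shares[majority_class]

# Extra labelled tweets each class needs to match the majority class.
extra_needed = {
    name: class_sizes[majority_class] - size
    for name, size in class_sizes.items()
}
```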
In this research we are focusing on general sentiment analysis. There is
potential for work in the field of sentiment analysis with partially
known context. For example, we noticed that users generally use our
website for specific types of keywords, which can be divided into a
couple of distinct classes, namely: politics/politicians, celebrities,
products/brands, sports/sportsmen, and media/movies/music. So we can
attempt to perform separate sentiment analysis on tweets that belong to
only one of these classes (i.e. the training data would not be general
but specific to one of these categories) and compare the results with
those we get if we apply general sentiment analysis instead.
Last but not least, we can attempt to model human confidence in our
system. For example, if we have 5 human labellers labelling each tweet,
we can plot the tweet in the 2-dimensional objectivity/subjectivity and
positivity/negativity plane while differentiating between tweets in which
all 5 labels agree, only 4 agree, only 3 agree or no majority vote is
reached. We could develop a custom cost function for arriving at
optimized class boundaries such that the highest weight is given to those
tweets in which all 5 labels agree, and as the number of agreements
decreases, so do the weights assigned. In this way the effects of human
confidence can be visualized in sentiment analysis.
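A very rough sketch of such an agreement-weighted cost is given below. The weighting scheme (fraction of labellers who agree), the tie handling and the function names are all assumptions made for illustration; the text above does not fix any of them.

```python
from collections import Counter


def agreement_weight(labels):
    """Return (majority label, fraction of labellers who chose it).

    Assumption: if no majority exists, the first most common label is used
    and simply receives a low weight.
    """
    top_label, top_count = Counter(labels).most_common(1)[0]
    return top_label, top_count / len(labels)


def weighted_error(examples, predict):
    """Error rate in which each tweet counts in proportion to agreement.

    examples: list of (features, [label from each human labeller]) pairs.
    predict:  function mapping features to a predicted label.
    """
    total_weight = 0.0
    weighted_mistakes = 0.0
    for features, labels in examples:
        gold, weight = agreement_weight(labels)
        total_weight += weight
        if predict(features) != gold:
            weighted_mistakes += weight
    return weighted_mistakes / total_weight


# Hypothetical example: one unanimous tweet and one with 3-of-5 agreement.
examples = [
    ((0.9,), ["pos"] * 5),
    ((0.1,), ["neg", "neg", "neg", "pos", "obj"]),
]
error = weighted_error(examples, lambda features: "pos")
```

A mistake on the unanimous tweet would cost more than the mistake on the 3-of-5 tweet, which is the behaviour the paragraph above asks for.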
Chapter 7: REFERENCES
[1] Albert Bifet and Eibe Frank. Sentiment Knowledge Discovery in Twitter
Streaming Data. Discovery Science, Lecture Notes in Computer Science,
2010, Volume 6332/2010, 1-15, DOI: 10.1007/978-3-642-16184-1_1
[2] Alec Go, Richa Bhayani and Lei Huang. Twitter Sentiment
Classification using Distant Supervision. Project Technical Report,
Stanford University, 2009.
[3] Alexander Pak and Patrick Paroubek. Twitter as a Corpus for Sentiment
Analysis and Opinion Mining. In Proceedings of international conference
on Language Resources and Evaluation (LREC), 2010.
[4] Andranik Tumasjan, Timm O. Sprenger, Philipp G. Sandner and Isabell
M. Welpe. Predicting Elections with Twitter: What 140 Characters Reveal
about Political Sentiment. In Proceedings of AAAI Conference on Weblogs
and Social Media (ICWSM), 2010.
[5] Bo Pang, Lillian Lee and Shivakumar Vaithyanathan. Thumbs up?
Sentiment Classification using Machine Learning Techniques. In
Proceedings of the Conference on Empirical Methods in Natural Language
Processing (EMNLP), 2002.
[6] Chenhao Tan, Lillian Lee, Jie Tang, Long Jiang, Ming Zhou and Ping
Li. User Level Sentiment Analysis Incorporating Social Networks. In
Proceedings of ACM Special Interest Group on Knowledge Discovery and Data
Mining (SIGKDD), 2011.
[7] Efthymios Kouloumpis, Theresa Wilson and Johanna Moore. Twitter
Sentiment Analysis: The Good the Bad and the OMG! In Proceedings of AAAI
Conference on Weblogs and Social Media (ICWSM), 2011.
[8] Hatzivassiloglou, V., & McKeown, K. R. Predicting the semantic
orientation of adjectives. In Proceedings of the 35th Annual Meeting of
the ACL and the 8th Conference of the European Chapter of the ACL, 1997.
[9] Johan Bollen, Alberto Pepe and Huina Mao. Modelling Public Mood and
Emotion: Twitter Sentiment and socio-economic phenomena. In Proceedings
of AAAI Conference on Weblogs and Social Media (ICWSM), 2011.
[10] Luciano Barbosa and Junlan Feng. Robust Sentiment Detection on
Twitter from Biased and Noisy Data. In Proceedings of the international
conference on Computational Linguistics (COLING), 2010.
[11] Peter D. Turney. Thumbs Up or Thumbs Down? Semantic Orientation
Applied to Unsupervised Classification of Reviews. In Proceedings of the
Annual Meeting of the Association of Computational Linguistics (ACL),
2002.
[12] Rudy Prabowo and Mike Thelwall. Sentiment Analysis: A Combined
Approach. Journal of Informetrics, Volume 3, Issue 2, April 2009, Pages
143-157, 2009.
[13] Samuel Brody and Nicholas Diakopoulos.
Cooooooooooooooollllllllllllll!!!!!!!!!!!!!! Using Word Lengthening to
Detect Sentiment in Microblogs. In Proceedings of Empirical Methods on
Natural Language Processing (EMNLP), 2011.
[14] Soo-Min Kim and Eduard Hovy. Determining the Sentiment of Opinions.
In Proceedings of International Conference on Computational Linguistics
(ICCL), 2004.
[15] Stefano Baccianella, Andrea Esuli, Fabrizio Sebastiani. SENTIWORDNET
3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion
Mining. In Proceedings of international conference on Language Resources
and Evaluation (LREC), 2010.
[16] Theresa Wilson, Janyce Wiebe and Paul Hoffmann. Recognizing
Contextual Polarity in Phrase-Level Sentiment Analysis. In the Annual
Meeting of Association of Computational Linguistics: Human Language
Technologies (ACL-HLT), 2005.
[17] Ian H. Witten, Eibe Frank & Mark A. Hall. Data Mining: Practical
Machine Learning Tools and Techniques.
[18] Richard O. Duda, Peter E. Hart & David G. Stork. Pattern
Classification.
[19] Steven Bird, Ewan Klein & Edward Loper. Natural Language Processing
with Python.
[20] Ben Parr. Twitter Has 100 Million Monthly Active Users; 50% Log In
Everyday.
<http://mashable.com/2011/10/17/twitter-costolo-stats/>
[21] Google App Engine
<https://developers.google.com/appengine/>
[22] Google Chart API
<https://developers.google.com/chart/>
[23] Jinja2: Templating Language for Python
<http://jinja.pocoo.org/>
[24] Maggie Shields, Technology Reporter, BBC News. Twitter co-founder
Jack Dorsey rejoins company.
<http://www.bbc.co.uk/news/business-12889048>
[25] Multi Perspective Question Answering (MPQA) Online Lexicon
<http://www.cs.pitt.edu/mpqa/subj_lexicon.html>
[26] TweetStream: Simple Twitter Streaming API Access
<http://pypi.python.org/pypi/tweetstream>
[27] Twitter REST API
<https://dev.twitter.com/docs/api>
[28] Twitter Sentiment, an online application performing sentiment
classification of Twitter.
<http://twittersentiment.appspot.com/>