Big Data + Sentiment Analysis = Awesome

Adel Rahimi
Sharif University of Technology

TABLE OF CONTENTS
• Introduction to big data and its usage
• Sentiment analysis and its use in NLP
• How to big data?!
• Tools to use
• Further study

WHAT IS BIG DATA?
• Big data is a term for the storage and use of
vast amounts of data, either structured or
unstructured, typically on distributed or
cloud infrastructure.

USAGES OF BIG DATA
• Internet Search
• Finance
• Business Informatics

SPECIFICATIONS OF BIG DATA
• Volume: big data doesn't sample; it just
observes and tracks what happens
• Velocity: big data is often available in real-time
• Variety: big data draws from text, images, audio,
video; plus it completes missing pieces through
data fusion
• Machine learning: big data often doesn't ask
why and simply detects patterns
• Digital footprint: big data is often a cost-free
byproduct of digital interaction

COMPANIES WHO USE BIG DATA
• eBay.com uses two data warehouses at 7.5 petabytes and 40PB as well as a 40PB
Hadoop cluster for search, consumer recommendations, and merchandising.
• Amazon.com handles millions of back-end operations every day, as well as
queries from more than half a million third-party sellers. The core technology
that keeps Amazon running is Linux-based and as of 2005 they had the world's
three largest Linux databases, with capacities of 7.8 TB, 18.5 TB, and 24.7 TB.
• Facebook handles 50 billion photos from its user base.
• Google was handling roughly 100 billion searches per month as of August 2012.
• Oracle NoSQL Database has been tested past the 1M ops/sec mark with 8
shards and went on to hit 1.2M ops/sec with 10 shards.

APPLICATIONS OF BIG DATA (II)
• Fashion Trends 2016: Google Data
Shows What Shoppers Want
In April, searches for bomber jackets grew
297% YoY in the U.K. and 612% YoY in the
U.S.

APPLICATIONS OF BIG DATA (III)
• IN CASE YOU WERE WONDERING WHAT EXACTLY A
“BOMBER JACKET” IS!

ADVANTAGES OF BIG DATA
• Cheap and mass storage
• Faster processors
• Cheap open source platforms such as ‘Hadoop’
• Cloud computing is a huge advancement in the field
when dealing with Big Data
• Parallel processing, large grid environments and high
connectivity

HOW WILL BIG DATA HELP US?
• Predict what customers want before they ask for it
• Get customers excited about their own data
• Improve customer service interactions
• Identify customer pain points and solve them
• Reduce health care costs and improve treatment

WHAT IS SENTIMENT ANALYSIS?
• The movie was awesome 👍
• The movie was awful 👎
• The movie was long 😕

SENTIMENT ANALYSIS WORKFLOW
Input → Tokenization → Stop-word filtering → Negation handling → Stemming → Classification → Sentiment analysis
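
The workflow above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: the stop-word list, the negation words, the suffix-stripping stemmer, and the tiny polarity lexicon are all hypothetical stand-ins for real resources.

```python
import re

# Hypothetical toy resources; a real system would use full lists and a real stemmer.
STOP_WORDS = {"the", "was", "a", "an", "is", "it", "this", "and"}
NEGATIONS = {"not", "no", "never"}
LEXICON = {"awesome": 1, "awful": -1, "enjoy": 1, "disappoint": -1}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens):
    """Drop common words that carry no sentiment."""
    return [t for t in tokens if t not in STOP_WORDS]

def handle_negation(tokens):
    """Drop each negation word and mark the token that follows it as negated."""
    marked, negate = [], False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
            continue
        marked.append((token, negate))
        negate = False
    return marked

def stem(word):
    """Toy stemmer: strip one common suffix if at least 3 letters remain."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def classify(text):
    """Sum lexicon scores, flipping the sign of negated tokens."""
    tokens = handle_negation(remove_stop_words(tokenize(text)))
    score = sum((-1 if neg else 1) * LEXICON.get(stem(w), 0) for w, neg in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With this toy lexicon, `classify("The movie was awesome")` is positive, `classify("The movie was awful")` is negative, and `classify("The movie was long")` is neutral because "long" carries no score.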

TWITTER SENTIMENT ANALYSIS WORKFLOW
Tweet → Tokenization → Part-of-speech tagging → WordNet WSD (word sense disambiguation) → SentiWordNet interpretation → Sentiment orientation → Tweet classified

PREPROCESSING
• Removing non-English Tweets
• Replacing Emoticons by their polarity
• Remove URL, Target Mentions, Hashtags, Numbers
• Replace Negative Mentions
• Replace Sequences of Repeated Characters, e.g.
‘coooooooool’ by ‘coool’
• Remove Nouns and Prepositions
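
Several of the preprocessing steps above are plain text substitutions and can be sketched with regular expressions. The emoticon table and the rule that rewrites words ending in "n't" as "not" are assumptions made for illustration; language filtering and noun/preposition removal are left out because they need a language detector and a part-of-speech tagger.

```python
import re

# Hypothetical emoticon-to-polarity table; a real system would use a longer list.
EMOTICONS = {":)": "positive", ":-)": "positive", ":D": "positive",
             ":(": "negative", ":-(": "negative"}

def preprocess(tweet):
    """Apply the regex-friendly preprocessing steps to one tweet."""
    tweet = re.sub(r"https?://\S+", " ", tweet)        # remove URLs
    tweet = re.sub(r"[@#]\w+", " ", tweet)             # remove mentions and hashtags
    for emoticon, polarity in EMOTICONS.items():       # emoticons -> polarity word
        tweet = tweet.replace(emoticon, f" {polarity} ")
    tweet = re.sub(r"\b\d+\b", " ", tweet)             # remove standalone numbers
    tweet = re.sub(r"\b\w+n['’]t\b", "not", tweet)     # assumed rule: can't -> not
    tweet = re.sub(r"(.)\1{3,}", r"\1\1\1", tweet)     # coooooooool -> coool
    return " ".join(tweet.split())                     # normalize whitespace
```

For example, `preprocess("coooooooool http://t.co/abc #fun 42")` leaves only `coool`.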

EXAMPLE OF TWITTER SENTIMENT ANALYSIS
Tweet: @BonksMullet @chet_sellers This is very accurate and hilarious. Well done :)
Synset (after WSD): accurate#1: conforming exactly or almost exactly to fact or to a standard or performing with total accuracy; "an accurate reproduction"; "the accounting was accurate"; "accurate measurements"; "an accurate scale"
SentiWordNet score:
Pos_score  Neg_score  Obj_score
0.5        0          0.5
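
The score row above can be turned into a sentiment orientation with a small helper. This is a sketch under one assumption: the function name and the rule "compare positive and negative scores" are illustrative, not the talk's actual interpretation step. The objectivity score follows SentiWordNet's definition, Obj = 1 - (Pos + Neg).

```python
def orientation(pos_score, neg_score):
    """Return (label, obj_score) for one synset's SentiWordNet scores."""
    obj_score = 1.0 - (pos_score + neg_score)  # SentiWordNet: the three scores sum to 1
    if pos_score > neg_score:
        label = "positive"
    elif neg_score > pos_score:
        label = "negative"
    else:
        label = "neutral"
    return label, obj_score

# Scores for accurate#1 taken from the example above.
print(orientation(0.5, 0.0))
```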

WORDNET
WordNet is a dictionary-like lexical database of English
that groups words into sets of synonyms (synsets).
The Persian equivalent of WordNet is FarsNet,
available from Shahid Beheshti University.
http://dadegan.ir/catalog/farsnet

SENTIWORDNET
• SentiWordNet is an extension of WordNet that
assigns each synset a positivity, negativity, and
objectivity score.

AFINN
• AFINN is a list of English words rated by their
sentiment, from -5 (negative) to +5 (positive).
• AFINN-111 contains 2477 words and phrases.
• Examples:
Abilities 2
Ability 2
Aboard 1
Absentee -1
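
A lexicon like AFINN can be used by summing the ratings of the words in a sentence. The sketch below assumes the tab-separated "word, score" layout of the AFINN-111 file and uses only the four sample entries listed above; the function names are hypothetical.

```python
# Four sample entries from the slide, in tab-separated "word<TAB>score" form.
AFINN_SAMPLE = "abilities\t2\nability\t2\naboard\t1\nabsentee\t-1\n"

def load_afinn(text):
    """Parse tab-separated 'word<TAB>score' lines into a dict."""
    scores = {}
    for line in text.splitlines():
        if line.strip():
            word, value = line.rsplit("\t", 1)
            scores[word] = int(value)
    return scores

def afinn_score(sentence, scores):
    """Sum the ratings of known words; unknown words count as 0."""
    return sum(scores.get(word, 0) for word in sentence.lower().split())

scores = load_afinn(AFINN_SAMPLE)
print(afinn_score("Absentee with ability", scores))  # -1 + 0 + 2 = 1
```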

APACHE NUTCH
• We use Apache Nutch as a web crawler because it is
fast and can scale out by running on a Hadoop cluster.

ELASTICSEARCH DATABASE
• Elasticsearch is a fast, distributed search and analytics
engine; using it to index and query the collected text
helps speed up the process.

APACHE HADOOP
• Hadoop uses the MapReduce programming model for
distributed batch processing of large datasets, which is
scalable and fault-tolerant.
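
The MapReduce model can be illustrated in plain Python with a word count over tweets. This is a single-process sketch of the map, shuffle, and reduce phases, not Hadoop's API; on a real cluster the framework runs these phases in parallel across machines, and all function names here are hypothetical.

```python
from collections import defaultdict

def map_phase(tweet):
    """Map: emit a (word, 1) pair for every word in one tweet."""
    for word in tweet.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a total."""
    return key, sum(values)

def word_count(tweets):
    """Run map, shuffle, and reduce in sequence over a list of tweets."""
    pairs = (pair for tweet in tweets for pair in map_phase(tweet))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_count(["good movie", "good day"]))
```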

APACHE SPARK
• Apache Spark is a fast and general engine for big data
processing, with built-in modules for streaming, SQL,
machine learning and graph processing.

APACHE CASSANDRA
• The Apache Cassandra database is the right choice
when you need scalability and high availability without
compromising performance.

REFERENCES AND FURTHER STUDY
• What Is Big Data? | SAS. (n.d.). Retrieved from
http://www.sas.com/en_us
• 5 ways companies are using big data to help their
customers | VentureBeat | Business | by
brianabillingham. (n.d.). Retrieved from
http://venturebeat.com/2014/04/21/5-ways-big-data-is-helping-companies-help-their-customers/
• SentiWordNet 3.0: An Enhanced Lexical Resource for
Sentiment Analysis and Opinion Mining -
http://sentiwordnet.isti.cnr.it/
• https://github.com/linkTDP/BigDataAnalysis_TweetSentiment

REFERENCES AND FURTHER STUDY (II)
• AFINN-111 -
http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010
• Reviews Classification Using SentiWordNet Lexicon -
http://www.academia.edu/1336655/Reviews_Classification_Using_SentiWordNet_Lexicon
• Using SentiWordNet and Sentiment Analysis for
Detecting Radical Content on Web Forums -
http://www.jeremyellman.com/jeremy_unn/pdfs/1_____Chalothorn_Ellman_SKIMA_2012.pdf
• From tweets to polls: Linking text sentiment to public
opinion time series -
http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/viewFile/1536/1842
