Different types of analysis Sentiment analysis Stopword removal Next meetings
Big Data and Automated Content Analysis
Week 4 – Monday
»Sentiment Analysis«
Damian Trilling
d.c.trilling@uva.nl
@damian0604
www.damiantrilling.net
Afdeling Communicatiewetenschap
Universiteit van Amsterdam
18 April 2016
Big Data and Automated Content Analysis Damian Trilling
Today
1 Different types of analysis
What can we do?
Systematizing analytical approaches
2 Data analysis 1: Sentiment analysis
What is it?
Bag-of-words approaches
Advanced approaches
A sentiment analysis tailored to your needs!
3 NLP-preview: Stopword removal
Natural language processing
A simple algorithm
4 Take-home message, next meetings, & exam
What we already can do
with regard to data collection:
• query a (JSON-based) API (GoogleBooks, Twitter)
• handle CSV files
• handle JSON files
with regard to analysis:
Not much. We counted some frequencies and calculated some
averages.
What can we do?
Data analysis 0: Overview
What can we do?
What do you think? What are interesting methods to
analyze large data sets (e.g., social media data)? What
questions can they answer?
What can we do?
What else can we do?
For example
• sentiment analysis
• automated coding with regular expressions
• natural language processing
• supervised and unsupervised machine learning
• network analysis
Or, ideally, a combination of these techniques.
Systematizing analytical approaches
Overview
Systematizing analytical approaches
Systematizing analytical approaches
Systematizing analytical approaches
Taking the example of Twitter:
Analyzing the structure
• Number of Tweets over time
• singleton/retweet ratio
• Distribution of number of Tweets per user
• Interaction networks
⇒ Focus on the amount of content and on who interacts with
whom, not on what is said
Bruns, A., & Stieglitz, S. (2013). Toward more systematic Twitter analysis: metrics for tweeting activities.
International Journal of Social Research Methodology. doi:10.1080/13645579.2012.756095
Systematizing analytical approaches
Systematizing analytical approaches
Taking the example of Twitter:
Analyzing the content
• Sentiment analysis
• Word frequencies, search strings
• Co-word analysis (⇒frames)
⇒ Focus on what is said
⇒ Which approach is more interesting depends on your
research question!
What is it?
Data analysis 1:
Sentiment analysis
What is it?
What is sentiment analysis?
Extracting subjective information from texts
• the author’s attitude towards the topic of the text
• polarity: negative—positive
• subjectivity: neutral—subjective *
• advanced approaches: different emotions
* Less sophisticated approaches do not treat this as a separate dimension but
simply calculate objectivity = 1 − (negativity + positivity)
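A small illustration of the footnote formula. The slide does not say how negativity and positivity are measured; this sketch assumes they are the shares of words found in a negative and a positive word list. The function name and the word lists are hypothetical.

```python
def objectivity(text, positive_words, negative_words):
    """objectivity = 1 - (negativity + positivity), with shares of words as inputs."""
    words = text.lower().split()
    if not words:
        return 1.0  # assumption: an empty text counts as fully objective
    positivity = sum(w in positive_words for w in words) / len(words)
    negativity = sum(w in negative_words for w in words) / len(words)
    return 1 - (negativity + positivity)


# One of five words is positive and one is negative, so roughly 0.6 remains.
print(objectivity("great service but bad coffee", ["great"], ["bad"]))
```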
What is it?
Applications
Who uses it?
• Companies
• especially for Web Analytics
• Social Scientists
• applications in data journalism, politics, . . .
Many references to examples in Mostafa (2013).
⇒ Cases in which you have a huge amount of data or real-time
data and you want to get an idea of the tone.
Mostafa, M. M. (2013). More than words: Social networks’ text mining for consumer brand sentiments. Expert
Systems with Applications, 40(10), 4241–4251. doi:10.1016/j.eswa.2013.01.019
What is it?
Example
    >>> from pattern.nl import sentiment
    >>> sentiment("Great service by @NSHighspeed")
    (0.8, 0.75)
    >>> sentiment("Bad service by @NSHighspeed")
    (-0.6166666666666667, 0.6666666666666666)
The result is a tuple (polarity, subjectivity) with
−1 ≤ polarity ≤ +1
0 ≤ subjectivity ≤ +1
This is the module pattern.nl, available for Python 2 only.
De Smedt, T., & Daelemans W. (2012). Pattern for Python. Journal of Machine Learning Research, 13, 2063-2067.
Bag-of-words approaches
Data analysis 1: Sentiment analysis
Bag-of-words approaches
How does it work?
• We take each word of a text and check whether it is positive or
negative.
• Simplest way: compare it with a list of negative words and
with a list of positive words (that is what Mostafa (2013) did)
• More advanced: look up a subjectivity score from a table
• e.g., add up the scores and average them.
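The table lookup with averaging could be sketched as follows. The score table and the function name are made up for illustration; real tables contain thousands of entries.

```python
# Hypothetical score table: word -> score between -1 and +1
scores = {"great": 0.8, "good": 0.6, "bad": -0.7, "terrible": -1.0}


def average_score(text, table):
    """Add up the scores of all words found in the table and average them."""
    found = [table[word] for word in text.lower().split() if word in table]
    if not found:
        return None  # no word from the table occurs in the text
    return sum(found) / len(found)


print(average_score("Great food but terrible service", scores))
print(average_score("The train left on time", scores))
```

Returning None instead of 0 keeps "no information" apart from "balanced sentiment".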
Bag-of-words approaches
How to do this
If you were to run an analysis like the one by Mostafa (2013), how
could you do this?
(given a string tekst that you want to analyze and two lists of strings with negative
and positive words, lijstpos=["great","fantastic",...,"perfect"] and
lijstneg; the names are Dutch: tekst = text, woord = word, lijst = list)
    sentiment = 0
    for woord in tekst.split():          # look at one word at a time
        if woord in lijstpos:
            sentiment = sentiment + 1    # same as sentiment += 1
        elif woord in lijstneg:
            sentiment = sentiment - 1    # same as sentiment -= 1
    print(sentiment)
Bag-of-words approaches
Do we need to have the lists in our program itself?
No.
You could have them in a separate text file, one word per line, and then
read that file directly into a list.
    with open("filewithonepositivewordperline.txt") as f:
        lijstpos = f.read().splitlines()
    with open("filewithonenegativewordperline.txt") as f:
        lijstneg = f.read().splitlines()
Bag-of-words approaches
More advanced versions
• CSV files or similar tables with weights
• Or some kind of dict?
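Both ideas can be combined: read a CSV table of weights into a dict and look each word up there. This is only a sketch; the column names word and weight and all values are assumptions, and io.StringIO stands in for a real file so that the example is self-contained.

```python
import csv
import io

# Stand-in for a real file such as open("weights.csv"); the contents are made up.
csvfile = io.StringIO("word,weight\ngreat,2\ngood,1\nbad,-1\nawful,-2\n")

weights = {}
for row in csv.DictReader(csvfile):
    weights[row["word"]] = float(row["weight"])

tekst = "good coffee but awful service"
# Words that are not in the dict contribute 0.
sentiment = sum(weights.get(woord, 0) for woord in tekst.split())
print(sentiment)  # -1.0
```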
Bag-of-words approaches
Mostafa 2013: Interpreting the output
Your thoughts?
• each word counts equally (1)
• many tweets contain no words from the list. What does this mean?
• Ways to improve BOW approaches?
Bag-of-words approaches
Bag-of-words approaches
pro
• easy to implement
• easy to modify:
• add or remove words
• make new lists for other languages, other categories (than
positive/negative), . . .
• easy to understand (transparency, reproducibility)
e.g., Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar
verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande
politici en politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.
con
• simplistic assumptions
• e.g., intensifiers cannot be interpreted ("really" in "really
good" or "really bad")
• or, even more important, negations.
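These limitations are easy to reproduce with a word-list counter. In the sketch below (toy word lists, hypothetical function name), all three texts receive the same score, although "really good" is stronger and "not good" means the opposite.

```python
lijstpos = ["good", "great"]
lijstneg = ["bad", "awful"]


def bow_sentiment(tekst):
    """Count positive words as +1 and negative words as -1."""
    sentiment = 0
    for woord in tekst.lower().split():
        if woord in lijstpos:
            sentiment += 1
        elif woord in lijstneg:
            sentiment -= 1
    return sentiment


for tekst in ["good", "really good", "not good"]:
    print(tekst, bow_sentiment(tekst))  # each line ends in 1
```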
Advanced approaches
Data analysis 1: Sentiment analysis
Advanced approaches
Advanced approaches
Improving the BOW approach
Example: The SentiStrength algorithm
• scores from −5 to −1 and from +1 to +5
• spelling correction
• "booster word list" for strengthening/weakening the effect of
the following word
• interpreting repeated letters ("baaaaaad"), CAPITALS and !!!
• idioms
• negation
• . . .
Thelwall, M., Buckley, K., & Paltoglou, G. (2012). Sentiment strength detection for the social Web. Journal of the
American Society for Information Science and Technology, 63(1), 163-173.
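To get a feeling for such rules, here is a toy scorer with a booster list, negation and repeated letters. It is not the SentiStrength implementation: every word list, weight and rule below is an assumption made for illustration.

```python
import re

# Toy lexicons; every word and weight here is made up for illustration.
scores = {"good": 2, "bad": -2}
boosters = {"really": 1, "slightly": -1}  # change the strength of the next scored word
negations = {"not", "never"}


def squeeze(word):
    """Map 'baaaad' or 'goood' back to a lexicon word, if possible."""
    for keep in (r"\1\1", r"\1"):  # try keeping two letters, then one
        candidate = re.sub(r"(.)\1{2,}", keep, word)
        if candidate in scores:
            return candidate
    return word


def score(text):
    total, boost, negate = 0, 0, False
    for raw in text.lower().split():
        word = raw.strip(".,!?")
        normal = squeeze(word)
        if normal in boosters:
            boost += boosters[normal]
        elif normal in negations:
            negate = True
        elif normal in scores:
            value = scores[normal]
            # Repeated letters count as extra emphasis
            strength = boost + (1 if normal != word else 0)
            # A booster changes the strength, not the direction
            value += strength if value > 0 else -strength
            total += -value if negate else value
            boost, negate = 0, False
    return total


for text in ["good", "really good", "not good", "baaaad", "slightly bad"]:
    print(text, score(text))
```

With these toy lists the five texts score 2, 3, −2, −3 and −1.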
Advanced approaches
Advanced approaches
Take the structure of a text into account
• Try to apply linguistic concepts to identify sentence structure
• can identify negations
• can interpret intensifiers
Advanced approaches
Example
    >>> from pattern.nl import sentiment
    >>> sentiment("Great service by @NSHighspeed")
    (0.8, 0.75)
    >>> sentiment("Really")
    (0.0, 1.0)
    >>> sentiment("Really Great service by @NSHighspeed")
    (1.0, 1.0)
The result is a tuple (polarity, subjectivity) with
−1 ≤ polarity ≤ +1
0 ≤ subjectivity ≤ +1
Unlike in pure bag-of-words approaches, here, the overall sentiment is not just
the sum or the average of its parts!
De Smedt, T., & Daelemans W. (2012). Pattern for Python. Journal of Machine Learning Research, 13, 2063-2067.
Advanced approaches
Advanced approaches
pro
• understand intensifiers or negation
• thus: higher accuracy
con
• Black box? Or do we understand the algorithm?
• Difficult to adapt to own needs
• Are the results really much better?
A sentiment analysis tailored to your needs!
Data analysis 1: Sentiment analysis
A sentiment analysis tailored to your needs!
A sentiment analysis tailored to your needs!
A sentiment analysis tailored to your needs!
Identifying suicidal texts
• Bag-of-words approach with a very specific dictionary
• added negation
• added regular expression search for key phrases
• Very specific design requirements: False positives are OK,
false negatives not!
Huang, Y.-P., Goh, T., & Liew, C.L. (2007). Hunting suicide notes in web 2.0 – preliminary findings. Ninth IEEE
International Symposium on Multimedia. Retrieved from
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4476021
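The combination of a dictionary with a regular expression search can be sketched as below. The terms are neutral placeholders from an unrelated domain, not the dictionary of Huang et al. (2007); negation handling is left out, and the function name is hypothetical. Flagging a text as soon as anything matches is one way to accept false positives in order to avoid false negatives.

```python
import re

# Neutral placeholder terms; not the dictionary used by Huang et al. (2007).
dictionary = {"refund", "broken", "complaint"}
key_phrases = [
    re.compile(r"\bnever\s+(buy|order)(ing)?\s+again\b"),
    re.compile(r"\bwant\s+my\s+money\s+back\b"),
]


def flag(text):
    """Return True as soon as any dictionary word or key phrase matches."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z']+", lowered))
    if words & dictionary:
        return True
    return any(pattern.search(lowered) for pattern in key_phrases)


print(flag("I will never order again!"))
print(flag("Nice weather today"))
```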
A sentiment analysis tailored to your needs!
Even this relatively simple approach seems to work
satisfactorily, but if 106 scientists from 24 competing teams (!)
work on it, they can
group suicide notes by these characteristics:
• swear
• family
• friend
• positive emotion
• negative emotion
• anxiety
• anger
• sad
• cognitive process
• biology
• sexual
• ingestion
• religion
• death
Pestian, J.P.; Matykiewicz, P., Linn-Gust, M., South, B., Uzuner, O., Wiebe, J., Cohen, K.B., Hurdle, J., & Brew,
C. (2012). Sentiment analysis of suicide notes: A shared task. Biomedical Informatics Insights, 5(1), p. 3-16.
Retrieved from http://europepmc.org/articles/PMC3299408?pdf=render
A sentiment analysis tailored to your needs!
Sidenote
An alternative state-of-the-art approach:
Use supervised machine learning
• Instead of defining rules, hand-code (“annotate”) the
sentiment of some tweets manually and let the computer find
out which words or characters (“features”) predict sentiment
• Then use this model to predict sentiment for other tweets
• Essentially the same as what you have known since the second year
of your Bachelor: regression analysis (but now with sentiment
as the DV and word occurrences as the IVs)
• ⇒ week 8
Gonzalez-Bailon, S., & Paltoglou, G. (2015). Signals of public opinion in online communication: A comparison of
methods and data sources. The ANNALS of the American Academy of Political and Social Science, 659(1),
95–107.
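The two steps (learn from hand-coded tweets, then predict new ones) can be sketched without any external library. Note that this sketch swaps in a small Naive Bayes classifier instead of a regression model, and that the training tweets and labels are made up. It only illustrates the workflow.

```python
import math
from collections import Counter

# Made-up, hand-coded ("annotated") training tweets.
training = [
    ("great service today", "pos"),
    ("fantastic train ride", "pos"),
    ("what a great trip", "pos"),
    ("bad service again", "neg"),
    ("terrible delay today", "neg"),
    ("what a bad trip", "neg"),
]

# Training: count how often each word (feature) occurs per label.
word_counts = {"pos": Counter(), "neg": Counter()}
label_counts = Counter()
for text, label in training:
    label_counts[label] += 1
    word_counts[label].update(text.split())
vocabulary = set(word_counts["pos"]) | set(word_counts["neg"])


def predict(text):
    """Return the label with the highest log probability for the text."""
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / len(training))
        for word in text.split():
            if word in vocabulary:  # words never seen in training are ignored
                # Add-one smoothing avoids log(0) for words unseen with this label
                p = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
                score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


print(predict("great ride"))
print(predict("terrible service"))
```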
Natural Language Processing preview:
Stopword removal
Why now? Because the logic of the algorithm closely resembles
that of our first simple sentiment analysis
Natural language processing
Stopword removal: What and why?
Why remove stopwords?
• If we want to identify key terms (e.g., by means of a word
count), we are not interested in them
• If we want to calculate document similarity, shared stopwords
might inflate it
• If we want to make a word co-occurrence graph, irrelevant
information will dominate the picture
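The first point is easy to see with a word count. In this sketch (made-up sentence, a short stopword list chosen for illustration), the most frequent words are stopwords until they are filtered out.

```python
from collections import Counter

text = "the cat and the dog and the bird saw a fish"
stopwords = ["and", "the", "a", "or", "he", "she", "him", "her"]

words = text.split()
# Without filtering, "the" and "and" lead the ranking.
print(Counter(words).most_common(2))

# After filtering, only content words remain.
filtered = [word for word in words if word not in stopwords]
print(Counter(filtered).most_common(2))
```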
A simple algorithm
Stopword removal: How
    testo = 'He gives her a beer and a cigarette.'
    testonuovo = ""
    stopwords = ['and', 'the', 'a', 'or', 'he', 'she', 'him', 'her']
    for verbo in testo.split():
        if verbo not in stopwords:
            testonuovo = testonuovo + verbo + " "
What do we get if we do:
    print(testonuovo)
Can you explain the algorithm?
A simple algorithm
We get:
    >>> print(testonuovo)
    He gives beer cigarette.
Why is "He" still in there?
How can we fix this?
A simple algorithm
Stopword removal
    testo = 'He gives her a beer and a cigarette.'
    testonuovo = ""
    stopwords = ['and', 'the', 'a', 'or', 'he', 'she', 'him', 'her']
    for verbo in testo.split():
        # compare the lowercased word, so that "He" matches "he"
        if verbo.lower() not in stopwords:
            testonuovo = testonuovo + verbo + " "
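The loop leaves a trailing space in the result, and a stopword followed by punctuation (such as "her.") would not be recognized. A more compact variant, as a sketch: the use of string.punctuation and of a set instead of a list are choices made here, not part of the slides.

```python
import string

testo = "He gives her a beer and a cigarette."
stopwords = {"and", "the", "a", "or", "he", "she", "him", "her"}

# Compare the lowercased word without surrounding punctuation,
# but keep the original spelling of the words that remain.
kept = [verbo for verbo in testo.split()
        if verbo.lower().strip(string.punctuation) not in stopwords]
testonuovo = " ".join(kept)
print(testonuovo)  # gives beer cigarette.
```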
Take-home message
Mid-term take-home exam
Next meetings
Take-home messages
What you should be familiar with:
• You should have completely understood last week’s exercise.
Re-read it if necessary.
• Approaches to the analysis (e.g., structure vs. content)
• Types of sentiment analysis, application areas, pros and cons
Mid-term take-home exam
Week 5: Monday, 25 April, to Sunday, 1 May
• You get the exam on Monday at the end of the meeting
• Answers have to be handed in no later than Sunday, 23.59
• 30% of final grade
• 3 questions:
1 Literature question: Different methods (“Explain how. . . is
done”) and/or epistemological or theoretical implications
(“What does this mean for social-scientific research?”)
2 Empirical question (conceptual)
3 Empirical question (actual programming task)
If you fully understood all exercises until now, it shouldn’t be
difficult and won’t take too long. But give yourself a lot of
buffer time!!!
Next meetings
Wednesday, 20 April
We work together on the examples in Chapter 5. You can bring
your own data (you will probably learn more!), but then already
think about how to write some script to read the data (as we did
last week and as described in Chapter 4).
Big Data and Automated Content Analysis Damian Trilling

More Related Content

What's hot

From Search to Predictions in Tagged Information Spaces
From Search to Predictions in Tagged Information SpacesFrom Search to Predictions in Tagged Information Spaces
From Search to Predictions in Tagged Information Spaces
Christoph Trattner
 
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Rich Heimann
 
Recommending Tags with a Model of Human Categorization
Recommending Tags with a Model of Human CategorizationRecommending Tags with a Model of Human Categorization
Recommending Tags with a Model of Human Categorization
Christoph Trattner
 
Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?
Rich Heimann
 
Tutorial Data Management and workflows
Tutorial Data Management and workflowsTutorial Data Management and workflows
Tutorial Data Management and workflows
SSSW
 
Machine Learning using Big data
Machine Learning using Big data Machine Learning using Big data
Machine Learning using Big data
Vaibhav Kurkute
 
Recommending Items in Social Tagging Systems Using Tag and Time Information
Recommending Items in Social Tagging Systems Using Tag and Time InformationRecommending Items in Social Tagging Systems Using Tag and Time Information
Recommending Items in Social Tagging Systems Using Tag and Time Information
Christoph Trattner
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Gabriel Moreira
 
Grounded theory meets big data: One way to marry ethnography and digital methods
Grounded theory meets big data: One way to marry ethnography and digital methodsGrounded theory meets big data: One way to marry ethnography and digital methods
Grounded theory meets big data: One way to marry ethnography and digital methods
Citizens in the Making
 
The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?The Web of Data: do we actually understand what we built?
The Web of Data: do we actually understand what we built?
Frank van Harmelen
 
Mauritius Big Data and Machine Learning JEDI workshop
Mauritius Big Data and Machine Learning JEDI workshopMauritius Big Data and Machine Learning JEDI workshop
Mauritius Big Data and Machine Learning JEDI workshop
CosmoAIMS Bassett
 
Using the search engine as recommendation engine
Using the search engine as recommendation engineUsing the search engine as recommendation engine
Using the search engine as recommendation engine
Lars Marius Garshol
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
Lars Marius Garshol
 
Module 9: Natural Language Processing Part 2
Module 9:  Natural Language Processing Part 2Module 9:  Natural Language Processing Part 2
Module 9: Natural Language Processing Part 2
Sara Hooker
 
Data Science Workshop - day 2
Data Science Workshop - day 2Data Science Workshop - day 2
Data Science Workshop - day 2
Aseel Addawood
 
Text Analytics: Yesterday, Today and Tomorrow
Text Analytics: Yesterday, Today and TomorrowText Analytics: Yesterday, Today and Tomorrow
Text Analytics: Yesterday, Today and Tomorrow
Tony Russell-Rose
 
Public PhD Defense - Ben De Meester
Public PhD Defense - Ben De MeesterPublic PhD Defense - Ben De Meester
Public PhD Defense - Ben De Meester
Ben De Meester
 
Je t’aime… moi non plus: reporting on the opportunities, expectations and cha...
Je t’aime… moi non plus: reporting on the opportunities, expectations and cha...Je t’aime… moi non plus: reporting on the opportunities, expectations and cha...
Je t’aime… moi non plus: reporting on the opportunities, expectations and cha...
Christoph Trattner
 
Text Analytics Presentation
Text Analytics PresentationText Analytics Presentation
Text Analytics Presentation
Skylar Ritchie
 
ODSC East 2017: Data Science Models For Good
ODSC East 2017: Data Science Models For GoodODSC East 2017: Data Science Models For Good
ODSC East 2017: Data Science Models For Good
Karry Lu
 

What's hot (20)

From Search to Predictions in Tagged Information Spaces
From Search to Predictions in Tagged Information SpacesFrom Search to Predictions in Tagged Information Spaces
From Search to Predictions in Tagged Information Spaces
 
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
Big Data Analytics: Discovering Latent Structure in Twitter; A Case Study in ...
 
Recommending Tags with a Model of Human Categorization
Recommending Tags with a Model of Human CategorizationRecommending Tags with a Model of Human Categorization
Recommending Tags with a Model of Human Categorization
 
Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?Why L-3 Data Tactics Data Science?
Why L-3 Data Tactics Data Science?
 
Tutorial Data Management and workflows
Tutorial Data Management and workflowsTutorial Data Management and workflows
Tutorial Data Management and workflows
 
Machine Learning using Big data
Machine Learning using Big data Machine Learning using Big data
Machine Learning using Big data
 
Recommending Items in Social Tagging Systems Using Tag and Time Information
Recommending Items in Social Tagging Systems Using Tag and Time InformationRecommending Items in Social Tagging Systems Using Tag and Time Information
Recommending Items in Social Tagging Systems Using Tag and Time Information
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Grounded theory meets big data: One way to marry ethnography and digital methods
BDACA1516s2 - Lecture4

  • 1. Different types of analysis Sentiment analysis Stopword removal Next meetings Big Data and Automated Content Analysis Week 4 – Monday »Sentiment Analysis« Damian Trilling d.c.trilling@uva.nl @damian0604 www.damiantrilling.net Afdeling Communicatiewetenschap Universiteit van Amsterdam 18 April 2016 Big Data and Automated Content Analysis Damian Trilling
  • 2. Today: 1. Different types of analysis (What can we do?; Systematizing analytical approaches) 2. Data analysis 1: Sentiment analysis (What is it?; Bag-of-words approaches; Advanced approaches; A sentiment analysis tailored to your needs!) 3. NLP-preview: Stopword removal (Natural language processing; A simple algorithm) 4. Take-home message, next meetings, & exam
  • 3. What we already can do with regard to data collection: • query a (JSON-based) API (GoogleBooks, Twitter) • handle CSV files • handle JSON files
  • 4. What we already can do with regard to data collection: • query a (JSON-based) API (GoogleBooks, Twitter) • handle CSV files • handle JSON files. With regard to analysis: Not much. We counted some frequencies and calculated some averages.
  • 5. Data analysis 0: Overview. What can we do?
  • 6. Data analysis 0: Overview. What can we do? What do you think? What are interesting methods to analyze large data sets (like, e.g., social media data)? What questions can they answer?
  • 7. What else can we do? For example: • sentiment analysis • automated coding with regular expressions • natural language processing • supervised and unsupervised machine learning • network analysis
  • 8. What else can we do? Or ideally ... a combination of these techniques.
  • 9. Overview: Systematizing analytical approaches
  • 10. Systematizing analytical approaches. Taking the example of Twitter: Analyzing the structure • Number of Tweets over time • singleton/retweet ratio • Distribution of number of Tweets per user • Interaction networks. Bruns, A., & Stieglitz, S. (2013). Toward more systematic Twitter analysis: metrics for tweeting activities. International Journal of Social Research Methodology. doi:10.1080/13645579.2012.756095
  • 11. Systematizing analytical approaches. Taking the example of Twitter: Analyzing the structure • Number of Tweets over time • singleton/retweet ratio • Distribution of number of Tweets per user • Interaction networks ⇒ Focus on the amount of content and on the question who interacts with whom, not on what is said. Bruns, A., & Stieglitz, S. (2013). Toward more systematic Twitter analysis: metrics for tweeting activities. International Journal of Social Research Methodology. doi:10.1080/13645579.2012.756095
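Two of the structural metrics listed above can be sketched in a few lines of Python. This is a minimal illustration only: the toy data, the field names `user` and `is_retweet`, and the choice to treat every tweet that is not a retweet as a "singleton" are assumptions made for the example, not definitions taken from Bruns and Stieglitz (2013).

```python
from collections import Counter

# Hypothetical toy data: one dict per tweet (field names are assumptions)
tweets = [
    {"user": "anna", "is_retweet": False},
    {"user": "anna", "is_retweet": True},
    {"user": "bob", "is_retweet": False},
    {"user": "carl", "is_retweet": True},
    {"user": "anna", "is_retweet": False},
]

# Distribution of the number of tweets per user
tweets_per_user = Counter(t["user"] for t in tweets)

# Singleton/retweet ratio; "singleton" here simply means "not a retweet",
# which is a simplification for this sketch
n_retweets = sum(1 for t in tweets if t["is_retweet"])
n_singletons = len(tweets) - n_retweets
ratio = n_singletons / n_retweets if n_retweets else None

print(tweets_per_user.most_common())
print(ratio)
```

Note that neither metric looks at the text of the tweets, which is exactly the point of the structural perspective.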
  • 12. Systematizing analytical approaches. Taking the example of Twitter: Analyzing the content • Sentiment analysis • Word frequencies, searchstrings • Co-word analysis (⇒ frames)
  • 13. Systematizing analytical approaches. Taking the example of Twitter: Analyzing the content • Sentiment analysis • Word frequencies, searchstrings • Co-word analysis (⇒ frames) ⇒ Focus on what is said
  • 14. Systematizing analytical approaches ⇒ It depends on your research question which approach is more interesting!
  • 15. Data analysis 1: Sentiment analysis
  • 16. What is sentiment analysis? Extracting subjective information from texts
  • 17. What is sentiment analysis? Extracting subjective information from texts • the author’s attitude towards the topic of the text
  • 18. What is sentiment analysis? Extracting subjective information from texts • the author’s attitude towards the topic of the text • polarity: negative–positive
  • 19. What is sentiment analysis? Extracting subjective information from texts • the author’s attitude towards the topic of the text • polarity: negative–positive • subjectivity: neutral–subjective *
  • 20. What is sentiment analysis? Extracting subjective information from texts • the author’s attitude towards the topic of the text • polarity: negative–positive • subjectivity: neutral–subjective * • advanced approaches: different emotions
  • 21. What is sentiment analysis? Extracting subjective information from texts • the author’s attitude towards the topic of the text • polarity: negative–positive • subjectivity: neutral–subjective * • advanced approaches: different emotions. * Less sophisticated approaches do not see this as a separate dimension but simply calculate objectivity = 1 − (negativity + positivity)
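The footnote formula can be made concrete with a small sketch. It assumes that positivity and negativity are measured as the share of words in a text that appear in a positive or a negative word list; that operationalization, the two tiny word lists, and the function name `simple_scores` are illustrative assumptions, not something specified on the slide.

```python
# Hypothetical word lists, for illustration only
positive_words = {"great", "good"}
negative_words = {"bad", "awful"}

def simple_scores(text):
    """Return (positivity, negativity, objectivity) as shares of all words."""
    words = text.lower().split()
    if not words:
        # Assumption: an empty text is treated as fully objective
        return 0.0, 0.0, 1.0
    positivity = sum(w in positive_words for w in words) / len(words)
    negativity = sum(w in negative_words for w in words) / len(words)
    # The formula from the footnote
    objectivity = 1 - (negativity + positivity)
    return positivity, negativity, objectivity

pos, neg, obj = simple_scores("great train but bad coffee")
print(pos, neg, obj)
```

In this setup objectivity is not measured independently: it is whatever remains once the positive and negative shares have been subtracted.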
  • 22. Applications. Who uses it? • Companies (especially for Web Analytics) • Social Scientists (applications in data journalism, politics, ...). Many references to examples in Mostafa (2013). ⇒ Cases in which you have a huge amount of data or real-time data and you want to get an idea of the tone. Mostafa, M. M. (2013). More than words: Social networks’ text mining for consumer brand sentiments. Expert Systems with Applications, 40(10), 4241–4251. doi:10.1016/j.eswa.2013.01.019
  • 23. Example
      >>> sentiment("Great service by @NSHighspeed")
      (0.8, 0.75)
      >>> sentiment("Bad service by @NSHighspeed")
      (-0.6166666666666667, 0.6666666666666666)
    The output is (polarity, subjectivity), with −1 ≤ polarity ≤ +1 and 0 ≤ subjectivity ≤ +1. This is the module pattern.nl, available for Python 2 only. De Smedt, T., & Daelemans W. (2012). Pattern for Python. Journal of Machine Learning Research, 13, 2063-2067.
  • 24. Data analysis 1: Sentiment analysis. Bag-of-words approaches
  • 25. Bag-of-words approaches. How does it work? • We take each word of a text and look if it’s positive or negative.
  • 26. Bag-of-words approaches. How does it work? • We take each word of a text and look if it’s positive or negative. • Most simple way: compare it with a list of negative words and with a list of positive words (That’s what Mostafa (2013) did)
  • 27. Bag-of-words approaches. How does it work? • We take each word of a text and look if it’s positive or negative. • Most simple way: compare it with a list of negative words and with a list of positive words (That’s what Mostafa (2013) did) • More advanced: look up a subjectivity score from a table
  • 28. Bag-of-words approaches. How does it work? • We take each word of a text and look if it’s positive or negative. • Most simple way: compare it with a list of negative words and with a list of positive words (That’s what Mostafa (2013) did) • More advanced: look up a subjectivity score from a table • e.g., add up the scores and average them.
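The "look up a score from a table, add up, and average" idea might look roughly like the sketch below. The score table, its weights, and the function name `average_sentiment` are made up for illustration; averaging only over the words found in the table (rather than over all words) is one possible choice among several.

```python
# Hypothetical score table; words and weights are invented for this sketch
scores = {"great": 0.8, "good": 0.5, "bad": -0.6, "terrible": -0.9}

def average_sentiment(text, table):
    """Average the scores of all words in text that occur in the table."""
    found = [table[w] for w in text.lower().split() if w in table]
    if not found:
        # No word from the table occurs: return None instead of 0,
        # so "no information" is not confused with "neutral"
        return None
    return sum(found) / len(found)

result = average_sentiment("good service but terrible delays", scores)
print(result)
print(average_sentiment("the train leaves at nine", scores))
```

Dividing by the number of all words instead would penalize long texts with few scored words, so the choice of denominator is worth making explicit in any real analysis.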
  • 29. How to do this: If you were to run an analysis like the one by Mostafa (2013), how could you do this?
  • 30. How to do this (given a string tekst that you want to analyze and two lists of strings with positive and negative words, lijstpos=["great","fantastic",...,"perfect"] and lijstneg):
        sentiment = 0
        for woord in tekst.split():
            if woord in lijstpos:
                sentiment = sentiment + 1  # same as sentiment += 1
            elif woord in lijstneg:
                sentiment = sentiment - 1  # same as sentiment -= 1
        print(sentiment)
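One practical pitfall of the loop above is that tekst.split() leaves capitalization and punctuation attached to the words, so "Great" or "perfect!" will not match the entries "great" or "perfect". The following is a minimal sketch of how this could be handled; the function name count_sentiment, the normalize switch, and the short word lists are assumptions made for illustration and are not part of the original slides.

```python
import string

# Hypothetical example word lists; a real analysis would use a full lexicon.
lijstpos = ["great", "fantastic", "perfect"]
lijstneg = ["bad", "terrible", "awful"]

def count_sentiment(tekst, normalize=False):
    """Return (number of positive words) - (number of negative words)."""
    sentiment = 0
    for woord in tekst.split():
        if normalize:
            # Strip surrounding punctuation and ignore capitalization.
            woord = woord.strip(string.punctuation).lower()
        if woord in lijstpos:
            sentiment += 1
        elif woord in lijstneg:
            sentiment -= 1
    return sentiment

tekst = "Great service, really perfect!"
print(count_sentiment(tekst))                  # exact matching only
print(count_sentiment(tekst, normalize=True))  # after normalization
```

Without normalization neither "Great" nor "perfect!" is found in the list; with normalization both count as positive words.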
  • 31. Do we need to have the lists in our program itself? No. You could have them in a separate text file, one word per line, and then read that file directly into a list:
        # each file contains one word per line
        with open("filewithonepositivewordperline.txt") as f:
            poslijst = f.read().splitlines()
        with open("filewithonenegativewordperline.txt") as f:
            neglijst = f.read().splitlines()
  • 33. Bag-of-words approaches: More advanced versions
      • CSV files or similar tables with weights
      • Or some kind of dict?
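A table with weights can be read into a dict that maps each word to its score, and the scores of the words found in a text can then be averaged. The sketch below assumes a hypothetical two-column table (word, score) and uses an in-memory string instead of a file so that it runs as is; the words, the weights, and the function name average_score are invented for illustration.

```python
import csv
import io

# Hypothetical weights table; in practice this would be a CSV file on disk.
csv_data = """word,score
great,0.8
perfect,1.0
bad,-0.7
terrible,-1.0
"""

# Read the table into a dict that maps each word to its score.
scores = {}
for row in csv.DictReader(io.StringIO(csv_data)):
    scores[row["word"]] = float(row["score"])

def average_score(tekst):
    """Average the scores of all words in tekst that occur in the table."""
    found = [scores[w] for w in tekst.lower().split() if w in scores]
    if not found:
        return None  # no word from the table occurs in the text
    return sum(found) / len(found)

print(average_score("great food but terrible service"))
print(average_score("see you tomorrow"))
```

Returning None for texts without any listed word keeps "no information" apart from a genuinely neutral average of 0.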
  • 38. Mostafa (2013): Interpreting the output. Your thoughts?
      • Each word counts equally (1).
      • Many tweets contain no words from the list. What does this mean?
      • Ways to improve BOW approaches?
  • 40. Bag-of-words approaches: pro
      • easy to implement
      • easy to modify:
          • add or remove words
          • make new lists for other languages, other categories (than positive/negative), ...
      • easy to understand (transparency, reproducibility)
    e.g., Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.
  • 41. Bag-of-words approaches: con
      • simplistic assumptions
      • e.g., intensifiers cannot be interpreted ("really" in "really good" or "really bad")
      • or, even more important, negations.
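The limitation can be made visible with a few lines of code. In this minimal sketch (the mini-lexicon and the function name bow_score are assumptions for illustration), a pure word-counting approach assigns the same score to "good", "really good", and "not good":

```python
lijstpos = ["good", "great"]   # hypothetical mini-lexicon
lijstneg = ["bad", "awful"]

def bow_score(tekst):
    """Count positive minus negative words, ignoring all context."""
    sentiment = 0
    for woord in tekst.lower().split():
        if woord in lijstpos:
            sentiment += 1
        elif woord in lijstneg:
            sentiment -= 1
    return sentiment

for tekst in ["good", "really good", "not good"]:
    print(tekst, bow_score(tekst))   # each of the three texts scores 1
```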
  • 42. Data analysis 1: Sentiment analysis. Advanced approaches
  • 43. Improving the BOW approach. Example: The SentiStrength algorithm
      • scores from −5 to −1 and from +1 to +5
      • spelling correction
      • "booster word list" for strengthening/weakening the effect of the following word
      • interpreting repeated letters ("baaaaaad"), CAPITALS and !!!
      • idioms
      • negation
      • ...
    Thelwall, M., Buckley, K., & Paltoglou, G. (2012). Sentiment strength detection for the social Web. Journal of the American Society for Information Science and Technology, 63(1), 163-173.
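To give an idea of how such rules can extend a word list, here is a rough sketch of three of the ideas listed above: booster words, negation, and repeated letters. It is not the SentiStrength implementation; all word lists, weights, and function names are hypothetical, and as a simplification a booster or negation is applied to the next scored word rather than strictly to the immediately following word. Spelling correction, idioms, CAPITALS, and "!!!" are left out.

```python
import re

# All word lists and weights below are hypothetical illustrations,
# not the actual SentiStrength lexicon or rules.
word_scores = {"good": 2, "great": 3, "bad": -2, "awful": -3}
boosters = {"really": 1, "very": 1, "slightly": -1}
negations = {"not", "never", "no"}

def normalize(word):
    """Lowercase and try to undo repeated letters ("baaaad" -> "bad")."""
    word = word.lower()
    two = re.sub(r"(.)\1{2,}", r"\1\1", word)  # "goooood" -> "good"
    one = re.sub(r"(.)\1{2,}", r"\1", word)    # "baaaad" -> "bad"
    for candidate in (word, two, one):
        if candidate in word_scores:
            return candidate
    return word

def rule_based_score(tekst):
    total = 0
    boost = 0
    negate = False
    for raw in re.findall(r"[A-Za-z]+", tekst):
        word = normalize(raw)
        if word in boosters:
            boost += boosters[word]
        elif word in negations:
            negate = True
        elif word in word_scores:
            score = word_scores[word]
            # A booster moves the score away from (or towards) zero.
            score += boost if score > 0 else -boost
            if negate:
                score = -score
            total += score
            boost, negate = 0, False  # modifiers are used up
    return total

for tekst in ["good", "really good", "not good", "This is baaaaaad!!!"]:
    print(tekst, rule_based_score(tekst))
```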
  • 44. Advanced approaches: Take the structure of a text into account
      • Try to apply linguistic concepts to identify sentence structure
      • can identify negations
      • can interpret intensifiers
  • 45. Example:
        >>> from pattern.nl import sentiment
        >>> sentiment("Great service by @NSHighspeed")
        (0.8, 0.75)
        >>> sentiment("Really")
        (0.0, 1.0)
        >>> sentiment("Really Great service by @NSHighspeed")
        (1.0, 1.0)
    The result is a tuple (polarity, subjectivity) with −1 ≤ polarity ≤ +1 and 0 ≤ subjectivity ≤ +1. Unlike in pure bag-of-words approaches, here the overall sentiment is not just the sum or the average of its parts!
    De Smedt, T., & Daelemans, W. (2012). Pattern for Python. Journal of Machine Learning Research, 13, 2063-2067.
  • 48. Advanced approaches
    pro
      • understand intensifiers or negation
      • thus: higher accuracy
    con
      • Black box? Or do we understand the algorithm?
      • Difficult to adapt to own needs
      • Really much better results?
  • 49. Data analysis 1: Sentiment analysis. A sentiment analysis tailored to your needs!
  • 50. A sentiment analysis tailored to your needs! Identifying suicidal texts
      • Bag-of-words approach with very specific dictionary
      • added negation
      • added regular expression search for key phrases
      • Very specific design requirements: False positives are OK, false negatives not!
    Huang, Y.-P., Goh, T., & Liew, C.L. (2007). Hunting suicide notes in web 2.0 – preliminary findings. Ninth IEEE International Symposium on Multimedia. Retrieved from http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4476021
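The combination of a specific dictionary, negation handling, and a regular expression search can be sketched as follows. This is not the procedure of Huang et al. (2007); the word list, the key phrases, and the function name flag_for_review are hypothetical. The design choice it illustrates is the asymmetric requirement: a single hit is enough to flag a text for human review, which accepts false positives in order to avoid false negatives.

```python
import re

# Hypothetical dictionary and key phrases, for illustration only.
risk_words = {"hopeless", "worthless", "goodbye"}
negations = {"not", "never", "no"}
key_phrases = [
    re.compile(r"\bcan(?:no|')?t go on\b"),  # "cant", "can't", "cannot"
    re.compile(r"\bend it all\b"),
]

def flag_for_review(tekst):
    """Return True if the text should be passed on to a human reviewer."""
    lowered = tekst.lower()
    # 1. Regular expression search for key phrases.
    if any(pattern.search(lowered) for pattern in key_phrases):
        return True
    # 2. Dictionary lookup, skipping directly negated words.
    words = re.findall(r"[a-z']+", lowered)
    for i, word in enumerate(words):
        if word in risk_words:
            negated = i > 0 and words[i - 1] in negations
            if not negated:
                return True  # one hit suffices: favour recall
    return False

for tekst in ["I feel hopeless", "I am not hopeless",
              "I just can't go on", "Nice weather today"]:
    print(tekst, flag_for_review(tekst))
```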
  • 52. A sentiment analysis tailored to your needs! Even this still relatively simple approach seems to work satisfactorily, but if 106 scientists from 24 competing teams (!) work on it, they can group suicide notes by these characteristics:
      • swear
      • family
      • friend
      • positive emotion
      • negative emotion
      • anxiety
      • anger
      • sad
      • cognitive process
      • biology
      • sexual
      • ingestion
      • religion
      • death
    Pestian, J.P., Matykiewicz, P., Linn-Gust, M., South, B., Uzuner, O., Wiebe, J., Cohen, K.B., Hurdle, J., & Brew, C. (2012). Sentiment analysis of suicide notes: A shared task. Biomedical Informatics Insights, 5(1), 3-16. Retrieved from http://europepmc.org/articles/PMC3299408?pdf=render
  • 53. Sidenote: An alternative state-of-the-art approach: Use supervised machine learning
      • Instead of defining rules, hand-code (“annotate”) the sentiment of some tweets manually and let the computer find out which words or characters (“features”) predict sentiment
      • Then use this model to predict sentiment for other tweets
      • Essentially the same as what you have known since the second year of your Bachelor: regression analysis (but now with sentiment as the DV and word occurrences as the IVs)
      • ⇒ week 8
    Gonzalez-Bailon, S., & Paltoglou, G. (2015). Signals of public opinion in online communication: A comparison of methods and data sources. The ANNALS of the American Academy of Political and Social Science, 659(1), 95-107.
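The regression analogy can be made concrete with a toy example. The sketch below fits a logistic regression by plain gradient descent, using only the standard library, with sentiment as the DV and word occurrences as the IVs. The six "annotated" tweets, the learning rate, and the number of passes are invented for illustration; a real project would use far more hand-coded data and an established machine learning library.

```python
import math

# Hypothetical hand-coded ("annotated") tweets: 1 = positive, 0 = negative.
training = [
    ("great service today", 1),
    ("fantastic trip thanks", 1),
    ("great staff and fantastic train", 1),
    ("bad service again", 0),
    ("terrible delay today", 0),
    ("bad and terrible trip", 0),
]

vocabulary = sorted({w for tekst, _ in training for w in tekst.split()})

def features(tekst):
    """Word occurrences (the IVs): 1 if the word occurs in the text, else 0."""
    words = set(tekst.split())
    return [1.0 if w in words else 0.0 for w in vocabulary]

def predict_proba(weights, bias, x):
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Fit the regression weights by stochastic gradient descent.
weights = [0.0] * len(vocabulary)
bias = 0.0
for _ in range(500):
    for tekst, label in training:
        x = features(tekst)
        error = predict_proba(weights, bias, x) - label
        bias -= 0.1 * error
        weights = [w - 0.1 * error * xi for w, xi in zip(weights, x)]

def predict(tekst):
    """Predict 1 (positive) or 0 (negative) for a new, unseen text."""
    return 1 if predict_proba(weights, bias, features(tekst)) >= 0.5 else 0

print(predict("great fantastic"))
print(predict("bad terrible"))
```

The learned weights play the role of regression coefficients: words that occur mainly in positive tweets receive positive weights, words that occur mainly in negative tweets receive negative ones.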
  • 54. Natural Language Processing preview: Stopword removal
  • 55. Natural Language Processing preview: Stopword removal. Why now? Because the logic of the algorithm is closely related to that of our first simple sentiment analysis.
  • 56. Stopword removal: What and why? Why remove stopwords?
      • If we want to identify key terms (e.g., by means of a word count), we are not interested in stopwords
      • If we want to calculate document similarity, the similarity might be inflated by them
      • If we want to make a word co-occurrence graph, irrelevant information will dominate the picture
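The first point can be illustrated with a simple word count. In this sketch (the example sentence and the stopword list are made up for illustration), the most frequent word overall is a stopword, whereas after removal the most frequent word is a meaningful key term:

```python
from collections import Counter

# Hypothetical example text and stopword list.
tekst = ("the train was late and the coffee was cold and "
         "the staff of the train was rude")
stopwords = {"the", "was", "and", "of", "a"}

all_words = tekst.split()
content_words = [w for w in all_words if w not in stopwords]

# Most frequent word with and without stopwords.
print(Counter(all_words).most_common(1))      # a stopword comes out on top
print(Counter(content_words).most_common(1))  # now a key term comes out on top
```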
  • 57. Stopword removal: How
        testo = 'He gives her a beer and a cigarette.'
        testonuovo = ""
        stopwords = ['and', 'the', 'a', 'or', 'he', 'she', 'him', 'her']
        for verbo in testo.split():
            if verbo not in stopwords:
                testonuovo = testonuovo + verbo + " "
    What do we get if we do:
        print(testonuovo)
    Can you explain the algorithm?
  • 58. We get:
        >>> print(testonuovo)
        He gives beer cigarette.
    Why is "He" still in there? How can we fix this?
  • 59. Stopword removal
        testo = 'He gives her a beer and a cigarette.'
        testonuovo = ""
        stopwords = ['and', 'the', 'a', 'or', 'he', 'she', 'him', 'her']
        for verbo in testo.split():
            # compare the lowercased word, so that "He" matches "he"
            if verbo.lower() not in stopwords:
                testonuovo = testonuovo + verbo + " "
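The same logic can be wrapped in a reusable function. The following variant is a sketch under a few assumptions that go beyond the slide: the function name remove_stopwords is invented, the stopwords are stored in a set (faster lookup than a list), punctuation is ignored when matching, and the result is built with " ".join() so that no trailing space remains.

```python
import string

stopwords = {"and", "the", "a", "or", "he", "she", "him", "her"}

def remove_stopwords(testo):
    """Return testo without stopwords, ignoring case and punctuation."""
    kept = []
    for verbo in testo.split():
        # Compare a lowercased, punctuation-free copy; keep the original word.
        if verbo.strip(string.punctuation).lower() not in stopwords:
            kept.append(verbo)
    return " ".join(kept)

print(remove_stopwords("He gives her a beer and a cigarette."))
# prints: gives beer cigarette.
```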
  • 60. Take-home message, mid-term take-home exam, next meetings
  • 61. Take-home messages. What you should be familiar with:
      • You should have completely understood last week’s exercise. Re-read it if necessary.
      • Approaches to the analysis (e.g., structure vs. content)
      • Types of sentiment analysis, application areas, pros and cons
  • 62. Mid-term take-home exam. Week 5: Monday, 25 April, to Sunday, 1 May
      • You get the exam on Monday at the end of the meeting
      • Answers have to be handed in no later than Sunday, 23.59
      • 30% of final grade
      • 3 questions:
          1. Literature question: Different methods (“Explain how ... is done”) and/or epistemological or theoretical implications (“What does this mean for social-scientific research?”)
          2. Empirical question (conceptual)
          3. Empirical question (actual programming task)
    If you fully understood all exercises until now, it shouldn’t be difficult and won’t take too long. But give yourself a lot of buffer time!
  • 63. Next meetings. Wednesday, 20 April: We work together on the examples in Chapter 5. You can bring your own data (you will probably learn more!), but in that case think in advance about how to write a script to read the data (as we did last week and as described in Chapter 4).