Guest lecture on #bigdata
 

A guest lecture in the Master elective "The Blind Spot: Tracking Young Media Users" by Susanne Baumgartner



Usage Rights

© All Rights Reserved

    Presentation Transcript

    • #bigdata in Communication Science: some examples from research by me and my students. Damian Trilling, d.c.trilling@uva.nl, @damian0604, www.damiantrilling.net. Department of Communication Science (Afdeling Communicatiewetenschap), Universiteit van Amsterdam, October 2013.
    • Outline
      1. What’s big data?
      2. Some examples: rare events; tone in tweets; counting words and n-grams; network analysis
      3. Problems
      4. A glimpse in the kitchen
      5. Questions?
    • What’s big data?
    • What’s big data? No definition, but...
      • Existing data
      • Too big to code manually
      • Sometimes also too big to handle with normal tools
      • New research questions
      • Call to revisit the relationship between theory and empirical research
    • What’s big data? Some sources:
      • Social network sites
      • RSS feeds
      • Databases
      • Scraping text from the web
      • ...
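RSS feeds, one of the sources listed above, are plain XML and therefore easy to process. A minimal sketch, assuming an RSS 2.0 feed that has already been downloaded as a string (the sample feed and the `example.org` URLs are hypothetical), extracts each item's title and link with Python's standard library; the resulting links could serve as a URL list for bulk downloading.

```python
import xml.etree.ElementTree as ET

def rss_items(feed_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 feed string."""
    root = ET.fromstring(feed_xml)
    items = []
    # In RSS 2.0, items live under <rss><channel><item>
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        items.append((title, link))
    return items

# Hypothetical sample feed, for illustration only
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example news</title>
    <item><title>First story</title><link>http://example.org/1</link></item>
    <item><title>Second story</title><link>http://example.org/2</link></item>
  </channel>
</rss>"""

if __name__ == "__main__":
    for title, link in rss_items(SAMPLE_FEED):
        print(link)
```

Collecting from a feed repeatedly over time (e.g., once an hour) and keeping only links not seen before is one way to build up a full archive, since a feed usually shows only the most recent items.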
    • It’s out there! You only have to collect it.
    • Some examples
    • Rare events (a recent master’s thesis): Imagine you want to analyze some very rare content. Normal sampling won’t work, that’s for sure.
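Why normal sampling fails can be made concrete with a little arithmetic. A minimal sketch, assuming articles are drawn independently at random (a binomial model) and using, purely for illustration, the share of relevant articles from the thesis described next (292 out of 74,000, about 0.4%): the expected number of relevant articles in a random sample of n is n·p, and the chance of finding none at all is (1 − p)^n.

```python
def sampling_yield(p, n):
    """Expected hits and probability of zero hits when drawing n items,
    each relevant with independent probability p (binomial model)."""
    expected = n * p
    p_none = (1 - p) ** n
    return expected, p_none

# Share of relevant articles, taken from the thesis example (292 of 74,000)
P_RELEVANT = 292 / 74000

if __name__ == "__main__":
    for n in (100, 500, 1000):
        expected, p_none = sampling_yield(P_RELEVANT, n)
        print(f"n={n}: expected {expected:.1f} relevant articles, "
              f"P(none) = {p_none:.2f}")
```

For a sample of 500 articles this gives about 2 expected relevant articles and roughly a 14% chance of finding none, far too few to analyze.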
    • Rare events: so you’d better collect everything first. Getting all news coverage from Dutch news sites: we collected all articles from nine news sites during a period of two months, resulting in a database of 74,000 articles. In a second step, we kept only those articles containing specific keywords. The resulting 292 articles were then coded manually.
      Pöll, B. (2013). Social media: new sources, new profession? A content analysis of the use of social media as a source for journalists in online news articles. Master’s thesis, Universiteit van Amsterdam.
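The second step, keeping only articles that contain specific keywords, is easy to automate. A minimal sketch, assuming the articles are already available as (id, text) pairs and using a hypothetical keyword list (the thesis's actual keywords are not given on the slide): a case-insensitive whole-word match selects the candidate articles for manual coding.

```python
import re

def filter_articles(articles, keywords):
    """Return the ids of articles whose text contains at least one keyword.

    Matching is case-insensitive and on whole words only, so 'tweet'
    does not match inside a longer word such as 'tweette'.
    """
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [art_id for art_id, text in articles if pattern.search(text)]

# Hypothetical keywords and articles, for illustration only
KEYWORDS = ["Twitter", "Facebook", "tweet"]
ARTICLES = [
    (1, "De minister reageerde op Twitter."),
    (2, "Het weer wordt morgen zonnig."),
    (3, "Hij tweette een foto."),
]

if __name__ == "__main__":
    print(filter_articles(ARTICLES, KEYWORDS))
```

Whole-word matching avoids false hits but also misses inflected forms (article 3 is not selected); listing variants explicitly, or dropping the `\b` anchors, trades precision for recall.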
    • Rare events: it’s just one line of code!
      urls.txt:
        http://www.gmx.at/themen/wissen/mensch/108g5xi-baeuerlich-schiefe-zaehne
        http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/408g740-fuermannbittet-um-verzeihung
        http://www.gmx.at/themen/nachrichten/aufruhr-arabien/268g70u-regierungwill-zuruecktreten
        http://www.gmx.at/themen/nachrichten/panorama/828g54y-neues-zur-klagegegen-republik
        http://www.gmx.at/themen/nachrichten/panorama/968g72s-millionstrafewegen-oelpest
        http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/368g6yc-keinbabybauch-nur-fast-food
        ...
        ...
        ...
      wget command:
        wget -i urls.txt
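For readers who prefer to stay in Python (the language used in the tone-in-tweets example), a rough equivalent of `wget -i urls.txt` is sketched below. It is a sketch under stated assumptions, not what was used in the thesis: it reads one URL per line, skips blank lines and `...` placeholder lines, and saves each page under a file name derived from the last part of the URL. The helper names are hypothetical, and the `fetch` parameter can be swapped out for testing.

```python
import os
import urllib.request
from urllib.parse import urlparse

def parse_url_list(text):
    """Return one URL per non-empty line, skipping '...' placeholder lines."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and line != "..."]

def read_urls(path):
    """Read a URL list file such as urls.txt."""
    with open(path, encoding="utf-8") as f:
        return parse_url_list(f.read())

def filename_for(url):
    """Derive a local file name from the last segment of the URL path."""
    name = urlparse(url).path.rstrip("/").split("/")[-1]
    return (name or "index") + ".html"

def fetch_url(url):
    """Download the raw bytes of one page."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()

def download_all(urls, out_dir, fetch=fetch_url):
    """Save each URL's content into out_dir and return the written paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for url in urls:
        path = os.path.join(out_dir, filename_for(url))
        with open(path, "wb") as f:
            f.write(fetch(url))
        written.append(path)
    return written

# Usage (performs real downloads):
# download_all(read_urls("urls.txt"), "pages")
```

Unlike wget, this sketch stops at the first failed download and overwrites files with the same name; a real scraper would also want error handling and a polite delay between requests.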
    • Tone in tweets (a recent bachelor’s thesis): Imagine you want to know something about someone’s behavior on Twitter, or how a specific topic is discussed on Twitter. Do you really want to go through thousands of tweets by hand?
    • Tone in tweets: so you’d better think about automating your coding. Finding out how negative or positive politicians are towards their opponents: we took lists of positive and negative words and of each politician’s opponents. We used a Python script to check which type of words were used to refer to opponents. For further analysis, the results were imported into SPSS.
      Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter [United States vs. United Kingdom: An automated content analysis of explanatory factors for the use of positive campaigning and negative campaigning by leading politicians and political parties on Twitter]. Bachelor’s thesis, Universiteit van Amsterdam.
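The thesis's own script is not shown on the slides. A minimal sketch of the general idea, assuming hypothetical word lists, hypothetical opponent names, and tweets as plain strings: for every tweet that mentions an opponent, count the positive and negative words and write one row per tweet as CSV, a format SPSS can import. Counting words across the whole tweet, rather than only near the opponent's name, is a simplifying assumption.

```python
import csv
import io
import re

# Hypothetical word lists; a real study would use validated dictionaries
POSITIVE = {"great", "strong", "good"}
NEGATIVE = {"weak", "failed", "bad"}
OPPONENTS = {"smith", "jones"}  # hypothetical lowercase opponent names

def tokenize(text):
    """Lowercase the tweet and split it into words (keeping @ and #)."""
    return re.findall(r"[@#]?\w+", text.lower())

def code_tweet(text):
    """Return (mentions_opponent, n_positive, n_negative) for one tweet."""
    # Strip @/# so '@smith' and '#smith' both count as mentions of 'smith'
    bare = [w.lstrip("@#") for w in tokenize(text)]
    mentions = any(w in OPPONENTS for w in bare)
    n_pos = sum(w in POSITIVE for w in bare)
    n_neg = sum(w in NEGATIVE for w in bare)
    return mentions, n_pos, n_neg

def to_csv(tweets):
    """Code all tweets that mention an opponent; return the rows as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["tweet_id", "n_positive", "n_negative"])
    for i, text in enumerate(tweets):
        mentions, n_pos, n_neg = code_tweet(text)
        if mentions:
            writer.writerow([i, n_pos, n_neg])
    return out.getvalue()
```

Plain word counting ignores negation ("not good" counts as positive), which is one reason to validate such automated coding against a manually coded subsample.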
    • Counting words and n-grams: how often are specific expressions used? Imagine you want to know which words or expressions dominate a discourse. There are plenty of ways to get an answer within minutes; here’s one:
    • Counting words and n-grams: again, just one or two lines of code! For example, with Stata:
      • Install the package wordscore (net install http://www.tcd.ie/Political_Science/wordscores/wordscores)
      • For word counts: wordfreq /home/dami/texts/lab92.txt /home/dami/texts/lab97.txt
      • For n-grams (trigrams in this case): phrasefreq 3 lab92.txt lab97.txt
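The same kind of counts can also be produced in Python with the standard library alone. A minimal sketch, assuming plain-text input and a deliberately crude tokenizer (lowercased runs of letters, digits and underscores); the file name in the usage comment mirrors the Stata example and is otherwise hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def ngram_counts(tokens, n):
    """Count all contiguous n-grams (as tuples) in a token list."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

# Usage (hypothetical file, mirroring the Stata example):
# with open("lab92.txt", encoding="utf-8") as f:
#     tokens = tokenize(f.read())
# print(Counter(tokens).most_common(20))          # word counts
# print(ngram_counts(tokens, 3).most_common(20))  # trigrams
```

Because the whole text is treated as one token stream, trigrams can span sentence boundaries; counting per sentence or per tweet avoids that if it matters.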
    • Counting words and n-grams: [Figure: trigrams in Obama tweets]
    • Network analysis: another approach. Imagine you want to know who talks to whom and how networks are interconnected. Use a tool like NodeXL or Gephi!
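Tools like NodeXL or Gephi work from an edge list. A minimal sketch, assuming tweets are available as (author, text) pairs: every @mention becomes a directed edge from the author to the mentioned user, repeated mentions become the edge weight, and the result is written as CSV with Source, Target and Weight columns, a layout Gephi's spreadsheet import accepts for edge tables.

```python
import csv
import io
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")

def mention_edges(tweets):
    """Count directed author -> mentioned-user edges from (author, text) pairs."""
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            # Normalize case so @Alice and @alice are the same node
            edges[(author.lower(), target.lower())] += 1
    return edges

def edges_to_csv(edges):
    """Write the weighted edge list as CSV (Source, Target, Weight)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return out.getvalue()
```

Retweets ("RT @user") are counted as mentions here; whether they should form a separate edge type depends on the research question.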
    • Problems
    • Problems: you sometimes depend entirely on commercial parties
      • Services can shut down (Google Reader) or change their API (Twitter)
      • It’s rather easy to get (up to 3,200) tweets from a specific user (e.g., allmytweets.net), but if you want to capture a #hashtag, you have to record it live
      • Twitter doesn’t give you all tweets, but just about 1% (plus a bunch of other limits)
    • Problems: not sure if this is a problem or a great opportunity... You cannot rely (only) on ready-made software, but should get ready to use tools like bash scripts, grep, Python, ... (which can be fun!)
    • A glimpse in the kitchen
    • What I’m doing right now: analyzing #tvduell
      • 570,000 tweets
      • Identifying clusters of nouns, verbs and adjectives
      • Assigning positivity and negativity scores to tweets
      • Seeing whether they can be interpreted as frames
      ⇒ How are Merkel and Steinbrück framed on the second screen during the debate?
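How the clusters are identified in this project is not described on the slide. One common starting point, sketched here under that assumption, is counting how often pairs of words appear in the same tweet; such a co-occurrence table can then be fed into a clustering method. The sketch assumes the tweets have already been reduced to the relevant nouns, verbs and adjectives (for example by a part-of-speech tagger, which is not shown).

```python
from collections import Counter
from itertools import combinations

def cooccurrences(tweets):
    """Count how often each unordered pair of distinct words shares a tweet.

    Each tweet is a list of already-filtered words (e.g., nouns, verbs and
    adjectives); a word repeated within one tweet is counted once.
    """
    pairs = Counter()
    for words in tweets:
        # sorted(set(...)) makes each pair unique and order-independent
        for pair in combinations(sorted(set(words)), 2):
            pairs[pair] += 1
    return pairs
```

Calling `cooccurrences(tweets).most_common(10)` lists the most frequent pairs; with hundreds of thousands of tweets, pairs that occur only once or twice are usually dropped before clustering.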
    • Something you can use?
    • Questions or comments? Damian Trilling, d.c.trilling@uva.nl, @damian0604, www.damiantrilling.net