Talk on the potential of Twitter data for linguistic research, held at the Freiburg Institute for Advanced Study (FRIAS) at the invitation of Christian Mair. Thanks for having me!
Twitter as a data source for (socio)linguistic research
1. Twitter as a data source for (socio)linguistic research
Cornelius Puschmann
Berlin School of Library and Information Science /
Humboldt Institute for Internet and Society
Universität Freiburg, 29 November
2. This talk
1. Framing the issue: Big Data in the Humanities and Social Sciences
2. A very brief introduction to Twitter
3. Technical requirements
4. Legal and ethical issues
5. Sample data: a corpus of Nigerian pidgin
3. "Big Data"
The proliferation of social media makes large
volumes of data available to researchers, leading
to new approaches:
• digital methods (Rogers, 2009)
• cultural analytics (Manovich, 2007)
• computational social science (Lazer et al., 2009)
4. Examples of "Big Data"-style research
• artistic trends in online art communities
(Manovich, 2011)
• cooperation and collaboration in Wikipedia
edit wars (Yasseri, 2012)
• tracing the geographical spread of
neologisms via Twitter (Eisenstein et al.,
2012; 44 million tweets, 500k users)
7. Features of Twitter
• messages restricted to 140 characters
• semi-synchronous
• mostly public
• content presented as stream
• used to spread news, have (semi-)public
conversations
• native features: retweeting, hashtags, @
messages, picture linking
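The native features listed above can be picked out of raw tweet text with a few regular expressions. The following is a minimal Python sketch, not part of the original talk: the patterns are simplified assumptions (Twitter's own tokenization rules are more involved), and the function name tweet_features is hypothetical.

```python
import re

# Simplified, assumed patterns; they approximate Twitter's conventions only.
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
RETWEET = re.compile(r"\bRT\s+@\w+")


def tweet_features(text):
    """Return the native Twitter features found in one tweet."""
    return {
        "mentions": MENTION.findall(text),
        "hashtags": HASHTAG.findall(text),
        "is_retweet": bool(RETWEET.search(text)),
        "length": len(text),
    }


# Constructed example in the style of the sample tweets shown later.
example = "RT @USER: I see u've joined #TeamNoSleep abi @USER"
features = tweet_features(example)
```

Applied to a whole corpus, such a function yields per-tweet counts that can be loaded into any of the analysis tools named in this talk.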
10. Software
Collection:
• The Archivist (Windows desktop software)
• yourTwapperKeeper (webserver required)
• 140kit.com (web-based platform for researchers)
Analysis:
• Excel, Open Office Calc, SPSS, R, Google Docs, ...
Visualization:
• (Excel, OO Calc, R), Gephi, NodeXL
11. Legal and ethical issues
• Consider ethical issues when collecting
(cf. AoIR Ethics Guidelines)
• Anonymize all data
(cf. European Data Protection Directive)
• Don't share raw data
(cf. Twitter Terms of Service)
• Publish only excerpts/summary statistics
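One way to act on these points before sharing anything is sketched below in Python. It masks @handles in the text with the placeholder @USER (the convention used in the sample tweets in this talk), replaces author names with salted hashes, and reduces the data to a summary statistic. The records, the salt, and the function names are invented for illustration. Salted hashing is pseudonymization rather than full anonymization, so whether it meets a given legal requirement is an open assumption.

```python
import hashlib
import re
from collections import Counter

MENTION = re.compile(r"@\w+")


def mask_mentions(text):
    """Replace every @handle in the tweet text with the placeholder @USER."""
    return MENTION.sub("@USER", text)


def pseudonym(handle, salt):
    """Map a handle to a stable pseudonym; the salt must be kept private."""
    digest = hashlib.sha256((salt + handle.lower()).encode("utf-8")).hexdigest()
    return "user_" + digest[:8]


# Invented toy records (author, text); not taken from the corpus.
records = [
    ("alice", "@bob how far? #naija"),
    ("bob", "RT @alice: how far?"),
    ("alice", "no wahala"),
]

salt = "replace-with-a-private-random-string"  # hypothetical value
anonymized = [(pseudonym(a, salt), mask_mentions(t)) for a, t in records]

# Summary statistic that can be published instead of the raw data.
tweets_per_user = Counter(author for author, _ in anonymized)
```

The same pseudonym is produced for the same handle on every run as long as the salt is unchanged, so the social structure of the data survives the masking.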
12. Example: A Twitter corpus of Nigerian pidgin
• collected data since August 17th
• used tweets from and to three
users based in Abuja, Nigeria
• 8,151 tweets from 357 different users
• corpus contains both language data and social graph
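The social graph mentioned above can be derived from the tweets themselves by treating every @mention as a directed edge from author to addressee. The Python sketch below assumes that representation and uses invented toy records. The Source/Target/Weight column names follow the edge-list layout commonly used for CSV import into Gephi, which is an assumption about the downstream tool rather than something stated in the talk.

```python
import csv
import io
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")


def mention_edges(records):
    """Count directed edges (author -> mentioned user) across all tweets."""
    edges = Counter()
    for author, text in records:
        for target in MENTION.findall(text):
            if target != author:  # ignore self-mentions
                edges[(author, target)] += 1
    return edges


# Invented toy records (author, text); not taken from the corpus.
records = [
    ("alice", "@bob how far?"),
    ("bob", "@alice I dey o @carol"),
    ("alice", "@bob abi?"),
]
edges = mention_edges(records)

# Serialize as a weighted edge list; written to memory here, not to disk.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Source", "Target", "Weight"])
for (source, target), weight in sorted(edges.items()):
    writer.writerow([source, target, weight])
edge_list_csv = buffer.getvalue()
```

Edge weights record how often one user addressed another, which keeps the interactional data alongside the language data.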
13. Sample tweets
wiztalib
@USER: @wiztalib lolz,make I do leave dat town jor..BM no dey exist again u
think so,I miss ooooo
RT @USER: Som pple ask me, whr av bin l8ly, de tot I fell off, nobody can
save me, de playn in d backgrnd, I don't backdwn, so dnt ...
albertteslim
@USER d same tin we'v bin hearin...'xamz till further notice'...
@USER mai fada nd mai moda
lil_tenuche
LMFAO @USER: @lil_tenuche I wan enter keffi 2day sef... I wan write my
exams dia since unibuja dey dull me
RT @USER: Retweet If you are proud of ur language ☺
14. Sample tweets (with abi)
@USER guy u be fool o.dey use my acct tweet rubish ba....ur brain dy shake abi
@USER: @USER abi,hw is lifefyn sweet ow jtown
Abi@USER: Wish her well bruv....RT @USER: A wish?@USER: Watch out ppl! Goin to be performing wit Wizzy tonyt :D
If you don't know won't you shut up?abi are u a learner
@USER loool....abi, when u dey commot that side?
U wey knw@USER: Hmmmm!!! D usual abi?@USER: This night.....
Lol...abi@USER: Take cover!! RT @USER: Watchu gon' do when shit hits the fan?
Lol...just missing u@USER: *raised eyebrow* kabir?? One can now follow twice? Abi what? RT @USER: @USER pls ff bck
dear :D
I wish o@USER: Kissed u n neva called?@USER: Do u knw wat he did?@USER: Abi...@USER: Oya frgive jor
Abi...@USER: Oya frgive jor@USER: @USER I'm really angry
Lol...abi@USER: Na Ideba tinz oh@USER: Ileya ti ya o
Lol abi@USER: A 100% is much,at least 50% will do.@USER: Never trust a human being 100%
Hmmm okies@USER: Not really @USER: I see u've joined #TeamNoSleep abi @USER
I see u've joined #TeamNoSleep abi @USER
Loool@USER: Tweetpic my boobs abi? U wee tey for there
Abi@USER: 3 jst in one nyte. Wow dats splendid.RT @USER: @USER yaaaaaaaay...pls ff @USER ...she's our bday mate
Abi...@USER: @USER saying nothing, and wishing you had?
Abi@USER: I can't dull myslf gaskiya
Hmmm dats true o! Lemme see ur hand sef@USER: @USER lol...why dnt u believe me, abi u see ring 4 my hand?
Yes o! :D@USER: U abi?@USER: Okene ben 10
@USER: @USER abi,hw is lifefyn sweet ow jtown
@USER abi,hw is life
Abi..RT @USER: @USER its dirty!!!
U̶̲̥̅̊ dnt knw abi RT @USER: @USER but y?
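A list like the one on this slide can be produced by searching the corpus for a keyword and keeping some context on either side. The Python sketch below shows a simple keyword-in-context (KWIC) search for abi. The three input tweets are taken from the slides, while the function name kwic and the 20-character context width are arbitrary assumptions, and the talk does not say how the original list was extracted.

```python
import re

# Whole-word, case-insensitive match, so "Abi", "abi," and "abi@USER" all count.
ABI = re.compile(r"\babi\b", re.IGNORECASE)


def kwic(text, width=20):
    """Return (left context, match, right context) for each hit of 'abi'."""
    hits = []
    for m in ABI.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(), right))
    return hits


tweets = [
    "@USER loool....abi, when u dey commot that side?",
    "Abi..RT @USER: @USER its dirty!!!",
    "RT @USER: Retweet If you are proud of ur language ☺",
]
concordance = [hit for t in tweets for hit in kwic(t)]
```

Separating left and right context makes it easy to sort hits by position, for instance to compare tweet-initial Abi with clause-final abi.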
16. Summary
• Twitter can be used to collect language data
from a variety of sources
• Combination of linguistic, demographic and
interactional data enables new forms of
research
• Technical challenges must be overcome
• Legal/ethical issues should be carefully
considered from the outset