Twitter as a data source for (socio)linguistic research

Talk on the potential of Twitter data for linguistic research, held at the Freiburg Institute for Advanced Study (FRIAS) at the invitation of Christian Mair. Thanks for having me!

Published in: Education

  1. Twitter as a data source for (socio)linguistic research
     Cornelius Puschmann, Berlin School of Library and Information Science / Humboldt Institute for Internet and Society
     Universität Freiburg, 29 November
  2. This talk
     1. Framing the issue: Big Data in the Humanities and Social Sciences
     2. A very brief introduction to Twitter
     3. Technical requirements
     4. Legal and ethical issues
     5. Sample data: a corpus of Nigerian pidgin
  3. "Big Data"
     The proliferation of social media makes large volumes of data available to researchers, leading to new approaches:
     • digital methods (Rogers, 2009)
     • cultural analytics (Manovich, 2007)
     • computational social science (Lazer et al., 2009)
  4. Examples of "Big Data"-style research
     • artistic trends in online art communities (Manovich, 2011)
     • cooperation and collaboration in Wikipedia edit wars (Yasseri, 2012)
     • tracing the geographical spread of neologisms via Twitter (Eisenstein et al., 2012; 44 million tweets, 500k users)
  5. Features of Twitter
     • messages restricted to 140 characters
     • semi-synchronous
     • mostly public
     • content presented as a stream
     • used to spread news and to have (semi-)public conversations
     • native features: retweeting, hashtags, @ messages, picture linking
  6. Extracting Twitter data
     [Diagram: HTTP request ("return all data from a given user/hashtag/geolocation/...") · Application Programming Interface (API) · Data*]
  7. Software
     Collection:
     • The Archivist (Windows desktop software)
     • yourTwapperKeeper (webserver required)
     • 140kit.com (web-based platform for researchers)
     Analysis:
     • Excel, Open Office Calc, SPSS, R, Google Docs ...
     Visualization:
     • (Excel, OO Calc, R), Gephi, NodeXL
  8. Legal and ethical issues
     • Consider ethical issues when collecting (cf. AoIR Ethics Guidelines)
     • Anonymize all data (cf. European Data Protection Directive)
     • Don't share raw data (cf. Twitter Terms of Service)
     • Publish only excerpts/summary statistics
  9. Example: A Twitter corpus of Nigerian pidgin
     • collected data since August 17th
     • used tweets from and to three users based in Abuja, Nigeria
     • 8,151 tweets from 357 different users
     • corpus contains both language data and social graph
  10. Sample tweets
     wiztalib: @USER: @wiztalib lolz,make I do leave dat town jor..BM no dey exist again uthink so,I miss oooooRT @USER: Som pple ask me, whr av bin l8ly, de tot I fell off, nobody cansave me, de playn in d backgrnd, I dont backdwn, so dnt ...
     albertteslim: @USER d same tin wev bin hearin...xamz till further notice... @USER mai fada nd mai moda
     lil_tenuche: LMFAO @USER: @lil_tenuche I wan enter keffi 2day sef... I wan write my exams dia since unibuja dey dull me RT @USER: Retweet If you are proud of ur language ☺
  11. Sample tweets (with abi)
     @USER guy u be fool o.dey use my acct tweet rubish ba....ur brain dy shake abi@USER: @USER abi,hw is lifefyn sweet ow jtownAbi@USER: Wish her well bruv....RT @USER: A wish?@USER: Watch out ppl! Goin to be performing wit Wizzy tonyt :DIf you dont know wont you shut up?abi are u a learner@USER loool....abi, when u dey commot that side?U wey knw@USER: Hmmmm!!! D usual abi?@USER: This night.....Lol...abi@USER: Take cover!! RT @USER: Watchu gon do when shit hits the fan?Lol...just missing u@USER: *raised eyebrow* kabir?? One can now follow twice? Abi what? RT @USER: @USER pls ff bckdear :DI wish o@USER: Kissed u n neva called?@USER: Do u knw wat he did?@USER: Abi...@USER: Oya frgive jorAbi...@USER: Oya frgive jor@USER: @USER Im really angryLol...abi@USER: Na Ideba tinz oh@USER: Ileya ti ya oLol abi@USER: A 100% is much,at least 50% will do.@USER: Never trust a human being 100%Hmmm okies@USER: Not really @USER: I see uve joined #TeamNoSleep abi @USERI see uve joined #TeamNoSleep abi @USERLoool@USER: Tweetpic my boobs abi? U wee tey for thereAbi@USER: 3 jst in one nyte. Wow dats splendid.RT @USER: @USER yaaaaaaaay...pls ff @USER ...shes our bday mateAbi...@USER: @USER saying nothing, and wishing you had?Abi@USER: I cant dull myslf gaskiyaHmmm dats true o! Lemme see ur hand sef@USER: @USER lol...why dnt u believe me, abi u see ring 4 my hand?Yes o! :D@USER: U abi?@USER: Okene ben 10@USER: @USER abi,hw is lifefyn sweet ow jtown@USER abi,hw is lifeAbi..RT @USER: @USER its dirty!!!U̶̲̥̅̊ dnt knw abi RT @USER: @USER but y?
  12. Conversational network between three users
  13. Summary
     • Twitter can be used to collect language data from a variety of sources
     • Combination of linguistic, demographic and interactional data enables new forms of research
     • technical challenges must be overcome
     • legal/ethical issues should be carefully considered from the outset
  14. Thank you for your attention!
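
The processing steps touched on in the slides (anonymizing user names as @USER, using Twitter's native features such as retweets and hashtags, deriving a conversational network, and pulling out tweets containing a word such as "abi") could be sketched roughly as follows. This is an illustration under assumptions, not the workflow used for the talk: the function names, the regular expressions, and the three example tweets are all hypothetical, and the tweets are represented simply as (author, text) pairs.

```python
import re
from collections import Counter

# Assumed, simplified patterns; real tweets may need more careful handling.
MENTION = re.compile(r"@\w+")
HASHTAG = re.compile(r"#\w+")
RT_MARK = re.compile(r"\bRT @")


def anonymize(text):
    """Replace every @mention with the placeholder @USER."""
    return MENTION.sub("@USER", text)


def features(text):
    """Extract a few surface features from the raw tweet text."""
    return {
        "is_retweet": RT_MARK.search(text) is not None,
        "mentions": MENTION.findall(text),
        "hashtags": HASHTAG.findall(text),
    }


def mention_edges(tweets):
    """Count (author, mentioned user) pairs as weighted directed edges."""
    edges = Counter()
    for author, text in tweets:
        for mention in MENTION.findall(text):
            edges[(author, mention.lstrip("@"))] += 1
    return edges


def contains_token(text, token):
    """True if `token` occurs as a whole word, ignoring case."""
    pattern = rf"\b{re.escape(token)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Hypothetical (author, text) pairs, invented for illustration only.
tweets = [
    ("user_a", "@user_b abi, how is life? #TeamNoSleep"),
    ("user_b", "Lol abi RT @user_a: I wish o"),
    ("user_c", "@user_a d usual abi?"),
]

edges = mention_edges(tweets)
abi_tweets = [anonymize(text) for _, text in tweets if contains_token(text, "abi")]
```

The edge counts are computed from the raw text, before anonymization, because the network needs the actual account names; only the anonymized text would then be used for excerpts. An edge list of this kind could be exported for a tool such as Gephi or NodeXL.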
