Twitter as a data source for (socio)linguistic research

Talk on the potential of Twitter data for linguistic research, held at the Freiburg Institute for Advanced Study (FRIAS) at the invitation of Christian Mair. Thanks for having me!

  1. Twitter as a data source for (socio)linguistic research
     Cornelius Puschmann
     Berlin School of Library and Information Science / Humboldt Institute for Internet and Society
     Universität Freiburg, 29 November
  2. This talk
     1. Framing the issue: Big Data in the Humanities and Social Sciences
     2. A very brief introduction to Twitter
     3. Technical requirements
     4. Legal and ethical issues
     5. Sample data: a corpus of Nigerian pidgin
  3. "Big Data"
     The proliferation of social media makes large volumes of data available to researchers, leading to new approaches:
     • digital methods (Rogers, 2009)
     • cultural analytics (Manovich, 2007)
     • computational social science (Lazer et al., 2009)
  4. Examples of "Big Data"-style research
     • artistic trends in online art communities (Manovich, 2011)
     • cooperation and collaboration in Wikipedia edit wars (Yasseri, 2012)
     • tracing the geographical spread of neologisms via Twitter (Eisenstein et al., 2012; 44 million tweets, 500k users)
  5. Features of Twitter
     • messages restricted to 140 characters
     • semi-synchronous
     • mostly public
     • content presented as stream
     • used to spread news, have (semi-)public conversations
     • native features: retweeting, hashtags, @ messages, picture linking
  6. Extracting Twitter data
     Application Programming Interface (API)
     HTTP request: "return all data from a given user/hashtag/geolocation/..."
     → Data*
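The request step on this slide can be illustrated with a minimal sketch. The endpoint URL and the parameter names (`user`, `hashtag`, `geocode`) are hypothetical placeholders, not taken from the talk or from Twitter's documentation; the sketch only shows how selection criteria become the query string of an HTTP GET request and does not contact any server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only;
# a real API defines its own URL scheme and usually requires authentication.
BASE_URL = "https://api.example.com/tweets/search"


def build_request_url(user=None, hashtag=None, geocode=None):
    """Return a GET URL that selects tweets by user, hashtag and/or location."""
    params = {}
    if user:
        params["user"] = user
    if hashtag:
        # Store the tag without the leading '#'.
        params["hashtag"] = hashtag.lstrip("#")
    if geocode:
        params["geocode"] = geocode
    if not params:
        raise ValueError("at least one selection criterion is required")
    # urlencode percent-encodes values so they are safe inside a URL.
    return BASE_URL + "?" + urlencode(params)


url = build_request_url(hashtag="#TeamNoSleep")
```

In practice the collection tools listed on the next slide wrap this step, so researchers rarely build such URLs by hand.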
  7. Software
     Collection:
     • The Archivist (Windows desktop software)
     • yourTwapperKeeper (webserver required)
     • 140kit.com (web-based platform for researchers)
     Analysis:
     • Excel, OpenOffice Calc, SPSS, R, Google Docs, ...
     Visualization:
     • (Excel, OO Calc, R), Gephi, NodeXL
  8. Legal and ethical issues
     • Consider ethical issues when collecting (cf. AoIR Ethics Guidelines)
     • Anonymize all data (cf. European Data Protection Directive)
     • Don't share raw data (cf. Twitter Terms of Service)
     • Publish only excerpts/summary statistics
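One way to approach the "anonymize all data" point is to mask user handles in the tweet text, which matches the `@USER` placeholder visible in the sample tweets later in the deck. The following is a minimal sketch, assuming handles consist of letters, digits and underscores; the function name and the regular expression are illustrative choices, not the procedure used for the corpus.

```python
import re

# Assumption: a handle is '@' followed by letters, digits or underscores.
MENTION = re.compile(r"@[A-Za-z0-9_]+")


def anonymize(text, placeholder="@USER"):
    """Replace every @handle in a tweet with a neutral placeholder."""
    return MENTION.sub(placeholder, text)


masked = anonymize("@lil_tenuche I wan enter keffi 2day sef")
```

Masking handles is only one step: names, places and other identifying details in the text itself would need separate treatment, and the simple pattern above also rewrites e-mail-like strings.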
  9. Example: A Twitter corpus of Nigerian pidgin
     • collected data since August 17th
     • used tweets from and to three users based in Abuja, Nigeria
     • 8,151 tweets from 357 different users
     • corpus contains both language data and social graph
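Figures such as "8,151 tweets from 357 different users" are the kind of summary statistics that can be published without sharing raw data. A minimal sketch of how they might be computed follows, assuming the corpus is available as (author, text) pairs; that layout and the function name are hypothetical.

```python
from collections import Counter


def corpus_summary(records):
    """Summarize a corpus given as an iterable of (author, text) pairs."""
    per_user = Counter(author for author, _ in records)
    return {
        "tweets": sum(per_user.values()),       # total number of tweets
        "users": len(per_user),                 # number of distinct authors
        "top_users": per_user.most_common(3),   # most active authors
    }


# Toy records with made-up authors, only to show the call.
summary = corpus_summary([("u1", "a"), ("u2", "b"), ("u1", "c")])
```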
  10. Sample tweets
      wiztalib
      @USER: @wiztalib lolz,make I do leave dat town jor..BM no dey exist again u think so,I miss ooooo RT @USER: Som pple ask me, whr av bin l8ly, de tot I fell off, nobody can save me, de playn in d backgrnd, I don't backdwn, so dnt ...
      albertteslim
      @USER d same tin we'v bin hearin...'xamz till further notice'... @USER mai fada nd mai moda
      lil_tenuche
      LMFAO @USER: @lil_tenuche I wan enter keffi 2day sef... I wan write my exams dia since unibuja dey dull me RT @USER: Retweet If you are proud of ur language ☺
  11. Sample tweets (with abi)
      @USER guy u be fool o.dey use my acct tweet rubish ba....ur brain dy shake abi @USER: @USER abi,hw is lifefyn sweet ow jtown Abi@USER: Wish her well bruv....RT @USER: A wish?@USER: Watch out ppl! Goin to be performing wit Wizzy tonyt :D If you don't know won't you shut up?abi are u a learner @USER loool....abi, when u dey commot that side? U wey knw@USER: Hmmmm!!! D usual abi?@USER: This night..... Lol...abi@USER: Take cover!! RT @USER: Watchu gon' do when shit hits the fan? Lol...just missing u@USER: *raised eyebrow* kabir?? One can now follow twice? Abi what? RT @USER: @USER pls ff bck dear :D I wish o@USER: Kissed u n neva called?@USER: Do u knw wat he did?@USER: Abi...@USER: Oya frgive jor Abi...@USER: Oya frgive jor@USER: @USER I'm really angry Lol...abi@USER: Na Ideba tinz oh@USER: Ileya ti ya o Lol abi@USER: A 100% is much,at least 50% will do.@USER: Never trust a human being 100% Hmmm okies@USER: Not really @USER: I see u've joined #TeamNoSleep abi @USER I see u've joined #TeamNoSleep abi @USER Loool@USER: Tweetpic my boobs abi? U wee tey for there Abi@USER: 3 jst in one nyte. Wow dats splendid.RT @USER: @USER yaaaaaaaay...pls ff @USER ...she's our bday mate Abi...@USER: @USER saying nothing, and wishing you had? Abi@USER: I can't dull myslf gaskiya Hmmm dats true o! Lemme see ur hand sef@USER: @USER lol...why dnt u believe me, abi u see ring 4 my hand? Yes o! :D@USER: U abi?@USER: Okene ben 10 @USER: @USER abi,hw is lifefyn sweet ow jtown @USER abi,hw is life Abi..RT @USER: @USER its dirty!!! U̶̲̥̅̊ dnt knw abi RT @USER: @USER but y?
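A selection like "tweets with abi" can be produced with a simple keyword filter. The sketch below is an assumption about how such a subset might be drawn, not the method used for the slide: it matches `abi` as a whole word, ignoring case, so that forms such as "Abi@USER" are found while "kabir" is not.

```python
import re

# \b marks a word boundary, so 'abi' inside a longer word is not matched.
ABI = re.compile(r"\babi\b", re.IGNORECASE)


def tweets_with(pattern, tweets):
    """Return the tweets in which the compiled pattern occurs at least once."""
    return [tweet for tweet in tweets if pattern.search(tweet)]


sample = [
    "@USER abi,hw is life",
    "*raised eyebrow* kabir??",
    "Abi@USER: Wish her well bruv",
    "Hmmm okies",
]
hits = tweets_with(ABI, sample)
```

Given the nonstandard spelling in the data, a single pattern will miss variant spellings; the filter is a starting point for manual inspection, not a complete retrieval method.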
  12. Conversational network between three users
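A conversational network of this kind can be derived from who addresses whom. The following is a minimal sketch, assuming tweets are stored as (author, text) pairs and that every @mention counts as one directed tie; both the data layout and that definition of a tie are assumptions, since the slide does not say how the network was built.

```python
import re
from collections import Counter

# Capture the handle without the leading '@'.
MENTION = re.compile(r"@([A-Za-z0-9_]+)")


def mention_edges(tweets):
    """Count directed (author, mentioned user) pairs across all tweets."""
    edges = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            # Skip self-mentions, which add no conversational tie.
            if target.lower() != author.lower():
                edges[(author, target)] += 1
    return edges


# Toy data with made-up handles, only to show the call.
edges = mention_edges([
    ("a", "@b hello @c"),
    ("b", "@a hi"),
    ("a", "@b again"),
    ("c", "@c note to self"),
])
```

The resulting weighted edge list is the usual input format for network tools such as Gephi or NodeXL, which appear on the software slide.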
  13. Summary
      • Twitter can be used to collect language data from a variety of sources
      • Combination of linguistic, demographic and interactional data enables new forms of research
      • technical challenges must be overcome
      • legal/ethical issues should be carefully considered from the outset
  14. Thank you for your attention!
