Facebook as a data capture site: Techniques, Traps, Terms & Conditions (by Dr. Bernie Hogan)


Speaker: Dr. Bernie Hogan, Oxford Internet Institute, University of Oxford

Organizers: the “Modelling and Mining of Network Information Spaces” Initiative at Dalhousie University

Sponsors: SSHRC & MITACS


  • 1. Facebook as a data capture site: Techniques, Traps, Terms and Conditions. Dr Bernie Hogan, Research Fellow, Oxford Internet Institute, University of Oxford. Dalhousie School of Information Management, Thursday, 24 March 2011
  • 2. Sampled 10 million ties. “When I shared the image with others within Facebook, it resonated with many people. It’s not just a pretty picture, it’s a reaffirmation of the impact we have in connecting people, even across oceans and borders.” Butler (2010)
  • 5. Examples of Facebook Applications for Academic Research
      • Capture Facebook user and network data, then push to an embedded survey (via SurveyMonkey / SurveyGizmo)
      • Compare products and claims based on friend recommendations
      • Study relationship strength against trace data
  • 6. Personal networks on Facebook
      • Simplest: Bidirectional friendship ties
      • More complex: Tagging relationships, messages, likes, comments
  • 7. Personal Network Structures
      [Figure: three ego-network diagrams, each showing Ego surrounded by Alters]
      • Ego’s Friend List (Degree 1.0)
      • Alter-Alter ties (“Degree 1.5”)
      • Alter’s Friend List (“Degree 2.0”)
  • 8. Visualizing Binary Personal Networks
      [Figure: the same 13-node personal network drawn twice, once with the ego node (“Me”) and once without]
      • With Ego: Artificially well-connected.
      • Without Ego: Separate components visible.
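
The with-ego and without-ego views can be compared directly by dropping ego from the edge list and counting connected components. The following is a minimal sketch in plain Python; the alters and ties are made up for illustration and are not the network drawn on the slide.

```python
from collections import defaultdict

def components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, found = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first walk from an unvisited node collects one component.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        found.append(group)
    return found

# Hypothetical ego network: "Me" is tied to every alter by construction.
alters = [1, 2, 3, 4, 5, 6]
alter_ties = [(1, 2), (2, 3), (4, 5)]  # alter-alter ties only
ego_ties = [("Me", alter) for alter in alters]

with_ego = components(["Me"] + alters, ego_ties + alter_ties)
without_ego = components(alters, alter_ties)

print(len(with_ego))     # ego links every alter into a single component
print(len(without_ego))  # clusters and isolates separate once ego is removed
```

Because ego is tied to every alter by definition, keeping ego in the graph hides exactly the structure the without-ego view is meant to reveal.
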
  • 9. Testing Tie Strength and Network Position, Hogan (2008)

                              Facebook   Personal   Overlap
      Nodes                        186         27        19
      Edges                        920         40        15
      Density                    0.053      0.114     0.088
      Components (n > 2)             1          1         2
      Dyads                          1          0         0
      Isolates                      10          3         0

                                                      z         p
      Betweenness
        with isolates                            -3.553    <0.001
        without isolates                         -4.279    <0.001
      Degree (Wilcoxon)
        with isolates                            -1.789     0.074
        without isolates                         -2.475     0.013
      Degree (t-test with unequal variances)
        with isolates                               n/a     0.045
        without isolates                            n/a     0.021
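
The density figures on this slide can be reproduced from the node and edge counts. The sketch below assumes the standard density of an undirected simple graph, 2E / (N(N - 1)); the slide does not state the formula, but the rounded values agree with it.

```python
def density(nodes, edges):
    """Density of an undirected simple graph: observed ties / possible ties."""
    return 2 * edges / (nodes * (nodes - 1))

# Node and edge counts as reported on the slide.
networks = {"Facebook": (186, 920), "Personal": (27, 40), "Overlap": (19, 15)}
densities = {name: round(density(n, e), 3) for name, (n, e) in networks.items()}
print(densities)
```
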
  • 10. Taste, Ties & Time (Lewis et al., 2008). Friend distributions dwarf picture tagging behavior. The latter may thus be seen as an example of stronger ties.*
  • 11. Testing Tie Strength and Trace Data (Beyond Position), Gilbert and Karahalios (2009)
  • 12. The Facebook API
      • An extremely extensive system for allowing legitimate access to Facebook data.
      • Not as thorough as scraping, but not as ethically problematic either.
      • Gives you data, but no self-destruct command for it, so review the Facebook terms!
  • 13. Getting Data out of Facebook
      [Diagram labels: RESTapi, FQL, OpenGraph; Facebook’s Data Store; XML, JSON]
      • RESTapi: Oldest, deprecated, only provides basic data now.
      • FQL: Best for speed; complicated queries not easily embedded into applications.
      • OpenGraph: Simplest, some public information, pre-cached.
  • 14. [Figure: a friendship matrix over Homer, Marge, Bart, Lisa and Maggie, with unknown cells marked “?”, flattened into two parallel lists of names and a list of True/False query results]
      • List 1: Row labels (Homer, Homer, Homer, Homer, Marge, Marge, Marge, Bart, Bart, Lisa)
      • List 2: Column labels (the other member of each pair)
      • Restapi: queryResults = facebook.areFriends(List1, List2)
      • Cost: O(n²)
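
The two parallel lists on this slide are the upper triangle of the friendship matrix, flattened pair by pair. A minimal sketch of building them follows; `pair_lists` is a hypothetical helper name, and the code only prepares the inputs. It does not call `areFriends`, whose exact signature is not shown beyond the slide.

```python
from itertools import combinations

def pair_lists(names):
    """Flatten the upper triangle of a friendship matrix into two parallel lists."""
    pairs = list(combinations(names, 2))
    list1 = [row for row, _ in pairs]  # row labels
    list2 = [col for _, col in pairs]  # column labels
    return list1, list2

family = ["Homer", "Marge", "Bart", "Lisa", "Maggie"]
list1, list2 = pair_lists(family)

# n * (n - 1) / 2 pairs are needed, which is why the approach is O(n^2).
print(len(list1))
for row, col in zip(list1, list2):
    print(row, col)
```

For a network of a few hundred friends this already means tens of thousands of pairs, which is the motivation for the faster FQL route.
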
  • 15. Sample FQL part I
      SELECT uid, name FROM user WHERE uid IN (SELECT uid2 FROM friend WHERE uid1 = me())
  • 16. Sample FQL part I
      SELECT uid, name FROM user WHERE uid IN (SELECT uid2 FROM friend WHERE uid1 = me())
      FQL is much like SQL, but with some syntactic sugar and restrictions on joins, and it can be very peculiar in what it returns. No query returns more than approximately 5k edges.
  • 17. Sample FQL part II
      Solution: Break down the query into chunks, and iterate through the chunks.
      Advantage I: more robust. Advantage II: can be parallelized.
  • 18. Sample FQL part II
      Solution: Break down the query into chunks, and iterate through the chunks.
      Advantage I: more robust. Advantage II: can be parallelized.
      Example (in Python):
          # Assumes friendsArr is a sorted list of the user's friend uids,
          # long enough that every index used below exists.
          u1lo = 0
          u1hi = u1lo + 1000
          u2lo = 0
          u2hi = u2lo + 100
          # Iterate through the counts, selecting friends only in these buckets.
          query = ("SELECT uid1, uid2 FROM friend "
                   "WHERE uid1 IN (SELECT uid2 FROM friend WHERE uid1 = me() "
                   "AND uid2 >= %(u1lo)s AND uid2 < %(u1hi)s) "
                   "AND uid2 IN (SELECT uid1 FROM friend WHERE uid2 = me() "
                   "AND uid1 >= %(u2lo)s AND uid1 < %(u2hi)s)"
                   % {"u1lo": friendsArr[u1lo], "u1hi": friendsArr[u1hi],
                      "u2lo": friendsArr[u2lo], "u2hi": friendsArr[u2hi]})
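
The slide shows a single bucket; the iteration over all buckets could look roughly like the sketch below. It only generates query strings and does not contact Facebook. The helper names (`bucket_bounds`, `chunked_queries`) and the friend ids are hypothetical, uids are assumed to be integers, and the bucket sizes default to the 1000 and 100 used on the slide.

```python
def bucket_bounds(sorted_uids, size):
    """Yield half-open (low, high) uid bounds covering sorted_uids in chunks of `size`."""
    for start in range(0, len(sorted_uids), size):
        chunk = sorted_uids[start:start + size]
        # high is one past the last uid so that low <= uid < high keeps it.
        yield chunk[0], chunk[-1] + 1

QUERY = (
    "SELECT uid1, uid2 FROM friend "
    "WHERE uid1 IN (SELECT uid2 FROM friend WHERE uid1 = me() "
    "AND uid2 >= %(u1lo)s AND uid2 < %(u1hi)s) "
    "AND uid2 IN (SELECT uid1 FROM friend WHERE uid2 = me() "
    "AND uid1 >= %(u2lo)s AND uid1 < %(u2hi)s)"
)

def chunked_queries(friend_uids, size1=1000, size2=100):
    """Yield one FQL string per pair of buckets, covering every alter-alter pair."""
    uids = sorted(friend_uids)
    for u1lo, u1hi in bucket_bounds(uids, size1):
        for u2lo, u2hi in bucket_bounds(uids, size2):
            yield QUERY % {"u1lo": u1lo, "u1hi": u1hi, "u2lo": u2lo, "u2hi": u2hi}

# Hypothetical friend ids, purely for illustration.
friends = [101, 105, 230, 512, 777]
queries = list(chunked_queries(friends, size1=3, size2=2))
print(len(queries))
print(queries[0])
```

Each generated query is independent of the others, which is what makes the approach both restartable after a failure and easy to run in parallel.
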
  • 19. OAuth 2.0 Explained
      • Token based system.
      • Requires a server to receive the token.
      • Allows for granular permissions when requesting a token.
      • This is presently necessary for sensitive information and particularly information from friends.
      • Your application should never see a Facebook username or password.
      [Diagram, between Client (User) and Application: 1. Goes to Application; 2. Redirects Client; 3. Asks for credentials on client’s behalf; 4. Forwards OAuth token; 5. Asks for data; 6. Returns Data]
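
Steps 2 and 4 of the diagram can be illustrated with a generic OAuth 2.0 authorization-code exchange. This is a sketch under stated assumptions, not Facebook's actual endpoints: the URLs, client id and scope name are all hypothetical, and the parameter names follow the general OAuth 2.0 framework rather than anything shown on the slide.

```python
import secrets
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical values: none of these come from the slides.
AUTHORIZE_URL = "https://auth.example.com/oauth/authorize"
CLIENT_ID = "my-app-id"
REDIRECT_URI = "https://research.example.org/callback"

def authorization_url(scopes, state):
    """Step 2: the URL the application redirects the client to."""
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": " ".join(scopes),  # granular permissions are requested here
        "state": state,             # random value echoed back, to detect forgery
    }
    return AUTHORIZE_URL + "?" + urlencode(params)

def read_callback(callback_url, expected_state):
    """Step 4: the application's server reads what the provider forwarded."""
    query = parse_qs(urlparse(callback_url).query)
    if query.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: callback was not triggered by this request")
    return query["code"][0]

state = secrets.token_urlsafe(16)
url = authorization_url(["read_friends"], state)

# Simulated redirect back to the application's server.
callback = REDIRECT_URI + "?" + urlencode({"code": "abc123", "state": state})
code = read_callback(callback, state)
print(url)
print(code)
```

The user's credentials are only ever entered on the provider's page, which is why the application needs a server to receive the callback but never sees a username or password.
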
  • 20. Implementations for Personal Networks
      • NameGenWeb:
      • NetVizz:
      • Pajek:
      • ORA: software.html
  • 21. NameGenWeb, by Bernie Hogan. In beta longer than Gmail. Version 2.0 with OAuth coming... Hosted at the Oxford Internet Institute.
  • 22. NameGen Desktop. Interview tool. Pretty much busted without new OAuth support.
  • 23. Netvizz
      • Gets groups, minimal attributes and egonets in GDF format.
      • Some issues with really large networks.
      • Plays well with Gephi.
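
For readers who have not seen GDF, it is a plain-text format with a node section followed by an edge section. The sketch below writes a tiny egonet in that layout, assuming the common `nodedef>` / `edgedef>` headers with `VARCHAR` columns; the exact columns Netvizz emits are not shown on the slide, and the names here are made up.

```python
def to_gdf(nodes, edges):
    """Serialize (name, label) nodes and (source, target) edges as GDF text.

    Assumes names and labels contain no commas or quotes; real data would
    need quoting.
    """
    lines = ["nodedef>name VARCHAR,label VARCHAR"]
    lines += ["%s,%s" % (name, label) for name, label in nodes]
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR")
    lines += ["%s,%s" % (source, target) for source, target in edges]
    return "\n".join(lines) + "\n"

# Hypothetical egonet without ego: three alters, two alter-alter ties.
nodes = [("a1", "Alter One"), ("a2", "Alter Two"), ("a3", "Alter Three")]
edges = [("a1", "a2"), ("a2", "a3")]

gdf_text = to_gdf(nodes, edges)
print(gdf_text)
```
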
  • 24. Facebook, Big Data and Controversy
      • Taste, Ties & Time (Kevin Lewis et al.)
      • American Cultures (Pete Warden)
      • The Facebook 100 (Amanda Traud et al.)
  • 25. Taste, Ties & Time (Lewis et al.)
      • What happened:
          • An attempt to create the drosophila of social networks by merging academic records with Facebook data.
          • Great early results, and powerful publications in Social Networks, Journal of Computer-Mediated Communication and the American Journal of Sociology.
          • Students at Harvard College realized that this data was being taken without their permission, which raised massive privacy concerns.
      • How to get their data:
          • Go to Harvard and hang out with Nicholas Christakis: it’s only accessible through a single terminal managed by him.
  • 26. American Cultures (Pete Warden)
      • What happened:
          • After leaving Apple, Pete Warden wanted to do some big data processing, both as a community gesture and as an attention-garnering exercise. He captured friend links from all publicly available profiles (about 40% of Facebook’s population).
          • Released several graphics showing clusters based on interest in the U.S., such as ‘stayathomia’.
          • Was planning on releasing the data to the public (saying that by spidering he was not bound by Facebook’s terms of service).
      • How to get the data:
          • Unless Warden is lying, you cannot. It was destroyed after Facebook threatened legal action. However, a similar but smaller data set captured afterwards is floating around BitTorrent.
  • 27. The Facebook 100 (Mason Porter et al.)
      • What happened:
          • Through a contact [formerly] at Facebook, Mason Porter was able to access significant amounts of personal data for the first 100 schools on Facebook in 2005.
          • Porter released the data to the research community but accidentally left in Facebook IDs, which made reidentification easy.
          • Still committed to releasing the data for academic purposes.
      • How to get their data:
          • Mason has since taken it down, but it is still available in a pseudonymized format.
  • 28. Summary
      • Facebook is an excellent source of data, but there are both legal and ethical concerns, and technical constraints.
      • Facebook personal networks provide good intuitions about social networks generally.
      • Networks can be built from friendship ties as well as derived statistics.
      • Insights are germane and emerging (egonets cluster around contexts and can show tie strength).
  • 29. Relevant References
      • Brooks, Brandon, Howard T Welser, Bernie Hogan and Scott Titsworth. 2011. “Socioeconomic Status Updates: College Students, Family SES, and Emergent Social Capital in Facebook Networks.” Information Communication & Society. Forthcoming.
      • Butler, Paul. 2010. “Visualizing Friendships.” Facebook Engineering Note. Available at:
      • Gilbert, Eric, and Karrie Karahalios. 2009. “Predicting tie strength with social media.” In CHI ’09: Proceeding of the twenty-seventh annual SIGCHI conference on Human factors in computing systems, New York, NY, USA: ACM.
      • Hogan, Bernie. 2008. “A comparison of on and offline networks through the Facebook API.” Qualitative Methods in the Social Sciences 2: Communication Networks on the Web. Amsterdam. December 2008.
      • Hogan, Bernie. 2010. “Visualizing and Interpreting Facebook Networks.” In Analyzing Social Media Networks with NodeXL, eds. D Hansen, Marc A Smith, and Ben Shneiderman. Burlington, MA: Morgan Kaufmann, p. 165-180.
      • Lewis, K et al. 2008. “Tastes, ties, and time: A new social network dataset using Facebook.com.” Social Networks 30(4): 330-342.
      • Lewis, K, J Kaufman, and N Christakis. 2008. “The Taste for Privacy: An Analysis of College Student Privacy Settings in an Online Social Network.” Journal of Computer-Mediated Communication 14(1): 79-100.
      • Marlow, Cameron. 2009. “Maintained Relationships on Facebook.” Overstated. Available at:
      • Traud, Amanda L. et al. 2008. “Comparing Community Structure to Characteristics in Online Collegiate Social Networks.” 17. Available at:
  • 30. Thank You. Bernie Hogan, Research Fellow, OII. Twitter: @blurky