Facebook as a data capture site: Techniques, Traps, Terms & Conditions (by Dr. Bernie Hogan)


Speaker: Dr. Bernie Hogan, Oxford Internet Institute, University of Oxford

Organizers: SocialMediaLab.ca and the “Modelling and Mining of Network Information Spaces” Initiative at Dalhousie University

Sponsors: SSHRC & MITACS


  1. Facebook as a data capture site: Techniques, Traps, Terms and Conditions
     Dr Bernie Hogan, Research Fellow, Oxford Internet Institute, University of Oxford
     24.03.11, Dalhousie School of Information Management, Thursday, 24 March 2011
  2. Sampled 10 million ties. “When I shared the image with others within Facebook, it resonated with many people. It’s not just a pretty picture, it’s a reaffirmation of the impact we have in connecting people, even across oceans and borders.” Butler (2010)
  5. Examples of Facebook Applications for Academic Research
     • Capture Facebook user and network data, then push to an embedded survey (via SurveyMonkey / SurveyGizmo)
     • Compare products and claims based on friend recommendations
     • Study relationship strength against trace data
  6. Personal networks on Facebook
     • Simplest: bidirectional friendship ties
     • More complex: tagging relationships, messages, likes, comments
  7. Personal Network Structures
     [Figure: three panels built around the ego and its alters: Ego’s Friend List (Degree 1.0); Alter-Alter ties (“Degree 1.5”); Alter’s Friend List (“Degree 2.0”).]
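The three structures above can be sketched with plain Python sets. This is a minimal illustration, not how NameGenWeb or Netvizz store their data: it assumes you already hold each person's friend list as a set of (hypothetical) IDs, with every friendship recorded in both people's lists.

```python
def degree_1_0(ego, friends):
    """Ego's friend list: one tie from the ego to each alter."""
    return {frozenset((ego, alter)) for alter in friends[ego]}


def degree_1_5(ego, friends):
    """Add alter-alter ties, keeping only ties among the ego's friends."""
    alters = friends[ego]
    edges = degree_1_0(ego, friends)
    for a in alters:
        for b in friends.get(a, set()) & alters:
            edges.add(frozenset((a, b)))
    return edges


def degree_2_0(ego, friends):
    """Add each alter's full friend list, including people the ego doesn't know."""
    edges = degree_1_0(ego, friends)
    for a in friends[ego]:
        for b in friends.get(a, set()):
            edges.add(frozenset((a, b)))  # duplicates collapse in the set
    return edges


# Hypothetical toy data; every friendship appears in both people's lists.
friends = {
    "ego": {"a", "b", "c"},
    "a": {"ego", "b", "x"},
    "b": {"ego", "a"},
    "c": {"ego", "y"},
    "x": {"a"},
    "y": {"c"},
}

for build in (degree_1_0, degree_1_5, degree_2_0):
    print(build.__name__, len(build("ego", friends)))
# degree_1_0 3
# degree_1_5 4
# degree_2_0 6
```

Degree 2.0 grows fastest because every alter contributes its entire friend list, not just the ties that point back into the ego's network.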
  8. Visualizing Binary Personal Networks
     [Figure: the same network of 13 numbered alters drawn with and without the ego (“Me”).]
     • With ego: artificially well-connected.
     • Without ego: separate components visible.
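The with/without-ego contrast is easy to reproduce. The sketch below, on a hypothetical six-alter network, drops every tie touching the ego and counts connected components with a breadth-first search; with the ego present, everything collapses into a single component.

```python
from collections import deque


def drop_ego(edges, ego):
    """Remove every tie that involves the ego."""
    return [(u, v) for u, v in edges if ego not in (u, v)]


def components(nodes, edges):
    """Connected components of an undirected graph via breadth-first search."""
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        result.append(component)
    return result


# Hypothetical toy egonet: "me" is tied to every alter.
alters = [1, 2, 3, 4, 5, 6]
alter_ties = [(1, 2), (2, 3), (4, 5)]
edges = [("me", a) for a in alters] + alter_ties

print(len(components(["me"] + alters, edges)))  # 1
print([sorted(c) for c in components(alters, drop_ego(edges, "me"))])
# [[1, 2, 3], [4, 5], [6]]
```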
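  9. Testing Tie Strength and Network Position (Hogan, 2008)

     | Measure            | Facebook | Personal | Overlap |
     |--------------------|----------|----------|---------|
     | Nodes              | 186      | 27       | 19      |
     | Edges              | 920      | 40       | 15      |
     | Density            | 0.053    | 0.114    | 0.088   |
     | Components (n > 2) | 1        | 1        | 2       |
     | Dyads              | 1        | 0        | 0       |
     | Isolates           | 10       | 3        | 0       |

     | Test                                                  | z      | p       |
     |-------------------------------------------------------|--------|---------|
     | Betweenness, with isolates                            | -3.553 | < 0.001 |
     | Betweenness, without isolates                         | -4.279 | < 0.001 |
     | Degree (Wilcoxon), with isolates                      | -1.789 | 0.074   |
     | Degree (Wilcoxon), without isolates                   | -2.475 | 0.013   |
     | Degree (t-test, unequal variances), with isolates     | n/a    | 0.045   |
     | Degree (t-test, unequal variances), without isolates  | n/a    | 0.021   |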
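As a quick check on the table, the density of an undirected network is the number of observed edges divided by the N(N-1)/2 possible ones. This sketch recomputes the three density figures from the node and edge counts above, assuming simple undirected graphs with no self-loops.

```python
def density(nodes, edges):
    """Observed ties divided by possible ties in a simple undirected graph."""
    possible = nodes * (nodes - 1) / 2
    return edges / possible


# Node and edge counts from the Hogan (2008) table.
networks = {"Facebook": (186, 920), "Personal": (27, 40), "Overlap": (19, 15)}
for name, (n, e) in networks.items():
    print(f"{name}: {density(n, e):.3f}")
# Facebook: 0.053
# Personal: 0.114
# Overlap: 0.088
```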
  10. Taste, Ties & Time (Lewis et al., 2008)
      Friend distributions dwarf picture-tagging behavior. The latter may thus be seen as an example of stronger ties.*
  11. Testing Tie Strength and Trace Data (Beyond Position)
      Gilbert and Karahalios (2009)
  12. The Facebook API
      • An extremely extensive system for allowing legitimate access to Facebook data.
      • Not as thorough as scraping, but not as problematic ethically.
      • Gives data, but not a self-destruct command, so review the Facebook terms!!!
  13. Getting Data out of Facebook
      • RESTapi: oldest, deprecated; only provides basic data now.
      • FQL: best for speed; complicated queries are not easily embedded into applications.
      • OpenGraph: simplest; some public information, pre-cached.
      [Figure: RESTapi, FQL and OpenGraph as routes into Facebook’s data store, with XML and JSON as output formats.]
  14. [Figure: a friendship matrix for Homer, Marge, Bart, Lisa and Maggie, with unknown cells (“?”) to be filled in by a query.]
      List 1 (row labels): Homer, Homer, Homer, Homer, Marge, Marge, Marge, Bart, Bart, Lisa
      List 2 (column labels): Maggie, Lisa, Bart, Marge, Maggie, Lisa, Bart, Maggie, Lisa, Maggie
      Query results: one True/False value per pair
      RESTapi: queryResults = facebook.areFriends(List1, List2), which is O(n²)
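The O(n²) cost comes from having to enumerate every unordered pair of friends before calling areFriends. The sketch below builds the two parallel lists with `itertools.combinations`. Since the REST API is deprecated, `are_friends_local` is a hypothetical offline stand-in for `facebook.areFriends`, not the real client call, and the ties in it are invented.

```python
from itertools import combinations


def pair_lists(uids):
    """Build two parallel lists that cover every unordered pair exactly once."""
    list1, list2 = [], []
    for a, b in combinations(uids, 2):
        list1.append(a)
        list2.append(b)
    return list1, list2


def are_friends_local(list1, list2, ties):
    """Hypothetical stand-in for the REST call: one boolean per pair."""
    return [frozenset((a, b)) in ties for a, b in zip(list1, list2)]


family = ["Homer", "Marge", "Bart", "Lisa", "Maggie"]
list1, list2 = pair_lists(family)
ties = {frozenset(("Homer", "Marge")), frozenset(("Bart", "Lisa"))}  # invented
results = are_friends_local(list1, list2, ties)

print(len(list1))                         # 10 pairs for 5 people
print(len(pair_lists(range(186))[0]))     # 17205 pairs for 186 friends
```

With 186 friends (the Facebook network in the Hogan 2008 table) the pair count already reaches 17,205, which is why batching these lookups matters.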
  16. Sample FQL part I
      SELECT uid,name FROM user WHERE uid IN (SELECT uid2 FROM friend WHERE uid1 = me())
      FQL is pretty much like SQL, except with some syntactic sugar and restrictions on joins, and it can be very peculiar in what it returns. It does not return more than approximately 5k edges for any query.
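Because FQL is close to SQL, the nested query on this slide can be tried out locally. The sketch below runs the same query shape against an in-memory SQLite database with toy `user` and `friend` tables. The table contents are invented, the `?` parameter stands in for FQL's `me()`, and `ORDER BY` is added only to make the output deterministic; it illustrates the query's semantics, not Facebook's backend or its 5k-edge cap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (uid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE friend (uid1 INTEGER, uid2 INTEGER);
INSERT INTO user VALUES (1, 'Ego'), (2, 'Ann'), (3, 'Ben'), (4, 'Cat');
-- Each friendship is stored in both directions so either column can be queried.
INSERT INTO friend VALUES (1, 2), (2, 1), (1, 3), (3, 1), (2, 4), (4, 2);
""")

ME = 1  # plays the role of FQL's me()
rows = conn.execute(
    "SELECT uid, name FROM user WHERE uid IN "
    "(SELECT uid2 FROM friend WHERE uid1 = ?) ORDER BY uid",
    (ME,),
).fetchall()
print(rows)  # [(2, 'Ann'), (3, 'Ben')]
```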
  18. Sample FQL part II
      Solution: break down the query into chunks, and iterate through the chunks.
      Advantage I: more robust. Advantage II: can be parallelized.
      Example (in Python):
          u1lo = 0; u1hi = u1lo + 1000; u2lo = 0; u2hi = u2lo + 100
          # Iterate through the counts, selecting friends only in these buckets.
          # Assumes friendsArr holds the ego's friend uids in sorted order.
          query = ("SELECT uid1, uid2 FROM friend WHERE uid1 IN (SELECT uid2 FROM friend WHERE uid1 = me() AND uid2 >= %(u1lo)s AND uid2 < %(u1hi)s) AND uid2 IN (SELECT uid1 FROM friend WHERE uid2 = me() AND uid1 >= %(u2lo)s AND uid1 < %(u2hi)s)"
                   % {"u1lo": friendsArr[u1lo], "u1hi": friendsArr[u1hi], "u2lo": friendsArr[u2lo], "u2hi": friendsArr[u2hi]})
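The chunking idea generalizes into a loop over every pair of buckets. This is a sketch assuming integer uids and the slide's bucket sizes (1000 and 100); `chunk_bounds` and `chunked_queries` are hypothetical helper names, and the code only builds FQL strings, it does not send them anywhere.

```python
from itertools import product

# Same FQL template as the slide, with bucket bounds filled in per chunk.
TEMPLATE = (
    "SELECT uid1, uid2 FROM friend "
    "WHERE uid1 IN (SELECT uid2 FROM friend WHERE uid1 = me() "
    "AND uid2 >= %(u1lo)s AND uid2 < %(u1hi)s) "
    "AND uid2 IN (SELECT uid1 FROM friend WHERE uid2 = me() "
    "AND uid1 >= %(u2lo)s AND uid1 < %(u2hi)s)"
)


def chunk_bounds(sorted_uids, size):
    """Yield half-open (lo, hi) uid bounds, one pair per chunk of `size` friends."""
    n = len(sorted_uids)
    for start in range(0, n, size):
        end = start + size
        lo = sorted_uids[start]
        # The next chunk's first uid is this chunk's exclusive upper bound;
        # the final chunk uses one past the largest uid.
        hi = sorted_uids[end] if end < n else sorted_uids[-1] + 1
        yield lo, hi


def chunked_queries(friend_uids, size1=1000, size2=100):
    """Return one FQL string per (bucket 1, bucket 2) combination."""
    uids = sorted(friend_uids)
    queries = []
    for (u1lo, u1hi), (u2lo, u2hi) in product(chunk_bounds(uids, size1),
                                              chunk_bounds(uids, size2)):
        queries.append(TEMPLATE % {"u1lo": u1lo, "u1hi": u1hi,
                                   "u2lo": u2lo, "u2hi": u2hi})
    return queries


queries = chunked_queries(range(1, 2501))
print(len(queries))  # 3 buckets of 1000 x 25 buckets of 100 = 75
```

Each query covers a disjoint slice of the friend-by-friend space and stays small, so the strings can be sent one at a time or handed to a worker pool (for example `concurrent.futures.ThreadPoolExecutor`) through whatever FQL client you use; that is where the slide's two advantages come from.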
  19. OAuth 2.0 Explained
      [Figure: OAuth flow between the client (user), the application and Facebook: 1. goes to the application; 2. redirects client; 3. asks for credentials on the client’s behalf; 4. forwards OAuth token; 5. asks for data; 6. returns data.]
      • Token-based system.
      • Requires a server to receive the token.
      • Allows for granular permissions when requesting a token.
      • This is presently necessary for sensitive information, and particularly for information from friends.
      • Your application should never see a Facebook username or password.
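Steps 2 and 4 of the diagram boil down to URLs that the application builds. The sketch below uses Python's `urllib.parse.urlencode` with the standard OAuth 2.0 parameter names (RFC 6749). The two endpoint URLs are assumptions based on Facebook's historical OAuth dialog and token endpoints, and `APP_ID`, `SECRET`, `CODE`, the callback URL and the `email` permission are placeholders; check the current Facebook documentation before relying on any of them.

```python
import secrets
from urllib.parse import urlencode

# Assumed endpoints (Facebook's historical OAuth dialog and token URLs).
AUTHORIZE_URL = "https://www.facebook.com/dialog/oauth"
TOKEN_URL = "https://graph.facebook.com/oauth/access_token"


def authorization_url(client_id, redirect_uri, scopes, state):
    """Step 2: where the application redirects the user's browser."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        # Assumed comma-separated, as in Facebook's historical docs;
        # RFC 6749 itself separates scopes with spaces.
        "scope": ",".join(scopes),
        "state": state,  # random value, checked on return to prevent CSRF
        "response_type": "code",
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


def token_request_url(client_id, client_secret, redirect_uri, code):
    """Step 4: server-side exchange of the returned code for an access token."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "redirect_uri": redirect_uri,
        "code": code,
    }
    return TOKEN_URL + "?" + urlencode(params)


state = secrets.token_urlsafe(16)
url = authorization_url("APP_ID", "https://example.org/callback", ["email"], state)
print(url)
```

The client secret only ever appears in the server-side token request, and at no point does the application handle the user's Facebook username or password.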
  20. Implementations for Personal Networks
      • NameGenWeb: http://apps.facebook.com/namegenweb
      • NetVizz: http://apps.facebook.com/netvizz
      • Pajek: http://vlado.fmf.uni-lj.si/pub/networks/pajek/
      • ORA: http://www.casos.cs.cmu.edu/projects/ora/software.html
  21. NameGenWeb
      By Bernie Hogan. In beta longer than Gmail. Version 2.0 with OAuth coming... Hosted at the Oxford Internet Institute.
  22. NameGen Desktop
      Interview tool. Pretty much busted without new OAuth support.
  23. Netvizz
      • Gets groups, minimal attributes and egonets in GDF format.
      • Some issues with really large networks.
      • Plays well with Gephi.
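GDF, the format Netvizz exports, is a plain-text node table followed by an edge table, which Gephi can open directly. The sketch below writes a small network in that layout by hand. It is not Netvizz's exporter: it covers only the basic `nodedef>`/`edgedef>` headers with a single label attribute, and it assumes labels contain no commas or quotes.

```python
def to_gdf(nodes, edges):
    """Serialize a network as GDF text.

    nodes: dict mapping node id -> label (labels assumed free of commas/quotes)
    edges: iterable of (id, id) pairs
    """
    lines = ["nodedef>name VARCHAR,label VARCHAR"]
    for node_id, label in nodes.items():
        lines.append(f"{node_id},{label}")
    lines.append("edgedef>node1 VARCHAR,node2 VARCHAR")
    for u, v in edges:
        lines.append(f"{u},{v}")
    return "\n".join(lines) + "\n"


# Hypothetical three-person egonet.
gdf = to_gdf({1: "Ego", 2: "Ann", 3: "Ben"}, [(1, 2), (1, 3), (2, 3)])
print(gdf, end="")
# nodedef>name VARCHAR,label VARCHAR
# 1,Ego
# 2,Ann
# 3,Ben
# edgedef>node1 VARCHAR,node2 VARCHAR
# 1,2
# 1,3
# 2,3
```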
  24. Facebook, Big Data and Controversy
      • Taste, Ties & Time (Kevin Lewis et al.)
      • American Cultures (Pete Warden)
      • The Facebook 100 (Amanda Traud et al.)
  25. Taste, Ties & Time (Lewis et al.)
      • What happened:
        • An attempt to create the drosophila of social networks by merging academic records with Facebook data.
        • Great early results, and powerful publications in Social Networks, Journal of Computer-Mediated Communication and the American Journal of Sociology.
        • Students at Harvard College realized their data was being taken without their permission, raising massive privacy concerns.
      • How to get their data:
        • Go to Harvard and hang out with Nicholas Christakis: it’s only accessible through a single terminal managed by him.
  26. American Cultures (Pete Warden)
      • What happened:
        • After leaving Apple, Pete Warden wanted to do some big data processing, both as a community gesture and as an attention-garnering exercise. He captured friend links from all publicly available profiles (about 40% of Facebook’s population).
        • Released several graphics showing interest-based clusters in the U.S., such as ‘stayathomia’.
        • Was planning on releasing the data to the public (saying that by spidering he was not bound by Facebook’s terms of service).
      • How to get the data:
        • Unless Warden is lying, you cannot. It was destroyed after Facebook threatened legal action. However, a similar but smaller data set captured afterwards is floating around BitTorrent.
  27. The Facebook 100 (Mason Porter et al.)
      • What happened:
        • Through a contact [formerly] at Facebook, Mason Porter was able to access significant amounts of personal data for the first 100 schools on Facebook in 2005.
        • Porter released the data to the research community but accidentally left in Facebook IDs, which made reidentification easy.
        • Still committed to releasing the data for academic purposes.
      • How to get their data:
        • Mason has since taken it down, but it is still available in a pseudonymized format.
  28. Summary
      • Facebook is an excellent source of data, but there are both legal/ethical concerns and technical constraints.
      • Facebook personal networks provide good intuitions about social networks generally.
      • Networks can be built from friendship ties as well as from derived statistics.
      • Insights are germane and emerging (egonets cluster around contexts and can show tie strength).
  29. Relevant References
      • Brooks, Brandon, Howard T Welser, Bernie Hogan and Scott Titsworth. 2011. “Socioeconomic Status Updates: College Students, Family SES, and Emergent Social Capital in Facebook Networks.” Information Communication & Society. Forthcoming.
      • Butler, Paul. 2010. “Visualizing Friendships.” Facebook Engineering Note. Available at: http://www.facebook.com/note.php?note_id=469716398919
      • Gilbert, Eric, and Karrie Karahalios. 2009. “Predicting tie strength with social media.” In CHI ’09: Proceeding of the twenty-seventh annual SIGCHI conference on Human factors in computing systems. New York, NY, USA: ACM.
      • Hogan, Bernie. 2008. “A comparison of on and offline networks through the Facebook API.” Qualitative Methods in the Social Sciences 2: Communication Networks on the Web. Amsterdam. December 2008.
      • Hogan, Bernie. 2010. “Visualizing and Interpreting Facebook Networks.” In Analyzing Social Media Networks with NodeXL, eds. D Hansen, Marc A Smith, and Ben Shneiderman. Burlington, MA: Morgan Kaufmann, pp. 165-180.
      • Lewis, K et al. 2008. “Tastes, ties, and time: A new social network dataset using Facebook.com.” Social Networks 30(4): 330-342.
      • Lewis, K, J Kaufman, and N Christakis. 2008. “The Taste for Privacy: An Analysis of College Student Privacy Settings in an Online Social Network.” Journal of Computer-Mediated Communication 14(1): 79-100.
      • Marlow, Cameron. 2009. “Maintained Relationships on Facebook.” Overstated. Available at: http://overstated.net/2009/03/09/maintained-relationships-on-facebook
      • Traud, Amanda L. et al. 2008. “Comparing Community Structure to Characteristics in Online Collegiate Social Networks.” 17. Available at: http://arxiv.org/abs/0809.0690
  30. Thank You
      Bernie Hogan, Research Fellow, OII
      http://people.oii.ox.ac.uk/hogan
      Twitter: @blurky
      bernie.hogan@oii.ox.ac.uk
