DIY basic Facebook data mining

Some elementary principles and procedures for Facebook data mining, combining the Graph API with OpenRefine to parse the JSON output. Two beer brands are analyzed with respect to their active fans and engagement.
The second part is dedicated to the interest positioning technique (as pioneered by PerfectCrowd) and to what OutWit Hub can do as a substitute for more sophisticated techniques and apps.

Published in: Social Media, Technology
  • So, STEM/MARK, you are basically surprised by how many babies born each day.
  • Hey Alexios,
    thanks! These are indeed the very basics, but I'm always surprised how many people find this new and useful.
  • Great presentation, it is a very good quick approach for beginners to test some hypotheses. Thank you for the material

  1. 1. Pleasures of basic Facebook data shoveling Jan Fait STEM/MARK Guest Lecture at Charles University, Prague, 4.12.2013
  2. 2. Today we are going to talk about : 1. Why A tiny philosophical corner 2. How No programming, just copy pasting
  3. 3. Why would I even try to mine FB data myself? The Boring part The Fun part Why are we doing this? What‘s in it for you? What are other ways to do this? How is it done?
  4. 4. What is a facebook like worth for your business?
  5. 5. Here‘s why. Sample questions: In what ways are my fans like my other customers? What do I actually know about my fans and followers on top of their age? Can I group my followers into segments? Can I target my followers based on what they (are) like? Which ones are creating the most activity? What on earth are all the other ones doing? How similar/different is my competitor‘s fanbase?
  6. 6. Built-in insights are fine for fanpage managers, but not for research Who could have guessed..
  7. 7. Limitations of FB research? External validity Research in social media tells you little about life outside social media Facebook self vs. Real self Sampling Only some profiles are public > Is there enough data to make claims about my fanbase? Organic environment Network engineers keep changing stuff so you are in constant need of adjustment
  8. 8. OK, but there are other ways.. Bambillion ! Always posted by a lady in her 40s
  9. 9. Indeed, there are ways: Ask professionals and pay them accordingly (see below) Setup a social media login or create an app (a rather good investment) Use ready-made tools and solutions (and pay for the useful ones) DO IT YOURSELF – PARTISAN STYLE
  10. 10. Come Buy Recommend Return Buy more What does a brand manager want from a customer?
  11. 11. Come Engage (Share) Return Engage more What does a fanpage manager want from a fan?
  12. 12. How is it done?
  13. 13. Obstacles ahead: Facebook developers are smart, so the road is a bit thorny. Good tools are usually not free. Open source tools are usually not as good. It‘s mostly fine legally.
  14. 14. … but I am not a technical type. a) Find someone who is b) Break it down into little steps c) Your chance to stand out
  15. 15. Tools to use (where facebook meets google and google meets microsoft) Facebook‘s own Graph API https://developers.facebook.com/tools/explorer OpenRefine http://openrefine.org/download.html Engineered at Google Inc., formerly named Google Refine MS Excel / iOS Numbers Programs > MS Office / ??
  16. 16. Subjects to examine (pick any fanpage or group or event) https://www.facebook.com/Gambrinus.cz
  17. 17. Subjects to examine (pick any fanpage or group or event) https://www.facebook.com/PilsnerUrquellCzech
  18. 18. Stand-off. Brand: Pilsner Urquell vs. Gambrinus. Product image: more expensive, high-end beer (quality, tradition, national heritage, craftsmanship) vs. widely and wildly consumed cheaper beer (fun, shared moments, soccer). Number of fans: 204 734 vs. 47 566. Number of posts in 2013: 415 vs. 425. Not really competitors, they have the same mothership!
  19. 19. Hypothesis time H1 : Their active fanbase consists of less than 10% of the total fans. H2 : There is more than 10% overlap in their active fanbases. H3 : Gambrinus and Pilsner Urquell have the same engagement per post. H4 : The interest positioning will show a small affinity, as beer is widely appreciated across the population.
  20. 20. Action !
  21. 21. Step 1 - Do not fear the Graph API https://developers.facebook.com
  22. 22. Step 1 - Do not fear the Graph API https://developers.facebook.com/tools/
  23. 23. Step 1 - Do not fear the Graph API Access_token ! Result window Fields selector https://developers.facebook.com/tools/explorer
  24. 24. Step 1 – Facebook is nothing but a couple of big tables https://developers.facebook.com/docs/reference/fql
  25. 25. Step 1 – The JSON result format (JavaScript object notation) Graph API gives you a result in JSON Format. Visually disturbing yet convenient format used in web applications. Wait and see how OpenRefine handles it.. No, not this Json
  26. 26. Step 2 – Making a simple Graph API query Get the id of the fanpage. There are many ways to do it, e.g.: 1) Click on the page‘s profile picture 2) Look in the address bar and cut the last number before „type“: 146991996743
  27. 27. Step 2 – Making a simple Graph API query 1) Get a fresh access_token Important, otherwise you will only get a handful 2) And get data from your own timeline 123455687/posts?post_id&limit=50
  28. 28. Step 2 – Making a more complex query 1) Repeat with our Gambrinus.cz fanpage 2) And add some more fields – query likes and comments, increase limit, reduce timespan with a unix timestamp (135..) 146991996743/posts?fields=likes,comments&limit=20000&since=1356998400 (from 1.1.2013)
  29. 29. Step 3 – Build a string to post the same query in browser address bar A) URL : https://graph.facebook.com/ B) Query : 146991996743/posts?fields=likes,comments&limit=20000&since=1356998400 C) Access token : &access_token=XXXXXXXXX……and so on Put together A+B+C : https://graph.facebook.com/146991996743/posts?fields=likes,comments&limit=20000&since=1356998400&access_token=XXXXX
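For those who would rather not glue the string together by hand, the A+B+C assembly from Step 3 can be sketched in a few lines of Python. This is only an illustration: the function names `unix_timestamp` and `build_posts_url` are made up for this example, and the token stays the same placeholder as on the slide.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

GRAPH_URL = "https://graph.facebook.com/"  # part A from the slide


def unix_timestamp(year, month, day):
    """Midnight UTC of the given date as a unix timestamp (seconds)."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def build_posts_url(page_id, access_token, since, limit=20000,
                    fields=("likes", "comments")):
    """Assemble URL + query + access token, mirroring A+B+C on the slide.

    Hypothetical helper, not part of any Facebook library.
    """
    params = {
        "fields": ",".join(fields),
        "limit": limit,
        "since": since,
        "access_token": access_token,  # placeholder, use your own fresh token
    }
    # safe="," keeps the comma in "likes,comments" readable instead of %2C
    return f"{GRAPH_URL}{page_id}/posts?" + urlencode(params, safe=",")


since_2013 = unix_timestamp(2013, 1, 1)
url = build_posts_url("146991996743", "XXXXX", since_2013)
print(url)
```

Changing the page id is then a one-argument change, which is all Step 3 asks for when switching from one fanpage to another.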
  30. 30. Step 4 – Run OpenRefine 1) Run the programme (it opens in your browser) 2) Select Web Addresses
  31. 31. Step 5 – Paste your address into the field 1) Take our query https://graph.facebook.com/146991996743/posts?fields=likes,comments&limit=20000&since=1356998400&access_token=XXXXXXX 2) Paste here 3) Click next
  32. 32. Step 6 – Transform your result 1) Tell the programme that your result is JSON by clicking on „JSON Files“
  33. 33. Step 7 – Pick an individual node ! This is one „like“ on a post made by user Maggu Ka
  34. 34. Step 7 – Behold ! Click on „Create Project“ in the upper left and download data in Excel Sheet Be sure this does not exceed your „limit“ in the query, otherwise increase the limit
  35. 35. Back to Step 3 ! The only thing you need to change is the id – instead of Gambrinus, now try the Pilsner Urquell id Don‘t remember? https://www.youtube.com/watch?v=vUxdB-nl0Bw
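What OpenRefine does in Steps 6 and 7 (turning nested JSON into one row per like or comment) can also be sketched in Python. The JSON shape below is an assumption about how the posts result looked at the time of the lecture (a `data` list of posts, each with nested `likes` and `comments` lists); the sample record and the function name `flatten_interactions` are invented for illustration.

```python
import json


def flatten_interactions(response):
    """Turn a posts result into one flat row per like or comment.

    Assumes (not confirmed by the slides) this shape:
    {"data": [{"id": ..., "created_time": ...,
               "likes": {"data": [{"id": ..., "name": ...}]},
               "comments": {"data": [{"from": {"id": ...}, ...}]}}]}
    """
    rows = []
    for post in response.get("data", []):
        base = {"post_id": post["id"], "created_time": post.get("created_time")}
        for like in post.get("likes", {}).get("data", []):
            rows.append({**base, "kind": "like", "user_id": like["id"]})
        for comment in post.get("comments", {}).get("data", []):
            rows.append({**base, "kind": "comment",
                         "user_id": comment["from"]["id"]})
    return rows


# Invented sample record, only to show the flattening.
sample = json.loads("""
{"data": [
  {"id": "p1", "created_time": "2013-02-01T10:00:00+0000",
   "likes": {"data": [{"id": "u1", "name": "A"}, {"id": "u2", "name": "B"}]},
   "comments": {"data": [{"id": "c1", "from": {"id": "u1", "name": "A"},
                          "message": "hi"}]}},
  {"id": "p2", "created_time": "2013-02-02T10:00:00+0000"}
]}
""")
rows = flatten_interactions(sample)
for row in rows:
    print(row)
```

Each row corresponds to one node picked in Step 7; the `user_id` column is what the analysis slides work with.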
  36. 36. Analysis Note : The metrics chosen could be re- designed to reflect other stuff like time and location (sort of)
  37. 37. Engagement, like .. ehm, kiwi.. has layers Skin : All fans Core : Fans who interact regularly Inside : Number of fans who interact Sample question : Has my post attracted anyone outside the usual bunch of followers who simply like everything?
  38. 38. Make crude metrics of those layers Skin : All fans = 100% Core : Fans with more than 1 interaction / All fans = 2% Inside : Unique ids within interactions / All fans = 7% Tip : By messing around with the column named created_time you can see how your core fanbase has been losing and gaining interest in your posts and whether it kept interacting = compute a lifetime of a fan
  39. 39. Try it with real Gambrinus fanpage data 47 566 = 100% 575 interactors with more than 1 action = 1.2% (28% of all active fans) 2004 unique interactors = 4.2% Tip : What are these ratios among competitors ? Isn‘t that more important than the widely cited number of fans?? Are any of your fans also in the competitors core fanbase? Uhh, you nasty weasels !
  40. 40. And now the Pilsner Urquell 204 734 = 100% 715 interactors with more than 1 action = 0.35% (30% of all active fans) 2358 unique interactors = 1% Tip : What are these ratios among competitors ? Isn‘t that more important than the widely cited number of fans?? Are any of your fans also in the competitors core fanbase? Uhh, you nasty weasels !
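The layer ratios on the last few slides come from simple counting, which a spreadsheet handles fine, but here is a minimal Python sketch of the same arithmetic. It assumes you already have one user id per like or comment (for example the id column of the exported sheet); `layer_metrics` is a name made up for this example.

```python
from collections import Counter


def layer_metrics(interactor_ids, total_fans):
    """Crude layer metrics from a list with one user id per interaction."""
    counts = Counter(interactor_ids)
    unique = len(counts)                                 # fans who interacted at all
    repeat = sum(1 for n in counts.values() if n > 1)    # more than 1 interaction
    return {
        "unique_interactors": unique,
        "repeat_interactors": repeat,
        "unique_share_of_fans": unique / total_fans,
        "repeat_share_of_fans": repeat / total_fans,
        "repeat_share_of_active": repeat / unique if unique else 0.0,
    }


# Toy data: "a" and "c" interacted more than once, "b" only once.
metrics = layer_metrics(["a", "a", "b", "c", "c", "c"], total_fans=100)
print(metrics)

# The Gambrinus figures from the slide, recomputed as percentages.
print(round(2004 / 47566 * 100, 1), round(575 / 47566 * 100, 1))
```

The same function can be run once per fanpage, so the ratios of two brands line up side by side.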
  41. 41. Stand-off revisited. H1 rejected and H2 confirmed. Brand: Pilsner Urquell vs. Gambrinus. Number of fans: 204 734 vs. 47 566. Number of posts in 2013: 415 vs. 425. Number of active fans in 2013: 2358 / 1.1% vs. 2004 / 4.2%. Number of repeated interactions: 715 / 30% of active vs. 575 / 28% of active. Fanbase overlap: 5% of active. Variations : Share of all interactions created by the TOP 10% fans..
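The fanbase overlap boils down to intersecting the two sets of active user ids. The slides do not say which denominator was used for the overlap figure, so this sketch simply reports the shared count against each page's active fans separately; `active_overlap` is a hypothetical helper.

```python
def active_overlap(active_a, active_b):
    """Overlap between the active fans of two pages.

    The denominator is an assumption: both variants are returned because
    the slides do not specify which one was used.
    """
    a, b = set(active_a), set(active_b)
    shared = a & b
    return {
        "shared": len(shared),
        "share_of_a": len(shared) / len(a) if a else 0.0,
        "share_of_b": len(shared) / len(b) if b else 0.0,
    }


# Toy ids: two fans are active on both pages.
overlap = active_overlap(["u1", "u2", "u3", "u4"], ["u3", "u4", "u5"])
print(overlap)
```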
  42. 42. How to compute average engagement? 1) You may want to try to query the „insights“ table, but mostly no success for pages other than yours 2) Else you need all the posts with likes,comments (and shares) already aggregated https://graph.facebook.com/fql?q=select post_id, like_info,comment_info,share_info from stream where source_id=146991996743 and created_time>1356998400 and actor_id=146991996743 LIMIT 20000&access_token=XXXXX 3) Paste this query to OpenRefine like previously and work with Excel sheet from there Tip : Limit the type by adding type in(46,80,128,247) to the where clause so you don‘t get posts like „group created“
  43. 43. Stand-off again. H3 rejected. Brand: Pilsner Urquell vs. Gambrinus. Average engagement: 248 vs. 74. Median engagement: 144 vs. 29. 10% top-trimmed average: 169 (diff of 79) vs. 44 (diff of 30). This may look surprising, especially considering the active fanbase is more or less equal. Seems like the total fanbase does play a role. Tip : For more precise information, you may want to exclude the top 5% fans to see how much it changes
  44. 44. Study competitor‘s top posts https://www.facebook.com/PilsnerUrquellCzech/posts/10151304524945974 https://www.facebook.com/Gambrinus.cz/posts/10151581664231744 Tip : Take the URL of the page and add /posts/ and the post id you get from spreadsheet.
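The three engagement figures can be reproduced from a list with one engagement number per post (likes + comments + shares). The slides do not define the "10% top trimmed average" precisely, so this sketch assumes it means dropping the 10% of posts with the highest engagement and averaging the rest; the function names are invented for the example.

```python
import statistics


def top_trimmed_mean(values, trim=0.10):
    """Mean after dropping the top `trim` share of values (assumed definition)."""
    ordered = sorted(values)
    drop = int(len(ordered) * trim)          # number of top posts to discard
    kept = ordered[:len(ordered) - drop] if drop else ordered
    return statistics.mean(kept)


def engagement_summary(values):
    """Average, median and top-trimmed average engagement per post."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "top_trimmed_mean": top_trimmed_mean(values),
    }


# Toy data: ten posts with engagement 1..10.
summary = engagement_summary(list(range(1, 11)))
print(summary)
```

A large gap between the plain mean and the trimmed mean hints that a handful of viral posts carry the average.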
  45. 45. Some conclusions Followers have a lifespan, some are zombies, some have left Facebook. A large group of active followers is superior to a large zombie fanbase => Facebook edge rank has buried your posts for those guys anyway. You can make up metrics once you have the data > sometimes better to have the data first. The Graph API returns errors all the time, so don‘t be discouraged..
  46. 46. Step 4 – Sum it up. The dodgy part : Know more about the fans
  47. 47. The fans are well described by their favorites, likes, interests, ...
  48. 48. Facebook ids of fans + Web Scraper You have the Facebook id of someone => you can visit her profile. You have a web scraper (like OpenRefine) => you can visit all the profiles without actually browsing through them .. and download whatever the browser sees.. It is against the Facebook policies to scrape profile pages en masse, but it‘s „ok“ as a training exercise. Pete Warden scraped 200 000 000 FB profiles and they let the lawyers off the leash http://www.facebook.com/apps/site_scraping_tos_terms.php
  49. 49. Step 2 – Preparing data for Outwit Hub OutWit Hub is a free intelligent scraper (limited amounts of data) Prepare the links of Pilsner fans in a notepad file like below and File => Open the txt file in Outwit Hub http://download.cnet.com/OutWitHub/3000-11745_410846181.html
  50. 50. Step 3 – Creating a scraper in Outwit Hub Prepare a scraper 1) Go to the „scrapers“ tab 2) Click new 3) Name the scraper somehow 4) Do the rest as below Get everything starting with -- and ending with
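The idea behind the scraper's „Marker Before“ and „Marker After“ fields is plain text extraction between two strings. A rough Python equivalent is sketched below; the markers and the sample text are purely hypothetical, since the actual markers are not legible in the slide transcript.

```python
import re


def extract_between(page_text, marker_before, marker_after):
    """Return every piece of text found between the two markers."""
    pattern = re.escape(marker_before) + r"(.*?)" + re.escape(marker_after)
    return [m.strip() for m in re.findall(pattern, page_text, flags=re.DOTALL)]


# Hypothetical markers and page text, for illustration only.
sample_page = '<li class="fav"> Page A </li><li class="fav">Page B</li>'
found = extract_between(sample_page, '<li class="fav">', "</li>")
print(found)
```

If either marker is missing from the page the result is simply an empty list, which is the situation worth checking first when a scraper returns nothing.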
  51. 51. Step 4 – Running the scraper on a couple of links
  52. 52. Step 5 – Calculate Affinity Count occurrences of individual fanpages in the results and compare them to their occurrence in the total Czech Facebook population of 3 770 000 1) Natural affinity = Total fans of the page / 3 770 000 2) Pilsner affinity = Occurrences in results / Fans of Pilsner 3) Affinity ratio = the ratio of the two 4) Repeat for all fanpages 5) Bring up those where occurrence is the largest Tip : Take the URL of the page and add /posts/ and the post id you get from spreadsheet.
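The affinity steps translate into a short calculation. The sketch below makes two assumptions that the slide leaves open: „Fans of Pilsner“ is read as the number of scraped Pilsner fan profiles, and the affinity ratio is taken as Pilsner affinity divided by natural affinity. All names and the toy numbers are invented for illustration.

```python
from collections import Counter

CZECH_FB_POPULATION = 3_770_000  # figure quoted on the slide


def affinity_ratio(occurrences, sample_size, page_total_fans,
                   population=CZECH_FB_POPULATION):
    """Sample affinity divided by natural affinity (assumed direction)."""
    natural = page_total_fans / population   # step 1
    sample = occurrences / sample_size       # step 2
    return sample / natural                  # step 3


def rank_by_occurrence(liked_pages_per_profile, page_totals):
    """Steps 4 and 5: ratio for every page, most frequent pages first."""
    sample_size = len(liked_pages_per_profile)
    counts = Counter(page for pages in liked_pages_per_profile
                     for page in set(pages))
    return [(page, n, affinity_ratio(n, sample_size, page_totals[page]))
            for page, n in counts.most_common() if page in page_totals]


# Toy data: 4 scraped profiles and made-up total fan counts.
profiles = [["Page A", "Page B"], ["Page A"], ["Page A"], ["Page B"]]
totals = {"Page A": 377_000, "Page B": 1_885_000}
ranking = rank_by_occurrence(profiles, totals)
for page, n, ratio in ranking:
    print(page, n, round(ratio, 2))
```

A ratio above 1 means the page is more common among the scraped fans than in the general population, below 1 less common.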
  53. 53. Step 6 – Results (sample)
  54. 54. Step 6 – Troubleshooting a) Go to Preferences > Time Settings and make sure none of the sliders is „in the red“. That would result in frequent CAPTCHA checks on most protected servers.. b) Make sure your scraper is targeting the right domain c) Make sure your „Marker Before“ and „Marker After“ are actually present on the page.. d) It is becoming easier to program an app than to try to scrape a meaningful amount of data
  55. 55. Thank you. Now to your questions. fait@stemmark.cz www.stemmark.cz Credits for affinity idea : Work by Jan Schmid & Josef Šlerka Images : Photopin.com
  56. 56. Download all materials at : www.stemmark.cz/downloads/educ/fb_mining.zip By the way, Mark Zuckerberg likes Pilsner Urquell.
