Likes and Locations - Adventure in Social Data Mining


Presented by Gene Chuang at Q Tech Dinner at Lawry's Beverly Hills on 4/6/11

Published in: Technology
  1. Likes and Locations - Adventure in Social Data Mining
     Gene Chuang – Exec Dir of Social Eng, ATTi
     Masahji Stewart – Founder, Synctree
     Q CTO Dinner 4/6/11 – Lawry’s Beverly Hills, CA
  2. Dedication
  3. Background
  4.
  5. Social Local Mobile Loco
  6. Why Mine Social and Local Data?
     Signals to improve user experience
     Timely and “Placely”
     Engagement
     Provide value – save time, save money
     Opt In, Privacy
  7. Infrastructure
     Ruby on Rails for Web, Login and API
     Solr/Lucene for Search
     Hadoop for Data pipeline
     Hive for Ad Hoc queries on Hadoop
     Ruby ETL scripts
  8. OAuth 2
     OAuth 2 is an open protocol that lets users share private resources (e.g. photos, videos, contact lists) stored on one site with another site without handing out their username and password; instead, they hand out tokens.
     Think valet key.
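The valet-key idea can be sketched as the first step of an OAuth 2 authorization-code flow: the user's browser is sent to the provider with a URL naming the requesting app and the access it wants, and the provider later hands the app a token instead of the user's password. This is a minimal sketch in Python, not the Rails login code from the talk; the endpoint, client ID, redirect URI, and scope are hypothetical placeholders.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical endpoint for illustration; each real provider documents its own.
AUTHORIZE_ENDPOINT = "https://provider.example/oauth/authorize"

def build_authorization_url(client_id, redirect_uri, scope, state):
    """Return the URL the user's browser is sent to in order to grant access."""
    params = {
        "response_type": "code",   # ask the provider for an authorization code
        "client_id": client_id,    # identifies the requesting app
        "redirect_uri": redirect_uri,
        "scope": scope,            # the limited access being requested
        "state": state,            # opaque value echoed back to guard against CSRF
    }
    return AUTHORIZE_ENDPOINT + "?" + urlencode(params)

# All argument values below are made up.
url = build_authorization_url(
    "my-app-id", "https://app.example/callback", "user_likes", "xyz123"
)
query = parse_qs(urlparse(url).query)
print(url)
```

The app never sees the password: after the user approves, the provider redirects back with a short-lived code that the app exchanges for an access token.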
  9. YP.com Login/Registration
  10. Login Layer
  11. OAuth 2 Dance
  12. Semi-Social Search
  13.
  14.
  15. Social Mining - Extract
      Extract scripts pull data out of a database (such as Oracle), Hive, files, Facebook, or any other source and write JSON data to STDOUT.
      For example, to get the count of users signed up by day, along with the running total:
          $ RAILS_ENV=production sdm extract total-users-by-day 2011-02-14
          {"day":"2011-02-14","count":891,"total":1328636}
          {"day":"2011-02-15","count":1088,"total":1329724}
          {"day":"2011-02-16","count":1016,"total":1330740}
          {"day":"2011-02-17","count":1359,"total":1332099}
          {"day":"2011-02-18","count":1143,"total":1333242}
          {"day":"2011-02-19","count":660,"total":1333902}
          {"day":"2011-02-20","count":597,"total":1334499}
          {"day":"2011-02-21","count":874,"total":1335373}
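The shape of an extract step can be sketched as a function that turns per-day counts into JSON lines carrying a running total. This is an illustrative Python sketch, not the talk's Ruby `sdm` code: the function name is hypothetical, hard-coded rows stand in for a database query, and the starting total is derived from the sample output above (1328636 − 891).

```python
import json

def total_users_by_day(daily_counts, starting_total):
    """Yield one compact JSON line per day with that day's count and the running total."""
    total = starting_total
    for day, count in daily_counts:
        total += count
        yield json.dumps(
            {"day": day, "count": count, "total": total},
            separators=(",", ":"),  # no spaces, matching the output format on the slide
        )

# Hard-coded rows stand in for a query against the user database.
rows = [("2011-02-14", 891), ("2011-02-15", 1088)]
lines = list(total_users_by_day(rows, starting_total=1327745))
for line in lines:
    print(line)
# {"day":"2011-02-14","count":891,"total":1328636}
# {"day":"2011-02-15","count":1088,"total":1329724}
```

Emitting one JSON object per line is what lets the later transform and load steps be chained with ordinary shell pipes.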
  16. Social Mining - Transform
      Transform scripts take JSON data in via STDIN and print JSON data out to STDOUT.
      For example, to add YPIDs to existing Facebook likes and then keep only the location and YPID-matching fields:
          $ cat data/facebook_likes_2011_01_12.json | sdm transform add-ypid | sdm transform filter-fields name phone location ypid_best_match ypids ypid_match_results id
          {"name":"SnuggleBunnies","location":{"city":"Carlisle","zip":"45005","country":"United States","state":"OH"},"id":"106864249335072","ypid_match_results":[]}
          {"name":"AssociateConstruction","location":{"city":"Franklin","zip":"45005","country":"United States","street":"31 Eagle Court","state":"OH"},"id":"235027821862","ypid_best_match":"6197197","phone":"(937)-746-2932"}
          {"name":"PHBistro","location":{"city":"Franklin","zip":"45005","country":"United States","street":"543 S Main Street","state":"OH"},"id":"261032274490","ypid_best_match":"1120570","phone":"(937)-743-0069"}
          {"name":"Bullwinkle's Top Hat Bistro - Miamisburg, OH","location":{"city":"Miamisburg","zip":"45342-2312","country":"United States","street":"19 North Main St","state":"OH"},"id":"260274607015","ypid_best_match":"12255503","phone":"(937)-859-7677"}
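A field-filtering transform of the kind described above might look like the following Python sketch. It is not the `sdm` implementation: the function names are hypothetical, in-memory streams stand in for STDIN and STDOUT, and the `category` field in the sample record is invented purely to show a field being dropped.

```python
import io
import json

def filter_fields(record, fields):
    """Keep only the requested keys that are actually present in the record."""
    return {key: record[key] for key in fields if key in record}

def transform(stdin, stdout, fields):
    """Read JSON lines from stdin and write the filtered JSON lines to stdout."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        stdout.write(json.dumps(filter_fields(json.loads(line), fields)) + "\n")

# In-memory streams stand in for the shell pipe; "category" is an invented field.
source = io.StringIO('{"name":"PHBistro","id":"261032274490","category":"Restaurant"}\n')
sink = io.StringIO()
transform(source, sink, ["name", "phone", "id"])
result = json.loads(sink.getvalue())
print(result)
```

Requested fields that a record lacks (here `phone`) are silently omitted, which mirrors the sample output above, where records without a match carry no `ypid_best_match` key.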
  17. Social Mining - Load
      Load scripts read data from STDIN and load it into another system, such as a dashboard.
      For example, to load total Facebook accounts by day into the web dashboard:
          $ sdm extract total-fb-accounts-by-day 2011-01-10 | sdm load dashboard total_fb_accounts day total
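A load step reduces to reading JSON lines and handing two chosen fields to the target system. The Python sketch below only collects the points in a list, since the dashboard's interface is not shown in the slides; the function name is hypothetical, and the sample rows are borrowed from the total-users-by-day output rather than real Facebook account data.

```python
import io
import json

def load_series(stdin, x_field, y_field):
    """Collect (x, y) pairs from JSON lines; a real loader would send them to the dashboard."""
    points = []
    for line in stdin:
        if line.strip():
            record = json.loads(line)
            points.append((record[x_field], record[y_field]))
    return points

# An in-memory stream stands in for the output of an extract script.
piped = io.StringIO(
    '{"day":"2011-02-14","count":891,"total":1328636}\n'
    '{"day":"2011-02-15","count":1088,"total":1329724}\n'
)
points = load_series(piped, "day", "total")
print(points)
```

Passing the field names as arguments mirrors the trailing `day total` arguments of the `sdm load dashboard` command above.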
  18.
  19.
  20. Location Real-Time Fuzzy Matcher
      FP0 (exact match)
        Append LISTING_NAME + ADDRESS + CITY + PHONE
        Tokenize, normalize, strip punctuation, and stem
        Append tokens
      FP3 (fuzzy match)
        Append LISTING_NAME + ADDRESS + CITY + PHONE
        Tokenize, normalize, strip punctuation, and stem
        Remove tokens that are less than 2 chars long
        Remove upper-case short tokens (e.g., MD, CPA, DDS)
        Remove non-phone, short, numerical tokens
        Remove stopwords based on the 170 most frequent listing_name tokens
        Order tokens alphabetically
        Append tokens
      Example:
        Vijay K. Sammy CPA, LLC
        153 Orchard St
        Elmwood Park NJ - 07407
        (201) 218-0710

      FP Method  Value
      FP0        vijaiksammicpallc153orchardstelmwoodpark2012180710
      FP3        0710201218elmwoodorchardparksammistvijai
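The two fingerprints can be sketched in Python as follows. This is an illustration under stated assumptions, not the matcher from the talk: "short" is taken to mean three characters or fewer, the upper-case check runs before lower-casing, the stopword list is left empty because the slides do not give it, and `stem` is a toy stand-in for a real stemmer that only rewrites a trailing "y" to "i".

```python
import re

MAX_SHORT = 3            # assumption: "short" means three characters or fewer
STOPWORDS = frozenset()  # placeholder for the top-170 listing_name tokens (not in the slides)

def tokenize(text):
    """Split on anything that is not a letter or digit, which also strips punctuation."""
    return re.findall(r"[A-Za-z0-9]+", text)

def stem(token):
    """Toy stand-in for a real stemmer: only rewrites a trailing 'y' to 'i'."""
    if len(token) > 2 and token.endswith("y") and re.search(r"[aeiou]", token[:-1]):
        return token[:-1] + "i"
    return token

def normalize(token):
    return stem(token.lower())

def fp0(name, address, city, phone):
    """Exact-match fingerprint: every normalized token, concatenated in field order."""
    tokens = tokenize(" ".join([name, address, city, phone]))
    return "".join(normalize(t) for t in tokens)

def fp3(name, address, city, phone):
    """Fuzzy fingerprint: drop noisy tokens, then sort so word order no longer matters."""
    kept = []
    for field, is_phone in ((name, False), (address, False), (city, False), (phone, True)):
        for raw in tokenize(field):
            if len(raw) < 2:
                continue  # single characters such as a middle initial
            if raw.isalpha() and raw.isupper() and len(raw) <= MAX_SHORT:
                continue  # upper-case short tokens such as CPA or LLC
            if raw.isdigit() and not is_phone and len(raw) <= MAX_SHORT:
                continue  # short numbers outside the phone field, e.g. a street number
            token = normalize(raw)
            if token not in STOPWORDS:
                kept.append(token)
    return "".join(sorted(kept))

listing = ("Vijay K. Sammy CPA, LLC", "153 Orchard St", "Elmwood Park", "(201) 218-0710")
print(fp0(*listing))  # vijaiksammicpallc153orchardstelmwoodpark2012180710
print(fp3(*listing))  # 0710201218elmwoodorchardparksammistvijai
```

Because FP3 sorts its tokens and discards titles, initials, and street numbers, listings that differ only in those details collapse to the same fingerprint, while FP0 matches only when every token agrees.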
  21. Social Data
      Valid Facebook Access Tokens: 14K
      Total Unique Likes: 300K
      % Likes with Locations and/or Phones: 19%
      % Likes mapped to YPID: 38%
      Total Check-Ins: 530
  22. Social Mining Mother Lode
      Social Search
      Local Recommendation Engine
      Discovery Wall
      Top 10 List
      Social e-Commerce
      Online Presence Management – Social CRM
  23. Questions?