Likes and Locations - Adventure in Social Data Mining

Presented by Gene Chuang at Q Tech Dinner at Lawry's Beverly Hills on 4/6/11

Transcript

  • 1. Likes and Locations - Adventure in Social Data Mining
    Gene Chuang – Exec Dir of Social Eng, ATTi
    Masahji Stewart – Founder, Synctree
    Q CTO Dinner 4/6/11 – Lawry’s Beverly Hills, CA
  • 2. Dedication
  • 3. Background
  • 4.
  • 5. Social Local Mobile Loco
  • 6. Why Mine Social and Local Data?
    Signals to improve user experience
    Timely and “Placely”
    Engagement
    Provide value – save time, save money
    Opt-In, Privacy
  • 7. YP.com Infrastructure
    Ruby on Rails for Web, Login and API
    Solr/Lucene for Search
    Hadoop for Data pipeline
    Hive for Ad Hoc queries on Hadoop
    Ruby ETL scripts
  • 8. OAuth 2
    OAuth 2 is an open protocol that lets users share private resources (e.g., photos, videos, contact lists) stored on one site with another site without handing out their username and password; instead, they hand out tokens.
    Think Valet Key
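A minimal Ruby sketch of the "valet key" idea. It shows the first two steps of the OAuth 2 authorization-code flow as defined in RFC 6749. First, the browser is sent to the provider to ask for a code. Then that code is exchanged for an access token. The endpoint, client ID, redirect URI and scope names are hypothetical placeholders, not YP.com's or Facebook's real values. The HTTP POST for the second step is also left out.

```ruby
require "uri"
require "securerandom"

# Hypothetical provider endpoint and client registration, for illustration only.
AUTHORIZE_URL = "https://provider.example.com/oauth/authorize"
CLIENT_ID     = "my-client-id"
REDIRECT_URI  = "https://www.example.com/oauth/callback"

# Step 1: redirect the user's browser to the provider to request an
# authorization code (RFC 6749, section 4.1.1).
def authorization_url(state, scope)
  query = URI.encode_www_form(
    response_type: "code",
    client_id: CLIENT_ID,
    redirect_uri: REDIRECT_URI,
    scope: scope,
    state: state
  )
  "#{AUTHORIZE_URL}?#{query}"
end

# Step 2: the provider redirects back with ?code=...&state=...; the app
# exchanges the code for an access token (RFC 6749, section 4.1.3).
# The app only ever holds this token, never the user's password.
def token_request_params(code, client_secret)
  {
    grant_type: "authorization_code",
    code: code,
    redirect_uri: REDIRECT_URI,
    client_id: CLIENT_ID,
    client_secret: client_secret # body-based client auth; HTTP Basic is preferred
  }
end

# A random state value guards against CSRF; it must match on the callback.
state = SecureRandom.hex(16)
puts authorization_url(state, "user_likes")
```

Like a valet key, the token can be limited in scope and revoked without the user changing their password.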
  • 9. YP.com Login/Registration
  • 10. Login Layer
    A
  • 11. OAuth 2 Dance
  • 12. Semi-Social Search
  • 13.
  • 14.
  • 15. Social Mining - Extract
    Extract Script
    Extract scripts pull data out of a database (such as Oracle), Hive, files, the Facebook API, or any other source and print JSON to STDOUT.
    For example, to get the number of new users signed up each day, along with the running total:
    $ RAILS_ENV=production sdm extract total-users-by-day 2011-02-14
    {"day":"2011-02-14","count":891,"total":1328636}
    {"day":"2011-02-15","count":1088,"total":1329724}
    {"day":"2011-02-16","count":1016,"total":1330740}
    {"day":"2011-02-17","count":1359,"total":1332099}
    {"day":"2011-02-18","count":1143,"total":1333242}
    {"day":"2011-02-19","count":660,"total":1333902}
    {"day":"2011-02-20","count":597,"total":1334499}
    {"day":"2011-02-21","count":874,"total":1335373}
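The slides do not show how `sdm` works internally. The sketch below is a hypothetical extract step: a Hash of `Date => count` stands in for the database query, and the function emits one JSON object per line. Each `total` is the previous total plus that day's `count`. The starting total of 1,327,745 is worked out from the first row above (1,328,636 − 891).

```ruby
require "json"
require "date"

# Hypothetical extract step: turn per-day signup counts into JSON rows
# carrying a running total. In a real script the counts would come from
# Oracle, Hive, files, etc.
def total_users_by_day(daily_counts, starting_total)
  total = starting_total
  daily_counts.sort_by { |day, _| day }.map do |day, count|
    total += count
    { "day" => day.iso8601, "count" => count, "total" => total }
  end
end

counts = {
  Date.new(2011, 2, 14) => 891,
  Date.new(2011, 2, 15) => 1088,
  Date.new(2011, 2, 16) => 1016
}

# One JSON object per line on STDOUT, ready to pipe into a transform or load step.
total_users_by_day(counts, 1_327_745).each { |row| $stdout.puts(row.to_json) }
```

Run as written, this prints the first three lines of the output shown above.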
  • 16. Social Mining - Transform
    Transform scripts take JSON data in via STDIN and print JSON data out to STDOUT
    For example, to add YPIDs to existing Facebook likes and then keep only the name, phone, location, ID and YPID-matching fields:
    $ cat data/facebook_likes_2011_01_12.json | sdm transform add-ypid | sdm transform filter-fields name phone location ypid_best_match ypids ypid_match_results id
    {"name":"SnuggleBunnies","location":{"city":"Carlisle","zip":"45005","country":"United States","state":"OH"},"id":"106864249335072","ypid_match_results":[]}
    {"name":"AssociateConstruction","location":{"city":"Franklin","zip":"45005","country":"United States","street":"31 Eagle Court","state":"OH"},"id":"235027821862","ypid_best_match":"6197197","phone":"(937)-746-2932"}
    {"name":"PHBistro","location":{"city":"Franklin","zip":"45005","country":"United States","street":"543 S Main Street","state":"OH"},"id":"261032274490","ypid_best_match":"1120570","phone":"(937)-743-0069"}
    {"name":"Bullwinkle's Top Hat Bistro - Miamisburg, OH","location":{"city":"Miamisburg","zip":"45342-2312","country":"United States","street":"19 North Main St","state":"OH"},"id":"260274607015","ypid_best_match":"12255503","phone":"(937)-859-7677"}
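The `filter-fields` step above could look roughly like this hypothetical Ruby sketch. It reads JSON lines, keeps only the requested keys, and writes JSON lines back out. Keys a record lacks are simply left out, which matches the records above that have no `phone` or `ypid_best_match`. Kept keys stay in the record's own order, as in the sample output. The function name and signature are assumptions.

```ruby
require "json"
require "stringio"

# Hypothetical "filter-fields" transform: JSON lines in, JSON lines out.
# Only keys listed in `fields` survive; missing keys are not added.
def filter_fields(input, output, fields)
  input.each_line do |line|
    next if line.strip.empty? # tolerate blank lines in the stream
    record = JSON.parse(line)
    output.puts(record.select { |key, _| fields.include?(key) }.to_json)
  end
end

# Demo on one record from the output above; a CLI wrapper would pass
# $stdin, $stdout and ARGV instead.
sample = StringIO.new(<<~JSONL)
  {"name":"PHBistro","id":"261032274490","phone":"(937)-743-0069"}
JSONL
filter_fields(sample, $stdout, %w[name phone])
```

Because every step speaks JSON lines over STDIN/STDOUT, transforms like this can be chained with ordinary shell pipes.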
  • 17. Social Mining - Load
    Load
    Load scripts read data from STDIN and load it into another system, such as a dashboard.
    For example, to load total Facebook accounts by day into the web dashboard:
    $ sdm extract total-fb-accounts-by-day 2011-01-10 | sdm load dashboard total_fb_accounts day total
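A hypothetical sketch of the load step. It assumes `dashboard total_fb_accounts day total` means "store each line's `day` and `total` fields as a point in the `total_fb_accounts` series". That reading of the arguments is a guess. The `Dashboard` class is an in-memory stand-in for the real web dashboard's storage.

```ruby
require "json"
require "stringio"

# Hypothetical in-memory stand-in for the web dashboard's storage.
class Dashboard
  def initialize
    @series = Hash.new { |hash, metric| hash[metric] = [] }
  end

  # Upsert one point so re-running a load for the same day replaces
  # the old value instead of duplicating it.
  def record(metric, x, y)
    points = @series[metric]
    points.reject! { |px, _| px == x }
    points << [x, y]
    points.sort_by!(&:first)
  end

  def series(metric)
    @series[metric].dup
  end
end

# Sketch of "load dashboard METRIC X_FIELD Y_FIELD": each JSON line
# becomes one (x, y) point in the named series.
def load_dashboard(input, dashboard, metric, x_field, y_field)
  input.each_line do |line|
    next if line.strip.empty?
    row = JSON.parse(line)
    dashboard.record(metric, row.fetch(x_field), row.fetch(y_field))
  end
  dashboard
end
```

The upsert is a design choice for idempotency. Extracts are often re-run over overlapping date ranges, and replacing points by key keeps the chart free of duplicates.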
  • 18.
  • 19.
  • 20. Location Real-Time Fuzzy Matcher
    FP0 (exact match)
    Append LISTING_NAME + ADDRESS + CITY + PHONE
    Tokenize, normalize, strip punctuation, and stem
    Append tokens
    FP3 (fuzzy match)
    Append LISTING_NAME + ADDRESS + CITY + PHONE
    Tokenize, normalize, strip punctuation, and stem
    Remove tokens shorter than 2 characters
    Remove short upper-case tokens (e.g., MD, CPA, DDS)
    Remove short numeric tokens that are not part of the phone number
    Remove stopwords, taken from the 170 most frequent listing_name tokens
    Order tokens alphabetically
    Append tokens
    Example:
    Vijay K. Sammy CPA, LLC
    153 Orchard St
    Elmwood Park, NJ 07407
    (201) 218-0710

    FP Method   Value
    FP0         vijaiksammicpallc153orchardstelmwoodpark2012180710
    FP3         0710201218elmwoodorchardparksammistvijai
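A Ruby sketch of the FP0 and FP3 steps above that reproduces both fingerprints in the example. Several parts are assumptions, not the deck's actual implementation:

- `toy_stem` is a hypothetical stand-in for a real stemmer such as Porter. It applies only Porter's "y to i" rule, which is enough for "Vijay" and "Sammy".
- The cutoffs for "short" are guesses: upper-case tokens of at most 3 characters, numbers of at most 5 digits.
- The stopword list is passed in as a parameter.

```ruby
FIELDS = %i[name address city phone].freeze

# Hypothetical stand-in for a real stemmer: only Porter's "y -> i" rule.
def toy_stem(token)
  if token.length > 2 && token.end_with?("y") && token[0..-2].match?(/[aeiou]/)
    token[0..-2] + "i"
  else
    token
  end
end

# Split on anything that is not an ASCII letter or digit (strips punctuation).
def tokenize(text)
  text.scan(/[A-Za-z0-9]+/)
end

# FP0 (exact match): every token, lowercased and stemmed, in field order.
def fp0(listing)
  FIELDS.flat_map { |f| tokenize(listing.fetch(f)) }
        .map { |t| toy_stem(t.downcase) }
        .join
end

# FP3 (fuzzy match): drop noisy tokens, then sort so token order no longer matters.
def fp3(listing, stopwords)
  tokens = FIELDS.flat_map do |f|
    tokenize(listing.fetch(f)).map { |t| [t, f == :phone] }
  end
  kept = tokens.reject do |raw, from_phone|
    raw.length < 2 ||                                                   # e.g. "K"
      (raw.length <= 3 && raw == raw.upcase && raw.match?(/[A-Z]/)) ||  # e.g. "CPA", "LLC"
      (!from_phone && raw.match?(/\A\d{1,5}\z/)) ||                     # e.g. "153"
      stopwords.include?(raw.downcase)
  end
  kept.map { |raw, _| toy_stem(raw.downcase) }.sort.join
end

listing = {
  name: "Vijay K. Sammy CPA, LLC",
  address: "153 Orchard St",
  city: "Elmwood Park",
  phone: "(201) 218-0710"
}
puts fp0(listing)     # → vijaiksammicpallc153orchardstelmwoodpark2012180710
puts fp3(listing, []) # → 0710201218elmwoodorchardparksammistvijai
```

Sorting is what makes FP3 tolerant of reordered names. For example, "Sammy Vijay K CPA" at the same address yields the same FP3 key, while its FP0 differs.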
  • 21. Social Data
    Valid Facebook Access Tokens: 14K
    Total Unique Likes: 300K
    % Likes with Locations and/or Phones: 19%
    % Likes mapped to YPID: 38%
    Total Check-Ins: 530
  • 22. Social Mining Mother Lode
    Social Search
    Local Recommendation Engine
    Discovery Wall
    Top 10 List
    Social e-Commerce
    Online Presence Management – Social CRM
  • 23. Questions?
    genechuang@gmail.com
    http://www.twitter.com/genechuang
    http://www.quora.com/Gene-Chuang
    http://www.linkedin.com/in/genechuang