CityGrid Architecture + API Overview from O'Reilly Strata Conference

This is a presentation given by Ana Martinez

    1. Ana Martinez, Kin Lane, February 2012 (M.C. Escher)
    2. The problem
    3. Big Bottleneck!
    4. Single POF!
    5. Places Processing
    6. Places Processing
       Source 1 • Name • Address • Phone • Images
       Source 2 • Name • Address • Phone • Reviews
       Source 3 • Name • Address • Phone • Menu
       → CityGrid Place
    7. Why is it hard?
       Book is to ISBN what Product is to UPC and what Place is to ______.
       No centrally regulated unique ID (the tax ID is, but it is not public). Now what?
       Spago • 176 Canon Dr • Beverly Hills, CA 90210 • 310-944-3924
       R. French Ac & Heating Inc • 2211 martin luther king blvd • los angeles, CA, 90069 • 310-358-5903
       Ray French Air Conditioning & Heating Service • 2211 MLK boulevard #104 • west Hollywood, CA, 90069 • 866-465-5303
    8. Problem Definition • Medium-size data set: 21 million rows, 120 columns • Time to process: daily • Hybrid environment • Not all data comes from the same source
    9. Solution: Normalizer → Matcher → Merger
    10. Normalizer • Soundex • Metaphone • NYSIIS • Match Rating Approach • Caverphone
    11. Know Your Data
        • Stop words: The Viper Room → Viper Room
        • Stemming: av, aven, avenu, avenue, avn, avnue
        • Compression: county line, county rd, county road
        • Truncation: apt, unit, #
    12. Normalizer
        123 Martin Luther King.n → 123 MartinLutherKing. → 123 martinlutherking.
        Canon column: Martin Luther King | martinlutherking
        Token column: the | n | ave |
    13. Matching Strategy: Do what you can in an automated fashion and complement it with manual steps.
    14. Matching Strategy • Exact matching • Set similarity joins • Custom fuzzy matching
    15. Matching Strategy • C-Support Vector Machine • Threshold: 0.996 – Precision: 98.1% – Recall: 97.5% • 84% + manual → % Match Rate
    16. Merger
        Rules: • Provider trustworthiness • Voting rules • New data vs. old data • Super providers
        History: • Accepted • Rejected
    17. Example
        123 M L K Road Ste 45 | 123 Martin Luther King Rd | 123 Martin L King Drive #45
        123 m l k road ste 45 | 123 martin luther king rd | 123 martin l king drive #45
        (123) (m) (l) (k) (road) (ste) (45) | (123) (martin) (luther) (king) (rd) | (123) (martin) (l) (king) (drive) (#) (45)
        123 mlk road ste 45 | 123 martinlutherking rd | 123 martinlking drive # 45
        123 mlkrdste 45 | 123 mlkrd | 123 mlkdr #45
        123 mlkrd | 123 mlkrd | 123 mlkdr
        123 mlk | 123 mlk | 123 mlk
        MATCH! | MATCH! | MATCH!
    18. Findings & Tips • Domain Knowledge • Automation • Mechanical Turk • Machine Learning • Run every 2 hrs → Match Rate of %
    19. Solution for Search APIs
    20. Solution for Places API
    21. Performance Results
    22. Updates • Hours • Real Time
    23. Places Detail – Demo Time!
        • Details by ID
          – http://api.citygridmedia.com/content/places/v2/detail?listing_id=11280452&client_ip=123.4.56.78&publisher=test
          – http://api.citygridmedia.com/content/places/v2/detail?public_id=pinks-hot-dogs-los-angeles-2&client_ip=123.4.56.78&publisher=test
    24. Improvements • Shard listing and content data • Integrate Mongo across all APIs
    25. APIs: Now we have a rich Places API. How do we make developers aware that it exists? How do we get them to integrate successfully?
    26. APIs – Supporting Developer Area: Common Building Blocks • Getting Started • Publisher Overview • Documentation • FAQ • Terms of Use
    27. APIs – Supporting Developer Area: Developer Tools • Code Samples • Libraries • Mobile SDKs • Starter Kits • Hackathon Toolkits • Partner APIs
    28. APIs – Evangelism – Online • Blogging • Twitter • LinkedIn • Facebook • GitHub • Stack Overflow • Quora • Hacker News • StumbleUpon • Reddit
    29. APIs – Evangelism – Offline • Conferences • Hackathons • Meetups • Workshops
    30. APIs – Easy Start + Engage Immediately • Testable APIs • Self-Service • Email After Registration • Follow on Twitter • Follow on LinkedIn
    31. APIs – Feedback Loop + Voice • Email Support • Forum(s) • Twitter • LinkedIn
    32. APIs – Monetization = Sustainability • Local Web Advertising • Local Mobile Advertising • Local Custom Ads • Places that Pay
    33. APIs – Evangelize Internally • Developer Feedback • Roadmap Suggestions • Landscape Analysis • Technology Awareness • Trends • Internal Hackathons
    34. APIs – Measure & Repeat
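
Slide 10 lists phonetic encodings that the Normalizer draws on. The deck does not show how CityGrid applied them, so the following is only a minimal sketch of one of them, standard American Soundex, which maps names that sound alike to the same four-character code. The function name `soundex` is illustrative.

```python
# American Soundex digit classes; vowels (and y) get no digit.
_CODES = {
    ch: digit
    for letters, digit in (
        ("bfpv", "1"),
        ("cgjkqsxz", "2"),
        ("dt", "3"),
        ("l", "4"),
        ("mn", "5"),
        ("r", "6"),
    )
    for ch in letters
}


def soundex(word: str) -> str:
    """Return the four-character American Soundex code of `word` ('' if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()          # the first letter is kept as-is
    prev = _CODES.get(letters[0], "")  # its digit still suppresses an identical next digit
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same digit
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated digit after it is kept
    return code.ljust(4, "0")  # pad short codes with zeros


if __name__ == "__main__":
    print(soundex("Martin"), soundex("Marten"))  # → M635 M635
```

Because "Martin" and "Marten" share a code, a misspelled street or business name could still be grouped with its correct spelling before the more expensive matching steps.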
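
Slides 11, 12 and 17 describe normalization as stop-word removal, stemming of variant spellings, compression of multi-word names, and truncation of unit designators. The lookup tables below (`STOP_WORDS`, `UNIT_DESIGNATORS`, `STREET_TYPES`, `NAME_ALIASES`) and the function `match_key` are hypothetical, chosen only so that this sketch reproduces the slide 17 example; it is not CityGrid's normalizer.

```python
import re

# Hypothetical lookup tables; a real normalizer would need much larger,
# data-driven tables (slide 11: "Know Your Data").
STOP_WORDS = {"the"}
UNIT_DESIGNATORS = {"apt", "unit", "ste", "suite", "#"}
STREET_TYPES = {  # stemming: variant spelling -> canonical street type
    "road": "rd", "rd": "rd",
    "drive": "dr", "dr": "dr",
    "avenue": "ave", "ave": "ave", "av": "ave", "aven": "ave",
    "avenu": "ave", "avn": "ave", "avnue": "ave",
    "boulevard": "blvd", "blvd": "blvd",
}
NAME_ALIASES = {"martinlutherking": "mlk", "martinlking": "mlk"}


def match_key(address: str) -> str:
    """Reduce a street address to a coarse '<number> <name>' matching key."""
    tokens = re.findall(r"[a-z0-9]+|#", address.lower())
    # Truncation: drop everything from the first unit designator onward.
    for i, tok in enumerate(tokens):
        if tok in UNIT_DESIGNATORS:
            tokens = tokens[:i]
            break
    tokens = [t for t in tokens if t not in STOP_WORDS]
    numbers = [t for t in tokens if t.isdigit()]
    # Street-type variants are recognized and, as in the last step of the
    # slide 17 example, left out of the key.
    name_tokens = [t for t in tokens if not t.isdigit() and t not in STREET_TYPES]
    # Compression: join the remaining name tokens, then map known long forms.
    name = "".join(name_tokens)
    name = NAME_ALIASES.get(name, name)
    return " ".join(numbers[:1] + [name])


if __name__ == "__main__":
    for addr in ("123 M L K Road Ste 45",
                 "123 Martin Luther King Rd",
                 "123 Martin L King Drive #45"):
        print(match_key(addr))  # → 123 mlk (all three)
```

The two slide 7 variants, "2211 martin luther king blvd" and "2211 MLK boulevard #104", both reduce to `2211 mlk` as well. Dropping the street type and suite raises recall, at the cost of occasionally giving distinct places (for example a Road and a Drive with the same name) the same key.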
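
Slide 14 lists set similarity joins as one matching tier. A minimal sketch, assuming whitespace token sets and Jaccard similarity (the deck does not name a similarity measure): an inverted index limits comparisons to pairs that share at least one token. This is safe only when the threshold is above zero, since pairs with no shared token score 0. The names `jaccard` and `similarity_join` are illustrative.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Size of the intersection over size of the union; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_join(records: dict, threshold: float) -> list:
    """Return (id1, id2, score) for record pairs whose token sets have Jaccard >= threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be > 0 for index-based candidate generation")
    tokens = {rid: frozenset(text.lower().split()) for rid, text in records.items()}
    # Inverted index: token -> ids of the records containing it.
    index = defaultdict(set)
    for rid, toks in tokens.items():
        for tok in toks:
            index[tok].add(rid)
    # Candidates: only pairs sharing at least one token (all others score 0).
    candidates = set()
    for rids in index.values():
        candidates.update(combinations(sorted(rids), 2))
    results = []
    for a, b in sorted(candidates):
        score = jaccard(tokens[a], tokens[b])
        if score >= threshold:
            results.append((a, b, score))
    return results


if __name__ == "__main__":
    places = {"1": "The Viper Room", "2": "Viper Room", "3": "Spago"}
    print(similarity_join(places, 0.5))  # → [('1', '2', 0.6666666666666666)]
```

Plain token overlap scores "R. French Ac & Heating Inc" against "Ray French Air Conditioning & Heating Service" (slide 7) at only 0.3, one illustration of why the deck layers custom fuzzy matching, a trained classifier, and manual steps on top of set joins.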
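
Slide 15 reports a C-Support Vector Machine with a 0.996 threshold giving 98.1% precision and 97.5% recall. The sketch below only shows how those two figures are computed for a given score threshold; `precision_recall` is an illustrative name, and the scores and labels in the usage example are made up. scikit-learn's `SVC` class is one off-the-shelf C-SVM implementation that could supply real scores.

```python
def precision_recall(scores: list, labels: list, threshold: float) -> tuple:
    """Precision and recall of the rule 'predict match if score >= threshold'."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    tp = fp = fn = 0
    for score, is_match in zip(scores, labels):
        predicted = score >= threshold
        if predicted and is_match:
            tp += 1  # correctly accepted match
        elif predicted and not is_match:
            fp += 1  # accepted a non-match
        elif not predicted and is_match:
            fn += 1  # missed a real match
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Made-up classifier scores and ground-truth match labels.
    scores = [0.999, 0.998, 0.997, 0.990, 0.9995, 0.5]
    labels = [True, True, False, True, True, False]
    print(precision_recall(scores, labels, threshold=0.996))  # → (0.75, 0.75)
```

Raising the threshold generally trades recall for precision; pairs the classifier leaves unresolved are where slide 13's manual steps come in.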
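
Slide 16 names the Merger's rules (provider trustworthiness, voting, new vs. old data, super providers) without giving details. The sketch below is one hypothetical way to combine them for a single field: a super provider's newest value wins outright; otherwise each distinct value collects its providers' trust weights, and recency breaks ties. `PROVIDER_TRUST`, `SUPER_PROVIDERS`, `DEFAULT_TRUST`, the provider names, and `merge_field` are all assumptions, and the accepted/rejected history is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical trust weights per provider (the deck gives no values).
PROVIDER_TRUST = {"source1": 0.9, "source2": 0.6, "source3": 0.7}
# Hypothetical "super providers" whose values override voting.
SUPER_PROVIDERS = {"super_feed"}
DEFAULT_TRUST = 0.1  # assumed weight for providers not listed above


@dataclass
class Candidate:
    provider: str
    value: str
    updated: int  # e.g. a Unix timestamp; larger means newer


def merge_field(candidates: list) -> str:
    """Choose one value for a single field (e.g. phone) from several providers."""
    if not candidates:
        raise ValueError("no candidate values to merge")
    # Super providers: take the newest value any of them supplied.
    supers = [c for c in candidates if c.provider in SUPER_PROVIDERS]
    if supers:
        return max(supers, key=lambda c: c.updated).value
    # Voting: each distinct value accumulates its providers' trust weights.
    votes = defaultdict(float)
    newest = {}
    for c in candidates:
        votes[c.value] += PROVIDER_TRUST.get(c.provider, DEFAULT_TRUST)
        newest[c.value] = max(newest.get(c.value, c.updated), c.updated)
    # Highest total weight wins; the most recently updated value breaks ties.
    return max(votes, key=lambda v: (votes[v], newest[v]))


if __name__ == "__main__":
    phones = [
        Candidate("source1", "555-0100", 1),
        Candidate("source2", "555-0199", 2),
        Candidate("source3", "555-0199", 3),
    ]
    print(merge_field(phones))  # → 555-0199 (0.6 + 0.7 outweighs 0.9)
```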
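
Slide 23 demonstrates the Places detail endpoint looked up either by `listing_id` or by `public_id`, with `client_ip` and `publisher` passed alongside. The helper below (`detail_url`, a hypothetical name) only assembles such a URL from the endpoint and parameter names shown on the slide; the deck does not document the response, so the sketch makes no request.

```python
from typing import Optional
from urllib.parse import urlencode

# Endpoint as shown on slide 23 (February 2012).
DETAIL_ENDPOINT = "http://api.citygridmedia.com/content/places/v2/detail"


def detail_url(
    *,
    client_ip: str,
    publisher: str,
    listing_id: Optional[int] = None,
    public_id: Optional[str] = None,
) -> str:
    """Build a Places detail URL keyed by exactly one of listing_id / public_id."""
    if (listing_id is None) == (public_id is None):
        raise ValueError("pass exactly one of listing_id or public_id")
    params = {}
    if listing_id is not None:
        params["listing_id"] = listing_id
    else:
        params["public_id"] = public_id
    # Same parameter order as the slide's example URLs.
    params["client_ip"] = client_ip
    params["publisher"] = publisher
    return f"{DETAIL_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(detail_url(listing_id=11280452, client_ip="123.4.56.78", publisher="test"))
```

Using `urlencode` rather than string concatenation also percent-escapes any reserved characters that might appear in a `public_id`.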