CityGrid Architecture + API Overview from O'Reilly Strata Conference

This is a presentation given by Ana Martinez

Presentation Transcript

  • Ana Martinez, Kin Lane; February 2012; M.C. Escher
  • The problem
  • Big Bottleneck!
  • Single point of failure (POF)!
  • Places Processing
  • Places Processing: Source 1 (name, address, phone, images); Source 2 (name, address, phone, reviews); Source 3 (name, address, phone, menu) -> CityGrid Place
  • Why is it hard? Book is to ISBN what Product is to UPC and what Place is to ______. No centrally regulated unique ID (tax ID is, but not public). Now what?
    Spago, 176 Canon Dr, Beverly Hills, CA 90210, 310-944-3924
    R. French Ac & Heating Inc, 2211 martin luther king blvd, los angeles, CA, 90069, 310-358-5903
    Ray French Air Conditioning & Heating Service, 2211 MLK boulevard #104, west Hollywood, CA, 90069, 866-465-5303
  • Problem Definition: medium-size data set (21 million rows, 120 columns); time to process: daily; hybrid environment; not all data is from the same source
  • Solution: Normalizer, Matcher, Merger
  • Normalizer: Soundex, Metaphone, NYSIIS, Match Rating Approach, Caverphone
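
The slide above lists phonetic encoders without showing one. As an illustration only, here is a minimal sketch of classic American Soundex in Python. The function name `soundex` is an assumption, and the talk does not say which variant or library CityGrid used.

```python
def soundex(name: str) -> str:
    """Classic American Soundex: first letter plus three digits."""
    codes = {}
    groups = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
              ("L", "4"), ("MN", "5"), ("R", "6"))
    for letters, digit in groups:
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    first = letters[0]
    out = [first]
    prev = codes.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W do not separate consonants that share a code.
            continue
        code = codes.get(ch, "")
        if code and code != prev:
            out.append(code)
        # Vowels reset prev, so a repeated code after a vowel is kept.
        prev = code
    # Pad with zeros and cut to the usual four characters.
    return ("".join(out) + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Names that sound alike collapse to the same short key, which is why encoders like this are useful as a cheap first pass before stricter matching.
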
  • Know Your Data: stop words (The Viper Room, Viper Room); stemming (av, aven, avenu, avenue, avn, avnue); compression (county line, county rd, county road); truncation (apt, unit, #)
  • Normalizer: "123 Martin Luther King.n" -> "123 MartinLutherKing." -> "123 martinlutherking."; canon column: Martin Luther King | martinlutherking; tokens: the | n | ave
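
The normalization ideas on the two slides above (stop words, stemming, truncation) can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the function name `normalize`, the choice of `ave` as the canonical stem, and the contents of each table are assumptions.

```python
import re

# All three tables are illustrative assumptions, seeded from the slide examples.
STOP_WORDS = {"the"}
STREET_STEMS = {v: "ave" for v in ("av", "aven", "avenu", "avenue", "avn", "avnue")}
UNIT_MARKERS = {"apt", "unit", "#"}


def normalize(text: str) -> str:
    """Lowercase, drop stop words, stem street variants, truncate at unit markers."""
    tokens = re.findall(r"[a-z0-9]+|#", text.lower())
    kept = []
    for tok in tokens:
        if tok in UNIT_MARKERS:
            break  # truncation: ignore everything from the unit marker onward
        if tok in STOP_WORDS:
            continue
        kept.append(STREET_STEMS.get(tok, tok))
    return " ".join(kept)


print(normalize("The Viper Room"))  # viper room
print(normalize("123 Avnue #4"))    # 123 ave
```
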
  • Matching Strategy: automate what you can and complement it with manual steps.
  • Matching Strategy: exact matching; set similarity joins; custom fuzzy matching
  • Matching Strategy: C-Support Vector Machine; threshold: 0.996 (precision: 98.1%, recall: 97.5%); 84% + manual -> % match rate
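
To make "set similarity joins" concrete, here is a naive sketch that compares every pair of records by Jaccard similarity over character trigrams. The function names and the 0.5 threshold are assumptions for illustration. The 0.996 threshold on the slide belongs to the SVM classifier, not to this measure, and a production join would use indexing rather than a nested loop.

```python
def trigrams(s: str) -> set:
    """Character trigrams of the lowercased, space-padded string."""
    padded = f"  {s.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def jaccard(a: str, b: str) -> float:
    """Size of the trigram intersection divided by the size of the union."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def similarity_join(left, right, threshold):
    """Naive O(n*m) join: every (a, b) pair scoring at or above the threshold."""
    pairs = []
    for a in left:
        for b in right:
            score = jaccard(a, b)
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs


matches = similarity_join(["viper room"], ["Viper Room", "spago"], threshold=0.5)
print(matches)  # [('viper room', 'Viper Room', 1.0)]
```
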
  • Merger. Rules: provider trustworthiness; voting rules; new data vs. old data; super providers. History: accepted, rejected
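
One way the "voting rules" and "provider trustworthiness" ideas could combine is a trust-weighted vote per field. The sketch below is hypothetical: `merge_field`, the provider names, and the weights are all assumptions, and the slide does not describe the actual rules.

```python
from collections import defaultdict


def merge_field(candidates, trust):
    """Pick the value with the highest total trust weight.

    candidates: list of (provider, value) pairs for one field.
    trust: mapping of provider name to weight; unknown providers count as 1.0.
    """
    scores = defaultdict(float)
    for provider, value in candidates:
        if value is None:
            continue  # a provider with no value does not vote
        scores[value] += trust.get(provider, 1.0)
    if not scores:
        return None
    # On a tie, the value seen first wins (dicts keep insertion order).
    return max(scores, key=scores.get)


phones = [("source1", "310-358-5903"),
          ("source2", "866-465-5303"),
          ("source3", "866-465-5303")]
print(merge_field(phones, {"source1": 1.5}))  # 866-465-5303
```

Raising one provider's weight above the combined weight of the others models the "super providers" rule: `merge_field(phones, {"source1": 3.0})` returns `310-358-5903`.
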
  • Example (three addresses, one row per normalization step):
    123 M L K Road Ste 45 | 123 Martin Luther King Rd | 123 Martin L King Drive #45
    123 m l k road ste 45 | 123 martinluther king rd | 123 martin l king drive #45
    (123) (m) (l) (k) (road) (ste) (45) | (123) (martin) (luther) (king) (rd) | (123) (martin) (l) (king) (drive) (#) (45)
    123 mlk road ste 45 | 123 martinlutherkingrd | 123 martinlking drive # 45
    123 mlkrdste 45 | 123 mlkrd | 123 mlkdr #45
    123 mlkrd | 123 mlkrd | 123 mlkdr
    123 mlk | 123 mlk | 123 mlk
    MATCH! | MATCH! | MATCH!
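
The reduction in the example above can be approximated with a small key function. This is a sketch under stated assumptions: `address_key`, the street-type and unit-marker sets, and the hard-coded `COMPRESSIONS` table are hypothetical stand-ins for whatever lookup the real normalizer uses.

```python
import re

# Illustrative tables; a real system would need far larger ones.
STREET_TYPES = {"road", "rd", "drive", "dr", "street", "st",
                "avenue", "ave", "boulevard", "blvd"}
UNIT_MARKERS = {"ste", "suite", "apt", "unit", "#"}
COMPRESSIONS = {
    ("martin", "luther", "king"): "mlk",
    ("martin", "l", "king"): "mlk",
    ("m", "l", "k"): "mlk",
}


def address_key(address: str) -> str:
    """Reduce a street address to 'number + compressed street name'."""
    tokens = re.findall(r"[a-z0-9]+|#", address.lower())
    kept = []
    for tok in tokens:
        if tok in UNIT_MARKERS:
            break  # truncate suite or apartment details
        if tok not in STREET_TYPES:
            kept.append(tok)
    numbers = [t for t in kept if t.isdigit()]
    words = tuple(t for t in kept if not t.isdigit())
    # Use a known compression if there is one, else join the words.
    name = COMPRESSIONS.get(words, "".join(words))
    return " ".join(numbers + [name]) if name else " ".join(numbers)


for addr in ("123 M L K Road Ste 45",
             "123 Martin Luther King Rd",
             "123 Martin L King Drive #45"):
    print(address_key(addr))  # 123 mlk, three times
```
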
  • Findings & Tips: domain knowledge; automation; Mechanical Turk; machine learning. Run every 2 hrs -> match rate of %
  • Solution for Search APIs
  • Solution for Places API
  • Performance Results
  • Updates: hours; real time
  • Places Detail – Demo Time! Details by ID:
    http://api.citygridmedia.com/content/places/v2/detail?listing_id=11280452&client_ip=123.4.56.78&publisher=test
    http://api.citygridmedia.com/content/places/v2/detail?public_id=pinks-hot-dogs-los-angeles-2&client_ip=123.4.56.78&publisher=test
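
The two demo URLs above differ only in which identifier they pass. A small helper can build either form. The endpoint and the parameter names `listing_id`, `public_id`, `client_ip`, and `publisher` come from the slide; the helper `detail_url` and its rule of accepting exactly one identifier are assumptions. The sketch only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.citygridmedia.com/content/places/v2/detail"


def detail_url(publisher, client_ip, listing_id=None, public_id=None):
    """Build a Places detail URL from exactly one of listing_id or public_id."""
    if (listing_id is None) == (public_id is None):
        raise ValueError("pass exactly one of listing_id or public_id")
    if listing_id is not None:
        params = {"listing_id": listing_id}
    else:
        params = {"public_id": public_id}
    params["client_ip"] = client_ip
    params["publisher"] = publisher
    return f"{BASE_URL}?{urlencode(params)}"


print(detail_url("test", "123.4.56.78", listing_id=11280452))
print(detail_url("test", "123.4.56.78", public_id="pinks-hot-dogs-los-angeles-2"))
```
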
  • Improvements: shard listing and content data; integrate Mongo across all APIs
  • APIs: Now we have a rich Places API. How do we make developers aware it exists? How do we get them to integrate successfully?
  • APIs – Supporting Developer Area. Common building blocks: Getting Started; Publisher Overview; Documentation; FAQ; Terms of Use
  • APIs – Supporting Developer Area. Developer tools: Code Samples; Libraries; Mobile SDKs; Starter Kits; Hackathon Toolkits; Partner APIs
  • APIs – Evangelism – Online: Blogging; Twitter; LinkedIn; Facebook; GitHub; Stack Overflow; Quora; Hacker News; StumbleUpon; Reddit
  • APIs – Evangelism – Offline: Conferences; Hackathons; Meetups; Workshops
  • APIs – Easy Start + Engage Immediately: Testable APIs; Self-Service; Email After Registration; Follow on Twitter; Follow on LinkedIn
  • APIs – Feedback Loop + Voice: Email Support; Forum(s); Twitter; LinkedIn
  • APIs – Monetization = Sustainability: Local Web Advertising; Local Mobile Advertising; Local Custom Ads; Places that Pay
  • APIs – Evangelize Internally: Developer Feedback; Roadmap Suggestions; Landscape Analysis; Technology Awareness; Trends; Internal Hackathons
  • APIs – Measure & Repeat