Getting Started with Location Data
Usage Rights

© All Rights Reserved

Getting Started with Location Data Presentation Transcript

  • 1. Getting Started with Location Data. Data Day Austin, January 29, 2011
  • 2. What makes location data special?
      • Location information can be encoded multiple ways
      • Several levels of hierarchical relationships with location data
      • Location data introduces new challenges with regard to querying and indexing
  • 3. Geo-Tags and Labels
      • Geo-Tags: Regions, Points
      • Labels: Address, Zip, Neighborhood, Place Name
  • 4. Sources of Location Data
      • Social Messages
      • Checkin Providers
      • Tagged Photos
      • Reviews
      • Property Listings
      • Crime Reports
  • 5. Possibilities
      • Examine Twitter sentiment by place (avoid cranky coffee shops)
      • Correlate crimes with checkins (are your friends really bank robbers?)
      • Find neighborhoods with highest concentration of highly reviewed restaurants
      • Providers shown: Twitter, Yelp, Loopt, SCVNGR, Facebook, Foursquare, Gowalla, Google, Trulia, Zillow
  • 6. Possibilities: Twitter, Yelp, Loopt, SCVNGR, Facebook, Foursquare, Gowalla, Google, Trulia, Zillow
  • 7. Possibilities: Endless (Twitter, Yelp, Loopt, SCVNGR, Facebook, Foursquare, Gowalla, Google, Trulia, Zillow)
  • 8. How do you get it?
      • Endpoints
      • Authentication
      • Data Formats
      • Rate Limiting
  • 9. Endpoints
      • URL to make requests against API
      • Parameters define request
      • GET + POST rule the land
      • Examples:
          • http://api.twitter.com/1/geo/search
          • http://api.simplegeo.com/1.0/places/
          • https://maps.googleapis.com/maps/api/place/search/
          • https://api.foursquare.com/v2/venues/
          • https://gowalla.com/spots/
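
To make "parameters define request" concrete, here is a minimal sketch of building a GET URL from an endpoint and a set of parameters. The base URL is one of the endpoints listed on the slide; the parameter names and values are assumptions chosen for illustration and should be checked against the provider's documentation.

```python
from urllib.parse import urlencode

# Endpoint taken from the slide. The parameter names below are
# illustrative assumptions, not confirmed against the API docs.
BASE_URL = "http://api.twitter.com/1/geo/search"


def build_request_url(base_url, params):
    """Return base_url with params appended as a URL-encoded query string."""
    return base_url + "?" + urlencode(params)


url = build_request_url(
    BASE_URL,
    {"lat": 30.2672, "long": -97.7431, "query": "coffee shop"},
)
print(url)
```

The same helper works for any of the listed endpoints; only the base URL and the parameter names change from provider to provider.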
  • 10. Authentication
      • Anonymous, Basic, and OAuth/OAuth2
      • Most languages support Basic Auth with just a few lines of code
      • OAuth is a bit more involved (http://hueniverse.com/oauth/guide/)
          • Keys, secrets, tokens, callbacks, and authentication endpoints...
          • Application vs. user authentication (two vs. three-legged)
          • v1 uses signed HTTP messages
          • v2 uses HTTPS and URL tokens
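
As a rough illustration of the "few lines of code" needed for Basic Auth, the sketch below builds the Authorization header by hand using only the Python standard library. The endpoint URL and the credentials are hypothetical placeholders, and the request is prepared but never sent so the example runs offline.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic 'Authorization' header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Hypothetical endpoint and credentials, used only to show the shape.
request = urllib.request.Request("https://api.example.com/places")
request.add_header("Authorization", basic_auth_header("demo_user", "demo_pass"))

# urllib.request.urlopen(request) would send it; omitted so the sketch
# runs without network access.
print(request.get_header("Authorization"))
```

OAuth needs considerably more machinery (token exchange, signing or bearer tokens), which is why a dedicated library is usually the better choice there.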
  • 11. OAuth Flow
  • 12. OAuth Enabled
  • 13. Data Formats
      • JSON
          • Compact
          • Great language support
          • Dumps directly into MongoDB (queries supported without refactoring)
      • XML
          • Use JSON
          • JSON is much more widely supported by location data providers
      • Example (source: http://en.wikipedia.org/wiki/JSON):
          {
            "firstName": "John",
            "lastName": "Smith",
            "age": 25,
            "address": {
              "streetAddress": "21 2nd Street",
              "city": "New York",
              "state": "NY",
              "postalCode": "10021"
            },
            "phoneNumber": [
              { "type": "home", "number": "212 555-1234" },
              { "type": "fax", "number": "646 555-4567" }
            ]
          }
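
The "great language support" point can be sketched in a few lines of Python: the standard `json` module turns a JSON document into plain dictionaries and lists. The record below is a trimmed version of the example on the slide.

```python
import json

# Trimmed version of the record shown on the slide.
raw = """
{
  "firstName": "John",
  "address": {"city": "New York", "postalCode": "10021"},
  "phoneNumber": [
    {"type": "home", "number": "212 555-1234"},
    {"type": "fax", "number": "646 555-4567"}
  ]
}
"""

person = json.loads(raw)          # JSON object -> dict, JSON array -> list
city = person["address"]["city"]  # nested objects become nested dicts
numbers_by_type = {p["type"]: p["number"] for p in person["phoneNumber"]}

print(city)
print(numbers_by_type["fax"])
```

Because the parsed result is already a dictionary, it can be handed to a document store without an intermediate mapping step.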
  • 14. Rate Limiting
      • Rules of the data playground set by API provider
          • Play nice, RTFM, respect the data, cache the results (if allowed)
      • Multiple flavors
          • IP-Based
          • User-Based
          • Application-Based
      • Some APIs are smarter than others
          • Elastic IPs (Amazon EC2)
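
One way to "play nice" and "cache the results" is to wrap whatever function performs the request in a small client that spaces calls out and remembers earlier answers. This is a minimal sketch, not any provider's recommended approach: `PoliteClient` and its arguments are hypothetical names, and the 150 requests/hour figure is the anonymous Twitter limit quoted elsewhere in this deck.

```python
import time


class PoliteClient:
    """Wrap a fetch function with a minimum delay between calls and a cache."""

    def __init__(self, fetch, max_per_hour, clock=time.monotonic, sleep=time.sleep):
        self.fetch = fetch
        self.min_interval = 3600.0 / max_per_hour  # seconds between requests
        self.clock = clock
        self.sleep = sleep
        self.cache = {}
        self.last_call = None

    def get(self, url):
        if url in self.cache:  # cached results cost no quota
            return self.cache[url]
        if self.last_call is not None:
            wait = self.min_interval - (self.clock() - self.last_call)
            if wait > 0:
                self.sleep(wait)
        self.last_call = self.clock()
        self.cache[url] = self.fetch(url)
        return self.cache[url]


# Stub fetch function so the sketch runs offline; a real one would
# perform the HTTP request and return the response body.
client = PoliteClient(fetch=lambda url: "response for " + url, max_per_hour=150)
print(client.get("http://api.example.com/a"))  # first call, no waiting
print(client.get("http://api.example.com/a"))  # served from the cache
```

Passing `clock` and `sleep` in as arguments keeps the throttling logic testable without real waiting. Whether caching is permitted at all depends on each API's terms.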
  • 15. Shallow Dives
  • 16. Twitter
      • http://dev.twitter.com/doc/geo
      • Authentication: Anonymous + OAuth
      • What’s available
          • Recent tweet timelines (yours, friends, public)
          • Trends across the Twitterverse
          • Geo-tagged tweets
      • Notes
          • Rate limiting catches up to you unless you’re whitelisted or using streams
          • Geo-tagging (with lat/long) tweets is opt-in and not “sticky”
      • Rate Limit: 350 requests/hr if OAuth authenticated, 150 requests/hr if anonymous
  • 17. Facebook
      • http://developers.facebook.com/docs/reference/api/checkin/
      • Authentication: OAuth 2.0
      • What’s available
          • Your own checkins
          • Your friends’ recent checkins
          • Checkin counts from specific places
      • Notes
          • If you’re not friends, you can’t see their checkin
          • Lots of users but not a lot of checkins
      • Rate Limit: 600 calls per 600 seconds per access_token?
  • 18. Foursquare
      • http://developer.foursquare.com/docs/
      • Authentication: OAuth 2.0
      • What’s available
          • Your own checkins
          • Your friends’ recent checkins
          • Checkin counts from specific places
      • Notes
          • Richer data available (photos, comments, badges, tips)
          • If you’re not friends, you can’t see their checkin
          • Rate Limit: 500 requests per hour per set of endpoints per OAuth consumer per authenticated user
  • 19. SimpleGeo
      • http://simplegeo.com/products/
      • Authentication: Just an API key
      • What’s available
          • Search for places by address, lat/long, reverse IP lookup
          • Search for contextual information about a place
          • Weather, population density, other geographic features
      • Notes
          • Contextual information not terribly robust...yet (still beta)
          • Great language support
          • Rate Limit: None yet (but play nice)
  • 20. Gowalla
      • http://gowalla.com/api/explorer
      • Authentication: OAuth 2.0 (kinda sorta)
      • What’s available:
          • Checkin history for all users (with public profiles) and spots
          • Each spot lists top users and recent checkins
          • Real-time checkin updates (user streams)
      • Notes
          • Rate Limit: 5 requests/sec
  • 21. What Can We Learn About Austin?
  • 22. More Specifically
      • Where do people from Austin checkin, in Austin?
          • Which bars?
          • Which coffee shops?
          • Where do they shop?
  • 23. What Do We Need?
      • Users, Spots and Checkins
      • Gowalla has the data we need
      • Python to
          • Pull the data down
          • Analyze the structure and content
  • 24. Some Considerations
      • Read the API terms
      • Data comes in a pre-defined structure
      • Rate limiting/API restrictions
      • Where do we put the data?
          • Sandbox
          • Long term options
  • 25. Remember to phrase your question in the form of an API request
  • 26. Getting The Data
      • 1. API Capabilities
          • 1. Search by lat/long
          • 2. Top 10 user checkins per spot
          • 3. Recent checkins at spot
          • 4. User checkin history
          • 5. User friends
      • 2. Process
          • 1. Turn Austin into geo-fenced sections
          • 2. Search by those sections using lat/long
          • 3. Top 10 user lists + recent user checkins seeded user data
          • 4. Recursively add users from friend lists
          • 5. Retrieve checkins for each user
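
The process on this slide can be sketched roughly as follows: split a bounding box into a grid of search points, then expand a seed set of users by walking friend lists. This is an illustration under stated assumptions, not the presenters' code. The bounding box is an approximate guess at Austin's extent, `get_friends` stands in for whatever API call returns a user's friends, and the friend walk is written as an iterative breadth-first search in place of literal recursion.

```python
from collections import deque


def grid_centers(south, west, north, east, rows, cols):
    """Split a bounding box into rows x cols cells and return each cell's center."""
    lat_step = (north - south) / rows
    lng_step = (east - west) / cols
    return [
        (south + (r + 0.5) * lat_step, west + (c + 0.5) * lng_step)
        for r in range(rows)
        for c in range(cols)
    ]


def crawl_users(seed_users, get_friends, max_users=None):
    """Breadth-first walk of friend lists starting from the seed users."""
    seen = set(seed_users)
    queue = deque(seed_users)
    while queue:
        user = queue.popleft()
        for friend in get_friends(user):
            if friend not in seen and (max_users is None or len(seen) < max_users):
                seen.add(friend)
                queue.append(friend)
    return seen


# Approximate bounding box around Austin (an assumption for illustration).
centers = grid_centers(30.10, -97.95, 30.52, -97.55, rows=4, cols=4)
print(len(centers))

# Made-up friend graph standing in for API responses.
friends = {"ann": ["bob", "cat"], "bob": ["ann", "dan"], "cat": [], "dan": []}
print(sorted(crawl_users(["ann"], lambda user: friends.get(user, []))))
```

Each grid center would become the lat/long of one search request, and the `seen` set keeps the crawl from requesting the same user twice, which matters under a rate limit.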
  • 27. What Did We Get?
      • 6,600 local spots
      • 36,000 users
      • 2,700 local users
      • 1,781,417 checkins
      • 238,000 local checkins from locals
  • 28. Sandbox
      • Initially, stored in MongoDB as-is from Gowalla
      • Great playground (esp. for lat/long data)
          • Native support for 2-D location
      • Performed initial analysis of data structure and content
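
To give a feel for MongoDB's 2-D location support, the sketch below builds the document and query shapes in plain Python without connecting to a database. The field names, the spot, and the coordinates are made up for illustration; the `pymongo` calls in the comments assume that library and a running server, neither of which is exercised here.

```python
def to_spot_document(spot_id, name, lat, lng):
    """Shape a spot so a MongoDB 2-D index can be built on the "loc" field."""
    # 2-D indexes work on coordinate pairs; MongoDB's documentation
    # recommends storing longitude first, then latitude.
    return {"_id": spot_id, "name": name, "loc": [lng, lat]}


def near_query(lat, lng):
    """Filter document asking for spots ordered by distance from a point."""
    return {"loc": {"$near": [lng, lat]}}


# Field names, the id, and the coordinates are made up for illustration.
doc = to_spot_document(1, "Example Coffee", 30.2950, -97.7420)
query = near_query(30.2672, -97.7431)

# With pymongo and a running server (both assumed), these would
# typically be used along the lines of:
#   spots.create_index([("loc", pymongo.GEO2D)])
#   spots.insert_one(doc)
#   nearby = spots.find(query).limit(10)
print(doc)
print(query)
```

Keeping the coordinate order consistent between stored documents and queries is the main thing to get right; mixing the two orders silently returns wrong neighbors.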
  • 29. Long-Term Storage
      • Several options, each with their own tradeoffs
      • Store the data as you intend to query it
      • De-normalization is likely
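
"Store the data as you intend to query it" can be illustrated with a small de-normalization step: if the question is "how many checkins per spot per hour of day?", the raw checkins can be rolled up into exactly that shape ahead of time. The records and field names below are hypothetical and do not reflect Gowalla's actual schema.

```python
from collections import Counter
from datetime import datetime

# Hypothetical checkin records; field names are assumptions.
checkins = [
    {"spot": "Spider House", "created_at": "2011-01-15T09:12:00"},
    {"spot": "Spider House", "created_at": "2011-01-16T09:47:00"},
    {"spot": "Halcyon", "created_at": "2011-01-15T21:05:00"},
]


def hourly_counts(records):
    """Count checkins per (spot, hour of day), the shape a chart query needs."""
    counts = Counter()
    for record in records:
        hour = datetime.strptime(record["created_at"], "%Y-%m-%dT%H:%M:%S").hour
        counts[(record["spot"], hour)] += 1
    return counts


by_spot_hour = hourly_counts(checkins)
print(by_spot_hour[("Spider House", 9)])
```

Storing the rolled-up counts duplicates information already present in the raw checkins, which is the de-normalization tradeoff: more storage and more work on write in exchange for cheap reads.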
  • 30. Long-Term Storage
      • Try them out!
      • Knowing the data and the queries will help make the choice
      • High-level comparison of some options
          • http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
  • 31. Results
  • 32. Top Places in Austin (bar chart of checkins per place, axis 0 to 4000; 238,674 local checkins by locals)
      • AUS Austin-Bergstrom
      • Whole Foods
      • Austin Convention Center
      • University of Texas
      • Alamo Drafthouse Cinema
      • Mutual Mobile
      • The Hilton
      • Stubbs Bar-B-Q
      • The Flying Saucer
      • The Driskill
  • 33. When Do People Checkin? (chart: Checkins, 0 to 50,000, by Time of Day, hours 0 to 22)
  • 34. Top Coffee Shops: Spider House, Halcyon (chart: Checkins, 0 to 60, by Time of Day, hours 0 to 21)
  • 35. Top Grocery Stores: Whole Foods, Central Market-North Lamar (chart: Checkins, 0 to 300, by Time of Day, hours 0 to 21)
  • 36. Questions?
  • 37. Thanks!
      • Shaun Dubuque, sdubuque@argiainc.com
      • Sandeep Parikh, sparikh@argiainc.com
      • SLIDES: http://www.slideshare.net/crcsmnky/getting-started-with-location-data
      • TWITTER: @argiainfo
      • WEB: http://www.argiainc.com