Getting Started with Location Data

    1. Getting Started with Location Data
        Data Day Austin, January 29, 2011
    2. What makes location data special?
        • Location information can be encoded multiple ways
        • Several levels of hierarchical relationships with location data
        • Location data introduces new challenges with regard to querying and indexing
    3. Geo-Tags: Labels, Address, Zip, Neighborhood, Place Name, Regions, Points
    4. Sources of Location Data
        • Social Messages
        • Checkin Providers
        • Tagged-Photos
        • Reviews
        • Property Listings
        • Crime Reports
    5. Possibilities
        • Providers: Twitter, Yelp, Loopt, SCVNGR, Facebook, Foursquare, Gowalla, Google, Trulia, Zillow
        • Examine Twitter sentiment by place (avoid cranky coffee shops)
        • Correlate crimes with checkins (are your friends really bank robbers?)
        • Find neighborhoods with highest concentration of highly reviewed restaurants
    6. Possibilities
        • Providers: Twitter, Yelp, Loopt, SCVNGR, Facebook, Foursquare, Gowalla, Google, Trulia, Zillow
    7. Possibilities: Endless
        • Providers: Twitter, Yelp, Loopt, SCVNGR, Facebook, Foursquare, Gowalla, Google, Trulia, Zillow
    8. How do you get it?
        • Endpoints
        • Authentication
        • Data Formats
        • Rate Limiting
    9. Endpoints
        • URL to make requests against API
        • Parameters define request
        • GET + POST rule the land
        • Examples
            • http://api.twitter.com/1/geo/search
            • http://api.simplegeo.com/1.0/places/
            • https://maps.googleapis.com/maps/api/place/search/
            • https://api.foursquare.com/v2/venues/
            • https://gowalla.com/spots/
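As a minimal sketch of "parameters define request", the snippet below attaches query parameters to one of the endpoint URLs from the slide. The helper name `build_request_url` and the `lat`/`long` parameter names are illustrative assumptions, not a statement of what any provider requires; check each provider's documentation for the real parameter names.

```python
from urllib.parse import urlencode

# Endpoint copied from the slide; the parameters passed below are illustrative.
GEO_SEARCH = "http://api.twitter.com/1/geo/search"


def build_request_url(endpoint, **params):
    """Return the endpoint with the given parameters encoded as a query string."""
    # Sort the parameters so the resulting URL is stable and easy to cache.
    query = urlencode(sorted(params.items()))
    return f"{endpoint}?{query}" if query else endpoint


# Downtown Austin, roughly (assumed coordinates for the example).
url = build_request_url(GEO_SEARCH, lat=30.2672, long=-97.7431)
print(url)
```

The resulting string can be handed to any HTTP client for a GET request; a POST would send the same encoded parameters in the request body instead.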
    10. Authentication
        • Anonymous, Basic, and OAuth/OAuth2
        • Most languages support Basic Auth with just a few lines of code
        • OAuth is a bit more involved (http://hueniverse.com/oauth/guide/)
            • Keys, secrets, tokens, callbacks, and authentication endpoints...
            • Application vs. user authentication (two vs. three-legged)
            • v1 uses signed HTTP messages
            • v2 uses HTTPS and URL tokens
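To illustrate the "few lines of code" claim for Basic Auth, here is a sketch using only the Python standard library. The URL, username, and password are placeholders, and the helper names are hypothetical; nothing is sent over the network, the request object is only constructed.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic Authorization header."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def build_authenticated_request(url, username, password):
    """Create (but do not send) a request carrying Basic Auth credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Placeholder endpoint and credentials, for illustration only.
request = build_authenticated_request("https://api.example.com/places", "user", "secret")
print(request.get_header("Authorization"))
```

OAuth needs considerably more machinery (token exchange, signing or bearer tokens), which is why a dedicated library is usually the better choice there.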
    11. OAuth Flow
    12. OAuth Enabled
    13. Data Formats
        • JSON
            • Compact
            • Great language support
            • Dumps directly into MongoDB (queries supported without refactoring)
        • XML
            • Use JSON
            • JSON is much more widely supported by location data providers
        • Example
            {
                "firstName": "John",
                "lastName": "Smith",
                "age": 25,
                "address": {
                    "streetAddress": "21 2nd Street",
                    "city": "New York",
                    "state": "NY",
                    "postalCode": "10021"
                },
                "phoneNumber": [
                    { "type": "home", "number": "212 555-1234" },
                    { "type": "fax", "number": "646 555-4567" }
                ]
            }
            {"Source": "http://en.wikipedia.org/wiki/JSON"}
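The "great language support" point can be seen with the sample record from the slide: in Python the standard `json` module turns it into plain dictionaries and lists. This is only a sketch; the variable names are illustrative.

```python
import json

# The sample record from the slide (originally from the Wikipedia JSON article).
raw = """
{
    "firstName": "John",
    "lastName": "Smith",
    "age": 25,
    "address": {
        "streetAddress": "21 2nd Street",
        "city": "New York",
        "state": "NY",
        "postalCode": "10021"
    },
    "phoneNumber": [
        {"type": "home", "number": "212 555-1234"},
        {"type": "fax", "number": "646 555-4567"}
    ]
}
"""

person = json.loads(raw)

# Nested objects become dicts, arrays become lists.
city = person["address"]["city"]
fax_numbers = [p["number"] for p in person["phoneNumber"] if p["type"] == "fax"]
print(city, fax_numbers)
```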
    14. Rate Limiting
        • Rules of the data playground set by API provider
            • Play nice, RTFM, respect the data, cache the results (if allowed)
        • Multiple flavors
            • IP-Based
            • User-Based
            • Application-Based
        • Some APIs are smarter than others
            • Elastic IPs (Amazon EC2)
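One way to "play nice" and "cache the results" is to wrap whatever function performs the HTTP call in a small client-side throttle with a cache. The class below is a sketch under assumptions: `ThrottledCache` and its parameters are hypothetical names, the fetch function is a stand-in for a real HTTP call, and caching is only appropriate where the provider's terms allow it.

```python
import time


class ThrottledCache:
    """Call fetch(url) at most max_per_second times per second, caching results."""

    def __init__(self, fetch, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch
        self._min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._last_call = None
        self._cache = {}

    def get(self, url):
        # Cached responses cost nothing against the rate limit.
        if url in self._cache:
            return self._cache[url]
        # Wait until enough time has passed since the previous real request.
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        self._last_call = self._clock()
        self._cache[url] = self._fetch(url)
        return self._cache[url]


# Stand-in for a real HTTP call, so the example runs without network access.
def fake_fetch(url):
    return f"response for {url}"


client = ThrottledCache(fake_fetch, max_per_second=5)
print(client.get("https://gowalla.com/spots/"))
print(client.get("https://gowalla.com/spots/"))  # served from the cache
```

This only spaces out requests from a single process; IP-based or application-based limits shared across machines need coordination beyond this sketch.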
    15. Shallow Dives
    16. http://dev.twitter.com/doc/geo
        • Authentication: Anonymous + OAuth
        • What’s available
            • Recent tweet timelines (yours, friends, public)
            • Trends across the Twitterverse
            • Geo-tagged tweets
        • Notes
            • Rate limiting catches up to you unless you’re whitelisted or using streams
            • Geo-tagging (with lat/long) tweets is opt-in and not “sticky”
        • Rate Limit: 350 requests/hr if OAuth authenticated, 150 requests/hr if anonymous
    17. http://developers.facebook.com/docs/reference/api/checkin/
        • Authentication: OAuth 2.0
        • What’s available
            • Your own checkins
            • Your friends’ recent checkins
            • Checkin counts from specific places
        • Notes
            • If you’re not friends, you can’t see their checkins
            • Lots of users but not a lot of checkins
        • Rate Limit: 600 calls per 600 seconds per access_token?
    18. http://developer.foursquare.com/docs/
        • Authentication: OAuth 2.0
        • What’s available
            • Your own checkins
            • Your friends’ recent checkins
            • Checkin counts from specific places
        • Notes
            • Richer data available (photos, comments, badges, tips)
            • If you’re not friends, you can’t see their checkins
            • Rate Limit: 500 requests per hour per set of endpoints per OAuth consumer per authenticated user
    19. http://simplegeo.com/products/
        • Authentication: Just an API key
        • What’s available
            • Search for places by address, lat/long, reverse IP lookup
            • Search for contextual information about a place
            • Weather, population density, other geographic features
        • Notes
            • Contextual information not terribly robust...yet (still beta)
            • Great language support
            • Rate Limit: None yet (but play nice)
    20. http://gowalla.com/api/explorer
        • Authentication: OAuth 2.0 (kinda sorta)
        • What’s available
            • Checkin history for all users (with public profiles) and spots
            • Each spot lists top users and recent checkins
            • Real-time checkin updates (user streams)
        • Notes
            • Rate Limit: 5 requests/sec
    21. What Can We Learn About Austin?
    22. More Specifically
        • Where do people from Austin checkin, in Austin?
            • Which bars?
            • Which coffee shops?
            • Where do they shop?
    23. What Do We Need?
        • Users, Spots and Checkins
        • Gowalla has the data we need
        • Python to
            • Pull the data down
            • Analyze the structure and content
    24. Some Considerations
        • Read the API terms
        • Data comes in a pre-defined structure
        • Rate limiting/API restrictions
        • Where do we put the data?
            • Sandbox
            • Long term options
    25. Remember to phrase your question in the form of an API request
    26. Getting The Data
        1. API Capabilities
            1. Search by lat/long
            2. Top 10 user checkins per spot
            3. Recent checkins at spot
            4. User checkin history
            5. User friends
        2. Process
            1. Turn Austin into geo-fenced sections
            2. Search by those sections using lat/long
            3. Top 10 user lists + recent user checkins seeded user data
            4. Recursively add users from friend lists
            5. Retrieve checkins for each user
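The process steps above can be sketched in two small pieces: splitting a bounding box into sections to search by lat/long, and expanding a seed set of users through their friend lists. Both functions are hypothetical helpers, not the presenters' code. The bounding box is a rough assumption, the friend lookup is a stand-in for an API call, and the friend expansion is written as an iterative breadth-first traversal rather than literal recursion.

```python
from collections import deque


def grid_cells(south, west, north, east, rows, cols):
    """Split a bounding box into rows x cols sections; return each center (lat, lng)."""
    lat_step = (north - south) / rows
    lng_step = (east - west) / cols
    return [
        (south + (r + 0.5) * lat_step, west + (c + 0.5) * lng_step)
        for r in range(rows)
        for c in range(cols)
    ]


def crawl_users(seed_users, get_friends):
    """Expand the seed users by following friend lists until no new users appear."""
    seen = set(seed_users)
    queue = deque(seen)
    while queue:
        user = queue.popleft()
        for friend in get_friends(user):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return seen


# Rough, assumed bounding box around Austin, split into a 2 x 2 grid.
centers = grid_cells(30.0, -98.0, 30.5, -97.5, rows=2, cols=2)
print(centers)

# Stand-in friend graph; a real crawl would call the provider's API here.
friend_graph = {"ann": ["bob"], "bob": ["ann", "cy"], "cy": []}
users = crawl_users(["ann"], lambda user: friend_graph.get(user, []))
print(sorted(users))
```

In practice each section center would be passed to the provider's lat/long search, and each API call inside `get_friends` would go through a rate limiter.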
    27. What Did We Get?
        • 6,600 local spots
        • 36,000 users
        • 2,700 local users
        • 1,781,417 checkins
        • 238,000 local checkins from locals
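The gap between all checkins and "local checkins from locals" comes from filtering on both the spot and the user. A minimal sketch of that filter follows; the record fields `user_id` and `spot_id` are hypothetical, and how a user is classified as local is not stated in the slides, so the set of local users is simply taken as given.

```python
def local_checkins_by_locals(checkins, local_spot_ids, local_user_ids):
    """Keep only checkins made at a local spot by a local user."""
    return [
        checkin
        for checkin in checkins
        if checkin["spot_id"] in local_spot_ids and checkin["user_id"] in local_user_ids
    ]


# Tiny made-up sample, for illustration only.
sample_checkins = [
    {"user_id": "u1", "spot_id": "s1"},  # local user, local spot
    {"user_id": "u2", "spot_id": "s1"},  # visitor at a local spot
    {"user_id": "u1", "spot_id": "s9"},  # local user travelling
]
kept = local_checkins_by_locals(sample_checkins, {"s1"}, {"u1"})
print(kept)
```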
    28. Sandbox
        • Initially, stored in MongoDB as-is from Gowalla
        • Great playground (esp. for lat/long data)
            • Native support for 2-D location
        • Performed initial analysis of data structure and content
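To give a feel for the 2-D location support, the sketch below shapes a spot record and a proximity query as plain Python dictionaries, without connecting to a database. The field names `lat`, `lng`, and `loc` are assumptions about the record layout, and the helper names are hypothetical. With a driver such as pymongo, the `loc` field would first need a 2d index, and the query dictionary would be passed to the collection's `find` method.

```python
def to_geo_document(spot):
    """Copy a spot record and add a [longitude, latitude] pair for a 2d index."""
    document = dict(spot)
    # Legacy coordinate pairs are conventionally stored longitude first.
    document["loc"] = [spot["lng"], spot["lat"]]
    return document


def near_query(lat, lng):
    """Build a query that returns documents ordered by distance from a point."""
    return {"loc": {"$near": [lng, lat]}}


# Made-up spot record, for illustration only.
spot = {"name": "Example Coffee", "lat": 30.2672, "lng": -97.7431}
print(to_geo_document(spot))
print(near_query(30.2672, -97.7431))
```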
    29. Long-Term Storage
        • Several options, each with their own tradeoffs
        • Store the data as you intend to query it
        • De-normalization is likely
    30. Long-Term Storage
        • Try them out!
        • Knowing the data and the queries will help make the choice
        • High-level comparison of some options
            • http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
    31. Results
    32. Top Places in Austin
        [Chart: checkins per place, axis 0 to 4,000]
        • AUS Austin-Bergstrom
        • Whole Foods
        • Austin Convention Center
        • University of Texas
        • Alamo Drafthouse Cinema
        • Mutual Mobile
        • The Hilton
        • Stubbs Bar-B-Q
        • The Flying Saucer
        • The Driskill
        238,674 local checkins by locals
    33. When Do People Checkin?
        [Chart: checkins (axis 0 to 50,000) by time of day (hours 0 to 22)]
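A time-of-day chart like this one boils down to counting checkins per hour. The sketch below assumes each checkin carries a timestamp string in the form `2011-01-15T08:30:00`, already in local time; both the format and the function name are assumptions for illustration, not the provider's actual format.

```python
from collections import Counter
from datetime import datetime


def checkins_by_hour(timestamps):
    """Return a 24-element list with the number of checkins in each hour of the day."""
    counts = Counter(
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S").hour for ts in timestamps
    )
    return [counts.get(hour, 0) for hour in range(24)]


# Made-up timestamps, for illustration only.
sample = ["2011-01-15T08:30:00", "2011-01-15T08:45:10", "2011-01-16T19:05:00"]
histogram = checkins_by_hour(sample)
print(histogram)
```

The same function applied to the checkins of a single spot would give per-place curves like the coffee shop and grocery store charts.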
    34. Top Coffee Shops
        [Chart: checkins (axis 0 to 60) by time of day (hours 0 to 21) for Spider House and Halcyon]
    35. Top Grocery Stores
        [Chart: checkins (axis 0 to 300) by time of day (hours 0 to 21) for Whole Foods and Central Market-North Lamar]
    36. Questions?
    37. Thanks!
        • Shaun Dubuque, sdubuque@argiainc.com
        • Sandeep Parikh, sparikh@argiainc.com
        • SLIDES: http://www.slideshare.net/crcsmnky/getting-started-with-location-data
        • TWITTER: @argiainfo
        • WEB: http://www.argiainc.com