#rtgeo (Where 2.0 2011)


Published in: Technology

  1. 1. Real-Time Geo #rtgeo
  2. 2. Who am I?
  3. 3. Giving a real-time geo talk at @where20. How do you build stuff? #rtgeo
      19 Apr via Twitter for iPhone
      from Santa Clara Convention Center, 5001 Great America Parkway, Santa Clara, CA 95054
  4. 4. Background
      [] raffi@~/: cat /etc/services | grep wherehoo
      wherehoo 5859/udp # WHEREHOO
      wherehoo 5859/tcp # WHEREHOO
      Wherehoo (2000)
        ⇢ “The Stuff Around You”
        ⇢ “Wherehoo Server: An interactive location service for software agents and intelligent systems” - J. Youll, R. Krikorian
        ⇢ In your /etc/services file!
      BusRadio (2004)
        ⇢ Designed mobile computers to play media while also transmitting telemetry
        ⇢ Looked and sounded like a radio - but really a Linux computer
      OneHop (2007)
        ⇢ Bluetooth proximity-based social networking
  5. 5. Background
      Twitter
        ⇢ Originally tech lead of the API / Platform team
        ⇢ Built the first geo-based infrastructure before the acquisition of Mixer Labs in December 2009
        ⇢ Now lead of the Application Services group
        ⇢ Runs five teams focused on scalable infrastructure around “core” data objects
          ⇢ Tweets, users, timelines, places, etc.
          ⇢ Delivery, authentication, APIs, etc.
  6. 6. Table of contents
      Background
        ⇢ Why are we interested in this?
      Twitter’s geo APIs
        ⇢ How do we allow people to talk about place?
        ⇢ Context around “place”
      Problem statement
        ⇢ What do we want our system to do?
      Infrastructure
        ⇢ How is Twitter solving this problem?
  7. 7. People want to talk about places
  8. 8. What’s happening here? Twitter’s Geo APIs
  9. 9. Original attempts
      Adding it to the tweet
        ⇢ Use myloc.me, et al. to add text to the tweet
        ⇢ Puts location “in band”
        ⇢ Takes from the 140 characters
      Setting profile level locations
        ⇢ Set the user/location of a Twitter user
        ⇢ There’s an API for that!
        ⇢ Not on a per-tweet basis
        ⇢ Not intended for high frequency alterations
  10. 10. Profile level changes
      [] raffi@~/: twurl -d location="San Francisco, California" http://twitter.com/account/update_location.xml
      <user>
        <id>8285392</id>
        <name>raffi</name>
        <screen_name>raffi</screen_name>
        <location>San Francisco, California</location>
        ...
      </user>
  11. 11. Geotagging API
  12. 12. Geotagging API
      Adding it to the tweet
        ⇢ Per-tweet basis
        ⇢ Out of band and pure metadata
        ⇢ Does not take from the 140 characters
      Native Twitter support
        ⇢ Simple way to update status with location data
        ⇢ Ability to remove geotags from your tweets en masse
        ⇢ Using GeoRSS and GeoJSON as the encoding format
        ⇢ Across all Twitter APIs (REST, Search, and Streaming)
  13. 13. status/update
      [] raffi@~/: twurl -d "status=hey-ho&lat=37.3&long=-121.9" http://api.twitter.com/1/status/update.xml
      <status>
        <text>hey-ho</text>
        ...
        <geo xmlns:georss="http://www.georss.org/georss">
          <georss:point>37.3 -121.9</georss:point>
        </geo>
        ...
      </status>
  14. 14. Search
      The geocode parameter takes “latitude,longitude,radius”, where radius has units of mi or km
      [] raffi@~/: curl "http://search.twitter.com/search.atom?geocode=40.757929%2C-73.985506%2C25km&source=foursquare"
      ...
      <title>On the way to ace now, so whenever you can make it Ill be there. (@ Port Imperial Ferry in Weehawken) http://4sq.com/2rq0vO</title>
      ...
      <twitter:geo>
        <georss:point>40.7759 -74.0129</georss:point>
      </twitter:geo>
      ...
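The geocode value is easy to get wrong by hand, so here is a minimal Python sketch of assembling it. The endpoint and the `source` parameter come from the slide; the helper names `geocode_param` and `search_url` are hypothetical, and the range checks are an assumption rather than documented API behaviour.

```python
from urllib.parse import urlencode


def geocode_param(lat, lon, radius, unit="km"):
    """Format the "latitude,longitude,radius" value; radius is in mi or km."""
    if unit not in ("mi", "km"):
        raise ValueError("unit must be 'mi' or 'km'")
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError("latitude or longitude out of range")
    return f"{lat},{lon},{radius}{unit}"


def search_url(lat, lon, radius, unit="km", **extra):
    """Build the search URL from the slide; extra query parameters pass through."""
    query = {"geocode": geocode_param(lat, lon, radius, unit), **extra}
    # urlencode percent-encodes the commas, matching the %2C in the slide's URL.
    return "http://search.twitter.com/search.atom?" + urlencode(query)


url = search_url(40.757929, -73.985506, 25, source="foursquare")
```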
  15. 15. geohose
  16. 16. location filtering
      [] raffi@~/: curl "http://stream.twitter.com/1/statuses/filter.xml?locations=-74.5129,40.2759,-73.5019,41.2759"
      locations is a bounding box specified by “long1,lat1,long2,lat2” and can track up to 10 locations that are at most 1 degree square (~60 miles square and enough to cover most metropolitan areas)
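A short Python sketch of building the locations value and of the box test it implies. The function names are hypothetical, and treating (long1, lat1) as the south-west corner is an assumption of this sketch; boxes that cross the antimeridian are not handled.

```python
def locations_param(boxes, max_boxes=10):
    """Join (long1, lat1, long2, lat2) boxes into one "locations" value."""
    if len(boxes) > max_boxes:
        raise ValueError(f"at most {max_boxes} boxes can be tracked")
    parts = []
    for long1, lat1, long2, lat2 in boxes:
        # Assumption: (long1, lat1) is the south-west corner of the box.
        if long1 >= long2 or lat1 >= lat2:
            raise ValueError("expected the south-west corner first")
        parts.extend(str(value) for value in (long1, lat1, long2, lat2))
    return ",".join(parts)


def in_box(box, lon, lat):
    """True when the point lies inside the box or on its edge."""
    long1, lat1, long2, lat2 = box
    return long1 <= lon <= long2 and lat1 <= lat <= lat2


new_york_area = (-74.5129, 40.2759, -73.5019, 41.2759)
param = locations_param([new_york_area])
```

The point from the search example earlier (40.7759, -74.0129) falls inside this box, which is the kind of tweet the filter would deliver.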
  17. 17. Trends API
  18. 18. Trends API
      Global Trends
        ⇢ Analysis of “hot conversations”
        ⇢ Does not take from the 140 characters
      Location specific trends
        ⇢ Tweets being localized through a variety of means internally
        ⇢ Locations exposed over the API as WOEIDs and Twitter IDs
        ⇢ Can ask for available trends sorted by distance
  19. 19. available locations
      [] raffi@~/: curl "http://api.twitter.com/1/trends/available.xml"
      <locations type="array">
        <location>
          <woeid>2487956</woeid>
          <name>San Francisco</name>
          <placeTypeName code="7">Town</placeTypeName>
          <country type="Country" code="US">United States</country>
          <url>http://where.yahooapis.com/v1/place/2487956</url>
        </location>
        ...
      </locations>
      Can optionally take a lat and long parameter to have trends locations returned sorted by distance from you.
  20. 20. Local trends
      Look up a trend at a given WOEID
      [] raffi@~/: curl "http://api.twitter.com/1/trends/2487956.xml"
      <matching_trends type="array">
        <trends as_of="2009-12-15T20:19:09Z">
          ...
          <trend url="http://search.twitter.com/search?q=Golden+Globe+nominations" query="Golden+Globe+nominations">Golden Globe nominations</trend>
          <trend url="http://search.twitter.com/search?q=%23somethingaintright" query="%23somethingaintright">#somethingaintright</trend>
          ...
        </trends>
      </matching_trends>
  21. 21. What’s in a name?
  22. 22. A place is a name
      5001 Great America Parkway, Santa Clara, CA 95054
      Great America Parkway and Tasman Drive
      The Bay Area
      Santa Clara convention center
      Twitter ID 3b7dd0d93e661e18
  23. 23. how do users want to share “where”?
  24. 24. Sharing coordinates
      More aptly named “geotagging”
      Good for sharing photos
      Possibly good for talking about a specific place (e.g. store, restaurant)
      People don’t understand numbers and without a map, there is a lack of context
      Huge privacy implications
  25. 25. Sharing polygons
      Privacy implications are potentially better
      If you thought sharing one pair of numbers was bad...
      Questions around polygon definition
      Still unable to visualize unless on a map
  26. 26. Sharing names
      Has the potential to make a connection with users
      Distinguishes a “named place” from simply a “place”
      Inverse relationship between granularity and connection
      Rather large internationalization / context implications
  27. 27. Geo-place API
  28. 28. Geo-place API
      Support for “names”
        ⇢ Not just coordinates
        ⇢ More contextually relevant
        ⇢ Positive privacy benefits
      Increased complexity
        ⇢ Need to be able to look up a list of places
        ⇢ Requires a “reverse geocoder”
        ⇢ Human-driven tagging; not possible to fully automate
  29. 29. Search
      [] raffi@~/: curl "http://api.twitter.com/1/geo/search.json?lat=37.3&long=-121.9"
      ...
      "place_type":"neighborhood",
      "country_code":"US",
      "contained_within": [...],
      "full_name":"Willow Glen",
      "bounding_box": {
        "type":"Polygon",
        "coordinates": [[
          [-121.92481908, 37.275903], [-121.88083608, 37.275903],
          [-121.88083608, 37.31548203], [-121.92481908, 37.31548203]
        ]]
      },
      "name":"Willow Glen",
      "id":"46bc64ecd1da2a46",
      ...
  30. 30. Tweeting with a place
      [] raffi@~/: twurl -d "status=hey-ho&place_id=46bc64ecd1da2a46" http://api.twitter.com/1/status/update.xml
      <status>
        <text>hey-ho</text>
        ...
        <place xmlns:georss="http://www.georss.org/georss">
          <id>46bc64ecd1da2a46</id>
          <name>Willow Glen</name>
          <full_name>Willow Glen</full_name>
          <place_type>neighborhood</place_type>
          <url>http://api.twitter.com/1/geo/id/46bc64ecd1da2a46.json</url>
          <country code="US">United States</country>
        </place>
        ...
      </status>
  31. 31. Problem statement
      What do we want our system to do?
  32. 32. what do we need to build?
      Database of places
        ⇢ Given a real-world location, find places
        ⇢ Spatial search
      Method to store places with content
        ⇢ Per user basis
        ⇢ Per tweet basis
  33. 33. spatial lookup and index
  34. 34. as background... MySQL + GIS
      Ability to index points and do a spatial query
        ⇢ For example, get points within a bounding rectangle
        ⇢ SELECT MBRContains(GeomFromText('Polygon((0 0, 0 3, 3 3, 3 0, 0 0))'), coord) FROM geometry
      Hard to cache the spatial query
      Possibly requires a DB hit on every query
  35. 35. options
      Grid / quad-tree
        ⇢ Create a grid (possibly nested) of the entire Earth
      Geohash
        ⇢ Arbitrarily precise and hierarchical spatial data reference
      Space filling curves
        ⇢ Mapping 2D space into 1D while preserving locality
      R-Tree
        ⇢ Spatial access data structure
  36. 36. Grid / Quad-Tree
  37. 37. Grid / Quad-Tree
      Recursively subdivide regions
      Trie structure to store “prefixes”
      Spatially oriented data structure
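To make the "recursively subdivide, store prefixes in a trie" idea concrete, here is a small Python sketch. The quadrant numbering (0 = NW, 1 = NE, 2 = SW, 3 = SE), the names `quad_key` and `QuadTrie`, and the sample places are all assumptions made for illustration, not something taken from the talk.

```python
def quad_key(lat, lon, depth):
    """Subdivide the globe `depth` times; emit one quadrant digit per level."""
    south, north, west, east = -90.0, 90.0, -180.0, 180.0
    digits = []
    for _ in range(depth):
        mid_lat, mid_lon = (south + north) / 2, (west + east) / 2
        quadrant = 0
        if lon >= mid_lon:
            quadrant |= 1          # eastern half
            west = mid_lon
        else:
            east = mid_lon
        if lat < mid_lat:
            quadrant |= 2          # southern half
            north = mid_lat
        else:
            south = mid_lat
        digits.append(str(quadrant))
    return "".join(digits)


class QuadTrie:
    """Trie keyed by quadrant digits; items hang off the node for their key."""

    def __init__(self):
        self.root = {}

    def insert(self, key, item):
        node = self.root
        for digit in key:
            node = node.setdefault(digit, {})
        node.setdefault("items", []).append(item)

    def under(self, prefix):
        """Return every item whose key starts with `prefix`."""
        node = self.root
        for digit in prefix:
            node = node.get(digit)
            if node is None:
                return []
        found, stack = [], [node]
        while stack:
            current = stack.pop()
            for key, child in current.items():
                if key == "items":
                    found.extend(child)
                else:
                    stack.append(child)
        return found


trie = QuadTrie()
for name, lat, lon in [("San Jose", 37.3, -121.9),
                       ("Santa Clara", 37.35, -121.95),
                       ("New York", 40.7, -74.0)]:
    trie.insert(quad_key(lat, lon, 8), name)
```

A shared prefix means a shared cell, so `trie.under(prefix)` is the spatial query: a longer prefix means a smaller cell.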
  38. 38. Geohash
  39. 39. Geohash
      37°18′N 121°54′W = 9q9k4
      Hierarchical spatial data structure
      Precision encoded
      Distance captured
        ⇢ Nearby places (usually) share the same prefix
        ⇢ The longer the string match, the closer the places are
  40. 40. Geohash
      9q9k4 = 01001 / 10110 / 01001 / 10010 / 00100
      Longitude bits = 0010100101010
        ⇢ -90.0 (0), -135.0 (0), -112.5 (1), -123.75 (0), -118.125 (1), -120.9375 (0), -122.34375 (0), -121.640625 (1), -121.9921875 (0), -121.81640625 (1), -121.904296875 (0), -121.8603515625 (1), -121.88232421875 (0) = 121°53′W
      Latitude bits = 101101010000
        ⇢ 45.0 (1), 22.5 (0), 33.75 (1), 39.375 (1), 36.5625 (0), 37.96875 (1), 37.265625 (0), 37.6171875 (1), 37.44140625 (0), 37.353515625 (0), 37.3095703125 (0), 37.28759765625 (0) = 37°17′N
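The bit-by-bit halving above can be sketched in Python as follows. This is a minimal encoder written for illustration (the function name `geohash_encode` is hypothetical); it alternates longitude and latitude bits, starting with longitude, and maps each group of five bits to the standard geohash base-32 alphabet.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, length=5):
    """Encode a coordinate as a geohash string of `length` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_lon = True  # bits alternate, longitude first
    while len(bits) < length * 5:
        bounds, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (bounds[0] + bounds[1]) / 2
        if value >= mid:
            bits.append(1)
            bounds[0] = mid   # keep the upper half
        else:
            bits.append(0)
            bounds[1] = mid   # keep the lower half
        use_lon = not use_lon
    chars = []
    for start in range(0, len(bits), 5):
        index = 0
        for bit in bits[start:start + 5]:
            index = (index << 1) | bit
        chars.append(BASE32[index])
    return "".join(chars)


willow_glen = geohash_encode(37.3, -121.9, 5)
```

For 37.3, -121.9 this reproduces the 25 bits on the slide (13 longitude bits interleaved with 12 latitude bits).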
  41. 41. Geohash
      Possible to do range query in database
        ⇢ Matching based on prefix will return all the points that fit in the “grid”
        ⇢ Able to store 2D data in a 1D space
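A prefix match can be phrased as an ordinary indexed range scan. The sketch below uses Python's built-in sqlite3 as a stand-in for the MySQL store mentioned in the talk; the table layout, the place names, and the sample geohashes other than 9q9k4 are made up for illustration.

```python
import sqlite3


def prefix_bounds(prefix):
    """Half-open range covering every geohash that starts with `prefix`.

    "{" is the ASCII character right after "z", the highest character in the
    geohash alphabet, so prefix + "{" sorts above every possible extension.
    """
    return prefix, prefix + "{"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, geohash TEXT)")
conn.execute("CREATE INDEX places_geohash ON places (geohash)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?)",
    [("a", "9q9k4"), ("b", "9q9hv"), ("c", "9q8yy"), ("d", "dr5ru")],
)


def places_in_cell(connection, prefix):
    """Return names of places whose geohash falls inside the prefix's cell."""
    low, high = prefix_bounds(prefix)
    rows = connection.execute(
        "SELECT name FROM places WHERE geohash >= ? AND geohash < ? ORDER BY name",
        (low, high),
    )
    return [name for (name,) in rows]
```

One caveat worth remembering with this approach: two points on opposite sides of a cell boundary can be close together yet share only a short prefix, so a real lookup usually also checks the neighbouring cells.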
  42. 42. Space filling curve
  43. 43. Space filling curve
      Generalization of geohash
        ⇢ 2D to 1D mapping
        ⇢ Nearness is captured
      Can recursively fill up space depending on the resolution required
      Fractal-like pattern can be used to take up as much room as possible
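One concrete space filling curve is the Z-order (Morton) curve, which is the same bit interleaving geohash uses. The Python sketch below works on integer grid cells; the function names and the choice of putting the x bit above the y bit are assumptions of this example. Other curves, such as the Hilbert curve, preserve locality better but take more code.

```python
def interleave(x, y, bits=16):
    """Map grid cell (x, y) to its position along a Z-order curve."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i + 1)  # x bits go to odd positions
        code |= ((y >> i) & 1) << (2 * i)      # y bits go to even positions
    return code


def deinterleave(code, bits=16):
    """Recover the (x, y) grid cell from a Z-order position."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i + 1)) & 1) << i
        y |= ((code >> (2 * i)) & 1) << i
    return x, y


# Visiting a 2x2 grid in curve order traces the "Z" shape.
z_order = sorted(((x, y) for x in range(2) for y in range(2)),
                 key=lambda cell: interleave(*cell, bits=1))
```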
  44. 44. R-Tree
  45. 45. R-Tree
      Height-balanced tree data structure for spatial data
      Uses hierarchically nested bounding boxes
      Nearby elements are placed in the same node
  46. 46. Representations
  47. 47. GeoRSS / GeoJSON
      http://www.georss.org/ & http://geojson.org/
      <georss:point>37.3 -121.9</georss:point>
      { "type":"Point", "coordinates":[-121.9, 37.3] }
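Note that the two encodings order the coordinates differently: a GeoRSS point is "latitude longitude", while GeoJSON is [longitude, latitude]. A minimal Python sketch of converting between them (the function names are hypothetical):

```python
def georss_point_to_geojson(text):
    """Convert the text of a <georss:point> ("lat lon") to a GeoJSON Point."""
    lat_text, lon_text = text.split()
    return {"type": "Point", "coordinates": [float(lon_text), float(lat_text)]}


def geojson_to_georss_point(geometry):
    """Convert a GeoJSON Point to the "lat lon" text used by <georss:point>."""
    if geometry.get("type") != "Point":
        raise ValueError("only Point geometries are supported in this sketch")
    lon, lat = geometry["coordinates"][:2]
    return f"{lat} {lon}"


point = georss_point_to_geojson("37.3 -121.9")
```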
  48. 48. How do you store precision?
      “Precision” is a hard thing to encode
      Accuracy can be encoded with an error radius
      Twitter opts for tracking the number of decimals passed
        ⇢ 140.0 != 140.00
        ⇢ DecimalTrackingFloat
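The slide only names `DecimalTrackingFloat`; the Python sketch below is a guess at the idea (remember how many decimals the client sent, and treat that as part of the value), not Twitter's implementation. It assumes the input arrives as finite numeric text.

```python
from decimal import Decimal


class DecimalTrackingFloat:
    """A float that remembers how many decimal places were passed in."""

    def __init__(self, text):
        self.text = text.strip()
        self.value = float(self.text)
        # Decimal keeps trailing zeros, so "140.00" has exponent -2.
        exponent = Decimal(self.text).as_tuple().exponent
        self.decimals = max(0, -exponent)

    def __eq__(self, other):
        if not isinstance(other, DecimalTrackingFloat):
            return NotImplemented
        return (self.value, self.decimals) == (other.value, other.decimals)

    def __hash__(self):
        return hash((self.value, self.decimals))

    def __str__(self):
        # Echo the value back with the same number of decimals it came with.
        return f"{self.value:.{self.decimals}f}"


coarse = DecimalTrackingFloat("140.0")
fine = DecimalTrackingFloat("140.00")
```

Here `coarse` and `fine` compare unequal even though both hold the float 140.0, which mirrors the "140.0 != 140.00" bullet.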
  49. 49. Twitter infrastructure
      Ruby on Rails-ish frontend
      Scala-based services backend
      MySQL and soon to be Cassandra as the store
      RPC to back-end or put items into queues
  50. 50. Simplified architecture
      R-Tree for spatial lookup
        ⇢ Data provider for front-end lookups
        ⇢ Store place object with envelope of place in R-Tree
      Mapping from ID to place object
  51. 51. Java Topology Suite (JTS)
      http://www.vividsolutions.com/jts/jtshome.htm
      Open source
      Good for representing and manipulating “geometries”
      Has support for fundamental geometric operations
        ⇢ contains
        ⇢ envelope
      Has an R-Tree implementation
  52. 52. pointInside in polygon? true
      pointOutside in polygon? false
  53. 53. at (0.0, 0.0) -- region 1
      at (1.0, 1.0) -- region 1 -- region 2
      at (2.0, 2.0) -- region 1 -- region 2
      at (3.0, 3.0) -- region 2
      at (4.0, 4.0) -- empty
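The code behind these two output slides did not survive, so what follows is a Python sketch of the same kind of lookup, not the JTS code from the talk. It filters candidates by envelope first (a linear scan stands in for the R-Tree) and then runs an exact point-in-polygon test that counts boundary points as inside. The two square regions are an assumption, chosen only because they are consistent with the output shown above.

```python
def envelope(polygon):
    """Bounding box (min_x, min_y, max_x, max_y) of a list of (x, y) vertices."""
    xs, ys = [p[0] for p in polygon], [p[1] for p in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def on_segment(p, a, b):
    """True when p lies on the segment from a to b."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if cross != 0:
        return False
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def covers(polygon, p):
    """Ray-casting test; points on the boundary count as covered."""
    inside = False
    for i in range(len(polygon)):
        a, b = polygon[i], polygon[(i + 1) % len(polygon)]
        if on_segment(p, a, b):
            return True
        if (a[1] > p[1]) != (b[1] > p[1]):
            x_cross = a[0] + (p[1] - a[1]) * (b[0] - a[0]) / (b[1] - a[1])
            if p[0] < x_cross:
                inside = not inside
    return inside


# Hypothetical regions: two overlapping squares.
REGIONS = {
    "region 1": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "region 2": [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)],
}


def regions_at(point):
    """Names of regions covering the point: envelope filter, then exact test."""
    hits = []
    for name, polygon in REGIONS.items():
        min_x, min_y, max_x, max_y = envelope(polygon)
        if not (min_x <= point[0] <= max_x and min_y <= point[1] <= max_y):
            continue  # cheap rejection, the job an R-Tree does at scale
        if covers(polygon, point):
            hits.append(name)
    return hits
```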
  54. 54. Java Topology Suite (JTS)
      Serializers and deserializers
        ⇢ Well-known text (WKT)
        ⇢ Well-known binary (WKB)
        ⇢ No GeoRSS or GeoJSON support
  55. 55. interface / RPC
      RockDove is a backend service
        ⇢ Data provider for front-end lookups
        ⇢ Uses some form of RPC (Thrift, Avro, etc.) to communicate
        ⇢ Data could be cached on the frontend to prevent lookups
      Simple RPC interface
        ⇢ get(id)
        ⇢ containedWithin(lat, long)
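As a rough illustration of this two-call interface and of caching on the front end, here is a Python sketch. Everything in it is hypothetical: the class names, the in-process "service" standing in for an RPC backend, and the use of a plain bounding box instead of a real geometry. The Willow Glen place and its bounding box are taken from the search example earlier in the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    id: str
    name: str
    bbox: tuple  # (west, south, east, north)


class InMemoryPlaceService:
    """Stand-in for the backend; a real one would sit behind Thrift, Avro, etc."""

    def __init__(self, places):
        self._by_id = {place.id: place for place in places}
        self.calls = 0  # counts simulated RPC round trips

    def get(self, place_id):
        self.calls += 1
        return self._by_id.get(place_id)

    def contained_within(self, lat, long):
        self.calls += 1
        return [p for p in self._by_id.values()
                if p.bbox[0] <= long <= p.bbox[2] and p.bbox[1] <= lat <= p.bbox[3]]


class CachingFrontend:
    """Front end that remembers get(id) results to avoid repeat lookups."""

    def __init__(self, service):
        self._service = service
        self._cache = {}

    def get(self, place_id):
        if place_id not in self._cache:
            self._cache[place_id] = self._service.get(place_id)
        return self._cache[place_id]


service = InMemoryPlaceService([
    Place("46bc64ecd1da2a46", "Willow Glen",
          (-121.92481908, 37.275903, -121.88083608, 37.31548203)),
])
frontend = CachingFrontend(service)
```

Caching `get(id)` is the easy half because place objects change rarely; `containedWithin` results are harder to cache since nearly every coordinate pair is unique.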
  56. 56. Interface / RPC
      Watch those RPC queues!
      Fail fast and potentially throw “over capacity” messages
        ⇢ get(id) throws OverCapacity
        ⇢ containedWithin(lat, long) throws OverCapacity
      Distinguish between write path and read path
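One way to "fail fast" is to bound the queue of pending requests and reject new work as soon as it is full, instead of letting callers wait. The Python sketch below shows that idea only; `BoundedDispatcher` and its methods are hypothetical names, and `OverCapacity` borrows its name from the slide.

```python
import queue


class OverCapacity(Exception):
    """Raised instead of queueing when the backend is already saturated."""


class BoundedDispatcher:
    """Accepts requests only while the pending queue has room."""

    def __init__(self, max_pending):
        self._pending = queue.Queue(maxsize=max_pending)

    def submit(self, request):
        try:
            # put_nowait never blocks: a full queue is reported immediately.
            self._pending.put_nowait(request)
        except queue.Full:
            raise OverCapacity("too many pending requests") from None

    def take(self):
        """Hand the oldest pending request to a worker."""
        return self._pending.get_nowait()


dispatcher = BoundedDispatcher(max_pending=2)
dispatcher.submit(("get", "46bc64ecd1da2a46"))
dispatcher.submit(("containedWithin", 37.3, -121.9))
```

A third `submit` at this point raises `OverCapacity` right away, which the front end can turn into an "over capacity" response. Separate dispatchers for reads and writes would keep a write backlog from starving lookups.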
  57. 57. georuby
      http://georuby.rubyforge.org/
      Open source
      OpenGIS Simple Features Interface Standard
      Only good for representing geometric entities
      GeoRuby::SimpleFeatures::Geometry::from_ewkb
      No GeoJSON serializers
  58. 58. “front-end”
  59. 59. where do you actually get location from?
  60. 60. Triangulation: Cellular
      200m to 1km accuracy
      Measuring signal strength to cell towers with known locations
      If only one cellular tower is visible, fall back to cellular tower identification - better than nothing, but really inaccurate
      Requires cellular modem, software, and lookups
  61. 61. Triangulation: Wifi
      Sub 20m accuracy
      Works indoors and in urban areas
      Doesn’t need dedicated hardware, just an 802.11 radio
      Relatively quick time to get a position
  62. 62. Triangulation: GPS
      Sub 1m accuracy
      Need dedicated GPS hardware
      Prone to multi-path confusion, especially in cities
      Needs line of sight to the sky
      Doesn’t work well indoors
      Potentially takes a few minutes to get a lock
  63. 63. Association
      IP address to geographical mapping
      All done on the server side
      Maybe “good” for city level
        ⇢ Maxmind has 83% at 40km
        ⇢ Very error prone
        ⇢ Gets wonky when dealing with cellular connections or rather large ISPs
      Database needs to be refreshed fairly frequently
  64. 64. Extraction
      Read the text and understand intent
      Hard to understand whether talking from a place, or about a place
      Running text through a geocoder (Google, Yahoo, Geocoder.us)
      Parsing structured URLs and then crawling “place pages”
  65. 65. location in browser
      Geolocation API Specification for JavaScript
      navigator.geolocation.getCurrentPosition
      Does a callback with a position object
      position.coords has
        ⇢ latitude and longitude
        ⇢ accuracy
        ⇢ other stuff
      Support in Firefox 3.5, Chrome 5, Opera 10.6, and others with Google Gears
  66. 66. Questions?
      Follow me at twitter.com/raffi