#rtgeo (Where 2.0 2011)


Transcript

  • 1. Real-Time Geo #rtgeo
  • 2. Who am I?
  • 3. Giving a real-time geo talk at @where20. How do you build stuff? #rtgeo. 19 Apr via Twitter for iPhone from Santa Clara Convention Center, 5001 Great America Parkway, Santa Clara, CA 95054
  • 4. Background
      raffi@~/: cat /etc/services | grep wherehoo
      wherehoo 5859/tcp # WHEREHOO
      wherehoo 5859/udp # WHEREHOO
      Wherehoo (2000)
      ⇢ “The Stuff Around You”
      ⇢ “Wherehoo Server: An interactive location service for software agents and intelligent systems” - J. Youll, R. Krikorian
      ⇢ In your /etc/services file!
      BusRadio (2004)
      ⇢ Designed mobile computers to play media while also transmitting telemetry
      ⇢ Looked and sounded like a radio, but really a Linux computer
      OneHop (2007)
      ⇢ Bluetooth proximity-based social networking
  • 5. Background
      Twitter
      ⇢ Originally tech lead of API / Platform team
      ⇢ Built the first geo-based infrastructure before acquisition of Mixer Labs in December of 2009
      ⇢ Now lead of the Application Services group
      ⇢ Runs five teams focused on scalable infrastructure around “core” data objects
        ⇢ Tweets, users, timelines, places, etc.
        ⇢ Delivery, authentication, APIs, etc.
  • 6. Table of contents
      Background
      ⇢ Why are we interested in this?
      Twitter’s geo APIs
      ⇢ How do we allow people to talk about place?
      ⇢ Context around “place”
      Problem statement
      ⇢ What do we want our system to do?
      Infrastructure
      ⇢ How is Twitter solving this problem?
  • 7. People want to talk about places
  • 8. What’s happening here? Twitter’s Geo APIs
  • 9. Original attempts
      Adding it to the tweet
      ⇢ Use myloc.me, et al. to add text to the tweet
      ⇢ Puts location “in band”
      ⇢ Takes from the 140 characters
      Setting profile level locations
      ⇢ Set the user/location of a Twitter user
      ⇢ There’s an API for that!
      ⇢ Not on a per-tweet basis
      ⇢ Not intended for high frequency alterations
  • 10. Profile level changes
      raffi@~/: twurl -d location="San Francisco, California" http://twitter.com/account/update_location.xml
      <user>
        <id>8285392</id>
        <name>raffi</name>
        <screen_name>raffi</screen_name>
        <location>San Francisco, California</location>
        ...
      </user>
  • 11. Geotagging API
  • 12. Geotagging API
      Adding it to the tweet
      ⇢ Per-tweet basis
      ⇢ Out of band and pure metadata
      ⇢ Does not take from the 140 characters
      Native Twitter support
      ⇢ Simple way to update status with location data
      ⇢ Ability to remove geotags from your tweets en masse
      ⇢ Using GeoRSS and GeoJSON as the encoding format
      ⇢ Across all Twitter APIs (REST, Search, and Streaming)
  • 13. status/update
      raffi@~/: twurl -d "status=hey-ho&lat=37.3&long=-121.9" http://api.twitter.com/1/status/update.xml
      <status>
        <text>hey-ho</text>
        ...
        <geo xmlns:georss="http://www.georss.org/georss">
          <georss:point>37.3 -121.9</georss:point>
        </geo>
        ...
      </status>
  • 14. Search
      geocode parameter takes “latitude,longitude,radius” where radius has units of mi or km
      raffi@~/: curl "http://search.twitter.com/search.atom?geocode=40.757929%2C-73.985506%2C25km&source=foursquare"
      ...
      <title>On the way to ace now, so whenever you can make it Ill be there. (@ Port Imperial Ferry in Weehawken) http://4sq.com/2rq0vO</title>
      ...
      <twitter:geo>
        <georss:point>40.7759 -74.0129</georss:point>
      </twitter:geo>
      ...
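
To make the radius semantics of the geocode parameter concrete, here is a minimal Python sketch of a client-side check. It is not how Twitter's search evaluates the query; `parse_geocode`, `haversine_km`, and `within_geocode` are hypothetical helper names, the input is assumed to be the URL-decoded parameter value, and the Earth is treated as a sphere of mean radius 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius; a spherical approximation
KM_PER_MILE = 1.609344


def parse_geocode(value):
    """Split a decoded "latitude,longitude,radius" value; radius ends in km or mi."""
    lat, lon, radius = value.split(",")
    unit = radius[-2:]
    if unit not in ("km", "mi"):
        raise ValueError("radius must end in 'km' or 'mi'")
    radius_km = float(radius[:-2])
    if unit == "mi":
        radius_km *= KM_PER_MILE
    return float(lat), float(lon), radius_km


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_geocode(value, lat, lon):
    """True if (lat, lon) lies inside the circle described by the geocode value."""
    center_lat, center_lon, radius_km = parse_geocode(value)
    return haversine_km(center_lat, center_lon, lat, lon) <= radius_km
```

With the values from the slide, the Weehawken tweet at 40.7759, -74.0129 is a few kilometres from the query centre, so it falls inside the 25 km radius.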
  • 15. geohose
  • 16. location filtering
      raffi@~/: curl "http://stream.twitter.com/1/statuses/filter.xml?locations=-74.5129,40.2759,-73.5019,41.2759"
      locations is a bounding box specified by “long1,lat1,long2,lat2” and can track up to 10 locations that are at most 1 degree square (~60 miles square and enough to cover most metropolitan areas)
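
The “long1,lat1,long2,lat2” format can be illustrated with a short Python sketch. This only shows how a client might interpret the parameter; it says nothing about how the streaming service filters. `parse_locations` and `in_box` are hypothetical names, and the sketch assumes boxes are given south-west corner first and do not cross the antimeridian.

```python
def parse_locations(value):
    """Parse "long1,lat1,long2,lat2[,...]" into (west, south, east, north) tuples."""
    numbers = [float(part) for part in value.split(",")]
    if not numbers or len(numbers) % 4 != 0:
        raise ValueError("expected groups of four comma-separated numbers")
    return [tuple(numbers[i:i + 4]) for i in range(0, len(numbers), 4)]


def in_box(box, lon, lat):
    """True if the point lies inside the box (edges included)."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def matches(value, lon, lat):
    """True if the point falls in any of the boxes in the locations value."""
    return any(in_box(box, lon, lat) for box in parse_locations(value))
```

Note the longitude-first ordering, which is the opposite of the “latitude,longitude,radius” ordering used by the search geocode parameter.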
  • 17. Trends API
  • 18. Trends API
      Global Trends
      ⇢ Analysis of “hot conversations”
      ⇢ Does not take from the 140 characters
      Location specific trends
      ⇢ Tweets being localized through a variety of means internally
      ⇢ Locations exposed over the API as WOEIDs and Twitter IDs
      ⇢ Can ask for available trends sorted by distance
  • 19. available locations
      raffi@~/: curl "http://api.twitter.com/1/trends/available.xml"
      <locations type="array">
        <location>
          <woeid>2487956</woeid>
          <name>San Francisco</name>
          <placeTypeName code="7">Town</placeTypeName>
          <country type="Country" code="US">United States</country>
          <url>http://where.yahooapis.com/v1/place/2487956</url>
        </location>
        ...
      </locations>
      Can optionally take a lat and long parameter to have trends locations returned, sorted, as distance from you.
  • 20. Local trend
      Look up a trend at a given WOEID
      raffi@~/: curl "http://api.twitter.com/1/trends/2487956.xml"
      <matching_trends type="array">
        <trends as_of="2009-12-15T20:19:09Z">
          ...
          <trend url="http://search.twitter.com/search?q=Golden+Globe+nominations" query="Golden+Globe+nominations">Golden Globe nominations</trend>
          <trend url="http://search.twitter.com/search?q=%23somethingaintright" query="%23somethingaintright">#somethingaintright</trend>
          ...
        </trends>
      </matching_trends>
  • 21. What’s in a name?
  • 22. A place is a name
      5001 Great America Parkway, Santa Clara, CA 95054
      Great America Parkway and Tasman Drive
      The Bay Area
      Santa Clara convention center
      Twitter ID 3b7dd0d93e661e18
  • 23. how do users want to share “where”?
  • 24. Sharing coordinates
      More aptly named “geotagging”
      Good for sharing photos
      Possibly good for talking about a specific place (e.g. store, restaurant)
      People don’t understand numbers and without a map, there is a lack of context
      Huge privacy implications
  • 25. Sharing polygons
      Privacy implications are potentially better
      If you thought sharing one pair of numbers was bad...
      Questions around polygon definition
      Still unable to visualize unless on a map
  • 26. Sharing names
      Has the potential to make a connection with users
      Distinguishes a “named place” from simply a “place”
      Inverse relationship between granularity and connection
      Rather large internationalization / context implications
  • 27. Geo-place API
  • 28. Geo-place API
      Support for “names”
      ⇢ Not just coordinates
      ⇢ More contextually relevant
      ⇢ Positive privacy benefits
      Increased complexity
      ⇢ Need to be able to look up a list of places
      ⇢ Requires a “reverse geocoder”
      ⇢ Human driven tagging and not possible to be fully automatic
  • 29. Search
      raffi@~/: curl "http://api.twitter.com/1/geo/search.json?lat=37.3&long=-121.9"
      ...
      "place_type":"neighborhood",
      "country_code":"US",
      "contained_within": [...],
      "full_name":"Willow Glen",
      "bounding_box": {
        "type":"Polygon",
        "coordinates": [[
          [-121.92481908, 37.275903], [-121.88083608, 37.275903],
          [-121.88083608, 37.31548203], [-121.92481908, 37.31548203]
        ]]
      },
      "name":"Willow Glen",
      "id":"46bc64ecd1da2a46",
      ...
  • 30. Tweeting with a place
      raffi@~/: twurl -d "status=hey-ho&place_id=46bc64ecd1da2a46" http://api.twitter.com/1/status/update.xml
      <status>
        <text>hey-ho</text>
        ...
        <place xmlns:georss="http://www.georss.org/georss">
          <id>46bc64ecd1da2a46</id>
          <name>Willow Glen</name>
          <full_name>Willow Glen</full_name>
          <place_type>neighborhood</place_type>
          <url>http://api.twitter.com/1/geo/id/46bc64ecd1da2a46.json</url>
          <country code="US">United States</country>
        </place>
        ...
      </status>
  • 31. Problem statement
      What do we want our system to do?
  • 32. what do we need to build?
      Database of places
      ⇢ Given a real-world location, find places
      ⇢ Spatial search
      Method to store places with content
      ⇢ Per user basis
      ⇢ Per tweet basis
  • 33. spatial lookup and index
  • 34. as background... MySQL + GIS
      Ability to index points and do a spatial query
      ⇢ For example, get points within a bounding rectangle
      ⇢ SELECT MBRContains(GeomFromText('Polygon((0 0, 0 3, 3 3, 3 0, 0 0))'), coord) FROM geometry
      Hard to cache the spatial query
      Possibly requires a DB hit on every query
  • 35. options
      Grid / quad-tree
      ⇢ Create a grid (possibly nested) of the entire Earth
      Geohash
      ⇢ Arbitrarily precise and hierarchical spatial data reference
      Space filling curves
      ⇢ Mapping 2D space into 1D while preserving locality
      R-Tree
      ⇢ Spatial access data structure
  • 36. Grid / Quad-Tree
  • 37. Grid / Quad-Tree
      Recursively subdivide regions
      Trie structure to store “prefixes”
      Spatially oriented data structure
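
The recursive subdivision and the “prefix” idea can be sketched in a few lines of Python. This is an illustrative sketch, not the scheme used in the talk: `quadkey` is a hypothetical function, and the quadrant numbering (0 = north-west, 1 = north-east, 2 = south-west, 3 = south-east) is an arbitrary choice made here.

```python
def quadkey(lat, lon, depth):
    """Return a string of quadrant digits locating (lat, lon) at the given depth."""
    south, north = -90.0, 90.0
    west, east = -180.0, 180.0
    digits = []
    for _ in range(depth):
        mid_lat = (south + north) / 2
        mid_lon = (west + east) / 2
        quadrant = 0
        if lon >= mid_lon:      # eastern half
            quadrant |= 1
            west = mid_lon
        else:
            east = mid_lon
        if lat < mid_lat:       # southern half
            quadrant |= 2
            north = mid_lat
        else:
            south = mid_lat
        digits.append(str(quadrant))
    return "".join(digits)
```

Each extra digit halves the cell in both directions, so the key for a coarse cell is always a prefix of the key for any finer cell inside it. That is the property that makes a trie a natural container for such keys.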
  • 38. Geohash
  • 39. geohash
      37°18’N 121°54’W = 9q9k4
      Hierarchical spatial data structure
      Precision encoded
      Distance captured
      ⇢ Nearby places (usually) share the same prefix
      ⇢ The longer the string match, the closer the places are
  • 40. Geohash
      9q9k4 = 01001 / 10110 / 01001 / 10010 / 00100
      Longitude bits = 0010100101010
      ⇢ -90.0 (0), -135.0 (0), -112.5 (1), -123.75 (0), -118.125 (1), -120.9375 (0), -122.34375 (0), -121.640625 (1), -121.9921875 (0), -121.81640625 (1), -121.904296875 (0), -121.8603515625 (1), -121.88232421875 (0) = 121°53’W
      Latitude bits = 101101010000
      ⇢ 45.0 (1), 22.5 (0), 33.75 (1), 39.375 (1), 36.5625 (0), 37.96875 (1), 37.265625 (0), 37.6171875 (1), 37.44140625 (0), 37.353515625 (0), 37.3095703125 (0), 37.28759765625 (0) = 37°17’N
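
The bit-by-bit halving on this slide can be written out as a short Python sketch of the standard geohash algorithm. The function names `encode` and `decode` are chosen here for illustration. Bits alternate between longitude and latitude, starting with longitude, and every five bits become one base-32 character.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"   # geohash alphabet (no a, i, l, o)


def encode(lat, lon, length=5):
    """Encode a latitude/longitude pair as a geohash of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars, bits, bit_count = [], 0, 0
    use_lon = True                      # even bit positions refine longitude
    while len(chars) < length:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:                # upper half: emit 1
            bits = (bits << 1) | 1
            rng[0] = mid
        else:                           # lower half: emit 0
            bits <<= 1
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def decode(geohash):
    """Return the (lat, lon) centre of the cell named by the geohash."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    use_lon = True
    for char in geohash:
        index = BASE32.index(char)
        for shift in range(4, -1, -1):
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (index >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return (lat_range[0] + lat_range[1]) / 2, (lon_range[0] + lon_range[1]) / 2
```

Encoding 37.3, -121.9 (37°18’N 121°54’W) with five characters gives 9q9k4, and decoding 9q9k4 returns the cell centre, which is why the decoded value on the slide is about a minute of arc away from the original coordinates.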
  • 41. Geohash
      Possible to do range query in database
      ⇢ Matching based on prefix will return all the points that fit in the “grid”
      ⇢ Able to store 2D data in a 1D space
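
A prefix match is a range scan over a sorted column, which is what makes it index-friendly. The Python sketch below stands in for a database index with a sorted list and `bisect`. `prefix_range` is a hypothetical helper, and the sample hashes other than 9q9k4 are illustrative values, not data from the talk.

```python
import bisect


def prefix_range(sorted_hashes, prefix):
    """Return every geohash in the sorted list that starts with the prefix."""
    low = bisect.bisect_left(sorted_hashes, prefix)
    # "~" sorts after every character in the geohash alphabet, so
    # prefix + "~" is an upper bound for all strings with this prefix.
    high = bisect.bisect_left(sorted_hashes, prefix + "~")
    return sorted_hashes[low:high]


# Illustrative data: a sorted "index" of geohashes.
hashes = sorted(["9q9k4", "9q9k6", "9q9hv", "9q8yy", "dr5ru"])
same_cell = prefix_range(hashes, "9q9k")      # finer cell
wider_cell = prefix_range(hashes, "9q9")      # coarser cell containing it
```

In SQL terms this corresponds to a range condition on an indexed string column. One caveat that follows from the “(usually)” on the earlier slide: two points on opposite sides of a cell boundary can be close together and still share only a short prefix.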
  • 42. Space filling curve
  • 43. Space filling curve
      Generalization of geohash
      ⇢ 2D to 1D mapping
      ⇢ Nearness is captured
      Recursively can fill up space depending on resolution required
      Fractal-like pattern can be used to take up as much room as possible
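
One concrete 2D-to-1D mapping is the Z-order (Morton) curve, which interleaves the bits of the two coordinates in the same way a geohash interleaves longitude and latitude bits. The slide does not say which curve it pictures, so treat the choice of Z-order here as an assumption; `interleave` and `deinterleave` are hypothetical names.

```python
def interleave(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Z-order code.

    x supplies the odd bit positions and y the even ones.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i + 1)
        code |= ((y >> i) & 1) << (2 * i)
    return code


def deinterleave(code, bits=16):
    """Recover the (x, y) grid cell from a Z-order code."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i + 1)) & 1) << i
        y |= ((code >> (2 * i)) & 1) << i
    return x, y
```

Here x and y are integer grid cells rather than raw coordinates. Cells that share their high-order bits in both dimensions share the high-order bits of the code, which is the locality property the slide refers to.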
  • 44. R-Tree
  • 45. R-Tree
      Height-balanced tree data structure for spatial data
      Uses hierarchically nested bounding boxes
      Nearby elements are placed in the same node
  • 46. Representations
  • 47. GeoRSS / GeoJSON
      http://www.georss.org/ & http://geojson.org/
      <georss:point>37.3 -121.9</georss:point>
      { "type":"Point", "coordinates":[-121.9, 37.3] }
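
The two examples on this slide encode the same point with the coordinates in opposite order: GeoRSS writes latitude first, GeoJSON writes longitude first. A small Python sketch of the conversion, with hypothetical function names, makes the swap explicit.

```python
def georss_point_to_geojson(text):
    """Convert the text of a <georss:point> ("lat lon") to a GeoJSON Point."""
    lat, lon = (float(part) for part in text.split())
    return {"type": "Point", "coordinates": [lon, lat]}


def geojson_point_to_georss(geometry):
    """Convert a GeoJSON Point to the "lat lon" text of a <georss:point>."""
    if geometry.get("type") != "Point":
        raise ValueError("only Point geometries are handled in this sketch")
    lon, lat = geometry["coordinates"][:2]
    return f"{lat} {lon}"
```

Mixing up the order is an easy mistake to make when moving between the two formats, and for many coordinates it yields a valid but wrong location rather than an error.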
  • 48. How do you store precision?
      “Precision” is a hard thing to encode
      Accuracy can be encoded with an error radius
      Twitter opts for tracking the number of decimals passed
      ⇢ 140.0 != 140.00
      ⇢ DecimalTrackingFloat
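
The slide names a DecimalTrackingFloat but does not show it. The Python sketch below is a guess at the idea only, not Twitter's class: it keeps the float value together with the number of decimal places in the original string, and it assumes plain decimal input with no exponent notation.

```python
class DecimalTrackingFloat:
    """A float that remembers how many decimal places were supplied.

    Hypothetical sketch; assumes input such as "140.00", not "1.4e2".
    """

    def __init__(self, text):
        text = text.strip()
        self.value = float(text)
        _, _, fraction = text.partition(".")
        self.decimals = len(fraction)

    def __eq__(self, other):
        if not isinstance(other, DecimalTrackingFloat):
            return NotImplemented
        return (self.value, self.decimals) == (other.value, other.decimals)

    def __hash__(self):
        return hash((self.value, self.decimals))

    def __str__(self):
        # Render with exactly as many decimals as were passed in.
        return f"{self.value:.{self.decimals}f}"
```

Under this sketch `DecimalTrackingFloat("140.0")` and `DecimalTrackingFloat("140.00")` compare unequal, and each prints back with the precision it arrived with.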
  • 49. Twitter infrastructure
      Ruby on Rails-ish frontend
      Scala-based services backend
      MySQL and soon to be Cassandra as the store
      RPC to back-end or put items into queues
  • 50. Simplified architecture
      R-Tree for spatial lookup
      ⇢ Data provider for front-end lookups
      ⇢ Store place object with envelope of place in R-Tree
      Mapping from ID to place object
  • 51. Java Topology Suite (JTS)
      http://www.vividsolutions.com/jts/jtshome.htm
      Open source
      Good for representing and manipulating “geometries”
      Has support for fundamental geometric operations
      ⇢ contains
      ⇢ envelope
      Has an R-Tree implementation
  • 52. pointInside in polygon? true
      pointOutside in polygon? false
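
The slide shows the result of a JTS `contains` test. As a language-neutral illustration of what such a test does, here is a ray-casting point-in-polygon sketch in Python. It is not JTS's algorithm or API, and it does not treat points exactly on the boundary in any special way. The polygon is the Willow Glen bounding box from the geo/search slide, and the two test points are chosen here for illustration.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count how many edges a ray going in +x from the point crosses."""
    x, y = point
    inside = False
    count = len(polygon)
    for i in range(count):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % count]
        if (y1 > y) != (y2 > y):                       # edge spans the ray's y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:                            # crossing is to the right
                inside = not inside
    return inside


# Willow Glen bounding box as (longitude, latitude) pairs.
willow_glen = [
    (-121.92481908, 37.275903),
    (-121.88083608, 37.275903),
    (-121.88083608, 37.31548203),
    (-121.92481908, 37.31548203),
]
point_inside = (-121.9, 37.3)      # the lat/long used in the search example
point_outside = (-122.4, 37.77)    # an illustrative point well outside the box
```

An odd number of crossings means the point is inside. For a rectangle this reduces to a bounding-box check, but the same function works for arbitrary simple polygons.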
  • 53. at (0.0, 0.0) -- region 1
      at (1.0, 1.0) -- region 1 -- region 2
      at (2.0, 2.0) -- region 1 -- region 2
      at (3.0, 3.0) -- region 2
      at (4.0, 4.0) -- empty
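
Output of this shape can be reproduced with an envelope lookup. The Python sketch below uses a linear scan over bounding boxes in place of an R-tree, and the two region envelopes are assumptions picked so that the results match the slide; the talk does not give the actual geometries.

```python
# Hypothetical envelopes as (min_x, min_y, max_x, max_y); edges are inclusive.
REGIONS = {
    "region 1": (0.0, 0.0, 2.0, 2.0),
    "region 2": (1.0, 1.0, 3.0, 3.0),
}


def query(point):
    """Return the names of all regions whose envelope contains the point."""
    x, y = point
    return [
        name
        for name, (min_x, min_y, max_x, max_y) in REGIONS.items()
        if min_x <= x <= max_x and min_y <= y <= max_y
    ]


def describe(point):
    """Format a lookup result in the style of the slide."""
    hits = query(point)
    return f"at {point} -- {' -- '.join(hits) if hits else 'empty'}"


lines = [describe((float(i), float(i))) for i in range(5)]
```

An R-tree answers the same question without visiting every envelope, by descending only into nodes whose bounding boxes contain the query point. An exact geometry test can then be run on the candidates it returns.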
  • 54. Java Topology Suite (JTS)
      Serializers and deserializers
      ⇢ Well-known text (WKT)
      ⇢ Well-known binary (WKB)
      ⇢ No GeoRSS or GeoJSON support
  • 55. interface / RPC
      RockDove is a backend service
      ⇢ Data provider for front-end lookups
      ⇢ Uses some form of RPC (Thrift, Avro, etc.) to communicate
      ⇢ Data could be cached on frontend to prevent lookups
      Simple RPC interface
      ⇢ get(id)
      ⇢ containedWithin(lat, long)
  • 56. Interface / RPC
      Watch those RPC queues!
      Fail fast and potentially throw “over capacity” messages
      ⇢ get(id) throws OverCapacity
      ⇢ containedWithin(lat, long) throws OverCapacity
      Distinguish between write path and read path
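
The “fail fast” advice can be sketched with a bounded queue that rejects work immediately instead of letting requests pile up. This is an illustration of the pattern only, not RockDove's implementation. `OverCapacity` takes its name from the slide; `BoundedService`, `submit`, and `take` are hypothetical.

```python
import queue


class OverCapacity(Exception):
    """Raised when the service refuses a request rather than queueing it."""


class BoundedService:
    """Accepts requests up to a fixed backlog and fails fast beyond it."""

    def __init__(self, max_pending):
        self._pending = queue.Queue(maxsize=max_pending)

    def submit(self, request):
        """Queue a request, or raise OverCapacity at once if the backlog is full."""
        try:
            self._pending.put_nowait(request)
        except queue.Full:
            raise OverCapacity("request queue is full") from None

    def take(self):
        """Remove and return the oldest pending request (called by a worker)."""
        return self._pending.get_nowait()
```

Rejecting at the door keeps latency bounded for the requests that are accepted and gives the caller a clear signal to back off or serve a cached answer.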
  • 57. georuby
      http://georuby.rubyforge.org/
      Open source
      OpenGIS Simple Features Interface Standard
      Only good for representing geometric entities
      GeoRuby::SimpleFeatures::Geometry::from_ewkb
      No GeoJSON serializers
  • 58. “front-end”
  • 59. where do you actually get location from?
  • 60. Triangulation: Cellular
      200m to 1km accuracy
      Measuring signal strength to cell towers with known locations
      If only one cellular tower is visible, fall back to cellular tower identification: better than nothing, but really inaccurate
      Requires cellular modem, software, and lookups
  • 61. Triangulation: Wifi
      Sub 20m accuracy
      Works indoors and in urban areas
      Doesn’t need dedicated hardware, just an 802.11 radio
      Relatively quick time to get a position
  • 62. Triangulation: GPS
      Sub 1m accuracy
      Need dedicated GPS hardware
      Prone to multi-path confusion especially in cities
      Needs line of sight to the sky
      Doesn’t work well indoors
      Potentially takes a few minutes to get a lock
  • 63. Association
      IP address to geographical mapping
      All done on the server side
      Maybe “good” for city level
      ⇢ Maxmind has 83% at 40km
      ⇢ Very error prone
      ⇢ Gets wonky when dealing with cellular connections or rather large ISPs
      Database needs to be refreshed fairly frequently
  • 64. Extraction
      Read the text and understand intent
      Hard to understand whether talking from a place, or about a place
      Running text through a geocoder (Google, Yahoo, Geocoder.us)
      Parsing structured URLs and then crawling “place pages”
  • 65. location in browser
      Geolocation API Specification for JavaScript
      navigator.geolocation.getCurrentPosition
      Does a callback with a position object
      position.coords has
      ⇢ latitude and longitude
      ⇢ accuracy
      ⇢ other stuff
      Support in Firefox 3.5, Chrome 5, Opera 10.6, and others with Google Gears
  • 66. Questions? Follow me at twitter.com/raffi