Giving a real-time geo talk at @where20. How do you build stuff? #rtgeo
19 Apr via Twitter for iPhone from Santa Clara Convention Center, 5001 Great America Parkway, Santa Clara, CA 95054
Background
Wherehoo (2000)
⇢ “The Stuff Around You”
⇢ “Wherehoo Server: An interactive location service for software agents and intelligent systems” - J. Youll, R. Krikorian
⇢ In your /etc/services file!
raffi@~/: cat /etc/services | grep wherehoo
wherehoo 5859/udp # WHEREHOO
wherehoo 5859/tcp # WHEREHOO
BusRadio (2004)
⇢ Designed mobile computers to play media while also transmitting telemetry
⇢ Looked and sounded like a radio - but really a Linux computer
OneHop (2007)
⇢ Bluetooth proximity-based social networking
Background
Twitter
⇢ Originally tech lead of API / Platform team
⇢ Built the first geo-based infrastructure before acquisition of Mixer Labs in December of 2009
⇢ Now lead of the Application Services group
⇢ Runs five teams focused on scalable infrastructure around “core” data objects
  ⇢ Tweets, users, timelines, places, etc.
  ⇢ Delivery, authentication, APIs, etc.
Table of contents
Background
⇢ Why are we interested in this?
Twitter’s geo APIs
⇢ How do we allow people to talk about place?
⇢ Context around “place”
Problem statement
⇢ What do we want our system to do?
Infrastructure
⇢ How is Twitter solving this problem?
People want to talk about places
Twitter’s Geo APIs
What’s happening here?
Original attempts
Adding it to the tweet
⇢ Use myloc.me, et al. to add text to the tweet
⇢ Puts location “in band”
⇢ Takes from the 140 characters
Setting profile level locations
⇢ Set the user/location of a Twitter user
⇢ There’s an API for that!
⇢ Not on a per-tweet basis
⇢ Not intended for high frequency alterations
Geotagging API
Adding it to the tweet
⇢ Per-tweet basis
⇢ Out of band and pure metadata
⇢ Does not take from the 140 characters
Native Twitter support
⇢ Simple way to update status with location data
⇢ Ability to remove geotags from your tweets en masse
⇢ Using GeoRSS and GeoJSON as the encoding format
⇢ Across all Twitter APIs (REST, Search, and Streaming)
Search
geocode parameter takes “latitude,longitude,radius” where radius has units of mi or km
raffi@~/: curl "http://search.twitter.com/search.atom?geocode=40.757929%2C-73.985506%2C25km&source=foursquare"
...
<title>On the way to ace now, so whenever you can make it Ill be there. (@ Port Imperial Ferry in Weehawken) http://4sq.com/2rq0vO</title>
...
<twitter:geo>
  <georss:point>40.7759 -74.0129</georss:point>
</twitter:geo>
...
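The geocode parameter can be assembled programmatically. The following is a minimal sketch in Python; the helper name `geocode_search_url` is hypothetical, and only the endpoint, the `geocode` format, and the `source` parameter come from the slide.

```python
from urllib.parse import urlencode

# Endpoint taken from the slide; everything else here is illustrative.
SEARCH_URL = "http://search.twitter.com/search.atom"


def geocode_search_url(lat, lon, radius, unit="km", **extra):
    """Build a search URL whose geocode value is "latitude,longitude,radius"."""
    if unit not in ("mi", "km"):
        raise ValueError("radius unit must be 'mi' or 'km'")
    params = {"geocode": f"{lat},{lon},{radius}{unit}"}
    params.update(extra)
    # urlencode percent-encodes the commas as %2C, as in the curl example.
    return SEARCH_URL + "?" + urlencode(params)


url = geocode_search_url(40.757929, -73.985506, 25, source="foursquare")
print(url)
```

The printed URL has the same query string as the curl command on the slide.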
location filtering
raffi@~/: curl "http://stream.twitter.com/1/statuses/filter.xml?locations=-74.5129,40.2759,-73.5019,41.2759"
locations is a bounding box specified by “long1,lat1,long2,lat2” and can track up to 10 locations that are at most 1 degree square (~60 miles square and enough to cover most metropolitan areas)
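As a rough illustration of the locations format, the sketch below builds a 1 degree square box around a center point and joins boxes into a single parameter value. The helper names `box_around` and `locations_param` are hypothetical; the "long1,lat1,long2,lat2" ordering and the limit of 10 locations are the only parts taken from the slide.

```python
def box_around(lon, lat, half_size=0.5):
    """Return (long1, lat1, long2, lat2): south-west corner, then north-east."""
    # Rounding keeps the printed values short; six decimals is an arbitrary choice.
    return (round(lon - half_size, 6), round(lat - half_size, 6),
            round(lon + half_size, 6), round(lat + half_size, 6))


def locations_param(boxes):
    """Join up to 10 bounding boxes into one comma-separated parameter value."""
    if len(boxes) > 10:
        raise ValueError("at most 10 locations can be tracked")
    return ",".join(",".join(str(v) for v in box) for box in boxes)


# Box centered on the Weehawken point from the search example.
print(locations_param([box_around(-74.0129, 40.7759)]))
```

Longitude comes first in each corner, which is the reverse of the "latitude,longitude" order used by the geocode parameter.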
Trends API
Global Trends
⇢ Analysis of “hot conversations”
⇢ Does not take from the 140 characters
Location specific trends
⇢ Tweets being localized through a variety of means internally
⇢ Locations exposed over the API as WOEIDs and Twitter IDs
⇢ Can ask for available trends sorted by distance
available locations
raffi@~/: curl "http://api.twitter.com/1/trends/available.xml"
<locations type="array">
  <location>
    <woeid>2487956</woeid>
    <name>San Francisco</name>
    <placeTypeName code="7">Town</placeTypeName>
    <country type="Country" code="US">United States</country>
    <url>http://where.yahooapis.com/v1/place/2487956</url>
  </location>
  ...
</locations>
Can optionally take a lat and long parameter to have trends locations returned, sorted, as distance from you.
Local trend
Look up a trend at a given WOEID
raffi@~/: curl "http://api.twitter.com/1/trends/2487956.xml"
<matching_trends type="array">
  <trends as_of="2009-12-15T20:19:09Z">
    ...
    <trend url="http://search.twitter.com/search?q=Golden+Globe+nominations" query="Golden+Globe+nominations">Golden Globe nominations</trend>
    <trend url="http://search.twitter.com/search?q=%23somethingaintright" query="%23somethingaintright">#somethingaintright</trend>
    ...
  </trends>
</matching_trends>
What’s in a name?
A place is a name
5001 Great America Parkway, Santa Clara, CA 95054
Great America Parkway and Tasman Drive
The Bay Area
Santa Clara convention center
Twitter ID 3b7dd0d93e661e18
how do users want to share “where”?
Sharing coordinates
More aptly named “geotagging”
Good for sharing photos
Possibly good for talking about a specific place (e.g. store, restaurant)
People don’t understand numbers and without a map, there is a lack of context
Huge privacy implications
Sharing polygons
Privacy implications are potentially better
If you thought sharing one pair of numbers was bad...
Questions around polygon definition
Still unable to visualize unless on a map
Sharing names
Has the potential to make a connection with users
Distinguishes a “named place” from simply a “place”
Inverse relationship between granularity and connection
Rather large internationalization / context implications
Geo-place API
Support for “names”
⇢ Not just coordinates
⇢ More contextually relevant
⇢ Positive privacy benefits
Increased complexity
⇢ Need to be able to look up a list of places
⇢ Requires a “reverse geocoder”
⇢ Human driven tagging and not possible to be fully automatic
Tweeting with a place
raffi@~/: twurl -d "status=hey-ho&place_id=46bc64ecd1da2a46" http://api.twitter.com/1/statuses/update.xml
<status>
  <text>hey-ho</text>
  ...
  <place xmlns:georss="http://www.georss.org/georss">
    <id>46bc64ecd1da2a46</id>
    <name>Willow Glen</name>
    <full_name>Willow Glen</full_name>
    <place_type>neighborhood</place_type>
    <url>http://api.twitter.com/1/geo/id/46bc64ecd1da2a46.json</url>
    <country code="US">United States</country>
  </place>
  ...
</status>
Problem statement
What do we want our system to do?
what do we need to build?
Database of places
⇢ Given a real-world location, find places
⇢ Spatial search
Method to store places with content
⇢ Per user basis
⇢ Per tweet basis
spatial lookup and index
as background... MySQL + GIS
Ability to index points and do a spatial query
⇢ For example, get points within a bounding rectangle
⇢ SELECT MBRContains(GeomFromText('POLYGON((0 0, 0 3, 3 3, 3 0, 0 0))'), coord) FROM geometry
Hard to cache the spatial query
Possibly requires a DB hit on every query
options
Grid / quad-tree
⇢ Create a grid (possibly nested) of the entire Earth
Geohash
⇢ Arbitrarily precise and hierarchical spatial data reference
Space filling curves
⇢ Mapping 2D space into 1D while preserving locality
R-Tree
⇢ Spatial access data structure
Grid / Quad-Tree
Grid / Quad-Tree
Recursively subdivide regions
Trie structure to store “prefixes”
Spatially oriented data structure
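One way to picture the recursive subdivision and the prefix trie is the sketch below. It is an illustration under stated assumptions, not the structure Twitter runs: the digit scheme (0 to 3 per level), the nested-dict trie, and the sample coordinates are all made up for the example.

```python
def quadtree_key(lat, lon, depth):
    """Recursively subdivide the world, emitting one quadrant digit per level."""
    south, north, west, east = -90.0, 90.0, -180.0, 180.0
    digits = []
    for _ in range(depth):
        mid_lat = (south + north) / 2
        mid_lon = (west + east) / 2
        quadrant = 0
        if lon >= mid_lon:      # eastern half sets bit 0
            quadrant |= 1
            west = mid_lon
        else:
            east = mid_lon
        if lat >= mid_lat:      # northern half sets bit 1
            quadrant |= 2
            south = mid_lat
        else:
            north = mid_lat
        digits.append(str(quadrant))
    return "".join(digits)


def trie_insert(trie, key, value):
    """Store value at the node reached by walking key one digit at a time."""
    node = trie
    for digit in key:
        node = node.setdefault(digit, {})
    node.setdefault("items", []).append(value)


def trie_collect(trie, prefix):
    """Return every value stored at or below the node for prefix."""
    node = trie
    for digit in prefix:
        node = node.get(digit)
        if node is None:
            return []
    found, stack = [], [node]
    while stack:
        current = stack.pop()
        for key, child in current.items():
            if key == "items":
                found.extend(child)
            else:
                stack.append(child)
    return found


# Approximate, illustrative coordinates.
places = {"San Jose": (37.3, -121.9), "Santa Clara": (37.404, -121.975),
          "New York": (40.7, -74.0)}
trie = {}
for name, (lat, lon) in places.items():
    trie_insert(trie, quadtree_key(lat, lon, 8), name)

print(sorted(trie_collect(trie, quadtree_key(37.3, -121.9, 8))))
```

Shortening the prefix widens the search area, which is what makes the trie a natural fit for nested grids.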
geohash
37°18’N 121°54’W = 9q9k4
Hierarchical spatial data structure
Precision encoded
Distance captured
⇢ Nearby places (usually) share the same prefix
⇢ The longer the string match, the closer the places are
Geohash
Possible to do range query in database
⇢ Matching based on prefix will return all the points that fit in the “grid”
⇢ Able to store 2D data in a 1D space
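For readers who want to see how a geohash comes about, here is a minimal encoder sketch in Python using the standard geohash bit interleaving and base-32 alphabet. The function name, the place list, and its approximate coordinates are illustrative; the prefix filter at the end stands in for a database range query on the hash column.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, length):
    """Interleave longitude and latitude bisection bits, 5 bits per character."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, value = 0, 0
    use_lon = True  # geohash starts with a longitude bit
    while len(chars) < length:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        value <<= 1
        if coord >= mid:
            value |= 1
            rng[0] = mid
        else:
            rng[1] = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


# 37°18'N 121°54'W is 37.3, -121.9 in decimal degrees.
print(geohash_encode(37.3, -121.9, 5))

# Prefix matching: approximate, illustrative coordinates.
places = {"San Jose": (37.3, -121.9), "Santa Clara": (37.404, -121.975),
          "New York": (40.7, -74.0)}
hashes = {name: geohash_encode(lat, lon, 8) for name, (lat, lon) in places.items()}
nearby = sorted(name for name, h in hashes.items() if h.startswith("9q9"))
print(nearby)
```

The "usually" on the slide matters: two points on either side of a cell boundary can be close together yet share only a short prefix.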
Space filling curve
Space filling curve
Generalization of geohash
⇢ 2D to 1D mapping
⇢ Nearness is captured
Recursively can fill up space depending on resolution required
Fractal-like pattern can be used to take up as much room as possible
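The slide does not say which curve is meant, so the sketch below assumes a Hilbert curve as one common example. It uses the well-known iterative conversion from grid cell to distance along the curve; the function name `hilbert_index` and the grid sizes are illustrative.

```python
def hilbert_index(order, x, y):
    """Map cell (x, y) of a 2**order by 2**order grid to its position on the curve."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


# Walk a 4 x 4 grid in curve order: each step moves to an adjacent cell.
cells = [(x, y) for x in range(4) for y in range(4)]
walk = sorted(cells, key=lambda cell: hilbert_index(2, *cell))
print(walk)
```

Because consecutive positions on the curve are always neighboring cells, a range scan over the 1D index tends to stay within a compact 2D area.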
R-Tree
Height-balanced tree data structure for spatial data
Uses hierarchically nested bounding boxes
Nearby elements are placed in the same node
How do you store precision?
“Precision” is a hard thing to encode
Accuracy can be encoded with an error radius
Twitter opts for tracking the number of decimals passed
⇢ 140.0 != 140.00
⇢ DecimalTrackingFloat
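The slide names DecimalTrackingFloat but does not show its internals, so the following is only a guess at the idea: keep the numeric value together with the number of decimals the client actually sent. The Python class below is a hypothetical sketch, and it assumes plain decimal input (no exponent notation).

```python
class DecimalTrackingFloat:
    """A float that remembers how many decimal places were passed in."""

    def __init__(self, text):
        text = text.strip()
        self.value = float(text)
        # Everything after the decimal point counts as stated precision.
        _, _, fraction = text.partition(".")
        self.decimals = len(fraction)

    def __eq__(self, other):
        if not isinstance(other, DecimalTrackingFloat):
            return NotImplemented
        return (self.value, self.decimals) == (other.value, other.decimals)

    def __hash__(self):
        return hash((self.value, self.decimals))

    def __repr__(self):
        # Re-emit the value with the same number of decimals it arrived with.
        return f"{self.value:.{self.decimals}f}"


a = DecimalTrackingFloat("140.0")
b = DecimalTrackingFloat("140.00")
print(a, b, a != b)
```

Parsing from the original string matters here: once the input has been converted to a float, the trailing zeros that signal precision are gone.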
Twitter infrastructure
Ruby on Rails-ish frontend
Scala-based services backend
MySQL and soon to be Cassandra as the store
RPC to back-end or put items into queues
Simplified architecture
R-Tree for spatial lookup
⇢ Data provider for front-end lookups
⇢ Store place object with envelope of place in R-Tree
Mapping from ID to place object
Java Topology Suite (JTS)
http://www.vividsolutions.com/jts/jtshome.htm
Open source
Good for representing and manipulating “geometries”
Has support for fundamental geometric operations
⇢ contains
⇢ envelope
Has an R-Tree implementation
pointInside in polygon? true
pointOutside in polygon? false
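The point-in-polygon question can be answered with ray casting: count how many polygon edges a horizontal ray from the point crosses, and treat an odd count as inside. The Python sketch below illustrates that general technique; it is not how JTS implements `contains`, and the square and test points are made up.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: True when (x, y) lies strictly inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


square = [(0, 0), (0, 3), (3, 3), (3, 0)]
print(point_in_polygon(1, 1, square))  # a point inside the square
print(point_in_polygon(4, 4, square))  # a point outside the square
```

Points that fall exactly on an edge are a known weak spot of this approach, which is one reason to lean on a tested geometry library in production.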
at (0.0, 0.0)
-- region 1
at (1.0, 1.0)
-- region 1
-- region 2
at (2.0, 2.0)
-- region 1
-- region 2
at (3.0, 3.0)
-- region 2
at (4.0, 4.0)
-- empty
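A listing of this kind can be produced by querying an index of envelopes (bounding boxes). The Python sketch below uses a plain linear scan as a stand-in for the R-Tree, and it assumes two overlapping square regions, one spanning (0, 0) to (2, 2) and one spanning (1, 1) to (3, 3). Those extents are a guess for illustration, not something the slides state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Axis-aligned bounding box."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def covers(self, x, y):
        # Boundary points count as covered.
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y


class EnvelopeIndex:
    """Linear-scan stand-in for an R-Tree: same query, no tree structure."""

    def __init__(self):
        self._entries = []

    def insert(self, envelope, item):
        self._entries.append((envelope, item))

    def query(self, x, y):
        return [item for envelope, item in self._entries if envelope.covers(x, y)]


index = EnvelopeIndex()
index.insert(Envelope(0.0, 0.0, 2.0, 2.0), "region 1")  # assumed extent
index.insert(Envelope(1.0, 1.0, 3.0, 3.0), "region 2")  # assumed extent

for v in (0.0, 1.0, 2.0, 3.0, 4.0):
    hits = index.query(v, v)
    print(f"at ({v}, {v})")
    for hit in hits or ["empty"]:
        print(f"-- {hit}")
```

An envelope hit only says the point is inside the bounding box; an exact answer still needs a point-in-polygon check against the place's real geometry.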
Java Topology Suite (JTS)
Serializers and deserializers
⇢ Well-known text (WKT)
⇢ Well-known binary (WKB)
⇢ No GeoRSS or GeoJSON support
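To show what the missing serializers would have to produce, the sketch below writes the same polygon ring as WKT and as a GeoJSON geometry. It is plain Python with hypothetical helper names, not JTS code, and it handles only a single outer ring.

```python
import json


def _closed(ring):
    """Both formats expect the first vertex to be repeated at the end."""
    ring = list(ring)
    return ring if ring[0] == ring[-1] else ring + [ring[0]]


def ring_to_wkt(ring):
    """Serialize an (x, y) ring as a WKT POLYGON."""
    coords = ", ".join(f"{x} {y}" for x, y in _closed(ring))
    return f"POLYGON (({coords}))"


def ring_to_geojson(ring):
    """Serialize an (x, y) ring as a GeoJSON Polygon geometry."""
    return {"type": "Polygon", "coordinates": [[list(p) for p in _closed(ring)]]}


square = [(0, 0), (0, 3), (3, 3), (3, 0)]
print(ring_to_wkt(square))
print(json.dumps(ring_to_geojson(square)))
```

In both formats x is longitude and y is latitude, so geographic coordinates are written longitude first.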
interface / RPC
RockDove is a backend service
⇢ Data provider for front-end lookups
⇢ Uses some form of RPC (Thrift, Avro, etc.) to communicate with
⇢ Data could be cached on frontend to prevent lookups
Simple RPC interface
⇢ get(id)
⇢ containedWithin(lat, long)
Interface / RPC
Watch those RPC queues!
Fail fast and potentially throw “over capacity” messages
⇢ get(id) throws OverCapacity
⇢ containedWithin(lat, long) throws OverCapacity
Distinguish between write path and read path
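The fail-fast behavior can be sketched as a service that refuses work once too many requests are in flight, rather than letting a queue grow. The Python below is an illustration only: the class name `PlaceService`, the semaphore-based admission check, and the Willow Glen bounding box are assumptions, and a bounding-box test stands in for the real spatial lookup.

```python
import threading


class OverCapacity(Exception):
    """Raised immediately when the service has no free request slots."""


class PlaceService:
    def __init__(self, places, max_in_flight):
        # places maps id -> (name, (min_lat, min_long, max_lat, max_long))
        self._places = places
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def _admit(self):
        # Non-blocking acquire: reject at once instead of queueing the caller.
        if not self._slots.acquire(blocking=False):
            raise OverCapacity("too many requests in flight")

    def get(self, place_id):
        self._admit()
        try:
            return self._places.get(place_id)
        finally:
            self._slots.release()

    def contained_within(self, lat, long):
        self._admit()
        try:
            return [pid for pid, (_, (s, w, n, e)) in self._places.items()
                    if s <= lat <= n and w <= long <= e]
        finally:
            self._slots.release()


# The ID and name come from the earlier slide; the box is a made-up approximation.
service = PlaceService(
    {"46bc64ecd1da2a46": ("Willow Glen", (37.28, -121.92, 37.32, -121.87))},
    max_in_flight=1,
)
print(service.get("46bc64ecd1da2a46"))
print(service.contained_within(37.3, -121.9))
```

A caller that receives OverCapacity can back off or serve cached data, which keeps a slow backend from stalling the front end.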
georuby
http://georuby.rubyforge.org/
Open source
OpenGIS Simple Features Interface Standard
Only good for representing geometric entities
GeoRuby::SimpleFeatures::Geometry::from_ewkb
No GeoJSON serializers
where do you actually get location from?
Triangulation: Cellular
200m to 1km accuracy
Measuring signal strength to cell towers with known locations
If only one cellular tower is visible, fall back to cellular tower identification - better than nothing, but really inaccurate
Requires cellular modem, software, and lookups
Triangulation: Wifi
Sub 20m accuracy
Works indoors and in urban areas
Doesn’t need dedicated hardware, just an 802.11 radio
Relatively quick time to get a position
Triangulation: GPS
Sub 1m accuracy
Need dedicated GPS hardware
Prone to multi-path confusion especially in cities
Needs line of sight to the sky
Doesn’t work well indoors
Potentially takes a few minutes to get a lock
Association
IP address to geographical mapping
All done on the server side
Maybe “good” for city level
⇢ Maxmind has 83% at 40km
⇢ Very error prone
⇢ Gets wonky when dealing with cellular connections or rather large ISPs
Database needs to be refreshed fairly frequently
Extraction
Read the text and understand intent
Hard to understand whether talking from a place, or about a place
Running text through a geocoder (Google, Yahoo, Geocoder.us)
Parsing structured URLs and then crawling “place pages”