(Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)

"(Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)", presented on March 10th, 2010 at the London Twitter DevNest 7, at the Sun Customer Briefing Centre in London.


Transcript

  • 1. London Twitter #devnest 7, March 2010
    (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)…
    Gary Gale, Yahoo! Geo Technologies
  • 2. the agenda
    louisvolant on Flickr : http://www.flickr.com/photos/27048731@N03/4003756731/
  • 3. the agenda
  • 13.
    KELLYLEEBARRETT on Flickr : http://www.flickr.com/photos/kellylee/4177529745/
  • 14.
    Gary Gale on Flickr : http://www.flickr.com/photos/vicchi/4414198544/
  • 15. WOEIDs
    stevefaeembra on Flickr : http://www.flickr.com/photos/stevefaeembra/3567750853/
  • 16. 44418
    12589342
  • 17.
    David Armano on Flickr : http://www.flickr.com/photos/7855449@N02/3158864420/
  • 18. some background
    blakophoto on Flickr : http://www.flickr.com/photos/cleveralias/3158810304/
  • 19. let’s talk about geocoding
    inF! on Flickr : http://www.flickr.com/photos/nathanbarrow/3339245753/
  • 20. geocoding is the process of finding associated geographic coordinates (often expressed as latitude and longitude) from other geographic data, such as street addresses, or zip codes (postal codes).
  • 21. reverse geocoding is the process of back (reverse) coding of a point location (latitude, longitude) to a readable address or place name.
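The two definitions above can be illustrated with a toy lookup. This is only a sketch under stated assumptions: the two-entry gazetteer and the function names `geocode`, `reverseGeocode` and `haversineKm` are made up for illustration, London's centroid is the value Placemaker returns on slide 75, and the Paris coordinates are an approximate city-centre value. Real geocoders work from addresses and far richer data.

```php
<?php
// Toy gazetteer: place name => [latitude, longitude] (illustrative data only).
$gazetteer = [
    'London' => [51.5063, -0.12714],
    'Paris'  => [48.8566, 2.3522],
];

// Geocoding: name -> coordinates (null when the name is unknown).
function geocode(array $gazetteer, string $name): ?array
{
    return $gazetteer[$name] ?? null;
}

// Great-circle distance in kilometres (haversine formula).
function haversineKm(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $r = 6371.0; // mean Earth radius in km
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
       + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    return 2 * $r * asin(sqrt($a));
}

// Reverse geocoding: coordinates -> name of the nearest known place.
function reverseGeocode(array $gazetteer, float $lat, float $lon): ?string
{
    $best = null;
    $bestKm = INF;
    foreach ($gazetteer as $name => [$pLat, $pLon]) {
        $km = haversineKm($lat, $lon, $pLat, $pLon);
        if ($km < $bestKm) {
            $bestKm = $km;
            $best = $name;
        }
    }
    return $best;
}
```

For example, `geocode($gazetteer, 'London')` returns the stored pair, and `reverseGeocode($gazetteer, 51.5139, -0.1286)` picks `'London'` because it is the nearer of the two entries.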
  • 22. noway on Flickr : http://www.flickr.com/photos/noway/78606643/
  • 23. what?
    where?
  • 24.
  • 25. what? (maybe) where? (maybe)
  • 26. this is not geocoding, this is geoparsing
    szim90 on Flickr : http://www.flickr.com/photos/szim90/272670479/
  • 27. geoparsing is the process of assigning geographic identifiers (e.g., codes or geographic coordinates expressed as latitude-longitude) to textual words and phrases that occur in unstructured content.
  • 28. cheap flights from london to paris in october
  • 29. “I’m sorry, Dave; I can’t find that place”
  • 30. web servers
    Jamison Judd on Flickr : http://www.flickr.com/photos/jamisonjudd/2433102356/
  • 31. 51° 30' 50.0868", 0° 7' 42.8514" (125 Shaftesbury Avenue, London, UK)
    163.1.117.210 (Oxford, UK)
    20442/6015 (Brest, France)
    #C5243B212 (Wilmington, Delaware, USA)
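The first identifier on this slide is a coordinate pair in degrees, minutes and seconds. A minimal sketch of the conversion to decimal degrees follows. The function name `dmsToDecimal` is hypothetical, and because the slide gives no hemisphere letters the sign is left as a parameter rather than guessed.

```php
<?php
// Convert degrees, minutes and seconds to decimal degrees.
// $sign is +1 for north/east and -1 for south/west.
function dmsToDecimal(int $deg, int $min, float $sec, int $sign = 1): float
{
    return $sign * ($deg + $min / 60 + $sec / 3600);
}

// The values from the slide, taken as written (no hemisphere given).
$lat = dmsToDecimal(51, 30, 50.0868);
$lon = dmsToDecimal(0, 7, 42.8514);
printf("%.6f, %.6f\n", $lat, $lon);
```

The arithmetic is simply 51 + 30/60 + 50.0868/3600 for the latitude and 0 + 7/60 + 42.8514/3600 for the longitude.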
  • 32. web surfers
    National Library NZ on The Commons on Flickr : http://www.flickr.com/photos/nationallibrarynz_commons/3326203787/
  • 33. The West End
    Downtown
    The Shops
    The High Street
  • 34. The Online World
    Formal, normalised, structured, regular
    The Real World
    “We Are Here”
    The Offline World
    Informal, eccentric, bizarre, irregular
  • 35. cheap flights from london to paris in october
    1) Tokenize
    2) Remove common words
    3) Remove words not in gazetteer
    Result: London, Paris
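The three steps on this slide can be sketched in a few lines. This is a deliberately naive illustration, not how Placemaker works: `naiveGeoparse` is a hypothetical name, and `$stopWords` and `$gazetteer` are tiny made-up lists. The gazetteer includes "to" and "in" on purpose, since both can also be read as place names or country codes.

```php
<?php
// Naive geoparse: tokenize, drop common words, keep only gazetteer entries.
function naiveGeoparse(string $text, array $stopWords, array $gazetteer): array
{
    // 1) Tokenize on anything that is not an ASCII letter.
    $tokens = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    // 2) Remove common words.
    $tokens = array_diff($tokens, $stopWords);
    // 3) Remove words not in the gazetteer.
    $tokens = array_intersect($tokens, array_keys($gazetteer));
    // Map the surviving tokens to their display names, in text order.
    return array_values(array_map(function ($t) use ($gazetteer) {
        return $gazetteer[$t];
    }, $tokens));
}

// Illustrative data only.
$stopWords = ['from', 'to', 'in'];
$gazetteer = ['london' => 'London', 'paris' => 'Paris', 'to' => 'To', 'in' => 'IN'];

print_r(naiveGeoparse('cheap flights from london to paris in october', $stopWords, $gazetteer));
```

With the stop-word list in place only London and Paris survive. Passing an empty stop-word list instead also returns "To" and "IN", which is the kind of false match that makes step 2 necessary.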
  • 36. “in”… India?
    bodhitjal on Flickr : http://www.flickr.com/photos/bodhithaj/361857780/
  • 37. “in”… Indiana?
    OZinOH on Flickr : http://www.flickr.com/photos/75905404@N00/505688957/
  • 38. “to”… Tonga?
    j_buswell on Flickr : http://www.flickr.com/photos/j_buswell/3683814556/
  • 39. language
    Jovike on Flickr : http://www.flickr.com/photos/jvk/19894053/
  • 40. Thé? a town in Burgundy, France
    IN? ISO 3166-1 Alpha-2 for India
    To? a town in Ibaraki prefecture, Japan
    Is? another town in Burgundy, France
    IT? ISO 3166-1 Alpha-2 for Italy
    AND? ISO 3166-1 Alpha-3 for Andorra
    You? a town in Yatenga, Burkina Faso
    Å? a town in Nordland Fylke, Norway
    That? a town in Rajasthan, India
  • 41. may cause frustration
    paloaltosoftware on Flickr : http://www.flickr.com/photos/paloalto/3038701605/
  • 42. disambiguation
    KoenVereeken on Flickr : http://www.flickr.com/photos/koenvereeken/2088902012/
  • 43. this is peru …
  • 44. and so is this (in argentina)
  • 45. and so is this (in bolivia)
  • 46. semantics required
    dullhunk on Flickr : http://www.flickr.com/photos/dullhunk/3525013547/
  • 47. Hilton, Paris
    Paris Hilton
  • 48. London
    Jack London
  • 49. Panama
    Panama Hats
  • 50. who uses official names anyway?
    takomabibelot on Flickr : http://www.flickr.com/photos/takomabibelot/234301712/
  • 51. MOMA NYC
    Museum of Modern Art, New York
    paulamoya on Flickr : http://www.flickr.com/photos/40351463@N00/745012335/
  • 52. Millennium Wheel
    London Eye
    hismith83 on Flickr : http://www.flickr.com/photos/hismith83/200701961/
  • 53. San Francisco
    City and County of San Francisco
    SF Brit on Flickr : http://www.flickr.com/photos/cnbattson/192162591/
  • 54. WOEIDs (redux)
    stevefaeembra on Flickr : http://www.flickr.com/photos/stevefaeembra/3567750853/
  • 55. 44418
    12589342
  • 56. 51° 30' 50.0868", 0° 7' 42.8514"
  • 57. Unique
    Permanent
    Global
    Language Neutral
    London = Londra = Londres = ロンドン
    United States = États-Unis = Stati Uniti = 미국
    Ensures that geography can be employed consistently and globally
    straup on Flickr : http://www.flickr.com/photos/straup/3504862388/
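The "language neutral" point can be made concrete with a small lookup table. This is a sketch only: the table and the helper names `woeidFor` and `samePlace` are hypothetical, while the WOEIDs are the ones shown elsewhere in this deck (44418 for London, 23424975 for the United Kingdom).

```php
<?php
// Names in several languages resolve to one WOEID (illustrative table).
$nameToWoeid = [
    'London'                 => 44418,
    'Londra'                 => 44418,
    'Londres'                => 44418,
    'ロンドン'                => 44418,
    'United Kingdom'         => 23424975,
    'Vereinigtes Königreich' => 23424975,
    'Royaume Uni'            => 23424975,
    'イギリス'                => 23424975,
];

// Look up the WOEID for a name, or null when the name is unknown.
function woeidFor(array $nameToWoeid, string $name): ?int
{
    return $nameToWoeid[$name] ?? null;
}

// Two known names refer to the same place exactly when their WOEIDs match.
function samePlace(array $nameToWoeid, string $a, string $b): bool
{
    $wa = woeidFor($nameToWoeid, $a);
    return $wa !== null && $wa === woeidFor($nameToWoeid, $b);
}
```

Comparing identifiers rather than strings means `samePlace($nameToWoeid, 'Londra', 'ロンドン')` is true even though the two names share no characters.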
  • 58. GeoPlanet
    A Global Location Repository
    Names + Geometry + Topology
    WOEIDs for
    • cities and towns
    • postal codes, airports
    • admin regions, time zones
    • telephone code areas
    • marketing areas
    • points of interest
    • colloquial areas
    • neighbourhoods
    woodleywonderworks on Flickr : http://www.flickr.com/photos/wwworks/2222523978/
  • 66. Continents
    Countries
    Counties
    Regions
    Colloquials
    Targeting Zones
    Postal Codes
    Area Codes
    Boroughs
    Neighbourhoods
    POIs
  • 67. Earth: 1 (Supername)
    Europe: 24865675 (Continent)
    United Kingdom: 23424975 (Country); also Vereinigtes Königreich, Royaume Uni, イギリス
    England: 24554868 (Country)
    Great Britain: 28298150 (Colloquial)
    Warwickshire: 12602190 (County)
    Worcestershire: 12602192 (County)
    Stratford-on-Avon: 12696101 (District)
    Stratford-upon-Avon: 36424 (Town)
    Warwick: 39228 (Town)
    CV37: 26787646 (ZIP)
  • 68. http://engineering.twitter.com/2010/02/woeids-in-twitters-trends.html
  • 69. http://isithackday.com/hacks/placemaker/tweet-locations.php
  • 70. http://wherein.yahooapis.com/v1/document
  • 71. unlock your api
    https://developer.apps.yahoo.com/wsregapp/
    sam.d on Flickr : http://www.flickr.com/photos/samd/65693717/
  • 72. Placemaker Parameters
    appid: 100% mandatory
    inputLanguage: en-US, fr-CA, …
    outputType: XML or RSS
    documentContent: text to geoparse
    documentTitle: optional title
    documentURL: URL to geoparse
    documentType: MIME type of doc
    autoDisambiguate: remove duplicates
    focusWoeid: filter around a WOEID
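One way to assemble these parameters into a POST body is PHP's built-in `http_build_query`, which URL-encodes each value. The sketch below makes no network call. The helper name `placemakerBody` and the `YOUR_APP_ID` placeholder are assumptions for illustration, and the `text/plain` and `xml` defaults are the values used in the deck's own request code.

```php
<?php
// Build the form-encoded body for a Placemaker request from the
// parameters listed on the slide. Extra parameters go in $options.
function placemakerBody(string $appid, string $content, array $options = []): string
{
    $params = array_merge([
        'appid'           => $appid,
        'documentContent' => $content,
        'documentType'    => 'text/plain',
        'outputType'      => 'xml',
    ], $options);
    // http_build_query URL-encodes every key and value.
    return http_build_query($params);
}

// YOUR_APP_ID is a placeholder, not a real key.
$body = placemakerBody('YOUR_APP_ID', 'cheap flights from london to paris',
    ['inputLanguage' => 'en-US']);
echo $body, "\n";
```

The resulting string can be handed to `CURLOPT_POSTFIELDS` in place of a hand-concatenated one, which avoids forgetting to encode a value.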
  • 73. // POST to Placemaker
    // $key, $content and $lang must be set before this point
    define('POSTURL', 'http://wherein.yahooapis.com/v1/document');
    define('POSTVARS', 'appid='. $key.'&documentContent='.urlencode($content).
    '&documentType=text/plain&outputType=xml'.$lang);
    $ch = curl_init(POSTURL);
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, POSTVARS);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $placemaker = curl_exec($ch);
    curl_close($ch);
  • 74. places
    that_james on Flickr : http://www.flickr.com/photos/that_james/496797309/
  • 75. <placeDetails>
    <place>
    <woeId>44418</woeId>
    <type>Town</type>
    <name>
    <![CDATA[London, England, GB]]>
    </name>
    <centroid>
    <latitude>51.5063</latitude>
    <longitude>-0.12714</longitude>
    </centroid>
    </place>
    <matchType>0</matchType>
    <weight>1</weight>
    <confidence>10</confidence>
    </placeDetails>
    One place for WOEID 44418
  • 76. references
    misterbisson on Flickr : http://www.flickr.com/photos/maisonbisson/117720946/
  • 77. <reference>
    <woeIds>44418</woeIds>
    <start>1079</start>
    <end>1089</end>
    <isPlaintextMarker>1</isPlaintextMarker>
    <text><![CDATA[London, UK]]></text>
    <type>plaintext</type>
    <xpath><![CDATA[]]></xpath>
    </reference>
    <reference>
    <woeIds>44418</woeIds>
    <start>1116</start>
    <end>1126</end>
    <isPlaintextMarker>1</isPlaintextMarker>
    <text><![CDATA[London, UK]]></text>
    <type>plaintext</type>
    <xpath><![CDATA[]]></xpath>
    </reference>
    Two references for WOEID 44418
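To see how references relate to places, the two elements above can be grouped by WOEID. This is a minimal sketch using SimpleXML, which the deck's own code also uses. The fragment is trimmed to the fields needed and wrapped in a `referenceList` root (the element name the deck's later code reads from) so that it parses on its own. The `$mentions` structure is an illustrative choice.

```php
<?php
// Two <reference> elements in the shape shown above, trimmed and wrapped
// in a <referenceList> root so the fragment parses by itself.
$xml = <<<XML
<referenceList>
  <reference>
    <woeIds>44418</woeIds><start>1079</start><end>1089</end>
    <text><![CDATA[London, UK]]></text>
  </reference>
  <reference>
    <woeIds>44418</woeIds><start>1116</start><end>1126</end>
    <text><![CDATA[London, UK]]></text>
  </reference>
</referenceList>
XML;

$list = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

// Group the character offsets of each mention by WOEID.
$mentions = [];
foreach ($list->reference as $ref) {
    $woeid = (string) $ref->woeIds;
    $mentions[$woeid][] = [(int) $ref->start, (int) $ref->end];
}
print_r($mentions);
```

One place (WOEID 44418) ends up with two offset pairs, one for each time "London, UK" appears in the input text.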
  • 78. // turn into a PHP object and loop over the results
    $places = simplexml_load_string($placemaker, 'SimpleXMLElement',
    LIBXML_NOCDATA);
    if($places->document->placeDetails){
    $foundplaces = array();
    // create a hashmap of the places found to mix with
    // the references found
    foreach($places->document->placeDetails as $p){
    $wkey = 'woeid'.$p->place->woeId;
    $foundplaces[$wkey]=array(
    'name'=>str_replace(', ZZ','',$p->place->name).'',
    'type'=>$p->place->type.'',
    'woeId'=>$p->place->woeId.'',
    'lat'=>$p->place->centroid->latitude.'',
    'lon'=>$p->place->centroid->longitude.''
    );
    }
    }
  • 79. // loop over references and filter out duplicates
    $refs = $places->document->referenceList->reference;
    $usedwoeids = array();
    foreach($refs as $r){
    foreach($r->woeIds as $wi){
    if(in_array($wi,$usedwoeids)){
    continue;
    } else {
    $usedwoeids[] = $wi.'';
    }
    $currentloc = $foundplaces["woeid".$wi];
    if($r->text!='' && $currentloc['name']!='' &&
    $currentloc['lat']!='' && $currentloc['lon']!=''){
    $text = preg_replace('/\s+/',' ',$r->text);
    $name = addslashes(str_replace(', ZZ','',
    $currentloc['name']));
    $desc = addslashes($text);
    $lat = $currentloc['lat'];
    $lon = $currentloc['lon'];
    $class = stripslashes($desc)."|$name|$lat|$lon";
    $placelist.= "<li>".
    }
    }
  • 80. http://www.vicchi.org/speaking
  • 81.
  • 82. the internet is broken
    Nesster on Flickr : http://www.flickr.com/photos/nesster/3168425434/
  • 83. // load the URL, using YQL to filter the HTML
    // and fix UTF-8 nasties
    $url = 'http://www.vicchi.org/speaking';
    $realurl = 'http://query.yahooapis.com/v1/public/yql'.
    '?q=select%20*%20'.
    'from%20html%20where%20url%20%3D%20%22'.
    urlencode($url).'%22&format=xml';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $realurl);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $c = curl_exec($ch);
    curl_close($ch);
    if(strstr($c,'<')){
    $c = preg_replace("/.*<results>|<\/results>.*/",'',$c);
    $c = preg_replace("/<\?xml version=\"1\.0\"".
    " encoding=\"UTF-8\"\?>/",'',$c);
    $c = strip_tags($c);
    $c = preg_replace("/[\r\n]+/"," ",$c);
    }
  • 84. minor annoyances
    swooshthesnail on Flickr : http://www.flickr.com/photos/swooshthesnail/3281681399/
  • 85. 50,000 bytes
    ASurroca on Flickr : http://www.flickr.com/photos/asurroca/147049402/
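The slide gives only the figure. Assuming it is a cap on the size of the text sent to Placemaker, one hedged way to stay under it is to trim the content by bytes without cutting a UTF-8 character in half. The function name `truncateUtf8Bytes` is hypothetical, and the sketch assumes the input is valid UTF-8.

```php
<?php
// Trim $text to at most $maxBytes bytes without splitting a UTF-8 character.
function truncateUtf8Bytes(string $text, int $maxBytes): string
{
    if (strlen($text) <= $maxBytes) {
        return $text; // strlen() counts bytes, not characters
    }
    $cut = $maxBytes;
    // If the first dropped byte is a continuation byte (10xxxxxx), the cut
    // falls inside a character: back up to the start of that character.
    while ($cut > 0 && (ord($text[$cut]) & 0xC0) === 0x80) {
        $cut--;
    }
    return substr($text, 0, $cut);
}

// 60,000 ASCII bytes trimmed to the 50,000 figure from the slide.
$trimmed = truncateUtf8Bytes(str_repeat('x', 60000), 50000);
echo strlen($trimmed), "\n";
```

Trimming silently discards any place names after the cut, so splitting long documents into several requests may be the better choice when coverage matters.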
  • 86. X
    no json
  • 87. post not get
    sludgegulper on Flickr : http://www.flickr.com/photos/sludgeulper/2645478209/
  • 88. http://where.yahooapis.com/v1/
  • 89. collections
    bradman334 on Flickr : http://www.flickr.com/photos/bradman334/3402569690/
  • 90. collections
    • lists of related resources, such as places
    • e.g. find all places called “london”
    http://where.yahooapis.com/v1/places.q('london');count=0?appid=[your id]
    • e.g. find the most likely place called “london”
    http://where.yahooapis.com/v1/places.q('london')?appid=[your id]
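The two URL shapes above differ only in the optional `;count=` part, so a small helper can build both. This is a sketch: `placesQueryUrl` is a hypothetical name, `YOUR_APP_ID` is a placeholder, and the URL layout is copied from the examples on this slide rather than from a specification.

```php
<?php
// Build a GeoPlanet "places" collection URL in the shape shown on the slide.
// $count is optional; the slide pairs count=0 with "find all places".
function placesQueryUrl(string $query, string $appid, ?int $count = null): string
{
    $url = "http://where.yahooapis.com/v1/places.q('" . rawurlencode($query) . "')";
    if ($count !== null) {
        $url .= ';count=' . $count;
    }
    return $url . '?appid=' . rawurlencode($appid);
}

// YOUR_APP_ID is a placeholder, not a real key.
echo placesQueryUrl('london', 'YOUR_APP_ID', 0), "\n";
echo placesQueryUrl('london', 'YOUR_APP_ID'), "\n";
```

Encoding the query with `rawurlencode` keeps names containing spaces or non-ASCII characters from breaking the URL.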
  • 92. <places xmlns="http://where.yahooapis.com/v1/schema.rng" xmlns:yahoo="http://www.yahooapis.com/v1/base.rng"
    yahoo:start="0" yahoo:count="1" yahoo:total="22">
    <place yahoo:uri="http://where.yahooapis.com/v1/place/44418" xml:lang="en-us">
    <woeid>44418</woeid>
    <placeTypeName code="7">Town</placeTypeName>
    <name>London</name>
    <country type="Country" code="GB">United Kingdom</country>
    <admin1 type="Country" code="GB-ENG">England</admin1>
    <admin2 type="County" code="">Greater London</admin2>
    <admin3></admin3>
    <locality1 type="Town">London</locality1>
    <locality2></locality2>
    <postal></postal>
    <centroid>
    <latitude>51.506321</latitude><longitude>-0.127140</longitude>
    </centroid>
    <boundingBox>
    <southWest><latitude>51.261318</latitude><longitude>-0.563000</longitude></southWest>
    <northEast><latitude>51.686031</latitude><longitude>0.280360</longitude></northEast>
    </boundingBox>
    </place>
    </places>
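The `boundingBox` in the response above lends itself to a cheap containment test, for example to check whether a point could plausibly belong to London before doing anything more expensive. A minimal sketch, with the box and centroid values copied from the response and a hypothetical `inBoundingBox` helper:

```php
<?php
// Bounding box and centroid copied from the response above.
$box = [
    'south' => 51.261318, 'west' => -0.563000,
    'north' => 51.686031, 'east' => 0.280360,
];
$centroid = [51.506321, -0.127140];

// True when the point lies inside the box. This simple test assumes the
// box does not cross the antimeridian (west <= east).
function inBoundingBox(array $box, float $lat, float $lon): bool
{
    return $lat >= $box['south'] && $lat <= $box['north']
        && $lon >= $box['west'] && $lon <= $box['east'];
}

var_dump(inBoundingBox($box, $centroid[0], $centroid[1])); // London's own centroid
var_dump(inBoundingBox($box, 48.8566, 2.3522));            // a point in Paris
```

A bounding box is only a rectangle around the place, so a positive result means "possibly inside" rather than "definitely inside".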
  • 93. resources
    joshuarichards on Flickr : http://www.flickr.com/photos/joshywoshywoo/124671979/
  • 94. resources
    • unique objects that contain multiple attributes, such as a place
    • e.g. get attributes for WOEID 44418
    http://where.yahooapis.com/v1/place/44418?appid=[your id]
    • e.g. find the most likely place called “london”
    http://where.yahooapis.com/v1/places.q('london')?appid=[your id]
  • 96. resources
    • unique objects that contain multiple attributes, such as a place
    • e.g. get places related to WOEID 44418
    http://where.yahooapis.com/v1/place/44418/relation?appid=[your id]
    • parent, ancestors, belongsto, neighbours, siblings, children
  • 98. <?xml version="1.0" encoding="UTF-8"?><places xmlns="http://where.yahooapis.com/v1/schema.rng" xmlns:yahoo="http://www.yahooapis.com/v1/base.rng" yahoo:start="0" yahoo:count="10" yahoo:total="34">
    <place yahoo:uri="http://where.yahooapis.com/v1/place/12695806" xml:lang="en-us">
    <woeid>12695806</woeid>
    <placeTypeName code="10">Local Administrative Area</placeTypeName>
    <name>City of London</name>
    </place>
    <place yahoo:uri="http://where.yahooapis.com/v1/place/12695807" xml:lang="en-us">
    <woeid>12695807</woeid>
    <placeTypeName code="10">Local Administrative Area</placeTypeName>
    <name>London Borough of Camden</name>
    </place>
    <place yahoo:uri="http://where.yahooapis.com/v1/place/12695808" xml:lang="en-us">
    <woeid>12695808</woeid>
    <placeTypeName code="10">Local Administrative Area</placeTypeName>
    <name>London Borough of Hackney</name>
    </place>

    </places>
  • 99. Far more than you could ever want
    http://delicious.com/codepo8/geotoys
  • 100. never work with children, animals or live demos
    elephipelephi on Flickr : http://www.flickr.com/photos/elephipelephi/1493013250/
  • 101. not taking notes?
    selva on Flickr : http://www.flickr.com/photos/selva/24604141/
  • 102. London Twitter #devnest 7, March 2010
    (Almost) Everything You Ever Wanted To Know About Geo (with WOEIDs)…
    Gary Gale, Yahoo! Geo Technologies
    http://slideshare.net/vicchi
  • 103. thanks for listening
    Paul Keleher on Flickr : http://www.flickr.com/photos/pkeleher/1658311814/
  • 104. www.ygeoblog.com
    twitter.com/vicchi
    twitter.com/yahoogeo