Geographic Information Retrieval From Disparate Data Sources

    Geographic Information Retrieval From Disparate Data Sources - Presentation Transcript

    1. Geographic Information Retrieval from Disparate Data Sources. Ian Turton, Anuj Jaiswal, Mark Gahegan. GeoVISTA Center, School of Geography, Pennsylvania State University. {ijt1,arj135,mng1}@psu.edu
    2. Summary
      • Information Retrieval?
      • Geographic?
      • Disparate Data Sources?
      • Does it work?
      • Semantics and Ontologies, do they help?
      • Further work?
      • Conclusions
    3. Information Retrieval
      • Information retrieval (IR) is the science of searching for information in documents, searching for documents themselves, searching for metadata which describe documents, or searching within databases, whether relational stand-alone databases or hypertextually-networked databases such as the World Wide Web.
      • Wikipedia
    4. Or, more simply
      • Is there some way I can avoid reading all 19,000 of those articles about measles and still sound like I know what I’m talking about at the next conference?
    5. Geography
      • We all know that geography is important.
      • Depending on who you ask, more than 80% of all information is said to contain a geographic element.
      • Explicit:
        • Has a map coordinate
      • Implicit:
        • Has a place name
    6. Disparate Data Sources
      • Large collections of text containing implicit geographic references about Avian Flu and Measles:
        • PubMed abstracts
        • News Feeds (RSS)
        • WHO incident reports
    7. Building the System
      • Acquire data
      • Extract geographic information
      • Extract semantic and ontological information
      • Present in a form that allows easy exploration by users.
    8. Acquire Data
      • First extract abstracts from PubMed
      • http://eutils.ncbi.nlm.nih.gov/entrez/eutils/
      • ((avian OR bird) AND (influenza OR flu)) OR H5N1
      • Returns a structured XML file with citation data and abstract for selected papers.
      • Process XML into PostGIS database
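The acquisition step above can be sketched roughly as follows. This is a minimal sketch, not the authors' code: the `esearch.fcgi` endpoint and the `db`, `term` and `retmax` parameters belong to the public E-utilities interface, while the function names, the `retmax` default and the tiny `SAMPLE_XML` record are assumptions made up for illustration. No network request is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

EUTILS = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
QUERY = "((avian OR bird) AND (influenza OR flu)) OR H5N1"


def esearch_url(term, retmax=100):
    """Build an E-utilities search URL for PubMed (retmax default is an assumption)."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


# Hand-written stand-in for a fetched record; the values are placeholders, not real data.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>0</PMID>
      <Article>
        <ArticleTitle>Placeholder title</ArticleTitle>
        <Abstract><AbstractText>Placeholder abstract text.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def parse_articles(xml_text):
    """Pull id, title and abstract out of each PubmedArticle element."""
    root = ET.fromstring(xml_text)
    articles = []
    for art in root.iter("PubmedArticle"):
        parts = art.iterfind("MedlineCitation/Article/Abstract/AbstractText")
        articles.append({
            "pmid": art.findtext("MedlineCitation/PMID"),
            "title": art.findtext("MedlineCitation/Article/ArticleTitle"),
            "abstract": " ".join(p.text or "" for p in parts),
        })
    return articles


url = esearch_url(QUERY)
records = parse_articles(SAMPLE_XML)
```

Each dictionary in `records` would then become one row in the article table.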
    9. Extract Geographic Entities
      • Use FactXtractor (http://julian.mine.nu/snedemo.html)
      • Uses GATE to detect and extract Named Entities and Entity Relationships
      • Usually finds People, Places and Organizations
      • Returned as an OWL encoded ontology
      • In this case we just make use of places
      • <rdf:RDF xml:base="http://ist.psu.edu/sna/ontology#">
      • <owl:Class rdf:ID="Location"/>
      • <owl:Class rdf:ID="Organization"/>
      • <owl:Class rdf:ID="Person"/>
      • <owl:DatatypeProperty rdf:ID="counts"/>
      • <Location rdf:ID="Africa">
      • <counts>1</counts>
      • <mentioned_in>
      • <_Article rdf:ID="InputString0">
      • </_Article>
      • </mentioned_in>
      • </Location>
      • <Location rdf:ID="Asia">
      • <counts>1</counts>
      • <mentioned_in rdf:resource="#InputString0"/>
      • </Location>
      • <Location rdf:ID="Vietnam"/>
      • <Location rdf:ID="South_East"/>
      • <Location rdf:ID="Europe">
      • <counts>1</counts>
      • <mentioned_in rdf:resource="#InputString0"/>
      • </Location>
      • </rdf:RDF>
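Pulling just the places out of output like this might look like the sketch below. The slide excerpt omits the namespace declarations, so the `xmlns` attributes in `SAMPLE_OWL` (including the choice of the ontology URI as default namespace) are assumptions added to make the fragment parseable. The function name and the underscore-to-space cleanup are also illustrative, not part of FactXtractor.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ONT_NS = "http://ist.psu.edu/sna/ontology#"  # assumed default namespace

# Shortened copy of the slide's example, with assumed namespace declarations added.
SAMPLE_OWL = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns="http://ist.psu.edu/sna/ontology#"
         xml:base="http://ist.psu.edu/sna/ontology#">
  <owl:Class rdf:ID="Location"/>
  <owl:Class rdf:ID="Person"/>
  <Location rdf:ID="Africa">
    <counts>1</counts>
    <mentioned_in rdf:resource="#InputString0"/>
  </Location>
  <Location rdf:ID="Vietnam"/>
  <Location rdf:ID="South_East"/>
</rdf:RDF>
"""


def extract_places(owl_xml):
    """Return (place name, mention count) pairs for every Location individual."""
    root = ET.fromstring(owl_xml)
    places = []
    for loc in root.iter(f"{{{ONT_NS}}}Location"):
        name = loc.get(f"{{{RDF_NS}}}ID")
        if name is None:
            continue
        counts = loc.findtext(f"{{{ONT_NS}}}counts")
        # Locations without a <counts> child are reported with a count of 0.
        places.append((name.replace("_", " "), int(counts) if counts else 0))
    return places


places = extract_places(SAMPLE_OWL)
```

The `owl:Class` declaration named "Location" is skipped because it lives in the OWL namespace, so only the individual places are returned.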
    10. GeoLocation
      • Converting a place name into a location
      • State College, PA -> (40.7934, -77.86)
      • Call the GeoNames web service to carry out a gazetteer lookup on the name.
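A gazetteer lookup of this kind could be sketched as below. The slides do not say which GeoNames endpoint or parameters were used, so the `searchJSON` URL, the `username` parameter and the helper names are assumptions based on the current public GeoNames search service. The response body is a hand-written sample that reuses the coordinates from the slide, and nothing is fetched over the network.

```python
import json
from urllib.parse import urlencode

GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"  # assumed endpoint


def geonames_search_url(name, username, max_rows=5):
    """Build a gazetteer search URL; 'username' is a placeholder account name."""
    params = {"q": name, "maxRows": max_rows, "username": username}
    return GEONAMES_SEARCH + "?" + urlencode(params)


def parse_candidates(body):
    """Turn a search response into a list of candidate locations."""
    data = json.loads(body)
    return [
        {
            "name": g["name"],
            "country": g.get("countryCode"),
            "lat": float(g["lat"]),   # coordinates arrive as strings
            "lon": float(g["lng"]),
        }
        for g in data.get("geonames", [])
    ]


# Hand-written sample response using the coordinates shown on the slide.
SAMPLE_BODY = json.dumps({
    "geonames": [
        {"name": "State College", "countryCode": "US",
         "lat": "40.7934", "lng": "-77.86"}
    ]
})

url = geonames_search_url("State College, PA", username="demo_user")
candidates = parse_candidates(SAMPLE_BODY)
```

A name that matches several gazetteer entries yields several candidates, which is where disambiguation comes in.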
    11. Disambiguation
      • Which London did you mean?
    12. Types of Ambiguity
      • Geo/Geo
        • London, UK vs London, Ontario
        • South Wales, UK vs New South Wales, Australia
        • Paris, France vs Paris, Texas
      • Geo/Non Geo
        • Washington, DC vs George Washington
        • Van, Turkey vs delivery van
        • West Nile, Egypt vs West Nile Virus
      • Sort of Ambiguous
        • avian A/Mallard/Pennsylvania/10218/84 (H5N2) influenza virus strains
    13. Disambiguating Multiple Places
      • Choose A if A is a Political Entity and B is not,
      • Choose B if B is a Political Entity and A is not,
      • Choose A if A is a Region and B is not,
      • Choose B if B is a Region and A is not,
      • Choose A if A is an Ocean and B is not,
      • Choose B if B is an Ocean and A is not,
      • Choose A if A is a Populated Place and B is not,
      • Choose B if B is a Populated Place and A is not,
      • Choose A if A's population is greater than B's,
      • Choose B if B's population is greater than A's,
      • Choose A if A is an Administrative Area and B is not,
      • Choose B if B is an Administrative Area and A is not,
      • Choose A if A is a Water Feature and B is not,
      • Choose B if B is a Water Feature and A is not,
      • Choose A.
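The rule cascade above can be read as a pairwise comparison applied in a fixed order. The sketch below is one possible encoding, not the authors' implementation: the `Candidate` class, the `kind` labels, the populations and the idea of folding the pairwise rule over a longer candidate list are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    kind: str            # hypothetical labels, see the two lists below
    population: int = 0


# Feature kinds checked before and after the population comparison,
# in the order the slide lists them.
KINDS_BEFORE_POPULATION = ["political", "region", "ocean", "populated"]
KINDS_AFTER_POPULATION = ["admin", "water"]


def _prefer_kind(a, b, kind):
    """Return whichever candidate alone has the given kind, else None."""
    if a.kind == kind and b.kind != kind:
        return a
    if b.kind == kind and a.kind != kind:
        return b
    return None


def choose(a, b):
    """Apply the rules in order; fall through to A when nothing decides."""
    for kind in KINDS_BEFORE_POPULATION:
        winner = _prefer_kind(a, b, kind)
        if winner is not None:
            return winner
    if a.population > b.population:
        return a
    if b.population > a.population:
        return b
    for kind in KINDS_AFTER_POPULATION:
        winner = _prefer_kind(a, b, kind)
        if winner is not None:
            return winner
    return a


def disambiguate(candidates):
    """Fold the pairwise rule over a non-empty candidate list (an assumed extension)."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    best = candidates[0]
    for other in candidates[1:]:
        best = choose(best, other)
    return best


# Populations here are rough illustrative figures, not gazetteer data.
london_uk = Candidate("London, UK", "populated", 8_000_000)
london_on = Candidate("London, Ontario", "populated", 400_000)
best_london = disambiguate([london_on, london_uk])
```

With these figures the population rule decides and `best_london` is the UK candidate, whichever order the two are supplied in.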
    14. Solving Geo/Non Geo Ambiguity
      • Stop word lists – hand crafted by experience
      • Province, valley, way, hill, Children, Children's, new, cross, red, clinic, general, côte, ii, iii, bas, pays, chem, northern region, eastern region, central region, southern region, region, off, square, census, islands, city, district, park, USA, State, Virology, Microbiology, Immunology, Medical, Science, Employee, Surveillance, Disease, Biochemistry, Prevention, for, and, mail, natl, dept, dev, agr, Rural, inst, mil, med, coll, Internal, Publ, Bur, Hosp, Jude, Childrens, Chai, yan, Virol, Dis, Div, Enter, Cent, lab, Univ, res, ist, prevent, roc, prod, Roche, vet, castle, peak, stat, garden, Atl, Anim, mar, queen, central, Director, LAT, AC-EIA, register, north, east, south, west, northern, southern, eastern, western
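Applying such a list is a simple filter over the extracted names. The sketch below uses only a small subset of the words above; the case-insensitive comparison and the function name are assumptions, since the slides do not say how matching was done.

```python
# A small subset of the hand-crafted list, lower-cased for comparison.
STOP_WORDS = {
    "province", "valley", "way", "hill", "children", "new", "cross",
    "clinic", "general", "virology", "roche", "univ", "lab",
    "north", "east", "south", "west",
}


def filter_candidates(names):
    """Drop extracted 'places' that are really stop words (case-insensitive)."""
    return [n for n in names if n.strip().lower() not in STOP_WORDS]


extracted = ["Vietnam", "Virology", "Roche", "Hong Kong", "north"]
kept = filter_candidates(extracted)
```

Only whole-name matches are removed, so a multi-word name such as "Hong Kong" survives even though it is not a single dictionary entry.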
    15. Concept Extraction
      • Automatically extract keywords or tags from article abstracts by
        • Selecting keywords which exceed a preset frequency.
        • Passing the text through the Yahoo! tagging service, which returns key phrases using latent semantic indexing.
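The first option, frequency-based keyword selection, might be sketched like this. The tokenizer, the short list of common words and the threshold value are assumptions; the slides only say that keywords exceeding a preset frequency are kept. The example sentence is invented.

```python
import re
from collections import Counter

# Tiny illustrative list of words to ignore; a real list would be longer.
COMMON_WORDS = {"the", "of", "and", "in", "a", "to", "is", "was", "were", "for", "with"}


def extract_keywords(text, threshold=1):
    """Return words that occur more than `threshold` times, sorted alphabetically."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(w for w in words if w not in COMMON_WORDS and len(w) > 2)
    return sorted(w for w, n in counts.items() if n > threshold)


abstract = ("Avian influenza virus H5N1 was isolated from poultry. "
            "The H5N1 virus spread among poultry in Vietnam.")
keywords = extract_keywords(abstract)
```

Raising `threshold` trades recall for precision: fewer, more dominant tags per abstract.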
    16. Store everything in a big database
      • Open up PostGIS and stuff in all the data keyed by article id.
        • Article
          • Citation data – authors, title, abstract, journal, volume, issue, etc
        • Places
          • Name, Country, Latitude, Longitude, etc
        • Concepts
          • Key phrase or word
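The three groups of data keyed by article id suggest a layout along the following lines. This is a sketch under stated assumptions: an in-memory SQLite database stands in for PostGIS so the example is self-contained, and the table names, column names and sample rows are hypothetical, loosely following the lists above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE article (
    id INTEGER PRIMARY KEY, title TEXT, abstract TEXT, journal TEXT
);
CREATE TABLE place (
    article_id INTEGER REFERENCES article(id),
    name TEXT, country TEXT, latitude REAL, longitude REAL
);
CREATE TABLE concept (
    article_id INTEGER REFERENCES article(id), phrase TEXT
);
"""


def build_demo_db():
    """Create the tables and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO article (id, title) VALUES (?, ?)",
        [(1, "Placeholder article A"), (2, "Placeholder article B")],
    )
    conn.executemany(
        "INSERT INTO place (article_id, name) VALUES (?, ?)",
        [(1, "Vietnam"), (2, "Vietnam"), (2, "Thailand")],
    )
    conn.executemany(
        "INSERT INTO concept (article_id, phrase) VALUES (?, ?)",
        [(1, "h5n1"), (1, "poultry"), (2, "h5n1")],
    )
    return conn


def concepts_for_place(conn, place_name):
    """Count key phrases over the articles that mention a given place."""
    rows = conn.execute(
        "SELECT c.phrase, COUNT(*) AS n "
        "FROM concept c JOIN place p ON p.article_id = c.article_id "
        "WHERE p.name = ? GROUP BY c.phrase ORDER BY n DESC, c.phrase",
        (place_name,),
    )
    return rows.fetchall()


conn = build_demo_db()
vietnam_tags = concepts_for_place(conn, "Vietnam")
```

Keying places and concepts by article id is what makes a query such as "tags limited by place" a single join.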
    17. Provide Intuitive Front End for Users
      • Tag Cloud
        • Popularized on many Web 2.0 sites such as Flickr, del.icio.us, citeUlike.org, etc.
    18. Place Cloud
    19. Author Cloud
    20. Choose a tag
    21. Choose a place
    22. Select a child of the place
    23. Tag limited by place
    24. Implementation
      • Initially implemented as a Java servlet using a JDBC link to PostGIS
      • Reimplemented in the last week in Ruby on Rails, using ActiveRecord to talk to PostGIS
      • In-page mapping uses an OpenLayers WMS map client connected to GeoServer over PostGIS.
    25. Semantics and Ontologies
      • Geographic ontology is provided by GeoNames semantic web service.
      • A query looks up the parent, children and nearby features of most places.
      • Results are cached in the PostGIS database to save processing time and load on the server.
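The caching idea can be sketched as a thin wrapper around whatever function performs the remote lookup. This is an assumed design, not the system's code: a dictionary stands in for the PostGIS cache table, and `fetch_children` is a hypothetical callable supplied by the caller.

```python
class CachedHierarchy:
    """Remember hierarchy lookups so each place is fetched at most once."""

    def __init__(self, fetch_children):
        # fetch_children: hypothetical callable, place id -> list of child names
        self._fetch_children = fetch_children
        self._cache = {}  # stands in for a cache table in the database

    def children(self, place_id):
        """Return cached children, calling the remote lookup only on a miss."""
        if place_id not in self._cache:
            self._cache[place_id] = self._fetch_children(place_id)
        return self._cache[place_id]

    def cached_ids(self):
        """List the place ids currently held in the cache."""
        return sorted(self._cache)
```

Injecting the lookup function keeps the cache independent of any particular web service, which also makes it easy to exercise without network access.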
    26. WordNet Ontology
    27. Conclusions
      • It is possible to construct a useful system to ingest arbitrary text and extract place names.
      • A sufficiently good automated location disambiguation system can be built for a specific domain to process 80-90% of places correctly.
      • Semantic expansion and narrowing of searches appears useful in early experiments.
      • Providing users with a familiar, and highly linked, interface seems to aid exploration of the document space.

    Penn State University, 2 years ago


    © All Rights Reserved
