A Web Based Tool For the Detection and Analysis of Avian Influenza Outbreaks From Internet News Sources

Paper presented at AutoCarto 2008, Shepherdstown, WV.

  1. A Web Based Tool For the Detection and Analysis of Avian Influenza Outbreaks From Internet News Sources. Ian Turton and Andrew Murdoch, GeoVISTA Center, Penn State University
  2. Flight?
  3. Summary <ul><li>Who we are? </li></ul><ul><li>Why we did it? </li></ul><ul><li>What is Avian Flu? </li></ul><ul><li>What we did? </li></ul><ul><li>How we did it? </li></ul><ul><li>Did it work? </li></ul><ul><li>What will we do next? </li></ul>
  4. Who we are? <ul><li>Ian </li></ul><ul><ul><li>Senior Research Associate in the GeoVISTA Center </li></ul></ul><ul><ul><li>E-Education Fellow in the Dutton E-Education Institute </li></ul></ul><ul><li>Andrew </li></ul><ul><ul><li>MGIS student (graduated in summer 2008) </li></ul></ul><ul><ul><li>GIS Developer at ArcBridge Consulting and Training </li></ul></ul>
  5. Why we did it? <ul><li>Andrew needed a project for Ian’s course on web mapping, and later for his capstone project (like a dissertation). </li></ul><ul><li>Ian was interested in extracting geographic information from unstructured text. </li></ul><ul><li>We picked the spread of avian influenza, and how to map it automatically from news reports. </li></ul>
  6. What is Avian Flu? <ul><li>Avian flu, or bird flu, is an influenza virus that mainly infects birds. </li></ul><ul><li>The most feared strain is H5N1, but there are many others. </li></ul><ul><li>Roughly 60% of reported human cases have been fatal. </li></ul><ul><li>Currently there is no (or very limited) human-to-human transmission. </li></ul>Picture: Quiplash! (CC BY)
  7. What we did? <ul><li>Designed and built a system that automatically reads internet news articles and maps them, so we could better understand how avian flu spreads from day to day. </li></ul><ul><li>Set it running to see how it did. </li></ul><ul><li>Tweaked it as we saw how it worked. </li></ul>
  8. How we did it? <ul><li>Data sources </li></ul><ul><li>Data processing tools </li></ul><ul><li>GeoCoding tools </li></ul><ul><li>Web Mapping tools </li></ul><ul><ul><li>Server </li></ul></ul><ul><ul><li>Client </li></ul></ul>
  9. Data Sources <ul><li>Official avian flu sites </li></ul><ul><ul><li>WHO </li></ul></ul><ul><ul><li>ProMED </li></ul></ul><ul><li>Internet news sites </li></ul><ul><ul><li>Google News </li></ul></ul><ul><ul><li>FeedBurner </li></ul></ul><ul><li>All collected as RSS feeds </li></ul>
  10. Why does this work? <ul><li>Media panic and interest lead to widespread reporting of any avian flu story. </li></ul><ul><li>Using medical reporting services such as ProMED also helps overcome government restrictions on reporting. </li></ul>Pictures: ianstacey, quiplash, Incessantflux (CC BY)
  11. What is RSS? <ul><li>Really Simple Syndication </li></ul><ul><li>RDF Site Summary </li></ul><ul><li>A standardized XML format for announcing updates to web logs (blogs) and other sites. </li></ul><ul><li>You normally view RSS feeds in a feed reader. </li></ul><ul><li>We wrote programs to read them for us. </li></ul>
  12. Finding the geography <ul><li>Step one: extract the place names (named entity extraction) </li></ul><ul><ul><li>Custom tools </li></ul></ul><ul><ul><li>Reuters’ Calais system </li></ul></ul><ul><ul><li>MetaCarta </li></ul></ul><ul><ul><li>GeoNames.org </li></ul></ul><ul><li>Step two: geocode the places, disambiguating names such as London and Washington </li></ul><ul><ul><li>Custom tools </li></ul></ul><ul><ul><li>MetaCarta </li></ul></ul><ul><ul><li>GeoNames.org </li></ul></ul>
  13. Well, that can’t be too hard?
  14. Web Mapping Server <ul><li>Open web mapping standards from the OGC (so others can use our data). </li></ul><ul><li>Open source tools (we’re a poor university). </li></ul><ul><li>Store the data points and news text in PostGIS (a free spatial database). </li></ul><ul><li>Use GeoServer to serve maps from the database to web (and desktop) clients. </li></ul>
  15. Mapping Client <ul><li>Our end users are epidemiologists, not GIS users, so we stuck with a web browser as the client. </li></ul><ul><li>OpenLayers (www.openlayers.org) </li></ul><ul><ul><li>JavaScript library that supports the OGC WMS and WFS standards our server uses. </li></ul></ul><ul><ul><li>Lets relatively novice developers build an interactive web map quickly. </li></ul></ul><ul><ul><li>The finished map looks a lot like a Google map, so users find it easy to use. </li></ul></ul>
  16. The Map: choice of background layers and choice of feeds. http://www.experimental.geovista.psu.edu/andrew/html/avian_influenza_map.html
  17. Zoom and Pan
  18. Time Line <ul><li>We are also interested in change over time. </li></ul><ul><li>Added the SIMILE Timeline from MIT </li></ul><ul><ul><li>JavaScript tool that lets the user scroll through time- or date-stamped information </li></ul></ul>
  19. Link to external pages
  20. Query the map
  21. Did it work? <ul><li>Yes, </li></ul><ul><li>Well, mostly, </li></ul><ul><li>Well, some of the time! </li></ul><ul><li>We can take news feeds, geocode them and draw maps in a web browser. </li></ul>
  22. What didn’t work? <ul><li>News sources and even medical feeds contain too many items that are about avian flu in a general sense but not about an actual outbreak. </li></ul><ul><ul><li>Conferences about avian flu </li></ul></ul><ul><ul><li>Vaccine news </li></ul></ul><ul><ul><li>Reports of other influenza outbreaks </li></ul></ul><ul><ul><li>Reports of other infectious diseases (“unlike avian flu…”) </li></ul></ul>
  23. What will we do next? <ul><li>Improved selection of RSS items </li></ul><ul><li>Bayesian classifier </li></ul><ul><ul><li>Train on a selection of “good” and “bad” items </li></ul></ul><ul><ul><li>Allow users to rate articles </li></ul></ul><ul><li>Non-negative matrix factorization </li></ul><ul><ul><li>Clusters similar items based on word usage </li></ul></ul><ul><ul><li>Helps overcome repeated reports of the same story </li></ul></ul>
  24. What will we do next? <ul><li>Continue to improve the GeoCoder </li></ul><ul><ul><li>Better disambiguation algorithms. </li></ul></ul><ul><ul><li>Allow users to rate the accuracy of locations found in reports. </li></ul></ul><ul><li>Improve the user interface </li></ul><ul><ul><li>Better selection of points of interest using the timeline </li></ul></ul><ul><ul><li>Replace SIMILE with a custom time bar </li></ul></ul>
  25. Conclusions <ul><li>It is possible to build an online, automated system that reads articles from professional and general news feeds and maps them so that experts and members of the public can track the spread of avian flu outbreaks. </li></ul><ul><li>Much work remains to improve the system. </li></ul>
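Slide 11 says only that the team "wrote programs to read" the RSS feeds for them; the slides do not show that code. As a minimal sketch of the idea, assuming RSS 2.0 input and a simple keyword pre-filter (both assumptions, as are the sample feed, `KEYWORDS`, `read_items` and `is_relevant`), Python's standard library is enough to pull out each item's title, link, description and date:

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feed standing in for a downloaded RSS 2.0 document.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>H5N1 confirmed in poultry near Hanoi</title>
      <link>http://example.com/1</link>
      <description>Officials confirmed an outbreak of H5N1 avian flu.</description>
      <pubDate>Tue, 01 Apr 2008 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Vaccine maker reports earnings</title>
      <link>http://example.com/2</link>
      <description>Quarterly results were announced.</description>
      <pubDate>Wed, 02 Apr 2008 09:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

# Assumed keyword list; the slides do not say how items were pre-filtered.
KEYWORDS = ("avian", "bird flu", "h5n1")


def read_items(rss_text):
    """Return one dict per <item> in an RSS 2.0 document (un-namespaced elements only)."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "description": item.findtext("description", default="").strip(),
            # RSS 2.0 dates use the RFC 822 format, which email.utils can parse.
            "published": parsedate_to_datetime(pub_date) if pub_date else None,
        })
    return items


def is_relevant(item):
    """Crude keyword filter on the title and description."""
    text = (item["title"] + " " + item["description"]).lower()
    return any(keyword in text for keyword in KEYWORDS)


if __name__ == "__main__":
    for item in read_items(SAMPLE_RSS):
        if is_relevant(item):
            print(item["published"].date(), item["title"], item["link"])
```

RSS 1.0 (RDF Site Summary) feeds put their items in an XML namespace, so a real reader would also need to handle namespaced `item` elements; a library such as feedparser hides these format differences.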
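Slide 12 splits geocoding into extracting place names and then disambiguating names such as London. The sketch below illustrates both steps with dictionary (gazetteer) lookup and one simple disambiguation rule. It is not the authors' method and does not call Calais, MetaCarta or GeoNames.org; the tiny `GAZETTEER`, `COUNTRY_NAMES`, `mentions` and `geocode` are hypothetical, and the "prefer a place in a country the text names, otherwise the largest place" rule is an assumed heuristic:

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    name: str
    country: str      # ISO 3166-1 alpha-2 code
    lat: float
    lon: float
    population: int   # approximate; used only to rank candidates


# Hypothetical miniature gazetteer standing in for a service such as GeoNames.org.
# Coordinates and populations are rounded, illustrative values.
GAZETTEER = {
    "London": [
        Place("London", "GB", 51.51, -0.13, 7_500_000),
        Place("London", "CA", 42.98, -81.25, 350_000),
    ],
    "Hanoi": [Place("Hanoi", "VN", 21.03, 105.85, 3_000_000)],
}

COUNTRY_NAMES = {"United Kingdom": "GB", "Canada": "CA", "Vietnam": "VN"}


def mentions(name, text):
    """True if `name` occurs in `text` as a whole word or phrase."""
    return re.search(rf"\b{re.escape(name)}\b", text) is not None


def geocode(text):
    """Find gazetteer names in `text` and pick one candidate for each.

    Assumed heuristic: prefer candidates in a country the text also names,
    otherwise fall back to the most populous candidate.
    """
    countries = {code for cname, code in COUNTRY_NAMES.items() if mentions(cname, text)}
    found = []
    for name, candidates in GAZETTEER.items():
        if not mentions(name, text):
            continue
        in_context = [p for p in candidates if p.country in countries]
        pool = in_context or candidates
        found.append(max(pool, key=lambda p: p.population))
    return found


if __name__ == "__main__":
    print(geocode("Bird flu found on a farm near London, Canada"))
    print(geocode("Officials in London said the risk remained low"))
```

A population fallback is crude (it will always pick London, England when no country is named), which is one reason slide 24 lists better disambiguation and user feedback on locations as future work.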
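Slide 15 notes that OpenLayers talks to the server through the OGC WMS and WFS standards. As a sketch of what those requests look like on the wire, the functions below build key-value-pair GetMap (WMS 1.1.1) and GetFeature (WFS 1.0.0) URLs. The server address `http://localhost:8080/geoserver`, the layer name `avian:outbreaks` and the function names are hypothetical placeholders, not the project's actual configuration:

```python
from urllib.parse import urlencode

# Hypothetical endpoint and layer name; substitute a real GeoServer install.
GEOSERVER = "http://localhost:8080/geoserver"
LAYER = "avian:outbreaks"


def bbox_param(bbox):
    """Format (minx, miny, maxx, maxy) as the comma-separated BBOX value."""
    return ",".join(str(c) for c in bbox)


def getmap_url(bbox, width=800, height=400, layers=(LAYER,)):
    """WMS 1.1.1 GetMap request; with EPSG:4326 the axis order is lon, lat."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        "STYLES": "",  # an empty value requests each layer's default style
        "SRS": "EPSG:4326",
        "BBOX": bbox_param(bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
        "TRANSPARENT": "TRUE",
    }
    return f"{GEOSERVER}/wms?{urlencode(params)}"


def getfeature_url(bbox, typename=LAYER, max_features=100):
    """WFS 1.0.0 GetFeature request for the features inside bbox."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.0.0",
        "REQUEST": "GetFeature",
        "TYPENAME": typename,
        "BBOX": bbox_param(bbox),
        "MAXFEATURES": max_features,
    }
    return f"{GEOSERVER}/wfs?{urlencode(params)}"


if __name__ == "__main__":
    print(getmap_url((-180, -90, 180, 90)))
    print(getfeature_url((100, 5, 110, 25)))
```

A GetMap URL returns a rendered image, while a GetFeature URL returns the matching points as GML (the WFS 1.0.0 default), which is one way a browser client could support map queries like the one on slide 20.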
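Slide 23 proposes a Bayesian classifier trained on "good" and "bad" items to weed out articles that mention avian flu without reporting an outbreak (the problem on slide 22). A minimal multinomial naive Bayes with add-one smoothing is sketched below; the training sentences, the labels `"outbreak"` and `"other"`, and the names `NaiveBayes` and `build_classifier` are all hypothetical:

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens, so 'H5N1' becomes 'h5n1'."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}         # label -> Counter of token counts
        self.doc_counts = Counter()   # label -> number of training items
        self.vocab = set()

    def train(self, text, label):
        tokens = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def log_scores(self, text):
        """Log prior plus summed log likelihoods of the tokens, per label."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def classify(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical hand-labelled examples.
TRAINING = [
    ("Outbreak of H5N1 confirmed in poultry farm", "outbreak"),
    ("Villagers culled chickens after H5N1 outbreak", "outbreak"),
    ("New human case of H5N1 confirmed by ministry", "outbreak"),
    ("Conference on avian flu preparedness opens", "other"),
    ("Vaccine maker begins avian flu vaccine trial", "other"),
    ("Experts discuss pandemic planning at conference", "other"),
]


def build_classifier(examples):
    classifier = NaiveBayes()
    for text, label in examples:
        classifier.train(text, label)
    return classifier


if __name__ == "__main__":
    clf = build_classifier(TRAINING)
    print(clf.classify("H5N1 outbreak confirmed on duck farm"))
```

Because `train` only updates word counts, user ratings of articles (also on slide 23) could be fed back one item at a time without retraining from scratch.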
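Slide 23 also proposes non-negative matrix factorization (NMF) to cluster items by word usage. The sketch below factors a document-term count matrix V into non-negative W (documents × topics) and H (topics × terms) with the standard Lee and Seung multiplicative updates for squared error, written in pure Python so it is self-contained. The sample `DOCS`, the choice of two topics and all function names are assumptions for illustration:

```python
import random
import re


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def term_matrix(docs):
    """Rows are documents, columns are vocabulary terms, entries are counts."""
    vocab = sorted({t for d in docs for t in tokenize(d)})
    index = {t: j for j, t in enumerate(vocab)}
    rows = []
    for d in docs:
        row = [0.0] * len(vocab)
        for t in tokenize(d):
            row[index[t]] += 1.0
        rows.append(row)
    return rows, vocab


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def sq_error(v, w, h):
    """Squared Frobenius distance between V and W @ H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Multiplicative updates; in theory these never increase sq_error."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h


def clusters(w):
    """Assign each document to the topic with the largest weight in W."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


# Hypothetical news items: two near-duplicate outbreak reports, two vaccine stories.
DOCS = [
    "H5N1 outbreak on poultry farm",
    "Poultry farm culled after H5N1 outbreak",
    "Avian flu vaccine trial begins",
    "Conference discusses avian flu vaccine",
]

if __name__ == "__main__":
    v, vocab = term_matrix(DOCS)
    w, h = nmf(v, k=2)
    print(clusters(w))
```

Items that share a cluster label use similar words, so repeated reports of the same story would tend to group together; a production version would use a numerical library and far more documents.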
