Using Text Analysis to Reduce Information Overload in Pandemic Influenza Planning

1,188 views

Published on

The proliferation of plans can result in debilitating information overload in public health and medical emergencies. In the case of pandemic influenza, the US Department of Health and Human Services (HHS) and
its Centers for Disease Control and Prevention (CDC) have pan flu plans for coordinating the 50 states and each of the 50 states has its own pan flu plan. Plans need to be analyzed, compared, and revised so that they are in alignment with one another. Human analysis of plans is time-consuming and difficult, so text analysis
software tools are needed that can help humans (a) compare plans to find gaps or discrepancies and (b) locate relevant sections of plans and display links to them. This research-in-progress describes two text
analysis tools being developed at the Los Alamos National Laboratory as part of E-SOS (Emergency Situation Overview and Synthesis): the Theme Awareness Tool (THEMAT) and Content Awareness Tool (CAT). Both tools were used to analyze pan flu plans from the White House, the US Department of Health and Human
Services, the Centers for Disease Control, and the 50 states.

0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,188
On SlideShare
0
From Embeds
0
Number of Embeds
5
Actions
Shares
0
Downloads
15
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Using Text Analysis to Reduce Information Overload in Pandemic Influenza Planning

  1. 1. LA-UR-09-02971 Approved for public release; distribution is unlimited. Title: Using Text Analysis to Reduce Information Overload in Pandemic Influenza Planning Author(s): Linn Marks Collins, Jorge H. Roman, James E. Powell, Mark L. B. Martinez, Ketan K. Mane, Xiang Yao, A. Shelly Spearing, Miriam E. Blake Los Alamos National Laboratory Geoffrey Hoare, Rhonda White Florida Department of Health Intended for: 6th International Conference on Information Systems for Crisis Response and Management Special Session on Solutions for Information Overload May 10-13, 2009 Göteborg, Sweden Los Alamos National Laboratory, an affirmative action/equal opportunity employer, is operated by the Los Alamos National Security, LLC for the National Nuclear Security Administration of the U.S. Department of Energy under contract DE-AC52-06NA25396. By acceptance of this article, the publisher recognizes that the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or to allow others to do so, for U.S. Government purposes. Los Alamos National Laboratory requests that the publisher identify this article as work performed under the auspices of the U.S. Department of Energy. Los Alamos National Laboratory strongly supports academic freedom and a researcher’s right to publish; as an institution, however, the Laboratory does not endorse the viewpoint of a publication or guarantee its technical correctness. Form 836 (7/06)
  2. 2. Using Text Analysis to Reduce Information Overload in Pandemic Influenza Planning James E. Powell, Presenting Author Los Alamos National Laboratory Linn Marks Collins, Jorge H. Roman, Mark L. B. Martinez, Ketan K. Mane, Xiang Yao, A. Shelly Spearing, Miriam E. Blake Los Alamos National Laboratory Geoffrey Hoare, Rhonda White Florida Department of Health The proliferation of plans can result in debilitating information overload in public health and medical  emergencies. In the case of pandemic influenza, the US Department of Health and Human Services (HHS) and  its Centers for Disease Control and Prevention (CDC) have pan flu plans for coordinating the 50 states and  each of the 50 states has its own pan flu plan. Plans need to be analyzed, compared, and revised so that they  are in alignment with one another. Human analysis of plans is time‐consuming and difficult, so text analysis  software tools are needed that can help humans (a) compare plans to find gaps or discrepancies  and (b)  locate relevant sections of plans and display links to them. This research‐in‐progress describes two text  analysis tools being developed at the Los Alamos National Laboratory as part of E‐SOS (Emergency Situation  Overview and Synthesis): the Theme Awareness Tool (THEMAT) and Content Awareness Tool (CAT). Both  tools were used to analyze pan flu plans from the White House, the US Department of Health and Human  Services, the Centers for Disease Control, and the 50 states. 1 Proceedings of the 6th International ISCRAM Conference – Gothenburg, Sweden, May 2009 J. Landgren and S. Jul, eds. 
  3. 3. 2 Problem – Information Overload The proliferation of plans can result in debilitating information overload in public health and medical emergencies Pandemic influenza (pan flu) example The US Department of Health and Human Services (HHS) and its Centers for • Disease Control and Prevention (CDC) have pan flu plans intended to coordinate the 50 states’ responses Each of the 50 states has its own pan flu plan • These plans and related implementing materials are often hundreds of pages • long and get updated frequently Plans need to be analyzed, compared, and revised so that they are in • alignment with one another Human analysis of plans is time-consuming and difficult •
  4. 4. 3 Solution – Text Analysis and Document Linking Text analysis software tools are needed that can help humans Compare plans to find gaps or discrepancies by: 1. Extracting key concepts or themes from each plan • Identifying unique and common themes • Displaying the themes in a table and in a network visualization to facilitate • comparison Locate relevant sections of plans by: 2. Conducting a federated search of indexed plans • Displaying links to relevant plans • Executing both tasks while users are writing sections of a plan, based on what • they are writing
  5. 5. 4 Research in Progress at the Los Alamos National Laboratory, Los Alamos, NM, US Two text analysis tools are being developed as part of E-SOS: Emergency Situation Overview and Synthesis Project began in 2007 • Builds on several years of prior work by team members • E-SOS employs a number of tools Collaborative workspaces • Semantic web technologies • Digital library technologies • Awareness tools • Two of the awareness tools are being used to analyze pan flu plans THEMAT (Theme Awareness Tool) • CAT (Content Awareness Tool) •
  6. 6. 5 E-SOS – System Goals Collaborative workspaces where users can report and discuss information “Awareness tools” which display information that’s relevant to what users are currently reporting and discussing Technologies that synthesize information from heterogeneous sources
  7. 7. 6 Texts – Pan Flu Plans The White House Implementation Plan for the National Strategy for Pandemic Influenza US Health and Human Services Federal Guidance to Assist States in Improving State-Level Pandemic Influenza Operating Plans The Centers for Disease Control and Prevention (CDC) Influenza Pandemic Operation Plan (OPLAN) Pan flu plans for the states In the initial study (December 2008 – February 2009) the Florida Department of • Health team provided the LANL team with the above pan flu plans as well as pan flu plans for six states in the US In the subsequent study (March 2009 - present) the LANL team expanded the • project to include pan flu plans for all 50 states in the US, in keeping with its mission as a national laboratory
  8. 8. 7 THEMAT - Methodology Extract themes, called knowledge signatures (kSigs), from each document Create sets of kSigs (taxonomies) for each document or set of documents Compare taxonomies to identify unique and common themes Compute the network analytic relationships among themes Generate theme network (tNet) visualizations for each document or set of documents
  9. 9. 8 THEMAT – Results A set of knowledge signatures (kSigs) was generated for each document The common themes in pan flu plans were identified The unique themes in pan flu plans were identified The common and unique themes were displayed side-by-side in columns to facilitate comparison The kSigs were highlighted in all documents Theme networks (tNets) were created for each plan to display key concepts and their relationships
  10. 10. Knowledge signatures (kSigs) for pan flu plans from the US White House, Health and Human 9 Services, Centers for Disease Control, and six states showing similarities and differences: for example, “authorities” and “countries” are important in the federal plans but not the state plans
  11. 11. 10 Knowledge signatures (kSigs) for pan flu plans from the US White House, Health and Human Services, and Centers for Disease Control showing similarities and differences: for example, “global influenza preparedness” is important in the WH and HHS plans but not in the CDC plan
  12. 12. 11 Interactive interface for finding the specific files where a knowledge signature (kSig) is located: in this case, “global influenza preparedness” can be found in Appendix C of the Health and Human Services pan flu plan and in Chapter 5 of the US White House pan flu plan
  13. 13. 12 Interactive interface for locating highlighted knowledge signatures (kSigs) within a document: in this case, “global influenza preparedness” in the US White House pan flu plan Navigation column with links to all of the pages and kSigs in a document
  14. 14. 13 Interactive interface visualizing the theme network (tNet) for the US Centers for Disease Control pan flu plan, showing which themes are related to each other and the importance of each theme (most important to least important = red, blue, green, yellow)
  15. 15. 14 Visualization of the relative number of themes in the plans and the degree of similarity among and between them: in this case, the CDC plan and the California plan are least similar to each other and appear on the X-axis; the Arizona plan is a few degrees closer to the California plan; the White House Plan is a few degrees closer to the CDC plan; and the White House plan contains more of the core themes than the other plans do
  16. 16. 15 THEMAT Analysis of Pan Flu Plans – Preliminary Conclusions Displays of common and unique themes reveal similarities and differences in the plans Some of the differences reflect differences in the authorities and responsibilities of • the government agencies that created the plans Some of the differences reflect inconsistencies in terminology, which is a potential • human factors problem Displays of knowledge signatures (kSigs) highlighted in the text reflect the original document structure A common structure for all of the documents and plans would make it easier for • users to read through the highlighted kSigs and compare plans Theme networks (tNets) provide a more compact view of themes than lists Visualizations are not as familiar as lists, so some users may find them difficult to • understand
  17. 17. 16 CAT – Methodology CAT Components Step 1: Create a repository of content at LANL Step 2: Configure the CAT to include this repository as one of the targets of the federated search Step 3: See what links to relevant content the CAT retrieves as users compose text about pan flu plans Step 4: Revise the targets as needed
  18. 18. 17 CAT – Results A repository of content was created at LANL The CAT was configured to include this repository as one of the targets of the federated search The number of links returned depends on the number of targets One link per target is returned for each segment of parsed text (aka REST • query) In order to increase the number of links, the number of targets has to be • increased The repository of pan flu documents was subsequently divided into a number of smaller targets in order to increase the number of links This allows for finer-grained access
  19. 19. 18 CAT – Screen Shot Results of a federated search of (1) one local repository of pandemic flu plans and (2) four websites with pandemic flu content
  20. 20. 19 CAT Linking of Pan Flu Plans – Preliminary Conclusions The resulting output, showing links to relevant texts, reflects the target structure Creating a collection of smaller, finer-grained targets optimizes the federated • search capabilities of the CAT Under these conditions, the CAT facilitates: Locating content that is related to the topic the user is writing about • Following links to that content • Creating a list of references for the topic •
  21. 21. 20 Future Work Use another E-SOS tool, the Web Awareness Tool (WEBAT), to collect more pan flu content from the web Focus on adding pan flu plans from other countries • Use the THEMAT to analyze the additional content Use the CAT to make the additional content a target of a federated search Bridge the two tools in order to better support planning Share the resulting text analysis with the appropriate federal agencies in the US Share relevant portions of the text analysis with the appropriate state agencies in the US and request that the person(s) responsible for writing pan flu plans complete a questionnaire providing feedback
  22. 22. 21 Questions?
  23. 23. 22 CAT – Drupal – Screen Shot

×