Thank you for allowing me a few minutes to talk about the amazing work that happens at the National Institutes of Health Library.
First, as a Federal employee, I must get the disclaimers out of the way…
Here’s the big picture of our effort and the focus of my talk today. We, the NIH Library, are working with the National Institute of Allergy and Infectious Diseases to build a Pandemic Influenza Digital Archive. The goal is not only to catalog and digitize this historical collection of articles, but also to build a community around these publications for virologists, historians, policy makers and anyone else who can gain value from studying pandemics.
The NIH Library is focused on being the “heart” of the NIH, meeting and exceeding the scholarly information needs of our community of 5,000+ intramural researchers and clinicians through a range of innovative services. As a key resource, we make nearly all of our 5,000 scholarly journals available at the desktop, embed 15 Informationists (embedded librarians) in several NIH institutes, and have a beautiful physical library configured as a flexible place to work or collaborate. One of the services that we provided through the Informationists program was the creation of custom database sets for specific work projects. These could take the form of either an EndNote file usable by the researcher or an ASP/SQL-based web page displaying the results of the literature search. This approach was functional but rudimentary, and it required a good bit of lead time to develop or modify each instance. A streamlined or ‘rapid application development’ approach was desired for future efforts.
Our partner in this effort, the National Institute of Allergy and Infectious Diseases, conducts research to improve treatments for and vaccines against infectious diseases.
Over the past 30 years of his career, Dr. David Morens identified, collected and compiled a core collection of research information focused on the epidemiology, etiology, diagnosis and treatment of the 1918 pandemic influenza and influenza-related diseases. This collection presently serves his needs as a virologist with a special interest in pandemic influenzas. He and his colleagues have utilized data from these documents in dozens of publications on the history of influenza and its significance for the future. Currently, Dr. Morens has a collection of over 5,000 documents with more arriving daily. Acquired primarily through the NIH Library and NLM, they include journal articles, statistical data, email correspondence, books and book chapters, bibliographies, reviews, abstracts, newspaper articles, etc. In response to the health community’s critical need for an accessible, centralized source for historical influenza data, this collection will serve as the starting point for a comprehensive and vital pandemic influenza digital archive. The goal is to facilitate the ability of scientists and researchers, both within and outside NIH, to explore and respond to current issues and ideas; and to acquire a deeper understanding of pandemic influenza.
Here’s a map from the historical collection showing the spread of the 1918 disease over time. The pandemic happened too quickly for the Public Health Service to create a detailed study so they had to create this approximate map after the pandemic was over.
In contrast, here’s a modern map showing pandemic influenza spread – a partnership between Rhizalabs and Google. This shows the potential for reworking the historical data to visualize it in new ways.
It became apparent that this pandemic influenza project needed more than a good bibliographic database; it also needed strong collaboration tools to support the creation of a community around this historical topic. The recent outbreak of pandemic influenza brought additional focus to this effort and also showcased some new ways to present pandemic-related data.

As we researched a solution, Virtual Research Environments sounded like a good direction for our effort. The UK-based JISC, which is funding its third round of research efforts around Virtual Research Environments, says “The purpose of a Virtual Research Environment (VRE) is to help researchers from all disciplines to work collaboratively by managing the increasingly complex range of tasks involved in carrying out research on both small and large scales. The concept of a VRE is evolving. The term VRE is now best thought of as shorthand for the tools and technologies needed by researchers to do their research, interact with other researchers (who may come from different disciplines, institutions or even countries) and to make use of resources and technical infrastructures available both locally and nationally.” (http://www.jisc.ac.uk/whatwedo/programmes/vre.aspx)

We also ran into Islandora, which is “an open source project underway at the Robertson Library at the University of Prince Edward Island. Islandora combines the Drupal and Fedora software applications to create a robust digital asset management system that can be used for any requirement where collaboration and digital data stewardship, for the short and long term, are critical.” (http://islandora.ca/)

Even more specifically, we learned that the Massachusetts General Hospital and Harvard University teamed up to build a software toolkit called the Science Collaboration Framework (SCF) to establish web-based virtual team organizations for researchers in biomedicine.
From their description, it “enables researchers to publish and discuss on-line content such as articles, news, and perspectives, and to provide shared semantic context for this content using established scientific vocabularies and automated text mining. SCF is reusable open source software based on the popular Drupal content management system, with many new modules to support biomedical researchers and access to RDF "linked data". SCF supports scientists in publishing, annotating, sharing and discussing content such as articles, perspectives, interviews and news items, as well as providing personal biographies, formal and informal bibliographies, and asserting research interests. It also supports shared databases of key research resources, and private research workspaces.” (http://sciencecollaboration.org/)

Two popular Web sites based upon the SCF are Stembook (stembook.org) and Parkinsons Disease Online (http://www.pdonlineresearch.org/).

We determined that a Virtual Research Environment was the best solution for this project and that Drupal was the most advanced platform to build it upon.
This has been the first major project I’ve tackled since starting at the NIH Library in March. I had no prior experience with Drupal but was fortunate to have joined a group that already had a server ready to be loaded and a support agreement with Acquia. Since my job was to manage the custom services requested by the Informationists program, changing the current process from a manual, one-off SQL/ASP database effort to something rapid and reusable was very appealing. The Pandemic Influenza Digital Archive, or PIDA, was originally focused on creating a bibliographic database of the documents held by Dr. Morens; that changed once we were able to show the power and value of collaboration that Drupal brought to the table. NIAID eagerly agreed to switch gears and rescope the effort to reflect this broader goal, which was actually in line with Dr. Morens’ ultimate goals for the project.
Once we had a working system, we then needed to focus on organizing and capturing the collection. In a second move that dramatically changed the scope of the project, we decided to first build a master database of relevant publication records by searching historical medical-related databases including Thomson Reuters Web of Knowledge, CABI’s Global Health Archive, and ProQuest’s Historical Newspapers Collection. Since Dr. Morens’ collection was built in extreme stovepipes around specific article topics, starting with a broad historical search by professional library staff will help to ensure a more comprehensive collection. We will then add taxonomy vocabularies to each record and compare all records against what Dr. Morens has already identified. We’re also enabling various collaboration features including forums, five-star ratings, and comments. Finally, we are identifying and reaching out to other organizations that have a strong historical pandemic influenza focus and would like to partner with us on this project.
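The comparison step mentioned above, checking the master database against what is already held, could be sketched roughly as follows. This is only an illustration and not the project's actual workflow: the record fields (`title`, `year`), the normalization rule, and the function names are all assumptions made for the example.

```python
import re
import unicodedata


def normalize_title(title):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", title)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^a-z0-9]+", " ", ascii_only.lower())
    return cleaned.strip()


def match_key(record):
    """Build a comparison key from the (hypothetical) title and year fields."""
    return (normalize_title(record["title"]), record.get("year"))


def compare_collections(master_records, held_records):
    """Split master records into those already held and those still missing."""
    held_keys = {match_key(r) for r in held_records}
    already_held = [r for r in master_records if match_key(r) in held_keys]
    missing = [r for r in master_records if match_key(r) not in held_keys]
    return already_held, missing
```

An exact match on a normalized title plus year is deliberately conservative; historical citations often vary in spelling and abbreviation, so a real comparison would probably also need fuzzy matching and a manual review step.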
Here’s a screenshot of what the site looks like right now. Most of the effort to date has gone into the back end and into researching modules that will provide the functionality that we want now or in the future.
Here’s a shot of the editing screen showing the various vocabularies that have been defined so far. Tagging each record will be key to providing advanced sorting, searching, clustering, and visualizations.
To accomplish the eventual goal of a globally-available community for virologists and others interested in this topic, we’ve implemented and are experimenting with several Drupal modules, including:
Biblio – manages lists of scholarly publications, automatically handling various sorting and export views
Browscap/Mobile Tools – modules that identify a mobile device and allow custom views to be delivered that best support the display
CCK/Views – additional customization of Drupal content types and listings
Entrez – automatic/streamlined import from PubMed into Biblio
Gmap – Google Map API bridge for Drupal
LDAP – integration with MS Active Directory and LDAP
Taxonomy – allows classification of content
Timeline – support for the SIMILE Timeline API
Based upon the database record enhancements that are planned, we should be able to plot all search results by location (using Gmap) and by time (using Timeline). We also plan to pull copies of all publications identified in the database onto our local server to provide full text searching and ensure local preservation of this collection. Research is definitely a team sport so being able to collaborate at multiple levels is key. Being able to comment on works in the collection will be helpful but also being able to create specific sub-sets of records and share them with colleagues will be critically important. Utilizing in-house translation services and future search capabilities, we hope to offer multilingual search to support the diverse languages in our collection.
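Before search results can be plotted on a map or a timeline, the tagged records have to be grouped by place and by period. A minimal sketch of that grouping step is shown here; it is independent of the Gmap and Timeline modules, and the `location` and `year` keys and the function names are hypothetical, chosen only for the example.

```python
from collections import Counter


def decade_of(year):
    """Return the first year of the decade containing `year` (1918 -> 1910)."""
    return (year // 10) * 10


def cluster_counts(records):
    """Count records per location and per decade.

    Each record is assumed to be a dict with hypothetical `location`
    and `year` keys; a record missing a field is skipped for that count.
    """
    by_location = Counter(
        r["location"] for r in records if r.get("location")
    )
    by_decade = Counter(
        decade_of(r["year"]) for r in records if r.get("year") is not None
    )
    return by_location, by_decade
```

Counts like these are what a map or timeline display would consume; how well they work depends on how consistently the taxonomy terms are applied to each record.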
NIH - Drupal Pida 2010
Implementing the Open Government Directive: Virtual Research Environments at the NIH Library
Disclaimer <ul><li>These slides represent the work and opinions of the presenter and do not constitute official positions of the National Institutes of Health (NIH) or the U.S. Department of Health & Human Services (HHS). </li></ul><ul><li>Reference to any specific commercial product by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by NIH or HHS. </li></ul>
Introduction <ul><li>The National Institutes of Health Library and the National Institute of Allergy and Infectious Diseases (NIAID) Office of Communications and Government Relations (OCGR) are collaborating on the creation of a 'Pandemic Influenza Digital Archives'. </li></ul><ul><li>This collaborative web site will showcase Dr. David Morens' core collection of several thousand scholarly publications spanning the 9th century AD to the present on various aspects of all pandemics and large scale epidemics, especially the 1918 pandemic influenza. Using the open source software Drupal, we are creating a world-class pandemic influenza digital archive, serving the needs of virologists and researchers around the world. </li></ul>
About the NIH Library The NIH Library, centrally located on the Bethesda campus in Building 10, serves the information needs of NIH and selected HHS agencies, through a comprehensive range of information resources, services, and knowledge. <ul><li>This is accomplished by: </li></ul><ul><li>providing desktop/workbench access to relevant journals, books, reference sources, and database/Internet resources; </li></ul><ul><li>providing access to over 20 information professionals to assist with standard or more personalized services; </li></ul><ul><li>providing a quiet place to read, write, or study </li></ul>
About NIAID <ul><li>The National Institute of Allergy and Infectious Diseases (NIAID) conducts and supports basic and applied research to better understand, treat, and ultimately prevent infectious, immunologic, and allergic diseases. For more than 60 years, NIAID research has led to new therapies, vaccines, diagnostic tests, and other technologies that have improved the health of millions of people in the United States and around the world. </li></ul>
About the Pandemic Flu Collection <ul><li>Collected by Dr. David Morens </li></ul><ul><li>Focused on the epidemiology, etiology, diagnosis and treatment </li></ul><ul><li>Over 5,000 items collected – additional 5,000 identified </li></ul><ul><li>Includes journal articles, statistical data, email correspondence, books and book chapters, bibliographies, reviews, abstracts, & newspapers </li></ul><ul><li>Will be starting point for a comprehensive resource </li></ul>
Virtual Research Environments <ul><li>JISC’s Virtual Research Environments programme </li></ul><ul><ul><li>http://www.jisc.ac.uk/whatwedo/programmes/vre.aspx </li></ul></ul><ul><li>UPEI’s Islandora (Drupal over Fedora) </li></ul><ul><ul><li>http://islandora.ca/ </li></ul></ul><ul><li>Massachusetts General & Harvard-based Science Collaboration Framework </li></ul><ul><ul><li>http://sciencecollaboration.org/ </li></ul></ul>
Moving Towards Drupal <ul><li>Purchased a server and Acquia support – Nov. 2008 </li></ul><ul><li>Hired Information Architect – March 2nd </li></ul><ul><li>Brought into PIDA project – April 3rd </li></ul><ul><li>Scheduled on-site Drupal install – May 13th </li></ul><ul><li>First prototype shown – May 28th </li></ul><ul><ul><ul><li>NIAID committed to go with Drupal </li></ul></ul></ul><ul><ul><li>NIAID/NIH rescoping PIDA effort in light of new capabilities </li></ul></ul>
Building the Digital Archive <ul><li>Build a database of records: </li></ul><ul><ul><li>Search to build initial database </li></ul></ul><ul><ul><li>Match against physical collection </li></ul></ul><ul><ul><li>Enhance the database records </li></ul></ul><ul><li>Wrap additional services around this collection: </li></ul><ul><ul><li>Create a Web site “home” for the archive </li></ul></ul><ul><ul><li>Provide collaboration tools </li></ul></ul><ul><ul><li>Partner with other organizations </li></ul></ul>
Drupal modules in use <ul><li>Biblio (scholarly publication management) </li></ul><ul><li>Browscap/Mobile Tools (to provide handheld support) </li></ul><ul><li>CCK/Views (to support event/meeting calendar, etc) </li></ul><ul><li>Entrez (to import from PubMed to Biblio) </li></ul><ul><li>Gmap (to support mapping) </li></ul><ul><li>LDAP/Active Directory (integration with single sign-on) </li></ul><ul><li>Taxonomy (to support ad hoc and structured term lists) </li></ul><ul><li>Timeline (to dynamically plot by timeframe) </li></ul>
Future Plans <ul><li>Clustering results by location on a map or by date on a timeline </li></ul><ul><li>Local, full text searching of all publications in collection </li></ul><ul><li>MyLibrary functionality to mark records, save them into sets, and share those sets with others </li></ul><ul><li>Multilingual search </li></ul>
Thank You! James King NIH Library, Information Architect [email_address]