Dr. John Snow is famous for his investigations into the causes of the 19th-century cholera epidemics and is often called the father of modern epidemiology. He began by noticing significantly higher death rates in two areas supplied by the Southwark Company. His identification of the Broad Street pump as the source of the Soho outbreak is considered the classic example of epidemiology. He used chlorine in an attempt to clean the water and had the pump handle removed, an action commonly credited with ending the outbreak.
• Reuses data from other sources
• Daisy chain of related studies
• Often ePHI/sensitive information (therefore subject to HIPAA)
• Privacy and security are paramount!
• Conformance with laws and regulations is especially important
• Big data
• Complex and heterogeneous: associating public health studies with genomics research, demographic information with health information, etc.
• Requires quality data to reproduce studies and verify results
• Requires reuse of workflow modules to execute the same commands on different data
• Supervision of collections and data sharing by oversight committees, rather than individuals, is common
• Researcher incentives in the current system cause researchers to view data as proprietary rather than as a public good; lots of data hugging
• Data is often stored either in Excel spreadsheets or in relational databases
• Data is often coded into numeric values, since epidemiologists often work with statistical analyses and most statistical routines require that non-numeric information be coded into numeric answers
Training is done in cooperation with the IRB, Research Administration, and the University IT group. Take a holistic approach: researchers don’t want to go to three different trainings if they can avoid it. Integrate concepts and issues where you can’t have a single workshop.
Lightning Talk, Konkiel: Bootstrapping Library Data Management Services for Epidemiology
https://www.asis.org/rdap/
Bootstrapping Library Data Management Services for Epidemiology
Stacy Konkiel
Science Data Management Librarian
Indiana University - Bloomington
Epidemiology
The study of the patterns, causes, and effects of health and disease conditions in populations
“Epi” Data Characteristics
• Sensitive
• Often recycled, daisy-chained
  – Big data
• Complex and heterogeneous
• Flat-file vs. relational databases
• Often numeric, even for non-numeric responses
  – data dictionaries are essential!
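To illustrate why data dictionaries matter when non-numeric responses are stored as numbers, here is a minimal sketch in Python. The variable names, codes, and labels are hypothetical and not taken from any real study; the point is only that a coded value is unreadable without the dictionary, and that undocumented codes should be surfaced rather than silently passed through.

```python
# Hypothetical data dictionary: variable name -> {numeric code: label}.
# These variables and codes are illustrative assumptions only.
DATA_DICTIONARY = {
    "sex": {1: "male", 2: "female", 9: "unknown"},
    "smoker": {0: "no", 1: "yes", 9: "unknown"},
}


def decode_record(record, dictionary=DATA_DICTIONARY):
    """Return a copy of `record` with coded values replaced by their labels."""
    decoded = {}
    for variable, value in record.items():
        codes = dictionary.get(variable)
        if codes is None:
            # Variable is not coded (e.g. age in years); keep it unchanged.
            decoded[variable] = value
        elif value in codes:
            decoded[variable] = codes[value]
        else:
            # A code missing from the dictionary cannot be interpreted.
            raise ValueError(
                f"Undocumented code {value!r} for variable {variable!r}"
            )
    return decoded


coded = {"participant_id": 101, "sex": 2, "smoker": 0, "age": 47}
print(decode_record(coded))
```

Keeping the dictionary as a separate, shareable file alongside the dataset would let a later researcher reuse the data without guessing what each code means.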
Researcher Needs
• HIPAA-aligned storage
• High-capacity storage and computation
• Protection of personal investment in data
• Incentives for sharing data
• Metadata interoperability
Library Services for Epi Data
• Technology
  – Repository with access controls OR long-term embargoes for data
  – High-capacity preservation (OA and dark)
  – Ability to mint PIDs for data
Library Services for Epi Data
• Training
  – Data management specific to epi
  – Metadata standards and uses
  – De-identification – how and why
  – Data citation using PIDs
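As a rough illustration of the "how" of de-identification, the sketch below drops direct identifiers, replaces the participant ID with a keyed hash, and generalizes a birth date to a year. The field names, the secret key, and the choice of techniques are assumptions made for this example; the sketch is not a complete procedure and does not by itself satisfy HIPAA de-identification requirements.

```python
import hashlib
import hmac

# Hypothetical field names treated as direct identifiers in this example.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def pseudonymize_id(participant_id, secret_key):
    """Map an ID to a stable pseudonym using a keyed hash (HMAC-SHA256).

    The same ID and key always give the same pseudonym, so records from
    related studies can still be linked without exposing the original ID.
    """
    digest = hmac.new(secret_key, str(participant_id).encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record, secret_key):
    """Return a copy of `record` with direct identifiers removed or generalized."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["participant_id"] = pseudonymize_id(record["participant_id"], secret_key)
    if "birth_date" in cleaned:
        # Assumes an ISO-style date string (YYYY-MM-DD); keep only the year.
        cleaned["birth_year"] = cleaned.pop("birth_date")[:4]
    return cleaned


raw = {
    "participant_id": 101,
    "name": "Jane Example",
    "phone": "555-0100",
    "birth_date": "1975-03-14",
    "smoker": 0,
}
# The key is a placeholder; a real key would be stored securely, apart from the data.
print(deidentify(raw, secret_key=b"replace-with-a-real-secret"))
```

A keyed hash is used here instead of a plain hash because short numeric IDs can be recovered from an unkeyed hash by simply trying every possible value.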
Resources
• Informed Consent: Lutz, K., et al. (2012). Research ethics board approval for an international thromboprophylaxis trial. Journal of Critical Care.
• Workflows: Enanoria, W. (2004). Data Management Issues in Epidemiology. Berkeley, CA: Center for Infectious Diseases & Emergency Readiness. Retrieved from www.idready.org/slides/data_management.ppt
• Workflows: Thomas, R. K. (Ed.). (2003). Chapter 12: Information Sources and Data Management. Health Services Planning.
• Metadata: Brandt, C. A., Gadagkar, R., Rodriguez, C., & Nadkarni, P. M. (2004). Managing complex change in clinical study metadata. Journal of the American Medical Informatics Association: JAMIA.
• Disciplinary Metadata (DCC): http://www.dcc.ac.uk/resources/metadata-standards