The document describes a joke about a physicist, a chemist, and a statistician finding a fire in a wastebasket. The physicist proposes cooling the materials to lower their temperature below the ignition point. The chemist proposes cutting off the oxygen supply to extinguish the fire. The statistician then starts other fires around the room to obtain an adequate sample size, alarming the other two. The document also provides information about data analysis and causal inference, including steps in data management, data processing, exploration, and focused analysis using methods like stratification and mathematical modeling.
Bill Howe discussed emerging topics in responsible data science for the next decade. He described how the field will focus more on what should be done with data rather than just what can be done. Specifically, he talked about incorporating societal constraints like fairness, transparency and ethics into algorithmic decision making. He provided examples of unfair outcomes from existing algorithms and discussed approaches to measure and achieve fairness. Finally, he discussed the need for reproducibility in science and potential techniques for more automatic scientific claim checking and deep data curation.
This document discusses the responsible use of data science techniques and technologies. It describes data science as answering questions using large, noisy, and heterogeneous datasets that were collected for unrelated purposes. It raises concerns about the irresponsible use of data science, such as algorithms amplifying biases in data. The work of the DataLab group at the University of Washington is presented, which aims to address these issues by developing techniques to balance predictive accuracy with fairness, increase data sharing while protecting privacy, and ensure transparency in datasets and methods.
This document discusses the importance of properly analyzing and visualizing data when conducting statistical tests and reporting results. It recommends displaying raw data through dot plots instead of bar graphs to avoid concealing variance. The document discusses how the mean may not always be the best descriptor of data and how providing confidence intervals around measures provides important context about uncertainty. It also emphasizes choosing statistical tests wisely based on the characteristics of the data and justifying choices. Overall, the document stresses the importance of exploring data visually and using appropriate analyses and reporting to avoid drawing incorrect conclusions.
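The recommendation above to report uncertainty, not just a mean, can be sketched numerically. The following is a minimal example (standard library only); the function name `confidence_interval` and the group data are illustrative assumptions, and it uses a normal approximation where a t-quantile would be more appropriate for very small samples:

```python
from statistics import mean, stdev, NormalDist
from math import sqrt

def confidence_interval(data, level=0.95):
    """Return (mean, lower, upper) using a normal approximation.
    For small samples, a t-quantile would widen the interval slightly."""
    m = mean(data)
    se = stdev(data) / sqrt(len(data))          # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)   # e.g. ~1.96 for 95%
    return m, m - z * se, m + z * se

# Two hypothetical groups with similar means but very different spread;
# a bar graph of the means alone would conceal this difference,
# while a dot plot with intervals would reveal it.
group_a = [4.8, 5.0, 5.1, 4.9, 5.2, 5.0]
group_b = [2.0, 8.1, 5.0, 7.9, 1.9, 5.1]

for name, data in [("A", group_a), ("B", group_b)]:
    m, lo, hi = confidence_interval(data)
    print(f"group {name}: mean={m:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

Group B's much wider interval is exactly the context a bare bar of the mean would hide.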
Data Science is an interdisciplinary approach that combines computational science, statistics, and domain knowledge to extract meaningful insights from large and complex data. It aims to address challenges posed by the data revolution characterized by big data from diverse sources. There is no single agreed-upon definition, but most definitions emphasize applying techniques from computer science, statistics, and the relevant domain area to discover patterns, make predictions, and support decision making from data. Key aspects include developing appropriate methodologies for knowledge discovery, forecasting, and decision making using large and diverse data from sources like surveys, social media, sensors, and more. The integration of domain knowledge representation with computational and statistical tools is seen as an important novelty that can enhance data analysis and interpretation.
The document discusses teaching data ethics in data science education. It provides context about the eScience Institute and a data science MOOC. It then presents a vignette on teaching data ethics using the example of an alcohol study conducted in Barrow, Alaska in 1979. The study had methodological and ethical issues in how it presented results to the community. The document concludes by discussing incorporating data ethics into all of the Institute's data science programs and initiatives like automated data curation and analyzing scientific literature visuals.
Data Curation and Debugging for Data-Centric AI - Paul Groth
It is increasingly recognized that data is a central challenge for AI systems - whether training an entirely new model, discovering data for a model, or applying an existing model to new data. Given this centrality of data, there is need to provide new tools that are able to help data teams create, curate and debug datasets in the context of complex machine learning pipelines. In this talk, I outline the underlying challenges for data debugging and curation in these environments. I then discuss our recent research that both takes advantage of ML to improve datasets but also uses core database techniques for debugging in such complex ML pipelines.
Presented at DBML 2022 at ICDE - https://www.wis.ewi.tudelft.nl/dbml2022
This document discusses the importance of statistics in astronomical research. It notes that while astronomers are well-trained in physics, many are not well-versed in statistical methodology and often misapply statistical methods. The document outlines the talk, covering the history of astronomy and statistics, current issues, and recommended steps for proper statistical analysis of scientific data. It emphasizes that modern statistical tools and computing environments like R can help astronomers better analyze the huge datasets now available and derive deeper scientific insights.
What is the reproducibility crisis in science and what can we do about it? - Dorothy Bishop
Talk given to the Rhodes Biomedical Association, 4th May 2016.
For references see: http://www.slideshare.net/deevybishop/references-on-reproducibility-crisis-in-science-by-dvm-bishop
This document provides an introduction to biostatistics. It defines biostatistics and explains its importance in biomedical research. Some key points covered include:
- Biostatistics is the application of statistics to medicine and health sciences. It involves the collection, organization, and analysis of numerical data.
- Understanding biostatistics is important for medical research, updating medical knowledge, and managing data and treatment.
- The document outlines the basic concepts of biostatistics like population and sample, and the different types of data. It also describes the typical steps involved in a research project and how biostatistics can be applied.
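The population/sample distinction outlined above can be illustrated with a short simulation. This is a sketch with simulated blood-pressure values, not real patient data; the population size, mean, and spread are all illustrative assumptions:

```python
import random
from statistics import mean

# Hypothetical "population": systolic blood pressure (mmHg) for 1,000
# patients, simulated from a normal distribution (not real data).
random.seed(42)
population = [random.gauss(120, 15) for _ in range(1000)]

# In practice we rarely observe the whole population; biostatistics
# draws a sample and uses it to estimate population quantities.
sample = random.sample(population, 50)

print(f"population mean: {mean(population):.1f}")
print(f"sample mean:     {mean(sample):.1f}")
```

The sample mean approximates the population mean, and the discrepancy between the two is what sampling theory and confidence intervals are built to quantify.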
Table of Contents - 16304_TTLX_Walker.indd 1 8312 1152.docx - mattinsonjanel
This document provides a table of contents for a book on statistics in criminology and criminal justice. It lists chapter titles and page numbers. It also includes copyright information, publishing details, and production credits for the book. The summary focuses on the purpose and key details rather than copying significant content.
The Emerging Discipline of Data Science: Principles and Techniques for Data-Intensive Analysis, Keynote, 2nd Swiss Workshop on Data Science – SDS|2015, Winterthur, Switzerland, 12 June 2015
Abstract and other presentations at: http://michaelbrodie.com/?page_id=17
Dichotomania and other challenges for the collaborating biostatistician - Laure Wynants
Conference presentation at ISCB 41 in the session "Biostatistical inference in practice: moving beyond false dichotomies".
A comment in Nature, signed by over 800 researchers, called for the scientific community to "retire statistical significance". The responses included a call to halt the use of the term "statistically significant", and changes in journals' author guidelines. The leading discourse among statisticians is that inadequate statistical training of clinical researchers and publishing practices are to blame for the misuse of statistical testing. In this presentation, we search our collective conscience by reviewing ethical guidelines for statisticians in light of the p-value crisis, examine what this implies for us when conducting analyses in collaborative work and teaching, and ask whether the ATOM principles (accept uncertainty; be thoughtful, open, and modest) can guide us.
Nicholas Jewell MedicReS World Congress 2014 - MedicReS
Teaching Medical Research Methodology: All modern medical and public health research now requires a considerable amount of biostatistics, computer science, data processing, and machine learning: Data Science.
Interactive Visualization Systems and Data Integration Methods for Supporting... - Don Pellegrino
This thesis explored developing new interactive visualization systems and data integration methods to support discovery in collections of scientific information. It addressed challenges of existing methods to support overviews and exploration as the volume of data increases. The work involved instantiating graph structures from real-world datasets, developing interactive visualizations, and using quantitative and semantic guidance to explore connections. It evaluated the methods on datasets from VAST challenges, open notebook science, and Pfizer drug discovery to demonstrate feasibility and identify future work opportunities at larger scales with these approaches.
Human resources, section 2b - Textbook on Public Health and Community Medicine - Prabir Chatterjee
Statistics are used extensively in public health and community medicine. Statistical methods allow public health administrators to understand population health trends and identify health issues at both the community and individual level. Descriptive statistics are used to summarize and present data in a meaningful way through tables, graphs, and summary measures. Inferential statistics are then used to draw conclusions and make decisions based on analyzing samples from the overall population. The appropriate use of statistics is important for public health planning, research, and evaluating health programs and treatments.
This document discusses issues with reproducibility in scientific research. It provides examples of studies that could not be reproduced, including a case where only 6 out of 53 landmark cancer studies could be validated. It advocates for more transparency through open data, open access, and open source policies to improve reproducibility and rebuild trust in science. Open and reproducible research practices like open notebook science are presented as ways to achieve faster, more reliable science.
D. G. Mayo: Your data-driven claims must still be probed severely - jemille6
In the session "Philosophy of Science and the New Paradigm of Data-Driven Science at the American Statistical Association Conference on Statistical Learning and Data Science/Nonparametric Statistics
This document appears to be a presentation on finding and understanding statistical sources. It discusses what statistics are, provides examples of data versus statistics, and covers the key elements of statistics including the unit of observation, space, and time period. It also addresses evaluating statistical sources and searching for statistics from various potential sources like government agencies, research organizations, and publications. The presentation aims to help students learn how to properly understand and evaluate statistical information found in their research.
Shelley Hurwitz MedicReS World Congress 2014 - MedicReS
Biostatistics and Ethics - Shelley Hurwitz, PhD, Brigham and Women's Hospital, Harvard Medical School; Fellow, American Statistical Association; Advisory Board on Ethics, International Statistical Institute
This document appears to be a presentation given by Tom Johnson at the Esri Health Conference in Scottsdale, Arizona on August 28, 2012. The presentation discusses how data and maps inform each other, with data being used to create maps and maps then guiding the collection of additional data. It also outlines four potential types of data/analytic variables that can be studied for any phenomenon: qualitative, quantitative, geographic, and timeline of change. The presentation argues that addressing complex health issues will require transdisciplinary collaboration and going beyond the traditional three-phase process of data in, analysis, and information out.
From Replication Crisis to Credibility Revolution - Koki Ikeda
The document discusses issues related to the replication crisis in psychology and potential solutions. It notes that questionable research practices like p-hacking and HARKing are common but unintentional. Solutions proposed include transparency through open science, pre-registration of studies including pre-reviews, direct replication of studies, and higher evidentiary standards. Institutional changes are also needed to incentivize practices like pre-registration and increasing acceptance of replication studies. While rigorous methods may initially lower productivity, they can increase it long-term by allowing easier reuse of materials and data and identifying reliable findings sooner.
This document provides an overview and introduction to an economics statistics course. It discusses key topics that will be covered in the course, including:
- Descriptive and inferential statistics
- Probability theory as the bridge between descriptive and inferential statistics
- The process of statistical investigation from designing experiments/surveys to making inferences and assessing reliability
- Examples of how statistics is used to analyze data and make decisions in various fields like government, business, and research.
DataONE Education Module 01: Why Data Management? - DataONE
Lesson 1 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license; attribution and citation requested.
- Data monitoring committees (DMCs) emerged in the 1970s to periodically review accumulating clinical trial data and monitor safety and efficacy. They issue judgments to stop or continue trials.
- DMCs have become reflexive institutions that engage in endogenous critical inquiry. They focus on issues like membership qualifications and attributes, and how much trials can be redefined based on interim data.
- Guidelines have been issued to provide standards around DMC roles, structures, and decision-making processes. There is debate around how independent DMCs should be and what level of access they should have to trial data and ability to influence trial parameters.
This chapter introduces communicable diseases and their epidemiology in Ethiopia. It defines key epidemiological terms used to describe diseases. Communicable diseases pose a major health burden in Ethiopia. Many factors contribute to their transmission, including poverty, poor sanitation and lack of access to health care. The major communicable diseases affecting Ethiopia are described.
This document is a manual published by the World Health Organization in 1997 on vector control methods for use by individuals and communities. It contains 10 chapters that describe the biology, public health importance, and control measures for various disease vectors, including mosquitoes, tsetse flies, triatomine bugs, fleas, lice, ticks, mites, cockroaches, houseflies, freshwater snails, and cyclops. For each vector, the manual provides details on its life cycle, disease transmission, and recommends methods for personal protection as well as community-based control strategies.
The 10-step approach to outbreak investigations involves:
1) Identifying an investigation team and resources.
2) Establishing the existence of an outbreak.
3) Verifying the diagnosis, constructing a case definition, and finding cases systematically.
Descriptive epidemiology is then used to develop hypotheses, which are evaluated through additional studies if needed, before implementing control measures, communicating findings, and maintaining surveillance to confirm the outbreak has ended. Being systematic and following these steps is key to determining the source and controlling outbreaks.
According to a new assessment by the UN Food and Agriculture Organization and the Famine Early Warning Systems Network, around 731,000 Somalis face acute food insecurity and 2.3 million more are at risk. This brings the total number of people in need of humanitarian assistance to 3 million. Malnutrition rates remain high, with nearly 203,000 children acutely malnourished. The humanitarian situation has improved in some areas due to above-average rainfall and increased aid, but concerns remain for 2015. The humanitarian response plan requests $863 million to address ongoing needs and prevent a major crisis from undoing Somalia's recent peace- and state-building progress.
This document discusses communicable diseases. It defines communicable diseases as diseases that can spread from one person to another through various modes of transmission like air, water, food, or contact. Some common communicable diseases mentioned include influenza, polio, typhoid, measles, mumps, chickenpox, tuberculosis, and AIDS. It also discusses immunity and how the body develops immunity to diseases either naturally after suffering from an illness or artificially through vaccination. Preventing the spread of communicable diseases requires measures like maintaining hygiene, immunization, and promptly treating illnesses.
This document outlines the Canadian Nurses Association's position on primary health care. The association believes primary health care is integral to improving health outcomes for Canadians and that its principles, such as accessibility, health promotion, and intersectoral collaboration, are the most effective way to provide equitable healthcare. The CNA also believes primary health care and nursing are closely connected, and that nursing standards and education should be grounded in primary health care principles. Adopting a primary health care approach could help address rising healthcare costs and improve Canada's performance on health indicators relative to other countries.
This document provides an overview of general nutrition concepts. It defines key terms like food, nutrition, diet, and malnutrition. It outlines the six major nutrients - carbohydrates, proteins, fats, vitamins, minerals, and water. The document discusses dietary guidelines and food groups. It explains that human beings need food to provide energy for essential physiological functions like respiration, circulation, digestion, metabolism, maintaining body temperature, growth, and repair of tissues. The most vulnerable groups who require adequate nutrition are infants, young children, pregnant women, and lactating mothers.
The document outlines a road map to accelerate HIV prevention efforts to meet the global target of reducing new HIV infections by 75% by 2020. It finds that while progress has been made, declines in new infections have been too slow: there were still 1.7 million new infections in 2016, only an 11% decline since 2010. Of 25 focus countries, only 3 saw declines of over 30%, while 8 had no decline or saw increases. No country met the 2015 target of a 50% reduction. Faster progress is needed to avoid increased treatment costs and continued mother-to-child transmission. The road map proposes intensified prevention programs, especially for adolescent girls, young women, and key populations.
This document discusses the key ethical issues that arise in public health surveillance programs. It begins with a brief history of public health surveillance and definitions of key terms. The main ethical problem discussed is the potential conflict between individual interests/rights and collective interests. While clinical ethics focuses on individual physician-patient relationships, public health ethics must consider the broader community. Some argue the ethics of public health and clinical practice are distinctly different given this shift from individual to collective interests. The document examines how tools and checklists can help evaluate the ethical acceptability of surveillance programs.
This document provides an overview of planning and management for health extension workers. It defines management as a process of reaching organizational goals through people and resources. The key functions of management are planning, organizing, staffing, directing, and controlling. Planning involves setting objectives and strategies, while evaluation assesses progress towards objectives. Communication and decision-making are also integral to the management process. Effective management applies principles like management by objectives and learning from experience. The roles of administration and management are also distinguished, with administration focusing more on policy and management on execution.
The document provides recommendations for surveillance of acute viral hepatitis. It defines clinical and laboratory criteria for diagnosing hepatitis A, B, and non-A/non-B. Surveillance is recommended to guide control measures such as ensuring blood and injection safety and immunization programs. Countries should monitor cases of acute jaundice and elevated liver enzymes to detect hepatitis outbreaks and evaluate prevention programs. Standardized case definitions and laboratory tests are important for comparable surveillance data.
This document provides an introduction to a module on the Expanded Program on Immunization (EPI) in Ethiopia. The module aims to train health center teams and other health professionals to increase immunization coverage and reduce morbidity and mortality from six childhood diseases. Despite initiatives over the years, immunization coverage remains low in Ethiopia due to factors like lack of transportation, ineffective cold chains, shortage of trained staff, poor collaboration, and inadequate community involvement. The module seeks to address this through training and bringing about significant changes in EPI coverage.
This document provides a handbook on water programming published by UNICEF in 1999. It aims to guide field professionals in implementing UNICEF's water, environment and sanitation strategies. The handbook covers topics such as water and sustainable development, community participation and management, cost effectiveness, appropriate water technologies, and maintenance of water supply systems. It emphasizes the importance of community-based management of water resources, cost-effective solutions, and involvement from all levels of government and communities in water sector issues.
This document provides an introduction to the Somali PHAST Step-by-Step Guide, which uses participatory methods to help communities improve hygiene behaviors, prevent diarrheal diseases, and encourage community management of water and sanitation facilities. The guide contains 7 steps to take communities through developing a plan for preventing diarrheal diseases. Section 2 provides background concepts, defining hygiene, sanitation, the link between the two, and that hygiene and sanitation promotion requires more than just asking people to change - it requires understanding disease transmission and being motivated to promote positive behaviors.
The development of this lecture note for training Health Extension Workers was an arduous assignment undertaken by Dr. Meseret Yazachew and Dr. Yihenew Alem at Jimma University.
This document was developed with inputs from many institutions and experts. Several individuals deserve special mention. Mary Arimond, Kathryn Dewey and Marie Ruel developed the analytical framework and provided technical oversight throughout the project. Eunyong Chung and Anne Swindale provided technical support. Nita Bhandari, Roberta Cohen, Hilary Creed de Kanashiro, Christine Hotz, Mourad Moursi, Helena Pachon and Cecilia C. Santos-Acuin conducted analysis of data sets. Chessa Lutter coordinated a working group to update the breastfeeding indicators. Mary Arimond and Megan Deitchler coordinated the working group that developed the Operational Guide on measurement issues which is a companion to this document. Bernadette Daelmans and José Martines coordinated the project throughout its phases. Participants in the consensus meetings held in Geneva 3–4 October 2006 and in Washington, DC 6–8 November 2007 provided invaluable inputs to formulate the recommendations put forward in this document.
POLICY MAKING PROCESS
Policy
• A statement of intent for achieving an objective; a deliberate statement aimed at achieving a specific objective.
• Policies are formulated by the Government in order to provide a guideline for attaining certain objectives for the benefit of the people.
Importance and objective of any policy
• To solve existing challenges/problems in any society.
• Used as a tool to safeguard and ensure better services to members of the society.
Reasons for formulating a policy
• Reforms (socio-economic, technological advancements, etc.) within and outside the country.
This document describes a case-control study conducted to determine the reason for many students failing an exam. The study found that students who did not attend lectures had an 80 times higher chance of failing compared to students who did attend, and that this result was statistically significant with a p-value less than 0.05, suggesting not attending lectures was the likely cause of failure.
Aim of nutritional assessment
• To identify nutritional problems of the community
• To find the underlying causes of malnutrition
• To plan and implement control of malnutrition
• To maintain good nutrition in the community
Ancylostomiasis, or hookworm infection, is an important global public health problem caused by parasitic hookworms that infect humans. It is transmitted when larvae penetrate the skin and enter the body, usually through walking barefoot on contaminated soil. In Libya, hookworm infection is very rare, with most cases found in farmers who come into contact with infected feces in soil. The hookworms live in the intestine and feed on blood, potentially causing iron deficiency anemia and related health issues if left untreated. Prevention relies on sanitary disposal of human waste and health education to avoid transmission.
1. 4/12/2011 Data analysis and causal inference 1
Data analysis and causal inference – 1
Victor J. Schoenbach, PhD home page
Department of Epidemiology
Gillings School of Global Public Health
University of North Carolina at Chapel Hill
www.unc.edu/epid600/
Principles of Epidemiology for Public Health (EPID600)
2. 12/30/2001 Data analysis and causal inference 2
The Physicist, the Chemist, and the Statistician
From “Science Jokes”, posted to Usenet groups by Joachim Verhagen
(verhagen@fys.ruu.nl); downloaded from Keith M. Gregg (keith.gregg@stanford.edu),
www-leland.stanford.edu/~keithg/humor.shtml
“Three professors (a physicist, a chemist,
and a statistician) are called in to see their
dean. Just as they arrive the dean is called
out of his office, leaving the three professors
there. The professors see with alarm that
there is a fire in the wastebasket.
3. 12/30/2001 Data analysis and causal inference 3
The Physicist, the Chemist, and the Statistician
“The physicist says, ‘I know what to do! We
must cool down the materials until their
temperature is lower than the ignition
temperature and then the fire will go out.’
4. 12/30/2001 Data analysis and causal inference 4
The Physicist, the Chemist, and the Statistician
“The chemist says, ‘No! No! I know what to
do! We must cut off the supply of oxygen so
that the fire will go out due to lack of one of
the reactants.’
5. 12/30/2001 Data analysis and causal inference 5
The Physicist, the Chemist, and the Statistician
“While the physicist and chemist debate
what course to take, they both are alarmed
to see the statistician running around the
room starting other fires. They both scream,
‘What are you doing?’
To which the statistician replies, ‘Trying to
get an adequate sample size.’”
6. 12/30/2001 Data analysis and causal inference 6
Data management
• Managing epidemiologic data is “mass
production”
• A systematic, organized, professional
approach is critical for detecting and
avoiding problems
7. 12/30/2001 Data analysis and causal inference 7
“You can never, never take
anything for granted.”
Noel Hinners, vice president for flight
systems at Lockheed Martin Astronautics,
whose engineering team reported
measurements in English units that the
Mars Climate Orbiter navigation team
assumed were metric units.
8. 12/30/2001 Data analysis and causal inference 8
Without the documentation, the data may be
of little if any value (1995 NSFG)
00000000000003122222222402143041000
00000000000001144112131 070520310
00000000000003233112131 072331040
000000000000011163322227070350110
00000000000003133022221 02451121000
00000000000001111112131 02110041000
00000000000002111112131 07307131000
00000000000002122112131 01073041000
9. 12/30/2001 Data analysis and causal inference 9
Data analysis and causal inference
• “Our data say nothing at all.”
(Epidemiology guru Sander Greenland, Congress of
Epidemiology 2001, Toronto)
• Data are observer notes, respondent
answers, biochemical measurements,
contents of medical records, machine
readable datasets, …
• What does one do with them?
10. 11/13/2007 Data analysis and causal inference 10
Steps in data management
• Design the data collection process
• Write down all data collection procedures
• Train and supervise data collectors
• Monitor all data collection activities
• Document all data collection experiences
• Keep track of, document, and safeguard
data
11. 11/13/2007 Data analysis and causal inference 11
Data processing
• Review, edit, and code data forms,
documenting exceptions and actions
• Convert to electronic form
• “Clean” data – check for illegal or
improbable values, combinations of values
• Prepare summaries
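The "cleaning" step above can be sketched as a set of automated checks for illegal values and for improbable combinations of values. A minimal Python illustration; the field names, ranges, and cutoffs here are invented, not from the lecture:

```python
# Sketch of data cleaning: scan records for illegal or improbable values and
# for improbable *combinations* of values. All checks are illustrative.

def check_record(rec):
    """Return a list of problems found in one record (a dict of field -> value)."""
    problems = []
    if not 0 <= rec.get("age", -1) <= 110:          # missing age also fails
        problems.append("age out of range")
    if rec.get("sex") not in ("M", "F"):
        problems.append("illegal sex code")
    # improbable combination: a young child recorded as a smoker
    if rec.get("age", 0) < 10 and rec.get("smoker") == "Y":
        problems.append("improbable age/smoking combination")
    return problems

records = [
    {"age": 34, "sex": "F", "smoker": "N"},
    {"age": 150, "sex": "M", "smoker": "Y"},   # illegal age
    {"age": 5, "sex": "F", "smoker": "Y"},     # improbable combination
]

for i, rec in enumerate(records):
    for p in check_record(rec):
        print(f"record {i}: {p}")
```

In practice such checks are run on every record and the exceptions logged, in keeping with the documentation steps listed earlier.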
12. The case of the missing eights
• Cancer Prevention Study II
(N=1.2 million)
• Contractor keyed 20,000
forms/wk; checked weekly.
• 28-item food frequency had
peculiar pattern of missings
• Pulled original QQs to check
• Programmer checked code
• Cause: “O” instead of “0”
Steven D. Stellman. Am J Epidemiol
1989;129(4):857-860
4/12/2011 Data analysis and causal inference 12
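The letter-O-for-zero problem in this story is mechanically detectable: any value in a supposedly numeric field that fails to parse should be flagged for review rather than silently coded as missing. A hypothetical sketch (the keyed values are invented):

```python
# Sketch: flag keyed values in a numeric field that do not parse as digits,
# instead of silently treating them as missing ("the case of the missing eights").
keyed_values = ["0", "3", "O", "12", "0", "O8"]  # "O" is the letter, not zero

bad = [v for v in keyed_values if not v.isdigit()]
print("values needing review:", bad)  # ['O', 'O8']
```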
13. 4/12/2011 Data analysis and causal inference 13
Can you find the data management error?
48 * get non-hispanic white population in county for 2000, first by adding
49 ages 15-24, 25-34, 35-44, and 45-64, then by excluding ages 45-64;
50
51 CWHITES=CST00609+CST00610+CST00611+CST00612;
52 CWHITES2=CWHITES-CST00612;
53
54 * get non-hispanic black population in county;
55
56 CBLACKS=CST00616+CST00617+CST00618+CST00619;
57 CBLACKS2=CBLACKS-CST00619;
58
59 * get hispanic or latino population in county;
60
61 CHISPS=CST00623+CST00624+CST00625+CST00626;
62 CHISPS2=CHISPS-CST00626;
63 (continues on next slide)
14. 4/12/2011 Data analysis and causal inference 14
Can you find the data management error?
CST00637 Female population white alone aged 15-24, 2000 – county
CST00638 Female population white alone aged 25-34, 2000 – county
CST00639 Female population white alone aged 35-44, 2000 – county
CST00640 Female population white alone aged 45-64, 2000 – county
CST00644 Female population black* alone aged 15-24, 2000 – county
CST00645 Female population black* alone aged 25-34, 2000 – county
CST00646 Female population black* alone aged 35-44, 2000 – county
CST00647 Female population black* alone aged 45-64, 2000 – county
CST00651 Female population Hispanic* aged 15-24, 2000 – county
CST00652 Female population Hispanic* aged 25-34, 2000 – county
CST00653 Female population Hispanic* aged 35-44, 2000 – county
CST00654 Female population Hispanic* aged 45-64, 2000 – county
* Full variable name: “black or African American”, “Hispanic or Latino”
(continues on next slide)
15. 4/12/2011 Data analysis and causal inference 15
Can you find the data management error?
64 * get non-hispanic white female population in county;
65
66 CWFEMALES=CST00637+CST00638+CST00639+CST00640;
67 CWFEMALES2=CWFEMALES-CST00640;
68
69 * get non-hispanic black female population in county;
70
71 CBFEMALES=CST00644+CST00645+CST00646+CST00647;
72 CBFEMALES2=CBFEMALES-CST00646;
73
74 * get hispanic female population in county;
75
76 CHFEMALES=CST00651+CST00652+CST00653+CST00654;
77 CHFEMALES2=CHFEMALES-CST00654;
(continues on next slide)
16. 4/12/2011 Data analysis and causal inference 16
Can you find the data management error?
64 * get non-hispanic white female population in county;
65
66 CWFEMALES=CST00637+CST00638+CST00639+CST00640;
67 CWFEMALES2=CWFEMALES-CST00640;
68
69 * get non-hispanic black female population in county;
70
71 CBFEMALES=CST00644+CST00645+CST00646+CST00647;
72 CBFEMALES2=CBFEMALES-CST00646; * <-- the error: subtracts ages 35-44 (CST00646) instead of ages 45-64 (CST00647);
73
74 * get hispanic female population in county;
75
76 CHFEMALES=CST00651+CST00652+CST00653+CST00654;
77 CHFEMALES2=CHFEMALES-CST00654;
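A generic consistency check, not part of the lecture, would have flagged this class of bug automatically: recompute each derived "ages 15-44" total directly from its three component age groups and compare with the value obtained by subtraction. A Python sketch with invented county counts:

```python
# Recompute each derived total independently and compare with the value the
# SAS program derives by subtraction. County counts are invented.

c = {                                   # CSTxxxxx variables, made-up values
    "CST00637": 100, "CST00638": 110, "CST00639": 120, "CST00640": 130,
    "CST00644": 200, "CST00645": 210, "CST00646": 220, "CST00647": 230,
    "CST00651": 300, "CST00652": 310, "CST00653": 320, "CST00654": 330,
}

# Derived exactly as in the SAS program (including its bug):
CWFEMALES2 = (c["CST00637"] + c["CST00638"] + c["CST00639"] + c["CST00640"]) - c["CST00640"]
CBFEMALES2 = (c["CST00644"] + c["CST00645"] + c["CST00646"] + c["CST00647"]) - c["CST00646"]
CHFEMALES2 = (c["CST00651"] + c["CST00652"] + c["CST00653"] + c["CST00654"]) - c["CST00654"]

# Independent recomputation: sum only ages 15-24, 25-34, 35-44.
checks = {
    "CWFEMALES2": (CWFEMALES2, c["CST00637"] + c["CST00638"] + c["CST00639"]),
    "CBFEMALES2": (CBFEMALES2, c["CST00644"] + c["CST00645"] + c["CST00646"]),
    "CHFEMALES2": (CHFEMALES2, c["CST00651"] + c["CST00652"] + c["CST00653"]),
}
for name, (derived, direct) in checks.items():
    status = "OK" if derived == direct else "MISMATCH"
    print(f"{name}: derived={derived} direct={direct} {status}")
```

Only CBFEMALES2 mismatches, because the SAS code subtracts the 35-44 group instead of the 45-64 group.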
17. 12/30/2001 Data analysis and causal inference 17
Data exploration
• Examine the data – frequency
distributions, cross-tabulations,
scatterplots – be alert for surprises and
suspicious findings
• Examine means and prevalence for
factors of interest, overall and within
interesting subgroups
• Look at associations, prevalence ratios,
relative risks, odds ratios, correlations
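The exploration steps above can be sketched with the standard library alone; the exposure-outcome records below are invented for illustration:

```python
# Frequency distribution, cross-tabulation, and a simple measure of
# association, using only the standard library. Data are invented.
from collections import Counter

records = [("smoker", "disease"), ("smoker", "no disease"),
           ("smoker", "disease"), ("nonsmoker", "no disease"),
           ("nonsmoker", "no disease"), ("nonsmoker", "disease")]

# frequency distribution of exposure
print(Counter(exposure for exposure, _ in records))

# 2x2 cross-tabulation of exposure by outcome
xtab = Counter(records)
for cell, n in sorted(xtab.items()):
    print(cell, n)

# risk ratio: risk of disease in smokers / risk in nonsmokers (3 of each here)
risk_smk = xtab[("smoker", "disease")] / 3
risk_non = xtab[("nonsmoker", "disease")] / 3
print("risk ratio:", risk_smk / risk_non)  # 2.0
```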
18. 12/30/2001 Data analysis and causal inference 18
Carry out focused data analysis
• Desirable to have a written analysis plan
based on the research questions
• Typically carry out “crude” analyses and
analyses controlling for important
variables
• Methods of control: stratification,
mathematical modeling
19. Distribution of U.S. household income, 2007
(CPS data)
4/12/2011 Data analysis and causal inference 19
Income in $1000s/year
Source: http://img55.imageshack.us/i/incomedistr07jo6.jpg/
20. 12/30/2001 Data analysis and causal inference 20
Stratified analysis
• Divide the dataset into subsets according
to relevant covariables (e.g., age, sex,
smoking, …)
• Examine the estimates and associations
within each subset (unless there are too
many)
• Take averages across the subsets
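A minimal sketch of the stratified approach, with invented counts: compute the risk ratio within each stratum of the covariable, then take a weighted average across strata (here the standard Mantel-Haenszel weighting):

```python
# Stratified analysis: risk ratio within each age stratum, then a
# Mantel-Haenszel summary across strata. Counts are invented:
# a = exposed cases out of n1 exposed; c = unexposed cases out of n0 unexposed.

strata = {
    "young": (10, 100, 5, 100),
    "old":   (30, 100, 16, 100),
}

for name, (a, n1, c, n0) in strata.items():
    print(name, "RR =", (a / n1) / (c / n0))

# Mantel-Haenszel summary risk ratio across strata
num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata.values())
den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata.values())
print("MH RR =", num / den)
```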
21. 11/13/2007 Data analysis and causal inference 21
Mathematical modeling
• Express the outcome as some
mathematical function of the relevant
covariables
• “Fit” this function to the data, so that it
models the relations in the data
• Interpret the resulting model to draw
inferences about associations
22. 11/13/2007 Data analysis and causal inference 22
Selecting a pattern to sew a pair of pants
• Want one that fits the need
• Can sew without a pattern, but takes
time and may not look good
• Select a pattern that will be well
received
• Have you seen anyone wearing it?
• Has it been featured in magazines?
23. 12/30/2001 Data analysis and causal inference 23
The strategy of statistical data analysis
Look for an available statistical
model that will fit the situation (e.g.,
binomial, normal, chi-square, linear)
• Have others used it?
• Has it appeared in a methodology
article?
24. 12/30/2001 Data analysis and causal inference 24
The strategy of statistical data analysis
Summarize the data in terms of the
statistical model
– Mean
– Standard deviation
– Other parameters
25. 4/22/2002 Data analysis and causal inference 25
But should always look at the data
• Distributions can have same mean
and standard deviation but look very
different – e.g., [Figure: two differently
shaped distributions, each with mean 5]
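A tiny illustration of the point, with invented data: two datasets with identical mean and standard deviation but very different shapes.

```python
# Same mean, same standard deviation, different shapes - which is why summary
# statistics are no substitute for looking at the data. Data are invented.
from statistics import mean, pstdev

spread_out = [3, 4, 5, 6, 7]                       # evenly spread
clustered = [5 - 5 ** 0.5, 5, 5, 5, 5 + 5 ** 0.5]  # mass at 5 plus two outliers

for data in (spread_out, clustered):
    print([round(x, 2) for x in data],
          "mean:", round(mean(data), 3), "sd:", round(pstdev(data), 3))
```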
26. 4/18/2006 Data analysis and causal inference 26
Regression models - Conceptual
• Suppose risk factors of:
Age 50 years
BP 130 mmHG systolic
CHL 220 mg/dL
SMK 30 pack-years
27. 4/13/2010 Data analysis and causal inference 27
Regression models - Conceptual
Example of an additive model:
Risk of CHD =
Risk from Age (“Age_risk”)
+ Risk from BP (“BP_risk”)
+ Risk from CHL (“CHL_risk”)
+ Risk from SMK (“SMK_risk”)
28. 4/13/2010 Data analysis and causal inference 28
Propose the model
Risk of CHD = Age_risk + BP_risk + CHL_risk + SMK_risk
Age_risk = Age in years x risk increase per year
BP_risk = BP in mmHG x risk increase per mmHG
CHL_risk = Cholest. in mg/dL x risk increase per mg/dL
SMK_risk = Pack-years x risk increase per pack-year
29. 4/13/2010 Data analysis and causal inference 29
Fit the model – estimate the coefficients
• Risk = β0 + β1Age + β2BP + β3CHL + β4SMK
β0 = baseline risk
β1 = risk increase per year
β2 = risk increase per mmHG
β3 = risk increase per mg/dL
β4 = risk increase per pack-year
• Use the data and statistical techniques to
estimate β1, β2, β3, β4.
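The fitting step can be shown in miniature for a single covariable, Risk = β0 + β1·Age, using ordinary least squares on invented data. (The lecture's model has four covariables; the same idea applies, and real analyses would use standard regression software.)

```python
# Estimate beta0 and beta1 for Risk = beta0 + beta1*Age by ordinary least
# squares. The age/risk pairs are invented to lie on a line for clarity.
from statistics import mean

ages = [40, 45, 50, 55, 60]
risks = [0.21, 0.22, 0.23, 0.24, 0.25]   # risk rises 0.002 per year here

xbar, ybar = mean(ages), mean(risks)
beta1 = (sum((x - xbar) * (y - ybar) for x, y in zip(ages, risks))
         / sum((x - xbar) ** 2 for x in ages))
beta0 = ybar - beta1 * xbar
print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.4f} per year")
```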
30. 12/30/2001 Data analysis and causal inference 30
P-values and Power
• P-value: “the probability of obtaining
an interesting-looking sample from a
boring population” (1 – specificity)
• Power: “the probability of obtaining
an interesting-looking sample from
an interesting population” (sensitivity)
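These two definitions can be made concrete under a normal approximation. Assuming, purely for illustration, that the estimated ln(OR) is Normal around its true value with standard error 0.3, a "boring" population with true ln(OR) = 0, and an "interesting" one with true ln(OR) = 0.7:

```python
# P-value and power under a normal approximation for ln(OR).
# The standard error of 0.3 is an invented illustrative value.
from statistics import NormalDist

se = 0.3
boring = NormalDist(mu=0.0, sigma=se)       # true ln(OR) = 0
interesting = NormalDist(mu=0.7, sigma=se)  # true ln(OR) = 0.7

observed = 0.5
# P-value: probability a boring population yields a sample at least this
# interesting-looking (one-sided)
p_value = 1 - boring.cdf(observed)

# Power: probability an interesting population yields a sample beyond a
# one-sided 5% cutoff
cutoff = boring.inv_cdf(0.95)
power = 1 - interesting.cdf(cutoff)

print(f"one-sided P-value for observing 0.5: {p_value:.3f}")
print(f"5% cutoff: {cutoff:.3f}, power: {power:.2f}")
```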
31. 11/16/2004 Data analysis and causal inference 31
The P-value
If my study observes 0.5 [e.g., ln(OR)]
[Figure: two sampling distributions of ln(OR) – a boring population centered at 0 and an interesting population centered at 0.7]
32. 11/22/2005 Data analysis and causal inference 32
The P-value
If my study observes 0.5 [e.g., ln(OR)]
[Figure: the same two distributions – boring population at 0, interesting population at 0.7; the shaded tail of the boring distribution beyond the observed 0.5 is the P-value]
33. 11/16/2004 Data analysis and causal inference 33
The Problem with the P-value
But the P-value does not tell me the
probability that what I observed was
due to chance
[Figure: the two distributions again – boring population at 0, interesting population at 0.7]
34. 11/16/2004 Data analysis and causal inference 34
If I study only boring populations
[Figure: distributions of samples from boring populations, centered at 0]
35. 11/16/2004 Data analysis and causal inference 35
If I study only interesting populations
[Figure: distributions of samples from interesting populations, centered at 0.7]
36. 11/22/2005 Data analysis and causal inference 36
Many boring populations
[Figure: sampling distributions mostly from boring populations (centered at 0), with some interesting populations (centered at 0.7)]
37. 11/22/2005 Data analysis and causal inference 37
Many interesting populations
[Figure: sampling distributions mostly from interesting populations (centered at 0.7), with some boring populations (centered at 0)]
38. 12/30/2001 Data analysis and causal inference 38
Do epidemiologists study boring populations?
That probability depends on how many boring
populations there are. If we study
10 interesting populations
100 boring populations
with 90% power and 5% significance level, we
expect to obtain 9 interesting samples from
the interesting populations and 5 from the
boring populations
39. 11/22/2005 Data analysis and causal inference 39
P-values and predictive values
Results:
14 interesting samples
5 came from boring populations
Probability that an interesting sample
came from a boring population:
5/14 = 36% – not 5%!
Analogous to positive predictive value
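The arithmetic on this slide, spelled out:

```python
# 10 interesting and 100 boring populations, studied with 90% power and a
# 5% significance level (the numbers from the preceding slide).
interesting_pops, boring_pops = 10, 100
power, alpha = 0.90, 0.05

true_positives = power * interesting_pops    # 9 interesting samples
false_positives = alpha * boring_pops        # 5 "interesting" samples from boring populations

prob_boring = false_positives / (true_positives + false_positives)
ppv = true_positives / (true_positives + false_positives)
print(f"P(interesting sample came from a boring population) = {prob_boring:.0%}")  # 36%
print(f"positive predictive value = {ppv:.0%}")  # 64%
```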
40. 4/12/2011 Data analysis and causal inference 40
Analogy to positive predictive value
                                  Populations
Samples                    Interesting     Boring          Total
                           (“cases”)       (“noncases”)
Interesting (“positive”)        9               5             14    PV+ = 9/14 ≈ 64%
Boring (“negative”)             1              95             96
Total                          10             100            110
                           (with 90%       (with 95%
                            sensitivity)    specificity)
41. 4/12/2011 Data analysis and causal inference 41
Meta-analysis
• Literature reviews
• Systematic literature reviews
• Every study is an observation from a
population of possible studies
• The set of studies that have been
published may be a biased sample
from that population
42. 7/1/2009 Data analysis and causal inference 42
What should guide data analysis
• What are the research questions?
– Estimate means (e.g., cholesterol)
and prevalences (e.g., HIV)
– Assess associations (e.g., Is blood
lead associated with elevated blood
pressure?; Do prepaid health plans
provide more preventative care? Do
bednets protect against malaria?)
43. 11/20/2007 Data analysis and causal inference 43
Association of helmet use with death in motorcycle
crashes: a matched-pair cohort study
(Daniel Norvell and Peter Cummings, AJE 2002;156:483-7)
• Data from the National Highway Traffic
Safety Administration’s Fatality Analysis
Reporting System
• Exposure: helmet use; Outcome: death
• Potential confounders: sex, seat position,
age, state helmet law
44. 11/20/2007 Data analysis and causal inference 44
Association of helmet use with death in motorcycle
crashes: a matched-pair cohort study
(Daniel Norvell and Peter Cummings, AJE 2002;156:483-7)
• 9,222 driver-passenger pairs after
exclusions
• Relative risk of death for a helmeted rider
was 0.65 (0.57-0.74), (0.61 adjusted for
seat position)
• Examined effect measure modification by
seat position and by type of crash.
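In a matched-pair cohort analysis like this one, each driver-passenger pair is its own stratum; with one helmeted and one unhelmeted rider per pair, the Mantel-Haenszel risk ratio across pair strata reduces to the ratio of helmeted to unhelmeted deaths. A sketch with invented pair counts, chosen here so the result happens to match the published 0.65:

```python
# Matched-pair cohort sketch: each pair contributes one helmeted and one
# unhelmeted rider; the MH risk ratio over pair strata reduces to
# (helmeted deaths) / (unhelmeted deaths). Pair counts are invented.

# (helmeted_died, unhelmeted_died) -> number of pairs
pairs = {
    (True, True): 40,     # both died
    (True, False): 90,    # only the helmeted rider died
    (False, True): 160,   # only the unhelmeted rider died
    (False, False): 710,  # both survived
}

helmeted_deaths = sum(n for (h, u), n in pairs.items() if h)
unhelmeted_deaths = sum(n for (h, u), n in pairs.items() if u)
rr = helmeted_deaths / unhelmeted_deaths
print(f"matched-pair MH risk ratio = {rr:.2f}")  # 0.65
```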
45.
46. When the
proofreader takes a
week off
12/29/2009, B5
Dec 2009 Close
28 10547.08
25 10520.10
24 10520.10
23 10466.44
22 10464.93
21 10414.14
18 10328.89
17 10308.26
Source: www.google.com/finance/historical?q=INDEXDJX:.DJI
[Chart x-axis: Dec 22 23 24 25 28]
47. I hope he’s having
a good break!
12/31/2009, B6
[Chart x-axis: Dec 23 24 25 28 29]
Dec 2009 Close
29 10545.41
28 10547.08
25 10520.10
24 10520.10
23 10466.44
22 10464.93
21 10414.14
18 10328.89
17 10308.26
www.google.com/finance/historical?q=INDEXDJX:.DJI
48. 4/12/2011 Data analysis and causal inference 48
Thank you
• Arigato
• Asanti
• Dhanyavaad
• Dumela
• Gracias
• Merci
• Obrigado
• Xie xie
Editor's Notes
Xin chao, Guten tag, wilkommen, karibuni, dumela, merhaba, shalom, huan-ying, bienvenidos, boa tarde
This two-part lecture is about data analysis and causal inference.
As long as we’re talking about data analysis, let’s begin with a little story about statisticians (The Physicist, the Chemist, and the Statistician, in “Science Jokes”, posted to Usenet groups by Joachim Verhagen (verhagen@fys.ruu.nl); downloaded from, Keith M. Gregg, keith.gregg@stanford.edu, www-leland.stanford.edu/~keithg/humor.shtml)
“Three professors (a physicist, a chemist, and a statistician) are called in to see their dean. Just as they arrive the dean is called out of his office, leaving the three professors there. The professors see with alarm that there is a fire in the wastebasket.”
“The physicist says, ‘I know what to do! We must cool down the materials until their temperature is lower than the ignition temperature and then the fire will go out.’”
“The chemist says, ‘No! No! I know what to do! We must cut off the supply of oxygen so that the fire will go out due to lack of one of the reactants.’”
“While the physicist and chemist debate what course to take, they both are alarmed to see the statistician running around the room starting other fires. They both scream, ‘What are you doing?’
To which the statistician replies, ‘Trying to get an adequate sample size.’”
The first thing we do with data is to manage them (note that epidemiologists usually regard the word “data” as a plural word, based on its Latin root; however, other fields often consider “data” to be singular). Since epidemiologic studies tend to have many – hundreds, thousands, or even millions – of observations and often tens or hundreds of data items for each observation, managing epidemiologic data involves “mass production”. Therefore a systematic, organized, professional approach is critical for detecting and avoiding problems with the data.
Data management, including careful and thorough documentation, is one of those activities like sanitation, hygiene, laundry, maintenance, and the like that are critical to health and well-being but largely underappreciated.
The consequences of lapses in managing data can be far-reaching, and one can never take anything for granted. One of the more dramatic consequences of a lapse in data management was the loss of the Mars Climate Orbiter, which approached Mars at far too low an altitude and was destroyed in the Martian atmosphere. In the investigation of the loss, it turned out that the force data reported by the Lockheed Martin engineering team had been in English units, but the navigation team at NASA had assumed that they were in metric units.
And so, as Noel Hinners, vice president for flight systems at Lockheed Martin Astronautics said, “you can never, never take anything for granted.”
Without proper documentation, data may be of little if any value. For example, on the slide is an excerpt of data from the 1995 National Survey of Family Growth that my colleague Dr. Adaora Adimora and I have been analyzing to study concurrent sexual partnerships among U.S. women. Sometimes people will go to great lengths to save their data for years and years, only to find that they never had or neglected to save the documentation for it. Without the documentation, the data are, essentially, useless.
The preceding lecture began with Sander Greenland’s assertion that data say nothing at all. Data consist of observer notes, respondent answers, biochemical measurements, contents of medical records, machine readable datasets, and other kinds of information from which we attempt to derive meaning.
So what does one do with them? Analysis and interpretation of the data create the meaning that we ascribe to the data.
The steps in data management are to:
1. Design the process by which data will be collected, writing down all data collection procedures
2. Train and supervise data collectors and monitor all data collection activities
3. Document all data collection experiences so that later it will be possible to reconstruct what happened or how issues that arose were resolved
and very importantly,
4. Keep track of, document, and safeguard the data and the documentation
It may seem superfluous to remind people not to lose their data. But as I said, data management is an under-appreciated activity, so people tend to be casual about it (“I’ll back it up ‘tomorrow’”). A project I worked on back in the mainframe era nearly lost an entire year’s work because of a disk crash. The person responsible for backing up the disk – my boss – had kept putting off the task. Fortunately we had shared a copy of the files with another organization, and they had not yet recycled the tape! The American College of Epidemiology had to recreate its membership database when it was lost. In November 2002 thieves stole the hard drives from 9 personal computers in the Epidemiological and Communicable Diseases unit at the Indian Council of Medical Research in New Delhi. There was apparently no backup. So you see, epidemiologists are as fallible in this area as the rest of us are.
The next steps in data management are to review, edit, and code the data forms (e.g., questionnaires, abstracts of records, notes from observations). For example, the questionnaire may have instructed respondents to “mark one response”, but you may get questionnaires where two responses are circled, or a response is marked midway between two choices, and the like. Someone needs to decide how to handle these situations and to edit the forms accordingly. Questions about how these situations were handled may well arise. So it is important to document the coding decisions, the forms that had exceptions, and the actions taken. Occasionally it may be necessary to go back and revise all of the exceptions handled in a certain way, and it is much easier to work from a list than to have to go through all of the forms again. [For example, in a multi-site project in which I participated, the data center proposed to code intermediate responses (e.g., when “2” and “3” were both circled or a mark was made between them) as the higher number, a plan which was endorsed by the Data Analysis Committee. Later, though, the principal investigator at one of the sites persuaded the Steering Committee that the responses should have been coded with fractional values (e.g., “2.5”), necessitating re-review of thousands of forms to identify the exceptions.]
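The bracketed example above can be sketched in code. Here is a minimal, hypothetical illustration (in Python; the form ids, response codes, and recoding rule are invented for illustration) of coding intermediate responses as fractional values while keeping the list of exceptions that spares a later re-review of every form:

```python
# Hypothetical sketch: recode intermediate questionnaire responses
# (two adjacent choices circled, recorded here as "2/3") as fractional
# values, while logging each coding decision for later audit.
raw_responses = {"form_0412": "2/3", "form_0413": "4", "form_0414": "1/2"}

coded = {}
exception_log = []  # one entry per form that needed a coding decision
for form_id, response in raw_responses.items():
    if "/" in response:  # an intermediate response needing a decision
        lo, hi = (int(x) for x in response.split("/"))
        value = (lo + hi) / 2  # the fractional-value rule: e.g., 2.5
        exception_log.append((form_id, response, value))
    else:
        value = int(response)
    coded[form_id] = value

print(coded)          # form_0412 becomes 2.5, form_0414 becomes 1.5
print(exception_log)  # the list that makes re-review of all forms unnecessary
```

If the coding rule later changes, only the forms in `exception_log` need to be revisited, rather than every form in the study.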
After the forms are edited, the data are converted to electronic form, usually by keying into a computer, sometimes by optical scanning. Increasingly interview data are captured directly by computer, through CATI (Computer-Assisted Telephone Interview), CAPI (Computer-Assisted Personal Interview), and A-CASI (Audio Computer-Assisted Self-Interview) technology.
Stellman reports an unusual data error encountered in the Cancer Prevention Study II, with 1.2 million questionnaires completed during fall 1982. Data were entered and key-verified under contract. The firm typically processed 20,000 forms/week, and researchers subjected each batch to an “exhaustive battery” of computer checks.
As the researchers were beginning a factor analysis of the 28-item food frequency section, however, they examined the distribution of missing values and found, to their surprise and puzzlement, that there appeared to be no questionnaires with exactly 8 or 18 missing food items. After an intensive investigation, including pulling a sample of the original data forms, they concluded that there was a programming problem:
“The contractor’s lead programmer was asked to inspect all code related to the flag in question, but could find no errors. This was an exceptionally capable individual, whose word could be accepted as final. Seeking a possible (but unlikely) flaw in our own data logging process, we examined originally delivered data tapes . . ., but these proved to be identical in content to the system files. The problem simply had to originate with the contractor. At the time we were reaching this conclusion, the programmer called back again with a sheepish tone to say she had discovered the problem in her program. After all data items had been entered, the number of missing items was subtracted from 28 and the result was tested against zero; if the numbers were equal, the first item in the series was output as the flag character and the remaining 27 were output as blanks. But the line of code with the test contained a misprint: A letter “O” had been typed instead of a zero (one of the hardest programming errors to detect). In the machine level language of the contractor’s computer, this mistyped instruction was still a legal one, but it gave a test result of “true” for any number of missing items that ended with the digit 8.” (859-860)
Steven D. Stellman. The case of the missing eights. Am J Epidemiol 1989;129(4):857-860, http://aje.oxfordjournals.org/content/129/4/857.full.pdf
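The effect of the bug can be loosely reconstructed in modern code. The sketch below is in Python (the actual error was in machine-level code on the contractor's computer) and shows why questionnaires with exactly 8 or 18 missing items vanished from the dataset:

```python
# Loose reconstruction of Stellman's "missing eights" bug.
ITEMS = 28  # food-frequency items per questionnaire

def flag_all_missing_correct(n_missing):
    # Intended test: output the all-missing flag record only when
    # 28 minus the number of missing items equals zero, i.e. when
    # every one of the 28 items is missing.
    return ITEMS - n_missing == 0

def flag_all_missing_buggy(n_missing):
    # Per Stellman's account, the mistyped comparison (a letter "O"
    # typed for a zero) came out "true" for any number of missing
    # items ending in the digit 8.
    return n_missing % 10 == 8

# Questionnaires wrongly converted to all-blank flag records -- which is
# why no forms appeared with exactly 8 or 18 missing items:
wrongly_flagged = [n for n in range(ITEMS + 1)
                   if flag_all_missing_buggy(n) and not flag_all_missing_correct(n)]
print(wrongly_flagged)  # [8, 18]
```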
* get non-hispanic white population in county for 2000, first by adding
  ages 15-24, 25-34, 35-44, and 45-64, then by excluding ages 45-64;

CWHITES=CST00609+CST00610+CST00611+CST00612;
CWHITES2=CWHITES-CST00612;

* get non-hispanic black population in county;

CBLACKS=CST00616+CST00617+CST00618+CST00619;
CBLACKS2=CBLACKS-CST00619;

* get hispanic or latino population in county;

CHISPS=CST00623+CST00624+CST00625+CST00626;
CHISPS2=CHISPS-CST00626;
(continues on next slide)
CST00637 Female population white alone aged 15-24, 2000 – county
CST00638 Female population white alone aged 25-34, 2000 – county
CST00639 Female population white alone aged 35-44, 2000 – county
CST00640 Female population white alone aged 45-64, 2000 – county
CST00644 Female population black* alone aged 15-24, 2000 – county
CST00645 Female population black* alone aged 25-34, 2000 – county
CST00646 Female population black* alone aged 35-44, 2000 – county
CST00647 Female population black* alone aged 45-64, 2000 – county
CST00651 Female population Hispanic* aged 15-24, 2000 – county
CST00652 Female population Hispanic* aged 25-34, 2000 – county
CST00653 Female population Hispanic* aged 35-44, 2000 – county
CST00654 Female population Hispanic* aged 45-64, 2000 – county
* Full variable name: “black or African American”, “Hispanic or Latino” (continues on next slide)
* get non-hispanic white female population in county;

CWFEMALES=CST00637+CST00638+CST00639+CST00640;
CWFEMALES2=CWFEMALES-CST00640;

* get non-hispanic black female population in county;

CBFEMALES=CST00644+CST00645+CST00646+CST00647;
CBFEMALES2=CBFEMALES-CST00647;

* get hispanic female population in county;

CHFEMALES=CST00651+CST00652+CST00653+CST00654;
CHFEMALES2=CHFEMALES-CST00654;
After or in the process of “cleaning” the data (reviewing distributions for illegal or improbable values or combinations of values, such as pregnant males), it is important to examine the data to familiarize oneself with them, to be aware of how various factors are distributed, and, always, to be on the alert for surprises and suspicious findings. Analysts inspect frequency distributions, cross-tabulations, and scatterplots to “see” the data. One worthwhile practice is to make sure that the numbers of respondents in each table are what they should be, since respondents can be lost or duplicated when datasets are merged, variables are recoded, subgroups are examined, and so forth.
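The count-checking habit can be sketched with invented records (Python; the record layout and id values are made up). Matching counts alone are not enough, as this example shows:

```python
# Sketch: after merging two datasets, confirm that no respondents were
# lost or duplicated. All records here are invented for illustration.
baseline = [{"id": 1}, {"id": 2}, {"id": 3}]
followup = [{"id": 2, "visit": 2}, {"id": 3, "visit": 2}, {"id": 3, "visit": 2}]

merged = [{**b, **f} for b in baseline for f in followup if b["id"] == f["id"]]

# Counts match (3 records in, 3 out), yet the merge is wrong:
ids = [r["id"] for r in merged]
dupes = {i for i in ids if ids.count(i) > 1}   # respondent duplicated
lost = {b["id"] for b in baseline} - set(ids)  # respondent dropped
print(len(merged), sorted(dupes), sorted(lost))
```

Here respondent 3 appears twice and respondent 1 has vanished, even though the total count is unchanged; checking identities, not just totals, catches the problem.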
Summary statistics, such as means, proportions (e.g., prevalence, incidence proportions), and rates (e.g., incidence rates), are examined in the dataset as a whole and within various subgroups. Even if groups will be combined for analysis, it is good practice to look at data by gender, age group, and various other dimensions relevant to the study population and type of data.
The next step is to look at associations among factors, with such measures as prevalence ratios, relative risks, odds ratios, and correlation coefficients. Graphical analysis techniques can be revealing at all of these stages.
A thorough exploration of the data helps to catch problems resulting from errors or lapses of some kind and also helps to identify features of the data that have implications for the formal statistical analysis (e.g., outliers, skewed distributions). For example, some statistical analysis techniques assume that variables are normally distributed. If the exploration reveals that they are not, a transformation must be applied or statistical analysis techniques employed that do not make this assumption.
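As a small illustration, the sketch below (Python, with made-up right-skewed values) computes a sample skewness statistic before and after a log transformation:

```python
import math
import statistics

# Made-up, right-skewed illustration data.
values = [10, 13, 16, 20, 25, 32, 40, 50, 63, 100]

def sample_skewness(xs):
    # Adjusted Fisher-Pearson sample skewness: positive means a
    # long right tail, near zero means roughly symmetric.
    n = len(xs)
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    return (n / ((n - 1) * (n - 2))) * sum(((x - m) / s) ** 3 for x in xs)

logged = [math.log(x) for x in values]
print(round(sample_skewness(values), 2))  # strongly positive (right-skewed)
print(round(sample_skewness(logged), 2))  # much closer to zero after the log
```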
It is generally desirable to have a written analysis plan to guide the analysis of the research questions. Even if you have a clear idea of the study question and how to proceed to examine it, it is easy to become lost in the process of scanning distributions, examining hundreds of means and proportions, and staring at screens and printouts. So write down your plan with as much specificity as you can.
Usually the data analysis plan will call for a formal assessment of the crude estimates and associations, followed by estimates and associations that control for important covariables identified in your analysis plan, such as potential confounders. There are two major methods of controlling for covariables: stratified analysis and mathematical modeling.
For example, the distribution of U.S. household income is not a “normal” distribution.
In stratified analysis, the dataset is divided into subsets according to one or more covariables to be controlled. For example, an overall dataset might be examined within subgroups formed by gender, age group, urban-rural, smoking status, blood pressure, etc., depending upon the factors being studied. Ideally the results will be inspected within each stratum, unless there are too many strata to make that practical. Then, by averaging the estimates across sets of strata and across all of them, the analyst obtains adjusted estimates that control for the stratification variables. Age standardization, considered earlier in the course, is an example of stratified analysis.
Important advantages of stratified analysis are that it usually requires fewer assumptions about the distributions of variables and their relationships, and it shows all of the data. With mathematical modeling, it is easy to miss important features of the data because they are not in view. Disadvantages of stratified analysis are that if there are several variables to control, the number of strata becomes large very quickly. Also, variables must be categorized in order to form the strata. Besides the work involved in categorizing the variables, categorization can reduce available precision. But looking at stratified analyses is a good idea as an accompaniment to other analysis methods, even if one does not ultimately report the stratified analysis results.
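To make the idea concrete, here is a hypothetical stratified analysis in Python: the stratum-specific risk ratios are inspected first, then combined into a Mantel-Haenszel summary risk ratio. All counts are invented for illustration:

```python
# Each stratum holds (exposed cases, exposed total,
#                     unexposed cases, unexposed total).
strata = {
    "age < 50": (10, 100, 5, 100),
    "age >= 50": (30, 100, 20, 100),
}

# First inspect the risk ratio within each stratum, as the text advises.
for label, (a, n1, b, n0) in strata.items():
    print(label, round((a / n1) / (b / n0), 2))

# Mantel-Haenszel summary risk ratio: a weighted average over strata.
num = sum(a * n0 / (n1 + n0) for a, n1, b, n0 in strata.values())
den = sum(b * n1 / (n1 + n0) for a, n1, b, n0 in strata.values())
rr_mh = num / den
print(round(rr_mh, 2))  # the adjusted estimate controlling for age group
```

With these counts the stratum-specific risk ratios are 2.0 and 1.5, and the adjusted summary falls between them at 1.6.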
Mathematical modeling is the second and now most widely used method for examining associations while controlling for important covariables. With mathematical modeling, we find a way to express the outcome, such as incidence, as a mathematical function of the covariables we consider important. For example, we might express the incidence of heart disease as a function of age, smoking status, blood pressure level, cholesterol level, presence of diabetes, and so on. We usually specify the form of this function (e.g., whether we add the effect of one factor to the effect of another, multiply the effect of one factor by another, and so forth) and then tailor the function to fit the data. An analogy is choosing a dress pattern and then adjusting it to fit the person for whom the dress is being made.
Fitting the model involves statistical procedures that estimate parameters indicating the quantitative contribution to the outcome of each of the factors in the model (contingent, of course, always on the model form and the assumptions on which the model was based). The model will have been chosen so that these parameters have a useful interpretation. For example, the parameters might estimate the difference in risk of the outcome attributable to a factor or the odds ratio relating a factor to the outcome. When the parameter is a ratio, the model usually works with it on the log scale, which is why we use logistic and log-binomial regression.
A possible analogy to statistical analysis of data, especially inferential statistics, where the analyst attempts to draw inferences about a population from a sample of data, is the way a seamstress or tailor might approach sewing a pair of pants.
In selecting a pattern, s/he looks first for one that will suit the purpose for which the pants are intended (so for example, dress pants, work pants, casual pants, shorts, athletic shorts, etc.). It’s possible to sew without a pattern, but the result may not look good.
Also, s/he will want to choose a pattern that will be well received when the pants are worn. One consideration is whether s/he has seen anyone wearing pants in that style. Another might be whether the pattern has been featured in a fashion magazine.
So in analyzing data, the analyst looks for an available statistical model that appears to fit the situation – for example, the binomial, normal, or chi-square distribution, or the linear or logistic model.
If others have used that model (i.e., that pattern, essentially) with data of the type we are dealing with, the result is more likely to be well received by our peers. Similarly, if the model has been presented in a scientific journal (perhaps that is the statistician’s equivalent of a fashion magazine) the result is likely to be well-received.
Having chosen the pattern, the person sewing a pair of pants will need to select the correct size pattern or adjust the pattern to fit the person who will wear the pants. Similarly, having chosen the type of statistical model, the data analyst will select the “size” model for the data, which involves estimating the parameters that the model uses. For example, a normal distribution is a family of distributions that can be wide or narrow and can be located anywhere on the real number line. The analyst selects a specific distribution that fits the location of the data on the real number line (the mean, essentially), the dispersion of the data around its mean (standard deviation or variance), and whether the data are skewed to the right or left, and so forth.
Of course, just as the seamstress will want to see the person whom the pattern is to fit, the data analyst will want to look at the data before selecting the model. For example, two distributions can have the same mean and standard deviation but differ greatly in other respects. The two distributions in the diagram both have a mean of 5, but otherwise they are very different.
Epidemiology also uses mathematical models to determine expected outcomes. Suppose we want to estimate the risk of CHD for someone age 50 years, with systolic blood pressure of 130 mm Hg, serum cholesterol of 220 mg/dL, who has smoked a pack of cigarettes/day for 30 years.
Graphical examples of simple linear regression:
http://www.sjsu.edu/faculty/gerstman/StatPrimer/regression.pdf
http://cast.massey.ac.nz/core/index.html?book=biometric (more thorough)
Regression models are the kind most often used in epidemiology. With a regression model we begin with a concept of how risk of the outcome relates to a set of risk factors.
Each risk factor will need a multiplier (a “coefficient”) to translate its value into a risk-equivalent.
Then we use the data and statistical techniques (regression analysis) to estimate the most likely values of the coefficients. That process is called “fitting the model”.
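For simple linear regression, "fitting the model" can be shown in a few lines. The sketch below (Python; the x and y values are illustrative, not real study data) estimates the coefficient and intercept by least squares:

```python
x = [1, 2, 3, 4, 5]             # e.g., an exposure level
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # e.g., a continuous outcome

xbar = sum(x) / len(x)
ybar = sum(y) / len(y)

# Least-squares slope: covariance of x and y over the variance of x.
slope = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
intercept = ybar - slope * xbar
print(round(slope, 2), round(intercept, 2))  # slope ~1.99, intercept ~0.05
```

The estimated slope is the "coefficient" of the text: the translation of a one-unit change in the risk factor into a change in the outcome.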
P-values are ubiquitous in health research, though they are widely misunderstood as well. Here is an attempt to convey an intuitive sense of how they work.
A p-value might be regarded as the probability of obtaining an interesting-looking sample from a boring population. We know that even if nothing is going on in a population, a particular sample just might appear to have an intriguing association. The p-value is computed to tell us the probability of obtaining an unusual sample even when no such association exists in the underlying population.
Statistical power is in some respects the inverse of the p-value. Statistical power is the probability of obtaining an interesting-looking sample from an interesting population.
Both the p-value and statistical power are the probability of obtaining an interesting sample (i.e., one with an association of interest). But we know that by the vagaries of random sampling a particular sample might not represent the population well. So the p-value and power tell us how likely an interesting sample could arise in these two very different situations, the situation where there is no association in the population and the situation where there is an association in the population.
The problem with p-values is not so much what they try to do as how we try to interpret them. This slide shows two possible populations. The one on the left is the boring one – it has no association. The one on the right is the interesting one – in this population there is an association we are interested in detecting.
Suppose I conduct a sample survey – a cross-sectional study with a sample of people randomly selected from the population. I will use the OR as the measure of association, and to make the situation easier to diagram, I am going to show the OR on the log scale (the distribution of the log of the OR is symmetrical, whereas that of the OR is not). To orient you, an OR of 1.0 has a natural log of zero and would correspond to a “boring” population. An OR of 2.0 has a natural log of 0.7; an OR of 1.65 has a natural log of 0.5. It’s just a 1-for-1 transformation. [We will write the natural logarithm of the OR as ln(OR).]
So now I draw a sample, compute the ln(OR), and the result happens to be, say, 0.5. That’s an OR of 1.65, represented by the vertical blue line. I do not know what the population really looks like, so I consider the possibilities. One possibility is that there is no association in the population – i.e., it’s boring (that’s the one on the left), and the true value of the association is ln(OR)=0, or the OR=1.0. Another possibility is that the population is not a boring one – it might, for example, be the interesting one on the right, where the true value of the association is ln(OR)=0.7, in other words, OR=2.0. We would like to know the probability that the sample I obtained came from one of the interesting possible populations rather than from the boring population.
So we would like to know the probability that our sample came from an interesting population. But it’s much easier to figure out the reverse – the probability that a particular population would give rise to my sample. So the p-value is designed to provide the probability that, given the size of my study, the boring population would produce a sample as interesting as (or more interesting than) the one I obtained.
In the diagram, the normal-looking curve on the left shows the distribution of values of the ln(OR) that would be observed if I repeated my study a large number of times in a boring population. Most of the times the sample I obtain would be boring, but sometimes it would be interesting. The pink in the right tail of the graph shows the proportion of times that the sample I obtain would have a ln(OR) of 0.5 or greater. The pink area on the left of the distribution shows the probability that I would observe a ln(OR) of -0.5 or lower, which for now let’s think of as equally interesting. This is not quite the information I wanted to know, but it is nevertheless useful.
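The two-tailed probability described here can be computed with a normal approximation. In the sketch below (Python), the observed ln(OR) of 0.5 comes from the running example, while the standard error is an assumed value chosen for illustration:

```python
import math

def normal_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

observed_ln_or = 0.5  # an observed OR of about 1.65
se = 0.25             # hypothetical standard error of ln(OR)

# Probability that a "boring" population (true ln(OR) = 0) yields a
# sample at least this far from zero in either direction -- the two
# pink tails in the diagram.
z = observed_ln_or / se
p_value = 2 * (1 - normal_cdf(z))
print(round(p_value, 3))  # about 0.046 with these assumed numbers
```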
So the p-value tells me the probability that the boring population on the left would yield a sample as interesting or more so than the one I obtained. But the p-value does not tell me the probability that what I observed actually came from that boring population, i.e., that the association was due only to chance. The reason is that that probability – the probability that the sample I obtained came from the boring population – depends on how many boring populations I study and how many interesting populations I study.
For example, if I study only boring populations, the probability that my samples come from boring populations is 100%. Even when by chance I observe an interesting sample, as I will from time to time, if I study only boring populations, then that sample must have come from one.
In contrast, if I study only interesting populations, then all of my samples must come from interesting populations – even when by chance, as will happen, I get a boring sample (in other words, one that does not show an association).
If I have been rather unsuccessful in identifying worthwhile hypotheses to test, most of the populations I am studying are boring. Thus, even when the p-value is less than 0.05, there is a substantial probability that the sample simply represents an atypical sample from a boring population.
On the other hand, if you have been very successful in identifying worthwhile hypotheses to test, then most of the populations you study are interesting. Even if the p-value for a particular sample is greater than 0.05, there is a substantial probability that the sample simply represents an atypical sample from an interesting population.
So the probabilities that a given sample we obtain came from a boring or an interesting population depend on the relative proportions of boring and interesting populations that we study – information that’s generally not possible to know.
Suppose that every epidemiologist studies 10 interesting populations (or 10 true associations) and 100 boring populations (or 100 non-existent associations). If the statistical power (probability of obtaining an interesting sample from an interesting population) is 90%, then we expect that epidemiologists will obtain, on average, 9 interesting samples from the 10 interesting populations. Similarly, if our criterion for an “interesting sample” is a p-value less than 5% (that’s from our 5% significance level), then we expect epidemiologists to obtain, on average, 5 interesting samples from the 100 boring populations.
So these epidemiologists have observed, on average, 14 interesting samples, 5 of which came from boring populations. All these interesting samples had a p-value less than 5%, by our definition of an interesting sample. But the probability that a given interesting sample came from a boring population is 5/14 = 36%, not 5%!
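The arithmetic of this example is worth writing out (Python):

```python
# The worked example: 10 interesting populations studied with 90% power,
# 100 boring populations tested at a 5% significance level.
interesting_pops, boring_pops = 10, 100
power, alpha = 0.90, 0.05

true_positives = power * interesting_pops   # 9 interesting samples
false_positives = alpha * boring_pops       # 5 spurious "interesting" samples
total_significant = true_positives + false_positives  # 14 in all

prob_from_boring = false_positives / total_significant
print(round(prob_from_boring, 2))  # 0.36, not 0.05
```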
You may be noticing a similarity to the concept of positive predictive value that we studied in the lecture on population screening. Indeed, just as the predictive value of a positive screening test depends especially on the prevalence of the condition for which we are screening, the predictive value of a “significant” association depends on the proportion of interesting populations under study.
In that analogy, statistical power corresponds to sensitivity – the probability of observing a real association when there actually is one (i.e., classifying an interesting population as an “interesting” one). The significance level (alpha, the cutpoint for deciding what is a “significant” p-value) corresponds to the false positive rate (1 minus the specificity), the probability of classifying a boring population as an interesting one.
The table on the slide displays the numbers from the previous example, in the form we used for evaluating screening tests: sensitivity of 9/10 (90%), specificity of 95/100 (from the false positive rate [significance level] of 5/100), and PPV of 9/14.
So now you know that a p-value does not tell you the probability that a given result is due to chance (i.e., comes from a boring population) and that a “significant finding (p<0.05)” does not tell us that there is less than a 5% probability that the results were due to chance. We have to interpret a “significant” finding the way we would a positive result from a screening test with that false positive rate.
Setting a more stringent significance level (e.g., p-values < 0.01) reduces the false positive rate (increases specificity), which increases the probability that a “significant” finding was not due to chance (i.e., that a “significant” finding does come from an interesting population). But the actual probability depends on the proportion of interesting populations being studied as well as the significance level, just as positive predictive value depends upon disease prevalence and specificity.
I hope that the preceding discussion assists you in interpreting p-values you encounter in the literature. Let’s return now to the broader strategy of data analysis and interpretation, particularly attempts to infer causation from epidemiologic data.
The analysis is directed by the research questions. One category of research question is to gather information on the distribution of variables of interest. For example, we might be interested in conducting a study to estimate the distribution of serum cholesterol or blood lead levels in a population, or the prevalence of HIV or of use of well water.
Another category of research questions involves associations: Is blood lead level associated with elevated blood pressure? Do prepaid health plans provide more preventive care than fee-for-service plans? Do bednets protect against malaria?
Here is an example of data analysis in a study with a causal hypothesis: “does motorcycle helmet use reduce risk of death?” Daniel Norvell and Peter Cummings (American Journal of Epidemiology 2002;156:483-7) used data from the National Highway Traffic Safety Administration’s Fatality Analysis Reporting System, which collects information for all crashes on US public roads in which a fatality occurs. The primary exposure was helmet use; the primary outcome was death.
As you recall, the causal comparison that we would like to make contrasts the risk of death to motorcycle riders and passengers wearing helmets with the risk of death to motorcycle riders and passengers not wearing helmets. Since that comparison involves a counterfactual, we use a substitute population. What should that substitute be? If we compare death risks for helmeted riders with death risks for unhelmeted riders, we would certainly be concerned about differences between riders who wear helmets and riders who do not in regard to driving behavior and crash characteristics. The authors circumvented this concern to some extent by comparing the death risk of the driver and passenger on the same motorcycle. That comparison tends to equalize driver and crash-related factors. The authors also identified and controlled for a number of potential confounders: sex, seat position, age, and presence of a state helmet law (since a law requiring helmet use might lead crash survivors to report helmet use falsely, which would make helmets appear to be more protective, because only the survivors are able to report use).
The dataset included 9,222 driver-passenger pairs after exclusions. The primary analysis found a crude relative risk of 0.65 (95% confidence interval 0.57-0.74). When the association was adjusted for seat position, the relative risk estimate strengthened slightly, to 0.61.
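For readers who want to see the mechanics, the sketch below (Python) computes a crude relative risk with a Wald 95% confidence interval on the log scale, the scale on which such intervals are usually constructed. The counts are hypothetical, not the Norvell and Cummings data (the paper reports RR = 0.65, 95% CI 0.57-0.74):

```python
import math

# Hypothetical 2x2 counts chosen only to illustrate the calculation.
deaths_helmet, total_helmet = 650, 9222        # invented
deaths_nohelmet, total_nohelmet = 1000, 9222   # invented

rr = (deaths_helmet / total_helmet) / (deaths_nohelmet / total_nohelmet)

# Wald standard error of ln(RR) for cumulative-incidence data.
se = math.sqrt(1 / deaths_helmet - 1 / total_helmet
               + 1 / deaths_nohelmet - 1 / total_nohelmet)
lo = math.exp(math.log(rr) - 1.96 * se)
hi = math.exp(math.log(rr) + 1.96 * se)
print(round(rr, 2), round(lo, 2), round(hi, 2))
```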
Note that an overall measure of association, whether crude or adjusted, does not tell us whether the association is the same in various important groups. Whether or not a factor such as seat position is a confounder, it can define groups in which the association being measured is stronger or weaker (or absent). The authors investigated the possibility of effect measure modification by seat position and found a small difference: an adjusted relative risk of death of 0.65 for helmeted compared to unhelmeted drivers, and a slightly stronger association of 0.58 for helmeted versus unhelmeted passengers, though the confidence intervals overlapped considerably. (The authors also tried examining both seat position and sex simultaneously, but the two factors were very strongly related: 97.4% of the women were passengers.)
However, whether or not the crash involved a collision was indeed a powerful effect modifier. In the 88% of crashes involving a collision with a vehicle or object, the adjusted relative risk of death was 0.65 for a helmeted rider. By contrast, in crashes in which there was no collision (skidding, turning over), the adjusted relative risk was 0.36, so that helmet use appeared to be much more protective for non-collision crashes.