Crises of confidence and publishing reforms: What every social psychologist n... - Matti Heino
After half a century of talk, the researcher community is putting forth genuine efforts to improve social scientific practices in 2018.
This is a presentation for the University of Helsinki faculty of Social Sciences, on the recent developments in statistical practices and publishing reforms.
On my blog: https://wordpress.com/post/mattiheino.com/3438
The document discusses challenges in evaluating the mechanisms and theories behind how interventions and programs work. It notes that while an intervention may predict an outcome like increased physical activity, understanding how and why it worked requires testing the underlying program theory against alternatives. Evaluating mechanisms of change is difficult due to sample size limitations and the complexity of interventions. Overfitting theories to limited data is also a problem. The document raises open questions about whether program theories can be treated like other scientific theories and subjected to replication, falsification, and cross-validation, and how best to study mechanisms of action in complex, real-world interventions.
The Kyrgyz Republic established a national monitoring and evaluation (M&E) system beginning in the 2000s. As strategic planning increased the need for M&E and non-governmental organization involvement, the government began including M&E sections in programs and strategies from 2000 onward. A National M&E Network was formed in 2007 by NGOs and individuals to support M&E system development. While M&E practices were adopted, implementation has faced challenges of disconnected data collection across agencies and a lack of public input. The Network works to strengthen professional evaluation through training, publications, and events to help address these challenges and further establish M&E in governance.
Founded in 1984 with an initial membership of 12 evaluators, the Washington Evaluators (WE) has since grown to a professional and student membership base of more than 200 in the nation's capital. This presentation describes WE's experience in developing and maintaining a community of evaluation practitioners that includes a diverse mix of government, private, and self-employed evaluators as well as prominent evaluators in academia. It also discusses the strategies WE uses to foster personal connections and share information about the evaluation profession with both new and long-time evaluators.
Partnerships for Transformative Change in Challenging Political Contexts w/ D... - Washington Evaluators
The document summarizes a 4-day course on transformative evaluation held in Santiago, Chile in September 2016. The course was attended by 35 evaluators from several South American countries and focused on how evaluators can contribute to social justice and human rights through their work. It covered the transformative paradigm and questions about incorporating social change into evaluation design. Participants discussed solutions like empowering marginalized communities and forming diverse evaluation teams. The course organizers were flexible in bringing transformative evaluation concepts to different universities and organizations in Chile.
This document discusses the importance and challenges of conducting trustworthy A/B tests for online controlled experiments. It provides examples of three experiments conducted at Microsoft to evaluate ideas: 1) changing the location of the Windows search box, 2) truncating search engine result pages from 10 to 8 results, and 3) adding site links to Bing ads. For each experiment, participants were asked to predict the winning variant based on the overall evaluation criterion, with the intent of showing how difficult it can be to assess the value of ideas without controlled testing. The document emphasizes that most ideas do not significantly improve metrics and that incremental improvements are typically small, achieved through many small tests rather than rare breakthroughs. It stresses the importance of controlled experiments for
The document discusses correlation versus causality in experimental design. It provides examples of different types of experimental designs including randomized controlled trials, natural experiments, before-after designs, and differences-in-differences designs. It emphasizes the importance of randomness, control groups, and understanding the outcome variable when analyzing experimental data. Key considerations include whether the outcome is continuous or categorical and choosing the appropriate statistical tests accordingly. The document also discusses examples of experiments in various contexts like economics, policy, and online settings.
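Of the designs this summary lists, differences-in-differences is compact enough to sketch in a few lines. The numbers below are illustrative, not from the slides: the estimate is simply the change in the treated group minus the change in the control group.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Differences-in-differences: the change in the treated group's
    outcome minus the change in the control group's outcome."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Illustrative numbers: the treated group's mean outcome rises from
# 10 to 14 while the control group's rises from 10 to 11, so the
# estimated treatment effect is 3.
effect = diff_in_diff(10, 14, 10, 11)
print(effect)  # 3
```

The control group's change serves as the counterfactual trend, which is why the design depends on the "parallel trends" assumption: absent treatment, both groups would have moved together.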
Essay Good Health Adds Years To Life. Online assignment writing service. - Alexis Thelismond
The document provides instructions for using the HelpWriting.net service to have an essay written. It outlines a 5-step process: 1) Create an account with an email and password. 2) Complete a 10-minute order form with instructions, sources, and deadline. 3) Review bids from writers and choose one. 4) Review the completed paper and authorize payment. 5) Request revisions until satisfied with the work.
Running head DISCUSSION ESSAY1DISCUSSION ESSAY4Di.docx - todd271
Running head: DISCUSSION ESSAY
Discussion Essay
Name
Academic Institution
April 1, 2019
Discussion Essay
Social control plays a major role in my own life since it dictates what I should and should not do. This element claims a degree of my liberty to make choices, since I am compelled to please society or find myself in trouble. I am expected to socialize with a certain class of people, and breaking this norm may leave people feeling disappointed in me. I am also expected to carry myself in accordance with my age, or else people will think that I have lost my mind or am being childish, whereas I may simply be in a mood to let loose and live my life in a carefree way, even for a moment, because after all it is my life.
On the other hand, social control helps shape my life toward becoming a responsible youth, and the desire to meet this expectation clears my perception of matters and develops my perspective on what society considers moral or immoral. For example, it shapes my position regarding some activities that I would otherwise consider fun yet are, in the real sense, criminal in nature. As a young person, I feel energetic and adventurous, and fun for me is anything thrilling (Lilly et al., 2011). Presently, there are many activities a young person can indulge in for a thrilling experience: crazy driving, trying out drugs and other substances, or a weekend getaway under no adult supervision, to mention but a few. However, social control comes in handy and redirects such contemplations through the wisdom of experienced adults such as my parents, teachers, and other guardians in my life.
The power of social influence from my community has helped develop a sense of commitment within me to follow our social norms. As such, I see the effect of Travis Hirschi’s social bond theory, which supposes that delinquency occurs in the absence of social bonds, or when those bonds are weak (Hirschi, 2002), whereas crime is easily averted when social bonds are strong. In the event of social deviance, the strong association I share with my parents and community plays a vital role in dissuading me from indulging in delinquency, because I have accepted the social conditions of my social group.
Social conditioning has helped me become a college student instead of being involved in criminal activity. I come from a family that does not take misbehavior kindly; getting involved in criminal activity is met with harshness from my parents, my father especially. I remember a time immediately after receiving my college acceptance letter. A new neighbor moved in with their two sons, who were my age, and I was more than thrilled to have them for company. As it turned out, both boys were using pot, and they introduced me to it one rainy Saturday evening (Lilly et al., 2011). My first experience set me out of control.
The document discusses confounding in epidemiological studies. It explains that confounding occurs when the comparison group in a study is not equivalent to the unexposed version of the exposed group, due to differences in other risk factors between the groups. Specifically, it notes that confounding is a problem of comparison, as we compare the exposed group to a substitute population that does not actually show what would have happened to the exposed group without exposure. The document emphasizes that understanding and controlling for confounding is important in determining whether an observed association is truly causal.
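One standard way to probe the comparison problem this summary describes is to contrast a crude risk ratio with stratum-specific ones. The counts below are hypothetical, chosen so that a confounder (say, age) fully explains the crude association:

```python
def risk_ratio(exposed_cases, exposed_n, unexposed_cases, unexposed_n):
    """Risk of the outcome in the exposed divided by risk in the unexposed."""
    return (exposed_cases / exposed_n) / (unexposed_cases / unexposed_n)

# Crude comparison across everyone suggests exposure raises risk:
crude = risk_ratio(30, 100, 18, 100)   # 0.30 / 0.18, about 1.67

# But within strata of the confounder, the association vanishes
# (the stratum counts sum to the crude counts above):
older = risk_ratio(25, 50, 10, 20)     # 0.50 / 0.50 = 1.0
younger = risk_ratio(5, 50, 8, 80)     # 0.10 / 0.10 = 1.0
```

Here the exposed group simply contains more older (higher-risk) people than the comparison group, so the substitute population fails to show what would have happened to the exposed group without exposure, which is exactly the point the document makes.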
This document summarizes a statistics lecture about the research process and why statistics are needed in optometry and vision science. It discusses the steps of evidence-based practice including asking questions, acquiring evidence, appraising evidence, and applying evidence. It also covers generating and testing theories, levels of measurement, measurement error, validity, reliability, types of research such as correlational and experimental research, and methods of data collection and analysis. The goal is to explain the research process and why statistics are an essential tool for evidence-based practice in optometry.
The Real Lessons of Dr. Deming’s Red Bead Factory - Mark Graban
The red bead experiment, created by Dr. Deming, demonstrates how variation exists in any process and is mostly due to common causes within the system, not individual performance. In the experiment, workers try to produce a standard number of beads per trial but often fail due to the inherent variation in the bead drawing process. This shows that blaming individuals and incentivizing performance does not work. The key lessons are that the system, not individuals, is usually the cause of variation, and the focus should be on understanding and reducing common cause variation through systematic improvements.
Based on 11 responses collected through Unanimous AI to a question about a fair movie ticket price, this document compares Unanimous's performance to simulated questionnaires. With the full 11 responses, 54.5% of Unanimous answers were within $0.25 of the average price, compared to 44% for simulated 36-person questionnaires, a non-significant difference. However, with samples of 15 or fewer, Unanimous responses were more accurate: 80% were within $0.25 of the average for samples of 5, significantly more accurate than the 23% for simulated 9-person questionnaires. More data are needed to generalize these results.
Unit 5Instructions Enter total points possible in cell C12, under.docx - marilucorr
This document provides instructions for scoring an evidence-based clinical question rubric. It includes a rubric with criteria for evaluating a refined PICOT question, systematic review of the clinical question using databases, description of the systematic review and errors analysis, determination of an evidence-based quantitative article from the search, summarization of the selected case study, description of the study approach and population, application of evidence to practice, evaluation of outcomes and validity/reliability, discussion of potential bias, determination of evidence level, length, and format/style following APA guidelines. Scores between 0-4 are entered in the yellow cells for each criteria, with the total score out of 100% calculated at the bottom along with comments on
This research shows only that marital status is correlated with call duration; it did not quantify the relationship between them. In addition, duration's relationship with other dimensions of customer information matters for predicting duration and targeting valuable customers, which calls for further research such as regression analysis.
- The document describes Stanley Milgram's famous experiment on obedience to authority from 1963. In the experiment, participants were instructed to administer electric shocks to a learner for incorrect answers, though no actual shocks were given.
- About 65% of participants administered what they believed were severe electric shocks, showing high obedience to authority. Each participant can be viewed as a Bernoulli trial with a probability of 0.35 of refusing the shock.
- The document then discusses using the binomial distribution to calculate probabilities of outcomes with a given number of trials and probability of success for each trial. It provides the formula and conditions for applying the binomial distribution.
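The calculation the slides describe follows directly from the standard binomial probability mass function. The choice of k = 3 refusals out of n = 10 participants below is illustrative; the refusal probability p = 0.35 comes from the summary above.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k): probability of exactly k successes in n independent
    Bernoulli trials, each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability that exactly 3 of 10 participants refuse, with p = 0.35:
p3 = binom_pmf(3, 10, 0.35)
print(f"{p3:.4f}")  # -> 0.2522
```

The conditions the slides list (fixed number of trials, independence, constant p, binary outcome) are exactly what justifies multiplying the single-trial probabilities and counting arrangements with the binomial coefficient.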
SEO split tests you should run - Will Critchlow
This document summarizes key points about running SEO split tests and the scientific method. It discusses how to generate hypotheses, run controlled experiments, and analyze results using statistical methods to determine if changes lead to significant traffic increases. Specific tests mentioned include adding structured data, improving meta descriptions, making sites mobile-friendly, testing tabbed versus flat content layouts, and keyword targeting. Challenges discussed include not assuming traffic equality between test groups and accounting for seasonality.
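One way to avoid the pitfall the talk mentions, assuming the two page groups get equal traffic, is a permutation test on the observed traffic difference. The sketch below uses illustrative daily session counts, not real data, and asks how often a difference this large would arise from relabelling alone:

```python
import random
from statistics import mean

def permutation_test(control, variant, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in mean daily
    sessions between a control page bucket and a variant bucket."""
    rng = random.Random(seed)
    observed = mean(variant) - mean(control)
    pooled = control + variant
    n = len(control)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[n:]) - mean(pooled[:n])
        if abs(diff) >= abs(observed):
            hits += 1
    return observed, hits / n_perm

# Illustrative daily organic sessions for the two page buckets:
control = [120, 135, 128, 110, 140, 125, 132]
variant = [150, 160, 145, 155, 170, 148, 158]
obs, p_value = permutation_test(control, variant)
print(f"observed lift = {obs:.1f} sessions/day, p = {p_value:.4f}")
```

Seasonality still needs handling separately, for example by comparing each bucket to its own forecast rather than to raw pre-test traffic, as the talk's caveats suggest.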
Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 1... - MLconf
This document discusses lessons learned from combining human judgment with algorithmic recommendations at Stitch Fix. It outlines three key lessons: 1) Success can be measured in multiple ways, like agreement between humans and algorithms or user experience; 2) Algorithms should predict both item selection and success, and disagreements can provide useful feedback; 3) Having humans involved makes experiments more complicated since humans may selectively not comply with proposed recommendations or policies. The overall message is that combining human and algorithmic systems can be effective but requires careful consideration of different objectives and how humans interact with and provide feedback to the algorithms.
This document discusses correlation versus causation and the design of experiments. It begins by noting the difference between correlation and causation, as a correlation does not necessarily imply causation. It then discusses various examples where a correlation was observed but upon further examination it was found there was no causal relationship or the relationship was more complex than initially thought. The document emphasizes the importance of experimental design and controlling for confounding variables to establish causal relationships rather than just correlations. It provides several examples of experiments and their designs.
I gave this talk at a Nigeria Health Summit in March 2016. It was an introduction to impact evaluation: what it is, when it's a good idea, and some possible approaches.
This document provides data from a study on steroid usage, grip strength, aggression, and happiness. It includes the following information:
1) A table with data from 14 participants, including the number of weeks on steroids, grip strength, aggression score, happiness score, and investigator number.
2) Descriptions of what each column in the data table represents, such as the scales for grip strength, aggression, and happiness.
3) The sample size is 14 participants.
The Washington Eval membership survey found:
- Most members joined to learn about evaluation theories/practice and make connections.
- Monthly brown bags, deep dives, and social events are most popular. Preferred times are on-demand, 12-2pm, and 5:30-6pm.
- The weekly digest is most useful for sharing events, jobs, and opportunities.
- Most support increasing dues to $30, offering a two-year option, and auto-renew with opt-in.
- Members are generally satisfied with WE's diversity efforts but want more training and DEI incorporation.
- Many members expressed interest in pro bono, mentoring, and volunteer opportunities.
Are you interested in supporting emerging evaluators and developing the evaluation profession in the Washington, DC area? Are you an emerging evaluator interested in improving your skills and understanding or moving into a different field? This presentation will provide information on ways that Washington Evaluators members can engage in Mentor Minutes.
Mentor Minutes is an initiative that aims to connect current WE members to experienced evaluation professionals in the WE community through short-term mentorship opportunities. The purpose of Mentor Minutes is to pair experienced evaluators (mentors) with aspiring, emerging, or seasoned evaluators (mentees) and establish mutually beneficial professional connections.
George Julnes: Humility in Valuing in the Public Interest - Multiple Methods ... - Washington Evaluators
Roundtable: Contributions of Cost-Effectiveness Studies to Evidence-Based Policymaking
Washington Evaluators and the Bipartisan Policy Center's Evidence-Based Policymaking Initiative are pleased to co-sponsor a roundtable discussion about the contributions of cost-effectiveness studies to informing policy decisions in government. This panel discussion will explore approaches to conducting cost-effectiveness studies, their value and use in government decisions, and practical steps for improving their utility for decision-makers. The distinguished panelists have collectively experienced the generation and use of cost-effectiveness studies from a variety of academic, non-governmental, and governmental positions. We invite you to join us on Tuesday, December 5th at 2 PM for a lively discussion of the implications of cost-effectiveness research on government decision making.
Harry Hatry: Cost-Effectiveness Basics for Evidence-Based Policymaking - Washington Evaluators
The panel discussion will be introduced and chaired by Nick Hart, Director of BPC's Evidence-Based Policymaking Initiative and the 2017 Washington Evaluators President.
Panelists:
Harry Hatry, Distinguished Fellow and Director of the Urban Institute's Public Management Program
George Julnes, Professor in the University of Baltimore's School of Public and International Affairs
Sandy Davis, Senior Advisor to BPC's Evidence-Based Policymaking Initiative
Running head DISCUSSION ESSAY1DISCUSSION ESSAY4Di.docxtodd271
Running head: DISCUSSION ESSAY
1
DISCUSSION ESSAY
4
Discussion Essay
Name
Academic Institution
April 1, 2019
Discussion Essay
Social control plays a major role in my own life since it dictates what I should do and what I should not. This element claims a degree of my liberty to make choices since I am compelled to please society or find myself in trouble. By this, I am expected to socialize with a certain class of people or else breaking this norm may leave people feeling disappointed with me. I am also expected to carry myself in accordance with my age or else people will think that I have lost my mind or being childish, while as I may simply be in a mood to let loose and just live my life in a care free way even for a moment, because after all it is my life.
On the other hand, social control helps to shape my life in becoming a responsible youth, and the desire to meet this expectation helps in clearing my perception of matters, which also develops my perspective in relation to what society considers moral or immoral. For example, it shapes my position regarding some activities that I would otherwise consider fun yet in the real sense are criminal in nature. As a young person, I feel energetic and adventurous and fun for me is anything thrilling (Lilly et al., 2011). Presently, there are many activities that a young person can indulge in for a thrilling experience. They could include crazy driving, trying out drugs and other substances, or a weekend getaway spree under no adult supervision, just to mention but a few. However, social control comes in handy and redirects such contemplations through the guiding sense it offers through the wisdom of experienced adults such as my parents, teachers, and other guardians in my life.
The power of social influence from my community has helped to develop a sense of commitment within me to follow our social norms. As such, I would say that I see the effect of Travis Hirschi’s social bond theory, which supposes that delinquency occurs in the absence of, or when social bonds are weak (Hirschi, 2002). However, crime is easily averted when social bonds are strong. As such, in an event of social deviance, the strong association I share with parents and community plays a vital role of dissuading me from indulging in delinquency because I have accepted the social conditions of my social group.
Social conditioning has helped me to become a college student instead of being involved in criminal activity. I come from a family that does not take misbehavior kindly. Getting involved in criminal activity is met with harshness from my parents, my father especially. I remember this time immediately after receiving my college acceptance letter. A new neighbor moved in with their two sons of my age and I was more than thrilled to have them for company. Apparently, both boys were using pot and they introduced me on this rainy Saturday evening (Lilly et al., 2011). My first experience set me out of contro.
The document discusses confounding in epidemiological studies. It explains that confounding occurs when the comparison group in a study is not equivalent to the unexposed version of the exposed group, due to differences in other risk factors between the groups. Specifically, it notes that confounding is a problem of comparison, as we compare the exposed group to a substitute population that does not actually show what would have happened to the exposed group without exposure. The document emphasizes that understanding and controlling for confounding is important in determining whether an observed association is truly causal.
This document summarizes a statistics lecture about the research process and why statistics are needed in optometry and vision science. It discusses the steps of evidence-based practice including asking questions, acquiring evidence, appraising evidence, and applying evidence. It also covers generating and testing theories, levels of measurement, measurement error, validity, reliability, types of research such as correlational and experimental research, and methods of data collection and analysis. The goal is to explain the research process and why statistics are an essential tool for evidence-based practice in optometry.
The Real Lessons of Dr. Deming’s Red Bead FactoryMark Graban
The red bead experiment, created by Dr. Deming, demonstrates how variation exists in any process and is mostly due to common causes within the system, not individual performance. In the experiment, workers try to produce a standard number of beads per trial but often fail due to the inherent variation in the bead drawing process. This shows that blaming individuals and incentivizing performance does not work. The key lessons are that the system, not individuals, is usually the cause of variation, and the focus should be on understanding and reducing common cause variation through systematic improvements.
Based on data from 11 responses from Unanimous AI asking about a fair movie ticket price, this document compares the performance of Unanimous to simulated questionnaires. With the full 11 responses, 54.5% of Unanimous answers were within $0.25 of the average price, compared to 44% of simulated 36-person questionnaires, a non-significant difference. However, with samples of 15 or less, Unanimous responses were more accurate, with 80% within $0.25 of the average for samples of 5, significantly more accurate than the 23% of simulated 9-person questionnaires. More data is needed to generalize these results.
Unit 5Instructions Enter total points possible in cell C12, under.docxmarilucorr
This document provides instructions for scoring an evidence-based clinical question rubric. It includes a rubric with criteria for evaluating a refined PICOT question, systematic review of the clinical question using databases, description of the systematic review and errors analysis, determination of an evidence-based quantitative article from the search, summarization of the selected case study, description of the study approach and population, application of evidence to practice, evaluation of outcomes and validity/reliability, discussion of potential bias, determination of evidence level, length, and format/style following APA guidelines. Scores between 0-4 are entered in the yellow cells for each criteria, with the total score out of 100% calculated at the bottom along with comments on
This research only shows that marital status is correlated with call duration; it did not quantify the relationship. Moreover, duration's relationship with other dimensions of customer information is also important for predicting duration and targeting valuable customers, and calls for further research such as regression analysis.
- The document describes Stanley Milgram's famous experiment on obedience to authority from 1963. In the experiment, participants were instructed to administer electric shocks to a learner for incorrect answers, though no actual shocks were given.
- About 65% of participants administered what they believed were severe electric shocks, showing high obedience to authority. Each participant can be viewed as a Bernoulli trial with probability 0.35 of refusing to administer the shock.
- The document then discusses using the binomial distribution to calculate probabilities of outcomes with a given number of trials and probability of success for each trial. It provides the formula and conditions for applying the binomial distribution.
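To make the binomial setup concrete, here is a minimal sketch. Only the refusal probability p = 0.35 comes from the discussion above; the group size of 10 is a hypothetical choice for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Treat each participant as a Bernoulli trial with p = 0.35 of refusing.
p_refuse = 0.35
n = 10  # a hypothetical group of 10 participants

# Probability that exactly 3 of 10 refuse to continue
p_exactly_3 = binomial_pmf(3, n, p_refuse)

# Probability that at most 3 of 10 refuse (cumulative)
p_at_most_3 = sum(binomial_pmf(k, n, p_refuse) for k in range(4))

print(f"P(exactly 3 refuse) = {p_exactly_3:.3f}")
print(f"P(at most 3 refuse) = {p_at_most_3:.3f}")
```

The conditions for applying the distribution are visible in the code: a fixed number of independent trials, each with the same two outcomes and the same success probability.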
SEO split tests you should run - Will Critchlow
This document summarizes key points about running SEO split tests and the scientific method. It discusses how to generate hypotheses, run controlled experiments, and analyze results using statistical methods to determine if changes lead to significant traffic increases. Specific tests mentioned include adding structured data, improving meta descriptions, making sites mobile-friendly, testing tabbed versus flat content layouts, and keyword targeting. Challenges discussed include not assuming traffic equality between test groups and accounting for seasonality.
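As a sketch of the kind of significance testing such split tests rely on: the traffic numbers below and the choice of a permutation test are illustrative assumptions, not Critchlow's actual method or data.

```python
import random

def permutation_test(test, control, n_perm=10_000, seed=42):
    """Two-sided permutation test for a difference in mean traffic.

    Returns the observed difference in means and an approximate p-value:
    the share of random relabelings at least as extreme as observed.
    """
    rng = random.Random(seed)
    observed = sum(test) / len(test) - sum(control) / len(control)
    pooled = list(test) + list(control)
    n_test = len(test)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = (sum(pooled[:n_test]) / n_test
                - sum(pooled[n_test:]) / (len(pooled) - n_test))
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_perm

# Hypothetical daily organic sessions for pages in each group
test_pages = [120, 135, 128, 140, 131, 125, 138, 144, 129, 136]
control_pages = [118, 122, 119, 125, 121, 117, 124, 120, 123, 122]

diff, p_value = permutation_test(test_pages, control_pages)
print(f"observed uplift: {diff:.1f} sessions/day, p ≈ {p_value:.4f}")
```

A permutation test sidesteps the assumption that the two groups' traffic is normally distributed, which matches the talk's warning not to assume traffic equality between test groups.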
Brad Klingenberg, Director of Styling Algorithms, Stitch Fix at MLconf SF - 1... (MLconf)
This document discusses lessons learned from combining human judgment with algorithmic recommendations at Stitch Fix. It outlines three key lessons: 1) Success can be measured in multiple ways, like agreement between humans and algorithms or user experience; 2) Algorithms should predict both item selection and success, and disagreements can provide useful feedback; 3) Having humans involved makes experiments more complicated since humans may selectively not comply with proposed recommendations or policies. The overall message is that combining human and algorithmic systems can be effective but requires careful consideration of different objectives and how humans interact with and provide feedback to the algorithms.
This document discusses correlation versus causation and the design of experiments. It begins by noting the difference between correlation and causation, as a correlation does not necessarily imply causation. It then discusses various examples where a correlation was observed but upon further examination it was found there was no causal relationship or the relationship was more complex than initially thought. The document emphasizes the importance of experimental design and controlling for confounding variables to establish causal relationships rather than just correlations. It provides several examples of experiments and their designs.
I gave this talk at a Nigeria Health Summit in March 2016. It was an introduction to impact evaluation: what it is, when it's a good idea, and some possible approaches.
This document provides data from a study on steroid usage, grip strength, aggression, and happiness. It includes the following information:
1) A table with data from 14 participants, including the number of weeks on steroids, grip strength, aggression score, happiness score, and investigator number.
2) Descriptions of what each column in the data table represents, such as the scales for grip strength, aggression, and happiness.
3) The sample size is 14 participants.
Similar to The Importance of Systematic Reviews (17)
The Washington Eval membership survey found:
- Most members joined to learn about evaluation theories/practice and make connections.
- Monthly brown bags, deep dives, and social events are most popular. Preferred times are on-demand, 12-2pm, and 5:30-6pm.
- The weekly digest is most useful for sharing events, jobs, and opportunities.
- Most support increasing dues to $30, offering a two-year option, and auto-renew with opt-in.
- Members are generally satisfied with WE's diversity efforts but want more training and DEI incorporation.
- Many members expressed interest in pro bono, mentoring, and volunteer opportunities.
Are you interested in supporting emerging evaluators and developing the evaluation profession in the Washington, DC area? Are you an emerging evaluator interested in improving your skills and understanding or moving into a different field? This presentation will provide information on ways that Washington Evaluators members can engage in Mentor Minutes.
Mentor Minutes is an initiative that aims to connect current WE members to experienced evaluation professionals in the WE community through short-term mentorship opportunities. The purpose of Mentor Minutes is to pair experienced evaluators (mentors) with aspiring, emerging, or seasoned evaluators (mentees) and establish mutually beneficial professional connections.
George Julnes: Humility in Valuing in the Public Interest - Multiple Methods ... (Washington Evaluators)
Roundtable: Contributions of Cost-Effectiveness Studies to Evidence-Based Policymaking
Washington Evaluators and the Bipartisan Policy Center's Evidence-Based Policymaking Initiative are pleased to co-sponsor a roundtable discussion about the contributions of cost-effectiveness studies to informing policy decisions in government. This panel discussion will explore approaches to conducting cost-effectiveness studies, their value and use in government decisions, and practical steps for improving their utility for decision-makers. The distinguished panelists have collectively experienced the generation and use of cost-effectiveness studies from a variety of academic, non-governmental, and governmental positions. We invite you to join us on Tuesday, December 5th at 2 PM for a lively discussion of the implications of cost-effectiveness research on government decision making.
Harry Hatry: Cost-Effectiveness Basics for Evidence-Based Policymaking (Washington Evaluators)
The panel discussion will be introduced and chaired by Nick Hart, Director of BPC's Evidence-Based Policymaking Initiative and the 2017 Washington Evaluators President.
Panelists:
Harry Hatry, Distinguished Fellow and Director of the Urban Institute's Public Management Program
George Julnes, Professor in the University of Baltimore's School of Public and International Affairs
Sandy Davis, Senior Advisor to BPC's Evidence-Based Policymaking Initiative
The DC Consortium Student Conference on Evaluation and Policy (SCEP) is a collaboration of universities in the District of Columbia, Northern Virginia and Maryland regions, representing the interests of students aspiring to be evaluators and policy makers. This collaboration aims to provide students with a platform to present their research and engage with evaluation experts in the opportunity-rich region of Washington, D.C., thereby serving as a bridge between students, academia and other evaluation and policy agencies/organizations. In this presentation, students from the Organizing Committee discuss lessons learned from DC SCEP’s inaugural conference. Features of the conference include a keynote address, interdisciplinary panel, and about 30 student presentations. We will highlight lessons learned concerning how the conference served to broker knowledge towards its theme, ‘Advancing Social Justice in Evaluation and Policy Integration’ with Consortium graduate students in the region.
The document summarizes findings from three recent GAO reports on the use of evidence in federal decision making. It discusses the results of GAO's 2017 survey of federal managers which found no significant increase in the use of performance measures or information in decision making. It also summarizes the GAO's assessment of GPRAMA implementation and key findings about quarterly performance reviews and program evaluation from the manager survey. The document concludes with a recommendation that OMB direct each agency to prepare an annual evaluation agenda.
As evaluators, policy makers, and program managers, we want our efforts to solve the problems of the world to be based on the best possible knowledge. Too often, however, that knowledge is not organized in a way that makes it easy to use for decision-making and action. As a result, too many programs fail to meet their potential.
“Causal knowledge mapping” is a technique for integrating and measurably improving knowledge from a broad range of sources. In this webinar, we’ll use real-world examples and interactive conversations to show three kinds of causal knowledge maps that can benefit an evaluation: (1) Collaborative maps to design programs that fit the local situation; (2) Literature maps to identify and improve upon effective practices; (3) Evaluation findings maps for continual improvement.
Transitioning from School to Work: Preparing Evaluation Students and New Eval... (Washington Evaluators)
Unlike some professions, there is no single path for making the leap from student to new professional to established member of the profession. In large part this is because of the trans-disciplinary nature of the evaluation field and the broad range of professions and sectors (public, non-profit, private) in which evaluation and social science research skills may be useful. This panel will explore the many approaches used by universities in the Washington, DC area to train graduate and undergraduate students in the field of evaluation, and the transition strategies that help students and new evaluators establish themselves in the evaluation field. The seven distinguished panelists are all associated with Washington Evaluators, and have served in AEA and/or WE leadership positions. Panelists and our Discussant will be asked to address questions such as:
1. In which disciplines/schools at your university would we expect to find courses in evaluation or related to evaluation?
2. What are the components of the evaluation curricula? Do you offer a degree or major field in evaluation?
3. Do you offer hands-on experiences for your students to design and conduct evaluations?
4. Where have your former students worked in the evaluation field, and what kinds of careers have they had?
5. What advice do you have for new evaluators regarding making the shift from school to work in the evaluation field? What types of professional and networking activities would you recommend to further careers in evaluation?
Challenges and Solutions to Conducting High Quality Contract Evaluations for ... (Washington Evaluators)
Challenges and Solutions to Conducting High Quality Contract Evaluations for the U.S. Government
Washington Evaluators Brown Bag
July 7, 2015
Presenter: David J. Bernstein
Discussant: Kathryn E. Newcomer
Lessons from World Bank Support for Evidence-Based Policy Making, Presented by Nils Junge on Wednesday, June 17, 2015 from 12 - 1:30 pm in the George Washington University Marvin Center (Room 308).
Since the late 1990s the World Bank has placed greater and greater emphasis on evidence-based policy making, with a specific focus on how the poor and vulnerable are affected. A commonly used approach is ‘Poverty and Social Impact Analysis’ (PSIA), typically undertaken before development projects are approved. PSIAs are implemented with the express purpose of informing public sector reforms in order to mitigate negative distributional impacts. To identify winners and losers of a given policy reform, PSIAs may use or combine various kinds of analysis: statistical, econometric cost-benefit, social, stakeholder, political economy, etc. Strongly utilization-focused, the evaluation process is often as important as the analytical work itself. After introducing PSIA methods, the presenter will share practical lessons from 12 years conducting PSIAs and some of the challenges inherent in this exciting area of evaluation.
Nils Junge works internationally as an independent evaluator and policy advisor. In addition to advising the World Bank and government counterparts on addressing reform impacts, he has conducted evaluations for over 20 clients in Africa, Eastern Europe and the Middle East/North Africa. Multi-lingual, he has worked in 5 languages. He has an MA from Johns Hopkins – School of Advanced International Studies (SAIS).
This document outlines the current state of monitoring and evaluation (M&E) in Tajikistan. It discusses the country's background, M&E system and players, possibilities and limitations. It also describes Tajikistan's National M&E Network, which was established in 2008 and includes over 100 members. The Network aims to share information, expand partnerships, and build M&E capacity in Tajikistan through activities like attending international conferences and developing local language resources. Overall, the document provides an overview of M&E practice in Tajikistan and the goals of the National M&E Network to further develop the field.
Washington Evaluators (WE) is a local affiliate of the American Evaluation Association (AEA). WE was founded over 30 years ago as a professional society devoted to fostering state-of-the-art knowledge and information sharing.
Ann K. Emery gave a brown bag presentation on visualizing evaluation results to the Washington Evaluators on September 15, 2014 at George Washington University. The presentation highlighted tips for creating effective data visualizations including using intentional color schemes, ensuring visuals are accessible on websites and social media, and using checklists to guide design. Emery emphasized the importance of visualizing both qualitative and quantitative evaluation findings to tell compelling stories with data.
Influencing Evaluation Policy and Practice: The American Evaluation Associati... (Washington Evaluators)
Influencing Evaluation Policy and Practice: The American Evaluation Association's Evaluation Policy Task Force by Cheryl J. Oros, Ph.D., Consultant to the Evaluation Policy Task Force
Title: Emerging directions and challenges in survey methods
Presented by: Jolene Smyth, Associate Professor, Department of Sociology,
University of Nebraska-Lincoln
Brown Bag co-sponsored by the Evaluation Institute and Washington Evaluators
State of Evaluation 2012: Evaluation Practice and Capacity in the Nonprofit S... (Washington Evaluators)
Washington Evaluators Brown Bag
by Johanna Morariu, Kat Athanasiades, and Ann Emery
February 25, 2013
Nonprofits hear a lot of talk about evaluation these days—metrics and measurements, indicators and impact, efficiency and effectiveness. Everyone seems to want evaluation results—from nonprofit staff themselves to donors to board members. But there’s a gap in the conversation: What are nonprofits really doing to evaluate their work? How are they using evaluation results? Do nonprofit staff have the knowledge, skills, and resources they need to carry out effective evaluation? We answered these questions—and many others—in State of Evaluation 2012: Nonprofit Evaluation Practice and Capacity. The State of Evaluation project is the first nationwide project that systematically and repeatedly collects data from nonprofits about their evaluation practices. During the brown bag, we’ll discuss the new findings from this research. For example, we found that: 90% of organizations report evaluating their work (up from 85% in 2010); 100% of organizations reported using and communicating their evaluation findings; Budgeting for evaluation is still low—more than 70% of organizations are spending less than 5% of organizational budgets on evaluation; and, on average, evaluation—and its close relation, research—continue to be the lowest priorities for nonprofits (compared to fundraising, financial management, communications, etc.).
Innovation Network is a nonprofit evaluation, research, and consulting firm. We provide knowledge and expertise to help nonprofits and funders learn from their work to improve their results. Johanna Morariu, Director, leads the work of the organization as well as projects with a broad assortment of philanthropic and nonprofit organizations. Kat Athanasiades and Ann Emery, Associates, provide project support.
Consistent Protocol, Unique Sites: Seeking Cultural Competence in a Multisite... (Washington Evaluators)
Washington Evaluators Brown Bag
by Ladel Lewis
August 28, 2012
Evaluating one site of a federally funded, longitudinal, multi-site initiative to improve services for children with mental health issues and their families presents numerous challenges. Many individuals, particularly racial minorities, are understandably reluctant to participate or remain in an evaluation concerning such sensitive issues. Further, not all the sites fit neatly into the same “one size fits all” evaluation protocol that must be used at all the sites. Cultural competence is crucial regarding: (1) breaking the barriers to participation; (2) balancing the traditional perspectives of “informed consent” and “confidentiality” with those of the participants; (3) balancing the need for consistent measures in our national study with the local realities of our participants; (4) interpreting and reporting the results. Seeking input from stakeholders at each step of the evaluation helped us recognize and overcome these barriers, and attain equitable recruitment and retention rates among Caucasian and African-American participants.
Ladel Lewis received a B.A. in Criminal Justice from the University of Michigan in 2001 and a M.A. in Sociology in 2005 from Western Michigan University. Studying evaluation research under Dr. Chris Coryn at the Evaluation Center, she earned her Ph.D. in Sociology in 2012 at Western Michigan University. She has published journal articles across disciplines such as “User Perceptions of Accessible GPS as a Wayfinding Tool for Travelers with Visual Impairments” published in the AER Journal: Research and Practice in Visual Impairment and Blindness, “White Thugs & Black Bodies: A Comparison of the portrayal of African American Women in Hip-Hop Videos” published in the Hilltop Journal and “Lights, Camera Action: The Portrayal of African American Women In Hip Hop Videos” in the Call & Response Journal.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake (Walaa Eldin Moustafa)
Dynamic policy enforcement is becoming an increasingly important topic in today’s world, where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) they are auto-generated from declarative data annotations; (2) they respect user-level consent and preferences; (3) they are context-aware, encoding a different set of transformations for different use cases; (4) they are portable: while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
State of Artificial Intelligence Report 2023 (kuntobimo2016)
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
Analysis insight about a Flyball dog competition team's performance (roli9797)
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
Unleashing the Power of Data: Choosing a Trusted Analytics Platform.pdf (Enterprise Wired)
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
Global Situational Awareness of A.I. and where it's headed (vikram sood)
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf (GetInData)
Recently we have observed the rise of open-source Large Language Models (LLMs) that are community-driven or developed by the AI market leaders, such as Meta (Llama3), Databricks (DBRX) and Snowflake (Arctic). On the other hand, there is a growth in interest in specialized, carefully fine-tuned yet relatively small models that can efficiently assist programmers in day-to-day tasks. Finally, Retrieval-Augmented Generation (RAG) architectures have gained a lot of traction as the preferred approach for LLMs context and prompt augmentation for building conversational SQL data copilots, code copilots and chatbots.
In this presentation, we will show how we built upon these three concepts a robust Data Copilot that can help to democratize access to company data assets and boost performance of everyone working with data platforms.
Why do we need yet another (open-source ) Copilot?
How can we build one?
Architecture and evaluation
1. The Campbell Collaboration - www.campbellcollaboration.org
Howard White
CEO, Campbell Collaboration
@washeval @c2update @HowardNWhite
The importance of systematic reviews
3. The seven-piece study
But these are observational data, which don't control for selection bias (people who eat more than five portions a day are wealthy, educated health fanatics).
The five-piece study
This is a systematic review, using data from 16 high-quality studies (observational data, but the analysis controls for confounders).
7. Errors in hypothesis testing

                   H0 correct                        H0 false
Don't reject H0    No error                          Type II error ('false negative')
Reject H0          Type I error ('false positive')   No error
8. Hypothesis testing: if H0 is true (H0: β = 0)
[Figure: sampling distribution of the estimate under H0, with the central "don't reject H0" region.] There is a 5% chance that, when H0 is true, you get a sample that leads you to reject H0.
9. Errors in hypothesis testing

                   H0 true             H0 false
Don't reject H0    No error            Type II error
Reject H0          Type I error = 5%   No error
10. Hypothesis testing: if HA is correct
[Figure: sampling distributions centred at 0 (under H0) and at HA.] We incorrectly reject HA approximately 40% of the time. Power = 1 - Type II error rate.
11. Errors in hypothesis testing

                   H0 true             H0 false
Don't reject H0    No error            Type II error (maybe 20%, but often 40-60%)
Reject H0          Type I error = 5%   No error
12. The horrifying truth about hypothesis testing
• If the null hypothesis is correct (null = no programme impact), then we will correctly agree with the null 95% of the time (we are wrong 5% of the time).
• But if the null hypothesis is wrong (the programme works), then we probably incorrectly conclude the programme doesn't work 40-60% of the time!!!
13. Implications
• An under-powered RCT is no better than tossing a coin at determining if a successful programme is working, so
• Power, power, power
• We also need to replicate 'unsuccessful' programmes
• And we really REALLY need to do systematic reviews with meta-analysis….
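The coin-toss claim is easy to check by simulation. This is a hedged sketch: the effect size, sample sizes, and the two-sample z-test approximation are illustrative assumptions, not from the slides.

```python
import random
from statistics import mean, stdev
from math import sqrt

def simulate_power(n_per_arm, effect, sims=2000, seed=1):
    """Share of simulated two-arm trials in which a true effect of
    `effect` standard deviations is detected at the 5% level,
    using a simple two-sample z-test approximation."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(sims):
        control = [rng.gauss(0, 1) for _ in range(n_per_arm)]
        treated = [rng.gauss(effect, 1) for _ in range(n_per_arm)]
        se = sqrt(stdev(control) ** 2 / n_per_arm
                  + stdev(treated) ** 2 / n_per_arm)
        if abs((mean(treated) - mean(control)) / se) > 1.96:
            detected += 1
    return detected / sims

# An under-powered trial (25 per arm) chasing a modest 0.4 SD effect
print("power, n=25 per arm :", simulate_power(25, 0.4))   # roughly 0.3
# Quadrupling the sample size brings power near the conventional 0.8
print("power, n=100 per arm:", simulate_power(100, 0.4))
```

With 25 per arm, a genuinely effective programme is missed most of the time, which is the slide's point: the Type II error rate of a small trial can easily sit in the 40-60% range.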
14. Pooling evidence
[Forest plot: 1 = no effect; on one side of 1 the intervention works, on the other it is harmful.] So pooling data allows us to overcome the high risk of Type II error. And goal scoring can be very misleading (DON'T do it).
15. A real life example
Corticosteroids for women about to deliver prematurely: a 30-50% reduction in mortality.
19. Coffee and liver cancer
But coffee is good for you.
Source: Li-Xuan Sang et al., Consumption of coffee associated with reduced risk of liver cancer: a meta-analysis. BMC Gastroenterol. 2013; 13: 34. doi: 10.1186/1471-230X-13-34
20. And this really matters… a growing number of studies show most things don't work.
• Education: 90 interventions evaluated in RCTs by IES; 90% had weak or no positive effects.
• Employment/training: of the Department of Labor-commissioned RCTs, 75% had weak or no positive effects.
• Business: of over 13,000 RCTs of new products/strategies conducted by Google and Microsoft, 80-90% had no significant effects.
But are these sufficiently powered??? Need to combine the evidence.
21. There's more to a systematic review than meta-analysis
- Systematic search
- Systematic screening
- Systematic coding
- Systematic synthesis
- Systematic presentation of results
Not being systematic introduces bias, as we shall see.
22. The Campbell Collaboration www.campbellcollaboration.org
What the evidence synthesis process should look like
Source: Julia Littell: Campbell Systematic Reviews:
Evidence for Implementation and Impact, GIC Dublin 2015
23. The Campbell Collaboration www.campbellcollaboration.org
What the evidence synthesis process actually looks like
Source: Julia Littell: Campbell Systematic Reviews:
Evidence for Implementation and Impact, GIC Dublin 2015
Over-emphasis on significant findings
Studies with significant findings are 2-3 times more likely to be published
Selective presentation of results
24.
An example: the treatment of results from a single study of parent training (PT) versus multi-systemic therapy (MST, a branded programme)
RCT assigning 43 abusive or neglectful families to either:
• Parent training: group sessions discussing parenting techniques
• Multi-systemic therapy: individual family treatment tackling multiple issues, e.g. expectations regarding child behaviour, child management, emotional support, parental behaviour change
The study looked at 30 outcomes on individual and child functioning, stress etc.
25.
Outcome reporting and confirmation bias in action
9 out of 14 reviews report just one outcome from the paper, favouring MST
Source: Julia Littell: Campbell Systematic Reviews:
Evidence for Implementation and Impact, GIC Dublin 2015
26.
And how reviewers summarized the Brunk et al.
paper…
Source: Julia Littell: Campbell Systematic Reviews:
Evidence for Implementation and Impact, GIC Dublin 2015
No difference
27.
What the systematic review says
Out of home placement: no difference
Delinquency: no difference
Family cohesion: no difference
28.
Systematic reviews rebalance the evidence
pyramid
Narrative reviews: more than 100 narrative reviews, most stating MST is more effective than alternatives
Systematic review: MST is not consistently better or worse than alternatives
29.
So drop narrative reviews in favour of systematic
reviews to rebalance the evidence pyramid
30.
And in some areas, SRs have already made a
difference
Crime and justice
1970s “Nothing works”
Analysis of 231 studies:
“With few exceptions, the rehabilitative efforts that have
been reported so far have no appreciable effect on
recidivism”
Lipton, Martinson and Wilks ‘The Effectiveness of
Correctional Treatment, 1975
Abandonment of rehabilitation in the US and other countries (prison at least gives an incapacitation effect)
But ….
31.
But the analysis was goal scoring: it showed ‘mixed evidence’ for all categories of intervention, though in fact 48% of studies found significant positive effects. There is no mixed evidence, only poorly synthesized evidence: meta-analysis gave clear results.
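Why vote counting fails can be shown with a small simulation (all parameters invented for illustration): thirty underpowered trials of a genuinely effective intervention mostly come out individually ‘non-significant’, so counting significant results suggests ‘mixed evidence’, while inverse-variance pooling of the very same studies recovers the effect.

```python
import math
import random

random.seed(42)
TRUE_EFFECT, SIGMA, N = 0.2, 1.0, 40   # small true effect, 40 subjects per arm

def simulate_study():
    """One underpowered two-arm trial; returns (effect estimate, standard error)."""
    treat = [random.gauss(TRUE_EFFECT, SIGMA) for _ in range(N)]
    ctrl = [random.gauss(0.0, SIGMA) for _ in range(N)]
    diff = sum(treat) / N - sum(ctrl) / N
    se = SIGMA * math.sqrt(2 / N)
    return diff, se

studies = [simulate_study() for _ in range(30)]

# Vote counting ("goal scoring"): count individually significant studies.
sig = sum(1 for d, se in studies if d - 1.96 * se > 0)
print(f"{sig}/30 studies individually significant: looks like 'mixed evidence'")

# Inverse-variance pooling of the same 30 studies gives a clear answer.
weights = [1 / se**2 for _, se in studies]
pooled = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
print(f"pooled effect {pooled:.2f}, "
      f"95% CI ({pooled - 1.96 * pooled_se:.2f}, {pooled + 1.96 * pooled_se:.2f})")
```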
32.
Rehabilitation works
Review of 9 meta-analyses found ALL had positive average treatment effect
Source: Lipsey and Wilson (1993)
And prison is no better than non-custodial sentences… or possibly worse
(RCTs & 1 natural experiment; PSM studies)
Source: Villettaz et al., 2013
33.
Which means that…
• Prison is at best no better and possibly worse than non-custodial
sentences
• Each additional year of prison increases recidivism by 3-4 per cent
Source: Petrosino et al. Scared Straight and Other Juvenile Awareness
Programs for Preventing Juvenile Delinquency: A Systematic Review
Campbell Systematic Reviews 2013:5
‘Shock approaches’ such as boot camps and scared straight are unsuccessful and even harmful
34.
And this evidence is being used
“We must use sound,
research-based,
rehabilitation programmes
for offenders so they do
not re-offend.”
35.
Similar story in policing
1997 Sherman et al. Preventing Crime: What works, what doesn’t,
reviewing over 500 crime prevention programmes
UK £250 million ‘Crime reduction programme’
Source: Braga et al. Hot spots policing effects on crime.
Campbell Systematic Reviews 2012:8
For example, hot spots policing
36.
Reviews can be used to answer both first
generation questions (does it work) and second
generation (design) questions.
Examples of looking at design questions
37.
First generation: payment for environmental services
Effects of PES on forest cover change rate due to deforestation (ra − rc)
[Forest plot, scale −0.010 to 0.010: Arriagada et al. 2011 (Costa Rica PSA, 1998-2005); Pfaff et al. 2008 (Costa Rica PSA, 1997-2000); Robalino et al. 2008 (Costa Rica PSA, 2000-2005); Alix-Garcia et al. 2012 (Mexico PSAH, 2004-2006); Honey-Roses et al. 2011 (Mexico MBCF, 2000-2009); random-effects mean with predictive interval; I² = 67.9%, τ² = 0]
Tiny effect: a 0.3% reduction in deforestation, i.e. after 10 years, 97% of the land for which payments were received would still have been forested in the absence of the programme
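The ‘97%’ figure follows from simple arithmetic, assuming the 0.3-percentage-point annual reduction simply accumulates over ten years (a small-rate approximation):

```python
# Annual reduction in deforestation attributable to PES payments
# (about 0.3 percentage points per year, from the pooled estimate above).
annual_reduction = 0.003
years = 10

# Additive small-rate approximation: extra forest retained over ten years.
extra_forest = annual_reduction * years        # 0.03, i.e. 3%

# So ~97% of the paid-for land would have stayed forested without the programme.
counterfactual_forested = 1 - extra_forest
print(f"{counterfactual_forested:.0%}")        # prints "97%"
```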
38.
What works by type of programme for teenage pregnancy
Source: Scher et al. Interventions Intended to Reduce
Pregnancy-Related Outcomes Among Adolescents. Campbell
Review 2006:12
39.
Impact of welfare-to-work schemes by administration
Source: Smedslund et al. Work Programs for Welfare Recipients. Campbell Review 2006:9
[Bar chart, effect scale 0.00-1.20: employment, earnings, welfare payments and welfare proportion outcomes, broken down by administration: Johnson, Ford, Carter, Reagan (1st), Reagan (2nd), Clinton, Bush]
40.
Second generation: conditional cash transfers (CCTs)
• Mexico: Progresa launched 1996 (renamed Oportunidades)
• Brazil: 12 million families by 2010
Cash payment on conditions:
• Education: 80% attendance and maintaining a certain grade
• Health: ante-natal care, child immunization
Targeted both geographically and by means test
Design questions:
• Do conditions matter?
• Timing, nature and size of payment
• Who to give it to
41.
Meta-analysis allows us to get at design features,
for example …
CCTs have a larger effect on enrolment rates:
• for secondary than for primary schooling
• the larger the transfer
• the less frequent the transfer
• if conditions include achievement, not just attendance
And…
42.
Conditionality works
Children are 60% more likely to be in school with conditionality which is monitored and enforced, compared to no conditions
But we need a lot of primary studies to exploit heterogeneity
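Exploiting heterogeneity in practice means subgroup comparisons or meta-regression across many primary studies. A minimal sketch of a subgroup z-test, with entirely hypothetical pooled estimates for ‘monitored and enforced conditions’ versus ‘no conditions’ (the numbers are invented to echo the "60% more likely" figure, not taken from the review):

```python
import math

# Hypothetical pooled log odds ratios (and SEs) of school enrolment for two
# subgroups of CCT studies. exp(0.47) ~ 1.6, i.e. "60% more likely".
conditional = (0.47, 0.08)     # conditions monitored and enforced
unconditional = (0.18, 0.10)   # no conditions

# Simple fixed-effect test for a difference between the subgroup estimates.
diff = conditional[0] - unconditional[0]
se_diff = math.sqrt(conditional[1] ** 2 + unconditional[1] ** 2)
z = diff / se_diff
print(f"subgroup difference {diff:.2f} (log odds), z = {z:.2f}")
```

With few studies per subgroup the SEs blow up and such contrasts become uninformative, which is exactly why many primary studies are needed.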
43.
Water supply and sanitation interventions
4 main types of intervention:
– Water supply improvement: source or point-of-use
– Water quality: water treatment/protection at source or
point-of-use (households)
– Sanitation: provision of facilities (improved latrines,
sewer connection)
– Hygiene: soap, hygiene education
Usual outcome variable is child diarrhoea
44.
Search strategy:
- Title review of 19,233 papers identified from searches of databases, organisations and communication with researchers
- 110 studies identified from the bibliographies of previous reviews; full text copies obtained of all 110
- Abstract review of 278 papers, with full text copies obtained for 68 of these
Review against inclusion criteria:
- 11 studies from searches met the inclusion criteria
- 54 studies from previous reviews met the inclusion criteria
- 65 studies (71 interventions) included in meta-analysis
45.
Effectiveness results pooled (outcome = child diarrhoea; random-effects weights; a ratio below 1 favours the intervention):
- Water supply interventions: 0.98 (0.89, 1.06)
- Water quality interventions: 0.58 (0.50, 0.67)
- Sanitation interventions: 0.63 (0.43, 0.93)
- Hygiene interventions: 0.69 (0.61, 0.77)
- Multiple interventions: 0.62 (0.46, 0.83)
BUT evidence is largely from trials, not actual projects, and there are hints of weak sustainability
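The subtotal ratios above can be checked directly: back out each log-scale standard error from the CI width (a standard inverse-variance trick) and see which subgroups clearly reduce diarrhoea. A sketch using the numbers read off the plot:

```python
import math

# Subtotal risk ratios (point estimate, 95% CI) for child diarrhoea,
# read off the forest plot above.
subtotals = {
    "water supply": (0.98, 0.89, 1.06),
    "water quality": (0.58, 0.50, 0.67),
    "sanitation": (0.63, 0.43, 0.93),
    "hygiene": (0.69, 0.61, 0.77),
    "multiple": (0.62, 0.46, 0.83),
}

for name, (rr, lo, hi) in subtotals.items():
    # Back out the log-scale standard error from the CI width (CI = est ± 1.96·SE).
    log_se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
    verdict = "reduces diarrhoea" if hi < 1.0 else "no clear effect"
    print(f"{name:13s} RR={rr:.2f} log-SE={log_se:.3f} {verdict}")
```

Only water supply has a CI spanning 1, matching the slide’s point that quality, sanitation, hygiene and combined interventions show clear effects.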
46.
Sustainability 1: less impact over longer periods (random-effects weights; a ratio below 1 favours the intervention):
- Water supply (12 months or more): 0.82 (0.71, 0.96)
- Water quality (under 12 months): 0.56 (0.47, 0.66)
- Water quality (12 months or more): 0.81 (0.67, 0.97)
- Sanitation (12 months or more): 0.64 (0.37, 1.10)
- Hygiene (under 12 months): 0.72 (0.60, 0.86)
- Hygiene (12 months or more): 0.67 (0.49, 0.91)
- Multiple (under 12 months): 0.41 (0.23, 0.74)
- Multiple (12 months or more): 0.77 (0.70, 0.85)
47.
Sustainability 2: low compliance after a while
• Ceramic filter provision in Cambodia: 3 years later only 31% of households were still using the filters (Brown et al., 2007)
• Pasteurisation in Kenya: 4 years later only 30% continued to pasteurise their water (Iijima et al., 2001)
• Programme promoting point-of-use water disinfectant in Guatemala: 1 year later, repeated use among only 5% of households from the original trials (Luby et al., 2008)
• Water filters in Bolivia: compliance 67%, assessment made 4 months after the trial ended (Clasen et al., 2006)
48.
Sustainability 3: lack of willingness to pay (WTP)
In Kenya, access to free chlorine increased uptake to over 60
percent, whereas coupons for even a 50 percent discount had a
minimal effect
49.
So the systematic review tells us that:
– Point-of-use water treatment has a large health effect (community-level treatment doesn’t)
– But the challenge is to ensure sustained proper use
– Any water supply intervention not taking this demand element into account should be questioned
50.
A quick word on the Campbell Collaboration
• Coordinating groups (CGs) for
– Crime and Justice
– Education
– International development
– Social welfare
• CGs manage editorial process
– Three stage process: title, protocol, review
– Any team can submit proposed title
• All published in Campbell Library, managed by Secretariat in Oslo
51.
In summary
• Rigorous evidence matters
• High quality systematic reviews sort out what is rigorous and what is not
• And synthesize the evidence in policy-relevant ways, telling us what works and why
• Make them and use them!!
52.
Thank you
Visit our website
www.campbellcollaboration.org
Follow us on Twitter @C2update & Facebook