This document discusses types of data torture and how to avoid forcing confessions from data. It defines two types of data torturing: opportunistic, where associations are found in the data and hypotheses formed to fit them; and Procrustean, where hypotheses are decided on in advance and the data made to fit. Clues to detecting data torture include whether findings came from primary or secondary hypotheses and whether all data groups were analyzed. The document advises asking data questions respectfully rather than torturing it to avoid forced confessions.
Don't Torture Data, Ask Nicely!
1. Digital You Can Trust |
TRUSTED CONF.
CONTACT: Roberta Cardoso
DATE: July 2019
DATA TORTURE
2.
BETA
ROBERTA CARDOSO
TECHNICAL DATA ANALYST
HI, I’M BETA…
● Brazilian
● Data Analyst
● Mother of a sweet 10yo girl
● Caretaker of 3 dogs and 6 cats
● Balancing nature and technology in everyday life
3. DIGITAL MARKETING & ANALYTICS |
If you want reliable confessions, don’t torture your data, ask nicely.
AGENDA
Data Torture
Types of Data Torture
Clues to Data Torture
Are You Forcing
Confessions?
4.
If you torture the data long enough, it will confess to anything.
- Darrell Huff
Source: How to Lie With Statistics, 1954, ISBN 0393310728
6.
Types of Data Torturing: Opportunistic and Procrustean
Source: Mills, 1993, pp. 1196–1199
7.
Opportunistic
Pores over the data until a "significant" association is found, then devises a plausible hypothesis to fit the association. This makes it very hard for readers to tell that the positive association didn’t spring from an a priori hypothesis. When many independent tests are performed, the probability of a correct conclusion drops drastically.
Procrustean
Performed by deciding on the hypothesis to be proved and making the data fit it. Its results are often more believable if one starts with a popular hypothesis. It is also more destructive, because it may produce results that are seen as definitive proof of the hypothesis.
Source: Mills, 1993, pp. 1196–1199
8.
Procrustean data torturing is more difficult to carry out than opportunistic data torturing, because it requires selective reporting, but its results are often more believable.
Source: Mills, 1993, pp. 1196–1199
9.
There is a chance of doing this unintentionally.
10.
Comparing a current value to an average or target value.
Source: Marcey L. Abate - DATA TORTURING AND THE MISUSE OF STATISTICAL TOOLS
11.
Performing trend analysis.
Source: Marcey L. Abate - DATA TORTURING AND THE MISUSE OF STATISTICAL TOOLS
12.
Clues to Data Torture
Source: Marcey L. Abate - DATA TORTURING AND THE MISUSE OF STATISTICAL TOOLS
● Did the reported findings result from testing a primary hypothesis or an a posteriori
hypothesis?
● Does the hypothesis have good supporting data from previous studies?
● Have data been reported for all groups in the study, or were certain groups excluded from the analysis, and if so, why?
● Was the effect of multiple comparisons discussed and statistically managed?
● How many significant results were reported relative to the number of comparisons made?
● Was the research outcome defined before collecting the data?
16. DELIVERED BY EXPERTS
Our global team of expert consultants and practitioners has been hand-selected from thousands of applicants.
17. We’re a global online marketing agency managed from one of the finest beaches on the planet.
GET IN TOUCH
Editor's Notes
Proud resident of the Chapada Diamantina National Park in Brazil
Data Analyst with about 10 years of experience in Digital Marketing Analysis
I’m passionate about Project Management, process design and optimisation for Big Data
The first time I read something similar to this quote was on the website of a Data Science Institute. They were using it as their motto: "We torture data until it confesses".
In the beginning, it made total sense to me, because I'm a Data Analyst, and my job is to get answers from the data.
I Googled: "Data Torture", and found the root of this quote in a book from 1954, where the author picks apart how marketers manipulate statistics and data visualization to trick the public. The book is named "How to Lie With Statistics".
At that point, I was relieved that I hadn't changed my LinkedIn headline from Data Analyst to Data Torturer.
It became evident to me that data torturing is less about answering questions and more about forcing confessions of whatever the torturer wants to prove.
Like other forms of torture, if it’s done skillfully, data torturing won’t leave incriminating evidence.
So, the unfortunate result of torturing data is getting anything but the truth.
In short, data torturing is ethically problematic because neither the reported data nor the explanations or hypotheses the data torturer offers are all that trustworthy.
In 1993, Doctor James Mills published an article in The New England Journal of Medicine, where he refers to two types of Data Torturing:
1) Opportunistic
2) Procrustean
We’re going to briefly cover both now:
Opportunistic torture is performed by running many independent tests, which decreases the probability of a correct conclusion.
For instance:
If the CvR for a current ad and its creative variation differed by 5% or 10%, how would we know whether the difference was due to chance?
By a reasonably arbitrary convention, a result is declared not due to chance if the probability value (p-value) is less than 0.05. This means that when the two ads do not actually differ, there is a 5% chance of wrongly concluding that they do, and a 95% probability of correctly inferring that there is no difference between them.
The problem is that when many independent tests are performed, this probability drops drastically: if we run 20 such tests, the probability that all the conclusions are correct is only about 36% (0.95^20 ≈ 0.36).
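The arithmetic behind that 36% figure can be sketched in a few lines of Python (the ad-testing framing and the test counts are just illustrations, not real data):

```python
# Family-wise error for repeated independent tests at alpha = 0.05.
# The 36% figure from the text is 0.95 ** 20: the probability that
# none of 20 tests on truly identical ads comes out "significant".
alpha = 0.05

for n_tests in (1, 5, 10, 20):
    p_all_correct = (1 - alpha) ** n_tests   # no false positives at all
    p_false_alarm = 1 - p_all_correct        # at least one false positive
    print(f"{n_tests:2d} tests: P(all correct) = {p_all_correct:.2f}, "
          f"P(at least one false positive) = {p_false_alarm:.2f}")
```

One common remedy, in the spirit of the "multiple comparisons" clue later in the deck, is to tighten the per-test threshold, e.g. Bonferroni's alpha / n_tests.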
Procrustean data torturing is about manipulating the data so that they prove the desired hypothesis.
It’s more difficult to carry out than opportunistic data torturing, because it requires selective reporting, but its results are often more believable.
It’s also more destructive, because it may produce results that are seen as definitive proof of the hypothesis.
It can take several forms:
Exposure may be redefined in a way that strengthens the association. For example, a study of a website’s organic traffic attributed a notable uplift in CTR to an SEO improvement while defining the exposure as starting 30 days before the intervention; choosing an inappropriately extended period produced a positive result by including unknown interventions unrelated to the tested optimization.
Study pages whose results don’t support the hypothesis may be intentionally dropped.
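As a toy illustration of that selective-reporting move (the uplift numbers below are invented for this sketch):

```python
# Hypothetical CTR uplifts (percentage points) measured on 10 pages
# after some optimization. Averaged honestly, there is no effect.
uplifts = [0.4, -0.3, 0.1, -0.5, 0.2, -0.1, 0.3, -0.4, 0.0, 0.2]

honest_avg = sum(uplifts) / len(uplifts)

# Procrustean move: silently drop the pages that don't support the
# hypothesis, then report the average of what's left.
supportive = [u for u in uplifts if u > 0]
tortured_avg = sum(supportive) / len(supportive)

print(f"honest average uplift:   {honest_avg:+.2f}")
print(f"tortured average uplift: {tortured_avg:+.2f}")
```

The "tortured" figure looks like a solid win even though the full data show essentially nothing, which is why a reader should ask whether all study groups were reported.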
Can you see how easy it is to slip from one impression to something quite different by taking different approaches to interpreting the data?
Data torturing simply reflects that if you keep coming at the data from different angles, you can get a whole range of answers; there is also a chance you are doing this unintentionally.
A common method for analyzing data is to compare a current value to an average or target value.
This form of data torturing may lead to acting on a perceived difference when none really exists.
Comparisons to averages, specifications, and targets ignore common variability and treat every fluctuation as something special.
Another example of the dangers associated with this type of analysis: the CTR for a particular web page is plotted over time, one monthly value falls well below the overall average, and a “red flag” is raised, ignoring the time progression that could explain the fluctuation.
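The difference between comparing to a bare average and allowing for common-cause variability can be sketched like this (the monthly CTR values are made up):

```python
import statistics

# Hypothetical monthly CTR values (%) for one page over a year.
ctr = [2.1, 2.4, 1.9, 2.2, 2.6, 1.8, 2.3, 2.0, 2.5, 2.2, 1.7, 2.4]

mean = statistics.mean(ctr)

# Naive rule: flag every month below the average. Roughly half the
# months get flagged even though the process is stable.
flagged_vs_mean = [m for m, v in enumerate(ctr, 1) if v < mean]

# Control-chart thinking: flag only points outside mean +/- 3 sigma,
# i.e. beyond ordinary month-to-month variability.
sigma = statistics.stdev(ctr)
lo, hi = mean - 3 * sigma, mean + 3 * sigma
flagged_vs_limits = [m for m, v in enumerate(ctr, 1) if not lo <= v <= hi]

print(f"months flagged vs the average:    {flagged_vs_mean}")
print(f"months flagged vs 3-sigma limits: {flagged_vs_limits}")
```

With these particular numbers, the naive rule flags five ordinary months while the control limits flag none; only a point outside the limits would signal something genuinely special.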
Trend analysis is often misused to make decisions with either limited data points or inadequate knowledge about the process that creates the data.
This practice may result in data torturing by wrongly identifying the type of trend or by leading one to conclude that a trend exists when in fact it does not.
Consider the data points shown in Figure #1. It is difficult, if not impossible, to formulate a meaningful interpretation from only three data points without a broader contextual basis, yet it is all too common for such data to be labeled an “upward trend”.
In Figures #2 and #3, the last three data points in each run chart are the same points as those given in Figure #1.
Making decisions and taking action on perceived trends from limited data points will almost surely result in data torturing by either underreacting or overreacting.
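A minimal sketch of why three points make a poor trend; the "stable process" here is synthetic noise around a constant level, so any apparent trend is illusory:

```python
import random

random.seed(7)
# A hypothetical stable process: 12 readings around 100 with only
# common-cause noise, i.e. no real trend at all.
series = [100 + random.gauss(0, 5) for _ in range(12)]

def slope(ys):
    """Least-squares slope of ys against x = 0, 1, ..., len(ys)-1."""
    n = len(ys)
    mx = (n - 1) / 2
    my = sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in enumerate(ys))
    den = sum((x - mx) ** 2 for x in range(n))
    return num / den

# A window of only three points can show a steep "trend" that the
# fuller picture does not support.
print(f"slope over the last 3 points: {slope(series[-3:]):+.2f}")
print(f"slope over all 12 points:     {slope(series):+.2f}")
```

Whether the last three points happen to slope up or down depends entirely on the noise; acting on that short-window slope would be exactly the under- or overreaction described above.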
Data torturing can rarely be proved. There are, however, clues that should arouse the reader's suspicion.
In conclusion, here are some of Mills’ recommendations for assessing allegedly statistically significant findings.
These shortcomings make evident the importance of applying statistical thinking even when using basic statistical tools.
As repeatedly shown, failure to consider the processes, variation, and data within the mindset of statistical thinking can result in faulty decisions and actions.
In summary, because statistical thinking requires a focus on the process, the application of the associated concepts will increase the effectiveness of statistical tools and help to prevent data torturing.