Tal Zarsky, "Correlation v. Causation in Health-Related Big Data Analysis: The Role of Reason and Regulation"


Published on

Part of the "2016 Annual Conference: Big Data, Health Law, and Bioethics" held at Harvard Law School on May 6, 2016.

This conference aimed to: (1) identify the various ways in which law and ethics intersect with the use of big data in health care and health research, particularly in the United States; (2) understand the way U.S. law (and potentially other legal systems) currently promotes or stands as an obstacle to these potential uses; (3) determine what might be learned from the legal and ethical treatment of uses of big data in other sectors and countries; and (4) examine potential solutions (industry best practices, common law, legislative, executive, domestic and international) for better use of big data in health care and health research in the U.S.

The Petrie-Flom Center for Health Law Policy, Biotechnology, and Bioethics at Harvard Law School 2016 annual conference was organized in collaboration with the Berkman Center for Internet & Society at Harvard University and the Health Ethics and Policy Lab, University of Zurich.

Learn more at

Published in: Healthcare

  1. Background
     • “Just Correlation” and predictive analytics in the medical and other contexts:
       - The Age of Big Data
       - Data-Driven Processes and Results
       - Putting the information to use
       - Reliance on “mere” correlations
  2. Roadmap
     • The rise of “Big Health Data”
     • What does mere reliance on correlation mean (examples)
       - Possible options, alternatives and outcomes
     • Pros and Cons of “Just Correlation”
       - Reliance on other disciplines.
     • Law and Policy implications and “hooks”
  3. “Big Health Data”
     • Health and Medical data held by new players, because of:
       - Definition change
       - New practices, sources and business models.
         ○ At times, these are startups.
     • Change reflected in some new legislation [GDPR in the EU].
       - Regulating health data calls for unique balancing:
         ○ Strong privacy preference vs. public benefits
  4. Example (1): Credit Data
     • “all data is credit data, we just don’t know how to use it yet”.
     • ZestFinance and others provide methods for credit ranking of the “underbanked”.
     • Most likely rely on correlations between attributes, factors and behaviors, and rates of payment or default.
     • These insights are used for prospective credit applicants.
  5. Example (2): Health Data & IoT
     • Wearables: gadgets affixed to the body that collect biometric and behavioral data.
       - Fitbit products provided to employees (for free!).
     • Possible future uses: calculating insurance premiums.
       - Similar processes carried out by smartphone applications.
     • Again, firms rely on “mere” correlations found in the data when making health-related recommendations and judgments.
  6. What Do We Mean by “Just Correlation”
     • Five possible variations of Big Data uses, relying upon:
       1. Mere correlations
       2. Correlation + statistical proof of causation.
       3. Correlation + experimental evidence of causation (natural or artificial manipulation).
       4. Correlation + reasonable mechanism hypothesis
       5. Correlation + scientifically proven mechanism found.
     “Mechanism” is a term of art: an explanation of a phenomenon.
     • Provides additional proof as to the existence of a causal relationship
     • Provides scientific knowledge.
  7. “Just Correlation” – What Can Go Wrong?
     • Possible outcomes when a correlation between Factor “A” and “B” was found:
       (i) A (indeed) causes B.
       (ii) A does not cause B. The data is wrong.
       (iii) A does not cause B. The correlation is spurious.
       (iv) A does not cause B. B causes A.
       (v) A does not cause B. C causes both A and B.
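
Outcome (v) on slide 7 can be illustrated with a small simulation. This is a minimal sketch and not part of the talk: the variable names, sample size, noise levels and seed are all assumptions chosen for illustration. A hidden factor `c` drives both `a` and `b`, which produces a clear correlation between them even though neither causes the other. Removing the linear effect of `c` from both (a partial correlation computed from regression residuals) makes the association largely disappear.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def residuals(ys, xs):
    """Residuals of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


# Hypothetical data: c is the hidden common cause; a and b never
# influence each other, they only share c plus independent noise.
rng = random.Random(0)
n = 5000
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]

raw_r = pearson(a, b)                                  # correlation seen in the data
partial_r = pearson(residuals(a, c), residuals(b, c))  # after controlling for c

print(f"raw correlation of A and B:   {raw_r:.3f}")
print(f"after controlling for C:      {partial_r:.3f}")
```

The catch is that this check only works when `c` has been measured. An analysis relying on “mere” correlation has no reason to go looking for it, which is one way to read the later point that mechanisms assist in revealing confounders.
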
  8. The Benefits of “Just Correlation”
     1. The need for speed.
     2. Low costs.
     3. Does not compromise precision.
     4. Does not steer science towards existing knowledge and theory
        - Limited bias against unexplainable findings.
  9. Just Correlation: Problems (1)
     • Causation as a “Quality Check”:
       - Assists in the removal of noise.
       - Protects us from “over-fitting”
         ○ Do we need a “mechanism”, or does statistical causation suffice?
       - Mechanisms assist in revealing confounders.
     • Having a theory enables generalization of findings.
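
The “noise” and “over-fitting” concerns on slide 9 can be made concrete with a short sketch. It is an illustration under assumed numbers (30 people, 500 attributes, a fixed seed), not an analysis from the talk. Every attribute here is pure random noise, unrelated to the outcome, yet searching across many attributes in a small sample still turns up one that looks strongly correlated. Measured again on a fresh, larger sample, the same kind of noise shows essentially no relationship.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


rng = random.Random(42)


def noise(n):
    """n independent standard-normal draws (no real signal at all)."""
    return [rng.gauss(0, 1) for _ in range(n)]


# Hypothetical setup: a small sample with many candidate attributes.
n_people = 30
n_attributes = 500
outcome = noise(n_people)
attributes = [noise(n_people) for _ in range(n_attributes)]

# "Mining" step: keep the attribute with the strongest in-sample correlation.
best_r = max(abs(pearson(attr, outcome)) for attr in attributes)

# Re-check step: the same noise process measured on a fresh, larger sample.
fresh_r = abs(pearson(noise(2000), noise(2000)))

print(f"strongest |r| among {n_attributes} noise attributes: {best_r:.2f}")
print(f"|r| for a noise attribute on fresh data:            {fresh_r:.2f}")
```

A requirement of statistical causation or a plausible mechanism acts as a filter at the mining step: an attribute selected only because it had the highest in-sample correlation would have no explanation to offer.
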
  10. Just Correlation: Problems (2)
     • Understanding mechanisms alerts us to possible side effects.
       - Important factor in the health context.
     • Seeking mechanisms leads to positive externalities: knowledge about nature and society.
     • In conclusion: causation provides important benefits and is essential in the health context.
       - A context-specific analysis is required to establish whether mechanisms are always mandated.
  11. Legal Hooks and Responses
     • Law should not intervene, because:
       - Markets will still self-correct if mere correlation is error-ridden (but…).
       - Intervention might undermine innovation.
       - Law should not meddle with science: it might serve self-interests, or get things wrong.
     • But…
       - Different rules should be applied when government is the source of data: it could require or restrict uses.
       - Specific interventions might be called for to protect the interests of investors, data subjects and those affected by the process.
  12. Investors
     • Protect investors from the executive’s reckless conduct – mere reliance on correlation.
     • But,
       - Investors should look after their own interests.
         ○ Assure disclosure pertaining to this specific matter.
  13. Data Subjects
     • Prediction often involves personal data
       - Compromises privacy rights and involves balancing.
       - Possible questions:
         ○ Was the data de-identified?
         ○ Was consent provided?
         ○ Should processing be allowed even without consent?
     • The privacy balance should consider overall benefits – and these require causation.
       - This balance will impact the legal findings as to whether data usage should be permitted.
  14. Impacted Individuals (1)
     • Correlations lead, at times, to negative treatment.
       - With health data, secondary effects might also follow (such as stigma).
     • Can those negatively impacted by a “mere” correlation bring action against a firm? Are such actions and outcomes “unfair”?
         ○ If a prediction proves wrong, equality is compromised.
       - Equals are not treated equally (FTC report).
       - However, private firms are not necessarily subjected to such a fairness requirement.
         ○ Protected groups might not be implicated.
         ○ Mitigation via competition (over time).
  15. Impacted Individuals (2)
     • When might the fear of unfair outcomes render “just correlation” unjust?
       - Government (higher fairness standard)
         ○ And also highly regulated industries…
         ○ “Socially meaningful” industries
       - Health-care, insurance, credit.
       - Monopoly (no mitigating competition)
       - In sum: the higher standard would often apply in the health and medical context.
  16. Thank you! Comments are welcome: