Measuring reliability and validity in human coding and machine classification

Slides delivered as a part of #CAQDAS14.

In 1989 the Department of Sociology at the University of Surrey convened the world's first conference on qualitative software, which brought together qualitative methodologists and software developers to debate the pros and cons of using technology for qualitative data analysis. The result was a book (Fielding & Lee, 1991, Using Computers in Qualitative Research, Sage Publications), the setting-up of the CAQDAS Networking Project, and many other conferences on these topics over the years.

This conference will be another opportunity for methodologists, developers and researchers to come together and debate the issues. There will be keynote papers by leading experts in the field, software support clinics and opportunities to present work in progress.


  Measuring Reliability and Validity in Human Coding and Machine Classification
  Dr. Stuart Shulman
  May 2, 2014
  CAQDAS Conference 2014
  "…a wealth of information creates a poverty of attention." - Herbert Simon, 1971
  • 2. Acknowledgements
    • This research has been supported by grants from the National Science Foundation (NSF) and was supplemented through interagency agreements between the US Environmental Protection Agency, the US Fish & Wildlife Service, and the NSF.
      – EIA 0089892 (2001-2002): “SGER Citizen Agenda-Setting in the Regulatory Process: Electronic Collection and Synthesis of Public Commentary”
      – EIA 0327979 (2003-2004): “SGER Collaborative: A Testbed for eRulemaking Data”
      – SES 0322662 (2003-2005): “Democracy and E-Rulemaking: Comparing Traditional vs. Electronic Comment from a Discursive Democratic Framework”
      – IIS 0429293 (2004-2007): “Collaborative Research: Language Processing Technology for Electronic Rulemaking”
      – SES-0620673 (2007): “Coding across the Disciplines: A Project-Based Workshop on Manual Text Annotation Techniques”
      – IIS-0705566 (2007-2010): “Collaborative Research III-COR: From a Pile of Documents to a Collection of Information: A Framework for Multi-Dimensional Text Analysis”
    • Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the National Science Foundation.
  • 4. Qualitative Methods: Genes, Taste, or Tactic?
    • Qualitative by birth or choice?
      – Some look to words as an alternative to number crunching
      – Others rooted in rich and meaningful interpretive traditions
    • Another group is fluent in both qual & quant
      – Mixed methods open up rather than limit fields of knowledge
    • One central goal is valid inferences about phenomena
      – Replicable and transparent methods
      – Attention to error and corrective measures
      – Internal and external validation of results
    • Using computers for qualitative data analysis helps, but…
      – Rigor still originates with the research design, not the technology
      – Software makes better organization and efficiency possible
      – Coders enable the researcher to step back while scaling up
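
The "attention to error" point above can be made concrete with an inter-coder agreement statistic. The slides do not name a specific measure, so the following is only a minimal sketch that assumes Cohen's kappa for two coders; the function name `cohens_kappa` and the ten "pro"/"con" labels are hypothetical and chosen for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same items in the same order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items that received the same code from both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement, estimated from each coder's marginal code frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1.0:
        # Both coders used one identical code throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two coders to the same ten comments.
coder_a = ["pro", "pro", "con", "con", "pro", "con", "pro", "pro", "con", "con"]
coder_b = ["pro", "con", "con", "con", "pro", "con", "pro", "pro", "con", "pro"]
kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.2f}")
```

Here the coders agree on 8 of 10 items and chance agreement is 0.5, so kappa comes out at roughly 0.6. Raw percent agreement alone would overstate reliability because it ignores agreement expected by chance.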
  • 5. A spectrum of approaches to working with qualitative data
    Different types of knowledge claims depending where you sit
    • Purist: deep immersion, closeness to data, antipathy to numbers, credible interpretation, in-depth analysis, contextual, subjective
    • Pluralist: experimental, mixed method, adaptive hybrid, flexible approach, interdisciplinary
    • Positivist: quantitative, focus on error, measurement critical, validity and reliability, replication & objectivity, generalization, hypotheses
    These choices are philosophical, ideological, political and ethical
  • 6. Emergent properties found in very well-read texts, such as the character type “extremist agent of the law”
  • 7. Agenda-setting in the press
  • 8. Relations between Classes; Rates and Terms for Credit; Farm Profitability; Cost of Living; Soil Fertility; Education; Exploration; Speculation; Coding; Validation
  • 9. Skip Ahead 10 Years: Display Ideas Using IR & NLP Techniques
    • Information Retrieval (IR)
      – Search and cluster topics and cross-correlate by stakeholders
    • Natural Language Processing (NLP)
      – Grouped by opinion and writer type
    [Chart: Con and Pro, scale 5,000 to 25,000]
    Par 2.2(a1)
    • Con:
      – 150, 818: “impossible to maintain”
      – 272: “too expensive for elderly”
    • Pro:
      – 169, 213, 391, 392, 394: “already being done in Alaska”
      – 18: “extend to children”
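
The grouped display in the slide above can be approximated once comments carry a stance and an argument label. The sketch below covers only that final grouping step and is not the IR/NLP pipeline itself; the function name `group_by_stance` and the tuple layout are assumptions, while the comment numbers and quoted arguments are taken from the slide.

```python
from collections import defaultdict

# (comment_id, stance, argument) rows; ids and arguments are the slide's example.
coded_comments = [
    (150, "con", "impossible to maintain"),
    (818, "con", "impossible to maintain"),
    (272, "con", "too expensive for elderly"),
    (169, "pro", "already being done in Alaska"),
    (213, "pro", "already being done in Alaska"),
    (391, "pro", "already being done in Alaska"),
    (392, "pro", "already being done in Alaska"),
    (394, "pro", "already being done in Alaska"),
    (18, "pro", "extend to children"),
]


def group_by_stance(rows):
    """Return {stance: {argument: [comment ids]}} from already-coded rows."""
    groups = defaultdict(lambda: defaultdict(list))
    for comment_id, stance, argument in rows:
        groups[stance][argument].append(comment_id)
    # Convert nested defaultdicts to plain dicts for predictable lookups.
    return {stance: dict(args) for stance, args in groups.items()}


grouped = group_by_stance(coded_comments)
for stance, arguments in grouped.items():
    print(stance.capitalize() + ":")
    for argument, ids in arguments.items():
        print(f"  {', '.join(map(str, ids))}: {argument}")
```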
  • 10. Stuart W. Shulman. 2003. "An Experiment in Digital Government at the United States National Organic Program," Agriculture and Human Values 20(3), 253-265.
  • 11. Coding Web Sites and Focus Groups to Study Agenda-Setting
  • 12. Annotation to Improve Optical Character Recognition
  • 13. Over 13,000 hours of video and audio were recorded of the public spaces in a LTC facility’s dementia unit in suburban Pittsburgh, PA. A codebook of 80+ codes was developed to categorize the behavior of the consenting residents and staff (only in relation to patients). 22 coders spent more than 4,400 hours over a period of 22 months coding the video data. The data were coded using the Informedia Digital Video Library (IDVL), an interface designed by computer scientists at Carnegie Mellon University.
  • 16. Dr. Stuart W. Shulman
    Founder & CEO, Texifter, LLC
    Research Associate Professor, Department of Political Science, University of Massachusetts Amherst
    Director, Qualitative Data Analysis Program (QDAP)
    Associate Director, National Center for Digital Government
    Editor Emeritus, Journal of Information Technology & Politics
    stu@texifter.com
    http://people.umass.edu/stu/
    @stuartwshulman