Measuring Reliability and Validity
in Human Coding and
Machine Classification

Dr. Stuart Shulman
May 2, 2014
CAQDAS Conference 2014

“…a wealth of information creates a poverty of attention.”
- Herbert Simon, 1971
Measuring reliability and validity in human coding and machine classification
Acknowledgements

•  This research has been supported by grants from the National Science Foundation (NSF) and was supplemented through interagency agreements between the US Environmental Protection Agency, the US Fish & Wildlife Service, and the NSF.
–  EIA 0089892 (2001-2002): “SGER Citizen Agenda-Setting in the Regulatory Process: Electronic Collection and Synthesis of Public Commentary”
–  EIA 0327979 (2003-2004): “SGER Collaborative: A Testbed for eRulemaking Data”
–  SES 0322662 (2003-2005): “Democracy and E-Rulemaking: Comparing Traditional vs. Electronic Comment from a Discursive Democratic Framework”
–  IIS 0429293 (2004-2007): “Collaborative Research: Language Processing Technology for Electronic Rulemaking”
–  SES-0620673 (2007): “Coding across the Disciplines: A Project-Based Workshop on Manual Text Annotation Techniques”
–  IIS-0705566 (2007-2010): “Collaborative Research III-COR: From a Pile of Documents to a Collection of Information: A Framework for Multi-Dimensional Text Analysis”
•  Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the National Science Foundation.
An Incredibly Important Book
Qualitative Methods: Genes, Taste, or Tactic?

•  Qualitative by birth or choice?
–  Some look to words as an alternative to number crunching
–  Others are rooted in rich and meaningful interpretive traditions
•  Another group is fluent in both qual & quant
–  Mixed methods open up rather than limit fields of knowledge
•  One central goal is valid inferences about phenomena
–  Replicable and transparent methods
–  Attention to error and corrective measures
–  Internal and external validation of results
•  Using computers for qualitative data analysis helps, but…
–  Rigor still originates with the research design, not the technology
–  Software makes better organization and efficiency possible
–  Coders enable the researcher to step back while scaling up
A spectrum of approaches to working with qualitative data

Different types of knowledge claims depending on where you sit

•  Purist
–  deep immersion
–  closeness to data
–  antipathy to numbers
–  credible interpretation
–  in-depth analysis
–  contextual
–  subjective
•  Pluralist
–  experimental
–  mixed method
–  adaptive hybrid
–  flexible approach
–  interdisciplinary
•  Positivist
–  quantitative
–  focus on error
–  measurement critical
–  validity and reliability
–  replication & objectivity
–  generalization
–  hypotheses

These choices are philosophical, ideological, political and ethical
Emergent properties found in very well-read texts, such as the character type “extremist agent of the law”
Agenda-setting in the press

Relations between Classes
Rates and Terms for Credit
Farm Profitability
Cost of Living
Soil Fertility
Education

Exploration
Speculation
Coding
Validation
Skip Ahead 10 Years: Display Ideas Using IR & NLP Techniques

•  Information Retrieval (IR)
–  Search and cluster topics and cross-correlate by stakeholders
•  Natural Language Processing (NLP)
–  Grouped by opinion and writer type

[Figure: Con vs. Pro chart; axis ticks from 5,000 to 25,000]

Par 2.2(a1)
•  Con:
–  150, 818: “impossible to maintain”
–  272: “too expensive for elderly”
•  Pro:
–  169, 213, 391, 392, 394: “already being done in Alaska”
–  18: “extend to children”

Stuart W. Shulman. 2003. "An Experiment in Digital Government at the United States National Organic Program," Agriculture and Human Values 20(3), 253-265.
Coding Web Sites and Focus Groups to Study Agenda-Setting

Annotation to Improve Optical Character Recognition
Over 13,000 hours of video and audio were recorded of the public spaces in an LTC facility’s dementia unit in suburban Pittsburgh, PA. A codebook of 80+ codes was developed to categorize the behavior of the consenting residents and staff (only in relation to patients). Twenty-two coders spent more than 4,400 hours over a period of 22 months coding the video data. The data were coded using the Informedia Digital Video Library (IDVL), an interface designed by computer scientists at Carnegie Mellon University.
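
When many coders apply a large codebook, one common reliability check is to have two coders label the same items and compare their labels. The sketch below is not taken from the slides. It assumes Cohen's kappa as the agreement measure, and the function names and the "pro"/"con" labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of items on which two coders chose the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty item set")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both coders pick the same code if each
    # assigned codes independently at their own observed rates.
    expected = sum(
        counts_a[code] * counts_b[code] for code in set(counts_a) | set(counts_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical code for every item
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two coders on the same eight items.
coder_a = ["pro", "con", "pro", "pro", "con", "pro", "con", "con"]
coder_b = ["pro", "con", "con", "pro", "con", "pro", "pro", "con"]

print(percent_agreement(coder_a, coder_b))  # 0.75
print(cohens_kappa(coder_a, coder_b))       # 0.5
```

Raw agreement here is 0.75, but because each coder uses the two codes equally often, half of that agreement is expected by chance, so kappa is 0.5. With more than two coders, or with missing labels, a different statistic would be needed.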
  
http://cat.ucsur.pitt.edu
Dr. Stuart W. Shulman
Founder & CEO, Texifter, LLC
Research Associate Professor, Department of Political Science
University of Massachusetts Amherst
Director, Qualitative Data Analysis Program (QDAP)
Associate Director, National Center for Digital Government
Editor Emeritus, Journal of Information Technology & Politics
stu@texifter.com
http://people.umass.edu/stu/
@stuartwshulman

Measuring reliability and validity in human coding and machine classification

  • 1. Measuring Reliability and Validity in Human Coding and Machine Classification
    Dr. Stuart Shulman
    May 2, 2014
    CAQDAS Conference 2014
    “…a wealth of information creates a poverty of attention.” - Herbert Simon, 1971
  • 3. Acknowledgements
    •  This research has been supported by grants from the National Science Foundation (NSF) and was supplemented through interagency agreements between the US Environmental Protection Agency, the US Fish & Wildlife Service, and the NSF.
      –  EIA 0089892 (2001-2002): “SGER Citizen Agenda-Setting in the Regulatory Process: Electronic Collection and Synthesis of Public Commentary”
      –  EIA 0327979 (2003-2004): “SGER Collaborative: A Testbed for eRulemaking Data”
      –  SES 0322662 (2003-2005): “Democracy and E-Rulemaking: Comparing Traditional vs. Electronic Comment from a Discursive Democratic Framework”
      –  IIS 0429293 (2004-2007): “Collaborative Research: Language Processing Technology for Electronic Rulemaking”
      –  SES-0620673 (2007): “Coding across the Disciplines: A Project-Based Workshop on Manual Text Annotation Techniques”
      –  IIS-0705566 (2007-2010): “Collaborative Research III-COR: From a Pile of Documents to a Collection of Information: A Framework for Multi-Dimensional Text Analysis”
    •  Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the National Science Foundation.
  • 5. Qualitative Methods: Genes, Taste, or Tactic?
    •  Qualitative by birth or choice?
      –  Some look to words as an alternative to number crunching
      –  Others rooted in rich and meaningful interpretive traditions
    •  Another group is fluent in both qual & quant
      –  Mixed methods open up rather than limit fields of knowledge
    •  One central goal is valid inferences about phenomena
      –  Replicable and transparent methods
      –  Attention to error and corrective measures
      –  Internal and external validation of results
    •  Using computers for qualitative data analysis helps, but…
      –  Rigor still originates with the research design, not the technology
      –  Software makes better organization and efficiency possible
      –  Coders enable the researcher to step back while scaling up
  • 6. Purist / Pluralist / Positivist
    A spectrum of approaches to working with qualitative data
    Different types of knowledge claims depending where you sit
    •  Purist: deep immersion; closeness to data; antipathy to numbers; credible interpretation; in-depth analysis; contextual; subjective; experimental
    •  Pluralist: mixed method; adaptive hybrid; flexible approach; interdisciplinary
    •  Positivist: quantitative; focus on error; measurement critical; validity and reliability; replication & objectivity; generalization; hypotheses
    These choices philosophical, ideological, political and ethical
  • 7. Emergent properties found in very well-read texts, such as the character type “extremist agent of the law”
  • 9. Relations between Classes; Rates and Terms for Credit; Farm Profitability; Cost of Living; Soil Fertility; Education; Exploration; Speculation; Coding; Validation
  • 10. Skip Ahead 10 Years: Display Ideas Using IR & NLP Techniques
    •  Information Retrieval (IR)
      –  Search and cluster topics and cross-correlate by stakeholders
    •  Natural Language Processing (NLP)
      –  Grouped by opinion and writer type
    [Figure: chart with Con and Pro series; axis ticks from 5,000 to 25,000; sample comment panels shown as placeholder text]
    Par 2.2(a1)
    •  Con:
      –  150, 818: “impossible to maintain”
      –  272: “too expensive for elderly”
    •  Pro:
      –  169, 213, 391, 392, 394: “already being done in Alaska”
      –  18: “extend to children”
  • 11. Stuart W. Shulman. 2003. "An Experiment in Digital Government at the United States National Organic Program," Agriculture and Human Values 20(3), 253-265.
  • 18. Coding Web Sites and Focus Groups to Study Agenda-Setting
  • 19. Annotation to Improve Optical Character Recognition
  • 20. Over 13,000 hours of video and audio were recorded of the public spaces in a LTC facility’s dementia unit in suburban Pittsburgh, PA. A codebook of 80+ codes was developed to categorize the behavior of the consenting residents and staff (only in relation to patients). 22 coders spent more than 4,400 hours over a period of 22 months coding the video data. The data were coded using the Informedia Digital Video Library (IDVL), an interface designed by computer scientists at Carnegie Mellon University.
  • 29. Dr. Stuart W. Shulman
    Founder & CEO, Texifter, LLC
    Research Associate Professor, Department of Political Science, University of Massachusetts Amherst
    Director, Qualitative Data Analysis Program (QDAP)
    Associate Director, National Center for Digital Government
    Editor Emeritus, Journal of Information Technology & Politics
    stu@texifter.com
    http://people.umass.edu/stu/
    @stuartwshulman