SlideShare a Scribd company logo
Systematic Review is e-Discovery
in Doctor’s Clothing
Joint work with
Matt Lease
ir.ischool.utexas.edu
slideshare.net/mattlease
@mattlease
ml@utexas.edu
Gordon V. Cormack (U. Waterloo) An Thanh Nguyen (U. Texas)
Thomas A. Trikalinos (Brown U.) Byron C. Wallace (U. Texas)
“The place where people & technology meet”
~ Wobbrock et al., 2009
www.ischools.org
2
• System-Reviews
• Electronic Discovery (e-Discovery)
• Toward a Joint Research Agenda
3Matt Lease <ml@utexas.edu>
Roadmap
• System-Reviews
• Electronic Discovery (e-Discovery)
• Toward a Joint Research Agenda
4Matt Lease <ml@utexas.edu>
Roadmap
Evidence-Based Medicine n.
The conscientious, explicit and judicious
use of current best evidence in making
decisions about the care of
individual patients
5
Systematic reviews: from biomedical
articles to actionable evidence
6
PubMed
?
2 search database
1 formulate question,
protocol & query
4 extract data
treatment
outcome
ba
c d
3 screen retrieved citations
Studies
AIMS1988
ASSET1988
Aber1976
Amery1969
Anderson1983
Bassand1986
Bett1973
Bossaert1987
Brunelli1988
Buchalter1987
Croydon1987
Dewar1963
Durand1987
ECSG−11979
ECSG−21988
EWP1971
Fletcher1959
GISSI1986
Gormsen1973
Guerci1987
Heikinheim1971
ISAM1986
ISISPilot1987
ISIS−21988
Ikram1986
Julian1987
Khaja1983
Leiboff1984
Maublant1988
Meinertz1988
NHFAustra1988
Olson1986
Raizner1985
Rentrop1984
Sainsous1986
Schreiber1986
Simoons1985
TICO1988
Topol1987
WWICSK1983
WWIVSK1988
White1987
Overall (I^2=19% , P=0.147)
0 0.01 0.02 0.04 0.08 0.190.270.38 0.76 1.91 3.82 7.65 18.26
OddsRatio(logscale)
5 synthesize extracted data 7
Formulate RQ &
Boolean Query
Boolean Search
Document Collection
All Tasks but #2 done
manually by MDs
On average, 75 articles describing results from
clinical trials are published every day.
Bastian, PLoS Med, 2010
The median length to complete a single review: 1110
person-hours.
Allen & Olkin, JAMA, 1998
8
12
Technologies for semi-automated
citation screening are relatively mature
and slowly gaining acceptance
Research on citation screening
• Methods for handling imbalance with asymmetric costs [ICDM
2011; ICDM 2012; KAIS 2013]
• Active learning strategies [KDD 2010; SDM 2011; KDD 2013;]
– Nguyen, Wallace, and Lease. Combining Crowd and Expert
Labels using Decision Theoretic Active Learning. HCOMP 2015.
• Test Collection: github.com/bwallace/crowd-sourced-ebm
• Dually supervised methods [ICML 2011; KDD 2010]
13
• System-Reviews
• Electronic Discovery (e-Discovery)
• Toward a Joint Research Agenda
14Matt Lease <ml@utexas.edu>
Roadmap
PubMed
?
2 search database
1 formulate question,
protocol & query
4 extract data
treatment
outcome
ba
c d
3 screen retrieved citations
Studies
AIMS1988
ASSET1988
Aber1976
Amery1969
Anderson1983
Bassand1986
Bett1973
Bossaert1987
Brunelli1988
Buchalter1987
Croydon1987
Dewar1963
Durand1987
ECSG−11979
ECSG−21988
EWP1971
Fletcher1959
GISSI1986
Gormsen1973
Guerci1987
Heikinheim1971
ISAM1986
ISISPilot1987
ISIS−21988
Ikram1986
Julian1987
Khaja1983
Leiboff1984
Maublant1988
Meinertz1988
NHFAustra1988
Olson1986
Raizner1985
Rentrop1984
Sainsous1986
Schreiber1986
Simoons1985
TICO1988
Topol1987
WWICSK1983
WWIVSK1988
White1987
Overall (I^2=19% , P=0.147)
0 0.01 0.02 0.04 0.08 0.190.270.38 0.76 1.91 3.82 7.65 18.26
OddsRatio(logscale)
5 synthesize extracted data 15
Request for
Production (RFP):
Boolean Query
Review Documents for
“Responsiveness” Parties use documents
Review Responsive
Documents for Privilege
Boolean Search
Document Collection
Electronically Stored Information (ESI)
e.g., Enron email archive
Manual Review does not Scale
16
Paul, George L., and Jason R. Baron.
Information inflation: Can the legal
system adapt? Rich. JL & Tech. 13 (2007).
IR Research in e-Discovery
• NIST TREC Track: 2006-2011
• Oard & Webber, FnTIR Book, 2013
• A variety of published work at SIGIR++
– e.g., Cormack & Grossman, SIGIR 2016
17
• System-Reviews
• Electronic Discovery (e-Discovery)
• Toward a Joint Research Agenda
18Matt Lease <ml@utexas.edu>
Roadmap
Commonalities
• Need high-recall with bounded cost
• Follow 3-Stage Pipeline Today
– Boolean query
– Screening (traditionally manual by experts)
– Final review & use
• Pipeline approach useful but limits improvement
– overall framing & unrecoverable errors
• Limiting reliance on experts
– Traditionally assumed to be infallible 19
Can we crowdsource screening?
Michael Mortenson, Byron C. Wallace, Gaelen Adam, Tom Trikalinos and Tim Kraska.
Crowdsourcing Citation Screening for Systematic Reviews. (Under review).
20
21
Total Recall: Applications
22
E-Discovery
Total Recall: Strategies
23
Conclusion
• Systematic Review & e-Discovery have much in common,
but SR has received relatively little attention in IR
– Open problems & current assumptions give IR researchers
fertile opportunities for research beyond other IR tasks
– Public test collections available for both
• github.com/bwallace/crowd-sourced-ebm
• Aaron Cohen’s: http://skynet.ohsu.edu/~cohenaa/systematic-drug-
class-review-data.html
– Reading list: https://github.com/bwallace/automating-ebm-
resources/wiki/Papers
• TREC Total Recall Track (trec-total-recall.org) offers a
great forum for bringing together those interested
24
Thank You!
ir.ischool.utexas.eduSlides: www.slideshare.net/mattlease
25

More Related Content

More from Matthew Lease

Automated Models for Quantifying Centrality of Survey Responses
Automated Models for Quantifying Centrality of Survey ResponsesAutomated Models for Quantifying Centrality of Survey Responses
Automated Models for Quantifying Centrality of Survey Responses
Matthew Lease
 
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Matthew Lease
 
Explainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loopExplainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loop
Matthew Lease
 
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Matthew Lease
 
AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd
Matthew Lease
 
Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation
Matthew Lease
 
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Matthew Lease
 
But Who Protects the Moderators?
But Who Protects the Moderators?But Who Protects the Moderators?
But Who Protects the Moderators?
Matthew Lease
 
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Matthew Lease
 
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Matthew Lease
 
Fact Checking & Information Retrieval
Fact Checking & Information RetrievalFact Checking & Information Retrieval
Fact Checking & Information Retrieval
Matthew Lease
 
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
Matthew Lease
 
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
Matthew Lease
 
The Rise of Crowd Computing (December 2015)
The Rise of Crowd Computing (December 2015)The Rise of Crowd Computing (December 2015)
The Rise of Crowd Computing (December 2015)
Matthew Lease
 
Toward Better Crowdsourcing Science
 Toward Better Crowdsourcing Science Toward Better Crowdsourcing Science
Toward Better Crowdsourcing Science
Matthew Lease
 
Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
Beyond Mechanical Turk: An Analysis of Paid Crowd Work PlatformsBeyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
Matthew Lease
 
The Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject CrowdsourcingThe Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject Crowdsourcing
Matthew Lease
 
Toward Effective and Sustainable Online Crowd Work
Toward Effective and Sustainable Online Crowd WorkToward Effective and Sustainable Online Crowd Work
Toward Effective and Sustainable Online Crowd Work
Matthew Lease
 
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...
Matthew Lease
 
Crowdsourcing: From Aggregation to Search Engine Evaluation
Crowdsourcing: From Aggregation to Search Engine EvaluationCrowdsourcing: From Aggregation to Search Engine Evaluation
Crowdsourcing: From Aggregation to Search Engine Evaluation
Matthew Lease
 

More from Matthew Lease (20)

Automated Models for Quantifying Centrality of Survey Responses
Automated Models for Quantifying Centrality of Survey ResponsesAutomated Models for Quantifying Centrality of Survey Responses
Automated Models for Quantifying Centrality of Survey Responses
 
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
Key Challenges in Moderating Social Media: Accuracy, Cost, Scalability, and S...
 
Explainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loopExplainable Fact Checking with Humans in-the-loop
Explainable Fact Checking with Humans in-the-loop
 
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
Adventures in Crowdsourcing : Toward Safer Content Moderation & Better Suppor...
 
AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd AI & Work, with Transparency & the Crowd
AI & Work, with Transparency & the Crowd
 
Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation Designing Human-AI Partnerships to Combat Misinfomation
Designing Human-AI Partnerships to Combat Misinfomation
 
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
Designing at the Intersection of HCI & AI: Misinformation & Crowdsourced Anno...
 
But Who Protects the Moderators?
But Who Protects the Moderators?But Who Protects the Moderators?
But Who Protects the Moderators?
 
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
Believe it or not: Designing a Human-AI Partnership for Mixed-Initiative Fact...
 
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
Mix and Match: Collaborative Expert-Crowd Judging for Building Test Collectio...
 
Fact Checking & Information Retrieval
Fact Checking & Information RetrievalFact Checking & Information Retrieval
Fact Checking & Information Retrieval
 
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
Your Behavior Signals Your Reliability: Modeling Crowd Behavioral Traces to E...
 
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
What Can Machine Learning & Crowdsourcing Do for You? Exploring New Tools for...
 
The Rise of Crowd Computing (December 2015)
The Rise of Crowd Computing (December 2015)The Rise of Crowd Computing (December 2015)
The Rise of Crowd Computing (December 2015)
 
Toward Better Crowdsourcing Science
 Toward Better Crowdsourcing Science Toward Better Crowdsourcing Science
Toward Better Crowdsourcing Science
 
Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
Beyond Mechanical Turk: An Analysis of Paid Crowd Work PlatformsBeyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
Beyond Mechanical Turk: An Analysis of Paid Crowd Work Platforms
 
The Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject CrowdsourcingThe Search for Truth in Objective & Subject Crowdsourcing
The Search for Truth in Objective & Subject Crowdsourcing
 
Toward Effective and Sustainable Online Crowd Work
Toward Effective and Sustainable Online Crowd WorkToward Effective and Sustainable Online Crowd Work
Toward Effective and Sustainable Online Crowd Work
 
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...
Multidimensional Relevance Modeling via Psychometrics & Crowdsourcing: ACM SI...
 
Crowdsourcing: From Aggregation to Search Engine Evaluation
Crowdsourcing: From Aggregation to Search Engine EvaluationCrowdsourcing: From Aggregation to Search Engine Evaluation
Crowdsourcing: From Aggregation to Search Engine Evaluation
 

Recently uploaded

UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
IndexBug
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
David Brossard
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 

Recently uploaded (20)

UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceAI 101: An Introduction to the Basics and Impact of Artificial Intelligence
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence
 
OpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - AuthorizationOpenID AuthZEN Interop Read Out - Authorization
OpenID AuthZEN Interop Read Out - Authorization
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 

Systematic Review is e-Discovery in Doctor’s Clothing

  • 1. Systematic Review is e-Discovery in Doctor’s Clothing Joint work with Matt Lease ir.ischool.utexas.edu slideshare.net/mattlease @mattlease ml@utexas.edu Gordon V. Cormack (U. Waterloo) An Thanh Nguyen (U. Texas) Thomas A. Trikalinos (Brown U.) Byron C. Wallace (U. Texas)
  • 2. “The place where people & technology meet” ~ Wobbrock et al., 2009 www.ischools.org 2
  • 3. • System-Reviews • Electronic Discovery (e-Discovery) • Toward a Joint Research Agenda 3Matt Lease <ml@utexas.edu> Roadmap
  • 4. • System-Reviews • Electronic Discovery (e-Discovery) • Toward a Joint Research Agenda 4Matt Lease <ml@utexas.edu> Roadmap
  • 5. Evidence-Based Medicine n. The conscientious, explicit and judicious use of current best evidence in making decisions about the care of individual patients 5
  • 6. Systematic reviews: from biomedical articles to actionable evidence 6
  • 7. PubMed ? 2 search database 1 formulate question, protocol & query 4 extract data treatment outcome ba c d 3 screen retrieved citations Studies AIMS1988 ASSET1988 Aber1976 Amery1969 Anderson1983 Bassand1986 Bett1973 Bossaert1987 Brunelli1988 Buchalter1987 Croydon1987 Dewar1963 Durand1987 ECSG−11979 ECSG−21988 EWP1971 Fletcher1959 GISSI1986 Gormsen1973 Guerci1987 Heikinheim1971 ISAM1986 ISISPilot1987 ISIS−21988 Ikram1986 Julian1987 Khaja1983 Leiboff1984 Maublant1988 Meinertz1988 NHFAustra1988 Olson1986 Raizner1985 Rentrop1984 Sainsous1986 Schreiber1986 Simoons1985 TICO1988 Topol1987 WWICSK1983 WWIVSK1988 White1987 Overall (I^2=19% , P=0.147) 0 0.01 0.02 0.04 0.08 0.190.270.38 0.76 1.91 3.82 7.65 18.26 OddsRatio(logscale) 5 synthesize extracted data 7 Formulate RQ & Boolean Query Boolean Search Document Collection All Tasks but #2 done manually by MDs
  • 8. On average, 75 articles describing results from clinical trials are published every day. Bastian, PLoS Med, 2010 The median length to complete a single review: 1110 person-hours. Allen & Olkin, JAMA, 1998 8
  • 9.
  • 10.
  • 11.
  • 12. 12 Technologies for semi-automated citation screening are relatively mature and slowly gaining acceptance
  • 13. Research on citation screening • Methods for handling imbalance with asymmetric costs [ICDM 2011; ICDM 2012; KAIS 2013] • Active learning strategies [KDD 2010; SDM 2011; KDD 2013;] – Nguyen, Wallace, and Lease. Combining Crowd and Expert Labels using Decision Theoretic Active Learning. HCOMP 2015. • Test Collection: github.com/bwallace/crowd-sourced-ebm • Dually supervised methods [ICML 2011; KDD 2010] 13
  • 14. • System-Reviews • Electronic Discovery (e-Discovery) • Toward a Joint Research Agenda 14Matt Lease <ml@utexas.edu> Roadmap
  • 15. PubMed ? 2 search database 1 formulate question, protocol & query 4 extract data treatment outcome ba c d 3 screen retrieved citations Studies AIMS1988 ASSET1988 Aber1976 Amery1969 Anderson1983 Bassand1986 Bett1973 Bossaert1987 Brunelli1988 Buchalter1987 Croydon1987 Dewar1963 Durand1987 ECSG−11979 ECSG−21988 EWP1971 Fletcher1959 GISSI1986 Gormsen1973 Guerci1987 Heikinheim1971 ISAM1986 ISISPilot1987 ISIS−21988 Ikram1986 Julian1987 Khaja1983 Leiboff1984 Maublant1988 Meinertz1988 NHFAustra1988 Olson1986 Raizner1985 Rentrop1984 Sainsous1986 Schreiber1986 Simoons1985 TICO1988 Topol1987 WWICSK1983 WWIVSK1988 White1987 Overall (I^2=19% , P=0.147) 0 0.01 0.02 0.04 0.08 0.190.270.38 0.76 1.91 3.82 7.65 18.26 OddsRatio(logscale) 5 synthesize extracted data 15 Request for Production (RFP): Boolean Query Review Documents for “Responsiveness” Parties use documents Review Responsive Documents for Privilege Boolean Search Document Collection Electronically Stored Information (ESI) e.g., Enron email archive
  • 16. Manual Review does not Scale 16 Paul, George L., and Jason R. Baron. Information inflation: Can the legal system adapt? Rich. JL & Tech. 13 (2007).
  • 17. IR Research in e-Discovery • NIST TREC Track: 2006-2011 • Oard & Webber, FnTIR Book, 2013 • A variety of published work at SIGIR++ – e.g., Cormack & Grossman, SIGIR 2016 17
  • 18. • System-Reviews • Electronic Discovery (e-Discovery) • Toward a Joint Research Agenda 18Matt Lease <ml@utexas.edu> Roadmap
  • 19. Commonalities • Need high-recall with bounded cost • Follow 3-Stage Pipeline Today – Boolean query – Screening (traditionally manual by experts) – Final review & use • Pipeline approach useful but limits improvement – overall framing & unrecoverable errors • Limiting reliance on experts – Traditionally assumed to be infallible 19
  • 20. Can we crowdsource screening? Michael Mortenson, Byron C. Wallace, Gaelen Adam, Tom Trikalinos and Tim Kraska. Crowdsourcing Citation Screening for Systematic Reviews. (Under review). 20
  • 21. 21
  • 24. Conclusion • Systematic Review & e-Discovery have much in common, but SR has received relatively little attention in IR – Open problems & current assumptions give IR researchers fertile opportunities for research beyond other IR tasks – Public test collections available for both • github.com/bwallace/crowd-sourced-ebm • Aaron Cohen’s: http://skynet.ohsu.edu/~cohenaa/systematic-drug- class-review-data.html – Reading list: https://github.com/bwallace/automating-ebm- resources/wiki/Papers • TREC Total Recall Track (trec-total-recall.org) offers a great forum for bringing together those interested 24