Text mining
Text mining the PCD literature
PCD validity
Uses and Validity of Primary Care Database studies
May 2013
David Springate, Evan Kontopantelis, Ivan Olier, David Reeves
May 2013 Uses and Validity of Primary Care Database studies
Outline
1 Use of text-mining to explore the scientific literature
2 Text-mining the PCD literature
What is being studied using PCDs?
Changes in topics of investigation over time
3 Validity of Clinical coding
4 ClinicalCodes.org : A new repository for clinical code lists
Text mining
What is it?
The process of extracting high-quality structured information
from unstructured text (e.g. the scientific literature).
Uses a variety of computational and statistical methods to
find patterns and trends in text
Text mining consists of:
1 Information extraction
Automatically extracting structured information from
unstructured text
2 Semantic searching
Improves search accuracy by incorporating context into a search
3 Knowledge discovery
Identifying relationships in extracted data
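As a rough illustration of the information extraction step, the sketch below turns free-text abstracts into structured records listing the primary care databases each one mentions. The regular expressions, the toy abstracts and the function name `extract_databases` are assumptions made for this example; the slides do not describe the method actually used.

```python
import re
from collections import Counter

# Hypothetical patterns for spotting database names in an abstract.
# Acronyms are matched case-sensitively so that "thin" is not read as THIN.
DATABASE_PATTERNS = {
    "GPRD/CPRD": re.compile(
        r"\b(?:GPRD|CPRD)\b"
        r"|General Practice Research Database"
        r"|Clinical Practice Research Datalink"
    ),
    "THIN": re.compile(r"\bTHIN\b|Health Improvement Network"),
    "QResearch": re.compile(r"\bQResearch\b"),
}


def extract_databases(abstract: str) -> list[str]:
    """Return the names of the databases mentioned in one abstract."""
    return [name for name, pattern in DATABASE_PATTERNS.items()
            if pattern.search(abstract)]


# Toy abstracts, invented for illustration only.
abstracts = [
    "We used the General Practice Research Database to study asthma.",
    "A cohort study in THIN and QResearch of statin use.",
    "Patients with thin skin were excluded.",
]

# Structured output: one record per abstract.
records = [{"id": i, "databases": extract_databases(text)}
           for i, text in enumerate(abstracts)]

# Simple aggregate: how many abstracts mention each database.
counts = Counter(db for record in records for db in record["databases"])
print(records)
print(counts)
```

Keyword matching like this is the simplest possible form of extraction; it trades depth for breadth and still needs a human to check the results.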
Why do we need it?
The scientific literature is rapidly
increasing in size
Humans can’t keep up to date with
the literature
75 trials and 11 systematic
reviews published per day!
Bastian et al. (2010) PLoS
Medicine
It is increasingly difficult to home in
on relevant papers
More of the literature is being held
online in machine-readable archives
TM can reduce processing time for
systematic reviews by 80%
(NCTM)
Text-mining is not a magic bullet
Many publications are not open
access
Often need to rely on
abstracts
Grey literature is often
inaccessible
Still need plenty of human
input!
TM algorithms can be very
complex
Breadth at the expense of depth
Text mining the PCD literature
UK Primary Care Databases
GPRD / CPRD
The General Practice Research Database / The Clinical Practice
Research Datalink
~900 papers
THIN
The Health Improvement Network
~360 papers
QResearch
~75 papers
The Dataset
All articles reported by CPRD, THIN and QResearch in PubMed
1185 abstracts with metadata
141 full-text articles for validation
How are PCDs being used by researchers?
PCD studies are a growth area!
Number of publications is rapidly increasing. . .
[Figure: PCD articles in PubMed. x-axis: year (1990 to 2010); y-axis: number of articles (0 to 150)]
PCD studies are a growth area!
. . . and there is global interest in UK PCD research
[Figure: Institutions affiliated with UK PCD publications]
Broad scope of topics in PCD studies
A network graph of PCD topics of investigation
[Figure: network graph with one node per topic, labelled as follows]
Cancer1
Fractures/osteo
VTE
Antipsychotics/SMI
Diabetes
Asthma
NSAIDs
HRT
Flu vaccination
Pregnancy
CHD/antihypertensives
Stroke
Pneumonia
Statins
Psoriasis
Antibiotics
Steroids
Atrial/warfarin
Epilepsy
Antidepressants
Paracetamol
Heart attack
IBS
BMI/obesity
Kidney disease
Cancer2
Seizures
Auto-immune
COPD
Healthcare costs
Beta blockers
Study types are changing. . .
[Figure: records per year (1990 to 2010) by study type. Panels: Associations; Benefits; Effectiveness; Epidemiology; Harms and risks; Healthcare costs; Misc; Predictions; Validity]
. . . as are analysis methods
[Figure: records per year (1990 to 2010) by analysis method. Panels: Bayesian etc.; Descriptives only; Misc; Mixed-effects; RCT comparisons; Regression models; Simulations; Survival analysis]
PCD validity
Threats to validity
Unmeasured confounding
Correlation does not equal causation
GP recording
Clinical coding
Clinical Coding in PCDs
All clinical events are entered by GPs as clinical codes:
Symptoms, signs & diagnoses (Read codes)
Referrals to external care centres
Immunisation records
Prescription information
Diagnostic test records and results
Everything recorded by a GP can be identified (if you know
which codes to look for and where to look for them!)
e.g.
H331.00 - Asthma diagnosis
H33z011 - Severe asthma attack
33G1 - Spirometry testing
Clinical codes in PCD studies
Diagnoses are made by reference to a set of clinical codes
Workflow
1 Researchers decide on a rough set of codes for a condition
By searching lookup tables for matching terms
By reference to an external source (e.g. QOF)
2 Clinicians go through this draft list by hand and select the
relevant codes
3 The database is searched for events matching the finalised
code list
4 The correct combination of events in the timeframe of interest
gives a diagnosis
e.g. For asthma: need at least one clinical event and at least
one drug event in the last year to qualify
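The four workflow steps can be sketched in a few lines of Python. This is a toy illustration, not the method of any particular study: the three clinical codes come from the earlier example slide, while `DRUG-EXAMPLE-1`, the patient records, the function names and the reading of the asthma rule as "a clinical event and a drug event within 365 days" are all assumptions made for the example.

```python
from datetime import date, timedelta

# Lookup table of code -> description. The three entries are the examples
# from the slides; the drug code below is a made-up placeholder.
LOOKUP = {
    "H331.00": "Asthma diagnosis",
    "H33z011": "Severe asthma attack",
    "33G1": "Spirometry testing",
}
DRUG_CODES = {"DRUG-EXAMPLE-1"}  # hypothetical asthma drug code


def draft_code_list(lookup, term):
    """Step 1: search the lookup table for descriptions matching a term."""
    term = term.lower()
    return {code for code, description in lookup.items()
            if term in description.lower()}


def clinician_review(draft, rejected):
    """Step 2: stand-in for the manual review; drops rejected codes."""
    return draft - set(rejected)


def has_diagnosis(events, clinical_codes, drug_codes, index_date,
                  window_days=365):
    """Steps 3 and 4: match events to the code lists within the window."""
    window_start = index_date - timedelta(days=window_days)
    recent = [e for e in events if window_start <= e["date"] <= index_date]
    has_clinical = any(e["code"] in clinical_codes for e in recent)
    has_drug = any(e["code"] in drug_codes for e in recent)
    return has_clinical and has_drug


asthma_codes = clinician_review(draft_code_list(LOOKUP, "asthma"), rejected=[])

# Invented patient event histories.
patients = {
    "A": [{"code": "H331.00", "date": date(2012, 9, 1)},
          {"code": "DRUG-EXAMPLE-1", "date": date(2012, 11, 15)}],
    "B": [{"code": "H33z011", "date": date(2010, 1, 1)},  # outside the window
          {"code": "DRUG-EXAMPLE-1", "date": date(2012, 12, 1)}],
    "C": [{"code": "33G1", "date": date(2012, 6, 1)},     # not on the code list
          {"code": "DRUG-EXAMPLE-1", "date": date(2012, 7, 1)}],
}
index_date = date(2013, 1, 1)
diagnosed = {pid: has_diagnosis(events, asthma_codes, DRUG_CODES, index_date)
             for pid, events in patients.items()}
print(diagnosed)
```

Only patient A qualifies here. Changing the code list or the time window changes who counts as a case, which is why the code list matters so much for validity.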
Code list? What code list?
Currently no obligation to publish code lists
No centralised repository for clinical codes
The vast majority of PCD studies do not publish their codes
No way of knowing if a condition diagnosis is valid
No way to replicate the research
For example. . .
In 45 UK case-control PCD studies (diabetes):
Only 5 reported ANY clinical codes. . .
Only 2 of these published codes in appendix
Only 1 provided full set of code lists
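To put the diabetes example in proportion, the counts above can be expressed as shares of the 45 studies. The snippet is plain arithmetic on the numbers from the slide; the variable names are just for illustration.

```python
total_studies = 45

# Counts taken from the slide.
reporting = {
    "reported any clinical codes": 5,
    "published codes in an appendix": 2,
    "provided a full set of code lists": 1,
}

# Share of all 45 studies, as a percentage rounded to one decimal place.
shares = {label: round(100 * n / total_studies, 1)
          for label, n in reporting.items()}

for label, share in shares.items():
    print(f"{label}: {reporting[label]}/{total_studies} = {share}%")
```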
Validity of Clinical coding
Clinical codes should be open to scrutiny and peer review (either
pre- or post-publication)
This would allow for:
replication of studies
validation of diagnoses
incremental improvements to clinical definitions
ClinicalCodes.org
. . . Is an online repository for PCD researchers to upload their
codes upon publication.
Deposit code-lists for
published studies
Download historical
code-lists
Archive for all Quality and
Outcomes Framework
business rules (2004 -
current)
Database-specific
information (e.g.
consultation types)
ClinicalCodes.org
Allows for validation /
replication of PCD studies
Tracking of disease
definitions through time
Comparative studies of
clinical codes
Don’t reinvent the wheel!
Currently in development on campus:
medcodes.ls.manchester.ac.uk:8080/codesdb
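Once code lists are shared, comparing them is simple set arithmetic. The sketch below compares two made-up code lists for the same condition; the lists, the function name `compare_code_lists` and the choice of the Jaccard index as an overlap measure are assumptions for illustration and are not features of ClinicalCodes.org.

```python
def compare_code_lists(list_a, list_b):
    """Summarise the overlap between two clinical code lists."""
    a, b = set(list_a), set(list_b)
    union = a | b
    return {
        "shared": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Jaccard index: shared codes as a fraction of all codes used.
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }


# Invented code lists from two hypothetical asthma studies.
study_1 = ["H331.00", "H33z011"]
study_2 = ["H331.00", "33G1"]

result = compare_code_lists(study_1, study_2)
print(result)
```

A low overlap between two studies of the same condition is a prompt to ask whether they are really measuring the same thing.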
Summary
Publish open access!
Upload your codes!
Thank you
May 2013 Uses and Validity of Primary Care Database studies

More Related Content

What's hot

What's hot (8)

Collaborative Research: Scopus & RefWorks
Collaborative Research: Scopus & RefWorksCollaborative Research: Scopus & RefWorks
Collaborative Research: Scopus & RefWorks
 
Pubmed: doing the search
Pubmed: doing the searchPubmed: doing the search
Pubmed: doing the search
 
Advanced PubMed Presentation (with Endnote)
Advanced PubMed Presentation (with Endnote)Advanced PubMed Presentation (with Endnote)
Advanced PubMed Presentation (with Endnote)
 
How many medline platforms on the web?
How many medline platforms on the web?How many medline platforms on the web?
How many medline platforms on the web?
 
Haustein, S. (2017). The evolution of scholarly communication and the reward ...
Haustein, S. (2017). The evolution of scholarly communication and the reward ...Haustein, S. (2017). The evolution of scholarly communication and the reward ...
Haustein, S. (2017). The evolution of scholarly communication and the reward ...
 
How to Search in PubMed® Tutorial
How to Search in PubMed® TutorialHow to Search in PubMed® Tutorial
How to Search in PubMed® Tutorial
 
Twlyon2015 poster
Twlyon2015 posterTwlyon2015 poster
Twlyon2015 poster
 
NIH Data Catalog - Updated Results
NIH Data Catalog - Updated ResultsNIH Data Catalog - Updated Results
NIH Data Catalog - Updated Results
 

Viewers also liked

Rp week 7 presentation compressed
Rp week 7 presentation compressedRp week 7 presentation compressed
Rp week 7 presentation compressed
dazza50
 
Pharmacoepidemiology
PharmacoepidemiologyPharmacoepidemiology
Pharmacoepidemiology
Govind Girase
 
336 Primary Data
336 Primary Data336 Primary Data
336 Primary Data
Fatema Ka
 
Validity And Reliabilty
Validity And ReliabiltyValidity And Reliabilty
Validity And Reliabilty
shoffma5
 

Viewers also liked (20)

Rp week 7 presentation compressed
Rp week 7 presentation compressedRp week 7 presentation compressed
Rp week 7 presentation compressed
 
Correlational data, causal hypotheses and validity
Correlational data, causal hypotheses and validityCorrelational data, causal hypotheses and validity
Correlational data, causal hypotheses and validity
 
3 Important Things in Infographics
3 Important Things in Infographics3 Important Things in Infographics
3 Important Things in Infographics
 
ClinicalCodes.org: An online repository of clinical code lists for primary ca...
ClinicalCodes.org: An online repository of clinical code lists for primary ca...ClinicalCodes.org: An online repository of clinical code lists for primary ca...
ClinicalCodes.org: An online repository of clinical code lists for primary ca...
 
Pharmacoepidemiology
PharmacoepidemiologyPharmacoepidemiology
Pharmacoepidemiology
 
336 Primary Data
336 Primary Data336 Primary Data
336 Primary Data
 
Pharmacoepidemiology
PharmacoepidemiologyPharmacoepidemiology
Pharmacoepidemiology
 
Establishing Construct Validity using a Correlation Matrix with Survey Data
Establishing Construct Validity using a Correlation Matrix with Survey DataEstablishing Construct Validity using a Correlation Matrix with Survey Data
Establishing Construct Validity using a Correlation Matrix with Survey Data
 
Pharmacoepidemiology
PharmacoepidemiologyPharmacoepidemiology
Pharmacoepidemiology
 
Quantitative Data - A Basic Introduction
Quantitative Data - A Basic IntroductionQuantitative Data - A Basic Introduction
Quantitative Data - A Basic Introduction
 
Validation of packaging operations Pharma
Validation of packaging operations PharmaValidation of packaging operations Pharma
Validation of packaging operations Pharma
 
Primary & secondary data
Primary & secondary dataPrimary & secondary data
Primary & secondary data
 
Research methods - PSYA1 psychology AS
Research methods - PSYA1 psychology ASResearch methods - PSYA1 psychology AS
Research methods - PSYA1 psychology AS
 
8. validity and reliability of research instruments
8. validity and reliability of research instruments8. validity and reliability of research instruments
8. validity and reliability of research instruments
 
Validity And Reliabilty
Validity And ReliabiltyValidity And Reliabilty
Validity And Reliabilty
 
Research methods in psychology
Research methods in psychologyResearch methods in psychology
Research methods in psychology
 
Ppt data collection
Ppt data collectionPpt data collection
Ppt data collection
 
Reliability and validity
Reliability and validityReliability and validity
Reliability and validity
 
Validity and Reliability
Validity and ReliabilityValidity and Reliability
Validity and Reliability
 
validity its types and importance
validity its types and importancevalidity its types and importance
validity its types and importance
 

Similar to Presentation

Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Susanna-Assunta Sansone
 
Effective search of bibliographic databases
Effective search of bibliographic databasesEffective search of bibliographic databases
Effective search of bibliographic databases
Tarek Tawfik Amin
 
ICIC 2014 Finding Answers in the Data – The Future Role of Text and Data Mini...
ICIC 2014 Finding Answers in the Data – The Future Role of Text and Data Mini...ICIC 2014 Finding Answers in the Data – The Future Role of Text and Data Mini...
ICIC 2014 Finding Answers in the Data – The Future Role of Text and Data Mini...
Dr. Haxel Consult
 
Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...
Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...
Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...
ijtsrd
 

Similar to Presentation (20)

An Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using ClusteringAn Improved Mining Of Biomedical Data From Web Documents Using Clustering
An Improved Mining Of Biomedical Data From Web Documents Using Clustering
 
GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...
GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...
GSmith Springer Nature Data policies and practices: HKU Open Data and Data Pu...
 
Identifying and tracking research resources using RRIDs: a practical approach
Identifying and tracking research resources using RRIDs:  a practical approachIdentifying and tracking research resources using RRIDs:  a practical approach
Identifying and tracking research resources using RRIDs: a practical approach
 
How to Make your Research Process more Effective? 4 Must-Use Tools for Resear...
How to Make your Research Process more Effective? 4 Must-Use Tools for Resear...How to Make your Research Process more Effective? 4 Must-Use Tools for Resear...
How to Make your Research Process more Effective? 4 Must-Use Tools for Resear...
 
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...Scientific Data overview of Data Descriptors - WT Data-Literature integration...
Scientific Data overview of Data Descriptors - WT Data-Literature integration...
 
Artificial intelligence in literature review in research - an introduction.pptx
Artificial intelligence in literature review in research - an introduction.pptxArtificial intelligence in literature review in research - an introduction.pptx
Artificial intelligence in literature review in research - an introduction.pptx
 
Data sharing as part of the research ecosystem
Data sharing as part of the research ecosystemData sharing as part of the research ecosystem
Data sharing as part of the research ecosystem
 
Paper as a Research Object
Paper as a Research ObjectPaper as a Research Object
Paper as a Research Object
 
Effective search of bibliographic databases
Effective search of bibliographic databasesEffective search of bibliographic databases
Effective search of bibliographic databases
 
ICIC 2014 Finding Answers in the Data – The Future Role of Text and Data Mini...
ICIC 2014 Finding Answers in the Data – The Future Role of Text and Data Mini...ICIC 2014 Finding Answers in the Data – The Future Role of Text and Data Mini...
ICIC 2014 Finding Answers in the Data – The Future Role of Text and Data Mini...
 
Gaining credit for sharing research data
Gaining credit for sharing research dataGaining credit for sharing research data
Gaining credit for sharing research data
 
Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...
Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...
Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...
 
informatics_future.pdf
informatics_future.pdfinformatics_future.pdf
informatics_future.pdf
 
One Funder’s View for Advancing Open Science
One Funder’s View for Advancing Open ScienceOne Funder’s View for Advancing Open Science
One Funder’s View for Advancing Open Science
 
Standardising research data policies, research data network
Standardising research data policies, research data networkStandardising research data policies, research data network
Standardising research data policies, research data network
 
Why should researchers care about data curation?
Why should researchers care about data curation?Why should researchers care about data curation?
Why should researchers care about data curation?
 
Biositemaps: A Framework for Biomedical Resource Discovery
Biositemaps: A Framework for Biomedical Resource DiscoveryBiositemaps: A Framework for Biomedical Resource Discovery
Biositemaps: A Framework for Biomedical Resource Discovery
 
Ys Sivan Psgimsr Online Searching
Ys Sivan Psgimsr Online SearchingYs Sivan Psgimsr Online Searching
Ys Sivan Psgimsr Online Searching
 
Rii stock centerdir_aug9_2016
Rii stock centerdir_aug9_2016Rii stock centerdir_aug9_2016
Rii stock centerdir_aug9_2016
 
Text mining in biomedical domain with emphasis on document clustering
Text mining in biomedical domain with emphasis on document clusteringText mining in biomedical domain with emphasis on document clustering
Text mining in biomedical domain with emphasis on document clustering
 

Presentation

  • 1. Text mining Text mining the PCD literature PCD validity Uses and Validity of Primary Care Database studies May 2013 David Springate, Evan Kontopantelis, Ivan Olier, David Reeves May 2013 Uses and Validity of Primary Care Database studies
  • 2-7. Outline: 1 Use of text-mining to explore the scientific literature. 2 Text-mining the PCD literature: What is being studied using PCDs? Changes in topics of investigation over time. 3 Validity of clinical coding. 4 ClinicalCodes.org: a new repository for clinical code lists.
  • 8. Text mining
  • 9-14. What is it? The process of extracting high-quality structured information from unstructured text (e.g. the scientific literature). Uses a variety of computational and statistical methods to find patterns and trends in text. Text mining consists of: 1 Information extraction: automatically extracting structured information from unstructured text. 2 Semantic searching: improves search accuracy by incorporating context into a search. 3 Knowledge discovery: identifying relationships in extracted data.
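As a small illustration of the "information extraction" step, the sketch below pulls one structured field (which primary care database an abstract mentions) out of free text. The database names come from the slides; the matching rules, function names and example abstracts are assumptions made for illustration, not the authors' pipeline.

```python
import re
from collections import Counter

# Assumed matching rules. "THIN" is matched case-sensitively so that the
# ordinary word "thin" is not counted as a database mention.
DATABASE_PATTERNS = {
    "CPRD/GPRD": re.compile(
        r"\b(?:CPRD|GPRD)\b"
        r"|General Practice Research Database"
        r"|Clinical Practice Research Datalink",
        re.IGNORECASE,
    ),
    "THIN": re.compile(r"\bTHIN\b|(?i:the health improvement network)"),
    "QResearch": re.compile(r"\bQResearch\b", re.IGNORECASE),
}


def extract_databases(abstract):
    """Return the set of database labels mentioned in one abstract."""
    return {
        name
        for name, pattern in DATABASE_PATTERNS.items()
        if pattern.search(abstract)
    }


def count_database_mentions(abstracts):
    """Count, per database, how many abstracts mention it at least once."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(extract_databases(abstract))
    return counts
```

Calling `count_database_mentions` on a list of abstract strings gives a per-database tally, the kind of structured summary that can then be analysed further. Real abstracts would need more robust patterns than these.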
  • 15-20. Why do we need it? The scientific literature is rapidly increasing in size. Humans can’t keep up to date with the literature: 75 trials and 11 systematic reviews are published per day (Bastian et al. (2010) PLoS Medicine). It is increasingly difficult to home in on relevant papers. More of the literature is being held online in machine-readable archives. TM can reduce processing time for systematic reviews by 80% (NCTM).
  • 21-22. Text-mining is not a magic bullet. Many publications are not open access. Often need to rely on abstracts. Grey literature is often inaccessible. Still need plenty of human input! TM algorithms can be very complex. Breadth at the expense of depth.
  • 23. Text mining the PCD literature
  • 24. UK Primary Care Databases. GPRD / CPRD (The General Practice Research Database / The Clinical Practice Research Datalink): ~900 papers. THIN (The Health Improvement Network): ~360 papers. QResearch: ~75 papers.
  • 25-29. The Dataset. All articles reported by CPRD, THIN, QResearch in PubMed. 1185 abstracts with metadata. 141 full-text articles for validation. How are PCDs being used by researchers?
  • 30. PCD studies are a growth area! Number of publications is rapidly increasing... [Figure: PCD articles in PubMed; x-axis: year (1990 to 2010); y-axis: number of articles (0 to 150)]
  • 31. PCD studies are a growth area! ...and there is global interest in UK PCD research. [Figure: Institutions affiliated with UK PCD publications]
  • 32. Broad scope of topics in PCD studies. [Figure: A network graph of PCD topics of investigation. Node labels: Cancer1, Fractures/osteo, VTE, antipsychotics/smi, Diabetes, Asthma, NSAIDs, HRT, Flu vaccination, Pregnancy, CHD/antihypertensives, Stroke, Pneumonia, Statins, Psoriasis, Antibiotics, Steroids, Atrial/warfarin, Epilepsy, Antidepressants, Paracetamol, Heart attack, IBS, BMI/obesity, Kidney disease, Cancer2, Seizures, Auto-immune, COPD, Healthcare costs, Beta blockers]
  • 33. Study types are changing... [Figure: records per year, 1990 to 2010, in panels by study type: Associations, Benefits, Effectiveness, Epidemiology, Harms and risks, Healthcare costs, Misc, Predictions, Validity]
  • 34. ...as are analysis methods. [Figure: records per year, 1990 to 2010, in panels by analysis method: Bayesian etc., Descriptives only, Misc, Mixed-effects, RCT comparisons, Regression models, Simulations, Survival analysis]
  • 35. PCD validity
  • 36-39. Threats to validity: Unmeasured confounding. Correlation does not equal causation. GP recording. Clinical coding.
  • 40-43. Clinical Coding in PCDs. All clinical events are entered by GPs as clinical codes: symptoms, signs & diagnoses (READ codes); referrals to external care centres; immunisation records; prescription information; diagnostic test records and results. Everything recorded by a GP can be identified (if you know which codes to look for and where to look for them!), e.g. H331.00 - Asthma diagnosis; H33z011 - Severe asthma attack; 33G1 - Spirometry testing.
  • 44. Clinical codes in PCD studies. Diagnoses are made by reference to a set of clinical codes. Workflow: 1 Researchers decide on a rough set of codes for a condition, by searching lookup tables for matching terms or by reference to an external source (e.g. QOF). 2 Clinicians go through this draft list by hand and select the relevant codes. 3 The database is searched for events matching the finalised code list. 4 The correct combination of events in the timeframe of interest gives a diagnosis, e.g. for asthma: need 1+ clinical event and 1+ drug event in the last year to qualify.
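The four workflow steps on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the lookup table holds only the three codes quoted on the slides, the `Event` record layout and function names are hypothetical, and the asthma rule is read as "at least one clinical event and at least one drug event, both within the last year".

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Lookup table (code -> description) using only the codes quoted on the slides.
LOOKUP = {
    "H331.00": "Asthma diagnosis",
    "H33z011": "Severe asthma attack",
    "33G1": "Spirometry testing",
}


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; real PCD tables are structured differently."""
    patient_id: int
    code: str
    kind: str  # "clinical" or "drug"
    when: date


def draft_code_list(lookup, term):
    """Step 1: draft a code list by searching descriptions for a term."""
    term = term.lower()
    return {code for code, desc in lookup.items() if term in desc.lower()}


def finalise_code_list(draft, clinician_approved):
    """Step 2: keep only the draft codes that clinicians selected by hand."""
    return draft & clinician_approved


def patients_with_diagnosis(events, clinical_codes, drug_codes, as_of,
                            window_days=365):
    """Steps 3 and 4: match events against the code lists, then apply the rule.

    A patient qualifies with at least one matching clinical event and at
    least one matching drug event inside the window ending at `as_of`.
    """
    start = as_of - timedelta(days=window_days)
    with_clinical, with_drug = set(), set()
    for ev in events:
        if not (start <= ev.when <= as_of):
            continue  # outside the timeframe of interest
        if ev.kind == "clinical" and ev.code in clinical_codes:
            with_clinical.add(ev.patient_id)
        elif ev.kind == "drug" and ev.code in drug_codes:
            with_drug.add(ev.patient_id)
    return with_clinical & with_drug
```

The drug code list is passed in separately because the slides do not give any drug codes; any value used for it in practice would have to come from a prescribing lookup table.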
  • 45-48. Code list? What code list? Currently no obligation to publish code lists. No centralised repository for clinical codes. The vast majority of PCD studies do not publish their codes. No way of knowing if a condition diagnosis is valid. No way to replicate the research. For example, in 45 UK case-control PCD studies (diabetes): only 5 reported ANY clinical codes; only 2 of these published codes in an appendix; only 1 provided a full set of code lists.
  • 49-51. Validity of Clinical coding. Clinical codes should be subject to scrutiny and peer review (either pre- or post-publication). This would allow for: replication of studies; validation of diagnoses; incremental improvements to clinical definitions.
  • 52. ClinicalCodes.org is an online repository for PCD researchers to upload their codes upon publication. Deposit code lists for published studies. Download historical code lists. Archive for all Quality and Outcomes Framework business rules (2004 - current). Database-specific information (e.g. consultation types).
  • 53. ClinicalCodes.org allows for: validation / replication of PCD studies; tracking of disease definitions through time; comparative studies of clinical codes. Don’t reinvent the wheel! Currently in development on campus: medcodes.ls.manchester.ac.uk:8080/codesdb
  • 54-56. Summary: Publish open access! Upload your codes! Thank you