Pseudonymised GP data extraction 
for risk stratification 
CCIO Leaders Network Annual Conference 
04 November 2014 
Dr Brendan O’Brien 
Consultant Clinical Informatics Specialist (CCIO) 
Regional Health & Social Care Board, NI.
NI Context 
Population: 1.8 million 
Integrated Health & Social Care 
Budget ~ £4.5 billion 
Purchaser-provider split 
Commissioners: HSCB & PHA 
Providers: 5 Acute Trusts, Regional Ambulance Service 
351 GP practices
Clinical Priority Areas 
Patients risk stratified according to complexity of 
needs. 
Four priority areas set by DHSSPS. 
• Frail elderly patients 
• Respiratory patients 
• Diabetic patients 
• Patients with a history of stroke / TIA
Approach 
• Data pseudonymised at source (only the GP practice knows the patient's 
identity) 
• MIQUEST approach used for the data extraction 
• Only coded information extracted 
• HSCB writes the HQL code for the data extraction 
• GP practices run the HQL queries 
• HSCB conducts analysis on pseudonymised data set from GP practices 
• Stratified list for each clinical area returned to practices 
• Practices re-identify patients, add soft knowledge and proactively manage 
those patients with the greatest number of factors associated with unplanned 
admissions 
• HSCB will aggregate the extract for public health and commissioning uses
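The round trip above (pseudonymise at the practice, analyse centrally, re-identify at the practice) can be sketched as follows. This is an illustration only: the slides do not say how the pseudonyms are generated, so the keyed hash (HMAC-SHA256), the practice-held key and every function name here are assumptions, not the method actually used.

```python
import hashlib
import hmac


def pseudonymise(patient_id: str, practice_key: bytes) -> str:
    """Return a keyed hash of the identifier (assumed scheme).

    The key is assumed to stay inside the practice, so nobody outside
    the practice can recompute or reverse the pseudonym.
    """
    return hmac.new(practice_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def build_lookup(patient_ids, practice_key: bytes) -> dict:
    """Practice-side table mapping pseudonym -> real identifier."""
    return {pseudonymise(pid, practice_key): pid for pid in patient_ids}


def reidentify(stratified_pseudonyms, lookup: dict) -> list:
    """Map a returned stratified list back to real identifiers.

    Pseudonyms that do not belong to this practice are skipped.
    """
    return [lookup[p] for p in stratified_pseudonyms if p in lookup]
```

Only the pseudonyms and coded data would leave the practice in this sketch; the lookup table and the key stay local, which is what makes re-identification possible for the practice alone.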
Agreement between HSCB & each Practice 
Covers sharing of 
anonymised and 
pseudonymised 
patient level data.
MIQUEST 
• Free software for extracting and aggregating data 
from all GP systems 
• Incorporates strong security and confidentiality 
safeguards 
• Adopted as standard within the NHS for data 
extraction 
• Part of RFA99 (Requirements for Accreditation)
LOCAL ENQUIRER QUERY 
• Import query 
• Run query 
• Output response 
• Use the data 
GP Practice 
REMOTE ENQUIRER QUERY 
• Import the query 
• Authorise the query 
• Run query 
• Authorise response 
• Output response 
• Use the data 
HSCB or other enquirer
MIQUEST 
CTV 3 
Lots of quirks in individual systems
HQL Data Model
Main Factors 
• Emergency admissions in past 12 months 
• Polypharmacy 
• High risk drugs: NSAIDs, diuretics, anticoagulants, 
antiplatelets 
• Number of comorbidities (20 registers) 
• Test results 
• Smoking status, alcohol intake 
• ‘Soft knowledge’ added by GP Practice
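A minimal sketch of ranking patients by the number of factors present, in line with the stated aim of managing those with the greatest number of factors. The record fields, the thresholds (5 or more items for polypharmacy, 2 or more conditions for comorbidity) and the equal weighting of factors are all assumptions for illustration; the slides give no scoring rule, and test results and alcohol intake are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    # All field names are hypothetical; they mirror the factors listed above.
    pseudonym: str
    emergency_admissions_12m: int
    max_unique_items: int           # polypharmacy measure
    high_risk_drug_categories: int  # NSAIDs, diuretics, anticoagulants, antiplatelets
    comorbidities: int              # number of disease registers the patient is on
    current_smoker: bool


def count_factors(record: PatientRecord,
                  polypharmacy_threshold: int = 5,
                  comorbidity_threshold: int = 2) -> int:
    """Count how many factors are present (thresholds are assumed)."""
    flags = [
        record.emergency_admissions_12m > 0,
        record.max_unique_items >= polypharmacy_threshold,
        record.high_risk_drug_categories > 0,
        record.comorbidities >= comorbidity_threshold,
        record.current_smoker,
    ]
    return sum(flags)


def stratify(records):
    """Return records ordered from most factors to fewest."""
    return sorted(records, key=count_factors, reverse=True)
```

The 'soft knowledge' step is deliberately not modelled: it is added by the practice after the stratified list comes back.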
Challenges 
• No validated risk prediction algorithm for NI 
population 
• Care.data = increased stakeholder concern 
• 4 GP Systems, 3 coding schemes 
– Vision: Read Version 2 
– EMIS PCS: Read Version 2 
– EMIS LV: Read Version 2 + proprietary codes 
– Crosscare (Merlok): Clinical Terms Version 3
Pseudonymisation At Source 
We’re not 
Brazil 
CARE.DATA 
We’re 
Northern 
Ireland.
Progress to date 
• 1,124,040 patient records extracted so far 
• 1,692 patients excluded from extract (0.15%) 
– Due to controversy around care.data, the decision was 
made to exclude data on patients whose clinical record 
contained a Read code opting them out of the Emergency 
Care Summary (ECS) extract. 
• 92% of GP practices signed up to the enhanced service 
– Covering 93% of NI population 
– EMIS LV and Merlok extractions this month
Polypharmacy 
Highest number of unique items in any single month (July to December 2013) 

Unique items    Percentage of the patient population included in the extraction 
0 items         36.16% 
1-4 items       47.46% 
5-9 items       12.94% 
10+ items       3.44%
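The banding in this table can be reproduced with a few lines of code. The sketch below assumes the input is, for each patient, the highest number of unique items in any single month; the band boundaries come from the slide, while the function names are hypothetical.

```python
from collections import Counter


def polypharmacy_band(max_items: int) -> str:
    """Assign a patient to one of the four bands used on the slide."""
    if max_items < 0:
        raise ValueError("item count cannot be negative")
    if max_items == 0:
        return "0 items"
    if max_items <= 4:
        return "1-4 items"
    if max_items <= 9:
        return "5-9 items"
    return "10+ items"


def band_percentages(max_items_per_patient) -> dict:
    """Percentage of patients in each band, rounded to two decimal places.

    Bands with no patients are omitted from the result.
    """
    counts = Counter(polypharmacy_band(n) for n in max_items_per_patient)
    total = sum(counts.values())
    return {band: round(100 * c / total, 2) for band, c in counts.items()}
```

As a consistency check, the four published percentages (36.16, 47.46, 12.94 and 3.44) sum to 100.00.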
High Risk Drugs 
Number of high risk drug categories from which patients have received 
medication in period July to December 2013 

Categories      Percentage of the patient population included in the extraction 
0               79.01% 
1               16.98% 
2               3.70% 
3               0.30% 
4               0.01%
Morbidity (number of conditions) by Age Band 
[Chart: morbidities and comorbidities by age band]
Thanks for listening 
Email: 
brendan.obrien@hscni.net 
Twitter: 
@drbrendanobrien

Information Sharing: Sharing information across health and social care. Pseudonymised GP data extraction for risk stratification - Dr. Brendan O'Brien
