Seminar on Monday, March 5th 2018, by BigInsight and Statistics Norway. Presentation by Johan Gustav Bellika: The Norwegian Primary Care Research Network IT infrastructure, the Snow system
A Survey: Privacy Preserving Using Obfuscated Attribute In e-Health Cloud - rahulmonikasharma
Cloud computing nowadays provides numerous benefits to its users. But because the cloud infrastructure is not directly under the user's control, it is difficult for users to ensure strong security. On the other side, as the number of users grows it becomes even harder to manage data in such a way that every user's data needs are satisfied efficiently, and there are many opportunities for user data to be misused. Cloud providers therefore need to balance two fundamentals, privacy handling and efficient analysis of data, and doing both together has become very important. When patient or medical-firm health records are held on a remote machine, record privacy is addressed through the anonymization fundamental; various researchers have proposed the t-closeness technique to achieve this goal. It is also important to secure the stored data using an obfuscation mechanism. Because full obfuscation of a file can consume considerable time, many researchers have proposed attribute-based obfuscation schemes, which lessen the burden on the cloud server while providing adequate security and also help execute user queries faster. In this paper we aim to provide a survey of the various fundamentals given by different researchers.
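The t-closeness requirement mentioned above can be illustrated with a small check: each equivalence class's distribution of sensitive values must stay within a threshold t of the overall distribution. A minimal sketch, using total variation distance as a simplified stand-in for the earth mover's distance used in the original t-closeness formulation; the function names and data are hypothetical:

```python
from collections import Counter

def distribution(values):
    """Empirical distribution of a list of sensitive values."""
    n = len(values)
    return {v: c / n for v, c in Counter(values).items()}

def tv_distance(p, q):
    """Total variation distance between two distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

def satisfies_t_closeness(groups, all_values, t):
    """Every equivalence class's sensitive-value distribution must lie
    within distance t of the table-wide distribution."""
    overall = distribution(all_values)
    return all(tv_distance(distribution(g), overall) <= t for g in groups)
```

For example, two groups that each mirror the overall disease mix pass a tight threshold, while a group containing only one disease value fails it.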
Presentation given by Kate LeMay at the 'Sharing Health-y Data: Challenges and Solutions' workshop, held at The Menzies Research Institute (Hobart, Tasmania) on 28th June 2016. The event was co-hosted by ANDS and the University of Tasmania library
A description of BRISSKit, an open source tool that may be used to combine datasets held in different locations and analyse them for research purposes. Talk given by Jonathan Tedds of the University of Leicester for the Data Management in Practice workshop, which took place on Nov 14th 2013 at the London School of Hygiene and Tropical Medicine
Presentation given by Brian Stokes about the work of the Tasmanian Data Linkage Unit. Given during the 'Sharing Health-y data: Challenges and Solutions' workshop held at the Menzies Research Institute in Hobart, Tasmania, on 28th June 2016.
Intelligent data analysis for medicinal diagnosis - IRJET Journal
The document describes a proposed privacy-preserving patient-centric clinical decision support system called PPCD that uses naive Bayesian classification to help doctors predict disease risks for patients in a privacy-preserving manner. PPCD allows medical diagnosis and prediction of disease risks for new patients without leaking any individual patient medical information. It utilizes historical medical information from past patients, stored privately in the cloud, to train a naive Bayesian classifier. This trained classifier can then be used to diagnose diseases for new patients based on their symptoms while preserving privacy. The system also introduces a new aggregation technique called additive homomorphic proxy aggregation to allow training of the naive Bayesian classifier without revealing individual patient medical records.
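The aggregation idea behind PPCD-style training can be illustrated with a toy sketch. This is not the paper's additive homomorphic proxy aggregation scheme; it substitutes plain additive secret sharing to show how per-hospital naive Bayes counts can be summed without any single aggregation server seeing an individual hospital's count. All names and numbers below are illustrative:

```python
import random

PRIME = 2**61 - 1  # modulus for additive secret sharing

def share(value, n_parties):
    """Split a private count into n additive shares mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

# Three hospitals each hold a private count of (symptom=fever, disease=flu)
counts = [12, 7, 20]

# Each hospital sends one share to each of three aggregation servers
per_server = [[], [], []]
for c in counts:
    for server, s in zip(per_server, share(c, 3)):
        server.append(s)

# Servers publish only partial sums; combining them yields the global
# count needed to train the naive Bayesian classifier, nothing more
partials = [sum(s) % PRIME for s in per_server]
total = sum(partials) % PRIME  # equals 12 + 7 + 20 = 39
```

No server ever holds more than one random-looking share per hospital, yet the combined total is exactly the statistic the classifier needs.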
How Data Commons are Changing the Way that Large Datasets Are Analyzed and Sh... - Robert Grossman
Data commons are emerging as a solution to challenges in analyzing and sharing large biomedical datasets. A data commons co-locates data with cloud computing infrastructure and software tools to create an interoperable resource for the research community. Examples include the NCI Genomic Data Commons and the Open Commons Consortium. The open source Gen3 platform supports building disease- or project-specific data commons to facilitate open data sharing while protecting patient privacy. Developing interoperable data commons can accelerate research through increased access to data.
Anonymizing and Confidential Databases for Privacy Protection Using Suppressi... - Editor IJCATR
The technique of k-anonymization has been proposed in the literature as an alternative way to release public information while ensuring both data privacy and data confidentiality. Suppose "X" owns a k-anonymous database and needs to determine whether that database, after insertion of a tuple owned by "Y", is still k-anonymous. Clearly, allowing "X" to directly read the contents of the tuple breaks the privacy of "Y": "Y"'s information would be accessed by "X" without "Y"'s prior knowledge. On the other hand, the confidentiality of the database managed by "X" is violated once "Y" has access to the contents of the database. The problem, then, is to check whether the database with the inserted tuple is still k-anonymous without letting "X" learn the contents of the tuple or "Y" learn the contents of the database. In this paper, we propose two protocols solving this problem, suppression-based and generalization-based k-anonymous and confidential databases, demonstrated through a prototype architecture. Both protocols maintain privacy and confidentiality in the k-anonymous database.
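The k-anonymity property the protocols must verify can be sketched in the clear (without the privacy-preserving machinery that is the paper's actual contribution): every combination of quasi-identifier values must occur in at least k records. The function name and record layout here are assumptions for illustration:

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination covers >= k records,
    i.e. each record hides among at least k-1 indistinguishable others."""
    combos = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in combos.values())
```

Inserting a new tuple and re-running the check models the decision "X" must make; the protocols in the paper compute the same answer without either party revealing its data.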
Current trends in data security nursing research ppt - Nursing Path
The document discusses current trends in data security. It begins by defining data security and its goals of confidentiality and integrity. Traditional SQL-based access control and views are described as having limitations. Two main attacks are discussed: SQL injection due to poor application implementation of security policies, and unintended information leakage when published data is combined from multiple sources. Current research topics aim to address leakage, enforce complex privacy policies, and allow secure sharing of data through techniques like encryption and secure computation. The challenges of moving policy implementation closer to the database are also discussed.
Privacy Preserving Databases: how they are managed, built and secured, with an introduction to the main methods of anonymization techniques, PPDB data mining, P3P and Hippocratic DBs.
Focusing on the health vertical, consistent with the Open Knowledge Networking initiative. Also see a relevant review of the Contextualized Knowledge Graph portal: https://www.slideshare.net/ntkimvinh7/ckg-portal-a-knowledge-publishing-proposal-for-open-knowledge-network
VOLUME-7, ISSUE-8, AUGUST 2019, International Journal of Research in Advent Technology (IJRAT), ISSN: 2321-9637 (Online). Published By: MG Aricent Pvt Ltd
Privacy preserving in data mining with hybrid approach - Narendra Dhadhal
The document discusses privacy preserving techniques in data mining. It outlines various privacy preserving approaches like randomization, encryption, and anonymization. K-anonymization is described as an important anonymization technique that involves generalization and suppression of data to ensure each record is indistinguishable from at least k-1 other records. The document also reviews several research papers on privacy preserving data mining and discusses issues like homogeneity attacks with k-anonymization. A hybrid approach combining k-anonymization with perturbation is proposed to better protect sensitive data privacy.
Performance Analysis of Hybrid Approach for Privacy Preserving in Data Mining - idescitation
Nowadays data sharing between two organizations is common in many application areas, such as business planning or marketing. When data are to be shared between parties, there may be some sensitive data which should not be disclosed to the other parties. Medical records are especially sensitive, so privacy protection is taken more seriously: as required by the Health Insurance Portability and Accountability Act (HIPAA), it is necessary to protect the privacy of patients and ensure the security of medical data. To address this problem, released datasets must unavoidably be modified. We propose and implement a method called the hybrid approach for privacy preserving. First we randomize the original data; then we apply generalization to the randomized data. This technique protects private data with better accuracy, can reconstruct the original data, and provides data with no information loss, preserving the usability of the data.
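The two steps of the hybrid approach above, randomize then generalize, can be sketched roughly as follows. This is an illustrative toy for a single numeric attribute, not the paper's implementation; the noise spread and range width are arbitrary choices:

```python
import random

def randomize_age(age, spread=3):
    """Step 1: perturb the numeric value with bounded random noise."""
    return age + random.randint(-spread, spread)

def generalize_age(age, width=10):
    """Step 2: replace the (noisy) value with its enclosing range."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def hybrid_anonymize(ages):
    """Randomization followed by generalization, as in the hybrid scheme."""
    return [generalize_age(randomize_age(a)) for a in ages]
```

An attacker seeing only a range like "30-39" learns neither the true age nor even which decade it falls in with certainty, since the noise may push a value across a range boundary.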
Survey on Medical Data Sharing Systems with NTRU - IRJET Journal
The article discusses international issues. It mentions that globalization has increased economic interdependence between nations while also raising tensions over immigration and trade. Solutions will require cooperation and compromise and a recognition that isolationism is not a viable strategy in an interconnected world.
This is module 2 in the EDI Data Publishing training course. In this module, you will learn about the Environmental Data Initiative, the project that created these trainings. EDI operates the EDI Data Repository and has curators on staff to help scientists deposit their data.
Using Randomized Response Techniques for Privacy-Preserving Data Mining - 14894
This document proposes using randomized response techniques to conduct privacy-preserving data mining and build decision tree classifiers from disguised data. It presents a method called Multivariate Randomized Response (MRR) that extends randomized response to handle multiple attributes. Experiments show that while the data is disguised, decision trees built from it can still achieve high accuracy compared to trees built from original data, if the randomization parameter is chosen appropriately. The accuracy is affected by this randomization parameter.
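The core of randomized response can be shown in a few lines: each respondent answers truthfully with probability p and lies otherwise, and the analyst inverts the known randomization to estimate the true population proportion. This is a minimal Warner-style sketch for a single binary attribute, not the multivariate MRR method the document proposes:

```python
import random

def randomized_answer(truth, p=0.8):
    """Respond truthfully with probability p, lie otherwise."""
    return truth if random.random() < p else not truth

def estimate_true_proportion(answers, p=0.8):
    """Invert the randomization.  Observed yes-rate lam satisfies
    lam = p*pi + (1-p)*(1-pi), so pi = (lam - (1-p)) / (2p - 1)."""
    lam = sum(answers) / len(answers)
    return (lam - (1 - p)) / (2 * p - 1)
```

As the abstract notes, the choice of the randomization parameter p controls the trade-off: p near 1 gives accurate estimates but little privacy, while p near 0.5 gives strong privacy but noisy estimates.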
The document discusses open data and data sharing, including defining open data, the benefits of open data, overcoming barriers to opening data such as concerns about scooping and sensitive data, best practices for making data open through formats, licensing and description, and the role of research databases and data citation in promoting open data.
Trust threads: Provenance for Data Reuse in Long Tail Science - Beth Plale
Invited Colloquium talk, Apr 23, 2015, Dept of Information and Library Science, School of Informatics and Computing, Indiana University. Abstract: The world contains a vast amount of digital information which grows vaster ever more rapidly. This makes it possible to do many things on an unprecedented scale: spot social trends, prevent diseases, increase fresh water supplies, accelerate innovation, and so on. As science and technology innovation is essential to improved public health and welfare, the growing sources of data can unlock more secrets. But the rapid growth of data makes accountability and transparency of research increasingly difficult. Data that are not adequately described are not useable except within the research lab that produced it. Data that are intentionally or unintentionally inaccessible or difficult to access and verify are not available to contribute to new forms of research. In this talk I show that data can carry with it thin threads of information that connect it to both its past and its future, forming its lineage particularly as it transitions into a shareable dataset residing in a public repository. In carrying this minimal provenance, the data becomes more trustworthy. This thread of trust is a critical element to the successful sharing, use, and reuse of big data in science and technology research in the future.
This document discusses licensing research data for reuse. It begins by providing a scenario where a user has downloaded a dataset but is unsure what they can do with the data due to licensing. It then discusses that licensing is critical to enabling data reuse and citation. It provides information on AusGOAL, the Australian open access and licensing framework, and notes it is recommended for data publishing by ANDS partners. It also includes links to licensing guides and FAQs. In summary, the document emphasizes the importance of data licensing for enabling reuse and outlines Australia's recommended licensing system.
A Survey Paper on an Integrated Approach for Privacy Preserving In High Dimen... - IJSRD
Data mining is a technique used for the extraction of knowledge and information from the large amounts of data collected by hospitals, governments and individuals. Data mining is also referred to as knowledge mining from databases. The major challenge in data mining is ensuring the security and privacy of data in databases, because data sharing is common at the organizational level. The data in databases comes from a number of sources, such as medical, financial, library, marketing and shopping records, so it is a foremost task to keep that data secure. The objective is to achieve fully privacy-preserved data without affecting data utility, i.e. how data is used or transferred between organizations so that data integrity remains in the database while sensitive and confidential data is preserved. This paper presents a brief study of different PPDM techniques such as randomization, perturbation, slicing and summarization, by use of which data privacy can be preserved. The technique with the best computational and theoretical outcome is chosen for privacy preserving in high-dimensional data.
The document outlines a lecture on privacy preserving data mining. It discusses the motivation for privacy preserving data mining, including the need to analyze sensitive individual data for applications like detecting fraud or disease outbreaks while maintaining privacy. It covers the scope, typical architecture involving modifying original data, common techniques like data perturbation and cryptographic methods, advantages like enabling large data sharing, and applications like securing medical databases. The conclusion emphasizes that privacy preserving data mining has become important for conducting analytics while respecting individuals' privacy rights.
Framework for efficient transformation for complex medical data for improving... - IJECEIAES
Various technological advancements have already been adopted in the healthcare sector. This adoption facilitates the involuntary generation of medical data that can be autonomously programmed to be forwarded to a destined hub in the form of cloud storage units. However, these technologies also produce massive amounts of complex medical data, which act as a significant overhead for analytical operations as well as unwanted storage utilization. Therefore, the proposed system implements a novel transformation technique that uses a template-based structure over the cloud to generate structured data from highly unstructured data in a non-conventional manner. The contribution of the proposed methodology is that it offers faster processing and storage optimization. The study outcome also shows that the proposed scheme performs better than existing data transformation schemes.
Pistoia Alliance conference April 2016: Big Data: Eric Little - Pistoia Alliance
The document discusses moving from simply analyzing large amounts of data (Big Data) to performing more advanced analysis (Big Analysis) by combining semantic technologies with traditional data science methods. It proposes that Big Analysis allows for a new approach to analysis by using both logic-based semantic reasoning and statistics-based reasoning to provide deeper insights from complex data. The dawn of Big Analysis represents a natural evolution from Big Data to Big Content by integrating different technologies for more informed decision making.
The National Cybersecurity Center of Excellence (NCCoE) at the National Institute of Standards and Technology is inviting feedback on a draft project to address cybersecurity challenges with wireless infusion pumps in hospitals. The project aims to identify security risks posed by connecting medical devices to networks and define solutions to protect the devices from malware or hacking. The NCCoE is collaborating with the Technological Leadership Institute and Minnesota medical providers on a use case that describes the challenge and desired security characteristics. The use case will be finalized and used to develop a practice guide with example solutions to securely deploy wireless infusion pumps.
Impact of big data congestion in IT: An adaptive knowledge-based Bayesian network - IJECEIAES
Recent progress in real-time systems is growing rapidly in information technology, which is showing its importance in every innovative field. Different applications in IT simultaneously produce enormous amounts of information that must be handled. In this paper, a novel adaptive knowledge-based Bayesian network algorithm is proposed to deal with the impact of big data congestion in decision processing. A Bayesian network model is used to manage knowledge organization for the decision-making process. Knowledge in Bayesian networks is routinely released as an optimal solution, where the analysis task is to find a structure that maximizes a statistically motivated score. Generally, available data tools handle this by means of standard search strategies; since these require an enormous search space, they are time-consuming and should be avoided, and the situation becomes critical once big data is involved in the search for an optimal solution. An algorithm is introduced to achieve faster processing of the optimal solution by restricting the search space, using a recursive calculation over the search space. The outcome demonstrates that the proposed algorithm can deal with big data within acceptable processing time and with a higher prediction rate.
Cluster Based Access Privilege Management Scheme for Databases - Editor IJMTER
Knowledge discovery is carried out using data mining techniques. Association rule mining, classification and clustering operations are carried out under data mining. The clustering method is used to group records based on relevancy, with distance or similarity measures used to estimate the transaction relationship. Census data and medical data are referred to as micro data. Data publishing schemes are used to provide private data for analysis. Privacy preservation is used to protect private data values, and anonymity is considered in the privacy preservation process.

Data values are made available to authorized users through access control models. The Privacy Protection Mechanism (PPM) uses suppression and generalization of relational data to anonymize and satisfy privacy needs. An accuracy-constrained privacy-preserving access control framework is used to manage access control in a relational database. The access control policies define the selection predicates available to roles, while the privacy requirement is to satisfy k-anonymity or l-diversity. An imprecision bound constraint is assigned to each selection predicate. k-anonymous Partitioning with Imprecision Bounds (k-PIB) is used to estimate accuracy and privacy constraints. Role-based Access Control (RBAC) allows defining permissions on objects based on roles in an organization. The Top Down Selection Mondrian (TDSM) algorithm is used for query workload-based anonymization; it is constructed using greedy heuristics and a kd-tree model. Query cuts are selected with minimum bounds in the Top-Down Heuristic 1 algorithm (TDH1). The query bounds are updated as partitions are added to the output in the Top-Down Heuristic 2 algorithm (TDH2). The cost of reduced precision in the query results is used in the Top-Down Heuristic 3 algorithm (TDH3). A repartitioning algorithm is used to reduce the total imprecision for the queries.

The privacy-preserved access privilege management scheme is enhanced to provide incremental mining features. Data insert, delete and update operations are connected with the partition management mechanism. Cell-level access control is provided with the differential privacy method. A dynamic role management model is integrated with the access control policy mechanism for query predicates.
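The top-down Mondrian-style partitioning that TDSM builds on can be sketched as a greedy median split that stops when a cut would leave a partition smaller than k. This is a bare illustration of the heuristic, assuming numeric quasi-identifiers, not the accuracy-constrained k-PIB algorithm itself:

```python
def span(records, dim):
    """Range of values a quasi-identifier takes within a partition."""
    vals = [r[dim] for r in records]
    return max(vals) - min(vals)

def mondrian_partition(records, dims, k):
    """Greedy top-down split: cut at the median of the widest dimension,
    keeping both halves at size >= k; stop when no allowable cut exists."""
    for dim in sorted(dims, key=lambda d: -span(records, d)):
        values = sorted(r[dim] for r in records)
        median = values[len(values) // 2]
        left = [r for r in records if r[dim] < median]
        right = [r for r in records if r[dim] >= median]
        if len(left) >= k and len(right) >= k:
            return (mondrian_partition(left, dims, k)
                    + mondrian_partition(right, dims, k))
    return [records]
```

Each resulting partition is then generalized to its bounding ranges; the TDH1-TDH3 variants differ in how they pick cuts to keep query imprecision within the assigned bounds.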
The Role of the FAIR Guiding Principles for an effective Learning Health System - Michel Dumontier
The learning health system (LHS) is an integrated social and technological system that embeds continuous improvement and innovation for the effective delivery of healthcare. A crucial part of the LHS lies in how the underlying information system will secure and take advantage of relevant knowledge assets towards supporting complex and unusual clinical decision making, facilitating public health surveillance, and aiding comparative effectiveness research. However, key knowledge assets remain difficult to obtain and reuse, particularly in a decentralized context. In this talk, I will discuss the role of the Findable, Accessible, Interoperable, and Reusable (FAIR) Guiding Principles towards the realization of the LHS, along with emerging technologies to publish and refine clinical research and knowledge derived therein.
Keynote given for 2021 Knowledge Representation for Health Care http://banzai-deim.urv.net/events/KR4HC-2021/
The Tryggve project facilitates cross-border biomedical research by providing secure computing services across the Nordic countries. These services allow sensitive human data to be analyzed while protecting individual privacy through secure data storage, transfer and computing environments. The goal is to enable research collaboration while preventing unauthorized access to personal health data.
The document discusses the context and goals of e-science and e-research, including enabling collaboration through distributed computation and data sharing. It provides examples of UK e-science initiatives like national centers and describes the role of the National e-Science Centre in Glasgow in supporting various projects through grid computing resources and expertise. Security challenges around authentication, authorization and auditing are discussed in the context of user-oriented and federated approaches.
This document discusses data safe havens and how they could potentially be incorporated into the European Open Science Cloud (EOSC) to enable research using sensitive data. It describes how data safe havens provide a secure environment for working with medical, social, and other restricted data according to national information governance policies. The document then outlines the Caldicott framework for governing health data research in the UK, as well as specific examples like the Farr Institute and NHS Scotland's approach. It discusses how data linkage projects are currently conducted securely in Scotland's national safe haven. Finally, it raises challenges around harmonizing different countries' information governance policies and ensuring the right support services and standards are in place to enable this kind of research at a European level.
A Survey Paper on an Integrated Approach for Privacy Preserving In High Dimen...IJSRD
Data mining is a technique used for extracting knowledge and information from the large amounts of data collected by hospitals, governments and individuals; it is also referred to as knowledge mining from databases. The major challenge in data mining is ensuring the security and privacy of data in databases, because data sharing is common at the organizational level. The data in databases comes from a number of sources - medical, financial, library, marketing, shopping records etc. - so keeping that data secure is a foremost task. The objective is to achieve fully privacy-preserved data without affecting data utility in databases, i.e. how data is used or transferred between organizations so that data integrity remains in the database while sensitive and confidential data is preserved. This paper presents a brief study of different PPDM techniques - randomization, perturbation, slicing, summarization etc. - by use of which data privacy can be preserved. The technique with the best computational and theoretical outcome is chosen for privacy preservation in high-dimensional data.
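As a minimal illustration of the randomization/perturbation technique the survey names, the sketch below adds bounded zero-mean noise to individual values while keeping aggregates approximately correct; the field names and figures are made up for the example:

```python
import random

def perturb(values, scale=10.0, rng=random):
    """Additive-noise perturbation: publish value + zero-mean uniform noise."""
    return [v + rng.uniform(-scale, scale) for v in values]

rng = random.Random(1)
salaries = [52_000, 61_500, 48_200, 70_000]          # true mean: 57,925
noisy = perturb(salaries, scale=5_000, rng=rng)
# Individual values are masked, but the aggregate survives approximately:
print(round(sum(noisy) / len(noisy)))
```

This is the core utility/privacy trade-off the survey discusses: larger noise scales hide individuals better but degrade the statistics recoverable from the published data.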
The document outlines a lecture on privacy preserving data mining. It discusses the motivation for privacy preserving data mining, including the need to analyze sensitive individual data for applications like detecting fraud or disease outbreaks while maintaining privacy. It covers the scope, typical architecture involving modifying original data, common techniques like data perturbation and cryptographic methods, advantages like enabling large data sharing, and applications like securing medical databases. The conclusion emphasizes that privacy preserving data mining has become important for conducting analytics while respecting individuals' privacy rights.
Framework for efficient transformation for complex medical data for improving...IJECEIAES
Various technological advancements have already been adopted in the healthcare sector. This adoption facilitates the involuntary generation of medical data that can be autonomously programmed to be forwarded to a destined hub in the form of cloud storage units. However, owing to such technologies there is massive formation of complex medical data, which acts as a significant overhead for analytical operations as well as unwanted storage utilization. Therefore, the proposed system implements a novel transformation technique that uses a template-based structure over the cloud to generate structured data from highly unstructured data in a non-conventional manner. The contribution of the proposed methodology is that it offers faster processing and storage optimization. The study outcome also shows that the proposed scheme performs better than existing data transformation schemes.
Pistoia Alliance conference April 2016: Big Data: Eric LittlePistoia Alliance
The document discusses moving from simply analyzing large amounts of data (Big Data) to performing more advanced analysis (Big Analysis) by combining semantic technologies with traditional data science methods. It proposes that Big Analysis allows for a new approach to analysis by using both logic-based semantic reasoning and statistics-based reasoning to provide deeper insights from complex data. The dawn of Big Analysis represents a natural evolution from Big Data to Big Content by integrating different technologies for more informed decision making.
The National Cybersecurity Center of Excellence (NCCoE) at the National Institute of Standards and Technology is inviting feedback on a draft project to address cybersecurity challenges with wireless infusion pumps in hospitals. The project aims to identify security risks posed by connecting medical devices to networks and define solutions to protect the devices from malware or hacking. The NCCoE is collaborating with the Technological Leadership Institute and Minnesota medical providers on a use case that describes the challenge and desired security characteristics. The use case will be finalized and used to develop a practice guide with example solutions to securely deploy wireless infusion pumps.
Impact of big data congestion in IT: An adaptive knowledgebased Bayesian networkIJECEIAES
Progress on real-time systems in information technology is growing rapidly, and such systems are important in every innovative field. Different IT applications simultaneously produce enormous amounts of data that must be handled. In this paper, a novel adaptive knowledge-based Bayesian network algorithm is proposed to deal with the impact of big data congestion in decision processing. A Bayesian network model is used to manage knowledge throughout the decision-making process. Knowledge in Bayesian networks is routinely expressed as an optimal structure, where the analysis task is to find a structure that maximizes a statistically motivated score. In general, available data tools search for this optimal structure using common search strategies. As this requires an enormous search space, it is a time-consuming method that should be avoided, and the situation becomes critical once big data is involved in the search for an optimal structure. An algorithm is introduced to achieve faster computation of the optimal structure by constraining the search space; it consists of a recursive calculation over the search space. The results demonstrate that the proposed algorithm can handle enormous data with reduced processing time and higher prediction rates.
Cluster Based Access Privilege Management Scheme for DatabasesEditor IJMTER
Knowledge discovery is carried out using data mining techniques. Association rule mining,
classification and clustering operations are carried out under data mining. The clustering method is used to group
records based on relevancy; distance or similarity measures are used to estimate transaction relationships.
Census data and medical data are referred to as microdata. Data publishing schemes are used to provide private data for
analysis. Privacy preservation is used to protect private data values, and anonymity is considered in the privacy
preservation process.
Access to data values is granted to authorized users through access control models. The Privacy Protection Mechanism
(PPM) uses suppression and generalization of relational data to anonymize it and satisfy privacy needs. An accuracy-constrained privacy-preserving access control framework is used to manage access control in relational databases. The
access control policies define the selection predicates available to roles, while the privacy requirement is to satisfy k-anonymity or l-diversity. An imprecision bound constraint is assigned to each selection predicate. k-anonymous
Partitioning with Imprecision Bounds (k-PIB) is used to estimate accuracy and privacy constraints. Role-Based Access
Control (RBAC) allows defining permissions on objects based on roles in an organization. The Top-Down Selection
Mondrian (TDSM) algorithm is used for query-workload-based anonymization; it is constructed using greedy
heuristics and a kd-tree model. Query cuts are selected with minimum
bounds in the Top-Down Heuristic 1 algorithm (TDH1). The query bounds are updated as partitions are added to the
output in the Top-Down Heuristic 2 algorithm (TDH2). The cost of reduced precision in the query results is used in the Top-Down Heuristic 3 algorithm (TDH3). A repartitioning algorithm is used to reduce the total imprecision for the queries.
The privacy-preserved access privilege management scheme is enhanced to provide incremental mining
features. Data insert, delete and update operations are connected with the partition management mechanism. Cell-level
access control is provided with the differential privacy method. A dynamic role management model is integrated with the
access control policy mechanism for query predicates.
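The k-anonymity requirement that these frameworks enforce can be checked with a short sketch; the record layout and field names below are hypothetical:

```python
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    """A table is k-anonymous if every combination of quasi-identifier
    values occurs at least k times (each record hides in a group of >= k)."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in groups.values())

records = [
    {"zip": "537**", "age": "20-29", "disease": "flu"},
    {"zip": "537**", "age": "20-29", "disease": "cancer"},
    {"zip": "537**", "age": "30-39", "disease": "flu"},
]
print(is_k_anonymous(records, ["zip", "age"], 2))  # the third record is a group of 1
```

Algorithms such as TDSM search for the partitioning (the zip/age generalizations above) that makes this check pass while keeping query imprecision within the stated bounds.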
The Role of the FAIR Guiding Principles for an effective Learning Health SystemMichel Dumontier
The learning health system (LHS) is an integrated social and technological system that embeds continuous improvement and innovation for the effective delivery of healthcare. A crucial part of the LHS lies in how the underlying information system will secure and take advantage of relevant knowledge assets towards supporting complex and unusual clinical decision making, facilitating public health surveillance, and aiding comparative effectiveness research. However, key knowledge assets remain difficult to obtain and reuse, particularly in a decentralized context. In this talk, I will discuss the role of the Findable, Accessible, Interoperable, and Reusable (FAIR) Guiding Principles towards the realization of the LHS, along with emerging technologies to publish and refine clinical research and knowledge derived therein.
Keynote given for 2021 Knowledge Representation for Health Care http://banzai-deim.urv.net/events/KR4HC-2021/
The Tryggve project facilitates cross-border biomedical research by providing secure computing services across the Nordic countries. These services allow sensitive human data to be analyzed while protecting individual privacy through secure data storage, transfer and computing environments. The goal is to enable research collaboration while preventing unauthorized access to personal health data.
The document discusses the context and goals of e-science and e-research, including enabling collaboration through distributed computation and data sharing. It provides examples of UK e-science initiatives like national centers and describes the role of the National e-Science Centre in Glasgow in supporting various projects through grid computing resources and expertise. Security challenges around authentication, authorization and auditing are discussed in the context of user-oriented and federated approaches.
This document discusses data safe havens and how they could potentially be incorporated into the European Open Science Cloud (EOSC) to enable research using sensitive data. It describes how data safe havens provide a secure environment for working with medical, social, and other restricted data according to national information governance policies. The document then outlines the Caldicott framework for governing health data research in the UK, as well as specific examples like the Farr Institute and NHS Scotland's approach. It discusses how data linkage projects are currently conducted securely in Scotland's national safe haven. Finally, it raises challenges around harmonizing different countries' information governance policies and ensuring the right support services and standards are in place to enable this kind of research at a European level
Security Issues in Biomedical Wireless Sensor Networks Applications: A SurveyIJARTES
Abstract: The use of wireless sensor networks in healthcare
applications is growing at a fast pace. Numerous applications
such as heart rate monitors, blood pressure monitors and
endoscopic capsules are already in use. To address the growing
use of sensor technology in this area, a new field known as
wireless body area networks has emerged. As most devices
and their applications are wireless in nature, security and
privacy are among the major areas of concern. Body
area networks can collect information about an individual's
health, fitness and energy expenditure, comprising body
sensors that communicate wirelessly with the patient's
control device for monitoring and external communication.
This paper presents the challenges of using wireless
sensor networks in the biomedical field and how to solve most of
these issues, analysing the different security strategies in
wireless sensor networks and proposing a system to give
the highest quality of medical care with full security and
reliability.
This is particularly the case for e-health monitoring applications for chronic patients, where patient
monitoring refers to continuous observation of a patient's condition (physiological and physical), traditionally
performed by one or several body sensors. The architecture of this system is based on medical sensors that
measure patients' physical parameters using wireless sensor networks (WSNs). These sensors transfer data
from patients' bodies over the wireless network to the cloud environment. The system aims to prevent delays
in the arrival of patients' medical information to healthcare providers; patients therefore receive high-quality
services, because the e-health smart system supports medical staff by providing real-time data gathering,
eliminating manual data collection, and enabling the monitoring of huge numbers of patients. We underline the
necessity of analysing data quality in e-health applications, especially concerning remote monitoring and
assistance of patients with chronic diseases.
A Data-centric perspective on Data-driven healthcare: a short overviewPaolo Missier
a brief intro on the data challenges associated with working with Health Care data, with a few examples, both from literature and our own, of traditional approaches (Latent Class Analysis, Topic Modelling) and a perspective on Language-based modelling for Electronic Health Records (EHR).
probably more references than actual content in here!
HPC and Precision Medicine: A New Framework for Alzheimer's and Parkinson'sinside-BigData.com
In this deck from the HPC User Forum in Tucson, Joe Lombardo from UNLV presents: HPC and Precision Medicine - A New Framework for Alzheimer's and Parkinson's.
"The University of Nevada, Las Vegas and the Cleveland Clinic Lou Ruvo Center for Brain Health have been awarded an $11 million federal grant from the National Institutes of Health and National Institute of General Medical Sciences to advance the understanding of Alzheimer's and Parkinson's diseases. In this session, we will present how UNLV's National Supercomputing Institute plays a critical role in this research by fusing brain imaging, neuropsychological and behavioral studies along with the diagnostic exome sequencing models to increase our knowledge of dementia-related and age-associated degenerative disorders."
Watch the video: https://wp.me/p3RLHQ-iws
Learn more: https://www.unlv.edu/news/release/unlv-receives-nih-grant-alzheimers-disease-research
and
http://hpcuserforum.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
ANDS health and medical data webinar 16 May. Storing and Publishing Health an...ARDC
Dr Jeff Christiansen (QCIF) introduced med.data.edu.au, a national facility to provide petabyte-scale research data storage, and related high-speed networked computational services, to Australian medical and health research organisations.
Webinar: https://www.youtube.com/watch?v=5jwBwDJrWAs
Jeff Christiansen Snippet: https://www.youtube.com/watch?v=PV_vuUKRm6w
Transcript: https://www.slideshare.net/AustralianNationalDataService/transcript-storing-and-publishing-health-and-medical-data-16052017
The document proposes a system for securely distributing and analyzing patient data from wireless medical sensor networks. It discusses distributing patient data across multiple database servers and using Paillier cryptography to perform statistical analysis without compromising privacy. The key contributions are preventing inside attacks by distributed storage and allowing data retrieval if a server is compromised through a re-encryption technique. The goals are to secure data transmission and storage as well as stop insiders from revealing private patient information.
IRJET-A Survey on provide security to wireless medical sensor dataIRJET Journal
This document discusses providing security for wireless medical sensor data. It first reviews related work on securing wireless medical sensor networks using cryptosystems like Paillier and ElGamal. It then proposes a system that uses these cryptosystems to encrypt and distribute patient data across multiple data servers. This would preserve patient privacy as long as no single server is compromised. The system aims to allow medical analysis of distributed encrypted data without revealing individual patient information.
A Survey on provide security to wireless medical sensor dataIRJET Journal
This document summarizes a survey on providing security for wireless medical sensor data. The survey examines existing approaches that use cryptosystems like Paillier and ElGamal to securely distribute patient data across multiple data servers. This prevents privacy compromises if a single data server is breached. The proposed system would use these cryptosystems to encrypt patient data captured by wireless medical sensors and split the encrypted data across several data servers. This would allow analysis of patient data without compromising privacy as long as not all servers are compromised.
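The additive homomorphism of the Paillier cryptosystem, which these papers rely on for statistical analysis over encrypted sensor data, can be illustrated with a toy sketch. The fixed small primes are for demonstration only; a real deployment would use >= 2048-bit keys from a vetted library:

```python
import math
import random

def keygen(p=104_729, q=1_299_709):
    """Toy Paillier key pair (tiny primes, illustration only)."""
    n = p * q
    n2 = n * n
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    g = n + 1
    # With g = n + 1, g^lam mod n^2 = 1 + lam*n, so L(g^lam) = lam mod n.
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)
    return n, g, lam, mu

def encrypt(n, g, m, rng=random):
    n2 = n * n
    r = rng.randrange(1, n)                     # random blinding factor
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, lam, mu, c):
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n) * mu % n

n, g, lam, mu = keygen()
c1 = encrypt(n, g, 120)            # e.g. one sensor reading
c2 = encrypt(n, g, 80)             # another reading
c_sum = (c1 * c2) % (n * n)        # multiplying ciphertexts adds plaintexts
print(decrypt(n, lam, mu, c_sum))  # 200
```

This is the property the distributed-storage schemes exploit: each server can aggregate ciphertexts without ever seeing an individual patient's values.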
Private & Secure Data Tx Presentation I (1).pptxKomal526846
This document proposes a system for securely transmitting private medical data using QR codes and cloud computing. It involves developing an Android and web application that allows users to register and login to store their personal medical records and generate a QR code. When scanned, the QR code would provide access to the encrypted medical records stored in the cloud. The goals are to use QR codes to store medical records privately, reduce time in hospitals by accessing records quickly, and encrypt records for security using cryptographic techniques like AES and MD5. The system architecture and hardware/software requirements are also outlined.
Case Study 4 by Anil Nayaki (Submission date: 12-Dec-2017)wendolynhalbert
Case Study 4
by Anil Nayaki
Submission date: 12-Dec-2017 02:04 PM (UTC-0800)
Submission ID: 892937126
File name: 12313_Anil_Nayaki_Case_Study_4_775965_1150984951.docx (9.57K)
Word count: 658
Character count: 3851
CASE STUDY 9
ST. LUKE'S HEALTH CARE SYSTEM
Hospitals have been some of the earliest adopters of wireless local area
networks (WLANs). The clinician user population is typically mobile and
spread out across a number of buildings, with a need to enter and access
data in real time. St. Luke's Episcopal Health System in Houston, Texas
(www.stlukestexas.com) is a good example of a hospital that has made
effective use of wireless technologies to streamline clinical work processes.
Its wireless network is distributed throughout several hospital buildings
and is used in many different applications. The majority of the St. Luke's
staff uses wireless devices to access data in real time, 24 hours a day.
Examples include the following:
• Diagnosing patients and charting their progress: Doctors and
nurses use wireless laptops and tablet PCs to track and chart patient
care data.
• Prescriptions: Medications are dispensed from a cart that is wheeled
from room to room. A clinician uses a wireless scanner to scan the
patient's ID bracelet. ...
The document discusses issues biomedical projects face when accessing clinical datasets due to disparate data formats. It presents a proposed solution of annotating clinical datasets with openEHR Archetypes, which are standards-based models of clinical concepts, to enable computer-based discovery of clinical information. The proposed technique involves transforming Archetypes into an "ontology of reality" by identifying clinical concepts and terminology codes to annotate datasets. This would allow complete clinical concepts, rather than just attributes, to be annotated and discovered from datasets.
Data Harmonization for a Molecularly Driven Health SystemWarren Kibbe
Seminar for Dr. Min Zhang's Purdue Bioinformatics Seminar Series. Touched on learning health systems, the Gen3 Data Commons, the NCI Genomic Data Commons, Data Harmonization, FAIR, and open science.
The Future: Overcoming the Barriers to Using NHS Clinical Data For Research P...Mark Hawker
The document summarizes the barriers to using clinical data from the UK National Health Service (NHS) for research purposes and potential solutions. It discusses issues with data quality, coding, and linking records across disconnected systems. However, integrated electronic health records could enable large cohort studies and clinical trials if privacy and security are ensured. The author proposes training for clinical and research staff on database design, standards, and information sharing to help align records and support strategic health research using NHS data.
Data Harmonization for a Molecularly Driven Health SystemWarren Kibbe
Maximizing the value of data, computing, data science in an academic medical center, or 'towards a molecularly informed Learning Health System. Given in October at the University of Florida in Gainesville
Patient Privacy Control for Health Care in Cloud Computing SystemIRJET Journal
This document describes a cloud-based healthcare system that aims to provide secure sharing of patient health information between healthcare providers while maintaining patient privacy. The system uses authentication and access control schemes along with encryption techniques like attribute-based encryption to restrict access to patient data based on user attributes. It presents the design of the system architecture, which includes patient, doctor and lab units. It also outlines the main algorithms used in the system for key generation, signing data, verifying signatures, and simulating transcript data to protect patient identities. The goal of the system is to enable secure telemedicine and remote diagnosis capabilities while ensuring patient privacy in distributed cloud healthcare computing.
HEALTH PREDICTION ANALYSIS USING DATA MININGAshish Salve
Data mining techniques are used for a variety of applications. In healthcare industry, datamining plays an important
role in predicting diseases. For detecting a disease number of tests should be required from the patient. But using data
mining technique the number of tests can be reduced. This reduced test plays an important role in time and performance.
This report analyses data mining techniques which can be used for predicting different types of diseases. This report reviewed
the research papers which mainly concentrate on predicting various disease
A Secure and Efficient Cloud centric Internet of Medical Things-Enabled Smart...suherashaik2003
This document outlines a student project proposal for developing a secure and efficient cloud-based medical data sharing system using public verifiability. The proposed system would use an escrow-free identity-based aggregate signcryption scheme to securely transmit medical data from sensors on a patient's body to a medical cloud server via a smartphone. This would provide security features like anonymity, integrity of stored data, and authentication. The project details technologies like Java, hardware requirements, and references several related works in cloud computing and IoT for healthcare.
Similar to BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Computation (20)
The European student survey (EUROSTUDENT) is an international questionnaire survey aimed at students in higher education, conducted every three years. Website: www.eurostudent.eu. On 23 August, SSB presented the figures at a breakfast seminar.
At the seminar, researchers from SSB talked about:
When will we pass 6 million inhabitants?
Will fertility continue to decline?
Which municipalities will see the largest population growth and decline?
When and where will the "wave of elderly" hit?
How many immigrants will be living in Norway in 2060?
BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Co...Statistisk sentralbyrå
Seminar Monday March 5th 2018 by BigInsight and Statistics Norway: Valuable knowledge can be obtained by combining data from two or more sources, but exchanging and linking data are often unacceptable due to confidentiality and privacy concerns. Consequently, important discoveries that are important to society could be hampered. Therefore, data and results need to be processed in a way that preserves privacy. New approaches and computing methods for analyzing data distributed across multiple data sources while protecting privacy are being developed. This seminar addresses these issues with several speakers connected to Norwegian Centre for E-health Research in Tromsø. The director of the Cancer Registry of Norway is also among the speakers.
BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Co...Statistisk sentralbyrå
Norwegian health registries collect data from 17 central and 54 clinical registries for purposes like disease assessment and prevention. There are concerns about safely linking these datasets while avoiding reidentification. An example showed one woman could be reidentified from her birth month, cervical exam dates and cancer diagnosis. To reduce this risk, dates were altered by removing days, changing months randomly by -4 to +4 months, and removing birth months. This "fuzzification" technique significantly reduced the reidentification risk according to tools like ARX. However, current national data platforms still rely heavily on trust rather than technical solutions, which is insufficient for large linked datasets. Better anonymization protocols are needed to balance open analysis and individual privacy.
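The "fuzzification" step described above can be sketched as follows; this is an illustrative reconstruction, not the registry's actual protocol:

```python
import random
from datetime import date

def fuzzify(d, rng=random):
    """Drop the day and shift the month by a random offset in [-4, +4],
    as in the reidentification-risk reduction described above."""
    offset = rng.randint(-4, 4)
    months = d.year * 12 + (d.month - 1) + offset   # month index since year 0
    return date(months // 12, months % 12 + 1, 1)   # day collapsed to 1

rng = random.Random(7)
exam = date(2015, 6, 23)
print(fuzzify(exam, rng))   # day removed, month shifted by at most 4
```

After this transformation, exact dates can no longer be linked across registries, which is what lowered the reidentification risk measured with tools like ARX.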
BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Co...Statistisk sentralbyrå
Seminar Monday March 5th 2018 by BigInsight and Statistics Norway: Presentation by Stein Olav Skrøvseth: The national role of the Norwegian Centre for E-health Research and its focus on Health data analytics
BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Co...Statistisk sentralbyrå
Seminar Monday March 5th 2018 by BigInsight and Statistics Norway: Presentation by Kassaye Yitbarek Yigzaw. Privacy-preserving collection and analyses of citizen-generated data.
BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Co...Statistisk sentralbyrå
Seminar Monday March 5th 2018 by BigInsight and Statistics Norway: Presentation by Kassaye Yitbarek Yigzaw. Distributed data analysis in the face of privacy concerns.
BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Co...Statistisk sentralbyrå
Seminar Monday March 5th 2018 by BigInsight and Statistics Norway: Presentation by Øyvind Langsrud and Johan Heldal. Modernization, big data and confidentiality at Statistics Norway.
Innvandrere i Norge 2017, presentasjon fra frokostseminar 11.12.2017Statistisk sentralbyrå
Gjennom 2017 har SSB publisert en rekke analyseartikler som gir en overordnet beskrivelse av innvandrere og deres norskfødte barn i Norge. 11. desember ble noen av disse analysene presentert på frokostseminar hos SSB, og dette er presentasjonen som ble brukt.
Presentasjon fra frokostseminar om kulturbruk og - vaner. Norsk kulturbarometer med tall fra 1990-tallet og frem til i dag. Oversikt over hvem som bruker ulike kulturtilbud som museum, konserter, kino, teater m.m.
Frokostseminar 24. mai 2017 hvor rapporten "Levekår blant innvandrere i Norge 2016" ble presentert. Rapporten er basert på intervjuer med innvandrere og sier mye om hvordan innvandrere har det i Norge i dag. Mer om rapporten her: http://www.ssb.no/innvandring-og-innvandrere/artikler-og-publikasjoner/innvandreres-velferd-og-levekar-slik-har-innvandrerne-i-norge-det
SSB: Fagseminar om innvandring og inntektsutvikling 16. mars 2017 Statistisk sentralbyrå
Inntekt er sammen med deltakelse i arbeidslivet og utdanning, de viktigste kriteriene for å måle integrering. SSB har tall på dette langt tilbake i tid, slik at vi kan følge utviklingen.
Program:
• Immigrants' incomes, Jon Epland, Division for Income and Wage Statistics
• After benefit receipt: what happens then? Tor Morten Normann, Division for Living Conditions Statistics
• Self-sufficiency among non-Nordic immigrants, Tom Kornstad, Research Department
• The importance of immigration for public finances in the decades ahead, Erling Holmøy, Research Department and member of the Brochmann 2 committee
On 14 December 2016, issue no. 4/2016 of the journal Samfunnsspeilet was published, a special issue on refugees in Norway. On this occasion Statistics Norway arranged a seminar where four speakers presented parts of the content of Samfunnsspeilet's special issue on refugees.
First Elisabeth Nørgaard spoke about refugees in Norway today; then Helge Næsheim presented the status of various refugee groups' participation in the labour market. After that, Minja Tea Dzamarija spoke about family reunification, before the seminar closed with a talk by Øivin Kleven on refugees' participation in local politics.
This presentation is thus a compilation of the four topics presented at the seminar in December 2016.
Statistics Norway's API to the StatBank (Statistikkbanken). Presented at Difi's Datadelingsforum on 31 August 2016. The streamed video is available at http://kartverket.23video.com/ssbs-api-mot-statistikkbanken
Breakfast seminar at SSB on 20 October 2015 on the occasion of World Statistics Day. SSB's talk on global challenges from a statistical point of view.
1. **Introduction to Jio Cinema**:
- Brief overview of Jio Cinema as a streaming platform.
- Its significance in the Indian market.
- Introduction to retention and engagement strategies in the streaming industry.
2. **Understanding Retention and Engagement**:
- Define retention and engagement in the context of streaming platforms.
- Importance of retaining users in a competitive market.
- Key metrics used to measure retention and engagement.
3. **Jio Cinema's Content Strategy**:
- Analysis of the content library offered by Jio Cinema.
- Focus on exclusive content, originals, and partnerships.
- Catering to diverse audience preferences (regional, genre-specific, etc.).
- User-generated content and interactive features.
4. **Personalization and Recommendation Algorithms**:
- How Jio Cinema leverages user data for personalized recommendations.
- Algorithmic strategies for suggesting content based on user preferences, viewing history, and behavior.
- Dynamic content curation to keep users engaged.
5. **User Experience and Interface Design**:
- Evaluation of Jio Cinema's user interface (UI) and user experience (UX).
- Accessibility features and device compatibility.
- Seamless navigation and search functionality.
- Integration with other Jio services.
6. **Community Building and Social Features**:
- Strategies for fostering a sense of community among users.
- User reviews, ratings, and comments.
- Social sharing and engagement features.
- Interactive events and campaigns.
7. **Retention through Loyalty Programs and Incentives**:
- Overview of loyalty programs and rewards offered by Jio Cinema.
- Subscription plans and benefits.
- Promotional offers, discounts, and partnerships.
- Gamification elements to encourage continued usage.
8. **Customer Support and Feedback Mechanisms**:
- Analysis of Jio Cinema's customer support infrastructure.
- Channels for user feedback and suggestions.
- Handling of user complaints and queries.
- Continuous improvement based on user feedback.
9. **Multichannel Engagement Strategies**:
- Utilization of multiple channels for user engagement (email, push notifications, SMS, etc.).
- Targeted marketing campaigns and promotions.
- Cross-promotion with other Jio services and partnerships.
- Integration with social media platforms.
10. **Data Analytics and Iterative Improvement**:
- Role of data analytics in understanding user behavior and preferences.
- A/B testing and experimentation to optimize engagement strategies.
- Iterative improvement based on data-driven insights.
End-to-end pipeline agility - Berlin Buzzwords 2024. Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change?", the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and the worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
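The abstract does not include code. As a hedged illustration of what schema metaprogramming can look like (the names and structure here are our assumptions, not the speaker's implementation), a downstream record type can be derived from an upstream schema so that field changes propagate without boilerplate edits:

```python
from dataclasses import make_dataclass, fields

# Upstream schema, defined once. Downstream schemas are *derived*
# from it, so adding a field upstream needs no boilerplate edits
# in every downstream job. (Illustrative names only.)
Upstream = make_dataclass("Upstream", [("user_id", str),
                                       ("country", str),
                                       ("plays", int)])

def derive(base, drop=(), add=()):
    """Build a new dataclass from `base`, dropping and adding fields."""
    kept = [(f.name, f.type) for f in fields(base) if f.name not in drop]
    return make_dataclass(base.__name__ + "Derived", kept + list(add))

# Downstream daily aggregate: drops user_id (privacy), adds a day column.
Daily = derive(Upstream, drop=("user_id",), add=[("day", str)])
row = Daily(country="SE", plays=3, day="2024-06-10")
```

Because `Daily` is generated from `Upstream`, a new upstream field appears downstream automatically while static typing of the generated dataclass is retained.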
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You... Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Computation
1. The Norwegian Primary Care Research
Network IT infrastructure: The Snow system
Johan Gustav Bellika
Professor, Nasjonalt senter for e-helseforskning (Norwegian Centre for E-health Research)
Professor II, Institutt for klinisk medisin (Department of Clinical Medicine), Faculty of Health Sciences, UiT
Seminar on Practical Privacy-Preserving Distributed Statistical Computations
2018.03.05
Dr. John Snow
(1813 – 1858)
2. Source: WMA Declaration of Helsinki. URL: https://www.wma.net/policies-post/wma-declaration-of-helsinki-ethical-principles-for-medical-research-involving-human-subjects/
http://chrisricecooper.blogspot.no/2015/02/photoessay-on-year-of-ram-by-asian.html
Declaration of Helsinki, Article 6
The primary purpose of medical research involving human subjects is to understand the causes, development and effects of diseases and improve preventive, diagnostic and therapeutic interventions (methods, procedures and treatments).
Even the best proven interventions must be evaluated continually through research for their safety, effectiveness, efficiency, accessibility and quality.
3. Declaration of Helsinki, Article 9
It is the duty of physicians who are involved in medical research to protect the life, health, dignity, integrity, right to self-determination, privacy, and confidentiality of personal information of research subjects.
Source: WMA Declaration of Helsinki. URL: https://www.wma.net/policies-post/wma-declaration-of-helsinki-ethical-principles-for-medical-research-involving-human-subjects/
Medical research should be privacy preserving!
4. Objectives for the research infrastructure
• Make participation in research projects easier and more efficient for the GPs
• Reuse health data in a safe and privacy preserving manner
• Complete research projects according to scheduled time and resource consumption
• Recruit 90-110 GP practices
• Cover 7.5% of the Norwegian population
6. What is Snow?
• A distributed system
• Enables collection and reuse of anonymous medical data
• Builds and maintains a national online epidemiology model
• Uses the epidemiology model to provide automated IT-based health services
• Enables privacy-preserving distributed computations on EHR data
• Directed at research, quality improvement, audit, disease surveillance, …
Source:http://upload.wikimedia.org/wikipedia/commons/f/f6/Vibrio_cholerae.jpg
7. Snow architecture
- enables coordinated computations on distributed resources
- a “collaborative Edge computing” infrastructure [1]
[Diagram: a central Snow coordination server (Coord) connected to several Snow servers (S). Legend: Coord = Snow coordination server; S = Snow server in a local health institution.]
Source: [1] Shi W, Cao J, Zhang Q, Li Y, Xu L. Edge Computing: Vision and Challenges. IEEE Internet Things J. October 2016;3(5):637–46.
8. Edge computing
“Edge computing refers to the enabling technologies allowing computation to be
performed at the edge of the network”[1].
Beneficial when data is:
• Too sensitive (health data)
• Too big (genetic data)
• Too competitive (data would expose the profile of the owner)
• +++
Source: [1] Shi W, Cao J, Zhang Q, Li Y, Xu L. Edge Computing: Vision and Challenges. IEEE Internet Things J. October 2016;3(5):637–46.
9. The computing entities
• The individual computing process – an “agent”
• One instantiation at each participating Snow server
• One unique communication address for each agent:
agent-user@snow-server-domain/mission_id
• Agents communicate with each other using XMPP messages
• Coordinated computations: “Missions” of multiple agents:
• One “main” coordinating agent
• Multiple computation agents performing computations in parallel
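The addressing scheme above is an XMPP JID whose resource part carries the mission id. As a hypothetical sketch (the helper names and domains are ours, not from the Snow code base), it can be expressed as:

```python
# Hypothetical sketch of the Snow agent addressing scheme:
# agent-user@snow-server-domain/mission_id, i.e. an XMPP JID whose
# resource part identifies the mission the agent belongs to.

def agent_address(agent_user: str, server_domain: str, mission_id: str) -> str:
    """Build the XMPP-style address for one agent in one mission."""
    return f"{agent_user}@{server_domain}/{mission_id}"

def parse_address(address: str) -> tuple[str, str, str]:
    """Split an agent address back into (user, domain, mission_id)."""
    user, rest = address.split("@", 1)
    domain, mission_id = rest.split("/", 1)
    return user, domain, mission_id

# One coordinated "mission": a main agent plus one computation
# agent per participating Snow server (example domains only).
mission = "flu-surveillance-2018"
main = agent_address("main-agent", "coordinator.example.org", mission)
workers = [agent_address("comp-agent", f"snow{i}.example.org", mission)
           for i in range(1, 4)]
```

Because the mission id lives in the JID's resource part, agents of different concurrent missions stay addressable on the same Snow server without extra routing logic.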
10. Agent distribution scheme
(Collaborative computations at the edges)
[Diagram: the main agent runs on the Snow coordinator and communicates over the health network with computation agents running on the Snow servers of three health institutions.]
11. Snow appliance box: the nodes of the network
• A small computer that fits everywhere
• Snow server software is pre-installed
• Very easy installation
• Remote system administration by the Snow team at UiT / NSE
• Removes the risk of affecting the stability or performance of operation-critical IT systems, i.e. the electronic health record system
• All data in the box are pseudonymised, for both patients and GPs
• Agents compute on the box
12. Data flow in PCRN
[Diagram: each GP office (1–3) runs a Snow GP server next to the EMR on its local net. Over the secure health net, aggregated data/statistics flow to the Snow coordinator server. The PCRN net portal, on the Internet, supports distributed data analysis and establishing projects, inviting GPs, initiating data extraction, etc. PCRN internal data comprise epidemiological analyses, GP and patient data, and consultation statistics. The PCRN CN is a safe haven for data: secure storage for the research data set (individual patient data) and advanced data analyses.]
14. Virtual dataset
Creating a virtual dataset with Emnet/Snow
[Diagram: the researcher/PCRN staff send a dataset definition (Def) to the coordinator, which distributes it to clinical practices 1–3.]
Aimed at:
1. Make participation in research projects easier and more efficient for the GPs
2. Support researchers in including a sufficient number of patients in clinical research
3. Support Article 9 of the Helsinki Declaration: privacy preservation
16. Virtual dataset
Distributed statistical computations with Emnet/Snow
[Diagram: the researcher/PCRN staff send a query to the coordinator, which forwards it to clinical practices 1–3; the result is computed with secure multi-party computation (SMC) and returned.]
Aimed at:
1. Support researchers in including a sufficient number of patients in clinical research
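The slides do not spell out the Emnet/Snow protocol. As a minimal textbook illustration of the idea behind SMC, here is an additive-secret-sharing sum in which no party ever sees another practice's raw count (the counts and party names are invented for the example):

```python
import random

PRIME = 2**61 - 1  # modulus; all arithmetic is done mod PRIME

def share(secret: int, n_parties: int) -> list[int]:
    """Split a secret count into n random additive shares mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def secure_sum(secrets: list[int]) -> int:
    """Each party splits its secret into shares, one per party; each
    party sums the shares it holds; adding the partial sums mod PRIME
    reveals only the total, never any individual input."""
    n = len(secrets)
    all_shares = [share(s, n) for s in secrets]
    # partial_sums[j] = sum of the j-th share from every party
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME
                    for j in range(n)]
    return sum(partial_sums) % PRIME

# Three clinical practices with local patient counts 12, 7 and 30:
# the combined count is revealed, the individual counts are not.
print(secure_sum([12, 7, 30]))  # -> 49
```

Any single share (or partial sum) is uniformly random mod PRIME, so only the final total leaks; this is the property the slides summarise as privacy-preserving distributed computation.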
18. Benefits
• Centralised resources, such as PCRN staff/researchers, can help GPs become more efficient in research.
• Knowledge about the patient populations can be generated directly from the distributed sources, spanning administrative borders such as municipalities, regions, countries and continents.
• Aggregated (non-sensitive) statistics can be produced automatically, directly from the sources.
19. Drawbacks
• Two other comparable approaches exist; no standard has been established
• How to validate the correctness of computed statistics is an open research question