The document discusses input privacy preserving techniques used in the UNECE Input Privacy Preservation project. The project involves several statistical organizations investigating use cases requiring input privacy and assessing applicable techniques like homomorphic encryption and federated learning. Mini-pilots were developed using private set intersection and federated machine learning to privately train models on sensor data from multiple organizations. The goals were to share experiences with input privacy preservation and build expertise in applying such techniques for official statistics.
2. o Input Privacy definition
o Input privacy vs Output privacy
o Motivations
o Input privacy preserving techniques (HE, SMPC, TEE, FL, etc.)
o UNECE IPP project overview
o Use cases / mini-pilots
o Conclusions
Outline
INPUT PRIVACY PRESERVING TECHNIQUES: UNECE PROJECT EXPERIENCES | M. DE CUBELLIS, M. BRUNO, F. DE FAUSTI, M. SCANNAPIECO
3. Input privacy: a definition
Input privacy means that the Computing Party cannot access or derive any input value provided by Input Parties, nor access intermediate values or statistical results available at Result Parties during processing of the data (unless the value has been specifically selected for disclosure). [UN Handbook on Privacy-Preserving Computation Techniques, 2019]
[Diagram: Input parties provide the Source Data → Computing parties perform the Statistical Analysis → Output parties receive the Statistical products]
4. Input privacy
o Input privacy techniques are based on data «transformations» that preserve the privacy of the source data
o Examples of input privacy techniques: Secure Multi-Party Computation (SMPC), Homomorphic Encryption (HE), Trusted Execution Environment (TEE), Federated Learning (FL)
Output privacy
o Output privacy aims at reducing the risk of privacy breaches in the phases of disseminating or exchanging statistical products
o Examples of output privacy techniques: Differential Privacy, Statistical Disclosure Control
Input Privacy vs Output Privacy
[Diagram: the same pipeline as before — Input parties provide the Source Data, Computing parties perform the Statistical Analysis, and the Statistical products flow to the Output parties — shown for both the input and output privacy settings]
5. Motivations: why do NSIs need to apply IPP techniques?
Official Statistics is moving towards new production scenarios involving the use of data owned by external parties [e.g. private parties like Mobile Network Operators (MNOs), public parties like Central Banks or other NSIs]
New needs (e.g. giving NSIs access to new sources of Big Data, enabling Big Data collaboration across multiple NSIs) have led NSIs to take input privacy into account, i.e. to protect the privacy of data acquired from parties external to the NSI
[Diagram: an NSI exchanging data with the private/public sector, and a Central Authority coordinating NSO-1, NSO-2 and NSO-3]
6. The main input privacy preserving techniques are:
o Secure Multi-Party Computation (SMPC)
o Federated Learning (FL)
o Homomorphic Encryption (HE)
o Trusted Execution Environment (TEE)
Input Privacy Techniques: overview (1 of 3)
7. o Secure Multi-Party Computation (SMPC)
o SMPC is a paradigm based on communication between the parties
o Data are split, and each party sends portions of its data to the others
o The other parties cannot reconstruct the initial data, but can perform some computation on the portions of data received from the other parties
o Once each party has finished, everything can be aggregated, and the result of the computation is known to each party
o Federated Learning (FL)
o A machine learning approach based on distributing the algorithm to where the data is, instead of gathering the data where the algorithm is (decentralized/distributed computation)
Input Privacy Techniques: overview (2 of 3)
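The split-compute-aggregate flow described for SMPC above can be illustrated with additive secret sharing, its simplest building block. This is a minimal single-process sketch (assuming an honest-but-curious setting; a real deployment runs each party on its own machine, exchanging shares over secure channels):

```python
import random

PRIME = 2**61 - 1  # field modulus; each share is uniform in [0, PRIME)

def share(secret, n_parties):
    """Split a secret into n additive shares that sum to it mod PRIME."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def secure_sum(secrets):
    """Each party splits its input; every party locally adds the shares it
    received; only the final aggregate is reconstructed."""
    n = len(secrets)
    all_shares = [share(s, n) for s in secrets]               # party i -> its shares
    partials = [sum(col) % PRIME for col in zip(*all_shares)] # local sums of received shares
    return sum(partials) % PRIME                              # reconstruct the aggregate only

# Three parties jointly compute 10 + 20 + 30 without revealing their inputs
print(secure_sum([10, 20, 30]))  # 60
```

No single share (or partial sum) leaks anything about an individual input; only the final aggregate is revealed.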
8. o Homomorphic Encryption (HE)
o Homomorphic encryption is a type of cryptography based on techniques that allow the manipulation of encrypted data.
For example, given two ciphertexts X and Y (obtained by encrypting two numbers A and B with the same homomorphic scheme), it is possible to compute the "encrypted" sum of A and B by directly combining the two ciphertexts X and Y, without the need to decrypt them
o Trusted Execution Environment (TEE)
Input Privacy Techniques: overview (3 of 3)
o Unlike other IPP techniques, TEE represents a hardware solution
o The enclave technology allows programs to be executed in isolation from other programs
o All inbound and outbound data are encrypted, and computation in the clear only happens within the enclave
o The enclave code and its integrity can then be checked externally
Example: privacy-preserving genotype imputation in a trusted execution environment
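The encrypted-sum example given for HE above can be sketched with a toy Paillier scheme, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The tiny primes below are illustrative assumptions only (real keys use primes of 1024+ bits):

```python
import math, random

# Toy Paillier keypair (p, q are demo-sized assumptions, NOT secure)
p, q = 17, 19
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)  # valid because we use the generator g = n + 1

def encrypt(m):
    """Encrypt a plaintext m < n as g^m * r^n mod n^2, with g = n + 1."""
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (1 + m * n) * pow(r, n, n2) % n2

def decrypt(c):
    """Recover m via the standard Paillier L-function."""
    L = (pow(c, lam, n2) - 1) // n
    return L * mu % n

A, B = 12, 30
X, Y = encrypt(A), encrypt(B)
# Multiplying ciphertexts adds the plaintexts: Dec(X * Y) = A + B (mod n)
print(decrypt(X * Y % n2))  # 42
```

The computing party can thus add X and Y without ever seeing A or B; only the holder of the private key (lam, mu) can decrypt the result.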
9. Context:
Project name: Input Privacy Preservation Project
Period: July 2020 – December 2021 (likely to be extended)
Organizations involved: Eurostat, GSO-Vietnam, INEGI-Mexico, Istat-Italy, ONS-UK, Statistics Canada, Statistics Netherlands, UNECE
Project Goals:
o investigate statistical use cases that require protection on the input side
o assess applicability of selected classes of techniques for main scenarios
o identify opportunities for sharing across statistical community
o create community across statistical organizations and external partners (academia, private sector)
UNECE Input Privacy Preserving: Project overview
10. WP1 – Document the use cases
o Existing use case documentation (original version / generalized version)
o Use case «description» through a logical framework to specify Input Privacy for OS use cases
WP2 – Elaboration of the use cases
o Track 1: Private Set Intersection
o PSI with analytics (Istat)
o PSI + analytics using HE (CBS)
o PSI – measures the coverage of a data source from a third party in a privacy-preserving way (StatCan)
o Track 2: Private Machine Learning
o Track 3: Organize external consultation (to support NSIs in the design and development of IPP services)
UNECE Input Privacy Preserving: Project structure
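As a rough illustration of the Private Set Intersection track, here is a toy Diffie-Hellman-style PSI sketch: each party masks its hashed identifiers with a secret exponent, and only doubly-masked values are compared, so neither side learns items outside the intersection. This is a simplified honest-but-curious sketch, not the protocol used in the mini-pilots:

```python
import hashlib, random

P = 2**127 - 1  # a Mersenne prime used as the group modulus (demo-sized)

def to_group(item):
    """Hash an identifier into the multiplicative group mod P as g^H(item)."""
    h = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big")
    return pow(3, h, P)

def psi(set_a, set_b):
    a = random.randrange(2, P - 1)  # party A's secret exponent
    b = random.randrange(2, P - 1)  # party B's secret exponent
    # A sends H(x)^a; B raises each to b, producing H(x)^(a*b)
    a_masked = {pow(pow(to_group(x), a, P), b, P) for x in set_a}
    # B sends H(y)^b; A raises each to a; exponentiation commutes,
    # so matching items collide on the same doubly-masked value
    b_masked = {pow(pow(to_group(y), b, P), a, P): y for y in set_b}
    return {y for m, y in b_masked.items() if m in a_masked}

print(sorted(psi({"anna", "bob", "carla"}, {"bob", "carla", "dan"})))
# ['bob', 'carla']
```

The "PSI with analytics" variants above would additionally compute statistics over the matched records instead of revealing the intersection itself.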
11. Simulate a runtime environment in which several NSIs gather sensor data with the aim of performing a private machine learning training
o Architecture: distributed architecture with a central coordinator
o Data: open accelerometer sensor data
o Task: a machine learning classification task: predicting human activities starting from accelerometer data
o IPP Technology: Federated Learning
o Framework: Flower Federated Learning
UNECE IPP Use cases: Track 2 – Private Machine Learning (1/2)
12. An example of Federated Learning approach
UNECE IPP Use cases: Track 2 – Private Machine Learning (2/2)
[Diagram: a Central Authority (server) coordinating FL training with four clients: NSI-1, NSI-2, NSI-3, NSI-4]
A Central Authority (server) wants to start an FL training with 4 clients (NSIs)
1 - The server sets the FL strategy (e.g. federated averaging) and connects to the clients; client data stay on the client side
2 - The Central Authority sends the model and its parameters
3 - Each client trains its model locally and sends it back to the server
4 - The server aggregates the models following the selected strategy
o Other configurations are possible
o In each configuration:
o the data are not shared with other NSIs
o different strategies are possible
o more iterations are possible
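The four steps above can be simulated in a few lines of plain Python with a toy linear model and made-up client datasets (an actual pilot would use a framework such as Flower for the network layer):

```python
# Minimal federated-averaging simulation of steps 1-4 (toy model y = w*x;
# the client datasets below are illustrative assumptions)

def local_train(w, data, lr=0.01, epochs=20):
    """Step 3: a client refines the global model on its private data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def fed_avg(updates):
    """Step 4: sample-size-weighted average of the client models."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

# Each NSI's data stay local; all samples roughly follow y = 3x
clients = [[(1.0, 3.1), (2.0, 6.0)], [(1.5, 4.4)], [(3.0, 9.2), (2.5, 7.4)]]
w_global = 0.0                       # steps 1-2: server initializes and sends w
for _ in range(5):                   # several FL rounds (more iterations possible)
    updates = [(local_train(w_global, d), len(d)) for d in clients]
    w_global = fed_avg(updates)
print(round(w_global, 1))  # ~3.0, learned without pooling the raw data
```

Swapping `fed_avg` for another aggregation rule corresponds to choosing a different strategy in step 1.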
13. Results obtained:
o Sharing of previous experiences on the subject of IPP
o Development of new mini-pilots / use cases applying PSI and Federated Learning techniques
o Know-how acquisition on how to apply IPP techniques within official statistics
o Ongoing collaboration with the UN Privacy Enhancing Technology Lab (UN PET Lab)
Further developments:
o Extension of the UNECE IPP project, in which further mini-pilots and use cases can be carried out
o Design of a platform providing Input Privacy Preserving services
o Application of the lessons learned to official statistics production within the NSIs
Conclusions
14. o UN Handbook on Privacy-Preserving Computation Techniques, 2019
o https://towardsdatascience.com/homomorphic-encryption-intro-part-1-overview-and-use-cases-a601adcff06c
o https://www.sciencedirect.com/science/article/abs/pii/S2405471221002891
References
15. Thanks for your attention!
UNECE Input Privacy Preserving