How can people collaborate over data analysis without disclosing their data to each other? This seminar will cover an end-to-end solution to this problem, including privacy-preserving entity resolution and the application of partially homomorphic encryption and Rademacher observations to private linear classification tasks.
In particular we will show that it is possible to learn from data, while keeping the data confidential, both with and without the entity resolution step. We will give a brief overview of potential applications and give some practical examples of how these approaches can be used.
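The seminar's full protocol relies on partially homomorphic encryption, but the entity-resolution step can be illustrated with a much simpler device: both parties blind their join keys with a keyed hash (HMAC) under a shared secret, exchange only the digests, and match on those. A minimal sketch follows; the key exchange and normalization rules here are assumptions for illustration, not the seminar's actual scheme:

```python
import hmac
import hashlib

def blind(identifier: str, key: bytes) -> str:
    """Blind an identifier with a keyed hash so the raw value is never shared."""
    normalized = identifier.strip().lower()   # both parties must normalize identically
    return hmac.new(key, normalized.encode(), hashlib.sha256).hexdigest()

def private_match(ids_a, ids_b, key):
    """Return party A's records whose blinded key also appears on party B's side."""
    digests_a = {blind(i, key): i for i in ids_a}
    digests_b = {blind(i, key) for i in ids_b}
    return [digests_a[d] for d in digests_a if d in digests_b]

key = b"shared-secret"  # agreed out of band, never sent with the data
matches = private_match(["Alice@x.com", "bob@y.org"], ["alice@x.com "], key)
```

Real systems harden this with Bloom-filter encodings or private set intersection, since a keyed hash alone still reveals which records matched.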
ADV Slides: Strategies for Fitting a Data Lake into a Modern Data Architecture – DATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the Data Warehouse or to facilitate competitive Data Science and building algorithms in the organization, the Data Lake — a place for unmodeled and vast data — will be provisioned widely in 2019.
Though it doesn’t have to be complicated, the Data Lake has a few critical design points, and it does need to follow some principles for success. Build the Data Lake, not the Data Swamp! The tool ecosystem is building up around the Data Lake, and soon many organizations will have both a robust Lake and a Data Warehouse. We will discuss policies to keep them straight, send “horses to courses,” and keep up users’ confidence in the data platforms.
As for platform, although Hadoop hosted the early majority of Data Lakes, organizations are now concluding that the Data Lake should be built on Cloud object storage. We’ll discuss these options as well.
Get this data point for your Data Lake journey.
Data management best practices - infographic – Intellspot
Best practices for data management including data governance, data stewardship, data integration, data quality, and enterprise master data management best practices and strategies.
Emerging Trends in Data Architecture – What’s the Next Big Thing? – DATAVERSITY
With technological innovation and change occurring at an ever-increasing rate, it’s hard to keep track of what’s hype and what can provide practical value for your organization. Join this webinar to see the results of a recent DATAVERSITY survey on emerging trends in Data Architecture, along with practical commentary and advice from industry expert Donna Burbank.
A Beginner's Guide to Large Language Models – Ajitesh Kumar
Large Language Models (LLMs) are a type of deep learning model designed to process and understand vast amounts of natural language data. Built on neural network architectures, particularly the transformer architecture, LLMs have revolutionized the field of natural language processing. In this presentation, we will explore the world of LLMs, their significance, and the different types of LLMs based on the transformer architecture, such as autoregressive language models (e.g., GPT), autoencoding language models (e.g., BERT), and combined models (e.g., T5). Join us as we delve into the world of LLMs and discover their potential in shaping the future of natural language processing.
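The model families mentioned above differ mainly in their training objective. As a toy-scale illustration (counting statistics stand in for a neural network, and the corpus is invented), an autoregressive model predicts the next token from the left context only, while an autoencoding model recovers a masked token using context on both sides:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat".split()

# Autoregressive objective (GPT-style): model P(next token | previous token).
bigram = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram[prev][nxt] += 1

def predict_next(prev):
    """Left-to-right prediction: only the preceding context is visible."""
    return bigram[prev].most_common(1)[0][0]

# Autoencoding objective (BERT-style): recover a masked token from both sides.
def predict_masked(left, right):
    """Score each candidate w by how often it bridges left -> w -> right."""
    scores = Counter()
    for w in bigram[left]:
        scores[w] += bigram[left][w] * bigram[w][right]
    return scores.most_common(1)[0][0] if scores else None
```

A T5-style encoder-decoder combines both ideas, conditioning on a corrupted input while generating its output left to right.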
Putting the Ops in DataOps: Orchestrate the Flow of Data Across Data Pipelines – DATAVERSITY
With the aid of any number of data management and processing tools, data flows through multiple on-prem and cloud storage locations before it’s delivered to business users. As a result, IT teams — including IT Ops, DataOps, and DevOps — are often overwhelmed by the complexity of creating a reliable data pipeline that includes the automation and observability they require.
The answer to this widespread problem is a centralized data pipeline orchestration solution.
Join Stonebranch’s Scott Davis, Global Vice President, and Ravi Murugesan, Sr. Solution Engineer, to learn how DataOps teams orchestrate their end-to-end data pipelines with a platform approach to managing automation.
Key Learnings:
- Discover how to orchestrate data pipelines across a hybrid IT environment (on-prem and cloud)
- Find out how DataOps teams are empowered with event-based triggers for real-time data flow
- See examples of reports, dashboards, and proactive alerts designed to help you reliably keep data flowing through your business — with the observability you require
- Discover how to replace clunky legacy approaches to streaming data in a multi-cloud environment
- See what’s possible with the Stonebranch Universal Automation Center (UAC)
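The Universal Automation Center itself is proprietary, but the core idea of the webinar, a DAG of tasks released by event-based triggers rather than fixed schedules, can be sketched in a few lines of illustrative Python (the task names and trigger are invented for this example):

```python
from collections import deque

class Pipeline:
    """Tiny DAG orchestrator: a task runs once every upstream task has finished."""
    def __init__(self):
        self.tasks, self.deps = {}, {}

    def task(self, name, fn, depends_on=()):
        self.tasks[name] = fn
        self.deps[name] = set(depends_on)

    def run(self):
        """Execute all tasks in dependency order; return the order for audit logs."""
        done, order = set(), []
        ready = deque(n for n, d in self.deps.items() if not d)
        while ready:
            name = ready.popleft()
            self.tasks[name]()
            done.add(name)
            order.append(name)
            for n, d in self.deps.items():   # release tasks whose deps are all done
                if n not in done and n not in ready and d <= done:
                    ready.append(n)
        return order

log = []
p = Pipeline()
p.task("extract", lambda: log.append("extract"))
p.task("transform", lambda: log.append("transform"), depends_on=["extract"])
p.task("load", lambda: log.append("load"), depends_on=["transform"])
order = p.run()   # in practice run() would fire on an event, e.g. a file arriving
```

Production orchestrators add retries, observability hooks, and parallel execution on top of exactly this dependency-release loop.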
Unifying Space Mission Knowledge with NLP & Knowledge Graph – Vaticle
Synopsis
The number of space missions being designed and launched worldwide is growing exponentially. Information on these missions, such as their objectives, orbit, or payload, is disseminated across various documents and datasets. Facilitating access to this information is key to accelerating the design of future missions, enabling experts to link an application to a mission, and following various stakeholders' activities.
This presentation introduces recent research done at the ESA to combine the latest Language Models with Knowledge Graphs, unifying our knowledge on space missions. Language Models such as GPT-3 and BERT are trained to understand the patterns of human (natural) language. These models have revolutionised the field of NLP, the branch of AI enabling machines to understand human language in all its complexity. In this work, key information on a mission is parsed from documents with the GPT-3 model, and the parsed data is then migrated to a TypeDB Knowledge Graph to be easily queried. Although this work focuses on an application in the space sector, the method can be transferred to other engineering fields.
Presenters
Dr. Audrey Berquand is a Research Fellow at the ESA. Her research aims at enhancing space mission design and knowledge management with text mining, NLP, and Knowledge Graphs. She was awarded her PhD in 2021 by the University of Strathclyde (Scotland) for her thesis on “Text Mining and Natural Language Processing for the Early Stages of Space Mission Design”. Audrey has a background in space systems engineering: she holds an MSc in Aerospace Engineering from the Royal Institute of Technology KTH (Sweden) and a diplôme d'ingénieur from the EPF Graduate School of Engineering (France). Before diving into the world of AI, she spent three years at ESA involved in the early design phases of future Earth Observation missions.
Ana Victória Ladeira works with Knowledge Management at the ESA, using automated methods to exploit the information contained in the piles and piles of documents that ESA generates every day. With a Master's degree in Data Science from Maastricht University, Ana is particularly excited about how NLP methods can help large organizations connect different documents and highlight the bigger picture across a big universe of data sources, as well as about using Knowledge Graphs to connect people to the expertise and information they need.
What is Amazon OpenSearch Service?
OpenSearch is a distributed, open-source search and analytics suite used for real-time application monitoring, log analytics, and website search, among other things. With OpenSearch Dashboards, an integrated visualization tool that makes it easy for users to explore their data, OpenSearch provides a highly scalable system for fast access and response to large volumes of data. It is built on the Apache Lucene search library, which also powers Elasticsearch and Apache Solr. OpenSearch and OpenSearch Dashboards were created from forks of Elasticsearch 7.10.2 and Kibana 7.10.2, respectively. All software in the OpenSearch project is released under the Apache License, Version 2.0 (ALv2).
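Searches against OpenSearch are expressed as JSON in its query DSL. The sketch below builds a typical log-analytics request; the index name and field names are assumptions for illustration. The payload would be POSTed to the domain's `_search` endpoint, or passed to a client library such as opensearch-py:

```python
import json

# Illustrative query-DSL body: the last hour's log lines mentioning "error",
# newest first. Index name "app-logs" and the fields are assumed for the sketch.
search_body = {
    "query": {
        "bool": {
            "must": [{"match": {"message": "error"}}],
            "filter": [{"range": {"@timestamp": {"gte": "now-1h"}}}],
        }
    },
    "sort": [{"@timestamp": {"order": "desc"}}],
    "size": 20,
}

# This payload would be sent to POST https://<domain>/app-logs/_search,
# or via opensearch-py as client.search(index="app-logs", body=search_body).
payload = json.dumps(search_body)
```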
Customer-Centric Data Management for Better Customer Experiences – Informatica
With consumer and business buyer expectations growing exponentially, more businesses are competing on the basis of customer experience. But executing preferred customer experiences requires data about who your customers are today and what they will likely need in the future. Every business can benefit from an AI-powered master data management platform that supplies this information to line-of-business owners so they can execute great experiences at scale. The same need holds from an internal business process perspective: for example, many businesses require better data management practices to deliver preferred employee experiences. Informatica provides an MDM platform to solve for these examples and more.
How to build an ETL pipeline with Apache Beam on Google Cloud Dataflow – Lucas Arruda
Nowadays more and more companies are searching for insights with the potential to grow their business by analyzing large amounts of data from many different systems. However, to reach this level of Big Data analysis, it is necessary to build an ETL pipeline that processes raw data from different sources into a format that visualization tools such as Tableau can consume.
This kind of data processing can be done with a variety of tools, and in this presentation I show how to do it with a unified programming model created by Google and open-sourced as Apache Beam. We will build a simple pipeline and execute it in the cloud with Google Cloud Dataflow, a fully managed service.
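Beam pipelines are built from element-wise transforms, so the parsing step described above can be written as plain functions and only later wrapped in `beam.Map`. A minimal sketch (the CSV schema and bucket paths are invented, and running on Dataflow itself requires a GCP project):

```python
import csv
import io

def parse_sale(line: str) -> dict:
    """One ETL 'transform' step: raw CSV line -> typed record."""
    row = next(csv.reader(io.StringIO(line)))
    return {"product": row[0], "quantity": int(row[1]), "price": float(row[2])}

def add_revenue(rec: dict) -> dict:
    """Enrich the record with a derived field."""
    return {**rec, "revenue": rec["quantity"] * rec["price"]}

records = [add_revenue(parse_sale(l)) for l in ["widget,3,2.50", "gadget,1,9.99"]]

# In Apache Beam the same functions would be wired into a pipeline, roughly:
#   (p | beam.io.ReadFromText("gs://bucket/sales.csv")
#      | beam.Map(parse_sale)
#      | beam.Map(add_revenue)
#      | beam.io.WriteToText("gs://bucket/out"))
# and executed with the DataflowRunner on Google Cloud Dataflow.
```

Keeping the transforms as ordinary functions makes them unit-testable without any runner, which is the usual way to develop Beam code locally before deploying.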
ApacheCon Europe Big Data 2016 – Parquet in practice & detail – Uwe Korn
Apache Parquet is among the most commonly used column-oriented data formats in the big data processing space. It leverages various techniques to store data in a CPU- and I/O-efficient way. Furthermore, it can push analytical queries down to the I/O layer to avoid loading non-relevant data chunks. With Java and C++ implementations, Parquet is also a perfect choice for exchanging data between different technology stacks.
As part of this talk, a general introduction to the format and its techniques will be given. Their benefits and some of the inner workings will be explained to give a better understanding how Parquet achieves its performance. At the end, benchmarks comparing the new C++ & Python implementation with other formats will be shown.
How to Build the Data Mesh Foundation: A Principled Approach | Zhamak Dehghan... – HostedbyConfluent
Organizations have been chasing the dream of data democratization, unlocking and accessing data at scale to serve their customers and business, for over half a century, since the early days of data warehousing. They have tried to reach this dream through multiple generations of architectures, such as the data warehouse and the data lake, through a Cambrian explosion of tools, and through large investments to build their next data platform. Despite the intentions and the investments, the results have been middling.
In this keynote, Zhamak shares her observations on the failure modes of a centralized paradigm of a data lake, and its predecessor data warehouse.
She introduces Data Mesh, a paradigm shift in big data management that draws from modern distributed architecture: considering domains as the first class concern, applying self-sovereignty to distribute the ownership of data, applying platform thinking to create self-serve data infrastructure, and treating data as a product.
This talk introduces the principles underpinning data mesh and Zhamak's recent learnings in creating a path to bring data mesh to life in your organization.
Wonder what this data mesh stuff is all about? What are the principles of data mesh? Can you, or should you, consider data mesh as the approach for your analytics platform? And most importantly, how can Snowflake help?
Given in Montreal on 14-Dec-2021
Google Cloud Platform Tutorial | GCP Fundamentals | Edureka – Edureka!
( Google Cloud Certification Training - Cloud Architect: https://www.edureka.co/google-cloud-a... ) This tutorial on Google Cloud Platform will provide you a detailed introduction to GCP and its cloud services. Learn why GCP is preferred over other cloud providers, and also learn about the various zones and regions where the servers are hosted.
Data Catalog as the Platform for Data Intelligence – Alation
Data catalogs are in wide use today across hundreds of enterprises as a means to help data scientists and business analysts find and collaboratively analyze data. Over the past several years, customers have increasingly used data catalogs in applications beyond their search & discovery roots, addressing new use cases such as data governance, cloud data migration, and digital transformation. In this session, the founder and CEO of Alation will discuss the evolution of the data catalog, the many ways in which data catalogs are being used today, the importance of machine learning in data catalogs, and discuss the future of the data catalog as a platform for a broad range of data intelligence solutions.
The catalyst for the success of automobiles came not through the invention of the car but rather through the establishment of an innovative assembly line. History shows us that the ability to mass produce and distribute a product is the key to driving adoption of any innovation, and machine learning is no different. MLOps is the assembly line of Machine Learning and in this presentation we will discuss the core capabilities your organization should be focused on to implement a successful MLOps system.
Data Strategy - Executive MBA Class, IE Business School – Gam Dias
For today's enterprise, data is now very much a corporate asset, vital to delivering products and services efficiently and cost-effectively. Few organizations can survive without harnessing data in some way.
Viewed as a strategic asset, data can be a source of new internal efficiencies, improved competitive advantage or a source of entirely new products that can be targeted at your existing or new customers.
This slide deck contains the highlights of a one day course on Data Strategy taught as part of the Executive MBA Program at IE Business School in Madrid.
Learn to Use Databricks for the Full ML Lifecycle – Databricks
Machine learning development brings many new complexities beyond the traditional software development lifecycle. Unlike traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. In this talk, learn how to operationalize ML across the full lifecycle with Databricks Machine Learning.
Describes what Enterprise Data Architecture in a software development organization should cover, by listing over 200 data-architecture-related deliverables an Enterprise Data Architect should remember to evangelize.
Databricks CEO Ali Ghodsi introduces Databricks Delta, a new data management system that combines the scale and cost-efficiency of a data lake, the performance and reliability of a data warehouse, and the low latency of streaming.
Amundsen: From discovering data to securing data – markgrover
Hear about how Lyft and Square are solving data discovery and data security challenges using a shared open source project - Amundsen.
Talk details and abstract:
https://www.datacouncil.ai/talks/amundsen-from-discovering-data-to-securing-data
The extent and impact of recent security breaches shows that current security approaches are simply not working. But what can we do to protect our business? We have long advocated monitoring as a way to detect subtle, advanced attacks that still make it through our defenses. However, products have failed to deliver on this promise.
Current solutions don't scale in both data volume and analytical insights. In this presentation we will explore what security monitoring is. Specifically, we are going to explore the question of how to visualize a billion log records. A number of security visualization examples will illustrate some of the challenges with big data visualization. They will also help illustrate how data mining and user experience design help us get a handle on the security visualization challenges - enabling us to gain deep insight for a number of security use-cases.
This session took place in New York City on November 4th, 2019.
Speaker Bio:
Chemere is a Senior Data Science Training Specialist for H2O.ai. Chemere has a Master's in Business Administration with a focus in Marketing Analytics from the University of North Carolina at Charlotte. She is an experienced data scientist with a diverse background in transformational decision-making across industries including Banking, Manufacturing, Logistics, and Medical Devices. Chemere joins us from Venus Concept/2two5, where she was the Lead Data Scientist focused on building predictive models with Internet of Things (IoT) data and for a subscription-based marketing product for B2B customers. Prior to that, Chemere worked as a Senior Data Scientist at Wells Fargo Bank, focused on various applied predictive analytics solutions.
More details about the event are available here: https://www.eventbrite.com/e/dive-into-h2o-new-york-tickets-76351721053
Speaker: Philippe Mizrahi - Associate Product Manager - Lyft
Abstract: Philippe Mizrahi works on Lyft’s data discovery and metadata engine, Amundsen. With the help of a Neo4j graph database, Amundsen has improved Lyft’s data discovery by reducing time to discover data by 10x.
During this session, Philippe will dive deep into Amundsen’s use cases, impact, and architecture, which effectively combines a comprehensive knowledge graph based upon Neo4j, centralized metadata and other search ranking optimizations to discover data quickly.
How to teach your data scientist to leverage an analytics cluster with Presto...Alluxio, Inc.
Data Orchestration Summit 2020 organized by Alluxio
https://www.alluxio.io/data-orchestration-summit-2020/
How to teach your data scientist to leverage an analytics cluster with Presto, Spark, and Alluxio
Katarzyna Orzechowska, Data Scientist (ING Tech)
Mariusz Derela, DevOps Engineer (ING Tech)
About Alluxio: alluxio.io
Engage with the open source community on slack: alluxio.io/slack
Advanced Analytics and Machine Learning with Data VirtualizationDenodo
Watch: https://bit.ly/2DYsUhD
Advanced data science techniques, like machine learning, have proven to be extremely useful tools for deriving valuable insights from existing data. Platforms like Spark, and complex libraries for R, Python, and Scala, put advanced techniques at the fingertips of data scientists. However, these data scientists spend most of their time looking for the right data and massaging it into a usable format. Data virtualization offers a new alternative to address these issues in a more efficient and agile way.
Attend this webinar and learn:
- How data virtualization can accelerate data acquisition and massaging, providing the data scientist with a powerful tool to complement their practice
- How popular tools from the data science ecosystem: Spark, Python, Zeppelin, Jupyter, etc. integrate with Denodo
- How you can use the Denodo Platform with large data volumes in an efficient way
- How Prologis accelerated their use of Machine Learning with data virtualization
How do we protect privacy of users in large-scale systems? How do we ensure fairness and transparency when developing machine learned models? With the ongoing explosive growth of AI/ML models and systems, these are some of the ethical and legal challenges encountered by researchers and practitioners alike. In this talk (presented at QConSF 2018), we first present an overview of privacy breaches as well as algorithmic bias / discrimination issues observed in the Internet industry over the last few years and the lessons learned, key regulations and laws, and evolution of techniques for achieving privacy and fairness in data-driven systems. We motivate the need for adopting a "privacy and fairness by design" approach when developing data-driven AI/ML models and systems for different consumer and enterprise applications. We also focus on the application of privacy-preserving data mining and fairness-aware machine learning techniques in practice, by presenting case studies spanning different LinkedIn applications, and conclude with the key takeaways and open challenges.
Sumo Logic QuickStart Webinar - Get CertifiedSumo Logic
Video: https://www.sumologic.com/online-training/#start
Brand new to Sumo Logic?
Get started with these 5 easy steps. Learn how to capitalize on critical capabilities that can amplify your log analytics and monitoring experience while providing you with meaningful business and IT insights.
Smart Solutions: Data Analytics to Support Fraud Examinationscorma GmbH
This is an updated slide set based on my ACFE presentation in 2011. The idea is to present a concept to use Data Analytics in Fraud Investigations. For more information feel free to contact me via www.corma.de.
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...Mihai Criveti
Automate your Data Science pipeline with Ansible, Python and Kubernetes - ODSC Talk
What is Data Science and the Data Science Landscape
Process and Flow
Understanding Data
The Data Science Toolkit
The Big Data Challenge
Cloud Computing Solutions
The rise of DevOps in Data Science
Automate your data pipeline with Ansible
AI & ML in Cyber Security - Why Algorithms Are DangerousRaffael Marty
Every single security company is talking in some way or another about how they are applying machine learning. Companies go out of their way to make sure they mention machine learning and not statistics when they explain how they work. Recently, that's not enough anymore either. As a security company you have to claim artificial intelligence to be even part of the conversation.
Guess what. It's all baloney. We have entered a state in cyber security that is, in fact, dangerous. We are blindly relying on algorithms to do the right thing. We are letting deep learning algorithms detect anomalies in our data without having a clue what that algorithm just did. In academia, they call this the lack of explainability and verifiability. But rather than building systems with actual security knowledge, companies are using algorithms that nobody understands and in turn discover wrong insights.
In this talk I will show the limitations of machine learning, outline the issues of explainability, and show where deep learning should never be applied. I will show examples of how the blind application of algorithms (including deep learning) actually leads to wrong results. Algorithms are dangerous. We need to return to experts and invest in systems that learn from, and absorb the knowledge of, experts.
Similar to Confidential Computing - Analysing Data Without Seeing Data
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
SlideShare Description for "Chatty Kathy - UNC Bootcamp Final Project Presentation"
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. For more details, visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It comes, however, with a precondition: the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by the submission of a large number of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Techniques to optimize the PageRank algorithm usually fall into two categories. One is to reduce the work per iteration, and the other is to reduce the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged has the potential to save iteration time. Skipping in-identical vertices, which share the same in-links, helps avoid duplicate computations and thus could reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation to improve performance; the final ranks of chain nodes can then be calculated directly. This could reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order. This could reduce the iteration time and the number of iterations, and also enable multi-iteration concurrency in PageRank computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
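One of the work-reduction ideas above, skipping vertices whose ranks have already converged, can be sketched as follows. This is a simplified illustration under my own assumptions (a vertex is frozen permanently once its change drops below `eps`, and the graph has no dead ends); it is not the STICD algorithm itself, and the function name and graph are hypothetical.

```python
def pagerank_skip(out_links, d=0.85, eps=1e-10, max_iter=200):
    # Power iteration that stops re-computing a vertex once its rank change
    # drops below eps -- a simplified "skip converged vertices" optimisation.
    # Assumes no dead ends: every vertex has at least one out-link.
    verts = list(out_links)
    rank = {v: 1.0 / len(verts) for v in verts}
    in_links = {v: [] for v in verts}
    for u, outs in out_links.items():
        for v in outs:
            in_links[v].append(u)
    converged = set()
    for _ in range(max_iter):
        new = {}
        for v in verts:
            if v in converged:
                new[v] = rank[v]        # skip: keep the frozen rank
                continue
            share = sum(rank[u] / len(out_links[u]) for u in in_links[v])
            new[v] = (1 - d) / len(verts) + d * share
            if abs(new[v] - rank[v]) < eps:
                converged.add(v)
        rank = new
        if len(converged) == len(verts):
            break
    return rank

# On a symmetric 3-cycle every vertex settles at rank 1/3 immediately.
ranks = pagerank_skip({'a': ['b'], 'b': ['c'], 'c': ['a']})
```

Freezing a vertex permanently trades a little accuracy for work saved; production implementations typically re-activate a vertex if its in-neighbours change significantly.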
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank, are commonly implemented over Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
5. Future Value of Data
Data Analytics Without Seeing the Data
[Chart: the value of data over time after release, showing data decay. Joining new data and new analytics techniques add uncertain future value and unknown future risk.]
21. Paillier Encryption
Encryption of m:  c = g^m · r^n mod n²
Addition of encrypted numbers:  D( E(m1) · E(m2) mod n² ) = m1 + m2 mod n
Multiplication of an encrypted number by a scalar:  D( E(m1)^m2 mod n² ) = m1 · m2 mod n
22. Paillier Encryption
Encryption of m:  c = g^m · r^n mod n²
Addition of encrypted numbers:  g^m1 × g^m2 = g^(m1 + m2)
Multiplication of an encrypted number by a scalar:  (g^m1)^m2 = g^(m1 · m2)
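The two Paillier slides above can be sketched directly in Python. This is a toy illustration under my own assumptions (the tiny primes, function names, and parameter choices are mine, not from the deck); real deployments use primes of 2048+ bits and a vetted library.

```python
import math
import random

def keygen(p=293, q=433):
    # Toy primes for illustration only -- real Paillier needs 2048+ bit primes.
    n = p * q
    n2 = n * n
    g = n + 1                                            # standard simple choice of g
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)    # lcm(p-1, q-1)
    # mu = (L(g^lam mod n^2))^(-1) mod n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    n2 = n * n
    r = random.randrange(1, n)                  # coprime to n with high probability
    return (pow(g, m, n2) * pow(r, n, n2)) % n2          # c = g^m * r^n mod n^2

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    n2 = n * n
    return ((pow(c, lam, n2) - 1) // n) * mu % n         # m = L(c^lam mod n^2) * mu mod n

pub, priv = keygen()
n, _ = pub
c1, c2 = encrypt(pub, 42), encrypt(pub, 58)
assert decrypt(pub, priv, (c1 * c2) % (n * n)) == 100    # addition under encryption
assert decrypt(pub, priv, pow(c1, 3, n * n)) == 126      # scalar multiplication
```

Multiplying ciphertexts adds the plaintexts, and raising a ciphertext to a scalar multiplies the plaintext by it, exactly the two properties shown on the slides.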
28. Logistic Regression
Logistic function:  p(x; θ) = 1 / (1 + e^(−θ·x))
Log likelihood:  L(θ) = Σ_{i=0..n} [ y_i log p(x_i; θ) + (1 − y_i) log(1 − p(x_i; θ)) ]
Maximise L(θ) for θ (equivalently, minimise −L(θ)), then evaluate p(x; θ).
Requires "secure log" and "secure inverse" protocols using Paillier encryption.
Builds on Han et al. 2010, "Privacy Preserving Gradient Descent Methods"
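A plaintext version of this estimator can be sketched as follows. In the private setting described above, the gradient would instead be evaluated under Paillier encryption with the secure log / secure inverse sub-protocols; the function names and the tiny dataset here are illustrative, not from the talk.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.5, steps=2000):
    # Plain gradient ascent on the log likelihood L(theta).  The private
    # variant computes this same gradient over encrypted values.
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (y - sigmoid(X @ theta))   # gradient of L(theta)
        theta += lr * grad / len(y)
    return theta

# Tiny illustrative dataset: first column is a bias term.
X = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.]])
y = np.array([0., 0., 1., 1.])
theta = fit_logistic(X, y)
```

Since the log likelihood is concave, plain gradient ascent converges; the expense of the private version comes entirely from performing the same arithmetic homomorphically.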
30. Performance
• Learning: learnt models have the same accuracy as unencrypted calculations, but "private learning" is roughly 1000x slower due to the encrypted computations; learning times are several hours.
• Deployment: a score can be generated in real time (<50 ms), and the customer data that contributes to the score remains private.
31. Scaling
[Architecture diagram: a central Coordinator distributes the encrypted computation across pools of Workers attached to Data Provider 1 and Data Provider 2.]
[Plot: learning time in minutes (log scale, roughly 5 to 500) versus number of cores (0–400), for datasets of 10,000×10, 100,000×10, and 1,000,000×10 features.]
36. Private Record Linkage
Organisation A and Organisation B each hold PII data:

A's PII data                           B's PII data
Name           DOB       Gender       Name            DOB       Gender
John Smith     12/01/82  M            Mark Gorgon     1/12/90   M
Mark Gorgon    1/12/90   M            Juliet Baker    2/11/72   F
Hanna Smith    4/02/78   F            Andrew Roberts  4/02/93   M
…              …         …            …               …         …
Juliet Baker   2/11/72   F            Hanna Smith     4/02/78   F

Using a shared secret salt, each organisation hashes its own records locally:

A's cryptographic hashes              B's cryptographic hashes
Row      Key                          Row      Key
1        10110110...00101010          1        01110110...11010101
2        01110110...11010101          2        01101011...00101101
3        10011001...10100110          3        01111000...00110011
…        …                            …        …
100000   01101011...00101101          100000   10011101...10100111

A fuzzy matcher at N1 Analytics compares the hashes and produces a linkage table (X = no match):

Linkage table
Row A    Row B
1        X
2        1
3        100000
…        …
100000   X

Similar in approach to MERLIN - Ranbaduge, Vatsalan, Christen (2015)
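A much-simplified, exact-match version of the salted-hash step can be sketched as follows. The deck's fuzzy matcher instead compares similarity-preserving encodings (as in MERLIN), so this is only an illustration of the shared-salt idea; `record_key` and the salt value are hypothetical names of my own.

```python
import hashlib
import hmac

def record_key(record, salt):
    # Hypothetical helper: each organisation applies the same keyed hash
    # (HMAC-SHA256) to its normalised PII fields under the shared secret
    # salt, so only hashes -- never the PII itself -- leave the organisation.
    msg = "|".join(field.strip().lower() for field in record).encode()
    return hmac.new(salt, msg, hashlib.sha256).hexdigest()

salt = b"shared-secret-salt"            # illustrative value only
key_a = record_key(("Mark Gorgon", "1/12/90", "M"), salt)
key_b = record_key(("Mark Gorgon", "1/12/90", "M"), salt)
assert key_a == key_b                   # identical records link by hash alone
```

Exact hashing only links records that agree on every field after normalisation; handling typos and format differences ("Mark/Gorgon" vs "Mark.Gorgon") is what requires the fuzzy, similarity-preserving encodings used in the deck.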
41. Current Capabilities of the N1 Platform
• Standard data analytics techniques on confidential data:
  • Correlation analysis
  • Classification / prediction
  • Regression
  • Clustering / outlier detection
• Automated private record linkage
• Fine-grained authorisation and access control
[Diagram: federated model with no central database. Data is kept local to each source (Dept 1, Org 2, Comp 3), connected by private record linkage and private analytics: statistics, classifiers, anomaly detection.]
43. Acknowledgements
Engineering: Mr. Brian Thorne, Dr. Mentari Djatmiko, Dr. Guillaume Smith, Dr. Wilko Hanecka, Dr. Hamish Ivey-Law
Research: Dr. Richard Nock, Mr. Giorgio Patrini, Dr. Roksana Borelli, Dr. Arik Friedman, Prof. Hugh Durrant-Whyte
Business: Mr. Warren Bradey, Ms. Shelley Copsey
Lead: Dr. Stephen Hardy