Enterprises hold data that has potential value outside their own firewalls. We have been trying to figure out how to share such data at a level of detail with others in a secure, safe, legal and risk-mitigated manner that ensures a high level of privacy while adding tangible economic and social value. Enterprises face numerous roadblocks, failed projects, inadequate business cases, and issues of scale that need newer techniques, technologies and approaches.
In this talk, we will set up the groundwork for scalable data augmentation for organisations and visualise technical architectures and solutions around the emerging technologies of data fabrics, edge computing and a second coming of data virtualisation.
2. Data Exchanged (without consent)
• GPS
• HIV Status
• Email addresses
• Weapon: Contract
• Response: Excuse
• Exposure: (Potential) exposure of marginalized people.
3. Data Breach:
• Email Addresses
• Username & Passwords
Exposure:
• 150 million customers
Response:
• No clear Apologies
• (Delayed) Corrective Actions
Weapon: Contract
4. Data Breach:
• Names
• Loyalty data
• Email addresses
• Physical addresses
• DOB
• Credit Card last 4 digits
Exposure:
• Millions of Customers
Response:
• Denial
• Fake Solutions
• 8 months before first action
5. Paper contracts are still the most common
weapon organizations use to get away with such incidents.
As regulations mature, the onus
to be more effective in privacy preservation
will be on service providers.
6. From the exhibition: "M. Hulot, the protagonist in Jacques Tati's 1967 film Playtime, is
Enterprises have a different data landscape than
consumer-facing (typically tech) organisations.
Enterprises have silos and legacy systems, have to learn
to be data driven the hard way, and have divergent
forces giving a unique focus on
9. Data Augmentation
[Figure: Typical modeling exercise (ORG A: Class 1, Class 2, Class 3) versus modeling after data augmentation (ORG A: Class 1, Class 2, Class 3, augmented with data from ORG B and ORG C), which is potentially better.]
10. Data Augmentation
[Figure: ORG A (Class 1, Class 2, Class 3) with ORG B and ORG C]
Content Shared
• Aggregated Data / Insights
• Open Data
• Stratified Sampling
• Synthetic Data
• De-identified / Anonymized
Channels:
• Public Portals
• Private Marketplaces
• In-person walkthroughs/handovers
• Gossiping
• Pigeons
11. Data as an asset
• Easy to copy and spawn
• Does not depreciate or deplete
• Really hard to valuate
• Must be processed to yield value
• Various forms and derivatives
Resolve to First Principles
Data has properties that make it
intrinsically hard to ensure privacy
preservation. Therefore, we must
adhere to first principles to better
understand the problem statement.
12. The Five Safes
Safe Data
Safe People
Safe Setting
Safe Project
Safe Output
Great Resources
• ACS Data Sharing Frameworks
• The De-Identification Decision Making Framework
13. First Principles
• Safe Data: Encryption
• Safe People: Authentication & Authorisation
• Safe Setting: Environment for Data Controllers & Processors
• Safe Project: Audit Trail, Lineage and Access & Query Logs
• Safe Output: Linkage Problem
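The linkage problem named above can be made concrete with a small check: a record whose combination of indirect attributes is unique in a release can be matched against another dataset that carries the same attributes plus a name. The following is a minimal sketch only; the function name `linkable_records`, the field names and the sample rows are all hypothetical and do not come from the talk.

```python
from collections import Counter

def linkable_records(rows, quasi_identifiers):
    """Return rows whose quasi-identifier combination is unique in the release.

    A unique combination can be joined against any other dataset that holds
    the same attributes, which is the essence of the linkage problem.
    """
    def key(row):
        return tuple(row[q] for q in quasi_identifiers)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]

# Hypothetical "de-identified" release: names removed, indirect attributes kept.
release = [
    {"id": "r1", "postcode": "2000", "birth_year": 1985, "gender": "F"},
    {"id": "r2", "postcode": "2000", "birth_year": 1985, "gender": "F"},
    {"id": "r3", "postcode": "2010", "birth_year": 1990, "gender": "M"},
    {"id": "r4", "postcode": "2000", "birth_year": 1972, "gender": "M"},
]

at_risk = linkable_records(release, ["postcode", "birth_year", "gender"])
print([row["id"] for row in at_risk])  # ['r3', 'r4']
```

Records r1 and r2 share the same combination and so hide each other; r3 and r4 stand alone and would need further generalisation or suppression before release.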
15. Safe Data – (Encryption)
• Data at Rest: Standard Encryption
• Data in Transit: Secure the Pipe
• Data for Compute: Homomorphic Encryption
16. Homomorphic Encryption
• Partially Homomorphic Encryption (PHE): Addition or Multiplication
• Somewhat Homomorphic Encryption (SWHE): Low Order Polynomials
• Fully Homomorphic Encryption (FHE): Eval of Arbitrary Functions
Moving from PHE to FHE, the schemes become more general; moving from FHE to PHE, they become less costly.
Data Analytics without seeing the data
Max Ott, YOW Data 2016
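To illustrate the partial end of this spectrum, the sketch below implements a toy version of the Paillier cryptosystem, a well-known additively homomorphic scheme. Paillier is not named in the slides and is chosen here as an assumption; the function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`) are illustrative, and the tiny hard-coded primes offer no real security.

```python
import math
import secrets

def keygen(p, q):
    """Build a toy Paillier key pair from two primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1                 # standard simple choice of generator
    mu = pow(lam, -1, n)      # modular inverse; valid because g = n + 1
    return (n, g), (lam, mu)

def encrypt(public_key, message):
    n, g = public_key
    n_sq = n * n
    while True:               # pick random r in [1, n) that is coprime with n
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, message, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(public_key, private_key, ciphertext):
    n, _ = public_key
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(public_key, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = public_key
    return (c1 * c2) % (n * n)

public_key, private_key = keygen(1009, 1013)
c_a = encrypt(public_key, 1200)   # e.g. a value held by one organisation
c_b = encrypt(public_key, 345)    # e.g. a value held by another
c_sum = add_encrypted(public_key, c_a, c_b)
print(decrypt(public_key, private_key, c_sum))  # 1545
```

The party computing `add_encrypted` never sees 1200 or 345; only the holder of the private key can read the total. Supporting only one operation in this way is what keeps partially homomorphic schemes cheaper than fully homomorphic ones.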
18. Safe Setting - Confidential Computing
Trusted Execution Environments (Safe Data in Safe Setting)
Microsoft Azure Confidential Computing
Google Cloud Platform: Asylo Open Source Framework
Confidential Computing at the Software layer?
26. Safe People – (System Span)
Expanding the Span of control
29. Safe Project – Audit Trails & Lineage
[Figure: Data in the wild?]
It is still very hard within enterprises
to have a point-to-point track of data
lineage and processing.
The problem is compounded when
data leaves the span of vision.
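One common way to make an audit trail tamper-evident is to chain entries by hash, so that editing any past entry invalidates everything recorded after it. The slides do not prescribe this technique; the following is a minimal sketch under that assumption, and the function names, field names and sample events are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(body):
    """Hash the entry body in a stable (sorted-key) JSON encoding."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def append_entry(log, actor, action, dataset):
    """Append an access or processing event linked to the previous entry."""
    body = {
        "actor": actor,
        "action": action,
        "dataset": dataset,
        "prev": log[-1]["hash"] if log else GENESIS,
    }
    entry = dict(body, hash=_digest(body))
    log.append(entry)
    return entry

def verify(log):
    """Return True only if every entry is intact and correctly chained."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log = []
append_entry(log, "analyst@org-a", "query", "customers")
append_entry(log, "etl@org-a", "export", "customers")
append_entry(log, "partner@org-b", "join", "customers_export")
intact = verify(log)

log[1]["action"] = "read"   # simulate someone rewriting history
tampered = verify(log)
print(intact, tampered)  # True False
```

A chain like this only covers events that are logged in the first place, which is why lineage becomes much harder once data leaves the environment that records them.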
30. One Ring to Rule them All?
Encryption
Authentication & Authorisation
Environment for Data Controllers & Processors
Audit Trail, Lineage and Access & Query Logs
Linkage Problem
A data landscape must cover all
principles of data privacy.
36. The Zetaris Enterprise Data Fabric – Location Aware, Usage Aware, People Aware, Privacy Preserved data in a secure environment.
Also check out Apache Ignite and Red Hat OpenShift + JBoss Virtualization.
38. GDPR Highlights
• Data Portability: Right to transfer personal data from one electronic processing system to and into another.
  Monoliths (e.g. Lakes): Only through serialization. Data Fabric: By design.
• Erasure: Right to withdraw consent and ask for personal data to be deleted.
  Monoliths (e.g. Lakes): Random writes are not typical. Data Fabric: By design.
• Access: Right to know what's been collected and how it's being processed.
  Monoliths (e.g. Lakes): Limited purview. Data Fabric: By design.
• Consent: Consumer is informed in 'clear' and plain language. Consent to collect can be withdrawn at any time.
  Monoliths (e.g. Lakes): Hard. Data Fabric: By design.
39. As data scientists, we are at
the forefront of disruption
and hold the potential to
change things. We are
automating decisions in all
aspects of society.
Yet our work has serious
negative implications, so we
need to educate ourselves
on the broader societal
questions around
regulations, ethics and
impact.
Enjoy the Tribe!
Editor's Notes
Data Augmentation
Value comes with greater depth of analysis
Data Exchange Models
Insights as a service
application offloading
marketplaces
virtualization with least cost and exposure routing
data fabric as a data augmentation approach
Status Quo:
Sampling (stratified sampling or rather top N)
De-identified
Highly aggregated
Aim for simplicity
Monolithic systems
Distributed by Design
Co-Location (NSW Data Sharing Framework)
Same data different use cases
PII embedded.
Resource Contention in Monolithic Systems
The problem of monoliths
We are treating this as a separate thing (privacy)
Open data movement and the open data publishing
Separate teams for data publishing and data creation
Dr Eugene – For engineers, data is a commodity that flows through the system
Going for Microservices
Background
Databases are still monoliths
Problem is: we are again replicating data to tie them up behind microservices
Meta pattern
The enterprise data fabric
Single environment where the data is packaged and lives as its source
SOR and Apps and data analysis.
Privacy built in in two ways
Encryption embedded. Usage tracked and secure.
Data fabric
Data colocation – hybrid vs on-prem vs on cloud
Geographically aware
Least cost routing
Least exposure routing
In memory compute grids (unified access and unified controls)
Edge computing and IoT data privacy (Boris)