This is the slide deck from the community event "Beyond Privacy: Learning Data Ethics – European Big Data Community Forum 2019", held on 14 November 2019 in Brussels. It includes the presentations given by the speakers.
Presentation 2 of 2 by Ermo Taks, senior consultant in E-governance architectures and interoperability, Estonia, at seminar 2, held on 18 March 2021, which addresses digital government principles and building blocks. This 2nd event takes place in the framework of a series of three webinars organised by the SIGMA Programme, a joint initiative of the OECD and EU, principally financed by the EU, on the role of life events in end-to-end public service delivery.
Presentation by Kuldar Taveter, Senior Consultant in e-Government and Digital Economy, Estonia, at the 3rd and last seminar in the series, held on 1 April 2021, which puts the focus on moving towards seamless public service delivery. This 3rd event takes place in the framework of a series of three webinars organised by the SIGMA Programme, a joint initiative of the OECD and EU, principally financed by the EU, on the role of life events in end-to-end public service delivery.
Governing Algorithms – Perils and Powers of AI in the Public Sector – PanagiotisKeramidis
This document summarizes a presentation on governing algorithms and AI in the public sector. It discusses:
1. The emerging benefits and risks of using AI in the public sector, such as improving services but also risks of discrimination and lack of transparency.
2. The challenges of governing AI in three ways: governance "of" AI by introducing it properly, "with" AI by ensuring humans remain in control, and "by" AI by addressing risks if humans fully rely on AI decisions.
3. Lessons learned about avoiding potential dystopian outcomes by adopting a public value perspective, handling disruption, and designing new governance models for AI-enabled services.
Presentation by Dietrick Düner, DG GROW, European Commission, at seminar 2, held on 18 March 2021, which addresses digital government principles and building blocks. This 2nd event takes place in the framework of a series of three webinars organised by the SIGMA Programme, a joint initiative of the OECD and EU, principally financed by the EU, on the role of life events in end-to-end public service delivery.
eA11: Clive Holdsworth - Regulating digital accessibility and encouraging com... – Headstar
Regulating digital accessibility and encouraging compliance: a talk by Clive Holdsworth, Head of Digital, Equality and Human Rights Commission at e-Access '11, 28 June 2011.
The Generations of Digital Governance: From Paper to Robots – Yannis Charalabidis
Digital or Electronic Governance relates to the utilisation of Information and Communication Technologies to deliver better digital services to citizens, enhance transparency and collaboration, and promote evidence-based decision making in the public sector. Along these directions, the talk presents the methods, tools and solutions that structure the main generations of Digital Governance. Starting from the introduction of computers in the public sector and reaching the emerging applications of artificial intelligence and other exponential technologies, the talk covers the benefits and challenges for decision makers, from both a technical and an administrative viewpoint.
To Regulate or not to Regulate - Opening the AI Black Box for Parliaments – Dr. Fotios Fitsilis
This document discusses the regulation of artificial intelligence (AI) in parliaments. It notes that while AI is being hyped, current systems are narrow and not true artificial general intelligence. Only about 10% of parliaments currently make use of AI. The document examines potential AI use cases for parliamentary processes and outlines several directions for research on AI challenges like ethics, bias, and legal issues. It argues that parliaments need to work cooperatively to determine appropriate regulatory parameters for emerging technologies and develop in-house regulations and transparency to govern advanced algorithms and build trust.
This document summarizes a presentation on responsible use of AI in governance. It discusses the legal impacts of AI, including on legislation, legal professions, and legal subjects. It also examines AI concepts/methods and the debate around AI hype vs. concerns. The EU and member state initiatives on AI ethics and regulations are outlined, as well as international "soft law" approaches. It concludes by questioning whether traditional lawmaking can adequately address AI and the future of normative frameworks.
Samos Summit Digital Europe 2040 [g.misuraca] – samossummit
This document summarizes a presentation on shaping digital Europe 2040 given at the 10th Samos Summit on ICT-enabled governance. The presentation discusses envisioning the future of digital government through scenarios, explores how artificial intelligence is impacting governance, and identifies open issues and policy implications. Key topics included mapping AI use in public services in the EU, challenges in moving from data analysis to adoption, and two dimensions to consider in scenarios for digital governance: regulation of the digital landscape and data protection.
This presentation by Christian Reimsbach-Kounatze, OECD Digital Economy Policy Division, was made during the discussion “Data portability, interoperability and competition” held at the 135th meeting of the OECD Competition Committee on 9 June 2021. More papers and presentations on the topic can be found at oe.cd/dpic.
Artificial intelligence (AI) multidisciplinary perspectives on emerging chall... – PanagiotisKeramidis
This document discusses artificial intelligence (AI) and its emerging challenges and opportunities. It notes that AI is increasing the capability of algorithms and machines to perform human tasks through technologies like autonomous vehicles, chatbots, and medical diagnosis. However, society has yet to fully understand the ethical, economic, and social impacts of AI. The document outlines several challenges of AI, including unintended consequences, economic challenges like cost, data challenges around quality and access, and technological challenges regarding transparency and interpretability. It concludes by discussing how AI could impact the UN's sustainable development goals both positively through improved productivity and healthcare, and negatively through job disruption and increased inequality if not properly managed.
Presentation by Alenka Zuzek, Ministry of Public Administration, Slovenia, at seminar 2, held on 18 March 2021, which addresses digital government principles and building blocks. This 2nd event takes place in the framework of a series of three webinars organised by the SIGMA Programme, a joint initiative of the OECD and EU, principally financed by the EU, on the role of life events in end-to-end public service delivery.
ManyLaws: EU-Wide Legal Text Mining Using Big Data Infrastructures – Yannis Charalabidis
ManyLaws is a web platform that uses text mining and semantic extraction to provide services related to EU, Austrian and Greek laws. It delivers innovative legal search and visualization tools to citizens, businesses, and governments. The ManyLaws project utilizes big data infrastructures to process vast amounts of legal information across multiple jurisdictions. It provides the first fully automatic legal analysis and interrelation system and supports seamless navigation of legal texts in different languages.
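The interrelation of legal texts that ManyLaws performs at scale can be illustrated in miniature: one common approach to relating documents is TF-IDF weighting with cosine similarity, so that laws sharing distinctive vocabulary score as related. The snippet below is a minimal stdlib-only sketch of that idea with toy law fragments; it is not the ManyLaws implementation, and the sample texts are invented for illustration.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Compute simple TF-IDF vectors (sparse dicts) for tokenised documents."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                      # document frequency per term
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors represented as dicts."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

laws = [
    "personal data protection applies to processing of personal data",
    "data protection rules govern processing of personal data by controllers",
    "road traffic rules set speed limits for motor vehicles",
]
docs = [law.lower().split() for law in laws]
vecs = tf_idf_vectors(docs)
# The two data-protection fragments should be more similar to each other
# than either is to the road-traffic fragment.
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # prints True
```

A production system like the one described would of course add lemmatisation, cross-lingual alignment and big data infrastructure on top of this basic relatedness measure.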
Gov4All: An open data and open services repository for supporting citizen-dr... – Yannis Charalabidis
Open data portals have been a primary source for publishing datasets from various sectors of administration, all over the world. However, making open data available does not necessarily lead to better utilisation from citizens and businesses. Our paper presents a new framework and a prototype system for supporting open application development by citizen communities, through gathering and making available open data and open web services sources from governmental actors, combined with an application development environment, training material and application examples.
This is a presentation of Gov4All platform, a web site for managing citizen-driven development in Greece.
Yannis Charalabidis gave a presentation on AI in governance. He discussed how AI is being used in various areas of public administration, such as service provision, back-office processes, and policy design. He believes AI has enormous potential and will impact areas such as developing digital twins of cities, simulations to support policy design and democracy, and truly smart city applications and agents. However, he notes that universal algorithms for complex societal simulations do not yet exist, and more basic research is needed in areas like developing generic public sector agents and understanding systems.
This presentation by Peter Swire, Professor of Law and Ethics, Georgia Tech Scheller College of Business and Associate Director for Policy of the Georgia Tech Institute for Information Security and Privacy, was made during the discussion “Data portability, interoperability and competition” held at the 135th meeting of the OECD Competition Committee on 9 June 2021. More papers and presentations on the topic can be found at oe.cd/dpic.
This document discusses data and protocol interoperability and their effects on competition. It defines data portability as the ability for users to transfer personal data between service providers. Data portability can increase competition by sharing learning effects between firms and reducing switching costs. However, it may also increase costs for smaller firms and pose privacy risks. Interoperability allows direct data exchange or the ability to invoke actions between platforms. This can boost network effects and facilitate switching. But interoperability risks tacit collusion and less product differentiation. The document recommends further reading on making data portability more effective and regulating digital markets.
As a global evangelist, thought leader and ICT futurist, I was asked to present my views on what ICT-enablement of future Social Security systems in the Gulf Cooperation Council area could look like, and what recommendations I would make to enable the states to leapfrog in their Social Service Delivery. This presentation, together with the detailed insight in my blog post (http://digitizesociety.blogspot.com/2014/04/ict-and-social-services-presentation-to.html), explains my view on current trends and directions, as well as the challenges that many Social Security / Social Welfare agencies face as they try to increase efficiency and effectiveness through digitalisation.
The Secure Identity Alliance is committed to helping governments deliver e-government services through secure identity technologies. It was founded in 2013 by leading document and service companies. The Alliance aims to accelerate e-government services by sharing best practices, promoting standardization, and providing guidance on security, identity, and privacy challenges. It has workgroups focused on digital identity and e-document security to define best practices and a security awareness model. The model will assess security solutions and provide recommendations to help members improve.
As a global expert in Public Sector and Social Welfare Digitalization, I was asked to address how EU Member States and the European Commission can use ICT to combat poverty by creating effective and efficient social policies for Minimum Income Support. I have elaborated on the slides in a post on my blog: http://digitizesociety.blogspot.com/2014/04/ict-enablement-of-minimum-income-support.html
Accelerating the creation and deployment of e-Government services by ensuring... – Secure Identity Alliance
The document discusses establishing trust in digital identity and e-government services. It notes that passwords are broken, privacy is a challenge, and identities are difficult to verify online currently. Governments can play a role by acting as a root identity provider and promoting standards. Digital identity represents the sum of all available digital data on an individual and creates economic value. However, trust must be established through agreed standards and initiatives like the National Strategy for Trusted Identities in Cyberspace. The Secure Identity Alliance aims to accelerate e-government services by ensuring privacy, security, convenience and trust through partnerships between industry and governments.
e-SIDES presentation at Leiden University 21/09/2017 – e-SIDES.eu
On 21 September, e-SIDES eLaw team member Magdalena Jozwiak presented the e-SIDES project at a lunch event at Leiden University’s Law Faculty. The event, organised within the Interaction Between Legal Systems research theme, attracted an interdisciplinary audience and was followed by a discussion of e-SIDES, its goals and its approaches.
Samos 2020 Summit - Digital Governance Overview – samossummit
This document provides an overview of digital governance and the types of information systems used in government. It discusses traditional electronic government systems that automate internal operations and citizen transactions, as well as emerging systems that use new technologies to transform policymaking and decision making. The document outlines the evolution of e-government into three generations - from automating transactions to facilitating citizen participation to using analytics and AI to support policy formulation. The goal is to give context to different information system examples that will be presented and how they fit within the broader domain of digital governance.
This document discusses integrating the LEOS legislation drafting software with the ManyLaws legal informatics platform. LEOS allows for online collaboration on legislation, while ManyLaws relates laws within and across countries and translates them. Integrating the tools would allow users to access relevant laws from other jurisdictions, compare legal elements, and create metadata during drafting to make new legislation retrievable in ManyLaws. This would provide a powerful solution for legislation drafting within the EU. The document outlines ManyLaws' process for handling legal data and how LEOS could integrate to directly input new data and access ManyLaws' services.
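The metadata-during-drafting idea described above can be sketched very simply: a drafting tool would attach a small structured record to each new act so a platform like ManyLaws can index and interrelate it. The snippet below is an illustrative assumption of what such a record might contain; the field names and the `draft_metadata` helper are hypothetical, not part of the LEOS or ManyLaws APIs.

```python
import json

def draft_metadata(title, jurisdiction, language, references):
    """Build a minimal, hypothetical metadata record for a draft legal act.

    references: identifiers of related acts, so the draft can be
    interrelated with existing legislation at indexing time.
    """
    return {
        "title": title,
        "jurisdiction": jurisdiction,   # e.g. "EU", "AT", "GR"
        "language": language,           # ISO 639-1 code
        "references": references,
    }

record = draft_metadata(
    "Draft Regulation on Example Matters",   # invented title for illustration
    "EU",
    "en",
    ["32016R0679"],                          # CELEX-style identifier, illustrative
)
print(json.dumps(record, indent=2))
```

In practice LEOS works with structured legal XML, so such metadata would live inside the document itself rather than in a side file; the point here is only the shape of the information exchanged.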
The document discusses the potential value of harnessing the Internet of Everything (IoE) for public sector organizations over the next decade. It finds that IoE could generate $4.6 trillion in value, including through increased employee productivity, connected defense systems, reduced costs, improved citizen experiences, and increased revenue. The value will come from connecting people, processes, data and physical things to improve outcomes in areas like transportation, healthcare, education, and more. Top opportunities include employee mobility and collaboration tools, smart infrastructure systems, and integrated defense networks. Public leaders are encouraged to identify high-value IoE use cases and develop strategies to capture this value.
Architecting a country: how Estonia built its e-government success – Andres Kütt
This document discusses architecting a country's e-government systems and presents Estonia's approach. It introduces fundamental concepts for technical architectures and provides background on Estonia. The document proposes a meta-architecture framework with layers for electronic identity, delivery channels, integration, and infrastructure. Questions are posed for each layer to guide technical decisions. The framework is then applied to describe Estonia's technical architecture, focusing on its distributed but interconnected layers built around electronic IDs, web and mobile delivery, a service bus for integration, and consolidated but dispersed infrastructure.
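The layered meta-architecture summarised above lends itself to a simple data-structure sketch: each layer carries the guiding questions used to drive technical decisions. The layer names below follow the summary; the example questions are illustrative assumptions, not quotes from the presentation.

```python
from dataclasses import dataclass, field

@dataclass
class Layer:
    """One layer of the meta-architecture, with its guiding questions."""
    name: str
    questions: list = field(default_factory=list)

# The four layers named in the framework, each with an assumed sample question.
framework = [
    Layer("electronic identity", ["How are citizens and officials authenticated?"]),
    Layer("delivery channels", ["Which channels (web, mobile) expose services?"]),
    Layer("integration", ["How do systems exchange data (e.g. via a service bus)?"]),
    Layer("infrastructure", ["Where do systems run, and who operates them?"]),
]

for layer in framework:
    for question in layer.questions:
        print(f"{layer.name}: {question}")
```

Walking the structure top-down mirrors how the framework is applied to Estonia in the document: answer the identity questions first, then channels, integration and infrastructure.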
Good Practices and Recommendations on the Security and Resilience of Big Data... – Eftychia Chalvatzi
This document summarizes a report by the European Union Agency for Network and Information Security (ENISA) regarding security challenges and recommendations related to big data systems. The report identifies key security challenges such as access control and authentication, secure data management, and source validation and filtering. It provides recommendations for various stakeholders, including that policy makers provide guidance for secure big data use, standardization bodies adapt standards to include big data security, and companies invest in big data security skills and compliance with standards.
This document summarizes a presentation on responsible use of AI in governance. It discusses the legal impacts of AI, including on legislation, legal professions, and legal subjects. It also examines AI concepts/methods and the debate around AI hype vs. concerns. The EU and member state initiatives on AI ethics and regulations are outlined, as well as international "soft law" approaches. It concludes by questioning whether traditional lawmaking can adequately address AI and the future of normative frameworks.
Samos Summit Digital Europe 2040 [g.misuraca]samossummit
This document summarizes a presentation on shaping digital Europe 2040 given at the 10th Samos Summit on ICT-enabled governance. The presentation discusses envisioning the future of digital government through scenarios, explores how artificial intelligence is impacting governance, and identifies open issues and policy implications. Key topics included mapping AI use in public services in the EU, challenges in moving from data analysis to adoption, and two dimensions to consider in scenarios for digital governance: regulation of the digital landscape and data protection.
This presentation by Christian.REIMSBACH-KOUNATZE, OECD Digital Economy Policy Division, was made during the discussion “Data portability, interoperability and competition” held at the 135th meeting of the OECD Competition Committee on 9 June 2021. More papers and presentations on the topic can be found out at oe.cd/dpic.
Artificial intelligence (ai) multidisciplinary perspectives on emerging chall...PanagiotisKeramidis
This document discusses artificial intelligence (AI) and its emerging challenges and opportunities. It notes that AI is increasing the capability of algorithms and machines to perform human tasks through technologies like autonomous vehicles, chatbots, and medical diagnosis. However, society has yet to fully understand the ethical, economic, and social impacts of AI. The document outlines several challenges of AI, including unintended consequences, economic challenges like cost, data challenges around quality and access, and technological challenges regarding transparency and interpretability. It concludes by discussing how AI could impact the UN's sustainable development goals both positively through improved productivity and healthcare, and negatively through job disruption and increased inequality if not properly managed.
Presentation by Alenka Zuzek, Ministry of Public Administration, Slovenia, at seminar 2, held on 18 March 2021, which addresses digital government principles and building blocks. This 2nd event takes place in the framework of a series of three webinars organised by the SIGMA Programme, a joint initiative of the OECD and EU, principally financed by the EU, on the role of life events in end-to-end public service delivery.
MANYLAWS : EU-Wide Legal Text Mining Using Big Data InfrastructuresYannis Charalabidis
ManyLaws is a web platform that uses text mining and semantic extraction to provide services related to EU, Austrian and Greek laws. It delivers innovative legal search and visualization tools to citizens, businesses, and governments. The ManyLaws project utilizes big data infrastructures to process vast amounts of legal information across multiple jurisdictions. It provides the first fully automatic legal analysis and interrelation system and supports seamless navigation of legal texts in different languages.
Gov4All :An open data and open services repository for supporting citizen-dr...Yannis Charalabidis
Open data portals have been a primary source for publishing datasets from various sectors of administration, all over the world. However, making open data available does not necessarily lead to better utilisation from citizens and businesses. Our paper presents a new framework and a prototype system for supporting open application development by citizen communities, through gathering and making available open data and open web services sources from governmental actors, combined with an application development environment, training material and application examples.
This is a presentation of Gov4All platform, a web site for managing citizen-driven development in Greece.
Yannis Charalabidis gave a presentation on AI in governance. He discussed how AI is being used in various areas of public administration like service provision, back office processes, and policy design. He believes AI will have an enormous learning potential and impact areas like developing digital twins of cities, simulations to help with policy design and democracy, and developing truly smart city applications and agents. However, he notes that universal algorithms for complex societal simulations do not yet exist and more basic research is needed in areas like developing generic public sector agents and understanding systems.
This presentation by Peter Swire, Professor of Law and Ethics, Georgia Tech Scheller College of Business and Associate Director for Policy of the Georgia Tech Institute for Information Security and Privacy, was made during the discussion “Data portability, interoperability and competition” held at the 135th meeting of the OECD Competition Committee on 9 June 2021. More papers and presentations on the topic can be found out at oe.cd/dpic.
This document discusses data and protocol interoperability and their effects on competition. It defines data portability as the ability for users to transfer personal data between service providers. Data portability can increase competition by sharing learning effects between firms and reducing switching costs. However, it may also increase costs for smaller firms and pose privacy risks. Interoperability allows direct data exchange or the ability to invoke actions between platforms. This can boost network effects and facilitate switching. But interoperability risks tacit collusion and less product differentiation. The document recommends further reading on making data portability more effective and regulating digital markets.
As global evangelist, thought leader and ICT futurist I was asked to present my views on how ICT-enablement of the future Social Security systems in the Gulf Cooperation Council area could look like - and what recommendations I would make to enable the states to leapfrog on their Social Service Delivery. This presentation together with the detailed insight on my blog post (http://digitizesociety.blogspot.com/2014/04/ict-and-social-services-presentation-to.html) explain my view on current trends and directions as well as challenges that many Social Security / Social Welfare agencies face as they try to increase efficiency and effectiveness utilizing digitalization.
The Secure Identity Alliance is committed to helping governments deliver e-government services through secure identity technologies. It was founded in 2013 by leading document and service companies. The Alliance aims to accelerate e-government services by sharing best practices, promoting standardization, and providing guidance on security, identity, and privacy challenges. It has workgroups focused on digital identity and e-document security to define best practices and a security awareness model. The model will assess security solutions and provide recommendations to help members improve.
As Global expert in Public Secor and Social Welfare Digitilization, I was asked to address how EU Member States and the EU commission can use ICT to combat poverty by creating effective and efficient Social Policies for Minimum Income Support. I have elaborated on the slides in a blog post on my blog: http://digitizesociety.blogspot.com/2014/04/ict-enablement-of-minimum-income-support.html
Accelerating the creation and deployment of e-Government services by ensuring...Secure Identity Alliance
The document discusses establishing trust in digital identity and e-government services. It notes that passwords are broken, privacy is a challenge, and identities are difficult to verify online currently. Governments can play a role by acting as a root identity provider and promoting standards. Digital identity represents the sum of all available digital data on an individual and creates economic value. However, trust must be established through agreed standards and initiatives like the National Strategy for Trusted Identities in Cyberspace. The Secure Identity Alliance aims to accelerate e-government services by ensuring privacy, security, convenience and trust through partnerships between industry and governments.
e-SIDES presentation at Leiden University 21/09/2017e-SIDES.eu
On September 21st the eLaw team member of e-SIDES, Magdalena Jozwiak, made a presentation of the e-SIDES project at a lunch event at the Leiden University’s Law Faculty. The event, organized within the Interaction Between Legal Systems research theme, attracted an interdisciplinary audience and was followed by a discussion on e-SIDES, its goals and approaches.
Samos 2020 Summit - Digital Governance Overviewsamossummit
This document provides an overview of digital governance and the types of information systems used in government. It discusses traditional electronic government systems that automate internal operations and citizen transactions, as well as emerging systems that use new technologies to transform policymaking and decision making. The document outlines the evolution of e-government into three generations - from automating transactions to facilitating citizen participation to using analytics and AI to support policy formulation. The goal is to give context to different information system examples that will be presented and how they fit within the broader domain of digital governance.
This document discusses integrating the LEOS legislation drafting software with the ManyLaws legal informatics platform. LEOS allows for online collaboration on legislation, while ManyLaws relates laws within and across countries and translates them. Integrating the tools would allow users to access relevant laws from other jurisdictions, compare legal elements, and create metadata during drafting to make new legislation retrievable in ManyLaws. This would provide a powerful solution for legislation drafting within the EU. The document outlines ManyLaws' process for handling legal data and how LEOS could integrate to directly input new data and access ManyLaws' services.
The document discusses the potential value of harnessing the Internet of Everything (IoE) for public sector organizations over the next decade. It finds that IoE could generate $4.6 trillion in value, including through increased employee productivity, connected defense systems, reduced costs, improved citizen experiences, and increased revenue. The value will come from connecting people, processes, data and physical things to improve outcomes in areas like transportation, healthcare, education, and more. Top opportunities include employee mobility and collaboration tools, smart infrastructure systems, and integrated defense networks. Public leaders are encouraged to identify high-value IoE uses cases and develop strategies to capture this value.
Architecting a country: how Estonia built its e-government successAndres Kütt
This document discusses architecting a country's e-government systems and presents Estonia's approach. It introduces fundamental concepts for technical architectures and provides background on Estonia. The document proposes a meta-architecture framework with layers for electronic identity, delivery channels, integration, and infrastructure. Questions are posed for each layer to guide technical decisions. The framework is then applied to describe Estonia's technical architecture, focusing on its distributed but interconnected layers built around electronic IDs, web and mobile delivery, a service bus for integration, and consolidated but dispersed infrastructure.
Good Practices and Recommendations on the Security and Resilience of Big Data... - Eftychia Chalvatzi
This document summarizes a report by the European Union Agency for Network and Information Security (ENISA) regarding security challenges and recommendations related to big data systems. The report identifies key security challenges such as access control and authentication, secure data management, and source validation and filtering. It provides recommendations for various stakeholders, including that policy makers provide guidance for secure big data use, standardization bodies adapt standards to include big data security, and companies invest in big data security skills and compliance with standards.
BDE SC2 Workshop 3: Building a European Data Economy - BigData_Europe
The European Commission is taking action to unlock the potential of the EU's data economy by addressing current barriers. A communication outlines possible policy solutions for free flow of data, access to and transfer of data, data portability, and liability issues from technologies like IoT. The Commission plans to launch a public consultation, engage in stakeholder debates, and consider guidance, default contract rules, and access rights to make machine-generated data more available and stimulate innovation across the EU digital economy.
L'economia europea dei dati. Politiche europee e opportunità di finanziamento... - Data Driven Innovation
The European data economy: policy and legal solutions for realising an EU-level data economy within the Digital Single Market strategy. The public consultation "Building the European Data Economy". The Big Data Value public-private partnership (PPP) and funding opportunities in Horizon 2020. The Data Pitch incubator: opportunities for start-ups and small and medium-sized enterprises.
ETHICAL ISSUES WITH CUSTOMER DATA COLLECTION - Pranav Godse
Data mining involves collecting and analyzing large amounts of customer data. While this can provide commercial benefits, it also raises ethical issues regarding customer privacy. Some key ethical challenges include ambiguity around how social networks label relationships, uncertainty around future uses of customer data by companies, and a lack of transparency around passive collection of mobile location data. To address these challenges, companies should focus on ethical data mining practices like verifying data sources, respecting customer expectations of privacy, developing trust through transparency and control over data access. Regulators also need to continue updating laws and regulations to balance the benefits of data analytics with protecting individual privacy rights.
Anonos NIST Comment Letter – De–Identification Of Personally Identifiable Inf... - Ted Myerson
The document is a letter submitted to NIST proposing that the draft NISTIR report on de-identification of personally identifiable information include discussion of "dynamic data obscurity". The letter argues that dynamic data obscurity technologies can help overcome limitations of static de-identification techniques by allowing intelligent and compliant access to data elements while still enforcing core privacy protections. The letter proposes adding a section on dynamic data obscurity to the report and discusses the history and benefits of this approach.
The Case of Trade Secrets and Database Sui Generis Right in Marketing Operations, and the Ownership of Raw Data in Big Data Analysis
Paper presented at the Max Planck Institute's conference "Personal data in competition, consumer protection and IP law Towards a holistic approach?", held on 21 October 2016
Protecting Data Privacy in Analytics and Machine Learning - Ulf Mattsson
In this session, we will discuss a range of new emerging technologies for privacy and confidentiality in machine learning and data analytics. We will discuss how to use open source tools to put these technologies to work for databases and other data sources.
When we think about developing AI responsibly, there are many different activities we need to consider. In this session, we will discuss technologies that help protect people, preserve privacy, and enable you to do machine learning confidentially.
This session discusses industry standards and emerging privacy-enhancing computation techniques, secure multi-party computation, and trusted execution environments. We will discuss how the Zero Trust philosophy fundamentally changes the way we approach security, since trust is a vulnerability that can be exploited, particularly when working remotely and increasingly using cloud models. We will also discuss the "why, what, and how" of techniques for privacy-preserving computing.
We will review how different industries are taking advantage of these privacy-preserving techniques. A retail company used secure multi-party computation to respect user privacy and specific regulations, allowing the retailer to gain insights while protecting the organization's IP. A healthcare organization uses secure data sharing to protect the privacy of individuals, and also stores and searches encrypted medical data in the cloud.
We will also review the benefits of secure data-sharing for financial institutions including a large bank that wanted to broaden access to its data lake without compromising data privacy but preserving the data’s analytical quality for machine learning purposes.
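The secure multi-party computation pattern mentioned above can be illustrated with a minimal additive secret-sharing sketch. This is a hypothetical toy example (the party count, values, and modulus are illustrative, and a real protocol would also handle share distribution and malicious parties):

```python
import secrets

PRIME = 2**61 - 1  # all arithmetic is done modulo a large prime

def share(value, n_parties):
    """Split a value into n additive shares that sum to it mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]

def reconstruct(shares):
    """Recombine all shares; any strict subset reveals nothing about the value."""
    return sum(shares) % PRIME

# Toy scenario: two organisations sum their counts without revealing them.
a_shares = share(1200, 3)
b_shares = share(850, 3)
# Each of the three parties adds the two shares it holds, locally...
summed = [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]
# ...and only the combined total is ever reconstructed.
print(reconstruct(summed))  # 2050
```

Each party sees only uniformly random shares, so individual inputs stay private while the aggregate remains computable, which is the property the retail and healthcare cases above rely on.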
Privacy experience in Plone and other open source CMS - Interaktiv
This document discusses privacy experience in open source content management systems (CMS) like Plone. It begins by explaining why privacy matters and providing examples of recent privacy issues. It then discusses different approaches to privacy internationally and how this affects global open source communities. The document proposes universal privacy principles and discusses how privacy can be ensured in open source CMS communities specifically, with suggestions for Plone. It emphasizes a preventative, privacy by design approach.
e-SIDES workshop at BDV Meet-Up, Sofia 14/05/2018 - e-SIDES.eu
The following presentation was given at the workshop "Technology solutions for privacy issues: what is the best way forward?" organized by e-SIDES at the BDVe Meet-up in Sofia on May 14, 2018. The workshop, chaired by Gabriella Cattaneo from IDC, involved stakeholders from ICT-18 projects.
"Towards Value-Centric Big Data" e-SIDES Workshop - "A win-win initiative for... - e-SIDES.eu
This document provides an overview of the AEGIS project, which aims to create a curated repository of public safety and personal security big data. It discusses the project objectives, including identifying and semantically linking diverse data sources and developing improved data handling and analysis tools. Three demonstrators are described that focus on automotive safety, smart homes, and smart insurance. The document outlines the project's commitment to ethics, including an Ethics Advisory Board and strategies to ensure privacy, data protection, and balance of interests. Requirements and frameworks for the technical platform emphasize privacy by design. Assessments of the system and demonstrators found them compliant with ethical and legal standards.
IT law: the middle kingdom between east and West - Lilian Edwards
This document discusses balancing privacy, security, business interests, and other values. It notes that recent Western experience shows promoting security over privacy can hurt industry by reducing consumer confidence, and that lack of privacy protection can impact business profits from data usage. The document suggests China could learn lessons from this experience, and that global trends show strengthening, not weakening, privacy is important to maintain trust and enable digital innovation.
This document discusses privacy issues related to smart meters and the smart grid. It outlines legal problems like smart meters revealing private lifestyle information. It then discusses design responses like privacy by design and data protection by design and default. Technical options are proposed to address privacy like personal data storage and consumer control. The document reviews opinions from the Article 29 Working Party and EU recommendations. It concludes by posing questions about balancing privacy, innovation and law enforcement access to energy use data from smart meters.
In the third part of the workshop series Smart Policies for Data, we will focus on two central building blocks – interoperability and balanced data sharing.
The presentations of the event:
- Szymon Lewandowski, DG CONNECT, European Commission
- Marko Turpeinen, CEO, 1001 Lakes
- Lars Nagel, CEO, International Data Spaces Association
Data 4 AI: For European Economic Competitiveness and Societal Progress - Edward Curry
1) Data is a key resource for developing artificial intelligence systems, but difficulties accessing data can reduce innovation and competition.
2) Data platforms and data sharing spaces will fuel the development of AI-driven decision-making by facilitating access to and portability of data.
3) The Big Data Value Association advocates for policies and technologies that create trusted frameworks for sharing data across sectors and borders to advance AI for European economic competitiveness and societal progress.
Data Scientists are going to need to pay attention to the EU General Data Protection Regulation (GDPR), set to be published early 2016. Fines for violation are massive.
This document discusses how life insurance companies can leverage big data analytics across their value chain. It begins by explaining how data sources have expanded dramatically in recent years due to factors like the growth of digital devices and the internet of things. It then outlines how big data can be used in various parts of the insurance lifecycle from product development to claims processing. The document presents a four stage framework for life insurers to adopt big data analytics and provides examples of how some companies have realized benefits. It concludes by noting that while insurers recognize big data's potential, many challenges remain in analyzing diverse and voluminous unstructured data.
The document provides an overview of ethics, legislation, and privacy issues related to big data. It discusses the necessity of regulating big data and the differences between privacy and data protection. It also provides details on the General Data Protection Regulation (GDPR), including its goals, requirements for companies, and individual rights it aims to protect.
3. Morning Session Agenda
Towards value-centric Big Data
European Big Data Community Forum, 2019
10.15 Welcome Keynote - Dr Malte Beyer-Katzenberger, DG for Communication Networks, Content and Technology of the European Commission
10.30 Keynote - Prof Dr Christiane Wendehorst, President - European Law Institute, co-chair of the German Data Ethics Committee
10.45 Panel session - Extracting the value of data: How the research and industry community can best move forward to balance privacy and fairness (Marina Da Bormida, Maryant Fernandez Perez, Diego Naranjo, Moira Patterson)
11.30 Coffee break
11.45 Projects panel - e-SIDES, SODA, SPECIAL, WeNet, MyHealthMyData
13.00 Networking lunch
4. Afternoon Session Agenda
Lessons learned from research and technology for human-centered Big Data
14.00 Afternoon session introduction - Rigo Wenning, SPECIAL
14.10 Break-out sessions:
• Technology and Data Protection Law - how can software engineering support legal compliance?
• Human-centric Big Data governance: responsible ways to innovate privacy-preserving technologies
16.00 Wrap-up - Rigo Wenning, SPECIAL
16.30 Closing remarks - Richard Stevens, e-SIDES, IDC
5. Towards a more ethical data economy?
Malte Beyer-Katzenberger, European Commission, DG CONNECT
6. The political context
«The EU needs to go further in developing a competitive, secure, inclusive and ethical digital economy with world-class connectivity. Special emphasis should be placed on access to, sharing of and use of data, on data security and on Artificial Intelligence, in an environment of trust.»
(European Council conclusions, 21/22 March 2019)
7. What does the future hold for the data economy/data4AI?
8. Choosing the right course of action
10. Personal data sharing
▪ Anonymised: All set (?)
▪ Consent-based: Fatigue – anyone?
▪ Broad consent (and accompanying measures)
▪ Magic PETs (privacy-preserving analytics)
▪ Novel challenges: Group rights?
11. Common European data spaces
▪ Future funding initiative under the Digital Europe programme (as of 2021);
▪ Announced in the Coordinated Plan on AI (COM(2018)795, cf. Annex I) as a measure to improve data access for AI, in particular privately-held data;
▪ To be operated by private consortia or in PPP;
▪ For specific sectors of the economy (e.g. manufacturing, mobility, agriculture, energy) or thematic domains (health, climate change management);
▪ To allow machine learning on public sector data (cf. high value datasets under the Open Data Directive) and privately-held data pooled on the basis of voluntary agreement (or legal obligation if one exists);
▪ No single design plan – depends on sector or domain.
12. Ethical elements
▪ Ethics of the use of algorithms
▪ Ethics of the collection and use of data
▪ Ethics of withholding data
13. Malte Beyer-Katzenberger
Team leader, data innovation & policy
Malte.beyer-Katzenberger@ec.europa.eu
@beyermalte
European Commission
15. Data Ethics Commission
• Established in mid-2018 with the mission to develop, within one year, an ethical and regulatory framework for data, ADM and AI
• Co-chaired by Christiane Wendehorst and Christiane Woopen
• Opinion presented in Berlin on 23 October 2019
• Includes ethical guidelines and 75 concrete recommendations for action regarding data and algorithmic systems
16. What is Data Ethics?
Nested layers, from the core (data) out to the wider framework:
• Ethics of handling personal data
• Ethics of handling data in general (including non-personal data)
• Ethics of handling data and data-driven technologies (including algorithmic systems, such as AI)
• Ethics of the digital transformation in general (including issues such as the platform economy or the future of work)
21. Data Governance Principles
• In line with the principles under Article 5 of the GDPR, but apply to personal as well as non-personal data
• Stress the potential of data use and data sharing for the common good
• Recognise that there may, under certain circumstances, also be an ethical imperative to use data
The six principles: data use and sharing for the common good; foresighted responsibility; respect for the rights of the parties involved; fit-for-purpose data quality; risk-adequate information security; interest-oriented transparency.
22. Data rights and corresponding data obligations
• Rights vis-à-vis a controller of data, aimed at access, desistance, rectification or at receiving an economic share
• Inspired by ALI-ELI Principles
• No plea for “data ownership”
• Data subjects’ rights under the GDPR as a particularly important manifestation
23. Rights to require desistance from data use
Illustration: The non-personal data collected by sensors in modern agricultural machinery (relating to soil quality, weather, etc.) are used by manufacturers as a basis for many of the services they provide (precision farming, predictive maintenance, etc.). If the manufacturers were to forward these data to potential investors or lessors of land, however, the latter would be given information that might prove harmful to an agricultural holding if negotiations over the land were to take place in the future.
24. Ethical imperatives to use data
Illustration: A hospital is experiencing an outbreak of a multi-resistant pathogen. It wants to analyse the health data of patients who have recently become infected in order to gain a better idea of why certain individuals are more likely to fall prey to the pathogen, as a basis for pinpointing the inpatients that might benefit most from a move to another hospital. Under these circumstances, the hospital has a general obligation to provide new patients with the best possible protection against infection by taking all available and reasonable precautions to this end. This includes the use of health data belonging to patients who have already been infected with the pathogen, provided that said use might protect new patients and there is no obligation emanating from the former group of patients to desist from use of their data.
25. Rights to request access to data
Illustration: A supplier manufactures the engines for the agricultural machinery referred to in the first Illustration. It would be extremely useful for the supplier to have access to certain tractor data so that it can verify and constantly improve the quality of its engines. These data are stored in the manufacturer’s cloud, however, and the latter is unwilling to allow the supplier to access them.
26. Rights to request rectification of data
Illustration: A very high error rate has been detected in the engine data stored by the manufacturer in the previous Illustration. This is problematic for the company that supplies these engines, not only because it deprives the company of the possibility to fulfil its quality assurance remit, but also because these engine-related data are pooled with engine-related data from other engine manufacturers as a basis for evaluations, and poor performance metrics for the engines from the relevant supplier might reduce the latter’s chances of securing orders from other manufacturers. The processing of inaccurate data causes harm to the supplier.
27. Standards for the Use of Personal Data
• Recommendations for measures against ethically indefensible uses of data and against the existing enforcement gap, including by fleshing out and strengthening the existing legal framework (e.g. concerning profiling and trade in data)
• Recommendations with regard to specific contexts: data as “counter-performance”, personalised risk assessment, digital inheritance
• Recommendations with regard to specific groups of data subjects: employees, patients, minors, vulnerable adults
• Better implementation of privacy by design
28. Improving controlled access to personal data
• Better legal certainty for researchers (clarification and harmonisation of the law, innovative forms of consent, etc.)
• Fostering progress with anonymisation, pseudonymisation and synthetic data
• Innovative data management and data trust schemes as the way forward
• Duty to provide for interoperability/interconnectivity in particular sectors (by way of asymmetrical regulation)
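The pseudonymisation mentioned on this slide can be sketched minimally with keyed hashing: direct identifiers are replaced by stable tokens that cannot be reversed without a secret key. The key, names, and token length here are purely illustrative, and this is one possible technique rather than the Commission's prescribed method:

```python
import hmac
import hashlib

SECRET_KEY = b"keep-this-key-separate-from-the-dataset"  # illustrative key

def pseudonymise(identifier: str) -> str:
    """Replace a direct identifier with a stable, non-reversible token."""
    digest = hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]

records = [{"patient": "Alice", "diagnosis": "flu"},
           {"patient": "Alice", "diagnosis": "asthma"}]
pseudonymised = [{"patient": pseudonymise(r["patient"]),
                  "diagnosis": r["diagnosis"]} for r in records]
# The same person maps to the same token, so records remain linkable
# for research without exposing the name itself.
assert pseudonymised[0]["patient"] == pseudonymised[1]["patient"]
```

Note that under the GDPR pseudonymised data still count as personal data, since whoever holds the key can re-identify the data subjects; only true anonymisation takes data outside the Regulation's scope.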
29. Debates around access to non-personal data
• ASISA Principle (Awareness – Skills – Infrastructures – Stocks – Access): investing in awareness raising, data infrastructures, and practical support
• Cautious adaptations of the current legislative framework (limited third-party effects of data contracts, facilitating data pooling, etc.) and possibly further legislative measures
• Fostering open data in the public sector (open government data) while improving protection of third parties
• Open data in the private sector: incentives for voluntary data sharing, cautious approach to statutory duties, mainly on a sector-by-sector basis
31. A risk-based regulatory framework
• “Criticality pyramid”: different levels of potential for harm (risk)
• No need for any regulation with regard to most algorithmic systems
• Ban on systems involving an untenable potential for harm
• Plea for a horizontal Regulation at EU level and sector-specific legislation at both EU and national levels
34. Operationalizing Data Ethics: which barriers, opportunities and facilitating factors for SMEs?
Marina Da Bormida, R&I Legal Advisor and Ethics Expert
35. From Legal Compliance towards Data Ethics
A Data Economy based on EU values for boosting European competitiveness, involving business, citizens, government and public bodies, and science.
Pursuing benefits for all involved stakeholders, “Towards a European Data Sharing Space” (BDVA PP, April 2019)
36. Barriers and challenges for SMEs / 1
Limited knowledge, information and awareness
• No specialised staff in ethics issues
• Unfamiliar with most topics (fundamental rights impact assessment, trade-offs, …)
• GDPR-centric vision
Far from their daily business and their customer base's demand: a gap
37. Barriers and challenges for SMEs / 2
Lack of perception of future direct benefits; seen as unnecessary
• Lack of long-term vision
• Cultural resistance
Limited resources
• Time and cost constraints
• Perception of irresponsible use of time and resources
• Disproportion between necessary efforts (hard work and research) and available
resources
38. Opportunities for SMEs
• Alignment of some Data Ethics requirements with routine tasks (such as auditing & risk assessment)
• Ethical practices as a possible competitive factor (though not full awareness)
• Familiarity with ethical data collection and processing
• Ad-hoc compliance supporting services?
39. The way forward… reflecting on facilitating measures / 1
Regulatory Sandboxes (from the fintech universe):
• Small-scale, live testing of innovations in a controlled environment
• Main features: possible special exemption & regulator's supervision
• More open and active dialogue between regulators and innovators
• Revise and shape the regulatory framework with agility
• Extension of DIHs' function as experimentation facilities? Clear framework needed
40. The way forward… reflecting on facilitating measures / 2
• Incentives and awareness campaigns
• Participation in European Projects
• Addressing the value chain's asymmetries of power
• Business ecosystem services (training, certification, …), such as in DIHs
• Cross-fertilization and intersection dynamics of Technology and Law/Ethics
41. Thank you!
Marina Da Bormida
R&I Legal Advisor and Ethics Expert
m.dabormida@eurolawyer.it
+393498433690
"Beyond Privacy. Learning Data Ethics", Brussels, 14 November 2019
42. Consumer-friendly EU policies on Artificial
Intelligence and the data economy
BEUC – The European Consumer Organisation
European Big Data Community Forum, 2019
Author: Maryant Fernández
43.
AI: TRUST IS GOOD, CONTROL IS BETTER
https://www.vzbv.de/sites/default/files/2019_vzbv_factsheet_artificial_intelligence.pdf
44.
AI RIGHTS FOR CONSUMERS
• Right to Transparency, Explanation, and
Objection (clear picture; stay in control; risk-based)
• Right to Accountability and Control (appropriate
technical systems to ensure compliance)
• Right to Fairness (expectations respected; input +
output fair; general welfare aspects)
• Right to Non-Discrimination (incorrect
predictions; adverse effects; proxy discrimination)
• Right to Safety and Security (safety for software;
regulatory oversight; updates)
• Right to Access to Justice (redress & public
enforcement; product liability modernised)
• Right to Reliability and Robustness (technically
robust and reliable by design; data quality)
45.
BEUC's vision for a European data access and control policy
• Competition: reducing barriers to entry; preventing lock-in; enabling innovation
• Protection and empowerment: giving control over personal data; respecting consumer rights; privacy-enhancing innovation
• Common interest: promoting innovation that benefits consumers; protecting freedom of information; encouraging access to public data
• Oversight: coherent data governance; cooperation between authorities; effective enforcement and redress for consumers
46. Consumer check-list
1. Address market failures.
2. Stimulate innovation, bearing in mind
innovation ≠ progress.
3. Put consumers at the centre in data
sharing, in conformity with the GDPR (data
minimisation, purpose limitation, data
protection by design…)
4. Ensure a high-level of data security.
5. Adopt technical solutions to help
consumers control and manage flows of
personal information.
6. Make redress available to consumers.
7. Reduce the risks of data concentration
and excessive data collection
8. Promote the common interest through
open data initiatives.
47.
Maryant Fernández
Senior Digital Policy Officer
Digital@beuc.eu
BEUC – The European Consumer Organisation
Thanks for your attention!
48. IEEE SA – ADVANCING
TECHNOLOGY FOR THE BENEFIT OF
HUMANITY
STANDARDIZATION ACTIVITIES FOR AUTONOMOUS AND INTELLIGENT SYSTEMS
49. RAISING THE WORLD’S STANDARDS
Mission
Provide a high-quality, market-relevant standardization
environment that is respected world-wide
About IEEE SA
▪ Consensus-building organization within IEEE that
develops and advances global technologies - through
facilitation of standards development and
collaboration
▪ Promotes innovation, enables creation and expansion
of international markets; helps protect health, public
safety
▪ Drives functionality, capabilities and interoperability
of a wide range of products and services that
transform the way people live, work and communicate
50. IEEE ACTIVITIES IN A/IS AND ETHICS
Our Work: Putting principles into practice
Community
▪ 3000 members from all
continents
▪ 40% women
▪ Participation &
endorsement by
industry
▪ Recognition by
governments &
international
organizations
Ethically Aligned Design
▪ Provides guidance for
standards,
certification,
regulation, & serves
as a reference for the
work of policymakers,
industry members,
technologists,
& educators
“EAD For” Series
▪ Business
▪ Artists
▪ Health
▪ Parenting
▪ Advertising
Standards
▪ Nearly 30 AI/AS
standards projects in
development of which
15 are ethically
oriented
▪ Included in the
ethically oriented
standards is IEEE
P7000, which
establishes a process
model by which
engineers &
technologists can
address ethical
considerations
Certification
Criteria and process for
Certification / marks
addressing:
▪ Transparency in A/IS
▪ Accountability in A/IS
▪ Algorithmic Bias in A/IS
Education and learning
▪ AI & Ethics in Design
Business Course
▪ EAD University Consortium
▪ Engagement and
collaboration with
governments,
municipalities and
intergovernmental
fora (EU, EC, CoE, OECD,
UN orgs, NYC, Vienna,
Espoo, ….)
51. IEEE SA TECHNICAL STANDARDS
P3652.1™ - GUIDE FOR
ARCHITECTURAL FRAMEWORK
AND APPLICATION OF FEDERATED
MACHINE LEARNING
P2807™, P2807.1™ - KNOWLEDGE
GRAPHS (FRAMEWORK,
EVALUATION)
P1872.2™ - STANDARD FOR
AUTONOMOUS ROBOTICS (AUR)
ONTOLOGY
P2040™ - STANDARD FOR
CONNECTED, AUTOMATED AND
INTELLIGENT VEHICLES: OVERVIEW
AND ARCHITECTURE
P2040.1™- STANDARD FOR
CONNECTED, AUTOMATED AND
INTELLIGENT VEHICLES:
TAXONOMY AND DEFINITIONS
P2660.1™ - RECOMMENDED
PRACTICES ON INDUSTRIAL
AGENTS: INTEGRATION OF
SOFTWARE AGENTS AND LOW
LEVEL AUTOMATION FUNCTIONS
P2418.4™ - STANDARD FOR THE
FRAMEWORK OF DISTRIBUTED
LEDGER TECHNOLOGY (DLT) USE IN
CONNECTED AND AUTONOMOUS
VEHICLES (CAVS)
P2751™ - 3D MAP DATA
REPRESENTATION FOR ROBOTICS
AND AUTOMATION
PC37.249™ - GUIDE FOR
CATEGORIZING SECURITY NEEDS
FOR PROTECTION AND
AUTOMATION RELATED DATA
FILES
P2672™ - GUIDE FOR GENERAL
REQUIREMENTS OF MASS
CUSTOMIZATION
P2812™ - GUIDE FOR MINOR
GUARDIANSHIP SYSTEM FOR
ONLINE MOBILE GAMING
P1589™ - STANDARD FOR AN
AUGMENTED REALITY LEARNING
EXPERIENCE MODEL
P2247.1™, P2247.2™, P2247.3™ -
ADAPTIVE INSTRUCTIONAL
SYSTEMS (CLASSIFICATION,
INTEROPERABILITY, AND
EVALUATION)
P2830™ - STANDARD FOR
TECHNICAL FRAMEWORK AND
REQUIREMENTS OF SHARED
MACHINE LEARNING
P3333.1.3™ - STANDARD FOR THE
DEEP LEARNING-BASED
ASSESSMENT OF VISUAL
EXPERIENCE BASED ON HUMAN
FACTORS
52. IEEE SA IMPACT STANDARDS
IEEE P7000™ – MODEL
PROCESS FOR
ADDRESSING ETHICAL
CONCERNS DURING
SYSTEM DESIGN
IEEE P7001™ –
TRANSPARENCY OF
AUTONOMOUS SYSTEMS
IEEE P7002™ – DATA
PRIVACY PROCESS
IEEE P7003™ –
ALGORITHMIC BIAS
CONSIDERATIONS
IEEE P7004™ –
STANDARD ON CHILD
AND STUDENT DATA
GOVERNANCE
IEEE P7005™ –
STANDARD ON
EMPLOYER DATA
GOVERNANCE
IEEE P7006™ –
STANDARD ON
PERSONAL DATA AI
AGENT
IEEE P7007™ –
ONTOLOGICAL
STANDARD FOR
ETHICALLY DRIVEN
ROBOTICS AND
AUTOMATION SYSTEMS
IEEE P7008™ –
STANDARD FOR
ETHICALLY DRIVEN
NUDGING FOR ROBOTIC,
INTELLIGENT AND
AUTONOMOUS SYSTEMS
IEEE P7009™ –
STANDARD FOR FAIL-
SAFE DESIGN OF
AUTONOMOUS AND SEMI-
AUTONOMOUS SYSTEMS
IEEE P7010™ –
WELLBEING METRICS
STANDARD FOR ETHICAL
ARTIFICIAL
INTELLIGENCE AND
AUTONOMOUS SYSTEMS
IEEE P7011™ –
STANDARD FOR THE
PROCESS OF
IDENTIFYING & RATING
THE TRUST-WORTHINESS
OF NEWS SOURCES
IEEE P7012™ –
STANDARD FOR
MACHINE READABLE
PERSONAL PRIVACY
TERMS
IEEE P7013™ –
INCLUSION AND
APPLICATION
STANDARDS FOR
AUTOMATED FACIAL
ANALYSIS TECHNOLOGY
IEEE P7014™ – STANDARD
FOR EMULATED EMPATHY IN
AUTONOMOUS AND
INTELLIGENT SYSTEMS
53. OTHER RELEVANT STANDARDS RELATING TO DATA
Global Initiative to Standardize Fairness in the
Trade of Data
▪ Focus on three principles:
﹣ Data Agency
﹣ Data Ethics
﹣ Data Equity
Digital Inclusion, Identity, Trust, and Agency
(DIITA) Program
▪ Workstreams include work on:
﹣ Privacy by Design
﹣ Dignity in Gaming
IEEE P2089 - Standard for Age Appropriate
Digital Services Framework - Based on
the 5Rights Principles for Children
EAD for Parenting
54. WE INVITE YOU TO
CONNECT WITH US.
Moira Patterson
m.patterson@ieee.org
https://www.facebook.com/ieeesa/
https://twitter.com/IEEESA
https://standards.ieee.org/
55. Privacy-preserving technologies
in a data-driven society
Daniel Bachlechner, Fraunhofer
European Big Data Community Forum 2019
14 November 2019
Source: https://www.ethicalsocietymr.org/upcoming-events.html
56.
e-Sides Ethical and Societal Implications of Data Sciences
Objectives and methods
Key objectives:
• Improve the dialogue between stakeholders and increase the confidence of citizens in data technologies and use
• Reach a common vision for an ethically sound approach to data use and facilitate responsible research and innovation
Main methods:
• Investigation of related projects through joint workshops, interviews and website analyses
• Collection of insight from renowned experts with different backgrounds through workshops and interviews
• Review of more than 200 articles including academic papers and study reports
• Interaction with a diverse set of stakeholders by means of a collaborative platform
57. 1) Identify ethical and
societal issues
2) Identify existing
technologies
3) Assess existing
technologies
4) Conduct a gap analysis
5) Identify design
requirements
6) Assess solutions under
development
7) Identify implementation
barriers
8) Make recommendations
Results
What issues may occur in the context of data-driven applications?
• Self-determination
• Welfare
• Privacy
• Lawfulness
• Fairness
• Accountability
• Trustworthiness
• Independence
Resources: D2.2, white paper
58.
Results
How can they be addressed using technology?
• Anonymisation
• Encryption
• Accountability
• Deletion
• Policy enforcement
• MPC
• Sanitisation
• Transparency
• Access control
• User control
• Access & portability
• Data provenance
Resources: D3.1, white paper
59.
Results
Does current technology meet the needs? Specific and general assessment:
• Comprehensive set
• Combination needed
• Different aims
• Multidimensional measure needed
• Limited integration
• Regional differences
• Combination with non-technical measures needed
• Unclear responsibilities
• Tension between objectives
• Low demand
Resources: D3.2, white paper, WISP publication
60.
Results
Which aspects of data-driven solutions still need to be improved? Ethical/legal and societal/economic:
• Privacy-by-design
• Sensitive data
• Inferred data
• Liability and responsibility
• Costs and benefits
• Business models
• Public attention
• Economic value
• Cultural fit
• Skill level
Resources: D4.1, white paper
61.
Results
What should be considered when designing new data-driven solutions?
• Embed security and privacy features
• Connect people, processes and technology
• Take preventive measures
• Comply with laws and corporate policies
Resources: D4.2
62.
▪ Strictest data protection rules apply
▪ Diverse range of technologies used
▪ Business models increasingly rely on sensitive data
▪ Established good practices are widely adopted
▪ Cooperation of different stakeholders needed
▪ Ad networks still show limited willingness to act
Results
Are new data-driven solutions being
developed and used responsibly?
Healthcare Transportation Web browsing
Resources: D5.1
63.
Results
How can data-driven solutions be developed and used in a responsible way?
Challenges:
• Differences in attitudes and contexts
• Empowerment vs. cognitive overload
• Issues related to legal compliance and ethics
• Difficulties of conducting assessments
Opportunities:
• Awareness raising and transparency
• Tools of accountability
• Reference points of accountability
• Bodies and mechanisms of oversight
Resources: D5.3, collaborative platform
64.
Results
Resources: D5.2
Developers and operators
of data-driven solutions
Policy makers dealing
with relevant issues
Developers of privacy-
preserving technologies
Civil society
(organisations)
What should be done to make
responsible data-driven solutions a reality?
66. Enhancing Transparency in
the Big Data and AI
Landscape
Sabrina Kirrane, Vienna University of Economics and Business
Beyond Privacy: Learning Data Ethics
13th of November 2019
68.
• Detailed in D2.1 Policy Language V1 & D2.5 Policy Language V2
• Available for download via the SPECIAL website: https://www.specialprivacy.eu/publications/public-deliverables
• An unofficial draft specification has been published online: https://www.specialprivacy.eu/platform/ontologies-and-vocabularies
The SPECIAL Usage policy language
Syntax and expressivity
Fast Compliance Checking in an OWL2 Fragment. Piero A. Bonatti. Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI 2018)
69. The SPECIAL Policy Log Vocabulary
Syntax and expressivity
• Detailed in D2.3 Transparency Framework V1, delivered in M14
• Available for download via the SPECIAL website: https://www.specialprivacy.eu/langs/splog
• An unofficial draft specification has been published online: https://www.specialprivacy.eu/platform/ontologies-and-vocabularies
A Scalable Consent, Transparency and Compliance Architecture, Sabrina Kirrane, Javier D. Fernández, Wouter Dullaert, Uros Milosevic, Axel Polleres, Piero Bonatti, Rigo
Wenning, Olha Drozd and Philip Raschke , Proceedings of the Posters and Demos Track of the Extended Semantic Web Conference (ESWC 2018)
70. SPECIAL ODRL Regulatory Compliance Profile
Syntax and expressivity
• Preliminary analysis detailed in D2.2 Formal Representation of the Legislation V1 & D2.6 Formal Representation of the Legislation V2
• Available for download via the SPECIAL website: https://www.specialprivacy.eu/publications/public-deliverables
• An unofficial draft specification has been published online: https://www.specialprivacy.eu/platform/ontologies-and-vocabularies
ODRL policy modelling and compliance checking, Marina De Vos, Sabrina Kirrane, Julian Padget and Ken Satoh, Proceedings of the 3rd International Joint Conference
on Rules and Reasoning (RuleML+RR 2019)
71. Transparency and compliance checking
Subsumption Algorithm
• The development of a compliance checking
algorithm for the SPECIAL policy language
devised in T2.1
• A company’s policy can be checked for
compliance with data subjects’ consent and with
part of the GDPR by means of subsumption
queries
• We provide a complete and tractable structural
subsumption algorithm for compliance checking
• Detailed in D2.4 & D2.8 Transparency and
Compliance Algorithms
Piero A. Bonatti. Fast Compliance Checking in an OWL2 Fragment. Proceedings of the 27th International Joint Conference on Artificial Intelligence (IJCAI 2018)
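The intuition of compliance-as-subsumption can be illustrated with a deliberately simplified model (this is not the SPECIAL OWL2 algorithm, and the dimension and term names below are invented for the sketch): represent each policy as a set of permitted terms per dimension, and treat a processing request as compliant when every dimension is contained in the consented set.

```python
# Toy illustration of compliance checking by subsumption (NOT the
# SPECIAL OWL2 algorithm): a policy maps each dimension (data,
# processing, purpose, storage, recipient) to a set of allowed terms.
# A request complies with a consent policy iff it is subsumed by it,
# i.e. every dimension of the request is a subset of the consent.

DIMENSIONS = ("data", "processing", "purpose", "storage", "recipient")

def subsumed_by(request: dict, consent: dict) -> bool:
    """True iff `request` is subsumed by `consent` on every dimension."""
    return all(set(request.get(d, set())) <= set(consent.get(d, set()))
               for d in DIMENSIONS)

consent = {
    "data": {"heart_rate", "location"},
    "processing": {"aggregate", "analyse"},
    "purpose": {"health_research"},
    "storage": {"eu"},
    "recipient": {"controller"},
}

ok_request = {"data": {"heart_rate"}, "processing": {"aggregate"},
              "purpose": {"health_research"}, "storage": {"eu"},
              "recipient": {"controller"}}
bad_request = dict(ok_request, purpose={"marketing"})

print(subsumed_by(ok_request, consent))   # True
print(subsumed_by(bad_request, consent))  # False
```

The real algorithm operates over OWL2 class expressions with hierarchies and intervals rather than flat sets, which is what makes its tractability result non-trivial.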
72. Transparency and compliance checking
Stream processing platform
• Data processing and sharing event logs are stored
in the Kafka distributed streaming platform,
which in turn relies on Zookeeper for
configuration, naming, synchronization, and
providing group services.
• We assume that consent updates are infrequent
and as such usage policies and the respective
vocabularies are represented in a Virtuoso triple
store.
• The compliance checker, which includes an embedded HermiT reasoner, uses the consent saved in Virtuoso together with the application logs provided by Kafka to check that data processing and sharing complies with the relevant usage control policies.
• As logs can be serialized using JSON-LD, it is
possible to benefit from the faceting browsing
capabilities of Elasticsearch and the out of the
box visualization capabilities provided by Kibana.
A Scalable Consent, Transparency and Compliance Architecture, Sabrina Kirrane, Javier D. Fernández, Wouter Dullaert, Uros Milosevic, Axel Polleres, Piero Bonatti, Rigo
Wenning, Olha Drozd and Philip Raschke , Proceedings of the Posters and Demos Track of the Extended Semantic Web Conference (ESWC 2018)
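The control flow of this architecture can be sketched in a few lines, with the Kafka event stream, the Virtuoso consent store and the HermiT reasoner all stood in for by plain Python structures (names and record shapes below are illustrative, not the project's API):

```python
# In-memory sketch of the log-compliance pipeline: events arrive on a
# stream, consent is looked up per data subject, and each event is
# flagged compliant or not. Real deployments replace the list with a
# Kafka consumer, the dict with a triple store, and the membership test
# with OWL reasoning.

consent_store = {  # subject id -> purposes the subject consented to
    "subject-1": {"health_research"},
    "subject-2": {"health_research", "marketing"},
}

event_log = [  # processing/sharing events as they would arrive
    {"subject": "subject-1", "purpose": "health_research"},
    {"subject": "subject-1", "purpose": "marketing"},
    {"subject": "subject-2", "purpose": "marketing"},
]

def check_stream(events, consents):
    """Yield (event, compliant) pairs; an event is compliant when its
    purpose is covered by the subject's stored consent."""
    for ev in events:
        allowed = consents.get(ev["subject"], set())
        yield ev, ev["purpose"] in allowed

violations = [ev for ev, ok in check_stream(event_log, consent_store) if not ok]
print(violations)  # the subject-1 marketing event
```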
74. • Standardisation of vocabularies (data, processing, purpose, storage,
sharing) is difficult
• There are cognitive limitations in terms of understanding consent and
transparency
• GDPR Compliance is only the tip of the iceberg, from a usage control
perspective we also need to consider other regulations, licenses,
social norms, cultural differences
• We need to embrace distributed and decentralised systems, which
complicates things further
• Ensuring such systems are well behaved is crucial to success (i.e., all
usage constraints are adhered to and the system as a whole works as
expected)
Open Challenges & Opportunities
76. Contact Details
Technical/Scientific contact
Sabrina Kirrane
Vienna University of Economics and Business
sabrina.kirrane@wu.ac.at
The project SPECIAL (Scalable Policy-awarE linked data arChitecture for prIvacy, trAnsparency
and compLiance) has received funding from the European Union’s Horizon 2020 research and
innovation programme under grant agreement No 731601 as part of the ICT-18-2016 topic
Big data PPP: privacy-preserving big data technologies.
77. The project SODA has received funding from the European Union's
Horizon 2020 research
and innovation programme under grant agreement No 731583.
Paul Koster (Philips Research)
Progressing Practical Privacy-
Preserving Big Data Analytics
November 14, 2019, Brussels
78.
Opportunity & problem: joint data analytics
Unlock value of joint data analytics by addressing the privacy – utility trade-off
SODA
80.
MPC – Secure Multi-Party Computation
jointly compute a function while keeping the (input) data private
animation source: Claudio Orlandi, Aarhus University
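The simplest instance of this idea is additive secret sharing: each party splits its input into random shares, no proper subset of shares reveals anything, yet summing all shares reconstructs the sum of all inputs. A minimal sketch (real MPC frameworks add networking, authentication and malicious-security machinery on top):

```python
import random

# Minimal additive secret sharing over Z_p: each party splits its input
# into random shares; any proper subset of shares is uniformly random,
# yet the sum of all shares reconstructs the sum of all inputs.

P = 2**61 - 1  # a large prime modulus

def share(secret: int, n_parties: int) -> list[int]:
    """Split `secret` into n additive shares modulo P."""
    shares = [random.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def secure_sum(inputs: list[int]) -> int:
    """Each party shares its input; every party locally adds the shares
    it holds, and only these partial sums are combined, so no individual
    input is ever exposed."""
    n = len(inputs)
    all_shares = [share(x, n) for x in inputs]
    # party j holds one share of every input and adds them locally
    partial = [sum(all_shares[i][j] for i in range(n)) % P for j in range(n)]
    return sum(partial) % P

print(secure_sum([120, 80, 95]))  # 295
```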
81.
SODA
Enable practical privacy-preserving analytics on big data with MPC
• Advance technology & readiness level
• Provide insights into end-user barriers
and opportunities of MPC
• Position MPC in data protection and
privacy laws (GDPR)
• Enable MPC-based machine learning
• Demonstrate feasibility of MPC in healthcare
82.
MPC in the Data Science Methodology
Focus MPC for now on modelling, inferences, some data preparation, etc
Pragmatically deal with (leaky) data analysis for exploration
83.
MPC enabled machine learning and data analytics
• logistic regression
• neural networks
• CNN, MLP, federated
• ridge regression - 11M records!
• ID3 decision tree
• random forest / regression trees
• Burrows-Wheeler Transform
• inexact DNA string search
• logrank test
84.
Predictive analytics – logistic regression
Train a logistic regression model for chronic heart failure survival risk via Multi-Party Computation: 12 attributes, 3000+ patients, yielding a trained model without pooling the raw data.
Datasets: mtcars (3 attr / 32 rows), heart failure (11 attr / 2476 rows), breast cancer (9 attr / 588 rows)
https://github.com/philips-software/fresco-logistic-regression-2
85.
Descriptive analytics - Kaplan-Meier
Enable medical researchers to (privacy preserving) gain insight from data
Kaplan-Meier Survival Analysis – compare two classes, e.g. treatments
• Logrank test (chi2, p-value)
• KM curve
Legend: data of individual parties (remains private) → combined data (never disclosed) → aggregated data (privacy-preserving)
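The statistic itself is simple enough to sketch in the clear; the project's contribution is computing it under MPC so the per-party data are never combined. A minimal, illustrative Kaplan-Meier estimator (names and the toy data are made up for the sketch):

```python
# Plain (non-private) Kaplan-Meier estimator, to make explicit what the
# MPC version computes: at each event time t_i, survival is multiplied
# by (1 - d_i / n_i), where d_i is the number of events at t_i and n_i
# the number of subjects still at risk.

def kaplan_meier(times, events):
    """times: observation times; events: 1 = event, 0 = censored.
    Returns a list of (time, survival probability) at each event time."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    survival, curve = 1.0, []
    i = 0
    while i < len(order):
        t = times[order[i]]
        deaths = leaving = 0
        while i < len(order) and times[order[i]] == t:
            deaths += events[order[i]]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# 5 subjects: events at t=1 and t=3, censoring at t=2, t=4, t=5
print(kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 0, 0]))
```

The logrank test compares two such curves (e.g. two treatments) by contrasting observed and expected event counts at each event time.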
86.
Yes, MPC is practical for big data analytics in healthcare, but…
Selected use cases are feasible today
➔ use for high value with no alternative
Broad adoption requires
• ML library like R or Scikit Learn
• Mature frameworks
• Easier to use / program
• (More performance)
88. Diversity and Privacy: Opportunities and Challenges
The project WeNet – The Internet of Us
Author: Laura Schelenz, International Center for Ethics in the Sciences and Humanities, Tübingen, Germany
96. THANK YOU!
WeNet project is funded by the EU’s Horizon2020
programme under Grant Agreement number 823783.
Email
info@internetofus.eu
Website
www.internetofus.eu
Twitter
@WeNetProject
GET IN TOUCH
97. A GDPR-compliant blockchain-based system with advanced
privacy-preserving solutions
Edwin Morley-Fletcher, Lynkeus
European Big Data Community Forum, 2019
98.
Big Data + Artificial Intelligence + Blockchain
= Game-Changer
Blockchain:
▪ Private permissioned blockchain based on Hyperledger Fabric
▪ Controlled access based on blockchain storage of permitted
transactions
▪ Off-chain storage of health data by multiple hospital repositories and
by individuals
▪ Metadata Catalogue allowing users to safely inspect which health data are available on MHMD
▪ Dynamically and automatically managing consent by Smart Contracts
▪ An overall Privacy-by-Design and GDPR Compliance Assessment
completed by October 2019.
99.
Artificial Intelligence (1)
“Visiting mode”: bringing the algorithms to the data
Secure computation, which permits running AI without disclosing either the data or the algorithms, is performed through three tools:
▪ Homomorphic Encryption
Developed by TUB (with an obfuscation layer and the MORE encryption
scheme) and awarded the Innovation Radar Prize 2019 in the category
Industrial & Enabling Tech, with this statement:
“This solution implements a software framework for developing
personalized medicine solutions based on homomorphically encrypted
data and artificial intelligence (AI). The framework ensures that the data
remains private, and the performance of the AI models is not affected by
the encryption”.
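The homomorphic property (computing on data while it stays encrypted) can be illustrated with a toy Paillier-style cryptosystem, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. This is a generic illustration with deliberately tiny, insecure parameters, not the TUB/MORE scheme used in the project:

```python
import math, random

# Toy Paillier cryptosystem (additively homomorphic). The primes are
# tiny and the scheme as written is insecure; it only demonstrates that
# multiplying ciphertexts adds the underlying plaintexts.

p, q = 17, 19
n = p * q                      # public modulus
n2 = n * n
g = n + 1                      # standard choice of generator
lam = math.lcm(p - 1, q - 1)   # private key

def L(x: int) -> int:
    return (x - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)  # precomputed decryption factor

def encrypt(m: int) -> int:
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:   # r must be coprime to n
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c: int) -> int:
    return (L(pow(c, lam, n2)) * mu) % n

c1, c2 = encrypt(20), encrypt(22)
# Adding under encryption: multiply the ciphertexts.
print(decrypt((c1 * c2) % n2))  # 42
```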
100.
Artificial Intelligence (2)
▪ Secure Multiparty Computation
▪ Developed by Athena RC. SMPC allows a set of distrustful parties to
perform the computation in a distributed manner, while each of them
individually remains oblivious to the input data and the intermediate
results.
▪ Federated Deep Learning with an untrusted Black Box
▪ Jointly developed by Siemens Healthineers and Athena RC, using SMPC
and Differential Privacy.
▪ A secure Machine Learning request containing a model training pipeline is
distributed to the data providers along with a set of parameters, and is run
locally on an isolated environment.
▪ Local computation results are then securely aggregated using the MHMD
SMPC. This cycle is repeated to obtain many training iterations and/or
model validation.
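One building block of such secure aggregation can be sketched with pairwise masks that cancel in the server's sum, so no single masked update reveals a client's local model. This is an illustrative simplification; the MHMD protocol combines SMPC with differential privacy and is considerably more involved:

```python
import random

# Sketch of secure aggregation for federated learning: each pair of
# clients agrees on a random mask; one adds it, the other subtracts it,
# so all masks cancel in the server's sum while each individual masked
# update looks random. Real protocols also handle dropouts, work over
# finite fields, and add differential-privacy noise.

def masked_updates(updates: list[float]) -> list[float]:
    n = len(updates)
    masked = list(updates)
    for i in range(n):
        for j in range(i + 1, n):
            m = random.uniform(-1e6, 1e6)  # pairwise shared mask
            masked[i] += m
            masked[j] -= m
    return masked

local_updates = [0.5, -0.2, 0.9]           # one model parameter per client
aggregate = sum(masked_updates(local_updates)) / len(local_updates)
print(round(aggregate, 6))                  # federated average of the updates
```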
101.
Big Data (1)
▪ Health data remain siloed
▪ Big Data and AI are difficult to apply in medicine, especially in rare diseases (30 million people affected in Europe), where data-driven solutions are most needed.
▪ Effective data sharing is still the exception in healthcare.
▪ MHMD has investigated what contribution can come from sharing synthetic data
▪ Synthetic data are fully artificial data, automatically generated by making use of
machine learning algorithms, based on recursive conditional parameter
aggregation, operating within global statistical models.
▪ They typify the case of “personal data [which are] rendered anonymous in such a
manner that the data subject is not or no longer identifiable” (Recital 26 GDPR).
102.
Big Data (2)
▪ Generating differentially-private synthetic data
▪ Differential privacy provides a previously missing mathematical foundation for the definition of privacy:
▪ “Differentially Private Synthetic Data Generation is a mathematical theory,
and set of computational techniques, that provide a method of de-
identifying data sets—under the restriction of a quantifiable level of privacy
loss. It is a rapidly growing field in computer science”
(National Institute of Standards and Technology Differential Privacy Synthetic
Data Challenge 2019: Propose an algorithm to develop differentially private
synthetic datasets to enable the protection of personally identifiable
information while maintaining a dataset's utility for analysis)
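The primitive underlying all of this is the Laplace mechanism: a count query has sensitivity 1 (one person changes the count by at most 1), so adding Laplace(1/ε) noise makes the released count ε-differentially private. A minimal sketch with made-up data (DP synthetic-data generators build far more machinery on top of this):

```python
import random

# Laplace mechanism, the basic primitive of differential privacy.
# Illustrative only: real generators compose many such noisy queries
# under a privacy budget to fit a full synthetic-data model.

def laplace_noise(scale: float) -> float:
    # The difference of two unit exponentials is Laplace(0, 1);
    # scaling gives Laplace(0, scale).
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def dp_count(records, predicate, epsilon: float) -> float:
    """Release an epsilon-DP count of records satisfying `predicate`."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

patients = [{"age": a} for a in (34, 51, 67, 72, 45, 80)]
noisy = dp_count(patients, lambda r: r["age"] >= 65, epsilon=1.0)
print(noisy)  # true count 3, perturbed by Laplace noise
```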
104. Afternoon Session Agenda
Lessons learned from research and technology for a human-centered Big Data
14.00 Afternoon session introduction – Rigo Wenning, SPECIAL & Mosaicrown
14.10 Break-out sessions
• Technology and Data Protection Law – how can software engineering support legal compliance?
• Human-centric Big Data governance: responsible ways to innovate privacy-preserving technologies
16.00 Wrap-up – Rigo Wenning, SPECIAL & Mosaicrown
16.30 Closing remarks – Richard Stevens, e-SIDES, IDC
105. In the break-out session you will have the possibility to answer questions provided
by the speakers and ask your own questions
HOW?
▪ Grab your phone
▪ Visit the URL provided by speaker (no need to register)
▪ Insert the code provided by the speaker
▪ Cast your vote & ask questions
We want your input!
106. Anonymisation of personal data leads to
inapplicability of the GDPR – Myth or Reality?
dr. jur. Anna Zsófia Horváth LL.M.
European Big Data Community Forum, 2019
Research Assistant
SODA Project – University of Göttingen
107.
Binary concept of data under the current regulatory regime
• Personal Data – GDPR: protection of privacy and respect of the right to informational self-determination; facilitate the free flow of personal data in the EU as part of the Digital Single Market Strategy
• Non-Personal Data – Reg. 2018/1807 on a framework for the free flow of non-personal data: facilitate the free flow of information as part of the Digital Single Market Strategy
108.
Anonymisation through the data lifecycle
Data life span: acquisition → analysis → application
109. • GDPR does not define anonymisation / anonymous data
• Personal Data – Art. 4 Nr. 1
• any information relating to an identified or identifiable natural person
• data without personal reference falls out of the GDPR‘s scope
• Question of identifiability
• Absolute concept of identifiability
• No actual anonymity unless completely irreversible
• Relative concept of identifiability
• context-sensitive
• access to additional knowledge is necessary
Legal concept of anonymity I.
110. ▪ Recital 26
▪ “To determine whether a natural person is identifiable, account should be taken of all the
means reasonably likely to be used, such as singling out, either by the controller or by
another person to identify the natural person directly or indirectly”
▪ costs
▪ time
▪ circumstances of any given processing
▪ Indirect identification, e.g. by "singling out"
▪ Objective discretionary question
▪ threshold of re-identification risk
▪ no one-size fits all
Legal concept of anonymity II.
111. ➢ Dual concept of anonymisation
1. Anonymisation as “processing of the data”
▪ falls under the GDPR
▪ all the obligations relating to processing of personal data apply
– principles and lawfulness of processing
– obligations of controller and processor
– data security provisions
2. Anonymity as "state of the data"
▪ falls outside the scope of the GDPR
▪ with the reservation that no means reasonably likely to be used are available
Solution Approach – conceptual level
112.
Solution Approach – practical level
relative anonymity, removal of personal reference
Context-specific risk assessment
Application of appropriate methods of anonymisation and
technical and organisational measures
Regular review, continuous evaluation, comprehensible documentation
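One concrete way to operationalise such a context-specific risk assessment is to measure how small the groups sharing the same quasi-identifiers are (k-anonymity). A sketch with invented records; k-anonymity is one heuristic among several, not a legal test:

```python
from collections import Counter

# Re-identification risk check: records sharing the same combination of
# quasi-identifiers form an equivalence class; the smallest class size
# is the dataset's k. Small k (especially k = 1, a "singled-out"
# individual) signals high re-identification risk.

def k_anonymity(records, quasi_identifiers):
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values())

records = [
    {"zip": "301*", "age_band": "60-70", "diagnosis": "A"},
    {"zip": "301*", "age_band": "60-70", "diagnosis": "B"},
    {"zip": "302*", "age_band": "20-30", "diagnosis": "C"},
]

print(k_anonymity(records, ("zip", "age_band")))  # 1 -> one person singled out
```

Regular re-runs of such checks as data and context change are exactly the kind of continuous evaluation and documentation the slide calls for.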
113. Questions to the audience
1. Do you agree with the statement that Big Data and Privacy are not mutually exclusive?
2. Do you think that a holistic approach allowing a "grey zone" between personal and non-personal data would be practical?
3. Do you think that a "total and complete anonymisation" is still possible to achieve?
4. Do you think data subjects should be informed about what their anonymised – once personal – data is going to be used for?
114.
Milestones and results of SODA
▪ Milestones
▪ Deliverable on general legal aspects of
privacy-preserving data analytics
▪ Deliverable on specifically chosen use cases
▪ Interdisciplinary consultations and consultations with DPAs
▪ Events
▪ Main findings
▪ Duality of anonymisation
▪ Legally compliant data processing can be
achieved through the structured
implementation of technical and
organisational measures.
▪ Big Data and Privacy are not mutually
exclusive.
ANONYMISATION OF PERSONAL DATA LEADS TO THE INAPPLICABILITY
OF THE GDPR – MYTH OR REALITY?
SODA timeline (2017–2020):
• Interdisciplinary consultations
• Deliverable on general legal aspects
• Presenting the SODA pilot cases at Medical Informatics Europe
• GDPR Commentary
• Deliverable on legal evaluation of pilot cases
• Interdisciplinary consultations and dissemination event
115.
dr. jur. Anna Zsófia Horváth LL.M.
Research Assistant
University of Göttingen
Dipl. Jur. Lukas Dalby
Research Assistant
University of Göttingen
Paul Koster
Project Manager
Philips
Thank you for your attention!
www.soda-project.eu/contact
116. Data Privacy Vocabularies to fulfil GDPR
Transparency & Compliance Checking requirements
European Big Data Community Forum, 2019
Author: Eva Schlehahn, Unabhängiges Landeszentrum für
Datenschutz Schleswig-Holstein, Germany
117. Necessary precondition to enable:
Valid consent (Art 4 (11) GDPR),
Data subject’s rights (e. g. access,
rectification…),
Enforcement of data handling
policies
Demonstration of compliance
Scope: Data, systems, processes
Necessity of transparency from European data protection law
perspective
119. GDPR:
Art. 12 (1) GDPR:
The controller may provide information
by electronic means.
Art. 21 (5) GDPR:
When using information society services,
the data subject may exercise the right to
object by automated means using
technical specifications.
Recital 32 GDPR:
Possibility of using electronic means and
technical settings for information society
services for giving consent.
Relevance of diverse case law, DPA
decisions & upcoming ePrivacy Reg.
Planet 49 CJEU judgment
Current cookie banners + tracking via opt-
out NOT ok => consent needed
Berlin DPA fine against Deutsche
Wohnen (14,5m €, Oct 30th 2019)
GDPR infringement because the IT system did not
provide a deletion concept & erasure function
for data
Current ePrivacy Regulation draft:
Requirements in flux, software settings for
giving consent are now mentioned in Recital
20a -> might still change
Legal foundation of a technical approach for consent management
and policy enforcement
120. https://www.w3.org/community/dpvcg/
Currently 58 participants:
Stakeholders from industry, research,
government...
Goal: Development of a taxonomy of privacy
terms, esp. with regard to GDPR. Examples
are taxonomies of:
personal data categories,
different data processing purposes,
events of disclosures,
consent status/modalities
types of processing operations.
Community building and standardisation effort:W3C Data Privacy
Vocabularies and Controls Community Group (DPVCG)
122.
Data protection focus for technical specifications I: Policies
entailing the necessary information
123. Categories of personal data
E. g. master record data, location and movement data, call records, communication metadata, log
file data.
E. g. special categories of personal data according to Art. 9 GDPR:
racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership,
genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning
health, data concerning a natural person's sex life or sexual orientation
Support documentation of
processing purpose(s) + legal ground
consent (possibly incl. versioning) and current status, e. g.
given – if yes, specify whether explicit or implicit
pending / withheld
withdrawn
referring to the personal data of a minor
etc...
Data protection focus for technical specifications II
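A minimal sketch of how these consent attributes could be captured in a machine-readable record; the class and field names below are illustrative, not drawn from any standard vocabulary:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum

class ConsentStatus(Enum):
    GIVEN_EXPLICIT = "given-explicit"
    GIVEN_IMPLICIT = "given-implicit"
    PENDING = "pending"
    WITHHELD = "withheld"
    WITHDRAWN = "withdrawn"

@dataclass
class ConsentRecord:
    """Documents purpose, legal ground and consent status for one processing activity."""
    purpose: str                       # e.g. "medical-research"
    legal_ground: str                  # e.g. "Art. 6(1)(a) GDPR - consent"
    status: ConsentStatus
    consent_version: int = 1           # supports versioning of the consent text
    data_subject_is_minor: bool = False
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def processing_allowed(self) -> bool:
        # Only given consent (explicit or implicit) permits processing
        return self.status in (ConsentStatus.GIVEN_EXPLICIT,
                               ConsentStatus.GIVEN_IMPLICIT)

record = ConsentRecord("medical-research", "Art. 6(1)(a) GDPR",
                       ConsentStatus.WITHDRAWN)
```

A record like this makes the current consent state directly enforceable by the processing system rather than buried in documentation.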
124. Support documentation of
Involved controller(s)
Involved processor(s)
Storage location and cross-border data transfers, involved countries
Location of data centre where processing & storage occurs
Location of controller establishment
Relevant could be:
– Data transfer within the European Union
– Data transfer to a third country with basis for compliance acc. to Art. 44 et seq. GDPR (treating them as ‘EU-like’, i. e. adequacy
decision, appropriate safeguards, binding corporate rules), where possible with link documenting the latter, e. g. to the
Commission’s adequacy decision or the BCR
– Other third country
Suggestion: Use country codes (e.g. TLD, ISO 3166) – allows for later adaptation in case of legal changes
Suggestion: Incorporate also rules that exclude data transfers to some jurisdictions (‘notUS’, ‘notUK’)
Data protection focus for technical specifications III
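The transfer categories above could be encoded roughly as follows. The `classify_transfer` helper and the country sets are hypothetical illustrations using ISO 3166 alpha-2 codes, not a legal assessment:

```python
# Illustrative transfer-rule check using ISO 3166 alpha-2 country codes.
# The country sets are incomplete example subsets only.
EU_EEA = {"DE", "FR", "NL", "IT", "ES", "PL", "IS", "NO", "LI"}
ADEQUACY = {"CH", "JP", "CA", "NZ"}   # third countries treated as 'EU-like'

def classify_transfer(destination, excluded=frozenset()):
    """Classify a data transfer; 'excluded' encodes rules like 'notUS'."""
    if destination in excluded:
        return "blocked-by-policy"      # jurisdiction explicitly ruled out
    if destination in EU_EEA:
        return "intra-EU/EEA"
    if destination in ADEQUACY:
        return "adequacy-decision"      # Art. 45 GDPR
    return "requires-safeguards"        # Art. 46 et seq. GDPR (e.g. BCR)

print(classify_transfer("US", excluded={"US"}))   # blocked-by-policy
print(classify_transfer("JP"))                    # adequacy-decision
```

Using standard country codes keeps the rule set adaptable when adequacy decisions change.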
125. Enforce rules on how to handle the data, e. g.
User/access activity allowed, like read-only, write, rectify, disclose, deletion
Anonymize / pseudonymize / encrypt
Notify [define notification rules, e. g. towards the data subject, possibly with a
predefined action time]
Time for deletion – ideas could be:
delete-by_ or delete-x-date_month_after <event>
no-retention (no storage beyond using once)
stated purpose (until purpose has been fulfilled)
legal-requirement (storage period defined by a law requiring it)
business practices (requires a deletion concept of controller)
Indefinitely ( e. g. for really anonymized data, public archives...)
Data protection focus for technical specifications IV
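The deletion-time ideas above can be sketched as a small rule evaluator; the rule names and the `deletion_due` helper are illustrative only, not a standard:

```python
from datetime import date, timedelta
from typing import Optional

def deletion_due(rule: str, today: date,
                 delete_by: Optional[date] = None,
                 purpose_fulfilled: bool = False) -> bool:
    """Evaluate whether data must be deleted under the given retention rule."""
    if rule == "no-retention":      # no storage beyond using once
        return True
    if rule == "delete-by":         # fixed deletion date reached?
        return delete_by is not None and today >= delete_by
    if rule == "stated-purpose":    # keep only until the purpose is fulfilled
        return purpose_fulfilled
    if rule == "indefinite":        # e.g. truly anonymised data, public archives
        return False
    raise ValueError(f"unknown retention rule: {rule}")

today = date(2019, 11, 14)
print(deletion_due("delete-by", today, delete_by=today - timedelta(days=30)))  # True
print(deletion_due("stated-purpose", today, purpose_fulfilled=False))          # False
```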
126. Project website: https://www.specialprivacy.eu/
The project SPECIAL (Scalable Policy-awarE linked data arChitecture for prIvacy,
trAnsparency and compLiance) has received funding from the European Union’s
Horizon 2020 research and innovation programme under grant agreement No
731601 as part of the ICT-18-2016 topic Big data PPP: privacy-preserving big data
technologies.
More info and funding notice
127. Thank you / contact details
Author of this presentation: Eva Schlehahn
Unabhängiges Landeszentrum für Datenschutz Schleswig-Holstein
(ULD, Independent Centre for Privacy Protection Schleswig-
Holstein)
Email: uld67@datenschutzzentrum.de
Twitter: @eschlehahn
SPECIAL project technical/scientific contact: Sabrina Kirrane
Vienna University of Economics and Business
Email: sabrina.kirrane@wu.ac.at
SPECIAL project administrative contact: Jessica Michel
ERCIM / W3C
Email: jessica.michel@ercim.eu
SPECIAL project website: https://www.specialprivacy.eu/
128. Issues discussed
• For showing GDPR compliance, what’s the most important IT system feature
needed?
• Who would benefit the most from a data privacy
vocabulary/ontology/taxonomy?
• What should such a data privacy vocabulary, i.e. taxonomy, cover?
How SPECIAL addressed these issues + how YOU can use these results:
Deliverables, prototypes, ontologies & vocabularies, code repository, platform
demonstrators, UI demos ALL Open Access: https://www.specialprivacy.eu/
Everyone can engage in the W3C DPVCG: https://www.w3.org/community/dpvcg/
Data Privacy Vocabularies to fulfil GDPR Transparency & Compliance Checking requirements
RECAP & WRAP UP
129. Why have we preferred to opt for sharing synthetic data and for
computation “bringing the algorithms to the data”
Edwin Morley-Fletcher, Lynkeus
130.
The “visiting mode”
▪ I already mentioned this morning the three tools
developed by MyHealthMyData for providing secure
computation in ways that permit running AI without
disclosing either data or algorithms:
▪ Homomorphic Encryption
▪ Secure Multiparty Computation
▪ Federated Deep Learning with an untrusted Black Box
▪ I will not revisit this, but will focus on how to
guarantee a secure “publishing mode”.
131.
The inconvenient truth
As already stated this morning:
1. Health data remain silos-based
2. Big Data and AI are difficult to apply in
medicine, especially in rare diseases (30
million people affected in Europe), where
data driven solutions are most needed.
3. Effective data sharing is still the exception in
healthcare
132.
How easy and risky is it to share health data?
▪ Where consent applies, MHMD data is made available for download.
▪ What happens after data download is not under control of the MHMD
blockchain.
▪ Of course, the risk of data breaches increases with the number of copies
shared
▪ According to various circumstances of trust, and privacy-preserving needs,
MHMD health data can be published either as pseudonymous or
anonymous data.
▪ A semi-automated tool, AMNESIA, is used in MHMD for providing the
necessary pseudonymisation or anonymisation.
133.
A new anonymization paradigm
▪ Synthetic data were first conceptualized in 1993 as a way to
replicate the statistical properties of a database without
exposing the identifiable information it contained.
▪ Methods to produce them vary, but the underlying principle
is that values in the original database are algorithmically
substituted with those taken from statistically equivalent
distributions, to create entirely new records.
▪ In medicine they have been successfully used to publish
sensitive data to the general public, to train machine learning
tools and to conduct clinical research.
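As a deliberately naive illustration of the substitution principle, assuming purely numeric columns modelled as independent Gaussians (real generators also aim to preserve cross-column correlations):

```python
import random
import statistics

def synthesize(records, n, seed=0):
    """Replace each numeric column with draws from a distribution fitted to it.

    A naive sketch: independent per-column Gaussians preserve each column's
    mean and variance but not correlations between columns.
    """
    rng = random.Random(seed)
    fitted = {col: (statistics.mean(r[col] for r in records),
                    statistics.stdev(r[col] for r in records))
              for col in records[0]}
    # Entirely new records, drawn from statistically equivalent distributions
    return [{col: rng.gauss(mu, sigma) for col, (mu, sigma) in fitted.items()}
            for _ in range(n)]

original = [{"age": 34, "sbp": 120}, {"age": 51, "sbp": 135},
            {"age": 47, "sbp": 128}, {"age": 40, "sbp": 124}]
synthetic = synthesize(original, n=1000)   # 1000 records, none of them real
```

No synthetic record corresponds to an original individual, yet column-level statistics are preserved.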
134.
Breaking the link between private information
and the data’s information content
▪ Synthetic data are artificially generated data sets which have
the ability to jump-start AI development in areas where data
are scarce or too expensive to obtain, such as the biomedical
sector.
▪ As artificial replicas of original data sets, synthetic data have
shown the ability to replicate all the statistical features of
original ones and to support research and development
activities in a variety of applications in a compliant fashion.
135.
Synthetic Data are a “Columbus Egg” in the
GDPR environment
They are a crucial tool in healthcare.
▪ They retain significant information usefulness.
▪ They do not allow any personal re-identification
of original individual datasets.
▪ They do not fall within the scope of the GDPR:
▪ They are freely tradeable.
136.
ARTICLE 29 DATA PROTECTION WORKING PARTY
Opinion 05/2014 on Anonymisation Techniques
Is synthetic data processing subject to GDPR rules?
▪ “If the data controller wishes to retain … personal data once the
purposes of the original or further processing have been achieved,
anonymisation techniques should be used so as to irreversibly
prevent identification”
▪ “Accordingly, the Working Party considers that anonymisation as an
instance of further processing of personal data can be considered
to be compatible with the original purposes of the processing but
only on condition the anonymisation process is such as to reliably
produce anonymised information”
137.
Generative Adversarial Networks
▪ Synthetic data can be generated by a range of systems including naive Bayes models,
generative adversarial networks (GAN and infoGAN) or statistical shape analysis (for
imaging data).
▪ The selection process starts from user/customer requirements and specifies the
required data reliability upfront.
▪ The selected model is then configured to generate intended data types.
▪ After the generation, a discriminator assesses original vs. synthetic set similarity,
indicating if the desired reliability score was met.
▪ An interpretation tool makes it possible to pinpoint individual sources of discrepancies
between original and synthetic data, and to iteratively improve the generator's parametrization.
▪ This direct feedback loop design has been shown to drastically improve the efficiency
of, and control over, the generation process.
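The configure-generate-assess-improve loop above can be sketched schematically. Here a crude comparison of means and standard deviations stands in for a real discriminator, and the "feedback" step is a toy parameter nudge rather than GAN training:

```python
import random
import statistics

def similarity(original, synthetic):
    """Crude discriminator stand-in: compares means and standard deviations;
    returns a score in (0, 1], where 1.0 means identical statistics."""
    d = (abs(statistics.mean(original) - statistics.mean(synthetic))
         + abs(statistics.stdev(original) - statistics.stdev(synthetic)))
    return 1.0 / (1.0 + d)

def generate_until(original, target, max_iter=50, seed=0):
    """Re-parametrize a toy Gaussian generator until the reliability target is met."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0                       # initial generator parameters
    for _ in range(max_iter):
        candidate = [rng.gauss(mu, sigma) for _ in original]
        score = similarity(original, candidate)
        if score >= target:                    # desired reliability score met
            return candidate, score
        # feedback step: nudge parameters toward the original's statistics
        mu += 0.5 * (statistics.mean(original) - mu)
        sigma += 0.5 * (statistics.stdev(original) - sigma)
    return candidate, score

src = random.Random(1)
original = [src.gauss(10.0, 2.0) for _ in range(200)]
candidate, score = generate_until(original, target=0.5)
```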
138.
Discriminator and data curation
▪ Discriminators commonly used for data quality control assess the overall
statistical resemblance of two sets, but they cannot identify underlying
reasons for discrepancies.
▪ New methods allow weighting each original variable in the generation process,
thus supporting detailed diagnostics and direct, ongoing improvements to the
generative pipeline.
▪ Interpretation systems, by analysing resulting data structures, can identify
gaps, skewed value distributions, or spurious values in the original data,
making it possible to address a variety of correction, formatting or normalization
issues, which are widespread in clinical data sets and can substantially limit their
value.
139.
Synthetic Data enhanced features
▪ Differential privacy provides a previously missing mathematical foundation for
defining privacy.
▪ Adding appropriate differential privacy features can assure non-reidentification
even on whole population statistics.
▪ A scalable quality-control system makes it possible to generate synthetic data that are
even more informative and robust than the original ones.
▪ Quality control and iterative approaches can lead to statistically equivalent sets,
at a vastly lower cost.
▪ Such methods can also enrich the synthetic set with more statistical features
and, in the case of synthetic images, with automatically placed annotations to
then train diagnostic image recognition systems.
140.
Differential Privacy
▪ DP is a property of the algorithm/output, not of the
data
▪ If each M_i is ε_i-DP, then M = (M_1, …, M_k) is (Σ_i ε_i)-DP
(sequential composition)
▪ If A is an ε-DP output, then f(A) is also ε-DP for any
function f() (post-processing)
▪ DP eliminates potential linkage attacks.
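These properties can be illustrated with the classic Laplace mechanism. The helper names below are ours, and the example assumes sensitivity-1 counting queries:

```python
import math
import random

def laplace_noise(scale, rng):
    """One draw from Laplace(0, scale) via inverse-transform sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(data, predicate, epsilon, rng):
    """An epsilon-DP counting query: a count has sensitivity 1,
    so Laplace noise with scale 1/epsilon suffices."""
    true_count = sum(1 for x in data if predicate(x))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
ages = [23, 45, 31, 67, 52, 38]
epsilons = [0.5, 0.5, 1.0]                 # budgets for releases M_1, M_2, M_3
releases = [dp_count(ages, lambda a: a > 40, eps, rng) for eps in epsilons]
total_epsilon = sum(epsilons)              # sequential composition: (sum eps_i)-DP
rounded = [round(r) for r in releases]     # post-processing f(A) keeps the same eps
```

The noise is a property of each release, not of the data, and the total budget simply adds up across releases.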
141. Questions to the audience
1. What is the “visiting mode” approach?
2. Can synthetic data be made fully anonymous?
3. What is differential privacy?
142. Human-centric big data governance:
responsible ways to innovate privacy-preserving technologies
Dr. Karolina La Fors
Post-doc researcher
Centre for Law and Digital Technologies (eLaw) Leiden University
E-SIDES project
143. Agenda of break-out session
Presentation of e-SIDES
Presentation of WeNet
Presentation of BDVe
(Position paper on Data Protection in the era of
Artificial Intelligence)
144. e-SIDES lessons for the responsible innovation of
privacy-preserving technologies in the era of AI
▪ EU-funded Coordination and Support Action (CSA)
complementing Research and Innovation Actions (RIAs)
projects on privacy-preserving big data technologies
▪ Consortium members: IDC (Italy); Fraunhofer ISI
(Germany); eLaw - Leiden University (Netherlands)
▪ Period: January 2017 – December 2019
145.
Challenges for Privacy-preservation in the era of AI
▪ Expanding Impact of
Data Breaches
▪ Human Biases
▪ Procedural Biases
▪ Discrepancies in professional
understandings & computational
definability of privacy
▪ Business models reliant on human behavioural data
“Trustworthy AI has three components[…]:
1. lawful, complying with all applicable laws and regulations;
2. ethical, ensuring adherence to ethical principles and values; and
3. robust, both from a technical and social perspective, since, even
with good intentions, AI systems can cause unintentional harm.”
(High-Level Expert Group on AI)
“Trustworthiness is key enabler of responsible competition”
the role of PPTs in shaping such trustworthiness is perhaps more vital
than ever in the era of AI.
146.
AI Expands the Privacy Impact of
Data Breaches
“2018 saw more than 6,500 data breaches, exposing
a staggering 5 billion compromised records.”
• The larger the big data stakeholder within the
analytic chain that endures a data breach, the
more citizens’ privacy becomes impacted.
• The role of PPT increasingly becomes a
cybersecurity tool.
https://threatpost.com/ripple-effect-large-enterprise-data-breaches/150041/
147.
AI Amplifies the Privacy Impact of
Human & Procedural Biases
• Unclear what the absence of bias should look like
& how it should be computed.
• Discrepancies between:
▪ definitions of privacy
▪ understanding legal compliance
▪ business models reliant on
human behavioural data
• The future role of PPT must also be tuned
towards privacy threats from biases by
solving discrepancies.
Apple Face-Recognition Blamed by N.Y.
Teen for False Arrest (29/04/19), Van Voris, B.
https://www.bloomberg.com/news/articles/2019-04-22/apple-face-recognition-
blamed-by-new-york-teen-for-false-arrest
Amazon’s Face Recognition Falsely Matched 28
Members of Congress With Mugshots
(26/07/2019), Snow, J.
https://www.aclu.org/blog/privacy-technology/surveillance-technologies/amazons-
face-recognition-falsely-matched-28
148.
1) Identify ethical, legal, societal
and economic issues
2) Identify existing technologies
3) Assess existing technologies
4) Conduct gap analysis
5) Identify design requirements
6) Assess solutions under
development
7) Identify implementation
barriers
8) Define Community Positions
& Make recommendations
What is e-SIDES doing…?
WHY?
WHAT? HOW?
▪ Reach a common vision for an ethically sound approach to big data
and facilitate responsible research and innovation in the field
▪ Improve the dialogue between stakeholders and the confidence of
citizens towards big data technologies and data markets
▪ Review of articles (scientific &
professional)
▪ Liaise with researchers, business
leaders, policy makers and civil
society through community
events
▪ Provide an Internet-based
meeting place for discussion,
learning and networking
▪ Provide a collective community
position paper with choice
points
149.
Classes of Privacy-Preserving Technologies
Anonymisation
Encryption
Deletion
Sanitisation
Multi-party computation
Access control
Policy
enforcement
Accountability
Transparency
Data provenance
Access &
portability
User control
150.
Ethical, Legal, Economic & Social Implementation Barriers to
Privacy-Preserving Big Data Technologies
Reasons for societal, economic & technical barriers:
1) EU-US management models & attitudes towards privacy differ (e.g. data utility vs privacy)
2) Budget limitations & cost effectiveness of PPT
3) Bridging cultural differences is challenging due to differing privacy expectations & unpredictable outcomes of analytics
4) Consumer mentality change and acquisition of new skills (e.g. tech. and privacy savviness)
5) PPTs need periodic assessment with respect to use & impact
Reasons for legal implementation barriers – based on desk research:
▪ 1) regional differences
▪ 2) sensitive data
▪ 3) liability and responsibility for the effects of big data-based decisions
151.
▪ e-SIDES Final Key Outputs
- Community Position Paper (CPP)
- Recommendations
152. What is the CPP...
A document on responsible data-driven innovation written by and for the big data community
Structure:
1) Introduction
2) Stakeholders
3) Challenges
4) Opportunities
5) Conclusion
153. ▪ Indicates where action is needed
▪ Documents good practices
▪ Provides a basis for decision
making
▪ Drives a lively debate
Source: https://www.k12insight.com/trusted/one-teacher-empowers-students-handshake
...and how
does it help?
154.
What do we already have...
Challenges:
1) Differences in attitudes and contexts
2) Empowerment vs. cognitive overload
3) Issues related to legal compliance and ethics
4) Difficulties of conducting assessments
Opportunities:
1) Awareness raising and transparency
2) Tools of accountability
3) Reference points of accountability
4) Bodies and mechanisms of oversight
155.
Contribute by end of November!
▪ Editors constantly integrate
suggestions into the paper
▪ The community is informed about
significant changes
▪ Anonymous suggestions
are possible
▪ To be named as contributor,
sign in with a Google Account
157. Data for Diversity-Aware Technology: Ethical Considerations
Insights from the project WeNet – The Internet of Us
Author: Laura Schelenz, International Center for Ethics in the Sciences and Humanities, Tübingen, Germany
169. THANK YOU!
WeNet project is funded by the EU’s Horizon2020
programme under Grant Agreement number 823783.
Email
laura.schelenz@uni-tuebingen.de
Website
www.internetofus.eu
www.izew.uni-tuebingen.de
Twitter
@WeNetProject
@LauraSchelenz
GET IN TOUCH
170. Data Protection in the era of Artificial Intelligence
Charlotte van Oirsouw, TNO, BDVe
171. Data Protection in the era of Artificial
Intelligence
https://bit.ly/2QfBsoC
172. BDVA: what is it and
what does it do?
▪ Building Big Data
Ecosystem
▪ Support EC research
programs
▪ 50% industry, 50%
academia
▪ 42 projects, +250 partners
173. Position paper focussed on technical solutions &
trends in Privacy-Preserving Technologies
▪ Why? To give a podium to PPT developments & to highlight
challenges
▪ For which audience? EC, Policymakers, SMEs, the world…
▪ Who is talking? Experts from several H2020 research
projects
▪ Why focus on technological solutions? To break
tech/society dichotomy in data-thinking and to show
alternatives (to big tech from US)
174. How to protect personal data in an
era of big data analysis and AI?
(and is it still about personal data?)
What is the current state of art
when it comes to PPTs
What do projects see as main
challenges and trends in PPTs
How can research into -and uptake
of- PPTs be stimulated?
Research Questions
How can regulators and
policymakers help?
175. Classifying harms and risks
▪ From the perspective of the end-user, data actor, data-driven object, or society
at large? Economic, social or scientific harm, inferred harms, harms from
proxies? Harms based on inferred data – the boundary of personal data?
▪ Qualitative vs quantitative ‘proofs’ of risks and harms
▪ Blurring boundary between privacy harms and safety risks
▪ Main challenge for PPTs – scaling and adoption
176. Classifying solutions
▪ Solutions are either data-centred, actor-centred or risk-based
▪ ISO: privacy preserving techniques & privacy-preserving models.
It also mentions synthetic data as a technique for de-
identification (which is debatable)
▪ Hoepman’s Blue Book: data-related vs process-related
mitigating measures.
▪ e-SIDES classification has been mentioned above
▪ Summarizing: there is no single way to classify PPTs
177. Giving data control back to users. See https://decodeproject.eu/
Trend 1: end user back as focus point
178. Sticky policy walkthrough. SPECIAL project. See https://www.specialprivacy.eu/flowchart/157-flowchart-01
Trend 2: Automation of policy for big data
179. MPC visual. TNO. See https://bit.ly/2PEV9X2
Trend 3: secure data analytics
181. Recommendations for policy
1) Create a (continuously updated) overview of privacy
challenges caused by BDA and AI
2) Support R&D into technical solutions - keeping up with
social, ethical and legal developments
3) Supporting uptake of privacy-preserving technologies
4) Develop, offer and support regulatory sandboxes in which
new data services can be tried and tested
184. Technology and Data Protection Law – how can
software engineering support legal compliance?
Recap & Wrap up
185. Questions for the audience:
• For showing GDPR compliance: Most important IT system features?
• Metadata related to details of anonymization
• Algorithmic transparency
• Logs of data accesses and transmissions
• A taxonomy/data privacy vocabulary for the processing operation in place
• Enforceable data handling policies
European Big Data Community Forum, 2019 3
DATA PRIVACY VOCABULARIES TO FULFIL GDPR TRANSPARENCY &
COMPLIANCE CHECKING REQUIREMENTS – 1/2
186. ▪ How SPECIAL addressed these issues + how YOU can use these results:
▪ Deliverables, prototypes, ontologies & vocabularies, code repository, platform
demonstrators, UI demos ALL Open Access: https://www.specialprivacy.eu/
▪ Everyone can engage in the W3C DPVCG: https://www.w3.org/community/dpvcg/
▪ W3C Community Group Report published at: https://www.w3.org/ns/dpv
DATA PRIVACY VOCABULARIES TO FULFIL GDPR TRANSPARENCY &
COMPLIANCE CHECKING REQUIREMENTS – 2/2
187. Questions for the audience:
• Do you agree with the statement that Big Data and Privacy are not mutually
exclusive? -> Majority said Big Data + Privacy are NOT mutually exclusive!
• Do you think that a holistic approach allowing a ”grey zone” between personal
and non-personal data would be practical?
• Other issues discussed:
• Legal vs. technical understanding + requirements regarding anonymity
• Relative concept of identifiability
• Encrypted data anonymized?
ANONYMISATION OF PERSONAL DATA LEADS TO THE INAPPLICABILITY OF
THE GDPR – MYTH OR REALITY? – 1/2
188. ▪ How SODA addressed these issues + how YOU can use these results:
▪ Primary objective: developing a GDPR-compliant, secure MPC system for the
healthcare domain
▪ Deliverables demonstrating that de-identification reduces risks and enhances
privacy https://www.soda-project.eu/deliverables/
▪ Identified legal challenges:
▪ anonymisation and removal of personal reference
▪ determination of purpose and legitimate basis
▪ special provisions for sensitive data
▪ application of technical and organisational measures
ANONYMISATION OF PERSONAL DATA LEADS TO THE INAPPLICABILITY OF
THE GDPR – MYTH OR REALITY? – 2/2
189. Questions for the audience:
• What is the “visiting mode” approach?
• Can synthetic data be made fully anonymous?
• What is differential privacy?
• Other issues discussed:
• Need of personal data for training models, how to get consent for that?
• What about bias in the original data? Are there mitigation techniques to
remove bias?
• Sharing of data by several hospitals? Obstacles, alignment?
• Group rights on the data? How to handle this?
WHY HAVE WE PREFERRED TO OPT FOR SHARING SYNTHETIC DATA AND
FOR COMPUTATION “BRINGING THE ALGORITHMS TO THE DATA” – 1/2
190. ▪ How MyHealthMyData addressed these issues + how YOU can use these
results:
▪ MHMD has built know-how on synthetic data as an option for data
analysis in the health sector
▪ Also, know-how about anonymization techniques (differential privacy)
WHY HAVE WE PREFERRED TO OPT FOR SHARING SYNTHETIC DATA AND
FOR COMPUTATION “BRINGING THE ALGORITHMS TO THE DATA” – 2/2
191. Human-centric big data governance: responsible
ways to innovate privacy-preserving technologies
Recap & Wrap up
192. Points for discussion
➢ GDPR fitness for AI
➢ Oversight bodies
➢ Combination of legislations needed, use the legislation we have, harm-
minding approach
➢ Diversity-aware technology development -> challenges -> diversity in
datasets; machine learning reduces diversity
➢ Diversity-aware earlier approaches: internationalization practices (bridging
Arabic, Chinese and English language differences),
➢ Positive and negative discrimination
193. ➢ Regulatory sandboxes, ePrivacy regulation
➢ Contextual vs generic interventions: how diverse or generic can/should
PPTs be?
➢ Market oriented sanctions
➢ Exclusion from ecosystem if not value-sensitive
➢ Ethics inserted into education
➢ Enforcement
➢ If you want to preserve ethics and rules (formalized ethics) and one
player comes in, then your whole rule system is challenged.
Points for discussion