Governance compliance

www.kensu.io
GOVERNANCE AND COMPLIANCE
Recipes for GDPR-friendly Data Science

ANDY -|- KENSU
Andy Petrella - Founder @ Kensu
Maths MSc / Computer Science MSc
10+ years in data computing (science?)
http://kensu.io Analytics, AI Governance
Analytics, Governance, Performance, Compliance

I. COMPLIANCE
a. Data Privacy
b. Risks
c. Ethics
x. How to guarantee compliance

A. DATA PRIVACY
Information privacy, also known as data privacy or data protection, is the
relationship between the collection and dissemination of
a. data,
b. technology,
c. the public expectation of privacy,
d. the legal and political issues surrounding them.
Privacy concerns exist wherever personally identifiable information or
other sensitive information is collected, stored, used, and finally
destroyed or deleted – in digital form or otherwise.
Improper or non-existent disclosure control can be the root cause of
privacy issues.
https://en.wikipedia.org/wiki/Information_privacy

A. DATA PRIVACY
GDPR
Each controller/processor shall maintain a record of
processing activities under its responsibility (cf. Art. 30).
That record shall contain, among other information:
• The purposes of the processing
• A description of the categories of data subjects and of
the categories of personal data
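As a minimal sketch of what one entry in such a record could look like in code: the schema below is an assumption made for illustration (the class `ProcessingRecord`, its field names and the helper `missing_fields` are hypothetical, not taken from the regulation or from any tool). It only covers the items listed on this slide.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class ProcessingRecord:
    """One entry in a record of processing activities (hypothetical schema)."""
    activity: str
    purposes: List[str]
    data_subject_categories: List[str]
    personal_data_categories: List[str]


def missing_fields(record: ProcessingRecord) -> List[str]:
    """Return the names of the fields that are still empty."""
    return [name for name, value in asdict(record).items() if not value]


# Made-up example entry: the personal data categories are not documented yet.
record = ProcessingRecord(
    activity="churn-model-training",
    purposes=["customer retention"],
    data_subject_categories=["customers"],
    personal_data_categories=[],
)

print(missing_fields(record))
```

A check like this could run whenever a new activity is registered, so that incomplete entries are flagged early.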

A. DATA PRIVACY
CaCPA: California Consumer Privacy Act of 2018
Prior to collecting Californians’ personal data, businesses
must disclose in their privacy policy:
“the categories of personal information to be collected and
the purposes for which the categories of personal
information shall be used”,
with any additional uses requiring notice to the
consumer.

B. RISKS
Risks are present wherever data is used:
- Managing business risks with data
- Building new data business
https://www.eiuperspectives.economist.com/sites/default/files/RetailBanksandBigData.pdf

B. RISKS
Business’ risks… risks
- Retail banks worry about credit risk:
the imbalance between class sizes (defaulters <<< non-defaulters)
generates overly optimistic scores…
- Commercial banks focus on market risk:
VaR and its variants require substantial backtesting
- Investment banks are concerned about operational risk:
Just think about BCBS… govern, monitor, control!
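The credit-risk point can be illustrated with a small sketch. The portfolio below is made up (1,000 loans, 20 defaulters), and the "model" is deliberately useless: it predicts "no default" for everyone. Its accuracy still looks excellent, which is the overly optimistic score the slide warns about.

```python
# Hypothetical portfolio: 1,000 loans, 20 of which default (label 1).
labels = [1] * 20 + [0] * 980

# A useless "model" that predicts "no default" for every loan.
predictions = [0] * len(labels)


def accuracy(y_true, y_pred):
    """Share of loans whose label was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of actual defaulters that the model caught."""
    caught = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual = sum(t == positive for t in y_true)
    return caught / actual


print(f"accuracy = {accuracy(labels, predictions):.2f}")  # 0.98
print(f"recall   = {recall(labels, predictions):.2f}")    # 0.00
```

Accuracy rewards the majority class, so a metric focused on the defaulters (recall here) tells a very different story.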

B. RISKS
Intrinsic
https://unicsoft.net/risks-data-science-project/

B. RISKS
Intrinsic
Losers             Records stolen
JP Morgan Chase        76,000,000
Evernote               50,000,000
eBay                  145,000,000
Target                 70,000,000
LinkedIn              117,000,000
Yahoo               1,000,000,000

B. RISKS
Intrinsic
Improper Analytics
One tiny mistake can ruin the whole project.
Low Data Quality
Even the most advanced analytics methods fail with incorrect data.

C. ETHICS
Data ethics refers to systemising, defending, and recommending
concepts of right and wrong conduct in relation to data, in particular
personal data.
Data ethics differs from information ethics: information ethics is
more concerned with issues of intellectual property, while data ethics
is more concerned with collectors and disseminators of structured or
unstructured data, such as data brokers, governments, and large
corporations.
https://en.wikipedia.org/wiki/Big_data_ethics

C. ETHICS
WAT?
http://rsta.royalsocietypublishing.org/content/roypta/374/2083/20160360.full.pdf
Data ethics can be defined as the branch of ethics that studies and evaluates moral
problems related to
data
- generation
- recording
- processing
- dissemination
- sharing and use
algorithms
- artificial intelligence
- artificial agents
- machine learning
- robots (well…)
practices
- responsible innovation
- programming
- hacking
- professional codes
in order to formulate and support morally good solutions

C. ETHICS
WAT?
The ethics of data focuses on ethical problems posed by
the collection and analysis of large datasets and on
issues ranging from the use of big data in biomedical
research and social sciences to
- profiling
- advertising
- data philanthropy
- open data

C. ETHICS
WAT?
The ethics of algorithms addresses issues posed by the
increasing complexity and autonomy of algorithms
broadly understood, especially in the case of machine
learning applications.
Crucial challenges include moral responsibility and
accountability of both designers and data scientists with
respect to unforeseen and undesired consequences as
well as missed opportunities.

C. ETHICS
WAT?
The ethics of practices addresses the pressing questions
concerning the responsibilities and liabilities of people
and organizations in charge of data processes, strategies
and policies, including data scientists, so as to ensure ethical
practices fostering the protection of data subjects’
rights.

C. ETHICS
Automated decision-making
https://arxiv.org/pdf/1606.08813.pdf

C. ETHICS
Non-discrimination
Right to explanation

C. ETHICS
Non-discrimination
1. Article 21 of the Charter of Fundamental Rights of the
European Union
2. Article 14 of the European Convention on Human Rights
3. Articles 18-25 of the Treaty on the Functioning of the
European Union.

C. ETHICS
https://www.miamiherald.com/news/nation-world/national/article89562297.html
Discrimination… can be unintended

“Ingress players, like the database volunteers, appeared to
skew male, young and English-speaking, […].  
Though the surveys did not gather data on race or income
levels, the average player spent almost $80 on the Ingress
game […] suggesting access to disposable income.”

C. ETHICS
Profiling is inherently discriminatory:
data subjects are grouped in categories and decisions
are made on this basis.
Plus, as said, machine learning can reify existing patterns
of discrimination.
Consequence: biased decisions are presented as the
outcome of an “objective” algorithm.
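One simple way to make such a bias visible is to compare how often each group receives a positive decision. This is only a sketch under stated assumptions: the groups "A" and "B", the decisions and the helper `approval_rates` are all made up for illustration, and the talk does not prescribe this particular check.

```python
from collections import defaultdict


def approval_rates(decisions):
    """Map each group to its share of positive decisions.

    `decisions` is a list of (group, approved) pairs.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {g: approved[g] / totals[g] for g in totals}


# Made-up decisions for two hypothetical groups, A and B.
decisions = (
    [("A", True)] * 60 + [("A", False)] * 40
    + [("B", True)] * 30 + [("B", False)] * 70
)

rates = approval_rates(decisions)
ratio = min(rates.values()) / max(rates.values())
print(rates)
print(f"ratio of lowest to highest rate: {ratio:.2f}")
```

A ratio far below 1 does not prove discrimination on its own, but it marks a decision process that deserves a closer look.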

C. ETHICS
Standard supervised machine learning algorithms are based
on discovering reliable associations to make predictions.
There is no concern for causal reasoning or “explanation”

C. ETHICS
According to Burrell, in “How the machine ‘thinks’: Understanding opacity in
machine learning algorithms”, there are three barriers to transparency:
1. Intentional hiding of the decision procedures by corporations
2. Source code that is too complex for most people to read
3. Machine learning reasons in very high dimensions; human
brains don’t

X. HOW TO GUARANTEE COMPLIANCE
a. Monitoring
b. Automated Reporting

Monitoring
In (data) engineering, processes have improved to
satisfy the need for stability, quality and compliance
by introducing:
1. logging
2. testing
3. continuous deployment

Monitoring
Data science projects differ slightly in nature from
pure engineering projects:
most issues may come from the dynamic nature of
experimentation and the volatility of the data.
As a result, monitoring becomes key to AUTOMATED
compliance!

Monitoring
For data projects, monitoring is about:
- what/how data are used (e.g. data lineage, products, …)
- what/how models are built (e.g. methods, metrics, …)
- where/how data products are used (e.g. marketing, fraud, …)
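The three monitoring questions can be sketched as a small activity log. Everything here is hypothetical and for illustration only: the `Activity` schema, the in-memory `LOG`, the file names and the metric value are assumptions, and this is not the API of any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class Activity:
    """One monitored data activity (hypothetical schema)."""
    inputs: List[str]            # what data was used
    output: str                  # what was produced (dataset, model, ...)
    method: str                  # how it was built
    metrics: Dict[str, float]    # how well it performed
    usage: str                   # where the data product is used
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


LOG: List[Activity] = []


def record(activity: Activity) -> None:
    """Append an activity to the in-memory log."""
    LOG.append(activity)


def lineage(output: str) -> List[str]:
    """Recursively list every upstream source of `output`."""
    sources: List[str] = []
    for act in LOG:
        if act.output == output:
            for inp in act.inputs:
                sources.append(inp)
                sources.extend(lineage(inp))
    return sources


# Made-up activities: a feature preparation step, then a model training step.
record(Activity(["crm.csv", "payments.csv"], "features.parquet",
                "join + aggregate", {}, "internal"))
record(Activity(["features.parquet"], "fraud_model.pkl",
                "logistic regression", {"auc": 0.91}, "fraud"))

print(lineage("fraud_model.pkl"))
```

With such a log, the question "which raw datasets feed this model?" becomes a lookup instead of an investigation.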

Automated Reporting
Pursuing the parallel with engineering:
CI/CD and QA are similar to our current compliance needs!
The automation of compliance can be approached with a
set of rules that estimate the level of risk and limit the
effort to actionable events only.
Reporting is mandatory for compliance.
Reports can be generated by combining
monitored activities with the rules dictated by
regulations.
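A minimal sketch of this idea: monitored activities go in, rules are applied, and only the actionable events end up in the report. The activity fields, the single rule `purpose_is_documented` and the helper `build_report` are assumptions made for illustration; real rules would be derived from the applicable regulation.

```python
# Monitored activities, as they might come out of a monitoring layer
# (field names and values are made up for this sketch).
activities = [
    {"name": "train-fraud-model", "personal_data": True, "purpose": "fraud detection"},
    {"name": "export-leads", "personal_data": True, "purpose": None},
    {"name": "aggregate-sales", "personal_data": False, "purpose": None},
]


def purpose_is_documented(activity):
    """Rule: return a finding (a string), or None when the activity passes."""
    if activity["personal_data"] and not activity["purpose"]:
        return "personal data processed without a documented purpose"
    return None


RULES = [purpose_is_documented]


def build_report(activities, rules):
    """Keep only actionable events: activities that break at least one rule."""
    report = []
    for activity in activities:
        findings = [rule(activity) for rule in rules]
        findings = [f for f in findings if f is not None]
        if findings:
            report.append({"activity": activity["name"], "findings": findings})
    return report


for entry in build_report(activities, RULES):
    print(entry["activity"], "->", "; ".join(entry["findings"]))
```

Adding a regulation then means adding a rule function to `RULES`, much like adding a test to a CI pipeline.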

The Kensu way: Data Activity Manager
Monitor
Automated Registry Report

II. GOVERNANCE
a. Data in the Wild
b. Effects of constraints
x. How to govern

A. DATA IN THE WILD
Working on data is perceived as the Wild West.
• Experimentation in highly dynamic environments (e.g. notebooks)
• Local copies or duplication of datasets
• Creation of intermediate dumps (models, prepared datasets)

B. EFFECTS OF CONSTRAINTS
Rules, laws, …
Adding constraints (policies) to govern is a classic…
For example:
- predefine the set of needed data
- list the methods to be used
- create documents… and maintain them

B. EFFECTS OF CONSTRAINTS
… might not be best
The consequences of such constraints are:
• Lack of freedom
• Anonymization
  - what about marketing use cases?
  - how reliable is the process?
  - anonymization is itself a process to be listed!
• Poor/slow reactivity to market changes (performance drop)

X. HOW TO GOVERN
For compliance reasons, we have to introduce monitoring.
Monitoring data opens new governance doors:
- Govern data activities with a bottom-up approach
- Control vs Constrain
In other words: data governance in a data-driven fashion

X. HOW TO GOVERN
The Kensu way: Data Activity Manager

THANKS!
http://kensu.io Analytics, AI Governance
Q/A
Check out Kensu Data Activity Manager
