www.kensu.io
GOVERNANCE AND COMPLIANCE
Recipes for GDPR-friendly Data Science
ANDY -|- KENSU
Andy Petrella - Founder @ Kensu
Maths MSc / Computer Science MSc
10+ years in data computing (science?)
http://kensu.io Analytics, AI Governance
Analytics
Governance
Performance
Compliance
I. COMPLIANCE
a. Data Privacy
b. Risk
c. Ethic
x. How to guarantee compliance
A. DATA PRIVACY
Information privacy, also known as data privacy or data protection, is the
relationship between the collection and dissemination of
a. data,
b. technology,
c. the public expectation of privacy,
d. the legal and political issues surrounding them.
Privacy concerns exist wherever personally identifiable information or
other sensitive information is collected, stored, used, and finally
destroyed or deleted, in digital form or otherwise.
Improper or non-existent disclosure control can be the root cause for
privacy issues.
https://en.wikipedia.org/wiki/Information_privacy
A. DATA PRIVACY
GDPR
Each controller/processor shall maintain a record of
processing activities under its responsibility (cf. Art. 30).
That record shall contain several pieces of information, including:
• The purposes of the processing
• A description of the categories of data subjects and of
the categories of personal data
• etc.
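As an illustration only, the record items named on this slide could be captured in a small data structure. The class name, field names and the example activity below are assumptions made for this sketch; they do not come from the GDPR text or from any Kensu product, and a real Art. 30 record contains more than these three items.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessingRecord:
    """One entry in a record of processing activities (hypothetical schema)."""
    activity: str
    purposes: List[str] = field(default_factory=list)
    data_subject_categories: List[str] = field(default_factory=list)
    personal_data_categories: List[str] = field(default_factory=list)

    def missing_items(self) -> List[str]:
        """Return the names of the listed items that are still empty."""
        checked = {
            "purposes": self.purposes,
            "data_subject_categories": self.data_subject_categories,
            "personal_data_categories": self.personal_data_categories,
        }
        return [name for name, values in checked.items() if not values]


# Hypothetical activity: the categories of personal data were never filled in.
record = ProcessingRecord(
    activity="churn-model-training",
    purposes=["customer retention analysis"],
    data_subject_categories=["customers"],
)
print(record.missing_items())
```

A check like `missing_items` only flags empty fields; whether the content of a field is adequate remains a human judgment.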
A. DATA PRIVACY
CaCPA: California Consumer Privacy Act of 2018
Prior to collecting Californians’ personal data, businesses
must disclose in their privacy policy:
“the categories of personal information to be collected and
the purposes for which the categories of personal
information shall be used”
with any additional uses requiring notice to the
consumer.
B. RISKS
Risks are present wherever data is used:
- Managing business risks with data
- Building new data business
https://www.eiuperspectives.economist.com/sites/default/files/RetailBanksandBigData.pdf
B. RISKS
Business risks… risks
- Retail banks worry about credit risk:
  the imbalance between class sizes (defaulters <<< non-defaulters)
  generates overly optimistic scores…
- Commercial banks focus on market risk:
  VaR and its variants require extensive backtesting
- Investment banks are concerned about operational risk:
  just think about BCBS… govern, monitor, control!
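The class-imbalance point can be made concrete with a minimal sketch. The labels below are synthetic (2% defaulters, an assumed ratio chosen for illustration), and the "model" is a deliberately useless one that always predicts non-defaulter.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Share of true positives (here: defaulters) that were detected."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)


# Synthetic labels: 1 = defaulter, 0 = non-defaulter (20 out of 1000 default).
y_true = [1] * 20 + [0] * 980
# A naive "model" that never predicts a default.
y_pred = [0] * 1000

naive_accuracy = accuracy(y_true, y_pred)
naive_recall = recall(y_true, y_pred)
print(f"accuracy={naive_accuracy:.2f} recall={naive_recall:.2f}")
```

The accuracy looks excellent while not a single defaulter is caught, which is why scores on imbalanced classes need a metric that focuses on the minority class.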
B. RISKS
Intrinsic
https://unicsoft.net/risks-data-science-project/
B. RISKS
Intrinsic
Losers            Records stolen
JP Morgan Chase       76,000,000
Evernote              50,000,000
eBay                 145,000,000
Target                70,000,000
LinkedIn             117,000,000
Yahoo              1,000,000,000
B. RISKS
Intrinsic
Improper Analytics
One tiny mistake can ruin the whole project.
Low Data Quality
Even the most advanced analytics methods fail with incorrect data.
C. ETHIC
Data Ethics refers to systemising, defending, and recommending
concepts of right and wrong conduct in relation to data, in particular
personal data.
Data ethics differs from information ethics: information ethics is
more concerned with issues of intellectual property, while data ethics
is more concerned with collectors and disseminators of structured or
unstructured data, such as data brokers, governments and large
corporations.
https://en.wikipedia.org/wiki/Big_data_ethics
C. ETHIC
WAT?
http://rsta.royalsocietypublishing.org/content/roypta/374/2083/20160360.full.pdf
Data ethics can be defined as the branch of ethics that studies and evaluates moral
problems related to
data
- generation
- recording
- processing
- dissemination
- sharing and use
algorithms
- artificial intelligence
- artificial agents
- machine learning
- robots (well…)
practices
- responsible innovation
- programming
- hacking
- professional codes
in order to formulate and support morally good solutions
C. ETHIC
WAT?
The ethics of data focuses on ethical problems posed by
the collection and analysis of large datasets and on
issues ranging from the use of big data in biomedical research
and social sciences to:
- profiling
- advertising
- data philanthropy
- open data
C. ETHIC
WAT?
The ethics of algorithms addresses issues posed by the
increasing complexity and autonomy of algorithms
broadly understood, especially in the case of machine
learning applications.
Crucial challenges include moral responsibility and
accountability of both designers and data scientists with
respect to unforeseen and undesired consequences as
well as missed opportunities.
C. ETHIC
WAT?
The ethics of practices addresses the pressing questions
concerning the responsibilities and liabilities of people
and organizations in charge of data processes, strategies
and policies, including data scientists’ work to ensure ethical
practices fostering the protection of data subjects’
rights.
C. ETHIC
Automated decision-making
https://arxiv.org/pdf/1606.08813.pdf
Non-discrimination
Right to explanation
C. ETHIC
Automated decision-making
https://arxiv.org/pdf/1606.08813.pdf
Non-discrimination
1. Article 21 of the Charter of Fundamental Rights of the
European Union
2. Article 14 of the European Convention on Human Rights
3. Articles 18-25 of the Treaty on the Functioning of the
European Union.
C. ETHIC
Automated decision-making
https://www.miamiherald.com/news/nation-world/national/article89562297.html
Discrimination… can be unintended
“Ingress players, like the database volunteers, appeared to
skew male, young and English-speaking, […]. 

Though the surveys did not gather data on race or income
levels, the average player spent almost $80 on the Ingress
game […] suggesting access to disposable income.”
C. ETHIC
Automated decision-making
https://arxiv.org/pdf/1606.08813.pdf
Right to explanation
Profiling is inherently discriminatory:
data subjects are grouped in categories and decisions
are made on this basis.
Plus, as said, machine learning can reify existing patterns
of discrimination.
Consequence: biased decisions are presented as the
outcome of an “objective” algorithm.
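One simple way to make such bias visible is to compare the rate of favourable decisions per group. The sketch below is an assumption-laden illustration: the group labels, the decisions and the choice of a min/max ratio as a summary are all hypothetical, and neither the slides nor the cited paper prescribe this particular measure.

```python
from collections import defaultdict


def selection_rates(decisions):
    """Compute the share of favourable decisions per group.

    `decisions` is an iterable of (group, approved) pairs.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {group: approved[group] / totals[group] for group in totals}


# Hypothetical outcomes of an automated decision for two groups.
decisions = (
    [("A", True)] * 8 + [("A", False)] * 2
    + [("B", True)] * 4 + [("B", False)] * 6
)

rates = selection_rates(decisions)
# Ratio of the lowest to the highest selection rate (1.0 means equal rates).
ratio = min(rates.values()) / max(rates.values())
print(rates, ratio)
```

A low ratio does not prove discrimination by itself, but it is the kind of signal that an "objective" score would otherwise hide.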
C. ETHIC
Automated decision-making
https://arxiv.org/pdf/1606.08813.pdf
Right to explanation
Standard supervised machine learning algorithms are based
on discovering reliable associations to make predictions.
There is no concern for causal reasoning or “explanation”.
C. ETHIC
Automated decision-making
https://arxiv.org/pdf/1606.08813.pdf
Right to explanation
According to Burrell, in “How the machine ‘thinks’: Understanding opacity in
machine learning algorithms”, there are three barriers to transparency:
1. Intentional hiding of the decision procedures by corporations
2. Source code is overly complex
3. Machine learning can reason in very high dimensions; human
brains cannot
X. HOW TO GUARANTEE COMPLIANCE
a. Monitoring
b. Automated Reporting
X. HOW TO GUARANTEE COMPLIANCE
Monitoring
In (data) engineering, processes have improved to
satisfy the need for stability, quality and compliance
by introducing:
1. logging
2. testing
3. continuous deployment
X. HOW TO GUARANTEE COMPLIANCE
Monitoring
Data science projects differ slightly in nature from
pure engineering projects:
most issues may come from the dynamic nature of the
experiments and the volatility of the data.
As a result, monitoring becomes key to AUTOMATED
compliance!
X. HOW TO GUARANTEE COMPLIANCE
Monitoring
For data projects, monitoring is about:
- what data are used and how (e.g. data lineage, products, …)
- what models are built and how (e.g. methods, metrics, …)
- where and how data products are used (e.g. marketing, fraud, …)
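A minimal sketch of what such monitoring could look like in code: a decorator that records which inputs a processing step reads, what it produces and where the result is used. The decorator, the log structure and the file names are hypothetical and chosen for illustration; this is not how the Kensu Data Activity Manager is implemented.

```python
import functools
from typing import Callable, Dict, List

# In-memory lineage log; a real setup would ship entries to a monitoring service.
LINEAGE_LOG: List[Dict[str, object]] = []


def track_lineage(inputs: List[str], output: str, usage: str) -> Callable:
    """Record inputs, output and intended usage each time the step runs."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            LINEAGE_LOG.append({
                "step": func.__name__,
                "inputs": list(inputs),
                "output": output,
                "usage": usage,
            })
            return result
        return wrapper
    return decorator


# Hypothetical data sources and usage label.
@track_lineage(
    inputs=["customers.csv", "payments.csv"],
    output="features.parquet",
    usage="fraud",
)
def build_features(rows):
    """Toy preparation step: drop missing rows."""
    return [row for row in rows if row is not None]


features = build_features([1, None, 2])
print(LINEAGE_LOG)
```

Because the log entry is written by the wrapper, the data scientist's function body stays untouched, which keeps monitoring from becoming a constraint on experimentation.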
X. HOW TO GUARANTEE COMPLIANCE
Automated Reporting
Pursuing the parallel with engineering:
CI/CD and QA are similar to our current compliance needs!
The automation of compliance can be approached with a
set of rules to estimate the level of risk and to limit the
effort to actionable events only.
Reporting is mandatory for compliance.
Reports can be generated from the conjunction of
monitored activities and established rules dictated by
regulations.
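The idea of combining monitored activities with rules can be sketched as follows. The two rules, their risk levels and the activity fields are invented for this example; they are not derived from the text of any regulation and would have to be defined with legal and compliance experts.

```python
from typing import Callable, Dict, List, NamedTuple


class Rule(NamedTuple):
    """A hypothetical compliance rule with an associated risk level."""
    name: str
    risk: str
    applies: Callable[[Dict[str, object]], bool]


RULES = [
    Rule("personal data without stated purpose", "high",
         lambda a: bool(a["personal_data"]) and not a["purpose"]),
    Rule("personal data used for marketing", "medium",
         lambda a: bool(a["personal_data"]) and a["usage"] == "marketing"),
]


def build_report(activities: List[Dict[str, object]], rules: List[Rule]):
    """Keep only actionable events: activities that trigger at least one rule."""
    findings = []
    for activity in activities:
        for rule in rules:
            if rule.applies(activity):
                findings.append({
                    "activity": activity["name"],
                    "rule": rule.name,
                    "risk": rule.risk,
                })
    return findings


# Hypothetical monitored activities.
activities = [
    {"name": "train-churn-model", "personal_data": True,
     "purpose": "", "usage": "retention"},
    {"name": "aggregate-sales", "personal_data": False,
     "purpose": "reporting", "usage": "finance"},
    {"name": "email-campaign", "personal_data": True,
     "purpose": "promotion", "usage": "marketing"},
]

report = build_report(activities, RULES)
for finding in report:
    print(finding)
```

Activities that trigger no rule never reach the report, which is one way to limit the effort to actionable events.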
X. HOW TO GUARANTEE COMPLIANCE
The Kensu way: Data Activity Manager
Monitor
Automated Registry Report
II. GOVERNANCE
a. Data in the Wild
b. Effects of constraints
x. How to govern
A. DATA IN THE WILD
Working on data is perceived as the Wild West.
• Experimentation in highly dynamic environments (e.g. notebooks)
• Local copies or duplication of datasets
• Creation of intermediate dumps (models, prepared datasets)
B. EFFECTS OF CONSTRAINTS
Rules, laws, …
Adding constraints (policies) to govern is a classic…
So, we would have the following examples:
- predefine the set of needed data
- list the methods to be used
- create documents… and maintain them
B. EFFECTS OF CONSTRAINTS
… might not be best
The consequences of such constraints are:
• Lack of freedom
• Anonymization
  • what about marketing use cases?
  • what is the reliability of the process?
  • anonymization is actually itself a process to be listed!
• Poor/slow reactivity to market changes (performance drop)
X. HOW TO GOVERN
For compliance reasons, we have to introduce monitoring.
Monitoring data opens new governance doors:
- Govern data activities with a bottom-up approach
- Control vs Constrain
In other words: data governance in a data-driven fashion
X. HOW TO GOVERN
The Kensu way: Data Activity Manager
THANKS!
http://kensu.io Analytics, AI Governance
Analytics
Governance
Performance
Compliance
Q&A
Check out Kensu Data Activity Manager

Lightning fast genomics with Spark, Adam and Scala
 
Machine Learning and GraphX
Machine Learning and GraphXMachine Learning and GraphX
Machine Learning and GraphX
 



Governance compliance

  • 1. www.kensu.io GOVERNANCE AND COMPLIANCE 1 Recipes for GDPR-friendly Data Science
  • 2. ANDY -|- KENSU Andy Petrella - Founder @ Kensu. Maths MSc / Computer Science MSc. 10+ years in data computing (science?). http://kensu.io Analytics, AI Governance: Analytics, Governance, Performance, Compliance
  • 3. I. COMPLIANCE a. Data Privacy b. Risk c. Ethics x. How to guarantee compliance
  • 4. A. DATA PRIVACY Information privacy, also known as data privacy or data protection, is the relationship between the collection and dissemination of a. data, b. technology, c. the public expectation of privacy, and d. the legal and political issues surrounding them. Privacy concerns exist wherever personally identifiable information or other sensitive information is collected, stored, used, and finally destroyed or deleted, in digital form or otherwise. Improper or non-existent disclosure control can be the root cause of privacy issues. https://en.wikipedia.org/wiki/Information_privacy
  • 5. A. DATA PRIVACY (GDPR) Each controller/processor shall maintain a record of processing activities under its responsibility (cf. Art. 30). That record shall contain a range of information, including: • The purposes of the processing • A description of the categories of data subjects and of the categories of personal data • etc.
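A record of processing activities can be kept as structured data rather than as a free-form document. The following is a minimal sketch, assuming a hypothetical schema: the class name `ProcessingRecord`, its field names, and the example values are illustrative and are not taken from the GDPR text or from any Kensu product. It only covers the two items listed on the slide (purposes and categories) and flags entries that are still empty.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class ProcessingRecord:
    """One entry in a record of processing activities (hypothetical schema)."""
    controller: str
    purposes: List[str]
    data_subject_categories: List[str]
    personal_data_categories: List[str]

    def missing_fields(self) -> List[str]:
        """Return the names of fields that have been left empty."""
        return [name for name, value in asdict(self).items() if not value]


# Illustrative values only.
record = ProcessingRecord(
    controller="Example Bank",
    purposes=["credit scoring"],
    data_subject_categories=["loan applicants"],
    personal_data_categories=[],  # not documented yet
)

print(record.missing_fields())
```

A check like `missing_fields()` could run whenever a new processing activity is registered, so that incomplete entries are caught before an audit rather than during one.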
  • 6. A. DATA PRIVACY (CaCPA: California Consumer Privacy Act of 2018) Prior to collecting Californians' personal data, businesses must disclose in their privacy policy:
 “the categories of personal information to be collected and the purposes for which the categories of personal information shall be used”
 with any additional uses requiring notice to the consumer.
  • 7. B. RISKS Risks are present wherever data is used: - Managing business risks with data - Building new data businesses https://www.eiuperspectives.economist.com/sites/default/files/RetailBanksandBigData.pdf
  • 8. B. RISKS Business risks
 - Retail banks worry about credit risk: the imbalance between class sizes (defaulters <<< non-defaulters) generates overly optimistic scores…
 - Commercial banks focus on market risk: VaR and its variations require extensive backtesting
 - Investment banks are concerned about operational risk: just think about BCBS… govern, monitor, control!
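The credit-risk point about class imbalance can be illustrated with a few lines of arithmetic. This is a minimal sketch with made-up numbers (1,000 hypothetical loans, 2% defaulters): a "model" that never predicts a default still reaches a high accuracy while detecting no defaulter at all, which is why accuracy alone gives an overly optimistic picture.

```python
# Hypothetical portfolio: 20 defaulters (label 1) among 1,000 loans.
labels = [1] * 20 + [0] * 980

# Naive model: always predicts "no default".
predictions = [0] * len(labels)

# Accuracy: share of loans classified correctly.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on defaulters: share of actual defaulters that were detected.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print(f"accuracy={accuracy:.2f} recall={recall:.2f}")
```

Here accuracy is 0.98 while recall on defaulters is 0.00, so a metric that accounts for the minority class (recall, precision, or similar) is needed to judge such a model.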
  • 10. B. RISKS Intrinsic. Losers, by records stolen: JP Morgan Chase 76,000,000; Evernote 50,000,000; eBay 145,000,000; Target 70,000,000; LinkedIn 117,000,000; Yahoo 1,000,000,000
  • 11. B. RISKS Intrinsic. Improper analytics: one tiny mistake can ruin the whole project. Low data quality: even the most advanced analytics methods fail with incorrect data.
  • 12. C. ETHICS Data ethics refers to systemising, defending, and recommending concepts of right and wrong conduct in relation to data, in particular personal data. Data ethics differs from information ethics: information ethics is more concerned with issues of intellectual property, while data ethics is more concerned with collectors and disseminators of structured or unstructured data, such as data brokers, governments, and large corporations. https://en.wikipedia.org/wiki/Big_data_ethics
  • 13. C. ETHICS WAT? Data ethics can be defined as the branch of ethics that studies and evaluates moral problems related to data: generation, recording, processing, dissemination, sharing and use; algorithms: artificial intelligence, artificial agents, machine learning, robots (well…); practices: responsible innovation, programming, hacking, professional codes; in order to formulate and support morally good solutions. http://rsta.royalsocietypublishing.org/content/roypta/374/2083/20160360.full.pdf
  • 15. C. ETHICS WAT? The ethics of data focuses on ethical problems posed by the collection and analysis of large datasets, and on issues ranging from the use of big data in biomedical research and social sciences to profiling, advertising, data philanthropy and open data.
  • 16. C. ETHICS WAT? The ethics of algorithms addresses issues posed by the increasing complexity and autonomy of algorithms broadly understood, especially in the case of machine learning applications. Crucial challenges include moral responsibility and accountability of both designers and data scientists with respect to unforeseen and undesired consequences as well as missed opportunities.
  • 17. C. ETHICS WAT? The ethics of practices addresses the pressing questions concerning the responsibilities and liabilities of people and organizations in charge of data processes, strategies and policies, including data scientists' work to ensure ethical practices that foster the protection of data subjects' rights.
  • 20. C. ETHICS Automated decision-making. Non-discrimination: 1. Article 21 of the Charter of Fundamental Rights of the European Union 2. Article 14 of the European Convention on Human Rights 3. Articles 18-25 of the Treaty on the Functioning of the European Union. https://arxiv.org/pdf/1606.08813.pdf
  • 22. C. ETHICS Automated decision-making. Discrimination… can be unintended: “Ingress players, like the database volunteers, appeared to skew male, young and English-speaking, […].
 Though the surveys did not gather data on race or income levels, the average player spent almost $80 on the Ingress game […] suggesting access to disposable income.” https://www.miamiherald.com/news/nation-world/national/article89562297.html
  • 23. C. ETHICS Automated decision-making. Right to explanation: profiling is inherently discriminatory.
 Data subjects are grouped in categories and decisions are made on this basis. Plus, as said, machine learning can reify existing patterns of discrimination.
 Consequence: biased decisions are presented as the outcome of an “objective” algorithm. https://arxiv.org/pdf/1606.08813.pdf
  • 24. C. ETHICS Automated decision-making. Right to explanation: standard supervised machine learning algorithms are based on discovering reliable associations to make predictions. There is no concern for causal reasoning or “explanation”. https://arxiv.org/pdf/1606.08813.pdf
  • 25. C. ETHICS Automated decision-making. Right to explanation: according to Burrell, in “How the machine ‘thinks’: Understanding opacity in machine learning algorithms”, there are three barriers to transparency: 1. Intentional concealment of the decision procedures by corporations 2. Overly complex source code 3. Machine learning can reason in very high dimensions; human brains can't. https://arxiv.org/pdf/1606.08813.pdf
  • 26. X. HOW TO GUARANTEE COMPLIANCE a. Monitoring b. Automated Reporting
  • 27. X. HOW TO GUARANTEE COMPLIANCE Monitoring. In (data) engineering, processes have improved to satisfy the need for stability, quality and compliance by introducing: 1. logging 2. testing 3. continuous deployment
  • 28. X. HOW TO GUARANTEE COMPLIANCE Monitoring. Data science projects differ slightly in nature from pure engineering projects: most issues come from the dynamic nature of experimentation and the volatility of the data. Monitoring therefore becomes key to AUTOMATED compliance!
  • 29. X. HOW TO GUARANTEE COMPLIANCE Monitoring. For data projects, monitoring is about: - what data are used and how (e.g. data lineage, products, …) - what models are built and how (e.g. methods, metrics, …) - where and how data products are used (e.g. marketing, fraud, …)
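The monitoring described above can be pictured as an append-only log of data activities from which lineage is derived. The following is a minimal sketch, assuming hypothetical names throughout (`LineageEvent`, `ActivityLog`, the dataset identifiers); it is not Kensu's API. Each event records which inputs a process read, which output it produced and for what purpose, and `upstream()` walks the log to find every dataset that feeds a given data product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Set, Tuple


@dataclass(frozen=True)
class LineageEvent:
    """One monitored data activity (hypothetical structure)."""
    process: str
    inputs: Tuple[str, ...]
    output: str
    purpose: str
    timestamp: str


class ActivityLog:
    """Append-only log of lineage events."""

    def __init__(self) -> None:
        self.events: List[LineageEvent] = []

    def record(self, process: str, inputs: List[str], output: str, purpose: str) -> LineageEvent:
        event = LineageEvent(
            process, tuple(inputs), output, purpose,
            datetime.now(timezone.utc).isoformat(),
        )
        self.events.append(event)
        return event

    def upstream(self, dataset: str) -> Set[str]:
        """Return every dataset that directly or transitively feeds `dataset`."""
        found: Set[str] = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for event in self.events:
                if event.output != current:
                    continue
                for source in event.inputs:
                    if source not in found:
                        found.add(source)
                        stack.append(source)
        return found


log = ActivityLog()
log.record("prepare", ["crm.customers"], "prepared.customers", "marketing")
log.record("train", ["prepared.customers"], "model.churn", "marketing")

print(sorted(log.upstream("model.churn")))
```

With such a log, the question "which personal data ended up in this model, and for what purpose?" becomes a query rather than a manual investigation.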
  • 30. X. HOW TO GUARANTEE COMPLIANCE Automated Reporting. Pursuing the parallel with engineering:
 CI/CD and Q/A are similar to our current compliance needs! The automation of compliance can be approached with a set of rules that estimate the level of risk and limit the effort to actionable events only. Reporting is mandatory for compliance.
 Reports can be generated by combining the monitored activities with the rules dictated by regulations.
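The idea of combining monitored activities with rules can be sketched as follows. This is an illustration under stated assumptions: the activity dictionaries, the `declared_purposes` mapping and the single rule (a process may only use the data categories declared for its purpose) are hypothetical and much simpler than what a regulation actually requires. The report keeps only the activities that violate the rule, which matches the goal of limiting effort to actionable events.

```python
# Monitored activities (hypothetical examples).
activities = [
    {"process": "train_churn", "purpose": "marketing",
     "data_categories": {"email", "purchase_history"}},
    {"process": "score_credit", "purpose": "credit scoring",
     "data_categories": {"income", "health"}},
]

# Data categories declared for each purpose (hypothetical policy).
declared_purposes = {
    "marketing": {"email", "purchase_history"},
    "credit scoring": {"income"},
}


def undeclared_categories(activity):
    """Rule: list the categories used that were not declared for the purpose."""
    allowed = declared_purposes.get(activity["purpose"], set())
    return sorted(activity["data_categories"] - allowed)


def build_report(monitored):
    """Keep only actionable events, i.e. activities that break the rule."""
    report = []
    for activity in monitored:
        extra = undeclared_categories(activity)
        if extra:
            report.append({
                "process": activity["process"],
                "purpose": activity["purpose"],
                "undeclared": extra,
            })
    return report


report = build_report(activities)
print(report)
```

In this example only `score_credit` appears in the report, because it uses the `health` category, which was not declared for credit scoring.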
  • 31. X. HOW TO GUARANTEE COMPLIANCE The Kensu way: Data Activity Manager Monitor Automated Registry Report
  • 32. II. GOVERNANCE a. Data in the Wild b. Effects of constraints x. How to govern
  • 33. A. DATA IN THE WILD Working on data is perceived as the Wild West. • Experimentation in highly dynamic environments (e.g. notebooks) • Local copies or duplication of datasets • Creation of intermediate dumps (models, prepared datasets)
  • 34. B. EFFECTS OF CONSTRAINTS Rules, laws, … Adding constraints (policies) to govern is a classic. Typical examples: - predefine the set of needed data - list the methods to be used - create documents… and maintain them
  • 35. B. EFFECTS OF CONSTRAINTS The consequences of such constraints are: • Lack of freedom • Anonymization: what about marketing use cases? How reliable is the process? Anonymization is itself a process to be listed! • Poor/slow reactivity to market changes (performance drop) … so constraints might not be the best approach
  • 36. X. HOW TO GOVERN For compliance reasons, we have to introduce monitoring. Monitoring data opens new governance doors: - Govern data activities with a bottom-up approach - Control vs. constrain. In other words, data governance in a data-driven fashion
  • 37. X. HOW TO GOVERN The Kensu way: Data Activity Manager
  • 38. THANKS! Q/A. http://kensu.io Analytics, AI Governance: Analytics, Governance, Performance, Compliance. Check out Kensu Data Activity Manager