www.kensu.io
DATA SCIENCE GOVERNANCE
Turn GDPR’s accountability principles into added value
for your business
Big Data Spain, 2017
KENSU & ME
ANDY PETRELLA
- CEO & Founder -
Mathematics
Computer Science
Started in Belgium by building an enterprise stack for Data Scientists (Agile Data Science Toolkit)
Pivoted to an internal component: the Data Science Catalog
Focus on Data Science Governance
Accelerated by Alchemist Accelerator in San Francisco and The Faktory in Belgium
Kensu Inc. in October!
Spark Notebook, O’Reilly Training, O’Reilly Book
TOPICS
1. Some thoughts on “Data Science”
2. Data Science Governance: What
3. Data Science Governance: How
4. GDPR: Accountability principle and transparency
5. Business advantages
SOME THOUGHTS ON “DATA SCIENCE”
MACHINE LEARNING
Pioneers in the 1950s
AI winter in the 1970s due to pessimism
Resurgence in the 1980s
Machine learning (and related techniques) has been in use since the 1990s (esp. SVM and RNN)
Deep learning sees widespread commercial use in the 2000s
Machine learning receives great publicity (read: buzz) in the 2010s
ref: https://en.wikipedia.org/wiki/Timeline_of_machine_learning
DATA SCIENCE: +ENGINEERING
Claim: “Data Scientist” coined by DJ Patil in 2008.
Pretty much when Machine Learning became part of software
In a way, this is when we added “engineering” to the mix
Also, engineering is even more prominent with Big Data and distributed
computing
DATA SCIENCE: +EXPERIMENTATION
So much data available
So many tools, libraries, frameworks, …
So many things we can try
We have distributed computing now, right? => Let’s try everything
Discover new insights (and potentially new businesses)
DATA SCIENCE: RECAP
Maths: stats, machine learning and so on
Engineering: ETL, databases, computing frameworks, software, platforms, …
Creativity: “From business intelligence to intelligent business” - Michael Fergusson
Data Science is an umbrella over all activities on data
DON’T BELIEVE ME?
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
DATA SCIENCE GOVERNANCE: WHAT
DATA PIPELINE
A data pipeline connects activities on data, potentially involving
several technologies.
A pipeline is generally thought of as an end-to-end processing line that solves
one problem.
But parts of pipelines are reused to save computation, storage, time, …
Thus, interdependency between pipeline segments grows with the number of initiatives
GOAL: TAKE DECISION
Data Pipelines, connected together, aren’t created for the beauty of it.
The ultimate goal is always to take decisions.
Decisions are generally taken by, or linked to, humans with responsibilities
(even for self-driving cars, in case of a problem).
Given that pipelines are cut, rewired, interleaved, …
how can we not be anxious when deploying the last piece used by the decision maker?
SOURCES OF ANXIETY
What if:
• one of the datasets used in the process suddenly shows different patterns?
• one of the tools, projects or similar is modified upstream?
• the insights deviate from reality?
• …
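The first "what if" can be made concrete with a very small check. The sketch below is only an illustration, assuming numeric data; the helper name `mean_shift_alert` and the threshold of 3 standard deviations are hypothetical choices, not something taken from the talk.

```python
from statistics import mean, stdev


def mean_shift_alert(reference, current, threshold=3.0):
    """Return True when the mean of `current` lies more than `threshold`
    reference standard deviations away from the reference mean.

    Hypothetical helper: name and threshold are illustrative assumptions.
    """
    ref_mean = mean(reference)
    ref_std = stdev(reference)
    if ref_std == 0:
        # Constant reference data: any change of the mean counts as a shift.
        return mean(current) != ref_mean
    z = abs(mean(current) - ref_mean) / ref_std
    return z > threshold


# Values seen while the pipeline was built vs. values arriving in production.
reference = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8]
stable = [10.1, 10.4, 9.9]
shifted = [25.0, 26.5, 24.0]

print(mean_shift_alert(reference, stable))
print(mean_shift_alert(reference, shifted))
```

A check on the mean alone misses many pattern changes (variance, missing values, new categories); it only shows where such monitoring would plug in.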
DEBUGGING?
To reduce the anxiety or, actually, to reduce the risks, we need ways to debug.
In pure engineering, we have unit, functional, integration tests, … but
what do we do when the problems come from the data itself?
We can’t generate all cases of data variations, right?
How to debug?
Without the big picture, we may try to optimise a model for weeks for nothing
DATA SCIENCE GOVERNANCE
Data Governance: controls that data meets precise standards, and
involves monitoring against production data.
Data Science Governance: controls that data activity meets precise
standards, and involves monitoring against production data activity.
A Data Activity is described by at least: technologies, users, systems,
data, processing
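As an illustration of that last definition, a Data Activity could be captured as a small record. This is a minimal sketch: the class name, field names and sample values are all assumptions made for the example, not Kensu's actual data model.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataActivity:
    """One observed activity on data. All field names are illustrative."""
    technology: str            # tool or framework used, e.g. "spark"
    user: str                  # who ran the activity
    system: str                # where it ran
    inputs: Tuple[str, ...]    # data that was read
    outputs: Tuple[str, ...]   # data that was produced
    processing: str            # short description of what was done


activity = DataActivity(
    technology="spark",
    user="alice",
    system="prod-cluster",
    inputs=("crm.customers",),
    outputs=("marts.customer_features",),
    processing="join and aggregate customer events",
)
print(activity.user, "ran", activity.technology, "on", activity.inputs)
```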
GOVERNING DATA SCIENCE
Who does what, on which data, and where?
What is the impact of a process on the global system?
What are the performance metrics (quality, execution,…) of the processes?
CONTINUOUS INTEGRATION FOR DATA SCIENCE
Data Scientists/Citizens have a view of all the activities applied to
the original sources used in their own processes.
They also have control over their own results in production.
They have the opportunity to analyse and debug a pipeline
involving all activities:
• independently of the technologies
• involving several people in the enterprise
DATA SCIENCE GOVERNANCE: HOW
CHALLENGES
So many tools are using data!
The number of processing jobs is growing impressively.
We have to take care of the legacy…
GET THE DATA
As usual, we have to collect the right data to take the right decisions.
First, run an assessment to create a high-level map of all the tools
involved in a company.
For each tool, do whatever it takes to collect information about the
activities it creates.
This information includes metadata, lineage, statistics, accuracy measures, …
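To give an idea of what "metadata and statistics" can mean for a single data source, here is a minimal sketch that profiles a list of records. The function name `profile_rows` and the chosen statistics are assumptions for illustration; real collectors would hook into each tool instead of receiving rows directly.

```python
from statistics import mean


def profile_rows(rows):
    """Collect simple metadata and statistics for a list of dict records."""
    columns = sorted({name for row in rows for name in row})
    profile = {}
    for name in columns:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        stats = {
            "type": type(present[0]).__name__ if present else "unknown",
            "count": len(values),
            "nulls": len(values) - len(present),
        }
        # bool is a subclass of int, so it is excluded explicitly.
        numeric = [v for v in present
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if numeric and len(numeric) == len(present):
            stats.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
        profile[name] = stats
    return profile


rows = [
    {"age": 31, "country": "BE"},
    {"age": 45, "country": "ES"},
    {"age": None, "country": "BE"},
]
profile = profile_rows(rows)
for column, stats in profile.items():
    print(column, stats)
```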
CONNECT THE DATA
Data Science Governance needs the global picture.
To get it, we need to connect all the data that can be collected,
so that it is possible to create a cartography of all ongoing processes.
This map tracks all data and their descendants
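One way to picture this map: treat every collected activity as a link from the data it reads to the data it writes, then walk the links. The sketch below assumes activities are available as (name, inputs, outputs) tuples; the activity and data source names are made up.

```python
from collections import defaultdict

# Each collected activity: (name, inputs, outputs). All names are hypothetical.
activities = [
    ("load_customers", ["oracle.customers"], ["lake.customers"]),
    ("build_features", ["lake.customers", "lake.events"], ["lake.features"]),
    ("train_model", ["lake.features"], ["models.churn"]),
]


def build_lineage(activities):
    """Map every data source to the data sources directly derived from it."""
    children = defaultdict(set)
    for _name, inputs, outputs in activities:
        for source in inputs:
            children[source].update(outputs)
    return children


def descendants(children, start):
    """All data sources reachable from `start` (iterative depth-first walk)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


lineage = build_lineage(activities)
print(sorted(descendants(lineage, "oracle.customers")))
```

Because the links come from activities rather than from one tool's internal plan, the same walk crosses technology boundaries (here, from an Oracle table to a trained model).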
USE THE DATA
This is where the fun part starts… the map of data activities is an
amazing source of information.
Here are a few things you can do with this kind of data:
• impact analysis
• dependency analysis
• optimisation
• recommendation
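Two of these uses can be sketched on a toy activity log: dependency analysis (what does a result rely on?) and a simple optimisation hint (which data sources are read most often and may be worth caching?). The record layout and all names are assumptions for the example.

```python
from collections import Counter

# (activity, inputs, outputs); hypothetical records for illustration.
activities = [
    ("clean_events", ["raw.events"], ["lake.events"]),
    ("daily_report", ["lake.events"], ["bi.daily_kpis"]),
    ("churn_features", ["lake.events", "crm.customers"], ["lake.features"]),
    ("churn_model", ["lake.features"], ["models.churn"]),
]


def upstream(activities, target):
    """Dependency analysis: every data source `target` depends on."""
    parents = {}
    for _name, inputs, outputs in activities:
        for out in outputs:
            parents.setdefault(out, set()).update(inputs)
    seen, stack = set(), [target]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def reuse_counts(activities):
    """Optimisation hint: how many activities read each data source."""
    return Counter(
        source for _name, inputs, _outputs in activities for source in inputs
    )


print(sorted(upstream(activities, "models.churn")))
print(reuse_counts(activities).most_common(1))
```

Impact analysis is the same walk in the opposite direction, from a data source towards everything derived from it.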
GDPR
General Data Protection Regulation
ACCOUNTABILITY PRINCIPLE
Implement appropriate technical and organisational measures that
ensure and demonstrate that you comply. This may include internal
data protection policies such as staff training, internal audits of
processing activities, and reviews of internal HR policies.
TRANSPARENCY
As well as your obligation to provide comprehensive, clear and
transparent privacy policies, if your organisation has more than 250
employees, you must maintain additional internal records of your
processing activities.
ACCOUNTABILITY: DATA SCIENCE GOVERNANCE
To govern data science, we have to:
• collect activities
• connect activities
With this information we can automatically and reliably create the
process registry
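A rough sketch of how a registry could fall out of collected activities: group them by process and list who was involved, with which technologies, on which data. The record fields and the function `build_registry` are hypothetical, and the result is not a legally complete GDPR record of processing; it only shows the aggregation step.

```python
import json

# Collected activities; field names and values are illustrative assumptions.
collected = [
    {"process": "churn_scoring", "user": "alice", "technology": "spark",
     "inputs": ["crm.customers"], "outputs": ["lake.features"]},
    {"process": "churn_scoring", "user": "bob", "technology": "tensorflow",
     "inputs": ["lake.features"], "outputs": ["models.churn"]},
    {"process": "daily_report", "user": "carol", "technology": "sql",
     "inputs": ["lake.events"], "outputs": ["bi.daily_kpis"]},
]


def build_registry(activities):
    """Aggregate activities into one registry entry per process."""
    registry = {}
    for act in activities:
        entry = registry.setdefault(act["process"], {
            "users": set(),
            "technologies": set(),
            "data_read": set(),
            "data_written": set(),
        })
        entry["users"].add(act["user"])
        entry["technologies"].add(act["technology"])
        entry["data_read"].update(act["inputs"])
        entry["data_written"].update(act["outputs"])
    # Sort sets into lists so the registry is stable and JSON-serialisable.
    return {
        process: {key: sorted(values) for key, values in entry.items()}
        for process, entry in sorted(registry.items())
    }


registry = build_registry(collected)
print(json.dumps(registry, indent=2))
```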
TRANSPARENCY: DATA SCIENCE GOVERNANCE
To govern data science seen as a continuous integration solution,
we have to explain and measure activities independently of the
technologies.
With this information we can reliably create transparent reports of
activities across the whole chain of processing
CONSEQUENCES
Connect data and business
Spoiler alert: one-liner ahead
DATA TO BUSINESS
Business KPIs are nothing but data!
BUSINESS TO DATA
Change the business to match the data
ADAPT!
KENSU
Taking the idea further
SOLUTION: DATA SCIENCE ON DATA SCIENCE
[Diagram labels: Data: Oracle; Activity: Tensorflow; (*) collect activities metadata; performance optimisations; Data Science Governance; Compliance; Performance]
OUR PRODUCT: KENSU DATA ACTIVITY MANAGEMENT
Data Science Governance
First Governance, Compliance and Performance solution for Data Science

Feature: Connect.Collect.Learn
Benefit: Automatically captures all data science relevant activities related to governance, compliance and performance within a given domain.
Why it matters: Provides end-to-end control and insights into all relevant aspects of data science related activities.
Tag: #GDPR

Feature: DPO Dashboard
Benefit: One-stop control center for all potential data privacy violations.
Why it matters: Near-realtime notifications and actionable intelligence on the current state of “compliance health”.
Tag: #GDPR

Feature: Compliance Reporting
Benefit: One-click reports for all relevant governance and compliance reports.
Why it matters: Guarantees a good relationship with the authorities in charge by respecting their templates.
Tag: #GDPR
DATA SCIENCE GOVERNANCE
Andy Petrella
CEO Co Founder
@noootsab
@kensuio

Business added value of GDPR's accountability principles
