DELIVERING ANALYTICS
AT SCALE WITH A
GOVERNED DATA LAKE
Jean-Michel Franco
Sr Director for Data Governance Products
@jmichel_franco
AGENDA
01 A data governance model for the Data Lake
02 Building the platform for the governed data lake
03 Use Case: Establishing GDPR/Data Privacy compliance in a customer 360° lake
DATA ECONOMICS ARE BROKEN
TRADITIONAL APPROACH CAN NO LONGER KEEP UP
[Chart, 2012 to 2021: a WIDENING GAP between Data & business expectations and Delivery Capabilities. Labels: cloud, machine learning, IoT, People]
• 3X growth rate in self-service users
• Data doubling every 2 years
WHY TRADITIONAL APPROACHES FAILED WITH BIG DATA
Old Model -> Too Few People Access Too Few Data
[Diagram: Traditional Data Warehouse with Authoritative Governance. Labels: Restricted Data Access; Limited User Reach; Governance; Costs, Time to market, Scalability]
WHY MOST DATA LAKE APPROACHES WILL FAIL
New model -> Struggling To Control The Data Sprawl
[Diagram: Any Data (Open Data, Hadoop & NoSQL, Traditional Data Sources, Streams, Enterprise Apps, Cloud) lands as Raw Data. Any Data Worker (Data Scientists, Business Analysts, Operations) is shown with its own INGEST > CURATE > MANAGE > CONSUME chain toward Smart Data, with GOVERN alongside. Labels: Costs, Time to value, Scalability; Governance, Risks]
BALANCING THE GOVERNANCE ‘SEE-SAW’
Control / Autonomy
COLLABORATIVE GOVERNANCE FROM THE GET GO
Scaling Trust And Reach Through Collaboration
[Diagram: Any Data (Open Data, Hadoop & NoSQL, Traditional Data Sources, Streams, Enterprise Apps, Cloud); Raw Data; IT's Sanctioned Content; Business Crowdsourced Content; Smart Data; Any Data Worker. Labels: Costs, Time to value, Scalability, Governance]
PAVING THE ROAD FOR THE
GOVERNED DATA LAKE
A 5-STEP APPROACH FOR MODERN DATA GOVERNANCE
• Establish Data Quality Upfront
• Unleash Data as a Service for People and Apps
• Capture & Document Any Data Sources
• Take Control & Protect
• Foster accountability
[Diagram labels: Data Engineers, Business Users, Data Scientists, Customers, Applications, API]
WHAT ARE THE RELATED 5 DATA MGMT DISCIPLINES?
• Know Your Data: Data Cataloging & Metadata Management
• Build the 360° view: Data Quality & Master Data Management
• Protect and govern your Data: Data Anonymization, Policy Enforcement, Data Lineage
• Foster Accountabilities: Data Stewardship
• Publish data in a controlled way: Data Cataloging and API
BOOSTING DATA USAGE TENFOLD AT ISO-BUDGET WITH A CLOUD DATA-LAKE
Goals & Benefits
• Expanding data usage and reaping the benefits of data monetization
• Turbo-charging analytics from batch to near real time
• Establishing end-to-end security and compliance (MIFID, GDPR…)
• Improving data accessibility -> from 45 days to 1 day for provisioning a data lab
[Diagram: Data Lake components: Catalog & Search; Access & UI; Processing and Analytics; Governance & Security; Data Ingestion & Integration]
USE CASE: THE GDPR DATA LAKE
MOST COMPANIES FAIL, BADLY
Policies are defined…
• 98% HAVE UPDATED THEIR PRIVACY POLICIES FOR GDPR
But are not enforced… or poorly delivered
• 70% FAILED TO PROVIDE THE DATA REQUESTED!
• 21 days: AVG TIME IT TOOK COMPLIANT COMPANIES TO RESPOND
THE ROAD TO COMPLIANCE: WHY DO COMPANIES FAIL?
• No established accountability
• Unautomated process
• A legal process, rather than a customer service engagement
• People as human data integrators
• No control over personal data
• Reluctance to share the data
CAPTURE AND TRACK PERSONAL DATA
Data Management Discipline: Data Cataloging
The road to success (1/5)
FOSTER ACCOUNTABILITY
Data Management Discipline: Data Stewardship
The road to success (2/5)
6% ASKED FOR A DELAY EXTENSION
RECONCILE DATA
Data Management Discipline: Customer 360/MDM
The road to success (3/5)
ENFORCE COMPLIANCE
Data Management Discipline: Data Masking
The road to success (4/5)
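As an illustration of the data masking discipline named on this slide, here is a minimal Python sketch, not the method used in the talk. It assumes a customer record with hypothetical `name`, `email` and `country` fields, replaces the email with a keyed hash so records can still be joined, and shows only a partially masked address. The key value, the field names and the function names are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical key for the example; a real deployment would load it from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed hash (HMAC-SHA256) of a normalized value."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def mask_email(email: str) -> str:
    """Keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = email.partition("@")
    if not local or not domain:
        return "***"
    return f"{local[0]}***@{domain}"


def mask_record(record: dict) -> dict:
    """Return a copy of the record that is safer to expose to analytics users."""
    masked = dict(record)
    masked["customer_id"] = pseudonymize(record["email"])  # join key without the raw email
    masked["email"] = mask_email(record["email"])
    masked.pop("name", None)  # drop the direct identifier entirely
    return masked


if __name__ == "__main__":
    print(mask_record({"name": "Ada", "email": "Ada@Example.com", "country": "FR"}))
```

The keyed hash keeps the same customer linkable across data sets without exposing the address, while non-personal fields such as `country` pass through unchanged.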
MAKE DATA AVAILABLE FOR DATA SUBJECTS
Data Management Discipline: Data Services
The road to success (5/5)
7% RESPONDED IN < 1 DAY
