Data Governance & Multi-Tenancy
A Tech Perspective
Sathish K S
VP Engineering
Zeotap
Intro
ZEOTAP
 Customer Intelligence Platform – SAAS, DAAS & Combined offering
 Enables Brands to better understand their customers
o 360 View
o Identity resolution
o Activation
 Data Assets – People centric including from Telcos
 Fully Privacy/GDPR compliant
 Catering to Ad-Tech and MarTech
DATA GOVERNANCE
 Data Principles – Data as an asset and linkage with business
 Data Quality – Standards the asset needs to meet
 Meta Data - Semantics of the asset
 Data Access - Permissions
 Data Lifecycle – States and management
KHATRI & BROWN - 2010
DATA GOVERNANCE
 Cross-Functional Framework
 Manage Data as an Asset
o Access Rights
o Processing Rights
o Activation
 Policies & Standards
 Compliance Adherence
 Monitoring and Auditing
 Acts on Information and not on Data
 You Cannot Govern What You Don’t Know
Features
DATA GOVERNANCE - General
 Explore
 Provision to assign ownership
 Provision for knowledge attribution - lineage
 Provision for access rights
 Categorization and Use case
 Apply Rules
 Metering outflow
 Derivation / Inference history
Goals
 Democratize
 Secure
 Compliance
 Quality
DATA GOVERNANCE - Accommodate
 Changing Regulations
 More variety in data
o AI Generated
o Biometrics PII
 Country
 Data ownership flux
Basic Tech Building Blocks
Data Assets Inventory
• Location
• Asset Type
• Category
• Version
• Source
Data Inventory
• Schema
• Data Types
• Tags
• Cardinality
• Values
• Quality Stats
Data Security
• Access Control
• Encryption
• Masking
• Erasure
Data Lineage
• Knowledge source
• Timestamp
• Weights
Basic Tech Building Blocks
Data Quality
• Stats
• Freshness
• Density
• Anomaly
• Dedup
• Quality Stats
Policy Management
• Compliance
• Consent
• Thresholds
• Lifecycle
• Actions
Audit & Monitoring
• Notifications
• Logging
• Analytics
Throw in Multi-Tenancy
DATA GOVERNANCE – MULTI-TENANCY
 Organizational
 Vertical
o Regulations
 Data Domain
o Data Profile / Quality
o Security
o Lifecycle
o Processing
o Storage
o Metadata
DATA GOVERNANCE – MULTI-TENANCY
 Application or usage Context
o Analytics
o Derivations
o Marketing
o Personalization
 Multi-Tenant Architectural Implications
o Shared Everything
o Shared Nothing
o Shared Stateless Control Planes
In Essence
DATA GOVERNANCE
IS BECOMING
A BIG DATA PROBLEM IN ITS OWN RIGHT
Building The Bits
Catalog Evolution
What is in my data? - Schema
Who gave the data? - Lineage
How do you describe? - Raw or Inferred
Where all is it stored, when? - Location, Version
How is it used? - Purpose
Who owns it? - Access
Who uses it? - Roles
What rules govern? - Governance
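The questions above map fairly directly onto the fields of a catalog record. A minimal sketch follows, assuming a hypothetical `CatalogEntry` type; the field names, the asset name and the storage path are all illustrative and not taken from any specific catalog product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    """One data asset in a hypothetical catalog; all field names are illustrative."""
    name: str
    schema: Dict[str, str]        # What is in my data?
    lineage: List[str]            # Who gave the data?
    inferred: bool                # Raw or inferred?
    locations: Dict[str, str]     # Where is it stored, when? (version -> location)
    purposes: List[str]           # How is it used?
    owner: str                    # Who owns it?
    roles: List[str] = field(default_factory=list)     # Who uses it?
    policies: List[str] = field(default_factory=list)  # What rules govern?

    def unanswered(self) -> List[str]:
        """Return the catalog questions this entry still leaves open."""
        checks = {
            "schema": self.schema, "lineage": self.lineage,
            "locations": self.locations, "purposes": self.purposes,
            "owner": self.owner, "roles": self.roles, "policies": self.policies,
        }
        return [key for key, value in checks.items() if not value]


entry = CatalogEntry(
    name="telco_profiles",                      # hypothetical asset
    schema={"user_id": "string", "age_band": "string"},
    lineage=["telco_partner_a"],
    inferred=False,
    locations={"v1": "s3://example-bucket/telco_profiles/v1/"},
    purposes=["analytics"],
    owner="data-platform",
)
print(entry.unanswered())
```

An entry that still has open questions (here roles and governing policies) is a natural candidate for an onboarding check before the asset is exposed to tenants.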
POLICY Management & Catalog
 Policy and Actions
 Policy Types – schema, value, record…
 Action – Drop action, Alert action, Null Action…
 Hierarchy support
 Runtime parameters like blacklist & whitelist
 Thresholds
 F(Policy) + G(Param) = Action to be actuated
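The F(Policy) + G(Param) = Action idea can be sketched as a policy definition that is only resolved into an action once runtime parameters are supplied. This is an illustrative sketch, not the deck's implementation: the `Policy` type, the `evaluate` function and the example policies are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple

Record = Dict[str, Any]


@dataclass(frozen=True)
class Policy:
    """A value-level policy: a violation test plus the action to take."""
    name: str
    violates: Callable[[Record, Dict[str, Any]], bool]  # (record, params) -> bool
    action: str                         # "drop", "alert" or "null"
    target_field: Optional[str] = None  # field blanked out by the "null" action


def evaluate(policy: Policy, params: Dict[str, Any],
             record: Record) -> Tuple[str, Optional[Record]]:
    """Combine a policy with runtime parameters; return (action, resulting record)."""
    if not policy.violates(record, params):
        return "pass", record
    if policy.action == "drop":
        return "drop", None
    if policy.action == "null":
        # Return a copy so the input record is never mutated.
        return "null", {**record, policy.target_field: None}
    return "alert", record


country_policy = Policy(
    name="blocked-country",
    violates=lambda rec, p: rec.get("country") in p["blacklist"],
    action="drop",
)
age_policy = Policy(
    name="minimum-age",
    violates=lambda rec, p: rec.get("age", 0) < p["threshold"],
    action="null",
    target_field="age",
)

record = {"user": "u1", "country": "DE", "age": 15}
print(evaluate(country_policy, {"blacklist": {"XX"}}, record))
print(evaluate(age_policy, {"threshold": 16}, record))
```

Keeping the blacklist and threshold outside the policy definition means the same policy can be reused per tenant or per country with different parameters.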
Lifecycle & Ownership
Collect
• Source
• Provider
Process
• Processor
• Auditor
Analyze
• Analyst
• Validator
Share
• Consumer
• Sharer
EOL
• Source
• Provider
Owner responsible for assigning permissions for usage
Continuous Quality Measurement
Problem
 Quality linked to Usage or Value Creation
 Monitor, Detect, Propagate
Tools
 AWS Deequ – a cloud vendor giving first-class citizenship to data quality; open source
 Apache Griffin - Platform style more than library style
 Datacleaner – https://datacleaner.org/
 Great Expectations - https://greatexpectations.io/ - You may be surprised!
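To make "Monitor, Detect" concrete, here is a library-independent sketch of three of the quality stats named earlier (density, dedup, freshness) checked against thresholds. It deliberately does not use the API of any tool listed above; `quality_stats`, `breaches` and the sample rows are hypothetical, and it assumes every row is a non-empty dict.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def quality_stats(rows: List[Dict[str, Any]], key: str, ts_field: str,
                  now: datetime) -> Dict[str, float]:
    """Compute density, duplicate ratio and freshness for a batch of records."""
    if not rows:
        return {"density": 0.0, "duplicate_ratio": 0.0,
                "freshness_hours": float("inf")}
    cells = [value for row in rows for value in row.values()]
    density = sum(value is not None for value in cells) / len(cells)
    duplicate_ratio = 1 - len({row[key] for row in rows}) / len(rows)
    newest = max(row[ts_field] for row in rows)
    freshness_hours = (now - newest).total_seconds() / 3600
    return {"density": density, "duplicate_ratio": duplicate_ratio,
            "freshness_hours": freshness_hours}


def breaches(stats: Dict[str, float], minimums: Dict[str, float],
             maximums: Dict[str, float]) -> List[str]:
    """Return the names of stats that fall outside their thresholds."""
    low = [name for name, limit in minimums.items() if stats[name] < limit]
    high = [name for name, limit in maximums.items() if stats[name] > limit]
    return low + high


utc = timezone.utc
rows = [
    {"id": "a", "email": "x@example.com", "seen": datetime(2024, 1, 1, 12, tzinfo=utc)},
    {"id": "a", "email": None, "seen": datetime(2024, 1, 1, 18, tzinfo=utc)},
    {"id": "b", "email": "y@example.com", "seen": datetime(2024, 1, 1, 6, tzinfo=utc)},
    {"id": "c", "email": None, "seen": datetime(2024, 1, 1, 0, tzinfo=utc)},
]
stats = quality_stats(rows, key="id", ts_field="seen",
                      now=datetime(2024, 1, 2, 0, tzinfo=utc))
print(stats)
print(breaches(stats, {"density": 0.9},
               {"duplicate_ratio": 0.1, "freshness_hours": 24}))
```

The "Propagate" step would then attach these stats to the catalog entry or raise a notification; that part is left out of the sketch.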
Lineage Problem – DAAS Multitenancy
 Who is the knowledge contributor?
 Challenge
o Multiple Data sources
o Conflicting attributes
o Derivation models
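One way to reason about "who is the knowledge contributor" when sources disagree is to keep each claim with its source, weight and timestamp (the lineage fields listed in the building blocks) and resolve conflicts explicitly. The sketch below uses a simple weighted vote with a recency tie-break; this resolution rule, the `Claim` type and the source names are assumptions for illustration, not the approach described in the deck.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Claim(NamedTuple):
    source: str      # knowledge contributor
    value: str       # claimed attribute value
    weight: float    # trust assigned to the source (assumed to be given)
    timestamp: int   # epoch seconds of the claim


def resolve(claims: List[Claim]) -> Dict[str, object]:
    """Pick the value with the highest total weight; the newest claim breaks ties."""
    if not claims:
        raise ValueError("at least one claim is required")
    score: Dict[str, float] = defaultdict(float)
    newest: Dict[str, int] = defaultdict(int)
    contributors: Dict[str, List[str]] = defaultdict(list)
    for claim in claims:
        score[claim.value] += claim.weight
        newest[claim.value] = max(newest[claim.value], claim.timestamp)
        contributors[claim.value].append(claim.source)
    winner = max(score, key=lambda value: (score[value], newest[value]))
    return {"value": winner, "contributors": contributors[winner]}


claims = [
    Claim("telco_a", "female", 0.6, 100),
    Claim("app_b", "male", 0.5, 200),
    Claim("model_c", "female", 0.3, 150),  # a derivation model is also a source
]
result = resolve(claims)
print(result)
```

Because the winning value carries its contributor list, attribution (and metering of outflow per contributor) stays possible after the conflict is resolved.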
Catalog – On Steroids
 Interface for search
 Lookups
 Multi-relational
 Transitivity
 Updates
o Registration
o Onboarding
o Processing output
o Futuristic AI / ML based
To Explore
 Tooling is maturing at rapid pace
Software beyond cloud-native offerings
 Poster child – Amundsen from Lyft
 Challenger – Apache Atlas
o Extensible Entity and Type System
o Backed by GraphDB (Janus) & Elastic
 Upcoming
o Atlan – Definitely Check This out - https://atlan.com/
o Marquez Project – Also this - https://marquezproject.github.io/marquez/
Tools Evaluation
 Storage integrations
 Intelligent crawling
 APIs to push metadata from processing systems
 Event based integrations
 Self serve UI & APIs
 Bulk export capabilities for AI/ML on metadata
Security - Infra
 Principles of Zero Trust and Positive Security
 Perimeter
o VPN in cloud
o Apache Knox
 Access – Attribute based access control
 Masking
 Encryption/Hashing
 Key Management – important in data exchange
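The access, masking and hashing bullets can be illustrated with a few small functions. This is a sketch under assumptions: the attribute names (`tenant`, `actions`, `purposes`), the masking format and the hard-coded key are hypothetical, and a real deployment would fetch keys from a key management service rather than embed them.

```python
import hashlib
import hmac
from typing import Any, Dict


def pseudonymize(value: str, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256): identifiers stay joinable but are not readable."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_email(email: str) -> str:
    """Keep the first character and the domain, hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def is_allowed(subject: Dict[str, Any], resource: Dict[str, Any], action: str) -> bool:
    """Attribute-based check: decide from subject and resource attributes."""
    return (
        subject["tenant"] == resource["tenant"]
        and action in subject["actions"]
        and resource["purpose"] in subject["purposes"]
    )


key = b"demo-key-do-not-use"  # placeholder; in practice supplied by key management
analyst = {"tenant": "brand_x", "actions": {"read"}, "purposes": {"analytics"}}
dataset = {"tenant": "brand_x", "purpose": "analytics"}

print(mask_email("alice@example.com"))
print(pseudonymize("alice@example.com", key)[:16])
print(is_allowed(analyst, dataset, "read"))
```

When two parties exchange data, both must hash with the same key for identifiers to match, which is why key management matters in data exchange.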
Audit Service
■ Define a grammar – queryable using popular SQL-on-Hadoop engines
■ Cloud storage – S3, GCS
■ Log grammar to include
○ ViolationType
○ Product Code
○ Dataflow stage
○ Action taken
○ Action Timestamp
○ Violation Timestamp
○ ++
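The log grammar above can be sketched as a fixed record type serialised as one JSON object per line, a layout that SQL engines over object storage can commonly query. The `AuditRecord` type, the snake_case field names and the sample values are assumptions; the slide only names the fields.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """One audit event; fields follow the grammar listed above."""
    violation_type: str
    product_code: str
    dataflow_stage: str
    action_taken: str
    action_timestamp: str      # ISO 8601, UTC
    violation_timestamp: str   # ISO 8601, UTC


def to_json_line(record: AuditRecord) -> str:
    """Serialise a record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


detected = datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc)
acted = datetime(2024, 1, 1, 10, 0, 5, tzinfo=timezone.utc)
record = AuditRecord(
    violation_type="consent_missing",   # hypothetical values throughout
    product_code="daas",
    dataflow_stage="process",
    action_taken="drop",
    action_timestamp=acted.isoformat(),
    violation_timestamp=detected.isoformat(),
)
line = to_json_line(record)
print(line)
```

Keeping both timestamps makes it possible to query the delay between a violation and the action taken on it.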
Declarative Data Workflows
Problem: Governance has to react to change, so pipelines must be dynamic and extensible
 Invest in declarative data pipelines and workflows (workflow as code)
 Invest in Functional repository of Pure Functions
Tools
 Poster child – Airflow
 Upcoming
o Prefect – parameterized dynamic workflows, irregular / no schedule - https://github.com/prefecthq/prefect
o A-La-Mode – emphasis on output versioning and success detection - https://github.com/binaryaffairs/a-la-mode
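A tool-agnostic sketch of what "declarative" buys here: the workflow is plain data, and the execution order is derived from it, so adding a governance step is a data change rather than a code change. It uses only the Python standard library (`graphlib`, Python 3.9+) and does not reflect the API of Airflow, Prefect or A-La-Mode; the step names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

# The workflow is declared as data: each step lists the steps it needs.
WORKFLOW: Dict[str, Dict[str, List[str]]] = {
    "ingest": {"needs": []},
    "apply_policies": {"needs": ["ingest"]},
    "quality_check": {"needs": ["apply_policies"]},
    "audit": {"needs": ["apply_policies"]},
    "publish": {"needs": ["quality_check", "audit"]},
}


def execution_order(workflow: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Derive a valid run order from the declared dependencies."""
    graph = {name: set(spec["needs"]) for name, spec in workflow.items()}
    return list(TopologicalSorter(graph).static_order())


order = execution_order(WORKFLOW)
print(order)

# Reacting to a new regulation: declare one more step and rewire a dependency.
updated = {**WORKFLOW,
           "consent_filter": {"needs": ["ingest"]},
           "apply_policies": {"needs": ["consent_filter"]}}
print(execution_order(updated))
```

A cyclic declaration would raise `graphlib.CycleError`, which gives a cheap validation step before a workflow is accepted.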
Accumulate Function Repo
Pure Function Repositories with Functional Assets to perform
 Enrich the data
 Standardize
 Normalize
 Apply a Rule
 Find Pattern (Regular Expression)
 Sort
 Filter
 Merge
 Transpose
 Parse
 Transform Data Types
 Missing Data Handling
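A function repository of this kind can be sketched as a registry of pure record-to-record functions that pipelines compose by name. The registry, the decorator and the three example functions below are hypothetical illustrations of the idea, covering standardize, transform data types and missing data handling.

```python
from functools import reduce
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
REGISTRY: Dict[str, Callable[[Record], Record]] = {}


def register(name: str):
    """Add a pure function to the repository under a stable name."""
    def decorator(fn: Callable[[Record], Record]):
        REGISTRY[name] = fn
        return fn
    return decorator


@register("standardize_country")
def standardize_country(rec: Record) -> Record:
    return {**rec, "country": str(rec.get("country", "")).strip().upper()}


@register("parse_age")
def parse_age(rec: Record) -> Record:
    raw = rec.get("age")
    return {**rec, "age": int(raw) if str(raw).isdigit() else None}


@register("fill_missing_segment")
def fill_missing_segment(rec: Record) -> Record:
    return {**rec, "segment": rec.get("segment") or "unknown"}


def compose(names: List[str]) -> Callable[[Record], Record]:
    """Build a pipeline from registered function names, applied left to right."""
    steps = [REGISTRY[name] for name in names]
    return lambda rec: reduce(lambda acc, step: step(acc), steps, rec)


pipeline = compose(["standardize_country", "parse_age", "fill_missing_segment"])
raw = {"country": " de ", "age": "34"}
clean = pipeline(raw)
print(clean)
```

Because every function returns a new record and never mutates its input, steps can be reordered, tested in isolation, and referenced by name from a declarative workflow.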
Ref Architectures
Where is Zeotap
 SAAS – shared nothing
 DAAS – shared everything
 ID resolution product – shared control plane
 Provisioning System
 Heavily borrowed from SDN concepts
 Testable IAAC with
o SLAs for slowly created assets & quick assets
o Error handling
o Transactional
o Ease of Billing
Path To Self Serve
 Provisioning System
o Support – Shared Nothing, Shared Control Plane, Shared Resources style
o Cloud APIs
 Workflow as a Service
o In house control plane – Mr Sheperd & Kingpin based on Apache Camel
o Abstraction over Airflow, Oozie and Livy
 Self serve Compliance and Consent
o Supported Regulations – GDPR, CCPA
o Frameworks – TCF, TCF 2.0
o Add your own rules (DSL)
Provisioning System - Evolving
Thank You
● Share this session on social media using #DataLakeSummit
● Interact with other attendees on Slack at datalake-summit.slack.com


Data governance datalakes_multitenancy

  • 1. Data Governance & Multi-Tenancy: A Tech Perspective. Sathish K S, VP Engineering, Zeotap
  • 3. ZEOTAP: Customer Intelligence Platform (SAAS, DAAS and combined offering); enables brands to better understand their customers (360 view, identity resolution, activation); data assets are people-centric, including from telcos; fully Privacy/GDPR compliant; catering to Ad-Tech and MarTech
  • 4. DATA GOVERNANCE (Khatri & Brown, 2010): Data Principles - data as an asset and linkage with business; Data Quality - need for standards of the asset; Metadata - semantics of the asset; Data Access - permissions; Data Lifecycle - states and management
  • 5. DATA GOVERNANCE: cross-functional framework; manage data as an asset (access rights, processing rights, activation); policies & standards; compliance adherence; monitoring and auditing; acts on information and not on data; you cannot govern what you don't know
  • 7. DATA GOVERNANCE - General. Features: explore; provision to assign ownership; provision for knowledge attribution (lineage); provision for access rights; categorization and use case; apply rules; metering outflow; derivation / inference history. Goals: democratize; secure; compliance; quality
  • 8. DATA GOVERNANCE - Accommodate: changing regulations; more variety in data (AI generated, biometrics PII); country; data ownership flux
  • 9. Basic Tech Building Blocks. Data Assets Inventory: location, asset type, category, version, source. Data Inventory: schema, data types, tags, cardinality, values, quality stats. Data Security: access control, encryption, masking, erasure. Data Lineage: knowledge source, timestamp, weights
  • 10. Basic Tech Building Blocks. Data Quality: stats, freshness, density, anomaly, dedup, quality stats. Policy Management: compliance, consent, thresholds, lifecycle, actions. Audit & Monitoring: notifications, logging, analytics
  • 12. DATA GOVERNANCE - Multi-Tenancy: organizational; vertical (regulations); data domain (data profile / quality, security, lifecycle, processing, storage, metadata)
  • 13. DATA GOVERNANCE - Multi-Tenancy: application or usage context (analytics, derivations, marketing, personalization); multi-tenant architectural implications (shared everything, shared nothing, shared stateless control planes)
  • 14. In Essence: DATA GOVERNANCE IS BECOMING A BIG DATA PROBLEM IN ITS OWN RIGHT
  • 16. Catalog Evolution: What is in my data? (schema); Who gave the data? (lineage); How do you describe it? (raw or inferred); Where is it stored, and when? (location, version); How is it used? (purpose); Who owns it? (access); Who uses it? (roles); What rules govern it? (governance)
  • 17. POLICY Management & Catalog: policy and actions; policy types (schema, value, record…); actions (drop action, alert action, null action…); hierarchy support; runtime parameters like blacklist & whitelist; thresholds; F(Policy) + G(Param) = action to be actuated
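The "F(Policy) + G(Param) = action" idea from slide 17 can be sketched as follows. This is a minimal illustration, not Zeotap's implementation: the `Policy`, `Params`, `decide` and `actuate` names are hypothetical, only value-level policies are covered, and the assumption that a whitelist overrides a blacklist is ours.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical value-level policy: the field it guards and the action on a hit."""
    name: str
    field: str
    action: str  # "drop", "alert" or "null" (action names taken from the slide)


@dataclass(frozen=True)
class Params:
    """Runtime parameters, supplied separately from the policy definition."""
    blacklist: frozenset = frozenset()
    whitelist: frozenset = frozenset()


def decide(policy: Policy, params: Params, record: dict) -> Optional[str]:
    """Combine the policy with runtime parameters; return the action or None."""
    value = record.get(policy.field)
    if value in params.whitelist:  # assumption: whitelist wins over blacklist
        return None
    if value in params.blacklist:
        return policy.action
    return None


def actuate(policy: Policy, params: Params, record: dict) -> Optional[dict]:
    """Apply the decided action; a dropped record is returned as None."""
    action = decide(policy, params, record)
    if action == "drop":
        return None
    if action == "null":
        return {**record, policy.field: None}
    # "alert" or no action: pass the record through unchanged
    # (emitting the alert itself is left to the caller in this sketch).
    return record
```

Keeping the policy definition and the runtime parameters in separate objects means the same policy can be reused per tenant with different blacklists or whitelists.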
  • 18. Lifecycle & Ownership. Collect: source, provider. Process: processor, auditor. Analyze: analyst, validator. Share: consumer, sharer. EOL: source, provider. The owner is responsible for assigning permissions for usage
  • 19. Continuous Quality Measurement. Problem: quality is linked to usage or value creation; monitor, detect, propagate. Tools: AWS Deequ (a cloud provider giving first-class citizenship to quality; open sourced); Apache Griffin (platform style more than library style); DataCleaner (https://datacleaner.org/); Great Expectations (https://greatexpectations.io/), you may be surprised!
  • 20. Lineage Problem - DAAS Multi-Tenancy: Who is the knowledge contributor? Challenges: multiple data sources; conflicting attributes; derivation models
  • 21. Catalog - On Steroids: interface for search; lookups; multi-relational; transitivity; updates (registration, onboarding, processing output, futuristic AI / ML based)
  • 22. To Explore: tooling is maturing at a rapid pace; software beyond cloud natives. Poster child: Amundsen from Lyft. Challenger: Apache Atlas (extensible entity and type system; backed by the graph DB JanusGraph and Elastic). Upcoming: Atlan, definitely check this out (https://atlan.com/); Marquez Project, also this (https://marquezproject.github.io/marquez/)
  • 23. Tools Evaluation: storage integrations; intelligent crawling; APIs to push metadata from processing systems; event-based integrations; self-serve UI & APIs; bulk export capabilities for AI/ML on metadata
  • 24. Security - Infra: principles of Zero Trust and Positive Security; perimeter (VPN in cloud, Apache Knox); access (attribute-based access control); masking; encryption/hashing; key management (important in data exchange)
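The masking and hashing items on slide 24 could look roughly like the sketch below. It assumes keyed hashing (HMAC-SHA256 from the Python standard library) for pseudonymizing identifiers and a simple display mask for emails; the function names and the normalization step are illustrative choices, not something the talk specifies.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Keyed hash of an identifier; the same key and value give the same token."""
    normalized = value.strip().lower().encode("utf-8")  # assumed normalization
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def mask_email(email: str) -> str:
    """Keep the first character and the domain; mask the rest of the local part."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        # Not an email-shaped value: mask everything.
        return "*" * len(email)
    return local[0] + "*" * (len(local) - 1) + "@" + domain
```

Because the token depends on the key, two parties can only join on hashed identifiers if they share the key, which is one reason key management matters in data exchange.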
  • 25. Audit Service: define a grammar, queryable using popular SQL-on-Hadoop engines; cloud storage (S3, GCS); log grammar to include ViolationType, Product Code, Dataflow stage, Action taken, Action Timestamp, Violation Timestamp, and more
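One way to pin down the audit log grammar from slide 25 is a fixed record type serialized as one JSON object per line, a layout that SQL engines over object storage can typically read. The field names mirror the slide; the `AuditEvent` class, the JSON-lines choice and the ISO 8601 timestamp strings are assumptions made for this sketch.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical audit record; fields follow the grammar listed on the slide."""
    violation_type: str
    product_code: str
    dataflow_stage: str
    action_taken: str
    violation_timestamp: str  # assumed ISO 8601 string, UTC
    action_timestamp: str     # assumed ISO 8601 string, UTC


def to_json_line(event: AuditEvent) -> str:
    """Serialize one event as a single JSON line with stable key order."""
    return json.dumps(asdict(event), sort_keys=True)
```

A usage example: `to_json_line(AuditEvent("blacklisted_value", "daas", "process", "drop", "2020-01-01T00:00:00Z", "2020-01-01T00:00:01Z"))` yields one line that can be appended to a log object in S3 or GCS.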
  • 26. Declarative Data Workflows. Problem: governance must react to changes, be dynamic and extensible. Invest in declarative data pipelines and workflows (workflow as code); invest in a functional repository of pure functions. Tools: poster child, Airflow; upcoming: Prefect (parameterized dynamic workflows, irregular / no schedule; https://github.com/prefecthq/prefect) and A-La-Mode (emphasis on output versioning and success detection; https://github.com/binaryaffairs/a-la-mode)
  • 27. Accumulate Function Repo: pure function repositories with functional assets to enrich the data, standardize, normalize, apply a rule, find a pattern (regular expression), sort, filter, merge, transpose, parse, transform data types, and handle missing data
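A pure function repository of the kind slide 27 describes can be approximated with a name-to-function registry, so that a pipeline is declared as a list of step names rather than hand-written code. Everything here (`REGISTRY`, `register`, `run_pipeline`, and the two example functions) is a hypothetical sketch rather than the speaker's design.

```python
from typing import Callable, Dict, List

Record = dict
REGISTRY: Dict[str, Callable[[Record], Record]] = {}


def register(name: str):
    """Decorator that adds a pure record-to-record function to the repository."""
    def decorator(fn: Callable[[Record], Record]):
        REGISTRY[name] = fn
        return fn
    return decorator


@register("standardize_email")
def standardize_email(record: Record) -> Record:
    """Standardize: trim and lowercase the email, returning a new record."""
    email = record.get("email")
    if email is None:
        return record
    return {**record, "email": email.strip().lower()}


@register("fill_missing_country")
def fill_missing_country(record: Record) -> Record:
    """Missing data handling: default an absent or empty country to 'unknown'."""
    return {**record, "country": record.get("country") or "unknown"}


def run_pipeline(steps: List[str], record: Record) -> Record:
    """Apply the named functions in order; the input record is never mutated."""
    for step in steps:
        record = REGISTRY[step](record)
    return record
```

Because each function returns a new record and has no side effects, steps can be reordered, tested in isolation, and referenced by name from a declarative workflow definition.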
  • 29. Where is Zeotap: SAAS (shared nothing); DAAS (shared everything); ID resolution product (shared control plane). Provisioning system: heavily borrowed from SDN concepts; testable IaC (infrastructure as code) with SLAs for slowly created assets & quick assets, error handling, transactional, ease of billing
  • 30. Path To Self Serve. Provisioning system: supports shared nothing, shared control plane and shared resources styles; cloud APIs. Workflow as a Service: in-house control plane (Mr Sheperd & Kingpin, based on Apache Camel); abstraction over Airflow, Oozie and Livy. Self-serve compliance and consent: supported regulations (GDPR, CCPA); frameworks (TCF, TCF 2.0); add your own rules (DSL)
  • 33. Thank You: share this session on social media using #DataLakeSummit; interact with other attendees on Slack at datalake-summit.slack.com