1. InSpark
Erwin de Kreuk
Lead Data and AI
Azure Purview
Microsoft's answer to Data Governance and Data Lineage
@erwindekreuk
https://erwindekreuk.com
HELLHEIM
12:15 CET
DataSaturday Oslo
04 September 2021
@DataSatOslo
5. InSpark
Data governance is becoming increasingly
interdisciplinary
What data do I have?
Where did the data originate?
Can I trust it?
DISCOVERY
What’s my exposure to risk?
Is my usage compliant?
How do I control access & use?
What is required by regulation X?
COMPLIANCE
Chief Data Officer
7. InSpark
Data Map
Multicloud
On-prem
Data Insights
Azure Purview
Data Catalog
SaaS
Data Map
Automate and manage metadata at scale
Data Catalog
Enable effortless discovery for data
consumers
Data Insights
Assess data usage across your
organization
8. InSpark
Unified data governance to
maximize the business
value of data
Azure Purview
Reimagine data
governance in the cloud
Set the foundation for
effective data governance
Maximize business value
of data for data
consumers
Gain insight into data use
across the estate
9. InSpark
Manage and govern operational,
transactional and analytical data
Cloud-native, purpose-built
service to address discovery and
compliance needs
Fully managed, serverless, PaaS
service
Eliminate manual, ad-hoc and
homegrown solutions
Reimagine data
governance in the cloud
10. InSpark
Automate discovery of data in on-
premises, multicloud and SaaS
sources
Classify data at scale to specify
sensitivity, compliance, industry,
business and company-specific
value
Know where data came from and
what was derived from it with
data lineage
Set the foundation for
effective data governance
11. InSpark
Connect business and technical
data analysts, data scientists, and
data engineers to a trusted data
catalog
Enable users to quickly find data
and view its lineage and
sensitivity
Deliver a curated and consistent
glossary of business terms and
definitions
Maximize business value
of data for data
consumers
12. InSpark
Understand at a glance how data
is being created and used across
your data estate
Visually assess the state of data
assets, scans, business glossary
and sensitive data
Gain insight into data use
across the estate
13. InSpark
Azure Purview Features
Azure Purview
Azure Purview Platform
Azure Purview Studio
Automated Scanning & Classification
• Dedicated per customer on shared infra
• Provisioned default capacity with option to add-on capacity
Data Map
• Serverless, pay per use
• Includes connectors, scanning of sources, processing into data assets, lineage capture, classification
• Search, browse, asset details
• Automated meta-data and lineage extraction
• Automated classification based on content inspection
• Private Endpoint
• Management center
On-prem & Multi-cloud Operational, Analytical, SaaS
Azure Purview Catalog included with Platform (C0)
Power BI
SQL Server on-prem
Azure Synapse
Azure Data Services
M365 Compliance Cen
Open APIs
(Apache Atlas 2.0)
14. InSpark
Azure Purview Features
Azure Purview
Azure Purview Platform
Azure Purview Studio
Azure Purview Catalog (C1)
Automated Scanning & Classification
• Dedicated per customer on shared infra
• Provisioned default capacity with option to add-on capacity
Data Map
• Serverless, pay per use
• Includes connectors, scanning of sources, processing into data assets, lineage capture, classification
• Search, browse, asset details
• Automated meta-data and lineage extraction
• Automated classification based on content inspection
• Private Endpoint
• Management center
On-prem & Multi-cloud Operational, Analytical, SaaS
• Business Glossary templates
• Lineage visualization & workflows
Azure Purview Catalog included with Platform (C0)
Data Producers &
Consumers
Open APIs
(Apache Atlas 2.0)
Power BI
SQL Server on-prem
Azure Synapse
Azure Data Services
M365 Compliance Cen
15. InSpark
Azure Purview Features
Azure Purview
Azure Purview Platform
Azure Purview Studio
Azure Purview Catalog (C1)
Automated Scanning & Classification
• Dedicated per customer on shared infra
• Provisioned default capacity with option to add-on capacity
Data Map
• Serverless, pay per use
• Includes connectors, scanning of sources, processing into data assets, lineage capture, classification
• Search, browse, asset details
• Automated meta-data and lineage extraction
• Automated classification based on content inspection
• Private Endpoint
• Management center
On-prem & Multi-cloud Operational, Analytical, SaaS
Azure Purview Data Insights (D1)
• Business Glossary templates
• Lineage visualization & workflows
Azure Purview Catalog included with Platform (C0)
• Catalog Insights (Asset, Scan, Glossary)
• Sensitive Information Types & Labeling insights
Data Producers &
Consumers
Data Officers &
Security Officers
Open APIs
(Apache Atlas 2.0)
Power BI
SQL Server on-prem
Azure Synapse
Azure Data Services
M365 Compliance Cen
16. InSpark
• No access to Purview Portal
• Can Manage all aspects of Scanning
• Ideal role for programmatic processes, such as service principals
• Can register Data Sources
Azure Purview – Access Control
Data Source Administrator
17. InSpark
• Has access to Purview Portal
• Can read all content in Azure Purview
Azure Purview – Access Control
Data Reader
Data Source Administrator
18. InSpark
• Has access to Purview Portal
• Can read all content in Azure Purview
• Can edit assets, classification and glossary terms
• Can apply classifications and glossary terms to assets.
• Cannot register data sources, only read them
Azure Purview - Roles
Data Reader
Data Curator
Data Source Administrator
22. InSpark
Azure Purview - Pricing
• Capacity Unit
• €0.289 per 1 Capacity Unit Hour
• Provisioned API throughput. 1 capacity unit = 1 API/sec
• Includes 4 capacity units for free until February 28, 2021.
• Metadata Storage
• Free in preview
Azure Purview Data Map
Changed for all new Purview
Accounts created after or on
August 18th, 2021
24. InSpark
Azure Purview - Pricing
• Power BI Online
• Free in Preview
• SQL Server On Prem
• Free in Preview
• Other Data Sources
• Free in Preview
• €0.532 per 1 vCore Hour
Includes 16 vCore-hours for Free every month until February 28, 2021
Azure Purview Data Map
Scanning and Classification
25. InSpark
Azure Purview - Pricing
• C0
• Included with the Data Map
Search and browse of data assets
• C1
• Free in preview
• Business glossary, lineage visualization and catalog insights
• D0
• Free in preview
Sensitive data identification insights
Azure Purview Data Map
Scanning and Classification
Azure Purview Data Catalog
https://azure.microsoft.com/en-us/pricing/details/azure-purview
26. InSpark
Azure Purview Studio
[Screenshot of the home page with callouts: Updates, Accounts, Notifications, Feedback, Metrics, Search Bar, Useful Links, Recently Accessed Entities, Key Activities]
27. InSpark
Azure Purview Studio - Activity hubs
• Home: Quick actions, recently accessed items, owned items, search bar and documentation
• Sources: Create collections, register data sources, set up scans, integration runtime
• Glossary: Manage glossary items, search, manage term templates and custom attributes, import and export terms using CSV
• Insights: Insights on your data
• Management Center: Metadata management, security, ADF and Data Share connections
31. InSpark
Purview Data Map
Unify and make data meaningful
Automated metadata scanning and
lineage identification of hybrid
data stores
100+ built-in and custom classifiers
Microsoft Information Protection
sensitivity labels
33. InSpark
Azure Purview Data Catalog
Enable effortless discovery
Semantic search and
browse
Business glossary and
workflows
Data lineage with sources,
owners, transformations,
and lifecycle
37. InSpark
Insights
Reports on Assets, Scans,
Glossary, Classification,
and Labeling
Get a bird’s-eye view of sensitive data
38. InSpark
Register and Scan a Power BI Tenant
Discover data registered and scanned by Azure Purview
Allow service principals to use read-only Power BI admin APIs
Enhance admin APIs responses with detailed metadata
40. InSpark
Integrate Azure Purview in Azure Synapse Analytics
Discover data registered and scanned by Azure Purview
In Preview
Hello and welcome to my session about Azure Purview.
My name is Erwin de Kreuk and I work as Lead Data and AI for InSpark, a Microsoft Partner in the Netherlands.
Azure Purview is a unified data governance service.
During this session I will explain what Azure Purview is,
where Azure Purview fits within your data estate,
and how it works, using some practical examples.
If you have questions, please feel free to ask them.
History
Blue Talon June 2019
With Azure Purview, Microsoft now has its own cloud-native service for data governance and data lineage.
I'm curious what the future will bring, and how it will position itself against Collibra, Informatica, AWS Glue Data Catalog and other data governance products.
As we all know, data governance is becoming increasingly interdisciplinary.
A chief data officer (CDO) is a corporate officer who is responsible for enterprise-wide governance and utilization of information as an asset, via data processing, analysis, data mining, information trading and other means.
The CDO will be one of the users who turn to Azure Purview to get answers to questions such as:
What kind of data do I have within my data estate?
Where is the data coming from, and can I trust it?
Compliance is also becoming more and more important, with all the regulations required by local government or industry, for example ISO and NEN certifications.
Besides these questions, the CDO also wants answers to:
What is my exposure to risk?
How can we control access to and use of the data, and is our usage compliant?
The following elements can lead to successful data governance, which is one of the key components of a modern data estate:
You need to have control over your growing data landscape.
You want to overcome operational silos.
A data silo is a collection of data held by one group that is not easily or fully accessible by other groups. ... Finance, administration, HR, and other departments need different information to do their work, and those individual collections of often overlapping-but-inconsistent data are in separate silos.
You want to increase the flexibility and agility of your data.
And you want to make sure you comply with the various industry regulations and local government regulations.
Azure Purview can help you with these elements.
Azure Purview organizes metadata that enables your organization to break down silos and derive meaning from data.
Once data can be understood and annotated, it lends itself to several applications:
During the public preview we can use the Data Map to automate and manage metadata at scale,
the Data Catalog to discover and search for data,
and Data Insights to get an overview of the data in our data estate.
This is what Azure Purview currently has to offer.
In the future, privacy, quality and master data management will follow.
There are 4 pillars that help you maximize the business value of data in your organization:
Data governance
Set the foundation
Create business value for the consumers
And of course, insights should not be missing
Key features of Reimagine data governance in the cloud
Cloud Native
Managed
Serverless
PaaS
Key features for the foundation are
Automate the discovery of data across different sources
Classify data to specify sensitivity
Know where your data is coming from
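To make the idea of classification by content inspection a bit more concrete, here is a minimal sketch of a pattern-based classifier. It is not how Purview implements its classifiers; the rule names, the regular expressions and the `min_match_ratio` threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical classification rules: label -> pattern that a value must fully match.
RULES = {
    "EMAIL_ADDRESS": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "NL_POSTAL_CODE": re.compile(r"\d{4}\s?[A-Z]{2}"),
}


def classify_column(values, min_match_ratio=0.6):
    """Return the labels whose pattern matches enough of the sampled values.

    `min_match_ratio` is an illustrative threshold: the share of non-empty
    sample values that must match before the label is applied.
    """
    non_empty = [v for v in values if v]
    if not non_empty:
        return []
    labels = []
    for label, pattern in RULES.items():
        hits = sum(1 for v in non_empty if pattern.fullmatch(v))
        if hits / len(non_empty) >= min_match_ratio:
            labels.append(label)
    return labels


if __name__ == "__main__":
    sample = ["erwin@example.com", "info@example.org", "n/a"]
    print(classify_column(sample))
```

Working on a sample of values with a threshold, rather than requiring every value to match, keeps a few dirty rows from hiding a sensitive column.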
Key features to maximize the business value
Connect the different roles within your organization to a trusted data catalog
Enable them to quickly find this data
Key features to gain insights
Understand at a glance how data is being created and used across your data estate
Visually assess the state of data assets, scans, business glossary and sensitive data
Data sources
Power BI, SQL Server on-prem, Azure Data Services including Synapse, Cosmos DB & Storage, non-Microsoft systems including SAP ECC, SAP S4 HANA & Teradata, multi-cloud systems including AWS S3
With Purview Platform:
Automate scanning and classification of multicloud, SaaS and on-prem data. More than 25 out-of-the-box connectors and file formats are supported.
Modernize homegrown catalogs built on opensource technology with Purview using Apache Atlas APIs supported out-of-the-box
Get catalog features (C0 Tier) for FREE included with Purview platform:
Search and browse
Empower business and technical data analysts via a catalog to find and interpret data.
Power data scientists and engineers with business context to drive BI, Analytics, AI and ML initiatives
Automated metadata and lineage extraction
Enrich the business value of data with technical, business and semantic metadata
Scale understanding of data with automated, fully managed, serverless metadata management capability
Leverage support of Apache Atlas’s open-source Lineage APIs to push lineage information into the Purview Data Map.
Analyze impact of changes to data and understand dependencies visually.
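As an illustration of pushing lineage through the Atlas-compatible API, the sketch below only builds the JSON body for a lineage "Process" entity in the Apache Atlas v2 entity format; it does not call any endpoint. The type name used for the data sets, the qualified names and the helper names are assumptions for the example, so check the type definitions in your own Purview account before using something like this.

```python
import json


def dataset_ref(type_name: str, qualified_name: str) -> dict:
    """Reference an existing entity by its type and its unique qualifiedName."""
    return {
        "typeName": type_name,
        "uniqueAttributes": {"qualifiedName": qualified_name},
    }


def build_process_entity(name: str, qualified_name: str,
                         inputs: list, outputs: list) -> dict:
    """Build an Atlas v2 'entities' payload describing one lineage step."""
    return {
        "entities": [
            {
                "typeName": "Process",  # Atlas base type for lineage steps
                "attributes": {
                    "name": name,
                    "qualifiedName": qualified_name,
                    "inputs": inputs,    # data sets read by the process
                    "outputs": outputs,  # data sets written by the process
                },
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    source = dataset_ref("DataSet", "example://landing/sales.csv")
    target = dataset_ref("DataSet", "example://curated/sales")
    payload = build_process_entity(
        "copy_sales", "example://jobs/copy_sales", [source], [target]
    )
    print(json.dumps(payload, indent=2))
```

The payload links inputs to outputs through the process entity, which is what lets the lineage view draw the path from source to destination.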
Azure Purview Catalog (C1 Tier) includes the following in addition to the free features included with the platform:
Business Glossary
Deliver a curated and consistent understanding of business terms and definitions.
Import existing glossary terms from existing data dictionaries easily.
You can also define custom attributes for glossary terms and create templates for different domains such as ‘Finance’ and ‘Sales’.
Lineage views
Ensure data provenance with a visual representation of owners, sources, transformation, and lifecycle
Built-in integrations with solutions to automatically extract lineage such as Synapse Analytics, Azure Data Factory, Azure Data Share etc.
Data Insights (D1 Tier) provides a bird’s eye view of your data landscape intended to help users such as Chief Data Officers quickly understand their data estate at large and gain key insights such as where sensitive data resides.
It includes:
Catalog insights:
Asset Insights: Quickly see where all your data resides across a range of data sources
Scan Insights: Success/failures/cancellations over a period
Glossary Insights: Quickly understand changes made to the glossary over time and assess how much coverage glossary has over your data map.
Sensitive data insights
Simplify compliance risk assessment across all your operational and transactional data sources.
Assess risk and derive audit trails of data qualified by sensitivity and business relevance.
Purview Data Source Administrator Role
Does not have access to the Purview Portal (the user needs to also be in the Data Reader or Data Curator roles) and can manage all aspects of scanning data into Azure Purview but does not have read or write access to content in Azure Purview beyond those related to scanning.
This role is ideal for programmatic processes, such as service principals, that need to be able to set up and monitor scans but should not have access to any of the catalog's data.
Purview Data Reader Role
Has access to the Purview portal and can read all content in Azure Purview except for scan bindings
Purview Data Curator Role
Has access to the Purview portal and can read all content in Azure Purview except for scan bindings, can edit information about assets, can edit classification definitions and glossary terms, and can apply classifications and glossary terms to assets.
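The three roles described above can be summarized as a small permission model. This is only a sketch to show how the roles combine; the `Permission` flags and the function name are my own simplification of the role descriptions, not an Azure API.

```python
from enum import Flag, auto


class Permission(Flag):
    PORTAL_ACCESS = auto()  # can open the Purview portal
    READ_CONTENT = auto()   # can read content (except scan bindings)
    EDIT_CONTENT = auto()   # can edit assets, classifications and glossary terms
    MANAGE_SCANS = auto()   # can register sources, set up and monitor scans


# Simplified mapping based on the role descriptions in these notes.
ROLE_PERMISSIONS = {
    "Purview Data Reader": Permission.PORTAL_ACCESS | Permission.READ_CONTENT,
    "Purview Data Curator": (Permission.PORTAL_ACCESS
                             | Permission.READ_CONTENT
                             | Permission.EDIT_CONTENT),
    "Purview Data Source Administrator": Permission.MANAGE_SCANS,
}


def effective_permissions(roles):
    """Combine the permissions of all roles assigned to one user."""
    result = Permission(0)
    for role in roles:
        result |= ROLE_PERMISSIONS[role]
    return result


if __name__ == "__main__":
    # A person who manages scans in the portal needs a second role as well.
    user = effective_permissions(
        ["Purview Data Source Administrator", "Purview Data Reader"]
    )
    print(Permission.PORTAL_ACCESS in user, Permission.MANAGE_SCANS in user)
```

The model shows why a Data Source Administrator who works in the portal also needs the Data Reader or Data Curator role, while a service principal that only runs scans does not.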
When you deploy an Azure Purview account on or after August 18th, 2021, you can now also assign roles based on collections.
So, as you can see in the example, you can restrict who is able to see data in the collection Assets Revenue.
I will show how this all works in a later demo.
The 4 free capacity units apply only to some subscription types.
Charging now starts at 1 capacity unit for all Azure Purview accounts created on or after August 18, 2021. Existing Purview accounts will be migrated starting September/October.
Currently the Elastic Data Map is free
Purview Data Map can automatically scale up and down within the elasticity window
To get the next level of the elasticity window, a support ticket needs to be created.
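To get a feeling for the numbers, here is a rough cost sketch using the preview prices quoted on the pricing slides (€0.289 per capacity unit hour, €0.532 per vCore hour for scanning, 16 free vCore hours per month). The 730-hour month and the function names are assumptions, and the actual prices and free allowances should be checked on the pricing page.

```python
# Preview prices as quoted on the slides; treat them as illustrative.
CAPACITY_UNIT_EUR_PER_HOUR = 0.289  # Data Map, per capacity unit hour
SCAN_VCORE_EUR_PER_HOUR = 0.532     # scanning and classification, per vCore hour
FREE_VCORE_HOURS_PER_MONTH = 16     # free scanning allowance from the slide


def data_map_cost(capacity_units: int, hours: float) -> float:
    """Cost in euros of keeping `capacity_units` provisioned for `hours` hours."""
    return round(capacity_units * hours * CAPACITY_UNIT_EUR_PER_HOUR, 2)


def scanning_cost(vcore_hours: float,
                  free_hours: float = FREE_VCORE_HOURS_PER_MONTH) -> float:
    """Cost in euros of scanning, after subtracting the free vCore hours."""
    billable = max(0.0, vcore_hours - free_hours)
    return round(billable * SCAN_VCORE_EUR_PER_HOUR, 2)


if __name__ == "__main__":
    hours_in_month = 730  # assumption: average number of hours in a month
    print("Data Map, 1 capacity unit:", data_map_cost(1, hours_in_month))
    print("Scanning, 40 vCore hours:", scanning_cost(40))
```

Because the Data Map is billed per provisioned capacity unit hour, the minimum number of capacity units drives the baseline cost even when nobody is using the catalog.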
A single, centralized place that provides unified experience for data producers, data consumers, data & security officers
Home
Quick Actions, recently accessed items, owned Items, search bar and Documentation
Sources
Create collections, register data sources, set up scans, integration runtime
Glossary
Manage glossary items, search, manage term templates and custom attributes, import and export terms using CSV
Insights
Insights on your data
Management Center
Metadata management, security, ADF and Data Share connections
Demo Activity Hubs
Home Page
Tabs
Table view-Map View
Scan
ADLS Define Scope
All sources are categorized
Pay attention: when you have enabled a private endpoint, make sure you can still access the selected networks/sources
Intended to help users such as Chief Data Officers quickly understand their data estate at large and gain key insights such as where sensitive data resides
Asset Insights: Understand the distribution of data assets across a range of data sources and environments
Scan Insights: Number of successful, failed and cancelled scans over time
Glossary Insights: Understand changes made to business terms and assess how much coverage the glossary has over the data map
Classification Insights: Understand what sensitive data exists across the data estate through various lenses
Sensitivity Label Insights: Understand what sensitivity labels have been applied across the data estate
File Extension Insights: Recently scanned files based on their extensions
Reports on Assets, Scans, Glossary, Classification, and Labeling
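The scan insight is essentially a count of scan outcomes over time. A minimal sketch of that aggregation is shown below; the record layout and the status names are assumptions for the example, not the format Purview returns.

```python
from collections import Counter
from datetime import date


def summarize_scans(runs):
    """Count scan outcomes per day.

    `runs` is an iterable of (day, status) tuples; the shape is hypothetical.
    Returns a dict mapping each day to a Counter of statuses.
    """
    per_day = {}
    for day, status in runs:
        per_day.setdefault(day, Counter())[status] += 1
    return per_day


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    runs = [
        (date(2021, 9, 1), "Succeeded"),
        (date(2021, 9, 1), "Failed"),
        (date(2021, 9, 2), "Succeeded"),
        (date(2021, 9, 2), "Canceled"),
    ]
    for day, counts in sorted(summarize_scans(runs).items()):
        print(day.isoformat(), dict(counts))
```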
You need to make sure that your Azure Purview account has permission to read the Power BI tenant.
You need to be a Power BI admin to see the tenant settings page.
First, create a security group and add your Purview account as a member.
Then add this security group to the tenant setting "Allow service principals to use read-only Power BI admin APIs". To allow Purview to scan your Power BI metadata, you also need to enable "Enhance admin APIs responses with detailed metadata".
Before you start scanning your Power BI datasets, schedule a refresh in the Power BI service so that the metadata is available.
I immediately thought back to a keynote from PASS Summit 2015, in which Microsoft's new vision became clear: walk with your head in the cloud and your feet on the ground. I don't know why, but it just came to mind.
It makes clear that Microsoft is now busy creating a unified experience for its customers. Azure Synapse is at the heart of it, and linking it to Azure Purview and Azure Cosmos DB is getting even simpler.
Once you have created this connection, you can search the Azure Purview catalog directly.
As of about two weeks ago, data lineage is also enabled when you connect your Purview account.
Azure Purview drops lineage if the source or destination uses an unsupported data storage system.
You may see the warning below if you have the privilege to read Purview role assignment information and the required role has not been granted.
To make sure the connection is properly set for the pipeline lineage push, go to your Purview account and check if Purview Data Curator role is granted to the Synapse workspace's managed identity. If not, manually add the role assignment.
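The check described above comes down to looking for one role assignment. The sketch below shows that logic on plain Python data; the assignment records and the identifiers are hypothetical, and in practice you would read the assignments from your Purview account rather than from a hard-coded list.

```python
REQUIRED_ROLE = "Purview Data Curator"


def lineage_role_missing(assignments, principal_id):
    """Return True if `principal_id` does not hold the role needed to push lineage.

    `assignments` is a list of dicts with 'principalId' and 'role' keys
    (a hypothetical shape used only for this example).
    """
    return not any(
        a["principalId"] == principal_id and a["role"] == REQUIRED_ROLE
        for a in assignments
    )


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    assignments = [
        {"principalId": "synapse-workspace-msi", "role": "Purview Data Reader"},
        {"principalId": "some-user", "role": "Purview Data Curator"},
    ]
    if lineage_role_missing(assignments, "synapse-workspace-msi"):
        print("Grant the Purview Data Curator role to the Synapse managed identity")
```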
Source
Collection
Scan + Scan Rule set + Custom File Type
Schedule
Search catalog cities Lineage
Browse Assets Edit/Overview/Lineage/Contacts
Show Insights
Show Synapse Integration