Hi, I am Stan from Collibra, a best-of-breed Data Governance software vendor. Today I want to introduce you to Collibra's "Data Governance Center". I want to show you how the software supports the process of data management, and the steps to increase your data maturity.
Let's start with a brief overview of Data Governance in practice. In the top left corner, we have the organization dimension, with the councils & committees, working groups, functional teams etc. that carry an organizational responsibility. Who owns the "Customer" data domain: "Sales" or "Finance"? In the top right corner, we find people & process: the stewards and stakeholders (and chief stewards, entity curators, Data Czarinas, ...) and the workflows between them, like approvals and issue management. You are in control: the software has out-of-the-box roles and flows for organizations with a low Data Governance maturity. Organizations with a higher maturity level and a need for more advanced processes can configure everything according to their own specific needs. In the bottom right corner, we find our assets: the data & business definitions, business concepts and relations, taxonomies, reference data, rules & policies... And in the lower left corner we have metrics regarding your data governance maturity: - growth metrics that show you how fast you are moving forward, for example the declining trend in the number of issues month by month - maturity metrics which indicate your current maturity level, for example do you have 50% responsibility coverage for your assets or 75%?
Let's zoom in on this a little bit and see who the stakeholders for Data Governance are. We need to understand who they are, what they expect, and how they can contribute. The example that I am using is a typical one, even for organizations with a lower level of maturity: - on top, we have some kind of council with decision makers, ideally drawn from across functional groups, and a top champion who sponsors the initiative - at the bottom, we have the different areas and various kinds of stewards. These are business users who in their daily work produce and consume data. I would say that the top people are our "one percenters" of DG and that the bottom people are the "five-to-ten percenters" of DG. Data Governance is not their main job. They contribute a percentage of their time to it. In the middle we find the full-time Data Governance people. Data Governance is their job. They live and breathe it. This is typically a core team which coordinates all the data governance efforts within the organization. How does the Collibra software help all these people? How can the software help them increase your data maturity?
Let me give a brief overview of Collibra's Data Governance Center. The Business Semantics Glossary focuses on a core asset: business semantics or metadata, ranging from business & data definitions to business rules & policies to assist you with policy management. The Reference Data Accelerator focuses on the asset of reference data: codes & codesets, relations and hierarchies. The Data Stewardship Manager focuses on full support for your data stewards & stakeholders: their communication, coordination and facilitation.
Collibra's Business Semantics Glossary is packed full of features that you are looking for to build, import, manage, maintain, reconcile, ... your business & data definitions, rules & policies. In this presentation I will focus on consumption rather than on production. Why? Remember our x-percenters in Data Governance. For them and for all the rest of the organization the value comes from consuming. Their main question will be: how do I get my hands on this? For the full time Data Governance heroes out there: that means that your top of mind question should be: how do I enable accessibility of our data assets? Let's see how Collibra helps. Where does our target audience spend their time?
They look at reports: in specialized reporting environments, on shared portals, in shared drives, inside spreadsheets. As soon as something in there seems not quite right (the numbers seem wrong, what is behind them? is the data correct? is the report showing what it should?), the trust in your data gradually disappears. Data Governance guarantees trustworthy data. From any Windows application where you can select text (Office, Excel, IE, ...) our target user can simply highlight what he or she needs, hit CTRL+ALT+S and Collibra delivers contextualized search results. These provide the first answers: there is more than one kind of customer, what are the differences, what are the rules, where is the data stored, ... And from here people can directly use the "Business User Portal" to get access to all the data assets and easily search, filter, browse and navigate information, as well as reach the people with Data Governance responsibilities to re-establish and maintain trust.
Collibra's Reference Data Accelerator extends the functions of the Business Semantics Glossary with a strong focus on reference data. Reference data is defined as data that classifies other data. For example, your database records will contain columns like "CUST_TYPE", in which you will find a limited set of values like "GLD_CUST" and "SLV_CUST". This is reference data, and its purpose is to classify data so that you can treat your Silver Customers well and your Gold Customers even better. Their business rules and processes differ, and your colleagues or employees must act accordingly if the business is to run right. Reference data has key strategic business value, and needs to be governed as such. Reference data is found for almost everything: customers, prospects, accounts, products, securities, instruments, trials, locations, statuses, transactions, ...
A characteristic of reference data is that a lot of it will be outside of your control. One of the simplest examples is the ISO country codes. The International Organization for Standardization (ISO) manages the list of countries as well as the list of codes (2 or 3 character codes) that represent those countries. And those codes are sprinkled all over your data: which customers are in which countries, operating locations, transactions between countries, ... And yet, you do not control that standardized list in any manner. You get what is given. That doesn't mean that what you get is immediately useful. For example, "AS" represents "American Samoa", one of the "countries". Except it is not a country. It is a dependency, and its parent country is the US. That could mean, for example, that any transactions made with AS need to follow US regulations. And that you should listen to the FDA for clinical trials in AS. This is critical. You will also need to "hook up" this reference data to your own internal reference data. Let's zoom in on the region "North America". Inside Collibra we can relate that region (which might be internal reference data) with those countries (which is external reference data): "North America" consists of "Canada", "The United States" and many others. Drilling down on "The United States" shows us other relations as well: - two direct mappings: US as in the ISO 3166-1 alpha-2 list, and USA as in the ISO 3166-1 alpha-3 list, and - according to the country list of the "Bank of England", the United States is part of "USA". This is because they use the reference code "USA" to include the United States, as well as American Samoa, Guam and the Northern Mariana Islands. Next to those mappings & relations, you also need hierarchy management of your reference data. Let's take products as an example. We have Swaps (LO2000) & Loans (LO1000). Then we find Fixed Rate Interest Loans (LO1100), Floating Rate Interest Loans (LO1200) and the mixed bag: Fixed/Floating (LO1300) and so on.
Like the BASIC line numbers of old (line 10 does this, line 20 does that, so if I need to do something in between I still have lines 11 - 19 as a buffer), people will use the specifics of the coding scheme to hide that hierarchy in their databases. Collibra takes care of this as well.
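To make the "hidden hierarchy" idea concrete, here is a minimal sketch of how a parent code could be derived from the coding scheme itself. It assumes a two-letter prefix followed by digits, where trailing zeros mark the level (LO1100 sits under LO1000). The function names `parent_code` and `build_children` are hypothetical and are not part of Collibra's product or API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# Product codes taken from the example above.
PRODUCTS = {
    "LO1000": "Loans",
    "LO2000": "Swaps",
    "LO1100": "Fixed Rate Interest Loans",
    "LO1200": "Floating Rate Interest Loans",
    "LO1300": "Fixed/Floating",
}


def parent_code(code: str, prefix_len: int = 2) -> Optional[str]:
    """Return the parent implied by the coding scheme, or None for a top-level code.

    Assumption: the digits after the prefix encode the level, and the parent is
    obtained by zeroing the last significant (non-zero) digit.
    """
    prefix, digits = code[:prefix_len], code[prefix_len:]
    significant = digits.rstrip("0")
    if len(significant) <= 1:
        return None  # e.g. LO1000 has no parent in this scheme
    parent_digits = significant[:-1].ljust(len(digits), "0")
    return prefix + parent_digits


def build_children(codes: Iterable[str]) -> Dict[str, List[str]]:
    """Group codes under the parent implied by their digits."""
    children: Dict[str, List[str]] = defaultdict(list)
    for code in sorted(codes):
        parent = parent_code(code)
        if parent is not None:
            children[parent].append(code)
    return dict(children)


hierarchy = build_children(PRODUCTS)
for parent, kids in hierarchy.items():
    print(PRODUCTS[parent], "->", [PRODUCTS[k] for k in kids])
```

Making the hierarchy explicit like this means nobody has to know the numbering convention to see that the three loan types belong under Loans.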
Because the only constant is change, and because not everybody (or every system) can handle change at the same speed, you also need the ability to travel through time. In Collibra you have snapshots to make sure you can access the relevant reference data at the relevant point in time. Need the latest and greatest for your new Big Data project that you are gearing up? Check. Need the countries that you were doing business with in 2011 for audit & compliance purposes? Check. Like I have shown before, the accessibility (via search, the BUP, ...) of these data assets is valuable. People who do data entry can make sense of the obscure values in dropdowns or figure out the exact code they have to type in field X on the screen. This reference data can then be provisioned to other systems.
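The "travel through time" idea can be sketched as an as-of lookup over dated snapshots. This is an illustration only: the class `SnapshotStore`, the snapshot dates and the "PLT_CUST" code are all made up for the example, and nothing here reflects how Collibra stores snapshots internally.

```python
from bisect import bisect_right
from datetime import date
from typing import Dict, List


class SnapshotStore:
    """Keeps dated copies of a code set and returns the one valid at a given date."""

    def __init__(self) -> None:
        self._dates: List[date] = []
        self._snapshots: List[Dict[str, str]] = []

    def add(self, effective: date, codes: Dict[str, str]) -> None:
        # Keep both lists sorted by effective date.
        idx = bisect_right(self._dates, effective)
        self._dates.insert(idx, effective)
        self._snapshots.insert(idx, dict(codes))

    def as_of(self, when: date) -> Dict[str, str]:
        # Latest snapshot whose effective date is on or before `when`.
        idx = bisect_right(self._dates, when)
        if idx == 0:
            raise LookupError(f"no snapshot on or before {when}")
        return self._snapshots[idx - 1]


store = SnapshotStore()
# Hypothetical snapshot dates; GLD_CUST and SLV_CUST come from the earlier example.
store.add(date(2011, 1, 1), {"GLD_CUST": "Gold Customer", "SLV_CUST": "Silver Customer"})
store.add(date(2014, 1, 1), {"GLD_CUST": "Gold Customer", "SLV_CUST": "Silver Customer",
                             "PLT_CUST": "Platinum Customer"})  # PLT_CUST is invented

audit_view = store.as_of(date(2011, 6, 30))   # what the codes looked like in 2011
latest_view = store.as_of(date(2015, 1, 1))   # the latest and greatest
```

An audit question about 2011 and a new project that wants the current list can then both be answered from the same store.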
Collibra's Data Stewardship Manager provides you with the capabilities needed to set up your governance organization, roles & responsibilities, as well as the workflows between any type of data asset. It includes advanced reporting (e.g., impact analysis) between those assets as well as issue management.
The Data Stewardship Manager enables the process of data management. Let me show you how that works with two specific processes. These two processes will illustrate the reusable components within the Data Stewardship Manager which will support any Data Management process within your organization.
My first process example: Issue Management. Even if you believe that all is well, there are always data related issues.
Let's zoom in on "John Doe" who depends on data for his daily business activities. He can't quite put his finger on it, but he believes that something is wrong with the data he is working with. After some investigation, John figures out that there really is bad data involved.
John communicates the issue.
People respond ... with the emphasis on "his".
This way of treating the problem takes the issue from John to his colleagues and from there straight to ... the data issue black hole.
Not so under governance. When people have been identified, roles have been assigned and procedures agreed upon, Collibra will distribute and coordinate the responsibilities. In the example here, I have the Demo Organization split into functional groups or communities: Finance, Sales, Operations, EA, the council, ... Each of those carries data ownership. For example, Finance is the owner of the Customer Domain. John Fisher is the Data Steward in Finance, Judy Clark is the Business Steward, and Mary signs things off as the Chief Steward.
Let's go back to John Doe and his data issue. John reports this issue in Collibra. As the Data Steward, John Fisher now becomes responsible. He triages the issue, possibly interacting with John Doe to make sure everything is clear. He then reviews it with his fellow Business Steward to determine the business relevance. Because they consider it a cross functional issue, John Fisher escalates it to the Data Governance council. After serious consideration, possibly including voting on candidate solutions, in their weekly or monthly meeting, they assign the issue to Sales because they seem to be the original source for the data and its issue. The Data Steward in Sales ends up with the issue and she sets out to resolve the issue. She identifies and resolves the root cause. To be completely proactive, she even sets up a Data Quality rule to monitor the data source including a threshold based on this specific issue. This way, when the data gets corrupted again in the future, Collibra is notified and the issue can be handled proactively before it ever reaches John Doe. The work of the Data Steward is finished and Collibra takes over to make sure that the relevant stakeholders are notified of the outcome. Our friend John Doe is happy. His confidence in the data, and the Data Governance initiative is strengthened.
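The path this issue follows can be sketched as a small state machine. This is a simplified illustration under my own assumptions: the status names, the allowed transitions and the `Issue` class are hypothetical, and real workflows in the product are configurable and may look quite different.

```python
from enum import Enum
from typing import List, Tuple


class Status(Enum):
    REPORTED = "reported"
    TRIAGED = "triaged"
    REVIEWED = "reviewed"
    ESCALATED = "escalated"
    ASSIGNED = "assigned"
    RESOLVED = "resolved"
    CLOSED = "closed"


# Assumed transitions: a reviewed issue is either escalated or assigned directly.
ALLOWED = {
    Status.REPORTED: {Status.TRIAGED},
    Status.TRIAGED: {Status.REVIEWED},
    Status.REVIEWED: {Status.ESCALATED, Status.ASSIGNED},
    Status.ESCALATED: {Status.ASSIGNED},
    Status.ASSIGNED: {Status.RESOLVED},
    Status.RESOLVED: {Status.CLOSED},
    Status.CLOSED: set(),
}


class Issue:
    def __init__(self, title: str, reporter: str) -> None:
        self.title = title
        self.status = Status.REPORTED
        # The history records who moved the issue into each status.
        self.history: List[Tuple[Status, str]] = [(Status.REPORTED, reporter)]

    def move_to(self, new_status: Status, actor: str) -> None:
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"cannot go from {self.status.value} to {new_status.value}")
        self.status = new_status
        self.history.append((new_status, actor))


issue = Issue("Bad customer data", "John Doe")
issue.move_to(Status.TRIAGED, "John Fisher (Data Steward, Finance)")
issue.move_to(Status.REVIEWED, "Judy Clark (Business Steward)")
issue.move_to(Status.ESCALATED, "John Fisher (Data Steward, Finance)")
issue.move_to(Status.ASSIGNED, "Data Governance Council")
issue.move_to(Status.RESOLVED, "Data Steward, Sales")
issue.move_to(Status.CLOSED, "system notification")
```

Because every step is recorded with the person responsible, the issue cannot quietly disappear into the black hole: at any moment you can see where it is and who holds it.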
Management is happy as well: Data Governance is tangible. They have been notified of the issue and its resolution. They can finally monitor the size of the problem and see progress. They have come to understand that fewer issues with the data means smoother business operations.
My second process example: Data Sharing Management. Let's go shopping.
There are three important aspects in my shopping experience: - I need to search and quickly find what I want to buy. - I need to be compliant with the rules (for example, some items cannot be sent internationally because of regulations) - People take care of the delivery in a tightly controlled process.
These aspects also apply to data sharing: - You want to search and find the data assets you need for your business process - You want to stay compliant with the rules & policies that apply to those data assets - You need to coordinate with the relevant stewards, stakeholders and owners for the data logistics
So let's go shopping for data. My organization just started a big data project, and hired me - a data scientist. The first thing I want to do is get my hands on some data, the bigger the better. Say I need .... er ... "Person" data. I put in the appropriate search query ... select the results that will help me out and I simply add them to my shopping cart to check out.
I might not immediately be aware of it, but there are rules & policies that apply to the data assets I just selected. For example, a security officer classified part of the data I am hunting for as "Highly Confidential". According to policy, I am not allowed to access that data. Fortunately for me, the system flags this request so people are at least aware of my attempt. The other data elements I selected are accessible to me. My data sharing request is partially approved and I am happy to learn that the data will soon be shared with me through our MDM Hub. Which means that other people must come into play: the stewards that will take care of the actual data logistics and their chief steward to prioritize and approve this activity. When they have performed their work, I am notified that the data is waiting for me and I can finally get my hands dirty. You see, if Data Governance is all about "Data as an Asset", "Data as an Asset", "Data as an Asset" ... saying it three times does not make it so. It is about the value that you can get from those assets, and how you can get that value out there. For data: sharing really is caring because of the network effect: the value is dependent on the number of people using it to add value to the company.
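The partial approval described here can be sketched as a simple policy check on the shopping cart. Everything in this example is an assumption for illustration: the classification levels other than "Highly Confidential", the asset names, and the function `evaluate_request` are invented and do not describe Collibra's actual policy engine.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed ordering of classifications, lowest to highest sensitivity.
LEVELS = ["Public", "Internal", "Confidential", "Highly Confidential"]


@dataclass(frozen=True)
class DataAsset:
    name: str
    classification: str


def evaluate_request(
    cart: List[DataAsset], clearance: str
) -> Tuple[str, List[DataAsset], List[DataAsset]]:
    """Split a data sharing request into approved and flagged assets."""
    max_level = LEVELS.index(clearance)
    approved = [a for a in cart if LEVELS.index(a.classification) <= max_level]
    flagged = [a for a in cart if LEVELS.index(a.classification) > max_level]
    if not flagged:
        status = "approved"
    elif approved:
        status = "partially approved"
    else:
        status = "rejected"
    return status, approved, flagged


# Hypothetical "Person" assets in the data scientist's cart.
cart = [
    DataAsset("Person name", "Internal"),
    DataAsset("Person email", "Confidential"),
    DataAsset("Person national ID", "Highly Confidential"),
]
status, approved, flagged = evaluate_request(cart, clearance="Confidential")
print(status, [a.name for a in flagged])
```

The flagged assets are returned rather than silently dropped, so the people responsible are made aware of the attempt, as in the story above.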
One last thing. One of the single biggest challenges in Data Governance is adoption. This is for those among you who are trying to get people on board the Data Governance train. Is there anyone like that in the audience? ... I have good news for you. Instead of trying to bring people to Data Governance, try bringing Data Governance to where they are. Mobile devices are increasingly used in corporate environments. Collibra offers mobile support so your stewards and stakeholders have access to the data assets on the go.
The Data Governance Center: a platform to increase your data maturity
Data Governance Stakeholders
• Data Governance Council
• Bob Brown, Data Governance Officer
• Mary Smith, Chief Steward (Finance)
• Mike Jones, Chief Steward (Sales)
• Data Governance Manager
• Data Governance Working Group
• Finance … Sales
• John Fisher, Data Steward
• Judy Clark, Business Steward
The process of data management
• Intake / Proposal
• Review and approval
• Escalation
• Issue management
• Data Sharing
• Voting
• Security classification
• Policy audit
• …
"I put my heart and my soul into my work, and have lost my mind in the process." (Vincent Van Gogh)
Issue Management
Picture by Andrew Aveley. Source: http://photo.getaway.co.za/2012/03/26/andrew-aveley-hidden-danger/
Detect and research the issue: "I think I have a problem with my data…"
Communicate the issue: "Guys, I have a problem with my data…"
Hoping for feedback in the communication: "Yup. There seems to be a problem with his data…"
Data Sharing
• Find the data assets you need for your business process
• Stay compliant with the rules & policies that apply to those data assets
• Coordinate with the relevant stewards, stakeholders and owners for the data logistics
Controlling your data sharing
CONFIDENTIAL DATA: "We were not able to ship the data to your business unit / department / system / country / … because of legal & risk issues. This request for data sharing has been flagged. Please consult your legal department. Have a nice day."