Data, Information And Knowledge Management Framework And The Data Management Book Of Knowledge (DMBOK)

Structured and Comprehensive Approach to Data Management and the Data Management Book of Knowledge (DMBOK)

Comments

  • "Dear Alan, really a super repository to understand the vast turf of data. The most comprehensive attempt to encompass the subject thus far: very informative and richly engaging. Many thanks for allowing this to be shared."
  • "Excellent overview!"
  • "An excellent overview."
  • "What a great presentation! Thank you."

Presentation Transcript

  • Structured and Comprehensive Approach to Data Management and the Data Management Book of Knowledge (DMBOK) Alan McSweeney
  • Objectives
    • To provide an overview of a structured approach to developing and implementing a detailed data management policy, including frameworks, standards, projects, teams and maturity assessment
  • Agenda
    • Introduction to Data Management
    • State of Information and Data Governance
    • Other Data Management Frameworks
    • Data Management and the Data Management Book of Knowledge (DMBOK)
    • Conducting a Data Management Project
    • Creating a Data Management Team
    • Assessing Your Data Management Maturity
  • Preamble
    • Every good presentation should start with quotations from The Prince and Dilbert
  • Management Wisdom
    • There is nothing more difficult to take in hand, more perilous to conduct or more uncertain in its success than to take the lead in the introduction of a new order of things. − The Prince
    • Never be in the same room as a decision. I'll illustrate my point with a puppet show that I call "Journey to Blameville" starring "Suggestion Sam" and "Manager Meg."
    • You will often be asked to comment on things you don't understand. These handouts contain nonsense phrases that can be used in any situation, so let's dominate our industry with quality implementation of methodologies.
    • Our executives have started their annual strategic planning sessions. This involves sitting in a room with inadequate data until an illusion of knowledge is attained. Then we'll reorganise, because that's all we know how to do. − Dilbert
  • Information
    • Information in all its forms – input, processed, output – is a core component of any IT system
    • Applications exist to process data supplied by users and other applications
    • Data breathes life into applications
    • Data is stored and managed by infrastructure – hardware and software
    • Data is a key organisation asset with a substantial value
    • Significant responsibilities are imposed on organisations in managing data
    • (Diagram: information at the centre of IT systems, surrounded by applications, processes, people and infrastructure)
  • Data, Information and Knowledge
    • Data is the representation of facts as text, numbers, graphics, images, sound or video
    • Data is the raw material used to create information
    • Facts are captured, stored, and expressed as data
    • Information is data in context
    • Without context, data is meaningless – we create meaningful information by interpreting the context around data
    • Knowledge is information in perspective, integrated into a viewpoint based on the recognition and interpretation of patterns, such as trends, formed with other information and experience
    • Knowledge is about understanding the significance of information
    • Knowledge enables effective action
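
The distinction above between data, information and knowledge can be illustrated with a short sketch. The Python example below uses hypothetical names and values: raw numbers are treated as data, attaching context produces information, and interpreting a pattern across that information produces knowledge that enables action.

```python
# Minimal sketch (hypothetical names and values) of the data -> information ->
# knowledge progression: raw values gain meaning from context, and knowledge
# emerges from interpreting patterns across information.

from dataclasses import dataclass
from datetime import date
from statistics import mean

# Data: bare numbers with no context.
raw_readings = [41250.0, 43100.0, 47800.0]

@dataclass
class Information:
    """Data placed in context: what it measures, for whom, when, in what unit."""
    value: float
    measure: str
    subject: str
    period: date
    unit: str

# Information: the same numbers interpreted as monthly sales for one customer.
monthly_sales = [
    Information(v, "monthly_sales", "customer_0042", date(2010, m, 1), "EUR")
    for m, v in enumerate(raw_readings, start=1)
]

def interpret(history: list[Information]) -> str:
    """Knowledge: a pattern recognised across information that enables action."""
    values = [i.value for i in history]
    if values[-1] > mean(values[:-1]):
        return "Sales are trending upward - consider increasing stock levels."
    return "Sales are flat or falling - review pricing and promotion."

if __name__ == "__main__":
    print(interpret(monthly_sales))
```
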
  • Data, Information, Knowledge and Action
    • (Diagram: progression from data to information to knowledge to action)
  • Information is an Organisation Asset
    • Tangible organisation assets are seen as having a value and are managed and controlled using inventory and asset management systems and procedures
    • Data, because it is less tangible, is less widely perceived as a real asset, assigned a real value and managed as if it had a value
    • High quality, accurate and available information is a prerequisite to the effective operation of any organisation
  • Data Management and Project Success
    • Data is fundamental to the effective and efficient operation of any solution
      − Right data
      − Right time
      − Right tools and facilities
    • Without data the solution has no purpose
    • Data is too often overlooked in projects
    • Project managers frequently do not appreciate the complexity of data issues
  • Generalised Information Management Lifecycle
    • A generalised lifecycle that differs for specific information types
    • Lifecycle stages: Enter, Create, Acquire, Derive, Update, Capture → Store, Manage, Replicate and Distribute → Protect and Recover → Archive and Recall → Delete/Remove
    • Manage, Control and Administer applies across all stages
    • Design, define and implement a framework to manage information through this lifecycle
  • Expanded Generalised Information Management Lifecycle
    • Includes phases for information management lifecycle design and for implementation of appropriate hardware and software to actualise the lifecycle
    • Expanded stages: Plan, Design and Specify → Implement Underlying Infrastructure → Enter, Create, Acquire, Derive, Update, Capture → Store, Manage, Replicate and Distribute → Protect and Recover → Archive and Recall → Delete/Remove
    • Design, Implement, Manage, Control and Administer applies across all stages
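
The lifecycle stages above can be modelled as a simple state machine. The sketch below uses the stage names from the slides; the allowed transitions are illustrative assumptions rather than part of the framework.

```python
# Sketch of the generalised information management lifecycle as a simple state
# machine. The stage names come from the slides; the allowed-transition rules
# are illustrative assumptions.

from enum import Enum, auto

class LifecycleStage(Enum):
    PLAN_DESIGN_SPECIFY = auto()          # Plan, Design and Specify
    IMPLEMENT_INFRASTRUCTURE = auto()     # Implement Underlying Infrastructure
    CREATE_ACQUIRE = auto()               # Enter, Create, Acquire, Derive, Update, Capture
    STORE_DISTRIBUTE = auto()             # Store, Manage, Replicate and Distribute
    PROTECT_RECOVER = auto()              # Protect and Recover
    ARCHIVE_RECALL = auto()               # Archive and Recall
    DELETE_REMOVE = auto()                # Delete / Remove

# Assumed transitions: the lifecycle largely flows forward, with archive/recall
# looping back into active storage. "Manage, Control and Administer" spans all stages.
ALLOWED_TRANSITIONS = {
    LifecycleStage.PLAN_DESIGN_SPECIFY: {LifecycleStage.IMPLEMENT_INFRASTRUCTURE},
    LifecycleStage.IMPLEMENT_INFRASTRUCTURE: {LifecycleStage.CREATE_ACQUIRE},
    LifecycleStage.CREATE_ACQUIRE: {LifecycleStage.STORE_DISTRIBUTE},
    LifecycleStage.STORE_DISTRIBUTE: {LifecycleStage.PROTECT_RECOVER,
                                      LifecycleStage.ARCHIVE_RECALL,
                                      LifecycleStage.DELETE_REMOVE},
    LifecycleStage.PROTECT_RECOVER: {LifecycleStage.STORE_DISTRIBUTE},
    LifecycleStage.ARCHIVE_RECALL: {LifecycleStage.STORE_DISTRIBUTE,
                                    LifecycleStage.DELETE_REMOVE},
    LifecycleStage.DELETE_REMOVE: set(),
}

def can_move(current: LifecycleStage, target: LifecycleStage) -> bool:
    """Return True if the lifecycle (as sketched here) permits the transition."""
    return target in ALLOWED_TRANSITIONS[current]

if __name__ == "__main__":
    print(can_move(LifecycleStage.STORE_DISTRIBUTE, LifecycleStage.ARCHIVE_RECALL))  # True
    print(can_move(LifecycleStage.DELETE_REMOVE, LifecycleStage.CREATE_ACQUIRE))     # False
```
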
  • Data and Information Management
    • Data and information management is a business process consisting of the planning and execution of policies, practices, and projects that acquire, control, protect, deliver, and enhance the value of data and information assets
  • Data and Information Management
    • To manage and utilise information as a strategic asset
    • To implement processes, policies, infrastructure and solutions to govern, protect, maintain and use information
    • To make relevant and correct information available in all business processes and IT systems for the right people in the right context at the right time with the appropriate security and with the right quality
    • To exploit information in business decisions, processes and relations
  • Data Management Goals
    • Primary goals
      − To understand the information needs of the enterprise and all its stakeholders
      − To capture, store, protect, and ensure the integrity of data assets
      − To continually improve the quality of data and information, including the accuracy, integrity, integration, relevance and usefulness of data
      − To ensure privacy and confidentiality, and to prevent unauthorised, inappropriate use of data and information
      − To maximise the effective use and value of data and information assets
  • Data Management Goals
    • Secondary goals
      − To control the cost of data management
      − To promote a wider and deeper understanding of the value of data assets
      − To manage information consistently across the enterprise
      − To align data management efforts and technology with business needs
  • Triggers for a Data Management Initiative
    • When an enterprise is about to undertake architectural transformation, data management issues need to be understood and addressed
    • A structured and comprehensive approach to data management enables the effective use of data to capitalise on its competitive advantages
  • Data Management Principles
    • Data and information are valuable enterprise assets
    • Manage data and information carefully, like any other asset, by ensuring adequate quality, security, integrity, protection, availability, understanding and effective use
    • Share responsibility for data management between business data owners and IT data management professionals
    • Data management is a business function and a set of related disciplines
  • Organisation Data Management Function
    • The business function of planning for, controlling and delivering data and information assets
    • Development, execution, and supervision of plans, policies, programs, projects, processes, practices and procedures that control, protect, deliver, and enhance the value of data and information assets
    • The scope of the data management function and the scale of its implementation vary widely with the size, means, and experience of organisations
    • The role of data management remains the same across organisations even though implementation differs widely
  • Scope of the Complete Data Management Function
    • Data Governance
    • Data Architecture Management
    • Data Development
    • Data Operations Management
    • Data Security Management
    • Data Quality Management
    • Reference and Master Data Management
    • Data Warehousing and Business Intelligence Management
    • Document and Content Management
    • Metadata Management
  • Shared Role Between Business and IT
    • Data management is a shared responsibility between data management professionals within IT and the business data owners representing the interests of data producers and information consumers
    • Business data ownership is concerned with accountability for business responsibilities in data management
    • Business data owners are data subject matter experts
    • They represent the data interests of the business and take responsibility for the quality and use of data
  • Why Develop and Implement a Data Management Framework?
    • Improve organisation data management efficiency
    • Deliver better service to the business
    • Improve the cost-effectiveness of data management
    • Match the requirements of the business to the management of the data
    • Embed handling of compliance and regulatory rules into the data management framework
    • Achieve consistency in data management across systems and applications
    • Enable growth and change more easily
    • Reduce data management and administration effort and cost
    • Assist in the selection and implementation of appropriate data management solutions
    • Implement a technology-independent data architecture
  • Data Management Issues
  • Data Management Issues
    • Discovery – cannot find the right information
    • Integration – cannot manipulate and combine information
    • Insight – cannot extract value and knowledge from information
    • Dissemination – cannot consume information
    • Management – cannot manage and control information volumes and growth
  • Data Management Problems – User View
    • Managing Storage Equipment
    • Application Recoveries / Backup Retention
    • Vendor Management
    • Power Management
    • Regulatory Compliance
    • Lack of Integrated Tools
    • Dealing with Performance Problems
    • Data Mobility
    • Archiving and Archive Management
    • Storage Provisioning
    • Managing Complexity
    • Managing Costs
    • Backup Administration and Management
    • Proper Capacity Forecasting and Storage Reporting
    • Managing Storage Growth
  • Information Management Challenges
    • Explosive Data Growth
      − The value and volume of data is overwhelming
      − More data is seen as critical
      − Annual growth rate of 50+%
    • Compliance Requirements
      − Compliance with stringent regulatory requirements and audit procedures
    • Fragmented Storage Environment
      − Lack of an enterprise-wide hardware and software data storage strategy and discipline
    • Budgets
      − Frozen or being cut
  • Data Quality
    • Poor data quality costs real money
    • Process efficiency is negatively impacted by poor data quality
    • The full potential benefits of new systems may not be realised because of poor data quality
    • Decision making is negatively affected by poor data quality
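
Data quality problems of the kind described above are usually made visible through profiling. The sketch below shows minimal completeness, validity and uniqueness checks over a few hypothetical customer records; the field names and rules are invented for illustration, and a real profile would be driven by agreed data quality rules.

```python
# Illustrative sketch of basic data quality profiling: completeness, validity and
# uniqueness checks over a handful of customer records. Field names, values and
# the email rule are hypothetical.

import re

records = [
    {"customer_id": "C001", "email": "anne@example.com",  "country": "IE"},
    {"customer_id": "C002", "email": "",                  "country": "IE"},
    {"customer_id": "C003", "email": "bob(at)example",    "country": ""},
    {"customer_id": "C003", "email": "carol@example.com", "country": "UK"},  # duplicate key
]

EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def completeness(rows, field):
    """Proportion of rows where the field is populated."""
    return sum(1 for r in rows if r.get(field)) / len(rows)

def validity(rows, field, pattern):
    """Proportion of populated values matching the expected pattern."""
    populated = [r[field] for r in rows if r.get(field)]
    if not populated:
        return 0.0
    return sum(1 for v in populated if pattern.match(v)) / len(populated)

def uniqueness(rows, field):
    """Proportion of rows whose key value is unique."""
    values = [r[field] for r in rows]
    return len(set(values)) / len(values)

if __name__ == "__main__":
    print(f"email completeness:   {completeness(records, 'email'):.0%}")
    print(f"email validity:       {validity(records, 'email', EMAIL_PATTERN):.0%}")
    print(f"country completeness: {completeness(records, 'country'):.0%}")
    print(f"customer_id unique:   {uniqueness(records, 'customer_id'):.0%}")
```
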
  • State of Information and Data Governance
    • Information and Data Governance Report, April 2008
      − International Association for Information and Data Quality (IAIDQ)
      − University of Arkansas at Little Rock, Information Quality Program (UALR-IQ)
  • Your Organisation Recognises and Values Information as a Strategic Asset and Manages it Accordingly
    • Strongly Disagree: 3.4%
    • Disagree: 21.5%
    • Neutral: 17.1%
    • Agree: 39.5%
    • Strongly Agree: 18.5%
  • Direction of Change in the Results and Effectiveness of the Organisation's Formal or Informal Information/Data Governance Processes Over the Past Two Years
    • Results and Effectiveness Have Significantly Improved: 8.8%
    • Results and Effectiveness Have Improved: 50.0%
    • Results and Effectiveness Have Remained Essentially the Same: 31.9%
    • Results and Effectiveness Have Worsened: 3.9%
    • Results and Effectiveness Have Significantly Worsened: 0.0%
    • Don't Know: 5.4%
  • Perceived Effectiveness of the Organisation's Current Formal or Informal Information/Data Governance Processes
    • Excellent (All Goals are Met): 2.5%
    • Good (Most Goals are Met): 21.1%
    • OK (Some Goals are Met): 51.5%
    • Poor (Few Goals are Met): 19.1%
    • Very Poor (No Goals are Met): 3.9%
    • Don't Know: 2.0%
  • Actual Information/Data Governance Effectiveness vs. Organisation's Perception
    • It is Better Than Most People Think: 20.1%
    • It is the Same as Most People Think: 32.4%
    • It is Worse Than Most People Think: 35.8%
    • Don't Know: 11.8%
  • Current Status of Organisation's Information/Data Governance Initiatives
    • Started an Information/Data Governance Initiative, but Discontinued the Effort: 1.5%
    • Considered a Focused Information/Data Governance Effort but Abandoned the Idea: 0.5%
    • None Being Considered - Keeping the Status Quo: 7.4%
    • Exploring, Still Seeking to Learn More: 20.1%
    • Evaluating Alternative Frameworks and Information Governance Structures: 23.0%
    • Now Planning an Implementation: 13.2%
    • First Iteration Implemented in the Past 2 Years: 19.1%
    • First Iteration in Place for More Than 2 Years: 8.8%
    • Don't Know: 6.4%
  • Expected Changes in Organisation's Information/Data Governance Efforts Over the Next Two Years
    • Will Increase Significantly: 46.6%
    • Will Increase Somewhat: 39.2%
    • Will Remain the Same: 10.8%
    • Will Decrease Somewhat: 1.0%
    • Will Decrease Significantly: 0.5%
    • Don't Know: 2.0%
  • Overall Objectives of Information/Data Governance Efforts
    • Improve Data Quality: 80.2%
    • Establish Clear Decision Rules and Decision-making Processes for Shared Data: 65.6%
    • Increase the Value of Data Assets: 59.4%
    • Provide a Mechanism to Resolve Data Issues: 56.8%
    • Involve Non-IT Personnel in Data Decisions IT Should Not Make by Itself: 55.7%
    • Promote Interdependencies and Synergies Between Departments or Business Units: 49.6%
    • Enable Joint Accountability for Shared Data: 45.3%
    • Involve IT in Data Decisions Non-IT Personnel Should Not Make by Themselves: 35.4%
    • Other: 5.2%
    • None Applicable: 1.0%
    • Don't Know: 2.6%
  • Change in Organisation's Information/Data Quality Over the Past Two Years
    • Information/Data Quality Has Significantly Improved: 10.5%
    • Information/Data Quality Has Improved: 68.4%
    • Information/Data Quality Has Remained Essentially the Same: 15.8%
    • Information/Data Quality Has Worsened: 3.5%
    • Information/Data Quality Has Significantly Worsened: 0.0%
    • Don't Know: 1.8%
  • Maturity of Information/Data Governance Goal Setting and Measurement in Your Organisation
    • 5 - Optimised: 3.7%
    • 4 - Managed: 11.8%
    • 3 - Defined: 26.7%
    • 2 - Repeatable: 28.9%
    • 1 - Ad-hoc: 28.9%
  • Maturity of Information/Data Governance Processes and Policies in Your Organisation
    • 5 - Optimised: 1.6%
    • 4 - Managed: 4.8%
    • 3 - Defined: 24.5%
    • 2 - Repeatable: 46.3%
    • 1 - Ad-hoc: 22.9%
  • Maturity of Responsibility and Accountability for Information/Data Governance Among Employees in Your Organisation
    • 5 - Optimised: 6.9%
    • 4 - Managed: 3.2%
    • 3 - Defined: 31.7%
    • 2 - Repeatable: 25.4%
    • 1 - Ad-hoc: 32.8%
  • Other Data Management Frameworks
  • Other Data Management-Related Frameworks
    • TOGAF (and other enterprise architecture standards) defines a process for arriving at an enterprise architecture definition, including data
    • TOGAF has a phase relating to data architecture
    • TOGAF deals with the high level; DMBOK translates the high level into specific details
    • COBIT is concerned with IT governance and controls:
      − IT must implement internal controls around how it operates
      − The systems IT delivers to the business and the underlying business processes these systems actualise must be controlled – these are controls external to IT
      − To govern IT effectively, COBIT defines the activities and risks within IT that need to be managed
    • COBIT has a process relating to data management
    • Neither TOGAF nor COBIT is concerned with detailed data management design and implementation
  • DMBOK, TOGAF and COBIT
    • DMBOK is a specific and comprehensive data-oriented framework; it provides the detail for the definition, implementation and operation of data management and utilisation
    • TOGAF defines the process for creating a data architecture as part of an overall enterprise architecture, and can be a precursor to implementing data management
    • COBIT provides data governance as part of overall IT governance, and can provide a maturity model for assessing data management
  • DMBOK, TOGAF and COBIT – Scope and Overlap
    • (Diagram: DMBOK spans the full set of data management functions – data governance, data architecture management, data development, data operations management, data security management, data quality management, reference and master data management, data warehousing and business intelligence management, document and content management and metadata management; TOGAF overlaps mainly on data architecture management and data migration, and COBIT on data governance, data security management and data management)
  • TOGAF and Data Management
    • Phase C1 (a subset of Phase C) relates to defining a data architecture
    • TOGAF phases: Phase A: Architecture Vision; Phase B: Business Architecture; Phase C: Information Systems Architecture (Phase C1: Data Architecture, Phase C2: Application Architecture); Phase D: Technology Architecture; Phase E: Opportunities and Solutions; Phase F: Migration Planning; Phase G: Implementation Governance; Phase H: Architecture Change Management; with Requirements Management at the centre
  • TOGAF Phase C1: Information Systems Architectures - Data Architecture - Objectives
    • The purpose is to define the major types and sources of data necessary to support the business, in a way that is:
      − Understandable by stakeholders
      − Complete and consistent
      − Stable
    • Define the data entities relevant to the enterprise
    • Not concerned with the design of logical or physical storage systems or databases
  • TOGAF Phase C1: Information Systems Architectures - Data Architecture - Overview
    • Approach: Key Considerations for Data Architecture
    • Inputs: Reference Materials External to the Enterprise, Architecture Repository, Non-Architectural Inputs, Architectural Inputs
    • Steps: Select Reference Models, Viewpoints, and Tools; Develop Baseline Data Architecture Description; Develop Target Data Architecture Description; Perform Gap Analysis; Define Roadmap Components; Resolve Impacts Across the Architecture Landscape; Conduct Formal Stakeholder Review; Finalise the Data Architecture; Create Architecture Definition Document
  • TOGAF Phase C1: Information Systems Architectures - Data Architecture - Approach - Key Considerations for Data Architecture
    • Data Management
      − Important to understand and address data management issues
      − A structured and comprehensive approach to data management enables the effective use of data to capitalise on its competitive advantages
      − Clear definition of which application components in the landscape will serve as the system of record or reference for enterprise master data
      − Will there be an enterprise-wide standard that all application components, including software packages, need to adopt
      − Understand how data entities are utilised by business functions, processes, and services
      − Understand how and where enterprise data entities are created, stored, transported, and reported
      − The level and complexity of data transformations required to support the information exchange needs between applications
      − The requirement for software in supporting data integration with external organisations
  • TOGAF Phase C1: Information Systems Architectures - Data Architecture - Approach - Key Considerations for Data Architecture
    • Data Migration
      − Identify data migration requirements and also provide indicators as to the level of transformation for new/changed applications
      − Ensure the target application has quality data when it is populated
      − Ensure an enterprise-wide common data definition is established to support the transformation
  • TOGAF Phase C1: Information Systems Architectures - Data Architecture - Approach - Key Considerations for Data Architecture
    • Data Governance
      − Ensures that the organisation has the necessary dimensions in place to enable the data transformation
      − Structure – ensures the organisation has the necessary structure and standards bodies to manage the data entity aspects of the transformation
      − Management System – ensures the organisation has the necessary management system and data-related programs to manage the governance aspects of data entities throughout their lifecycle
      − People – addresses what data-related skills and roles the organisation requires for the transformation
  • TOGAF Phase C1: Information Systems Architectures - Data Architecture - Outputs
    • Refined and updated versions of the Architecture Vision phase deliverables
      − Statement of Architecture Work
      − Validated data principles, business goals, and business drivers
    • Draft Architecture Definition Document
      − Baseline Data Architecture
      − Target Data Architecture, including the business data model, logical data model, data management process models and Data Entity/Business Function matrix
      − Views corresponding to the selected viewpoints addressing key stakeholder concerns
    • Draft Architecture Requirements Specification
      − Gap analysis results
      − Data interoperability requirements
      − Relevant technical requirements
      − Constraints on the Technology Architecture about to be designed
      − Updated business requirements
      − Updated application requirements
    • Data Architecture components of an Architecture Roadmap
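
One of the Target Data Architecture outputs listed above is a Data Entity/Business Function matrix. The sketch below builds a small CRUD-style matrix; the entities, functions and create/read/update/delete assignments are hypothetical and only illustrate the deliverable's shape.

```python
# Hypothetical sketch of a Data Entity / Business Function matrix, one of the
# Target Data Architecture outputs listed above. Entities, functions and the
# C/R/U/D assignments are invented for illustration.

entities = ["Customer", "Order", "Product"]
functions = ["Sales", "Fulfilment", "Finance"]

# usage[(entity, function)] -> subset of C (create), R (read), U (update), D (delete)
usage = {
    ("Customer", "Sales"): "CRU",
    ("Customer", "Finance"): "R",
    ("Order", "Sales"): "CR",
    ("Order", "Fulfilment"): "RU",
    ("Order", "Finance"): "R",
    ("Product", "Sales"): "R",
    ("Product", "Fulfilment"): "R",
}

def print_matrix():
    """Render the entity/function matrix as a simple text table."""
    width = max(len(f) for f in functions) + 2
    print("Entity".ljust(12) + "".join(f.ljust(width) for f in functions))
    for e in entities:
        row = e.ljust(12)
        for f in functions:
            row += usage.get((e, f), "-").ljust(width)
        print(row)

if __name__ == "__main__":
    print_matrix()
```
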
  • COBIT Structure
    • Plan and Organise (PO)
      − PO1 Define a strategic IT plan
      − PO2 Define the information architecture
      − PO3 Determine technological direction
      − PO4 Define the IT processes, organisation and relationships
      − PO5 Manage the IT investment
      − PO6 Communicate management aims and direction
      − PO7 Manage IT human resources
      − PO8 Manage quality
      − PO9 Assess and manage IT risks
      − PO10 Manage projects
    • Acquire and Implement (AI)
      − AI1 Identify automated solutions
      − AI2 Acquire and maintain application software
      − AI3 Acquire and maintain technology infrastructure
      − AI4 Enable operation and use
      − AI5 Procure IT resources
      − AI6 Manage changes
      − AI7 Install and accredit solutions and changes
    • Deliver and Support (DS)
      − DS1 Define and manage service levels
      − DS2 Manage third-party services
      − DS3 Manage performance and capacity
      − DS4 Ensure continuous service
      − DS5 Ensure systems security
      − DS6 Identify and allocate costs
      − DS7 Educate and train users
      − DS8 Manage service desk and incidents
      − DS9 Manage the configuration
      − DS10 Manage problems
      − DS11 Manage data
      − DS12 Manage the physical environment
      − DS13 Manage operations
    • Monitor and Evaluate (ME)
      − ME1 Monitor and evaluate IT performance
      − ME2 Monitor and evaluate internal control
      − ME3 Ensure regulatory compliance
      − ME4 Provide IT governance
  • COBIT and Data Management
    • COBIT objective DS11 Manage Data sits within the Deliver and Support (DS) domain
    • Effective data management requires identification of data requirements
    • The data management process includes establishing effective procedures to manage the media library, backup and recovery of data and proper disposal of media
    • Effective data management helps ensure the quality, timeliness and availability of business data
  • COBIT and Data Management
    • The objective is control over the IT process of managing data that meets the business requirement for IT of optimising the use of information and ensuring information is available as required
    • Focuses on maintaining the completeness, accuracy, availability and protection of data
    • Involves taking actions
      − Backing up data and testing restoration
      − Managing onsite and offsite storage of data
      − Securely disposing of data and equipment
    • Measured by
      − User satisfaction with availability of data
      − Percent of successful data restorations
      − Number of incidents where sensitive data were retrieved after media were disposed of
  • COBIT Process DS11 Manage Data
    • DS11.1 Business Requirements for Data Management
      − Establish arrangements to ensure that source documents expected from the business are received, all data received from the business are processed, all output required by the business is prepared and delivered, and restart and reprocessing needs are supported
    • DS11.2 Storage and Retention Arrangements
      − Define and implement procedures for data storage and archival, so data remain accessible and usable
      − Procedures should consider retrieval requirements, cost-effectiveness, continued integrity and security requirements
      − Establish storage and retention arrangements to satisfy legal, regulatory and business requirements for documents, data, archives, programmes, reports and messages (incoming and outgoing) as well as the data (keys, certificates) used for their encryption and authentication
    • DS11.3 Media Library Management System
      − Define and implement procedures to maintain an inventory of onsite media and ensure their usability and integrity
      − Procedures should provide for timely review and follow-up on any discrepancies noted
    • DS11.4 Disposal
      − Define and implement procedures to prevent access to sensitive data and software from equipment or media when they are disposed of or transferred to another use
      − Procedures should ensure that data marked as deleted or to be disposed cannot be retrieved
    • DS11.5 Backup and Restoration
      − Define and implement procedures for backup and restoration of systems, data and documentation in line with business requirements and the continuity plan
      − Verify compliance with the backup procedures, and verify the ability to perform, and the time required for, successful and complete restoration
      − Test backup media and the restoration process
    • DS11.6 Security Requirements for Data Management
      − Establish arrangements to identify and apply security requirements applicable to the receipt, processing, physical storage and output of data and sensitive messages
      − Includes physical records, data transmissions and any data stored offsite
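
DS11.5 calls for verifying the ability to complete a successful restoration and for testing the restoration process. The sketch below shows one way such a check might look: back up a file, restore it to a separate location and compare checksums. The paths and the plain-copy backup mechanism are illustrative assumptions, not a COBIT-prescribed implementation.

```python
# Sketch of a DS11.5-style check: back up a file, "restore" it elsewhere and
# prove the restored copy is complete by comparing checksums. The plain-copy
# backup mechanism and paths are illustrative assumptions.

import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256(path: Path) -> str:
    """Checksum used to show the restored copy is byte-identical to the source."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def backup_and_verify(source: Path, backup_dir: Path, restore_dir: Path) -> bool:
    """Back up, restore and verify; returns True only if restoration is complete."""
    backup_copy = backup_dir / source.name
    restored_copy = restore_dir / source.name
    shutil.copy2(source, backup_copy)          # "backup"
    shutil.copy2(backup_copy, restored_copy)   # "restoration" from the backup copy
    return sha256(source) == sha256(restored_copy)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        src = tmp_path / "orders.csv"
        src.write_text("order_id,amount\n1001,250.00\n")
        (tmp_path / "backup").mkdir()
        (tmp_path / "restore").mkdir()
        ok = backup_and_verify(src, tmp_path / "backup", tmp_path / "restore")
        print("restoration verified" if ok else "restoration FAILED")
```
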
  • COBIT Data Management Goals and Metrics
    • Activity Goals: backing up data and testing restoration; managing onsite and offsite storage of data; securely disposing of data and equipment
    • Process Goals: maintain the completeness, accuracy, validity and accessibility of stored data; secure data during disposal of media; effectively manage storage media
    • Key Performance Indicators: frequency of testing of backup media; average time for data restoration
    • Process Key Goal Indicators: % of successful data restorations; number of incidents where sensitive data were retrieved after media were disposed of; number of downtime or data integrity incidents caused by insufficient storage capacity
    • IT Key Goal Indicators: occurrences of inability to recover data critical to the business process; user satisfaction with availability of data; incidents of noncompliance with laws due to storage management issues
    • Goals are measured by their indicators: key performance indicators measure the activity goals and drive the process key goal indicators, which in turn drive the IT key goal indicators
  • Data Management Book of Knowledge (DMBOK)
  • Data Management Book of Knowledge (DMBOK)
    • DMBOK is a generalised and comprehensive framework for managing data across the entire lifecycle
    • Developed by DAMA (the Data Management Association)
    • DMBOK provides a detailed framework to assist the development and implementation of data management processes and procedures and ensures all requirements are addressed
    • Enables effective and appropriate data management across the organisation
    • Provides awareness and visibility of data management issues and requirements
  • Data Management Book of Knowledge (DMBOK)
    • Not a solution to your data management needs
    • A framework and methodology for developing and implementing an appropriate solution
    • A generalised framework to be customised to meet specific needs
    • Provides a work breakdown structure for a data management project to allow the effort to be assessed
    • No magic bullet
  • Scope and Structure of Data Management Book of Knowledge (DMBOK)
    • Two dimensions: Data Management Functions and Data Management Environmental Elements
  • DMBOK Data Management Functions
    • Data Governance
    • Data Architecture Management
    • Data Development
    • Data Operations Management
    • Data Security Management
    • Data Quality Management
    • Reference and Master Data Management
    • Data Warehousing and Business Intelligence Management
    • Document and Content Management
    • Metadata Management
  • DMBOK Data Management Functions
    • Data Governance - planning, supervision and control over data management and use
    • Data Architecture Management - defining the blueprint for managing data assets
    • Data Development - analysis, design, implementation, testing, deployment, maintenance
    • Data Operations Management - providing support from data acquisition to purging
    • Data Security Management - ensuring privacy, confidentiality and appropriate access
    • Data Quality Management - defining, monitoring and improving data quality
    • Reference and Master Data Management - managing master versions and replicas
    • Data Warehousing and Business Intelligence Management - enabling reporting and analysis
    • Document and Content Management - managing data found outside of databases
    • Metadata Management - integrating, controlling and providing metadata
  • DMBOK Data Management Environmental Elements
    • Goals and Principles
    • Activities
    • Primary Deliverables
    • Roles and Responsibilities
    • Practices and Techniques
    • Technology
    • Organisation and Culture
  • DMBOK Data Management Environmental Elements
    • Goals and Principles - the directional business goals of each function and the fundamental principles that guide performance of each function
    • Activities - each function is composed of lower level activities, sub-activities, tasks and steps
    • Primary Deliverables - information and physical databases and documents created as interim and final outputs of each function; some deliverables are essential, some are generally recommended, and others are optional depending on circumstances
    • Roles and Responsibilities - the business and IT roles involved in performing and supervising the function, and the specific responsibilities of each role in that function; many roles will participate in multiple functions
    • Practices and Techniques - common and popular methods and procedures used to perform the processes and produce the deliverables; may also include common conventions, best practice recommendations, and alternative approaches without elaboration
    • Technology - categories of supporting technology such as software tools, standards and protocols, product selection criteria and learning curves
    • Organisation and Culture - issues such as management metrics, critical success factors, reporting structures, budgeting, resource allocation, expectations and attitudes, style, culture, and the approach to change management
  • DMBOK Data Management Functions and Environmental Elements
    • (Matrix: the scope of each of the ten data management functions is described against the seven environmental elements - goals and principles, activities, primary deliverables, roles and responsibilities, practices and techniques, technology, and organisation and culture)
  • Scope of the Data Management Book of Knowledge (DMBOK) Data Management Framework
    • Hierarchy: Function → Activity → Sub-Activity (not in all cases)
    • Each activity is classified as one (or more) of:
      − Planning Activities (P) - activities that set the strategic and tactical course for other data management activities; may be performed on a recurring basis
      − Development Activities (D) - activities undertaken within implementation projects and recognised as part of the systems development lifecycle (SDLC), creating data deliverables through analysis, design, building, testing, preparation, and deployment
      − Control Activities (C) - supervisory activities performed on an ongoing basis
      − Operational Activities (O) - service and support activities performed on an ongoing basis
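
The function, activity and activity-group structure described above can be represented as a simple data model. In the sketch below, the sample activities are taken from the Data Governance function and the planning/control tags follow the split shown on a later slide; the data structures themselves are illustrative assumptions, not DMBOK artefacts.

```python
# Sketch of the DMBOK hierarchy (function -> activity) with each activity tagged
# by activity group: Planning (P), Development (D), Control (C), Operational (O).
# Sample activities come from the Data Governance function; the structure is an
# illustrative assumption.

from dataclasses import dataclass, field

@dataclass
class Activity:
    name: str
    groups: set[str]  # any of {"P", "D", "C", "O"}

@dataclass
class DataManagementFunction:
    name: str
    activities: list[Activity] = field(default_factory=list)

    def activities_in_group(self, group: str) -> list[str]:
        """Select activities when scoping a sub-project to one activity group."""
        return [a.name for a in self.activities if group in a.groups]

data_governance = DataManagementFunction(
    "Data Governance",
    [
        Activity("Understand Strategic Enterprise Data Needs", {"P"}),
        Activity("Develop and Maintain the Data Strategy", {"P"}),
        Activity("Establish Data Governance and Stewardship Organisations", {"P"}),
        Activity("Manage and Resolve Data Related Issues", {"C"}),
        Activity("Monitor and Ensure Regulatory Compliance", {"C"}),
        Activity("Communicate and Promote the Value of Data Assets", {"C"}),
    ],
)

if __name__ == "__main__":
    print("Planning scope:", data_governance.activities_in_group("P"))
    print("Control scope:", data_governance.activities_in_group("C"))
```
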
  • Activity Groups Within Functions
    • Activity groups (Planning, Development, Control and Operational Activities) are classifications of data management activities
    • Use the activity groupings to define the scope of data management sub-projects and identify the appropriate tasks:
      − Analysis and design
      − Implementation
      − Operational improvement
      − Management and administration
  • DMBOK Function and Activity Structure
    • (Matrix: the activities within each of the ten data management functions, from Data Governance through Metadata Management)
  • DMBOK Function and Activity - Planning Activities
    • (The function and activity matrix with the planning activities highlighted)
  • DMBOK Function and Activity - Control Activities
    • (The function and activity matrix with the control activities highlighted)
  • DMBOK Function and Activity - Development Activities
    • (The function and activity matrix with the development activities highlighted)
  • DMBOK Function and Activity - Operational Activities
    • (The function and activity matrix with the operational activities highlighted)
  • DMBOK Environmental Elements Structure
    • Goals and Principles: Vision and Mission, Business Benefits, Strategic Goals, Specific Objectives, Guiding Principles
    • Activities: Phases, Tasks, Steps, Dependencies, Sequence and Flow, Use Cases and Scenarios, Trigger Events
    • Primary Deliverables: Inputs and Outputs, Information, Documents, Databases, Other Resources
    • Roles and Responsibilities: Individual Roles, Organisation Roles, Business and IT Roles, Qualifications and Skills
    • Practices and Techniques: Recognised Best Practices, Common Approaches, Alternative Techniques
    • Technology: Tool Categories, Standards and Protocols, Selection Criteria, Learning Curves
    • Organisation and Culture: Critical Success Factors, Reporting Structures, Management Metrics, Values, Beliefs, Expectations, Attitudes, Styles, Preferences, Teamwork, Group Dynamics, Authority, Empowerment, Contracting Strategies, Change Management Approach
  • DMBOK Environmental Elements
  • Data Governance
  • Data Governance
    • The core function of the Data Management Framework
    • Interacts with and influences each of the surrounding ten data management functions
    • Data governance is the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets
    • The data governance function guides how all other data management functions are performed
    • High-level, executive data stewardship
    • Data governance is not the same thing as IT governance
    • Data governance is focused exclusively on the management of data assets
  • Data Governance – Definition and Goals
    • Definition
      − The exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets
    • Goals
      − To define, approve, and communicate data strategies, policies, standards, architecture, procedures, and metrics
      − To track and enforce regulatory compliance and conformance to data policies, standards, architecture, and procedures
      − To sponsor, track, and oversee the delivery of data management projects and services
      − To manage and resolve data related issues
      − To understand and promote the value of data assets
  • Data Governance - Overview
    • Inputs: Business Goals, Business Strategies, IT Objectives, IT Strategies, Data Needs, Data Issues, Regulatory Requirements
    • Suppliers: Business Executives, IT Executives, Data Stewards, Regulatory Bodies
    • Participants: Executive Data Stewards, Coordinating Data Stewards, Business Data Stewards, Data Professionals, DM Executive, CIO
    • Primary Deliverables: Data Policies, Data Standards, Resolved Issues, Data Management Projects and Services, Quality Data and Information, Recognised Data Value
    • Consumers: Data Producers, Knowledge Workers, Managers and Executives, Data Professionals, Customers
    • Tools: Intranet Website, E-Mail, Metadata Tools, Metadata Repository, Issue Management Tools, Data Governance KPI Dashboard
    • Metrics: Data Value, Data Management Cost, Achievement of Objectives, # of Decisions Made, Steward Representation / Coverage, Data Professional Headcount, Data Management Process Maturity
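
A few of the metrics listed above (# of decisions made, steward representation/coverage, data management cost) lend themselves to straightforward calculation once the underlying records are kept. The sketch below works from hypothetical in-memory logs; the record structures and figures are invented for illustration.

```python
# Illustrative calculation of a few governance metrics named above ("# of
# Decisions Made", "Steward Representation / Coverage", "Data Management Cost").
# The decision log, business units and steward assignments are hypothetical.

from datetime import date

decision_log = [
    {"date": date(2010, 1, 12), "topic": "customer data standard", "cost": 4000.0},
    {"date": date(2010, 2, 3),  "topic": "retention policy",       "cost": 2500.0},
    {"date": date(2010, 2, 24), "topic": "master data ownership",  "cost": 6000.0},
]

business_units = ["Sales", "Finance", "Operations", "HR"]
stewards = {"Sales": "A. Murphy", "Finance": "B. Kelly", "Operations": "C. Walsh"}

def decisions_made(log):
    """Count of governance decisions recorded in the period."""
    return len(log)

def steward_coverage(units, assigned):
    """Share of business units with an appointed data steward."""
    return len([u for u in units if u in assigned]) / len(units)

def data_management_cost(log):
    """Total cost attributed to the logged governance decisions."""
    return sum(d["cost"] for d in log)

if __name__ == "__main__":
    print("Decisions made:", decisions_made(decision_log))
    print(f"Steward coverage: {steward_coverage(business_units, stewards):.0%}")
    print(f"Data management cost: {data_management_cost(decision_log):,.2f}")
```
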
  • Data Governance Function, Activities and Sub-Activities
    • Data Management Planning
      − Understand Strategic Enterprise Data Needs
      − Develop and Maintain the Data Strategy
      − Establish Data Professional Roles and Organisations
      − Identify and Appoint Data Stewards
      − Establish Data Governance and Stewardship Organisations
      − Develop and Approve Data Policies, Standards, and Procedures
      − Review and Approve Data Architecture
      − Plan and Sponsor Data Management Projects and Services
      − Estimate Data Asset Value and Associated Costs
    • Data Management Control
      − Supervise Data Professional Organisations and Staff
      − Coordinate Data Governance Activities
      − Manage and Resolve Data Related Issues
      − Monitor and Ensure Regulatory Compliance
      − Monitor and Enforce Conformance with Data Policies, Standards and Architecture
      − Oversee Data Management Projects and Services
      − Communicate and Promote the Value of Data Assets
  • Data Governance
    • Data governance is accomplished most effectively as an on-going programme and a continual improvement process
    • Every data governance programme is unique, taking into account distinctive organisational and cultural issues and the immediate data management challenges and opportunities
    • Data governance is at the core of managing data assets
  • Data Governance - Possible Organisation Structure
    • (Diagram: a Data Governance Council and the CIO at the top, supported by a Data Governance Office and Data Management Executive, with Business Unit Data Governance Councils, Data Stewardship Committees, Data Stewardship Teams and Data Technologists below)
  • Data Governance Shared Decision Making
    • (Diagram: decision areas arranged along a spectrum from business decisions, through shared decision making, to IT decisions - covering the business operating model, enterprise information needs, capital investments and research and development funding on the business side; the enterprise information model, information management strategy, policies, standards and metrics, enterprise information specifications and data governance issue resolution as shared decisions; and the database, data integration, data warehousing and business intelligence, and metadata architectures and technical metadata management on the IT side)
  • Data Stewardship
    • Formal accountability for business responsibilities ensuring effective control and use of data assets
    • A data steward is a business leader and/or recognised subject matter expert designated as accountable for these responsibilities
    • Data stewards manage data assets on behalf of others and in the best interests of the organisation
    • They represent the data interests of all stakeholders, including but not limited to the interests of their own functional departments and divisions
    • They protect, manage, and leverage the data resources
    • They must take an enterprise perspective to ensure the quality and effective use of enterprise data
  • Data Stewardship - Roles
    • Executive Data Stewards – provide data governance and make high-level data stewardship decisions
    • Coordinating Data Stewards – lead and represent teams of business data stewards in discussions across teams and with executive data stewards
    • Business Data Stewards – subject matter experts who work with data management professionals on an ongoing basis to define and control data
  • Data Stewardship Roles Across Data Management Functions - 1 All Data Stewards Executive Data Stewards Coordinating Data Business Data Stewards Stewards Data Architecture Review, validate, approve, Review and approve the Integrate specifications, Define data requirements Management maintain and refine data enterprise data resolving differences specifications architecture architecture Data Development Validate physical data Define data requirements models and database and specifications designs, participate in database testing and conversion Data Operations Define requirements for Management data recovery, retention and performance Help identify, acquire, and control externally sourced data Data Security Management Provide security, privacy and confidentiality requirements, identify and resolve data security issues, assist in data security audits, and classify information confidentiality Reference and Master Data Control the creation, Management update, and retirement of code values and other reference data, define master data management requirements, identify and help resolve issues March 8, 2010 84
  • Data Stewardship Roles Across Data Management Functions - 2 • Data Warehousing and Business Intelligence Management − Provide business intelligence requirements and management metrics − Identify and help resolve business intelligence issues • Document and Content Management − Define enterprise taxonomies and resolve content management issues • Metadata Management − Create and maintain business metadata (names, meanings, business rules) − Define metadata access and integration needs − Use metadata to make effective data stewardship and governance decisions • Data Quality Management − Define data quality requirements and business rules − Test application edits and validations − Assist in the analysis, certification, and auditing of data quality − Lead clean-up efforts − Identify ways to solve causes of poor data quality − Promote data quality awareness March 8, 2010 85
  • Data Strategy • High-level course of action to achieve high-level goals • A data strategy is a data management programme strategy - a plan for maintaining and improving data quality, integrity, security and access • Addresses all data management functions relevant to the organisation March 8, 2010 86
  • Elements of Data Strategy • Vision for data management • Summary business case for data management • Guiding principles, values, and management perspectives • Mission and long-term directional goals of data management • Management measures of data management success • Short-term data management programme objectives • Descriptions of data management roles and business units along with a summary of their responsibilities and decision rights • Descriptions of data management programme components and initiatives • Outline of the data management implementation roadmap • Scope boundaries March 8, 2010 87
  • Data Strategy • Data Management Programme Charter − Overall vision, business case, goals, guiding principles, measures of success, critical success factors, recognised risks • Data Management Scope Statement − Goals and objectives for a defined planning horizon and the roles, organisations, and individual leaders accountable for achieving these objectives • Data Management Implementation Roadmap − Identifying specific programs, projects, task assignments, and delivery milestones March 8, 2010 88
  • Data Policies • Statements of intent and fundamental rules governing the creation, acquisition, integrity, security, quality, and use of data and information • More fundamental, global, and business critical than data standards • Describe what to do and what not to do • There should be only a few data policies, stated briefly and directly March 8, 2010 89
  • Data Policies • Possible topics for data policies − Data modeling and other data development activities − Development and use of data architecture − Data quality expectations, roles, and responsibilities − Data security, including confidentiality classification policies, intellectual property policies, personal data privacy policies, general data access and usage policies, and data access by external parties − Database recovery and data retention − Access and use of externally sourced data − Sharing data internally and externally − Data warehousing and business intelligence − Unstructured data - electronic files and physical records March 8, 2010 90
  • Data Architecture • Enterprise data model and other aspects of data architecture sponsored at the data governance level • Need to pay particular attention to the alignment of the enterprise data model with key business strategies, processes, business units and systems • Includes − Data technology architecture − Data integration architecture − Data warehousing and business intelligence architecture − Metadata architecture March 8, 2010 91
  • Data Standards and Procedures • Include naming standards, requirement specification standards, data modeling standards, database design standards, architecture standards and procedural standards for each data management function • Must be effectively communicated, monitored, enforced and periodically re-evaluated • Data management procedures are the methods, techniques, and steps followed to accomplish a specific activity or task March 8, 2010 92
  • Data Standards and Procedures • Possible topics for data standards and procedures − Data modeling and architecture standards, including data naming conventions, definition standards, standard domains, and standard abbreviations − Standard business and technical metadata to be captured, maintained, and integrated − Data model management guidelines and procedures − Metadata integration and usage procedures − Standards for database recovery and business continuity, database performance, data retention, and external data acquisition − Data security standards and procedures − Reference data management control procedures − Match / merge and data cleansing standards and procedures − Business intelligence standards and procedures − Enterprise content management standards and procedures, including use of enterprise taxonomies, support for legal discovery and document and e-mail retention, electronic signatures, report formatting standards and report distribution approaches March 8, 2010 93
  • Regulatory Compliance • Most organisations are impacted by government and industry regulations • Many of these regulations dictate how data and information are to be managed • Compliance is generally mandatory • Data governance guides the implementation of adequate controls to ensure, document, and monitor compliance with data-related regulations March 8, 2010 94
  • Regulatory Compliance • Data governance needs to work with the business to find the best answers to the following regulatory compliance questions − How relevant is a regulation? − Why is it important for us? − How do we interpret it? − What policies and procedures does it require? − Do we comply now? − How do we comply now? − How should we comply in the future? − What will it take? − When will we comply? − How do we demonstrate and prove compliance? − How do we monitor compliance? − How often do we review compliance? − How do we identify and report non-compliance? − How do we manage and rectify non-compliance? March 8, 2010 95
  • Issue Management • Data governance assists in identifying, managing, and resolving data related issues − Data quality issues − Data naming and definition conflicts − Business rule conflicts and clarifications − Data security, privacy, and confidentiality issues − Regulatory non-compliance issues − Non-conformance issues (policies, standards, architecture, and procedures) − Conflicting policies, standards, architecture, and procedures − Conflicting stakeholder interests in data and information − Organisational and cultural change management issues − Issues regarding data governance procedures and decision rights − Negotiation and review of data sharing agreements March 8, 2010 96
  • Issue Management, Control and Escalation • Data governance implements issue controls and procedures − Identifying, capturing, logging and updating issues − Tracking the status of issues − Documenting stakeholder viewpoints and resolution alternatives − Objective, neutral discussions where all viewpoints are heard − Escalating issues to higher levels of authority − Determining, documenting and communicating issue resolutions. March 8, 2010 97
  • Data Management Projects • Data management roadmap sets out a course of action for initiating and/or improving data management functions • Consists of an assessment of current functions, definition of a target environment and target objectives and a transition plan outlining the steps required to reach these targets including an approach to organisational change management • Every data management project should follow the project management standards of the organisation March 8, 2010 98
  • Data Asset Valuation • Data and information are truly assets because they have business value, tangible or intangible • Different approaches to estimating the value of data assets • Identify the direct and indirect business benefits derived from use of the data • Identify the cost of data loss, identifying the impacts of not having the current amount and quality level of data March 8, 2010 99
  • Data Architecture Management March 8, 2010 100
  • Data Architecture Management • Concerned with defining and maintaining specifications that − Provide a standard common business vocabulary − Express strategic data requirements − Outline high level integrated designs to meet these requirements − Align with enterprise strategy and related business architecture • Data architecture is an integrated set of specification artifacts used to define data requirements, guide integration and control of data assets and align data investments with business strategy • Includes formal data names, comprehensive data definitions, effective data structures, precise data integrity rules, and robust data documentation March 8, 2010 101
  • Data Architecture Management – Definition and Goals • Definition − Defining the data needs of the enterprise and designing the master blueprints to meet those needs • Goals − To plan with vision and foresight to provide high quality data − To identify and define common data requirements − To design conceptual structures and plans to meet the current and long-term data requirements of the enterprise March 8, 2010 102
  • Data Architecture Management - Overview • Inputs − Business Goals − Business Strategies − Business Architecture − Process Architecture − IT Objectives − IT Strategies − Data Strategies − Data Issues and Needs − Technical Architecture • Primary Deliverables − Enterprise Data Model − Information Value Chain Analysis − Data Technology Architecture − Data Integration / MDM Architecture − DW / BI Architecture − Metadata Architecture − Enterprise Taxonomies and Namespaces − Document Management Architecture − Metadata • Suppliers − Executives − Data Stewards − Data Producers − Information Consumers • Consumers − Data Producers − Knowledge Workers − Managers and Executives − Data Professionals − Customers • Participants − Data Stewards − Subject Matter Experts (SMEs) − Data Architects − Data Analysts and Modelers − Other Enterprise Architects − DM Executive and Managers − CIO and Other Executives − Database Administrators − Data Model Administrator • Tools − Data Modeling Tools − Model Management Tool − Metadata Repository − Office Productivity Tools • Metrics − Data Value − Data Management Cost − Achievement of Objectives − # of Decisions Made − Steward Representation / Coverage − Data Professional Headcount − Data Management Process Maturity March 8, 2010 103
  • Enterprise Data Architecture • Integrated set of specifications and documents − Enterprise Data Model - the core of enterprise data architecture − Information Value Chain Analysis - aligns data with business processes and other enterprise architecture components − Related Data Delivery Architecture - including database architecture, data integration architecture, data warehousing / business intelligence architecture, document content architecture, and metadata architecture March 8, 2010 104
  • Data Architecture Management Activities • Understand Enterprise Information Needs • Develop and Maintain the Enterprise Data Model • Analyse and Align With Other Business Models • Define and Maintain the Database Architecture • Define and Maintain the Data Integration Architecture • Define and Maintain the Data Warehouse / Business Intelligence Architecture • Define and Maintain Enterprise Taxonomies and Namespaces • Define and Maintain the Metadata Architecture March 8, 2010 105
  • Understanding Enterprise Information Needs • In order to create an enterprise data architecture, the organisation must first define its information need • An enterprise data model is a way of capturing and defining enterprise information needs and data requirements • Master blueprint for enterprise-wide data integration • Enterprise data model is a critical input to all future systems development projects and the baseline for additional data requirements analysis • Evaluate the current inputs and outputs required by the organisation, both from and to internal and external targets March 8, 2010 106
  • Develop and Maintain the Enterprise Data Model • Data is the set of facts collected about business entities • Data model is a set of data specifications that reflect data requirements and designs • Enterprise data model is an integrated, subject-oriented data model defining the critical data produced and consumed across the organisation • Define and analyse data requirements • Design logical and physical data structures that support these requirements March 8, 2010 107
  • Enterprise Data Model • Components of the enterprise data model − Subject Area Model − Conceptual Data Model − Enterprise Logical Data Model − Other Enterprise Data Model Components: Data Steward Responsibility Assignments, Valid Reference Data Values, Data Quality Specifications, Entity Life Cycles March 8, 2010 108
  • Enterprise Data Model • Build an enterprise data model in layers • Focus on the most critical business subject areas March 8, 2010 109
  • Subject Area Model • List of major subject areas that collectively express the essential scope of the enterprise • Important to the success of the entire enterprise data model • List of enterprise subject areas becomes one of the most significant organisation classifications • Acceptable to organisation stakeholders • Useful as the organising framework for data governance, data stewardship, and further enterprise data modeling March 8, 2010 110
  • Conceptual Data Model • Conceptual data model defines business entities and their relationships • Business entities are the primary organisational structures in a conceptual data model • Business needs data about business entities • Include a glossary containing the business definitions and other metadata associated with business entities and their relationships • Assists improved business understanding and reconciliation of terms and their meanings • Provide the framework for developing integrated information systems to support both transactional processing and business intelligence. • Depicts how the enterprise sees information March 8, 2010 111
  • Enterprise Logical Data Models • The logical data model contains a level of detail below the conceptual data model • Contains the essential data attributes for each entity • Essential data attributes are those data attributes without which the enterprise cannot function - identifying them can be a subjective decision March 8, 2010 112
  • Other Enterprise Data Model Components • Data Steward Responsibility Assignments - for subject areas, entities, attributes, and/or reference data value sets • Valid Reference Data Values - controlled value sets for codes and/or labels and their business meaning • Data Quality Specifications - rules for essential data attributes, such as accuracy / precision requirements, currency (timeliness), integrity rules, nullability, formatting, match/merge rules, and/or audit requirements • Entity Life Cycles - show the different lifecycle states of the most important entities and the trigger events that change an entity from one state to another March 8, 2010 113
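As an illustration only (not drawn from the DMBOK), the Python sketch below shows how data quality specifications such as nullability, formats and valid reference data values might be expressed as executable rules; the field names, the rules and the check_record function are all assumptions made for the example.

import re

# Hypothetical data quality specifications for a customer record:
# nullability, format (regular expression) and valid reference values.
RULES = {
    "customer_id": {"nullable": False, "format": r"^\d{8}$"},
    "email":       {"nullable": True,  "format": r"^[^@\s]+@[^@\s]+\.[^@\s]+$"},
    "country_cd":  {"nullable": False, "valid_values": {"IE", "GB", "US"}},
}

def check_record(record):
    """Return a list of data quality issues found in a single record."""
    issues = []
    for field, rule in RULES.items():
        value = record.get(field)
        if value in (None, ""):
            if not rule["nullable"]:
                issues.append(f"{field}: value is required")
            continue
        if "format" in rule and not re.match(rule["format"], str(value)):
            issues.append(f"{field}: '{value}' does not match the required format")
        if "valid_values" in rule and value not in rule["valid_values"]:
            issues.append(f"{field}: '{value}' is not a valid reference value")
    return issues

print(check_record({"customer_id": "1234", "email": "x", "country_cd": "FR"}))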
  • Analyse and Align with Other Business Models • Information value-chain analysis maps the relationships between enterprise model elements and other business models • Business value chain identifies the functions of an organisation that contribute directly or indirectly to the organisation’s goals March 8, 2010 114
  • Define and Maintain the Data Technology Architecture • Data technology architecture guides the selection and integration of data-related technology • Data technology architecture defines standard tool categories, preferred tools in each category, and technology standards and protocols for technology integration • Technology categories include − Database management systems (DBMS) − Database management utilities − Data modelling and model management tools − Business intelligence software for reporting and analysis − Extract-transform-load (ETL), changed data capture (CDC), and other data integration tools − Data quality analysis and data cleansing tools − Metadata management software, including metadata repositories March 8, 2010 115
  • Define and Maintain the Data Technology Architecture • Classify technology architecture components as − Current - currently supported and used − Deployment - deployed for use in the next 1-2 years − Strategic - expected to be available for use in the next 2+ years − Retirement - the organisation has retired or intends to retire this year − Preferred - preferred for use by most applications. − Containment - limited to use by certain applications − Emerging - being researched and piloted for possible future deployment March 8, 2010 116
  • Define and Maintain the Data Integration Architecture • Defines how data flows through all systems from beginning to end • It is both data architecture and application architecture, because it includes both the databases and the applications that control the flow of data into the system, between databases, and back out of the system March 8, 2010 117
  • Define and Maintain the Data Warehouse / Business Intelligence Architecture • Focuses on how data changes and snapshots are stored in data warehouse systems for maximum usefulness and performance • Data integration architecture shows how data moves from source systems through staging databases into data warehouses and data marts • Business intelligence architecture defines how decision support makes data available, including the selection and use of business intelligence tools March 8, 2010 118
  • Define and Maintain Enterprise Taxonomies and Namespaces • Taxonomy is the hierarchical structure used for outlining topics • Organisations develop their own taxonomies to organise collective thinking about topics • Overall enterprise data architecture includes organisational taxonomies • Definition of terms used in such taxonomies should be consistent with the enterprise data model March 8, 2010 119
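Purely as an illustrative sketch, a fragment of an enterprise taxonomy can be represented as a simple hierarchy of topics; the topics and the paths helper below are assumptions for the example, not a prescribed structure.

# Illustrative fragment of an enterprise taxonomy as a nested hierarchy.
taxonomy = {
    "Customers": {
        "Retail Customers": {},
        "Corporate Customers": {"Subsidiaries": {}},
    },
    "Products": {"Services": {}, "Physical Goods": {}},
}

def paths(node, prefix=()):
    """Yield every topic path in the hierarchy, e.g. 'Customers > Subsidiaries'."""
    for name, children in node.items():
        current = prefix + (name,)
        yield " > ".join(current)
        yield from paths(children, current)

for topic_path in paths(taxonomy):
    print(topic_path)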
  • Define and Maintain the Metadata Architecture • Metadata architecture is the design for integration of metadata across software tools, repositories, directories, glossaries, and data dictionaries • Metadata architecture defines the managed flow of metadata • Defines how metadata is created, integrated, controlled, and accessed • Metadata repository is the core of any metadata architecture • Focus of metadata architecture is to ensure the quality, integration, and effective use of metadata March 8, 2010 120
  • Data Architecture Management Guiding Principles • Data architecture is an integrated set of specification master blueprints used to define data requirements, guide data integration, control data assets, and align data investments with business strategy • Enterprise data architecture is part of the overall enterprise architecture, along with process architecture, business architecture, systems architecture, and technology architecture • Enterprise data architecture includes three major categories of specifications: the enterprise data model, information value chain analysis, and data delivery architecture • Enterprise data architecture is about more than just data - it helps to establish a common business vocabulary • An enterprise data model is an integrated subject-oriented data model defining the essential data used across an entire organisation • Information value-chain analysis defines the critical relationships between data, processes, roles and organisations and other enterprise elements • Data delivery architecture defines the master blueprint for how data flows across databases and applications • Architectural frameworks like TOGAF help organise collective thinking about architecture March 8, 2010 121
  • Data Development March 8, 2010 122
  • Data Development • Analysis, design, implementation, deployment, and maintenance of data solutions to maximise the value of the data resources to the enterprise • Subset of project activities within the system development lifecycle focused on defining data requirements, designing the data solution components, and implementing these components • Primary data solution components are databases and other data structures March 8, 2010 123
  • Data Development – Definition and Goals • Definition − Designing, implementing, and maintaining solutions to meet the data needs of the enterprise • Goals − Identify and define data requirements − Design data structures and other solutions to these requirements − Implement and maintain solution components that meet these requirements − Ensure solution conformance to data architecture and standards as appropriate − Ensure the integrity, security, usability, and maintainability of structured data assets March 8, 2010 124
  • Data Development - Overview • Inputs − Business Goals and Strategies − Data Needs and Strategies − Data Standards − Data Architecture − Process Architecture − Application Architecture − Technical Architecture • Primary Deliverables − Data Requirements and Business Rules − Conceptual Data Models − Logical Data Models and Specifications − Physical Data Models and Specifications − Metadata (Business and Technical) − Data Modeling and DB Design Standards − Data Model and DB Design Reviews − Version Controlled Data Models − Test Data − Development and Test Databases − Information Products − Data Access Services − Data Integration Services − Migrated and Converted Data • Suppliers − Data Stewards − Subject Matter Experts − IT Steering Committee − Data Governance Council − Data Architects and Analysts − Software Developers − Data Producers − Information Consumers • Participants − Data Stewards and SMEs − Data Architects and Analysts − Database Administrators − Data Model Administrators − Software Developers − Project Managers − DM Executives and Other IT Management • Tools − Data Modeling Tools − Database Management Systems − Software Development Tools − Testing Tools − Data Profiling Tools − Model Management Tools − Configuration Management Tools − Office Productivity Tools • Consumers − Data Producers − Knowledge Workers − Managers and Executives − Customers − Data Professionals − Other IT Professionals March 8, 2010 125
  • Data Development Function, Activities and Sub-Activities • Data Modelling, Analysis and Solution Design − Analyse Information Requirements − Develop and Maintain Conceptual Data Models (Entities, Relationships) − Develop and Maintain Logical Data Models (Attributes, Domains, Keys) − Develop and Maintain Physical Data Models • Detailed Data Design − Design Physical Databases (Physical Database Design, Performance Modifications, Physical Database Design Documentation) − Design Information Products − Design Data Access Services − Design Data Integration Services • Data Model and Design Quality Management − Develop Data Modeling and Design Standards − Review Data Model and Database Design Quality (Conceptual and Logical Data Model Reviews, Physical Database Design Review, Data Model Validation) − Manage Data Model Versioning and Integration • Data Implementation − Implement Development / Test Database Changes − Create and Maintain Test Data − Migrate and Convert Data − Build and Test Information Products − Build and Test Data Access Services − Validate Information Requirements − Prepare for Data Deployment March 8, 2010 126
  • Data Development - Principles • Data development activities are an integral part of the software development lifecycle • Data modeling is an essential technique for effective data management and system design • Conceptual and logical data modeling express business and application requirements while physical data modeling represents solution design • Data modeling and database design define detailed solution component specifications • Data modeling and database design balance tradeoffs and needs • Data professionals should collaborate with other project team members to design information products and data access and integration interfaces • Data modeling and database design should follow documented standards • Design reviews should cover all data models and designs to ensure they meet business requirements and follow design standards • Data models represent valuable knowledge resources and so should be carefully managed and controlled through library, configuration, and change management to ensure data model quality and availability • Database administrators and other data professionals play important roles in the construction, testing, and deployment of databases and related application systems March 8, 2010 127
  • Data Modeling, Analysis, and Solution Design • Data modeling is an analysis and design method used to define and analyse data requirements, and to design data structures that support these requirements • A data model is a set of data specifications and related diagrams that reflect data requirements and designs • Data modeling is a complex process involving interactions between people and with technology; these interactions must not compromise the integrity or security of the data • Good data models accurately express and effectively communicate data requirements and quality solution designs March 8, 2010 128
  • Data Model • The purposes of a data model are: − Communication - a data model is a bridge to understanding data between people with different levels and types of experience. Data models help us understand a business area, an existing application, or the impact of modifying an existing structure. Data models may also facilitate training new business and/or technical staff − Formalisation - a data model documents a single, precise definition of data requirements and data related business rules − Scope – a data model can help explain the data context and scope of purchased application packages • Data models that include the same data may differ by: − Scope - expressing a perspective about data in terms of function (business view or application view), realm (process, department, division, enterprise, or industry view), and time (current state, short-term future, long-term future) − Focus - basic and critical concepts (conceptual view), detailed but independent of context (logical view), or optimised for a specific technology and use (physical view) March 8, 2010 129
  • Analyse Information Requirements • Information is relevant and timely data in context • To identify information requirements, first identify business information needs, often in the context of one or more business processes • Business processes (and the underlying IT systems) consume information output from other business processes • Requirements analysis includes the elicitation, organisation, documentation, review, refinement, approval, and change control of business requirements • Some of these requirements identify business needs for data and information • Logical data modeling is an important means of expressing business data requirements March 8, 2010 130
  • Develop and Maintain Conceptual Data Models • Visual, high-level perspective on a subject area of importance to the business • Contains the basic and critical business entities within a given realm and function with a description of each entity and the relationships between entities • Define the meanings of the essential business vocabulary • Reflect the data associated with a business process or application function • Independent of technology and usage context March 8, 2010 131
  • Develop and Maintain Conceptual Data Models • Entities − A data entity is a collection of data about something that the business deems important and worthy of capture − Entities appear in conceptual or logical data models • Relationships − Business rules define constraints on what can and cannot be done • Data Rules – define constraints on how data relates to other data • Action Rules - instructions on what to do when data elements contain certain values March 8, 2010 132
  • Develop and Maintain Logical Data Models • Detailed representation of data requirements and the business rules that govern data quality • Independent of any technology or specific implementation technical constraints • Extension of a conceptual data model • Logical data models transform conceptual data model structures by normalisation and abstraction − Normalisation is the process of applying rules to organise business complexity into stable data structures − Abstraction is the redefinition of data entities, elements, and relationships by removing details to broaden the applicability of data structures to a wider class of situations March 8, 2010 133
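The following minimal Python sketch illustrates the idea of normalisation described above; the order and customer records are invented for the example.

# Hypothetical denormalised order records repeat customer details on every row.
denormalised = [
    {"order_id": 1, "customer_id": 10, "customer_name": "Acme Ltd", "item": "Widget"},
    {"order_id": 2, "customer_id": 10, "customer_name": "Acme Ltd", "item": "Gadget"},
]

# Normalisation separates the repeating customer facts into their own structure,
# leaving orders to reference customers by key.
customers = {}
orders = []
for row in denormalised:
    customers[row["customer_id"]] = {"customer_name": row["customer_name"]}
    orders.append({"order_id": row["order_id"],
                   "customer_id": row["customer_id"],
                   "item": row["item"]})

print(customers)   # one customer record instead of repeated details
print(orders)      # orders refer to the customer by customer_id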
  • Develop and Maintain Physical Data Models • Physical data model optimises the implementation of detailed data requirements and business rules in light of technology constraints, application usage, performance requirements, and modeling standards • Physical data modeling transforms the logical data model • Includes specific decisions − Name of each table and column or file and field or schema and element − Logical domain, physical data type, length, and nullability of each column or field − Default values − Primary and alternate unique keys and indexes March 8, 2010 134
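To make these physical modeling decisions concrete, the hedged sketch below expresses them as DDL and runs it against an in-memory SQLite database; the table, column names, data types, default and index are assumptions for illustration, not a recommended design.

import sqlite3

# Hypothetical physical design decisions expressed as DDL: table and column
# names, data types and lengths, nullability, a default value, primary and
# alternate unique keys, and an index.
DDL = """
CREATE TABLE customer (
    customer_id   INTEGER      NOT NULL PRIMARY KEY,
    customer_name VARCHAR(100) NOT NULL,
    country_cd    CHAR(2)      NOT NULL DEFAULT 'IE',
    email_address VARCHAR(254),
    UNIQUE (email_address)              -- alternate unique key
);
CREATE INDEX ix_customer_country ON customer (country_cd);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
print([row[1] for row in conn.execute("PRAGMA table_info(customer)")])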
  • Detailed Data Design • Detailed data design activities include − Detailed physical database design, including views, functions, triggers, and stored procedures − Definition of supporting data structures, such as XML schemas and object classes − Creation of information products, such as the use of data in screens and reports − Definition of data access solutions, including data access objects, integration services, and reporting and analysis services March 8, 2010 135
  • Design Physical Databases • Create detailed database implementation specifications • Ensure the design meets data integrity requirements • Determine the most appropriate physical structure to house and organise the data, such as a relational or other type of DBMS, files, OLAP cubes, XML, etc. • Determine database resource requirements, such as server size and location, disk space requirements, CPU and memory requirements, and network requirements • Create detailed design specifications for data structures, such as relational database tables, indexes, views, OLAP data cubes, XML schemas, etc. • Ensure performance requirements are met, including batch and online response time requirements for queries, inserts, updates, and deletes • Design for backup, recovery, archiving, and purge processing, ensuring availability requirements are met • Design data security implementation, including authentication, encryption needs, application roles and data access and update permissions • Review code to ensure that it meets coding standards and will run efficiently March 8, 2010 136
  • Physical Database Design • Choose a database design based on both a choice of architecture and a choice of technology • Base the choice of architecture (for example, relational, hierarchical, network, object, star schema, snowflake, cube, etc.) on data considerations • Consider factors such as how long the data needs to be kept, whether it must be integrated with other data or passed across system or application boundaries, and on requirements of data security, integrity, recoverability, accessibility, and reusability • Consider organisational or political factors, including organisational biases and developer skill sets, that lean toward a particular technology or vendor March 8, 2010 137
  • Physical Database Design - Principles • Performance and Ease of Use - Ensure quick and easy access to data by approved users in a usable and business-relevant form • Reusability - The database structure should ensure that, where appropriate, multiple applications would be able to use the data • Integrity - The data should always have a valid business meaning and value, regardless of context, and should always reflect a valid state of the business • Security - True and accurate data should always be immediately available to authorised users, but only to authorised users • Maintainability - Perform all data work at a cost that yields value by ensuring that the cost of creating, storing, maintaining, using, and disposing of data does not exceed its value to the organisation March 8, 2010 138
  • Physical Database Design - Questions • What are the performance requirements? What is the maximum permissible time for a query to return results, or for a critical set of updates to occur? • What are the availability requirements for the database? What are the window(s) of time for performing database operations? How often should database backups and transaction log backups be done (i.e., what is the longest period of time we can risk non-recoverability of the data)? • What is the expected size of the database? What is the expected rate of growth of the data? At what point can old or unused data be archived or deleted? How many concurrent users are anticipated? • What sorts of data virtualisation are needed to support application requirements in a way that does not tightly couple the application to the database schema? • Will other applications need the data? If so, what data and how? • Will users expect to be able to do ad-hoc querying and reporting of the data? If so, how and with which tools? • What, if any, business or application processes does the database need to implement? (e.g., trigger code that does cross-database integrity checking or updating, application classes encapsulated in database procedures or functions, database views that provide table recombination for ease of use or security purposes, etc.). • Are there application or developer concerns regarding the database, or the database development process, that need to be addressed? • Is the application code efficient? Can a code change relieve a performance issue? March 8, 2010 139
  • Performance Modifications • Consider how the database will perform when applications make requests to access and modify data • Indexing can improve query performance in many cases • Denormalisation is the deliberate transformation of a normalised logical data model into tables with redundant data March 8, 2010 140
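As a small illustration of the indexing point above, the sketch below compares the SQLite query plan for the same query before and after an index is created; the table, data volumes and query are invented for the example.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                 [(i % 500, i * 1.0) for i in range(10_000)])

query = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"

# Without an index the optimiser scans the whole table.
print(conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())

# Adding an index lets the same query use an index search instead of a scan.
conn.execute("CREATE INDEX ix_orders_customer ON orders (customer_id)")
print(conn.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())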
  • Physical Database Design Documentation • Create physical database design document to assist implementation and maintenance March 8, 2010 141
  • Design Information Products • Design data-related deliverables • Design screens and reports to meet business data requirements • Ensure consistent use of business data terminology • Reporting services give business users the ability to execute both pre-developed and ad-hoc reports • Analysis services give business users the ability to slice and dice data across multiple dimensions • Dashboards display a wide array of analytics indicators, such as charts and graphs, efficiently • Scorecards display information that indicates scores or calculated evaluations of performance • Use data integrated from multiple databases as input to software for business process automation that coordinates multiple business processes across disparate platforms • Data integration is a component of Enterprise Application Integration (EAI) software, enabling data to be easily passed from application to application across disparate platforms March 8, 2010 142
  • Design Data Access Services • May be necessary to access and combine data from remote databases with data in the local database • Goal is to enable easy and inexpensive reuse of data across the organisation preventing, wherever possible, redundant and inconsistent data • Options include − Linked database connections − SOA web services − Message brokers − Data access classes − ETL − Replication March 8, 2010 143
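One of the options above, a data access class, might look roughly like the following sketch; the CustomerRepository class, its methods and the underlying table are assumptions for illustration only.

import sqlite3

class CustomerRepository:
    """Hypothetical data access class: callers use methods, never raw SQL,
    so the same data can be reused consistently across applications."""

    def __init__(self, connection):
        self._conn = connection

    def add(self, customer_id, name):
        self._conn.execute(
            "INSERT INTO customer (customer_id, customer_name) VALUES (?, ?)",
            (customer_id, name))

    def find_by_id(self, customer_id):
        row = self._conn.execute(
            "SELECT customer_id, customer_name FROM customer WHERE customer_id = ?",
            (customer_id,)).fetchone()
        return None if row is None else {"customer_id": row[0], "customer_name": row[1]}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT)")
repo = CustomerRepository(conn)
repo.add(1, "Acme Ltd")
print(repo.find_by_id(1))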
  • Design Data Integration Services • Critical aspect of database design is determining appropriate update mechanisms and database transaction for recovery • Define source-to-target mappings and data transformation designs for extract-transform-load (ETL) programs and other technology for ongoing data movement, cleansing and integration • Design programs and utilities for data migration and conversion from old data structures to new data structures March 8, 2010 144
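A hedged sketch of a source-to-target mapping for a small extract-transform-load step is shown below; the source field names, target field names and transformations are invented for the example.

# Hypothetical source-to-target mapping: each source field maps to a target
# field together with a transformation applied during the load.
MAPPING = {
    "CUST_NO":   ("customer_id",   int),
    "CUST_NAME": ("customer_name", str.strip),
    "CTRY":      ("country_cd",    str.upper),
}

def transform(source_row):
    """Apply the mapping to one extracted source row."""
    return {target: convert(source_row[source])
            for source, (target, convert) in MAPPING.items()}

extracted = [{"CUST_NO": "10", "CUST_NAME": " Acme Ltd ", "CTRY": "ie"}]
loaded = [transform(row) for row in extracted]
print(loaded)  # [{'customer_id': 10, 'customer_name': 'Acme Ltd', 'country_cd': 'IE'}]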
  • Data Model and Design Quality Management • Balance the needs of information consumers (the people with business requirements for data) and the data producers who capture the data in usable form • Time and budget constraints • Ensure data resides in data structures that are secure, recoverable, sharable, and reusable, and that this data is as correct, timely, relevant, and usable as possible • Balance the short-term versus long-term business data interests of the organisation March 8, 2010 145
  • Develop Data Modeling and Design Standards • Data modeling and database design standards serve as the guiding principles to effectively meet business data needs, conform to data architecture, and ensure data quality • Data modeling and database design standards should include − A list and description of standard data modeling and database design deliverables − A list of standard names, acceptable abbreviations, and abbreviation rules for uncommon words, that apply to all data model objects − A list of standard naming formats for all data model objects, including attribute and column class words − A list and description of standard methods for creating and maintaining these deliverables − A list and description of data modeling and database design roles and responsibilities − A list and description of all metadata properties captured in data modeling and database design, including both business metadata and technical metadata, with guidelines defining metadata quality expectations and requirements − Guidelines for how to use data modeling tools − Guidelines for preparing for and leading design reviews March 8, 2010 146
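As an illustration of how naming standards such as those listed above might be made checkable, the sketch below validates column names against an assumed convention; the class words, approved abbreviations and rules are examples only, not a recommended standard.

import re

# Hypothetical naming standard: lower-case words separated by underscores, the
# last word drawn from an approved list of column class words, and only
# approved abbreviations used for short words.
CLASS_WORDS = {"id", "name", "code", "date", "amount", "flag"}
APPROVED_ABBREVIATIONS = {"cust": "customer", "amt": "amount"}

def check_column_name(name):
    issues = []
    if not re.fullmatch(r"[a-z]+(_[a-z]+)*", name):
        issues.append("not lower_case_with_underscores")
    words = name.split("_")
    if words[-1] not in CLASS_WORDS:
        issues.append(f"last word '{words[-1]}' is not an approved class word")
    for word in words:
        if len(word) <= 4 and word not in CLASS_WORDS and word not in APPROVED_ABBREVIATIONS:
            issues.append(f"'{word}' is not an approved abbreviation")
    return issues

print(check_column_name("cust_name"))     # no issues
print(check_column_name("CustomerName"))  # naming issues reported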
  • Review Data Model and Database Design Quality • Conduct requirements reviews and design reviews, including a conceptual data model review, a logical data model review, and a physical database design review March 8, 2010 147
  • Conceptual and Logical Data Model Reviews • Conceptual data model and logical data model design reviews should ensure that: − Business data requirements are completely captured and clearly expressed in the model, including the business rules governing entity relationships − Business (logical) names and business definitions for entities and attributes (business semantics) are clear, practical, consistent, and complementary − Data modeling standards, including naming standards, have been followed − The conceptual and logical data models have been validated March 8, 2010 148
  • Physical Database Design Review • Physical database design reviews should ensure that: − The design meets business, technology, usage, and performance requirements − Database design standards, including naming and abbreviation standards, have been followed − Availability, recovery, archiving, and purging procedures are defined according to standards − Metadata quality expectations and requirements are met in order to properly update any metadata repository − The physical data model has been validated March 8, 2010 149
  • Data Model Validation • Validate data models against modeling standards, business requirements, and database requirements • Ensure the model matches applicable modeling standards • Ensure the model matches the business requirements • Ensure the model matches the database requirements March 8, 2010 150
  • Manage Data Model Versioning and Integration • Data models and other design specifications require change control − Each change should include − Why the project or situation required the change − What and how the object(s) changed, including which tables had columns added, modified, or removed, etc. − When the change was approved and when the change was made to the model − Who made the change − Where the change was made March 8, 2010 151
  • Data Implementation • Data implementation consists of data management activities that support system building, testing, and deployment − Database implementation and change management in the development and test environments − Test data creation, including any security procedures − Development of data migration and conversion programs, both for project development through the SDLC and for business situations − Validation of data quality requirements − Creation and delivery of user training − Contribution to the development of effective documentation March 8, 2010 152
  • Implement Development / Test Database Changes • Implement changes to the database that are required during the course of application development • Monitor database code to ensure that it is written to the same standards as application code • Identify poor SQL coding practices that could lead to errors or performance problems March 8, 2010 153
  • Create and Maintain Test Data • Populate databases in the development environment with test data • Observe privacy and confidentiality requirements and practices for test data March 8, 2010 154
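A minimal sketch of respecting privacy and confidentiality when creating test data is shown below: personal values are masked before being copied into a test database; the record layout and masking rules are assumptions for the example.

import hashlib

# Hypothetical masking before production data is used as test data: names are
# replaced with generated values and e-mail addresses are hashed, so records
# remain distinct but no real personal data reaches the test environment.
def mask_record(record):
    masked = dict(record)
    masked["customer_name"] = f"Test Customer {record['customer_id']}"
    digest = hashlib.sha256(record["email"].encode("utf-8")).hexdigest()[:12]
    masked["email"] = f"user_{digest}@example.com"
    return masked

production_rows = [{"customer_id": 10, "customer_name": "Jane Doe", "email": "jane@acme.ie"}]
test_rows = [mask_record(row) for row in production_rows]
print(test_rows)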
  • Migrate and Convert Data • Key component of many projects is the migration of legacy data to a new database environment, including any necessary data cleansing and reformatting March 8, 2010 155
  • Build and Test Information Products • Implement mechanisms for integrating data from multiple sources, along with the appropriate metadata to ensure meaningful integration of the data • Implement mechanisms for reporting and analysing the data, including online and web-based reporting, ad-hoc querying, BI scorecards, OLAP, portals, and the like • Implement mechanisms for replication of the data, if network latency or other concerns make it impractical to service all users from a single data source March 8, 2010 156
  • Build and Test Data Access Services • Develop, test, and execute data migration and conversion programs and procedures, first for development and test data and later for production deployment • Data requirements should include business rules for data quality to guide the implementation of application edits and database referential integrity constraints • Business data stewards and other subject matter experts should validate the correct implementation of data requirements through user acceptance testing March 8, 2010 157
  • Validate Information Requirements • Test and validate that the solution meets the requirements, plan deployment, and develop training and documentation • Data requirements may change abruptly, in response to changed business requirements, invalid assumptions regarding the data or reprioritisation of existing requirements • Test the implementation of the data requirements and ensure that the application requirements are satisfied March 8, 2010 158
  • Prepare for Data Deployment • Leverage the business knowledge captured in data modeling to define clear and consistent language in user training and documentation • Business concepts, terminology, definitions, and rules depicted in data models are an important part of application user training • Data stewards and data analysts should participate in deployment preparation, including development and review of training materials and system documentation, especially to ensure consistent use of defined business data terminology • Help desk support staff also require orientation and training in how system users appropriately access, manipulate, and interpret data • Once installed, business data stewards and data analysts should monitor the early use of the system to see that business data requirements are indeed met March 8, 2010 159
  • Data Operations Management March 8, 2010 160
  • Data Operations Management • Data operations management is the development, maintenance, and support of structured data to maximise the value of the data resources to the enterprise and includes − Database support − Data technology management March 8, 2010 161
  • Data Operations Management – Definition and Goals • Definition − Planning, control, and support for structured data assets across the data lifecycle, from creation and acquisition through archival and purge • Goals − Protect and ensure the integrity of structured data assets − Manage the availability of data throughout its lifecycle − Optimise performance of database transactions March 8, 2010 162
  • Data Operations Management - Overview • Inputs − Data Requirements − Data Architecture − Data Models − Legacy Data − Service Level Agreements − Business Continuity • Primary Deliverables − DBMS Technical Environments − Dev/Test, QA, DR, and Production Databases − Externally Sourced Data − Database Performance − Data Recovery Plans − Data Retention Plan − Archived and Purged Data • Suppliers − Executives − IT Steering Committee − Data Governance Council − Data Stewards − Data Architects and Modelers − Software Developers • Consumers − Data Creators − Information Consumers − Enterprise Customers − Data Professionals − Other IT Professionals • Participants − Database Administrators − Software Developers − Project Managers − Data Stewards − Data Architects and Analysts − DM Executives and Other IT Management − IT Operators • Tools − Database Management Systems − Data Development Tools − Database Administration Tools − Office Productivity Tools • Metrics − Availability − Performance March 8, 2010 163
  • Data Operations Management Function, Activities and Sub-Activities • Database Support − Implement and Control Database Environments − Obtain Externally Sourced Data − Plan for Data Recovery − Backup and Recover Data − Set Database Performance Service Levels − Monitor and Tune Database Performance − Plan for Data Retention − Archive, Retain, and Purge Data − Support Specialised Databases • Data Technology Management − Understand Data Technology Requirements − Define the Data Technology Architecture − Evaluate Data Technology − Install and Administer Data Technology − Inventory and Track Data Technology Licenses − Support Data Technology Usage and Issues March 8, 2010 164
  • Data Operations Management - Principles • Write everything down • Keep everything • Whenever possible, automate a procedure • Focus to understand the purpose of each task, manage scope, simplify, do one thing at a time • Measure twice, cut once • React to problems and issues calmly and rationally, because panic causes more errors • Understand the business, not just the technology • Work together to collaborate, be accessible, share knowledge • Use all of the resources at your disposal • Keep up to date March 8, 2010 165
  • Database Support - Scope • Ensure the performance and reliability of the database, including performance tuning, monitoring, and error reporting • Implement appropriate backup and recovery mechanisms to guarantee the recoverability of the data in any circumstance • Implement mechanisms for clustering and failover of the database, if continual data availability is a requirement • Implement mechanisms for archiving data March 8, 2010 166
  • Database Support - Deliverables • A production database environment, including an instance of the DBMS and its supporting server, of a sufficient size and capacity to ensure adequate performance, configured for the appropriate level of security, reliability and availability • Mechanisms and processes for controlled implementation and changes to databases into the production environment • Appropriate mechanisms for ensuring the availability, integrity, and recoverability of the data in response to all possible circumstances that could result in loss or corruption of data • Appropriate mechanisms for detecting and reporting any error that occurs in the database, the DBMS, or the data server • Database availability, recovery, and performance in accordance with service level agreements March 8, 2010 167
  • Implement and Control Database Environments • Updating DBMS software • Maintaining multiple installations, including different DBMS versions • Installing and administering related data technology, including data integration software and third party data administration tools • Setting and tuning DBMS system parameters • Managing database connectivity • Tune operating systems, networks, and transaction processing middleware to work with the DBMS • Optimise the use of different storage technology for cost-effective storage March 8, 2010 168
  • Obtain Externally Sourced Data • Managed approach to data acquisition centralises responsibility for data subscription services • Document the external data source in the logical data model and data dictionary • Implement the necessary processes to load the data into the database and/or make it available to applications March 8, 2010 169
  • Plan for Data Recovery • Establish service level agreements (SLAs) with IT data management services organisations for data availability and recovery • SLAs set availability expectations, allowing time for database maintenance and backup, and set recovery time expectations for different recovery scenarios, including potential disasters • Ensure a recovery plan exists for all databases and database servers, covering all possible scenarios − Loss of the physical database server − Loss of one or more disk storage devices − Loss of a database, including the DBMS master database, temporary storage database, transaction log segment, etc. − Corruption of database index or data pages − Loss of the database or log segment file system − Loss of database or transaction log backup files March 8, 2010 170
  • Backup and Recover Data • Make regular backups of database and the database transaction logs • Balance the importance of the data against the cost of protecting it • Databases should reside on some sort of managed storage area • For critical data, implement some sort of replication facility March 8, 2010 171
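As an illustration of the backup principle (not a production procedure), the sketch below takes an online copy of a SQLite database using the Python standard library's Connection.backup(); real DBMS products provide their own backup utilities, and the file and table names here are invented.

import sqlite3

# Create a small "operational" database to back up.
source = sqlite3.connect("operational.db")
source.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, amount REAL)")
source.commit()

# Copy the whole database while the source stays available for use.
backup = sqlite3.connect("backup_copy.db")
with backup:
    source.backup(backup)

# Verify the copy can be read - an unverified backup is not a backup.
print(backup.execute("SELECT count(*) FROM orders").fetchone())
source.close()
backup.close()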
  • Set Database Performance Service Levels • Database performance has two components - availability and speed • An unavailable database has a performance measure of zero • SLAs between data management services organisations and data owners define expectations for database performance • Availability is the percentage of time that a system or database can be used for productive work • Availability requirements are constantly increasing, raising the business risks and costs of unavailable data March 8, 2010 172
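A simple availability calculation, with illustrative figures, might look like this:

# Availability is the percentage of time a database can be used for productive
# work; the period and downtime figures below are invented for the example.
def availability(hours_in_period, downtime_hours):
    return 100.0 * (hours_in_period - downtime_hours) / hours_in_period

hours_in_month = 30 * 24
print(round(availability(hours_in_month, downtime_hours=4), 2))  # about 99.44% for 4 hours lost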
  • Set Database Performance Service Levels • Factors affecting availability include − Manageability - ability to create and maintain an effective environment − Recoverability - ability to reestablish service after interruption, and correct errors caused by unforeseen events or component failures − Reliability - ability to deliver service at specified levels for a stated period − Serviceability - ability to determine the existence of problems, diagnose their causes, and repair / solve the problems • Tasks to ensure databases stay online and operational − Running database backup utilities − Running database reorganisation utilities − Running statistics gathering utilities − Running integrity checking utilities − Automating the execution of these utilities − Exploiting table space clustering and partitioning − Replicating data across mirror databases to ensure high availability March 8, 2010 173
  • Set Database Performance Service Levels • Causes of loss of database availability include − Planned and unplanned outages − Loss of the server hardware − Disk hardware failure − Operating system failure − DBMS software failure − Application problems − Network failure − Data center site loss − Security and authorisation problems − Corruption of data (due to bugs, poor design, or user error) − Loss of database objects − Loss of data − Data replication failure − Severe performance problems − Recovery failures − Human error March 8, 2010 174
  • Monitor and Tune Database Performance • Optimise database performance both proactively and reactively, by monitoring performance and by responding to problems quickly and effectively • Run activity and performance reports against both the DBMS and the server on a regular basis including during periods of heavy activity • When performance problems occur, use the monitoring and administration tools of the DBMS to help identify the source of the problem − Memory allocation (buffer / cache for data) − Locking and blocking − Failure to update database statistics − Poor SQL coding − Insufficient indexing − Application activity − Increase in the number, size, or use of databases − Database volatility March 8, 2010 175
  • Support Specialised Databases • Some specialised situations require specialised types of databases March 8, 2010 176
  • Data Technology Management • Managing data technology should follow the same principles and standards for managing any technology • Use a reference model for technology management such as Information Technology Infrastructure Library (ITIL) March 8, 2010 177
  • Understand Data Technology Requirements • Understand the data and information needs of the business • Understand the best possible applications of technology to solve business problems and take advantage of new business opportunities • Understand the requirements of a data technology before determining what technical solution to choose for a particular situation − What problem does this data technology mean to solve? − What does this data technology do that is unavailable in other data technologies? − What does this data technology not do that is available in other data technologies? − Are there any specific hardware requirements for this data technology? − Are there any specific operating system requirements for this data technology? − Are there any specific software requirements or additional applications required for this data technology to perform as advertised? − Are there any specific storage requirements for this data technology? − Are there any specific network or connectivity requirements for this data technology? − Does this data technology include data security functionality? If not, what other tools does this technology work with that provide data security functionality? − Are there any specific skills required to be able to support this data technology? Do we have those skills in-house or must we acquire them? March 8, 2010 178
  • Define the Data Technology Architecture • Data technology architecture addresses three core questions − What technologies are standard (which are required, preferred, or acceptable)? − Which technologies apply to which purposes and circumstances? − In a distributed environment, which technologies exist where, and how does data move from one node to another? • Technology is never free - even open-source technology requires maintenance • Technology should always be regarded as the means to an end, rather than the end itself • Buying the same technology that everyone else is using, and using it in the same way, does not create business value or competitive advantage for the organisation March 8, 2010 179
  • Define the Data Technology Architecture • Technology categories include − Database management systems (DBMS) − Database management utilities − Data modelling and model management tools − Business intelligence software for reporting and analysis − Extract-transform-load (ETL), changed data capture (CDC), and other data integration tools − Data quality analysis and data cleansing tools − Metadata management software, including metadata repositories March 8, 2010 180
  • Define the Data Technology Architecture • Classify technology architecture components as − Current - currently supported and used − Deployment - deployed for use in the next 1-2 years − Strategic - expected to be available for use in the next 2+ years − Retirement - the organisation has retired or intends to retire this year − Preferred - preferred for use by most applications − Containment - limited to use by certain applications − Emerging - being researched and piloted for possible future deployment • Create a road map for the organisation consisting of these components to help govern future technology decisions March 8, 2010 181
  • Evaluate Data Technology • Selecting appropriate data related technology, particularly the appropriate database management technology, is an important data management responsibility • Data technologies to be researched and evaluated include: − Database management systems (DBMS) software − Database utilities, such as backup and recovery tools, and performance monitors − Data modeling and model management software − Database management tools, such as editors, schema generators, and database object generators − Business intelligence software for reporting and analysis − Extract-transfer-load (ETL) and other data integration tools − Data quality analysis and data cleansing tools − Data virtualisation technology − Metadata management software, including metadata repositories March 8, 2010 182
  • Evaluate Data Technology • Use a standard technology evaluation process − Understand user needs, objectives, and related requirements − Understand the technology in general − Identify available technology alternatives − Identify the features required − Weigh the importance of each feature − Understand each technology alternative − Evaluate and score each technology alternative’s ability to meet requirements − Calculate total scores and rank technology alternatives by score − Evaluate the results, including the weighted criteria − Present the case for selecting the highest ranking alternative March 8, 2010 183
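As an illustration of the weighted-criteria step above, the following short Python sketch scores and ranks two hypothetical alternatives; the criteria, weights and product names are invented for the example and are not part of the DMBOK process.

```python
# Minimal sketch of weighted-criteria scoring for technology alternatives.
# Criteria, weights and the candidate products are illustrative examples only.
criteria_weights = {
    "scalability": 0.30,
    "total_cost_of_ownership": 0.25,
    "vendor_support": 0.20,
    "in_house_skills": 0.15,
    "security_features": 0.10,
}

# Each alternative is scored 1-5 per criterion by the evaluation team.
alternatives = {
    "DBMS Product A": {"scalability": 5, "total_cost_of_ownership": 2,
                       "vendor_support": 4, "in_house_skills": 3, "security_features": 4},
    "DBMS Product B": {"scalability": 3, "total_cost_of_ownership": 4,
                       "vendor_support": 3, "in_house_skills": 5, "security_features": 3},
}

def weighted_score(scores: dict) -> float:
    """Combine per-criterion scores into a single weighted total."""
    return sum(criteria_weights[c] * scores[c] for c in criteria_weights)

# Rank alternatives by total score, highest first.
ranking = sorted(alternatives.items(), key=lambda kv: weighted_score(kv[1]), reverse=True)
for name, scores in ranking:
    print(f"{name}: {weighted_score(scores):.2f}")
```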
  • Evaluate Data Technology • Selecting strategic DBMS software is very important • Factors to consider when selecting DBMS software include: − Product architecture and complexity − Application profile, such as transaction processing, business intelligence, and personal profiles − Organisational appetite for technical risk − Hardware platform and operating system support − Availability of supporting software tools − Performance benchmarks − Scalability − Software, memory, and storage requirements − Available supply of trained technical professionals − Cost of ownership, such as licensing, maintenance, and computing resources − Vendor reputation − Vendor support policy and release schedule − Customer references March 8, 2010 184
  • Install and Administer Data Technology • Need to deploy new technology products in development / test, QA / certification, and production environments • Create and document processes and procedures for administering the product • Cost and complexity of implementing new technology is usually underestimated • Features and benefits are usually overestimated • Start with small pilot projects and proof-of-concept (POC) implementations to get a good idea of the true costs and benefits before proceeding with larger production implementation March 8, 2010 185
  • Inventory and Track Data Technology Licenses • Comply with licensing agreements and regulatory requirements • Track and conduct yearly audits of software license and annual support costs • Track other costs such as server lease agreements and other fixed costs • Use data to determine the total cost-of-ownership (TCO) for each type of technology and technology product • Evaluate technologies and products that are becoming obsolete, unsupported, less useful, or too expensive March 8, 2010 186
  • Support Data Technology Usage and Issues • Work with business users and application developers to − Ensure the most effective use of the technology − Explore new applications of the technology − Address any problems or issues that surface from its use • Training is important to effective understanding and use of any technology March 8, 2010 187
  • Data Security Management March 8, 2010 188
  • Data Security Management • Planning, development, and execution of security policies and procedures to provide proper authentication, authorisation, access, and auditing of data and information assets • Effective data security policies and procedures ensure that the right people can use and update data in the right way, and that all inappropriate access and update is restricted • Effective data security management function establishes governance mechanisms that are easy enough to abide by on a daily operational basis March 8, 2010 189
  • Data Security Management – Definition and Goals • Definition − Planning, development, and execution of security policies and procedures to provide proper authentication, authorisation, access, and auditing of data and information. • Goals − Enable appropriate, and prevent inappropriate, access and change to data assets − Meet regulatory requirements for privacy and confidentiality − Ensure the privacy and confidentiality needs of all stakeholders are met March 8, 2010 190
  • Data Security Management • Protect information assets in alignment with privacy and confidentiality regulations and business requirements − Stakeholder Concerns - organisations must recognise the privacy and confidentiality needs of their stakeholders, including clients, patients, students, citizens, suppliers, or business partners − Government Regulations - government regulations protect some of the stakeholder security interests. Some regulations restrict access to information, while other regulations ensure openness, transparency, and accountability − Proprietary Business Concerns - each organisation has its own proprietary data to protect - ensuring competitive advantage provided by intellectual property and intimate knowledge of customer needs and business partner relationships is a cornerstone in any business plan − Legitimate Access Needs - data security implementers must also understand the legitimate needs for data access March 8, 2010 191
  • Data Security Requirements and Procedures • Data security requirements and the procedures to meet these requirements − Authentication - validate users are who they say they are − Authorisation - identify the right individuals and grant them the right privileges to specific, appropriate views of data − Access - enable these individuals and their privileges in a timely manner − Audit - review security actions and user activity to ensure compliance with regulations and conformance with policy and standards March 8, 2010 192
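The sketch below illustrates in Python how the authorisation and audit elements can work together: every access decision is checked against granted permissions and recorded in an audit trail. The users, permissions and resource names are invented examples, not a prescribed implementation.

```python
import logging

# Illustrative authorise-then-audit pattern: check a requested action against
# granted permissions and record the decision for later audit review.
logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("data_security_audit")

# Hypothetical users and their granted permissions.
PERMISSIONS = {"jsmith": {"customer:read"},
               "mkelly": {"customer:read", "customer:update"}}

def access(user: str, resource: str, action: str) -> bool:
    """Return whether the action is allowed, and write an audit record either way."""
    allowed = f"{resource}:{action}" in PERMISSIONS.get(user, set())
    audit_log.info("user=%s resource=%s action=%s allowed=%s", user, resource, action, allowed)
    return allowed

access("jsmith", "customer", "update")  # denied, and the attempt is recorded
```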
  • Data Security Management - Overview
− Inputs: Business Goals, Business Strategy, Business Rules, Business Process, Data Strategy, Data Privacy Issues, Related IT Policies and Standards
− Suppliers: Data Stewards, IT Steering Committee, Data Stewardship Council, Government, Customers
− Participants: Data Stewards, Data Security Administrators, Database Administrators, BI Analysts, Data Architects, DM Leader, CIO / CTO, Help Desk Analysts
− Tools: Database Management System, Business Intelligence Tools, Application Frameworks, Identity Management Technologies, Change Control Systems
− Primary Deliverables: Data Security Policies, Data Privacy and Confidentiality Standards, User Profiles, Passwords and Memberships, Data Security Permissions, Data Security Controls, Data Access Views, Document Classifications, Authentication and Access History, Data Security Audits
− Consumers: Data Producers, Knowledge Workers, Managers, Executives, Customers, Data Professionals
March 8, 2010 193
  • Data Security Management Function, Activities and Sub-Activities
− Understand Data Security Needs and Regulatory Requirements (Business Requirements; Regulatory Requirements)
− Define Data Security Policy
− Define Data Security Standards
− Define Data Security Controls and Procedures
− Manage Users, Passwords, and Group Membership (Password Standards and Procedures)
− Manage Data Access Views and Permissions
− Monitor User Authentication and Access Behaviour
− Classify Information Confidentiality
− Audit Data Security
March 8, 2010 194
  • Data Security Management - Principles • Be a responsible trustee of data about all parties. Understand and respect the privacy and confidentiality needs of all stakeholders, be they clients, patients, students, citizens, suppliers, or business partners • Understand and comply with all pertinent regulations and guidelines • Data-to-process and data-to-role relationship (CRUD Create, Read, Update, Delete) matrices help map data access needs and guide definition of data security role groups, parameters, and permissions (see the sketch below) • Definition of data security requirements and data security policy is a collaborative effort involving IT security administrators, data stewards, internal and external audit teams, and the legal department • Identify detailed application security requirements in the analysis phase of every systems development project • Classify all enterprise data and information products against a simple confidentiality classification schema • Every user account should have a password set by the user following a set of password complexity guidelines, and expiring every 45 to 60 days • Create role groups; define privileges by role; and grant privileges to users by assigning them to the appropriate role group. Whenever possible, assign each user to only one role group • Some level of management must formally request, track, and approve all initial authorisations and subsequent changes to user and group authorisations • To avoid data integrity issues with security access information, centrally manage user identity data and group membership data • Use relational database views to restrict access to sensitive columns and / or specific rows • Strictly limit and carefully consider every use of shared or service user accounts • Monitor data access to certain information actively, and take periodic snapshots of data access activity to understand trends and compare against standards criteria • Periodically conduct objective, independent data security audits to verify regulatory compliance and standards conformance, and to analyse the effectiveness and maturity of data security policy and practice • In an outsourced environment, be sure to clearly define the roles and responsibilities for data security and understand the chain of custody of data across organisations and roles March 8, 2010 195
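To make the CRUD matrix principle above concrete, here is a minimal Python sketch of a data-to-role matrix driving access checks; the entities, roles and permissions are assumptions for the example.

```python
# Illustrative data-to-role CRUD matrix (roles, entities and permissions are examples only).
# C = Create, R = Read, U = Update, D = Delete
crud_matrix = {
    ("Customer", "Call Centre Agent"): "CRU",
    ("Customer", "Marketing Analyst"): "R",
    ("Invoice",  "Accounts Clerk"):    "CRUD",
    ("Invoice",  "Marketing Analyst"): "",
}

def is_allowed(entity: str, role: str, operation: str) -> bool:
    """Check whether a role may perform a CRUD operation on an entity."""
    return operation in crud_matrix.get((entity, role), "")

print(is_allowed("Customer", "Marketing Analyst", "U"))  # False - read-only access
print(is_allowed("Invoice", "Accounts Clerk", "D"))      # True
```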
  • Understand Data Security Needs and Regulatory Requirements • Distinguish between business rules and procedures and the rules imposed by application software products • It is common for systems to have their own unique set of data security requirements over and above those required by business processes March 8, 2010 196
  • Business Requirements • Implementing data security within an enterprise requires an understanding of business requirements • Business needs of an enterprise define the degree of rigidity required for data security • Business rules and processes define the security touch points • Data-to-process and data-to-role relationship matrices are useful tools to map these needs and guide definition of data security role-groups, parameters, and permissions • Identify detailed application security requirements in the analysis phase of every systems development project March 8, 2010 197
  • Regulatory Requirements • Organisations must comply with a growing set of regulations • Some regulations impose security controls on information management March 8, 2010 198
  • Define Data Security Policy • Definition of data security policy based on data security requirements is a collaborative effort involving IT security administrators, data stewards, internal and external audit teams, and the legal department • Enterprise IT strategy and standards typically dictate high-level policies for access to enterprise data assets • Data security policies are more granular in nature and take a very data-centric approach compared to an IT security policy March 8, 2010 199
  • Define Data Security Standards • There is no one prescribed way of implementing data security to meet privacy and confidentiality requirements • Regulations generally focus on achieving an end without defining the means for achieving it • Organisations should design their own security controls, demonstrate that the controls meet the requirements of the law or regulations and document the implementation of those controls • Information technology security standards can also affect − Tools used to manage data security − Data encryption standards and mechanisms − Access guidelines to external vendors and contractors − Data transmission protocols over the internet − Documentation requirements − Remote access standards − Security breach incident reporting procedures March 8, 2010 200
  • Define Data Security Standards • Consider physical security, especially with the explosion of portable devices and media, to formulate an effective data security strategy − Access to data using mobile devices − Storage of data on portable devices such as laptops, DVDs, CDs or USB drives − Disposal of these devices in compliance with records management policies • An organisation should develop a practical, implementable security policy including data security guiding principles • The focus should be on quality and consistency, not on creating a lengthy body of guidelines • Execution of the policy requires satisfying the elements of securing information assets: authentication, authorisation, access, and audit • Information classification, access rights, role groups, users, and passwords are the means to implementing policy and satisfying these elements March 8, 2010 201
  • Define Data Security Controls and Procedures • Implementation and administration of data security policy is primarily the responsibility of security administrators • Database security is often one responsibility of database administrators • Implement proper controls to meet the objectives of relevant laws • Implement a process to validate assigned permissions against a change management system used for tracking all user permission requests March 8, 2010 202
  • Manage Users, Passwords, and Group Membership • Role groups enable security administrators to define privileges by role and to grant these privileges to users by enrolling them in the appropriate role group • Data consistency in user and group management is a challenge in a mixed IT environment • Construct group definitions at a workgroup or business unit level • Organise roles in a hierarchy, so that child roles further restrict the privileges of parent roles March 8, 2010 203
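A minimal Python sketch of hierarchical role groups, assuming the rule stated above that a child role can only further restrict its parent's privileges; the role names and privileges are illustrative.

```python
# Sketch of hierarchical role groups in which a child role can only restrict
# (never extend) the privileges inherited from its parent.
# Role names and privileges are illustrative assumptions.
ROLES = {
    "finance_all":      {"parent": None,          "privileges": {"read_gl", "read_budget", "update_budget"}},
    "finance_readonly": {"parent": "finance_all", "privileges": {"read_gl", "read_budget"}},
}

def effective_privileges(role: str) -> set:
    """A child role's effective privileges are the intersection with its parent's,
    so it can never grant more than the parent allows."""
    node = ROLES[role]
    if node["parent"] is None:
        return set(node["privileges"])
    return node["privileges"] & effective_privileges(node["parent"])

print(effective_privileges("finance_readonly"))  # {'read_gl', 'read_budget'}
```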
  • Password Standards and Procedures • Passwords are the first line of defense in protecting access to data • Every user account should be required to have a password set by the user with a sufficient level of password complexity defined in the security standards March 8, 2010 204
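As an illustration, the following Python sketch checks password complexity and expiry; the specific rules (length, character classes, 60-day maximum age) are example values, not mandated standards.

```python
import re
from datetime import date, timedelta

# Minimal sketch of a password complexity and expiry check; the specific
# rules used here are illustrative only.
def password_is_acceptable(password: str) -> bool:
    """Require a minimum length plus upper case, lower case, digit and symbol."""
    return (len(password) >= 10
            and re.search(r"[A-Z]", password) is not None
            and re.search(r"[a-z]", password) is not None
            and re.search(r"[0-9]", password) is not None
            and re.search(r"[^A-Za-z0-9]", password) is not None)

def password_is_expired(last_changed: date, max_age_days: int = 60) -> bool:
    """Flag passwords older than the maximum permitted age."""
    return date.today() - last_changed > timedelta(days=max_age_days)

print(password_is_acceptable("Tr1cky-Pass!"))  # True
print(password_is_expired(date(2010, 1, 1)))   # depends on today's date
```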
  • Manage Data Access Views and Permissions • Data security management involves not just preventing inappropriate access, but also enabling valid and appropriate access to data • Most sets of data do not have any restricted access requirements • Control sensitive data access by granting permissions - opt-in • Access control degrades when achieved through shared or service accounts − Implemented as convenience for administrators, these accounts often come with enhanced privileges and are untraceable to any particular user or administrator − Enterprises using shared or service accounts run the risk of data security breaches − Evaluate use of such accounts carefully, and never use them frequently or by default March 8, 2010 205
  • Monitor User Authentication and Access Behaviour • Monitoring authentication and access behaviour is critical because: − It provides information about who is connecting and accessing information assets, which is a basic requirement for compliance auditing − It alerts security administrators to unforeseen situations, compensating for oversights in data security planning, design, and implementation • Monitoring helps detect unusual or suspicious transactions that may warrant further investigation and issue resolution • Perform monitoring either actively or passively • Automated systems with human checks and balances in place best accomplish both methods March 8, 2010 206
  • Classify Information Confidentiality • Classify an organisation’s data and information using a simple confidentiality classification schema • Most organisations classify the level of confidentiality for information found within documents, including reports • A typical classification schema might include the following five confidentiality classification levels: − For General Audiences: Information available to anyone, including the general public − Internal Use Only: Information limited to employees or members, but with minimal risk if shared − Confidential: Information which should not be shared outside the organisation. Client Confidential information may not be shared with other clients − Restricted Confidential: Information limited to individuals performing certain roles with the need to know − Registered Confidential: Information so confidential that anyone accessing the information must sign a legal agreement to access the data and assume responsibility for its secrecy March 8, 2010 207
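The classification schema above can be represented very simply; the following Python sketch is one illustrative way to encode the five levels and compare them against a user's clearance (the comparison rule itself is an assumption for the example).

```python
from enum import IntEnum

# Sketch of the five-level confidentiality schema described above; the
# clearance comparison logic is an illustrative assumption.
class Confidentiality(IntEnum):
    GENERAL_AUDIENCES = 1
    INTERNAL_USE_ONLY = 2
    CONFIDENTIAL = 3
    RESTRICTED_CONFIDENTIAL = 4
    REGISTERED_CONFIDENTIAL = 5

def may_view(user_clearance: Confidentiality, document_level: Confidentiality) -> bool:
    """A user may view a document only at or below their clearance level."""
    return user_clearance >= document_level

print(may_view(Confidentiality.INTERNAL_USE_ONLY, Confidentiality.CONFIDENTIAL))  # False
```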
  • Audit Data Security • Auditing data security is a recurring control activity with responsibility to analyse, validate, counsel, and recommend policies, standards, and activities related to data security management • Auditing is a managerial activity performed with the help of analysts working on the actual implementation and details • The goal of auditing is to provide management and the data governance council with objective, unbiased assessments, and rational, practical recommendations • Auditing data security is no substitute for effective management of data security • Auditing is a supportive, repeatable process, which should occur regularly, efficiently, and consistently March 8, 2010 208
  • Audit Data Security • Auditing data security includes − Analysing data security policy and standards against best practices and needs − Analysing implementation procedures and actual practices to ensure consistency with data security goals, policies, standards, guidelines, and desired outcomes − Assessing whether existing standards and procedures are adequate and in alignment with business and technology requirements − Verifying the organisation is in compliance with regulatory requirements − Reviewing the reliability and accuracy of data security audit data − Evaluating escalation procedures and notification mechanisms in the event of a data security breach − Reviewing contracts, data sharing agreements, and data security obligations of outsourced and external vendors, ensuring they meet their obligations, and ensuring the organisation meets its obligations for externally sourced data − Reporting to senior management, data stewards, and other stakeholders on the state of data security within the organisation and the maturity of its practices − Recommending data security design, operational, and compliance improvements March 8, 2010 209
  • Data Security and Outsourcing • Outsourcing IT operations introduces additional data security challenges and responsibilities • Outsourcing increases the number of people who share accountability for data across organisational and geographic boundaries • Previously informal roles and responsibilities must now be explicitly defined as contractual obligations • Outsourcing contracts must specify the responsibilities and expectations of each role • Any form of outsourcing increases risk to the organisation • Data security risk is escalated to include the outsource vendor, so any data security measures and processes must look at the risk from the outsource vendor not only as an external risk, but also as an internal risk March 8, 2010 210
  • Data Security and Outsourcing • Transferring control, but not accountability, requires tighter risk management and control mechanisms: − Service level agreements − Limited liability provisions in the outsourcing contract − Right-to-audit clauses in the contract − Clearly defined consequences to breaching contractual obligations − Frequent data security reports from the service vendor − Independent monitoring of vendor system activity − More frequent and thorough data security auditing − Constant communication with the service vendor • In an outsourced environment, it is important to maintain and track the lineage, or flow, of data across systems and individuals to maintain a chain of custody March 8, 2010 211
  • Reference and Master Data Management March 8, 2010 212
  • Reference and Master Data Management • Reference and Master Data Management is the ongoing reconciliation and maintenance of reference data and master data − Reference Data Management is control over defined domain values (also known as vocabularies), including control over standardised terms, code values and other unique identifiers, business definitions for each value, business relationships within and across domain value lists, and the consistent, shared use of accurate, timely and relevant reference data values to classify and categorise data − Master Data Management is control over master data values to enable consistent, shared, contextual use across systems, of the most accurate, timely, and relevant version of truth about essential business entities • Reference data and master data provide the context for transaction data March 8, 2010 213
  • Reference and Master Data Management – Definition and Goals • Definition − Planning, implementation, and control activities to ensure consistency with a golden version of contextual data values • Goals − Provide authoritative source of reconciled, high-quality master and reference data − Lower cost and complexity through reuse and leverage of standards − Support business intelligence and information integration efforts March 8, 2010 214
  • Reference and Master Data Management - Overview
− Inputs: Business Drivers, Data Requirements, Policy and Regulations, Standards, Code Sets, Master Data, Transactional Data
− Suppliers: Steering Committees, Subject Matter Experts, Data Consumers, Standards Organisations, Data Providers
− Participants: Data Stewards, Subject Matter Experts, Data Architects, Data Analysts, Application Architects, Data Governance Council, Data Providers, Other IT Professionals
− Tools: Reference Data Management Applications, Master Data Management Applications, Data Modeling Tools, Process Modeling Tools, Metadata Repositories, Data Profiling Tools, Data Cleansing Tools, Data Integration Tools, Business Process and Rule Engines, Change Management Tools
− Primary Deliverables: Master and Reference Data Requirements, Data Models and Documentation, Reliable Reference and Master Data, Golden Record Data Lineage, Data Quality Metrics and Reports, Data Cleansing Services
− Consumers: Business Data Stewards, Application Users, BI and Reporting Users, Application Developers and Architects, Data Integration Developers and Architects, BI Developers and Architects, Vendors, Customers, and Partners
− Metrics: Reference and Master Data Quality, Change Activity, Issues, Costs, Volume, Use and Re-Use, Availability, Data Steward Coverage
March 8, 2010 215
  • Reference and Master Data Management Function, Activities and Sub-Activities
− Understand Reference and Master Data Integration Needs
− Identify Reference and Master Data Sources and Contributors
− Define and Maintain the Data Integration Architecture
− Implement Reference and Master Data Management Solutions
− Define and Maintain Match Rules
− Establish Golden Records (Vocabulary Management and Reference Data; Defining Golden Master Data Values)
− Define and Maintain Hierarchies and Affiliations
− Plan and Implement Integration of New Data Sources
− Replicate and Distribute Reference and Master Data
− Manage Changes to Reference and Master Data
− Master data subject areas: Party Master Data, Financial Master Data, Product Master Data, Location Master Data
March 8, 2010 216
  • Reference and Master Data Management - Principles • Shared reference and master data belongs to the organisation, not to a particular application or department • Reference and master data management is an on-going data quality improvement program; its goals cannot be achieved by one project alone • Business data stewards are the authorities accountable for controlling reference data values. Business data stewards work with data professionals to improve the quality of reference and master data • Golden data values represent the organisation’s best efforts at determining the most accurate, current, and relevant data values for contextual use. New data may prove earlier assumptions to be false. Therefore, apply matching rules with caution, and ensure that any changes that are made are reversible • Replicate master data values only from the database of record • Request, communicate, and, in some cases, approve of changes to reference data values before implementation March 8, 2010 217
  • Reference Data • Reference data is data used to classify or categorise other data • Business rules usually dictate that reference data values conform to one of several allowed values • In all organisations, reference data exists in virtually every database • Reference tables link via foreign keys into other relational database tables, and the referential integrity functions within the database management system ensure only valid values from the reference tables are used in other tables March 8, 2010 218
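As an illustration of reference data acting as a controlled domain of allowed values, the following Python sketch mimics the check a foreign-key constraint would perform; the country-code domain and the order record are invented examples.

```python
# Sketch of reference data as a controlled list of allowed code values;
# the country-code domain and the order record are illustrative examples.
COUNTRY_CODES = {
    "IE": "Ireland",
    "GB": "United Kingdom",
    "US": "United States",
}

def validate_reference_value(record: dict, field: str, domain: dict) -> None:
    """Mimic the referential-integrity check a DBMS foreign key would enforce."""
    if record[field] not in domain:
        raise ValueError(f"{record[field]!r} is not a valid value for {field}")

order = {"order_id": 1001, "ship_to_country": "IE"}
validate_reference_value(order, "ship_to_country", COUNTRY_CODES)  # passes silently
```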
  • Master Data • Master data is data about the business entities that provide context for business transactions • Master data is the authoritative, most accurate data available about key business entities, used to establish the context for transactional data • Master data values are considered golden • Master Data Management is the process of defining and maintaining how master data will be created, integrated, maintained, and used throughout the enterprise March 8, 2010 219
  • Master Data Challenges • What are the important roles, organisations, places, and things referenced repeatedly? • What data is describing the same person, organisation, place, or thing? • Where is this data stored? What is the source for the data? • Which data is more accurate? Which data source is more reliable and credible? Which data is most current? • What data is relevant for specific needs? How do these needs overlap or conflict? • What data from multiple sources can be integrated to create a more complete view and provide a more comprehensive understanding of the person, organisation, place or thing? • What business rules can be established to automate master data quality improvement by accurately matching and merging data about the same person, organisation, place, or thing? • How do we identify and restore data that was inappropriately matched and merged? • How do we provide our golden data values to other systems across the enterprise? • How do we identify where and when data other than the golden values is used? March 8, 2010 220
  • Party Master Data • Includes data about individuals, organisations, and the roles they play in business relationships • Customer relationship management (CRM) systems perform MDM for customer data (also called Customer Data Integration (CDI)) • Focus is to provide the most complete and accurate information about each and every customer • Need to identify duplicate, redundant and conflicting data • Party master data issues − Complexity of roles and relationships played by individuals and organisations − Difficulties in unique identification − High number of data sources − Business importance and potential impact of the data March 8, 2010 221
  • Financial Master Data • Includes data about business units, cost centers, profit centers, general ledger accounts, budgets, projections, and projects • Financial MDM solutions focus on not only creating, maintaining, and sharing information, but also simulating how changes to existing financial data may affect the organisation’s bottom line March 8, 2010 222
  • Product Master Data • Product master data can consist of information on an organisation’s products and services, or on the entire industry in which the organisation operates, including competitor products and services • Product Lifecycle Management (PLM) focuses on managing the lifecycle of a product or service from its conception (such as research), through its development, manufacturing, sale / delivery, service, and disposal March 8, 2010 223
  • Location Master Data • Provides the ability to track and share reference information about different geographies, and create hierarchical relationships or territories based on geographic information to support other processes • Different industries require specialised earth science data (geographic data about seismic faults, flood plains, soil, annual rainfall, and severe weather risk areas) and related sociological data (population, ethnicity, income, and terrorism risk), usually supplied from external sources March 8, 2010 224
  • Understand Reference and Master Data Integration Needs • Reference and master data requirements are relatively easy to discover and understand for a single application • Potentially much more difficult to develop an understanding of these needs across applications, especially across the entire organisation • Analysing the root causes of a data quality problem usually uncovers requirements for reference and master data integration • Organisations that have successfully managed reference and master data typically have focused on one subject area at a time − Analyse all occurrences of a few business entities, across all physical databases and for differing usage patterns March 8, 2010 225
  • Identify Reference and Master Data Sources and Contributors • Successful organisations first understand the needs for reference and master data • Then trace the lineage of this data to identify the original and interim source databases, files, applications, organisations and the individual roles that create and maintain the data • Understand both the upstream sources and the downstream needs to capture quality data at its source March 8, 2010 226
  • Define and Maintain the Data Integration Architecture • Effective data integration architecture controls the shared access, replication, and flow of data to ensure data quality and consistency, particularly for reference and master data • Without data integration architecture, local reference and master data management occurs in application silos, inevitably resulting in redundant and inconsistent data • The selected data integration architecture should also provide common data integration services − Change request processing, including review and approval − Data quality checks on externally acquired reference and master data − Consistent application of data quality rules and matching rules − Consistent patterns of processing − Consistent metadata about mappings, transformations, programs and jobs − Consistent audit, error resolution and performance monitoring data − Consistent approach to replicating data • Establishing master data standards can be a time-consuming task as it may involve multiple stakeholders • Apply the same data standards, regardless of integration technology, to enable effective standardisation, sharing, and distribution of reference and master data March 8, 2010 227
  • Data Integration Services Architecture - key components
− Data Quality Management: data standardisation and cleansing, matching
− Data acquisition and file management, replication management, and audit services
− Data stores: Source Data, Rules, Staging, Reconciled Master Data, Archives, Errors, Subscriptions
− Metadata Management: business metadata, integration metadata, job flow and statistics
March 8, 2010 228
  • Implement Reference and Master Data Management Solutions • Reference and master data management solutions are complex • Given the variety, complexity, and instability of requirements, no single solution or implementation project is likely to meet all reference and master data management needs • Organisations should expect to implement reference and master data management solutions iteratively and incrementally through several related projects and phases March 8, 2010 229
  • Define and Maintain Match Rules • Matching, merging, and linking of data from multiple systems about the same person, group, place, or thing is a major master data management challenge • Matching attempts to remove redundancy, to improve data quality, and provide information that is more comprehensive • Data matching is performed by applying inference rules − Duplicate identification match rules focus on a specific set of fields that uniquely identify an entity and identify merge opportunities without taking automatic action − Match-merge rules match records and merge the data from these records into a single, unified, reconciled, and comprehensive record. − Match-link rules identify and cross-reference records that appear to relate to a master record without updating the content of the cross-referenced record March 8, 2010 230
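A minimal Python sketch of a duplicate-identification match rule, which flags candidate duplicates without merging them; the matching key (normalised name plus date of birth) and the sample records are assumptions for the example.

```python
# Illustrative duplicate-identification match rule: records are candidate
# duplicates when normalised name and date of birth agree. Field names,
# normalisation and the sample records are assumptions for the sketch.
def match_key(record: dict) -> tuple:
    name = " ".join(record["name"].lower().split())
    return (name, record["date_of_birth"])

records = [
    {"id": 1, "name": "Mary  O'Brien", "date_of_birth": "1975-04-12"},
    {"id": 2, "name": "mary o'brien",  "date_of_birth": "1975-04-12"},
    {"id": 3, "name": "Mary O'Brien",  "date_of_birth": "1981-09-30"},
]

groups: dict = {}
for rec in records:
    groups.setdefault(match_key(rec), []).append(rec["id"])

# Report candidate duplicates without merging them automatically (duplicate
# identification rules flag merge opportunities; they do not take action).
duplicates = [ids for ids in groups.values() if len(ids) > 1]
print(duplicates)  # [[1, 2]]
```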
  • Establish Golden Records • Establishing golden master data values requires more inference, application of matching rules, and review of the results March 8, 2010 231
  • Vocabulary Management and Reference Data • A vocabulary is a collection of terms / concepts and their relationships • Vocabulary management is defining, sourcing, importing, and maintaining a vocabulary and its associated reference data − See ANSI/NISO Z39.19 - Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies - http://www.niso.org/kst/reports/standards?step=2&gid=&project_key=7cc9b583cb5a62e8c15d3099e0bb46bbae9cf38a • Vocabulary management requires the identification of the standard list of preferred terms and their synonyms • Vocabulary management requires data governance, enabling data stewards to assess stakeholder needs March 8, 2010 232
  • Vocabulary Management and Reference Data • Key questions to ask to enable vocabulary management − What information concepts (data attributes) will this vocabulary support? − Who is the audience for this vocabulary? What processes do they support, and what roles do they play? − Why is the vocabulary needed? Will it support applications, content management, analytics, and so on? − Who identifies and approves the preferred vocabulary and vocabulary terms? − What are the current vocabularies different groups use to classify this information? Where are they located? How were they created? Who are their subject matter experts? Are there any security or privacy concerns for any of them? − Are there existing standards that can be leveraged to fulfill this need? Are there concerns about using an external standard vs. internal? How frequently is the standard updated and what is the degree of change of each update? Are standards accessible in an easy to import / maintain format in a cost efficient manner? March 8, 2010 233
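As a small illustration of preferred terms and synonyms, the following Python sketch standardises incoming terms against a controlled vocabulary; the vocabulary content is invented for the example.

```python
# Sketch of a controlled vocabulary mapping synonyms to preferred terms;
# the terms themselves are invented for illustration.
VOCABULARY = {
    "customer": {"client", "account holder", "patron"},
    "supplier": {"vendor", "provider"},
}

# Build a reverse index from any synonym (or the preferred term itself)
# to the preferred term.
PREFERRED = {syn: term for term, syns in VOCABULARY.items() for syn in syns}
PREFERRED.update({term: term for term in VOCABULARY})

def standardise(term: str) -> str:
    """Return the preferred term for a synonym, or the input unchanged if unknown."""
    return PREFERRED.get(term.lower().strip(), term)

print(standardise("Vendor"))  # supplier
print(standardise("client"))  # customer
```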
  • Defining Golden Master Data Values • Golden data values are the data values thought to be the most accurate, current, and relevant for shared, consistent use across applications • Determine golden values by analysing data quality, applying data quality rules and matching rules, and incorporating data quality controls into the applications that acquire, create, and update data • Establish data quality measurements to set expectations, measure improvements, and help identify root causes of data quality problems • Assess data quality through a combination of data profiling activities and verification against adherence to business rules • Once the data is standardised and cleansed, the next step is to attempt reconciliation of redundant data through application of matching rules March 8, 2010 234
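The following Python sketch illustrates one possible survivorship rule for deriving golden values from matched records (most recent non-empty value wins); the records, attributes and precedence rule are assumptions for the example, since real rules are agreed with business data stewards.

```python
from datetime import date

# Minimal survivorship sketch for choosing golden values from matched records:
# prefer the most recently updated non-empty value per attribute. The source
# records, field names and the precedence rule are illustrative assumptions.
matched_records = [
    {"source": "CRM",     "updated": date(2009, 11, 2),
     "name": "Mary O'Brien", "email": "",                 "phone": "01-555-1234"},
    {"source": "Billing", "updated": date(2010, 2, 15),
     "name": "M. O'Brien",   "email": "mary@example.com", "phone": ""},
]

def golden_record(records: list, attributes: list) -> dict:
    golden = {}
    for attr in attributes:
        candidates = [r for r in records if r.get(attr)]
        # Survivorship rule: most recent non-empty value wins.
        best = max(candidates, key=lambda r: r["updated"])
        golden[attr] = best[attr]
    return golden

print(golden_record(matched_records, ["name", "email", "phone"]))
```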
  • Define and Maintain Hierarchies and Affiliations • Vocabularies and their associated reference data sets are often more than lists of preferred terms and their synonyms • Affiliation management is the establishment and maintenance of relationships between master data records March 8, 2010 235
  • Plan and Implement Integration of New Data Sources • Integrating new reference data sources involves − Receiving and responding to new data acquisition requests from different groups − Performing data quality assessment services using data cleansing and data profiling tools − Assessing data integration complexity and cost − Piloting the acquisition of data and its impact on match rules − Determining who will be responsible for data quality − Finalising data quality metrics March 8, 2010 236
  • Replicate and Distribute Reference and Master Data • Reference and master data may be read directly from a database of record, or may be replicated from the database of record to other application databases for transaction processing, and data warehouses for business intelligence • Reference data most commonly appears as pick list values in applications • Replication aids maintenance of referential integrity March 8, 2010 237
  • Manage Changes to Reference and Master Data • Specific individuals have the role of a business data steward with the authority to create, update, and retire reference data • Formally control changes to controlled vocabularies and their reference data sets • Carefully assess the impact of reference data changes March 8, 2010 238
  • Data Warehousing and Business Intelligence Management March 8, 2010 239
  • Data Warehousing and Business Intelligence Management • A Data Warehouse is a combination of two primary components − An integrated decision support database − Related software programs used to collect, cleanse, transform, and store data from a variety of operational and external sources • Both components combine to support historical, analytical, and business intelligence (BI) requirements • A Data Warehouse may also include dependent data marts, which are subset copies of a data warehouse database • A Data Warehouse includes any data stores or extracts used to support the delivery of data for BI purposes March 8, 2010 240
  • Data Warehousing and Business Intelligence Management • Data Warehousing means the operational extract, cleansing, transformation, and load processes and associated control processes that maintain the data contained within a data warehouse • The Data Warehousing process focuses on enabling an integrated and historical business context on operational data by enforcing business rules and maintaining appropriate business data relationships and processes that interact with metadata repositories • Business Intelligence is a set of business capabilities including − Query, analysis, and reporting activity by knowledge workers to monitor and understand the financial and operational health of, and make business decisions about, the enterprise − Strategic and operational analytics and reporting on corporate operational data to support business decisions, risk management, and compliance March 8, 2010 241
  • Data Warehousing and Business Intelligence Management • Together, Data Warehousing and Business Intelligence Management is the collection, integration, and presentation of data to knowledge workers for the purpose of business analysis and decision-making • Composed of activities supporting all phases of the decision support life cycle that provide context − Moves and transforms data from sources to a common target data store − Provides knowledge workers with various means of access, manipulation, and reporting of the integrated target data March 8, 2010 242
  • Data Warehousing and Business Intelligence Management – Definition and Goals • Definition − Planning, implementation, and control processes to provide decision support data and support knowledge workers engaged in reporting, query and analysis • Goals − To support and enable effective business analysis and decision making by knowledge workers − To build and maintain the environment / infrastructure to support business intelligence activity, specifically leveraging all the other data management functions to cost effectively deliver consistent integrated data for all BI activity March 8, 2010 243
  • Data Warehousing and Business Intelligence Management - Overview
− Inputs: Business Drivers, Data Quality Requirements, Data Security Requirements, Data Architecture, Technical Architecture, Data Modeling Standards and Guidelines, Transactional Data, Master and Reference Data, Industry and External Data
− Suppliers: Executives and Managers, Subject Matter Experts, Data Governance Council, Data Producers, Data Architects and Analysts
− Participants: Data Professionals and Other IT Professionals, Business Executives and Managers, DM Execs and Other IT Management, BI Program Manager, SMEs and Other Information Consumers, Data Stewards, Project Managers, Data Architects and Analysts, Data Integration (ETL) Specialists, BI Specialists, Database Administrators, Data Security Administrators, Data Quality Analysts
− Tools: Database Management Systems, Data Profiling Tools, Data Integration Tools, Data Cleansing Tools, Business Intelligence Tools, Analytic Applications, Data Modeling Tools, Performance Management Tools, Metadata Repository, Data Quality Tools, Data Security Tools
− Primary Deliverables: BI Data and Access Requirements, DW / BI Architecture, Data Warehouses, Data Marts and OLAP Cubes, Dashboards and Scorecards, Analytic Applications, File Extracts (for Data Mining / Statistical Tools), BI Tools and User Environments, Data Quality Feedback Mechanism / Loop
− Consumers: Information Consumers (Internal and External), Knowledge Workers, Managers and Executives, External Customers and Systems, Internal Customers and Systems
− Metrics: Usage Metrics, Customer / User Satisfaction, Subject Area Coverage %, Response / Performance Metrics
March 8, 2010 244
  • Data Warehousing and Business Intelligence Management Objectives • Providing integrated storage of required current and historical data, organised by subject areas • Ensuring credible, quality data for all appropriate access capabilities • Ensuring a stable, high-performance, reliable environment for data acquisition, data management, and data access • Providing an easy-to-use, flexible, and comprehensive data access environment • Delivering both content and access to the content in increments appropriate to the organisation’s objectives • Leveraging, rather than duplicating, relevant data management component functions such as Reference and Master Data Management, Data Governance, Data Quality, and Metadata • Providing an enterprise focal point for data delivery in support of the decisions, policies, procedures, definitions, and standards that arise from data governance • Defining, building, and supporting all data stores, data processes, data infrastructure, and data tools that contain integrated, post-transactional, and refined data used for information viewing, analysis, or data request fulfillment • Integrating newly discovered data as a result of BI processes into the DW for further analytics and BI use March 8, 2010 245
  • Data Warehousing and Business Intelligence Management Function, Activities and Sub-Activities
− Understand Business Intelligence Information Needs
− Define and Maintain the DW-BI Architecture
− Implement Data Warehouses and Data Marts
− Implement Business Intelligence Tools and User Interfaces (Query and Reporting Tools; On Line Analytical Processing (OLAP) Tools; Analytic Applications; Implementing Management Dashboards and Scorecards; Performance Management Tools; Predictive Analytics and Data Mining Tools; Advanced Visualisation and Discovery Tools)
− Process Data for Business Intelligence (Staging Areas; Mapping Sources and Targets; Data Cleansing and Transformations (Data Acquisition))
− Monitor and Tune Data Warehousing Processes
− Monitor and Tune BI Activity and Performance
March 8, 2010 246
  • Data Warehousing and Business Intelligence Management Principles • Obtain executive commitment and support as these projects are labour intensive • Secure business SMEs as their support and high availability are necessary for getting the correct data and a useful BI solution • Be business focused and driven. Make sure DW / BI work is serving real priority business needs and solving burning business problems. Let the business drive the prioritisation • Demonstrable data quality is essential • Provide incremental value. Ideally deliver in continual 2-3 month segments • Transparency and self-service. The more context (metadata of all kinds) provided, the more value customers derive. Wisely exposing information about the process reduces calls and increases satisfaction • One size does not fit all. Make sure you find the right tools and products for each of your customer segments • Think and architect globally, act and build locally. Let the big picture and end vision guide the architecture, but build and deliver incrementally, with a shorter-term, more project-based focus • Collaborate with and integrate all other data initiatives, especially those for data governance, data quality, and metadata • Start with the end in mind. Let the business priority and scope of end-data-delivery in the BI space drive the creation of the DW content. The main purpose for the existence of the DW is to serve up data to the end business customers via the BI capabilities • Summarise and optimise last, not first. Build on the atomic data and add aggregates or summaries as needed for performance, but not to replace the detail March 8, 2010 247
  • Understand Business Intelligence Information Needs • All projects start with requirements • Gathering requirements for DW-BIM projects has both similarities to and differences from gathering requirements for other projects • For DW-BIM projects, it is important to understand the broader business context of the business area targeted as reporting is generalised and exploratory • Capturing the actual business vocabulary and terminology is a key to success • Document the business context, then explore the details of the actual source data • Typically, the ETL portion can consume 60%-70% of a DW-BIM project’s budget and time • The DW is often the first place where the pain of poor quality data in source systems and / or data entry functions becomes apparent • Creating an executive summary of the identified business intelligence needs is a best practice • When starting a DW-BIM programme, a good way to decide where to start is using a simple assessment of business impact and technical feasibility − Technical feasibility will take into consideration things like complexity, availability and state of the data, and the availability of subject matter experts − Projects that have high business impact and high technical feasibility are good candidates for starting. March 8, 2010 248
  • Define and Maintain the DW-BI Architecture • Successful DW-BIM architecture requires the identification and bringing together of a number of key roles − Technical Architect - hardware, operating systems, databases and DW-BIM architecture − Data Architect - data analysis, systems of record, data modeling and data mapping − ETL Architect / Design Lead - staging and transform, data marts, and schedules − Metadata Specialist - metadata interfaces, metadata architecture and contents − BI Application Architect / Design Lead - BI tool interfaces and report design, metadata delivery, data and report navigation and delivery • Technical requirements including performance, availability, and timing needs are key drivers in developing the DW-BIM architecture • The design decisions and principles for what data detail the DW contains is a key design priority for DW-BIM architecture • Important that the DW-BIM architecture integrate with the overall corporate reporting architecture March 8, 2010 249
  • Define and Maintain the DW-BI Architecture • No DW-BIM effort can be successful without business acceptance of data • Business acceptance includes the data being understandable, having verifiable quality and having a demonstrable origin • Sign-off by the business on the data should be part of the User Acceptance Testing • Structured random testing of the data in the BI tool against data in the source systems over the initial load and a few update load cycles should be performed to meet sign-off criteria • Meeting these requirements is paramount for every DW-BIM architecture March 8, 2010 250
  • Implement Data Warehouses and Data Marts • The purpose of a data warehouse is to integrate data from multiple sources and then serve up that integrated data for BI purposes • Consumption is typically through data marts or other systems • A single data warehouse will integrate data from multiple source systems and serve data to multiple data marts • Purpose of data marts is to provide data for analysis to knowledge workers • Start with the end in mind - identify the business problem to solve, then identify the details and what would be used and continue to work back into the integrated data required and ultimately all the way back to the data sources. March 8, 2010 251
  • Implement Business Intelligence Tools and User Interfaces • There is a well-defined set of well-proven BI tools to choose from • Implementing the right BI tool or User Interface (UI) is about identifying the right tools for the right user set • Almost all BI tools also come with their own metadata repositories to manage their internal data maps and statistics March 8, 2010 252
  • Query and Reporting Tools • Query and reporting is the process of querying a data source and then formatting the results to create a report • With business query and reporting, the data source is more often a data warehouse or data mart • While IT develops production reports, power users and casual business users develop their own reports with business query tools • Business query and reporting tools enable users who want to author their own reports or create outputs for use by others March 8, 2010 253
  • Query and Reporting Tools Landscape
− User groups, from broadest reach to most specialist: Customers, Suppliers and Regulators; Frontline Workers; Executives and Managers; Analysts and Information Workers; IT Developers
− Tool categories, from commonly used to specialist: Published Reports, Embedded BI, Scorecards and Dashboards, Interactive Fixed Reports, OLAP, BI Spreadsheets, Business Query Tools, Production Reporting Tools, Statistics Tools
March 8, 2010 254
  • On Line Analytical Processing (OLAP) Tools • OLAP provides interactive, multi-dimensional analysis with different dimensions and different levels of detail • The value of OLAP tools and cubes is reduction of the chance of confusion and erroneous interpretation by aligning the data content with the analyst's mental model • Common OLAP operations include slice and dice, drill down, drill up, roll up, and pivot − Slice - a slice is a subset of a multi-dimensional array corresponding to a single value for one or more members of the dimensions not in the subset − Dice - the dice operation is a slice on more than two dimensions of a data cube, or more than two consecutive slices − Drill Down / Up - drilling down or up is a specific analytical technique whereby the user navigates among levels of data, ranging from the most summarised (up) to the most detailed (down) − Roll-Up – a roll-up involves computing all of the data relationships for one or more dimensions. To do this, define a computational relationship or formula − Pivot - to change the dimensional orientation of a report or page display March 8, 2010 255
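To illustrate roll-up and slice on a tiny in-memory fact set, here is a short Python sketch; the dimensions and figures are invented, and a real OLAP tool would perform these operations against a cube rather than in application code.

```python
from collections import defaultdict

# Sketch of roll-up and slice on a tiny fact set held in memory; the
# dimensions (region, product, year) and figures are invented for illustration.
facts = [
    {"region": "EMEA", "product": "Widget", "year": 2009, "sales": 120},
    {"region": "EMEA", "product": "Gadget", "year": 2009, "sales": 80},
    {"region": "AMER", "product": "Widget", "year": 2009, "sales": 200},
    {"region": "AMER", "product": "Widget", "year": 2010, "sales": 220},
]

def roll_up(rows, dimensions):
    """Aggregate the sales measure over the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row["sales"]
    return dict(totals)

def slice_(rows, dimension, value):
    """Fix a single dimension member (a slice of the cube)."""
    return [row for row in rows if row[dimension] == value]

print(roll_up(facts, ["region"]))                         # roll up to region level
print(roll_up(slice_(facts, "year", 2009), ["product"]))  # slice 2009, then roll up by product
```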
  • Analytic Applications • Analytic applications include the logic and processes to extract data from well-known source systems, such as vendor ERP systems, a data model for the data mart, and pre-built reports and dashboards • Analytic applications provide businesses with a pre-built solution to optimise a functional area or industry vertical • Different types of analytic applications include customer, financial, supply chain, manufacturing, and human resource applications March 8, 2010 256
  • Implementing Management Dashboards and Scorecards • Dashboards and scorecards are both ways of efficiently presenting performance information • Dashboards are oriented more toward dynamic presentation of operational information while scorecards are more static representations of longer-term organisational, tactical, or strategic goals • Typically, scorecards are divided into 4 quadrants or views of the organisation such as Finance, Customer, Environment, and Employees, each with a number of metrics March 8, 2010 257
  • Performance Management Tools • Performance management applications include budgeting, planning, and financial consolidation March 8, 2010 258
  • Predictive Analytics and Data Mining Tools • Data mining is a particular kind of analysis that reveals patterns in data using various algorithms • A data mining tool will help users discover relationships or show patterns in more exploratory fashion March 8, 2010 259
  • Advanced Visualisation and Discovery Tools • Advanced visualisation and discovery tools allow users to interact with the data in a highly visual, interactive way • Patterns in a large dataset can be difficult to recognise in a display of numbers • A pattern can be picked up visually fairly quickly when thousands of data points are loaded into a sophisticated display on a single page March 8, 2010 260
  • Process Data for Business Intelligence • Most of the work in any DW-BIM effort involves the preparation and processing of the data March 8, 2010 261
  • Staging Areas • A staging area is the intermediate data store between an original data source and the centralised data repository • All required cleansing, transformation, reconciliation, and relationships happen in this area March 8, 2010 262
  • Mapping Sources and Targets • Source-to-target mapping is the documentation activity that defines data type details and transformation rules for all required entities and data elements, from each individual source to each individual target • DW-BIM adds requirements to this source-to-target mapping process beyond those encountered in a typical data migration • One of the goals of the DW-BIM effort should be to provide a complete lineage for each data element available in the BI environment all the way back to its respective source(s) • A solid taxonomy is necessary to match the data elements in different systems into a consistent structure in the EDW March 8, 2010 263
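A minimal Python sketch of source-to-target mappings driving a transform while recording lineage back to each source element; the source systems, element names and rules are illustrative assumptions.

```python
# Sketch of documented source-to-target mappings driving a simple transform;
# the source systems, element names and rules are illustrative assumptions.
MAPPINGS = [
    {"source": "CRM.cust.cust_nm",  "target": "EDW.customer.customer_name",
     "rule": lambda v: v.strip().title()},
    {"source": "CRM.cust.cntry_cd", "target": "EDW.customer.country_code",
     "rule": lambda v: v.strip().upper()},
]

def apply_mappings(source_row: dict) -> dict:
    """Produce a target row, recording lineage back to each source element."""
    target = {}
    for m in MAPPINGS:
        value = m["rule"](source_row[m["source"]])
        target[m["target"]] = {"value": value, "lineage": m["source"]}
    return target

row = {"CRM.cust.cust_nm": "  mary o'brien ", "CRM.cust.cntry_cd": "ie"}
print(apply_mappings(row))
```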
  • Data Cleansing and Transformations (Data Acquisition) • Data cleansing focuses on the activities that correct and enhance the domain values of individual data elements, including enforcement of standards • Cleansing is particularly necessary for initial loads where significant history is involved • The preferred strategy is to push data cleansing and correction activity back to the source systems whenever possible • Data transformation focuses on activities that provide organisational context between data elements, entities, and subject areas • Organisational context includes cross-referencing, reference and master data management and complete and correct relationships • Data transformation is an essential component of being able to integrate data from multiple sources March 8, 2010 264
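As an illustration of domain-value cleansing, the following Python sketch standardises country and phone values before loading; the alias table and formatting rules are invented examples, not corporate standards.

```python
# Sketch of domain-value cleansing: standardise free-text values to a
# corporate standard before loading. The rules and values are illustrative.
PHONE_PREFIX = "+353"

def cleanse_country(value: str) -> str:
    """Map common aliases to a standard ISO-style country code."""
    aliases = {"ireland": "IE", "irl": "IE", "republic of ireland": "IE",
               "united kingdom": "GB", "uk": "GB", "great britain": "GB"}
    return aliases.get(value.strip().lower(), value.strip().upper())

def cleanse_phone(value: str) -> str:
    """Strip formatting and apply an assumed national dialling prefix (illustrative only)."""
    digits = "".join(ch for ch in value if ch.isdigit())
    return PHONE_PREFIX + digits.lstrip("0")

print(cleanse_country(" Republic of Ireland "))  # IE
print(cleanse_phone("(01) 555 1234"))            # +35315551234
```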
  • Monitor and Tune Data Warehousing Processes • Processing should be monitored across the system for bottlenecks and dependencies among processes • Database tuning techniques should be employed where and when needed, including partitioning and tuned backup and recovery strategies • Archiving is a difficult subject in data warehousing • Users often consider the data warehouse as an active archive due to the long histories that are built, and are unwilling, particularly if the OLTP sources have dropped records, to see the data warehouse engage in archiving March 8, 2010 265
  • Monitor and Tune BI Activity and Performance • A best practice for BI monitoring and tuning is to define and display a set of customer- facing satisfaction metrics • Average query response time and the number of users per day / week / month, are examples of useful metrics to display • Regular review of usage statistics and patterns is essential • Reports providing frequency and resource usage of data, queries, and reports allow prudent enhancement • Tuning BI activity is analogous to the principle of profiling applications in order to know where the bottlenecks are and where to apply optimisation efforts March 8, 2010 266
  • Document and Content Management March 8, 2010 267
  • Document and Content Management • Document and Content Management is the control over capture, storage, access, and use of data and information stored outside relational databases • Strategic and tactical focus overlaps with other data management functions in addressing the need for data governance, architecture, security, managed metadata, and data quality for unstructured data • Document and Content Management includes two sub-functions: − Document management is the storage, inventory, and control of electronic and paper documents. Document management encompasses the processes, techniques, and technologies for controlling and organising documents and records, whether stored electronically or on paper − Content management refers to the processes, techniques, and technologies for organising, categorising, and structuring access to information content, resulting in effective retrieval and reuse. Content management is particularly important in developing websites and portals, but the techniques of indexing based on keywords, and organising based on taxonomies, can be applied across technology platforms. March 8, 2010 268
  • Document and Content Management – Definition and Goals • Definition − Planning, implementation, and control activities to store, protect, and access data found within electronic files and physical records (including text, graphics, images, audio, and video) • Goals − To safeguard and ensure the availability of data assets stored in less structured formats − To enable effective and efficient retrieval and use of data and information in unstructured formats − To comply with legal obligations and customer expectations − To ensure business continuity through retention, recovery, and conversion − To control document storage operating costs March 8, 2010 269
  • Document and Content Management - Overview
− Inputs: Text Documents, Reports, Spreadsheets, Email, Instant Messages, Faxes, Voicemail, Images, Video Recordings, Audio Recordings, Formal Reports, Significant Memoranda, Printed Paper Files, Microfiche/Microfilm, Graphics
− Suppliers: Employees, External Parties
− Participants: All Employees, Data Stewards, DM Professionals, Records Management Staff, Other IT Professionals, Data Management Executive, Other IT Managers, Chief Information Officer, Chief Knowledge Officer
− Tools: Stored Documents, Office Productivity Tools, Image and Workflow Management Tools, Records Management Tools, XML Development Tools, Collaboration Tools, Internet, Email Systems
− Primary Deliverables: Managed Records in Many Media Formats, E-discovery Records, Outgoing Letters and Emails, Contracts and Financial Documents, Policies and Procedures, Audit Trails and Logs, Meeting Minutes
− Consumers: Business and IT Users, Government Regulatory Agencies, Senior Management, External Customers
− Metrics: Return on Investment, Key Performance Indicators, Balanced Scorecards
March 8, 2010 270
  • Document and Content Management Function, Activities and Sub-Activities
− Document / Record Management: Plan for Managing Documents / Records; Implement Document / Record Management Systems for Acquisition, Storage, Access, and Security Controls; Backup and Recover Documents / Records; Retention and Disposition of Documents / Records; Audit Document / Records Management
− Content Management: Define and Maintain Enterprise Taxonomies (Information Content Architecture); Document / Index Information Content Metadata; Provide Content Access and Retrieval; Govern for Quality Content
March 8, 2010 271
  • Document and Content Management - Principles • Everyone in an organisation has a role to play in protecting its future. Everyone must create, use, retrieve, and dispose of records in accordance with the established policies and procedures • Experts in the handling of records and content should be fully engaged in policy and planning. Regulatory requirements and best practices can vary significantly based on industry sector and legal jurisdiction • Even if records management professionals are not available to the organisation, everyone can be trained to understand the issues. Once trained, business stewards and others can collaborate on an effective approach to records management March 8, 2010 272
  • Document and Content Management • A document management system is an application used to track and store electronic documents and electronic images of paper documents • Document management systems commonly provide storage, versioning, security, metadata management, content indexing, and retrieval capabilities • A content management system is used to collect, organise, index, and retrieve information content; storing the content either as components or whole documents, while maintaining links between components • While a document management system may provide content management functionality over the documents under its control, a content management system is essentially independent of where and how the documents are stored March 8, 2010 273
  • Document / Record Management • Document / Record Management is the lifecycle management of the designated significant documents of the organisation • Records can be: − Physical, such as documents, memos, contracts, reports or microfiche − Electronic, such as email content, attachments, and instant messaging − Content on a website − Documents on all types of media and hardware − Data captured in databases of all kinds • More than 90% of the records created today are electronic • Growth in email and instant messaging has made the management of electronic records critical to an organisation March 8, 2010 274
  • Document / Record Management • The lifecycle of Document / Record Management includes: − Identification of existing and newly created documents / records − Creation, Approval, and Enforcement of documents / records policies − Classification of documents / records − Documents / Records Retention Policy − Storage: Short and long term storage of physical and electronic documents / records − Retrieval and Circulation: Allowing access and circulation of documents / records in accordance with policies, security and control standards, and legal requirements − Preservation and Disposal: Archiving and destroying documents / records according to organisational needs, statutes, and regulations March 8, 2010 275
  • Plan for Managing Documents / Records • Plan the document lifecycle from creation or receipt, through organisation for retrieval and distribution, to archiving or disposition • Develop classification / indexing systems and taxonomies so that the retrieval of documents is easy • Create plans and policies for documents and records based on the value of the data to the organisation and its role as evidence of business transactions • Identify the organisational unit responsible and accountable for managing the documents / records • Develop and execute a retention plan and policy, including the archiving of selected records for long-term preservation • Destroy records at the end of their lifecycle according to operational needs, procedures, statutes and regulations March 8, 2010 276
  • Implement Document / Record Management Systems for Acquisition, Storage, Access, and Security Controls • Documents can be created within a document management system or captured via scanners or OCR software • Electronic documents must be indexed via keywords or text during the capture process so that the document can be found • A document repository enables check-in and check-out features, versioning, collaboration, comparison, archiving, status state(s), migration from one storage media to another and disposition • Document management can support different types of workflows − Manual workflows that indicate where the user sends the document − Rules-based workflow, where rules are created that dictate the flow of the document within an organisation − Dynamic rules that allow for different workflows based on content March 8, 2010 277
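A rules-based workflow of the kind listed above can be reduced to a routing table evaluated against document attributes. The sketch below is a hypothetical illustration, not the API of any specific document management product:

```python
# Hypothetical rules-based routing: each rule is (predicate, destination queue),
# evaluated in order; the first matching rule decides the workflow step.
ROUTING_RULES = [
    (lambda doc: doc["type"] == "invoice" and doc["amount"] > 10000, "finance-approval"),
    (lambda doc: doc["type"] == "invoice", "accounts-payable"),
    (lambda doc: doc["type"] == "contract", "legal-review"),
]

def route(document, default_queue="manual-triage"):
    """Return the first matching workflow queue for a captured document."""
    for predicate, queue in ROUTING_RULES:
        if predicate(document):
            return queue
    return default_queue

print(route({"type": "invoice", "amount": 25000}))  # finance-approval
print(route({"type": "memo"}))                      # manual-triage
```

A dynamic workflow, as described above, would build or select these rules at run time based on the document content rather than from a fixed table.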
  • Backup and Recover Documents / Records • The document / record management system needs to be included as part of the overall corporate backup and recovery activities for all data and information • The document / records manager should be involved in risk mitigation and management and in business continuity planning, especially regarding security for vital records • A vital records program provides the organisation with access to the records necessary to conduct its business during a disaster and to resume normal business afterward March 8, 2010 278
  • Retention and Disposition of Documents / Records • Defines the period of time during which documents / records of operational, legal, financial or historical value must be maintained • Specifies the processes for compliance, and the methods and schedules for the disposition of documents / records • Must deal with privacy and data protection issues • Legal and regulatory requirements must be considered when setting up document / record retention schedules March 8, 2010 279
  • Audit Document / Records Management • Document / records management requires auditing on a periodic basis to ensure that the right information is getting to the right people at the right time for decision making or performing operational activities − Inventory - Each location in the inventory is uniquely identified − Storage - Storage areas for physical documents / records have adequate space to accommodate growth − Reliability and Accuracy - Spot checks are executed to confirm that the documents / records are an adequate reflection of what has been created or received − Classification and Indexing Schemes - Metadata and document file plans are well described − Access and Retrieval - End users find and retrieve critical information easily − Retention Processes - Retention schedule is structured in a logical way − Disposition Methods - Documents / records are disposed of as recommended − Security and Confidentiality - Breaches of document / record confidentiality and loss of documents / records are recorded as security incidents and managed appropriately − Organisational Understanding of Documents / Records Management - Appropriate training is provided to stakeholders and staff as to the roles and responsibilities related to document / records management March 8, 2010 280
  • Content Management • Organisation, categorisation, and structure of data / resources so that they can be stored, published, and reused in multiple ways • Includes data / information, that exists in many forms and in multiple stages of completion within its lifecycle • Content management systems manage the content of a website or intranet through the creation, editing, storing, organising, and publishing of content March 8, 2010 281
  • Define and Maintain Enterprise Taxonomies (Information Content Architecture) • Process of creating a structure for a body of information or content • Contains a controlled vocabulary that can help with navigation and search systems • Content Architecture identifies the links and relationships between documents and content, specifies document requirements and attributes and defines the structure of content in a document or content management system March 8, 2010 282
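A content taxonomy is, at its simplest, a hierarchy plus a controlled vocabulary. The sketch below uses invented categories and synonyms purely to show how such a structure can support classification, navigation and search:

```python
# Illustrative taxonomy: parent node -> child nodes, plus a controlled vocabulary
# mapping preferred terms to accepted synonyms used for indexing and search.
TAXONOMY = {
    "Finance": ["Invoices", "Budgets"],
    "Human Resources": ["Policies", "Contracts of Employment"],
}
CONTROLLED_VOCABULARY = {
    "Invoices": {"invoice", "bill", "statement of charges"},
    "Budgets": {"budget", "financial plan"},
}

def classify(text):
    """Return taxonomy nodes whose controlled vocabulary appears in the text."""
    words = text.lower()
    return [term for term, synonyms in CONTROLLED_VOCABULARY.items()
            if any(s in words for s in synonyms)]

def parent_of(node):
    """Return the parent taxonomy node, supporting hierarchical navigation."""
    return next((p for p, children in TAXONOMY.items() if node in children), None)

print([(n, parent_of(n)) for n in classify("Q3 financial plan and supplier bill")])
# [('Invoices', 'Finance'), ('Budgets', 'Finance')]
```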
  • Document / Index Information Content Metadata • Development of metadata for unstructured data content • Maintenance of metadata for unstructured data becomes the maintenance of a cross-reference of various local schemes to the official set of organisation metadata March 8, 2010 283
  • Provide Content Access and Retrieval • Once the content has been described by metadata / key word tagging and classified within the appropriate Information Content Architecture, it is available for retrieval and use • Finding unstructured data can be eased through portal technology March 8, 2010 284
  • Govern for Quality Content • Managing unstructured data requires effective partnerships between data stewards, data professionals, and records managers • The focus of data governance can include document and record retention policies, electronic signature policies, reporting formats, and report distribution policies • High quality, accurate, and up-to-date information will aid in critical business decisions • Timeliness of the decision-making process with high quality information may increase competitive advantage and business effectiveness March 8, 2010 285
  • Metadata Management March 8, 2010 286
  • Metadata Management • Metadata is data about data • Metadata Management is the set of processes that ensure proper creation, storage, integration, and control to support associated usage of metadata • Leveraging metadata in an organisation can provide benefits − Increase the value of strategic information by providing context for the data, thus aiding analysts in making more effective decisions − Reduce training costs and lower the impact of staff turnover through thorough documentation of data context, history, and origin − Reduce data-oriented research time by assisting business analysts in finding the information they need, in a timely manner − Improve communication by bridging the gap between business users and IT professionals, leveraging work done by other teams, and increasing confidence in IT system data − Increase speed of system development time-to-market by reducing system development life-cycle time − Reduce risk of project failure through better impact analysis at various levels during change management − Identify and reduce redundant data and processes, thereby reducing rework and use of redundant, out-of-date, or incorrect data March 8, 2010 287
  • Metadata Management – Definition and Goals • Definition − Planning, implementation, and control activities to enable easy access to high quality, integrated metadata • Goals − Provide organisational understanding of terms and usage − Integrate metadata from diverse sources − Provide easy, integrated access to metadata − Ensure metadata quality and security March 8, 2010 288
  • Metadata • Metadata is information about the physical data, technical and business processes, data rules and constraints, and logical and physical structures of the data, as used by an organisation • Descriptive tags describe data, concepts and the connections between the data and concepts − Business Analytics: Data definitions, reports, users, usage, performance − Business Architecture: Roles and organisations, goals and objectives − Business Definitions: The business terms and explanations for a particular concept, fact, or other item found in an organisation − Business Rules: Standard calculations and derivation methods − Data Governance: Policies, standards, procedures, programs, roles, organisations, stewardship assignments − Data Integration: Sources, targets, transformations, lineage, ETL workflows, EAI, EII, migration / conversion − Data Quality: Defects, metrics, ratings − Document Content Management: Unstructured data, documents, taxonomies, name sets, legal discovery, search engine indexes − Information Technology Infrastructure: Platforms, networks, configurations, licenses − Logical Data Models: Entities, attributes, relationships and rules, business names and definitions − Physical Data Models: Files, tables, columns, views, business definitions, indexes, usage, performance, change management − Process Models: Functions, activities, roles, inputs / outputs, workflow, business rules, timing, stores − Systems Portfolio and IT Governance: Databases, applications, projects and programs, integration roadmap, change management − Service-Oriented Architecture (SOA) Information: Components, services, messages, master data − System Design and Development: Requirements, designs and test plans, impact − Systems Management: Data security, licenses, configuration, reliability, service levels March 8, 2010 289
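To make the categories above concrete, one possible shape for a single metadata entry is sketched below, combining business definition, technical detail and lineage for one physical column; the structure and names are illustrative, not a repository schema:

```python
# Hypothetical metadata entry combining business, technical and lineage metadata.
column_metadata = {
    "physical_name": "CUST_DOB",
    "table": "DW.CUSTOMER_DIM",
    "business_name": "Customer Date of Birth",
    "business_definition": "The date of birth declared by the customer at account opening.",
    "data_type": "DATE",
    "steward": "Customer Data Steward",
    "lineage": {
        "source": "CRM.CUSTOMERS.DATE_OF_BIRTH",
        "transformation": "Converted from DD/MM/YYYY string to DATE",
    },
    "quality_rules": ["must not be in the future", "customer must be at least 18"],
}

# Minimal impact-analysis use: find entries fed by a given source column.
def impacted_by(entries, source_column):
    return [e["physical_name"] for e in entries
            if e["lineage"]["source"] == source_column]

print(impacted_by([column_metadata], "CRM.CUSTOMERS.DATE_OF_BIRTH"))  # ['CUST_DOB']
```

Even this small structure supports the kind of change impact analysis cited earlier as a benefit of managed metadata.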
  • Metadata Management - Overview
− Inputs: Metadata Requirements, Metadata Issues, Data Architecture, Business Metadata, Technical Metadata, Process Metadata, Operational Metadata, Data Stewardship Metadata
− Suppliers: Data Stewards, Data Architects, Data Modelers, Database Administrators, Other Data Professionals, Data Brokers, Government and Industry Regulators
− Participants: Metadata Specialists, Data Integration Architects, Data Stewards, Data Architects and Modelers, Database Administrators, Other DM Professionals, Other IT Professionals, DM Executive, Business Users
− Tools: Metadata Repositories, Data Modeling Tools, Database Management Systems, Data Integration Tools, Business Intelligence Tools, System Management Tools, Object Modeling Tools, Process Modeling Tools, Report Generating Tools, Data Quality Tools, Data Development and Administration Tools, Reference and Master Data Management Tools
− Primary Deliverables: Metadata Repositories, Quality Metadata, Metadata Models and Architecture, Metadata Management Operational Analysis, Metadata Analysis, Data Lineage, Change Impact Analysis, Metadata Control Procedures
− Consumers: Data Stewards, Data Professionals, Other IT Professionals, Knowledge Workers, Managers and Executives, Customers and Collaborators, Business Users
− Metrics: Metadata Quality, Master Data Service Data Compliance, Metadata Repository Contribution, Metadata Documentation Quality, Steward Representation / Coverage, Metadata Usage / Reference, Metadata Management Maturity, Metadata Repository Availability
March 8, 2010 290
  • Metadata Management Function, Activities and Sub-Activities
− Understand Metadata Requirements: Business User Requirements; Technical User Requirements
− Define the Metadata Architecture: Centralised Metadata Architecture; Distributed Metadata Architecture; Hybrid Metadata Architecture
− Develop and Maintain Metadata Standards: Industry / Consensus Metadata Standards; International Metadata Standards; Standard Metadata Metrics
− Implement a Managed Metadata Environment
− Create and Maintain Metadata
− Integrate Metadata
− Manage Metadata Repositories: Metadata Repositories; Directories, Glossaries and Other Metadata Stores
− Distribute and Deliver Metadata
− Query, Report and Analyse Metadata
March 8, 2010 291
  • Metadata Management - Principles • Establish and maintain a metadata strategy and appropriate policies, especially clear goals and objectives for metadata management and usage • Secure sustained commitment, funding, and vocal support from senior management concerning metadata management for the enterprise • Take an enterprise perspective to ensure future extensibility, but implement through iterative and incremental delivery • Develop a metadata strategy before evaluating, purchasing, and installing metadata management products • Create or adopt metadata standards to ensure interoperability of metadata across the enterprise • Ensure effective metadata acquisition for both internal and external metadata • Maximise user access, since a solution that is not accessed or is under-accessed will not show business value • Understand and communicate the necessity of metadata and the purpose of each type of metadata; socialisation of the value of metadata will encourage business usage • Measure content and usage • Leverage XML, messaging, and Web services • Establish and maintain enterprise-wide business involvement in data stewardship, assigning accountability for metadata • Define and monitor procedures and processes to ensure correct policy implementation • Include a focus on roles, staffing, standards, procedures, training, and metrics • Provide dedicated metadata experts to the project and beyond • Certify metadata quality March 8, 2010 292
  • Understand Metadata Requirements • Metadata management strategy must reflect an understanding of enterprise needs for metadata • Gather requirements to confirm the need for a metadata management environment, to set scope and priorities, educate and communicate, to guide tool evaluation and implementation, guide metadata modeling, guide internal metadata standards, guide provided services that rely on metadata, and to estimate and justify staffing needs • Gather requirements from business and technical users • Summarise the requirements from an analysis of roles, responsibilities, challenges, and the information needs of selected individuals in the organisation March 8, 2010 293
  • Business User Requirements • Business users require improved understanding of the information from operational and analytical systems • Business users require a high level of confidence in the information obtained from corporate data warehouses, analytical applications, and operational systems • Need appropriate access to information delivery methods, such as reports, queries, ad-hoc, OLAP, dashboards with a high degree of quality documentation and context • Business users must understand the intent and purpose of metadata management March 8, 2010 294
  • Technical User Requirements • Technical requirement topics include − Daily feed throughput: size and processing time − Existing metadata − Sources - known and unknown − Targets − Transformations − Architecture flow (logical and physical) − Non-standard metadata requirements • Technical users must understand the business context of the data at a sufficient level to provide the necessary support, including implementing the calculations or derived data rules March 8, 2010 295
  • Define the Metadata Architecture • Metadata management solutions consist of − Metadata creation / sourcing − Metadata integration − Metadata repositories − Metadata delivery − Metadata usage − Metadata control / management March 8, 2010 296
  • Centralised Metadata Architecture • Single metadata repository that contains copies of the live metadata from the various sources • Advantages − High availability, since it is independent of the source systems − Quick metadata retrieval, since the repository and the query reside together − Resolved database structures that are not affected by the proprietary nature of third party or commercial systems − Extracted metadata may be transformed or enhanced with additional metadata that may not reside in the source system, improving quality • Disadvantages − Complex processes are necessary to ensure that changes in source metadata quickly replicate into the repository − Maintenance of a centralised repository can be substantial − Extraction could require custom additional modules or middleware − Validation and maintenance of customised code can increase the demands on both internal IT staff and the software vendors March 8, 2010 297
  • Distributed Metadata Architecture • Metadata retrieval engine responds to user requests by retrieving data from source systems in real time with no persistent repository • Advantages − Metadata is always as current and valid as possible − Queries are distributed, possibly improving response / process time − Metadata requests from proprietary systems are limited to query processing rather than requiring a detailed understanding of proprietary data structures, therefore minimising the implementation and maintenance effort required − Development of automated metadata query processing is likely simpler, requiring minimal manual intervention − Batch processing is reduced, with no metadata replication or synchronisation processes • Disadvantages − No enhancement or standardisation of metadata is possible between systems − Query capabilities are directly affected by the availability of the participating source systems − No ability to support user-defined or manually inserted metadata entries since there is no repository in which to place these additions March 8, 2010 298
  • Hybrid Metadata Architecture • Hybrid architecture where metadata still moves directly from the source systems into a repository but the repository design only accounts for the user-added metadata, the critical standardised items and the additions from manual sources • Advantages − Near-real-time retrieval of metadata from its source and enhanced metadata to meet user needs most effectively, when needed − Lowers the effort for manual IT intervention and custom-coded access functionality to proprietary systems • Disadvantages − Source systems must be available, since the processing of queries is distributed to the back-end systems − Additional overhead is required to link those initial results with metadata augmentation in the central repository before presenting the result set to the end user − Design forces the metadata repository to contain the latest version of the metadata source and forces it to manage changes to the source, as well − Sets of program / process interfaces to tie the repository back to the metadata source(s) must be built and maintained March 8, 2010 299
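The sketch below illustrates the hybrid pattern just described: standardised and user-added metadata are held in a local repository, while current technical metadata is fetched from the source system at query time and merged. All interfaces and values are hypothetical:

```python
# Hypothetical hybrid retrieval: merge repository metadata with a live source lookup.

REPOSITORY = {  # user-added / standardised metadata held centrally
    "CUSTOMER_DIM": {"business_owner": "Head of Marketing",
                     "classification": "Confidential"},
}

def fetch_from_source_system(object_name):
    """Stand-in for a real-time call to the source system's catalogue."""
    live_catalogue = {"CUSTOMER_DIM": {"row_count": 1_250_000,
                                       "last_loaded": "2010-03-07"}}
    return live_catalogue.get(object_name, {})

def get_metadata(object_name):
    merged = dict(fetch_from_source_system(object_name))  # current technical metadata
    merged.update(REPOSITORY.get(object_name, {}))        # augment with repository content
    return merged

print(get_metadata("CUSTOMER_DIM"))
```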
  • Develop and Maintain Metadata Standards • Check industry or consensus standards and international standards • International standards provide the framework from which the industry standards are developed and executed March 8, 2010 300
  • Industry / Consensus Metadata Standards • Understanding the various standards for the implementation and management of metadata in industry is essential to the appropriate selection and use of a metadata solution for an enterprise − OMG (Object Management Group) specifications • Common Warehouse Metamodel (CWM) • Information Management Metamodel (IMM) • MDC Open Information Model (OIM) • Extensible Markup Language (XML) • Unified Modeling Language (UML) • XML Metadata Interchange (XMI) • Ontology Definition Metamodel (ODM) − World Wide Web Consortium (W3C) Resource Description Framework (RDF) for describing and interchanging metadata using XML − Dublin Core Metadata Initiative (DCMI) interoperable online metadata standard using RDF − Distributed Management Task Force (DMTF) Web-Based Enterprise Management (WBEM) Common Information Model (CIM) standards-based management tools facilitating the exchange of data across otherwise disparate technologies and platforms − Metadata standards for unstructured data • ISO 5964 - Guidelines for the establishment and development of multilingual thesauri • ISO 2788 - Guidelines for the establishment and development of monolingual thesauri • ANSI/NISO Z39.1 - American Standard Reference Data and Arrangement of Periodicals • ISO 704 - Terminology work: Principles and methods March 8, 2010 301
  • International Metadata Standards • ISO / IEC 11179 is an international metadata standard for standardising and registering of data elements to make data understandable and shareable March 8, 2010 302
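As a loose illustration of the registration idea behind ISO / IEC 11179 (the fields below are a simplification for explanation, not the standard's metamodel), a registered data element ties a named concept to a definition and a value domain so it can be shared and validated consistently:

```python
# Simplified, illustrative data element registration in the spirit of ISO / IEC 11179.
data_element = {
    "identifier": "DE-0001",
    "name": "Person Gender Code",
    "definition": "A code representing the gender reported by a person.",
    "data_element_concept": "Person Gender",
    "value_domain": {
        "datatype": "character",
        "permissible_values": {"M": "Male", "F": "Female", "U": "Unknown"},
    },
    "registration_status": "Standard",
    "steward": "HR Data Steward",
}

def is_valid(element, value):
    """Check a value against the element's registered value domain."""
    return value in element["value_domain"]["permissible_values"]

print(is_valid(data_element, "F"), is_valid(data_element, "X"))  # True False
```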
  • Standard Metadata Metrics • Controlling the effectiveness of the deployed metadata environment requires measurements to assess user uptake, organisational commitment, and content coverage and quality − Metadata Repository Completeness − Metadata Documentation Quality − Master Data Service Data Compliance − Steward Representation / Coverage − Metadata Usage / Reference − Metadata Management Maturity − Metadata Repository Availability March 8, 2010 303
  • Implement a Managed Metadata Environment • Implement a managed metadata environment in incremental steps in order to minimise risks to the organisation and to facilitate acceptance • First implementation is a pilot to prove concepts and learn about managing the metadata environment March 8, 2010 304
  • Create and Maintain Metadata • The metadata creation and update facility provides for the periodic scanning and updating of the repository in addition to the manual insertion and manipulation of metadata by authorised users and programs • An audit process validates activities and reports exceptions • Metadata is the guide to the data in the organisation, so its quality is critical March 8, 2010 305
  • Integrate Metadata • Integration processes gather and consolidate metadata from across the enterprise including metadata from data acquired outside the enterprise • Challenges will arise in integration that will require resolution through the governance process • Use a non-persistent metadata staging area to store temporary and backup files that supports rollback and recovery processes and provides an interim audit trail to assist repository managers when investigating metadata source or quality issues • ETL tools used for data warehousing and Business Intelligence applications are often used effectively in metadata integration processes March 8, 2010 306
  • Manage Metadata Repositories • Implement a number of control activities in order to manage the metadata environment • Control of repositories is control of metadata movement and repository updates performed by the metadata specialist March 8, 2010 307
  • Metadata Repositories • Metadata repository refers to the physical tables in which the metadata are stored • The design should be generic and not merely reflect the source system database designs • Metadata should be as integrated as possible; this will be one of the most direct value-added elements of the repository March 8, 2010 308
  • Directories, Glossaries and Other Metadata Stores • A Directory is a type of metadata store that limits the metadata to the location or source of data in the enterprise • A Glossary typically provides guidance for use of terms • Other Metadata stores include specialised lists such as source lists or interfaces, code sets, lexicons, spatial and temporal schema, spatial reference, and distribution of digital geographic data sets, repositories of repositories and business rules March 8, 2010 309
  • Distribute and Deliver Metadata • Metadata delivery layer is responsible for the delivery of the metadata from the repository to the end users and to any applications or tools that require metadata feeds to them March 8, 2010 310
  • Query, Report and Analyse Metadata • Metadata guides management and use of data assets • A metadata repository must have a front-end application that supports the search-and-retrieval functionality required for all this guidance and management of data assets March 8, 2010 311
  • Data Quality Management March 8, 2010 312
  • Data Quality Management • Critical support process in organisational change management • Data quality is synonymous with information quality, since poor data quality results in inaccurate information and poor business performance • Data cleansing may result in short-term and costly improvements that do not address the root causes of data defects • A more rigorous data quality program is necessary to provide an economic solution to improved data quality and integrity • Institutionalising processes for data quality oversight, management, and improvement hinges on identifying the business needs for quality data and determining the best ways to measure, monitor, control, and report on the quality of data • Continuous process for defining the parameters for specifying acceptable levels of data quality to meet business needs, and for ensuring that data quality meets these levels March 8, 2010 313
  • Data Quality Management – Definition and Goals • Definition − Planning, implementation, and control activities that apply quality management techniques to measure, assess, improve, and ensure the fitness of data for use • Goals − To measurably improve the quality of data in relation to defined business expectations − To define requirements and specifications for integrating data quality control into the system development lifecycle − To provide defined processes for measuring, monitoring, and reporting conformance to acceptable levels of data quality March 8, 2010 314
  • Data Quality Management • Data quality expectations provide the inputs necessary to define the data quality framework • Framework includes defining the requirements, inspection policies, measures, and monitors that reflect changes in data quality and performance • Requirements reflect three aspects of business data expectations − Way to record the expectation in business rules − Way to measure the quality of data within that dimension − Acceptability threshold March 8, 2010 315
  • Data Quality Management Approach • Planning for the assessment of the current state and identification of key metrics for measuring data quality • Deploying processes for measuring and improving the quality of data • Monitoring and measuring the levels in relation to the defined business expectations • Acting to resolve any identified issues to improve data quality and better meet business expectations March 8, 2010 316
  • Data Quality Management - Overview
− Inputs: Business Requirements, Data Requirements, Data Quality Expectations, Data Policies and Standards, Business Metadata, Technical Metadata, Data Sources and Data Stores
− Suppliers: External Sources, Regulatory Bodies, Business Subject Matter Experts, Data Stewards, Data Producers, Data Architects, Data Modelers
− Participants: Data Quality Analysts, Data Analysts, Database Administrators, Data Stewards, Other Data Professionals, DRM Director, Data Stewardship Council
− Tools: Data Profiling Tools, Statistical Analysis Tools, Data Cleansing Tools, Data Integration Tools, Issue and Event Management Tools
− Primary Deliverables: Improved Quality Data, Data Management Operational Analysis, Data Profiles, Data Quality Certification Reports, Data Quality Service Level Agreements
− Consumers: Information Consumers, Data Professionals, Other IT Professionals, Knowledge Workers, Managers and Executives, Customers
− Metrics: Data Value Statistics, Errors / Requirement Violations, Conformance to Expectations, Conformance to Service Levels
March 8, 2010 317
  • Data Quality Management Function, Activities and Sub-Activities
− Develop and Promote Data Quality Awareness
− Define Data Quality Requirements
− Profile, Analyse and Assess Data Quality
− Define Data Quality Metrics
− Define Data Quality Business Rules
− Test and Validate Data Quality Requirements
− Set and Evaluate Data Quality Service Levels
− Continuously Measure and Monitor Data Quality
− Manage Data Quality Issues
− Clean and Correct Data Quality Defects
− Design and Implement Operational DQM Procedures
− Monitor Operational DQM Procedures and Performance
March 8, 2010 318
  • Data Quality Management - Principles • Manage data as a core organisational asset • All data elements will have a standardised data definition, data type, and acceptable value domain • Leverage Data Governance for the control and performance of DQM • Use industry and international data standards whenever possible • Downstream data consumers specify data quality expectations • Define business rules to assert conformance to data quality expectations • Validate data instances and data sets against defined business rules • Business process owners will agree to and abide by data quality SLAs • Apply data corrections at the original source, if possible • If it is not possible to correct data at the source, forward data corrections to the owner of the original source whenever possible • Report measured levels of data quality to appropriate data stewards, business process owners, and SLA managers • Identify a gold record for all data elements March 8, 2010 319
  • Develop and Promote Data Quality Awareness • Promoting data quality awareness means more than ensuring that the right people in the organisation are aware of the existence of data quality issues • Establish a data governance framework for data quality − Set priorities for data quality − Develop and maintain standards for data quality − Report relevant measurements of enterprise-wide data quality − Provide guidance that facilitates staff involvement − Establish communications mechanisms for knowledge sharing − Develop and apply certification and compliance policies − Monitor and report on performance − Identify opportunities for improvements and build consensus for approval − Resolve variations and conflicts March 8, 2010 320
  • Define Data Quality Requirements • Applications are dependent on the use of data that meets specific needs associated with the successful completion of a business process • Data quality requirements are often hidden within defined business policies − Identify key data components associated with business policies − Determine how identified data assertions affect the business − Evaluate how data errors are categorised within a set of data quality dimensions − Specify the business rules that measure the occurrence of data errors − Provide a means for implementing measurement processes that assess conformance to those business rules • Dimensions of data quality − Accuracy − Completeness − Consistency − Currency − Precision − Privacy − Reasonableness − Referential Integrity − Timeliness − Uniqueness − Validity March 8, 2010 321
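Several of the dimensions listed above translate directly into measurable checks. The sketch below uses hypothetical customer records and simple rules to measure completeness, uniqueness and validity:

```python
import re

# Hypothetical customer records to assess.
records = [
    {"customer_id": "C1", "email": "a@example.com"},
    {"customer_id": "C2", "email": ""},
    {"customer_id": "C2", "email": "b@example"},
]

total = len(records)

# Completeness: proportion of records with a non-empty email.
completeness = sum(1 for r in records if r["email"]) / total

# Uniqueness: proportion of distinct customer_id values.
uniqueness = len({r["customer_id"] for r in records}) / total

# Validity: proportion of emails matching a simple pattern (illustrative rule only).
validity = sum(1 for r in records
               if re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r["email"] or "")) / total

print(f"completeness={completeness:.2f} uniqueness={uniqueness:.2f} validity={validity:.2f}")
```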
  • Profile, Analyse and Assess Data Quality • Perform an assessment of the data using two different approaches, bottom-up and top-down • The bottom-up assessment of existing data quality issues involves inspection and evaluation of the data sets themselves • The top-down approach involves understanding how business processes consume data, and which data elements are critical to the success of the business application − Identify a data set for review − Catalog the business uses of that data set − Subject the data set to empirical analysis using data profiling tools and techniques − List all potential anomalies, review and evaluate − Prioritise criticality of important anomalies in preparation for defining data quality metrics March 8, 2010 322
  • Define Data Quality Metrics • Poor data quality affects the achievement of business objectives • Seek and use indicators of data quality performance to report the relationship between flawed data and missed business objectives • Measure data quality in the same way as any other type of business performance activity is monitored • Data quality metrics should be reasonable and effective − Measurability − Business Relevance − Acceptability − Accountability / Stewardship − Controllability − Trackability March 8, 2010 323
  • Define Data Quality Business Rules • Measuring conformance to data quality expectations requires defining specific business rules • Monitoring conformance to these rules requires: − Segregating data values, records, and collections of records that do not meet business needs from the valid ones − Generating a notification event alerting a data steward of a potential data quality issue − Establishing an automated or event-driven process for aligning or possibly correcting flawed data within business expectations March 8, 2010 324
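A minimal sketch of that rule-based monitoring pattern follows: records are validated against defined business rules, non-conforming records are segregated from valid ones, and a notification event is raised for the data steward. The rules and the notification function are hypothetical stand-ins:

```python
# Hypothetical business rules: name -> predicate that a conforming record satisfies.
BUSINESS_RULES = {
    "order_amount_positive": lambda r: r["amount"] > 0,
    "currency_in_reference_list": lambda r: r["currency"] in {"EUR", "USD", "GBP"},
}

def notify_steward(rule_name, failures):
    """Stand-in for an event sent to a data quality incident tracking system."""
    print(f"ALERT: {len(failures)} record(s) violate rule '{rule_name}'")

def validate(records):
    """Segregate conforming and non-conforming records and raise alerts."""
    valid, invalid = [], []
    for record in records:
        failed = [name for name, rule in BUSINESS_RULES.items() if not rule(record)]
        (invalid if failed else valid).append((record, failed))
    for name in BUSINESS_RULES:
        failures = [r for r, failed in invalid if name in failed]
        if failures:
            notify_steward(name, failures)
    return valid, invalid

orders = [{"amount": 120.0, "currency": "EUR"}, {"amount": -5.0, "currency": "XXX"}]
valid, invalid = validate(orders)
print(len(valid), len(invalid))  # 1 1
```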
  • Test and Validate Data Quality Requirements • Data profiling tools analyse data to find potential anomalies • Data profiling tools allow data analysts to define data rules for validation, assessing frequency distributions and corresponding measurements and then applying the defined rules against the data sets • Characterising data quality levels based on data rule conformance provides an objective measure of data quality • By using defined data rules to validate data, an organisation can distinguish those records that conform to defined data quality expectations and those that do not • In turn, these data rules are used to baseline the current level of data quality as compared to ongoing audits March 8, 2010 325
  • Set and Evaluate Data Quality Service Levels • Data quality SLAs specify the organisation’s expectations for response and remediation • Having data quality inspection and monitoring in place increases the likelihood of detection and remediation of a data quality issue before a significant business impact can occur • Operational data quality control defined in a data quality SLA includes − The data elements covered by the agreement − The business impacts associated with data flaws − The data quality dimensions associated with each data element − The expectations for quality for each data element for each of the identified dimensions in each application or system in the value chain − The methods for measuring against those expectations − The acceptability threshold for each measurement − The individual(s) to be notified in case the acceptability threshold is not met − The timelines and deadlines for expected resolution or remediation of the issue − The escalation strategy and possible rewards and penalties when the resolution times are met March 8, 2010 326
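The SLA elements listed above map naturally onto a simple configuration structure; the entry below is an illustrative shape for such an agreement rather than a prescribed format:

```python
# Illustrative data quality SLA entry covering the elements listed above.
data_quality_sla = {
    "data_element": "customer_email",
    "system": "CRM",
    "business_impact": "Failed marketing contact and regulatory notices",
    "dimension": "completeness",
    "measurement": "percentage of customer records with a non-empty email",
    "acceptability_threshold": 0.98,
    "notify": ["customer.data.steward@example.com"],
    "resolution_deadline_hours": 48,
    "escalation": "Data Governance Council after deadline",
}

def breaches(sla, measured_value):
    """True when a measured level falls below the SLA acceptability threshold."""
    return measured_value < sla["acceptability_threshold"]

print(breaches(data_quality_sla, 0.95))  # True -> notify and start the resolution clock
```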
  • Continuously Measure and Monitor Data Quality • Provide continuous monitoring by incorporating control and measurement processes into the information processing flow • Incorporating the results of the control and measurement processes into both the operational procedures and reporting frameworks enable continuous monitoring of the levels of data quality March 8, 2010 327
  • Manage Data Quality Issues • Supporting the enforcement of the data quality SLA requires a mechanism for reporting and tracking data quality incidents and activities for researching and resolving those incidents • Data quality incident reporting system provides this capability • Tracking of data quality incidents provides performance reporting data, including mean-time-to-resolve issues, frequency of occurrence of issues, types of issues, sources of issues and common approaches for correcting or eliminating problems • Data quality incident tracking also requires a focus on training staff to recognise when data issues appear and how they are to be classified, logged and tracked according to the data quality SLA • Implementing a data quality issues tracking system provides a number of benefits − Information and knowledge sharing can improve performance and reduce duplication of effort − Analysis of all the issues will help data quality team members determine any repetitive patterns, their frequency, and potentially the source of the issue March 8, 2010 328
  • Clean and Correct Data Quality Defects • Perform data correction in three general ways − Automated correction - Submit the data to data quality and data cleansing techniques using a collection of data transformations and rule-based standardisations, normalisations, and corrections − Manual directed correction - Use automated tools to cleanse and correct data but require manual review before committing the corrections to persistent storage − Manual correction: Data stewards inspect invalid records and determine the correct values, make the corrections, and commit the updated records March 8, 2010 329
  • Design and Implement Operational DQM Procedures • Using defined rules for validation of data quality provides a means of integrating data inspection into a set of operational procedures associated with active DQM • Design and implement detailed procedures for operationalising activities − Inspection and monitoring − Diagnosis and evaluation of remediation alternatives − Resolving the issue − Reporting March 8, 2010 330
  • Monitor Operational DQM Procedures and Performance • Accountability is critical to the governance protocols overseeing data quality control • Issues must be assigned to some number of individuals, groups, departments, or organisations • Tracking process should specify and document the ultimate issue accountability to prevent issues from dropping through the cracks • Metrics can provide valuable insights into the effectiveness of the current workflow, as well as systems and resource utilisation and are important management data points that can drive continuous operational improvement for data quality control March 8, 2010 331
  • Conducting a Data Management Project March 8, 2010 332
  • Conducting a Data Management Project • Data management project depends on: − Scope of the Project – data management functions to be encompassed − Type of Project – from architecture to analysis to implementation − Scope Within the Organisation – one or more business units or the entire organisation March 8, 2010 333
  • Data Management Function and Project Type • The scope of a project can be expressed as a matrix of project type against the data management functions it covers
− Data management functions (scope of project): Data Governance, Data Architecture Management, Data Development, Data Operations Management, Data Security Management, Reference and Master Data Management, Data Warehousing and Business Intelligence Management, Document and Content Management, Metadata Management, Data Quality Management
− Types of project: Architecture; Analysis and Design; Implementation; Operational Improvement; Management and Administration
March 8, 2010 334
  • Mapping the Path Through the Selected Data Management Project • Use the framework to define the breakdown of the selected project March 8, 2010 335
  • Project Elements – Data Management Functions, Type of Project, Organisational Scope • A data management project is defined by three elements: the organisational scope of the project, the data management functions within the scope of the project, and the type of project • Select the project building blocks based on the project scope March 8, 2010 336
  • Creating a Data Management Team March 8, 2010 337
  • Creating a Data Management Team • Once a data management framework has been implemented, it must be monitored, managed and constantly improved • Need to consolidate and coordinate data management and governance efforts to meet the challenges of − Demand for performance management data − Complexity in systems and processes − Greater regulatory and compliance requirements • Build a Data Management Center of Excellence (DMCOE) March 8, 2010 338
  • Data Management Center of Excellence • Separate business units within the organisation generally implement their own solutions • Each business unit will have different IT systems, data warehouses / data marts and business intelligence tools • Organisation-wide coordination of data resources requires a centralised dedicated structure like the DMCOE providing data services • Leads an organisation to business benefits through continuous improvement of data management • DMCOE functions need to focus on leveraging organisational knowledge and skills to maximise the value of data to the organisation • Maximise technology investment while decreasing costs and increasing efficiency, centralise best practices and standards, empower knowledge workers with information and provide thought leadership to the entire company • The DMCOE does not exist in isolation from other operations and service management functions March 8, 2010 339
  • DMCOE Functions • Maximise the value of the data technology investment to the organisation by taking a portfolio approach to increase skills and leverage and to optimise the infrastructure • Focus on project delivery and information asset creation with an emphasis on reusability and knowledge management along with solution delivery • Ensure the integrity of the organisation’s business processes and information systems • Ensure the quality compliance effort related to the configuration, development, and documentation of enhancements • Develop information learning and effective practices March 8, 2010 340
  • Data Charter • Create charter that lists the fundamental principles of data management the DMCOE will adhere to: − Data Strategy - Create a data blueprint, based upon business functions to facilitate data design − Data Sharing - Promote the sharing of data across the organisation and reduce data redundancy − Data Integrity - Ensure the integrity of data from design and availability perspectives − Technical Expertise - Provide the expertise for the development and support of data systems − High Availability and Optimal Performance - Ensure consistent high availability of data systems through proper design and use and optimise performance of the data systems March 8, 2010 341
  • DMCOE Skills • DMCOE needs skills across three dimensions − Specific data management functions − Business management and administration − Technology and service management March 8, 2010 342
  • DMCOE Skills • DMCOE skills are grouped into three dimensions:
− Data management business skills: Data Management Strategy, Data Management Personnel Management, Data Management Portfolio Management, Data Management Process Management, Data Management Design and Development
− Data management specific functions: Data Governance, Data Architecture Management, Data Development, Data Operations Management, Data Security Management, Reference and Master Data Management, Data Warehousing and Business Intelligence Management, Document and Content Management, Metadata Management, Data Quality Management
− Technology and service functions: Environment and Infrastructure Management, Service Management and Support, Application Deployment and Data Migration, Technical Architecture
• Idealised set of DMCOE skills that needs to be customised to suit specific organisation needs • Just one view of a DMCOE March 8, 2010 343
  • DMCOE Business Management and Administration Skills
− Data Management Strategy: Strategic Planning; Co-ordination of Data Management Initiatives; Creation and Enforcement of Data Principles and Standards; Data Usage Strategy
− Data Management Personnel Management: Education and Skills Management; Resource Management and Allocation; Vendor Management; Performance Management
− Data Management Portfolio Management: Management of Portfolio of Data Management Initiatives; Management of Data Management Systems and Initiatives
− Data Management Process Management: Creation and Enforcement of Processes and Process Standards; Management of Data Processes; Data Quality
− Data Management Design and Development: Requirements Definition and Development; Analysis and Design; Development Standards; Solution Development and Deployment
March 8, 2010 344
  • DMCOE Technology and Service Management Skills
− Service Management and Support: Service Desk; Service Level Management; Performance Monitoring and Reporting
− Environment and Infrastructure Management: Change Management and Control; Version Management and Control; Security Management; System Maintenance
− Application Deployment and Data Migration: Application Deployment; Test Management – System, Integration, UAT, UAT Support; Data Migration Management
− Technical Architecture: Infrastructure Architecture; Application and Tools Architecture; Data and Content Architecture; Integration Architecture
March 8, 2010 345
  • Benefits of DMCOE • Consistent infrastructure that reduces time to analyse and design and implement new IT solutions • Reduced data management costs through a consistent data architecture and data integration infrastructure - reduced complexity, redundancy, tool proliferation • Centralised repository of the organisation's data knowledge • Organisation-wide standard methodology and processes to develop and maintain data infrastructure and procedures • Increased data availability • Increased data quality March 8, 2010 346
  • Assessing Your Data Management Maturity March 8, 2010 347
  • Assessing Your Data Management Maturity • A Data Management Maturity Model is a measure of, and a process for determining, the level of maturity that exists within an organisation’s data management function • Provides a systematic framework for improving data management capability and identifying and prioritising opportunities, reducing cost and optimising the business value of data management investments • Measure data management maturity so that: − It can be tracked over time to measure improvements − It can be used to define projects for data management maturity improvements within cost, time, and return on investment constraints • Enables organisations to improve their data management function so that they can increase productivity, increase quality, decrease cost and decrease risk March 8, 2010 348
  • Data Management Maturity Model • Assesses data management maturity on a scale of 1 to 5 across a number of data management capabilities
− Level 1 – Initial: Data management is ad hoc and localised. Everybody has their own approach that is unique and not standardised, except for local initiatives.
− Level 2 – Repeatable and Reactive: Data management has become independent of the person or business unit administering it and is standardised.
− Level 3 – Defined and Standardised: Data management is fully documented, determined by subject matter experts and validated.
− Level 4 – Managed and Predictable: Data management results and outcomes are stored and proactively cross-related within and between business units. The data management function actively exploits the benefits of standardisation.
− Level 5 – Optimising and Innovating: As time, resources, technology, requirements and the business landscape change, the data management function can be easily and quickly adjusted to fit new needs and environments.
March 8, 2010 349
  • Maturity Level 1 - Initial • Data management processes are mostly disorganised and generally performed on an ad hoc or even chaotic basis • Data is considered as general purpose and is not viewed by either business or executive management to be a problem or a priority • Data is accessible but not always available and is not secure or auditable • No data management group and no one owns the responsibility for ensuring the quality, accuracy or integrity of the data • Data management (to the degree that it is done at all) is reliant on the efforts and competence of individuals • Data proliferates without control and the quality is inconsistent across the various business and application silos • Data exists in unconnected databases and spreadsheets using multiple formats and inconsistent definitions • Little data profiling or analysis and data is not considered or understood as a component of linked processes • No formal data quality processes and the processes that do exist are not repeatable because they are neither well defined nor well documented March 8, 2010 350
  • Maturity Level 2 - Repeatable and Reactive • Fundamental data management practices are established, defined, documented and can be repeated • Data policies for creation and change management exist, but still rely on individuals and are not institutionalised throughout the organisation • Data as a valuable asset is a concept understood by some, but senior management support is lacking and there is little organisational buy-in to the importance of an enterprise-wide approach to managing data • Data is stored locally and data quality is reactive to circumstances • Requirements are known and managed at the business unit and application level • Procurement is ad hoc based on individual needs and data duplication is mostly invisible • Data quality varies among business units and data failures occur on a cross-functional basis • Most data is integrated point-to-point and not across business units March 8, 2010 351
  • Maturity Level 3 - Defined and Standardised • Business analysts begin to control the data management process with IT playing a supporting role • Data is recognised as a business enabler and moves from an undervalued commodity to an enterprise asset, but there are still limited controls in place • Executive management appreciates and understands the role of data governance and commits resources to its management • A data administration function exists as a complement to the database administration function and data is present in both business and IT related development discussions • Some core data has a defined policy that is documented as part of the application development lifecycle; the policies are enforced to a limited extent and testing is performed to ensure that data quality requirements are being achieved • Data quality is not fully defined and there are multiple views of what constitutes quality • A metadata repository exists and a data group maintains corporate data definitions and business rules • A centralised platform for managing data is available at the group level and feeds analytical data marts • Data is available to business users and can be audited March 8, 2010 352
  • Maturity Level 4 - Managed and Predictable • Data is treated as a critical corporate asset and viewed as equivalent to other enterprise wide assets • Unified data governance strategy exists throughout the enterprise with executive level and CEO support • Data management objectives are reviewed by senior management • Business process interaction is completely documented and planning is centralised • Data quality control, integration and synchronisation are integral parts of all business processes • Content is monitored and corrected in real time to manage the reliability of the data manufacturing process and is based on the needs of customers, end users and the organisation as a whole • Data quality is understood in statistical terms and managed throughout the transactions lifecycle • Root cause analysis is well established and proactive steps are taken to prevent and not just correct data inconsistencies • A centralised metadata repository exists and all changes are synchronised • Data consistency is expected and achieved • Data platform is managed at the enterprise level and feeds all reference data repositories • Advanced platform tools are used to manage the metadata repository and all data transformation processes • Data quality and integration tools are standardised across the enterprise. March 8, 2010 353
  • Maturity Level 5 - Optimising and Innovating • The organisation is in continuous improvement mode • Process enhancements are managed through monitoring feedback and a quantitative understanding of the causes of data inconsistencies • Enterprise wide business intelligence is possible • Organisation is agile enough to respond to changing circumstances and evolving business objectives • Data is considered as the key resource for process improvement • Data requirements for all projects are defined and agreed prior to initiation • Development stresses the re-use of data and is synchronised with the procurement process • Process of data management is continuously being improved • Data quality (both monitoring and correction) is fully automated and adaptive • Uncontrolled data duplication is eliminated and controlled duplication must be justified • Governance is data driven and the organisation adopts a “test and learn” philosophy March 8, 2010 354
  • Data Management Maturity Evaluation - Key Capabilities and Maturity Levels • Evaluate each key data management capability against maturity levels 1 to 5, recording in each cell of the matrix a < Description of capability associated with maturity level >:
− Data Governance
− Data Architecture Management
− Data Development
− Data Operations Management
− Data Security Management
− Reference and Master Data Management
− Data Warehousing and Business Intelligence Management
− Document and Content Management
− Metadata Management
− Data Quality Management
March 8, 2010 355
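One simple way to use such a matrix is to record an agreed level per capability and roll the results up to an overall score and a priority area; the levels below are illustrative only:

```python
# Illustrative maturity assessment: agreed level (1-5) per data management capability.
assessment = {
    "Data Governance": 2,
    "Data Architecture Management": 3,
    "Data Development": 2,
    "Data Operations Management": 3,
    "Data Security Management": 4,
    "Reference and Master Data Management": 2,
    "Data Warehousing and Business Intelligence Management": 3,
    "Document and Content Management": 1,
    "Metadata Management": 2,
    "Data Quality Management": 2,
}

overall = sum(assessment.values()) / len(assessment)   # simple unweighted average
weakest = min(assessment, key=assessment.get)          # candidate improvement project
print(f"Overall maturity: {overall:.1f} / 5; priority area: {weakest}")
```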
  • More Information Alan McSweeney alan@alanmcsweeney.com March 8, 2010 356