Successfully Kickstarting Data Governance's Social Dynamics: Define, Collaborate, Validate




Learn how to launch your data governance program by answering three questions:
- What does my data mean: collect and manage business definitions and relations, taxonomies and classifications, business rules and ontologies;
- How can I involve all stakeholders: engage them across business units and geographies, with stewards, data owners, … in a guiding workflow;
- How do I operationalize data governance: link MDM, DQ and BI to the business, use business-driven semantic modelling, achieve end-to-end traceability. During this session we will use examples from different verticals: Finance, Government, Utilities, …

We discuss their main drivers for starting a Data Governance initiative, as well as the pragmatic approach they took in moving from a gradual roll-out to supporting and sustaining their Data Governance program.



Usage Rights

© All Rights Reserved

  • Define your business concepts, facts & rules as a “shared business language” in a clear and formal way, understood by both business and IT, using open standards; Manage roles & responsibilities using stewards & stakeholders, including the complexity of organizations and their constant change; Collaborate with all stakeholders to ensure usable data for all people and systems involved, across geographical and organizational boundaries; Validate data against business definitions & rules to ensure reliability and correctness.
  • By applying ICT, the physical information system is partly replaced by a computerised information subsystem. To define this technical part of the system, the designer observes the information system as-is, makes a conceptual interpretation of it, and represents this in terms of a set of software functionality specs and a data model. Language is essential to bridge reality and its modelling concepts, and is concerned with syntax, semantics, and pragmatics. In this example, the real-world personnel record is represented as a table structure that relates the concept employee to its name, address, etcetera. Application software provides the human actors with a graphical interface to access and manipulate the personnel database. We call this a closed information system: it is designed for the purpose of one organisation, the software requirements and functionality are known a priori, and the data model is agreed locally and refers to organisational concepts. Information systems usually suffer from a closed-world syndrome. They were designed on the naive assumption that they have already stored all possible facts about the domain; facts not in their database are presumed to be false. They also store data in a format only fully understood by the designer. Hence it is assumed there will never be a need for data exchange with other systems.
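The difference between the closed-world assumption and an open-world reading can be made concrete with a small query function. This is a minimal sketch, assuming facts are stored as hypothetical (subject, attribute, value) triples; the names `STORED_FACTS`, `holds_closed_world` and `holds_open_world` are illustrative only and not part of any product.

```python
from typing import Optional, Set, Tuple

Fact = Tuple[str, str, str]  # (subject, attribute, value): hypothetical triple shape

# Hypothetical contents of a closed personnel database.
STORED_FACTS: Set[Fact] = {
    ("alice", "works_in", "Brussels"),
    ("bob", "works_in", "Paris"),
}


def holds_closed_world(fact: Fact) -> bool:
    """Closed-world assumption: a fact that is not stored is presumed false."""
    return fact in STORED_FACTS


def holds_open_world(fact: Fact) -> Optional[bool]:
    """Open-world reading: a fact that is not stored is unknown (None), not false."""
    return True if fact in STORED_FACTS else None


question = ("alice", "speaks", "Dutch")
print(holds_closed_world(question))  # False: absence is read as falsehood
print(holds_open_world(question))    # None: absence only means "not known here"
```

Returning `None` rather than `False` is what lets another system, which may know more, contribute the missing fact later.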
  • For many years, information systems have been designed with a closed-world assumption. However, in today's information-centric economy it becomes increasingly important that information systems are able to communicate with each other. Consider the integration of the information systems of two HR departments of the same company, one in Brussels and one in Paris. In the underlying computerised information systems, the syntax and semantics of the data are different, as their designs were based on different assumptions. For example, the two data models use different labels to refer to the entities that describe the personnel information of employees. The designers have to align their data models and make this alignment explicit in order to integrate their systems.
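Making the alignment explicit can be as simple as mapping each local label to a shared concept name. The sketch below assumes hypothetical column labels for the Brussels and Paris systems and hypothetical shared concepts (`EmployeeName`, `EmployeeAddress`); none of these names come from the presentation.

```python
from typing import Dict

# Hypothetical column labels for the two HR systems and hypothetical shared concepts.
BRUSSELS_ALIGNMENT: Dict[str, str] = {"werknemer_naam": "EmployeeName", "adres": "EmployeeAddress"}
PARIS_ALIGNMENT: Dict[str, str] = {"nom_employe": "EmployeeName", "adresse": "EmployeeAddress"}


def to_shared_concepts(record: Dict[str, str], alignment: Dict[str, str]) -> Dict[str, str]:
    """Rename local fields to shared concept names.

    Fields without an explicit alignment raise an error instead of being
    silently dropped, so gaps in the alignment stay visible.
    """
    unaligned = sorted(set(record) - set(alignment))
    if unaligned:
        raise KeyError(f"no alignment defined for fields: {unaligned}")
    return {alignment[name]: value for name, value in record.items()}


brussels_row = {"werknemer_naam": "An Peeters", "adres": "Wetstraat 1"}
paris_row = {"nom_employe": "Marie Dupont", "adresse": "Rue de Rivoli 1"}
print(to_shared_concepts(brussels_row, BRUSSELS_ALIGNMENT))
print(to_shared_concepts(paris_row, PARIS_ALIGNMENT))
```

Both rows now use the same keys, so downstream code no longer depends on either local schema.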
  • All too often, people see MDM as a problem you solve once through a so-called “golden record” that integrates the underlying databases. MDM is perceived as a piece of technology you install, as opposed to a discipline that you need to pursue. This approach contradicts the inherent dynamics of the data space, where errors caused by manual input are not uncommon and new valuable facts about a business entity may show up every hour. Consequently, the chance of building a successful information governance platform that is scalable and sustainable over long periods of time on these premises alone is very low.
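Treating the golden record as a discipline means rebuilding it whenever new facts arrive, rather than computing it once. A minimal sketch, assuming one simple survivorship rule (most recent non-empty value wins per attribute); real MDM tools offer many other rules, and `SourceRecord` and `build_golden_record` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class SourceRecord:
    system: str                 # hypothetical source name, e.g. "CRM" or "ERP"
    updated: date               # when the source last changed this record
    attributes: Dict[str, str]


def build_golden_record(records: List[SourceRecord]) -> Dict[str, str]:
    """Survivorship rule assumed for this sketch: per attribute, keep the most
    recently updated non-empty value."""
    golden: Dict[str, str] = {}
    for record in sorted(records, key=lambda r: r.updated):
        for name, value in record.attributes.items():
            if value:  # newer non-empty values overwrite older ones
                golden[name] = value
    return golden


sources = [
    SourceRecord("CRM", date(2011, 1, 10), {"name": "ACME NV", "phone": "123"}),
    SourceRecord("ERP", date(2011, 3, 1), {"name": "Acme N.V.", "phone": ""}),
]
print(build_golden_record(sources))  # {'name': 'Acme N.V.', 'phone': '123'}

# A new fact arrives: the "golden" record is only a snapshot and must be rebuilt.
sources.append(SourceRecord("CRM", date(2011, 4, 6), {"phone": "456"}))
print(build_golden_record(sources))  # {'name': 'Acme N.V.', 'phone': '456'}
```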
  • Under an open-world assumption, the usage context and applications are not known beforehand. As communities evolve, new data processing requirements emerge. Because the underlying information systems must be combined dynamically, classic up-front integration is impossible. To establish semantic interoperability between these systems, we need an abstraction layer that refers to language-neutral and context-independent concept types. We call this an ontology. Usually, ontologies are managed by a select team of people with a technical background. However, the definition and evolution of an ontology should not be based on organisational assumptions, but on the shared and agreed needs of the community. Bridging this gap introduces many challenges.
  • The goal is to engage communities in the evolution of their ontologies. The basic principle of CBOE is the co-evolution of three first-class citizens of the community: the social interactions, the underlying information systems, and the ontology that establishes semantic interoperability. Therefore we must consider both social and technical aspects, and the gap in between. To bridge this gap, a viable approach must put into practice the activities needed to identify common needs from ad-hoc social interactions, and bring the community stakeholders together to reach an ontological agreement that supports these needs. We call this collaborative approach Business Semantics Management.
  • Current solutions fail to bridge this gap. So-called metadata management solutions operate more on the technical level and do not help business users enforce their rules and vocabularies. As a result, the management of business glossaries and of technical metadata is not aligned. Moreover, enterprises use several tools offered by different vendors, which are typically not compatible with each other. As a result, the meaning of data is governed by a Valhalla of technical-metadata walled gardens.
  • To empower data governance, one needs a full cycle that co-evolves business definitions with their technical metadata counterparts used for semantic interoperability.
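One way to picture this cycle is an explicit link table between business terms and the physical columns that implement them: the links give lineage, and the gaps feed the next round of definition work. This is a minimal sketch with a hypothetical glossary and hypothetical system, table and column names; it does not reflect the data model of any particular tool.

```python
from typing import Dict, List, Set, Tuple

Column = Tuple[str, str, str]  # (system, table, column)

# Hypothetical business glossary: term -> agreed business definition.
GLOSSARY: Dict[str, str] = {
    "Customer": "A Party that orders Product Items.",
    "Product Item": "A sellable unit in the product catalogue.",
}

# Hypothetical technical metadata harvested from two systems.
ALL_COLUMNS: Set[Column] = {
    ("CRM", "accounts", "cust_id"),
    ("ERP", "orders", "customer_no"),
    ("ERP", "orders", "item_code"),
}

# Explicit links from physical columns to the business terms they implement.
COLUMN_TO_TERM: Dict[Column, str] = {
    ("CRM", "accounts", "cust_id"): "Customer",
    ("ERP", "orders", "customer_no"): "Customer",
}


def lineage(term: str) -> List[Column]:
    """Every physical column that implements a given business term."""
    return sorted(col for col, linked in COLUMN_TO_TERM.items() if linked == term)


def gaps() -> Tuple[List[Column], List[str]]:
    """Columns with no business meaning and terms with no implementation;
    both feed the next iteration of the governance cycle."""
    orphan_columns = sorted(ALL_COLUMNS - set(COLUMN_TO_TERM))
    linked_terms = set(COLUMN_TO_TERM.values())
    unimplemented_terms = sorted(t for t in GLOSSARY if t not in linked_terms)
    return orphan_columns, unimplemented_terms


print(lineage("Customer"))
print(gaps())
```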
  • This feedback loop is implemented by a methodology that outlines the activities involved; in this way, data governance becomes a true discipline.
  • As the metamodel, we adopt SBVR (Semantics of Business Vocabulary and Business Rules), the OMG standard.
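To give a feel for what a vocabulary metamodel captures, the sketch below models noun concepts with a definition and an optional more general concept, read as "X is a category of Y". It only loosely borrows these SBVR notions; it is not an implementation of the OMG specification, and the class names and sample definitions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class NounConcept:
    term: str
    definition: str
    more_general: Optional[str] = None  # reads as "<term> is a category of <more_general>"


@dataclass
class Vocabulary:
    concepts: Dict[str, NounConcept] = field(default_factory=dict)

    def add(self, concept: NounConcept) -> None:
        self.concepts[concept.term] = concept

    def is_category_of(self, term: str, general: str) -> bool:
        """True if `term` equals `general` or specializes it, directly or
        through a chain of more general concepts (cycles are tolerated)."""
        seen = set()
        current: Optional[str] = term
        while current is not None and current not in seen:
            if current == general:
                return True
            seen.add(current)
            concept = self.concepts.get(current)
            current = concept.more_general if concept else None
        return False


vocab = Vocabulary()
vocab.add(NounConcept("Party", "A person or organization of interest to the business."))
vocab.add(NounConcept("Individual", "A natural person.", more_general="Party"))
vocab.add(NounConcept("Company", "A legal entity.", more_general="Party"))
print(vocab.is_category_of("Company", "Party"))       # True
print(vocab.is_category_of("Company", "Individual"))  # False
```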

Successfully Kickstarting Data Governance's Social Dynamics: Define, Collaborate, Validate Presentation Transcript

  • 1.
    • Successfully Kickstarting Data Governance's Social Dynamics:
    • Define, Collaborate, Validate
    • [email_address]
    • April 6th, 2011, Chicago
  • 2. It’s all about the context
  • 3. What would Google do?
  • 4. Did I take the wrong Gates to the web?
  • 5.  
  • 6. Let’s get more computing power
  • 7.  
  • 8.  
  • 9.  
  • 10. World View
    • Information is meaningful data
    • Information is today's currency
  • 11. Lack of context leads to unreliable data
    • A ‘Customer’ for Marketing is not the same as ‘Customer’ for Finance
    • Field ‘Customer’ in CRM is not equal to field ‘Customer’ in ERP
  • 12. Solution: Data Governance
    • Bringing business and IT together to govern data as an Enterprise Asset
    • What does my data mean?
    • How can I involve all stakeholders?
    • How do I operationalize DG?
  • 13. Positioning
  • 14. The Closed World Syndrome
    • requirements and functionality known and specific
    • data model agreed locally and refers to organisational concepts
    • usually cryptically stored in proprietary format (vendor lock-in), only understood by the designer
    • designed for the purpose of one organisation
    • all facts about the domain are already stored; facts not stored are presumed false
  • 15.  
  • 16. The Fairy Tale of the Everlasting Golden Record
    • the golden record is a single version of truth for a limited period of time
    • considered an install-once fix rather than a discipline to pursue
    • contradicts the inherent dynamics of online B2B communications
    • unscalable and unsustainable
    • need for governance to oversee
  • 17. Limits of Data Integration in the Extended Enterprise
    • users, usage context, and applications largely unknown a priori
    • ontologies refer to language-neutral, context-independent concepts agreed by the community
    • systems must be combined through interoperation
  • 18. Sounds familiar?
    • What does “Customer” mean?
    • “Customer” is a type of Party or Person that orders at least two Product Items per Year.
    • So “Customer” refers to a class with attributes Pname, Paddress, ...?
    • ...and a Party can either be an Individual or a Company...
    • Aha, and what types of Product Item exist?
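Once a definition such as "a Customer orders at least two Product Items per Year" is agreed, it can be executed against data. A minimal sketch, assuming orders arrive as hypothetical (party_id, year, number_of_product_items) tuples and that "at least two" means the total quantity across orders; counting distinct Product Items instead would be an equally valid reading that the business would have to settle.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Order = Tuple[str, int, int]  # hypothetical shape: (party_id, year, number_of_product_items)


def customers(orders: Iterable[Order], year: int, minimum_items: int = 2) -> Set[str]:
    """Parties that count as "Customer" in `year` under the definition
    "a Party that orders at least two Product Items per Year"."""
    items_per_party: Counter = Counter()
    for party, order_year, items in orders:
        if order_year == year:
            items_per_party[party] += items
    return {party for party, total in items_per_party.items() if total >= minimum_items}


orders = [("p1", 2010, 1), ("p1", 2010, 1), ("p2", 2010, 1), ("p3", 2011, 5)]
print(sorted(customers(orders, 2010)))  # ['p1']
```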
  • 19. Feed the metadata repository giant? A walled-garden Valhalla
  • 20. Empowering Information Governance
  • 21. Banking customer
    • Goal: reduce time needed to do end of year closing of general ledger from 40 to 10 days
    • Problem: takes too long because of manual reconciliation
    • Root cause: conflicting general ledger account taxonomies (apples and oranges)
    • Solution: build a shared business vocabulary
  • 22. Technology company
    • Goal: meaningful reporting on corporate level
    • Problem: inaccurate reporting (e.g., on “Customer Install Base”)
    • Root cause: lack of Governance across organizational boundaries and lack of business ownership of data
    • Solution: create common understanding, agreement and ownership on the business level
  • 23. Government
    • Goal: link data between different government bodies and agencies
    • Problem: integration is costly and painful
    • Root cause: bad quality data in various formats
    • Solution: effects of business rule changes can be tested and operationalized for data transformation and validation
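Testing the effect of a business rule change can start by running the old and new rule over the same sample and listing the records whose verdict flips. A minimal sketch; the postal-code rules and the `rule_change_impact` function are hypothetical examples, not the rules used in the government case.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Rule = Callable[[Record], bool]


def rule_change_impact(records: List[Record], old_rule: Rule, new_rule: Rule) -> Dict[str, List[int]]:
    """Indices of records whose validation verdict flips when a rule changes."""
    impact: Dict[str, List[int]] = {"newly_failing": [], "newly_passing": []}
    for index, record in enumerate(records):
        before, after = old_rule(record), new_rule(record)
        if before and not after:
            impact["newly_failing"].append(index)
        elif after and not before:
            impact["newly_passing"].append(index)
    return impact


# Hypothetical rule change: "postal code is filled in" becomes
# "postal code consists of exactly four digits".
def old_rule(r: Record) -> bool:
    return r.get("postal_code", "") != ""


def new_rule(r: Record) -> bool:
    code = r.get("postal_code", "")
    return len(code) == 4 and code.isdigit()


sample = [{"postal_code": "1000"}, {"postal_code": "75001"}, {"postal_code": ""}]
print(rule_change_impact(sample, old_rule, new_rule))
# {'newly_failing': [1], 'newly_passing': []}
```

Reviewing the flipped records before publishing the new rule shows stakeholders exactly which data the change would reject.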
  • 24. Utilities
    • Goal: obtain correct understanding of assets for reporting
    • Problem: registration has never been done sufficiently
    • Root cause: lack of clarity on “what things actually are”
    • Solution: governance organization
  • 25. So where do we start?
    • Each industry has its own specifics: start with what it provides
    • Each organization has its own specifics: start with what you already have
  • 26. Cyc Cyc is an artificial intelligence project that attempts to assemble a comprehensive ontology and knowledge base of everyday common sense knowledge, with the goal of enabling AI applications to perform human-like reasoning. Source:
  • 27. Cyc (in facts & figures)
    • Started in 1984 by Doug Lenat.
    • Name comes from the stressed syllable of 'en-cyc-lopedia'.
    • 70 million dollars and 700 person-years of work.
    • 600,000 concepts, defined by 2,000,000 axioms, organized in 6,000 microtheories.
    • But not enough applications to support continued research.
    • In 2004, the Cyc project was scaled back, and more emphasis was placed on developing applications.
    Source: John F. Sowa (1 September 2009)
  • 28. Wikipedia "an effort to create and distribute a free encyclopedia of the highest possible quality to every single person on the planet in their own language" Source:
  • 29. Wikipedia (in facts & figures)
    • Created in 2001 by Jimmy Wales and run by a non-profit organization, the Wikimedia Foundation.
    • More than 14,000,000 articles in more than 260 languages.
    • There are 11,062,835 registered users, including 1,702 administrators, while the Foundation employs fewer than 35 people.
    • Net loss of 49,000 editors in the first 3 months of 2009 versus a loss of 4,900 editors in the first 3 months of 2008... (WSJ, 23 November 2009)
  • 30. Too much freedom? Evolution evolving
    • December 3, 2001: initial version.
    • July 13, 2002: from controversial to commonly accepted in 2 hours.
    • October 1, 2002: debut of a biology grad student at Harvard, good for a total of 79 edits over 3 years.
    • August 9, 2004: black line indicates deletion as vandalism (half of all vandalism is corrected within 5 minutes).
    • March 29, 2005: longest point; discussion to reduce it to a neutral point of view.
    • September 19, 2005: edit war, with rollbacks rolled back several times.
    Source: IBM Watson Research
  • 31. Agile methodology
    • set up communities to reflect your organizational structure
    • determine roles and responsibilities to reconcile and validate business vocabularies and rules within these communities
    • define workflows and tasks to streamline this whole process around the clock
    • monitor completeness of terms and rules
    • analyze performance of contributors and tune accordingly
    • transparency about use and lineage of business vocabulary and rules in technical systems
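Monitoring the completeness of terms can start with a simple check of which required attributes each glossary entry is missing. A minimal sketch, assuming a hypothetical set of required attributes (definition, owner, steward) and a glossary held as plain dictionaries; an actual governance tool would define its own attributes and storage.

```python
from typing import Dict, List

# Hypothetical attributes that every governed term is expected to have.
REQUIRED_ATTRIBUTES = ("definition", "owner", "steward")


def missing_attributes(term: Dict[str, str]) -> List[str]:
    """Required attributes that are absent or empty for one term."""
    return [attr for attr in REQUIRED_ATTRIBUTES if not term.get(attr)]


def completeness(terms: List[Dict[str, str]]) -> float:
    """Share of terms with all required attributes filled in (0.0 for no terms)."""
    if not terms:
        return 0.0
    complete = sum(1 for term in terms if not missing_attributes(term))
    return complete / len(terms)


glossary = [
    {"name": "Customer", "definition": "A Party that orders Product Items", "owner": "Sales", "steward": "An"},
    {"name": "Party", "definition": "A person or organization", "owner": "", "steward": "Bo"},
]
print(completeness(glossary))           # 0.5
print(missing_attributes(glossary[1]))  # ['owner']
```

Tracking this ratio per community over time gives a simple signal for where stewards need to follow up.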
  • 32.  
  • 33.  
  • 34. Structuring communities
    • Based on the organizational chart (i.e., direct alignment with business units)
    • Based on functional division (e.g., Sales and Marketing, Finance)
    • Based on regional division (e.g., per country, region, …)
    • Based on Lines of Business
    • Based on existing Subject Areas
    • Can span different organizations or enterprises
    • Community entails ownership, which means “know thyself or thy customer” …
  • 35. Functional example
  • 36. Workflow and roles
    • Standard roles exist from various sources: thought leaders, DAMA, frameworks (e.g., steward, owner, council, …)
    • Various workflows are needed:
      • Intake, approval, publication, decommission
      • Notifications
      • Validation
      • Promote, demote, hire, fire members
      • Dependency validation
    • Every organization has its own roles and workflows: whatever works best for it
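The intake, approval, publication and decommission workflow can be sketched as a small state machine in which each transition is allowed only for certain roles. The status names and the role assignment below are hypothetical (the roles reuse steward, owner and council from the list above); every organization would configure its own.

```python
from typing import Dict, Set, Tuple

# Hypothetical lifecycle for a business term: (from, to) -> roles allowed to do it.
TRANSITIONS: Dict[Tuple[str, str], Set[str]] = {
    ("candidate", "under_review"): {"steward"},             # intake
    ("under_review", "candidate"): {"owner"},               # sent back for rework
    ("under_review", "approved"): {"owner"},                # approval
    ("approved", "published"): {"steward"},                 # publication
    ("published", "decommissioned"): {"owner", "council"},  # decommission
}


def transition(status: str, target: str, role: str) -> str:
    """Return the new status if `role` may move a term from `status` to `target`."""
    allowed = TRANSITIONS.get((status, target))
    if allowed is None:
        raise ValueError(f"no transition from {status!r} to {target!r}")
    if role not in allowed:
        raise PermissionError(f"{role!r} may not move a term from {status!r} to {target!r}")
    return target


status = "candidate"
status = transition(status, "under_review", "steward")
status = transition(status, "approved", "owner")
status = transition(status, "published", "steward")
print(status)  # published
```

Keeping the transitions in data rather than code makes it easy to adapt the workflow per community.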
  • 37.  
  • 38. Stakeholder performance
  • 39. Stakeholder performance
  • 40. Stakeholder performance
  • 41. If airplanes were like your systems… Courtesy of Poppy Quintal (see
  • 42. … would you still board them? Courtesy of Poppy Quintal (see
  • 43. How?
    • Results: faster and more efficient handling of tasks, fewer errors, less accountability, more precision, easier communication, ...
    • Control over complexity
    • Combination of a limited vocabulary (about 1000) with a larger, unlimited set of more technical terms, rules and standards
    Courtesy of Poppy Quintal
  • 44. Conclusions
    • Existing tools are insufficient for handling complexity in the information age:
      • There is little focus on the sought-after business audience.
    • Semantic technology is available: it helps understand what our data means
      • Agreed, clear and formal meaning, ready for use in systems.
    • Data governance: to keep data understood, we need to “run it right”:
      • Combination of technology, organisation, methodology, and culture.
  • 45. Thank you
    • Questions & Feedback?
    • Read and watch more:
    • Website:
    • Thought leader sessions: sessions
    • Blog:
    • Twitter: @collibra
    • E-mail: [email_address]