Information is at the heart of all architecture disciplines & why Conceptual data modeling is essential
 


Information is at the heart of all of the architecture disciplines, such as Business Architecture and Applications Architecture, and Conceptual Data Modelling underpins them.
Also, data modelling, which helps inform this, has been wrongly taught in many universities as being just for database design.


    • ENTERPRISE ARCHITECTS WHITEPAPER INFORMATION IS AT THE HEART OF ALL ARCHITECTURE DISCIPLINES AND THERE’S MORE TO DATA MODELLING THAN YOU EVER THOUGHT
    • INFORMATION IS AT THE HEART OF ALL ARCHITECTURE DISCIPLINES
      DOC DATE: MARCH 2014
      PREPARED BY: CHRISTOPHER BRADLEY, CHIEF INFORMATION ARCHITECT & ENTERPRISE SERVICES DIRECTOR
      VERSION: V1.0.0
      ENTERPRISE ARCHITECTS © 2014, ALL RIGHTS RESERVED
    • CONTENTS
      › DATA MODELLING IS A CRITICAL TECHNIQUE AND AT THE HEART OF ALL ARCHITECTURE DISCIPLINES
      › DATA MODELLING INTRODUCTION
      › BACKGROUND & HISTORY
      › DIFFERENT TYPES OF MODELS FOR DIFFERENT PURPOSES AND AUDIENCES
      › DATA MODELLING FOR DBMS DEVELOPMENT
      › DATA MODELLING INCORRECTLY TAUGHT AT UNIVERSITY
      › WHAT NEEDS TO CHANGE?
      › MODELLING FOR THE “NEW” TECHNOLOGIES
      › DEMONSTRATING BENEFITS
      › THE GREATEST CHANGE REQUIRED
      › WHAT NEEDS TO STAY THE SAME?
    • DATA MODELLING IS… A CRITICAL TECHNIQUE AND AT THE HEART OF ALL ARCHITECTURE DISCIPLINES

      Many years ago people believed the world was flat and that if they sailed over the horizon they would fall off the edge. They also believed that the Earth was at the centre of the heavens, and that all other planets orbited around it. But they were wrong. People who believe Data Modelling is just for DBMS design are just as misinformed.

      Data Modelling, particularly Conceptual Data Modelling, is an absolutely critical technique and is at the heart of all architecture disciplines. Here’s why: since data has to be understood to be managed, it stands to reason that gaining agreement on the meaning and definition of concepts will be a key component. That is precisely what a data model provides.

      But just what do I mean when I state that Data Modelling is at the heart of all architecture disciplines?

      FIGURE 1: Data Modelling is at the heart of all architecture disciplines

      At its heart, the Data Model provides the unifying language, the lingua franca, the common vocabulary upon which everything else is based. Other modelling techniques within the complementary architecture disciplines will interact with each other, forming a supportive, cross-checked, integrated and validated set of techniques. It’s not just (sometimes it’s never) about technical DBMS design.
    • SO TO ILLUSTRATE THE CASE WITH A FEW SIMPLE EXAMPLES, WE SEE IN:

      The Business Architecture Domain: A Project Charter documents the rationale, the objectives, the business scope, and the measures of success of the project. It uses the language of a high level data model to describe the business concepts.

      The Process Architecture Domain: A Workflow Model describes the sequence of steps carried out by the actors involved in the process.

      The Application & Systems Architecture Domain: A Use Case describes how an actor completes a step in the process by interacting with a system to obtain a service. A Service Specification describes some form of business service that is initiated to complete a business event.

      The Information Architecture Domain: A Data Model depicts the critical data items, and the attributes or facts about them. This is important data that the organisation wishes to know or store information on, and is the stuff that the processes and systems act on.

      Every type of model references the entities of significance in the conceptual data model, showing why conceptual data modelling is such a vital technique.

      Getting agreement on the language and definition of the data concepts must always occur first; once established, detail about processes can be added:
      › To begin we discover the Nouns, i.e. the items of interest to the organisation, e.g. “Product”, “Customer”, “Location”.
      › Next we discover “Verb – Noun” pairs: these are activities that must be performed, such as processes and sub-processes, in order for the organisation to operate, e.g. “Design Product”, “Ship Order”.
      › Lastly we discover “Actor – Verb – Noun” combinations: these form the Use Cases or steps within a business process, e.g. “Lead Architect Designs New Product”.

      At this high level, we are seeking to gain an understanding and agreement on terms and vocabulary for the data concepts. We do not want to get bogged down in the excruciating detail that a detailed logical model would take us into. Thus high level conceptual models (often called Business Data Models) are the appropriate vehicle to use here. It can be loosely argued that they provide some of the features of an “ontology”, i.e. business concepts and their relationships, although a Conceptual Data Model with its metadata extensions provides much more.
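The Noun, “Verb – Noun” and “Actor – Verb – Noun” progression above can be sketched in a few lines of Python. This is only an illustration, not a modelling tool: the class names, the `unknown_nouns` helper, the “Order” concept and the “Clerk Raises Invoice” use case are all assumptions made for the example. The sketch shows one thing: every activity and use case should reference a noun that the conceptual model has already defined.

```python
from dataclasses import dataclass

# Hypothetical agreed vocabulary (the "Nouns"); "Order" is assumed here
# because the "Ship Order" example needs it.
CONCEPTS = {"Product", "Customer", "Location", "Order"}


@dataclass(frozen=True)
class Activity:
    """A "Verb - Noun" pair, e.g. Design Product."""
    verb: str
    noun: str


@dataclass(frozen=True)
class UseCase:
    """An "Actor - Verb - Noun" combination, e.g. Lead Architect Designs Product."""
    actor: str
    activity: Activity


def unknown_nouns(items, concepts=CONCEPTS):
    """Return the nouns used by activities or use cases that the
    conceptual model has not defined yet."""
    missing = set()
    for item in items:
        activity = item.activity if isinstance(item, UseCase) else item
        if activity.noun not in concepts:
            missing.add(activity.noun)
    return sorted(missing)


activities = [Activity("Design", "Product"), Activity("Ship", "Order")]
use_cases = [
    UseCase("Lead Architect", Activity("Design", "Product")),
    UseCase("Clerk", Activity("Raise", "Invoice")),  # hypothetical, not yet agreed
]

print(unknown_nouns(activities + use_cases))
```

A check like this flags “Invoice” as a term that still needs an agreed definition before process detail is layered on top of it.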
    • DATA MODELLING INTRODUCTION

      The problem for many Data Architects is that “Data Modelling” has, in far too many companies, received a lot of bad press. Have you heard any of these?
      › “It just gets in the way”
      › “It takes too much time”
      › “What’s the point of it?”
      › “It’s not relevant in today’s systems landscape”
      › “I don’t need to do modelling, the package has it all”

      Yet when Data Modelling first came onto the radar in the mid-1970s the potential was enormous. We were told we would realise benefits such as:
      › “a single consistent definition of data”
      › “master data records of reference”
      › “reduced development time”
      › “improved data quality”
      › “impact analysis”
      to name but a few.

      Do organisations today want to reap these benefits? You bet, it’s a no-brainer. So why is it that now, 30+ years on, we see in many organisations that the benefits of Data Modelling still need to be “sold”, and in others the big benefits simply fail to be delivered? What’s happened? What needs to change? As with most things, a look back into the past is a good place to start.
    • BACKGROUND & HISTORY

      Looking back into the history of data management, we see a number of key eras.

      1950s to 1970s: Information Technology (at that time often called Automated Data Processing (ADP)) was starting to enter the mainstream world of commerce. During this period we saw the introduction of the first database management systems such as DL1, IMS, IDMS and TOTAL. Who can remember a DBMS that could be implemented entirely on tapes? ** At that time the cost of disc storage was exceptionally high, and the notion of exchangeable disc packs was just coming into the data centre. The concept of “database” operations came into play and the early mentions of “corporate central databases” appeared.
      ** It was IMS HISAM if you really want to know.

      1970 to 1990: Data was “discovered”. Early mentions of managing data “as an asset” were seen and the concepts of Data Requirements Analysis and Data Modelling were introduced.

      1990 to 2000: The “Enterprise” became flavour of the decade. We saw Enterprise Data Management Coordination, Enterprise Data Integration, Enterprise Data Stewardship and Enterprise Data Use. An important change began to happen in this period: there was a dawning realisation that “technology” alone wasn’t the answer to many of the information issues, and we started to see Data Governance being talked about seriously.

      2000 and beyond: Data Quality, Data as a Service, Data Security & Compliance, Data Virtualisation, Service Oriented Architecture (SOA), governance and alignment with the business were (and still are) the data management challenges of this period. All of this needs to be undertaken in rapidly changing times when we have a “new” view of information: Web 2.0, Blogs, Mash-ups, Data Virtualisation. It seems anyone can create data! At the same time we have a greater dependence on “packaged” or COTS (Commercial off the shelf) applications such as the major ERPs. There is also more and more use of SOA, XML and business intelligence, and less reliance on traditional “bespoke” development.
    • NOTICE I SNEAKED IN “MASH-UPS” (OR WEB APPLICATION HYBRID) THERE?

      See the Wiki article for more on mash-ups. There are many powerful facilities available now that enable you to create your own mash-ups. Make no mistake, these are now becoming the new “Shadow IT” of this decade. Remember the home grown departmental Excel macros of the 90s and onwards that became “critical” to parts of the business? Now mash-ups are doing the same thing.

      But just who is looking at the data definitions, the data standards, applicability etc.? Certainly not the data management group, because frequently they don’t even know that these functions are being built in departmental silos, and anyway the “data team” is pigeonholed as being only involved in DBMS development.

      That leads us on to examine the belief that many people (too many, unfortunately) still have that Data Modelling is only for DBMS development. So why is that? Firstly we’ll look at Data Modelling for use in DBMS development.
    • DIFFERENT TYPES OF MODELS FOR DIFFERENT PURPOSES AND AUDIENCES

      In its early days Data Modelling was mostly (almost exclusively) what we now call Logical and/or Physical Data Modelling, and it was primarily aimed at DBMS development. However, there are many different levels of “Data Models” that can be developed, and they each have a different purpose and audience:

      FIGURE 2: Levels of Data Models

      From Figure 2 above, we see there are many different levels of “Data Models”. The higher up the pyramid we go, the more “communication” focused the models are, whereas the further down the pyramid we go, the more “implementation” focused the models are. Frequently, a higher level model is created with the sole purpose of improving communication and understanding.

      FIGURE 3: Purpose of data model levels
    • DATA MODELLING FOR DBMS DEVELOPMENT

      In its early days data modelling was primarily aimed at DBMS development. We’ll have a look at the two main techniques in a moment. Just to illustrate this we can look at 4 typical roles that may be considered as “customers” of the data modelling output:

      The Enterprise data customer: This might be at Director or CxO level. The accuracy of data is critical, they are report users, and the data “products” that data professionals produce are key to serving the needs of this level of user.

      The Data Architect: This person knows the business and its rules. He/she manages knowledge about the data and defines the conceptual direction and requirements for capturing of data.

      The DBA: This person is production oriented, and manages data storage and the performance of databases. He/she also plans and manages data movement strategies and plays a major part in data architecture by working with architects to help optimise and implement their designs in databases.

      The developer DBA: This role works closely with the development teams and is focused on DBMS development. They frequently move and transform data, often writing scripts and ETL to accomplish this.

      Data models (more accurately the metadata) were (and are) seen as the glue or the lingua franca for integrating IT roles through the DBMS development lifecycle. All of the roles above depend on metadata from at least one of the other roles.

      What then are the steps for developing a DBMS and utilising Data Models? Firstly a word of warning: this could be the subject of a huge paper in its own right, but I’ll try and summarise it simply here.
    • THERE ARE TWO “MAIN” APPROACHES TO CREATING DBMSs FROM MODELS

      One is the “top down” or “to-be” approach and the other is termed the “bottom-up” or “as-is” approach.

      TOP DOWN (TO-BE) APPROACH

      STEP 1: When speaking with business representatives, discover and document the business requirements, before agreeing on a high-level scope. The output is typically some form of Business Requirements Document (BRD). This will give an understanding, at a high level, of where the data is used by business processes, and vice versa.

      STEP 2: Create a more detailed business requirement document with subscriber data requirements, business process and business rules.

      STEP 3: Understand and document the business keys, attributes and definitions from business subject matter experts. From this create and continually refine a logical data model. Determine what the master entities are and what is common to other business areas.

      STEP 4: Verify the logical data model with the stakeholders. Walk a number of major business use cases through and refine the model. Apply the technical design rules with knowledge of the technical environment on which you are going to implement the solution, use known volumetric and performance criteria, and create a first cut physical data model. Remember the same logical model could be implemented in different ways upon varying technology platforms.

      STEP 5: Generate the Data Definition Language (DDL) from the physical model. Refine the physical design with DBA support and implement the DBMS using the refined physical model.

      This top down approach has the advantage that the “New” or “To-Be” business and data requirements are the main priority. In the early days there were not many “existing systems” to consider, which was just as well, because the approach doesn’t take into account any of the hidden nuances and rules that may be deep down within the existing systems.
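Step 5 of the top down approach, generating DDL from the physical model, can be illustrated with a minimal Python sketch. Commercial modelling tools do far more than this. The `PHYSICAL_MODEL` dictionary, the table and column names, and the `generate_ddl` helper are all hypothetical, and SQLite stands in for whatever platform was chosen in Step 4.

```python
import sqlite3

# Hypothetical first-cut physical model:
# table name -> list of (column name, data type, constraint).
PHYSICAL_MODEL = {
    "customer": [
        ("customer_id", "INTEGER", "PRIMARY KEY"),
        ("name", "TEXT", "NOT NULL"),
    ],
    "customer_order": [
        ("order_id", "INTEGER", "PRIMARY KEY"),
        ("customer_id", "INTEGER", "NOT NULL REFERENCES customer(customer_id)"),
    ],
}


def generate_ddl(model):
    """Turn the physical model into one CREATE TABLE statement per table."""
    statements = []
    for table, columns in model.items():
        column_sql = ",\n  ".join(
            " ".join(part for part in column if part) for column in columns
        )
        statements.append(f"CREATE TABLE {table} (\n  {column_sql}\n);")
    return statements


ddl = generate_ddl(PHYSICAL_MODEL)

# Implement the schema on the target platform (an in-memory SQLite database here).
conn = sqlite3.connect(":memory:")
for statement in ddl:
    conn.execute(statement)

print("\n\n".join(ddl))
```

The same logical model could be rendered with different types and constraints for another platform by changing only the physical model dictionary, which is the point made in Step 4.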
    • BOTTOM UP (AS-IS) APPROACH

      The primary purpose of the Bottom Up (or As-Is) Approach is to create a model of an existing system into which the new requirements can be added. Frequently, the bottom-up approach is used because a model of the current system simply doesn’t exist, often because the system has evolved and/or the original design staff have retired, died, or moved on and the documentation has not been kept up to date.

      The main steps in the bottom-up approach are:

      STEP 1: Reverse engineer the database or file schema from the system that is already implemented. From this you will have the database catalog, table, column and index names etc. Of course these will all be in “tech” language without any business definitions.

      STEP 2: Profile the real data by browsing and analysing the data from the tables. Scan through the ETLs to find any hidden relationships and constraints. Modern data profiling tools are invaluable here as they will allow you to gain real insight into the data, way beyond simply trying to understand it from the column names. You did know that SpareField6 really has the alternative delivery location?

      STEP 3: Find out foreign key relationships between tables from IT subject matter experts, and verify the findings. The typical output here is a refined physical model.

      STEP 4: Document the meanings of columns and tables from IT subject matter experts.

      STEP 5: Try to understand the business meaning of probable attributes and entities that may be candidates for the logical data model. From here the result is a “near logical” model.

      The bottom up approach is great for capturing those hidden “gotchas” that are tucked away inside the current system. However, it doesn’t give any serious attention to new requirements. Thus a third way is a hybrid of these two approaches, frequently called the “Middle Out” Approach. The Middle Out Approach employs the best parts of the Top-Down and Bottom-Up Approaches. This is the approach I favour when designing a new model, as the result is likely to have a better chance of ultimately being used for a technology solution.
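Steps 1 and 2 of the bottom-up approach (reverse engineering the catalog, then profiling the real data) can be sketched as follows. This is a toy illustration using Python’s built-in `sqlite3` module rather than a real profiling tool. The `CUSTMAST` table, its rows, and the `reverse_engineer` and `profile_column` helpers are assumptions for the example; only `SpareField6` comes from the text.

```python
import sqlite3

# A stand-in "legacy" system with a hypothetical table and invented data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTMAST (CustNo INTEGER, CustName TEXT, SpareField6 TEXT)")
conn.executemany(
    "INSERT INTO CUSTMAST VALUES (?, ?, ?)",
    [(1, "Acme", "Dock 4, Leeds"), (2, "Bolt", None), (3, "Crane", "Unit 9, York")],
)


def reverse_engineer(conn, table):
    """Step 1: read column names and types from the database catalog.
    PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)."""
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]


def profile_column(conn, table, column):
    """Step 2: look at the real values, not just the column name."""
    rows, non_null, distinct = conn.execute(
        f"SELECT COUNT(*), COUNT({column}), COUNT(DISTINCT {column}) FROM {table}"
    ).fetchone()
    samples = [
        r[0]
        for r in conn.execute(
            f"SELECT {column} FROM {table} WHERE {column} IS NOT NULL "
            f"ORDER BY {column} LIMIT 3"
        )
    ]
    return {"rows": rows, "non_null": non_null, "distinct": distinct, "samples": samples}


print(reverse_engineer(conn, "CUSTMAST"))
print(profile_column(conn, "CUSTMAST", "SpareField6"))
```

The catalog alone says only that `SpareField6` is text; the sampled values are what hint that it holds a delivery location, which then has to be confirmed with subject matter experts in Steps 3 and 4.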
    • DATA MODELLING INCORRECTLY TAUGHT AT UNIVERSITY

      Over the past 10 years or so I have been taken aback at what I have observed regarding the way in which Data Modelling is portrayed on courses at many Universities in the UK and USA (and I suspect in other places too). As part of my DAMA-I education brief (and to be honest as a way of giving something back to the community) I am frequently asked to speak not just at conferences but with academic institutions.

      Here are a few snippets I have pulled from 5 separate universities recently regarding data modelling on the Computer Science Bachelors & Masters courses:
      › “The purpose of a Data model is to design a relational database system”
      › “An ER Model is used to specify design and document Database design”
      › “A Data model is a pictorial representation of the structure of a relational database system”
      › “… it is a description of the objects represented by a computer system together with their properties and relationships”
      › “ER Modelling is a Database design method”

      At one of these I dug deeper and examined several of the course assignments. One assignment asked students to prepare a model to represent an office environment, and part of the detailed description within the assignment brief mentioned the “Rolodex” and “IBM Selectric” that were on the desks in this office. Now, I’m not talking here of reading an assignment paper set for a course in 1975; this was one I saw in 2013!

      With all of the uses of Data Models that I have described so far, the history of Data Modelling, the way it’s still being taught in some Universities, and much of the literature from the Data Modelling tool vendors themselves, it is not surprising that many people are left with the impression that data modelling is just for DBMSs. But this is wrong!
    • WHAT NEEDS TO CHANGE?

      The use and benefit of Data Modelling is considerably greater than its current “one trick pony” press would suggest. To make Data Modelling relevant for today’s systems landscape we must show that it’s relevant for the “new” technologies such as:
      › ERP packages
      › SOA & XML
      › Business Intelligence
      › Data Lineage
      › Data Virtualisation

      We must also not forget that a Data Model at the appropriate level is an awesome communication tool, so it can be used for communicating with the business. See also “Data Modelling For The Business – A Handbook for aligning the business with IT using high-level Data Models”; Technics Publishing; ISBN 978-0-9771400-7-7.

      We also need to break away from the “you must read my detailed Data Model” mentality and make the information available in a format users can readily understand. For example, this means that Data Architects need to recognise the different motivations of their users and re-purpose the model for the audience: don’t show a business user a Data Model! Information should be updated instantaneously, and we must make it easy for users to give feedback; after all, you’ll achieve common definitions quicker that way.

      We need to recognise the real world commercial climate that we’re working in and break away from arcane academic arguments about notations, methodologies and the like. If we want to have Data Modelling play a real part in our business then it’s up to us to demonstrate and communicate the genuine benefits that can be realised. Remember, Data Modelling isn’t a belief system: just because you “get it”, don’t assume that the next person does.
    • MODELLING FOR THE “NEW” TECHNOLOGIES

      I feel I must make a confession here. The technologies are not really all that new! It’s just that “traditionally” Data Modelling has not been seen as being relevant to these areas. To break out of this “modelling is a one trick pony” view we need to show how and why Data Modelling IS relevant for today’s varied IT landscape. Therefore we must show that it’s relevant for the “new(er)” technologies such as:
      › ERP packages
      › SOA & XML
      › Business Intelligence
      › Data Lineage
      › Data Virtualisation
      › Communicating with the business

      ERP PACKAGES

      A data model will assist in both package configuration and fitness for purpose evaluation.

      As data architects faced with projects that are embarking upon the introduction of a major ERP package, have you ever heard the cry: “We don’t need a data model, the package has it all”? But does it? Is data part of your business requirement? Of course it is. So just how do you know whether the package meets your overall business data requirements? You did assess the data component when doing your fitness for purpose evaluation, didn’t you?

      How can you assess whether the ERP package has data structures, definitions and meanings that are compatible with your legacy systems? Again, a good Data Model will assist with this. What about data integration, legacy data take on and master data integration: how can these readily be accomplished? You guessed it, a Data Model can help here too.

      The critics say that modelling isn’t needed for ERP packages. But that’s because they are wedded to the old-world view that modelling is only used for DBMS development. It’s not. In this case, when we are implementing ERP systems, the model will NOT be required to generate a DBMS from; however, for all of the other aspects described above it IS invaluable.
    • SO WHAT’S THE PROBLEM?

      Why can’t we just point our favourite Data Modelling tool at the underlying DBMS of the package? Simply put, for the most part the problem is that the Database System Catalog does not hold useful metadata. Several well-known ERP systems do not hold any Primary Key (PK) or Foreign Key (FK) constraints in the database itself. This knowledge is held only within their application layer. Anything resembling a “Logical View” of the data, together with the definitions, is held within the proprietary ERP Data Dictionary.

      FIGURE 4: Part of an ERP reverse engineered directly from the DBMS
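To see why reverse engineering from the catalog gives so little in this situation, consider a small Python sketch. It uses SQLite as a stand-in for the package’s DBMS. The tables `ZTAB01` and `ZTAB02`, their columns, and the `declared_keys` helper are invented for illustration and do not correspond to any real ERP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical package tables: cryptic names and no declared keys,
# because the relationships live only in the application layer.
conn.execute("CREATE TABLE ZTAB01 (F1 TEXT, F2 TEXT)")
conn.execute("CREATE TABLE ZTAB02 (F1 TEXT, F9 TEXT)")


def declared_keys(conn, table):
    """Collect the key metadata the database catalog actually knows about."""
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    primary_key = [r[1] for r in conn.execute(f"PRAGMA table_info({table})") if r[5] > 0]
    # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    foreign_keys = [
        (r[3], r[2], r[4]) for r in conn.execute(f"PRAGMA foreign_key_list({table})")
    ]
    return {"primary_key": primary_key, "foreign_keys": foreign_keys}


for table in ("ZTAB01", "ZTAB02"):
    print(table, declared_keys(conn, table))
```

With nothing declared, a modelling tool pointed at this catalog can only draw disconnected boxes, which is the kind of result Figure 4 refers to.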
    • WHAT WE REALLY NEED

      What we really need is to be able to get the ERP metadata into a useful format similar to that shown in Figure 5 below.

      FIGURE 5: Useful model from an ERP

      HOW CAN WE DO THAT?

      Well, there isn’t space in this article to go into the detail, and much of it varies from ERP to ERP. However, with SAP for example, there is an independently available metadata extraction facility called SAPHIR. Additionally, you can validate a model created from SAPHIR by examining key screen items, such as in the example illustrated below.

      FIGURE 6: Validating an ERP model from transaction screens
    • SUMMARY: WHY DEVELOP DATA MODELS FOR PACKAGE IMPLEMENTATION

      So why do we need to bother undertaking Data Modelling when implementing an ERP system?
      1. For requirements gathering. If your business data is part of your requirement, you need to model it.
      2. For a fit for purpose evaluation. Surely you must have evaluated the suitability of the package before deciding to implement it?
      3. For gap analysis. Even if you are told “it’s a done deal, we are going with package X”, the Data Model will give you rich insight into gaps in key areas of functionality. I have used this many times with clients when implementing major well known packages to help spot areas where a workaround or manual implementation will be required.
      4. For configuration. Using models as a communication vehicle to demonstrate use cases is invaluable. From these the many options in the ERP system can be examined and then configured with confidence.
      5. For legacy data migration and take on.
      6. For master data alignment. The ERP may have its own master data sets. You can use the model to ensure correct alignment of these with your corporate master data initiative. Don’t fall into the trap of letting the tail wag the dog!
      7. Fundamentally, and this is the key one, it’s all about ensuring that your ERP data can integrate within your overall Information Architecture.

      SOA AND XML

      I don’t intend to give a detailed exposition on the subject of SOA; however, it’s worth reminding ourselves of the fundamental components in the architecture.

      The Bus in SOA is a “conceptual” construct, which helps to get away from point to point thinking. One approach for integrating applications via “a bus” is to use Message Oriented Middleware (MOM).

      A Message Broker is a dispatcher of messages and comes in many varieties. The broker operates upon a queue of messages within the routing table.

      Adapters are where the different technology worlds are translated, e.g. UNIX, Windows, OS/390 and so on.

      Fundamentally, SOA is built upon a message based set of interactions, i.e. all interaction between components is through messages. These are generally XML messages, so it is true to say that XML is at the core of SOA.

    • BUT THERE IS A POTENTIAL PROBLEM

      XML is a hierarchical structure (just like in the good old days of IMS & DL1), but the real world of data is not.

      FIGURE 7: Book example

      LET’S ILLUSTRATE

      Let’s illustrate with a real-world example: a book. Looking at Figure 7, we see that this book is entitled “Data Modeling For The Business”. When we look at this example we see data such as Title, Author(s), ISBN, Price, Publisher, Amazon URL and so on. Looking at the authors (myself, Steve & Donna), there is also some information (on the back cover) relating to each of us.

      We can develop a Data Model to represent this “real world” data and show it in an Entity Relationship (ER) format. Typically these ER models can represent real world data pretty accurately. Figure 8 shows an example ER model for the “book authoring” data subject area. A few of the business assertions that this Data Model makes are that:
      › A book can be written (authored) by at least one and possibly several writers (in this case me, Steve and Donna).
      › A writer may be the author of many books (e.g. Steve has also written “Data Modeling Made Simple”).
      › Thus Book <> Writer is a many to many relationship. However, the intersection entity is a real world concept: it’s the “Book Authorship” entity, and this is shown in Figure 8.
    • ER MODEL

      FIGURE 8: Book example ER model

      XML MODEL

      When we want to use the data in this model within an XML based system, we have to remember that XML messages are hierarchic; that is, a child entity can only have one parent entity, whereas an entity relationship (ER) model allows a child entity to have several parent entities. Thus we need to do something to turn the ER model representation into a hierarchic XML representation. To accomplish this we need to decide whether to make “Book” the parent of Book Authorship or to choose “Writer” to be the parent. In Figure 9 below, the resultant XML model has been created after choosing Book as the parent.

      FIGURE 9: Book XML model
    • XML MESSAGE

      Whilst simplistic (for the sake of the example), the XML model in Figure 9 now represents the XML schema we’re going to use. Within our SOA based system, we may have a transaction which utilises an XML message called “Book Details”. Figure 10 below shows how an XML message has been created from the XML schema, and is utilised (in the message queue) in our SOA solution.

      FIGURE 10: Book details XML message

      So clearly, Data Modelling IS a key component required in a SOA implementation. It’s somewhat ironic that this “new” SOA concept, and the representation of data in a hierarchic form (i.e. in XML messages), draws heavily on the approaches we had to employ when designing a database schema for IMS and DL1, which were hierarchic DBMSs!
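The choice of parent when flattening the many to many Book / Writer relationship into a hierarchy can be sketched with Python’s standard `xml.etree.ElementTree` module. The figures’ actual schema is not reproduced in this transcript, so the element names, the identifiers and the `book_details` helper below are assumptions for illustration. The titles and first names come from the text.

```python
import xml.etree.ElementTree as ET

# ER-style data: two entities plus the "Book Authorship" intersection entity.
books = {
    "B1": {"title": "Data Modeling For The Business"},
    "B2": {"title": "Data Modeling Made Simple"},
}
writers = {
    "W1": {"name": "Christopher"},
    "W2": {"name": "Steve"},
    "W3": {"name": "Donna"},
}
authorships = [("B1", "W1"), ("B1", "W2"), ("B1", "W3"), ("B2", "W2")]


def book_details(book_id):
    """Build a hierarchic message with Book chosen as the parent.
    Each Book Authorship, and the Writer it points to, becomes a child of Book."""
    book = ET.Element("Book", id=book_id)
    ET.SubElement(book, "Title").text = books[book_id]["title"]
    for authored_book, writer_id in authorships:
        if authored_book == book_id:
            authorship = ET.SubElement(book, "BookAuthorship")
            ET.SubElement(authorship, "Writer", id=writer_id).text = writers[writer_id]["name"]
    return book


message = book_details("B1")
print(ET.tostring(message, encoding="unicode"))
```

With Book as the parent, a writer with several books (Steve here) is repeated inside each book’s message. Choosing Writer as the parent would repeat the book data instead. Neither hierarchy removes the many to many relationship; it only picks a direction from which to view it.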
    • 23 | ENTERPRISE ARCHITECTS © 2014, ALL RIGHTS RESERVED BUSINESS INTELLIGENCE When looking at Business Intelligence and Data Warehouses, we are trying to ensure that the data utilised by the business for their queries and reports is reliable. In order to accomplish this, not only do we need to manage the data that the business utilises, but also the metadata. We all know by now that much of this metadata is contained within the data models. So, what are the main reasons for managing this model metadata? 1. Reduce Cost: In addition to all the other points below, the goal here is to reduce the overall cost of managing a significant part of the IT infrastructure. Managing metadata helps automate processes, reduce costly mistakes of creating redundant/non- conformant data, and reduce the length of time to change systems according to business needs. 2. Higher Data Quality: Without proper management, the same type of data may be managed differently in the places it is used and degrade its quality/accuracy. 3. Simplified Integration: If data is understood and standardised, it reduces the need for complex and expensive coding and scripting to transform and massage data during integration. 4. Asset Inventory: Managing the knowledge about where data lives and what you store is critical for eliminating redundant creation. 5. Reporting: Creating a standard definition of data types and making it easy for the enterprise to find, will reduce cost in application development (e.g. time to research and create new objects) as well as facilitate a general understanding of the enterprise’s data. 6. Regulatory Compliance: Without metadata management, you are not complying with regulations. Bottom line: An audit trail of data, starting with its whereabouts, is critical to complying with government mandates. The top 5 benefits from managing this model metadata for reporting are: #5 Data Structure Quality. 
Models ensure that the business design of data architecture is appropriately mapped to the logical design, providing comprehensive documentation on both sides. #4 Data Consistency. By having standardised nomenclature for all data – including domains, sizing, and documentation formats – the risk of data redundancy or misalignment is greatly reduced. #3 Data Advocacy. Models help to emphasise the critical nature of data within the organisation, indicating direction of data strategy and tying data architecture to overall enterprise architecture plans, and ultimately to the business’s objectives.
#2 Data Reuse. Models, and encapsulation of the metadata underpinning data structures, ensure that data is easily identified and is leveraged correctly in the first place, speeding incremental tasks through reuse and minimising the accidental building of redundant structures to manage the same content.
#1 Data Knowledge. Models, combined with an efficient modelling practice, enable the effective communication of metadata throughout an organisation, and ensure all stakeholders are in agreement on the most fundamental requirement: the data.

ER MODELS VS. DIMENSIONAL MODELS FOR REPORTING

A lot has been written previously about the appropriateness of ER vs Dimensional Models for BI and Data Warehousing. To dispel any myths it’s worth looking at the key features of each type of model:

FEATURES OF AN ER MODEL
› Optimised for transactional processing (arrival of new data)
› Normalised, typically in 3rd (or 5th) normal form
› Designed for low redundancy of data
› Relationships between business entities are explicit (e.g. Product determines brand determines manufacturer)
› Tightly coupled to current business model

FEATURES OF A DIMENSIONAL MODEL
› “Star Schema” (or snowflake or even star flake)
› Optimised for reporting
› Business entities are de-normalised
› More data redundancy to support faster query performance
› Relationships between business entities are implicit (it’s evident that a product has a brand and manufacturer, but the nature of the relationship between these entities is not immediately obvious)
› Loosely coupled to the business model: changes to the business model can often be accommodated via graceful changes without invalidating existing data or applications.
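To make the contrast concrete, here is a minimal Python sketch using made-up product, brand and manufacturer records. All names and values are illustrative assumptions, not taken from any real system, and the `build_product_dimension` helper is hypothetical. The normalised structures hold each fact once and make the chain from product to brand to manufacturer explicit; the helper flattens them into one de-normalised row per product, roughly as a star schema dimension would.

```python
# Normalised (ER-style) structures: each fact is stored once, and the
# relationships product -> brand -> manufacturer are explicit foreign keys.
# All records below are invented for illustration.
manufacturers = {1: {"name": "Acme Foods"}}
brands = {10: {"name": "Crunchy", "manufacturer_id": 1}}
products = {
    100: {"name": "Crunchy Oats 500g", "brand_id": 10},
    101: {"name": "Crunchy Bran 750g", "brand_id": 10},
}


def build_product_dimension(products, brands, manufacturers):
    """Flatten the normalised structures into one dimension row per product."""
    rows = []
    for product_id, product in sorted(products.items()):
        brand = brands[product["brand_id"]]
        manufacturer = manufacturers[brand["manufacturer_id"]]
        rows.append({
            "product_key": product_id,
            "product_name": product["name"],
            "brand_name": brand["name"],                # repeated on every row
            "manufacturer_name": manufacturer["name"],  # repeated on every row
        })
    return rows


product_dimension = build_product_dimension(products, brands, manufacturers)
for row in product_dimension:
    print(row)
```

In the flattened rows the brand and manufacturer names are repeated for every product, which is the extra redundancy that buys simpler, faster reporting queries. The rows also no longer state that the brand determines the manufacturer; that relationship has become implicit.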
DATA LINEAGE

Don’t forget data lineage. It applies to many aspects of data management and, with regulatory compliance requirements in many sectors, it is now a statutory need. In BI and DW applications, mappings and transformations determine how each field in the Dimensional Model is derived. The derivations could actually drive the ETL process. In lineage, as in BI, the metadata is vital!

Big Data Trap: Using the metadata to help understand and document Data Lineage (as well as to help with business data understanding, data glossaries and so on) is one of the areas which companies rush into. This “me too” attitude towards big data can be damaging if companies don’t tread incredibly carefully. After all, if you haven’t got your “little & medium” data strategy correct, how can you hope to succeed in the big data space?

WHAT IS THE PROBLEM?

Fundamentally we need to be able to help business users to answer questions or concerns such as:
› That figure doesn’t look right! Where does it come from?
› How can we prove to the auditor that financial data has been handled correctly?

Not only do we need to help our primary customers (the business folks), but we also need to be able to help IT staff to answer questions such as: I need to integrate the data supplied from your system with the data in my system. How can I understand where your data has come from and what it means?

And finally, we need to be able to help systems to answer questions such as: When a piece of source data is updated, which items in the Data Warehouse will need to be recalculated?
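As a rough illustration of how lineage metadata can answer these questions, the Python sketch below records, for each derived field, the source fields it depends on and the rule applied. The field names, rules and the `LINEAGE` structure are all hypothetical assumptions made for this example; a real metadata repository would hold far more detail. `trace_back` addresses “where does it come from?” and `impacted_by` addresses “what needs to be recalculated?”.

```python
from collections import deque

# Hypothetical field-level lineage: each target field lists the source
# fields it is derived from, plus a plain-text description of the rule.
LINEAGE = {
    "dw.sales_fact.net_revenue": {
        "sources": ["orders.gross_amount", "orders.discount"],
        "rule": "gross_amount - discount",
    },
    "dw.sales_fact.revenue_gbp": {
        "sources": ["dw.sales_fact.net_revenue", "fx.rate_to_gbp"],
        "rule": "net_revenue * rate_to_gbp",
    },
    "report.total_revenue": {
        "sources": ["dw.sales_fact.revenue_gbp"],
        "rule": "sum(revenue_gbp)",
    },
}


def trace_back(field):
    """Return every upstream field that feeds `field`, nearest first."""
    seen, queue = [], deque([field])
    while queue:
        current = queue.popleft()
        for source in LINEAGE.get(current, {}).get("sources", []):
            if source not in seen:
                seen.append(source)
                queue.append(source)
    return seen


def impacted_by(source_field):
    """Return every downstream field to recalculate when `source_field` changes."""
    impacted, queue = [], deque([source_field])
    while queue:
        current = queue.popleft()
        for target, entry in LINEAGE.items():
            if current in entry["sources"] and target not in impacted:
                impacted.append(target)
                queue.append(target)
    return impacted


print(trace_back("report.total_revenue"))
print(impacted_by("orders.discount"))
```

Because the same mapping table serves both directions, the business user querying a suspicious figure and the system scheduling recalculations are working from one shared set of metadata.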
SO WHY DOES DATA LINEAGE MATTER?

We aim to have an increased understanding of where data comes from and how it is used, which will lead to increased confidence in the accuracy of data. The knowledge of how data is transformed is itself valuable intellectual property that should be retained within a business, and very importantly it is absolutely necessary for compliance with the Basel II Accord and Sarbanes-Oxley Act (SOX): SOX requires that lineage & transformation of financial data is recorded as it flows through business systems.

TWO KEY ASPECTS OF DATA LINEAGE

Transformations:
› What has been done to the data?

Business Processes:
› Which business processes can be applied to the data?
› What type of actions do those processes perform (Create, Read, Update, Delete)?
› Audit Trail: who has supplied, accessed, updated, approved and deleted the data and when?
› Which processes have acted on the data?

SO WHERE DO I NEED DATA LINEAGE?

For the design of ETL processes, the creation of Dimensional Models, the transformation of data to XML (typically from ER) and for workflow design.

DATA VIRTUALISATION

One of the great newer technologies to emerge recently is Data Virtualisation. Most of us will be familiar with Storage Virtualisation and even Server Virtualisation. The purpose of virtualisation in the IT world is to mask complexity, and present a virtual representation of the thing as if it were a real instance itself. So with Data Virtualisation, data can be federated from a very wide variety of heterogeneous environments and data storage systems, but presented to an application as if it were a real SQL table, XML message, Web service, SOAP call etc. Figure 11 illustrates a typical data virtualisation architecture. But what is going to be presented to the applications? We’ve got all sorts of different data formats, rules, characteristics and so on in the source data.
So what are we going to show in our nice new uniform view of the data that is presented to the applications? It’s the Data Model that is absolutely the language, the key which unlocks the potential of Data Virtualisation. The Data Model informs the federation layer of the DV toolset, and it is against the definitions & structures of the Data Model that the consuming applications access the data. You can almost imagine Data Virtualisation as being “views on steroids”.

FIGURE 11: Typical Data Virtualisation Architecture

COMMUNICATING WITH THE BUSINESS

Finally, Data Modelling can play a very useful role in helping to communicate with the business. As described earlier in this paper, Data Models can be produced at different levels (Enterprise, Conceptual, Logical, Physical) and are for different audiences. At the higher levels a model is a phenomenal tool for getting across ideas and concepts, and for gaining a good understanding of the language and meaning of the major data concepts in the business.

At the highest level, an Enterprise Data Model documents the very high level business data objects and definitions. Its scope is Enterprise wide and it is there to provide a strategic view of Enterprise data. The Enterprise Data Model is there to get across big picture, high level concepts.

In a Conceptual Data Model, the business key, attributes and definitions of major business data objects are developed. It also shows the relationships between major business data objects. It is used to communicate with the business, to give an overview of the main entities, super types, attributes, and relationships. It will contain lots of ‘Many to Many’ and multiple meaning relationships. These are resolved in the more detailed Logical Data Model, after there is agreement on scope and definitions from these high level models. Fundamentally, these high level models have different perspectives and
levels of detail for different uses.

DEMONSTRATING BENEFITS

As I mentioned earlier, we constantly need to demonstrate the benefits accruing from data modelling. Nobody owes us a living, and no matter how important WE believe the place of modelling to be, it is incumbent upon us to demonstrate (and sell) the benefits within our organisations. So just how can you gain traction, budget and Executive buy-in? Here are a few tips:

1. Be visible about the program:
› Identify key decision-makers in your organisation and update them on your project and its value to the organisation
› Focus on the data that is crucial to the business first! Publish that and get buy-in before moving on (e.g. start small with a core set of data).
2. Monitor the progress of your project and show its value.
3. Define deliverables, goals and key performance indicators (KPIs).
4. Start small: focus on core data that is highly visible in the organisation. Don’t try to “boil the ocean” initially.
5. Track and promote the progress that is made.
6. Measure metrics where possible:
› “Hard data” is easy (for example # data elements, # end users, money saved, etc.)
› “Softer data” is important as well (data quality, improved decision-making, etc.). Anecdotal examples help with business/executive users, e.g. “Did you realise we were using the wrong calculation for Total Revenue?” (based on data definitions)

Remember, soft skills are becoming critically important for Information professionals, and whilst you might not like it, the hard facts are that part of YOUR job nowadays IS marketing.
THE GREATEST CHANGE REQUIRED

As Information Professionals, we need to break away from the “you must read my detailed data model” mentality and make the appropriate model information available in a format users can readily understand. This means, for example, that Data Architects need to recognise the different motivations of their users, and re-purpose the information they present to be suitable for the audience: Don’t show a business user a Data Model! Information should be updated instantaneously, and we must make it easy for users to give feedback. After all, you will achieve common definitions much more quickly that way.

We need to recognise the real world commercial climate that we’re working in and break away from arcane academic arguments about notations, methodologies and the like. If we want to have Data Modelling play a real part in our business then it’s up to us to demonstrate and communicate the benefits that are realised. Remember, Data Modelling isn’t a belief system; just because you “get it”, don’t assume that the next person does. So what can we do?

1. Provide information to users in their “Language”
› Repurpose information into various tools: BI, ETL, DDL, etc.
› Publish to the web
› Exploit collaboration tools / SharePoint / Wiki and so on. What about a Company Information Management Twitter channel?
› Business users like Excel, Word, Web tools, so make the relevant data available to them in these formats.
2. Document Metadata
› Data in context (by Organisation, Project, etc.)
› Data with definitions
3. Provide the Right Amount of Information
› Don’t overwhelm with too much information. For business users, terms and definitions might be enough.
› Cater to your audience. Don’t show DDL to a business user or business definitions to a DBA.
4. Market, Market, Market!
› Provide visibility to your project.
› Talk to teams in the organisation that are looking for assistance
› Provide short-term results with a subset of information, and then move on.
5. Be aware of the differences in behaviour & motivations of different types of users.

For example, a DBA is typically:
› Cautious
› Analytical
› Structured
› Doesn’t like to talk
› “Just let me code!”

HOWEVER A DATA ARCHITECT IS:
› Analytical
› Structured
› Passionate
› “Big Picture” focused
› Likes to Talk
› “Let me tell you about my data model!”

AND A BUSINESS EXECUTIVE IS:
› Results-Oriented
› “Big Picture” focused
› Has little time
› “How is this going to help me?”
› “I don’t care about your data model.”
› “I don’t have time.”

As Information professionals we’ve got to get these softer skills baked into ourselves and our colleagues. Some of the key things we can do as a profession are to:
› Develop interpersonal skills
› Avoid methodology wars & notation bigots. Please don’t air discussions about Barker vs IE vs UML class diagrams in front of business users. Yes, sadly enough I have seen this done!
› Remember, nobody owes us a living, so we must constantly demonstrate benefits. As data professionals we constantly need to fight for our existence.
› Examine professional certification (CDMP / BCS etc.). This shows we are serious about our profession.
WHAT NEEDS TO STAY THE SAME?

Having highlighted the areas that need to change in order to make modelling more relevant to our business colleagues and to the information environments of today, are there any things that should stay the same? Yes indeed. We must keep the disciplines and best practices that have existed in the modelling community for many years. These can be categorised into 3 major areas as follows:

1. MODELLING RIGOUR: Development of Conceptual, Logical and Physical Data models with good lineage and object re-use. Structures created in the most appropriate normal form (typically 3rd normal form). Good and consistent data definitions for all components of the data model.
2. STANDARDS & GOVERNANCE: These cover standards for both development and usage of information models, including aspects of data quality. Data Governance includes ownership, stewardship and operational control of the data.
3. OBJECT REUSE VIA A COMMON REPOSITORY: Not only used for data modelling, the metadata that is captured whilst developing Conceptual, Logical and Physical Data models is of immense use for many aspects of the business. Interestingly, several organisations are now beginning to use this metadata as the basis of their Business Data Dictionaries. The key here is holding the metadata in a common repository and reusing the objects where appropriate.
WE NEED TO GRASP THE NETTLE AND ENGAGE EFFECTIVELY WITHIN OUR BUSINESSES

Throughout this paper we have illustrated that data is at the heart of all architecture disciplines. We have seen that Data Models can be produced at different levels and for different purposes and audiences. We have examined many aspects of Data Modelling, including its history, its use in DBMS development and the way it is taught in some Universities, and we have firmly refuted the criticism that it is only appropriate for DBMS development. However, as data professionals, it’s up to us to make the biggest change necessary to make it appropriate to the technologies and business environments of today. Go to it!