Data Modelling is NOT just for RDBMS's


Data modelling has been around since the mid-1970s, but in many organisations there is considerable scepticism and downright distrust regarding the place data modelling should occupy. So why does data modelling still have to be "sold" in many companies, while in others people simply don't believe it's necessary ("the software package has all I need")? This paper looks at the failure of organisations to capitalise on the benefits data modelling can yield and examines where in the changing information systems landscape modelling is relevant.


Christopher Bradley - Business Consulting Director (IPL)
April 2012
White Paper

Data Modelling is NOT JUST for Database Management Systems
Copyright of IPL Limited©

Abstract

This paper examines many aspects of data modelling. By understanding Database Management Systems (DBMS) in an historical context, from their infancy in the 1950s to the data explosion of today, we explore not only the legacy challenges organisations face, but also how Information Professionals must deliver data in consumable formats and use data as ‘the’ corporate asset.

Data Modelling is no longer just the domain of the Information Professional. Today’s organisations need people who are able to manage, manipulate and communicate the benefits of data to deliver a reduction in cost, improve data quality, simplify data integration, improve asset management, provide reporting tools and comply with regulation. Put simply, data is a corporate asset, most people are stakeholders and everyone consumes it – whether they know it or not.

This paper helps Information Professionals grapple with these challenges and will help organisations to understand, present and make data relevant in today’s environment.

IPL www.IPL.com Tel: +44 (0)1225 475000 Grove Street, Eveleigh House, Bath BA1 5LR info@ipl.com
Preface

“What needs to change to make Data modelling more relevant to today’s environments?”

Who is this Report For?

This paper is primarily aimed at those people who own, run and work with Data Management Systems serving small, medium and large enterprises. Channel players and re-sellers may also find this paper beneficial in enabling their organisations to introduce value-added services and strategies to improve data performance.

• Information Professionals
• All Data Consumers

Key Messages

• Information professionals find it hard to demonstrate the value of their data modelling work, especially as the industry moves away from traditional bespoke software development.
• Data modelling has traditionally been seen as only being relevant to Database Management Systems (DBMS). This is a perception that needs to be changed.
• Data modelling is just as relevant to the new technologies and methods emerging today as it has always been.
• For Business Intelligence and Data Warehousing reporting, Dimensional Models are better than Entity Relationship Models, and there are ways to transform an existing ER model into a Dimensional one.
• Data lineage is extremely important for stakeholders across the business, as well as for complying with statutory requirements. Data lineage is required for the design of ETL processes, the creation of Dimensional Models, transforming data and for workflow design.
• It is vital that information professionals pick data modelling projects that will deliver tangible business benefits fast. Furthermore, they must continually show others across the enterprise what these benefits are, to help the initiative gain traction and secure executive buy-in.
• To do this, information professionals need to present information in ways that are tailored to the specific audience they are interacting with. There is no one-size-fits-all approach.
• Key disciplines and practices that have existed in the modelling community for many years, such as modelling rigour, good standards and governance, and object re-use, need to be retained as data modelling moves forward.
Index

1. Introduction
2. Background & History
   a. 1950 – 1970
   b. 1970 – 1990
   c. 1990 – 2000
   d. 2000 and beyond
3. Mashups
4. Modelling for DBMS Development
   a. Top-down Approach
   b. Bottom-up Approach
5. What Needs to Change?
6. Modelling for the ‘New’ Technologies
   a. ERP Packages
   b. SOA & XML
7. Business Intelligence
   a. ER Models vs Dimensional Models for Reporting
   b. Features of an ER Model
   c. Features of a Dimensional Model
8. Data Lineage
   a. What is the Problem?
   b. Why does Data Lineage Matter?
   c. Two Aspects of Data Lineage
      i. Transformations
      ii. Business Process
   d. So where do I need Data Lineage?
9. Demonstrating Benefits
10. The Greatest Change Required
    a. What can we do?
11. What Needs to Stay the Same?
    a. Modelling Rigour
    b. Standards & Governance
    c. Object Reuse via Common Repository
12. Summary
13. About the Author
1. Introduction

In many organisations data modelling has received a lot of bad press. Yet when data modelling first came onto the radar in the 1970s the potential was enormous, and organisations were promised benefits including:

• A single consistent definition of data
• Master data records of reference
• Reduced development time
• Improved data quality
• Impact analysis; and much more

So why is it that, thirty-plus years on, many organisations are still not ‘sold’ on the need for data modelling? Furthermore, many Information Professionals are struggling to “justify” data modelling, particularly with the push for “Agile” development.

This paper looks at the failure of organisations to capitalise on the benefits, but also investigates project breakdown and the reasons these projects have not delivered. It also focuses on what needs to happen moving forward and what changes are needed if Information Professionals are going to counterbalance the ‘bad press’ of:

• “It gets in the way”
• “It takes too much time”
• “What’s the point of it”
• “It’s not relevant to today’s systems landscape”
• “I don’t need to do modelling, the package has it all”

This paper also takes a forward-looking approach to data modelling, touching on areas associated with:

• Modelling for ‘new’ technologies
• Demonstrating benefits
• The greatest change required of the organisation
• Elements that need to stay the same

As with most things, a look back into the past is a good place to start.
2. Background & History

Looking back into the history of data management we see a number of key eras.

a. 1950 – 1970: IT was starting to enter the world of commerce, and during this period we saw the introduction of the first database management systems such as IMS, IDMS and TOTAL. The cost of disk storage was originally very high – who can remember the DBMS that could be implemented entirely on tapes?** The concept of “database” operations came into being and the early mentions of “corporate central databases” appeared.

** It was IMS HISAM if you really want to know.

b. 1970 – 1990: Data was “discovered” and “Data Management” became fashionable. Early mentions of managing data “as an asset” were seen and the concepts of Data Requirements Analysis and Data Modelling were introduced. The first real mentions of “Data Modelling” were observed around the mid-1970s. In this period, a number of organisations embarked upon programmes to create single or corporate database systems. The term ‘MDM’ had not been coined yet, but many of the MDM initiatives of the early 21st century picked up where these efforts left off.

c. 1990 – 2000: The “Enterprise” became flavour of the decade. We saw “Enterprise” data management coordination, “Enterprise” data integration, “Enterprise” data stewardship, “Enterprise” data use. An important change began to happen in this period: there was a realisation that “technology” was not the answer to many of the data issues, and Data Governance and the human factors associated with managing information began to be seriously considered.

d. 2000 and beyond: Data quality, Data modelling as a Service, Security & Compliance, Service Oriented Architecture (SOA), Governance (still) and Alignment with the Business were – and still are – the data management challenges of this period. And all of this has to be undertaken in these rapidly changing times when we have a “new” view of Information: Web 2.0, Blogs, Mashups. Anyone can create data!

At the same time, we have a greater dependence on “packaged” or Commercial off-the-shelf (COTS) applications such as the major ERPs. We are also seeing greater use and application of SOA, XML and Business Intelligence, and less traditional “bespoke” development. All of this is being undertaken in an environment where tighter compliance and regulatory demands are placed upon the information within the Enterprise.

Unfortunately, the “traditional” way in which Data Modelling has been taught (and still is in some quarters) means that its relevance in non-custom developments is underestimated. This has to change.
3. Mashups

Make no mistake, mashups are becoming the new “cottage industry” IT applications of this decade. Mashups are applications that use and combine data, presentation or functionality from two or more sources to create new services [http://en.wikipedia.org/wiki/Mashup_(web_application_hybrid)].

Remember the home-grown departmental Excel macros of the 90s and onwards? [http://www.ipl.com/papers/Drowning in spreadsheets.pdf] These became “critical” to parts of the business, and we all know of the data management problems these resulted in. Mashups are doing the same thing today.

But just who is looking at the data definitions, standards, applicability and so on? Certainly not the data management group – because they do not know that these things are being built in departmental silos, and anyway the “data team” is pigeon-holed as being only involved in custom DBMS development.

So that leads us on to examine the belief that many people have (too many, unfortunately) that Data Modelling is only for DBMS development. Why is that?

For many organisations, spreadsheet-based applications have become the way in which the hidden IT backlog is addressed by disgruntled users. These dark developments pose significant risks and have spawned an entire cottage industry within several organisations.
4. Modelling for DBMS Development

In its infancy, data modelling was primarily aimed at facilitating DBMS development. Typically, this involved creating a Conceptual Data Model, then a detailed Logical Data Model, then applying design constraints to make a Physical Data Model, then generating some DDL for your IMS/DL1/IDMS/Total DBMS (remember, it started in the pre-SQL era).

To illustrate this, four typical roles can be identified:

1. The Enterprise Data Customer: This might be at Director or CxO level. The accuracy of data is critical to them because they are report users. As data professionals we produce data or products that are key to serving the needs of this level of user.
2. The Data Architect: This person knows the business and its rules. He/she manages data knowledge and defines the conceptual direction and requirements for data capture.
3. The DBA: This person is production-oriented and manages data storage and the performance of databases. They also plan and manage data movement strategies and play a major part in data architecture by working with architects to help optimise and implement their designs in databases.
4. The developer DBA: This role works closely with the development teams and is focused on DBMS development. They frequently move and transform data, often writing scripts and ETL to accomplish this.

Data models (more accurately, the metadata) were (and are) seen as the glue or the lingua franca for integrating IT roles through the DBMS development lifecycle. All of the roles above depend on metadata from at least one of the other roles.

So what are the steps for developing DBMSs using models? This could be the subject of a large paper, but to try and summarise it simply: there are two “main” approaches to creating DBMSs from models. One is the “top-down” or “to-be” approach and the other is termed the “bottom-up” or “as-is” approach. (There is actually a third, called a hybrid approach, which we will not cover here.)

a. Top-Down (to-be) Approach

Step 1: Document the business requirement and agree a high-level scope. The output is typically some form of Business Requirements Document (BRD).
Step 2: Create a more detailed business requirement document with subscriber data requirements, business processes and business rules.
Step 3: Understand and document the business keys, attributes and definitions from business subject matter experts. From this, create and continually refine a logical data model.
Step 4: Verify the logical data model with the stakeholders. “Walk” a number of major use cases and users through the model. Apply technical design rules, known volumetric and performance criteria and create a first-cut physical data model.
Step 5: Refine the physical design with DBA support and implement the DBMS using the refined physical model.

This approach has the great advantage that the “new” or “to-be” business and data requirements are foremost. However, it does not take account of any existing or legacy systems, nor of knowledge embedded deep in the current systems.

b. Bottom-up (as-is) Approach

The primary purpose of a bottom-up or as-is approach is to create a model of the existing system into which the new requirements can be added. Frequently, the bottom-up approach is used because a model of the as-is system simply does not exist, either because the system has evolved or because the original design staff have moved on.
The steps in this approach are:

Step 1: Reverse-engineer the database schema from the system that is already implemented. From this, you will have the database catalogue, table, column, index names and so on. Of course, these will all be in “IT” language, without any business definitions.
Step 2: Profile the real data by browsing and analysing the data from the tables. This can be accomplished by using good data profiling tools. Scan through the ETLs to find any hidden relationships and constraints.
Step 3: From IT subject-matter experts, find out the foreign key relationships between tables, and verify the findings. The typical output here is a refined physical model.
Step 4: Document the meanings of columns and tables from IT subject-matter experts.
Step 5: Try to understand the business meanings of “probable” attributes and entities that may be candidates for the logical data model. From here, the result is a “near logical” model.

A third way is a hybrid of these, frequently called the “middle-out” approach.

With all of these uses of models described so far, looking at the history of data modelling, and if we are to believe much of the literature from the data modelling tool vendors, we would be left with the assumption that data modelling is just for DBMSs. WRONG!
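Steps 1 to 3 of the bottom-up approach can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a substitute for a profiling tool: it uses SQLite from the standard library as a stand-in DBMS, the tables CUST and ORD and their data are hypothetical, and the "candidate foreign key" rule (every non-null value of a column appears in a unique column of another table) is a simple heuristic whose findings would still need to be verified with subject-matter experts.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with NO declared PK/FK constraints.

    The table names, column names and rows are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE CUST (CUSTNO INTEGER, CNAME TEXT);
        CREATE TABLE ORD  (ORDNO INTEGER, CUSTNO INTEGER, AMT REAL);
        INSERT INTO CUST VALUES (1, 'Acme'), (2, 'Bolt'), (3, 'Cog');
        INSERT INTO ORD  VALUES (10, 1, 250.0), (11, 1, 99.5), (12, 3, 40.0);
    """)
    return conn


def reverse_engineer(conn):
    """Step 1: read table and column names from the database catalogue."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
        schema[table] = [col[1] for col in
                         conn.execute(f'PRAGMA table_info("{table}")')]
    return schema


def column_values(conn, table, column):
    """Step 2: profile one column by fetching its non-null values."""
    rows = conn.execute(
        f'SELECT "{column}" FROM "{table}" WHERE "{column}" IS NOT NULL')
    return [row[0] for row in rows]


def candidate_foreign_keys(conn, schema):
    """Step 3 (heuristic): propose child -> parent column pairs.

    A parent column must be non-empty and unique; a child column qualifies
    when all of its values are found in the parent column.
    """
    values = {(t, c): column_values(conn, t, c)
              for t, cols in schema.items() for c in cols}
    unique_cols = {tc: set(v) for tc, v in values.items()
                   if v and len(set(v)) == len(v)}
    found = []
    for (child_t, child_c), child_vals in values.items():
        for (parent_t, parent_c), parent_vals in unique_cols.items():
            if child_t != parent_t and child_vals and set(child_vals) <= parent_vals:
                found.append((child_t, child_c, parent_t, parent_c))
    return sorted(found)


conn = build_demo_db()
schema = reverse_engineer(conn)
fks = candidate_foreign_keys(conn, schema)
print(schema)
print(fks)
```

On real data a heuristic like this will also throw up coincidental matches, which is why Step 3 asks for the findings to be verified.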
5. What Needs to Change?

The use and benefit of Data Modelling is considerably greater than that of a “one-trick pony”, as current assumptions would suggest. To make data modelling relevant for today’s IT landscape, we must show it is relevant for the “new” (well, newer) technologies such as:

• ERP packages
• SOA & XML
• Business Intelligence & Data Warehousing
• Data Lineage

We also need to break away from the “you must read my detailed data model” mentality that gets touted by some IM practitioners and make information models available in formats users can readily understand. This means, for example, that Data Architects need to recognise the different motivations of their users and re-purpose the models for the audience: do not show a business user a detailed data model! Information models should be updated quickly, and we must make it easy for users to give their feedback – after all, you will achieve common definitions quicker that way.

We need to recognise the real-world commercial climate we are working in and break away from arcane academic arguments about notations, methodologies and the like. If we want to have Data Modelling play a real part in our business, then it is up to us to demonstrate and communicate the real benefits that can be realised. Remember, Data Modelling is not a belief system: just because you “get it”, do not assume that the next person does.
6. Modelling for the ‘New’ Technologies

I feel I must make a confession here. The technologies are not really all that new! It is just that traditionally, Data modelling has not been seen as being relevant to these areas. To break out of this “modelling is a one-trick pony” view, we need to show how and why data modelling is relevant for today’s varied IT landscape. Thus we must show that it is relevant for the “new” technologies such as:

• ERP packages
• SOA & XML
• Business Intelligence & Data Warehousing
• Data Lineage

a. ERP Packages

As data architects, when faced with projects that are embarking upon the introduction of a major ERP package, have you ever heard the cry: “We don’t need a data model – the package has it all”? But does it?

Is data part of your business requirement? Of course it is. So just how do you know whether the package meets your overall business data requirements? You did assess the data component when doing your “fit for purpose” evaluation, did you not? A data model will assist in both package configuration and fit-for-purpose evaluation.

How can you assess whether the ERP package has data structures, definitions and meanings that are compatible with your legacy systems? Again, a good data model will assist with this. What about data integration, legacy data take-on and master data integration? How can these readily be accomplished? You guessed it – a data model can help here too.

Another objection raised is that “we’ve been told to implement X – why bother with a model?” Setting aside the insanity of selecting a solution without first understanding the data requirements, there are two big reasons:

1. The data model will help determine package configuration options.
2. Developing a conceptual data model and then walking major business use cases through the model will highlight areas the package cannot satisfy. Then you have a chance of developing workarounds before it is too late, or at least of going into an implementation knowing which areas are not satisfied.

The critics often say that modelling is not needed for ERP packages. But that is because they are wedded to the old-world view that modelling is only used for DBMS development. It is not. When implementing ERP systems, the model will not be required to generate a DBMS (after all, the Physical model is generally proprietary to the package vendor); however, for all of the other aspects described above, it is invaluable.
So what is the problem? Why can we not just point our favourite data modelling tool at the underlying DBMS of the package and reverse-engineer a “model” from the Physical DBMS? Simply put, for the most part, the problem is that the Database System Catalogue does not hold any useful metadata. Several well-known ERP systems do not hold any Primary Key (PK) or Foreign Key (FK) constraints in the Database itself. See the example in Figure 1.

Figure 1: Part of an ERP reverse-engineered directly from the DBMS
It is a package’s application layer that holds all of this logic and referential integrity, and it is the proprietary ERP Data Dictionary (DD) that holds the “Logical View” of the data. What we really need to be able to achieve is getting the ERP metadata into a “useful format”, such as the example shown in Figure 2 below.

Figure 2: Useful model from an ERP.
How can we do that? There is not the space to expand on this area in this paper, and much of it varies from ERP to ERP. However, for example within SAP, there is a metadata extraction facility independently available called SAPHIR. Additionally, you can also validate a model created from SAPHIR by examining key items from transaction screens such as in the example shown below in Figure 3.

Figure 3: Validating an ERP model from transaction screens.

So in summary, why do we need to bother undertaking data modelling when implementing an ERP system?

1) For requirements gathering. If your business data is part of your requirement, you need to model it.
2) Fit-for-purpose evaluation. Surely you must have evaluated the suitability of the package before deciding to implement it?
3) Configuration. Using models as a communication vehicle to demonstrate business use cases is invaluable. From these, the many options in the ERP system can be examined and then configured with confidence.
4) Legacy data migration and take-on, ensuring accurate source to target mapping.
5) Master data alignment. The ERP may have its own master data sets. You can use the data model to ensure correct alignment of these with your corporate master data initiative. Do not fall into the trap of letting the tail wag the dog!
6) Ensuring your ERP data can integrate within your overall Information Architecture.
7) Spotting missing major areas of functionality in the package before it is too late; a conceptual data model is a very powerful vehicle for this.
b. SOA and XML

The second area where modelling is frequently ignored is in SOA/XML solutions. It is worth reminding ourselves of the fundamental components in the architecture.

The Bus in SOA is a “conceptual” construct, which helps to get us away from point-to-point thinking. One approach to integrating applications via a bus is to use Message Oriented Middleware (MOM). The Message Broker is a dispatcher of messages and comes in many varieties. The broker operates upon a queue of messages within the routing table. Adapters are where the different technology worlds are translated, e.g. UNIX, Windows, OS/390 and so on.

Fundamentally, SOA is built upon a message-based set of interactions, i.e. all interaction between components is through messages. These are generally XML messages, so it is true to say that XML is at the core of SOA. But there is a potential problem: XML is represented via a hierarchical structure, but real-world data is not.

Figure 4: Book example

Let us illustrate this with a real-world example: a book (Figure 4). We see that this book is entitled “Data Modelling For The Business”. When we look at this real example we see data such as:

Title: Data Modeling for the Business
Authors: Steve Hoberman, Donna Burbank & Christopher Bradley
ISBN: 978-0-9771400-7-7
£21.79
Technics Publications, LLC
URL: http://www.amazon.co.uk/Data-Modeling-Business-Handbook-High-Level/dp/0977140075
Looking at the authors (myself, Steve & Donna), there is also some information (on the back cover) relating to each of us. We can develop a model to represent this “real-world” data and show it in an Entity Relationship format. Typically these ER models can represent real-world data pretty accurately. Figure 5 shows an example ER model for the book authoring data subject area.

A book can be authored by many writers (in this case, myself, Steve and Donna). However, a writer can author many books (Steve has also written “Data Modelling Made Simple”, for example). So as we see in Figure 5, we have added an intersection entity (Book Authorship) to resolve the many-to-many relationship. This intersection entity has many real business attributes.

Figure 5: Book example ER model

Now, when we want to use data in this model within an XML message, we have to turn the model into a hierarchic XML representation. Thus we need to decide whether to make Book the parent of Book Authorship or to choose Writer as the parent.
In Figure 6 below, the resultant XML model has been created after choosing Book as the parent.

Figure 6: Book XML model

Whilst simplistic (for the sake of the example), the XML model in Figure 6 now represents the XML schema we are going to use. Within our SOA-based system, we may have a transaction that utilises an XML message called “Book Details”. Figure 7 below shows how the XML message has been created from the XML schema and is utilised (in the message queue) in our SOA solution.

Figure 7: Book details XML message

So clearly, data modelling is (or at least should be) a key component required for any successful SOA implementation.
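The decision to make Book the parent can be sketched in Python using the standard library's xml.etree.ElementTree. This is an illustrative sketch only: the figures are not reproduced in this transcript, so the element and attribute names (Book, Title, BookAuthorship, Writer, id, isbn, sequence), the internal keys B1 and B2 and the "sequence" attribute on the intersection entity are all assumptions rather than the paper's actual schema. The book and author names are those given in the text.

```python
import xml.etree.ElementTree as ET

# Relational (ER-style) data: two entities plus the intersection entity.
# The keys "B1"/"B2" and the writer ids are hypothetical surrogate keys.
BOOKS = {
    "B1": {"title": "Data Modeling for the Business", "isbn": "978-0-9771400-7-7"},
    "B2": {"title": "Data Modelling Made Simple", "isbn": None},  # ISBN not given in the paper
}
WRITERS = {1: "Steve Hoberman", 2: "Donna Burbank", 3: "Christopher Bradley"}

# Book Authorship rows: (book key, writer id, sequence); "sequence" is an
# assumed example of a business attribute held on the intersection entity.
BOOK_AUTHORSHIPS = [
    ("B1", 1, 1), ("B1", 2, 2), ("B1", 3, 3),
    ("B2", 1, 1),
]


def book_details_message(book_key):
    """Build a hierarchical 'Book Details' message with Book as the parent."""
    book = BOOKS[book_key]
    root = ET.Element("Book", attrib={"id": book_key})
    if book["isbn"] is not None:
        root.set("isbn", book["isbn"])
    ET.SubElement(root, "Title").text = book["title"]
    rows = sorted((r for r in BOOK_AUTHORSHIPS if r[0] == book_key),
                  key=lambda r: r[2])
    for _book_key, writer_id, sequence in rows:
        # The intersection entity becomes a child of Book ...
        authorship = ET.SubElement(root, "BookAuthorship",
                                   attrib={"sequence": str(sequence)})
        # ... and Writer is nested (and therefore repeated) beneath it.
        ET.SubElement(authorship, "Writer").text = WRITERS[writer_id]
    return ET.tostring(root, encoding="unicode")


message = book_details_message("B1")
print(message)
```

Note the consequence of the hierarchy: a writer who has authored several books appears once under each Book element, whereas in the ER model the Writer is held once. Choosing Writer as the parent would simply move the repetition to Book.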
7. Business Intelligence

When looking at Business Intelligence and Data Warehouses, we are trying to ensure that the data utilised by the business for its queries and reports is reliable. In order to accomplish this, not only do we need to manage the data that the business utilises, but also the metadata. We all know that much of this metadata is contained within the data models, and indeed the models can be extended to hold even more useful metadata.

So, what are the main reasons for managing this model metadata?

1. Reduce Cost: In addition to all the other points below, the goal here is to reduce the overall cost of managing a significant part of the IT infrastructure. Managing metadata helps automate processes, reduces costly mistakes of creating redundant or non-conformant data, and reduces the length of time to change systems according to business needs.
2. Higher Data Quality: Without proper management, the same type of data may be managed differently in the places it is used, which could degrade its quality or accuracy.
3. Simplified Integration: If data is understood and standardised, it reduces the need for complex and expensive coding and scripting to transform and massage data during integration.
4. Asset Inventory: Managing the knowledge about where data lives and what you store is critical for eliminating redundant creation.
5. Reporting: Creating a standard definition of data types and making them easy for the enterprise to find will reduce cost in application development (e.g. time to research and create new objects) as well as facilitate an improved understanding of the enterprise’s data.
6. Regulatory Compliance: Without metadata management, you are not complying with regulations. The bottom line is that an audit trail of data, starting with its whereabouts, is critical when it comes to complying with several government and other regulatory mandates.

The top 5 benefits from managing this model metadata for reporting are:

#5 Data Structure Quality. Models ensure that the business design of the data architecture is appropriately mapped to the logical design, providing comprehensive documentation on both sides.
#4 Data Consistency. By having standardised nomenclature for all data – including domains, sizing, and documentation formats – the risk of data redundancy or misalignment is greatly reduced.
#3 Data Advocacy. Models help to emphasise the critical nature of data within the organisation, indicating direction of data strategy and tying data architecture to overall enterprise architecture plans, and ultimately to the business’s objectives.
#2 Data Reuse. Models, and encapsulation of the metadata underpinning data structures, ensure that data is easily identified and is leveraged correctly in the first place, speeding incremental tasks through reuse and minimising the accidental building of redundant structures to manage the same content.
#1 Data Knowledge. Models, combined with an efficient modelling practice, enable the effective communication of metadata throughout an organisation, and ensure all stakeholders are in agreement on the most fundamental requirement: the data.
  • 7 ER Models vs. Dimensional Models for Reporting
Much has been written about the appropriateness of Entity Relationship (ER) versus Dimensional models for BI and Data Warehousing. To dispel any myths, it is worth looking at the key features of each type of model.
Features of an ER Model
    • Optimised for transactional processing (arrival of new data)
    • Normalised, typically in third (or fifth) normal form
    • Designed for low redundancy of data
    • Relationships between business entities are explicit (e.g. Product determines Brand determines Manufacturer)
    • Tightly coupled to the current business model
Features of a Dimensional Model
    • “Star Schema” (or snowflake, or even starflake)
    • Optimised for reporting
    • Business entities are de-normalised
    • More data redundancy to support faster query performance
    • Relationships between business entities are implicit (it is evident that a Product has a Brand and a Manufacturer, but the nature of the relationship between these entities is not immediately obvious)
    • Loosely coupled to the business model: changes to the business model can often be accommodated gracefully, without invalidating existing data or applications
So, where should I start if I want to develop a model that is suitable for reporting? Firstly, start with the existing data landscape. There should be an existing conceptual or logical data model for the area under consideration. Hopefully there will also be a physical model of the source system.
    • Next, examine the data model of the source system (ER model), then
    • Identify the facts and their level of granularity, and following that
    • Identify the dimensions and their position within hierarchies.
    • After completing these you can design the dimensional model.
    • Finally, define mappings and transformations from fields in the source system (typically an ER model) to fields in the dimensional model:
        – Hierarchies map to dimension tables (sometimes after applying a lookup)
        – Transaction figures map to measures in fact tables (sometimes after applying some aggregation or other calculation)
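The last two steps above can be made concrete with a small sketch. It is illustrative only: the Product, Brand and Manufacturer data, the order lines and the function names are all assumptions invented for this example, and a real warehouse would do this work in an ETL tool or in SQL rather than in Python dictionaries. It shows a hierarchy being flattened into a dimension table via lookups, and transaction figures being aggregated into a measure at a chosen grain.

```python
from collections import defaultdict

# Hypothetical normalised (ER-style) source data: Product -> Brand -> Manufacturer.
manufacturers = {1: "Acme Foods"}
brands = {10: {"name": "Crunchy", "manufacturer_id": 1}}
products = {
    100: {"name": "Crunchy Oats 500g", "brand_id": 10},
    101: {"name": "Crunchy Bars 6pk", "brand_id": 10},
}
# Hypothetical transactions at order-line grain: (product_id, sale_date, amount).
order_lines = [
    (100, "2012-04-01", 3.50),
    (100, "2012-04-01", 3.50),
    (101, "2012-04-01", 2.00),
    (100, "2012-04-02", 3.50),
]


def build_product_dimension(products, brands, manufacturers):
    """Flatten the explicit hierarchy into one de-normalised row per product."""
    dimension = {}
    for product_id, product in products.items():
        brand = brands[product["brand_id"]]  # lookup: Product -> Brand
        dimension[product_id] = {
            "product_name": product["name"],
            "brand_name": brand["name"],
            # lookup: Brand -> Manufacturer
            "manufacturer_name": manufacturers[brand["manufacturer_id"]],
        }
    return dimension


def build_sales_fact(order_lines):
    """Aggregate order lines to the chosen grain: one row per product per day."""
    totals = defaultdict(float)
    for product_id, sale_date, amount in order_lines:
        totals[(product_id, sale_date)] += amount
    return [
        {"product_id": product_id, "sale_date": sale_date, "sales_amount": round(total, 2)}
        for (product_id, sale_date), total in sorted(totals.items())
    ]


product_dim = build_product_dimension(products, brands, manufacturers)
sales_fact = build_sales_fact(order_lines)
```

Note how the dimension row repeats the brand and manufacturer names for every product. That is the deliberate redundancy described above: the explicit chain of relationships in the ER model has become a set of descriptive columns that a report can group by directly.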
  • 8 Data Lineage
Do not forget data lineage. It is applicable to many aspects of information management, and with regulatory compliance requirements in many sectors it is now also a statutory need. In BI and DW applications, mappings and transformations determine how each field in the Dimensional Model is derived. The derivations could actually drive the ETL process. In data lineage, as in BI, the metadata is vital!
What is the Problem?
Fundamentally, we need to be able to help business users to answer questions or concerns such as:
    • That figure doesn’t look right! Where does it come from?
    • How can we prove to the auditor that financial data has been handled correctly?
Not only do we need to help our primary customers (the business folk), we also need to be able to help IT staff to answer questions such as:
    • I need to integrate the data supplied from your system with the data in my system. How can I understand where your data has come from and what it means?
And finally, we need to be able to help systems developers to answer questions such as:
    • When a piece of source data is updated, which items in the Data Warehouse will need to be recalculated?
So why does Data Lineage Matter?
We aim for an increased understanding of where data comes from and how it is used, which will lead to increased confidence in the accuracy of data. The knowledge of how data is transformed is itself valuable intellectual property that should be retained within a business. Importantly, it is also absolutely necessary for compliance with the Basel II Accord and the Sarbanes-Oxley Act (SOX): SOX requires that the lineage and transformation of financial data is recorded as it flows through business systems. Several other regulatory acts contain similar lineage requirements.
Two Key Aspects of Data Lineage
i. Transformations: What has been done to the data?
ii. Business Processes: Which business processes can be applied to the data?
    • What type of actions do those processes perform (Create, Read, Update, Delete, Archive)?
    • Audit trail: who has supplied, accessed, updated, approved and deleted the data, and when?
    • Which processes have acted on the data?
So where do I need Data Lineage?
For the design of ETL processes, the creation of Dimensional Models, the transformation of data to XML (typically from ER) and for workflow design.
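Two of the questions above (“Where does it come from?” and “Which items will need to be recalculated?”) are the same lineage metadata read in opposite directions. The following is a minimal sketch under stated assumptions: the field names, the `LINEAGE` dictionary and both function names are hypothetical, and the lineage is assumed to contain no cycles. A modelling or ETL tool would normally hold this metadata in its own repository.

```python
# Hypothetical lineage metadata: target field -> (input fields, transformation applied).
LINEAGE = {
    "stg.order_line.amount_gbp": (["src.order_line.amount", "src.fx_rate.rate"], "amount * rate"),
    "dw.fact_sales.sales_amount": (["stg.order_line.amount_gbp"], "SUM per product per day"),
    "dw.dim_product.brand_name": (["src.brand.name"], "lookup via product.brand_id"),
    "report.total_revenue": (["dw.fact_sales.sales_amount"], "SUM over reporting period"),
}


def upstream_sources(field, lineage=LINEAGE):
    """'Where does this figure come from?' Returns the original source fields."""
    if field not in lineage:
        return {field}  # no recorded derivation, so treat it as an original source
    sources = set()
    for parent in lineage[field][0]:
        sources |= upstream_sources(parent, lineage)
    return sources


def impacted_fields(source, lineage=LINEAGE):
    """'What must be recalculated when this source changes?' Returns derived fields."""
    impacted = set()
    frontier = [source]
    while frontier:
        current = frontier.pop()
        for target, (inputs, _transformation) in lineage.items():
            if current in inputs and target not in impacted:
                impacted.add(target)
                frontier.append(target)
    return impacted


revenue_sources = upstream_sources("report.total_revenue")
fx_rate_impact = impacted_fields("src.fx_rate.rate")
```

Because each entry also records the transformation text, the same structure could in principle answer the auditor’s question of what was done to the data at each step, not just where it came from.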
  • 9 Demonstrating Benefits
As mentioned earlier in this paper, we constantly need to demonstrate the benefits accruing from data modelling. Nobody owes Data Architects a living, and no matter how important we believe the place of modelling to be, it is incumbent upon us to demonstrate (and sell) the benefits within our organisations. Whether you like it or not, part of your job in Information Management is “selling”. So just how can you gain traction, budget and executive buy-in? Here are a few tips:
1. Be visible about the programme:
    • Identify key decision-makers in your organisation and update them on your project and its value to the organisation
    • Develop dashboards showing current and planned progress
    • Focus first on the data that is most crucial to the business! Publish that and get buy-in before moving on (start small with a core set of data)
2. Monitor the progress of your project and constantly show its value. Do not be afraid to alter course.
3. Define deliverables, goals and key performance indicators (KPIs)
4. Start small: focus on core data that is highly visible in the organisation. Do not try to “boil the ocean” initially
5. Track and promote the progress that is made
6. Measure metrics and demonstrate success:
    • “Hard data” is easy (for example, number of data elements, number of end users, money saved, etc.)
    • “Softer data” is important as well (data quality, improved decision-making, and so on)
Anecdotal examples help with business or executive users, such as “Did you realise we were using the wrong calculation for Total Revenue?” (based on data definitions).
Remember, soft skills are becoming critically important for information professionals, and whilst you might not like it, the hard fact is that part of your job is marketing.
  • 10 The Greatest Change Required
As Information Professionals, we need to break away from the “you must read my detailed data model” mentality and make the appropriate model information available in a format users can readily understand. This means, for example, that Data Architects need to recognise the different motivations of their users and re-purpose the information they present to suit the audience: do not show a business user a detailed data model! Models should be regularly updated, and we must make it easy for users to give feedback; after all, you will achieve a better result and greater consensus.
We need to recognise the real-world commercial climate we are working in and break away from arcane academic arguments about notations, methodologies and the like. If we want data modelling to play a real part in our business, then it is up to us to demonstrate and communicate the benefits that are realised. Remember, data modelling is not a belief system: just because you “get it”, don’t assume that the next person does.
So what can we do?
1. Provide Information to users in their “Language”
    • Repurpose information into various tools: BI, ETL, Data Definition Language (DDL), and so on
    • Publish to the web (e.g. via SharePoint, a company wiki, the intranet and the like)
    • Exploit collaboration tools such as SharePoint and wikis. What about a Company Information Management Twitter channel? ...you could “follow” CUSTOMER!
    • Business users like Excel, Word and web tools, so make the relevant data available to them in these formats. Modern data modelling tools facilitate web publishing, so exploit these capabilities
2. Document Metadata
    • Data in Context (by organisation, project, and so on)
    • Data with Definitions
    • Additional enriched metadata (such as Stewards, Owners, Master Data Sources)
3. Provide the Right Amount of Information
    • Do not overwhelm with too much information. For business users, terms and definitions might be enough
    • Cater to your audience; do not show DDL to a business user or business definitions to a Database Administrator (DBA)
4. Market, Market, Market!
    • Provide visibility of your project
    • Talk to teams in the organisation that are looking for assistance
    • Provide short-term results with a subset of information, and then move on
  • 10 The Greatest Change Required (cont.)
5. Be aware of the differences in behaviour and motivations of different types of users. For example, a DBA is typically:
    • Cautious
    • Analytical
    • Structured
    • Doesn’t like to talk
    • “Just let me code!”
6. However, a Data Architect is:
    • Analytical
    • Structured
    • Passionate
    • “Big Picture”-focused
    • Likes to talk
    • “Let me tell you about my data model!”
7. And a Business Executive is:
    • Results-oriented
    • “Big Picture”-focused
    • Has little time
    • “How is this going to help me?”
    • “I don’t care about your data model.”
    • Doesn’t have time to look at “your” model
As information professionals, we have got to get these softer skills baked into ourselves and our colleagues. Some of the key things we can do as a profession are to:
    • Develop interpersonal skills
    • Avoid methodology wars and notation bigots. Please do not air discussions about Barker vs IE vs UML class diagrams in front of business users. Yes, sadly enough, I have seen this done!
    • Remember that nobody owes us a living, so we must constantly demonstrate benefits. As data professionals, we constantly need to fight for our existence
    • Examine professional certification (CDMP, BCS, etc.). This shows we are serious about our profession. With examinations such as these, “certification” is not the end goal, but “professionalism” is.
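The advice to repurpose model information for each audience can be illustrated with a short sketch. Everything in it is an assumption made for illustration: the `CUSTOMER` metadata, its definitions and the two rendering functions are hypothetical, and a commercial modelling tool would normally provide its own publishing features. The point is only that a single set of model metadata can yield a plain-language glossary for business users and DDL for a DBA, so that neither audience has to read the other’s view.

```python
# Hypothetical model metadata for one entity; every name here is illustrative.
CUSTOMER = {
    "entity": "Customer",
    "definition": "A person or organisation that has bought from us.",
    "steward": "Head of Sales Operations",
    "attributes": [
        {"name": "customer_id", "type": "INTEGER", "nullable": False,
         "definition": "Unique identifier assigned when the customer is first recorded."},
        {"name": "customer_name", "type": "VARCHAR(100)", "nullable": False,
         "definition": "Legal or trading name of the customer."},
        {"name": "credit_limit", "type": "DECIMAL(12,2)", "nullable": True,
         "definition": "Maximum outstanding balance we allow."},
    ],
}


def business_glossary(model):
    """View for business users: terms, definitions and the steward; no data types."""
    lines = [f"{model['entity']}: {model['definition']} (Steward: {model['steward']})"]
    for attribute in model["attributes"]:
        term = attribute["name"].replace("_", " ").capitalize()
        lines.append(f"  {term}: {attribute['definition']}")
    return "\n".join(lines)


def ddl(model):
    """View for a DBA: column names, types and nullability; no business prose."""
    columns = []
    for attribute in model["attributes"]:
        not_null = "" if attribute["nullable"] else " NOT NULL"
        columns.append(f"    {attribute['name']} {attribute['type']}{not_null}")
    return f"CREATE TABLE {model['entity'].lower()} (\n" + ",\n".join(columns) + "\n);"


glossary_text = business_glossary(CUSTOMER)
ddl_text = ddl(CUSTOMER)
```

The glossary text could just as easily be written to a spreadsheet or a wiki page; the design choice being illustrated is that both views come from one maintained source rather than from two documents that drift apart.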
  • 11 What Needs to Stay the Same?
So, having highlighted the areas that need to change in order to make modelling more relevant to our business colleagues and to the information environments of today, are there any things that should stay the same? Yes indeed. We must keep the disciplines and best practices that have existed in the modelling community for many years. These can be categorised into three major areas as follows:
a. Modelling Rigour
    • Development of Conceptual, Logical and Physical Data models with good lineage and object reuse
    • Structures created in the most appropriate normal form (typically third normal form)
    • Good and consistent data definitions for all components of the data model
    • Fundamentally, a good, well-designed data model represents a set of business assertions that can be verified with business folks
b. Standards & Governance
These cover standards for both the development and the usage of information models, and should include aspects of data quality. Data Governance includes ownership, stewardship and operational control of the data. There is no getting away from it: even in federated organisations, good data governance will require a degree of central coordination. Think of it as bureaucracy!
c. Object Reuse via a Common Repository
The metadata that is captured whilst developing Conceptual, Logical and Physical Data models is not only used for data modelling; it is of immense use for many aspects of the business. Interestingly, several organisations are now beginning to use this metadata as the basis of their Business Data Dictionaries. An increasing trend is also to “generate” much of the requirements documentation from the metadata repository rather than “duplicate” it in Word documents. The key here is holding the metadata in a common repository and reusing the objects where appropriate.
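The idea of a common repository with object reuse can be sketched in a few lines. This is a toy illustration, not a description of any particular repository product: the `MetadataRepository` class, its method names and the model names are all hypothetical. It shows the two behaviours that matter: a term is defined once and reused by many models, and a conflicting second definition is rejected rather than silently duplicated.

```python
class MetadataRepository:
    """Toy common repository: one agreed definition per term, reused by many models."""

    def __init__(self):
        self._definitions = {}  # term -> agreed definition text
        self._used_by = {}      # term -> names of the models that reuse it

    def define(self, term, definition):
        """Record a definition once; refuse a conflicting redefinition."""
        existing = self._definitions.get(term)
        if existing is not None and existing != definition:
            raise ValueError(f"'{term}' is already defined differently; reuse or revise it")
        self._definitions[term] = definition
        self._used_by.setdefault(term, set())

    def reuse(self, term, model_name):
        """Link an existing definition to a model instead of redefining it."""
        if term not in self._definitions:
            raise KeyError(f"'{term}' has no agreed definition yet")
        self._used_by[term].add(model_name)
        return self._definitions[term]

    def data_dictionary(self):
        """Generate a business data dictionary: (term, definition, models using it)."""
        return [
            (term, self._definitions[term], sorted(self._used_by[term]))
            for term in sorted(self._definitions)
        ]


repo = MetadataRepository()
repo.define("Customer", "A person or organisation that has bought from us.")
repo.reuse("Customer", "Sales logical model")
repo.reuse("Customer", "Billing physical model")
dictionary = repo.data_dictionary()
```

Generating the dictionary from the repository, rather than maintaining it by hand, is what keeps the documentation in step with the models.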
  • 12 Summary
In the course of this paper we have examined many aspects of data modelling, starting with its history and its traditional use in DBMS development, and firmly refuting the criticism that it is only appropriate for DBMS development. However, as data professionals, it is up to us to make the biggest change necessary to make it appropriate to the “new” technologies and business environments of today. We need to grasp the nettle and engage effectively within our businesses. Go to it...
Chris Bradley has thirty years of Information Management experience. A published author, regular columnist for the BeyeNETWORK and international conference speaker, Chris leads IPL’s Business Consulting practice. He can be contacted at BusinessConsulting@IPL.com
  • About the Author: Christopher Bradley, Business Consulting Director for IPL
Christopher Bradley has spent thirty years in the Information Management field, working for leading organisations in Data Governance, Information Management Strategy, Master Data Management, Metadata Management, Data Warehouse and Business Intelligence implementations. His first degree was in Chemical Engineering, and he later obtained his MBA. Bradley’s post-academic career started at the UK Ministry of Defence, where he worked on several major naval database systems and on the development of the ICL Data Dictionary System (DDS). His career included Volvo as lead database architect, Thorn EMI as Head of Data Management, Reader’s Digest Inc as European CIO, and Coopers and Lybrand’s Management Consultancy, where he established and ran the International Data Management specialist practice. During this time he worked on and led many major international assignments, including data management strategies, data warehouse implementations, the establishment of data governance structures and the largest Data Management strategy undertaken in Europe.
Currently, Bradley heads the Business Consultancy practice at IPL, a UK-based consultancy, and has been working for several years with many clients including a major energy company, a Norwegian exploration company, a global pharmaceuticals company and major financial services organisations. He has introduced Data Governance, established Data Modelling as a Service, and developed group-wide Information Management strategies to ensure that common business practices and the use of master data and models are promoted throughout. He has frequently been engaged to evangelise the IM message to executive management worldwide, to develop governance and new business processes for Information Management, and to deliver training.
Bradley is a member of the Meta Data Professionals Organisation (MPO) and an officer of DAMA, and holds the CDMP (Master) certification. He recently co-authored a book, “Data Modelling For The Business – A Handbook for aligning the business with IT using high-level data models”. He also authors the Information Asset Management “Expert channel” on the BeyeNETWORK, is a regular blogger on information management (http://infomanagementlifeandpetrol.blogspot.com/) and a regular Tweeter @InfoRacer. Bradley’s interests are motor racing, where he competes in a UK championship, and youth work at his local church.
IPL www.IPL.com Tel: +44 (0)1225 475000 Grove Street, Eveleigh House, Bath BA1 5LR
  • About the Author (cont.)
Recent Speaking Engagements:
    • Enterprise Data World International: (DAMA / Wilshire), May 2012, Atlanta GA, “A Model Driven Data Governance Framework For MDM - Statoil Case Study”; “When Two Worlds Collide – Data and Process Architecture Synergies”; “Petrochemical Information Management”
    • Data Governance & MDM Europe: (DAMA / IRM), April 2012, London, “A Model Driven Data Governance Framework For MDM - Statoil Case Study”
    • AAPG Exploration & Production Data Management: April 2012, Dead Sea, Jordan, “A Process For Introducing Data Governance into Large Enterprises”
    • PWC & Iron Mountain Corporate Information Management: March 2012, Madrid, “Information Management & Regulatory Compliance”
    • DAMA Scandinavia: March 2012, Stockholm, “Reducing Complexity in Information Management”
    • Ovum IT Governance & Planning: March 2012, London, “Data Governance – An Essential Part of IT Governance”
    • American Express Global Technology Conference: November 2011, UK, “All An Enterprise Architect Needs To Know About Information Management”
    • FIMA Europe (Financial Information Management): November 2011, London, “Confronting The Complexities Of Financial Regulation With A Customer Centric Approach; Applying IPL’s Master Data Management And Data Governance Process In Clydesdale Bank”
    • Data Management & Information Quality Europe: (DAMA / IRM), November 2011, London, “Assessing & Improving Information Management Effectiveness – Cambridge University Press Case Study”; “Too Good To Be True? – The Truth About Open Source BI”
    • ECIM Exploration & Production: September 12th – 14th 2011, Haugesund, Norway, “The Role Of Data Virtualisation In Your EIM Strategy”
    • Enterprise Data World International: (DAMA / Wilshire), April 2011, Chicago IL, “How Do You Want Yours Served? – The Role Of Data Virtualisation And Open Source BI”
    • Data Governance & MDM Europe: (DAMA / IRM), March 2011, London, “Clinical Information Data Governance”
    • Data Management & Information Management Europe: (DAMA / IRM), November 2010, London, “How Do You Get A Business Person To Read A Data Model?”
    • DAMA Scandinavia: October 26th – 27th 2010, Stockholm, “Incorporating ERP Systems Into Your Overall Models & Information Architecture”
    • BPM Europe: (IRM), September 27th – 29th 2010, London, “Learning to Love BPMN 2.0”
    • IPL / Composite Information Management in Pharmaceuticals: September 15th 2010, London, “Clinical Information Management – Are We The Cobblers Children?”
    • ECIM Exploration & Production: September 13th – 15th 2010, Haugesund, Norway, “Information Challenges and Solutions”
    • Enterprise Architecture Europe: (IRM), June 16th – 18th 2010, London, ½ day workshop “The Evolution of Enterprise Data Modelling”
    • BeyeNETWORK Webinar: (CA / BeyeNETWORK), March 31st 2010, Webinar, “Communicating With The Business Through High Level Data Models”
    • IPL & DataFlux Seminar Series: (IPL / DataFlux), March 26th 2010, Bath, UK, “The Information Advantage – Exploiting Information Management For The Business”
    • Enterprise Data World International: (DAMA / Wilshire), March 14th – 19th 2010, San Francisco CA, “How To Communicate With The Business Using High Level Models”
    • Data Management & Information Management Europe: (DAMA / IRM), November 2nd – 5th 2009, London, “Modelling Is NOT Just For DBMS’s Anymore”; “Meet The Metadata Professional Organisation”; “Experts Panel”
    • Data Migration Matters: October 1st 2009, London, “Designing For Success”
    • BPM Europe: (IRM), September 2009, London, ½ day workshop “An Introduction To Data And The BPMN”
    • DAMA UK & BCS Data Management Group: June 11th 2009, London, “Evolve Or Die - Data Modelling Is Not Just For DBMS’s”
    • Enterprise Data World International: (DAMA / Wilshire), April 5th – 12th 2009, Tampa FL, “Exploiting Models For Effective SAP Implementations”; Chairing Panel Of Experts “Keeping Modelling Relevant”; Panel Of Experts “Issues In Information Internationalisation”; “Modelling Is Not Just For RDBMS’s”
    • Data Rage 2009: March 17th – 19th 2009, “Evolve Or Die – Modelling Is Not Just For DBMS’s Anymore”; “Data Modelling As A Service”
    • Webinar series: (Embarcadero Technologies & IPL), Oct 2008 – Feb 2009, “The New Formula For Success – Moving Data Modelling Beyond The Database”
    • Data Governance Europe Symposia: (IRM / Debtech; London), February 2009, “Data Governance Challenges In A Major Multi National”
    • DAMA Europe: (IRM / DAMA), November 2008, London, “BPMN for Dummies”; “Data Modelling as a service”
    • BPM Europe: (IRM), September 2008, London, “BPMN for Dummies”
    • DAMA International: (DAMA / Wilshire), March 16th – 21st 2008, San Diego CA, “Establishing Data Modelling as a Service in BP”
    • DAMA International: (DAMA / Wilshire), March 16th – 21st 2008, San Diego CA, “Modelling For SOA”; “XML And Data Models”
    • Data Governance Conference: (Debtech / Wilshire), December 2007, Florida, “Data Governance 2.0”
    • DQ/IM & DAMA Europe: (IRM London), November 2007, “Data Modelling As A Service”
    • IPL & Embarcadero seminar series: (Bristol, London, Manchester, Edinburgh), October 2007, “Data Modelling – Where Did It All Go Wrong?”
    • Data Governance Conference: (Debtech / Wilshire), June 25th – 28th 2007, San Francisco CA, “Data Architecture For Governance – Case Study”
    • DAMA UK: June 15th 2007, London, “Data Modelling – Where Did It All Go Wrong?”
    • CDi_MDM Summit: (IRM UK), April 30th – May 2nd 2007, London, “A Data Architecture For Data Governance”
    • DAMA International: (DAMA / Wilshire), March 5th – 8th 2007, Boston MA, “Data As A Service”; “Panel Of Data Modelling Experts”
    • Embarcadero International users group: February 2007, “Extending Your Data Architecture”
    • Meta Data Summit: (Debtech), January 2006, Orlando FL, “Panel Of Experts”
Recent Publications:
    • Article: “How Data Virtualization Helps Data Integration Strategies”, BeyeNETWORK (December 2011)
    • Article: “Approaches & Selection Criteria For organizations approaching data integration programmes”, TechTarget (November 2011)
    • Article: “Big Data – Same Problems”, BeyeNETWORK and TechTarget (July 2011)
    • Article: “10 easy steps to evaluate Data Modelling tools”, Information Management (March 2010)
    • Article: “How Do You Want Your Data Served?”, Conspectus Magazine (February 2010)
    • Article: “How do you want yours served (data that is)”, BeyeNETWORK (January 2010)
    • Article: “Seven deadly sins of data modelling”, BeyeNETWORK (October 2009)
    • Article: “Data Modelling is NOT just for DBMS’s”, Part 1 BeyeNETWORK (July 2009) and Part 2 BeyeNETWORK (August 2009)
    • BeyeNETWORK “Chris Bradley Expert Channel”, Information Asset Management, http://www.b-eye-network.co.uk/channels/1554/
    • Book: “Data Modelling For The Business – A Handbook for aligning the business with IT using high-level data models”; Technics Publishing; ISBN 978-0-9771400-7-7; http://www.amazon.com/Data-Modeling-Business-Handbook-High-Level/dp/0977140075/ref=sr_1_4?ie=UTF8&s=books&qid=1235660979&sr=1-4
    • Article: “Preventing a Data Disaster”, Database Marketing Magazine (February 2009)