Trends in Data Modeling

Trends in Data Modeling
Presented by James Michael Lee and Peter Aiken, Ph.D.

Welcome: Trends in Data Modeling
Date: August 11, 2015
Time: 2:00 PM ET
Presented by: Peter Aiken, PhD 
Steven MacLauchlan 
Michael Lee
2Copyright 2015 by Data Blueprint Slide #
Businesses cannot compete without data. Every organization produces and
consumes it. Data trends are hitting the mainstream and businesses are adopting
buzzwords such as Big data, Data Vault, Data Scientist, etc., to seek solutions to
their fundamental data issues. Few realize that the importance of any solution,
regardless of platform or technology relies on the data model supporting it. Data
modeling is not an optional task for an organization’s data remediation effort.
Instead, it is a vital activity that supports the solution driving your business.
This webinar will address emerging trends around data model application
technology, as well as trends around the practice of data modeling itself. We will
discuss abstract models and entity frameworks, as well as the general shift from
data modeling being segmented to becoming more integrated with business
practices.
Takeaways:
• NoSQL, data vault, etc., different and when should I apply them?
• How Data Modeling relates to business process
• Application development (data first, code first, object first?)

Steven MacLauchlan
• 10 years of experience in Application
Development and Data Modeling with a
focus on Healthcare solutions.
• Delivers tailored data management solutions
that provide focus on data’s business value
while enhancing clients’ overall capability to
manage data
• Certified Data Management Professional (CDMP)
• Computer Science degree from Virginia Commonwealth
University
• Most recent focus: Understanding emerging  
data modeling trends and how these can  
best be leveraged for the Enterprise.

Peter Aiken, Ph.D.
• 30+ years in data management
• Repeated international recognition
• Founder, Data Blueprint (datablueprint.com)
• Associate Professor of IS (vcu.edu)
• DAMA International (dama.org)
• 9 books and dozens of articles
• Experienced w/ 500+ data
management practices
• Multi-year immersions:
– US DoD (DISA/Army/Marines/DLA)
– Nokia
– Deutsche Bank
– Wells Fargo
– Walmart
– …
• DAMA International President 2009-2013
• DAMA International Achievement Award 2001 (with
Dr. E. F. "Ted" Codd
• DAMA International Community Award 2005
PETER AIKEN WITH JUANITA BILLINGS
FOREWORD BY JOHN BOTTEGA
MONETIZING
DATA MANAGEMENT
Unlocking the Value in Your Organization’s
Most Important Asset.
The Case for the
Chief Data Officer
Recasting the C-Suite to Leverage
Your MostValuable Asset
Peter Aiken and
Michael Gorman

James “Michael” Lee
• Data Consultant certified in a number of areas, including Data
Vault 2.0 Practitioner, Kimball ETL Architecture and Certified
Data Management Professional (CDMP).
• Over 7 years of experience with
– Designing data quality solutions
– improving data management practices
– implementing Data Governance frameworks
– architecting data warehouses
– implementation of system upgrades and migrations
• In the following industries:
– telecommunications
– banking
– insurance
– government (defense)
– commercial manufacturing
– international shipping

We believe ...
Data  
Assets
Financial  
Assets
Real 
Estate Assets
Inventory
Assets
Non-
depletable
Available for
subsequent
use
Can be  
used up
Can be  
used up
Non-
degrading √ √ Can degrade 
over time
Can degrade 
over time
Durable Non-taxed √ √
Strategic
Asset √ √ √ √
• Today, data is the most powerful, yet underutilized and poorly
managed organizational asset
• Data is your
– Sole
– Non-depleteable
– Non-degrading
– Durable
– Strategic
• Asset
– Data is the new oil!
– Data is the new (s)oil!
– Data is the new bacon!
• Our mission is to unlock business value by
– Strengthening your data management capabilities
– Providing tailored solutions, and
– Building lasting partnerships
Asset: A resource controlled by the organization as a result of past events or transactions and from which
future economic benefits are expected to flow [Wikipedia]

Copyright 2015 by Data Blueprint
• Business to Data: the Relationship
• What is a Data Model?
• Conceptual, Logical, Physical
• What issues can poor data modeling
introduce?
• Different Models, Different Uses
• Traditional (3NF, Star Schema, Data Vault)
• NoSQL Technologies (Key-Value/Document,
Graph, Column Family)
• Trends
- Move to the business
- Self Service and Virtualization
- Agile
- Data Sharing World (The API’s)
- Patterns and Reuse
- Metadata Modeling
7

What is a Data Model*?
• A data model organizes data
elements and standardizes how the
data elements relate to one another.
• In “Data Modeling Made Simple” by
Steve Hoberman, he says: "A data
model is a wayfinding tool for both
business and IT professionals,
which uses a set of symbols and
text to precisely explain a subset of
real information to improve
communication within the
organization and thereby lead to a
more flexible and stable application
environment."
*According to ANSI.

Why should we care about poor data models?
• Poor data modeling up front can cause Data Quality issues “downstream”
• If the model isn’t a true representation of the business concepts, this will impact
confidence in the data, inhibit business insights and innovation
• Potential for poor DB/Application performance for reads/writes. Example: Over-
normalization
• Lack of flexibility can cause difficulty aligning with evolving business requirements
• Difficulty integrating data in the future
• Constrains business agility by complicating reengineering
• Creates operational inefficiencies (ex: poor application performance)
• Limits workflow transparency
• Proliferates system work-arounds,  
including shadow systems  
developed by end users
• Impact Analysis

How are Data Models Expressed as Architectures?
• Attributes are organized into entities/objects
– Attributes are characteristics of "things"
– Entitles/objects are "things" whose information is
managed in support of strategy
• Entities/objects are organized into models
– Combinations of attributes and entities are structured
to represent information requirements
– Poorly structured data, constrains organizational
information delivery capabilities
• Models are organized into architectures
– When building new systems, architectures are used to
plan development
– More often, data managers do not know what existing
architectures are and - therefore - cannot make use of
them in support of strategy implementation
More Granular 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
More Abstract

The Conceptual Data Model
• Represents entities and relationships
• Should Identify the domain and scope of data
• Should be easily understood by business users in order to
communicate core data concepts, and drive application
requirements
Example:
We need to model customer
address data. A customer may have
many addresses, and many
customers may share one address.
“many to many”

DISPOSITION Data Map

Data map of DISPOSITION
• At least one but possibly more system USERS enter the DISPOSITION facts into the system.
• An ADMISSION is associated with one and only one DISCHARGE.
• An ADMISSION is associated with zero or more FACILITIES.
• An ADMISSION is associated with zero or more PROVIDERS.
• An ADMISSION is associated with one or more ENCOUNTERS.
• An ENCOUNTER may be recorded by a system USER.
• An ENCOUNTER may be associated with a PROVIDER.
• An ENCOUNTER may be associated with one or more DIAGNOSES.
ADMISSION Contains information about patient admission history
related to one or more inpatient episodes
DIAGNOSIS Contains the International Disease Classification (IDC) of
code representation and/or description of a patient's health
related to an inpatient code
DISCHARGE A table of codes describing disposition types available for
an inpatient at a FACILITY
ENCOUNTER Tracking information related to inpatient episodes
FACILITY File containing a list of all facilities in regional health care
system
PROVIDER Full name of a member of the FACILITY team providing
services to the patient
USER Any user with access to create, read, update, and delete
DISPOSITION data

A sample data entity and associated metadata
Entity: BED
Data Asset Type: Principal Data Entity
Purpose: This is a substructure within the Room 
substructure of the Facility Location. It contains  
information about beds within rooms.
Source: Maintenance Manual for File and Table 
Data (Software Version 3.0, Release 3.1)
Attributes: Bed.Description 
Bed.Status 
Bed.Sex.To.Be.Assigned 
Bed.Reserve.Reason
Associations: >0-+ Room
Status: Validated
• A purpose statement describing why the organization is maintaining information
about this business concept;
• Sources of information about it;
• A partial list of the attributes or characteristics of the entity; and
• Associations with other data items; this one is read as "One room contains zero or
many beds."

The Logical Data Model
• Should represent the Conceptual Data model more
thoroughly, but be otherwise very similar
• Will include attributes, names, relationships, and other
metadata
• Will be developed using Data Modeling notation (ex: UML)

The Physical Data Model
• Describes the specific database implementation of the
data
• Attributes will be named according to naming conventions
• Displays data types, accurate table names, Key
information, etc

CM2 Component Evolution is technology derived but technology independent

Data Reengineering for More Shareable Data
Other logical as-is
data architecture
components

Data Modeling Framework
Conceptual Logical Physical
 
 
 
 
 
Goal
Validated
Not Validated
Copyright 2015 by Data Blueprint Slide # 19

introduce?
• Trends
- Agile
- Metadata Modeling
20

Normalization Rules Overview
• 1st Normal Form - no repeating non-
key attributes for a given primary key
• 2nd Normal Form - no non-key
attributes that depend on only a
portion of the primary key
• 3rd Normal Form - no attributes
depend on something other than the
primary key
• 4th Normal Form - attributes depend
on not only key but the value of the
key
• 5th Normal Form - an entity is in
5NBF if its dependencies on
occurrences of the same entity of
entity type have been moved into a
structured entity
The row in every table is
dependent on the key, the whole
key and northern but the key

Third Normal Form
• Each attribute in the relationship is a fact about a key
• Highly normalized structure
• Use Cases:
– Transactional Systems.
– Operational Data Stores.

Third Normal Form: Pros and Cons
• Pros
– Easily understood by business and end users
– Reduced data redundancy
– Enforced referential integrity
– Indexed attributes/flexible querying
• Cons
– Joins can be expensive
– Does not scale
Neo4j.com

Star Schema
• Comprised of “fact tables” that contain quantitative data,
and any number of adjoining “dimension” tables
• Optimized for business reporting
• Use Cases:
– OLAP (Online Analytic Processing)
– BI
Wikipedia

Star Schema Pros and Cons
• Pros
– Simple Design
– Fast Queries
– Most major DBMS
are optimized for
Star Schema
Designs
• Cons
– Questions must be
built into the design
– Data marts are often
centralized on one
fact table

Data Vault
• Designed to facilitate long-term historical storage, focusing on ease
of implementation
• Retains data lineage information (source/date)
• “All the data, all the time”. Hybrid approach of Inmon and Kimball.
• Comprised of Hubs (which contain a list of business keys that do not
change often), Links (Associations/transactions between hubs), and
Satellites (descriptive attributes associated with hubs and links)
• Use Cases:
– Data Warehousing
– Complete Auditability
Bukhantsov.org

Data Vault Pros and Cons
• Pros
– Simple integration
– Houses immense
amounts of data with
excellent performance
– Full data lineage
captured
• Cons
– Complication is pushed
to the “back end”
– Can be difficult to setup
for many data workers
– No widespread support
for ETL tools yet

Model Comparison Matrix
3NF Dimensional Vault
Scalability ☑ ☑ ☑
Flexibility ☒ ☒ ☑
Reengineering ☒ ☒ ☑
Auditability ☑
Business Interpretable ☑ ☑ ☒
Presentation Layer ☒ ☑ ☒
Performance ☒ ☑ ☑
Support ☑ ☑

Technology Trigger: A potential technology breakthrough kicks things off. Early proof-of-concept stories and media interest
trigger significant publicity. Often no usable products exist and commercial viability is unproven.
Trough of Disillusionment: Interest wanes as experiments and implementations fail to deliver. Producers of the
technology shake out or fail. Investments continue only if the surviving providers improve their products to the
satisfaction of early adopters.
Peak of Inflated Expectations: Early publicity produces a number of
success stories—often accompanied by scores of failures. Some
companies take action; many do not.
Slope of Enlightenment: More instances of how the technology can benefit the
enterprise start to crystallize and become more widely understood. Second- and third-
generation products appear from technology providers. More enterprises fund pilots;
conservative companies remain cautious.
Plateau of Productivity: Mainstream adoption starts to
take off. Criteria for assessing provider viability are more
clearly defined. The technology’s broad market
applicability and relevance are clearly paying off.
Gartner Five-phase Hype Cycle

2012 Hype Cycle

2012 Big Data in Hype Cycle

"A focus on big data is not a substitute for the
fundamentals of information management."

NoSQL Solutions*
• Document/Key Value
– “Schema-less” design empowers developers*
– Scalable
– High availability
– Economically viable (scale out not up!)
• RDF/Triple Store
– Purpose-built to store triples (“bob likes football”)
– SPARQL is a query language specific to RDF.
– One of the pillars of “Semantic Web”
• Graph
– Structure comprised of “nodes”, “edges”, and “properties”
– Focused on the interconnection between entities
– Fast queries to find associative data
• Column Family
– Columns are stored individually (but clustered by “family” unlike traditional
columnar databases)
– By only querying specific column families, we can have nearly unlimited
numbers of columns without causing expensive queries
*not exhaustive!

NoSQL Data Models
RDF/Triple Store
Graph (Source: Neo4J)
Document Store (Source: MongoDB)
Column Store (Source: Toadworld)

NoSQL providers
Wikibon.org

Example: Marvel’s Data Model

introduce?
• Trends
- Agile
- Metadata Modeling
38

Move it to the Business
• Models need to add value
• Models need to be part of the process
– (Not a documentation of the process)
• Models need to assist in improving capabilities, not
hindering them
– Self Service BI

Self Service and Virtualization
• Self Service BI requires end user understanding of
the system
• Presentation Data Models

Agile
• Incremental build of models
– Not an excuse to create bad models
• 80/20 Rule
• The problem with code first
– Rules exist in code
– Reengineering concerns
– Governance concerns
– Lack Business Insights
• Database First
– Creates value in modeling
– Enforced integrity and lineage of the data
– Integrates the model into the process
– Used to generate code

A Data Sharing World
• Adding structure to information allows us to obtain
exactly what we want, when we want it.
• Allows applications to serve up data to external
sources in a structured way- “Post-schema”.

Design Patterns
• Why are the restrooms generally in the same place in each building?
• What about the electrical wiring?
• HVAC? Floorplans? ...
• Architecture design patterns (spoke and hub,  
hub of hubs, warehouse, cloud, MDM,  
changing tires, portal)

Patterns and Reuse
• Common rule of thumb:
– One third of a data model
contains fields common to all
business.
– One third contains fields common
to the industry, and the
– Other third is specific to the
organization.
• Patterns should theoretically provide
an organization with a base-line to
quickly develop data infrastructure.
• Off-the-shelf solutions may require
in-depth customization or
specialization.

Source:http://dmreview.com/article_sub.cfm?articleID=1000941 used with permission
Meta Data Models

Marco & Jennings's Metadata Model
Source:http://dmreview.com/article_sub.cfm?articleID=1000941 used with permission

introduce?
• Trends
- Agile
- Metadata Modeling
47

Conclusions
• Data Modeling is
important to get right.
• Getting it “right” is
hugely dependent on
the business case,
maturity of the
organization,
flexibility for future
growth, and so much
more.
• There are many
technologies and
ideas available to
help solve a number
of problems.
• Don't try any of this
without considering
the various
architectures involved

Questions?
It’s your turn!
Use the chat feature or Twitter (#dataed) to submit
your questions to Peter, Michael and Steven now.

Upcoming Events
Data Quality Success Stories
September 8, 2015
@ 2:00 PM ET/11:00 AM PT
Design & Manage Data Structures
October 13, 2015  
@ 2:00 PM ET/11:00 AM PT
Sign up here:
• www.datablueprint.com/webinar-schedule
• or www.dataversity.net

Sources
• Data model. (2014, October 7). In Wikipedia, The Free
Encyclopedia. Retrieved October 7, 2014, from http://
en.wikipedia.org/w/index.php?
title=Data_model&oldid=628639882
• Data Modeling 101. (2006). In Agile Data. Retrieved
October 7, 2014, from http://www.agiledata.org/essays/
dataModeling101.html

Trends in Data Modeling

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (20)

Similar to Trends in Data Modeling

Similar to Trends in Data Modeling (20)

More from DATAVERSITY

More from DATAVERSITY (20)

Recently uploaded

Recently uploaded (20)

Trends in Data Modeling