http://informationaction.blogspot.com
Tw: @Alan_D_Duncan
Information Strategy | Data Governance | Analytics | Better Business Outcomes
Example Data Specifications &
Information Requirements Framework
PHYSICAL DATA SPECIFICATION
TEMPLATE
Alan D. Duncan
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
1 Purpose
This document template defines an outline structure for the clear and unambiguous definition of the
discrete data elements (tables, columns, fields etc.) within the physical data management layers of
the required data solution.
This template forms part of an example data specification & information requirements framework. The
framework offers a set of outline principles, standards and guidelines to describe and clarify the
semantic meaning of data terms in support of an Information Requirements Management process.
(See the Framework Overview for further details.)
2 Physical Data Specification
• The template should be completed for each individual data element required within the
Logical Data Model layer (to the extent that this information can be defined).
Data Specification Item Purpose
System Table (File) Name
The database table name (or file name in file-based data stores)
System Column Name
The database or file structure column.
Definition of what characteristic of each row the column describes from a
business perspective.
• This should be based on the concept of what data the column should
contain. Sometimes judgement will be required to draw a line
between what is normal use of a column and what constitutes a
quality issue.
• It is important here to concentrate on the relation of the column to
the table rows and not on how the column is used. Whilst the latter
may be of interest, it should never be a substitute for the former.
• Whilst we should strive for consistency, the language of the system
“owners” should generally be used here.
• Typically, definitions will need to refer to other tables or data entities.
At a detailed level, these entities may have several definitions. It is
important that the references are explicit when referring to a specific
definition.
• Examples should be included wherever this aids understanding.
• Where a column is found to contain de-normalised data, the path of
de-normalisation should be fully described.
• If a column has multiple definitions dependent on row, this should be
clearly described together with an indication of how to determine the
actual definition for each row.
Column Description
Elaboration of the purpose for the column
Column Domain Type
See Appendix A for a suggested list of Column Domains
Data Type & Length
e.g. Varchar 12, Numeric 9.2
Required Status
Record whether a value is always required, from both a physical and a
logical perspective (NULL/NOT NULL constraint).
Primary Key Definition
List of Columns which constitute the primary key plus any other
information pertinent to the identification of rows
Column Relationships
Linkages from the column to other tables (e.g. Foreign Key relationships
to other table/columns).
• Typically, definitions will need to refer to other tables or data entities.
At a detailed level, these entities may have several definitions. It is
important that the references are explicit when referring to a specific
definition.
Column Constraints
Any constraint rules to be applied (e.g. Primary Key, Unique Key rules)
Data row Definitions
Definition of what each row represents from a business perspective.
Any detailed technical rules or semantic encoding defined at the
record/row level of the data store (e.g. Data Quality cleansing rules).
• This should be based on the concept of what data the table should
contain. Sometimes judgement will be required to draw a line
between what is normal use of a table and what constitutes a quality
issue.
• It is important here to concentrate on what the rows represent and
not on how they are used. Whilst the latter may be of interest, it
should never be a substitute for the former.
• Whilst we should strive for consistency, the language of the system
“owners” should generally be used here.
• If a table has sets of rows with quite different characteristics, this
should be clearly described together with an indication of how the
sets can be differentiated. Where appropriate a name should be
allocated to each set and potentially multiple definitions of the table
may need to be recorded. The Master/Copy status of each set
should be recorded.
• Data Scope should be recorded for all significant dimensions
(inclusions & exclusions)
Value Range
The valid set of values for the Column. (Or valid range in case of Date &
Number fields).
(Could be defined as a link or pointer to the location of an underlying
master data set.)
Related Logical Model Data Element(s)
The supporting elements of the canonical model and their lineage with
the physical columns.
Expected Data Volumes
• Number of records
• Size per record
Master/Copy Status
Is this an originating master source (System of Record) for this data set,
or a copy of the originating source?
Data Quality Indications
Any initial indications of poor data quality at row level + Cause +
Business Impact
(NB: Not at a detailed level. This is an indicative assessment only, and
should trigger a more detailed DQ investigation by the Data Governance Unit if
indicated.)
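As an illustration only, the specification items listed above could be captured as a structured record, so that each completed template can be checked for items that are still blank. The Python sketch below is an assumption, not part of the template: the class name `PhysicalColumnSpec`, its field names and the sample `CUSTOMER.CUST_STATUS` column are all hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class PhysicalColumnSpec:
    """One completed template entry (all field names are hypothetical)."""
    table_name: str                   # System Table (File) Name
    column_name: str                  # System Column Name
    definition: str = ""              # business definition of the column
    description: str = ""             # Column Description
    domain_type: str = ""             # Column Domain Type
    data_type: str = ""               # Data Type & Length, e.g. "Varchar 12"
    required: Optional[bool] = None   # Required Status (True = NOT NULL)
    primary_key: List[str] = field(default_factory=list)
    relationships: List[str] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)
    row_definition: str = ""          # Data row Definitions
    value_range: str = ""             # Value Range
    logical_elements: List[str] = field(default_factory=list)
    expected_records: Optional[int] = None
    bytes_per_record: Optional[int] = None
    is_master: Optional[bool] = None  # Master/Copy Status (True = master)
    dq_indications: str = ""          # Data Quality Indications

    def missing_items(self) -> List[str]:
        """Names of items still blank (empty string, empty list or None)."""
        return [f.name for f in fields(self)
                if getattr(self, f.name) in ("", None, [])]


# Hypothetical example entry, only partly filled in.
spec = PhysicalColumnSpec(
    table_name="CUSTOMER",
    column_name="CUST_STATUS",
    definition="Current lifecycle state of the customer.",
    domain_type="Status",
    data_type="Varchar 12",
    required=True,
    primary_key=["CUST_ID"],
    is_master=True,
)
print(spec.missing_items())
```

Because the template is only to be completed to the extent that the information can be defined, a non-empty result here is a prompt for follow-up rather than an error.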
• Good data management and data governance practices require that the physical data storage
of any data solution aligns with the Enterprise Logical (Canonical) Model.
• Data designers must therefore clearly demonstrate that the data structures within any data
system or business application map to and align with the Logical Model.
• Beyond this requirement, data specification is not concerned with the technical details of data
management implementation, and therefore takes no specific interest in the physical design
or technical structure of any data stores or data processing layers therein.
• Note that this physical data definition schema is suitable for both “Source” and “Target” data
definitions.
• Examples should be included wherever this aids understanding.
• Notwithstanding, the expectations of auditability, integrity, traceability and persistence must
be demonstrated.
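One possible way to demonstrate the alignment described above is a simple coverage check that lists any physical column with no lineage to a known logical model element. The Python sketch below is illustrative only; the function name `unmapped_columns`, the data shapes and the sample table, column and element names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

Column = Tuple[str, str]  # (table name, column name)


def unmapped_columns(
    physical_columns: List[Column],
    lineage: Dict[Column, List[str]],
    logical_elements: Set[str],
) -> List[Column]:
    """Return physical columns that map to no known logical element."""
    problems = []
    for col in physical_columns:
        targets = lineage.get(col, [])
        # Aligned only if at least one mapped element exists in the model.
        if not any(t in logical_elements for t in targets):
            problems.append(col)
    return problems


# Hypothetical logical model elements and physical columns.
logical = {"Customer.Identifier", "Customer.Status"}
physical = [
    ("CUSTOMER", "CUST_ID"),
    ("CUSTOMER", "CUST_STATUS"),
    ("CUSTOMER", "LEGACY_FLAG"),
]
lineage = {
    ("CUSTOMER", "CUST_ID"): ["Customer.Identifier"],
    ("CUSTOMER", "CUST_STATUS"): ["Customer.Status"],
}
unmapped = unmapped_columns(physical, lineage, logical)
print(unmapped)
```

A column reported here is not necessarily wrong; it simply has no recorded mapping yet and needs either a lineage entry or a documented reason for sitting outside the Logical Model.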
Appendix A: Column Domains – Candidate list
Name Definition
Amount A Monetary Amount. i.e. a Quantity of a CURRENCY
Code A character string or number which is used for identification purposes.
* has no explicit natural language meaning - i.e. not an English word
Cost/Revenue Amount An Amount of a Currency where:
* positive = Revenue
* negative = Cost
Count
Date
Date/Time specification of seconds ?
Day of Month
Day of Week
Days A number of days.
Description A brief text description.
Details Data with embedded meaning and of a complex format but for which the
meaning cannot be consistently interpreted by a computer system.
Direction Direction of an accounted balance.
DR/CR Amount An Amount of a Currency where:
* positive = DR
* negative = CR
Email Address
External Reference A code or reference for which the format is specified by an external party.
Factor A rate/proportion/ratio in the range 0 to a maximum value.
Frequency Examples: Annual, Half Year, Quarterly, Monthly, Weekly, Daily, Ad Hoc
Indicator Binary Indicator - Yes or No.
Name A meaningful word or phrase used for identification purposes.
Notes Textual Notes.
Ordinal A number indicating a position within a sequence of numbers.
Phone International
Quantity A number of units.
Rate A rate/proportion/percentage.
Status A number or character string used to indicate a state which is likely to change
over time.
Time
Type A number or character string used for classification with a discrete set of
values per column.
* could be an English word or phrase
Year A calendar year. E.g. 2002
Year/Month A month in a specific year. E.g. November 2002
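The sign conventions given for the Cost/Revenue Amount and DR/CR Amount domains can be made explicit in code. The following Python sketch is one hedged interpretation: the names `ColumnDomain`, `SIGN_LABELS` and `sign_label` are hypothetical, only a few of the candidate domains are listed, and the treatment of a zero amount is an assumption because the candidate list does not specify it.

```python
from decimal import Decimal
from enum import Enum


class ColumnDomain(Enum):
    """A few of the candidate domains (hypothetical enum, extend as needed)."""
    AMOUNT = "Amount"
    COST_REVENUE_AMOUNT = "Cost/Revenue Amount"
    DR_CR_AMOUNT = "DR/CR Amount"
    INDICATOR = "Indicator"
    YEAR_MONTH = "Year/Month"


# Sign conventions as listed: (label for positive, label for negative).
SIGN_LABELS = {
    ColumnDomain.COST_REVENUE_AMOUNT: ("Revenue", "Cost"),
    ColumnDomain.DR_CR_AMOUNT: ("DR", "CR"),
}


def sign_label(domain: ColumnDomain, amount: Decimal) -> str:
    """Interpret the sign of a value in a signed-amount domain."""
    positive, negative = SIGN_LABELS[domain]
    if amount > 0:
        return positive
    if amount < 0:
        return negative
    # Assumption: the candidate list does not say how zero is treated.
    return "Zero"


print(sign_label(ColumnDomain.COST_REVENUE_AMOUNT, Decimal("-125.50")))
print(sign_label(ColumnDomain.DR_CR_AMOUNT, Decimal("40.00")))
```

Keeping the convention in one lookup table means that a column's domain type, once recorded in the specification, determines how its sign is read everywhere it is used.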
About the author
Alan D. Duncan is an evangelist for information and analytics as
enablers of better business outcomes, and a member of the
Advisory Board for QFire Software.
An executive-level leader in the field of Information and Data
Management Strategy, Governance and Business Analytics, he
has over 20 years of international business experience, working
with blue-chip companies in a range of industry sectors. Alan
was named by Information-Management.com in their 2012 list of
“Top 12 Data Governance gurus you should be following on
Twitter”.
Twitter: @Alan_D_Duncan
Blog: http://informationaction.blogspot.com.au/
Intellectual curiosity
Skeptical scrutiny
Critical thinking
http://www.informationaction.blogspot.com.au/
@Alan_D_Duncan
http://www.linkedin.com/in/alandduncan
