Get the Most Out of Your Tools: Data Management Technologies
- 1. TITLE
Welcome!
Get the Most out of Your Tools:
Data Management Technologies
Date: November 13, 2012
Time: 2:00 PM ET
Presenter: Dr. Peter Aiken
PRODUCED BY CLASSIFICATION DATE SLIDE
DATA BLUEPRINT 10124-C W. BROAD ST, GLEN ALLEN, VA 23060 EDUCATION 1
11/06/12 © Copyright this and previous years by Data Blueprint - all rights reserved!
- 2. TITLE
Get Social With Us!
Live Twitter Feed
Join the conversation! Follow us: @datablueprint @paiken
Ask questions and submit your comments: #dataed
Like Us on Facebook
www.facebook.com/datablueprint
Post questions and comments
Find industry news, insightful content and event updates.
Join the Group
Data Management & Business Intelligence
Ask questions, gain insights and collaborate with fellow data management professionals
- 3. TITLE
Meet Your Presenter: Dr. Peter Aiken
• Internationally recognized thought-leader in
the data management field – 30 years of
experience
• Recipient of multiple international
awards
• Founder, Data Blueprint
http://datablueprint.com
• 7 books and dozens of articles
• Experienced w/ 500+ data management
practices in 20 countries
• Multi-year immersions with organizations
as diverse as the US DoD, Deutsche Bank,
Nokia, Wells Fargo, the Commonwealth of
Virginia and Walmart
- 4. TITLE
Data Management Technologies
- 5. TITLE
Outline
1. Data Management Overview
2. Data Management Tools Overview
3. Data Technology Architecture
4. CASE Tools
5. Repositories
6. Profiling/Discovery Tools
7. Data Quality Engineering Tools
8. Data Life Cycle
9. Other Technologies
10.Q&A
Tweeting now:
#dataed
- 6. TITLE
The DAMA Guide to the Data Management Body of Knowledge
Published by DAMA
International
• The professional
association for Data
Managers (40
chapters worldwide)
DMBoK organized
around
• Primary data
management
functions focused
around data delivery
to the organization
• Organized around
several
environmental
elements
Data Management
Functions
- 7. TITLE
The DAMA Guide to the Data Management Body of Knowledge
Amazon: http://www.amazon.com/DAMA-Guide-Management-Knowledge-DAMA-DMBOK/dp/0977140083
Or enter the terms "dama dm bok" at the Amazon search engine
Environmental Elements
- 8. TITLE
What is the CDMP?
• Certified Data Management
Professional
• DAMA International and ICCP
• Membership in a distinct group made
up of your fellow professionals
• Recognition for your specialized
knowledge in a choice of 17 specialty
areas
• Series of 3 exams
• For more information, please visit:
– http://www.dama.org/i4a/pages/index.cfm?pageid=3399
– http://iccp.org/certification/designations/cdmp
- 9. TITLE
Data Management
- 10. TITLE
Data Management
• Data Program Coordination: Manage data coherently.
• Organizational Data Integration: Share data across boundaries.
• Data Stewardship: Assign responsibilities for data.
• Data Development: Engineer data delivery systems.
• Data Support Operations: Maintain data availability.
- 11. TITLE
Data Management
- 12. TITLE
Outline
1. Data Management Overview
2. Data Management Tools Overview
3. Data Technology Architecture
4. CASE Tools
5. Repositories
6. Profiling/Discovery Tools
7. Data Quality Engineering Tools
8. Data Life Cycle
9. Other Technologies
10.Q&A
- 13. TITLE
Tools and Methods Are Required!
- 14. TITLE
Sample Existing Environment
[Figure: existing environment with Marketing, Logistics, HR, Finance, Manufacturing, R&D #1, R&D #2, R&D #3, and BackOffice applications, stored across systems flat files, RDBMS 1, RDBMS 2, and a network database]
- 15. TITLE
Reengineering is typically the problem solution…
Reverse engineering (existing):
• As Is Data Implementation Assets
• As Is Data Design Assets
• As Is Information Requirements Assets
Forward engineering (new):
• To Be Requirements Assets
• To Be Design Assets
• To Be Data Implementation Assets
- 16. TITLE
Bibiana Duet's Example
Query Outputs
- 17. TITLE
Data Management Technologies
• Managing data technology should follow the
same principles and standards for managing any
technology
• Leading reference model for technology
management is the Information Technology
Infrastructure Library (ITIL):
http://www.itil-officialsite.com/home/home.asp
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
- 18. TITLE
Understanding Data Technology Requirements
Need to understand:
• How the technology works
• How it provides value in the context of a particular
business
• Requirements of a data technology before determining
what technical solution to choose for a particular situation
Suggested questions:
• What problem does this data technology mean to solve?
• What sets this data technology apart from others?
• Are there specific hardware/software/operating systems/
storage/network/connectivity requirements?
• Does this technology include data security functionality?
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
- 19. TITLE
Outline
1. Data Management Overview
2. Data Management Tools Overview
3. Data Technology Architecture
4. CASE Tools
5. Repositories
6. Profiling/Discovery Tools
7. Data Quality Engineering Tools
8. Data Life Cycle
9. Other Technologies
10.Q&A
- 20. TITLE
Defining Data Technology Architecture
• Data technology is part of the overall technology
architecture
• It is also often considered part of the enterprise’s
data architecture
• Data technology architecture addresses 3
questions:
– What technologies are standard/required/
preferred/acceptable?
– Which technologies apply to which purposes
and circumstances?
– In a distributed environment, which
technologies exist where, and how does data
move from one node to another?
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
- 21. TITLE
Data Technology Architecture, cont’d
Data technologies to be included in the technology
architecture:
• Database management systems (DBMS) software
• Related database management utilities
• Data modeling and model management software
• Business intelligence software for reporting and analysis
• Extract-transform-load (ETL) and other data integration
tools
• Data quality analysis and data cleansing tools
• Metadata management software, including metadata
repositories
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
- 22. TITLE
Data Technology Architecture, cont’d
• The technology roadmap
for the organization
consists of technology
objectives as well as
reviewed, approved, and
published technology
architecture components
• This strategic roadmap
can be used to inform and
direct future data
technology research and
project work
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
- 23. TITLE
Polling Question #1
What is one important thing to understand
about technology?
a) It is sometimes free
b) Buying the same technology
that everyone else is using,
and using it in the same way
will create business value
c) It should always be regarded
as the means to an end,
rather than the end itself
- 24. TITLE
Data Technology Architecture, cont’d
• It is important to understand several things
about technology:
– It is never free. Even open-sourced
technology requires care and feeding.
– It should always be regarded as the means to
an end, rather than the end itself.
– Most importantly: Buying the same technology
that everyone else is using, and using it in the
same way, does not create business value or
competitive advantage.
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
- 25. TITLE
Outline
1. Data Management Overview
2. Data Management Tools Overview
3. Data Technology Architecture
4. CASE Tools
5. Repositories
6. Profiling/Discovery Tools
7. Data Quality Engineering Tools
8. Data Life Cycle
9. Other Technologies
10.Q&A
- 26. TITLE
CASE Tools
Computer Aided Software/Systems Engineering Tools
• Scientific application of a set of tools and methods to a software system which is meant to result in high-quality, defect-free, and maintainable software products
• Refers to methods for the development of information systems together with automated tools that can be used in the software development process
• CASE functions include analysis, design, and programming
Source: http://en.wikipedia.org/wiki/
- 27. TITLE
CASE Tools: Example(s)
• Microsoft
– Visio
– PowerPoint
– Excel
• ERwin
• ER/Studio
List of CASE Tools: http://www.unl.csi.cuny.edu/faqs/software-enginering/tools.html
- 28. TITLE
Figure 18.2 Sample budget for implementing a $2500/seat CASE technology can be $2.5 million over a 5-year period
[adapted from Huff "Elements of a Realistic CASE Tool Adoption Budget" © 1992 Communications of the ACM]
$187K = $2500/seat × 75 seats
$360K = training
$500K = workstations
$150K = assessment costs
$910K = total initial investment
$150K = in-house support
$ 55K = hardware and software maintenance
$ 60K = ongoing training and misc.
$265K = annual additional investment
× 5 years
$1325K investment over 5 years
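The arithmetic behind this budget can be checked with a short script. This is only a sketch that uses the amounts printed on the slide, in thousands of dollars; the variable names are illustrative. Summed item by item, the four initial costs come to $1,197.5K rather than the $910K printed as the initial total, and it is that item-level sum plus five years of annual costs that lands near the $2.5 million in the caption. The slide does not say which printed figure is off.

```python
# Amounts in thousands of dollars, copied from the slide (names are illustrative).
SEATS = 75
PRICE_PER_SEAT = 2.5  # $2500/seat
YEARS = 5

initial_items = {
    "licenses": SEATS * PRICE_PER_SEAT,  # 187.5, printed on the slide as $187K
    "training": 360,
    "workstations": 500,
    "assessment": 150,
}
annual_items = {
    "in_house_support": 150,
    "hw_sw_maintenance": 55,
    "ongoing_training_misc": 60,
}

initial_total = sum(initial_items.values())
annual_total = sum(annual_items.values())
recurring_total = annual_total * YEARS
grand_total = initial_total + recurring_total

print(f"initial investment (sum of items): ${initial_total:,.1f}K")
print(f"annual additional investment: ${annual_total:,.0f}K")
print(f"recurring cost over {YEARS} years: ${recurring_total:,.0f}K")
print(f"total over {YEARS} years: ${grand_total:,.1f}K")
```

The recurring side matches the slide exactly ($265K per year, $1325K over five years).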
- 29. TITLE
CASE Tool: "Taxonomy"
• Senders—flows from the
CASE effort that can
inform the re-architecting
effort.
• Receivers —flows from
the project that can inform
the CASE effort.
• Senders and receivers
—some elements, such as
restructuring and
reengineering, are both
senders and receivers.
[adapted from Jones Assessment and Control
of Software © 1994 Prentice-Hall]
- 30. TITLE
CASE-based XML Support http://www.visible.com
- 31. TITLE
Changing Model of CASE Tool Usage
• Everything must "fit" into one CASE technology
– Limited access from outside the technology environment
– CASE tool-specific methods and technologies
– Limited additional metadata use
• XML integration of the metadata
– A variety of CASE-based methods and technologies can access and update the metadata
– Additional metadata uses accessible via: web; portal; XML; RDBMS
- 32. TITLE
Outline
1. Data Management Overview
2. Data Management Tools Overview
3. Data Technology Architecture
4. CASE Tools
5. Repositories
6. Profiling/Discovery Tools
7. Data Quality Engineering Tools
8. Data Life Cycle
9. Other Technologies
10.Q&A
- 33. TITLE
Repositories have been difficult to "sell"
21 September 1999
Michael Blechar, Lisa Wallace
Management Summary
Most executive and IS managers view an IT metadata repository as
an esoteric technology that is not directly related to the business.
However, as will be seen, an IT metadata repository can substantially
help IS organizations support the applications, which in turn support
the business. An IT metadata repository is a pre-built system and
reference database where the IS organizations can track and
manage the information about the applications and databases they
build and maintain; think of it as the inventory and change impact
reporting system for IS. These repositories track metadata such as
the descriptions of jobs, programs, modules, screens, data and
databases, and the interrelationships between them. Metadata differs
from the actual data being described. Metadata is information about
data. For example, the metadata descriptions in the repository tell
one that the field "customer number" appears in Databases A, B and
F ...
[From gartner.com]
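To make the "customer number" example concrete, here is a minimal sketch of the kind of where-used lookup a metadata repository answers. The in-memory catalog, the table and field names, and the `where_used` helper are all hypothetical; they are not taken from any repository product.

```python
from collections import defaultdict

# Hypothetical catalog: database -> table -> list of field names.
CATALOG = {
    "Database A": {"customer": ["customer number", "name"]},
    "Database B": {"orders": ["order id", "customer number"]},
    "Database C": {"products": ["sku", "description"]},
    "Database F": {"invoices": ["invoice id", "customer number", "amount"]},
}


def build_field_index(catalog):
    """Invert the catalog so each field name maps to the places it appears."""
    index = defaultdict(list)
    for database, tables in catalog.items():
        for table, fields in tables.items():
            for field in fields:
                index[field].append((database, table))
    return index


def where_used(index, field):
    """Return the sorted names of the databases that contain the given field."""
    return sorted({database for database, _ in index.get(field, [])})


index = build_field_index(CATALOG)
print(where_used(index, "customer number"))
```

The same inverted index is what makes change impact reporting possible: before altering a field, list every database and table that would be touched.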
- 34. TITLE
Repository Technologies in Use
What tools do you use?
• Almost one in two organizations (45%) doesn't use repository technology
• Almost one in four organizations (23%) is building their own repository technology
• The "traditional" players (CA & Rochade) are in use in 16% of organizations surveyed
[Bar chart, Number Responding=181: None 45%; HomeGrown 23%; Other 13%; CA Platinum 9%; Rochade 7%; Universal 2%; DesignBank 1%; DWGuide 1%; InfoManager 1%; Interface Metadata Tool 1%]
- 35. TITLE
Repository Evolution
Traditional
§ Passive Analysis
§ Relational & Data Warehouse
§ Batch & Reports
§ Optional not critical
§ Proprietary & OIM
Evolving
§ Standards – investment protection: MOF
§ Openness, Simplification & Choice: XMI
§ Diverse metadata management (including messaging)
§ Real time and ad hoc for decision support
§ Daily business value within a production architecture
- 36. TITLE
Metadata Repositories 2004
"However, due to
cost (these tools
start at about
$150,000, but
frequently exceed $1
million) and being
slow to market in
terms of support for
new service-oriented
architectures
(SOAs), CA and ASG
have opened the
door to smaller
competitors"
- 37. TITLE
IBM's AD/Cycle Information Model
• Application Build Model: Defines the tools, parameters and environment required to build an automated Business Application.
• Applications Structure Model: Defines the overall scope of an automated Business Application, the components of the application and how they fit together.
• Business Goals Model: Defines the mission of the enterprise, its long-range goals, and the business policies and assumptions that affect its operations.
• Business Rules Model: Records rules that govern the operation of the business and the Business Events that trigger execution of Business Processes.
• Data Structures Model: Defines the data structures and their elements used in an automated Business Application.
• DB2 Model: Refines the definition of a Relational Database design to a DB2-specific design.
• Derivations/Constraints Model: Records the rules for deriving legal values for instances of Entity-Relationship Model components, and for controlling the use or existence of E-R instances.
• Enterprise Structure Model: Defines the scope of the enterprise to be modeled. Assigns a name to the model that serves to qualify each component of the model.
• Entity-Relationship Model: Defines the Business Entities, their properties (attributes) and the relationships they have with other Business Entities.
• Extension Support Model: Provides for tactical Information Model extensions to support special tool needs.
• Flow Model: Specifies which of the Entity-Relationship Model component instances are passed between Process Model components.
• Global Text Model: Supports recording of extended descriptive text for many of the Information Model components.
• IMS Structures Model: Defines the component structures and elements and the application program views of an IMS Database.
• Info Usage Model: Specifies which of the Entity-Relationship Model component instances are used by other Information Model components.
• Library Model: Records the existence of non-repository files and the role they play in defining and building an automated Business Application.
• Organization/Location Model: Records the organization structure and location definitions for use in describing the enterprise.
• Panel/Screen Model: Identifies the Panels and Screens and the fields they contain as elements used in an automated Business Application.
• Process Model: Defines Business Processes, their sub processes and components.
• Program Elements Model: Identifies the various pieces and elements of application program source that serve as input to the application build process.
• Relational Database Model: Describes the components of a Relational Database design in terms common to all SAA relational DBMSs.
• Resource/Problem Model: Identifies the problems and needs of the enterprise, the projects designed to address those needs, and the resources required.
• Strategy Model: Records business strategies to resolve problems, address goals, and take advantage of business opportunities. It also records the actions and steps to be taken.
• Test Model: Identifies the various files (test procedures, test cases, etc.) affiliated with an automated Business Application for use in testing that application.
• Value Domain Model: Defines the data characteristics and allowed values for information items.
- 38. TITLE
Implementing Metadata Repository Functionality
• "The repository" does not have to be an integrated
solution
– it must be an easily integrable solution
• Repository functionality (does not equal a)
repository
– metadata must easily evolve to repository solution
• Multiple repositories are not necessarily bad
– as interim solutions, Excel has been working quite well
• Minimal functionality includes ability to create,
read, update, delete, and evolve metadata items
• Remember the 1st law of data management
– In order to manage metadata, you need metadata
repository functions
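The minimal functionality listed above (create, read, update, delete, and evolve) can be sketched in a few lines. This is an illustrative in-memory store, not a reference design: the `MetadataStore` class is hypothetical, and treating "evolve" as recording a new version while keeping the old one is an assumption made for the example.

```python
import copy


class MetadataStore:
    """Tiny in-memory store; each item keeps a list of versions, oldest first."""

    def __init__(self):
        self._items = {}

    def create(self, name, attributes):
        if name in self._items:
            raise KeyError(f"{name!r} already exists")
        self._items[name] = [dict(attributes)]

    def read(self, name):
        # Return a copy so callers cannot alter stored history by accident.
        return copy.deepcopy(self._items[name][-1])

    def update(self, name, **changes):
        # Overwrite attributes of the current version in place.
        self._items[name][-1].update(changes)

    def evolve(self, name, **changes):
        # Assumption: "evolve" records a new version and keeps the previous one.
        new_version = copy.deepcopy(self._items[name][-1])
        new_version.update(changes)
        self._items[name].append(new_version)

    def history(self, name):
        return copy.deepcopy(self._items[name])

    def delete(self, name):
        del self._items[name]


store = MetadataStore()
store.create("customer number", {"type": "CHAR(8)", "steward": "Sales"})
store.update("customer number", steward="Customer Care")
store.evolve("customer number", type="CHAR(12)")
print(store.read("customer number"))
print(len(store.history("customer number")), "versions kept")
```

A spreadsheet can cover the first four operations; keeping history is usually the first thing that pushes a team toward something more structured.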
- 39. TITLE
Outline
1. Data Management Overview
2. Data Management Tools Overview
3. Data Technology Architecture
4. CASE Tools
5. Repositories
6. Profiling/Discovery Tools
7. Data Quality Engineering Tools
8. Data Life Cycle
9. Other Technologies
10.Q&A
- 40. TITLE
Data Profiling/Discovery/Analysis Technologies
• Data analysis software technologies deliver up to 10X
productivity over manual approaches
• Based on a powerful computing technology that allows data
engineers to quickly form candidate hypotheses with respect
to the existing data structures
• Hypotheses are then presented to the SMEs (both business
and technical) who confirm, refine, or deny them
• Allows existing data structures to be inferred at a rate that is
an order of magnitude more effective than previous manual
approaches
• Pioneers include Evoke->CSI, Metagenix->Ascential->IBM,
Sypherlink
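As a rough illustration of how a profiling tool might form candidate hypotheses for SMEs to confirm, refine, or deny, the sketch below proposes candidate keys and mostly-empty columns from a handful of sample rows. The sample data, the threshold, and the `profile_columns` function are invented for the example; commercial tools use far richer techniques.

```python
def profile_columns(rows, null_threshold=0.5):
    """Return one hypothesis record per column for SME review."""
    if not rows:
        return []
    total = len(rows)
    hypotheses = []
    for column in rows[0]:
        values = [row.get(column) for row in rows]
        non_null = [v for v in values if v not in (None, "")]
        distinct = len(set(non_null))
        null_ratio = (total - len(non_null)) / total
        hypotheses.append({
            "column": column,
            "distinct": distinct,
            "null_ratio": null_ratio,
            # Unique and fully populated: worth asking whether it is a key.
            "candidate_key": len(non_null) == total and distinct == total,
            # Rarely populated: worth asking whether the column is still used.
            "mostly_empty": null_ratio > null_threshold,
        })
    return hypotheses


SAMPLE = [
    {"cust_no": "C001", "state": "VA", "fax": ""},
    {"cust_no": "C002", "state": "VA", "fax": None},
    {"cust_no": "C003", "state": "MD", "fax": "804-555-0100"},
]

for hypothesis in profile_columns(SAMPLE):
    print(hypothesis)
```

Each record is a question for a subject matter expert, not a conclusion: uniqueness in a sample does not prove that a column is a key.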
- 41. How has this been done in the past?
Old
• Manually
• Brute force
• Repository dependent
• Quality indifferent
• Not repeatable
New
• Semi-automated
• Engineered
• Repository independent
• Integrated quality
• Repeatable
• Currency
• Accuracy
- 42. TITLE
Select an Attribute to get a list of values
Double-click a value to see rows with that value
- 43. TITLE
Outline
1. Data Management Overview
2. Data Management Tools Overview
3. Data Technology Architecture
4. CASE Tools
5. Repositories
6. Profiling/Discovery Tools
7. Data Quality Engineering Tools
8. Data Life Cycle
9. Other Technologies
10.Q&A
- 44. TITLE
Data Quality Engineering Tools
4 categories of activities:
1) Analysis
2) Cleansing
3) Enhancement
4) Monitoring
Principal tools:
1) Data Profiling
2) Parsing and Standardization
3) Data Transformation
4) Identity Resolution and Matching
5) Enhancement
6) Reporting
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
- 45. TITLE
DQ Tools: (1) Data Profiling
• Need to be able to distinguish between good and bad data before making any improvements
• Data profiling is a set of algorithms for 2 purposes:
– Statistical analysis and assessment of the data quality values within a data set
– Exploring relationships that exist between value collections within and across data sets
DQ Tools: (2) Parsing & Standardization
• Data parsing tools enable the definition of patterns that feed into a rules engine used to distinguish between valid and invalid data values
• Actions are triggered upon matching a specific pattern
• When an invalid pattern is recognized, the application may attempt to transform the invalid value into one that meets expectations
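Parsing and standardization can be sketched with ordinary regular expressions: one pattern defines a valid value, and each known invalid pattern triggers an action that transforms the value into the expected form. The phone number format, the rule list, and the `standardize_phone` function are assumptions chosen for illustration, not part of any data quality product.

```python
import re

# Target representation assumed for this example: NNN-NNN-NNNN.
VALID = re.compile(r"\d{3}-\d{3}-\d{4}")

# Each rule pairs a recognized invalid pattern with a template that repairs it.
REPAIR_RULES = [
    (re.compile(r"\((\d{3})\)\s*(\d{3})-(\d{4})"), r"\1-\2-\3"),  # (804) 555-0100
    (re.compile(r"(\d{3})\.(\d{3})\.(\d{4})"), r"\1-\2-\3"),      # 804.555.0100
    (re.compile(r"(\d{3})(\d{3})(\d{4})"), r"\1-\2-\3"),          # 8045550100
]


def standardize_phone(value):
    """Return (value, status) where status is 'valid', 'repaired', or 'rejected'."""
    text = value.strip()
    if VALID.fullmatch(text):
        return text, "valid"
    for pattern, template in REPAIR_RULES:
        match = pattern.fullmatch(text)
        if match:
            # Matching a known invalid pattern triggers the repair action.
            return match.expand(template), "repaired"
    # No rule applies: leave the value unchanged and flag it for review.
    return text, "rejected"


for raw in ["804-555-0100", "(804) 555-0100", "804.555.0100", "8045550100", "555-0100"]:
    print(raw, "->", standardize_phone(raw))
```

Values that match no rule are returned unchanged and flagged, which keeps the tool from guessing at data it does not understand.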
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
- 46. TITLE
DQ Tools: (3) Data Transformation
• Upon identification of data errors, trigger data rules to transform the flawed data
• Perform standardization and guide rule-based transformations by mapping data values in their original formats and patterns into a target representation
• Parsed components of a pattern are subjected to rearrangement, corrections, or any changes as directed by the rules in the knowledge base
DQ Tools: (4) Identity Resolution & Matching
2 basic approaches to matching:
• Deterministic
– Relies on defined patterns and rules for assigning weights and scores to determine similarity
– Predictable
– Only as good as anticipations of the rules developers
• Probabilistic
– Relies on statistical techniques for assessing the probability that any pair of records represents the same entity
– Not reliant on rules
– Probabilities can be refined based on experience -> matchers can improve precision as more data is analyzed
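The two matching approaches can be contrasted in a short sketch. The deterministic matcher uses hand-assigned weights and a fixed threshold. The probabilistic one combines per-field agreement probabilities under a naive independence assumption, in the general style of Fellegi-Sunter record linkage. All weights, probabilities, field names, and records here are placeholders invented for the example; a real matcher would estimate and refine them from data.

```python
import math

FIELDS = ("last_name", "zip", "birth_year")

# Deterministic: hand-written integer weights and a fixed threshold (invented values).
RULE_WEIGHTS = {"last_name": 50, "zip": 20, "birth_year": 30}
RULE_THRESHOLD = 70


def agreement(a, b):
    """Field-by-field exact agreement after trimming and case folding."""
    return {f: str(a[f]).strip().lower() == str(b[f]).strip().lower() for f in FIELDS}


def deterministic_match(a, b):
    """Return (is_match, score) using the fixed rule weights."""
    score = sum(RULE_WEIGHTS[f] for f, same in agreement(a, b).items() if same)
    return score >= RULE_THRESHOLD, score


# Probabilistic: m = P(field agrees | same entity), u = P(field agrees | different entities).
# Placeholder numbers; refining them as more data is analyzed is how precision improves.
M_U = {"last_name": (0.95, 0.01), "zip": (0.90, 0.05), "birth_year": (0.98, 0.02)}


def match_probability(a, b, prior=0.01):
    """Combine per-field likelihood ratios with a prior, assuming independent fields."""
    log_odds = math.log(prior / (1 - prior))
    for f, same in agreement(a, b).items():
        m, u = M_U[f]
        log_odds += math.log(m / u) if same else math.log((1 - m) / (1 - u))
    return 1 / (1 + math.exp(-log_odds))


rec1 = {"last_name": "Smith", "zip": "23060", "birth_year": 1970}
rec2 = {"last_name": "SMITH ", "zip": "23060", "birth_year": 1970}
rec3 = {"last_name": "Smith", "zip": "10001", "birth_year": 1985}

print(deterministic_match(rec1, rec2), round(match_probability(rec1, rec2), 4))
print(deterministic_match(rec1, rec3), round(match_probability(rec1, rec3), 4))
```

The deterministic result is predictable and easy to explain, while the probabilistic score changes whenever the underlying estimates are updated.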
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
- 47. TITLE
DQ Tools: (5) Enhancement
Definition:
• A method for adding value to information by accumulating additional information about a base set of entities and then merging all the sets of information to provide a focused view
Examples of data enhancements:
• Time/date stamps
• Auditing information
• Contextual information
• Geographic information
• Demographic information
• Psychographic information
DQ Tools: (6) Reporting
Good reporting supports:
• Inspection and monitoring of conformance to data quality expectations
• Monitoring performance of data stewards conforming to data quality SLAs
• Workflow processing for data quality incidents
• Manual oversight of data cleansing and correction
Associate report results w/:
• Data quality measurement
• Metrics
• Activity
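Enhancement, in the sense defined on this slide, amounts to merging extra information about a base set of entities into one view. A minimal sketch follows, assuming a hypothetical ZIP-code reference table and invented field names; it adds geographic information, a time/date stamp, and a simple audit field.

```python
from datetime import datetime, timezone

BASE = [
    {"cust_no": "C001", "zip": "23060"},
    {"cust_no": "C002", "zip": "99999"},
]

# Hypothetical reference data keyed by ZIP code.
GEO = {"23060": {"city": "Glen Allen", "state": "VA"}}


def enhance(records, geo_lookup, source="geo-reference", now=None):
    """Return new records with geographic, audit, and time-stamp fields merged in."""
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    enhanced = []
    for record in records:
        extra = geo_lookup.get(record["zip"], {})
        enhanced.append({
            **record,
            **extra,                                     # geographic information
            "enhanced_at": stamp,                        # time/date stamp
            "enhanced_from": source if extra else None,  # auditing information
        })
    return enhanced


for row in enhance(BASE, GEO):
    print(row)
```

Records with no match in the reference data pass through unchanged apart from the stamp, so the audit field shows which rows were actually enhanced.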
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
- 48. TITLE
Outline
1. Data Management Overview
2. Data Management Tools Overview
3. Data Technology Architecture
4. CASE Tools
5. Repositories
6. Profiling/Discovery Tools
7. Data Quality Engineering Tools
8. Data Life Cycle
9. Other Technologies
10.Q&A
- 49. TITLE
Traditional Quality Life Cycle
- 50. TITLE
Data Life Cycle Model
• Metadata Creation
• Metadata Structuring
• Metadata Refinement
• Data Creation
• Data Storage
• Data Utilization
• Data Manipulation
• Data Assessment
• Data Refinement
- 51. TITLE
Extended data life cycle model with metadata sources and uses
[Figure: the data life cycle diagram annotated with the activities in each phase and the flows between phases]
Metadata Creation
• Define Data Architecture
• Define Data Model Structures
Metadata Structuring
• Implement Data Model Views
• Populate Data Model Views
Metadata Refinement
• Correct Structural Defects
• Update Implementation
Data Creation
• Create Data
• Verify Data Values
Metadata & Data Storage
Data Assessment
• Assess Data Values
• Assess Metadata
Data Refinement
• Correct Data Value Defects
• Re-store Data Values
Data Utilization
• Inspect Data
• Present Data
Data Manipulation
• Manipulate Data
• Update Data
Flow labels in the diagram: architecture refinements, data architecture, data architecture and data models, corrected data, data, data performance metadata, facts & meanings, shared data, updated data
Starting points marked in the diagram: one for new system development, one for existing systems
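The phases and activities named on this slide can be captured as a simple lookup structure, which may help when tagging tools or tasks by life cycle phase. This is only a sketch: the phase and activity names come from the slide, while the dictionary layout and the helper name phase_of are assumptions made for illustration.

```python
# Phases and activities as named on the slide. The dictionary layout and the
# helper below are illustrative assumptions, not part of the model itself.
DATA_LIFE_CYCLE = {
    "Metadata Creation": ["Define Data Architecture", "Define Data Model Structures"],
    "Metadata Structuring": ["Implement Data Model Views", "Populate Data Model Views"],
    "Metadata Refinement": ["Correct Structural Defects", "Update Implementation"],
    "Data Creation": ["Create Data", "Verify Data Values"],
    "Metadata & Data Storage": [],  # the slide lists no activities for storage
    "Data Assessment": ["Assess Data Values", "Assess Metadata"],
    "Data Refinement": ["Correct Data Value Defects", "Re-store Data Values"],
    "Data Utilization": ["Inspect Data", "Present Data"],
    "Data Manipulation": ["Manipulate Data", "Update Data"],
}


def phase_of(activity):
    """Return the phase that lists the given activity, or None if none does."""
    for phase, activities in DATA_LIFE_CYCLE.items():
        if activity in activities:
            return phase
    return None


print(phase_of("Verify Data Values"))
```

A flat mapping is used here because the slide text does not spell out the exact transitions between phases; a graph structure would need those edges taken from the original diagram.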
- 52. TITLE
Outline
1. Data Management Overview
2. Data Management Tools Overview
3. Data Technology Architecture
4. CASE Tools
5. Repositories
6. Profiling/Discovery Tools
7. Data Quality Engineering Tools
8. Data Life Cycle
9. Other Technologies
10. Q&A
Tweeting now:
#dataed
- 53. TITLE
Other Technologies
Data Integration Definition:
• Pulling together and reconciling dispersed data for
analytic purposes that organizations have maintained in
multiple, heterogeneous systems. Data needs to be
accessed and extracted, moved and loaded, validated
and cleaned, standardized and transformed.
• Other tools include:
– Servers
– EII technologies
– Portals
– Conversion tools
Source: http://www.information-management.com
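The steps in the definition above (access and extract, validate and clean, standardize and transform, load) can be sketched in a few lines. Everything concrete here is an assumption for illustration: the two source systems, their field names, the sample rows, and the country-code table are invented, and a real integration tool would do far more.

```python
# Hypothetical rows from two heterogeneous source systems (invented sample data).
CRM_ROWS = [{"id": "1", "name": " Ada  Lovelace ", "country": "uk"}]
BILLING_ROWS = [
    {"cust_id": "2", "full_name": "Alan Turing", "ctry": "GB"},
    {"cust_id": "", "full_name": "No Id", "ctry": "US"},  # fails validation
]

# Assumed standardization table mapping source spellings to one code set.
COUNTRY_CODES = {"uk": "GB", "gb": "GB", "us": "US"}


def extract():
    """Access each source and map its rows onto shared field names."""
    for row in CRM_ROWS:
        yield {"id": row["id"], "name": row["name"], "country": row["country"]}
    for row in BILLING_ROWS:
        yield {"id": row["cust_id"], "name": row["full_name"], "country": row["ctry"]}


def is_valid(record):
    """Validate: a record needs a numeric identifier."""
    return record["id"].strip().isdigit()


def clean_and_standardize(record):
    """Clean whitespace and standardize types and country codes."""
    return {
        "id": int(record["id"]),
        "name": " ".join(record["name"].split()),
        "country": COUNTRY_CODES.get(record["country"].strip().lower(), "UNKNOWN"),
    }


def integrate():
    """Load the reconciled records into a single target keyed by id."""
    target = {}
    for record in extract():
        if is_valid(record):
            row = clean_and_standardize(record)
            target[row["id"]] = row
    return target


print(integrate())
```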
- 54. TITLE
Polling Question #2
Which is not a strategic technology trend in 2013?
a) Hybrid IT and Cloud Computing
b) App and Cloud Computing
c) Personal Cloud
- 55. TITLE
Top 10 Strategic Tech Trends in 2013
1. Mobile Device Battles: By 2013 mobile phones will overtake
PCs as the most common Web access device worldwide.
2. Mobile Applications and HTML5: For the next few years, no
single tool will be optimal for all types of mobile application, so
expect to employ several.
3. Personal Cloud: The personal cloud will gradually replace the
PC as the location where individuals keep their personal content.
4. Enterprise App Stores: Enterprises face a complex app store
future, as some vendors will limit their stores to specific devices
and types of apps, forcing the enterprise to deal with multiple
stores.
5. The Internet of Things: The Internet of Things (IoT) is a concept
that describes how the Internet will expand as physical items
such as consumer devices and physical assets are connected to
the Internet.
Source: http://www.gartner.com/it/page.jsp?id=2209615
- 56. TITLE
Top 10 Strategic Tech Trends in 2013
6. Hybrid IT and Cloud Computing: As staffs have been asked to do
more with less, IT departments must play multiple roles in
coordinating IT-related activities, and cloud computing is now
pushing that change to another level.
7. Strategic Big Data: Big Data is moving from a focus on individual
projects to an influence on enterprises' strategic information
architecture.
8. Actionable Analytics: Analytics is increasingly delivered to users at
the point of action and in context.
9. In-Memory Computing: In-memory computing (IMC) can also
provide transformational opportunities.
10. Integrated Ecosystems: The market is undergoing a shift to more
integrated systems and ecosystems and away from loosely coupled
heterogeneous approaches.
Source: http://www.gartner.com/it/page.jsp?id=2209615
- 57. TITLE
XML Server Types: Integration, Mediation, Repository
XML Integration Server Requirements
• Traditional Integration with Existing Systems
– Message Oriented Middleware
– “EAI” Adapters
• Validation
– Using XML Schema or DTD
• Query Multiple Integration Points using XQuery
• Ease of Defining Mappings
– XML to Existing Systems
– Existing Systems Creating XML
• APIs for XML
Adapted from Steve Hamby "Understanding XML Servers" DAMA/Metadata Conference April 2003, Orlando, FL
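The two mapping directions named on this slide (existing systems creating XML, and XML feeding existing systems) can be illustrated with a minimal sketch using Python's standard xml.etree.ElementTree module. The record layout, the element names, and the function names are assumptions for illustration; an actual integration server would define such mappings through its own tooling.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record, root_tag="customer"):
    """Existing system -> XML: one child element per field of the record."""
    root = ET.Element(root_tag)
    for field, value in record.items():
        ET.SubElement(root, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text):
    """XML -> existing system: read child elements back into a flat dict."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


# Hypothetical row from a legacy system.
legacy_row = {"id": 42, "name": "Ada"}
xml_text = record_to_xml(legacy_row)
print(xml_text)
print(xml_to_record(xml_text))
```

Note that values come back as strings after the round trip; restoring types would be the job of a schema (XML Schema or DTD validation, as the slide lists), which the standard library module used here does not perform.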
- 58. TITLE
XML Server Types: Integration, Mediation, Repository
XML Mediation Server Requirements
• XML Standards Based
– Ensures eXtensibility
– Changing documents / applications
– Transformation to new outputs
• Validation
– Using XML Schema or DTD
– Business Rules
• Integration with Existing Systems / Integration Servers
• Ease of Defining Rules via GUI for Business User
– IT Should Not Have to be Involved
Adapted from Steve Hamby "Understanding XML Servers" DAMA/Metadata Conference April 2003, Orlando, FL
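As a rough illustration of two of the requirements above, business-rule validation and transformation to new outputs, the sketch below checks an XML order against named rules and then reshapes it into a different document. The order format, the rule names, and the shippingNotice output are all hypothetical; a mediation server would let business users define such rules through a GUI rather than in code.

```python
import xml.etree.ElementTree as ET


def _quantity(order):
    """Read <quantity> as a non-negative integer, treating bad input as 0."""
    text = (order.findtext("quantity") or "").strip()
    return int(text) if text.isdigit() else 0


# Hypothetical business rules: rule name -> check on the parsed order element.
RULES = {
    "quantity must be positive": lambda order: _quantity(order) > 0,
    "customer is required": lambda order: bool((order.findtext("customer") or "").strip()),
}


def violations(xml_text):
    """Return the names of all rules the document breaks."""
    order = ET.fromstring(xml_text)
    return [name for name, check in RULES.items() if not check(order)]


def to_shipping_notice(xml_text):
    """Transform an order document into a new (assumed) output format."""
    order = ET.fromstring(xml_text)
    notice = ET.Element("shippingNotice")
    ET.SubElement(notice, "recipient").text = order.findtext("customer")
    ET.SubElement(notice, "units").text = order.findtext("quantity")
    return ET.tostring(notice, encoding="unicode")


good = "<order><customer>Ada</customer><quantity>3</quantity></order>"
bad = "<order><customer> </customer><quantity>0</quantity></order>"
print(violations(good))
print(violations(bad))
print(to_shipping_notice(good))
```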
- 59. TITLE
XML Server Types: Integration, Mediation, Repository
XML Repository Server Requirements
• XML Optimization
– Document Instance
• XML Storage
– Stores Document in Native Format
   • Better performance
   • Non-repudiation
– Compression
• XML Standards Support
– Faster Development
– Ensures Extensibility
• Support Data Access Security at Node level
Adapted from Steve Hamby "Understanding XML Servers" DAMA/Metadata Conference April 2003, Orlando, FL
- 60. TITLE
Portal Options
[Adapted from Terry Lanham, "Designing Innovative Enterprise Portals and Implementing Them Into Your Content Strategies: Lockheed Martin's Compelling Case Study," Web Content II: Leveraging Best-of-Breed Content Strategies, San Francisco, CA, 23 January 2001]
- 61. TITLE
Top Tier Demo
- 62. TITLE
Portals as a Data Quality Tool
- 63. TITLE
Meta-Matrix Integration Example
- 64. TITLE
• Data extraction and conversion software solutions for transforming
complex, unstructured data formats into XML for Enterprise
Application Integration
– RTF
– HTML
– HL7
– Positional (Offset-Based) reports
– TAB-delimited and other delimited reports
– EDI
• Binary documents are automatically converted to a suitable text
form for parsing, including:
– Microsoft Word documents
– Microsoft Excel documents
– PDF documents
– COBOL programs
[Figure labels: ItemField, BizTalk, Tamino]
http://www.itemfield.com/
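To make one of the conversions listed on this slide concrete, here is a minimal sketch that turns a TAB-delimited report into XML using only Python's standard library. It is not the vendor's product or API; the function name, the element names, and the sample report are assumptions, and the column headers are assumed to be valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET


def tab_report_to_xml(report_text, root_tag="report", row_tag="row"):
    """Convert a TAB-delimited report with a header line into an XML string."""
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    root = ET.Element(root_tag)
    for row in reader:
        row_el = ET.SubElement(root, row_tag)
        # Each column header becomes an element name (assumed XML-safe).
        for column, value in row.items():
            ET.SubElement(row_el, column).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical two-row report.
sample = "sku\tqty\nA1\t3\nB2\t5\n"
print(tab_report_to_xml(sample))
```

Positional (offset-based) reports would need a different reader that slices each line at fixed column offsets, but the XML-building half would stay the same.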
- 65. TITLE
More Data Management Tools
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International
- 66. TITLE
More Data Management Tools
from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International