The document discusses common pitfalls in data integration projects. It describes projects that rush into development without proper planning, creating overly customized data models. This can lead to projects taking much longer than planned and models that are never fully deployed. The document also discusses projects that build solutions without ensuring they meet business needs, resulting in solutions not being adopted. It advocates planning projects better by understanding requirements, using industry data models when possible, and ensuring solutions address real business problems. The document also stresses the importance of data quality, profiling data sources, and performing impact analysis in iterative development cycles.
InfoSphere: Leading from the Front - Accelerating Data Integration through Metadata
1. Leading from the Front
Accelerating Data Integration through Metadata
Scott Abbott
Certified IT Architect, InfoSphere Software
IBM Insight Forum 09 Make change work for you
®
2. Context
3. Are you constantly disappointed by your Data Integration projects?
4. Often it’s because we rush in without thinking what we are doing
5. Typical Data Integration Project
[Diagram components: LEGACY SOURCES, REFERENCE DATA, MASTER DATA, DATA INTEGRATION, WAREHOUSE, DATAMARTS, OLAP, REPORTS]
Pitfall markers on the diagram:
1 (WAREHOUSE): “The custom data model”
2 (LEGACY SOURCES): “of course our data is good”
3 (DATA INTEGRATION): “we’ll work it out in the testing”
4 (REPORTS, OLAP): “if we build it they will come”
6. The InfoSphere Software Evolution
DataMirror: Change Data Capture
LAS: Global Name Enrichment
DWL: Operational Master Data Management
Unicorn: Metadata Management
Ascential: Transformation, Cleansing, Profiling and metadata integration
SRD: Entity Resolution and Analysis
Trigo: Product Information Management
8. Typical Data Integration Project
[Diagram components: LEGACY SOURCES, REFERENCE DATA, MASTER DATA, DATA INTEGRATION, WAREHOUSE, DATAMARTS, OLAP, REPORTS, with pitfall markers 1-4 and a METADATA layer added]
9. Pitfall #1: “The Custom Model”
10. DI Pitfall #1
WAREHOUSE (1): “The custom data model”
“who knows our industry better than us”
“it will only take a couple of months”
NZ Customer Experience
• Project duration 24-36 mths
• Model never fully deployed
• Complex ETL feeds destabilized entire BI system
• Users bypass to get required information
11. DI Pitfall #1
Accelerator
80:20 rule (20% customization)
Months not years
Fully attributed data models across six industries
Complete business templates for industry KPIs
Key accelerators for migration & integration projects
Act as acceleration templates within Information Server & Cognos 8 BI
12. Typical Data Integration Project
[Diagram components: LEGACY SOURCES, REFERENCE DATA, MASTER DATA, DATA INTEGRATION, WAREHOUSE, DATAMARTS, OLAP, REPORTS, with pitfall markers 1-4. Added labels: industry models; METADATA: Target state]
13. Pitfall #2: “if we build it they will come..”
14. DI Pitfall #2
REPORTS, OLAP (4): “if we build it they will come”
“it is what the business asked for”
“the users will understand the new system”
NZ Customer Experience
• Multiple examples of BI solutions not meeting initial business drivers
• Users perceive new BI initiatives as burdens rather than assets
15. Missing the Point
Corporate Chinese Whispers
Starting message: Identify High Value Customers to support Call Centre & Web Personalization
Ending message: Monthly Report on Customers Revenue breakdown
Passed along: Business Users, Subject Matter Experts, Architects, Data Analysts, Developers, DBAs
16. Bridging the Gap
relating the new to the old
“item” ? “component” ? “part”
32. Understanding Your Data
InfoSphere Business Glossary
Captures Business Taxonomies
Captures and defines shared searchable business glossary
Assigns stewardship to key business terms
Links business terms to technical assets
33. InfoSphere Business Glossary
Web-based authoring, managing and sharing of business metadata
Aligns the efforts of IT with the goals of the business
Provides business context to information technology assets
Establishes responsibility and accountability
Users: Subject Matter Experts, Business Users
InfoSphere Business Glossary: Create and manage business vocabulary and relationships, while linking to physical sources
Technical view: Database = DB2, Schema = NAACCT, Table = DLYTRANS, Column = ACCT_NO, data type = char(11)
Business view: GL Account Number. The ten digit account number. Sometimes referred to as the account ID. This value is of the form L-FIIIIVVVV.
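The link between a business term and its physical column can be pictured as a small data structure. The following is only an illustrative Python sketch, not the InfoSphere Business Glossary API: the class names, the `find_term` helper and the steward address are hypothetical, while the DB2 / NAACCT / DLYTRANS / ACCT_NO values are taken from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class TechnicalAsset:
    """Physical location of a column (the technical view)."""
    database: str
    schema: str
    table: str
    column: str
    data_type: str

    def qualified_name(self) -> str:
        return f"{self.database}.{self.schema}.{self.table}.{self.column}"


@dataclass
class BusinessTerm:
    """Business definition plus its links to technical assets (the business view)."""
    name: str
    definition: str
    steward: str  # hypothetical contact, not from the slide
    synonyms: List[str] = field(default_factory=list)
    assets: List[TechnicalAsset] = field(default_factory=list)


def find_term(glossary: List[BusinessTerm], text: str) -> Optional[BusinessTerm]:
    """Return the term whose name or synonym matches `text`, ignoring case."""
    wanted = text.strip().lower()
    for term in glossary:
        if any(wanted == candidate.lower() for candidate in [term.name] + term.synonyms):
            return term
    return None


gl_account = BusinessTerm(
    name="GL Account Number",
    definition="The ten digit account number. This value is of the form L-FIIIIVVVV.",
    steward="finance.steward@example.com",  # hypothetical
    synonyms=["account ID"],
    assets=[TechnicalAsset("DB2", "NAACCT", "DLYTRANS", "ACCT_NO", "char(11)")],
)
glossary = [gl_account]

term = find_term(glossary, "Account ID")
if term is not None:
    print(term.name, "->", term.assets[0].qualified_name())
```

Keeping synonyms on the term is what lets someone who says "account ID" land on the same definition, steward and column as someone who says "GL Account Number".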
34. Business Glossary Anywhere
Real-time access to business glossary from any desktop application (ANY User, From Any Application..)
Features
From any desktop application, click on a term & view its business definition in a pop-up window without any loss of context or focus
Intelligent matching returns best candidates in a single search
Search engine for terms and categories
Access steward contact information directly
Security enforced via the Information Server common security layer
Benefits
Increased trust and acceptance of information by delivering definitions in context
Expanded adoption of enterprise glossary outside of Information Platform technologies
Improved information availability with multiple access mechanisms for electronically stored information (ESI)
Pop the Definition!
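The idea of returning the best candidate terms from a single, possibly misspelled search can be approximated with Python's standard `difflib` module. This is a rough sketch of the concept only; it does not reflect how Business Glossary Anywhere performs matching. The `GLOSSARY` entries other than GL Account Number are hypothetical placeholders, as are the `search_terms` helper and its cutoff value.

```python
import difflib
from typing import Dict, List

# Term name -> definition. Only the first entry comes from the slides;
# the other two are hypothetical placeholders.
GLOSSARY: Dict[str, str] = {
    "GL Account Number": "The ten digit account number.",
    "Customer Revenue": "Placeholder definition.",
    "Part": "Placeholder definition.",
}


def search_terms(query: str, glossary: Dict[str, str],
                 limit: int = 3, cutoff: float = 0.5) -> List[str]:
    """Return up to `limit` term names that most closely resemble `query`."""
    by_lower = {name.lower(): name for name in glossary}
    matches = difflib.get_close_matches(
        query.lower(), list(by_lower), n=limit, cutoff=cutoff
    )
    return [by_lower[m] for m in matches]


# A misspelled query still surfaces the intended term first.
for name in search_terms("gl acount number", GLOSSARY):
    print(name, "-", GLOSSARY[name])
```

Lowering `cutoff` returns more, weaker candidates; raising it makes the search stricter.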
35. Typical Data Integration Project
[Diagram components: LEGACY SOURCES, REFERENCE DATA, MASTER DATA, DATA INTEGRATION, WAREHOUSE, DATAMARTS, OLAP, REPORTS, with pitfall markers 1-4. Added labels: Correct, Understood, Data Steward, Terms; METADATA: Target state]
36. Pitfall #3: data quality
37. DI Pitfall #3
LEGACY SOURCES (2): “of course our data is good”
“the business owner says the information we need is in there”
“the schemas show they have the same keys”
NZ Customer Experience
• ETL Proof of Concept
• Client assured data quality sufficient so excluded data cleansing from scope
• At end of 2wk pilot, project halted due to unsolvable data quality issues
• Many 15-20 year old systems still in operation in NZ market
60. InfoSphere Information Analyzer
Data-centric analysis of application, database and file-based sources
Secure, detailed profiling of fields, across fields, and across sources
Analyse source data structures, and monitor adherence to integration and quality rules
Creation of metadata from profiling results
Results instantly promotable across IBM InfoSphere Information Server
Users: Subject Matter Experts, Data Analysts
View: Physical View
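Profiling a field, and checking whether two sources really share the same keys, can be illustrated with a few lines of Python. This is a simplified sketch of the general technique, not Information Analyzer functionality. The sample rows, the column names and the `profile_column`, `pattern_of` and `key_overlap` helpers are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def pattern_of(value: str) -> str:
    """Reduce a value to its format: digits become 9, letters become A."""
    out = []
    for ch in str(value):
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(rows: List[Row], column: str) -> dict:
    """Single-field profile: null count, distinct count, uniqueness, formats."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "is_unique": len(set(non_null)) == len(non_null),
        "formats": Counter(pattern_of(v) for v in non_null),
    }


def key_overlap(left: List[Row], right: List[Row], key: str) -> dict:
    """Cross-source check: how many key values actually match?"""
    left_keys = {r[key] for r in left if r.get(key) not in (None, "")}
    right_keys = {r[key] for r in right if r.get(key) not in (None, "")}
    return {
        "matched": len(left_keys & right_keys),
        "left_only": len(left_keys - right_keys),
        "right_only": len(right_keys - left_keys),
    }


# Hypothetical extracts from two legacy systems.
crm = [
    {"cust_id": "C001", "phone": "09-555-0101"},
    {"cust_id": "C002", "phone": ""},
    {"cust_id": "C002", "phone": "095550102"},
]
billing = [{"cust_id": "C002"}, {"cust_id": "0003"}]

print(profile_column(crm, "cust_id"))
print(profile_column(crm, "phone"))
print(key_overlap(crm, billing, "cust_id"))
```

Even on this tiny sample the profile shows a duplicated key, a missing phone number, two competing phone formats and a key that exists in only one system. None of these would be visible from the schemas alone.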
61. Typical Data Integration Project
[Diagram components: LEGACY SOURCES, REFERENCE DATA, MASTER DATA, DATA INTEGRATION, WAREHOUSE, DATAMARTS, OLAP, REPORTS, with pitfall markers 1-4. Added labels: Correct, Understood, Data Steward, Terms; METADATA: ETL Hints, Source State, Target state]
62. Pitfall #4: Iterative Development
63. DI Pitfall #4
DATA INTEGRATION (3): “we’ll work it out in the testing”
NZ Customer Experience
• ETL development >75% total project $$
• Projects taking 2-3x longer than planned
• Some clients taking 70+% of dev. time doing impact analysis
• Impact analysis methods very basic
• Largely iterative development method
• Unreliable forecast completion dates
• Low levels of trust by business in IT ability to achieve BI outcomes
• Substantial cost overruns
• Expensive BI maintenance costs
64. How do I Find Out …
…where this data comes from?
…when the job last ran?
…the details for these assets?
Data Analyst: “Where does the data for this report come from?”
65. Pitfall #4: Development (Impact Analysis)
87. What is the InfoSphere Metadata Workbench?
Web-based exploration of Information Assets generated and used by Information Server applications
Out of the box reporting on data movement, data lineage, business meaning, impact of changes and dependencies
Tracing the data lineage of Business Intelligence Reports to provide basis for compliance with legislation such as Sarbanes-Oxley and Basel II
Users: Developers, Data Integration Managers
InfoSphere Metadata Workbench®: Provides IT professionals with a tool for exploring and understanding the assets generated and used by the Information Server suite.
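Data lineage and impact analysis are both graph traversals over the same metadata: lineage walks upstream from a report to its sources, impact walks downstream from a changed asset to everything that depends on it. The sketch below shows that idea in plain Python. It is not the Metadata Workbench API, and the asset names in `FLOWS` as well as the helper functions are hypothetical.

```python
from collections import deque
from typing import Dict, List

# Each key feeds the assets listed as its value (hypothetical asset names).
FLOWS: Dict[str, List[str]] = {
    "legacy.DLYTRANS.ACCT_NO": ["job.load_accounts"],
    "job.load_accounts": ["warehouse.ACCOUNT.ACCT_NO"],
    "warehouse.ACCOUNT.ACCT_NO": ["mart.REVENUE.ACCT_NO"],
    "mart.REVENUE.ACCT_NO": ["report.monthly_revenue"],
}


def reachable(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Breadth-first walk returning every asset reachable from `start`."""
    visited = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def invert(graph: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Reverse every edge so the graph can be walked upstream."""
    inverted: Dict[str, List[str]] = {}
    for source, targets in graph.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted


def impact_of(asset: str) -> List[str]:
    """Everything downstream that a change to `asset` could affect."""
    return reachable(FLOWS, asset)


def lineage_of(asset: str) -> List[str]:
    """Everything upstream that `asset` is derived from."""
    return reachable(invert(FLOWS), asset)


print("Impact of changing the warehouse column:", impact_of("warehouse.ACCOUNT.ACCT_NO"))
print("Lineage of the report:", lineage_of("report.monthly_revenue"))
```

With the dependencies captured as metadata, answering "what breaks if this column changes?" becomes a lookup rather than a manual search through job code.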
88. Typical Data Integration Project
[Diagram components: LEGACY SOURCES, REFERENCE DATA, MASTER DATA, DATA INTEGRATION, WAREHOUSE, DATAMARTS, OLAP, REPORTS, with pitfall markers 1-4. Added labels: Correct, Understood, Data Steward, Terms, Impact Analysis; METADATA: ETL Hints, Source State, Target state]
89. Pitfall #4: Development (Iterative cycles)
90. Typical Data Integration Project
[Diagram components: LEGACY SOURCES, REFERENCE DATA, MASTER DATA, DATA INTEGRATION, WAREHOUSE, DATAMARTS, OLAP, REPORTS, with pitfall markers 1-4. Added labels: Correct, Requirements, Understood, ETL Code Generation, Data Steward, Terms, Impact Analysis; METADATA: ETL Hints, Source State, Target state]
91. InfoSphere FastTrack
To reduce costs of integration projects through automation
Business analysts and IT collaborate in context to create project specification
Leverages source analysis, target models, and metadata to facilitate mapping process
Auto-generation of data transformation jobs and reports
Callouts: Specification; Auto-generates DataStage jobs; Flexible Reporting
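Generating transformation code from a mapping specification can be illustrated in miniature. The sketch below turns a list of source-to-target mappings into a SQL statement. It is an analogy only: FastTrack generates DataStage jobs, not SQL, and the `Mapping` class, the `generate_insert_select` function, the TXN_AMT column and the target table name are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mapping:
    """One row of a source-to-target mapping specification."""
    source_column: str
    target_column: str
    expression: str = "{src}"  # transformation template; {src} is the source column


def generate_insert_select(source_table: str, target_table: str,
                           mappings: List[Mapping]) -> str:
    """Render the specification as a single INSERT ... SELECT statement."""
    targets = ", ".join(m.target_column for m in mappings)
    selects = ",\n       ".join(
        f"{m.expression.format(src=m.source_column)} AS {m.target_column}"
        for m in mappings
    )
    return (
        f"INSERT INTO {target_table} ({targets})\n"
        f"SELECT {selects}\n"
        f"FROM {source_table};"
    )


spec = [
    Mapping("ACCT_NO", "account_number", "TRIM({src})"),
    Mapping("TXN_AMT", "amount"),  # hypothetical column
]

print(generate_insert_select("NAACCT.DLYTRANS", "warehouse.account_txn", spec))
```

Because the specification is data rather than hand-written code, the same list can also drive a mapping report for the business analysts who agreed it.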
92. Typical Data Integration Project
[Diagram components: LEGACY SOURCES, REFERENCE DATA, MASTER DATA, DATA INTEGRATION, WAREHOUSE, DATAMARTS, OLAP, REPORTS, with pitfall markers 1-4. Labels: Correct, Requirements, Understood, ETL Code Generation, Data Steward, Terms, Impact Analysis; METADATA: ETL Hints, Source State, Target state]
93. Information Server
Optimizing Application Development
94. IBM InfoSphere Information Server
Delivering information you can trust
Information Server
InfoSphere Information Services Director
InfoSphere Information Analyzer
InfoSphere Business Glossary
InfoSphere Federation Server
InfoSphere QualityStage
InfoSphere DataStage
InfoSphere Data Architect
InfoSphere Replication Server / EVP
InfoSphere FastTrack
InfoSphere Change Data Capture
InfoSphere Metadata Server
InfoSphere Metadata Workbench
95. Bringing It All Together
Business Users, Subject Matter Experts, Architects, Data Analysts, Developers, DBAs
Information Server – Common Framework
Design, Operational
Simplify Integration
Increase trust and confidence in information
Facilitate change management & reuse
Increase compliance to standards
96. Leading from the Front
Greater Preparation will yield dramatically lower project costs/times
Typical Work Effort for Migration Activities
15-30% of total project budget will be spent on Migration Activities
Discover (30%): Understanding Source Data
Largely manual effort on small percentage of data. Some manual coding can review all data.
Prepare (40%): Cleaning, Standardising, Harmonizing, Management
This effort is the most unpredictable. The work can vary greatly depending on condition of data, however it is always the largest piece of work in the data initiative. Largely manual effort on 100% of data. This can mean dozens of persons cleaning source systems manually to correct and augment data and manually aligning records to MRD. Some manual coding can reduce the manual effort.
Deliver (30%): Conversion, Loading, Interfaces, Connectivity
Coding transformations and loads. Traditionally this effort is plagued with problems related to data quality and it can easily be pulled by necessity into the Cleaning, Standardising and Harmonising area causing timing and budget problems.
Business/IT effort splits shown on the slide: 50% Business / 50% IT, 25% Business / 75% IT, 75% Business / 25% IT
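The percentages on this slide can be turned into a rough budget range. The short calculation below is a worked example only: the total budget figure is hypothetical, and the `migration_budget` helper simply applies the slide's 15-30% migration share and its 30/40/30 Discover/Prepare/Deliver split.

```python
from typing import Dict, Tuple

# Figures taken from the slide.
MIGRATION_SHARE = (0.15, 0.30)  # share of total project budget, low and high
PHASE_SHARE = {"Discover": 0.30, "Prepare": 0.40, "Deliver": 0.30}


def migration_budget(total_budget: float) -> Dict[str, Tuple[float, float]]:
    """Return the (low, high) spend per migration phase for a given budget."""
    low_share, high_share = MIGRATION_SHARE
    return {
        phase: (
            round(total_budget * low_share * share, 2),
            round(total_budget * high_share * share, 2),
        )
        for phase, share in PHASE_SHARE.items()
    }


# Hypothetical project budget of 1,000,000.
for phase, (low, high) in migration_budget(1_000_000).items():
    print(f"{phase}: {low:,.0f} to {high:,.0f}")
```

On these assumptions the Prepare phase alone accounts for 60,000 to 120,000 of a 1,000,000 budget, which is why unplanned cleansing work has such a visible effect on cost.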
97. Thank you
Questions?