This document looks at the architectural constructs used to build business intelligence systems and how they are applied in business processes to improve marketing, serve customers better, and maximize organizational efficiency.
Presented for TTI Vanguard "Shift Happens" conference (http://bit.ly/TTIVshifthappens) visit to PARC, this represents a slice of some of our work in contextual intelligence.
This presentation outlines the creation of a new type of legal information service: a Themisplaza Marketplace, a marketplace for legal content, value-added apps, and personalized legal services.
It is based on the premise that actors in the legal information field and developers are prepared to collaborate in an endeavor that is beneficial for them and their clients.
Contextual intelligence deals with the practical application of knowledge and information to real-world situations, and can be defined as: “the capacity to exploit business moments and operational events in a way that enables organizations to make informed decisions and take effective action in varied, changing and uncertain situations”.
Future business value will accrue to those who are able to leverage contextual intelligence and build sustainable intelligent enterprises and ecosystems.
A sample of my book "Business unIntelligence - Insight and Innovation beyond Analytics and Big Data", published by Technics Publications, 2013.
Chapter 5 shows the evolution of the Data Warehouse architecture and provides a description of some aspects of a modern Information architecture.
The book can be ordered in hard and softcopy formats at http://bit.ly/BunI-TP1
Why Big Data Analytics Needs Business Intelligence Too - Barry Devlin
Business and IT are facing the challenge of getting real and urgent value from ever-expanding information sources. Building independent silos of big data analytics is no longer enough. True progress comes only by integrating data from traditional operational and informational sources with the new sources that are becoming available, whether from social media or interconnected machines.
In this April 2014 BrightTALK webinar, Dr. Barry Devlin describes the thinking, architecture, tools and methods needed to achieve a new joined-up, comprehensive data environment.
Data modelling has been around since the mid-1970s, but in many organisations there is considerable scepticism and downright distrust regarding the place data modelling should occupy. So why does data modelling still have to be "sold" in many companies, while in others people simply don't believe it's necessary: "the software package has all I need"! This paper looks at the failure of organisations to capitalise on the benefits data modelling can yield and examines where in the changing information systems landscape modelling is relevant.
Business unIntelligence - a Whistle Stop Tour - Barry Devlin
The old world of business intelligence is being transformed into a new biz-tech ecosystem. Analytics is forcing the recombination of operational and informational systems in a consistent and coherent IT environment for all business activities. Big data—despite the hype—introduces two very different types of information that transform how business processes interact with the external world. Together, these directions are driving a new BI, so different to its prior form that I call it “Business unIntelligence”. This session covers:
- Business drivers and results of the biz-tech ecosystem
- Modern conceptual and logical architectures for information, process and people
- Positioning of all forms of business analytic and big data
Infochimps Survey: What IT Teams Want CIOs to Know About Big Data - Learn the top items that IT team members would like their CIOs to understand concerning their Big Data projects.
The report - CIOs & Big Data: What Your IT Team Wants You to Know - is based on a survey of more than 300 IT department employees, 58% of whom are currently engaged in Big Data projects, and aims to identify pitfalls that implementation teams encounter, and could avoid, if top management had a more complete view.
Disruptive Data Science Series: Transforming Your Company into a Data Science... - EMC
Big Data is the latest technology wave impacting C-Level executives across all areas of business, but amid the hype, there remains confusion about what it all means. The name emphasizes the exponential growth of data volumes worldwide (collectively, 2.5 exabytes/day in the latest estimate I saw from IDC), but more nuanced definitions of Big Data incorporate the following key tenets: diversification, low latency, and ubiquity. In the current developmental phase of Big Data, CIOs are investing in platforms to “manage” Big Data.
Unlocking the Value of Big Data (Innovation Summit 2014) - Dun & Bradstreet
Big Data is central to the strategic thinking of today’s innovators and business executives as companies are scrambling to figure out the secret to transforming Big Data to Big Insight and that Insight into Action. As many companies struggle with the emerging technologies and nascent capabilities to discover and curate massive quantities of highly dynamic data, new problems are emerging in the form of how to ask meaningful questions that leverage the “V’s” of large amounts of data (e.g. volume, variety, velocity, veracity). In the Business-to-Business space, these challenges are creating both significant opportunity and ominous new types of risk. This presentation discusses how companies are reacting to these changes and provides valuable insight into new ways of thinking in a world with overwhelming quantities of data.
In this document, the five disruptive trends shaping the corporate IT landscape today are laid out. Out of the five, Big Data has the biggest potential to generate new sustainable competitive advantages. But the benefits will remain out of reach of many organizations as they struggle to adopt the technology, develop new capabilities, and manage the cultural change associated with the use of big data. This document offers a pragmatic approach to generating business value.
Big Data 101 - Creating Real Value from the Data Lifecycle - Happiest Minds
The big impact of Big Data in the post-modern world is unquestionable, un-ignorable and unstoppable today. While there is some debate about whether Big Data is really big, here to stay, or just an over-hyped fad, the facts shared in the following sections of this whitepaper validate one thing: there is no knowing the limits and dimensions that data in the digital world can assume.
If you are a senior IT leader, you need to make the same kinds of disciplined choices for your department that the CEO and top leadership team are making about the strategic direction of the enterprise. Here's how to develop a sophisticated, more strategically oriented information technology approach--based on six ways to create value for the enterprise, and five archetypes that resolve the tension among those six value drivers.
Analytics Isn’t Enough To Create A Data-Driven Culture - aNumak & Company
The values earned are perhaps also compatible with older technologies. As we believe big data and AI are extensions of analytical capabilities, the initiatives that are most common and most likely to succeed are those related to "advanced analytics and better decisions." This whitepaper aims to assist Chief Data Officers in promoting a data-driven culture at their organization, helping them lead the enterprise on a digital transformation journey backed by analytical insights.
Enterprise Information Management Strategy - a proven approach - Sam Thomsett
Access a proven approach to Enterprise Information Management Strategy - providing a framework for Digital Transformation - by a leader in Information Management Consulting - Entity Group
Business Intelligence is more than a fad, but embracing it requires a significant commitment.
Every competitive business recognizes the power in knowledge. The definition of “knowledge” is both subjective and obscure. All too often, a business is unable to succinctly express what information it wants and what it will do with this information. Many earnest efforts are made to develop effective data reporting resources. The most common mistakes are costly, time consuming and wasteful.
Executive Overview
Analytic strategies are at the core of digital innovation. They are building blocks in digital manufacturing, autonomous supply chains, and the digital path to purchase. New forms of analytics are defining new capabilities.
Traditional supply chains do not sense. They respond. The response is usually late, and out of step with the market. Today’s supply chains are dependent on structured data and Excel spreadsheets. Despite organizations spending 1.7% of revenue on Information Technology (IT), Excel ghettos are scattered across the organization. Most organizations are held hostage by long and grueling ERP implementations only to find out at the end of the project that the business users cannot get to the data.
The traditional supply chain paradigm is an extension to the three-letter acronyms which dominated the client-server architected world of the 1990s—ERP, APS, PLM, SRM, and CRM—while the more enlightened business user understands that analytics are not an extension of yesterday’s alphabet soup.
Historically, analytics has only meant reporting. In contrast, today, analytic strategies are at the core. As analytics capabilities morph and change, analytics technologies are at the core of the architecture, sandwiched between the conventional applications and workforce productivity tools as shown in Figure 2.
Figure 2. Analytic Strategies at the Core of Digital Transformation
Current State
Today, the focus of analytics implementations is on data visualization, unstructured data mining, and data lake technologies. As will be seen in this report, this is rapidly changing. Within five years, the most disruptive technologies will be Blockchain and cognitive computing. New forms of analytics will make many of today’s technology approaches obsolete. Few companies, mainly early adopters, are working in these areas.
Data science and data analytics professionals enable organizations to realize the potential of predictive analytics to make informed decisions and help advance the organization's analytics maturity model.
Knowledge management (KM) has become an effective way of managing an organization's intellectual capital or, in other words, the organization's full experience, skills and knowledge that is relevant for more effective performance in future. The paper proposes a knowledge management approach to achieve competitive control of machining systems, and then attempts to explain an application of knowledge management in engineering. The model can be used by managers when choosing competitive orders.
Data Governance, the foundation for building successful data management - Tentive Solutions
This Whitepaper clearly explains how the Data Governance function plays a key role and which factors are of great importance in successful data management. Also available in Dutch.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, AI, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad, and Procure.FYI's Co-Founder
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Making Sense of
BUSINESS INTELLIGENCE
A Look at People, Processes, and Paradigms
Ralph L. Martino
Contents

Introduction
Background
Modeling the Enterprise
Operational Information Process Model
Proactive Event-Response Process Model
Analytical Information Process Model
Understanding your Information
Understanding your User Community
Mapping and Assessing Analytical Information Processes
Information Value-Chain
A look at Information Strategy
A look at Architectural Issues and Components
Information Manufacturing and Metadata
“Real Time” or Active Data Warehouse ETL
Data Quality Concepts and Processes
Information Planning and Project Portfolio Management
Organizational Culture and Change
Tactical Recommendations
Strategic Recommendations
The Relentless March of Technology
Conclusions
Introduction
Business Intelligence is many things to many people, and depending on whom you ask you
can get very different perspectives. The database engineer will talk with pride about the
number of terabytes of data or the number of queries serviced per month. The ETL staff will
talk about the efficiency of loading many gigabytes of data and regularly bettering load time
Service Level Agreements. The system administrators will speak proudly of system uptime
percentage and number of concurrent users supported by the system. Business Intelligence
managers see modeled databases organized around subject areas accessible through one or
more access tools, supporting query needs for a large number of active users.
In reality, these are necessary, but not sufficient, for Business Intelligence success. Those of
us who have been in Business Intelligence long enough realize that its mission is not to implement
technology, but to drive the business. Technical implementation problems can take an
otherwise good BI strategy and turn it into a failure, but technical implementation perfection
cannot take a poorly conceived strategy and turn it into a success. In this book, we will start
by providing the business context and framework for understanding the role of BI and what it
will take to make it a business success. We will then look at how the BI strategy will drive the
assembly of architectural components into a BI infrastructure, and how the processes that
populate and manage this environment need to be structured.
As with all other investments that an organization makes, the expected results of investments
in data warehousing and business intelligence technology should be a direct contribution to the
bottom line, with a rate of return that meets or exceeds that of any other investment the
organization could make with that same funding. Unfortunately, with information, the
connection between investment and dollar results is often lost. Benefits are often regarded as
intangible or “fuzzy”.
The framework that I will present will assist those who are making strategy, architecture, and
investment decisions related to business intelligence to be able to make this connection. To do
this requires certain insights and certain tools. The tools we will be using here are models,
simplified views of organizations and processes that allow you to identify all of the ‘moving
parts’ and how they fit together. We will keep this at a very high level. We will be focusing
on the overall ecosystem of the forest rather than on the characteristics of the individual trees.
Understanding the big picture and the interrelationships between the major components is
critical to being able to avoid partial solutions that ignore key steps in the process. A bridge
that goes 90% of the way across a river gives you minimal value, since the process it needs to
support is enabling transport completely across a river. The key is to understand the context
when you are defining your business intelligence and data warehousing environment, and
design for that context.
Note that in many cases, not even your business partners really understand the context of
business intelligence. You can ask five different people, and they will give you five different
views of why the data warehouse exists, how it is used, and how it should be structured. This
is not reflective of the fact that information usage is random. It is reflective of the fact that
these individuals have different roles in systematized information-based processes, with
dramatically different needs. To understand business intelligence, we must delve into the roles
that these individuals play in the overall process, and look at how each interacts with
information in his own way. Hence, the information environment cannot be designed to be a
‘one-size-fits-all’ information source, but rather must conform to the diverse needs of different
individuals and facilitate their roles and activities.
Background
As Euclid did when he constructed his framework for geometry, we will build on some
fundamental premises. Let’s start with the very basics:
Data warehousing and business intelligence technologies are enablers. Putting data into
normalized tables that are accessible using some tool does not in and of itself create any value.
It must be accessed and utilized by the business community in order to create value. They
must not only use it, but use it effectively and successfully. This is similar to any other tool.
Phones only produce value when a person is able to successfully contact and communicate
with another intended person. Televisions only produce value when the content that is
delivered provides entertainment or useful information. Automobiles only produce value
when individuals and/or items are transported from one location to an intended second
location. The value is in the results, and not in the entity itself.
The deliverables or activities of an individual taken in isolation would create no more value
than a solitary brick outside of the context of a brick wall. In the context of an overarching
process, a series of data extraction/manipulation activities, analyses, and decisions together
have purpose and meaning, and can ultimately impact how the business operates and generate
incremental long-term value for the enterprise. Processes are unique to individual businesses,
and their efficiency and effectiveness are important determinants of the overall organizational
success. The complete set of business processes defines an organization, and is reflective of
its underlying character and culture.
Data Warehousing and Business Intelligence technology by itself does not produce business
value. Business information users produce value, with the technology as a tool and enabler
that facilitates this.

People do not produce value in isolation - overarching information processes are the vehicles
through which their activities and deliverables find meaning and context and ultimately
create value.
How an organization operates is based upon a spider web of dependencies, some of which are
almost chicken/egg types of recursive causality. Business intelligence is just one of these
interdependent pieces. As a result, even business intelligence processes must be viewed in
context of the broader organization, and can only be changed and enhanced to the extent that
the connection points that join this with other processes can be changed.
Information culture is a major determining factor as to the manner in which processes evolve.
A conservative culture is more prone to stepwise improvements, applying technology and
automation to try to do things in a similar fashion but somewhat faster and better. A dynamic
culture is more prone to adapt new paradigms and reengineer processes to truly take advantage
of the latest technologies and paradigms. Process-focused cultures, where methodologies such
as Six Sigma are promoted and engrained into the mindsets of the employees, are more likely
to understand and appreciate the bigger picture of information processes and be more inclined
to use that paradigm for analyzing and improving their BI deployments.
Other factors related to cultural paradigms include governance and decision-making
paradigms, which will direct how people will need to work together and interact with
information. Even cultural issues such as how employees are evaluated and rewarded will
impact how much risk an employee is willing to take.
Operational paradigms of the organization relate to how it manages itself internally, plus how
it interfaces with suppliers, partners, and customers. What types of channels are used? What
processes are automated versus manual? What processes are real-time versus batch? While
these issues may not impact an organization’s intentions or interests relative to BI deployment,
they will impact the connection between BI and decision deployment points, and will impact
the breadth and effectiveness of potential decisioning applications.
As with any other systematized series of interactions, information processes have a tendency
to reach a stable equilibrium over time. This is not necessarily an optimal state, but a state in
which the forces pushing towards change are insufficient to overcome the process’s inherent
inertia. Forces of change may come from two different sources – organizational needs for
change to improve effectiveness or achieve new organizational goals, and a change in
underlying tools and infrastructure which enables new process enhancements and efficiencies.
Processes are designed and/or evolve in the context of organizational/operational paradigms,
standards, and culture, and adapt to whatever underlying infrastructure of tools and
technology is in place.
Ability to control and manage process change is critical for an enterprise to be able to thrive in
a constantly changing competitive environment, and is a key determinant of success in the
development and deployment of Business Intelligence initiatives.
In this book we will look together at the big picture of business intelligence and how it fits in
the context of the overall enterprise. We will do this by focusing on the underpinnings of
business intelligence: people, processes, and paradigms.
Modeling the Enterprise
In our quest to define the context for Business Intelligence, we need to start at the top. The
first thing we will do is come up with an extremely simplified model of the enterprise. If you
reduce the enterprise to its essence, you wind up with three flows: funds, product, and
information. These flows are embodied in the three levels of this diagram:

[Diagram: three levels of the enterprise - Financial Control Processes (flow of funds); Development and Production Activities and Marketing and Distribution Activities delivering Products, Services, and Experiences to Customers; and, as the foundation, Information Processes (flow of data).]
Flow of funds relates to the collection and disbursement of cash. Cash flows out to purchase
resources, infrastructure, and raw materials to support production and distribution, and flows in
as customers pay for products and services. Processes in this category include payables and
receivables, payroll, and activities to acquire and secure funding for operations.
Development and production activities physically acquire and transport raw materials, and
assemble them into the finished product that is distributed to customers. For a financial
institution, it would consist of acquiring the funding for credit products and supporting the
infrastructure that executes fulfillment, transaction processing, and repayment.
Marketing and distribution activities consist of all activities needed to get product into the
hands of customers. It includes the identification of potential customers, the packaging of the
value proposition, the dissemination of information, and the delivery of the product to the
customer. For credit cards, it includes everything from product definition and pricing, to direct
mail to acquire customers, to assessing credit applications, to delivering the physical plastic.
In addition, it includes any post-sales activities needed to support the product and maintain
customer relationships.
Shown on the bottom, since these are the foundation for all other processes, are information
processes. These processes represent the capture, manipulation, transport, and usage of all data
throughout the enterprise, whether through computers or on paper. This data supports all other
activities, directing workflow and decisions, and enables all types of inward and outward
communication, including mandatory financial and regulatory reporting.
Of course, in the enterprise there are not three distinct parallel threads – information, financial,
and production/marketing processes are generally tightly integrated into complete business
processes. For example, in the business process of completing a sale, there is a flow of funds
component, flow of product component, and flow of information component, all working
together to achieve a business objective.
Our focus here will be on information processes. We will look at how they interact with other
processes, how they generate value, and how they are structured. We will start at the highest
level, where information processes are subdivided into two broad categories, operational
information processes and analytical information processes.
Operational Information Process Model
In its essential form, the operational information process can be modeled as follows:

[Diagram: Operational Information Processes - Business Rules applied to Operational Data (Entities/Status and Events), driving Foundational Processes (product development/pricing, capacity/infrastructure planning, marketing strategy planning), Reactive Processes (dynamic pricing/discounts, customer service support, collections decisioning), and Proactive Processes (sales/customer management, production/inventory management).]
Let’s break this out section by section. An organization is essentially a collection of
operational business processes that control the flow of information and product. What I
consider to be the intellectual capital of the enterprise, the distinguishing factor that separates it
from its competitors and drives its long-term success, is the set of business rules under which it
operates. Business rules drive its response to data, and dynamically control its workflows
based on data contents and changes. Business rules may be physically embodied in computer
code, procedures manuals, business rules repositories, or a person’s head. Wherever they are
located, they are applied either automatically or manually to interpret data and drive action.
The life-blood of any operational process is, of course, the data. I have broken this out into
two distinct data categories. The first describes all entities that the enterprise interacts with.
This could be their products, suppliers, customers, employees, or contracts. Included in this
description is the status of the entity relative to its relationship with the enterprise.
When the status of an entity changes, or an interaction occurs between the entity and the
enterprise, this constitutes an event. Events are significant because the enterprise must respond
to each event that occurs. Note that a response does not necessarily imply action – a response
could be intentional inaction. However, each time an event occurs, the enterprise must capture
it, interpret it, and determine what to do in a timeframe that is meaningful for that event.
There are certain organizational processes that must execute on a regular basis, being driven by
timing and need. These are the foundational processes. Included in this are the development of
strategy, the planning of new products and services, the planning of capacities and
infrastructure. These processes keep the organization running.
In the operational information process model, there are two distinct scenarios for responding to
events. The first consists of what I refer to as reactive processes. A reactive process is when
the event itself calls for a response. It can be as simple as a purchase transaction, where money
is exchanged for a product or service. A more complex example from the financial services
industry could be when a credit card customer calls customer service and requests that his
interest rate be lowered to a certain level. The enterprise must have a process for making the
appropriate decision: whether to maintain the customer interest rate, lower it to what the
customer requests, or reduce it to some intermediate level.
Whatever decision is made, it will have long-term profitability implications. By reducing the
interest rate, total revenue for that customer is reduced, thereby lowering the profitability and
net present value of that customer relationship. However, by not lowering the rate, the
enterprise is risking the total loss of that customer to competitors. By leveraging profitability
and behavioral information in conjunction with optimized business rules, a decision will be
made that hopefully maximizes the probabilistic outcome of the event/response transaction.
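To make this concrete, here is a minimal sketch in Python of how such a decision rule might weigh the alternatives. The balances, rates, servicing cost, and the toy attrition model are illustrative assumptions, not figures from any real portfolio.

    # Hypothetical expected-value rule for a rate-reduction request.
    # All inputs (balance, rates, attrition probabilities, servicing cost) are illustrative.

    def expected_value(balance, rate, attrition_prob, servicing_cost):
        # Expected annual value of the relationship at a given rate and attrition risk.
        return (1 - attrition_prob) * (balance * rate - servicing_cost)

    def decide_rate(balance, current_rate, requested_rate, attrition_model, servicing_cost=50.0):
        # Compare keeping the current rate, granting the request, and an intermediate rate.
        candidates = [current_rate, requested_rate, (current_rate + requested_rate) / 2]
        scored = [(r, expected_value(balance, r, attrition_model(r), servicing_cost))
                  for r in candidates]
        return max(scored, key=lambda pair: pair[1])   # (chosen rate, expected value)

    # Toy attrition model: the higher the rate, the more likely the customer defects.
    attrition = lambda rate: min(0.9, rate * 4)

    print(decide_rate(balance=5000.0, current_rate=0.18, requested_rate=0.10, attrition_model=attrition))

With these made-up inputs the intermediate rate wins, which is exactly the kind of probabilistic trade-off the business rules are meant to encode.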
The second type of event-response process is what I call a proactive process. The distinction
between proactive and reactive processes is the nature of the triggering event. In a proactive
process, the event being responded to does not necessarily have any profound significance in
and of itself. However, through modeling and analysis it has been statistically identified as an
event precursor, which heralds the probable occurrence of a future event. Identifying that
event precursor gives the enterprise the opportunity to either take advantage of a positive future
event or to mitigate the impact of a negative future event.
For example, a credit card behavioral model has identified a change in a customer’s behavior
that indicates a significant probability of a future delinquency and charge-off. With this
knowledge, the company can take pre-emptive action to reduce its exposure to loss. It could
actually contact the customer to discuss the behaviors, it could do an automatic credit line
decrease, or it could put the customer into a higher interest rate category. The action selected
would hopefully result in the least negative future outcome.
Note that without business rules that identify these events as event precursors, no response is
possible. In addition, other factors are involved in determining the effectiveness of the event-
response transaction. The first is latency time. A customer about to charge-off his account
may be inclined to run up the balance, knowing it will not be paid back anyway. Therefore,
the faster the response, the better the outcome will be for the company. Enterprise agility and
the ability to rapidly identify and respond to events are critical success factors.
Another factor that plays a huge role in the effectiveness of an event-response is data quality.
The business rules set data thresholds for event categorization and response calculation. The
nature or magnitude of the data quality problem may be sufficient to:
• Cause the precursor event to go undetected and hence unresponded to
• Change the magnitude of the event-response to one that is sub-optimal
• Cause an action to occur which is different from what is called for
This will result in a reduction in profitability and long-term value for the organization. Small
variations may not be sufficient to change the outcome. We will later discuss process
sensitivity to data quality variations and how to assess and mitigate this.
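As a simple illustration of that sensitivity, the short Python sketch below shows how an error in a single input can push a case across a business-rule threshold; the 80% utilization rule and the account figures are hypothetical.

    # Hypothetical rule: flag an account for a credit line review when utilization exceeds 80%.
    THRESHOLD = 0.80

    def flag_for_review(balance, credit_limit):
        return (balance / credit_limit) > THRESHOLD

    true_balance = 3900.0      # actual utilization = 78%, so no action is warranted
    reported_balance = 4100.0  # a +200 data quality error pushes reported utilization to 82%

    print(flag_for_review(true_balance, 5000.0))      # False: the correct outcome
    print(flag_for_review(reported_balance, 5000.0))  # True: a false positive driven purely by bad data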
Operational information processes are implemented primarily using application software that
collects, processes, and stores information. This may be supplemented by business rules
repositories that facilitate the storage and maintenance of business rules. In certain event-
response processes, BI tools may also be utilized. This would be in the context of collecting
and presenting information from low-latency data stores, either by looking at a full data set or
isolating exceptions. This information is presented to analysts, who assimilate the information
from these reports and apply some sort of business rules, whether documented or intuitive.
This supports tactical management functions such as short term optimization of cash flows,
staffing, and production.
Our biggest focus will be on business rules. The validity of the business rules has a direct
impact on the appropriateness and business value of event-responses. In many cases, business
rules interact and not only need to be optimized as stand-alone entities, but also within the
context of all of the other business rules. This leads to the fundamental assertion:
Given a primary organizational metric and a defined set of environmental constraints, there is
a single set of business rules that maximizes organizational performance relative to that
metric. This optimal set changes as the environment changes.

In other words, you could theoretically ‘solve’ this as a constrained maximization problem.
You pick a single critical organizational metric, such as shareholder value. You identify all
constraints related to resource costing, customer behavior, competitive environment, funding
sources, etc. What this states is that there is a single combination of business rules that will
achieve the maximum value for that metric. There are several corollaries to this:
• Because of cross-impacts of business rules, you cannot achieve the optimal set by
optimizing each rule in isolation. Optimizing one rule may sub-optimize another.
• As you increase the number of business rules assessed simultaneously, the
complexity increases geometrically, becoming unwieldy very rapidly.
The unfortunate conclusion is that achieving exact optimality is a virtual impossibility,
although wisely applied analytics can get you close. Part of the art of analytics is
understanding which rules have sufficient cross-impacts that it makes sense to evaluate them
together, and which can be approximated as being independent to simplify the math. These
trade-offs are what make the human, judgmental aspects of analytics so important.
Of course, even if you were to somehow identify the optimum combination of business rules,
your work would not be done. Because the environmental constraints are continuously
changing, the business rules that optimize organizational performance will also need to change
accordingly. Optimization is a continuous process, not an event.
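A toy sketch of the first corollary, in Python: the profit model and rule ranges below are invented purely for illustration, but they show how a strong cross-impact between two rules makes isolated optimization land on a materially worse combination than a joint search.

    import itertools

    # Invented profit model for two interacting business rules: a price point and a
    # promotional discount. The interaction term (discounts only lift volume at higher
    # prices) means the rules cross-impact each other; all coefficients are illustrative.
    def profit(price, discount):
        volume = 120 - 30 * price + 200 * discount * (price - 1.5)
        margin_per_unit = price * (1 - discount) - 1.0   # unit cost assumed to be 1.0
        return volume * margin_per_unit

    prices = [round(1.0 + 0.1 * i, 1) for i in range(21)]    # candidate prices 1.0 .. 3.0
    discounts = [round(0.01 * i, 2) for i in range(31)]      # candidate discounts 0% .. 30%

    # Optimize each rule in isolation, holding the other at a default value.
    best_price_alone = max(prices, key=lambda p: profit(p, 0.10))
    best_discount_alone = max(discounts, key=lambda d: profit(2.0, d))

    # Optimize the two rules jointly.
    best_joint = max(itertools.product(prices, discounts), key=lambda pd: profit(*pd))

    print("optimized in isolation:", (best_price_alone, best_discount_alone),
          round(profit(best_price_alone, best_discount_alone), 1))
    print("optimized jointly:     ", best_joint, round(profit(*best_joint), 1))

With these invented numbers the isolated search settles on a combination worth less than half of what the joint search finds, purely because the best discount depends on the chosen price.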
Proactive Event-Response Process Model
Because most reactive information processes are handled by production applications and are
therefore not as interesting from a BI perspective, I would like to spend a little additional time
discussing proactive event-response processes. These are often referred to by industry thought
leaders as real-time information processes. Unfortunately, the perception most people have of
real-time misses the real point. Most people think of real-time in technological terms,
assuming it is synonymous with immediate access to events and data changes as they occur.
They associate it with a series of architectural components:
• Messaging allows events to be captured from applications/processes as they occur.
• Immediate access to live data allows quick decisions to be made and actions to be
taken.
• Continuously operating engines for event capture, analysis, and response ensure
quick turnaround.
However, the true essence of real-time, from a purely business perspective, is very different:

Real-time refers to the ability to respond to an event, change, or need in a timeframe that
optimizes business value.

Using this definition, real time often involves but no longer necessitates instantaneous
responses, nor is the focus around a specific technology set. Real-time now can be looked at in
purely business terms. Since we are now talking about optimizing business value, the
underlying issue becomes the maximization of net profitability, which is driven by its cost and
revenue components:
• The costs of integrating the information sources needed to make an optimal response
to the event, which are dependent on the underlying infrastructure, application
software, and architecture.
• The revenue produced through the implementation of that response, which is
dependent on the nature and distribution frequency of different possible event-
response outcomes.
Both costs and revenues are fairly complex. To facilitate analyzing and optimizing proactive
information processes, I have come up with some simple models. First, let’s break the
proactive event-response process out into a series of basic steps:
[Timeline: a precursor (trigger) event takes place; the trigger event is detected and recorded; the event is determined to be significant; context is assembled for analysis; the future event is predicted and the required action is determined; the action is initiated; the results of the action are manifested; finally the predicted future event occurs. The duration of the event-response window is the probabilistic lag between the precursor and the predicted event.]
As you can see from this timeline, the event-response window begins with the occurrence of a
trigger event, which has been identified as a precursor to a future, predicted event. The event-
response window closes at the time that the future event is predicted to occur, since at that
point, you can no longer take any action that can impact the event or its results. Let us look
individually at these process components.
Trigger event is detected and recorded:
After a trigger event occurs, data describing this event must be generated and stored
somewhere. Event detection is when the knowledge that this trigger event has occurred
is available outside of the context of the operational application that captured it and
becomes commonly available knowledge. This may happen because an event record is
placed on a common data backbone or bus, or it may happen because an output record
from that application is ultimately written out to an operational data store for common
usage. In some cases, significant processing must be done to actually detect an event.
Because of limitations in the source systems for the data needed, it is possible that deltas
will have to be computed (differences in data value between two points in time) to
actually detect an event.
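A minimal sketch of this kind of delta-based detection in Python; the snapshot layout and the deposit threshold are assumptions made for illustration, not a prescribed design.

    # Compare two daily balance snapshots and emit trigger events where the day-over-day
    # delta exceeds a threshold. Snapshot layout and threshold are illustrative assumptions.

    DELTA_THRESHOLD = 10_000.0

    yesterday = {"A-100": 2_500.0, "A-200": 40_000.0, "A-300": 1_200.0}
    today     = {"A-100": 27_500.0, "A-200": 40_250.0, "A-300": 900.0}

    def detect_trigger_events(prev, curr, threshold=DELTA_THRESHOLD):
        # Yield (account, delta) for accounts whose balance jumped by more than the threshold.
        for account, balance in curr.items():
            delta = balance - prev.get(account, 0.0)
            if delta > threshold:
                yield account, delta

    for account, delta in detect_trigger_events(yesterday, today):
        print(f"trigger event: {account} balance increased by {delta:,.2f}")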
Event is determined to be significant:
Out of all the events for which a record is captured, only a small number of them will
actually have significance in terms of foretelling future events. A mechanism must be
in place to do preliminary filtering of these events, so that just the small subset of events
with the highest probability of having meaningful consequences are kept. Note that at
this stage, without any contextual information, it is difficult to ascertain significance of
an event with any accuracy, but at least a massive reduction in the volume of events to
be further examined can occur.
Context is assembled for analysis:
While an individual event or piece of data by itself is not necessarily a reliable predictor
of an event, it does indicate a possibility that a certain scenario will exist that is a
precursor to that event. The scenario consists of the fact that that event occurred, plus a
complementary series of prior events and conditions that in total comprise a precursor
scenario. Once that single individual piece of the picture, the trigger event, is detected,
the data elements that comprise the remaining pieces must be pulled together for
complete evaluation within the context of a statistical model.
Future event is predicted and required action is determined:
After all data is assembled it is run through the predictive model, generating probability
scores for one or more events. Depending on where these scores fall relative to
prescribed predictive thresholds, they will either be reflective of a non-predictive
scenario that does not require further action, or else will predict a future event and
prescribe an appropriate action to influence the future outcome.
Action is initiated:
All actions must be initiated in the operational world, via an appropriate system and/or
channel. Actions may include pricing updates, inventory orders, customer contacts, or
production adjustments. Actions may either be implemented:
• Manually - a list is generated and a human must act upon that list in order for any
action to take place.
• Automatically - data is transmitted to appropriate systems via automated interfaces,
with control reports for human validation. A person must intervene for an action not
to take place.
Results of action are manifested:
After an action is initiated, there will be some lag up until the time that it is actually
manifested. Actions are manifested when there is an interaction between the enterprise
and the person or entity being acted upon. For example, if the predicted event is the
customer’s need for a specific product and the action is to make an offer to a customer
to try to cross-sell a new product, the action manifestation is when the person receives
the offer in the mail and handles that piece of mail. If the predicted event is running out
of inventory and the action is to place an order for additional inventory, the action
manifestation is when the additional inventory is actually delivered.
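Pulling the middle steps together, the following Python sketch shows one way context assembly, threshold-based prediction, and action initiation might be wired up. The scoring weights, the 0.7 threshold, and the referral action are hypothetical placeholders, not a reference implementation.

    # Hypothetical event-response pipeline: assemble context around a trigger event,
    # score the scenario against a predictive threshold, and route the prescribed action.

    ACTION_THRESHOLD = 0.7

    def assemble_context(event, customer_profile, recent_events):
        # Combine the trigger event with prior events and conditions into one scenario record.
        return {
            "delta": event["delta"],
            "balance_variance": customer_profile["balance_variance"],
            "has_investment_account": customer_profile["has_investment_account"],
            "recent_event_count": len(recent_events),
        }

    def score_scenario(context):
        # Stand-in for a statistical model: returns a probability-like score in [0, 1].
        score = 0.0
        score += 0.5 if context["delta"] > 10_000 else 0.0
        score += 0.3 if context["balance_variance"] < 0.2 else 0.0
        score += 0.2 if not context["has_investment_account"] else 0.0
        return score

    def initiate_action(customer_id, score):
        # Route the response: an automated referral above the threshold, otherwise no action.
        if score >= ACTION_THRESHOLD:
            return {"customer": customer_id, "action": "send_investment_referral", "score": score}
        return {"customer": customer_id, "action": "none", "score": score}

    event = {"customer": "A-100", "delta": 25_000.0}
    profile = {"balance_variance": 0.1, "has_investment_account": False}
    context = assemble_context(event, profile, recent_events=[])
    print(initiate_action(event["customer"], score_scenario(context)))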
As with any other process, the designer of the process has numerous decision points. Each
individual step in the process has a specific duration. This duration may be adjusted based on
process design, what types of software and infrastructure are involved, how much computing
resource is available, what type of staffing is assigned, etc. By understanding that the critical
time to consider is the full process cycle from trigger event occurrence to action manifestation,
and not just event detection, it is then apparent that trade-offs can be made as you allocate your
investment across the various response-process components. A certain level of investment
may cut event detection time by a few hours, but the same investment may accelerate action
initiation by a day or action manifestation by 2 days.
Note that while your event-response process is probably fairly predictable and should complete
in a specified amount of time with a fairly small variance, there is probably a much wider
variance in the size of the event-response window:

[Chart: frequency distribution of the predicted event over time relative to the precursor event, with the action manifestation point marked ahead of the mean event occurrence. When predicting an event, a certain percentage of the time it will not actually happen; the remaining time, it occurs according to a probability distribution. The action manifestation must occur prior to the predicted event, with sufficient lead time to allow a change in behavior to occur.]
There should be just the right amount of time between the action manifestation and the mean
event occurrence. There must be sufficient lead time between action manifestation and the
mean event occurrence time that will provide adequate response time for a behavioral change
to occur. However, if action manifestation occurs too soon, you may wind up sending a
message before it actually has relevance for the recipient, thus reducing its impact, or you risk
spending money unnecessarily on compressing the process. To summarize the relationship
between action manifestation and predicted event occurrence:
• You gain benefit when your action manifests itself with enough lead time relative to
the predicted event to have the intended impact.
- For customer management processes, it also requires an appropriate and
receptive customer in order for value to be generated. Actions that do not
produce a response generate no value.
- Revenue reductions occur when costly offers are accepted by inappropriate
customers, thereby costing money without generating a return. If a credit
card company reduces the interest rate on an unprofitable customer to avert
probable attrition, this constitutes an action on an inappropriate customer.
They not only lose by keeping an unprofitable customer, they compound their
losses by further reducing interest revenue.
• Net gains must provide an appropriate return relative to development, infrastructure,
and operational expenses.
Environmental and process-related factors that will determine how effective your event-
response processes are and how much value they generate include:
• Operational effectiveness will determine how efficiently you can detect and respond
to events.
– Rapid and accurate execution of your operational processes
– High quality data being input into the process
– Efficient data interfaces and transfer mechanisms
• Quality of Analytics will determine the effectiveness of the business rules used to
drive your operational processes and how optimal your responses are.
– Accuracy of prediction: maximizing the probability that the condition you are
predicting actually occurs, thereby reducing “false positives” where you take
expensive action that is not needed.
– Accuracy of timing: narrowing the variance of the timing of the predicted
event, so that the action occurs with sufficient lead time to allow behavior
change to take place, but not so far in advance as to be irrelevant and
ineffective.
Because of the tradeoffs that need to be made, there is more involved in the model
development process than just producing a single deliverable. A wide range of predictive
models could be developed for the same usage, with varying input data and data latency, and
whose outputs have different statistical characteristics (accuracy of prediction and accuracy of
timing). Implementation and operational costs will vary for these. Optimization requires an
iterative development process, which generates and analyzes potential alternatives:
• Utilize statistical modeling to analyze a series of potential data input scenarios,
comparing the predictive precision of each scenario.
• Derive cost curve by looking at development/operational expense associated with
each scenario.
• Depending on the predictive accuracy of the models and on the timing relationship
between the original precursor event and the predicted event, the success of the
action will vary. Utilize this information for varying times to derive benefit curve.
You will find some general characteristics in your curves. In general, the further you try to
reduce latency and decrease response lag, the higher the cost. More data elements from more
diverse sources can also drive increased costs. Some sources are more expensive than others,
and this needs to be considered. At the same time, benefits will vary according to the
statistical predictive effectiveness of different model scenarios. Benefit also decreases based
on response lag, approaching zero as you near the predicted mean event time. The goal is to
identify the point where net return is maximized.
Graphically, it looks something like this:

[Chart: cost and benefit curves (in dollars) plotted against decreasing data breadth, currency, and action lead time, with the point where net return is maximized marked.]
Essentially, what this says is that the most robust and elaborate solution may not be the one
that is most cost effective, and that the most important thing is to match the solution to the
dynamics of the decision process being supported.
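As a sketch of that trade-off analysis in Python: each candidate design trades data breadth and latency (and therefore cost) against predictive benefit, and the figures below are made up purely to illustrate the shape of the comparison.

    # Compare candidate designs for the same proactive process; all figures are invented.
    scenarios = [
        # (description,                  annual cost, annual benefit)
        ("weekly batch, narrow data",         50_000,       120_000),
        ("nightly batch, moderate data",      90_000,       210_000),
        ("hourly micro-batch, broad data",   180_000,       260_000),
        ("streaming, full data breadth",     320_000,       285_000),
    ]

    best = max(scenarios, key=lambda s: s[2] - s[1])
    for name, cost, benefit in scenarios:
        print(f"{name:32s} net return = {benefit - cost:>9,}")
    print("maximum net return:", best[0])

In this made-up comparison the nightly batch design wins, echoing the point that the most elaborate, lowest-latency solution is not necessarily the most cost effective.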
Some interesting examples of proactive event response processes come from the financial
services industry. One such example is trying to capture funds from customer windfalls. If a
customer receives a windfall payment, it will generally be deposited in his checking account.
It will sit there for a brief period of time, after which the customer will remove it to either
spend or invest. If the financial institution can detect that initial deposit, it is possible that they
could cross-sell a brokerage account, mutual fund, or other type of investment to this person.
The process will start out by looking for deposits over a specific threshold. This can be done
either by sifting through deposit records, or possibly by looking for a daily delta which shows a
large increase in balance. Once these are identified, context has to be collected for analysis.
This context could include the remainder of the banking relationship, the normal variance in
$
cost/
benefit
Decreasing data breadth,
currency, action lead time
Point where
Net Return is
maximized
21. 21
account balances, and some demographic information. Predictive modeling has indicated that
if the customer has a low normal variance (high normal variance means that he often makes
large deposits as part of his normal transaction patterns), does not already have an investment
account, has an income between $30k and $70k, and has low to moderate non-mortgage debt,
he is probably a good prospect. A referral would then be sent to a sales representative from the
investment company, who would then contact him to try to secure his business.
Since modeling indicated that the money would probably be there five days before it is moved,
a response process that gets a referral out the next day and results in a customer contact by the
second day would probably have a high enough lead time. Therefore, identifying these large
deposits and identifying prospects for referrals in an overnight batch process is sufficiently
quick turnaround for this process.
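A hedged sketch of how such an overnight screen might look follows. The field names, the
deposit threshold, the variance cutoff, and the debt-ratio cutoff are hypothetical placeholders;
the income band and the "no existing investment account" rule follow the example above.

    WINDFALL_THRESHOLD = 25_000        # hypothetical large-deposit trigger

    def is_windfall_prospect(deposit, customer):
        """Flag a deposit as a likely windfall worth an investment referral."""
        if deposit["amount"] < WINDFALL_THRESHOLD:
            return False
        # High normal variance means large deposits are routine for this customer.
        if customer["balance_std_dev"] > 0.5 * deposit["amount"]:
            return False
        if customer["has_investment_account"]:
            return False
        if not (30_000 <= customer["annual_income"] <= 70_000):
            return False
        if customer["non_mortgage_debt_ratio"] > 0.35:   # "low to moderate" debt
            return False
        return True

    def nightly_referrals(deposits, customers):
        """Overnight batch job: emit referrals for the investment sales team."""
        return [
            {"customer_id": d["customer_id"], "amount": d["amount"]}
            for d in deposits
            if is_windfall_prospect(d, customers[d["customer_id"]])
        ]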
Another example shows that real-time processes do not necessarily need anything close to real-
time data. This relates to cross-selling to customers who make a service call. After handling
the service call, satisfied customers are given an offer to buy a new product. The way it works
is that on a regular basis, large batch jobs compute the probable products needed (if any) for
the customer base. These are kept in a database. When the customer calls, the database is
checked to see if there is any recommended product to be sold to that customer. If so, the
customer is verified to confirm that there are no new derogatories on his record (missed
payments) and that he has not already purchased that product. If both checks pass, the
customer receives the offer. Results are subsequently tracked for the purpose of fine-tuning
the process.
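A minimal sketch of that call-time check, assuming the recommendations have already been
computed by the batch job, might look like this (all structures and field names are
hypothetical):

    def offer_for_caller(customer_id, recommendations, products_held, new_derogatories):
        """Return the precomputed offer if the caller is still eligible, else None.

        recommendations:  dict customer_id -> recommended product (filled by the batch job)
        products_held:    dict customer_id -> set of products already owned
        new_derogatories: set of customer_ids with recent missed payments
        """
        product = recommendations.get(customer_id)
        if product is None:
            return None
        if customer_id in new_derogatories:
            return None
        if product in products_held.get(customer_id, set()):
            return None          # already purchased
        return product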
There are, however, processes that are totally appropriate for real-time analysis, or at minimum
a combination of real-time with batch analysis. If somebody is on your web site, the key is to
get that person directed in real time to the offer he is most likely to want. This may be an
upsell, a cross-sell, or just a sale to somebody who comes in to “browse”. Real time analysis
would be very expensive, requiring that the person’s purchase, offer, and web session history
be available in an in-memory database to analyze. A more cost-effective model might be to
do batch analysis on a regular basis to determine a person’s current “state”, which is a
categorization that is computed based on all his prior history. The combination of this current
state with the recent events (what sequence of pages got the person to where he is, what is in
his cart, how was he directed to the site, etc) would then need to be analyzed to determine what
should be offered, which substantially reduces the data manipulation that needs to be done while
somebody is waiting for the next page to be served up.
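As a sketch of this two-layer approach, the batch job might reduce each customer's full history
to a simple state label, while the real-time layer inspects only the live session. The states,
rules, and field names below are hypothetical illustrations, not a prescription:

    # Batch layer (run nightly): categorize each customer from full history.
    def compute_state(history):
        if history["lifetime_spend"] > 10_000:
            return "high_value"
        if history["orders_last_90_days"] == 0:
            return "lapsing"
        return "regular"

    # Real-time layer (per page request): only the small, recent session context
    # needs to be inspected while the visitor waits for the next page.
    def choose_offer(state, session):
        if session["cart_value"] > 200 and state == "high_value":
            return "free_expedited_shipping"          # upsell
        if session["referrer"] == "price_comparison_site":
            return "first_order_discount"
        if state == "lapsing":
            return "welcome_back_coupon"
        return None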
There is no shortage of architectural approaches to any given problem – the key will be
balancing operational effectiveness, business effectiveness, implementation cost, and
implementation time.
Analytical Information Process Model
The question then is, how do you identify this optimal set of business rules? Historically, this
has been done through intuition and anecdotal evidence. Today, staying ahead of competitors
requires that you leverage quantitative data analysis utilizing business intelligence technologies
to achieve the next level. This analysis and technology is incorporated into Analytical
Information Processes, which I define as follows:
Analytical Information Processes are iterative, closed-loop, collaborative workflows that
leverage knowledge to produce new and updated business rules. These processes consist of a
prescribed series of interrelated data manipulation and interpretation activities performed by
different participants in a logical sequence.
These processes are focused around understanding patterns, trends, and meaning embedded
within the data. Even more importantly, they are oriented towards utilizing this understanding
as the basis for action, which in this context is the generation of new and enhanced business
rules. Viewed from the perspective of Operational Information Processes, they would look
like this at a high level:
(Figure: Analytical Information Processes, drawing on Analytical Information Repositories fed
by operational data, optimize the Business Rules that drive the Operational Information
Processes acting on events and entity status. The operational processes shown are Foundational
Processes (product development, capacity/infrastructure planning, marketing strategy
planning), Reactive Processes (dynamic pricing/discounts, customer service support,
collections decisioning), and Proactive Processes (sales/customer management,
production/inventory management).)
As you can see from the prior diagram, the inputs for the Analytical Information Processes are
data stored in Analytical Information Repositories. These are distinct from the operational
databases in two ways:
• They provide sufficient history to be able to pick a point in time in your data, and have
enough history going backward from there to be able to discern meaningful patterns and
enough data going forward from there to allow for outcomes to be determined (see the
sketch below).
• They are optimized for the retrieval and integration of data into Information End
Products, which I define as the facts and context needed to make decisions, initiate
actions, or determine the next step in the process workflow.
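One way to picture the first property is a point-in-time snapshot routine: pick an as-of date,
take a lookback window of history for pattern detection, and a forward window for outcomes.
This is a hedged sketch using pandas; the column name and window lengths are assumptions
for illustration only.

    from datetime import date, timedelta
    import pandas as pd

    def snapshot(events: pd.DataFrame, as_of: date,
                 lookback_days: int = 365, outcome_days: int = 90):
        """Split an event history into a pattern window and an outcome window."""
        as_of_ts = pd.Timestamp(as_of)
        lookback_start = pd.Timestamp(as_of - timedelta(days=lookback_days))
        outcome_end = pd.Timestamp(as_of + timedelta(days=outcome_days))
        history = events[(events["event_date"] >= lookback_start) &
                         (events["event_date"] < as_of_ts)]
        outcomes = events[(events["event_date"] >= as_of_ts) &
                          (events["event_date"] < outcome_end)]
        return history, outcomes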
The following diagram illustrates the true role of a BI team. Its goal is to create a system of
user-tool-data interactions that enable the creation, usage, and communication of information
end products to support effective execution of analytical information processes:
(Figure: users interact with their data through a tool suite of query, reporting, OLAP, and
analytical tools, which accesses data structures within a multi-tiered information environment:
extreme volume with low latency, high volume with quick access for analytics, and low
volume with immediate access for real-time decisions. Users also interact with each other to
implement Analytical Information Processes, activities that together optimize business rules
and generate improved profitability or competitive advantage.)
Part of the problem with assessing and truly understanding analytical information processes is
that these processes can be very complex, and often are ad-hoc and poorly documented.
Without a framework for simplifying, systematizing, and organizing these processes into
understandable components, they can be completely overwhelming. Faced with this
complexity, many project managers responsible for data warehouse requirements gathering
will generally just ignore the details of the business processes themselves, and focus on the
simplistic question ‘what data elements do you want?’ If you design a warehouse with the
focus on merely delivering pieces of data, and neglect to ascertain how it will be used, then
your result may be a system that its intended users find difficult, time consuming, or even
impossible to use for its intended purpose.
Understanding the nature of information processes is therefore critical for success. If we look
closely at the type of processes that are performed that fall within the decision support realm,
we can actually notice some significant commonalities across processes. My assertion is that
virtually all analytical information processes can be decomposed into a common sequence of
sub-processes. These sub-processes have a specific set of inputs, outputs, and data
analysis/manipulation activities associated with them. This also implies that specific sub-
processes can be mapped to roles, which are performed by specific segments of the
information user community, and which require specific repository types and tools. The
Analytical Information Process Model decomposes projects into a sequence of five standard
components, or sub-processes:
1. Problem/Opportunity Identification
2. Drill-down to determine root causes
3. Identify/select behaviors & strategy for change
4. Implement strategy to induce changes
5. Measure behavioral changes/assess results
A detailed description of each sub-process is as follows:
Sub-process 1 – Problem/Opportunity Identification
In this process component, the goal is to achieve a high-level view of the organization.
The metrics here tend to be directional, allowing overall organizational health and
performance to be assessed. In many cases, leading indicators are used to predict
future performance. The organization is viewed across actionable dimensions that will
enable executives to identify and pinpoint potential problems or opportunities. The
executives will generally look for problem cells (intersections of dimensions) where
performance anomalies have occurred or where they can see possible opportunities.
These may be exceptions or statistical outliers, or could even be reasonable results that
are just intuitively unexpected or inconsistent with other factors.
Sub-process 2 - Drill Down to Determine Root Causes
Here, analysts access more detailed information to determine the ‘whys’. This is done
by drilling into actionable components of the high level metrics at a granular level, and
examining the set of individuals comprising the populations identified in the cells
targeted for action. The end-product of this step is to discover one or more root causes
of the problems identified or opportunities for improvement, and to assess which of
these issues to address. For example, if we identify a profitability problem with holders
of a specific product, the drivers of profitability would be things like retention rates,
balances, channel usage, transaction volumes, fees/waivers, etc. By pulling together a
view of all the business drivers that contribute to a state of business, we can produce a
list of candidate business drivers that we could potentially manipulate to achieve our
desired results. Once we have collected the information on candidate business drivers,
the decision needs to be made of which to actually target. There are a number of
factors that need to be considered, including sensitivity (amount of change in the
business driver needed to effect a certain change in your performance metric), cost, and
risk factors. The output from this sub-process will be a target set of business drivers to
manipulate, a target population that they need to be manipulated for, and some high-
level hypotheses on how to do it.
Sub-process 3 - Identify/Select Behaviors & Strategy for Change
This sub-process probes into the next level, which is to understand the actual set of
interacting behaviors that affect business drivers, and determine how to manipulate
those behaviors. For those who are familiar with theories of behavior, this is an
application of the ABC theory: antecedent => behavior => consequence. What this
means is that in order to initiate a behavior, it is first necessary to create antecedents,
or enablers of the behavior. This could include any type of communication or offers
relative to the desired behavior. To motivate the behavior, one must devise
consequences for performing and possibly for not performing the behavior. This could
include incentive pricing/punitive pricing, special rewards, upgrades, etc. Assessing
this requires complex models which predict behavioral responses, additionally taking
into account how certain actions performed on our customer base can have a series of
cascading impacts, affecting both the desired behavior and also potentially producing
other side effects. From an information perspective, this is by far the most complex and
least predictable task, and often requires a deep drill into the data warehouse, sifting
through huge volumes of detailed behavioral information.
Sub-process 4 - Implement
Ability to implement is perhaps the most critical but least considered part of the entire
process. This criticality is due to the fact that the value of an action decreases as the
duration of time from the triggering event increases. This duration has two
components. The first is the analytical delay, which is the time it takes to determine
what action to take. The second is the execution delay, the time for the workflow that
implements the required antecedents and/or consequences in the operational environment.
Implementation is often a complex activity, requiring not only information from the
decision support environment (data marts, data warehouse, and operational data
stores), but also processes to transport this back into the operational environment.
Because time to market is a critical consideration in being able to gain business value
from these strategies, special mechanisms may need to be developed in advance to
facilitate a rapid deployment mode for these strategies. Generally, this is a very
collaborative effort, but is focused around the ability of the information management
and application systems programming staffs to be able to execute. There could be a
wide variation in this time, depending on what needs to be done. Changes to the
programming of a production system could take months. Generation of a direct mail
campaign could take weeks. Updating a table or rules repository entry may take hours
or minutes.
Sub-process 5 - Assess Direct Results of Actions
There are two key assessments that need to be made after a tactic is implemented. The
first is whether or not it actually produced the anticipated behaviors. Generally,
behaviors are tracked in the actual impacted population plus a similarly profiled but
unimpacted control group to determine the magnitude of the behavioral change that
occurred. In addition to understanding what happened (or did not happen), it is also
critical to understand why. There could have been problems with execution, data
quality, or the strategy itself that caused the results to differ from the expectations. The
output from this step is essentially the capture of organizational learnings, which
hopefully will be analyzed to allow the organization to do a better job in the future of
developing and implementing strategies.
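A minimal sketch of the treatment-versus-control comparison described above is a simple
difference-in-differences calculation; the metric and the figures below are hypothetical.

    def lift_vs_control(treated_before, treated_after, control_before, control_after):
        """Estimate the behavioral change caused by the tactic, net of whatever the
        similarly profiled but unimpacted control group did anyway."""
        treated_change = treated_after - treated_before
        control_change = control_after - control_before
        return treated_change - control_change

    # Example: average monthly card balance per account (hypothetical figures)
    print(lift_vs_control(treated_before=1_850, treated_after=2_040,
                          control_before=1_860, control_after=1_900))
    # -> 150: the tactic is credited with roughly a $150 increase per account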
Because most business processes are cyclical, you end where you began, assessing the current
state of business to determine where you are relative to your goals.
To illustrate how this maps to specific activities, I have chosen the marketing function for a
financial institution. I have taken an illustrative set of measures and activities that occur and
showed how they map into the five sub-processes:
1. Business Performance Management: financial ratios, profitability, retention/attrition rates,
and risk profile.
2. Drill-down to root causes/business drivers: product utilization and profitability, channel
utilization and profitability, customer relationship measurements, attriter/retained customer
profiling, profitable customer profiling, transaction breakdowns, and fees paid vs. waived.
3. Assess/select behaviors to manipulate: statistical analysis, data mining, predictive model
development, what-if analysis, and intuitive assessment of information.
4. Implement strategy to alter behaviors: direct mail, telemarketing, feedback to customer
contact points, changes in pricing, credit lines, and service levels, and changes to customer
scores, segments, or tags.
5. Measure strategy effectiveness: measure new and closed account activity, changes in
balances, changes in transaction behavior, changes in attrition and retention rates, and
collections rates.
Let’s look at an example of how operational and analytical information processes are
interrelated. My team was involved with a project that actually had both operational and
analytical components. Our task was to access the data warehouse for the retail portion of the
bank (demand deposits, CDs, car loans, mortgages, etc.), and pull together information on the
customer’s overall relationship. This consisted of overall profitability, breadth of relationship,
and derogatories (late payments). This information was to be ported over to the credit card
data warehouse platform, after which it would be funneled into two different directions. The
data would be shipped to the operational system used to support credit card customer service,
where it would be displayed on a screen that supports various operational information
processes. In addition, it would go into the credit card data warehouse, where it would be
accumulated over time in the analytical environment.
By moving the data into the data warehouse, it could be integrated with organizational metrics
and dimensions and used in the execution of analytical information processes. These processes
would be used to devise new or enhanced business rules, so that operational processes such as
credit line management, interest rate management, customer retention, and even cross-sells,
could leverage the additional information. These business rules could either be incorporated
directly into the customer service application (via scripting), or else could be incorporated into
procedures manuals and training. As you collect more historical data, your analytical
information processes will yield continuously improved business rules. This is because of two
factors: models would work better with a longer time series of information, and you
additionally have the benefit of the feedback loop as you assess the results of the application of
prior business rules and apply those learnings.
Understanding your Information
All information is not created equal. Different types of information have different roles in
analytical information processes. Different roles mean that it flows through the environment
differently, is stored differently, and is used differently. At a high level, I use this taxonomy to
describe the different classes of information to capture and manage:
Performance Metrics: a small set of high-level measures, generally utilized by senior managers
to evaluate and diagnose organizational performance; in addition to current performance
indicators, leading indicators may be included to determine the performance trend. Examples
include profitability, customer retention, risk-adjusted margins, ROA, and ROI.
Organizational Dimensions: standardized, actionable views of an organization which allow
managers to pinpoint the subsets of the customers or products where there might be
performance issues. Examples include business units, segments, profitability tiers, collection
buckets, and geography.
Actionable measures that drive performance: these represent the actual root causes of
performance, at a sufficiently low level that actions can be devised which can directly affect
them. Analysts can then make the determination of which of these measures should be
targeted for improvement in their strategies to impact the high-level organizational
performance across the appropriate dimensions. Examples include interest income, transaction
volume, transaction fees, late fees, new account sales, balances, and balance increases.
Behavioral descriptors/measures: customer behaviors related to purchases, transactions, or
requests for service link back to the measures that drive performance. Strategy development
consists of deciding which of these behaviors to modify, and how to do it. As behaviors are
modified, assessment must be made of both intended and unintended consequences. Examples
include product usage, account purchases, channel access, transaction behavior, and payments
made.
Two descriptors cut across these categories:
Facts reflect the current or prior state of an entity or its activities and changes. This would
include purchases, balances, etc.
Context is the frame of reference used to evaluate facts for meaning and relevance. It includes
forecasts, trends, industry averages, etc.
Facts and context are descriptors that permeate all other information categories. Whether
describing metrics, business drivers, or behaviors, you would present facts about business
entities framed in context. Facts and context apply equally well whether looking at a single
cell of the organization or looking at the enterprise at the macro level.
An extremely important concept here is that of the information end-product. An information
end-product is the direct input into a decision or trigger for an action. An information end-
product may consist of metrics, business drivers, or behaviors. It will contain all needed facts
and relevant context. It will be presented in such a fashion as to be understandable and
facilitate analysis and interpretation.
It is sometimes not clear what actually constitutes an information end-product. If an analyst
gets three reports, pulls some data from each of the three reports, and retypes it into a spreadsheet
so that he can make sense of it and make a decision, the spreadsheet is the end-product. In a
less intuitive but equally valid example, if the analyst took those same three reports and
analyzed the data in his head, his mental integration process and resulting logical view of the
data would be the information end-product. More complex information end-products could
include a customer stratification by a behavioral score, monthly comparisons of actual metrics
with forecasts, and customer retention by product and segment. Note also that a physical
information end-product with respect to one activity/decision point may additionally be an
input in the process of preparing a downstream information end-product.
Like any other consumer product, information has value because it fulfills a need of the user
within the context of a process. It is up to the developers of this information to ensure that it
can effectively produce that value. There are several determinants of information effectiveness
that have significant impacts on the ability of an organization to utilize the information and
ultimately produce real business value. The five broad categories impacting value of
information are:
Accuracy
Accuracy generally refers to the degree to which information reflects reality. This is the
most fundamental and readily understood property of information. Either it is correct,
or it is not. For example, if you have a decimal point shifted on an account balance, or
an invalid product code, then you do not have accurate information.
Completeness
Completeness implies that all members of the specified population are included. Causes
of incomplete data might be applications that are not sourced, processing or transmission
errors that cause records to be dropped, or data omitted from specific
data records by the source system.
Usability
Usability is a much less tangible, but much more prevalent problem. It pertains to the
appropriateness of the information for its intended purposes. There are many problems
with information that could negatively impact its usability. Poor information could be
introduced right at the collection point. For example, freeform address lines may make
it very difficult to use the address information for specific applications. Certain fields
may be used for multiple purposes, causing conflicts. There could be formatting
inconsistencies or coding inconsistencies introduced by the application systems. This is
especially common when similar information is directed into the warehouse from
multiple source systems. Usability problems could also arise from product codes not
defined to an appropriate level of granularity, or defined inconsistently across systems.
Usability is even impacted when data mismatches cause inconsistencies in your ability
to join tables.
Timeliness
Timeliness is the ability to make information available to its users as rapidly as
possible. This enables a business to respond as rapidly as possible to business events.
For example, knowing how transaction volumes and retention rates responded to
punitive pricing changes for excessive usage will allow you to change or abort if it is
not having the predicted effect. In addition, knowing as early as possible that a
customer has opened new accounts and is now highly profitable will enable that
customer to be upgraded to a higher service level ASAP. Timeliness of information is
achieved by effectively managing critical paths in the information delivery process.
Cost-effectiveness
As with any other expense of operation, it is critical that the cost of collecting,
processing, delivering, and using information be kept to a minimum. This is critical for
maintaining a high level of return on investment. This means being efficient, both in
ETL process operations and in process development and enhancement.
These must be appropriately managed and balanced as the BI manager devises an information
delivery architecture and strategy.
Understanding your User Community
Prior to even contemplating databases, tools, and training, it is critical that an understanding be
developed of the actual people who are expected to be able to utilize and generate value from a
decision support environment. Just as companies segment their customer bases to identify
homogeneous groups for which they can devise a unique and optimal servicing strategy, so too
can your business intelligence user community be segmented and serviced accordingly. Like
customers, the information users have a specific set of needs and a desire to interact with
information suppliers in a certain way.
What I have done is to come up with a simple segmentation scheme that identifies four levels
of user, broken out by role and level of technical sophistication:
Level 1 (Senior Managers and Strategists): looking for a high-level view of the organization.
They generally require solutions (OLAP, dashboards, etc.) which entail minimal data
manipulation skills, often viewing prepared data/analysis or directly accessing information
through simple, pre-defined access paths.
Level 2 (Business Analysts): analysts who are more focused on the business than technology.
They can handle data that is denormalized, summarized, and consolidated into a small number
of tables accessed with a simple tool. They are generally involved with drilling down to find
the business drivers of performance problems. They often prepare data for strategists.
Level 3 (Information Specialists): these are actual programmers who can use more
sophisticated tools and complex, generalized data structures to assemble dispersed data into
usable information. They may be involved with assessing the underlying behaviors that impact
business drivers, in strategy implementation, and in measurement of behaviors. They may
assist in the preparation of data for business analysts, strategists, or statisticians.
Level 4 (Predictive Modelers and Statisticians): these are highly specialized analysts who can
use advanced tools to do data mining and predictive modeling. They need access to a wide
range of behavioral descriptors across extended timeframes. Their primary role is to identify
behaviors to change to achieve business objectives, and to select appropriate
antecedents/consequences to initiate the changes.
To be clear, this is a sample segmentation scheme. This specific breakout is not
as important as the fact that you must not only know and understand your users, but you must
be aware of the critical differentiation points that will direct how these users would like to and
would be able to interact with information. It is also important to remember that this must
apply to your end-state processes and not just your current state. This means that roles may be
invented that do not currently exist, and those roles must be accounted for in any segmentation
scheme.
Note that while user segmentation is very important from a planning and design perspective,
real users may not fall neatly into these well defined boxes. In reality, there is a continuum of
roles and skill levels, so be prepared to deal with a lot of gray. Many people will naturally map
into multiple segments because of the manner in which their role has evolved within the
process over time. Many of the information users that I have dealt with have the analytical
skills of a business analyst and the technical skills of an information specialist. They would feel
indignant if someone were to try to slot them into one box or the other. The key point to be made
here is that role-based segmentation will be the driving force behind the design of information
structures and BI interfaces. The important thing is that you design these for the role and not
the current person performing that role. A person performing a business analyst role should
utilize information structures and tools appropriate for that role, even if that person’s skill level
is well beyond that. This will provide much more process flexibility as people move to
different roles.
One of the biggest mistakes in developing a data warehouse is to provide a huge and complex
entanglement of information and expect that, by training hundreds of people, usage of this
monstrosity will permeate corporate culture and processes. When only a tiny minority of those
who were trained actually access the databases and use the information (and those people were
the ones who already had expertise in using information prior to development of the
warehouse), the project team then assumes that this is a user adoption problem. Their
solution: apply more resources to marketing the data warehouse and to training users.
Unfortunately, training will only get people so far. Some people do not have the natural
aptitudes and thought processes that are necessary to being successful knowledge workers. In
addition, many people have absolutely no desire to become skilled technicians with
information and tools. No amount of training and support will change this.
The key point to remember is, you are supplying a service to information users, who are your
customers. You must therefore start with the knowledge of who your customers are, what they
are capable of doing, and what they have to accomplish. You then apply this information by
delivering to them things that they need and can actually use. If you are supplying a product or
service that they either do not need, or do not have the current or potential skills and aptitudes
to actually use, there will not be adoption and the system you are building will fail. Business
Intelligence needs to adapt to the user community, and not vice-versa.
Mapping and Assessing Analytical Information Processes
In order to be able to evaluate and improve your analytical information processes, it is essential
that there be some way to capture and communicate these processes in a meaningful way. To
do this, I came up with the Analytical Information Process Matrix. With user segments
representing rows and sub-processes as columns, this graphical tool allows individual activities
within the process to be mapped to each cell. In the diagram below, you can see some
examples of the types of activities that might be mapped into each cell:
Although this representation is in two dimensions, it can easily be extrapolated to three
dimensions to accommodate multi-departmental processes, so that activities and participants
can be tied to their specific departments.
(Matrix: rows are the user segments Managers/Strategists, Business Analysts, Information
Specialists, and Statistical Analysts; columns are the five sub-processes, framed as WHAT are
the performance issues? (deliver and assess performance metrics), WHY is this situation
occurring? (drill down and research to find root causes), HOW can we improve performance?
(analyze alternatives and devise an action plan), IMPLEMENT the action plan (interface with
processes and channels), and LEARN and apply to future strategies (measure changes to
assess effectiveness). Example cell activities include performance reporting, assessing
performance and identifying opportunities, mining data for opportunities, developing and
researching hypotheses, collecting, analyzing, and presenting metrics, transactional and
behavioral reporting, data integration and complex drill-down, what-if analysis, developing
statistical profiles and behavioral models, developing alternative strategies, selecting the
optimal strategy, creating transport mechanisms and interfaces, and collecting and assessing
results. A wide range of possible roles exists as you design your closed-loop analytical
processes.)
To graphically map the process, the specific information activities are incorporated as boxes in
the diagram, along with arrows representing the outputs of a specific activity which are
connected as inputs to the subsequent activity. This is a very simple example:
(Figure: a sample analytical information process flow mapped onto the matrix. Boxes such as
assess performance using OLAP/dashboard, query/reporting, reporting/data manipulation,
execute complex data integration, data mining, statistical analysis, and scenario evaluation,
decide on appropriate actions, implement, and analyze results are placed in the cells for
Managers/Strategists, Business Analysts, Information Specialists, and Statistical Analysts
across the five sub-process columns, with arrows connecting the outputs of each activity to the
inputs of the next.)
This shows types of activities at a high level. Within each box could actually be numerous
individual information activities. The question then is: How much detail do you actually need
to do a meaningful process mapping? While more detail is definitely better, getting mired in
too much detail can lead to ‘analysis paralysis’. As long as you can identify the key
communication lines and data handoff points, you can derive meaningful benefit from a high-
level mapping. The key is to integrate this with the information taxonomy to identify the
information that corresponds with each box.
For example, in the ‘assess performance’ box, the key is to identify the meaningful, high-level
metrics that will be used for gauging organizational health and identifying opportunities. This
should be a relatively small number of metrics, since too many metrics can lead to confusion.
If the goal is to optimize performance, a complication is that a given action can move different
metrics by different amounts, and possibly even in opposite directions. Simplicity and clarity are
achieved by having a primary metric for the organization, with supplemental metrics that align
with various aspects of organizational performance, and leading indicators that give an idea of
what performance might look like in the future. In addition, you need standardized dimensions
across which these metrics can be viewed, which can enable you to pinpoint where within the
organization, customer base, and product portfolio there are issues.
Once you know what needs to be delivered, you then need to understand the segment that will
be accessing the information to determine how best to deliver it. The managers and strategists
who are looking at organizational performance at a high level will need to do things like
identify exceptions, drill from broader to more granular dimensions, and be able to
communicate with the analysts who will help them research problems. A specific data
architecture and tool set will be needed to support their activities.
Business analysts need to be able to drill into the actionable performance drivers that constitute
the root causes for performance issues. For example, in a Credit Card environment, risk-
adjusted margin is a critical high level metric looked at by senior management. In our
implementation, we included roughly 40 different component metrics, which are sufficiently
granular to be actionable. The components include each individual type of fee (annual, balance
consolidation, cash advance, late, over-limit), statistics on waivers and reversals, information
on rates and balances subjected to those rates, cost of funds, insurance premiums, charge-
offs/delinquencies, and rebates/rewards. By drilling into these components, changes in risk-
adjusted margin can be investigated to determine if a meaningful pattern exists that would
explain why an increase or decrease has occurred within any cell or sub-population of the total
customer base. By analyzing these metrics, analysts can narrow down the root causes of
performance issues and come up with hypotheses for correcting them. Doing this requires
more complex access mechanisms and flexible data structures, while still maintaining fairly
straightforward data access paths.
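As an illustration of this kind of component drill-down, the sketch below sums a handful of
hypothetical components into a risk-adjusted margin and compares them across segments. The
real implementation used roughly 40 components; the names and figures here are invented.

    import pandas as pd

    # One row per customer cell (e.g. segment x product), with component metrics.
    cells = pd.DataFrame({
        "segment":         ["A", "A", "B", "B"],
        "interest_income": [120, 115,  90,  60],
        "fees":            [ 30,  28,  22,  12],
        "cost_of_funds":   [-40, -39, -35, -30],
        "charge_offs":     [-15, -16, -12, -25],
        "rewards":         [-20, -19, -15, -14],
    })

    components = ["interest_income", "fees", "cost_of_funds", "charge_offs", "rewards"]
    cells["risk_adjusted_margin"] = cells[components].sum(axis=1)

    # Drill down: which components explain the margin difference across segments?
    print(cells.groupby("segment")[components + ["risk_adjusted_margin"]].mean())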
The next level down consists of measures of behavior, which is generally the realm of
statistical modelers. Because there are so many different types of behaviors, this is the most
difficult set of activities to predict and support. Behaviors include whether individual
customers pay off their whole credit card bill or just make partial payments, whether they call
customer service regularly or sporadically, whether they pay their credit card bill by the
internet or by mail, whether they use their credit cards for gas purchases or grocery purchases,
and countless other possibilities. The issue here is to determine which of these behaviors could
be changed in order to impact the business driver(s) identified as the root causes by the
analysts. The key is to determine antecedents and consequences such that the desired change
in behavior is maximized, without incurring negative changes in other behaviors that would
counteract the positive change. To do this requires access to detailed behavioral data over long
periods of time.
As an example, senior management has identified a performance issue with the gasoline
rewards credit card, which had been consistently meeting profitability forecasts in the past but
has suddenly had a noticeable reduction in profitability. After drilling into the problem,
analysts identified the issue as decreased balances and usage combined with increased attrition
among the customers who had been the most profitable. Because this coincided with the
marketing of a new rewards product by a competitor that provided a 5% instead of a 3%
reward, the hypothesis was that some of our customers were being lured by this competitor’s
offer. Some customers were keeping their card and using it less, others were actually closing
their card accounts.
Through statistical analysis, we then needed to figure out:
• What would we need to do to keep our active, profitable customers?
• Is there anything we could do for those customers already impacted?
• Given the revenue impacts of increasing rewards, what do we offer, and to what subset
of customers, that will maximize our risk-adjusted margin? (A sketch of this calculation
follows below.)
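One hedged way to frame the third question is as an expected-value comparison per customer
tier and candidate offer, weighing the margin retained by reduced attrition against the cost of
the richer reward. Every probability, spend figure, and rate below is hypothetical; the structure,
not the numbers, is the point.

    offers = {
        # offer name -> extra reward rate applied to gas-category spend (hypothetical)
        "permanent_5pct_upgrade": 0.02,
        "temporary_4pct_incentive": 0.01,
    }

    tiers = {
        # tier: (annual spend, margin rate, annual gas spend,
        #        attrition prob. with no action, {offer: attrition prob. if offered})
        "most_profitable": (24_000, 0.030, 6_000, 0.35,
                            {"permanent_5pct_upgrade": 0.08,
                             "temporary_4pct_incentive": 0.20}),
        "next_tier":       (12_000, 0.028, 3_000, 0.20,
                            {"permanent_5pct_upgrade": 0.10,
                             "temporary_4pct_incentive": 0.12}),
    }

    def expected_gain(spend, margin_rate, gas_spend, base_attr, offer_attr, reward_rate):
        margin = spend * margin_rate
        ev_no_action = (1 - base_attr) * margin
        ev_with_offer = (1 - offer_attr) * (margin - gas_spend * reward_rate)
        return ev_with_offer - ev_no_action

    for tier, (spend, rate, gas, base_attr, attr_by_offer) in tiers.items():
        for offer, reward_rate in offers.items():
            gain = expected_gain(spend, rate, gas, base_attr, attr_by_offer[offer], reward_rate)
            print(f"{tier:16s} {offer:26s} expected gain per account: {gain:8.2f}")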
Once the course of action was determined, the next step is the implementation phase. For this,
there might be three hypothetical actions:
• Proactively interface with the direct marketing channel to offer an enhanced rewards
product to the most profitable customers impacted.
• Closely monitor transaction volumes of the next tier of customers, and use statement
stuffers to offer them temporary incentives if their volumes start decreasing.
• Identify any remaining customers who would be eligible for temporary incentives or a
possible product upgrade if they contact customer service with the intention of closing
their account.
Implementation of this strategy would therefore require that information be passed to three
separate channels via direct feeds of account lists and supporting data.
Once implementation is complete, the final sub-process is monitoring. The behavior of the
impacted customers is tracked and compared with control groups (meeting the same profile but
not targeted for action) to determine the changes in behavior motivated by the offer. Based on
this tracking, appropriate learnings are captured that can assist in the development of future
response strategies.
As with any iterative process, with the next reporting cycle senior management will be able to
look at the key metrics across the dimensions of the organization, and be able to ascertain if
overall performance goals have been attained.
As you can see from the previous example, the process requires the involvement of various
individuals and execution of numerous information activities. In any given sub-process,
individuals may take on roles of producers or consumers of information end-products (or
sometimes both). Effective information transfer capabilities and consistent information must
be available across all process steps to ensure that collaboration among individuals and
interfaces between activities within and across sub-processes occur accurately and efficiently.
For example, when a manager specifies a cell, this corresponds to an intersection of
dimensions. These same dimensions must be understood by the analyst and must be
incorporated with the exact same meanings into whatever data structures are being used by the
analysts. When the analyst identifies business drivers that are potential problems, these need to
be communicated to the statistical analysts. In addition, if a business analyst identifies a set of
customers that looks interesting, these need to be able to be transferred directly to the statistical
analysts so that they can work with this same data set. After a strategy is developed and
implementation occurs, the same data that drives implementation must be available within the
tracking mechanisms to ensure that the correct customers are tracked and accurate learnings
occur.
As we map out these information flows, we will be looking for the following types of issues:
Too many assembly stages can make the process of pulling information together
excessively labor intensive, extend information latency, and increase the likelihood of
errors being introduced. In many cases, excessive assembly stages are not due to intent,
but due to the fact that processes evolve and information deliverables take on roles that
were not initially intended. A rational look at a process can easily identify these
vestigial activities.
Inefficient information hand-off points between stages can occur when information is
communicated on paper, or using ‘imprecise’ terminology. For example, if a
manager communicates that there is a problem in the New York consumer loan
portfolio, it could refer to loan customers with addresses in New York, loans which
were originated in New York branches, or loans that are currently domiciled and being
serviced through New York branches. It is extremely important that precise
terminology is used to differentiate similar organizational dimensions. It is also critical
that electronic communication be used where possible, so that specific metrics and
dimensions can be unambiguously captured, and data sets can be easily passed between
different individuals for subsequent processing.
Multiple people preparing the same information to be delivered to different users can
cause potential data inconsistencies and waste effort. This wasted effort may not be
limited to the production of data – it may also include extensive research that must be
done if two sources provide different answers to what appears to be the same question.
Information gaps may exist where outputs passed from one process activity to the
next do not map readily into the data being utilized in the subsequent step. This can
occur if data passes between two groups using different information repositories which
may include different data elements or have different data definitions for the same
element.
Delivery that is ineffective or inconsistent with usage can cause excessive work to be
required to produce an end-product. This can occur when the data structure is too
complicated for the ability of the end user, requiring complex joins and manipulation to
produce results, or when intricate calculations need to be implemented. It can also
occur when the tool is too complicated for the intended user. An even worse
consequence is that the difficulty in data preparation may make the process vulnerable
to logic errors and data quality problems, thereby impacting the effectiveness of the
supported information processes.
In addition to promoting efficiency, understanding processes helps in one of the most daunting
tasks of the BI manager – identifying and quantifying business benefits. With the process
model, you can tie information to a set of analytical processes that optimize business rules.
You can tie the business rules to the operational processes that produce value for the
organization, and you can estimate the delta in the value of the operational processes.
Otherwise, you get Business Intelligence and Data Warehousing projects assessed for approval
based on the nebulous and non-committal “improved information” justification. The problem
with this is that you do not have any valid means of determining the relative benefits of
different BI projects (or even of comparing BI projects with non-BI projects). As a result,
projects get approved based on:
• Fictional numbers
• Who has the most political clout
• Who talks the loudest
This is definitely not the way to ensure that you maximize the business benefits of your data
warehousing and business intelligence resources. If you, as a BI manager, are going to be
evaluated based on the contribution of your team to the enterprise, it is essential that you
enforce appropriate discipline in linking projects to benefits, so that the highest-value
projects get implemented.
Information Value-Chain
Many of you are familiar with the concept of the Corporate Information Factory described by
Bill Inmon, which is well known in data warehousing circles. It depicts how data flows
through a series of processes, repositories, and delivery mechanisms en route to end users.
Driving these processes is the need to deliver information in a form suitable for its ultimate
purpose, which I model as an information value-chain. Rather than looking at how data flows
and is stored, the value chain depicts how value is added to source data to generate information
end-products.
Note that while traditional ‘manufacturing’ models consist only of the IT value-add steps (or
what is here referred to as architected value-add), this model looks at those components as only
providing a ‘hand-off’ to the end users. The users themselves must then take the delivered
information and expend whatever effort is needed to prepare the information end-products and
deploy them within the process. I tend to call the user value-add the ‘value gap’ because it
represents the gulf that has to be bridged in order for the business information users to be able
to perform their roles in their analytical information processes.
(Figure: raw data from operational systems flows through four categories of architected
(environmental) value-add:
• Integrational value-add: consistency, a rational entity-relationship structure, and
accessibility.
• Computational value-add: metrics, scoring, and segmentation at atomic levels.
• Structural value-add: aggregation, summarization, and dimensionality.
• BI tool value-add: simplified semantics, automated and pre-built data interaction
capabilities, and visualization.
At the information hand-off point, the architected environment gives way to user value-add:
the human effort of interacting with tools and coding data extract, manipulation, and
presentation processes. This is the value-gap, which must be bridged to produce the
information end-products deployed in analytical processes. The underlying information
infrastructure spans core ETL, analytical/aggregational engines and processes, and the user
interface delivery infrastructure. There is a distinct value chain for the information end-
products associated with each unique information activity across your analytical processes.)
When looking at the data manipulations needed to bridge the value-gap, you will find that a
substantial value-gap is not necessarily a bad thing, just like a small value-gap is not
necessarily a good thing. It is all relative to the dynamics of the overall process, the user
segments involved, and the organizational culture and paradigms. The key to the value gap is
that it should not just ‘happen’; it needs to be planned and managed. The process of
planning and managing the value gap corresponds to your information strategy.
Once a handoff point is established that specifies how the information is supposed to look to
the end-users, you then need to determine how best to generate those information deliverables.
The way this is done is to work backwards to assess how best to partition value-add among the
four environmental categories. Note that there are numerous trade-offs that are associated with
different categories of environmental value-add. The process of assessing and managing these
trade-offs in order to define the structure of your environment corresponds to the development
of your information architecture.
An information plan is then needed to move from a strategy and architecture to
implementation. Input into the planning process consists of the complete set of data
deliverables and architectural constructs. The planning process will then partition them out
into a series of discrete projects that will ultimately achieve the desired end-state and in the
interim provide as much value early on as possible.
A look at Information Strategy
The key issues associated with devising and implementing an information strategy are related
to managing the value gap. This gap must be bridged by real people, whose ability to
manipulate data is constrained by their skills and aptitudes. They must access information
elements using a specific suite of tools. They will have certain individual information accesses
that they must perform repeatedly, and certain ones that are totally unpredictable that they may
do once and never again. The nature of this gap will determine how much support and training
are needed, how effectively and reliably business processes can be implemented, and even
whether specific activities can or cannot be practically executed using the existing staff. By
understanding the value gap, cost-benefit decisions can be made which will direct the amount
of value that will need to be built into the pre-handoff information processes, and what is better
left to the ultimate information users.
When developing an information strategy, the first thing that needs to be documented is the
target end-state process vision. The information strategy needs to consider three sets of issues
and strike a workable balance:
(Figure: the three sets of issues to balance.
• User/Activity issues: training/learning curve; activity/process redesign required;
realignment of roles required; acquisition of skilled resources required.
• Tool issues: in use already vs. acquire; wide vs. narrow scope; power vs. ease of use;
best of breed vs. integrated suite.
• Information issues: development time and cost; load and data availability timing;
flexibility vs. value added.)
Based on the issues illustrated in the diagram, it is apparent that:
• Strategy development is focused around the mapping of the information end-products
associated with analytical information processes back to a set of information
deliverables corresponding to a set of information structures and delivery technologies.
• Implicit in strategy development is the resolution of cost-benefit issues surrounding
technology and systems choices.
• Responsibility for strategy is shared by both business functions and IT, and has close
ties to architecture development.
Once you have the target processes documented and activities identified, you will see that
strategy development is essentially the recognition and resolution of process tradeoffs. The
types of trade-offs you will have to consider will include:
• Trade-off of environmental value-add for user value-add
• Trade-off of dynamic computations within tools versus static computations within
data structures
• Trade-off of breadth of tool capabilities with ease of usage and adoption
• Trade-off of segment focus available with multiple targeted tools versus reduced
costs associated with fewer tools
• Trade-off of development complexity with project cost and completion time
• Trade-off of ETL workload with data delivery time
The trade-off of environmental value-add with user value-add is critical to the success or
failure of a BI initiative. To start off, a complete user inventory would need to be undertaken
to segment users based on their current skill levels. This would then need to be mapped into
roles in the end-state process. This will allow you to assess:
• Current user capabilities and the degree of productivity that can be expected.
• What training, coaching, and experience are necessary to expand users from their
current skill level to where they need to be to fulfill their intended process roles.
• Critical skill gaps that cannot be filled by the existing user community.
By shifting the information hand-off point to the right, users will need less technical skill to
generate their information end-products. This would reduce the need for training and
enhancing skills through hiring. However, this potentially increases development complexity
and ETL workload, which would increase development cost and data delivery times.
Another huge issue which will impact the trade-off of environmental value-add versus user
value-add is the stability of the information end-products, which is a critical consideration for
organizations that already have a population of skilled information users. Both value-add
scenarios will have both drawbacks and benefits associated with them. The key is to balance
reliability and organizational leverage versus cost and expediency.
Those who have been involved with companies with a strong information culture know that
information users can be extremely resourceful. Having previously operated on the user side
and made extensive use of fourth generation languages, I know it is amazing what kind of
results can be achieved by applying brute force to basic data. Since users can dynamically alter their
direction at the whim of their management, this is by far the most expedient way to get
anything done. It is also the least expensive (on a departmental level), since user development
does not carry with it the rigors of production implementation. Unfortunately, this has some
negative implications:
• Each user department must have a set of programming experts, forming
decentralized islands of skill and knowledge.
• Much is done that is repetitive across and within these islands. This is both labor
intensive and promotes inconsistency.
• Service levels across the organization are widely variable, and internal bidding wars
may erupt for highly skilled knowledge workers.
• User-developed processes may not be adequately tested or be sufficiently reliable to
have multi-million dollar decisions based on them.
• Documentation may be scant, and detailed knowledge may be limited to one or a
small number of individuals, thereby promoting high levels of dependency and
incurring significant risks.
Therefore, while expedient at the departmental level, this carries with it high costs at the
overall organizational level.
Building information value into production processes is a much more rigorous undertaking,
which carries with it its own benefits and drawbacks. It requires that much thought and effort
be expended up front in understanding user processes and anticipating their ongoing needs
over time. Therefore, this is a very deliberate process as opposed to an expedient one. It
requires significant process design, programming, and modeling work to produce the correct
information and store it appropriately in repositories for user access. It also entails risk, since
if the analysis done is poor or if the business radically changes, the value added through the
production processes may be obsolete and not useful after a short period of time, thereby never
recouping sunk costs.
However, there are also extremely positive benefits of implementing value-added processes in
a production environment.
• It reduces the value-gap that users must bridge, allowing user departments to utilize
less technically skilled individuals. This results in less need for training and
maximizes ability to leverage existing staffing.
• It increases consistency, by providing standard, high-level information building
blocks that can be incorporated directly into user information processes without
having to be rebuilt each time they are needed.
• It is reliable, repeatable, and controlled, thereby reducing integrity risk.
• It provides a metadata infrastructure which captures and communicates the
definition and derivation of each data element, thus minimizing definitional
ambiguity and simplifying research.
• It can dramatically reduce resource needs across the entire organization, both human
and computing, versus the repeated independent implementation of similar processes
across multiple departments.
Note that no reasonable solution will involve either all ‘user build’ or all ‘productionization’.
The key is understanding the trade-offs, and balancing the two. As you move more into
production, you increase your fixed costs. You will be doing additional development for these
information processes, and will operate those processes on an ongoing basis, expending
manpower and computing resources. The results will need to be stored, so there will be
continuing cost of a potentially large amount of DASD. When processes are built within the
user arena, costs are variable and occur only when and if the processes are actually executed.
However, these costs can quickly accumulate due to multiple areas doing the same or similar
processes, and they entail more risk of accuracy and reliability problems. This can actually
mean an even larger amount of DASD, since the same or similar data may be stored
repeatedly. The trade-off is that sufficient usage must be made of any production
summarizations and computations so that the total decrease in user costs and risks provides
adequate financial returns on the development and operational cost.
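As a rough illustration of that trade-off, the sketch below compares the fixed cost of a productionized summarization with the variable cost of repeated departmental builds. All figures are assumptions chosen only to show the shape of the break-even calculation, not real estimates.

```python
# Illustrative break-even comparison: one productionized process vs. repeated user builds.
# All cost figures are assumptions chosen for illustration; substitute your own estimates.

fixed_build_cost = 120_000      # one-time cost to design, build, and test the production process
annual_run_cost = 30_000        # ongoing operations, storage (DASD), and maintenance per year
per_dept_build_cost = 25_000    # cost for one user department to build the same result itself
per_dept_annual_cost = 15_000   # ongoing departmental effort to rerun and maintain it

def total_cost(years: int, departments: int, productionized: bool) -> int:
    """Cumulative cost over the planning horizon for either strategy."""
    if productionized:
        return fixed_build_cost + annual_run_cost * years
    return departments * (per_dept_build_cost + per_dept_annual_cost * years)

for depts in (2, 5, 10):
    for years in (1, 3, 5):
        prod = total_cost(years, depts, productionized=True)
        user = total_cost(years, depts, productionized=False)
        print(f"{depts} depts over {years} yrs: production={prod:,} vs user-built={user:,}")
```

With these assumed figures, productionization pays for itself only once enough departments use the result for long enough; with two departments it may never break even, which is exactly the judgment the trade-off requires.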
Depending on technologies used, data structures accessed, and complexity of algorithms,
implementing the same set of calculations in a production environment can take several times
longer than in an ad-hoc (user) environment, assuming implementers of equivalent
skill levels. This difference stems from formalized requirements gathering, project
methodology compliance, metadata requirements, documentation standards, production library
management standards, and more rigorous design, testing, and data integrity management
standards.
Savings occur due to the greater ease of maintenance of production processes. Since
derivation relationships within metadata enable impact analysis, identifying everything
affected by upstream changes, it is much easier to keep production processes
updated as inputs change over time. Also, built-in data integrity checking can often detect
errors prior to delivery of data to user applications, thereby avoiding reruns and reducing the
probability of bad data going out. For ad-hoc processes, the author must somehow find out
about data changes in advance, or else data problems may propagate through these processes
and may not be captured until after information has already been delivered to decision makers.
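As a concrete illustration of that point, the sketch below assumes derivation metadata is available as a simple mapping from each derived element to its source elements; impact analysis then reduces to a downstream graph traversal. The element names and the structure of the metadata store are invented for the example, not taken from any particular tool.

```python
# Minimal sketch of impact analysis over derivation metadata.
# Each derived element lists the elements it is computed from (illustrative names only).
derivations = {
    "gross_margin_pct": ["net_sales", "cost_of_goods"],
    "net_sales":        ["gross_sales", "returns"],
    "sales_rank":       ["net_sales"],
    "returns_rate":     ["returns", "gross_sales"],
}

def impacted_by(changed_element: str) -> set:
    """Return every downstream element whose derivation depends on the changed element."""
    impacted = set()
    frontier = [changed_element]
    while frontier:
        current = frontier.pop()
        for target, sources in derivations.items():
            if current in sources and target not in impacted:
                impacted.add(target)
                frontier.append(target)
    return impacted

# If the definition of 'returns' changes upstream, which outputs must be reviewed?
print(impacted_by("returns"))
# e.g. {'net_sales', 'returns_rate', 'gross_margin_pct', 'sales_rank'} (set order may vary)
```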
In some cases, trade-offs made will impact tool and data communication expenses. If data is
delivered in a relatively complete form, it merely needs to be harvested. This generally means
that what the user will pull from the information environment is either highly summarized data
or a small targeted subset of the record population. In situations where the data is raw and
substantial value needs to be added by the users in order to make the information useful, large
amounts of data may need to be either downloaded to a PC or transmitted to a mid-range or a
server for further manipulation. This can dramatically impact data communications bandwidth
requirements and related costs.
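A back-of-the-envelope comparison illustrates the bandwidth difference; the row counts and record size below are assumed purely for illustration.

```python
# Rough sizing of the two delivery styles; row counts and record size are assumed.
row_size_bytes = 200
raw_detail_rows = 10_000_000   # raw records a user must pull to add value locally
summary_rows = 5_000           # rows in a production-built summary of the same data

raw_mb = raw_detail_rows * row_size_bytes / 1_000_000
summary_mb = summary_rows * row_size_bytes / 1_000_000

print(f"Raw detail download:  {raw_mb:,.0f} MB per refresh")     # ~2,000 MB
print(f"Summarized delivery:  {summary_mb:,.1f} MB per refresh")  # ~1.0 MB
```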
For end-products that tend to change over time, consider providing users with a set of stable
components that they can dynamically assemble into the end-products they need. Changing
production processes requires significant lead time. If certain analytical metrics are likely to
change frequently, attempting to keep up with those changes could bog down your ETL
resources while still failing to be sufficiently responsive. A lot depends on the types of changes that could
occur, since some types of change could be handled merely by making the process table or
rules driven. For changes that mandate recoding, efficiency in making changes is related to the
nature of the technology used for ETL. In many cases, the usage of automated tools can
dramatically reduce turnaround time and resources for system changes and enhancements.
Regardless, the need for flexibility must always be considered when determining what
deliverables need to be handed over to end users and in what format.
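One way to read "table or rules driven" is sketched below: metric definitions live in a small rules table that can be edited without recoding the ETL job itself. The rule format and metric names are assumptions made for the example, not a prescription.

```python
# Sketch of a rules-driven calculation step: metric definitions are data, not code.
# Changing a threshold or adding a metric means editing the rules table, not the ETL program.
metric_rules = [
    # (output column, expression over input fields) -- illustrative definitions
    ("gross_margin", lambda r: r["net_sales"] - r["cost_of_goods"]),
    ("margin_pct",   lambda r: (r["net_sales"] - r["cost_of_goods"]) / r["net_sales"]),
    ("high_value",   lambda r: r["net_sales"] > 100_000),
]

def apply_rules(record: dict) -> dict:
    """Run every configured metric rule against one input record."""
    derived = dict(record)
    for name, rule in metric_rules:
        derived[name] = rule(record)
    return derived

sample = {"net_sales": 250_000, "cost_of_goods": 175_000}
print(apply_rules(sample))
# {'net_sales': 250000, 'cost_of_goods': 175000, 'gross_margin': 75000, 'margin_pct': 0.3, 'high_value': True}
```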
In addition to the issue of whether to do calculations in production or leave them to the user, an
even more vexing issue is how to structure data for retrieval. The complexity of the access path is
often a more impenetrable barrier to information adoption than having to perform
complex calculations. If tables are highly normalized, an extended series of table joins might
be necessary in order to pull together data needed for analysis. To simplify the user interface,
there are two alternatives. We can bury the joins into a tool interface to try to make them as
simple and transparent as possible, or else we can introduce denormalized tables. Tool-based
dynamic joins may be more flexible, but do not provide any performance benefit.
Denormalized tables provide a significant performance benefit, but at the cost of additional
DASD and the requirement of recoding ETL and restructuring databases if significant changes
occur in the business that require different views of data. Again, there will generally not be an
either/or solution, but rather a blending that takes into account which things are most stable and
which things require quickest access.
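To make the two alternatives concrete, the sqlite3 sketch below shows a normalized pair of tables, a view that buries the join so users see a single simple object, and a denormalized table materialized from the same join at load time. The schema, table names, and data are invented for illustration.

```python
# Sketch: hiding joins behind a view vs. materializing a denormalized table (sqlite3, invented schema).
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE sale     (sale_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL);

    INSERT INTO customer VALUES (1, 'EAST'), (2, 'WEST');
    INSERT INTO sale VALUES (10, 1, 500.0), (11, 1, 250.0), (12, 2, 900.0);

    -- Alternative 1: the join stays dynamic, buried in a view the user queries directly.
    CREATE VIEW v_sales_by_region AS
        SELECT c.region, SUM(s.amount) AS total_sales
        FROM sale s JOIN customer c ON c.cust_id = s.cust_id
        GROUP BY c.region;

    -- Alternative 2: the same result is denormalized into a physical table by the load process,
    -- trading extra DASD and a rebuild step for faster, simpler retrieval.
    CREATE TABLE sales_by_region AS SELECT * FROM v_sales_by_region;
""")

print(con.execute("SELECT * FROM v_sales_by_region").fetchall())
print(con.execute("SELECT * FROM sales_by_region").fetchall())
```

The view keeps a single copy of the data but pays the join cost on every query; the materialized table answers instantly but must be rebuilt whenever the underlying data or the required view of it changes.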
Critical decisions will need to be made with respect to tools. Tools with more power tend to be
harder to learn. In some cases, providing tools that are not consistent with the
corresponding user segment’s technical abilities can cause adoption resistance and ultimate
failure. There are also trade-offs to be made in the number of tools offered. The more
individual tools in the suite, the more targeted they can be towards their intended segment.
However, this leads to increased support and training costs and reduced staff mobility and
interchangeability. In some cases, a single-vendor suite can be used that provides targeting of
capabilities while simplifying adoption by providing a consistent look and feel. This may
result in a compromise in functionality, since many of the best of breed individual solutions do
not have offerings that cover the complete spectrum of end-user requirements.
In a dynamic business environment, time is a critical factor. Here we need to look at time from