Table of Contents
Chapter 1: A modern trinity
1.1 Why now? Why Business unIntelligence?
1.2 What’s it all about, trinity?
1.3 Pandora’s Box of information
1.4 Process, process every where
1.5 There’s nowt so queer as folk
1.6 Architecting the biz-tech ecosystem
1.7 Conclusions
Chapter 2: The biz-tech ecosystem
2.1 The birth of the Beast
2.2 Beauty and the Beast—the biz-tech ecosystem
2.3 Tyranny of the Beast
2.4 In practice—all change in the organization
2.5 Conclusions
Chapter 3: Data, information and the hegemony of IT
3.1 The knowledge pyramid and the ancient serpent of wisdom
3.2 What is this thing called data?
3.3 From information to data
3.4 The modern meaning model—m3
3.5 Database daemons & delicate data models
3.6 The importance of being information
3.7 IDEAL architecture (1): Information, Structure/Context dimension
3.8 Metadata is two four-letter words
3.9 In practice—focusing IT on information
3.10 Conclusions
Chapter 4: Fact, fiction or fabrication
4.1 Questions and answers
4.2 Where do you come from (my lovely)?
4.3 It’s my data, and I’ll play if I want to
4.4 I’m going home… and I’m taking my data with me
4.5 Information from beyond the Pale
4.6 Tales of sails and sales
4.7 A new model for information trust
4.8 IDEAL architecture (2): Information, Reliance/Usage dimension
4.9 In practice—(re)building trust in data
4.10 Conclusions
Chapter 5: Data-based decision making
5.1 Turning the tables on business
5.2 The data warehouse at the end of the universe
5.3 Business intelligence—really?
5.4 Today’s conundrum—consistency or timeliness
5.5 Most of our assumptions have outlived their uselessness
5.6 IDEAL architecture (3): Information, Timeliness/Consistency dimension
5.7 Beyond the data warehouse
5.8 REAL architecture (1): Core business information
5.9 In practice—upgrading your data warehouse
5.10 Conclusions
Chapter 6: Death and rebirth in the information explosion
6.1 Data deluge, information tsunami
6.2 What is big data and why bother?
6.3 Internal reality mirrors the external
6.4 A primer on big data technology
6.5 Information—the tri-domain logical model
6.6 REAL architecture (2): Pillars replace layers
6.7 In practice—bringing big data on board
6.8 Conclusions
Chapter 7: How applications became apps and other process peculiarities
7.1 Hunter-gatherers, farmers and industrialists
7.2 From make and sell to sense and respond
7.3 Process is at the heart of decision making
7.4 Stability or agility (also known as SOA)
7.5 Keeping up with the fashionistas
7.6 IDEAL architecture (4), Process
7.7 REAL architecture (3), The six process-ations
7.8 In practice—implementing process flexibility
7.9 Conclusions
Chapter 8: Insightful decision making
8.1 Business intelligence (the first time)
8.2 Information—some recent history
8.3 Copyright or copywrong
8.4 I spy with my little eye something beginning…
8.5 The care and grooming of content
8.6 A marriage of convenience
8.7 Knowledge management is the answer; now, what was the question?
8.8 Models, ontologies and the Semantic Web
8.9 In practice—finally moving beyond data
8.10 Conclusions
Chapter 9: Innovation in the human and social realm
9.1 Meaning—and the stories we tell ourselves
9.2 Rational decision making, allegedly
9.3 Insight—engaging the evolved mind
9.4 Working 9 to 5… at the MIS mill
9.5 Enter prize two dot zero
9.6 People who need people…
9.7 IDEAL architecture (5): People
9.8 In practice—introducing collaborative decision making
9.9 Conclusions
Chapter 10: Business unIntelligence—whither now?
10.1 IDEAL architecture (6): Summary
10.2 REAL architecture (4): Implementation
10.3 Past tense, future perfect
Business unIntelligence
Insight and innovation
beyond analytics and big data
Dr. Barry Devlin
Technics Publications, LLC
New Jersey
Published by:
Technics Publications, LLC
2 Lindsley Road
Basking Ridge, NJ 07920 U.S.A.
www.technicspub.com
Edited by Carol Lehn
Cover design by Mark Brye
All rights reserved. No part of this book may be reproduced or transmitted in any form or by
any means, electronic or mechanical, including photocopying, recording or by any information
storage and retrieval system without written permission from the publisher, except for the
inclusion of brief quotations in a review.
The author and publisher have taken care in the preparation of this book, but make no
expressed or implied warranty of any kind and assume no responsibility for errors or
omissions. No liability is assumed for incidental or consequential damages in connection with
or arising out of the use of the information or programs contained herein.
All trade and product names are trademarks, registered trademarks, or service marks of their
respective companies, and are the property of their respective holders and should be treated
as such.
Artwork used by permission. See www.BusinessunIntelligence.com for picture credits.
Copyright © 2013 by Dr. Barry Devlin
This book is printed on acid-free paper.
ISBN, print ed. 978-1-935504-56-6
ISBN, Kindle ed. 978-1-935504-57-3
ISBN, ePub ed. 978-1-935504-58-0
First Printing 2013
Library of Congress Control Number: 2013948310
ATTENTION SCHOOLS AND BUSINESSES: Technics Publications books are available at quantity
discounts with bulk purchase for educational, business, or sales promotional use. For information,
please email Steve Hoberman, President of Technics Publications, at me@stevehoberman.com.
CHAPTER 5
Data-based decision making
“Most executives, many scientists, and almost all business school
graduates believe that if you analyze data, this will give
you new ideas. Unfortunately, this belief is totally wrong.
The mind can only see what it is prepared to see.”
Edward de Bono

“The purpose of computing is insight, not numbers.”
Richard W. Hamming
By the mid-1980s, corporate IT figured that they had a rea-
sonably good handle on building and running the operational
systems responsible for the day-to-day processes of increas-
ingly automated businesses. The majority of companies pro-
grammed their own systems, mostly in Cobol, computerizing the
financially important and largely repetitive operational activities,
one business activity at a time. Of course, the applications IT built
weren’t perfect and there were backlogs in development, but the
problems were understood and solutions seemingly in sight. We
will return to the illusion that the operational environment was
largely complete in Chapter 7, but for now we’ll focus on data.
Attention thus turned to a rather different business need: MIS or
decision support. We’ve already seen how MIS grew in the 1970s
through point solutions. IT saw two problems worth tackling. First,
from the business view, there was growing inconsistency among
the results they were getting. Second, the explosion of extracts
from the operational systems was causing IT serious headaches. An
integrated solution to ensure consistency and reduce extract loads
was required. And a modern technology—relational databases—
was seen as the way to do it.
In 1985, I defined an architecture for business reporting and analy-
sis in IBM (Devlin & Murphy, 1988), which became a foundation of
data warehousing. At the heart of that architecture and data ware-
housing in general, is the need for a high-quality, consistent store of
historically complete and accurate data. Defining and delivering it
turned out to be tougher and slower than anybody imagined. Over
the succeeding decades, the focus shifted back and forth between
these consistency goals and timeliness—another eternal business
need. The enterprise-oriented data warehouse was praised for
quality or excoriated for never-ending, over-budget projects. Data
marts were introduced for immediate business value but soon de-
rided as quick and dirty. The pendulum continues to swing. Data
warehousing soon begat business intelligence, drove advances in
data management, benefited from developments in information
technology, and is now claimed to be replaced by business analyt-
ics. But analytics and big data still focus on data rather than infor-
mation, numbers rather than insight. And even there, the increasing
role of simulation and modeling poses questions about what we are
trying to achieve.
Ackoff’s issues with MIS have not gone away. Does better and
more data drive “improved” decisions? Is de Bono wrong to say that
new thinking never emerges from analyzing data? We focus on
bigger, faster and better data and continue to dream of a single
version of the truth. Behind a superficially unchanging architecture,
we are forced to address the conundrum of consistency vs. timeli-
ness—accepting the fundamental inconsistency that characterizes
the real world and the basic inability of humans to think and decide
at the speed of light. We must reexamine the postulates at the
foundation of data warehousing and business intelligence. We find
them wanting.
With this, we reach the third and final dimension of information in
the IDEAL architecture. The evolution of data-based decision mak-
ing has reached a punctuation point. A focus on data—in its old
restricted scope—as the basis for all decision making has left busi-
ness with increasingly restrictive tools and processes in a rapidly
evolving biz-tech ecosystem. It’s time to introduce the core compo-
nents of a new logical architecture—a consolidated, harmonized
information platform (REAL)—that is the foundation for a proper
balance between stability and innovation in decision making.
5.1 Turning the tables on business
What do you recall—or, if you’re some-
what younger than me, imagine—of
1984? Other than George Orwell’s
dystopian novel. Los Angeles hosted the Olympic
Games. In the UK, a year-long coal miners’ strike
began. The Space Shuttle Discovery made her
maiden voyage. More than 23,000 people died of
gas poisoning in the Bhopal Disaster. Terms of En-
dearment won five Oscars. The Bell System of
companies was broken up by the U.S. Justice De-
partment after 107 years. The Sony Discman was
launched. Ronald Reagan won a second term as
President of the United States. The first Apple
Macintosh went on sale with 128kB of RAM and a single
floppy disk drive. I’ve focused in on this year because it was in 1984
that Teradata released the DBC/1012, the first relational database
MPP (massively parallel processing) machine aimed squarely at
DSS applications. Oracle introduced read consistency in its version
4. And 1984 also marked the full year hiatus between the an-
nouncement of DB2 for MVS by IBM and its general availability in
April, 1985. In summary, this was the year that the technology re-
quired to move MIS to a wider audience finally arrived.
Until that time, MIS were hampered by a combination of hardware
and software limitations. As John Rockart noted in a paper intro-
ducing critical success factors (CSFs), the most common approach
to providing information to executives was via a multitude of re-
ports that were by-products of routine paperwork processing sys-
tems for middle managers and, most tellingly, “where the information
subsystem is not computer-based, the reports reaching the top are often
typed versions of what a lower level feels is useful” (Rockart, 1979). My
experience in IBM Europe in 1985 was that use of the existing
APL-based system (Section 4.1) was mainly through pre-defined
reports, that business users needed significant administrative sup-
port, and that tailoring of reports for specific needs required IT
intervention. The new relational databases were seen as the way to
rejuvenate MIS. Business users were initially offered SQL (Struc-
tured Query Language), a data manipulation language first defined
[Figure 5-1: DBC/1012 Data Base Computer System, 1987 Teradata Brochure]
as SEQUEL at IBM in the 1970s (Chamberlin & Boyce, 1974) and
QMF (Query Management Facility), because the language was well
defined and basic queries were seen as simple enough for business
analysts to master. Views—the output of relational queries—
provided a means to restrict user access to subsets of data or to
hide from them the need for table joins (Devlin & Murphy, 1988)
and the more arcane reaches of SQL. Simpler approaches were also
needed, and QBE (Query by Example), developed at IBM (Zloof,
1975), presaged the graphical interfaces common today.
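To make the idea concrete, a purely illustrative Python sketch using the standard sqlite3 module follows; the tables, the view name and the sample figures are invented, but the pattern is the one described above: the view pre-joins the tables so that the analyst writes only a simple SELECT.

    import sqlite3

    # In-memory database standing in for the corporate RDBMS of the era.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER,
                             amount REAL, order_date TEXT);
        -- The view pre-joins the tables and exposes only a business-friendly subset.
        CREATE VIEW v_sales_by_customer AS
            SELECT c.name, c.region, o.amount, o.order_date
            FROM customers AS c JOIN orders AS o ON o.cust_id = c.cust_id;
    """)
    conn.execute("INSERT INTO customers VALUES (1, 'Acme Corp', 'EMEA')")
    conn.execute("INSERT INTO orders VALUES (100, 1, 2500.0, '1985-06-30')")

    # The analyst's query needs no join syntax and never sees the key columns.
    for row in conn.execute("SELECT name, region, amount FROM v_sales_by_customer"):
        print(row)   # ('Acme Corp', 'EMEA', 2500.0)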
This consensus was not driven by user requirements or technologi-
cal fit alone. Relational database vendors of the time were in need
of a market. Their early databases performed poorly vs. hierarchical
or network databases for operational applications, in many cases.
MIS, still a relatively immature market, offered an ideal opportuni-
ty. From the start, a few new vendors—notably Teradata—
optimized their technology exclusively for this space, while other
RDBMS vendors chose a more general-purpose path to support
both types of requirements. However, given their deeper under-
standing of transactional processing, these vendors were more
successful in improving their technology for operational needs,
such as update performance, transactional consistency, and work-
load management, already well understood from hierarchical or
network databases. The relational database market thus diverged
in the 1980s, with large-scale, data-driven DSS, pioneered mainly
by Teradata, and other vendors successful at small- and medium-
scale. A continued focus on transactional requirements allowed
RDBMSs to eventually overtake hierarchical and network data-
bases for new operational developments, in a classical example of
the working of a disruptive technology as defined in The Innovator’s
Dilemma (Christensen, 1997). The outcome was that ERP, SCM,
and similar enterprise-wide initiatives throughout the 1990s were
all developed in the relational paradigm, which has become the de
facto standard for operational systems since then. From the late
1990s on, the major general-purpose RDBMS vendors increased
focus on MIS function to play more strongly in this market.
Specialized multidimensional cubes and models also emerged
through the 1970s and 1980s to support the type of drill-down and
pivoting analysis of results widely favored by managers. The term
online analytical processing (OLAP) was coined in the early 1990s
(Codd, et al., 1993) to describe this approach. It is implemented on
relational databases (ROLAP), specialized stores optimized for
multidimensional processing (MOLAP), or in hybrid systems
(HOLAP). As a memorable contrast to OLTP (online transaction
processing), the term OLAP gained widespread acceptance and
continues to be used by some vendors and analysts as a synonym
for BI, MIS or DSS to cover all informational processing.
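The drill-down and pivoting that OLAP supports can be sketched in a few lines of Python; the sales facts and dimension values below are invented, and the dictionary stands in for the pre-computed cells of a cube.

    from collections import defaultdict

    # Invented sales facts aggregated across two dimensions: region x quarter.
    facts = [
        ("EMEA", "Q1", 120.0), ("EMEA", "Q2", 95.0),
        ("APAC", "Q1", 60.0),  ("APAC", "Q2", 80.0),
    ]

    cube = defaultdict(float)
    for region, quarter, amount in facts:
        cube[(region, quarter)] += amount   # cell-level aggregation
        cube[(region, "ALL")] += amount     # roll up over quarters
        cube[("ALL", quarter)] += amount    # roll up over regions

    # "Pivoting" is simply choosing which dimension goes on rows vs. columns;
    # "drilling" moves between the ALL roll-ups and the detailed cells.
    print(cube[("EMEA", "ALL")])   # one region across all quarters -> 215.0
    print(cube[("ALL", "Q1")])     # one quarter across all regions -> 180.0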
5.2 The data warehouse at the end of the universe¹
Beyond the report generation culture and lack of easily used or
understood tools for exploring data that relational technolo-
gy was expected to address, another key issue that had
emerged was the multiplicity of inconsistent analytic data sets be-
ing created throughout the organization. Both business and IT
were struggling with this. Over time, the problem and the solution
became a mantra for BI: a single version of the truth.
Operational applications are optimized for particular tasks within
the functional or divisional context in which they were built. Banks
have separate systems for checking (current) accounts and credit
cards. Accounts receivable and accounts payable run on different
databases. However, they also contain overlapping data, which may
be defined differently or may be inconsistent at certain moments in
the process. Acquisitions lead to multiple systems across geograph-
ical regions doing the same tasks. Furthermore, data sourced from
the same operational system through different pathways may give
differing results.

¹ The Restaurant at the End of the Universe (1980) is the second book in the series The Hitchhiker’s Guide to the Galaxy by Douglas Adams.

To this day, there continues to exist a tacit assumption that the correct, and perhaps only, approach to providing data for decision making is through a separate set of data copied from the operational and other source environments. Now, knowing the business and technical reasons for the original separation, we must surely ask if the assumption is still valid.

Figure 5-2 is a simplified
view of a typical environment; tracing the
flow of data fragments via the various num-
bered management information systems (de-
veloped by different teams for divergent
purposes) to the business users gives some
idea of the potential for data disorientation.
All of this leaves the business with difficult
reconciliation problems—even getting a sin-
gle, consistent list of customers can be a chal-
lenge. The result is inconsistent and incorrect
business decisions. Unseemly disputes arise
in management meetings over the validity of
differing reports. IT is blamed and tasked by
irate executives to explain and fix these in-
consistencies and simultaneously required to
deliver ever more extracts for new MIS and
reporting needs. Add the inefficiency of the
same data being extracted from overworked operational systems
again and again, and the IT cup doth flow over (Devlin, 1996).
Enter the data warehouse
IBM faced the same problems in its internal IT systems, and in
1985, Paul Murphy and I were tasked to define a solution. The term
data warehouse was conceived in this internal work, and based upon
it, we published the first data warehouse architecture in 1988,
shown in Figure 5-3 (Devlin & Murphy, 1988). It proposed a “Busi-
ness Data Warehouse (BDW)… [as] the single logical storehouse of all
the information used to report on the business… In relational terms, a
view / number of views that…may have been obtained from different
tables.” The BDW was largely normalized, its data reconciled and
cleansed through an integrated interface to operational systems.
Among the first things that we and other data warehouse builders
discovered was that cobbling together even a single, consistent list
of customers or products, for example, was hard work. Operational
systems that were never designed to work together didn’t. Even
when individually reliable, these systems failed to dependably de-
liver consistency. With different meanings, contexts and timings for
multiple sources, data reconciliation was expensive. The conclusion
was that operational applications could not be fully trusted; they
[Figure 5-2: A typical MIS environment]
contained data that was incomplete, often
inaccurate, and usually inconsistent across
different sources. As a result, the data
warehouse was proposed as the sole place
where a complete, accurate and consistent
view of the business could be obtained.
A second cornerstone of the architecture
was that in order to be useful to and usable
by business people, data must have a
framework describing what it means, how it
is derived and used, who is responsible for
it, and so on—a business data directory. This
is none other than metadata, sourced from
operational systems’ data dictionaries and
business process definitions from business
people. Data dictionaries were components
of or add-ons to the hierarchical and net-
work databases of the time (Marco, 2000).
However, they typically contained mostly
technical metadata about the fields and
relationships in the database, supplemented by basic descriptions,
written by programmers for programmers. Making data as reliable
as possible at its source is only step one. When information about
the same topic comes from different sources, understanding the
context of its creation and the processes through which it has
passed becomes a mandatory second step.
For this process, enterprise-level modeling and enterprise-wide IT
are needed. Over the same period, the foundations of information
architecture were established (Zachman, 1987), (Evernden, 1996),
leading to the concept of enterprise data models (EDM), among
other model types. An enterprise data model has become a key
design component of data warehousing metadata, although its
definition and population has proven to be somewhat problematic
in practice. A key tenet was that the EDM should be physically in-
stantiated as fully as possible in the data warehouse to establish
agreed definitions for all information. It was also accepted that the
operational environment is too restricted by performance limita-
tions and too volatile to business change to allow instantiation of
[Figure 5-3: First data warehouse architecture, based on Devlin & Murphy, 1988]
the EDM there. The data models of operational applications were
fragmented, incomplete and disjointed, so the data warehouse
became the only reliable source of facts.
The final driver for the data warehouse was also the reason for its
name. It was a place to store the historical data that operational
systems could not keep. In the 1980s, disk storage was expensive,
and database performance diminished rapidly as data volumes
increased. As a result, historical data was purged from these sys-
tems as soon as it was no longer required for day-to-day business
needs. Perhaps more importantly, data was regularly overwritten
as new events and transactions occurred. For example, an order for
goods progresses through many stages in its lifecycle, from provi-
sional to accepted, in production, paid, in transit, delivered and
eventually closed. These many stages (and this is a highly simplified
example) are often represented by status flags and corresponding
dates in a single record, with each new status and date overwriting
its predecessor. The data warehouse was to be the repository
where all this missing history could be stored, perhaps forever.
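A small, purely illustrative Python sketch of the contrast: the operational pattern overwrites the status and date in place, while the warehouse pattern appends a dated record for each change so that any past state can be reconstructed. The order data is invented.

    from datetime import date

    # Operational style: one row per order, each new status overwrites the old.
    order = {"order_id": 42, "status": "provisional", "status_date": date(1988, 1, 5)}
    order.update(status="accepted", status_date=date(1988, 1, 9))   # history lost

    # Warehouse style: one row per status change, nothing is overwritten.
    order_history = [
        {"order_id": 42, "status": "provisional", "status_date": date(1988, 1, 5)},
        {"order_id": 42, "status": "accepted",    "status_date": date(1988, 1, 9)},
    ]

    # Any past state can now be replayed, e.g. the status as of 6 January 1988.
    as_of = date(1988, 1, 6)
    latest = max((r for r in order_history if r["status_date"] <= as_of),
                 key=lambda r: r["status_date"])
    print(latest["status"])   # -> provisional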
The data mart wars of the 1990s and beyond
While envisaging a single logical data repository accessed through
relational views is straightforward, its physical implementation
was—certainly in the 1990s, and for many years—another matter.
Database performance, extract/transform/load (ETL) tooling, data
administration, data distribution, project size, and other issues
quickly arose.

The trustworthiness of operational data—its completeness, consistency and cleanliness—although still a concern for many BI implementations, is much improved since the 1980s. Enterprise data modeling plays a stronger operational role, with enterprise-scope ERP and SCM applications now the norm. The data warehouse is no longer the only place where the EDM can be instantiated and populated with trustworthy data. And yet, the goal of a single version of the truth seems to be more remote than ever, as big data brings ever more inconsistent data to light.

The horizontal divisions in the BDW in Figure 5-3 became obvious architectural boundaries for implementation, and
vendors began to focus on the distinc-
tions between summary/enhanced data
and raw/detailed data. The former is seen
by users and typically provides obvious
and immediate business value. The latter
is, at least initially, the primary concern of
IT and delivers less obvious and immedi-
ate business value, such as data integra-
tion and quality. In contrast to the user-
unfriendly concept of a warehouse, a data
mart—optimized for business users—
sounded far more attractive and inviting.
As a result, many vendors began to pro-
mote independent data marts in the 1990s
as largely stand-alone DSS environments
based on a variety of technologies and
sourced directly from the operational
applications.
Their attraction was largely based on the
lengthy timeframe for and high cost of
building the integrated data store, by
then called the enterprise data warehouse
(EDW). Business users with urgent deci-
sion-support needs were easily convinced. Architecturally, of
course, this approach was a step backwards. If the data warehouse
was conceived to reduce the number and variety of extracts from
the operational environment—often described as spaghetti—
independent data marts significantly reversed that goal. In fact,
except perhaps in terms of the technology used, such marts were
largely identical to previous copies of data for DSS.
Many data warehouse experts consider independent data marts to
be unnecessary political concessions that drain the IT budget. Ven-
dors promote them for speed and ease of delivery of business value
in a variety of guises. Data warehouse appliances, described below,
are often promoted by vendors with data mart thinking. Similar
approaches are often used to sell data analytic tools that promise
rapid delivery of reports, graphs and so on without having to “both-
er with” an enterprise data warehouse. Independent data marts
Warehouse or mart
The terms data warehouse, enterprise
data warehouse and data mart are
much confused and abused in common
parlance. For clarity, I define:
Data warehouse:Data warehouse:Data warehouse:Data warehouse: the data collection,
management and storage environment
supporting MIS and DSS.
Enterprise data warehouse (EEnterprise data warehouse (EEnterprise data warehouse (EEnterprise data warehouse (EDW):DW):DW):DW): a
detailed, cleansed, reconciled and mod-
eled store of cross-functional, historical
data as part of a data warehouse.
Data mart:Data mart:Data mart:Data mart: a set of MIS data optimized
and physically stored for the convenience
of a group of users.
In a layered data warehouse, dependent
data marts are fed from the EDW and
independent marts are discouraged.
Business unIntelligence | 124
often deliver early business value. However, they also drive medi-
um and longer term costs, both for business users, who have to deal
with incompatible results from different sources, and for IT, who
must maintain multiple data feeds and stores, and firefight exten-
sively on behalf of the business when inconsistencies arise. On the
other hand, independent data marts may be seen as complemen-
tary to an EDW strategy, allowing certain data to be made available
more quickly and in technologies other than relational databases—
a characteristic that is increasing in importance as larger and more
varied data sources are required for decision support.
Another approach, known as dependent data marts, is to physically
instantiate subsets of the EDW, fed from the consistent raw data
there and optimized for the needs of particular sets of users. This
approach was adopted by practitioners who understood the enor-
mous value of an integrated and well-modeled set of base data and
favored centralized control of the data resource (Inmon, 1992).
From the early 1990s, many data warehouse projects attempting
to adhere to the stated data quality and management goals of the
architecture were severely limited by the performance of general
purpose databases and moved to this hybrid or layered model
(Devlin, 1996), as depicted in Figure 5-4, where dependent data
marts are sourced from the EDW and treat-
ed as an integral part of the warehouse en-
vironment. While addressing query
performance needs, as well as providing
faster development possibilities, the down-
side of this approach, however, is that it
segments the data resource both vertical-
ly—between EDW and data marts—and
horizontally—between separate marts. Fur-
thermore, it adds another ETL layer into the
architecture, with added design and mainte-
nance costs, as well as additional
runtime latency in populating the data
marts. However, many vendors and con-
sultants continue to promote this layered
approach to ensure query performance as a
way to isolate data in a mart, and/or shorten
development project timelines.

[Figure 5-4: Layered DW architecture, based on Devlin, 1996]

A few vendors, notably Teradata aided by its purpose-built parallel database,
pursued the goal of a single, integrated physical implementation of
the original architecture with significant success.
The ODS and the virtual data warehouse
The simple and elegant layered architecture shown in Figure 5-4,
despite its continued use by vendors and implementers of data
warehouses, was further compromised as novel business needs,
technological advances, and even marketing initiatives added new
and often poorly characterized components to the mix. In the mid-
’90s, the operational data store (ODS²) was introduced as a new
concept (Inmon, et al., 1996) integrating operational data in a sub-
ject-oriented, volatile data store, modeled along the lines of the
EDW. First positioned as part of the operational environment, it
became an integral part of many data management architectures,
supporting closer to real-time, non-historical reporting. Although
still seen regularly in the literature, the term ODS has been appro-
priated for so many different purposes that its original meaning is
often lost. Nonetheless, the concept supports an increasingly im-
portant near real-time data warehouse workload, albeit one that is
implemented more like an independent data mart and with limited
appreciation of the impact of the extra layering involved.
Another long vilified approach to data warehousing is the virtual
data warehouse—an approach that leaves all data in its original
locations and accesses it through remote queries that federate
results across multiple, diverse sources and technologies. An early
example was in IBM’s Information Warehouse Framework an-
nouncement in 1991, where EDA/SQL³
from Information Builders
Inc. (IBI) provided direct query access to remote data sources. The
approach met with fierce opposition throughout the 1990s and
early 2000s from data warehouse architects, who foresaw signifi-
cant data consistency—both in meaning and timing—problems,
security concerns, performance issues, and impacts on operational
systems. The concept re-emerged in the early 2000s as Enterprise
Information Integration (EII), based on research in schema mapping
(Haas, et al., 2005), and applied to data warehousing in IBM DB2
Information Integrator (Devlin, 2003).

² It has long amused me that ODS spoken quickly sounds like odious ☺
³ Since 2001, this technology is part of iWay Software, a subsidiary of IBI.

The approach has been re-
cently rehabilitated—now called data virtualization, or sometimes
data federation—with an increased recognition that, while a physical
consolidation of data in a warehouse is necessary for consistency
and historical completeness, other data required for decision sup-
port can remain in its original location and be accessed remotely at
query time. Advantages include faster development using virtual-
ization for prototyping and access to non-relational and/or real-
time data values. This change in sentiment is driven by growing
business demands for (i) increased timeliness of data for decision
making and (ii) big data from multiple, high volume, and often web-
based sources (Halevy, et al., 2005). Technologically, both of these
factors militate strongly against the copy-and-store layered archi-
tecture of traditional data warehousing. In addition, the mashup,
popular in the world of Web 2.0, which promotes direct access
from PCs, tablets, etc., to data combined on the fly from multiple
data sources, is essentially another subspecies.
Gartner has also promoted the approach as a logical data warehouse
(Edjlali, et al., 2012), although the term may suggest the old virtual
DW terminology, where physical consolidation is unnecessary. In
modern usage, however, more emphasis is placed on the need for
an underlying integration or canonical model to ensure consistent
communication and messaging between different components. In
fact, it has been proposed that the enterprise data model can be the
foundation for virtualization of the data warehouse as well as, or
instead of, its instantiation in a physical EDW (Johnston, 2010). By
2012, data virtualization achieved such respectability that long-
time proponents of the EDW accepted its role in data management
(Swoyer, 2012). Such acceptance has become inevitable, given the
growing intensity of the business demands for agility mentioned
above. Furthermore, technology has advanced. Virtualization tools
have matured. Operational systems have become cleaner. Imple-
mentations are becoming increasingly common (Davis & Eve,
2011) in operational, informational and mixed scenarios. Used cor-
rectly and with care, data virtualization eliminates or reduces the
need to create additional copies of data and provides integrated,
dynamic access to real-time information and big data in all its forms.
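The federation idea at the heart of data virtualization can be sketched as a query that fans out to independent sources and merges the results at query time, with no persistent copy created. In the purely illustrative Python sketch below, the two source functions are hypothetical stand-ins for a warehouse connector and a real-time feed, and the figures are invented.

    # Hypothetical connectors; in practice these would be remote queries.
    def query_warehouse(region):
        # Historical, reconciled figures from the EDW.
        return [{"region": region, "year": 2012, "revenue": 1_200_000}]

    def query_live_feed(region):
        # Current figures from an operational or web-based source.
        return [{"region": region, "as_of": "2013-06-30", "revenue": 4_250}]

    def federated_revenue(region):
        # The "virtual" layer: combine sources on the fly into one result set,
        # leaving the data where it lives.
        return query_warehouse(region) + query_live_feed(region)

    print(federated_revenue("EMEA"))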
The advance of the appliances
The data mart wars of the 1990s were reignited early in the new
millennium under the banner of analytic / data warehouse / data-
base appliances with the launch of Netezza’s first product in 2002.
By mid-decade the category was all the rage, as most appliance
vendors, such as Netezza, DATAllegro, Greenplum, Vertica and
ParAccel (before their acquisitions) sold their appliances as typical
independent data marts. As we entered the teens, the appliance
had become mainstream as the main players were acquired by
hardware and software industry giants. More user-oriented analyt-
ic tools, such as QlikView and Tableau, can also be seen as part of
the data mart tradition supporting substantial independent data
stores on PCs and servers. All these different appliances were ena-
bled by—and further drove—a combination of advances in database
hardware and software that provided substantial performance
gains at significantly lower prices than the traditional relational
database for informational processing. These advances have oc-
curred in three key technology areas, combining to create a “per-
fect storm” in the data warehousing industry in the past decade.
Parallel processing—SMP and MPP hardware
The growth in processing power through faster clock speeds and
larger, more complex processors (scale-up) has been largely super-
seded by a growth in the number of cores per processor, proces-
sors per blade, and servers per cluster operating in parallel (scale-
out). Simplistically, there are two approaches to parallel processing.
First—and most common, from dual-core PCs all the way to IBM
System z—is symmetric multi-processing (SMP) where multiple
processors share common memory. SMP is well understood and
works well for applications from basic word processing to running a
high performance OLTP system like airline reservations. Problems
amenable to being broken up into smaller, highly independent parts
that can be simultaneously worked upon can benefit greatly from
massively parallel processing (MPP), where each processor has its
own memory and disks. Many BI and analytic procedures, as well as
supercomputer-based scientific computing, fall into this category.
MPP for data warehousing was pioneered by Teradata from its
inception, with IBM also producing an MPP edition of DB2, since
the mid-1990s. Such systems were mainly sold as top-of-the-range
data warehouses and regarded as complex and expensive. MPP has
become more common as appliance vendors combined commodity
hardware into parallel clusters and took advantage of multi-core
processors. As data volumes and analytic complexity grow, there is
an increasing drive to move BI to MPP platforms. Programming
databases to take full advantage of parallel processing is complex
but is advancing apace. Debates continue about the relative ad-
vantages of SMP, MPP and various flavors of both. However, a
higher perspective—the direction toward increasing parallelization
for higher data throughput and performance—is clear.
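The shared-nothing, scale-out pattern can be sketched with standard Python multiprocessing: an aggregation is sharded across worker processes, each with its own memory, and the partial results are combined at the end, which is, in miniature, what an MPP database does with SQL operators. The data below is invented.

    from multiprocessing import Pool

    def partial_sum(shard):
        # Each "node" works on its own shard with its own memory.
        return sum(shard)

    if __name__ == "__main__":
        data = list(range(1_000_000))                # stand-in for a large fact column
        shards = [data[i::4] for i in range(4)]      # distribute rows across 4 "nodes"
        with Pool(processes=4) as pool:
            partials = pool.map(partial_sum, shards) # scatter: shards processed in parallel
        print(sum(partials) == sum(data))            # gather: combine partials -> True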
Solid-state storage—disks and in-memory hardware
Advances—and price reductions—in solid-state memory have al-
lowed core memory sizes to grow enormously, and allow at least
some disk storage to be replaced with solid-state devices. Larger
volumes of data can be accessed at speeds orders of magnitude
faster than possible on spinning disks. This trend splits further into
in-memory and solid-state disk (SSD) approaches. The former ena-
bles higher-speed performance, but may require redesign of the
basic hardware and software architecture of the computer. The
latter provides significant, but lower, performance gains without
re-architecting the access methods by presenting the solid-state
device as a traditional disk. BI and analytic applications, with their
need for large volumes of data, benefit greatly from this direction.
OLTP systems also benefit from increased processing speed. Be-
cause current solid-state devices are volatile and lose their data
when power drops, this technology is seen as more appropriate for
BI, where there exists another source for any data lost, as opposed
to operational systems where the risk of data loss is higher.
Because spinning disk remains substantially cheaper than solid-
state storage, and is likely to remain so for the foreseeable future,
most solid-state storage is used for data that is accessed regularly,
where the benefit is greatest. This leads to temperature-based storage
hierarchies, in which hot, warm and cold data are defined by frequency of
access. Some vendors, however, opt for fully in-memory data-
bases, redesigned to take maximum advantage of the solid-state
approach, using disk only as a disaster recovery mechanism. Be-
cause solid-state stores remain significantly smaller than the larger
disks, solid-state is favored where speed is the priority, as opposed
to large data volumes, which favor disk-based approaches. The
general trend, therefore, is towards hybrid systems containing a
substantial amount of solid-state storage combined with large disk
stores. SSDs, with their hybrid design, will likely continue to bridge
fully integrated, in-memory databases and large disk drives with
high speed, mid-size, non-volatile storage.
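A temperature-based placement rule reduces to a simple classification; in the purely illustrative Python sketch below, the access-frequency thresholds and the table names are invented.

    def storage_tier(accesses_per_day):
        # Hotter data earns faster (and more expensive) storage.
        if accesses_per_day >= 100:
            return "in-memory"   # hot
        if accesses_per_day >= 10:
            return "SSD"         # warm
        return "disk"            # cold

    for table, freq in [("current_orders", 500), ("last_quarter", 40), ("archive_2005", 1)]:
        print(table, "->", storage_tier(freq))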
Row-based and columnar database design
In the relational model of tables consisting of rows and columns,
row-based storage—physically storing the fields of a single data-
base record sequentially on disk—was the physical design of choice
of all early relational database designers because it performed well
for the typical single record, read/write access method prevalent in
OLTP applications. Since the start of the 21st century, relational
database and appliance designers have begun experimenting with
column-based storage—storing all the fields in each column physi-
cally together. Both types are shown in Figure 5-5. The columnar
structure is very effective in reducing query time for many types of
BI application, which are typically read-only and require only a sub-
set of the fields in each row. In disk-
based databases, the resulting re-
duction in I/O can be significant.
Columns also enable more efficient
data compression because they
contain data of the same structure.
Together, these factors enable sub-
stantial performance improvements
for typical BI queries. Again we see
a trade-off in performance. Row-
based is optimal for OLTP; colum-
nar is better suited to certain clas-
ses of BI application. Loading data
into a columnar database requires
restructuring of the incoming data,
which generally arrives in row-
based records. This and other per-
formance trade-offs between row
and column storage have led in-
creasingly towards hybrid schemes,
[Figure 5-5: Physical layout of row-based and columnar databases]
where the DBMS decides which type or even multiple types of
storage to use for which data to optimize overall performance.
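To make the trade-off concrete, the purely illustrative Python sketch below lays out the same three-row table both ways; a query that needs only the amount field touches every record in the row layout but a single contiguous list in the columnar layout, which is the source of the I/O saving described above. The data is invented.

    # Row layout: fields of each record stored together.
    rows = [
        {"cust": "Acme", "region": "EMEA", "amount": 120.0},
        {"cust": "Byte", "region": "APAC", "amount": 75.0},
        {"cust": "Cogs", "region": "EMEA", "amount": 210.0},
    ]

    # Columnar layout: all values of each column stored together.
    columns = {
        "cust":   ["Acme", "Byte", "Cogs"],
        "region": ["EMEA", "APAC", "EMEA"],
        "amount": [120.0, 75.0, 210.0],
    }

    # Row layout: every record is touched even though only one field is needed.
    total_row_wise = sum(r["amount"] for r in rows)

    # Columnar layout: only the one column is read.
    total_columnar = sum(columns["amount"])

    assert total_row_wise == total_columnar == 405.0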
Summary of technology advances
The major driver for the advances above has been the business
demand for faster query and analysis performance for ever larger
sets of data. Performance and price/performance measures for
analytic databases in TPC-H benchmarks and in quoted customer
examples over the past few years show gains of 10X to 100X—and
in some cases considerably more—over general-purpose data-
bases. Of course, performance optimization is not new in relational
databases. Indexes, materialized views, caching and specialized
table designs have been the stock in trade for 20 years now. How-
ever, these traditional approaches are often highly specific to the
data model in use and anticipated query patterns. Tuning and opti-
mization is thus a labor-intensive process that delays initial use of
the data and often requires rework as usage patterns change. And
completely ad hoc usage cannot be optimized by these means. In
contrast, the performance gains in analytic DBMSs stem from fun-
damental hardware/software improvements and are model-
independent and generally applicable to all data and most analytical
query patterns. This improved performance has also allowed ven-
dors to simplify database design and management. Physical design
trade-offs are reduced. The need for indexes can be limited or re-
moved altogether, simplifying both initial and ongoing tuning and
maintenance work for database administrators, thus lowering DBA
involvement and costs over the entire lifetime of the system.
Data warehouse implementation has strayed far from the ideals of
the original architecture. The concept of a single source for DSS
was quickly overthrown by technology limitations, practi-
cal/political issues for buyers and the needs of vendors to close
sales quickly. It is often assumed that data layers are mandatory in
any large-scale warehouse implementation. However, the original
main driver for layering—query performance of relational data-
bases—has been at least partially overcome in modern technology.
Perhaps the key question posed by this history is: why do we per-
sist with the highly fragmented data structure that has evolved?
So far, much of the impact has been on traditional BI—running fast-
er queries over larger data sets. However, as we shall explore in
Section 5.7, these advances also enable new ways of thinking about
the overall operational/informational architecture that has evolved
over nearly three decades. Today, the focus is shifting to the rela-
tionship between operational and informational needs. In the fu-
ture, the emphasis will likely extend to operational processing.
Well, here’s another fine⁴ mess you’ve gotten me into
The outcome has been that most data warehouse implementations
have become increasingly complex, with combinations of inde-
pendent and dependent data marts, marts fed from other marts,
and even read/write marts added to the mix. These added types
and layers of marts lead to an extensive set of ETL that is difficult to
maintain as users’ needs change. All of
this harks back to the earliest days of
decision support, when many users
made specialized copies of any data they
needed, while others—with sufficient
technical nous—dived directly into the
actual sources, irrespective of any ensu-
ing data management chaos. The result-
ing data warehouse “architecture” today,
depicted in Figure 5-6, has lost its origi-
nal simplicity and provides implementers
with little guidance on how to structure
DSS in a modern business. As the layers
and silos increase, many problems be-
come more pressing. Data duplication
leads to ever-growing levels of incon-
sistency that have to be manually recon-
ciled in the reporting process, reducing
users’ confidence in the data warehouse.
Hardware, software and labor costs
grow, and maintenance becomes ever
more complex, constraining the provi-
sion of new functionality and information
to meet business demands.

⁴ Ollie’s catchphrase in the Laurel and Hardy films was actually “…another nice mess…”! See http://bit.ly/kARK1

[Figure 5-6: A modern data warehouse “architecture”]

Customer and partner interactions
suffer because of siloed and inconsistent information. And despite
demands for more timely information, the added layers actually
delay data supply to the users.
It is now mandatory to address these issues. From a business view-
point, increased competition and higher customer expectations are
driving demands that both hard and soft information from all
sources—human-sourced, machine-generated and process-
mediated, both internal and external—is integrated and internally
consistent as far as possible and necessary across the organization,
and delivered at ever increasing speed. Data warehousing as origi-
nally conceived, with its almost exclusive focus on hard information
and internally generated, process-mediated data, fails these busi-
ness demands. On the technology front, we’ve seen the advances in
databases that have changed the computing landscape. Still to
come, we will see how Service Oriented Architecture (SOA) and
mobile computing approaches are dramatically changing the data
and process structures of the operational environment, while In-
ternet technologies are redefining how users expect to interact
with all applications. Big data has added a further, and perhaps final,
twist to the story: the data volumes and velocities involved are
incompatible with a dependent data mart approach that involves
passing such data through the EDW. This, together with the new
storage technologies needed, leads to the conclusion that these
data sets can be supported only in an architecture that allows inde-
pendent data marts in addition to an EDW. All of these changes in
technology press in upon the data warehouse environment from
above and below, within and without, challenging the fundamental
assumptions upon which data warehousing was originally defined.
The data warehouse may be at the end of its own universe. Its orig-
inal raison d’être as the one, true, consistent past and present state
of the business is no longer possible nor, arguably, needed. Howev-
er, it is the only approach to data management that has even con-
sidered many of the information issues raised so far. The bottom
line is that the data warehouse architecture, as originally conceived
and eventually delivered, is in need of a radical overhaul.
5.3 Business intelligence—really?
The previous section focused on the data aspect of data-based
decision making, particularly on the preparation phase. While
this is probably the IT comfort zone in decision support, it is
also fair to say that without meaningful and consistent data, pro-
moting data-based decision making to the business is a recipe for
disaster. It takes only a single set of erroneous information for the
entire approach to be discredited in the users’ eyes. So it was that
in data warehousing, although originally defined as covering the
entire process of data-based decision support—from defining, col-
lecting and preparing the needed data to the user-facing tooling
required by business users—much of the early focus was on data
management issues. Plus ça change, plus c'est la même chose.
By the early 1990s, the data warehouse was perceived much as its
physical counterpart—a user-unfriendly place, cavernous, poorly lit
and infested with deadly fork-lift trucks. As a result, the phrase
business intelligence (BI) was adopted by Gartner analyst, Howard
Dresner, when he moved from Digital Equipment Corporation
(DEC), where the phrase was in use internally from 1989⁵. The
term was also in use in the early 1990s in the intelligence—meaning
spying—community in the context of industrial espionage. Howev-
er, Dresner’s stated aim was to emphasize the business aspect of
data warehousing, summarized in a common definition of BI as “a
set of concepts and methods to improve business decision making by
using fact-based support systems” (Power, 2009). In practical terms,
this translated into a set of reporting and ad hoc query tools with
attractive presentation capabilities. Spreadsheets clearly meet
these criteria and are widely used to support decision making, but,
as we saw in Chapter 4, they are seen as anathema to the data
management foundation of data-based decision making.
One might argue that business intelligence is actually an oxymoron.
Those of us who’ve worked in large enterprises have seen enough
evidence to conclude that many decisions have limited relevance to
stated business goals and a shaky relationship with intelligence.
⁵ Private communication

How many successful decisions have been declared as based on
“gut feeling”? And unsuccessful ones blamed on “lack of reliable
information”? How often does political expedience override a
strongly argued rationale? How many business analysts have been
asked to “just take one more look at the figures” when the numbers
seemed to contradict the group wisdom of the boardroom? So,
what does the term really mean?
Pharaoh’s tomb—the BI usage pyramid
The meaning of BI may be best explored through its support of
decision making as it relates to business roles and organization,
depicted in the second ubiquitous pyramid in the BI literature and
shown in Figure 5-7. Classical Egyptologists identify pyramids as
the Pharaohs’ tombs; alternative historians propose an array of far
more interesting possibilities. Their original purpose remains ob-
scure. The BI usage pyramid has a simple and obvious purpose, but
is weighed down with added—and often misleading—connotations.
In its most basic form, it describes three broad levels in the organi-
zation where BI plays. The original, and still most widely practiced,
form is tactical BI. Mid-level managers and supervisors of ongoing
activities, as well as the more numerically savvy (often termed inde-
pendent) business analysts, who support them, are the target audi-
ence. The goal of tactical BI is three-fold: (i) to ensure ongoing
operational processes and their operators are running optimally, (ii)
to find and encourage speed or productivity improvements in these
processes and (iii) to investigate and fix any anomalies that may
arise from either internal or external causes. Typically operating in
a timeframe of days to weeks, tactical BI uses historical, often rec-
onciled, data sourced from operational systems through the data
warehouse environment, usually via data marts. Tactical BI is well
suited to the traditional data warehouse architecture shown in
Section 5.2 and well supported by the BI query, analysis and
reporting tools that emerged in the 1980s and beyond.
In fact, the first two goals above drove tactical BI to-
wards the current preponderance of report gener-
ation and, more recently, dashboard creation at
this level. The investigative third goal above
has, in many cases, been taken over by
spreadsheets and thus deemed not worthy
of consideration as BI by some purists.
[Figure 5-7: The BI usage pyramid]
Historically, strategic BI—aimed at supporting senior managers and
executives in long-term, strategic decision making—was the next
target for proponents of BI. This need had been identified as early
as 1982 in a Harvard Business Review paper The CEO Goes On-Line
(Rockart & Treacy, 1982), where the authors reported on the
shocking discovery that some C-level executives were using desk-
top computer terminals to access status information—reports—
about their businesses and even analyzing and graphing data
trends. One executive even reported that “Access to the relevant
data to check out something…is very important. My home terminal lets
me perform the analysis while it’s at the forefront of my mind.” The
paper also introduced the term executive information system (EIS) to
an unsuspecting world; a reading some 30 years later reveals just
how little has changed in the interim in senior managers’ needs for
data about their businesses and their ability to probe into it. An
intriguing and earlier reference to a system on a boardroom screen
and computer terminals declares “Starting this week [the CEO of
Gould] will be able to tap three-digit codes into a 12-button box resem-
bling the keyboard of a telephone. SEX will get him sales figures. GIN will
call up a balance sheet. MUD is the keyword for inventory” (Business
Week, 1976). At least the codes were memorable!
These executive-level, dependent users needed extensive support
teams to prepare reports. However, the thought of an executive
analyzing data no longer surprises anyone, and today’s executives
have grown up in an era when computer, and later Internet use was
becoming pervasive. Driven by the iPad revolution, independent
executives probably outnumber their dependent colleagues today,
although extensive backroom data preparation and support re-
mains common. Driven in large part from business management
schools, the concept of EIS developed largely independently from
data warehousing through the 1980s (Thierauf, 1991), (Kaniclides
& Kimble, 1995). With the growing popularity of data warehousing
and BI, and the recognition that data consistency was a vital pre-
requisite, IT shops and vendors gradually began to include EIS with-
in BI as the top layer of the pyramid.
I speak of data above because despite the information in its name,
EIS focused more on data, mainly originating from the financial and
operational systems. External data sources such as Standard and
Poor’s were also seen as important to executives. However, it has
long been recognized that soft information—press and TV reports,
analyst briefings, and internal documents and presentations, as well
as informal information from face-to-face meetings—forms a high
percentage of the information needs of the majority of executives,
especially when strategizing on longer-term (months to years) de-
cisions. Its absence from EIS implementations, especially those fed
from enterprise data warehouses, is probably an important factor
in their relative lack of commercial success in the market. Strategic
BI also maintained an emphasis on historical data and was differen-
tiated from tactical BI largely by the longer business impact
timeframe—months to years—expected at this level. Strategic BI
implementations have struggled to gain widespread traction for
two main reasons. First, they usually exclude soft information, of
both the formal and the collaborative, informal varieties. Second,
they typically require the reconciliation of all hard information
across the full range of operational sources, pushing them far out
on any reasonable data warehouse implementation timeline.
The final layer of the usage pyramid, operational BI, emerged from
the development of the ODS and operational BI, described above.
The focus is on intra-day decisions that must be made in hours or
even seconds or less. Operational analytics is today’s incarnation,
emphasizing the use of hard information in ever increasing
amounts. The initial users of operational BI were seen as front-line
staff who deal directly with customer, supplier, manufacturing and
other real-time processes of the business, supported through live
dashboards. Increasingly, operational applications use this function
directly through exposed APIs. Based on detailed, near real-time,
low latency data, operational BI poses significant technical chal-
lenges to the traditional data warehouse architecture, where rec-
onciling disparate data sources is often a prolonged process.
Nonetheless, operational BI has grown in stature and is now of
equal importance to the tactical BI layer for most businesses.
Figure 5-7 is used widely to represent several aspects of BI usage.
It reflects the traditional hierarchical structure of organizations,
both in terms of the relative importance of individual decisions and
the number of decisions and potential BI users at each level. It is
also often used to illustrate data volumes needed, although this can
be misleading for two reasons. First is an assumption about the
level of summarization vs. detail in the three layers. Operational BI
demands highly detailed and ongoing data feeds, clearly requiring
the largest possible volume of data. As we move to the tactical layer,
it is often reasonable to summarize data; even where lengthy historical
periods are involved, the added history seldom offsets the savings from
summarization. However, some business needs do require substantial
levels of detail for tactical BI. At the strategic level, significant
summarization is common. However, the second factor, the need
for soft information for strategic BI mentioned earlier, must also be
taken into account. Soft information can be voluminous and is diffi-
cult to summarize mathematically. In short, the shape of the pyra-
mid indicates information volumes poorly.
The relationship of the pyramid to the organizational hierarchy may
suggest that data flows up and decisions down the structure. Again,
while this may be true in some business areas, it is certainly not
universally the case. Many individual operational, tactical and stra-
tegic decisions have an existence entirely independent of other
layers. A strategic decision about a merger or acquisition, for ex-
ample, is highly unlikely to require particular operational decisions
in its support. The IT origins of the BI pyramid and its consequent
focus on data rather than information shed little light on the pro-
cess of decision making. The visual resemblance of the BI usage
pyramid to the DIKW version we met in Chapter 3 promotes these
further assumptions: that data is the prevalent basis for operational
BI (only partly true), while strategic BI is built on knowledge (likely)
or even wisdom (probably untrue). However, the BI usage pyramid
is more resilient than its DIKW counterpart. It identifies three rela-
tively distinct uses of BI that relate well to particular roles and pro-
cesses in the organization, to which we return in Chapters 7 and 9.
What it misses, because of that very focus on roles and processes,
is the topic we now call business analytics.
Analytics—digging beneath the pyramids
Data mining, known academically as knowledge discovery in data-
bases (KDD), emerged at the same time as BI in the 1990s (Fayyad,
et al., 1996) as the application of statistical techniques to large data
sets to discover patterns of business interest. There are probably
few BI people who haven’t heard and perhaps repeated the “beer
and diapers (or nappies)” story: a large retailer discovered through
basket analysis—data mining of till receipts—that men who buy
diapers on Friday evenings often also buy beer. The store layout
was rearranged to place the beer near the diapers and beer sales
soared. Sadly, this story is now widely believed to be an urban leg-
end or sales pitch rather than a true story of unexpected and mo-
mentous business value gleaned from data mining. Nevertheless, it
makes the point that there may be nuggets of useful information to
be discovered through statistical methods in any large body of data,
and action that can be taken to benefit from these insights.
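To make the idea concrete, the sketch below outlines what such a basket analysis might look like. It is a minimal illustration only, assuming a small in-memory list of till receipts and a hand-rolled count of item pairs rather than any particular mining product; the item names and thresholds are invented for the example.

from itertools import combinations
from collections import Counter

# Hypothetical till receipts: each basket is the set of items on one receipt.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"bread", "milk"},
    {"diapers", "beer", "bread"},
    {"milk", "beer"},
]

pair_counts = Counter()
item_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    pair_counts.update(combinations(sorted(basket), 2))

# Report pairs whose support (co-occurrence rate) and confidence exceed
# illustrative thresholds -- the kind of "nugget" basket analysis surfaces.
n = len(baskets)
for (a, b), count in pair_counts.items():
    support = count / n
    confidence = count / item_counts[a]   # P(b in basket | a in basket)
    if support >= 0.4 and confidence >= 0.6:
        print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")

A real implementation would, of course, run over millions of receipts using a dedicated mining library or the database itself, but the shape of the computation is the same.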
In the past few years, the phrase business analytics has come to
prominence. Business analytics, or more simply, analytics, is defined
by Thomas Davenport as “the extensive use of data, statistical and
quantitative analysis, explanatory and predictive models, and fact-
based management to drive decisions and actions” (Davenport &
Harris, 2007) and as a subset of BI. Other authors suggest it is ei-
ther an extension of or even a replacement for BI. It also clearly
overlaps with data mining. Often discussed as predictive analytics or
operational analytics, the market further tries to differentiate ana-
lytics from BI as focused on influencing future customer behavior,
either longer term or immediately. A common pattern of opera-
tional analytics is to analyze real-time activity—on a website, for
example—in combination with historical patterns and instantly
adapt the operational interaction—offering a different or additional
product or appropriate discount—to drive sales. Thus, none of
these ideas are particularly novel. BI included similar concepts from
the beginning.
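As an illustration of that operational pattern, the sketch below scores a live website interaction against a stubbed historical propensity model and picks a next-best offer. It is purely schematic: the scoring rule, feature names, and offer catalogue are invented for the example and stand in for whatever predictive model and decision logic a real deployment would use.

from dataclasses import dataclass

@dataclass
class WebEvent:
    customer_id: str
    pages_viewed: int      # activity in the current session
    cart_value: float      # value of items currently in the cart

def historical_propensity(customer_id: str) -> float:
    """Stub for a model trained on historical behavior (assumed to exist)."""
    return {"c-123": 0.7}.get(customer_id, 0.3)

def next_best_offer(event: WebEvent) -> str:
    """Combine real-time activity with the historical score to adapt the offer."""
    score = (0.5 * historical_propensity(event.customer_id)
             + 0.3 * min(event.pages_viewed / 10, 1.0)
             + 0.2 * min(event.cart_value / 200, 1.0))
    if score > 0.6:
        return "offer: free expedited shipping"   # push to close the sale
    if score > 0.4:
        return "offer: 10% discount on cart"
    return "offer: none"

print(next_best_offer(WebEvent("c-123", pages_viewed=8, cart_value=150.0)))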
If we position business analytics in the BI usage pyramid, we can
immediately see that operational analytics is, at most, an extension
of operational BI. Similarly, predictive analytics enhances and ex-
tends the investigative goal of tactical BI that has mainly migrated
into spreadsheets. Davenport’s definition above is often quoted to
emphasize the role of statistics and predictive models, but perhaps
the most important aspect is its underlining of the goal of driving
decisions and actions. Beyond that, what have changed are the data
sources and volumes available in big data, as well as the faster pro-
cessing demanded by users and provided by modern hardware and
software advances. For example, logistics firms now use analysis of
real-time traffic patterns, combined with order transaction data
and traditional route planning, to optimize scheduling of deliveries
to maximize truck utilization, and to improve customer satisfaction
by providing more accurate estimates of arrival times. While one
might be tempted to think that this is simply a matter of speed or
scale, in fact, the situation is more of a step change in what is possible,
enabling new ways of making decisions and driving the new ways of
doing business defined as the biz-tech ecosystem.
Data scientists or Egyptologists?
As digging beneath pyramids of data has become an
increasingly popular pastime, we’ve seen the emergence
of a new profession: the data scientist. Although the
term data science has a long history, both it and the role
of data scientist have been taken to heart by the big data
movement. And given the breadth of definitions of big
data itself (see Chapter 6), you won’t be surprised to
discover that data scientists are equally forgiving about the scope
of their job. Unlike Egyptologists.
IBM’s Swami Chandrasekaran has built a comprehensive visual
Metro map of the skills required of a data scientist
(Chandrasekaran, 2013). The visual metaphor is appropriate for a
budding data scientist but, with disconnected lines and a technical,
big data point of view, the overall picture is disappointing for a
business trying to grasp precisely what a data scientist is. In the
simplest terms, I believe that a data scientist is best thought of as
an advanced, inspired business analyst and power user of a wide
set of data preparation and mining, business intelligence, infor-
mation visualization, and presentation tools. Added to this he or
she needs to understand the business, both process and infor-
mation, and have the ability to present a convincing case to manag-
ers and executives. This is a very broad skill set, and one unlikely to be
found in one person. Bill Franks, Chief Analytics Officer at Teradata, pro-
vides a comprehensive recipe for the making of an analytic profes-
sional or data scientist (Franks, 2012).
Figure 5-8: Khafre's Pyramid, Giza, Egypt
A step beyond the pyramid
Looking forward, a change in thinking of particular interest consid-
ers how we analyze and interpret reality. BI tools and approaches
have generally followed the basic premise of the scientific method in
their use of information, where hypotheses and theories are pro-
posed and subsequently verified or discarded based on the collec-
tion and analysis of information. It has been suggested that
business analytics, when used on big data, signals the end of the
scientific method (Anderson, 2012). The statistician George E. P.
Box said, over thirty years ago, that “all models are wrong, but some
are useful” (Box, 1979). Anderson reported that Peter Norvig,
Google's research director, suggested that today’s reality is that “all
models are wrong, and increasingly you can succeed without them.”
With the petabytes of data and petaflops of processing power
Google has at its disposal, one can dispense with the theorizing and
simply allow conclusions to emerge from the computer.
Correlation trumps causation, declare the authors of Big Data: A
Revolution That Will Transform How We Live, Work and Think (Mayer-
Schonberger & Cukier, 2013). Clearly, the emergence of big data
has reemphasized the analysis of information and the excitement of
discovering the previously unknown in its midst. But what becomes
“known” if it is a mathematical model so complex that its only expla-
nation is that the simulation works? At least until the arrival of a
giant dinosaur extinction event asteroid that wasn’t—and couldn’t
be—in the equations because it wasn’t in the underlying data
(Weinberger, 2012). The problem is not confined to asteroids or,
indeed, black swans—a metaphor for unexpected events that have
a major effect, and are often inappropriately rationalized. As we
gather ever more data and analyze it more deeply and rapidly, we
begin to fall prey to the myth that we are increasingly predicting
the future with ever greater certainty. A more realistic view might
be that the computers and algorithms are making sophisticated
guesses about future outcomes. As Alistair Croll opines, “Just be-
cause the cost of guessing is dropping quickly to zero doesn’t mean we
should treat a guess as the truth” (Croll, 2013).
The above radical thinking may be the ultimate logical conclusion of
data-based decision making, but I also seriously question if we can
trust the Beast in the computer that far, basing decisions solely on
basic data. Chapter 8 expands the scope of thinking about soft in-
formation and knowledge—using the full scope of information
stored digitally today. And that, of course, is only a staging post on
the quest for a full understanding of how human and team deci-
sions can be fully supported by computers, as we’ll explore in Chap-
ter 9. But for now, it's back to the present, where the data
warehouse has, for quite a few years now, faced its biggest
challenge: the timeliness of its data.
5.4 Today’s conundrum—consistency or timeliness
"I want it all and I want it now…" Queen's 1989 hit, I Want It All
(written by Brian May), sums it up. So
far, we’ve been dealing with the demand for it all. Now we need
to address delivering it now. Speed is of the essence, whether in
travel, delivery times, or news. For business, speed has become a
primary driver of behavior. Shoppers demand instant gratification
in purchases; retailers respond with constantly stocked shelves.
Suppliers move to real-time delivery via the Web and just-in-time
manufacturing. In short, processes have moved into overdrive.
Within the business, data needs, already extensive, are thus becom-
ing ever closer to real-time. Sales, front-office, and call center per-
sonnel require current information from diverse channels about
customer status, purchases, orders and even complaints in order to
serve customers more quickly. Marketing and design functions
operate on ever-shortening cycles, needing increasingly current
information to react to market directions and customer prefer-
ences and behavior. Just-in-time manufacturing and delivery de-
mand near real-time monitoring of and actions on supply chains.
Managers look for up-to-the-minute and even up-to-the-second
information about business performance internally and market
conditions externally.
At a technology level, faster processors, parallel processing, solid-
state disks and in-memory stores all drive faster computing. Data-
bases and database appliances are marketed on speed of response
to queries. Dashboard vendors promise to deliver near real-time
KPIs (key performance indicators). ETL tools move from batch de-
livery of data to micro-batches and eventually to streaming ap-
proaches. Complex event processing (CEP) tools monitor events as
they stream past on the network, analyze correlations, infer higher-
order events and situations—then act without human intervention.
In business intelligence, IT strives to provide faster responses to
decision makers’ needs for data. Timeliness manifests most obvi-
ously in operational BI, where information availability is pushed
from weekly or daily to intra-day, hourly and even shorter intervals. Near instanta-
neous availability of facts and figures is supported by streaming
ETL, federated access to operational databases or CEP.
But as we’ve seen, before e-commerce made speed and timeliness
the flavors du jour, correctness and consistency of data and behav-
iors were more widely valued. Data consistency and integration
were among the key drivers for data warehousing and business
intelligence. Users longed for consistent reports and answers to
decision-support questions so that different departments could
give agreed-upon answers to the CEO’s questions. Meetings would
descend into chaos as managers battled over whose green lineflow
report depicted the most correct version of the truth. Decisions
were delayed as figures were reworked. And IT, as provider of
many of the reports, often got the blame—and the expensive,
thankless and time-consuming task of figuring out who was right.
Unfortunately, within the realm of business information, timeliness
and consistency, while not totally incompatible, make uncomforta-
ble bedfellows. Business information is generated by and changed
in widely disparate processes and physical locations. The processes
often consist of a mix of legacy and modern applications, often built
independently at different times and by different departments. The
result is inconsistent information at its very source. Increasingly,
different parts of business processes are highly distributed geo-
graphically. Despite companies’ best efforts, such distribution, of-
ten predicated on staff cost savings, usually introduces further
inconsistencies in the data.
Typically, IT has carried out some level of integration over the years
to improve data consistency, but it is often piecemeal and asyn-
chronous. Master data management is but one more recent exam-
ple of such an effort. However, integration takes time—both in
initial design and implementation, as well as in operation. In such an
environment, achieving simultaneous
timeliness and consistency of information
requires: (i) potentially extensive applica-
tion redesign to ensure fully consistent
data definitions and complete real-time
processing and (ii) introduction of syn-
chronous interactions between different
applications. This process becomes more
technically demanding and financially
expensive in a roughly exponential man-
ner as information is made consistent
within ever shorter time periods, as
shown in Figure 5-9.
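The curve in Figure 5-9 can be made tangible with a toy calculation. The sketch below assumes, purely for illustration, that each order-of-magnitude reduction in the consistency interval multiplies integration cost by a constant factor; the factor and baseline are invented and only the shape of the curve matters.

import math

# Illustrative only: relative integration cost versus consistency interval.
# Assumption: cost multiplies by a fixed factor for every 10x reduction in
# the interval at which data must be made consistent.
BASELINE_COST = 1.0          # arbitrary units at a weekly interval
FACTOR_PER_DECADE = 3.0      # invented growth factor per 10x speed-up

intervals_seconds = {
    "week": 7 * 24 * 3600,
    "day": 24 * 3600,
    "hour": 3600,
    "minute": 60,
    "second": 1,
}

week = intervals_seconds["week"]
for name, secs in intervals_seconds.items():
    decades_faster = math.log10(week / secs)
    cost = BASELINE_COST * FACTOR_PER_DECADE ** decades_faster
    print(f"{name:>6}: ~{cost:8.1f}x baseline cost")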
Despite some consultants’ and vendors’
claims to the contrary, neither timeliness
nor consistency has ever been absolute, nor is either now. Each has
its pros and cons, its benefits and costs.
Different parts of the business value them to varying degrees. Find-
ing a balance between the two in the context of both business
needs and IT architectural decisions is increasingly important. And
adopting technological solutions that ease the conflict becomes
mandatory. The solution, however, entails making choices—
potentially difficult ones—about which applications prefer timeli-
ness over consistency and vice versa, as well as creating different
delivery mechanisms. An EDW-based approach maximizes con-
sistency; virtualization maximizes timeliness. Some classes of data
might need to be delivered through both methods so that, for ex-
ample, a report that is delivered with consistent data in the morn-
ing might be supplemented by timely data during the day. In such
cases, and as a general principle, metadata informing users of
the limitations that apply to each approach is required.
Delivering a wrong answer early can have a longer-term and great-
er impact as incorrect decisions, based on the initial error, multiply
in the time period before the correct answer arrives—especially if
the error is only discovered much later. Timely, but (potentially) in-
consistent, data may be better delivered as an update to a prior
consistent base set.
Figure 5-9: Integration cost as a function of timeliness
De-layering the operational and informational worlds
As already noted, the need for consistency was the primary driver
of the data warehouse architecture, leading to a layered structure,
due—at least in part—to limitations of the technology used to im-
plement it. The question thus arises of whether advances in tech-
nology could eliminate or reduce layering to gain improvements in
timeliness or maintainability. There are two related, but different
aspects: (i) removing layering within the data warehouse, and (ii)
reuniting the operational and information environments. The pos-
sibility of the former has been increasing for nearly a decade now,
as increasing processor power and software advances in relational
databases have driven gains in query performance. Mainstream
RDBMSs have re-promoted the concept of views, often material-
ized and managed by the database itself to reduce the dependency
of dependent data marts on separate platforms populated via ETL
tools. The extreme query performance—rated at up to two or three
orders of magnitude higher than general purpose RDBMSs—of
analytic databases has also allowed consideration of reducing data
duplication across the EDW and data mart layers.
We are now seeing the first realistic attempts to merge operational
and informational systems. Technically, this is driven by the rapid
price decreases for memory and multi-core systems, allowing new
in-memory database designs. The design point for traditional
RDBMSs has been disk based, with special focus on optimizing the
bottleneck of accessing data on spinning disk, which is several or-
ders of magnitude slower than memory access. By 2008, a number
of researchers, including Michael Stonebraker and Hasso Plattner,
were investigating the design point for in-memory databases, both
OLTP (Stonebraker, et al., 2008) and combined operation-
al/informational (Plattner, 2009). The latter work has led to the
development of SAP HANA, a hardware/software solution first
rolled out for informational, and subsequently for operational, ap-
plications. The proposition is relatively simple: with the perfor-
mance gains of an in-memory database, the physical design trade-
offs made in the past for operational vs. informational processing
become unnecessary. The same data structure in memory per-
forms adequately in both modes.
In terms of reducing storage and ongoing maintenance costs, par-
ticularly when driven by a need for timeliness, this approach is at-
tractive. However, it doesn’t explicitly support reconciliation of
data from multiple operational systems as data warehousing does.
Building history is technically supported by the use of an insert-
based update approach, but keeping an ever-growing and seldom-
used full history of the business in relatively expensive memory
makes little sense. Nor does the idea of storing vast quantities of
soft information in a relational format appeal. Nonetheless, when
confined to operational and near-term informational data, the ap-
proach offers a significant advance in delivering increased timeli-
ness and improved consistency within the combined set of data.
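A minimal sketch of what that insert-based (append-only) approach implies is shown below: rather than overwriting a row, each change is inserted with a validity timestamp, so the full history can be rebuilt and the current state is simply the latest version. Table and column names are invented for illustration and are not tied to any specific product.

from datetime import datetime

# Append-only "updates": every change becomes a new row with a valid-from time.
customer_history = []  # in a real system this would be a database table

def update_customer(customer_id: str, credit_limit: float) -> None:
    """Record a change by inserting, never overwriting, earlier versions."""
    customer_history.append({
        "customer_id": customer_id,
        "credit_limit": credit_limit,
        "valid_from": datetime.now(),
    })

def current_credit_limit(customer_id: str) -> float:
    """The current state is the most recent version for the key."""
    versions = [r for r in customer_history if r["customer_id"] == customer_id]
    return max(versions, key=lambda r: r["valid_from"])["credit_limit"]

update_customer("c-42", 5000.0)
update_customer("c-42", 7500.0)   # later change; the 5000.0 row is retained
print(current_credit_limit("c-42"))   # 7500.0
print(len(customer_history))          # 2 rows of history kept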
5.5 Most of our assumptions have outlived their uselessness (Marshall McLuhan)
The questions posed throughout this and the previous chap-
ters lead directly to an examination of how our thinking
about data management in business, both in general and with
particular reference to decision support, has evolved. Or in many
cases, has not moved very far at all. An earlier paper (Devlin, 2009)
identified four “ancient postulates” of data warehousing based on
an analysis of the evolution of the data warehouse architecture. An
additional three postulates now emerge.
1. Operational and informational environments should be separated for both business and technical reasons
Dating from the 1970s, this first postulate predates data ware-
housing, but was incorporated without question in the first data
warehouse architecture. At that time, both business management
and technology were still at a comparatively early stage of evolu-
tion. Business decision makers operated on longer planning cycles
and often deliberately ignored the fluctuating daily flow of business
events—their interest was in monthly, quarterly or annual report-
ing; understanding longer trends and directions; or in providing
input to multi-year strategies. On the technology front, applications
were hand-crafted and run in mainframes operating at the limits of
their computing power and storage.
These factors led to one of the longest-lived postulates in IT—the
need to separate operational and informational computing and
systems. From its earliest days, DSS envisaged extracting data from
the operational applications into a separate system designed for
decision makers. And, at the time, that made sense: it was what
business users needed, and the technology could support it more
easily than allowing direct ad hoc access to operational databases.
Of all seven postulates, this is the one that has never been seriously
challenged…until now, as we saw in the previous section.
2. A data warehouse is the only way to obtain a dependable, integrated view of the business
This postulate was clearly visible in the first architecture paper
(Devlin & Murphy, 1988) and in the prior work carried out in IBM
and other companies in the mid-1980s. As we’ve already seen in
the previous section, a basic assumption of this architecture was
that operational applications could not be trusted. The data they
contained was often incomplete, inaccurate, and inconsistent
across different applications. As a result, the data warehouse was
the only place where a complete, accurate and consistent view of
the business could be obtained.
This postulate is now under challenge on two fronts. First, the data
quality of operational systems has improved since then. While still
far from perfect, many companies now use commercial off-the-
shelf applications in house or in the Cloud, such as SAP or
Salesforce.com, with well-defined, widely tested schemas, and ex-
tensive validation of input data. These factors, together with exten-
sive sharing of data between businesses electronically and
between process stages, have driven improved data quality and
consistency in the operational environment. Second, the growth of
big data poses an enormous challenge to the principle that a de-
pendable, integrated view of the business is achievable in any single
place. Going forward, we move from the concept of a single version
of the truth to multiple, context-dependent versions of the truth,
which must be related to one another and users’ understanding of
them via business metadata.
3. The data warehouse is the only possible instantiation of the full enterprise data model
Another cornerstone of the data warehouse was that data is use-
less without a framework that describes what the data means, how
it is derived and used, who is responsible for it, and so on. Thus
arose the concept of metadata and one of its key manifestations:
the enterprise data model. By 1990, this concept had been adopted
by data warehousing from information architecture (Zachman,
1987) as the basis for designing the EDW and consolidating data
from the disparate operational environment. A key tenet was that
the EDM should be physically instantiated as fully as possible in the
data warehouse to establish agreed definitions for all information.
It was also accepted that the operational environment is too re-
stricted by performance limitations, too volatile in the face of business
change, and its data models too fragmented, incomplete and disjointed to
allow instantiation of the EDM there. Thus, the data warehouse
became the only reliable placement for the EDM. However, imple-
mentation has proven rather problematic in practice.
With the increasing pace of business change and the growing role
of soft data in business, it is increasingly difficult to envisage the
type of enterprise-scope projects required to reach this goal. The
EDM and its instantiation in the data warehouse thus remain aspi-
rational, at best, and probably in need of serious rethinking.
4. A layered data warehouse is necessary for speedy and reliable query performance
As discussed in Section 5.2, data marts and, subsequently, other
layering of the data warehouse were introduced in the 1990s to
address RDBMS performance issues and the long project cycles
associated with data warehouse projects. The value of this postu-
late can be clearly seen in the longevity of the architectural ap-
proach. However, this layering presents its own problems. It delays
the passage of data from operations to decision makers; real-time
reaction is impossible. Maintenance can become costly as impact
analysis for any change introduced can be complex and far reaching
across complex ETL trails and multiple copies of data.
The emergence of analytic databases in the early to mid-2000s,
with their combination of software and hardware advances, demon-
strated that query speeds over large data volumes could be im-
proved by orders of magnitude over what had been previously
possible. Even more clearly than the split between operational and
informational in postulate 1, the layering in the warehouse itself is
becoming, in many cases, increasingly unnecessary.
5. Information can be treated simply as a super-class of data
Since the beginning of computing in the 1950s, the theoretical and
practical focus has largely been on data—optimized for computer
usage—and information has been viewed through a data-centric
lens. Chapter 3 discussed at length why this is completely back-to-
front, placing an IT-centric construct at the heart of communication
that is essentially human—however imprecise, emotionally laden,
and intimately and ultimately dependent on the people involved,
their inter-relationships, and the context in which they operate. As
the biz-tech ecosystem comes to fruition, we must start from
meaning—in a personal and business context—and work back
through knowledge and information all the way to data. In this way,
we can begin to reorient decision making to its business purpose
and the people who must make the decision and take action.
6. Data quality and consistency can only be assured by IT through a largely centralized environment
This postulate had its origins in the complexity and experimental
nature of early computers, but it continues to hold sway even
though modern PCs and an ever-increasing number of mobile de-
vices are now used extensively by business users, and hold far more
data in total than centralized systems. While centralized control
and management are the ideal approach to assure data quality and
consistency, the real world of today’s computing environment
makes that impossible.
Management of data quality and consistency must now be auto-
mated and highly distributed. In addition, they must be applied to
data based on careful judgment, rather than seen as a mandatory
requirement for all data.
7. Business users' innovation in data / information usage is seen by IT as marginal and threatening to data quality
This belief can be traced to the roughly simultaneous appearance
of viable PCs and RDBMSs in the 1980s. As we observed in Chap-
ter 4, IT was used to managing the entire data resource of the busi-
ness, and as data became more complex and central to the
business, the emerging relational and largely centralized databases
were seized with both hands. PCs and spreadsheets were first ig-
nored and then reviled by the IT arbiters of data quality. This postu-
late has continued to hold sway even until today, despite the
growing quantity and role of distributed, user-controlled data.
Properly targeted and funded data governance initiatives are re-
quired to change this situation. Such initiatives are now widely rec-
ognized as a business responsibility (Hopwood, 2008), but in many
companies, the drive and interest still comes from IT. In the biz-
tech ecosystem, business management must step up to their re-
sponsibility for data quality and work closely with IT to address the
technical issues arising from a highly distributed environment.
All these commonly held assumptions have contributed to the rela-
tive stasis we’ve seen in the data warehousing world over the past
two decades. The time has come to let them go.
5.6 IDEAL architecture (3): Information,
Timeliness/Consistency dimension
As we’ve seen throughout this chapter, the original business
drive for consistency in reporting has been largely supplant-
ed by a demand for timeliness. However, from a conceptual
point of view, in a highly distributed computing environment where
information is created in diverse, unrelated systems, these two
characteristics are actually interdependent. Increase one and you
decrease the other. In our new conceptual architecture, we thus
need a dimension of the information layer that describes this. In
fact, data warehouse developers have been implicitly aware of this
dimension since the inception of data warehousing. However, it has
been concealed by two factors: (1) an initial focus only on con-
sistency and (2) the conflation of a physical architecture consisting
of discrete computing systems with a conceptual/logical architec-
ture that separated different business and processing needs.
As shown in Figure 5-10, the timeliness/consistency (TC) dimension
of information at the conceptual architecture level consists of five
classes that range from highly timely but necessarily inconsistent
information on the left, to highly consistent but necessarily untime-
ly on the right. From left to right, timeliness moves from infor-
mation that is essentially ephemeral to eternal.
In-flight information consists of messages on the wire or on an en-
terprise service bus; it is valid only at the instant it passes by. This
data-in-motion might be processed, used, and discarded. It is guar-
anteed only to be consistent within the message or, perhaps, the
stream of which it is part. In-flight information may be recorded
somewhere, depending on process needs, at which stage it be-
comes live.
Live information has a limited period of validity and is subject to
continuous change. It also is not necessarily completely consistent
with other live information. In terms of typical usage, these two
classes correspond to today’s operational systems.
Stable information, the mid-point on the continuum, represents a
first step towards guaranteed consistency by ensuring that stored
data is protected from constant change and, in some cases, en-
hanced by contextual information or structuring. In existing sys-
tems, the stable class corresponds to any data store where data is
Figure 5-10: Timeliness/consistency dimension of information
not over-written whenever it changes, including data marts, partic-
ularly the independent version, and content stores. This class thus
marks the transition point from operational to informational.
Full enterprise-wide, cross-system consistency is the characteristic
of reconciled information, which is stored in usage-neutral form and
stable in the medium to long term. Its timeliness, however, is likely
to have been sacrificed to an extent depending on its sources; old-
er, internal, batch-oriented sources and external sources often
delay reconciliation considerably. The enterprise data warehouse is
the prime example of this class. MDM stores and ODSs can, de-
pending on circumstances, contain reconciled information, but
often bridge this and the live or stable classes.
Historical information is the final category, where the period of
validity and consistency is, in principle, forever. But, like real-world
history, it also contains much personal opinion and may be rewrit-
ten by the victors in any power struggle! Historical information
may be archived in practice, or may exist in a variety of long-term
data warehouse or data mart stores. It is, of course, subject to data
retention policies and practices, which are becoming ever more
important in the context of big data.
Many of the examples used above come from the world of hard
information and data warehousing, in particular. This is a conse-
quence of the transactional nature of the process-mediated data
that has filled the majority of information needs of business until
now. However, the classes themselves apply equally to all types of
information to a greater or lesser extent. For softer information,
they are seldom recognized explicitly today but will become of in-
creasing importance as social media and other human-sourced
information is applied in business decisions of increasing value.
The timeliness/consistency dimension broadly mirrors the lifecycle
of information from creation through use, to archival and/or dis-
posal. This spectrum also relates to the concept of hot, warm, and
cold data, although these terms are used in a more technical con-
text. As with all our dimensions, these classes are loosely defined
with soft boundaries; in reality, these classes gradually merge from
one into the next. It is therefore vital to apply critical judgment
when deciding which technology is appropriate for any particular
base information set. The knowledge and skills of any existing BI
team will be invaluable in this exercise, but will need to be comple-
mented by expertise from the content management team.
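As a rough illustration of how these classes might be made explicit in practice, the sketch below tags example information stores with a TC class. The enumeration mirrors the five classes above, while the store names and mapping are invented for the example; any real assignment would need the critical judgment just described.

from enum import Enum

class TCClass(Enum):
    """Timeliness/consistency classes, from ephemeral to eternal."""
    IN_FLIGHT = "in-flight"     # messages on the wire; consistent only within themselves
    LIVE = "live"               # operational data, continuously changing
    STABLE = "stable"           # protected from constant change, e.g. data marts
    RECONCILED = "reconciled"   # enterprise-wide consistency, e.g. the EDW
    HISTORICAL = "historical"   # long-term record, subject to retention policy

# Hypothetical mapping of stores to classes, used to drive tooling choices.
store_classes = {
    "order_event_bus": TCClass.IN_FLIGHT,
    "orders_oltp": TCClass.LIVE,
    "sales_mart": TCClass.STABLE,
    "enterprise_warehouse": TCClass.RECONCILED,
    "orders_archive": TCClass.HISTORICAL,
}

for store, tc in store_classes.items():
    print(f"{store:22} -> {tc.value}")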
5.7 Beyond the data warehouse
Given the extent and rate of changes in business and tech-
nology described thus far, it is somewhat unexpected that
the term data warehouse and the architectural structures
and concepts described in Sections 5.2 and 5.3 still carry consider-
able weight after more than a quarter of a century. However, this
resistance to change cannot endure much longer. Indeed, one goal
of this book is to outline what a new, pervasive information archi-
tecture looks like, within the scope of data-based decision making
and the traditional data sources of BI for the past three decades.
Reports of my death have been greatly exaggerated
Of course, the data warehouse has been declared terminally ill
before now. (Mark Twain’s actual written reaction in 1897 was: “The
report of my death was an exaggeration.”) BI and data warehouse
projects have long had a poor
reputation for delivering on-time or within budget. While these
difficulties have clear and well-understood reasons—emanating
from project scope and complexity, external dependencies, organi-
zational issues, and more—vendors have regularly proposed quick-
fix solutions to businesses seeking fast and reliable answers to BI
needs. The answers, as we’ve seen, range from data marts and ana-
lytic appliances to spreadsheets and big data. As each of these ap-
proaches has gained traction in the market, the death of the data
warehouse has been repeatedly—and incorrectly—pronounced.
The underlying reason for such faulty predictions is a misunder-
standing of the consistency vs. timeliness conundrum described in
Section 5.4 above. The data warehouse is primarily designed for
consistency; the other solutions are more concerned with timeli-
ness, in development and/or operation. And data consistency re-
mains a valid business requirement, alongside timeliness, which has
growing importance in a fully interconnected world. Nonetheless,
as the biz-tech ecosystem evolves to become essentially real-time,
the data warehouse cannot retain its old role of all things to all in-
formational needs, going forward. As a consequence, while it will
not die, the data warehouse concept faces a shrinking role in deci-
sion support as the business demands increasing quantities of in-
formation of a structure or speed that are incompatible with the
original architecture or relational technology.
5.8 REAL architecture (1): Core business information
In essence, the data warehouse must return to its roots, as repre-
sented by Figure 5-3 on page 121. This requires separate con-
sideration of the two main architectural components of today’s
data warehouses—the enterprise data warehouse and the data
mart environment. In the case of the EDW, this means an increas-
ing focus on its original core value propositions of consistency and
historical depth, where they have business value, including:
1. The data to be loaded is process-mediated data, sourced from
the operational systems of the organization
2. This loaded data provides a fully agreed, cross-functional view
of the one consistent, historical record of the business at a de-
tailed, atomic level as created through operational transactions
3. Data is cleansed and reconciled based on an EDM, and stored
in a largely normalized, temporally based representation of that
model; star-schemas, summarizations and similar derived data
and structures are defined to be data mart characteristics
4. The optimum structure is a “single relational database” using
the power of modern hardware and software to avoid the copy-
ing, layering and partitioning of data common in the vast major-
ity of today’s data warehouses
5. The EDM and other metadata describing the data content is
considered as an integral, logical component of the data ware-
house, although its physical storage mechanism may need to be
non-relational for performance reasons
The first business role of any “modern data warehouse” is thus to
present a historically consistent, legally binding view of the business.
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 

Recently uploaded (20)

EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 

Business unIntelligence, Chapter 5

• 4. Business unIntelligence: Insight and innovation beyond analytics and big data. Dr. Barry Devlin. Technics Publications, LLC, New Jersey
• 5. Published by: Technics Publications, LLC, 2 Lindsley Road, Basking Ridge, NJ 07920, U.S.A. www.technicspub.com

Edited by Carol Lehn. Cover design by Mark Brye.

All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system without written permission from the publisher, except for the inclusion of brief quotations in a review.

The author and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.

All trade and product names are trademarks, registered trademarks, or service marks of their respective companies, and are the property of their respective holders and should be treated as such. Artwork used by permission. See www.BusinessunIntelligence.com for picture credits.

Copyright © 2013 by Dr. Barry Devlin. This book is printed on acid-free paper.

ISBN, print ed. 978-1-935504-56-6. ISBN, Kindle ed. 978-1-935504-57-3. ISBN, ePub ed. 978-1-935504-58-0. First Printing 2013. Library of Congress Control Number: 2013948310.

ATTENTION SCHOOLS AND BUSINESSES: Technics Publications books are available at quantity discounts with bulk purchase for educational, business, or sales promotional use. For information, please email Steve Hoberman, President of Technics Publications, at me@stevehoberman.com.
• 6. CHAPTER 5: Data-based decision making

“Most executives, many scientists, and almost all business school graduates believe that if you analyze data, this will give you new ideas. Unfortunately, this belief is totally wrong. The mind can only see what it is prepared to see.” (Edward de Bono)

“The purpose of computing is insight, not numbers.” (Richard W. Hamming)

By the mid-1980s, corporate IT figured that they had a reasonably good handle on building and running the operational systems responsible for the day-to-day processes of increasingly automated businesses. The majority of companies programmed their own systems, mostly in Cobol, computerizing the financially important and largely repetitive operational activities, one business activity at a time. Of course, the applications IT built weren’t perfect and there were backlogs in development, but the problems were understood and solutions seemingly in sight. We will return to the illusion that the operational environment was largely complete in Chapter 7, but for now we’ll focus on data.

Attention thus turned to a rather different business need: MIS or decision support. We’ve already seen how MIS grew in the 1970s through point solutions. IT saw two problems worth tackling. First, from the business view, there was growing inconsistency among the results they were getting. Second, the explosion of extracts from the operational systems was causing IT serious headaches. An integrated solution to ensure consistency and reduce extract loads was required. And a modern technology—relational databases—was seen as the way to do it.

In 1985, I defined an architecture for business reporting and analysis in IBM (Devlin & Murphy, 1988), which became a foundation of data warehousing. At the heart of that architecture and data warehousing in general, is the need for a high-quality, consistent store of
• 7. historically complete and accurate data. Defining and delivering it turned out to be tougher and slower than anybody imagined.

Over the succeeding decades, the focus shifted back and forth between these consistency goals and timeliness—another eternal business need. The enterprise-oriented data warehouse was praised for quality or excoriated for never-ending, over-budget projects. Data marts were introduced for immediate business value but soon derided as quick and dirty. The pendulum continues to swing. Data warehousing soon begat business intelligence, drove advances in data management, benefited from developments in information technology, and is now claimed to be replaced by business analytics. But analytics and big data still focus on data rather than information, numbers rather than insight. And even there, the increasing role of simulation and modeling poses questions about what we are trying to achieve.

Ackoff’s issues with MIS have not gone away. Does better and more data drive “improved” decisions? Is de Bono wrong to say that new thinking never emerges from analyzing data? We focus on bigger, faster and better data and continue to dream of a single version of the truth. Behind a superficially unchanging architecture, we are forced to address the conundrum of consistency vs. timeliness—accepting the fundamental inconsistency that characterizes the real world and the basic inability of humans to think and decide at the speed of light. We must reexamine the postulates at the foundation of data warehousing and business intelligence. We find them wanting.

With this, we reach the third and final dimension of information in the IDEAL architecture. The evolution of data-based decision making has reached a punctuation point. A focus on data—in its old restricted scope—as the basis for all decision making has left business with increasingly restrictive tools and processes in a rapidly evolving biz-tech ecosystem. It’s time to introduce the core components of a new logical architecture—a consolidated, harmonized information platform (REAL)—that is the foundation for a proper balance between stability and innovation in decision making.
• 8. 5.1 Turning the tables on business

What do you recall—or, if you’re somewhat younger than me, imagine—of 1984? Other than George Orwell’s dystopian novel. Los Angeles hosted the Olympic Games. In the UK, a year-long coal miners’ strike began. The Space Shuttle Discovery made her maiden voyage. More than 23,000 people died of gas poisoning in the Bhopal Disaster. Terms of Endearment won five Oscars. The Bell System of companies was broken up by the U.S. Justice Department after 107 years. The Sony Discman was launched. Ronald Reagan won a second term as President of the United States. The first Apple Macintosh went on sale with 128kB of RAM and a single floppy disk drive.

I’ve focused in on this year because it was in 1984 that Teradata released the DBC/1012, the first relational database MPP (massively parallel processing) machine aimed squarely at DSS applications. Oracle introduced read consistency in its version 4. And 1984 also marked the full year hiatus between the announcement of DB2 for MVS by IBM and its general availability in April, 1985. In summary, this was the year that the technology required to move MIS to a wider audience finally arrived.

Figure 5-1: DBC/1012 Data Base Computer System, 1987 Teradata Brochure

Until that time, MIS were hampered by a combination of hardware and software limitations. As John Rockart noted in a paper introducing critical success factors (CSFs), the most common approach to providing information to executives was via a multitude of reports that were by-products of routine paperwork processing systems for middle managers and, most tellingly, “where the information subsystem is not computer-based, the reports reaching the top are often typed versions of what a lower level feels is useful” (Rockart, 1979).

My experience in IBM Europe in 1985 was that use of the existing APL-based system (Section 4.1) was mainly through pre-defined reports, that business users needed significant administrative support, and that tailoring of reports for specific needs required IT intervention. The new relational databases were seen as the way to rejuvenate MIS. Business users were initially offered SQL (Structured Query Language), a data manipulation language first defined
• 9. as SEQUEL at IBM in the 1970s (Chamberlin & Boyce, 1974) and QMF (Query Management Facility), because the language was well defined and basic queries were seen as simple enough for business analysts to master. Views—the output of relational queries—provided a means to restrict user access to subsets of data or to hide from them the need for table joins (Devlin & Murphy, 1988) and the more arcane reaches of SQL (a minimal sketch of such a view appears below). Simpler approaches were also needed, and QBE (Query by Example), developed at IBM (Zloof, 1975), presaged the graphical interfaces common today.

This consensus was not driven by user requirements or technological fit alone. Relational database vendors of the time were in need of a market. Their early databases performed poorly vs. hierarchical or network databases for operational applications, in many cases. MIS, still a relatively immature market, offered an ideal opportunity. From the start, a few new vendors—notably Teradata—optimized their technology exclusively for this space, while other RDBMS vendors chose a more general-purpose path to support both types of requirements. However, given their deeper understanding of transactional processing, these vendors were more successful in improving their technology for operational needs, such as update performance, transactional consistency, and workload management, already well understood from hierarchical or network databases. The relational database market thus diverged in the 1980s, with large-scale, data-driven DSS, pioneered mainly by Teradata, and other vendors successful at small- and medium-scale. A continued focus on transactional requirements allowed RDBMSs to eventually overtake hierarchical and network databases for new operational developments, in a classical example of the working of a disruptive technology as defined in The Innovator’s Dilemma (Christensen, 1997). The outcome was that ERP, SCM, and similar enterprise-wide initiatives throughout the 1990s were all developed in the relational paradigm, which has become the de facto standard for operational systems since then. From the late 1990s on, the major general-purpose RDBMS vendors increased focus on MIS function to play more strongly in this market.
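To illustrate the use of views mentioned above, the following minimal sketch assumes two hypothetical operational tables, customer and account, and hides their join behind a view so that a business analyst can query customer balances without writing the join or seeing a restricted column. Table and column names are illustrative only, not drawn from the original architecture.

-- Hypothetical base tables (names are illustrative)
CREATE TABLE customer (
  customer_id   INTEGER       NOT NULL PRIMARY KEY,
  customer_name VARCHAR(100)  NOT NULL,
  region_code   CHAR(3)       NOT NULL,
  credit_limit  DECIMAL(15,2)            -- restricted: not exposed to analysts
);

CREATE TABLE account (
  account_id    INTEGER       NOT NULL PRIMARY KEY,
  customer_id   INTEGER       NOT NULL REFERENCES customer(customer_id),
  account_type  CHAR(2)       NOT NULL,
  balance       DECIMAL(15,2) NOT NULL
);

-- The view hides both the join and the restricted credit_limit column
CREATE VIEW customer_balance AS
SELECT c.customer_id,
       c.customer_name,
       c.region_code,
       SUM(a.balance) AS total_balance
FROM   customer c
JOIN   account  a ON a.customer_id = c.customer_id
GROUP  BY c.customer_id, c.customer_name, c.region_code;

-- An analyst's query needs no knowledge of the underlying join
SELECT region_code, SUM(total_balance) AS region_balance
FROM   customer_balance
GROUP  BY region_code;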
• 10. Specialized multidimensional cubes and models also emerged through the 1970s and 1980s to support the type of drill-down and pivoting analysis of results widely favored by managers. The term online analytical processing (OLAP) was coined in the early 1990s (Codd, et al., 1993) to describe this approach. It is implemented on relational databases (ROLAP), specialized stores optimized for multidimensional processing (MOLAP), or in hybrid systems (HOLAP). As a memorable contrast to OLTP (online transaction processing), the term OLAP gained widespread acceptance and continues to be used by some vendors and analysts as a synonym for BI, MIS or DSS to cover all informational processing.

5.2 The data warehouse at the end of the universe¹

Beyond the report generation culture and lack of easily used or understood tools for exploring data that relational technology was expected to address, another key issue that had emerged was the multiplicity of inconsistent analytic data sets being created throughout the organization. Both business and IT were struggling with this. Over time, the problem and the solution became a mantra for BI: a single version of the truth.

Operational applications are optimized for particular tasks within the functional or divisional context in which they were built. Banks have separate systems for checking (current) accounts and credit cards. Accounts receivable and accounts payable run on different databases. However, they also contain overlapping data, which may be defined differently or may be inconsistent at certain moments in the process. Acquisitions lead to multiple systems across geographical regions doing the same tasks.

To this day, there continues to exist a tacit assumption that the correct, and perhaps only, approach to providing data for decision making is through a separate set of data copied from the operational and other source environments. Now, knowing the business and technical reasons for the original separation, we must surely ask if the assumption is still valid.

1 The Restaurant at the End of the Universe (1980) is the second book in the series The Hitchhiker’s Guide to the Galaxy by Douglas Adams.
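The drill-down and pivoting analysis that OLAP tools automate, described at the start of this section, can be approximated in plain SQL. The sketch below, against a hypothetical sales_fact table, uses GROUP BY ROLLUP (standardized in SQL:1999, so later than the period described here) to return the aggregates a user would see when drilling from year to quarter to product line; MOLAP engines typically precompute and store such aggregates rather than evaluating them at query time. Table and column names are illustrative.

-- Hypothetical fact table of sales, one row per order line
CREATE TABLE sales_fact (
  sale_date     DATE          NOT NULL,
  sale_year     SMALLINT      NOT NULL,
  sale_quarter  SMALLINT      NOT NULL,
  product_line  VARCHAR(30)   NOT NULL,
  region_code   CHAR(3)       NOT NULL,
  revenue       DECIMAL(15,2) NOT NULL
);

-- One statement returns totals at every drill-down level:
-- grand total, per year, per year/quarter, and per year/quarter/product line
SELECT sale_year,
       sale_quarter,
       product_line,
       SUM(revenue) AS total_revenue
FROM   sales_fact
GROUP  BY ROLLUP (sale_year, sale_quarter, product_line)
ORDER  BY sale_year, sale_quarter, product_line;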
• 11. Furthermore, data sourced from the same operational system through different pathways may give differing results. Figure 5-2 is a simplified view of a typical environment; tracing the flow of data fragments via the various numbered management information systems (developed by different teams for divergent purposes) to the business users gives some idea of the potential for data disorientation.

Figure 5-2: A typical MIS environment

All of this leaves the business with difficult reconciliation problems—even getting a single, consistent list of customers can be a challenge. The result is inconsistent and incorrect business decisions. Unseemly disputes arise in management meetings over the validity of differing reports. IT is blamed and tasked by irate executives to explain and fix these inconsistencies and simultaneously required to deliver ever more extracts for new MIS and reporting needs. Add the inefficiency of the same data being extracted from overworked operational systems again and again, and the IT cup doth flow over (Devlin, 1996).

Enter the data warehouse

IBM faced the same problems in its internal IT systems, and in 1985, Paul Murphy and I were tasked to define a solution. The term data warehouse was conceived in this internal work, and based upon it, we published the first data warehouse architecture in 1988, shown in Figure 5-3 (Devlin & Murphy, 1988). It proposed a “Business Data Warehouse (BDW)… [as] the single logical storehouse of all the information used to report on the business… In relational terms, a view / number of views that…may have been obtained from different tables.” The BDW was largely normalized, its data reconciled and cleansed through an integrated interface to operational systems.

Among the first things that we and other data warehouse builders discovered was that cobbling together even a single, consistent list of customers or products, for example, was hard work. Operational systems that were never designed to work together didn’t. Even when individually reliable, these systems failed to dependably deliver consistency. With different meanings, contexts and timings for multiple sources, data reconciliation was expensive. The conclusion was that operational applications could not be fully trusted; they
• 12. contained data that was incomplete, often inaccurate, and usually inconsistent across different sources. As a result, the data warehouse was proposed as the sole place where a complete, accurate and consistent view of the business could be obtained.

Figure 5-3: First data warehouse architecture, based on Devlin & Murphy, 1988

A second cornerstone of the architecture was that in order to be useful to and usable by business people, data must have a framework describing what it means, how it is derived and used, who is responsible for it, and so on—a business data directory. This is none other than metadata, sourced from operational systems’ data dictionaries and business process definitions from business people. Data dictionaries were components of or add-ons to the hierarchical and network databases of the time (Marco, 2000). However, they typically contained mostly technical metadata about the fields and relationships in the database, supplemented by basic descriptions, written by programmers for programmers.

Making data as reliable as possible at its source is only step one. When information about the same topic comes from different sources, understanding the context of its creation and the processes through which it has passed becomes a mandatory second step. For this process, enterprise-level modeling and enterprise-wide IT are needed. Over the same period, the foundations of information architecture were established (Zachman, 1987), (Evernden, 1996), leading to the concept of enterprise data models (EDM), among other model types. An enterprise data model has become a key design component of data warehousing metadata, although its definition and population has proven to be somewhat problematic in practice. A key tenet was that the EDM should be physically instantiated as fully as possible in the data warehouse to establish agreed definitions for all information. It was also accepted that the operational environment is too restricted by performance limitations and too volatile to business change to allow instantiation of
• 13. the EDM there. The data models of operational applications were fragmented, incomplete and disjointed, so the data warehouse became the only reliable source of facts.

The final driver for the data warehouse was also the reason for its name. It was a place to store the historical data that operational systems could not keep. In the 1980s, disk storage was expensive, and database performance diminished rapidly as data volumes increased. As a result, historical data was purged from these systems as soon as it was no longer required for day-to-day business needs. Perhaps more importantly, data was regularly overwritten as new events and transactions occurred. For example, an order for goods progresses through many stages in its lifecycle, from provisional to accepted, in production, paid, in transit, delivered and eventually closed. These many stages (and this is a highly simplified example) are often represented by status flags and corresponding dates in a single record, with each new status and date overwriting its predecessor. The data warehouse was to be the repository where all this missing history could be stored, perhaps forever.

The trustworthiness of operational data—its completeness, consistency and cleanliness—although still a concern for many BI implementations, is much improved since the 1980s. Enterprise data modeling plays a stronger operational role, with enterprise-scope ERP and SCM applications now the norm. The data warehouse is no longer the only place where the EDM can be instantiated and populated with trustworthy data. And yet, the goal of a single version of the truth seems to be more remote than ever, as big data brings ever more inconsistent data to light.

The data mart wars of the 1990s and beyond

While envisaging a single logical data repository accessed through relational views is straightforward, its physical implementation was—certainly in the 1990s, and for many years—another matter. Database performance, extract/transform/load (ETL) tooling, data administration, data distribution, project size, and other issues quickly arose.
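To make the history-keeping role concrete: the sketch below contrasts a hypothetical operational order table, in which each status change overwrites the previous one, with a warehouse table that simply accumulates one row per status change, following the order lifecycle described above. Names, statuses and dates are illustrative and not drawn from the book’s architecture.

-- Operational table: one row per order; status and date overwritten in place
CREATE TABLE order_current (
  order_id     INTEGER NOT NULL PRIMARY KEY,
  status_code  CHAR(4) NOT NULL,          -- e.g. PROV, ACPT, PROD, PAID, TRNS, DLVD, CLSD
  status_date  DATE    NOT NULL
);

-- Warehouse table: insert-only, so every status the order ever had is retained
CREATE TABLE order_status_history (
  order_id     INTEGER NOT NULL,
  status_code  CHAR(4) NOT NULL,
  status_date  DATE    NOT NULL,
  load_date    DATE    NOT NULL,          -- when the warehouse captured the change
  PRIMARY KEY (order_id, status_code, status_date)
);

-- The operational update loses the old status...
UPDATE order_current
SET    status_code = 'PAID', status_date = DATE '1988-06-30'
WHERE  order_id = 1001;

-- ...while the warehouse load appends it, preserving the full lifecycle
INSERT INTO order_status_history (order_id, status_code, status_date, load_date)
VALUES (1001, 'PAID', DATE '1988-06-30', DATE '1988-07-01');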
• 14. The horizontal divisions in the BDW in Figure 5-3 became obvious architectural boundaries for implementation, and vendors began to focus on the distinctions between summary/enhanced data and raw/detailed data. The former is seen by users and typically provides obvious and immediate business value. The latter is, at least initially, the primary concern of IT and delivers less obvious and immediate business value, such as data integration and quality. In contrast to the user-unfriendly concept of a warehouse, a data mart—optimized for business users—sounded far more attractive and inviting.

As a result, many vendors began to promote independent data marts in the 1990s as largely stand-alone DSS environments based on a variety of technologies and sourced directly from the operational applications. Their attraction was largely based on the lengthy timeframe for and high cost of building the integrated data store, by then called the enterprise data warehouse (EDW). Business users with urgent decision-support needs were easily convinced. Architecturally, of course, this approach was a step backwards. If the data warehouse was conceived to reduce the number and variety of extracts from the operational environment—often described as spaghetti—independent data marts significantly reversed that goal. In fact, except perhaps in terms of the technology used, such marts were largely identical to previous copies of data for DSS.

Many data warehouse experts consider independent data marts to be unnecessary political concessions that drain the IT budget. Vendors promote them for speed and ease of delivery of business value in a variety of guises. Data warehouse appliances, described below, are often promoted by vendors with data mart thinking. Similar approaches are often used to sell data analytic tools that promise rapid delivery of reports, graphs and so on without having to “bother with” an enterprise data warehouse.

Warehouse or mart
The terms data warehouse, enterprise data warehouse and data mart are much confused and abused in common parlance. For clarity, I define:
Data warehouse: the data collection, management and storage environment supporting MIS and DSS.
Enterprise data warehouse (EDW): a detailed, cleansed, reconciled and modeled store of cross-functional, historical data as part of a data warehouse.
Data mart: a set of MIS data optimized and physically stored for the convenience of a group of users.
In a layered data warehouse, dependent data marts are fed from the EDW and independent marts are discouraged.
• 15. Independent data marts often deliver early business value. However, they also drive medium and longer term costs, both for business users, who have to deal with incompatible results from different sources, and for IT, who must maintain multiple data feeds and stores, and firefight extensively on behalf of the business when inconsistencies arise. On the other hand, independent data marts may be seen as complementary to an EDW strategy, allowing certain data to be made available more quickly and in technologies other than relational databases—a characteristic that is increasing in importance as larger and more varied data sources are required for decision support.

Another approach, known as dependent data marts, is to physically instantiate subsets of the EDW, fed from the consistent raw data there and optimized for the needs of particular sets of users. This approach was adopted by practitioners who understood the enormous value of an integrated and well-modeled set of base data and favored centralized control of the data resource (Inmon, 1992). From the early 1990s, many data warehouse projects attempting to adhere to the stated data quality and management goals of the architecture were severely limited by the performance of general purpose databases and moved to this hybrid or layered model (Devlin, 1996), as depicted in Figure 5-4, where dependent data marts are sourced from the EDW and treated as an integral part of the warehouse environment.

Figure 5-4: Layered DW architecture, based on Devlin, 1996

While addressing query performance needs, as well as providing faster development possibilities, the downside of this approach, however, is that it segments the data resource both vertically—between EDW and data marts—and horizontally—between separate marts. Furthermore, it adds another ETL layer into the architecture, with added design and maintenance costs, as well as additional runtime latency in populating the data marts. However, many vendors and consultants continue to promote this layered approach to ensure query performance as a way to isolate data in a mart, and/or shorten development project timelines.
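A dependent data mart of the kind just described is, in the simplest case, little more than a pre-aggregated subset of EDW data refreshed on a schedule. The sketch below assumes hypothetical EDW tables (edw_sales and edw_product) and builds a small mart table for one user group; in practice an ETL tool would handle the refresh, incremental loading and error handling.

-- Mart table optimized for a sales-analysis user group
CREATE TABLE mart_sales_by_product_month (
  product_line  VARCHAR(30)   NOT NULL,
  sale_month    DATE          NOT NULL,   -- first day of the month
  total_revenue DECIMAL(18,2) NOT NULL,
  order_count   INTEGER       NOT NULL,
  PRIMARY KEY (product_line, sale_month)
);

-- Periodic refresh from the (assumed) EDW tables; a full reload for simplicity
DELETE FROM mart_sales_by_product_month;

INSERT INTO mart_sales_by_product_month
SELECT p.product_line,
       s.sale_month,
       SUM(s.revenue),
       COUNT(*)
FROM   edw_sales   s
JOIN   edw_product p ON p.product_id = s.product_id
GROUP  BY p.product_line, s.sale_month;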
• 16. A few vendors, notably Teradata aided by its purpose-built parallel database, pursued the goal of a single, integrated physical implementation of the original architecture with significant success.

The ODS and the virtual data warehouse

The simple and elegant layered architecture shown in Figure 5-4, despite its continued use by vendors and implementers of data warehouses, was further compromised as novel business needs, technological advances, and even marketing initiatives added new and often poorly characterized components to the mix. In the mid-’90s, the operational data store (ODS²) was introduced as a new concept (Inmon, et al., 1996) integrating operational data in a subject-oriented, volatile data store, modeled along the lines of the EDW. First positioned as part of the operational environment, it became an integral part of many data management architectures, supporting closer to real-time, non-historical reporting. Although still seen regularly in the literature, the term ODS has been appropriated for so many different purposes that its original meaning is often lost. Nonetheless, the concept supports an increasingly important near real-time data warehouse workload, albeit one that is implemented more like an independent data mart and with limited appreciation of the impact of the extra layering involved.

Another long vilified approach to data warehousing is the virtual data warehouse—an approach that leaves all data in its original locations and accesses it through remote queries that federate results across multiple, diverse sources and technologies. An early example was in IBM’s Information Warehouse Framework announcement in 1991, where EDA/SQL³ from Information Builders Inc. (IBI) provided direct query access to remote data sources. The approach met with fierce opposition throughout the 1990s and early 2000s from data warehouse architects, who foresaw significant data consistency—both in meaning and timing—problems, security concerns, performance issues, and impacts on operational systems. The concept re-emerged in the early 2000s as Enterprise Information Integration (EII), based on research in schema mapping (Haas, et al., 2005), and applied to data warehousing in IBM DB2

2 It has long amused me that ODS spoken quickly sounds like odious ☺
3 Since 2001, this technology is part of iWay Software, a subsidiary of IBI.
• 17. Information Integrator (Devlin, 2003). The approach has been recently rehabilitated—now called data virtualization, or sometimes data federation—with an increased recognition that, while a physical consolidation of data in a warehouse is necessary for consistency and historical completeness, other data required for decision support can remain in its original location and be accessed remotely at query time. Advantages include faster development using virtualization for prototyping and access to non-relational and/or real-time data values. This change in sentiment is driven by growing business demands for (i) increased timeliness of data for decision making and (ii) big data from multiple, high volume, and often web-based sources (Halevy, et al., 2005). Technologically, both of these factors militate strongly against the copy-and-store layered architecture of traditional data warehousing. In addition, the mashup, popular in the world of Web 2.0, which promotes direct access from PCs, tablets, etc., to data combined on the fly from multiple data sources, is essentially another subspecies.

Gartner has also promoted the approach as a logical data warehouse (Edjlali, et al., 2012), although the term may suggest the old virtual DW terminology, where physical consolidation is unnecessary. In modern usage, however, more emphasis is placed on the need for an underlying integration or canonical model to ensure consistent communication and messaging between different components. In fact, it has been proposed that the enterprise data model can be the foundation for virtualization of the data warehouse as well as, or instead of, its instantiation in a physical EDW (Johnston, 2010).

By 2012, data virtualization achieved such respectability that long-time proponents of the EDW accepted its role in data management (Swoyer, 2012). Such acceptance has become inevitable, given the growing intensity of the business demands for agility mentioned above. Furthermore, technology has advanced. Virtualization tools have matured. Operational systems have become cleaner. Implementations are becoming increasingly common (Davis & Eve, 2011) in operational, informational and mixed scenarios. Used correctly and with care, data virtualization eliminates or reduces the need to create additional copies of data and provides integrated, dynamic access to real-time information and big data in all its forms.
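From the SQL user’s point of view, data virtualization usually appears as just another view: the federation layer exposes a remote source as if it were a local table, and a view combines it with warehouse data at query time. The sketch below assumes such a layer has already registered a remote source as remote_web_orders and that edw_orders is a warehouse table; the registration syntax itself (nicknames, linked servers, external tables and so on) varies by product and is deliberately omitted.

-- edw_orders: assumed warehouse table, physically consolidated and historical
-- remote_web_orders: assumed to be exposed by the virtualization layer as a
-- local object, although its data stays at the source system

CREATE VIEW all_orders AS
SELECT order_id, customer_id, order_date, order_value, 'EDW'  AS origin
FROM   edw_orders
UNION ALL
SELECT order_id, customer_id, order_date, order_value, 'LIVE' AS origin
FROM   remote_web_orders;

-- A query against the view federates historical and near real-time data;
-- the remote rows are fetched (or the predicate pushed down) at query time
SELECT origin, COUNT(*) AS orders, SUM(order_value) AS revenue
FROM   all_orders
WHERE  order_date >= DATE '2013-01-01'
GROUP  BY origin;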
• 18. The advance of the appliances

The data mart wars of the 1990s were reignited early in the new millennium under the banner of analytic / data warehouse / database appliances with the launch of Netezza’s first product in 2002. By mid-decade the category was all the rage, as most appliance vendors, such as Netezza, DATAllegro, Greenplum, Vertica and ParAccel (before their acquisitions) sold their appliances as typical independent data marts. As we entered the teens, the appliance had become mainstream as the main players were acquired by hardware and software industry giants. More user-oriented analytic tools, such as QlikView and Tableau, can also be seen as part of the data mart tradition supporting substantial independent data stores on PCs and servers. All these different appliances were enabled by—and further drove—a combination of advances in database hardware and software that provided substantial performance gains at significantly lower prices than the traditional relational database for informational processing. These advances have occurred in three key technology areas, combining to create a “perfect storm” in the data warehousing industry in the past decade.

Parallel processing—SMP and MPP hardware

The growth in processing power through faster clock speeds and larger, more complex processors (scale-up) has been largely superseded by a growth in the number of cores per processor, processors per blade, and servers per cluster operating in parallel (scale-out). Simplistically, there are two approaches to parallel processing. First—and most common, from dual-core PCs all the way to IBM System z—is symmetric multi-processing (SMP) where multiple processors share common memory. SMP is well understood and works well for applications from basic word processing to running a high performance OLTP system like airline reservations. Problems amenable to being broken up into smaller, highly independent parts that can be simultaneously worked upon can benefit greatly from massively parallel processing (MPP), where each processor has its own memory and disks. Many BI and analytic procedures, as well as supercomputer-based scientific computing, fall into this category. MPP for data warehousing was pioneered by Teradata from its inception, with IBM also producing an MPP edition of DB2, since
• 19. the mid-1990s. Such systems were mainly sold as top-of-the-range data warehouses and regarded as complex and expensive. MPP has become more common as appliance vendors combined commodity hardware into parallel clusters and took advantage of multi-core processors. As data volumes and analytic complexity grow, there is an increasing drive to move BI to MPP platforms. Programming databases to take full advantage of parallel processing is complex but is advancing apace. Debates continue about the relative advantages of SMP, MPP and various flavors of both. However, a higher perspective—the direction toward increasing parallelization for higher data throughput and performance—is clear.

Solid-state storage—disks and in-memory hardware

Advances—and price reductions—in solid-state memory have allowed core memory sizes to grow enormously, and allow at least some disk storage to be replaced with solid-state devices. Larger volumes of data can be accessed at speeds orders of magnitude faster than possible on spinning disks. This trend splits further into in-memory and solid-state disk (SSD) approaches. The former enables higher-speed performance, but may require redesign of the basic hardware and software architecture of the computer. The latter provides significant, but lower, performance gains without re-architecting the access methods by presenting the solid-state device as a traditional disk. BI and analytic applications, with their need for large volumes of data, benefit greatly from this direction. OLTP systems also benefit from increased processing speed. Because current solid-state devices are volatile and lose their data when power drops, this technology is seen as more appropriate for BI, where there exists another source for any data lost, as opposed to operational systems where the risk of data loss is higher.

Because spinning disk remains substantially cheaper than solid-state storage, and is likely to remain so for the foreseeable future, most solid-state storage is used for data that is accessed regularly, where the benefit is greatest. This leads to temperature-based—hot, warm and cold data is defined on frequency of access—storage hierarchies. Some vendors, however, opt for fully in-memory databases, redesigned to take maximum advantage of the solid-state approach, using disk only as a disaster recovery mechanism. Because solid-state stores remain significantly smaller than the larger
• 20. disks, solid-state is favored where speed is the priority, as opposed to large data volumes, which favor disk-based approaches. The general trend, therefore, is towards hybrid systems containing a substantial amount of solid-state storage combined with large disk stores. SSDs, with their hybrid design, will likely continue to bridge fully integrated, in-memory databases and large disk drives with high speed, mid-size, non-volatile storage.

Row-based and columnar database design

In the relational model of tables consisting of rows and columns, row-based storage—physically storing the fields of a single database record sequentially on disk—was the physical design of choice of all early relational database designers because it performed well for the typical single record, read/write access method prevalent in OLTP applications. Since the start of the 21st century, relational database and appliance designers have begun experimenting with column-based storage—storing all the fields in each column physically together. Both types are shown in Figure 5-5.

Figure 5-5: Physical layout of row-based and columnar databases

The columnar structure is very effective in reducing query time for many types of BI application, which are typically read-only and require only a subset of the fields in each row. In disk-based databases, the resulting reduction in I/O can be significant. Columns also enable more efficient data compression because they contain data of the same structure. Together, these factors enable substantial performance improvements for typical BI queries. Again we see a trade-off in performance. Row-based is optimal for OLTP; columnar is better suited to certain classes of BI application. Loading data into a columnar database requires restructuring of the incoming data, which generally arrives in row-based records.
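The contrast between the two access patterns shows in the shape of the queries themselves. The pair of statements below is only a sketch, reusing the hypothetical sales_fact and order_current tables introduced earlier: the first touches two of many columns across a large number of rows, the case where columnar storage and compression pay off, while the second reads every column of a single row by key, the case row-based storage handles best.

-- BI-style scan: a few columns, many rows; a columnar store reads only the
-- region_code and revenue columns from disk and skips everything else
SELECT region_code, SUM(revenue) AS total_revenue
FROM   sales_fact
GROUP  BY region_code;

-- OLTP-style point access: all columns of one row, located by its key;
-- row-based storage finds the whole record in one place
SELECT *
FROM   order_current
WHERE  order_id = 1001;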
• 21. This and other performance trade-offs between row and column storage have led increasingly towards hybrid schemes, where the DBMS decides which type or even multiple types of storage to use for which data to optimize overall performance.

Summary of technology advances

The major driver for the advances above has been the business demand for faster query and analysis performance for ever larger sets of data. Performance and price/performance measures for analytic databases in TPC-H benchmarks and in quoted customer examples over the past few years show gains of 10X to 100X—and in some cases considerably more—over general-purpose databases. Of course, performance optimization is not new in relational databases. Indexes, materialized views, caching and specialized table designs have been the stock in trade for 20 years now. However, these traditional approaches are often highly specific to the data model in use and anticipated query patterns. Tuning and optimization is thus a labor-intensive process that delays initial use of the data and often requires rework as usage patterns change. And complete ad hoc usage cannot be optimized by these means. In contrast, the performance gains in analytic DBMSs stem from fundamental hardware/software improvements and are model-independent and generally applicable to all data and most analytical query patterns. This improved performance has also allowed vendors to simplify database design and management. Physical design trade-offs are reduced. The need for indexes can be limited or removed altogether, simplifying both initial and ongoing tuning and maintenance work for database administrators, thus lowering DBA involvement and costs over the entire lifetime of the system.

Data warehouse implementation has strayed far from the ideals of the original architecture. The concept of a single source for DSS was quickly overthrown by technology limitations, practical/political issues for buyers and the needs of vendors to close sales quickly. It is often assumed that data layers are mandatory in any large-scale warehouse implementation. However, the original main driver for layering—query performance of relational databases—has been at least partially overcome in modern technology. Perhaps the key question posed by this history is: why do we persist with the highly fragmented data structure that has evolved?
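The workload-specific tuning referred to above looks, in the simplest relational case, like the sketch below: an index and a pre-computed summary that speed one anticipated query pattern against the hypothetical sales_fact table, but help little once users start asking differently shaped questions. CREATE MATERIALIZED VIEW is shown in the form used by several, though not all, relational products.

-- Index tailored to a known filter pattern: queries that select by region
CREATE INDEX ix_sales_region ON sales_fact (region_code, sale_year);

-- Pre-computed summary for an anticipated report; it must be refreshed as
-- new data arrives and is of no use to queries with a different shape
CREATE MATERIALIZED VIEW mv_revenue_by_region_year AS
SELECT region_code, sale_year, SUM(revenue) AS total_revenue
FROM   sales_fact
GROUP  BY region_code, sale_year;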
• 22. So far, much of the impact has been on traditional BI—running faster queries over larger data sets. However, as we shall explore in Section 5.7, these advances also enable new ways of thinking about the overall operational/informational architecture that has evolved over nearly three decades. Today, the focus is shifting to the relationship between operational and informational needs. In the future, the emphasis will likely extend to operational processing.

Well, here’s another fine⁴ mess you’ve gotten me into

The outcome has been that most data warehouse implementations have become increasingly complex, with combinations of independent and dependent data marts, marts fed from other marts, and even read/write marts added to the mix. These added types and layers of marts lead to an extensive set of ETL that is difficult to maintain as users’ needs change. All of this harks back to the earliest days of decision support, when many users made specialized copies of any data they needed, while others—with sufficient technical nous—dived directly into the actual sources, irrespective of any ensuing data management chaos.

Figure 5-6: A modern data warehouse “architecture”

The resulting data warehouse “architecture” today, depicted in Figure 5-6, has lost its original simplicity and provides implementers with little guidance on how to structure DSS in a modern business. As the layers and silos increase, many problems become more pressing. Data duplication leads to ever-growing levels of inconsistency that have to be manually reconciled in the reporting process, reducing users’ confidence in the data warehouse. Hardware, software and labor costs grow, and maintenance becomes ever more complex, constraining the provision of new functionality and information

4 Ollie’s catchphrase in the Laurel and Hardy films was actually “…another nice mess…”! See http://bit.ly/kARK1
  • 23. Business unIntelligence | 132 to meet business demands. Customer and partner interactions suffer because of siloed and inconsistent information. And despite demands for more timely information, the added layers actually delay data supply to the users. It is now mandatory to address these issues. From a business view- point, increased competition and higher customer expectations are driving demands that both hard and soft information from all sources—human-sourced, machine-generated and process- mediated, both internal and external—is integrated and internally consistent as far as possible and necessary across the organization, and delivered at ever increasing speed. Data warehousing as origi- nally conceived, with its almost exclusive focus on hard information and internally generated, process-mediated data, fails these busi- ness demands. On the technology front, we’ve seen the advances in databases that have changed the computing landscape. Still to come, we will see how Service Oriented Architecture (SOA) and mobile computing approaches are dramatically changing the data and process structures of the operational environment, while In- ternet technologies are redefining how users expect to interact with all applications. Big data has added a further, and perhaps final, twist to the story: the data volumes and velocities involved are incompatible with a dependent data mart approach that involves passing such data through the EDW. This, together with the new storage technologies needed, leads to the conclusion that these data sets can be supported only in an architecture that allows inde- pendent data marts in addition to an EDW. All of these changes in technology press in upon the data warehouse environment from above and below, within and without, challenging the fundamental assumptions upon which data warehousing was originally defined. The data warehouse may be at the end of its own universe. Its orig- inal raison d’être as the one, true, consistent past and present state of the business is no longer possible nor, arguably, needed. Howev- er, it is the only approach to data management that has even con- sidered many of the information issues raised so far. The bottom line is that the data warehouse architecture, as originally conceived and eventually delivered, is in need of a radical overhaul.
  • 24. 133 | CHAPTER 5: Data-based decision making 5.3 Business intelligence—really?

The previous section focused on the data aspect of data-based decision making, particularly on the preparation phase. While this is probably the IT comfort zone in decision support, it is also fair to say that without meaningful and consistent data, promoting data-based decision making to the business is a recipe for disaster. It takes only a single set of erroneous information for the entire approach to be discredited in the users' eyes. So it was that in data warehousing, although originally defined as covering the entire process of data-based decision support—from defining, collecting and preparing the needed data to the user-facing tooling required by business users—much of the early focus was on data management issues. Plus ça change, plus c'est la même chose.

By the early 1990s, the data warehouse was perceived much as its physical counterpart—a user-unfriendly place, cavernous, poorly lit and infested with deadly fork-lift trucks. As a result, the phrase business intelligence (BI) was adopted by Gartner analyst Howard Dresner, when he moved from Digital Equipment Corporation (DEC), where the phrase was in use internally from 1989⁵. The term was also in use in the early 1990s in the intelligence—meaning spying—community in the context of industrial espionage. However, Dresner's stated aim was to emphasize the business aspect of data warehousing, summarized in a common definition of BI as "a set of concepts and methods to improve business decision making by using fact-based support systems" (Power, 2009). In practical terms, this translated into a set of reporting and ad hoc query tools with attractive presentation capabilities. Spreadsheets clearly meet these criteria and are widely used to support decision making, but, as we saw in Chapter 4, they are seen as anathema to the data management foundation of data-based decision making.

One might argue that business intelligence is actually an oxymoron. Those of us who've worked in large enterprises have seen enough evidence to conclude that many decisions have limited relevance to stated business goals and a shaky relationship with intelligence. How many successful decisions have been declared as based on

5 Private communication
  • 25. Business unIntelligence | 134 "gut feeling"? And unsuccessful ones blamed on "lack of reliable information"? How often does political expedience override a strongly argued rationale? How many business analysts have been asked to "just take one more look at the figures" when the numbers seemed to contradict the group wisdom of the boardroom? So, what does the term really mean?

Pharaoh's tomb—the BI usage pyramid

The meaning of BI may be best explored through its support of decision making as it relates to business roles and organization, depicted in the second ubiquitous pyramid in the BI literature and shown in Figure 5-7. Classical Egyptologists identify pyramids as the Pharaohs' tombs; alternative historians propose an array of far more interesting possibilities. Their original purpose remains obscure. The BI usage pyramid has a simple and obvious purpose, but is weighed down with added—and often misleading—connotations. In its most basic form, it describes three broad levels in the organization where BI plays.

The original, and still most widely practiced, form is tactical BI. Mid-level managers and supervisors of ongoing activities, as well as the more numerically savvy (often termed independent) business analysts who support them, are the target audience. The goal of tactical BI is three-fold: (i) to ensure ongoing operational processes and their operators are running optimally, (ii) to find and encourage speed or productivity improvements in these processes and (iii) to investigate and fix any anomalies that may arise from either internal or external causes. Typically operating in a timeframe of days to weeks, tactical BI uses historical, often reconciled, data sourced from operational systems through the data warehouse environment, usually via data marts. Tactical BI is well suited to the traditional data warehouse architecture shown in Section 5.2 and well supported by the BI query, analysis and reporting tools that emerged in the 1980s and beyond. In fact, the first two goals above drove tactical BI towards the current preponderance of report generation and, more recently, dashboard creation at this level. The investigative third goal has, in many cases, been taken over by spreadsheets and thus deemed not worthy of consideration as BI by some purists.

Figure 5-7: The BI usage pyramid
  • 26. 135 | CHAPTER 5: Data-based decision making Historically, strategic BI—aimed at supporting senior managers and executives in long-term, strategic decision making—was the next target for proponents of BI. This need had been identified as early as 1982 in a Harvard Business Review paper, The CEO Goes On-Line (Rockart & Treacy, 1982), where the authors reported on the shocking discovery that some C-level executives were using desktop computer terminals to access status information—reports—about their businesses and even analyzing and graphing data trends. One executive even reported that "Access to the relevant data to check out something…is very important. My home terminal lets me perform the analysis while it's at the forefront of my mind." The paper also introduced the term executive information system (EIS) to an unsuspecting world; a reading some 40 years later reveals just how little has changed in the interim in senior managers' needs for data about their businesses and their ability to probe into it. An intriguing and earlier reference to a system on a boardroom screen and computer terminals declares "Starting this week [the CEO of Gould] will be able to tap three-digit codes into a 12-button box resembling the keyboard of a telephone. SEX will get him sales figures. GIN will call up a balance sheet. MUD is the keyword for inventory" (Business Week, 1976). At least the codes were memorable!

These executive-level, dependent users needed extensive support teams to prepare reports. However, the thought of an executive analyzing data no longer surprises anyone, and today's executives have grown up in an era when computer, and later Internet, use was becoming pervasive. Driven by the iPad revolution, independent executives probably outnumber their dependent colleagues today, although extensive backroom data preparation and support remains common. Driven in large part from business management schools, the concept of EIS developed largely independently from data warehousing through the 1980s (Thierauf, 1991), (Kaniclides & Kimble, 1995). With the growing popularity of data warehousing and BI, and the recognition that data consistency was a vital prerequisite, IT shops and vendors gradually began to include EIS within BI as the top layer of the pyramid.

I speak of data above because, despite the information in its name, EIS focused more on data, mainly originating from the financial and operational systems. External data sources such as Standard and
  • 27. Business unIntelligence | 136 Poor's were also seen as important to executives. However, it has long been recognized that soft information—press and TV reports, analyst briefings, and internal documents and presentations, as well as informal information from face-to-face meetings—forms a high percentage of the information needs of the majority of executives, especially when strategizing on longer-term (months to years) decisions. Its absence from EIS implementations, especially those fed from enterprise data warehouses, is probably an important factor in their relative lack of commercial success in the market. Strategic BI also maintained an emphasis on historical data and was differentiated from tactical BI largely by the longer business impact timeframe—months to years—expected at this level. Strategic BI implementations have struggled to gain widespread traction for two main reasons. First, they usually exclude soft information, of both the formal and the collaborative, informal varieties. Second, they typically require the reconciliation of all hard information across the full range of operational sources, pushing them far out on any reasonable data warehouse implementation timeline.

The final layer of the usage pyramid, operational BI, emerged from the development of the ODS and operational BI, described above. The focus is on intra-day decisions that must be made in hours or even seconds or less. Operational analytics is today's incarnation, emphasizing the use of hard information in ever increasing amounts. The initial users of operational BI were seen as front-line staff who deal directly with customer, supplier, manufacturing and other real-time processes of the business, supported through live dashboards. Increasingly, operational applications use this function directly through exposed APIs. Based on detailed, near real-time, low latency data, operational BI poses significant technical challenges to the traditional data warehouse architecture, where reconciling disparate data sources is often a prolonged process. Nonetheless, operational BI has grown in stature and is now of equal importance to the tactical BI layer for most businesses.

Figure 5-7 is used widely to represent several aspects of BI usage. It reflects the traditional hierarchical structure of organizations, both in terms of the relative importance of individual decisions and the number of decisions and potential BI users at each level. It is also often used to illustrate data volumes needed, although this can
  • 28. 137 | CHAPTER 5: Data-based decision making be misleading for two reasons. First is an assumption about the level of summarization vs. detail in the three layers. Operational BI demands highly detailed and ongoing data feeds, clearly requiring the largest possible volume of data. As we move to the tactical layer, it is often reasonable to summarize data. Even with lengthy historical periods involved, this seldom offsets the summarization savings. However, some business needs do require substantial levels of detail for tactical BI. At the strategic level, significant summarization is common. However, the second factor, the need for soft information for strategic BI mentioned earlier, must also be taken into account. Soft information can be voluminous and is difficult to summarize mathematically. In short, the shape of the pyramid indicates information volumes poorly.

The relationship of the pyramid to the organizational hierarchy may suggest that data flows up and decisions down the structure. Again, while this may be true in some business areas, it is certainly not universally the case. Many individual operational, tactical and strategic decisions have an existence entirely independent of other layers. A strategic decision about a merger or acquisition, for example, is highly unlikely to require particular operational decisions in its support. The IT origins of the BI pyramid and its consequent focus on data rather than information shed little light on the process of decision making. The visual resemblance of the BI usage pyramid to the DIKW version we met in Chapter 3 promotes these further assumptions: that data is the prevalent basis for operational BI (only partly true), while strategic BI is built on knowledge (likely) or even wisdom (probably untrue). However, the BI usage pyramid is more resilient than its DIKW counterpart. It identifies three relatively distinct uses of BI that relate well to particular roles and processes in the organization, to which we return in Chapters 7 and 9. What it misses, because of that very focus on roles and processes, is the topic we now call business analytics.

Analytics—digging beneath the pyramids

Data mining, known academically as knowledge discovery in databases (KDD), emerged at the same time as BI in the 1990s (Fayyad, et al., 1996) as the application of statistical techniques to large data sets to discover patterns of business interest. There are probably few BI people who haven't heard and perhaps repeated the "beer
  • 29. Business unIntelligence | 138 and diapers (or nappies)" story: a large retailer discovered through basket analysis—data mining of till receipts—that men who buy diapers on Friday evenings often also buy beer. The store layout was rearranged to place the beer near the diapers and beer sales soared. Sadly, this story is now widely believed to be an urban legend or sales pitch rather than a true story of unexpected and momentous business value gleaned from data mining. Nevertheless, it makes the point that there may be nuggets of useful information to be discovered through statistical methods in any large body of data, and action that can be taken to benefit from these insights.

In the past few years, the phrase business analytics has come to prominence. Business analytics, or more simply, analytics, is defined by Thomas Davenport as "the extensive use of data, statistical and quantitative analysis, explanatory and predictive models, and fact-based management to drive decisions and actions" (Davenport & Harris, 2007) and as a subset of BI. Other authors suggest it is either an extension of or even a replacement for BI. It also clearly overlaps with data mining. Often discussed as predictive analytics or operational analytics, the market further tries to differentiate analytics from BI as focused on influencing future customer behavior, either longer term or immediately. A common pattern of operational analytics is to analyze real-time activity—on a website, for example—in combination with historical patterns and instantly adapt the operational interaction—offering a different or additional product or appropriate discount—to drive sales. Thus, none of these ideas are particularly novel. BI included similar concepts from the beginning.

If we position business analytics in the BI usage pyramid, we can immediately see that operational analytics is, at most, an extension of operational BI. Similarly, predictive analytics enhances and extends the investigative goal of tactical BI that has mainly migrated into spreadsheets. Davenport's definition above is often quoted to emphasize the role of statistics and predictive models, but perhaps the most important aspect is its underlining of the goal of driving decisions and actions. Beyond that, what have changed are the data sources and volumes available in big data, as well as the faster processing demanded by users and provided by modern hardware and software advances. For example, logistics firms now use analysis of
  • 30. 139 | CHAPTER 5: Data-based decision making real-time traffic patterns, combined with order transaction data and traditional route planning, to optimize scheduling of deliveries to maximize truck utilization, and to improve customer satisfaction by providing more accurate estimates of arrival times. While one might be tempted to think that this is simply a matter of speed or scale, in fact, the situation is more of a step change in what is possible, enabling new ways of making decisions and driving the new ways of doing business defined as the biz-tech ecosystem.

Data scientists or Egyptologists?

As digging beneath pyramids of data has become an increasingly popular pastime, we've seen the emergence of a new profession: the data scientist. Although the term data science has a long history, both it and the role of data scientist have been taken to heart by the big data movement. And given the breadth of definitions of big data itself (see Chapter 6), you won't be surprised to discover that data scientists are equally forgiving about the scope of their job. Unlike Egyptologists. IBM's Swami Chandrasekaran has built a comprehensive visual Metro map of the skills required of a data scientist (Chandrasekaran, 2013). The visual metaphor is appropriate for a budding data scientist but, with disconnected lines and a technical, big data point of view, the overall picture is disappointing for a business trying to grasp precisely what a data scientist is. In the simplest terms, I believe that a data scientist is best thought of as an advanced, inspired business analyst and power user of a wide set of data preparation and mining, business intelligence, information visualization, and presentation tools. Added to this, he or she needs to understand the business, both process and information, and have the ability to present a convincing case to managers and executives. A very broad skill set and unlikely to be found in one person. Bill Franks, Chief Analytics Officer at Teradata, provides a comprehensive recipe for the making of an analytic professional or data scientist (Franks, 2012).

Figure 5-8: Khafre's Pyramid, Giza, Egypt
  • 31. Business unIntelligence | 140 A step beyond the pyramid

Looking forward, a change in thinking of particular interest considers how we analyze and interpret reality. BI tools and approaches have generally followed the basic premise of the scientific method in their use of information, where hypotheses and theories are proposed and subsequently verified or discarded based on the collection and analysis of information. It has been suggested that business analytics, when used on big data, signals the end of the scientific method (Anderson, 2012). The statistician George E. P. Box said, over thirty years ago, that "all models are wrong, but some are useful" (Box, 1979). Anderson reported that Peter Norvig, Google's research director, suggested that today's reality is that "all models are wrong, and increasingly you can succeed without them." With the petabytes of data and petaflops of processing power Google has at its disposal, one can dispense with the theorizing and simply allow conclusions to emerge from the computer. Correlation trumps causation, declare the authors of Big Data: A Revolution That Will Transform How We Live, Work and Think (Mayer-Schonberger & Cukier, 2013). Clearly, the emergence of big data has reemphasized the analysis of information and the excitement of discovering the previously unknown in its midst. But what becomes "known" if it is a mathematical model so complex that its only explanation is that the simulation works? At least until the arrival of a giant dinosaur extinction event asteroid that wasn't—and couldn't be—in the equations because it wasn't in the underlying data (Weinberger, 2012). The problem is not confined to asteroids or, indeed, black swans—a metaphor for unexpected events that have a major effect, and are often inappropriately rationalized. As we gather ever more data and analyze it more deeply and rapidly, we begin to fall prey to the myth that we are increasingly predicting the future with ever greater certainty. A more realistic view might be that the computers and algorithms are making sophisticated guesses about future outcomes. As Alistair Croll opines, "Just because the cost of guessing is dropping quickly to zero doesn't mean we should treat a guess as the truth" (Croll, 2013).

The above radical thinking may be the ultimate logical conclusion of data-based decision making, but I also seriously question if we can trust the Beast in the computer that far, basing decisions solely on
  • 32. 141 | CHAPTER 5: Data-based decision making basic data. Chapter 8 expands the scope of thinking about soft information and knowledge—using the full scope of information stored digitally today. And that, of course, is only a staging post on the quest for a full understanding of how human and team decisions can be fully supported by computers, as we'll explore in Chapter 9. But for now, it's back to the present, where the data warehouse has faced its biggest challenge for quite a few years now: the timeliness of its data.

5.4 Today's conundrum—consistency or timeliness

"I want it all and I want it now…" Queen's 1989 hit⁶ sums it up. So far, we've been dealing with the demand for it all. Now we need to address delivering it now. Speed is of the essence, whether in travel, delivery times, or news. For business, speed has become a primary driver of behavior. Shoppers demand instant gratification in purchases; retailers respond with constantly stocked shelves. Suppliers move to real-time delivery via the Web and just-in-time manufacturing. In short, processes have moved into overdrive. Within the business, data needs, already extensive, are thus becoming ever closer to real-time. Sales, front-office, and call center personnel require current information from diverse channels about customer status, purchases, orders and even complaints in order to serve customers more quickly. Marketing and design functions operate on ever-shortening cycles, needing increasingly current information to react to market directions and customer preferences and behavior. Just-in-time manufacturing and delivery demand near real-time monitoring of and actions on supply chains. Managers look for up-to-the-minute and even up-to-the-second information about business performance internally and market conditions externally.

At a technology level, faster processors, parallel processing, solid-state disks and in-memory stores all drive faster computing. Databases and database appliances are marketed on speed of response to queries. Dashboard vendors promise to deliver near real-time KPIs (key performance indicators). ETL tools move from batch de-

6 I Want it All, Queen's 1989 hit written by Brian May.
  • 33. Business unIntelligence | 142 livery of data to micro-batches and eventually to streaming ap- proaches. Complex event processing (CEP) tools monitor events as they stream past on the network, analyze correlations, infer higher- order events and situations—then act without human intervention. In business intelligence, IT strives to provide faster responses to decision makers’ needs for data. Timeliness manifests most obvi- ously in operational BI, where information availability is pushed from weekly or daily to intra-day, hourly and lower. Near instanta- neous availability of facts and figures is supported by streaming ETL, federated access to operational databases or CEP. But as we’ve seen, before e-commerce made speed and timeliness the flavors du jour, correctness and consistency of data and behav- iors were more widely valued. Data consistency and integration were among the key drivers for data warehousing and business intelligence. Users longed for consistent reports and answers to decision-support questions so that different departments could give agreed-upon answers to the CEO’s questions. Meetings would descend into chaos as managers battled over whose green lineflow report depicted the most correct version of the truth. Decisions were delayed as figures were reworked. And IT, as provider of many of the reports, often got the blame—and the expensive, thankless and time-consuming task of figuring out who was right. Unfortunately, within the realm of business information, timeliness and consistency, while not totally incompatible, make uncomforta- ble bedfellows. Business information is generated by and changed in widely disparate processes and physical locations. The processes often consist of a mix of legacy and modern applications, often built independently at different times and by different departments. The result is inconsistent information at its very source. Increasingly, different parts of business processes are highly distributed geo- graphically. Despite companies’ best efforts, such distribution, of- ten predicated on staff cost savings, usually introduces further inconsistencies in the data. Typically, IT has carried out some level of integration over the years to improve data consistency, but it is often piecemeal and asyn- chronous. Master data management is but one more recent exam- ple of such an effort. However, integration takes time—both in initial design and implementation, as well as in operation. In such an
  • 34. 143 | CHAPTER 5: Data-based decision making environment, achieving simultaneous timeliness and consistency of information requires: (i) potentially extensive application redesign to ensure fully consistent data definitions and complete real-time processing and (ii) introduction of synchronous interactions between different applications. This process becomes more technically demanding and financially expensive in a roughly exponential manner as information is made consistent within ever shorter time periods, as shown in Figure 5-9.

Despite some consultants' and vendors' claims to the contrary, timeliness and consistency have never been absolutes, nor are they now. Each has its pros and cons, its benefits and costs. Different parts of the business value them to varying degrees. Finding a balance between the two in the context of both business needs and IT architectural decisions is increasingly important. And adopting technological solutions that ease the conflict becomes mandatory. The solution, however, entails making choices—potentially difficult ones—about which applications prefer timeliness over consistency and vice versa, as well as creating different delivery mechanisms. An EDW-based approach maximizes consistency; virtualization maximizes timeliness. Some classes of data might need to be delivered through both methods so that, for example, a report that is delivered with consistent data in the morning might be supplemented by timely data during the day. In such cases, and as a general principle, metadata informing users of the limitations that apply to each approach is required.

Delivering a wrong answer early can have a longer-term and greater impact as incorrect decisions, based on the initial error, multiply in the time period before the correct answer arrives—especially if the error is only discovered much later. Timely, but (potentially) inconsistent, data may be better delivered as an update to a prior consistent base set.

Figure 5-9: Integration cost as a function of timeliness
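To make the dual-delivery pattern above concrete, here is a small, purely illustrative Python sketch. The record structure, field names and consistency labels are invented for the example and imply no particular product; it simply shows a report that combines a reconciled overnight baseline with unreconciled intraday records, carrying metadata so the limitations of each source stay visible to the user.

```python
# Illustrative sketch only: combining a consistent baseline with timely updates.
# The data, field names and consistency labels are invented for this example.

from dataclasses import dataclass
from typing import List

@dataclass
class Fact:
    region: str
    revenue: float
    consistency: str   # metadata: "reconciled" (overnight EDW load) or
                       # "live" (intraday feed, not yet reconciled)

def report(baseline: List[Fact], intraday: List[Fact]) -> List[Fact]:
    """Return the morning baseline supplemented by timely intraday records,
    preserving the consistency metadata for each record."""
    return baseline + intraday

morning_baseline = [
    Fact("EMEA", 1_200_000.0, "reconciled"),
    Fact("APAC", 830_000.0, "reconciled"),
]
intraday_updates = [
    Fact("EMEA", 40_500.0, "live"),   # timely but potentially inconsistent
]

for fact in report(morning_baseline, intraday_updates):
    print(f"{fact.region}: {fact.revenue:,.0f} ({fact.consistency})")
```

The design choice illustrated is simply that timely data supplements, rather than silently overwrites, the consistent base set, in line with the principle stated above.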
  • 35. Business unIntelligence | 144 De-layering the operational and informational worlds

As already noted, the need for consistency was the primary driver of the data warehouse architecture, leading to a layered structure, due—at least in part—to limitations of the technology used to implement it. The question thus arises of whether advances in technology could eliminate or reduce layering to gain improvements in timeliness or maintainability. There are two related, but different, aspects: (i) removing layering within the data warehouse, and (ii) reuniting the operational and informational environments. The possibility of the former has been increasing for nearly a decade now, as increasing processor power and software advances in relational databases have driven gains in query performance. Mainstream RDBMSs have re-promoted the concept of views, often materialized and managed by the database itself, to reduce the dependency of dependent data marts on separate platforms populated via ETL tools. The extreme query performance—rated at up to two or three orders of magnitude higher than general purpose RDBMSs—of analytic databases has also allowed consideration of reducing data duplication across the EDW and data mart layers.

We are now seeing the first realistic attempts to merge operational and informational systems. Technically, this is driven by the rapid price decreases for memory and multi-core systems, allowing new in-memory database designs. The design point for traditional RDBMSs has been disk based, with special focus on optimizing the bottleneck of accessing data on spinning disk, which is several orders of magnitude slower than memory access. By 2008, a number of researchers, including Michael Stonebraker and Hasso Plattner, were investigating the design point for in-memory databases, both OLTP (Stonebraker, et al., 2008) and combined operational/informational (Plattner, 2009). The latter work has led to the development of SAP HANA, a hardware/software solution first rolled out for informational, and subsequently for operational, applications. The proposition is relatively simple: with the performance gains of an in-memory database, the physical design trade-offs made in the past for operational vs. informational processing become unnecessary. The same data structure in memory performs adequately in both modes.
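The following toy Python sketch is entirely illustrative (the table, records and queries are invented, and no particular product's design is implied). It shows the idea at its simplest: a single in-memory, insert-only structure serving both an operational lookup of current state and an informational aggregation over the same records, with history retained as a by-product of the insert-based approach discussed next.

```python
# Illustrative sketch only: one in-memory, insert-only structure used for both
# operational (current-state) and informational (aggregate) access.
# Table, records and queries are invented for demonstration.

from collections import namedtuple

Order = namedtuple("Order", "order_id status amount version")

orders = []   # single shared, append-only store (no updates in place)

def record(order_id, status, amount):
    """Operational write: append a new version rather than overwriting,
    so prior states are retained as history."""
    version = sum(1 for o in orders if o.order_id == order_id) + 1
    orders.append(Order(order_id, status, amount, version))

def current_state(order_id):
    """Operational read: latest version of a single order."""
    return max((o for o in orders if o.order_id == order_id),
               key=lambda o: o.version)

def revenue_by_status():
    """Informational read: aggregate across the very same structure."""
    totals = {}
    for o in orders:
        if o.version == current_state(o.order_id).version:  # latest versions only
            totals[o.status] = totals.get(o.status, 0.0) + o.amount
    return totals

record(1, "open", 100.0)
record(2, "open", 250.0)
record(1, "shipped", 100.0)          # new version; the prior one is kept as history
print(current_state(1).status)        # operational query -> "shipped"
print(revenue_by_status())            # informational query -> {'open': 250.0, 'shipped': 100.0}
```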
  • 36. 145 | CHAPTER 5: Data-based decision making In terms of reducing storage and ongoing maintenance costs, particularly when driven by a need for timeliness, this approach is attractive. However, it doesn't explicitly support reconciliation of data from multiple operational systems as data warehousing does. Building history is technically supported by the use of an insert-based update approach, but keeping an ever-growing and seldom-used full history of the business in relatively expensive memory makes little sense. Nor does the idea of storing vast quantities of soft information in a relational format appeal. Nonetheless, when confined to operational and near-term informational data, the approach offers a significant advance in delivering increased timeliness and improved consistency within the combined set of data.

5.5 Most of our assumptions have outlived their uselessness⁷

The questions posed throughout this and the previous chapters lead directly to an examination of how our thinking about data management in business, both in general and with particular reference to decision support, has evolved. Or, in many cases, has not moved very far at all. An earlier paper (Devlin, 2009) identified four "ancient postulates" of data warehousing based on an analysis of the evolution of the data warehouse architecture. An additional three postulates now emerge.

1. Operational and informational environments should be separated for both business and technical reasons

Dating from the 1970s, this first postulate predates data warehousing, but was incorporated without question in the first data warehouse architecture. At that time, both business management and technology were still at a comparatively early stage of evolution. Business decision makers operated on longer planning cycles and often deliberately ignored the fluctuating daily flow of business events—their interest was in monthly, quarterly or annual reporting; understanding longer trends and directions; or in providing input to multi-year strategies. On the technology front, applications

7 Marshall McLuhan
  • 37. Business unIntelligence | 146 were hand-crafted and run in mainframes operating at the limits of their computing power and storage. These factors led to one of the longest-lived postulates in IT—the need to separate operational and informational computing and systems. From its earliest days, DSS envisaged extracting data from the operational applications into a separate system designed for decision makers. And, at the time, that made sense: it was what business users needed, and the technology could support it more easily than allowing direct ad hoc access to operational databases. Of all seven postulates, this is the one that has never been seriously challenged…until now, as we saw in the previous section.

2. A data warehouse is the only way to obtain a dependable, integrated view of the business

This postulate was clearly visible in the first architecture paper (Devlin & Murphy, 1988) and in the prior work carried out in IBM and other companies in the mid-1980s. As we've already seen in the previous section, a basic assumption of this architecture was that operational applications could not be trusted. The data they contained was often incomplete, inaccurate, and inconsistent across different applications. As a result, the data warehouse was the only place where a complete, accurate and consistent view of the business could be obtained.

This postulate is now under challenge on two fronts. First, the data quality of operational systems has improved since then. While still far from perfect, many companies now use commercial off-the-shelf applications in house or in the Cloud, such as SAP or Salesforce.com, with well-defined, widely tested schemas, and extensive validation of input data. These factors, together with extensive sharing of data between businesses electronically and between process stages, have driven improved data quality and consistency in the operational environment. Second, the growth of big data poses an enormous challenge to the principle that a dependable, integrated view of the business is achievable in any single place. Going forward, we move from the concept of a single version of the truth to multiple, context-dependent versions of the truth, which must be related to one another and users' understanding of them via business metadata.
  • 38. 147 | CHAPTER 5: Data-based decision making 3. The data warehouse is the only possible instantiation of the full enterprise data model

Another cornerstone of the data warehouse was that data is useless without a framework that describes what the data means, how it is derived and used, who is responsible for it, and so on. Thus arose the concept of metadata and one of its key manifestations: the enterprise data model. By 1990, this concept had been adopted by data warehousing from information architecture (Zachman, 1987) as the basis for designing the EDW and consolidating data from the disparate operational environment. A key tenet was that the EDM should be physically instantiated as fully as possible in the data warehouse to establish agreed definitions for all information. It was also accepted that the operational environment is too restricted by performance limitations; too volatile to business change; and its data models too fragmented, incomplete and disjointed to allow instantiation of the EDM there. Thus, the data warehouse became the only reliable placement for the EDM. However, implementation has proven rather problematic in practice. With the increasing pace of business change and the growing role of soft data in business, it is increasingly difficult to envisage the type of enterprise-scope projects required to reach this goal. The EDM and its instantiation in the data warehouse thus remain aspirational, at best, and probably in need of serious rethinking.

4. A layered data warehouse is necessary for speedy and reliable query performance

As discussed in Section 5.2, data marts and, subsequently, other layering of the data warehouse were introduced in the 1990s to address RDBMS performance issues and the long project cycles associated with data warehouse projects. The value of this postulate can be clearly seen in the longevity of the architectural approach. However, this layering presents its own problems. It delays the passage of data from operations to decision makers; real-time reaction is impossible. Maintenance can become costly as impact analysis for any change introduced can be complex and far reaching across complex ETL trails and multiple copies of data.
  • 39. Business unIntelligence | 148 The emergence of analytic databases in the early to mid-2000s, with their combination of software and hardware advances, demonstrated that query speeds over large data volumes could be improved by orders of magnitude over what had been previously possible. Even more clearly than the split between operational and informational in postulate 1, the layering in the warehouse itself is becoming, in many cases, increasingly unnecessary.

5. Information can be treated simply as a super-class of data

Since the beginning of computing in the 1950s, the theoretical and practical focus has largely been on data—optimized for computer usage—and information has been viewed through a data-centric lens. Chapter 3 discussed at length why this is completely back-to-front, placing an IT-centric construct at the heart of communication that is essentially human—however imprecise, emotionally laden, and intimately and ultimately dependent on the people involved, their inter-relationships, and the context in which they operate. As the biz-tech ecosystem comes to fruition, we must start from meaning—in a personal and business context—and work back through knowledge and information all the way to data. In this way, we can begin to reorient decision making to its business purpose and the people who must make the decision and take action.

6. Data quality and consistency can only be assured by IT through a largely centralized environment

This postulate had its origins in the complexity and experimental nature of early computers, but it continues to hold sway even though modern PCs and an ever-increasing number of mobile devices are now used extensively by business users, and hold far more data in total than centralized systems. While centralized control and management are the ideal approach to assure data quality and consistency, the real world of today's computing environment makes that impossible. Management of data quality and consistency must now be automated and highly distributed. In addition, they must be applied to data based on careful judgment, rather than seen as a mandatory requirement for all data.
  • 40. 149 | CHAPTER 5: Data-based decision making 7. Business users' innovation in data / information usage is seen by IT as marginal and threatening to data quality

This belief can be traced to the roughly simultaneous appearance of viable PCs and RDBMSs in the 1980s. As we observed in Chapter 4, IT was used to managing the entire data resource of the business, and as data became more complex and central to the business, the emerging relational and largely centralized databases were seized with both hands. PCs and spreadsheets were first ignored and then reviled by the IT arbiters of data quality. This postulate has continued to hold sway even until today, despite the growing quantity and role of distributed, user-controlled data.

Properly targeted and funded data governance initiatives are required to change this situation. Such initiatives are now widely recognized as a business responsibility (Hopwood, 2008), but in many companies, the drive and interest still comes from IT. In the biz-tech ecosystem, business management must step up to their responsibility for data quality and work closely with IT to address the technical issues arising from a highly distributed environment.

All these commonly held assumptions have contributed to the relative stasis we've seen in the data warehousing world over the past two decades. The time has come to let them go.

5.6 IDEAL architecture (3): Information, Timeliness/Consistency dimension

As we've seen throughout this chapter, the original business drive for consistency in reporting has been largely supplanted by a demand for timeliness. However, from a conceptual point of view, in a highly distributed computing environment where information is created in diverse, unrelated systems, these two characteristics are actually interdependent. Increase one and you decrease the other. In our new conceptual architecture, we thus need a dimension of the information layer that describes this. In fact, data warehouse developers have been implicitly aware of this dimension since the inception of data warehousing. However, it has been concealed by two factors: (1) an initial focus only on consistency and (2) the conflation of a physical architecture consisting
  • 41. Business unIntelligence | 150 of discrete computing systems with a conceptual/logical architecture that separated different business and processing needs.

As shown in Figure 5-10, the timeliness/consistency (TC) dimension of information at the conceptual architecture level consists of five classes that range from highly timely but necessarily inconsistent information on the left, to highly consistent but necessarily untimely on the right. From left to right, timeliness moves from information that is essentially ephemeral to eternal.

In-flight information consists of messages on the wire or on an enterprise service bus; it is valid only at the instant it passes by. This data-in-motion might be processed, used, and discarded. It is guaranteed only to be consistent within the message or, perhaps, the stream of which it is part. In-flight information may be recorded somewhere, depending on process needs, at which stage it becomes live.

Live information has a limited period of validity and is subject to continuous change. It also is not necessarily completely consistent with other live information. In terms of typical usage, these two classes correspond to today's operational systems.

Stable information, the mid-point on the continuum, represents a first step towards guaranteed consistency by ensuring that stored data is protected from constant change and, in some cases, enhanced by contextual information or structuring. In existing systems, the stable class corresponds to any data store where data is

Figure 5-10: Timeliness/consistency dimension of information
  • 42. 151 | CHAPTER 5: Data-based decision making not over-written whenever it changes, including data marts, particularly the independent version, and content stores. This class thus marks the transition point from operational to informational.

Full enterprise-wide, cross-system consistency is the characteristic of reconciled information, which is stored in usage-neutral form and stable in the medium to long term. Its timeliness, however, is likely to have been sacrificed to an extent depending on its sources; older, internal, batch-oriented sources and external sources often delay reconciliation considerably. The enterprise data warehouse is the prime example of this class. MDM stores and ODSs can, depending on circumstances, contain reconciled information, but often bridge this and the live or stable classes.

Historical information is the final category, where the period of validity and consistency is, in principle, forever. But, like real-world history, it also contains much personal opinion and may be rewritten by the victors in any power struggle! Historical information may be archived in practice, or may exist in a variety of long-term data warehouse or data mart stores. It is, of course, subject to data retention policies and practices, which are becoming ever more important in the context of big data.

Many of the examples used above come from the world of hard information and data warehousing, in particular. This is a consequence of the transactional nature of the process-mediated data that has filled the majority of information needs of business until now. However, the classes themselves apply equally to all types of information to a greater or lesser extent. For softer information, they are seldom recognized explicitly today but will become of increasing importance as social media and other human-sourced information is applied in business decisions of increasing value.

The timeliness/consistency dimension broadly mirrors the lifecycle of information from creation through use, to archival and/or disposal. This spectrum also relates to the concept of hot, warm, and cold data, although these terms are used in a more technical context. As with all our dimensions, these classes are loosely defined with soft boundaries; in reality, these classes gradually merge from one into the next. It is therefore vital to apply critical judgment when deciding which technology is appropriate for any particular
  • 43. Business unIntelligence | 152 base information set. The knowledge and skills of any existing BI team will be invaluable in this exercise, but will need to be complemented by expertise from the content management team.

5.7 Beyond the data warehouse

Given the extent and rate of changes in business and technology described thus far, it is somewhat unexpected that the term data warehouse and the architectural structures and concepts described in Sections 5.2 and 5.3 still carry considerable weight after more than a quarter of a century. However, this resistance to change cannot endure much longer. Indeed, one goal of this book is to outline what a new, pervasive information architecture looks like, within the scope of data-based decision making and the traditional data sources of BI for the past three decades.

Reports of my death have been greatly exaggerated⁸

Of course, the data warehouse has been declared terminally ill before now. BI and data warehouse projects have long had a poor reputation for delivering on time or within budget. While these difficulties have clear and well-understood reasons—emanating from project scope and complexity, external dependencies, organizational issues, and more—vendors have regularly proposed quick-fix solutions to businesses seeking quick and reliable solutions to BI needs. The answers, as we've seen, range from data marts and analytic appliances to spreadsheets and big data. As each of these approaches has gained traction in the market, the death of the data warehouse has been repeatedly—and incorrectly—pronounced.

The underlying reason for such faulty predictions is a misunderstanding of the consistency vs. timeliness conundrum described in Section 5.4 above. The data warehouse is primarily designed for consistency; the other solutions are more concerned with timeliness, in development and/or operation. And data consistency remains a valid business requirement, alongside timeliness, which has growing importance in a fully interconnected world. Nonetheless, as the biz-tech ecosystem evolves to become essentially real-time,

8 Mark Twain's actual written reaction in 1897 was: "The report of my death was an exaggeration."
  • 44. 153 | CHAPTER 5: Data-based decision making the data warehouse cannot retain its old role of all things to all informational needs, going forward. As a consequence, while it will not die, the data warehouse concept faces a shrinking role in decision support as the business demands increasing quantities of information of a structure or speed that are incompatible with the original architecture or relational technology.

5.8 REAL architecture (1): Core business information

In essence, the data warehouse must return to its roots, as represented by Figure 5-3 on page 121. This requires separate consideration of the two main architectural components of today's data warehouses—the enterprise data warehouse and the data mart environment. In the case of the EDW, this means an increasing focus on its original core value propositions of consistency and historical depth, where they have business value, including:

1. The data to be loaded is process-mediated data, sourced from the operational systems of the organization
2. This loaded data provides a fully agreed, cross-functional view of the one consistent, historical record of the business at a detailed, atomic level as created through operational transactions
3. Data is cleansed and reconciled based on an EDM, and stored in a largely normalized, temporally based representation of that model; star-schemas, summarizations and similar derived data and structures are defined to be data mart characteristics
4. The optimum structure is a "single relational database" using the power of modern hardware and software to avoid the copying, layering and partitioning of data common in the vast majority of today's data warehouses
5. The EDM and other metadata describing the data content is considered as an integral, logical component of the data warehouse, although its physical storage mechanism may need to be non-relational for performance reasons

The first business role of any "modern data warehouse" is thus to present a historically consistent, legally binding view of the busi-