The document discusses the concept of the "Wisdom of the Crowd" and how it applies to analytics contexts where multiple data sources provide different perspectives that together yield more accurate insights than any single source. It then provides an overview of what a business intelligence (BI) stack is, describing it as a logical process for using BI tools and methods to process data through various layers from raw source systems through staging, loading, data warehousing, analytics modeling, and presentation layers. It uses a visual model of a BI stack to illustrate how data moves vertically through these layers, becoming cleaner and simpler at each stage while sacrificing some detail and source system alignment.
1. Copyright 2014 Proprietary and Confidential
NAVIGATING THE BI STACK
DEVELOPING OPERATIONAL AND TRANSFORMATIONAL
INSIGHT FROM THE WISDOM OF THE CROWD
2. CROWD WISDOM
• The “Wisdom of the Crowd” is the collective opinion of a group of individuals rather than that of
a single expert. Because individual bias and idiosyncratic noise tend to cancel out across the group,
this collective wisdom can yield more accurate information and better decisions than any one individual.
• When applied to an analytics context, we use each original data source as one or more
“opinions” about an aspect of the business in addition to the stakeholders.
• Multiple datasets, brought together in one location with the express purpose of “harmonizing
data”, add to the collective wisdom of our crowd.
• The expectation is that while not all individuals will always agree, a ‘majority rule’ approach can
be effective in decision making.
• Dissenting “opinions” have an important role to play as well, establishing checks and balances
against propagated errors and organizational “groupthink”.
• Some examples of crowd wisdom are criminal juries and Google’s search engine.
This article uses material from the Wikipedia article Wisdom of the Crowd, which is released under the Creative Commons
Attribution-ShareAlike License 3.0.
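The “majority rule” idea, and the role of dissenting “opinions”, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source names and values are hypothetical, and a real harmonization process would likely apply business rules rather than a simple vote.

```python
from collections import Counter

def harmonize(opinions):
    """Return the majority value and the sources that dissent from it.

    `opinions` maps a (hypothetical) source name to the value it reports.
    """
    counts = Counter(opinions.values())
    majority_value, _ = counts.most_common(1)[0]
    dissenters = sorted(src for src, val in opinions.items() if val != majority_value)
    return majority_value, dissenters

# Hypothetical example: three sources each report a member's plan type.
opinions = {"claims": "PPO", "auths": "PPO", "finance": "HMO"}
value, dissenters = harmonize(opinions)
print(value, dissenters)
```

The dissenting source is returned rather than discarded, so it can be reviewed as a possible sign of a propagated error.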
3. WHAT IS BI?
Business Intelligence is a set of tools used to implement an approach to
understanding one’s business, industry, and the best actions to take to maximize
value organizationally.
AN APPROACH
A method of analysis that compounds very
large, complex datasets into relationship driven
metrics that allow an organization to consume
and take action across multiple systems.
Examples include data warehousing, semantic /
statistical / predictive modeling, OLAP cubes,
and big data universes.
A TOOLSET
A collection of software driven tools used to
acquire, interpret, and communicate key
business data
Examples include SQL, Business Objects,
MicroStrategy, Crystal Reports, SSIS, SAS,
Microsoft Excel, and many others.
4. WHAT IS A BI STACK THEN?
• The BI Stack is the logical process of using the tools and methods of BI to process data.
• When data moves through the BI Stack, it is said to be “migrating” or “promoting”, meaning
that it has passed through the tools of the previous layers successfully and cleared the
error checks applied there.
• There are many routes through the BI stack, but in general data only moves in one
direction. Think of it like a big filter organizing data as it moves through the stack.
• The stack has many disciplines and
players involved, and each part
requires a specific skillset.
• The end result of the process is simple:
actionable data that provides greater
insight to an organization.
5. WHAT IS AN EDW?
• An Enterprise Data Warehouse (EDW) is one part of the BI Stack. While it is a large part
of the BI Stack (both physically and logically) it isn’t the end point for data.
• The EDW is very similar to a warehouse in the sense that it is a huge space to put objects
for later use.
• Specifically it is the end point for the data in terms of data transformations. When data
reaches the EDW for all intents and purposes it is as “clean” as it can be but may NOT be
totally aligned with all business rules, standards, etc.
• A great deal of work still happens after the data warehouse. From a logical standpoint,
the EDW occupies a place about 2/3 of the way through the BI stack.
6. WHY IS AN EDW INSUFFICIENT FOR ACTIONABLE ANALYSIS?
• An EDW is often perceived as the end point of a BI method, but it’s actually much closer to the
beginning from an analytic standpoint.
• Generally, analysts start to develop their work and interpretations once data arrives at the EDW.
Most operational reporting lives at this level, but analysis continues beyond it. This is
called Operational Insight; while valuable, it focuses on the questions “What am I
doing?” and “What have I done?”
• Most aggregate metrics, analytic models, and semantic relationships cannot occur at this level
without significant additional challenges.
7. IF NOT THE EDW, THEN WHAT?
• An EDW is the foundation of a highly performing analytic toolset. We should think of
data less as something to “warehouse” and more as something to keep moving, or “pumping”.
• This is the difference between a static BI Stack and an active BI Stack. One sees EDW
as an endpoint, whereas the other sees the EDW as a data pump, constantly refining data
to get greater value from it.
• All relational analytic tools are based, in part, on a robust EDW. Most often, they will have
other aspects built above it though. These include semantic layers, OLAP cubes, data
universes/marts/stores, sandboxes, and other types of tools.
• These tools are often what result in Transformational Insight, or answers to questions
such as “What should I do next” or “What will my clients/suppliers/industry do in the
future”. Other types of transformational questions are “What should I have done” and
“Was my prediction accurate?”
8. DEFINING BI SUCCESS - SYSTEM
• Accessible
• The EDW should be accessible by all team members that interact with data in a way that is
consumable, understandable, and simple to obtain.
• Adaptable
• As the business evolves, the way we store, perceive, and consume data must evolve as well. Building
flexibility into the DNA of the data warehouse is fundamental to the overall success of the product.
• Quality
• No dataset is perfect, and it is always necessary to validate, scrub, and qualify datasets regardless of
pedigree. A careful balance must be struck to eliminate as many flaws as reasonably possible while
maintaining the integrity and culture of the original data. This includes standardization as well as
alignment where appropriate.
• Secure
• All care should be taken to protect our data as carefully as we would the patients we serve. Data
should only be released to those allowed to see it, and only so much data as is necessary to complete
the task at hand. All data generated should have a designated life span and disposal process.
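The “Secure” criteria (release only what a role is allowed to see, and give data a designated life span) can be sketched as below. The role names, column names, and life spans are hypothetical assumptions for illustration, not part of the deck.

```python
from datetime import date, timedelta

# Hypothetical mapping of roles to the columns they may see.
ALLOWED_COLUMNS = {
    "report_operator": {"claim_id", "amount"},
    "analyst": {"claim_id", "amount", "patient_id"},
}

def release(row, role):
    """Return only the columns the role is allowed to see (none if the role is unknown)."""
    allowed = ALLOWED_COLUMNS.get(role, set())
    return {key: value for key, value in row.items() if key in allowed}

def is_expired(created, lifespan_days, today):
    """True once a dataset has outlived its designated life span."""
    return today > created + timedelta(days=lifespan_days)

row = {"claim_id": "C1", "amount": 10.0, "patient_id": "P9"}
operator_view = release(row, "report_operator")
```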
9. DEFINING BI SUCCESS - APPLICATION
• Self Service Enablement
• Data is neither the exclusive domain of IT nor analysts. Data is necessary to support all aspects of the company and therefore all
team members should be able to access data necessary to their daily work independently.
• Platform Agnostic
• The purpose of the data warehouse structure should be to align the data we produce and consume with the business needs rather
than the originating data source. Thus, the data should be divorced from the paradigm of the originating application insofar as is
necessary to align with the organizational structure (and therefore business needs) of the company.
• Fully Integrated
• Regardless of data source or purpose, our business revolves around four core datasets: patient, provider, payor, and benchmarks.
These datasets are inextricably linked, and so the data in the data warehouse should be linked as well. Data should be related in an
efficient and accurate way that also allows for unique analytic approaches as necessary.
• Advanced Analytic support
• The end goal of a data warehouse is not to recreate original data sources or to automate simple tasks. It is to enable the
advanced toolsets that are only possible with a much larger perspective of our business than any single operational tool can
provide. Enabling sandboxing, modeling, prediction, and other advanced capabilities that can then be integrated into all aspects of
decision-making is the hallmark of an evolved data warehouse.
10. BUSINESS INTELLIGENCE IS A CYCLE, NOT A GOAL!
• The BI environment is best viewed as an organic cycle in which capabilities grow and evolve with
the business
• The tools must adapt to the business rather than vice-versa
11. LET’S BUILD A BI STACK! SOURCE DATA
[Diagram: source data boxes: Claims, External, Auths, Fin / GL, Benchmark, Others]
• At the bottom of the stack is the SOURCE DATA. This is also called the
Original Data Source (External) or sometimes just ODS.
• This ODS is generally NOT used for reporting or analysis except when
done from within the host application.
• This also becomes the validation “source of truth” until final approval is
given later to move into a production space.
• There is one box here for every dataset sent into the EDW: every
application, external data source, and so on.
• One important point to understand here is that every time data “moves”,
we use a program called an “ETL Process”. ETL means Extract,
Transform, and Load.
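The three ETL steps named above can be sketched as three small functions. This is a minimal illustration using Python's standard library; the table name, column names, and sample data are hypothetical, and production ETL tools handle far more (scheduling, logging, error handling).

```python
import csv
import io
import sqlite3

def extract(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim whitespace and convert the amount to a number."""
    return [(row["claim_id"].strip(), float(row["amount"])) for row in rows]

def load(conn, records):
    """Load: write the transformed records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS claims (claim_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO claims VALUES (?, ?)", records)
    conn.commit()

# Hypothetical source extract with one untidy value.
source_text = "claim_id,amount\n C1 ,100.50\nC2,25\n"
conn = sqlite3.connect(":memory:")
load(conn, transform(extract(source_text)))
loaded = conn.execute("SELECT claim_id, amount FROM claims ORDER BY claim_id").fetchall()
```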
12. LET’S BUILD A BI STACK! DATA STAGING
[Diagram: the source data boxes feed a single Stage layer]
• In the STAGE layer, all of the data is unified into a single DBMS platform. A
DBMS platform is essentially a big database application such as Microsoft SQL Server,
Oracle, Teradata, and many others.
• During this step, many different systems feed into the database. Everything
from a simple text file to a major mainframe system is organized into a
single common data format.
• No data correction takes place at this point. This data should be essentially
a “mirror image” of the data in its original source.
• While ETL is happening here, we’re much more focused on the Extract and
Load aspects at this point.
13. LET’S BUILD A BI STACK! LOAD LAYER
[Diagram: the Stage layer feeds three Load environments: DEV, TEST / QA, PROD]
• In the LOAD LAYER, data begins its first set of transformations.
• Data begins to go through qualitative review, and the ETL process looks
for data failures, changes, and other such issues.
• There are 3 data “environments” here.
• DEV (Development)- a place to invent new ways to work with data
• TEST / QA- a place to validate the proper function of DEV data
• PROD (Production)- where the live data is loaded for promotion into
EDW.
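The qualitative review in the LOAD layer can be pictured as a function that separates rows that pass basic checks from rows that fail, keeping the reasons for failure. The field names and checks below are hypothetical assumptions; actual rules would come from the business.

```python
def review(rows, required=("claim_id", "amount")):
    """Split rows into those that pass basic checks and those that fail."""
    passed, failed = [], []
    for row in rows:
        # Flag any required field that is absent or empty.
        problems = [f"missing {name}" for name in required if not row.get(name)]
        # Flag an amount that is present but cannot be read as a number.
        if row.get("amount"):
            try:
                float(row["amount"])
            except ValueError:
                problems.append("amount is not numeric")
        if problems:
            failed.append((row, problems))
        else:
            passed.append(row)
    return passed, failed

rows = [
    {"claim_id": "C1", "amount": "10"},
    {"claim_id": "", "amount": "5"},
    {"claim_id": "C3", "amount": "abc"},
]
passed, failed = review(rows)
```

Only the rows in `passed` would be candidates for promotion; the rows in `failed` carry their reasons so they can be investigated.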
14. LET’S BUILD A BI STACK! DATA / ODS LAYER
[Diagram: each Load environment (DEV, TEST / QA, PROD) feeds a matching Data / ODS environment]
• In the DATA or ODS layer, we see the results of the initial transformations to the
data from LOAD.
• ODS in this case means Operational Data Store. This is the data that looks
almost exactly the same as the original data, but has passed quality tests and
is now stored consistent with most core data rules.
• While this is technically the beginning of the analytic area, data here is not yet
related to other data. So, you won’t find an easy way to align a claim with an
authorization for example.
• At this layer and lower, IT takes an ownership role while Analytics is
responsible for clarifying business needs.
15. LET’S BUILD A BI STACK! EDW LAYER
[Diagram: each Data / ODS environment (DEV, TEST / QA, PROD) feeds a matching EDW environment]
• The EDW layer is where the final core data transformations are made, as well as
all “standardized” relationships between datasets.
• At this point, the data is scrubbed for accuracy, completeness, and alignment
with organizational standards but it still may be very “raw”.
• Few metrics or aggregations, if any, will appear here. This is still a purely data-
driven dataset.
• Data here should be generally easy to validate against the ODS data.
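As a rough sketch of these points, the example below relates claims to authorizations in an EDW-style table and then reconciles row counts and totals against the ODS table. It uses SQLite through Python purely for illustration; all table and column names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ods_claims (claim_id TEXT, auth_id TEXT, amount REAL);
CREATE TABLE ods_auths  (auth_id TEXT, service TEXT);
INSERT INTO ods_claims VALUES ('C1', 'A1', 100.0), ('C2', 'A2', 40.0), ('C3', NULL, 15.0);
INSERT INTO ods_auths  VALUES ('A1', 'MRI'), ('A2', 'PT');

-- EDW table: each claim related to its authorization (LEFT JOIN keeps every claim).
CREATE TABLE edw_claim_auth AS
SELECT c.claim_id, c.amount, a.auth_id, a.service
FROM ods_claims AS c
LEFT JOIN ods_auths AS a ON a.auth_id = c.auth_id;
""")

def totals(table):
    """Row count and total amount, used to reconcile two layers."""
    return conn.execute(f"SELECT COUNT(*), SUM(amount) FROM {table}").fetchone()

ods_totals = totals("ods_claims")
edw_totals = totals("edw_claim_auth")
```

If the two totals differ, the relationship step has dropped or duplicated claims and the promotion should be investigated.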
16. LET’S BUILD A BI STACK! SEMANTIC LAYER
[Diagram: a semantic layer (Semantic, OLAP, Sandbox, Prod Models, BI Universe) is added above the EDW]
• This is where the analytic teams generally live. Most metrics, aggregations,
and analytic tools will use this layer for day to day work.
• As necessary, analysts may look deeper into the stack to obtain data not
available at this level.
• At this point IT tends to act as steward, while Analytics takes an ownership
role.
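One common way to expose metrics and aggregations at this level is a database view over EDW data. The sketch below assumes a hypothetical `edw_claims` table and view name, and uses SQLite through Python only for illustration; a real semantic layer would usually live in a dedicated BI tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edw_claims (claim_id TEXT, provider TEXT, amount REAL);
INSERT INTO edw_claims VALUES ('C1', 'P1', 100.0), ('C2', 'P1', 50.0), ('C3', 'P2', 30.0);

-- Semantic view: per-provider metrics defined once and reused by analysts.
CREATE VIEW sem_provider_spend AS
SELECT provider,
       COUNT(*)    AS claim_count,
       SUM(amount) AS total_paid,
       AVG(amount) AS avg_paid
FROM edw_claims
GROUP BY provider;
""")

metrics = conn.execute("SELECT * FROM sem_provider_spend ORDER BY provider").fetchall()
```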
17. LET’S BUILD A BI STACK! MODEL LAYER
[Diagram: the stack so far, with an Analytic Modeling layer added]
• In this modeling layer, the Advanced Analytics teams will build various
data models including predictive, statistical, behavioral, market, etc.
• The outcomes of these models can be used not only as an end-point
analysis, but can also feed out of the top of the BI Stack and
influence the data in the LOAD layer as well!
• These tools are often developed using highly cleansed datasets and
tend to have a much narrower analytic focus, requiring careful
interpretation.
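As a very small example of a predictive model, the sketch below fits a straight line to monthly claim counts by ordinary least squares and projects the next month. The data are invented for illustration, and a real model would need validation and the careful interpretation noted above.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical history: claim counts for months 1 through 4.
months = [1, 2, 3, 4]
claim_counts = [110, 120, 130, 140]
slope, intercept = fit_line(months, claim_counts)
forecast_month_5 = slope * 5 + intercept
```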
18. LET’S BUILD A BI STACK!
[Diagram: the stack so far, with an Outbound Data Pumps / ETL layer added]
• In this layer, data is pushed out of the stack. This data generally includes
• Detailed analytic results routed to other applications internal to the
organization
• Data destined to be transformed or to transform data as it migrates
into the BI Stack STAGE layer
• Externally focused datasets that have been validated for automated
release
• Most often this is an automated solution
• These outputs are generally NOT considered analysis datasets, although
often are used as such
19. LET’S BUILD A BI STACK! PRESENTATION LAYER
[Diagram: the stack so far, with a Presentation layer added]
• This is the final layer of the BI Stack, and it consists primarily of
aggregations, metrics, and visualizations.
• Report developers and operators tend to work mostly in this layer
• This layer will also contain unattended analyses such as automated,
subscribed reports, dashboards, and analytics-on-rails datasets
• Most leadership should be comfortable working in this layer. It often has
drag and drop interfaces, highly cleansed data, and well documented
standards
• It is uncommon for detail level data to be accessible in this layer without
special permissions.
20. LET’S BUILD A BI STACK! THE FINAL MODEL
[Diagram: the complete BI Stack, made up of these boxes]
• Source data: Claims, External, Auths, Fin / GL, Benchmark, Others
• Stage
• Load: DEV, TEST / QA, PROD
• Data / ODS: DEV, TEST / QA, PROD
• EDW: DEV, TEST / QA, PROD
• Semantic layer: Semantic, OLAP, Sandbox, Prod Models, BI Universe
• Presentation
• Analytic Modeling
• Outbound Data Pumps / ETL
21. LET’S BUILD A BI STACK! MOVEMENT
[Diagram: the complete BI Stack, annotated with a vertical “Movement” label]
Generally data only moves vertically through the BI Stack.
However, in more advanced implementations the results of analyses and models are routed back into the STAGE layer to influence the data as it promotes upward.
This is another reason an effective BI Stack is called a “Data Pump”.
Each time data “moves” through the BI stack, a program must be written to make that happen. These programs are called ETL processes.
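The movement rule can be written down as a small check: data moves upward, with one exception for model results fed back into STAGE. The layer names follow the slides, but the ordering as a flat list and the function itself are simplifying assumptions.

```python
# Simplified, assumed ordering of layers from bottom to top.
LAYERS = ["SOURCE", "STAGE", "LOAD", "ODS", "EDW", "SEMANTIC", "MODEL"]

def can_move(src, dst):
    """Data normally moves upward; model results may feed back into STAGE."""
    if src == "MODEL" and dst == "STAGE":
        return True
    return LAYERS.index(dst) > LAYERS.index(src)

allowed_up = can_move("STAGE", "LOAD")
blocked_down = can_move("EDW", "ODS")
feedback = can_move("MODEL", "STAGE")
```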
22. REMOVING THE DEVIL FROM THE DETAILS
• Typically, the higher (vertically) you move
in the stack, the cleaner the data
becomes.
• In addition, data becomes simpler to use
because more business rules are applied
the higher you go.
• The tradeoff for greater simplicity and
cleaner data is a loss of ODS alignment
and detail availability.
• The detail is still available through effective
“Data Lineage” tools, which help analysts
understand how the rules influenced the data.
[Diagram labels: Simple Data, Complex Data, Clean values, ODS Flaws]
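The idea behind data lineage can be illustrated with a value that records every rule applied to it. This is a toy sketch, not a description of any particular lineage tool; the class name and rules are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class TrackedValue:
    """A value that remembers each rule applied to it."""
    value: str
    lineage: list = field(default_factory=list)

    def apply(self, rule_name, func):
        """Apply a rule and record (rule, before, after)."""
        new_value = func(self.value)
        self.lineage.append((rule_name, self.value, new_value))
        self.value = new_value
        return self

# Hypothetical state code as it arrived from a source system.
state = TrackedValue(" tx ")
state.apply("trim whitespace", str.strip).apply("uppercase state code", str.upper)
```

Reading `state.lineage` from first entry to last shows how the clean value was derived from the original.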
23. MAJOR DOMAINS
[Diagram: the complete BI Stack overlaid with the major domains: Business Users, Adv. Analytics Team, IT Oversight, IT EDW Team, Business Analytics, IT Application Team]
Situationally these roles may change, but in general the areas of concern are fairly
well aligned with the level within the BI Stack.
24. SO IT’S JUST A BUNCH OF DATABASES?
• The magic of the BI Stack isn’t in the value of the data held therein, but in the
relationships between differing datasets.
• Ideally, because these datasets all essentially describe the same thing – our business,
patients, and industry – they should all be able to weave into each other.
• The wisdom of crowds is revealed in applying business questions to the many layers of
related data. Additionally, the dissenting opinions within the data also provide insight into
flaws, misunderstandings, and opportunities.
• It’s in the relationships between data where the true art of analysis becomes visible, and
this has a value that far exceeds the intrinsic measure.
• A skilled EDW development team can make a series of databases drive transformational
change not otherwise possible.
25. BI MATURITY – ROI PERSPECTIVE
http://www.eurim.org.uk/activities/ig/voi/03-01-06_Executive_Series_Assessing_Your_BI_Maturity.pdf
26. BI MATURITY – UTILIZATION PERSPECTIVE
http://www.eurim.org.uk/activities/ig/voi/03-01-06_Executive_Series_Assessing_Your_BI_Maturity.pdf
27. REPORTING VALUE MATRIX
[Diagram: INSIGHT, shown with ACTION, STRATEGY, and UNDERSTANDING]
To determine the VALUE of an
analysis, we must establish the
insight it provides. This is measured
using 3 metrics driven by simple
questions:
ACTION
What change will you make using this
information?
STRATEGY
How will you change your approach to the
business with this report?
UNDERSTANDING
What deeper knowledge will you gain from
this data?
28. PRIORITIZATION HIERARCHY
Prioritization generally falls along two scales:
• One scale: Regulatory, Contractual, Internal, Academic, Goodwill
• The other scale: Strategic, Compliance, Operational, Financial, Reputational
[Diagram: priority runs from High to Low; higher priority lies closer to the center and at intersections]
29. TEAM MEMBERS – BUSINESS CHAMPION
• Responsible for:
• Overall strategy & development of the BI Stack
• Facilitating organizational change related to BI
• Encouraging adoption of BI tools & methods
• Ideal Candidate:
• An experienced leader with deep insight into BI
• Background in analysis
• High-touch effective communicator
• Willing to break new ground
• Success Criteria:
• BI Stack effectiveness
• Innovative approaches leading to efficiency / accuracy /
insight breakthroughs
• User adoption and reliance on BI product
30. TEAM MEMBERS – IT CHAMPION
• Responsible for:
• Ensuring implementation of the technical aspects of the BI
Stack to support business needs
• Owner of hardware / software
• Manages day to day operations
• Ideal Candidate:
• An experienced leader with deep IT skillsets
• Strong experience in data warehousing, ETL, and
server resource management
• Success Criteria:
• “N 9’s” uptime
• Accurate uninterrupted data flows
• Able to accomplish business needs / requirements
within specified SLAs
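The “N 9’s” criterion translates directly into an allowed amount of downtime. The arithmetic below assumes a 365-day year and treats N nines as an availability of 1 − 10^(−N), for example 99.9% for three nines.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year

def availability_percent(nines):
    """Availability implied by N nines, e.g. 3 -> 99.9 (percent)."""
    return 100 * (1 - 10 ** -nines)

def allowed_downtime_minutes(nines, period_minutes=MINUTES_PER_YEAR):
    """Maximum downtime in the period that still meets N nines."""
    return 10 ** -nines * period_minutes

three_nines_downtime = allowed_downtime_minutes(3)
five_nines_downtime = allowed_downtime_minutes(5)
```

Under these assumptions, three nines allows roughly 525.6 minutes (under nine hours) of downtime per year, while five nines allows only about 5.3 minutes.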
31. TEAM MEMBERS – PROJECT MANAGER
• Responsible for:
• Owner of the project plan
• Facilitates forward project movement
• “keeping everyone honest”
• Ideal Candidate:
• Highly organized, skilled project manager
• Effective communicator / negotiator
• Forward thinking
• Success Criteria:
• Documented project plan with forecasts, metrics,
and progress analysis
• Identify and address all SWOT aspects
• Demonstrable progress made towards strategic
goals
32. TEAM MEMBERS – BUSINESS ANALYST
• Responsible for:
• Interviewing stakeholders to understand business
needs / rules
• Effective documentation of the BI projects
• Translating needs & rules into actionable development
goals / methods
• Ideal Candidate:
• Some experience in programming / development
• Strong documentation skills
• Industry experience
• Success Criteria:
• Effective documentation
• Captured all relevant business needs & rules
• Effective communication
33. TEAM MEMBERS – TRAINER
• Responsible for:
• Teaching the use of the BI tools
• Providing feedback to developers on refinements to tools to further enable adoption
• Ideal Candidate:
• Able to communicate complex concepts effectively
• Patient and skilled communicator
• Skilled with BI tools
• Success Criteria:
• Users report successful use of BI tools following training
• Effective feedback on tools provided to developers to further enhance tools
34. TEAM MEMBERS – DBA
• Responsible for:
• Owner of the core databases related to the BI Stack and manages server hardware
• Oversees ETL efforts
• Teaches advanced coding techniques to support effective use of tools
• Ideal Candidate:
• Highly skilled in DBMS platform
• Strong awareness of organizational needs and vision necessary to meet those needs
• Creative approaches to complex issues
• Success Criteria:
• Successful data migrations / promotions
• No integrity lost in data outside of expectations
• Highly available BI Stack
35. TEAM MEMBERS – ARCHITECT / MODELER
• Responsible for:
• Development of the logical model of the BI stack
• Management of data governance efforts
• Owner of all BI Documentation
• Manages all semantic layers / components
• Ideal Candidate:
• Skilled in both conceptual design and practical implementation of data
• Clear understanding of business needs and desired outcomes
• Strong programming skillset in DBMS / ETL tools
• Success Criteria:
• Completion / maintenance of current data model documents
• Effective data governance (including I/O, security, recovery/destruction, and standards)
• Successful implementation & maintenance of data models that meet business needs
36. TEAM MEMBERS – DATA ANALYST
• Responsible for:
• Development of analytic tools based on the BI logical model
• Providing insight to the organization with actionable data
• Identify trends / behaviors / opportunities
• Ideal Candidate:
• Deep DBMS understanding
• Organizational / business knowledge
• Highly analytical and effective communicator.
• Success Criteria:
• Development & maintenance of BI tools
• Accurate and effective representation of responses to business
needs
• Clear communication and actionable guidance
37. TEAM MEMBERS – QUALITY ASSURANCE / GOVERNANCE
• Responsible for:
• Ensuring the overall quality of data as it promotes through the BI Stack
• Executing against the data governance plan
• Act as SME on behalf of the internal / external customer (Ombudsman)
• Ideal Candidate:
• Highly detail oriented
• Skilled in both the data and the context of the business, rules, and needs
• Able to clarify rules / needs and infer the purpose of same
• Success Criteria:
• Elimination and prevention of data quality / integrity
issues
• Data is acquired, used, and destroyed according to plan
• Insight is provided in accordance with business needs
and within appropriate context
38. TEAM MEMBERS – REPORT DEVELOPER
• Responsible for:
• Initial development of reporting solutions using data and methodologies created
by the analyst
• Create effective visualizations of data to ‘paint a picture’
• Determine where on the “Reporting Value Matrix” a report exists
• Ideal Candidate:
• Skilled with BI tools selected by the organization
• Ability to translate analytic products into actionable metrics
• Capable of explaining metrics in a consumable way
• Success Criteria:
• Report volume developed / maintained
• Elimination of duplication
• Standardization of metrics or communication of variations
• Appropriate reporting methodologies
39. TEAM MEMBERS – REPORT OPERATOR
• Responsible for:
• Automation or manual runs of established reporting packages
• Endpoint QA of finalized report
• Distribution of finalized reports
• Ideal Candidate:
• Detail oriented
• Interested in data analytics
• Intermediate skills with analytic tools
• Success Criteria:
• Reports delivered on time to “N 9’s” rate
• No obvious errors delivered to client
• Effective management of reporting stakeholders & recipients
40. TEAM MEMBERS – USER (YOU!)
• Responsible for:
• Utilization of the products of the BI Stack
• Providing feedback on those products
• Communicating any changes to business rules, needs,
assumptions, or strategy
• Ideal Candidate:
• Aware of the business needs, etc.
• Has some influence on the outcome of reports with respect
to the Report Value Matrix
• Able to consume analytic or reporting outputs
• Success Criteria:
• Action taken using reporting products
• Can communicate value of analytic and reporting products
41. THINGS TO REMEMBER
• “Crowd wisdom” values all voices- people and data, harmonizers and dissenters.
• The purpose of BI is Operational and Transformational Insight, NOT reporting.
• Business Intelligence is BOTH a set of tools and the process in which they are used.
• The “BI Stack” includes an EDW, but encompasses much more than the EDW.
• The BI environment is best viewed as a cycle rather than a goal.
• There are many “layers” to the BI Stack, each with a valuable part to play.
• Ultimately the art and value of the BI Stack come from the creative and innovative relationships
within the data.
• Reporting value is determined using the Report Value Matrix measures.
• Many players participate in the BI process, each bringing value to the tools.