The BI Sandbox

TheThe BI SandboxBI Sandbox
Madison, Wisconsin AreaMadison, Wisconsin Area
Business Intelligence & Data WarehousingBusiness Intelligence & Data Warehousing
Discussion GroupDiscussion Group

Production ETL
Analytic Data LayerData Acquisition
Layer
Operational Data Layer
BI architecture at a glance …
Legacy
Source
Systems
Legacy
Source
Systems
New
Source
Systems
New
Source
Systems
TriageTriage
ConformedConformed
StorageStorage
AreaArea
batch
transaction OperationalOperational
Data StoresData Stores
OperationalOperational
XML
Message
XML
Message
DataData
MartsMarts
AnalysisAnalysis
SandboxesSandboxes
Other Sources:
Operational systems
 User supplied data
Manual Loads

BI architecture at a glance …
Operational Data Layer Analytic Data Layer
ConformedConformed
StorageStorage
AreaArea
DataData
MartsMarts
Consolidated
data feeds
(legacy & new)
to downstream
systems
Consolidated
data feeds
(legacy & new)
to downstream
systems
Near real-time
data feeds of new
systems’ data
Near real-time
data feeds of new
systems’ data
Standardized
reporting, ad
hoc reporting
and analysis,
data mining,
predictive
models
Standardized
reporting, ad
hoc reporting
and analysis,
data mining,
predictive
models
Standardized
reporting
Standardized
reporting
AnalysisAnalysis
SandboxesSandboxes

What do you think of when you hear
“sandbox”?
Sandboxes are places to play where
The sand and box are provided
You bring your own toys
What you create is temporary


Obviously some of us are more talented
with sandboxes than others…

Which is the best analogy for a BI
environment?
Assembly Line
Assembly Line
A Predictive Model Test Bed
A Predictive Model Test Bed
A Library
A Library
An Artist’s Studio
An Artist’s Studio
An Information Goldmine
An Information Goldmine

sandbox noun /'san(d) , bäks/

The BI Sandbox, defined
Responsibilities • To facilitate short term ad-hoc exploratory analysis.
• To remove roadblocks to client self-service (minimizing the need for I/S
assistance) with short term ad-hoc exploratory analysis.
• To avoid the creation of unmanaged spreadsheet based data on user
desktops or shared network drives.
• To better enable short term ad-hoc exploratory analysis to be converted to
long term operational analysis as needed (through traceability)
Collaborators Semantic Layer, Operational Data Layer (ODL), Analytic Data Layer (ADL)
Rationale Typically reporting and analysis is ongoing, consistent, and can be enabled by
production structures such as ODSs and data marts.
Occasionally, business requirements indicate a need for temporary or ad-hoc
exploratory data analysis that cannot be supported by existing data structures.
These business requirements often results in unmanaged disparate spreadsheet data
on individual user desktops or shared network drives.
Sandboxes are meant to mitigate the risk that these ad hoc data sets are created
through inconsistent techniques and the subsequent risk that analytical results
discovered by using them are hard to trace and convert to a more permanent
process; and doing so typically requires a complex project to convert the untraceable
data set, integration, and analytical rules into repeatable rules.

The BI Sandbox, defined
Issues and
Notes
• Sandbox data sets will be short-lived.
• The sandbox will support Ad hoc analysis.
• Sandbox data sets will be intended for a specific purpose.
• Reporting generated from the sandbox will not be considered “official”.
• Sandbox data sets should be transitional.
• Sandboxes, if they cannot be decommissioned, should be transitioned into
production structures (e.g., ODSs or data marts).
• Sandbox data set structure/format will be dependent on access tools.
• Sandbox data set composition and quality will be dependent on the source.
• Sandbox check-out (data validation) strategy will be the responsibility of the
end user.
• Sandbox data sets should require minimal I/S intervention.
• Sandbox data can come from external or user supplied sources.
• Data acquisition from operational systems is restricted.
• Sandbox data will not be automatically refreshed on a regular basis.
• Naming standards do not apply to sandbox structures.

The BI Sandbox, the real why
• Shed light on data integration work clients do
whether I/S wishes to acknowledge it or not
• Increase partnership between I/S and business
– I/S has an appropriate solution to offer for more real
problems
• Most innovation doesn’t happen in well-defined
structures

The BI Sandbox, the how
Provide a place to play
• Typically SAS storage
Bring your own toys
• Manual loads of data from various sources including
• Data marts
• ODSs
• Operational systems
• User-supplied data sets
Create & Learn
• Use analysis tools (Business Objects, SAS, Excel) to
explore the data and discover
Transfer what you learn elsewhere
• Covert discoveries into operational changes to build
value

The BI Sandbox, the limitations
• Joins between disparate sources on natural keys
alone
– Operational system keys
– Functional keys
• No cleansing, no column renaming, minimal
metadata, no data modeling
• No automated refresh process

The BI Sandbox, the examples
• Prototyping new enterprise measure
• Experimenting with integration of disparate data
sources
• Predictive model creation, testing & validation
(in parallel with production development)

The BI Sandbox

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to The BI Sandbox

Similar to The BI Sandbox (20)

Recently uploaded

Recently uploaded (20)

The BI Sandbox