Madison WI BI Sandbox Group Discusses Data Exploration
1. The BI Sandbox
Madison, Wisconsin Area
Business Intelligence & Data Warehousing
Discussion Group
2. BI architecture at a glance …
[Diagram]
Layers: Data Acquisition Layer, Operational Data Layer, Analytic Data Layer
Components: Legacy Source Systems, New Source Systems, Production ETL, Triage, Conformed Storage Area, Operational Data Stores, Data Marts, Analysis Sandboxes
Feed types: batch, transaction, XML Message
Other Sources:
Operational systems
User supplied data
Manual Loads
3. BI architecture at a glance …
[Diagram]
Layers: Operational Data Layer, Analytic Data Layer
Components: Conformed Storage Area, Operational Data Stores, Data Marts, Analysis Sandboxes
Callouts:
• Consolidated data feeds (legacy & new) to downstream systems
• Near real-time data feeds of new systems’ data
• Standardized reporting, ad hoc reporting and analysis, data mining, predictive models
• Standardized reporting
4. What do you think of when you hear “sandbox”?
Sandboxes are places to play where:
The sand and box are provided
You bring your own toys
What you create is temporary
6. Which is the best analogy for a BI environment?
Assembly Line
A Predictive Model Test Bed
A Library
An Artist’s Studio
An Information Goldmine
8. The BI Sandbox, defined
Responsibilities
• To facilitate short-term ad hoc exploratory analysis.
• To remove roadblocks to client self-service (minimizing the need for I/S
assistance) with short-term ad hoc exploratory analysis.
• To avoid the creation of unmanaged spreadsheet-based data on user
desktops or shared network drives.
• To better enable short-term ad hoc exploratory analysis to be converted to
long-term operational analysis as needed (through traceability).
Collaborators
Semantic Layer, Operational Data Layer (ODL), Analytic Data Layer (ADL)
Rationale
Reporting and analysis are typically ongoing and consistent, and can be enabled by
production structures such as ODSs and data marts.
Occasionally, business requirements indicate a need for temporary or ad hoc
exploratory data analysis that cannot be supported by existing data structures.
These business requirements often result in unmanaged, disparate spreadsheet data
on individual user desktops or shared network drives.
Sandboxes are meant to mitigate two risks: that these ad hoc data sets are created
through inconsistent techniques, and that analytical results discovered by using
them are hard to trace and convert to a more permanent process. Without a sandbox,
that conversion typically requires a complex project to turn the untraceable
data set, integration, and analytical rules into repeatable rules.
9. The BI Sandbox, defined
Issues and
Notes
• Sandbox data sets will be short-lived.
• The sandbox will support ad hoc analysis.
• Sandbox data sets will be intended for a specific purpose.
• Reporting generated from the sandbox will not be considered “official”.
• Sandbox data sets should be transitional.
• Sandboxes, if they cannot be decommissioned, should be transitioned into
production structures (e.g., ODSs or data marts).
• Sandbox data set structure/format will be dependent on access tools.
• Sandbox data set composition and quality will be dependent on the source.
• Sandbox check-out (data validation) strategy will be the responsibility of the
end user.
• Sandbox data sets should require minimal I/S intervention.
• Sandbox data can come from external or user supplied sources.
• Data acquisition from operational systems is restricted.
• Sandbox data will not be automatically refreshed on a regular basis.
• Naming standards do not apply to sandbox structures.
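Several of the notes above (short-lived, purpose-specific, not “official”, decommissioned or transitioned) can be pictured as a small record kept for each sandbox data set. The following is a minimal sketch only: the class name, the field names, and the 90-day lifetime are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SandboxDataSet:
    """Hypothetical bookkeeping record for one sandbox data set."""
    name: str
    purpose: str             # each data set is intended for a specific purpose
    owner: str               # the end user is responsible for data validation
    source: str              # composition and quality depend on the source
    created: date
    lifetime_days: int = 90  # assumed value; the slides only say "short-lived"
    official: bool = False   # sandbox reporting is not considered "official"

    def expires(self) -> date:
        """Date after which the data set is due for review."""
        return self.created + timedelta(days=self.lifetime_days)

    def review_action(self, today: date) -> str:
        """Keep the data set until it expires, then decommission or transition it."""
        if today < self.expires():
            return "keep"
        return "decommission or transition"


ds = SandboxDataSet(
    name="churn_explore",
    purpose="explore a possible churn measure",
    owner="analyst",
    source="user-supplied spreadsheet",
    created=date(2024, 1, 1),
)
print(ds.expires(), ds.review_action(date(2024, 6, 1)))
```

A record like this would let I/S list sandbox data sets that are past their review date without intervening in how they are built.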
10. The BI Sandbox, the real why
• Shed light on data integration work clients do
whether I/S wishes to acknowledge it or not
• Increase partnership between I/S and business
– I/S has an appropriate solution to offer for more real
problems
• Most innovation doesn’t happen in well-defined
structures
11. The BI Sandbox, the how
Provide a place to play
• Typically SAS storage
Bring your own toys
• Manual loads of data from various sources including:
• Data marts
• ODSs
• Operational systems
• User-supplied data sets
Create & Learn
• Use analysis tools (Business Objects, SAS, Excel) to
explore the data and discover
Transfer what you learn elsewhere
• Convert discoveries into operational changes to build
value
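The four steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the group's actual setup: Python's built-in sqlite3 stands in for the SAS storage named on the slide, and the manual_load helper, the sandbox_log table, and the sample tables and values are all hypothetical. The log table is one simple way to keep the traceability that makes later conversion to an operational process easier.

```python
import sqlite3
from datetime import date


def manual_load(conn, table, rows, source, loaded_by):
    """Load rows into a new sandbox table and log where they came from.

    Table and column names are interpolated directly into SQL, so this
    sketch assumes they are trusted values typed by the analyst.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sandbox_log "
        "(table_name TEXT, source TEXT, loaded_by TEXT, loaded_on TEXT, row_count INTEGER)"
    )
    columns = list(rows[0].keys())
    col_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f"CREATE TABLE {table} ({col_list})")
    conn.executemany(
        f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})",
        [tuple(row[c] for c in columns) for row in rows],
    )
    conn.execute(
        "INSERT INTO sandbox_log VALUES (?, ?, ?, ?, ?)",
        (table, source, loaded_by, date.today().isoformat(), len(rows)),
    )


# Provide a place to play: an in-memory database as the sandbox.
conn = sqlite3.connect(":memory:")

# Bring your own toys: one extract from a data mart, one user-supplied data set.
manual_load(
    conn, "mart_sales",
    [{"region": "North", "sales": 120}, {"region": "South", "sales": 80}],
    source="data mart extract", loaded_by="analyst",
)
manual_load(
    conn, "user_targets",
    [{"region": "North", "target": 100}, {"region": "South", "target": 100}],
    source="user-supplied spreadsheet", loaded_by="analyst",
)

# Create & learn: explore the combined data.
gaps = conn.execute(
    "SELECT s.region, s.sales - t.target FROM mart_sales s "
    "JOIN user_targets t ON s.region = t.region ORDER BY s.region"
).fetchall()

# Transfer what you learn: the log shows which sources fed the analysis.
log = conn.execute(
    "SELECT table_name, source, row_count FROM sandbox_log ORDER BY table_name"
).fetchall()
print(gaps)
print(log)
```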
12. The BI Sandbox, the limitations
• Joins between disparate sources on natural keys
alone
– Operational system keys
– Functional keys
• No cleansing, no column renaming, minimal
metadata, no data modeling
• No automated refresh process
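The first two limitations interact: with no cleansing step, a join on natural keys matches only rows whose keys are stored identically in both sources. A minimal sketch follows, using a hypothetical customer_no key and made-up sample rows; the join_on_natural_key helper is illustrative and assumes the key is unique in the right-hand source.

```python
def join_on_natural_key(left, right, key):
    """Inner-join two lists of dicts on a natural key, compared exactly as stored.

    Returns the joined rows plus the left-hand rows that found no match.
    Assumes each key value appears at most once in `right`.
    """
    right_by_key = {row[key]: row for row in right}
    matched, unmatched = [], []
    for row in left:
        other = right_by_key.get(row[key])
        if other is None:
            unmatched.append(row)
        else:
            matched.append({**row, **other})
    return matched, unmatched


# Hypothetical extract from an operational system (not cleansed).
accounts = [
    {"customer_no": "C-1001", "balance": 500},
    {"customer_no": "C-1002", "balance": 750},
    {"customer_no": "c-1003 ", "balance": 620},  # different case, trailing space
]
# Hypothetical user-supplied data set.
regions = [
    {"customer_no": "C-1001", "region": "North"},
    {"customer_no": "C-1003", "region": "South"},
]

matched, unmatched = join_on_natural_key(accounts, regions, "customer_no")
print(matched)
print(unmatched)
```

Here "c-1003 " fails to match "C-1003" even though a person would read them as the same customer, so checking the unmatched rows is part of the end user's validation work.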
13. The BI Sandbox, the examples
• Prototyping a new enterprise measure
• Experimenting with integration of disparate data
sources
• Predictive model creation, testing & validation
(in parallel with production development)