Datavail and SlamData present on how to use NoSQL technologies (MongoDB and SlamData) to build a Data Hub -- the fast and easy way to real-time business insight.
Overview of the SlamData open source project for modern data analytics. SlamData allows users to run ordinary SQL queries on modern NoSQL data like JSON. Currently we support MongoDB, with plans to support other NoSQL datastores including Cassandra and Hadoop. Our project opens up modern NoSQL data to anyone with basic SQL skills.
Webinar | Real-time Analytics for Healthcare: How Amara Turned Big Data into ... (DataStax)
Increasing regulations on patient data, expanding and ever-changing data volumes and formats, and the need for real-time analytics are adding new levels of complexity to database platforms, forcing Healthcare IT management to rethink legacy database environments.
Join Christopher Rosin, Ph.D., Chief Scientist at Amara Health Analytics, as he shares his knowledge about implementing a real-time predictive analytics platform to support clinicians in the early detection of critical disease states. Based on years of research and hands-on experience, Chris provides practical steps for guiding DataStax Enterprise initiatives from evaluation to successful implementation.
Watch to learn:
- The challenges in selecting the right database technology for dynamic, real-time data without the rigidity of relational systems
- How Amara’s SaaS model delivers real-time decision support, leveraging large amounts of unstructured and structured clinical data
- How DataStax Enterprise helps meet strict requirements on patient data privacy, data integrity, and system performance
Rethink Analytics with an Enterprise Data Hub (Cloudera, Inc.)
Have you run into one or more of the following barriers or limitations with your existing data warehousing architecture:
> Increasingly high data storage and/or processing costs?
> Silos of data sources?
> Complexity of management and security?
> Lack of analytics agility?
Enterprise Data Hub: The Next Big Thing in Big Data (Cloudera, Inc.)
If you missed Strata + Hadoop World, you missed quite a bit. This year's event was packed with Big Data practitioners across industries who shared their experiences and how they are driving new innovations like never before. Just because you weren't there, doesn't mean you missed out.
In this session, we'll touch on a few of the key highlights from the show, including:
Key trends in Big Data adoption
The enterprise data hub
How the enterprise data hub is used in practice
How to Optimize Sales Analytics Using 10x the Data at 1/10th the Cost (AtScale)
Being able to analyze sales at the most granular level with up-to-date data provides a competitive advantage for unlocking additional revenue -- especially for e-commerce and retail companies heading into the holiday season.
Webinar - Bringing connected graph data to Cassandra with DSE Graph (DataStax)
For today’s always-connected customer, modern digital cloud applications need to manage highly connected data with seemingly endless data relationships. DataStax Enterprise with DSE Graph is the only distributed data platform able to support the transactional and analytical complex data relationships contained in such systems. Learn how DSE Graph can support your highly connected systems and answer questions such as: How do my customers interact with my business? Where is the bottleneck in my supply chain? What recommendation makes the most sense for my customer in a particular moment?
View recording: https://youtu.be/7R_axClTWnc
Explore all DataStax webinars: http://www.datastax.com/resources/webinars
Designing a Distributed Cloud Database for Dummies (DataStax)
Join Designing a Distributed Cloud Database for Dummies - the webinar. The webinar “stars” industry vet Patrick McFadin, best known among developers for his seven years with Apache Cassandra, where he held pivotal community roles. Register for the webinar today to learn why you need distributed cloud databases, the technology you need to create the best user experience, the benefits of data autonomy, and much more.
View the recording: https://youtu.be/azC7lB0QU7E
To explore all DataStax webinars: https://www.datastax.com/resources/webinars
Webinar: DataStax Managed Cloud: focus on innovation, not administration (DataStax)
Apache Cassandra was built for the cloud, ready both for its seemingly endless elasticity and to avoid disruption from the outages that have become all too common. DataStax Managed Cloud helps you take full advantage of what the cloud has to offer while removing the overhead and complexity of managing operations.
View recording: https://youtu.be/JI7R3CwIw54
Explore all DataStax webinars: http://www.datastax.com/resources/webinars
Better Together: The New Data Management Orchestra (Cloudera, Inc.)
Ingesting, storing, processing, and leveraging big data for maximum business impact requires integrating systems, processing frameworks, and analytic deployment options. Learn how Cloudera’s enterprise data hub framework, MongoDB, and the Teradata Data Warehouse working in concert can enable companies to explore data in new ways and solve problems that not long ago might have seemed impossible.
Gone are the days of NoSQL and SQL competing for center stage. Visionary companies are driving data subsystems to operate in harmony. So what’s changed?
In this webinar, you will hear from executives at Cloudera, Teradata and MongoDB about the following:
How to deploy the right mix of tools and technology to become a data-driven organization
Examples of three major data management systems working together
Real world examples of how business and IT are benefiting from the sum of the parts
Join industry leaders Charles Zedlewski, Chris Twogood and Kelly Stirman for this unique panel discussion, moderated by BI Research analyst, Colin White.
Data volumes have experienced explosive growth in recent years, and that data is being generated from sources that are increasingly complex and varied. Harnessing and refining value from this data requires a new approach, as data extraction, transformation, and loading (ETL) becomes increasingly costly and difficult to scale.
Organizations are looking to leverage Hadoop as an enterprise data hub—also called a “data lake” or “data reservoir”—as a key component of their data architecture to augment their data warehouse, ETL and analytical systems in order to maximize their existing investments, reduce costs, and unlock new business value from their data.
In this webinar, you will learn:
Real-world examples that illustrate why Hadoop is the best low-cost data hub, data lake, or data landing zone (staging area) option for ETL processing
Proof points that demonstrate advantages of Hadoop and its ability to scale to manage increasing data volumes and support exploratory big data analytics
Proven best practices for a cost-effective, reliable way to implement a data management platform for your entire big data analytical ecosystem
Hidden issues to be aware of in deploying your data hub/data lake
A few months back I spoke with some graduate students about "what is data warehousing". In this talk I covered the past, present, and probably future of what data warehousing is and how it can add value to a company.
Webinar: Transforming Customer Experience Through an Always-On Data Platform (DataStax)
According to Forrester Research, leaders in customer experience drive 5.1X revenue growth over laggards. And although 84% of companies aspire to be a leader in this space, only 1 in 5 successfully delivers good or great customer experience. Join us for our next webinar where Mike Gualtieri, VP and Principal Analyst at Forrester Research and Rajay Rai, Head of Digital Engineering at Macquarie Bank will share how Customer Experience can drive business results such as faster revenue growth, longer customer retention, greater employee engagement and improved profit margins.
View webinar recording: https://youtu.be/eEc5tx-nHvI
Explore past DataStax webinars: http://www.datastax.com/resources/webinars
Webinar - The Agility Challenge - Powering Cloud Apps with Multi-Model & Mixe... (DataStax)
Building and managing cloud applications is not easy. Teams come face to face with these challenges: agility, manageability, performance, scalability, continuous availability and of course, security. Join us for “The Agility Challenge: Powering Cloud Applications with Multi-Model & Mixed Workloads” webinar where we will deep dive into challenges customers face with multiple data models such as graph, mixed workloads and how DataStax Enterprise can help.
Video: https://youtu.be/1tKDxkexzFE
Webinar - Delivering Enhanced Message Processing at Scale With an Always-on D... (DataStax)
Managing 3.8 million e-prescriptions daily for more than 1 million healthcare professionals is no small feat. And, with rapid growth in the number of digital transactions and expansion of its network, Surescripts needed to replace its legacy relational database system to address a new set of data management challenges while meeting their customers’ demanding SLAs. Join us for this on-demand webinar to hear from Keith Willard, Chief Architect at Surescripts, to learn how and why Surescripts leverages DataStax Enterprise to deliver enhanced message processing at scale.
View recording: https://youtu.be/1T6V1XAoaJQ
Explore all DataStax webinars: https://www.datastax.com/resources/webinars
Webinar: Proofpoint, a pioneer in security-as-a-service protects people, info... (DataStax)
Proofpoint is a visionary leader in social media, email and mobile security and compliance. Their technology consumes and correlates billions of events every day across all of these communication channels to detect bad actors in real-time and deliver deep threat protection and intelligence. Proofpoint uses DataStax Enterprise (DSE) as a key piece of their platform to deliver industry leading security and compliance solutions. In this webinar, VP of Engineering at Proofpoint, Rich Sutton will share the use cases they’ve deployed on DataStax Enterprise, highlight the problems solved and outcomes achieved and share their journey into the world of NoSQL.
Video recording: https://youtu.be/ro-Kc1VUjrQ
Evolution from Apache Hadoop to the Enterprise Data Hub by Cloudera - ArabNet... (ArabNet ME)
A new foundation for the Modern Information Architecture.
Speaker: Amr Awadallah, CTO & Cofounder, Cloudera
Our legacy information architecture is not able to cope with the realities of today's business: it cannot scale to meet our SLAs (due to the separation of storage and compute), economically store the volumes and types of data we currently confront, provide the agility necessary for innovation, or, most importantly, provide a full 360-degree view of our customers, products, and business. In this talk Dr. Amr Awadallah will present the Enterprise Data Hub (EDH) as the new foundation for the modern information architecture. Built with Apache Hadoop at the core, the EDH is an extremely scalable, flexible, and fault-tolerant data processing system designed to put data at the center of your business.
Optimized Data Management with Cloudera 5.7: Understanding data value with Cl... (Cloudera, Inc.)
Across all industries, organizations are embracing the promise of Apache Hadoop to store and analyze data of all types, at larger volumes than ever before possible. But to tap into the true value of this data, organizations need to manage this data and its subsequent metadata to understand its context, see how it’s changing, and take actions on it.
Cloudera Navigator is the only integrated data management and governance solution for Hadoop and is designed to do exactly this. With Cloudera 5.7, we have further expanded the capabilities in Cloudera Navigator to make it even easier to understand your data and maintain metadata consistency as it moves through Hadoop.
Building a Modern Analytic Database with Cloudera 5.8 (Cloudera, Inc.)
Analytic workloads and the ability to determine “what happened” are some of the most common use cases across enterprises today, helping you understand and adapt to changing trends. However, most businesses today can see only a piece of the story. Analytics are limited by the amount of data that can be stored and ultimately accessed, it is time-intensive to bring in new datasets or fit unstructured data into rigid schemas, and user access is constrained to a select few who must already know the questions they’re trying to answer.
It’s no surprise that big data is disrupting this modus operandi for analytics. A modern, Hadoop-based platform is designed to help businesses break free of these analytic limitations, providing a new kind of adaptive, high-performance analytic database. The recent release of Cloudera 5.8 continues to advance Cloudera Enterprise as the foundation for these analytic workloads.
Join Justin Erickson, Senior Director of Product Management at Cloudera, and Andy Frey, Chief Technology Officer at Marketing Associates, as they discuss:
-What technology is needed to build a modern analytic database with Hadoop
-What’s new with Cloudera 5.8
-How to align your teams around agile analytics
-Real world success from Marketing Associates
-What’s next for Cloudera Enterprise’s Analytic Database
Emergence of MongoDB as an Enterprise Data Hub (MongoDB)
Emergence of MongoDB as an Enterprise Data Hub, presented by Dylan Tong, Sr. Solutions Architect, MongoDB at MongoDB Evenings Seattle at the Seattle Public Library on October 6, 2015.
In this document, we will present a very brief introduction to Big Data (what is Big Data?), Hadoop (how does Hadoop fit into the picture?) and Cloudera Hadoop (what is the difference between Cloudera Hadoop and regular Hadoop?).
Please note that this document is for Hadoop beginners looking for a place to start.
Webinar: Comparing DataStax Enterprise with Open Source Apache Cassandra (DataStax)
Apache Cassandra is the open source database technology that pioneered distributed data at scale. DataStax Enterprise, powered by the best distribution of Apache Cassandra, gives you up to 2x better compaction throughput, 3x better operational analytics performance, ease-of-use, and a secure, comprehensive multi-model data platform including search and operational analytics integrated with Cassandra to help you take on whatever challenges you might face along the way.
View recording: https://youtu.be/qLJyFydE-uY
Explore all DataStax webinars: http://www.datastax.com/resources/webinars
5 Ways to Use Spark to Enrich your Cassandra Environment (Jim Hatcher)
Apache Cassandra is a powerful system for supporting large-scale, low-latency data systems, but it has some tradeoffs. Apache Spark can help fill those gaps, and this presentation will show you how.
How you can gain rapid insights and create more flexibility by capturing and storing data from a variety of sources and structures into a NoSQL database.
C* for Deep Learning (Andrew Jefferson, Tractable) | Cassandra Summit 2016 (DataStax)
A deep learning startup has a requirement for a robust and scalable data architecture. Training a Deep Neural Network requires 10s-100s of millions of examples consisting of data and metadata. In addition to training it is necessary to support test/validation, data exploration and more traditional data science analytics workloads. As a startup we have minimal resources and an engineering team of 1.
Cassandra, Spark and Kafka running on Mesos in AWS is a scalable architecture that is fast and easy to set up and maintain to deliver a data architecture for Deep Learning.
About the Speaker
Andrew Jefferson VP Engineering, Tractable
A software engineer specialising in realtime data systems. I've worked at companies from Startups to Apple on applications ranging from Ticketing to Genetics. Currently building data systems for training and exploiting Deep Neural Networks.
The current Hadoop ecosystem is challenged and slowed by fragmented and duplicated efforts.
An industry standard is required that translates to immediate benefits: increased stability, capabilities, and compatibility among Hadoop distributions. It's also important to include an open data management core with an emphasis on making it enterprise focused.
The ODPi is a shared industry effort focused on building such standards and on promoting and advancing the state of Big Data technologies. Linaro is actively involved in this effort, including making sure ODPi is ARM compatible.
This talk will go over some of the specifications defined, Linaro's contributions, the roadmap, and a quick demo.
From Postgres to Cassandra (Rimas Silkaitis, Heroku) | C* Summit 2016 (DataStax)
Most web applications start out with a Postgres database, and it serves the application very well for an extended period of time. Depending on the type of application, the app's data model will have a table that tracks some kind of state for either objects in the system or the users of the application. Names for this table include logs, messages, or events. The growth in the number of rows in this table is not linear as traffic to the app increases; it's typically exponential.
Over time, the state table will increasingly become the bulk of the data volume in Postgres, think terabytes, and become increasingly hard to query. This use case can be characterized as the one-big-table problem. In this situation, it makes sense to move that table out of Postgres and into Cassandra. This talk will walk through the conceptual differences between the two systems, a bit of data modeling, as well as advice on making the conversion.
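The talk itself is not reproduced in this listing, but to give the "bit of data modeling" a concrete shape: a common target for the one big table of logs/messages/events is a Cassandra table partitioned by the owning entity plus a time bucket, so partitions stay bounded and queries stay cheap. Below is a hedged sketch using the DataStax Python driver; the keyspace, table, and column names are invented for illustration and assume a local single-node Cassandra.

# Sketch: a time-bucketed events table in Cassandra, a typical landing place for
# the "one big table" of events/logs/messages described above. Names are invented;
# assumes a local Cassandra node and the cassandra-driver package.
import uuid
from datetime import datetime, timezone

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS app
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS app.events (
        user_id uuid,
        day     date,        -- time bucket keeps partitions a manageable size
        ts      timestamp,
        kind    text,
        payload text,
        PRIMARY KEY ((user_id, day), ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
""")

now = datetime.now(timezone.utc)
session.execute(
    "INSERT INTO app.events (user_id, day, ts, kind, payload) VALUES (%s, %s, %s, %s, %s)",
    (uuid.uuid4(), now.date(), now, "page_view", '{"path": "/cart"}'),
)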
About the Speaker
Rimas Silkaitis Product Manager, Heroku
Rimas currently runs Product for Heroku Postgres and Heroku Redis but the common thread throughout his career is data. From data analysis, building data warehouses and ultimately building data products, he's held various positions that have allowed him to see the challenges of working with data at all levels of an organization. This experience spans the smallest of startups to the biggest enterprises.
Sudarshan Kadambi presented this talk at the Bay Area Spark Meetup @ Bloomberg. He covered the Bloomberg Apache Spark Server and contributions to Apache Spark. The talk also addressed the challenges of doing high-volume online analytics while still meeting strict SLAs.
A short presentation on the architecture of the Masdar Institute of Technology in Abu Dhabi. The presentation was done as a case study for a college project of designing a residential block for the students. The focus here is therefore on the Institute's residential block rather than its other numerous features.
Use of Architectural Elements in Evolution of Traditional Style (SHUBHAM SHARMA)
The research paper I prepared gives a brief idea of the traditional architectural components of Rajasthan. These features can be used as basic components in modern and contemporary architecture to achieve a degree of sustainability.
How to Manage Projects in SharePoint Using Out of the Box Features (Gregory Zelfond)
Learn how you can utilize SharePoint out of the box functionality to manage projects. 3 options are discussed: Office 365 Groups, Document sets and project sites. Also, what's available in terms of PMO-style dashboards and reporting capability.
A Brief History of Information Technology
Databases for Decision Support
OLTP vs. OLAP
Why OLAP & OLTP don’t mix (1)
Organizational Data Flow and Data Storage Components
Loading the Data Warehouse
Characteristics of a Data Warehouse
A Data Warehouse is Subject Oriented
For more visit : http://jsbi.blogspot.com
SharePoint has been on the market since 2001 and has since matured into a very stable and popular business collaboration platform. The beauty of SharePoint is that it is relatively easy to customize and it provides an experience already familiar to users via the Office suite. The most frequent uses of the platform by corporations have been in the areas of web content management, information sharing, and document management.
However, adoption of SharePoint as a true Project Management Information System (PMIS) has been slow. Out-of-the-box SharePoint is unappealing, customization takes time and acceptance at PMO level is often very bureaucratic.
In this presentation I will demonstrate how you can customize SharePoint to help you with your next project. You will walk away learning tips and tricks that you can implement literally in hours. Among other things, you will learn how SharePoint can help you facilitate project team collaboration, integrate existing methodologies and empower your project team.
Cassandra Data Modeling - Practical Considerations @ Netflix (nkorla1share)
The Cassandra community has consistently requested that we cover C* schema design concepts. This presentation goes in depth on the following topics:
- Schema design
- Best Practices
- Capacity Planning
- Real World Examples
Learn about the three advances in database technologies that eliminate the need for star schemas and the resulting maintenance nightmare.
Relational databases in the 1980s were typically designed using the Codd-Date rules for data normalization. It was the most efficient way to store data used in operations. As BI and multi-dimensional analysis became popular, the relational databases began to have performance issues when multiple joins were requested. The development of the star schema was a clever way to get around performance issues and ensure that multi-dimensional queries could be resolved quickly. But this design came with its own set of problems.
Unfortunately, the analytic process is never simple. Business users always think up unimaginable ways to query the data, and the data itself often changes in unpredictable ways. The result is a need for new dimensions, new and mostly redundant star schemas and their indexes, and maintenance difficulties in handling slowly changing dimensions, among other problems. The analytical environment becomes overly complex and very difficult to maintain, with long delays in delivering new capabilities, leaving both the users and those maintaining it unsatisfied.
There must be a better way!
Watch this webinar to learn:
- The three technological advances in data storage that eliminate star schemas
- How these innovations benefit analytical environments
- The steps you will need to take to reap the benefits of being star schema-free
ADV Slides: Platforming Your Data for Success – Databases, Hadoop, Managed Ha... (DATAVERSITY)
Thirty years is a long time for a technology foundation to be as active as relational databases. Are their replacements here? In this webinar, we say no.
Databases have not sat around while Hadoop emerged. The Hadoop era generated a ton of interest and confusion, but is it still relevant as organizations are deploying cloud storage like a kid in a candy store? We’ll discuss what platforms to use for what data. This is a critical decision that can dictate two to five times additional work effort if it’s a bad fit.
Drop the herd mentality. In reality, there is no “one size fits all” right now. We need to make our platform decisions amidst this backdrop.
This webinar will distinguish these analytic deployment options and help you platform 2020 and beyond for success.
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture (DATAVERSITY)
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Original: Lean Data Model Storming for the Agile Enterprise (Daniel Upton)
This original publication, aimed at data project leaders, describes a set of methods for agile modeling and delivery of an enterprise data warehouse, which together make it quicker to deliver, faster to load, and more easily adaptable to unexpected changes in source data, business rules or reporting/analytic requirements.
With this set of methods, the parts of data warehouse development that used to be the most resistant to sprint-sized / agile work breakdown -- data modeling and ETL -- are now completely agile, so that this tasking, too, can now be sized purely based on customer requirements, rather than the dictates of a traditional data warehouse architecture.
Logical Data Fabric and Data Mesh – Driving Business Outcomes (Denodo)
Watch full webinar here: https://buff.ly/3qgGjtA
Presented at TDWI VIRTUAL SUMMIT - Modernizing Data Management
While the technological advances of the past decade have addressed the scale of data processing and data storage, they have failed to address scale in other dimensions: proliferation of sources of data, diversity of data types and user persona, and speed of response to change. The essence of the data mesh and data fabric approaches is that it puts the customer first and focuses on outcomes instead of outputs.
In this session, Saptarshi Sengupta, Senior Director of Product Marketing at Denodo, will address key considerations and provide his insights on why some companies are succeeding with these approaches while others are not.
Watch On-Demand and Learn:
- Why a logical approach is necessary and how it aligns with data fabric and data mesh
- How some of the large enterprises are using logical data fabric and data mesh for their data and analytics needs
- Tips to create a good data management modernization roadmap for your organization
Data Lakehouse, Data Mesh, and Data Fabric (r1) (James Serra)
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
1. Introduction to the Course "Designing Data Bases with Advanced Data Models... (Fabio Fumarola)
Information technology has led us into an era where the production, sharing, and use of information are part of everyday life, and in which we are often almost unaware actors: it is now nearly inevitable that many of the actions we take every day leave a digital trail, for example through digital content such as photos, videos, and blog posts, and everything that revolves around social networks (Facebook and Twitter in particular). Added to this, with the "internet of things" we see an increase in devices such as watches, bracelets, thermostats, and many other items that are able to connect to the network and therefore generate large data streams. This explosion of data justifies the birth of the term Big Data: data produced in large quantities, with remarkable speed, and in different formats, which requires processing technologies and resources that go far beyond conventional systems for managing and storing data. It is immediately clear that 1) models of data storage based on the relational model, and 2) processing systems based on stored procedures and computations on grids, are not applicable in these contexts.
As regards point 1, RDBMSs, widely used for a great variety of applications, have problems when the amount of data grows beyond certain limits. Scalability and cost of implementation are only part of the disadvantages: very often, when faced with big data, the variability of the data, or the lack of a fixed structure, is also a significant problem. This has given a boost to the development of NoSQL databases. The website NoSQL Databases defines NoSQL databases as "Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open source and horizontally scalable." These databases are distributed, open source, horizontally scalable, without a predetermined schema (key-value, column-oriented, document-based and graph-based), easily replicable, without ACID guarantees, and able to handle large amounts of data. They are often integrated with processing tools based on the MapReduce paradigm proposed by Google in 2004. MapReduce, together with the open source Hadoop framework, represents the new model for distributed processing of large amounts of data, supplanting techniques based on stored procedures and computational grids (point 2).
The relational model taught in basic database design courses has many limitations compared to the demands posed by new applications based on Big Data, which use NoSQL databases to store data and MapReduce to process large amounts of data.
Course Website http://pbdmng.datatoknowledge.it/
Architecting Agile Data Applications for Scale (Databricks)
Data analytics and reporting platforms have historically been rigid, monolithic, hard to change, and limited in their ability to scale up or scale down. I can’t tell you how many times I have heard a business user ask for something as simple as an additional column in a report, and IT says it will take 6 months to add that column because it doesn’t exist in the data warehouse. As a former DBA, I can tell you about the countless hours I have spent “tuning” SQL queries to hit pre-established SLAs. This talk will cover how to architect modern data and analytics platforms in the cloud to support agility and scalability. We will include topics like end-to-end data pipeline flow, data mesh and data catalogs, live data and streaming, performing advanced analytics, applying agile software development practices like CI/CD and testability to data applications, and finally taking advantage of the cloud for infinite scalability both up and down.
Data Lakehouse, Data Mesh, and Data Fabric (r2) (James Serra)
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a modern data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. They all may sound great in theory, but I'll dig into the concerns you need to be aware of before taking the plunge. I’ll also include use cases so you can see what approach will work best for your big data needs. And I'll discuss Microsoft's version of the data mesh.
Agile Data Warehouse Modeling: Introduction to Data Vault Data Modeling (Kent Graziano)
This is a presentation I gave at OUGF14 in Helsinki, Finland.
Data Vault Data Modeling is an agile data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It is a hybrid approach using the best of 3NF and dimensional modeling. It is not a replacement for star schema data marts (and should not be used as such). This approach has been used in projects around the world (Europe, Australia, USA) for the last 10 years but is still not widely known or understood. The purpose of this presentation is to provide attendees with a detailed introduction to the components of the Data Vault Data Model, what they are for, and how to build them. The examples will give attendees the basics of how to build and design structures incrementally, without constant refactoring, when using the Data Vault modeling technique. This technique works well for:
• Building the Enterprise Data Warehouse repository in a CIF architecture
• Building a Persistent Staging Area (PSA) in a Kimball Bus Architecture
• Building your data model incrementally, one sprint at a time using a repeatable technique
• Providing a model that is easily extensible without need to re-engineer existing structure or load processes
These are the slides from my talk at Data Day Texas 2016 (#ddtx16).
The world of data warehousing has changed! With the advent of Big Data, Streaming Data, IoT, and The Cloud, what is a modern data management professional to do? It may seem to be a very different world with different concepts, terms, and techniques. Or is it? Lots of people still talk about having a data warehouse or several data marts across their organization. But what does that really mean today in 2016? How about the Corporate Information Factory (CIF), the Data Vault, an Operational Data Store (ODS), or just star schemas? Where do they fit now (or do they)? And now we have the Extended Data Warehouse (XDW) as well. How do all these things help us bring value and data-based decisions to our organizations? Where do Big Data and the Cloud fit? Is there a coherent architecture we can define? This talk will endeavor to cut through the hype and the buzzword bingo to help you figure out what part of this is helpful. I will discuss what I have seen in the real world (working and not working!) and a bit of where I think we are going and need to go in 2016 and beyond.
Building an Effective Data Warehouse Architecture (James Serra)
Why use a data warehouse? What is the best methodology to use when creating a data warehouse? Should I use a normalized or dimensional approach? What is the difference between the Kimball and Inmon methodologies? Does the new Tabular model in SQL Server 2012 change things? What is the difference between a data warehouse and a data mart? Is there hardware that is optimized for a data warehouse? What if I have a ton of data? During this session James will help you to answer these questions.
Data Lake Acceleration vs. Data Virtualization - What’s the difference? (Denodo)
Watch full webinar here: https://bit.ly/3hgOSwm
Data Lake technologies have been in constant evolution in recent years, with each iteration promising to fix what previous ones failed to accomplish. Several data lake engines are hitting the market with better ingestion, governance, and acceleration capabilities that aim to create the ultimate data repository. But isn't that the promise of a logical architecture with data virtualization too? So, what’s the difference between the two technologies? Are they friends or foes? This session will explore the details.
Types of database processing; OLTP vs. data warehouses (OLAP); data warehouse characteristics: subject-oriented, integrated, time-variant, non-volatile; functionalities of a data warehouse: roll-up (consolidation), drill-down, slicing, dicing, pivot; the KDD process; applications of data mining.
Demystifying Data Warehouse as a Service (DWaaS) (Kent Graziano)
This is from the talk I gave at the 30th Anniversary NoCOUG meeting in San Jose, CA.
We all know that data warehouses and best practices for them are changing dramatically today. As organizations build new data warehouses and modernize established ones, they are turning to Data Warehousing as a Service (DWaaS) in hopes of taking advantage of the performance, concurrency, simplicity, and lower cost of a SaaS solution or simply to reduce their data center footprint (and the maintenance that goes with that).
But what is a DWaaS really? How is it different from traditional on-premises data warehousing?
In this talk I will:
• Demystify DWaaS by defining it and its goals
• Discuss the real-world benefits of DWaaS
• Discuss some of the coolest features in a DWaaS solution as exemplified by the Snowflake Elastic Data Warehouse.
Elliott Cordo, Principal Consultant at Caserta Concepts, delivered a talk on NoSQL data storage architectures at our most recent Big Data Warehousing Meetup: what they are, how they're used and why you can't ignore them in the context of existing enterprise data ecosystems.
For more information, check out our website at http://www.casertaconcepts.com/.
Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
1. Building the Modern Data Hub: Beyond the Traditional Enterprise Data Warehouse
2. The New World of Data
90% of the world’s information was created in the last two years. 80% of all enterprise data is unstructured, which means it’s not the neat and tidy data that for decades has been held in relational databases, which in turn plug nicely into “business intelligence” tools, enterprise data warehouses and other traditional data analytics systems.
Today’s data needs different tools. And it requires a different sort of data scientist.
3. The EDW Analytic Conundrum
Modern Data Hub
● Flexible - add new data easily
● Fresh - up-to-date data, near real time
● Any query, no matter how complex
● Rapid deployment - days to weeks
Traditional EDW
● ETL based - brittle, hard to add new data sources
● Stale - data can be out of date
● Limited - queries limited by what data is available
● Slow - months to deploy or update
4. The Traditional Data Warehouse
[Diagram: extract, load & transform processes feed a star-schema data warehouse (EDW), which feeds data visualization. Callout: significant investment in planning, development, monitoring & maintenance.]
5. The Traditional Data Warehouse
[Same diagram as slide 4, annotated with stakeholder questions: "What's the ROI?" "How long is this going to take?" "Are we sure these are the right reports?" "How quickly can we make changes?"]
6. Today’s Traditional EDW Problems
Extraction, Transformation & Data Loading
• Highly transformative, structured ETLs are a costly investment on many levels, from development, monitoring and tuning to operational maintenance & remediation
• Target schema structures require planning based on end goals, but often those goals are not well defined
• Often the data we have today is both structured and unstructured
• Traditional EDWs are a long-term investment, and the ROI is often hard to measure
• Perishable insights that require fast turnaround (Super Bowl, Mother's Day, Thanksgiving, etc.) are difficult to capture in traditional EDWs
7. Today’s Traditional EDW Problems
Visualization & Reporting
• Traditional analytic reporting is predicated on structured schemas (star, snowflake, relational, etc.)
  • If these are not planned well, they can create performance problems
  • Hard structures can lead to missed metrics and reporting opportunities
• Any reworking of the final analytics that requires new metrics or data elements often means going back to the ETL to remediate the missing elements
• Producing insights and reporting for new trends can be time-consuming when predicated on pre-planned data structures
• Missed opportunities on perishable insights (Super Bowl, Mother's Day, Thanksgiving, etc.)
8. A Proposed Modern Approach
[Diagram: OLTP systems, unstructured data, and other data sources are staged via ETL/ELT into a MongoDB JSON data warehouse with no predetermined schema, which in turn feeds cubes, a star-schema EDW, data marts, and reporting. Callout: immediate access to data for analytic insights, fast ROI & planning.]
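A minimal sketch of the landing pattern this diagram implies, assuming a local MongoDB instance and the pymongo driver; the database, collection, and field names are made up for illustration:

# Sketch: land heterogeneous source records in a MongoDB "data hub" collection
# with no predetermined schema. Assumes a local mongod; names are illustrative.
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
hub = client["datahub"]["events"]          # one collection, any document shape

# Documents from different sources keep their native shape; only a little
# lineage metadata is added so analysts can tell where each record came from.
oltp_row = {"source": "oltp", "order_id": 1001, "amount": 42.50, "status": "shipped"}
log_line = {"source": "app_logs", "level": "ERROR", "msg": "timeout", "host": "web-03"}
clickstream = {"source": "web", "session": "abc123", "path": "/cart", "referrer": None}

for doc in (oltp_row, log_line, clickstream):
    doc["loaded_at"] = datetime.now(timezone.utc)
    hub.insert_one(doc)

print(hub.count_documents({"source": "app_logs"}))

Because there is no target schema, adding a new source is just another insert; structure is imposed at query time rather than at load time, which is what the "immediate access" callout is getting at.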
9. NoSQL as Source for Visualization
[Diagram: structured data (RDBMS, cloud sources such as AWS and Azure) alongside NoSQL and Hadoop sources (MongoDB, Spark, and Hadoop HDFS holding JSON, CSV and XML as a data lake with no predetermined schema) feed BI tools (Tableau, Power BI, Spotfire) and reporting through a BI connector.]
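The BI-tool path in this diagram runs through a SQL-speaking connector. As one hedged illustration (not shown in the deck): MongoDB's Connector for BI exposes a MySQL-compatible endpoint, by default on port 3307, so a stock MySQL client such as Python's pymysql can run the same kind of SQL that Tableau, Power BI, or Spotfire would issue; the host, credentials, and table name below are placeholders.

# Sketch: query MongoDB through a MySQL-compatible BI connector endpoint.
# Assumes the MongoDB Connector for BI (mongosqld) is running locally on its
# default port; database, user, and table names are placeholders.
import pymysql

conn = pymysql.connect(host="127.0.0.1", port=3307,
                       user="bi_user", password="secret",
                       database="datahub")
try:
    with conn.cursor() as cur:
        # The connector flattens JSON documents into table-like schemas,
        # so ordinary SQL (and BI tools) can read them.
        cur.execute("SELECT source, COUNT(*) FROM events GROUP BY source")
        for source, n in cur.fetchall():
            print(source, n)
finally:
    conn.close()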
10. Hadoop Data Lakes & Data Hubs
• Hadoop is NOT a database, it's a filesystem
• Impala, Cassandra, or just JSON, XML and CSV files
• SlamData connects to Hadoop using Spark (both are written in Scala); see the sketch after this slide
• Much simpler to implement than 1st-generation data hubs/lakes
[Diagram: historical data sets land in Hadoop HDFS as a JSON, CSV and XML data lake with no predetermined schema.]
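The deck does not show the Spark path in code, so here is a plain PySpark sketch of the same idea under stated assumptions (a reachable HDFS namenode and JSON files at a hypothetical path): read raw JSON out of the data lake with the schema inferred on read, then query it with SQL.

# Sketch: SQL over schema-on-read JSON sitting in an HDFS data lake.
# Paths and field names are hypothetical; assumes a working Spark + HDFS setup.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("datalake-sql").getOrCreate()

# No predetermined schema: Spark infers one from the JSON files it finds.
events = spark.read.json("hdfs:///datalake/events/*.json")
events.createOrReplaceTempView("events")

spark.sql("""
    SELECT source, COUNT(*) AS n
    FROM events
    GROUP BY source
    ORDER BY n DESC
""").show()

spark.stop()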
11. What is SlamData?
NOT:
• SlamData is not a database
• SlamData is not a monitoring tool
• SlamData is not an ETL tool
• SlamData is not NoSQL
• SlamData is not a replacement for SQL Server, Oracle, DB2, MySQL, Informix, etc.
• SlamData is not expensive
IS:
• SlamData is an analytics engine
• SlamData uses SQL² for queries (see the example after this slide)
• SlamData will natively connect to MongoDB and Hadoop (eventually SQL, Oracle, MySQL, flat files, and more)
• SlamData solves the problem of directly querying JSON, CSV, etc.
• SlamData spans a huge gap in traditional data warehouse needs
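The slides name SQL² but do not show a query, so here is a hedged illustration: a SQL²-style statement over JSON log documents appears as a comment, followed by a hand-written MongoDB aggregation pipeline (via pymongo) that computes the same result. SlamData itself compiles its queries into MongoDB's native query machinery; this pipeline is only an approximation of that idea, and the collection and field names are invented.

# Sketch: what "ordinary SQL over JSON" boils down to on MongoDB.
# A SQL²-style query such as
#   SELECT host, COUNT(*) AS errors
#   FROM logs
#   WHERE level = "ERROR"
#   GROUP BY host
# can be answered on MongoDB with an aggregation pipeline roughly like this:
from pymongo import MongoClient

logs = MongoClient("mongodb://localhost:27017")["datahub"]["logs"]

pipeline = [
    {"$match": {"level": "ERROR"}},                       # WHERE level = "ERROR"
    {"$group": {"_id": "$host", "errors": {"$sum": 1}}},  # GROUP BY host, COUNT(*)
    {"$sort": {"errors": -1}},
]
for row in logs.aggregate(pipeline):
    print(row["_id"], row["errors"])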
15. Chart Out Machine Data
• Machine data visualizations are quick and easy. Embed them as real-time visuals in your own analytics dashboard or share them as quick insights.
17. When Could This Solution Make Sense?
1. You are using MongoDB and getting reporting out is a struggle
2. You're planning a traditional data warehouse project, the 6-12 month time frame is daunting, and you need better report planning to determine ROI
3. You are using a product like Splunk to capture machine data and it's become too expensive
4. You have Hadoop, or are planning to implement Hadoop, as a data lake or data hub
18. Why this approach? Simple: save time and money
• Scoping the EDW is simpler
  • Imagine being able to eliminate the overhead of planning the data structure before you know the end analytic needs
• ETL development is less complex
  • If the task is just defined as capturing and storing the data, it becomes much simpler
• Implement solutions in days to weeks, not weeks to months
• SAVE $$$$: less costly storage options, no ETL software, less maintenance, lower cost to implement
19. Case Studies
Global technology company
Needs:
• Consolidated security and log analytics
• Ability to do complex ad-hoc queries without limitations
• Share and publish results easily
Solution:
• MongoDB to live-capture logs
• SlamData for ad-hoc queries and visualizations
Large government agency
Needs:
• Consolidate data from 5+ data sources in various formats
• Answer ad-hoc questions in minutes to hours, not days to weeks
• Data is perishable; slow, brittle ETL or data mapping was not a good option
Solution:
• Consolidate data into a MongoDB data hub
• Use SlamData to build rapid reports that can be shared and published
20. So What's the Next Step?
• Let us show you - give us your toughest data analytics problem
• Deliver a POC in two weeks or less
• SlamData is the missing piece of the data lake/data hub
  • Fast time to value, less cost
• Leverage current SQL skills, lower the learning curve
• Build powerful reports and dashboards in minutes, on live data