Talk given on Dec. 3, 2014 at MIT, sponsored by Hack/Reduce. This talk looks at the history of Business Intelligence from first-generation OLAP tools through modern Data Discovery and visualization tools. Looking forward, it asks what we can learn from that evolution as numerous new tools and architectures for analytics emerge in the Big Data era.
From Business Intelligence to Big Data - hack/reduce Dec 2014
1. From Business Intelligence to Big Data:
The Evolution of Business Analytics
@hackreduce – Dec. 3, 2014
Adam Ferrari
@AJFerrari
(All opinions expressed are my own / I’m not here representing my employers)
5. This talk
What did I learn as CTO of a BI product company as we
jumped into the BI market mid-stream, and then later as we
were acquired by one of the biggest “traditional BI” vendors?
Most Importantly:
Stay focused on real business value, not technology.
Note: My context is very “product provider” oriented, but I believe the lessons
are equally interesting to “product consumers” – after all, we’re all interested
in where the toolset is going and why
6. A note about scope
Analytics is a highly overloaded term
The vast majority of my experience, and the focus of this talk, is
around “BI-style” analytics, i.e.,
Delivering historical and aggregate views of data (e.g.,
charts, reports, dashboards, etc.) to business decision makers
There are many other important forms of “analytics”
E.g., Data mining, statistics, data science, etc.
These are very important and complementary,
but not in my scope here
7. Part 1 (of 3)
Some Ancient History
(or, a bunch of important stuff that happened before my time)
8. In the beginning…
…there was the cube
(well, there was a bunch of stuff before that – Hans Peter Luhn coins the term Business
Intelligence in 1958, Edgar Codd invents the relational data model in 1970, etc…
but we’ll start with the beginning of modern Business Intelligence, which is OLAP)
Image source: oracle.com
9. Research sponsored by Arbor Software in 1993,
defined the “12 Rules for OLAP Products”
Rule #1 – “Multidimensional Conceptual View”
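A minimal sketch of what a "multidimensional conceptual view" means in practice, assuming a toy fact table with three dimensions. The data, the names (FACTS, build_cube), and the "*" roll-up marker are all made up for illustration; this is not how any particular OLAP product stores its cube.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, product, quarter, sales)
FACTS = [
    ("East", "Widget", "Q1", 100),
    ("East", "Gadget", "Q1", 50),
    ("West", "Widget", "Q1", 70),
    ("West", "Widget", "Q2", 30),
]

def build_cube(facts):
    """Pre-aggregate sales for every combination of kept/rolled-up dimensions."""
    cube = defaultdict(int)
    for *coords, sales in facts:
        for k in range(len(coords) + 1):
            for kept in combinations(range(len(coords)), k):
                # "*" marks a dimension that has been rolled up (aggregated away).
                key = tuple(c if i in kept else "*" for i, c in enumerate(coords))
                cube[key] += sales
    return dict(cube)

cube = build_cube(FACTS)
grand_total = cube[("*", "*", "*")]      # all sales
east_total = cube[("East", "*", "*")]    # slice on region
widget_q1 = cube[("*", "Widget", "Q1")]  # slice on product and quarter
```

Every question of the form "sales by these dimensions" becomes a lookup rather than a scan, which is the appeal of the cube. The cost is that the number of cells grows quickly with the number of dimensions.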
12. ROLAP Modeling
• Manage mapping between
physical data stores, “logical
view” (core dimensional model),
and “business view”
• Definition of metrics,
dimensions
• Management of pre-
computed aggregates
Image source: rittmanmead.com
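A rough sketch of the mapping idea above, assuming a tiny star schema in SQLite. The table names, column names, and the METRICS / DIMENSIONS dictionaries are hypothetical; real ROLAP tools keep this mapping in a much richer metadata layer.

```python
import sqlite3

# Hypothetical "business view" names mapped to physical SQL expressions.
METRICS = {"Revenue": "SUM(f.amount)", "Orders": "COUNT(*)"}
DIMENSIONS = {"Region": "c.region", "Year": "f.year"}

def build_query(metric, dimension):
    """Translate a business-view request into SQL over the physical star schema."""
    m, d = METRICS[metric], DIMENSIONS[dimension]
    return (
        f'SELECT {d} AS "{dimension}", {m} AS "{metric}" '
        "FROM fact_sales f JOIN dim_customer c ON f.customer_id = c.id "
        f"GROUP BY {d} ORDER BY {d}"
    )

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales (customer_id INTEGER, year INTEGER, amount REAL);
    INSERT INTO dim_customer VALUES (1, 'East'), (2, 'West');
    INSERT INTO fact_sales VALUES (1, 2013, 100.0), (1, 2014, 50.0), (2, 2014, 70.0);
""")

# The report author asks for "Revenue by Region" and never sees the joins.
revenue_by_region = conn.execute(build_query("Revenue", "Region")).fetchall()
orders_by_year = conn.execute(build_query("Orders", "Year")).fetchall()
```

The point of the design is that metric definitions live in one place, so every report that asks for "Revenue" gets the same calculation.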
13. Data Warehousing: go big or go home
HW
• Teradata
• Netezza (IBM)
• Oracle Exadata
SW – Traditional DBMS
• Oracle
• MS SQL Server
• IBM DB2
SW – Analytical DBMS
• Vertica (HP)
• ParAccel (/ Redshift)
• SAP HANA
Image source: teradata.com
14. ETL - Extract/Transform/Load
Image source: informatica.com
Notable ETL Products
• Informatica PowerCenter
• Ascential DataStage (IBM)
• Ab Initio
• … numerous others
• Capture History
• Manage dimensions
– E.g., what happens if a
customer moves?
“slowly changing dimensions”
• Pre-compute aggregates
• Serve as the versionable
managed record of how the
dimensional model of the
warehouse is derived from
the raw data
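To make the "what happens if a customer moves?" question concrete, here is a small sketch of one common answer, usually called a Type 2 slowly changing dimension: keep the old row, close it out, and add a new row. The CustomerVersion class and the helper names are assumptions for illustration, not taken from any ETL product.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class CustomerVersion:
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version

def apply_move(history, customer_id, new_city, moved_on):
    """Close the customer's current row and append a new one (a Type 2 change)."""
    for row in history:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.city == new_city:
                return  # nothing changed, keep the current row
            row.valid_to = moved_on
    history.append(CustomerVersion(customer_id, new_city, moved_on))

def city_as_of(history, customer_id, when):
    """Return the city the customer was in on a given date, or None."""
    for row in history:
        if (row.customer_id == customer_id and row.valid_from <= when
                and (row.valid_to is None or when < row.valid_to)):
            return row.city
    return None

history = [CustomerVersion(1, "Boston", date(2012, 1, 1))]
apply_move(history, 1, "Austin", date(2014, 6, 1))
```

With history preserved this way, sales from 2013 still roll up under Boston, while sales after the move roll up under Austin. Simply overwriting the city would silently rewrite past reports.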
17. Business Analytics 1.0 - Pros & Cons
• Governance, re-use, and quality
– “One Version of the Truth” – correct, agreed upon, reusable definitions of core
business metrics and dimensions
But…
• Poor Agility – development process requires:
– Creating or modifying a dimensional model
– Creating ETL to populate the new model
– Creating report or dashboard content on top of the model
– Iterating to make the model perform
• Lack of self-service for end users
• Historically, poor user experience for end consumers
• Cost and Complexity – large, complex stack of
components, code, and configuration to manage, scale,
troubleshoot, etc.
18. Part 2 (of 3)
Some Recent History
(or, where I joined the story already in progress)
19. Data Discovery & Visualization
Key Features
• Visual data presentation
• Interactive data exploration –
“facets,” “lassos,” etc.
• Simplified stack – DBMS and Server optional
• Self-service: data loading & content creation,
no dimensional modeling
Notable products:
• QlikView (QlikTech)
• Tableau
• Spotfire (TIBCO)
• Endeca Latitude
(now Oracle Information Discovery)
• EdgeSpring (now Salesforce.com Wave)
• Business Objects Explorer
Image source: tibco.com
Image source: sap.com
24. Data Discovery Lessons
• Improved User Experience, Self-service
But…
• BI is still really hard
– Reading from raw, real-world operational schemas is messy and
complicated
– And the requisite history may not even be available
• The usability benefits of discovery tools come with significant
scalability limitations
• Additional data types – so-called “unstructured” data (logs,
text, etc.) – are even harder, as discovery tools (generally) target
structured, tabular data (they didn’t address “Big Data”)
And…
• Traditional BI tools are rapidly adding better UX, Visualization,
and Self-service
25. Part 3 (of 3) (woohoo!)
Future History
(or, stuff that’s still anyone’s guess)
26. Our analytics ambitions have only grown!
We want BIG, EASY, DEEP analytics
• [BIG] the headline grabber:
More data from more sources, aka: Big Data
• [EASY] the real issue (IMHO):
Faster time to value, at lower cost of ownership
• [DEEP] increasingly important:
Deeper intelligence from data…
not just data, but actions, predictions, etc…
… Can we solve these problems without creating an
ever larger mess of technology and products?
27. [BIG]: the Hadoop Solution
Posits that what we need is a better, more flexible and
scalable foundation for the Data Warehouse – more like a
“data operating system” than a DBMS
Image source: cloudera.com
28. [BIG] and [EASY] “On-Hadoop” Solutions
Image source: gigaom.com
Platfora Architecture
Posit that although Hadoop
is indeed a powerful
platform, its complexity
needs to be wrapped in a BI
/ analytics application
Notable Products
• Platfora
• Datameer
• Oracle Big Data Discovery
(based on Endeca)
29. [BIG+]: The Logical Data Warehouse*
Posits that what is needed is a variety of data stores to constitute the
“Data Warehouse,” along with integration to allow data to be stored
and processed where most appropriate with little or no additional
development effort or operational management overhead
Image source: teradata.com
* From Understanding the Logical Data Warehouse: The Emerging
Practice, 21 June 2012, Mark A. Beyer and Roxane Edjlali
30. [EASY] The Cloud Solution
• Agility via all of the traditional cloud benefits –
reduced setup, less customization, reduced
ongoing management, etc…
• SaaS-based BI tools, such as
– GoodData
– Domo
• SaaS-based BI applications, such as
– Numerify (IT analytics on ServiceNow, etc.)
– InsightSquared (Sales analytics on Salesforce)
31. Other notable examples
• [DEEP] and [EASY]: BeyondCore – data discovery
with automatic/algorithmic analysis of attribute
relationships
• [DEEP]: Ayasdi – deeper insight into data based
on novel topological data visualization
• [DEEP] Alteryx – democratizing more complex
analytical workflows
• [EASY 2.0]: Looker – lightweight BI without
sacrificing modeling, yet avoiding the need for a
warehouse
• [BIG] and [EASY]: Tamr, Trifacta - curating and
wrangling data into usable forms
32. My guesses about the future?
• I voted with my feet. My beliefs:
– Fast time to real value is of paramount importance
• Zero-friction SaaS applications targeted to specific
business problems are an essential enabler – essential to
amortizing the cost of developing meaningful analytics
and quickly disseminating best practice updates – DIY just
doesn’t cut it any more in many cases.
– Our ability to do basic BI (dashboards, data
discovery, etc.) is mature, and the real action is in
deeper analysis of data
• Yet highly custom data science efforts are at odds with
fast time to value, and hard to advance in many cases
33. Crisply – quantified work for CRM
[Diagram labels: model & activity, activity, quantified work]
• Algorithmic quantification of the human effort behind each customer,
opportunity, support case, etc.
• Determine the true cost to acquire a specific customer or type of customer,
and understand the true profitability of that customer or segment over time
34. Thanks!
And stay focused on
the value that analytics creates
(the technology will follow from that)
Editor's Notes
http://www.tcsnycmarathon.org/analytics
Data Discovery tools improved agility and UX, and enabled more powerful self-service / DIY
But did these “model-less tools” truly advance Business Analytics, or just expand the toolset?
How will their impact trend as traditional tools become more agile and visual, at the same time that more modern tools advance the functional envelope?
Large organizations are still sorting out the impact of data discovery on their BI strategies, even as the picture changes quickly with new tools emerging and incumbent standards improving.