3. 3
Everyone's an “expert”
Industry standard for spreadsheets
750 million users worldwide
Over 30 years old
How many Excel “experts” does your organization have?
Excel is Familiar
4. 4
Ultimately, Excel puts Analysts in Control
“Show me the data and I’ll know it when I see it”
...Not just about data consumption, but data consumption and contribution
Analysts need to develop their own “personal” data modification techniques and mashups
Business Analysts don’t know how to provide reporting requirements until they get their hands on the data
5. 5
Despite Excel’s utility for analysts, three primary issues exist...
But Problems Arise
No Data Variety...
No Data Volume...
No Data Governance...
6. 6
By the time you count to 60...
This data will be structured, semi-structured, and completely unstructured
Excel Doesn’t Accommodate Variety
More than 204 million emails will be sent
Billions of new sensor data points will be detected
Over 2 million Google search queries will be performed
684,000 bits of content shared on Facebook
More than 100,000 tweets will be sent
7. 7
Most companies in the US have at least 100,000 GBs of data stored
Excel Doesn’t Accommodate Volume
...Meanwhile, Excel is limited to 1,048,576 rows per worksheet…
43 trillion GBs will be created by 2020
Enterprise data will grow 650% in the next five years
The world’s info now doubles every year and a half...
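A quick way to see the volume problem in practice is to count the rows in an extract before anyone tries to open it. The sketch below is only an illustration: the function name `fits_in_excel` and the sample data are made up, and the limit used is the per-worksheet row cap of modern Excel versions.

```python
import csv
import io

# Per-worksheet row limit in Excel 2007 and later.
EXCEL_MAX_ROWS = 1_048_576


def fits_in_excel(lines):
    """Return True if the CSV rows (header included) fit on one worksheet.

    `lines` is any iterable of text lines, such as an open file. Rows are
    streamed, so a large extract is never loaded into memory at once.
    """
    rows = 0
    for _ in csv.reader(lines):
        rows += 1
        if rows > EXCEL_MAX_ROWS:
            return False  # stop early, the file is already too big
    return True


# Hypothetical two-line extract, used only to show the call.
sample = io.StringIO("date,bookings\n2015-01-01,42\n")
print(fits_in_excel(sample))
```

For a real file, pass a handle opened with `open(path, newline="")` instead of the in-memory sample.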
8. Excel Doesn’t Allow for Governance
Spreadsheets: Give analysts control of the data, but security and integrity are lost as multiple “versions” of data are created
Data Warehouse: Designed to provide a single version of truth for analysts and facilitate governance
IT wants governance … Business wants control
IT Analysts
9. While a traditional warehouse may be able to handle expected volumes, it can’t...
Is Your Current Warehouse the Solution?
Data Warehouse
CRM
ERP
etc.
ETL
Support rapid data development, ad hoc analysis
Answer unknown questions
Quickly integrate new or unstructured data sources
Reporting
10. A New Approach Is Required...
To give analysts control and access to data
To accommodate increased data variety
To scale your analytical capabilities
To complement the existing solutions
To create a centralized governed repository
11. Enable Ad-hoc Analysis for the Business
Questions You’re
Not Asking
Questions
You’re Asking
Things you don’t know
Things you know
Ad-Hoc Analysis
● Heterogeneous Data
● Massive Compute
● Ad-Hoc Analysis
● Centralized Repository
● Advanced Transform
...What your business needs
Traditional Reporting
● Trusted KPIs
● Historic Data
● Scheduled Reports
● Homogeneous Data
● Pixel Perfect
What your business has...
12. Enable Discovery Before Reporting
Data Lake
Data Warehouse
CRM
ERP
Conform
Archive
Ad-Hoc Analysis
Reporting
New Data Sources
Existing Data Sources
Copy/Ingest
13. 13
Load all types of existing data into the lake “as is”
Step 1 - Fill the Lake
Data Variety
Centralized Repository
Incorporate New Data Sources
• Social Media
• Transactions
• Unstructured
• Sensor Data
• “As-is” Data
One Centralized Repository
• Eliminates Data Silos
• Improves Data Integration
• Promotes Data Governance
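Loading data “as is” can be sketched as a plain copy into a landing area, with no parsing or conversion on the way in. The example below is a minimal illustration that uses a local directory as a stand-in for the lake. The `raw/<source>/<date>/` layout and the `land_as_is` helper are assumptions made for this sketch, not something the slides prescribe.

```python
import shutil
from datetime import date
from pathlib import Path


def land_as_is(src, lake_root, source_name, ingest_date=None):
    """Copy a file into the lake unchanged and return its new path.

    Files are grouped by source system and ingest date (a hypothetical
    layout) so that later discovery work can find them. The bytes are
    copied verbatim, whatever the format.
    """
    src = Path(src)
    ingest_date = ingest_date or date.today()
    target_dir = Path(lake_root) / "raw" / source_name / ingest_date.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / src.name
    shutil.copy2(src, target)  # byte-for-byte copy, keeps file timestamps
    return target
```

A call such as `land_as_is("tweets.json", "/data/lake", "twitter")` would place the file under `/data/lake/raw/twitter/<today>/tweets.json`. Both paths here are placeholders.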
14. Step 2 - Add a Discovery Layer
Give analysts control
and access to the data
Select a Data Discovery tool that is right for your business
Analyst Control
• Total autonomy
• Ad-hoc analysis
• Personalized mash-ups
• Single version of the “truth”
Software Agnostic
• Oracle Big Data Discovery
• Datameer
• Platfora
• Open Source
Read the fine print: Be wary of tools that promise ad-hoc analysis, but only enable data consumption or visualization
15. Step 3 - Graduate to the Warehouse
Augment Existing
Solutions
Lake + Warehouse
quicker time-to-value, more data, more capability
Migrate crucial insights to the warehouse
Leverage existing reports/create new ones
Archive back into the data lake
Identify data quality issues quickly
Build transforms at massive scale
16. The Bigger Picture
Scalable Storage and Compute: Introduce a repository that can house all your organization’s data, at scale, with no risk of data loss
New Advanced Analytics: Lay the foundation for new “untapped” analytical capabilities like predictive, machine learning, search, and real-time alerting
Tech Replacement: Over time, reduce the size and cost of your warehouse by re-platforming some reporting onto the data lake
Massive Transform Capabilities: Deliver powerful, performant transforms leveraging the massive compute power of the data lake
17. 17
Scenario:
Flipflops Resort is located in the heart of the Caribbean and is a popular tourist destination
Their marketing team would like to better understand the impact of social sentiment on sales
How might this play out in the “real world”?
18. Currently, Flipflops uses database file dumps in Excel format to gather any insights...
This can be very time-consuming and does not promote the inclusion of new data sources
Current Strategy
19. “Does our resort’s weather impact social
media sentiment?”
Discovery Starts With a Question
20. Need to ingest data from sources and formats that may not be
structured in a spreadsheet-friendly way
Limitations of Current Practices
Obtaining this data can be a labor-intensive process
22. It's clear that Excel does not handle semi-structured data well,
and doesn’t support unstructured data at all
Attempting to draw insights from this data, or joining additional
data sources to draw any correlations would be difficult at best
This is where we can utilize discovery and the data lake to
answer our question
Outgrowing Excel
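To make the semi-structured point concrete: a social media post usually arrives as nested JSON, which has no natural row-and-column shape. The sketch below flattens one such record into dotted column names. The record is made up for illustration and does not reflect any real API payload, and `flatten` is a hypothetical helper.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into a single level with dotted keys.

    Lists have no obvious column mapping, so they are kept as JSON text.
    """
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


# Made-up record, loosely shaped like a social media post.
raw = (
    '{"id": 1, "text": "Loving the beach", '
    '"user": {"name": "a", "location": {"country": "BS"}}, '
    '"tags": ["sun", "sand"]}'
)
row = flatten(json.loads(raw))
print(sorted(row))
```

Even in this small case, the list of tags does not map onto a single cell cleanly, which is the kind of friction a spreadsheet-only workflow runs into.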
23. Piping Outside Data to the Lake
We’re focused on social media sentiment, so let’s grab some tweets and weather
data, and put them into our lake
New Data Sources
Data Lake
Data Warehouse
Ad-Hoc
Analysis
Reporting
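Once tweets and weather sit side by side in the lake, the question “does weather impact sentiment?” becomes a join on date followed by a correlation. The sketch below assumes each tweet already carries a numeric sentiment score and each weather record a daily temperature. The field names (`date`, `sentiment`, `temp_c`) and helper functions are hypothetical, and a discovery tool would normally do this work for the analyst.

```python
from collections import defaultdict
from math import sqrt


def daily_mean(rows, key, value):
    """Average `value` per distinct `key` (for example, sentiment per day)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[row[key]].append(row[value])
    return {k: sum(v) / len(v) for k, v in buckets.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / (spread_x * spread_y)


def sentiment_vs_weather(tweets, weather):
    """Join daily mean sentiment to temperature by date, then correlate."""
    sentiment = daily_mean(tweets, "date", "sentiment")
    temperature = {w["date"]: w["temp_c"] for w in weather}
    days = sorted(set(sentiment) & set(temperature))  # inner join on date
    return pearson([temperature[d] for d in days], [sentiment[d] for d in days])
```

A coefficient near +1 or -1 would suggest a relationship worth promoting into regular reporting, while a value near 0 would suggest looking at other factors. Correlation alone does not establish cause.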
24. 24
Piping More Data to the Lake
Data Lake
Data Warehouse
Ad-Hoc Analysis
Reporting
Additionally, let’s leverage existing marketing and booking data to
help answer our question
Existing Data Sources
25. View of our Data Lake through a web interface called Hue
Note the variety of file types that can be stored
Hue Lake View
Data Lake
26. 26
Analysis on top of Lake
We are now ready to start our discovery phase
and will use an analytical tool on top of our lake to visualize any insights
Data Lake
Data Warehouse
Ad-Hoc
Analysis
Reporting
Existing Data Sources
New Data Sources
27. Diving into the Lake
With a variety of both open
source and proprietary tools
available, we can quickly view
our data and gather potential
insights
28. 28
Options for Discovery
Ad-Hoc
Analysis
There are many different ways to analyze the data in the lake
30. Incorporating New Insights
Any insights we discover could be
included in a traditional data
warehouse and integrated into
regular reporting
Data Warehouse Reporting
New Data Fields/Sources
Data Lake
31. Data Lake
• Centralized access to heterogeneous
data
• Powerful data transformations
• Easily join data sets together
• Ability to visualize fields within
moments of upload
• Gain insights into data without
significant time investment
• Maintain data integrity
Demo Recap
Microsoft Excel
• Local access to homogeneous data
• Slow data transformations, data loaded
onto local machine
• Tedious joining of data sets
• Visualizations must be built and configured
for new data sets
• Gathering data insights may involve notable
amount of staff time
• Loss of data governance and integrity
A comparison of what we accomplished using a data lake:
32. 32
Next Steps
So What Now?
1. Let Ranzal help your organization understand how to best move
forward with an “Analytics Roadmap”
2. Start small with your data lake. Let Ranzal implement the first
solution to deliver real ROI. This is often Infrastructure
Replacement, Active Archive, and/or ETL Offload
33. 33
Contact Information
Edgewater Ranzal
108 Corporate Park Drive, Suite 105
White Plains, NY 10604
Tel (914) 253-6600
Email: info@ranzal.com
45 Beech Street, Suite 109
London EC2Y 8AD
United Kingdom
Tel +44 (0) 2033 717 174
130 S. Jefferson St.
Suite 101
Chicago, IL 60661
Tel (847) 269-3524
200 Harvard Mill Square
Suite 210
Wakefield, MA 01880
Tel (781) 246-3343