NASDAQ: EDGW
Business Analytics Solutions Start Here
Integrated EPM, BI, and Big Data Solutions
Why is Microsoft Excel the most commonly
used BI tool in the world?
Excel is Familiar
Everyone's an “expert”
• Industry standard for spreadsheets
• 750 million users worldwide
• Over 30 years old
How many Excel “experts” does your organization have?
Ultimately, Excel puts Analysts in Control
“Show me the data and I’ll know it when I see it”
• Not just about data consumption, but data consumption and contribution
• Analysts need to develop their own “personal” data modification techniques and mashups
• Business Analysts don’t know how to provide reporting requirements until they get their hands on the data
Despite Excel’s utility for analysts, three primary issues exist...
But Problems Arise
No Data Variety...
No Data Volume...
No Data Governance...
Excel Doesn’t Accommodate Variety
By the time you count to 60...
• More than 204 million emails will be sent
• Billions of new sensor data points will be detected
• Over 2 million Google search queries will be performed
• 684,000 pieces of content will be shared on Facebook
• More than 100,000 tweets will be sent
This data will be structured, semi-structured, and completely unstructured
Excel Doesn’t Accommodate Volume
• Most companies in the US have at least 100,000 GB of data stored
• 43 trillion GB will be created by 2020
• Enterprise data will grow 650% in the next five years
• The world’s info now doubles every year and a half...
...Meanwhile, Excel is limited to just over 1 million rows (1,048,576) per worksheet
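To make the row limit concrete, here is a minimal sketch that counts the rows in a CSV export and reports whether they would fit on a single Excel worksheet. It assumes the 1,048,576-row per-worksheet limit of Excel 2007 and later; the function name and the sample data are hypothetical.

```python
import csv
import io

EXCEL_MAX_ROWS = 1_048_576  # per-worksheet row limit in Excel 2007 and later


def fits_in_excel(csv_file, has_header=True):
    """Return (fits, data_rows) for an open CSV file object."""
    reader = csv.reader(csv_file)
    total_rows = sum(1 for _ in reader)
    data_rows = total_rows - 1 if has_header and total_rows > 0 else total_rows
    # The header row also occupies a worksheet row, so compare the total.
    return total_rows <= EXCEL_MAX_ROWS, data_rows


# Hypothetical two-row booking export used only for illustration.
sample = io.StringIO("date,bookings\n2015-01-01,42\n2015-01-02,37\n")
fits, rows = fits_in_excel(sample)
print(fits, rows)
```

Anything that fails this check has to be sampled, aggregated, or split across sheets before an analyst can open it, which is the volume problem this slide describes.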
Excel Doesn’t Allow for Governance
Spreadsheets
Give analysts control of the data, but security and integrity are lost as multiple “versions” of data are created
Data Warehouse
Designed to provide a single version of truth for analysts and facilitate governance
IT wants governance … Business wants control
IT Analysts
Is Your Current Warehouse the Solution?
While a traditional warehouse may be able to handle expected volumes, it can’t...
• Support rapid data development, ad hoc analysis
• Answer unknown questions
• Quickly integrate new or unstructured data sources
CRM, ERP, etc. → ETL → Data Warehouse → Reporting
A New Approach Is Required...
To give analysts control and access to data
To accommodate increased data variety
To scale your analytical capabilities
To complement the existing solutions
To create a centralized governed repository
Enable Ad-hoc Analysis for the Business
Questions You’re Not Asking
Questions You’re Asking
Things you don’t know
Things you know
Ad-Hoc Analysis
● Heterogeneous Data
● Massive Compute
● Ad-Hoc Analysis
● Centralized Repository
● Advanced Transform
...What your business needs
Traditional Reporting
● Trusted KPIs
● Historic Data
● Scheduled Reports
● Homogeneous Data
● Pixel Perfect
What your business has...
Enable Discovery Before Reporting
Data Lake
Data Warehouse
CRM
ERP
Conform
Archive
Ad-Hoc
Analysis Reporting
New Data Sources
Existing Data Sources
Copy/Ingest
Step 1 - Fill the Lake
Load all types of existing data into the lake “as is”
Data Variety: Incorporate New Data Sources
• Social Media
• Transactions
• Unstructured
• Sensor Data
• “As-is” Data
Centralized Repository: One Centralized Repository
• Eliminates Data Silos
• Improves Data Integration
• Promotes Data Governance
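Loading data “as is” can be sketched as a plain byte-for-byte copy plus a manifest entry, with no parsing and no schema applied on the way in. This is only an illustration: a local directory stands in for the lake, and the folder layout, the `ingest_as_is` name, and the manifest format are all assumptions rather than anything the deck prescribes.

```python
import hashlib
import json
import shutil
import tempfile
from datetime import date
from pathlib import Path


def ingest_as_is(source_file, lake_root, source_name, load_date=None):
    """Copy one file into the lake unchanged and record a manifest entry."""
    source_file = Path(source_file)
    load_date = load_date or date.today()
    # Hypothetical layout: <lake>/raw/<source>/<YYYY-MM-DD>/<original file name>
    target_dir = Path(lake_root) / "raw" / source_name / load_date.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source_file.name
    shutil.copy2(source_file, target)  # byte-for-byte copy, no parsing or schema

    entry = {
        "source": source_name,
        "path": str(target),
        "bytes": target.stat().st_size,
        "sha256": hashlib.sha256(target.read_bytes()).hexdigest(),
    }
    # Append-only manifest so every load stays traceable (supports governance).
    with open(Path(lake_root) / "manifest.jsonl", "a", encoding="utf-8") as manifest:
        manifest.write(json.dumps(entry) + "\n")
    return entry


# Usage with a throwaway directory and a made-up social media file.
with tempfile.TemporaryDirectory() as tmp:
    tweet_file = Path(tmp) / "tweets.json"
    tweet_file.write_text('{"text": "Loving the beach!"}\n', encoding="utf-8")
    entry = ingest_as_is(tweet_file, Path(tmp) / "lake", "social_media")
    print(entry["source"], entry["bytes"])
```

Because nothing is transformed on ingest, any file type can land in the same repository, and structure is applied later, when an analyst reads the data.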
Step 2 - Add a Discovery Layer
Give analysts control and access to the data
Select a Data Discovery tool that is right for your business
Analyst Control
• Total autonomy
• Ad-hoc analysis
• Personalized mash-ups
• Single version of the “truth”
Software Agnostic
• Oracle Big Data Discovery
• Datameer
• Platfora
• Open Source
Read the fine print: Be wary of tools that promise ad-hoc analysis but only enable data consumption or visualization
Step 3 - Graduate to the Warehouse
Augment Existing Solutions
Lake + Warehouse: quicker time-to-value, more data, more capability
• Migrate crucial insights to the warehouse
• Leverage existing reports/create new ones
• Archive back into the data lake
• Identify data quality issues quickly
• Build transforms at massive scale
The Bigger Picture
Scalable Storage and Compute
Introduce a repository that can house all your organization’s data, at scale, with no risk of data loss
New Advanced Analytics
Lay the foundation for new “untapped” analytical capabilities like predictive, machine learning, search, and real-time alerting
Tech Replacement
Over time, reduce the size and cost of your warehouse by re-platforming some reporting onto the data lake
Massive Transform Capabilities
Deliver powerful, performant transforms leveraging the massive compute power of the data lake
Scenario:
Flipflops Resort is located in the heart of the Caribbean and is a popular tourist destination
Their marketing team would like to better understand the impact of social sentiment on sales
How might this play out in the “real world”?
Currently, Flipflops uses database file dumps in Excel format to gather any insights...
This can be very time-consuming and does not promote the inclusion of new data sources
Current Strategy
“Does our resort’s weather impact social
media sentiment?”
Discovery Starts With a Question
Need to ingest data from sources and formats that may not be structured in a spreadsheet-friendly way
Limitations of Current Practices
Obtaining this data can be a labor-intensive process
Semi-Structured Data Example
It's clear that Excel does not handle semi-structured data well,
and doesn’t support unstructured data at all
Attempting to draw insights from this data, or joining additional
data sources to draw any correlations would be difficult at best
This is where we can utilize discovery and the data lake to
answer our question
Outgrowing Excel
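To illustrate why semi-structured data sits awkwardly in a spreadsheet, here is a minimal sketch that flattens a nested, tweet-like JSON record into the single row of columns a spreadsheet expects. The record and its field names are made up for illustration and are not the data used in the demo.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into a single level with dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            # Lists have no natural spreadsheet column; join items into one cell.
            flat[name] = ";".join(str(item) for item in value)
        else:
            flat[name] = value
    return flat


# Hypothetical tweet-like record; real payloads carry many more fields.
raw = """
{
  "id": 1,
  "text": "Sunny day at the resort",
  "user": {"name": "ana", "location": {"country": "BS"}},
  "hashtags": ["beach", "sun"]
}
"""
row = flatten(json.loads(raw))
print(row)
```

Even this small record needs decisions about nesting depth and repeated values before it fits in a grid, and those decisions would have to be repeated for every new source.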
Piping Outside Data to the Lake
We’re focused on social media sentiment, so let’s grab some tweets and weather
data and put them into our lake
New Data Sources
Data Lake
Data Warehouse
Ad-Hoc
Analysis
Reporting
Piping More Data to the Lake
Data Lake
Data Warehouse
Ad-Hoc
Analysis Reporting
Additionally, let’s leverage existing marketing and booking data to
help answer our question
Existing Data Sources
View of our Data Lake through a web interface called Hue
Note the variety of file types that can be stored
Hue Lake View
Data Lake
Analysis on top of Lake
We are now ready to start our discovery phase
and will use an analytical tool on top of our lake to visualize any insights
Data Lake
Data Warehouse
Ad-Hoc
Analysis
Reporting
Existing Data Sources
New Data Sources
Diving into the Lake
With a variety of both open source and proprietary tools available, we can quickly view our data and gather potential insights
Options for Discovery
Ad-Hoc
Analysis
There are many different ways to analyze the data in the lake
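One of those ways, sketched in plain Python rather than a discovery tool: average tweet sentiment per day, join it to daily weather on the date, and compute a Pearson correlation. All values, field names, and the `sunshine_hours` measure are made up for illustration; they are not results from the demo, and a real analysis would need far more than three days of data.

```python
from collections import defaultdict
from math import sqrt


def daily_mean(rows, key, value):
    """Average `value` per `key` (for example sentiment score per date)."""
    sums = defaultdict(lambda: [0.0, 0])
    for row in rows:
        bucket = sums[row[key]]
        bucket[0] += row[value]
        bucket[1] += 1
    return {k: total / count for k, (total, count) in sums.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up illustration data, not results from the demo.
tweets = [
    {"date": "2015-06-01", "sentiment": 0.8},
    {"date": "2015-06-01", "sentiment": 0.6},
    {"date": "2015-06-02", "sentiment": -0.2},
    {"date": "2015-06-03", "sentiment": 0.4},
]
weather = [
    {"date": "2015-06-01", "sunshine_hours": 10.0},
    {"date": "2015-06-02", "sunshine_hours": 2.0},
    {"date": "2015-06-03", "sunshine_hours": 7.0},
]

sentiment_by_day = daily_mean(tweets, "date", "sentiment")
sunshine_by_day = {row["date"]: row["sunshine_hours"] for row in weather}
# Inner join on date: keep only days present in both sources.
days = sorted(sentiment_by_day.keys() & sunshine_by_day.keys())
r = pearson([sunshine_by_day[d] for d in days], [sentiment_by_day[d] for d in days])
print(f"days joined: {len(days)}, correlation: {r:.2f}")
```

A correlation like this only suggests where to look next; it does not show that weather causes the change in sentiment.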
Data Lake Demo
Incorporating New Insights
Any insights we discover could be
included in a traditional data
warehouse and integrated into
regular reporting
Data Warehouse Reporting
New Data Fields/Sources
Data Lake
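As a rough sketch of how a discovered field might graduate into regular reporting, the example below adds a sentiment column to an existing reporting table. An in-memory SQLite database stands in for the warehouse, and the table, column names, and values are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; all names here are hypothetical.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE daily_bookings (day TEXT PRIMARY KEY, bookings INTEGER)")
warehouse.executemany(
    "INSERT INTO daily_bookings VALUES (?, ?)",
    [("2015-06-01", 120), ("2015-06-02", 85)],
)

# A field discovered in the lake (made-up values) graduates to the warehouse.
warehouse.execute("ALTER TABLE daily_bookings ADD COLUMN avg_sentiment REAL")
discovered = {"2015-06-01": 0.7, "2015-06-02": -0.2}
warehouse.executemany(
    "UPDATE daily_bookings SET avg_sentiment = ? WHERE day = ?",
    [(score, day) for day, score in discovered.items()],
)
warehouse.commit()

# A regular report can now include the new field alongside existing measures.
report = warehouse.execute(
    "SELECT day, bookings, avg_sentiment FROM daily_bookings ORDER BY day"
).fetchall()
print(report)
```

Only the conformed, agreed-upon field moves into the warehouse; the raw tweets stay in the lake for further discovery.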
Demo Recap
A comparison of what we accomplished using a data lake:
Data Lake
• Centralized access to heterogeneous data
• Powerful data transformations
• Easily join data sets together
• Ability to visualize fields within moments of upload
• Garner insights into data without significant time investment
• Maintain data integrity
Microsoft Excel
• Local access to homogeneous data
• Slow data transformations, data loaded onto local machine
• Tedious joining of data sets
• Visualizations must be built and configured for new data sets
• Gathering data insights may involve a notable amount of staff time
• Loss of data governance and integrity
Next Steps
So What Now?
1. Let Ranzal help your organization understand how best to move forward with an “Analytics Roadmap”
2. Start small with your data lake. Let Ranzal implement the first solution to deliver real ROI. This is often Infrastructure Replacement, Active Archive, and/or ETL Offload
Contact Information
Edgewater Ranzal
108 Corporate Park Drive, Suite 105
White Plains, NY 10604
Tel (914) 253-6600
Email: info@ranzal.com
45 Beech Street, Suite 109
London EC2Y 8AD
United Kingdom
Tel +44 (0) 2033 717 174
130 S. Jefferson St.
Suite 101
Chicago, IL 60661
Tel (847) 269-3524
200 Harvard Mill Square
Suite 210
Wakefield, MA 01880
Tel (781) 246-3343

BDD Data Lake Demo

  • 1. NASDAQ: EDGW Business Analytics Solutions Start Here Integrated EPM, BI, and Big Data Solutions
  • 2. Why is Microsoft Excel the most commonly used BI tool in the world?
  • 3. Excel is Familiar: Everyone's an “expert”. Industry standard for spreadsheets; 750 million users worldwide; over 30 years old. How many Excel “experts” does your organization have?
  • 4. Ultimately, Excel Puts Analysts in Control: “Show me the data and I’ll know it when I see it.” It is not just about data consumption, but about data consumption and contribution. Analysts need to develop their own “personal” data modification techniques and mashups. Business analysts don’t know how to provide reporting requirements until they get their hands on the data.
  • 5. But Problems Arise: Despite Excel’s utility for analysts, three primary issues exist: no data variety, no data volume, and no data governance.
  • 6. Excel Doesn’t Accommodate Variety: By the time you count to 60, more than 204 million emails will be sent, over 2 million Google search queries will be performed, 684,000 pieces of content will be shared on Facebook, more than 100,000 tweets will be sent, and billions of new sensor data points will be detected. This data will be structured, semi-structured, and completely unstructured.
  • 7. Excel Doesn’t Accommodate Volume: Most companies in the US have at least 100,000 GB of data stored. The world’s information now doubles every year and a half, enterprise data will grow 650% in the next five years, and 43 trillion GB will be created by 2020. Meanwhile, Excel is limited to just over 1 million rows (1,048,576 per worksheet).
  • 8. Excel Doesn’t Allow for Governance: IT wants governance; the business wants control. Spreadsheets give analysts control of the data, but security and integrity are lost as multiple “versions” of data are created. A data warehouse is designed to provide a single version of truth for analysts and to facilitate governance.
  • 9. Is Your Current Warehouse the Solution? While a traditional warehouse may be able to handle expected volumes, it can’t support rapid data development and ad hoc analysis, answer unknown questions, or quickly integrate new or unstructured data sources. [Diagram labels: CRM, ERP, etc.; ETL; Data Warehouse; Reporting]
  • 10. A New Approach Is Required: to give analysts control and access to data; to accommodate increased data variety; to scale your analytical capabilities; to complement the existing solutions; to create a centralized governed repository.
  • 11. Enable Ad-Hoc Analysis for the Business: What your business has is Traditional Reporting: trusted KPIs, historic data, scheduled reports, homogeneous data, pixel-perfect output. What your business needs is Ad-Hoc Analysis: heterogeneous data, massive compute, ad-hoc analysis, a centralized repository, advanced transforms. [Quadrant labels: Questions You’re Asking; Questions You’re Not Asking; Things you know; Things you don’t know]
  • 12. Enable Discovery Before Reporting. [Diagram labels: New Data Sources; Existing Data Sources (CRM, ERP); Copy/Ingest; Data Lake; Ad-Hoc Analysis; Conform; Archive; Data Warehouse; Reporting]
  • 13. Step 1 - Fill the Lake: Load all types of existing data into the lake “as is”. Data Variety (incorporate new data sources): social media, transactions, unstructured data, sensor data, “as-is” data. Centralized Repository (one centralized repository): eliminates data silos, improves data integration, promotes data governance.
  • 14. Step 2 - Add a Discovery Layer: Give analysts control and access to the data. Analyst Control: total autonomy, ad-hoc analysis, personalized mash-ups, a single version of the “truth”. Software Agnostic (select a data discovery tool that is right for your business): Oracle Big Data Discovery, Datameer, Platfora, open source. Read the fine print: be wary of tools that promise ad-hoc analysis but only enable data consumption or visualization.
  • 15. Step 3 - Graduate to the Warehouse: Augment existing solutions. Migrate crucial insights to the warehouse; leverage existing reports or create new ones; archive back into the data lake; identify data quality issues quickly; build transforms at massive scale. Lake + Warehouse: quicker time-to-value, more data, more capability.
  • 16. The Bigger Picture. Scalable Storage and Compute: introduce a repository that can house all your organization’s data, at scale, with no risk of data loss. New Advanced Analytics: lay the foundation for new “untapped” analytical capabilities like predictive, machine learning, search, and real-time alerting. Tech Replacement: over time, reduce the size and cost of your warehouse by re-platforming some reporting onto the data lake. Massive Transform Capabilities: deliver powerful, performant transforms leveraging the massive compute power of the data lake.
  • 17. How might this play out in the “real world”? Scenario: Flipflops Resort is located in the heart of the Caribbean and is a popular tourist destination. Their marketing team would like to better understand the impact of social sentiment on sales.
  • 18. Current Strategy: Currently, Flipflops uses database file dumps in Excel format to gather insights. This can be very time-consuming and does not promote the inclusion of new data sources.
  • 19. Discovery Starts With a Question: “Does our resort’s weather impact social media sentiment?”
  • 20. Limitations of Current Practices: We need to ingest data from sources and formats that may not be structured in a spreadsheet-friendly way. Obtaining this data can be a labor-intensive process.
  • 22. Outgrowing Excel: It's clear that Excel does not handle semi-structured data well and doesn’t support unstructured data at all. Attempting to draw insights from this data, or joining additional data sources to find correlations, would be difficult at best. This is where we can use discovery and the data lake to answer our question.
  • 23. Piping Outside Data to the Lake: We’re focused on social media sentiment, so let’s grab some tweets and weather data and put them into our lake. [Diagram labels: New Data Sources; Data Lake; Ad-Hoc Analysis; Data Warehouse; Reporting]
  • 24. Piping More Data to the Lake: Additionally, let’s leverage existing marketing and booking data to help answer our question. [Diagram labels: Existing Data Sources; Data Lake; Ad-Hoc Analysis; Data Warehouse; Reporting]
  • 25. Hue Lake View: A view of our data lake through a web interface called Hue. Note the variety of file types that can be stored.
  • 26. Analysis on Top of the Lake: We are now ready to start our discovery phase and will use an analytical tool on top of our lake to visualize any insights. [Diagram labels: New Data Sources; Existing Data Sources; Data Lake; Ad-Hoc Analysis; Data Warehouse; Reporting]
  • 27. Diving into the Lake: With a variety of both open source and proprietary tools available, we can quickly view our data and gather potential insights.
  • 28. Options for Discovery: There are many different ways to analyze the data in the lake. [Diagram label: Ad-Hoc Analysis]
  • 30. Incorporating New Insights: Any insights we discover could be included in a traditional data warehouse and integrated into regular reporting. [Diagram labels: Data Lake; New Data Fields/Sources; Data Warehouse; Reporting]
  • 31. Demo Recap: A comparison of what we accomplished using a data lake with what Microsoft Excel offers. Data Lake: centralized access to heterogeneous data; powerful data transformations; easy joining of data sets; ability to visualize fields within moments of upload; insights garnered without significant time investment; data integrity maintained. Microsoft Excel: local access to homogeneous data; slow data transformations, with data loaded onto the local machine; tedious joining of data sets; visualizations must be built and configured for new data sets; gathering insights may involve a notable amount of staff time; loss of data governance and integrity.
  • 32. Next Steps: So what now? (1) Let Ranzal help your organization understand how to best move forward with an “Analytics Roadmap”. (2) Start small with your data lake. Let Ranzal implement the first solution to deliver real ROI. This is often Infrastructure Replacement, Active Archive, and/or ETL Offload.
  • 33. Contact Information: Edgewater Ranzal, 108 Corporate Park Drive, Suite 105, White Plains, NY 10604, Tel (914) 253-6600, Email: info@ranzal.com; 45 Beech Street, Suite 109, London EC2Y 8AD, United Kingdom, Tel +44 (0) 2033 717 174; 130 S. Jefferson St., Suite 101, Chicago, IL 60661, Tel (847) 269-3524; 200 Harvard Mill Square, Suite 210, Wakefield, MA 01880, Tel (781) 246-3343
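
The demo's discovery question (“Does our resort’s weather impact social media sentiment?”) comes down to joining two data sets from the lake by date and checking whether they move together. The deck does not show how its discovery tool does this, so the following is only a minimal Python sketch under stated assumptions: the field names (`date`, `sentiment`, `rain_mm`), the sentiment scale, and every sample value are made up for illustration, and a Pearson correlation stands in for whatever analysis the tool actually performs.

```python
from collections import defaultdict
from math import sqrt

# Made-up sample records standing in for files in the lake.
# Field names and values are hypothetical, not taken from the demo.
tweets = [
    {"date": "day-1", "sentiment": 0.8},
    {"date": "day-1", "sentiment": 0.6},
    {"date": "day-2", "sentiment": -0.2},
    {"date": "day-3", "sentiment": 0.4},
    {"date": "day-4", "sentiment": -0.6},
]
weather = [
    {"date": "day-1", "rain_mm": 0.0},
    {"date": "day-2", "rain_mm": 12.0},
    {"date": "day-3", "rain_mm": 2.0},
    {"date": "day-4", "rain_mm": 20.0},
    {"date": "day-5", "rain_mm": 5.0},  # no tweets that day, dropped by the join
]


def daily_mean_sentiment(rows):
    """Average the per-tweet sentiment scores for each date."""
    totals = defaultdict(lambda: [0.0, 0])  # date -> [sum, count]
    for row in rows:
        entry = totals[row["date"]]
        entry[0] += row["sentiment"]
        entry[1] += 1
    return {date: total / count for date, (total, count) in totals.items()}


def join_by_date(sentiment_by_day, weather_rows):
    """Inner-join weather and sentiment on date, returning (rain, sentiment) pairs."""
    return [
        (row["rain_mm"], sentiment_by_day[row["date"]])
        for row in weather_rows
        if row["date"] in sentiment_by_day
    ]


def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


pairs = join_by_date(daily_mean_sentiment(tweets), weather)
r = pearson(pairs)
print(f"days joined: {len(pairs)}, rain vs. sentiment correlation: {r:.2f}")
```

A coefficient near -1 or +1 on real data would suggest that weather and sentiment move together and that the relationship is worth graduating to the warehouse for regular reporting; a value near 0 would suggest looking at other sources. Correlation alone does not show that weather causes the change in sentiment.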