Strategic Advisory
Big Data – Cloud – Analytics
InfoStrategy
Fishing in the big data lake
DATA EXPLORATION AND DISCOVERY ANALYTICS FOR DEEPER BUSINESS INSIGHTS

What is a “data lake”?
data lake (plural data lakes): A massive, easily accessible data repository built on (relatively) inexpensive computer hardware for storing "big data". Unlike data marts, which are optimized for data analysis by storing only some attributes and dropping data below the level of aggregation, a data lake is designed to retain all attributes, especially when you do not yet know what the scope of the data or its use will be.
Source: http://en.wiktionary.org/wiki/data_lake
… Enterprise Data Hub sounds too boring!
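The contrast in the definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sales records, field names and the "revenue by channel" question are hypothetical and do not come from the deck.

```python
from collections import defaultdict

# Hypothetical detailed records, as an operational system might emit them.
raw_records = [
    {"order_id": 1, "region": "QLD", "product": "A", "channel": "web", "amount": 120.0},
    {"order_id": 2, "region": "QLD", "product": "B", "channel": "store", "amount": 80.0},
    {"order_id": 3, "region": "NSW", "product": "A", "channel": "web", "amount": 50.0},
]

# Data mart style: keep only the chosen attribute (region) and store the
# aggregate, so everything below that level of aggregation is dropped.
mart = defaultdict(float)
for record in raw_records:
    mart[record["region"]] += record["amount"]

# Data lake style: retain every record with all of its attributes.
lake = list(raw_records)

# A question nobody planned for (revenue by channel) can still be answered
# from the lake, but not from the mart, which no longer holds "channel".
by_channel = defaultdict(float)
for record in lake:
    by_channel[record["channel"]] += record["amount"]

print(dict(mart))
print(dict(by_channel))
```

The mart answers the question it was designed for cheaply, while the lake keeps the detail needed for questions that have not been asked yet.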

Optimise business through insights
[Diagram: Insight, Action, Optimise. Labels: Move a metric; Change a product; Change behaviour/process; Hindsight; Realtime; Foresight; Trusted information; Act on insights gained; Execute theories; Measure; Outcomes; Sentiment; Feedback; Explore datasets, discover correlations, patterns; Undiscovered facts.]
[Chart: Information Value against Data Volumes. Layers: Forecasting, planning & trending; Statistical Analysis; Operational reporting, SCADA control; Alerts & Events; Historical reporting, Proof of operation; Regulatory, statutory, financial.]
Uncover previously unknown facts from enriched data in the data lake.

Future state of analytics
Strategic Intent
To improve BI and analytical capabilities to a level where organisations are able to access and analyse information in a secure, timely and cost-effective manner. Gain key insights to optimise the operations of your business, predict the best possible outcomes for growth, new opportunities, and competitive advantage across all business lines.
Mission Statement
“Providing advanced analytics capability across all business units, empowering our people with the processes and supporting technologies to exploit our information assets for business benefit.”
Target Operating Model will deliver:
◦ Rapid access to data to uncover new facts via advanced data exploration and discovery analytics.
◦ Clarity of who is responsible and accountable for maintaining critical information assets via a well-structured governance and engagement model.
◦ A trusted and highly secure source of data for all analytical information requirements via a data quality assurance program.
Trawling for value in the big data lake

‘Fish stocks’ are replenished from existing and future operational systems plus external sources
[Diagram labels:
◦ Data sources: Core Transactional Data (“operational”); Unstructured & External Data (“contextual”); Supplier & Industry Data (“comparative”).
◦ Production Data Repository (“Data Lake”); Information Governance; Data Management.
◦ Management Reporting: Enterprise Dashboards; Reporting; Consolidation.
◦ Discovery Analytics Platform: Data Extraction; Data Collection; Data Preparation; Analysis; Visualisation.
◦ Operational Reporting: Operational Dashboards; Real-time Reports; Alerts & Exceptions; Embedded BI.
◦ Audiences: Data Scientists; Business Analysts; Business Users; Customers.]

To meet the demand for rapid access to information, users must adopt a flexible multi-platform architecture
What reporting does for established operations … discovery analytics does for new business development.
The trend within industry is to move away from single-platform, monolithic data warehouses towards a physically distributed environment for information delivery. Many businesses are extending their data warehouse environments to include new standalone data platforms that are conducive to discovery analytics. A holistic view is maintained via a common, single replicated dataset and an enterprise information management program governing delivery of and access to key information (data lake).
[Architecture diagram labels:
◦ Capability bands: Consolidated Management Reporting; Operational Supporting Capability; Discovery Analytics.
◦ Source Applications: ERP; CRM; HR; Finance; Telemetry; Geospatial GIS; Documents; Email; Files; External Data.
◦ Data movement: Real-time Data Capture; Cleansing; Loading; Data Replication.
◦ Data Warehouse: Modelling; Relational DW; Data Marts; Analysis Cubes; Dimensional Modelling.
◦ Analytics Delivery (Cloud-based Service Model): Actuarial Applications; Event-Based Applications; Reporting; Production Reporting; OLAP Analytics; Ad Hoc Query; Exploration & Discovery.
◦ Flows: Metadata Integration; Event Processing; Detailed Datasets; Collection and blending; Historical; Data Preparation; Results; Insights.
◦ Delivery channels: Portal; PDF; Desktop; Guided Visualisation; Mobile BI; Active Dashboards; Storytelling.
◦ Other: Information Governance; Operational Reporting; Productionise Insights.]

Principles: Easier access to information to discover new facts about the business
Provide business users with the ability to access production information directly, collect it as needed, and prepare the data for analysis. Explore the data to uncover previously unknown facts about the business, and share those facts visually with others. Enrich production data with external “context” to extend insights.
Key Principles
◦ Providing business users with direct access to data to meet immediate information needs where the accuracy of the data is not the primary objective.
◦ Having a single source of truth across all business applications at detailed level from which all information requests are satisfied.
◦ Improved environment for more cost-effective and faster business intelligence delivery.
Description
◦ Described as a ‘sandpit’ environment, providing the ability to explore and discover new facts about the business, its members and customers, partners and competitive pressures.
◦ Also used for testing a hypothesis or running scenarios across the data.
◦ Getting answers to ‘one-off’ questions which are not addressed through the normal published, scheduled operational reporting channels.
◦ Data is replicated from all operational systems into a single landing area, ensuring traceability and reconciliation to all consuming applications, such as the data warehouse, analytical applications, and other business applications.
◦ Clearly defined critical business entities/records are synchronised (or Mastered) across all applications, eliminating duplication and confusion. Data quality attributes are defined and managed for each critical business entity.
◦ A fully integrated Member/Customer view is established across both analytical and transactional applications.
◦ Using the replicated data to build more dynamic analytical data structures for scheduled production reporting and ad hoc analysis.
◦ Provide users with the tools to access and analyse data, freely explore current and new datasets, and visualise patterns and discoveries to gain deep insights.
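The principle of replicating operational data into a single landing area "ensuring traceability and reconciliation" can be illustrated with a small check. This is a minimal sketch, not the method used in the deck: the member records, the `fingerprint` and `reconcile` helpers, and the choice of a row count plus an order-independent SHA-256 checksum are all assumptions made for illustration.

```python
import hashlib


def fingerprint(rows):
    """Return (row count, order-independent checksum) for a list of row dicts."""
    digests = sorted(
        hashlib.sha256(repr(sorted(row.items())).encode("utf-8")).hexdigest()
        for row in rows
    )
    combined = hashlib.sha256("".join(digests).encode("utf-8")).hexdigest()
    return len(rows), combined


def reconcile(source, landing):
    """True when the landing copy holds exactly the same rows as the source."""
    return fingerprint(source) == fingerprint(landing)


# Hypothetical rows in an operational system.
source_rows = [
    {"member_id": 1, "name": "Ann", "balance": 100},
    {"member_id": 2, "name": "Bob", "balance": 250},
]

# Replicated copy in the landing area; row order may differ after loading.
landing_rows = [dict(row) for row in reversed(source_rows)]
matches = reconcile(source_rows, landing_rows)

# Simulate drift in the replica: the check should now fail.
landing_rows[0]["balance"] = 999
drifted = reconcile(source_rows, landing_rows)

print(matches, drifted)
```

Running a check like this after each replication run gives consuming applications a simple signal that the landing area still reconciles to its source.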

Benefits of Discovery Analytics versus traditional data warehousing
Classic Data Warehouse Issues and the corresponding Discovery Analytics Benefit:
◦ Issue: Lengthy IT backlog and lack of resources to extend the EDW to support new business requirements.
  Benefit: Data can be explored and analysed outside of the EDW environment before it is put into production use.
◦ Issue: High costs of supporting increasing data volumes and new types of data.
  Benefit: Data can be filtered and transformed before it is loaded into the EDW.
◦ Issue: Lack of flexibility in the EDW data model to support constantly changing business requirements.
  Benefit: Data discovery supports a dynamic schema-on-read approach, which reduces the need for detailed up-front modelling.
◦ Issue: Need to have data quality and governance processes in place before users can access the EDW data.
  Benefit: The investigative nature of data discovery has lower data quality and governance requirements.
◦ Issue: Growing use of personal data marts to overcome IT barriers and the performance overheads of ad hoc processing.
  Benefit: The flexibility and performance of data discovery encourages shared use of data and analytics.
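The schema-on-read approach named in the benefits above can be sketched as follows. This is an illustrative example under stated assumptions: the meter events, the `READ_SCHEMA` mapping and the `read_with_schema` helper are hypothetical, and real discovery platforms apply the same idea with their own tooling.

```python
import json

# Hypothetical raw events as they might land in the lake: newline-delimited
# JSON with no schema enforced on write, so records can differ in shape.
RAW_EVENTS = """\
{"meter_id": "M-1", "reading": "12.5", "ts": "2015-03-01T00:00:00"}
{"meter_id": "M-2", "reading": "7", "ts": "2015-03-01T00:00:00", "site": "north"}
{"meter_id": "M-1", "ts": "2015-03-01T01:00:00"}
"""

# The schema lives with the query, not with the storage:
# column name -> (converter, default used when the field is absent).
READ_SCHEMA = {
    "meter_id": (str, None),
    "reading": (float, None),
    "site": (str, "unknown"),
}


def read_with_schema(raw_text, schema):
    """Parse each raw record and project it onto the schema at read time."""
    rows = []
    for line in raw_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        row = {}
        for column, (convert, default) in schema.items():
            value = record.get(column)
            row[column] = convert(value) if value is not None else default
        rows.append(row)
    return rows


rows = read_with_schema(RAW_EVENTS, READ_SCHEMA)
total_reading = sum(row["reading"] for row in rows if row["reading"] is not None)
print(rows)
print(total_reading)
```

Because the raw text is stored untouched, a later analysis can read the same events with a different schema (for example, adding `ts`) without reloading anything.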
A recent proof of concept (POC) for Discovery Analytics in the cloud (AWS) delivered considerable cost and time savings in infrastructure and hosting, viz.:
◦ $55 per day to host a 960 GB data warehouse.
◦ $32 per day to host a data integration server and a BI server.
◦ 2.5 weeks to set up the POC environment and start analysing and visualising results.
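To put the daily hosting figures in context, the arithmetic below rolls them up to monthly and yearly totals. Only the $55, $32 and 960 GB figures come from the slide; the always-on hosting, the 30-day month and the 365-day year are assumptions made for this sketch.

```python
# Daily hosting figures quoted for the proof of concept.
WAREHOUSE_PER_DAY = 55.0   # dollars per day for the 960 GB data warehouse
SERVERS_PER_DAY = 32.0     # dollars per day for the data integration and BI servers
WAREHOUSE_SIZE_GB = 960

# Assumptions (not from the slide): always-on hosting, 30-day month, 365-day year.
DAYS_PER_MONTH = 30
DAYS_PER_YEAR = 365

daily_total = WAREHOUSE_PER_DAY + SERVERS_PER_DAY
monthly_total = daily_total * DAYS_PER_MONTH
yearly_total = daily_total * DAYS_PER_YEAR
warehouse_per_gb_day = WAREHOUSE_PER_DAY / WAREHOUSE_SIZE_GB

print(f"Combined hosting: ${daily_total:,.2f} per day")        # $87.00
print(f"Approximate month: ${monthly_total:,.2f}")             # $2,610.00
print(f"Approximate year: ${yearly_total:,.2f}")               # $31,755.00
print(f"Warehouse: ${warehouse_per_gb_day:.4f} per GB per day")  # $0.0573
```

Under these assumptions the whole POC environment costs $87 per day, or roughly $2,610 for a 30-day month.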

Discovery Analytics Target POC Architecture
[Architecture diagram. Labels: Structured Data; Unstructured Data; ERP; Telemetry; Web/External.]
Replication of corporate data, enriched with external data and content, held in a centrally available, scalable repository ready for exploration, discovery and predictive analysis to gain deep insights and actionable results.

Fishing safely with the appropriate life vests is important too.
Security and data management standards are available:
◦ International Standard on Assurance Engagements
◦ Service Organisation Control framework
◦ Federal Information Security Management Act
◦ Payment Card Industry – Data Security Standard
◦ Federal Information Processing Standard
◦ International Standards Organisation – Information Security Standard
Source: Amazon Web Services

To learn more about how InfoStrategy can help you develop your big data strategy to solve your big business problems, or to arrange a Proof of Concept, please contact us today using the details below.
InfoStrategy Pty Ltd
246 Oxford St, Balmoral
Queensland 4171
Australia
Tel: +61 7 3151 2021
Email: contactus@infostrategy.com.au

Data lake benefits

  • 1. Strategic  Advisory Big  Data  – Cloud   -­‐ Analytics Info Strategy Fishing  in  the   big  data  lake DATA  EXPLORATION  AND  DISCOVERY  ANALYTICS   FOR  DEEPER  BUSINESS  INSIGHTS
  • 2. InfoStrategy What  is  a  “data  lake” data  lake (plural data  lakes) A  massive,  easily  accessible  data  repository   built  on  (relatively)  inexpensive  computer   hardware  for  storing  "big  data".  Unlike  data  marts,   which  are  optimized  for  data  analysis  by  storing  only  some   attributes  and  dropping  data  below  the  level  aggregation,  a   data  lake  is  designed  to  retain  all  attributes,   especially  so  when  you  do  not  yet  know  what  the   scope  of  data  or  its  use  will  be. http://en.wiktionary.org/wiki/data_lake …  Enterprise  Data  Hub  sounds  too  boring   !
  • 3. Optimise business through insights. Uncover previously unknown facts from enriched data in the data lake.
    Diagram labels: Insight; Action; Optimise; Move a metric; Change a product; Change behaviour/process; Hindsight; Realtime; Foresight; Trusted information; Act on insights gained; Execute theories; Measure; Outcomes; Sentiment; Feedback; Explore datasets, discover correlations, patterns; Undiscovered facts; Information Value; Data Volumes; Forecasting, planning & trending; Statistical Analysis; Operational reporting, SCADA control; Alerts & Events; Historical reporting, Proof of operation; Regulatory, statutory, financial.
  • 4. Future state of analytics: Trawling for value in the big data lake.
    Strategic Intent: To improve BI and analytical capabilities to a level where organisations are able to access and analyse information in a secure, timely and cost-effective manner. Gain key insights to optimise the operations of your business, predict the best possible outcomes for growth, new opportunities, and competitive advantage across all business lines.
    Mission Statement: “Providing advanced analytics capability across all business units, empowering our people with the processes and supporting technologies to exploit our information assets for business benefit.”
    Target Operating Model will deliver:
    ◦ Rapid access to data to uncover new facts via advanced data exploration and discovery analytics.
    ◦ Clarity of who is responsible and accountable for maintaining critical information assets via a well-structured governance and engagement model.
    ◦ A trusted and highly secure source of data for all analytical information requirements via a data quality assurance program.
  • 5. ‘Fish stocks’ are replenished from existing and future operational systems plus external sources.
    Diagram labels: Core Transactional Data “operational”; Management Reporting; Unstructured & External Data “contextual”; Enterprise Dashboards; Reporting; Consolidation; Data Scientists; Business Analysts; Business Users; Customers; Data Extraction; Discovery Analytics Platform; Visualisation; Analysis; Data Preparation; Data Collection; Operational Reporting; Operational Dashboards; Real-time Reports; Alerts & Exceptions; Embedded BI; Production Data Repository “Data Lake”; Information Governance; Data Management; Supplier & Industry Data “comparative”.
  • 6. To meet the demand for rapid access to information, users must adopt a flexible multi-platform architecture.
    What reporting does for established operations … discovery analytics does for new business development. The trend within industry is to move away from single-platform monolithic data warehouses towards a physically distributed environment for information delivery. Many businesses are extending their data warehouse environments to include new standalone data platforms that are conducive to discovery analytics. A holistic view is maintained via a common, single replicated dataset and an enterprise information management program, governing delivery and access to key information (data lake).
    Diagram labels: Consolidated Management Reporting Operational Supporting Capability Discovery Analytics; Source Applications; ERP; CRM; HR; Finance; Telemetry; Geospatial GIS; Documents; Email; Files; Real-time Data Capture; Cleansing; Loading; Data Warehouse; Modelling; Relational DW; Data Marts; Analysis Cubes; Analytics; Delivery; Cloud-based Service Model; Actuarial Applications; Event-Based Applications; Reporting; Production Reporting; OLAP Analytics; Ad Hoc Query; External Data; Exploration & Discovery; Metadata Integration; Event Processing; Results; Detailed Datasets; Results Collection and blending; Insights; Portal; PDF; Desktop; Guided Visualisation; Mobile BI; Active Dashboards; Data Replication; Historical; Data Preparation; Storytelling; Information Governance; Operational Reporting Dimensional Modelling; Productionise Insights.
  • 7. Principles: Easier access to information to discover new facts about the business.
    Provide business users with the ability to access production information directly, collect it as needed, and prepare the data for analysis. Explore the data to uncover previously unknown facts about the business, and share those facts visually with others. Enrich production data with external “context” to extend insights.
    Table headings: Key Principles; Description.
    ◦ Described as a ‘sandpit’ environment, providing the ability to explore and discover new facts about the business, its members and customers, partners and competitive pressures.
    ◦ Also used for testing a hypothesis or running scenarios across the data.
    ◦ Getting answers to ‘one-off’ questions which are not addressed through the normal published, scheduled operational reporting channels.
    ◦ Data is replicated from all operational systems into a single landing area, ensuring traceability and reconciliation to all consuming applications, such as the data warehouse, analytical applications, and other business applications.
    ◦ Clearly defined critical business entities/records are synchronised (or Mastered) across all applications, eliminating duplication and confusion. Data quality attributes are defined and managed for each critical business entity.
    ◦ A fully integrated Member/Customer view is established across both analytical and transactional applications.
    ◦ Using the replicated data to build more dynamic analytical data structures for scheduled production reporting and ad hoc analysis.
    ◦ Provide users with the tools to access and analyse data, freely explore current and new datasets, and visualise patterns and discoveries to gain deep insights.
    Summary statements:
    ◦ Providing business users with direct access to data to meet immediate information needs where the accuracy of the data is not the primary objective.
    ◦ Having a single source of truth across all business applications at detailed level from which all information requests are satisfied.
    ◦ Improved environment for more cost-effective and faster business intelligence delivery.
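The reconciliation principle on this slide (data replicated into a single landing area must be traceable back to its source) can be illustrated with a minimal Python sketch. It is not the deck's method: the record layout, the function names `fingerprint` and `reconcile`, and the use of a summed SHA-256 checksum are all assumptions chosen for illustration.

```python
import hashlib

def fingerprint(rows):
    """Return (row_count, order-independent checksum) for a list of dict records."""
    total = 0
    for row in rows:
        # Canonical text form: fields sorted by name so key order does not matter.
        canonical = "|".join(f"{key}={row[key]}" for key in sorted(row))
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        # Summing (mod 2**256) keeps the result independent of row order.
        total = (total + int(digest, 16)) % (2 ** 256)
    return len(rows), total

def reconcile(source_rows, landed_rows):
    """True when the landed copy matches the source extract row for row."""
    return fingerprint(source_rows) == fingerprint(landed_rows)

# Hypothetical extract from an operational system and its replicated copy.
source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
landed = [{"amount": 20, "id": 2}, {"amount": 10, "id": 1}]
matches = reconcile(source, landed)
```

A check like this only shows that the two copies agree at load time; it says nothing about whether the source values themselves are correct, which is the job of the data quality attributes described above.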
  • 8. Benefits of Discovery Analytics versus traditional data warehousing.
    ◦ Classic data warehouse issue: Lengthy IT backlog and lack of resources to extend the EDW to support new business requirements. Discovery analytics benefit: Data can be explored and analysed outside of the EDW environment before it is put into production use.
    ◦ Classic data warehouse issue: High costs of supporting increasing data volumes and new types of data. Discovery analytics benefit: Data can be filtered and transformed before it is loaded into the EDW.
    ◦ Classic data warehouse issue: Lack of flexibility in the EDW data model to support constantly changing business requirements. Discovery analytics benefit: Data discovery supports a dynamic schema-on-read approach, which reduces the need for detailed up-front modelling.
    ◦ Classic data warehouse issue: Need to have data quality and governance processes in place before users can access the EDW data. Discovery analytics benefit: The investigative nature of data discovery has lower data quality and governance requirements.
    ◦ Classic data warehouse issue: Growing use of personal data marts to overcome IT barriers and the performance overheads of ad hoc processing. Discovery analytics benefit: The flexibility and performance of data discovery encourages shared use of data and analytics.
    A recent proof of concept for Discovery Analytics in the cloud (AWS) has provided considerable cost and time savings in infrastructure and hosting, viz.:
    ◦ $55 per day to host a 960GB data warehouse.
    ◦ $32 per day to host a Data Integration server AND a BI server.
    ◦ 2.5 weeks to set up the POC environment and start analysing and visualising results.
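The schema-on-read benefit named on this slide can be sketched in a few lines of Python. This is an illustrative sketch only: the sample records, field names, and the helper `read_with_schema` are hypothetical and do not come from the deck. The idea shown is that raw records are stored unmodified and a schema is applied only when the data is read, so a new question needs a new schema, not a reload.

```python
import json

# Raw records kept exactly as they arrived; fields vary from record to record.
RAW_EVENTS = [
    '{"meter_id": "M1", "reading": "12.5", "unit": "kL"}',
    '{"meter_id": "M2", "reading": "7", "firmware": "2.1"}',
    '{"meter_id": "M3"}',
]

def read_with_schema(raw_lines, schema):
    """Project raw JSON lines onto `schema` (field name -> cast function) at read time."""
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: (cast(record[field]) if field in record else None)
            for field, cast in schema.items()
        }

# Two different "schemas" applied to the same stored data, with no reload.
readings = list(read_with_schema(RAW_EVENTS, {"meter_id": str, "reading": float}))
firmware = list(read_with_schema(RAW_EVENTS, {"meter_id": str, "firmware": str}))
```

Missing fields come back as `None` rather than causing a load failure, which is the trade-off the slide points to: less up-front modelling, at the cost of weaker guarantees about data quality.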
  • 9. Discovery Analytics Target POC Architecture. Replication of corporate data, enriched with external data and content, held in a central, scalable repository ready for exploration, discovery and predictive analysis to gain deep insights and actionable results.
    Diagram labels: Structured Data; Unstructured Data; ERP; Telemetry; Web/External.
  • 10. Fishing safely with the appropriate life vests is important too. Security and data management standards are available:
    ◦ International Standard on Assurance Engagements
    ◦ Service Organisation Control framework
    ◦ Federal Information Security Management Act
    ◦ Payment Card Industry – Data Security Standard
    ◦ Federal Information Processing Standard
    ◦ International Organization for Standardization (ISO) – Information Security Standard
    Source: Amazon Web Services
  • 11. To learn more about how InfoStrategy can help you develop your big data strategy to solve your big business problems, or to arrange a Proof of Concept, please contact us today using the details below. InfoStrategy Pty Ltd, 246 Oxford St, Balmoral, Queensland 4171, Australia. Tel: +61 7 3151 2021. Email: contactus@infostrategy.com.au