Modernising Data Warehousing
In the age of Big Data Analytics
Phil Watt
21st January 2019
Phil Watt
Bio
Phil is a Director in the Escient Victoria Consulting Team
with more than 25 years in large scale enterprise analytics
and integrated data management programmes. His focus is
in the journey to scale business programmes from small,
proof of concept initiatives through to operational company-
wide solutions with a high strategic impact. He has deep
experience in applying business analytics in the CME and
FS sectors in Western Europe and South Pacific, including
global technology leadership roles for Fortune 500
companies. After leading the definition of the technology
components of a State Government data reform strategy,
he now leads the technology implementation and business
alignment of three of its key foundation programmes.
Disclaimer
All views expressed are my own and may not represent the
opinions of any entity whatsoever with whom I have been, am
now, or will be affiliated.
Outline
Why have a data warehouse?
Why modernise your data warehouse?
Design Principles for a modern data warehouse
Cloud and Big Data
Patterns
Value from integrated data is proportional to the number of users
‘Build it and they will come’ is not a good strategy
Why Modernise?
The modernisation business case is likely to involve a mixture of:
• New capability
• Better query performance
• Lower data latency (data freshness)
• Lower support / Opex costs
• Higher developer / end user productivity
• Faster implementation of new data / requirements
• Risk reduction (stack out of support, security concerns, skills availability)
Your biggest costs are likely to be labour – not software or infrastructure
• Developer productivity
• Maintenance (number of operations and support staff)
• End user productivity
Avoid Appeal to Tradition & Sunk Cost Fallacies
Incumbent vendors may encourage you to stick with current ‘best practice’ or
suggest you have too much invested in the current platform
https://en.wikipedia.org/wiki/Appeal_to_tradition
Avoid the Appeal to Novelty Fallacy
Vendors often use Appeal to Novelty (shiny-shiny is better than old-fangled…)
to upsell or get in the door
Remember: If it ain’t broke, don’t fix it
https://en.wikipedia.org/wiki/Appeal_to_novelty
Design Principles
Carefully Choose Your Design Principles (Samples below)
#  Principle: Description
1  Climb the Stack: SaaS | PaaS | IaaS | Metal. Compose higher order solutions from components. as-a-Service allows outsourcing of lower level components.
2  Connect People to Data: While transactional business systems are designed to prevent direct access to data, analytics systems are designed to enable a connection to data.
3  Privacy by Design: Information privacy and governance are included from the start of system design, on par with system functionality.
4  Scalable Day 1: Capable of distributed scale-out from day 1.
5  Open Innovation: Innovation in data and analytics capabilities is being driven by open collaboration on algorithms and open source software.
6  Pipeline of Parts: Data processing and pipeline components must have clear boundaries & hand-off points.
7  Reuse over Rebuild: Reuse and extend components - design and build them in re-usable ways. Use DRY (Don’t Repeat Yourself) code versus WET (Write Every Time) code.
8  Repeatable over Recoverable: Service continuity driven by repeatability and automation over backup/restore.
9  Everything Testable: All components must be verifiable via test automation.
10 Know your Data: Ensure a solid understanding of the data – including how it was collected (& why), data definitions, data quality, transformation rules and lineage, and operational metadata.
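As an illustration of principles 6, 7 and 9 (Pipeline of Parts, Reuse over Rebuild, Everything Testable), here is a minimal Python sketch. It is not taken from the deck: the step names, the `compose` helper and the sample customer record are all hypothetical, chosen only to show small parameterised steps with clear hand-off points and an automated check.

```python
from typing import Callable

Record = dict
Step = Callable[[Record], Record]


def strip_whitespace(record: Record) -> Record:
    """Trim leading/trailing whitespace from every string value."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def uppercase_field(field: str) -> Step:
    """Build a reusable step that upper-cases one named field.

    DRY: one parameterised step instead of a hand-written copy per field.
    """
    def step(record: Record) -> Record:
        out = dict(record)  # copy so the caller's record is not mutated
        if isinstance(out.get(field), str):
            out[field] = out[field].upper()
        return out
    return step


def compose(*steps: Step) -> Step:
    """Chain steps left to right; each step boundary is a hand-off point."""
    def pipeline(record: Record) -> Record:
        for s in steps:
            record = s(record)
        return record
    return pipeline


# Hypothetical pipeline assembled from the reusable parts above.
clean_customer = compose(strip_whitespace, uppercase_field("state"))


def test_clean_customer() -> None:
    """Automated check, so the component is verifiable on every change."""
    raw = {"name": " Ada ", "state": "vic "}
    assert clean_customer(raw) == {"name": "Ada", "state": "VIC"}
```

Because each step is a plain function, a new pipeline can be assembled from existing steps rather than rewritten, and each step can be tested on its own.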
Cloud encourages an engineering approach
“If a human operator needs to touch your system during normal
operations, you have a bug. The definition of normal changes as
your systems grow.”
Carla Geisser, Google SRE
SRE – Site Reliability Engineering
Toil often has the following characteristics:
• Manual
• Repetitive
• Automatable
• Tactical
• No enduring value
• Effort to do it scales linearly as a service grows
See https://landing.google.com/sre/sre-book/toc/
Tenets of SRE
• Ensuring a Durable Focus on Engineering
• Pursuing Maximum Change Velocity Without
Violating a Service’s SLO
• Monitoring (Alerts, Tickets, Logging)
• Emergency Response
• Change Management
• Demand Forecasting and Capacity Planning
• Provisioning
• Efficiency and Performance
With SRE we work to avoid Toil
SRE
Using Cloud Infrastructure
Enables rapid response to changing business requirements
Reduces the body of technical knowledge you need to maintain internally
Spend time considering security and privacy challenges
• Engage a third party security expert if needed to help with security designs
Best match for the technical design principles above
• Easier access to SaaS and PaaS offerings
Be open to a multi-cloud platform
• Helps convince your cloud provider you have choices
• Take advantage of best of breed capabilities
• Don’t always rely on the cloud vendor’s native offerings – consider third parties to help mitigate stickiness
Cloud may INCREASE your infrastructure costs
• Likely to be offset by increased business responsiveness and richer feature availability
Big Data and Cloud
‘Hadoop’ is much less relevant in the cloud today
• The overhead of HDFS is unnecessary given cloud storage options like AWS S3 or Azure Blob Storage
• Useful data processing services are often packaged in PaaS – avoiding the need to manage complex Hadoop clusters
Design Patterns
Analytical Ecosystem
Based on a diagram by Humza Naseer, University of Melbourne 2019
High Level Patterns Have Hardly Changed for Data Warehouse ETL in the last 15 years
[Diagram: Source (CRM / ERP / Billing, etc.) -> Extract / Access (Get / Put) -> Transform (Clean, Validate, Conform to model) -> Load -> Use/present]
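The stages named in the diagram (extract, clean, validate, conform to model, load) can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the CSV sample, the `dim_customer` table and the rule that a row needs a non-empty name are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real one would come from CRM / ERP / billing.
SOURCE_CSV = (
    "customer_id,name,signup_date\n"
    "1, Ada ,2019-01-21\n"
    "2,,2019-01-22\n"
    "3,Grace,2019-01-23\n"
)


def extract(text):
    """Extract / Access: read source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Clean: trim stray whitespace from every value."""
    return [{k: v.strip() for k, v in row.items()} for row in rows]


def validate(rows):
    """Validate: keep rows with a non-empty name (assumed rule)."""
    return [row for row in rows if row["name"]]


def conform(rows):
    """Conform to model: cast types and order columns for the target table."""
    return [(int(r["customer_id"]), r["name"], r["signup_date"]) for r in rows]


def load(conn, rows):
    """Load: write conformed rows and return the table's row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dim_customer ("
        "customer_id INTEGER PRIMARY KEY, name TEXT, signup_date TEXT)"
    )
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]


def run(conn, text):
    """Wire the stages together in the order shown in the diagram."""
    return load(conn, conform(validate(clean(extract(text)))))
```

Calling `run(sqlite3.connect(":memory:"), SOURCE_CSV)` loads the two valid rows and drops the one with no name.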
But latency requirements have
Batch
Stream
Mini Batch
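One way to picture the three latency modes listed above is that the same transform runs in each, and only the number of records handed over at a time changes. The sketch below is hypothetical: `transform` is a stand-in for real cleansing logic and the function names are illustrative only.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def transform(record: int) -> int:
    """Stand-in for real transformation logic."""
    return record * 2


def run_batch(records: List[int]) -> List[int]:
    """Batch: the whole data set is available and processed in one go."""
    return [transform(r) for r in records]


def run_stream(records: Iterable[int]) -> Iterator[int]:
    """Stream: each record is processed as soon as it arrives."""
    for r in records:
        yield transform(r)


def run_mini_batch(records: Iterable[int], size: int) -> Iterator[List[int]]:
    """Mini batch: records are processed in small fixed-size chunks."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield [transform(r) for r in chunk]
```

Keeping `transform` independent of the delivery mode means a change in latency requirements affects the wiring rather than the business logic.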
And there is a bewildering choice of tools just to get data into the Data Warehouse
ETL
• Talend
• Databricks
• Snaplogic
• etc.
iPaaS
• Informatica
• Dell Boomi
• Mulesoft
• etc.
ELT
• SSIS
• SQL
• Oracle Data Integrator
• etc.
Frameworks
• Bonobo
• Pygrametl
• Apache Airflow
• etc.
Raw code
• Python
• Scala
• Spark
• etc.
Recommendation: Use tools that closely support your
design principles
IO and Query Concurrency Drive Performance and User Experience
Keep these in mind when choosing
• A database / query execution engine
• Where you do your data transformations – e.g. should you separate transformations from user queries?
De-risk tool selection by using
Continuous Integration / Continuous Delivery (CI/CD)
De-risk using a phased approach
CI/CD from day one
Select some core reusable services to use first and do
parallel runs if possible,
e.g. load modules, address cleansing
Deployment – avoid all or nothing ‘big bang’
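A parallel run can be as simple as feeding the same inputs to the old and new implementation of a service and reporting any disagreement before cutting over. The sketch below assumes two hypothetical address-cleansing functions; neither comes from the deck.

```python
from typing import Callable, Iterable, List, Tuple


def legacy_clean_address(raw: str) -> str:
    """Hypothetical existing implementation: upper-case, collapse spaces."""
    return " ".join(raw.upper().split())


def new_clean_address(raw: str) -> str:
    """Hypothetical replacement implementation of the same service."""
    return " ".join(part.upper() for part in raw.split())


def parallel_run(
    inputs: Iterable[str],
    old: Callable[[str], str],
    new: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Run both implementations and return (input, old, new) for each mismatch."""
    mismatches = []
    for x in inputs:
        old_out, new_out = old(x), new(x)
        if old_out != new_out:
            mismatches.append((x, old_out, new_out))
    return mismatches
```

An empty result over a representative sample is evidence that the new service can replace the old one; any mismatches give a concrete list to investigate first.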
Choosing a Data Model Methodology
Inmon (normalized core) – Labrador
• Labradors love being around people and want to be everybody's friend. They are very sociable, intelligent, active, fun-loving animals who are eager to please. They make ideal pets for families with children, and make great watchdogs too. The best possible reference for the breed's docile and reliable nature is the fact that virtually all guide dogs for the blind in Australia are Labrador Retrievers.
Kimball (dimensional core) – Kelpie
• Australian Kelpies are tough, independent, highly intelligent dogs with extreme loyalty and utmost devotion to duty, and have a tractable disposition. Obedient and super alert, the Australian Kelpie is eager to please and makes a devoted companion; however, its inexhaustible energy makes it unsuitable for suburban living.
Data Vault (hubs and satellites) – Chow Chow
• The Chow Chow has a reputation for being a one-man dog and not very tolerant of those it doesn’t know. It can also tend to be willful and hard to train, so it is not a good choice for a weak or new owner. In addition, this dog has a thick coat that it sheds about twice a year. Expect to find fur everywhere during this time.
www.escient.com.au
Phil Watt
Director
phil.watt@escient.com.au
linkedin.com/in/dataphil
Appendix
Dev Stack (excluding ETL choices above…)

End-to-end pipeline agility - Berlin Buzzwords 2024
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 

Modernising the data warehouse - January 2019

  • 1. Modernising Data Warehousing in the age of Big Data Analytics. Phil Watt, 21st January 2019
  • 2. Phil Watt: Bio. Phil is a Director in the Escient Victoria Consulting Team with more than 25 years in large scale enterprise analytics and integrated data management programmes. His focus is on the journey to scale business programmes from small, proof of concept initiatives through to operational company-wide solutions with a high strategic impact. He has deep experience in applying business analytics in the CME and FS sectors in Western Europe and South Pacific, including global technology leadership roles for Fortune 500 companies. After leading the definition of the technology components of a State Government data reform strategy, he now leads the technology implementation and business alignment of three of its key foundation programmes.
  • 3. Disclaimer: All views expressed are my own and may not represent the opinions of any entity whatsoever with whom I have been, am now, or will be affiliated.
  • 4. Outline: Why have a data warehouse?; Why modernise your data warehouse?; Design Principles for a modern data warehouse; Cloud and Big Data; Patterns
  • 5. Value from integrated data is proportional to the number of users
  • 6. ‘Build it and they will come’ is not a good strategy
  • 8. The modernisation business case is likely to involve a mixture of:
      • New capability
      • Better query performance
      • Lower data latency (data freshness)
      • Lower support / Opex costs
      • Higher developer / end user productivity
      • Faster implementation of new data / requirements
      • Risk reduction (stack out of support, security concerns, skills availability)
    Your biggest costs are likely to be labour, not software or infrastructure:
      • Developer productivity
      • Maintenance (number of operations and support staff)
      • End user productivity
  • 9. Avoid Appeal to Tradition & Sunk Cost Fallacies: Incumbent vendors may encourage you to stick with current ‘best practice’ or suggest you have too much invested in the current platform. https://en.wikipedia.org/wiki/Appeal_to_tradition
    Avoid the Appeal to Novelty Fallacy: Vendors often use Appeal to Novelty (shiny-shiny is better than old-fangled…) to upsell or get in the door. Remember: If it ain’t broke, don’t fix it. https://en.wikipedia.org/wiki/Appeal_to_novelty
  • 11. Carefully Choose Your Design Principles (samples below)
      1. Climb the Stack: SaaS | PaaS | IaaS | Metal. Compose higher order solutions from components. as-a-Service allows outsourcing of lower level components.
      2. Connect People to Data: While transactional business systems are designed to prevent direct access to data, analytics systems are designed to enable a connection to data.
      3. Privacy by Design: Information privacy and governance are included from the start of system design, on par with system functionality.
      4. Scalable Day 1: Capable of distributed scale-out from day 1.
      5. Open Innovation: Innovation in data and analytics capabilities is being driven by open collaboration on algorithms and open source software.
      6. Pipeline of Parts: Data processing and pipeline components must have clear boundaries & hand-off points.
      7. Reuse over Rebuild: Reuse and extend components; design and build them in re-usable ways. Use DRY (Don’t Repeat Yourself) code versus WET (Write Every Time) code.
      8. Repeatable over Recoverable: Service continuity driven by repeatability and automation over backup/restore.
      9. Everything Testable: All components must be verifiable via test automation.
      10. Know your Data: Ensure a solid understanding of the data, including how it was collected (& why), data definitions, data quality, transformation rules and lineage, and operational metadata.
  • 12. Cloud encourages an engineering approach
  • 13. SRE – Site Reliability Engineering: “If a human operator needs to touch your system during normal operations, you have a bug. The definition of normal changes as your systems grow.” Carla Geisser, Google SRE
  • 14. With SRE we work to avoid Toil. Toil often has the following characteristics:
      • Manual
      • Repetitive
      • Automatable
      • Tactical
      • No enduring value
      • Effort to do it scales linearly as a service grows
    Tenets of SRE:
      • Ensuring a Durable Focus on Engineering
      • Pursuing Maximum Change Velocity Without Violating a Service’s SLO
      • Monitoring (Alerts, Tickets, Logging)
      • Emergency Response
      • Change Management
      • Demand Forecasting and Capacity Planning
      • Provisioning
      • Efficiency and Performance
    See https://landing.google.com/sre/sre-book/toc/
  • 15. Using Cloud Infrastructure
      • Enables responsive change in business requirements
      • Reduces the body of technical knowledge you need to maintain internally
      • Spend time considering security and privacy challenges. Engage a third party security expert if needed to help with security designs
      • Best match for the technical design principles above. Easier access to SaaS and PaaS offerings
      • Be open to multi-cloud platform. Help convince your cloud provider you have choices. Take advantage of best of breed capabilities. Don’t always rely on cloud vendor’s native offerings; consider third parties to help mitigate for stickiness
      • Cloud may INCREASE your infrastructure costs. Likely to be offset by increased business responsiveness and richer feature availability
  • 16. Big Data and Cloud: ‘Hadoop’ is much less relevant in the cloud today
      • The overhead of HDFS is unnecessary given cloud storage options like AWS S3 or Azure Blob Storage
      • Useful data processing services are often packaged in PaaS, avoiding the need to manage complex Hadoop clusters
  • 18. Analytical Ecosystem Based on a diagram by Humza Naseer, University of Melbourne 2019
  • 19. High Level Patterns Have Hardly Changed for Data Warehouse ETL in the last 15 years. [Diagram: Source (CRM / ERP / Billing, etc.) → Extract / Access (Get / Put) → Transform (Clean, Validate, Conform to model) → Load → Use/present]
  • 20. But latency requirements have [changed]: Batch, Mini Batch, Stream. [Diagram: Source (CRM / ERP / Billing, etc.) → Extract / Access (Get / Put) → Transform (Clean, Validate, Conform to model) → Load → Use/present]
  • 21. And there is a bewildering choice of tools just to get data into the Data Warehouse
      • ETL: Talend, Databricks, Snaplogic, etc.
      • iPaaS: Informatica, Dell Boomi, Mulesoft, etc.
      • ELT: SSIS, SQL, Oracle Data Integrator, etc.
      • Frameworks: Bonobo, Pygrametl, Apache Airflow, etc.
      • Raw code: Python, Scala, Spark, etc.
    Recommendation: Use tools that closely support your design principles
  • 22. IO and Query Concurrency Drives Performance and User Experience. Keep these in mind when choosing:
      • a database / query execution engine
      • where you do your data transformations, e.g. should you separate transformations from user queries?
  • 23. De-risk tool selection by using Continuous Integration / Continuous Delivery (CI/CD)
  • 24. De-risk using a phased approach
      • CI/CD from day one
      • Select some core reusable services to use first and do parallel runs if possible, e.g. load modules, address cleansing
      • Deployment: avoid all-or-nothing ‘big bang’
  • 25. Choosing a Data Model Methodology
      • Inmon (normalized core) – Labrador: Labradors love being around people and want to be everybody's friend. They are very sociable, intelligent, active, fun-loving animals who are eager to please. They make ideal pets for families with children, and make great watchdogs too. The best possible reference for the breed's docile and reliable nature is the fact that virtually all guide dogs for the blind in Australia are Labrador Retrievers.
      • Kimball (dimensional core) – Kelpie: Australian Kelpies are tough, independent, highly intelligent dogs with extreme loyalty and utmost devotion to duty, and have a tractable disposition. Obedient and super alert, the Australian Kelpie is eager to please and makes a devoted companion; however, their inexhaustible energy makes them unsuitable for suburban living.
      • Data Vault (hubs and satellites) – Chow Chow: The Chow Chow has a reputation for being a one-man dog and not very tolerant of those it doesn’t know. It can also tend to be willful and hard to train, so it is not a good choice for a weak or new owner. In addition, this dog has a thick coat that it sheds about twice a year. Expect to find fur everywhere during this time.
  • 28. Dev Stack (excluding ETL choices above…)
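
As a rough illustration of the pattern on slides 19 and 20 (extract, then clean, validate and conform to a model, then load, with latency tuned from batch through mini batch towards stream), the stages can be written as small components with clear hand-off points. This is only a sketch under stated assumptions: the `CustomerDim` model, the sample rows, the in-memory `warehouse` list and every function name are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical source rows, standing in for a CRM / ERP / billing extract.
SOURCE_ROWS = [
    {"customer_id": " 42 ", "name": "Ada Lovelace", "country": "uk"},
    {"customer_id": "", "name": "No Id", "country": "au"},
    {"customer_id": "7", "name": "  Grace Hopper ", "country": "us"},
    {"customer_id": "9", "name": "Alan Turing", "country": "uk"},
]


@dataclass(frozen=True)
class CustomerDim:
    """Hypothetical target model row."""
    customer_id: int
    name: str
    country_code: str


def extract(rows: Iterable[dict]) -> Iterator[dict]:
    """Extract / access: hand raw source records to the next stage."""
    yield from rows


def clean(rows: Iterable[dict]) -> Iterator[dict]:
    """Clean: trim stray whitespace from every field."""
    for row in rows:
        yield {key: value.strip() for key, value in row.items()}


def validate(rows: Iterable[dict]) -> Iterator[dict]:
    """Validate: keep rows with a numeric id (rejects are simply dropped here)."""
    for row in rows:
        if row["customer_id"].isdigit():
            yield row


def conform(rows: Iterable[dict]) -> Iterator[CustomerDim]:
    """Conform to model: map raw fields onto the target structure."""
    for row in rows:
        yield CustomerDim(int(row["customer_id"]), row["name"], row["country"].upper())


def mini_batches(rows: Iterable, batch_size: int) -> Iterator[list]:
    """Group a record stream into lists of at most batch_size items."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load(rows: Iterable[CustomerDim], target: list, batch_size: int) -> List[int]:
    """Load: append one mini batch at a time; return the size of each batch."""
    sizes = []
    for batch in mini_batches(rows, batch_size):
        target.extend(batch)  # stand-in for one bulk insert and commit
        sizes.append(len(batch))
    return sizes


warehouse: list = []
batch_sizes = load(conform(validate(clean(extract(SOURCE_ROWS)))), warehouse, batch_size=2)
```

In this sketch the only latency knob is `batch_size`: a value of 1 approximates record-at-a-time streaming, while a value at least as large as the input behaves like a classic batch. A real mini-batch loader would usually also flush on a time window, and would route rejected rows somewhere visible rather than dropping them.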

Editor's Notes

  1. Providing engineered, integrated data for an individual is expensive – but becomes valuable when you integrate that data for many people or the whole organisation. There is a necessary governance overhead as data is integrated across the organisation as multiple departments need to get together to agree definitions, usage, etc.
  2. Have the capability to build and change things quickly, and choose principles to enable this. Don’t build before the demand appears: you probably can’t anticipate demand as well as you think.
  3. Use design principles to inform and shape design and architecture choices. Choose them carefully to avoid driving unintended consequences. Our initial qualifying criterion is: Is there a reasonable opposite position to take for this principle? For example, you might reasonably prefer closed source software (principle 5), or prefer to use bare metal wherever you can (principle 1). These have been chosen carefully to encourage high reuse, low vendor lock-in and optionality, and to be highly responsive to changing business requirements. Keep them few in number so they are easy to absorb, understand (individually and in concert with the others) and recall. For example, principles 6, 7 and 8 lead to a conclusion that you should separate application logic from the data, so you have an implied ‘separation of concerns’ principle that doesn’t need to be explicitly stated. This is especially relevant for cloud and the ability to migrate technologies (e.g. change the underlying database). It’s OK to have some tension between principles, as long as they don’t provoke confusion and team conflict.
  4. Batch processing is seldom NOT required. Ensure consistency in update methods when using both batch and streaming to update the same target; otherwise this can cause profound DQ errors. Be wary about patterns like the Lambda architecture (note this is not AWS Lambda serverless…) as they can cause information conflicts and different sources of the truth. To get a consistent time for your integrated data it may not make sense to stream data all the way through. Latency requirements can increase issues with records arriving out of order. How do you validate an order record if the customer record hasn’t been processed in the system yet? Should you just pass it through and revalidate later? Etc.
  5. Don’t forget concurrency for users – this is often a big performance issue. iPaaS = integration Platform as a Service.
  6. Note that the debate around ETL vs ELT has passionate advocates on both sides. Both patterns can be appropriate and you will need clear guidelines to choose between the two. There are also new cloud patterns to spin up compute on demand – see Snowflake Data Warehouse. Think strategically, not tactically – remember local optimisation can cause global sub-optimisation.
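
Note 4 asks how to validate an order whose customer record has not arrived yet, and whether to pass it through and revalidate later. One possible answer, sketched here purely as an assumption rather than anything the deck prescribes, is to park such orders and release them when the customer turns up. The `OrderValidator` class, its method names and the record fields are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class OrderValidator:
    """Parks orders that arrive before their customer, then revalidates them."""

    def __init__(self) -> None:
        self.known_customers: Set[str] = set()
        self.accepted: List[dict] = []
        self._parked: Dict[str, List[dict]] = defaultdict(list)

    def on_order(self, order: dict) -> None:
        """Accept the order if its customer is known, otherwise park it."""
        customer_id = order["customer_id"]
        if customer_id in self.known_customers:
            self.accepted.append(order)
        else:
            self._parked[customer_id].append(order)

    def on_customer(self, customer_id: str) -> None:
        """Register the customer and release any orders that were waiting."""
        self.known_customers.add(customer_id)
        self.accepted.extend(self._parked.pop(customer_id, []))

    @property
    def parked_count(self) -> int:
        """Number of orders still waiting for their customer record."""
        return sum(len(orders) for orders in self._parked.values())


validator = OrderValidator()
validator.on_order({"order_id": "o1", "customer_id": "c9"})  # arrives early
parked_before = validator.parked_count
validator.on_customer("c9")  # late customer record releases o1
validator.on_order({"order_id": "o2", "customer_id": "c9"})
accepted_ids = [order["order_id"] for order in validator.accepted]
```

A production version would need a persistent parking area and a timeout after which parked orders are reported as data quality exceptions; this sketch keeps everything in memory to show only the control flow.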