Data and Analytics at Holland & Barrett
Building a "3-Michelin-star" data platform on AWS to power insights at the speed of thought
Dobo Radichkov
Chief Data Officer
7 June 2023
About Holland & Barrett
Founded in 1870, we exist to make health and wellness a way of life for everyone.
The Holland & Barrett Data & Analytics vision
To become the beating heart of the organisation and unlock success for our colleagues, customers and partners.
▪ For our colleagues
▪ For our customers
▪ For our partners
The Holland & Barrett Data & Analytics vision
Maturity key: ⭐⭐⭐ Mature, ⭐⭐ Scaling, ⭐ Early days
▪ Data platform ⭐⭐⭐
▪ Single source of truth ⭐⭐⭐
▪ Analytics & BI ⭐⭐⭐
▪ Personalisation ⭐⭐
▪ Data Science & ML ⭐⭐
▪ Health analytics ⭐
▪ Analytics in the field (stores & suppliers) ⭐
▪ Data monetisation ⭐
We are now in ‘Phase II’ of this journey
Phase I (2022): BUILD NEW FOUNDATION
▪ Data strategy & vision
▪ Set up data teams
▪ AWS-centric data lake
▪ Redshift data warehouse
▪ Metabase BI platform
Phase II (2023): SCALE OPERATING MODEL
▪ Complete core reporting
▪ Self-service BI
▪ Functional analytics
▪ Analytics in the field
▪ Data science & ML
Phase III (2024+): DRIVE VALUE & INNOVATION
▪ Data as driver of value:
– Increase revenue
– Reduce costs
– Improve UX
– Optimise processes
▪ Data as driver of innovation
CRAWL METAMORPHOSE WALK FLY TRANSCEND
The H&B data organisation
1. DATA ENGINEERING
§ Data lake & governance
§ Source system integration
§ Data services
2. DATA WAREHOUSE
§ Data modelling & transformations
§ Single source of truth for reporting & analytics
3. BUSINESS INTELLIGENCE
§ Management reporting
§ Operational reporting
§ Data visualisation
4. DATA SCIENCE
§ Data science and applied machine learning
§ Forecasting & optimisation
§ Personalisation
5. WEB & APP ANALYTICS
§ Product squad analytics
§ Product experimentation
6. DIGITAL ANALYTICS
§ Digital trade analytics
§ Performance marketing analytics
§ CRM analytics
“3-Michelin-star” data platform 😋
“The finished meals & service”
▪ BI & Core Reporting
▪ Data Science / Applied ML
▪ Product & Digital Analytics
“The kitchen & cooking process”
▪ DATA WAREHOUSE: Operational master data (customers, products, orders, stock, etc.)
“Raw ingredients & food storage”
▪ DATA LAKE: Raw systems data (security, data governance, access control)
Supply Chain, Retail Ops, Commercial, Customer, Finance
Production systems & services
▪ AS400 (until demise)
▪ Oracle (until demise)
▪ GA4
▪ …
▪ Till system
▪ Order mgmt. system
▪ Single view of stock
Data lake architecture
INGEST
▪ On-premise DBs (legacy estate): AS400, Oracle
▪ Cloud DBs: Amazon Aurora, Amazon RDS
▪ …
▪ Kafka Connect (Amazon MSK)
▪ APIs & SaaS
▪ DynamoDB tables
▪ Scraper (in-house crawler)
Data lake (Amazon S3)
▪ 5,000 datasets
▪ 98k fields
▪ 10.4M files
Data lake S3 buckets
▪ JSON*
▪ Parquet
▪ AVRO
▪ CSV
Data lake index (DynamoDB)
GOVERNANCE
▪ Katalog UI, Katalog DB (Aurora PgSQL)
▪ Right to erase / access: Eraser / Accessor, Success
Airflow
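One way to picture the data lake index is as a lookup from dataset name to the files that hold it. The sketch below is illustrative only: the key layout (`<dataset>/<file>`), the `IndexEntry` fields and the `build_index` helper are assumptions, not H&B's actual schema, and a plain dict stands in for the DynamoDB table.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath

# Formats named on the slide; anything else is recorded as "unknown".
KNOWN_FORMATS = {".json": "JSON", ".parquet": "Parquet", ".avro": "AVRO", ".csv": "CSV"}


@dataclass
class IndexEntry:
    """Hypothetical index record: one per dataset."""
    dataset: str
    files: list = field(default_factory=list)
    formats: set = field(default_factory=set)


def build_index(object_keys):
    """Group object keys of the assumed form '<dataset>/<file>' by dataset."""
    index = {}
    for key in object_keys:
        path = PurePosixPath(key)
        dataset = path.parts[0]
        entry = index.setdefault(dataset, IndexEntry(dataset))
        entry.files.append(key)
        entry.formats.add(KNOWN_FORMATS.get(path.suffix.lower(), "unknown"))
    return index


# Made-up object keys for illustration.
keys = [
    "orders/2023-06-01.parquet",
    "orders/2023-06-02.parquet",
    "web_events/part-0001.json",
]
index = build_index(keys)
file_count = sum(len(entry.files) for entry in index.values())
print(len(index), "datasets,", file_count, "files")
```

Headline figures such as the dataset and file counts on the slide could be read straight from an index like this, without listing the buckets.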
Data warehouse architecture
Data warehouse (Amazon Redshift), 4 x ra3.16xlarge
▪ 2,670 tables
▪ 2m queries / month
▪ Layered data architecture
▪ Raw data stored in SUPER columns
▪ Hourly ELT with idempotent pipelines
Data lake (Amazon S3)
▪ COPY (data ingest)
▪ External tables (Amazon Redshift Spectrum)
Cache (Amazon Aurora)
▪ Foreign data wrapper (pg_cron for scheduling)
▪ External schema (live federated queries)
▪ Used as fast storage layer for data apps
▪ Serves raw data for ELT data pipelines
APIs & SaaS
ELT orchestration
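"Idempotent" here means that re-running an hourly load leaves the warehouse in the same state as running it once. A minimal sketch of one common pattern (delete the hour's rows, then re-insert them, in a single transaction) follows. It uses an in-memory SQLite database as a stand-in for Redshift; the `orders` table, its columns and the `load_hour` helper are hypothetical, and the deck does not say which pattern H&B actually uses.

```python
import sqlite3


def load_hour(conn, hour, rows):
    """Replace all rows for one hourly window in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM orders WHERE load_hour = ?", (hour,))
        conn.executemany(
            "INSERT INTO orders (load_hour, order_id, amount) VALUES (?, ?, ?)",
            [(hour, order_id, amount) for order_id, amount in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (load_hour TEXT, order_id TEXT, amount REAL)")

batch = [("A1", 12.5), ("A2", 7.0)]
load_hour(conn, "2023-06-07T10", batch)
load_hour(conn, "2023-06-07T10", batch)  # re-run of the same hour adds nothing

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(row_count)
```

Because each run owns exactly one time window, a failed or repeated run can simply be retried without deduplication logic downstream.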
New Redshift features we are excited about
1. ROLLUP / CUBE
▪ Long-awaited improvement that helps us efficiently generate large pre-aggregated multi-dimensional cubes
▪ Great in combination with HLL functions for fast unique counts
2. DATA MASKING
▪ Create “masked” versions of tables to improve data privacy and governance
▪ Eliminates overhead of maintaining multiple versions / slices of the data
3. OTHER
▪ MERGE to simplify our incremental data pipelines
▪ S3 auto-copy to simplify data lake ingest pipelines
▪ Aurora zero-ETL integration to simplify CDC pipelines
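To make the CUBE idea concrete: grouping by CUBE over a set of dimensions produces one aggregate per subset of those dimensions, from the most detailed level up to the grand total. The sketch below emulates that in plain Python rather than Redshift SQL; the `cube` helper and the sample sales rows are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims`, like GROUP BY CUBE(dims).

    Dimensions that are rolled up appear as None in the result key.
    """
    totals = defaultdict(float)
    for size in range(len(dims) + 1):
        for kept in combinations(dims, size):
            for row in rows:
                key = tuple(row[d] if d in kept else None for d in dims)
                totals[key] += row[measure]
    return dict(totals)


# Made-up sample data.
sales = [
    {"region": "UK", "channel": "store", "revenue": 100.0},
    {"region": "UK", "channel": "web", "revenue": 40.0},
    {"region": "IE", "channel": "web", "revenue": 10.0},
]
result = cube(sales, ["region", "channel"], "revenue")
print(result[(None, None)])   # grand total
print(result[("UK", None)])   # UK across all channels
print(result[(None, "web")])  # web across all regions
```

Sums roll up by simple addition. Unique counts do not, which is where mergeable HLL sketches come in.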
BI & Analytics architecture
Data warehouse (Amazon Redshift)
1. Raw data layer: Raw unmodified data from source systems – ELT from data lake
2. Operational data layer: Clean, transformed, disaggregated entity relationship model – starting point for all reporting & analytics. Customer, orders, product, stores, warehouse, stock master data
3. BI data layer: Semi-aggregated datasets to enable fast reporting & analytics. Includes pre-computed HLL sketches for efficient unique counts.
4. Cubes: Multi-dimensional ROLAP cubes delivering pre-aggregated metrics along pre-defined dimensions. Best practice: CUBE/ROLLUP on top of pre-computed HLL sketches
5. Consumers: Data IDEs (JDBC), Data sharing, Athena, One-stop shop analytics, APIs
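Pre-computed HLL sketches work well under cubes because two sketches can be merged into the sketch of their union, so unique counts can be rolled up without rescanning raw rows. Below is a minimal HyperLogLog sketch in Python to show that property. It is an illustrative toy, not Redshift's implementation; the `HLL` class, its precision parameter `p` and the customer IDs are assumptions.

```python
import hashlib
import math


class HLL:
    """Minimal HyperLogLog sketch (illustrative only)."""

    def __init__(self, p=12):
        self.p = p                  # number of index bits
        self.m = 1 << p             # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        digest = hashlib.sha1(str(value).encode()).digest()[:8]
        h = int.from_bytes(digest, "big")             # 64-bit hash
        idx = h >> (64 - self.p)                      # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of first 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        """Return the sketch of the union: register-wise maximum."""
        merged = HLL(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)  # small-range correction
        return round(estimate)


# Two daily sketches with overlapping (made-up) customers.
monday, tuesday = HLL(), HLL()
for i in range(6000):
    monday.add(f"customer-{i}")
for i in range(4000, 10000):
    tuesday.add(f"customer-{i}")

week = monday.merge(tuesday)
print(monday.count(), tuesday.count(), week.count())  # estimates, not exact counts
```

The merged sketch estimates the 10,000 distinct customers across both days, whereas adding the two daily counts would double-count the 2,000 who appear on both.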
Redshift enables all reporting & analytics use cases
▪ Official reporting built by central BI team
▪ Self-service analytics done autonomously within teams
▪ Field analytics embedded in customer-facing apps
Registered users (self-service analytics)
Data Science & ML architecture
1. Develop
▪ R / Python Notebooks
▪ Feature engineering
▪ Model development
▪ Amazon Athena, Amazon Redshift
2. Train
▪ Model training
▪ Feature extraction pipelines (Amazon Redshift, Amazon Athena)
▪ EC2 instances (Amazon EC2, AWS Batch)
3. Serve
▪ ML data layer (Aurora / RDS, DynamoDB)
▪ Serverless (API Gateway, AWS Lambda)
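The serving step can be sketched as a Lambda-style handler behind API Gateway that looks up a stored result in the ML data layer. This is a sketch under stated assumptions: it assumes predictions are computed in batch and only looked up at request time, which the deck does not confirm. The `RECOMMENDATIONS` dict stands in for a DynamoDB or Aurora table, and the `customer_id` path parameter and the sample values are hypothetical.

```python
import json

# Stand-in for the ML data layer; keys and values are made up for illustration.
RECOMMENDATIONS = {
    "customer-123": ["vitamin-d", "omega-3"],
}


def handler(event, context):
    """Lambda-style handler for an API Gateway proxy request."""
    customer_id = (event.get("pathParameters") or {}).get("customer_id")
    items = RECOMMENDATIONS.get(customer_id)
    if items is None:
        return {"statusCode": 404, "body": json.dumps({"error": "unknown customer"})}
    body = {"customer_id": customer_id, "items": items}
    return {"statusCode": 200, "body": json.dumps(body)}


# Local invocation with a minimal proxy-style event.
response = handler({"pathParameters": {"customer_id": "customer-123"}}, None)
print(response["statusCode"], response["body"])
```

Keeping the handler to a key lookup leaves the request path independent of the training infrastructure.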
H&B data drives core business value & innovation
Commercial
✓ Unit economics
✓ Store network planning
✓ Competitor intelligence
✓ Promo effectiveness
✓ Econometrics / MMM
✓ Space & range analytics
Finance
✓ Daily / weekly / monthly management reporting
✓ Operational trade reporting
✓ Intraday / peak reporting
✓ Exception reporting
Supply chain
✓ Single view of stock
✓ Forecasting & replenishment
✓ Fulfilment analytics
✓ Stock availability
✓ Clearance / overstock analytics
✓ Supplier analytics
Wellness
✓ Diagnostics
✓ Health analytics
✓ Personalised wellness
✓ Behavioural engine
Customer
✓ Single customer view
✓ Customer lifecycle management
✓ eCRM enablement
✓ Customer lifetime value
Digital
✓ Digi marketing measurement
✓ Personalisation & search
✓ OKRs
✓ UX / funnel analytics
✓ Experimentation platform
✓ Web / app event tracking
✓ SEO analytics

Building a '3-Michelin-star' Data Platform at Holland & Barrett

  • 1. Data and Analytics at Holland & Barrett: Building a "3-Michelin-star" data platform on AWS to power insights at the speed of thought. Dobo Radichkov, Chief Data Officer. 7 June 2023.
  • 2. About Holland & Barrett: Founded in 1870, we exist to make health and wellness a way of life for everyone.
  • 3. The Holland & Barrett Data & Analytics vision: To become the beating heart of the organisation and unlock success for our colleagues, customers and partners.
      ▪ For our colleagues
      ▪ For our partners
      ▪ For our customers
  • 4. The Holland & Barrett Data & Analytics vision: To become the beating heart of the organisation and unlock success for our colleagues, customers and partners.
      ▪ Audiences: for our colleagues, for our partners, for our customers
      ▪ Capabilities: data platform; single source of truth; analytics & BI; personalisation; data science & ML; health analytics; analytics in the field (stores & suppliers); data monetisation
      ▪ Maturity ratings as listed on the slide: ⭐⭐⭐, ⭐⭐⭐, ⭐⭐⭐, ⭐⭐, ⭐⭐, ⭐, ⭐, ⭐
      ▪ Legend: ⭐⭐⭐ Mature, ⭐⭐ Scaling, ⭐ Early days
  • 5. We are now in ‘Phase II’ of this journey
      ▪ Phase I (2022), build new foundation: data strategy & vision; set up data teams; AWS-centric data lake; Redshift data warehouse; Metabase BI platform
      ▪ Phase II (2023), scale operating model: complete core reporting; self-service BI; functional analytics; analytics in the field; data science & ML
      ▪ Phase III (2024+), drive value & innovation: data as driver of value (increase revenue, reduce costs, improve UX, optimise processes); data as driver of innovation
      ▪ Journey stages: crawl, metamorphose, walk, fly, transcend
  • 6. The H&B data organisation
      ▪ 1 Data Engineering: data lake & governance; source system integration; data services
      ▪ 2 Data Warehouse: data modelling & transformations; single source of truth for reporting & analytics
      ▪ 3 Business Intelligence: management reporting; operational reporting; data visualisation
      ▪ 4 Data Science: data science and applied machine learning; forecasting & optimisation; personalisation
      ▪ 5 Web & App Analytics: product squad analytics; product experimentation
      ▪ 6 Digital Analytics: digital trade analytics; performance marketing analytics; CRM analytics
  • 7. “3-Michelin-star” data platform 😋
      ▪ Production systems & services: AS400 (until demise), Oracle (until demise), GA4, …, till system, order mgmt. system, single view of stock
      ▪ Data lake, “raw ingredients & food storage”: raw systems data (security, data governance, access control)
      ▪ Data warehouse, “the kitchen & cooking process”: operational master data (customers, products, orders, stock, etc.)
      ▪ “The finished meals & service”: BI & Core Reporting, Data Science / Applied ML, Product & Digital Analytics
      ▪ Business domains: Supply Chain, Retail Ops, Commercial, Customer, Finance
  • 8. Data lake architecture
      ▪ Ingest sources: on-premise DBs (legacy estate: AS400, Oracle); cloud DBs (Amazon Aurora, Amazon RDS, …); Kafka Connect (Amazon MSK); APIs & SaaS; DynamoDB tables; scraper (in-house crawler)
      ▪ Data lake (Amazon S3): 5,000 datasets; 98k fields; 10.4M files
      ▪ Data lake S3 bucket formats: JSON*, Parquet, AVRO, CSV
      ▪ Governance: Katalog UI; Katalog DB (Aurora PgSQL); right to erase / access (Eraser / Accessor; Success); data lake index (DynamoDB)
      ▪ Orchestration: Airflow (shown twice on the diagram)
  • 9. Data warehouse architecture
      ▪ Data warehouse (Amazon Redshift), 4 x ra3.16xlarge: 2,670 tables; 2m queries / month; layered data architecture; raw data stored in SUPER columns; hourly ELT with idempotent pipelines
      ▪ Ingest from the data lake (Amazon S3): COPY (data ingest); external tables (Amazon Redshift Spectrum); ELT orchestration; APIs & SaaS
      ▪ Cache (Amazon Aurora): foreign data wrapper (pg_cron for scheduling); external schema (live federated queries); used as fast storage layer for data apps; serves raw data for ELT data pipelines
  • 10. New Redshift features we are excited about
      ▪ 1 ROLLUP / CUBE: long-awaited improvement that helps us efficiently generate large pre-aggregated multi-dimensional cubes; great in combination with HLL functions for fast unique counts
      ▪ 2 Data masking: create "masked" versions of tables to improve data privacy and governance; eliminates the overhead of maintaining multiple versions / slices of the data
      ▪ 3 Other: MERGE to simplify our incremental data pipelines; S3 auto-copy to simplify data lake ingest pipelines; Aurora zero-ETL integration to simplify CDC pipelines
  • 11. BI & Analytics architecture: layers in the data warehouse (Amazon Redshift)
      ▪ 1 Raw data layer: raw unmodified data from source systems, loaded by ELT from the data lake
      ▪ 2 Operational data layer: clean, transformed, disaggregated entity relationship model and the starting point for all reporting & analytics; customer, orders, product, stores, warehouse, stock master data
      ▪ 3 BI data layer: semi-aggregated datasets to enable fast reporting & analytics; includes pre-computed HLL sketches for efficient unique counts
      ▪ 4 Cubes: multi-dimensional ROLAP cubes delivering pre-aggregated metrics along pre-defined dimensions; best practice: CUBE/ROLLUP on top of pre-computed HLL sketches
      ▪ 5 Consumers: data IDEs (JDBC); data sharing; Athena; one-stop shop analytics; APIs
  • 12. Redshift enables all reporting & analytics use cases
      ▪ Official reporting built by central BI team
      ▪ Self-service analytics done autonomously within teams
      ▪ Field analytics embedded in customer-facing apps
      ▪ Chart label: registered users (self-service analytics)
  • 13. Data Science & ML architecture
      ▪ Stages: 1 Develop, 2 Train, 3 Serve
      ▪ Services shown: Amazon Athena, Amazon Redshift, Amazon EC2, AWS Batch, Aurora / RDS, DynamoDB, API Gateway, AWS Lambda
      ▪ Activities and components: R / Python notebooks; feature engineering; model development; model training; feature extraction pipelines; EC2 instances; ML data layer; serverless
  • 14. H&B data drives core business value & innovation
      ▪ Commercial: unit economics; store network planning; competitor intelligence; promo effectiveness; econometrics / MMM; space & range analytics
      ▪ Finance: daily / weekly / monthly management reporting; operational trade reporting; intraday / peak reporting; exception reporting
      ▪ Supply chain: single view of stock; forecasting & replenishment; fulfilment analytics; stock availability; clearance / overstock analytics; supplier analytics
      ▪ Wellness: diagnostics; health analytics; personalised wellness; behavioural engine
      ▪ Customer and Digital: single customer view; customer lifecycle management; eCRM enablement; customer lifetime value; digi marketing measurement; personalisation & search; OKRs; UX / funnel analytics; experimentation platform; web / app event tracking; SEO analytics
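
Slide 9 mentions hourly ELT with idempotent pipelines, but the deck does not show how they are built. One common way to get idempotency is to replace a whole hour window inside a single transaction, so that a retry leaves exactly the same rows. The sketch below illustrates that pattern only; it uses Python's built-in sqlite3 as a stand-in for the warehouse, and the table `orders_hourly`, the column `load_hour` and the function `load_hour_window` are hypothetical names, not taken from the H&B platform.

```python
import sqlite3

def load_hour_window(conn, load_hour, rows):
    """Replace all rows for one hour window (hypothetical helper); safe to re-run."""
    with conn:  # single transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM orders_hourly WHERE load_hour = ?", (load_hour,))
        conn.executemany(
            "INSERT INTO orders_hourly (load_hour, order_id, amount) VALUES (?, ?, ?)",
            [(load_hour, order_id, amount) for order_id, amount in rows],
        )

# In-memory database standing in for the warehouse (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_hourly (load_hour TEXT, order_id TEXT, amount REAL)")

batch = [("o-1", 12.5), ("o-2", 7.0)]
load_hour_window(conn, "2023-06-07T10", batch)
load_hour_window(conn, "2023-06-07T10", batch)  # retry of the same hour

# The retry does not duplicate rows: still two orders for that hour.
row_count, total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM orders_hourly"
).fetchone()
```

Because the delete and the insert share one transaction, a failure part-way through leaves the previous state of the window intact, which is what makes a blind retry safe.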
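
Slides 10 and 11 recommend running CUBE/ROLLUP on top of pre-computed HLL sketches. The reason this works is that HyperLogLog sketches can be merged: the sketch for a rolled-up level is the register-wise maximum of the sketches below it, so unique counts stay correct even when the same customer appears in several cells. The following is a minimal pure-Python illustration of that idea, not Redshift's implementation; the register count, the hash choice, the `None` marker for "all values" and every function name here are assumptions made for the example.

```python
import hashlib
import math

P = 12        # 2**12 = 4096 registers, roughly 1.6% standard error
M = 1 << P

def new_sketch():
    return [0] * M

def add(sketch, value):
    # 64-bit hash of the value (SHA-256 truncated; an arbitrary choice).
    h = int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")
    idx = h >> (64 - P)                      # first P bits pick a register
    rest = h & ((1 << (64 - P)) - 1)         # remaining 64 - P bits
    rank = (64 - P) - rest.bit_length() + 1  # position of the first 1-bit
    sketch[idx] = max(sketch[idx], rank)

def merge(a, b):
    # Union of two sketches: register-wise maximum.
    return [max(x, y) for x, y in zip(a, b)]

def estimate(sketch):
    alpha = 0.7213 / (1 + 1.079 / M)
    raw = alpha * M * M / sum(2.0 ** -r for r in sketch)
    zeros = sketch.count(0)
    if raw <= 2.5 * M and zeros:             # small-range correction
        return M * math.log(M / zeros)
    return raw

def sketch_of(ids):
    sk = new_sketch()
    for i in ids:
        add(sk, f"c{i}")
    return sk

def rollup(cells):
    """Emulate ROLLUP(day, channel): add (day, None) and (None, None) levels."""
    out = dict(cells)
    for (day, _channel), sk in cells.items():
        for key in ((day, None), (None, None)):
            out[key] = merge(out[key], sk) if key in out else list(sk)
    return out

# Pre-computed sketches per (day, channel) cell, with overlapping customers.
cells = {
    ("2023-06-01", "web"): sketch_of(range(0, 3000)),
    ("2023-06-01", "store"): sketch_of(range(2000, 5000)),
    ("2023-06-02", "web"): sketch_of(range(4000, 7000)),
}
cube = rollup(cells)

total_unique = estimate(cube[(None, None)])        # true value: 7000
day1_unique = estimate(cube[("2023-06-01", None)]) # true value: 5000
naive_sum = sum(estimate(sk) for sk in cells.values())  # double-counts overlaps
```

Summing the per-cell estimates counts overlapping customers more than once, while merging the sketches does not; that is the property that lets a cube store one sketch per cell and still answer unique-count questions at any rolled-up level.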