Data and Analytics at Holland & Barrett
Building a "3-Michelin-star" data platform on AWS to power insights at the speed of thought
Dobo Radichkov
Chief Data Officer
7 June 2023
About Holland & Barrett
Founded in 1870, we exist to make health and wellness a way of life for everyone.
The Holland & Barrett Data & Analytics vision
For our colleagues
To become the beating heart of the organisation and unlock success for our colleagues, customers and partners.
For our partners
For our customers
The Holland & Barrett Data & Analytics vision
Data platform
Single source of truth
Analytics & BI
Personalisation
Data Science & ML
Health analytics
Analytics in the field
(stores & suppliers)
Data monetisation
For our colleagues
To become the beating heart of the organisation and unlock success for our colleagues, customers and partners.
For our partners
For our customers
⭐⭐⭐
⭐⭐⭐
⭐⭐⭐
⭐⭐
⭐⭐
⭐
⭐
⭐
⭐⭐⭐ Mature ⭐⭐ Scaling ⭐ Early days
We are now in ‘Phase II’ of this journey
Phase I (2022): BUILD NEW FOUNDATION
▪ Data strategy & vision
▪ Set up data teams
▪ AWS-centric data lake
▪ Redshift data warehouse
▪ Metabase BI platform
Phase II (2023): SCALE OPERATING MODEL
▪ Complete core reporting
▪ Self-service BI
▪ Functional analytics
▪ Analytics in the field
▪ Data science & ML
Phase III (2024+): DRIVE VALUE & INNOVATION
▪ Data as driver of value:
– Increase revenue
– Reduce costs
– Improve UX
– Optimise processes
▪ Data as driver of innovation
CRAWL METAMORPHOSE WALK FLY TRANSCEND
The H&B data organisation
1 DATA ENGINEERING
§ Data lake & governance
§ Source system integration
§ Data services
2 DATA WAREHOUSE
§ Data modelling & transformations
§ Single source of truth for reporting & analytics
3 BUSINESS INTELLIGENCE
§ Management reporting
§ Operational reporting
§ Data visualisation
4 DATA SCIENCE
§ Data science and applied machine learning
§ Forecasting & optimisation
§ Personalisation
5 WEB & APP ANALYTICS
§ Product squad analytics
§ Product experimentation
6 DIGITAL ANALYTICS
§ Digital trade analytics
§ Performance marketing analytics
§ CRM analytics
“3-Michelin-star” data platform 😋
“The finished meals & service”
▪ BI & Core Reporting
▪ Data Science / Applied ML
▪ Product & Digital Analytics
“The kitchen & cooking process”
▪ DATA WAREHOUSE: Operational master data (customers, products, orders, stock, etc.)
▪ Supply Chain, Retail Ops, Commercial, Customer, Finance
“Raw ingredients & food storage”
▪ DATA LAKE: Raw systems data (security, data governance, access control)
Production systems & services
▪ AS400 (until demise)
▪ Oracle (until demise)
▪ GA4
▪ …
▪ Till system
▪ Order mgmt. system
▪ Single view of stock
Data lake architecture
INGEST
▪ On-premise DBs (legacy estate): AS400, Oracle
▪ Cloud DBs: Amazon Aurora, Amazon RDS, …
▪ Kafka Connect (Amazon MSK)
▪ APIs & SaaS
▪ DynamoDB tables
▪ Scraper (in-house crawler)
Data lake (Amazon S3)
▪ 5,000 datasets
▪ 98k fields
▪ 10.4M files
▪ Data lake S3 buckets: JSON*, Parquet, AVRO, CSV
▪ Data lake index (DynamoDB)
GOVERNANCE
▪ Katalog UI, Katalog DB (Aurora PgSQL)
▪ Right to erase / access: Eraser / Accessor, Success
Orchestration: Airflow
Data warehouse architecture
4 x ra3.16xlarge
Data warehouse
(Amazon Redshift)
Data lake
(Amazon S3)
ELT orchestration
COPY
(data ingest)
External tables
(Amazon Redshift Spectrum)
APIs &
SaaS
▪ 2,670 tables
▪ 2m queries / month
▪ Layered data architecture
▪ Raw data stored in SUPER columns
▪ Hourly ELT with idempotent pipelines
Cache
(Amazon Aurora)
Foreign data wrapper
(pg_cron for scheduling)
External schema
(live federated queries)
▪ Used as fast storage layer for data apps
▪ Serves raw data for ELT data pipelines
1
2
3
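The slide mentions hourly ELT with idempotent pipelines without showing how a load is made idempotent. One common pattern, sketched here as an assumption rather than H&B's actual implementation, is to delete and reload a whole hourly partition inside a single transaction, so a retry or re-run of the same hour leaves the table unchanged. SQLite stands in for the warehouse, and the table name `orders_hourly`, its columns, and the function `load_hour` are all hypothetical.

```python
import sqlite3


def load_hour(conn, hour, source_rows):
    """Reload one hourly partition: delete the hour, then insert it again."""
    with conn:  # single transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM orders_hourly WHERE hour = ?", (hour,))
        conn.executemany(
            "INSERT INTO orders_hourly (hour, order_id, amount) VALUES (?, ?, ?)",
            [(hour, order_id, amount) for order_id, amount in source_rows],
        )


# In-memory SQLite database as a stand-in for the data warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_hourly (hour TEXT, order_id TEXT, amount REAL)")

rows = [("o-1", 12.5), ("o-2", 7.0)]
load_hour(conn, "2023-06-07T10", rows)
load_hour(conn, "2023-06-07T10", rows)  # re-running the same hour adds no duplicates

row_count, total = conn.execute(
    "SELECT COUNT(*), SUM(amount) FROM orders_hourly"
).fetchone()
```

Because the delete and the insert commit together, a failed run leaves the previous state of that hour in place and can simply be retried.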
New Redshift features we are excited about
1 ROLLUP / CUBE
▪ Long-awaited improvement that helps us efficiently generate large pre-aggregated multi-dimensional cubes
▪ Great in combination with HLL functions for fast unique counts
2 DATA MASKING
▪ Create "masked" versions of tables to improve data privacy and governance
▪ Eliminates overhead of maintaining multiple versions / slices of the data
3 OTHER
▪ MERGE to simplify our incremental data pipelines
▪ S3 auto-copy to simplify data lake ingest pipelines
▪ Aurora zero-ETL integration to simplify CDC pipelines
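To make the ROLLUP / CUBE point concrete, here is a minimal Python sketch of what the two operators compute: ROLLUP aggregates over successively shorter prefixes of the dimension list, while CUBE aggregates over every subset. It illustrates the semantics only and is not Redshift code; the sample rows, the dimension names and the helper functions are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def rollup_sets(dims):
    """ROLLUP(a, b) -> grouping sets (a, b), (a,), ()."""
    return [tuple(dims[:n]) for n in range(len(dims), -1, -1)]


def cube_sets(dims):
    """CUBE(a, b) -> every subset: (a, b), (a,), (b,), ()."""
    return [c for n in range(len(dims), -1, -1) for c in combinations(dims, n)]


def aggregate(rows, dims, grouping_sets, measure):
    """Sum `measure` once per grouping set; None marks a rolled-up dimension.

    Caveat: a real None in the data would be indistinguishable from the
    "all values" marker here (SQL provides GROUPING() for that case).
    """
    totals = defaultdict(float)
    for row in rows:
        for gset in grouping_sets:
            key = tuple(row[d] if d in gset else None for d in dims)
            totals[key] += row[measure]
    return dict(totals)


# Hypothetical sample data.
rows = [
    {"region": "UK", "channel": "web", "sales": 10.0},
    {"region": "UK", "channel": "store", "sales": 5.0},
    {"region": "IE", "channel": "web", "sales": 2.0},
]
dims = ["region", "channel"]

cube = aggregate(rows, dims, cube_sets(dims), "sales")
rollup = aggregate(rows, dims, rollup_sets(dims), "sales")
```

In this sketch the cube holds a channel-only total such as `(None, "web")`, which the rollup omits because ROLLUP only drops dimensions from the right.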
BI & Analytics architecture
Data warehouse (Amazon Redshift)
1 Raw data layer
Raw unmodified data from source systems – ELT from data lake
2 Operational data layer
Clean, transformed, disaggregated entity relationship model – starting point for all reporting & analytics. Customer, orders, product, stores, warehouse, stock master data.
3 BI data layer
Semi-aggregated datasets to enable fast reporting & analytics. Includes pre-computed HLL sketches for efficient unique counts.
4 Cubes
Multi-dimensional ROLAP cubes delivering pre-aggregated metrics along pre-defined dimensions. Best practice: CUBE/ROLLUP on top of pre-computed HLL sketches.
5 Consumers
▪ Data IDEs (JDBC)
▪ Data sharing
▪ Athena
▪ One-stop shop analytics
▪ APIs
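The reason HLL sketches pair well with pre-aggregated cubes is that unique counts are not additive, whereas sketches can be merged. The following is a minimal, simplified HyperLogLog in Python that illustrates this property; it is not the warehouse's implementation, and the register count, hash choice and customer IDs are assumptions made for the example.

```python
import hashlib
import math

P = 12                                # 2**12 = 4096 registers
M = 1 << P
ALPHA = 0.7213 / (1 + 1.079 / M)      # bias-correction constant for M >= 128


def new_sketch():
    return [0] * M


def add(sketch, value):
    """Hash the value to 64 bits; the first P bits pick a register."""
    h = int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")
    idx = h >> (64 - P)
    rest = h & ((1 << (64 - P)) - 1)
    rank = (64 - P) - rest.bit_length() + 1   # position of the leftmost 1-bit
    sketch[idx] = max(sketch[idx], rank)


def merge(a, b):
    """Union of two sketches: register-wise maximum."""
    return [max(x, y) for x, y in zip(a, b)]


def estimate(sketch):
    raw = ALPHA * M * M / sum(2.0 ** -r for r in sketch)
    zeros = sketch.count(0)
    if raw <= 2.5 * M and zeros:
        return M * math.log(M / zeros)        # linear counting for small sets
    return raw


# Two hypothetical stores with overlapping customers (10,000 distinct overall).
store_a, store_b = new_sketch(), new_sketch()
for c in range(0, 6000):
    add(store_a, f"customer-{c}")
for c in range(4000, 10000):
    add(store_b, f"customer-{c}")

all_stores = merge(store_a, store_b)
unique_customers = estimate(all_stores)   # approximate, close to 10,000
```

Adding the two per-store estimates would double count the shared customers, while the merged sketch is identical to one built from all customers directly. That is why sketches stored at a fine grain can be combined along any rollup or cube dimension.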
Redshift enables all reporting & analytics use cases
▪ Official reporting built by central BI team
▪ Self-service analytics done autonomously within teams
▪ Field analytics embedded in customer-facing apps
Registered users (self-service analytics)
Data Science & ML architecture
Develop Train Serve
Amazon Athena Amazon Redshift
Amazon EC2 AWS Batch
Aurora / RDS
DynamoDB
API Gateway AWS Lambda
R / Python
Notebooks
Feature engineering
Model development Model training
Amazon Redshift
Feature extraction pipelines
Amazon Athena
EC2 instances
ML data layer
Serverless
1 2 3
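The Serve stage lists API Gateway, AWS Lambda and an ML data layer but does not show how they connect. One plausible reading, offered as an assumption and not as H&B's actual design, is a Lambda function behind API Gateway that looks up precomputed model outputs by key. In this sketch a plain dictionary stands in for the ML data layer (DynamoDB or Aurora / RDS on the slide), and the path parameter `customer_id` and the payload fields are hypothetical.

```python
import json

# Stand-in for the ML data layer; a real function would query a database.
ML_DATA_LAYER = {
    "c-1001": {"recommended_products": ["sku-1", "sku-7"]},
}


def lambda_handler(event, context):
    """Handle an API Gateway proxy event and return precomputed model output."""
    # pathParameters can be null in proxy events, hence the `or {}`.
    customer_id = (event.get("pathParameters") or {}).get("customer_id")
    record = ML_DATA_LAYER.get(customer_id)
    if record is None:
        return {
            "statusCode": 404,
            "body": json.dumps({"error": "unknown customer"}),
        }
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(record),
    }


# Local invocation with a hand-built event (no AWS resources involved).
response = lambda_handler({"pathParameters": {"customer_id": "c-1001"}}, None)
```

Keeping the handler to a key lookup means the expensive work happens in the Train stage, and the serving path only reads results.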
H&B data drives core business value & innovation
Commercial
✓ Unit economics
✓ Store network planning
✓ Competitor intelligence
✓ Promo effectiveness
✓ Econometrics / MMM
✓ Space & range analytics
Finance
✓ Daily / weekly / monthly management reporting
✓ Operational trade reporting
✓ Intraday / peak reporting
✓ Exception reporting
Supply chain
✓ Single view of stock
✓ Forecasting & replenishment
✓ Fulfilment analytics
✓ Stock availability
✓ Clearance / overstock analytics
✓ Supplier analytics
Wellness
✓ Diagnostics
✓ Health analytics
✓ Personalised wellness
✓ Behavioural engine
Customer Digital
✓ Single customer view
✓ Customer lifecycle management
✓ eCRM enablement
✓ Customer lifetime value
✓ Digi marketing measurement
✓ Personalisation & search
✓ OKRs
✓ UX / funnel analytics
✓ Experimentation platform
✓ Web / app event tracking
✓ SEO analytics

Data and Analytics at Holland & Barrett: Building a '3-Michelin-star' Data Platform on AWS

  • 1. Data and Analytics at Holland & Barrett: Building a "3-Michelin-star" data platform on AWS to power insights at the speed of thought. Dobo Radichkov, Chief Data Officer. 7 June 2023.
  • 2. About Holland & Barrett: Founded in 1870, we exist to make health and wellness a way of life for everyone.
  • 3. The Holland & Barrett Data & Analytics vision: To become the beating heart of the organisation and unlock success for our colleagues, customers and partners. Audiences: for our colleagues; for our partners; for our customers.
  • 4. The Holland & Barrett Data & Analytics vision: To become the beating heart of the organisation and unlock success for our colleagues, customers and partners (for our colleagues; for our partners; for our customers). Capabilities: Data platform; Single source of truth; Analytics & BI; Personalisation; Data Science & ML; Health analytics; Analytics in the field (stores & suppliers); Data monetisation. Ratings: ⭐⭐⭐ ⭐⭐⭐ ⭐⭐⭐ ⭐⭐ ⭐⭐ ⭐ ⭐ ⭐. Legend: ⭐⭐⭐ Mature, ⭐⭐ Scaling, ⭐ Early days.
  • 5. We are now in ‘Phase II’ of this journey. Phase I (2022), build new foundation: data strategy & vision; set up data teams; AWS-centric data lake; Redshift data warehouse; Metabase BI platform. Phase II (2023), scale operating model: complete core reporting; self-service BI; functional analytics; analytics in the field; data science & ML. Phase III (2024+), drive value & innovation: data as driver of value (increase revenue, reduce costs, improve UX, optimise processes); data as driver of innovation. Stages: crawl, metamorphose, walk, fly, transcend.
  • 6. The H&B data organisation. (1) Data engineering: data lake & governance; source system integration; data services. (2) Data warehouse: data modelling & transformations; single source of truth for reporting & analytics. (3) Business intelligence: management reporting; operational reporting; data visualisation. (4) Data science: data science and applied machine learning; forecasting & optimisation; personalisation. (5) Web & app analytics: product squad analytics; product experimentation. (6) Digital analytics: digital trade analytics; performance marketing analytics; CRM analytics.
  • 7. “3-Michelin-star” data platform 😋. Production systems & services: AS400 (until demise); Oracle (until demise); GA4; …; till system; order mgmt. system; single view of stock. Data lake, “raw ingredients & food storage”: raw systems data (security, data governance, access control). Data warehouse, “the kitchen & cooking process”: operational master data (customers, products, orders, stock, etc.). “The finished meals & service”: BI & core reporting; data science / applied ML; product & digital analytics. Business areas: Supply Chain; Retail Ops; Commercial; Customer; Finance.
  • 8. Data lake architecture. Ingest: on-premise DBs (legacy estate): AS400, Oracle; cloud DBs: Amazon Aurora, Amazon RDS, …; Kafka Connect (Amazon MSK); APIs & SaaS; DynamoDB tables; scraper (in-house crawler). Governance: Katalog UI; Katalog DB (Aurora PgSQL); right to erase / access (Eraser / Accessor; Success). Data lake (Amazon S3): 5,000 datasets; 98k fields; 10.4M files. Data lake S3 bucket formats: JSON*, Parquet, AVRO, CSV. Data lake index (DynamoDB). Airflow.
  • 9. Data warehouse architecture. Data warehouse (Amazon Redshift), 4 x ra3.16xlarge: 2,670 tables; 2m queries / month; layered data architecture; raw data stored in SUPER columns; hourly ELT with idempotent pipelines. Data lake (Amazon S3): COPY (data ingest); external tables (Amazon Redshift Spectrum). APIs & SaaS. ELT orchestration. Cache (Amazon Aurora): foreign data wrapper (pg_cron for scheduling); external schema (live federated queries); used as fast storage layer for data apps; serves raw data for ELT data pipelines.
  • 10. New Redshift features we are excited about. (1) ROLLUP / CUBE: a long-awaited improvement that helps us efficiently generate large pre-aggregated multi-dimensional cubes; great in combination with HLL functions for fast unique counts. (2) Data masking: create "masked" versions of tables to improve data privacy and governance; eliminates the overhead of maintaining multiple versions / slices of the data. (3) Other: MERGE to simplify our incremental data pipelines; S3 auto-copy to simplify data lake ingest pipelines; Aurora zero-ETL integration to simplify CDC pipelines.
  • 11. BI & Analytics architecture, on the data warehouse (Amazon Redshift). (1) Raw data layer: raw unmodified data from source systems, loaded by ELT from the data lake. (2) Operational data layer: clean, transformed, disaggregated entity relationship model and the starting point for all reporting & analytics; customer, orders, product, stores, warehouse, stock master data. (3) BI data layer: semi-aggregated datasets to enable fast reporting & analytics; includes pre-computed HLL sketches for efficient unique counts. (4) Cubes: multi-dimensional ROLAP cubes delivering pre-aggregated metrics along pre-defined dimensions; best practice: CUBE/ROLLUP on top of pre-computed HLL sketches. (5) Consumers: data IDEs (JDBC); data sharing; Athena; one-stop shop analytics; APIs.
  • 12. Redshift enables all reporting & analytics use cases: official reporting built by central BI team; self-service analytics done autonomously within teams; field analytics embedded in customer-facing apps. Registered users (self-service analytics).
  • 13. Data Science & ML architecture. Stages: (1) Develop, (2) Train, (3) Serve. Services: Amazon Athena; Amazon Redshift; Amazon EC2; AWS Batch; Aurora / RDS; DynamoDB; API Gateway; AWS Lambda. Components: R / Python notebooks; feature engineering; model development; model training; feature extraction pipelines (Amazon Redshift, Amazon Athena); EC2 instances; ML data layer; serverless.
  • 14. H&B data drives core business value & innovation. Areas: Commercial; Finance; Wellness; Supply chain; Customer; Digital. Use cases: ✓ Unit economics ✓ Store network planning ✓ Competitor intelligence ✓ Promo effectiveness ✓ Econometrics / MMM ✓ Space & range analytics ✓ Daily / weekly / monthly management reporting ✓ Operational trade reporting ✓ Intraday / peak reporting ✓ Exception reporting ✓ Single view of stock ✓ Forecasting & replenishment ✓ Fulfilment analytics ✓ Stock availability ✓ Clearance / overstock analytics ✓ Supplier analytics ✓ Diagnostics ✓ Health analytics ✓ Personalised wellness ✓ Behavioural engine ✓ Single customer view ✓ Customer lifecycle management ✓ eCRM enablement ✓ Customer lifetime value ✓ Digi marketing measurement ✓ Personalisation & search ✓ OKRs ✓ UX / funnel analytics ✓ Experimentation platform ✓ Web / app event tracking ✓ SEO analytics
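
Slides 10 and 11 recommend building cubes on top of pre-computed HLL sketches for unique counts. The reason this works is that HyperLogLog sketches are mergeable: per-cell sketches can be combined along any dimension, whereas per-cell distinct counts cannot simply be summed. The following is a minimal sketch of that idea in plain Python, not Redshift's implementation. The `HyperLogLog` class, the `rollup` helper, the precision `p=12`, and the store/day data are all hypothetical and chosen only for illustration.

```python
import hashlib
import math
from functools import reduce


class HyperLogLog:
    """Tiny illustrative HyperLogLog with 2**p registers (not production grade)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        # 64-bit hash: the top p bits pick a register, the rest give the rank.
        h = int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = position of the leftmost 1-bit in the remaining 64 - p bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Union of two sketches = element-wise max of their registers.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


# Hypothetical pre-aggregated layer: one sketch of customer IDs per (store, day).
cells = {
    ("store_a", "mon"): range(0, 5000),
    ("store_a", "tue"): range(3000, 8000),  # overlaps Monday's customers
    ("store_b", "mon"): range(6000, 9000),
}
sketches = {}
for key, customers in cells.items():
    sketch = HyperLogLog()
    for customer_id in customers:
        sketch.add(customer_id)
    sketches[key] = sketch


def rollup(keys):
    """Merge the sketches of the given cells into one coarser-grained sketch."""
    return reduce(lambda a, b: a.merge(b), (sketches[k] for k in keys))


store_a = rollup([k for k in sketches if k[0] == "store_a"])  # roll up over days
total = rollup(list(sketches))                                # grand total

print("store_a unique customers (approx):", round(store_a.count()))
print("all stores unique customers (approx):", round(total.count()))
```

Summing the per-cell unique counts for `store_a` would give 10,000, while the true number of distinct customers is 8,000 because the two days overlap; merging the sketches avoids that double counting, and the sketches never need to be recomputed from raw rows.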