Scalable E-Commerce Data Pipelines with Kafka:
Real-Time Analytics, Batch, ML, Data Lake, and Beyond
Aristatle Subramaniam
Lead Data Engineer
Bigcommerce
Mahendra Kumar
VP, Data and Software Engineering
Bigcommerce
Agenda
◿ Data Platform Architecture Overview
◿ Real-time Analytics & Insights: 1.6B events per day
◿ Challenges: Bot filtering
◿ Observability, Monitoring & Alerting: Charts
◿ Meta Conversion APIs: Run effective ad campaigns
◿ Data Lake: Ad hoc analysis and ML
◿ Personalized Product Recommendation: Improve conversion ratio and click-through rate
◿ Q&A
Bigcommerce Data Pipeline Architecture
◿ Importance of Real-time Data Handling
◿ Processing Massive Volumes of Data
◿ Agility and Adaptability
◿ Data-Driven Decision-Making
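The agenda quotes a volume of 1.6B events per day, which is easier to reason about as a per-second rate. The short sketch below does that conversion. It computes only the daily average; real shopper traffic is bursty, and the 3x peak-to-average factor in the example is a hypothetical value, not a figure from the talk.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_events_per_second(events_per_day: int) -> float:
    """Average ingest rate implied by a daily event count."""
    return events_per_day / SECONDS_PER_DAY


def estimated_peak_rate(events_per_day: int, peak_factor: float) -> float:
    """Scale the average by an assumed (hypothetical) peak-to-average ratio."""
    return average_events_per_second(events_per_day) * peak_factor


if __name__ == "__main__":
    daily = 1_600_000_000  # 1.6B events per day, as stated in the talk
    print(f"average: {average_events_per_second(daily):,.0f} events/s")  # average: 18,519 events/s
    print(f"peak at an assumed 3x: {estimated_peak_rate(daily, 3.0):,.0f} events/s")
```

Sizing brokers, partitions and consumers against the average alone would leave no headroom, which is why the peak estimate matters even when the factor is only a guess.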
Real-time Analytics Data Pipeline Architecture
◿ Shopper Click Events
◿ Data Collection with Filebeat
◿ Confluent Kafka for Reliable Event Handling
◿ Real-time Analytics Processing using Kafka Streams
◿ Deployment on Google Kubernetes Engine (GKE)
◿ Data Storage with HBase
◿ Querying with Apache Phoenix
◿ Cloud SQL for Aggregated and Precomputed Results
◿ Python API-Powered Dashboards
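The pipeline processes shopper click events with Kafka Streams, which is a Java library. As a language-neutral illustration only, the Python sketch below reproduces the kind of result a windowed count produces (in Kafka Streams, `groupByKey()`, then `windowedBy(...)` with `TimeWindows`, then `count()`): event counts per store and event type in fixed tumbling windows. The `ClickEvent` fields and the one-minute window are assumptions, not the talk's schema, and this is not the Kafka Streams API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ClickEvent:
    # Hypothetical event shape; the talk does not publish its schema.
    store_id: str
    event_type: str  # e.g. "visit", "product_view", "add_to_cart", "order"
    timestamp_ms: int


def tumbling_window_counts(events: Iterable[ClickEvent], window_ms: int) -> Counter:
    """Count events per (store, event type, window start) using tumbling windows.

    Each event falls into exactly one window: [start, start + window_ms).
    """
    counts: Counter = Counter()
    for e in events:
        window_start = e.timestamp_ms - (e.timestamp_ms % window_ms)
        counts[(e.store_id, e.event_type, window_start)] += 1
    return counts


if __name__ == "__main__":
    one_minute = 60_000
    events = [
        ClickEvent("store-1", "product_view", 1_000),
        ClickEvent("store-1", "product_view", 59_999),
        ClickEvent("store-1", "product_view", 60_000),  # starts the next window
        ClickEvent("store-2", "add_to_cart", 5_000),
    ]
    for key, n in sorted(tumbling_window_counts(events, one_minute).items()):
        print(key, n)
```

In the architecture described, precomputed aggregates of this kind are what would be stored for the dashboards to query, rather than recounting raw events on every request.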
Real-time Analytics Design Considerations
◿ Guaranteed Event Writes
◿ Resilient Event Streaming Platform
◿ Fault Tolerance
◿ Why Kafka Streams?
◿ Scalable Kubernetes Deployment
◿ NoSQL Database (HBase)
◿ Advanced Querying with Apache Phoenix
◿ TTL to Reduce Data Storage Costs
◿ Message Replay and Idempotent Processing
◿ Security, Auditing, Logging, GDPR, and SOC 2
◿ Overall Pipeline Latency
◿ Business Continuity and Disaster Recovery
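Message replay and idempotent processing go together: if a consumer re-reads messages (for example after its offsets are reset), the results must not change. One common approach, sketched below under assumed field names, is to key each write by a unique event ID so that a replayed message overwrites the same row instead of adding a new one, and to derive aggregates from the stored rows. The in-memory dict stands in for a key-value store such as HBase; the class and function names are hypothetical, not the talk's implementation.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record: (event_id, store_id, order_total_cents)
Event = Tuple[str, str, int]


class IdempotentOrderStore:
    """Upsert keyed on event_id, so replaying a message rewrites the same row.

    Illustrative stand-in for a key-value table whose row key is the event ID.
    """

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[str, int]] = {}

    def upsert(self, event: Event) -> None:
        event_id, store_id, total = event
        self._rows[event_id] = (store_id, total)  # same key -> overwrite, not append

    def revenue_by_store(self) -> Dict[str, int]:
        """Aggregate from stored rows, so duplicates never inflate totals."""
        totals: Dict[str, int] = {}
        for store_id, total in self._rows.values():
            totals[store_id] = totals.get(store_id, 0) + total
        return totals


def consume(store: IdempotentOrderStore, batch: Iterable[Event]) -> None:
    for event in batch:
        store.upsert(event)


if __name__ == "__main__":
    batch = [("evt-1", "store-1", 2500), ("evt-2", "store-1", 1000)]
    store = IdempotentOrderStore()
    consume(store, batch)
    consume(store, batch)  # replay of the same messages
    print(store.revenue_by_store())  # {'store-1': 3500}
```

Incrementing a counter per message would double the total on replay; writing by key and aggregating afterwards keeps the result stable.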
Bigcommerce Real-time Analytics Reports
◿ Store overview report
◿ Purchase funnel
◿ Marketing report
◿ Merchandising report
◿ Carts report
◿ Orders report
◿ Customer report
◿ More...
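The purchase funnel report can be read as a sequence of stage counts. The sketch below computes how many sessions reach each stage and the step-to-step conversion rate. The stage names are assumed from the event types the talk mentions elsewhere (visits, product page views, cart additions, orders), and the sketch only checks that each stage occurred in a session, not the order in which the events arrived.

```python
from typing import Dict, List, Optional, Set, Tuple

# Funnel stages assumed from the event types named in the talk.
FUNNEL_STAGES = ["visit", "product_view", "add_to_cart", "order"]


def purchase_funnel(sessions: Dict[str, Set[str]]) -> List[Tuple[str, int, Optional[float]]]:
    """Return (stage, sessions reaching the stage, conversion from the previous stage).

    A session reaches a stage only if it also reached every earlier stage,
    so counts can only shrink down the funnel.
    """
    rows: List[Tuple[str, int, Optional[float]]] = []
    remaining = set(sessions)
    previous: Optional[int] = None
    for stage in FUNNEL_STAGES:
        remaining = {sid for sid in remaining if stage in sessions[sid]}
        count = len(remaining)
        if previous is None:
            rate = None  # first stage has no previous step
        else:
            rate = count / previous if previous else 0.0
        rows.append((stage, count, rate))
        previous = count
    return rows


if __name__ == "__main__":
    sessions = {
        "s1": {"visit", "product_view", "add_to_cart", "order"},
        "s2": {"visit", "product_view"},
        "s3": {"visit", "product_view", "add_to_cart"},
        "s4": {"visit"},
    }
    for stage, count, rate in purchase_funnel(sessions):
        print(stage, count, "-" if rate is None else f"{rate:.0%}")
```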
Challenges
◿ Bot filtering:
● Bot impact on analytics: bot traffic distorts conversion ratios and purchase funnels.
● Rising bot traffic: bots account for 40% of product page views and shopper visit events.
● Data accuracy: unfiltered bot traffic skews reported results.
● JavaScript non-bot event: a client-side JavaScript event used as a lookup to identify non-bot traffic.
◿ Bulk import of historical orders and catalog data when onboarding a new store.
◿ Repeated order and cart events.
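The JavaScript-event lookup can be pictured as a set-membership filter: sessions that fired an event only a real browser would emit are treated as human, and analytics events from other sessions are dropped. The sketch below is a minimal batch version; the event shape and the `js_beacon` name are hypothetical. In a stream, the JavaScript event may arrive after the server-side events, so a real pipeline would need to buffer or join within a time window, whereas this sketch assumes all of a session's events are already available.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical event shape: (session_id, event_type).
Event = Tuple[str, str]

# Hypothetical name for the event that only a browser running the
# storefront's JavaScript would emit.
JS_EVENT = "js_beacon"


def non_bot_sessions(events: Iterable[Event]) -> Set[str]:
    """Build the lookup set: sessions that fired the JavaScript event."""
    return {session for session, event_type in events if event_type == JS_EVENT}


def filter_bot_events(events: List[Event]) -> Tuple[List[Event], float]:
    """Keep analytics events only from sessions in the lookup set.

    Returns the kept events and the share of analytics events that were dropped.
    """
    humans = non_bot_sessions(events)
    analytics = [e for e in events if e[1] != JS_EVENT]
    kept = [e for e in analytics if e[0] in humans]
    dropped_share = 1 - len(kept) / len(analytics) if analytics else 0.0
    return kept, dropped_share


if __name__ == "__main__":
    events = [
        ("s1", JS_EVENT), ("s1", "product_view"), ("s1", "product_view"),
        ("s2", "product_view"), ("s2", "product_view"),  # no JS event: treated as a bot
        ("s3", JS_EVENT), ("s3", "visit"),
    ]
    kept, dropped = filter_bot_events(events)
    print(len(kept), f"{dropped:.0%}")  # 3 40%
```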
Observability, Monitoring and Alerting
◿ Confluent: Production Cluster Throughput
◿ HBase Read/Write Latency
◿ Processing Jobs: Consumer Group Lag
◿ GCP Dashboard: a bird's-eye view of data pipeline processing health
◿ Alert Policies
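The consumer group lag chart tracks how far the processing jobs trail the topics they read. Per partition, lag is the log-end offset minus the group's committed offset. The sketch below computes that and flags partitions over an alert threshold; the offsets are hard-coded for illustration (in practice they come from Kafka's admin and consumer APIs or Confluent's metrics), and the threshold is a hypothetical tuning value.

```python
from typing import Dict, List


def partition_lag(log_end_offsets: Dict[int, int], committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Consumer lag per partition: log-end offset minus the group's committed offset.

    A partition with no committed offset is counted from offset 0 here, a
    simplification; in practice the starting point depends on auto.offset.reset.
    """
    return {p: end - committed_offsets.get(p, 0) for p, end in log_end_offsets.items()}


def partitions_over_threshold(lags: Dict[int, int], threshold: int) -> List[int]:
    """Partitions whose lag exceeds the (hypothetical) alert threshold."""
    return sorted(p for p, lag in lags.items() if lag > threshold)


if __name__ == "__main__":
    end = {0: 1_000, 1: 1_500, 2: 900}
    committed = {0: 990, 1: 200, 2: 900}
    lags = partition_lag(end, committed)
    print(lags, sum(lags.values()))  # {0: 10, 1: 1300, 2: 0} 1310
    print(partitions_over_threshold(lags, threshold=1_000))  # [1]
```

Alerting on per-partition lag, not only the total, catches a single stuck partition that a healthy aggregate would hide.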
Deployment Strategy
◿ Google Kubernetes Workloads
◿ GCP Logging
Meta Conversion APIs
◿ Meta's Conversions API helps merchants run effective advertising campaigns targeting their customer audiences.
◿ Server-side event transmission: visits, product page views, cart additions, and orders are sent to Meta from the server rather than from the browser.
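To make server-side transmission concrete, the sketch below builds (but does not send) a Conversions API event. It follows the documented shape of an event (`event_name`, `event_time`, `action_source`, `event_id`, `user_data`, `custom_data`), uses Meta's standard event names, and normalizes and SHA-256 hashes the email before it leaves the server. The mapping from internal event types to Meta event names, the function names, and all example values are assumptions, not BigCommerce's actual integration.

```python
import hashlib
import json
import time
from typing import Optional

# Assumed mapping from internal event types to Meta standard event names.
META_EVENT_NAMES = {
    "visit": "PageView",
    "product_view": "ViewContent",
    "add_to_cart": "AddToCart",
    "order": "Purchase",
}


def hash_email(email: str) -> str:
    """Normalize (trim, lowercase) and SHA-256 hash an email for the `em` field."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def build_capi_event(event_type: str, event_id: str, email: str,
                     value: Optional[float] = None, currency: str = "USD",
                     event_time: Optional[int] = None) -> dict:
    event = {
        "event_name": META_EVENT_NAMES[event_type],
        "event_time": event_time if event_time is not None else int(time.time()),
        "action_source": "website",
        "event_id": event_id,  # shared ID lets Meta deduplicate against browser Pixel events
        "user_data": {"em": [hash_email(email)]},
    }
    if value is not None:
        event["custom_data"] = {"currency": currency, "value": value}
    return event


if __name__ == "__main__":
    payload = {"data": [build_capi_event("order", "order-1001", " Shopper@Example.com ",
                                         value=49.99, event_time=1_700_000_000)]}
    # A JSON body like this would be POSTed to the pixel's /events edge on the Graph API.
    print(json.dumps(payload, indent=2))
```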
Data Lake Architecture and Use Cases
◿ Kafka Connect GCS Sink Connector to store and archive raw events in Google Cloud Storage
◿ Batch processing and ad hoc queries
◿ Training machine learning models
◿ Insights: Rockstar Products, Most Abandoned Products, Best Customers, Repeat Purchase Rate
◿ Internal analytics
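Several of the insights listed for the data lake are straightforward batch aggregations once raw events are archived. The sketch below computes two of them, Repeat Purchase Rate and Most Abandoned Products, over in-memory lists. The definitions used (share of purchasing customers with more than one distinct order; products added to a cart but not bought in the same session) are common ones assumed here, and a real job would run on a batch engine over the archived files rather than Python lists.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def repeat_purchase_rate(orders: List[Tuple[str, str]]) -> float:
    """Share of purchasing customers with more than one distinct order.

    `orders` holds (customer_id, order_id) pairs; duplicates are collapsed first,
    so a repeated order event is not mistaken for a repeat purchase.
    """
    orders_per_customer = Counter(customer for customer, _ in set(orders))
    if not orders_per_customer:
        return 0.0
    repeaters = sum(1 for n in orders_per_customer.values() if n > 1)
    return repeaters / len(orders_per_customer)


def most_abandoned_products(carts: Dict[str, Set[str]], purchased: Dict[str, Set[str]],
                            top_n: int = 3) -> List[Tuple[str, int]]:
    """Products added to a cart but not bought in the same session, most frequent first."""
    abandoned: Counter = Counter()
    for session, products in carts.items():
        abandoned.update(products - purchased.get(session, set()))
    return abandoned.most_common(top_n)


if __name__ == "__main__":
    orders = [("c1", "o1"), ("c1", "o2"), ("c1", "o2"), ("c2", "o3"), ("c3", "o4"), ("c3", "o5")]
    print(f"{repeat_purchase_rate(orders):.2f}")  # 0.67
    carts = {"s1": {"mug", "tee"}, "s2": {"mug"}, "s3": {"cap", "mug"}}
    purchased = {"s1": {"tee"}}
    print(most_abandoned_products(carts, purchased))  # [('mug', 3), ('cap', 1)]
```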
AI-Powered Personalized Product Recommendation
◿ For shoppers: enable shoppers to easily discover the products they love
◿ For merchants: enable merchants to easily leverage AI to merchandise their products
◿ Boost shopper engagement and conversion rates
◿ Recommendation types: Others You May Like, Frequently Bought Together
Others You May Like Model
◿ Business objectives
● Click-through rate
○ Product catalog
○ Product page view events
● Conversion rate
○ Product catalog
○ Product page view events
○ Added-to-cart events
◿ Placement type
● Detail page view
◿ Training data used
● Full product catalog data
● 3 months of product page view and cart data
● Real-time data
◿ Model tuning options
● Automatic
● Manually triggered
Frequently Bought Together Model
◿ Business objectives
● Revenue per session
◿ Placement type
● Add to cart
● Registry
◿ Datasets used
● Purchase events
● Product catalog
◿ Training data used
● Full product catalog data
● 1 year of purchase event data
● Real-time data
◿ Automated model build and deployment
◿ Secure, scalable, and performant APIs for serving product recommendations
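The talk trains its recommendation models on historical and real-time events but does not describe their internals. As a rough intuition for the signal a "frequently bought together" model builds on, the sketch below is a naive co-occurrence baseline: it counts product pairs that appear in the same order and returns the most frequent partners of a given product. It is a swapped-in illustration, not the model used in the talk; `min_support` is a hypothetical parameter, and ranking by co-purchase count does not optimize the revenue-per-session objective listed above.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple


def co_purchase_counts(orders: List[Set[str]]) -> Counter:
    """Count how often each unordered product pair appears in the same order."""
    pairs: Counter = Counter()
    for items in orders:
        # Sorting gives each pair one canonical (a, b) key.
        for a, b in combinations(sorted(items), 2):
            pairs[(a, b)] += 1
    return pairs


def frequently_bought_together(orders: List[Set[str]], product: str,
                               top_n: int = 3, min_support: int = 2) -> List[Tuple[str, int]]:
    """Products most often co-purchased with `product`, ignoring pairs seen fewer than `min_support` times."""
    related: Counter = Counter()
    for (a, b), n in co_purchase_counts(orders).items():
        if n < min_support:
            continue
        if a == product:
            related[b] += n
        elif b == product:
            related[a] += n
    return related.most_common(top_n)


if __name__ == "__main__":
    orders = [
        {"tent", "stove", "lantern"},
        {"tent", "stove"},
        {"tent", "lantern"},
        {"tent", "stove", "mug"},
        {"mug", "kettle"},
    ]
    print(frequently_bought_together(orders, "tent"))  # [('stove', 3), ('lantern', 2)]
```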
Personalized Product Recommendation
◿ Model Training: Using Historical and Real-time Data
◿ Serving Recommendations to Shoppers
BigCommerce Data Team
Thank you!
Aristatle Subramaniam (/aristatle)
Mahendra Kumar (/mahendrajkumar)

More Related Content

Similar to Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML, Data Lake, and Beyond

Google Analytics & Web Masters Tools - GBG Mumbai
Google Analytics & Web Masters Tools - GBG MumbaiGoogle Analytics & Web Masters Tools - GBG Mumbai
Google Analytics & Web Masters Tools - GBG MumbaiGBG Mumbai
 
Getting Actionable Insights with Google Analytics - Webinar
Getting Actionable Insights with Google Analytics - Webinar Getting Actionable Insights with Google Analytics - Webinar
Getting Actionable Insights with Google Analytics - Webinar ReapDigital
 
Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...
Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...
Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...National Retail Federation
 
Business Intelligence Challenges 2009
Business Intelligence Challenges 2009Business Intelligence Challenges 2009
Business Intelligence Challenges 2009Lonnell Branch
 
Webanalytics with Microsoft BI
Webanalytics with Microsoft BIWebanalytics with Microsoft BI
Webanalytics with Microsoft BITillmann Eitelberg
 
2011 Web Analytics Seminar
2011 Web Analytics Seminar2011 Web Analytics Seminar
2011 Web Analytics SeminarUnilytics
 
Net suite+crm+++customer+presentation[1]
Net suite+crm+++customer+presentation[1]Net suite+crm+++customer+presentation[1]
Net suite+crm+++customer+presentation[1]Craig Beak
 
Getting Started with AdWords API and Google Analytics
Getting Started with AdWords API and Google AnalyticsGetting Started with AdWords API and Google Analytics
Getting Started with AdWords API and Google Analyticsmarcwan
 
Establish a 360-view of your data with UiPath and Tableau
Establish a 360-view of your data with UiPath and TableauEstablish a 360-view of your data with UiPath and Tableau
Establish a 360-view of your data with UiPath and TableauCristina Vidu
 
LogicScale_Presentation_Linkedin
LogicScale_Presentation_LinkedinLogicScale_Presentation_Linkedin
LogicScale_Presentation_LinkedinChandan Kalita
 
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...Data Con LA
 
Module 1 introduction to web analytics
Module 1   introduction to web analyticsModule 1   introduction to web analytics
Module 1 introduction to web analyticsGayathri Choda
 
Module 1 introduction to web analytics
Module 1   introduction to web analyticsModule 1   introduction to web analytics
Module 1 introduction to web analyticsGayathri Choda
 
Web Analytics and Wordpress - Wordcamp Chicago 2011
Web Analytics and Wordpress - Wordcamp Chicago 2011Web Analytics and Wordpress - Wordcamp Chicago 2011
Web Analytics and Wordpress - Wordcamp Chicago 2011fendmark
 
Designing Outcomes For Usability Nycupa Hurst Final
Designing Outcomes For Usability Nycupa Hurst FinalDesigning Outcomes For Usability Nycupa Hurst Final
Designing Outcomes For Usability Nycupa Hurst FinalWIKOLO
 

Similar to Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML, Data Lake, and Beyond (20)

Google Analytics & Web Masters Tools - GBG Mumbai
Google Analytics & Web Masters Tools - GBG MumbaiGoogle Analytics & Web Masters Tools - GBG Mumbai
Google Analytics & Web Masters Tools - GBG Mumbai
 
Getting Actionable Insights with Google Analytics - Webinar
Getting Actionable Insights with Google Analytics - Webinar Getting Actionable Insights with Google Analytics - Webinar
Getting Actionable Insights with Google Analytics - Webinar
 
Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...
Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...
Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...
 
Dixon lau pm-concept
Dixon lau pm-conceptDixon lau pm-concept
Dixon lau pm-concept
 
Business Intelligence Challenges 2009
Business Intelligence Challenges 2009Business Intelligence Challenges 2009
Business Intelligence Challenges 2009
 
Webanalytics with Microsoft BI
Webanalytics with Microsoft BIWebanalytics with Microsoft BI
Webanalytics with Microsoft BI
 
2011 Web Analytics Seminar
2011 Web Analytics Seminar2011 Web Analytics Seminar
2011 Web Analytics Seminar
 
Net suite+crm+++customer+presentation[1]
Net suite+crm+++customer+presentation[1]Net suite+crm+++customer+presentation[1]
Net suite+crm+++customer+presentation[1]
 
Getting Started with AdWords API and Google Analytics
Getting Started with AdWords API and Google AnalyticsGetting Started with AdWords API and Google Analytics
Getting Started with AdWords API and Google Analytics
 
Establish a 360-view of your data with UiPath and Tableau
Establish a 360-view of your data with UiPath and TableauEstablish a 360-view of your data with UiPath and Tableau
Establish a 360-view of your data with UiPath and Tableau
 
LogicScale_Presentation_Linkedin
LogicScale_Presentation_LinkedinLogicScale_Presentation_Linkedin
LogicScale_Presentation_Linkedin
 
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
 
BI_TVS.ppt
BI_TVS.pptBI_TVS.ppt
BI_TVS.ppt
 
Module 1 introduction to web analytics
Module 1   introduction to web analyticsModule 1   introduction to web analytics
Module 1 introduction to web analytics
 
Module 1 introduction to web analytics
Module 1   introduction to web analyticsModule 1   introduction to web analytics
Module 1 introduction to web analytics
 
WebC2 t1 t2-t3
WebC2 t1 t2-t3WebC2 t1 t2-t3
WebC2 t1 t2-t3
 
Web Analytics and Wordpress - Wordcamp Chicago 2011
Web Analytics and Wordpress - Wordcamp Chicago 2011Web Analytics and Wordpress - Wordcamp Chicago 2011
Web Analytics and Wordpress - Wordcamp Chicago 2011
 
Designing Outcomes For Usability Nycupa Hurst Final
Designing Outcomes For Usability Nycupa Hurst FinalDesigning Outcomes For Usability Nycupa Hurst Final
Designing Outcomes For Usability Nycupa Hurst Final
 
Supplier Performance Management V1 1
Supplier Performance Management V1 1Supplier Performance Management V1 1
Supplier Performance Management V1 1
 
Google Analytics Overview
Google Analytics OverviewGoogle Analytics Overview
Google Analytics Overview
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonHostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolHostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesHostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaHostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonHostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonHostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyHostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersHostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformHostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubHostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonHostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLHostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceHostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondHostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsHostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemHostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksHostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Recently uploaded

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringWSO2
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxMarkSteadman7
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...caitlingebhard1
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformWSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 

Recently uploaded (20)

Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software Engineering
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
AWS Community Day CPH - Three problems of Terraform

Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML, Data Lake, and Beyond

  • 1. Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML, Data Lake, and Beyond Aristatle Subramaniam Lead Data Engineer Bigcommerce Mahendra Kumar VP, Data and Software Engineering Bigcommerce
  • 3. Data Platform Architecture Overview (Agenda) ◿ Data Lake: ad-hoc analysis and ML ◿ Real-time Analytics & Insights: 1.6B events per day ◿ Personalized Product Recommendation: improve conversion ratio and click-through rate ◿ Meta Conversion APIs: run effective ad campaigns ◿ Challenges: bot filtering ◿ Observability, Monitoring & Alerting: charts ◿ Q&A
  • 4. Bigcommerce Data Pipeline Architecture ◿ Importance of Real-time Data Handling ◿ Processing massive volumes of Data ◿ Agility and Adaptability ◿ Data-Driven Decision-Making
  • 6. Real-time Analytics Data Pipeline Architecture ◿ Shopper Click Events ◿ Data Collection with Filebeat ◿ Confluent Kafka for Reliable Event Handling ◿ Real-time Analytics Processing using Kafka Streams ◿ Deployment on Google Kubernetes Engine (GKE) ◿ Data Storage with HBase ◿ Querying with Apache Phoenix ◿ Cloud SQL for Aggregated and Precomputed Results ◿ Python API-Powered Dashboards
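The "Cloud SQL for Aggregated and Precomputed Results" step implies the stream processor rolls raw click events up into per-store counts. The sketch below illustrates one common way to do that, a tumbling-window count keyed by store and event type, in plain Python. It is not the Kafka Streams API and not the presenters' code; the `ClickEvent` fields, the event-type names, and the one-minute window are assumptions chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ClickEvent:
    store_id: str       # hypothetical field: the merchant store the shopper is browsing
    event_type: str     # hypothetical values, e.g. "visit", "product_view", "order"
    timestamp_ms: int   # event time in epoch milliseconds


WINDOW_MS = 60_000  # assumed one-minute tumbling window


def window_start(ts_ms: int, window_ms: int = WINDOW_MS) -> int:
    """Align a timestamp to the start of the tumbling window that contains it."""
    return ts_ms - (ts_ms % window_ms)


def aggregate(events: Iterable[ClickEvent],
              window_ms: int = WINDOW_MS) -> Dict[Tuple[str, str, int], int]:
    """Count events per (store, event type, window start), i.e. a windowed count."""
    counts: Dict[Tuple[str, str, int], int] = defaultdict(int)
    for e in events:
        key = (e.store_id, e.event_type, window_start(e.timestamp_ms, window_ms))
        counts[key] += 1
    return dict(counts)
```

Each resulting row is small and already summarized, which is the kind of precomputed result a relational store such as Cloud SQL can serve quickly to dashboards.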
  • 7. Real-time Analytics Design Considerations ◿ Guaranteed Event Writes ◿ Resilient Event Streaming Platform ◿ Fault Tolerance ◿ Why Kafka Streams? ◿ Scalable Kubernetes Deployment ◿ NoSQL Database (HBase) ◿ Advanced Querying with Apache Phoenix ◿ TTL to Reduce Data Storage Costs ◿ Message Replay and Idempotency ◿ Security, Auditing, Logging, GDPR, and SOC 2 Compliance ◿ Overall Pipeline Latency ◿ Business Continuity and Disaster Recovery
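Two of these considerations interact: message replay is only safe if writes are idempotent, and TTL bounds how long stored events cost money. The in-memory class below is a minimal sketch of that idea under stated assumptions: it stands in for an HBase-like table, uses a hypothetical event ID as the row key so a replayed message rewrites the same row, and expires rows after a TTL. It is not HBase client code; in HBase itself, TTL is configured per column family and repeated puts to a row key create new cell versions, of which reads return the latest.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlEventStore:
    """Stand-in for an HBase-like table: keyed writes overwrite, rows expire after a TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested deterministically
        self._rows: Dict[str, Tuple[dict, float]] = {}

    def put(self, event_id: str, payload: dict) -> None:
        # Row key = event ID (an assumption), so replaying a message rewrites the
        # same row instead of adding a duplicate: the write is idempotent.
        self._rows[event_id] = (payload, self.clock())

    def get(self, event_id: str) -> Optional[dict]:
        row = self._rows.get(event_id)
        if row is None:
            return None
        payload, written_at = row
        if self.clock() - written_at >= self.ttl_seconds:
            # Expired: remove lazily on read, mimicking TTL-based cleanup.
            del self._rows[event_id]
            return None
        return payload

    def count(self) -> int:
        """Number of rows that have not yet expired."""
        now = self.clock()
        return sum(1 for _, written_at in self._rows.values()
                   if now - written_at < self.ttl_seconds)
```

Usage with a fake clock: replaying the same order event three times leaves one row, and that row disappears once the TTL has elapsed.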
  • 8. Bigcommerce Real-time Analytics Reports ◿ Store overview report ◿ Purchase funnel ◿ Marketing report ◿ Merchandising report ◿ Carts report ◿ Orders report ◿ Customer report ◿ And more
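The purchase funnel report is essentially a sequence of stage-to-stage conversion ratios. The function below is a hedged sketch of that calculation over sessions, not the report's actual logic: the stage names are hypothetical (loosely following the visit, product view, cart, and order events mentioned elsewhere in the deck), and a session counts toward a stage only if it also reached every earlier stage.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical funnel stages, in order.
FUNNEL_STAGES = ["visit", "product_view", "add_to_cart", "order"]


def purchase_funnel(sessions: Dict[str, Set[str]]) -> List[Tuple[str, int, float]]:
    """Return (stage, sessions reaching it, conversion rate from the previous stage).

    `sessions` maps a session ID to the set of event types seen in that session.
    """
    reached: List[int] = []
    for i in range(len(FUNNEL_STAGES)):
        required = FUNNEL_STAGES[: i + 1]
        reached.append(sum(1 for seen in sessions.values()
                           if all(stage in seen for stage in required)))
    rows: List[Tuple[str, int, float]] = []
    for i, stage in enumerate(FUNNEL_STAGES):
        base = reached[i - 1] if i > 0 else reached[0]
        rows.append((stage, reached[i], reached[i] / base if base else 0.0))
    return rows
```

With four sessions where three view a product, two add to cart, and one orders, the stage conversion rates come out as 1.0, 0.75, 0.67, and 0.5.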
  • 10. Challenges ◿ Bot filtering: ● Bot Impact on Analytics: distorts conversion ratios and purchase funnels. ● Rising Bot Traffic: bots account for 40% of product page views and shopper visit events. ● Challenges in Data Accuracy: bot traffic can lead to skewed results. ● JavaScript non-bot event: used as a lookup to identify genuine (non-bot) shopper traffic. ◿ Bulk import of historical orders and catalog data when onboarding a new store. ◿ Repeated order and cart events.
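The slide suggests bots are separated out by looking up whether a session fired a JavaScript event that simple bots do not execute, and that repeated order and cart events must be collapsed. The sketch below shows both ideas under clearly labeled assumptions: the beacon event name `js_beacon`, the `session_id` and `entity_id` fields, and keeping the first occurrence of a repeat are all hypothetical choices, not the presenters' implementation.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple

JS_BEACON = "js_beacon"  # hypothetical name of the JavaScript-fired, non-bot event


def js_verified_sessions(js_events: Iterable[Dict]) -> Set[str]:
    """Build the lookup set: sessions that executed JavaScript and fired the beacon."""
    return {e["session_id"] for e in js_events if e.get("event_type") == JS_BEACON}


def split_bot_traffic(events: Iterable[Dict],
                      verified: Set[str]) -> Tuple[List[Dict], List[Dict]]:
    """Separate events into (human, suspected_bot) using the lookup set."""
    human: List[Dict] = []
    bots: List[Dict] = []
    for e in events:
        (human if e["session_id"] in verified else bots).append(e)
    return human, bots


def drop_repeats(events: Iterable[Dict],
                 key_fields: Sequence[str] = ("event_type", "entity_id")) -> List[Dict]:
    """Keep only the first occurrence of each repeated order/cart event."""
    seen = set()
    unique: List[Dict] = []
    for e in events:
        key = tuple(e[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(e)
    return unique
```

A real deployment would also need to handle late-arriving beacons (an event can reach the pipeline before its session's JavaScript event), which this sketch ignores.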
  • 12. Observability, Monitoring and Alerting Confluent - Production Cluster Throughput
  • 13. Observability, Monitoring and Alerting HBase Read/Write latency
  • 14. Observability, Monitoring and Alerting Processing Jobs - Consumer Group Lag
  • 15. Observability, Monitoring and Alerting GCP Dashboard - A bird's-eye view of the data pipeline processing health.
  • 16. Observability, Monitoring and Alerting Alert Policies
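Consumer group lag, charted on slide 14, is defined per partition as the log-end offset minus the group's committed offset. The sketch below computes it from plain dictionaries and adds one simple alerting rule, firing only when total lag stays above a threshold for several consecutive samples to avoid flapping. The offset values, threshold, and "consecutive samples" rule are assumptions for illustration, not the deck's actual alert policies; fetching real offsets would use a Kafka admin client, which is omitted here.

```python
from typing import Dict, Iterable


def consumer_lag(end_offsets: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition = log-end offset minus the group's committed offset.

    Assumption: a partition with no committed offset is counted from offset 0,
    which overstates lag if retention has already deleted the oldest records.
    """
    return {p: end - committed.get(p, 0) for p, end in end_offsets.items()}


def total_lag(lag: Dict[int, int]) -> int:
    return sum(lag.values())


def sustained_breach(samples: Iterable[int], threshold: int, consecutive: int) -> bool:
    """True if `consecutive` samples in a row exceed `threshold`."""
    run = 0
    for sample in samples:
        run = run + 1 if sample > threshold else 0
        if run >= consecutive:
            return True
    return False
```

Requiring a sustained breach trades a little detection delay for fewer false alarms during short bursts, such as a bulk import of historical orders.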
  • 20. Meta Conversion APIs ◿ Meta’s Conversion API helps merchants run effective advertising campaigns targeting their customer audiences. ◿ Server-Side Event Transmission APIs: visits, product page views, cart additions, and orders
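Server-side transmission means the pipeline builds each event payload itself instead of relying on the browser pixel. The sketch below assembles a payload using parameters documented for Meta's Conversions API (`event_name`, `event_time` in Unix seconds, `event_id`, `action_source`, `event_source_url`, `user_data` with a SHA-256 hashed, normalized email in `em`, and `custom_data` with `value` and `currency`). The mapping from internal event types to Meta standard event names is a hypothetical choice, and actually sending the payload (a POST to the Graph API `/<PIXEL_ID>/events` endpoint with an access token) is deliberately left out.

```python
import hashlib
import json
import time
from typing import Optional

# Hypothetical mapping from internal event types to Meta standard event names.
EVENT_NAME_MAP = {
    "visit": "PageView",
    "product_view": "ViewContent",
    "add_to_cart": "AddToCart",
    "order": "Purchase",
}


def hash_email(email: str) -> str:
    """Normalize (trim, lowercase) then SHA-256 hash, as Meta expects for `em`."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def build_server_event(event_type: str, event_id: str, email: str, url: str,
                       value: Optional[float] = None, currency: Optional[str] = None,
                       event_time: Optional[int] = None) -> dict:
    event = {
        "event_name": EVENT_NAME_MAP[event_type],
        "event_time": int(event_time if event_time is not None else time.time()),
        # Sharing an event_id with the browser pixel lets Meta deduplicate the pair.
        "event_id": event_id,
        "action_source": "website",
        "event_source_url": url,
        "user_data": {"em": [hash_email(email)]},
    }
    if value is not None:
        event["custom_data"] = {"value": value, "currency": currency}
    return {"data": [event]}


if __name__ == "__main__":
    payload = build_server_event("order", "ord-1", " Shopper@Example.com ",
                                 "https://shop.example/checkout", 49.99, "USD")
    print(json.dumps(payload, indent=2))
```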
  • 22. Data Lake Architecture and Use Cases ◿ GCS Sink Connector to store and archive raw events ◿ Batch processing and ad hoc queries ◿ Training the machine learning models ◿ Insights: Rockstar Products, Most Abandoned Products, Best Customers, Repeat Purchase Rate ◿ Internal Analytics
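Two of the listed insights reduce to simple batch computations over archived events. The sketch below shows one plausible definition of each, assuming hypothetical `customer_id`, `cart_id`, and `product_id` fields: repeat purchase rate as the share of purchasing customers with two or more orders, and "most abandoned" as products added to a cart that was never purchased from that same cart. The presenters' exact definitions may differ.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def repeat_purchase_rate(orders: Iterable[Dict]) -> float:
    """Share of purchasing customers who placed two or more orders."""
    per_customer = Counter(o["customer_id"] for o in orders)
    if not per_customer:
        return 0.0
    repeaters = sum(1 for n in per_customer.values() if n >= 2)
    return repeaters / len(per_customer)


def most_abandoned_products(cart_adds: Iterable[Dict], purchases: Iterable[Dict],
                            top_n: int = 3) -> List[Tuple[str, int]]:
    """Rank products by how often they were added to a cart but not bought from it."""
    purchased = {(p["cart_id"], p["product_id"]) for p in purchases}
    abandoned = Counter(
        a["product_id"] for a in cart_adds
        if (a["cart_id"], a["product_id"]) not in purchased
    )
    return abandoned.most_common(top_n)
```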
  • 24. AI-powered Personalized Product Recommendation ◿ For shoppers: enable shoppers to easily discover the products they love ◿ For merchants: enable merchants to easily leverage AI to merchandise their products ◿ Boost shopper engagement and conversion rates
  • 26. Others You May Like Model ◿ Business objectives ● Click-through rate ○ Product catalog ○ Product Page View events ● Conversion rate ○ Product catalog ○ Product Page View events ○ Added To Cart events ◿ Placement type ● Detail page view ◿ Training data used ● Full product catalog data ● Product Page View and Cart events: 3 months of data ● Real-time data ◿ Model tuning options ● Automatic ● Manually triggered
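The two recommendation models train on different event types and lookback windows (3 months of page-view and cart events here, 1 year of purchase events for Frequently Bought Together on the next slide), alongside the full catalog. The sketch below expresses that selection as a small per-model config. The model keys, event-type names, and the approximation of "3 months" and "1 year" as 90 and 365 days are assumptions, and it says nothing about how the models themselves are trained.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, List

# Per-model training windows; day counts approximate the slides' "3 months" and "1 year".
TRAINING_WINDOWS: Dict[str, Dict] = {
    "others_you_may_like": {
        "event_types": {"product_view", "add_to_cart"},
        "lookback": timedelta(days=90),
    },
    "frequently_bought_together": {
        "event_types": {"order"},
        "lookback": timedelta(days=365),
    },
}


def select_training_events(model: str, events: Iterable[Dict],
                           now: datetime) -> List[Dict]:
    """Keep events of the model's types whose timestamp falls inside its lookback window."""
    cfg = TRAINING_WINDOWS[model]
    cutoff = now - cfg["lookback"]
    return [e for e in events
            if e["event_type"] in cfg["event_types"] and e["ts"] >= cutoff]
```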
  • 27. Frequently Bought Together ◿ Business objectives ● Revenue per session ◿ Placement type ● Add to cart ● Registry ◿ Datasets used ● Purchase events ● Product catalog ◿ Training data used ● Full product catalog data ● Purchase event data: 1 year ● Real-time data ◿ Automated model build and deployment ◿ Secure, scalable, and performant APIs for serving product recommendations
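To make the "frequently bought together" idea concrete, the sketch below uses the simplest possible technique, counting how often product pairs co-occur in the same order and ranking a product's partners by that count. This is a swapped-in illustration, not the model described on the slide (which is trained automatically on a year of purchase events plus real-time data); the product IDs are made up.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Tuple


def co_purchase_counts(orders: Iterable[Iterable[str]]) -> Counter:
    """Count unordered product pairs appearing together in an order."""
    pairs: Counter = Counter()
    for items in orders:
        # sorted(set(...)) gives each pair one canonical (a, b) key with a < b.
        for a, b in combinations(sorted(set(items)), 2):
            pairs[(a, b)] += 1
    return pairs


def frequently_bought_together(product_id: str, pairs: Counter,
                               top_n: int = 3) -> List[str]:
    """Rank the products most often bought alongside `product_id`."""
    scores: Counter = Counter()
    for (a, b), n in pairs.items():
        if a == product_id:
            scores[b] += n
        elif b == product_id:
            scores[a] += n
    return [p for p, _ in scores.most_common(top_n)]
```

Raw co-occurrence favors best-sellers that appear in many orders; production systems usually normalize for popularity, which this sketch does not.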
  • 28. Personalized Product Recommendation Model Training - Using Historical and Real-time Data
  • 29. Personalized Product Recommendation Serving recommendations to shoppers