Scalable E-Commerce Data Pipelines with Kafka:
Real-Time Analytics, Batch, ML, Data Lake, and Beyond
Aristatle Subramaniam
Lead Data Engineer
BigCommerce
Mahendra Kumar
VP, Data and Software Engineering
BigCommerce
Agenda
◿ Data Platform Architecture: Overview
◿ Data Lake: Ad-hoc analysis and ML
◿ Real-time Analytics & Insights: 1.6B events per day
◿ Personalized Product Recommendation: Improve conversion ratio and click-through rate
◿ Meta Conversion APIs: Run effective ad campaigns
◿ Challenges: Bot filtering
◿ Observability, Monitoring & Alerting: Charts
◿ Q&A
BigCommerce Data Pipeline Architecture
◿ Importance of Real-time Data Handling
◿ Processing massive volumes of Data
◿ Agility and Adaptability
◿ Data-Driven Decision-Making
Real-time Analytics Data Pipeline Architecture
◿ Shopper Click Events
◿ Data Collection with Filebeat
◿ Confluent Kafka for Reliable Event Handling
◿ Real-time Analytics Processing using Kafka Streams
◿ Deployment in Google Kubernetes Engine (GKE)
◿ Data Storage with HBase
◿ Querying with Apache Phoenix
◿ Cloud SQL for Aggregated and Precomputed results
◿ Python API-Powered Dashboards
Real-time Analytics Design Considerations
◿ Guaranteed Event Writes
◿ Resilient Event Streaming platform
◿ Fault tolerance
◿ Why Kafka Streams?
◿ Scalable Kubernetes Deployment
◿ NoSQL Database (HBase)
◿ Advanced Querying with Apache Phoenix
◿ TTL to reduce data storage costs
◿ Message Replay and Idempotent Processing
◿ Security, Audit, Log, GDPR, SOC2
◿ Overall Pipeline Latency
◿ Business Continuity and Disaster Recovery
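The message replay, idempotency, and TTL considerations above can be illustrated with a small sketch. It is not the pipeline's actual code: the `row_key` layout, the `TtlStore` class, and the event fields `store_id` and `event_id` are hypothetical, and an in-memory dictionary stands in for the HBase table. The idea is that a deterministic row key turns every write into an upsert, so replaying a Kafka topic overwrites rows instead of duplicating them, while a TTL lets old rows expire.

```python
import hashlib
import time


def row_key(store_id: str, event_id: str) -> str:
    """Deterministic key: the same event always maps to the same row."""
    digest = hashlib.sha256(f"{store_id}:{event_id}".encode("utf-8")).hexdigest()
    return f"{store_id}#{digest[:16]}"


class TtlStore:
    """In-memory stand-in for a key-value table whose entries expire after a TTL."""

    def __init__(self, ttl_seconds: float):
        self.ttl_seconds = ttl_seconds
        self.rows = {}  # key -> (written_at, value)

    def put(self, key, value, now=None):
        now = time.time() if now is None else now
        self.rows[key] = (now, value)  # upsert: a replayed event overwrites itself

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.rows.get(key)
        if entry is None or now - entry[0] > self.ttl_seconds:
            return None  # missing or expired
        return entry[1]


def apply_events(store, events, now=None):
    """Write each event under its deterministic key."""
    for event in events:
        store.put(row_key(event["store_id"], event["event_id"]), event, now=now)


events = [
    {"store_id": "s1", "event_id": "e1", "type": "visit"},
    {"store_id": "s1", "event_id": "e2", "type": "order"},
]
store = TtlStore(ttl_seconds=3600)
apply_events(store, events, now=0)
apply_events(store, events, now=10)  # replay: still two rows
```

Because the key depends only on the event's identity, at-least-once delivery and deliberate replays are both safe in this sketch.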
BigCommerce Real-time Analytics Reports
◿ Store overview report
◿ Purchase funnel
◿ Marketing report
◿ Merchandising report
◿ Carts report
◿ Orders report
◿ Customer report
◿ and more
Challenges
◿ Bot filtering:
● Bot impact on analytics - affects conversion ratios and purchase funnels.
● Rising bot traffic - 40% of product page views and shopper visit events.
● Challenges in data accuracy - bot traffic can lead to skewed results.
● JavaScript non-bot event - used for lookup.
◿ Bulk import of historical orders and catalog when onboarding a new store.
◿ Repeated Orders and Carts events.
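One way to read the "JavaScript non-bot event" lookup and the repeated-event challenge is sketched below. This is an assumption about the approach, not the team's implementation: it supposes that sessions which executed the storefront JavaScript emit a marker event, that server-side events carry a `session_id`, and that each event has an `event_id`. All field and function names are hypothetical.

```python
def filter_bot_events(server_events, js_session_ids):
    """Keep only events whose session also produced a JavaScript marker event."""
    return [e for e in server_events if e["session_id"] in js_session_ids]


def dedupe_events(events, key="event_id"):
    """Drop repeated events, keeping the first occurrence of each key."""
    seen = set()
    unique = []
    for event in events:
        if event[key] not in seen:
            seen.add(event[key])
            unique.append(event)
    return unique


server_events = [
    {"event_id": "1", "session_id": "a", "type": "product_page_view"},
    {"event_id": "2", "session_id": "bot-7", "type": "product_page_view"},
    {"event_id": "3", "session_id": "a", "type": "order"},
    {"event_id": "3", "session_id": "a", "type": "order"},  # repeated order event
]
js_session_ids = {"a"}  # sessions that fired the JavaScript marker event

clean = dedupe_events(filter_bot_events(server_events, js_session_ids))
```

In a streaming job the lookup set would more likely live in a state store or an external table than in memory; the set here only keeps the sketch self-contained.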
Observability, Monitoring and Alerting
◿ Confluent - Production Cluster Throughput
◿ HBase Read/Write latency
◿ Processing Jobs - Consumer Group Lag
◿ GCP Dashboard - A bird's-eye view of the data pipeline processing health.
◿ Alert Policies
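Consumer group lag, one of the charts listed above, is the difference between a partition's latest offset and the offset the consumer group has committed. The sketch below computes it from two plain dictionaries and applies a simple threshold check. The function names and the threshold are hypothetical, and a real deployment would read these offsets from Kafka or from its monitoring system rather than from literals.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus committed offset, floored at zero."""
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in end_offsets.items()
    }


def should_alert(lag_by_partition, threshold):
    """Alert when the total lag across partitions exceeds the threshold."""
    return sum(lag_by_partition.values()) > threshold


end_offsets = {0: 1_000, 1: 2_500, 2: 400}
committed_offsets = {0: 990, 1: 1_500}  # partition 2 has no commit yet

lag = consumer_lag(end_offsets, committed_offsets)
alert = should_alert(lag, threshold=1_000)
```

Treating a partition with no committed offset as fully lagging is a design choice in this sketch; some setups prefer to exclude such partitions until the first commit.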
Deployment Strategy
◿ Google Kubernetes workloads
◿ GCP - Logging
Meta Conversion APIs
◿ Meta’s Conversion API helps merchants effectively run advertising campaigns on customer audiences.
◿ Server-side event transmission APIs: visits, product page views, cart additions, and orders
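A server-side event for Meta's Conversions API might be assembled roughly as follows. This is a sketch under assumptions: the field names (`event_name`, `event_time`, `action_source`, `user_data.em`, `custom_data`) and the SHA-256 hashing of a normalized email reflect my reading of Meta's public documentation and should be verified there. The helper names are hypothetical, and nothing is sent over the network.

```python
import hashlib
import json
import time


def hash_identifier(value: str) -> str:
    """Normalize (trim, lowercase) and SHA-256 hash a customer identifier."""
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()


def build_event(event_name, email, event_time=None, custom_data=None):
    """Build one server event; the field names are assumptions to verify."""
    return {
        "event_name": event_name,
        "event_time": int(time.time()) if event_time is None else event_time,
        "action_source": "website",
        "user_data": {"em": [hash_identifier(email)]},
        "custom_data": custom_data or {},
    }


payload = {
    "data": [
        build_event(
            "AddToCart",
            " Shopper@Example.com ",
            event_time=1_700_000_000,
            custom_data={"currency": "USD", "value": 19.99},
        )
    ]
}
body = json.dumps(payload)  # request body a sender would POST
```

Hashing identifiers before they leave the pipeline keeps raw emails out of the outbound request, which also fits the security and GDPR considerations listed earlier in the deck.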
Data Lake Architecture and Use Cases
◿ GCS Sink Connector to store and archive raw events
◿ Batch processing and ad hoc queries
◿ Training data for the machine learning model
◿ Insights - Rockstar Products, Most Abandoned Products, Best Customers, Repeat Purchase Rate
◿ Internal Analytics
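As an example of the batch insights listed above, repeat purchase rate can be computed from archived order events. The slides do not define the metric, so this sketch assumes a common definition: the share of customers who placed more than one order. The `customer_id` field is hypothetical.

```python
from collections import Counter


def repeat_purchase_rate(orders):
    """Fraction of customers with more than one order (assumed definition)."""
    orders_per_customer = Counter(order["customer_id"] for order in orders)
    if not orders_per_customer:
        return 0.0
    repeat_customers = sum(1 for n in orders_per_customer.values() if n > 1)
    return repeat_customers / len(orders_per_customer)


orders = [
    {"order_id": 1, "customer_id": "c1"},
    {"order_id": 2, "customer_id": "c1"},
    {"order_id": 3, "customer_id": "c2"},
    {"order_id": 4, "customer_id": "c3"},
    {"order_id": 5, "customer_id": "c3"},
    {"order_id": 6, "customer_id": "c3"},
]
rate = repeat_purchase_rate(orders)  # c1 and c3 repeat, c2 does not
```

At data lake scale the same aggregation would normally run in a batch engine over the archived files; the in-memory version only shows the arithmetic.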
AI-powered Personalized Product Recommendation
For shoppers
Enable shoppers to easily discover the products they love
For merchants
Enable merchants to easily leverage AI to merchandise their products
Boost shopper engagement and conversion rates
Others You May Like
Frequently Bought Together
Others you may like model
◿ Business objectives
● Click through rate
○ Product catalog
○ Product Page View events
● Conversion rate
○ Product catalog
○ Product Page View events
○ Added To Cart Events
◿ Placement Type
● Detail page view
◿ Training data used
● Full product catalog data
● Product Page View and Cart: 3 months of data
● Real-time data
◿ Model tuning options
● Automatic
● Manually triggered
Frequently bought together
◿ Business objectives
● Revenue per session
◿ Placement Type
● Add to cart
● Registry
◿ Datasets used:
● Purchase events
● Product catalog
◿ Training data used
● Full product catalog data
● Purchase events data for 1 year
● Real Time data
◿ Automated model build and deployment
◿ Provide secure, scalable and performant APIs for serving product recommendations
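To make the "Frequently bought together" idea concrete, here is a deliberately simple co-occurrence baseline built from purchase events. It is not the model described in the slides, which is trained on a year of purchase data plus the product catalog. It only illustrates how purchase events can yield "bought with" candidates, and the function names and sample products are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(orders):
    """Count how often each unordered product pair appears in the same order."""
    counts = Counter()
    for items in orders:
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return counts


def bought_with(product, counts, top_n=3):
    """Products most often purchased together with the given product."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == product:
            scores[b] += n
        elif b == product:
            scores[a] += n
    return [p for p, _ in scores.most_common(top_n)]


orders = [
    ["tent", "stove"],
    ["tent", "stove", "lamp"],
    ["lamp", "rope"],
]
counts = co_purchase_counts(orders)
suggestions = bought_with("tent", counts)
```

A production model would also account for popularity bias, catalog attributes, and real-time signals, none of which this baseline attempts.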
Personalized Product Recommendation
◿ Model Training - Using Historical and Real-time Data
◿ Serving recommendations to shoppers
BigCommerce Data Team
Thank you!
Aristatle Subramaniam /aristatle
Mahendra Kumar /mahendrajkumar

More Related Content

Similar to Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML, Data Lake, and Beyond

Google Analytics & Web Masters Tools - GBG Mumbai
Google Analytics & Web Masters Tools - GBG MumbaiGoogle Analytics & Web Masters Tools - GBG Mumbai
Google Analytics & Web Masters Tools - GBG MumbaiGBG Mumbai
 
Getting Actionable Insights with Google Analytics - Webinar
Getting Actionable Insights with Google Analytics - Webinar Getting Actionable Insights with Google Analytics - Webinar
Getting Actionable Insights with Google Analytics - Webinar ReapDigital
 
Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...
Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...
Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...National Retail Federation
 
Business Intelligence Challenges 2009
Business Intelligence Challenges 2009Business Intelligence Challenges 2009
Business Intelligence Challenges 2009Lonnell Branch
 
Webanalytics with Microsoft BI
Webanalytics with Microsoft BIWebanalytics with Microsoft BI
Webanalytics with Microsoft BITillmann Eitelberg
 
2011 Web Analytics Seminar
2011 Web Analytics Seminar2011 Web Analytics Seminar
2011 Web Analytics SeminarUnilytics
 
Net suite+crm+++customer+presentation[1]
Net suite+crm+++customer+presentation[1]Net suite+crm+++customer+presentation[1]
Net suite+crm+++customer+presentation[1]Craig Beak
 
Getting Started with AdWords API and Google Analytics
Getting Started with AdWords API and Google AnalyticsGetting Started with AdWords API and Google Analytics
Getting Started with AdWords API and Google Analyticsmarcwan
 
Establish a 360-view of your data with UiPath and Tableau
Establish a 360-view of your data with UiPath and TableauEstablish a 360-view of your data with UiPath and Tableau
Establish a 360-view of your data with UiPath and TableauCristina Vidu
 
LogicScale_Presentation_Linkedin
LogicScale_Presentation_LinkedinLogicScale_Presentation_Linkedin
LogicScale_Presentation_LinkedinChandan Kalita
 
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...Data Con LA
 
Module 1 introduction to web analytics
Module 1   introduction to web analyticsModule 1   introduction to web analytics
Module 1 introduction to web analyticsGayathri Choda
 
Module 1 introduction to web analytics
Module 1   introduction to web analyticsModule 1   introduction to web analytics
Module 1 introduction to web analyticsGayathri Choda
 
Web Analytics and Wordpress - Wordcamp Chicago 2011
Web Analytics and Wordpress - Wordcamp Chicago 2011Web Analytics and Wordpress - Wordcamp Chicago 2011
Web Analytics and Wordpress - Wordcamp Chicago 2011fendmark
 
Designing Outcomes For Usability Nycupa Hurst Final
Designing Outcomes For Usability Nycupa Hurst FinalDesigning Outcomes For Usability Nycupa Hurst Final
Designing Outcomes For Usability Nycupa Hurst FinalWIKOLO
 

Similar to Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML, Data Lake, and Beyond (20)

Google Analytics & Web Masters Tools - GBG Mumbai
Google Analytics & Web Masters Tools - GBG MumbaiGoogle Analytics & Web Masters Tools - GBG Mumbai
Google Analytics & Web Masters Tools - GBG Mumbai
 
Getting Actionable Insights with Google Analytics - Webinar
Getting Actionable Insights with Google Analytics - Webinar Getting Actionable Insights with Google Analytics - Webinar
Getting Actionable Insights with Google Analytics - Webinar
 
Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...
Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...
Not Tooling Around: How The Home Depot Uses Machine Learning for Vendor Accou...
 
Dixon lau pm-concept
Dixon lau pm-conceptDixon lau pm-concept
Dixon lau pm-concept
 
Business Intelligence Challenges 2009
Business Intelligence Challenges 2009Business Intelligence Challenges 2009
Business Intelligence Challenges 2009
 
Webanalytics with Microsoft BI
Webanalytics with Microsoft BIWebanalytics with Microsoft BI
Webanalytics with Microsoft BI
 
2011 Web Analytics Seminar
2011 Web Analytics Seminar2011 Web Analytics Seminar
2011 Web Analytics Seminar
 
Net suite+crm+++customer+presentation[1]
Net suite+crm+++customer+presentation[1]Net suite+crm+++customer+presentation[1]
Net suite+crm+++customer+presentation[1]
 
Getting Started with AdWords API and Google Analytics
Getting Started with AdWords API and Google AnalyticsGetting Started with AdWords API and Google Analytics
Getting Started with AdWords API and Google Analytics
 
Establish a 360-view of your data with UiPath and Tableau
Establish a 360-view of your data with UiPath and TableauEstablish a 360-view of your data with UiPath and Tableau
Establish a 360-view of your data with UiPath and Tableau
 
LogicScale_Presentation_Linkedin
LogicScale_Presentation_LinkedinLogicScale_Presentation_Linkedin
LogicScale_Presentation_Linkedin
 
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
Big Data Day LA 2015 - Event Driven Architecture for Web Analytics by Peyman ...
 
BI_TVS.ppt
BI_TVS.pptBI_TVS.ppt
BI_TVS.ppt
 
Module 1 introduction to web analytics
Module 1   introduction to web analyticsModule 1   introduction to web analytics
Module 1 introduction to web analytics
 
Module 1 introduction to web analytics
Module 1   introduction to web analyticsModule 1   introduction to web analytics
Module 1 introduction to web analytics
 
WebC2 t1 t2-t3
WebC2 t1 t2-t3WebC2 t1 t2-t3
WebC2 t1 t2-t3
 
Web Analytics and Wordpress - Wordcamp Chicago 2011
Web Analytics and Wordpress - Wordcamp Chicago 2011Web Analytics and Wordpress - Wordcamp Chicago 2011
Web Analytics and Wordpress - Wordcamp Chicago 2011
 
Designing Outcomes For Usability Nycupa Hurst Final
Designing Outcomes For Usability Nycupa Hurst FinalDesigning Outcomes For Usability Nycupa Hurst Final
Designing Outcomes For Usability Nycupa Hurst Final
 
Supplier Performance Management V1 1
Supplier Performance Management V1 1Supplier Performance Management V1 1
Supplier Performance Management V1 1
 
Google Analytics Overview
Google Analytics OverviewGoogle Analytics Overview
Google Analytics Overview
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonHostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolHostedbyConfluent
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesHostedbyConfluent
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaHostedbyConfluent
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonHostedbyConfluent
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonHostedbyConfluent
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyHostedbyConfluent
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...HostedbyConfluent
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...HostedbyConfluent
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersHostedbyConfluent
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformHostedbyConfluent
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubHostedbyConfluent
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonHostedbyConfluent
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLHostedbyConfluent
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceHostedbyConfluent
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondHostedbyConfluent
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsHostedbyConfluent
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemHostedbyConfluent
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksHostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Recently uploaded

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 

Recently uploaded (20)

Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The transition to renewables in India.pdf
Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML, Data Lake, and Beyond

  • 1. Scalable E-Commerce Data Pipelines with Kafka: Real-Time Analytics, Batch, ML, Data Lake, and Beyond
      Aristatle Subramaniam, Lead Data Engineer, Bigcommerce
      Mahendra Kumar, VP, Data and Software Engineering, Bigcommerce
  • 3. Agenda
      ◿ Data Platform Architecture: Overview
      ◿ Data Lake: Ad-hoc analysis, and ML
      ◿ Real-time Analytics & Insights: 1.6B events per day
      ◿ Personalized Product Recommendation: Improve conversion ratio and click through rate
      ◿ Meta Conversion APIs: Run effective Ad campaigns
      ◿ Challenges: Bot filtering
      ◿ Observability, Monitoring & Alerting: Charts
      ◿ Q&A
  • 4. Bigcommerce Data Pipeline Architecture
      ◿ Importance of Real-time Data Handling
      ◿ Processing massive volumes of Data
      ◿ Agility and Adaptability
      ◿ Data-Driven Decision-Making
  • 6. Real-time Analytics Data Pipeline Architecture
      ◿ Shopper Click Events
      ◿ Data Collection with Filebeat
      ◿ Confluent Kafka for Reliable Event Handling
      ◿ Real-time Analytics Processing using Kafka Streams
      ◿ Deployment in Google Kubernetes
      ◿ Data Storage with HBase
      ◿ Querying with Apache Phoenix
      ◿ Cloud SQL for Aggregated and Precomputed results
      ◿ Python API-Powered Dashboards
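The stream-processing stage in this pipeline can be illustrated with a minimal sketch in plain Python. This is not Kafka Streams code and not the presenters' implementation; it only shows the idea of grouping click events into tumbling windows so that precomputed aggregates can be served to dashboards. The event fields (`store_id`, `event_type`, `ts`) and the 60-second window are assumptions, since the slides do not describe the event schema.

```python
from collections import Counter

WINDOW_SECONDS = 60  # assumed window size; not stated in the slides


def window_start(ts: int, size: int = WINDOW_SECONDS) -> int:
    """Return the start of the tumbling window that contains timestamp ts."""
    return ts - (ts % size)


def aggregate(events):
    """Count events per (store_id, event_type, window_start)."""
    counts = Counter()
    for event in events:
        key = (event["store_id"], event["event_type"], window_start(event["ts"]))
        counts[key] += 1
    return counts


# Hypothetical click events (field names are illustrative only)
events = [
    {"store_id": "s1", "event_type": "product_view", "ts": 120},
    {"store_id": "s1", "event_type": "product_view", "ts": 150},
    {"store_id": "s1", "event_type": "product_view", "ts": 185},
    {"store_id": "s2", "event_type": "add_to_cart", "ts": 130},
]
counts = aggregate(events)
```

Each key in `counts` corresponds to one row that a downstream store could hold per store, event type, and window.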
  • 7. Real-time Analytics Design Considerations
      ◿ Guaranteed Event Writes
      ◿ Resilient Event Streaming platform
      ◿ Fault tolerance
      ◿ Why Kafka Streams?
      ◿ Scalable Kubernetes Deployment
      ◿ NoSQL Database (HBase)
      ◿ Advanced Querying with Apache Phoenix
      ◿ TTL to reduce data storage costs
      ◿ Message Replay and Idempotency
      ◿ Security, Audit, Log, GDPR, SOC2
      ◿ Overall Pipeline Latency
      ◿ Business Continuity and Disaster Recovery
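Two of these considerations, message replay with idempotent writes and TTL-based expiry, can be sketched together. The example below is a hedged illustration using an in-memory dict as a stand-in for a keyed table; it is not HBase or Phoenix code. The `event_id` key, the `ts` field, and the TTL value are assumptions introduced for the example.

```python
def apply_events(table: dict, events: list) -> dict:
    """Upsert each event under its ID, so replays overwrite instead of duplicating."""
    for event in events:
        table[event["event_id"]] = event
    return table


def purge_expired(table: dict, now: int, ttl_seconds: int) -> int:
    """Delete rows whose event timestamp is older than the TTL; return the count."""
    expired = [key for key, row in table.items() if now - row["ts"] >= ttl_seconds]
    for key in expired:
        del table[key]
    return len(expired)


# Hypothetical batch of order events
batch = [
    {"event_id": "e1", "ts": 100, "order_total": 20.0},
    {"event_id": "e2", "ts": 200, "order_total": 35.0},
]
table = {}
apply_events(table, batch)
apply_events(table, batch)  # replaying the same batch leaves the table unchanged
revenue = sum(row["order_total"] for row in table.values())
removed = purge_expired(table, now=1000, ttl_seconds=850)
```

Because rows are keyed by event ID, a replay after a failure does not inflate totals, and expiring rows by age keeps the stored volume bounded.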
  • 8. Bigcommerce Real-time Analytics Reports
      ◿ Store overview report
      ◿ Purchase funnel
      ◿ Marketing report
      ◿ Merchandising report
      ◿ Carts report
      ◿ Orders report
      ◿ Customer report
      ◿ more..
  • 10. Challenges
      ◿ Bot filtering:
          ● Bot Impact on Analytics - Conversion ratios and purchase funnels.
          ● Rising Bot Traffic - 40% of product page views and shopper visit events.
          ● Challenges in Data Accuracy - bot traffic can lead to skewed results.
          ● JavaScript non-bot event - for lookup
      ◿ Bulk import of historical orders and catalog for onboarding a new store.
      ◿ Repeated Orders and Carts events.
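The slide only hints at how these challenges are handled, so the following is one possible reading rather than the presenters' method. It assumes that a visit is treated as human when a JavaScript-emitted event exists for it (the "lookup"), and that repeated order or cart events are collapsed by keeping the latest event per entity. All field names (`visit_id`, `event_type`, `entity_id`, `ts`) are hypothetical.

```python
def filter_bots(events, js_visit_ids):
    """Keep only events whose visit also produced a JavaScript event (assumed non-bot)."""
    return [e for e in events if e["visit_id"] in js_visit_ids]


def latest_per_entity(events):
    """Collapse repeated events by keeping the newest one per (event_type, entity_id)."""
    latest = {}
    for e in events:
        key = (e["event_type"], e["entity_id"])
        if key not in latest or e["ts"] > latest[key]["ts"]:
            latest[key] = e
    return list(latest.values())


# Hypothetical events; visit v2 never emitted a JavaScript event
events = [
    {"visit_id": "v1", "event_type": "product_view", "entity_id": "p1", "ts": 1},
    {"visit_id": "v2", "event_type": "product_view", "entity_id": "p1", "ts": 2},
    {"visit_id": "v1", "event_type": "order", "entity_id": "o9", "ts": 3},
    {"visit_id": "v1", "event_type": "order", "entity_id": "o9", "ts": 5},  # repeated order event
]
js_visit_ids = {"v1"}
clean = latest_per_entity(filter_bots(events, js_visit_ids))
```

Filtering before deduplication keeps bot-generated events from ever becoming the "latest" record for an entity.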
  • 12. Observability, Monitoring and Alerting: Confluent - Production Cluster Throughput
  • 13. Observability, Monitoring and Alerting: HBase Read/Write latency
  • 14. Observability, Monitoring and Alerting: Processing Jobs - Consumer Group Lag
  • 15. Observability, Monitoring and Alerting: GCP Dashboard - A bird's-eye view of the data pipeline processing health.
  • 16. Observability, Monitoring and Alerting: Alert Policies
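Consumer group lag, one of the charted metrics, is the difference between a partition's log end offset and the group's committed offset. The sketch below computes it from two plain dicts; it does not call any Kafka client, and the alert threshold is a made-up value for illustration. Treating a partition with no committed offset as offset 0 is also an assumption.

```python
LAG_ALERT_THRESHOLD = 1000  # hypothetical alerting threshold


def consumer_group_lag(end_offsets: dict, committed_offsets: dict) -> dict:
    """Return per-partition lag: log end offset minus committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        committed = committed_offsets.get(partition, 0)  # assumption: no commit means 0
        lag[partition] = max(end - committed, 0)
    return lag


# Hypothetical offsets for a two-partition topic
end_offsets = {0: 1000, 1: 500}
committed_offsets = {0: 990, 1: 500}
lag = consumer_group_lag(end_offsets, committed_offsets)
total_lag = sum(lag.values())
should_alert = total_lag > LAG_ALERT_THRESHOLD
```

A steadily growing `total_lag` indicates that processing jobs are falling behind the producers, which is the condition an alert policy would typically watch.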
  • 20. Meta Conversion APIs
      ◿ Meta’s Conversion API helps merchants effectively run advertising campaigns on customer audiences.
      ◿ Server-side event transmission APIs: visits, product page views, cart additions, and orders
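As a rough illustration of server-side event transmission, the sketch below builds a payload in the general shape of Meta's publicly documented Conversions API (a `data` list of events with `event_name`, `event_time`, `action_source`, and hashed `user_data`). It only constructs the payload and sends nothing. The mapping from the four event types on the slide to Meta standard event names is an assumption, as are the helper names and the sample values.

```python
import hashlib
import time

# Assumed mapping from the slide's event types to Meta standard event names.
EVENT_NAME_MAP = {
    "visit": "PageView",
    "product_page_view": "ViewContent",
    "cart_addition": "AddToCart",
    "order": "Purchase",
}


def hash_email(email: str) -> str:
    """Normalize (trim, lowercase) and SHA-256 hash an email address."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def build_event(event_type, email, event_time=None, value=None, currency=None):
    """Build one server-side event dict (payload only; nothing is sent)."""
    event = {
        "event_name": EVENT_NAME_MAP[event_type],
        "event_time": int(event_time if event_time is not None else time.time()),
        "action_source": "website",
        "user_data": {"em": [hash_email(email)]},
    }
    if value is not None:
        event["custom_data"] = {"value": value, "currency": currency}
    return event


# Hypothetical order event
payload = {"data": [build_event("order", " Shopper@Example.com ", 1700000000, 49.99, "USD")]}
```

Hashing identifiers before they leave the server is what allows events to be matched to audiences without transmitting raw email addresses.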
  • 22. Data Lake Architecture and Use Cases
      ◿ GCS Sink Connector to store and archive raw events
      ◿ Batch processing and ad hoc queries
      ◿ Training the machine learning model
      ◿ Insights - Rockstar Products, Most Abandoned Products, Best Customers, Repeat Purchase Rate
      ◿ Internal Analytics
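One of the listed insights, repeat purchase rate, can be computed with a small batch job over archived order events. The slides do not define the metric, so this sketch assumes a common definition: the share of purchasing customers who placed more than one order. The `customer_id` field and the sample orders are hypothetical.

```python
from collections import Counter


def repeat_purchase_rate(orders) -> float:
    """Share of customers with more than one order (assumed definition)."""
    orders_per_customer = Counter(order["customer_id"] for order in orders)
    if not orders_per_customer:
        return 0.0
    repeat_customers = sum(1 for n in orders_per_customer.values() if n > 1)
    return repeat_customers / len(orders_per_customer)


# Hypothetical archived order events
orders = [
    {"customer_id": "c1", "order_id": "o1"},
    {"customer_id": "c1", "order_id": "o2"},
    {"customer_id": "c2", "order_id": "o3"},
    {"customer_id": "c3", "order_id": "o4"},
]
rate = repeat_purchase_rate(orders)
```

Here one of three customers ordered more than once, so `rate` is one third.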
  • 24. AI-powered Personalized Product Recommendation
      ◿ For shoppers: Enable shoppers to easily discover the products they love
      ◿ For merchants: Enable merchants to easily leverage AI to merchandise their products
      ◿ Boost shopper engagement and conversion rates
  • 26. Others you may like model
      ◿ Business objectives
          ● Click through rate
              ○ Product catalog
              ○ Product Page View events
          ● Conversion rate
              ○ Product catalog
              ○ Product Page View events
              ○ Added To Cart Events
      ◿ Placement Type
          ● Detail page view
      ◿ Training data used
          ● Full product catalog data
          ● Product Page View and Cart: 3 months of data
          ● Real Time data
      ◿ Model tuning options
          ● Automatically
          ● Trigger Manually
  • 27. Frequently bought together
      ◿ Business objectives
          ● Revenue per session
      ◿ Placement Type
          ● Add to cart
          ● Registry
      ◿ Datasets used:
          ● Purchase events
          ● Product catalog
      ◿ Training data used
          ● Full product catalog data
          ● Purchase events data for 1 year
          ● Real Time data
      ◿ Automated model build and deployment
      ◿ Provide secure, scalable and performant APIs for serving product recommendations
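The slides do not describe the internals of the "Frequently bought together" model, so the sketch below is not that model. It is a simple co-occurrence baseline, swapped in to show how purchase events alone can yield "bought together" suggestions: count how often two products appear in the same order, then rank a product's partners by that count. The product names and order data are made up.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(orders):
    """Count how often each unordered product pair appears in the same order."""
    pairs = Counter()
    for items in orders:
        for a, b in combinations(sorted(set(items)), 2):
            pairs[(a, b)] += 1
    return pairs


def bought_together(product, pairs, top_n=3):
    """Rank the products most often purchased alongside the given product."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == product:
            scores[b] += n
        elif b == product:
            scores[a] += n
    return [p for p, _ in scores.most_common(top_n)]


# Hypothetical purchase events, one list of product IDs per order
orders = [
    ["shoes", "socks"],
    ["shoes", "socks", "laces"],
    ["shoes", "laces"],
    ["shoes", "socks"],
    ["hat"],
]
pairs = co_purchase_counts(orders)
suggestions = bought_together("shoes", pairs)
```

A trained recommendation model would also use catalog attributes and real-time signals, as the slide lists; this baseline uses purchase co-occurrence only.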
  • 28. Personalized Product Recommendation: Model Training - Using Historical and Real-time Data
  • 29. Personalized Product Recommendation: Serving recommendations to shoppers