SlideShare a Scribd company logo
Real Time DQMM on Flink
Jaydeep
Staff Engineer in Search Team @ WalmartLabs
Apache Oozie Committer
June, 2019
Table of Contents
2
• What is Real Time Aggregation​?
• Use Cases
• Processing engine
• Flink vs Spark
• Flink on Yarn
• Real Time DQMM Architecture
• 100% data completeness
• Demo
• What Next
• Open Items
What is Real Time Aggregation​?
3
• What is real time ?​
• What is the processing delay today?​
• What real time offering?​
• Why do we need it?
Use Cases
4
• Release with Beacon Bugs​
• Item Catalogue health​
• Item out of stock (specially on event days)​
• Best seller item tracking​
• Top query monitoring​
• Top item tracking​
• Category performance
Processing Engine
5
• Support for Real-time processing.
• Support to track the events.
• Easy to recover from failure.
• Exactly once processing
• Backpressure handling
• Support for Event based, Time based and Dynamic Window
• Highly Available
Flink on Yarn
6
Real Time DQMM Architecture
7
100% Data Completeness
8
Event Arrival Time Actual Event Time Clicks
2019-06-01 10:01:00 2019-06-01 10:01:00 3
2019-06-01 10:02:00 2019-06-01 10:02:00 1
2019-06-01 10:04:00 2019-06-01 10:03:00 4
2019-06-01 10:06:00 2019-06-01 10:04:00 5
2019-06-01 10:08:00 2019-06-01 10:04:00 1
Processed Time Event time Window Clicks
2019-06-01 10:05:00 2019-06-01 10:05:00 8
2019-06-01 10:10:00 2019-06-01 10:10:00 6
100% Data Completeness
9
• Event Time data processing
• Handling the delayed event
• Prevent false anomaly detection
• Probability based Model for data completeness
What we deal with?
10
~4 Billion logs
~8 million records per minutes
~600 GB Records
Yarn HA cluster
14 Nodes clusters
1.12 GB memory
112 cores
Demo
Walmart Labs PPT Timesaver – Privileged and Confidential11
What next
12
• Page Type Onboarding
• Category Onboarding
• Mobile App Onboarding
• Best Seller Onboarding
• Improvement on Notification layer
Open Items
13
• Real time Model training
• Handling Seasonality while detecting Anomaly
Walmart Labs – Privileged and Confidential14

More Related Content

What's hot

What's hot (19)

Shipping apps to eks with code pipeline and lambda functions
Shipping apps to eks with code pipeline and lambda functionsShipping apps to eks with code pipeline and lambda functions
Shipping apps to eks with code pipeline and lambda functions
 
Monitoring the #DevOps way
Monitoring the #DevOps wayMonitoring the #DevOps way
Monitoring the #DevOps way
 
AI-Powered DevOps: Injecting Speed & Quality Across Verizon’s Cloud Pipelines
AI-Powered DevOps: Injecting Speed & Quality Across Verizon’s Cloud PipelinesAI-Powered DevOps: Injecting Speed & Quality Across Verizon’s Cloud Pipelines
AI-Powered DevOps: Injecting Speed & Quality Across Verizon’s Cloud Pipelines
 
MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...
MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...
MongoDB World 2018: Using Puppet, Ansible and Ops Manager to Create Your Own ...
 
Top Java Performance Problems and Metrics To Check in Your Pipeline
Top Java Performance Problems and Metrics To Check in Your PipelineTop Java Performance Problems and Metrics To Check in Your Pipeline
Top Java Performance Problems and Metrics To Check in Your Pipeline
 
Saving Money by Optimizing Your Cloud Add-On Infrastructure
Saving Money by Optimizing Your Cloud Add-On InfrastructureSaving Money by Optimizing Your Cloud Add-On Infrastructure
Saving Money by Optimizing Your Cloud Add-On Infrastructure
 
DevOps Transformation at Dynatrace and with Dynatrace
DevOps Transformation at Dynatrace and with DynatraceDevOps Transformation at Dynatrace and with Dynatrace
DevOps Transformation at Dynatrace and with Dynatrace
 
Hugs instead of Bugs: Dreaming of Quality Tools for Devs and Testers
Hugs instead of Bugs: Dreaming of Quality Tools for Devs and TestersHugs instead of Bugs: Dreaming of Quality Tools for Devs and Testers
Hugs instead of Bugs: Dreaming of Quality Tools for Devs and Testers
 
Real-time image sharing
Real-time image sharingReal-time image sharing
Real-time image sharing
 
Serverless adventure tooling
Serverless adventure toolingServerless adventure tooling
Serverless adventure tooling
 
Continuous Delivery in the Cloud with Bitbucket Pipelines
Continuous Delivery in the Cloud with Bitbucket PipelinesContinuous Delivery in the Cloud with Bitbucket Pipelines
Continuous Delivery in the Cloud with Bitbucket Pipelines
 
Accelerating Add-on Development From Concept to Launch
Accelerating Add-on Development From Concept to LaunchAccelerating Add-on Development From Concept to Launch
Accelerating Add-on Development From Concept to Launch
 
The guardian and app engine
The guardian and app engineThe guardian and app engine
The guardian and app engine
 
Bitbucket Pipelines: Serverless CI/CD That Will Save Your Life
Bitbucket Pipelines: Serverless CI/CD That Will Save Your LifeBitbucket Pipelines: Serverless CI/CD That Will Save Your Life
Bitbucket Pipelines: Serverless CI/CD That Will Save Your Life
 
Using Cookies to Store Your Postman Secrets
Using Cookies to Store Your Postman SecretsUsing Cookies to Store Your Postman Secrets
Using Cookies to Store Your Postman Secrets
 
Building Realtime Web Apps with Angular and Meteor
Building Realtime Web Apps with Angular and MeteorBuilding Realtime Web Apps with Angular and Meteor
Building Realtime Web Apps with Angular and Meteor
 
How to Grow a Serverless Team
How to Grow a Serverless TeamHow to Grow a Serverless Team
How to Grow a Serverless Team
 
Enterprise Serverless Adoption. An Experience Report
Enterprise Serverless Adoption. An Experience ReportEnterprise Serverless Adoption. An Experience Report
Enterprise Serverless Adoption. An Experience Report
 
Serverless operations for the iRobot fleet
Serverless operations for the iRobot fleetServerless operations for the iRobot fleet
Serverless operations for the iRobot fleet
 

Similar to Real time dqmm on flink

The Unicorn Project and The Five Ideals (Updated Dec 2019)
The Unicorn Project and The Five Ideals (Updated Dec 2019)The Unicorn Project and The Five Ideals (Updated Dec 2019)
The Unicorn Project and The Five Ideals (Updated Dec 2019)
Gene Kim
 
Cloudstack Continuous Delivery
Cloudstack Continuous DeliveryCloudstack Continuous Delivery
Cloudstack Continuous Delivery
buildacloud
 

Similar to Real time dqmm on flink (20)

Real time DQMM on Flink
Real time DQMM on FlinkReal time DQMM on Flink
Real time DQMM on Flink
 
Real time data quality on Flink
Real time data quality on FlinkReal time data quality on Flink
Real time data quality on Flink
 
Getting started with caliburn.micro and windows phone 7
Getting started with caliburn.micro and windows phone 7Getting started with caliburn.micro and windows phone 7
Getting started with caliburn.micro and windows phone 7
 
Get lean tutorial
Get lean tutorialGet lean tutorial
Get lean tutorial
 
A Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid
A Trifecta of Real-Time Applications: Apache Kafka, Flink, and DruidA Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid
A Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid
 
YAPC::EU 2015 - Perl Conferences
YAPC::EU 2015 - Perl ConferencesYAPC::EU 2015 - Perl Conferences
YAPC::EU 2015 - Perl Conferences
 
The Unicorn Project and The Five Ideals (Updated Dec 2019)
The Unicorn Project and The Five Ideals (Updated Dec 2019)The Unicorn Project and The Five Ideals (Updated Dec 2019)
The Unicorn Project and The Five Ideals (Updated Dec 2019)
 
Silverlight vs HTML5 - Lessons learned from the real world...
Silverlight vs HTML5 - Lessons learned from the real world...Silverlight vs HTML5 - Lessons learned from the real world...
Silverlight vs HTML5 - Lessons learned from the real world...
 
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...
 
Cloudstack Continuous Delivery
Cloudstack Continuous DeliveryCloudstack Continuous Delivery
Cloudstack Continuous Delivery
 
Monitor Micro-service with MicroProfile metrics
Monitor Micro-service with MicroProfile metricsMonitor Micro-service with MicroProfile metrics
Monitor Micro-service with MicroProfile metrics
 
Monitor Microservices with MicroProfile Metrics
Monitor Microservices with MicroProfile MetricsMonitor Microservices with MicroProfile Metrics
Monitor Microservices with MicroProfile Metrics
 
AWS Security in Your Sleep: Build End-to-End Automation for IR Workflows (SEC...
AWS Security in Your Sleep: Build End-to-End Automation for IR Workflows (SEC...AWS Security in Your Sleep: Build End-to-End Automation for IR Workflows (SEC...
AWS Security in Your Sleep: Build End-to-End Automation for IR Workflows (SEC...
 
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...
 
2015 jcconf-h2s-devops-practice
2015 jcconf-h2s-devops-practice2015 jcconf-h2s-devops-practice
2015 jcconf-h2s-devops-practice
 
A look behind the scenes: Windows 8 background processing
A look behind the scenes: Windows 8 background processingA look behind the scenes: Windows 8 background processing
A look behind the scenes: Windows 8 background processing
 
Stateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory SpeedStateful Stream Processing at In-Memory Speed
Stateful Stream Processing at In-Memory Speed
 
5 Quick JavaScript Performance Improvement Tips
5 Quick JavaScript Performance Improvement Tips5 Quick JavaScript Performance Improvement Tips
5 Quick JavaScript Performance Improvement Tips
 
The Unicorn Project and the Five Ideals.pdf
The Unicorn Project and the Five Ideals.pdfThe Unicorn Project and the Five Ideals.pdf
The Unicorn Project and the Five Ideals.pdf
 
crashing in style
crashing in stylecrashing in style
crashing in style
 

Recently uploaded

一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
ewymefz
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
enxupq
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
vcaxypu
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 

Recently uploaded (20)

Business update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMIBusiness update Q1 2024 Lar España Real Estate SOCIMI
Business update Q1 2024 Lar España Real Estate SOCIMI
 
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
一比一原版(UMich毕业证)密歇根大学|安娜堡分校毕业证成绩单
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?
 
一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单一比一原版(YU毕业证)约克大学毕业证成绩单
一比一原版(YU毕业证)约克大学毕业证成绩单
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Uber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis ReportUber Ride Supply Demand Gap Analysis Report
Uber Ride Supply Demand Gap Analysis Report
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
一比一原版(ArtEZ毕业证)ArtEZ艺术学院毕业证成绩单
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
Using PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDBUsing PDB Relocation to Move a Single PDB to Another Existing CDB
Using PDB Relocation to Move a Single PDB to Another Existing CDB
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 

Real time dqmm on flink

  • 1. Real Time DQMM on Flink Jaydeep Staff Engineer in Search Team @ WalmartLabs Apache Oozie Committer June, 2019
  • 2. Table of Contents 2 • What is Real Time Aggregation​? • Use Cases • Processing engine • Flink vs Spark • Flink on Yarn • Real Time DQMM Architecture • 100% data completeness • Demo • What Next • Open Items
  • 3. What is Real Time Aggregation​? 3 • What is real time ?​ • What is the processing delay today?​ • What real time offering?​ • Why do we need it?
  • 4. Use Cases 4 • Release with Beacon Bugs​ • Item Catalogue health​ • Item out of stock (specially on event days)​ • Best seller item tracking​ • Top query monitoring​ • Top item tracking​ • Category performance
  • 5. Processing Engine 5 • Support for Real-time processing. • Support to track the events. • Easy to recover from failure. • Exactly once processing • Backpressure handling • Support for Event based, Time based and Dynamic Window • Highly Available
  • 7. Real Time DQMM Architecture 7
  • 8. 100% Data Completeness 8 Event Arrival Time Actual Event Time Clicks 2019-06-01 10:01:00 2019-06-01 10:01:00 3 2019-06-01 10:02:00 2019-06-01 10:02:00 1 2019-06-01 10:04:00 2019-06-01 10:03:00 4 2019-06-01 10:06:00 2019-06-01 10:04:00 5 2019-06-01 10:08:00 2019-06-01 10:04:00 1 Processed Time Event time Window Clicks 2019-06-01 10:05:00 2019-06-01 10:05:00 8 2019-06-01 10:10:00 2019-06-01 10:10:00 6
  • 9. 100% Data Completeness 9 • Event Time data processing • Handling the delayed event • Prevent false anomaly detection • Probability based Model for data completeness
  • 10. What we deal with? 10 ~4 Billion logs ~8 million records per minutes ~600 GB Records Yarn HA cluster 14 Nodes clusters 1.12 GB memory 112 cores
  • 11. Demo Walmart Labs PPT Timesaver – Privileged and Confidential11
  • 12. What next 12 • Page Type Onboarding • Category Onboarding • Mobile App Onboarding • Best Seller Onboarding • Improvement on Notification layer
  • 13. Open Items 13 • Real time Model training • Handling Seasonality while detecting Anomaly
  • 14. Walmart Labs – Privileged and Confidential14