SlideShare a Scribd company logo
Real Time DQMM on Flink
Jaydeep
Staff Engineer in Search Team @ WalmartLabs
Apache Oozie Committer
June, 2019
Table of Contents
2
• What is Real Time Aggregation​?
• Use Cases
• Processing engine
• Flink vs Spark
• Flink on Yarn
• Real Time DQMM Architecture
• 100% data completeness
• Demo
• What Next
• Open Items
What is Real Time Aggregation​?
3
• What is real time ?​
• What is the processing delay today?​
• What real time offering?​
• Why do we need it?
Use Cases
4
• Release with Beacon Bugs​
• Item Catalogue health​
• Item out of stock (specially on event days)​
• Best seller item tracking​
• Top query monitoring​
• Top item tracking​
• Category performance
Processing Engine
5
• Support for Real-time processing.
• Support to track the events.
• Easy to recover from failure.
• Exactly once processing
• Backpressure handling
• Support for Event based, Time based and Dynamic Window
• Highly Available
Flink vs Spark
6
• Event Processing Model
• Data Shuffling Mode
• Window Function
• Memory management
• State Storage
• Recovery
• Re-utilization and Iterations
Flink on Yarn
7
Real Time DQMM Architecture
8
100% Data Completeness
9
Event Arrival Time Actual Event Time Clicks
2019-06-01 10:01:00 2019-06-01 10:01:00 3
2019-06-01 10:02:00 2019-06-01 10:02:00 1
2019-06-01 10:04:00 2019-06-01 10:03:00 4
2019-06-01 10:06:00 2019-06-01 10:04:00 5
2019-06-01 10:08:00 2019-06-01 10:04:00 1
Processed Time Event time Window Clicks
2019-06-01 10:05:00 2019-06-01 10:05:00 8
2019-06-01 10:10:00 2019-06-01 10:10:00 6
100% Data Completeness
10
• Event Time data processing
• Handling the delayed event
• Prevent false anomaly detection
• Probability based Model for data completeness
What we deal with?
11
~4 Billion logs
~8 million records per minutes
~600 GB Records
Yarn HA cluster
14 Nodes clusters
1.12 GB memory
112 cores
Demo
Walmart Labs PPT Timesaver – Privileged and Confidential12
What next
13
• Page Type Onboarding
• Category Onboarding
• Mobile App Onboarding
• Best Seller Onboarding
• Improvement on Notification layer
Open Items
14
• Real time Model training
• Handling Seasonality while detecting Anomaly
Walmart Labs – Privileged and Confidential15

More Related Content

What's hot

Supporting large scale React applications
Supporting large scale React applicationsSupporting large scale React applications
Supporting large scale React applications
Maurice De Beijer [MVP]
 
Monitoring with Stackdriver
Monitoring with StackdriverMonitoring with Stackdriver
Monitoring with Stackdriver
denise stockman
 
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Codemotion
 
ASP.NET Scalability - VBUG London
ASP.NET Scalability - VBUG LondonASP.NET Scalability - VBUG London
ASP.NET Scalability - VBUG London
Phil Pursglove
 
Maintaining Consistency for a Financial Event-Driven Architecture (Iago Borge...
Maintaining Consistency for a Financial Event-Driven Architecture (Iago Borge...Maintaining Consistency for a Financial Event-Driven Architecture (Iago Borge...
Maintaining Consistency for a Financial Event-Driven Architecture (Iago Borge...
confluent
 
Living on the Edge (Service): Bundling Microservices to Optimize Consumption ...
Living on the Edge (Service): Bundling Microservices to Optimize Consumption ...Living on the Edge (Service): Bundling Microservices to Optimize Consumption ...
Living on the Edge (Service): Bundling Microservices to Optimize Consumption ...
Mark Heckler
 
TDC2016SP - Living on the Edge (Service): Bundling Microservices to Optimize ...
TDC2016SP - Living on the Edge (Service): Bundling Microservices to Optimize ...TDC2016SP - Living on the Edge (Service): Bundling Microservices to Optimize ...
TDC2016SP - Living on the Edge (Service): Bundling Microservices to Optimize ...
tdc-globalcode
 
Lessons Learned: Spring Cloud -> Docker -> Kubernetes
Lessons Learned: Spring Cloud -> Docker -> KubernetesLessons Learned: Spring Cloud -> Docker -> Kubernetes
Lessons Learned: Spring Cloud -> Docker -> Kubernetes
Mauricio (Salaboy) Salatino
 
Service Fabric
Service FabricService Fabric
Service Fabric
Daniel Toomey
 
How to adopt React for moving fast startup
How to adopt React for moving fast startupHow to adopt React for moving fast startup
How to adopt React for moving fast startup
Sira Sujjinanont
 
NoSQL at Gumtree
NoSQL at GumtreeNoSQL at Gumtree
NoSQL at Gumtree
Andy Summers
 
Leonard Austin (Ravelin) - DevOps in a Machine Learning World
Leonard Austin (Ravelin) - DevOps in a Machine Learning WorldLeonard Austin (Ravelin) - DevOps in a Machine Learning World
Leonard Austin (Ravelin) - DevOps in a Machine Learning World
Outlyer
 
Nunca digas nunca: La nube y algunos principios que no pensabas que llegaría...
Nunca digas nunca:  La nube y algunos principios que no pensabas que llegaría...Nunca digas nunca:  La nube y algunos principios que no pensabas que llegaría...
Nunca digas nunca: La nube y algunos principios que no pensabas que llegaría...
GeneXus
 
How Do you Scale for both Predictable and Unpredictable Events on such a Larg...
How Do you Scale for both Predictable and Unpredictable Events on such a Larg...How Do you Scale for both Predictable and Unpredictable Events on such a Larg...
How Do you Scale for both Predictable and Unpredictable Events on such a Larg...
Blake Crosby
 
CloudCamp London 3 - NT/e - Matthew Fowler
CloudCamp London 3 - NT/e - Matthew FowlerCloudCamp London 3 - NT/e - Matthew Fowler
CloudCamp London 3 - NT/e - Matthew Fowler
Chris Purrington
 
Event-Sourcing your React-Redux applications
Event-Sourcing your React-Redux applicationsEvent-Sourcing your React-Redux applications
Event-Sourcing your React-Redux applications
Maurice De Beijer [MVP]
 
Event-Sourcing your React-Redux applications at HolyJS 2016
Event-Sourcing your React-Redux applications at HolyJS 2016Event-Sourcing your React-Redux applications at HolyJS 2016
Event-Sourcing your React-Redux applications at HolyJS 2016
Maurice De Beijer [MVP]
 
Mistakes - I’ve made a few. Blunders in event-driven architecture | Simon Aub...
Mistakes - I’ve made a few. Blunders in event-driven architecture | Simon Aub...Mistakes - I’ve made a few. Blunders in event-driven architecture | Simon Aub...
Mistakes - I’ve made a few. Blunders in event-driven architecture | Simon Aub...
HostedbyConfluent
 
Servers? Where we're going we don't need servers.
Servers? Where we're going we don't need servers.Servers? Where we're going we don't need servers.
Servers? Where we're going we don't need servers.
drnugent
 
Event-Sourcing your React-Redux applications
Event-Sourcing your React-Redux applicationsEvent-Sourcing your React-Redux applications
Event-Sourcing your React-Redux applications
Maurice De Beijer [MVP]
 

What's hot (20)

Supporting large scale React applications
Supporting large scale React applicationsSupporting large scale React applications
Supporting large scale React applications
 
Monitoring with Stackdriver
Monitoring with StackdriverMonitoring with Stackdriver
Monitoring with Stackdriver
 
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
Monitoring Big Data Systems Done "The Simple Way" - Demi Ben-Ari - Codemotion...
 
ASP.NET Scalability - VBUG London
ASP.NET Scalability - VBUG LondonASP.NET Scalability - VBUG London
ASP.NET Scalability - VBUG London
 
Maintaining Consistency for a Financial Event-Driven Architecture (Iago Borge...
Maintaining Consistency for a Financial Event-Driven Architecture (Iago Borge...Maintaining Consistency for a Financial Event-Driven Architecture (Iago Borge...
Maintaining Consistency for a Financial Event-Driven Architecture (Iago Borge...
 
Living on the Edge (Service): Bundling Microservices to Optimize Consumption ...
Living on the Edge (Service): Bundling Microservices to Optimize Consumption ...Living on the Edge (Service): Bundling Microservices to Optimize Consumption ...
Living on the Edge (Service): Bundling Microservices to Optimize Consumption ...
 
TDC2016SP - Living on the Edge (Service): Bundling Microservices to Optimize ...
TDC2016SP - Living on the Edge (Service): Bundling Microservices to Optimize ...TDC2016SP - Living on the Edge (Service): Bundling Microservices to Optimize ...
TDC2016SP - Living on the Edge (Service): Bundling Microservices to Optimize ...
 
Lessons Learned: Spring Cloud -> Docker -> Kubernetes
Lessons Learned: Spring Cloud -> Docker -> KubernetesLessons Learned: Spring Cloud -> Docker -> Kubernetes
Lessons Learned: Spring Cloud -> Docker -> Kubernetes
 
Service Fabric
Service FabricService Fabric
Service Fabric
 
How to adopt React for moving fast startup
How to adopt React for moving fast startupHow to adopt React for moving fast startup
How to adopt React for moving fast startup
 
NoSQL at Gumtree
NoSQL at GumtreeNoSQL at Gumtree
NoSQL at Gumtree
 
Leonard Austin (Ravelin) - DevOps in a Machine Learning World
Leonard Austin (Ravelin) - DevOps in a Machine Learning WorldLeonard Austin (Ravelin) - DevOps in a Machine Learning World
Leonard Austin (Ravelin) - DevOps in a Machine Learning World
 
Nunca digas nunca: La nube y algunos principios que no pensabas que llegaría...
Nunca digas nunca:  La nube y algunos principios que no pensabas que llegaría...Nunca digas nunca:  La nube y algunos principios que no pensabas que llegaría...
Nunca digas nunca: La nube y algunos principios que no pensabas que llegaría...
 
How Do you Scale for both Predictable and Unpredictable Events on such a Larg...
How Do you Scale for both Predictable and Unpredictable Events on such a Larg...How Do you Scale for both Predictable and Unpredictable Events on such a Larg...
How Do you Scale for both Predictable and Unpredictable Events on such a Larg...
 
CloudCamp London 3 - NT/e - Matthew Fowler
CloudCamp London 3 - NT/e - Matthew FowlerCloudCamp London 3 - NT/e - Matthew Fowler
CloudCamp London 3 - NT/e - Matthew Fowler
 
Event-Sourcing your React-Redux applications
Event-Sourcing your React-Redux applicationsEvent-Sourcing your React-Redux applications
Event-Sourcing your React-Redux applications
 
Event-Sourcing your React-Redux applications at HolyJS 2016
Event-Sourcing your React-Redux applications at HolyJS 2016Event-Sourcing your React-Redux applications at HolyJS 2016
Event-Sourcing your React-Redux applications at HolyJS 2016
 
Mistakes - I’ve made a few. Blunders in event-driven architecture | Simon Aub...
Mistakes - I’ve made a few. Blunders in event-driven architecture | Simon Aub...Mistakes - I’ve made a few. Blunders in event-driven architecture | Simon Aub...
Mistakes - I’ve made a few. Blunders in event-driven architecture | Simon Aub...
 
Servers? Where we're going we don't need servers.
Servers? Where we're going we don't need servers.Servers? Where we're going we don't need servers.
Servers? Where we're going we don't need servers.
 
Event-Sourcing your React-Redux applications
Event-Sourcing your React-Redux applicationsEvent-Sourcing your React-Redux applications
Event-Sourcing your React-Redux applications
 

Similar to Real time DQMM on Flink

Real time dqmm on flink
Real time dqmm on flinkReal time dqmm on flink
Real time dqmm on flink
Jaydeep Vishwakarma
 
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...
Ververica
 
Getting started with caliburn.micro and windows phone 7
Getting started with caliburn.micro and windows phone 7Getting started with caliburn.micro and windows phone 7
Getting started with caliburn.micro and windows phone 7
Gary Park
 
In-Stream Processing Service Blueprint, Reference architecture for real-time ...
In-Stream Processing Service Blueprint, Reference architecture for real-time ...In-Stream Processing Service Blueprint, Reference architecture for real-time ...
In-Stream Processing Service Blueprint, Reference architecture for real-time ...
Grid Dynamics
 
TidalScale Slides
TidalScale SlidesTidalScale Slides
TidalScale Slides
Lisa Ellington
 
A Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid
A Trifecta of Real-Time Applications: Apache Kafka, Flink, and DruidA Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid
A Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid
HostedbyConfluent
 
Building real-time data analytics on Google Cloud
Building real-time data analytics on Google CloudBuilding real-time data analytics on Google Cloud
Building real-time data analytics on Google Cloud
Jonny Daenen
 
Silverlight vs HTML5 - Lessons learned from the real world...
Silverlight vs HTML5 - Lessons learned from the real world...Silverlight vs HTML5 - Lessons learned from the real world...
Silverlight vs HTML5 - Lessons learned from the real world...
Peter Gfader
 
Big server-is-watching-you
Big server-is-watching-youBig server-is-watching-you
Big server-is-watching-you
mkherlakian
 
AWS Security in Your Sleep: Build End-to-End Automation for IR Workflows (SEC...
AWS Security in Your Sleep: Build End-to-End Automation for IR Workflows (SEC...AWS Security in Your Sleep: Build End-to-End Automation for IR Workflows (SEC...
AWS Security in Your Sleep: Build End-to-End Automation for IR Workflows (SEC...
Amazon Web Services
 
Потоковая обработка больших данных
Потоковая обработка больших данныхПотоковая обработка больших данных
Потоковая обработка больших данных
CEE-SEC(R)
 
Enterprise Cloud Governance: A Frictionless Approach
Enterprise Cloud Governance: A Frictionless ApproachEnterprise Cloud Governance: A Frictionless Approach
Enterprise Cloud Governance: A Frictionless Approach
RightScale
 
Webinar: Improve Splunk Analytics and Automate Processes with SnapLogic
Webinar: Improve Splunk Analytics and Automate Processes with SnapLogicWebinar: Improve Splunk Analytics and Automate Processes with SnapLogic
Webinar: Improve Splunk Analytics and Automate Processes with SnapLogic
SnapLogic
 
Real Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalReal Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from Pivotal
VMware Tanzu Korea
 
How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...
How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...
How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...
Precisely
 
Automated Asset Tracking in the Data Center: How IBM Reduced the Time/Cost of...
Automated Asset Tracking in the Data Center: How IBM Reduced the Time/Cost of...Automated Asset Tracking in the Data Center: How IBM Reduced the Time/Cost of...
Automated Asset Tracking in the Data Center: How IBM Reduced the Time/Cost of...
Keith Braswell
 
Stream Processing @ Lyft
Stream Processing @ LyftStream Processing @ Lyft
Stream Processing @ Lyft
Jamie Grier
 
Rich, Real-time Mobile User Experiences @Devoxx UK
Rich, Real-time Mobile User Experiences @Devoxx UKRich, Real-time Mobile User Experiences @Devoxx UK
Rich, Real-time Mobile User Experiences @Devoxx UK
Andy Piper
 
Stream processing for the practitioner: Blueprints for common stream processi...
Stream processing for the practitioner: Blueprints for common stream processi...Stream processing for the practitioner: Blueprints for common stream processi...
Stream processing for the practitioner: Blueprints for common stream processi...
Aljoscha Krettek
 
Micrometrics to forecast performance tsunamis
Micrometrics to forecast performance tsunamisMicrometrics to forecast performance tsunamis
Micrometrics to forecast performance tsunamis
Tier1app
 

Similar to Real time DQMM on Flink (20)

Real time dqmm on flink
Real time dqmm on flinkReal time dqmm on flink
Real time dqmm on flink
 
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...
Aljoscha Krettek - Apache Flink® and IoT: How Stateful Event-Time Processing ...
 
Getting started with caliburn.micro and windows phone 7
Getting started with caliburn.micro and windows phone 7Getting started with caliburn.micro and windows phone 7
Getting started with caliburn.micro and windows phone 7
 
In-Stream Processing Service Blueprint, Reference architecture for real-time ...
In-Stream Processing Service Blueprint, Reference architecture for real-time ...In-Stream Processing Service Blueprint, Reference architecture for real-time ...
In-Stream Processing Service Blueprint, Reference architecture for real-time ...
 
TidalScale Slides
TidalScale SlidesTidalScale Slides
TidalScale Slides
 
A Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid
A Trifecta of Real-Time Applications: Apache Kafka, Flink, and DruidA Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid
A Trifecta of Real-Time Applications: Apache Kafka, Flink, and Druid
 
Building real-time data analytics on Google Cloud
Building real-time data analytics on Google CloudBuilding real-time data analytics on Google Cloud
Building real-time data analytics on Google Cloud
 
Silverlight vs HTML5 - Lessons learned from the real world...
Silverlight vs HTML5 - Lessons learned from the real world...Silverlight vs HTML5 - Lessons learned from the real world...
Silverlight vs HTML5 - Lessons learned from the real world...
 
Big server-is-watching-you
Big server-is-watching-youBig server-is-watching-you
Big server-is-watching-you
 
AWS Security in Your Sleep: Build End-to-End Automation for IR Workflows (SEC...
AWS Security in Your Sleep: Build End-to-End Automation for IR Workflows (SEC...AWS Security in Your Sleep: Build End-to-End Automation for IR Workflows (SEC...
AWS Security in Your Sleep: Build End-to-End Automation for IR Workflows (SEC...
 
Потоковая обработка больших данных
Потоковая обработка больших данныхПотоковая обработка больших данных
Потоковая обработка больших данных
 
Enterprise Cloud Governance: A Frictionless Approach
Enterprise Cloud Governance: A Frictionless ApproachEnterprise Cloud Governance: A Frictionless Approach
Enterprise Cloud Governance: A Frictionless Approach
 
Webinar: Improve Splunk Analytics and Automate Processes with SnapLogic
Webinar: Improve Splunk Analytics and Automate Processes with SnapLogicWebinar: Improve Splunk Analytics and Automate Processes with SnapLogic
Webinar: Improve Splunk Analytics and Automate Processes with SnapLogic
 
Real Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalReal Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from Pivotal
 
How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...
How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...
How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...
 
Automated Asset Tracking in the Data Center: How IBM Reduced the Time/Cost of...
Automated Asset Tracking in the Data Center: How IBM Reduced the Time/Cost of...Automated Asset Tracking in the Data Center: How IBM Reduced the Time/Cost of...
Automated Asset Tracking in the Data Center: How IBM Reduced the Time/Cost of...
 
Stream Processing @ Lyft
Stream Processing @ LyftStream Processing @ Lyft
Stream Processing @ Lyft
 
Rich, Real-time Mobile User Experiences @Devoxx UK
Rich, Real-time Mobile User Experiences @Devoxx UKRich, Real-time Mobile User Experiences @Devoxx UK
Rich, Real-time Mobile User Experiences @Devoxx UK
 
Stream processing for the practitioner: Blueprints for common stream processi...
Stream processing for the practitioner: Blueprints for common stream processi...Stream processing for the practitioner: Blueprints for common stream processi...
Stream processing for the practitioner: Blueprints for common stream processi...
 
Micrometrics to forecast performance tsunamis
Micrometrics to forecast performance tsunamisMicrometrics to forecast performance tsunamis
Micrometrics to forecast performance tsunamis
 

Recently uploaded

一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
hyfjgavov
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
z6osjkqvd
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
bmucuha
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Kaxil Naik
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
a9qfiubqu
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
yuvarajkumar334
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
Monthly Management report for the Month of May 2024
Monthly Management report for the Month of May 2024Monthly Management report for the Month of May 2024
Monthly Management report for the Month of May 2024
facilitymanager11
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
bmucuha
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
Márton Kodok
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
SaffaIbrahim1
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 

Recently uploaded (20)

一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
一比一原版兰加拉学院毕业证(Langara毕业证书)学历如何办理
 
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
一比一原版英属哥伦比亚大学毕业证(UBC毕业证书)学历如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
一比一原版(CU毕业证)卡尔顿大学毕业证如何办理
 
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Monthly Management report for the Month of May 2024
Monthly Management report for the Month of May 2024Monthly Management report for the Month of May 2024
Monthly Management report for the Month of May 2024
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
Build applications with generative AI on Google Cloud
Build applications with generative AI on Google CloudBuild applications with generative AI on Google Cloud
Build applications with generative AI on Google Cloud
 
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docxDATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
DATA COMMS-NETWORKS YR2 lecture 08 NAT & CLOUD.docx
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 

Real time DQMM on Flink

  • 1. Real Time DQMM on Flink Jaydeep Staff Engineer in Search Team @ WalmartLabs Apache Oozie Committer June, 2019
  • 2. Table of Contents 2 • What is Real Time Aggregation​? • Use Cases • Processing engine • Flink vs Spark • Flink on Yarn • Real Time DQMM Architecture • 100% data completeness • Demo • What Next • Open Items
  • 3. What is Real Time Aggregation​? 3 • What is real time ?​ • What is the processing delay today?​ • What real time offering?​ • Why do we need it?
  • 4. Use Cases 4 • Release with Beacon Bugs​ • Item Catalogue health​ • Item out of stock (specially on event days)​ • Best seller item tracking​ • Top query monitoring​ • Top item tracking​ • Category performance
  • 5. Processing Engine 5 • Support for Real-time processing. • Support to track the events. • Easy to recover from failure. • Exactly once processing • Backpressure handling • Support for Event based, Time based and Dynamic Window • Highly Available
  • 6. Flink vs Spark 6 • Event Processing Model • Data Shuffling Mode • Window Function • Memory management • State Storage • Recovery • Re-utilization and Iterations
  • 8. Real Time DQMM Architecture 8
  • 9. 100% Data Completeness 9 Event Arrival Time Actual Event Time Clicks 2019-06-01 10:01:00 2019-06-01 10:01:00 3 2019-06-01 10:02:00 2019-06-01 10:02:00 1 2019-06-01 10:04:00 2019-06-01 10:03:00 4 2019-06-01 10:06:00 2019-06-01 10:04:00 5 2019-06-01 10:08:00 2019-06-01 10:04:00 1 Processed Time Event time Window Clicks 2019-06-01 10:05:00 2019-06-01 10:05:00 8 2019-06-01 10:10:00 2019-06-01 10:10:00 6
  • 10. 100% Data Completeness 10 • Event Time data processing • Handling the delayed event • Prevent false anomaly detection • Probability based Model for data completeness
  • 11. What we deal with? 11 ~4 Billion logs ~8 million records per minutes ~600 GB Records Yarn HA cluster 14 Nodes clusters 1.12 GB memory 112 cores
  • 12. Demo Walmart Labs PPT Timesaver – Privileged and Confidential12
  • 13. What next 13 • Page Type Onboarding • Category Onboarding • Mobile App Onboarding • Best Seller Onboarding • Improvement on Notification layer
  • 14. Open Items 14 • Real time Model training • Handling Seasonality while detecting Anomaly
  • 15. Walmart Labs – Privileged and Confidential15