Self-Service Metadata-Driven Data Loader Platform
Subramanya Mulgund
Manimuthu Ayyannan
About Us
Manimuthu Ayyannan
manimuthu.ayyannan@walmart.com
LinkedIn:@manimuthuayyannan
Senior Manager II, Personalization at
Walmart Global Tech
Subramanya Mulgund
subramanya.mulgund@walmart.com
LinkedIn:@mulgunds
Sr Software Engineer, Personalization at
Walmart Global Tech
Agenda
• Personalization @Walmart
• Challenges
• Solution Approaches
• High Level System Architecture
• Metadata Design and Connectors
• Orchestrator
• Schedule Optimizer
• Telemetry
Personalization @Walmart
• Our customers are becoming increasingly omnichannel
• ~220M customers and members visit ~10,500 stores and clubs under 46 banners in 24 countries, plus eCommerce websites, each week
• Billions of product impressions are served every week, generating petabytes of events
• We on the FE team run thousands of data applications to generate the features that power personalized recommendations for our customers
source
Walmart General Merchandise + Walmart Grocery, Store Pickup & Delivery + Walmart Stores
Personalization | Data Landscape
User Experience & Access Control
Security
Logging
Alerting
Telemetry
Data Engineers | Data Scientists | Data Analysts
Data Apps | Data Loader Platform
Multi-DC and Public Cloud
Streaming | In-Memory | NoSQL | Analytical
Challenges
• Onboarding a data application requires a lot of manual coding, and developers need time to develop, integrate, and test code that handles the underlying complexities
• Building a feature-rich application requires integrating with various big data technologies and a wide array of data sources, sinks, and data processors
• Deployments are isolated, which makes it difficult to control resource allocation and usage and to perform retrospection
• Competing high- and low-priority applications introduce latency in the serving layers
Challenges | New App Onboarding | Cumbersome & Fragile
Application | Source System | Target System | Processor | Schedule | Telemetry | Resource | Final Step
Application 1 | Integrate | Integrate | Develop | Implement | Enable | Allocate | Test and Deploy
Application 2 | Integrate | Integrate | Develop | Implement | Enable | Allocate | Test and Deploy
Application 3 | Integrate | Integrate | Develop | Implement | Enable | Allocate | Test and Deploy
Application 4 | Integrate | Integrate | Develop | Implement | Enable | Allocate | Test and Deploy
Application N | Integrate | Integrate | Develop | Implement | Enable | Allocate | Test and Deploy
Data Loader Simplifies the Onboarding
Application | Source System, Target System, Processor, Schedule, Telemetry | Final Step
Application 1 | Configure | Test and Deploy
Application 2 | Configure | Test and Deploy
Application 3 | Configure | Test and Deploy
Application 4 | Configure | Test and Deploy
Application N | Configure | Test and Deploy
Data Loader Platform: Resource | Parsers | Connectors | Processors | Schedulers | Execution Plan | Dashboard
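The single "Configure" step on this slide suggests that each application is described declaratively rather than coded by hand. The sketch below shows one way such a metadata record could look in Scala. Every name in it (AppConfig, Endpoint, processorSql, the priority convention, the supported-system lists) is an illustrative assumption, not the platform's actual schema.

```scala
// Hypothetical metadata record for one application. All field names are
// assumptions made for illustration; the real schema is not shown in the deck.
object AppConfigSketch {
  final case class Endpoint(system: String, location: String, format: String)

  final case class AppConfig(
      name: String,
      source: Endpoint,
      target: Endpoint,
      processorSql: String,     // business logic expressed as SQL
      schedule: String,         // e.g. a cron expression
      priority: Int,            // assumed convention: lower number = more critical
      telemetryEnabled: Boolean
  )

  // Systems and formats named on the Connectors slide, lower-cased.
  private val knownSystems = Set("kafka", "cassandra", "bq")
  private val knownFormats = Set("json", "avro", "parquet", "csv")

  // Returns human-readable problems; an empty list means the config passed.
  def validate(cfg: AppConfig): List[String] = {
    val endpointErrors = List("source" -> cfg.source, "target" -> cfg.target).flatMap {
      case (role, ep) =>
        List(
          if (knownSystems(ep.system.toLowerCase)) None
          else Some(s"$role: unsupported system '${ep.system}'"),
          if (knownFormats(ep.format.toLowerCase)) None
          else Some(s"$role: unsupported format '${ep.format}'")
        ).flatten
    }
    val sqlError =
      if (cfg.processorSql.trim.isEmpty) List("processorSql must not be empty") else Nil
    endpointErrors ++ sqlError
  }
}
```

Validating the record at onboarding time is one way to keep "Test and Deploy" as the only remaining manual step.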
Solution Approach
• A centralized, metadata-driven data loading platform with plug-and-play onboarding
• An abstraction layer for building workflow orchestration, which simplifies complex service integrations and shortens time to deployment
• A UI that increases developer productivity by providing ready-to-use connectors for configuring business logic
• An intelligent system that recommends optimizations based on previous runs
• A smart run schedule pool that enqueues and dequeues run instances based on priority
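The deck does not say how recommendations are derived from previous runs. As a rough illustration only, the sketch below sizes memory from the highest peak seen in earlier runs plus a safety margin. The names RunStats and recommendMemoryMb and the 20% default headroom are assumptions, not the platform's algorithm.

```scala
// Hypothetical recommender: the sizing rule and all names are assumptions.
object ResourceRecommender {
  final case class RunStats(peakMemoryMb: Long, durationSec: Long)

  // Recommend the highest observed peak plus a percentage of headroom,
  // rounded up. Returns None when there is no history to learn from.
  def recommendMemoryMb(history: Seq[RunStats], headroomPercent: Int = 20): Option[Long] =
    if (history.isEmpty) None
    else {
      val peak = history.map(_.peakMemoryMb).max
      // Integer ceiling division avoids floating-point rounding surprises.
      Some((peak * (100 + headroomPercent) + 99) / 100)
    }
}
```

A production recommender would likely also look at duration trends and input volume; this sketch keeps only the "learn from previous runs" idea.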
High Level System Architecture
Metadata Under the hood
Connectors
• The platform can parse and handle data formats such as JSON, Avro, Parquet, and CSV
• Users can pick existing connectors for source and target systems such as Kafka, Cassandra, and BQ
• The metadata store holds system- and application-specific resource configuration to optimize resource allocation
• An abstraction layer bundled with custom UDFs lets users query systems such as Kafka and Cassandra with SQL
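Picking an existing connector by name can be pictured as a lookup in a registry keyed by the system name stored in the metadata. The sketch below is a minimal version of that idea. The Connector trait, SimpleConnector, and Registry are hypothetical names, and a real connector would carry credentials, schemas, and read/write logic.

```scala
// Hypothetical connector registry; none of these types come from the deck.
object ConnectorRegistrySketch {
  // Minimal connector contract used only for this illustration.
  trait Connector {
    def system: String
    def describe(location: String): String
  }

  final case class SimpleConnector(system: String) extends Connector {
    def describe(location: String): String = s"$system://$location"
  }

  final class Registry(connectors: Seq[Connector]) {
    private val bySystem: Map[String, Connector] =
      connectors.map(c => c.system.toLowerCase -> c).toMap

    // Case-insensitive lookup; Left carries an error message for unknown systems.
    def resolve(system: String): Either[String, Connector] =
      bySystem.get(system.toLowerCase).toRight(s"no connector registered for '$system'")
  }

  // Pre-populated with the systems named on the slide.
  val default: Registry =
    new Registry(Seq("kafka", "cassandra", "bq").map(SimpleConnector(_)))
}
```

Returning an Either keeps a misconfigured system name a validation error at onboarding time rather than a runtime failure.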
Sample Domain API call in SQL UDF
• Accessing a new domain API normally requires a lot of engineering effort to integrate it into a data application
• Wrapping domain APIs in UDFs makes them usable from a parallel compute engine such as Spark, which accepts UDFs in SQL
spark.sql("select getAccountStatus('cust_id:xxxxxxxxx') as is_active from table limit 1").show(false)
+---------+
|is_active|
+---------+
|Y        |
+---------+
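The query above assumes that getAccountStatus has already been registered with the Spark session. A minimal sketch of that registration follows, assuming Spark 3.x on Scala 2.12 or 2.13 is on the classpath. The lookupAccountStatus body is a stub standing in for the real domain API call, which the deck does not show.

```scala
import org.apache.spark.sql.SparkSession

object AccountStatusUdf {
  // Stub standing in for the domain API client (assumption): the real
  // implementation would call the account service instead.
  def lookupAccountStatus(key: String): String =
    if (key.startsWith("cust_id:")) "Y" else "N"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("domain-api-udf-sketch")
      .master("local[*]")
      .getOrCreate()

    // Register the function under the name used in the SQL on the slide.
    spark.udf.register("getAccountStatus", (key: String) => lookupAccountStatus(key))

    spark.sql("select getAccountStatus('cust_id:123') as is_active").show(false)
    spark.stop()
  }
}
```

Because a UDF runs once per row on the executors, a real domain API wrapper would usually need batching, caching, or rate limiting, none of which is shown here.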
Orchestrator
• Builds the optimized execution plan from the application configs in the metadata store
• Generates run instances based on app priority and source systems
• Executors pick up the optimized execution plan at execution time
Metadata Store
Executors
Read App Config
Job Optimizer
Generate Run Instance
Run Scheduler
Orchestrator
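The flow on this slide (read app config, generate run instances, hand them to the scheduler) can be sketched as a pure function from configs to an ordered list of run instances. The types below are hypothetical, and the rule "one run instance per configured instance count, most critical app first" is an assumption about how priority is applied.

```scala
// Hypothetical orchestrator step; field names and ordering rule are assumptions.
object OrchestratorSketch {
  final case class AppConfig(name: String, priority: Int, sourceSystem: String, instances: Int)
  final case class RunInstance(app: String, instance: Int, priority: Int, sourceSystem: String)

  // Assumed convention: a lower priority number means a more critical app.
  // Apps are ordered by (priority, name); each yields `instances` run instances.
  def generateRunInstances(configs: Seq[AppConfig]): Seq[RunInstance] =
    configs
      .sortBy(c => (c.priority, c.name))
      .flatMap { c =>
        (1 to c.instances).map(i => RunInstance(c.name, i, c.priority, c.sourceSystem))
      }
}
```

Keeping this step free of side effects makes the generated plan easy to inspect before any executor picks it up.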
Schedule Optimizer
• Smart priority groups are assigned to each loader, across all applications, based on criticality
• Top-priority jobs take precedence over already-scheduled lower-priority jobs, which are dequeued
• Lower-priority jobs resume automatically once all top-priority and SLA-bound jobs are complete
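One simple way to get this behavior is a priority queue in which waiting run instances are always dequeued in priority order, so lower-priority work resumes on its own once the higher-priority entries are drained. The sketch below uses Scala's mutable.PriorityQueue. The Run and SchedulePool names and the "lower number = more critical" convention are assumptions.

```scala
import scala.collection.mutable

// Hypothetical schedule pool; not the platform's actual scheduler.
object SchedulePoolSketch {
  final case class Run(app: String, instance: Int, priority: Int, seq: Long)

  final class SchedulePool {
    private var counter = 0L

    // PriorityQueue dequeues the largest element, so the ordering is reversed:
    // lowest priority number first, then earliest submission.
    private val queue = mutable.PriorityQueue.empty[Run](
      Ordering.by[Run, (Int, Long)](r => (r.priority, r.seq)).reverse
    )

    def enqueue(app: String, instance: Int, priority: Int): Unit = {
      queue.enqueue(Run(app, instance, priority, counter))
      counter += 1
    }

    // Next run to start, or None when the pool is empty.
    def dequeue(): Option[Run] =
      if (queue.isEmpty) None else Some(queue.dequeue())
  }
}
```

The submission counter breaks ties, so instances of equal priority start in the order they were scheduled.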
Schedule Optimizer Illustration
10:00 | non-core app | instance 1 | Done
10:00 | non-core app | instance 2 | Done
10:00 | non-core app | instance 3 | In Progress
10:00 | non-core app | instance 4 | In Progress
10:00 | non-core app | instance 5 | waiting
10:00 | non-core app | instance 6 | waiting
10:00 | non-core app | instance 1 | Done
10:00 | non-core app | instance 2 | Done
10:00 | non-core app | instance 3 | Done
10:00 | non-core app | instance 4 | Done
10:00 | non-core app | instance 5 | waiting
10:00 | non-core app | instance 6 | waiting
10:30 | core app | instance 2 | waiting
10:30 | core app | instance 3 | waiting
10:30 | core app | instance 4 | waiting
10:30 | non-core app | instance 1 | waiting
Current Schedule Pool
Updated Schedule Pool
Incoming Schedule Pool
10:30 | core app | instance 1 | In Progress
10:30 | core app | instance 2 | In Progress
10:30 | core app | instance 3 | waiting
10:30 | core app | instance 4 | waiting
10:30 | core app | instance 1 | waiting
10:30 | non-core app | instance 1 | waiting
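The illustration can be read as merging an incoming pool into the current one. The sketch below shows one possible merge rule, stated as an assumption: finished and running entries keep their place, and waiting entries from both pools are re-ranked so that core apps go first. Entry, Status, and merge are hypothetical names, and running non-core instances are not interrupted in this version.

```scala
// Hypothetical pool merge; the merge rule is an assumption, not the deck's algorithm.
object PoolMergeSketch {
  sealed trait Status
  case object Done extends Status
  case object InProgress extends Status
  case object Waiting extends Status

  final case class Entry(time: String, app: String, instance: Int, core: Boolean, status: Status)

  // Settled entries (Done or InProgress) come first in their original order.
  // Waiting entries are sorted core-first; sortBy is stable, so entries with
  // the same rank keep their original relative order.
  def merge(current: Seq[Entry], incoming: Seq[Entry]): Seq[Entry] = {
    val (waiting, settled) = (current ++ incoming).partition(_.status == Waiting)
    settled ++ waiting.sortBy(e => if (e.core) 0 else 1)
  }
}
```

With a 10:00 non-core batch partly finished and a 10:30 core batch arriving, the waiting core instances move ahead of the remaining non-core ones.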
Telemetry
• Real-time dashboards that provide runtime statistics for each application
• Drill-down views for exploring individual metrics
• Alerting and notifications that tell app owners about errors and faults
• A consolidated view of all applications with their success/failure ratios
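The consolidated success/failure view boils down to grouping run results by application and counting outcomes. The sketch below shows that aggregation. RunResult and AppSummary are hypothetical types, and where the run results come from is outside its scope.

```scala
// Hypothetical telemetry roll-up; types and names are assumptions.
object TelemetrySketch {
  final case class RunResult(app: String, succeeded: Boolean)

  final case class AppSummary(app: String, successes: Int, failures: Int) {
    // Share of successful runs; defined as 0.0 when there are no runs.
    def successRatio: Double = {
      val total = successes + failures
      if (total == 0) 0.0 else successes.toDouble / total
    }
  }

  // One summary per application, keyed by application name.
  def summarize(results: Seq[RunResult]): Map[String, AppSummary] =
    results.groupBy(_.app).map { case (app, runs) =>
      val ok = runs.count(_.succeeded)
      app -> AppSummary(app, ok, runs.size - ok)
    }
}
```

An alerting rule could then be a threshold on successRatio per application, with the threshold itself stored in the metadata.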
Putting the pieces together
Self Service
Metadata Store
Multiple Execution Engines
E2E App Life Cycle Management
Multiple Source & Target Systems
Telemetry
Version Control & CI/CD
Cloud Native
Plug & Play
Low or No Code
Outcome
• Turnaround time reduced from weeks to days
• Developer productivity is expected to increase severalfold
• Non-engineering teams with SQL knowledge can also use the platform to build functional applications
• Priority-aware execution: applications run according to their priority, ahead of non-SLA applications
Thank You

Data Stack Summit 2023


Editor's Notes

  1. Large data-driven enterprises need a platform for all data processing tasks, ranging from ingest through ETL and data quality processing to advanced analytics and machine learning jobs.