This document introduces a self-service, metadata-driven data loading platform developed by Walmart to simplify and optimize onboarding and running data applications. Its key components are a centralized metadata store, connectors that integrate various data sources and targets, an orchestrator that builds optimized execution plans, a schedule optimizer that prioritizes jobs, and telemetry dashboards for monitoring. The platform aims to dramatically increase developer productivity, provide a low-code experience, and intelligently manage resources and job scheduling across applications.
3. Agenda
• Personalization @Walmart
• Challenges
• Solution Approaches
• High Level System Architecture
• Metadata Design and Connectors
• Orchestrator
• Schedule Optimizer
• Telemetry
4. Personalization @Walmart
• Our customers are becoming increasingly omnichannel
• ~220M customers & members visit ~10,500 stores & clubs under 46 banners in 24 countries & eCommerce websites in a week
• Billions of product impressions are served every week, generating petabytes of events
• We, the FE team, run thousands of data applications to generate the features that power personalized recommendations to our customers
[Chart: customer omnichannel adoption — Walmart General Merchandise, +Walmart Grocery, Store Pickup & Delivery, +Walmart Stores]
5. Personalization | Data Landscape
[Diagram: layered data landscape — User Experience & Access Control; Security, Logging, Alerting, Telemetry; personas (Data Engineers, Data Scientists, Data Analysts); Data Apps | Data Loader Platform; Multi-DC and Public Cloud; Streaming | In-Memory | NoSQL | Analytical]
6. • Data application onboarding requires a lot of manual hand coding; developers need time to develop, integrate, and test code to solve the underlying complexities
• Building a functionality-rich application requires integrating various big data technologies and a wide array of data sources, sinks, and data processors
• Deployments are isolated, making it difficult to control resource allocation/usage and perform retrospection
• Competing high- and low-priority applications introduce latency at the serving layers
Challenges
7. Challenges | New App Onboarding | Cumbersome & Fragile
For each of Applications 1..N, every step is hand-built and repeated:
Integrate Source System → Integrate Target System → Develop Processor → Implement Schedule → Enable Telemetry → Allocate Resource → Test and Deploy
8. Data Loader Simplifies the Onboarding
With the Data Loader Platform, each of Applications 1..N only needs to:
Configure (Source System, Target System, Processor, Schedule, Telemetry, Resource) → Test and Deploy
The platform supplies the building blocks: Parsers, Connectors, Processors, Schedulers, Execution Plan, Dashboard
9. • A centralized, metadata-driven data loading platform with plug-and-play onboarding capability
• An abstraction layer for building workflow orchestration, which simplifies complex service integrations and shortens time to deployment
• A compelling UI that dramatically increases developer productivity by providing ready-to-use connectors for configuring the business logic
• An intelligent system that provides optimized recommendations based on previous runs
• A smart run schedule pool that enqueues and dequeues run instances based on priority
Solution Approach
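To make the metadata-driven approach concrete, here is a hypothetical example of the kind of application config a developer would submit instead of writing code. All field names and values are illustrative assumptions, not the platform's actual schema.

```python
import json

# Hypothetical onboarding metadata for one data application.
APP_CONFIG = json.loads("""
{
  "app_name": "item_feature_loader",
  "priority": "HIGH",
  "source":    {"type": "kafka",     "topic": "product.impressions", "format": "avro"},
  "processor": {"type": "sql",       "query": "SELECT item_id, COUNT(*) AS views FROM events GROUP BY item_id"},
  "target":    {"type": "cassandra", "keyspace": "features", "table": "item_views"},
  "schedule":  {"cron": "*/15 * * * *"},
  "resources": {"executors": 8, "executor_memory_gb": 4}
}
""")

def validate(config):
    """Check that the onboarding config names every required section."""
    required = {"app_name", "priority", "source", "processor", "target", "schedule"}
    missing = required - config.keys()
    if missing:
        raise ValueError(f"missing config sections: {sorted(missing)}")
    return True
```

With a config like this, the platform can wire the source, processor, target, and schedule together without any application-specific code.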
12. • The platform is equipped to parse and handle common data formats such as JSON, Avro, Parquet, and CSV
• Users can pick from existing connectors supporting different source and target systems such as Kafka, Cassandra, and BigQuery (BQ)
• The metadata store holds system- and application-specific resource configuration to optimize resource allocation
• An abstraction layer bundled with custom UDFs gives users the flexibility to query systems like Kafka and Cassandra with SQL
Connectors
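A plug-and-play connector model typically maps the "type" field of a source or target config onto a registered factory. The sketch below shows one minimal way to do that; the class and function names are assumptions for illustration, not the platform's real interfaces.

```python
from typing import Callable, Dict

class ConnectorRegistry:
    """Maps a system type ("kafka", "cassandra", ...) to a connector factory."""

    def __init__(self):
        self._factories: Dict[str, Callable] = {}

    def register(self, system_type: str):
        """Decorator that registers a factory under a system type."""
        def decorator(factory: Callable):
            self._factories[system_type] = factory
            return factory
        return decorator

    def create(self, config: dict):
        """Look up the factory for config["type"] and build the connector."""
        factory = self._factories[config["type"]]
        return factory(config)

registry = ConnectorRegistry()

@registry.register("kafka")
def kafka_connector(config):
    # A real connector would open a Kafka consumer here; we return a
    # descriptive string to keep the sketch self-contained.
    return f"kafka://{config['topic']}"

@registry.register("cassandra")
def cassandra_connector(config):
    return f"cassandra://{config['keyspace']}.{config['table']}"
```

Onboarding a new system then means registering one more factory, with no change to existing applications.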
13. Sample Domain API call in SQL UDF
• Accessing new domain APIs normally requires a lot of engineering effort to integrate them into any data application
• Wrapping domain APIs in UDFs lets them be used in a parallel computation engine like Spark, which accepts UDFs in SQL
spark.sql("select getAccountStatus('cust_id:xxxxxxxxx') as is_active from table limit 1").show(false)
+---------+
|is_active|
+---------+
|Y        |
+---------+
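A pure-Python sketch of the pattern behind that slide: wrap a domain API call in an ordinary function, which the platform would then register as a Spark SQL UDF. The `ACCOUNTS` lookup is a stand-in for a real domain API client, and `get_account_status` is a hypothetical name mirroring the slide's `getAccountStatus`.

```python
# Stub standing in for a remote domain API the UDF would call.
ACCOUNTS = {"cust_id:xxxxxxxxx": True}

def get_account_status(customer_key: str) -> str:
    """UDF body: query the domain API and map the response to 'Y'/'N'."""
    return "Y" if ACCOUNTS.get(customer_key, False) else "N"

# With a SparkSession available, registration would look roughly like:
#   spark.udf.register("getAccountStatus", get_account_status)
#   spark.sql("SELECT getAccountStatus('cust_id:xxxxxxxxx') AS is_active FROM table LIMIT 1")
```

Once registered, analysts can call the domain API directly from SQL without touching the integration code.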
14. Orchestrator
• Builds the optimized execution plan based on the application configs from the metadata store
• Responsible for generating run instances based on app priority and source systems
• Executors pick up the optimized execution plan during execution
[Diagram: Orchestrator — reads app config from the Metadata Store, the Job Optimizer builds the plan, the Run Scheduler generates run instances, and Executors run them]
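The run-instance generation step above can be sketched as a small function that turns app configs into prioritized run instances. The priority labels, field names, and plan shape are illustrative assumptions.

```python
# Hypothetical priority ranking; lower rank runs first.
PRIORITY_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}

def generate_run_instances(app_configs):
    """Emit one run instance per app, ordered by priority, then app name."""
    instances = [
        {"app": cfg["app_name"],
         "priority": cfg["priority"],
         # A toy execution plan: read from source, process, write to target.
         "plan": [cfg["source"]["type"], "process", cfg["target"]["type"]]}
        for cfg in app_configs
    ]
    return sorted(instances,
                  key=lambda r: (PRIORITY_RANK[r["priority"]], r["app"]))
```

A real orchestrator would also fold in source-system readiness and historical run statistics when ordering the plan.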
15. • Smart priority groups are assigned per loader for all applications based on criticality
• Top-priority jobs take precedence over already-scheduled lower-priority ones by dequeuing them
• Lower-priority jobs resume automatically once all top-priority and SLA-bound jobs are complete
Schedule Optimizer
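The "smart run schedule pool" described above behaves like a priority queue: a newly enqueued top-priority job is picked before already-queued lower-priority ones, which resume afterwards. A minimal sketch using the standard-library heap, assuming numeric priorities where 0 is the most urgent:

```python
import heapq
import itertools

class RunSchedulePool:
    """Priority pool: lowest priority number dequeues first; FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving enqueue order

    def enqueue(self, priority: int, job: str):
        heapq.heappush(self._heap, (priority, next(self._counter), job))

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]
```

Even if low-priority jobs were queued first, an SLA-bound job enqueued later with priority 0 is dequeued ahead of them, and the low-priority jobs run afterwards in their original order.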
17. • Real-time dashboards that provide runtime statistics for each application
• An insightful experience for deep-diving into various metrics
• An alerting and notification mechanism that informs app owners of any error or fault scenarios
• A consolidated view of all applications with their success/failure ratios
Telemetry
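The consolidated success/failure view can be sketched as a simple aggregation over run events. The event shape (`{"app", "status"}`) is an assumption made for illustration.

```python
from collections import defaultdict

def success_failure_ratio(run_events):
    """Aggregate per-app run outcomes into a success ratio (0.0 to 1.0)."""
    counts = defaultdict(lambda: {"success": 0, "failure": 0})
    for event in run_events:
        key = "success" if event["status"] == "SUCCESS" else "failure"
        counts[event["app"]][key] += 1
    return {
        app: round(c["success"] / (c["success"] + c["failure"]), 2)
        for app, c in counts.items()
    }
```

A real telemetry pipeline would stream these events into a dashboard and fire alerts when a ratio drops below a threshold.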
18. Putting the pieces together
• Self-Service Metadata Store
• Multiple Execution Engines
• E2E App Life Cycle Management
• Multiple Source & Target Systems
• Telemetry
• Version Control & CI/CD
• Cloud Native
• Plug & Play
• Low or No Code
19. • Quick turnaround: onboarding time drops from weeks to days
• Developer productivity is expected to increase severalfold
• Non-engineering teams can also leverage this platform to build functional applications with only SQL knowledge
• Intelligent app execution based on app priority, favoring SLA-bound applications over non-SLA ones
Outcome
Large data-driven enterprises need such a platform for all data processing tasks, ranging from ingestion through ETL and data quality processing to advanced analytics and machine learning jobs.