
Fifth elephant 2018 - Incremental Processing

Incremental transform of transactional data models to analytical data models in near real time

Transactional systems are designed with data models that maximize write throughput across multiple parallel business flows. They evolve iteratively with the business and must react quickly to a changing business landscape to minimize time to market.
Analytical systems, on the other hand, require data models to maximize query throughput over broad, deep and large data volumes.
The need for a platform that transforms a transactional data model into an analytical data model is well established in the industry. This is currently achieved through two different paradigms: stream processing at lower latencies and batch processing at higher latencies.
We have solved the same problem through a third paradigm of incremental processing for intermediate latencies (5 minutes to 1 hour).
We considered and dropped implementations of the streaming paradigm either because of a lack of completeness guarantees or the absence of complex join capabilities across a large number of entities.
Our incremental processing platform transforms transactional data models to analytical data models. It lets users express complex joins across multiple entities (currently live with 30) through a Transformation Definition Language (TDL). These complex joins are evaluated incrementally as transactional data changes, and the results periodically update the analytical data model. For near real time use cases, this is done every 5-10 minutes.
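As an illustration only (the talk does not show the TDL or the platform's code), incremental evaluation of a join can be sketched in Python as follows. The `Row` type, the function names and the shipment timestamps are assumptions made for this sketch. It re-evaluates a left outer join only for keys that mutated inside the window (start, end], and joins them against everything replicated up to the window end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    time: str   # "H:MM" change time recorded at replication (assumed field)
    key: str    # join key shared by both entities, e.g. an order id
    value: str  # attribute contributed by this entity


def minutes(hhmm):
    """Convert "H:MM" or "HH:MM" into minutes since midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def changed_keys(rows, start, end):
    """Keys with at least one mutation in the window (start, end]."""
    return {r.key for r in rows if minutes(start) < minutes(r.time) <= minutes(end)}


def incremental_left_join(left, right, start, end):
    """Re-evaluate a left outer join only for keys that changed in the window."""
    delta = changed_keys(left, start, end) | changed_keys(right, start, end)
    cutoff = minutes(end)
    out = []
    for l in left:
        if l.key not in delta or minutes(l.time) > cutoff:
            continue
        # Join against the replica as of the window end, not just the delta.
        matches = [r for r in right if r.key == l.key and minutes(r.time) <= cutoff]
        if matches:
            out.extend((l.key, l.value, r.value) for r in matches)
        else:
            out.append((l.key, l.value, None))
    return out


picklists = [Row("10:00", "o1", "p1"), Row("10:00", "o2", "p2"),
             Row("10:10", "o3", "p2"), Row("10:10", "o4", "p3")]
# Shipment change times are assumed; the slides do not show them.
shipments = [Row("10:05", "o1", "s1"), Row("10:05", "o1", "s2")]

first = incremental_left_join(picklists, shipments, "9:50", "10:00")
second = incremental_left_join(picklists, shipments, "10:00", "10:10")
```

In this sketch the second window touches only orders o1, o3 and o4; order o2 did not change and is not recomputed.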
Changes to transactional data models are handled through version support in the TDL. These changes are absorbed with a pause and resume of the transformations.
The Flipkart Fulfillment Services Group serves over a million shipments a day at its peak. Customer delight through reliable and fast delivery of orders is our primary goal. To succeed in this endeavour, our ground operations depend on live and accurate visibility into the journey of all shipments across India. Overall data volumes run into tens of TBs, with a change frequency of over 25k QPS at peak. All our transactional systems combined generate mutations with volumes close to 200 GB every second. Our incremental platform is built to handle this scale.
With this platform, we have achieved analytics at low latencies with high completeness without compromising on business agility.
In this talk, I will cover the specifics of our evaluations and our learnings from the journey of building the platform.



  1. INCREMENTAL TRANSFORM OF TRANSACTIONAL DATA MODELS TO ANALYTICAL DATA MODELS IN NEAR-REAL-TIME. Govind Pandey
  2. Definitely NOT “the greatest invention since sliced bread”
  3. We came together @Flipkart to create this: Ashendra Bansal, Mayank Verma, Govind Pandey, Darshak Bhagat, Bala N, Saloni Khandelwal
  4. Influencers: Prasanna R; Vinoth Chandar (The case for incremental processing)
  5. @Flipkart: Supply Chain Automation, Predictive Optimizations and Actionable Insights through Data and AI.
  6. Agenda: Motivators; Algorithm; Trade-offs
  7. Grocery fulfillment journey: Multiple systems participate in the journey; Time to act drives latency requirements; 100% accuracy expected; Minimal effort for business monitoring
  8. Characteristics of transactional data models: Normalized; Complex, Directed, Deterministic relationships; Longer life cycles; Tumbling time windows
  9. Normalization helps parallelism and fast writes
      Order (1): order, order_items
      Picklists (2): order_items, picker
      Shipments (3): order_items, shipment
      Vans (4): shipment, van
  10. Denormalization helps fast reads
      Picklists (time, order, picker): (t1, o1, p1)
      Shipments (order, shipment): (o1, s1)
      Vans (shipment, van): (s1, v1)
      Pick_To_Dispatch (Composite Key: order+shipment):
      key    order  picker  pick_time  shipment  van  location
      o1+s1  o1     p1      t1         s1        v1   <lat, long>
  11. Incremental processing algorithm: Replicate; Identify temporal window; Identify δ mutations and join; Identify delete candidates; Apply; Enrich
  12. Blockitecture and flows: Sources (x5); RELAY; Staging Area with Replicas (x5); Transform & Enrich (δ); Transform Area; Denormalized Datamodel; Visualize; Data Quality (Freshness, Completeness)
  13. Replicate; Identify temporal window; Identify δ mutations and join; Identify delete candidates; Apply; Enrich
  14. Replicate; Identify temporal window; Identify δ mutations and join; Identify delete candidates; Apply; Enrich
  15. At 10:05
      Picklists (time, order, picker): (10:00, o1, p1), (10:00, o2, p2)
      Shipments (order, shipment): empty
      Vans (shipment, van): empty
      Pick_To_Dispatch:
      start  end    time   key      order  picker  pick_time  shipment  van  location
      9:50   10:00  10:05  o1+null  o1     p1      10:00
      9:50   10:00  10:05  o2+null  o2     p2      10:00
  16. At 10:10
      Picklists (time, order, picker): (10:00, o1, p1), (10:00, o2, p2), (10:10, o3, p2), (10:10, o4, p3)
      Shipments (order, shipment): (o1, s1), (o1, s2)
      Vans (shipment, van): empty
      Previous window: start 9:50, end 10:00
      Current window: start 10:00, end 10:10
  17. Replicate; Identify temporal window; Identify δ mutations and join; Identify delete candidates; Apply; Enrich
  18. At 10:10
      Picklists (time, order, picker): (10:00, o1, p1), (10:00, o2, p2), (10:10, o3, p2), (10:10, o4, p3)
      Shipments (order, shipment): (o1, s1), (o1, s2)
      Vans (shipment, van): empty
      Current window: start 10:00, end 10:10
      Join labels: Left Outer Join, Inner Join
  19. Pick_To_Dispatch at 10:15
      start  end    time   key      order  picker  pick_time  shipment  van  location
      9:50   10:00  10:05  o1+null  o1     p1      10:00
      9:50   10:00  10:05  o2+null  o2     p2      10:00
      10:00  10:10  10:15  o1+s1    o1     p1      10:00      s1
      10:00  10:10  10:15  o1+s2    o1     p1      10:00      s2
      10:00  10:10  10:15  o3+null  o3     p2      10:10
      10:00  10:10  10:15  o4+null  o4     p3      10:10
  20. Replicate; Identify temporal window; Identify δ mutations and join; Identify delete candidates; Apply; Enrich
  21. At 10:10
      Picklists (time, order, picker): (10:00, o1, p1), (10:00, o2, p2), (10:10, o3, p2), (10:10, o4, p3)
      Shipments (order, shipment): (o1, s1), (o1, s2)
      Vans (shipment, van): empty
      Primary Key: Order+Shipment
      Delete Candidate: Order+null
  22. Pick_To_Dispatch at 10:15
      start  end    time   key      order  picker  pick_time  shipment  van  location
      9:50   10:00  10:05  o1+null  o1     p1      10:00
      9:50   10:00  10:05  o2+null  o2     p2      10:00
      10:00  10:10  10:15  o1+s1    o1     p1      10:00      s1
      10:00  10:10  10:15  o1+s2    o1     p1      10:00      s2
      10:00  10:10  10:15  o3+null  o3     p2      10:10
      10:00  10:10  10:15  o4+null  o4     p3      10:10
  23. Replicate; Identify temporal window; Identify δ mutations and join; Identify delete candidates; Apply; Enrich
  24. At 10:20
      Picklists (time, order, picker): (10:00, o1, p1), (10:00, o2, p2), (10:10, o3, p2), (10:10, o4, p3), (10:15, o5, p3)
      Shipments (order, shipment): (o1, s1), (o1, s2), (o2, s4), (o3, s3), (o4, s5)
      Vans (shipment, van): (s1, v1), (s2, v1)
      Time labels: 10:20, 10:20
      Join labels: Left Outer Join, Inner Join, Inner Join
  25. Pick_To_Dispatch at 10:25
      start  end    time   key      order  picker  pick_time  shipment  van  location
      10:10  10:20  10:25  o1+s1    o1     p1      10:00      s1        v1
      10:10  10:20  10:25  o1+s2    o1     p1      10:00      s2        v1
      10:10  10:20  10:25  o2+null  o2     p2      10:00
      10:10  10:20  10:25  o3+null  o3     p2      10:10
      10:10  10:20  10:25  o4+null  o4     p3      10:10
      10:10  10:20  10:25  o2+s4    o2     p2      10:00      s4
      10:10  10:20  10:25  o3+s3    o3     p2      10:10      s3
      10:10  10:20  10:25  o4+s5    o4     p3      10:10      s5
      10:10  10:20  10:25  o5+null  o5     p3      10:15
  26. Pick_To_Dispatch at 10:25
      start  end    time   key      order  picker  pick_time  shipment  van  location
      10:10  10:20  10:25  o1+s1    o1     p1      10:00      s1        v1
      10:10  10:20  10:25  o1+s2    o1     p1      10:00      s2        v1
      10:10  10:20  10:25  o2+s4    o2     p2      10:00      s4
      10:10  10:20  10:25  o3+s3    o3     p2      10:10      s3
      10:10  10:20  10:25  o4+s5    o4     p3      10:10      s5
      10:10  10:20  10:25  o5+null  o5     p3      10:15
  27. Replicate; Identify temporal window; Identify δ mutations and join; Identify delete candidates; Apply; Enrich
  28. Pick_To_Dispatch at 10:30
      start  end    time   key      order  picker  pick_time  shipment  van  location
      10:10  10:20  10:25  o1+s1    o1     p1      10:00      s1        v1   <lat,lng>
      10:10  10:20  10:25  o1+s2    o1     p1      10:00      s2        v1   <lat,lng>
      10:10  10:20  10:25  o2+s4    o2     p2      10:00      s4
      10:10  10:20  10:25  o3+s3    o3     p2      10:10      s3
      10:10  10:20  10:25  o4+s5    o4     p3      10:10      s5
      10:10  10:20  10:25  o5+null  o5     p3      10:15
      Orders Received: 4; Orders Picked: 4; Shipments Packed: 5; Shipments Dispatched: 2
  29. Configurable dashboards and ad-hoc analysis
  30. Data is processed when it becomes available (Iteration 1, Iteration 2)
      Order Booking: ᶬ11, ᶬ12, ᶬ13, ᶬ14, ᶬ15, ᶬ16, ᶬ17 ...
      Shipment: ᶬ21, ᶬ22, ᶬ23, ᶬ24, ᶬ25 ...
      Picklist: ᶬ31, ᶬ32, ᶬ33, ᶬ34 ᶬ35, ᶬ36 ᶬ37, ᶬ38 ...
  31. Grocery control tower: 8 different microservices; 35 entities and 34 joins; 100% completeness in 15 mins; Configuration driven
  32. Data processing implementation trade-offs: LATENCY; ACCURACY; LOW COST; AGILITY
  33. Batch - High accuracy, high latency, low cost: Bulk writes leverage cheaper disks; Range queries take longer for scans; Compute can be shared; Replays are simpler but take as long
  34. Stream - Lower accuracy, low latency, low cost: Record level updates need fast writes; Range scans slow down processing; Compute is consistently engaged; Replay is complex (ϰ) or infeasible (𝝺)
  35. Incremental - High accuracy, mid latency, mid cost: Replication needs fast writes; Joins need fast scans; Compute is shared; Replays addressed by design
  36. Data processing implementation trade-offs
      Stream: Lower accuracy, low latency, low cost
      Batch: High accuracy, high latency, low cost
      Incremental: High accuracy, medium latency, medium cost
  37. Applications of Incremental Processing
      Positive Indicators: Time to act is 30 minutes or higher; Accuracy is crucial; Incremental visibility is acceptable; Multiple systems come together with complex join criteria
      Negative Indicators: Low infrastructure cost is a constraint; Independent systems
  38. Thick, Medium and Thin Slices - Choose yours: BATCH; STREAM; INCREMENTAL
  39. Next up: Dashboards, thy end is nigh! Set for release in Nov 2018
  40. Thick, Medium and Thin Slices - Choose yours: BATCH; STREAM; INCREMENTAL. THANK YOU!
  41. Example - Out of order mutations
      Picklists (time_d, order, picker): (10:00, o1, p1), (10:00, o2, p2), (10:10, o3, p2), (10:10, o4, p3)
      Shipments (order, shipment): (o1, s1), (o6, s6)
      Vans (shipment, van): empty
      Join labels: Left Outer Join, Inner Join ?
  42. Example - Out of order mutations
      Picklists (time_d, order, picker): (10:00, o1, p1), (10:00, o2, p2), (10:10, o3, p2), (10:10, o4, p3), (10:20, o6, p6)
      Shipments (order, shipment): (o1, s1), (o6, s6)
      Vans (shipment, van): empty
      Join labels: Left Outer Join
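
The delete-candidate and out-of-order slides can be read together in a small Python sketch. This is one interpretation of the slides, not the platform's code: the function names, the dictionary-based table and the way changed orders are passed in are all assumptions. A left outer join from Picklists emits an order+null row while no shipment exists. Once a shipment joins, the order+shipment row is written and the order+null row is deleted. A shipment that arrives before its picklist (o6 in the slides) produces no row until the picklist shows up, at which point the join against the full shipment replica picks it up.

```python
def composite_key(order, shipment):
    """Build the order+shipment key used by the denormalized table."""
    return f"{order}+{shipment if shipment is not None else 'null'}"


def left_outer_join(picklists, shipments, changed_orders):
    """Join changed orders against the full shipment replica."""
    rows = []
    for order, picker in picklists.items():
        if order not in changed_orders:
            continue
        matched = [s for o, s in shipments if o == order]
        # No shipment yet: emit a single row with a null shipment.
        rows.extend((order, picker, s) for s in (matched or [None]))
    return rows


def apply_rows(table, rows):
    """Upsert joined rows, then delete the order+null rows they supersede."""
    for order, picker, shipment in rows:
        table[composite_key(order, shipment)] = (order, picker, shipment)
    candidates = {composite_key(o, None) for o, _, s in rows if s is not None}
    # Return only the candidates that were actually present and removed.
    return sorted(k for k in candidates if table.pop(k, None) is not None)


table = {}
picklists = {"o1": "p1", "o2": "p2"}
shipments = []
apply_rows(table, left_outer_join(picklists, shipments, {"o1", "o2"}))

# The shipment for o6 arrives before its picklist (out of order).
shipments += [("o1", "s1"), ("o6", "s6")]
deleted = apply_rows(table, left_outer_join(picklists, shipments, {"o1", "o6"}))
keys_before_o6_picklist = sorted(table)

# The picklist for o6 arrives later; the join now finds the waiting shipment.
picklists["o6"] = "p6"
apply_rows(table, left_outer_join(picklists, shipments, {"o6"}))
```

Joining against the full replica rather than only the current delta is what lets the late picklist find its earlier shipment in this sketch.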
