
Building a Real-Time Feature Store at iFood

iFood is the largest food tech company in Latin America. We serve more than 26 million orders each month from more than 150 thousand restaurants.


  1. Building a Real-Time Feature Store at iFood. Daniel Galinkin, ML Platform Tech Lead
  2. Agenda ▪ iFood and AI: what the iFood mission is, and how we use AI ▪ What is a Feature Store: what a feature store is, and why it is important for solving AI problems ▪ How iFood built its Feature Store: how iFood built its feature store by leveraging Spark, Databricks and Delta Tables
  3. iFood and AI
  4. BIGGEST FOODTECH IN LATIN AMERICA (we’re in Brazil, Mexico and Colombia) ▪ ~30 million orders per month ▪ 800+ cities in all Brazilian states ▪ 100+ thousand restaurants
  5. AI Everywhere ▪ Discovery: restaurant recommendations; dish recommendations ▪ Logistics: optimize driver allocation; estimate the delivery time; find the most efficient route ▪ Marketing: optimize the use of marketing ads; optimize the use of coupons
  6. ML Platform ▪ Configuration ▪ Data Collection ▪ Feature Extraction ▪ ML code ▪ Data Verification ▪ Analysis Tools ▪ Machine Resource Management ▪ Process Management Tools ▪ Serving Infrastructure ▪ Monitoring
  7. What is a Feature Store
  8. What are features? ▪ Any kind of data used to train an ML model ▪ Feature types: ▪ State features: did the user have a coupon at the time? ▪ Aggregate features: average ticket price in the last 30 days for the user ▪ External features: was it raining at the time?
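     As a hedged illustration of an aggregate feature, the following Scala sketch computes the 30-day average ticket price per user in batch Spark (ordersDF and its user_id, order_ts and ticket columns are hypothetical):

        import org.apache.spark.sql.functions._

        // Hypothetical orders DataFrame with columns: user_id, order_ts (timestamp), ticket (price)
        val avgTicket30Days = ordersDF
          .filter(col("order_ts") >= date_sub(current_date(), 30))
          .groupBy(col("user_id"))
          .agg(avg("ticket").alias("AvgTicketPrice30Days"))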
  9. What is a feature store? ▪ The feature store is the central place in an organization to query for features ▪ Features are mostly used by machine learning algorithms ▪ They can also be useful for other applications ▪ For example, you could use the average ticket price for a user to show a high-end or low-end list of restaurants
  10. Feature store requirements ▪ General: ▪ Low latency (access & calculation) ▪ Access control ▪ Versioning ▪ Scalability ▪ Easy API for data access ▪ Machine Learning: ▪ Backfilling ▪ “Time travel”: snapshots of historical feature values
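     The “time travel” requirement maps naturally onto Delta Lake's own time travel; a minimal sketch, assuming a hypothetical table path for historical feature values:

        // Read the historical feature table as of a past point in time (Delta Lake time travel)
        val featuresAsOf = spark.read
          .format("delta")
          .option("timestampAsOf", "2020-01-01")
          .load("/feature-store/historic")   // hypothetical path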
  11. How iFood built its Feature Store
  12. iFood Software Architecture: streaming as a first-class citizen. [Diagram: microservices (Orders, Payments, Fleet location, Sessions, Coupons, Notifications) publish real-time events to a central bus, which feeds the Real-time Data Lake, the Feature Store and the Aggregation Service.]
  13. iFood Real-time Data Lake Architecture ▪ Kafka storage is expensive ▪ Retention is limited ▪ Full event history enables recalculation and backfilling for features ▪ Delta tables provide a cheap storage option ▪ Delta tables can double as either batch or streaming sources. [Diagram: real-time events from the central bus are ingested by Data Lake streaming jobs into a Data Lake streaming Delta table.]
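     A minimal sketch of that ingestion pattern, assuming a hypothetical Kafka topic and storage paths: read the central bus with Structured Streaming and persist the raw events to a Delta table that later jobs can consume as either a batch or a streaming source:

        // Ingest real-time events from the central Kafka bus into a Data Lake Delta table
        val rawEvents = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "kafka:9092")   // hypothetical brokers
          .option("subscribe", "events")                     // hypothetical topic
          .load()

        rawEvents
          .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value", "timestamp")
          .writeStream
          .format("delta")
          .option("checkpointLocation", "/checkpoints/data-lake-events")  // hypothetical path
          .start("/data-lake/events")                                     // hypothetical path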
  14. iFood Feature Store Architecture. [Diagram: the Data Lake streaming Delta table feeds the aggregation jobs and the historic backfilling jobs, with feature metadata kept in DynamoDB; their output goes to the Kafka bus, from which a real-time materialization job writes to the real-time Redis storage and a historic materialization job writes to the historic Delta table storage.]
  15. iFood Feature Store Architecture: the aggregation jobs. [Same architecture diagram as the previous slide, highlighting the aggregation jobs that read from the Data Lake streaming Delta table.]
  16. iFood Feature Store Architecture: the aggregation jobs ▪ Features are usually combinations of: ▪ Source (e.g. the orders stream) ▪ Window range (e.g. last 30 days) ▪ Grouping key (e.g. by each user) ▪ Value (e.g. ticket price) ▪ Filter (e.g. during lunch) ▪ Aggregation type (e.g. average)
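     One way to model such a combination in code is a small specification type; this is a hedged sketch with hypothetical names, not iFood's actual API:

        // Hypothetical feature specification mirroring the combination above
        case class FeatureSpec(
          source: String,          // e.g. "orders"
          windowRange: String,     // e.g. "30 days"
          groupingKey: String,     // e.g. "user_id"
          value: String,           // e.g. "ticket"
          filter: Option[String],  // e.g. Some("is_lunch")
          aggregation: String)     // e.g. "avg"

        val avgLunchTicket30Days = FeatureSpec(
          source = "orders",
          windowRange = "30 days",
          groupingKey = "user_id",
          value = "ticket",
          filter = Some("is_lunch"),
          aggregation = "avg")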
  17. iFood Feature Store Architecture: the aggregation jobs ▪ With Spark Streaming, you can only execute one group-by operation per dataframe/job ▪ Each combination of grouping key and window range results in a new dataframe ▪ That means increased costs and operational complexity

        ordersStreamDF
          .groupBy(col("user_id"), window(col("order_ts"), "1 day"))
          .agg(sum("ticket"))

        ordersStreamDF
          .groupBy(col("user_id"), window(col("order_ts"), "3 days"))
          .agg(sum("ticket"))

        ordersStreamDF
          .groupBy(col("user_id"), window(col("order_ts"), "7 days"))
          .agg(sum("ticket"))
  18. iFood Feature Store Architecture: the aggregation jobs ▪ We store the intermediate state for several aggregation types over a fixed, smaller window ▪ We then combine the results to emit results for several window sizes at once ▪ This also allows us to use the same code and the same job to calculate historical and real-time features
  19. iFood Feature Store Architecture: the aggregation jobs. Two-step aggregation logic, illustrated with an orders streaming source. 1-day windows: D-6 = 1, D-5 = 2, D-4 = 3, D-3 = 0, D-2 = 1, D-1 = 1, D-0 = 2. Combining those 1-day partials yields 3-day windows (D-6 to D-4 = 6, D-5 to D-3 = 5, D-4 to D-2 = 4, D-3 to D-1 = 2, D-2 to D-0 = 4) and a 7-day window (D-6 to D-0 = 10).
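     A hedged, batch-style sketch of the same two-step idea (ordersDF and its column names are hypothetical): first aggregate into fixed 1-day buckets, then roll those partials up into a larger window:

        import org.apache.spark.sql.functions._

        // Step 1: per-user, per-day partial counts (the fixed smaller window)
        val dailyCounts = ordersDF
          .groupBy(col("user_id"), window(col("order_ts"), "1 day").alias("day"))
          .agg(count(lit(1)).alias("n_orders"))

        // Step 2: combine the 1-day partials into a 7-day window per user
        val weeklyCounts = dailyCounts
          .groupBy(col("user_id"), window(col("day.start"), "7 days").alias("week"))
          .agg(sum("n_orders").alias("NOrders7Days"))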
  20. iFood Feature Store Architecture: the aggregation jobs ▪ How to express that? ▪ flatMapGroupsWithState ▪ Flexibility on storing state and expressing calculation logic ▪ That allows us to combine dozens of jobs into one (pseudocode):

        def combineAggregations(
            sourceDF: DataFrame,
            groupByKeys: Seq[String],
            windowStep: Long,
            combinationRules: Seq[CombinationRule]): DataFrame = {
          putStateAndOutputPlaceholdersToFitCombinedSchema(sourceDF)
            .groupByKey(row => combineGroupKeys())
            .flatMapGroupsWithState((state, miniBatchIterator) => {
              // First step: fold each incoming row into the fixed-step intermediate state
              miniBatchIterator.foreach(row => {
                if (inputWindowEnd() > newestOutputWindowEnd()) {
                  moveStateRangeForward()
                }
                if (inputRowIsInStateRange()) {
                  firstStepUpdateIntermediateValue()
                }
              })
              // Second step: combine the intermediate values into one result per window size
              combinationRules.foreach(combinationRule => {
                secondStepCalculateFinalResultBasedOnIntermediateValues()
              })
              yieldAnOutputRowBasedOnTheResults()
            })
        }
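     Since the slide code is pseudocode, here is a minimal, self-contained sketch of the flatMapGroupsWithState API itself, keeping a running order count and ticket total per user; all names are hypothetical and this is not iFood's implementation:

        import org.apache.spark.sql.SparkSession
        import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

        case class Order(userId: String, ticket: Double)
        case class UserTotals(userId: String, nOrders: Long, totalTicket: Double)

        // Fold each micro-batch of a user's orders into that user's persistent state
        def updateTotals(
            userId: String,
            orders: Iterator[Order],
            state: GroupState[UserTotals]): Iterator[UserTotals] = {
          val current = state.getOption.getOrElse(UserTotals(userId, 0L, 0.0))
          val batch = orders.toSeq
          val updated = UserTotals(
            userId,
            current.nOrders + batch.size,
            current.totalTicket + batch.map(_.ticket).sum)
          state.update(updated)
          Iterator(updated)
        }

        val spark = SparkSession.builder.getOrCreate()
        import spark.implicits._

        // ordersStream is assumed to be a streaming Dataset[Order] read from the bus
        val totals = ordersStream
          .groupByKey(_.userId)
          .flatMapGroupsWithState(OutputMode.Update, GroupStateTimeout.NoTimeout)(updateTotals)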
  21. iFood Feature Store Architecture: the aggregation jobs. Input order rows (Order ID, Customer ID, Date, ...) for customer 1 on 2020-01-01 are emitted as one output row per feature, keyed by entity and date: Customer 1, 2020-01-01, NOrders1Day = 2; Customer 1, 2020-01-01, NOrders3Days = 6; Customer 1, 2020-01-01, NOrders7Days = 10.
  22. iFood Feature Store Architecture: the materialization jobs. [Same architecture diagram, highlighting the materialization jobs that consume the Kafka bus and write to the real-time Redis storage and the historic Delta table storage.]
  23. iFood Feature Store Architecture: the materialization jobs ▪ Feature update commands are stored to a Kafka topic (think CDC or log tailing): update feature F for entity E at row R with value V ▪ On the Delta table storage, we use MERGE INTO and the map_concat function to stay flexible. Worked example: an update (Customer 1, 2020-01-01, AvgTicketPrice30Days = 25.8) creates the row Customer 1, 2020-01-01 with features map {AvgTicketPrice30Days -> 25.8}; a second update (Customer 2, 2020-02-01, NOrders30Days = 17) adds the row Customer 2, 2020-02-01 with {NOrders30Days -> 17}; a third update (Customer 1, 2020-01-01, NOrders30Days = 3) is merged into the existing row, giving {AvgTicketPrice30Days -> 25.8, NOrders30Days -> 3}.
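     A hedged sketch of that merge pattern as Spark SQL over Delta (table and column names are hypothetical, not iFood's schema):

        // Upsert feature update commands into the historic table, merging feature maps per row
        spark.sql("""
          MERGE INTO feature_store_historic AS target
          USING feature_updates AS source
            ON  target.entity    = source.entity
            AND target.entity_id = source.entity_id
            AND target.date      = source.date
          WHEN MATCHED THEN
            UPDATE SET target.features = map_concat(target.features, source.features)
          WHEN NOT MATCHED THEN
            INSERT (entity, entity_id, date, features)
            VALUES (source.entity, source.entity_id, source.date, source.features)
        """)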
  24. iFood Feature Store Architecture: the materialization jobs ▪ Consumers are free to materialize the features to their database of choice ▪ For ML, we use: ▪ A Delta table for historic feature values ▪ A Redis cluster for low-latency real-time access
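     A hedged sketch of the real-time materialization side using foreachBatch (the topic, checkpoint path and writeToRedis helper are hypothetical; the talk does not name a Redis client library):

        import org.apache.spark.sql.DataFrame

        // Consume feature update commands from the Kafka bus
        val updates = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "kafka:9092")   // hypothetical brokers
          .option("subscribe", "feature-updates")            // hypothetical topic
          .load()

        // Named function avoids the foreachBatch overload ambiguity in Scala 2.12
        def materialize(batch: DataFrame, batchId: Long): Unit = {
          if (!batch.isEmpty) {     // foreachBatch can fire on an empty DataFrame (see slide 28)
            writeToRedis(batch)     // hypothetical helper wrapping a Redis client
          }
        }

        updates.writeStream
          .foreachBatch(materialize _)
          .option("checkpointLocation", "/checkpoints/realtime-materialization")  // hypothetical path
          .start()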
  25. iFood Feature Store Architecture: the backfilling jobs. [Same architecture diagram, highlighting the historic backfilling jobs, which rely on the feature metadata stored in DynamoDB.]
  26. iFood Feature Store Architecture: the backfilling jobs ▪ How do we calculate features for streaming data registered before the feature was created? ▪ Use a metadata database to store the creation time of each feature ▪ Run a backfilling job to create feature values up to the feature creation time ▪ Start the streaming job to emit results from values that arrive after the creation date
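     A hedged sketch of the backfill/streaming split, assuming the feature creation timestamp has already been fetched from the metadata store and the raw events live in a Delta table at a hypothetical path:

        import java.sql.Timestamp
        import org.apache.spark.sql.functions.col

        // Creation time of the new feature, as stored in the metadata database (hypothetical value)
        val featureCreatedAt = Timestamp.valueOf("2020-01-01 00:00:00")

        // Backfilling job: batch-read the full event history up to the feature creation time
        val backfillInput = spark.read
          .format("delta")
          .load("/data-lake/events")                      // hypothetical path
          .filter(col("event_ts") < featureCreatedAt)     // hypothetical event-time column

        // Streaming job: continue from the creation time onwards, using the same Delta table as a stream
        val streamingInput = spark.readStream
          .format("delta")
          .option("startingTimestamp", "2020-01-01 00:00:00")
          .load("/data-lake/events")

     Note that startingTimestamp selects Delta commits by commit time rather than event time, so the streaming side may still need an event-time filter to line up exactly with the backfill boundary.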
  27. Lessons learned & Best practices ▪ Delta tables double as streaming or batch sources ▪ OPTIMIZE is a must for streaming jobs saving to a Delta table ▪ Either on auto mode, or as a separate process ▪ When starting a brand-new job from a streaming Delta table source, the file reading order is not guaranteed ▪ This is even more noticeable after running OPTIMIZE (which you should!) ▪ If the event processing order is important for your job, either use Trigger.Once to process the first historical batch (see the sketch below), or process each partition sequentially in order
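     A hedged sketch of those two practices on Databricks (paths are hypothetical): compacting the streaming output with OPTIMIZE, and draining the historical backlog with Trigger.Once before leaving a continuous query running:

        import org.apache.spark.sql.streaming.Trigger

        // Compact the many small files written by the streaming job (or enable auto-optimize)
        spark.sql("OPTIMIZE delta.`/feature-store/historic`")   // hypothetical path

        // Process the existing history of the Delta source as one controlled batch, then stop
        val firstRun = spark.readStream
          .format("delta")
          .load("/data-lake/events")                                      // hypothetical path
          .writeStream
          .format("delta")
          .option("checkpointLocation", "/checkpoints/aggregation-job")   // hypothetical path
          .trigger(Trigger.Once())
          .start("/feature-store/historic")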
  28. Lessons learned & Best practices ▪ flatMapGroupsWithState is really powerful ▪ State management should be handled with care ▪ foreachBatch is really powerful ▪ Please note it can be triggered on an empty DataFrame, though ▪ Be sure to use correct partition pruning when using the MERGE INTO operation (illustrated below) ▪ Be careful with parameter changes between job restarts ▪ StreamTest really helps with unit tests, debugging and raising the bar
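     As a hedged illustration of the partition-pruning point, one common approach is to inline the partitions touched by the current batch as literal predicates in the MERGE condition (building on the MERGE INTO sketch after slide 23; updatesDF and the date partition column are hypothetical):

        import org.apache.spark.sql.functions.col

        // Collect the date partitions touched by this batch and turn them into literal predicates,
        // so the MERGE can prune untouched partitions instead of scanning the whole target table
        val touchedDates = updatesDF
          .select(col("date").cast("string"))
          .distinct()
          .collect()
          .map(_.getString(0))
        val partitionPredicate = touchedDates.map(d => s"target.date = '$d'").mkString("(", " OR ", ")")

        // ...then AND partitionPredicate into the ON clause of the MERGE INTO statement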
  29. Positive outcomes ▪ Unified codebase for historical and real-time features: 50% less code ▪ Unified jobs for historical and real-time features: from dozens of jobs to around 10 ▪ Huge batch ETL jobs are replaced by much smaller streaming clusters ▪ Though they run 24/7 ▪ Delta tables allow for isolation between read and write operations
  30. Feedback. Your feedback is important to us. Don’t forget to rate and review the sessions.
