Zipline - A Declarative Feature Engineering Framework

Zipline is Airbnb’s data management platform specifically designed for ML use cases. Previously, ML practitioners at Airbnb spent roughly 60% of their time on collecting and writing transformations for machine learning tasks.

  1. 1. Zipline: A Declarative Feature Engineering Framework. Nikhil Simha (nikhil.simha@airbnb.com)
  2. 2. The Problem [diagram]: the ML pipeline spans Exploration, Feature Creation, Model Training, Model Serving, Feature Serving, and the Application, handed off between Data Engineers, Data Scientists, and ML/Systems Engineers
  3. 3. “We recognize that a mature system might end up being (at most) 5% machine learning code and (at least) 95% glue code” – Sculley, NIPS 2015
  4. 4. Goal • Question the "glue code" • Imperative process -> Declarative specification • Months to days • With just the Data Scientist
  5. 5. Feature Engineering • 60-70% of practitioners' time • Good data with an okay/simple model
  6. 6. Feature Engineering
  7. 7. Context • Part of Bighead • Supervised learning • Structured data vs. unstructured data • Systems problem vs. math problem
  8. 8. What makes Feature Engineering Hard?
  9. 9. Everything changes • Features + Algorithm • Data • Continuously arriving
  10. 10. Your typical Data Warehouse
  11. 11. [Diagram] Components of a typical data warehouse: Service Fleet, Production Database, DB Snapshot, Event Log, Change Capture Stream, Event Stream, Change Capture Log, Message Bus, Data Lake, Live Derived Data, Media
  12. 12. An example ● Predict likelihood of you liking a particular Indian restaurant ● Total visits to Indian places last month ● Average rating of the restaurant last year ● They are all aggregations
  13. 13. An example ● Predict likelihood of you liking a particular Indian restaurant ● Total visits to Indian places last month: Operation: Count, Input: Visit, Window: 1 month, Source: Check-in stream ● Average rating of the restaurant: Operation: AVG, Input: rating, Window: 1 year, Source: Ratings table ● They are all aggregations
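The two aggregations above can be written down as a declarative spec. A minimal Python sketch, assuming a hypothetical `Feature` dataclass; the field names are illustrative, not Zipline's actual API:

```python
from dataclasses import dataclass

# Hypothetical declarative feature spec; names are assumptions,
# not Zipline's real API surface.
@dataclass(frozen=True)
class Feature:
    name: str          # output feature name
    operation: str     # aggregation operator: COUNT, AVG, SUM, MAX, ...
    input_column: str  # column being aggregated
    window: str        # trailing window, e.g. "1mo", "1yr"
    source: str        # stream or table supplying the input

features = [
    Feature("visits_indian_last_month", "COUNT", "visit", "1mo", "checkin_stream"),
    Feature("restaurant_rating_last_year", "AVG", "rating", "1yr", "ratings_table"),
]
```

The point of the declarative form is that the same spec can drive both the streaming serving path and the batch backfill path.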
  14. 14. Feature Set Example
  15. 15. Feature Set Example
  16. 16. Feature Set Example
  17. 17. Training data set = Aggregations + Temporal Join [diagram: feature columns F1, F2, F3 along a Time axis, joined against labels L at prediction points P1, P2]
  18. 18. Feature Serving for inference What is the value of these feature aggregates now?
  19. 19. Real-time features • Event log + Event Stream = Realtime – Features • DB Snapshots + Change data = Realtime-features
  20. 20. Feature Serving • Latency • Optimized for point queries • Freshness vs latency • Service Events and DB Mutations • Batch correction
  21. 21. Feature Computation for training What are the exact feature values at the points-of-interest in history?
  22. 22. Example • Query Log: (user, time) = (123, 2019-09-13 17:31), (234, 2019-09-14 17:40), (345, 2019-09-15 17:02) • Aggregated Features: (Visits Cnt / month, Rating Avg / year) = (5, 4), (20, 4), (6, 2)
  23. 23. Architecture [diagram components: Model Server, Feature Declaration, Streaming Updates, Batch Partial Aggregates, Feature Store, Feature Backfills, Model Training, Model, Feature Client, Labeling, Application Server]
  24. 24. Aggregation Math
  25. 25. Aggregations – SUM • Commutative: a + b = b + a • Associative: (a + b) + c = a + (b + c) • Reversible: (a + b) – a = b • Abelian Group
  26. 26. Aggregations – AVG • One not-so-clever trick • Operate on an "Intermediate Representation" (IR) • AVG factors into (sum, count) • Finalized by a single division: sum / count
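The (sum, count) IR described above can be sketched in a few lines of Python; the `avg_prepare` / `avg_merge` / `avg_finalize` names are illustrative, not Zipline's API:

```python
# AVG via an intermediate representation (sum, count).
# merge() is commutative and associative, so partial IRs can be
# combined in any order; finalize() performs the single division.

def avg_prepare(x):
    """Lift one input value into the IR."""
    return (x, 1)

def avg_merge(ir1, ir2):
    """Combine two partial IRs."""
    return (ir1[0] + ir2[0], ir1[1] + ir2[1])

def avg_finalize(ir):
    """Turn the IR into the final feature value."""
    s, c = ir
    return s / c if c else None

ratings = [4, 5, 3]
ir = (0, 0)  # identity IR
for r in ratings:
    ir = avg_merge(ir, avg_prepare(r))
# avg_finalize(ir) == 4.0
```

Because merging is order-independent, batch partial aggregates and streaming updates can both produce IRs that are merged at read time.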
  27. 27. Aggregations • Constant memory / bounded IR • Two classes of aggregations: • Sum, Avg, Count, etc.: Reversible / Abelian Groups • Min, Max, Approx Unique, most sketches, etc.: Non-Reversible / Commutative Monoids (Non-Groups)
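The two classes can be made concrete with two tiny Python aggregators; `SumAgg` and `MaxAgg` are illustrative names, not Zipline classes:

```python
# SUM forms an abelian group: it has a reverse operation.
class SumAgg:
    identity = 0
    @staticmethod
    def merge(a, b):
        return a + b
    @staticmethod
    def reverse(a, b):
        return a - b  # undo a previously merged b

# MAX is only a commutative monoid: merge exists, reverse does not.
class MaxAgg:
    identity = float("-inf")
    @staticmethod
    def merge(a, b):
        return max(a, b)
    # No reverse: knowing max(a, b) and b does not recover a.
```

This distinction is exactly why windowed SUM can slide incrementally (next slide) while windowed MAX needs the tree structure of slide 30.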
  28. 28. Incremental Windowing – with reversibility [diagram: a trailing one-year window over a user's check-in stream; each step adds the entering bucket and subtracts the expiring one (-1 / +0)]
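The add-the-new, subtract-the-expired idea can be sketched for a windowed SUM over per-period visit counts (illustrative code, not Zipline's):

```python
def sliding_sums(buckets, window):
    """Trailing-window sums over a list of per-period counts.

    Exploits reversibility: instead of re-summing the window each
    step, add the entering bucket and subtract the expiring one."""
    out, total = [], 0
    for i, x in enumerate(buckets):
        total += x                        # + entering bucket
        if i >= window:
            total -= buckets[i - window]  # - expiring bucket (the reversal)
        out.append(total)
    return out

# sliding_sums([1, 0, 2, 3], window=2) -> [1, 1, 2, 5]
```

Each step is O(1) regardless of window length, which is only possible because SUM has a reverse.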
  29. 29. Incremental Windowing – with reversibility [diagram: windowed Max rating over a Ratings table grouped by user, computed bucket by bucket]
  30. 30. Windowing – w/o reversibility • Time: O(N^2) vs O(N log N) • Space: N vs 2N memory • [table] Un-Windowed: Groups = No-Reversal, Non-Groups = No-Reversal; Windowed: Groups = Reversal, Non-Groups = Tree
  31. 31. Windowing – w/o reversibility • Tiling problem • Tile([left, right]) => Tile([left, split_point]) + Tile([split_point, right]) • split_point = right & (MAX_INT << msb(left ^ right)) • Tile sizes follow the binary representations of (right - split_point) and (split_point - left) • Less hand-waving in the paper
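A Python sketch of the split-point formula and the resulting dyadic tiling; it assumes half-open integer intervals [left, right) for simplicity, which is an assumption on top of the slide's notation:

```python
def split_point(left: int, right: int) -> int:
    # Highest bit where left and right differ picks the split:
    # right & (MAX_INT << msb(left ^ right)), as on the slide.
    msb = (left ^ right).bit_length() - 1
    return right & ~((1 << msb) - 1)

def tiles(left: int, right: int):
    """Cover [left, right) with power-of-two-aligned dyadic tiles.

    The tile sizes on each side of the split point follow the binary
    representations of (split_point - left) and (right - split_point)."""
    out = []
    lo = left
    while lo < right:
        size = lo & -lo  # largest alignment of lo (0 means unbounded)
        if size == 0:
            size = 1 << (right.bit_length() - 1)
        while lo + size > right:  # shrink until the tile fits
            size >>= 1
        out.append((lo, lo + size))
        lo += size
    return out

# tiles(5, 12) -> [(5, 6), (6, 8), (8, 12)], split_point(5, 12) == 8
```

Pre-aggregating each dyadic tile is what lets windowed non-group aggregations like MAX be answered in O(log N) merges instead of a full rescan.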
  32. 32. Reversibility - Unpacking Change data • Deletion is a reversal • Update is a delete followed by an insert • Example: • Sudden heat wave forecast at 7 pm.
  33. 33. Example • Query Log: (user, time) = (123, 2019-09-13 17:31), (234, 2019-09-14 17:40), (345, 2019-09-15 17:02) • Aggregated Features: (Visits Sum / month, Rating Max / year) = (5, 4), (20, 4), (6, 2)
  34. 34. Feature Backfill • Time-series join with aggregations • Left :: Query Log :: [(Entity Key, timestamp)] • Right :: Raw Data :: [(Entity Key, timestamp, unaggregated)] • Output :: Feature Data :: [(Entity Key, timestamp, aggregated)] • Aggregation and join are fused • Raw data is much larger than the query log (raw data >> query log)
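The join's semantics can be pinned down with a deliberately naive sketch, O(queries x events) per key; Zipline fuses the aggregation into the join rather than scanning per query like this (illustrative code):

```python
def backfill(query_log, raw_data, window, agg=sum):
    """Naive temporal join with aggregation.

    query_log: [(key, query_time)]
    raw_data:  [(key, event_time, value)]
    returns:   [(key, query_time, aggregate over the trailing window)]
    """
    out = []
    for key, qt in query_log:
        vals = [v for (k, et, v) in raw_data
                if k == key and qt - window <= et <= qt]
        out.append((key, qt, agg(vals)))
    return out

queries = [("u123", 100), ("u123", 200)]
events = [("u123", 90, 1), ("u123", 150, 1), ("u123", 199, 1)]
# backfill(queries, events, window=50)
#   -> [("u123", 100, 1), ("u123", 200, 2)]
```

The output shape matches the slide's spec exactly; everything after this slide is about computing the same answer without touching the raw data once per query.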
  35. 35. Tree Merge [diagram: query timestamps 0-15 as leaves of a binary interval tree with internal nodes 0-1, 2-3, 0-3, ..., 8-15, 0-15; an incoming event (ts, payload) is merged into the nodes covering its event span]
  36. 36. Feature Backfill – Topology: Query Log (key, query time) -> GroupBy -> Pivoted queries (key, [query time]) -> Broadcast against Raw Data (key, event time, payload) -> Tree merge -> Partial aggregate (key, [query time], aggregate) -> Flat map & re-key -> Partial aggregate ((key, query time), aggregate) -> Shuffle & merge -> Results (key, query time, aggregate)
  37. 37. Feature Backfill – Nuances • Time Skew • Event time vs ingestion time • Many sources of raw data at once • Un-skewed can be faster • More in paper
  38. 38. Feature Serving – lambda • Head = Streaming, Tail = Batch • Availability for batch correction • Reduced tail resolution [diagram: a 30-day window assembled from coarse batch partial aggregates plus a fine-grained streaming head]
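The lambda read path can be sketched as merging daily batch partial sums with a streaming head for the current day; the function and parameter names are illustrative assumptions:

```python
def serve_window(batch_daily, streaming_head, days=30):
    """Serve a trailing-window SUM lambda-style.

    batch_daily:    per-day partial sums for closed days, oldest first
                    (batch-corrected, coarse resolution)
    streaming_head: running sum for the current, still-open day
    """
    return sum(batch_daily[-days:]) + streaming_head

# 30 closed days of batch partials at 2/day, plus 5 so far today:
# serve_window([2] * 30, 5) -> 65
```

Keeping the tail in batch both bounds the streaming state and lets nightly jobs correct late or bad data, at the cost of coarser resolution in the tail.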
  39. 39. Links • 95%+ glue code: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf • 50%+ feature engineering: https://developers.google.com/machine-learning/data-prep/process
  40. 40. Questions
