Data Con LA 2022 - Pre-Recorded - Simplifying AI/ML using Databricks Feature Store

Debu Sinha, Sr Specialist Solutions Architect - AI/ML at Databricks
AI/ML/Data Science
1. What are feature stores? 2. Why are they important? 3. Using Databricks and its Feature Store offering to streamline ML. This holds true for small companies as well. How we frame our approach to AI initiatives will determine their success. Don't worry, I am not a zealot. I will not tell you that AI and ML are a cure-all that will solve all your problems. Some tasks are particularly well suited to these techniques, but not all. What I love about them is that they allow us to tackle difficult problems that might otherwise be too daunting.

  1. Simplifying AI/ML using Databricks Feature Store
  2. Debu Sinha
     • Senior Specialist Solutions Architect at Databricks focused on AI and ML.
     • Senior Solutions Architect at Lifion by ADP.
     • Tech co-founder of Throtle Onboarding, which focuses on identity graph management in the ad-tech space.
     • Sr Software Engineer at V12 Group, Bank of America focused on Big Data and ML initiatives.
     • MS in Computer Science from Johns Hopkins University. Thesis on Machine Translation.
     LinkedIn: https://www.linkedin.com/in/debusinha
     Blog: https://medium.com/@debusinha2009
     Email: debusinhaa2009@gmail.com
  3. Agenda
     • Understanding a typical ML development lifecycle.
     • Understanding features.
     • Motivation for Feature Stores.
     • Discovering Feature Store implementation in Databricks.
     • Demo.
  4. Understanding a typical ML development lifecycle.
  5. Understanding Features
     Diagram labels: Feature, Raw Data.
     Dates    | Product ID | Quantity | Customer ID | Store ID | Total Order Amount
     8/8/2022 | 0001       | 2        | 98092       | 01       | 8.00
     …        | …          | …        | …           | …        | …
     Feature Engineering
     ● Data transformation (skew treatment).
     ● Data augmentation (season, time of the year, weather).
     ● Data aggregation (total sales over the past 7 days, 30 days, etc.).
  6. Diagram labels: Features, Prediction, Model.
  7. Motivation for Feature Stores
     Diagram labels: Raw Data; Featurization (Joins, Aggregates, Transforms, etc.); csv; csv; Training; Serving; Client.
     • No reuse of features.
     • Online/offline skew.
     • Duplication of feature engineering logic.
  8. Feature Stores: a central repository for your curated features
     Diagram labels: Raw Data; Feature Engineering; Feature Store; Model Training; Model Inference.
     • Eliminates duplication of feature engineering logic.
     • Eliminates online/offline skew.
  9. Diagram labels: Raw Data; Featurization (Joins, Aggregates, Transforms, etc.); Training; Serving; Client; Feature Store (Feature Registry; Feature Provider: batch, online).
  10. Discovering Feature Store implementation in Databricks.
      Diagram labels: Feature Store; Feature Registry; Feature Provider: Batch (high throughput), Online (low latency).
      Feature Provider
      ▪ Batch and online access to features
      ▪ Feature lookup packaged with models
      ▪ Simplified deployment process
      Feature Registry
      ▪ Discoverability and reusability
      ▪ Versioning
      ▪ Upstream and downstream lineage
      Co-designed with [logo not captured]
      ▪ Open format
      ▪ ACID transactions, schema enforcement
      ▪ Unified batch and streaming API
      ▪ Indexing, file skipping, compaction, caching
      ▪ Built-in data versioning and governance
      ▪ Native access through PySpark, SQL, etc.
      Co-designed with [logo not captured]
      ▪ Open model format that supports all ML frameworks
      ▪ Feature version and lookup logic hermetically logged with the model
  11. Demo
      • GitHub: https://github.com/debu-sinha/databricks-feature-store
      • Databricks Trial: https://databricks.com/try-databricks
      • Databricks Official Feature Store page: https://www.databricks.com/product/feature-store
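
Slide 5 names data aggregation (total sales over the past 7 and 30 days) as one kind of feature engineering. The sketch below shows roughly what that step could look like in plain Python. The sample rows beyond the one on the slide, the function name `total_sales_features`, and the feature names `total_sales_7d` / `total_sales_30d` are assumptions made for illustration; they are not taken from the talk or from any Databricks API.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw order rows shaped like the table on slide 5:
# (order date, product ID, quantity, customer ID, store ID, total order amount).
# Only the first row appears on the slide; the rest are made up.
RAW_ORDERS = [
    (date(2022, 8, 8), "0001", 2, "98092", "01", 8.00),
    (date(2022, 8, 5), "0002", 1, "98092", "01", 12.50),
    (date(2022, 7, 20), "0001", 3, "98092", "01", 12.00),
    (date(2022, 8, 7), "0003", 1, "11111", "02", 5.00),
]


def total_sales_features(orders, as_of, windows=(7, 30)):
    """Aggregate total order amount per customer over trailing windows.

    A window of N days covers orders placed within the N days up to and
    including `as_of`. Orders dated after `as_of` are ignored so that a
    feature never uses information from the future.
    """
    features = defaultdict(lambda: {f"total_sales_{w}d": 0.0 for w in windows})
    for order_date, _product, _qty, customer_id, _store, amount in orders:
        age_days = (as_of - order_date).days
        if age_days < 0:
            continue  # order is newer than the as-of date
        for w in windows:
            if age_days < w:
                features[customer_id][f"total_sales_{w}d"] += amount
    return dict(features)


features = total_sales_features(RAW_ORDERS, as_of=date(2022, 8, 8))
for customer_id, row in sorted(features.items()):
    print(customer_id, row)
```

Each output row is keyed by customer ID, which is the shape a feature table typically takes: one primary key plus a set of derived columns.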

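Slides 8 to 10 describe a feature store as a central repository with a registry (discoverability, reuse) and a provider that serves the same features in batch for training and online for inference. The toy class below is a minimal in-memory sketch of that idea. `ToyFeatureStore`, its method names, the table name `customer_sales`, and the stored values are all hypothetical; this is not the Databricks Feature Store API, only an illustration of why reading training and serving features from one place avoids online/offline skew.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FeatureTable:
    """Registry entry: metadata plus the stored feature rows."""
    name: str
    primary_key: str
    description: str = ""
    rows: Dict[str, Dict[str, float]] = field(default_factory=dict)


class ToyFeatureStore:
    """Hypothetical in-memory stand-in, NOT the Databricks Feature Store API."""

    def __init__(self) -> None:
        self._tables: Dict[str, FeatureTable] = {}

    def create_table(self, name: str, primary_key: str, description: str = "") -> None:
        # Registering a table makes it discoverable by name.
        if name in self._tables:
            raise ValueError(f"feature table {name!r} already exists")
        self._tables[name] = FeatureTable(name, primary_key, description)

    def write(self, name: str, rows: Dict[str, Dict[str, float]]) -> None:
        # Upsert: new keys are added, existing keys are overwritten.
        self._tables[name].rows.update(rows)

    def list_tables(self) -> List[str]:
        # Registry view: which curated feature tables exist.
        return sorted(self._tables)

    def read_batch(self, name: str, keys: List[str]) -> List[Dict[str, float]]:
        # Batch path (training): many keys per call.
        table = self._tables[name]
        return [dict(table.rows[k]) for k in keys]

    def read_online(self, name: str, key: str) -> Dict[str, float]:
        # Online path (serving): a single key per call.
        return dict(self._tables[name].rows[key])


store = ToyFeatureStore()
store.create_table(
    "customer_sales",
    primary_key="customer_id",
    description="Total sales per customer over trailing windows",
)
# Illustrative values only.
store.write("customer_sales", {
    "98092": {"total_sales_7d": 20.5, "total_sales_30d": 32.5},
    "11111": {"total_sales_7d": 5.0, "total_sales_30d": 5.0},
})

training_rows = store.read_batch("customer_sales", ["98092", "11111"])
serving_row = store.read_online("customer_sales", "98092")
print(store.list_tables())
print(training_rows[0] == serving_row)
```

Because both read paths return values written once by a single feature engineering job, the training and serving code never reimplement the transformation, which is the duplication problem slide 7 points to.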