
Feature Store Overview: St. Louis Big Data IDEA Meetup, Aug 2020



  1. Feature Store Overview
     Adam Doyle
     St. Louis Big Data IDEA, August 2020
     Confidential and Proprietary to Daugherty Business Solutions
  2. The Data Science Process
  3. Features
     “A feature is an individual measurable property or characteristic of a phenomenon being observed… Feature data is used both as input to models during training and when models are served in production.”
     Source: https://docs.feast.dev/user-guide/features
     Key takeaways
     • Features are not data
     • Features enumerate information
     • Not all features are equal
  4. Feature Engineering
     Feature engineering is the process of extracting features from raw data.
     Feature engineering techniques
     • Imputation
     • Handling outliers
     • Binning
     • Numerical transform
     • One-hot encoding
     • Grouping
     • Extraction
     • Scaling
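A few of the techniques listed on this slide can be sketched in plain Python. This is a minimal illustration and is not taken from the talk: the helper names (`impute_median`, `bin_value`, `one_hot`, `min_max_scale`) and the sample `ages` data are hypothetical, and a real pipeline would more likely rely on a library such as pandas or scikit-learn.

```python
from statistics import median


def impute_median(values):
    """Imputation: replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def bin_value(value, edges, labels):
    """Binning: return the label of the first bin whose upper edge exceeds value.

    `labels` must have one more entry than `edges`; the last label is the
    catch-all for values at or above the final edge.
    """
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


def one_hot(value, categories):
    """One-hot encoding: 1 in the position of the matching category, else 0."""
    return [1 if value == c else 0 for c in categories]


def min_max_scale(values):
    """Scaling: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw data with one missing value.
ages = [25, None, 40, 61]
filled = impute_median(ages)
buckets = [bin_value(a, edges=[30, 60], labels=["young", "middle", "senior"])
           for a in filled]
scaled = min_max_scale(filled)
colour = one_hot("green", ["red", "green", "blue"])
```

Each helper turns raw values into something a model can consume directly, which is the sense in which the slide separates features from data.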
  5. Feature Challenges
     • Feature reuse between models
     • Consistent feature definitions
     • Latency / recency
     • Environmental variation
     • Unstable dependencies
     • Governance
     • Versioning
  6. (Architecture diagram; labels shown)
     • Feature Store API
     • Metadata / Model / Predictions
     • Offline Data Store
     • Online Data Store
     • Batch Engine
     • Stream Engine
     • Batch Prediction
     • Stream Prediction
  7. Feature Store Use Cases
     • Retrieve feature metadata
     • Retrieve feature values
     • Remove features
     • Store features
     • Stream store features
     • Stream retrieve features
     • Feature versioning
     • Model versioning
     • Record predictions
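To make some of these use cases concrete, here is a toy in-memory sketch covering store, retrieve, remove, metadata lookup, and feature versioning. Everything in it is an assumption for illustration: `InMemoryFeatureStore`, `FeatureMetadata`, the method names, and the sample feature are hypothetical and do not correspond to the API of Feast, Hopsworks, or any other product named in this deck. Streaming access, model versioning, and prediction recording are left out.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class FeatureMetadata:
    name: str
    version: int
    description: str


class InMemoryFeatureStore:
    """Hypothetical toy store keyed by (feature name, version)."""

    def __init__(self) -> None:
        self._metadata: Dict[Tuple[str, int], FeatureMetadata] = {}
        self._values: Dict[Tuple[str, int], Dict[str, Any]] = {}

    def register(self, name: str, description: str) -> int:
        """Feature versioning: create the next version and return its number."""
        version = 1 + max((v for n, v in self._metadata if n == name), default=0)
        self._metadata[(name, version)] = FeatureMetadata(name, version, description)
        self._values[(name, version)] = {}
        return version

    def store(self, name: str, version: int, entity_id: str, value: Any) -> None:
        """Store a feature value for one entity."""
        self._values[(name, version)][entity_id] = value

    def retrieve(self, name: str, version: int, entity_id: str) -> Any:
        """Retrieve a feature value; raises KeyError if it does not exist."""
        return self._values[(name, version)][entity_id]

    def metadata(self, name: str, version: int) -> FeatureMetadata:
        """Retrieve feature metadata."""
        return self._metadata[(name, version)]

    def remove(self, name: str, version: int) -> None:
        """Remove one version of a feature together with its values."""
        del self._metadata[(name, version)]
        del self._values[(name, version)]


store = InMemoryFeatureStore()
v1 = store.register("avg_order_value", "Mean order total per customer")
store.store("avg_order_value", v1, "customer-42", 31.5)
v2 = store.register("avg_order_value", "Mean order total, refunds excluded")
store.store("avg_order_value", v2, "customer-42", 29.0)
```

Keying values by version lets a model trained against version 1 keep reading the definition it was trained on after version 2 is published.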
  8. Data Pipeline
     • Data engineers interact with a feature store by creating data pipeline definitions.
     • Data pipeline definitions combine
       – Data sources
       – Business definitions
       – Transformation rules
       – Streaming/batch definitions
       – Scheduling
     • Data pipelines are executed by the feature store engines, and their results are stored in the online and offline data stores.
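One way to picture a pipeline definition is as a small record that bundles the five ingredients listed on this slide, plus an engine function that runs it. The sketch below is hypothetical throughout: `PipelineDefinition`, `run_batch`, the field names, the sample order data, and the cron-style `schedule` string are illustrative assumptions. The split between an append-only offline store and a latest-value online store is also an assumption, not something the slide specifies.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class PipelineDefinition:
    name: str
    source: Callable[[], Iterable[Row]]  # data source
    business_definition: str             # what the feature means
    transform: Callable[[Row], Row]      # transformation rule
    mode: str                            # "batch" or "stream"
    schedule: str                        # assumed cron-style schedule string


def run_batch(
    pipeline: PipelineDefinition,
    offline_store: Dict[str, List[Row]],
    online_store: Dict[Tuple[str, str], Row],
) -> int:
    """Run a batch pipeline once and write its output to both stores."""
    if pipeline.mode != "batch":
        raise ValueError(f"{pipeline.name} is not a batch pipeline")
    rows = [pipeline.transform(row) for row in pipeline.source()]
    # Offline store: keep every computed row (history for training).
    offline_store.setdefault(pipeline.name, []).extend(rows)
    # Online store: keep only the latest row per entity (low-latency serving).
    for row in rows:
        online_store[(pipeline.name, str(row["entity_id"]))] = row
    return len(rows)


def read_orders() -> List[Row]:
    # Hypothetical source; a real one would query a database or a file.
    return [
        {"order_id": "o1", "quantity": 2, "unit_price": 5.0},
        {"order_id": "o2", "quantity": 1, "unit_price": 12.5},
    ]


def order_total(row: Row) -> Row:
    return {"entity_id": row["order_id"],
            "order_total": row["quantity"] * row["unit_price"]}


pipeline = PipelineDefinition(
    name="order_total",
    source=read_orders,
    business_definition="Quantity multiplied by unit price for each order",
    transform=order_total,
    mode="batch",
    schedule="0 2 * * *",
)

offline: Dict[str, List[Row]] = {}
online: Dict[Tuple[str, str], Row] = {}
written = run_batch(pipeline, offline, online)
```

Because the definition is plain data plus two callables, the same record could in principle be handed to a stream engine by changing `mode`, which is the separation of definition and execution the slide describes.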
  9. Feature Registry
     • Data scientists interact with the feature store through the Feature Registry.
     • They can search for and browse feature definitions.
     • They can register data science models as a class of data pipeline.
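The search-and-browse behaviour described here can be sketched as a small catalogue of feature definitions. This is only an illustration: `FeatureRegistry`, `FeatureDefinition`, their fields, and the sample features are hypothetical, and registering models as pipelines is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    description: str
    owner: str
    tags: Tuple[str, ...] = ()


class FeatureRegistry:
    """Hypothetical catalogue that data scientists can browse and search."""

    def __init__(self) -> None:
        self._definitions: Dict[str, FeatureDefinition] = {}

    def register(self, definition: FeatureDefinition) -> None:
        if definition.name in self._definitions:
            raise ValueError(f"feature {definition.name!r} is already registered")
        self._definitions[definition.name] = definition

    def browse(self) -> List[FeatureDefinition]:
        """Return every definition, sorted by name."""
        return sorted(self._definitions.values(), key=lambda d: d.name)

    def search(self, keyword: str) -> List[FeatureDefinition]:
        """Case-insensitive match on name or description, or exact match on a tag."""
        needle = keyword.lower()
        return [
            d for d in self.browse()
            if needle in d.name.lower()
            or needle in d.description.lower()
            or any(needle == t.lower() for t in d.tags)
        ]


registry = FeatureRegistry()
registry.register(FeatureDefinition(
    "avg_order_value", "Mean order total per customer",
    owner="sales-analytics", tags=("orders", "customer")))
registry.register(FeatureDefinition(
    "days_since_signup", "Days elapsed since account creation",
    owner="growth", tags=("customer",)))
registry.register(FeatureDefinition(
    "order_count_30d", "Number of orders in the last 30 days",
    owner="sales-analytics", tags=("orders",)))

order_features = [d.name for d in registry.search("order")]
customer_features = [d.name for d in registry.search("customer")]
```

Rejecting duplicate names in `register` is one simple way to push teams toward reusing an existing definition instead of creating a near-copy.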
  10. Versioning and Monitoring
      • Feature stores can assist with versioning and monitoring data science applications.
      • Predictions are recorded through the feature store API, including the source data, the model used, the version of that model, and the rendered prediction.
      • Predictions can be compared with actual outcomes to determine the accuracy of the models.
      • Models and versions are tracked and can be used to determine the lift provided by a particular version of a model.
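The record-then-compare loop on this slide might look like the following sketch. The names `PredictionRecord` and `accuracy_by_version`, the churn example, and the sample data are all hypothetical. The slide does not define "lift", so it is taken here to mean the accuracy difference between two versions of the same model, which is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class PredictionRecord:
    prediction_id: str
    model: str
    version: int
    source_data: Dict[str, Any]  # the feature values the model saw
    prediction: Any              # the rendered prediction


def accuracy_by_version(
    records: List[PredictionRecord],
    actuals: Dict[str, Any],
) -> Dict[Tuple[str, int], float]:
    """Share of correct predictions per (model, version).

    Records whose actual outcome is not yet known are skipped.
    """
    hits: Dict[Tuple[str, int], int] = defaultdict(int)
    totals: Dict[Tuple[str, int], int] = defaultdict(int)
    for record in records:
        if record.prediction_id not in actuals:
            continue
        key = (record.model, record.version)
        totals[key] += 1
        hits[key] += int(record.prediction == actuals[record.prediction_id])
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical churn predictions from two versions of one model.
records = [
    PredictionRecord("p1", "churn", 1, {"orders_30d": 0}, True),
    PredictionRecord("p2", "churn", 1, {"orders_30d": 1}, False),
    PredictionRecord("p3", "churn", 2, {"orders_30d": 0}, True),
    PredictionRecord("p4", "churn", 2, {"orders_30d": 1}, True),
    PredictionRecord("p5", "churn", 2, {"orders_30d": 7}, False),  # outcome unknown
]
actuals = {"p1": True, "p2": True, "p3": True, "p4": True}

accuracy = accuracy_by_version(records, actuals)
# Assumed definition of lift: accuracy gain of version 2 over version 1.
lift = accuracy[("churn", 2)] - accuracy[("churn", 1)]
```

Storing the source data alongside each prediction is what makes this comparison reproducible later, since the features a model saw can otherwise drift or be overwritten.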
  11. Feature Store Implementations
      • Open source
        – Gojek/Google Feast
      • Product offerings
        – Logical Clocks Hopsworks
        – Scribble Enrich
      • Presentations only
        – Uber Michelangelo
        – Airbnb Zipline
        – SurveyMonkey ML Feature Store
        – Netflix Metaflow
  12. Links
      • http://featurestore.org/
      • https://www.scribbledata.io/resources-feature-store-guide
      • https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
      • https://towardsdatascience.com/feature-stores-components-of-a-data-science-factory-f0f1f73d39b8
      • https://towardsdatascience.com/what-are-feature-stores-and-why-are-they-critical-for-scaling-data-science-3f9156f7ab4
      • https://www.logicalclocks.com/hopsworks-featurestore
      • https://eng.uber.com/michelangelo-machine-learning-platform/
      • https://technology.condenast.com/story/accelerating-machine-learning-with-the-feature-store-service
      • https://cloud.google.com/blog/products/ai-machine-learning/introducing-feast-an-open-source-feature-store-for-machine-learning
      • https://databricks.com/session/zipline-airbnbs-machine-learning-data-management-platform
      • https://engineering.linkedin.com/blog/2017/06/building-the-activity-graph--part-i
      • https://databricks.com/session/fact-store-scale-for-netflix-recommendations
      • https://medium.com/@changshe/rethinking-feature-stores-74963c2596f0
