IDE.pptx

Feature Store
• Redis, Cassandra or MongoDB
• Online features are needed in real time, so they are
stored in low-latency databases such as MongoDB,
Cassandra, or Elasticsearch.
• Cassandra is a strong choice. It is well suited to
denormalized data storage (the same data is stored in
several forms or variations, so that your
application gets exactly what it needs without
further computation).
• Elasticsearch
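
The denormalization idea in the Cassandra bullet can be illustrated with a minimal sketch. It uses plain Python dictionaries rather than a real database; the event tuples and the function name build_denormalized_views are hypothetical and serve only as illustration.

```python
from collections import defaultdict

# Hypothetical raw events: (user_id, merchant_id, amount).
EVENTS = [
    ("u1", "m1", 20.0),
    ("u1", "m2", 5.0),
    ("u2", "m1", 12.5),
]


def build_denormalized_views(events):
    """Write every event into two query-shaped views.

    The same data is stored twice, once keyed by user and once keyed by
    merchant, so each read is a single lookup with no further computation.
    """
    spend_by_user = defaultdict(float)
    spend_by_merchant = defaultdict(float)
    for user_id, merchant_id, amount in events:
        spend_by_user[user_id] += amount
        spend_by_merchant[merchant_id] += amount
    return dict(spend_by_user), dict(spend_by_merchant)


by_user, by_merchant = build_denormalized_views(EVENTS)
```

The cost of this layout is extra storage and extra writes; the benefit is that the read path stays trivial.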

Image Reference
Storage—Feature stores contain both online and offline storage. Offline storage holds all the historical data
transformed into features, kept in data lakes and data warehouses. Snowflake and BigQuery can be used for offline
storage. Online storage holds the most recent feature values, which come mostly from streaming data, and it must
have very low latency. Redis can be used for online storage, with Kafka commonly supplying the streaming data.
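
The split between the two storage tiers can be sketched as follows. This is an in-memory illustration only, assuming a hypothetical TwoTierStore class; a real deployment would back the two tiers with separate systems such as those named above.

```python
class TwoTierStore:
    """In-memory stand-in for a feature store with offline and online tiers."""

    def __init__(self):
        self.offline = []  # append-only history of every feature write
        self.online = {}   # entity_id -> (timestamp, latest features)

    def write(self, entity_id, timestamp, features):
        # Offline tier keeps every version for later training use.
        self.offline.append((entity_id, timestamp, dict(features)))
        # Online tier keeps only the most recent version per entity.
        current = self.online.get(entity_id)
        if current is None or timestamp >= current[0]:
            self.online[entity_id] = (timestamp, dict(features))

    def latest(self, entity_id):
        """Low-latency path: one dictionary lookup."""
        entry = self.online.get(entity_id)
        return None if entry is None else entry[1]

    def history(self, entity_id):
        """Offline path: scan the full history for one entity."""
        return [(ts, feats) for eid, ts, feats in self.offline if eid == entity_id]
```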
Transformation—Features for a machine learning model are generated through a data pipeline. The feature store
acts as an orchestrator for these pipelines. Features are recomputed at a specified time interval, and the same
transformation pipeline logic is reused for each recomputation.
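
A minimal sketch of interval-based recomputation with reusable transformation logic is shown here. The ScheduledFeature class and the average_order_value transform are hypothetical names; a real feature store would delegate scheduling to its own orchestrator.

```python
from datetime import datetime, timedelta


def average_order_value(order_amounts):
    """Transformation logic, written once and reused on every scheduled run."""
    if not order_amounts:
        return 0.0
    return sum(order_amounts) / len(order_amounts)


class ScheduledFeature:
    """Recompute a feature only when its interval has elapsed."""

    def __init__(self, transform, interval):
        self.transform = transform
        self.interval = interval
        self.last_run = None
        self.value = None

    def maybe_recompute(self, raw_data, now):
        """Run the transform if it is due; return True when a run happened."""
        due = self.last_run is None or now - self.last_run >= self.interval
        if due:
            self.value = self.transform(raw_data)
            self.last_run = now
        return due
```

The caller passes the current time explicitly, which keeps the sketch deterministic and easy to test.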
Ingestion of Features:
The Feature Store architecture consists of Ingestion and Consumption mechanisms. Ingestion is the process of
collecting raw data, engineering it into the required features, and storing them in a storage solution. There
are two types of ingestion: batch processing and streaming.
Batch Processing Ingestion—Batch processing is used when data arrives in bulk at a scheduled time. The
frequency can be something like once a day, twice an hour, or once a week. Because the data arrives in bulk, it
is typically stored in systems such as Amazon S3, HDFS, databases, data warehouses, or data lakes. Spark can be
used to process the bulk data and write the Entity ID and Features to the Feature Store.
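
The batch path can be sketched in plain Python, which here stands in for a Spark job. The row layout, the feature names txn_count and total_amount, and the function ingest_batch are assumptions made for illustration.

```python
from collections import defaultdict


def ingest_batch(raw_rows, feature_store):
    """Group one batch of raw rows by entity ID and write a feature row per entity."""
    amounts = defaultdict(list)
    for row in raw_rows:
        amounts[row["entity_id"]].append(row["amount"])
    for entity_id, values in amounts.items():
        feature_store[entity_id] = {
            "txn_count": len(values),
            "total_amount": sum(values),
        }
    return feature_store


# Hypothetical daily batch, as it might be read from a file in bulk storage.
daily_batch = [
    {"entity_id": "u1", "amount": 20.0},
    {"entity_id": "u2", "amount": 12.5},
    {"entity_id": "u1", "amount": 5.0},
]
batch_features = ingest_batch(daily_batch, {})
```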
Streaming Ingestion—Streaming ingestion handles real-time data, which arrives continuously rather than on a
known schedule. Kafka is a common choice for streaming ingestion. The data may be stored as log files, or it can
be obtained through API calls.
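
The streaming path can be sketched with a Python generator standing in for a Kafka consumer. The event layout and the names fake_event_stream and ingest_stream are hypothetical; the point is that features are updated one event at a time instead of in bulk.

```python
def fake_event_stream():
    """Stand-in for a message consumer: yields events one at a time."""
    yield {"entity_id": "u1", "amount": 10.0}
    yield {"entity_id": "u2", "amount": 4.0}
    yield {"entity_id": "u1", "amount": 20.0}


def ingest_stream(events, online_store):
    """Update per-entity features incrementally as each event arrives."""
    for event in events:
        state = online_store.setdefault(
            event["entity_id"], {"event_count": 0, "mean_amount": 0.0}
        )
        state["event_count"] += 1
        # Incremental mean: no need to keep the full event history in memory.
        state["mean_amount"] += (
            event["amount"] - state["mean_amount"]
        ) / state["event_count"]
    return online_store


stream_features = ingest_stream(fake_event_stream(), {})
```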
Consumption of Features:
Consumption is the process of consuming the stored features in an efficient manner. The types of consumption are
model training and model serving.
Model Training—In this case we select only a subset of the available features, but we select them for all
entities. We might consume data for experimentation or production in this method. For experimentation, we use
Google Colab or Jupyter notebooks, and for production, we use Spark, TensorFlow, or PyTorch.
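
The training access pattern (a subset of features, but every entity) can be sketched as follows. The table contents and the function build_training_set are hypothetical.

```python
# Hypothetical offline feature table: entity_id -> all stored features.
FEATURE_TABLE = {
    "u1": {"age": 34, "txn_count": 12, "avg_amount": 18.5},
    "u2": {"age": 27, "txn_count": 3, "avg_amount": 42.0},
}


def build_training_set(feature_table, feature_names):
    """Return one row per entity, restricted to the requested feature columns."""
    return [
        {"entity_id": entity_id, **{name: row[name] for name in feature_names}}
        for entity_id, row in sorted(feature_table.items())
    ]


training_rows = build_training_set(FEATURE_TABLE, ["txn_count", "avg_amount"])
```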
Model Serving—In this case, we consume features from the Feature Store using an API call. The output is sent to
a web or mobile application. Only the entities matching the received entity IDs are fetched. The main
requirement of this method is very low latency.
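
The serving access pattern is the reverse of training: a few entities, looked up by ID. A minimal sketch, assuming a hypothetical in-memory online store and a hypothetical get_online_features function in place of a real API endpoint:

```python
# Hypothetical online store: entity_id -> latest feature values.
ONLINE_FEATURES = {
    "u1": {"txn_count": 12, "avg_amount": 18.5},
    "u2": {"txn_count": 3, "avg_amount": 42.0},
}


def get_online_features(online_store, entity_ids, feature_names):
    """Return the requested features for only the requested entities."""
    response = {}
    for entity_id in entity_ids:
        row = online_store.get(entity_id)
        # Unknown entities map to None so the caller can apply a default.
        response[entity_id] = (
            None if row is None else {name: row.get(name) for name in feature_names}
        )
    return response


serving_response = get_online_features(ONLINE_FEATURES, ["u2", "u9"], ["avg_amount"])
```

Each lookup is a single key access, which is what keeps this path fast.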

A Feature Store usually consists of Registry, Monitoring, Serving, Storage, and Transformation.
Registry—The registry, also called the metadata store, contains information such as which features are available for each entity. This is
useful when a developer from a different team needs to know which features are available for a particular entity. When the registry is
queried by Entity ID, the corresponding features are returned.
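
A registry can be sketched as a small metadata catalogue. The FeatureRegistry class and the metadata fields dtype and owner are assumptions for illustration, not the API of any particular feature store.

```python
class FeatureRegistry:
    """Minimal metadata store: which features exist for which entity."""

    def __init__(self):
        self._features = {}  # entity name -> {feature name -> metadata}

    def register(self, entity, feature_name, dtype, owner):
        """Record a feature and its metadata under the given entity."""
        self._features.setdefault(entity, {})[feature_name] = {
            "dtype": dtype,
            "owner": owner,
        }

    def list_features(self, entity):
        """Return the feature names available for an entity, sorted by name."""
        return sorted(self._features.get(entity, {}))

    def describe(self, entity, feature_name):
        """Return the stored metadata for one feature."""
        return self._features[entity][feature_name]
```

A developer on another team could then call list_features("user") to discover what already exists before building a new pipeline.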
Monitoring—Monitoring is a newer capability of the Feature Store. The monitor can raise alerts on pipeline failures or on decay in data quality. Alerts
can be configured to be sent by email, which helps in the timely recovery and management of data.
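
One possible data quality check is sketched below. The missing-value threshold, the alert wording, and the function check_data_quality are all assumptions; delivering the alert by email is left out.

```python
def check_data_quality(rows, feature_name, max_null_fraction=0.1):
    """Return an alert message if too many values are missing, otherwise None."""
    if not rows:
        return f"ALERT: no rows received for '{feature_name}'"
    null_count = sum(1 for row in rows if row.get(feature_name) is None)
    null_fraction = null_count / len(rows)
    if null_fraction > max_null_fraction:
        return f"ALERT: {null_fraction:.0%} of '{feature_name}' values are missing"
    return None
```

A scheduler could run this check after every ingestion and forward any non-None result to a notification channel.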
Serving—This is the part of the Feature Store that serves features for training and inference. For training, SDKs are usually
provided to interact with the Feature Store. For inference, the Feature Store returns the features of a single entity per request.
