Feature Store
• Redis, Cassandra, or MongoDB
• Online features are needed in real time, so they are stored in low-latency databases such as MongoDB, Cassandra, or Elasticsearch.
• Cassandra is a strong choice. It is designed for denormalized data storage (the same data is stored in different forms or variations, so the application gets exactly what it needs without further computation).
• Elasticsearch
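A minimal sketch of the denormalized, key-based access pattern these low-latency stores are chosen for. It uses a plain in-memory dictionary as a hypothetical stand-in for a store such as Redis; the class name and the `"<feature_view>:<entity_id>"` key layout are illustrative assumptions, not any product's API. With Redis, one would typically map each record to a hash and read it with `HSET`/`HGETALL`.

```python
from typing import Dict, Optional


class InMemoryOnlineStore:
    """Hypothetical stand-in for a low-latency key-value store such as Redis.

    Each entity's features are stored denormalized as one record, already in
    the shape the model needs, so a read is a single key lookup with no joins.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, float]] = {}

    @staticmethod
    def _key(feature_view: str, entity_id: str) -> str:
        # "<feature_view>:<entity_id>" is an illustrative key convention, not a standard.
        return f"{feature_view}:{entity_id}"

    def put(self, feature_view: str, entity_id: str, features: Dict[str, float]) -> None:
        # Overwrite: an online store usually keeps only the freshest values per entity.
        self._data[self._key(feature_view, entity_id)] = dict(features)

    def get(self, feature_view: str, entity_id: str) -> Optional[Dict[str, float]]:
        # Return a copy so callers cannot mutate the stored record; None if unknown.
        record = self._data.get(self._key(feature_view, entity_id))
        return dict(record) if record is not None else None


store = InMemoryOnlineStore()
store.put("user_stats", "u42", {"orders_7d": 3.0, "avg_basket": 27.5})
print(store.get("user_stats", "u42"))  # {'orders_7d': 3.0, 'avg_basket': 27.5}
```

The point of the design is that the write path does the work (shaping the record), so the read path stays a single lookup.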
Storage: Feature stores contain both online and offline storage. Offline storage holds all historical data transformed into features, kept in data lakes and data warehouses; Snowflake and BigQuery can be used for offline storage. Online storage holds only very recent data, mostly from streaming sources, and must serve it with very low latency. Redis can be used for online storage, with Kafka commonly used to stream fresh data into it.
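The split between the two storage layers can be sketched as follows, under simplifying assumptions: the offline store is an append-only list of timestamped rows (standing in for a warehouse table), and the online view keeps only the latest row per entity. `FeatureRow`, `OfflineStore`, and `materialize_latest` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class FeatureRow:
    entity_id: str
    event_time: datetime
    features: Dict[str, float]


class OfflineStore:
    """Append-only history, standing in for a warehouse table (e.g. Snowflake or BigQuery)."""

    def __init__(self) -> None:
        self.rows: List[FeatureRow] = []

    def append(self, row: FeatureRow) -> None:
        # Offline storage never overwrites: every historical value is kept.
        self.rows.append(row)


def materialize_latest(offline: OfflineStore) -> Dict[str, Dict[str, float]]:
    """Copy only the most recent row per entity into an online (key-value) view."""
    latest: Dict[str, FeatureRow] = {}
    for row in offline.rows:
        current = latest.get(row.entity_id)
        if current is None or row.event_time > current.event_time:
            latest[row.entity_id] = row
    return {entity_id: dict(row.features) for entity_id, row in latest.items()}


offline = OfflineStore()
offline.append(FeatureRow("u1", datetime(2024, 1, 1), {"orders_7d": 1.0}))
offline.append(FeatureRow("u1", datetime(2024, 1, 8), {"orders_7d": 4.0}))
offline.append(FeatureRow("u2", datetime(2024, 1, 5), {"orders_7d": 2.0}))
online = materialize_latest(offline)
print(online)
```

Training reads the full history from the offline side, while serving reads the small, latest-only online view.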
Transformation: Features for a machine learning model are generated by data pipelines, and the feature store acts as the orchestrator for these pipelines. Features are recomputed at a specified time interval, reusing the same transformation logic on every run.
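A minimal sketch of "define the transformation once, rerun it on an interval". The feature definitions (order count, total spend, average order value) and the helper names are hypothetical; a real orchestrator would trigger the run from a scheduler rather than an explicit `is_due` check.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def compute_user_features(order_amounts: List[float]) -> Dict[str, float]:
    """Hypothetical transformation, defined once and reused by every pipeline run."""
    count = len(order_amounts)
    total = sum(order_amounts)
    return {
        "order_count": float(count),
        "total_spend": float(total),
        "avg_order_value": total / count if count else 0.0,
    }


def is_due(last_run: Optional[datetime], now: datetime, interval: timedelta) -> bool:
    """True if features have never been computed or the interval has elapsed."""
    return last_run is None or now - last_run >= interval


def recompute_if_due(
    raw: Dict[str, List[float]],
    last_run: Optional[datetime],
    now: datetime,
    interval: timedelta,
) -> Optional[Dict[str, Dict[str, float]]]:
    """Re-run the same transformation for every entity when a refresh is due."""
    if not is_due(last_run, now, interval):
        return None
    return {entity_id: compute_user_features(amounts) for entity_id, amounts in raw.items()}


now = datetime(2024, 1, 1, 12, 0)
raw = {"u1": [10.0, 30.0], "u2": []}
print(recompute_if_due(raw, None, now, timedelta(hours=1)))
print(recompute_if_due(raw, now - timedelta(minutes=30), now, timedelta(hours=1)))  # None
```

Keeping the transformation as a single function is what lets the batch backfill and the periodic refresh produce identical features.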
Ingestion of Features:
The Feature Store architecture consists of ingestion and consumption mechanisms. Ingestion is the process of collecting raw data, transforming it into the required features, and storing them in a storage solution. There are two types of ingestion: batch and streaming.
Batch Processing Ingestion: Batch processing is used when a large volume of data arrives on a schedule, for example once a day, twice an hour, or once a week. Because the data arrives in bulk, it is typically landed in Amazon S3, a database, HDFS, a data warehouse, or a data lake. Spark handles such bulk data well and can write each entity ID and its features into the Feature Store.
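The shape of a batch ingestion job can be sketched in plain Python: read one scheduled bulk drop, aggregate per entity, and write one row per entity ID into the store. This is a stand-in for a Spark job (in PySpark the aggregation would roughly correspond to `df.groupBy("user_id").agg(F.count("*"), F.sum("amount"))`); the CSV columns `user_id` and `amount` and the feature names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict


def batch_ingest(csv_text: str) -> Dict[str, Dict[str, float]]:
    """Aggregate one bulk drop of raw transactions into per-entity features.

    Plain-Python stand-in for a Spark groupBy/agg job; the column names
    "user_id" and "amount" are hypothetical.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for record in csv.DictReader(io.StringIO(csv_text)):
        entity_id = record["user_id"]
        totals[entity_id] += float(record["amount"])
        counts[entity_id] += 1
    # One row per entity ID: this is what gets written into the feature store.
    return {
        entity_id: {"txn_count": float(counts[entity_id]), "txn_total": totals[entity_id]}
        for entity_id in counts
    }


daily_drop = "user_id,amount\nu1,10.5\nu2,3\nu1,4.5\n"
feature_store: Dict[str, Dict[str, float]] = {}
feature_store.update(batch_ingest(daily_drop))
print(feature_store)
```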
Streaming Ingestion: Streaming data arrives in real time, continuously and without a fixed schedule, which makes Kafka a natural candidate for streaming ingestion. The incoming data may be captured as log files or obtained through API calls.
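Unlike a batch job, streaming ingestion updates features incrementally as each event arrives. The sketch below assumes JSON-encoded events with hypothetical `user_id` and `ts` fields and treats a plain list of byte strings as a stand-in for a Kafka topic; in practice a consumer loop (for example, iterating a kafka-python `KafkaConsumer` and passing each message's `value`) would feed the same function.

```python
import json
from typing import Dict, Iterable


def stream_ingest(messages: Iterable[bytes], online: Dict[str, Dict[str, float]]) -> None:
    """Update online features incrementally, one event at a time.

    `messages` stands in for a Kafka topic; the JSON payload fields
    "user_id" and "ts" are hypothetical.
    """
    for raw in messages:
        event = json.loads(raw)
        feats = online.setdefault(event["user_id"], {"click_count": 0.0, "last_seen_ts": 0.0})
        feats["click_count"] += 1
        # Keep the latest timestamp even if events arrive slightly out of order.
        feats["last_seen_ts"] = max(feats["last_seen_ts"], float(event["ts"]))


online_store: Dict[str, Dict[str, float]] = {}
stream_ingest(
    [b'{"user_id": "u1", "ts": 100}', b'{"user_id": "u2", "ts": 101}', b'{"user_id": "u1", "ts": 99}'],
    online_store,
)
print(online_store)
```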
Consumption of Features:
Consumption is the process of reading the stored features efficiently. There are two types of consumption: model training and model serving.
Model Training: Here we select only a subset of the available features, but retrieve them for all entities. Features may be consumed this way for experimentation or for production. For experimentation, Google Colab or Jupyter notebooks are typical; for production, Spark, TensorFlow, or PyTorch.
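The training access pattern ("few features, all entities") amounts to a column selection over every row. A minimal sketch, assuming the offline features fit in a dictionary keyed by entity ID; the function name and the `default` fill value for missing features are hypothetical choices.

```python
from typing import Dict, List, Tuple


def training_matrix(
    offline: Dict[str, Dict[str, float]],
    feature_names: List[str],
    default: float = 0.0,
) -> Tuple[List[str], List[List[float]]]:
    """Select a subset of feature columns for every entity in the offline store."""
    entity_ids = sorted(offline)  # all entities, in a stable order
    rows = [
        [offline[eid].get(name, default) for name in feature_names]  # only the requested columns
        for eid in entity_ids
    ]
    return entity_ids, rows


offline = {
    "u2": {"orders_7d": 1.0, "avg_basket": 12.0, "days_active": 30.0},
    "u1": {"orders_7d": 4.0, "avg_basket": 20.0, "days_active": 3.0},
}
ids, X = training_matrix(offline, ["orders_7d", "days_active"])
print(ids, X)
```

The resulting row-major matrix can then be handed to whichever training framework is in use.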
Model Serving: Here features are read from the Feature Store through an API call, and the output is sent to a web or mobile application. Only the entity matching the received entity ID is fetched. The main requirement is very low latency.
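The serving path can be sketched as a request handler: parse a JSON body naming one entity ID and the wanted features, do a single lookup, and return JSON. The request/response shape, the handler name, and the `ONLINE_STORE` contents are hypothetical; a real deployment would sit behind an HTTP framework and a networked online store.

```python
import json
from typing import Dict

# Hypothetical online store contents, keyed by entity ID.
ONLINE_STORE: Dict[str, Dict[str, float]] = {
    "u1": {"orders_7d": 3.0, "avg_basket": 27.5},
}


def handle_feature_request(body: str, store: Dict[str, Dict[str, float]] = ONLINE_STORE) -> str:
    """Serve features for the single entity named in a JSON request body."""
    request = json.loads(body)
    entity_id = request["entity_id"]
    record = store.get(entity_id, {})  # one key lookup keeps latency low
    # Unknown entities or features come back as null rather than raising.
    features = {name: record.get(name) for name in request["features"]}
    return json.dumps({"entity_id": entity_id, "features": features})


print(handle_feature_request('{"entity_id": "u1", "features": ["orders_7d", "churn_score"]}'))
```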
A Feature Store usually consists of five components: registry, monitoring, serving, storage, and transformation.
Registry: The registry, also called the metadata store, records information such as which features exist for each entity. This is useful when a developer from another team needs to know which features are available for a particular entity: querying the registry by entity returns the features defined for it.
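A registry stores feature definitions (metadata), not feature values. A minimal sketch, assuming a feature is described only by a name, a type string, and a description; `FeatureDefinition` and `FeatureRegistry` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FeatureDefinition:
    name: str
    dtype: str
    description: str = ""


class FeatureRegistry:
    """Hypothetical metadata store: which features are defined for which entity."""

    def __init__(self) -> None:
        self._by_entity: Dict[str, List[FeatureDefinition]] = {}

    def register(self, entity: str, feature: FeatureDefinition) -> None:
        self._by_entity.setdefault(entity, []).append(feature)

    def features_for(self, entity: str) -> List[str]:
        # A developer from another team can discover what already exists before rebuilding it.
        return [f.name for f in self._by_entity.get(entity, [])]


registry = FeatureRegistry()
registry.register("user", FeatureDefinition("orders_7d", "float", "Orders in the last 7 days"))
registry.register("user", FeatureDefinition("avg_basket", "float", "Mean order value"))
print(registry.features_for("user"))  # ['orders_7d', 'avg_basket']
```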
Monitoring: Monitoring is a relatively recent addition to feature stores. The monitor can raise alerts on pipeline failures or on decay in data quality. Alerts can be configured to be sent by email, which helps with timely recovery and management of data.
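A data-quality check of the kind described can be sketched as two simple tests per feature: the share of missing values and the relative shift of the mean against a baseline. The thresholds (5% nulls, 20% mean shift) are illustrative assumptions, and the `alert` callback stands in for email delivery (for example, via `smtplib`).

```python
from statistics import mean
from typing import Callable, List, Optional


def check_feature_quality(
    name: str,
    values: List[Optional[float]],
    baseline_mean: float,
    max_null_rate: float = 0.05,
    max_relative_shift: float = 0.2,
    alert: Callable[[str], None] = print,
) -> List[str]:
    """Flag missing data and mean drift for one feature; thresholds are illustrative."""
    problems: List[str] = []
    if not values:
        problems.append(f"{name}: no values received (possible pipeline failure)")
    else:
        null_rate = sum(v is None for v in values) / len(values)
        if null_rate > max_null_rate:
            problems.append(f"{name}: null rate {null_rate:.0%} exceeds {max_null_rate:.0%}")
        present = [v for v in values if v is not None]
        if present and baseline_mean != 0:
            shift = abs(mean(present) - baseline_mean) / abs(baseline_mean)
            if shift > max_relative_shift:
                problems.append(f"{name}: mean shifted {shift:.0%} from baseline")
    for problem in problems:
        alert(problem)  # stand-in for sending an email
    return problems


sent: List[str] = []
check_feature_quality("avg_basket", [10.0, 11.0, None, 30.0, 32.0], baseline_mean=10.0, alert=sent.append)
print(sent)
```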
Serving: This is the part of the Feature Store that serves features for training and inference. For training, an SDK is usually provided to interact with the Feature Store. For inference, the Feature Store returns the features of a single entity per request.
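The two serving modes can be sketched as two methods on a hypothetical SDK client: a bulk read from offline storage for training and a single-entity read from online storage for inference. The class and method names are assumptions for illustration; real feature stores expose their own calls (Feast, for instance, has `get_historical_features` and `get_online_features`) with different signatures. Defaulting missing inference features to `0.0` is also an assumption.

```python
from typing import Dict, List


class FeatureStoreClient:
    """Hypothetical SDK: bulk reads for training, single-entity reads for inference."""

    def __init__(
        self,
        offline: Dict[str, Dict[str, float]],
        online: Dict[str, Dict[str, float]],
    ) -> None:
        self._offline = offline  # historical features, for training
        self._online = online    # latest features, for inference

    def training_features(
        self, entity_ids: List[str], names: List[str]
    ) -> Dict[str, Dict[str, float]]:
        # Many entities at once from offline storage; throughput matters more than latency.
        return {eid: {n: self._offline[eid][n] for n in names} for eid in entity_ids}

    def inference_features(self, entity_id: str, names: List[str]) -> Dict[str, float]:
        # One entity per request from online storage; missing features default to 0.0.
        record = self._online.get(entity_id, {})
        return {n: record.get(n, 0.0) for n in names}


client = FeatureStoreClient(
    offline={"u1": {"orders_7d": 4.0}, "u2": {"orders_7d": 1.0}},
    online={"u1": {"orders_7d": 5.0}},
)
print(client.training_features(["u1", "u2"], ["orders_7d"]))
print(client.inference_features("u1", ["orders_7d"]))
```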
IDE.pptx