SlideShare a Scribd company logo
Feature Engineering in
Machine Learning
Tanishka Garg
Mayura Zadane
08th April 2022
Agenda
1. What is a Feature? What is feature Engineering?
2. What is its importance and why it is used?
3. Main processes of Feature Engineering
4. Feature Engineering Techniques
● Imputation
● Handling Outliers
● Transformations
● Encoding
● Scaling (Normalization & Standardization)
● Binning
What is a Feature? What is Feature Engineering?
● Generally, all machine learning algorithms take input data to generate the output. The input data remains in a tabular form consisting
of rows (instances or observations) and columns (variable or attributes), and these attributes are often known as features. For
example, an image is an instance in computer vision, but a line in the image could be the feature. Similarly, in NLP, a document can
be an observation, and the word count could be the feature. So, we can say a feature is an attribute that impacts a problem or is
useful for the problem.
● The features you use influence more than everything else then the result. No algorithm alone, to my knowledge, can
supplement the information gain given by correct feature engineering.
What is a Feature?
● Feature engineering is the pre-processing step of machine learning, which extracts features from raw data.
● It helps to represent an underlying problem to predictive models in a better way, which as a result, improve the
accuracy of the model for unseen data.
● The predictive model contains predictor variables and an outcome variable, and while the feature engineering
process selects the most useful predictor variables for the model.
What is Feature Engineering?
What is its importance and why it is used?
What is its importance and why it is used?
Feature engineering in machine learning improves the model's performance. Below are some points that explain the need for feature engineering:
● Better features mean flexibility.
In machine learning, we always try to choose the optimal model to get good results. However, sometimes after choosing the wrong model, still, we
can get better predictions, and this is because of better features. The flexibility in features will enable you to select the less complex models.
Because less complex models are faster to run, easier to understand and maintain, which is always desirable.
● Better features mean simpler models.
If we input the well-engineered features to our model, then even after selecting the wrong parameters (Not much optimal), we can have good
outcomes. After feature engineering, it is not necessary to do hard for picking the right model with the most optimized parameters. If we have
good features, we can better represent the complete data and use it to best characterize the given problem.
● Better features mean better results.
As already discussed, in machine learning, as data we will provide will get the same output. So, to obtain better results, we must need to use better
features.
Main processes of Feature Engineering
Main processes of Feature Engineering
The steps of feature engineering may vary as per different data scientists and ML engineers. However, there are some common steps that are involved
in most machine learning algorithms, and these steps are as follows:
● Data Preparation: The first step is data preparation. In this step, raw data acquired from different resources are prepared to make it in a suitable
format so that it can be used in the ML model. The data preparation may contain cleaning of data, delivery, data augmentation, fusion, ingestion,
or loading.
● Exploratory Analysis: Exploratory analysis or Exploratory data analysis (EDA) is an important step of features engineering, which is mainly
used by data scientists. This step involves analysis, investing data set, and summarization of the main characteristics of data. Different data
visualization techniques are used to better understand the manipulation of data sources, to find the most appropriate statistical technique for data
analysis, and to select the best features for the data.
● Benchmark: Benchmarking is a process of setting a standard baseline for accuracy to compare all the variables from this baseline. The
benchmarking process is used to improve the predictability of the model and reduce the error rate.
Feature Engineering Techniques
Feature Engineering Techniques
1. Imputation
Feature engineering deals with inappropriate data, missing values, human interruption, general errors, insufficient data sources, etc. Missing values within the
dataset highly affect the performance of the algorithm, and to deal with them "Imputation" technique is used. Imputation is responsible for handling
irregularities within the dataset.
For example, removing the missing values from the complete row or complete column by a huge percentage of missing values. But at the same time, to
maintain the data size & to prevent loss of information, it is required to impute the missing data, which can be done as:
● For numerical data imputation, a default value can be imputed in a column, and missing values can be filled with means or medians of the columns.
● For categorical data imputation, missing values can be interchanged with the maximum occurred value in a column.
2. Handling Outliers
Outliers are the deviated values or data points that are observed too away from other data points in such a way that they badly affect the performance of the
model. Outliers can be handled with this feature engineering technique. This technique first identifies the outliers and then remove them out.
Standard deviation can be used to identify the outliers. For example, each value within a space has a definite to an average distance, but if a value is greater
distant than a certain value, it can be considered as an outlier.
Feature Engineering Techniques
3. Log transform
Logarithm transformation or log transform is one of the commonly used mathematical techniques in machine learning. Log transform helps in handling the skewed data,
and it makes the distribution more approximate to normal after transformation. It also reduces the effects of outliers on the data, as because of the normalization of
magnitude differences, a model becomes much robust.
4. Encoding
One hot encoding is the popular encoding technique in machine learning. It is a technique that converts the categorical data in a form so that they can be easily understood
by machine learning algorithms and hence can make a good prediction. It enables group the of categorical data without losing any information.
5. Scaling
In most cases, the numerical features of the dataset do not have a certain range and they differ from each other. In real life, it is nonsense to expect age and income columns
to have the same range. But from the machine learning point of view, how these two columns can be compared?
Scaling solves this problem. The continuous features become identical in terms of the range, after a scaling process.
Any Questions??
Reference :
1. https://www.repath.in/gallery/feature_engineering_for_machine_learning.pdf
2. https://www.analyticssteps.com/blogs/feature-engineering-method-machine- learning
Thank You!

More Related Content

What's hot

Feature Selection in Machine Learning
Feature Selection in Machine LearningFeature Selection in Machine Learning
Feature Selection in Machine Learning
Upekha Vandebona
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Shrey Malik
 
Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and Boosting
Mohit Rajput
 
The fundamentals of Machine Learning
The fundamentals of Machine LearningThe fundamentals of Machine Learning
The fundamentals of Machine Learning
Hichem Felouat
 
Autoencoders
AutoencodersAutoencoders
Autoencoders
CloudxLab
 
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hakky St
 
Machine learning with ADA Boost
Machine learning with ADA BoostMachine learning with ADA Boost
Machine learning with ADA Boost
Aman Patel
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluation
eShikshak
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
Knoldus Inc.
 
Building a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to ZBuilding a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to Z
Charles Vestur
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
Archana Swaminathan
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning
Mohammad Junaid Khan
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learningbutest
 
Anomaly detection with machine learning at scale
Anomaly detection with machine learning at scaleAnomaly detection with machine learning at scale
Anomaly detection with machine learning at scale
Impetus Technologies
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingankur bhalla
 
Scaling and Normalization
Scaling and NormalizationScaling and Normalization
Scaling and Normalization
Kush Kulshrestha
 
Anomaly detection Workshop slides
Anomaly detection Workshop slidesAnomaly detection Workshop slides
Anomaly detection Workshop slides
QuantUniversity
 
Classification and Regression
Classification and RegressionClassification and Regression
Classification and Regression
Megha Sharma
 
Machine learning session4(linear regression)
Machine learning   session4(linear regression)Machine learning   session4(linear regression)
Machine learning session4(linear regression)
Abhimanyu Dwivedi
 

What's hot (20)

Feature Selection in Machine Learning
Feature Selection in Machine LearningFeature Selection in Machine Learning
Feature Selection in Machine Learning
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and Boosting
 
The fundamentals of Machine Learning
The fundamentals of Machine LearningThe fundamentals of Machine Learning
The fundamentals of Machine Learning
 
Autoencoders
AutoencodersAutoencoders
Autoencoders
 
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
 
Machine learning with ADA Boost
Machine learning with ADA BoostMachine learning with ADA Boost
Machine learning with ADA Boost
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluation
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
 
Building a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to ZBuilding a performing Machine Learning model from A to Z
Building a performing Machine Learning model from A to Z
 
Clustering in Data Mining
Clustering in Data MiningClustering in Data Mining
Clustering in Data Mining
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learning
 
Anomaly detection with machine learning at scale
Anomaly detection with machine learning at scaleAnomaly detection with machine learning at scale
Anomaly detection with machine learning at scale
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Scaling and Normalization
Scaling and NormalizationScaling and Normalization
Scaling and Normalization
 
Decision tree
Decision treeDecision tree
Decision tree
 
Anomaly detection Workshop slides
Anomaly detection Workshop slidesAnomaly detection Workshop slides
Anomaly detection Workshop slides
 
Classification and Regression
Classification and RegressionClassification and Regression
Classification and Regression
 
Machine learning session4(linear regression)
Machine learning   session4(linear regression)Machine learning   session4(linear regression)
Machine learning session4(linear regression)
 

Similar to Feature Engineering in Machine Learning

Deep Learning Vocabulary.docx
Deep Learning Vocabulary.docxDeep Learning Vocabulary.docx
Deep Learning Vocabulary.docx
jaffarbikat
 
ML-Unit-4.pdf
ML-Unit-4.pdfML-Unit-4.pdf
ML-Unit-4.pdf
AnushaSharma81
 
laptop price prediction presentation
laptop price prediction presentationlaptop price prediction presentation
laptop price prediction presentation
NeerajNishad4
 
Feature engineering
Feature engineeringFeature engineering
Feature engineering
SaurabhWani6
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptx
Dr.Shweta
 
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET Journal
 
Lecture-1-Introduction to Deep learning.pptx
Lecture-1-Introduction to Deep learning.pptxLecture-1-Introduction to Deep learning.pptx
Lecture-1-Introduction to Deep learning.pptx
JayChauhan100
 
Data Cleaning and Preprocessing: Ensuring Data Quality
Data Cleaning and Preprocessing: Ensuring Data QualityData Cleaning and Preprocessing: Ensuring Data Quality
Data Cleaning and Preprocessing: Ensuring Data Quality
priyanka rajput
 
IRJET - Comparative Analysis of GUI based Prediction of Parkinson Disease usi...
IRJET - Comparative Analysis of GUI based Prediction of Parkinson Disease usi...IRJET - Comparative Analysis of GUI based Prediction of Parkinson Disease usi...
IRJET - Comparative Analysis of GUI based Prediction of Parkinson Disease usi...
IRJET Journal
 
Anwar kamal .pdf.pptx
Anwar kamal .pdf.pptxAnwar kamal .pdf.pptx
Anwar kamal .pdf.pptx
Luminous8
 
Data Structure Notes unit 1.docx
Data Structure Notes unit 1.docxData Structure Notes unit 1.docx
Data Structure Notes unit 1.docx
kp370932
 
Feature enginnering and selection
Feature enginnering and selectionFeature enginnering and selection
Feature enginnering and selection
Davis David
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
jagan477830
 
Machine Learning Data Life Cycle in Production (Week 2 feature engineering...
 Machine Learning Data Life Cycle in Production (Week 2   feature engineering... Machine Learning Data Life Cycle in Production (Week 2   feature engineering...
Machine Learning Data Life Cycle in Production (Week 2 feature engineering...
Ajay Taneja
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your need
GibDevs
 
Presentation 7.pptx
Presentation 7.pptxPresentation 7.pptx
Presentation 7.pptx
Shivam327815
 
FORMALIZATION & DATA ABSTRACTION DURING USE CASE MODELING IN OBJECT ORIENTED ...
FORMALIZATION & DATA ABSTRACTION DURING USE CASE MODELING IN OBJECT ORIENTED ...FORMALIZATION & DATA ABSTRACTION DURING USE CASE MODELING IN OBJECT ORIENTED ...
FORMALIZATION & DATA ABSTRACTION DURING USE CASE MODELING IN OBJECT ORIENTED ...
cscpconf
 
Formalization & data abstraction during use case modeling in object oriented ...
Formalization & data abstraction during use case modeling in object oriented ...Formalization & data abstraction during use case modeling in object oriented ...
Formalization & data abstraction during use case modeling in object oriented ...
csandit
 
Module-4_Part-II.pptx
Module-4_Part-II.pptxModule-4_Part-II.pptx
Module-4_Part-II.pptx
VaishaliBagewadikar
 
IRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data ScienceIRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET Journal
 

Similar to Feature Engineering in Machine Learning (20)

Deep Learning Vocabulary.docx
Deep Learning Vocabulary.docxDeep Learning Vocabulary.docx
Deep Learning Vocabulary.docx
 
ML-Unit-4.pdf
ML-Unit-4.pdfML-Unit-4.pdf
ML-Unit-4.pdf
 
laptop price prediction presentation
laptop price prediction presentationlaptop price prediction presentation
laptop price prediction presentation
 
Feature engineering
Feature engineeringFeature engineering
Feature engineering
 
introduction to Statistical Theory.pptx
 introduction to Statistical Theory.pptx introduction to Statistical Theory.pptx
introduction to Statistical Theory.pptx
 
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...
 
Lecture-1-Introduction to Deep learning.pptx
Lecture-1-Introduction to Deep learning.pptxLecture-1-Introduction to Deep learning.pptx
Lecture-1-Introduction to Deep learning.pptx
 
Data Cleaning and Preprocessing: Ensuring Data Quality
Data Cleaning and Preprocessing: Ensuring Data QualityData Cleaning and Preprocessing: Ensuring Data Quality
Data Cleaning and Preprocessing: Ensuring Data Quality
 
IRJET - Comparative Analysis of GUI based Prediction of Parkinson Disease usi...
IRJET - Comparative Analysis of GUI based Prediction of Parkinson Disease usi...IRJET - Comparative Analysis of GUI based Prediction of Parkinson Disease usi...
IRJET - Comparative Analysis of GUI based Prediction of Parkinson Disease usi...
 
Anwar kamal .pdf.pptx
Anwar kamal .pdf.pptxAnwar kamal .pdf.pptx
Anwar kamal .pdf.pptx
 
Data Structure Notes unit 1.docx
Data Structure Notes unit 1.docxData Structure Notes unit 1.docx
Data Structure Notes unit 1.docx
 
Feature enginnering and selection
Feature enginnering and selectionFeature enginnering and selection
Feature enginnering and selection
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
 
Machine Learning Data Life Cycle in Production (Week 2 feature engineering...
 Machine Learning Data Life Cycle in Production (Week 2   feature engineering... Machine Learning Data Life Cycle in Production (Week 2   feature engineering...
Machine Learning Data Life Cycle in Production (Week 2 feature engineering...
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your need
 
Presentation 7.pptx
Presentation 7.pptxPresentation 7.pptx
Presentation 7.pptx
 
FORMALIZATION & DATA ABSTRACTION DURING USE CASE MODELING IN OBJECT ORIENTED ...
FORMALIZATION & DATA ABSTRACTION DURING USE CASE MODELING IN OBJECT ORIENTED ...FORMALIZATION & DATA ABSTRACTION DURING USE CASE MODELING IN OBJECT ORIENTED ...
FORMALIZATION & DATA ABSTRACTION DURING USE CASE MODELING IN OBJECT ORIENTED ...
 
Formalization & data abstraction during use case modeling in object oriented ...
Formalization & data abstraction during use case modeling in object oriented ...Formalization & data abstraction during use case modeling in object oriented ...
Formalization & data abstraction during use case modeling in object oriented ...
 
Module-4_Part-II.pptx
Module-4_Part-II.pptxModule-4_Part-II.pptx
Module-4_Part-II.pptx
 
IRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data ScienceIRJET - An Overview of Machine Learning Algorithms for Data Science
IRJET - An Overview of Machine Learning Algorithms for Data Science
 

More from Knoldus Inc.

Using InfluxDB for real-time monitoring in Jmeter
Using InfluxDB for real-time monitoring in JmeterUsing InfluxDB for real-time monitoring in Jmeter
Using InfluxDB for real-time monitoring in Jmeter
Knoldus Inc.
 
Intoduction to KubeVela Presentation (DevOps)
Intoduction to KubeVela Presentation (DevOps)Intoduction to KubeVela Presentation (DevOps)
Intoduction to KubeVela Presentation (DevOps)
Knoldus Inc.
 
Stakeholder Management (Project Management) Presentation
Stakeholder Management (Project Management) PresentationStakeholder Management (Project Management) Presentation
Stakeholder Management (Project Management) Presentation
Knoldus Inc.
 
Introduction To Kaniko (DevOps) Presentation
Introduction To Kaniko (DevOps) PresentationIntroduction To Kaniko (DevOps) Presentation
Introduction To Kaniko (DevOps) Presentation
Knoldus Inc.
 
Efficient Test Environments with Infrastructure as Code (IaC)
Efficient Test Environments with Infrastructure as Code (IaC)Efficient Test Environments with Infrastructure as Code (IaC)
Efficient Test Environments with Infrastructure as Code (IaC)
Knoldus Inc.
 
Exploring Terramate DevOps (Presentation)
Exploring Terramate DevOps (Presentation)Exploring Terramate DevOps (Presentation)
Exploring Terramate DevOps (Presentation)
Knoldus Inc.
 
Clean Code in Test Automation Differentiating Between the Good and the Bad
Clean Code in Test Automation  Differentiating Between the Good and the BadClean Code in Test Automation  Differentiating Between the Good and the Bad
Clean Code in Test Automation Differentiating Between the Good and the Bad
Knoldus Inc.
 
Integrating AI Capabilities in Test Automation
Integrating AI Capabilities in Test AutomationIntegrating AI Capabilities in Test Automation
Integrating AI Capabilities in Test Automation
Knoldus Inc.
 
State Management with NGXS in Angular.pptx
State Management with NGXS in Angular.pptxState Management with NGXS in Angular.pptx
State Management with NGXS in Angular.pptx
Knoldus Inc.
 
Authentication in Svelte using cookies.pptx
Authentication in Svelte using cookies.pptxAuthentication in Svelte using cookies.pptx
Authentication in Svelte using cookies.pptx
Knoldus Inc.
 
OAuth2 Implementation Presentation (Java)
OAuth2 Implementation Presentation (Java)OAuth2 Implementation Presentation (Java)
OAuth2 Implementation Presentation (Java)
Knoldus Inc.
 
Supply chain security with Kubeclarity.pptx
Supply chain security with Kubeclarity.pptxSupply chain security with Kubeclarity.pptx
Supply chain security with Kubeclarity.pptx
Knoldus Inc.
 
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML ParsingMastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
Knoldus Inc.
 
Akka gRPC Essentials A Hands-On Introduction
Akka gRPC Essentials A Hands-On IntroductionAkka gRPC Essentials A Hands-On Introduction
Akka gRPC Essentials A Hands-On Introduction
Knoldus Inc.
 
Entity Core with Core Microservices.pptx
Entity Core with Core Microservices.pptxEntity Core with Core Microservices.pptx
Entity Core with Core Microservices.pptx
Knoldus Inc.
 
Introduction to Redis and its features.pptx
Introduction to Redis and its features.pptxIntroduction to Redis and its features.pptx
Introduction to Redis and its features.pptx
Knoldus Inc.
 
GraphQL with .NET Core Microservices.pdf
GraphQL with .NET Core Microservices.pdfGraphQL with .NET Core Microservices.pdf
GraphQL with .NET Core Microservices.pdf
Knoldus Inc.
 
NuGet Packages Presentation (DoT NeT).pptx
NuGet Packages Presentation (DoT NeT).pptxNuGet Packages Presentation (DoT NeT).pptx
NuGet Packages Presentation (DoT NeT).pptx
Knoldus Inc.
 
Data Quality in Test Automation Navigating the Path to Reliable Testing
Data Quality in Test Automation Navigating the Path to Reliable TestingData Quality in Test Automation Navigating the Path to Reliable Testing
Data Quality in Test Automation Navigating the Path to Reliable Testing
Knoldus Inc.
 
K8sGPTThe AI​ way to diagnose Kubernetes
K8sGPTThe AI​ way to diagnose KubernetesK8sGPTThe AI​ way to diagnose Kubernetes
K8sGPTThe AI​ way to diagnose Kubernetes
Knoldus Inc.
 

More from Knoldus Inc. (20)

Using InfluxDB for real-time monitoring in Jmeter
Using InfluxDB for real-time monitoring in JmeterUsing InfluxDB for real-time monitoring in Jmeter
Using InfluxDB for real-time monitoring in Jmeter
 
Intoduction to KubeVela Presentation (DevOps)
Intoduction to KubeVela Presentation (DevOps)Intoduction to KubeVela Presentation (DevOps)
Intoduction to KubeVela Presentation (DevOps)
 
Stakeholder Management (Project Management) Presentation
Stakeholder Management (Project Management) PresentationStakeholder Management (Project Management) Presentation
Stakeholder Management (Project Management) Presentation
 
Introduction To Kaniko (DevOps) Presentation
Introduction To Kaniko (DevOps) PresentationIntroduction To Kaniko (DevOps) Presentation
Introduction To Kaniko (DevOps) Presentation
 
Efficient Test Environments with Infrastructure as Code (IaC)
Efficient Test Environments with Infrastructure as Code (IaC)Efficient Test Environments with Infrastructure as Code (IaC)
Efficient Test Environments with Infrastructure as Code (IaC)
 
Exploring Terramate DevOps (Presentation)
Exploring Terramate DevOps (Presentation)Exploring Terramate DevOps (Presentation)
Exploring Terramate DevOps (Presentation)
 
Clean Code in Test Automation Differentiating Between the Good and the Bad
Clean Code in Test Automation  Differentiating Between the Good and the BadClean Code in Test Automation  Differentiating Between the Good and the Bad
Clean Code in Test Automation Differentiating Between the Good and the Bad
 
Integrating AI Capabilities in Test Automation
Integrating AI Capabilities in Test AutomationIntegrating AI Capabilities in Test Automation
Integrating AI Capabilities in Test Automation
 
State Management with NGXS in Angular.pptx
State Management with NGXS in Angular.pptxState Management with NGXS in Angular.pptx
State Management with NGXS in Angular.pptx
 
Authentication in Svelte using cookies.pptx
Authentication in Svelte using cookies.pptxAuthentication in Svelte using cookies.pptx
Authentication in Svelte using cookies.pptx
 
OAuth2 Implementation Presentation (Java)
OAuth2 Implementation Presentation (Java)OAuth2 Implementation Presentation (Java)
OAuth2 Implementation Presentation (Java)
 
Supply chain security with Kubeclarity.pptx
Supply chain security with Kubeclarity.pptxSupply chain security with Kubeclarity.pptx
Supply chain security with Kubeclarity.pptx
 
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML ParsingMastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
 
Akka gRPC Essentials A Hands-On Introduction
Akka gRPC Essentials A Hands-On IntroductionAkka gRPC Essentials A Hands-On Introduction
Akka gRPC Essentials A Hands-On Introduction
 
Entity Core with Core Microservices.pptx
Entity Core with Core Microservices.pptxEntity Core with Core Microservices.pptx
Entity Core with Core Microservices.pptx
 
Introduction to Redis and its features.pptx
Introduction to Redis and its features.pptxIntroduction to Redis and its features.pptx
Introduction to Redis and its features.pptx
 
GraphQL with .NET Core Microservices.pdf
GraphQL with .NET Core Microservices.pdfGraphQL with .NET Core Microservices.pdf
GraphQL with .NET Core Microservices.pdf
 
NuGet Packages Presentation (DoT NeT).pptx
NuGet Packages Presentation (DoT NeT).pptxNuGet Packages Presentation (DoT NeT).pptx
NuGet Packages Presentation (DoT NeT).pptx
 
Data Quality in Test Automation Navigating the Path to Reliable Testing
Data Quality in Test Automation Navigating the Path to Reliable TestingData Quality in Test Automation Navigating the Path to Reliable Testing
Data Quality in Test Automation Navigating the Path to Reliable Testing
 
K8sGPTThe AI​ way to diagnose Kubernetes
K8sGPTThe AI​ way to diagnose KubernetesK8sGPTThe AI​ way to diagnose Kubernetes
K8sGPTThe AI​ way to diagnose Kubernetes
 

Recently uploaded

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
Abida Shariff
 

Recently uploaded (20)

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 

Feature Engineering in Machine Learning

  • 1. Feature Engineering in Machine Learning Tanishka Garg Mayura Zadane 08th April 2022
  • 2. Agenda 1. What is a Feature? What is feature Engineering? 2. What is its importance and why it is used? 3. Main processes of Feature Engineering 4. Feature Engineering Techniques ● Imputation ● Handling Outliers ● Transformations ● Encoding ● Scaling (Normalization & Standardization) ● Binning
  • 3. What is a Feature? What is Feature Engineering?
  • 4. ● Generally, all machine learning algorithms take input data to generate the output. The input data remains in a tabular form consisting of rows (instances or observations) and columns (variable or attributes), and these attributes are often known as features. For example, an image is an instance in computer vision, but a line in the image could be the feature. Similarly, in NLP, a document can be an observation, and the word count could be the feature. So, we can say a feature is an attribute that impacts a problem or is useful for the problem. ● The features you use influence more than everything else then the result. No algorithm alone, to my knowledge, can supplement the information gain given by correct feature engineering. What is a Feature?
  • 5. ● Feature engineering is the pre-processing step of machine learning, which extracts features from raw data. ● It helps to represent an underlying problem to predictive models in a better way, which as a result, improve the accuracy of the model for unseen data. ● The predictive model contains predictor variables and an outcome variable, and while the feature engineering process selects the most useful predictor variables for the model. What is Feature Engineering?
  • 6. What is its importance and why it is used?
  • 7. What is its importance and why it is used? Feature engineering in machine learning improves the model's performance. Below are some points that explain the need for feature engineering: ● Better features mean flexibility. In machine learning, we always try to choose the optimal model to get good results. However, sometimes after choosing the wrong model, still, we can get better predictions, and this is because of better features. The flexibility in features will enable you to select the less complex models. Because less complex models are faster to run, easier to understand and maintain, which is always desirable. ● Better features mean simpler models. If we input the well-engineered features to our model, then even after selecting the wrong parameters (Not much optimal), we can have good outcomes. After feature engineering, it is not necessary to do hard for picking the right model with the most optimized parameters. If we have good features, we can better represent the complete data and use it to best characterize the given problem. ● Better features mean better results. As already discussed, in machine learning, as data we will provide will get the same output. So, to obtain better results, we must need to use better features.
  • 8. Main processes of Feature Engineering
  • 9. Main processes of Feature Engineering The steps of feature engineering may vary as per different data scientists and ML engineers. However, there are some common steps that are involved in most machine learning algorithms, and these steps are as follows: ● Data Preparation: The first step is data preparation. In this step, raw data acquired from different resources are prepared to make it in a suitable format so that it can be used in the ML model. The data preparation may contain cleaning of data, delivery, data augmentation, fusion, ingestion, or loading. ● Exploratory Analysis: Exploratory analysis or Exploratory data analysis (EDA) is an important step of features engineering, which is mainly used by data scientists. This step involves analysis, investing data set, and summarization of the main characteristics of data. Different data visualization techniques are used to better understand the manipulation of data sources, to find the most appropriate statistical technique for data analysis, and to select the best features for the data. ● Benchmark: Benchmarking is a process of setting a standard baseline for accuracy to compare all the variables from this baseline. The benchmarking process is used to improve the predictability of the model and reduce the error rate.
  • 11. Feature Engineering Techniques 1. Imputation Feature engineering deals with inappropriate data, missing values, human interruption, general errors, insufficient data sources, etc. Missing values within the dataset highly affect the performance of the algorithm, and to deal with them "Imputation" technique is used. Imputation is responsible for handling irregularities within the dataset. For example, removing the missing values from the complete row or complete column by a huge percentage of missing values. But at the same time, to maintain the data size & to prevent loss of information, it is required to impute the missing data, which can be done as: ● For numerical data imputation, a default value can be imputed in a column, and missing values can be filled with means or medians of the columns. ● For categorical data imputation, missing values can be interchanged with the maximum occurred value in a column. 2. Handling Outliers Outliers are the deviated values or data points that are observed too away from other data points in such a way that they badly affect the performance of the model. Outliers can be handled with this feature engineering technique. This technique first identifies the outliers and then remove them out. Standard deviation can be used to identify the outliers. For example, each value within a space has a definite to an average distance, but if a value is greater distant than a certain value, it can be considered as an outlier.
  • 12. Feature Engineering Techniques 3. Log transform Logarithm transformation or log transform is one of the commonly used mathematical techniques in machine learning. Log transform helps in handling the skewed data, and it makes the distribution more approximate to normal after transformation. It also reduces the effects of outliers on the data, as because of the normalization of magnitude differences, a model becomes much robust. 4. Encoding One hot encoding is the popular encoding technique in machine learning. It is a technique that converts the categorical data in a form so that they can be easily understood by machine learning algorithms and hence can make a good prediction. It enables group the of categorical data without losing any information. 5. Scaling In most cases, the numerical features of the dataset do not have a certain range and they differ from each other. In real life, it is nonsense to expect age and income columns to have the same range. But from the machine learning point of view, how these two columns can be compared? Scaling solves this problem. The continuous features become identical in terms of the range, after a scaling process.
  • 13. Any Questions?? Reference : 1. https://www.repath.in/gallery/feature_engineering_for_machine_learning.pdf 2. https://www.analyticssteps.com/blogs/feature-engineering-method-machine- learning