P1WU
UNIT – III: CLASSIFICATION
Topic 8: FEATURE SELECTION OR DIMENSIONALITY REDUCTION
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
UNIT III : TEXT CLASSIFICATION AND CLUSTERING
1. A Characterization of Text Classification
2. Unsupervised Algorithms: Clustering
3. Naïve Text Classification
4. Supervised Algorithms
5. Decision Tree
6. k-NN Classifier
7. SVM Classifier
8. Feature Selection or Dimensionality Reduction
9. Evaluation metrics
10. Accuracy and Error
11. Organizing the classes
12. Indexing and Searching
13. Inverted Indexes
14. Sequential Searching
15. Multi-dimensional Indexing
FEATURE SELECTION OR DIMENSIONALITY REDUCTION
• Feature selection and dimensionality reduction allow us to minimize the number of features in a dataset by keeping only the features that are important.
• In other words, we want to retain the features that carry the information our model needs to make accurate predictions, while discarding redundant features that contain little to no information.
• Performing feature selection and dimensionality reduction has several benefits, including:
• better model interpretability,
• less overfitting, and
• a smaller training set and, consequently, shorter training time.
Dimensionality Reduction
• The number of input variables or features for a dataset is referred to
as its dimensionality.
• Dimensionality reduction refers to techniques that reduce the number
of input variables in a dataset.
• More input features often make a predictive modeling task more challenging to model; this is generally referred to as the curse of dimensionality.
• High-dimensionality statistics and dimensionality reduction techniques are often used for data visualization.
• Nevertheless, these techniques can be used in applied machine learning to simplify a classification or regression dataset in order to better fit a predictive model.
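For example, a manifold learning method such as t-SNE is commonly used to project data down to two dimensions for plotting. A minimal sketch, assuming scikit-learn and matplotlib are installed (the dataset and parameter choices are illustrative, not prescriptive):

    from sklearn.datasets import load_digits
    from sklearn.manifold import TSNE
    import matplotlib.pyplot as plt

    # Load the 64-dimensional handwritten-digits dataset (1797 samples).
    X, y = load_digits(return_X_y=True)

    # Project to 2-D for visualization; random_state fixes the embedding.
    X_2d = TSNE(n_components=2, random_state=0).fit_transform(X)

    plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y, s=5, cmap="tab10")
    plt.title("Digits projected to 2-D with t-SNE")
    plt.show()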
Problem With Many Input Variables
• If your data is represented using rows and columns, such as in a spreadsheet, then the input variables are the columns that are fed as input to a model to predict the target variable. Input variables are also called features.
• We can consider the columns of data as dimensions of an n-dimensional feature space, and the rows of data as points in that space.
• This is a useful geometric interpretation of a dataset.
• Having a large number of dimensions in the feature space can mean that the volume of that space is very large; in turn, the points that we have in that space (rows of data) often represent a small and non-representative sample.
• This can dramatically impact the performance of machine learning algorithms fit on data with many input features, generally referred to as the “curse of dimensionality.”
• Therefore, it is often desirable to reduce the number of input features. This reduces the number of dimensions of the feature space, hence the name “dimensionality reduction.”
Dimensionality Reduction
• Dimensionality reduction refers to techniques for reducing the
number of input variables in training data.
• When dealing with high dimensional data, it is often useful to reduce
the dimensionality by projecting the data to a lower dimensional
subspace which captures the “essence” of the data. This is called
dimensionality reduction.
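As a concrete sketch of such a projection, Principal Component Analysis maps the data onto the directions of highest variance. The example below (assuming scikit-learn, with random data purely for illustration) reduces a 10-dimensional dataset to a 2-dimensional subspace:

    import numpy as np
    from sklearn.decomposition import PCA

    # Illustrative data: 100 samples with 10 features.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))

    # Project onto the 2-D subspace spanned by the top two principal components.
    pca = PCA(n_components=2)
    X_reduced = pca.fit_transform(X)

    print(X_reduced.shape)                 # (100, 2)
    print(pca.explained_variance_ratio_)   # variance captured by each component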
• Fewer input dimensions often mean correspondingly fewer
parameters or a simpler structure in the machine learning model,
referred to as degrees of freedom.
• A model with too many degrees of freedom is likely to overfit the
training dataset and therefore may not perform well on new data.
• It is desirable to have simple models that generalize well, and in turn,
input data with few input variables.
• This is particularly true for linear models where the number of inputs
and the degrees of freedom of the model are often closely related.
Techniques for Dimensionality Reduction
• There are many techniques that can be used for dimensionality
reduction.
• Feature Selection Methods
• Matrix Factorization
• Manifold Learning
• Autoencoder Methods
Feature Selection Methods
• Feature selection is also called variable selection or attribute selection.
• It is the automatic selection of the attributes in your data (such as columns in tabular data) that are most relevant to the predictive modeling problem you are working on.
• Feature selection is the process of selecting a subset of relevant features for use in model construction.
• Feature selection is different from dimensionality reduction.
• Both methods seek to reduce the number of attributes in the dataset,
• but dimensionality reduction methods do so by creating new combinations of attributes,
• whereas feature selection methods include and exclude attributes present in the data without changing them (see the sketch below).
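A minimal sketch of that contrast, using plain NumPy (the column indices here are hypothetical): feature selection simply keeps some of the original columns unchanged, whereas a method like PCA would replace them with new combinations.

    import numpy as np

    # Illustrative data: 5 samples with 4 features.
    X = np.arange(20, dtype=float).reshape(5, 4)

    # Feature selection: keep columns 0 and 2 exactly as they are.
    keep = [0, 2]            # hypothetical indices of the relevant features
    X_selected = X[:, keep]  # same values as the original columns, just fewer of them

    print(X_selected.shape)  # (5, 2)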
• Examples of dimensionality reduction methods include:
• Principal Component Analysis,
• Singular Value Decomposition, and
• Sammon’s Mapping.
• Feature selection is useful in itself, but it mostly acts as a filter, muting out features that add little beyond your existing features.
Feature Selection Algorithms
Filter Methods
• Filter feature selection methods apply a statistical measure to assign a score to each feature.
• The features are ranked by score and either kept or removed from the dataset.
• These methods are often univariate, considering each feature independently or with regard to the dependent variable.
• Examples of filter methods include the chi-squared test, information gain, and correlation coefficient scores (see the sketch below).
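A minimal sketch of a filter method, assuming scikit-learn: score each feature with the chi-squared test and keep the two highest-scoring ones (the dataset and k are illustrative).

    from sklearn.datasets import load_iris
    from sklearn.feature_selection import SelectKBest, chi2

    # Iris has 4 non-negative features, as the chi-squared test requires.
    X, y = load_iris(return_X_y=True)

    # Rank features by chi-squared score and keep the best 2.
    selector = SelectKBest(score_func=chi2, k=2)
    X_new = selector.fit_transform(X, y)        # shape: (150, 2)

    print(selector.scores_)                     # one score per original feature
    print(selector.get_support(indices=True))   # indices of the kept features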
Wrapper Methods
• Wrapper methods consider the selection of a set of features as a search problem, where different combinations are prepared, evaluated, and compared to other combinations.
• A predictive model is used to evaluate each combination of features and assign a score based on model accuracy.
• The search process may be:
• methodical, such as a best-first search;
• stochastic, such as a random hill-climbing algorithm; or
• heuristic, like forward and backward passes to add and remove features.
• An example of a wrapper method is the recursive feature elimination algorithm, sketched below.
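A minimal sketch of recursive feature elimination, assuming scikit-learn: a logistic-regression model is wrapped, and the weakest feature is dropped repeatedly until the desired number remains (the data and parameters are illustrative).

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import RFE
    from sklearn.linear_model import LogisticRegression

    # Synthetic data: 10 features, only 3 of which are informative.
    X, y = make_classification(n_samples=200, n_features=10,
                               n_informative=3, random_state=0)

    # Wrap a predictive model and eliminate features until 3 remain.
    rfe = RFE(estimator=LogisticRegression(max_iter=1000),
              n_features_to_select=3)
    rfe.fit(X, y)

    print(rfe.support_)   # boolean mask of the selected features
    print(rfe.ranking_)   # 1 = selected; larger ranks were eliminated earlier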
Embedded Methods
• Embedded methods learn which features best contribute to the accuracy of the model while the model is being created.
• The most common type of embedded feature selection method is regularization.
• Regularization methods, also called penalization methods, introduce additional constraints into the optimization of a predictive algorithm (such as a regression algorithm) that bias the model toward lower complexity (fewer coefficients).
• Examples of regularization algorithms are LASSO, Elastic Net, and Ridge Regression (see the sketch below).
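A minimal sketch of an embedded method, assuming scikit-learn: the L1 penalty in LASSO drives the coefficients of uninformative features to exactly zero, so feature selection happens as a side effect of training (the data and alpha are illustrative).

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso

    # Synthetic regression data: 10 features, only 3 informative.
    X, y = make_regression(n_samples=200, n_features=10,
                           n_informative=3, noise=1.0, random_state=0)

    # alpha controls the strength of the L1 penalty.
    lasso = Lasso(alpha=1.0).fit(X, y)

    print(lasso.coef_)                   # most coefficients are exactly 0
    print(np.flatnonzero(lasso.coef_))   # indices of the surviving features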
Any Questions?