How to Handle Categorical Data
1. How to Handle Categorical Data
By
Srinivas Rao PrithviNag Kolla,
Masters in Data Science,
University of North Texas,
Email: prithvikolla8@gmail.com
2. Categorical Variable:
Data whose values fall into a fixed set of categories is called categorical data.
Ex:
A survey of which phone brand people own produces categorical data.
Id   Name      Phone Brand
1    Alex      Apple
2    George    Nokia
3    Chen      Apple
4    Prithvi   Samsung
Dropping Categorical Variables:
If columns in the data set hold categorical values that are not useful for
modeling, we can drop them.
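Dropping a column can be sketched in plain Python as below. This is a minimal illustration, not a prescribed method: the helper name drop_columns is hypothetical, and the choice of "Name" as the column to drop is an assumption made only for the example.

```python
# Survey rows from the example table, held as plain dictionaries.
rows = [
    {"Id": 1, "Name": "Alex", "Phone Brand": "Apple"},
    {"Id": 2, "Name": "George", "Phone Brand": "Nokia"},
    {"Id": 3, "Name": "Chen", "Phone Brand": "Apple"},
    {"Id": 4, "Name": "Prithvi", "Phone Brand": "Samsung"},
]


def drop_columns(rows, columns):
    """Return new rows without the named columns; the input is left unchanged."""
    unwanted = set(columns)
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]


# Assumption for illustration: "Name" carries no signal for the model.
reduced = drop_columns(rows, ["Name"])
print(reduced[0])  # {'Id': 1, 'Phone Brand': 'Apple'}
```

The helper returns new rows rather than modifying the originals, so the full data stays available if a dropped column turns out to matter later.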
3. Label Encoding:
Label Encoding assigns a unique integer to each distinct label in a categorical column.
Ex:
Phone Brand    Phone Brand (encoded)
Apple          0
Nokia          1
Jio            2
Apple          0
Label Encoding
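Tree-based models such as Decision Trees and Random Forests work well with Label Encoding.
Note that the integer codes imply an order, ‘Apple’ (0) < ‘Nokia’ (1) < ‘Jio’ (2), even though the brands have no natural ranking.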
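The mapping in the example can be reproduced with a short sketch. The function name label_encode is hypothetical, and numbering labels in order of first appearance is an assumption chosen to match the example above. Library encoders may number labels differently; scikit-learn's LabelEncoder, for instance, sorts the labels first, which would give Jio the code 1 and Nokia the code 2.

```python
def label_encode(values):
    """Map each distinct label to an integer, numbered in order of first appearance."""
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes)  # next unused integer
        encoded.append(codes[value])
    return encoded, codes


brands = ["Apple", "Nokia", "Jio", "Apple"]
encoded, mapping = label_encode(brands)
print(encoded)  # [0, 1, 2, 0]
print(mapping)  # {'Apple': 0, 'Nokia': 1, 'Jio': 2}
```

Returning the mapping alongside the codes lets the same assignment be reused on new data, so that a label seen during training keeps its integer later.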
4. One-Hot Encoding
• Label Encoding gives each label a unique integer, and those integers imply an
order among the labels.
• That implied order can mislead models that treat the codes as magnitudes, so we
instead create one column per category and put ‘1’ in the column for the category
present in that row and ‘0’ in the others.
Ex:
Phone Brand    Apple  Nokia  Jio
Apple          1      0      0
Nokia          0      1      0
Apple          1      0      0
Jio            0      0      1
Nokia          0      1      0
One-Hot Encoding
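The table above can be reproduced with a minimal sketch. The function name one_hot_encode is hypothetical, and ordering the columns by first appearance is an assumption made so the output matches the example. In practice a library routine such as pandas' get_dummies is commonly used for the same purpose.

```python
def one_hot_encode(values):
    """Return (column names, rows of 0/1 flags), one column per distinct label."""
    # dict.fromkeys keeps the distinct labels in order of first appearance.
    categories = list(dict.fromkeys(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


brands = ["Apple", "Nokia", "Apple", "Jio", "Nokia"]
columns, encoded_rows = one_hot_encode(brands)
print(columns)  # ['Apple', 'Nokia', 'Jio']
for row in encoded_rows:
    print(row)
# [1, 0, 0]
# [0, 1, 0]
# [1, 0, 0]
# [0, 0, 1]
# [0, 1, 0]
```

Each row contains exactly one 1, so no category is treated as larger or smaller than another.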