How to Handle Categorical Data

By
Srinivas Rao PrithviNag Kolla,
Masters in Data Science,
University of North Texas,
Email: prithvikolla8@gmail.com

Categorical Variable:
Data that falls into a fixed set of categories is called categorical data.
Ex:
A survey of which phone brand people own produces categorical data.

Id   Name      Phone Brand
1    Alex      Apple
2    George    Nokia
3    Chen      Apple
4    Prithvi   Samsung
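As a rough illustration, the survey table can be held in a pandas DataFrame and the brand column marked as categorical. This is only a sketch: the use of pandas and the variable names (survey, categories, counts) are assumptions made for the example, not part of the original slides.

```python
import pandas as pd

# Toy survey data copied from the example table.
survey = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Name": ["Alex", "George", "Chen", "Prithvi"],
    "Phone Brand": ["Apple", "Nokia", "Apple", "Samsung"],
})

# Tell pandas that this column takes values from a fixed set of categories.
survey["Phone Brand"] = survey["Phone Brand"].astype("category")

# The fixed set of categories found in the column.
categories = survey["Phone Brand"].cat.categories.tolist()

# How many respondents fall into each category.
counts = survey["Phone Brand"].value_counts()

print(categories)
print(counts)
```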
Dropping Categorical Variables:
If categorical columns in the data set are not useful for modeling, we can
drop them.
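A minimal sketch of dropping categorical columns, assuming the data is in a pandas DataFrame. The column names come from the survey example; the variable names (df, dropped, numeric_only) are made up for illustration.

```python
import pandas as pd

# Toy survey data from the example table.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Name": ["Alex", "George", "Chen", "Prithvi"],
    "Phone Brand": ["Apple", "Nokia", "Apple", "Samsung"],
})

# Option 1: drop the categorical columns by name.
dropped = df.drop(columns=["Name", "Phone Brand"])

# Option 2: keep only the numeric columns, whatever they are called.
numeric_only = df.select_dtypes(include="number")

print(dropped.columns.tolist())
print(numeric_only.columns.tolist())
```

Dropping is only reasonable when the column carries no useful signal for the model; otherwise one of the encodings is the safer choice.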

Label Encoding:
Assign a unique integer value to each label in the categorical column.
Ex:
Phone Brand   Label Encoding
Apple         0
Nokia         1
Jio           2
Apple         0

Tree-based models such as Decision Trees and Random Forests generally work
well with Label Encoding.
Note that the integers imply an order, ‘Apple’(0) < ‘Nokia’(1) < ‘Jio’(2),
which does not exist in the data.
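One possible way to reproduce the label encoding from the example is pandas' factorize, which numbers labels in order of first appearance. This is a sketch, not the only approach: scikit-learn's LabelEncoder does the same job but numbers labels in sorted order, so its integers would differ from the example. The variable names are assumptions.

```python
import pandas as pd

# The categorical column from the example.
brands = pd.Series(["Apple", "Nokia", "Jio", "Apple"], name="Phone Brand")

# factorize returns one integer code per row plus the distinct labels,
# numbered in the order they first appear in the column.
codes, uniques = pd.factorize(brands)

encoded = codes.tolist()   # integer label for every row
labels = list(uniques)     # position in this list == integer code

print(encoded)
print(labels)
```

Keeping the list of labels alongside the codes makes it possible to map the integers back to the brand names later.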

One-Hot Encoding:
• Label Encoding gives the labels unique values that imply an order.
• That implied order can mislead models that treat the integers as
magnitudes, so instead we give each categorical value its own column and
put ‘1’ where the value is present in that row and ‘0’ where it is not.
Ex:
Phone Brand   Apple   Nokia   Jio
Apple         1       0       0
Nokia         0       1       0
Apple         1       0       0
Jio           0       0       1
Nokia         0       1       0
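A short sketch of one-hot encoding the example column with pandas' get_dummies. The choice of pandas and the variable names are assumptions. get_dummies orders the new columns alphabetically, so they are reordered here only to match the layout of the example.

```python
import pandas as pd

# The categorical column from the example.
brands = pd.Series(["Apple", "Nokia", "Apple", "Jio", "Nokia"], name="Phone Brand")

# One column per distinct brand; dtype=int gives 1/0 instead of True/False.
one_hot = pd.get_dummies(brands, dtype=int)

# Reorder the columns to match the example layout.
one_hot = one_hot[["Apple", "Nokia", "Jio"]]

print(one_hot)
```

Each row contains exactly one ‘1’, so no order between the brands is implied.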
