It is an important part of data preprocessing to encode labels in numerical form so that the learning algorithm interprets the features correctly. Some labels have an order associated with them (ordinal features), while others have no order (nominal features). Here, labels means the string labels of both ordinal and nominal features in a categorical feature set. It is recommended to use OneHotEncoder in place of LabelEncoder; OneHotEncoder will be discussed in later posts.
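To make the idea concrete, here is a minimal sketch, assuming scikit-learn is installed, of what LabelEncoder does to a list of string labels. The color values are hypothetical and chosen only for illustration.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels for a nominal feature (car color).
colors = ["red", "green", "blue", "green", "red"]

encoder = LabelEncoder()
codes = encoder.fit_transform(colors)

# LabelEncoder sorts the distinct labels and assigns 0..n_classes-1 in that order.
print(encoder.classes_)  # ['blue' 'green' 'red']
print(codes)             # [2 1 0 1 2]

# The mapping is reversible.
print(encoder.inverse_transform(codes))  # ['red' 'green' 'blue' 'green' 'red']
```

The integers 0 < 1 < 2 suggest blue < green < red, an order that colors do not have, so a model may read meaning into the codes. This is one reason OneHotEncoder is preferred for nominal features; scikit-learn's documentation also describes LabelEncoder as intended for target values (y) rather than input features.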
Many machine learning algorithms require categorical data (labels) to be converted, or encoded, into numerical form. When working with a dataset that has categorical features, you come across two types of features:
Ordinal features – Features that have some order. For example, a t-shirt size feature takes values that run from smaller to larger sizes, so there is an order to the values.
Nominal features – Features that are just labels or names and have no order. For example, if the color of a car is a feature, its values are color names, and there is no order among them.
Both types of categorical features need to be converted into number (integer) form. This is where encoding comes into the picture: classes such as LabelEncoder and OneHotEncoder, which are part of the sklearn.preprocessing module, perform this conversion.
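The difference between the two feature types shows up as soon as you encode them. The sketch below uses hypothetical size and color values. It contrasts LabelEncoder's alphabetical codes for the ordinal size feature with an explicit mapping, where the order S < M < L < XL is an assumption for illustration. It then applies OneHotEncoder to the nominal color feature.

```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical t-shirt sizes (ordinal) and car colors (nominal).
sizes = ["M", "S", "XL", "L", "M"]
colors = ["red", "green", "blue", "green"]

# Ordinal: LabelEncoder sorts labels alphabetically (L, M, S, XL),
# which does not match the real size order S < M < L < XL.
print(LabelEncoder().fit(sizes).classes_)  # ['L' 'M' 'S' 'XL']

# An explicit mapping (assumed order) keeps the natural ranking.
size_order = {"S": 0, "M": 1, "L": 2, "XL": 3}
size_codes = [size_order[s] for s in sizes]
print(size_codes)  # [1, 0, 3, 2, 1]

# Nominal: one-hot encoding gives each color its own 0/1 column,
# so no artificial order is implied. OneHotEncoder expects 2-D input,
# and its default output is sparse, hence .toarray().
onehot = OneHotEncoder()
color_matrix = onehot.fit_transform([[c] for c in colors]).toarray()
print(onehot.categories_)
# Rows: red -> [0, 0, 1], green -> [0, 1, 0], blue -> [1, 0, 0]
print(color_matrix)
```

The explicit dictionary is the simplest way to encode an order the data itself cannot reveal; the one-hot columns trade compactness for not implying any ranking between colors.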
Use LabelEncoder to Encode a Single Column
Use LabelEncoder to Encode Multiple Columns All at Once
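Here is a minimal pandas sketch of both tasks, assuming pandas and scikit-learn are installed. The DataFrame and its column names are hypothetical. The multiple-column case uses DataFrame.apply, which calls fit_transform on each selected column separately. It also includes a variant that keeps one fitted encoder per column.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data: two categorical columns and one numeric column.
df = pd.DataFrame({
    "size": ["M", "S", "L", "M"],
    "color": ["red", "green", "blue", "green"],
    "price": [20, 15, 25, 20],
})

# Single column: fit a LabelEncoder on one column and replace it with the codes.
df_single = df.copy()
color_encoder = LabelEncoder()
df_single["color"] = color_encoder.fit_transform(df_single["color"])

# Multiple columns at once: apply refits the encoder on each selected
# column, so every column gets its own mapping. Numeric columns are untouched.
cat_cols = ["size", "color"]
df_multi = df.copy()
df_multi[cat_cols] = df_multi[cat_cols].apply(LabelEncoder().fit_transform)

# Variant that keeps one fitted encoder per column, so the codes can be
# mapped back to labels later with inverse_transform.
encoders = {col: LabelEncoder().fit(df[col]) for col in cat_cols}
df_multi2 = df.copy()
for col, enc in encoders.items():
    df_multi2[col] = enc.transform(df[col])

print(df_single)
print(df_multi)
print(encoders["size"].inverse_transform(df_multi2["size"]))
```

The apply one-liner is compact but discards the fitted encoders. The dictionary variant costs a few more lines, but it lets you decode results or transform new data with the same mapping.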