Encode ordinal categorical data

  • OrdinalEncoder is confusing at first, so don't worry if it doesn't click right away
  • Pass categories as a list of lists (one inner list per column), and pass the data as a 2D array

import pandas as pd
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

Create data

d = {'rating': ["first", "second", "third", "first", "second", "second"]}
df = pd.DataFrame(d)
df

rating
0 first
1 second
2 third
3 first
4 second
5 second

Initialise transformer

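Read carefully. The scikit-learn docs describe the categories parameter like this:

list : categories[i] holds the categories expected in the ith column. The passed categories should not mix strings and numeric values, and should be sorted in case of numeric values.

So categories is a list of lists, with one inner list per column of the input. In other words categories = [categories_col_1, categories_col_2, ...], where each inner list gives that column's categories in the order they should be encoded (first item = 0, second item = 1, and so on).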

My god this is confusing.

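# WRONG: categories = ["first", "second", "third"]
# RIGHT: a list of lists, one inner list per column; first = 0, second = 1, third = 2
categories = [["first", "second", "third"]]
# Pass categories by keyword; recent scikit-learn versions reject positional arguments here
ordinal_encoder = OrdinalEncoder(categories=categories)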
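
The need for the outer list is easier to see with more than one column. Here is a minimal sketch, assuming a made-up two-column dataset (the size column and the names X_two, categories_two and encoder_two are hypothetical, not part of the example above):

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical two-column data: a rating and a size (illustration only).
X_two = np.array(
    [["first", "small"],
     ["third", "large"],
     ["second", "medium"]],
    dtype=object,
)

# One inner list per column, in the same order as the columns of X_two.
categories_two = [
    ["first", "second", "third"],  # column 0: first = 0, second = 1, third = 2
    ["small", "medium", "large"],  # column 1: small = 0, medium = 1, large = 2
]

encoder_two = OrdinalEncoder(categories=categories_two)
encoded_two = encoder_two.fit_transform(X_two)
print(encoded_two)
```

Each column is encoded against its own inner list, so a single column still needs the outer list around it.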

Fit transformer

# scikit-learn expects a 2D array of shape (n_samples, n_features), hence the reshape
X = df['rating'].to_numpy().reshape(-1, 1)
ordinal_encoder.fit(X)
OrdinalEncoder(categories=[['first', 'second', 'third']],
               dtype=<class 'numpy.float64'>)
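
The reshape is only there to make the input 2D. As a small sketch of an alternative (df_demo, encoder_demo and X_frame are illustrative names, not from the example above), selecting the column with double brackets also gives a 2D input, while a flat 1D array should be rejected:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

df_demo = pd.DataFrame({"rating": ["first", "second", "third"]})
encoder_demo = OrdinalEncoder(categories=[["first", "second", "third"]])

# Double brackets select a one-column DataFrame, which is already 2D.
X_frame = df_demo[["rating"]]
print(X_frame.shape)

encoder_demo.fit(X_frame)

# A flat 1D array has no column axis, so fitting on it raises ValueError.
try:
    encoder_demo.fit(df_demo["rating"].to_numpy())
    raised = False
except ValueError:
    raised = True
print(raised)
```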

Apply transformer

ordinal_encoder.transform(X)
array([[0.],
       [1.],
       [2.],
       [0.],
       [1.],
       [1.]])
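
To check which integer each label received, one option is to read the fitted categories_ attribute, which holds one array per column. A minimal sketch, assuming the same three ratings (X_check, encoder_check and mapping are illustrative names):

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

X_check = np.array(["first", "second", "third"], dtype=object).reshape(-1, 1)
encoder_check = OrdinalEncoder(categories=[["first", "second", "third"]])
encoder_check.fit(X_check)

# categories_[0] lists the labels of column 0; a label's position is its code.
mapping = {label: code for code, label in enumerate(encoder_check.categories_[0])}
print(mapping)
```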

Get back labels from integers

ordinal_encoder.inverse_transform(ordinal_encoder.transform(X))
array([['first'],
       ['second'],
       ['third'],
       ['first'],
       ['second'],
       ['second']], dtype=object)

Another example

# The order of the inner list defines the codes: bad = 0, good = 1, neutral = 2
categories = [["bad", "good", "neutral"]]
enc = OrdinalEncoder(categories=categories)
X = np.array(["bad", "good", "neutral"]).reshape(-1,1)
enc.fit(X)
OrdinalEncoder(categories=[['bad', 'good', 'neutral']],
               dtype=<class 'numpy.float64'>)
enc.fit_transform(X)
array([[0.],
       [1.],
       [2.]])