Creating a keras loss function by averaging MAE across groups from another column


I’m trying to create a custom loss function in keras that takes grouped MAE and then averages it.

I have a training and a validation set (obtained by randomly splitting the training data): (X_train, y_train), (X_val, y_val)

X_train and X_val contain around 80 numeric features, and 2 one-hot encoded features from a categorical variable with 3 categories, say "a", "b", and "c".

y_train and y_val contain two numeric outputs.

I’m trying to create a neural network model with a custom loss function that instead of taking overall MAE, takes MAE by groups and then averages it.

My current attempt at loss function looks like:

def avg_mae(grouping_col_train, grouping_col_val):
    def custom_loss(y_true, y_pred):
        grouping_col = grouping_col_val
        if len(y_true) == len(grouping_col_train):
            grouping_col = grouping_col_train
        df = pd.DataFrame({
            "geid": grouping_col,
            "y_true": y_true,
            "y_pred": y_pred
        })
        return np.mean(df.groupby("geid").apply(lambda x: mean_absolute_error(x["y_true"], x["y_pred"])))
    return custom_loss

This does not work because I train in batches, so the loss function never receives the full training or validation set at once. The validation set is passed to fit as validation_data=(X_val, y_val).

How do I pass the grouping column in the loss function for the batches? Also how to group average in the validation data as well?

Here’s some code and output to explain the difference between overall MAE and grouped average MAE:

df = pd.DataFrame({
    "y_true": np.random.randn(20000),
    "y_pred": np.random.randn(20000),
    "grouping_col": ["a"] * 8000 + ["b"] * 1000 + ["c"] * 11000
})

overall_mae = mean_absolute_error(df["y_true"], df["y_pred"])
print("overall_mae:", overall_mae)

grouped_mae = df.groupby(["grouping_col"]).apply(lambda x: mean_absolute_error(x["y_true"], x["y_pred"]))

avg_grouped_mae = np.mean(grouped_mae)
print("\navg_grouped_mae:", avg_grouped_mae)


overall_mae: 1.1325261117842

a    1.141619
b    1.069323
c    1.131659
dtype: float64

avg_grouped_mae: 1.1142004357897866
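
The gap between the two numbers comes from weighting: overall MAE is the size-weighted mean of the per-group MAEs, while the grouped average gives every group the same weight. A quick check using the rounded figures from the printout above (the names `sizes` and `group_mae` are only used for this illustration):

```python
# Group sizes and per-group MAEs copied from the printout above (rounded values).
sizes = {"a": 8000, "b": 1000, "c": 11000}
group_mae = {"a": 1.141619, "b": 1.069323, "c": 1.131659}

total = sum(sizes.values())

# Overall MAE weights each group by its share of the samples.
weighted = sum(sizes[g] * group_mae[g] for g in sizes) / total

# Grouped average MAE gives every group the same weight.
unweighted = sum(group_mae.values()) / len(group_mae)

print(f"size-weighted mean: {weighted:.4f}")    # 1.1325
print(f"unweighted mean:    {unweighted:.4f}")  # 1.1142
```

The small group "b" contributes 1/3 of the grouped average but only 1/20 of the overall MAE, which is why the two values differ.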


First, I would advise against passing pandas objects directly to a Keras model, since the way Keras interacts with pandas is not officially documented. For example, while I can fit the model using dataframes:

import numpy as np
import pandas as pd
import tensorflow as tf

x = pd.DataFrame(np.random.randn(100,2))
y = pd.DataFrame(np.random.randn(100))

inp = tf.keras.Input(shape=(2,))
intermediate = tf.keras.layers.Dense(10)(inp)
output = tf.keras.layers.Dense(1)(intermediate)
model = tf.keras.models.Model(inputs=inp, outputs=output)
model.compile(loss='mae', optimizer='sgd')
model.fit(x=x, y=y, batch_size=5, verbose=1)

I got an error trying to call the model on x directly, i.e. model(x):


AttributeError: Exception encountered when calling layer "model_34" (type Functional).
'tuple' object has no attribute 'rank'

However, using pandas to preprocess the data and then feeding the underlying NumPy arrays, e.g., x.values and y.values, to the model is completely fine, so that is what I do in the rest of this answer.

From your descriptions, I assume that your x and y data (both training and validation) are something like

x = pd.concat([pd.DataFrame(np.random.randn(100,2), columns=['feat_1', 'feat_2']), 
               pd.get_dummies(pd.Series(np.random.choice(["a", "b", "c"], size=100)))], axis=1)
y = pd.DataFrame(np.random.randn(100), columns=['val'])

That is, x is a dataframe of the form (only 2 of the 80 features are shown for simplicity)

    feat_1      feat_2      a   b   c
0   2.276849    -1.023108   0   1   0
1   -0.004519   0.001371    1   0   0
2   0.636205    0.426500    1   0   0
3   0.509223    -0.944194   1   0   0
4   -1.111043   1.238563    0   0   1
... ... ... ... ... ...

and y is a dataframe of the form

    val
0   -1.434603
1   0.538255
2   1.242824
3   1.614832
4   0.770691
... ...

To start with, convert the one-hot encoded category of the samples to integer form, e.g.,

x_int_cat = x[['a','b','c']].rename(columns={'a': 0, 'b': 1, 'c': 2}).idxmax(axis=1)
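
The question mentions two one-hot columns for three categories, which suggests a drop-first encoding in which one category is represented by a row of zeros. In that case idxmax over the two remaining columns cannot tell the dropped category apart from the first kept one. A minimal NumPy sketch, assuming the dropped category is "a" and the remaining columns are ordered "b", "c" (both are assumptions, not stated in the question):

```python
import numpy as np

# Hypothetical drop-first dummies: columns are "b" and "c"; a row of zeros means "a".
dummies = np.array([
    [0, 0],  # a
    [1, 0],  # b
    [0, 1],  # c
    [0, 0],  # a
])

# Rows with no active column map to 0; otherwise shift the argmax by one.
has_active = dummies.sum(axis=1) > 0
int_cat = np.where(has_active, dummies.argmax(axis=1) + 1, 0)

print(int_cat)  # [0 1 2 0]
```

The resulting integers can be used in the same way as x_int_cat above.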

Then, define an additional model input for this integer series and concatenate that with the original output. Doing so creates an output tensor containing both the prediction and the integer category.

inp = tf.keras.Input(shape=(len(x.columns),))
int_cat = tf.keras.Input(shape=(1,))
intermediate = tf.keras.layers.Dense(10)(inp)
output = tf.keras.layers.Dense(1)(intermediate)
final_output = tf.keras.layers.Concatenate(axis=-1)([output, int_cat])

model = tf.keras.models.Model(inputs=[inp, int_cat], outputs=final_output)

Now comes the main part: writing a custom loss function that calculates the grouped MAE from the composite output tensor. To do so, we split y_pred back into the prediction tensor and the category tensor, then use TF's segment ops, specifically tf.math.unsorted_segment_mean, inside the function.

def group_mae(y_true, y_pred):
    pred, int_cat = y_pred[..., :-1], y_pred[..., -1]
    tf.print("integer category: \n", int_cat)
    tf.print("model prediction: \n", pred)
    tf.print("ground truth: \n", y_true)
    int_cat = tf.cast(int_cat, tf.int32)
    mae = tf.keras.metrics.mean_absolute_error(y_true, pred)
    grouped_mae = tf.math.unsorted_segment_mean(mae, int_cat, 3)
    avg_grouped_mae = tf.math.reduce_sum(grouped_mae) / tf.math.reduce_sum(tf.cast(grouped_mae != 0, tf.float32))
    tf.print("grouped mae: \n", grouped_mae)
    tf.print("average of grouped mae: \n", avg_grouped_mae)
    return avg_grouped_mae

The tf.print statements are just for debugging purposes and can be safely removed later. Also, only groups that have actual samples in the batch are taken into account when calculating avg_grouped_mae.
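
To sanity-check the loss outside TensorFlow, the same computation can be written in plain NumPy. The following is only a sketch: the function name `group_mae_reference` and the sample values are made up for illustration. One deliberate difference is that it decides which groups are present from sample counts rather than from `grouped_mae != 0`, so a group that is present with exactly zero error still counts toward the average.

```python
import numpy as np

def group_mae_reference(y_true, y_pred, int_cat, num_groups=3):
    """NumPy counterpart of the grouped-MAE loss, for checking results offline."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    int_cat = np.asarray(int_cat, dtype=int)

    # Per-sample MAE, averaged over the output dimension.
    per_sample = np.abs(y_true - y_pred).reshape(len(y_true), -1).mean(axis=1)

    # Sum of errors and sample count per group; index i corresponds to group i.
    sums = np.bincount(int_cat, weights=per_sample, minlength=num_groups)
    counts = np.bincount(int_cat, minlength=num_groups)

    # Average only over groups that actually occur in the data.
    present = counts > 0
    grouped = np.zeros(num_groups)
    grouped[present] = sums[present] / counts[present]
    return grouped, grouped[present].mean()

# Made-up batch of five samples with the same categories as the printout below.
y_true = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_pred = np.array([[1.5], [2.0], [2.0], [4.0], [7.0]])
int_cat = np.array([2, 2, 2, 2, 1])

grouped, avg = group_mae_reference(y_true, y_pred, int_cat)
print(grouped)  # group 0 absent (0.0), group 1 -> 2.0, group 2 -> 0.375
print(avg)      # 1.1875
```

Feeding the same arrays to both implementations should give matching values up to float32 rounding, except in the zero-error case described above.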

Compiling the model with this custom loss function and fitting it with a batch size of 5 gives the following printout

model.compile(loss=group_mae, optimizer='sgd')
model.fit(x=[x.values, x_int_cat.values], y=y.values, batch_size=5, verbose=0)

integer category: 
 [2 2 2 2 1]
model prediction: 
ground truth: 
grouped mae: 
 [0 1.00912917 0.781218767]
average of grouped mae: 

which confirms the correctness of the implementation. Note that grouped_mae is always ordered so that the i-th entry corresponds to group i.

Once training is finished, you can call the model and extract the prediction tensor as

model([x.values, x_int_cat.values])[...,:-1]
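
Because the loss is evaluated per batch, the value reported during training or validation is an aggregate of batch-level grouped averages, which in general is not the grouped average over the whole dataset. If an exact dataset-level figure is needed (for the validation set, say), it can be computed in one pass from the extracted predictions. A small sketch with made-up absolute errors; the helper `avg_grouped_mae` is hypothetical:

```python
import numpy as np

def avg_grouped_mae(abs_err, int_cat):
    """Mean of per-group mean absolute errors, over the groups that occur."""
    groups = np.unique(int_cat)
    return np.mean([abs_err[int_cat == g].mean() for g in groups])

# Hypothetical per-sample absolute errors and integer categories.
abs_err = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 0.0])
int_cat = np.array([0, 0, 1, 0, 1, 1])

# Dataset-level value: each group is averaged over all of its samples.
full = avg_grouped_mae(abs_err, int_cat)

# Batch-level values with a batch size of 3, then averaged across batches.
batches = [slice(0, 3), slice(3, 6)]
per_batch = [avg_grouped_mae(abs_err[b], int_cat[b]) for b in batches]
batch_mean = np.mean(per_batch)

print(full)        # 2.5
print(batch_mean)  # 2.75
```

The two values differ because a group with few samples in a batch gets the same weight within that batch as a group with many.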

Answered By – bui

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
