Issue
I’m trying to create a custom loss function in keras that takes grouped MAE and then averages it.
I have a training and a validation set (obtained by randomly splitting the training data): (X_train, y_train) and (X_val, y_val).
X_train and X_val contain around 80 numeric features, and 2 one-hot encoded features from a categorical variable with 3 categories, say "a", "b", and "c".
y_train and y_val contain two numeric outputs.
I’m trying to create a neural network model with a custom loss function that, instead of taking the overall MAE, computes the MAE per group and then averages the per-group values.
My current attempt at the loss function looks like this:
def avg_mae(grouping_col_train, grouping_col_val):
    def custom_loss(y_true, y_pred):
        grouping_col = grouping_col_val
        if len(y_true) == len(grouping_col_train):
            grouping_col = grouping_col_train
        df = pd.DataFrame({
            "geid": grouping_col,
            "y_true": y_true,
            "y_pred": y_pred
        })
        return np.mean(df.groupby("geid").apply(lambda x: mean_absolute_error(x["y_true"], x["y_pred"])))
    return custom_loss
This does not work because I train in batches, so the loss function only ever sees one batch at a time rather than the full grouping column:
model.fit(X_train,
          y_train,
          epochs=500,
          batch_size=2**16,
          validation_data=(X_val, y_val),
          callbacks=[callback],
          verbose=1)
How do I pass the grouping column to the loss function for each batch? And how do I compute the group-averaged MAE on the validation data as well?
Here’s some code and output to explain the difference between overall MAE and grouped average MAE:
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error

df = pd.DataFrame({
    "y_true": np.random.randn(20000),
    "y_pred": np.random.randn(20000),
    "grouping_col": ["a"] * 8000 + ["b"] * 1000 + ["c"] * 11000
})
# MAE over all rows, so large groups dominate
overall_mae = mean_absolute_error(df["y_true"], df["y_pred"])
print("overall_mae:", overall_mae)
# MAE computed separately within each group
grouped_mae = df.groupby(["grouping_col"]).apply(lambda x: mean_absolute_error(x["y_true"], x["y_pred"]))
print("\ngrouped_mae:")
print(grouped_mae)
# Unweighted mean of the per-group MAEs
avg_grouped_mae = np.mean(grouped_mae)
print("\navg_grouped_mae:", avg_grouped_mae)
Output:
overall_mae: 1.1325261117842
grouped_mae:
grouping_col
a 1.141619
b 1.069323
c 1.131659
dtype: float64
avg_grouped_mae: 1.1142004357897866
Solution
Firstly, I would advise against passing pandas objects directly to a Keras model, since the way Keras interacts with pandas is not officially documented. For example, while I can fit the model using dataframes:
import numpy as np
import pandas as pd
import tensorflow as tf

x = pd.DataFrame(np.random.randn(100, 2))
y = pd.DataFrame(np.random.randn(100))
inp = tf.keras.Input(shape=(2,))
intermediate = tf.keras.layers.Dense(10)(inp)
output = tf.keras.layers.Dense(1)(intermediate)
model = tf.keras.models.Model(inputs=inp, outputs=output)
model.compile(loss='mae', optimizer='sgd')
model.fit(x=x, y=y, batch_size=5, verbose=1)  # y is the target dataframe defined above
I got an error when trying to call the model on x directly:
model(x)
AttributeError: Exception encountered when calling layer "model_34" (type Functional).
'tuple' object has no attribute 'rank'
However, using pandas to preprocess the data and then feeding the underlying NumPy arrays, e.g., x.values and y.values, to the model is completely fine, so I will do exactly that in the rest of this answer.
From your description, I assume that your x and y data (both training and validation) look something like
x = pd.concat([pd.DataFrame(np.random.randn(100, 2), columns=['feat_1', 'feat_2']),
               # dtype=int keeps the dummy columns as 0/1; newer pandas versions default to bool
               pd.get_dummies(pd.Series(np.random.choice(["a", "b", "c"], size=100)), dtype=int)],
              axis=1)
y = pd.DataFrame(np.random.randn(100), columns=['val'])
That is, x is a dataframe of the form (only 2 of the 80 features are used for simplicity)
feat_1 feat_2 a b c
0 2.276849 -1.023108 0 1 0
1 -0.004519 0.001371 1 0 0
2 0.636205 0.426500 1 0 0
3 0.509223 -0.944194 1 0 0
4 -1.111043 1.238563 0 0 1
... ... ... ... ... ...
and y is a dataframe of the form
val
0 -1.434603
1 0.538255
2 1.242824
3 1.614832
4 0.770691
... ...
To start with, convert the one-hot encoded category of the samples to integer form, e.g.,
x_int_cat = x[['a','b','c']].rename(columns={'a': 0, 'b': 1, 'c': 2}).idxmax(axis=1)
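The same conversion can be sketched on plain NumPy arrays. The first part below uses the five example rows shown above (columns a, b, c). The second part is an assumption on my side: since the question mentions only 2 one-hot columns for 3 categories, it may be using a drop-first encoding in which "a" is the all-zero row, and the sketch shows one way to recover the integer category in that case.

```python
import numpy as np

# Full one-hot encoding with columns a, b, c (the five example rows above).
one_hot = np.array([[0, 1, 0],
                    [1, 0, 0],
                    [1, 0, 0],
                    [1, 0, 0],
                    [0, 0, 1]])
int_cat = one_hot.argmax(axis=1)  # index of the column holding the 1

# Assumed drop-first encoding: only columns b and c are kept,
# and category "a" is represented by a row of zeros.
dropped = one_hot[:, 1:]
int_cat_dropped = np.where(dropped.sum(axis=1) == 0,      # all zeros -> "a" -> 0
                           0,
                           dropped.argmax(axis=1) + 1)    # b -> 1, c -> 2

print(int_cat)          # [1 0 0 0 2]
print(int_cat_dropped)  # [1 0 0 0 2]
```

Either array can then play the role of x_int_cat.values when feeding the model.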
Then, define an additional model input for this integer series and concatenate that with the original output. Doing so creates an output tensor containing both the prediction and the integer category.
inp = tf.keras.Input(shape=(len(x.columns),))
int_cat = tf.keras.Input(shape=(1,))
intermediate = tf.keras.layers.Dense(10)(inp)
output = tf.keras.layers.Dense(1)(intermediate)
final_output = tf.keras.layers.Concatenate(axis=-1)([output, int_cat])
model = tf.keras.models.Model(inputs=[inp, int_cat], outputs=final_output)
Now comes the main part, which is to write a custom loss function that calculates the grouped MAE from the composite output tensor. To do so, we split y_pred back into the prediction tensor and the category tensor, then make use of TF’s segment ops, more specifically tf.math.unsorted_segment_mean, inside the function.
def group_mae(y_true, y_pred):
    pred, int_cat = y_pred[..., :-1], y_pred[..., -1]
    tf.print("____________")
    tf.print("integer category: \n", int_cat)
    tf.print("model prediction: \n", pred)
    tf.print("ground truth: \n", y_true)
    int_cat = tf.cast(int_cat, tf.int32)
    mae = tf.keras.metrics.mean_absolute_error(y_true, pred)
    grouped_mae = tf.math.unsorted_segment_mean(mae, int_cat, 3)
    avg_grouped_mae = tf.math.reduce_sum(grouped_mae) / tf.math.reduce_sum(tf.cast(grouped_mae != 0, tf.float32))
    tf.print("grouped mae: \n", grouped_mae)
    tf.print("average of grouped mae: \n", avg_grouped_mae)
    return avg_grouped_mae
The tf.print statements are just for debugging purposes and can be safely removed later. Also, only groups with a nonzero MAE in the batch, which in practice means groups that have actual samples in the batch, are taken into account when calculating avg_grouped_mae.
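One edge case worth noting: because the group count is derived from grouped_mae != 0, a group that is present in the batch but predicted perfectly would be treated as absent. The following is a minimal sketch of a variant that counts samples per group instead. The names group_mae_by_presence and NUM_GROUPS are hypothetical, and the sketch assumes the same composite y_pred layout as above (predictions first, integer category in the last column).

```python
import tensorflow as tf

NUM_GROUPS = 3  # assumption: categories "a", "b", "c" mapped to 0, 1, 2


def group_mae_by_presence(y_true, y_pred):
    # Split the composite output into predictions and the integer category.
    pred = y_pred[..., :-1]
    int_cat = tf.cast(y_pred[..., -1], tf.int32)
    y_true = tf.cast(y_true, pred.dtype)

    # Per-sample absolute error, averaged over the output dimension.
    per_sample = tf.reduce_mean(tf.abs(y_true - pred), axis=-1)

    # Sum of errors and number of samples for each group in this batch.
    sums = tf.math.unsorted_segment_sum(per_sample, int_cat, NUM_GROUPS)
    counts = tf.math.unsorted_segment_sum(tf.ones_like(per_sample), int_cat, NUM_GROUPS)

    # Mean per group (0 for empty groups), then average over present groups only.
    group_means = tf.math.divide_no_nan(sums, counts)
    n_present = tf.reduce_sum(tf.cast(counts > 0, per_sample.dtype))
    return tf.math.divide_no_nan(tf.reduce_sum(group_means), n_present)
```

For a batch in which group 0 has one perfectly predicted sample and group 1 has absolute errors 1 and 2, this returns (0 + 1.5) / 2 = 0.75, whereas counting nonzero group MAEs would give 1.5.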
Compiling the model with this custom loss function and fitting it with a batch size of 5 gives the following printout:
model.compile(loss=group_mae, optimizer='sgd')
model.fit(x=[x.values, x_int_cat.values], y=y.values, batch_size=5, verbose=0)
____________
integer category:
[2 2 2 2 1]
model prediction:
[[-0.394804418]
[0.495555431]
[-0.533690453]
[0.539922774]
[0.640398502]]
ground truth:
[[1.62094569]
[0.77069056]
[-0.244118527]
[1.08434057]
[-0.368730634]]
grouped mae:
[0 1.00912917 0.781218767]
average of grouped mae:
0.895173967
____________
....
which confirms the correctness of the implementation. Note that grouped_mae is always ordered so that the i-th entry corresponds to group i.
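The numbers in the printout can also be re-derived by hand. The NumPy sketch below emulates what the segment mean computes for that first batch; segment_mean is a hypothetical helper written for this illustration, not a TensorFlow function, and the input values are copied from the printout.

```python
import numpy as np


def segment_mean(data, segment_ids, num_segments):
    # Mean of `data` within each segment id; segments with no samples yield 0.
    sums = np.bincount(segment_ids, weights=data, minlength=num_segments)
    counts = np.bincount(segment_ids, minlength=num_segments)
    return np.divide(sums, counts, out=np.zeros(num_segments), where=counts > 0)


int_cat = np.array([2, 2, 2, 2, 1])
pred = np.array([-0.394804418, 0.495555431, -0.533690453, 0.539922774, 0.640398502])
truth = np.array([1.62094569, 0.77069056, -0.244118527, 1.08434057, -0.368730634])

grouped = segment_mean(np.abs(truth - pred), int_cat, 3)
avg = grouped.sum() / np.count_nonzero(grouped)

print(grouped)  # approximately [0, 1.009129, 0.781219]
print(avg)      # approximately 0.895174
```

Group 0 has no samples in this batch, so its entry is 0 and it is left out of the average, which matches the printed value up to float32 rounding.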
Once training is finished, you can call the model and extract the prediction tensor as
model([x.values, x_int_cat.values])[...,:-1]
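Regarding the validation part of the question: since the category travels through the model as a second input, the validation set can be handled the same way by passing its own category array in validation_data. Below is a minimal end-to-end sketch under that assumption. The data is random stand-in data, make_data and NUM_GROUPS are hypothetical names, and the loss repeats the logic from above without the debug prints.

```python
import numpy as np
import tensorflow as tf

NUM_GROUPS = 3  # assumption: categories "a", "b", "c" mapped to 0, 1, 2


def group_mae(y_true, y_pred):
    # Same logic as the loss above, without the tf.print calls.
    pred, int_cat = y_pred[..., :-1], tf.cast(y_pred[..., -1], tf.int32)
    mae = tf.reduce_mean(tf.abs(tf.cast(y_true, pred.dtype) - pred), axis=-1)
    grouped = tf.math.unsorted_segment_mean(mae, int_cat, NUM_GROUPS)
    return tf.reduce_sum(grouped) / tf.reduce_sum(tf.cast(grouped != 0, pred.dtype))


def make_data(n, rng):
    # Hypothetical stand-in data: 2 features, an integer group id, 1 target.
    feats = rng.normal(size=(n, 2)).astype("float32")
    cats = rng.integers(0, NUM_GROUPS, size=(n, 1)).astype("float32")
    targets = rng.normal(size=(n, 1)).astype("float32")
    return feats, cats, targets


rng = np.random.default_rng(0)
x_tr, cat_tr, y_tr = make_data(100, rng)
x_va, cat_va, y_va = make_data(40, rng)

inp = tf.keras.Input(shape=(2,))
int_cat = tf.keras.Input(shape=(1,))
hidden = tf.keras.layers.Dense(10)(inp)
output = tf.keras.layers.Dense(1)(hidden)
final_output = tf.keras.layers.Concatenate(axis=-1)([output, int_cat])
model = tf.keras.Model(inputs=[inp, int_cat], outputs=final_output)
model.compile(loss=group_mae, optimizer="sgd")

# The validation inputs carry their own category column, so val_loss
# is the group-averaged MAE computed per validation batch.
history = model.fit(
    x=[x_tr, cat_tr], y=y_tr,
    validation_data=([x_va, cat_va], y_va),
    batch_size=5, epochs=1, verbose=0,
)

# Drop the trailing category column to get the actual predictions.
val_pred = model.predict([x_va, cat_va], verbose=0)[..., :-1]
```

Note that the reported val_loss is an average over validation batches of the per-batch grouped MAE, so it only equals the grouped MAE of the whole validation set if the validation data fits in a single batch.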
Answered By – bui
This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.