Low accuracy of Transformer model for 1D Data

Issue

My dataset is a network-traffic dataset on which we do binary classification:

[dataset screenshot omitted]

The number of features in the data is 25.

This is the Transformer model –

from tensorflow import keras
from tensorflow.keras import layers

# TransformerBlock as defined in the Keras text-classification tutorial
embed_dim = 25  # Embedding size for each token
num_heads = 2  # Number of attention heads
ff_dim = 32  # Hidden layer size in feed-forward network inside transformer

inputs = layers.Input(shape=(25, 1))
transformer_block = TransformerBlock(embed_dim, num_heads, ff_dim)
x = transformer_block(inputs)
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dropout(0.1)(x)
x = layers.Dense(20, activation="relu")(x)
outputs = layers.Dense(1, activation="softmax")(x)

model = keras.Model(inputs=inputs, outputs=outputs)

model.compile("adam", "binary_crossentropy", metrics=["accuracy"])
history = model.fit(
    x_train, y_train, batch_size=32, epochs=50, validation_data=(x_test, y_test))

But the accuracy is extremely poor and does not change across epochs:

Epoch 1/50
1421/1421 [==============================] - 9s 6ms/step - loss: 0.5215 - accuracy: 0.1192 - val_loss: 0.4167 - val_accuracy: 0.1173

Solution

Overall, one should be able to get to 100% (train) accuracy, as long as the data is not contradictory. Arguably, the best strategy is to get there before worrying about generalisation (test error). For this specific case:

  • the final activation should be sigmoid (with a single-unit softmax the output is f(x) = exp(x) / exp(x) = 1, a constant)
  • there is no need for dropout (it will only lower training accuracy)
  • global pooling can remove important information – replace it with a Dense layer for the time being
  • normalise your data; your features span quite wide ranges, which can cause training to struggle to converge
  • consider lowering your learning rate, as it will make it easier to overfit to the training data
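Putting those points together, here is a minimal sketch of the corrected setup. It assumes the `TransformerBlock` class from the official Keras text-classification tutorial (reproduced below with its dropout omitted), that the input is first projected to a wider per-position embedding so LayerNormalization has more than one channel to work over, and that `x_train` has already been normalised separately; all layer sizes are illustrative:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


class TransformerBlock(layers.Layer):
    """Standard block from the Keras tutorial, with dropout removed
    while we are only chasing training accuracy."""

    def __init__(self, embed_dim, num_heads, ff_dim):
        super().__init__()
        self.att = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
        self.ffn = keras.Sequential(
            [layers.Dense(ff_dim, activation="relu"), layers.Dense(embed_dim)]
        )
        self.norm1 = layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = layers.LayerNormalization(epsilon=1e-6)

    def call(self, inputs):
        out1 = self.norm1(inputs + self.att(inputs, inputs))
        return self.norm2(out1 + self.ffn(out1))


embed_dim = 16  # illustrative; project each scalar feature to a 16-d token

inputs = layers.Input(shape=(25, 1))
x = layers.Dense(embed_dim)(inputs)  # per-position embedding, so the
                                     # residuals and LayerNorm see >1 channel
x = TransformerBlock(embed_dim, num_heads=2, ff_dim=32)(x)
x = layers.Flatten()(x)                      # keeps every position, unlike pooling
x = layers.Dense(20, activation="relu")(x)   # no dropout for now
outputs = layers.Dense(1, activation="sigmoid")(x)  # sigmoid, not softmax

model = keras.Model(inputs, outputs)
model.compile(
    keras.optimizers.Adam(learning_rate=1e-4),  # lowered learning rate
    "binary_crossentropy",
    metrics=["accuracy"],
)
```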

If all of the above fail, just increase the size of the model, as the "20-25" range of your layer sizes might simply not be big enough. Neural networks need quite a lot of redundancy to learn properly.

Personally, I would also replace the whole model with a plain MLP and verify that everything works. I am not sure why a transformer would be the model of choice here, and an MLP baseline lets you check whether the issue is with the chosen model or with the code.

Finally – make sure that 100% accuracy is indeed obtainable: take your training data and check whether any two datapoints have exactly the same features but different labels. If there are none, you should be able to reach 100% accuracy; it is just a matter of getting the hyperparameters right.
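A quick way to run that check, assuming `x_train` is a 2-D NumPy array of features and `y_train` a label vector (`has_contradictions` is a hypothetical helper written for this illustration):

```python
import numpy as np


def has_contradictions(x, y):
    """Return True if any two identical feature rows carry different labels."""
    x = np.asarray(x).reshape(len(x), -1)
    y = np.asarray(y).ravel()
    seen = {}  # maps a row's byte representation to the label seen for it
    for row, label in zip(x, y):
        key = row.tobytes()
        if key in seen and seen[key] != label:
            return True
        seen[key] = label
    return False


# Example: the first two rows are identical
x = np.array([[1.0, 2.0], [1.0, 2.0], [3.0, 4.0]])
print(has_contradictions(x, [0, 1, 0]))  # True: same features, different labels
print(has_contradictions(x, [0, 0, 1]))  # False: no contradiction
```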

Answered By – lejlot

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
