Why is StringLookup producing an extra label?

Issue

From the TensorFlow documentation:
"one_hot": Encodes each individual element in the input into an array the same size as the vocabulary.

import tensorflow as tf

alphabet = set("abcdefghijklmnopqrstuvwxyz")
one_hot_encoder = tf.keras.layers.StringLookup(vocabulary=list(alphabet), output_mode='one_hot')
print(len(alphabet))  # 26
print(one_hot_encoder("a").shape)  # (27,)

As far as I understand, it should encode to a tensor of shape (26,). Why does it encode to one of shape (27,)? Is the extra label there to represent "no class"?

Solution

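By default, num_oov_indices is 1, so position 0 is reserved for the OOV (out-of-vocabulary) token and the output has one more element than the vocabulary. If you don't want that slot, you can set num_oov_indices to zero; the layer will then raise an error when called on a token outside the vocabulary: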

one_hot_encoder = tf.keras.layers.StringLookup(vocabulary=list(alphabet), num_oov_indices=0, output_mode='one_hot')
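
To make the layout concrete, here is a minimal pure-Python sketch of where the extra slot comes from. It is not TensorFlow's implementation: one_hot_sketch and its reserve_oov flag are hypothetical names made up for illustration, and the sketch only covers the two cases discussed here (one reserved OOV slot, or none).

```python
def one_hot_sketch(token, vocabulary, reserve_oov=True):
    """Mimic the one-hot layout described above (illustration only, not TensorFlow code)."""
    # With a reserved OOV slot, every vocabulary entry shifts one position to the right.
    offset = 1 if reserve_oov else 0
    vector = [0] * (len(vocabulary) + offset)
    if token in vocabulary:
        vector[vocabulary.index(token) + offset] = 1
    elif reserve_oov:
        # Unknown tokens land in the reserved position 0.
        vector[0] = 1
    else:
        # No OOV slot available, so an unknown token cannot be encoded.
        raise ValueError(f"{token!r} is not in the vocabulary")
    return vector


vocabulary = sorted("abcdefghijklmnopqrstuvwxyz")

with_oov = one_hot_sketch("a", vocabulary)
without_oov = one_hot_sketch("a", vocabulary, reserve_oov=False)
unknown = one_hot_sketch("?", vocabulary)

print(len(with_oov), with_oov.index(1))        # 27 1
print(len(without_oov), without_oov.index(1))  # 26 0
print(len(unknown), unknown.index(1))          # 27 0
```

So the 27th element is not a "no class" label for known tokens; it is the slot that catches anything outside the vocabulary.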

Answered By – AndrzejO

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
