Issue
From the TF documentation:
"one_hot": Encodes each individual element in the input into an array the same size as the vocabulary.
alphabet = set("abcdefghijklmnopqrstuvwxyz")
one_hot_encoder = tf.keras.layers.StringLookup(vocabulary=list(alphabet), output_mode='one_hot')
print(len(alphabet)) #26
print(one_hot_encoder("a").shape) #(27,)
As far as I understand, it should encode to a tensor of shape (26,). Why does it encode to one of shape (27,)? Is there an extra label to represent "no class"?
Solution
Position 0 is reserved for the OOV (out-of-vocabulary) token, so the one-hot vector has 26 + 1 = 27 slots. If you don't want that, set num_oov_indices to zero:
one_hot_encoder = tf.keras.layers.StringLookup(vocabulary=list(alphabet), num_oov_indices=0, output_mode='one_hot')
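For comparison, here is a minimal sketch (assuming TF 2.x) showing both configurations side by side; the vocabulary is sorted into a list only to make the index order reproducible:
import tensorflow as tf

alphabet = sorted("abcdefghijklmnopqrstuvwxyz")  # sorted list -> stable vocabulary order

# Default: index 0 is reserved for the OOV token, so the one-hot vector has 26 + 1 = 27 slots.
with_oov = tf.keras.layers.StringLookup(vocabulary=alphabet, output_mode='one_hot')
print(with_oov("a").shape)  # (27,)

# With num_oov_indices=0 the OOV slot is dropped, leaving exactly 26 slots.
without_oov = tf.keras.layers.StringLookup(vocabulary=alphabet, num_oov_indices=0, output_mode='one_hot')
print(without_oov("a").shape)  # (26,)
Note that with num_oov_indices=0 there is no slot left for unknown tokens, so looking up a string that is not in the vocabulary raises an error instead of mapping to an OOV index.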
Answered By – AndrzejO