Text binary classification with RNN: predictions don't match the expected output


I'm doing Amazon review sentiment analysis with an RNN (LSTM).
df2['Text'] contains Amazon customer reviews, and df2['label'] is a binary integer label (0 or 1).

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split

tokenizer = Tokenizer(num_words=5000, split=' ')
tokenizer.fit_on_texts(df2['Text'].values)  # build the vocabulary before encoding
encoded_docs = tokenizer.texts_to_sequences(df2['Text'].values)
X = pad_sequences(encoded_docs, maxlen=1000)
X.shape  # (3872, 1000)
y = df2['label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

This is my model:

import tensorflow as tf
from tensorflow.keras.layers import Embedding, LSTM

model = tf.keras.Sequential()
model.add(Embedding(1000, 64, input_length=X.shape[1]))
model.add(LSTM(176, dropout=0.4, recurrent_dropout=0.4))
model.add(tf.keras.layers.Dense(32, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))


history =, y_train, epochs=13, batch_size=batch_size, validation_data=(X_test, y_test))

The validation accuracy for the last epoch is around 0.86.

And then I tried to predict the result of a text:

def anal_sent(my_text, my_model, my_tokenizer):
  encoded_text = my_tokenizer.texts_to_sequences(my_text)
  X = pad_sequences(encoded_text, maxlen = 1000)
  return (my_model.predict(X))

ex_review = "I bought it for my son and he says he likes it."
print(anal_sent(ex_review, model, tokenizer))  # same tokenizer that was fitted on the training data

But the output is an array like [[0.73], [0.68], …] instead of 0 or 1.

Is there anything wrong? What's the correct way to make a prediction?


texts_to_sequences expects a list of texts. If you pass it a single string instead, it iterates over the string and treats each element as a separate document, so you get one prediction per element rather than one prediction for the whole review. Try this:

ex_review = ["I bought it for my son and he says he likes it."] 
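A quick way to see why the wrapping matters, as a minimal sketch with no TensorFlow needed: `texts_to_sequences` simply iterates over whatever it is given, treating each element as one document.

```python
text = "I bought it for my son and he says he likes it."

# Passing the bare string: iteration yields individual characters,
# so each one would be tokenized as its own "document".
as_string = [t for t in text]
print(len(as_string))  # 47 documents, one per character

# Passing a one-element list: the whole review is a single document.
as_list = [t for t in [text]]
print(len(as_list))    # 1 document
```

As a side note (my own addition, not part of the original answer): the sigmoid layer outputs a probability in [0, 1], so if you want a hard 0/1 label you still need to threshold the prediction, e.g. `(my_model.predict(X) > 0.5).astype(int)`.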

Answered By – AndrzejO

This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
