I am training an LSTM deep learning model with time series sequences and labels.
I generate TensorFlow datasets "train_data" and "test_data":
train_data = tf.keras.preprocessing.timeseries_dataset_from_array(
    data=data,
    targets=None,
    sequence_length=total_window_size,
    sequence_stride=1,
    batch_size=batch_size,
    shuffle=is_shuffle,
).map(split_window).prefetch(tf.data.AUTOTUNE)
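For reference, a rough NumPy sketch of what this pipeline produces (the question's `split_window` is not shown, so the split below is a hypothetical one that treats the last step of each window as the label):

```python
import numpy as np

def make_windows(data, total_window_size, label_width=1):
    # Rough NumPy analogue of timeseries_dataset_from_array with
    # sequence_stride=1: one window starting at every valid index.
    windows = np.stack([data[i:i + total_window_size]
                        for i in range(len(data) - total_window_size + 1)])
    # Hypothetical split_window: the last `label_width` steps are labels.
    inputs, labels = windows[:, :-label_width], windows[:, -label_width:]
    return inputs, labels

data = np.arange(10, dtype=np.float32)
inputs, labels = make_windows(data, total_window_size=4)
# inputs[0] is [0, 1, 2] and labels[0] is [3]
```

With `shuffle=True`, the real pipeline additionally randomizes the order in which these windows are emitted.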
I then train the model with the above datasets
model.fit(train_data, epochs=epochs, validation_data=test_data, callbacks=callbacks)
And then run predictions to obtain the predicted values
train_labels = np.concatenate([y for x, y in train_data], axis=0)
train_predictions = model.predict(train_data)
test_labels = np.concatenate([y for x, y in test_data], axis=0)
test_predictions = model.predict(test_data)
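The pitfall in this step can be reproduced without TensorFlow at all: if the underlying source reshuffles on every fresh iteration, labels gathered in one pass will not line up with predictions computed in a second pass. A minimal sketch (pure Python, standing in for the tf.data pipeline):

```python
import random

class ReshufflingDataset:
    """Mimics a tf.data pipeline with shuffling enabled: every fresh
    iteration yields the examples in a new random order."""
    def __init__(self, examples, seed=None):
        self.examples = list(examples)
        self.rng = random.Random(seed)

    def __iter__(self):
        order = self.examples[:]
        self.rng.shuffle(order)  # reshuffles on *every* iteration
        return iter(order)

ds = ReshufflingDataset(range(100), seed=0)
first_pass = list(ds)   # order used to collect the labels
second_pass = list(ds)  # order a second pass (e.g. predict) sees
# Both passes contain the same examples, but in different orders,
# so element-wise pairing between the two passes is scrambled.
```

Note that a seed does not help here: it makes the sequence of shuffles deterministic, but each iteration still gets a different order.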
Here is my question: When I plot the train/test label data against the predicted values I get the following plot when I do not shuffle the sequences in the dataset building step:
Here is the output with shuffling:
Question: Why is this the case? I use the exact same source dataset for training and prediction, and the dataset is set to shuffle. Is there a chance that TensorFlow shuffles the data twice, once during training and another time for predictions? I tried supplying a shuffle seed, but that did not change things either.
The dataset gets shuffled every time you iterate through it. The labels you get from the list comprehension are therefore not in the same order as the batches model.predict sees on its own pass. If you don't want that, pass shuffle=False when building the dataset you use for prediction.
Answered By – Nicolas Gervais