Keras bug when adding preprocessing layer to sequential model

Issue

I created a Sequential model consisting of a single preprocessing layer, like so:

import tensorflow.keras as keras
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dropout, RandomRotation
from tensorflow.keras.utils import set_random_seed; set_random_seed(72)
import matplotlib.pyplot as plt

(ax, ay), (qx, qy) = cifar10.load_data()
ay = keras.utils.to_categorical(ay, 10)
qy = keras.utils.to_categorical(qy, 10)
ax = ax.astype('float32'); ax /= 255;
qx = qx.astype('float32'); qx /= 255;

DA = Sequential([RandomRotation(180/360,fill_mode="nearest",interpolation="bilinear", input_shape=(32, 32, 3))])  

I then displayed the first image and the layer's output for it using:

X=ax[0:1,:,:,:]
plt.imshow(X[0])
plt.show()

transformedX=DA(X).numpy()
plt.imshow(transformedX[0,:,:,:])
plt.show()

Result:

[Image: the original CIFAR-10 image]
[Image: the randomly rotated version of the image]

This is the expected output: the layer applied a random rotation to the image.

Then, I added the preprocessing model to another Sequential model containing nothing but it and a Dropout layer.

model = Sequential()
model.add(DA)
model.add(Dropout(0.25))

Finally, I displayed the images again in the same way as before, without using the new model at all:

X=ax[0:1,:,:,:]
plt.imshow(X[0])
plt.show()

transformedX=DA(X).numpy()
plt.imshow(transformedX[0,:,:,:])
plt.show()

Result:

[Image: the original CIFAR-10 image]
[Image: the transformed output, which now looks identical to the original (no rotation applied)]

I got this result both locally (in Spyder) and using Google Colab. Here's the notebook if you want to try it out.

From this point on, every subsequent run of the program shows every image looking like the original (no rotation). To get the rotated result again, I need to Restart Runtime (in Google Colab); %reset does not seem to work locally.

If I remove the input_shape=(32, 32, 3) argument from the preprocessing layer, the problem does not occur. However, I was under the impression that this was necessary to include in the first layer of a model.

Is this a real bug or a problem in my code?

If it is a bug, is it particular to some outdated version of Keras or TensorFlow?

Solution

The reason for this is threefold. It is related to

  • How TF handles the training argument (or the lack thereof) passed to a layer call
  • How the Dropout layer handles training=None
  • How TF constructs Sequential models

Note that my answer is based on TF v2.9.1.


The training argument

Some layers, such as Dropout or RandomRotation, behave differently during training and inference. That's why, at their core, layers always try to determine whether a call to them is made during training or not whenever they are called via () (syntactic sugar for __call__). Internally, the training flag is set to, in priority order,

  1. the training argument with a non-None value explicitly passed to the layer call, e.g. when you call the layer as layer(inputs, training=True/False)
  2. the training argument determined by this very same 4-check procedure for the parent layer in a layer call chain
  3. the learning_phase variable of the backend, if that variable has been set. The variable's state is checked via keras.backend.global_learning_phase_is_set() and its value is read via keras.backend.learning_phase()
  4. the default value of the training argument in the layer's call signature (see the signature sketch after this list). Note that call ≠ __call__: the former is a TF-defined method and the latter is one of Python's built-in magic methods, although the base layer's __call__ implementation eventually invokes call at some point.

If none of the 4 checks yields a non-None value, then training=None is used.
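
For check 4, a quick way to see which default a layer would fall back to is to inspect its call signature. The sketch below assumes TF v2.9.1; other versions (or decorator wrappers) may report the signatures slightly differently.

import inspect
import tensorflow as tf

# RandomRotation's call signature defaults training to True, whereas
# Dropout's defaults it to None (these are the defaults check 4 falls back to).
print(inspect.signature(tf.keras.layers.RandomRotation.call))
print(inspect.signature(tf.keras.layers.Dropout.call))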

The RandomRotation layer only rotates images if it sees training=True. Your call to it failed the first three checks but passed the last, thanks to training defaulting to True in its call signature. Thus, the layer saw training=True and behaved as expected. However, as soon as you added Dropout, everything went south, so what's happening?
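
As a minimal sketch of checks 1 and 4 in action (assuming a fresh Python session where learning_phase has not been set yet; the rotated values themselves are random):

import numpy as np
from tensorflow.keras.layers import RandomRotation

img = np.arange(4, dtype="float32").reshape(1, 2, 2, 1)
rot = RandomRotation(0.5)

# Checks 1-3 fail (no explicit argument, no parent layer, learning_phase
# unset), so check 4 applies: training defaults to True and the image is
# (almost always) rotated.
print(rot(img).numpy())

# Check 1: an explicit training=False overrides everything else, so the
# image comes back unchanged.
print(rot(img, training=False).numpy())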


Dropout and training=None

It turns out that a call to Dropout that ends up with training=None can actually set the state (but not the value) of the learning_phase variable. This happens easily because, unlike RandomRotation, Dropout defaults to training=None, which provides no guard at check 4.

>>> import keras
>>> import tensorflow as tf
>>> keras.backend.global_learning_phase_is_set()
False
>>> _ = tf.keras.layers.Dropout(.25)([1,2,3])
>>> keras.backend.global_learning_phase_is_set()
True

Once that happens, check 4 is essentially ignored for all subsequent calls to any layer: they will always see that learning_phase has been set and use training=learning_phase (which defaults to 0) upon reaching check 3. Your later calls to RandomRotation fell victim to this, causing the layer to think it was called during inference and thus return the input as-is.
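
Putting the two pieces together, here is a minimal sketch of that failure mode (again assuming a fresh session; it mirrors what happened in the question):

import numpy as np
import tensorflow as tf
import keras

img = np.arange(4, dtype="float32").reshape(1, 2, 2, 1)
rot = tf.keras.layers.RandomRotation(0.5)

print(rot(img).numpy())   # rotated: check 4 still resolves training to True

# A bare Dropout call with no training argument sets the learning_phase
# state as a side effect (see the snippet above).
_ = tf.keras.layers.Dropout(0.25)(img)
print(keras.backend.global_learning_phase_is_set())   # True

print(rot(img).numpy())   # no longer rotated: check 3 now resolves
                          # training to learning_phase, i.e. 0 (inference)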

More precisely, Dropout won't accept None for training and will directly fetch learning_phase regardless of its state, by calling learning_phase() without first checking global_learning_phase_is_set(). This unchecked learning_phase() call sets the state of learning_phase in the process.

>>> keras.backend.global_learning_phase_is_set()
False
>>> keras.backend.learning_phase()
0
>>> keras.backend.global_learning_phase_is_set()
True

But I did not call Dropout?

Here comes the final part, which is about the way Sequential adds layers to its stack. When you add a first layer that is not a keras tensor but has a known input shape, Sequential creates an input keras tensor with that exact shape and immediately calls the layer on that tensor to obtain an output keras tensor. This is possible because the input shape is already known.

>>> from tensorflow.keras.models import Sequential
>>> from tensorflow.keras.layers import RandomRotation
>>> Sequential([RandomRotation(0.5)]).outputs is None
True
>>> Sequential([RandomRotation(0.5, input_shape=(2,2,1))]).outputs
[<KerasTensor: shape=(None, 2, 2, 1) dtype=float32 (created by layer 'random_rotation_7')>]

From there, each time you add another layer, the Sequential model checks whether it already has an output keras tensor (i.e., whether the input shape is already known). If so, it again immediately calls the new layer on the current output tensor to obtain an updated one. Otherwise, the input shape is unknown, and the model defers the construction of the output keras tensor until it is later called on actual input data.

>>> from tensorflow.keras.models import Sequential
>>> from tensorflow.keras.layers import RandomRotation, Dropout
>>> class DropoutWithCount(Dropout):
...     def __init__(self, rate, noise_shape=None, seed=None, **kwargs):
...         super().__init__(rate, noise_shape, seed, **kwargs)
...         self.count = 0
...
...     def call(self, inputs, training=None):
...         self.count += 1
...         print(f"Dropout called with training={training}, call counts = {self.count}")
...         return super().call(inputs, training)
...
>>> m = Sequential([RandomRotation(0.5, input_shape=(2,2,1)), DropoutWithCount(.25)])
Dropout called with training=None, call counts = 1
>>> m = Sequential([RandomRotation(0.5, input_shape=(2,2,1))])
>>> m1 = Sequential()
>>> m1.add(m)
>>> m1.add(DropoutWithCount(.25))
Dropout called with training=None, call counts = 1
>>> m = Sequential([RandomRotation(0.5), DropoutWithCount(.25)])
>>>

So yes, since the input shape is known, the Dropout layer is automatically called, without any training argument, as soon as it is added to the Sequential model, which consequently sets the state of learning_phase. (Conversely, as the last REPL line above shows, no such call happens when the input shape is unknown, which is why removing input_shape=(32, 32, 3) made the problem disappear.)


What should I do?

Always pass the training argument explicitly to your model/layer calls, as the explicit argument ranks highest in precedence. Alternatively, don't pass training to any calls and instead set the global learning_phase to either True or False via keras.backend.set_learning_phase(True/False), as this takes precedence over the layers' default training values.
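
Applied to the code in the question, the first option simply means passing the flag explicitly when calling the preprocessing model (DA and X as defined there):

# Check 1 wins over everything else, so the rotation is applied even after
# DA has been wrapped inside the larger Sequential model.
transformedX = DA(X, training=True).numpy()
plt.imshow(transformedX[0])
plt.show()

The session below walks through both options on a toy input.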

>>> from tensorflow.keras.models import Sequential
>>> from tensorflow.keras.layers import RandomRotation, Dropout
>>> import keras
>>> import numpy as np
>>> img = np.array([[[[1],[2]],[[3],[4]]]])
>>> m = Sequential([RandomRotation(0.5, input_shape=(2,2,1))])
>>> m(img)
<tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
array([[[[1.6862597],
         [3.3725195]],

        [[1.6274806],
         [3.3137403]]]], dtype=float32)>
>>> m1 = Sequential()
>>> m1.add(m)
>>> m1.add(Dropout(.25))
>>> m(img)
<tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
array([[[[1.],
         [2.]],

        [[3.],
         [4.]]]], dtype=float32)>
>>> m(img, training=True)
<tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
array([[[[1.8427435],
         [3.685487 ]],

        [[1.314513 ],
         [3.1572566]]]], dtype=float32)>
>>> keras.backend.set_learning_phase(True)
>>> m(img)
<tf.Tensor: shape=(1, 2, 2, 1), dtype=float32, numpy=
array([[[[3.3871531],
         [3.3064234]],

        [[1.6935766],
         [1.612847 ]]]], dtype=float32)>

Answered By – bui

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
