In a CNN, what does the size of a layer mean exactly?


I have been reading about the basic architecture of neural networks. When going through this explanation of the Pix2Pix GAN, I am struggling to understand this chunk of code:

def downsample(filters, size, ..):
    tf.keras.layers.Conv2D(filters, size, ..)

# layer call           # output shape
downsample(128, 4)     # (batch_size, 64, 64, 128)
downsample(256, 4)     # (batch_size, 32, 32, 256)
downsample(512, 4)     # (batch_size, 16, 16, 512)

What does it mean for a layer to have a size of 4? Also, how are X and Y in (batch_size, X, Y, Z) halved with each downsample?


Here the size is the size of the kernel with which you apply the convolution. In other words, it is the number of neighbours (along one dimension) taken into account when computing one output pixel. For instance, if size = 3, each output pixel is computed from the 3×3 grid of pixels around the corresponding input pixel. This article explains the concept of convolution with size = 3 examples.
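A minimal sketch (plain Python, no TensorFlow) of what that means: one convolution output pixel is a weighted sum over a size × size neighbourhood of the input. The image and kernel values here are illustrative, not taken from Pix2Pix.

```python
def conv_at(image, kernel, row, col):
    """Output pixel at (row, col): weighted sum over a kernel-sized patch."""
    k = len(kernel)  # kernel size, e.g. 3 or 4
    total = 0.0
    for i in range(k):
        for j in range(k):
            total += image[row + i][col + j] * kernel[i][j]
    return total

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

# size = 3: a 3x3 averaging kernel, so each output pixel depends on
# all 9 pixels of the surrounding 3x3 grid
mean_kernel = [[1 / 9] * 3 for _ in range(3)]

print(conv_at(image, mean_kernel, 0, 0))  # average of the 9 pixels = 5.0
```

With size = 4, as in the question's downsample calls, each output pixel would instead depend on a 4×4 patch of 16 input pixels.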

Here X and Y are the numbers of pixels in each hidden feature map. Sometimes you want to summarize the information and so reduce its size. The classic methods for doing so are pooling (aggregating a patch of pixels into one pixel) and striding (the convolution skips pixels at regular intervals). In the Pix2Pix downsample blocks, the Conv2D layers use strides=2, which is why X and Y are halved at each step.
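The halving follows from the output-size arithmetic of a strided convolution. A short sketch, assuming TF-style "same" padding (which the Pix2Pix tutorial's Conv2D layers use), where the output size is ceil(in / stride); with "valid" (no) padding it would be floor((in − kernel) / stride) + 1 instead:

```python
import math

def same_pad_output(in_size, stride):
    """Spatial output size of a convolution with padding='same'."""
    return math.ceil(in_size / stride)

for x in (128, 64, 32):
    print(x, "->", same_pad_output(x, stride=2))
# 128 -> 64, 64 -> 32, 32 -> 16: stride 2 halves X and Y each downsample
```

Note that with "same" padding the kernel size (4 here) does not affect the output size at all; only the stride does.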

Answered By – Valentin Goldité

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
