Are those Keras and PyTorch snippets equivalent?

Issue

I am wondering whether I have correctly translated the following PyTorch definition to Keras.

In PyTorch, the following multi-layer perceptron was defined:

from torch import nn
hidden = 128
def mlp(size_in, size_out, act=nn.ReLU):
    return nn.Sequential(
        nn.Linear(size_in, hidden),
        act(),
        nn.Linear(hidden, hidden),
        act(),
        nn.Linear(hidden, hidden),
        act(),
        nn.Linear(hidden, size_out),
    )

My translation is:

from tensorflow import keras
from tensorflow.keras import layers

hidden = 128

def mlp(size_in, size_out, act=keras.layers.ReLU):
    return keras.Sequential(
        [
            layers.Dense(hidden, activation=None, name="layer1", input_shape=(size_in, 1)),
            act(),
            layers.Dense(hidden, activation=None, name="layer2", input_shape=(hidden, 1)),
            act(),
            layers.Dense(hidden, activation=None, name="layer3", input_shape=(hidden, 1)),
            act(),
            layers.Dense(size_out, activation=None, name="layer4", input_shape=(hidden, 1))
        ])

I am particularly confused by the input/output shape arguments, because that seems to be where TensorFlow and PyTorch differ.

From the documentation:

When a popular kwarg input_shape is passed, then keras will create an input layer to insert before the current layer. This can be treated equivalent to explicitly defining an InputLayer.
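
If I understand the quote correctly, these two variants should therefore build identically (a minimal sketch of my understanding, assuming TF 2.x-style Keras):

from tensorflow import keras
from tensorflow.keras import layers

# Variant A: input_shape kwarg on the first layer
mlp_a = keras.Sequential([layers.Dense(128, input_shape=(10,))])

# Variant B: explicit Input layer followed by the same Dense layer
mlp_b = keras.Sequential([layers.Input(shape=(10,)), layers.Dense(128)])

# Both should report the same batched input shape
assert mlp_a.input_shape == mlp_b.input_shape == (None, 10)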

So, did I get it right?

Thank you so much!

Solution

In Keras, you can provide an input_shape for the first layer, or alternatively use the tf.keras.layers.Input layer. If you provide neither, the model is built the first time you call fit, evaluate, or predict, or the first time you call the model on some input data, and the input shape is inferred at that point. See the docs for more details. PyTorch, on the other hand, takes the input size of each nn.Linear explicitly and only leaves the batch dimension to be inferred at runtime.
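
Here is a minimal sketch of that deferred building (assuming TF 2.x; the shapes are illustrative):

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([layers.Dense(128), layers.Dense(5)])
print(model.built)             # False: no input shape known yet, so no weights

model(tf.zeros((1, 10)))       # the first call fixes the input shape
print(model.built)             # True: weights have now been created
print(model.weights[0].shape)  # (10, 128), inferred from the call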

def keras_mlp(size_in, size_out, act=layers.ReLU):
    return keras.Sequential([layers.Input(shape=(size_in,)),
                             layers.Dense(hidden, name='layer1'),
                             act(),
                             layers.Dense(hidden, name='layer2'),
                             act(),
                             layers.Dense(hidden, name='layer3'),
                             act(),
                             layers.Dense(size_out, name='layer4')])

def pytorch_mlp(size_in, size_out, act=nn.ReLU):
    return nn.Sequential(nn.Linear(size_in, hidden),
                         act(),
                         nn.Linear(hidden, hidden),
                         act(),
                         nn.Linear(hidden, hidden),
                         act(),
                         nn.Linear(hidden, size_out))
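
As an aside, the closest PyTorch analogue to Keras's shape inference is a lazy layer; a minimal sketch with nn.LazyLinear (not part of the original answer):

import torch
from torch import nn

lazy = nn.Sequential(nn.LazyLinear(128), nn.ReLU(), nn.LazyLinear(5))
lazy(torch.zeros(1, 10))      # in_features is inferred on the first forward pass
print(lazy[0].weight.shape)   # torch.Size([128, 10])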

You can compare their summaries (the PyTorch summary below comes from the torchinfo package):

  • For Keras:

    >>> keras_mlp(10, 5).summary()
    Model: "sequential_2"
    _________________________________________________________________
     Layer (type)                Output Shape              Param #   
    =================================================================
     layer1 (Dense)              (None, 128)               1408      
    
     re_lu_6 (ReLU)              (None, 128)               0         
    
     layer2 (Dense)              (None, 128)               16512     
    
     re_lu_7 (ReLU)              (None, 128)               0         
    
     layer3 (Dense)              (None, 128)               16512     
    
     re_lu_8 (ReLU)              (None, 128)               0         
    
     layer4 (Dense)              (None, 5)                 645       
    
    =================================================================
    Total params: 35,077
    Trainable params: 35,077
    Non-trainable params: 0
    _________________________________________________________________
    
  • For PyTorch:

    >>> summary(pytorch_mlp(10, 5), (1,10))
    ============================================================================
    Layer (type:depth-idx)                   Output Shape              Param #
    ============================================================================
    Sequential                               [1, 5]                    --
    ├─Linear: 1-1                            [1, 128]                  1,408
    ├─ReLU: 1-2                              [1, 128]                  --
    ├─Linear: 1-3                            [1, 128]                  16,512
    ├─ReLU: 1-4                              [1, 128]                  --
    ├─Linear: 1-5                            [1, 128]                  16,512
    ├─ReLU: 1-6                              [1, 128]                  --
    ├─Linear: 1-7                            [1, 5]                    645
    ============================================================================
    Total params: 35,077
    Trainable params: 35,077
    Non-trainable params: 0
    Total mult-adds (M): 0.04
    ============================================================================
    Input size (MB): 0.00
    Forward/backward pass size (MB): 0.00
    Params size (MB): 0.14
    Estimated Total Size (MB): 0.14
    ============================================================================
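
As a plain-arithmetic sanity check (not part of either summary tool), the parameter counts agree because each Dense/Linear layer holds in * out weights plus out biases:

size_in, hidden, size_out = 10, 128, 5
params = (size_in * hidden + hidden        # layer1: 1,408
          + hidden * hidden + hidden       # layer2: 16,512
          + hidden * hidden + hidden       # layer3: 16,512
          + hidden * size_out + size_out)  # layer4: 645
print(params)  # 35077, matching both frameworks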
    

Answered By – Ivan

This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
