## Issue

The model worked well before the optimization step. However, when I tried to optimize my model, the following error message showed up:

```
Incompatible shapes between op input and calculated input gradient.
Forward operation: softmax_cross_entropy_with_logits_sg_12.
Input index: 0. Original input shape: (16, 1).
Calculated input gradient shape: (16, 16)
```

Here is my code:

```
import tensorflow as tf
import numpy as np

batch_size = 16
size = 400
labels = tf.placeholder(tf.int32, batch_size)
doc_encode = tf.placeholder(tf.float32, [batch_size, size])
W1 = tf.Variable(np.random.rand(size, 100), dtype=tf.float32, name='W1')
b1 = tf.Variable(np.zeros((100)), dtype=tf.float32, name='b1')
W2 = tf.Variable(np.random.rand(100, 1), dtype=tf.float32, name='W2')
b2 = tf.Variable(np.zeros((1)), dtype=tf.float32, name='b2')
D1 = tf.nn.relu(tf.matmul(doc_encode, W1) + b1)
D2 = tf.nn.selu(tf.matmul(D1, W2) + b2)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=D2))
optim = tf.train.GradientDescentOptimizer(0.01).minimize(cost, aggregation_method=tf.AggregationMethod.EXPERIMENTAL_TREE)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    _cost, _optim = sess.run(
        [cost, optim],
        {labels: np.array([1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1]),
         doc_encode: np.random.rand(batch_size, size)})
```

## Solution

Correct the following things.

First,

Change the placeholders' shapes to this:

```
X = tf.placeholder(tf.float32, shape=[None, 400])
Y = tf.placeholder(tf.float32, shape=[None, 1])
```

Why **None**? Because it gives you the freedom to feed a batch of any size. This is preferred because during training you want to use mini-batches, but at prediction or inference time you will generally feed a single example. Marking the dimension as None takes care of both cases.
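To make that concrete, here is a small numpy sketch (not from the original answer; the shapes mirror the model above): one fixed weight matrix multiplies both a training mini-batch and a single inference example, which is exactly the flexibility a None batch dimension gives a placeholder.

```
import numpy as np

# Fixed weights, as in the model: 400 inputs -> 100 hidden units
W1 = np.random.rand(400, 100)

# A mini-batch of 16 examples during training...
train_batch = np.random.rand(16, 400)
train_out = train_batch.dot(W1)    # shape (16, 100)

# ...and a single example at inference time: same weights, no reshaping
single = np.random.rand(1, 400)
single_out = single.dot(W1)        # shape (1, 100)

print(train_out.shape, single_out.shape)
```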

Second,

Correct your weight initialization. You are feeding in raw random values that are not scaled to the layer size. It is generally recommended to initialize with small weights and a slight positive bias. (I see you are using ReLU as the activation, whose gradient is zero for negative inputs, so units whose pre-activations stay negative are never updated by gradient descent; in other words, they become dead, useless units.)
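One common ReLU-friendly scheme is He initialization (my suggestion; the answer itself only says to use small positive values). The sketch below is illustrative plain numpy rather than a TensorFlow initializer: weights are drawn from a zero-mean Gaussian scaled by sqrt(2 / fan_in), and biases get a small positive start so units begin in the active region.

```
import numpy as np

def he_init(fan_in, fan_out, rng=np.random.default_rng(0)):
    # He initialization: zero-mean Gaussian with std = sqrt(2 / fan_in),
    # which keeps activation variance roughly constant through ReLU layers.
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

W1 = he_init(400, 100)
b1 = np.full(100, 0.01)  # small positive bias keeps ReLU units initially active

print(W1.shape, W1.std())  # std is roughly sqrt(2/400) ~ 0.07
```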

Third,

Logits are the raw values you obtain from `W2*x + b2`, and `tf.nn.softmax_cross_entropy_with_logits(...)` applies the softmax activation internally. So there is no need for SeLU (or any other activation) on the last layer.
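To see why the logits must stay raw, here is an illustrative numpy version of what the TensorFlow op computes (a sketch of the math, not the actual implementation): softmax over the raw logits, then cross-entropy against the labels. Applying an activation yourself first would make the internal softmax run on already-squashed values.

```
import numpy as np

def softmax_cross_entropy_with_logits(labels, logits):
    # Numerically stable softmax over the last axis of the raw logits
    z = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    # Cross-entropy against (possibly soft) labels: one loss per example
    return -(labels * np.log(probs)).sum(axis=-1)

logits = np.array([[2.0, 0.5], [0.1, 1.9]])  # raw scores, no activation applied
labels = np.array([[1.0, 0.0], [0.0, 1.0]])  # one-hot targets
loss = softmax_cross_entropy_with_logits(labels, logits)
print(loss.shape)  # one loss value per example: (2,)
```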

Answered By – coder3101

**This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.**