No Hidden Layer Neural Network Doesn't Equal Logistic Regression

Issue

In theory, a no-hidden layer neural network should be the same as a logistic regression, however, we collect wildly varied results. What makes this even more bewildering is that the test case is incredibly basic, yet the neural network fails to learn.

sklearn logistic regression

tensorflow no-hidden-layer neural network

We have attempted to choose the parameters of both models to be as similar as possible (same number of epochs, no L2 penalty, same loss function, no addition optimizations such as momentum, etc…). The sklearn logistic regression correctly finds the decision boundary consistently, with minimal variation. The tensorflow neural network is highly variable, where it looks like the bias is ‘struggling’ to train.

The code is included below to recreate this issue. An ideal solution would have the tensorflow decision boundary very similar to the logistic regression decision boundary.

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Conv1D, Dense, Flatten, Input, Concatenate, Dropout
from tensorflow.keras import Sequential, Model
from tensorflow.keras.optimizers import SGD

import matplotlib.pyplot as plt

%matplotlib inline

from sklearn.linear_model import LogisticRegression


X = np.array([[1, 1],
              [2, 2]])
y = np.array([0, 1])

model = LogisticRegression(penalty = 'none',
                           solver='sag',
                           max_iter = 300,
                           tol = 1e-100)
model.fit(X, y)

model.score(X, y)

model.coef_.flatten()[1]

model.intercept_

w_1 = model.coef_.flatten()[0]
w_2 = model.coef_.flatten()[1]
b = model.intercept_
n = np.linspace(0, 3, 10000, endpoint=False)
x_n = -w_1 / w_2 * n - b / w_2

plt.scatter(X[:, 0], X[:, 1], c = y)
plt.plot(n, x_n)
plt.gca().set_aspect('equal')
plt.show()

X = np.array([[1, 1],
              [2, 2]])
y = np.array([0, 1])

optimizer = SGD(learning_rate=0.01,
                momentum = 0.0,
                nesterov = False,
                name = 'SGD')

inputs = Input(shape = (2,), name='inputs')
outputs = Dense(1, activation = 'sigmoid', name = 'outputs')(inputs)

model = Model(inputs = inputs, outputs = outputs, name = 'model')
model.compile(loss = 'bce', optimizer = optimizer, metrics = ['AUC', 'accuracy'])
model.fit(X, y, epochs = 100, verbose=False)

print(model.evaluate(X, y))

weights, bias = model.layers[1].get_weights()
weights = weights.flatten()

w_1 = weights[0]
w_2 = weights[1]
b = bias
n = np.linspace(0, 3, 10000, endpoint=False)
x_n = -w_1 / w_2 * n - b / w_2

plt.scatter(X[:, 0], X[:, 1], c = y)
plt.plot(n, x_n)
plt.grid()
plt.gca().set_aspect('equal')

plt.show()

Solution

A simple way to determine if this is actually a bug is to let the number of epochs in your perceptron go to some arbitrary large number (say, 5000). You’ll note that the decision boundary approaches that of your logistic regression model.

The natural question is why LR needs fewer iterations to achieve a near-optimal decision boundary. For strongly convex functions (like in your example), SAG enjoys much faster convergence than SGD. Thus, it takes SGD longer to converge to a "globally good" solution (though not many to converge to a locally good solution).

Answered By – erip

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply

(*) Required, Your email will not be published