In theory, a no-hidden layer neural network should be the same as a logistic regression, however, we collect wildly varied results. What makes this even more bewildering is that the test case is incredibly basic, yet the neural network fails to learn.
We have attempted to choose the parameters of both models to be as similar as possible (same number of epochs, no L2 penalty, same loss function, no addition optimizations such as momentum, etc…). The sklearn logistic regression correctly finds the decision boundary consistently, with minimal variation. The tensorflow neural network is highly variable, where it looks like the bias is ‘struggling’ to train.
The code is included below to recreate this issue. An ideal solution would have the tensorflow decision boundary very similar to the logistic regression decision boundary.
import numpy as np import tensorflow as tf from tensorflow.keras.layers import Conv1D, Dense, Flatten, Input, Concatenate, Dropout from tensorflow.keras import Sequential, Model from tensorflow.keras.optimizers import SGD import matplotlib.pyplot as plt %matplotlib inline from sklearn.linear_model import LogisticRegression X = np.array([[1, 1], [2, 2]]) y = np.array([0, 1]) model = LogisticRegression(penalty = 'none', solver='sag', max_iter = 300, tol = 1e-100) model.fit(X, y) model.score(X, y) model.coef_.flatten() model.intercept_ w_1 = model.coef_.flatten() w_2 = model.coef_.flatten() b = model.intercept_ n = np.linspace(0, 3, 10000, endpoint=False) x_n = -w_1 / w_2 * n - b / w_2 plt.scatter(X[:, 0], X[:, 1], c = y) plt.plot(n, x_n) plt.gca().set_aspect('equal') plt.show() X = np.array([[1, 1], [2, 2]]) y = np.array([0, 1]) optimizer = SGD(learning_rate=0.01, momentum = 0.0, nesterov = False, name = 'SGD') inputs = Input(shape = (2,), name='inputs') outputs = Dense(1, activation = 'sigmoid', name = 'outputs')(inputs) model = Model(inputs = inputs, outputs = outputs, name = 'model') model.compile(loss = 'bce', optimizer = optimizer, metrics = ['AUC', 'accuracy']) model.fit(X, y, epochs = 100, verbose=False) print(model.evaluate(X, y)) weights, bias = model.layers.get_weights() weights = weights.flatten() w_1 = weights w_2 = weights b = bias n = np.linspace(0, 3, 10000, endpoint=False) x_n = -w_1 / w_2 * n - b / w_2 plt.scatter(X[:, 0], X[:, 1], c = y) plt.plot(n, x_n) plt.grid() plt.gca().set_aspect('equal') plt.show()
A simple way to determine if this is actually a bug is to let the number of epochs in your perceptron go to some arbitrary large number (say, 5000). You’ll note that the decision boundary approaches that of your logistic regression model.
The natural question is why LR needs fewer iterations to achieve a near-optimal decision boundary. For strongly convex functions (like in your example), SAG enjoys much faster convergence than SGD. Thus, it takes SGD longer to converge to a "globally good" solution (though not many to converge to a locally good solution).
Answered By – erip