I have a working back-propagation algorithm that correctly minimizes the error when iterated 100,000 times over the same single input, for example [ 1, 0 ] -> 1.
But I am not sure how to extend this to train the neural network when there are multiple inputs.
Suppose we wish to train the XOR function, with four possible input and output states:
[ 0, 0 ] -> 0
[ 0, 1 ] -> 1
[ 1, 0 ] -> 1
[ 1, 1 ] -> 0
I have tried calling the back-propagation algorithm after every single input-output pair. The network doesn't learn at all this way, even over a large number of iterations.
Should I instead compute the accumulated error over the entire training set (the 4 cases above) before calling back-propagation?
How should the accumulated errors be stored and used for the entire training set in this example?
Both versions, updating after every example and accumulating over the whole training set, are correct. They simply implement two slightly different algorithms: updating after every example gives you SGD (stochastic gradient descent), while accumulating before updating gives you (batch) gradient descent. You can also do something in between, where you update after every batch of data (mini-batch SGD). The issue you are describing (a lack of learning) has nothing to do with when the update takes place.
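To make the distinction concrete, here is a minimal sketch of the two update schedules on a toy one-weight model. The `grad` function is a stand-in for whatever your back-propagation routine computes; the data and learning rate are made up for illustration. Both variants converge here, which is the point: the schedule is not what causes a failure to learn.

```python
# Toy objective: fit w so that w * x ~ y for the data below.
# grad() stands in for back-propagation: it is the gradient of the
# squared error for a single-weight linear model.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def grad(w, x, y):
    return 2.0 * (w * x - y) * x

def sgd(w, lr=0.01, epochs=200):
    # SGD: update the weight after every single example.
    for _ in range(epochs):
        for x, y in data:
            w -= lr * grad(w, x, y)
    return w

def batch_gd(w, lr=0.01, epochs=200):
    # Batch GD: accumulate the gradient over the whole training set,
    # then apply one update per epoch.
    for _ in range(epochs):
        g = sum(grad(w, x, y) for x, y in data) / len(data)
        w -= lr * g
    return w
```

With these numbers both `sgd(0.0)` and `batch_gd(0.0)` end up close to the true weight 2.0; a mini-batch version would simply accumulate over a slice of `data` instead of all of it.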
Note that "correctly learning" one sample does not mean your algorithm is bug-free! A network in which you only adjust the bias of the final layer can fit a single sample, but will fail on multiple samples. This is just one example of something that can be broken yet still pass your "single sample test".
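You can see this failure mode directly with a deliberately degenerate "network" whose output is just sigmoid(bias), ignoring the inputs entirely (the function names and hyperparameters here are illustrative, not from the question):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_bias_only(samples, lr=0.5, epochs=5000):
    # Per-sample gradient descent on 0.5 * (out - target)^2,
    # where out = sigmoid(b) and b is the ONLY parameter.
    b = 0.0
    for _ in range(epochs):
        for _, target in samples:
            out = sigmoid(b)
            b -= lr * (out - target) * out * (1.0 - out)
    return b

one_sample = [([1, 0], 1.0)]
xor = [([0, 0], 0.0), ([0, 1], 1.0), ([1, 0], 1.0), ([1, 1], 0.0)]

# Passes the single-sample test: output driven close to 1.
print(sigmoid(train_bias_only(one_sample)))

# On XOR it settles near 0.5 for every input: no learning possible,
# regardless of how many iterations or which update schedule you use.
print(sigmoid(train_bias_only(xor)))
```

A model like this "correctly learns" any one sample, so a single-sample test cannot distinguish it from a working back-propagation implementation; testing on all four XOR cases can.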
Answered By – lejlot