In the documentation for TensorFlow’s
CyclicalLearningRate, there is an argument called scale_fn.
According to the documentation, it is:
"A function. Scheduling function applied in cycle"
However, this explanation isn’t very clear.
What argument does this function take, and what should it output?
More specifically, how does one use it?
It’s required for using
CyclicalLearningRate, but I couldn’t find a comprehensive explanation of how to use it.
You are right that the documentation is not very enlightening about its usage. A better explanation, along with an example, can be found in "Super Convergence with Cyclical Learning Rates in TensorFlow"; quoting:
from tensorflow_addons.optimizers import CyclicalLearningRate

cyclical_learning_rate = CyclicalLearningRate(
    initial_learning_rate=3e-7,
    maximal_learning_rate=3e-5,
    step_size=2360,
    scale_fn=lambda x: 1 / (2.0 ** (x - 1)),
    scale_mode='cycle')
The Scale function is the function controlling the change from the
initial learning rate to the maximal learning rate and back to the
initial learning rate. In [Smith’s] paper this is one of triangular,
triangular 2 or exponential range. In my own experiments with models
for generative images (for example super resolution) I have found
Triangular 2 to be most effective.
Triangular: A basic triangular cycle with no amplitude scaling:
lambda x: 1.0
Triangular 2: A basic triangular cycle that scales initial amplitude
by half with each cycle:
lambda x: 1 / (2.0 ** (x - 1))
Exponential range: A cycle that scales initial amplitude by gamma to
the power of the cycle iterations with each cycle:
lambda x: gamma ** x
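To make the argument and the return value concrete: as far as I can tell from the tensorflow_addons source, scale_fn receives a single number (the 1-based cycle index when scale_mode='cycle', or the global step when scale_mode='iterations') and returns a multiplier applied to the amplitude maximal_learning_rate - initial_learning_rate. The sketch below is a plain-Python re-implementation of that formula under this assumption, not the library code itself; the name cyclical_lr and its parameters are mine, chosen for illustration.

```python
import math


def cyclical_lr(step, initial_lr, max_lr, step_size, scale_fn, scale_mode="cycle"):
    """Illustrative re-implementation of the cyclical schedule (not the library code)."""
    # 1-based index of the current cycle; one full cycle spans 2 * step_size steps.
    cycle = math.floor(1 + step / (2 * step_size))
    # Distance from the peak of the current cycle: 1 at the edges, 0 at the peak.
    x = abs(step / step_size - 2 * cycle + 1)
    # scale_fn sees the cycle index or the raw step, depending on scale_mode.
    mode_step = cycle if scale_mode == "cycle" else step
    return initial_lr + (max_lr - initial_lr) * max(0.0, 1 - x) * scale_fn(mode_step)


# The three scale functions from the quoted article.
triangular = lambda x: 1.0
triangular2 = lambda x: 1 / (2.0 ** (x - 1))
gamma = 0.99  # hypothetical decay factor, chosen only for this example
exp_range = lambda x: gamma ** x

if __name__ == "__main__":
    # Print the learning rate at the start, peak, and end of the first two cycles.
    for step in (0, 10, 20, 30, 40):
        print(step, cyclical_lr(step, 1.0, 3.0, 10, triangular2))
```

With triangular2 the peak of each cycle is half as far above the initial rate as the previous one, while triangular keeps it constant. exp_range is normally paired with scale_mode='iterations', so that x is the global step rather than the cycle index.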
And FWIW, the author also notes:
I believe that maybe a lack of clarity on these parameters is one of the reasons this TensorFlow learning rate schedule is not in wider use.
Answered By – desertnaut
This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.