System reboots automatically when a TensorFlow model is too large

Issue

I’m using an NVIDIA GTX 1080 GPU (8 GB) to run an Inception model in TensorFlow. When I set batch_size = 16 and image_size = 400 and start the program, my Ubuntu 14.04 machine reboots automatically.

Solution

First, rule out a power supply unit (PSU) problem. I was seeing occasional, unexplained reboots on my development machine, and they became more frequent as I increased the input size (larger batches, a larger network), which is consistent with a heavier GPU load drawing more power. It turned out to be a PSU problem. A quick check is to limit the GPU's power consumption and see whether the reboots stop. For instance, you can cap power at 150 watts with this command (you’ll need sudo rights):

sudo nvidia-smi -pl 150
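
To confirm that the cap took effect, and to see how close the card runs to it during training, one option is to poll nvidia-smi's query interface. The following is a minimal sketch and not part of the original answer. The helper names parse_power_csv and read_gpu_power are hypothetical. It assumes nvidia-smi is on the PATH and that the driver reports power.draw and power.limit for the card (some GPUs report [N/A] instead).

```python
import subprocess

# Query fields as listed by `nvidia-smi --help-query-gpu`.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,power.draw,power.limit",
    "--format=csv,noheader,nounits",
]


def _to_watts(field):
    """Convert one CSV field to float watts; return None for values like '[N/A]'."""
    try:
        return float(field.strip())
    except ValueError:
        return None


def parse_power_csv(text):
    """Parse lines such as '0, 148.32, 150.00' into (index, draw, limit) tuples."""
    readings = []
    for line in text.strip().splitlines():
        index, draw, limit = line.split(",")
        readings.append((int(index), _to_watts(draw), _to_watts(limit)))
    return readings


def read_gpu_power():
    """Run nvidia-smi once and return the current power readings (hypothetical helper)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_power_csv(result.stdout)
```

Calling read_gpu_power() in a loop while the model trains shows whether the reported limit matches the value passed to -pl and how much of it the GPU actually draws. If the reboots stop once the cap is in place, that points toward the PSU rather than TensorFlow.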

Answered By – Sergey

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
