Keras with tensorflow-gpu totally freezes PC

Issue

I have a pretty simple LSTM architecture. After one or two epochs my PC freezes completely; I can't even move my mouse:

    Layer (type)                 Output Shape              Param #   
    =================================================================
    lstm_4 (LSTM)                (None, 128)               116224    
    _________________________________________________________________
    dropout_3 (Dropout)          (None, 128)               0         
    _________________________________________________________________
    dense_5 (Dense)              (None, 98)                12642     
    =================================================================
    Total params: 128,866
    Trainable params: 128,866
    Non-trainable params: 0
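
The parameter counts in the summary can be checked by hand. A minimal sketch, assuming the standard Keras LSTM layout (four gates, each with an input kernel, a recurrent kernel and a bias) and an input dimension of len(chars) = 98; the helper names are hypothetical:

```python
def lstm_params(units: int, input_dim: int) -> int:
    # Four gates (input, forget, cell candidate, output), each with an
    # input kernel, a recurrent kernel and a bias vector
    return 4 * (units * (input_dim + units) + units)


def dense_params(units: int, input_dim: int) -> int:
    # Weight matrix plus one bias per output unit
    return input_dim * units + units


lstm = lstm_params(128, 98)    # 116224
dense = dense_params(98, 128)  # 12642
print(lstm, dense, lstm + dense)  # 116224 12642 128866
```

The Dropout layer adds no parameters, so the two numbers sum to the reported total.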

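    from keras.models import Sequential
    from keras.layers import LSTM, Dropout, Dense
    from keras.optimizers import RMSprop

    # Same problem with a 2-layer LSTM with dropout and the Adam optimizer

    SEQUENCE_LENGTH = 3  # chars is the 98-symbol vocabulary, built elsewhere
    model = Sequential()
    model.add(LSTM(128, input_shape=(SEQUENCE_LENGTH, len(chars))))
    # For the 2-layer variant, the first LSTM also needs return_sequences=True
    #model.add(Dropout(0.15))
    #model.add(LSTM(128))
    model.add(Dropout(0.10))
    model.add(Dense(len(chars), activation='softmax'))

    model.compile(loss='categorical_crossentropy', optimizer=RMSprop(lr=0.01), metrics=['accuracy'])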
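
The input_shape=(SEQUENCE_LENGTH, len(chars)) argument implies one-hot encoded windows of 3 characters, with the following character as the target. The actual preprocessing is in the elided full code, so the helper below is a hypothetical stdlib-only sketch of that layout, run on a toy string rather than the real data:

```python
SEQUENCE_LENGTH = 3


def make_training_windows(text, chars, seq_len=SEQUENCE_LENGTH):
    """Hypothetical helper: one-hot encode sliding windows of `text`.

    Returns X shaped (samples, seq_len, len(chars)) and y shaped
    (samples, len(chars)) as nested lists; each y row marks the character
    that follows its window.
    """
    index = {c: i for i, c in enumerate(chars)}

    def one_hot(c):
        row = [0] * len(chars)
        row[index[c]] = 1
        return row

    X, y = [], []
    for start in range(len(text) - seq_len):
        window = text[start:start + seq_len]
        X.append([one_hot(c) for c in window])
        y.append(one_hot(text[start + seq_len]))
    return X, y


chars = sorted(set("hello world"))
X, y = make_training_windows("hello world", chars)
print(len(X), len(X[0]), len(X[0][0]))  # 8 3 8
```

For model.fit, X and y would normally be converted to NumPy arrays first (e.g. numpy.array(X, dtype='float32')).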

This is how I train:

    history = model.fit(X, y, validation_split=0.20, batch_size=128, epochs=10, shuffle=True, verbose=2).history
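
With validation_split=0.20, Keras holds out the last 20% of X and y (taken before shuffling) and trains on the rest in batches of 128. A rough sketch of that arithmetic, using a made-up dataset size since the real one is not given:

```python
import math


def epoch_breakdown(num_samples, validation_split=0.20, batch_size=128):
    # The last `validation_split` fraction of the arrays becomes validation
    # data; the remainder is split into mini-batches of `batch_size`.
    split_at = int(num_samples * (1.0 - validation_split))
    train, val = split_at, num_samples - split_at
    steps = math.ceil(train / batch_size)
    return train, val, steps


# 100_000 is a hypothetical sample count, not the author's dataset
print(epoch_breakdown(100_000))  # (80000, 20000, 625)
```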

The network needs 5 minutes to finish one epoch. A larger batch size does not make the problem occur sooner. A more complex model, on the other hand, can train for longer while reaching almost the same accuracy, about 0.46 (full code here).

I am running the latest Linux Mint, with a GTX 1070 Ti (8 GB) and 32 GB of RAM:

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 107...  Off  | 00000000:08:00.0  On |                  N/A |
    |  0%   35C    P8    10W / 180W |    303MiB /  8116MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
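
To check the memory-usage claim (and watch temperature and utilization) while training, nvidia-smi can be run in query mode. A sketch, assuming nvidia-smi is on PATH; the helper names are hypothetical, and the parser is shown on the idle values from the table above:

```python
import subprocess

QUERY = "memory.used,memory.total,utilization.gpu,temperature.gpu"


def parse_gpu_stats(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output, one GPU per line."""
    stats = []
    for line in csv_text.strip().splitlines():
        used, total, util, temp = (int(v) for v in line.split(","))
        stats.append({"mem_used_mib": used, "mem_total_mib": total,
                      "util_pct": util, "temp_c": temp})
    return stats


def query_gpus():
    # Requires the NVIDIA driver's nvidia-smi binary on PATH
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_stats(out)


# Idle values from the snapshot above
print(parse_gpu_stats("303, 8116, 0, 35"))
```

Polling query_gpus() every few seconds, or simply running `nvidia-smi -l 5` in another terminal, shows whether usage really stays around 1 GB during training.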

Libraries:

Keras==2.2.0
Keras-Applications==1.0.2
Keras-Preprocessing==1.0.1
keras-sequential-ascii==0.1.1
keras-tqdm==2.0.1
tensorboard==1.8.0
tensorflow==1.0.1
tensorflow-gpu==1.8.0
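
Because the list contains both a tensorflow and a tensorflow-gpu distribution, it can help to confirm which versions the interpreter running the training script actually sees. A minimal sketch, assuming Python 3.8+ for importlib.metadata; installed_versions is a hypothetical helper:

```python
from importlib import metadata

PACKAGES = ["Keras", "tensorboard", "tensorflow", "tensorflow-gpu"]


def installed_versions(names):
    """Return {name: version or None} for the given distribution names."""
    result = {}
    for name in names:
        try:
            result[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


for name, version in installed_versions(PACKAGES).items():
    print(f"{name}=={version}" if version else f"{name}: not installed")
```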

I have tried limiting GPU memory usage, but that can't be the problem here, because training uses only about 1 GB of GPU memory:

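    import tensorflow as tf
    from keras.backend.tensorflow_backend import set_session

    config = tf.ConfigProto()
    # Let TensorFlow claim at most 90% of the GPU's memory...
    config.gpu_options.per_process_gpu_memory_fraction = 0.9
    # ...and allocate it incrementally instead of all at once
    config.gpu_options.allow_growth = True
    set_session(tf.Session(config=config))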
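
For scale: on the 8116 MiB card shown by nvidia-smi, a fraction of 0.9 allows roughly 7304 MiB, far above the ~1 GB the model uses, so the cap itself should never be reached. The arithmetic, as a trivial sketch with a hypothetical helper:

```python
def memory_cap_mib(total_mib, fraction):
    # per_process_gpu_memory_fraction caps the share of total GPU memory
    # a single TensorFlow process may allocate
    return int(total_mib * fraction)


cap = memory_cap_mib(8116, 0.9)
print(cap)  # 7304
```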

What is wrong here? How can I fix the problem?

Solution

This is somewhat weird to me, but the problem was related to my new AMD CPU, released just in April 2018. So having an up-to-date Linux kernel was crucial: following this guide https://itsfoss.com/upgrade-linux-kernel-ubuntu/ I updated the kernel from 4.13 to 4.17, and now everything works.
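
Since the fix turned out to be a newer kernel, it may be worth checking the running kernel before digging into Keras or TensorFlow. A sketch that parses a release string in the format printed by uname -r (Python's platform.release() returns the same string on Linux). Note that 4.17 is only the version that fixed the freeze in this case, not a documented minimum, and the example release strings are made up:

```python
import platform
import re


def kernel_version(release=None):
    """Return (major, minor) parsed from a release string such as '4.13.0-45-generic'."""
    release = release or platform.release()
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


# The kernel that fixed the freeze here; an observation, not a requirement
FIXED_IN = (4, 17)

print(kernel_version("4.13.0-45-generic") >= FIXED_IN)      # False
print(kernel_version("4.17.0-041700-generic") >= FIXED_IN)  # True
```

Calling kernel_version() with no argument checks the kernel the script is running on.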

UPD: The motherboard was also crashing the system; I replaced it, and now everything works well.

Answered By – Rocketq

This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
