How to force tensorflow to use all available GPUs?


I have an 8-GPU cluster, and when I run a piece of TensorFlow code from Kaggle (pasted below), it only utilizes a single GPU instead of all 8. I confirmed this using nvidia-smi.

# Build model
outputs = Conv2D(1, (1, 1), activation='sigmoid') (c9)

model = Model(inputs=[inputs], outputs=[outputs])

sgd = optimizers.SGD(lr=0.03, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='binary_crossentropy', metrics=[mean_iou])
# Fit model
results = model.fit(X_train, Y_train, validation_split=0.05, batch_size=32, verbose=1, epochs=100)

I would like to use mxnet or some other method to run this code on all available GPUs. However, I'm not sure how to do this. All the resources I've found only show how to do this on the MNIST data set. I have my own data set that I read differently, so I'm not quite sure how to amend the code.


TL;DR: Use tf.distribute.MirroredStrategy() as a scope, like

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    [...create model as you would otherwise...]

If you do not specify any arguments, tf.distribute.MirroredStrategy() will use all available GPUs. You can also specify which ones to use if you want, like this: mirrored_strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"]).

Refer to this Distributed training with TensorFlow guide for implementation details and other strategies.
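As a fuller illustration, here is a minimal sketch of the same strategy applied to a compile/fit workflow like the one in the question. The layer sizes, shapes, and synthetic data are placeholders, not the asker's actual U-Net or dataset; the key point is that model creation and compile() happen inside the scope.

```python
import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # uses all visible GPUs by default
print("Number of replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Model construction AND compile() must happen inside the scope.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(8, (3, 3), activation='relu',
                               padding='same', input_shape=(32, 32, 1)),
        tf.keras.layers.Conv2D(1, (1, 1), activation='sigmoid'),
    ])
    sgd = tf.keras.optimizers.SGD(learning_rate=0.03, momentum=0.9,
                                  nesterov=True)
    model.compile(optimizer=sgd, loss='binary_crossentropy')

# fit() is called outside the scope; each global batch is split
# across the replicas automatically.
X = np.random.rand(16, 32, 32, 1).astype('float32')
Y = (np.random.rand(16, 32, 32, 1) > 0.5).astype('float32')
model.fit(X, Y, batch_size=8, epochs=1, verbose=0)
```

With no GPUs present, MirroredStrategy falls back to a single CPU replica, so the same code runs unchanged on any machine.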

Earlier answer (now outdated: multi_gpu_model() was deprecated and removed as of April 1, 2020): use multi_gpu_model() from Keras.


TensorFlow 2.0 now has the tf.distribute module, “a library for running a computation across multiple devices”. It builds on the concept of “distribution strategies”: you specify a distribution strategy and then use it as a scope. TensorFlow splits the input, parallelizes the computation, and joins the outputs for you, essentially transparently; backpropagation is handled the same way. Since all of this now happens behind the scenes, you should familiarize yourself with the available strategies and their parameters, as they can significantly affect training speed. For example, do you want variables to reside on the CPU? Then use tf.distribute.experimental.CentralStorageStrategy(). Refer to the Distributed training with TensorFlow guide for more info.
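To illustrate, a minimal sketch of CentralStorageStrategy, which keeps variables in one central place (typically the CPU) while mirroring computation across GPUs. The usage pattern is the same scope idiom as above; on a machine without GPUs it simply runs on the CPU.

```python
import tensorflow as tf

# Variables created inside this scope live in central storage (the CPU),
# while compute is replicated across the available GPUs.
strategy = tf.distribute.experimental.CentralStorageStrategy()
print("Replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    v = tf.Variable(1.0)  # stored centrally, not mirrored per GPU

print(float(v))  # 1.0
```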

Earlier answer (now outdated, leaving it here for reference):

From the Tensorflow Guide:

If you have more than one GPU in your system, the GPU with the lowest ID will be selected by default.

If you want to use multiple GPUs, unfortunately you have to manually specify what tensors to put on each GPU like

with tf.device('/device:GPU:2'):
    c = tf.matmul(a, b)  # ops created in this block are pinned to GPU 2

More info in the Tensorflow Guide Using Multiple GPUs.
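A runnable sketch of this manual placement, written with TF2 ops: it pins one matmul to each visible device and combines the results. The device list is discovered at runtime, so this falls back to the CPU when no GPU is present.

```python
import tensorflow as tf

# Discover the logical GPUs; fall back to the CPU if there are none.
gpus = tf.config.list_logical_devices('GPU')
devices = [d.name for d in gpus] or ['/CPU:0']

a = tf.random.uniform((4, 4))
b = tf.random.uniform((4, 4))

partial = []
for dev in devices:
    with tf.device(dev):            # pin this matmul to one device
        partial.append(tf.matmul(a, b))

total = tf.add_n(partial)            # combine results on the default device
print(total.shape)  # (4, 4)
```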

In terms of how to distribute your network over the multiple GPUs, there are two main ways of doing that.

  1. You distribute your network layer-wise over the GPUs. This is easier to implement but will not yield a lot of performance benefit because the GPUs will wait for each other to complete the operation.

  2. You create separate copies of your network, called “towers”, one on each GPU. When you feed the network, you break up your input batch into 8 parts and distribute them. Let each tower forward propagate, then average the gradients and do the backward propagation. This results in an almost-linear speedup with the number of GPUs. It is much more difficult to implement, however, because you also have to deal with complexities related to batch normalization, and it is very advisable to make sure you randomize your batches properly. There is a nice tutorial here. You should also review the Inception V3 code referenced there for ideas on how to structure such a thing, especially _tower_loss(), _average_gradients() and the part of train() starting with for i in range(FLAGS.num_gpus):.
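The towers idea above can be sketched in modern TF2 terms: split the global batch into shards, compute gradients per shard (one shard per device in a real multi-GPU run), average them, and apply a single update. The toy model and tower count are placeholders; in practice tf.distribute.MirroredStrategy does all of this for you.

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
opt = tf.keras.optimizers.SGD(0.1)
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.random.uniform((8, 4))
y = tf.random.uniform((8, 1))

num_towers = 2  # would be 8 on the asker's cluster
x_shards = tf.split(x, num_towers)
y_shards = tf.split(y, num_towers)

tower_grads = []
for xs, ys in zip(x_shards, y_shards):
    # In a real setup each iteration would run under tf.device('/GPU:i').
    with tf.GradientTape() as tape:
        loss = loss_fn(ys, model(xs))
    tower_grads.append(tape.gradient(loss, model.trainable_variables))

# Average the per-tower gradients, then do one synchronized update.
avg_grads = [tf.add_n(gs) / num_towers for gs in zip(*tower_grads)]
opt.apply_gradients(zip(avg_grads, model.trainable_variables))
```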

In case you want to give Keras a try, it has significantly simplified multi-GPU training with multi_gpu_model(). It can do all the heavy lifting for you.

Answered By – Peter Szoldan

This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
