I can't load a small AI model with an NVIDIA 3090 on Windows (CUDA: Out Of Memory)


I come here because I am having problems loading a model of any size with an NVIDIA 3090 (24 GB VRAM).
I just followed this video (and thousands more xD) and installed everything Jeff describes here:
Jeff's instructions to install TensorFlow, CUDA, etc. on Windows

Then I installed PyTorch for CUDA 11.6:

pip3 install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu116

I have TensorFlow 2.9.1 and CUDA 11.5.
When I load any microsoft/DialoGPT-* model, the VRAM of the 3090 goes straight to 24 GB, so I run out of memory.
The same code works correctly in Colab with 16 GB of VRAM.
I tried on Windows without success, so today I installed Ubuntu to check, and I have the same problem there!
Except one time it loaded correctly at 10 GB (I thought it was the PyTorch version), but the app wasn't making predictions because it needed libs, and then it stopped working.

BTW, this is the code:

This is what happens with my VRAM when loading DialoGPT-small:
[screenshot: NVIDIA 3090 VRAM usage]

How can I solve this?



Loading the pipeline 'microsoft/DialoGPT-medium'...
2022-07-06 03:38:34.293286: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-07-06 03:38:34.709282: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 21670 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:0b:00.0, compute capability: 8.6
Loading the pipeline 'microsoft/DialogRPT-updown'...
Traceback (most recent call last):
  File "C:\Users\barbi\PycharmProjects\gpt2bot\run_bot.py", line 33, in <module>         
  File "C:\Users\barbi\PycharmProjects\gpt2bot\gpt2bot\telegram_bot.py", line 286, in run
  File "C:\Users\barbi\PycharmProjects\gpt2bot\gpt2bot\telegram_bot.py", line 240, in __init__
    self.ranker_dict = build_ranker_dict(device=device, **prior_ranker_weights, **cond_ranker_weights)
  File "C:\Users\barbi\PycharmProjects\gpt2bot\gpt2bot\utils.py", line 210, in build_ranker_dict
    pipeline=load_pipeline('sentiment-analysis', model='microsoft/DialogRPT-updown', **kwargs),
  File "C:\Users\barbi\PycharmProjects\gpt2bot\gpt2bot\utils.py", line 166, in load_pipeline
    return transformers.pipeline(task, **kwargs)
  File "C:\Users\barbi\anaconda3\envs\tensorflow\lib\site-packages\transformers\pipelines\__init__.py", line 684, in pipeline
    return pipeline_class(model=model, framework=framework, task=task, **kwargs)
  File "C:\Users\barbi\anaconda3\envs\tensorflow\lib\site-packages\transformers\pipelines\text_classification.py", line 68, in __init__
  File "C:\Users\barbi\anaconda3\envs\tensorflow\lib\site-packages\transformers\pipelines\base.py", line 770, in __init__
    self.model = self.model.to(self.device)
  File "C:\Users\barbi\anaconda3\envs\tensorflow\lib\site-packages\torch\nn\modules\module.py", line 927, in to
    return self._apply(convert)
  File "C:\Users\barbi\anaconda3\envs\tensorflow\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
  File "C:\Users\barbi\anaconda3\envs\tensorflow\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
  File "C:\Users\barbi\anaconda3\envs\tensorflow\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
  [Previous line repeated 2 more times]
  File "C:\Users\barbi\anaconda3\envs\tensorflow\lib\site-packages\torch\nn\modules\module.py", line 602, in _apply
    param_applied = fn(param)
  File "C:\Users\barbi\anaconda3\envs\tensorflow\lib\site-packages\torch\nn\modules\module.py", line 925, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA out of memory. Tried to allocate 12.00 MiB (GPU 0; 24.00 GiB total capacity; 2.07 GiB already allocated; 0 bytes free; 2.07 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_

VRAM in Colab after loading all the models:
[screenshot: nvidia-smi output in Colab]


The problem was that I was running TensorFlow on the GPU + PyTorch on the GPU, so together they exhausted the VRAM (the log above shows TensorFlow alone creating its GPU device with 21670 MB). As soon as I switched to TensorFlow on the CPU + PyTorch on the GPU, everything worked correctly.

Answered By – MagiCs ito

This answer was collected from Stack Overflow and is licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
