63

Note: this question was initially asked on GitHub, but I was asked to post it here instead.

I'm having trouble running TensorFlow on GPU, and it does not seem to be the usual CUDA configuration problem, because everything indicates CUDA is properly set up.

The main symptom: when running TensorFlow, my GPU is not detected (see the code being run and its output).
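For context, a typical detection check in the TF 1.x releases of the time looks something like this (a sketch, not the exact script behind the links above):

from tensorflow.python.client import device_lib

# A working GPU setup should list a '/gpu:0' entry alongside the CPU device.
print(device_lib.list_local_devices())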

What differs from the usual issues is that CUDA seems properly installed, and running ./deviceQuery from the CUDA samples succeeds (output).

I have two graphical cards:

  • an old GTX 650 used for my monitors (I don't want to use that one with tensorflow)
  • a GTX 1060 that I want to dedicate to tensorflow

I use:

I've tried:

  • adding /usr/local/cuda/bin/ to $PATH
  • forcing GPU placement in the TensorFlow script using with tf.device('/gpu:1'): (and with tf.device('/gpu:0'): when that failed, for good measure), roughly as sketched after this list
  • whitelisting the GPU I wanted to use with CUDA_VISIBLE_DEVICES, in case the presence of my old unsupported card was causing problems
  • running the script with sudo (because why not)
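
For reference, the forced-placement attempt looked roughly like this (a sketch with placeholder ops, not my actual script):

import tensorflow as tf

with tf.device('/gpu:1'):  # also tried '/gpu:0'
    a = tf.constant([1.0, 2.0, 3.0])
    b = tf.constant([4.0, 5.0, 6.0])
    c = a + b

# log_device_placement=True makes TensorFlow log which device each op was placed on.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))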

Here are the outputs of nvidia-smi and nvidia-debugdump -l, in case they're useful.

At this point, I feel like I have followed all the breadcrumbs and have no idea what else I could try. I'm not even sure whether I'm looking at a bug or a configuration problem. Any advice on how to debug this would be greatly appreciated. Thanks!

Update: with Yaroslav's help on GitHub, I gathered more debugging info by raising the log level, but it doesn't seem to say much about device selection: https://gist.github.com/oelmekki/760a37ca50bf58d4f03f46d104b798bb

Update 2: Theano detects the GPU correctly, but interestingly it complains about cuDNN being too recent, then falls back to CPU (code run, output). Maybe that's the problem with TensorFlow as well?

  • As another sanity check, you could try another framework (like Theano) with the GPU to see if it works; perhaps your GPU setup is somehow broken in a way that's not detected by deviceQuery. (Feb 19, 2017 at 15:16)
  • Good idea, thanks. I'll try that and report it in the question body. – kik (Feb 19, 2017 at 15:20)
  • That output is suspiciously small; here's what I see when I run with VLOG=1: pastebin.com/LQF0j3Ri (Feb 19, 2017 at 15:25)
  • Yep, I've truncated it past the device selection, as the rest is probably irrelevant. Here is the full log: gist.github.com/oelmekki/25ea3b1186c2ee7aaa23448547bc23b2 – kik (Feb 19, 2017 at 15:26)

8 Answers

80

From the log output, it looks like you are running the CPU version of TensorFlow (PyPI: tensorflow), and not the GPU version (PyPI: tensorflow-gpu). Running the GPU version would either log information about the CUDA libraries, or an error if it failed to load them or open the driver.

If you run the following commands, you should be able to use the GPU in subsequent runs:

$ pip uninstall tensorflow
$ pip install tensorflow-gpu
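
To double-check that the GPU build is the one actually imported afterwards, something along these lines should work with the 1.x packages that were current at the time (a sketch):

import tensorflow as tf
from tensorflow.python.client import device_lib

# True only if the installed package was compiled with CUDA support.
print("Built with CUDA:", tf.test.is_built_with_cuda())
# Should list at least one GPU device when the driver and libraries load correctly.
print("Devices:", [d.name for d in device_lib.list_local_devices()])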
  • Oh, indeed. I followed this doc [1] while tensorflow was already installed; I wasn't aware it needed another package. Thanks! -- [1] tensorflow.org/tutorials/using_gpu – kik (Mar 8, 2017 at 10:26)
  • Whenever I install tensorflow-gpu, it reinstalls tensorflow. Is this supposed to happen? I can't get it to detect my devices. (Aug 17, 2017 at 14:18)
  • I have the same issue; I followed your steps ("uninstall tensorflow" and "install tensorflow-gpu") and got this error: AttributeError: 'module' object has no attribute 'Session' – patti_jane (Sep 30, 2017 at 14:29)
  • Hallelujah, I've been beating my brains against getting my laptop's GeForce 940MX working, and the uninstall above worked. I would never have thought to try that. (Sep 15, 2018 at 0:11)
  • This answer is most likely deprecated in 2021... any other solutions? – MJimitater (Jan 8, 2021 at 15:24)
28

None of the other answers here worked for me. After a bit of tinkering, I found that the following fixed my issues when dealing with TensorFlow installed from the binary packages:


Step 0: Uninstall protobuf

pip uninstall protobuf

Step 1: Uninstall tensorflow

pip uninstall tensorflow
pip uninstall tensorflow-gpu

Step 2: Force reinstall Tensorflow with GPU support

pip install --upgrade --force-reinstall tensorflow-gpu

Step 3: If you haven't already, set CUDA_VISIBLE_DEVICES

So for me, with 2 GPUs, it would be:

export CUDA_VISIBLE_DEVICES=0,1
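
If you would rather not export it shell-wide, the same restriction can be applied from inside a script, as long as the variable is set before TensorFlow is first imported; a small sketch (the indices 0,1 are just example values):

import os

# Must be set before TensorFlow is imported, otherwise it has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import tensorflow as tf  # only GPUs 0 and 1 are now visible to TensorFlow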
  • You saved my day! (Oct 2, 2018 at 6:29)
  • Glad to hear it :) – Mark Sonn (Nov 10, 2018 at 4:53)
18

In my case:

pip3 uninstall tensorflow

was not enough, because when I reinstalled with:

pip3 install tensorflow-gpu

it still installed the CPU build rather than the GPU one. So, before installing tensorflow-gpu, I removed all TensorFlow-related folders from site-packages and uninstalled protobuf, and it worked!

In conclusion:

pip3 uninstall tensorflow

Remove all TensorFlow-related folders in ~\Python35\Lib\site-packages

pip3 uninstall protobuf
pip3 install tensorflow-gpu
  • Typo suggestion: I suggest you use pip3 everywhere in your post. In my case, I removed tensorboard and tensorflow-1.3.0.dist-info from dist-packages and it unblocked this issue. – piercus (Aug 20, 2017 at 22:36)
9

It might seem dumb, but a sudo reboot fixed the exact same problem for me and a couple of others.

  • Rebooting was all I needed too. ^^ – Peque (Jul 17, 2018 at 5:54)
  • Saved my day ^_^ (Aug 2, 2019 at 22:26)
2

The answer that saved my day came from Mark Sonn: simply add this to ~/.bashrc and run source ~/.bashrc if you are on Linux:

export CUDA_VISIBLE_DEVICES=0,1

Previously, I had to use this workaround to get TensorFlow to recognize my GPU:

import tensorflow as tf

# List the physical GPUs TensorFlow can see (assumes at least one is found).
gpus = tf.config.experimental.list_physical_devices(device_type="GPU")
# Make only the first GPU visible to TensorFlow.
tf.config.experimental.set_visible_devices(devices=gpus[0], device_type="GPU")
# Allocate GPU memory on demand rather than grabbing it all up front.
tf.config.experimental.set_memory_growth(device=gpus[0], enable=True)

Even though the code still worked, adding these lines every time is clearly not something I wanted to keep doing. My version of TensorFlow was built from source following the documentation, to get v2.3 to support CUDA 10.2 and cuDNN 7.6.5.

If anyone is having trouble with that, I suggest a quick skim of the docs. It took 1.5 hours to build with Bazel. Make sure you have GCC 7 and Bazel installed.

0

This error may be caused by your GPU's compute capability: the official prebuilt binaries support compute capabilities from 3.5 to 5.0. You can check your GPU's compute capability here: https://en.wikipedia.org/wiki/CUDA

In my case, the error was like this:

Ignoring visible gpu device (device: 0, name: GeForce GT 640M, pci bus id: 0000:01:00.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.

For now, the only way around the 3.5 ~ 5.0 limit is to compile from source on Linux (or macOS).
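
The compute capability of a rejected card is printed in the log line itself (3.0 in the example above); for the GPUs TensorFlow does accept, recent versions (2.4+, to the best of my knowledge) can also report it programmatically, roughly like this:

import tensorflow as tf

# Only GPUs TensorFlow accepts show up here; a card rejected as too old will not.
for gpu in tf.config.list_physical_devices("GPU"):
    details = tf.config.experimental.get_device_details(gpu)
    print(gpu.name, "compute capability:", details.get("compute_capability"))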

0

There are various system incompatibility problems.

The libraries required can vary with the TensorFlow version.

When using Python in interactive mode, a lot of useful information is printed to stderr. For TensorFlow 2.0 or later, I suggest calling:

python3.8 -c "import tensorflow as tf; print('tf version:', tf.__version__); print(tf.config.list_physical_devices())"

After running this command, you will see on stderr which libraries (or which versions of them) are missing for GPU support.
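
On recent TensorFlow 2.x builds you can also ask TensorFlow directly which CUDA/cuDNN versions it was built against, which makes version mismatches easier to spot; a rough sketch (the exact keys in the returned dictionary may vary between releases):

import tensorflow as tf

info = tf.sysconfig.get_build_info()  # available in recent TF 2.x releases
print("CUDA version expected:", info.get("cuda_version"))
print("cuDNN version expected:", info.get("cudnn_version"))
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))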

P.S. CUDA_VISIBLE_DEVICES has no special connection to TensorFlow; it is more general: it's a way to restrict which GPUs are available to any launched process.

0

For Anaconda users: I installed tensorflow-gpu via the Anaconda Navigator GUI and configured the NVIDIA GPU as described in the TensorFlow guide, but TensorFlow couldn't find the GPU anyway. I then uninstalled TensorFlow, again via the GUI (see here), and reinstalled it from the command line in an Anaconda prompt with:

conda install -c anaconda tensorflow-gpu

and then TensorFlow found the GPU correctly.
