Description
As I understand from the documentation, running `sess.close()` is supposed to release the resources, but it doesn't. I have been running the following test:
with tf.Session() as sess:
    with tf.device('/gpu:0'):
        matrix1 = tf.constant([[3., 3]])
        matrix2 = tf.constant([[2.], [2.]])
        product = tf.matmul(matrix1, matrix2)
        result = sess.run(product)
        print(result)
This allocates all the free memory on gpu0, but the memory is not released when `sess` is closed (neither when using a context manager as in the code above, nor when calling `sess.close()` manually). The memory usage persists until the Python process is terminated. I have been checking memory usage through `nvidia-smi`, but I have also confirmed that other processes cannot allocate that GPU memory until the process terminates (not when the session closes). I would like to be able to free the resources while keeping the Python process running.
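Since the memory is only returned to the system when the owning process exits, one workaround (not something the documentation prescribes; the process/queue structure below is my own sketch) is to run the GPU work in a short-lived child process and ship the result back to the parent:

```python
import multiprocessing as mp

def run_graph(queue):
    # Import TensorFlow inside the child so the CUDA context and the GPU
    # allocator live entirely in this process.
    import tensorflow as tf
    with tf.Session() as sess:
        with tf.device('/gpu:0'):
            matrix1 = tf.constant([[3., 3]])
            matrix2 = tf.constant([[2.], [2.]])
            product = tf.matmul(matrix1, matrix2)
            queue.put(sess.run(product))
    # The GPU memory is released when this child process exits.

if __name__ == '__main__':
    queue = mp.Queue()
    worker = mp.Process(target=run_graph, args=(queue,))
    worker.start()
    result = queue.get()
    worker.join()
    print(result)  # the parent process never touched the GPU
```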
Environment info
I am running 64-bit Linux (CentOS) on a machine with two Tesla K40c GPUs (driver 346.46, CUDA 7.0). I installed TensorFlow 0.7.1 for Linux and Python 3.4 through pip. The output of `tf.__version__` is 0.7.1.
Steps to reproduce
Simply running the code above should, according to the documentation, allocate and then release the memory. However, the GPU memory stays allocated and is thus unusable by other processes. It can still be re-used by the same Python process, meaning I can re-run the snippet over and over as long as I do it from the same Python process.
Logs or other output that would be helpful
Here is a log of the session. At the end, the memory is still allocated. Note that another user is connected to both GPUs through Torch7, and is actively using gpu0.
In [1]: import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
In [2]: with tf.Session() as sess:
   ...:     with tf.device('/gpu:1'):
   ...:         matrix1 = tf.constant([[3., 3]])
   ...:         matrix2 = tf.constant([[2.], [2.]])
   ...:         product = tf.matmul(matrix1, matrix2)
   ...:         result = sess.run(product)
   ...:         print(result)
   ...:
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:03:00.0
Total memory: 11.25GiB
Free memory: 3.27GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 1 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:82:00.0
Total memory: 11.25GiB
Free memory: 11.05GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y N
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 1: N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:717] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:717] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40c, pci bus id: 0000:82:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 512.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 512.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 512.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 512.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:73] Allocating 10.50GiB bytes.
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:83] GPU 1 memory begins at 0x4208f40000 extends to 0x44a8b030cd
[[ 12.]]
In [3]:
Activity
gustavla commented on Apr 4, 2016
I just wanted to add that I have also tested this on the most recent master (de5101d) now and it is still a problem.
speakerjohnash commented on Apr 6, 2016
I'm having this issue as well.
Both _opened and _closed come back as false
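(For context, `_opened` and `_closed` are private, version-dependent internals of `tf.Session`, not a public API; the following is only a guarded sketch of the kind of check being described, assuming nothing beyond the attribute names mentioned above:)

```python
import tensorflow as tf

sess = tf.Session()
sess.close()

# These are internal attributes and may not exist in every release,
# so look them up defensively rather than accessing them directly.
for name in ('_opened', '_closed'):
    print(name, getattr(sess, name, 'not present in this version'))
```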
martinwicke commented on Apr 7, 2016
This is a bug.
OscarDPan commented on Apr 7, 2016
I am experiencing the same issue... I thought I had missed something...
Please look into this ASAP if it is a bug.
redpanda-ai commented on Apr 17, 2016
Same problem here.
TimoFriedri commented on May 17, 2016
I have the same problem with version 0.8.0.
This is really annoying, because I have to kill the Python kernel to free the resources!
sytham commented on May 24, 2016
Same issue here with version 0.8.0rc0. In fact, the GPU memory isn't even released after shutting down the Python kernel. Running 64-bit Linux (CentOS) with 4 NVIDIA GRID K520 GPUs, Python 2.7.
In lieu of fixing the issue, a quick workaround could be to allow the user to free up the memory explicitly (of course, fixing the issue would be preferable).
EDIT: this only seems to happen on the GPU with ID 0. If I mask the available GPUs through the env var CUDA_VISIBLE_DEVICES so that 0 is not included, then everything appears to work fine.
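A minimal sketch of that masking workaround (the device IDs are just an example; the variable must be set before TensorFlow initializes CUDA, so setting it before the import is the safe ordering):

```python
import os

# Hide GPU 0 from this process; TensorFlow only enumerates the remaining
# devices, and the first visible one appears as /gpu:0 inside the process.
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2,3"

import tensorflow as tf  # import after the environment variable is set
```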
yaroslavvb commented on May 28, 2016
A negative side-effect of this is that you can't run all the tests with
bazel test -c opt --config=cuda tensorflow/...
A fraction of the tests (10-30%) fail with CUDA_ERROR_OUT_OF_MEMORY on my 4GB GTX 980. But if I then rerun any of the failing tests with a separate `bazel test` invocation, it works.
vrv commented on May 28, 2016
Each bazel test invocation is a separate process, so when the process exits, it does release the memory. In this case, bazel is running multiple tests in parallel. Use bazel test -j 1 to only run one at a time.
yaroslavvb commented on May 28, 2016
Thanks @vrv, that fixed all the out-of-memory errors I had
zheng-xq commented on Jun 13, 2016
As for the original problem: currently the Allocator in the GPUDevice belongs to the ProcessState, which is essentially a global singleton. The first session that uses the GPU initializes it, and it is only freed when the process shuts down. Even if a second session chooses different GPUOptions, they do not take effect.
Changing this would be a fairly large change. We would need to rethink how the device is initialized and how it interacts with the session, and therefore modify the current API. It is unlikely the TensorFlow team can get to this in the short term.
Marking it as contribution welcome. If anyone is interested, a design proposal could be discussed here before proceeding to an implementation.
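To make the point about a second session's GPUOptions concrete, here is a rough illustration using the Session config API of that era (per_process_gpu_memory_fraction is just an example option and the values are arbitrary). Per the explanation above, the second configuration is silently ignored because the first session already initialized the process-wide allocator:

```python
import tensorflow as tf

# First session in the process: its GPUOptions initialize the ProcessState's
# GPU allocator for the lifetime of the process.
config_a = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.9))
with tf.Session(config=config_a) as sess:
    sess.run(tf.constant(1.0))

# Second session with different GPUOptions in the same process: the allocator
# already exists, so this fraction does not take effect.
config_b = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.1))
with tf.Session(config=config_b) as sess:
    sess.run(tf.constant(2.0))
```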
recursionbane commented on Mar 10, 2017
Is it possible to provide a dedicated subroutine we can call at the end of our sessions that will free up GPU resources and release GPU control back to the OS?
On Windows 7, no programs that require the GPU can run after TensorFlow until the PC is restarted.
yaroslavvb commented on Mar 10, 2017
@recursionbane it's supposed to release it all, although on Linux I've seen cases where the GPU was rendered unusable until restart, and this was caused by NVIDIA driver problems.
zheng-xq commented on Mar 10, 2017
recursionbane commented on Mar 11, 2017
I am on v369.30 for my Quadro M1000M.
I will upgrade to v376.84 to check if the problem is resolved.
SrivastavaKshitij commented on May 3, 2017
I am using Ubuntu 16.04 with TensorFlow 1.0 and an NVIDIA Tesla K20Xm GPU (CUDA 8.0). I am facing similar problems: memory is not released after the session is over.
vrv commented on May 3, 2017
As mentioned here, the best idea anyone has is #1727 (comment): upgrade your NVIDIA drivers. Closing this and locking to make sure the current conclusion is easily found; if there is evidence that upgrading drivers does not solve the problem, we can open a new bug!