Skip to content

GPU resources not released when session is closed #1727

Closed
@gustavla

Description

@gustavla
Contributor

As I understand from the documentation, running sess.close() is supposed to release the resources, but it doesn't. I have been running the following test:

with tf.Session() as sess:
    with tf.device('/gpu:0'):
        matrix1 = tf.constant([[3., 3]])
        matrix2 = tf.constant([[2.], [2.]])
        product = tf.matmul(matrix1, matrix2)
        result = sess.run(product)
        print(result)

This allocates all the free memory of gpu0, but it is not released when sess is closed (both using a context manager as in the code above, but also when calling sess.close() manually). The memory usage persists until that Python process is terminated. The way I have been checking memory usage is through nvidia-smi, but I have also confirmed that other processes can't allocate that GPU memory until the process terminates (not the session closes). I would like to be able to free the resources and still keep the Python process running.

Environment info

I am running a 64-bit Linux (CentOS) with a computer that has two Tesla K40c (driver 346.46, CUDA 7.0). I installed the 0.7.1 tensorflow for Linux and Python 3.4 through pip. The output of tf.__version__ is 0.7.1.

Steps to reproduce

Simply running the code above should according to the document allocate and then release the memory. However, the GPU memory is still allocated and thus unusable by other processes. However, it can be re-used by the same Python process, meaning that I can re-run the snippet over and over as long as I do it from the same Python process.

Logs or other output that would be helpful

Here is a log of the session. At the end, the memory is still allocated. Note that another user is connected to both GPUs through Torch7, and is actively using gpu0.

In [1]: import tensorflow as tf
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally

In [2]: with tf.Session() as sess:
    with tf.device('/gpu:1'):
        matrix1 = tf.constant([[3., 3]])
        matrix2 = tf.constant([[2.], [2.]])
        product = tf.matmul(matrix1, matrix2)
        result = sess.run(product)
        print(result)
   ...:
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:03:00.0
Total memory: 11.25GiB
Free memory: 3.27GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 1 with properties:
name: Tesla K40c
major: 3 minor: 5 memoryClockRate (GHz) 0.745
pciBusID 0000:82:00.0
Total memory: 11.25GiB
Free memory: 11.05GiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 0 to device ordinal 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:59] cannot enable peer access from device ordinal 1 to device ordinal 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 1
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y N
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 1:   N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:717] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:03:00.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:717] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla K40c, pci bus id: 0000:82:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 512.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 512.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 512.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 512.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.00GiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:73] Allocating 10.50GiB bytes.
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:83] GPU 1 memory begins at 0x4208f40000 extends to 0x44a8b030cd
[[ 12.]]

In [3]: 

Activity

gustavla

gustavla commented on Apr 4, 2016

@gustavla
ContributorAuthor

I just wanted to add that I have also tested this on the most recent master (de5101d) now and it is still a problem.

speakerjohnash

speakerjohnash commented on Apr 6, 2016

@speakerjohnash

I'm having this issue as well.

import tensorflow as tf
sess = tf.Session()
sess.close()
sess._closed
sess._opened

Both _opened and _closed come back as false

martinwicke

martinwicke commented on Apr 7, 2016

@martinwicke
Member

This is a bug.

OscarDPan

OscarDPan commented on Apr 7, 2016

@OscarDPan

I am experiencing the same issue... thought I missed something...
Please look into this ASAP if it is a bug.

redpanda-ai

redpanda-ai commented on Apr 17, 2016

@redpanda-ai

Same problem here.

TimoFriedri

TimoFriedri commented on May 17, 2016

@TimoFriedri

I have the same problem here with version 0.8.0.

This is really anoying, because I have to kill the python kernel to free the resources!

sytham

sytham commented on May 24, 2016

@sytham

Same issue here with version 0.8.0rc0. In fact, the GPU memory isn't even released after shutting down the Python kernel. Running 64-bit Linux (CentOS) with 4 nVidia GRID K520 GPUs, Python 2.7.

In lieue of fixing the issue, a quick workaround could be to allow the user to free up the memory explicitly (of course fixing the issue would be preferable)

EDIT: this seems to only happen on GPU with ID 0. If I mask available GPUs through the env var CUDA_VISIBLE_DEVICES to not include 0, then all appears to go fine

yaroslavvb

yaroslavvb commented on May 28, 2016

@yaroslavvb
Contributor

A negative side-effect of this is that you can't run all the tests with
bazel test -c opt --config=cuda tensorflow/...

A fraction of the tests (10-30%) fail with CUDA_ERROR_OUT_OF_MEMORY on my 4GB GTX 980.
But then if I rerun any of the failing tests using separate blaze test, it works.

vrv

vrv commented on May 28, 2016

@vrv

Each bazel test invocation is a separate process, so when the process exits, it does release the memory. In this case, bazel is running multiple tests in parallel. Use bazel test -j 1 to only run one at a time.

yaroslavvb

yaroslavvb commented on May 28, 2016

@yaroslavvb
Contributor

Thanks @vrv, that fixed all the out-of-memory errors I had

zheng-xq

zheng-xq commented on Jun 13, 2016

@zheng-xq
Contributor

As for the original problem, currently the Allocator in the GPUDevice belongs to the ProcessState, which is essentially a global singleton. The first session using GPU initializes it, and frees itself when the process shuts down. Even if a second session chooses a different GPUOptions, it would not take effect.

Changing this would be a fairly large change. We need to rethink how the device is initialized and how it interacts with the session, and therefore modify the current API. It is unlikely the TensorFlow team can get to this in the short term.

Marking it as contribution welcome. If anyone is interested, a design proposal could be discussed here, before proceeding to implementations.

removed their assignment
on Jun 13, 2016
added and removed on Feb 9, 2017
recursionbane

recursionbane commented on Mar 10, 2017

@recursionbane

Is it possible to provide a dedicated subroutine we can call at the end of our sessions that will free up GPU resources and release GPU control back to the OS?
On Windows 7, no GPU-required programs can run after TensorFlow until the PC is restarted.

yaroslavvb

yaroslavvb commented on Mar 10, 2017

@yaroslavvb
Contributor

@recursionbane it's supposed to release it all, although on Linux I've seen cases that GPU was rendered unusable until restart, and this was caused by NVidia driver problems

zheng-xq

zheng-xq commented on Mar 10, 2017

@zheng-xq
Contributor
recursionbane

recursionbane commented on Mar 11, 2017

@recursionbane

I am on v369.30 for my Quadro M1000M.
I will upgrade to v376.84 to check if the problem is resolved.

SrivastavaKshitij

SrivastavaKshitij commented on May 3, 2017

@SrivastavaKshitij

I am using Ubuntu 16.04 with tensorflow 1.0 and NVIDIA Tesla K20Xm GPU (CUDA 8.0) . I am facing similar problems. Memory is not released after the session is over

vrv

vrv commented on May 3, 2017

@vrv

As mentioned here, the best idea anyone has is: #1727 (comment), saying to upgrade your NVIDIA drivers. Closing this and locking to make sure the current conclusion is easily found; if there is evidence that upgrading drivers does not solve the problem, we can open up a new bug!

locked and limited conversation to collaborators on May 3, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @yaroslavvb@girving@redpanda-ai@aselle@vrv

        Issue actions

          GPU resources not released when session is closed · Issue #1727 · tensorflow/tensorflow