
Conv2d checks raise an error with confusing message #1472


Closed
AnudeepKonda opened this issue May 4, 2017 · 22 comments

Labels
triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@AnudeepKonda

AnudeepKonda commented May 4, 2017

RuntimeError: expected CPU tensor (got CUDA tensor)

It shows the above message when I actually give it a CPU tensor rather than a CUDA tensor, so the message is backwards. The error should have been

RuntimeError: expected CUDA tensor (got CPU tensor)

I get this on nn.parallel.data_parallel

@apaszke
Contributor

apaszke commented May 5, 2017

I've checked the code that produces this error message and it looks correct. Can you show me a snippet that throws it?

@apaszke apaszke added the needs reproduction Someone else needs to try reproducing the issue given the instructions. No action needed from user label May 5, 2017
@AnudeepKonda
Author

Sorry, I can't. I don't have a snapshot of my code that gave the error. But I solved it by calling .cuda() on my tensor.

@apaszke apaszke closed this as completed May 5, 2017
@bkj

bkj commented Jun 8, 2017

@apaszke -- Does this help?

import torch
from torch.autograd import Variable
from torchvision.models import vgg16

x = Variable(torch.zeros((1, 3, 224, 224)))
model = vgg16(pretrained=False)
model.cuda()
model(x)

throws

...
RuntimeError: expected CPU tensor (got CUDA tensor)

I'm very new to PyTorch, but it does sound backwards -- x works with the CPU model, and I don't know how it would have become a CUDA tensor. Perhaps the error message is not referring to x but to something that touches x inside the model.

@romaniukm

@bkj @apaszke This error seems to appear when you call .cuda on both the module and the input tensor, but forget to use the value returned by the call on the tensor (rather than the original tensor, which stays on the CPU).

It seems the error message gets this backwards: shouldn't it say "expected CUDA tensor (got CPU tensor)"?

I find it somewhat confusing that calling .cuda on a module moves it to the GPU, whereas calling .cuda on a tensor returns a new tensor and the original one stays on the CPU.
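
For anyone landing here later, a minimal sketch of that difference (assuming a CUDA-capable machine; t and m are made-up names):

import torch
import torch.nn as nn

t = torch.zeros(3)                    # a CPU tensor
m = nn.Linear(3, 3)                   # a module with CPU parameters

t.cuda()                              # returns a NEW CUDA tensor; t itself is unchanged
print(t.is_cuda)                      # False
t = t.cuda()                          # reassigning is what actually moves the data you use
print(t.is_cuda)                      # True

m.cuda()                              # moves the module's parameters to the GPU in place
print(next(m.parameters()).is_cuda)   # True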

@apaszke apaszke reopened this Jun 11, 2017
@apaszke apaszke changed the title Wrong error message Conv2d checks raise an error with confusing message Jun 11, 2017
@apaszke
Contributor

apaszke commented Jun 11, 2017

Will fix. Yes, cuda() has different semantics for modules and tensors, but that's because modules are much more heavyweight objects that maintain a lot of state. It could have been emphasised more in the docs though.

@jiankang1991

I also ran into the same issue.
If I run this code:

model = SimpleNet_2cls_2(img_shape, cls_num)
model = model.cuda()

I get the error:

RuntimeError: expected CPU tensor (got CUDA tensor)

But if I run this instead:

model = SimpleNet_2cls_2(img_shape, cls_num)
# model = model.cuda()
model = torch.nn.DataParallel(model).cuda()

there is no error. What is the difference here?

@varunagrawal
Contributor

It seems that calling .cuda() on the input Variable fixes this, but it is a non-issue on VGG and AlexNet. The inconsistency is a tad disturbing.

@anandbhattad

anandbhattad commented Jul 13, 2017

Yes, I'm having the same issue as well. Can we expect an update on this?

@bywbilly

I have also run into the same situation.

state_action_values = self.model.forward(state_batch.cpu()).gather(1, action_batch)

This code snippet is the same as in the PyTorch DQN tutorial. When I run it, I get:

RuntimeError: expected CPU tensor (got CUDA tensor)

If I run the snippet below, I get the same error message.

state_action_values = self.model.forward(state_batch).gather(1, action_batch)

@varunagrawal
Contributor

@bhattad2 & @bywbilly make sure you explicitly call .cuda() on your Variables. That should fix the issue and make it compatible with all models.
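
A self-contained sketch of what that looks like (made-up model and data, the old Variable API as in the snippets above, and a CUDA device assumed):

import torch
from torch.autograd import Variable

model = torch.nn.Linear(4, 2).cuda()          # module parameters are moved to the GPU in place
inputs = Variable(torch.randn(8, 4)).cuda()   # keep the value returned by .cuda()

out = model(inputs)                           # no device-mismatch error now
print(out.is_cuda)                            # True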

@Dawny33

Dawny33 commented Aug 31, 2017

As @varunagrawal said, adding .cuda() to all the Variables worked.

@4F2E4A2E

4F2E4A2E commented Oct 11, 2017

@karlTUM and @varunagrawal it worked, thanks a lot!

@jdhao

jdhao commented Oct 31, 2017

I have also run into this issue: input.cuda() does not put the Variable input on the GPU; you have to explicitly assign the return value back to input. But model.cuda() does put the model on the GPU, which is a different behaviour.
I think it would be better to make the two behaviours consistent.

@hadware

hadware commented Nov 13, 2017

I had the same issue. While the error itself was definitely justified, the message it raised was:
RuntimeError: expected CPU tensor (got CUDA tensor)
even though the tensor I actually gave the model was a CPU tensor and the model was on CUDA. The message should have been the other way around.

Note: for me it was on a Conv1d, but I don't think this changes much since they all use ConvNd in the end.

@nashory

nashory commented Nov 14, 2017

I had the same issue and solved it thanks to @karlTUM.
It seems there is a bug when calling .cuda() on modules, so please try
model = torch.nn.DataParallel(model).cuda() instead.

@plus2047

plus2047 commented Jan 8, 2018

I get the same issue. When the model is on the GPU but gets a CPU tensor, the error says it expected a CPU tensor but got a GPU tensor.

@neerajprad
Contributor

neerajprad commented Oct 2, 2018

I am seeing this error for models when they are run with the JIT tracer and the default tensor type is a CUDA tensor. JITing on the CPU, or running without the JIT on CUDA, seems fine though. I am still trying to isolate a minimal failing example. Any debugging tips are appreciated.

@bhack
Contributor

bhack commented Nov 21, 2018

@neerajprad That's because a JIT load always goes to the CPU. You always need an explicit move. See #12710.
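
A sketch of that explicit move with the current torch.jit API ("model.pt" is a placeholder path, assumed to come from torch.jit.save):

import torch

module = torch.jit.load("model.pt")   # per the above, the loaded module comes back on the CPU
module = module.cuda()                # explicitly move it to the GPU before calling it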

@neerajprad
Contributor

neerajprad commented Nov 30, 2018

@neerajprad That's because a JIT load always goes to the CPU. You always need an explicit move. See #12710.

Is that true for jit.trace too? I am not able to construct a minimal example where the tracer fails in these cases, but I have many larger models failing with: expected type CUDADoubleType but got CPUDoubleType (compute_types at /home/npradhan/workspace/pyro_dev/pytorch/aten/src/ATen/native/TensorIterator.cpp:134).

EDIT: Original issue - pyro-ppl/pyro#1419.

@yf225
Contributor

yf225 commented Nov 7, 2019

@bkj I tried this on v1.3.1

import torch
from torchvision.models import vgg16

x = torch.zeros((1, 3, 224, 224))
model = vgg16(pretrained=False)
model.cuda()
model(x)

and it throws

...
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same

so I think the error message has been improved to match what we expect.
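
For completeness, moving the input to the GPU as well makes that snippet run (a sketch, assuming a CUDA device is available):

import torch
from torchvision.models import vgg16

x = torch.zeros((1, 3, 224, 224))
model = vgg16(pretrained=False).cuda()

out = model(x.cuda())    # input moved to the same device as the weights
print(out.shape)         # torch.Size([1, 1000])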

@yf225
Contributor

yf225 commented Nov 7, 2019

@neerajprad Curious, were you able to construct a minimal example for jit.trace?

@yf225 yf225 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Nov 7, 2019
@yf225
Contributor

yf225 commented Nov 7, 2019

@neerajprad I am closing this issue now because the error message has been improved and it was originally not related to JIT. Please feel free to open a new issue and link to this issue if you are able to construct a minimal example for jit.trace. Thanks!

@yf225 yf225 closed this as completed Nov 7, 2019
@yf225 yf225 removed the needs reproduction Someone else needs to try reproducing the issue given the instructions. No action needed from user label Nov 7, 2019
zasdfgbnm added a commit to zasdfgbnm/pytorch that referenced this issue Feb 23, 2022

* Enable some tests for complex

* typo

* format
petrex pushed a commit to petrex/pytorch that referenced this issue Aug 29, 2024

… sync (pytorch#1455) (pytorch#1472)

* [SWDEV-469514] hipGraphExecDestroy requires an explicit sync

There is a new hip feature where they do not free hipGraph memory
as soon as hipGraphExecDestroy is called. This is to support async
work on the GPU. See this for more details:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-user-objects

We noticed this issue when an allreduce op inside a hipGraph hung.
Essentially, ncclCommAbort was waiting for all GPU activity to finish.
However, since hipGraph memory was technically still in use, we had an
infinite hang. So, I added an extra hipDeviceSynchronize in CUDAGraph's
destructor to ensure that memory is freed and got
test_allreduce_in_cudagraph UT to pass.

However, when I ran this on CUDA machine, I noticed that they did not
require this extra sync in order to successfully run the UT. It seems
that they were calling cudaGraphInstantiateWithFlags with
cudaGraphInstantiateFlagAutoFreeOnLaunch, which aggressively frees
memory after graph launch. There is support for this API in our ROCm
stack, but we were missing CUDA-to-HIP mappings in PyTorch. So, I
brought them in and added the necessary conditions to call this API in
the HIP case also.

* Update comments

* Use USE_ROCM in keeping with convention

* Use USE_ROCM to match convention

---------

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
(cherry picked from commit e752b4f)
jagadish-amd pushed a commit to jagadish-amd/pytorch that referenced this issue Jan 14, 2025

… sync (pytorch#1455) (pytorch#1472)

(cherry picked from commit e752b4f)
(cherry picked from commit d6b8773)
akashveramd pushed a commit to akashveramd/pytorch that referenced this issue Apr 9, 2025

* Add script to convert MIOpen driver to ckProfiler

* Fix