
Conv2d checks raise an error with confusing message #1472


Closed
AnudeepKonda opened this issue May 4, 2017 · 22 comments

Labels
triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@AnudeepKonda

AnudeepKonda commented May 4, 2017

RuntimeError: expected CPU tensor (got CUDA tensor)

It shows the above message when I actually give it a CPU tensor rather than a CUDA tensor, so the message is backwards. The error should have been

RuntimeError: expected CUDA tensor (got CPU tensor)

I get this on nn.parallel.data_parallel

@apaszke
Contributor

apaszke commented May 5, 2017

I've checked the code that produces this error message and it looks correct. Can you show me a snippet that throws it?

@apaszke apaszke added the needs reproduction Someone else needs to try reproducing the issue given the instructions. No action needed from user label May 5, 2017
@AnudeepKonda
Author

Sorry, I can't. I don't have a snapshot of my code that gave the error. But I solved it by calling .cuda() on my tensor.

@apaszke apaszke closed this as completed May 5, 2017
@bkj

bkj commented Jun 8, 2017

@apaszke -- Does this help?

import torch
from torch.autograd import Variable
from torchvision.models import vgg16

x = Variable(torch.zeros((1, 3, 224, 224)))
model = vgg16(pretrained=False)
model.cuda()
model(x)

throws

...
RuntimeError: expected CPU tensor (got CUDA tensor)

I'm very new to PyTorch, but it does sound backwards -- x works with the CPU model, and I don't know how it would have become a CUDA tensor. Perhaps the error message is not referring to x but to something that touches x inside the model.

@romaniukm

@bkj @apaszke This error seems to appear when you call .cuda on both the module and the input tensor, but forget to use the value returned by the call on the tensor (rather than the original tensor, which stays on the CPU).

It seems the error message gets this backwards: shouldn't it say "expected CUDA tensor (got CPU tensor)"?

I find it somewhat confusing that calling .cuda on a module moves it to the GPU, whereas calling .cuda on a tensor returns a new tensor and the original one stays on the CPU.
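
For anyone landing here later, a minimal sketch of that difference (assuming a CUDA-capable machine; t and m are made-up names):

import torch
import torch.nn as nn

t = torch.zeros(3)                    # a CPU tensor
m = nn.Linear(3, 3)                   # a module with CPU parameters

t.cuda()                              # returns a NEW CUDA tensor; t itself is unchanged
print(t.is_cuda)                      # False
t = t.cuda()                          # reassigning is what actually moves the data you use
print(t.is_cuda)                      # True

m.cuda()                              # moves the module's parameters to the GPU in place
print(next(m.parameters()).is_cuda)   # True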

@apaszke apaszke reopened this Jun 11, 2017
@apaszke apaszke changed the title Wrong error message Conv2d checks raise an error with confusing message Jun 11, 2017
@apaszke
Contributor

apaszke commented Jun 11, 2017

Will fix. Yes, cuda() has different semantics for modules and tensors, but that's because modules are much more heavyweight objects that maintain a lot of state. It could have been emphasised more in the docs though.

@jiankang1991

I also ran into the same issue.
If I run this code:

model = SimpleNet_2cls_2(img_shape, cls_num)
model = model.cuda()

I get the error:

RuntimeError: expected CPU tensor (got CUDA tensor)

But if I run this instead:

model = SimpleNet_2cls_2(img_shape, cls_num)
# model = model.cuda()
model = torch.nn.DataParallel(model).cuda()

there is no error. What is the difference here?

@varunagrawal
Contributor

It seems that calling .cuda() on the input Variable fixes this, but it is a non-issue on VGG and AlexNet. The inconsistency is a tad disturbing.

@anandbhattad

anandbhattad commented Jul 13, 2017

Yes, I'm having the same issue as well. Can we expect an update on this?

@bywbilly

I have also run into the same situation.

state_action_values = self.model.forward(state_batch.cpu()).gather(1, action_batch)

This code snippet is the same as in the PyTorch DQN tutorial. When I run it, I get:

RuntimeError: expected CPU tensor (got CUDA tensor)

If I run the snippet below, I get the same error message.

state_action_values = self.model.forward(state_batch).gather(1, action_batch)

@varunagrawal
Contributor

@bhattad2 & @bywbilly make sure you explicitly call .cuda() on your Variables. That should fix the issue and make it compatible with all models.
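
A self-contained sketch of what that looks like (made-up model and data, the old Variable API as in the snippets above, and a CUDA device assumed):

import torch
from torch.autograd import Variable

model = torch.nn.Linear(4, 2).cuda()          # module parameters are moved to the GPU in place
inputs = Variable(torch.randn(8, 4)).cuda()   # keep the value returned by .cuda()

out = model(inputs)                           # no device-mismatch error now
print(out.is_cuda)                            # True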

@Dawny33

Dawny33 commented Aug 31, 2017

As @varunagrawal said, adding .cuda() to all the Variables worked.

@4F2E4A2E

4F2E4A2E commented Oct 11, 2017

@karlTUM and @varunagrawal it worked, thanks a lot!

@jdhao

jdhao commented Oct 31, 2017

I have also run into this issue: input.cuda() does not put the Variable input on the GPU; you have to explicitly assign the return value back to input. But model.cuda() does put the model on the GPU, which is a different behaviour.
I think it would be better to make the two behaviours consistent.

@hadware

hadware commented Nov 13, 2017

I had the same issue. While the error itself was definitely justified, the message it raised was:
RuntimeError: expected CPU tensor (got CUDA tensor)
even though the tensor I actually gave the model was a CPU tensor and the model was on CUDA. The message should have been the other way around.

Note: for me it was on a Conv1d, but I don't think this changes much since they all use ConvNd in the end.

@nashory

nashory commented Nov 14, 2017

I had the same issue and solved it thanks to @karlTUM.
It seems there is a bug when calling .cuda() on modules, so please try
model = torch.nn.DataParallel(model).cuda() instead.

@plus2047

plus2047 commented Jan 8, 2018

I get the same issue. When the model is on the GPU but gets a CPU tensor, the error says it expected a CPU tensor but got a GPU tensor.

@neerajprad
Contributor

neerajprad commented Oct 2, 2018

I am seeing this error for models when they are run with the JIT tracer and the default tensor type is a CUDA tensor. JITing on the CPU, or running without the JIT on CUDA, seems fine though. I am still trying to isolate a minimal failing example. Any debugging tips are appreciated.

@bhack
Contributor

bhack commented Nov 21, 2018

@neerajprad That's because a JIT load always goes to the CPU. You always need an explicit move. See #12710.
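
A sketch of that explicit move with the current torch.jit API ("model.pt" is a placeholder path, assumed to come from torch.jit.save):

import torch

module = torch.jit.load("model.pt")   # per the above, the loaded module comes back on the CPU
module = module.cuda()                # explicitly move it to the GPU before calling it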

@neerajprad
Contributor

neerajprad commented Nov 30, 2018

@neerajprad That's because a JIT load always goes to the CPU. You always need an explicit move. See #12710.

Is that true for jit.trace too? I am not able to construct a minimal example where the tracer fails in these cases, but I have many larger models failing with: expected type CUDADoubleType but got CPUDoubleType (compute_types at /home/npradhan/workspace/pyro_dev/pytorch/aten/src/ATen/native/TensorIterator.cpp:134).

EDIT: Original issue - pyro-ppl/pyro#1419.

@yf225
Contributor

yf225 commented Nov 7, 2019

@bkj I tried this on v1.3.1

import torch
from torchvision.models import vgg16

x = torch.zeros((1, 3, 224, 224))
model = vgg16(pretrained=False)
model.cuda()
model(x)

and it throws

...
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same

so I think the error message has been improved to match what we expect.
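
For completeness, moving the input to the GPU as well makes that snippet run (a sketch, assuming a CUDA device is available):

import torch
from torchvision.models import vgg16

x = torch.zeros((1, 3, 224, 224))
model = vgg16(pretrained=False).cuda()

out = model(x.cuda())    # input moved to the same device as the weights
print(out.shape)         # torch.Size([1, 1000])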

@yf225
Contributor

yf225 commented Nov 7, 2019

@neerajprad Curious, were you able to construct a minimal example for jit.trace?

@yf225 yf225 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Nov 7, 2019
@yf225
Contributor

yf225 commented Nov 7, 2019

@neerajprad I am closing this issue now because the error message has been improved and it was originally not related to JIT. Please feel free to open a new issue and link to this issue if you are able to construct a minimal example for jit.trace. Thanks!

@yf225 yf225 closed this as completed Nov 7, 2019
@yf225 yf225 removed the needs reproduction Someone else needs to try reproducing the issue given the instructions. No action needed from user label Nov 7, 2019
zasdfgbnm added a commit to zasdfgbnm/pytorch that referenced this issue Feb 23, 2022

* Enable some tests for complex

* typo

* format
petrex pushed a commit to petrex/pytorch that referenced this issue Aug 29, 2024

… sync (pytorch#1455) (pytorch#1472)

* [SWDEV-469514] hipGraphExecDestroy requires an explicit sync

There is a new hip feature where they do not free hipGraph memory
as soon as hipGraphExecDestroy is called. This is to support async
work on the GPU. See this for more details:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-user-objects

We noticed this issue when an allreduce op inside a hipGraph hung.
Essentially, ncclCommAbort was waiting for all GPU activity to finish.
However, since hipGraph memory was technically still in use, we had an
infinite hang. So, I added an extra hipDeviceSynchronize in CUDAGraph's
destructor to ensure that memory is freed and got
test_allreduce_in_cudagraph UT to pass.

However, when I ran this on CUDA machine, I noticed that they did not
require this extra sync in order to successfully run the UT. It seems
that they were calling cudaGraphInstantiateWithFlags with
cudaGraphInstantiateFlagAutoFreeOnLaunch, which aggressively frees
memory after graph launch. There is support for this API in our ROCm
stack, but we were missing CUDA-to-HIP mappings in PyTorch. So, I
brought them in and added the necessary conditions to call this API in
the HIP case also.

* Update comments

* Use USE_ROCM in keeping with convention

* Use USE_ROCM to match convention

---------

Co-authored-by: Jithun Nair <37884920+jithunnair-amd@users.noreply.github.com>
(cherry picked from commit e752b4f)
jagadish-amd pushed a commit to jagadish-amd/pytorch that referenced this issue Jan 14, 2025

… sync (pytorch#1455) (pytorch#1472)

(cherry picked from commit e752b4f)
(cherry picked from commit d6b8773)
akashveramd pushed a commit to akashveramd/pytorch that referenced this issue Apr 9, 2025

* Add script to convert MIOpen driver to ckProfiler

* Fix